The request encoding is the character encoding in which parameters in an incoming request are interpreted. Currently, many browsers do not send a request encoding qualifier with the Content-Type header. In such cases, a Web container will use the default encoding--ISO-8859-1--to parse request data.
If the client has not specified a character encoding and the request data uses an encoding other than the default, the data will not be interpreted correctly. To remedy this situation, use the ServletRequest.setCharacterEncoding(String enc) method to override the character encoding supplied by the container. To control the request encoding from JSP pages, use the JSTL fmt:requestEncoding tag.
The setCharacterEncoding method must be called before any request parameters are parsed and before any input is read from the request, for example with getReader. Calling it after data has been read has no effect on the encoding.
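The effect of decoding with the wrong encoding can be reproduced outside a container. The following is a minimal sketch using only the JDK, not the servlet API: the class name RequestDecodingSketch and the method decodeParam are hypothetical, and URLDecoder stands in, as an assumption, for the parameter parsing a Web container performs. In a servlet, the corresponding fix is to call setCharacterEncoding before the first parameter is read.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical helper class; it only imitates a container's parameter decoding.
class RequestDecodingSketch {

    // Decodes one percent-encoded parameter value using the named encoding.
    static String decodeParam(String rawValue, String encoding) {
        try {
            return URLDecoder.decode(rawValue, encoding);
        } catch (UnsupportedEncodingException e) {
            // Rethrow unchecked so callers need not declare the exception.
            throw new IllegalArgumentException("Unknown encoding: " + encoding, e);
        }
    }

    public static void main(String[] args) {
        // The word "caf\u00e9" as a browser would send it from a UTF-8 page.
        String raw = "caf%C3%A9";

        // Decoded with the container default: two wrong characters replace one.
        String wrong = decodeParam(raw, "ISO-8859-1");
        // Decoded with the encoding the page actually used.
        String right = decodeParam(raw, "UTF-8");

        System.out.println("ISO-8859-1 length: " + wrong.length());
        System.out.println("UTF-8 length:      " + right.length());
    }
}
```

The ISO-8859-1 result is one character longer than the UTF-8 result because each of the two UTF-8 bytes is treated as a separate character. Once a parameter has been decoded this way, changing the encoding afterwards cannot repair it, which is why the ordering requirement matters.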
There are several approaches to determining and tracking the encoding and locale of an HTTP request:
Deduce encoding from the Accept-Language HTTP header--The Accept-Language header does not unambiguously indicate request encoding, but it can provide an appropriate locale for content generation. The method ServletRequest.getLocale returns the preferred Locale that the Web container chooses based on the Accept-Language header value. The method ServletRequest.getLocales returns an Enumeration of the Locale objects that the client will accept, in decreasing order of preference, based on the contents of the Accept-Language header values. A Web component can use getLocales to select the most appropriate locale from among the available options.
This approach is unreliable for determining encoding, however, because there is no unique relationship between the value of the Accept-Language header and the request encoding. Most character sets can be represented in a variety of encodings, so the Accept-Language value, even if accurate, only narrows the range of possible encodings. For these reasons, relying on Accept-Language to determine request encoding is discouraged.
HTTP defines two other relevant Accept- headers. Accept-Charset lists the character sets the browser will accept, which can be useful in choosing a response encoding. Accept-Encoding lists the so-called "content codings," usually types of data compression, that the browser will accept in a response. Neither of these headers indicates request encoding. See RFC 2616 listed in Section 10.9 on page 345 for details.
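Selecting a locale from an Accept-Language value, as getLocales allows a Web component to do, can be sketched with the JDK alone. This is an illustration under stated assumptions rather than the container's own algorithm: the class AcceptLanguageSketch, the method chooseLocale, and the SUPPORTED list are hypothetical, and the matching uses the language-range filtering in java.util.Locale (Java 8 or later) instead of the servlet API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper; not part of the servlet API.
class AcceptLanguageSketch {

    // Assumed set of locales for which the application has content.
    static final List<Locale> SUPPORTED = Arrays.asList(
            Locale.forLanguageTag("en-US"),
            Locale.forLanguageTag("de-CH"));

    // Returns the supported locale the client prefers most, or the fallback
    // when the header is missing or matches nothing.
    static Locale chooseLocale(String acceptLanguage, Locale fallback) {
        if (acceptLanguage == null || acceptLanguage.trim().isEmpty()) {
            return fallback;
        }
        // Parses entries such as "de-CH,de;q=0.8" and orders them by weight.
        // A malformed header value raises IllegalArgumentException here.
        List<Locale.LanguageRange> ranges = Locale.LanguageRange.parse(acceptLanguage);
        // Keeps the supported locales that match a range, best match first.
        List<Locale> matches = Locale.filter(ranges, SUPPORTED);
        return matches.isEmpty() ? fallback : matches.get(0);
    }

    public static void main(String[] args) {
        Locale chosen = chooseLocale("de-CH,de;q=0.8,en;q=0.5", Locale.US);
        System.out.println(chosen.toLanguageTag());
    }
}
```

Note that the result is a locale only. Nothing in the header says which encoding the request body uses, which is the limitation described above.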
Provide separate application entry points for different locales--In the Web tier, one servlet may be mapped to several URLs, each corresponding to a particular locale. The URL might even contain the locale identifier; for example, http://j2eeserver/j2eeapp/login/en_US for United States English and http://j2eeserver/j2eeapp/login/de_CH for Swiss German. This approach is especially appropriate for applications that make heavy use of manually localized JSP pages, because such pages are typically already separated by the URL namespace.
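When the locale identifier is carried in the URL, the component has to recover it from the request path. The following sketch shows one way to do so, assuming the identifier is the last path segment and has the form language_COUNTRY as in the example URLs above. The class PathLocaleSketch and the method localeFromPath are hypothetical names, and the path is passed in as a plain string rather than read from a servlet request.

```java
import java.util.Locale;

// Hypothetical helper; assumes the locale is the final path segment.
class PathLocaleSketch {

    // Returns the locale named by the last path segment (for example
    // "/j2eeapp/login/de_CH"), or the fallback if the segment is not
    // of the form xx_YY.
    static Locale localeFromPath(String path, Locale fallback) {
        if (path == null) {
            return fallback;
        }
        String lastSegment = path.substring(path.lastIndexOf('/') + 1);
        String[] parts = lastSegment.split("_");
        boolean wellFormed = parts.length == 2
                && parts[0].matches("[a-z]{2}")
                && parts[1].matches("[A-Z]{2}");
        if (!wellFormed) {
            return fallback;
        }
        // Language tags use a hyphen where the URL uses an underscore.
        return Locale.forLanguageTag(parts[0] + "-" + parts[1]);
    }

    public static void main(String[] args) {
        System.out.println(localeFromPath("/j2eeapp/login/de_CH", Locale.US));
        System.out.println(localeFromPath("/j2eeapp/login", Locale.US));
    }
}
```

Validating the segment before building the Locale keeps an unrelated trailing path element, such as login, from being mistaken for a locale.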
Define an application-wide encoding--If every Web component in an application transmits all of its pages in the same encoding, then requests from those pages will always be in that encoding. This approach simplifies design, but has the drawback that any component that does not set the encoding correctly will not work properly. This drawback can be eliminated using a servlet filter; see the next section for a description. As described previously in this chapter, UTF-8 is compatible with ASCII and can encode every Unicode character. Standardizing on UTF-8 is the recommended approach because it provides the broadest coverage of character sets.
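The coverage argument for UTF-8 can be checked directly with the JDK's charset classes. The sketch below is illustrative only; the class CharsetCoverageSketch and its two methods are hypothetical names. It tests whether a string can be represented in a given encoding and whether ASCII text keeps the same bytes under UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for comparing encodings.
class CharsetCoverageSketch {

    // True if every character of the text can be encoded in the charset.
    static boolean canRepresent(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    // True if the text produces identical bytes in US-ASCII and UTF-8.
    static boolean sameBytesAsAscii(String text) {
        return Arrays.equals(
                text.getBytes(StandardCharsets.US_ASCII),
                text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String japanese = "\u65e5\u672c\u8a9e"; // three Japanese characters

        System.out.println(canRepresent(japanese, StandardCharsets.ISO_8859_1));
        System.out.println(canRepresent(japanese, StandardCharsets.UTF_8));
        System.out.println(sameBytesAsAscii("login"));
    }
}
```

An application that standardizes on ISO-8859-1 cannot carry the Japanese text at all, whereas one that standardizes on UTF-8 can carry it without changing how plain ASCII parameters are encoded.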