A question about Web application internationalization

03-09-03 iceant
I came across the passage quoted below in Sun's Web Services tutorial:

Ref:http://java.sun.com/webservices/docs/1.2/tutorial/doc/index.html

But one thing puzzles me: how does the server know which charset the client is using?

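We can learn the user's language preference from the Accept-Language header, but that says nothing about which encoding the submitted data actually uses. If the server cannot determine the encoding of the submission, then setCharacterEncoding is not much use, is it?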

Request Encoding

The request encoding is the character encoding in which parameters in an incoming request are interpreted. Currently, many browsers do not send a request encoding qualifier with the Content-Type header. In such cases, a Web container will use the default encoding--ISO-8859-1--to parse request data.

If the client hasn't set character encoding and the request data is encoded with a different encoding than the default, the data won't be interpreted correctly. To remedy this situation, you can use the ServletRequest.setCharacterEncoding(String enc) method to override the character encoding supplied by the container. This method must be called prior to reading request parameters or reading input using getReader. To control the request encoding from JSP pages, you can use the JSTL fmt:requestEncoding tag.

This method must be called prior to parsing any request parameters or reading any input from the request. Calling this method once data has been read will not affect the encoding.
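To make the failure mode in the quoted passage concrete, here is a minimal sketch using only the standard library. It does not use the servlet API: the class name `RequestEncodingDemo` and the `decode` helper are hypothetical and merely stand in for what a container does when it turns raw request bytes into parameter strings, first with the ISO-8859-1 default and then with the encoding the client really used (assumed here to be UTF-8).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RequestEncodingDemo {

    /**
     * Hypothetical stand-in for the container's parameter parsing:
     * interprets the raw request bytes in the given charset.
     */
    public static String decode(byte[] rawBytes, Charset charset) {
        return new String(rawBytes, charset);
    }

    public static void main(String[] args) {
        // Two Chinese characters, written as escapes so the source file
        // encoding does not matter.
        String original = "\u4e2d\u6587";

        // Assumption: the browser submitted the form data as UTF-8.
        byte[] wire = original.getBytes(StandardCharsets.UTF_8);

        // Container default: every byte becomes one Latin-1 character.
        String withDefault = decode(wire, StandardCharsets.ISO_8859_1);

        // What overriding the request encoding to UTF-8 would yield.
        String withOverride = decode(wire, StandardCharsets.UTF_8);

        System.out.println("default matches original:  " + withDefault.equals(original));
        System.out.println("override matches original: " + withOverride.equals(original));
    }
}
```

In a real servlet, the equivalent of the second call is `request.setCharacterEncoding("UTF-8")` issued before any parameter is read, as the passage states. The sketch also shows why the question matters: the override only helps if the server already knows which encoding to pass.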

Njord
2003-09-03 17:42
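Browsers have a language setting. Presumably the browser sends data using the default encoding of whichever language is selected there.

Each language should have a default encoding.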

iceant
2003-09-03 18:06
The server really has no way of obtaining the client's encoding information.

The details are described here:

ref: http://java.sun.com/blueprints/guidelines/designing_enterprise_applications_2e/i18n/i18n4.html

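So Sun recommends using UTF-8 as the default encoding for JSP/servlets.

In my experience, when a page is sent to the client as UTF-8, the data submitted back from that page is indeed UTF-8 encoded.

Apart from this approach, I can't think of a better solution.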

=======================================================

There are several approaches to determining and tracking HTTP request locale:

Deduce encoding from the Accept-language HTTP header--The Accept-language header does not unambiguously indicate request encoding, but it can provide an appropriate locale for content generation. The method ServletRequest.getLocale returns a preferred Locale that the Web container chooses based on the Accept-language header value. The method ServletRequest.getLocales returns an Enumeration of Locale objects that the client will accept, based on the contents of multiple Accept-language header values. A Web component can use getLocales to select the most appropriate locale from among the available options.

On the other hand, however, this approach is unreliable because there is no unique relationship between the value of the Accept-language header and the request encoding. Most character sets may be represented in a variety of encodings. The Accept-language value, even if accurate, only narrows the range of possible encodings. For these reasons, relying on Accept-language for determining request encoding is discouraged.

HTTP defines two other relevant Accept- headers. Accept-charset is a list of character sets the browser will accept, which can be useful in choosing a response encoding. Accept-encoding is a document's so-called "content coding," usually a type of data compression. Neither of these headers indicates request encoding. See RFC 2616 listed in Section 10.9 on page 345 for details.

Provide separate application entry points for different locales--In the Web tier, one servlet may be mapped to several URLs, each corresponding to a particular locale. The URL might even contain the locale identifier; for example, http://j2eeserver/j2eeapp/login/en_US for United States English, and http://j2eeserver/j2eeapp/login/de_CH, for Swiss German. This approach is especially appropriate for applications that heavily use manually-localized JSP pages, because such pages are typically already separated by the URL namespace.

Define an application-wide encoding--If every Web component in an application transmits all of its pages in the same encoding, then requests from those pages will always be in that encoding. This approach simplifies design, but has the drawback that any component that does not set the encoding correctly will not work properly. This drawback can be eliminated using a servlet filter; see the next section for a description. As described previously in this chapter, UTF-8 encoding unifies ASCII with Unicode. Standardizing on UTF-8 is the recommended approach because it provides the broadest coverage of character sets.
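As an illustration of the first approach above, the following sketch picks a content locale from an Accept-Language value using only `java.util.Locale`. It is not the servlet container's implementation of `getLocale`; the class `AcceptLanguageDemo`, the `pickLocale` helper, and the list of supported locales are hypothetical. Note that the result is a `Locale` only, which matches the quoted warning: nothing in it identifies the request encoding.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Locale;

public class AcceptLanguageDemo {

    /**
     * Chooses the best supported locale for an Accept-Language header value,
     * falling back to the given default when the header is absent, malformed,
     * or matches nothing the application supports.
     */
    public static Locale pickLocale(String acceptLanguage,
                                    Collection<Locale> supported,
                                    Locale fallback) {
        if (acceptLanguage == null || acceptLanguage.trim().isEmpty()) {
            return fallback;
        }
        try {
            // Parses the header into ranges ordered by descending q-value.
            List<Locale.LanguageRange> ranges = Locale.LanguageRange.parse(acceptLanguage);
            // RFC 4647 lookup: "de-CH" is truncated to "de" if needed.
            Locale match = Locale.lookup(ranges, supported);
            return match != null ? match : fallback;
        } catch (IllegalArgumentException malformedHeader) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        // Assumption: the application ships English and German content.
        List<Locale> supported = Arrays.asList(Locale.ENGLISH, Locale.GERMAN);
        Locale chosen = pickLocale("de-CH,de;q=0.8,en;q=0.5", supported, Locale.ENGLISH);
        System.out.println("content locale: " + chosen);
    }
}
```

A locale chosen this way can drive content generation, while the request encoding still has to come from somewhere else, for example the application-wide UTF-8 convention described in the third approach.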

mellon
2003-09-04 11:09
When you register for MSN, it asks you to choose a language.

So evidently it is simply something the user selects.
