anonymous
2004-09-26 16:38
你可能还有些迷惑,那么你再看看一下官方说明的内容,

11.1 Character Sets and Collations in General

A character set is a set of symbols and encodings. A collation is a set of rules for comparing characters in a character set. Let's make the distinction clear with an example of an imaginary character set.

Suppose that we have an alphabet with four letters: `A', `B', `a', `b'. We give each letter a number: `A' = 0, `B' = 1, `a' = 2, `c' = 3. The letter `A' is a symbol, the number 0 is the encoding for `A', and the combination of all four letters and their encodings is a character set.

Now, suppose that we want to compare two string values, `A' and `B'. The simplest way to do this is to look at the encodings: 0 for `A' and 1 for `B'. Because 0 is less than 1, we say `A' is less than `B'. Now, what we've just done is apply a collation to our character set. The collation is a set of rules (only one rule in this case): ``compare the encodings.'' We call this simplest of all possible collations a binary collation.

But what if we want to say that the lowercase and uppercase letters are equivalent? Then we would have at least two rules: (1) treat the lowercase letters `a' and `b' as equivalent to `A' and `B'; (2) then compare the encodings. We call this a case-insensitive collation. It's a little more complex than a binary collation.

In real life, most character sets have many characters: not just `A' and `B' but whole alphabets, sometimes multiple alphabets or eastern writing systems with thousands of characters, along with many special symbols and punctuation marks. Also in real life, most collations have many rules: not just case insensitivity but also accent insensitivity (an ``accent'' is a mark attached to a character as in German `Ö') and multiple-character mappings (such as the rule that `Ö' = `OE' in one of the two German collations).

MySQL 4.1 can do these things for you:

・ Store strings using a variety of character sets

・ Compare strings using a variety of collations

・ Mix strings with different character sets or collations in the same server, the same database, or even the same table

・ Allow specification of character set and collation at any level

In these respects, not only is MySQL 4.1 far more flexible than MySQL 4.0, it also is far ahead of other DBMSs. However, to use the new features effectively, you will need to learn what character sets and collations are available, how to change their defaults, and what the various string operators do with them.

Character Sets and Collations in MySQL

The MySQL server can support multiple character sets. To list the available character sets, use the SHOW CHARACTER SET statement:

mysql> SHOW CHARACTER SET;

+----------+-----------------------------+---------------------+

| Charset | Description | Default collation |

+----------+-----------------------------+---------------------+

| big5 | Big5 Traditional Chinese | big5_chinese_ci |

| dec8 | DEC West European | dec8_swedish_ci |

| cp850 | DOS West European | cp850_general_ci |

| hp8 | HP West European | hp8_english_ci |

| koi8r | KOI8-R Relcom Russian | koi8r_general_ci |

| latin1 | ISO 8859-1 West European | latin1_swedish_ci |

| latin2 | ISO 8859-2 Central European | latin2_general_ci |

...

The output actually includes another column that is not shown so that the example fits better on the page.

Any given character set always has at least one collation. It may have several collations.

To list the collations for a character set, use the SHOW COLLATION statement. For example, to see the collations for the latin1 (``ISO-8859-1 West European'') character set, use this statement to find those collation names that begin with latin1:

mysql> SHOW COLLATION LIKE 'latin1%';

+-------------------+---------+----+---------+----------+---------+

| Collation | Charset | Id | Default | Compiled | Sortlen |

+-------------------+---------+----+---------+----------+---------+

| latin1_german1_ci | latin1 | 5 | | | 0 |

| latin1_swedish_ci | latin1 | 8 | Yes | Yes | 1 |

| latin1_danish_ci | latin1 | 15 | | | 0 |

| latin1_german2_ci | latin1 | 31 | | Yes | 2 |

| latin1_bin | latin1 | 47 | | Yes | 1 |

| latin1_general_ci | latin1 | 48 | | | 0 |

| latin1_general_cs | latin1 | 49 | | | 0 |

| latin1_spanish_ci | latin1 | 94 | | | 0 |

+-------------------+---------+----+---------+----------+---------+

The latin1 collations have the following meanings:

Collation Meaning

latin1_bin Binary according to latin1 encoding

latin1_danish_ci Danish/Norwegian

latin1_general_ci Multilingual

latin1_general_cs Multilingual, case sensitive

latin1_german1_ci German DIN-1

latin1_german2_ci German DIN-2

latin1_spanish_ci Modern Spanish

latin1_swedish_ci Swedish/Finnish

Collations have these general characteristics:

・ Two different character sets cannot have the same collation.

・ Each character set has one collation that is the default collation. For example, the default collation for latin1 is latin1_swedish_ci.

・ There is a convention for collation names: They start with the name of the character set with which they are associated, they usually include a language name, and they end with _ci (case insensitive), _cs (case sensitive), or _bin (binary).

Previous / Next / Up / Table of Contents

11.3 Determining the Default Character Set and Collation

There are default settings for character sets and collations at four levels: server, database, table, and connection. The following description may appear complex, but it has been found in practice that multiple-level defaulting leads to natural and obvious results.

Subsections

・ 11.3.1 Server Character Set and Collation

・ 11.3.2 Database Character Set and Collation

・ 11.3.3 Table Character Set and Collation

・ 11.3.4 Column Character Set and Collation

・ 11.3.5 Examples of Character Set and Collation Assignment

・ 11.3.6 Connection Character Sets and Collations

・ 11.3.7 Character String Literal Character Set and Collation

・ 11.3.8 Using COLLATE in SQL Statements

・ 11.3.9 COLLATE Clause Precedence

・ 11.3.10 BINARY Operator

・ 11.3.11 Some Special Cases Where the Collation Determination Is Tricky

・ 11.3.12 Collations Must Be for the Right Character

11.3.1 Server Character Set and Collation

The MySQL Server has a server character set and a server collation, which may not be null.

MySQL determines the server character set and server collation thus:

・ According to the option settings in effect when the server starts

・ According to the values set at runtime

At the server level, the decision is simple. The server character set and collation depend initially on the options that you use when you start mysqld. You can usec for the character set, and along with it you can add --default-collation for the collation. If you don't specify a character set, that is the same as saying --default-character-set=latin1. If you specify only a character set (for example, latin1) but not a collation, that is the same as saying --default-charset=latin1 --default-collation=latin1_swedish_ci because latin1_swedish_ci is the default collation for latin1. Therefore, the following three commands all have the same effect:

shell> mysqld

shell> mysqld-nt --default-character-set=utf8

shell> mysqld --default-character-set=latin1 \

--default-collation=latin1_swedish_ci

collation_connection utf8_general_ci

One way to change the settings is by recompiling. If you want to change the default server character set and collation when building from sources, use: --with-charset and --with-collation as arguments for configure. For example:

shell> ./configure --with-charset=latin1

Or:

shell> ./configure --with-charset=latin1 \

--with-collation=latin1_german1_ci

Both mysqld and configure verify that the character set/collation combination is valid. If not, each program displays an error message and terminates.

The current server character set and collation are available as the values of the character_set_server and collation_server system variables. These variables can be changed at runtime.

Database Character Set and Collation

Every database has a database character set and a database collation, which may not be null. The CREATE DATABASE and ALTER DATABASE statements have optional clauses for specifying the database character set and collation:

CREATE DATABASE db_name

[[DEFAULT] CHARACTER SET charset_name]

[[DEFAULT] COLLATE collation_name]

ALTER DATABASE db_name

[[DEFAULT] CHARACTER SET charset_name]

[[DEFAULT] COLLATE collation_name]

Example:

CREATE DATABASE db_name

DEFAULT CHARACTER SET latin1 COLLATE latin1_swedish_ci;

MySQL chooses the database character set and database collation thus:

・ If both CHARACTER SET X and COLLATE Y were specified, then character set X and collation Y.

・ If CHARACTER SET X was specified without COLLATE, then character set X and its default collation.

・ Otherwise, the server character set and server collation.

MySQL's CREATE DATABASE ... DEFAULT CHARACTER SET ... syntax is analogous to the standard SQL CREATE SCHEMA ... CHARACTER SET ... syntax. Because of this, it is possible to create databases with different character sets and collations on the same MySQL server.

The database character set and collation are used as default values if the table character set and collation are not specified in CREATE TABLE statements. They have no other purpose.

The character set and collation for the default database are available as the values of the character_set_database and collation_database system variables. The server sets these variables whenever the default database changes. If there is no default database, the variables have the same value as the corresponding server-level variables, character_set_server and collation_server.

Table Character Set and Collation

Every table has a table character set and a table collation, which may not be null. The CREATE TABLE and ALTER TABLE statements have optional clauses for specifying the table character set and collation:

CREATE TABLE tbl_name (column_list)

[DEFAULT CHARACTER SET charset_name [COLLATE collation_name]]

ALTER TABLE tbl_name

[DEFAULT CHARACTER SET charset_name] [COLLATE collation_name]

Example:

CREATE TABLE t1 ( ... )

DEFAULT CHARACTER SET latin1 COLLATE latin1_danish_ci;

MySQL chooses the table character set and collation thus:

・ If both CHARACTER SET X and COLLATE Y were specified, then character set X and collation Y.

・ If CHARACTER SET X was specified without COLLATE, then character set X and its default collation.

・ Otherwise, the database character set and collation.

The table character set and collation are used as default values if the column character set and collation are not specified in individual column definitions. The table character set and collation are MySQL extensions; there are no such things in standard SQL.

Column Character Set and Collation

EEvery ``character'' column (that is, a column of type CHAR, VARCHAR, or TEXT) has a column character set and a column collation, which may not be null. Column definition syntax has optional clauses for specifying the column character set and collation:

col_name {CHAR | VARCHAR | TEXT} (col_length)

[CHARACTER SET charset_name [COLLATE collation_name]]

Example:

CREATE TABLE Table1

(

column1 VARCHAR(5) CHARACTER SET latin1 COLLATE latin1_german1_ci

);

MySQL chooses the column character set and collation thus:

・ If both CHARACTER SET X and COLLATE Y were specified, then character set X and collation Y.

・ If CHARACTER SET X was specified without COLLATE, then character set X and its default collation.

・ Otherwise, the table character set and collation.

The CHARACTER SET and COLLATE clauses are standard SQL.

A tip:

ALTER TABLE

-------------------

ALTER TABLE table_x modify column column_x ortography varchar(255) CHARACTER SET latin1 COLLATE latin1_general_ci NOT NULL;

Examples of Character Set and Collation Assignment

The following examples show how MySQL determines default character set and collation values.

Example 1: Table + Column Definition

CREATE TABLE t1

(

c1 CHAR(10) CHARACTER SET latin1 COLLATE latin1_german1_ci

) DEFAULT CHARACTER SET latin2 COLLATE latin2_bin;

Here we have a column with a latin1 character set and a latin1_german1_ci collation. The definition is explicit, so that's straightforward. Notice that there's no problem storing a latin1 column in a latin2 table.

Example 2: Table + Column Definition

CREATE TABLE t1

(

c1 CHAR(10) CHARACTER SET latin1

) DEFAULT CHARACTER SET latin1 COLLATE latin1_danish_ci;

This time we have a column with a latin1 character set and a default collation. Now, although it might seem natural, the default collation is not taken from the table level. Instead, because the default collation for latin1 is always latin1_swedish_ci, column c1 will have a collation of latin1_swedish_ci (not latin1_danish_ci).

Example 3: Table + Column Definition

CREATE TABLE t1

(

c1 CHAR(10)

) DEFAULT CHARACTER SET latin1 COLLATE latin1_danish_ci;

We have a column with a default character set and a default collation. In this circumstance, MySQL looks up to the table level for inspiration in determining the column character set and collation. So, the character set for column c1 is latin1 and its collation is latin1_danish_ci.

Example 4: Database + Table + Column Definition

CREATE DATABASE d1

DEFAULT CHARACTER SET latin2 COLLATE latin2_czech_ci;

USE d1;

CREATE TABLE t1

(

c1 CHAR(10)

);

We create a column without specifying its character set and collation. We're also not specifying a character set and a collation at the table level. In this circumstance, MySQL looks up to the database level for inspiration. (The database's settings become the table's settings, and thereafter become the column's setting.) So, the character set for column c1 is latin2 and its collation is latin2_czech_ci.

Connection Character Sets and Collations

Several character set and collation system variables relate to a client's interaction with the server. Some of these have already been mentioned in earlier sections:

・ The server character set and collation are available as the values of the character_set_server and collation_server variables.

・ The character set and collation of the default database are available as the values of the character_set_database and collation_database variables.

Additional character set and collation variables are involved in handling traffic for the connection between a client and the server. Every client has connection-related character set and collation variables.

Consider what a ``connection'' is: It's what you make when you connect to the server. The client sends SQL statements, such as queries, over the connection to the server. The server sends responses, such as result sets, over the connection back to the client. This leads to several questions about character set and collation handling for client connections, each of which can be answered in terms of system variables:

・ What character set is the query in when it leaves the client? The server takes the character_set_client variable to be the character set in which queries are sent by the client.

・ What character set should the server translate a query to after receiving it? For this, character_set_connection and collation_connection are used by the server. It converts queries sent by the client from character_set_client to character_set_connection (except for string literals that have an introducer such as _latin1 or _utf8). collation_connection is important for comparisons of literal strings. For comparisons of strings with column values, it does not matter because columns have a higher collation precedence.

・ What character set should the server translate to before shipping result sets or error messages back to the client? The character_set_results variable indicates the character set in which the server returns query results to the client. This includes result data such as column values, and result metadata such as column names.

You can fine-tune the settings for these variables, or you can depend on the defaults (in which case, you can skip this section).

There are two statements that affect the connection character sets:

SET NAMES 'charset_name'

SET CHARACTER SET charset_name

SET NAMES indicates what is in the SQL statements that the client sends. Thus, SET NAMES 'cp1251' tells the server ``future incoming messages from this client will be in character set cp1251.'' It also specifies the character set for results that the server sends back to the client. (For example, it indicates what character set column values will have if you use a SELECT statement.)

A SET NAMES 'x' statement is equivalent to these three statements:

mysql> SET character_set_client = x;

mysql> SET character_set_results = x;

mysql> SET character_set_connection = x;

Setting character_set_connection to x also sets collation_connection to the default collation for x.

SET CHARACTER SET is similar but sets the connection character set and collation to be those of the default database. A SET CHARACTER SET x statement is equivalent to these three statements:

mysql> SET character_set_client = x;

mysql> SET character_set_results = x;

mysql> SET collation_connection = @@collation_database;

When a client connects, it sends to the server the name of the character set that it wants to use. The server sets the character_set_client, character_set_results, and character_set_connection variables to that character set. (In effect, the server performs a SET NAMES operation using the character set.)

With the mysql client, it is not necessary to execute SET NAMES every time you start up if you want to use a character set different from the default. You can add the --default-character-set option setting to your mysql statement line, or in your option file. For example, the following option file setting changes the three character set variables set to koi8r each time you run mysql:

[mysql]

default-character-set=koi8r

Example: Suppose that column1 is defined as CHAR(5) CHARACTER SET latin2. If you do not say SET NAMES or SET CHARACTER SET, then for SELECT column1 FROM t, the server will send back all the values for column1 using the character set that the client specified when it connected. On the other hand, if you say SET NAMES 'latin1' or SET CHARACTER SET latin1, then just before sending results back, the server will convert the latin2 values to latin1. Conversion may be lossy if there are characters that are not in both character sets.

If you do not want the server to perform any conversion, set character_set_results to NULL:

mysql> SET character_set_results = NULL;

--------------------------------------------

常用命令:

mysqld-nt.exe --default-character-set=utf8

SHOW COLLATION LIKE 'utf8%';

show variavles;

anonymous
2004-09-26 16:51
对于jsp页我们也要统一指定,

<%@ page language="java" contentType="text/html; charset=UTF-8" %>

对于每个请求我们也要进行统一过滤,可以参考jdon的相应文章。

java的编码方式在jbuider中也应用utf-8。

不知大家还有什么问题,欢迎提出来。我看了很多国外的文章,里面很少有为我们中国人本地化的,可能咱们的技术还不行吧。起码,在国外论坛中不是很活跃,所以也引不起多大风波了。希望大家多到国外去看看。很多事情,技术,到了国内就变的很失真了,都道听途说,不爱看英文文档,老是在国内搜索,浪费时间。

anonymous
2004-09-26 18:03
我的qq:27222328欢迎交流。

猜你喜欢
2Go 上一页 1 2