About character encodings

An encoding maps each character in a character set to a numeric value that a computer can represent. Each value is stored as a single byte or as multiple bytes. For example, the ASCII encoding uses seven bits to represent the Latin alphabet, punctuation, and control characters.
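As a small illustration of this mapping, the following sketch looks up the numeric value of a few characters and checks that ASCII values fit in seven bits. Python is used here only for illustration; it is not part of ColdFusion, and the sample strings are arbitrary.

```python
# Map a few characters to their numeric values.
for ch in ["A", "a", "~"]:
    value = ord(ch)               # numeric value assigned to the character
    encoded = ch.encode("ascii")  # the single byte that ASCII uses for it
    print(ch, value, format(value, "07b"), encoded)

# Every ASCII value fits in seven bits, so it is always below 128.
ascii_values = [ord(ch) for ch in "Hello, world!"]
all_seven_bit = all(v < 128 for v in ascii_values)
```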

You use Japanese encodings, such as Shift-JIS, EUC-JP, and ISO-2022-JP, to represent Japanese text. These encodings can vary slightly, but they include a common set of approximately 10,000 characters used in Japanese.
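To see how these encodings differ in practice, the sketch below encodes the same short Japanese string with each of them. The codec names (`shift_jis`, `euc_jp`, `iso2022_jp`) are the names Python uses for these encodings, and the sample text is an arbitrary choice.

```python
text = "日本語"  # three Japanese characters

# Encode the same text with each of the three Japanese encodings.
encoded = {codec: text.encode(codec)
           for codec in ["shift_jis", "euc_jp", "iso2022_jp"]}

for codec, data in encoded.items():
    # The byte sequences differ, but each decodes back to the same text.
    print(codec, len(data), data.hex())
    assert data.decode(codec) == text
```

The characters are the same in every case; only the byte values that represent them change.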

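The following terms apply to character encodings:

SBCS: Single-byte character set; a character set encoded in one byte per character.
DBCS: Double-byte character set; a character set encoded in no more than two bytes per character.
MBCS: Multibyte character set; a character set encoded in a variable number of bytes per character.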

The following table lists some common character encodings; however, browsers and web servers support many additional character encodings:
Encoding    Type    Description
ASCII       SBCS    7-bit encoding used by English and Indonesian Bahasa languages
Latin-1     SBCS    8-bit encoding used by many Western European languages
Shift-JIS   DBCS    16-bit Japanese encoding
EUC-KR      DBCS    16-bit Korean encoding
UCS-2       DBCS    Two-byte Unicode encoding
UTF-8       MBCS    ASCII is 7-bit, European characters with diacriticals are two-byte, and Asian characters are three-byte
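The byte widths listed in the table can be checked with a short sketch. The helper `byte_length` is a hypothetical name introduced for this example. Python has no codec named UCS-2, so the sketch assumes `utf-16-be` as a stand-in, which produces the same two bytes for characters in the Basic Multilingual Plane.

```python
samples = {
    "ASCII letter": "A",
    "European letter with diacritic": "é",
    "Japanese character": "日",
}

def byte_length(ch, codec):
    """Return how many bytes codec uses for ch, or None if it cannot encode it."""
    try:
        return len(ch.encode(codec))
    except UnicodeEncodeError:
        return None

# utf-16-be stands in for UCS-2 (an assumption of this sketch).
codecs_to_try = ["ascii", "latin-1", "shift_jis", "euc_kr", "utf-16-be", "utf-8"]

for label, ch in samples.items():
    widths = {codec: byte_length(ch, codec) for codec in codecs_to_try}
    print(label, widths)
```

A result of `None` means that the encoding has no representation for that character, which is why single-byte and language-specific encodings cannot carry arbitrary text.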

The World Wide Web Consortium maintains a list of all character encodings supported by the Internet. You can find this information at the following URL:

http://www.w3.org/International/O-charset.html

The Unicode character encoding

ColdFusion uses the Java Unicode Standard for representing character data internally. The Unicode character encoding can represent the characters of most major languages, including those covered by ASCII, Latin-1, Shift-JIS, and other encodings. Therefore, ColdFusion can input, store, process, and output text from all languages supported by Unicode.
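The general idea of using Unicode as the internal representation can be sketched as follows: decode each input from its own encoding, work with the text as Unicode, and choose an encoding only on output. This is an illustration in Python, not ColdFusion's actual implementation, and the input values are made up for the example.

```python
# Bytes as they might arrive from two different sources (illustrative values).
latin1_bytes = "café".encode("latin-1")
sjis_bytes = "日本語".encode("shift_jis")

# Decode each input into Unicode text; after this step the original
# encoding no longer matters.
combined = latin1_bytes.decode("latin-1") + " / " + sjis_bytes.decode("shift_jis")

# Process the text as Unicode, then pick an encoding only when writing output.
output = combined.upper().encode("utf-8")
```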

By default, ColdFusion uses UTF-8 to represent text data sent to a browser. UTF-8 converts characters into a variable-length encoding: ASCII characters are sent as a single byte, European characters with diacritics as two bytes, and characters from most Asian languages as three bytes. One advantage of UTF-8 is that it can be recognized by systems designed to process single-byte ASCII characters while being flexible enough to handle multiple-byte character representations.
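The ASCII compatibility described above can be demonstrated with a brief sketch. The sample strings are arbitrary. The sketch relies on the fact that, in UTF-8, every byte of a multi-byte character has its high bit set, so those bytes can never be mistaken for ASCII.

```python
ascii_text = "Hello"
mixed_text = "Hello, 世界"

# Pure ASCII text produces identical bytes in ASCII and in UTF-8.
same_bytes = ascii_text.encode("utf-8") == ascii_text.encode("ascii")

# In mixed text, ASCII characters keep their single-byte values (below 0x80);
# every byte of a multi-byte character is 0x80 or above.
encoded = mixed_text.encode("utf-8")
ascii_part = bytes(b for b in encoded if b < 0x80)
multibyte_part = bytes(b for b in encoded if b >= 0x80)

print(same_bytes, len(encoded), ascii_part, len(multibyte_part))
```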

While the default format of text data output by ColdFusion is UTF-8, you can set the output type of a ColdFusion page to any character set. For example, you can output text using the Japanese language Shift-JIS character set. For more information, see "Determining the character set of server output".
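Conceptually, changing the output character set means encoding the page body with that character set and declaring it in the response's Content-Type header. The sketch below shows that idea in Python. `build_response` is a hypothetical helper written for this example; it is not a ColdFusion function and does not show how ColdFusion itself is configured.

```python
def build_response(text, charset="utf-8"):
    """Return (headers, body) for a page encoded in the given charset.

    Hypothetical helper for illustration; not a ColdFusion function.
    """
    body = text.encode(charset)
    headers = {
        # The declared charset must match the encoding used for the body.
        "Content-Type": f"text/html; charset={charset}",
        "Content-Length": str(len(body)),
    }
    return headers, body

page = "<p>こんにちは</p>"
default_headers, default_body = build_response(page)
sjis_headers, sjis_body = build_response(page, charset="Shift_JIS")
```

The same page text yields a different byte sequence and length for each character set, so the declared charset and the actual encoding have to agree for the browser to display the text correctly.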
