About character encodings

An encoding maps each character in a character set to a numeric value that can be represented by a computer. These numbers can be represented by a single byte or by multiple bytes. For example, the ASCII encoding uses seven bits to represent the Latin alphabet, punctuation, and control characters.
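As a minimal sketch of this mapping, the following Python snippet lists the numeric value of each character in a short string and checks that the ASCII-encoded bytes fit in seven bits. Python is used here only for illustration; the sample string and variable names are assumptions, not part of ColdFusion.

```python
# Each character maps to a numeric value (its code point).
text = "Hi!"
code_points = [ord(ch) for ch in text]
print(code_points)  # [72, 105, 33]

# Encoding the string as ASCII yields one byte per character.
ascii_bytes = text.encode("ascii")
print(list(ascii_bytes))  # [72, 105, 33]

# ASCII uses seven bits, so every byte value is below 128.
fits_in_seven_bits = all(b < 128 for b in ascii_bytes)
print(fits_in_seven_bits)  # True
```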

You use Japanese encodings, such as Shift-JIS, EUC-JP, and ISO-2022-JP, to represent Japanese text. These encodings can vary slightly, but they include a common set of approximately 10,000 characters used in Japanese.
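To illustrate how these encodings differ, the sketch below encodes the same Japanese text with Python's built-in shift_jis, euc_jp, and iso2022_jp codecs. The sample text is an assumption chosen for illustration. The byte sequences differ, but each one decodes back to the same characters.

```python
# The same three characters, encoded three different ways.
text = "日本語"
codecs_to_try = ("shift_jis", "euc_jp", "iso2022_jp")

encoded = {name: text.encode(name) for name in codecs_to_try}
for name, data in encoded.items():
    # Print the codec name, the byte count, and the bytes in hexadecimal.
    print(name, len(data), data.hex())

# Every encoding represents the same characters, so each decodes
# back to the original text.
round_trips = all(data.decode(name) == text for name, data in encoded.items())
print(round_trips)  # True
```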

The following terms apply to character encodings:

SBCS single-byte character set such as ASCII
DBCS double-byte character set such as Shift-JIS
MBCS multiple-byte character set

Browsers and web servers support many character encodings. The following table lists some common ones:

Encoding	Type	Description
ASCII	SBCS	7-bit encoding used for English and Indonesian (Bahasa Indonesia)
Latin-1	SBCS	8-bit encoding used for many Western European languages
Shift-JIS	DBCS	Japanese encoding that uses one or two bytes per character
EUC-KR	DBCS	Korean encoding that uses one or two bytes per character
UCS-2	DBCS	Two-byte Unicode encoding
UTF-8	MBCS	Variable-length Unicode encoding: ASCII characters use one byte, European characters with diacritical marks use two bytes, and most Asian characters use three bytes
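The byte widths in the table can be checked with a short Python sketch. The sample characters are assumptions chosen for illustration. Python has no codec named UCS-2, so utf-16-be stands in for it here; the two produce the same bytes for characters in the Basic Multilingual Plane.

```python
# One sample character per encoding from the table.
samples = {
    "ascii": "A",
    "latin-1": "\u00e9",   # e with acute accent
    "shift_jis": "日",
    "euc_kr": "한",
    "utf-16-be": "日",     # stand-in for UCS-2 (assumption, see lead-in)
}
widths = {codec: len(ch.encode(codec)) for codec, ch in samples.items()}
print(widths)

# UTF-8 width depends on the character, not on the encoding alone.
utf8_widths = [len(ch.encode("utf-8")) for ch in ("A", "\u00e9", "日")]
print(utf8_widths)  # [1, 2, 3]

# Double-byte encodings still use a single byte for ASCII characters.
sjis_ascii_width = len("A".encode("shift_jis"))
print(sjis_ascii_width)  # 1
```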

The World Wide Web Consortium maintains a list of character encodings used on the Internet. You can find this information at the following URL:

http://www.w3.org/International/O-charset.html

The Unicode character encoding

ColdFusion uses the Java Unicode Standard for representing character data internally. The Unicode Standard character encoding can represent the characters of many major languages, including those covered by ASCII, Latin-1, Shift-JIS, and other encodings. Therefore, ColdFusion can input, store, process, and output text from all languages supported by Unicode.
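The idea of a single internal representation can be sketched in Python, whose strings are also Unicode. This only illustrates the concept and does not show ColdFusion's or Java's internals; the sample byte strings are assumptions. Bytes that arrive in different legacy encodings are decoded into Unicode text, after which they can be combined and processed together.

```python
# Input bytes in two different legacy encodings.
latin1_bytes = b"caf\xe9"                  # "cafe" with an accented e, in Latin-1
sjis_bytes = "日本".encode("shift_jis")    # Japanese text in Shift-JIS

# Decoding converts both inputs to the same internal Unicode form.
western = latin1_bytes.decode("latin-1")
japanese = sjis_bytes.decode("shift_jis")

# Once decoded, text from both sources can share one string.
combined = western + " " + japanese
print(combined)
print([hex(ord(ch)) for ch in combined])   # Unicode code points
```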

By default, ColdFusion uses UTF-8 to represent text data sent to a browser. UTF-8 converts characters into a variable-length encoding. ASCII characters are sent as a single byte, and characters from most other languages are sent as two or three bytes. One advantage of UTF-8 is that systems designed to process single-byte ASCII characters can recognize it, while it remains flexible enough to handle multiple-byte character representations.
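The ASCII compatibility of UTF-8 can be seen in a small Python sketch; the sample strings are assumptions chosen for illustration. Pure ASCII text produces identical bytes in both encodings, and every byte of a multiple-byte UTF-8 sequence has its high bit set, so it cannot be mistaken for an ASCII character.

```python
# ASCII-only text encodes to the same bytes in ASCII and UTF-8.
ascii_text = "Hello"
same_bytes = ascii_text.encode("utf-8") == ascii_text.encode("ascii")
print(same_bytes)  # True

# "A", e with acute accent (U+00E9), and a kanji (U+65E5): 1 + 2 + 3 bytes.
mixed = "A\u00e9\u65e5".encode("utf-8")
print(mixed.hex())  # 41c3a9e697a5

# Only the ASCII character produces a byte below 0x80.
ascii_range_bytes = [b for b in mixed if b < 0x80]
print(ascii_range_bytes)  # [65]
```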

While the default format of text data output by ColdFusion is UTF-8, you can set the output type of a ColdFusion page to any character set. For example, you can output text using the Japanese language Shift-JIS character set. For more information, see "Determining the character set of server output".
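In general terms, choosing an output character set means encoding the page text in that character set and labeling the response so that the browser decodes it the same way. The following Python sketch shows that idea only; it is not ColdFusion's mechanism, and build_response is a hypothetical helper written for this example.

```python
def build_response(body_text, charset="utf-8"):
    """Hypothetical helper: encode a page body and label it with its charset."""
    body = body_text.encode(charset)
    headers = {
        # The charset label tells the browser how to decode the body.
        "Content-Type": f"text/html; charset={charset}",
        "Content-Length": str(len(body)),
    }
    return headers, body

# Output Japanese text as Shift-JIS instead of the UTF-8 default.
headers, body = build_response("こんにちは", charset="shift_jis")
print(headers["Content-Type"])  # text/html; charset=shift_jis
print(len(body))                # 10
```

The encoded bytes and the charset label must agree; if they differ, the browser decodes the bytes with the wrong encoding and displays garbled text.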

Developing ColdFusion MX Applications with CFML
Developing Globalized Applications
