An encoding maps each character in a character set to a numeric value that a computer can represent. These numbers can be represented by a single byte or by multiple bytes. For example, the ASCII encoding uses seven bits to represent the Latin alphabet, punctuation, and control characters.
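As a minimal illustration of characters mapping to numbers and then to bytes, the Python sketch below uses the built-in `ord()` and `str.encode()`. The helper `describe` is a hypothetical name introduced here, and Python is used only for illustration; this is not ColdFusion code.

```python
def describe(ch: str, encoding: str) -> tuple[int, bytes]:
    """Return the character's numeric value (Unicode code point) and its encoded bytes."""
    return ord(ch), ch.encode(encoding)


# ASCII: each character is a 7-bit value (0-127) stored in a single byte.
for ch in "Hi!":
    code, data = describe(ch, "ascii")
    print(ch, code, data)

# A character outside ASCII needs two bytes in a multibyte encoding such as UTF-8.
print(describe("é", "utf-8"))  # (233, b'\xc3\xa9')

# The same character has no ASCII value at all, so encoding it fails.
try:
    "é".encode("ascii")
except UnicodeEncodeError as err:
    print("not representable in ASCII:", err.reason)
```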
You use Japanese encodings, such as Shift-JIS, EUC-JP, and ISO-2022-JP, to represent Japanese text. These encodings can vary slightly, but they include a common set of approximately 10,000 characters used in Japanese.
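The sketch below encodes the same three-character string with Python's built-in `shift_jis`, `euc_jp`, and `iso2022_jp` codecs. It shows that the three encodings produce different bytes for identical text, so a program must know which encoding was used before it can decode the data correctly.

```python
text = "日本語"  # "Japanese language"

for name in ("shift_jis", "euc_jp", "iso2022_jp"):
    data = text.encode(name)
    # Same characters, different byte sequences (ISO-2022-JP also adds escape sequences).
    print(f"{name:10} {len(data):2} bytes  {data.hex(' ')}")
    # Decoding with the matching encoding recovers the original text.
    assert data.decode(name) == text
```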
The following terms apply to character encodings:
The following table lists some common character encodings; however, browsers and web servers support many additional character encodings:
The World Wide Web Consortium maintains a list of character encodings used on the Internet. You can find this information at the following URL:
http://www.w3.org/International/O-charset.html
ColdFusion uses Java's Unicode support to represent character data internally. Unicode can represent the characters of most major languages, including the characters in the ASCII, Latin-1, and Shift-JIS character sets. Therefore, ColdFusion can input, store, process, and output text in any language that Unicode supports.
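The following sketch mimics the decode, process, and encode flow described above. Python `str` objects stand in for ColdFusion's internal Java strings, an assumption made only for illustration: input arriving in three different encodings is decoded into one Unicode string, processed as ordinary text, and written out in a single encoding.

```python
# Input bytes arriving in three different encodings (hypothetical sample data).
inputs = [
    (b"Hello", "ascii"),
    (b"caf\xe9", "latin-1"),
    ("日本語".encode("shift_jis"), "shift_jis"),
]

# Decode everything into one internal Unicode representation.
words = [data.decode(enc) for data, enc in inputs]

# Process the text as ordinary strings, regardless of the original encoding.
combined = " ".join(words)
print(combined)

# Output in a single encoding that can represent all of the characters.
out = combined.encode("utf-8")
```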
By default, ColdFusion uses UTF-8 to encode text data sent to a browser. UTF-8 is a variable-length encoding that represents each character as one to four bytes. ASCII characters take a single byte; accented Latin, Greek, Cyrillic, Hebrew, and Arabic characters take two bytes; and most Chinese, Japanese, and Korean characters take three bytes. One advantage of UTF-8 is that ASCII text has exactly the same bytes in UTF-8 as in ASCII, so systems designed to process single-byte ASCII characters can handle it, while UTF-8 remains flexible enough to represent multibyte characters.
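The variable-length behavior of UTF-8 can be checked directly with Python's built-in `utf-8` codec. This sketch prints how many bytes each sample character needs and confirms the ASCII compatibility property: ASCII text encodes to identical bytes, and every byte of a multibyte sequence has its high bit set, so it cannot be mistaken for an ASCII character.

```python
samples = {
    "A": "ASCII letter",
    "é": "accented Latin letter",
    "Ж": "Cyrillic letter",
    "日": "CJK ideograph",
    "😀": "emoji (beyond U+FFFF)",
}

for ch, label in samples.items():
    data = ch.encode("utf-8")
    print(f"U+{ord(ch):04X}  {label:22} {len(data)} byte(s)  {data.hex(' ')}")

# ASCII text has identical bytes in ASCII and in UTF-8.
assert "Hello".encode("utf-8") == "Hello".encode("ascii")

# Bytes of multibyte sequences are all >= 0x80, outside the 7-bit ASCII range.
assert all(b >= 0x80 for b in "日本語".encode("utf-8"))
```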
Although ColdFusion outputs text data as UTF-8 by default, you can set the output character set of a ColdFusion page to any character set. For example, you can output Japanese text using the Shift-JIS character set. For more information, see "Determining the character set of server output".
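Choosing an output character set involves two steps: encoding the page text in that character set, and telling the browser which character set was used. The Python sketch below shows both with a hypothetical `render` helper; it is not ColdFusion's API or any framework's API, only an illustration of the idea. It also shows why UTF-8 is a safe default: Shift-JIS cannot represent every Unicode character.

```python
def render(body: str, charset: str = "utf-8") -> tuple[dict[str, str], bytes]:
    """Encode `body` in `charset` and declare that charset in the Content-Type header.

    `render` is a hypothetical helper used only for illustration.
    """
    headers = {"Content-Type": f"text/html; charset={charset}"}
    return headers, body.encode(charset)


page = "<p>日本語</p>"

headers, body = render(page)                         # default: UTF-8
sjis_headers, sjis_body = render(page, "Shift_JIS")  # Japanese Shift-JIS output
print(headers["Content-Type"], len(body))            # 3 bytes per kanji
print(sjis_headers["Content-Type"], len(sjis_body))  # 2 bytes per kanji

# Unlike UTF-8, Shift-JIS has no representation for characters such as emoji.
try:
    render("<p>😀</p>", "Shift_JIS")
except UnicodeEncodeError:
    print("emoji is not representable in Shift-JIS")
```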