Processing a request in ColdFusion

When ColdFusion receives an HTTP request for a ColdFusion page, ColdFusion resolves the request URL to a physical file and reads its contents to parse it. A ColdFusion page can be encoded in a variety of ways, using different character sets and formats.

The following figure shows an example of a client making a request to ColdFusion:

[Figure: Client making a request to ColdFusion]

The content of the ColdFusion page on the server can include static data (typically HTML and plain text not processed by ColdFusion) and dynamic content written in CFML. ColdFusion writes static content directly to the response sent to the browser and processes the dynamic content.

The default language of a website might be different from that of the person connecting to it. For example, you could connect to an English website from a French computer. When ColdFusion generates a response, the response must be formatted in the way the client expects. This includes both the character set of the response and the locale.

This section describes how ColdFusion determines the character set of the files that it processes, and how it determines the character set and locale of its response to the client.

Determining the character set of a ColdFusion page

When a request for a ColdFusion page occurs, the ColdFusion server opens the page, processes the static (HTML) content, processes the dynamic content (CFML), and returns the results back to the browser of the requestor. In order to process the ColdFusion page, though, ColdFusion has to interpret the page content.

One piece of information used by ColdFusion is the Byte Order Mark (BOM) in a ColdFusion page. The BOM is a special character at the beginning of a text stream that specifies the byte order (big or little endian) used by the page. The following table lists the common BOM values:

Encoding                 BOM signature
UTF-8                    EF BB BF
UTF-16 Big Endian        FE FF
UTF-16 Little Endian     FF FE

To insert a BOM in a file, your editor must support BOMs. Many IDEs support inserting this character, including Macromedia Dreamweaver MX; however, ColdFusion Studio does not.
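
To check whether an editor actually wrote a BOM, you can inspect the first bytes of the file. The following is a minimal sketch, not a definitive implementation: the file name example.cfm is hypothetical, and the code assumes that arrayLen and array indexing work on the binary object returned by cffile, which might not hold on every ColdFusion release.

```cfml
<!--- Sketch: report which BOM, if any, a file starts with.
      "example.cfm" is a hypothetical file name. --->
<cffile action="readBinary"
        file="#expandPath('./example.cfm')#"
        variable="pageBytes">

<cfset bomName = "none">
<!--- Assumption: arrayLen accepts the binary object. --->
<cfset byteCount = arrayLen(pageBytes)>

<!--- Java bytes are signed, so mask each one with 255 before comparing. --->
<cfif byteCount gte 3
      and bitAnd(pageBytes[1], 255) eq 239
      and bitAnd(pageBytes[2], 255) eq 187
      and bitAnd(pageBytes[3], 255) eq 191>
  <!--- EF BB BF --->
  <cfset bomName = "UTF-8">
<cfelseif byteCount gte 2
      and bitAnd(pageBytes[1], 255) eq 254
      and bitAnd(pageBytes[2], 255) eq 255>
  <!--- FE FF --->
  <cfset bomName = "UTF-16 Big Endian">
<cfelseif byteCount gte 2
      and bitAnd(pageBytes[1], 255) eq 255
      and bitAnd(pageBytes[2], 255) eq 254>
  <!--- FF FE --->
  <cfset bomName = "UTF-16 Little Endian">
</cfif>

<cfoutput>BOM detected: #bomName#</cfoutput>
```

The length checks come first in each condition so that very short files are never indexed past their end.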

If your file does not contain a BOM, or if your IDE does not let you set one, you can use the cfprocessingdirective tag to set the character encoding of the page. However, if you insert the cfprocessingdirective tag on a page that has a BOM, the information specified by the cfprocessingdirective tag must be the same as for the BOM; otherwise ColdFusion issues an error.

The following procedure describes how ColdFusion recognizes the encoding format of a ColdFusion page.

ColdFusion determines the page encoding as follows:

  1. Use the BOM if specified.

    Macromedia recommends that you use BOMs in your files.

  2. Default to the JVM system encoding.

    Typically, the JVM uses the same encoding as the operating system but you can override it.

  3. Use the pageEncoding attribute of the cfprocessingdirective tag if specified.

    If ColdFusion detects a BOM in the file, it throws an error if the cfprocessingdirective tag specifies an encoding different from the one that the BOM indicates.

    If there are multiple occurrences of the cfprocessingdirective tag in the same ColdFusion page, the pageEncoding attribute must specify the same setting or else ColdFusion throws an error.

    If you use the cfprocessingdirective tag, insert it as close to the top of the page as possible; for example, immediately after any cfsetting or cfsilent tag, but before any other logic.

    The cfprocessingdirective tag specifies information to the ColdFusion compiler and is evaluated when ColdFusion compiles the page, not when it executes the page. Therefore, you cannot embed the cfprocessingdirective tag within conditional logic. For example, the following code will not have any effect at execution time since the cfprocessingdirective tag will already have been evaluated:

    <cfif dynEncoding is not "dynamic encoding is not possible">
      <cfprocessingdirective pageencoding=#dynEncoding# />
    </cfif> 
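
By contrast, a placement that does take effect uses a literal encoding name near the top of the page, outside any conditional logic. The following is a minimal sketch; the cfsetting line and the shift_jis value are illustrative assumptions, and the encoding must match the one in which the file is actually saved.

```cfml
<cfsetting enablecfoutputonly="no">
<!--- Literal value: read when ColdFusion compiles the page. --->
<cfprocessingdirective pageEncoding="shift_jis">
<!--- Remaining page logic follows here. --->
```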
    

Determining the character set of server output

As part of servicing an HTTP request, ColdFusion must determine the character set of the data returned in the HTTP response. By default, ColdFusion returns character data using the Unicode UTF-8 format.

However, within a ColdFusion page you can override the default character encoding of the response using the cfcontent tag. Use the type attribute of cfcontent to specify the MIME type of the page output, including the character set, as follows:

<cfcontent type="text/html; charset=EUC-JP">

ColdFusion pages (meaning .cfm pages) default to using the Unicode UTF-8 format for the response even if you include the HTML meta tag in the page. Therefore, the following code will not modify the character set of the response:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
           "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>Untitled Document</title>
<meta http-equiv="Content-Type"
content="text/html; charset=Shift-JIS">
</head>
...

In this example, the response will still use the UTF-8 character set. Use the cfcontent tag to set the output character set.
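
A version of the page that does change the response character set could look like the following. This is a sketch that assumes the Shift_JIS charset name is available to the JVM; the meta tag is kept only so that the markup agrees with the HTTP header that cfcontent sets.

```cfml
<!--- Sets the Content-Type header, including the character set. --->
<cfcontent type="text/html; charset=Shift_JIS">
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
           "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>Untitled Document</title>
<meta http-equiv="Content-Type" content="text/html; charset=Shift_JIS">
</head>
<body>
</body>
</html>
```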
