Encoding
When ColdFusion receives an HTTP request for a ColdFusion page, ColdFusion resolves the request URL to a physical file path and reads the file contents to parse it. A ColdFusion page can be encoded in any character encoding supported by the JVM used by ColdFusion, but the encoding must be specified so that ColdFusion can identify it.
The content of a ColdFusion page on the server can include static data (typically HTML and plain text not processed by ColdFusion) and dynamic content written in CFML. Static content is written directly to the response to the browser, and dynamic content is processed by ColdFusion.
The default language of a website might be different from that of the person connecting to it. For example, you could connect to an English website from a computer configured for French. When ColdFusion generates a response, it must format the response in the way the client expects. This includes both the character set of the response and the locale.
The following sections describe how ColdFusion determines the character set of the files that it processes, and how it determines the character set and locale of its response to the client.
Determining the character encoding of a ColdFusion page
When a request for a ColdFusion page occurs, ColdFusion opens the page, processes the content, and returns the results to the requesting browser. To process the page, however, ColdFusion must first determine which character encoding to use to interpret the page content.
One piece of information used by ColdFusion is the Byte Order Mark (BOM) in a ColdFusion page. The BOM is a special character at the beginning of a text stream that specifies the order of bytes in multibyte characters used by the page. The following table lists the common BOM values:
| Encoding | BOM signature |
|---|---|
| UTF-8 | EF BB BF |
| UTF-16 Big Endian | FE FF |
| UTF-16 Little Endian | FF FE |
To insert a BOM character in a CFML page easily, your editor must support BOM characters. Many web page development tools support insertion of these characters, including Dreamweaver, which automatically sets the BOM based on the Page Properties Document Encoding selection.
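To confirm which BOM, if any, a saved page actually starts with, you can compare its first bytes with the signatures in the table. The following is a minimal sketch, not a built-in feature: `detectBom` is a hypothetical helper name, and the sketch assumes that the file is small enough to read into memory in one call.

```cfml
<cfscript>
// Hypothetical helper (not a built-in function): returns the encoding name
// implied by the file's BOM, or an empty string when no BOM is present.
function detectBom(required string path) {
    // Hex-encode the file so that its leading bytes can be compared
    // with the BOM signatures listed in the table.
    var hex = binaryEncode(fileReadBinary(arguments.path), "hex");
    if (left(hex, 6) == "EFBBBF") return "UTF-8";
    if (left(hex, 4) == "FEFF")   return "UTF-16 Big Endian";
    if (left(hex, 4) == "FFFE")   return "UTF-16 Little Endian";
    return "";
}

// Example call: inspect the page that is currently running.
writeOutput(detectBom(getCurrentTemplatePath()));
</cfscript>
```

The sketch checks the three-byte UTF-8 signature before the two-byte UTF-16 signatures. It reports only what the file contains and does not change how ColdFusion reads the page.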
If your page does not contain a BOM, you can use the cfprocessingdirective tag to set the character encoding of the page. If you insert the cfprocessingdirective tag on a page that has a BOM, the information specified by the cfprocessingdirective tag must be the same as for the BOM; otherwise, ColdFusion issues an error.
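As an illustration, a page that has no BOM might declare its encoding as in the following sketch. The variable name and sample text are placeholders, and the example assumes that the file itself is saved as UTF-8.

```cfml
<!--- Declare the encoding in which this file is saved.
      Place the tag near the top of the page. --->
<cfprocessingdirective pageEncoding="utf-8">

<!--- Placeholder content containing non-ASCII characters. --->
<cfset greeting = "こんにちは">
<cfoutput>#greeting#</cfoutput>
```

The value of pageEncoding describes how the file is stored on disk. It does not by itself change the character set of the response sent to the client.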
The following procedure describes how ColdFusion recognizes the encoding format of a ColdFusion page.
Determine the page encoding (performed by ColdFusion)
- Use the BOM, if specified on the page. Adobe recommends that you use BOM characters in your files.
- Use the pageEncoding attribute of the cfprocessingdirective tag, if specified. For detailed information on how to use this attribute, see the cfprocessingdirective tag in the CFML Reference.
- Default to the JVM default file character encoding. By default, this is the operating system default character encoding.
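To see which encoding the last step would fall back to on a particular server, you can read the JVM's `file.encoding` system property from a ColdFusion page. This is a sketch that assumes `file.encoding` is the property that reflects the "JVM default file character encoding" mentioned above; the variable name is a placeholder.

```cfml
<!--- Read the JVM system property that holds the default file encoding.
      Assumption: this is the value used when a page has neither a BOM
      nor a pageEncoding attribute. --->
<cfset jvmDefaultEncoding = createObject("java", "java.lang.System").getProperty("file.encoding")>
<cfoutput>JVM default file encoding: #jvmDefaultEncoding#</cfoutput>
```

Because this value can differ between servers, pages that rely on it can be read differently after deployment, which is one reason to prefer a BOM or an explicit pageEncoding attribute.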
Determining the page encoding of server output
Before ColdFusion can return a response to the client, it must determine the encoding to use for the data in the response. By default, ColdFusion returns character data using the Unicode UTF-8 format.
ColdFusion pages (.cfm pages) default to using the Unicode UTF-8 format for the response, even if the page includes an HTML meta tag that names a different character set. Therefore, the following example does *not* modify the character set of the response:
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
    <html>
    <head>
    <title>Untitled Document</title>
    <meta http-equiv="Content-Type" content="text/html; charset=Shift_JIS">
    </head>
    ...
In this example, the response still uses the UTF-8 character set.
However, within a ColdFusion page you can use the cfcontent tag to override the default character encoding of the response. Use the type attribute of the cfcontent tag to specify the MIME type of the page output, including the character set, as follows:
<cfcontent type="text/html; charset=EUC-JP">
ColdFusion also provides attributes that let you specify the encoding of specific elements, such as HTTP requests, request headers, files, and mail messages. For more information, see Tags and functions for controlling character encoding in Tags and functions for globalizing applications and Handling data in ColdFusion.
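The following sketch shows a few of these per-element settings together. The file path, mail addresses, and variable name are placeholders, and the encodings are chosen only for illustration.

```cfml
<!--- Read a text file that is stored as Shift_JIS (placeholder path). --->
<cffile action="read"
        file="#expandPath('./data/notes.txt')#"
        variable="notes"
        charset="shift_jis">

<!--- Interpret submitted form fields as UTF-8 before reading them. --->
<cfset setEncoding("form", "utf-8")>

<!--- Send a mail message whose body is encoded as UTF-8 (placeholder addresses). --->
<cfmail to="reader@example.com"
        from="noreply@example.com"
        subject="Encoding test"
        charset="utf-8">
Message body with non-ASCII text: café
</cfmail>
```

Each setting applies only to the element it is attached to, so none of them changes the page encoding or the character set of the page's own response.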
The rest of this chapter describes ColdFusion tags and functions that you use for globalization, and discusses specific globalization issues.