mirror of git://gcc.gnu.org/git/gcc.git
				
				
				
			
		
			
				
	
	
		
			688 lines
		
	
	
		
			22 KiB
		
	
	
	
		
			XML
		
	
	
	
			
		
		
	
	
			688 lines
		
	
	
		
			22 KiB
		
	
	
	
		
			XML
		
	
	
	
<section xmlns="http://docbook.org/ns/docbook" version="5.0" 
 | 
						|
	 xml:id="std.localization.facet.codecvt" xreflabel="codecvt">
 | 
						|
<?dbhtml filename="codecvt.html"?>
 | 
						|
 | 
						|
<info><title>codecvt</title>
 | 
						|
  <keywordset>
 | 
						|
    <keyword>ISO C++</keyword>
 | 
						|
    <keyword>codecvt</keyword>
 | 
						|
  </keywordset>
 | 
						|
</info>
 | 
						|
 | 
						|
 | 
						|
 | 
						|
<para>
 | 
						|
The standard class codecvt attempts to address conversions between
 | 
						|
different character encoding schemes. In particular, the standard
 | 
						|
attempts to detail conversions between the implementation-defined wide
 | 
						|
characters (hereafter referred to as <type>wchar_t</type>) and the standard
 | 
						|
type <type>char</type> that is so beloved in classic <quote>C</quote>
 | 
						|
(which can now be referred to as narrow characters.)  This document attempts
 | 
						|
to describe how the GNU libstdc++ implementation deals with the conversion
 | 
						|
between wide and narrow characters, and also presents a framework for dealing
 | 
						|
with the huge number of other encodings that iconv can convert,
 | 
						|
including Unicode and UTF8. Design issues and requirements are
 | 
						|
addressed, and examples of correct usage for both the required
 | 
						|
specializations for wide and narrow characters and the
 | 
						|
implementation-provided extended functionality are given.
 | 
						|
</para>
 | 
						|
 | 
						|
<section xml:id="facet.codecvt.req"><info><title>Requirements</title></info>
 | 
						|
 | 
						|
 | 
						|
<para>
 | 
						|
Around page 425 of the C++ Standard, this charming heading comes into view:
 | 
						|
</para>
 | 
						|
 | 
						|
<blockquote>
 | 
						|
<para>
 | 
						|
22.2.1.5 - Template class codecvt
 | 
						|
</para>
 | 
						|
</blockquote>
 | 
						|
 | 
						|
<para>
 | 
						|
The text around the codecvt definition gives some clues:
 | 
						|
</para>
 | 
						|
 | 
						|
<blockquote>
 | 
						|
<para>
 | 
						|
<emphasis>
 | 
						|
-1- The class <code>codecvt<internT,externT,stateT></code> is for use
 | 
						|
when converting from one codeset to another, such as from wide characters
 | 
						|
to multibyte characters, between wide character encodings such as
 | 
						|
Unicode and EUC.
 | 
						|
</emphasis>
 | 
						|
</para>
 | 
						|
</blockquote>
 | 
						|
 | 
						|
<para>
 | 
						|
Hmm. So, in some unspecified way, Unicode encodings and
 | 
						|
translations between other character sets should be handled by this
 | 
						|
class.
 | 
						|
</para>
 | 
						|
 | 
						|
<blockquote>
 | 
						|
<para>
 | 
						|
<emphasis>
 | 
						|
-2- The <type>stateT</type> argument selects the pair of codesets being mapped between.
 | 
						|
</emphasis>
 | 
						|
</para>
 | 
						|
</blockquote>
 | 
						|
 | 
						|
<para>
 | 
						|
Ah ha! Another clue...
 | 
						|
</para>
 | 
						|
 | 
						|
<blockquote>
 | 
						|
<para>
 | 
						|
<emphasis>
 | 
						|
-3- The instantiations required in the Table 51 (lib.locale.category), namely
 | 
						|
<classname>codecvt<wchar_t,char,mbstate_t></classname> and
 | 
						|
<classname>codecvt<char,char,mbstate_t></classname>, convert the
 | 
						|
implementation-defined native character set.
 | 
						|
<classname>codecvt<char,char,mbstate_t></classname> implements a
 | 
						|
degenerate conversion; it does not convert at all.
 | 
						|
<classname>codecvt<wchar_t,char,mbstate_t></classname> converts between
 | 
						|
the native character sets for tiny and wide characters. Instantiations on
 | 
						|
<type>mbstate_t</type> perform conversion between encodings known to the library
 | 
						|
implementor.  Other encodings can be converted by specializing on a
 | 
						|
user-defined <type>stateT</type> type. The <type>stateT</type> object can
 | 
						|
contain any state that is useful to communicate to or from the specialized
 | 
						|
<function>do_convert</function> member.
 | 
						|
</emphasis>
 | 
						|
</para>
 | 
						|
</blockquote>
 | 
						|
 | 
						|
<para>
 | 
						|
At this point, a couple points become clear:
 | 
						|
</para>
 | 
						|
 | 
						|
<para>
 | 
						|
One: The standard clearly implies that attempts to add non-required
 | 
						|
(yet useful and widely used) conversions need to do so through the
 | 
						|
third template parameter, <type>stateT</type>.</para>
 | 
						|
 | 
						|
<para>
 | 
						|
Two: The required conversions, by specifying <type>mbstate_t</type> as the
 | 
						|
third template parameter, imply an implementation strategy that is mostly
 | 
						|
(or wholly) based on the underlying C library, and the functions
 | 
						|
<function>mcsrtombs</function> and <function>wcsrtombs</function> in
 | 
						|
particular.</para>
 | 
						|
</section>
 | 
						|
 | 
						|
<section xml:id="facet.codecvt.design"><info><title>Design</title></info>
 | 
						|
 | 
						|
 | 
						|
<section xml:id="codecvt.design.wchar_t_size"><info><title><type>wchar_t</type> Size</title></info>
 | 
						|
    
 | 
						|
 | 
						|
    <para>
 | 
						|
      The simple implementation detail of <type>wchar_t</type>'s size seems to
 | 
						|
      repeatedly confound people. Many systems use a two byte,
 | 
						|
      unsigned integral type to represent wide characters, and use an
 | 
						|
      internal encoding of Unicode or UCS2. (See AIX, Microsoft NT,
 | 
						|
      Java, others.) Other systems, use a four byte, unsigned integral
 | 
						|
      type to represent wide characters, and use an internal encoding
 | 
						|
      of UCS4. (GNU/Linux systems using glibc, in particular.) The C
 | 
						|
      programming language (and thus C++) does not specify a specific
 | 
						|
      size for the type <type>wchar_t</type>.
 | 
						|
    </para>
 | 
						|
 | 
						|
    <para>
 | 
						|
      Thus, portable C++ code cannot assume a byte size (or endianness) either.
 | 
						|
    </para>
 | 
						|
  </section>
 | 
						|
 | 
						|
<section xml:id="codecvt.design.unicode"><info><title>Support for Unicode</title></info>
 | 
						|
  
 | 
						|
  <para>
 | 
						|
    Probably the most frequently asked question about code conversion
 | 
						|
    is: "So dudes, what's the deal with Unicode strings?"
 | 
						|
    The dude part is optional, but apparently the usefulness of
 | 
						|
    Unicode strings is pretty widely appreciated. The Unicode character
 | 
						|
    set (and useful encodings like UTF-8, UCS-4, ISO 8859-10,
 | 
						|
    etc etc etc) were not mentioned in the first C++ standard. (The 2011
 | 
						|
    standard added support for string literals with different encodings
 | 
						|
    and some library facilities for converting between encodings, but the
 | 
						|
    notes below have not been updated to reflect that.)
 | 
						|
  </para>
 | 
						|
 | 
						|
  <para>
 | 
						|
    A couple of comments:
 | 
						|
  </para>
 | 
						|
 | 
						|
  <para>
 | 
						|
    The thought that all one needs to convert between two arbitrary
 | 
						|
    codesets is two types and some kind of state argument is
 | 
						|
    unfortunate. In particular, encodings may be stateless. The naming
 | 
						|
    of the third parameter as <type>stateT</type> is unfortunate, as what is
 | 
						|
    really needed is some kind of generalized type that accounts for the
 | 
						|
    issues that abstract encodings will need. The minimum information
 | 
						|
    that is required includes:
 | 
						|
  </para>
 | 
						|
 | 
						|
  <itemizedlist>
 | 
						|
    <listitem>
 | 
						|
      <para>
 | 
						|
	Identifiers for each of the codesets involved in the
 | 
						|
	conversion. For example, using the iconv family of functions
 | 
						|
	from the Single Unix Specification (what used to be called
 | 
						|
	X/Open) hosted on the GNU/Linux operating system allows
 | 
						|
	bi-directional mapping between far more than the following
 | 
						|
	tantalizing possibilities:
 | 
						|
      </para>
 | 
						|
 | 
						|
      <para>
 | 
						|
	(An edited list taken from <code>`iconv --list`</code> on a
 | 
						|
	Red Hat 6.2/Intel system:
 | 
						|
      </para>
 | 
						|
 | 
						|
<blockquote>
 | 
						|
<programlisting>
 | 
						|
8859_1, 8859_9, 10646-1:1993, 10646-1:1993/UCS4, ARABIC, ARABIC7,
 | 
						|
ASCII, EUC-CN, EUC-JP, EUC-KR, EUC-TW, GREEK-CCIcode, GREEK, GREEK7-OLD,
 | 
						|
GREEK7, GREEK8, HEBREW, ISO-8859-1, ISO-8859-2, ISO-8859-3,
 | 
						|
ISO-8859-4, ISO-8859-5, ISO-8859-6, ISO-8859-7, ISO-8859-8,
 | 
						|
ISO-8859-9, ISO-8859-10, ISO-8859-11, ISO-8859-13, ISO-8859-14,
 | 
						|
ISO-8859-15, ISO-10646, ISO-10646/UCS2, ISO-10646/UCS4,
 | 
						|
ISO-10646/UTF-8, ISO-10646/UTF8, SHIFT-JIS, SHIFT_JIS, UCS-2, UCS-4,
 | 
						|
UCS2, UCS4, UNICODE, UNICODEBIG, UNICODELIcodeLE, US-ASCII, US, UTF-8,
 | 
						|
UTF-16, UTF8, UTF16).
 | 
						|
</programlisting>
 | 
						|
</blockquote>
 | 
						|
 | 
						|
<para>
 | 
						|
For iconv-based implementations, string literals for each of the
 | 
						|
encodings (i.e. "UCS-2" and "UTF-8") are necessary,
 | 
						|
although for other,
 | 
						|
non-iconv implementations a table of enumerated values or some other
 | 
						|
mechanism may be required.
 | 
						|
</para>
 | 
						|
</listitem>
 | 
						|
 | 
						|
<listitem><para>
 | 
						|
 Maximum length of the identifying string literal.
 | 
						|
</para></listitem>
 | 
						|
 | 
						|
<listitem><para>
 | 
						|
 Some encodings require explicit endian-ness. As such, some kind
 | 
						|
  of endian marker or other byte-order marker will be necessary. See
 | 
						|
  "Footnotes for C/C++ developers" in Haible for more information on
 | 
						|
  UCS-2/Unicode endian issues. (Summary: big endian seems most likely,
 | 
						|
  however implementations, most notably Microsoft, vary.)
 | 
						|
</para></listitem>
 | 
						|
 | 
						|
<listitem><para>
 | 
						|
 Types representing the conversion state, for conversions involving
 | 
						|
  the machinery in the "C" library, or the conversion descriptor, for
 | 
						|
  conversions using iconv (such as the type iconv_t.)  Note that the
 | 
						|
  conversion descriptor encodes more information than a simple encoding
 | 
						|
  state type.
 | 
						|
</para></listitem>
 | 
						|
 | 
						|
<listitem><para>
 | 
						|
 Conversion descriptors for both directions of encoding. (i.e., both
 | 
						|
  UCS-2 to UTF-8 and UTF-8 to UCS-2.)
 | 
						|
</para></listitem>
 | 
						|
 | 
						|
<listitem><para>
 | 
						|
 Something to indicate if the conversion requested if valid.
 | 
						|
</para></listitem>
 | 
						|
 | 
						|
<listitem><para>
 | 
						|
 Something to represent if the conversion descriptors are valid.
 | 
						|
</para></listitem>
 | 
						|
 | 
						|
<listitem><para>
 | 
						|
 Some way to enforce strict type checking on the internal and
 | 
						|
  external types. As part of this, the size of the internal and
 | 
						|
  external types will need to be known.
 | 
						|
</para></listitem>
 | 
						|
</itemizedlist>
 | 
						|
</section>
 | 
						|
 | 
						|
<section xml:id="codecvt.design.issues"><info><title>Other Issues</title></info>
 | 
						|
  
 | 
						|
<para>
 | 
						|
In addition, multi-threaded and multi-locale environments also impact
 | 
						|
the design and requirements for code conversions. In particular, they
 | 
						|
affect the required specialization
 | 
						|
<classname>codecvt<wchar_t, char, mbstate_t></classname>
 | 
						|
when implemented using standard "C" functions.
 | 
						|
</para>
 | 
						|
 | 
						|
<para>
 | 
						|
Three problems arise, one big, one of medium importance, and one small.
 | 
						|
</para>
 | 
						|
 | 
						|
<para>
 | 
						|
First, the small: <function>mcsrtombs</function> and
 | 
						|
<function>wcsrtombs</function> may not be multithread-safe
 | 
						|
on all systems required by the GNU tools. For GNU/Linux and glibc,
 | 
						|
this is not an issue.
 | 
						|
</para>
 | 
						|
 | 
						|
<para>
 | 
						|
Of medium concern, in the grand scope of things, is that the functions
 | 
						|
used to implement this specialization work on null-terminated
 | 
						|
strings. Buffers, especially file buffers, may not be null-terminated,
 | 
						|
thus giving conversions that end prematurely or are otherwise
 | 
						|
incorrect. Yikes!
 | 
						|
</para>
 | 
						|
 | 
						|
<para>
 | 
						|
The last, and fundamental problem, is the assumption of a global
 | 
						|
locale for all the "C" functions referenced above. For something like
 | 
						|
C++ iostreams (where codecvt is explicitly used) the notion of
 | 
						|
multiple locales is fundamental. In practice, most users may not run
 | 
						|
into this limitation. However, as a quality of implementation issue,
 | 
						|
the GNU C++ library would like to offer a solution that allows
 | 
						|
multiple locales and or simultaneous usage with computationally
 | 
						|
correct results. In short, libstdc++ is trying to offer, as an
 | 
						|
option, a high-quality implementation, damn the additional complexity!
 | 
						|
</para>
 | 
						|
 | 
						|
<para>
 | 
						|
For the required specialization
 | 
						|
<classname>codecvt<wchar_t, char, mbstate_t></classname>,
 | 
						|
conversions are made between the internal character set (always UCS4
 | 
						|
on GNU/Linux) and whatever the currently selected locale for the
 | 
						|
LC_CTYPE category implements.
 | 
						|
</para>
 | 
						|
 | 
						|
</section>
 | 
						|
 | 
						|
</section>
 | 
						|
 | 
						|
<section xml:id="facet.codecvt.impl"><info><title>Implementation</title></info>
 | 
						|
 | 
						|
 | 
						|
<para>
 | 
						|
The two required specializations are implemented as follows:
 | 
						|
</para>
 | 
						|
 | 
						|
<para>
 | 
						|
<code>
 | 
						|
codecvt<char, char, mbstate_t>
 | 
						|
</code>
 | 
						|
</para>
 | 
						|
<para>
 | 
						|
This is a degenerate (i.e., does nothing) specialization. Implementing
 | 
						|
this was a piece of cake.
 | 
						|
</para>
 | 
						|
 | 
						|
<para>
 | 
						|
<code>
 | 
						|
codecvt<char, wchar_t, mbstate_t>
 | 
						|
</code>
 | 
						|
</para>
 | 
						|
 | 
						|
<para>
 | 
						|
This specialization, by specifying all the template parameters, pretty
 | 
						|
much ties the hands of implementors. As such, the implementation is
 | 
						|
straightforward, involving <function>mcsrtombs</function> for the conversions
 | 
						|
between <type>char</type> to <type>wchar_t</type> and
 | 
						|
<function>wcsrtombs</function> for conversions between <type>wchar_t</type>
 | 
						|
and <type>char</type>.
 | 
						|
</para>
 | 
						|
 | 
						|
<para>
 | 
						|
Neither of these two required specializations deals with Unicode
 | 
						|
characters. As such, libstdc++ implements a partial specialization
 | 
						|
of the <type>codecvt</type> class with an iconv wrapper class,
 | 
						|
<classname>encoding_state</classname> as the third template parameter.
 | 
						|
</para>
 | 
						|
 | 
						|
<para>
 | 
						|
This implementation should be standards conformant. First of all, the
 | 
						|
standard explicitly points out that instantiations on the third
 | 
						|
template parameter, <type>stateT</type>, are the proper way to implement
 | 
						|
non-required conversions. Second of all, the standard says (in Chapter
 | 
						|
17) that partial specializations of required classes are A-OK. Third
 | 
						|
of all, the requirements for the <type>stateT</type> type elsewhere in the
 | 
						|
standard (see 21.1.2 traits typedefs) only indicate that this type be copy
 | 
						|
constructible.
 | 
						|
</para>
 | 
						|
 | 
						|
<para>
 | 
						|
As such, the type <type>encoding_state</type> is defined as a non-templatized,
 | 
						|
POD type to be used as the third type of a <type>codecvt</type> instantiation.
 | 
						|
This type is just a wrapper class for iconv, and provides an easy interface
 | 
						|
to iconv functionality.
 | 
						|
</para>
 | 
						|
 | 
						|
<para>
 | 
						|
There are two constructors for <type>encoding_state</type>:
 | 
						|
</para>
 | 
						|
 | 
						|
<para>
 | 
						|
<code>
 | 
						|
encoding_state() : __in_desc(0), __out_desc(0)
 | 
						|
</code>
 | 
						|
</para>
 | 
						|
<para>
 | 
						|
This default constructor sets the internal encoding to some default
 | 
						|
(currently UCS4) and the external encoding to whatever is returned by
 | 
						|
<code>nl_langinfo(CODESET)</code>.
 | 
						|
</para>
 | 
						|
 | 
						|
<para>
 | 
						|
<code>
 | 
						|
encoding_state(const char* __int, const char* __ext)
 | 
						|
</code>
 | 
						|
</para>
 | 
						|
 | 
						|
<para>
 | 
						|
This constructor takes as parameters string literals that indicate the
 | 
						|
desired internal and external encoding. There are no defaults for
 | 
						|
either argument.
 | 
						|
</para>
 | 
						|
 | 
						|
<para>
 | 
						|
One of the issues with iconv is that the string literals identifying
 | 
						|
conversions are not standardized. Because of this, the thought of
 | 
						|
mandating and/or enforcing some set of pre-determined valid
 | 
						|
identifiers seems iffy: thus, a more practical (and non-migraine
 | 
						|
inducing) strategy was implemented: end-users can specify any string
 | 
						|
(subject to a pre-determined length qualifier, currently 32 bytes) for
 | 
						|
encodings. It is up to the user to make sure that these strings are
 | 
						|
valid on the target system.
 | 
						|
</para>
 | 
						|
 | 
						|
<para>
 | 
						|
<code>
 | 
						|
void
 | 
						|
_M_init()
 | 
						|
</code>
 | 
						|
</para>
 | 
						|
<para>
 | 
						|
Strangely enough, this member function attempts to open conversion
 | 
						|
descriptors for a given encoding_state object. If the conversion
 | 
						|
descriptors are not valid, the conversion descriptors returned will
 | 
						|
not be valid and the resulting calls to the codecvt conversion
 | 
						|
functions will return error.
 | 
						|
</para>
 | 
						|
 | 
						|
<para>
 | 
						|
<code>
 | 
						|
bool
 | 
						|
_M_good()
 | 
						|
</code>
 | 
						|
</para>
 | 
						|
 | 
						|
<para>
 | 
						|
Provides a way to see if the given <type>encoding_state</type> object has been
 | 
						|
properly initialized. If the string literals describing the desired
 | 
						|
internal and external encoding are not valid, initialization will
 | 
						|
fail, and this will return false. If the internal and external
 | 
						|
encodings are valid, but <function>iconv_open</function> could not allocate
 | 
						|
conversion descriptors, this will also return false. Otherwise, the object is
 | 
						|
ready to convert and will return true.
 | 
						|
</para>
 | 
						|
 | 
						|
<para>
 | 
						|
<code>
 | 
						|
encoding_state(const encoding_state&)
 | 
						|
</code>
 | 
						|
</para>
 | 
						|
 | 
						|
<para>
 | 
						|
As iconv allocates memory and sets up conversion descriptors, the copy
 | 
						|
constructor can only copy the member data pertaining to the internal
 | 
						|
and external code conversions, and not the conversion descriptors
 | 
						|
themselves.
 | 
						|
</para>
 | 
						|
 | 
						|
<para>
 | 
						|
Definitions for all the required codecvt member functions are provided
 | 
						|
for this specialization, and usage of <code>codecvt<<replaceable>internal
 | 
						|
character type</replaceable>, <replaceable>external character type</replaceable>, <replaceable>encoding_state</replaceable>></code> is consistent with other
 | 
						|
codecvt usage.
 | 
						|
</para>
 | 
						|
 | 
						|
</section>
 | 
						|
 | 
						|
<section xml:id="facet.codecvt.use"><info><title>Use</title></info>
 | 
						|
 | 
						|
<para>A conversion involving a string literal.</para>
 | 
						|
 | 
						|
<programlisting>
 | 
						|
  typedef codecvt_base::result                  result;
 | 
						|
  typedef unsigned short                        unicode_t;
 | 
						|
  typedef unicode_t                             int_type;
 | 
						|
  typedef char                                  ext_type;
 | 
						|
  typedef encoding_state                          state_type;
 | 
						|
  typedef codecvt<int_type, ext_type, state_type> unicode_codecvt;
 | 
						|
 | 
						|
  const ext_type*       e_lit = "black pearl jasmine tea";
 | 
						|
  int                   size = strlen(e_lit);
 | 
						|
  int_type              i_lit_base[24] =
 | 
						|
  { 25088, 27648, 24832, 25344, 27392, 8192, 28672, 25856, 24832, 29184,
 | 
						|
    27648, 8192, 27136, 24832, 29440, 27904, 26880, 28160, 25856, 8192, 29696,
 | 
						|
    25856, 24832, 2560
 | 
						|
  };
 | 
						|
  const int_type*       i_lit = i_lit_base;
 | 
						|
  const ext_type*       efrom_next;
 | 
						|
  const int_type*       ifrom_next;
 | 
						|
  ext_type*             e_arr = new ext_type[size + 1];
 | 
						|
  ext_type*             eto_next;
 | 
						|
  int_type*             i_arr = new int_type[size + 1];
 | 
						|
  int_type*             ito_next;
 | 
						|
 | 
						|
  // construct a locale object with the specialized facet.
 | 
						|
  locale                loc(locale::classic(), new unicode_codecvt);
 | 
						|
  // sanity check the constructed locale has the specialized facet.
 | 
						|
  VERIFY( has_facet<unicode_codecvt>(loc) );
 | 
						|
  const unicode_codecvt& cvt = use_facet<unicode_codecvt>(loc);
 | 
						|
  // convert between const char* and unicode strings
 | 
						|
  unicode_codecvt::state_type state01("UNICODE", "ISO_8859-1");
 | 
						|
  initialize_state(state01);
 | 
						|
  result r1 = cvt.in(state01, e_lit, e_lit + size, efrom_next,
 | 
						|
		     i_arr, i_arr + size, ito_next);
 | 
						|
  VERIFY( r1 == codecvt_base::ok );
 | 
						|
  VERIFY( !int_traits::compare(i_arr, i_lit, size) );
 | 
						|
  VERIFY( efrom_next == e_lit + size );
 | 
						|
  VERIFY( ito_next == i_arr + size );
 | 
						|
</programlisting>
 | 
						|
 | 
						|
</section>
 | 
						|
 | 
						|
<section xml:id="facet.codecvt.future"><info><title>Future</title></info>
 | 
						|
 | 
						|
<itemizedlist>
 | 
						|
<listitem>
 | 
						|
  <para>
 | 
						|
   a. things that are sketchy, or remain unimplemented:
 | 
						|
      do_encoding, max_length and length member functions
 | 
						|
      are only weakly implemented. I have no idea how to do
 | 
						|
      this correctly, and in a generic manner.  Nathan?
 | 
						|
</para>
 | 
						|
</listitem>
 | 
						|
 | 
						|
<listitem>
 | 
						|
  <para>
 | 
						|
   b. conversions involving <type>std::string</type>
 | 
						|
  </para>
 | 
						|
   <itemizedlist>
 | 
						|
      <listitem><para>
 | 
						|
      how should operators != and == work for string of
 | 
						|
      different/same encoding?
 | 
						|
      </para></listitem>
 | 
						|
 | 
						|
      <listitem><para>
 | 
						|
      what is equal? A byte by byte comparison or an
 | 
						|
      encoding then byte comparison?
 | 
						|
      </para></listitem>
 | 
						|
 | 
						|
      <listitem><para>
 | 
						|
      conversions between narrow, wide, and unicode strings
 | 
						|
      </para></listitem>
 | 
						|
   </itemizedlist>
 | 
						|
</listitem>
 | 
						|
<listitem><para>
 | 
						|
   c. conversions involving std::filebuf and std::ostream
 | 
						|
</para>
 | 
						|
   <itemizedlist>
 | 
						|
      <listitem><para>
 | 
						|
      how to initialize the state object in a
 | 
						|
      standards-conformant manner?
 | 
						|
      </para></listitem>
 | 
						|
 | 
						|
		<listitem><para>
 | 
						|
      how to synchronize the "C" and "C++"
 | 
						|
      conversion information?
 | 
						|
      </para></listitem>
 | 
						|
 | 
						|
		<listitem><para>
 | 
						|
      wchar_t/char internal buffers and conversions between
 | 
						|
      internal/external buffers?
 | 
						|
      </para></listitem>
 | 
						|
   </itemizedlist>
 | 
						|
</listitem>
 | 
						|
</itemizedlist>
 | 
						|
</section>
 | 
						|
 | 
						|
 | 
						|
<bibliography xml:id="facet.codecvt.biblio"><info><title>Bibliography</title></info>
 | 
						|
 | 
						|
 | 
						|
  <biblioentry>
 | 
						|
    <citetitle>
 | 
						|
      The GNU C Library
 | 
						|
    </citetitle>
 | 
						|
    <author><personname><surname>McGrath</surname><firstname>Roland</firstname></personname></author>
 | 
						|
    <author><personname><surname>Drepper</surname><firstname>Ulrich</firstname></personname></author>
 | 
						|
    <copyright>
 | 
						|
      <year>2007</year>
 | 
						|
      <holder>FSF</holder>
 | 
						|
    </copyright>
 | 
						|
    <pagenums>
 | 
						|
      Chapters 6 Character Set Handling and 7 Locales and Internationalization
 | 
						|
    </pagenums>
 | 
						|
  </biblioentry>
 | 
						|
 | 
						|
  <biblioentry>
 | 
						|
    <citetitle>
 | 
						|
      Correspondence
 | 
						|
    </citetitle>
 | 
						|
    <author><personname><surname>Drepper</surname><firstname>Ulrich</firstname></personname></author>
 | 
						|
    <copyright>
 | 
						|
      <year>2002</year>
 | 
						|
      <holder/>
 | 
						|
    </copyright>
 | 
						|
  </biblioentry>
 | 
						|
 | 
						|
  <biblioentry>
 | 
						|
    <citetitle>
 | 
						|
      ISO/IEC 14882:1998 Programming languages - C++
 | 
						|
    </citetitle>
 | 
						|
    <copyright>
 | 
						|
      <year>1998</year>
 | 
						|
      <holder>ISO</holder>
 | 
						|
    </copyright>
 | 
						|
  </biblioentry>
 | 
						|
 | 
						|
  <biblioentry>
 | 
						|
    <citetitle>
 | 
						|
      ISO/IEC 9899:1999 Programming languages - C
 | 
						|
    </citetitle>
 | 
						|
    <copyright>
 | 
						|
      <year>1999</year>
 | 
						|
      <holder>ISO</holder>
 | 
						|
    </copyright>
 | 
						|
  </biblioentry>
 | 
						|
 | 
						|
  <biblioentry>
 | 
						|
      <title>
 | 
						|
	<link xmlns:xlink="http://www.w3.org/1999/xlink"
 | 
						|
	      xlink:href="http://pubs.opengroup.org/onlinepubs/9699919799/">
 | 
						|
      System Interface Definitions, Issue 7 (IEEE Std. 1003.1-2008)
 | 
						|
	</link>
 | 
						|
      </title>
 | 
						|
 | 
						|
    <copyright>
 | 
						|
      <year>2008</year>
 | 
						|
      <holder>
 | 
						|
	The Open Group/The Institute of Electrical and Electronics
 | 
						|
	Engineers, Inc.
 | 
						|
      </holder>
 | 
						|
    </copyright>
 | 
						|
  </biblioentry>
 | 
						|
 | 
						|
  <biblioentry>
 | 
						|
    <citetitle>
 | 
						|
      The C++ Programming Language, Special Edition
 | 
						|
    </citetitle>
 | 
						|
    <author><personname><surname>Stroustrup</surname><firstname>Bjarne</firstname></personname></author>
 | 
						|
    <copyright>
 | 
						|
      <year>2000</year>
 | 
						|
      <holder>Addison Wesley, Inc.</holder>
 | 
						|
    </copyright>
 | 
						|
    <pagenums>Appendix D</pagenums>
 | 
						|
    <publisher>
 | 
						|
      <publishername>
 | 
						|
	Addison Wesley
 | 
						|
      </publishername>
 | 
						|
    </publisher>
 | 
						|
  </biblioentry>
 | 
						|
 | 
						|
 | 
						|
  <biblioentry>
 | 
						|
    <citetitle>
 | 
						|
      Standard C++ IOStreams and Locales
 | 
						|
    </citetitle>
 | 
						|
    <subtitle>
 | 
						|
      Advanced Programmer's Guide and Reference
 | 
						|
    </subtitle>
 | 
						|
    <author><personname><surname>Langer</surname><firstname>Angelika</firstname></personname></author>
 | 
						|
    <author><personname><surname>Kreft</surname><firstname>Klaus</firstname></personname></author>
 | 
						|
    <copyright>
 | 
						|
      <year>2000</year>
 | 
						|
      <holder>Addison Wesley Longman, Inc.</holder>
 | 
						|
    </copyright>
 | 
						|
    <publisher>
 | 
						|
      <publishername>
 | 
						|
	Addison Wesley Longman
 | 
						|
      </publishername>
 | 
						|
    </publisher>
 | 
						|
  </biblioentry>
 | 
						|
 | 
						|
  <biblioentry>
 | 
						|
      <title>
 | 
						|
	<link xmlns:xlink="http://www.w3.org/1999/xlink"
 | 
						|
	      xlink:href="http://www.lysator.liu.se/c/na1.html">
 | 
						|
      A brief description of Normative Addendum 1
 | 
						|
	</link>
 | 
						|
      </title>
 | 
						|
 | 
						|
    <author><personname><surname>Feather</surname><firstname>Clive</firstname></personname></author>
 | 
						|
    <pagenums>Extended Character Sets</pagenums>
 | 
						|
  </biblioentry>
 | 
						|
 | 
						|
  <biblioentry>
 | 
						|
      <title>
 | 
						|
	<link xmlns:xlink="http://www.w3.org/1999/xlink"
 | 
						|
	      xlink:href="http://tldp.org/HOWTO/Unicode-HOWTO.html">
 | 
						|
	  The Unicode HOWTO
 | 
						|
	</link>
 | 
						|
      </title>
 | 
						|
 | 
						|
    <author><personname><surname>Haible</surname><firstname>Bruno</firstname></personname></author>
 | 
						|
  </biblioentry>
 | 
						|
 | 
						|
  <biblioentry>
 | 
						|
      <title>
 | 
						|
	<link xmlns:xlink="http://www.w3.org/1999/xlink"
 | 
						|
	      xlink:href="http://www.cl.cam.ac.uk/~mgk25/unicode.html">
 | 
						|
      UTF-8 and Unicode FAQ for Unix/Linux
 | 
						|
	</link>
 | 
						|
      </title>
 | 
						|
 | 
						|
 | 
						|
    <author><personname><surname>Khun</surname><firstname>Markus</firstname></personname></author>
 | 
						|
  </biblioentry>
 | 
						|
 | 
						|
</bibliography>
 | 
						|
 | 
						|
</section>
 |