mirror of git://gcc.gnu.org/git/gcc.git
| <?xml version="1.0" encoding="UTF-8" standalone="no"?>
 | ||
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>Facets</title><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot" /><meta name="keywords" content="ISO C++, library" /><meta name="keywords" content="ISO C++, runtime, library" /><link rel="home" href="../index.html" title="The GNU C++ Library" /><link rel="up" href="localization.html" title="Chapter 8.  Localization" /><link rel="prev" href="localization.html" title="Chapter 8.  Localization" /><link rel="next" href="containers.html" title="Chapter 9.  Containers" /></head><body>
<div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a id="std.localization.facet"></a>Facets</h2></div></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a id="std.localization.facet.ctype"></a>ctype</h3></div></div></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="facet.ctype.impl"></a>Implementation</h4></div></div></div><div class="section"><div class="titlepage"><div><div><h5 class="title"><a id="facet.ctype.impl.spec"></a>Specializations</h5></div></div></div><p>
 | ||
| For the required specialization <code class="classname">codecvt<wchar_t, char, mbstate_t></code>,
 | ||
| conversions are made between the internal character set (always UCS4
 | ||
| on GNU/Linux) and whatever the currently selected locale for the
 | ||
| <code class="code">LC_CTYPE</code> category implements.
 | ||
| </p><p>
 | ||
| The two required specializations are implemented as follows:
 | ||
| </p><p>
 | ||
| <code class="code">
 | ||
| ctype<char>
 | ||
| </code>
 | ||
| </p><p>
 | ||
This is a simple specialization. Implementing this was a piece of cake.
 | ||
| </p><p>
 | ||
| <code class="code">
 | ||
| ctype<wchar_t>
 | ||
| </code>
 | ||
| </p><p>
 | ||
| This specialization, by specifying all the template parameters, pretty
 | ||
| much ties the hands of implementors. As such, the implementation is
 | ||
straightforward, involving <code class="function">mbsrtowcs</code> for
conversions from <span class="type">char</span> to <span class="type">wchar_t</span> and
<code class="function">wcsrtombs</code> for conversions from <span class="type">wchar_t</span>
to <span class="type">char</span>.
 | ||
| </p><p>
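As a rough illustration of how the two required <code class="classname">ctype</code>
specializations are reached from user code, here is a minimal sketch. It assumes only
the standard <code class="code">use_facet</code> interface and the classic "C" locale;
the helper names <code class="function">is_digit_classic</code>,
<code class="function">to_upper_classic</code> and
<code class="function">widen_classic</code> are made up for this example and are not
part of libstdc++.
</p>

```cpp
#include <cassert>
#include <locale>

// Classify a narrow character through the ctype<char> facet of the
// classic "C" locale, without consulting the global locale.
bool is_digit_classic(char c)
{
  const std::ctype<char>& ct =
    std::use_facet<std::ctype<char> >(std::locale::classic());
  return ct.is(std::ctype_base::digit, c);
}

// Upper-case a narrow character through the same facet.
char to_upper_classic(char c)
{
  const std::ctype<char>& ct =
    std::use_facet<std::ctype<char> >(std::locale::classic());
  return ct.toupper(c);
}

// Widen a narrow character through the ctype<wchar_t> facet.
wchar_t widen_classic(char c)
{
  const std::ctype<wchar_t>& ct =
    std::use_facet<std::ctype<wchar_t> >(std::locale::classic());
  return ct.widen(c);
}
```

<p>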
 | ||
| Neither of these two required specializations deals with Unicode
 | ||
| characters.
 | ||
| </p></div></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="facet.ctype.future"></a>Future</h4></div></div></div><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>
 | ||
|    How to deal with the global locale issue?
 | ||
|    </p></li><li class="listitem"><p>
 | ||
|    How to deal with types other than <span class="type">char</span>, <span class="type">wchar_t</span>?
 | ||
|    </p></li><li class="listitem"><p>
 | ||
|    Overlap between codecvt/ctype: narrow/widen
 | ||
|    </p></li><li class="listitem"><p>
 | ||
|        <span class="type">mask</span> typedef in <code class="classname">codecvt_base</code>,
 | ||
       argument types in <span class="type">codecvt</span>. What is known about this type?
 | ||
|    </p></li><li class="listitem"><p>
 | ||
|    Why mask* argument in codecvt?
 | ||
|    </p></li><li class="listitem"><p>
 | ||
       Can this be made (more) generic? Is there a simple way to
 | ||
|        straighten out the configure-time mess that is a by-product of
 | ||
|        this class?
 | ||
|    </p></li><li class="listitem"><p>
 | ||
|        Get the <span class="type">ctype<wchar_t>::mask</span> stuff under control.
 | ||
|        Need to make some kind of static table, and not do lookup every time
 | ||
|        somebody hits the <code class="code">do_is...</code> functions. Too bad we can't
 | ||
|        just redefine <span class="type">mask</span> for
 | ||
|        <code class="classname">ctype<wchar_t></code>
 | ||
|    </p></li><li class="listitem"><p>
 | ||
|        Rename abstract base class. See if just smash-overriding is a
 | ||
|        better approach. Clarify, add sanity to naming.
 | ||
|      </p></li></ul></div></div><div class="bibliography"><div class="titlepage"><div><div><h4 class="title"><a id="facet.ctype.biblio"></a>Bibliography</h4></div></div></div><div class="biblioentry"><a id="id-1.3.4.6.3.2.4.2"></a><p><span class="citetitle"><em class="citetitle">
 | ||
|       The GNU C Library
 | ||
|     </em>. </span><span class="author"><span class="firstname">Roland</span> <span class="surname">McGrath</span>. </span><span class="author"><span class="firstname">Ulrich</span> <span class="surname">Drepper</span>. </span><span class="copyright">Copyright © 2007 FSF. </span><span class="pagenums">Chapters 6  Character Set Handling and 7 Locales and Internationalization. </span></p></div><div class="biblioentry"><a id="id-1.3.4.6.3.2.4.3"></a><p><span class="citetitle"><em class="citetitle">
 | ||
|       Correspondence
 | ||
|     </em>. </span><span class="author"><span class="firstname">Ulrich</span> <span class="surname">Drepper</span>. </span><span class="copyright">Copyright © 2002 . </span></p></div><div class="biblioentry"><a id="id-1.3.4.6.3.2.4.4"></a><p><span class="citetitle"><em class="citetitle">
 | ||
|       ISO/IEC 14882:1998 Programming languages - C++
 | ||
|     </em>. </span><span class="copyright">Copyright © 1998 ISO. </span></p></div><div class="biblioentry"><a id="id-1.3.4.6.3.2.4.5"></a><p><span class="citetitle"><em class="citetitle">
 | ||
|       ISO/IEC 9899:1999 Programming languages - C
 | ||
|     </em>. </span><span class="copyright">Copyright © 1999 ISO. </span></p></div><div class="biblioentry"><a id="id-1.3.4.6.3.2.4.6"></a><p><span class="title"><em>
 | ||
| 	<a class="link" href="http://www.unix.org/version3/ieee_std.html" target="_top">
 | ||
| 	The Open Group Base Specifications, Issue 6 (IEEE Std. 1003.1-2004)
 | ||
| 	</a>
 | ||
|       </em>. </span><span class="copyright">Copyright © 1999 
 | ||
|       The Open Group/The Institute of Electrical and Electronics Engineers, Inc.. </span></p></div><div class="biblioentry"><a id="id-1.3.4.6.3.2.4.7"></a><p><span class="citetitle"><em class="citetitle">
 | ||
|       The C++ Programming Language, Special Edition
 | ||
|     </em>. </span><span class="author"><span class="firstname">Bjarne</span> <span class="surname">Stroustrup</span>. </span><span class="copyright">Copyright © 2000 Addison Wesley, Inc.. </span><span class="pagenums">Appendix D. </span><span class="publisher"><span class="publishername">
 | ||
| 	Addison Wesley
 | ||
|       . </span></span></p></div><div class="biblioentry"><a id="id-1.3.4.6.3.2.4.8"></a><p><span class="citetitle"><em class="citetitle">
 | ||
|       Standard C++ IOStreams and Locales
 | ||
|     </em>. </span><span class="subtitle">
 | ||
|       Advanced Programmer's Guide and Reference
 | ||
|     . </span><span class="author"><span class="firstname">Angelika</span> <span class="surname">Langer</span>. </span><span class="author"><span class="firstname">Klaus</span> <span class="surname">Kreft</span>. </span><span class="copyright">Copyright © 2000 Addison Wesley Longman, Inc.. </span><span class="publisher"><span class="publishername">
 | ||
| 	Addison Wesley Longman
 | ||
|       . </span></span></p></div></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a id="std.localization.facet.codecvt"></a>codecvt</h3></div></div></div><p>
 | ||
| The standard class codecvt attempts to address conversions between
 | ||
| different character encoding schemes. In particular, the standard
 | ||
| attempts to detail conversions between the implementation-defined wide
 | ||
| characters (hereafter referred to as <span class="type">wchar_t</span>) and the standard
 | ||
| type <span class="type">char</span> that is so beloved in classic <span class="quote">“<span class="quote">C</span>”</span>
 | ||
| (which can now be referred to as narrow characters.)  This document attempts
 | ||
| to describe how the GNU libstdc++ implementation deals with the conversion
 | ||
| between wide and narrow characters, and also presents a framework for dealing
 | ||
| with the huge number of other encodings that iconv can convert,
 | ||
| including Unicode and UTF8. Design issues and requirements are
 | ||
| addressed, and examples of correct usage for both the required
 | ||
| specializations for wide and narrow characters and the
 | ||
| implementation-provided extended functionality are given.
 | ||
| </p><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="facet.codecvt.req"></a>Requirements</h4></div></div></div><p>
 | ||
| Around page 425 of the C++ Standard, this charming heading comes into view:
 | ||
| </p><div class="blockquote"><blockquote class="blockquote"><p>
 | ||
| 22.2.1.5 - Template class codecvt
 | ||
| </p></blockquote></div><p>
 | ||
| The text around the codecvt definition gives some clues:
 | ||
| </p><div class="blockquote"><blockquote class="blockquote"><p>
 | ||
| <span class="emphasis"><em>
 | ||
| -1- The class <code class="code">codecvt<internT,externT,stateT></code> is for use
 | ||
| when converting from one codeset to another, such as from wide characters
 | ||
| to multibyte characters, between wide character encodings such as
 | ||
| Unicode and EUC.
 | ||
| </em></span>
 | ||
| </p></blockquote></div><p>
 | ||
| Hmm. So, in some unspecified way, Unicode encodings and
 | ||
| translations between other character sets should be handled by this
 | ||
| class.
 | ||
| </p><div class="blockquote"><blockquote class="blockquote"><p>
 | ||
| <span class="emphasis"><em>
 | ||
| -2- The <span class="type">stateT</span> argument selects the pair of codesets being mapped between.
 | ||
| </em></span>
 | ||
| </p></blockquote></div><p>
 | ||
| Ah ha! Another clue...
 | ||
| </p><div class="blockquote"><blockquote class="blockquote"><p>
 | ||
| <span class="emphasis"><em>
 | ||
| -3- The instantiations required in the Table 51 (lib.locale.category), namely
 | ||
| <code class="classname">codecvt<wchar_t,char,mbstate_t></code> and
 | ||
| <code class="classname">codecvt<char,char,mbstate_t></code>, convert the
 | ||
| implementation-defined native character set.
 | ||
| <code class="classname">codecvt<char,char,mbstate_t></code> implements a
 | ||
| degenerate conversion; it does not convert at all.
 | ||
| <code class="classname">codecvt<wchar_t,char,mbstate_t></code> converts between
 | ||
| the native character sets for tiny and wide characters. Instantiations on
 | ||
| <span class="type">mbstate_t</span> perform conversion between encodings known to the library
 | ||
| implementor.  Other encodings can be converted by specializing on a
 | ||
| user-defined <span class="type">stateT</span> type. The <span class="type">stateT</span> object can
 | ||
| contain any state that is useful to communicate to or from the specialized
 | ||
| <code class="function">do_convert</code> member.
 | ||
| </em></span>
 | ||
| </p></blockquote></div><p>
 | ||
At this point, a couple of points become clear:
 | ||
| </p><p>
 | ||
| One: The standard clearly implies that attempts to add non-required
 | ||
| (yet useful and widely used) conversions need to do so through the
 | ||
| third template parameter, <span class="type">stateT</span>.</p><p>
 | ||
| Two: The required conversions, by specifying <span class="type">mbstate_t</span> as the
 | ||
| third template parameter, imply an implementation strategy that is mostly
 | ||
| (or wholly) based on the underlying C library, and the functions
 | ||
<code class="function">mbsrtowcs</code> and <code class="function">wcsrtombs</code> in
 | ||
| particular.</p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="facet.codecvt.design"></a>Design</h4></div></div></div><div class="section"><div class="titlepage"><div><div><h5 class="title"><a id="codecvt.design.wchar_t_size"></a><span class="type">wchar_t</span> Size</h5></div></div></div><p>
 | ||
|       The simple implementation detail of <span class="type">wchar_t</span>'s size seems to
 | ||
      repeatedly confound people. Many systems use a two-byte,
      unsigned integral type to represent wide characters, and use an
      internal encoding of Unicode or UCS2. (See AIX, Microsoft NT,
      Java, others.) Other systems use a four-byte integral
      type to represent wide characters, and use an internal encoding
 | ||
|       of UCS4. (GNU/Linux systems using glibc, in particular.) The C
 | ||
|       programming language (and thus C++) does not specify a specific
 | ||
|       size for the type <span class="type">wchar_t</span>.
 | ||
|     </p><p>
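      Since the size is left to the implementation, code that cares can ask the
      compiler instead of guessing. The following is a small sketch under that
      assumption; the function names <code class="function">wchar_bits</code>,
      <code class="function">wchar_holds_ucs4</code> and
      <code class="function">wchar_is_little_endian</code> are invented for
      illustration only.
    </p>

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <limits>

// Number of bits in wchar_t on the platform compiling this code.
std::size_t wchar_bits()
{
  return sizeof(wchar_t) * CHAR_BIT;
}

// True if wchar_t can represent every code point up to 0x10FFFF.
bool wchar_holds_ucs4()
{
  return static_cast<unsigned long>(std::numeric_limits<wchar_t>::max())
         >= 0x10FFFFul;
}

// True if the first byte in memory of a wchar_t holding the value 1
// is its least significant byte.
bool wchar_is_little_endian()
{
  const wchar_t probe = 1;
  return *reinterpret_cast<const unsigned char*>(&probe) == 1;
}
```

    <p>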
 | ||
|       Thus, portable C++ code cannot assume a byte size (or endianness) either.
 | ||
|     </p></div><div class="section"><div class="titlepage"><div><div><h5 class="title"><a id="codecvt.design.unicode"></a>Support for Unicode</h5></div></div></div><p>
 | ||
|     Probably the most frequently asked question about code conversion
 | ||
|     is: "So dudes, what's the deal with Unicode strings?"
 | ||
|     The dude part is optional, but apparently the usefulness of
 | ||
|     Unicode strings is pretty widely appreciated. The Unicode character
 | ||
|     set (and useful encodings like UTF-8, UCS-4, ISO 8859-10,
 | ||
|     etc etc etc) were not mentioned in the first C++ standard. (The 2011
 | ||
|     standard added support for string literals with different encodings
 | ||
|     and some library facilities for converting between encodings, but the
 | ||
|     notes below have not been updated to reflect that.)
 | ||
|   </p><p>
 | ||
|     A couple of comments:
 | ||
|   </p><p>
 | ||
|     The thought that all one needs to convert between two arbitrary
 | ||
|     codesets is two types and some kind of state argument is
 | ||
|     unfortunate. In particular, encodings may be stateless. The naming
 | ||
|     of the third parameter as <span class="type">stateT</span> is unfortunate, as what is
 | ||
|     really needed is some kind of generalized type that accounts for the
 | ||
|     issues that abstract encodings will need. The minimum information
 | ||
|     that is required includes:
 | ||
|   </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>
 | ||
| 	Identifiers for each of the codesets involved in the
 | ||
| 	conversion. For example, using the iconv family of functions
 | ||
| 	from the Single Unix Specification (what used to be called
 | ||
| 	X/Open) hosted on the GNU/Linux operating system allows
 | ||
| 	bi-directional mapping between far more than the following
 | ||
| 	tantalizing possibilities:
 | ||
|       </p><p>
 | ||
| 	(An edited list taken from <code class="code">`iconv --list`</code> on a
 | ||
| 	Red Hat 6.2/Intel system:
 | ||
|       </p><div class="blockquote"><blockquote class="blockquote"><pre class="programlisting">
 | ||
| 8859_1, 8859_9, 10646-1:1993, 10646-1:1993/UCS4, ARABIC, ARABIC7,
 | ||
ASCII, EUC-CN, EUC-JP, EUC-KR, EUC-TW, GREEK-CCITT, GREEK, GREEK7-OLD,
 | ||
| GREEK7, GREEK8, HEBREW, ISO-8859-1, ISO-8859-2, ISO-8859-3,
 | ||
| ISO-8859-4, ISO-8859-5, ISO-8859-6, ISO-8859-7, ISO-8859-8,
 | ||
| ISO-8859-9, ISO-8859-10, ISO-8859-11, ISO-8859-13, ISO-8859-14,
 | ||
| ISO-8859-15, ISO-10646, ISO-10646/UCS2, ISO-10646/UCS4,
 | ||
| ISO-10646/UTF-8, ISO-10646/UTF8, SHIFT-JIS, SHIFT_JIS, UCS-2, UCS-4,
 | ||
UCS2, UCS4, UNICODE, UNICODEBIG, UNICODELITTLE, US-ASCII, US, UTF-8,
 | ||
| UTF-16, UTF8, UTF16).
 | ||
| </pre></blockquote></div><p>
 | ||
| For iconv-based implementations, string literals for each of the
 | ||
encodings (e.g., "UCS-2" and "UTF-8") are necessary,
 | ||
| although for other,
 | ||
| non-iconv implementations a table of enumerated values or some other
 | ||
| mechanism may be required.
 | ||
| </p></li><li class="listitem"><p>
 | ||
|  Maximum length of the identifying string literal.
 | ||
| </p></li><li class="listitem"><p>
 | ||
 Some encodings require explicit endianness. As such, some kind
  of endian marker or other byte-order marker will be necessary. See
  "Footnotes for C/C++ developers" in Haible for more information on
  UCS-2/Unicode endian issues. (Summary: big endian seems most likely;
  however, implementations, most notably Microsoft, vary.)
 | ||
| </p></li><li class="listitem"><p>
 | ||
|  Types representing the conversion state, for conversions involving
 | ||
|   the machinery in the "C" library, or the conversion descriptor, for
 | ||
|   conversions using iconv (such as the type iconv_t.)  Note that the
 | ||
|   conversion descriptor encodes more information than a simple encoding
 | ||
|   state type.
 | ||
| </p></li><li class="listitem"><p>
 | ||
 Conversion descriptors for both directions of encoding (e.g., both
  UCS-2 to UTF-8 and UTF-8 to UCS-2).
 | ||
| </p></li><li class="listitem"><p>
 | ||
 Something to indicate if the conversion requested is valid.
 | ||
| </p></li><li class="listitem"><p>
 | ||
|  Something to represent if the conversion descriptors are valid.
 | ||
| </p></li><li class="listitem"><p>
 | ||
|  Some way to enforce strict type checking on the internal and
 | ||
|   external types. As part of this, the size of the internal and
 | ||
|   external types will need to be known.
 | ||
| </p></li></ul></div></div><div class="section"><div class="titlepage"><div><div><h5 class="title"><a id="codecvt.design.issues"></a>Other Issues</h5></div></div></div><p>
 | ||
| In addition, multi-threaded and multi-locale environments also impact
 | ||
| the design and requirements for code conversions. In particular, they
 | ||
| affect the required specialization
 | ||
| <code class="classname">codecvt<wchar_t, char, mbstate_t></code>
 | ||
| when implemented using standard "C" functions.
 | ||
| </p><p>
 | ||
| Three problems arise, one big, one of medium importance, and one small.
 | ||
| </p><p>
 | ||
First, the small: <code class="function">mbsrtowcs</code> and
 | ||
| <code class="function">wcsrtombs</code> may not be multithread-safe
 | ||
| on all systems required by the GNU tools. For GNU/Linux and glibc,
 | ||
| this is not an issue.
 | ||
| </p><p>
 | ||
| Of medium concern, in the grand scope of things, is that the functions
 | ||
| used to implement this specialization work on null-terminated
 | ||
| strings. Buffers, especially file buffers, may not be null-terminated,
 | ||
| thus giving conversions that end prematurely or are otherwise
 | ||
| incorrect. Yikes!
 | ||
| </p><p>
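One way around the null-termination problem is to convert a counted buffer one
character at a time with the restartable function
<code class="function">mbrtowc</code>, which takes an explicit byte count. The
sketch below assumes the program is still in the default "C" locale and that the
input is plain ASCII; the helper name <code class="function">widen_buffer</code>
is hypothetical, and this is not necessarily how libstdc++ does it.
</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <string>

// Convert exactly len bytes starting at buf into wide characters.
// The buffer does not need to be null-terminated.
// Returns false on an invalid or incomplete multibyte sequence.
bool widen_buffer(const char* buf, std::size_t len, std::wstring& out)
{
  std::mbstate_t state = std::mbstate_t();
  out.clear();
  std::size_t pos = 0;
  while (pos < len)
    {
      wchar_t wc;
      std::size_t n = std::mbrtowc(&wc, buf + pos, len - pos, &state);
      if (n == static_cast<std::size_t>(-1)
          || n == static_cast<std::size_t>(-2))
        return false;
      if (n == 0)
        n = 1; // an embedded null byte was converted
      out.push_back(wc);
      pos += n;
    }
  return true;
}
```

<p>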
 | ||
| The last, and fundamental problem, is the assumption of a global
 | ||
| locale for all the "C" functions referenced above. For something like
 | ||
| C++ iostreams (where codecvt is explicitly used) the notion of
 | ||
| multiple locales is fundamental. In practice, most users may not run
 | ||
| into this limitation. However, as a quality of implementation issue,
 | ||
| the GNU C++ library would like to offer a solution that allows
 | ||
multiple locales and/or simultaneous usage with computationally
 | ||
| correct results. In short, libstdc++ is trying to offer, as an
 | ||
| option, a high-quality implementation, damn the additional complexity!
 | ||
| </p><p>
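To make the contrast with the single global "C" locale concrete, here is a small
sketch of two C++ locale objects being used side by side. It avoids named system
locales by installing a made-up <code class="classname">numpunct</code> facet;
the names <code class="classname">comma_decimal</code> and
<code class="function">format_with</code> are assumptions of this example.
</p>

```cpp
#include <cassert>
#include <locale>
#include <sstream>
#include <string>

// A numpunct facet that uses ',' as the decimal point.
struct comma_decimal : std::numpunct<char>
{
protected:
  char do_decimal_point() const { return ','; }
};

// Format value through a stream imbued with loc. The global locale
// is neither read nor modified.
std::string format_with(const std::locale& loc, double value)
{
  std::ostringstream os;
  os.imbue(loc);
  os << value;
  return os.str();
}
```

<p>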
 | ||
| For the required specialization
 | ||
| <code class="classname">codecvt<wchar_t, char, mbstate_t></code>,
 | ||
| conversions are made between the internal character set (always UCS4
 | ||
| on GNU/Linux) and whatever the currently selected locale for the
 | ||
| LC_CTYPE category implements.
 | ||
| </p></div></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="facet.codecvt.impl"></a>Implementation</h4></div></div></div><p>
 | ||
| The two required specializations are implemented as follows:
 | ||
| </p><p>
 | ||
| <code class="code">
 | ||
| codecvt<char, char, mbstate_t>
 | ||
| </code>
 | ||
| </p><p>
 | ||
| This is a degenerate (i.e., does nothing) specialization. Implementing
 | ||
| this was a piece of cake.
 | ||
| </p><p>
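The "does nothing" behaviour can be observed through the public interface. A
minimal sketch, assuming the facet is taken from the classic locale; the names
<code class="type">narrow_codecvt</code>,
<code class="function">narrow_codecvt_never_converts</code> and
<code class="function">narrow_in</code> are invented for this example.
</p>

```cpp
#include <cassert>
#include <cwchar>
#include <locale>

typedef std::codecvt<char, char, std::mbstate_t> narrow_codecvt;

// Ask the facet whether it ever performs a conversion.
bool narrow_codecvt_never_converts()
{
  const narrow_codecvt& cvt =
    std::use_facet<narrow_codecvt>(std::locale::classic());
  return cvt.always_noconv();
}

// Call in() on a char range and report the result code.
std::codecvt_base::result
narrow_in(const char* first, const char* last, char* to, char* to_end)
{
  const narrow_codecvt& cvt =
    std::use_facet<narrow_codecvt>(std::locale::classic());
  std::mbstate_t state = std::mbstate_t();
  const char* from_next = 0;
  char* to_next = 0;
  return cvt.in(state, first, last, from_next, to, to_end, to_next);
}
```

<p>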
 | ||
| <code class="code">
 | ||
codecvt<wchar_t, char, mbstate_t>
 | ||
| </code>
 | ||
| </p><p>
 | ||
| This specialization, by specifying all the template parameters, pretty
 | ||
| much ties the hands of implementors. As such, the implementation is
 | ||
straightforward, involving <code class="function">mbsrtowcs</code> for conversions
from <span class="type">char</span> to <span class="type">wchar_t</span> and
<code class="function">wcsrtombs</code> for conversions from <span class="type">wchar_t</span>
to <span class="type">char</span>.
 | ||
| </p><p>
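For comparison with the degenerate case, the wide specialization can be driven
directly as well. The sketch below assumes the classic locale and ASCII-only
input, so the result does not depend on which multibyte encoding is in effect;
<code class="function">widen_with_codecvt</code> is a hypothetical helper name.
</p>

```cpp
#include <cassert>
#include <cwchar>
#include <locale>
#include <string>

// Convert a narrow string to a wide string with
// codecvt<wchar_t, char, mbstate_t>::in and return the result code.
std::codecvt_base::result
widen_with_codecvt(const std::string& in, std::wstring& out)
{
  typedef std::codecvt<wchar_t, char, std::mbstate_t> wide_codecvt;
  const wide_codecvt& cvt =
    std::use_facet<wide_codecvt>(std::locale::classic());

  out.clear();
  if (in.empty())
    return std::codecvt_base::ok;

  std::mbstate_t state = std::mbstate_t();
  // One wide character per input byte is always enough room.
  std::wstring buf(in.size(), L'\0');
  const char* from_next = 0;
  wchar_t* to_next = 0;
  std::codecvt_base::result r =
    cvt.in(state, in.data(), in.data() + in.size(), from_next,
           &buf[0], &buf[0] + buf.size(), to_next);
  // Keep only the wide characters that were actually produced.
  out.assign(buf.data(), static_cast<std::size_t>(to_next - &buf[0]));
  return r;
}
```

<p>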
 | ||
| Neither of these two required specializations deals with Unicode
 | ||
| characters. As such, libstdc++ implements a partial specialization
 | ||
| of the <span class="type">codecvt</span> class with an iconv wrapper class,
 | ||
| <code class="classname">encoding_state</code> as the third template parameter.
 | ||
| </p><p>
 | ||
| This implementation should be standards conformant. First of all, the
 | ||
| standard explicitly points out that instantiations on the third
 | ||
| template parameter, <span class="type">stateT</span>, are the proper way to implement
 | ||
| non-required conversions. Second of all, the standard says (in Chapter
 | ||
| 17) that partial specializations of required classes are A-OK. Third
 | ||
| of all, the requirements for the <span class="type">stateT</span> type elsewhere in the
 | ||
| standard (see 21.1.2 traits typedefs) only indicate that this type be copy
 | ||
| constructible.
 | ||
| </p><p>
 | ||
| As such, the type <span class="type">encoding_state</span> is defined as a non-templatized,
 | ||
| POD type to be used as the third type of a <span class="type">codecvt</span> instantiation.
 | ||
| This type is just a wrapper class for iconv, and provides an easy interface
 | ||
| to iconv functionality.
 | ||
| </p><p>
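For readers who have not used iconv directly, the following sketch shows the
raw calls that such a wrapper has to manage: opening a descriptor, converting,
and closing it again. It assumes a platform that provides POSIX
<code class="code">iconv.h</code> and accepts the encoding names passed in (the
test names "UTF-8" and "UCS-4BE" are assumptions, since such names are not
standardized). The function <code class="function">convert_with_iconv</code> is
invented for illustration and is not the
<span class="type">encoding_state</span> implementation.
</p>

```cpp
#include <iconv.h>
#include <cassert>
#include <cstddef>
#include <string>

// Convert input from encoding `from` to encoding `to`.
// Returns false if the descriptor cannot be opened or the
// conversion fails.
bool convert_with_iconv(const char* from, const char* to,
                        const std::string& input, std::string& output)
{
  iconv_t cd = iconv_open(to, from); // note the order: (tocode, fromcode)
  if (cd == (iconv_t)(-1))
    return false;

  output.clear();
  if (input.empty())
    {
      iconv_close(cd);
      return true;
    }

  std::string in_copy(input);               // iconv wants a non-const char*
  char* in_ptr = &in_copy[0];
  std::size_t in_left = in_copy.size();

  std::string buf(input.size() * 4 + 4, '\0'); // generous output space
  char* out_ptr = &buf[0];
  std::size_t out_left = buf.size();

  std::size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
  iconv_close(cd);
  if (rc == (std::size_t)(-1))
    return false;

  output.assign(buf.data(), buf.size() - out_left);
  return true;
}
```

<p>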
 | ||
| There are two constructors for <span class="type">encoding_state</span>:
 | ||
| </p><p>
 | ||
| <code class="code">
 | ||
| encoding_state() : __in_desc(0), __out_desc(0)
 | ||
| </code>
 | ||
| </p><p>
 | ||
| This default constructor sets the internal encoding to some default
 | ||
| (currently UCS4) and the external encoding to whatever is returned by
 | ||
| <code class="code">nl_langinfo(CODESET)</code>.
 | ||
| </p><p>
 | ||
| <code class="code">
 | ||
| encoding_state(const char* __int, const char* __ext)
 | ||
| </code>
 | ||
| </p><p>
 | ||
| This constructor takes as parameters string literals that indicate the
 | ||
| desired internal and external encoding. There are no defaults for
 | ||
| either argument.
 | ||
| </p><p>
 | ||
| One of the issues with iconv is that the string literals identifying
 | ||
| conversions are not standardized. Because of this, the thought of
 | ||
| mandating and/or enforcing some set of pre-determined valid
 | ||
| identifiers seems iffy: thus, a more practical (and non-migraine
 | ||
| inducing) strategy was implemented: end-users can specify any string
 | ||
| (subject to a pre-determined length qualifier, currently 32 bytes) for
 | ||
| encodings. It is up to the user to make sure that these strings are
 | ||
| valid on the target system.
 | ||
| </p><p>
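The length limit is easiest to picture as a pair of fixed-size character
buffers inside a POD. The sketch below is only that: the type
<span class="type">encoding_names</span> and the function
<code class="function">set_encoding_names</code> are hypothetical, and the
choice to reject over-long names (with the 32 bytes assumed to include the
terminating null) is an assumption of this example rather than a description
of what libstdc++ does.
</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical POD holding the two encoding names in fixed-size buffers.
struct encoding_names
{
  enum { max_size = 32 }; // length limit mentioned in the text
  char internal[max_size];
  char external[max_size];
};

// Store both names. Returns false, leaving n untouched, if either name
// plus its terminating null does not fit in max_size bytes.
bool set_encoding_names(encoding_names& n,
                        const char* int_enc, const char* ext_enc)
{
  const std::size_t limit =
    static_cast<std::size_t>(encoding_names::max_size);
  if (std::strlen(int_enc) >= limit || std::strlen(ext_enc) >= limit)
    return false;
  std::strcpy(n.internal, int_enc);
  std::strcpy(n.external, ext_enc);
  return true;
}
```

<p>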
 | ||
| <code class="code">
 | ||
| void
 | ||
| _M_init()
 | ||
| </code>
 | ||
| </p><p>
 | ||
Strangely enough, this member function attempts to open conversion
descriptors for a given encoding_state object. If the encoding names
are not valid, the conversion descriptors returned will not be valid
either, and subsequent calls to the codecvt conversion functions will
return an error.
 | ||
| </p><p>
 | ||
| <code class="code">
 | ||
| bool
 | ||
| _M_good()
 | ||
| </code>
 | ||
| </p><p>
 | ||
| Provides a way to see if the given <span class="type">encoding_state</span> object has been
 | ||
| properly initialized. If the string literals describing the desired
 | ||
| internal and external encoding are not valid, initialization will
 | ||
| fail, and this will return false. If the internal and external
 | ||
| encodings are valid, but <code class="function">iconv_open</code> could not allocate
 | ||
| conversion descriptors, this will also return false. Otherwise, the object is
 | ||
| ready to convert and will return true.
 | ||
| </p><p>
 | ||
| <code class="code">
 | ||
| encoding_state(const encoding_state&)
 | ||
| </code>
 | ||
| </p><p>
 | ||
| As iconv allocates memory and sets up conversion descriptors, the copy
 | ||
| constructor can only copy the member data pertaining to the internal
 | ||
| and external code conversions, and not the conversion descriptors
 | ||
| themselves.
 | ||
| </p><p>
 | ||
| Definitions for all the required codecvt member functions are provided
 | ||
| for this specialization, and usage of <code class="code">codecvt<<em class="replaceable"><code>internal
 | ||
| character type</code></em>, <em class="replaceable"><code>external character type</code></em>, <em class="replaceable"><code>encoding_state</code></em>></code> is consistent with other
 | ||
| codecvt usage.
 | ||
| </p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="facet.codecvt.use"></a>Use</h4></div></div></div><p>A conversion involving a string literal.</p><pre class="programlisting">
 | ||
|   typedef codecvt_base::result                  result;
 | ||
|   typedef unsigned short                        unicode_t;
 | ||
|   typedef unicode_t                             int_type;
 | ||
|   typedef char                                  ext_type;
 | ||
|   typedef encoding_state                          state_type;
 | ||
|   typedef codecvt<int_type, ext_type, state_type> unicode_codecvt;
 | ||
| 
 | ||
|   const ext_type*       e_lit = "black pearl jasmine tea";
 | ||
|   int                   size = strlen(e_lit);
 | ||
|   int_type              i_lit_base[24] =
 | ||
|   { 25088, 27648, 24832, 25344, 27392, 8192, 28672, 25856, 24832, 29184,
 | ||
|     27648, 8192, 27136, 24832, 29440, 27904, 26880, 28160, 25856, 8192, 29696,
 | ||
|     25856, 24832, 2560
 | ||
|   };
 | ||
|   const int_type*       i_lit = i_lit_base;
 | ||
|   const ext_type*       efrom_next;
 | ||
|   const int_type*       ifrom_next;
 | ||
|   ext_type*             e_arr = new ext_type[size + 1];
 | ||
|   ext_type*             eto_next;
 | ||
|   int_type*             i_arr = new int_type[size + 1];
 | ||
|   int_type*             ito_next;
 | ||
| 
 | ||
|   // construct a locale object with the specialized facet.
 | ||
|   locale                loc(locale::classic(), new unicode_codecvt);
 | ||
|   // sanity check the constructed locale has the specialized facet.
 | ||
|   VERIFY( has_facet<unicode_codecvt>(loc) );
 | ||
|   const unicode_codecvt& cvt = use_facet<unicode_codecvt>(loc);
 | ||
|   // convert between const char* and unicode strings
 | ||
|   unicode_codecvt::state_type state01("UNICODE", "ISO_8859-1");
 | ||
|   initialize_state(state01);
 | ||
|   result r1 = cvt.in(state01, e_lit, e_lit + size, efrom_next,
 | ||
| 		     i_arr, i_arr + size, ito_next);
 | ||
|   VERIFY( r1 == codecvt_base::ok );
 | ||
|   VERIFY( !int_traits::compare(i_arr, i_lit, size) );
 | ||
|   VERIFY( efrom_next == e_lit + size );
 | ||
|   VERIFY( ito_next == i_arr + size );
 | ||
| </pre></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="facet.codecvt.future"></a>Future</h4></div></div></div><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>
 | ||
|    a. things that are sketchy, or remain unimplemented:
 | ||
|       do_encoding, max_length and length member functions
 | ||
|       are only weakly implemented. I have no idea how to do
 | ||
|       this correctly, and in a generic manner.  Nathan?
 | ||
| </p></li><li class="listitem"><p>
 | ||
|    b. conversions involving <span class="type">std::string</span>
 | ||
|   </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: circle; "><li class="listitem"><p>
 | ||
      how should operators != and == work for strings of
 | ||
|       different/same encoding?
 | ||
|       </p></li><li class="listitem"><p>
 | ||
|       what is equal? A byte by byte comparison or an
 | ||
|       encoding then byte comparison?
 | ||
|       </p></li><li class="listitem"><p>
 | ||
|       conversions between narrow, wide, and unicode strings
 | ||
|       </p></li></ul></div></li><li class="listitem"><p>
 | ||
|    c. conversions involving std::filebuf and std::ostream
 | ||
| </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: circle; "><li class="listitem"><p>
 | ||
|       how to initialize the state object in a
 | ||
|       standards-conformant manner?
 | ||
|       </p></li><li class="listitem"><p>
 | ||
|       how to synchronize the "C" and "C++"
 | ||
|       conversion information?
 | ||
|       </p></li><li class="listitem"><p>
 | ||
|       wchar_t/char internal buffers and conversions between
 | ||
|       internal/external buffers?
 | ||
|       </p></li></ul></div></li></ul></div></div><div class="bibliography"><div class="titlepage"><div><div><h4 class="title"><a id="facet.codecvt.biblio"></a>Bibliography</h4></div></div></div><div class="biblioentry"><a id="id-1.3.4.6.3.3.8.2"></a><p><span class="citetitle"><em class="citetitle">
 | ||
|       The GNU C Library
 | ||
|     </em>. </span><span class="author"><span class="firstname">Roland</span> <span class="surname">McGrath</span>. </span><span class="author"><span class="firstname">Ulrich</span> <span class="surname">Drepper</span>. </span><span class="copyright">Copyright © 2007 FSF. </span><span class="pagenums">
 | ||
|       Chapters 6 Character Set Handling and 7 Locales and Internationalization
 | ||
|     . </span></p></div><div class="biblioentry"><a id="id-1.3.4.6.3.3.8.3"></a><p><span class="citetitle"><em class="citetitle">
 | ||
|       Correspondence
 | ||
|     </em>. </span><span class="author"><span class="firstname">Ulrich</span> <span class="surname">Drepper</span>. </span><span class="copyright">Copyright © 2002 . </span></p></div><div class="biblioentry"><a id="id-1.3.4.6.3.3.8.4"></a><p><span class="citetitle"><em class="citetitle">
 | ||
|       ISO/IEC 14882:1998 Programming languages - C++
 | ||
|     </em>. </span><span class="copyright">Copyright © 1998 ISO. </span></p></div><div class="biblioentry"><a id="id-1.3.4.6.3.3.8.5"></a><p><span class="citetitle"><em class="citetitle">
 | ||
|       ISO/IEC 9899:1999 Programming languages - C
 | ||
|     </em>. </span><span class="copyright">Copyright © 1999 ISO. </span></p></div><div class="biblioentry"><a id="id-1.3.4.6.3.3.8.6"></a><p><span class="title"><em>
 | ||
| 	<a class="link" href="http://pubs.opengroup.org/onlinepubs/9699919799/" target="_top">
 | ||
|       System Interface Definitions, Issue 7 (IEEE Std. 1003.1-2008)
 | ||
| 	</a>
 | ||
|       </em>. </span><span class="copyright">Copyright © 2008 
 | ||
| 	The Open Group/The Institute of Electrical and Electronics
 | ||
| 	Engineers, Inc.
 | ||
|       . </span></p></div><div class="biblioentry"><a id="id-1.3.4.6.3.3.8.7"></a><p><span class="citetitle"><em class="citetitle">
 | ||
|       The C++ Programming Language, Special Edition
 | ||
|     </em>. </span><span class="author"><span class="firstname">Bjarne</span> <span class="surname">Stroustrup</span>. </span><span class="copyright">Copyright © 2000 Addison Wesley, Inc.. </span><span class="pagenums">Appendix D. </span><span class="publisher"><span class="publishername">
 | ||
| 	Addison Wesley
 | ||
|       . </span></span></p></div><div class="biblioentry"><a id="id-1.3.4.6.3.3.8.8"></a><p><span class="citetitle"><em class="citetitle">
 | ||
|       Standard C++ IOStreams and Locales
 | ||
|     </em>. </span><span class="subtitle">
 | ||
|       Advanced Programmer's Guide and Reference
 | ||
|     . </span><span class="author"><span class="firstname">Angelika</span> <span class="surname">Langer</span>. </span><span class="author"><span class="firstname">Klaus</span> <span class="surname">Kreft</span>. </span><span class="copyright">Copyright © 2000 Addison Wesley Longman, Inc.. </span><span class="publisher"><span class="publishername">
 | ||
| 	Addison Wesley Longman
 | ||
|       . </span></span></p></div><div class="biblioentry"><a id="id-1.3.4.6.3.3.8.9"></a><p><span class="title"><em>
 | ||
| 	<a class="link" href="http://www.lysator.liu.se/c/na1.html" target="_top">
 | ||
|       A brief description of Normative Addendum 1
 | ||
| 	</a>
 | ||
|       </em>. </span><span class="author"><span class="firstname">Clive</span> <span class="surname">Feather</span>. </span><span class="pagenums">Extended Character Sets. </span></p></div><div class="biblioentry"><a id="id-1.3.4.6.3.3.8.10"></a><p><span class="title"><em>
 | ||
| 	<a class="link" href="http://tldp.org/HOWTO/Unicode-HOWTO.html" target="_top">
 | ||
| 	  The Unicode HOWTO
 | ||
| 	</a>
 | ||
|       </em>. </span><span class="author"><span class="firstname">Bruno</span> <span class="surname">Haible</span>. </span></p></div><div class="biblioentry"><a id="id-1.3.4.6.3.3.8.11"></a><p><span class="title"><em>
 | ||
| 	<a class="link" href="https://www.cl.cam.ac.uk/~mgk25/unicode.html" target="_top">
 | ||
|       UTF-8 and Unicode FAQ for Unix/Linux
 | ||
| 	</a>
 | ||
|       </em>. </span><span class="author"><span class="firstname">Markus</span> <span class="surname">Kuhn</span>. </span></p></div></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a id="std.localization.facet.messages"></a>messages</h3></div></div></div><p>
 | ||
| The <code class="classname">std::messages</code> facet implements message retrieval functionality
 | ||
| equivalent to Java's <code class="classname">java.text.MessageFormat</code> using either GNU <code class="function">gettext</code>
 | ||
| or IEEE 1003.1-200x functions.
 | ||
| </p><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="facet.messages.req"></a>Requirements</h4></div></div></div><p>
 | ||
| The <code class="classname">std::messages</code> facet is probably the most vaguely defined facet in
 | ||
| the standard library. It's assumed that this facility was built into
 | ||
| the standard library in order to convert string literals from one
 | ||
| locale to the other: for instance, converting the "C" locale's
 | ||
| <code class="code">const char* c = "please"</code> to a German-localized <code class="code">"bitte"</code>
 | ||
| during program execution.
 | ||
| </p><div class="blockquote"><blockquote class="blockquote"><p>
 | ||
| 22.2.7.1 - Template class messages [lib.locale.messages]
 | ||
| </p></blockquote></div><p>
 | ||
| This class has three public member functions, which directly
 | ||
| correspond to three protected virtual member functions.
 | ||
| </p><p>
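This correspondence can be sketched with a user-defined facet. The following is a minimal, hypothetical example: the class name <code class="code">demo_messages</code>, the helper <code class="code">demo_lookup</code>, the catalog name "demo" and the single hard-coded translation are assumptions made for illustration and are not part of the library. It only shows that each public member forwards to the protected virtual member of the same name.
</p>

```cpp
#include <cassert>
#include <locale>
#include <string>

// Hypothetical facet: knows one catalog ("demo") holding one message.
class demo_messages : public std::messages<char>
{
protected:
  // Reached through the public open(); a negative value signals failure.
  catalog
  do_open(const std::string& name, const std::locale&) const override
  { return name == "demo" ? 1 : -1; }

  // Reached through the public get(); falls back to dfault.
  string_type
  do_get(catalog cat, int, int msgid, const string_type& dfault) const override
  { return (cat == 1 && msgid == 0) ? string_type("bitte") : dfault; }

  // Reached through the public close(); nothing to release here.
  void
  do_close(catalog) const override
  { }
};

// Look up message msgid in the named catalog via the public interface only.
std::string
demo_lookup(const std::string& name, int msgid, const std::string& dfault)
{
  // The locale takes ownership of the facet and deletes it when done.
  const std::locale loc(std::locale::classic(), new demo_messages);
  assert(std::has_facet<std::messages<char> >(loc));
  const std::messages<char>& m = std::use_facet<std::messages<char> >(loc);

  std::messages<char>::catalog cat = m.open(name, loc);
  if (cat < 0)
    return dfault;                 // catalog could not be opened
  std::string result = m.get(cat, 0, msgid, dfault);
  m.close(cat);
  return result;
}
```

<p>
With these definitions, <code class="code">demo_lookup("demo", 0, "please")</code> yields the hard-coded translation, while an unknown catalog name or message id yields the default string unchanged.
</p><p>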
 | ||
| The public member functions are:
 | ||
| </p><p>
 | ||
| <code class="code">catalog open(const string&, const locale&) const</code>
 | ||
| </p><p>
 | ||
| <code class="code">string_type get(catalog, int, int, const string_type&) const</code>
 | ||
| </p><p>
 | ||
| <code class="code">void close(catalog) const</code>
 | ||
| </p><p>
 | ||
| While the virtual functions are:
 | ||
| </p><p>
 | ||
| <code class="code">catalog do_open(const string& name, const locale& loc) const</code>
 | ||
| </p><div class="blockquote"><blockquote class="blockquote"><p>
 | ||
| <span class="emphasis"><em>
 | ||
| -1- Returns: A value that may be passed to <code class="code">get()</code> to retrieve a
 | ||
| message, from the message catalog identified by the string <code class="code">name</code>
 | ||
| according to an implementation-defined mapping. The result can be used
 | ||
| until it is passed to <code class="code">close()</code>.  Returns a value less than 0 if no such
 | ||
| catalog can be opened.
 | ||
| </em></span>
 | ||
| </p></blockquote></div><p>
 | ||
| <code class="code">string_type do_get(catalog cat, int set , int msgid, const string_type& dfault) const</code>
 | ||
| </p><div class="blockquote"><blockquote class="blockquote"><p>
 | ||
| <span class="emphasis"><em>
 | ||
| -3- Requires: A catalog <code class="code">cat</code> obtained from <code class="code">open()</code> and not yet closed.
 | ||
| -4- Returns: A message identified by arguments <code class="code">set</code>, <code class="code">msgid</code>, and <code class="code">dfault</code>,
 | ||
| according to an implementation-defined mapping. If no such message can
 | ||
| be found, returns <code class="code">dfault</code>.
 | ||
| </em></span>
 | ||
| </p></blockquote></div><p>
 | ||
| <code class="code">void do_close(catalog cat) const</code>
 | ||
| </p><div class="blockquote"><blockquote class="blockquote"><p>
 | ||
| <span class="emphasis"><em>
 | ||
| -5- Requires: A catalog cat obtained from <code class="code">open()</code> and not yet closed.
 | ||
| -6- Effects: Releases unspecified resources associated with <code class="code">cat</code>.
 | ||
| -7- Notes: The limit on such resources, if any, is implementation-defined.
 | ||
| </em></span>
 | ||
| </p></blockquote></div></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="facet.messages.design"></a>Design</h4></div></div></div><p>
 | ||
| A couple of notes on the standard.
 | ||
| </p><p>
 | ||
| First, why is <code class="code">messages_base::catalog</code> specified as a typedef
 | ||
| to int? This makes sense for implementations that use
 | ||
| <code class="code">catopen</code> and define <code class="code">nl_catd</code> as int, but not for
 | ||
| others. Fortunately, it's not heavily used and so only a minor irritant. 
 | ||
| This has been reported as a possible defect in the standard (LWG 2028).
 | ||
| </p><p>
 | ||
| Second, by making the member functions <code class="code">const</code>, it is
 | ||
| impossible to save state in them. Thus, storing away information used
 | ||
| in the 'open' member function for use in 'get' is impossible. This is
 | ||
| unfortunate.
 | ||
| </p><p>
 | ||
| The 'open' member function in particular seems to be oddly
 | ||
| designed. The signature seems quite peculiar. Why specify a <code class="code">const
 | ||
| string& </code> argument, for instance, instead of just <code class="code">const
 | ||
| char*</code>? Or, why specify a <code class="code">const locale&</code> argument that is
 | ||
| to be used in the 'get' member function? How, exactly, is this locale
 | ||
| argument useful? What was the intent? It might make sense if a locale
 | ||
| argument were associated with a given default message string in the
 | ||
| 'open' member function, for instance. Quite murky and unclear, on
 | ||
| reflection.
 | ||
| </p><p>
 | ||
| Lastly, it seems odd that messages, which explicitly require code
 | ||
| conversion, don't use the codecvt facet. Because the messages facet
 | ||
| has only one template parameter, it is assumed that ctype, and not
 | ||
| codecvt, is to be used to convert between character sets.
 | ||
| </p><p>
 | ||
| It is implicitly assumed that the default message string passed to
 | ||
| 'get' is written in the "C" locale. Thus, all source code is assumed
 | ||
| to be written in English, so translations are always from "en_US" to
 | ||
| other, explicitly named locales.
 | ||
| </p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="facet.messages.impl"></a>Implementation</h4></div></div></div><div class="section"><div class="titlepage"><div><div><h5 class="title"><a id="messages.impl.models"></a>Models</h5></div></div></div><p>
 | ||
|     This is a relatively simple class, on the face of it. The standard
 | ||
|     specifies very little in concrete terms, so generic
 | ||
|     implementations that are conforming yet do very little are the
 | ||
|     norm. Adding functionality that would be useful to programmers and
 | ||
|     comparable to Java's java.text.MessageFormat takes a bit of work,
 | ||
|     and is highly dependent on the capabilities of the underlying
 | ||
|     operating system.
 | ||
|   </p><p>
 | ||
|     Three different mechanisms have been provided, selectable via
 | ||
|     configure flags:
 | ||
|   </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>
 | ||
|        generic
 | ||
|      </p><p>
 | ||
|        This model does very little, and is what is used by default.
 | ||
|      </p></li><li class="listitem"><p>
 | ||
|        gnu
 | ||
|      </p><p>
 | ||
|        The gnu model is complete and fully tested. It's based on the
 | ||
|        GNU gettext package, which is part of glibc. It uses the
 | ||
|        functions <code class="code">textdomain, bindtextdomain, gettext</code> to
 | ||
|        implement full functionality. Creating message catalogs is a
 | ||
|        relatively straightforward process and is lightly documented
 | ||
|        below, and fully documented in gettext's distributed
 | ||
|        documentation.
 | ||
|      </p></li><li class="listitem"><p>
 | ||
|        ieee_1003.1-200x
 | ||
|      </p><p>
 | ||
|        This is a complete, though untested, implementation based on
 | ||
|        the IEEE standard. The functions <code class="code">catopen, catgets,
 | ||
|        catclose</code> are used to retrieve locale-specific messages
 | ||
|        given the appropriate message catalogs that have been
 | ||
|        constructed for their use. Note, the script <code class="code">
 | ||
|        po2msg.sed</code> that is part of the gettext distribution can
 | ||
|        convert gettext catalogs into catalogs that
 | ||
|        <code class="code">catopen</code> can use.
 | ||
|    </p></li></ul></div><p>
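For the ieee_1003.1-200x model, the underlying calls can be sketched as follows. This is a minimal illustration of <code class="code">catopen, catgets, catclose</code> as declared in the POSIX header <code class="code">nl_types.h</code>, not the library's actual implementation; the helper name <code class="code">catalog_lookup</code> is hypothetical.
</p>

```cpp
#include <cassert>
#include <nl_types.h>   // POSIX: catopen, catgets, catclose
#include <string>

// Fetch message (set, msgid) from the named catalog, or dfault on failure.
// catalog_lookup is a hypothetical helper written for this sketch.
std::string
catalog_lookup(const char* name, int set, int msgid, const char* dfault)
{
  assert(name && dfault);

  // NL_CAT_LOCALE asks catopen to honor LC_MESSAGES when locating the file.
  nl_catd cat = catopen(name, NL_CAT_LOCALE);
  if (cat == (nl_catd) -1)
    return dfault;                 // no such catalog

  // catgets hands back dfault itself when the message is not found.
  std::string result = catgets(cat, set, msgid, dfault);
  catclose(cat);
  return result;
}
```

<p>
Unlike gettext, this interface reports a missing catalog directly, through the <code class="code">(nl_catd) -1</code> return value of <code class="code">catopen</code>.
</p><p>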
 | ||
| A new, standards-conformant non-virtual member function signature was
 | ||
| added for 'open' so that a directory could be specified with a given
 | ||
| message catalog. This simplifies calling conventions for the gnu
 | ||
| model.
 | ||
| </p></div><div class="section"><div class="titlepage"><div><div><h5 class="title"><a id="messages.impl.gnu"></a>The GNU Model</h5></div></div></div><p>
 | ||
|     The messages facet, because it is retrieving and converting
 | ||
|     between character sets, depends on the ctype and perhaps the
 | ||
|     codecvt facet in a given locale. In addition, underlying "C"
 | ||
|     library locale support is necessary for more than just the
 | ||
|     <code class="code">LC_MESSAGES</code> mask: <code class="code">LC_CTYPE</code> is also
 | ||
|     necessary. To avoid any unpleasantness, all bits of the "C" mask
 | ||
|     (i.e. <code class="code">LC_ALL</code>) are set before retrieving messages.
 | ||
|   </p><p>
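At the level of the "C" library, the sequence described above can be sketched as follows. This is a minimal illustration that assumes <code class="code">libintl.h</code> is available (as it is with glibc); the helper name <code class="code">gettext_lookup</code> and its parameters are hypothetical, and the code is not the library's actual implementation.
</p>

```cpp
#include <cassert>
#include <clocale>
#include <libintl.h>    // bindtextdomain, dgettext
#include <string>

// Translate msgid using the catalog for domain found under dir,
// with the global "C" locale temporarily switched to locale_name.
// gettext_lookup is a hypothetical helper written for this sketch.
std::string
gettext_lookup(const char* locale_name, const char* domain,
               const char* dir, const char* msgid)
{
  assert(locale_name && domain && dir && msgid);

  // Remember the current global "C" locale so it can be restored.
  const std::string saved = std::setlocale(LC_ALL, nullptr);

  // Set every category, not only LC_MESSAGES; give up if that fails.
  if (!std::setlocale(LC_ALL, locale_name))
    return msgid;

  // dgettext returns msgid itself when no translation is found.
  bindtextdomain(domain, dir);
  const std::string result = dgettext(domain, msgid);

  std::setlocale(LC_ALL, saved.c_str());
  return result;
}
```

<p>
Because the sketch changes the process-wide locale around each lookup, it is not safe to call from several threads at once.
</p><p>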
 | ||
|     Making the message catalogs can be initially tricky, but becomes
 | ||
|     quite simple with practice. For complete info, see the gettext
 | ||
|     documentation. Here's an idea of what is required:
 | ||
|   </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>
 | ||
|        Make a source file with the required string literals that need
 | ||
|        to be translated. See <code class="code">intl/string_literals.cc</code> for
 | ||
|        an example.
 | ||
|      </p></li><li class="listitem"><p>
 | ||
|        Make initial catalog (see "4 Making the PO Template File" from
 | ||
|        the gettext docs).</p><p>
 | ||
|    <code class="code"> xgettext --c++ --debug string_literals.cc -o libstdc++.pot </code>
 | ||
|    </p></li><li class="listitem"><p>Make language and country-specific locale catalogs.</p><p>
 | ||
|    <code class="code">cp libstdc++.pot fr_FR.po</code>
 | ||
|    </p><p>
 | ||
|    <code class="code">cp libstdc++.pot de_DE.po</code>
 | ||
|    </p></li><li class="listitem"><p>
 | ||
|        Edit localized catalogs in emacs so that strings are
 | ||
|        translated.
 | ||
|      </p><p>
 | ||
|    <code class="code">emacs fr_FR.po</code>
 | ||
|    </p></li><li class="listitem"><p>Make the binary mo files.</p><p>
 | ||
|    <code class="code">msgfmt fr_FR.po -o fr_FR.mo</code>
 | ||
|    </p><p>
 | ||
|    <code class="code">msgfmt de_DE.po -o de_DE.mo</code>
 | ||
|    </p></li><li class="listitem"><p>Copy the binary files into the correct directory structure.</p><p>
 | ||
|    <code class="code">cp fr_FR.mo (dir)/fr_FR/LC_MESSAGES/libstdc++.mo</code>
 | ||
|    </p><p>
 | ||
|    <code class="code">cp de_DE.mo (dir)/de_DE/LC_MESSAGES/libstdc++.mo</code>
 | ||
|    </p></li><li class="listitem"><p>Use the new message catalogs.</p><p>
 | ||
|    <code class="code">locale loc_de("de_DE");</code>
 | ||
|    </p><p>
 | ||
|    <code class="code">
 | ||
|    use_facet<messages<char> >(loc_de).open("libstdc++", locale(), dir);
 | ||
|    </code>
 | ||
|    </p></li></ul></div></div></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="facet.messages.use"></a>Use</h4></div></div></div><p>
 | ||
|    A simple example using the GNU model of message conversion.
 | ||
|  </p><pre class="programlisting">
 | ||
| #include <iostream>
 | ||
| #include <locale>
 | ||
| #include <string>
 | ||
| using namespace std;
 | ||
| 
 | ||
| void test01()
 | ||
| {
 | ||
|   typedef messages<char>::catalog catalog;
 | ||
|   const char* dir =
 | ||
|   "/mnt/egcs/build/i686-pc-linux-gnu/libstdc++/po/share/locale";
 | ||
|   const locale loc_de("de_DE");
 | ||
|   const messages<char>& mssg_de = use_facet<messages<char> >(loc_de);
 | ||
| 
 | ||
|   catalog cat_de = mssg_de.open("libstdc++", loc_de, dir);
 | ||
|   string s01 = mssg_de.get(cat_de, 0, 0, "please");
 | ||
|   string s02 = mssg_de.get(cat_de, 0, 0, "thank you");
 | ||
|   cout << "please in German: " << s01 << '\n';
 | ||
|   cout << "thank you in German: " << s02 << '\n';
 | ||
|   mssg_de.close(cat_de);
 | ||
| }
 | ||
| 
 | ||
| int main()
 | ||
| {
 | ||
|   test01();
 | ||
| }
 | ||
| </pre></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="facet.messages.future"></a>Future</h4></div></div></div><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>
 | ||
|     Things that are sketchy, or remain unimplemented:
 | ||
|   </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: circle; "><li class="listitem"><p>
 | ||
| 	  _M_convert_from_char, _M_convert_to_char are in flux,
 | ||
| 	  depending on how the library ends up doing character set
 | ||
| 	  conversions. It might not be possible to do a real character
 | ||
| 	  set based conversion, due to the fact that the template
 | ||
| 	  parameter for messages is not enough to instantiate the
 | ||
| 	  codecvt facet (1 supplied, need at least 2 but would prefer
 | ||
| 	  3).
 | ||
| 	</p></li><li class="listitem"><p>
 | ||
| 	  There are issues with gettext needing the global locale set
 | ||
| 	  to extract a message. This dependence on the global locale
 | ||
| 	  makes the current "gnu" model non MT-safe. Future versions
 | ||
| 	  of glibc, i.e. glibc 2.3.x will fix this, and the C++ library
 | ||
| 	  bits are already in place.
 | ||
| 	</p></li></ul></div></li><li class="listitem"><p>
 | ||
|     Development versions of the GNU "C" library, glibc 2.3, will allow
 | ||
|     a more efficient, MT implementation of std::messages, and will
 | ||
|     allow the removal of the _M_name_messages data member. If this is
 | ||
|     done, it will change the library ABI. The C++ parts to support
 | ||
|     glibc 2.3 have already been coded, but are not in use: once this
 | ||
|     version of the "C" library is released, the marked parts of the
 | ||
|     messages implementation can be switched over to the new "C"
 | ||
|     library functionality.
 | ||
|   </p></li><li class="listitem"><p>
 | ||
|     At some point in the near future, std::numpunct will probably use
 | ||
|     std::messages facilities to implement truename/falsename
 | ||
|     correctly. This is currently not done, but entries in
 | ||
|     libstdc++.pot have already been made for "true" and "false" string
 | ||
|     literals, so all that remains is the std::numpunct coding and the
 | ||
|     configure/make hassles to make the installed library search its
 | ||
|     own catalog. Currently the libstdc++.mo catalog is only searched
 | ||
|     for the testsuite cases involving messages members.
 | ||
|   </p></li><li class="listitem"><p> The following member functions:</p><p>
 | ||
|    <code class="code">
 | ||
| 	catalog
 | ||
| 	open(const basic_string<char>& __s, const locale& __loc) const
 | ||
|    </code>
 | ||
|    </p><p>
 | ||
|    <code class="code">
 | ||
|    catalog
 | ||
|    open(const basic_string<char>&, const locale&, const char*) const;
 | ||
|    </code>
 | ||
|    </p><p>
 | ||
|    do not actually return a "value less than 0 if no such catalog
 | ||
|    can be opened" as required by the standard in the "gnu"
 | ||
|    model. As of this writing, it is unknown how to query to see
 | ||
|    if a specified message catalog exists using the gettext
 | ||
|    package.
 | ||
|    </p></li></ul></div></div><div class="bibliography"><div class="titlepage"><div><div><h4 class="title"><a id="facet.messages.biblio"></a>Bibliography</h4></div></div></div><div class="biblioentry"><a id="id-1.3.4.6.3.4.8.2"></a><p><span class="citetitle"><em class="citetitle">
 | ||
|       The GNU C Library
 | ||
|     </em>. </span><span class="author"><span class="firstname">Roland</span> <span class="surname">McGrath</span>. </span><span class="author"><span class="firstname">Ulrich</span> <span class="surname">Drepper</span>. </span><span class="copyright">Copyright © 2007 FSF. </span><span class="pagenums">Chapters 6 Character Set Handling, and 7 Locales and Internationalization
 | ||
|     . </span></p></div><div class="biblioentry"><a id="id-1.3.4.6.3.4.8.3"></a><p><span class="citetitle"><em class="citetitle">
 | ||
|       Correspondence
 | ||
|     </em>. </span><span class="author"><span class="firstname">Ulrich</span> <span class="surname">Drepper</span>. </span><span class="copyright">Copyright © 2002 . </span></p></div><div class="biblioentry"><a id="id-1.3.4.6.3.4.8.4"></a><p><span class="citetitle"><em class="citetitle">
 | ||
|       ISO/IEC 14882:1998 Programming languages - C++
 | ||
|     </em>. </span><span class="copyright">Copyright © 1998 ISO. </span></p></div><div class="biblioentry"><a id="id-1.3.4.6.3.4.8.5"></a><p><span class="citetitle"><em class="citetitle">
 | ||
|       ISO/IEC 9899:1999 Programming languages - C
 | ||
|     </em>. </span><span class="copyright">Copyright © 1999 ISO. </span></p></div><div class="biblioentry"><a id="id-1.3.4.6.3.4.8.6"></a><p><span class="title"><em>
 | ||
| 	<a class="link" href="http://pubs.opengroup.org/onlinepubs/9699919799/" target="_top">
 | ||
|       System Interface Definitions, Issue 7 (IEEE Std. 1003.1-2008)
 | ||
| 	</a>
 | ||
|       </em>. </span><span class="copyright">Copyright © 2008 
 | ||
| 	The Open Group/The Institute of Electrical and Electronics
 | ||
| 	Engineers, Inc.
 | ||
|       . </span></p></div><div class="biblioentry"><a id="id-1.3.4.6.3.4.8.7"></a><p><span class="citetitle"><em class="citetitle">
 | ||
|       The C++ Programming Language, Special Edition
 | ||
|     </em>. </span><span class="author"><span class="firstname">Bjarne</span> <span class="surname">Stroustrup</span>. </span><span class="copyright">Copyright © 2000 Addison Wesley, Inc.. </span><span class="pagenums">Appendix D. </span><span class="publisher"><span class="publishername">
 | ||
| 	Addison Wesley
 | ||
|       . </span></span></p></div><div class="biblioentry"><a id="id-1.3.4.6.3.4.8.8"></a><p><span class="citetitle"><em class="citetitle">
 | ||
|       Standard C++ IOStreams and Locales
 | ||
|     </em>. </span><span class="subtitle">
 | ||
|       Advanced Programmer's Guide and Reference
 | ||
|     . </span><span class="author"><span class="firstname">Angelika</span> <span class="surname">Langer</span>. </span><span class="author"><span class="firstname">Klaus</span> <span class="surname">Kreft</span>. </span><span class="copyright">Copyright © 2000 Addison Wesley Longman, Inc.. </span><span class="publisher"><span class="publishername">
 | ||
| 	Addison Wesley Longman
 | ||
|       . </span></span></p></div><div class="biblioentry"><a id="id-1.3.4.6.3.4.8.9"></a><p><span class="title"><em>
 | ||
| 	<a class="link" href="https://www.oracle.com/technetwork/java/api/index.html" target="_top">
 | ||
| 	API Specifications, Java Platform
 | ||
| 	</a>
 | ||
|       </em>. </span><span class="pagenums">java.util.Properties, java.text.MessageFormat,
 | ||
| java.util.Locale, java.util.ResourceBundle
 | ||
|     . </span></p></div><div class="biblioentry"><a id="id-1.3.4.6.3.4.8.10"></a><p><span class="title"><em>
 | ||
| 	<a class="link" href="https://www.gnu.org/software/gettext/" target="_top">
 | ||
|       GNU gettext tools, version 0.10.38, Native Language Support
 | ||
|       Library and Tools.
 | ||
| 	</a>
 | ||
|       </em>. </span></p></div></div></div></div><div class="navfooter"><hr /><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="localization.html">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="localization.html">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="containers.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">Chapter 8. 
 | ||
|   Localization
 | ||
|   
 | ||
|  </td><td width="20%" align="center"><a accesskey="h" href="../index.html">Home</a></td><td width="40%" align="right" valign="top"> Chapter 9. 
 | ||
|   Containers
 | ||
|   
 | ||
| </td></tr></table></div></body></html> |