Tuesday, February 11, 2025

wstring_convert and wbuffer_convert

Overview
wstring_convert performs conversions between wide strings and byte strings (in either direction) using a conversion object of type Codecvt. wbuffer_convert applies the same kind of conversion to a stream buffer.

Details
wstring_convert
Performs conversions between wide strings and byte strings (in either direction) using a conversion object of type Codecvt.
The standard facets suitable for use with wstring_convert are codecvt_utf8 for UTF-8/UCS-2 and UTF-8/UCS-4 conversions and codecvt_utf8_utf16 for UTF-8/UTF-16 conversions.
Its behavior is similar to the mbsrtowcs() API.
Syntax
template<
    class Codecvt,
    class Elem = wchar_t,
    class Wide_alloc = std::allocator<Elem>,
    class Byte_alloc = std::allocator<char>
> class wstring_convert;

Template parameters
Name        Description
Codecvt     Type of the conversion object: codecvt_utf8 for UTF-8/UCS-2 and UTF-8/UCS-4 conversions and codecvt_utf8_utf16 for UTF-8/UTF-16 conversions.
Elem        Wide character type.
Wide_alloc  Allocator for elements of type Elem. Defaults to: allocator<Elem>
Byte_alloc  Allocator for elements of type char. Defaults to: allocator<char>


Member types
Name         Description
byte_string  basic_string<char>
wide_string  basic_string<Elem>
state_type   Codecvt::state_type
int_type     char_traits<Elem>::int_type

Fields
Name                         Description
byte_string byte_err_string  The byte string returned on conversion errors.
wide_string wide_err_string  The wide string returned on conversion errors.
Codecvt* cvtptr              A pointer to the allocated conversion object.
state_type cvtstate          The conversion state object.
size_t cvtcount              The conversion count.

Constructor
Name
  1. wstring_convert()
  2. explicit wstring_convert(Codecvt* pcvt)
  3. wstring_convert(Codecvt* pcvt, state_type state)
  4. explicit wstring_convert(const byte_string& byte_err, const wide_string& wide_err = wide_string())
Description
  1. Default constructor. Uses a conversion object created with new Codecvt.
  2. Initialization constructor. Constructs an object that uses pcvt as its conversion object, with a default-constructed shift state. The object takes ownership of pcvt and deletes it on destruction.
  3. Initialization constructor with state. Constructs an object that uses pcvt as its conversion object and state as the initial value of its shift state.
  4. Constructor with error strings. Constructs an object that returns byte_err or wide_err when a conversion fails, instead of throwing an exception. The object uses a conversion object automatically constructed with new Codecvt, and its shift state is reset before every conversion operation.
Example
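    // Requires <codecvt>, <locale> and <string>; assumes using namespace std.
    typedef codecvt_utf8<wchar_t> ccvt;

    // 1. Default constructor
    wstring_convert<ccvt> wsc;
    // 2. Takes ownership of the facet created with new
    wstring_convert<ccvt> wsc2(new ccvt);
    // 3. Facet plus an explicit initial shift state
    wstring_convert<ccvt> wsc3(new ccvt, ccvt::state_type());
    // 4. Error strings returned on failure instead of throwing
    wstring_convert<ccvt> wsc4("[error]", L"[error]");

    wstring wstr(L"Khrisha Rao👸");
    // str holds "Khrisha Rao👸" encoded as UTF-8, assuming a 32-bit wchar_t
    // (as on Linux). With a 16-bit wchar_t the conversion may fail and
    // return "[error]" instead.
    string str = wsc4.to_bytes(wstr);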


Methods
Name
  1. wide_string from_bytes(char byte)
  2. wide_string from_bytes(const char* ptr)
  3. wide_string from_bytes(const byte_string& str)
  4. wide_string from_bytes(const char* first, const char* last)
Description
Converts a byte sequence to a wide string.
  1. The byte sequence consists of the single element byte.
  2. The byte sequence is the null-terminated sequence beginning at ptr.
  3. The byte sequence is the sequence contained in str.
  4. The byte sequence is the range [first, last).
If the conversion succeeds, returns the conversion result. Otherwise, returns wide_err_string if the object was constructed with error strings, or throws std::range_error if it was not.
Example
    // Requires <codecvt>, <locale>, <string> and <iostream>; assumes using namespace std.
    char str8[] = u8"ಖ್ರಿಷಾ Rao👸";
    wstring_convert<codecvt_utf8<char32_t>, char32_t> u8to32;
    u32string str32 = u8to32.from_bytes(str8);
    cout << showbase << hex;
    // prints: 0xc96 0xccd 0xcb0 0xcbf 0xcb7 0xcbe 0x20 0x52 0x61 0x6f 0x1f478
    for (char32_t c : str32)
        cout << static_cast<unsigned long long>(c) << ' ';
Name
  1. byte_string to_bytes(Elem wchar)
  2. byte_string to_bytes(const Elem* ptr)
  3. byte_string to_bytes(const wide_string& str)
  4. byte_string to_bytes(const Elem* first, const Elem* last)
Description
Converts a wide sequence to a byte string.
  1. The wide sequence consists of the single element wchar.
  2. The wide sequence is the null-terminated sequence beginning at ptr.
  3. The wide sequence is the sequence contained in str.
  4. The wide sequence is the range [first, last).
If the conversion succeeds, returns the conversion result. Otherwise, returns byte_err_string if the object was constructed with error strings, or throws std::range_error if it was not.
Example
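    // Requires <codecvt>, <locale>, <string> and <iostream>; assumes using namespace std.
    char32_t str32[] = U"ಖ್ರಿಷಾ Rao👸";
    wstring_convert<codecvt_utf8<char32_t>, char32_t> uconv;
    string str8 = uconv.to_bytes(str32);
    // prints: ಖ್ರಿಷಾ Rao👸 (on a terminal that displays UTF-8)
    cout << str8 << endl;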
size_t converted()  Returns the number of input elements successfully converted by the last conversion operation.
state_type state()  Returns the current value of the conversion state, which is stored in this wstring_convert object. The conversion state may be explicitly set in the constructor and is updated by all conversion operations.

wbuffer_convert
wbuffer_convert is a wrapper over a byte stream buffer of type streambuf that gives it the appearance of a basic_streambuf<Elem>.
The class uses another stream buffer of bytes (narrow characters of type char) as its underlying byte stream buffer, and converts the wide characters of type Elem (its second template argument) that are written to or read from it.
All I/O performed through wbuffer_convert undergoes character conversion as defined by the facet Codecvt. wbuffer_convert assumes ownership of the conversion facet, and cannot use a facet managed by a locale.
The standard facets suitable for use with wbuffer_convert are codecvt_utf8 for UTF-8/UCS-2 and UTF-8/UCS-4 conversions and codecvt_utf8_utf16 for UTF-8/UTF-16 conversions.
Syntax
template<
    class Codecvt,
    class Elem = wchar_t,
    class Tr = std::char_traits<Elem>
> class wbuffer_convert : public std::basic_streambuf<Elem, Tr>;

Template parameters
Name     Description
Codecvt  Type of the conversion object: codecvt_utf8 for UTF-8/UCS-2 and UTF-8/UCS-4 conversions and codecvt_utf8_utf16 for UTF-8/UTF-16 conversions.
Elem     Wide character type.
Tr       Character traits class.

Member types
Name        Description
state_type  Codecvt::state_type

Fields
Name                 Description
streambuf* bufptr    A pointer to the underlying byte stream buffer.
Codecvt* cvtptr      A pointer to the allocated conversion object.
state_type cvtstate  The conversion state object.

Constructor
Name
  1. wbuffer_convert()
  2. explicit wbuffer_convert(streambuf* bytebuf, Codecvt* pcvt = new Codecvt, state_type state = state_type())
Description
  1. Default constructor. Constructs an object with no underlying byte stream buffer (a null bytebuf).
  2. Initialization constructor. Constructs a wbuffer_convert object with its internal state initialized to the arguments passed. The object wraps bytebuf, which becomes its underlying byte stream buffer.
Example
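    // Requires <codecvt>, <locale>, <sstream>, <iostream>, <iterator> and
    // <algorithm>; assumes using namespace std.
    typedef codecvt_utf8<wchar_t> ccvt;

    // 1. Default constructor: no underlying byte stream buffer yet
    wbuffer_convert<ccvt> wbuf;
    // 2. Wrap a byte buffer holding UTF-8 and read it as wide characters
    stringbuf utf8buf(reinterpret_cast<const char*>(u8"ಖ್ರಿಷಾ Rao👸"));
    wbuffer_convert<ccvt> conv_in(&utf8buf);
    wistream ucsbuf(&conv_in);
    cout << hex << showbase;
    istreambuf_iterator<wchar_t> it(ucsbuf), end;
    // prints: 0xc96 0xccd 0xcb0 0xcbf 0xcb7 0xcbe 0x20 0x52 0x61 0x6f 0x1f478
    // (assuming a 32-bit wchar_t, as on Linux)
    for_each(it, end, [](wchar_t c) { cout << static_cast<unsigned long long>(c) << ' '; });
    cout << endl;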


Methods
Name
  1. streambuf* rdbuf()
  2. streambuf* rdbuf(streambuf* bytebuf)
Description
  1. Returns the pointer to the underlying byte stream buffer.
  2. Replaces the associated byte stream buffer with bytebuf and returns the previous one.
Example
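    // Requires <codecvt>, <locale> and <iostream>; assumes using namespace std
    // and a 32-bit wchar_t. Originally noted as working with clang.
    wbuffer_convert<codecvt_utf8<wchar_t>> conv_out(cout.rdbuf());
    wostream out(&conv_out);
    // prints: ಖ್ರಿಷಾ Rao👸
    // A wide literal replaces the cast from a char32_t literal, and the flush
    // pushes any characters still buffered in conv_out through to cout.
    out << L"ಖ್ರಿಷಾ Rao👸" << flush;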
state_type state()  Returns the current value of the conversion state, which is stored in this wbuffer_convert object. The conversion state may be explicitly set in the constructor and is updated by all conversion operations.


