Saturday, January 11, 2025

ctype facet

Overview
The ctype facet belongs to the ctype category of a locale. It classifies and transforms characters.
Details
ctype_base
This is the base class of the ctype facet. It defines the character classification categories that the ctype facet inherits.

types
Name      Description
mask      bitmask type

constants
These are static constants of type mask.
Name      Description
space     whitespace character:
            Character   Value   Description
            ' '         0x20    space (SPC)
            '\t'        0x09    horizontal tab (TAB)
            '\n'        0x0a    newline (LF)
            '\v'        0x0b    vertical tab (VT)
            '\f'        0x0c    form feed (FF)
            '\r'        0x0d    carriage return (CR)
print     printable character.
          A printable character occupies a printing position on a display (the opposite of a control character, checked with iscntrl).
          For the standard ASCII character set (used by the "C" locale), the printable characters are all those with an ASCII code greater than 0x1f (US), except 0x7f (DEL).
cntrl     control character.
          For the standard ASCII character set (used by the "C" locale), the control characters are those with ASCII codes 0x00 (NUL) through 0x1f (US), plus 0x7f (DEL).
upper     uppercase character.
          Any of A B C D E F G H I J K L M N O P Q R S T U V W X Y Z.
lower     lowercase character.
          Any of a b c d e f g h i j k l m n o p q r s t u v w x y z.
alpha     alphabetic character.
          In the default "C" locale, the letters are exactly the characters for which isupper or islower returns true.
digit     decimal digit character.
          Any of 0 1 2 3 4 5 6 7 8 9.
punct     punctuation character.
          The "C" locale treats as punctuation all graphic characters (as in isgraph) that are not alphanumeric (as in isalnum).
xdigit    hexadecimal digit character.
          Any of 0 1 2 3 4 5 6 7 8 9 a b c d e f A B C D E F.
blank     blank character.
          The "C" locale treats only the tab character ('\t') and the space character (' ') as blank.
alnum     alphanumeric character: alpha | digit
graph     graphic character: alnum | punct
          The space character (' ') is printable but not graphic: it returns false when checked with isgraph.

ctype
ctype<charT> encapsulates character classification features. All stream input operations performed through basic_istream<charT> use the ctype<charT> facet of the locale imbued in the stream to identify whitespace characters for input tokenization. Stream output operations apply widen() to narrow-character arguments prior to output.

Syntax
template <class charT> 
class ctype : public locale::facet, public ctype_base

types
Name        Description
char_type   the first template parameter, charT, which can be char or wchar_t

Specializations
//narrow characters
ctype<char>

//wide characters
ctype<wchar_t>

ctype derives from both locale::facet and ctype_base.
Fields
Name                   Description
static locale::id id   the identifier of the facet; use_facet and has_facet use it to find the ctype facet in a locale.

Constructor
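Name                               Description
explicit ctype(size_t refs = 0)    Creates a ctype facet and forwards the starting reference count refs to the base class constructor, locale::facet::facet(refs).
The ctype<char> specialization declares a different constructor, explicit ctype(const mask* tab = nullptr, bool del = false, size_t refs = 0): it classifies characters with the table tab (the classic "C" table if tab is null) and releases tab with delete[] in its destructor if del is true.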

Methods
Character Classification
NameDescription
  1. bool is(mask m, char_type c) const
  2. const char_type* is(const char_type* l, const char_type* h, mask* v) const
  1. Returns whether c belongs to any of the categories specified in the bitmask m.
  2. Classifies the characters in the range [l,h), sequentially filling the array v with the bitmask classification of each character, and returns h.
Example
  #include <iostream>
  #include <locale>
  #include <string>
  #include <vector>
  using namespace std;

  int main()
  {
    //kannada locale; locale names are platform-dependent (glibc spells
    //this one "kn_IN.UTF-8"), and an unavailable name throws runtime_error
    locale loc("kn-IN.UTF-8");
    //ctype facet
    auto& f = use_facet<ctype<wchar_t>>(loc);

    //1
    //digits and hex digits
    ctype<wchar_t>::mask m = ctype<wchar_t>::digit | ctype<wchar_t>::xdigit;
    //b:true
    bool b = f.is(m, L'A');
    //b:true
    b = f.is(m, L'1');
    //U+0CE7 or ೧ -> 1 in kannada
    //b:false, digit and xdigit cover only the ASCII digits
    b = f.is(m, L'\u0ce7');

    //2
    wstring s = L"Khri$ha Rao 18";
    vector<ctype<wchar_t>::mask> masks(s.length());
    //fills masks with the classification of each character
    f.is(s.data(), s.data() + s.size(), masks.data());

    for (size_t n = 0; n < s.length(); ++n)
    {
      wcout << s[n] << " is: ";
      if (masks[n] & ctype_base::alnum)
        wcout << " alnum ";
      if (masks[n] & ctype_base::alpha)
        wcout << " alpha ";
      if (masks[n] & ctype_base::cntrl)
        wcout << " cntrl ";
      if (masks[n] & ctype_base::graph)
        wcout << " graph ";
      if (masks[n] & ctype_base::lower)
        wcout << " lower ";
      if (masks[n] & ctype_base::print)
        wcout << " print ";
      if (masks[n] & ctype_base::punct)
        wcout << " punct ";
      if (masks[n] & ctype_base::space)
        wcout << " space ";
      if (masks[n] & ctype_base::upper)
        wcout << " upper ";
      if (masks[n] & ctype_base::xdigit)
        wcout << " xdigit ";
      if (masks[n] & ctype_base::blank)
        wcout << " blank ";
      wcout << endl;
    }

    /*prints
      K is:  alnum  alpha  graph  print  upper
      h is:  alnum  alpha  graph  lower  print
      r is:  alnum  alpha  graph  lower  print
      i is:  alnum  alpha  graph  lower  print
      $ is:  graph  print  punct
      h is:  alnum  alpha  graph  lower  print
      a is:  alnum  alpha  graph  lower  print  xdigit
        is:  print  space  blank
      R is:  alnum  alpha  graph  print  upper
      a is:  alnum  alpha  graph  lower  print  xdigit
      o is:  alnum  alpha  graph  lower  print
        is:  print  space  blank
      1 is:  alnum  graph  print  xdigit
      8 is:  alnum  graph  print  xdigit
    */
  }
const char_type* scan_not(mask m, const char_type* l, const char_type* h) const
Returns a pointer to the first character in the range [l,h) that does not classify into any of the categories specified in m. If no such character is found in the range, h is returned.

Example
    wstring s = L"Khri$ha Rao 18";
    auto& f = use_facet<ctype<wchar_t>>(locale::classic());
    auto p = f.scan_not(ctype<wchar_t>::digit, s.data(), s.data() + s.size());
    //prints K
    wcout << *p << endl;
const char_type* scan_is(mask m, const char_type* l, const char_type* h) const
Returns a pointer to the first character in the range [l,h) that classifies into any of the categories specified in m. If no such character is found in the range, h is returned.

Example
    wstring s = L"Khri$ha Rao 18";
    auto& f = use_facet<ctype<wchar_t>>(locale::classic());
    auto p = f.scan_is(ctype<wchar_t>::digit, s.data(), s.data() + s.size());
    //prints 1
    wcout << *p << endl;

Character Transformation
NameDescription

  1. char_type toupper(char_type c) const
  2. const char_type* toupper(char_type* l, const char_type* h) const
  1. Returns the uppercase equivalent of c. If no such equivalent character exists, the value returned is c, unchanged.
  2. Replaces each lowercase character in the range [l,h) with its uppercase equivalent and returns h.
Example
    auto& f = use_facet<ctype<wchar_t>>(locale::classic());
    //1
    //c:A
    auto c = f.toupper(L'a');
    //2
    wstring s = L"Khri$ha Rao 18";
    //s:KHRI$HA RAO 18
    auto p = f.toupper(s.data(), s.data() + s.size());
  1. char_type tolower(char_type c) const
  2. const char_type* tolower(char_type* l, const char_type* h) const
  1. Returns the lowercase equivalent of c. If no such equivalent character exists, the value returned is c, unchanged.
  2. Replaces each uppercase character in the range [l,h) with its lowercase equivalent and returns h.
Example
    auto& f = use_facet<ctype<wchar_t>>(locale::classic());
    //1
    //c:a
    auto c = f.tolower(L'A');

    //2
    wstring s = L"Khri$ha Rao 18";
    //s:khri$ha rao 18
    auto p = f.tolower(s.data(), s.data() + s.size());
  1. char_type widen(char c) const
  2. const char* widen(const char* l, const char* h, char_type* to) const
  1. Converts the single-byte character c to the corresponding char_type (for ctype<wchar_t>, wide) character.
  2. For every character in the range [l,h), writes the corresponding widened character to the successive locations in the character array pointed to by to, and returns h.
Conversion is done using the simplest reasonable transformation. Typically, this applies only to the characters whose multibyte encoding is a single byte (e.g. U+0000-U+007F in UTF-8).
Widening, if successful, preserves all character classification categories known to is().

Example
    auto& f = use_facet<ctype<wchar_t>>(locale::classic());
    //1
    //c:A
    wchar_t c = f.widen('A');

    //2
    string s = "Khri$ha Rao 18";
    wstring ws(s.length(),0);
    //ws:Khri$ha Rao 18
    auto p = f.widen(s.data(), s.data() + s.size(), ws.data());
  1. char narrow(char_type c, char def) const
  2. const char_type* narrow(const char_type* l, const char_type* h, char def, char* to) const
  1. Converts the (possibly wide) character c to its single-byte representation if one exists; otherwise returns def.
  2. For every character in the range [l,h), writes the narrowed character (or def whenever narrowing fails) to the successive locations in the character array pointed to by to, and returns h.
Conversion is done using the simplest reasonable transformation. Typically, this applies only to the characters whose multibyte encoding is a single byte (e.g. U+0000-U+007F in UTF-8).
Narrowing, if successful, preserves all character classification categories known to is().

Example
    auto& f = use_facet<ctype<wchar_t>>(locale::classic());
    //1
    //c:A
    char c = f.narrow(L'A','?');

    //2
    wstring ws {L"Khri$ha Rao 18"};
    string s(ws.length(),0);
    //s:Khri$ha Rao 18
    auto p = f.narrow(ws.data(), ws.data() + ws.size(), '?', s.data());
The is() example under Character Classification prints the character classification of a text under the Kannada locale using the ctype facet. It calls is() to collect the classification of every character in the text, then prints the categories of each character, as shown in its console output.

