Modern C++ 11,14: Quick tour with examples: ctype facet

Overview

The ctype facet belongs to the ctype category of locale facets. This standard facet classifies and transforms characters.

Topics
ctype_base
ctype

Details

ctype_base

This is the base class of the ctype facet. It defines the character classification categories, which the ctype facet inherits.

types

Name	Description
mask	bitmask type

constants

These are static constants of mask type.

Name

Description

space

whitespace character

Character	Value	Description
' '	(0x20)	space (SPC)
'\t'	(0x09)	horizontal tab (TAB)
'\n'	(0x0a)	newline (LF)
'\v'	(0x0b)	vertical tab (VT)
'\f'	(0x0c)	form feed (FF)
'\r'	(0x0d)	carriage return (CR)

print

printable character.

A printable character is a character that occupies a printing position on a display (this is the opposite of a control character, checked with iscntrl).

For the standard ASCII character set (used by the "C" locale), the printable characters are all characters with an ASCII code greater than 0x1f (US), except 0x7f (DEL).

cntrl

control character.
For the standard ASCII character set (used by the "C" locale), control characters are those between ASCII codes 0x00 (NUL) and 0x1f (US), plus 0x7f (DEL).

upper

uppercase character.
Any of A B C D E F G H I J K L M N O P Q R S T U V W X Y Z.

lower

lowercase character.
Any of a b c d e f g h i j k l m n o p q r s t u v w x y z.

alpha

alphabetic character.
In the default "C" locale, a letter is exactly a character for which either isupper or islower returns true.

digit

digit character.
Decimal digits are any of: 0 1 2 3 4 5 6 7 8 9

punct

punctuation character.
In the standard "C" locale, the punctuation characters are all graphic characters (as in isgraph) that are not alphanumeric (as in isalnum).

xdigit

hexadecimal digit character.
Any of: 0 1 2 3 4 5 6 7 8 9 a b c d e f A B C D E F

blank

blank character.
In the standard "C" locale, the blank characters are the tab character ('\t') and the space character (' ').

alnum

alpha | digit

graph

alnum | punct
character with a graphical representation. These are the printable characters except the space character (' '), for which isgraph returns false.
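Because the constants above are bitmasks, they can be combined with | and tested in a single call. The following is a minimal sketch, assuming the classic "C" locale; the helper name in_category is hypothetical and not part of the standard library.

```cpp
#include <locale>

// Hypothetical helper (not a library function): returns true if c belongs
// to ANY category whose bit is set in m, according to the classic "C" locale.
bool in_category(std::ctype_base::mask m, char c) {
    const std::ctype<char>& f =
        std::use_facet<std::ctype<char>>(std::locale::classic());
    return f.is(m, c);
}
```

For instance, in_category(std::ctype_base::digit | std::ctype_base::punct, '$') asks whether '$' is a digit or a punctuation character, and in_category(std::ctype_base::graph, ' ') is expected to be false because the space character has no graphical representation.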

ctype

ctype encapsulates character classification features. All stream input operations performed through istream use the ctype of the locale imbued in the stream to identify whitespace characters for input tokenization. Stream output operations apply widen() to narrow-character arguments prior to output.
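Since formatted input asks the imbued ctype facet what counts as whitespace, replacing that facet changes how a stream splits tokens. The sketch below assumes the narrow-character specialization ctype<char>, whose constructor accepts a classification table; the names comma_is_space and split_on_commas are hypothetical and chosen only for illustration.

```cpp
#include <cstddef>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical facet: a copy of the classic table in which ',' is also
// classified as whitespace.
class comma_is_space : public std::ctype<char> {
    static const mask* make_table() {
        // Copy the classic "C" table once, then mark ',' as space.
        static std::vector<mask> table(classic_table(),
                                       classic_table() + table_size);
        table[static_cast<unsigned char>(',')] |= space;
        return table.data();
    }
public:
    explicit comma_is_space(std::size_t refs = 0)
        : std::ctype<char>(make_table(), false, refs) {}
};

// Hypothetical helper: extracts tokens with operator>>, which skips whatever
// the imbued ctype facet classifies as space.
std::vector<std::string> split_on_commas(const std::string& text) {
    std::istringstream in(text);
    // The locale takes ownership of the facet (refs == 0).
    in.imbue(std::locale(in.getloc(), new comma_is_space));
    std::vector<std::string> words;
    for (std::string w; in >> w;) words.push_back(w);
    return words;
}
```

With this facet imbued, both commas and ordinary whitespace separate tokens, so an input such as "a,b c,,d" should be read as four tokens.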

Syntax

template <class charT>
class ctype : public locale::facet, public ctype_base

types

Name	Description
char_type	The first template parameter, charT. The standard library provides the facet for char and wchar_t.

Specializations

//narrow characters
ctype<char>

//wide characters
ctype<wchar_t>

ctype is derived from locale::facet and ctype_base class.

Fields

Name	Description
static locale::id id	The identifier of the facet. use_facet and has_facet use it to look up the ctype facet in a locale.

Constructor

Name	Description
explicit ctype(size_t refs = 0)	Creates a ctype facet and forwards the starting reference count refs to the base class constructor, locale::facet::facet().

Methods

Character Classification

Name	Description
bool is (mask m, char_type c) const; const char_type * is (const char_type * l, const char_type * h, mask * v) const	The first form returns whether c belongs to any of the categories specified in bitmask m. The second form classifies the characters in the range [l,h), sequentially filling the array v with the bitmask classification of each character, and returns h.
Example
//Kannada locale; the locale name is platform dependent
locale loc("kn-IN.UTF-8");
//ctype facet
auto& f = use_facet<ctype<wchar_t>>(loc);
//1
//digits and hex digits
auto m = ctype<wchar_t>::digit | ctype<wchar_t>::xdigit;
//b:true
bool b = f.is(m, L'A');
//b:true
b = f.is(m, L'1');
//U+0CE8 (೨) is the Kannada digit 2
//b:false
b = f.is(m, L'\u0ce8');
//2
wstring s = L"Khri$ha Rao 18";
vector<ctype_base::mask> masks(s.length());
//pass pointers to the first and one-past-the-last character
auto p = f.is(s.data(), s.data() + s.length(), masks.data());
/*prints
K is: alnum alpha graph print upper
h is: alnum alpha graph lower print
r is: alnum alpha graph lower print
i is: alnum alpha graph lower print
$ is: graph print punct
h is: alnum alpha graph lower print
a is: alnum alpha graph lower print xdigit
  is: print space blank
R is: alnum alpha graph print upper
a is: alnum alpha graph lower print xdigit
o is: alnum alpha graph lower print
  is: print space blank
1 is: alnum graph print xdigit
8 is: alnum graph print xdigit
*/
for (size_t n = 0; n < s.length(); ++n)
{
wcout << s[n] << " is: ";
if (masks[n] & ctype_base::alnum) wcout << " alnum ";
if (masks[n] & ctype_base::alpha) wcout << " alpha ";
if (masks[n] & ctype_base::cntrl) wcout << " cntrl ";
if (masks[n] & ctype_base::graph) wcout << " graph ";
if (masks[n] & ctype_base::lower) wcout << " lower ";
if (masks[n] & ctype_base::print) wcout << " print ";
if (masks[n] & ctype_base::punct) wcout << " punct ";
if (masks[n] & ctype_base::space) wcout << " space ";
if (masks[n] & ctype_base::upper) wcout << " upper ";
if (masks[n] & ctype_base::xdigit) wcout << " xdigit ";
if (masks[n] & ctype_base::blank) wcout << " blank ";
wcout << endl;
}
const char_type * scan_not (mask m, const char_type * l, const char_type * h) const	Returns a pointer to the first character in the range [l,h) that does not classify into any of the categories specified in m. If no such character is found in the range, h is returned.
Example
wstring s = L"Khri$ha Rao 18";
auto& f = use_facet<ctype<wchar_t>>(locale::classic());
auto p = f.scan_not(ctype<wchar_t>::digit, s.data(), s.data() + s.length());
//prints K
wcout << *p << endl;
const char_type * scan_is (mask m, const char_type * l, const char_type * h) const	Returns a pointer to the first character in the range [l,h) that classifies into any of the categories specified in m. If no such character is found in the range, h is returned.
Example
wstring s = L"Khri$ha Rao 18";
auto& f = use_facet<ctype<wchar_t>>(locale::classic());
auto p = f.scan_is(ctype<wchar_t>::digit, s.data(), s.data() + s.length());
//prints 1
wcout << *p << endl;
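scan_is and scan_not are often used as a pair: the first finds where a run of characters begins, the second finds where it ends. The following is a minimal sketch under the assumption that the classic "C" locale is sufficient; the function name first_number is hypothetical.

```cpp
#include <locale>
#include <string>

// Hypothetical helper: returns the first run of decimal digits in s,
// or an empty string if s contains no digit.
std::wstring first_number(const std::wstring& s) {
    const std::ctype<wchar_t>& f =
        std::use_facet<std::ctype<wchar_t>>(std::locale::classic());
    const wchar_t* first = s.data();
    const wchar_t* last = first + s.size();  // one past the last character
    // Start of the run: first character that IS a digit (or last).
    const wchar_t* b = f.scan_is(std::ctype_base::digit, first, last);
    // End of the run: first character from b onward that is NOT a digit.
    const wchar_t* e = f.scan_not(std::ctype_base::digit, b, last);
    return std::wstring(b, e);
}
```

Both calls take a half-open pointer range, so passing s.data() and s.data() + s.size() covers the whole string without reading past its end.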

Character Transformation

Name	Description
char_type toupper (char_type c) const; const char_type * toupper (char_type * l, const char_type * h) const	The first form returns the uppercase equivalent of c. If no such equivalent character exists, the value returned is c, unchanged. The second form replaces every lowercase character in the range [l,h) with its uppercase equivalent and returns h.
Example
auto& f = use_facet<ctype<wchar_t>>(locale::classic());
//1
//c:A
auto c = f.toupper(L'a');
//2
wstring s = L"Khri$ha Rao 18";
//s:KHRI$HA RAO 18
auto p = f.toupper(&s[0], &s[0] + s.length());
char_type tolower (char_type c) const; const char_type * tolower (char_type * l, const char_type * h) const	The first form returns the lowercase equivalent of c. If no such equivalent character exists, the value returned is c, unchanged. The second form replaces every uppercase character in the range [l,h) with its lowercase equivalent and returns h.
Example
auto& f = use_facet<ctype<wchar_t>>(locale::classic());
//1
//c:a
auto c = f.tolower(L'A');
//2
wstring s = L"Khri$ha Rao 18";
//s:khri$ha rao 18
auto p = f.tolower(&s[0], &s[0] + s.length());
char_type widen (char c) const; const char * widen (const char * l, const char * h, char_type * to) const	The first form converts the single-byte character c to the corresponding wide character representation. The second form writes, for every character in the range [l,h), the corresponding widened character to successive locations in the character array pointed to by to, and returns h. Conversion is done using the simplest reasonable transformation. Typically, this applies only to the characters whose multibyte encoding is a single byte (e.g. U+0000-U+007F in UTF-8). Widening, if successful, preserves all character classification categories known to is().
Example
auto& f = use_facet<ctype<wchar_t>>(locale::classic());
//1
//c:A
wchar_t c = f.widen('A');
//2
string s = "Khri$ha Rao 18";
wstring ws(s.length(), 0);
//ws:Khri$ha Rao 18
auto p = f.widen(s.data(), s.data() + s.length(), &ws[0]);
char narrow (char_type c, char def) const; const char_type * narrow (const char_type * l, const char_type * h, char def, char * to) const	The first form converts the (possibly wide) character c to its single-byte representation if the character can be represented with a single byte, and returns def if no such conversion exists. The second form writes, for every character in the range [l,h), the narrowed character (or def whenever narrowing fails) to successive locations in the character array pointed to by to, and returns h. Conversion is done using the simplest reasonable transformation. Typically, this applies only to the characters whose multibyte encoding is a single byte (e.g. U+0000-U+007F in UTF-8). Narrowing, if successful, preserves all character classification categories known to is().
Example
auto& f = use_facet<ctype<wchar_t>>(locale::classic());
//1
//c:A
char c = f.narrow(L'A', '?');
//2
wstring ws {L"Khri$ha Rao 18"};
string s(ws.length(), 0);
//s:Khri$ha Rao 18
auto p = f.narrow(ws.data(), ws.data() + ws.length(), '?', &s[0]);
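The range forms of the transformation methods can be wrapped in small string helpers. The sketch below assumes the classic "C" locale and C++11-compatible access to the string buffer through &s[0]; the function names to_upper, widen_all and narrow_all are hypothetical.

```cpp
#include <locale>
#include <string>

// Hypothetical helper: uppercases a copy of s in place via the range form.
std::wstring to_upper(std::wstring s) {
    const auto& f = std::use_facet<std::ctype<wchar_t>>(std::locale::classic());
    if (!s.empty()) f.toupper(&s[0], &s[0] + s.size());
    return s;
}

// Hypothetical helper: widens every narrow character of s.
std::wstring widen_all(const std::string& s) {
    const auto& f = std::use_facet<std::ctype<wchar_t>>(std::locale::classic());
    std::wstring out(s.size(), L'\0');  // destination must already have room
    if (!s.empty()) f.widen(s.data(), s.data() + s.size(), &out[0]);
    return out;
}

// Hypothetical helper: narrows every wide character of ws, writing def
// wherever no single-byte equivalent exists.
std::string narrow_all(const std::wstring& ws, char def = '?') {
    const auto& f = std::use_facet<std::ctype<wchar_t>>(std::locale::classic());
    std::string out(ws.size(), '\0');
    if (!ws.empty()) f.narrow(ws.data(), ws.data() + ws.size(), def, &out[0]);
    return out;
}
```

The destination string is sized before the call because widen and narrow write through the raw pointer to and do not grow the buffer themselves. A character outside the basic character set, such as a Kannada digit, is expected to come back from narrow_all as the default character.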

The is() example above prints the character classification of a text using the ctype facet of a Kannada locale. It calls is() to collect the classification of every character in the text, then prints the classification of each character, as shown in its console output.

Saturday, January 11, 2025
