Saturday, January 11, 2025

collate category and facet

Overview
This category supports one facet: collate.

Details
collate
Strings can be compared in more than one way. A plain lexicographical comparison looks only at character codes, whereas a collate comparison follows language-specific rules. For example, in German, ß is sorted as ss. Therefore, in a German locale, tenß < tenv, whereas in the C locale, tenß > tenv.

The collate facet performs locale-specific comparison (collation) of strings and also computes hash values in a locale-specific manner through hash(). This facet is used by basic_regex and, by means of locale::operator(), can be passed directly to any standard algorithm that expects a string comparison predicate.
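
The remark about locale::operator() can be sketched as follows. The helper name sort_with_locale is an assumption made for illustration; only std::sort and the locale object itself come from the standard library.

```cpp
#include <algorithm>
#include <locale>
#include <string>
#include <vector>

// Hypothetical helper: sorts words using the collation rules of loc.
// std::locale::operator() compares two strings through the locale's
// collate facet, so the locale object itself serves as the predicate.
std::vector<std::string> sort_with_locale(std::vector<std::string> words,
                                          const std::locale& loc)
{
    std::sort(words.begin(), words.end(), loc);
    return words;
}
```

Called with locale::classic(), this orders by character code, so uppercase letters come before lowercase ones. Passing a named locale such as locale("de_DE.UTF-8") should apply that language's rules instead, provided the locale is installed.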

Syntax
template <class charT>
class collate : public locale::facet
This class is derived from the locale::facet class.

types
Name | Description
char_type | The template parameter charT. The standard provides specializations for char and wchar_t.
string_type | String type corresponding to char_type, that is, basic_string<charT>.
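
The member types can be checked at compile time. This is a small sketch, assuming a C++11 compiler; the standard names the string type string_type, and the constant collate_types_ok is a made-up name used only for illustration.

```cpp
#include <locale>
#include <string>
#include <type_traits>

// Compile-time checks of the member types for both standard specializations.
static_assert(std::is_same<std::collate<char>::char_type, char>::value,
              "char_type is the template parameter");
static_assert(std::is_same<std::collate<char>::string_type, std::string>::value,
              "string_type is basic_string<char>");
static_assert(std::is_same<std::collate<wchar_t>::string_type, std::wstring>::value,
              "string_type is basic_string<wchar_t>");

// Hypothetical constant so the result can also be inspected at run time.
constexpr bool collate_types_ok =
    std::is_same<std::collate<wchar_t>::char_type, wchar_t>::value;
```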

Specializations
//narrow characters
collate<char>

//wide characters
collate<wchar_t>

fields
Name | Description
static locale::id id | The identifier of the facet. Represents the collate category of the facet.

Constructor
Name | Description
explicit collate(size_t refs = 0) | Creates a collate facet and forwards the starting reference count refs to the base class constructor, locale::facet::facet().

Methods
Name | Description
int compare
(const char_type* l1, const char_type* h1,
const char_type* l2, const char_type* h2) const
Compares the character sequence in the range [l1, h1) with the one in [l2, h2) using the locale's collation rules. It returns:
-1 if [l1, h1) < [l2, h2)
0 if [l1, h1) == [l2, h2)
1 if [l1, h1) > [l2, h2)

Example
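wchar_t s[] = L"tenv";
wchar_t s2[] = L"tenß";

// A default-constructed locale is a copy of the global locale
// (the classic "C" locale unless it has been changed).
locale l;
auto& f = use_facet<collate<wchar_t>>(l);
//r:-1 ('v' has a lower character code than 'ß')
auto r = f.compare(begin(s), end(s), begin(s2), end(s2));

// Throws runtime_error if the de_DE.UTF-8 locale is not installed.
l = locale("de_DE.UTF-8");
auto& f2 = use_facet<collate<wchar_t>>(l);
//r:1 (ß collates as ss, so tenß comes before tenv)
r = f2.compare(begin(s), end(s), begin(s2), end(s2));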
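
Because constructing a named locale fails when that locale is not installed, a defensive wrapper can be useful. The following is a minimal sketch for narrow strings; locale_or_classic and collate_compare are hypothetical helper names, not part of the standard library.

```cpp
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns the named locale, or the classic "C" locale
// when the name is not available on this system.
std::locale locale_or_classic(const char* name)
{
    try {
        return std::locale(name);  // throws std::runtime_error for unknown names
    } catch (const std::runtime_error&) {
        return std::locale::classic();
    }
}

// Hypothetical helper: three-way comparison of two strings under loc.
int collate_compare(const std::string& a, const std::string& b,
                    const std::locale& loc)
{
    const auto& f = std::use_facet<std::collate<char>>(loc);
    // data() + size() marks the end of the range without the terminating null.
    return f.compare(a.data(), a.data() + a.size(),
                     b.data(), b.data() + b.size());
}
```

A call such as collate_compare(a, b, locale_or_classic("de_DE.UTF-8")) then uses German rules where they are available and falls back to character-code order otherwise.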

string_type transform
(const char_type* l, const char_type* h) const
Transforms a string so that collation can be replaced by plain comparison.
Converts the character sequence [l, h) to a string that, when compared lexicographically with the result of calling transform() on another string, produces the same result as calling compare() on the two original strings. In other words, both strings must be transformed before they are compared.
The transformed string is returned. In the classic locale, transform typically returns the sequence unchanged.

Example
wchar_t s[] = L"tenv";
wchar_t s2[] = L"tenß";

auto l = locale("de_DE.UTF-8");
auto& f = use_facet<collate<wchar_t>>(l);

// Sort keys produced under the German locale.
auto o = f.transform(begin(s), end(s));
auto o2 = f.transform(begin(s2), end(s2));

//r:1
auto r = f.compare(begin(s), end(s), begin(s2), end(s2));

auto& f2 = use_facet<collate<wchar_t>>(locale::classic());
//r:-1
r = f2.compare(begin(s), end(s), begin(s2), end(s2));

// Comparing the transformed keys with classic rules reproduces the German order.
// data() + size() is used because dereferencing end(o) is undefined behavior.
//r:1
r = f2.compare(o.data(), o.data() + o.size(), o2.data(), o2.data() + o2.size());
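
A common reason to call transform() is to compute each sort key once and then sort with ordinary string comparison. The sketch below illustrates that idea for narrow strings; sort_by_keys is a hypothetical helper name.

```cpp
#include <algorithm>
#include <locale>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: sorts words by precomputed collation keys.
// Each word is transformed once; the sort then uses plain string comparison
// on the keys instead of calling compare() for every pair.
std::vector<std::string> sort_by_keys(const std::vector<std::string>& words,
                                      const std::locale& loc)
{
    const auto& f = std::use_facet<std::collate<char>>(loc);

    std::vector<std::pair<std::string, std::string>> keyed;  // (key, word)
    keyed.reserve(words.size());
    for (const auto& w : words)
        keyed.emplace_back(f.transform(w.data(), w.data() + w.size()), w);

    // pair's operator< compares the keys first, then the words as a tie-breaker.
    std::sort(keyed.begin(), keyed.end());

    std::vector<std::string> sorted;
    sorted.reserve(keyed.size());
    for (auto& kw : keyed)
        sorted.push_back(std::move(kw.second));
    return sorted;
}
```

For short lists, calling compare() directly is usually simpler. Precomputed keys tend to pay off when the same strings take part in many comparisons.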

long hash
(const char_type* l, const char_type* h) const
Converts the character sequence [l, h) to an integer value that is equal to the hash obtained for every string that collates as equivalent in this locale (that is, for which compare() returns 0).

Example
wchar_t s[] = L"tenß";

auto l = locale("de_DE.UTF-8");
auto& f = use_facet<collate<wchar_t>>(l);

auto h =  f.hash(begin(s), end(s));

auto& f2 = use_facet<collate<wchar_t>>(locale::classic());
auto h2 =  f2.hash(begin(s), end(s));

//b:true on common implementations; hashes from different locales are not guaranteed to match
bool b = (h == h2);

Example 4 demonstrates the collate facet under a Swedish locale, as seen in its console output.
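
As a further illustration of hash(), it can be paired with compare() to key an unordered container by collation equivalence. This is a sketch under the assumption that the implementation's hash() honors the guarantee described for it above, which is worth verifying for locales other than the classic one. CollateHash, CollateEqual, CollateSet and make_collate_set are hypothetical names.

```cpp
#include <cstddef>
#include <locale>
#include <string>
#include <unordered_set>

// Hypothetical functors that let an unordered container treat strings that
// collate as equal in loc as the same key.
struct CollateHash {
    std::locale loc;
    std::size_t operator()(const std::string& s) const
    {
        const auto& f = std::use_facet<std::collate<char>>(loc);
        return static_cast<std::size_t>(f.hash(s.data(), s.data() + s.size()));
    }
};

struct CollateEqual {
    std::locale loc;
    bool operator()(const std::string& a, const std::string& b) const
    {
        const auto& f = std::use_facet<std::collate<char>>(loc);
        return f.compare(a.data(), a.data() + a.size(),
                         b.data(), b.data() + b.size()) == 0;
    }
};

using CollateSet = std::unordered_set<std::string, CollateHash, CollateEqual>;

// Hypothetical factory: builds an empty set bound to loc.
CollateSet make_collate_set(const std::locale& loc)
{
    return CollateSet(0, CollateHash{loc}, CollateEqual{loc});
}
```

With locale::classic(), the set behaves like an ordinary set of strings, since only identical strings collate as equal there.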

