
Monday, November 13, 2023

basic_string


Overview
Strings are an essential part of any programming language. In C and C++, a string is represented as a sequence of characters terminated by a null character.
A string can be created as a const char* or as a char[]. When an array is passed by value it decays to a pointer, so both forms are seen as const char * inside the function, as shown below. Note that the compiler implicitly adds the null character at the end of a string literal. These strings are also known as C strings (cstrings).
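#include <iostream>
#include <typeinfo>
using namespace std;

// Prints the implementation-defined name of the deduced type T
template <typename T>
void printtype(T)
{
    cout << typeid(T).name() << endl;
}

//Example
int main()
{
    const char *str = "hello,";
    const char str2[9] = "world!";
    printtype(str);   // const char* (MSVC prints "char const *", GCC and Clang print a mangled name)
    printtype(str2);  // also const char*, because the array decays to a pointer
}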

Example 6 depicts the usage.

Details
The C runtime library (CRT) provides a plethora of functions to handle C strings: separate functions get the length, append, copy and so on.
The std::basic_string class wraps a string in an object so that it is easier to use. It is a class template, defined as below.
Syntax
//charT - one of the character types char, wchar_t, char16_t or char32_t
//traits - defines null character, comparison and other properties
//Allocator - manages the memory of the underlying buffer
template <typename charT,
          typename traits = char_traits<charT>,
          typename Allocator = allocator<charT> >
class basic_string;

//Predefined string classes
typedef basic_string<char> string;
typedef basic_string<wchar_t> wstring;
typedef basic_string<char16_t> u16string;
typedef basic_string<char32_t> u32string;

The char_traits classes define common behavior such as comparison, assignment and copy, as well as other aspects such as the eof value, the offset type and the position type.

Buffering
Internally, the characters are stored in a buffer that is allocated on the heap by default. The storage can be changed by supplying a custom allocator. Example 15 allocates a 100 MB string using a custom allocator class backed by shared memory.
Constants
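npos is a constant used in constructors and in the string modification and search functions. Depending on the context, it means "until the end of the string" (as a length) or "not found" (as a return value).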

Constructors
The following constructors are available. Note that [khrisha rao] denotes a string object and "khrisha rao" denotes a string literal.
Name | Description
string() | default constructor
string(const string&) | copy constructor
string(string&&) | move constructor
string(initializer_list<char>) | initializer list constructor
string(const string& s, size_t p, size_t n=npos) | from s, copy n chars, starting from p
string(const char* s) | copy all chars from c string s
string(const char* s, size_t n) | copy n chars from c string s
string(size_t n, char c) | fill n chars with char c
string(InputIterator b, InputIterator e) | copy chars from iterator b to iterator e
Example 7 depicts the usage.
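The constructor forms listed above can be sketched as follows. This is a minimal illustration, not the linked example; the helper name make_strings is hypothetical.

```cpp
#include <string>
#include <vector>

// Builds one string with each constructor form from the table above.
// make_strings is a hypothetical helper used only for illustration.
std::vector<std::string> make_strings()
{
    const std::string src("khrisha rao");   // from a C string
    const char raw[] = "khrisha rao";

    std::vector<std::string> out;
    out.push_back(std::string());                              // default: ""
    out.push_back(std::string(src));                           // copy: "khrisha rao"
    out.push_back(std::string(src, 8));                        // from position 8 to the end: "rao"
    out.push_back(std::string(src, 0, 7));                     // 7 chars from position 0: "khrisha"
    out.push_back(std::string(raw, 4));                        // first 4 chars of a C string: "khri"
    out.push_back(std::string(3, 'x'));                        // fill: "xxx"
    out.push_back(std::string(src.begin(), src.begin() + 3));  // iterator range: "khr"
    out.push_back(std::string{'r', 'a', 'o'});                 // initializer list: "rao"
    return out;
}
```

Leaving out the length argument, as in std::string(src, 8), relies on the default n = npos, which copies up to the end of src.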

Access using Iterators
The following functions return iterators. All of them are random access iterators and come in four flavors: iterator, const_iterator, reverse_iterator and const_reverse_iterator.
Name | Description
iterator begin() | Return iterator to beginning
iterator end() | Return iterator to end
reverse_iterator rbegin() | Return reverse iterator to reverse beginning
reverse_iterator rend() | Return reverse iterator to reverse end
const_iterator cbegin() | Return const_iterator to beginning
const_iterator cend() | Return const_iterator to end
const_reverse_iterator crbegin() | Return const_reverse_iterator to reverse beginning
const_reverse_iterator crend() | Return const_reverse_iterator to reverse end
Example 8 depicts the usage.
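A short sketch of the iterator families in use. The function names are hypothetical and chosen only for this illustration.

```cpp
#include <cstddef>
#include <string>

// Returns a reversed copy by walking the reverse iterators.
std::string reversed(const std::string& s)
{
    return std::string(s.rbegin(), s.rend());
}

// Counts occurrences of c using the read-only iterators.
std::size_t count_char(const std::string& s, char c)
{
    std::size_t n = 0;
    for (std::string::const_iterator it = s.cbegin(); it != s.cend(); ++it)
        if (*it == c)
            ++n;
    return n;
}

// Replaces every '-' with a space through the mutable iterators.
void dashes_to_spaces(std::string& s)
{
    for (std::string::iterator it = s.begin(); it != s.end(); ++it)
        if (*it == '-')
            *it = ' ';
}
```

The const iterators are enough for read-only work; only dashes_to_spaces needs the mutable begin() and end().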

Storage and length
The following functions return or change the length and capacity of the underlying string buffer.
Name | Description
const char* c_str() | Get the null-terminated C string equivalent
const char* data() | Same as c_str() since C++11; before C++11 the buffer was not guaranteed to end with a null character
size_t size() | Return length of string
size_t length() | Return length of string (same as size())
size_t max_size() | Return maximum size of string
void resize(size_t n) | Resize string to size n. Existing contents are kept or truncated; any added chars are null characters
void resize(size_t n, char c) | Resize string to size n. Existing contents are kept or truncated; any added chars are filled with char c
size_t capacity() | Return size of allocated storage
void reserve(size_t n=0) | Request that capacity be at least n
void clear() | Clear string
bool empty() | Test if string is empty
void shrink_to_fit() | Request that capacity be reduced to fit the size
Example 9 depicts the usage.
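The difference between size and capacity, and the behavior of resize, can be sketched as below. The function names are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Pads s on the right with fill up to width; a longer string is truncated,
// because resize() either grows or shrinks the string to exactly n chars.
std::string fit_to_width(std::string s, std::size_t width, char fill)
{
    s.resize(width, fill);
    return s;
}

// Shows that reserve() affects capacity() but leaves size() untouched.
bool reserve_keeps_size()
{
    std::string s("hello");
    s.reserve(100);
    return s.size() == 5 && s.capacity() >= 100;
}

// Shows that clear() empties the string.
bool clear_makes_empty()
{
    std::string s("hello");
    s.clear();
    return s.empty() && s.size() == 0;
}
```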

Access using Index
The following functions access individual characters of the string. Each has a non-const and a const overload.
Name | Description
char& at(size_t p), const char& at(size_t p) const | Get character at position p; throws out_of_range when p is not a valid position
char& back(), const char& back() const | Access last character
char& front(), const char& front() const | Access first character
char& operator[] (size_t p), const char& operator[] (size_t p) const | Indexed access to characters of the string, without bounds checking
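The sketch below contrasts the bounds-checked at() with front() and back(). The function names are hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Returns the character at position p, or fallback when p is out of range.
char char_or(const std::string& s, std::size_t p, char fallback)
{
    try {
        return s.at(p);                    // bounds-checked access
    } catch (const std::out_of_range&) {
        return fallback;
    }
}

// Upper-cases the first and last characters in place.
void cap_ends(std::string& s)
{
    if (s.empty())
        return;                            // front() and back() need a non-empty string
    s.front() = static_cast<char>(std::toupper(static_cast<unsigned char>(s.front())));
    s.back() = static_cast<char>(std::toupper(static_cast<unsigned char>(s.back())));
}
```

operator[] would be slightly cheaper than at(), but it does no range check, so it fits only where the index is already known to be valid.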

String Modifications
These functions insert, append, replace, erase, assign and swap characters.
NameDescription
  1. string& append(const string& s)
  2. string& append(const string& s, size_t p, size_t n)
  3. string& append(const char* s)
  4. string& append(const char* s, size_t n)
  5. string& append(size_t n, char c)
  6. string& append(InputIterator b, InputIterator e)
  7. string& append(initializer_list<char> il)
  1. Append string s
  2. Append substring from s
  3. Append c string s
  4. Append n chars from c string s
  5. Append  char c, n times
  6. Append  chars from iterators
  7. Append  chars from initializer list
  1. string& assign(const  string& s)
  2. string& assign(const string& s, size_t p, size_t n)
  3. string& assign(const char* s)
  4. string& assign(const char* s, size_t n)
  5. string& assign(size_t n, char c)
  6. string& assign(InputIterator b, InputIterator e)
  7. string& assign(initializer_list<char> il)
  8. string&  assign (string&& s)
  1. Assign string s
  2. Assign substring from s
  3. Assign c string s
  4. Assign n chars from c string s
  5. Assign char c, n times
  6. Assign chars from iterators
  7. Assign chars from initializer list
  8. Assign string s
  1. string&  insert (size_t p, const string& s)
  2. string&  insert (size_t p, const string& s, size_t sp, size_t n)
  3. string&  insert (size_t p, const char* s)
  4. string&  insert (size_t p, const char* s, size_t n)
  5. string&  insert (size_t p,   size_t n, char c)
  6. iterator insert (const_iterator p, size_t n, char c)
  7. iterator insert (const_iterator p, char c)
  8. iterator insert (const_iterator p, InputIterator b, InputIterator e)
  9. iterator  insert (const_iterator p, initializer_list<char> )
  1. insert string s at position p
  2. insert substring from s at position p
  3. insert c string s at position p
  4. insert n chars from c string s at position p
  5. insert char c, n times at position p
  6. insert char c, n times at iterator p
  7. insert char c, at iterator p
  8. insert chars from b to e, at iterator p
  9. insert chars from initializer list at iterator p
  1. string&  erase (size_t p = 0, size_t n = npos)
  2. iterator erase (const_iterator p)
  3. iterator erase (const_iterator b, const_iterator e)
  1. Erase n chars at p 
  2. Erase char at p 
  3. Erase chars from b to e 
  1. string& replace (size_t p, size_t n, const string& s)
  2. string& replace (const_iterator i1, const_iterator i2, const string& s)
  3. string& replace (size_t p, size_t n, const string& s, size_t sp, size_t sn)
  4. string& replace (size_t p, size_t n, const char* s)
  5. string& replace (const_iterator i1, const_iterator i2, const char* s)
  6. string& replace (size_t p, size_t n, const char* s, size_t sn)
  7. string& replace (const_iterator i1, const_iterator i2, const char* s, size_t n)
  8. string& replace (size_t pos, size_t len, size_t n, char c)
  9. string& replace (const_iterator i1, const_iterator i2, size_t n, char c)
  10. string& replace (const_iterator i1, const_iterator i2, InputIterator b, InputIterator e)
  11. string& replace (const_iterator i1, const_iterator i2, initializer_list<char>)
  1. replace n chars at position p with string s
  2. replace chars from i1 to i2 with string s
  3. replace n chars at position p with sn chars of string s, starting at sp
  4. replace n chars at position p with cstring s
  5. replace chars from i1 to i2 with cstring s
  6. replace n chars at position p with the first sn chars of cstring s
  7. replace chars from i1 to i2 with the first n chars of cstring s
  8. replace len chars at position pos with char c, n times
  9. replace chars from i1 to i2 with char c, n times
  10. replace chars from i1 to i2 with chars from iterators b to e
  11. replace chars from i1 to i2 with chars from initializer list
void push_back(char c) | Append character to string
void pop_back() | Delete last character
void swap (string& x, string& y) | Non-member function; exchanges the values of two strings
void swap(string& s) | Swap string values with s
Example 10 depicts the usage.
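A minimal sketch that chains several of the modification functions above, with the intermediate value noted after each step. The function name build_greeting is hypothetical.

```cpp
#include <string>

// Applies append, insert, replace, push_back, erase and pop_back in turn.
std::string build_greeting()
{
    std::string s("hello");
    s.append(" world");         // "hello world"
    s.insert(5, ",");           // "hello, world"
    s.replace(7, 5, "there");   // "hello, there"
    s.push_back('!');           // "hello, there!"
    s.erase(0, 7);              // "there!"
    s.pop_back();               // "there"
    return s;
}
```

The position-based overloads return a reference to the string itself, so calls such as s.append(" world").insert(5, ",") can also be chained.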

String Operations
These functions extract the string buffer, copy, find, compare and take substrings. The find functions return the position of the match, or npos if there is no match.

The compare functions compare the whole string, or a substring of it, with another string, a substring of that string, a cstring, or the first chars of a cstring.
The return values are
Result | Description
0 | The strings compare equal.
< 0 | Either the value of the first character that does not match is lower in the compared string (the object compare is called on), or all compared characters match but the compared string is shorter.
> 0 | Either the value of the first character that does not match is greater in the compared string, or all compared characters match but the compared string is longer.

Name | Description
size_t copy (char* s, size_t n, size_t p = 0) | Copy n chars from position p to buffer s; no null character is appended
  1. size_t find (const string& s, size_t p = 0) 
  2. size_t find (const char* s, size_t p = 0)
  3. size_t find (const char* s, size_t p, size_t n) 
  4. size_t find (char c, size_t p = 0) 
  1. Find string s from position p
  2. Find cstring s from position p
  3. Find the first n chars of cstring s from position p
  4. Find char c from position p
  1. size_t rfind (const string& s, size_t p = npos) 
  2. size_t rfind (const char* s, size_t p =npos)
  3. size_t rfind (const char* s, size_t p, size_t n) 
  4. size_t rfind (char c, size_t p=npos) 
  1. Reverse find string s from position p
  2. Reverse find cstring s from position p
  3. Reverse find the first n chars of cstring s from position p
  4. Reverse find char c from position p
  1. size_t find_first_of(const string& s, size_t p = 0) 
  2. size_t find_first_of(const char* s, size_t p = 0)
  3. size_t find_first_of(const char* s, size_t p, size_t n) 
  4. size_t find_first_of(char c, size_t p = 0)
  1. Find any char in string s from position p
  2. Find any char in cstring s from position p
  3. Find any of n chars in cstring s from position p
  4. Find char c from position p
  1. size_t find_last_of(const string& s, size_t p = npos) 
  2. size_t find_last_of(const char* s, size_t p = npos)
  3. size_t find_last_of(const char* s, size_t p, size_t n) 
  4. size_t find_last_of(char c, size_t p = npos)
  1. Reverse Find any char in string s from position p
  2. Reverse Find any char in cstring s from position p
  3. Reverse Find any of n chars in cstring s from position p
  4. Reverse Find char c from position p
  1. size_t find_first_not_of(const string& s, size_t p = 0) 
  2. size_t find_first_not_of(const char* s, size_t p = 0)
  3. size_t find_first_not_of(const char* s, size_t p, size_t n) 
  4. size_t find_first_not_of(char c, size_t p = 0)
  1. Find any char not in string s from position p
  2. Find any char not in cstring s from position p
  3. Find any of n chars not in cstring s from position p
  4. Find any char but not char c from position p
  1. size_t find_last_not_of(const string& s, size_t p = npos) 
  2. size_t find_last_not_of(const char* s, size_t p = npos)
  3. size_t find_last_not_of(const char* s, size_t p, size_t n) 
  4. size_t find_last_not_of(char c, size_t p = npos)
  1. Reverse Find any char not in string s from position p
  2. Reverse Find any char not in cstring s from position p
  3. Reverse Find any of n chars not in cstring s from position p
  4. Reverse Find any char but not char c from position p
  1. int compare (const string& s)
  2. int compare (size_t p, size_t n, const string& s)
  3. int compare (size_t p, size_t n, const string& s, size_t sp, size_t sn)
  4. int compare (const char* s)
  5. int compare (size_t p, size_t n, const char* s)
  6. int compare (size_t p, size_t n, const char* s, size_t sn)
  1. Compare with string s
  2. Compare substring with string s
  3. Compare substring with substring of string s
  4. Compare with cstring s
  5. Compare substring with cstring s
  6. Compare substring with the first sn chars of cstring s
string substr (size_t p = 0, size_t n = npos) | Generate a substring of n chars starting at position p
Example 11 depicts the usage.
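The search, substring and compare functions above are often combined. The sketch below shows three small helpers; their names are hypothetical and they are not part of the standard library.

```cpp
#include <string>

// Returns the text after the last '.', or "" when there is no '.'.
std::string extension_of(const std::string& name)
{
    const std::string::size_type dot = name.rfind('.');
    if (dot == std::string::npos)          // npos as "not found"
        return "";
    return name.substr(dot + 1);           // default n = npos: up to the end
}

// Trims leading and trailing spaces and tabs.
std::string trim(const std::string& s)
{
    const char* ws = " \t";
    const std::string::size_type first = s.find_first_not_of(ws);
    if (first == std::string::npos)
        return "";                         // nothing but whitespace
    const std::string::size_type last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// True when s starts with prefix, using the substring overload of compare().
bool starts_with(const std::string& s, const std::string& prefix)
{
    return s.size() >= prefix.size() &&
           s.compare(0, prefix.size(), prefix) == 0;
}
```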

Overloaded Operators
The following overloaded operators are implemented as non-member functions (operator+) and as member functions (operator= and operator+=).
NameDescription
  1. string operator+ (const string& lhs, const string& rhs)
  2. string operator+ (const string& lhs, const char*   rhs)
  3. string operator+ (const char * lhs, const string& rhs)
  4. string operator+ (const string& lhs, char rhs)
  5. string operator+ (char lhs, const string& rhs)
  1. add two strings. return the result.
  2. add string and cstring. return the result.
  3. add cstring and string. return the result.
  4. add string and char. return the result.
  5. add char and string. return the result.
  1. string& operator=(const string& str)
  2. string& operator=(const char* s)
  3. string& operator=(char c)
  4. string& operator=(initializer_list<char>)
  5. string& operator=(string&& s)
  1. assign string s
  2. assign cstring s
  3. assign char c
  4. assign from initializer list
  5. assign and move string s
  1. string& operator+= (const string& str)
  2. string& operator+= (const char * s)
  3. string& operator+= (char c)
  1. Append a string
  2. Append a cstring
  3. Append a char

Relational Operators
These operators are implemented as non-member functions for string comparison.
NameDescription
  1. bool operator== (const string& lhs, const string& rhs)
  2. bool operator== (const char *   lhs, const string& rhs)
  3. bool operator== (const string& lhs, const  char *   rhs)
Equality operator for string
  1. bool operator!= (const string& lhs, const string& rhs)
  2. bool operator!= (const char *   lhs, const string& rhs)
  3. bool operator!= (const string& lhs, const char *   rhs)
Non Equality operator for string
  1. bool operator< (const  string& lhs, const string& rhs)
  2. bool operator< (const char *   lhs, const string& rhs)
  3. bool operator< (const string& lhs, const char *   rhs)
Less than operator for string
  1. bool operator<= (const string& lhs, const string& rhs)
  2. bool operator<= (const char *   lhs, const string& rhs)
  3. bool operator<= (const string& lhs, const char *   rhs)
Less than or equal operator for string
  1. bool operator> (const string& lhs, const string& rhs)
  2. bool operator> (const char *   lhs, const string& rhs)
  3. bool operator> (const string& lhs, const char *   rhs)
Greater than operator for string
  1. bool operator>= (const string& lhs, const string& rhs)
  2. bool operator>= (const char *   lhs, const string& rhs)
  3. bool operator>= (const string& lhs, const char *   rhs)
Greater than or equal operator for string
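The concatenation, assignment and relational operators from the two sections above can be sketched together. The function names are hypothetical.

```cpp
#include <string>

// Joins a directory and a file name using operator= and operator+=.
std::string join_path(const std::string& dir, const char* file)
{
    std::string path;
    path = dir;      // operator=(const string&)
    path += '/';     // operator+=(char)
    path += file;    // operator+=(const char*)
    return path;
}

// Lexicographic "less than" between two strings.
bool comes_before(const std::string& a, const std::string& b)
{
    return a < b;
}

// Equality between a string and a C string, without building a temporary string.
bool same_text(const std::string& a, const char* b)
{
    return a == b;
}
```

Because mixed overloads exist for const char*, an expression such as std::string("a") + "b" works directly, while "a" + "b" does not, since neither operand is a string object.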

Stream operators and functions
The following overloaded operators and functions can be used for stream read and write operations.
Name | Description
istream& operator>> (istream& is, string& str) | Extract a string from a stream
ostream& operator<< (ostream& os, const string& str) | Insert a string into a stream
istream& getline (istream& is, string& str, char delim) | Non-member function; reads characters from the stream into the string until delim is found
istream& getline (istream& is, string& str) | Same, with the newline character as the delimiter
Example 12 depicts the usage.
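A small sketch of getline and operator>> reading from an in-memory stream. std::istringstream stands in for any input stream here, and the function names are hypothetical.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits text on delim using the three-argument getline overload.
std::vector<std::string> split(const std::string& text, char delim)
{
    std::vector<std::string> parts;
    std::istringstream in(text);
    std::string field;
    while (std::getline(in, field, delim))
        parts.push_back(field);
    return parts;
}

// Extracts whitespace-separated words with operator>>.
std::vector<std::string> words(const std::string& text)
{
    std::vector<std::string> out;
    std::istringstream in(text);
    std::string w;
    while (in >> w)
        out.push_back(w);
    return out;
}
```

getline keeps empty fields between two adjacent delimiters, whereas operator>> skips any run of whitespace.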

Convert strings to arithmetic types
The following functions convert strings to arithmetic types. Overloads are provided for string and wstring.
The value of the parameter const string& str may start with whitespace, followed by a numerical value, which may in turn be trailed by any characters. Examples: "99", "  99namaskara" etc.
The parameter size_t* idx, if it is not a null pointer, is set by the function to the position of the first character in str after the numerical value. If it is a null pointer, it is not used.
The parameter int base is the numerical base (radix) that determines the valid characters and their interpretation.
If it is 0, the base is determined by the format of the sequence (for example, a 0x prefix selects hexadecimal). Note that the default value is 10, not 0.
Name | Description
int stoi (const string& str, size_t* idx = 0, int base = 10) | Convert string to int
long stol (const string& str, size_t* idx = 0, int base = 10) | Convert string to long
unsigned long stoul (const string& str, size_t* idx = 0, int base = 10) | Convert string to unsigned long
long long stoll (const string& str, size_t* idx = 0, int base = 10) | Convert string to long long
unsigned long long stoull (const string& str, size_t* idx = 0, int base = 10) | Convert string to unsigned long long
float stof (const string& str, size_t* idx = 0) | Convert string to float
double stod (const string& str, size_t* idx = 0) | Convert string to double
long double stold (const string& str, size_t* idx = 0) | Convert string to long double
Example 13 depicts the usage.
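The idx and base parameters, and the exceptions thrown on bad input, can be sketched as follows. The helper names are hypothetical; std::invalid_argument and std::out_of_range are the exceptions the standard conversion functions throw.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Parses a leading integer and reports, through consumed, the position of
// the first character after the number.
int parse_prefix(const std::string& s, std::size_t& consumed)
{
    return std::stoi(s, &consumed);
}

// Strict parse: succeeds only when the number extends to the end of the string.
bool try_parse_int(const std::string& s, int& value)
{
    try {
        std::size_t idx = 0;
        const int v = std::stoi(s, &idx);
        if (idx != s.size())
            return false;                   // trailing characters
        value = v;
        return true;
    } catch (const std::invalid_argument&) {
        return false;                       // no number at the start
    } catch (const std::out_of_range&) {
        return false;                       // does not fit in an int
    }
}
```

With a base argument, std::stoi("ff", nullptr, 16) yields 255, and base 0 lets a prefix decide, so std::stoi("0x1A", nullptr, 0) yields 26.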

Convert arithmetic types to string
The following functions convert arithmetic types to strings. to_string returns a string and to_wstring returns a wstring.
NameDescription
  1. string to_string (int val)
  2. wstring to_wstring (int val)
Convert integer to string / wstring
  1. string to_string (long val)
  2. wstring to_wstring (long val)
Convert long int to string / wstring
  1. string to_string (long long val)
  2. wstring to_wstring (long long val)
Convert long long to string / wstring
  1. string to_string (unsigned val)
  2. wstring to_wstring (unsigned val)
Convert unsigned int to string / wstring
  1. string to_string (unsigned long val)
  2. wstring to_wstring (unsigned long val)
Convert unsigned long to string / wstring
  1. string to_string (unsigned long long val)
  2. wstring to_wstring (unsigned long long val)
Convert unsigned long long to string / wstring
  1. string to_string (float val)
  2. wstring to_wstring (float val)
Convert float to string / wstring
  1. string to_string (double val)
  2. wstring to_wstring (double val)
Convert double to string / wstring
  1. string to_string (long double val)
  2. wstring to_wstring (long double val)
Convert long double to string / wstring
The example 14 depicts the usage.
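A brief sketch of to_string and to_wstring with integer values. The function names are hypothetical. Floating point values are left out on purpose, because the exact text produced for them depends on the language version.

```cpp
#include <string>

// Builds "key=value" text from an integer value.
std::string key_value(const std::string& key, long long value)
{
    return key + '=' + std::to_string(value);
}

// Converts an int to a wide string.
std::wstring wide_number(int v)
{
    return std::to_wstring(v);
}
```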




Monday, November 6, 2023

Regular Expression

 
Overview
Regular expressions are indispensable when looking for specific information, for example while peering through log files. In programming languages, they are a versatile way to filter information or validate inputs without writing a lot of complex code.

The following websites contain in depth information and examples:

The following websites contain interactive tutorials:

The following websites provide test harness to test regex expressions and even debug. 

The following websites facilitate  compiling and running code samples in C++  and other platforms

Notepad++ supports regular expression search and replace.

Validation
Regular expressions can be used to validate an input. Some examples are below.

Regular Expression | Description | Inputs | Demo
\d{3}-\d{2}-\d{4} | US social security number | 123-45-6789 | Example
\((\d{3})\)\d{3}-\d{4} | US phone number | (408)333-4444 | Example 2
9505[0-6] | Santa Clara zip codes | 95056 | Example 3
(?:0?\d|1[0-2]|jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[ \/-](?:0?\d|1\d|2\d|3[0-1])[ \/-]\d{4} | date in different formats with limited validation | 1/1/1987, 09-21-2018, mar 8 1969, 7 17 1975 | Example 4
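The first three validation patterns can be tried in C++ as sketched below. This assumes the standard library std::regex with its default ECMAScript grammar, which may differ from the engine behind the linked demos. The function names are hypothetical.

```cpp
#include <regex>
#include <string>

// regex_match succeeds only when the whole input matches the pattern,
// which is what validation needs.
bool is_ssn(const std::string& s)
{
    static const std::regex re(R"(\d{3}-\d{2}-\d{4})");
    return std::regex_match(s, re);
}

bool is_us_phone(const std::string& s)
{
    static const std::regex re(R"(\((\d{3})\)\d{3}-\d{4})");
    return std::regex_match(s, re);
}

bool is_santa_clara_zip(const std::string& s)
{
    static const std::regex re("9505[0-6]");
    return std::regex_match(s, re);
}
```

Raw string literals (R"(...)") keep the backslashes of the pattern readable, since they do not need to be doubled.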

Extraction
Regular expressions can also be used to extract information from the input. Capture groups are used for this: the text matched inside () is extracted. Some examples are below.
Regular Expression | Description | Inputs | Demo
(\(\d{3}\))-\d{3}-\d{4} | extract area code from a US phone number | (408)-333-4444 | Example 5
\s+(\w+)\s+\1 | find all the duplicate words | state of of the art | Example 6
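Extraction with a capture group can be sketched in C++ as below, again assuming std::regex with the default ECMAScript grammar. The function name area_code is hypothetical. With the pattern from the table, the group encloses the escaped parentheses, so they are part of the extracted text.

```cpp
#include <regex>
#include <string>

// Returns the text captured by group 1 of the phone pattern,
// or "" when the input contains no phone number.
std::string area_code(const std::string& phone)
{
    static const std::regex re(R"((\(\d{3}\))-\d{3}-\d{4})");
    std::smatch m;
    if (std::regex_search(phone, m, re))
        return m[1].str();    // m[0] is the whole match, m[1] the first group
    return "";
}
```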

Basics
A regular expression basically searches for a pattern. A pattern can be simple text consisting of a few letters or numbers. Example: Bangalore 560082
A pattern can also be complex, consisting of character classes, quantifiers, grouping etc., as shown in the examples above. The structure of such a pattern is governed by a set of regular expression grammar rules. The grammar uses these metacharacters: <([{\^-=$!|]})?*+.>

The following topics describe each feature of the grammar in detail

Dot character
Patterns can use . to match any character except some control characters such as line breaks. Note that inside a character class the dot loses its special meaning and matches a literal period (decimal point).
Some examples are catExample etc.   Example 7

Character class
The basic ingredient of the regular expression grammar is a character class. The structure of a character class is defined below. Note that a yellow background indicates the matching characters in the input text.

Construct | Description | Matches | Demo
[ae] | a or e (simple class) | gray grey | Example 8
[^aeiou] | Any character except a, e, i, o or u (negation) | marcial | Example 9
[a-zA-Z] | a through z, or A through Z, inclusive (range) | Khri$ha | Example 10
[^7-9] | Any character other than 7 through 9 | 95056 | Example 11


Predefined character classes or Shorthands
To reduce clutter, shorthands for common character classes are provided. They can be used freely inside another character class or directly in a pattern. The shorthands and their expansions are below. Note that a yellow background indicates the matching characters in the input text.

Construct | Description | Matches | Demo
\d | A digit: [0-9] | $10.99 | Example 12
\D | A non-digit: [^0-9] | $10.99 | Example 13
\s | A whitespace character: [ \t\n\x0B\f\r] | try        it! | Example 14
\S | A non-whitespace character: [^\s] | try        it! | Example 15
\w | A word character: [a-zA-Z_0-9] | try        it! | Example 16
\W | A non-word character: [^\w] | try        it! | Example 17


Escaping
Sometimes a metacharacter needs to be escaped in a pattern or a character class. Escaping is done by placing \ in front of the metacharacter.
Examples: \[ and \] escape the [ and ] metacharacters and match the literal brackets in [1,2,3]. \. escapes the dot so that it matches the decimal point in 10.99. Example 18

Anchors and boundary markers
Anchors and boundary markers mark special locations such as the beginning or end of a line, word boundaries etc. They can be used only in the pattern and not in character classes. Note that a yellow background indicates the marked locations in the input text: "Hello, World!"

Construct | Description | Matches | Demo
^ | The beginning of a line | Hello, World! | Example 19
$ | The end of a line | Hello, World! | Example 20
\b | A word boundary | Hello  World ! | Example 21
\B | A non-word boundary | H e l l o,  W o r l d! | Example 22
\A | The beginning of the input | Hello, World! | Example 23
\G | The end of the previous match | Hello, World! | Example 24
\Z | The end of the input but for the final terminator, if any | Hello, World! | Example 25
\z | The end of the input | Hello, World! | Example 26


Quantifiers
Quantifiers determine how many times a token in the pattern repeats. A quantifier applies only to the token immediately before it. The table below lists them in detail. Note that <empty> means blank or that no match was found. Matches are highlighted in yellow.

Construct | Description | Pattern | Matches | Demo
? | Matches zero or one time | s? | <empty>, s | Example 27
+ | Matches one or more times | s+ | s, sss | Example 28
* | Matches zero or more times | s* | <empty>, sss | Example 29
{n} | Exactly n times | s{3} | sss | Example 30
{m,} | At least m times | s{2,} | s, ss, sss | Example 31
{m,n} | At least m times and at most n times | s{2,3} | s, ss, sss | Example 32


Greedy, Lazy and Possessive Quantifiers 
The same pattern applied to the same input can give surprisingly different results with different kinds of quantifiers.
This is most pronounced when the pattern has a dot (.) followed by ?, * or +.
It has to do with how much of the input text the regular expression engine grabs for matching and how it backtracks when the rest of the pattern does not match. During backtracking, the engine gives up one token from the grabbed text and tries again. This repeats until a match is found or there is nothing left to give up.
A greedy quantifier grabs as much of the input as possible and backtracks when no match is found.
A lazy quantifier grabs as little as possible and takes one more token at a time when no match is found.
A possessive quantifier grabs as much of the input as possible and never backtracks, so when no match is found the match fails.
By default, quantifiers are greedy. They can be made lazy by appending ? or possessive by appending + as shown below.

Greedy | Lazy | Possessive | Meaning
X* | X*? | X*+ | X, zero or more times
X+ | X+? | X++ | X, one or more times
X{n} | X{n}? | X{n}+ | X, exactly n times
X{n,} | X{n,}? | X{n,}+ | X, at least n times
X{n,m} | X{n,m}? | X{n,m}+ | X, at least n but not more than m times


For example, consider the text "This is a <B> bold </B> text". Using the pattern <.*>, the expectation is to match <B> and </B> separately. However, the greedy pattern matches more and the possessive pattern matches nothing. Only the lazy pattern matches as expected.

Pattern | Remark | Match | Demo
<.*> | Greedy | <B> bold </B> (one match) | Example 33
<.*?> | Lazy | <B> and </B> (two matches) | Example 34
<.*+> | Possessive | no match | Example 35
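The greedy and lazy cases can be reproduced in C++ as sketched below, assuming std::regex with its default ECMAScript grammar. That grammar has no possessive quantifiers, so the third case is not shown. The function name all_matches is hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collects every non-overlapping match of pattern in text.
std::vector<std::string> all_matches(const std::string& text,
                                     const std::string& pattern)
{
    const std::regex re(pattern);
    std::vector<std::string> out;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end;
         it != end; ++it)
        out.push_back(it->str());
    return out;
}
```

For the text "This is a <B> bold </B> text", the pattern "<.*>" produces the single match "<B> bold </B>", while "<.*?>" produces the two matches "<B>" and "</B>".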

Capture groups
Capture groups are one of the key features of regular expressions. They enable capturing specific information, such as the area code in the example below. A group is defined by enclosing part of the pattern in ().
For example, the pattern \((\d{3})\)\d{3}-\d{4} matches (408)333-4444 and captures 408. Example 36
The following topics discuss the different features of groups in depth.

OR (|) Operator
Suppose the phone numbers of the city of Los Angeles need to be filtered. The OR operator (|) can be used inside a capture group for this, as in the group (213|323).
For example, the pattern \((213|323)\)\d{3}-\d{4} matches (213)123-4567 or (323)456-1234. Example 37

Non capturing groups
Non capturing groups are used for efficiency and optimization. As the name indicates, the text matched by a non capturing group is not stored. Non capturing groups are defined by enclosing the expression in (?: and ).
For example, the pattern (\w\w) (\d{5})-(?:\d{4}) does not capture the last 4 digits of the zip code in CA 95131-3059. Example 38

References
Captured groups in a pattern are numbered from left to right and are referred to as \1, \2, \3 etc., as shown below.
([0-9])([-/ ])[a-z][-/ ]([0-9])
|--1--||--2--|          |--3--|
Nested groups are numbered by the position of their opening parenthesis, as shown below.
(([0-9])([-/ ]))([a-z])
 |--2--||--3--|
|-------1------||--4--|
A reference refers to a captured group in the pattern. References are useful, for example, when looking for duplicate words in a text. There are 3 different kinds of references.

Back reference 
Back references are located after the captured group. For example, the pattern \s(\w*)\s\1 looks for adjacent duplicate words in a text. Here \1 refers to the first captured group. For the input "This is is a test", the duplicated word "is" is matched. Example 39
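The back reference pattern above can be tried in C++ as sketched below, assuming std::regex with its default ECMAScript grammar, which supports numbered back references. The function name first_repeated_word is hypothetical.

```cpp
#include <regex>
#include <string>

// Returns the first word that is immediately repeated, or "" if there is none.
std::string first_repeated_word(const std::string& text)
{
    static const std::regex re(R"(\s(\w*)\s\1)");
    std::smatch m;
    if (std::regex_search(text, m, re))
        return m[1].str();    // the text that \1 had to repeat
    return std::string();
}
```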

Forward reference 
Forward references are located before the captured group they refer to. For example, the pattern (\2two|(one)) refers to the text "one" of the second captured group before that group appears in the pattern. For the input "oneonetwo" a match is made. Example 40

Nested  reference 
Nested references are defined within a captured group and refer to a sub group defined within it. For example, (\1two|(one))\1 refers to the first relative captured group within the outer captured group. For the input "oneonetwo" a match is made. Example 41

In addition to the above, references can also be named and used with the \k switch. Relative referencing with negative numbers is also supported using the \k switch.

Named references
Captured groups can have names, which can be used instead of numbers. For example, the back reference pattern discussed earlier can be rewritten as \s(?'dup'\w*\s)\k'dup'. Example 43 Note that the captured group is named dup. The syntax to define a name is ?'name', and the group is referenced as \k'name'. Alternatively, \g'name' can be used for referencing. Example 42
Duplicate names are allowed; however, the last captured group with the same name is used for matching.

Relative references
References to relatively placed capture groups are written as \k-n, where n counts backwards from the reference, starting with 1. For example, the pattern (a)(b)(c)\k-3 matches abca. Similarly, (a)(b)(c)\k-1 matches abcc. Another pattern, (a)(b)(c\k-2), matches abcb. Example 44

Advanced Features
Flags
The behavior of the regular expression engine can be changed by setting flags in the pattern, for example to make the search case insensitive. The syntax (?gi) turns on the global flag and the case insensitive flag. Example 45 Similarly, (?gi-mx) turns on the global and case insensitive flags and turns off the multi-line and ignore whitespace flags.
The following is a partial list of flags supported by most engines.
g: matches the pattern multiple times
i: makes the regex case insensitive
m: enables multi-line mode, where ^ and $ match the beginning and end of each line. Without it, ^ and $ match only the start and end of the entire string.
u: enables support for Unicode
s: short for single line; causes . to also match newline characters
x: ignores whitespace in the pattern
U: makes quantifiers lazy by default

Unicode support
First, some basic information. The Unicode standard describes how to represent the characters of all the languages in the world, living or dead. In Unicode, a code point describes a unique element of a script. It can be a character or a combining mark, for example a (U+61) and the combining grave accent (U+300). A unit of readable text is called a grapheme cluster. It can consist of one or more code points, for example a (U+61) or à (U+61 followed by U+300). Note that à can also be represented as a single code point (U+E0).
The Unicode equivalent of the dot (.) is \X, which matches a whole grapheme cluster and, unlike the dot, also matches line breaks. A single code point can be written as \x{FFFF}, where FFFF is the code point.
Unicode categories
Unicode also defines categories, which are written as \p{xxx}, where xxx can be letters (\p{L}), marks (\p{M}), numbers (\p{N}), currency symbols (\p{Sc}) etc. \P{xxx} matches anything that does not belong to that category.
Examples: 
 \p{Sc} matches "Prices: $2, 1, ¥9"  Example 46

\p{M}*\p{L}* matches the Kannada script "ಖ್ರಿಷಾ" as six different code points ಖ ್ ರ ಿ ಷ ಾ Example 47
Here \p{L} maps to ಖ ರ ಷ and \p{M} maps to ್ ಿ ಾ
combining code points ಖ and ್ yields ಖ್
combining code points ರ and ಿ yields ರಿ
combining ಖ್ and ರಿ yields ಖ್ರಿ
combining code points ಷ and ಾ yields ಷಾ
combining ಖ್ರಿ and ಷಾ yields ಖ್ರಿಷಾ

Branch Reset Groups
Consider the pattern (1a)|(2a)|(1b)\1. It defines three capture groups. It might be expected to match the input 1a1a, but it does not. The solution is to use a branch reset group. The pattern (?|(1a)|(2a)|(1b))\1 defines one capture group and matches the inputs 1a1a, 2a2a or 1b1b. Example 48

LookAround
There are 4 types of lookaround: positive and negative lookahead, and positive and negative lookbehind. Lookarounds are zero-length assertions, just like the start and end of line or the start and end of word anchors. The difference is that a lookaround actually matches characters but then gives up the match, returning only the result: match or no match.
Negative lookahead is used to match something not followed by something else. For example, q(?!u) matches the q in words like qack but not in quit.
Positive lookahead works the opposite way. For example, q(?=u) matches the q in words like quit but not in qack.
Lookbehind works backwards. Negative lookbehind is used to match something not preceded by something else. For example, (?<!a)b matches a "b" that is not preceded by an "a". It does not match cab, but matches the b (and only the b) in bed or debt. Positive lookbehind is the opposite:
(?<=a)b matches the b (and only the b) in cab, but does not match bed or debt.

Detailed example
Let's say there is an inventory of different writing items in different colors as below:

black pen
black pencil
red pen
red crayon
purple crayon

The lookaround constructs can be used to filter out unique items based on certain criteria, as discussed below.

Lookaround | Pattern | Description | Demo
Positive lookahead | (\w+) (?=pen\s) | Extract the colors of all the pens in the inventory (black, red) | Example 49
Negative lookahead | (\w+) (?!pen\s) | Extract the colors of all the items in the inventory that are not pens (black, red, purple) | Example 50
Positive lookbehind | (?<=black) (\w+) | Extract all the black items in the inventory (pen, pencil) | Example 51
Negative lookbehind | (?<!black) (\w+) | Extract all the items in the inventory that are not black (pen, crayon) | Example 52

LookBehind with \K
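Positive lookbehind, i.e. (?<=expression), places restrictions on the expression it can contain (many engines require it to have a fixed length). As an alternative, the \K switch can be used: everything matched before \K is kept out of the reported match. For example, the pattern
(ab\Kc|d\Ke)f matches cf when preceded by ab, and ef when preceded by d, so it finds cf in abcf and ef in def  Example 53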

Atomic Grouping
An atomic group is a group that, when the regex engine exits from it, automatically throws away all backtracking positions remembered by any tokens inside the group. Atomic groups are non-capturing. The syntax is (?>group). Lookaround groups are also atomic. 
Example: The pattern a(?>bc|b)c matches abcc but not abc. Example 54 When applied to abc, a matches a, bc matches bc, and then c fails to match at the end of the string. Because the group is atomic, the engine cannot backtrack into it to try the alternative b, as it would with an ordinary group, so failure is reported.

If-Then-Else Conditionals 
If-Then-Else is a special construct that allows the creation of conditional regular expressions. If the condition evaluates to true, the regex engine attempts to match the then part. Otherwise, the else part is attempted instead. The syntax is as below:
(?(condition)then|else)
The else part is optional. The condition can be the number of a capture group (true if that group took part in the match), a lookaround, etc.
Example: 
Consider (?:(a)|(b)|(c))(?(n)x|y) where n can be 1 or 2 or 3.

Pattern | if | else | Demo
(?:(a)|(b)|(c))(?(1)x|y) | ax | by, cy | Example 55
(?:(a)|(b)|(c))(?(2)x|y) | bx | ay, cy | Example 56
(?:(a)|(b)|(c))(?(3)x|y) | cx | ay, by | Example 57

Recursion
Suppose the task is to check whether an arbitrary number of opening and closing braces, such as () or {}, are balanced. Regular expression recursion comes to the rescue.
The syntax for recursion is (?R)
For example, \{(?R)?\} matches the input {{{}}} but fails on {{{}}}} Example 58
Here every { is matched with a corresponding }. First, \{ matches the first { in the string. Then the regex engine reaches (?R). This tells the engine to attempt the whole regex again at the present position in the string. Now, \{ matches the second { in the string. The engine reaches (?R) again. On the second recursion, \{ matches the third {. On the third recursion, \{ fails to match the first } in the string. This causes (?R) to fail. But the regex uses a quantifier to make (?R) optional. So the engine continues with \}, which matches the first } in the string.
Now, the regex engine has reached the end of the regex. But since it’s two levels deep in recursion, it hasn’t found an overall match yet. It only has found a match for (?R). Exiting the recursion after a successful match, the engine also reaches }. It now matches the second } in the string. The engine is still one level deep in recursion, from which it exits with a successful match. Finally, } matches the third } in the string. The engine is again at the end of the regex. This time, it’s not inside any recursion. Thus, it returns {{{}}} as the overall regex match.
The main purpose of recursion is to match balanced constructs or nested constructs as shown below.
Example: The pattern \((?>[^()]|(?R))*\) matches the input 
( 1000 - ( 22 / ( 7 + 4  ) * 8  ) *  9  ) Example 59

Subroutines
Subroutines are applied to capture groups. They are very similar to regular expression recursion. Instead of matching the entire regular expression again, a subroutine call only matches the regular expression inside a capturing group. A subroutine call can be made to any capturing group from anywhere in the pattern. A call made to a capturing group from inside that same group leads to recursion.
Subroutines can be called in different ways: (?1) calls a numbered group, (?+1) calls the next group, (?-1) calls the preceding group, and (?&name) calls a named group.
For example, (?+1)(?'name'[abc])(?1)(?-1)(?&name) matches a string that is five letters long and consists only of the first three letters of the alphabet, such as abcab, cabab, etc. Example 60
This regex is exactly the same as [abc](?'name'[abc])[abc][abc][abc]
Another example is ([abc])(?1){4}, which matches cabab  Example 61
Recursion into a capturing group is a more flexible way of matching balanced constructs than recursion of the whole regex. We can wrap the regex in a capturing group, recurse into the capturing group instead of the whole regex, and add anchors outside the capturing group. 
The earlier equation-matching example can be written as
(\((?>[^()]|(?1))*\)) 
This matches each balanced parenthesized group in inputs such as ( 10 + 9 ) * ( 13 *7 ) + ( 6 * ( 9 ) * 7 )  Example 62
Another example is to match palindromes 
(?'word'(?'letter'[a-z])(?&word)\k'letter'|[a-z]?)
This matches inputs such as radar , dad , abba etc  Example 63

Using Regular expressions in C++
Captured groups, or the entire match, can also be replaced, as discussed below.
Regular expressions are part of the C++11 standard library; however, it does not support many of the features discussed here. An alternative is the Boost library, which is largely compatible with feature-rich Perl syntax.
As noted earlier, Wandbox can be used to compile and run the examples below. The examples discussed here use the latest C++ compiler along with the latest Boost library.
Broadly, regex programming involves three operations.

Match
Check whether the input text is valid. Examples: date, email address, phone number, SSN, etc.

Search or extract
Extract certain information from the text. Examples: date, email address, phone number, SSN, etc.

Replace
Replace certain information in the text. Examples: date, email address, phone number, SSN, etc.

Example 64 demonstrates all three of the above operations in detail, as seen in its output.
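
The three operations can be sketched with the standard library alone, since none of them needs a Perl-only feature. The sketch below is not the code of Example 64. The SSN-like format (3 digits, 2 digits, 4 digits) and the function names are assumptions made for illustration. Boost offers boost::regex_match, boost::regex_search and boost::regex_replace under the same names.

```cpp
#include <regex>
#include <string>

// Assumed SSN-like format for illustration: 3 digits, 2 digits, 4 digits.
const std::regex ssn_re(R"((\d{3})-(\d{2})-(\d{4}))");

// Match: the whole input must be a well-formed number.
bool is_valid_ssn(const std::string& input)
{
    return std::regex_match(input, ssn_re);
}

// Search: extract the first number embedded in longer text ("" if none).
std::string first_ssn(const std::string& text)
{
    std::smatch m;
    return std::regex_search(text, m, ssn_re) ? m[0].str() : std::string();
}

// Replace: mask the first two groups and keep only the last four digits.
std::string mask_ssn(const std::string& text)
{
    return std::regex_replace(text, ssn_re, "***-**-$3");
}
```

For instance, mask_ssn("id: 123-45-6789 ok") keeps the surrounding text and replaces only the matched number, using $3 to refer to the third captured group.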

Summary of Examples
Name | Category | Description | Github | Regex101/WandBox
Example | Validations | SSN | output | source + output
Example 2 | Validations | Phone # | output | source + output
Example 3 | Validations | Zip code | output | source + output
Example 4 | Validations | Date Formats | output | source + output
Example 5 | Extractions | Area code | output | source + output
Example 6 | Extractions | Duplicate words | output | source + output
Example 7 | Dot Character | Usage | output | source + output
Example 8 | Character class | Simple | output | source + output
Example 9 | Character class | Negation | output | source + output
Example 10 | Character class | Range | output | source + output
Example 11 | Character class | Range Negation | output | source + output
Example 12 | Shorthands | Digits | output | source + output
Example 13 | Shorthands | Non Digits | output | source + output
Example 14 | Shorthands | White Space | output | source + output
Example 15 | Shorthands | Non White Space Word | output | source + output
Example 16 | Shorthands | Non Word | output | source + output
Example 17 | Shorthands | Non Word | output | source + output
Example 18 | Escaping | Usage | output | source + output
Example 19 | Anchors and boundary markers | Line Begin | output | source + output
Example 20 | Anchors and boundary markers | Line End | output | source + output
Example 21 | Anchors and boundary markers | Word Boundary | output | source + output
Example 22 | Anchors and boundary markers | Non Word Boundary | output | source + output
Example 23 | Anchors and boundary markers | Begin of Input | output | source + output
Example 24 | Anchors and boundary markers | End of Previous Match | output | source + output
Example 25 | Anchors and boundary markers | End of Input | output | source + output
Example 26 | Anchors and boundary markers | End of Input2 | output | source + output
Example 27 | Quantifiers | ? | output | source + output
Example 28 | Quantifiers | + | output | source + output
Example 29 | Quantifiers | * | output | source + output
Example 30 | Quantifiers | {n} | output | source + output
Example 31 | Quantifiers | {m,} | output | source + output
Example 32 | Quantifiers | {m,n} | output | source + output
Example 33 | Greedy, Lazy and Possessive Quantifiers | Greedy | output | source + output
Example 34 | Greedy, Lazy and Possessive Quantifiers | Lazy | output | source + output
Example 35 | Greedy, Lazy and Possessive Quantifiers | Capture with OR(|) Operator | output | source + output
Example 36 | Capture groups | Capture | output | source + output
Example 37 | Capture groups | Usage | output | source + output
Example 38 | Non capturing groups | Source | output | source + output
Example 39 | References | Back reference | output | source + output
Example 40 | References | Forward reference | output | source + output
Example 41 | References | Nested reference | output | source + output
Example 42 | References | Named reference | output | source + output
Example 43 | References | Named reference 2 | output | source + output
Example 44 | References | Relative reference | source output | source + output
Example 45 | Flags | Source | output | source + output
Example 46 | Unicode | Currency | output | source + output
Example 47 | Unicode | International | output | source + output
Example 48 | Branch Reset Groups | Source | output | source + output
Example 49 | Look Around | Positive Look Ahead | output | source + output
Example 50 | Look Around | Negative Look Ahead | output | source + output
Example 51 | Look Around | Positive Look Behind | output | source + output
Example 52 | Look Around | Negative Look Behind | output | source + output
Example 53 | Look Around | LookBehind with \K | output | source + output
Example 54 | Atomic Grouping | Source | output | source + output
Example 55 | If-Then-Else Conditionals | Capture Group 1 | output | source + output
Example 56 | If-Then-Else Conditionals | Capture Group 2 | output | source + output
Example 57 | If-Then-Else Conditionals | Capture Group 3 | output | source + output
Example 58 | Recursion | Braces | output | source + output
Example 59 | Recursion | Equation | output | source + output
Example 60 | Subroutines | Three Letter | output | source + output
Example 61 | Subroutines | Three Letter 2 | output | source + output
Example 62 | Subroutines | Equation | output | source + output
Example 63 | Subroutines | Palindromes | output | source + output
Example 64 | Search and Replacement | Source | source output | source + output