A regular expression is a pattern that can be matched against input text. The .NET Framework provides a regular expression engine that performs such matching. A pattern consists of one or more character literals, operators, or constructs.
Constructs for Defining Regular Expressions
There are various categories of characters, operators, and constructs that let you define regular expressions. The following sections describe each category −
- Character escapes
- Character classes
- Anchors
- Grouping constructs
- Quantifiers
- Backreference constructs
- Alternation constructs
- Substitutions
- Miscellaneous constructs
1. Character escapes
A character escape is a backslash followed by one or more characters. The backslash (\) indicates that the character that follows it either is a special character (as in \t or \d) or should be interpreted literally (as in \* or \+).
The following table lists the escape characters −
Escape character | Description | Pattern | Matches |
---|---|---|---|
\a | Matches a bell character, \u0007. | \a | “\u0007” in “Warning!” + ‘\u0007’ |
\b | In a character class, matches a backspace, \u0008. | [\b]{3,} | “\b\b\b\b” in “\b\b\b\b” |
\t | Matches a tab, \u0009. | (\w+)\t | “Name\t”, “Addr\t” in “Name\tAddr\t” |
\r | Matches a carriage return, \u000D. (\r is not equivalent to the newline character, \n.) | \r\n(\w+) | “\r\nHello” in “\r\nHello\nWorld.” |
\v | Matches a vertical tab, \u000B. | [\v]{2,} | “\v\v\v” in “\v\v\v” |
\f | Matches a form feed, \u000C. | [\f]{2,} | “\f\f\f” in “\f\f\f” |
\n | Matches a new line, \u000A. | \r\n(\w+) | “\r\nHello” in “\r\nHello\nWorld.” |
\e | Matches an escape, \u001B. | \e | “\x001B” in “\x001B” |
\nnn | Uses octal representation to specify a character (nnn consists of up to three digits). | \w\040\w | “a b”, “c d” in “a bc d” |
\xnn | Uses hexadecimal representation to specify a character (nn consists of exactly two digits). | \w\x20\w | “a b”, “c d” in “a bc d” |
\cX or \cx | Matches the ASCII control character that is specified by X or x, where X or x is the letter of the control character. | \cC | “\x0003” in “\x0003” (Ctrl-C) |
\unnnn | Matches a Unicode character by using hexadecimal representation (exactly four digits, as represented by nnnn). | \w\u0020\w | “a b”, “c d” in “a bc d” |
\ | When followed by a character that is not recognized as an escaped character, matches that character. | \d+[\+-x\*]\d+ | “2+2” and “3*9” in “(2+2) * 3*9” |
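C# also processes backslash escapes in ordinary string literals, so patterns are usually written as verbatim strings (@"...") to let the regex engine see each backslash unchanged; in an ordinary literal, "\b" would become a backspace character rather than the regex escape. The following minimal sketch, assuming .NET's System.Text.RegularExpressions, runs several patterns from the table above. The class EscapeDemo and its FindAll helper are illustrative names, not part of .NET.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class EscapeDemo
{
    // Returns the text of every match of pattern in input.
    public static string[] FindAll(string input, string pattern) =>
        Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        // In the verbatim pattern @"(\w+)\t" the C# compiler leaves \t alone,
        // so the regex engine itself interprets it as a tab.
        // The input "Name\tAddr\t" is an ordinary literal, so it contains real tabs.
        foreach (string s in FindAll("Name\tAddr\t", @"(\w+)\t"))
            Console.WriteLine(s.Replace("\t", "<TAB>"));   // Name<TAB>, then Addr<TAB>

        // Octal \040, hexadecimal \x20 and Unicode \u0020 all denote a space.
        Console.WriteLine(string.Join(", ", FindAll("a bc d", @"\w\040\w")));
        Console.WriteLine(string.Join(", ", FindAll("a bc d", @"\w\x20\w")));
        Console.WriteLine(string.Join(", ", FindAll("a bc d", @"\w\u0020\w")));

        // \+ and \* escape the plus and asterisk; note that \+-x also forms
        // a character range from '+' to 'x' inside the class.
        Console.WriteLine(string.Join(", ", FindAll("(2+2) * 3*9", @"\d+[\+-x\*]\d+")));
    }
}
```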
2. Character classes
A character class matches any one of a set of characters. The following table describes the character classes −
Character class | Description | Pattern | Matches |
---|---|---|---|
[character_group] | Matches any single character in character_group. By default, the match is case-sensitive. | [mn] | “m” in “mat”; “m”, “n” in “moon” |
[^character_group] | Negation: Matches any single character that is not in character_group. By default, characters in character_group are case-sensitive. | [^aei] | “v”, “l” in “avail” |
[first-last] | Character range: Matches any single character in the range from first to last. | [b-d]irds | “birds”, “cirds”, “dirds” |
. | Wildcard: Matches any single character except \n. | a.e | “ave” in “have”; “ate” in “mate” |
\p{name} | Matches any single character in the Unicode general category or named block specified by name. | \p{Lu} | “C”, “L” in “City Lights” |
\P{name} | Matches any single character that is not in the Unicode general category or named block specified by name. | \P{Lu} | “i”, “t”, “y” in “City” |
\w | Matches any word character. | \w | “R”, “o”, “o”, “m”, “1” in “Room#1” |
\W | Matches any non-word character. | \W | “#” in “Room#1” |
\s | Matches any white-space character. | \w\s | “D ” in “ID A1.3” |
\S | Matches any non-white-space character. | \s\S | ” _” in “int __ctr” |
\d | Matches any decimal digit. | \d | “4” in “4 = IV” |
\D | Matches any character other than a decimal digit. | \D | ” “, “=”, ” “, “I”, “V” in “4 = IV” |
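The sketch below, again assuming .NET's Regex API, checks several rows of the table and shows one .NET-specific detail: by default \d matches any Unicode decimal digit (general category Nd), while RegexOptions.ECMAScript restricts it to 0-9. CharClassDemo and Show are hypothetical names used only for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CharClassDemo
{
    // Joins the values of all matches with ", " for easy display.
    public static string Show(string input, string pattern, RegexOptions options = RegexOptions.None) =>
        string.Join(", ", Regex.Matches(input, pattern, options).Cast<Match>().Select(m => m.Value));

    public static void Main()
    {
        Console.WriteLine(Show("avail", "[^aei]"));         // v, l
        Console.WriteLine(Show("City Lights", @"\p{Lu}"));   // C, L
        Console.WriteLine(Show("Room#1", @"\w"));            // R, o, o, m, 1
        Console.WriteLine(Show("4 = IV", @"\d"));            // 4

        // U+0663 ARABIC-INDIC DIGIT THREE is a Unicode decimal digit.
        Console.WriteLine(Show("\u0663", @"\d"));                            // the digit itself
        Console.WriteLine(Show("\u0663", @"\d", RegexOptions.ECMAScript));   // empty line
    }
}
```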
3. Anchors
Anchors allow a match to succeed or fail depending on the current position in the string. The following table lists the anchors −
Assertion | Description | Pattern | Matches |
---|---|---|---|
^ | The match must start at the beginning of the string (in multiline mode, also at the beginning of each line). | ^\d{3} | “567” in “567-777-” |
$ | The match must occur at the end of the string or before \n at the end of the string (in multiline mode, also at the end of each line). | -\d{4}$ | “-2012” in “8-12-2012” |
\A | The match must occur at the start of the string. | \A\w{4} | “Code” in “Code-007-” |
\Z | The match must occur at the end of the string or before \n at the end of the string. | -\d{3}\Z | “-007” in “Bond-901-007” |
\z | The match must occur at the end of the string. | -\d{3}\z | “-333” in “-901-333” |
\G | The match must occur at the point where the previous match ended. | \G\(\d\) | “(1)”, “(3)”, “(5)” in “(1)(3)(5)[7](9)” |
\b | The match must occur on a boundary between a \w (word) and a \W (nonword) character. | \b\w+\s\w+\b | “them theme”, “them them” in “them theme them them” |
\B | The match must not occur on a \b boundary. | \Bend\w*\b | “ends”, “ender” in “end sends endure lender” |
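Whether ^ and $ match at line breaks depends on RegexOptions.Multiline. This minimal sketch, assuming .NET's Regex API (AnchorDemo and Show are illustrative helper names), contrasts the two modes and exercises \G and \b with inputs from the table.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // Joins the values of all matches with ", ".
    public static string Show(string input, string pattern, RegexOptions options = RegexOptions.None) =>
        string.Join(", ", Regex.Matches(input, pattern, options).Cast<Match>().Select(m => m.Value));

    public static void Main()
    {
        string twoLines = "567-777-\n123-456-";
        // Without Multiline, ^ anchors only at the start of the whole string.
        Console.WriteLine(Show(twoLines, @"^\d{3}"));                          // 567
        // With Multiline, ^ also anchors right after each \n.
        Console.WriteLine(Show(twoLines, @"^\d{3}", RegexOptions.Multiline));  // 567, 123
        // \G requires each match to start where the previous one ended,
        // so matching stops at "[7]".
        Console.WriteLine(Show("(1)(3)(5)[7](9)", @"\G\(\d\)"));              // (1), (3), (5)
        Console.WriteLine(Show("them theme them them", @"\b\w+\s\w+\b"));     // them theme, them them
    }
}
```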
4. Grouping constructs
Grouping constructs delineate sub-expressions of a regular expression and capture substrings of an input string. The following table lists the grouping constructs −
Grouping construct | Description | Pattern | Matches |
---|---|---|---|
(subexpression) | Captures the matched subexpression and assigns it a one-based ordinal number. | (\w)\1 | “ee” in “deep” |
(?<name>subexpression) | Captures the matched subexpression into a named group. | (?<double>\w)\k<double> | “ee” in “deep” |
(?<name1-name2>subexpression) | Defines a balancing group definition. | (((?'Open'\()[^\(\)]*)+((?'Close-Open'\))[^\(\)]*)+)*(?(Open)(?!))$ | “((1-3)*(3-1))” in “3+2^((1-3)*(3-1))” |
(?:subexpression) | Defines a noncapturing group. | Write(?:Line)? | “WriteLine” in “Console.WriteLine()” |
(?imnsx-imnsx:subexpression) | Applies or disables the specified options within subexpression. | A\d{2}(?i:\w+)\b | “A12xl”, “A12XL” in “A12xl A12XL a12xl” |
(?=subexpression) | Zero-width positive lookahead assertion. | \w+(?=\.) | “is”, “ran”, and “out” in “He is. The dog ran. The sun is out.” |
(?!subexpression) | Zero-width negative lookahead assertion. | \b(?!un)\w+\b | “sure”, “used” in “unsure sure unity used” |
(?<=subexpression) | Zero-width positive lookbehind assertion. | (?<=19)\d{2}\b | “99”, “50”, “05” in “1851 1999 1950 1905 2003” |
(?<!subexpression) | Zero-width negative lookbehind assertion. | (?<!19)\d{2}\b | “51”, “03” in “1851 1999 1950 1905 2003” |
(?>subexpression) | Nonbacktracking (atomic) subexpression. | [13579](?>A+B+) | “1ABB”, “3ABB”, and “5AB” in “1ABB 3ABBC 5AB 5AC” |
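Captured groups are read back through Match.Groups, either by number or by name. This sketch, assuming .NET's Regex API and using hypothetical helper names (GroupDemo, DoubledChar, GroupCount, Show), shows a named group, how a noncapturing group changes the group count, and two lookaround assertions from the table.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class GroupDemo
{
    // Text captured by the named group "double"; "" when there is no match.
    public static string DoubledChar(string input) =>
        Regex.Match(input, @"(?<double>\w)\k<double>").Groups["double"].Value;

    // Number of groups in the first match; group 0 (the whole match) always counts.
    public static int GroupCount(string input, string pattern) =>
        Regex.Match(input, pattern).Groups.Count;

    // Joins the values of all matches with ", ".
    public static string Show(string input, string pattern) =>
        string.Join(", ", Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value));

    public static void Main()
    {
        Console.WriteLine(DoubledChar("deep"));                                        // e
        Console.WriteLine(GroupCount("Console.WriteLine()", @"Write(Line)?"));         // 2
        Console.WriteLine(GroupCount("Console.WriteLine()", @"Write(?:Line)?"));       // 1
        // Lookarounds test text without including it in the match.
        Console.WriteLine(Show("He is. The dog ran. The sun is out.", @"\w+(?=\.)"));  // is, ran, out
        Console.WriteLine(Show("1851 1999 1950 1905 2003", @"(?<=19)\d{2}\b"));       // 99, 50, 05
    }
}
```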
5. Quantifiers
Quantifiers specify how many instances of the previous element (which can be a character, a group, or a character class) must be present in the input string for a match to occur.
Quantifier | Description | Pattern | Matches |
---|---|---|---|
* | Matches the previous element zero or more times. | \d*\.\d | “.0”, “19.9”, “219.9” |
+ | Matches the previous element one or more times. | be+ | “bee” in “been”, “be” in “bent” |
? | Matches the previous element zero or one time. | rai?n | “ran”, “rain” |
{n} | Matches the previous element exactly n times. | ,\d{3} | “,043” in “1,043.6”; “,876”, “,543”, and “,210” in “9,876,543,210” |
{n,} | Matches the previous element at least n times. | \d{2,} | “166”, “29”, “1930” |
{n,m} | Matches the previous element at least n times, but no more than m times. | \d{3,5} | “166”, “17668”; “19302” in “193024” |
*? | Matches the previous element zero or more times, but as few times as possible. | \d*?\.\d | “.0”, “19.9”, “219.9” |
+? | Matches the previous element one or more times, but as few times as possible. | be+? | “be” in “been”, “be” in “bent” |
?? | Matches the previous element zero or one time, but as few times as possible. | rai??n | “ran”, “rain” |
{n}? | Matches the preceding element exactly n times. | ,\d{3}? | “,043” in “1,043.6”; “,876”, “,543”, and “,210” in “9,876,543,210” |
{n,}? | Matches the previous element at least n times, but as few times as possible. | \d{2,}? | “166”, “29”, “1930” |
{n,m}? | Matches the previous element between n and m times, but as few times as possible. | \d{3,5}? | “166”, “17668”; “193”, “024” in “193024” |
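The difference between greedy and lazy quantifiers is easiest to see side by side. A minimal sketch, assuming .NET's Regex API (QuantifierDemo and Show are illustrative names), using patterns from the table plus one markup-like input:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class QuantifierDemo
{
    // Joins the values of all matches with ", ".
    public static string Show(string input, string pattern) =>
        string.Join(", ", Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value));

    public static void Main()
    {
        Console.WriteLine(Show("been bent", "be+"));        // bee, be
        Console.WriteLine(Show("been bent", "be+?"));       // be, be
        Console.WriteLine(Show("193024", @"\d{3,5}"));      // 19302
        Console.WriteLine(Show("193024", @"\d{3,5}?"));     // 193, 024
        // Greedy .+ runs to the last '>', lazy .+? stops at the first one.
        Console.WriteLine(Show("<b>bold</b>", "<.+>"));     // <b>bold</b>
        Console.WriteLine(Show("<b>bold</b>", "<.+?>"));    // <b>, </b>
    }
}
```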
6. Backreference constructs
Backreference constructs allow a previously matched sub-expression to be identified subsequently in the same regular expression.
The following table lists these constructs −
Backreference construct | Description | Pattern | Matches |
---|---|---|---|
\number | Backreference. Matches the value of a numbered subexpression. | (\w)\1 | “ee” in “seek” |
\k<name> | Named backreference. Matches the value of a named expression. | (?<char>\w)\k<char> | “ee” in “seek” |
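A common use of backreferences is finding a word that is accidentally repeated. The sketch below assumes .NET's Regex API; BackrefDemo and RepeatedWords are hypothetical names, and RegexOptions.IgnoreCase makes the backreference compare case-insensitively so that "The the" is also reported.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class BackrefDemo
{
    // Returns each word that is immediately repeated, e.g. "the" for "the the".
    // \k<word> must match exactly the text captured by (?<word>\w+).
    public static string[] RepeatedWords(string text) =>
        Regex.Matches(text, @"\b(?<word>\w+)\s+\k<word>\b", RegexOptions.IgnoreCase)
             .Cast<Match>()
             .Select(m => m.Groups["word"].Value)
             .ToArray();

    public static void Main()
    {
        Console.WriteLine(Regex.Match("seek", @"(\w)\1").Value);   // ee
        foreach (string w in RepeatedWords("This is is a test of The the regex"))
            Console.WriteLine(w);                                  // is, then The
    }
}
```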
7. Alternation constructs
Alternation constructs modify a regular expression to enable either/or matching. The following table lists the alternation constructs −
Alternation construct | Description | Pattern | Matches |
---|---|---|---|
\| | Matches any one element separated by the vertical bar (\|) character. | th(e\|is\|at) | “this”, “the” in “this is the day.” |
(?(expression)yes\|no) | Matches yes if expression matches; otherwise, matches the optional no part. expression is interpreted as a zero-width assertion. | (?(A)A\d{2}\b\|\b\d{3}\b) | “A10”, “910” in “A10 C103 910” |
(?(name)yes\|no) | Matches yes if the named capture name has a match; otherwise, matches the optional no. | (?<quoted>")?(?(quoted).+?"\|\S+\s) | “Dogs.jpg ” and “"Yiska playing.jpg"” in “Dogs.jpg "Yiska playing.jpg"” |
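The .NET engine tries alternatives from left to right and keeps the first one that lets the overall match succeed, so the order of alternatives can change the result. A minimal sketch, assuming .NET's Regex API, with illustrative names (AlternationDemo, Show):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class AlternationDemo
{
    // Joins the values of all matches with ", ".
    public static string Show(string input, string pattern) =>
        string.Join(", ", Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value));

    public static void Main()
    {
        Console.WriteLine(Show("this is the day.", "th(e|is|at)"));             // this, the
        Console.WriteLine(Show("A10 C103 910", @"(?(A)A\d{2}\b|\b\d{3}\b)"));   // A10, 910
        // When one alternative is a prefix of another, the earlier one wins here
        // because nothing after the alternation forces a retry.
        Console.WriteLine(Show("catalog", "cat|catalog"));                      // cat
        Console.WriteLine(Show("catalog", "catalog|cat"));                      // catalog
    }
}
```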
8. Substitutions
Substitutions are used in replacement patterns. The following table lists the substitutions −
Character | Description | Pattern | Replacement pattern | Input string | Resulting string |
---|---|---|---|---|---|
$number | Substitutes the substring matched by group number. | \b(\w+)(\s)(\w+)\b | $3$2$1 | “one two” | “two one” |
${name} | Substitutes the substring matched by the named group name. | \b(?<word1>\w+)(\s)(?<word2>\w+)\b | ${word2} ${word1} | “one two” | “two one” |
$$ | Substitutes a literal “$”. | \b(\d+)\s?USD | $$$1 | “103 USD” | “$103” |
$& | Substitutes a copy of the whole match. | \$?\d*\.?\d+ | **$&** | “$1.30” | “**$1.30**” |
$` | Substitutes all the text of the input string before the match. | B+ | $` | “AABBCC” | “AAAACC” |
$' | Substitutes all the text of the input string after the match. | B+ | $' | “AABBCC” | “AACCCC” |
$+ | Substitutes the last group that was captured. | B+(C+) | $+ | “AABBCCDD” | “AACCDD” |
$_ | Substitutes the entire input string. | B+ | $_ | “AABBCC” | “AAAABBCCCC” |
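Substitutions are passed as the replacement argument of Regex.Replace. This sketch, assuming .NET's Regex API (SubstitutionDemo, SwapWords, UsdToDollar and DoubleNumbers are hypothetical names), applies several rows of the table and, for comparison, the MatchEvaluator overload, which computes each replacement in code instead of using substitution syntax.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SubstitutionDemo
{
    // Swaps two words using named-group substitutions.
    public static string SwapWords(string input) =>
        Regex.Replace(input, @"\b(?<word1>\w+)(\s)(?<word2>\w+)\b", "${word2} ${word1}");

    // $$ produces a literal dollar sign; $1 inserts the first captured group.
    public static string UsdToDollar(string input) =>
        Regex.Replace(input, @"\b(\d+)\s?USD", "$$$1");

    // MatchEvaluator: the lambda receives each Match and returns its replacement.
    public static string DoubleNumbers(string input) =>
        Regex.Replace(input, @"\d+", m => (int.Parse(m.Value) * 2).ToString());

    public static void Main()
    {
        Console.WriteLine(SwapWords("one two"));                     // two one
        Console.WriteLine(UsdToDollar("103 USD"));                   // $103
        Console.WriteLine(Regex.Replace("AABBCC", "B+", "$`"));      // AAAACC
        Console.WriteLine(Regex.Replace("AABBCC", "B+", "$'"));      // AACCCC
        Console.WriteLine(DoubleNumbers("4 apples and 15 pears"));   // 8 apples and 30 pears
    }
}
```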
9. Miscellaneous constructs
The following table lists various miscellaneous constructs −
Construct | Definition | Example |
---|---|---|
(?imnsx-imnsx) | Sets or disables options such as case insensitivity in the middle of a pattern. | \bA(?i)b\w+\b matches “ABA”, “Able” in “ABA Able Act” |
(?#comment) | Inline comment. The comment ends at the first closing parenthesis. | \bA(?#Matches words starting with A)\w+\b |
# [to end of line] | X-mode comment. The comment starts at an unescaped # and continues to the end of the line. | (?x)\bA\w+\b#Matches words starting with A |
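Inline options such as (?x) have RegexOptions equivalents that can be passed to the Regex constructor. This sketch, assuming .NET's Regex API with illustrative names (OptionsDemo, WordsStartingWithA, Show), uses RegexOptions.IgnorePatternWhitespace to write a commented multi-line pattern and also applies the (?i) and (?#...) rows of the table.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class OptionsDemo
{
    // In x-mode, spaces and line breaks in the pattern are ignored and an
    // unescaped # starts a comment that runs to the end of the line.
    public static readonly Regex WordsStartingWithA = new Regex(@"
        \bA     # a word boundary followed by a capital A
        \w+\b   # the rest of the word",
        RegexOptions.IgnorePatternWhitespace);

    // Joins the values of all matches with ", ".
    public static string Show(Regex regex, string input) =>
        string.Join(", ", regex.Matches(input).Cast<Match>().Select(m => m.Value));

    public static void Main()
    {
        Console.WriteLine(Show(WordsStartingWithA, "ABA Able Act"));                      // ABA, Able, Act
        // (?i) turns on case-insensitive matching from that point in the pattern onward.
        Console.WriteLine(Show(new Regex(@"\bA(?i)b\w+\b"), "ABA Able Act"));             // ABA, Able
        // (?#...) is an inline comment and does not affect matching.
        Console.WriteLine(Show(new Regex(@"\bA(?#starts with A)\w+\b"), "ABA Able Act")); // ABA, Able, Act
    }
}
```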
The Regex Class
The Regex class is used for representing a regular expression. It has the following commonly used methods −
Sr.No. | Method | Description |
---|---|---|
1 | public bool IsMatch(string input) | Indicates whether the regular expression specified in the Regex constructor finds a match in a specified input string. |
2 | public bool IsMatch(string input, int startat) | Indicates whether the regular expression specified in the Regex constructor finds a match in the specified input string, beginning at the specified starting position in the string. |
3 | public static bool IsMatch(string input, string pattern) | Indicates whether the specified regular expression finds a match in the specified input string. |
4 | public MatchCollection Matches(string input) | Searches the specified input string for all occurrences of the regular expression. |
5 | public string Replace(string input, string replacement) | In the specified input string, replaces all strings that match the regular expression pattern with the specified replacement string. |
6 | public string[] Split(string input) | Splits an input string into an array of substrings at the positions defined by the regular expression pattern specified in the Regex constructor. |
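The following sketch calls each method from the table, assuming .NET's System.Text.RegularExpressions; the class name RegexClassDemo and the sample strings are illustrative only. Constructing the Regex once and reusing the instance is a common pattern when the same expression is applied many times.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexClassDemo
{
    // One instance, constructed once and reused for several calls.
    public static readonly Regex Digits = new Regex(@"\d+");

    public static void Main()
    {
        string text = "Room 12, Floor 3";

        Console.WriteLine(Digits.IsMatch(text));                  // True
        Console.WriteLine(Digits.IsMatch("42 abc", 2));           // False: no digit at or after index 2
        Console.WriteLine(Regex.IsMatch(text, @"^\d"));           // False: text does not start with a digit

        foreach (Match m in Digits.Matches(text))
            Console.WriteLine($"{m.Value} at index {m.Index}");   // 12 at index 5, then 3 at index 15

        Console.WriteLine(Digits.Replace(text, "#"));             // Room #, Floor #
        Console.WriteLine(string.Join("|", new Regex(@",\s*").Split(text)));  // Room 12|Floor 3
    }
}
```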
For the complete list of methods and properties, see the Microsoft documentation for the System.Text.RegularExpressions.Regex class.
Example 1
The following example matches words that start with ‘S’ −
```csharp
using System;
using System.Text.RegularExpressions;

namespace RegExApplication
{
    class Program
    {
        // Prints the pattern, then every match of it found in text.
        private static void ShowMatch(string text, string expr)
        {
            Console.WriteLine("The Expression: " + expr);
            MatchCollection mc = Regex.Matches(text, expr);

            foreach (Match m in mc)
            {
                Console.WriteLine(m);
            }
        }

        static void Main(string[] args)
        {
            string str = "A Thousand Splendid Suns";

            Console.WriteLine("Matching words that start with 'S': ");
            // \b = word boundary, S = literal 'S', \S* = following non-white-space characters
            ShowMatch(str, @"\bS\S*");
            Console.ReadKey();
        }
    }
}
```
When the above code is compiled and executed, it produces the following result −
```
Matching words that start with 'S':
The Expression: \bS\S*
Splendid
Suns
```
Example 2
The following example matches words that start with ‘m’ and end with ‘e’ −
```csharp
using System;
using System.Text.RegularExpressions;

namespace RegExApplication
{
    class Program
    {
        // Prints the pattern, then every match of it found in text.
        private static void ShowMatch(string text, string expr)
        {
            Console.WriteLine("The Expression: " + expr);
            MatchCollection mc = Regex.Matches(text, expr);

            foreach (Match m in mc)
            {
                Console.WriteLine(m);
            }
        }

        static void Main(string[] args)
        {
            string str = "make maze and manage to measure it";

            Console.WriteLine("Matching words that start with 'm' and end with 'e':");
            // \bm = 'm' at the start of a word, \S* = following non-white-space
            // characters, e\b = an 'e' that ends the word
            ShowMatch(str, @"\bm\S*e\b");
            Console.ReadKey();
        }
    }
}
```
When the above code is compiled and executed, it produces the following result −
```
Matching words that start with 'm' and end with 'e':
The Expression: \bm\S*e\b
make
maze
manage
measure
```
Example 3
This example replaces extra white space −
```csharp
using System;
using System.Text.RegularExpressions;

namespace RegExApplication
{
    class Program
    {
        static void Main(string[] args)
        {
            string input = "Hello   World   ";
            string pattern = "\\s+";    // one or more white-space characters
            string replacement = " ";
            Regex rgx = new Regex(pattern);
            string result = rgx.Replace(input, replacement);

            Console.WriteLine("Original String: {0}", input);
            Console.WriteLine("Replacement String: {0}", result);
            Console.ReadKey();
        }
    }
}
```
When the above code is compiled and executed, it produces the following result −
```
Original String: Hello   World   
Replacement String: Hello World 
```