C# – Regular Expressions

A regular expression is a pattern that can be matched against an input text. The .NET Framework provides a regular expression engine that performs such matching. A pattern consists of one or more character literals, operators, or constructs.

Constructs for Defining Regular Expressions in C#

There are various categories of characters, operators, and constructs that let you define regular expressions. Each category is described in its own section below.

  • Character escapes
  • Character classes
  • Anchors
  • Grouping constructs
  • Quantifiers
  • Backreference constructs
  • Alternation constructs
  • Substitutions
  • Miscellaneous constructs

1. Character escapes

Character escapes let a pattern represent special or non-printing characters. The backslash character (\) in a regular expression indicates that the character that follows it either is a special character or should be interpreted literally.

The following table lists the escape characters −

| Escape character | Description | Pattern | Matches |
| --- | --- | --- | --- |
| \a | Matches a bell character, \u0007. | \a | "\u0007" in "Warning!" + '\u0007' |
| \b | In a character class, matches a backspace, \u0008. | [\b]{3,} | "\b\b\b\b" in "\b\b\b\b" |
| \t | Matches a tab, \u0009. | (\w+)\t | "Name\t", "Addr\t" in "Name\tAddr\t" |
| \r | Matches a carriage return, \u000D. (\r is not equivalent to the newline character, \n.) | \r\n(\w+) | "\r\nHello" in "\r\nHello\nWorld." |
| \v | Matches a vertical tab, \u000B. | [\v]{2,} | "\v\v\v" in "\v\v\v" |
| \f | Matches a form feed, \u000C. | [\f]{2,} | "\f\f\f" in "\f\f\f" |
| \n | Matches a new line, \u000A. | \r\n(\w+) | "\r\nHello" in "\r\nHello\nWorld." |
| \e | Matches an escape, \u001B. | \e | "\x001B" in "\x001B" |
| \nnn | Uses octal representation to specify a character (nnn consists of up to three digits). | \w\040\w | "a b", "c d" in "a bc d" |
| \xnn | Uses hexadecimal representation to specify a character (nn consists of exactly two digits). | \w\x20\w | "a b", "c d" in "a bc d" |
| \cX, \cx | Matches the ASCII control character that is specified by X or x, where X or x is the letter of the control character. | \cC | "\x0003" in "\x0003" (Ctrl-C) |
| \unnnn | Matches a Unicode character by using hexadecimal representation (exactly four digits, as represented by nnnn). | \w\u0020\w | "a b", "c d" in "a bc d" |
| \ | When followed by a character that is not recognized as an escaped character, matches that character. | \d+[\+-x\*]\d+ | "2+2" and "3*9" in "(2+2) * 3*9" |
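
As a rough illustration of a few escapes from the table, the sketch below runs three of the listed patterns through Regex.Matches. The class name EscapeDemo and the helper FindAll are hypothetical names chosen for this example; they are not part of the .NET API.

```csharp
using System;
using System.Text.RegularExpressions;

namespace RegExApplication {
   // Hypothetical demo class; the name is an assumption for this sketch.
   static class EscapeDemo {
      // Returns every match of pattern in input as a string array.
      public static string[] FindAll(string input, string pattern) {
         MatchCollection mc = Regex.Matches(input, pattern);
         string[] results = new string[mc.Count];
         for (int i = 0; i < mc.Count; i++) {
            results[i] = mc[i].Value;
         }
         return results;
      }

      static void Main(string[] args) {
         // In the verbatim pattern, \t is interpreted by the regex engine.
         // In the ordinary input literal, \t is interpreted by the C# compiler.
         foreach (string s in FindAll("Name\tAddr\t", @"(\w+)\t")) {
            Console.WriteLine(s.Replace("\t", "<TAB>"));   // Name<TAB>, Addr<TAB>
         }

         // \x20 and \u0020 both denote a space character.
         Console.WriteLine(string.Join(",", FindAll("a bc d", @"\w\x20\w")));   // a b,c d
         Console.WriteLine(string.Join(",", FindAll("a bc d", @"\w\u0020\w"))); // a b,c d

         // Escaped metacharacters (\+ and \*) are matched literally.
         Console.WriteLine(string.Join(",", FindAll("(2+2) * 3*9", @"\d+[\+-x\*]\d+"))); // 2+2,3*9
      }
   }
}
```

Writing patterns as verbatim strings (prefixed with @) avoids having to double every backslash, which is why the examples in this chapter generally use them.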

2. Character classes

A character class matches any one of a set of characters. The following table describes the character classes −

| Character class | Description | Pattern | Matches |
| --- | --- | --- | --- |
| [character_group] | Matches any single character in character_group. By default, the match is case-sensitive. | [mn] | "m" in "mat"; "m", "n" in "moon" |
| [^character_group] | Negation: Matches any single character that is not in character_group. By default, characters in character_group are case-sensitive. | [^aei] | "v", "l" in "avail" |
| [first-last] | Character range: Matches any single character in the range from first to last. | [b-d]irds | "birds", "cirds", "dirds" in "birds cirds dirds" |
| . | Wildcard: Matches any single character except \n. | a.e | "ave" in "have"; "ate" in "mate" |
| \p{name} | Matches any single character in the Unicode general category or named block specified by name. | \p{Lu} | "C", "L" in "City Lights" |
| \P{name} | Matches any single character that is not in the Unicode general category or named block specified by name. | \P{Lu} | "i", "t", "y" in "City" |
| \w | Matches any word character. | \w | "R", "o", "o", "m", "1" in "Room#1" |
| \W | Matches any non-word character. | \W | "#" in "Room#1" |
| \s | Matches any white-space character. | \w\s | "D " in "ID A1.3" |
| \S | Matches any non-white-space character. | \s\S | " _" in "int __ctr" |
| \d | Matches any decimal digit. | \d | "4" in "4 = IV" |
| \D | Matches any character other than a decimal digit. | \D | " ", "=", " ", "I", "V" in "4 = IV" |
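
The following sketch tries a handful of the character classes above against the sample inputs from the table. CharClassDemo and JoinMatches are hypothetical names introduced only for this example.

```csharp
using System;
using System.Text.RegularExpressions;

namespace RegExApplication {
   // Hypothetical demo class; the name is an assumption for this sketch.
   static class CharClassDemo {
      // Joins all matches with commas so the result is easy to print.
      public static string JoinMatches(string input, string pattern) {
         MatchCollection mc = Regex.Matches(input, pattern);
         string[] parts = new string[mc.Count];
         for (int i = 0; i < mc.Count; i++) {
            parts[i] = mc[i].Value;
         }
         return string.Join(",", parts);
      }

      static void Main(string[] args) {
         Console.WriteLine(JoinMatches("avail", "[^aei]"));         // v,l        (negated group)
         Console.WriteLine(JoinMatches("City Lights", @"\p{Lu}"));  // C,L        (uppercase letters)
         Console.WriteLine(JoinMatches("Room#1", @"\w"));           // R,o,o,m,1  (word characters)
         Console.WriteLine(JoinMatches("Room#1", @"\W"));           // #          (non-word characters)
         Console.WriteLine(JoinMatches("4 = IV", @"\d"));           // 4          (decimal digits)
      }
   }
}
```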

3. Anchors

Anchors allow a match to succeed or fail depending on the current position in the string. The following table lists the anchors −

| Assertion | Description | Pattern | Matches |
| --- | --- | --- | --- |
| ^ | By default, the match must start at the beginning of the string; in multiline mode, it must start at the beginning of the line. | ^\d{3} | "567" in "567-777-" |
| $ | By default, the match must occur at the end of the string or before \n at the end of the string; in multiline mode, it must occur at the end of the line or before \n at the end of the line. | -\d{4}$ | "-2012" in "8-12-2012" |
| \A | The match must occur at the start of the string. | \A\w{4} | "Code" in "Code-007-" |
| \Z | The match must occur at the end of the string or before \n at the end of the string. | -\d{3}\Z | "-007" in "Bond-901-007" |
| \z | The match must occur at the end of the string. | -\d{3}\z | "-333" in "-901-333" |
| \G | The match must occur at the point where the previous match ended. | \G\(\d\) | "(1)", "(3)", "(5)" in "(1)(3)(5)[7](9)" |
| \b | The match must occur on a boundary between a \w (alphanumeric) and a \W (nonalphanumeric) character. | \b\w+\s\w+\b | "them theme", "them them" in "them theme them them" |
| \B | The match must not occur on a \b boundary. | \Bend\w*\b | "ends", "ender" in "end sends endure lender" |
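
To see how the anchors differ in practice, the sketch below compares ^ with and without RegexOptions.Multiline, and $, \Z, and \z on an input that ends in a newline. AnchorDemo and its Count helper are hypothetical names; the sample inputs are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

namespace RegExApplication {
   // Hypothetical demo class; the name is an assumption for this sketch.
   static class AnchorDemo {
      // Counts the matches of pattern in input under the given options.
      public static int Count(string input, string pattern, RegexOptions options) {
         return Regex.Matches(input, pattern, options).Count;
      }

      static void Main(string[] args) {
         string text = "567-777\n901-333";

         // By default ^ matches only at the start of the string.
         Console.WriteLine(Count(text, @"^\d{3}", RegexOptions.None));        // 1
         // With RegexOptions.Multiline it also matches after each \n.
         Console.WriteLine(Count(text, @"^\d{3}", RegexOptions.Multiline));   // 2
         // \A is not affected by Multiline.
         Console.WriteLine(Count(text, @"\A\d{3}", RegexOptions.Multiline));  // 1

         // $ and \Z tolerate one trailing \n; \z does not.
         string line = "-901-333\n";
         Console.WriteLine(Regex.IsMatch(line, @"-\d{3}$"));    // True
         Console.WriteLine(Regex.IsMatch(line, @"-\d{3}\Z"));   // True
         Console.WriteLine(Regex.IsMatch(line, @"-\d{3}\z"));   // False

         // \G requires each match to start where the previous one ended,
         // so matching stops at "[7]".
         Console.WriteLine(Count("(1)(3)(5)[7](9)", @"\G\(\d\)", RegexOptions.None)); // 3
      }
   }
}
```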

4. Grouping constructs

Grouping constructs delineate sub-expressions of a regular expression and capture substrings of an input string. The following table lists the grouping constructs −

| Grouping construct | Description | Pattern | Matches |
| --- | --- | --- | --- |
| (subexpression) | Captures the matched subexpression and assigns it a one-based ordinal number. | (\w)\1 | "ee" in "deep" |
| (?<name>subexpression) | Captures the matched subexpression into a named group. | (?<double>\w)\k<double> | "ee" in "deep" |
| (?<name1-name2>subexpression) | Defines a balancing group definition. | (((?'Open'\()[^\(\)]*)+((?'Close-Open'\))[^\(\)]*)+)*(?(Open)(?!))$ | "((1-3)*(3-1))" in "3+2^((1-3)*(3-1))" |
| (?:subexpression) | Defines a noncapturing group. | Write(?:Line)? | "WriteLine" in "Console.WriteLine()" |
| (?imnsx-imnsx:subexpression) | Applies or disables the specified options within subexpression. | A\d{2}(?i:\w+)\b | "A12xl", "A12XL" in "A12xl A12XL a12xl" |
| (?=subexpression) | Zero-width positive lookahead assertion. | \w+(?=\.) | "is", "ran", and "out" in "He is. The dog ran. The sun is out." |
| (?!subexpression) | Zero-width negative lookahead assertion. | \b(?!un)\w+\b | "sure", "used" in "unsure sure unity used" |
| (?<=subexpression) | Zero-width positive lookbehind assertion. | (?<=19)\d{2}\b | "99", "50", "05" in "1851 1999 1950 1905 2003" |
| (?<!subexpression) | Zero-width negative lookbehind assertion. | (?<!19)\d{2}\b | "51", "03" in "1851 1999 1950 1905 2003" |
| (?>subexpression) | Atomic (nonbacktracking) subexpression. | [13579](?>A+B+) | "1ABB", "3ABB", and "5AB" in "1ABB 3ABBC 5AB 5AC" |
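
A short sketch of how captured groups can be read back from a Match object. GroupDemo and Capture are hypothetical names used only here; the patterns come from the table above.

```csharp
using System;
using System.Text.RegularExpressions;

namespace RegExApplication {
   // Hypothetical demo class; the name is an assumption for this sketch.
   static class GroupDemo {
      // Returns the text captured by the named group, or "" if there is no match.
      public static string Capture(string input, string pattern, string groupName) {
         Match m = Regex.Match(input, pattern);
         return m.Success ? m.Groups[groupName].Value : "";
      }

      static void Main(string[] args) {
         // Named group: the whole match is "ee", the group itself holds one "e".
         Console.WriteLine(Capture("deep", @"(?<double>\w)\k<double>", "double")); // e

         // Numbered groups start at 1; group 0 is always the whole match.
         Match n = Regex.Match("deep", @"(\w)\1");
         Console.WriteLine(n.Groups[0].Value + " / " + n.Groups[1].Value);         // ee / e

         // A noncapturing group adds no entry beyond group 0.
         Match w = Regex.Match("Console.WriteLine()", @"Write(?:Line)?");
         Console.WriteLine(w.Value + " / " + w.Groups.Count);                      // WriteLine / 1

         // A lookahead checks for the period without including it in the match.
         foreach (Match p in Regex.Matches("He is. The dog ran. The sun is out.", @"\w+(?=\.)")) {
            Console.WriteLine(p.Value);                                            // is, ran, out
         }
      }
   }
}
```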

5. Quantifiers

Quantifiers specify how many instances of the previous element (which can be a character, a group, or a character class) must be present in the input string for a match to occur.

| Quantifier | Description | Pattern | Matches |
| --- | --- | --- | --- |
| * | Matches the previous element zero or more times. | \d*\.\d | ".0", "19.9", "219.9" |
| + | Matches the previous element one or more times. | be+ | "bee" in "been", "be" in "bent" |
| ? | Matches the previous element zero or one time. | rai?n | "ran", "rain" |
| {n} | Matches the previous element exactly n times. | ,\d{3} | ",043" in "1,043.6"; ",876", ",543", and ",210" in "9,876,543,210" |
| {n,} | Matches the previous element at least n times. | \d{2,} | "166", "29", "1930" |
| {n,m} | Matches the previous element at least n times, but no more than m times. | \d{3,5} | "166", "17668"; "19302" in "193024" |
| *? | Matches the previous element zero or more times, but as few times as possible. | \d*?\.\d | ".0", "19.9", "219.9" |
| +? | Matches the previous element one or more times, but as few times as possible. | be+? | "be" in "been", "be" in "bent" |
| ?? | Matches the previous element zero or one time, but as few times as possible. | rai??n | "ran", "rain" |
| {n}? | Matches the preceding element exactly n times. | ,\d{3}? | ",043" in "1,043.6"; ",876", ",543", and ",210" in "9,876,543,210" |
| {n,}? | Matches the previous element at least n times, but as few times as possible. | \d{2,}? | "16" in "166"; "29"; "19", "30" in "1930" |
| {n,m}? | Matches the previous element between n and m times, but as few times as possible. | \d{3,5}? | "166"; "176" in "17668"; "193", "024" in "193024" |
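
The difference between greedy and lazy quantifiers is easiest to see side by side. The sketch below is only an illustration; QuantifierDemo and First are hypothetical names, and the HTML-like input is an invented sample.

```csharp
using System;
using System.Text.RegularExpressions;

namespace RegExApplication {
   // Hypothetical demo class; the name is an assumption for this sketch.
   static class QuantifierDemo {
      // Returns the first match of pattern in input, or "" if there is none.
      public static string First(string input, string pattern) {
         return Regex.Match(input, pattern).Value;
      }

      static void Main(string[] args) {
         // Greedy + takes as many "e" characters as it can; lazy +? takes one.
         Console.WriteLine(First("been", "be+"));           // bee
         Console.WriteLine(First("been", "be+?"));          // be

         // Greedy {3,5} takes five digits; lazy {3,5}? stops at three.
         Console.WriteLine(First("193024", @"\d{3,5}"));    // 19302
         Console.WriteLine(First("193024", @"\d{3,5}?"));   // 193

         // Greedy .+ runs to the last ">", lazy .+? stops at the first one.
         Console.WriteLine(First("<b>bold</b>", "<.+>"));   // <b>bold</b>
         Console.WriteLine(First("<b>bold</b>", "<.+?>"));  // <b>
      }
   }
}
```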

6. Backreference constructs

Backreference constructs allow a previously matched sub-expression to be identified subsequently in the same regular expression.

The following table lists these constructs −

| Backreference construct | Description | Pattern | Matches |
| --- | --- | --- | --- |
| \number | Backreference. Matches the value of a numbered subexpression. | (\w)\1 | "ee" in "seek" |
| \k<name> | Named backreference. Matches the value of a named expression. | (?<char>\w)\k<char> | "ee" in "seek" |
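
A common use of backreferences is finding repeated text. The sketch below uses the two patterns from the table and adds a doubled-word pattern as an illustration; BackreferenceDemo, First, and the sample sentence are assumptions made for this example.

```csharp
using System;
using System.Text.RegularExpressions;

namespace RegExApplication {
   // Hypothetical demo class; the name is an assumption for this sketch.
   static class BackreferenceDemo {
      // Returns the first match of pattern in input, or "" if there is none.
      public static string First(string input, string pattern) {
         return Regex.Match(input, pattern).Value;
      }

      static void Main(string[] args) {
         // \1 must repeat whatever group 1 captured.
         Console.WriteLine(First("seek", @"(\w)\1"));                // ee
         // \k<char> does the same for the group named "char".
         Console.WriteLine(First("seek", @"(?<char>\w)\k<char>"));   // ee
         // A whole word followed by white space and the same word again.
         Console.WriteLine(First("this is is a test", @"\b(?<word>\w+)\s+\k<word>\b")); // is is
      }
   }
}
```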

7. Alternation constructs

Alternation constructs modify a regular expression to enable either/or matching. The alternation constructs are listed below −

  • |
    Description: Matches any one element separated by the vertical bar (|) character.
    Pattern: th(e|is|at)
    Matches: "the", "this" in "this is the day. "
  • (?(expression)yes|no)
    Description: Matches yes if expression matches; otherwise, matches the optional no part. Expression is interpreted as a zero-width assertion.
    Pattern: (?(A)A\d{2}\b|\b\d{3}\b)
    Matches: "A10", "910" in "A10 C103 910"
  • (?(name)yes|no)
    Description: Matches yes if the named capture name has a match; otherwise, matches the optional no part.
    Pattern: (?<quoted>")?(?(quoted).+?"|\S+\s)
    Matches: "Dogs.jpg ", "\"Yiska playing.jpg\"" in "Dogs.jpg \"Yiska playing.jpg\""
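
The first two alternation constructs can be tried out as follows. This is a minimal sketch; AlternationDemo and JoinMatches are hypothetical names introduced for the example.

```csharp
using System;
using System.Text.RegularExpressions;

namespace RegExApplication {
   // Hypothetical demo class; the name is an assumption for this sketch.
   static class AlternationDemo {
      // Joins all matches with commas so the result is easy to print.
      public static string JoinMatches(string input, string pattern) {
         MatchCollection mc = Regex.Matches(input, pattern);
         string[] parts = new string[mc.Count];
         for (int i = 0; i < mc.Count; i++) {
            parts[i] = mc[i].Value;
         }
         return string.Join(",", parts);
      }

      static void Main(string[] args) {
         // Plain alternation: "th" followed by "e", "is", or "at".
         Console.WriteLine(JoinMatches("this is the day. ", "th(e|is|at)"));   // this,the

         // Conditional: if the next character is "A", match A plus two digits;
         // otherwise match a standalone three-digit number.
         Console.WriteLine(JoinMatches("A10 C103 910", @"(?(A)A\d{2}\b|\b\d{3}\b)")); // A10,910
      }
   }
}
```

Matches are reported in the order they occur in the input, which is why "this" is printed before "the".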

8. Substitutions

Substitutions are used in replacement patterns. The following table lists the substitutions −

| Character | Description | Pattern | Replacement pattern | Input string | Resulting string |
| --- | --- | --- | --- | --- | --- |
| $number | Substitutes the substring matched by group number. | \b(\w+)(\s)(\w+)\b | $3$2$1 | "one two" | "two one" |
| ${name} | Substitutes the substring matched by the named group name. | \b(?<word1>\w+)(\s)(?<word2>\w+)\b | ${word2} ${word1} | "one two" | "two one" |
| $$ | Substitutes a literal "$". | \b(\d+)\s?USD | $$$1 | "103 USD" | "$103" |
| $& | Substitutes a copy of the whole match. | (\$*(\d*(\.+\d+)?){1}) | **$& | "$1.30" | "**$1.30**" |
| $` | Substitutes all the text of the input string before the match. | B+ | $` | "AABBCC" | "AAAACC" |
| $' | Substitutes all the text of the input string after the match. | B+ | $' | "AABBCC" | "AACCCC" |
| $+ | Substitutes the last group that was captured. | B+(C+) | $+ | "AABBCCDD" | "AACCDD" |
| $_ | Substitutes the entire input string. | B+ | $_ | "AABBCC" | "AAAABBCCCC" |
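
Substitutions are passed as the replacement argument of Regex.Replace. The sketch below wraps three rows of the table in small helper methods; SubstitutionDemo, SwapWords, SwapNamed, and ToDollars are hypothetical names chosen for this example.

```csharp
using System;
using System.Text.RegularExpressions;

namespace RegExApplication {
   // Hypothetical demo class; the name is an assumption for this sketch.
   static class SubstitutionDemo {
      // Swaps two words by referring to groups 3, 2, and 1 by number.
      public static string SwapWords(string input) {
         return Regex.Replace(input, @"\b(\w+)(\s)(\w+)\b", "$3$2$1");
      }

      // Swaps two words by referring to the groups by name.
      public static string SwapNamed(string input) {
         return Regex.Replace(input, @"\b(?<word1>\w+)(\s)(?<word2>\w+)\b", "${word2} ${word1}");
      }

      // "$$" emits a literal dollar sign, "$1" the captured amount.
      public static string ToDollars(string input) {
         return Regex.Replace(input, @"\b(\d+)\s?USD", "$$$1");
      }

      static void Main(string[] args) {
         Console.WriteLine(SwapWords("one two"));                  // two one
         Console.WriteLine(SwapNamed("one two"));                  // two one
         Console.WriteLine(ToDollars("103 USD"));                  // $103
         // $` inserts the text before the match, $_ the whole input.
         Console.WriteLine(Regex.Replace("AABBCC", "B+", "$`"));   // AAAACC
         Console.WriteLine(Regex.Replace("AABBCC", "B+", "$_"));   // AAAABBCCCC
      }
   }
}
```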

9. Miscellaneous constructs

The following table lists various miscellaneous constructs −

| Construct | Definition | Example |
| --- | --- | --- |
| (?imnsx-imnsx) | Sets or disables options such as case insensitivity in the middle of a pattern. | \bA(?i)b\w+\b matches "ABA", "Able" in "ABA Able Act" |
| (?#comment) | Inline comment. The comment ends at the first closing parenthesis. | \bA(?#Matches words starting with A)\w+\b |
| # [to end of line] | X-mode comment. The comment starts at an unescaped # and continues to the end of the line. | (?x)\bA\w+\b#Matches words starting with A |
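
The sketch below shows the three constructs in running code. MiscDemo and JoinMatches are hypothetical names; the multi-line pattern is just one way an x-mode pattern might be laid out.

```csharp
using System;
using System.Text.RegularExpressions;

namespace RegExApplication {
   // Hypothetical demo class; the name is an assumption for this sketch.
   static class MiscDemo {
      // Joins all matches with commas so the result is easy to print.
      public static string JoinMatches(string input, string pattern) {
         MatchCollection mc = Regex.Matches(input, pattern);
         string[] parts = new string[mc.Count];
         for (int i = 0; i < mc.Count; i++) {
            parts[i] = mc[i].Value;
         }
         return string.Join(",", parts);
      }

      // In x-mode, unescaped white space is ignored and # starts a comment.
      public static readonly string CommentedPattern = @"(?x)
         \b A      # a word boundary followed by an uppercase A
         \w+ \b    # the rest of the word";

      static void Main(string[] args) {
         string input = "ABA Able Act";

         // (?i) switches to case-insensitive matching from that point on.
         Console.WriteLine(JoinMatches(input, @"\bA(?i)b\w+\b"));   // ABA,Able
         // An inline comment does not change what the pattern matches.
         Console.WriteLine(JoinMatches(input, @"\bA(?#Matches words starting with A)\w+\b")); // ABA,Able,Act
         // The commented x-mode pattern is equivalent to \bA\w+\b.
         Console.WriteLine(JoinMatches(input, CommentedPattern));   // ABA,Able,Act
      }
   }
}
```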

The Regex Class

The Regex class is used for representing a regular expression. It has the following commonly used methods −

| Sr.No. | Method | Description |
| --- | --- | --- |
| 1 | public bool IsMatch(string input) | Indicates whether the regular expression specified in the Regex constructor finds a match in a specified input string. |
| 2 | public bool IsMatch(string input, int startat) | Indicates whether the regular expression specified in the Regex constructor finds a match in the specified input string, beginning at the specified starting position in the string. |
| 3 | public static bool IsMatch(string input, string pattern) | Indicates whether the specified regular expression finds a match in the specified input string. |
| 4 | public MatchCollection Matches(string input) | Searches the specified input string for all occurrences of a regular expression. |
| 5 | public string Replace(string input, string replacement) | In a specified input string, replaces all strings that match a regular expression pattern with a specified replacement string. |
| 6 | public string[] Split(string input) | Splits an input string into an array of substrings at the positions defined by a regular expression pattern specified in the Regex constructor. |

For the complete list of methods and properties, please read the Microsoft documentation on C#.
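
Before the longer examples, here is a brief sketch of the IsMatch overloads and Split from the table above. RegexClassDemo, HasDigitsFrom, and SplitOnCommas are hypothetical names used only in this example, and the sample strings are invented.

```csharp
using System;
using System.Text.RegularExpressions;

namespace RegExApplication {
   // Hypothetical demo class; the name is an assumption for this sketch.
   static class RegexClassDemo {
      // Uses IsMatch(string, int): the search begins at index start.
      public static bool HasDigitsFrom(string input, int start) {
         Regex rgx = new Regex(@"\d+");
         return rgx.IsMatch(input, start);
      }

      // Uses Split(string): commas with optional surrounding white space
      // act as the separators.
      public static string[] SplitOnCommas(string input) {
         Regex rgx = new Regex(@"\s*,\s*");
         return rgx.Split(input);
      }

      static void Main(string[] args) {
         Regex digits = new Regex(@"\d+");
         Console.WriteLine(digits.IsMatch("abc123"));          // True
         Console.WriteLine(HasDigitsFrom("123abc", 3));        // False, only "abc" is searched
         Console.WriteLine(Regex.IsMatch("abc", @"\d+"));      // False (static overload)

         foreach (string part in SplitOnCommas("red , green,blue")) {
            Console.WriteLine(part);                           // red, green, blue
         }
      }
   }
}
```

The static overloads are convenient for one-off checks, while creating a Regex instance is a reasonable choice when the same pattern is applied many times.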

Example 1

The following example matches words that start with ‘S’ −

using System;
using System.Text.RegularExpressions;

namespace RegExApplication {
   class Program {
      private static void showMatch(string text, string expr) {
         Console.WriteLine("The Expression: " + expr);
         MatchCollection mc = Regex.Matches(text, expr);
         
         foreach (Match m in mc) {
            Console.WriteLine(m);
         }
      }
      static void Main(string[] args) {
         string str = "A Thousand Splendid Suns";
         
         Console.WriteLine("Matching words that start with 'S': ");
         showMatch(str, @"\bS\S*");
         Console.ReadKey();
      }
   }
}

When the above code is compiled and executed, it produces the following result −

Matching words that start with 'S':
The Expression: \bS\S*
Splendid
Suns

Example 2

The following example matches words that start with ‘m’ and end with ‘e’ −

using System;
using System.Text.RegularExpressions;

namespace RegExApplication {
   class Program {
      private static void showMatch(string text, string expr) {
         Console.WriteLine("The Expression: " + expr);
         MatchCollection mc = Regex.Matches(text, expr);
         
         foreach (Match m in mc) {
            Console.WriteLine(m);
         }
      }
      static void Main(string[] args) {
         string str = "make maze and manage to measure it";

         Console.WriteLine("Matching words that start with 'm' and end with 'e':");
         showMatch(str, @"\bm\S*e\b");
         Console.ReadKey();
      }
   }
}

When the above code is compiled and executed, it produces the following result −

Matching words that start with 'm' and end with 'e':
The Expression: \bm\S*e\b
make
maze
manage
measure

Example 3

This example replaces extra white space −

using System;
using System.Text.RegularExpressions;

namespace RegExApplication {
   class Program {
      static void Main(string[] args) {
         string input = "Hello   World   ";
         string pattern = "\\s+";
         string replacement = " ";
         
         Regex rgx = new Regex(pattern);
         string result = rgx.Replace(input, replacement);

         Console.WriteLine("Original String: {0}", input);
         Console.WriteLine("Replacement String: {0}", result);    
         Console.ReadKey();
      }
   }
}

When the above code is compiled and executed, it produces the following result −

Original String: Hello   World   
Replacement String: Hello World 
