This part covers:
- Recursive regex patterns (via balancing groups)
- Named group replacement in strings
- Unicode support & matching
- Regex + LINQ for text processing
- Building a mini domain-specific language (DSL) parser
- Creating a reusable input validation framework
- Regex + Span (high-performance .NET)
- Localization-aware patterns
- Conditional expressions in regex
- Building readable regex for maintainability
6.1 Recursive Patterns (Balancing Groups)
.NET regex has no true recursion, but its balancing groups let a pattern keep a stack of open captures. That is enough to handle nested structures such as balanced parentheses.
🧪 Example: Match Balanced Parentheses
string pattern = @"
    ^
    (?>
        [^()]+           # Non-parens
      | \( (?<open>)     # Open paren
      | (?<-open>\))     # Close paren
    )*
    (?(open)(?!))        # Fail if open count ≠ closed
    $
";

string[] tests =
{
    "(abc)",     // ✅
    "((a)(b))",  // ✅
    "(a(b)",     // ❌
    "(a)b)",     // ❌
};

foreach (var test in tests)
    Console.WriteLine($"{test}: {Regex.IsMatch(test, pattern, RegexOptions.IgnorePatternWhitespace)}");
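The same stack-counting idea can also pull balanced groups out of longer text instead of validating a whole string. The following is a minimal sketch of that variation; the sample input, the group name depth, and the variable names are illustrative assumptions rather than part of the example above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Sketch: find every top-level balanced "(...)" group inside a longer string.
string pattern = @"
    \(                    # Outermost opening paren
    (?>
        [^()]+            # Anything that is not a paren
      | \( (?<depth>)     # Nested opening paren: push
      | \) (?<-depth>)    # Nested closing paren: pop, fails when nothing is left to pop
    )*
    (?(depth)(?!))        # Fail if a nested paren is still open
    \)                    # Closing paren that pairs with the outermost one
";

string input = "f(a(b)c) + g(d)";   // hypothetical sample text

string[] groups = Regex
    .Matches(input, pattern, RegexOptions.IgnorePatternWhitespace)
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

foreach (string g in groups)
    Console.WriteLine(g);
// (a(b)c)
// (d)
```

Because the pattern is not anchored, an unbalanced fragment such as "(a(b" simply produces no match rather than a partial one.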
6.2 Named Group Replacement
.NET lets a replacement string refer to named groups with the ${name} syntax.
Example: Swap names
string pattern = @"(?<first>\w+)\s(?<last>\w+)";
string input = "John Doe";

string result = Regex.Replace(input, pattern, "${last}, ${first}");
Console.WriteLine(result); // Doe, John
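When the replacement needs logic that a ${name} template cannot express, Regex.Replace also accepts a MatchEvaluator delegate. Here is a small sketch that reuses the same pattern; upper-casing the last name is just an assumed requirement chosen for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

string pattern = @"(?<first>\w+)\s(?<last>\w+)";
string input = "John Doe";

// The lambda receives each Match and returns the text that replaces it.
string result = Regex.Replace(
    input,
    pattern,
    m => $"{m.Groups["last"].Value.ToUpperInvariant()}, {m.Groups["first"].Value}");

Console.WriteLine(result); // DOE, John
```

The named groups are read through m.Groups, so the evaluator stays readable even when the pattern gains more groups later.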
6.3 Unicode-Aware Regex
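In .NET, \w is Unicode-aware by default: it matches letters, combining marks, decimal digits, and connector punctuation such as the underscore, in any script. It becomes ASCII-only only when RegexOptions.ECMAScript is set. Use \p{L} when you want Unicode letters only, without digits or underscores.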
Example: Match Vietnamese names
string pattern = @"\p{L}+";
string input = "Nguyễn Văn A";

foreach (Match m in Regex.Matches(input, pattern))
    Console.WriteLine(m.Value); // Nguyễn, Văn, A
📌 RegexOptions.CultureInvariant only affects case-insensitive matching. Combine it with RegexOptions.IgnoreCase if you want culture-neutral casing behavior.
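To see how \w and \p{L} differ in practice, the sketch below checks a single accented letter and a single digit against each. The sample characters are assumptions picked for illustration; the letter is written as an escape so the result does not depend on how the source file is encoded.

```csharp
using System;
using System.Text.RegularExpressions;

string letter = "\u1EC5";   // 'ễ' in precomposed form

bool wordDefault   = Regex.IsMatch(letter, @"^\w$");                           // Unicode-aware \w
bool wordEcma      = Regex.IsMatch(letter, @"^\w$", RegexOptions.ECMAScript);  // ASCII-only \w
bool digitAsWord   = Regex.IsMatch("7", @"^\w$");                              // \w also accepts digits
bool digitAsLetter = Regex.IsMatch("7", @"^\p{L}$");                           // \p{L} accepts letters only

Console.WriteLine($"{wordDefault} {wordEcma} {digitAsWord} {digitAsLetter}");
// True False True False
```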
6.4 Regex + LINQ for Text Analysis
Use regex with LINQ to build efficient text-processing
pipelines.
Example: Top 3 most common words
using System.Linq;
using System.Text.RegularExpressions;

string text = "hello world, hello universe, hello regex.";

var topWords = Regex.Matches(text.ToLowerInvariant(), @"\w+")
    .Cast<Match>()
    .GroupBy(m => m.Value)
    .OrderByDescending(g => g.Count())
    .Take(3);

foreach (var group in topWords)
    Console.WriteLine($"{group.Key}: {group.Count()}");
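The same Matches-then-LINQ shape works for extraction as well as counting. As a rough sketch, the snippet below turns key=value pairs into a dictionary; the input format and the names config and settings are assumptions made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

string config = "host=localhost; port=5432; user=admin";   // hypothetical input

// Each match contributes one entry: the "key" group maps to the trimmed "value" group.
Dictionary<string, string> settings = Regex
    .Matches(config, @"(?<key>\w+)=(?<value>[^;]+)")
    .Cast<Match>()
    .ToDictionary(m => m.Groups["key"].Value, m => m.Groups["value"].Value.Trim());

foreach (var pair in settings)
    Console.WriteLine($"{pair.Key} -> {pair.Value}");
```

Note that ToDictionary throws on duplicate keys, so input that may repeat a key would need a GroupBy step first.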
6.5 Build a Simple DSL Parser with Regex
Imagine a mini language like:
set x = 5
set y = x + 2
print y
You can use regex to recognize each statement and pull out its parts:
string[] lines = { "set x = 5", "set y = x + 2", "print y" };

Regex assignment = new Regex(@"^set (?<var>\w+) = (?<expr>.+)$");
Regex print = new Regex(@"^print (?<var>\w+)$");

foreach (string line in lines)
{
    // Match once and check Success instead of calling IsMatch and then Match.
    Match m = assignment.Match(line);
    if (m.Success)
    {
        Console.WriteLine($"Set {m.Groups["var"].Value} to {m.Groups["expr"].Value}");
        continue;
    }

    m = print.Match(line);
    if (m.Success)
        Console.WriteLine($"Print value of {m.Groups["var"].Value}");
}
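Recognizing statements is only half of a parser; the other half is acting on them. The sketch below goes one step further and evaluates the three-line program. It rests on several assumptions that the mini language above does not spell out: expressions contain only non-negative integers, variable names, and the + and - operators, evaluated left to right. The token regex and the Evaluate helper are hypothetical names introduced here.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

var assignment = new Regex(@"^set (?<var>\w+) = (?<expr>.+)$");
var print = new Regex(@"^print (?<var>\w+)$");
// One token per match: an integer, an identifier, or a + / - operator.
var token = new Regex(@"(?<num>[0-9]+)|(?<id>[A-Za-z_]\w*)|(?<op>[+-])");

var variables = new Dictionary<string, int>();
var output = new List<string>();

// Evaluates "a + b - c" style expressions left to right (no precedence, no parentheses,
// and no syntax validation: characters that are not tokens are skipped).
int Evaluate(string expr)
{
    int total = 0, sign = 1;
    foreach (Match t in token.Matches(expr))
    {
        if (t.Groups["op"].Success)
            sign = t.Groups["op"].Value == "-" ? -1 : 1;
        else if (t.Groups["num"].Success)
            total += sign * int.Parse(t.Groups["num"].Value);
        else
            total += sign * variables[t.Groups["id"].Value];   // throws if the variable is undefined
    }
    return total;
}

string[] lines = { "set x = 5", "set y = x + 2", "print y" };

foreach (string line in lines)
{
    Match m = assignment.Match(line);
    if (m.Success)
    {
        variables[m.Groups["var"].Value] = Evaluate(m.Groups["expr"].Value);
        continue;
    }

    m = print.Match(line);
    if (m.Success)
        output.Add($"{m.Groups["var"].Value} = {variables[m.Groups["var"].Value]}");
}

foreach (string text in output)
    Console.WriteLine(text);   // y = 7
```

Regex handles the line-level and token-level recognition here; anything with precedence or nesting would be better served by a real parser on top of the tokens.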
6.6 Build a Reusable Input Validator
InputValidator.cs
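using System.Collections.Generic;
using System.Text.RegularExpressions;

public class InputValidator
{
    private readonly Dictionary<string, string> _rules = new();

    // Registers (or overwrites) the pattern stored under the given rule name.
    public void AddRule(string name, string pattern) => _rules[name] = pattern;

    // Returns false for unknown rule names as well as for non-matching input.
    public bool Validate(string name, string input) =>
        _rules.TryGetValue(name, out var pattern) && Regex.IsMatch(input, pattern);
}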
Usage
var validator = new InputValidator();
validator.AddRule("email", @"^[\w\.-]+@[\w\.-]+\.\w{2,}$");
validator.AddRule("username", @"^[a-zA-Z0-9]{4,12}$");

Console.WriteLine(validator.Validate("email", "user@example.com")); // True
Console.WriteLine(validator.Validate("username", "bad name"));      // False
🛠 This can be extended
into a full-blown validation framework.
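One possible first step in that direction is to construct each Regex once and give it a match timeout, so that untrusted input cannot trigger runaway backtracking. The sketch below uses local functions instead of a class to stay short; the 250 ms budget and the policy of treating a timeout as invalid input are assumptions, not recommendations from this series.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Rules are constructed once and reused; the timeout bounds worst-case matching time.
var rules = new Dictionary<string, Regex>();
TimeSpan timeout = TimeSpan.FromMilliseconds(250);   // assumed budget, tune for your inputs

void AddRule(string name, string pattern) =>
    rules[name] = new Regex(pattern, RegexOptions.CultureInvariant, timeout);

bool Validate(string name, string input)
{
    if (!rules.TryGetValue(name, out var rule))
        return false;               // unknown rule: treat as invalid
    try
    {
        return rule.IsMatch(input);
    }
    catch (RegexMatchTimeoutException)
    {
        return false;               // matching took too long: reject instead of hanging
    }
}

AddRule("email", @"^[\w\.-]+@[\w\.-]+\.\w{2,}$");
AddRule("username", @"^[a-zA-Z0-9]{4,12}$");

Console.WriteLine(Validate("email", "user@example.com"));   // True
Console.WriteLine(Validate("username", "bad name"));        // False
Console.WriteLine(Validate("zipcode", "12345"));            // False (no such rule)
```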
6.7 Regex with Span for High Performance
Since .NET 7, Regex.IsMatch accepts a ReadOnlySpan<char>, so you can match against a span directly:
ReadOnlySpan<char> input = "123-456-7890";
Regex regex = new Regex(@"^\d{3}-\d{3}-\d{4}$");

bool success = regex.IsMatch(input);
Console.WriteLine(success); // True
⚡ Matching on a ReadOnlySpan<char> lets you search slices of a larger buffer without allocating substrings, which helps in large-scale processing.
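Span-based matching is not limited to yes/no checks. Regex.EnumerateMatches (also .NET 7 and later) yields ValueMatch structs that carry only an index and a length, so each hit can be sliced out of the original buffer. The sketch below sums the numbers in a comma-separated span; the input and variable names are illustrative assumptions.

```csharp
using System;
using System.Text.RegularExpressions;

// Requires .NET 7 or later: Regex.EnumerateMatches works directly on spans.
ReadOnlySpan<char> input = "10,20,30";
var digits = new Regex(@"[0-9]+");

int sum = 0;
foreach (ValueMatch m in digits.EnumerateMatches(input))
{
    // Slice the original buffer instead of allocating a substring per match.
    ReadOnlySpan<char> number = input.Slice(m.Index, m.Length);
    sum += int.Parse(number);
}

Console.WriteLine(sum); // 60
```

ValueMatch exposes no groups, so this approach fits cases where the whole match is all you need.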
6.8 Locale-Specific Matching
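With RegexOptions.IgnoreCase, .NET applies the casing rules of the current culture. This matters for languages such as Turkish, where i and I are not a case pair. For case-insensitive matching that behaves the same on every machine, add RegexOptions.CultureInvariant:
Regex rx = new Regex(@"abc", RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
For locale-aware scenarios, omit CultureInvariant so that the current culture's casing rules apply.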
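The difference is easiest to observe with the Turkish culture. Under Turkish casing rules, I pairs with ı rather than with i, so a culture-sensitive regex for "file" is not expected to match "FILE", while the invariant one should. The sketch below assumes the runtime has culture data for tr-TR, which is not the case when invariant-globalization mode is enabled; it prints both results rather than asserting them.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Assumption: tr-TR culture data is available on this runtime.
CultureInfo original = CultureInfo.CurrentCulture;
CultureInfo.CurrentCulture = new CultureInfo("tr-TR");

// Culture-sensitive: casing rules are taken from the current culture (Turkish here).
var sensitive = new Regex("file", RegexOptions.IgnoreCase);
// Culture-neutral: casing rules come from the invariant culture.
var invariant = new Regex("file", RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

bool sensitiveMatch = sensitive.IsMatch("FILE");
bool invariantMatch = invariant.IsMatch("FILE");

CultureInfo.CurrentCulture = original;   // restore the previous culture

Console.WriteLine($"tr-TR rules:     {sensitiveMatch}");
Console.WriteLine($"invariant rules: {invariantMatch}");
```

For identifiers, protocol keywords, and file extensions, the invariant variant is usually the safer default; the culture-sensitive one fits text that users type in their own language.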
6.9 Conditional Matching
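.NET regex supports conditional constructs of the form (?(condition)yes|no): if the condition holds, the yes branch must match; otherwise the no branch must.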
Example: Optional area code:
string pattern = @"
    ^
    (?(?=\()            # If it starts with (
        \(\d{3}\)\s?    # Match area code
      | \d{3}-          # Else match standard prefix
    )
    \d{3}-\d{4}         # Always match last part
    $
";

string[] phones = { "(123) 456-7890", "123-456-7890", "456-7890" };

foreach (string phone in phones)
    Console.WriteLine($"{phone}: {Regex.IsMatch(phone, pattern, RegexOptions.IgnorePatternWhitespace)}");
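The condition can also be the name of a group, in which case it tests whether that group captured anything. As a small sketch, the pattern below accepts a word that is either unquoted or fully quoted, and rejects a quote on one side only; the group name q and the sample inputs are assumptions for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// (?<q>")?   optionally capture an opening quote
// (?(q)")    if group q captured, require a closing quote; otherwise match nothing
string pattern = @"^(?<q>"")?\w+(?(q)"")$";

string[] samples = { "\"abc\"", "abc", "\"abc", "abc\"" };
bool[] results = Array.ConvertAll(samples, s => Regex.IsMatch(s, pattern));

for (int i = 0; i < samples.Length; i++)
    Console.WriteLine($"{samples[i]}: {results[i]}");
// "abc": True
// abc: True
// "abc: False
// abc": False
```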
6.10 Make Complex Patterns More Readable
Use RegexOptions.IgnorePatternWhitespace to write readable
multiline regex:
string pattern = @"
    ^                       # Start
    (?<name>[\p{L}\s]+)     # Name
    \s
    \((?<age>\d{2})\)       # Age in parentheses
    $
";

string input = "Nguyễn Văn A (25)";
Match m = Regex.Match(input, pattern, RegexOptions.IgnorePatternWhitespace);

Console.WriteLine($"Name: {m.Groups["name"]}, Age: {m.Groups["age"]}");
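Another option, sketched below, is to split a pattern into named fragments and compose them. This is a complementary technique rather than something IgnorePatternWhitespace requires. The constants Name and Age are hypothetical names, and the name fragment is a slightly stricter variant (letters separated by single whitespace characters) chosen for this illustration. The input is written with Unicode escapes so the result does not depend on the source file's encoding.

```csharp
using System;
using System.Text.RegularExpressions;

// Each fragment is named and documented once, then reused wherever it is needed.
const string Name = @"(?<name>\p{L}+(?:\s\p{L}+)*)";   // words made of letters, single-space separated
const string Age  = @"\((?<age>\d{2})\)";               // two digits in parentheses

string pattern = $@"^{Name}\s{Age}$";

// "Nguyễn Văn A (25)" with the accented letters in precomposed form
Match m = Regex.Match("Nguy\u1EC5n V\u0103n A (25)", pattern);
string name = m.Groups["name"].Value;
string age = m.Groups["age"].Value;

Console.WriteLine($"Name: {name}, Age: {age}");
```

Keeping quantifier braces such as {2} inside the constants avoids having to double them in the interpolated string.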
Summary
In this advanced part, you’ve learned how to:
✅ Use recursion and balancing groups
✅ Handle Unicode, locales, and named replacements
✅ Combine regex with LINQ and high-performance features
✅ Build reusable, readable, and maintainable regex-powered tools
✅ Go beyond pattern matching into full parsing and DSL territory
Coming Up Next
In Part 7, we’ll build a complete C# Regex Application Project: a Log File Analyzer that extracts:
- Errors and warnings
- Timestamps and levels
- Stack traces
It also generates a report file with summaries and patterns.