
Mastering Regular Expressions. Part 6: Advanced Regex Patterns & Techniques in C#

This part covers:

  1. Recursive regex patterns (via balancing groups)
  2. Named group replacement in strings
  3. Unicode support & matching
  4. Regex + LINQ for text processing
  5. Building a mini domain-specific language (DSL) parser
  6. Creating a reusable input validation framework
  7. Regex + Span (high-performance .NET)
  8. Localization-aware patterns
  9. Conditional expressions in regex
  10. Building readable regex for maintainability

6.1 Recursive Patterns (Balancing Groups)

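.NET regular expressions have no true recursion (there is no (?R) construct as in PCRE). Balancing groups can track nesting depth instead, which is enough for structures such as nested parentheses. (?<open>) pushes an empty capture onto a stack named open. (?<-open>\)) pops one and fails if the stack is empty. (?(open)(?!)) rejects the match if anything is left on the stack.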

🧪 Example: Match Balanced Parentheses

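using System;
using System.Text.RegularExpressions;

string pattern = @"^
    (?>
        [^()]+             # Non-parens
        | \(
            (?<open>)      # Open paren: push onto the open stack
        |
            (?<-open>\))   # Close paren: pop; fails if the stack is empty
    )*
    (?(open)(?!))$         # Fail if any open paren is still unclosed
";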

string[] tests = {
    "(abc)",         // ✅
    "((a)(b))",      // ✅
    "(a(b)",         // ❌
    "(a)b)",         // ❌
};

foreach (var test in tests)
    Console.WriteLine($"{test}: {Regex.IsMatch(test, pattern, RegexOptions.IgnorePatternWhitespace)}");
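The pattern above validates a whole string. A closely related task is extracting each top-level parenthesized group from a longer string. The following minimal sketch uses the same push/pop idea. The group name depth and the helper OuterGroups are illustrative names and are not part of any API.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Matches one outermost (...) group, however deeply nested.
// "depth" is an arbitrary group name used as a counter stack.
var outerParens = new Regex(@"
    \(                      # opening paren of the outer group
    (?>
        [^()]+              # any run of non-paren characters
      | \( (?<depth>)       # nested '(' : push onto the depth stack
      | \) (?<-depth>)      # nested ')' : pop, fails if the stack is empty
    )*
    (?(depth)(?!))          # fail if a nested '(' was left unclosed
    \)                      # closing paren of the outer group
", RegexOptions.IgnorePatternWhitespace);

string[] OuterGroups(string text) =>
    outerParens.Matches(text).Select(m => m.Value).ToArray();

foreach (string g in OuterGroups("f(a, g(b, c)) + h(d)"))
    Console.WriteLine(g);
// prints (a, g(b, c)) and then (d)
```

If the `\)` alternative cannot pop, the loop stops. The final `\)` then closes the outer group, so each match ends at the paren that balances its opening one.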

6.2 Named Group Replacement

In .NET, a replacement string can refer to a named group with the ${name} substitution. Numbered groups use $1, $2, and so on.

Example: Swap names

string pattern = @"(?<first>\w+)\s(?<last>\w+)";
string input = "John Doe";

string result = Regex.Replace(input, pattern, "${last}, ${first}");
Console.WriteLine(result); // Doe, John
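Substitutions like ${last} cover pure reordering. When the replacement needs logic, Regex.Replace also accepts a MatchEvaluator, and the callback can read the same named groups. The following is a minimal sketch. The date format and the helper names ToDayFirst and ToMonthName are illustrative.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// ISO-style dates (yyyy-MM-dd) with one named group per component.
const string DatePattern = @"(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})";

// Pure reordering: ${name} substitutions are enough.
string ToDayFirst(string text) =>
    Regex.Replace(text, DatePattern, "${day}/${month}/${year}");

// Replacement with logic: a MatchEvaluator lambda reads the groups by name.
string ToMonthName(string text) =>
    Regex.Replace(text, DatePattern, m =>
    {
        string day = m.Groups["day"].Value;
        string year = m.Groups["year"].Value;
        int month = int.Parse(m.Groups["month"].Value);
        string name = CultureInfo.InvariantCulture.DateTimeFormat.GetAbbreviatedMonthName(month);
        return $"{day} {name} {year}";
    });

Console.WriteLine(ToDayFirst("Due 2024-03-15"));  // Due 15/03/2024
Console.WriteLine(ToMonthName("Due 2024-03-15")); // Due 15 Mar 2024
```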

6.3 Unicode-Aware Regex

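In .NET, \w is already Unicode-aware. It is equivalent to [\p{L}\p{Mn}\p{Nd}\p{Pc}], so it also matches digits, combining marks, and underscores. Only RegexOptions.ECMAScript restricts it to ASCII. Use \p{L} when you want letters only.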

Example: Match Vietnamese names

string pattern = @"\p{L}+";
string input = "Nguyễn Văn A";

foreach (Match m in Regex.Matches(input, pattern))
    Console.WriteLine(m.Value); // Nguyễn, Văn, A

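📌 RegexOptions.CultureInvariant only affects case-insensitive matching. Combined with RegexOptions.IgnoreCase, it makes case comparisons ignore the current culture (see 6.8).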
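There is one Unicode detail that \p{L} alone does not cover. The same Vietnamese name can be stored precomposed, with ễ as a single code point, or decomposed, as a base letter followed by combining marks, depending on where the text came from. If your input may be decomposed, you have two options. You can include \p{M} in the character class, or you can normalize the string first with string.Normalize(NormalizationForm.FormC). The following minimal sketch shows the first option. Words is a hypothetical helper.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// "Nguyễn" in decomposed (NFD) form: 'e' + U+0302 (circumflex) + U+0303 (tilde).
string decomposed = "Nguye\u0302\u0303n";

string[] Words(string text, string pattern) =>
    Regex.Matches(text, pattern).Select(m => m.Value).ToArray();

// U+0302 and U+0303 are combining marks (category Mn), not letters,
// so \p{L}+ splits the name into "Nguye" and "n".
Console.WriteLine(Words(decomposed, @"\p{L}+").Length);        // 2
// Allowing marks after letters keeps the name in one piece.
Console.WriteLine(Words(decomposed, @"[\p{L}\p{M}]+").Length); // 1
```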


6.4 Regex + LINQ for Text Analysis

Regex.Matches returns a MatchCollection, which you can query with LINQ to build short text-processing pipelines.

Example: Top 3 most common words

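using System;
using System.Linq;
using System.Text.RegularExpressions;

string text = "hello world, hello universe, hello regex.";

// ToLowerInvariant avoids culture-specific lowercasing (e.g. Turkish I)
var topWords = Regex.Matches(text.ToLowerInvariant(), @"\w+")
    .Cast<Match>()
    .GroupBy(m => m.Value)
    .OrderByDescending(g => g.Count())
    .Take(3);

// Prints hello: 3, world: 1, universe: 1 (ties keep first-appearance order)
foreach (var group in topWords)
    Console.WriteLine($"{group.Key}: {group.Count()}");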
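Named groups and LINQ also work well together when each match should become a structured record. The following sketch turns a made-up key=value; format into a case-insensitive dictionary. ParsePairs is a hypothetical helper, and ToDictionary throws if the same key appears twice.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: turns "key = value; key2 = value2" into a dictionary.
// ToDictionary throws ArgumentException if a key appears twice.
Dictionary<string, string> ParsePairs(string text) =>
    Regex.Matches(text, @"(?<key>\w+)\s*=\s*(?<value>[^;]*)")
        .ToDictionary(
            m => m.Groups["key"].Value,
            m => m.Groups["value"].Value.Trim(),
            StringComparer.OrdinalIgnoreCase);

var settings = ParsePairs("host = example.org; port=8080; Debug=true");
Console.WriteLine(settings["port"]);  // 8080
Console.WriteLine(settings["debug"]); // true (keys are case-insensitive)
```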

6.5 Build a Simple DSL Parser with Regex

Imagine a mini language like:

set x = 5
set y = x + 2
print y

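Because each statement fits on one line, two anchored patterns are enough to recognize every statement and capture its parts. This recognizes statements but does not evaluate them: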

string[] lines = {
    "set x = 5",
    "set y = x + 2",
    "print y"
};

Regex assignment = new Regex(@"^set (?<var>\w+) = (?<expr>.+)$");
Regex print = new Regex(@"^print (?<var>\w+)$");

foreach (string line in lines) {
    // Match once and check Success instead of calling IsMatch and then Match
    Match m = assignment.Match(line);
    if (m.Success) {
        Console.WriteLine($"Set {m.Groups["var"].Value} to {m.Groups["expr"].Value}");
        continue;
    }
    m = print.Match(line);
    if (m.Success)
        Console.WriteLine($"Print value of {m.Groups["var"].Value}");
}
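The loop above only recognizes statements. To run the program, you also need to evaluate expressions. The following sketch shows one way to do it. It assumes expressions contain only integers, variable names, + and -. The names Evaluate, Run, token, and vars are hypothetical. The sketch reuses the two statement patterns and adds a third regex that reads one token at a time. That regex uses \G so that each token must start exactly where the previous one ended.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Sketch only: expressions may contain integers, variable names, + and -.
var assignment = new Regex(@"^set (?<var>\w+) = (?<expr>.+)$");
var print = new Regex(@"^print (?<var>\w+)$");
// One token per match; \G pins each match to where the previous one ended.
var token = new Regex(@"\G\s*(?:(?<num>\d+)|(?<id>[A-Za-z_]\w*)|(?<op>[+-]))");
var vars = new Dictionary<string, int>();

int Evaluate(string expr)
{
    int total = 0, sign = 1, end = 0;
    for (Match t = token.Match(expr); t.Success; t = t.NextMatch())
    {
        end = t.Index + t.Length;
        if (t.Groups["op"].Success)
            sign = t.Groups["op"].Value == "-" ? -1 : 1;
        else if (t.Groups["num"].Success)
            total += sign * int.Parse(t.Groups["num"].Value);
        else
            total += sign * vars[t.Groups["id"].Value]; // throws if undefined
    }
    // Leftover text means the tokenizer hit a character it does not understand.
    if (expr[end..].Trim().Length > 0)
        throw new FormatException($"Unexpected input: '{expr[end..]}'");
    return total;
}

List<string> Run(IEnumerable<string> program)
{
    var output = new List<string>();
    foreach (string stmt in program)
    {
        Match m = assignment.Match(stmt);
        if (m.Success)
        {
            vars[m.Groups["var"].Value] = Evaluate(m.Groups["expr"].Value);
            continue;
        }
        m = print.Match(stmt);
        if (!m.Success)
            throw new FormatException($"Unknown statement: {stmt}");
        output.Add(vars[m.Groups["var"].Value].ToString());
    }
    return output;
}

foreach (string printed in Run(new[] { "set x = 5", "set y = x + 2", "print y" }))
    Console.WriteLine(printed); // 7
```

Because \G forbids gaps, an unexpected character such as $ stops the token loop. Evaluate then reports the leftover text instead of silently skipping it. The sketch does not check operator and operand order, so a real parser would need to add that.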

6.6 Build a Reusable Input Validator

InputValidator.cs

public class InputValidator {
    private readonly Dictionary<string, string> _rules = new();

    public void AddRule(string name, string pattern) => _rules[name] = pattern;

    public bool Validate(string name, string input) =>
        _rules.TryGetValue(name, out string pattern) &&
        Regex.IsMatch(input, pattern);
}

Usage

var validator = new InputValidator();
validator.AddRule("email", @"^[\w\.-]+@[\w\.-]+\.\w{2,}$");
validator.AddRule("username", @"^[a-zA-Z0-9]{4,12}$");

Console.WriteLine(validator.Validate("email", "user@example.com")); // True
Console.WriteLine(validator.Validate("username", "bad name")); // False

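🛠 You can grow this into a fuller validation framework. For example, you could store precompiled Regex objects, attach an error message to each rule, or set a match timeout.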
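InputValidator passes the raw pattern string to Regex.IsMatch on every call and only reports true or false. The following variation is a sketch of one step toward a fuller framework. It builds each rule once as a Regex with a 100 ms timeout and pairs it with an error message. It also anchors with \z so that a trailing newline cannot slip through $. The names rules, AddRule, and Check, and the timeout value, are illustrative choices and are not part of the original class.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Each rule: a pre-built Regex plus the message to show when it fails.
var rules = new Dictionary<string, (Regex Pattern, string Error)>();

void AddRule(string name, string pattern, string error) =>
    rules[name] = (new Regex(pattern, RegexOptions.CultureInvariant,
                             TimeSpan.FromMilliseconds(100)), error);

// Returns null when the input is valid, otherwise an error message.
string? Check(string name, string input)
{
    if (!rules.TryGetValue(name, out var rule))
        return $"No rule named '{name}'";
    try
    {
        return rule.Pattern.IsMatch(input) ? null : rule.Error;
    }
    catch (RegexMatchTimeoutException)
    {
        return "Input took too long to validate";
    }
}

// \z (rather than $) rejects a trailing newline such as "user1\n".
AddRule("username", @"^[a-zA-Z0-9]{4,12}\z", "4-12 letters or digits");
Console.WriteLine(Check("username", "user1") ?? "OK");   // OK
Console.WriteLine(Check("username", "user1\n") ?? "OK"); // 4-12 letters or digits
```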


6.7 Regex with Span for High Performance

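Since .NET 7, Regex offers span-based overloads such as IsMatch(ReadOnlySpan<char>), Count(ReadOnlySpan<char>), and EnumerateMatches(ReadOnlySpan<char>):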

ReadOnlySpan<char> input = "123-456-7890";
Regex regex = new Regex(@"^\d{3}-\d{3}-\d{4}$");

bool success = regex.IsMatch(input);
Console.WriteLine(success); // True

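Working on a ReadOnlySpan<char> lets you match against slices of a larger buffer without allocating substrings. The span overloads never return Match objects: EnumerateMatches yields ValueMatch values that carry only an Index and a Length.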
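IsMatch only answers yes or no. To extract values without allocating a Match object or substring per hit, you can use EnumerateMatches and slice each ValueMatch out of the span yourself. The following is a minimal sketch that assumes .NET 7 or later. SumNumbers is a hypothetical helper.

```csharp
using System;
using System.Text.RegularExpressions;

// Assumes .NET 7+, where Regex.EnumerateMatches and Regex.Count accept ReadOnlySpan<char>.
var number = new Regex(@"\d+");

int SumNumbers(ReadOnlySpan<char> text)
{
    int total = 0;
    // Each ValueMatch holds only Index and Length; slice the span to read the digits.
    foreach (ValueMatch vm in number.EnumerateMatches(text))
        total += int.Parse(text.Slice(vm.Index, vm.Length));
    return total;
}

string log = "ids: 12, 30, 7";
Console.WriteLine(number.Count(log.AsSpan())); // 3
Console.WriteLine(SumNumbers(log.AsSpan()));   // 49
```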


6.8 Locale-Specific Matching

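By default, RegexOptions.IgnoreCase uses the current culture's casing rules. This matters for languages such as Turkish, where the uppercase form of "i" is "İ" rather than "I". To get the same result regardless of the culture your code runs under, combine IgnoreCase with CultureInvariant: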

Regex rx = new Regex(@"abc", RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

If you specifically want the current culture's casing rules, leave out CultureInvariant.

6.9 Conditional Matching

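.NET regex supports two conditional constructs. (?(expression)yes|no) picks a branch based on a zero-width test. (?(name)yes|no) picks a branch based on whether a named group has captured.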

Example: an optional area code, written either as (123) or as 123-:

string pattern = @"^
    (?(?=\()            # If it starts with (
        \(\d{3}\)\s?    # match a parenthesized area code
        | (?:\d{3}-)?   # else match an optional 123- prefix
    )
    \d{3}-\d{4}         # Always match the last part
$";

string[] phones = { "(123) 456-7890", "123-456-7890", "456-7890" };

// Prints True for all three inputs
foreach (string phone in phones)
    Console.WriteLine($"{phone}: {Regex.IsMatch(phone, pattern, RegexOptions.IgnorePatternWhitespace)}");
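The example above uses a lookahead as the condition. The other form, (?(name)yes|no), branches on whether a named group took part in the match. The following small sketch accepts a value that may be wrapped in double quotes, but only if the quotes appear on both sides. The group name quote is arbitrary.

```csharp
using System;
using System.Text.RegularExpressions;

// (?(quote)") requires a closing quote only if the opening one was captured.
// In a verbatim string, "" stands for a single double-quote character.
var quotedOrBare = new Regex(@"^(?<quote>"")?\w+(?(quote)"")$");

foreach (string s in new[] { "abc", "\"abc\"", "\"abc", "abc\"" })
    Console.WriteLine($"{s}: {quotedOrBare.IsMatch(s)}");
// abc: True, "abc": True, "abc: False, abc": False
```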

6.10 Make Complex Patterns More Readable

Use RegexOptions.IgnorePatternWhitespace to write readable multiline regex:

string pattern = @"
    ^                         # Start
    (?<name>[\p{L}\s]+)       # Name
    \s
    \((?<age>\d{2})\)         # Age in parentheses
    $
";

string input = "Nguyễn Văn A (25)";

Match m = Regex.Match(input, pattern, RegexOptions.IgnorePatternWhitespace);
if (m.Success)
    Console.WriteLine($"Name: {m.Groups["name"].Value}, Age: {m.Groups["age"].Value}"); // Name: Nguyễn Văn A, Age: 25
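Another way to keep long patterns maintainable is to build them from small named fragments that can each be read and tested on their own. The following sketch assembles the same name-and-age pattern from two constants, NamePart and AgePart (illustrative names). It turns on whitespace mode with the inline (?x) option, which is the in-pattern equivalent of RegexOptions.IgnorePatternWhitespace. It also widens the age to one to three digits. That change is an assumption and was not required by the original pattern.

```csharp
using System;
using System.Text.RegularExpressions;

// Small fragments; quantifiers with braces stay in plain (non-interpolated) strings.
const string NamePart = @"(?<name>[\p{L}\s]+?)";
const string AgePart = @"\((?<age>\d{1,3})\)";

// (?x) enables whitespace mode inline, so the spaces below are ignored.
var person = new Regex($@"(?x) ^ {NamePart} \s {AgePart} $");

(string Name, int Age)? ParsePerson(string input)
{
    Match m = person.Match(input);
    if (!m.Success)
        return null;
    return (m.Groups["name"].Value, int.Parse(m.Groups["age"].Value));
}

var result = ParsePerson("Nguyễn Văn A (25)");
Console.WriteLine(result is { } p ? $"Name: {p.Name}, Age: {p.Age}" : "No match");
```

Interpolated strings treat { and } as holes. That is why quantifiers like \d{1,3} live in the plain constants and not in the interpolated string that joins them.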

Summary

In this advanced part, you’ve learned how to:

  • Emulate recursion with balancing groups
  • Handle Unicode, locales, and named replacements
  • Combine regex with LINQ and span-based, high-performance APIs
  • Build reusable, readable, and maintainable regex-powered tools
  • Go beyond pattern matching into simple parsing and DSL territory


Coming Up Next

In Part 7, we’ll build a complete C# Regex Application Project:

A Log File Analyzer that:

  • Extracts errors and warnings
  • Extracts timestamps and levels
  • Extracts stack traces
  • Generates a report file with summaries and patterns
