Mastering Regular Expression. Part 2: The Core Syntax of Regex – Deep Dive

2.1 Overview

In this part, you'll learn:

  • How to build patterns from the ground up
  • The full range of symbols: characters, classes, anchors, quantifiers
  • How regex interprets your inputs
  • How to avoid common beginner mistakes

By the end of this part, you’ll be ready to construct flexible and powerful regex patterns from scratch.


2.2 Literal Characters

Literal characters are the simplest form of regex: they match themselves.

Example:

hello

Matches:

  • "hello world"
  • "say hello"

Does NOT match:

  • "Hello" (case-sensitive by default)
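As a quick check, the cases above can be run through JavaScript's test method. This is only a minimal sketch; the variable names are illustrative.

```javascript
// Literal pattern: matches the exact character sequence "hello".
const literalPattern = /hello/;

// test() returns true if the pattern is found anywhere in the string.
const literalResults = ["hello world", "say hello", "Hello"].map((s) =>
  literalPattern.test(s)
);

console.log(literalResults); // [ true, true, false ]
```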

2.3 Character Classes [ ]

Character classes let you match one character out of a group.

Example:

[cCbB]

Matches: 'c', 'C', 'b', 'B'

Combined:

[cCbB]at

Matches: "cat", "bat", "Cat", "Bat"

Ranges:

[a-z]

Matches any single lowercase letter from a to z

[A-Za-z0-9]

Matches any single ASCII letter or digit


Negated Classes:

Put ^ right after the opening bracket to match any single character except the ones listed inside.

[^0-9]

Matches any non-digit

[^aeiou]

Matches any character except the lowercase vowels a, e, i, o, u (so consonants, digits, spaces, and uppercase letters all match)
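A short sketch of both ideas in JavaScript. The sample strings (including the phone-like text) are made up for illustration, and the g flag used here is covered in 2.13.

```javascript
// One character from the set, followed by the literal "at".
const animal = /[cCbB]at/;
const classHits = ["cat", "Bat", "rat"].map((s) => animal.test(s));
console.log(classHits); // [ true, true, false ]

// Negated class: remove every character that is NOT a digit.
// The g flag makes replace() act on all occurrences, not just the first.
const digitsOnly = "Tel: (555) 010-9999".replace(/[^0-9]/g, "");
console.log(digitsOnly); // "5550109999"
```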


2.4 Predefined Character Classes

Class   Meaning                              Equivalent
\d      Digit                                [0-9]
\D      Non-digit                            [^0-9]
\w      Word character (alphanumeric + _)    [A-Za-z0-9_]
\W      Non-word character                   [^A-Za-z0-9_]
\s      Whitespace (space, tab, newline)     [ \t\n\r\f\v]
\S      Non-whitespace                       [^ \t\n\r\f\v]

Example:

\w+@\w+\.\w+

Matches simple email addresses like: hello@world.com
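The sketch below tries the email pattern in JavaScript and shows one of its limits. The address first.last@example.com is an invented sample, not part of the original example.

```javascript
const simpleEmail = /\w+@\w+\.\w+/;

const emailOk = simpleEmail.test("hello@world.com");
const emailBad = simpleEmail.test("hello world");
console.log(emailOk, emailBad); // true false

// \w does not include "." or "-", so only part of this address is matched.
const partialEmail = "first.last@example.com".match(simpleEmail)[0];
console.log(partialEmail); // "last@example.com"
```

This is why the pattern is best treated as a teaching example rather than a real email validator.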


2.5 Anchors: ^ and $

Anchors don’t match characters — they match positions.

Symbol   Meaning
^        Start of string/line
$        End of string/line

Example:

^hello

Matches: "hello world"
Does not match: "say hello"

world!$

Matches: "hello world!"
Does not match: "world! says hello"
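The same four cases, checked in JavaScript as a minimal sketch (variable names are illustrative):

```javascript
const startsWithHello = /^hello/;
const endsWithWorld = /world!$/;

// ^ pins the match to the start of the string.
const startResults = ["hello world", "say hello"].map((s) => startsWithHello.test(s));
console.log(startResults); // [ true, false ]

// $ pins the match to the end of the string.
const endResults = ["hello world!", "world! says hello"].map((s) => endsWithWorld.test(s));
console.log(endResults); // [ true, false ]
```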


2.6 Quantifiers

Quantifiers define how many times a pattern repeats.

Quantifier   Meaning
*            0 or more times
+            1 or more times
?            0 or 1 time (optional)
{n}          Exactly n times
{n,}         At least n times
{n,m}        Between n and m times

Examples:

a*

Matches: "", "a", "aa", "aaa"

a{2,4}

Matches: "aa", "aaa", "aaaa"
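A quantifier only limits how much one match can consume, so whether a longer string "matches" depends on anchoring. The following sketch, with made-up inputs, illustrates this in JavaScript:

```javascript
// Anchored: the whole string must consist of 2 to 4 "a" characters.
const twoToFour = /^a{2,4}$/;
const countResults = ["a", "aa", "aaaa", "aaaaa"].map((s) => twoToFour.test(s));
console.log(countResults); // [ false, true, true, false ]

// Unanchored: a{2,4} takes at most four "a"s out of a longer run.
const longestRun = "aaaaa".match(/a{2,4}/)[0];
console.log(longestRun); // "aaaa"

// a* may match zero characters, so it succeeds even when there is no "a".
const emptyMatch = "xyz".match(/a*/)[0];
console.log(emptyMatch === ""); // true
```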


2.7 Greedy vs Lazy Matching

By default, quantifiers are greedy: they match as much as possible.

Example:

".*"

Matches: "Hello world!" said the "fox". → "Hello world!" said the "fox" (everything from the first quote to the last one)

To make a quantifier lazy, append ? to it:

".*?"

Matches only: "Hello world!"
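The difference only shows up when the input contains more than one closing quote. A minimal JavaScript sketch, using an invented sentence with two quoted parts:

```javascript
const quoted = '"Hello world!" said the "fox".';

// Greedy: .* runs to the end, then backs up to the LAST closing quote.
const greedyMatch = quoted.match(/".*"/)[0];
console.log(greedyMatch); // "Hello world!" said the "fox"

// Lazy: .*? stops at the FIRST closing quote it can reach.
const lazyMatch = quoted.match(/".*?"/)[0];
console.log(lazyMatch); // "Hello world!"
```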


2.8 Groups and Capturing ()

Parentheses group parts of a regex and capture the text those parts match for later use.

Example:

(bat|cat|hat)

Matches: "bat", "cat", "hat"

Capturing Example:

(\d{3})-(\d{2})-(\d{4})

Matches U.S. SSN-like format: "123-45-6789"

Groups:

  • Group 1: "123"
  • Group 2: "45"
  • Group 3: "6789"
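In JavaScript, the captured groups come back as extra entries in the match array. A small sketch; the names area, group, and serial are just illustrative labels:

```javascript
const ssnLike = /(\d{3})-(\d{2})-(\d{4})/;
const ssnMatch = "123-45-6789".match(ssnLike);

// Index 0 is the whole match; indexes 1..3 are the captured groups.
const [whole, area, group, serial] = ssnMatch;
console.log(whole);               // "123-45-6789"
console.log(area, group, serial); // 123 45 6789
```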

🔄 Backreferences

Use \1, \2, etc. to refer back to captured groups.

(\w+)\s+\1

Matches repeated words: "hello hello", "yes yes"
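A quick JavaScript sketch of the backreference, plus one caveat worth knowing. The input "the theme" is an invented example.

```javascript
const repeatedWord = /(\w+)\s+\1/;

const repeatResults = ["hello hello", "hello world"].map((s) => repeatedWord.test(s));
console.log(repeatResults); // [ true, false ]

// Caveat: the pattern does not require \1 to be a whole word,
// so it also matches when the second word merely starts with the first.
const partialRepeat = "the theme".match(repeatedWord)[0];
console.log(partialRepeat); // "the the"
```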


2.9 Non-Capturing Groups (?: )

Sometimes you want to group without capturing.

(?:http|https)://

Matches both http:// and https:// but doesn’t store the match as a capturing group.
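The effect is visible in the length of the match array. A minimal sketch in JavaScript, assuming example.com as a placeholder URL (note that / must be escaped inside a regex literal):

```javascript
const url = "https://example.com";

const withCapture = url.match(/(http|https):\/\//);
const withoutCapture = url.match(/(?:http|https):\/\//);

console.log(withCapture.length);    // 2: whole match plus one captured group
console.log(withoutCapture.length); // 1: whole match only
console.log(withoutCapture[0]);     // "https://"
```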


2.10 Lookahead and Lookbehind

Positive Lookahead: X(?=Y)

Matches X only if followed by Y

\w+(?=\.)

Matches words followed by a period, e.g., "Hello." → "Hello"


Negative Lookahead: X(?!Y)

Matches X only if not followed by Y

foo(?!bar)

Matches "foo" not followed by "bar"
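Both lookaheads can be sketched in JavaScript as follows. The inputs are invented; the key point is that the lookahead text is checked but never included in the match.

```javascript
// Positive lookahead: the period must follow, but is not part of the result.
const beforePeriod = "Hello. World".match(/\w+(?=\.)/g);
console.log(beforePeriod); // [ "Hello" ]

// Negative lookahead: "foo" is accepted only when "bar" does not follow.
const notBar = /foo(?!bar)/;
const lookaheadResults = ["foobar", "foobaz"].map((s) => notBar.test(s));
console.log(lookaheadResults); // [ false, true ]
```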


🔁 Lookbehind (Python regex, Java, .NET, JS ES2018+)

  • Positive: (?<=Y)X
  • Negative: (?<!Y)X

Example:

(?<=\$)\d+

Matches digits preceded by a dollar sign, e.g., $123 → "123"
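A sketch of both lookbehind forms in JavaScript (this needs an ES2018-capable engine). The sample strings are made up, and the second example shows a pitfall with the negative form.

```javascript
// Positive lookbehind: digits that come right after a dollar sign.
const dollarAmounts = "Total: $123, qty 4".match(/(?<=\$)\d+/g);
console.log(dollarAmounts); // [ "123" ]

// Negative lookbehind pitfall: "23" is preceded by "1", not "$",
// so part of the price still matches.
const notAfterDollar = "$123 and 45".match(/(?<!\$)\d+/g);
console.log(notAfterDollar); // [ "23", "45" ]
```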


2.11 Practice Patterns

1. Match a valid 24-hour time:

^([01]\d|2[0-3]):[0-5]\d$

Matches:

  • "00:00"
  • "23:59"

Does NOT match:

  • "24:00"
  • "99:99"
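The listed cases can be verified with a short JavaScript sketch. The extra input "7:30" is an added example showing that a leading zero is required.

```javascript
const time24 = /^([01]\d|2[0-3]):[0-5]\d$/;

const timeInputs = ["00:00", "23:59", "24:00", "99:99", "7:30"];
const timeResults = timeInputs.map((t) => time24.test(t));

console.log(timeResults); // [ true, true, false, false, false ]
```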

2. Match HTML tags:

<[^>]+>

Matches: <div>, <img src="x.jpg">, <a href="#">
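A minimal sketch of the tag pattern in JavaScript, run on an invented snippet. Note that closing tags are matched too, and that this is a rough scan rather than real HTML parsing.

```javascript
const html = '<div><img src="x.jpg"><a href="#">link</a></div>';

// The g flag returns every match instead of only the first one.
const tags = html.match(/<[^>]+>/g);

console.log(tags);
// [ '<div>', '<img src="x.jpg">', '<a href="#">', '</a>', '</div>' ]
```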


3. Extract numbers from a string:

\d+(\.\d+)?

Matches:

  • "123"
  • "3.14"
  • "0.99"
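One way to pull every number out of a sentence is matchAll, sketched below on an invented input string:

```javascript
const sentence = "Pi is 3.14, price 0.99, count 123";

// matchAll requires the g flag; m[0] is the full text of each match.
const numberStrings = Array.from(sentence.matchAll(/\d+(\.\d+)?/g), (m) => m[0]);
console.log(numberStrings); // [ "3.14", "0.99", "123" ]

// Convert the matched text to actual numbers.
const numberValues = numberStrings.map(Number);
console.log(numberValues); // [ 3.14, 0.99, 123 ]
```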

2.12 Escaping Special Characters

To match special symbols like ., *, +, you must escape them with a backslash \.

Special characters in regex:

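. ^ $ * + ? ( ) [ ] { } \ |

The forward slash / is not a regex metacharacter, but it must be escaped inside a JavaScript regex literal because it marks the start and end of the pattern.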

Example:

\$100

Matches: "$100"
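The sketch below shows escaping in JavaScript. The escapeRegExp function is a hypothetical helper written for this example, not a built-in; it is a common way to escape user-supplied text before building a pattern from it.

```javascript
// Escaped $ matches a literal dollar sign.
const dollarOk = /\$100/.test("Price: $100");
console.log(dollarOk); // true

// Unescaped dot matches any character; escaped dot matches only ".".
const dotResults = [/a.c/.test("abc"), /a\.c/.test("abc")];
console.log(dotResults); // [ true, false ]

// Hypothetical helper: put a backslash in front of every metacharacter.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const escapedSource = escapeRegExp("1+1=2?");
console.log(escapedSource); // 1\+1=2\?
const literalFound = new RegExp(escapedSource).test("is 1+1=2? yes");
console.log(literalFound); // true
```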


2.13 Flags/Modifiers

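These change how the regex behaves. The letters below are the ones used in JavaScript regex literals; other languages offer similar options under their own names.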

Flag   Description
i      Case-insensitive
g      Global match (find all matches, not just the first)
m      Multiline (^ and $ match start/end of lines)
s      Dot matches newline
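The m and s flags are easiest to understand on a string that contains a line break. A minimal JavaScript sketch, using an invented two-line string (the s flag needs an ES2018-capable engine):

```javascript
const twoLines = "first\nsecond";

// Without m, ^ and $ anchor the whole string; with m, they also match at line breaks.
const multilineResults = [/^second$/.test(twoLines), /^second$/m.test(twoLines)];
console.log(multilineResults); // [ false, true ]

// Without s, the dot refuses to match "\n"; with s, it matches it.
const dotAllResults = [/first.second/.test(twoLines), /first.second/s.test(twoLines)];
console.log(dotAllResults); // [ false, true ]
```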

Example in JavaScript:

const regex = /hello/gi;

"Hello HELLO".match(regex); // ["Hello", "HELLO"]


2.14 Summary

We’ve now covered all the essential building blocks of regex:

  • Literal characters
  • Character classes and ranges
  • Anchors
  • Quantifiers
  • Groups and backreferences
  • Lookaheads and lookbehinds
  • Flags and modifiers


🔜 What’s Next?

In Part 3, we’ll explore how the C# programming language implements regex, including hands-on examples.
