2.1 Overview
In this part, you'll learn:
- How to build patterns from the ground up
- The full range of symbols: characters, classes, anchors, quantifiers
- How regex interprets your inputs
- How to avoid common beginner mistakes
By the end of this part, you’ll be ready to construct
flexible and powerful regex patterns from scratch.
2.2 Literal Characters
Literal characters are the simplest form of regex:
they match themselves.
Example:
hello
Matches:
- "hello world"
- "say hello"
Does NOT match:
- "Hello" (case-sensitive by default)
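As a quick check, the literal pattern can be tried against the three strings above. This is a minimal JavaScript sketch; the variable names are illustrative only.

```javascript
// A literal pattern: every character matches itself, and case matters by default.
const literalHello = /hello/;

const literalResults = ["hello world", "say hello", "Hello"].map((text) =>
  literalHello.test(text)
);

console.log(literalResults); // [ true, true, false ]
```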
2.3 Character Classes [ ]
Character classes let you match one character out of
a group.
Example:
[cCbB]
Matches: 'c', 'C', 'b', 'B'
Combined:
[cCbB]at
Matches: "cat", "bat", "Cat", "Bat"
Ranges:
[a-z]
Matches any single lowercase ASCII letter
[A-Za-z0-9]
Matches any single ASCII letter or digit
❌ Negated Classes:
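Use [^...] to match any single character that is not listed inside the brackets.
[^0-9]
Matches any single non-digit character
[^aeiou]
Matches any single character except a lowercase vowel (consonants, but also digits, spaces, and punctuation)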
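The class examples above can be checked with a short JavaScript sketch. The variable names are made up for illustration; note that `test` only asks whether the pattern matches somewhere in the string.

```javascript
const catLike = /[cCbB]at/; // one of c, C, b, B, followed by "at"
const lowercase = /[a-z]/;  // any single lowercase ASCII letter
const nonDigit = /[^0-9]/;  // any single character that is not a digit

const classChecks = {
  cat: catLike.test("cat"),           // true
  Bat: catLike.test("Bat"),           // true
  rat: catLike.test("rat"),           // false: "r" is not in the class
  upperOnly: lowercase.test("ABC"),   // false: no lowercase letter anywhere
  digitsOnly: nonDigit.test("12345"), // false: every character is a digit
  mixed: nonDigit.test("123a5"),      // true: "a" is a non-digit
};

console.log(classChecks);
```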
2.4 Predefined Character Classes
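Class | Meaning | Equivalent
\d | Digit | [0-9]
\D | Non-digit | [^0-9]
\w | Word character (alphanumeric + _) | [A-Za-z0-9_]
\W | Non-word character | [^A-Za-z0-9_]
\s | Whitespace (space, tab, newline) | [ \t\n\r\f\v]
\S | Non-whitespace | [^ \t\n\r\f\v]
The equivalents shown are the ASCII versions. In some engines (for example .NET and Python 3 by default), \d, \w, and \s also match Unicode digits, letters, and whitespace.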
Example:
\w+@\w+\.\w+
Matches simple email addresses like: hello@world.com
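To see what "simple" means in practice, here is a small JavaScript sketch. The sample strings are invented for illustration, and the pattern is the one from the text, not a full email validator.

```javascript
const simpleEmail = /\w+@\w+\.\w+/;

// The pattern finds the address inside a longer string.
const found = "Contact: hello@world.com today".match(simpleEmail)[0];
console.log(found); // "hello@world.com"

// It is deliberately simple: dots in the name or extra domain labels are cut off.
const partial = "first.last@example.co.uk".match(simpleEmail)[0];
console.log(partial); // "last@example.co"

// Shorthand classes are also handy on their own.
const digitRuns = "Order 66, aisle 7".match(/\d+/g);
console.log(digitRuns); // [ "66", "7" ]
```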
2.5 Anchors: ^ and $
Anchors don’t match characters — they match positions.
Symbol | Meaning
^ | Start of the string (or of each line in multiline mode)
$ | End of the string (or of each line in multiline mode)
Example:
^hello
Matches: "hello world"
Does not match: "say hello"
world!$
Matches: "hello world!"
Does not match: "world! says hello"
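The anchor examples can be verified with a few lines of JavaScript. This is only a sketch, and the variable names are assumptions made for the example.

```javascript
const startsWithHello = /^hello/; // "hello" must sit at the very start
const endsWithWorld = /world!$/;  // "world!" must sit at the very end

const anchorChecks = [
  startsWithHello.test("hello world"),     // true
  startsWithHello.test("say hello"),       // false
  endsWithWorld.test("hello world!"),      // true
  endsWithWorld.test("world! says hello"), // false
];

console.log(anchorChecks);
```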
2.6 Quantifiers
Quantifiers define how many times a pattern repeats.
Quantifier | Meaning
* | 0 or more times
+ | 1 or more times
? | 0 or 1 time (optional)
{n} | Exactly n times
{n,} | At least n times
{n,m} | Between n and m times
Examples:
a*
Matches: "", "a", "aa", "aaa", and so on
a{2,4}
Matches: "aa", "aaa", "aaaa"
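The sketch below checks a few quantifiers in JavaScript. Because an unanchored pattern such as a{2,4} would also find a match inside "aaaaa", the example wraps each pattern in ^ and $ through `matchesWhole`, a hypothetical helper written just for this illustration.

```javascript
// Hypothetical helper: true only if the whole string fits the pattern.
const matchesWhole = (pattern, text) =>
  new RegExp(`^(?:${pattern})$`).test(text);

const quantifierChecks = [
  matchesWhole("a*", ""),           // true: zero repetitions are allowed
  matchesWhole("a*", "aaa"),        // true
  matchesWhole("a+", ""),           // false: + needs at least one "a"
  matchesWhole("colou?r", "color"), // true: the "u" is optional
  matchesWhole("a{2,4}", "a"),      // false: too few
  matchesWhole("a{2,4}", "aaaa"),   // true
  matchesWhole("a{2,4}", "aaaaa"),  // false: too many
];

console.log(quantifierChecks);
```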
2.7 Greedy vs Lazy Matching
By default, quantifiers are greedy: they match as
much as possible.
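Example:
".*"
Input: "Hello world!" said the "fox".
Matches: "Hello world!" said the "fox" (everything from the first quote to the last one)
To make a quantifier lazy, add ? after it:
".*?"
Matches: "Hello world!" first, then "fox" as a separate match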
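The difference only shows up when the input has more than one possible closing quote. A minimal JavaScript sketch, using a sample sentence with two quoted phrases (the sentence is an assumption made for this example):

```javascript
const sentence = '"Hello world!" said the "fox".';

// Greedy: .* runs to the end, then backs up to the LAST closing quote.
const greedyMatch = sentence.match(/".*"/)[0];
console.log(greedyMatch); // "Hello world!" said the "fox"

// Lazy: .*? stops at the FIRST closing quote, so each phrase is its own match.
const lazyMatches = sentence.match(/".*?"/g);
console.log(lazyMatches); // [ '"Hello world!"', '"fox"' ]
```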
2.8 Groups and Capturing ()
Parentheses group parts of regex and capture them for
later use.
Example:
(bat|cat|hat)
Matches: "bat", "cat", "hat"
Capturing Example:
(\d{3})-(\d{2})-(\d{4})
Matches U.S. SSN-like format: "123-45-6789"
Groups:
- Group 1: "123"
- Group 2: "45"
- Group 3: "6789"
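In JavaScript, the captured groups come back as extra entries in the match array, after the full match. A short sketch; the names `area`, `middle`, and `serial` are illustrative labels, not official terms from the text.

```javascript
const ssnLike = /(\d{3})-(\d{2})-(\d{4})/;

// Index 0 holds the full match; indexes 1..3 hold the captured groups.
const [wholeMatch, area, middle, serial] = "ID: 123-45-6789".match(ssnLike);

console.log(wholeMatch); // "123-45-6789"
console.log(area, middle, serial); // "123" "45" "6789"
```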
🔄 Backreferences
Use \1, \2, etc. to refer back to captured groups.
(\w+)\s+\1
Matches repeated words: "hello hello", "yes yes"
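A brief JavaScript sketch of the backreference pattern follows. It also shows one caveat: \1 only requires the same characters again, so it can match the start of a longer word. The \b variant is a suggested refinement and an assumption of this example, not part of the pattern above.

```javascript
const repeated = /(\w+)\s+\1/;

console.log("hello hello".match(repeated)[1]); // "hello"
console.log(repeated.test("yes no"));          // false

// Caveat: "the" is repeated at the start of "then", so this also matches.
console.log(repeated.test("the then"));        // true

// Assumed refinement: word boundaries force whole-word repeats.
const repeatedWholeWord = /\b(\w+)\s+\1\b/;
console.log(repeatedWholeWord.test("the then"));    // false
console.log(repeatedWholeWord.test("hello hello")); // true
```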
2.9 Non-Capturing Groups (?: )
Sometimes you want to group without capturing.
(?:http|https)://
Matches both http:// and https:// but doesn’t store the
match as a capturing group.
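The effect is visible in the match array: a capturing group adds an entry, a non-capturing group does not. A minimal JavaScript sketch, using example.com as a placeholder URL:

```javascript
const url = "https://example.com";

const capturing = url.match(/(http|https):\/\//);
const nonCapturing = url.match(/(?:http|https):\/\//);

console.log(capturing[0], capturing.length);       // "https://" 2
console.log(capturing[1]);                         // "https"
console.log(nonCapturing[0], nonCapturing.length); // "https://" 1
```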
2.10 Lookahead and Lookbehind
✅ Positive Lookahead: X(?=Y)
Matches X only if followed by Y
\w+(?=\.)
Matches words followed by a period, e.g., "Hello." → "Hello"
❌ Negative Lookahead: X(?!Y)
Matches X only if not followed by Y
foo(?!bar)
Matches "foo" not followed by "bar"
🔁 Lookbehind (Python regex, Java, .NET, JS ES2018+)
- Positive: (?<=Y)X matches X only if preceded by Y
- Negative: (?<!Y)X matches X only if not preceded by Y
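The four lookaround forms can be tried side by side. This JavaScript sketch assumes an engine with lookbehind support (ES2018 or later); the sample strings are invented for illustration. In every case the lookaround text is checked but not included in the match.

```javascript
// Positive lookahead: the period is required but not part of the match.
const beforePeriod = "Hello. Goodbye".match(/\w+(?=\.)/)[0];
console.log(beforePeriod); // "Hello"

// Negative lookahead: "foo" must not be followed by "bar".
const fooChecks = ["foobar", "foobaz"].map((t) => /foo(?!bar)/.test(t));
console.log(fooChecks); // [ false, true ]

// Positive lookbehind: a word must be preceded by "#".
const hashtag = "tags: #regex".match(/(?<=#)\w+/)[0];
console.log(hashtag); // "regex"

// Negative lookbehind: numbers not preceded by a minus sign.
// The \b keeps a match from starting in the middle of "-20".
const unsigned = "10 -20 30".match(/(?<!-)\b\d+/g);
console.log(unsigned); // [ "10", "30" ]
```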
Example:
(?<=\$)\d+
Matches digits preceded by a dollar sign, e.g., $123 → "123"
2.11 Practice Patterns
1. Match a valid 24-hour time:
^([01]\d|2[0-3]):[0-5]\d$
Matches:
- "00:00"
- "23:59"
Does NOT match:
- "24:00"
- "99:99"
2. Match HTML tags:
<[^>]+>
Matches: <div>, <img src="x.jpg">, <a href="#">
3. Extract numbers from a string:
\d+(\.\d+)?
Matches:
- "123"
- "3.14"
- "0.99"
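The three practice patterns can be run together in JavaScript. This is a sketch with made-up sample inputs; note that the tag pattern also picks up closing tags such as </a>.

```javascript
// 1. 24-hour time: hours 00-23, minutes 00-59, nothing else in the string.
const time24 = /^([01]\d|2[0-3]):[0-5]\d$/;
const timeChecks = ["00:00", "23:59", "24:00", "99:99"].map((t) => time24.test(t));
console.log(timeChecks); // [ true, true, false, false ]

// 2. HTML tags: "<", then one or more non-">" characters, then ">".
const tags = '<div><img src="x.jpg"><a href="#">link</a>'.match(/<[^>]+>/g);
console.log(tags); // [ '<div>', '<img src="x.jpg">', '<a href="#">', '</a>' ]

// 3. Numbers: digits with an optional decimal part.
const numbers = "Pi is 3.14, price 0.99, count 123".match(/\d+(\.\d+)?/g);
console.log(numbers); // [ '3.14', '0.99', '123' ]
```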
2.12 Escaping Special Characters
To match special symbols like ., *, +, you must escape them
with a backslash \.
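Special characters in regex:
. ^ $ * + ? ( ) [ ] { } \ |
The forward slash / is not special to the regex engine itself, but it must be escaped as \/ inside /.../ literals in languages such as JavaScript.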
Example:
\$100
Matches: "$100"
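The sketch below shows what goes wrong without the backslash, and one common way to escape arbitrary text before building a pattern from it. `escapeForRegex` is a hypothetical helper written for this example, not a built-in function.

```javascript
const price = /\$100/;
console.log(price.test("Price: $100")); // true

// Unescaped, $ is the end-of-string anchor, so nothing can follow it.
console.log(/$100/.test("$100")); // false

// Hypothetical helper: put a backslash in front of every metacharacter.
function escapeForRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const escapedSum = escapeForRegex("1+1=2");
console.log(escapedSum); // 1\+1=2
console.log(new RegExp(escapedSum).test("1+1=2")); // true
console.log(new RegExp("1+1=2").test("1+1=2"));    // false: + is a quantifier here
```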
2.13 Flags/Modifiers
These change how the regex behaves.
Flag | Description
i | Case-insensitive
g | Global match (find all matches, not just the first)
m | Multiline (^ and $ match start/end of lines)
s | Dot matches newline
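The m and s flags are easiest to understand on a two-line string. The following JavaScript sketch uses the flag letters from the table; other engines may spell these options differently, and the s flag assumes ES2018 or later.

```javascript
const twoLines = "first\nsecond";

// m: ^ and $ also match at line breaks, not only at the ends of the string.
const withoutM = /^second$/.test(twoLines); // false
const withM = /^second$/m.test(twoLines);   // true

// s: the dot also matches the newline character.
const withoutS = /first.second/.test(twoLines); // false
const withS = /first.second/s.test(twoLines);   // true

// i without g: case-insensitive, but only the first match is returned.
const firstOnly = "Hello HELLO".match(/hello/i)[0]; // "Hello"

console.log({ withoutM, withM, withoutS, withS, firstOnly });
```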
Example in JavaScript:
const regex = /hello/gi;
"Hello HELLO".match(regex); // ["Hello", "HELLO"]
2.14 Summary
We’ve now covered all the essential building blocks of
regex:
✅ Literal characters
✅ Character classes and ranges
✅ Anchors
✅ Quantifiers
✅ Groups and backreferences
✅ Lookaheads and lookbehinds
✅ Flags and modifiers
🔜 What’s Next?
In Part 3, we’ll explore how C# implements regex, including hands-on examples.