ToolHub

Regular Expressions Cheat Sheet and Common Patterns

Regular expressions, or regex, are powerful pattern-matching tools used in virtually every programming language and text editor. While the syntax can appear cryptic at first, mastering regex unlocks the ability to search, validate, extract, and transform text with precision that would require dozens of lines of procedural code. This guide covers the essential syntax, common patterns, and practical techniques you need to know.

Basic Regex Syntax Cheat Sheet

Character Classes

| Pattern | Meaning | Example |
|---|---|---|
| `.` | Any character except newline | `a.c` matches `abc`, `a1c`, `a c` |
| `\d` | Any digit (0-9) | `\d\d` matches `42` |
| `\D` | Any non-digit | `\D` matches `a`, `!`, or a space |
| `\w` | Word character (a-z, A-Z, 0-9, _) | `\w+` matches `hello_123` |
| `\W` | Non-word character | `\W` matches `@`, `#`, or a space |
| `\s` | Whitespace (space, tab, newline) | `a\sb` matches `a b` |
| `\S` | Non-whitespace | `\S+` matches `hello` |
| `[abc]` | Any character in the set | `[aeiou]` matches a vowel |
| `[^abc]` | Any character not in the set | `[^0-9]` matches non-digits |
| `[a-z]` | Character range | `[A-Za-z]` matches any letter |
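
A quick way to try these classes is Python's standard `re` module. The following is a minimal sketch with made-up sample strings. One caveat: for `str` patterns in Python 3, `\d`, `\w`, and `\s` are Unicode-aware by default, so they can match more than the ASCII ranges listed in the table (pass `re.ASCII` to restrict them).

```python
import re

text = "Order #42 shipped to Ana_B at 9am"

# \d+ finds runs of digits; \w+ finds runs of letters, digits, and underscores.
digits = re.findall(r"\d+", text)
words = re.findall(r"\w+", text)

# [aeiou] is a set: any single lowercase vowel.
vowels = re.findall(r"[aeiou]", "regex")

# [^0-9\s] is a negated set: any character that is neither a digit nor whitespace.
others = re.findall(r"[^0-9\s]", "a1 b2!")

# The dot matches any character except a newline (unless re.DOTALL is set).
dot_hits = [s for s in ["abc", "a1c", "a c", "a\nc"] if re.fullmatch(r"a.c", s)]

print(digits)    # ['42', '9']
print(words)     # ['Order', '42', 'shipped', 'to', 'Ana_B', 'at', '9am']
print(vowels)    # ['e', 'e']
print(others)    # ['a', 'b', '!']
print(dot_hits)  # ['abc', 'a1c', 'a c']
```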

Quantifiers

| Pattern | Meaning | Example |
|---|---|---|
| `*` | Zero or more | `ab*c` matches `ac`, `abc`, `abbc` |
| `+` | One or more | `ab+c` matches `abc`, `abbc` |
| `?` | Zero or one | `colou?r` matches `color`, `colour` |
| `{n}` | Exactly n times | `\d{4}` matches 4 digits |
| `{n,}` | n or more times | `\d{2,}` matches 2 or more digits |
| `{n,m}` | Between n and m times | `\d{3,5}` matches 3 to 5 digits |
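
The quantifiers can be checked side by side with a small Python sketch. `full_matches` is a hypothetical helper for illustration; it uses `re.fullmatch`, so each pattern must cover the entire candidate string.

```python
import re

def full_matches(pattern, candidates):
    """Return the candidates that the pattern matches from start to end."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

star = full_matches(r"ab*c", ["ac", "abc", "abbc"])                  # * : zero or more b
plus = full_matches(r"ab+c", ["ac", "abc", "abbc"])                  # + : one or more b
optional = full_matches(r"colou?r", ["color", "colour", "colouur"])  # ? : zero or one u

numbers = ["1", "12", "123", "1234", "12345", "123456"]
exactly_4 = full_matches(r"\d{4}", numbers)
at_least_2 = full_matches(r"\d{2,}", numbers)
three_to_5 = full_matches(r"\d{3,5}", numbers)

print(star)        # ['ac', 'abc', 'abbc']
print(plus)        # ['abc', 'abbc']
print(optional)    # ['color', 'colour']
print(exactly_4)   # ['1234']
print(at_least_2)  # ['12', '123', '1234', '12345', '123456']
print(three_to_5)  # ['123', '1234', '12345']
```

Using `re.fullmatch` matters here: with `re.search`, `\d{4}` would also find `1234` inside `123456`, because a quantifier only limits how much a single match consumes, not what surrounds it.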

Anchors and Boundaries

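| Pattern | Meaning | Example |
|---|---|---|
| `^` | Start of string (start of each line in multiline mode) | `^Hello` matches `Hello` at the start |
| `$` | End of string (end of each line in multiline mode) | `world$` matches `world` at the end |
| `\b` | Word boundary | `\bcat\b` matches `cat` as a whole word, but not inside `concatenate` |
| `\B` | Non-word boundary | `\Bcat` matches the `cat` inside `scattered` |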
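
A short Python sketch of how anchors behave with and without multiline mode, and how `\b` and `\B` differ. The sample strings are made up; `re.MULTILINE` is the flag Python uses to make `^` and `$` work per line.

```python
import re

log = "Hello world\nsay Hello\nHello again"

# Without re.MULTILINE, ^ only matches at the very start of the string.
start_only = re.findall(r"^Hello", log)
# With re.MULTILINE, ^ also matches right after every newline.
each_line = re.findall(r"^Hello", log, re.MULTILINE)
# $ anchors to the end: "world" must be the last thing on its line.
ends_with_world = re.findall(r"world$", log, re.MULTILINE)

# \b on both sides: "cat" only as a whole word, not inside "concatenate" or "bobcat".
whole_word = re.findall(r"\bcat\b", "cat concatenate bobcat cat.")
# \B before "cat": a word character must precede it, so the leading "cat" is skipped.
inside_word = re.findall(r"\Bcat", "cat scattered bobcat")

print(start_only)       # ['Hello']
print(each_line)        # ['Hello', 'Hello']
print(ends_with_world)  # ['world']
print(whole_word)       # ['cat', 'cat']
print(inside_word)      # ['cat', 'cat']
```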

Groups and Alternation

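| Pattern | Meaning | Example |
|---|---|---|
| `(abc)` | Capturing group | `(ab)+` matches `abab` |
| `(?:abc)` | Non-capturing group | `(?:ab)+` matches `abab` without capturing |
| `a\|b` | Alternation (or) | `cat\|dog` matches `cat` or `dog` |
| `\1` | Backreference to group 1 | `(\w)\1` matches `aa`, `bb` |
| `(?<name>abc)` | Named capturing group | Backreferenced as `\k<name>` (JavaScript, Java, .NET, PCRE); Python's `re` spells it `(?P<name>abc)` and `(?P=name)` |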
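
In Python, the difference between capturing and non-capturing groups shows up clearly in `re.findall`, which returns group contents whenever the pattern contains groups. This is a minimal sketch with made-up inputs; note that it uses Python's `(?P<name>...)` spelling for named groups and `(?P=name)` for named backreferences.

```python
import re

sample = "abab xx ab"

# With a capturing group, findall returns the group's text (its last repetition).
captured = re.findall(r"(ab)+", sample)
# With a non-capturing group, findall returns the whole match instead.
whole = re.findall(r"(?:ab)+", sample)

# Alternation: either branch may match.
animals = re.findall(r"cat|dog", "cat, dog, bird")

# Backreference \1: the same word character twice in a row.
doubles = [m.group(0) for m in re.finditer(r"(\w)\1", "aa bc dd")]

# Named groups (Python spelling) and a named backreference.
m = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "Released 2024-05")
repeated = re.fullmatch(r"(?P<word>\w+) (?P=word)", "bye bye") is not None

print(captured)                           # ['ab', 'ab']
print(whole)                              # ['abab', 'ab']
print(animals)                            # ['cat', 'dog']
print(doubles)                            # ['aa', 'dd']
print(m.group("year"), m.group("month"))  # 2024 05
print(repeated)                           # True
```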

Common Regex Patterns

Here are the most frequently needed regex patterns for web development and data validation:

Email Validation

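A practical email regex that catches most invalid addresses without being overly strict: `^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$`. This pattern checks the basic email structure: a local part made of common characters, an `@`, and a domain that ends in a dot followed by a top-level domain (TLD) of at least two letters. It is a structural check only: it rejects some unusual but valid addresses (such as quoted local parts) and cannot tell you whether the mailbox exists.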
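
Here is one way to wrap the email pattern in Python, as a sketch. `looks_like_email` is a hypothetical helper name, and the sample addresses are made up. It uses `fullmatch` because Python's `$` also matches just before a trailing newline, so `match` alone would accept `"user@example.com\n"`.

```python
import re

EMAIL_RE = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")

def looks_like_email(s: str) -> bool:
    # fullmatch must consume the whole string, so a trailing "\n" is rejected
    # even though Python's $ can match just before it.
    return EMAIL_RE.fullmatch(s) is not None

candidates = ["user.name+tag@example.co.uk", "user@localhost", "a@b.c", "user@example.com\n"]
for candidate in candidates:
    print(repr(candidate), looks_like_email(candidate))
```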

URL Validation

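To match HTTP and HTTPS URLs: `^https?:\/\/(www\.)?[a-zA-Z0-9-]+\.[a-zA-Z]{2,}(\/[^\s]*)?$`. This pattern matches URLs with an optional `www.` prefix, a domain name, and an optional path. Because it allows only one label before the TLD, it rejects other subdomains such as `docs.example.com`, as well as URLs with a port or with a query string that is not preceded by a path.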
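
A Python sketch comparing the URL pattern with a hypothetical variant that replaces the single domain label (and the special-cased `www.`) with one or more `label.` pieces, so hosts like `docs.python.org` are accepted. The variant is an illustration, not a complete URL validator: it still rejects ports, IP-address hosts, and non-ASCII domain names.

```python
import re

# Pattern from the article: one label before the TLD, plus an optional "www.".
SINGLE_LABEL = re.compile(r"^https?:\/\/(www\.)?[a-zA-Z0-9-]+\.[a-zA-Z]{2,}(\/[^\s]*)?$")
# Hypothetical variant: one or more "label." pieces before the TLD.
MULTI_LABEL = re.compile(r"^https?:\/\/(?:[a-zA-Z0-9-]+\.)+[a-zA-Z]{2,}(?:\/[^\s]*)?$")

urls = [
    "https://example.com",
    "http://www.example.com/path?q=1",
    "https://docs.python.org/3/",
    "ftp://example.com",
]

# Map each URL to (matches single-label pattern, matches multi-label variant).
results = {
    url: (SINGLE_LABEL.fullmatch(url) is not None, MULTI_LABEL.fullmatch(url) is not None)
    for url in urls
}
for url, (single, multi) in results.items():
    print(f"{url}: single={single} multi={multi}")
```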

Phone Number (US)

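For US phone numbers in various formats: `^\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$`. This handles formats like (555) 123-4567, 555-123-4567, 555.123.4567, and 5551234567. It does not accept a leading +1 country code, and because each parenthesis is optional on its own, it also accepts unbalanced input such as (555-123-4567.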
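
A common follow-up step is to validate with the phone pattern and then normalize the number to bare digits. This Python sketch assumes a hypothetical `normalize_us_phone` helper; the last sample shows that unbalanced parentheses still pass.

```python
import re

US_PHONE_RE = re.compile(r"^\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$")

def normalize_us_phone(s):
    """Return the ten digits if s matches the pattern, otherwise None."""
    if US_PHONE_RE.fullmatch(s) is None:
        return None
    # \D matches every non-digit, so this strips parentheses, spaces, dots, and dashes.
    return re.sub(r"\D", "", s)

for raw in ["(555) 123-4567", "555.123.4567", "5551234567", "555-1234", "(555-123-4567"]:
    print(repr(raw), normalize_us_phone(raw))
```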

IP Address (IPv4)

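To validate IPv4 addresses: `^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$`. This ensures each octet is between 0 and 255. It still accepts leading zeros (for example `010.0.0.1`), which some parsers reject and others interpret as octal.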
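
The IPv4 pattern is long, so in Python it can be split across adjacent raw strings, which the compiler joins into one pattern. A minimal sketch; `is_ipv4` is a hypothetical helper name.

```python
import re

# Same pattern as above, split across two raw strings for readability.
IPV4_RE = re.compile(
    r"^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}"
    r"(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$"
)

def is_ipv4(s):
    return IPV4_RE.fullmatch(s) is not None

for addr in ["192.168.0.1", "255.255.255.255", "256.1.1.1", "1.2.3", "010.0.0.1"]:
    print(addr, is_ipv4(addr))
```

The last address shows the leading-zero case: each octet is in range, so the pattern accepts it.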

Date (YYYY-MM-DD)

For the ISO date format: `^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$`. This validates the format structure (month 01-12, day 01-31) but does not check whether the day exists in that month, so `2023-02-31` still matches.
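
One common approach is to use the regex only for the shape and let a date library do the calendar check. This sketch uses Python's standard `datetime.date`, whose constructor raises `ValueError` for impossible dates; `is_real_iso_date` is a hypothetical name.

```python
import re
from datetime import date

ISO_DATE_RE = re.compile(r"^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$")

def is_real_iso_date(s):
    """Shape check with the regex, then a calendar check with datetime.date."""
    if ISO_DATE_RE.fullmatch(s) is None:
        return False
    year, month, day = (int(part) for part in s.split("-"))
    try:
        date(year, month, day)  # raises ValueError for dates like Feb 30 or Apr 31
    except ValueError:
        return False
    return True

for s in ["2024-02-29", "2023-02-29", "2023-04-31", "2023-13-01"]:
    print(s, is_real_iso_date(s))
```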

Hex Color Code

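To match CSS hex colors: `^#?([0-9a-fA-F]{3}|[0-9a-fA-F]{6})$`. This matches both 3-digit shorthand (`#fff`) and 6-digit (`#ffffff`) formats; the leading `#` is optional. It does not match the 4- and 8-digit forms with an alpha channel (`#rgba`, `#rrggbbaa`) that modern CSS also accepts.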
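
Because the hex digits sit in a capturing group, the match can also drive normalization. This Python sketch (with a hypothetical `expand_hex_color` helper) expands shorthand colors to the 6-digit lowercase form.

```python
import re

HEX_COLOR_RE = re.compile(r"^#?([0-9a-fA-F]{3}|[0-9a-fA-F]{6})$")

def expand_hex_color(s):
    """Return '#rrggbb' in lowercase, or None if s is not a 3- or 6-digit hex color."""
    m = HEX_COLOR_RE.fullmatch(s)
    if m is None:
        return None
    digits = m.group(1).lower()
    if len(digits) == 3:
        digits = "".join(ch * 2 for ch in digits)  # "fa0" -> "ffaa00"
    return "#" + digits

for c in ["#fff", "FA0", "#1A2b3C", "#ffff", "#gggggg"]:
    print(c, expand_hex_color(c))
```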

Advanced Techniques

Lookahead and Lookbehind

Lookahead and lookbehind assertions let you match patterns only when they are (or are not) followed or preceded by another pattern, without including the surrounding text in the match.
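
The four lookaround forms are easiest to understand on a concrete string. The following Python sketch uses made-up sample text; note that Python's `re` requires lookbehind patterns to have a fixed width, which both lookbehinds here satisfy (one character each).

```python
import re

text = "Total: $30, tax: $4, 15 items"

# Positive lookbehind (?<=...): digits preceded by "$"; the "$" is not in the match.
dollar_amounts = re.findall(r"(?<=\$)\d+", text)
# Negative lookbehind (?<!...): digit runs not preceded by "$" or by another digit.
# (Without \d in the set, the "0" in "$30" would also match on its own.)
plain_numbers = re.findall(r"(?<![$\d])\d+", text)
# Positive lookahead (?=...): words followed by ":"; the colon is not consumed.
labels = re.findall(r"\w+(?=:)", text)

# Lookaheads anchored at the start act as independent checks on the whole string:
# at least one digit, at least one letter, and 8 or more characters overall.
PASSWORD_RE = re.compile(r"^(?=.*\d)(?=.*[A-Za-z]).{8,}$")
valid = [p for p in ["abcdefg1", "abcdefgh", "12345678"] if PASSWORD_RE.fullmatch(p)]

print(dollar_amounts)  # ['30', '4']
print(plain_numbers)   # ['15']
print(labels)          # ['Total', 'tax']
print(valid)           # ['abcdefg1']
```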

Greedy vs Lazy Matching

By default, quantifiers are greedy, meaning they match as much text as possible. Adding a question mark after a quantifier makes it lazy, matching as little as possible. For example, `<.*>` applied to `<b>bold</b>` matches the entire string, while `<.*?>` matches only `<b>`.

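Pro Tip: When extracting content between delimiters like HTML tags or quotes, use a lazy quantifier or, often better, a negated character class such as `[^>]*` or `[^"]*`, so the match cannot run past the closing delimiter. Accidentally greedy matching is one of the most common sources of regex bugs.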
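
The greedy and lazy behavior described in this section can be reproduced in a few lines of Python. This sketch also shows the negated-character-class alternative, which cannot cross the closing delimiter regardless of greediness.

```python
import re

html = "<b>bold</b>"

greedy = re.search(r"<.*>", html).group()   # runs to the LAST ">" it can reach
lazy = re.search(r"<.*?>", html).group()    # stops at the FIRST ">"
all_tags = re.findall(r"<[^>]*>", html)     # negated class: cannot cross a ">"

quoted = 'say "hi" and "bye"'
greedy_quotes = re.findall(r'"(.*)"', quoted)
lazy_quotes = re.findall(r'"(.*?)"', quoted)

print(greedy)         # <b>bold</b>
print(lazy)           # <b>
print(all_tags)       # ['<b>', '</b>']
print(greedy_quotes)  # ['hi" and "bye']
print(lazy_quotes)    # ['hi', 'bye']
```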

Regex Performance Tips

Testing and Debugging Regex

Always test your regex patterns with real-world input before deploying them. Use an interactive regex tester that highlights matches in real time and explains each part of the pattern. This helps you catch edge cases and understand why a pattern does or does not match specific input. Try our free Regex Tester to build and test your patterns interactively.
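
Alongside an interactive tester, a small table-driven check in code can keep those edge cases from regressing. This Python sketch assumes a hypothetical `check_pattern` helper and runs it against the ISO date pattern from earlier; the failing case it reports is the known limitation that the pattern does not check days per month.

```python
import re

def check_pattern(pattern, should_match, should_not_match):
    """Return human-readable failures; an empty list means every case passed."""
    regex = re.compile(pattern)
    failures = []
    for s in should_match:
        if regex.fullmatch(s) is None:
            failures.append(f"expected match: {s!r}")
    for s in should_not_match:
        if regex.fullmatch(s) is not None:
            failures.append(f"unexpected match: {s!r}")
    return failures

failures = check_pattern(
    r"^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$",
    should_match=["2024-01-15", "1999-12-31"],
    should_not_match=["2024-13-01", "2024-1-15", "2023-02-31"],
)
print(failures)  # ["unexpected match: '2023-02-31'"]
```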


Frequently Asked Questions

What is the difference between greedy and lazy regex matching?

Greedy quantifiers match as much text as possible, while lazy quantifiers match as little as possible while still letting the overall pattern match. Adding a question mark after a quantifier makes it lazy. For example, `.*` matches the longest possible string, while `.*?` matches the shortest.

Are regular expressions the same across all programming languages?

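No, regex implementations vary across languages. Most modern engines share a Perl-derived core syntax (character classes, quantifiers, groups, and alternation), but advanced features like lookbehind, named groups, backreferences, and Unicode support differ. For example, Go's regexp package (based on RE2) supports neither lookaround nor backreferences, and JavaScript only added lookbehind, named capture groups, and the s (dotAll) flag in ES2018.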

Can regex parse HTML or XML?

No, regular expressions cannot reliably parse HTML or XML because these are nested, context-free languages. Regex has no concept of nesting depth or balanced tags. Use a proper HTML or XML parser instead. Regex can extract simple patterns from HTML, but it will fail on complex structures.
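
As one example of using a parser instead, Python's standard library includes `html.parser`. The sketch below (with a hypothetical `LinkCollector` class) collects link targets; the parser lowercases tag and attribute names and handles either quote style, details a hand-written regex would have to anticipate.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collects href values from <a> tags using the standard-library HTML parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # tag arrives lowercased; attrs is a list of (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)

doc = """<p>See <A HREF="/docs">docs</A> and <a class="x" href='/faq'>FAQ</a>.</p>"""
parser = LinkCollector()
parser.feed(doc)
parser.close()
print(parser.links)  # ['/docs', '/faq']
```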

What are lookahead and lookbehind in regex?

Lookahead and lookbehind are zero-width assertions that check for patterns without consuming characters. `(?=pattern)` is a positive lookahead that succeeds if the pattern follows the current position. `(?<=pattern)` is a positive lookbehind that succeeds if the pattern precedes it. The negative versions, `(?!pattern)` and `(?<!pattern)`, use `!` instead of `=` and succeed only when the pattern is absent.

How do I make my regex case-insensitive?

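In languages with regex literals, add the i flag after the closing delimiter to make the entire pattern case-insensitive; in JavaScript, use `/pattern/i`. In Python, pass `re.IGNORECASE` (or its short form `re.I`) as the flags argument. Many engines, including Python, Java, .NET, and PCRE, also accept inline flags: `(?i)` at the start of a pattern applies to the whole pattern, while the scoped form `(?i:...)` applies only to the enclosed group. Inline flag support varies, and older JavaScript engines do not support it.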
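
In Python, the three approaches look like this. A minimal sketch with made-up sample text; the scoped form `(?i:...)` requires Python 3.6 or later.

```python
import re

text = "Hello HELLO hello"

flag_arg = re.findall(r"hello", text, re.IGNORECASE)  # flag passed as an argument
inline = re.findall(r"(?i)hello", text)               # same flag written inline at the start

# Scoped inline flag: only the group ignores case; " World" must match exactly.
scoped = re.findall(r"(?i:hello) World", "HELLO World, hello world")

print(flag_arg)  # ['Hello', 'HELLO', 'hello']
print(inline)    # ['Hello', 'HELLO', 'hello']
print(scoped)    # ['HELLO World']
```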