Regex Tester
Regular Expressions Explained: Syntax, Metacharacters, Patterns, and Advanced Features
Regular expressions (often abbreviated as regex or regexp) are among the most widely used text-processing tools in programming. A regular expression is a sequence of characters that defines a search pattern, used for matching, finding, replacing, and validating text. First formalized by mathematician Stephen Kleene in the 1950s as part of formal language theory, regular expressions were implemented in Unix tools like ed, grep, and sed in the 1970s and have since been built into most programming languages, text editors, and many command-line tools.
This regex tester lets you write a pattern, set flags, enter test text, and see matches instantly — all processed locally in your browser using JavaScript's built-in RegExp engine. Whether you are validating email addresses, parsing log files, or extracting data from HTML, this tool helps you build and test patterns interactively before embedding them in your code.
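As a rough sketch of what testing a pattern with JavaScript's RegExp engine involves, the snippet below builds a RegExp from a pattern string and a flags string and lists the matches. The function name findMatches and the shape of its return value are assumptions made for illustration; this is not the tool's actual source code.

```javascript
// Hypothetical helper (illustrative only, not the tester's real implementation).
function findMatches(pattern, flags, text) {
  let regex;
  try {
    // The RegExp constructor throws a SyntaxError for an invalid pattern or flag.
    regex = new RegExp(pattern, flags);
  } catch (err) {
    return { error: err.message, matches: [] };
  }
  // matchAll requires the g flag, so fall back to a single match() without it.
  const results = regex.global
    ? [...text.matchAll(regex)]
    : [text.match(regex)].filter(Boolean);
  return {
    error: null,
    matches: results.map((m) => ({ text: m[0], index: m.index })),
  };
}

const demo = findMatches("\\d+", "g", "Order 66 shipped in 3 days");
console.log(demo.matches); // two matches: "66" at index 6 and "3" at index 20
```

Catching the constructor error matters in an interactive setting, because a half-typed pattern such as "(" is not valid yet.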
Regex Metacharacter Reference Table
| Pattern | Name | Meaning | Example |
|---|---|---|---|
| . | Dot | Any character except newline | a.c matches "abc", "a1c" |
| ^ | Caret | Start of string (or line with m flag) | ^Hello matches "Hello World" |
| $ | Dollar | End of string (or line with m flag) | end$ matches "the end" |
| \d | Digit | Any digit [0-9] | \d{3} matches "123" |
| \w | Word | Word character [a-zA-Z0-9_] | \w+ matches "hello_world" |
| \s | Space | Whitespace (space, tab, newline) | \s+ matches " " (spaces) |
| \b | Boundary | Word boundary (between \w and \W) | \bcat\b matches "cat" not "cats" |
| [abc] | Char class | Any one of the listed characters | [aeiou] matches vowels |
| [^abc] | Negated class | Any character NOT listed | [^0-9] matches non-digits |
| (x\|y) | Alternation | Match x or y | (cat\|dog) matches either |
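A few of the metacharacters from the table can be tried directly in JavaScript. The sample strings and variable names below are made up for illustration.

```javascript
const sample = "cat cats concat a1c abc";

// \b on both sides: only the standalone word matches, not "cats" or "concat".
const wholeWord = sample.match(/\bcat\b/g);

// The dot stands for any single character other than a newline.
const dotted = sample.match(/a.c/g);

// Character class and negated class.
const vowels = "regex".match(/[aeiou]/g);
const nonDigits = "a1b2".match(/[^0-9]/g);

// Alternation inside a group.
const pets = "cat, dog, bird".match(/(cat|dog)/g);

console.log(wholeWord, dotted, vowels, nonDigits, pets);
```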
Quantifiers: Controlling How Many Times a Pattern Matches
| Quantifier | Meaning | Greedy | Lazy |
|---|---|---|---|
| * | 0 or more | a* (as many as possible) | a*? (as few as possible) |
| + | 1 or more | a+ | a+? |
| ? | 0 or 1 | a? | a?? |
| {n} | Exactly n | \d{4} (exactly 4 digits) | N/A |
| {n,m} | Between n and m | \d{2,4} | \d{2,4}? |
| {n,} | n or more | \w{3,} | \w{3,}? |
The difference between greedy and lazy matching is one of the most important regex concepts to understand. Greedy quantifiers (the default) consume as much text as possible and then backtrack if the rest of the pattern fails. Lazy quantifiers (adding ? after the quantifier) consume as little text as possible and expand only if needed. The classic example: given the text <b>bold</b>, the greedy pattern <.*> matches the entire string (from the first < to the last >), while the lazy pattern <.*?> matches just <b> and then </b> separately.
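The greedy versus lazy behavior described above can be checked in a few lines of JavaScript. The third pattern, which uses a negated character class instead of a lazy dot, is an extra illustration rather than something the text above prescribes.

```javascript
const html = "<b>bold</b>";

// Greedy: .* runs to the end of the string, then backtracks to the last ">".
const greedy = html.match(/<.*>/g);

// Lazy: .*? stops at the first ">" it can reach, so each tag is its own match.
const lazy = html.match(/<.*?>/g);

// Alternative: a negated class cannot run past ">" in the first place.
const negated = html.match(/<[^>]+>/g);

console.log(greedy);  // ["<b>bold</b>"]
console.log(lazy);    // ["<b>", "</b>"]
console.log(negated); // ["<b>", "</b>"]
```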
Grouping and Capturing
Capturing groups (pattern) group parts of a pattern and capture the matched text for later use. You can reference captured groups in replacements with $1, $2, etc. For example, the pattern (\d{4})-(\d{2})-(\d{2}) captures year, month, and day separately from a date string, letting you rearrange them in a replacement like $2/$3/$1.
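A minimal sketch of the date example in JavaScript, where numbered groups are available both on the match result and in the replacement string. The input sentence is invented for the demonstration.

```javascript
const isoDate = /(\d{4})-(\d{2})-(\d{2})/;
const input = "Released on 2024-03-15.";

// Index 0 is the whole match; indices 1 to 3 are the captured groups.
const [, year, month, day] = input.match(isoDate);

// $1, $2, $3 in the replacement refer to the same groups.
const usStyle = input.replace(isoDate, "$2/$3/$1");

console.log(year, month, day); // 2024 03 15
console.log(usStyle);          // "Released on 03/15/2024."
```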
Non-capturing groups (?:pattern) group without capturing, which is useful when you need grouping for alternation or quantifiers but do not need to reference the match later. This is slightly more efficient because the regex engine does not need to store the match.
Named groups (?<name>pattern) let you assign meaningful names to captured groups instead of relying on numeric indices. For example, (?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2}) makes your code far more readable.
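In JavaScript, named groups appear on the groups property of the match result and can be referenced in replacements as $<name>. A short sketch using the same date pattern:

```javascript
const datePattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const match = "2024-03-15".match(datePattern);

// Named captures are exposed as properties of match.groups.
const { year, month, day } = match.groups;

// Replacement strings refer to named groups with $<name>.
const reordered = "2024-03-15".replace(datePattern, "$<month>/$<day>/$<year>");

console.log(year, month, day); // 2024 03 15
console.log(reordered);        // "03/15/2024"
```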
Lookahead and Lookbehind Assertions
Lookahead and lookbehind are zero-width assertions — they check whether a pattern exists ahead of or behind the current position without consuming any characters. This is incredibly powerful for matching patterns that depend on context.
Positive lookahead (?=pattern): Matches a position followed by the pattern. Example: \d+(?= dollars) matches "100" in "100 dollars" but not in "100 euros".
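Negative lookahead (?!pattern): Matches a position NOT followed by the pattern. Example: \b(?!\w*ing\b)\w+ matches words not ending in "ing". The lookahead must run at the start of the word; placed after a greedy \w+, it would be checked only at the end of the word, where nothing follows, and would always succeed.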
Positive lookbehind (?<=pattern): Matches a position preceded by the pattern. Example: (?<=\$)\d+ matches "50" in "$50" but not "50" standing alone.
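Negative lookbehind (?<!pattern): Matches a position NOT preceded by the pattern. Example: (?<!\$)\b\d+ matches "50" in "50 items" but not in "$50". Lookahead has long been part of JavaScript; lookbehind was added in ES2018, so all four lookaround assertions are available in modern engines.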
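The four assertions can be compared side by side in JavaScript. The sample strings are invented, and the negative-lookahead pattern is written with the assertion at the start of the word so that it is evaluated before any characters are consumed.

```javascript
// Positive lookahead: digits only when " dollars" follows.
const dollars = "100 dollars, 100 euros".match(/\d+(?= dollars)/g);

// Negative lookahead: whole words that do not end in "ing".
const notIng = "running jump sing walked".match(/\b(?!\w*ing\b)\w+/g);

// Positive lookbehind: digits directly preceded by a dollar sign.
const afterDollar = "$50 and 50".match(/(?<=\$)\d+/g);

// Negative lookbehind: whole numbers not preceded by a dollar sign.
const notAfterDollar = "$50 and 75".match(/(?<!\$)\b\d+/g);

console.log(dollars);        // ["100"]
console.log(notIng);         // ["jump", "walked"]
console.log(afterDollar);    // ["50"]
console.log(notAfterDollar); // ["75"]
```

In every case the asserted text is absent from the match itself, which is what zero-width means in practice.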
Common Regex Patterns You Can Copy and Use
| Purpose | Pattern | Notes |
|---|---|---|
| Email (basic) | [a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,} | Covers most common email formats |
| URL | https?://[^\s/$.?#].[^\s]* | HTTP and HTTPS URLs |
| IPv4 address | \b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b | Basic format; does not validate range 0-255 |
| Date (YYYY-MM-DD) | \d{4}-\d{2}-\d{2} | ISO 8601 date layout; does not validate month or day ranges |
| Phone (US) | \(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4} | Matches (555) 123-4567, 555.123.4567, etc. |
| HTML tag | <[^>]+> | Matches opening and closing tags |
| Hex color | #([0-9a-fA-F]{3}){1,2}\b | Matches #FFF and #FF00FF |
| Password strength | ^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,}$ | Min 8 chars, upper, lower, digit |
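Several of these patterns check only the shape of the text. One way to close the gap for IPv4, sketched below, is to let the regex capture the four octets and do the 0-255 range check in code. The function name isValidIPv4 is hypothetical, and the pattern is anchored with ^ and $ on the assumption that the whole input should be an address.

```javascript
// Anchored variant of the table's IPv4 pattern, with each octet captured.
const ipv4Shape = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/;

// Hypothetical helper: shape check by regex, range check by arithmetic.
function isValidIPv4(candidate) {
  const match = ipv4Shape.exec(candidate);
  if (!match) return false;
  // match[1] through match[4] hold the octets as strings.
  return match.slice(1).every((octet) => Number(octet) <= 255);
}

console.log(isValidIPv4("192.168.1.1")); // true
console.log(isValidIPv4("256.1.1.1"));   // false
console.log(isValidIPv4("1.2.3"));       // false
```

This sketch accepts octets with leading zeros such as "01"; whether that is acceptable depends on the application.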
Regex Flags Explained
Flags modify how the regex engine processes your pattern. In this tester, enter flags in the Flags field (e.g., "gi" for global and case-insensitive).
g (global): Finds all matches in the text instead of stopping at the first.
i (case-insensitive): Treats uppercase and lowercase letters as equivalent.
m (multiline): Makes ^ and $ match the start and end of each line, not just of the entire string.
s (dotAll): Makes the . metacharacter match newline characters, which it does not match by default.
u (unicode): Enables full Unicode matching, correctly handling surrogate pairs and Unicode property escapes like \p{Emoji}.
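The effect of each flag is easiest to see by running the same pattern with and without it. The test string and variable names below are invented for illustration.

```javascript
const text = "Hello\nhello world";

// Without g, match() returns only the first match (case-sensitive here).
const firstOnly = text.match(/hello/);

// g and i together: every occurrence, regardless of case.
const allIgnoreCase = text.match(/hello/gi);

// m lets ^ match at the start of each line, not only at the start of the string.
const withMultiline = text.match(/^hello/gim);
const withoutMultiline = text.match(/^hello/gi);

// s lets the dot match the newline between the two lines.
const dotDefault = /Hello.hello/.test(text);
const dotAll = /Hello.hello/s.test(text);

// u treats a surrogate pair (here U+1F600) as one character.
const face = "\u{1F600}";
const oneCharWithoutU = /^.$/.test(face);
const oneCharWithU = /^.$/u.test(face);

console.log(firstOnly.index, allIgnoreCase, withMultiline, withoutMultiline);
console.log(dotDefault, dotAll, oneCharWithoutU, oneCharWithU);
```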
Regex Performance Tips
Poorly written regex patterns can cause catastrophic backtracking: exponential time complexity that can freeze your application. To avoid this, avoid nested quantifiers like (a+)+, use possessive quantifiers or atomic groups when the engine supports them, prefer specific character classes over .*, and always test patterns against worst-case inputs (long strings with near-matches). In JavaScript, the d flag (hasIndices, ES2022) adds start and end indices for each capture group to the match result; it is opt-in because computing those indices has a cost, so enable it only when you need them.
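The sketch below contrasts a nested quantifier with an equivalent single-quantifier rewrite and times both on a short near-match. The helper name timeMatch is hypothetical, and the input is deliberately kept small: in a backtracking engine, each additional "a" roughly doubles the work for the nested pattern, so no particular timings are claimed here.

```javascript
// A nested quantifier and a rewrite that accepts exactly the same strings.
const risky = /^(a+)+$/;
const safer = /^a+$/;

// Both patterns agree on ordinary inputs.
const samples = ["a", "aaaa", "", "aab", "b"];
const agree = samples.every((s) => risky.test(s) === safer.test(s));

// Hypothetical helper: elapsed milliseconds for one test() call.
function timeMatch(regex, input) {
  const start = Date.now();
  regex.test(input);
  return Date.now() - start;
}

// Near-match: a run of "a" characters followed by one character that breaks the match.
const nearMatch = "a".repeat(18) + "b";
console.log("patterns agree:", agree);
console.log("risky:", timeMatch(risky, nearMatch), "ms");
console.log("safer:", timeMatch(safer, nearMatch), "ms");
```

Growing the repeat count a few characters at a time, rather than jumping to a long input, is a cautious way to observe the growth without locking up the page.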
Regex Engine Differences Across Languages
While core regex syntax is similar across languages, important differences exist. JavaScript does not support possessive quantifiers (++, *+) or atomic groups (?>...) (though they may be added in future ECMAScript versions). Python's re module writes named groups as (?P<name>...) rather than (?<name>...). PCRE (used by PHP, Nginx, and many other tools) supports recursive patterns and conditional subpatterns. Go's regexp package follows the RE2 design: it guarantees linear-time matching but does not support backreferences or lookaround. When porting regex patterns between languages, always test in the target language's engine.
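As one small porting illustration, the sketch below rewrites Python-style named groups into JavaScript syntax. The function pythonNamedGroupsToJs is hypothetical and handles only the group definition (?P<name>...) and the named backreference (?P=name). It does not parse the pattern, so an escaped literal such as \(?P<x> would be converted by mistake, and other syntax differences are left untouched.

```javascript
// Hypothetical helper: text-level rewrite of Python named-group syntax.
function pythonNamedGroupsToJs(source) {
  return source
    // (?P<name>...) becomes (?<name>...)
    .replace(/\(\?P<([A-Za-z_]\w*)>/g, "(?<$1>")
    // (?P=name) becomes \k<name>
    .replace(/\(\?P=([A-Za-z_]\w*)\)/g, "\\k<$1>");
}

const pythonSource = "(?P<year>\\d{4})-(?P=year)";
const jsSource = pythonNamedGroupsToJs(pythonSource);
const ported = new RegExp(jsSource);

console.log(jsSource);                 // (?<year>\d{4})-\k<year>
console.log(ported.test("2024-2024")); // true
console.log(ported.test("2024-2025")); // false
```

The converted pattern should still be run against real inputs in the target engine before it is trusted.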
Frequently Asked Questions
What are regular expressions and why are they useful?
Regular expressions (regex) are sequences of characters that define search patterns for matching, finding, and replacing text. They are built into virtually every programming language (JavaScript, Python, Java, C#, Go, Ruby, PHP) and text editor (VS Code, Sublime Text, vim). They are essential for tasks like input validation (email, phone numbers, URLs), log file parsing, find-and-replace operations, data extraction from unstructured text, and URL routing in web frameworks.
What do the regex flags g, i, m, s, and u mean?
g (global) finds all matches instead of stopping at the first. i (case-insensitive) ignores case differences so "hello" matches "Hello" and "HELLO". m (multiline) makes ^ and $ match the start and end of each line instead of only the entire string. s (dotAll) makes the dot metacharacter match newline characters, which it does not match by default. u (unicode) enables full Unicode matching including surrogate pairs and Unicode property escapes like \p{Letter}.
What is the difference between greedy and lazy quantifiers?
Greedy quantifiers (*, +, ?) match as much text as possible, then backtrack if needed for the rest of the pattern to match. Lazy quantifiers (*?, +?, ??) match as little text as possible, expanding only if needed. For example, given the text "<b>bold</b>", the greedy pattern "<.*>" matches the entire string from the first < to the last >, while the lazy pattern "<.*?>" matches just "<b>" and then "</b>" as two separate matches.
What are lookahead and lookbehind assertions?
Lookahead (?=...) and lookbehind (?<=...) are zero-width assertions that match a position without consuming characters. Positive lookahead (?=X) matches a position followed by X. Negative lookahead (?!X) matches a position NOT followed by X. Positive lookbehind (?<=X) matches a position preceded by X. Negative lookbehind (?<!X) matches a position NOT preceded by X. These are powerful for matching patterns that depend on surrounding context — for example, (?<=\$)\d+ matches digits that follow a dollar sign without including the dollar sign in the match.
What is catastrophic backtracking in regex?
Catastrophic backtracking occurs when a poorly written regex pattern causes the engine to try an exponentially growing number of match paths, freezing or crashing your application. Common triggers include nested quantifiers like (a+)+ or patterns with overlapping alternatives. To avoid it, use specific character classes instead of dot-star, avoid nesting quantifiers, and test patterns against worst-case inputs such as long strings with near-matches before deploying to production.
How do I validate an email address with regex?
A practical email regex is [a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,} which covers the vast majority of real-world email addresses. When validating a whole input field, anchor it with ^ and $ so that surrounding text is rejected. The full RFC 5322 specification is extremely complex and rarely implemented in practice. For production applications, use a simple regex for basic format checking, then verify the address by sending a confirmation email. No regex can confirm an email address actually exists or is reachable.
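A minimal sketch of the basic format check in JavaScript, assuming the whole input should be a single address (hence the ^ and $ anchors). The function name looksLikeEmail is made up for this example, and trimming whitespace first is a design choice rather than a requirement.

```javascript
// The pattern above, anchored so the entire input must be an address.
const basicEmail = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;

// Hypothetical helper: format check only; it cannot tell whether the mailbox exists.
function looksLikeEmail(value) {
  return basicEmail.test(value.trim());
}

console.log(looksLikeEmail("user@example.com"));       // true
console.log(looksLikeEmail("  user@example.com  "));   // true (whitespace trimmed)
console.log(looksLikeEmail("user@example"));           // false
console.log(looksLikeEmail("mail user@example.com!")); // false
```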