
What is a Regular Expression#
A regular expression is a pattern that describes rules for matching text.
Testing Regular Expressions#
Online testing tool: wegester
Metacharacters#
| Code | Description |
|---|---|
| . | Matches any character except a newline |
| \w | Matches a letter, digit, or underscore (and, in Unicode-aware engines, a Chinese character) |
| \s | Matches any whitespace character |
| \d | Matches a digit |
| \b | Matches the beginning or end of a word |
| ^ | Matches the beginning of a string |
| $ | Matches the end of a string |
Example:
| Target | Regular Expression |
|---|---|
| The word hi followed, not far behind, by the word Lucy | \bhi\b.*\bLucy\b |
| Starts with 0, followed by two digits, then a hyphen "-", and finally eight digits | 0\d\d-\d\d\d\d\d\d\d\d or 0\d{2}-\d{8} |
| Matches words starting with the letter a | \ba\w*\b |
| One or more consecutive digits | \d+ |
| Words of exactly six characters | \b\w{6}\b |
| The entire string is 5 to 12 digits | ^\d{5,12}$ |
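The patterns in the table above can be tried out in code. The following is a minimal sketch using Python's `re` module, whose syntax agrees with the table for these patterns; the sample strings and variable names are invented for illustration.

```python
import re

# Patterns copied from the table above; the sample strings are invented.
hi_then_lucy = re.compile(r"\bhi\b.*\bLucy\b")
phone = re.compile(r"0\d{2}-\d{8}")
words_starting_with_a = re.compile(r"\ba\w*\b")
five_to_twelve_digits = re.compile(r"^\d{5,12}$")

# \b keeps "hi" from matching inside longer words such as "this" or "history".
hi_match = hi_then_lucy.search("hi, this is Lucy speaking")
no_hi_match = hi_then_lucy.search("history of Lucy")

phone_match = phone.search("call 010-12345678 today")
a_words = words_starting_with_a.findall("an apple a day keeps a banana away")

# ^ and $ force the whole string to be digits, not just a part of it.
valid_id = five_to_twelve_digits.search("1234567")
too_short = five_to_twelve_digits.search("1234")

print(hi_match.group())      # hi, this is Lucy
print(no_hi_match)           # None
print(phone_match.group())   # 010-12345678
print(a_words)               # ['an', 'apple', 'a', 'a', 'away']
print(valid_id is not None, too_short is not None)  # True False
```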
Character Escaping#
Example:
| Target | Regular Expression |
|---|---|
| deerchao.cn | deerchao\.cn |
| C:\Windows | C:\\Windows |
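To see why escaping matters, the sketch below compares an escaped and an unescaped dot in Python's `re` module. The sample strings are invented, and `re.escape` is shown only as one convenient way to escape literal text, not as something the examples above require.

```python
import re

escaped_dot = re.compile(r"deerchao\.cn")    # \. matches a literal dot only
unescaped_dot = re.compile(r"deerchao.cn")   # . matches any character
windows_dir = re.compile(r"C:\\Windows")     # \\ matches one literal backslash

sample_path = r"C:\Windows\System32"         # invented sample string

dot_results = (
    escaped_dot.search("deerchao.cn") is not None,
    escaped_dot.search("deerchaoXcn") is not None,
    unescaped_dot.search("deerchaoXcn") is not None,
)
path_match = windows_dir.search(sample_path)

# re.escape builds an escaped pattern from literal text.
auto_escaped = re.compile(re.escape("deerchao.cn"))
auto_rejects_x = auto_escaped.search("deerchaoXcn") is None

print(dot_results)         # (True, False, True)
print(path_match.group())  # C:\Windows
print(auto_rejects_x)      # True
```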
Repetition#
| Code/Syntax | Description |
|---|---|
| * | Matches zero or more times |
| + | Matches one or more times |
| ? | Matches zero or one time |
| {n} | Matches exactly n times |
| {n,} | Matches n or more times |
| {n,m} | Matches between n and m times |
Example:
| Regular Expression | Target |
|---|---|
| Windows\d+ | Windows followed by one or more digits |
| ^\w+ | The first word of a line |
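The quantifiers can be checked with a short Python sketch. The first two patterns come from the table above; `colou?r` and `\d{2,3}` are extra patterns added here purely for illustration, as are the sample strings.

```python
import re

windows_version = re.compile(r"Windows\d+")
first_word = re.compile(r"^\w+")

# + needs at least one digit, so the bare "Windows" at the end is skipped.
versions = windows_version.findall("Windows98, Windows2000 and plain Windows")
first = first_word.search("hello regex world").group()

# Illustrative additions (not from the table above):
# ? makes the "u" optional; {2,3} takes two or three digits, preferring three.
optional_u = [bool(re.fullmatch(r"colou?r", w)) for w in ("color", "colour", "colouur")]
two_or_three_digits = re.findall(r"\d{2,3}", "1 22 333 4444")

print(versions)             # ['Windows98', 'Windows2000']
print(first)                # hello
print(optional_u)           # [True, True, False]
print(two_or_three_digits)  # ['22', '333', '444']
```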
Character Classes#
Example:
| Regular Expression | Target |
|---|---|
| [aeiou] | Any one English vowel letter |
| [0-9] | One digit |
| \(?0\d{2}[) -]?\d{8} | Several formats of phone numbers |
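A rough Python sketch of the character classes in use. The phone pattern is written with the opening parenthesis escaped as `\(`, since an unescaped `(` would start a group. The sample numbers are invented, and the last check shows a known weakness of this simple pattern: it also accepts unbalanced parentheses.

```python
import re

vowels = re.compile(r"[aeiou]")
phone = re.compile(r"\(?0\d{2}[) -]?\d{8}")

found_vowels = vowels.findall("regular expression")

# Invented sample numbers in several formats.
samples = ["(010)88886666", "022-22334455", "029 12345678", "02912345678"]
accepted = [s for s in samples if phone.fullmatch(s)]

# Caveat: the optional parts are independent, so a lone ")" is accepted too.
accepts_unbalanced = phone.fullmatch("010)12345678") is not None

print(found_vowels)        # ['e', 'u', 'a', 'e', 'e', 'i', 'o']
print(accepted == samples) # True
print(accepts_unbalanced)  # True
```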
Branch Conditions#
Branch conditions (alternatives) are separated by |. The alternatives are tested from left to right, and once one of them matches, the remaining ones are not tried, so the order of the alternatives can matter.
Example:
| Regular Expression | Target |
|---|---|
| 0\d{2}-\d{8}|0\d{3}-\d{7} | A three-digit area code followed by an eight-digit local number, or a four-digit area code followed by a seven-digit local number |
| \d{5}-\d{4}|\d{5} | A US ZIP code: either five digits, or nine digits with a hyphen after the fifth digit |
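The left-to-right rule is easiest to see by swapping the order of the alternatives. In this Python sketch, `zip_short_first` is a deliberately reordered variant added for comparison; it is not one of the patterns above, and the sample strings are invented.

```python
import re

zip_long_first = re.compile(r"\d{5}-\d{4}|\d{5}")
zip_short_first = re.compile(r"\d{5}|\d{5}-\d{4}")   # reordered for comparison
phone = re.compile(r"0\d{2}-\d{8}|0\d{3}-\d{7}")

zip_text = "12345-6789"

# The first alternative that succeeds wins, even if a later one would match more.
long_first_result = zip_long_first.search(zip_text).group()
short_first_result = zip_short_first.search(zip_text).group()

phones = phone.findall("010-12345678 and 0376-2233445")

print(long_first_result)   # 12345-6789
print(short_first_result)  # 12345
print(phones)              # ['010-12345678', '0376-2233445']
```

Putting the longer alternative first is what lets the nine-digit form match in full.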
Grouping#
You can use parentheses to specify subexpressions (also known as groups).
Example:
| Regular Expression | Target |
|---|---|
| (\d{1,3}\.){3}\d{1,3} | Simple IP address matching (also accepts out-of-range parts such as 256) |
| ((2[0-4]\d|25[0-5]|[01]?\d\d?)\.){3}(2[0-4]\d|25[0-5]|[01]?\d\d?) | Stricter IP address matching (each part must be 0 to 255) |
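The difference between the two IP patterns shows up on out-of-range input. This Python sketch writes the dot inside each group as `\.` so that it matches a literal dot; the sample addresses are invented.

```python
import re

simple_ip = re.compile(r"(\d{1,3}\.){3}\d{1,3}")
strict_ip = re.compile(
    r"((2[0-4]\d|25[0-5]|[01]?\d\d?)\.){3}(2[0-4]\d|25[0-5]|[01]?\d\d?)"
)

good = "192.168.1.1"
bad = "256.300.888.999"   # every part is out of range

simple_results = (simple_ip.fullmatch(good) is not None,
                  simple_ip.fullmatch(bad) is not None)
strict_results = (strict_ip.fullmatch(good) is not None,
                  strict_ip.fullmatch(bad) is not None)

# A repeated group keeps only the text from its last repetition.
last_repetition = simple_ip.fullmatch(good).group(1)

print(simple_results)   # (True, True)
print(strict_results)   # (True, False)
print(last_repetition)  # 1.
```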
Negation#
| Code | Description |
|---|---|
| \W | Matches any character that \w does not match, that is, anything other than a letter, digit, underscore, or Chinese character |
| \S | Matches any character that is not a whitespace character |
| \D | Matches any character that is not a digit |
| \B | Matches a position that is not the beginning or end of a word |
| [^x] | Matches any character except x |
| [^aeiou] | Matches any character except the vowels a, e, i, o, u |
Example:
| Regular Expression | Target |
|---|---|
| \S+ | A run of one or more non-whitespace characters |
| <a[^>]+> | A string enclosed in angle brackets whose content starts with "a" |
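A brief Python sketch of the negated forms. The sample strings are invented; note that `<a[^>]+>` only requires the tag text to start with "a", so it also picks up tags such as `<abbr ...>`, and it skips a bare `<a>` because `+` needs at least one character after the "a".

```python
import re

tokens = re.findall(r"\S+", "split  on\twhitespace")
non_vowels = re.findall(r"[^aeiou]", "audio")
digits_only = re.sub(r"\D", "", "010-1234 5678")   # drop every non-digit

html = '<a href="x.html">link</a> <abbr title="t">x</abbr> <a>'
a_tags = re.findall(r"<a[^>]+>", html)

print(tokens)       # ['split', 'on', 'whitespace']
print(non_vowels)   # ['d']
print(digits_only)  # 01012345678
print(a_tags)       # ['<a href="x.html">', '<abbr title="t">']
```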