Regular Expressions

Regular expressions are a common way of expressing a "pattern" for matching or searching text. You may already be familiar with the * and ? 'wildcards'; regular expressions are much more powerful than those. There are many regular expression tutorials on the Internet; http://www.regular-expressions.info/ is one we know of.

A regular expression is also known as a regex or regexp.

There are several 'flavours' of regular expressions. VPOP3 uses Perl Compatible Regular Expressions (PCRE) most of the time. (Lua scripting uses Lua's native pattern matching.)

 

The simplest regular expression is just some text. A regular expression matches anywhere within the text being searched, so the regular expression cat will match the words cat, catch, caterpillar, abdicate, etc. Regular expressions are usually case sensitive, unless indicated otherwise.

Often regular expressions are entered with / characters around them, and optional 'flags' at the end, such as 'i' to indicate case insensitivity. So, /cat/i would match cat, Cat or cAt.
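As a rough illustration of plain-text matching and the 'i' flag, here is a minimal sketch using Python's standard re module. Python is used purely for demonstration: it is not PCRE and is not what VPOP3 uses, but these simple patterns behave the same way in both. The word list is made up for the example, and re.IGNORECASE stands in for the trailing 'i' in /cat/i.

```python
import re

# Hypothetical sample words for the demonstration
words = ["cat", "catch", "caterpillar", "abdicate", "Cat", "dog"]

# A plain-text pattern matches anywhere inside the text and is case sensitive.
case_sensitive = [w for w in words if re.search(r"cat", w)]

# re.IGNORECASE plays the role of the trailing 'i' flag in /cat/i.
case_insensitive = [w for w in words if re.search(r"cat", w, re.IGNORECASE)]

print(case_sensitive)    # ['cat', 'catch', 'caterpillar', 'abdicate']
print(case_insensitive)  # ['cat', 'catch', 'caterpillar', 'abdicate', 'Cat']
```

Note that "Cat" only appears in the second list, because the first search is case sensitive.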

Many non-alphanumeric characters have a special meaning. Some of the most common are described below:

. (period/full stop/dot) matches any single character except a line break, so c.t will match cat or cut, but not cant.

[...] matches any single character listed inside the square brackets. Ranges can be specified using -, so valid character sets include [abc123] and [a-z0-7]. For example, c[aeiou]t will match cat, cet, cit, cot or cut, but not cbt.

\s matches a whitespace character (space, tab or line break).

\ before a special character (including \ itself) makes that character literal rather than special, e.g. \* means a literal *, not 'zero or more of the preceding character'.

* (asterisk/star) means zero or more of the preceding character, so ca*t will match ct, cat or caaaaaaat, but not cabt. The preceding item can also be a special character or a sequence such as a character set, e.g. c[a-c]*t.

? means zero or one of the preceding character or sequence, so ca?t will match ct or cat, but not caat.

+ means one or more of the preceding character or sequence, so ca+t will match cat or caaaaaaaat, but not ct.

{n,m} means from n to m of the preceding character or sequence, e.g. c[aeiou]{3,5}t will match caaat and caiout, but not cat, caat or caeiouat.

^ is an "anchor" which matches at the start of a string.

$ is an "anchor" which matches at the end of a string. So, /cat/ will match cat, catch and abdicate, but /^cat/ will only match cat and catch, and /^cat$/ will only match cat.
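The special characters described above can be tried out with a short script. The sketch below again uses Python's re module as a stand-in (an assumption for illustration; VPOP3 itself uses PCRE, which treats these particular patterns the same way). The matches helper and the examples table are hypothetical names introduced for this demonstration; the patterns and test words come from the descriptions above.

```python
import re

def matches(pattern, text):
    """Return True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None

# (pattern, text, expected result)
examples = [
    (r"c.t", "cat", True),                   # . matches any one character
    (r"c.t", "cant", False),                 # ... but only one
    (r"c[aeiou]t", "cot", True),             # o is in the set
    (r"c[aeiou]t", "cbt", False),            # b is not in the set
    (r"a\sb", "a\tb", True),                 # \s matches a tab
    (r"\*", "2*3", True),                    # \* is a literal asterisk
    (r"ca*t", "ct", True),                   # zero or more a
    (r"ca*t", "cabt", False),
    (r"ca?t", "caat", False),                # zero or one a
    (r"ca+t", "ct", False),                  # one or more a
    (r"c[aeiou]{3,5}t", "caiout", True),     # four vowels
    (r"c[aeiou]{3,5}t", "caat", False),      # only two vowels
    (r"^cat", "catch", True),                # anchored at the start
    (r"^cat", "abdicate", False),
    (r"^cat$", "cat", True),                 # anchored at both ends
    (r"^cat$", "catch", False),
]

for pattern, text, expected in examples:
    result = matches(pattern, text)
    status = "ok" if result == expected else "UNEXPECTED"
    print(f"{pattern!r:22} {text!r:12} -> {result} ({status})")
```

Each line of output shows the pattern, the text tested, and whether a match was found, which makes it easy to add further patterns and see how they behave.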

 

There are many other special characters and sequences. For more details and examples, we recommend looking at a tutorial on the Internet.

 

 
