Regular expressions (regexes)

 

Overview

Some jobs, such as input validation or search and replace, are hard to do well without a regular expression, so here are some that may come in handy.

Downloads

Right-click the download button and pick Save Link As. Otherwise your browser may try to be smart and display the file instead of saving it; if that happens, use File > Save As.

Download Now
rfc5322-email-regex.txt - look for address (//2010, MB)


Dreamweaver regex

Dreamweaver's regex can only offer what JavaScript (ECMAScript, standardized as ECMA-262) offers. In fact, it currently does not support ^ and $ because of a bug. Bugs are prioritized by vote, so submit a bug report through the Adobe Wish Form and vote for your bug there. You have to use Internet Explorer to do so (most of Adobe's web site only works in Internet Explorer; sorry, Firefox, Safari, Opera, and Chrome users).

It does not use the PCRE engine; it uses JavaScript's. JavaScript is not listed in the Mastering Regular Expressions book (only ECMAScript under .NET is).

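Features of JavaScript regex (ECMA-262 3rd edition):
alternation and assertions: | ^ $ \b \B (?=expr) (?!expr)
quantifiers: ? * + {num} {num,} {num,num}, each of which can be followed by ? to make it lazy
groups and back-references: (expr) (?:expr) \decimalescape (for example \1)
character classes: . (any character except a line terminator) [charclass] [^charclass] \d \D \s \S \w \W ([\b] inside a class means backspace, not a word boundary)
escapes: \f \n \r \t \v \cControlCharacterLetter \xHexEscapeSequence \uUnicodeEscapeSequence, and a backslash before a special character such as \. or \( (\atomescape)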
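
To try a few of these features outside Dreamweaver, here is a minimal sketch in plain JavaScript (assumed to run in Node.js or a browser console; the variable name doubled is only for illustration). It exercises a back-reference, both lookaheads, a lazy quantifier, a control escape, and [\b].

```javascript
// \1 (a \decimalescape) refers back to capture group 1: this finds doubled words.
const doubled = /\b(\w+)\s+\1\b/;
console.log(doubled.test("it is the the end")); // true

// (?=expr): digits followed by "px"; the lookahead checks "px" without consuming it.
console.log("width: 120px".match(/\d+(?=px)/)[0]); // 120

// (?!expr): a "q" that is not followed by "u".
console.log(/q(?!u)/.test("Iraq")); // true
console.log(/q(?!u)/.test("quit")); // false

// A lazy quantifier (*?) stops at the first closing tag instead of the last one.
console.log("<b>x</b><b>y</b>".match(/<b>.*?<\/b>/)[0]); // <b>x</b>

// \cJ is control-J, the line feed character.
console.log(/a\cJb/.test("a\nb")); // true

// Inside a character class, \b means backspace (U+0008), not a word boundary.
console.log(/[\b]/.test("\b")); // true
```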

RFC 822 email address

([^\x7f-\xff\(\)<>@,;:\\\"\.\[\]\x00-\x20]+|\"([^\x80-\xff"\\\x0d]|\\[\x00-\x7f])*\")(\.([^\x7f-\xff\(\)<>@,;:\\\"\.\[\]\x00-\x20]+|\"([^\x80-\xff"\\\x0d]|\\[\x00-\x7f])*\"))*\@([^\x7f-\xff\(\)<>@,;:\\\"\.\[\]\x00-\x20]+|\[([^\x80-\xff\[\]\\\x0d]|\\[\x00-\x7f])*\])(\.([^\x7f-\xff\(\)<>@,;:\\\"\.\[\]\x00-\x20]+|\[([^\x80-\xff\[\]\\\x0d]|\\[\x00-\x7f])*\]))*
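
An RFC 822 address pattern is easier to read when it is assembled from the grammar pieces it is made of: an atom, a quoted string (with backslash quoted-pairs), and a bracketed domain literal. Below is a minimal JavaScript sketch under that reading; the names atom, quotedString, domainLiteral and rfc822Address are illustrative, not from any library. It wraps the pattern in ^ and $ for whole-string validation, which you would leave off when searching inside a page.

```javascript
// String.raw keeps every backslash exactly as written.
const atom = String.raw`[^\x7f-\xff\(\)<>@,;:\\\"\.\[\]\x00-\x20]+`;
const quotedString = String.raw`\"([^\x80-\xff"\\\x0d]|\\[\x00-\x7f])*\"`;
const domainLiteral = String.raw`\[([^\x80-\xff\[\]\\\x0d]|\\[\x00-\x7f])*\]`;

// local-part = word *("." word); domain = sub-domain *("." sub-domain)
const word = `(${atom}|${quotedString})`;
const subDomain = `(${atom}|${domainLiteral})`;
const address = `${word}(\\.${word})*@${subDomain}(\\.${subDomain})*`;

// Anchored so the whole string has to be one address.
const rfc822Address = new RegExp(`^${address}$`);

console.log(rfc822Address.test("john.doe@example.com"));   // true
console.log(rfc822Address.test('"john doe"@example.com')); // true: quoted string
console.log(rfc822Address.test("john@[192.168.0.1]"));     // true: domain literal
console.log(rfc822Address.test("john doe@example.com"));   // false: unquoted space
```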

RFC 822 domain

([^\x7f-\xff\(\)<>@,;:\\\"\.\[\]\x00-\x20]+|\[([^\x80-\xff\[\]\\\x0d]|\\[\x00-\x7f])*\])(\.([^\x7f-\xff\(\)<>@,;:\\\"\.\[\]\x00-\x20]+|\[([^\x80-\xff\[\]\\\x0d]|\\[\x00-\x7f])*\]))*
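
The domain pattern can be tried on its own in JavaScript. A minimal sketch, assuming the same atom and domain-literal pieces as RFC 822 defines them (rfc822Domain and the other names are illustrative), anchored with ^ and $ for validation:

```javascript
// A sub-domain is an atom or a [bracketed] domain literal; sub-domains are joined by dots.
const atom = String.raw`[^\x7f-\xff\(\)<>@,;:\\\"\.\[\]\x00-\x20]+`;
const domainLiteral = String.raw`\[([^\x80-\xff\[\]\\\x0d]|\\[\x00-\x7f])*\]`;
const subDomain = `(${atom}|${domainLiteral})`;
const rfc822Domain = new RegExp(`^${subDomain}(\\.${subDomain})*$`);

for (const d of ["example.com", "[10.0.0.1]", "example..com", "exa mple.com"]) {
  console.log(d, rfc822Domain.test(d));
}
// example.com true
// [10.0.0.1] true
// example..com false
// exa mple.com false
```

This checks RFC 822 syntax only, not DNS host name rules, so something like -foo.com still passes.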

HTML singleton or open tag

The HTML tag matcher in the O'Reilly book will not do, because tags have attributes (with or without values):

<[a-zA-Z][a-zA-Z0-9]*(\s+[a-zA-Z_-][a-zA-Z0-9_-]*(\s*=\s*('[^']*'|"[^"]*"|[^<>\x00-\x20\x7f-\xff]+))?)*\s*>
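
A quick way to check an HTML open-tag pattern is to run it against a few sample tags in JavaScript. This sketch (htmlOpenTag is an illustrative name) uses a balanced form in which each attribute group ends with )? for an optional value and )* for repetition, and the tag name may contain digits after the first letter.

```javascript
// Tag name, then any number of attributes; each attribute may have a quoted or unquoted value.
const htmlOpenTag =
  /<[a-zA-Z][a-zA-Z0-9]*(\s+[a-zA-Z_-][a-zA-Z0-9_-]*(\s*=\s*('[^']*'|"[^"]*"|[^<>\x00-\x20\x7f-\xff]+))?)*\s*>/;

console.log(htmlOpenTag.test('<a href="x.html" target=_blank>')); // true
console.log(htmlOpenTag.test("<option selected>"));                // true: attribute without a value
console.log(htmlOpenTag.test("<h1 class='title'>"));              // true: digit in the tag name
console.log(htmlOpenTag.test("<br>"));                             // true: singleton
console.log(htmlOpenTag.test("</p>"));                             // false: close tag
```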

HTML close tag

<\/[a-zA-Z][a-zA-Z0-9]*\s*>
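
To list which elements are closed in a snippet, a close-tag pattern can be used with the g flag and a capture group around the name. The capture group and the names below are additions for this illustration only; the sketch is plain JavaScript (String.prototype.matchAll needs Node.js 12+ or a current browser).

```javascript
// Close tag with the tag name captured in group 1; \s* allows "</li >".
const htmlCloseTag = /<\/([a-zA-Z][a-zA-Z0-9]*)\s*>/g;

const snippet = "<h1>Title</h1><ul><li>one</li ></ul>";
const names = [...snippet.matchAll(htmlCloseTag)].map((m) => m[1]);
console.log(names.join(", ")); // h1, li, ul
```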

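XML open tag (really could be beefed up here)

XML attribute values must always be quoted, and / is kept out of names so that close tags and singletons do not match:

<[^<>\/\x00-\x20\x7f-\xff]+(\s+[^<>\/=\x00-\x20\x7f-\xff]+(\s*=\s*('[^']*'|"[^"]*")))*\s*>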

XML close tag (really could be beefed up here)

</[^<>\x00-\x20\x7f-\xff]+\s*>

XML singleton (void element) tag (really could be beefed up here)

<[^<>\/\x00-\x20\x7f-\xff]+(\s+[^<>\/=\x00-\x20\x7f-\xff]+(\s*=\s*('[^']*'|"[^"]*")))*\s*\/>
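
Three XML tag patterns along these lines can be told apart with a small JavaScript sketch. The patterns are anchored here with ^ and $ so each test covers a whole tag (an assumption of this example; for searching you would drop the anchors), and classify is a hypothetical helper name.

```javascript
// Open tag, close tag and singleton; attribute values must be quoted.
const xmlOpen = /^<[^<>\/\x00-\x20\x7f-\xff]+(\s+[^<>\/=\x00-\x20\x7f-\xff]+(\s*=\s*('[^']*'|"[^"]*")))*\s*>$/;
const xmlClose = /^<\/[^<>\x00-\x20\x7f-\xff]+\s*>$/;
const xmlSingleton = /^<[^<>\/\x00-\x20\x7f-\xff]+(\s+[^<>\/=\x00-\x20\x7f-\xff]+(\s*=\s*('[^']*'|"[^"]*")))*\s*\/>$/;

// Report which of the three patterns a tag matches.
function classify(tag) {
  if (xmlOpen.test(tag)) return "open";
  if (xmlClose.test(tag)) return "close";
  if (xmlSingleton.test(tag)) return "singleton";
  return "no match";
}

for (const tag of ['<item id="1">', "</item>", '<br class="x"/>', "<item id=1>"]) {
  console.log(tag, classify(tag));
}
// <item id="1"> open
// </item> close
// <br class="x"/> singleton
// <item id=1> no match
```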

Issues with full XML/HTML open and close tags

I have had problems matching, searching for, and replacing sets of open+close tags with regexes, mainly for one big reason: tags are nested, and a regex without recursion cannot match nesting of arbitrary depth. Dreamweaver uses JavaScript regex, not PCRE. JavaScript regex is fairly simple and has only a few parenthesized operators ((expr), (?:expr), (?=expr) and (?!expr)), and PCRE's recursion operator (?R) is not one of them. If you do happen to have PCRE, you could use the patterns below:
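
A small JavaScript sketch makes both halves of that concrete (compiles is a hypothetical helper written for this example): the engine rejects (?R) outright, and a non-recursive pattern pairs an open tag with the wrong close tag.

```javascript
// Returns true if the JavaScript engine accepts the pattern, false on a syntax error.
function compiles(source) {
  try {
    new RegExp(source);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

console.log(compiles("<b>(?:[^<]|(?R))*</b>")); // false: (?R) is not JavaScript syntax
console.log(compiles("<b>(?:[^<])*</b>"));       // true

// Without recursion, a lazy match pairs the outer <div> with the inner </div>.
console.log("<div><div>x</div></div>".match(/<div>.*?<\/div>/)[0]); // <div><div>x</div>
```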

HTML open+close tag match, recursively matching open+close+singleton (PCRE only)

Keep in mind that singleton elements can appear within the recursion mix, and that (?R) recurses into the entire expression. The open+close alternative comes first so that a lone open tag is only used as a fallback, and \2 (the tag name captured at the current recursion level) makes each close tag match its own open tag.

(<([a-zA-Z][a-zA-Z0-9]*)(\s+[a-zA-Z_-][a-zA-Z0-9_-]*(\s*=\s*('[^']*'|"[^"]*"|[^<>\x00-\x20\x7f-\xff]+))?)*\s*>([^<]|(?R))*<\/\2\s*>|<[a-zA-Z][a-zA-Z0-9]*(\s+[a-zA-Z_-][a-zA-Z0-9_-]*(\s*=\s*('[^']*'|"[^"]*"|[^<>\x00-\x20\x7f-\xff]+))?)*\s*>)

XML open+close tag match, recursively matching open+close+singleton (PCRE only)

Keep in mind that singleton elements can appear within the recursion mix, and that (?R) recurses into the entire expression. \2 (the element name captured at the current recursion level) makes each close tag match its own open tag.

(<([^<>\/\x00-\x20\x7f-\xff]+)(\s+[^<>\/=\x00-\x20\x7f-\xff]+(\s*=\s*('[^']*'|"[^"]*")))*\s*>([^<]|(?R))*<\/\2\s*>|<[^<>\/\x00-\x20\x7f-\xff]+(\s+[^<>\/=\x00-\x20\x7f-\xff]+(\s*=\s*('[^']*'|"[^"]*")))*\s*\/>)

Fix option tags (Dreamweaver)

This is one of the most commonly broken tags on the web. This find/replace adds the missing closing tag. The placement is not perfectly lined up, mind you, but it does fix the problem, and you can always run Apply Source Formatting in Dreamweaver afterward. According to the WHATWG HTML specification, option is NOT a void element (not a singleton). The lookahead stops at the next <option> (with or without attributes), </optgroup> or </select>; the closing tag itself comes from the replacement.

Find: <([Oo][Pp][Tt][Ii][Oo][Nn][^>]*)>([^<]*)(?=</[sS][Ee][lL][Ee][Cc][Tt]>|</[Oo][Pp][Tt][gG][rR][oO][uU][pP]>|<[Oo][Pp][Tt][iI][oO][nN][\s>])
Replace with: <\1>\2</option>
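
The same kind of find/replace can be tried in JavaScript with String.prototype.replace. Two assumptions for this sketch: the g flag is added so every option is fixed in one pass, and the back-references in the replacement are written $1 and $2, which is JavaScript's replacement syntax. The variable names are illustrative.

```javascript
// Find an <option ...> whose text runs straight into the next option, </optgroup> or </select>.
// The lookahead only checks what follows; the missing </option> comes from the replacement.
const missingOptionClose =
  /<([Oo][Pp][Tt][Ii][Oo][Nn][^>]*)>([^<]*)(?=<\/[sS][Ee][lL][Ee][Cc][Tt]>|<\/[Oo][Pp][Tt][gG][rR][oO][uU][pP]>|<[Oo][Pp][Tt][iI][oO][nN][\s>])/g;

const html = '<select><option value="1">One<option value="2">Two</option><option>Three</select>';
console.log(html.replace(missingOptionClose, "<$1>$2</option>"));
// <select><option value="1">One</option><option value="2">Two</option><option>Three</option></select>
```

Options that already end in </option> are left alone, because their text is followed by </option>, which the lookahead does not accept.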

Download Now
fix-option-tags.dwr (Dreamweaver saved find/replace query) (5/31/2011)


Fix li tags (Dreamweaver)

This is one of the most commonly broken tags on the web. This find/replace adds the missing closing tag. The placement is not perfectly lined up, mind you, but it does fix the problem, and you can always run Apply Source Formatting in Dreamweaver afterward. According to the WHATWG HTML specification, li is NOT a void element (not a singleton). The (?:\s[^>]*)? part allows attributes while keeping <link> tags from matching, and the lookahead stops at the next <li>, </ul> or </ol>.

Find: <([lL][iI](?:\s[^>]*)?)>([^<]*)(?=</[oO][lL]>|</[uU][lL]>|<[lL][iI][\s>])
Replace with: <\1>\2</li>
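
In JavaScript the li fix looks like this. As with any such sketch, the g flag and the $1/$2 replacement syntax are JavaScript's, and the names are illustrative.

```javascript
// (?:\s[^>]*)? allows attributes but keeps <link ...> from matching as "li" + "nk ...".
const missingLiClose =
  /<([lL][iI](?:\s[^>]*)?)>([^<]*)(?=<\/[oO][lL]>|<\/[uU][lL]>|<[lL][iI][\s>])/g;

const list = '<ul><li>one<li class="x">two</li><li>three</ul><link rel="x">';
console.log(list.replace(missingLiClose, "<$1>$2</li>"));
// <ul><li>one</li><li class="x">two</li><li>three</li></ul><link rel="x">
```

The item that already has </li> is untouched, so running the fix twice does not double up closing tags.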

Download Now
fix-li-tags.dwr (Dreamweaver saved find/replace query) (5/31/2011)


Floating point

([+-]?0[xX]([0-9A-Fa-f]+(\.[0-9A-Fa-f]*)?|\.[0-9A-Fa-f]+)[pP][+-]?[0-9]+|[+-]?([0-9]+\.[0-9]+|\.[0-9]+|[0-9]+)([Ee][+-]?[0-9]+)?)
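
Alternation order matters here: JavaScript takes the first alternative that succeeds, so a pattern that lists [0-9]+ before [0-9]+\.[0-9]+ finds only "3" in "3.14". This sketch (floatLiteral and isFloat are illustrative names) puts the C99-style hex float first and the longest decimal form first, then builds an anchored copy for validation.

```javascript
// Hex floats first, and the longest decimal form first, so a search returns the whole number.
const floatLiteral =
  /([+-]?0[xX]([0-9A-Fa-f]+(\.[0-9A-Fa-f]*)?|\.[0-9A-Fa-f]+)[pP][+-]?[0-9]+|[+-]?([0-9]+\.[0-9]+|\.[0-9]+|[0-9]+)([Ee][+-]?[0-9]+)?)/;

// Anchored copy for whole-string validation.
const isFloat = (s) => new RegExp(`^${floatLiteral.source}$`).test(s);

console.log("pi is 3.14".match(floatLiteral)[0]); // 3.14
for (const s of ["3.14", "-2.5e-3", ".5", "42", "0x1.8p3", "+-1", "1.2.3"]) {
  console.log(s, isFloat(s));
}
// prints true for the first five and false for +-1 and 1.2.3
```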

Decimal integer

([+-]?[0-9]+)

C/C++ integer literal

([+-]?(0[xX][0-9A-Fa-f]+|0[0-7]+|[0-9]+)([uU](ll|LL|[lL])?|(ll|LL|[lL])[uU]?)?)
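
Both integer patterns can be checked in JavaScript once they are anchored with ^ and $ (an assumption of this sketch; the names are illustrative). The suffix part follows the C99 integer suffixes, where ll and LL must be the same case.

```javascript
const decimalInteger = /^([+-]?[0-9]+)$/;
// Hex, octal or decimal digits, then an optional suffix such as U, L, UL, LL or ULL.
const cIntegerLiteral = /^([+-]?(0[xX][0-9A-Fa-f]+|0[0-7]+|[0-9]+)([uU](ll|LL|[lL])?|(ll|LL|[lL])[uU]?)?)$/;

console.log(decimalInteger.test("-42"));  // true
console.log(decimalInteger.test("--42")); // false: only one sign allowed
for (const s of ["0x1F", "017", "42", "42ULL", "42LU", "0x", "42lL"]) {
  console.log(s, cIntegerLiteral.test(s));
}
// true for 0x1F, 017, 42, 42ULL and 42LU; false for 0x (no digits) and 42lL (mixed-case ll)
```

Strictly speaking, in C the leading sign is a unary operator rather than part of the literal, so the [+-]? is a convenience for matching source text.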

US/Canada/Caribbean (North American Numbering Plan) phone number

With or without parentheses around the area code, digits only, or with dots in place of dashes:

^(((1[.-])?\([0-9]{3}\)[.-]|(1[.-])?[0-9]{3}[.-])?[0-9]{3}[.-][0-9]{4}|1[0-9]{10}|[0-9]{10}|[0-9]{7})$

With parentheses around the area code

^(((1[.-])?\([0-9]{3}\)[.-])?[0-9]{3}[.-][0-9]{4})$

Without parentheses around the area code

^(((1[.-])?[0-9]{3}[.-])?[0-9]{3}[.-][0-9]{4})$

Just digits

^(1[0-9]{10}|[0-9]{10}|[0-9]{7})$
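
A quick way to see which phone pattern accepts which format is to run all four against the same samples. This is a minimal JavaScript sketch; the variable names are illustrative, and an 11-digit number is assumed to start with the country code 1.

```javascript
// Anchored NANP-style phone patterns checked against a handful of samples.
const anyFormat = /^(((1[.-])?\([0-9]{3}\)[.-]|(1[.-])?[0-9]{3}[.-])?[0-9]{3}[.-][0-9]{4}|1[0-9]{10}|[0-9]{10}|[0-9]{7})$/;
const withParens = /^(((1[.-])?\([0-9]{3}\)[.-])?[0-9]{3}[.-][0-9]{4})$/;
const withoutParens = /^(((1[.-])?[0-9]{3}[.-])?[0-9]{3}[.-][0-9]{4})$/;
const digitsOnly = /^(1[0-9]{10}|[0-9]{10}|[0-9]{7})$/;

const samples = ["1-(555)-123-4567", "(555).123.4567", "555-123-4567", "123-4567", "15551234567", "25551234567", "-4567"];
for (const s of samples) {
  // Columns: any format, with parentheses, without parentheses, digits only.
  console.log(s, anyFormat.test(s), withParens.test(s), withoutParens.test(s), digitsOnly.test(s));
}
```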

Books
Mastering Regular Expressions, Jeffrey E. F. Friedl, O'Reilly
Regular Expressions Cookbook, Jan Goyvaerts and Steven Levithan, O'Reilly