Thursday, February 01, 2007

PHP: Regular Expressions


^ Start of line
$ End of line
n? Zero or one occurrence of character 'n'
n* Zero or more occurrences of character 'n'
n+ One or more occurrences of character 'n'
n{2} Exactly two occurrences of 'n'
n{2,} Two or more occurrences of 'n'
n{2,4} From 2 to 4 occurrences of 'n'
. Any single character
() Parentheses to group expressions
(.*) Zero or more occurrences of any character (anything)
(n|a) Either 'n' or 'a'
[1-6] Any single digit in the range between 1 and 6
[c-h] Any single lower case letter between c and h
[D-M] Any single upper case letter between D and M
[^a-z] Any single char EXCEPT lower case letter from a to z.
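
A quick way to convince yourself of the table above is to run a few patterns against short strings. This is only a sketch: it assumes the PCRE function preg_match() instead of ereg() (the ereg() family was deprecated in PHP 5.3 and removed in PHP 7.0), and the $cases table is made up for illustration.

```php
<?php
// Each entry: pattern, subject, expected preg_match() result
// (1 = match, 0 = no match). The cases are illustrative only.
$cases = [
    ['/^n?$/',     '',      1], // zero occurrences of 'n' is fine
    ['/^n?$/',     'nn',    0], // two is too many for ?
    ['/^n+$/',     '',      0], // + needs at least one
    ['/^n{2,4}$/', 'nnn',   1], // three is within 2..4
    ['/^n{2,4}$/', 'nnnnn', 0], // five is outside 2..4
    ['/^(n|a)$/',  'a',     1], // alternation
    ['/^[1-6]$/',  '7',     0], // 7 is outside the range
    ['/^[^a-z]$/', 'Q',     1], // anything except a lower case letter
    ['/^[^a-z]$/', 'q',     0], // a lower case letter is denied
];

foreach ($cases as [$pattern, $subject, $expected]) {
    $got = preg_match($pattern, $subject);
    printf("%-12s %-8s %s\n", $pattern, var_export($subject, true),
        $got === $expected ? 'ok' : 'UNEXPECTED');
}
```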

Pitfall: the ^ symbol only acts as an EXCEPT rule if it is
the very first character inside a range; it then denies the
entire range, including the ^ symbol itself if it appears
again later in the range. Anywhere else inside a range it is
treated as a regular ^ symbol. Outside a range it means
"start of line", which is mainly useful as the first
character of the expression. In other words, you cannot deny
a word with ^undesired_word or a group with ^(undesired_phrase).

Read more detailed regex documentation to find out what is
necessary to achieve this.
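
One way to deny a whole word is a negative lookahead. This is a sketch under an assumption: it uses the PCRE function preg_match(), because the POSIX ereg() syntax described above has no lookahead. The helper name lacks_word() is hypothetical.

```php
<?php
// Assumption: PCRE (preg_match) is available; POSIX ereg() cannot do this.
// (?!...) is a negative lookahead: it succeeds only if the text ahead
// does NOT match the enclosed expression.
function lacks_word(string $line, string $word): bool
{
    // preg_quote() escapes any regex metacharacters in $word.
    $pattern = '/^(?!.*' . preg_quote($word, '/') . ').*$/';
    return preg_match($pattern, $line) === 1;
}

var_dump(lacks_word('a perfectly fine line', 'undesired_word'));     // bool(true)
var_dump(lacks_word('has undesired_word inside', 'undesired_word')); // bool(false)
```

For a plain word with no pattern in it, strpos() would do the same job without any regex; the lookahead becomes worthwhile once it is only one part of a larger expression.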

[_4^a-zA-Z]
Any single character which can be the underscore or the
number 4 or the ^ symbol or any letter, lower or upper case

The ?, +, * and {} count parameters can be appended not only
to a single character, but also to a group () or a range [].
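
A small sketch of the difference, again assuming preg_match() rather than ereg(); the test strings are made up for illustration.

```php
<?php
// Quantifier on a group: (ab){2} means "ab" twice.
var_dump(preg_match('/^(ab){2}$/', 'abab'));        // int(1)
var_dump(preg_match('/^(ab){2}$/', 'abb'));         // int(0)
// Without the group, the count applies to 'b' only.
var_dump(preg_match('/^ab{2}$/', 'abb'));           // int(1)
// Quantifier on a range: one or more characters from the set.
// The ^ is not first in the range, so it is a literal ^ here.
var_dump(preg_match('/^[_4^a-zA-Z]+$/', 'a_4^Z'));  // int(1)
var_dump(preg_match('/^[_4^a-zA-Z]+$/', 'a_5'));    // int(0)
```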

Therefore,
^.{2}[a-z]{1,2}_?[0-9]*([1-6]|[a-f])[^1-9]{2}a+$
would mean:

^.{2}         = A line beginning with any two characters,
[a-z]{1,2}    = followed by either 1 or 2 lower case letters,
_?            = followed by an optional underscore,
[0-9]*        = followed by zero or more digits,
([1-6]|[a-f]) = followed by either a digit between 1 and 6
                OR a lower case letter between a and f,
[^1-9]{2}     = followed by any two characters except digits
                between 1 and 9 (0 is possible),
a+$           = followed by one or more occurrences of 'a'
                at the end of the line.
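
To check the breakdown, the same expression can be run against a string built piece by piece. A minimal sketch, assuming preg_match() (PCRE) in place of ereg(); the two test strings are my own.

```php
<?php
$pattern = '/^.{2}[a-z]{1,2}_?[0-9]*([1-6]|[a-f])[^1-9]{2}a+$/';

// "XY" + "ab" + "_" + "1234" + "c" + "00" + "aa"
var_dump(preg_match($pattern, 'XYab_1234c00aa')); // int(1)

// Same string, but "12" sits where two non-1..9 characters are required.
var_dump(preg_match($pattern, 'XYab_1234c12aa')); // int(0)
```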


I used what I knew of regular expressions to create these checks for form validation.

//mm/dd/yyyy date checking
//note: (19|20) is a group; [19|20] would be a range matching any one of 1, 9, |, 2 or 0
if (!ereg("^[0-1]?[0-9]/[0-9]{1,2}/(19|20)[0-9]{2}$", $date))
    die('invalid date format, use mm/dd/yyyy');
$mdy = explode("/", $date);
//checkdate() will catch things like Feb 29th in the wrong year etc.
if (!checkdate($mdy[0], $mdy[1], $mdy[2]))
    die('invalid date format, use mm/dd/yyyy');
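
Since the ereg() family was deprecated in PHP 5.3 and removed in PHP 7.0, here is a sketch of the same date check with preg_match(). The function name is_valid_mdy() is hypothetical, and returning a bool instead of calling die() is my own choice.

```php
<?php
// Hypothetical helper: true only for a real calendar date in mm/dd/yyyy form.
function is_valid_mdy(string $date): bool
{
    // '#' is the delimiter so the slashes need no escaping.
    // (19|20) is a group, so the year must start with 19 or 20.
    // The D modifier makes $ match only at the very end of the string.
    if (!preg_match('#^([01]?[0-9])/([0-9]{1,2})/((19|20)[0-9]{2})$#D', $date, $m)) {
        return false;
    }
    // checkdate(month, day, year) rejects things like Feb 29th in a non-leap year.
    return checkdate((int) $m[1], (int) $m[2], (int) $m[3]);
}

var_dump(is_valid_mdy('02/29/2004')); // bool(true)
var_dump(is_valid_mdy('02/29/2007')); // bool(false)
```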

//phone format checking
if (!ereg("^[0-9]{3}-[0-9]{3}-[0-9]{4}$", $phone))
    die('invalid phone format, use xxx-xxx-xxxx');

//ssn format checking
if (!ereg("^[0-9]{3}-[0-9]{2}-[0-9]{4}$", $ssn))
    die('invalid ssn format, use xxx-xx-xxxx');

//zip code format checking
if (!ereg("^[0-9]{5}(-[0-9]{4})?$", $zip))
    die('invalid zipcode format, use xxxxx-xxxx (last 4 digits are optional)');
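
The three format checks above follow the same shape, so on a PHP version without ereg() they could be sketched as one table of preg_match() patterns. The $formats array and the has_format() helper are hypothetical names for illustration.

```php
<?php
// Hypothetical pattern table; the D modifier makes $ match only at
// the very end of the string (no trailing newline allowed).
$formats = [
    'phone' => '/^[0-9]{3}-[0-9]{3}-[0-9]{4}$/D',
    'ssn'   => '/^[0-9]{3}-[0-9]{2}-[0-9]{4}$/D',
    'zip'   => '/^[0-9]{5}(-[0-9]{4})?$/D', // last 4 digits are optional
];

// Returns true if $value matches the pattern registered under $kind.
function has_format(array $formats, string $kind, string $value): bool
{
    return isset($formats[$kind]) && preg_match($formats[$kind], $value) === 1;
}

var_dump(has_format($formats, 'zip', '12345'));      // bool(true)
var_dump(has_format($formats, 'zip', '12345-6789')); // bool(true)
var_dump(has_format($formats, 'zip', '1234'));       // bool(false)
```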

//email address checking (I got this from somewhere on the net)
//First, check that there's one @ symbol, and that lengths are right
//(the ^ and $ anchors are needed, otherwise a second @ would slip through)
if (!ereg("^[^@]{1,64}@[^@]{1,255}$", $email))
    die('invalid email');

$email_array = explode("@", $email);
// Split it into sections to make life easier
$local_array = explode(".", $email_array[0]);
for ($i = 0; $i < sizeof($local_array); $i++)
    if (!ereg("^(([A-Za-z0-9!#$%&'*+/=?^_`{|}~-][A-Za-z0-9!#$%&'*+/=?^_`{|}~\.-]{0,63})|(\"[^(\\|\")]{0,62}\"))$", $local_array[$i]))
        die('invalid email');


// Check if domain is IP. If not, it should be valid domain name

if (!ereg("^\[?[0-9\.]+\]?$", $email_array[1]))
{
    $domain_array = explode(".", $email_array[1]);
    if (sizeof($domain_array) < 2) // Not enough parts to domain
        die('invalid email');

    for ($i = 0; $i < sizeof($domain_array); $i++)
        if (!ereg("^(([A-Za-z0-9][A-Za-z0-9-]{0,61}[A-Za-z0-9])|([A-Za-z0-9]+))$", $domain_array[$i]))
            die('invalid email');
}
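
On newer PHP versions a shorter route is the built-in filter extension. This is a sketch of an alternative, not a drop-in replacement: filter_var() applies its own rules, which may accept or reject some addresses differently from the hand-written checks above. The wrapper name is_valid_email() is hypothetical.

```php
<?php
// filter_var() returns the filtered value on success and false on failure.
function is_valid_email(string $email): bool
{
    return filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
}

var_dump(is_valid_email('user@example.com')); // bool(true)
var_dump(is_valid_email('no-at-sign'));       // bool(false)
var_dump(is_valid_email('a@b@c.com'));        // bool(false)
```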


source: http://us2.php.net/manual/en/ref.regex.php
