
Validating email addresses


Because of various problems with email addresses, we set out to explore RFC821 and RFC2821/2822. RFC2821 and RFC2822 define a valid email address in a more complex way, and I even think their definition conflicts with RFC821:

RFC821

The definition of an email address is roughly as follows:
The address consists of a local part and a domain part:
<mailbox> ::= <local-part>"@" <domain>
The local part can be a dot string: a string of restricted characters that may contain dots, but not in the first or last position. The restricted characters '<c>' are: a-zA-Z0-9!#$%&'*+-/=?^_`{|}~. In addition, a local part can contain any character when it is escaped by '\'.
<dot-string> ::= <string> | <string> "." <dot-string>
<string> ::= <char> | <char> <string>
<char> ::= <c> | "\" <x>
<x> ::= any one of the 128 ASCII characters (no exceptions)
The local part can also be completely quoted. The string between the double quotes can contain any character except the line-end characters (CR and LF), the double quote '"' and the escape character '\'. These can still be included, but only when escaped.
<local-part> ::= <dot-string> | <quoted-string>
<quoted-string> ::= """ <qtext> """
<qtext> ::= "\" <x> | "\" <x> <qtext> | <q> | <q> <qtext>
<x> ::= any one of the 128 ASCII characters (no exceptions)
<q> ::= any one of the 128 ASCII characters except <CR>, <LF>, quote ("), or backslash (\)
The domain name is much more restricted. The dot separated domain labels are limited to letters, digits, and hyphens drawn from the ASCII character set.
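
The grammar above can be approximated with regular expressions. The following is only a minimal sketch in Python, not phpList's own code: the name is_rfc821_mailbox is hypothetical, and the domain rule is the simplified one stated above (dot-separated labels of letters, digits and hyphens), without length limits or address literals.

```python
import re

# <c>: the restricted characters listed above.
C = r"[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~]"
# <char> ::= <c> | "\" <x>, where <x> is any of the 128 ASCII characters.
CHAR = rf"(?:{C}|\\[\x00-\x7f])"
# <dot-string>: one or more <string>s joined by single dots,
# so a dot can never be first, last, or doubled.
DOT_STRING = rf"{CHAR}+(?:\.{CHAR}+)*"
# <q>: any ASCII character except CR (0x0d), LF (0x0a), quote (0x22), backslash (0x5c).
Q = r"[\x00-\x09\x0b\x0c\x0e-\x21\x23-\x5b\x5d-\x7f]"
# <qtext>: a run of escaped pairs and <q> characters.
QTEXT = rf"(?:\\[\x00-\x7f]|{Q})+"
# Simplified domain (assumption): labels of letters, digits and hyphens.
DOMAIN = r"[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*"

MAILBOX = re.compile(rf'(?:{DOT_STRING}|"{QTEXT}")@{DOMAIN}')


def is_rfc821_mailbox(address: str) -> bool:
    """Return True if the whole address matches the sketched RFC821 grammar."""
    return MAILBOX.fullmatch(address) is not None
```

Under this sketch 'name\ last@company.com' and '"name last"@company.com' both match, while 'name last@company.com' and '.name@company.com' do not.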

RFC2821

This spec is harder to follow. It 'consolidates, updates and clarifies, but doesn't add new or change existing functionality of (...) RFC821'. It may clarify other syntax or semantics of the SMTP protocol, but in my humble opinion it does not clarify the address syntax. Moreover, it defines the local part differently:
The local part is again either a dot string or fully quoted. The dot string consists of atext, which is defined in RFC2822. This restricts atext, and therefore the unquoted local part, to the restricted set of characters only, dropping the option of escaping other characters (such as control characters).
Local-part = Dot-string / Quoted-string
Dot-string = Atom *("." Atom)
Atom = 1*atext
atext = a-zA-Z0-9!#$%&'*+-/=?^_`{|}~
I believe atext should have been similar to ccontent, also defined in RFC2822:
ccontent = ctext / quoted-pair / comment
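
To make the difference from RFC821 concrete, here is a minimal Python sketch of the RFC2821 Dot-string rule quoted above. The function name is_rfc2821_dot_string is hypothetical, and the sketch covers only the unquoted local part, not Quoted-string.

```python
import re

# atext as quoted above: only the restricted characters, no "\" escapes.
ATEXT = r"[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~]"
# Dot-string = Atom *("." Atom), with Atom = 1*atext
DOT_STRING = re.compile(rf"{ATEXT}+(?:\.{ATEXT}+)*")


def is_rfc2821_dot_string(local_part: str) -> bool:
    """Return True if the unquoted local part is a valid Dot-string."""
    return DOT_STRING.fullmatch(local_part) is not None
```

With this rule 'name.last' is accepted, but the escaped form 'name\ last', which the RFC821 dot string allows, is rejected because the backslash and the space are not atext.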

What do we want to use in phpList


Using the full RFC specification presents us with some problems. According to the specification these are valid addresses:
"name last"@company.com
name\ last@company.com
name\<CR>last@company.com
"name\@tag"@example.com
!#$%&'*+-/=.?^_`{|}~@example.com
name(comment)@company.com
and even: name<SP><CRLF>
<SP>last@domain.net (this is called 'folding': a CRLF surrounded by whitespace breaks the line syntactically without breaking it semantically)
  • Even if we accept these email addresses, they will most likely be rejected by most other services. It is therefore unlikely that users have an email address like that.
  • Even if we accept these email addresses, mail sent to them will most likely be blocked by spam-filtering software. It is therefore unlikely that the email will be delivered.
  • We might open up security holes: accepting quotes, for instance, could allow code injection.

So even the full RFC821 email validation in phpList deviates from the specification in the following ways:
  • The length of the domain part is not checked
  • CR and LF are not accepted, even if escaped by \
  • Folding is not accepted
  • A literal domain part is not accepted (e.g. [1.0.0.127])
  • Comments are not accepted (e.g. (this is a comment)@example.com)
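
The restrictions listed above could be applied as a pre-check before the grammar itself is tested. The following Python sketch only illustrates that idea and is not phpList's actual implementation: the function name restriction_violated and the returned reason strings are hypothetical.

```python
import re
from typing import Optional

# An escaped pair or a complete quoted string. These are removed before
# looking for comments, so parentheses inside them are not misread.
_PROTECTED = re.compile(r'\\.|"(?:\\.|[^"\\])*"')


def restriction_violated(address: str) -> Optional[str]:
    """Return the restriction the address breaks, or None if it breaks none."""
    # CR and LF are refused everywhere, which also rules out folding.
    if "\r" in address or "\n" in address:
        return "CR/LF or folding"
    # Split on the last "@", since a quoted local part may contain "@".
    local, at, domain = address.rpartition("@")
    if not at:
        return "missing @"
    # A literal domain part is written in square brackets.
    if domain.startswith("["):
        return "literal domain-part"
    # Comments are parenthesised text outside quotes and escapes.
    if "(" in domain or "(" in _PROTECTED.sub("", local):
        return "comment"
    # The length of the domain part is deliberately not checked.
    return None
```

An address that passes this pre-check would still have to match the address grammar; the sketch only covers the exclusions in the list.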

Apart from that, the config setting EMAIL_ADDRESS_VALIDATION_LEVEL determines the level of checking, including the option to fall back to 10.4 style email validation.