Regex considerations for Machine Readable Passports

For a recent project I was looking into passport numbers and the apparent impossibility of validating them. From there I started playing with MRTDs (Machine Readable Travel Documents); passports are nowadays one of those, which makes customs procedures faster (unless you land in Miami, then it’s going to be a mess anyway).

On a machine readable passport there are two lines. Each is 44 characters long, with the filler character < (less-than sign) used wherever an empty space is needed.

Here’s an example of a fictional MR Passport code.

P<ITADAVINCI<<LEONARDO<<<<<<<<<<<<<<<<<<<<<<
L898902C<3ITA6908061F9406236ZE184226B<<<<<14
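
The layout described above (two rows, 44 characters each, using only A-Z, 0-9 and the filler <) can be checked before any field-level parsing. What follows is a minimal sketch in Python; the name `has_mrz_shape` is hypothetical and not taken from any library, and the check says nothing about whether the fields themselves are valid.

```python
import re

# One MRZ row: exactly 44 characters from A-Z, 0-9 and the filler '<'.
MRZ_ROW = re.compile(r"[A-Z0-9<]{44}")

def has_mrz_shape(mrz: str) -> bool:
    """Return True if the text consists of two 44-character MRZ rows."""
    rows = mrz.strip().splitlines()
    return len(rows) == 2 and all(MRZ_ROW.fullmatch(row) for row in rows)

# The fictional example from above; ljust pads the first row with fillers.
first_row = "P<ITADAVINCI<<LEONARDO".ljust(44, "<")
second_row = "L898902C<3ITA6908061F9406236ZE184226B<<<<<14"
sample = first_row + "\n" + second_row

print(has_mrz_shape(sample))       # both rows have the right shape
print(has_mrz_shape(sample[:-1]))  # second row is one character short
```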

As can be found on the Wikipedia page, the format of the first row can be defined as:

| Positions | Length | Characters | Meaning |
|---|---|---|---|
| 1 | 1 | alpha | P, indicating a passport |
| 2 | 1 | alpha | Type (for countries that distinguish between different types of passports) |
| 3–5 | 3 | alpha | Issuing country or organization (ISO 3166-1 alpha-3 code with modifications) |
| 6–44 | 39 | alpha | Surname, followed by two filler characters, followed by given names. Given names are separated by single filler characters |
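
Because every field of the first row sits at a fixed position, it can also be read with plain string slicing instead of a regex. The sketch below assumes a row that is already known to be 44 characters long; `parse_first_row` and the dictionary keys are hypothetical names chosen for illustration.

```python
def parse_first_row(row: str) -> dict:
    """Slice the first MRZ row using the positions listed in the table."""
    if len(row) != 44:
        raise ValueError("the first row must be exactly 44 characters long")
    names = row[5:44]                           # positions 6-44
    surname, _, given = names.partition("<<")   # surname, two fillers, given names
    return {
        "document": row[0],                     # position 1
        "type": row[1],                         # position 2
        "issuing_country": row[2:5],            # positions 3-5
        "surname": surname.replace("<", " ").strip(),
        # Given names are separated by single fillers; drop the trailing padding.
        "given_names": [name for name in given.split("<") if name],
    }

row = "P<ITADAVINCI<<LEONARDO".ljust(44, "<")
print(parse_first_row(row))
```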

and the second row as

| Positions | Length | Characters | Meaning |
|---|---|---|---|
| 1–9 | 9 | alpha+num | Passport number |
| 10 | 1 | numeric | Check digit over digits 1–9 |
| 11–13 | 3 | alpha | Nationality (ISO 3166-1 alpha-3 code with modifications) |
| 14–19 | 6 | numeric | Date of birth (YYMMDD) |
| 20 | 1 | numeric | Check digit over digits 14–19 |
| 21 | 1 | alpha | Sex (M, F or < for male, female or unspecified) |
| 22–27 | 6 | numeric | Expiration date of passport (YYMMDD) |
| 28 | 1 | numeric | Check digit over digits 22–27 |
| 29–42 | 14 | alpha+num | Personal number (may be used by the issuing country as it desires) |
| 43 | 1 | numeric | Check digit over digits 29–42 (may be < if all characters are <) |
| 44 | 1 | numeric | Check digit over digits 1–10, 14–20, and 22–43 |
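
The second row is entirely positional, so the table translates almost directly into a list of slices. This is only a sketch: `SECOND_ROW_FIELDS`, `parse_second_row` and the field names are hypothetical, and the 1-based positions of the table are converted to 0-based, end-exclusive Python slices.

```python
# (field name, start, end) as 0-based, end-exclusive slices of the table positions.
SECOND_ROW_FIELDS = [
    ("passport_number", 0, 9),          # positions 1-9
    ("passport_number_check", 9, 10),   # position 10
    ("nationality", 10, 13),            # positions 11-13
    ("date_of_birth", 13, 19),          # positions 14-19
    ("date_of_birth_check", 19, 20),    # position 20
    ("sex", 20, 21),                    # position 21
    ("expiration_date", 21, 27),        # positions 22-27
    ("expiration_date_check", 27, 28),  # position 28
    ("personal_number", 28, 42),        # positions 29-42
    ("personal_number_check", 42, 43),  # position 43
    ("composite_check", 43, 44),        # position 44
]

def parse_second_row(row: str) -> dict:
    """Cut the second MRZ row into its fields; fillers are left untouched."""
    if len(row) != 44:
        raise ValueError("the second row must be exactly 44 characters long")
    return {name: row[start:end] for name, start, end in SECOND_ROW_FIELDS}

fields = parse_second_row("L898902C<3ITA6908061F9406236ZE184226B<<<<<14")
for name, value in fields.items():
    print(f"{name}: {value}")
```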


From the above specification, here’s a possible implementation of a regex which tries to validate, parse and extract data from the two rows.

A few side notes:

  1. Segmenting surname and given names while also checking that the name field is exactly 39 characters long is the tricky part: on its own, a lookahead such as (?=[A-Z<]{39}) only guarantees a minimum length, so it has to be anchored to the end of the line.

  2. States can be checked against the list in ISO 3166-1, or with a generic pattern, depending on the needs.

  3. Check digits are extracted but not validated by the regex. If you are interested, the check digit calculation is as follows:

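  • Convert symbols to integers as per the table below (digits 0–9 keep their own value)

| < | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 |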


  • The value of each integer is then multiplied by its weight; the weight of the first position is 7, of the second it is 3, and of the third it is 1, and after that the weights repeat 7, 3, 1, and so on.

  • All values are added together

  • The remainder of the final value divided by 10 is the check digit.
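
The steps above can be sketched in a few lines of Python. The function names `char_value` and `check_digit` are hypothetical; the sample values come from the second row of the fictional passport shown earlier.

```python
def char_value(ch: str) -> int:
    """Map one MRZ character to its integer value."""
    if ch == "<":
        return 0                        # the filler counts as zero
    if "0" <= ch <= "9":
        return int(ch)                  # digits keep their own value
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10  # A=10, B=11, ..., Z=35
    raise ValueError(f"character not allowed in an MRZ: {ch!r}")

def check_digit(field: str) -> int:
    """Weighted sum with repeating weights 7, 3, 1, reduced modulo 10."""
    weights = (7, 3, 1)
    total = sum(char_value(ch) * weights[i % 3] for i, ch in enumerate(field))
    return total % 10

row = "L898902C<3ITA6908061F9406236ZE184226B<<<<<14"
print(check_digit(row[0:9]), row[9])      # passport number vs. position 10
print(check_digit(row[13:19]), row[19])   # date of birth vs. position 20
print(check_digit(row[21:27]), row[27])   # expiration date vs. position 28
print(check_digit(row[28:42]), row[42])   # personal number vs. position 43
# Final check digit: positions 1-10, 14-20 and 22-43 concatenated.
print(check_digit(row[0:10] + row[13:20] + row[21:43]), row[43])
```

Each printed pair should show the computed digit next to the digit stored in the row; when position 43 holds a filler instead of a digit, that case needs separate handling.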

The implementation is below; the same version can be found on Regex101 in a more readable format. I am planning to revisit it in the near future to see if I can make it better…

/^
(?<FirstLine>
  # First line capturing group
  (?<Passport>P)
  # Passport character capturing group (P char, length 1)
  (?<PassportType>.)
  # Passport type (any char, generally <, length 1)
  (?<IssuingCountry>ITA)
  # To be completed with the ISO 3166-1 alpha-3 country codes as an
  # alternation, e.g. ITA|FRA|ESP (length 3)
  # Alternatively it can be checked for characters only as
  # (?<IssuingCountry>[A-Z<]{3}) if validating the state is not necessary
  (?=[A-Z<]{39}$)
  # Lookahead for length validation: the $ (with the m flag) pins the name
  # field to exactly 39 characters; without it only a minimum is checked
  (?<Surname>[A-Z]+)
  # Surname, it has to be followed by <<
  <<
  (?<GivenName>
    # Given names, separated by single filler characters
    [A-Z]+(?:<[A-Z]+)*
  )
  <*
  # Trailing filler, if any
)
\n
(?<SecondLine>
  # Second line capturing group
  (?<PassportNumber>[A-Z0-9<]{9})
  # Passport number, length 9, padded with <
  (?<CheckDigit19>[0-9])
  # Check digit for positions 1 to 9
  (?<Nationality>\g{IssuingCountry})
  # Nationality: \g{IssuingCountry} is a backreference, so it must repeat the
  # text matched by group 4; use (?&IssuingCountry) to reuse only its pattern
  (?<DoB>
    # Date of birth
    (?<DoBYear>[0-9]{2})
    (?<DoBMonth>(?:0[1-9]|1[0-2]))
    (?<DoBDay>(?:0[1-9]|[12][0-9]|3[01]))
  )
  (?<CheckDigit1419>[0-9])
  # Check digit for positions 14 to 19
  (?<Sex>[MF<])
  # Sex (male, female or not defined)
  (?<Expiral>
    # Expiration date
    (?<ExpiralYear>[0-9]{2})
    (?<ExpiralMonth>(?:0[1-9]|1[0-2]))
    (?<ExpiralDay>(?:0[1-9]|[12][0-9]|3[01]))
  )
  (?<CheckDigit2227>[0-9])
  # Check digit for positions 22 to 27
  (?<PersonalNumber>[A-Z0-9<]{14})
  # Personal number padded with <
  (?<CheckDigit2942>[0-9<])
  # Check digit for positions 29 to 42 (can be < if empty)
  (?<CheckDigitF>[0-9])
  # Final check digit over positions 1 to 10, 14 to 20 and 22 to 43
)
$/xm
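
For anyone who wants to try the pattern outside Regex101, here is a rough Python port. It is a sketch under a few assumptions rather than a faithful copy: Python's `re` module spells named groups `(?P<name>...)` and named backreferences `(?P=name)`, the issuing country uses the character-only alternative, the lookahead is anchored with `$` so the name field is exactly 39 characters, and the given-name group is written so that it does not capture trailing fillers. The name `MRZ_PATTERN` is hypothetical.

```python
import re

MRZ_PATTERN = re.compile(
    r"""
    ^
    (?P<FirstLine>
      (?P<Passport>P)
      (?P<PassportType>.)
      (?P<IssuingCountry>[A-Z<]{3})      # character-only check, no ISO list
      (?=[A-Z<]{39}$)                    # assumption: name field is exactly 39 chars
      (?P<Surname>[A-Z]+)
      <<
      (?P<GivenName>[A-Z]+(?:<[A-Z]+)*)  # assumption: single fillers between names
      <*                                 # trailing padding, if any
    )
    \n
    (?P<SecondLine>
      (?P<PassportNumber>[A-Z0-9<]{9})
      (?P<CheckDigit19>[0-9])
      (?P<Nationality>(?P=IssuingCountry))  # must repeat the issuing country
      (?P<DoB>[0-9]{2}(?:0[1-9]|1[0-2])(?:0[1-9]|[12][0-9]|3[01]))
      (?P<CheckDigit1419>[0-9])
      (?P<Sex>[MF<])
      (?P<Expiral>[0-9]{2}(?:0[1-9]|1[0-2])(?:0[1-9]|[12][0-9]|3[01]))
      (?P<CheckDigit2227>[0-9])
      (?P<PersonalNumber>[A-Z0-9<]{14})
      (?P<CheckDigit2942>[0-9<])
      (?P<CheckDigitF>[0-9])
    )
    $
    """,
    re.VERBOSE | re.MULTILINE,
)

first_row = "P<ITADAVINCI<<LEONARDO".ljust(44, "<")
second_row = "L898902C<3ITA6908061F9406236ZE184226B<<<<<14"
match = MRZ_PATTERN.search(first_row + "\n" + second_row)
if match:
    print(match["Surname"], match["GivenName"], match["DoB"], match["Expiral"])
```

As in the original pattern, the check digits are only captured here, not verified.
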
Mauro Leonelli
Product Manager

Mauro Leonelli is a Product Manager with a passion for code and over 10 years experience in the Travel business.
