Regex considerations for Machine Readable Passports

Posted by Mauro Leonelli on July 31, 2015 - Reading time: about 4 minutes

For a recent project I was looking into passport numbers and the apparent impossibility of validating them. From there I started playing with MRTDs, or Machine Readable Travel Documents. Passports are nowadays one of those, which makes customs procedures faster (unless you land in Miami, then it’s going to be a mess anyway).

A machine readable passport carries two lines of 44 characters each, with the filler character < (less-than sign) used wherever an empty space is needed. Here’s an example of a fictional MR Passport code.


As can be found on the Wikipedia page, the format of the first row can be defined as

Positions | Length | Characters | Meaning
--------- | ------ | ---------- | -------
1         | 1      | alpha      | P, indicating a passport
2         | 1      | alpha      | Type (for countries that distinguish between different types of passports)
3–5       | 3      | alpha      | Issuing country or organization (ISO 3166-1 alpha-3 code with modifications)
6–44      | 39     | alpha      | Surname, followed by two filler characters, followed by given names. Given names are separated by single filler characters
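
To make the layout of the name field concrete, here is a minimal Python sketch that splits positions 6–44 into surname and given names. The function name `split_names` is hypothetical, and the sample row is the fictional "Utopia" specimen commonly used in MRZ examples; a real parser would need to handle more edge cases (for instance, holders with no given names).

```python
from typing import List, Tuple


def split_names(line1: str) -> Tuple[str, List[str]]:
    """Split positions 6-44 of the first MRZ row into surname and given names."""
    if len(line1) != 44:
        raise ValueError("first MRZ row must be 44 characters long")
    name_field = line1[5:44]  # positions 6-44, i.e. 39 characters
    # The first double filler separates the surname from the given names.
    surname, _, given = name_field.partition("<<")
    # Given names are separated by single fillers; drop the trailing padding.
    given_names = [name for name in given.split("<") if name]
    # A single filler inside the surname stands for a space.
    return surname.replace("<", " "), given_names


# Fictional specimen row, padded with fillers to 44 characters.
sample = "P<UTOERIKSSON<<ANNA<MARIA".ljust(44, "<")
print(split_names(sample))  # ('ERIKSSON', ['ANNA', 'MARIA'])
```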

and the second row as

Positions | Length | Characters | Meaning
--------- | ------ | ---------- | -------
1–9       | 9      | alpha+num  | Passport number
10        | 1      | numeric    | Check digit over digits 1–9
11–13     | 3      | alpha      | Nationality (ISO 3166-1 alpha-3 code with modifications)
14–19     | 6      | numeric    | Date of birth (YYMMDD)
20        | 1      | numeric    | Check digit over digits 14–19
21        | 1      | alpha      | Sex (M, F or < for male, female or unspecified)
22–27     | 6      | numeric    | Expiration date of passport (YYMMDD)
28        | 1      | numeric    | Check digit over digits 22–27
29–42     | 14     | alpha+num  | Personal number (may be used by the issuing country as it desires)
43        | 1      | numeric    | Check digit over digits 29–42 (may be < if all characters are <)
44        | 1      | numeric    | Check digit over digits 1–10, 14–20, and 22–43
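
Since every field of the second row sits at a fixed position, it can also be cut apart without any regex. The sketch below is only an illustration of the table above: the names `LINE2_FIELDS` and `slice_line2` are hypothetical, and the sample row is the fictional specimen commonly used in MRZ examples.

```python
# 1-based, inclusive (start, end) positions taken from the table above.
LINE2_FIELDS = {
    "passport_number": (1, 9),
    "passport_number_check": (10, 10),
    "nationality": (11, 13),
    "date_of_birth": (14, 19),
    "date_of_birth_check": (20, 20),
    "sex": (21, 21),
    "expiration_date": (22, 27),
    "expiration_date_check": (28, 28),
    "personal_number": (29, 42),
    "personal_number_check": (43, 43),
    "composite_check": (44, 44),
}


def slice_line2(line2):
    """Return the raw fields of the second MRZ row, fillers included."""
    if len(line2) != 44:
        raise ValueError("second MRZ row must be 44 characters long")
    # Convert each 1-based inclusive range into a Python slice.
    return {
        name: line2[start - 1:end]
        for name, (start, end) in LINE2_FIELDS.items()
    }


fields = slice_line2("L898902C36UTO7408122F1204159ZE184226B<<<<<10")
print(fields["passport_number"], fields["date_of_birth"], fields["sex"])
# L898902C3 740812 F
```

Slicing only extracts the fields; it does not check that each one contains the allowed characters, which is what the regex is for.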

From the above specification, here’s a possible implementation of a regex rule which tries to validate, parse and extract data from the two rows.

A few side notes:

  1. Currently I can either segment the surname and given names or check that the name field is 39 characters long, but not both in the same pattern.

  2. States can be checked against the ISO 3166-1 list or with a generic regex, depending on the needs.

  3. Check digits are extracted but not validated by the regex. If you are interested, the check digit calculation is as follows:

  • Convert symbols to integers: digits keep their own value, the letters A to Z map to 10 through 35, and the filler < counts as 0

  • The value of each integer is then multiplied by its weight; the weight of the first position is 7, of the second it is 3, and of the third it is 1, and after that the weights repeat 7, 3, 1, and so on.

  • All values are added together

  • The remainder of the final value divided by 10 is the check digit.
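
The steps above can be sketched in a few lines of Python. The function names `char_value` and `check_digit` are hypothetical, and the symbol mapping (digits as themselves, A–Z as 10–35, < as 0) is the standard MRZ convention rather than something taken from the regex itself. The sample values come from the fictional specimen commonly used in MRZ examples.

```python
def char_value(ch):
    """Map one MRZ symbol to its integer value."""
    if "0" <= ch <= "9":
        return ord(ch) - ord("0")
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10  # A=10 ... Z=35
    if ch == "<":
        return 0
    raise ValueError("invalid MRZ character: %r" % ch)


def check_digit(field):
    """Weighted sum with repeating weights 7, 3, 1, taken modulo 10."""
    weights = (7, 3, 1)
    total = sum(char_value(ch) * weights[i % 3] for i, ch in enumerate(field))
    return total % 10


line2 = "L898902C36UTO7408122F1204159ZE184226B<<<<<10"
print(check_digit(line2[0:9]))    # passport number  -> 6
print(check_digit(line2[13:19]))  # date of birth    -> 2
print(check_digit(line2[21:27]))  # expiration date  -> 9
# Composite check digit over positions 1-10, 14-20 and 22-43.
print(check_digit(line2[0:10] + line2[13:20] + line2[21:43]))  # -> 0
```

Each printed value matches the check digit stored in the row (positions 10, 20, 28 and 44), which is the comparison a validator would perform after the regex has extracted the fields.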

The implementation is below; the same version can be found in a more readable format on Regex101. I am planning to revisit it in the near future to see if I can make it better…

  # First line capturing group
  # Passport character capturing group (P char, length 1)
  # Passport Type (any char, generally <, length 1)
  # To be completed with the ISO 3166-1 alpha-3 country codes (length 3)
  # Alternatively, it can be checked for characters only as
  # (?<IssuingCountry>[A-Z<]{3}) if checking the state is not necessary
  # Passport lookahead for length validation -- NOT WORKING
  # Surname, it has to be followed by <<
    # Given Name
  # Second Line capturing group
  # Passport number, length 9, padded with <
  # Check digit for position 1 to 9
  # Nationality, follows the same rule as match group 4
    # Date Of Birth
  # Check digit for position 14 to 19
  # Sex (Male, Female or not defined)
    # Expiration date
  # Check digit for position 22 to 27
  # Personal number padded with <
  # Check digit for position 29 to 42 (can be < empty)
  # Check digit
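
As an illustration of the structure described by the comments above, here is a minimal Python sketch with one verbose pattern per row. It is not the original pattern: apart from `IssuingCountry`, the group names are assumptions, the country fields are only checked for characters, and the lookahead is just one way the 39-character length check might be combined with name segmentation. It also assumes at least one given name is present.

```python
import re

LINE1 = re.compile(r"""
    (?P<Passport>P)                      # position 1: P for passport
    (?P<PassportType>[A-Z<])             # position 2: type, generally the filler
    (?P<IssuingCountry>[A-Z<]{3})        # positions 3-5, characters only
    (?=[A-Z<]{39}\Z)                     # lookahead: name field is exactly 39 long
    (?P<Surname>[A-Z]+(?:<[A-Z]+)*)      # surname, single fillers allowed inside
    <<                                   # separator before the given names
    (?P<GivenNames>[A-Z]+(?:<[A-Z]+)*)   # given names split by single fillers
    <*                                   # trailing padding
""", re.VERBOSE)

LINE2 = re.compile(r"""
    (?P<PassportNumber>[A-Z0-9<]{9})     # positions 1-9, padded with fillers
    (?P<CheckPassportNumber>[0-9])       # check digit for positions 1-9
    (?P<Nationality>[A-Z<]{3})           # positions 11-13, characters only
    (?P<DateOfBirth>[0-9]{6})            # YYMMDD
    (?P<CheckDateOfBirth>[0-9])          # check digit for positions 14-19
    (?P<Sex>[MF<])                       # male, female or unspecified
    (?P<ExpirationDate>[0-9]{6})         # YYMMDD
    (?P<CheckExpirationDate>[0-9])       # check digit for positions 22-27
    (?P<PersonalNumber>[A-Z0-9<]{14})    # positions 29-42, padded with fillers
    (?P<CheckPersonalNumber>[0-9<])      # check digit, may be a filler
    (?P<CheckComposite>[0-9])            # composite check digit
""", re.VERBOSE)

# Fictional specimen rows; the first one is padded to 44 characters.
row1 = "P<UTOERIKSSON<<ANNA<MARIA".ljust(44, "<")
row2 = "L898902C36UTO7408122F1204159ZE184226B<<<<<10"

m1 = LINE1.fullmatch(row1)
m2 = LINE2.fullmatch(row2)
print(m1.group("Surname"), m1.group("GivenNames").split("<"))
print(m2.group("PassportNumber"), m2.group("DateOfBirth"), m2.group("Sex"))
```

Using `fullmatch` on each row separately keeps the two patterns independent; the lookahead only asserts the length of the name field and leaves the segmentation to the groups that follow it.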