Regex considerations for Machine Readable Passports
For a recent project I was looking into passport numbers and the apparent impossibility of validating them. From there I started playing with MRTDs, or Machine Readable Travel Documents. Passports are nowadays one of those, which makes customs procedures faster (unless you land in Miami, then it's going to be a mess anyway).
On a MR Passport there are two lines. Each is 44 characters long, with a filler character < (less-than sign) used wherever an empty space is needed. Here's an example of a fictional MR Passport code.
As can be found on the Wikipedia page, the format of the first row can be defined as
|Positions||Length||Characters||Meaning|
|1||1||alpha||P, indicating a passport|
|2||1||alpha||Type (for countries that distinguish between different types of passports)|
|3–5||3||alpha||Issuing country or organization (ISO 3166-1 alpha-3 code with modifications)|
|6–44||39||alpha||Surname, followed by two filler characters, followed by given names. Given names are separated by single filler characters|
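As a rough illustration of the first-row layout above, here is a minimal Python sketch. It is not the regex from this post: the pattern, the group names and the `parse_row1` helper are my own assumptions, and the sample row is the commonly quoted specimen for the fictional state Utopia. The lookahead at the start is one possible way to check the overall length while still splitting surname and given names.

```python
import re

# Hypothetical pattern for row 1 (group names are illustrative, not from the post).
ROW1 = re.compile(
    r"^(?=[A-Z<]{44}\Z)"                     # lookahead: 44 characters in total, so the name field is 39
    r"P(?P<type>[A-Z<])"                     # positions 1-2: P plus the optional type
    r"(?P<issuer>[A-Z<]{3})"                 # positions 3-5: issuing country or organization
    r"(?P<surname>[A-Z]+(?:<[A-Z]+)*)"       # surname parts joined by single fillers
    r"(?:<<(?P<given>[A-Z]+(?:<[A-Z]+)*))?"  # optional given names after a double filler
    r"<*\Z"                                  # filler padding up to the end of the row
)


def parse_row1(row: str):
    """Return the row-1 fields as a dict, or None if the row does not match."""
    m = ROW1.match(row)
    if m is None:
        return None
    given = m.group("given")
    return {
        "type": m.group("type"),
        "issuer": m.group("issuer"),
        "surname": m.group("surname").replace("<", " "),
        "given_names": given.split("<") if given else [],
    }


# Build the sample row by padding with fillers to 44 characters.
row1 = "P<UTOERIKSSON<<ANNA<MARIA".ljust(44, "<")
print(parse_row1(row1))
```

The sketch only accepts the letters A to Z and the filler, and it does not verify the issuer against any country list.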
and the second row as
|Positions||Length||Characters||Meaning|
|1–9||9||alpha+num||Passport number|
|10||1||numeric||Check digit over digits 1–9|
|11–13||3||alpha||Nationality (ISO 3166-1 alpha-3 code with modifications)|
|14–19||6||numeric||Date of birth (YYMMDD)|
|20||1||numeric||Check digit over digits 14–19|
|21||1||alpha||Sex (M, F or < for male, female or unspecified)|
|22–27||6||numeric||Expiration date of passport (YYMMDD)|
|28||1||numeric||Check digit over digits 22–27|
|29–42||14||alpha+num||Personal number (may be used by the issuing country as it desires)|
|43||1||numeric||Check digit over digits 29–42 (may be < if all characters are <)|
|44||1||numeric||Check digit over digits 1–10, 14–20, and 22–43|
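The second row is fixed-width, so it maps fairly directly onto named groups. The following is a minimal sketch under my own assumptions (the pattern and group names are hypothetical, and positions 1–9 are taken to hold the passport number); it extracts the fields and check digits but does not validate them. The sample row is the commonly quoted Utopia specimen.

```python
import re

# Hypothetical pattern for row 2; the group widths add up to 44 characters.
ROW2 = re.compile(
    r"^(?P<number>[A-Z0-9<]{9})(?P<number_cd>[0-9])"        # positions 1-9 and 10
    r"(?P<nationality>[A-Z<]{3})"                           # positions 11-13
    r"(?P<birth>[0-9]{6})(?P<birth_cd>[0-9])"               # positions 14-19 and 20
    r"(?P<sex>[MF<])"                                       # position 21
    r"(?P<expiry>[0-9]{6})(?P<expiry_cd>[0-9])"             # positions 22-27 and 28
    r"(?P<personal>[A-Z0-9<]{14})(?P<personal_cd>[0-9<])"   # positions 29-42 and 43
    r"(?P<composite_cd>[0-9])\Z"                            # position 44
)

row2 = "L898902C36UTO7408122F1204159ZE184226B<<<<<10"
match = ROW2.match(row2)
fields = match.groupdict() if match else None
print(fields)
```

Dates are only checked to be six digits here; a stricter pattern could restrict the month and day ranges.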
From the above specification, here's a possible implementation of a regex rule which tries to validate, parse and extract data from the two rows.
As a side note
Currently I can either segment the surname and given names or check that the name field is 39 characters long, but not both in the same pattern.
Issuing state and nationality can be checked against the ISO 3166-1 list or with a generic regex, depending on the needs.
Check digits are extracted but not validated in the regex. If you are interested, the check digit calculation is as follows:
- Convert each symbol to an integer: digits keep their value, the letters A–Z map to 10–35, and the filler < counts as 0.
- Multiply each value by its weight; the weight of the first position is 7, of the second 3, and of the third 1, after which the weights repeat 7, 3, 1, and so on.
- Add all the products together.
- The remainder of the sum divided by 10 is the check digit.
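The steps above can be sketched in a few lines of Python. The function names `char_value` and `check_digit` are my own, and the symbol mapping (digits as themselves, A–Z as 10–35, filler as 0) is the usual one for machine readable zones; the sample values come from the commonly quoted Utopia specimen.

```python
WEIGHTS = (7, 3, 1)  # repeating weights, starting at the first position


def char_value(ch: str) -> int:
    """Map one MRZ symbol to its integer value."""
    if "0" <= ch <= "9":
        return int(ch)
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10  # A=10 ... Z=35
    if ch == "<":
        return 0
    raise ValueError(f"unexpected MRZ symbol: {ch!r}")


def check_digit(field: str) -> int:
    """Weighted sum of the symbol values, modulo 10."""
    total = sum(char_value(ch) * WEIGHTS[i % 3] for i, ch in enumerate(field))
    return total % 10


print(check_digit("L898902C3"))  # passport number of the specimen
```

For the final check digit, the input would be the concatenation of positions 1–10, 14–20 and 22–43 of the second row, for example `row2[0:10] + row2[13:20] + row2[21:43]` with zero-based slicing.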
The implementation is below; the same version can also be found on Regex101 in a more readable format. I am planning to revisit it in the near future to see if I can make it better…