Regex considerations for Machine Readable Passports
For a recent project I was looking into passport numbers and the apparent impossibility of validating them. From there I started playing with MRTDs, or Machine Readable Travel Documents. Passports are nowadays one of those, which makes customs procedures faster (unless you land in Miami, then it’s going to be a mess anyway).
An MR Passport carries two lines. Each is 44 characters long, with the filler character < (less-than sign) used wherever an empty space is needed.
Here’s an example of a fictional MR Passport code.
P<ITADAVINCI<<LEONARDO<<<<<<<<<<<<<<<<<<<<<<
L898902C<3ITA6908061F9406236ZE184226B<<<<<14
As can be found on the Wikipedia page, the format of the first row can be defined as:
Positions | Length | Characters | Meaning |
---|---|---|---|
1 | 1 | alpha | P, indicating a passport |
2 | 1 | alpha | Type (for countries that distinguish between different types of passports) |
3–5 | 3 | alpha | Issuing country or organization (ISO 3166-1 alpha-3 code with modifications) |
6–44 | 39 | alpha | Surname, followed by two filler characters, followed by given names. Given names are separated by single filler characters |
and the second row as
Positions | Length | Characters | Meaning |
---|---|---|---|
1–9 | 9 | alpha+num | Passport number |
10 | 1 | numeric | Check digit over digits 1–9 |
11–13 | 3 | alpha | Nationality (ISO 3166-1 alpha-3 code with modifications) |
14–19 | 6 | numeric | Date of birth (YYMMDD) |
20 | 1 | numeric | Check digit over digits 14–19 |
21 | 1 | alpha | Sex (M, F or < for male, female or unspecified) |
22–27 | 6 | numeric | Expiration date of passport (YYMMDD) |
28 | 1 | numeric | Check digit over digits 22–27 |
29–42 | 14 | alpha+num | Personal number (may be used by the issuing country as it desires) |
43 | 1 | numeric | Check digit over digits 29–42 (may be < if all characters are <) |
44 | 1 | numeric | Check digit over digits 1–10, 14–20, and 22–43 |
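Because every field sits at a fixed position, the two tables translate almost directly into string slices. The sketch below is only an illustration in Python: the `slice_fields` and `parse_mrz` helpers and the field names are my own inventions, and it assumes both lines are already exactly 44 characters long. It does no validation beyond that length check.

```python
# Field layout copied from the two tables above; positions are 1-based and inclusive.
LINE1_FIELDS = {
    "document_code": (1, 1),
    "type": (2, 2),
    "issuing_country": (3, 5),
    "names": (6, 44),
}
LINE2_FIELDS = {
    "passport_number": (1, 9),
    "check_passport_number": (10, 10),
    "nationality": (11, 13),
    "date_of_birth": (14, 19),
    "check_date_of_birth": (20, 20),
    "sex": (21, 21),
    "expiration_date": (22, 27),
    "check_expiration_date": (28, 28),
    "personal_number": (29, 42),
    "check_personal_number": (43, 43),
    "check_composite": (44, 44),
}


def slice_fields(line, layout):
    """Cut one 44-character MRZ line into named fields (hypothetical helper)."""
    if len(line) != 44:
        raise ValueError(f"expected 44 characters, got {len(line)}")
    return {name: line[start - 1:end] for name, (start, end) in layout.items()}


def parse_mrz(line1, line2):
    """Return the raw fields of both lines plus the split surname and given names."""
    fields = {**slice_fields(line1, LINE1_FIELDS), **slice_fields(line2, LINE2_FIELDS)}
    # Surname and given names are separated by "<<", given names by a single "<".
    surname, _, given = fields["names"].rstrip("<").partition("<<")
    fields["surname"] = surname.replace("<", " ")
    fields["given_names"] = given.replace("<", " ")
    return fields


# The fictional example from above; the first line is padded to 44 with fillers.
line1 = "P<ITADAVINCI<<LEONARDO".ljust(44, "<")
line2 = "L898902C<3ITA6908061F9406236ZE184226B<<<<<14"
mrz = parse_mrz(line1, line2)
print(mrz["surname"], "|", mrz["given_names"], "|", mrz["passport_number"], "|", mrz["expiration_date"])
# prints: DAVINCI | LEONARDO | L898902C< | 940623
```

Slicing like this is a handy cross-check for a regex: if the regex and the slices disagree on a field, one of them has the positions wrong.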
From the above specification, here’s a possible implementation of a regex that tries to validate, parse and extract data from the two rows.
As a side note:
Currently I can either segment surname and given name or check that the length is 39, but not both at once.
States can be checked against the ISO 3166-1 list or with a generic character-class pattern, depending on the needs.
Check digits are extracted but not validated by the regex. If you are interested, the check digit calculation is as follows:
- Convert symbols to integers as per the table below (digits 0–9 keep their own value).

< | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 |

- Multiply each value by its weight: the weight of the first position is 7, of the second 3, and of the third 1; after that the weights repeat 7, 3, 1, and so on.
- Add all the products together.
- The remainder of the sum divided by 10 is the check digit.
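The steps above can be sketched in a few lines of Python. The function names `char_value` and `check_digit` are mine, and the sample values come from the second line of the fictional passport shown earlier, so treat this as an illustration rather than a reference implementation.

```python
WEIGHTS = (7, 3, 1)  # repeating weights, starting at the first character


def char_value(ch):
    """Map one MRZ character to its integer value."""
    if ch == "<":
        return 0
    if "0" <= ch <= "9":
        return int(ch)
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10  # A=10 ... Z=35
    raise ValueError(f"character not allowed in an MRZ: {ch!r}")


def check_digit(field):
    """Weighted sum of the character values, modulo 10."""
    total = sum(char_value(ch) * WEIGHTS[i % 3] for i, ch in enumerate(field))
    return total % 10


line2 = "L898902C<3ITA6908061F9406236ZE184226B<<<<<14"

print(check_digit(line2[0:9]))    # passport number, positions 1-9    -> 3
print(check_digit(line2[13:19]))  # date of birth, positions 14-19    -> 1
print(check_digit(line2[21:27]))  # expiration date, positions 22-27  -> 6
print(check_digit(line2[28:42]))  # personal number, positions 29-42  -> 1

# Final check digit: positions 1-10, 14-20 and 22-43 concatenated.
composite = line2[0:10] + line2[13:20] + line2[21:43]
print(check_digit(composite))     # -> 4
```

Each result matches the digit printed at positions 10, 20, 28, 43 and 44 of the example line, which is how a validator would use it: compute, then compare with the digit the regex extracted.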
The implementation is below; the same version can also be found on Regex101, where it is easier to read. I am planning to revisit it in the near future to see if I can make it better…
/^
(?<FirstLine>
  # First line capturing group
  (?<Passport>P)
  # Passport character capturing group (P char, length 1)
  (?<PassportType>.)
  # Passport type (any char, generally <, length 1)
  (?<IssuingCountry>[ITA]{3})
  # To be completed with the ISO 3166-1 alpha-3 country codes (length 3)
  # Alternatively, it can be checked for characters only with
  # (?<IssuingCountry>[A-Z<]{3}) if checking the state is not necessary
  (?=[A-Z<]{39})
  # Lookahead for length validation -- NOT WORKING
  (?<Surname>[A-Z]+)
  # Surname, it has to be followed by <<
  <<
  (?<GivenName>
    # Given name
    (?:[A-Z]+<?)+
  )
  [<]+
)
\n
(?<SecondLine>
  # Second line capturing group
  (?<PassportNumber>[A-Z0-9<]{9})
  # Passport number, length 9, padded with <
  (?<CheckDigit19>[0-9]{1})
  # Check digit for positions 1 to 9
  (?<Nationality>\g{IssuingCountry})
  # Nationality, follows the same rule as match group 4
  # Note: in PCRE \g{IssuingCountry} is a back-reference, so it only matches the same
  # text as group 4; (?&IssuingCountry) would reuse the pattern instead
  (?<DoB>
    # Date of birth
    (?<DoBYear>[0-9]{2})
    (?<DoBMonth>(?:0[1-9]|1[0-2]))
    (?<DoBDay>(?:0[1-9]|(?:1|2)[0-9]|3[01]))
  )
  (?<CheckDigit1419>[0-9])
  # Check digit for positions 14 to 19
  (?<Sex>[MF<])
  # Sex (male, female or not defined)
  (?<Expiral>
    # Expiration date
    (?<ExpiralYear>[0-9]{2})
    (?<ExpiralMonth>(?:0[1-9]|1[0-2]))
    (?<ExpiralDay>(?:0[1-9]|(?:1|2)[0-9]|3[01]))
  )
  (?<CheckDigit2227>[0-9])
  # Check digit for positions 22 to 27
  (?<PersonalNumber>[A-Z0-9<]{14})
  # Personal number padded with <
  (?<CheckDigit2942>[0-9<])
  # Check digit for positions 29 to 42 (can be < if empty)
  (?<CheckDigitF>[0-9])
  # Final check digit
)
$/xm
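One possible way around the length problem on the first line is to end the lookahead with `$`, so that it asserts exactly 39 name characters up to the end of the line before the surname and given name are segmented. The sketch below tries this in Python's `re` syntax rather than PCRE, so named groups are written `(?P<name>…)`. The `FIRST_LINE` name is mine, the country is checked only as three characters, and I tightened the given-name group so it does not capture a trailing filler; none of this is tested beyond the fictional example.

```python
import re

FIRST_LINE = re.compile(
    r"""
    ^
    (?P<Passport>P)                     # passport marker
    (?P<PassportType>[A-Z<])            # type, generally <
    (?P<IssuingCountry>[A-Z<]{3})       # not checked against ISO 3166-1 here
    (?=[A-Z<]{39}$)                     # exactly 39 characters left on the line
    (?P<Surname>[A-Z]+)                 # surname, followed by <<
    <<
    (?P<GivenName>[A-Z]+(?:<[A-Z]+)*)   # given names separated by a single <
    <*                                  # trailing fillers
    $
    """,
    re.VERBOSE,
)

good = "P<ITADAVINCI<<LEONARDO".ljust(44, "<")
too_short = "P<ITADAVINCI<<LEONARDO".ljust(43, "<")
too_long = "P<ITADAVINCI<<LEONARDO".ljust(45, "<")

match = FIRST_LINE.match(good)
print(match.group("Surname"), match.group("GivenName"))  # DAVINCI LEONARDO
print(FIRST_LINE.match(too_short))                       # None
print(FIRST_LINE.match(too_long))                        # None
```

The lookahead consumes nothing, so the groups after it still do the segmentation; the `$` inside it is what turns "at least 39" into "exactly 39". In the PCRE version the same idea should carry over with the `m` flag, since `$` then matches just before the `\n` that separates the two lines.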