Phone Number input validation in Kenya

I have often seen developers validate input with long chains of if blocks or, worse, combine them with numerous switch cases. The approach works, but more often than not the code has to be updated whenever something new arises or an edge case turns up that was not originally considered.

One of these situations is validating phone numbers entered into a field. For the Americas and Europe this is not normally a problem, because several frameworks have that validation built in. The same is not true for other countries, such as Kenya (where I am from). In this blog post, I quickly construct a regular expression for validating phone numbers in this country.

First, let us consider the common formats that people use to represent mobile phone numbers: 07xxxxxxxx, 2547xxxxxxxx or +2547xxxxxxxx. Sometimes you will also see 7xxxxxxxx. Most systems store phone numbers in order to call them or to verify ownership of a phone. In these situations, they need to send a message or interact with the phone in certain ways. Usually, to do so, the phone number should be in MSISDN format (read more on that here). In brief, the format is:

MSISDN = CC + NDC + SN  
CC = Country Code  
NDC = National Destination Code. It identifies one or part of a Public Land Mobile Network (PLMN)  
SN = Subscriber Number  

In Kenya, CC = 254 and the NDC depends on the mobile network provider (70, 71, 72 and 79 for Safaricom; 73 and 78 for Airtel, etc.). The remaining digits in the complete phone number designate the subscriber number.
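To make the CC + NDC + SN structure concrete, here is a minimal sketch that splits a Kenyan MSISDN into its three parts. The class name MsisdnParts and its fields are hypothetical, and the fixed widths (3-digit CC, 3-digit NDC, 6-digit SN) are an assumption that follows the convention used in the Safaricom section of this post.

```java
/** Hypothetical value type holding the three parts of a Kenyan MSISDN. */
final class MsisdnParts {
    final String countryCode;
    final String nationalDestinationCode;
    final String subscriberNumber;

    private MsisdnParts(String cc, String ndc, String sn) {
        this.countryCode = cc;
        this.nationalDestinationCode = ndc;
        this.subscriberNumber = sn;
    }

    // Assumption: the input is already normalized to 254 followed by 9 digits.
    static MsisdnParts parse(String msisdn) {
        if (msisdn == null || !msisdn.matches("254[0-9]{9}")) {
            throw new IllegalArgumentException("Expected 254 followed by 9 digits: " + msisdn);
        }
        return new MsisdnParts(
                msisdn.substring(0, 3),   // CC
                msisdn.substring(3, 6),   // NDC (assumed 3 digits wide)
                msisdn.substring(6));     // SN
    }

    public static void main(String[] args) {
        MsisdnParts parts = parse("254712345678");
        // prints "254 712 345678"
        System.out.println(parts.countryCode + " "
                + parts.nationalDestinationCode + " "
                + parts.subscriberNumber);
    }
}
```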

All the possible prefixes for mobile phone numbers in Kenya are described clearly here, so I will not delve into them but will instead use the information provided.

Testing

In this blog post, I will build the regular expressions step by step. To test them I recommend regex101.com; it is quite easy and useful.

Safaricom

1) Possible prefixes

Given the number 0712345678, it is common to consider 345678 as the subscriber number and 712 as the NDC, since the latter is tightly controlled by the provider or the regulatory body. The possible prefixes are 700 to 708, 710 to 719, 720 to 729 and 790 to 792. The regular expression for this part is thus 7(([12][0-9])|(0[0-8])|(9[0-2])). The leading 7 ensures the prefix starts with the digit 7. The group that follows captures the other two digits. If the digit after the compulsory 7 is a 1 or 2, the next digit ranges from 0 to 9. If it is a 0, the next digit ranges from 0 to 8. Otherwise, if it is a 9, the next digit only ranges from 0 to 2. The character | separates the alternatives.
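If you prefer checking the prefix expression in code rather than on regex101.com, a small sketch like the one below can help. The class and method names are made up for illustration; the expression itself is the one from the paragraph above.

```java
import java.util.regex.Pattern;

final class SafaricomPrefixCheck {
    // Prefix expression from the text: 700-708, 710-729 and 790-792.
    static final Pattern PREFIX =
            Pattern.compile("7(([12][0-9])|(0[0-8])|(9[0-2]))");

    // matches() requires the whole three-digit string to fit the expression.
    static boolean isSafaricomPrefix(String threeDigits) {
        return PREFIX.matcher(threeDigits).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"700", "708", "709", "712", "729", "730", "792", "793"};
        for (String sample : samples) {
            System.out.println(sample + " -> " + isSafaricomPrefix(sample));
        }
    }
}
```

Boundary values such as 708/709 and 792/793 are the most useful ones to try, since that is where a wrong range would show up.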

2) The last 6 digits

The last 6 digits are the easiest to validate: although they are usually serialized during manufacture, they seem almost random, so there is no particular pattern to validate. The only thing we need to check is that there are exactly 6 digits. That is done using [0-9]{6}.

3) Combining the prefix and the last 6 digits

This is as simple as a concatenation of the two expressions to form 7(([12][0-9])|(0[0-8])|(9[0-2]))[0-9]{6}.
However, this concatenation collects groups that we do not need. To make a group non-capturing, we add '?:' immediately after its opening bracket (a group is denoted using normal brackets). In our case this becomes 7(?:(?:[12][0-9])|(?:0[0-8])|(?:9[0-2]))[0-9]{6}.
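The effect of '?:' can be observed directly by counting the capturing groups of both variants. This is only a sketch; the class name GroupCountDemo is hypothetical, while Pattern and Matcher.groupCount() come from java.util.regex.

```java
import java.util.regex.Pattern;

final class GroupCountDemo {
    static final String CAPTURING =
            "7(([12][0-9])|(0[0-8])|(9[0-2]))[0-9]{6}";
    static final String NON_CAPTURING =
            "7(?:(?:[12][0-9])|(?:0[0-8])|(?:9[0-2]))[0-9]{6}";

    // groupCount() reports how many capturing groups the pattern defines.
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        System.out.println(groupCount(CAPTURING));      // prints 4
        System.out.println(groupCount(NON_CAPTURING));  // prints 0
    }
}
```

Both variants accept exactly the same strings; only the bookkeeping of groups differs.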

4) Adding the possible country code variations

As mentioned earlier, people tend to write the numbers with different prefixes (a matter of personal choice), but we want to standardize. Before the compulsory digit 7, we have three possibilities: +254, 254 or 0. To validate this choice, I use (254|\+254|0). Making this a non-capturing group results in (?:254|\+254|0). However, there are situations where none of the three is present, so we make this group optional by appending '?' at the end, resulting in (?:254|\+254|0)?
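As a rough illustration of what the optional group does, the sketch below uses it on its own to strip whichever prefix is present. The helper name stripPrefix is an assumption for this example, and the helper does no validation of the remaining digits.

```java
import java.util.regex.Pattern;

final class CountryPrefixDemo {
    // The optional group from the text, anchored to the start of the input.
    static final Pattern LEADING_PREFIX = Pattern.compile("^(?:254|\\+254|0)?");

    // Removes a leading 254, +254 or 0 if one is present; otherwise returns the input unchanged.
    static String stripPrefix(String input) {
        return LEADING_PREFIX.matcher(input).replaceFirst("");
    }

    public static void main(String[] args) {
        String[] samples = {"0712345678", "254712345678", "+254712345678", "712345678"};
        for (String sample : samples) {
            // Every sample is reduced to 712345678.
            System.out.println(sample + " -> " + stripPrefix(sample));
        }
    }
}
```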

5) Final expression

First we combine the result of step 3 with that of step 4 above, wrapping the number itself in a capturing group so that it can be extracted later:
(?:254|\+254|0)?(7(?:(?:[12][0-9])|(?:0[0-8])|(?:9[0-2]))[0-9]{6}). Finally, to ensure the whole input string is a phone number, we anchor the expression by adding ^ at the beginning and $ at the end. The final expression is thus:
^(?:254|\+254|0)?(7(?:(?:[12][0-9])|(?:0[0-8])|(?:9[0-2]))[0-9]{6})$
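A quick way to gain confidence in the final expression is to run it against a handful of good and bad inputs. The sketch below does that; SafaricomValidator and isValid are names invented for this example, and the pattern uses the fully non-capturing prefix from step 3 so that only the number itself is captured.

```java
import java.util.regex.Pattern;

final class SafaricomValidator {
    static final Pattern SAFARICOM = Pattern.compile(
            "^(?:254|\\+254|0)?(7(?:(?:[12][0-9])|(?:0[0-8])|(?:9[0-2]))[0-9]{6})$");

    // True only when the whole input is a Safaricom number in one of the accepted formats.
    static boolean isValid(String input) {
        return input != null && SAFARICOM.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "0712345678", "+254712345678", "254712345678", "712345678", // accepted formats
            "0709123456",   // 709 is outside 700-708
            "0793000000",   // 793 is outside 790-792
            "07123456789",  // one digit too many
            "0712 345678"   // spaces are not allowed
        };
        for (String sample : samples) {
            System.out.println(sample + " -> " + isValid(sample));
        }
    }
}
```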

Other providers

The same 5-step process (or a shorter one) can be used to validate phone numbers from Airtel, Orange/Telkom and Equitel. Once you have done your homework, you can compare with my results below. They are likely the answers. :-)

Airtel=  ^(?:254|\+254|0)?(7(?:(?:3[0-9])|(?:5[0-6])|(?:8[5-9]))[0-9]{6})$  
Orange=  ^(?:254|\+254|0)?(77[0-6][0-9]{6})$  
Equitel= ^(?:254|\+254|0)?(76[34][0-9]{6})$  
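One possible use of having one expression per provider is to work out which network a number belongs to. The sketch below is an illustration only: ProviderDetector and detect are hypothetical names, the Safaricom entry is the original expression from step 5, and the inner groups are written as non-capturing throughout.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;

final class ProviderDetector {
    // Insertion-ordered so that providers are always tried in the same order.
    static final Map<String, Pattern> PROVIDERS = new LinkedHashMap<>();

    static {
        PROVIDERS.put("Safaricom", Pattern.compile(
                "^(?:254|\\+254|0)?(7(?:(?:[12][0-9])|(?:0[0-8])|(?:9[0-2]))[0-9]{6})$"));
        PROVIDERS.put("Airtel", Pattern.compile(
                "^(?:254|\\+254|0)?(7(?:(?:3[0-9])|(?:5[0-6])|(?:8[5-9]))[0-9]{6})$"));
        PROVIDERS.put("Orange", Pattern.compile(
                "^(?:254|\\+254|0)?(77[0-6][0-9]{6})$"));
        PROVIDERS.put("Equitel", Pattern.compile(
                "^(?:254|\\+254|0)?(76[34][0-9]{6})$"));
    }

    // Returns the first provider whose expression matches, or empty if none does.
    static Optional<String> detect(String input) {
        for (Map.Entry<String, Pattern> entry : PROVIDERS.entrySet()) {
            if (entry.getValue().matcher(input).matches()) {
                return Optional.of(entry.getKey());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(detect("0733123456")); // prints Optional[Airtel]
        System.out.println(detect("0784123456")); // prints Optional.empty
    }
}
```

Because the prefix ranges do not overlap, at most one expression can match a given number, so the order of the entries does not change the result.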

Using this/these in code

When the regular expression above matches, the result contains two groups. Group 0 is normally the whole string that was validated. Group 1, in this case, is the number without the country code or leading zero, e.g. 712345678. Since the country is known, we only need to prepend the country code to that group to obtain the MSISDN. I will show you how in Java and C#.

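Java
import java.util.regex.Matcher;  
import java.util.regex.Pattern;  

String inputPhoneNumber = ""; // TODO: populate with the number to validate  
String validPhoneNumber = null; // stays null when the input does not match  
Pattern pattern = Pattern.compile("^(?:254|\\+254|0)?(7(?:(?:[12][0-9])|(?:0[0-8])|(?:9[0-2]))[0-9]{6})$");  
Matcher matcher = pattern.matcher(inputPhoneNumber);  
if (matcher.matches()) {  
    // group(1) is the 9-digit number without any prefix; prepend the country code  
    validPhoneNumber = "254" + matcher.group(1);  
}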
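
C#
using System.Text.RegularExpressions;  

string inputPhoneNumber = ""; // TODO: populate with the number to validate  
string validPhoneNumber = null; // stays null when the input does not match  
// A verbatim string (@"...") avoids having to double the backslash  
var regEx = new Regex(@"^(?:254|\+254|0)?(7(?:(?:[12][0-9])|(?:0[0-8])|(?:9[0-2]))[0-9]{6})$");  
var match = regEx.Match(inputPhoneNumber);  
if (match.Success) {  
    // Groups[1] is the 9-digit number without any prefix; prepend the country code  
    validPhoneNumber = "254" + match.Groups[1].Value;  
}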
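If the snippet is going to be called from several places, it may be convenient to wrap it in a small helper. The following is one possible shape, not the only one: KenyanMsisdnNormalizer and toMsisdn are hypothetical names, and trimming surrounding whitespace before matching is an assumption added here that the original snippets do not make.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class KenyanMsisdnNormalizer {
    // Compiled once and reused, since Pattern instances are immutable and thread-safe.
    private static final Pattern SAFARICOM = Pattern.compile(
            "^(?:254|\\+254|0)?(7(?:(?:[12][0-9])|(?:0[0-8])|(?:9[0-2]))[0-9]{6})$");

    // Returns the number as 254XXXXXXXXX, or empty when the input is not a valid Safaricom number.
    static Optional<String> toMsisdn(String input) {
        if (input == null) {
            return Optional.empty();
        }
        // Assumption: leading and trailing whitespace is harmless and can be ignored.
        Matcher matcher = SAFARICOM.matcher(input.trim());
        return matcher.matches()
                ? Optional.of("254" + matcher.group(1))
                : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(toMsisdn("0712345678"));  // prints Optional[254712345678]
        System.out.println(toMsisdn("0733123456"));  // prints Optional.empty
    }
}
```

Returning an Optional instead of null makes the "no match" case explicit for callers.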

Conclusion

That was not too hard after all. I find it better than having multiple switch cases and/or if conditions. In case the possible prefixes change, I will try to keep these regular expressions up to date, but it should not be too hard to adjust them yourself before I update the post.

Happy coding :-)

Update (Safaricom) [Wed, Jun-14-2017]

After weeks of using the above expression in production, I encountered other numbers that did not match. Though these numbers were valid, there is no existing documentation pointing to the same. In particular, Safaricom has exhausted all prefixes in the 79x series and, towards the end of 2017-Q1, started using numbers in the 74x series.
Since the 79x series is now similar to the 71x and 72x series, they can be combined to form:
^(?:254|\+254|0)?(7(?:(?:[129][0-9])|(?:0[0-8]))[0-9]{6})$
As of the time of writing I have only encountered 740 and 741. Adjusting the expression for the new numbers, you can now use:
^(?:254|\+254|0)?(7(?:(?:[129][0-9])|(?:0[0-8])|(?:4[0-1]))[0-9]{6})$
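To see what the update changes in practice, the sketch below compares the original expression from step 5 with the updated one on a few numbers. The class and method names are made up for this example, and the inner groups are written as non-capturing in both patterns.

```java
import java.util.regex.Pattern;

final class UpdatedSafaricomValidator {
    static final Pattern ORIGINAL = Pattern.compile(
            "^(?:254|\\+254|0)?(7(?:(?:[12][0-9])|(?:0[0-8])|(?:9[0-2]))[0-9]{6})$");
    static final Pattern UPDATED = Pattern.compile(
            "^(?:254|\\+254|0)?(7(?:(?:[129][0-9])|(?:0[0-8])|(?:4[0-1]))[0-9]{6})$");

    static boolean matchesOriginal(String input) {
        return ORIGINAL.matcher(input).matches();
    }

    static boolean matchesUpdated(String input) {
        return UPDATED.matcher(input).matches();
    }

    public static void main(String[] args) {
        // 740, 741 and 795 are only accepted by the updated expression; 742 by neither.
        String[] samples = {"0740123456", "0741123456", "0742123456", "0795123456", "0712345678"};
        for (String sample : samples) {
            System.out.println(sample
                    + " original=" + matchesOriginal(sample)
                    + " updated=" + matchesUpdated(sample));
        }
    }
}
```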

Explanation (Safaricom)

[Image: breakdown of the Safaricom expression, from regex101.com]