Unlocking the Power of RegEx in XAMN Pro
^\w+([.-]?\w+)*@\w+([.-]?\w+)*(\.\w{2,3})+$

Author: Mikael Ibanez, Mar 2025

RegEx in XAMN Pro for investigators!

If you are anything like me, you are probably wondering what this is, what it means and how it can be used in Mobile Forensics.
There are limitless uses of RegEx in both forensics and software development. This paper will focus on the Mobile Forensics side of it, and specifically on extractions viewed in XAMN Pro from MSAB.

What is RegEx?

RegEx is short for Regular Expressions (of course, this has been debated over the years, but for this blog post, that is what I refer to).
– It is a text string search.
You can use it to find text strings, or parts thereof, that match a certain pattern in a body of text. It is based on the mathematical concept of regular sets (regular languages). Basically, it is a sequence of characters that defines a search pattern. You can use digits, letters and special characters in different combinations.

RegEx in XAMN Pro

There are two ways to use RegEx in XAMN Pro, but I will only cover the Easy/Basic way in this paper, since this is where most examiners and investigators will be working. The “Advanced” way is from our “Hex view”, and if you know and understand Hex, you should already be familiar with how RegEx works and how to use it.

The advanced way in XAMN Pro can be found in our Hex view under “Find”.

The easy way can be found under the “Smart processing” tab in XAMN Pro, where one of the options is “(.*) Pattern analysis”.

From there you will just link to your RegEx and run it.

XAMN Pro implements Boost.Regex rather than the .NET regex engine. Boost supports several regex dialects; XAMN uses the Perl (PCRE) dialect and supports Unicode.

Keep in mind that if your RegEx finds some data, then the search is valid.

There are many uses and possibilities for RegEx in Mobile Forensics.

  • License plates
  • MAC addresses
  • URLs
  • Email addresses
  • Credit Card numbers
  • Social Security numbers
  • Telephone numbers
  • Hashes
  • Geo data

It is your imagination that sets the limit. If the data you are looking for has any form of pattern in it, RegEx can be the solution.


The basics of RegEx

Special characters to help you in your search strings.

[ ]         Matches any one of the letters or digits within the brackets; order does not matter
( )         Character group, matches the characters in the exact order written.
(sho) = matches “sho” in Seashore
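To see the difference between [ ] and ( ) in practice, the sketch below runs both against the word Seashore. It uses Python's `re` module purely as a convenient way to try patterns outside XAMN; this is an assumption for illustration, since XAMN uses Boost.Regex and edge cases may differ.

```python
import re

text = "Seashore"

# [ ] matches ONE character from the set; the order inside the brackets does not matter.
# Matching is case-sensitive, so the capital "S" is not a hit.
set_hits = re.findall(r"[ohs]", text)

# ( ) matches the characters together, in the exact order written.
group_hits = re.findall(r"(sho)", text)
wrong_order = re.findall(r"(ohs)", text)  # same letters, different order

print(set_hits)     # ['s', 'h', 'o']
print(group_hits)   # ['sho']
print(wrong_order)  # []
```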

|           Alternation, allows for alternate matches. “|” operates like an “OR”.
dat(abc|123) = databc OR dat123
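A quick check of the alternation example, again using Python's `re` module as a stand-in for XAMN's Boost engine (an assumption for illustration only). `fullmatch` is used so that the whole sample string has to fit the pattern:

```python
import re

# Alternation: "dat" followed by EITHER "abc" OR "123"
pattern = re.compile(r"dat(abc|123)")

results = {s: bool(pattern.fullmatch(s)) for s in ["databc", "dat123", "datxyz"]}
print(results)  # {'databc': True, 'dat123': True, 'datxyz': False}
```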

?                    Question mark matches when the character before the “?” occurs 0 or 1 time only, making that character optional.
colou?r = colour (u is found 1 time)
colou?r = color (u is found 0 times)
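The same rule, tried with Python's `re` module (used here only for illustration; XAMN's Boost engine may differ in edge cases). Note that the ? applies to the u right before it, and a second u is not allowed:

```python
import re

# "u?" means: the u right before the ? may occur 0 or 1 time
pattern = re.compile(r"colou?r")

results = {w: bool(pattern.fullmatch(w)) for w in ["colour", "color", "colouur"]}
print(results)  # {'colour': True, 'color': True, 'colouur': False}
```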

*                    Asterisk matches when the character before the “*” occurs 0 or more times.
abuse* = abusee (e is found 2 times)
abuse* = abuse (e is found 1 time)
abuse* = abus (e is found 0 times)
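A sketch of the asterisk examples in Python's `re` module (a stand-in chosen for illustration, not XAMN's engine). Only the e is optional; the rest of the word must still be there, which is why "abu" fails:

```python
import re

# "e*" means: the e right before the * may occur 0 or more times
pattern = re.compile(r"abuse*")

results = {w: bool(pattern.fullmatch(w)) for w in ["abusee", "abuse", "abus", "abu"]}
print(results)  # {'abusee': True, 'abuse': True, 'abus': True, 'abu': False}
```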

+                   Plus sign matches when the character before the “+” occurs 1 or more times. The + sign makes the character mandatory: it must occur at least once.
license+ = licensee (e is found 2 times)
license+ = license (e is found 1 time)
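The plus sign in Python's `re` module (used only to illustrate; XAMN uses Boost.Regex). Unlike *, the + rejects the word when the e is missing:

```python
import re

# "e+" means: the e right before the + must occur at least once
pattern = re.compile(r"license+")

results = {w: bool(pattern.fullmatch(w)) for w in ["licensee", "license", "licens"]}
print(results)  # {'licensee': True, 'license': True, 'licens': False}
```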

. (period)        The “.” matches any single character: a letter, a digit or a symbol.
ton. = tone
ton. = ton#
ton. = ton4
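The period stands for exactly one character, no more and no less. The sketch below checks this with Python's `re` module, an illustrative stand-in for XAMN's Boost engine:

```python
import re

# "." stands for exactly one character, whatever it is
pattern = re.compile(r"ton.")

results = {w: bool(pattern.fullmatch(w)) for w in ["tone", "ton#", "ton4", "ton", "tones"]}
print(results)
# {'tone': True, 'ton#': True, 'ton4': True, 'ton': False, 'tones': False}
```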

.*                   Combine the metacharacters . and *, in that order (.*), to match any character 0 or more times, like a “wildcard” search.
ar.* = ar
ar.* = artist
ar.* = armour
ar.* = arrest
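The wildcard behaves a little differently on single words than in running text. The sketch below uses Python's `re` module (an assumption for illustration; XAMN's Boost engine may differ in edge cases) to show both:

```python
import re

pattern = re.compile(r"ar.*")

# On single words, ".*" accepts any (or no) remainder after "ar"
words = ["ar", "artist", "armour", "arrest", "cart"]
results = {w: bool(pattern.fullmatch(w)) for w in words}
print(results)
# {'ar': True, 'artist': True, 'armour': True, 'arrest': True, 'cart': False}

# In running text, ".*" is greedy: it keeps consuming to the end of the line
line = "the artist was under arrest"
print(pattern.search(line).group())  # artist was under arrest
```

"cart" only fails because `fullmatch` requires the whole word to start with "ar"; a plain search would still find "art" inside it. The greedy result on the sentence is also worth keeping in mind: a bare .* can return much more text than intended.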

 

A set is a group of characters inside a pair of square brackets [] with a special meaning:
Set Description
[arn] Returns a match where one of the specified characters (a, r, or n) is present
[a-n] Returns a match for any lower-case character, alphabetically between a and n
[^arn] Returns a match for any character EXCEPT a, r, and n
[0123] Returns a match where any of the specified digits (0, 1, 2, or 3) are present
[0-9] Returns a match for any digit between 0 and 9
[0-5][0-9] Returns a match for any two-digit number from 00 to 59
[a-zA-Z] Returns a match for any character alphabetically between a and z, lower case OR upper case
[+] In sets, +, *, ., |, (), $ and {} have no special meaning, so [+] means: return a match for any + character in the string
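A few rows from the table above, tried with Python's `re` module (used only for illustration, not XAMN's engine) on short made-up strings. `findall` returns every separate hit:

```python
import re

# [a-n]: any single lower-case letter from a to n
lower_a_to_n = re.findall(r"[a-n]", "xamn")        # ['a', 'm', 'n']

# [^arn]: any single character EXCEPT a, r and n (the space counts too)
not_arn = re.findall(r"[^arn]", "ran to")          # [' ', 't', 'o']

# [0-5][0-9]: two digits from 00 to 59, e.g. minutes; "60" is not a hit
minutes = re.findall(r"[0-5][0-9]", "07 59 60")    # ['07', '59']

# [+]: inside a set, + is just a literal plus sign
plus = re.findall(r"[+]", "+46 70")                # ['+']

print(lower_a_to_n, not_arn, minutes, plus)
```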

 

Example 1
How to use RegEx to find an email address.

How is an email address usually constructed?
aaa.bbb@ccc.dd
aaa = Could be First name, username or similar
bbb = Could be Last name or similar
@ = Divider for username and domain name
ccc = Domain name
dd = Top-Level Domain (TLD), also known as a domain extension. (.com / .se / .travel)

Since an email address can contain a varying number of characters both before and after the “@”, as well as in the TLD, we need to account for all of that in our RegEx search. The TLD of an email address usually contains 2-6 letters (.com / .se / .travel and so on), so it makes sense to limit the length of that part of the search. The industry standard limit for each label of a domain name (for example “example” in example.com) is 63 characters.

First, we need to specify that it contains letters between “A-Z” / “a-z” and/or digits between “0-9”. Even though you may use special characters, as well as spaces and some other things, they are not used very often, so let us focus on the most common ones: “A-Z” or “a-z” (the search may be case sensitive, and using the format “A-Za-z” eliminates that potential problem) and the digits “0-9”:
[A-Za-z0-9]
But the first part of the email address can also contain a “.” or special characters like “_ % + -”. So we must take those into consideration as well:
[A-Za-z0-9._%+-]
Then we add a “+” (one or more of these characters) followed by the “@” sign:
[A-Za-z0-9._%+-]+@
Then we start the domain name, which in this example may contain only letters and digits:
[A-Za-z0-9._%+-]+@[A-Za-z0-9]
Then a “+” (one or more domain characters), followed by the “.” (“\.” matches a single, literal “.”) and the TLD:
[A-Za-z0-9._%+-]+@[A-Za-z0-9]+\.[A-Za-z]
If you want to define the TLD and minimize false positives, you can limit the number of letters it may contain; in this example I will use 1-6 characters:
[A-Za-z0-9._%+-]+@[A-Za-z0-9]+\.[A-Za-z]{1,6}

Final result: [A-Za-z0-9._%+-]+@[A-Za-z0-9]+\.[A-Za-z]{1,6}
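To try the final email pattern before running it on an extraction, the sketch below applies it to a made-up line of text with Python's `re` module. Python is an assumption for illustration; XAMN uses Boost.Regex, so edge cases may differ. The second pattern is a hypothetical variant, not something tested in XAMN.

```python
import re

# The final email pattern from this example, unchanged
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9]+\.[A-Za-z]{1,6}")

text = "Mail anna.svensson@example.se or bob_99@mail.example.com, not @home."
hits = EMAIL.findall(text)
print(hits)  # ['anna.svensson@example.se', 'bob_99@mail.exampl']

# Hypothetical variant: also allow "." and "-" in the domain part, so
# subdomains such as mail.example.com are kept whole
EMAIL_SUBDOMAINS = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{1,6}")
print(EMAIL_SUBDOMAINS.findall(text))
# ['anna.svensson@example.se', 'bob_99@mail.example.com']
```

The second hit of the original pattern shows a limitation worth knowing: because the domain part [A-Za-z0-9] does not allow dots, an address with a subdomain is cut short at the six-letter TLD limit. "@home" is not a hit because the part before the “@” requires at least one character.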

This will most likely generate a high number of “hits” in XAMN Pro, since it searches all categories unless you have applied other filters; how many depends on what and where you are looking. By adding a filter for “Messages”, “Databases” and/or “Documents”, you will narrow down your search results. This goes for all your searches in XAMN Pro: the more filters you use, the more precise the result.

Example 2
How to use RegEx to find a Swedish License Plate.

First, you need to define how a Swedish License Plate is “constructed”:
AAA 111 or AAA 11A
Where “A” represents letters and “1” represents digits.
So it is very basic.
There are custom license plates as well, but they are still quite unusual in Sweden, which makes this example much easier, since I just skip them.

A plate can be written with or without a space in between: “AAA 111” or “AAA111”.

We use the same start as above with the email address. We start with letters for the “AAA”:
[A-Za-z]
Then we want to specify that it has to be 3 letters, A-Z upper case or a-z lower case:
[A-Za-z]{3}
Then we have the space or no space ([\s-]? matches an optional space or hyphen):
[A-Za-z]{3}[\s-]?
Then the last part: 2 digits followed by 1 digit or 1 letter:
[A-Za-z]{3}[\s-]?[0-9]{2}[A-Za-z0-9]{1}
Where:
[0-9]{2} = 2 digits 0-9
[A-Za-z0-9]{1} = 1 letter A-Z / a-z or 1 digit 0-9

Final result: [A-Za-z]{3}[\s-]?[0-9]{2}[A-Za-z0-9]{1}
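The sketch below runs the final plate pattern over a made-up sentence using Python's `re` module (an assumption for illustration; XAMN uses Boost.Regex). It also shows a hypothetical variant with \b word boundaries, which is a suggestion only and has not been tested in XAMN:

```python
import re

# The final Swedish license plate pattern from this example, unchanged
PLATE = re.compile(r"[A-Za-z]{3}[\s-]?[0-9]{2}[A-Za-z0-9]{1}")

text = "Saw ABC 123 and XYZ12A near DEF-456. Order no ABCD12345."
print(PLATE.findall(text))  # ['ABC 123', 'XYZ12A', 'DEF-456', 'BCD123']

# Hypothetical variant: \b (word boundary) at both ends stops hits inside longer strings
PLATE_BOUNDED = re.compile(r"\b[A-Za-z]{3}[\s-]?[0-9]{2}[A-Za-z0-9]\b")
print(PLATE_BOUNDED.findall(text))  # ['ABC 123', 'XYZ12A', 'DEF-456']
```

The last hit of the original pattern, "BCD123", is a false positive found inside the longer string ABCD12345; the word boundaries remove it while keeping the three plate-like hits.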

Easy, right?

Last example – Universal Tracking number

The universal tracking number for parcels shipped with national postal services has a set pattern:

ab123456789cd
ab – 2 letters
123456789 – 9 digits
cd – 2 letters

If you are looking for UPS (example after this one), DHL, USPS or some other postal company’s tracking numbers, they all have their own patterns. All of them are easily Googleable!

So, let's stick with the “standard” for this example.

Start with 2 letters, “A-Z” or “a-z” (the search may be case sensitive, and using the same format as in the email address search, “A-Za-z”, eliminates that potential problem):
[a-zA-Z]
To specify that it should be exactly 2 letters, we add {2} to the end:
[a-zA-Z]{2}
Then we need digits “0-9”, and 9 of them:
[a-zA-Z]{2}[0-9]{9}
and finish it off with 2 letters at the end:
[a-zA-Z]{2}[0-9]{9}[a-zA-Z]{2}

Final result Universal tracking number: [a-zA-Z]{2}[0-9]{9}[a-zA-Z]{2}
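A quick test of the tracking number pattern with Python's `re` module (used only for illustration; XAMN's Boost engine may differ in edge cases). The sample strings are made up; the second one has only 8 digits and should not be a hit:

```python
import re

# The final pattern for the universal tracking number layout, unchanged
TRACKING = re.compile(r"[a-zA-Z]{2}[0-9]{9}[a-zA-Z]{2}")

# Made-up sample strings: the first fits the layout, the second has only 8 digits
text = "Parcel RR123456785SE shipped; ref AB12345678CD is too short."
print(TRACKING.findall(text))  # ['RR123456785SE']

# The lower-case layout from the example above also fits
print(bool(TRACKING.fullmatch("ab123456789cd")))  # True
```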

 

Bonus – This is a search for UPS tracking:
[0-9]{1}[a-zA-Z]{1}[a-zA-Z0-9]{6}[a-zA-Z0-9]{2}[0-9]{8}

Let us break it down:
[0-9]{1} = 1 digit
[a-zA-Z]{1} = 1 letter
[a-zA-Z0-9]{6} = 6 letters or digits
[a-zA-Z0-9]{2} = 2 letters or digits
[0-9]{8} = 8 digits
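The breakdown above maps one-to-one onto code. The sketch below (Python's `re` module, chosen for illustration only, not XAMN's engine) builds the UPS pattern from the five parts and checks two sample strings, one of them deliberately a digit short:

```python
import re

# Build the UPS pattern from the five parts listed above
parts = [
    r"[0-9]{1}",        # 1 digit
    r"[a-zA-Z]{1}",     # 1 letter
    r"[a-zA-Z0-9]{6}",  # 6 letters or digits
    r"[a-zA-Z0-9]{2}",  # 2 letters or digits
    r"[0-9]{8}",        # 8 digits
]
UPS = re.compile("".join(parts))

print(bool(UPS.fullmatch("1Z999AA10123456784")))  # True  (18 characters)
print(bool(UPS.fullmatch("1Z999AA1012345678")))   # False (one digit short)
```

The {1} quantifiers are not strictly needed, since a set such as [0-9] on its own already matches exactly one character; they are kept here so the pattern reads the same as the breakdown.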

Again, to minimize false positives, limit your search to “Messages” and “Notes” in XAMN.

 

About the Author: 

Mikael Ibanez

Mikael has been working for MSAB for over 16 years in various roles, from Technical Support and Technical Trainer to Product Manager and Product Specialist. During his years at MSAB, he has gained extensive knowledge about the Mobile Forensic industry, mobile phones and the products that MSAB produces. Mikael is also in charge of purchasing phones, tablets and all other kinds of gadgets that XRY supports. Between 500 and 1,000 devices are bought every year.