One of the most common problems with free-form data entry is that the data is not submitted in a standard form. This makes it hard to identify duplicate records and even harder to integrate data from several different sources.
For example, email addresses should always be in the form firstname.lastname@example.org, and US telephone numbers should always have 10 digits. If you’re matching two datasets and one has +1 202-456-1111 while the other has (202) 456 1111, you wouldn’t know that they are the same phone number unless you pre-process them first.
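As a rough illustration of that pre-processing step, the sketch below reduces both phone numbers to the same 10-digit string so they can be compared directly. The function name `normalize_us_phone` is hypothetical and this is not Spotless's actual rule; it simply assumes that a US number is 10 digits, optionally preceded by the country code 1.

```python
import re

def normalize_us_phone(raw):
    """Return the 10-digit national number, or None if the input does not fit."""
    # Drop everything that is not a digit: spaces, dashes, brackets, the plus sign.
    digits = re.sub(r"\D", "", raw)
    # Assumption: an 11-digit number starting with 1 carries the US country code.
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits if len(digits) == 10 else None

first = normalize_us_phone("+1 202-456-1111")
second = normalize_us_phone("(202) 456 1111")
print(first == second)  # the two records can now be matched
```

Returning None for anything that does not reduce to 10 digits keeps malformed values visible rather than silently matching them.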
At Spotless, we use regular expressions to validate that a particular record is in the right format. If it isn’t, we automatically cleanse the data to put it into the right format.
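A minimal sketch of this validate-then-cleanse pattern, using email addresses, might look like the following. The pattern and the function name `validate_or_cleanse` are illustrative assumptions, not the rules Spotless actually ships, and the pattern is deliberately simple rather than a full email grammar.

```python
import re

# Assumed, simplified pattern: lowercase local part, "@", domain, dot, top-level domain.
EMAIL_PATTERN = re.compile(r"[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}")

def validate_or_cleanse(value):
    """Return the value in standard form, or None if it cannot be repaired."""
    # Step 1: validate. A record already in the right format passes through unchanged.
    if EMAIL_PATTERN.fullmatch(value):
        return value
    # Step 2: cleanse. Remove whitespace and lowercase, then validate again.
    cleaned = re.sub(r"\s+", "", value).lower()
    return cleaned if EMAIL_PATTERN.fullmatch(cleaned) else None

print(validate_or_cleanse(" Jane.Doe@Example.ORG "))
print(validate_or_cleanse("not an email"))
```

Validating again after cleansing means a value is only accepted once it really matches the target format. Anything that still fails is returned as None so it can be reviewed.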
Some of the most popular standard rules use regular expressions:
You can also create your own regular expression rules and share them with the Spotless community. For a great tutorial on regular expressions, try this link:
If you've any questions about Spotless Data or would like to speak with our team, please email email@example.com.