Data cleaning against reference datasets

Data can often be checked by seeing whether it exists in another database. For example, countries, cities, and streets are all well defined, as are car registration plates and popular products. However, when people enter information into online forms, or when data from two different databases is combined, differing practices frequently lead to different spellings, so that values which should be identical appear to be different.

At Spotless, we correct these problems and ensure data integrity by validating all of your data against a reference dataset. Every value is checked to see whether it is present in the reference dataset. If it isn’t, it can be set to the closest match (to correct for typos) or added to the reference dataset (if it’s genuinely missing).
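To make the idea concrete, here is a minimal sketch of checking values against a reference dataset. It is not Spotless's own implementation: it uses Python's standard `difflib.get_close_matches` as a stand-in for the matching step, and the function name `clean_against_reference`, the `cutoff` of 0.8, and the `add_missing` flag are all assumptions made for illustration.

```python
from difflib import get_close_matches


def clean_against_reference(values, reference, cutoff=0.8, add_missing=False):
    """Check each value against a reference set (illustrative sketch only).

    - A value already in the reference set is kept as-is.
    - Otherwise it is replaced by its closest reference entry, if one is
      at least `cutoff` similar (difflib's similarity ratio, 0 to 1).
    - If nothing is close enough, the value is either added to the
      reference set (add_missing=True) or reported as None.
    """
    known = set(reference)  # copy, so the caller's data is not modified
    cleaned = []
    for value in values:
        if value in known:
            cleaned.append(value)
            continue
        # Sort the candidates so results are deterministic.
        matches = get_close_matches(value, sorted(known), n=1, cutoff=cutoff)
        if matches:
            cleaned.append(matches[0])   # treat as a typo, use closest match
        elif add_missing:
            known.add(value)             # treat as a new, valid entry
            cleaned.append(value)
        else:
            cleaned.append(None)         # flag as unmatched
    return cleaned, known


if __name__ == "__main__":
    countries = ["France", "Germany", "Spain"]
    entered = ["France", "Germny", "Spainn", "Atlantis"]
    cleaned, _ = clean_against_reference(entered, countries)
    print(cleaned)
```

In practice the similarity threshold needs tuning for each kind of data: too low and distinct values get merged, too high and obvious typos slip through.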

You can also create your own reference set by uploading your own reference dataset. Remember to share it with the Spotless community!

Data cleaning has never been easier thanks to Spotless Data's machine learning filters. See our FAQ for more detail, and check out our range of subscription packages and pricing.

If data quality is an issue for you, or you have known sources of dirty data but your files are too big and the problems too numerous to fix manually, please log in and try Spotless now.