Data Cleaning

As well as identifying data quality issues, Spotless can clean dirty data by replacing invalid fields with corrected data.

Supported rules are:

  • Fixing dates and times to a specific format
  • Correcting numbers to a given number of decimal places
  • Restoring unique foreign key references that have been truncated or corrupted
  • Adding missing data using our machine learning lookalike model
  • Correcting common spelling mistakes with a lookup rule
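
As a rough illustration of what three of these rules do (date formatting, decimal places, and a spelling lookup), here is a minimal Python sketch. It is not Spotless's API: the function names, the list of accepted input date formats, and the lookup table are all hypothetical and chosen only for the example.

```python
from datetime import datetime
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical rule settings; illustrative only, not Spotless's API.
DATE_INPUT_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%b %d, %Y"]
SPELLING_LOOKUP = {"untied kingdom": "United Kingdom", "germny": "Germany"}


def fix_date(value, target_format="%Y-%m-%d"):
    """Re-emit a date in target_format if it matches a known input format."""
    for fmt in DATE_INPUT_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).strftime(target_format)
        except ValueError:
            continue
    return None  # unparseable: leave for another rule or a human


def fix_decimal(value, places=2):
    """Round a numeric string to a fixed number of decimal places."""
    quantum = Decimal(1).scaleb(-places)  # places=2 gives a quantum of 0.01
    return str(Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP))


def fix_spelling(value):
    """Replace a known misspelling from the lookup table; otherwise keep the value."""
    return SPELLING_LOOKUP.get(value.strip().lower(), value)


row = {"joined": "02/09/2016", "price": "19.995", "country": "Germny"}
cleaned = {
    "joined": fix_date(row["joined"]),
    "price": fix_decimal(row["price"]),
    "country": fix_spelling(row["country"]),
}
```

In this sketch, a value that no rule can repair is returned as `None` (dates) or unchanged (spellings) rather than guessed at, so a later rule or a person can deal with it.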

    Blog posts about Data Cleaning

    Sept. 2, 2016, 11:45 a.m.
    Spotless Data's machine learning filters will help your business win the race to the top. We've just released version 3 of Spotless with two key new features: Firstly, deduplication solutions let you identify data that's been duplicated and delete it from the platform. You can check against individual columns, all columns, or all but one column and get the row deleted automati...
    March 18, 2016, 6:01 a.m.
    Spotless Data's machine learning filters can help make Tableau work perfectly every time. Tableau has fantastic mapping functionality, but how often have you uploaded your geographic data only to find a large number of “nulls” on the map? At Spotless we have set up a number of solutions specifically to transform your geographic data into Tableau cities, co...
    April 19, 2016, 5:37 a.m.
    Cleaning dirty addresses is one of the many tasks that Spotless Data's machine learning filters can handle seamlessly. Address databases are often badly formatted, with duplicate addresses and different spellings, sometimes because of bad data entry and sometimes because customers deliberately enter their address incorrectly to take advantage of a new customer offer. At Spotless we provide...
    April 19, 2016, 5:36 a.m.
    Data can frequently be checked by seeing if it exists in another database. For example, countries, cities, and streets are all well defined, as are car registration plates and popular products. However, when people enter information into online forms, or when data from two different databases is combined, different practices can frequently lead to different spellings and data which is a...
    March 31, 2016, 8:59 a.m.
    One of the most common problems with free-form data entry is that the data is not submitted in a standard form. This makes it hard to identify duplicated records and even harder to integrate data from a number of different sources to ensure data integrity. For example, email addresses should always be in the form xxxx@yyy.zz and telephone numbers in the US should always have 10 digits. If yo...
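
The two format rules mentioned in the last excerpt above (an email address shaped like xxxx@yyy.zz and a 10-digit US telephone number) can be sketched in Python as follows. This is an illustrative check under simplifying assumptions, not Spotless's implementation: the function names are hypothetical, and the email pattern is deliberately loose and does not cover the full range of valid addresses.

```python
import re

# Deliberately simple pattern for the "xxxx@yyy.zz" shape; real email syntax is broader.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$")


def is_valid_email(value):
    """Return True if the value looks like local@domain.tld."""
    return EMAIL_PATTERN.match(value.strip()) is not None


def normalize_us_phone(value):
    """Return the 10 digits of a US number, or None if the digit count is wrong."""
    digits = re.sub(r"\D", "", value)  # strip spaces, brackets, dashes, plus signs
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the leading country code
    return digits if len(digits) == 10 else None
```

Reducing every phone number to the same bare 10-digit form is what makes duplicates such as "(415) 555-0132" and "+1 415-555-0132" comparable.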