Spotless provides a web service that transforms dirty data into clean, high-quality data through a combination of automatic and manual data cleansing.
Spotless runs extensive validation, including matching against a known reference set, full regular-expression validation, duplicate removal, session rules that cleanse gaps and overlaps, and format validation for any data that follows a defined format. Spotless also checks that CSV and TSV files are well formed and UTF-8 encoded.
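As an illustration of the file-level check described above, here is a minimal Python sketch you could run locally before submitting a file. It is not Spotless’s implementation. The function name `check_delimited_file` is hypothetical, and the rule that every row must have the same number of fields as the header is an assumption made for this example.

```python
import csv
import io


def check_delimited_file(raw: bytes, delimiter: str = ",") -> list[str]:
    """Return problems found in a CSV (delimiter=",") or TSV (delimiter="\t")
    file; an empty list means no problems were found.

    Hypothetical pre-submission check: it only verifies that the bytes decode
    as UTF-8 and that every row (including blank lines, which parse as zero
    fields) has the same number of fields as the header row.
    """
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return [f"not valid UTF-8 at byte {exc.start}"]

    problems = []
    # strict=True makes the csv module raise csv.Error on malformed quoting.
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter, strict=True)
    expected = None
    try:
        for row in reader:
            if expected is None:
                expected = len(row)  # the header row sets the expected width
            elif len(row) != expected:
                problems.append(
                    f"line {reader.line_num}: expected {expected} fields, got {len(row)}"
                )
    except csv.Error as exc:
        problems.append(f"line {reader.line_num}: {exc}")
    return problems
```

For example, `check_delimited_file(b"name,country\nAnn\n")` reports that line 2 has one field instead of two.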
These checks can be used to verify that fields are well formed, to match reference fields between two database sources, and to clean manually entered town or country names or any other manually entered data.
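To make reference matching concrete, here is a minimal Python sketch that checks manually entered country names against a reference set. It accepts differences in case and surrounding whitespace and sets everything else aside for review. It is only an illustration under those assumptions. The function `match_to_reference` and its matching rule are hypothetical and do not describe Spotless’s own algorithms.

```python
def match_to_reference(values, reference):
    """Split manually entered values into matched and unmatched lists.

    Hypothetical rule: a value matches if it equals a reference entry after
    trimming whitespace and ignoring case. Matched values are replaced by
    the canonical spelling from the reference set.
    """
    # Map each case-folded, trimmed key to its canonical reference spelling.
    canonical = {ref.strip().casefold(): ref for ref in reference}
    matched, unmatched = [], []
    for value in values:
        key = value.strip().casefold()
        if key in canonical:
            matched.append(canonical[key])
        else:
            unmatched.append(value)  # candidate for manual review
    return matched, unmatched
```

With the reference set `["Ireland", "Singapore", "United States"]`, the inputs `" ireland"` and `"SINGAPORE"` are corrected to their canonical spellings, while a variant such as `"Irlande"` is left unmatched for manual review.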
In the future we will support other types of cleansing, including grammar checking and multi-field data matching such as full address checking.
Check out the existing questions on Stack Overflow here. If you have a dataset that needs cleaning and we don’t support it, or you can’t find the answer to your question, post a new question tagged spotless-data and we’ll pick it up and reply.
We’re passionate about data quality. Every job is submitted with an escalation email address. If there are problems with your data that we can’t resolve on our own (perhaps because your reference dataset is inaccurate), we’ll contact that address directly to resolve them.
Each new user who signs up to Spotless gets 100 MB of data to trial the system. Once this trial allowance is used up, you need to take out a subscription. See our pricing here.
Our army of data scientists has developed unique algorithms that cleanse data and ensure trustworthy data quality more effectively than any other algorithms. We also have a team of data scrubbers who can check the automatic cleansing and provide a full workflow to review and approve any data.
How do you help customers with complex data sets?
The Spotless Data Consulting and Professional Services team can help by providing a proof of concept. Terms apply.
You always get a cleaned dataset back immediately. If you’ve selected manual review, your dataset won’t be marked as complete, and you can keep calling the API for updates until the manual cleansing has finished.
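The polling pattern described above might look like the following Python sketch. Spotless’s actual API endpoints and response fields are not documented here, so the status call is passed in as a function. The `complete` flag, the `dataset` key, and the `wait_for_manual_review` name are hypothetical placeholders.

```python
import time
from typing import Callable


def wait_for_manual_review(
    fetch_status: Callable[[str], dict],
    job_id: str,
    interval_seconds: float = 60.0,
    max_attempts: int = 100,
) -> dict:
    """Poll a job until it is marked complete and return the final status.

    `fetch_status` is a hypothetical stand-in for whatever API call returns
    the job's current state. It is assumed to return a dict with a boolean
    "complete" flag and the latest cleaned data under "dataset".
    """
    for attempt in range(max_attempts):
        status = fetch_status(job_id)
        if status.get("complete"):
            return status
        # The cleaned data is already usable; manual cleansing is still running.
        if attempt < max_attempts - 1:
            time.sleep(interval_seconds)
    raise TimeoutError(f"job {job_id} not complete after {max_attempts} checks")
```

A fixed polling interval keeps the sketch simple. A long-running manual review might justify a longer interval or exponential backoff instead.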
How long manual cleansing takes depends on how difficult the dataset is to clean. For example, if you’re matching two database tables and there are missing foreign keys in the clean data table, we’ll escalate the issue to you before we mark the job as complete.
You can submit up to 1,000,000 records in a single job, or 500,000 records for a single reference job. If you need to process more than that, please split the records across multiple jobs. We’ll also reject jobs with a high number of dirty records, as these are more likely to be caused by a configuration issue on your side than by genuine data problems!
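Splitting a large record set across jobs can be done with a simple batching helper. The Python sketch below uses the limits stated above: 1,000,000 records per job and 500,000 per reference job. The `split_into_jobs` name is hypothetical. How each batch is submitted depends on your integration, so the sketch leaves that step out.

```python
from itertools import islice
from typing import Iterable, Iterator

MAX_RECORDS_PER_JOB = 1_000_000
MAX_RECORDS_PER_REFERENCE_JOB = 500_000


def split_into_jobs(records: Iterable, reference_job: bool = False) -> Iterator[list]:
    """Yield successive batches of records, each within the per-job limit."""
    limit = MAX_RECORDS_PER_REFERENCE_JOB if reference_job else MAX_RECORDS_PER_JOB
    it = iter(records)
    while True:
        # islice pulls at most `limit` records without loading the rest.
        batch = list(islice(it, limit))
        if not batch:
            return
        yield batch
```

For example, 2,500,000 records split into three jobs of 1,000,000, 1,000,000 and 500,000 records.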
Data is processed in Ireland, in the EU. If you want processing in Singapore or the US, please contact us on Twitter.
Spotless Data does not retain any customer data once the files have been cleansed and the reports provided; all customer data is then removed from our systems.