Sign up for an account here, integrate the API into your workflow, and get cleaning.
Spotless provides a web service where dirty data is transformed into clean data by a combination of automatic and manual data cleansing.
Spotless runs extensive validation including matching against a known reference set, full regular expression validation, and validation that CSV files are well formed and UTF-8 encoded.
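As an illustration of the last two checks, a file can be pre-checked locally before it is submitted. The sketch below is not the Spotless implementation; `check_csv` is a hypothetical helper that uses only the Python standard library, and it assumes every row should have the same number of fields as the first row.

```python
import csv
import io


def check_csv(raw: bytes) -> list[str]:
    """Return a list of problems found; an empty list means the file passed.

    Hypothetical local pre-check, not part of the Spotless API.
    """
    try:
        # Strict decoding raises on any byte sequence that is not valid UTF-8.
        text = raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return [f"not valid UTF-8 at byte {exc.start}"]

    problems: list[str] = []
    expected = None
    # strict=True makes the parser raise csv.Error on malformed quoting.
    reader = csv.reader(io.StringIO(text, newline=""), strict=True)
    try:
        for row in reader:
            if expected is None:
                expected = len(row)  # assumption: first row sets the width
            elif len(row) != expected:
                problems.append(
                    f"line {reader.line_num}: expected {expected} fields, "
                    f"found {len(row)}"
                )
    except csv.Error as exc:
        problems.append(f"line {reader.line_num}: malformed CSV ({exc})")
    return problems
```

Calling `check_csv(open(path, "rb").read())` on a candidate file and fixing anything it reports should reduce the chance of a job being rejected for formatting reasons alone.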
This can be used to check that fields are well formed, to match database reference fields between two data sources, and to clean manually entered town or country names or any other manually entered data.
In the future we will support other types of cleansing, including grammar checking and multi-field data matching such as full address checking. If you have a dataset that needs to be cleaned in a way we don’t support, please start a new thread here.
We’re passionate about clean data. Every job is submitted together with an escalation email address and if there are problems with your data that we can’t resolve on our own (perhaps because your reference dataset is inaccurate) we’ll contact your escalation email address directly to resolve the problem.
Each new user who signs up to Spotless gets 100 MB to trial the system. Once that allowance is used up, you need to take out a subscription.
Check out questions on Stack Overflow here. If you can’t see the answer to your question, just post a new one tagged with spotless-data and we’ll pick it up and reply.
Our army of data scientists has created unique algorithms to cleanse data more effectively than any other algorithms. We also have a team of data scrubbers who can check the automatic cleansing and provide a full workflow to check and approve any data.
You always get a clean dataset back immediately. If you’ve selected manual review, your dataset won’t be marked as complete straight away, and you can continue to call the API for updates until the manual cleansing is fully complete.
The time this takes depends on how difficult the dataset is to clean. For example, if you’re matching two database tables together and there are missing foreign keys in the clean data table, we’ll kick off an escalation back to you before we mark the job as completed.
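Calling the API repeatedly until a manually reviewed job is complete is a polling loop, which might be sketched as follows. This is an assumption-heavy illustration: `wait_until_complete`, the `fetch_status` callable, and the `"complete"` key are all hypothetical stand-ins, since the actual endpoint and response format are not described here.

```python
import time
from typing import Callable


def wait_until_complete(
    fetch_status: Callable[[], dict],
    interval_seconds: float = 60.0,
    max_polls: int = 100,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job reports completion, then return the last status.

    fetch_status is a caller-supplied function that asks the API for the
    job's current state. The "complete" key is an assumed field name, not
    a documented part of the Spotless response.
    """
    for attempt in range(max_polls):
        status = fetch_status()
        if status.get("complete"):
            return status
        # Wait before the next poll, but not after the final attempt.
        if attempt < max_polls - 1:
            sleep(interval_seconds)
    raise TimeoutError(f"job not complete after {max_polls} polls")
```

Because manual review can involve an escalation by email, a generous interval and an upper bound on the number of polls are probably more appropriate than a tight loop.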
You can submit up to 1,000,000 records in a single job, or 500,000 in a single reference job. If you need more than that, please split your data into multiple jobs. We also reject jobs with a high number of dirty records, as these are more likely to be due to a configuration issue on your side than to real data issues.
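Splitting a large dataset to stay within these limits can be done on the client side before submission. The following is a minimal sketch; `split_into_jobs` and the two constant names are hypothetical, and only the numeric limits come from the text above.

```python
from itertools import islice
from typing import Iterable, Iterator

# Limits quoted above; the constant names are illustrative only.
MAX_RECORDS_PER_JOB = 1_000_000
MAX_RECORDS_PER_REFERENCE_JOB = 500_000


def split_into_jobs(
    records: Iterable, limit: int = MAX_RECORDS_PER_JOB
) -> Iterator[list]:
    """Yield consecutive batches of at most `limit` records each."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    iterator = iter(records)
    while True:
        # islice consumes up to `limit` items without loading the rest.
        batch = list(islice(iterator, limit))
        if not batch:
            return
        yield batch
```

For a reference job, pass `limit=MAX_RECORDS_PER_REFERENCE_JOB`. Each yielded batch would then be submitted as its own job.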
Data is processed in Ireland, in the EU. If you want Singapore or US processing, please contact us on Twitter @spotlessdata.