Spotless version 12 out now

A road sign to illustrate version 12 of Spotless' Data Integration solution

Version 12 of Spotless Data's unique, browseable API: Machine Learning filters for data quality you can trust.

We are delighted to announce the long-awaited and vastly improved version 12 of Spotless Data's unique, browseable API solution to your rogue data issues. This is a major update to our service, which ensures data quality you can trust every time, thanks to our new and improved Machine Learning filters, which filter out dirty data and replace it with spotlessly clean data ready for use. We do this by applying the specifications that you, the client, have stipulated on your My Filters page. You need to be logged in to see this page. If you have not used our service before, please sign up using your email address or your Facebook, Google or GitHub account and give our service a try. We are sure you'll be impressed.

As part of our version 12 release, we have also published this YouTube video explaining, in a clear and easy-to-understand manner, how to use these filters. The filters learn through Machine Learning from their own experience of data cleaning, so they are particularly useful to clients who keep coming back to have new data cleaned, or who have incorporated our API into the entry point of their data warehouse or data lake to ensure that these work perfectly. So prepare to say goodbye to rogue data!

  • Redesign of the website: We have entirely redesigned our site, making it both more attractive and easier to view and navigate, focussing especially on our front page and our blogs. Please do take a look. We hope you like it!
  • Uniqueness rules: We now support this new rule type. While many rogue data issues are common and appear time and again, we have found that there are also some unusual types of dirty data, and this rule type allows you to address these less common issues.
  • Session rules: These are used to clean both gaps and overlaps in any time series session data, and we have now improved our support for this type of rule.
  • Lookup rules to include best matches and fill in blanks: It is not always possible to match inaccurate data perfectly, and there are many occasions on which the best match is better than no match at all. You can specify these rules on your My Filters page. Removing blank entries can skew a whole column of data, yet leaving a blank entry in your data is often inappropriate and may leave your customers and employees scratching their heads and wondering what has happened. Lookup rules therefore allow these blanks to be filled in, typically by using the best match available. This filling in of the blanks is a particularly cool function in our opinion. Try it out and see if you agree!
  • More file encodings and file separators supported: We now support a greater range of file encodings (character encodings) than before. We also accept a greater range of file separators, so you are no longer limited to comma-separated values (CSV) files. Both improvements make it easier than ever to use our filters.
  • Improved code using pyspotless: We have improved and updated the code on our Using Spotless with Python code page. Please take a look.
  • We now offer 500MB of free data cleaning: Realising that you will want to test our data cleaning service to check for yourself that it really is as good as we say it is, we already offered 100MB of free data cleaning when you start to use our unique Machine Learning filters. We have now increased this to 500MB, as we recognise that many businesses have a whole range of different and challenging dirty data issues, and that 100MB may not be enough to test our service against all your different types of rogue data. Simply sign up to our service using your email address or your Facebook or GitHub account. You can also check out our Pricing and range of Subscription Packages.
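To give a feel for what cleaning gaps and overlaps in session data means, here is a minimal Python sketch. It is not the Spotless implementation: the function name `clean_sessions` and the policy it applies (each session is made to end exactly where the next one starts, which trims overlaps and closes gaps in one step) are assumptions made purely for illustration.

```python
from datetime import datetime
from typing import List, Tuple

Session = Tuple[datetime, datetime]  # (start, end)


def clean_sessions(sessions: List[Session]) -> List[Session]:
    """Hypothetical policy: sort by start time, then make every session
    end where the following session begins. Sessions that end up with
    zero or negative length are dropped."""
    ordered = sorted(sessions)
    cleaned = []
    for i, (start, end) in enumerate(ordered):
        if i + 1 < len(ordered):
            # Trim an overlap or close a gap by snapping to the next start.
            end = ordered[i + 1][0]
        if end > start:
            cleaned.append((start, end))
    return cleaned


raw = [
    (datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 10, 30)),
    (datetime(2024, 1, 1, 10, 20), datetime(2024, 1, 1, 11, 0)),   # overlaps the first
    (datetime(2024, 1, 1, 11, 15), datetime(2024, 1, 1, 12, 0)),   # gap before this one
]
for start, end in clean_sessions(raw):
    print(start.time(), "to", end.time())
```

Other policies are just as plausible, for example leaving gaps untouched and only flagging them, so treat this as one possible reading of the rule.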

See our introduction to using our API. You can also view our video on data cleaning a genre column.
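If you would like a rough idea of how a best-match lookup on a genre column can work before trying the service, the sketch below uses Python's standard `difflib` module. It is only an illustration and not how our filters are implemented: the function `clean_genre`, the reference list of genres, the `"Unknown"` placeholder for blanks and the 0.6 similarity cutoff are all hypothetical choices.

```python
import difflib
from typing import List, Optional


def clean_genre(value: Optional[str], valid: List[str],
                default: str = "Unknown", cutoff: float = 0.6) -> str:
    """Return the closest entry in `valid`, or `default` when the value
    is blank or nothing is similar enough (illustrative policy only)."""
    if value is None or not value.strip():
        return default  # fill in the blank rather than drop the row
    # Compare case-insensitively, but return the canonical spelling.
    canonical = {g.lower(): g for g in valid}
    matches = difflib.get_close_matches(
        value.strip().lower(), list(canonical), n=1, cutoff=cutoff
    )
    return canonical[matches[0]] if matches else default


genres = ["Comedy", "Drama", "Documentary"]
column = ["Comdy", "drama ", "", "xyz"]
print([clean_genre(v, genres) for v in column])
```

Raising the cutoff makes the lookup stricter, so fewer misspellings are corrected but fewer wrong matches slip through.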

Many thanks to the Spotless Data Science team for their hard work in implementing this new, improved version. Have a great day!

If data quality is an issue for you, or you have known sources of dirty data but your files are too big and the problems too numerous to fix manually, please log in and try our service now.