How to use Spotless Data?

Getting Started with the API

Our second video focuses on cleaning a genre column containing rogue data in an EPG file, using Spotless Machine Learning filters.

Most people use our Python API to implement Spotless. 

 

To use the API, you can just set up a filter here and copy and paste the example code for the filter.

 

Installation

 

You can install the pyspotless API client by typing:

pip install pyspotless

 

Usage

 

  1. Select a filter you want to use from https://spotlessdata.com/filters/ 
  2. Copy and paste the code from the Example code panel
  3. Customise the code by replacing "my_dirty_file.csv" with the path to your own file, and my_function(file) with your own code for processing the clean file.
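
As an illustration of step 3, here is a minimal sketch of what my_function(file) might look like. It assumes the processed file arrives as CSV content, either text or bytes, which this page does not confirm, so adapt it to whatever your example code actually hands over. The column names in the usage comment are made up for the illustration.

```python
import csv
import io


def my_function(file):
    """Parse cleaned CSV content into a list of row dictionaries."""
    # Assumption: `file` holds the CSV content itself (str or bytes),
    # not a path or an open file handle.
    if isinstance(file, bytes):
        file = file.decode("utf-8")
    # DictReader uses the first row as the column names.
    return list(csv.DictReader(io.StringIO(file)))


# Hypothetical usage with a two-line EPG extract:
# rows = my_function(b"title,genre\nNews at Ten,News\n")
# rows[0]["genre"] is then "News"
```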

 

The example code is based on this code:

import time

from pyspotless import SpotlessClient

spotless = SpotlessClient(access_token='<SPOTLESS TOKEN HERE>')
job = spotless.run_job("https://spotlessdata.com/plans/<PLAN ID HERE>", "my_dirty_file.csv")

# wait for the job to complete, pausing between status checks
while not job["processing_complete"]:
    time.sleep(1)
    job = spotless.get_job(job["url"])

# get the processed file, then delete the job
file = spotless.get_processed_file(job)
spotless.delete_job(job)
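
If a job might take a long time, it can help to wrap the waiting loop in a helper that pauses between checks and gives up after a deadline. The following is a rough sketch, not part of pyspotless: wait_for_job, poll_interval and timeout are hypothetical names, and the only API details it relies on are get_job, job["url"] and job["processing_complete"] from the example code above.

```python
import time


def wait_for_job(client, job, poll_interval=2.0, timeout=300.0):
    """Poll client.get_job until the job reports processing_complete.

    Hypothetical helper: the default interval and timeout are arbitrary
    choices for this sketch, not values recommended by Spotless.
    """
    deadline = time.monotonic() + timeout
    while not job["processing_complete"]:
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not complete within %s seconds" % timeout)
        # Pause so the API is not polled in a tight loop.
        time.sleep(poll_interval)
        job = client.get_job(job["url"])
    return job
```

With this in place, the waiting loop in the example code could be replaced by job = wait_for_job(spotless, job).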

 

 

If you are using a multi-threaded environment, you can also pass a callback to the API:


 

from pyspotless import SpotlessClient


def callback(job, spotless, id):
    clean_file = spotless.get_processed_file(job)
    # now do something with the file
    spotless.delete_job(job)


spotless = SpotlessClient(access_token='<SPOTLESS TOKEN HERE>')
spotless.run_job("https://spotlessdata.com/plans/<PLAN ID HERE>", "my_dirty_file.csv", callback, "my identifier")
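
One way to "do something with the file" in a multi-threaded program is to hand the result to a thread-safe queue and let the main thread consume it. The sketch below keeps the callback(job, spotless, id) signature from the example above. The results queue is an addition for this illustration, and whether pyspotless invokes the callback on a separate thread is an assumption.

```python
import queue

# Thread-safe hand-off point between the callback and the main thread.
# `results` is a name chosen for this sketch, not part of pyspotless.
results = queue.Queue()


def callback(job, spotless, id):
    clean_file = spotless.get_processed_file(job)
    # Store the identifier with the file so the consumer can tell jobs apart.
    results.put((id, clean_file))
    spotless.delete_job(job)
```

The main thread can then call results.get(), which blocks until a cleaned file is available, and match the identifier against the value passed to run_job.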