How to use Spotless Data?

Getting Started with the API

Our second video focuses on using Spotless Machine Learning filters to clean a genre column with rogue data problems in an EPG file.

Most people use our Python API to implement Spotless. 


To use the API, set up a filter and then copy and paste the example code for that filter.




You can install the pyspotless API client by typing:

pip install pyspotless




  1. Select the filter you want to use.
  2. Copy and paste the code from the Example code panel.
  3. Customise the code by replacing “my_dirty_file.csv” with the path to your own file, and my_function(file) with the function you want to use to process the clean file.
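For step 3, the following is a minimal sketch of what a my_function replacement might look like. It assumes the cleaned file reaches your function as CSV text with a header row, and it uses a hypothetical "genre" column to echo the EPG example. Check what your client actually returns (text, bytes or a path) and adapt accordingly.

```python
import csv
import io
from collections import Counter


def my_function(file: str) -> Counter:
    """Hypothetical processing step: count the values in a 'genre' column.

    Assumes `file` is the cleaned CSV content as text with a header row;
    adapt this if your client returns bytes or a file path instead.
    """
    reader = csv.DictReader(io.StringIO(file))
    return Counter(row["genre"] for row in reader)


# Small made-up sample so the function can be tried without the API.
sample = "title,genre\nNews at Ten,News\nMatch of the Day,Sport\nLate News,News\n"
counts = my_function(sample)
print(counts.most_common(1))  # [('News', 2)]
```

Once the filters have removed rogue values, a simple count like this is a quick way to confirm the column only contains the categories you expect.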


The example code follows this pattern:

import time

from pyspotless import SpotlessClient

spotless = SpotlessClient(access_token='<SPOTLESS TOKEN HERE>')
job = spotless.run_job("<PLAN ID HERE>", "my_dirty_file.csv")

# wait for the job to complete, pausing between status requests
while not job["processing_complete"]:
    time.sleep(5)  # polling interval in seconds; adjust as needed
    job = spotless.get_job(job["url"])

# get the processed (clean) file
file = spotless.get_processed_file(job)
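The polling loop above keeps waiting for as long as the job takes. Below is a minimal sketch of a helper that also gives up after a timeout. It relies only on the "processing_complete" and "url" keys used in the example; wait_for_job, poll_interval and timeout are hypothetical names rather than part of pyspotless, and fake_get_job is an offline stand-in for spotless.get_job so the helper can be tried without a token.

```python
import time
from typing import Any, Callable, Dict

Job = Dict[str, Any]


def wait_for_job(
    get_job: Callable[[str], Job],
    job: Job,
    poll_interval: float = 5.0,
    timeout: float = 600.0,
) -> Job:
    """Poll until job["processing_complete"] is true, or raise TimeoutError.

    get_job is any callable that takes the job URL and returns the refreshed
    job dict (assumed to behave like spotless.get_job in the example).
    """
    deadline = time.monotonic() + timeout
    while not job["processing_complete"]:
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job['url']} did not finish within {timeout} s")
        time.sleep(poll_interval)
        job = get_job(job["url"])
    return job


# Hypothetical stand-in for spotless.get_job: "finishes" on the second poll.
responses = iter([
    {"url": "job-1", "processing_complete": False},
    {"url": "job-1", "processing_complete": True},
])


def fake_get_job(url: str) -> Job:
    return next(responses)


start = {"url": "job-1", "processing_complete": False}
done = wait_for_job(fake_get_job, start, poll_interval=0.01, timeout=5.0)
print(done["processing_complete"])  # True
```

With the real client you would pass spotless.get_job and the job returned by run_job, then call get_processed_file on the result.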




If you are working in a multi-threaded environment, you can also pass a callback to run_job instead of polling:


from pyspotless import SpotlessClient


def callback(job, spotless, id):
    # called once the job has finished; id is presumably the
    # identifier passed as the last argument to run_job
    clean_file = spotless.get_processed_file(job)
    # now do something with the file


spotless = SpotlessClient(access_token='<SPOTLESS TOKEN HERE>')
spotless.run_job("<PLAN ID HERE>", "my_dirty_file.csv", callback, "my identifier")
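The identifier argument is what lets one callback serve several jobs at once. The sketch below shows that pattern under stated assumptions: FakeSpotlessClient is a hypothetical offline stand-in, not the real pyspotless client, and it simply runs each job on its own thread and then calls callback(job, client, id) with the same signature as the example. How pyspotless actually schedules callbacks is not described here, so treat this only as an illustration of keeping callback state thread-safe.

```python
import threading
from typing import Any, Callable, Dict, List


class FakeSpotlessClient:
    """Hypothetical stand-in for SpotlessClient; not the real library.

    run_job starts a worker thread that later calls
    callback(job, client, identifier), mirroring the example's signature.
    """

    def __init__(self) -> None:
        self.threads: List[threading.Thread] = []

    def run_job(self, plan_id: str, path: str,
                callback: Callable[..., None], identifier: str) -> None:
        def work() -> None:
            job = {"url": f"{plan_id}/{path}", "processing_complete": True}
            callback(job, self, identifier)

        t = threading.Thread(target=work)
        self.threads.append(t)
        t.start()

    def get_processed_file(self, job: Dict[str, Any]) -> str:
        return f"cleaned contents of {job['url']}"


results: Dict[str, str] = {}
results_lock = threading.Lock()  # callbacks may run concurrently


def callback(job, spotless, id):
    clean_file = spotless.get_processed_file(job)
    # guard shared state, since several callbacks can fire at the same time
    with results_lock:
        results[id] = clean_file


spotless = FakeSpotlessClient()
for name in ["file_a.csv", "file_b.csv"]:
    # use the file name as the identifier so each result can be matched up
    spotless.run_job("<PLAN ID HERE>", name, callback, name)

for t in spotless.threads:
    t.join()  # wait for the fake workers before reading the results

print(sorted(results))  # ['file_a.csv', 'file_b.csv']
```

Using the file name as the identifier is just one choice; any value that lets the callback tell jobs apart would work the same way.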