How to use Spotless Data?

Getting Started with the API

Most people use Spotless through our Python API. To get started, set up a filter on the Spotless Data site (see https://spotlessdata.com/filters/) and copy and paste the example code shown for that filter.

 

Installation

 

You can install the pyspotless API client by typing:

pip install pyspotless
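
To confirm the install worked before running any jobs, a check along these lines may help. This is only a sketch: is_installed is a hypothetical helper name, not part of pyspotless, and it only tests whether a module can be found on the import path.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the named top-level module can be found on the import path."""
    # find_spec returns None when a top-level module cannot be located.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "pyspotless" is the module name used in the import shown on this page.
    print("pyspotless installed:", is_installed("pyspotless"))
```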

 

Usage

 

  1. Select the filter you want to use from https://spotlessdata.com/filters/
  2. Copy and paste the code from the Example code panel.
  3. Customise the code by replacing "my_dirty_file.csv" with the path to your own file and my_function(file) with your own code for processing the cleaned file.
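
As an illustration of step 3, my_function could be as simple as parsing the cleaned CSV into rows. The sketch below assumes the cleaned file is available as a text file-like object; this page does not say what type the API hands back, so adapt the first line of the function to whatever you actually receive. The sample data is made up for the demonstration.

```python
import csv
import io


def my_function(file):
    """Hypothetical processing step: parse cleaned CSV text into a list of row dicts."""
    # Assumption: `file` is a text file-like object containing CSV with a header row.
    reader = csv.DictReader(file)
    return list(reader)


# Stand-in for a cleaned file, used only to demonstrate the function.
sample = io.StringIO("name,age\nAda,36\nAlan,41\n")
rows = my_function(sample)
```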

 

The example code is based on the following pattern:

from time import sleep
from pyspotless import SpotlessClient

spotless = SpotlessClient(access_token='<SPOTLESS TOKEN HERE>')
job = spotless.run_job("https://spotlessdata.com/plans/<PLAN ID HERE>", "my_dirty_file.csv")

# wait for the job to complete, pausing between polls
while not job["processing_complete"]:
    sleep(1)
    job = spotless.get_job(job["url"])

# get the processed file, then delete the job
file = spotless.get_processed_file(job)
spotless.delete_job(job)
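
The wait loop above runs for as long as the job is unfinished. One way to bound it is a small helper with a timeout, sketched below. wait_for_job and its parameters are hypothetical names, not part of pyspotless; the helper relies only on the "processing_complete" and "url" keys used above and takes the lookup function as an argument.

```python
import time


def wait_for_job(get_job, job, poll_interval=2.0, timeout=300.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll until job["processing_complete"] is true, or raise TimeoutError.

    get_job is any callable that takes the job URL and returns the refreshed
    job dict (for example, the client's get_job method).
    """
    deadline = clock() + timeout
    while not job["processing_complete"]:
        if clock() >= deadline:
            raise TimeoutError("job did not complete within %s seconds" % timeout)
        sleep(poll_interval)  # pause so the API is not polled in a tight loop
        job = get_job(job["url"])
    return job
```

With the client from the example, this would be called as job = wait_for_job(spotless.get_job, job).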

 

 

If you are using a multi-threaded environment, you can also pass a callback to the API:


 

from pyspotless import SpotlessClient

def callback(job, spotless, id):
    clean_file = spotless.get_processed_file(job)
    # now do something with the file
    spotless.delete_job(job)

spotless = SpotlessClient(access_token='<SPOTLESS TOKEN HERE>')
spotless.run_job("https://spotlessdata.com/plans/<PLAN ID HERE>", "my_dirty_file.csv", callback, "my identifier")
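
If the callback may run on a different thread from the rest of your program, one common way to hand results back is a thread-safe queue. The sketch below assumes the callback receives (job, spotless, id) as in the example above. make_callback and StubClient are hypothetical names introduced here, and StubClient is only a stand-in so the snippet runs without network access.

```python
import queue


def make_callback(results):
    """Build a callback that puts (id, clean_file) on a thread-safe queue."""
    def callback(job, spotless, id):
        clean_file = spotless.get_processed_file(job)
        results.put((id, clean_file))  # queue.Queue is safe to use across threads
        spotless.delete_job(job)
    return callback


class StubClient:
    """Stand-in for the real client, for demonstration only."""

    def __init__(self):
        self.deleted = []

    def get_processed_file(self, job):
        return "clean:" + job["url"]

    def delete_job(self, job):
        self.deleted.append(job["url"])


results = queue.Queue()
callback = make_callback(results)
stub = StubClient()

# Simulate the callback being invoked for a finished job.
callback({"url": "job/1"}, stub, "my identifier")
```

In real use, the function returned by make_callback(results) would be passed to run_job in place of callback, and the main thread would read finished files with results.get().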