Using Spotless with Python

You can use Spotless in your Python applications through the pyspotless library.

The code is open source and you can download it here: https://bitbucket.org/tom-weiss/pyspotless

You can also view details on PyPI: https://pypi.python.org/pypi/pyspotless

Installation

You can install pyspotless by running:

pip install pyspotless


Usage

1. While logged in, get your access token from https://spotlessdata.com/api/
2. Get the URL of the plan you want to run from https://spotlessdata.com/plans/
3. Write the following Python code:

import time

from pyspotless import SpotlessClient

spotless = SpotlessClient(access_token='<SPOTLESS TOKEN HERE>')
job = spotless.run_job("https://spotlessdata.com/plans/<PLAN ID HERE>", "my_dirty_file.csv")

# wait for the job to complete, pausing between polls instead of busy-looping
while not job["processing_complete"]:
    time.sleep(2)
    job = spotless.get_job(job["url"])

# get the processed file, then delete the job
clean_file = spotless.get_processed_file(job)
spotless.delete_job(job)
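
The polling loop above never exits if a job does not complete. A minimal sketch of a helper that gives up after a deadline might look like the following. The names wait_for_job, poll_interval, timeout and sleep are hypothetical and not part of pyspotless; the sketch relies only on get_job, job["url"] and job["processing_complete"] as used in the example above.

```python
import time


def wait_for_job(client, job, poll_interval=2.0, timeout=300.0, sleep=time.sleep):
    """Poll client.get_job until the job reports processing_complete.

    Hypothetical helper, not part of pyspotless. Raises TimeoutError if the
    job is still incomplete once `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while not job["processing_complete"]:
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not complete within %s seconds" % timeout)
        # Pause between polls so the API is not queried in a tight loop.
        sleep(poll_interval)
        job = client.get_job(job["url"])
    return job
```

With this helper the loop becomes job = wait_for_job(spotless, job), and the caller decides how to handle a TimeoutError.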

Callback option

If you are working in a multi-threaded environment, you can use the callback option:

from pyspotless import SpotlessClient

def callback(job, spotless, id):
    clean_file = spotless.get_processed_file(job)
    # now do something with the file
    spotless.delete_job(job)

spotless = SpotlessClient(access_token='<SPOTLESS TOKEN HERE>')
spotless.run_job("https://spotlessdata.com/plans/<PLAN ID HERE>", "my_dirty_file.csv", callback, "my identifier")
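
One possible way to hand the cleaned file back to the main thread is a thread-safe queue. This is only a sketch: it assumes the callback is invoked on a background thread, which the text above implies but does not state, and the results queue is a hypothetical name rather than part of pyspotless. The callback signature is copied from the example above.

```python
import queue

# Thread-safe queue the callback uses to pass results to the main thread.
# `results` is a hypothetical name for this sketch, not a pyspotless feature.
results = queue.Queue()


def callback(job, spotless, id):
    # `id` is the identifier that was passed as the last argument to run_job.
    clean_file = spotless.get_processed_file(job)
    spotless.delete_job(job)
    # Publish the identifier together with the cleaned file.
    results.put((id, clean_file))
```

After calling run_job with this callback, the main thread could block on results.get(timeout=600) to receive the (identifier, file) pair; the timeout value here is arbitrary.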