Our second video focuses on cleaning a genre column containing rogue data in an EPG file, using Spotless Machine Learning filters.
Most people use our Python API to implement Spotless.
To use the API, you can just set up a filter here and copy and paste the example code for the filter.
You can install the pyspotless API client by typing:
pip install pyspotless
The example code follows this pattern:
import time

from pyspotless import SpotlessClient

spotless = SpotlessClient(access_token='<SPOTLESS TOKEN HERE>')
job = spotless.run_job("https://spotlessdata.com/plans/<PLAN ID HERE>", "my_dirty_file.csv")

# wait for the job to complete, pausing between polls
while not job["processing_complete"]:
    time.sleep(1)
    job = spotless.get_job(job["url"])

# get the processed file and ensure it works
file = spotless.get_processed_file(job)

# remove the finished job
spotless.delete_job(job)
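The polling loop above runs until the job reports completion, so a job that never finishes would block forever. As a minimal sketch, assuming only the `get_job` method and the `"processing_complete"` and `"url"` keys shown above, the wait could be wrapped in a helper with a timeout. The name `wait_for_job` and its `poll_interval` and `timeout` parameters are hypothetical and not part of pyspotless.

```python
import time


def wait_for_job(client, job, poll_interval=2.0, timeout=600.0, sleep=time.sleep):
    """Poll client.get_job until the job is complete or the timeout elapses.

    Hypothetical helper: `client` is any object with a get_job(url) method
    that returns a dict containing "processing_complete" and "url".
    """
    deadline = time.monotonic() + timeout
    while not job["processing_complete"]:
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job['url']} did not complete within {timeout} s")
        sleep(poll_interval)  # pause between polls
        job = client.get_job(job["url"])
    return job
```

With a real client this would be called as `job = wait_for_job(spotless, job)` in place of the `while` loop. The `sleep` parameter exists only so the pause can be swapped out when testing.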
If you are using a multi-threaded environment, you can also pass a callback to the API:
from pyspotless import SpotlessClient

def callback(job, spotless, id):
    clean_file = spotless.get_processed_file(job)
    # now do something with the file
    spotless.delete_job(job)

spotless = SpotlessClient(access_token='<SPOTLESS TOKEN HERE>')
spotless.run_job("https://spotlessdata.com/plans/<PLAN ID HERE>", "my_dirty_file.csv", callback, "my identifier")
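When several files are submitted with callbacks, the results need to get back to the code that consumes them. One possible approach, sketched here, is a callback that places each cleaned file on a thread-safe queue keyed by the identifier passed to `run_job`. It assumes the callback may run on a different thread from the caller, which is not confirmed above. The names `results` and `collect` are hypothetical. Only `get_processed_file` and `delete_job` are taken from the example above.

```python
import queue

# Thread-safe hand-off point between callbacks and the consuming code.
results = queue.Queue()


def collect(job, spotless, identifier):
    """Callback with the (job, spotless, id) signature used above."""
    clean_file = spotless.get_processed_file(job)
    results.put((identifier, clean_file))  # hand the result to the consumer
    spotless.delete_job(job)               # clean up once the file is stored
```

The consuming code can then call `identifier, clean_file = results.get()` once per submitted file, which blocks until a result is available.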