How to use Spotless Data?

Implementing the API from Scratch

If you are not using Python, or want to write your own API client, you can start by browsing the RESTful API here

 

At the top of the page you will see your API key. This key is unique to you and is required for every call to the RESTful API.

 

In this example, we will validate a file of device readings. Each row lists a device ID, the start and stop of a session, and the average temperature of the device during that session:

 

device_id  session_start     session_stop      average_temperature
1022385    24/06/2016 09:17  24/06/2016 17:41  5.22
1022385    24/06/2016 17:41  25/06/2016 12:57  8.93
1022385    25/06/2016 12:57  26/06/2016 06:13  5.61
1022385    26/06/2016 06:13  26/06/2016 13:04  4.77
1022385    26/06/2016 13:04  26/06/2016 22:26  6.29
1022385    26/06/2016 22:26  27/06/2016 02:22  7.51

 

You can download the full file here [link to attached file]
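
Before uploading, it can help to sanity-check a few rows locally. The sketch below is only an illustration and is not how Spotless itself validates data: the parse_row helper is hypothetical, and the day-first timestamp format is an assumption read off the sample rows above.

```python
from datetime import datetime

# Assumption: timestamps are day-first (dd/mm/yyyy hh:mm), as in the sample rows.
TIMESTAMP_FORMAT = "%d/%m/%Y %H:%M"


def parse_row(device_id, session_start, session_stop, average_temperature):
    """Convert one row of text fields into typed values (hypothetical helper)."""
    start = datetime.strptime(session_start, TIMESTAMP_FORMAT)
    stop = datetime.strptime(session_stop, TIMESTAMP_FORMAT)
    if stop < start:
        raise ValueError("session_stop is earlier than session_start")
    return {
        "device_id": int(device_id),
        "session_start": start,
        "session_stop": stop,
        "average_temperature": float(average_temperature),
    }


# First row of the sample data shown above.
row = parse_row("1022385", "24/06/2016 09:17", "24/06/2016 17:41", "5.22")
print(row["session_stop"] - row["session_start"])  # 8:24:00
```

A row with a malformed date or a stop time before its start raises ValueError, so problems in the input can be spotted before a job is created.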

 

First, submit the file at https://spotlessdata.com/filters/

 

That should create an initial plan that you can use. The example code section shows your own token and the URL of the filter (or plan) as it has been set up.

 

You can now set up code to submit a file for cleansing:


 

import requests

token = "<YOUR TOKEN GOES HERE>"
plan_url = "<PLAN URL GOES HERE>"

# create a job by uploading the file to be cleansed
with open("fridge test small.csv", "rb") as original_file:
    response = requests.post(
        "https://spotlessdata.com/api/jobs/",
        headers={"Authorization": "Token " + token},
        data={"plan": plan_url},
        files={"original_file": original_file},
    )

 

Remember that the value of plan comes from the profile URL from your previous call.

 

The first response for the job includes a URL you can use to retrieve the job. Poll that URL until processed_file is no longer empty:


 

import time

# wait for the response to come through...
while response.json()["processed_file"] is None:
    time.sleep(1)
    response = requests.get(response.json()["url"], headers={"Authorization": "Token " + token})
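
The loop above runs forever if the job never completes. As a minimal sketch of one way to bound the wait, the helper below polls with a timeout. The name wait_for_job and its parameters are hypothetical and not part of the Spotless API; fetch_job stands for any callable that takes the job URL and returns the decoded JSON as a dict.

```python
import time


def wait_for_job(fetch_job, job, timeout=60.0, interval=1.0, sleep=time.sleep):
    """Poll until job["processed_file"] is set, or raise TimeoutError.

    fetch_job: callable taking the job URL and returning the job as a dict
               (hypothetical; supply your own HTTP call here).
    job:       the dict decoded from the response that created the job.
    """
    deadline = time.monotonic() + timeout
    while job.get("processed_file") is None:
        if time.monotonic() >= deadline:
            raise TimeoutError("job was not processed within %s seconds" % timeout)
        sleep(interval)
        # Re-fetch the job from the URL the API returned for it.
        job = fetch_job(job["url"])
    return job
```

With the requests library, fetch_job could be a small function that calls requests.get on the URL with the Authorization header and returns response.json().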

 

The API then returns a JSON response that includes a link to the cleaned file:

{
    "profile": "https://spotlessdata.com/api/plans/47381c9a-4363-4964-93d0-0871d257f2d8/",
    "original_file": "https://spotlessdata.com/uploads/original/source/ip_list_Utrd5TD.csv",
    "can_delete": false,
    "url": "https://spotlessdata.com/api/jobs/bf386633-84ae-4e47-a2f5-547d95250b88/",
    "processed_file": "http://spotlessdata.com/uploads/complete/dirty/bf386633-84ae-4e47-a2f5-547d95250b88-cleaned.csv",
    "processing_complete": true
}

 

We can then download and view the file referenced in processed_file.
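
If you want to save the cleaned file to disk, one option is to reuse the file name from the processed_file URL. The sketch below shows only that step; cleaned_filename is a hypothetical helper, and the job dict is trimmed from the example response above.

```python
import posixpath
from urllib.parse import urlparse


def cleaned_filename(job):
    """Return the file name part of job["processed_file"], or None if not ready."""
    url = job.get("processed_file")
    if url is None:
        return None
    # Take the last path segment of the URL as the local file name.
    return posixpath.basename(urlparse(url).path)


# Values copied from the example response above.
job = {
    "url": "https://spotlessdata.com/api/jobs/bf386633-84ae-4e47-a2f5-547d95250b88/",
    "processed_file": "http://spotlessdata.com/uploads/complete/dirty/bf386633-84ae-4e47-a2f5-547d95250b88-cleaned.csv",
}
print(cleaned_filename(job))  # bf386633-84ae-4e47-a2f5-547d95250b88-cleaned.csv
```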

 

The full code in Python 3 for this example is:


 

import time

import requests

token = "<YOUR TOKEN GOES HERE>"
plan_url = "<PLAN URL GOES HERE>"

# create a job
with open("ip_list.csv", "rb") as original_file:
    response = requests.post(
        "https://spotlessdata.com/api/jobs/",
        headers={"Authorization": "Token " + token},
        data={"plan": plan_url},
        files={"original_file": original_file},
    )

# wait for the response to come through...
while response.json()["processed_file"] is None:
    time.sleep(1)
    response = requests.get(response.json()["url"], headers={"Authorization": "Token " + token})

# download the cleaned file and print its contents
response = requests.get(response.json()["processed_file"])

print(response.text)

 

API AUTHENTICATION

Authentication in Spotless is handled by an additional header that must be added to every API request that you make:

{"Authorization": "Token 2A9B2D63285B4AD2CE6DEF71A2B6F28553C3DAB8"}

This token uniquely identifies your account and can be seen at the top of every page in the browsable API when you are logged in.

You should keep this token secret and only use it in HTTPS requests to spotlessdata.com. Any requests made using it will be billed to your account.
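
One way to guard against sending the token somewhere it should not go is to build the header through a small helper that checks the URL first. This is a sketch under stated assumptions: auth_headers and ALLOWED_HOST are hypothetical names, not part of the Spotless API.

```python
from urllib.parse import urlparse

# Assumption: the only host that should ever receive the token.
ALLOWED_HOST = "spotlessdata.com"


def auth_headers(url, token):
    """Return the Authorization header, refusing non-HTTPS or foreign URLs."""
    parts = urlparse(url)
    if parts.scheme != "https" or parts.hostname != ALLOWED_HOST:
        raise ValueError("refusing to send the token to " + url)
    return {"Authorization": "Token " + token}


print(auth_headers("https://spotlessdata.com/api/jobs/", "<YOUR TOKEN GOES HERE>"))
```

The returned dict can be passed as the headers argument of any request you make to the API.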

Example code for making a secure API call in Python is shown below:

import requests

response = requests.get(
    "https://spotlessdata.com/api/rules/reference_rules/",
    headers={"Authorization": "Token <YOUR TOKEN GOES HERE>"},
)

print(response.json())