How to use Spotless Data?

Plans

Plans link together the different rules needed to validate a submitted file. Where possible, include a rule for every field in each type of file that you are going to submit.

 

Plan fields

Each plan has the following fields:

  • name - a name to identify your plan for easy reference
  • description - a description of how the plan functions. This is primarily for your own reference, but our data science team will also review it if they need to optimise the automatching for your plan
  • rules - a collection of rules, each paired with the source field in the file that it applies to. Note that for CSV files the source field is the column number in the source data, with the first column being zero
  • csv fields - these specify how the file is encoded and include the text encoding, the delimiter (',', '\t', '|', etc.) and the quoting character
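
To make the zero-based source field and the CSV settings concrete, here is a minimal sketch using only Python's standard csv module. It does not call the Spotless API; the sample data and the column helper are hypothetical and exist purely for illustration.

```python
import csv
import io

# Hypothetical sample file: pipe-delimited, with '"' as the quoting character.
sample = 'ip|count\n"192.168.0.1"|3\n"10.0.0.7"|12\n'


def column(text, source_field, delimiter=",", quotechar='"'):
    """Return the values of one zero-based column, skipping the header row."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter, quotechar=quotechar)
    next(reader)  # skip the header row
    return [row[source_field] for row in reader]


# Source field 0 is the first column, source field 1 is the second.
ips = column(sample, 0, delimiter="|")
counts = column(sample, 1, delimiter="|")
print(ips)
print(counts)
```

In a plan, a rule with source field 0 would be applied to the values in ips and a rule with source field 1 to the values in counts.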

 

Example plans

Spotless does not ship with any example plans, but any plans you create will be listed here.

If you have worked through the getting started instructions, your account will contain an example plan for validating IP addresses.

The JSON for this plan looks like this:

{
    "id": "my-first-spotless-plan-2540ea34-0dba-413b-8176-45beeca78f8d",
    "url": "http://localhost:8000/api/plans/2540ea34-0dba-413b-8176-45beeca78f8d/",
    "name": "my first spotless plan",
    "description": "",
    "rules": [
       {
           "rule": "ip-address-validator-regex-rule-c0e8fc58-e6c4-11e5-9730-9a79f06e9478",
           "source_field": 0
       },
       {
           "rule": "positive-integer-validator-regex-rule-c0e8e0ce-e6c4-11e5-9730-9a79f06e9478",
           "source_field": 1
       }
    ]
}
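
As a rough sketch of how this structure can be read, the snippet below parses the JSON above with Python's standard json module and maps each source field to its rule. The variable names are illustrative only and nothing here contacts the Spotless service.

```python
import json

# The plan JSON shown above, pasted in as a string.
plan_json = """
{
    "id": "my-first-spotless-plan-2540ea34-0dba-413b-8176-45beeca78f8d",
    "url": "http://localhost:8000/api/plans/2540ea34-0dba-413b-8176-45beeca78f8d/",
    "name": "my first spotless plan",
    "description": "",
    "rules": [
       {
           "rule": "ip-address-validator-regex-rule-c0e8fc58-e6c4-11e5-9730-9a79f06e9478",
           "source_field": 0
       },
       {
           "rule": "positive-integer-validator-regex-rule-c0e8e0ce-e6c4-11e5-9730-9a79f06e9478",
           "source_field": 1
       }
    ]
}
"""

plan = json.loads(plan_json)

# Map each zero-based source field (CSV column) to the rule applied to it.
rules_by_field = {entry["source_field"]: entry["rule"] for entry in plan["rules"]}

for field, rule in sorted(rules_by_field.items()):
    print(f"column {field}: {rule}")
```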

 

Creating plans

Plans can only be created in code, as shown in the getting started instructions:

import requests

# Your personal API token, sent in the Authorization header
token = "<YOUR TOKEN GOES HERE>"

# Create the plan by POSTing its name and rules as JSON
response = requests.post(
    "https://spotlessdata.com/api/plans/",
    headers={"Authorization": "Token " + token},
    json={
        "name": "my first spotless plan",
        "rules": [
            # Column 0 must contain IP addresses
            {"rule": "ip-address-validator-regex-rule-c0e8fc58-e6c4-11e5-9730-9a79f06e9478",
             "source_field": 0},
            # Column 1 must contain positive integers
            {"rule": "positive-integer-validator-regex-rule-c0e8e0ce-e6c4-11e5-9730-9a79f06e9478",
             "source_field": 1}
        ]
    }
)

Note that for CSV files the source field is the zero-based column number, so the first column is 0.
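
When a plan covers many columns, it may be convenient to build the request body from a mapping of column number to rule ID. The helper below is a hypothetical sketch, not part of Spotless: build_plan_payload is an invented name, and sending "description" when creating a plan is an assumption based on the plan fields listed above. It only builds the dictionary and does not send any request.

```python
def build_plan_payload(name, rules_by_column, description=""):
    """Build a plan request body from {zero-based column number: rule ID}.

    Hypothetical helper for illustration; it is not part of the Spotless API.
    """
    for col in rules_by_column:
        if not isinstance(col, int) or col < 0:
            raise ValueError(f"source field must be a non-negative integer, got {col!r}")

    payload = {
        "name": name,
        "rules": [
            {"rule": rule, "source_field": col}
            for col, rule in sorted(rules_by_column.items())
        ],
    }
    # Assumption: the description field can be supplied when the plan is created.
    if description:
        payload["description"] = description
    return payload


payload = build_plan_payload(
    "my first spotless plan",
    {
        0: "ip-address-validator-regex-rule-c0e8fc58-e6c4-11e5-9730-9a79f06e9478",
        1: "positive-integer-validator-regex-rule-c0e8e0ce-e6c4-11e5-9730-9a79f06e9478",
    },
)
```

The resulting payload has the same shape as the json argument in the request above and could be passed to requests.post in its place.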