Examining data quality within the context of TV show titles.
TV titles can be a source of great frustration for websites that base their business model on having easily identifiable TV show names. A title can make or break a TV show, but a successful name from the perspective of a TV production company is not the same thing as a good name from the point of view of websites and companies that deal with large volumes of TV show title data, which may change very rapidly. From the point of view of a webmaster, a show called Six of One would make more sense than a show called Friends, since a distinctive title is easier to identify than an everyday word, and yet experts in the field question whether the famous US comedy would have been a success had it kept its original name, Six of One. So what is good for TV production companies may well not be good for websites and others who rely on targeted ads, produce TV listings or EPGs, offer VOD, or host articles and forums in which they or their customers discuss TV shows, all of which require data integrity.
To give an example of a naming problem, one of the most successful comedies from the UK is a show called Only Fools and Horses. Unfortunately, it is also known by many viewers as Only Fools & Horses, i.e. the word "and" becomes an ampersand (&). Unless your database can recognise and verify that these two apparently different titles in fact belong to the same show, it will treat them as two different shows. Imagine this show only appears in your database as Only Fools and Horses. Then a customer types Only Fools & Horses into your TV search engine and gets no results for the show. Later that night she is surfing the TV channels and comes across the last five minutes of her favourite episode of the show. Your now furious customer has just lost faith in your TV website and as a result will be using your competitor's service instead tomorrow, and indeed for good if that competitor has resolved this ampersand problem better than you have by validating their TV titles.
Exactly the same issue would occur if the initial data called the show Only Fools and Horses S1 E5, specifying the particular series and episode from which it comes. This is useful information, but only if your customer can still see when the show is appearing on the telly. A similar issue occurs if someone using your site types in Coronation St, shortening Street to St. Unless your database can validate that what this customer actually wants is information about Coronation Street, you will again not merely have lost a customer but have given that customer away as a gift to one of your competitors!
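To make the three mismatches described above concrete, here is a minimal Python sketch of title normalisation. It is an illustration only and not Spotless Data's implementation: the function name `normalise_title`, the `ABBREVIATIONS` table and the `EPISODE_SUFFIX` pattern are all hypothetical, and a production system would need a far larger set of rules.

```python
import re

# Hypothetical abbreviation table (an assumption for this sketch).
# Note that "St" can also mean "Saint", so a real table needs more care.
ABBREVIATIONS = {"st": "street"}

# Matches a trailing series/episode marker such as "S1 E5" or "s01e05".
EPISODE_SUFFIX = re.compile(r"\s+s(\d+)\s*e(\d+)\s*$", re.IGNORECASE)


def normalise_title(raw):
    """Return (canonical_key, series, episode) for a raw title string."""
    series = episode = None
    match = EPISODE_SUFFIX.search(raw)
    if match:
        # Keep the series and episode numbers, but remove them from the title.
        series, episode = int(match.group(1)), int(match.group(2))
        raw = raw[:match.start()]
    # Treat "&" and "and" as the same word, ignoring case and punctuation.
    text = raw.lower().replace("&", " and ")
    words = re.findall(r"[a-z0-9']+", text)
    # Expand known abbreviations word by word.
    words = [ABBREVIATIONS.get(word, word) for word in words]
    return " ".join(words), series, episode


for title in ("Only Fools & Horses", "Only Fools and Horses S1 E5", "Coronation St"):
    print(title, "->", normalise_title(title))
```

With rules like these, both spellings of Only Fools and Horses share one lookup key, the series and episode details are kept separately rather than lost, and Coronation St resolves to the same key as Coronation Street.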
The solution to this problem is fortunately very simple: clean and validate your TV titles, so that they meet your data quality standards, by running them through Spotless Data's unique web-based data quality API solution for TV show title validation before entering the data into your platform. We have developed a reference dataset with a list of approximately 17,000 TV show titles, and this dataset can then be used to standardise naming conventions and ensure that your customers find the shows that they are looking for on your TV-themed platform.
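The general idea of validating titles against a reference list can be sketched in a few lines of Python. This is not the Spotless Data API, whose interface is not described here; it is a rough stand-in that uses fuzzy matching from Python's standard `difflib` module, and the three-title `REFERENCE_TITLES` list, the `validate_title` function and the 0.8 cutoff are all assumptions chosen for illustration.

```python
import difflib

# Tiny stand-in for a reference dataset (an assumption for this sketch);
# the titles are the ones used as examples in this article.
REFERENCE_TITLES = ["Only Fools and Horses", "Coronation Street", "Friends"]


def validate_title(raw, cutoff=0.8):
    """Return the closest reference title, or None if nothing is similar enough."""
    # Compare in lower case, but return the title as spelled in the reference list.
    lookup = {title.lower(): title for title in REFERENCE_TITLES}
    matches = difflib.get_close_matches(raw.lower(), list(lookup), n=1, cutoff=cutoff)
    return lookup[matches[0]] if matches else None


for query in ("Only Fools & Horses", "Coronation St", "Breaking Bad"):
    print(query, "->", validate_title(query))
```

Titles that fall below the cutoff come back as `None`, which lets you flag them for review instead of silently creating a duplicate show in your database.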
Please do sign up for our service using your email address or your Facebook, Google or GitHub account. Here is a quick link to our FAQ. You can also check out our range of subscription packages and pricing, and try it out with 500Mb of free data cleaning. You can also view our videos on data cleaning an EPG file and data cleaning a genre column, which explain how to use our API. If you would like to contact us, you can speak to one of our team by clicking the white square icon with a smile within a blue circle, which you can find in the bottom right-hand corner of any of the web pages on our site.
Spotless Data, the one-stop data quality API solution!
If data quality is an issue for you, or you have known sources of dirty data but your files are just too big and the problems too numerous to fix manually, please do log in and try it now.