Our Machine Learning Filters are ideal for assuring quality data that allows you to measure your online presence
Securing and maintaining an online digital presence for one's brand and using it to drive sales represents a new challenge for many businesses, regardless of what market they are in. While this challenge is made more complicated by the recent explosion in big data, that explosion also represents a tremendous opportunity, which the commercial winners of the coming decade will use to stand out from their rivals. One of the keys to achieving this goal is for companies to measure successfully the data they have about their online presence, or use to maintain it, blending data from various sources such as multiple measurement datasets, external feeds that either drive or monitor one's online presence, and ad impressions.
While we at Spotless do not pretend to be able to help you with the actual measuring of your digital presence, there is no lack of tools that can do so. What we can do is make sure that the data your company uses for this measuring are trustworthy, quality data which all hang together when analysed to portray a true and accurate picture of your digital presence. We do this through our unique Machine Learning filters, accessed through a simple-to-use Python API, which filter out rogue data so that your data are then ready for action!
Data stored in your data lake to help assess your digital presence include data from Google Analytics, from your Content Delivery Networks (CDN) and from tools which measure the various devices people use to access your website(s), typically PCs, tablets and mobiles. These data are particularly useful for marketing, for understanding the impact and range of one's digital presence and for getting an idea of where one's digital offering could be improved.
One issue consistently faced by those tasked with measuring the digital footprint of their companies is integrating these various measurement datasets so that an overall picture can emerge, which can then help your business identify its KPIs and reach its overall business goals. You need your data to determine the things that are of importance to your company, which reports can then display so that the key personnel in your business can easily understand them. However, quality data that are free from rogue corruptions and inaccuracies of all sorts, including mismatches caused by non-standardisation of the various data, are the vital first step. This is why it makes so much sense to build Spotless into the entry point of your data lake, so that the only data in there are fit to measure your online presence as it really is rather than building a false or distorted picture.
Google Analytics is a great tool, but on its own it is sufficient only for the smallest and least complex businesses. Integrating it with your other measurement datasets is the best way to use it to build up a detailed picture of your website(s).
CDN or web logs provide a tremendous amount of data containing useful information. This includes how many people access your website and their physical location, which is helpful not merely for identifying who is interested in your digital presence but also for detecting and deterring spam networks. Other handy pieces of information about your digital presence are the pages of your website which people access, how long they stay, and error rates, where a client is unable to view a web page. Errors may be due to server overload which, if it is a chronic problem, indicates that you need to increase your server capacity.
Some businesses report that customers initially come to their websites using their mobiles but, when it comes to actually doing things on the site, such as buying a product, making a reservation or watching the VOD programmes they discovered through an EPG on their phone, they prefer to use a more sophisticated device such as a PC or a smart television. For this reason, as well as being an indicator of how much time and effort you should spend on making your website(s) friendly and usable for all devices, the CDN data about how many people are viewing and using your digital content through various devices has become a must in 2017.
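As a rough illustration of the kind of information a web log can yield, the sketch below counts unique visitor IPs, page hits and the share of server errors. The log lines and their layout (a Common Log Format style) are invented for the example, and real CDN logs vary by provider; this is plain Python and does not use the Spotless API.

```python
import re
from collections import Counter

# Hypothetical log lines in a Common Log Format style layout (an assumption;
# your CDN's real log format will differ).
LOG_LINES = [
    '203.0.113.5 - - [10/Oct/2017:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '203.0.113.5 - - [10/Oct/2017:13:56:01 +0000] "GET /shop HTTP/1.1" 503 512',
    '198.51.100.7 - - [10/Oct/2017:13:57:12 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '198.51.100.9 - - [10/Oct/2017:13:58:40 +0000] "GET /blog HTTP/1.1" 404 128',
]

LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+$'
)

def summarise(lines):
    """Return unique IP count, hits per page and the 5xx error rate."""
    pages = Counter()
    visitors = set()
    errors = 0
    parsed = 0
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip lines that do not fit the assumed layout
        parsed += 1
        pages[m["path"]] += 1
        visitors.add(m["ip"])
        if int(m["status"]) >= 500:  # server-side failures such as overload
            errors += 1
    error_rate = errors / parsed if parsed else 0.0
    return {
        "unique_ips": len(visitors),
        "page_hits": dict(pages),
        "server_error_rate": error_rate,
    }

summary = summarise(LOG_LINES)
print(summary)
```

A persistently high server error rate in a summary like this is the sort of signal that would suggest a capacity problem.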
A website that just stretches to a couple of pages outlining a company's basic products and services while saying something about the company and is packed full of other people's ads isn't going to establish much of an online presence. This is especially so when rivals have sophisticated ad-free websites running to many pages which include a great deal of content and a detailed description of their products and services if not an actual online store where customers can order these products and services. Some sophisticated SEO work to ensure that the site ranks well on Google and other search engines, and perhaps a regularly updated blog as well, all help to portray a professional digital presence. And then there are the measurement tools described above, required to gauge how well the web presence is impacting.
What this means in practice is both employing somebody, or even a team, to build and market your great website(s) and using external feeds to keep your content current, including RSS feeds and videos, perhaps uploaded to YouTube and then embedded into your site. These should change over time to obtain the best SEO results.
Many sites use adverts to make extra money for their websites, particularly media sites offering free content. Websites which aim to make their money through selling products and services are likely to be those who advertise on these other websites to increase awareness of their brand, attract new customers and drive sales. There are many companies which offer online advertising services such as Google ads, who also help third-party websites display ads, and Facebook ads, advertising directly on Facebook.
Ad impressions mean that the company paying for the ad gets a much better idea of how many people see the ad than is the case for other forms of advertising. Ads on billboards, on television, where many viewers get up to do something else during the ad breaks, or in a newspaper, where many readers just look at the front page and the sports or business sections, are not seen by as many people as those who sell the advertising space might claim. Generally, companies pay for the number of ad impressions rather than the number of clicks from the ad that lead the viewer directly to one's website. The number of clicks as a percentage of the number of ad impressions is in itself a useful piece of data, especially for judging the efficacy of any particular ad. So if the click-through rate of an ad is low in comparison to other ad campaigns, your marketing department needs to review the ad and come up with something better. The bounce rate, for users who do click through to your site but don't stay there for very long, is another measurement that says a lot about your website as well as about the type of person who has been shown the ad. So if a company got the same click-through rate from Facebook and Google but twice the bounce rate from those customers sent by Google, it would probably indicate a need to spend more on Facebook ads and less on Google ads.
However, more detailed analysis of ad impressions can also be a useful measure of how many users actually get to see the ad. If the vast majority of ad impressions come from mobile users, where the ad is perhaps too small to be seen, these are probably not ad impressions worth paying for. It is also worth gauging, through your ad impressions data, the number of users who use ad-blocking software or whose device has been infected by malware which blocks the view of your ad and, unknown to the device's user, replaces it with the malware owner's ads, for which you are then paying. This helps build a more truthful picture of the real number of ad impressions that users will get to see.
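The comparison in that last example can be worked through with a few lines of Python. The figures below are invented purely for illustration, and the bounce rate is simplified here to bounces divided by clicks.

```python
def click_through_rate(clicks, impressions):
    """Clicks as a fraction of ad impressions."""
    return clicks / impressions if impressions else 0.0

def bounce_rate(bounces, sessions):
    """Visits that left almost immediately, as a fraction of all visits."""
    return bounces / sessions if sessions else 0.0

# Invented campaign figures, not real data.
campaigns = {
    "facebook": {"impressions": 10000, "clicks": 200, "bounces": 50},
    "google": {"impressions": 10000, "clicks": 200, "bounces": 100},
}

report = {}
for name, c in campaigns.items():
    report[name] = {
        "ctr": click_through_rate(c["clicks"], c["impressions"]),
        # each click is treated as one session (a simplifying assumption)
        "bounce_rate": bounce_rate(c["bounces"], c["clicks"]),
    }

for name, figures in report.items():
    print(name, figures)
```

With these made-up numbers both channels have the same click-through rate, but visitors sent by Google bounce twice as often, which is the situation described above.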
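A minimal sketch of this kind of breakdown follows. It assumes each impression record carries a device type and a flag saying whether the ad was blocked or hijacked; both field names are hypothetical, and how such a flag would be derived depends on your ad provider's data.

```python
from collections import Counter

# Hypothetical impression records; "device" and "blocked" are assumed fields.
impressions = [
    {"device": "mobile", "blocked": False},
    {"device": "mobile", "blocked": True},
    {"device": "pc", "blocked": False},
    {"device": "tablet", "blocked": False},
    {"device": "mobile", "blocked": False},
]

# How many impressions came from each device type.
by_device = Counter(rec["device"] for rec in impressions)
mobile_share = by_device["mobile"] / len(impressions)

# Impressions that a user could actually have seen.
viewable = [rec for rec in impressions if not rec["blocked"]]
viewable_share = len(viewable) / len(impressions)

print(dict(by_device), mobile_share, viewable_share)
```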
To present an overall picture of your advertising campaign requires that these data are blended with the other measuring data for your digital presence. It is unlikely that the raw ad impressions data you have will be consistent with your other measured data. This is where, by building Spotless into the access point where data enters your data lake or data store, you can ensure that a single report includes ad impressions with all the other vital data required to build up the picture you want of your online presence.
Your data store contains all the data which you have about your online presence, including your databases, emails, ad impressions, measurement data and any external feeds. Some of these are dark data, which you have but are not using to their full potential, such as the IP locations of your customers.
So if you sell your products exclusively within India but many of your IP locations are within the European Union, you either need to expand into that territory or, if that is not what you want or are capable of doing, revise your online presence, as ad impressions seen by audiences in the European Union may have little value to your business and not be worth paying for.
Ensuring your company has smooth access to all these ad impressions data is never going to be an easy task. However, it will be much easier if the data are quality data. Rogue data are those which are inconsistent and don't match each other, rather than being inherently wrong. For instance, if your company uses both Google and Facebook to advertise and then stores the data from both companies monitoring how these ad campaigns are going, the differences between the ways these two companies provide their data can cause confusion. What is required is one standard way of dealing with your ad data, whether it is Google's way, Facebook's way or your own way, which may be a combination of the two. This means submitting the Facebook and Google ad impression data to our Spotless Data Python API so that the different data become standardised and thus easy for your software programmes, analytic tools and key personnel to understand. The same is the case with your measurement tools, your CDN logs and all the data within the data store, even down to simple things such as using two different email providers, one for internal communication and the other for communication with customers. If the standard terms these email providers use are different, so that one refers to users and the other to clients, this can cause endless confusion unless our Machine Learning filters modify the data. There is no right answer as to what you should call those with whom you communicate via email; what is required is to have one term, not several, to describe these people. As you set the specifications for data cleaning using our solution, this is made easy.
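To make the idea of one standard way of dealing with your ad data concrete, here is a small sketch in plain Python. It is not the Spotless API, and the column names for the two exports are invented for the example rather than taken from Google's or Facebook's real formats.

```python
# Hypothetical export rows; these column names are assumptions for illustration.
google_rows = [{"Impr.": "1200", "Clicks": "30", "Campaign": "Spring Sale"}]
facebook_rows = [
    {"impressions": "900", "link_clicks": "18", "campaign_name": "spring sale"}
]

# One mapping per source onto a single, agreed set of field names.
FIELD_MAP = {
    "google": {"Impr.": "impressions", "Clicks": "clicks", "Campaign": "campaign"},
    "facebook": {
        "impressions": "impressions",
        "link_clicks": "clicks",
        "campaign_name": "campaign",
    },
}

def standardise(rows, source):
    """Rename fields, convert counts to integers and normalise campaign names."""
    mapping = FIELD_MAP[source]
    out = []
    for row in rows:
        clean = {mapping[k]: v for k, v in row.items() if k in mapping}
        clean["impressions"] = int(clean["impressions"])
        clean["clicks"] = int(clean["clicks"])
        clean["campaign"] = clean["campaign"].strip().lower()
        clean["source"] = source  # keep track of where each row came from
        out.append(clean)
    return out

combined = standardise(google_rows, "google") + standardise(facebook_rows, "facebook")
print(combined)
```

Once both sources share the same field names and the same spelling of each campaign, a single report can total impressions and clicks across them.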
You can read our introduction to using our API and then try using our service on your My Filters page but, for this, you need to log in first. If you haven't already done so, you can sign up using your email address or your Facebook, Google or GitHub account. You may also view our videos on data cleaning an EPG file and data cleaning a genre column, which both explain how to use our API.
We are giving away 500MB of free data cleaning to each newly signed-up customer so that you can see for yourself how effortlessly and swiftly our API filters work, typically taking under five minutes from start to finish.
We guarantee your data are secure while they are in our care and that they are not accessible by any third party, a responsibility we take very seriously.
If your data quality is an issue, or you know that you have sources of dirty data but your files are just too big and the problems too numerous to fix manually, please do log in and try now.