Data Quality

The 4 essential criteria for data source selection

In today’s digital world, companies increasingly recognize the value of data as a business asset that should be protected. Tech companies license data to power their solutions and market their products across a variety of applications.

Although the use cases are different, the criteria for selecting a data source remain largely the same. Technology companies and developers should consider the following components when licensing data to power their solutions:

Data Coverage

When selecting a data source, developers need to make sure the datasets they are licensing have good coverage, meaning that they encompass the entirety of the data needed to fulfill the purpose that a product, application, or service was designed for. Comprehensive coverage is paramount in creating both a good product and a good user experience.

For example, data coverage can help a company set the price of a product. Airbnb, the popular vacation home rental service, can use a dataset with good coverage to help set rental prices for its listings.

If the dataset has good coverage, including attributes such as traffic reports, restaurants and bars, shopping, crime statistics, public transit routes, and more, Airbnb will be able to compare the prices and addresses of its current listings and determine an appropriate range. Rentals in low-crime areas with plenty of restaurants and shops and close to public transit would be priced higher.
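One way to put a number on coverage before licensing a dataset is to measure how often each attribute you depend on is actually populated. The following is a minimal sketch, not a vendor's method: the field names and sample listings are hypothetical, chosen only to mirror the rental example above.

```python
# Hypothetical attribute names, chosen only for this illustration.
REQUIRED_FIELDS = ("price", "address", "crime_rate", "transit_distance_km")


def field_coverage(records, fields=REQUIRED_FIELDS):
    """Return, per field, the share of records with a non-empty value."""
    records = list(records)
    if not records:
        return {field: 0.0 for field in fields}
    coverage = {}
    for field in fields:
        # A value counts as present unless it is missing, None, or "".
        filled = sum(1 for r in records if r.get(field) not in (None, ""))
        coverage[field] = filled / len(records)
    return coverage


# Made-up sample data standing in for a licensed dataset.
sample_listings = [
    {"price": 120, "address": "1 Main St", "crime_rate": 0.02, "transit_distance_km": 0.4},
    {"price": 95, "address": "22 Oak Ave", "crime_rate": None, "transit_distance_km": 1.1},
    {"price": 150, "address": "", "crime_rate": 0.01},
    {"price": 80, "address": "9 Elm Rd", "crime_rate": 0.05, "transit_distance_km": 2.3},
]

report = field_coverage(sample_listings)
for field, share in report.items():
    print(f"{field}: {share:.0%}")
```

A field that falls well below full coverage, such as the crime or transit attributes in this sample, signals that a pricing comparison built on it would silently skip part of the inventory.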

Data Accuracy

Data accuracy determines how effective your solution will be and how well it can serve your customers’ needs. A developer cannot create a quality product if it is powered by inaccurate data.

For example, if a navigation or business listing tool guarantees coverage of a specific geographic area, it needs to provide accurate information about businesses in that area. Poor-quality data can mean that a user attempts to contact or visit a business that has closed, moved to a different location, or changed its phone number. Customers are busy, and wasting their time because of inaccurate data will immediately turn them off from using an app, a platform, or a solution. Disappointed consumers are much more likely to leave bad reviews or share their negative experiences on social media.1

Inaccurate data has negative consequences that can affect a company in a number of different ways. For example, it can lead to a product miscategorizing the industry or size of a business. For companies using data for marketing purposes, inaccurate data can lead to wasted budget, low inboxing rates and ineffective campaigns. If a company is using inaccurate data for risk assessment, misrepresenting the industry and size of businesses within a building or overstating the number of businesses in a building will lead to poor conclusions when assessing the risk of a commercial property.

When assessing the coverage and accuracy of data, developers should take into account what sources were used to build the database, what processes are employed to verify the data, and what ongoing monitoring the data vendor performs to identify changes and update their data. Waiting for businesses to self-report changes is not enough to maintain an accurate dataset.
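A practical way to test a vendor's accuracy claims is to compare a sample of its records against records you have verified independently, for example by phone or site visit. The sketch below assumes both sets are keyed by a shared record ID. The IDs, field names, and sample values are hypothetical, and the normalization is deliberately simple.

```python
def normalize(value):
    """Collapse whitespace and case so cosmetic differences do not count as errors."""
    if value is None:
        return None
    return " ".join(str(value).split()).lower()


def accuracy_by_field(vendor, verified, fields):
    """Share of records, per field, where the vendor value matches the verified value."""
    shared_ids = vendor.keys() & verified.keys()
    result = {}
    for field in fields:
        if not shared_ids:
            result[field] = None  # nothing to compare
            continue
        matches = sum(
            1
            for rid in shared_ids
            if normalize(vendor[rid].get(field)) == normalize(verified[rid].get(field))
        )
        result[field] = matches / len(shared_ids)
    return result


# Made-up vendor records and a hand-verified sample of the same businesses.
vendor_records = {
    "b1": {"phone": "555-0100", "address": "1 Main St", "industry": "Restaurant"},
    "b2": {"phone": "555-0101", "address": "22 Oak Ave", "industry": "Retail"},
    "b3": {"phone": "555-0102", "address": "9 Elm Rd", "industry": "Retail"},
    "b4": {"phone": "555-0103", "address": "3 Birch Ln", "industry": "Legal"},
}
verified_sample = {
    "b1": {"phone": "555-0100", "address": "1 main st ", "industry": "Restaurant"},
    "b2": {"phone": "555-0199", "address": "22 Oak Ave", "industry": "Retail"},
    "b3": {"phone": "555-0102", "address": "14 Pine St", "industry": "Retail"},
}

scores = accuracy_by_field(vendor_records, verified_sample, ("phone", "address", "industry"))
for field, score in scores.items():
    print(f"{field}: {score:.0%}")
```

Only records present in both sets are scored, so the result reflects the verified sample rather than the whole database. A larger, randomly drawn sample gives a more trustworthy estimate.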

Brand example: Boeing

When it came to Boeing’s 737 aircraft, data accuracy was literally a matter of life or death. In March 2019, the Boeing 737 MAX passenger airliner was grounded worldwide after two tragic crashes. The cause of the crashes? Inaccurate flight data.

Boeing has reported that the flights crashed because the software piloting the aircraft repeatedly pushed the noses of the planes down in response to inaccurate flight data, and the pilots were not able to address the problem in time.4 The training of the human pilots did not adequately cover what to do in this situation and did not allow for them to override the software.2

The two crashes killed 346 people, an immeasurable loss of life. Boeing, whose manufacturing center is based in Renton, Wash., has announced more than $8 billion in costs related to the accidents,3 and the damage to its brand reputation, airlines, and suppliers extends far beyond that.

Real-Time Updates

Did you know that every hour roughly 521 business addresses will change, 872 telephone numbers will change or disconnect, and 1,504 URLs will be created or changed? These rapid changes mean the data in an organization’s CRM quickly goes out of date. Companies can tackle this challenge by accessing real-time data through APIs, as long as they use a data provider that offers real-time updates and has processes to identify when information has changed.

Data partners should offer robust APIs that enable analysts and developers to access large databases instrumental for building and scaling products while decreasing lag time and eliminating the financial burden of maintaining large data stores.

Going back to our hypothetical navigation product example, real-time data updates are necessary to inform product developers of changes to a business or residence as soon as those changes become available. Delaying such updates by weeks or even days can make a product or platform outdated, leading to a poor user experience and dissatisfaction with the product.
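To get a feel for what an update delay costs, the hourly change figures quoted earlier in this section can be scaled to a day, a week, or a month. This is a back-of-the-envelope sketch that assumes the hourly rates stay constant, which is a simplification. The function name is hypothetical.

```python
# Hourly change rates as quoted in this article.
HOURLY_CHANGES = {
    "business addresses changed": 521,
    "telephone numbers changed or disconnected": 872,
}


def changes_during_delay(hourly_rate, delay_days):
    """Estimate changes accumulated over a delay, assuming a constant hourly rate."""
    return hourly_rate * 24 * delay_days


for delay_days in (1, 7, 30):
    print(f"Update delay of {delay_days} day(s):")
    for label, rate in HOURLY_CHANGES.items():
        missed = changes_during_delay(rate, delay_days)
        print(f"  ~{missed:,} {label}")
```

At the quoted rates, a one-week delay already corresponds to roughly 87,500 address changes that a product relying on the stale copy would not reflect.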

Brand example: Zillow

Zillow, the premier real estate listing website, used real-time data to increase the accuracy of its ‘Zestimates’, the company’s estimates of a listing’s worth. Zillow currently offers Zestimates for more than 100 million U.S. homes, alongside hundreds of attributes for each property. To figure out how much a home is worth, Zillow uses a variety of data, such as tax assessments, sales transactions, images of houses, and MLS listing data. When Zillow started ten years ago, it would input this data into its in-house machine-learning framework. As Zillow grew, it had a hard time scaling this process. Real estate valuations change quickly, and the company needed a data source that would feed it a constant stream of new data.

Zillow has partnered with a real-time data streaming service that continuously captures the data it needs to calculate Zestimates. In turn, this data is ingested and pushed into Apache Spark, which runs machine-learning models on the data and allows Zillow to compute Zestimates in seconds.5

Simplicity of Integration

A data source that can be easily integrated into a product or platform is crucial. Developers should look for sources that provide standardized, reliable data in flexible file formats. The provider should also offer comprehensive customer service, technical support, and proper documentation to seamlessly onboard new data and ensure smooth product operation.
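When a provider's data is not standardized, the integration burden shifts to the developer, who has to map each feed onto one internal schema. The sketch below illustrates that work using only Python's standard library. The canonical field names, the alias table, and the two sample feeds are hypothetical and do not describe any particular provider's format.

```python
import csv
import io
import json

# Hypothetical internal schema for this illustration.
CANONICAL_FIELDS = ("name", "phone", "postal_code")

# Hypothetical mapping from provider-specific column names to canonical ones.
FIELD_ALIASES = {
    "business_name": "name",
    "company": "name",
    "telephone": "phone",
    "phone_number": "phone",
    "zip": "postal_code",
}


def to_canonical(raw):
    """Map one raw record onto the canonical schema; unknown keys are ignored."""
    record = {field: None for field in CANONICAL_FIELDS}
    for key, value in raw.items():
        if key is None or value is None:
            continue
        cleaned_key = key.strip().lower()
        canonical = FIELD_ALIASES.get(cleaned_key, cleaned_key)
        if canonical in record:
            record[canonical] = str(value).strip() or None
    return record


def load_csv(text):
    return [to_canonical(row) for row in csv.DictReader(io.StringIO(text))]


def load_json(text):
    return [to_canonical(item) for item in json.loads(text)]


# Two made-up feeds carrying the same kind of data in different shapes.
csv_feed = "business_name,telephone,zip\nAcme Bakery,555-0100,10001\n"
json_feed = '[{"company": "Bolt Hardware", "phone_number": "555-0101", "zip": 10002}]'

records = load_csv(csv_feed) + load_json(json_feed)
for record in records:
    print(record)
```

Every alias in a table like this is maintenance the developer owns, which is why a provider that delivers standardized fields and documents them well reduces integration effort.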

It goes without saying that every product or platform’s data requirements are different as they support an extensive range of use cases. Yet, one thing remains the same: businesses and developers need to prioritize coverage, accuracy, ease of implementation, and frequency of data updates when vetting available sources to set themselves up for success.

Conclusion

Data is the foundation of any successful business. Companies need a data source that is comprehensive, accurate, updated in real time, and easy to integrate into their products, apps, and internal software systems. Selecting the right data partner the first time around will provide companies with a superior product and fuel successful marketing programs.

Use our vendor selection worksheet to make sure you select the right data partner for your product.


1. https://www.onlinereputationmanagement.us/angry-customers-likely-post-bad-review-happy-customers-good-one/
2. https://www.geekwire.com/2019/microsofts-brad-smith-cites-boeing-crisis-cautionary-tale-intelligent-machines-calls-ai-kill-switch
3. https://www.nytimes.com/2019/07/24/business/boeing-earnings-737-max.html/
4. https://www.bloomberg.com/news/articles/2019-07-27/latest-737-max-fault-that-alarmed-test-pilots-rooted-in-software
5. https://aws.amazon.com/solutions/case-studies/zillow-zestimate

Natasia Langfelder
Content Marketing Manager

As Content Marketing Manager, Natasia is responsible for helping strategize, produce and execute Infogroup’s content. With a passion for writing and an enthusiasm for data management and technology, Natasia creates content that is designed to deliver nuggets of wisdom to help brands and individuals elevate their data governance policies. A native New Yorker, when Natasia is not at work she can be found enjoying New York’s food scene, at one of NYC’s many museums, or at one of the city’s many parks with her two teacup yorkies.