December 31, 2020
Opinions expressed by Entrepreneur contributors are their own.
Google “big data,” and autocomplete for the search tells quite the story: “Big data is the future. Big data is the new oil.”
These are some exciting statements, but what’s often lost in the conversation about big data is the high cost of bad data.
If your company prides itself on making data-driven decisions, it’s important to recognize that those decisions will only ever be as good as your data. Poor data quality costs the U.S. economy $3.1 trillion per year, and it’s creating a crisis of faith across many industries.
According to a recent Gartner report, more than half of senior marketing leaders are disappointed in the outcomes they’ve seen from investments in data analytics. As a result, only 54% of their activities are influenced by data. By 2023, Gartner predicts that CMOs will be downsizing their analytics teams due to unmet expectations.
The importance of quality data cannot be overstated, but leaders often don’t know where their data collection and analysis are breaking down. Here are three data-quality issues you might not be aware of:
Anomalies become harder to manage as data balloons
Often, data doesn’t follow a logical pattern. That doesn’t necessarily mean your data isn’t accurate, but outliers (such as seasonal fluctuations) must be accounted for.
If you own an apparel company that sees a huge demand for red sweaters in the lead-up to Christmas, you can easily identify the root cause and handle it appropriately. However, completely removing your outliers is not the answer, because some departments, such as your procurement and merchandising teams, may need this outlier information.
This gets much more complicated as your company starts collecting and using more data. Each new metric will have its own trends and anomalies, and you can’t manually investigate and adjust all these outliers. As Credera’s Chief Data Officer Vincent Yates said in a blog post, “Classifying anomalies in data is an art, and virtually no organization has a mechanism to codify these annotations globally.”
Over time, inaccurate data sets create a snowball effect. The data can’t be used to forecast future demand with any accuracy, which erodes organizational trust in data.
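One way to keep outliers available to the teams that need them, while still marking them for everyone else, is to flag them rather than delete them. The sketch below is only an illustration: the sales figures are made up, the function name `flag_outliers` is hypothetical, and the interquartile-range rule is one common convention, not a method the sources quoted here prescribe.

```python
from statistics import quantiles


def flag_outliers(values, k=1.5):
    """Pair each value with an is_outlier flag using the IQR rule.

    Nothing is dropped; callers decide what to do with flagged values.
    """
    q1, _, q3 = quantiles(values, n=4)  # quartiles of the data
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [(v, v < low or v > high) for v in values]


# Hypothetical weekly red-sweater unit sales; the last two weeks
# represent the pre-Christmas spike.
weekly_sales = [120, 115, 130, 125, 118, 122, 127, 121, 119, 124, 480, 510]

flagged = flag_outliers(weekly_sales)

# A baseline forecast might exclude the flagged weeks...
forecast_input = [v for v, is_outlier in flagged if not is_outlier]

# ...while procurement and merchandising still see every week.
procurement_view = [v for v, _ in flagged]
```

Because the flag travels with the value, each department can apply its own rule without anyone losing the underlying numbers.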
Data models break down with volume
Just as unmanaged outliers can skew data over time, many data models begin to break down as the volume of data increases. This doesn’t mean that data models suddenly stop working. Most data quality issues within an organization exist from the outset, but they don’t become apparent until they reach a certain scale.
This is particularly relevant now, as stay-at-home orders have caused retail data to dry up overnight. When restrictions are lifted, it’s unlikely that consumer behavior will be exactly as it was before. Customers may be spending less or ordering more goods online. Many companies will find that their old data is no longer relevant.
“Even businesses that had amassed great volumes of customer data before Covid-19 are finding themselves in the same cold-start position as businesses venturing into unknown markets,” wrote Stanford University professor Angel Evan.
Nearly all companies will be re-scaling their data in the coming years. They’ll have to update their models to account for changes in consumer behavior.
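A simple way to notice that old data may no longer describe current behavior is to measure how far a recent sample has moved from the historical baseline. The following is a minimal sketch under stated assumptions: the numbers are invented, the names `mean_shift` and `needs_review` are hypothetical, and the threshold of two standard deviations is an arbitrary starting point rather than a recommended standard.

```python
from statistics import mean, stdev


def mean_shift(baseline, recent):
    """Distance between the two means, in baseline standard deviations.

    Assumes the baseline has some variation (a constant baseline would
    have a standard deviation of zero and raise ZeroDivisionError).
    """
    return abs(mean(recent) - mean(baseline)) / stdev(baseline)


def needs_review(baseline, recent, threshold=2.0):
    """Flag a metric whose recent mean has moved past the threshold."""
    return mean_shift(baseline, recent) > threshold


# Hypothetical share of orders placed online (%), before and after
# a sudden change in shopping habits.
before = [20, 22, 21, 19, 23, 20, 21]
after = [35, 38, 36, 40, 37, 39, 36]

model_is_stale = needs_review(before, after)
```

A check like this does not say how to rebuild a model. It only signals which metrics deserve a second look before anyone relies on them.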
Different departments use the same data for different purposes
Today, companies are producing more and more data, and that data is being used to inform company decisions at every level. Key Performance Indicators (KPIs) generated by an ecommerce team may be used by the marketing department to target a new customer segment. Or, that same data might be used at the highest levels to build models around financial performance to inform hiring decisions.
The trouble is that the people creating that data generally have no idea who is using it or how it is being used. It’s unclear who’s responsible for the accuracy and management of that data. This becomes problematic when the data is used to drive decision-making.
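One lightweight remedy is to record, for every shared data set, who owns it and who depends on it. The sketch below assumes a hypothetical `DatasetRecord` structure with made-up team names. It is not a reference to any particular data-catalog product, and it requires Python 3.9 or later.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical catalog entry: one owner, many downstream consumers."""
    name: str
    owner: str
    consumers: dict[str, str] = field(default_factory=dict)  # team -> purpose

    def register_consumer(self, team, purpose):
        """Record that a team relies on this data set, and why."""
        self.consumers[team] = purpose

    def notify_list(self):
        """Everyone to tell when the data changes: owner first, then consumers."""
        return [self.owner, *sorted(self.consumers)]


# Illustrative entries only; the names are assumptions.
kpis = DatasetRecord(name="ecommerce_kpis", owner="ecommerce")
kpis.register_consumer("marketing", "target a new customer segment")
kpis.register_consumer("finance", "financial models that inform hiring")

who_to_tell = kpis.notify_list()
```

Even a record this small answers the two questions raised above: who is accountable for the data, and who is affected when it changes.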
How to get a handle on data quality
Just as new challenges arise when you scale a company, you must re-evaluate your approach as you scale your data. Here are three best practices for improving data quality:
1. Appoint a chief data officer. The chief data officer (CDO) will be responsible for creating a plan to manage company data and maintain data quality as you grow. He or she should be an expert at drawing insights from data and contextualizing it so that the rest of your team can utilize the information.
2. Create a data strategy. Data is no longer just a helpful byproduct of marketing and sales activities. Data is an asset. Just like any other asset, it must be continually protected and managed.
Unfortunately, most companies keep their data and data activities siloed in different departments. While they might have lofty discussions about data quality or data protection, there’s no overarching strategy for how that data should be managed or used. This can lead to huge swaths of dark data and data decay.
A solid data strategy establishes how your company will manage and share data across the organization so that it can be used for its greatest benefit. This strategy should be scalable and repeatable.
3. Update data models as you scale. As your customer base grows and you begin collecting more data, you or your CDO must continually reevaluate the models you use to make sense of that data. It’s important to ask what has changed since you initially developed that model and whether certain metrics are even still relevant. This will allow you to get the clearest picture as you crunch the numbers.
We can’t go back to a time before big data, nor should we. Big data has helped us make enormous advancements in everything from self-driving cars to patient outcomes. In the coming year, it will tell us how well Covid-19 vaccines work and on which groups they work best. But to make big data sustainable, you must first be proactive in improving the data quality within your organization.