Your board has been demanding more insight into the day-to-day operations of your business. They want access to a dashboard that displays key business metrics so they can assess the health of the company. In order to meet this need, you have just completed the rigorous process of evaluating your core business processes, implementing “best-of-breed” applications, and getting everything up and running.
Now that everything has been deployed and is running smoothly, you can sit back and relax, because your applications are gathering all the data you need. Or can you...
Don’t get me wrong, you have completed a crucial building block of any data management system. However, it’s critical to understand that all that data is being fed into a database, and databases are living, breathing things. The data is constantly changing, and as a result, it is getting dirty. In fact, it probably needs a bath more frequently than you and I do.
Here are five reasons that your application’s database is getting dirtier by the day.
1. Duplicate Entries are Still Happening
Raise your hand if you have a customer in your CRM that is named “John Smith - DO NOT USE.” Most people that raise their hand will have two or more “John Smith” records, but will add the “DO NOT USE” to the title in an effort to signal to everyone else which record is actually the most recent and accurate.
CRM applications are complex by nature, therefore making the above strategy a bit dangerous. CRMs are designed to track each and every time you interact with your customers, and by default, this means customer data will have many different things attached to it.
For example, one of the records might have every email you have ever sent to that individual attached to it, while the other duplicate record might have all the phone calls that your sales team has placed. Or worse yet, the two records might have differing information about the same email that was sent. In this situation, the only option is to combine the two records into one clean record. The problem is that you aren’t confident that the information about the emails or phone calls will still be there when you are done, or that important information has been overwritten.
2. Invalid Dates
Dates may be the single most important piece of information to a business intelligence system. You need to know when you made that sale, when the customer signed up, or when the last time you sold an item was.
Despite the simplicity of the “date” value, it can be quite difficult for a software application to know if they “make sense” or not. You as a human can do this quickly and easily. Tell me if these make sense to you:
- We first sold to customer John Doe on 2/1/2014, but their account was created on 5/1/2014
- We had 3 customers sign up on 12/31/9999
- The last time we sold a flux capacitor was 1/1/1753
Okay, anyone who is a Back to the Future fan could argue about that last one.
While these seem obviously wrong to you and me, a computer still sees those as valid dates. Interestingly enough, the value of 1/1/1753 is the oldest date that a Microsoft SQL Server will recognize. The date 12/31/9999 will come at some point, but I doubt we will be around to see it.
3. Custom Fields Often Cannot be Validated
One of the ways software applications try to appeal to a broader audience is by offering the ability for customizations. A modification may come in the way of adjusting an application feature to suit a particular organization, individual, or task. Most of the time, these customization frameworks are crude, and lack any real type of validation or controls.
In my consulting days, I worked quite heavily with an industry-leading retail management system. It was obvious that the software developers were aware of this need to customize, and they tried to come up with a clever solution of being one step ahead of you. They did this by adding 5 additional fields to the item information, aptly named CustomText1, CustomText2, CustomText3, CustomText4, and CustomText5. When entering a new item into the system, the user was present with these 5 fields that could contain whatever they wanted. Some of the customers would use them to provide additional descriptions, while some others would do something like put the initials of the person who entered it. In these situations, there is absolutely no way the application can determine what is good vs. bad data.
4. Naming Conventions Still Rely on Human Beings
It is often helpful to name items in your system with codes that convey important information. This allows someone to quickly identify where something belongs or where it came from. For example, if you’re in the business of warehousing fruit products, you most likely generate lot numbers for tracking. These lot numbers are often made up codes that identify what they are, when and where they were made, and how much product is included. So if I looked at lot code “15CHRY-WH1-112015,” I could determine that this product contains 15 lbs of cherries, was packaged in Warehouse 1, and was produced in November of 2015.
While these naming conventions are very powerful, they often differ from customer to customer, and it is unlikely the software application can decipher a bad one from a good one. This is a common problem with inventory management systems because “inventory” can mean many different things. A company that sells sunglasses might code their items in a way that they can easily tell what brand and style the code represents. So they might have an item code for “OAKM01,” which quickly tells them that this code is for a men’s Oakley style pair of sunglasses. The data quality challenge here is that based on this convention, the code would need to follow these rules:
- Must be 6 digits long
- The first 3 characters must match a brand code (such as OAK for Oakley)
- The 4th character must be either M or W (Men’s or Women’s)
- The last 2 digits must be a valid number between 00 and 99
As you can see, these rules are heavily dependent on the person entering the information. They are also structured enough that a data quality system could easily pick out invalid codes.
As a business, part of your competitive advantage can be around the business processes you've built, as well as the applications you've leveraged to support that business process. In selecting those applications, you do your best to find a "best match" by weighing features against cost against adoption, etc. In many cases, the application you select may have 90% of what you need, but will unfortunately require you to come up with creative ways accomplish the rest.
So how do you handle this mismatch between requirements and functionality? You work around them. This means you make the software do something it wasn’t intended to do. The result is that the application doesn’t know how to handle this input and cannot do much about validating it.
Workarounds are typically used when a company isn’t ready to jump into a full blown solution for a certain aspect of their business. One of the first applications that a company is likely to deploy is a Customer Relationship Management (CRM) system. This makes sense, because you don’t really have a business without customers. These CRM systems are often asked to do things they weren’t intended to, like act as a support ticket system. While CRM systems are great at tracking interactions with your customers (like providing support), they aren’t as great as tracking how much time it takes you to perform that action. As a result, companies will result to logging time information in notes fields. Due to the generic nature of a notes field, there is really no way to validate what is being put in, or that it is being entered at all.
So How Do I Stop It?
The hard truth is that as long as you’re using your data, you can’t. The BEST option is to start actually managing it. You need to define a data quality process that will allow you to proactively be notified of data quality issues so they can be handled before they cause problems.
Uncovering the health of your database is a simple process. In many cases, you can find this out within 24 hours or fewer. Try Naveego DQS (Data Quality Services) for FREE as a first step to gaining control of your data quality.