At first glance, the answer to this question about the correctness of metrics seems very simple. Consider an obvious example:
A report displays a metric called “Profit”. In the business glossary we find that the business term “Profit” is calculated as the difference between “Revenue” and “Costs”. We would then follow the business data lineage to where these business terms are stored in the database (through their relations to the data catalog). From there it would be easy: we would simply prepare data quality rules to verify that the metric is calculated correctly, e.g., checking whether Profit = Revenue - Costs. But… is this the right approach? Should data quality management be based only on checking how metrics are set up in the business model?
No. Data quality should be about measuring how good the data you work with is; it should not verify whether an aggregated value is calculated correctly from the business perspective. The answer, then, is to perform data quality checks that measure whether the data feeding those metrics is of good or poor quality. Let us illustrate this with a few examples:
Is the Profit column filled in? – Does it contain NULL values?
Are there any invalid characters? – Are all the values in the Revenue column numbers?
Are all numbers in the Profit column properly formatted?
Are there any duplicated values in Costs (that need to be checked)?
Do Costs and Revenue relate to the same period of time?
Are there any negative numbers in columns that are supposed to contain only positive values, e.g., Profit?
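The checks above can be sketched as simple data quality rules. Here is a minimal illustration in Python, assuming (hypothetically) that the report data is available as a list of records with Revenue, Costs, and Profit fields; the field names, sample rows, and rule set are examples, not a prescribed implementation:

```python
# Minimal data quality checks for the metric columns discussed above.
# The records, field names, and sample values are hypothetical.

def check_no_nulls(records, column):
    """Completeness: the column must not contain NULL (None) values."""
    return [i for i, r in enumerate(records) if r.get(column) is None]

def check_numeric(records, column):
    """Validity: every value in the column must be a number."""
    return [i for i, r in enumerate(records)
            if not isinstance(r.get(column), (int, float))]

def check_non_negative(records, column):
    """Range: flag negative values in columns expected to be positive."""
    return [i for i, r in enumerate(records)
            if isinstance(r.get(column), (int, float)) and r[column] < 0]

def check_duplicates(records, column):
    """Uniqueness: return values that appear more than once."""
    seen, dupes = set(), set()
    for r in records:
        value = r.get(column)
        if value in seen:
            dupes.add(value)
        seen.add(value)
    return sorted(dupes)

# Example report rows (hypothetical data)
rows = [
    {"Revenue": 1000, "Costs": 400, "Profit": 600},
    {"Revenue": "N/A", "Costs": 400, "Profit": None},
    {"Revenue": 1000, "Costs": -50, "Profit": 1050},
]

print(check_no_nulls(rows, "Profit"))     # row indexes with missing Profit
print(check_numeric(rows, "Revenue"))     # row indexes with non-numeric Revenue
print(check_non_negative(rows, "Costs"))  # row indexes with negative Costs
print(check_duplicates(rows, "Revenue"))  # duplicated Revenue values
```

Note that none of these rules recompute Profit from Revenue and Costs; they only measure whether the underlying data is fit to feed the metric, which is exactly the distinction drawn above.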
So, what would be the recommended approach to achieve correct metrics in the reports?
The ultimate answer is to have it well covered from the process perspective, step by step. When starting any data initiative in your company, you do not normally start with data quality. First, you need to define your business terms. The initial step is therefore to build a business glossary and clearly define the business terms, including the metrics (relations among business terms, e.g., calculations). Then, you need to make sure that the business model definitions of the metrics used in the business glossary (the calculated attributes) are in place.
This will give you a solid foundation for creating any report based on the business needs of your organization. The people responsible for such a report will know how the metrics should be calculated, and they must check, as part of the report creation, that those metrics are calculated correctly. All of this should, of course, be supported by measuring the quality of the data underlying each column/business term. This is a better approach: it is not just about the “data”. It is a process of creating a semantic layer over your data that declares its meaning – data whose quality you know.
Data governance is not only about definitions, SQL statements, or data flows; it is also about processes and how to work with data properly. When creating any report, you need to make sure you follow the necessary steps to ensure your data is represented correctly in the report. Pay attention not just to the calculations, but also to whether the respective metrics pass the data quality checks based on defined rules.
If you would like to discuss how to set up data governance properly in your organization, you can schedule a free personalized 1-on-1 demo of the Accurity data intelligence platform, where we will be happy to discuss the right approach for you.
Our new whitepaper will help you plan for and achieve your own Data Quality Management (DQM) framework. It covers everything we have mentioned here in much more detail, including significantly more in-depth DQM process steps. The whitepaper is a free download from the Accurity website.