Dave Hooper is an Executive Consultant at Waterstons with over a decade of industry experience. He specialises in systems analysis, business intelligence and data integration having implemented many such projects and holding multiple Microsoft Certified qualifications.
A Single Version of the Truth
It is vital to the success of an Intelligent Business that everybody trusts the data it consumes and the reports it generates. I frequently see teams ‘correcting’ their data in tools such as Excel before creating reports whilst other teams are ‘correcting’ related data in a slightly different way. This type of process is simply not compatible with having a single set of data that everybody trusts.
The solution is to have a single place where all the required data is collated to and stored. Only validated, corrected data is allowed into this central repository. No one is allowed to ‘correct’ the data – it must either be corrected by the import process itself or corrected at source and then reimported. This way the owners of the data are responsible for it rather than the consumers and there is no possibility of anyone distrusting the data.
This repository cannot impact on the source systems and does not replace their functionality in any way as this might negatively impact on the business, in both system performance and operational familiarity. Businesses need their users to buy in, and starting out by demanding changes to existing systems will only alienate staff – we must instead earn trust by building something demonstrably better. By using such a repository as a single data source in a bespoke Business Intelligence solution we can replace systems’ higher level reporting and data analysis functions, which I’ve found are frequently weak in comparison, whilst including relevant data from other systems.
The concept of such a repository is known as a Data Warehouse. Data is pulled into the Warehouse from systems across the business and processed, in a completely automated fashion, to ensure that it fits with all the other data. All the KPI's and other output systems then take their data from the Data Warehouse, ensuring that the values extracted do not differ depending on the form of output.
In practice the Data Warehouse is implemented as a database structured to aid the requirements of reporting performance, aggregation of data and ease of selecting data by categories such as date, location or product type. This is frequently referred to as ‘slice and dice’ functionality and provides many users with a powerful insight into their data.
What Data is Required to Calculate the KPI's?
Now that we have created the concept of a data warehouse we need to decide what data it should hold. Rather than looking at what is available it is critical that this is done by looking at the KPI's to see what data they require. Otherwise the resulting warehouse simply becomes a rehash of the unwieldy monthly management reports, with all their drawbacks, that we’re trying to avoid.
Taking a simple KPI of profit per manufactured unit – what data do we need? We will need all the sales figures, from all regions, for all the products we produce. We will also need the corresponding production and component purchasing cost figures. Plus the relevant overheads and other fixed costs.
Another KPI might be based around production quality. So we will need to know the return rates from customers for all products in all regions, as well as the results of quality control testing and customer satisfaction levels.
This process continues until the data requirements for all of the KPI's are defined. Where the data will come from is not yet a concern – we just need to define what data we need. We can also identify that much of the data is common to several KPI's. For example sales regions, product ID's and dates are in both of the above examples.
Once this stage is complete we can begin to design the structure of the Data Warehouse to hold all the data.
What Data is Already Available?
We now need to find all the required data so that we can populate the Data Warehouse. The reason for working in this order (i.e. requirements and then data sources) is to ensure that we end up reporting on the KPI's the business needs rather than risk merely reporting on the factors that we can easily get our hands on. Reporting based on available data is a trait commonly associated with businesses which have evolved without clearly defining their KPI's or management information strategy. The result is that decision makers do not have access to all the data or the right data and so cannot make the best decisions for the business.
In the above manufacturing example it is likely that the ERP (manufacturing) system can provide much of the required data but there will almost certainly be some exceptions and omissions. There will also be duplicates as not only the ERP system but also the sales and finance systems will have product information.
Some possible examples might include:
- The South American region hasn’t migrated to the new corporate ERP yet; they use a different one, from another vendor, with differing data structures on a different infrastructure platform.
- North America gathers return rates via Account Managers into a bespoke CRM system, Europe uses a module of their ERP and Asia doesn’t formally record return rates at all.
- The ERP works by country but North American return rates are collated by State.
- Some of the products the business sells are bought in and are not manufactured – so they are not in the ERP system.
We now face what seems to be a large obstacle to progress. We have missing data, data in different systems, data in different formats, data broken down in different ways and data that is not formatted at all. How can we build one version of the truth from this?
We might propose that we should spend the next few years migrating everyone and everything into the corporate ERP and enforcing a uniform process on all regions. We could then pick up the BI process again and continue to build the data warehouse.
However, this will mean that the accurate and intelligent reporting we want will also be delayed. Furthermore, in 3 years’ time the business may have changed direction, the old ERP system may be entirely unsuited and the ideal KPI's may be different. We’ll have spent a lot of money with the ERP vendor, enforced a lot of change on the business and we’ll not have gained any extra knowledge or insights.
This is where the process feels like trying to herd cats – everything is trying to go in a different direction whilst the ground is moving fast beneath us! I have been in this situation many times and I believe that the best approach is always to take a more flexible and pragmatic approach, that will first use what we have and only later fill in the gaps if it proves necessary. For example I’ve seen a BI prototype on a warehousing operation that was completed in only a couple of days and was able to release a million pounds of capital to the business over the next month. The sooner a BI project can show some results and demonstrate its capabilities the more likely it is to be a success.
How do we Collate Data from Disparate Systems?
The first part of this pragmatic approach is to identify what data we have, what is missing and what is duplicated.
Where we have a data source we can review it to analyse its quality. If it needs cleansing we can define an approach for how this would be done but more importantly we can understand how to use the data as it stands now.
Where data is duplicated, for example a products list, we can identify the best source system and make that the primary data source. We then use various techniques to allow the Data Warehouse to merge, map and align other systems’ data against this primary source.
Where data is missing, for example the Asian reject rates, we simply accept that we cannot have that data at present. This acceptance doesn’t prevent us from importing that information from other continents and making use of it. It also highlights the gap and means that we might only need a simple data capture front end to populate the warehouse and the gap can soon be filled.
By taking these simple steps we can filter out the data that we will use from the data that we can’t or don’t wish to – it’s about being pragmatic! There will be gaps, potential inaccuracies and differences but it will allow us to progress towards the first iteration of our Business Intelligence solution. Only then can we decide what steps should be taken next to move the business most efficiently towards its strategic goals. For example the KPI's may help us to decide that the Asian operation should be sold; in this example it would have been a waste of time, resource and money to formalise that data only for the resultant intelligence to suggest a sale!
We are now in a position where the approach has allowed us to create islands of consistent data that between them cover some (and hopefully the majority) of the idealised data warehouse requirements; but we still have some issues to solve. What do we do when these different islands of data don’t match up and can’t be linked together?
How do we Fill the Data Gaps?
We can often do the bulk of the data mapping between our data ‘islands’ automatically by matching on information such as the product name, a supplier reference code or some other common field. But we often find that different systems have different naming conventions, use historical names or organise data in a totally different way.
In the pragmatic approach I would never advocate spending a year updating all the data in one system to match another. I would produce a mapping table in the database to allow these items to be cross referenced. This will then be made available, via a maintenance website or other mechanism, to allow data owners to update and correct the mappings. In many cases I’ve been able to do the bulk of this mapping myself and then the customer can do the much smaller remainder themselves the following day.
This is facilitated by the data import process highlighting any cases it cannot match and reporting to the appropriate users. They can then update the mappings and trigger the import to run again.
In some cases the mappings can be undertaken in a one off process – a typical example is country names. One system might have ‘United Kingdom’ where another has ‘Great Britain’. Or regional organisations might be by continent vs. more generic terms like ‘East’ and ‘West’.
In conjunction with the customer we may decide that we really do need some of the missing data. For example, if quality is one of our key KPI's then we absolutely need to get that Asian return rate data. I’ve frequently written a simple data capture website for this as mentioned above. The returns data can then be captured and stored in the data warehouse as items arrive back in the physical warehouse. It’s not a perfect solution but if the data is that important then we need to start capturing it as soon as possible which demands the simplest possible solution.
At the same time I’d recommend a project is kicked off to capture this data in a more suitable manner, ideally in line with other parts of the business. We can then plan to simply retire the short term data entry website once the business has ‘caught up’.
Once some simple reporting has been implemented the business is then able to determine where future investment should be made. Typically this will be to improve the reporting perhaps by investing in a complete enterprise BI suite with dashboards, scorecards and reports etc. It might prove to be more important to improve the quality, amount or type of data. Some businesses choose to expand the scope and add more (perhaps lower level or operational) KPI's. I’ve even seen cases where the data has shown that immediate business process change must be implemented.
An intelligent business provides relevant up-to-date information to strategic decision makers to allow them to make effective decisions against their goals and so drive the business forward. A Business Intelligence project should be led by the business – not by an IT vendor or particular product.
Business Intelligence projects must therefore be approached by focusing on the KPI's and then following the structured process covered here to produce an effective decision making tool. I have also outlined some of the pragmatic decisions that must be made in order to deliver the project as efficiently as possible.
These logical, pragmatic methods ensure that an effective business intelligence system can be built in a timely manner. It will then present appropriate data to key decision makers so that they can run an intelligent business.