4 September 2020

We’ve spoken about the importance of volumetric data previously on Fuzzy Friday, but as we are all aware, data in general is an important ingredient in the success and accuracy of most projects, not only warehousing or logistics projects. Data creates a summary of what is happening right now and also reveals the underlying trends and information that can be used to plan for the future. To ensure your analyses and modelling are accurate, there are three requirements that should be identified and defined at the beginning of any data-based project; getting them right will help make sure you get your analysis right.


Data Champion

This is an often-forgotten role, and filling it early can prevent a lot of pain and inaccuracy further down the line. So what does this Data Champion look like?

The Data Champion is someone who can not only provide you with the data you need, but who also understands it and can offer guidance on working with it. For example, what does each data field signify? Are dimensions in mm, cm or m? What do the values of a particular field represent? Ideally, they will also provide you with the best data for your task, taking care to exclude any exceptions or anomalies, and will be able to answer most of your data-related questions. On top of this, they should be able to confirm that what you see in the data is accurate and representative of what is actually happening.
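To make the units question concrete, here is a minimal sketch of the kind of ambiguity a Data Champion resolves. The column names and the confirmation that dimensions are in millimetres are hypothetical; the point is that a single wrong assumption about units throws every downstream volume out by orders of magnitude.

```python
import pandas as pd

# Hypothetical carton extract. The Data Champion has confirmed the
# dimensions are recorded in mm -- if we had wrongly assumed metres,
# every volume below would be overstated by a factor of a billion.
cartons = pd.DataFrame({
    "sku":    ["A1", "B2"],
    "length": [400, 350],   # mm (confirmed by the Data Champion)
    "width":  [300, 250],   # mm
    "height": [200, 150],   # mm
})

MM_PER_M = 1000

# Normalise to cubic metres for volumetric analysis.
cartons["volume_m3"] = (
    cartons["length"] * cartons["width"] * cartons["height"]
) / MM_PER_M**3

print(cartons)
```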

Data validation is an important milestone in a project, as it is at this stage that you determine whether the current-state summary created from the data, or the baseline as we like to call it, is accurate. Moving forward with an inaccurate baseline will result in all subsequent analysis and modelling being incorrect.
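As a minimal sketch of what that validation step might look like (the order data and the confirmed figures below are made up for illustration), the idea is simply to reconcile the baseline against numbers the Data Champion already trusts:

```python
import pandas as pd

# Hypothetical raw order lines from the source system.
orders = pd.DataFrame({
    "order_id": [101, 101, 102, 103],
    "qty":      [5, 3, 10, 2],
})

# Baseline summary built from the raw data.
baseline = {
    "total_orders": orders["order_id"].nunique(),
    "total_units":  int(orders["qty"].sum()),
}

# Figures the Data Champion has confirmed from the business's own reporting.
confirmed = {"total_orders": 3, "total_units": 20}

# Reconcile: any mismatch here needs explaining before modelling continues.
for measure, value in baseline.items():
    status = "OK" if value == confirmed[measure] else "MISMATCH"
    print(f"{measure}: baseline={value}, confirmed={confirmed[measure]} -> {status}")
```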

The Data Champion doesn’t necessarily need to be one person; most often the role involves multiple stakeholders working together. In most organisations we have worked with, a technical person provides us with the data and helps us prepare it, while someone else helps validate the baseline we create from it to ensure we’re on the right track.


Data Source

Together with the Data Champion, a key item to determine and define at the beginning of any project requiring data is the data source(s) required. Remember: Garbage In, Garbage Out. If you use the wrong data sources, this can affect all your downstream analyses, models and outputs. Given the multiple systems and software applications in use throughout most enterprises, it’s likely that different measures come from different sources, for example, one system for outbound freight, one for inbound freight and a main ERP system. Ideally, all these data sources will be transformed and collated in a Data Warehouse for easy access.

If your organisation has a Data Warehouse, this should be your primary, if not only, source-of-truth for all things data. If not, it’s important to identify the source-of-truth for each of the data sets you are going to be working with. This ensures you are working on unadulterated, raw data which has no transformations or filters applied to it. Identifying a source-of-truth also ensures that your results can be compared with existing analytic models or dashboards, reducing variability in results.

There will always be the temptation to take the quick and easy route and extract the data from a user-facing software application, or to use some existing Excel sheets currently used for analysis, especially if you have not yet identified the Data Champion. Where possible, the best way to obtain data is through a direct database export, with the help of the Data Champion. Database tables give you access to the raw data and, most importantly, ensure the data arrives in a clean tabular format. We’ve come across data extracts in non-tabular form which can still be used, but they introduce a lot of complexity and effort into the data preparation and cleaning stage. A direct export might look something like the sketch below.
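This is only a minimal sketch, and the connection string, credentials and table name are all hypothetical placeholders you would agree with the Data Champion; the point is that reading straight from a table lands the raw rows in a tabular structure with no hidden filters applied:

```python
import pandas as pd
from sqlalchemy import create_engine

# Hypothetical connection details -- substitute your own database
# dialect, credentials and table name (arranged with the Data Champion).
engine = create_engine("postgresql://analyst:secret@dwh.example.com/warehouse")

# Pulling straight from the table gives raw, untransformed rows in a
# tabular format, ready for the preparation and cleaning stage.
order_lines = pd.read_sql("SELECT * FROM order_lines", engine)
print(order_lines.head())
```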


Data Level

You have the Data Champion and the Data Source, so everything should be smooth sailing from now on, right? Only if you make sure you have the right amount of data, at the right level of granularity. I’ve previously covered the Vs of Data and how much data is too much: it’s always important to ensure you have the right amount. Too much, and you spend time on data that doesn’t provide any value; too little, and you have gaps in your modelling and analysis. The granularity of the data is a key factor to keep in mind as well. Too much granularity and you can get swamped with unnecessary data lines that don’t provide much value. Too little granularity and you can miss key trends and seasonality that can make or break your analysis. For example, monthly sales data will not show you the underlying weekly or daily trends that need to be taken into account, especially when designing a warehouse; the sketch below illustrates the point.
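Here is a minimal illustration with made-up daily despatch volumes (the Monday peak is baked in deliberately): at daily granularity the design-driving peak is obvious, while a monthly extract would only ever show you the average.

```python
import pandas as pd
import numpy as np

# Hypothetical daily despatch volumes over 8 weeks, with a Monday
# peak baked in to mimic a weekly ordering cycle.
days = pd.date_range("2020-07-06", periods=56, freq="D")
daily = pd.Series(np.where(days.dayofweek == 0, 500, 100), index=days)

# At daily granularity the peak a warehouse must be sized for is obvious...
print("Peak day:", daily.max())            # 500 units

# ...but monthly granularity only shows the average daily volume
# (roughly 150-160 units/day here), hiding the Monday spike entirely.
print(daily.resample("M").mean().round(1))
```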


Looking at the 3Ds above – Data Champion, Data Source and Data Level – they all work together, and leaving any one of them out could be detrimental to the success of the project. The 3Ds are by no means a complete recipe for project success, but they are three key requirements we have identified. Look out for future Fuzzy Friday posts as we delve further into additional requirements and best practices for data analysis.


Yohan Fernando is the Manager – Systems & Data Science at Fuzzy LogX, who are the leading warehouse, logistics, and process improvement consultants in Australia. Fuzzy LogX provide project management & consulting services, leading-edge data analytics, process improvements, concept design & validation, solution/software tendering, implementation and solution validation services to businesses with Storage & Distribution operations looking to improve their distribution centres.