11 January 2019

As we begin a new year, I thought it would be appropriate to go back to the beginning of a warehouse analytics (or improvement) process and talk about the types of data used and why. Previously, I’ve touched on a few analyses we execute and some of the in-house tools we use at Fuzzy LogX, but all of these would be pointless without the right data. No matter how amazingly perfect your analysis or model is, without the right data your results will be wrong. Always remember: “garbage in, garbage out”.

What’s the problem or question?

The first step of any analytics process (or any problem-solving process) is to define the problem (or the question being asked). This will help you identify what sort of data is required. For warehouse analytics, we at Fuzzy LogX are usually looking to get a better picture of your warehouse operations, especially around order fulfillment and inventory. Some of the questions we aim to answer are:

  • How does the warehouse receive orders?
  • How are the orders picked and dispatched?
  • How much stock is in the warehouse?
  • How much stock is received?
  • Where and how is the stock stored and picked?

Once you know what questions need to be answered, it’s easier to acquire the right data from the relevant sources. So typically, what types of data would a warehouse analytics process require?

Warehouse Analytics Data

Order Line / Transactional Data

The key dataset needed is order line or transactional data: a dataset in which each line (or record) represents a unique order line. From the order line data, you can extract a daily (or weekly or monthly) summary of order lines, orders, quantities and SKUs picked, together with additional analyses such as order profiles and Pareto analysis.

At the bare minimum, order line data records should consist of the following attributes:

  • Date
  • Order ID
  • SKU ID
  • Quantity

This data can be further enriched with other attributes such as order type (store or e-commerce) and container ID (tote/carton licence plate number), which let you view your operations across multiple dimensions, but such additional attributes depend on your business processes and requirements.
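To make the idea concrete, here’s a minimal sketch of the daily summary and Pareto analysis described above, using pandas. The column names and sample values are hypothetical, not a prescribed schema.

```python
import pandas as pd

# Hypothetical order line data with the minimum attributes listed above.
order_lines = pd.DataFrame({
    "Date": pd.to_datetime(["2018-07-02", "2018-07-02", "2018-07-03", "2018-07-03"]),
    "OrderID": ["A1", "A1", "A2", "A3"],
    "SKU": ["SKU-10", "SKU-22", "SKU-10", "SKU-31"],
    "Qty": [4, 1, 2, 6],
})

# Daily summary: order lines, distinct orders, units and SKUs picked per day.
daily = order_lines.groupby("Date").agg(
    lines=("OrderID", "size"),
    orders=("OrderID", "nunique"),
    units=("Qty", "sum"),
    skus=("SKU", "nunique"),
)

# Simple Pareto analysis: cumulative share of units by SKU, busiest SKUs first.
by_sku = order_lines.groupby("SKU")["Qty"].sum().sort_values(ascending=False)
pareto = by_sku.cumsum() / by_sku.sum()
```

The same few lines scale unchanged from a week of data to a full year, which is part of why this dataset is the workhorse of the whole process.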

It’s important to be mindful of which date variable your data contains. Depending on how data is archived and stored, your dataset could contain one or more of several dates, including the Order Date, the Pick Date or the Ship Date. While any of these will provide insights into your operations, take care which question you are answering with each date. For instance,

  • Order date – usually specifies the date/time an order was placed. This helps you understand how you receive orders. It is ideal for determining whether your operations are fulfilling orders in a timely manner and, if you’re designing a new warehouse, what capacity is required. Depending on your order fulfillment processes and time frames, it will also let you look at opportunities for smoothing out peaks or identifying high-volume periods. However, if you are looking at improving your dispatch and shipping processes, it may not be ideal, especially if some orders are not dispatched on the same day they are picked.
  • Pick date – this specifies the date/time an order was picked in your warehouse. Again, depending on how orders are dropped into your WMS/WCS and how they are fulfilled, this helps you summarise how picking is completed, but it will not tell you how the orders are coming in. This is ideal for understanding current pick operation productivity.
  • Ship date – most operations ship orders the same day they are picked, while some orders may only be shipped the next day. This date is ideal for understanding dispatch and shipping volumes, but may not be the best for summarising order fulfillment.

However, some operations may have only one of the above dates archived in their data. Depending on the aim of the analysis and the current state of your warehouse, assumptions and estimations will need to be made. For example, if you have happy customers, you know you get all your orders out in time, and you have only Pick Date, then it’s probably a fair approximation to assume that’s the same date your orders “come in”. However, if you’re always under the pump, have a huge backlog and are looking to improve, Pick Date will only tell you what you’re actually doing right now, whereas Order Date will help you understand how you receive orders, which then allows you to plan your improvements adequately. It’s always important to confirm initial summaries of the analysis with operational staff to ensure that what’s seen in the data reflects what happens in the warehouse.
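When a dataset does hold more than one of these dates, the lag between them is a quick health check on fulfillment. A small hypothetical example (column names assumed):

```python
import pandas as pd

# Hypothetical orders carrying both an order date and a ship date.
orders = pd.DataFrame({
    "OrderID": ["A1", "A2", "A3"],
    "OrderDate": pd.to_datetime(["2018-07-02", "2018-07-02", "2018-07-03"]),
    "ShipDate": pd.to_datetime(["2018-07-02", "2018-07-03", "2018-07-05"]),
})

# Days from order placement to dispatch; a growing lag points to a backlog.
orders["LagDays"] = (orders["ShipDate"] - orders["OrderDate"]).dt.days
```

A lag that is consistently zero supports the “orders come in the day they’re picked” approximation; a spread of lags suggests you really do need Order Date before planning improvements.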

SKU Master

While the order line data provides information about how much product is moving out of your warehouse, the SKU Master provides information about the products themselves. This can include information such as SKU categories and family groups, dimensions and pack sizes.

Dimensions enable the calculation of the cubic volume of product being stored and picked in your warehouse, which helps with slotting and sizing of storage. Dimensions also make it possible to identify bulk SKUs and SKUs that can’t be conveyed or placed in totes/cartons – especially useful if you’re moving from a manual environment to an automated one. Pack sizes help identify whether large orders could be picked as full cases or pallets. SKU category and family group information helps highlight whether certain SKU categories have unique order profile characteristics. SKU categories are also required for slotting if there are specific categories that should be stored together (or shouldn’t be).
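As a sketch of those two calculations, here is a hypothetical SKU master with cube and a conveyability flag derived from the dimensions. The 0.6 m maximum side is an assumed tote constraint, not a standard; substitute your own equipment limits.

```python
import pandas as pd

# Hypothetical SKU master; dimensions in metres, one row per SKU.
sku_master = pd.DataFrame({
    "SKU": ["SKU-10", "SKU-22", "SKU-31"],
    "Length_m": [0.30, 1.40, 0.20],
    "Width_m": [0.20, 0.60, 0.15],
    "Height_m": [0.15, 0.50, 0.10],
})

# Cubic volume per unit, used for slotting and storage sizing.
sku_master["Cube_m3"] = (
    sku_master["Length_m"] * sku_master["Width_m"] * sku_master["Height_m"]
)

# Flag SKUs too large for an assumed tote/conveyor limit of 0.6 m per side.
MAX_CONVEYABLE_SIDE_M = 0.6
sku_master["Conveyable"] = (
    sku_master[["Length_m", "Width_m", "Height_m"]].max(axis=1)
    <= MAX_CONVEYABLE_SIDE_M
)
```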

What else?

While order line data and SKU Master data are the most important datasets, there are a few other datasets which will allow additional, refined analyses.

  • Stock-on-Hand (SOH) – SOH data provides a snapshot of inventory levels at a given point in time. Usually, this also includes information such as minimum and maximum SOH levels, minimum order quantities, lead time, stock on order, stock in transit and average SOH. As this data is a snapshot, analysing multiple snapshots across the year builds a clearer picture of stock requirements, which helps with warehouse sizing and storage requirement calculations. In the absence of SOH data, order line data combined with an average days’ SOH per SKU can also give you an estimated storage and size requirement.
  • Inbound – Inbound (or receiving) data would typically have a line-by-line record for each SKU shipment coming in. While this doesn’t affect pick operations, it helps determine inbound performance requirements such as cartons per hour and TEUs per day, which shape the design of the inbound processes and areas of the warehouse.
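The SOH fallback mentioned above is just average daily demand multiplied by an assumed days-of-cover figure. A minimal sketch, with an assumed 10-day history window and 14 days’ cover:

```python
import pandas as pd

# Hypothetical order line history over a 10-day window.
order_lines = pd.DataFrame({
    "SKU": ["SKU-10"] * 5 + ["SKU-22"] * 2,
    "Qty": [4, 2, 6, 4, 4, 1, 1],
})
DAYS_IN_WINDOW = 10   # assumed length of the history window
DAYS_OF_COVER = 14    # assumed average days' SOH held per SKU

# Average daily demand per SKU, then estimated SOH = daily demand x days of cover.
daily_demand = order_lines.groupby("SKU")["Qty"].sum() / DAYS_IN_WINDOW
est_soh_units = daily_demand * DAYS_OF_COVER
```

Multiplying the estimated units by the cube from the SKU master then turns this into a rough storage volume requirement.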

Too much data (within reason) is never a problem, so occasionally there may be other datasets that you can use as well depending on your own unique processes and requirements.

 

Data Dictionary

As its name implies, the data dictionary is a list of all the attributes within a dataset together with some details about each attribute. It’s not a dataset itself, but ensuring your data comes with a data dictionary allows anyone to look at the dataset, refer to the data dictionary and understand what each attribute represents. At a minimum, it should state what quantity or measure each attribute represents, but it can also include information such as:

  • If the attribute is a calculation, what’s the underlying formula
  • What type of data is in each attribute
  • What’s the typical range
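For illustration, a tiny data dictionary for the order line dataset could be kept as something as simple as a mapping (all attribute names, types and ranges here are hypothetical):

```python
# A minimal, hypothetical data dictionary for the order line dataset.
data_dictionary = {
    "Date": {
        "description": "Date/time the order was picked",
        "type": "datetime",
        "typical_range": "2018-01-01 to 2018-12-31",
    },
    "OrderID": {
        "description": "Unique identifier of the customer order",
        "type": "string",
        "typical_range": "alphanumeric, 8 characters",
    },
    "Qty": {
        "description": "Units picked on this order line",
        "type": "integer",
        "typical_range": "1 to 500",
        "formula": "n/a (raw measure, not calculated)",
    },
}
```

In practice this usually lives in a spreadsheet tab next to the data, but the content is the same: one entry per attribute, with enough detail that a newcomer needn’t guess.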

 

How much data?

Especially when using data to plan for the future, it’s a good idea to look at as much data as possible; we recommend at least a year’s worth. Looking at an entire year captures the seasonal trends and peaks that should be accounted for and helps you plan for peaks in your order profile. Looking at a period longer than a year also lets you see longer-term trends within your order profiles.

However, a longer time frame means more records, which increases the dataset’s size. Excel has a limit of approximately 1 million rows and Access limits the database size to 2GB – which is why we created SAM to deal with datasets of all shapes and sizes!
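One common way around such row limits, shown here purely as a sketch, is to process a large export in chunks rather than loading it whole. The tiny inline CSV stands in for a multi-gigabyte file; the column names are assumed.

```python
import io
import pandas as pd

# Stand-in for a year's order line export that would exceed spreadsheet limits.
csv_text = (
    "Date,OrderID,SKU,Qty\n"
    "2018-07-02,A1,SKU-10,4\n"
    "2018-07-02,A1,SKU-22,1\n"
)

# Stream the file in chunks, accumulating only the summary we need.
total_units = 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=100_000):
    total_units += chunk["Qty"].sum()
```

Because only one chunk is in memory at a time, the same pattern scales to datasets far beyond the 1-million-row mark.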

While we’ve covered the main dataset types that are useful in warehouse analytics, each operation will have its own unique set of tables, each with their own columns. That’s why we’ve created an easy reference sheet specifying the types of data and attributes we need. If you want to see what SAM can do with your data, contact us here

 

About the author: Yohan Fernando is the Manager – Systems & Data Science at Fuzzy LogX, the leading warehouse, logistics and process improvement consultants in Australia. Fuzzy LogX provide project management & consulting services, leading-edge data analytics, process improvements, concept design & validation, solution/software tendering, implementation and solution validation services to businesses with storage & distribution operations looking to improve their distribution centres.