Exploratory Data Analysis: Shape of the Data

Start to understand how your time series data looks
machine-learning
time-series
Author

Mike Tokic

Published

October 15, 2024

This post is part of the exploratory data analysis chapter within a larger learning series around time series forecasting fundamentals. Check out the main learning path to see other posts in the series.

The example data used in this series can be found here.

Initial Data Smell Test

The first step of exploratory data analysis for time series is getting a feel for the data at a high level. This doesn’t involve complex statistical analysis. Just simple charts to see how things look. When analyzing these charts we’ll use the most underrated concept in machine learning. The human smell test. This involves using your eye balls to look at the data and understand what could be going on based on your domain experience. Let’s chart the time series in a few ways and give it a smell test.

Total revenue by month

Looking at the data all up shows a pretty stable upward trend with a few bumps in the data. Right now it’s hard to tell if those bumps in 2019 and 2021 are outliers or not. We would have to dig deeper into the data. The trend seems to be growing exponentially. Meaning it’s not a straight line but instead one that seems to grow faster each year. This kind of hockey stick growth is what all businesses aspire to. Lastly it’s hard to tell if there is any strong seasonality.

Total country revenue by month

Breaking out the data by each country gives us a better picture of what’s going on in our monthly sales. Canada is the country with the exponential growth, combined with a few spikes in some months. These spikes could be one off events that are unpredictable, or something happened in those months that can be quantified in data.

Revenue in the United States has more of a linear upward trend with strong seasonality. There is something strange going on in 2020 though. Any guesses on what that could be?

Total product revenue by month

Looking at revenue by product tells a similar story as revenue by country. Cookie sales seem to be a lot more bumpier than ice cream, which is more stable and growing fast.

Revenue for each individual time series by month

Finally we can view each time series individually. Our initial hunches at higher aggregations are now confirmed. We can see Canada cookie sales have those crazy spikes. After talking to our sales and marketing teams, we learn that the spikes are caused by new cookie product launches. Ice cream sales in Canada is the one series with the exponential growth. Both don’t have much seasonality.

Revenue in the United States is all about seasonality. Cookie sales show an interesting dip in 2020 but seem to rebound in 2021. This could be caused by the impact of COVID. Ice cream sales has both strong trend and seasonality. This will make it easy to forecast going forward. The more stable the trends and seasonality in your time series, the easier it is to create an accurate future forecast. The future should always resemble the past.

Final Thoughts

We have now completed our initial smell test of the historical data. Based on our domain knowledge of the business we can start to understand what happened in the past, and use that information to help us create a better future forecast. The smell test was easy on this dataset since there are only four time series. You might be working with thousands of time series at your company, so higher level smell checks might be the only thing you can do. There are more automated ways at analyzing the data, and will be explored more in future posts.