This post is part of a larger learning series around time series forecasting fundamentals. Check out the learning path to see other posts in the series.
AI’s Are Like Onions, We Both Have Layers
There’s a lot of terminology in the world of artificial intelligence. With a lot of layers to it. The practice of time series forecasting is many layers down. How deep? Let’s start peeling back the layers.
Artificial Intelligence
Artificial Intelligence, or AI, can be described as the ability for machines to learn and think like humans. It’s as simple as that. While that’s an easy thing to say, it’s a hard thing to do. AI has been around since the mid 20th century, with a lot of ups and downs. There are multiple subdisciplines of AI. Things like robotics and expert systems. But the biggest and most important is machine learning.
Machine Learning
Machine Learning, or ML, is all about creating algorithms learn on their own from data. ML turns the idea of traditional programming on its head. Traditional programming is all about writing out explicit instructions or rules, in the form of code, that tell a computer what to do. ML on the other hand is code in the form of an algorithm where the rules are learned from training data. It’s a unique paradigm shift that has unlocked countless advancements in computing. Self-driving cars, human genome sequencing, and yes even forecasting has all been driven by these algorithms that can learn. Similar to AI, the world of ML has multiple disciplines. The one we’re interested in is supervised learning.
Supervised Learning
Supervised learning is a type of machine learning where you teach a computer to make predictions or decisions using examples. It’s like teaching a child to identify animals by showing them pictures of different animals and telling them what each one is. In supervised learning, you give the computer a bunch of “input” data (like pictures of animals) and the correct “output” (the names of the animals). The computer learns from these examples, and over time it gets better at predicting the outputs based on new inputs it hasn’t seen before. There are a few different flavors of supervised learning, but in this learning series we care about regression.
Regression
Regression is a type of supervised learning method that helps predict a numerical value based on past data. Imagine you want to predict the price of a house based on its size, location, and number of rooms. Regression analyzes the historical data of house prices and their characteristics to find a relationship. Once this relationship is understood, you can input the characteristics of any house and the regression model will predict its price.
Traditional regression modeling usually doesn’t have a time component, but when you need to predict a number over time, that’s when we enter the crazy world of time series.
Time Series
A time series is a series of numerical data points recorded at regular intervals. For example the highest temperature each day over the last 10 years can be thought of as one time series. If we have that temperature data for 50 cities, then we have 50 time series in our dataset. A time series can be at any kind of time interval. Yearly, quarterly, monthly, daily, hourly, etc. You get the idea.
The art of forecasting is just taking historical time series data, and feeding it into a model that can learn from historical trends and seasonality patterns. Ultimately creating a future prediction for the next few periods of time.
Most regression models in machine learning can be turned into a time series forecast model by the process of incorporating time dependant data as features through a process called feature engineering. A feature is just a variable in your dataset that can be used to help predict your target variable. The process of creating features before you train a model is called feature engineering. The target variable is what you want to predict going forward. These machine learning regression models can be thought of as multivariate models, since they can incorporate many different variables or features into a model.
Let’s apply this to a quick example. For a monthly revenue time series forecast, our target variable will be the revenue variable in our dataset. And we can create features like month of the year, how much money we made in the same month last year, etc. to help predict revenue.
There are also models that only exist in the world of time series forecasting. These are more statistical models and less machine learning models. Meaning they operate like equations that can applied to your data compared to a machine learning model that tries to learn the patterns underlying the data. These models are often univariate, which means that you only need the historical target variable values accompanied by a timestamp (list of dates) to train these models. While they might be simple in nature, they are still very useful even today. Often beating more complex models like deep learning.
Time Series Example
Now that we’ve peeled back all of the layers on time series forecasting, let’s take a look at some data. In this learning series I will be using the most common time series that applies to every company in the world, historical revenue. The data we’ll use is completely made up, but created in a way to illustrate a lot of the learning concepts throughout the series. The sales data is recoded at a monthly level, broken out by countries our hypothetical company operates in and what products it sells. Let’s take a quick look at the data.
You can find the raw data here.
In the data there are sales for two countries across two products. This means we have four unique time series we can forecast. One for each unique combination of country and product. Throughout this learning series we will use this data as an example as we work through different time series concepts.
Final Thoughts
Time series is a very niche corner in the AI universe. It may seem small but in reality every company in the world deals with time series. They make up most financial data like sales and are top of mind for every leader in the C-suite. You should now know what a time series is and understand more of the common terms you will hear around machine learning. Now that we have a good foundation of terminology, lets explore the data even more in the next post.