Time Series in Machine Learning

Adithyavegi
Published in Analytics Vidhya
Nov 22, 2020

What is a time series?

A time series is a sequence of data points indexed (listed or graphed) in time order. Most commonly, the observations are taken at successive, equally spaced points in time, which makes the series a sequence of discrete-time data. Examples include stock prices over a fixed period of time, hotel bookings, e-commerce sales, and weather reports.
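
To make "indexed in time order" and "equally spaced" concrete, here is a minimal sketch using pandas. The dates, the values, and the name `daily_sales` are made up for illustration.

```python
import pandas as pd

# Seven equally spaced (daily) timestamps; dates and values are hypothetical.
index = pd.date_range(start="2020-11-01", periods=7, freq="D")
daily_sales = pd.Series([12, 15, 11, 14, 18, 21, 19], index=index, name="daily_sales")

# A time series is ordered by time, so the index should be increasing.
print(daily_sales.index.is_monotonic_increasing)

# Equal spacing: every gap between consecutive timestamps is the same.
gaps = daily_sales.index.to_series().diff().dropna()
print(gaps.nunique() == 1)

# Time-based selection works directly on the index (both ends inclusive).
window = daily_sales.loc["2020-11-03":"2020-11-05"]
print(window)
```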

Time series analysis comprises methods for analyzing time series data in order to extract meaningful statistics and other characteristics of the data. Time series forecasting is the use of a model to predict future values based on previously observed values.

Let’s discuss a few definitions related to time series first.

Definitions

  • Level: The level is the average value of the series.
  • Trend: The trend is the long-term direction of the data. For example, stock prices may increase with time (uptrend), decrease with time (downtrend), or stay roughly unaffected by time (horizontal trend).
  • Seasonality: When the data shows a pattern that repeats over a fixed period, typically within a year, it is called a seasonal pattern. For example, the sale of air conditioners increases every year during summer and decreases during winter.
  • Cyclic Patterns: These are repetitive patterns that play out over a longer period of time (more than one year). For example, every five years the share market shows some fluctuations due to the general elections.
  • Noise: The variations that do not show any pattern.
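
The definitions above can be sketched by building a synthetic series from its components. Everything here (48 months, the slope of 0.5, the amplitude of 10, the noise scale) is an assumption chosen for illustration, and adding the components together is only one possible way to combine them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Four years of monthly data; all numbers below are illustrative assumptions.
n_months = 48
t = np.arange(n_months)

level = 100.0                                     # baseline value
trend = 0.5 * t                                   # steady upward drift (uptrend)
seasonality = 10.0 * np.sin(2 * np.pi * t / 12)   # pattern repeating every 12 months
noise = rng.normal(scale=2.0, size=n_months)      # irregular variation

series = level + trend + seasonality + noise

# Averaging over a full 12-month window cancels the seasonal term,
# so the yearly means mainly reflect level + trend.
yearly_means = series.reshape(4, 12).mean(axis=1)
print(yearly_means)

# Fitting a straight line gives rough estimates of the trend slope and level.
slope, intercept = np.polyfit(t, series, deg=1)
print(slope, intercept)
```

The yearly means rise from one year to the next because the seasonal term averages out over each full year while the trend keeps adding to the level.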

Let’s now take an example to see what was done before the advent of Time Series Analysis.

Let’s say that we have a problem at hand where we have been asked to predict the sales of skiing products for a sports manufacturer. You can do the predictions using the following methods:

Old Methods

  • Using Average: You might give the prediction as the average of all the previous values.
  • Using Moving Average: This is the average of the previous values over a fixed period. For example, you might predict the sales in November based on the average of the past three months: August, September and October. If you are predicting the sales for December, the past three months will be September, October and November. The number of months considered stays the same, but the window moves from one set of months to the next. Hence the name Moving Average.
  • Using the Naive Method: The naive method says that the prediction will be the same as the last observed value. For example, the prediction for November will be the sales for October.
  • Using the Seasonal Naive Method: The seasonal naive method is similar to the naive method. Here, the new prediction is equal to the sales for the same period in the previous season. For example, the prediction for this November will be the sales for last November.
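
The four methods above can be sketched in a few lines of plain Python. The `sales` figures and the function names are hypothetical, and a season is assumed to be 12 months long.

```python
# Hypothetical monthly sales, oldest first: two full years of data.
sales = [
    80, 70, 50, 30, 20, 10, 10, 15, 30, 50, 75, 95,    # year 1 (Jan..Dec)
    85, 72, 55, 33, 22, 12, 11, 18, 34, 56, 80, 100,   # year 2 (Jan..Dec)
]

def average_forecast(history):
    """Predict the mean of all previous values."""
    return sum(history) / len(history)

def moving_average_forecast(history, window=3):
    """Predict the mean of the last `window` values."""
    recent = history[-window:]
    return sum(recent) / len(recent)

def naive_forecast(history):
    """Predict the most recent value."""
    return history[-1]

def seasonal_naive_forecast(history, season_length=12):
    """Predict the value observed one full season ago."""
    return history[-season_length]

# Forecasts for the next month (January of year 3).
print("average:        ", average_forecast(sales))
print("moving average: ", moving_average_forecast(sales, window=3))
print("naive:          ", naive_forecast(sales))
print("seasonal naive: ", seasonal_naive_forecast(sales, season_length=12))
```

For strongly seasonal data like skiing products, the seasonal naive forecast looks back to the previous January rather than to the most recent month, which is why it can behave very differently from the plain naive method.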

Let's look at the coding part of time series in Python.

ARIMA (AutoRegressive Integrated Moving Average)

import numpy as np
from scipy import stats
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.graphics.api import qqplot
%matplotlib inline
In [ ]:
female_birth_data = pd.read_csv("daily-total-female-births-CA.csv")
# This is a freely available dataset
In [ ]:
female_birth_data.head()

Out[ ]:

In [ ]:
# Use the first column as the index and parse it as dates
birth_data = pd.read_csv("daily-total-female-births-CA.csv", index_col=[0], parse_dates=[0])
birth_data.head()

Out[ ]:

In [ ]:
birth_data.describe()

Out[ ]:

In[]:
birth_data.plot() #almost a stationary series

Out[ ]:

In [ ]:
# also called smoothing
# window: the number of observations used for calculating the statistic
moving_average_birth = birth_data.rolling(window=20).mean()
In [ ]:
moving_average_birth

Out[22]:

In [ ]:
moving_average_birth.plot() # we can see that there is a peak in the month of october

Out[ ]:

<matplotlib.axes._subplots.AxesSubplot at 0x1f150907b00>

In [25]:

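# Durbin-Watson is meant for zero-mean residuals, where a value near 2 means no autocorrelation.
# On the raw series (mean far from zero) the statistic is close to 0 mostly because of the mean,
# so the value below should not be read as evidence of low correlation.
sm.stats.durbin_watson(birth_data)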

Out[25]:

array([0.04624491])

In [ ]:
# show plots in the notebook
%matplotlib inline
fig = plt.figure(figsize=(12,8))
ax1 = fig.add_subplot(211)
fig = sm.graphics.tsa.plot_acf(birth_data.values.squeeze(), lags=40, ax=ax1)
ax2 = fig.add_subplot(212)
fig = sm.graphics.tsa.plot_pacf(birth_data, lags=40, ax=ax2)
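
To see what the Durbin-Watson statistic and the autocorrelation plot measure, here is a rough sketch that computes both by hand with NumPy. It follows the textbook formulas and is not the statsmodels implementation; the helper names and the synthetic `counts` array (uncorrelated noise around a mean of 40) are assumptions standing in for the births data.

```python
import numpy as np

def durbin_watson_stat(values):
    """Sum of squared successive differences divided by the sum of squares."""
    values = np.asarray(values, dtype=float)
    return np.sum(np.diff(values) ** 2) / np.sum(values ** 2)

def autocorrelation(values, lag):
    """Sample autocorrelation at a given lag (lag >= 1)."""
    values = np.asarray(values, dtype=float)
    centered = values - values.mean()
    return np.sum(centered[lag:] * centered[:-lag]) / np.sum(centered ** 2)

rng = np.random.default_rng(42)
# Synthetic stand-in for daily counts: uncorrelated noise around a mean of 40.
counts = 40 + rng.normal(scale=7.0, size=365)

# On the raw series the large mean dominates the denominator.
dw_raw = durbin_watson_stat(counts)
# After removing the mean, uncorrelated data gives a value near 2.
dw_centered = durbin_watson_stat(counts - counts.mean())
# Lag-1 autocorrelation of uncorrelated data is near 0.
lag1 = autocorrelation(counts, lag=1)

print(dw_raw, dw_centered, lag1)
```

Even though this synthetic series has no autocorrelation at all, the statistic on the raw values comes out close to 0, which is why it is usually applied to model residuals rather than to the series itself.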
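In [ ]:
# Chronological split: the first 320 days for training, the rest for testing
training_data = birth_data[0:320]
test_data = birth_data[320:]
In [ ]:
from sklearn.model_selection import train_test_split
# train_test_split shuffles rows by default, which breaks the time order;
# shuffle=False keeps it. This split is not used in the cells below.
t_x, t = train_test_split(birth_data, shuffle=False)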
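In [ ]:
# Note: statsmodels.tsa.arima_model was deprecated in statsmodels 0.12 and is not usable
# in later releases. There, import ARIMA from statsmodels.tsa.arima.model instead;
# its forecast() returns the predictions directly, so the [0] indexing is not needed.
from statsmodels.tsa.arima_model import ARIMA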
In []:
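# order=(p, d, q): 2 autoregressive terms, 1 round of differencing, 3 moving-average terms
arima = ARIMA(training_data, order=(2, 1, 3))
model = arima.fit()
model.aic  # Akaike information criterion: lower is better when comparing models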
Out[ ]:
2159.076974912458
In [ ]:
# Forecast the 45 days that follow the training data
pred = model.forecast(steps=45)[0]
pred

Out[ ]:

In [ ]:
from sklearn.metrics import mean_squared_error
# Root mean squared error between the held-out data and the forecast
np.sqrt(mean_squared_error(test_data, pred))
Out[ ]:
6.419420721712673
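
An RMSE is easier to judge next to a simple baseline. The sketch below uses the same 320/45 chronological split and scores two of the old methods from earlier. The `series` array is synthetic (it is not the births dataset), and `rmse` is a hypothetical helper, so the printed numbers are not comparable to the result above.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic stand-in for a year of daily counts (not the births dataset).
series = 42 + rng.normal(scale=7.0, size=365)

# Chronological split: first 320 days for training, last 45 for testing.
train, test = series[:320], series[320:]

def rmse(actual, predicted):
    """Root mean squared error between two equal-length arrays."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.sqrt(np.mean((actual - predicted) ** 2))

# Baselines that repeat one value across the whole forecast horizon.
mean_forecast = np.full(len(test), train.mean())   # average method
naive_forecast = np.full(len(test), train[-1])     # naive method

print("mean baseline RMSE: ", rmse(test, mean_forecast))
print("naive baseline RMSE:", rmse(test, naive_forecast))
```

Running the same comparison on the real data would show whether the ARIMA model improves on these baselines.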

In a follow-up, we will discuss time series with RNNs by predicting stock prices.

If you liked this article, please give it a like, check out my other content, and follow me.
