How to Use an ARIMA Model to Predict the Stock Market (Time Series Analysis in Python)
What is a Time Series?
A time series is a series of data points indexed in time order. It is a sequence of discrete-time data. One example of time series data is historical stock prices.
One of the main concepts in time series analysis is stationarity.
What is stationary data?
“Stationary” means the statistical structure of the time series is independent of time: its statistical properties, such as the mean and variance, do not change over time.
The following chart of US GDP is an example of time series data. We can see that the mean and variance of GDP change over time, so this is an example of non-stationary time series data.
There are three different ways to check whether a data set is stationary or not.
First, by plotting the data, as we did in the chart above. We can see an obvious upward trend in the data, which suggests a non-stationary data set.
Second, by measuring summary statistics of the data, such as the average and standard deviation, at various points in time and checking for obvious or significant differences between them. If we find significant differences in the average and standard deviation, the data is non-stationary; otherwise, it is stationary.
Third, we can conduct statistical tests to check whether the expectations of stationarity are met or violated. The Augmented Dickey-Fuller test is one such test and provides a p-value. If the p-value is less than 0.05, we can assume that the data is stationary.
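As a minimal sketch of the second and third checks (using a synthetic random-walk series rather than real market data; the variable names are illustrative):

import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller

# A random walk (cumulative sum of noise) is a classic non-stationary series
np.random.seed(0)
series = pd.Series(np.random.normal(size=500).cumsum())

# Second check: compare summary statistics over the two halves of the data
print(series[:250].mean(), series[:250].std())
print(series[250:].mean(), series[250:].std())

# Third check: Augmented Dickey-Fuller test (p-value < 0.05 suggests stationarity)
p_value = adfuller(series, autolag='AIC')[1]
print('ADF p-value:', p_value)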
Statistical time series methods and modern time series models take advantage of stationary data that does not depend on time, because we need a model whose parameters and structure are stable over time (no change over time). Stationarity also matters because it provides a framework in which averaging can properly be used to describe the time series behavior. So, we need to find a way to convert non-stationary data into stationary data.
One way to convert non-stationary data into stationary data is to difference it. How? We subtract one time period from another. For example, historical daily stock prices typically trend upward or downward and have a mean that changes over time, so we can calculate daily (weekly, monthly…) returns in order to make the series stationary. If the daily (weekly, monthly…) returns still look trendy, we can difference them one more time in order to obtain a stationary series.
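As a tiny sketch of differencing with pandas (again on a synthetic trending series; the full AAPL example below does the same thing on real data):

import numpy as np
import pandas as pd

np.random.seed(1)
prices = pd.Series(100 + np.random.normal(0.1, 1, 500).cumsum())  # drifting "price" series

returns = np.log(prices).diff().dropna()   # first difference of log prices = log returns
second_diff = returns.diff().dropna()      # difference again if the returns still look trendy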
How to analyze time series data?
Autocorrelation:
Almost all time series data have an inherent property: at any point in time, the data point is slightly or highly dependent on the previous data values. The correlation of a variable with itself at different time periods in the past is known as autocorrelation. Let’s see what it looks like.
In the code below I downloaded “AAPL” stock prices and examined their dependency on previous values.
import yfinance as yf
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

symbol = 'AAPL'
df = yf.download(symbol, period='10y')
df_week = df.resample('w').mean()

# Calculating the weekly return
df_week['weekly_return'] = np.log(df_week['Close']).diff()

# Drop null rows
df_week.dropna(inplace=True)

# Plotting the weekly return
df_week['weekly_return'].plot(kind='line', figsize=(12, 5))

# Dropping unused columns
udiff = df_week.drop(['Open', 'High', 'Low', 'Adj Close', 'Volume', 'Close'], axis=1)

# Plot autocorrelation of the weekly return with 10 lags
from statsmodels.graphics.tsaplots import plot_acf
plot_acf(udiff.values, lags=10)
plt.show()
Now let's see the plot:
As you can see in this weekly return plot of Apple, the series fluctuates around zero, and it seems to be stationary.
Now let's do a Dickey-Fuller test in order to check whether the weekly returns are stationary or not.
import statsmodels.api as sm

dftest = sm.tsa.adfuller(udiff['weekly_return'], autolag='AIC')
output = pd.Series(dftest[0:4], index=['Test Statistic', 'P-value', '# lags used', 'Number of observations used'])
for key, value in dftest[4].items():
    output['Critical value ({0})'.format(key)] = value
print(output['P-value'])
# :>>> 3.578168870749996e-20
The p-value is far below 0.05, so we can treat the weekly returns as stationary. In the autocorrelation plot above, we can see that at lag = 1 there is a significant correlation, suggesting that Apple's weekly return is significantly correlated with the previous week's value, and we can use this information to predict future values.
We just talked about autocorrelation. Now it is time to talk about autoregression (AR).
What is the difference between an autocorrelation and an autoregression process?
Correlation has no direction, meaning the correlation between X and Y is the same as the correlation between Y and X. Regression is different: in regression, direction matters, as in Y regressed on X.
In ARIMA modeling we start by choosing the order of the lags to regress on, and as explained above we get an insight from the autocorrelation information to find the possible lags for our regression process. An AR process is one where autoregression occurs, and our goal is to find the “correct” time lag that best captures the “order” of such an AR process. We usually find more than one significant correlation in our autocorrelation study and may need to test all of them to find the best AR process for our prediction.
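As a hedged sketch of fitting a pure AR process in statsmodels (the lag choice and variable names here are illustrative, reusing udiff from the AAPL code above; this is not the article's final model):

from statsmodels.tsa.ar_model import AutoReg

# AR(1): this week's return regressed on last week's return
ar_fit = AutoReg(udiff['weekly_return'], lags=1).fit()
print(ar_fit.params)  # intercept and lag-1 coefficient

# Predict the next two (out-of-sample) weekly returns
print(ar_fit.predict(start=len(udiff), end=len(udiff) + 1))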
MA - Moving Average Models:
So far, we have discussed using previous values to predict future values. We can also use the average of past observations to predict the future.
Moving Average (MA) models don’t take the previous ‘Y’
values as inputs, but rather take the previous error terms (how wrong our
moving average was compared to the actual value).
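A moving-average-only model can be written as ARIMA(0, 0, q). A minimal sketch, again reusing udiff from the earlier code (the order here is illustrative):

from statsmodels.tsa.arima.model import ARIMA

# MA(1): this week's return modeled from the previous week's forecast error
ma_fit = ARIMA(udiff['weekly_return'], order=(0, 0, 1)).fit()
print(ma_fit.summary())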
ARIMA Model:
By combining an AR process with an MA model, we get an ARMA model. The ARIMA model is one step further than the ARMA model. The “I” stands for integrated. In other words, we are combining AR and MA techniques into a single integrated model, and this integration (differencing) step helps us obtain ‘stationary’ data.
In ARIMA (p, d, q) we have 3 parameters.
p: the number of autoregressive lags, which we derive with the help of the correlation plots (see below).
q: how many prior time periods we consider for observing sudden trend changes (the number of lags in the MA part of the ARIMA model).
d: if d = 1, it tells the model that we are now predicting the difference between one prior period and the new period, rather than predicting the new period's value itself. As already mentioned, this can make the series stationary for the model.
If d = 2, it may capture exponential movements in our time series, but it is not frequently used.
We can also set d = 0 if we are sure that the time series is already stationary.
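To make the role of d concrete, here is a hedged sketch (model orders are illustrative; df_week comes from the earlier AAPL code): fitting on the raw weekly closing prices with d = 1 lets the model do the differencing itself, while fitting on the already-differenced weekly returns uses d = 0.

from statsmodels.tsa.arima.model import ARIMA

# d = 1: the model differences the (non-stationary) weekly closing prices internally
fit_on_prices = ARIMA(df_week['Close'], order=(1, 1, 1)).fit()

# d = 0: the weekly returns passed in are already stationary
fit_on_returns = ARIMA(df_week['weekly_return'], order=(1, 0, 1)).fit()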
Now let us continue the previous example and forecast the Apple stock price for two weeks from now.
Making autocorrelation and partial autocorrelation charts helps us choose hyperparameters for the ARIMA model.
The ACF gives us a measure of how much each "y" value is correlated with the previous n "y" values (parameter q).
The PACF, the partial autocorrelation function, gives us (a sample estimate of) the amount of correlation between two "y" values separated by n lags, excluding the impact of all the "y" values in between them (parameter p).
The ACF plot suggests choosing lag = 1 for our q parameter, and the PACF plot suggests choosing either lag = 1 or lag = 3 for our p parameter in the ARIMA model.
Now let's use these parameters in our model. Since we already use the weekly returns (the differenced data), the I (d) is 0 in our ARIMA model.
from statsmodels.tsa.arima.model import ARIMA

# Notice that we fit on udiff (the weekly returns) - the differenced data -
# rather than the original prices, which is why d = 0 below
ar1 = ARIMA(udiff['weekly_return'], order=(3, 0, 1)).fit()  # order = (p, d, q)
print(ar1.summary())

# Forecasting 2 weeks ahead
steps = 2
forecast = ar1.forecast(steps=steps)
print(forecast.values)
# :>>> [0.00945523 0.00603272]
The model predicts that the weekly return will be 0.00945523 next week and 0.00603272 two weeks from now.
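Since the model forecasts weekly log returns rather than prices, here is a hedged sketch of turning those two numbers back into rough price levels (assuming df_week and forecast from the code above):

import numpy as np

last_close = df_week['Close'].iloc[-1]

# Each forecast value is a weekly log return, so prices compound via exp of the cumulative sum
predicted_prices = last_close * np.exp(np.cumsum(forecast))
print(predicted_prices)  # approximate price levels one and two weeks ahead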