How to Use an ARIMA Model to Predict the Stock Market (Time Series Analysis in Python)
What is a Time Series?
A time series is a series of data points indexed in time order. It is a sequence of discrete-time data. One example of time series data is historical stock prices.
One of the main concepts in time series analysis is stationarity.
What is stationary data?
“Stationary” means the statistical structure of the time series is independent of time: its statistical properties, such as the mean and variance, do not change over time.
The following chart of US GDP is an example of time series data. We can see that the mean and variance of GDP change over time, so this is an example of non-stationary time series data.
There are three different ways to check whether a data set is stationary or not.
First, by plotting the data, as we did in the chart above. We can see an obvious upward trend in the data, which suggests a non-stationary data set.
Second, by measuring summary statistics of the data, such as the average and standard deviation, at various points in time and checking for obvious or significant differences between them. If we find significant differences in the average and standard deviation, the data is non-stationary; otherwise, it is stationary.
Third, we can conduct statistical tests to check whether the expectations of stationarity are met or violated. The Augmented Dickey-Fuller test is one such test and provides a p-value. If the p-value is less than 0.05, we can assume that the data is stationary.
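As a minimal sketch of the second and third checks (using a synthetic random-walk series rather than real market data; the variable names are illustrative):

import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller

# A random walk (cumulative sum of noise) is a classic non-stationary series
np.random.seed(0)
series = pd.Series(np.random.normal(size=500).cumsum())

# Second check: compare summary statistics over the two halves of the data
print(series[:250].mean(), series[:250].std())
print(series[250:].mean(), series[250:].std())

# Third check: Augmented Dickey-Fuller test (p-value < 0.05 suggests stationarity)
p_value = adfuller(series, autolag='AIC')[1]
print('ADF p-value:', p_value)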
Statistical time series methods and modern time series models take advantage of stationary data that does not depend on time, because we need a model whose parameters and structure are stable over time (no change over time). Stationarity also matters because it provides a framework in which averaging can properly be used to describe the time series behavior. So, we need to find a way to convert non-stationary data into stationary data.
One way to convert non-stationary data into stationary data is to difference it. How? We subtract one time period from another. For example, historical daily stock prices typically trend upward or downward and have a mean that changes over time, so we can calculate daily (weekly, monthly…) returns in order to make the series stationary. If the daily (weekly, monthly…) returns still look trendy, we can difference them one more time in order to obtain a stationary series.
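As a tiny sketch of differencing with pandas (again on a synthetic trending series; the full AAPL example below does the same thing on real data):

import numpy as np
import pandas as pd

np.random.seed(1)
prices = pd.Series(100 + np.random.normal(0.1, 1, 500).cumsum())  # drifting "price" series

returns = np.log(prices).diff().dropna()   # first difference of log prices = log returns
second_diff = returns.diff().dropna()      # difference again if the returns still look trendy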
How to analyze time series data?
Autocorrelation:
Almost all time series data have an inherent property: at any point in time, the data point is slightly or highly dependent on the previous data values. The correlation of a variable with itself at different time periods in the past is known as autocorrelation. Let’s see what it looks like.
In the code below I downloaded “AAPL” stock prices and examined their dependency on previous values.
import yfinance as yf
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

symbol = 'AAPL'
df = yf.download(symbol, period='10y')
df_week = df.resample('w').mean()

# Calculating the weekly return
df_week['weekly_return'] = np.log(df_week['Close']).diff()

# Drop null rows
df_week.dropna(inplace=True)

# Plotting the weekly return
df_week['weekly_return'].plot(kind='line', figsize=(12, 5))

# Dropping unused columns
udiff = df_week.drop(['Open', 'High', 'Low', 'Adj Close', 'Volume', 'Close'], axis=1)

# Plot autocorrelation of the weekly return with 10 lags
from statsmodels.graphics.tsaplots import plot_acf
plot_acf(udiff.values, lags=10)
plt.show()
Now let's see the plot:
As you can see in this weekly return plot of Apple, the series fluctuates around zero, and it seems to be stationary.
Now let's do a Dickey-Fuller test in order to check whether the weekly returns are stationary or not.
import statsmodels.api as sm

dftest = sm.tsa.adfuller(udiff['weekly_return'], autolag='AIC')
output = pd.Series(dftest[0:4], index=['Test Statistic', 'P-value', '# lags used', 'Number of observations used'])
for key, value in dftest[4].items():
    output['Critical value ({0})'.format(key)] = value
print(output['P-value'])
# :>>> 3.578168870749996e-20
The p-value is far below 0.05, so we can treat the weekly returns as stationary. In the autocorrelation plot above, we can see that at lag = 1 there is a significant correlation, suggesting that Apple's weekly return is significantly correlated with the previous week's value, and we can use this information to predict future values.
We just talked about autocorrelation. Now it is time to talk about autoregression (AR).
What is the difference between an autocorrelation and an autoregression process?
Correlation has no direction, meaning the correlation between X and Y is the same as the correlation between Y and X. Regression is different: in regression, direction matters, as in Y regressed on X.
In ARIMA modeling we start by choosing the order of the lags to regress on, and as explained above we get an insight from the autocorrelation information to find the possible lags for our regression process. An AR process is one where autoregression occurs, and our goal is to find the “correct” time lag that best captures the “order” of such an AR process. We usually find more than one significant correlation in our autocorrelation study and may need to test all of them to find the best AR process for our prediction.
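As a hedged sketch of fitting a pure AR process in statsmodels (the lag choice and variable names here are illustrative, reusing udiff from the AAPL code above; this is not the article's final model):

from statsmodels.tsa.ar_model import AutoReg

# AR(1): this week's return regressed on last week's return
ar_fit = AutoReg(udiff['weekly_return'], lags=1).fit()
print(ar_fit.params)  # intercept and lag-1 coefficient

# Predict the next two (out-of-sample) weekly returns
print(ar_fit.predict(start=len(udiff), end=len(udiff) + 1))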
MA - Moving Average Models:
So far, we have discussed using previous values to predict future values. We can also use the average of past observations to predict the future.
Moving Average (MA) models don’t take the previous ‘Y’
values as inputs, but rather take the previous error terms (how wrong our
moving average was compared to the actual value).
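A moving-average-only model can be written as ARIMA(0, 0, q). A minimal sketch, again reusing udiff from the earlier code (the order here is illustrative):

from statsmodels.tsa.arima.model import ARIMA

# MA(1): this week's return modeled from the previous week's forecast error
ma_fit = ARIMA(udiff['weekly_return'], order=(0, 0, 1)).fit()
print(ma_fit.summary())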
ARIMA Model:
By combining an AR process with an MA model, we get an ARMA model. The ARIMA model is one step further than the ARMA model. The “I” stands for integrated. In other words, we are combining AR and MA techniques into a single integrated model, and this integration (differencing) step helps us obtain ‘stationary’ data.
In ARIMA (p, d, q) we have 3 parameters.
p: the number of autoregressive lags, which we derive with the help of the correlation plots (see below).
q: how many prior time periods we consider for observing sudden trend changes (the number of lags in the MA part of the ARIMA model).
d: if d = 1, it tells the model that we are now predicting the difference between one prior period and the new period, rather than predicting the new period's value itself. As already mentioned, this can make the series stationary for the model.
If d = 2, it may capture exponential movements in our time series, but it is not frequently used.
We can also set d = 0 if we are sure that the time series is already stationary.
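To make the role of d concrete, here is a hedged sketch (model orders are illustrative; df_week comes from the earlier AAPL code): fitting on the raw weekly closing prices with d = 1 lets the model do the differencing itself, while fitting on the already-differenced weekly returns uses d = 0.

from statsmodels.tsa.arima.model import ARIMA

# d = 1: the model differences the (non-stationary) weekly closing prices internally
fit_on_prices = ARIMA(df_week['Close'], order=(1, 1, 1)).fit()

# d = 0: the weekly returns passed in are already stationary
fit_on_returns = ARIMA(df_week['weekly_return'], order=(1, 0, 1)).fit()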
Now let us continue the previous example and forecast the Apple stock price for two weeks from now.
Making autocorrelation and partial autocorrelation charts helps us choose hyperparameters for the ARIMA model.
The ACF gives us a measure of how much each "y" value is correlated with the previous n "y" values (parameter q).
The PACF, the partial autocorrelation function, gives us (a sample estimate of) the amount of correlation between two "y" values separated by n lags, excluding the impact of all the "y" values in between them (parameter p).
The ACF plot suggests choosing lag = 1 for our q parameter, and the PACF plot suggests choosing either lag = 1 or lag = 3 for our p parameter in the ARIMA model.
Now let's use these parameters in our model. Since we already use the weekly returns (the differenced data), the I (d) is 0 in our ARIMA model.
from statsmodels.tsa.arima.model import ARIMA

# Notice that we fit on udiff (the weekly returns) - the differenced data -
# rather than the original prices, which is why d = 0 below
ar1 = ARIMA(udiff['weekly_return'], order=(3, 0, 1)).fit()  # order = (p, d, q)
print(ar1.summary())

# Forecasting 2 weeks ahead
steps = 2
forecast = ar1.forecast(steps=steps)
print(forecast.values)
# :>>> [0.00945523 0.00603272]
The model predicts that the weekly return will be 0.00945523 next week and 0.00603272 two weeks from now.
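Since the model forecasts weekly log returns rather than prices, here is a hedged sketch of turning those two numbers back into rough price levels (assuming df_week and forecast from the code above):

import numpy as np

last_close = df_week['Close'].iloc[-1]

# Each forecast value is a weekly log return, so prices compound via exp of the cumulative sum
predicted_prices = last_close * np.exp(np.cumsum(forecast))
print(predicted_prices)  # approximate price levels one and two weeks ahead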