Bolin Wu

Modelling Daily Dow Jones Industrial Average by GARCH

2021-01-14 · 11 min read
R financial data analysis

GARCH is a well-known model for capturing time-varying volatility. It is useful for financial data and other time series. This post explains the model structure, intuition, application and evaluation.

Prerequisites for this post:

  • Basic knowledge of ARMA model, Box-Ljung test, PACF and ACF.
  • Basic knowledge of R programming.

Data Inspection

The data is from Yukai Yang (2020), FE: Data Sets and Functions for the Course "Financial Econometrics", R package version 1.2.2.
It consists of daily Dow Jones Industrial Average prices spanning from January 1915 to February 1990. The column that we will use is r_Dow_Jones, which contains the log returns of the prices.
The reason I use log returns is that the log return is the continuously compounded return of an asset, i.e. the logarithm of its gross return, and log returns add up over time.
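As a quick sanity check of this definition, the sketch below recomputes log returns from the first few prices in the data preview that follows. The prices are typed in by hand as printed (rounded to one decimal), so the results only match r_Dow_Jones approximately. The variable names are mine and not part of the FE package.

```r
# Prices copied from the first rows of the data preview (rounded as printed)
prices <- c(55.5, 55.4, 56.1, 56.6, 57.4, 57.4)

# Log return: r_t = log(P_t) - log(P_{t-1})
log_ret <- diff(log(prices))

# Simple return for comparison: R_t = P_t / P_{t-1} - 1
simple_ret <- prices[-1] / prices[-length(prices)] - 1

print(round(log_ret, 5))

# Log returns add up over time: their sum is the log return over the whole window
total_log_ret <- log(prices[length(prices)] / prices[1])
print(all.equal(sum(log_ret), total_log_ret))

# log(1 + R_t) recovers the log return from the simple return
print(all.equal(log1p(simple_ret), log_ret))
```

The first computed value corresponds to the second row of the preview, because the return in the first row depends on a price that is not shown.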

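# load the packages
library(FE)       # course data sets, including DJ_d
library(tseries)  # arma() and jarque.bera.test()
# this package is for modeling the GARCH
# will use it later
library(fGarch)   # garchFit()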

Let's take a look at the data:


   Date Dow_Jones r_Dow_Jones
  <int>     <dbl>       <dbl>
1 10515      55.5     0.00180
2 10615      55.4    -0.00180
3 10715      56.1     0.0126
4 10815      56.6     0.00887
5 11115      57.4     0.0140
6 11215      57.4     0       

From the time series plots of the log returns and the squared log returns of the daily Dow Jones index, I can see periods of markedly higher volatility, for example around day 5000. The Ljung-Box test gives a p-value below 2.2e-16, which rejects the null of no autocorrelation and indicates that the log returns are serially correlated.
Serial correlation calls for a model of the conditional mean, and volatility clustering is what a GARCH model is designed to capture, so an ARMA-GARCH model could be a remedy.

Box.test(DJ_d$r_Dow_Jones, lag = 5, type = c("Ljung-Box"), fitdf = 0)

Box-Ljung test
data:  DJ_d$r_Dow_Jones
X-squared = 329.9, df = 5,
p-value < 2.2e-16

Model structure and intuition

Next I will give a very brief introduction of the model. The equations in the following parts are derived from Yukai Yang's financial econometrics course at Uppsala University.
To model the log return $r_{t}$, I can decompose it into a predictable part and an unpredictable part:

$$r_{t} = \mu_{t} + \varepsilon_{t}$$

Here $\mu_{t}$ is the conditional mean of $r_{t}$ and $\varepsilon_{t}$ captures the unpredictable part.

Then GARCH(p,q) is given by

$$\begin{aligned} \varepsilon_{t} &= z_{t} \sigma_{t} \\ \sigma_{t}^{2} &= \omega + \sum_{j=1}^{p} \alpha_{j} \varepsilon_{t-j}^{2} + \sum_{j=1}^{q} \beta_{j} \sigma_{t-j}^{2} \end{aligned}$$

$z_{t}$ is assumed to be i.i.d. standard normal (a white noise process), and $\sigma_{t}$ is the conditional standard deviation of $\varepsilon_{t}$. The assumption on $z_{t}$ is important for the evaluation of the model, which will be shown later.
Note that ARCH(p) is just a special case of GARCH(p,q) with $\beta_{j} = 0$ for all $j$.

The intuition behind GARCH is that, if we look at the squared series $\varepsilon_{t}^{2}$, the model is equivalent to an ARMA(max(p,q), q) model for $\varepsilon_{t}^{2}$. Volatility clustering is therefore modelled as serial correlation in the squared mean-corrected returns.
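To make the ARMA connection concrete, here is the derivation for the simplest case, GARCH(1,1). The symbol $\nu_{t}$ is introduced only for this sketch and does not appear elsewhere in the post. It is the difference between the squared shock and its conditional variance, so it has conditional mean zero.

```latex
\begin{aligned}
\nu_{t} &:= \varepsilon_{t}^{2} - \sigma_{t}^{2} = \sigma_{t}^{2}\left(z_{t}^{2} - 1\right) \\
\sigma_{t}^{2} &= \omega + \alpha_{1}\varepsilon_{t-1}^{2} + \beta_{1}\sigma_{t-1}^{2} \\
\varepsilon_{t}^{2} - \nu_{t} &= \omega + \alpha_{1}\varepsilon_{t-1}^{2} + \beta_{1}\left(\varepsilon_{t-1}^{2} - \nu_{t-1}\right) \\
\varepsilon_{t}^{2} &= \omega + \left(\alpha_{1} + \beta_{1}\right)\varepsilon_{t-1}^{2} + \nu_{t} - \beta_{1}\nu_{t-1}
\end{aligned}
```

The last line has the form of an ARMA(1,1) for $\varepsilon_{t}^{2}$ with AR coefficient $\alpha_{1} + \beta_{1}$ and MA coefficient $-\beta_{1}$, which is why the ACF and PACF of the squared series can guide the choice of the GARCH order.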


First we need to find an appropriate ARMA model for the conditional mean function $\mu_{t}$. We can either inspect the ACF and PACF plots or use the auto.arima() function from the forecast package in R. For simplicity I use the latter and find that ARMA(2,3) is good for the conditional mean.

arma_fit = arma(DJ_d$r_Dow_Jones, order = c(2, 3))

mu_t = arma_fit$fitted.values
# follow the equation above: eps_t = r_t - mu_t
epi_t = DJ_d$r_Dow_Jones - mu_t
# arma() returns NA for the first max(2, 3) = 3 fitted values,
# so the first 3 values of epi_t are NA; drop them with na.omit()
epi_t = na.omit(epi_t)
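For readers without the tseries package, the same step can be sketched with base R. This is a swapped-in alternative and not the code used in this post: it uses stats::arima() instead of tseries::arma(), a simulated ARMA(1,1) series instead of the Dow Jones returns, and variable names of my own.

```r
set.seed(1)

# Simulate an ARMA(1,1) series as a stand-in for the log returns (illustrative data)
n <- 2000
x <- arima.sim(model = list(ar = 0.5, ma = 0.3), n = n)

# Fit ARMA(1,1); order = c(p, d, q) with d = 0 means no differencing
fit_arma <- arima(x, order = c(1, 0, 1))

# Mean-corrected series: eps_t = x_t - mu_t
eps_hat <- residuals(fit_arma)
mu_hat  <- x - eps_hat   # back out the fitted conditional mean from the residuals

print(coef(fit_arma))
print(anyNA(eps_hat))    # check whether any residual is missing
```

With stats::arima() the residual series has the same length as the input, so no na.omit() step is needed.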

Next, before choosing the lag order of the GARCH model, we need to check whether the mean-corrected returns $\varepsilon_{t} = r_{t} - \mu_{t}$ are serially uncorrelated. This can be done with the Ljung-Box test. The results are listed in the table below. For $\varepsilon$ the null of no serial correlation is not rejected, while for $\varepsilon^{2}$ it is rejected. This means that the ARMA model captures the conditional mean well, but there is serial correlation in $\varepsilon^{2}$, as expected.
The reason for expecting $\varepsilon$ to be uncorrelated is that $\varepsilon_{t} = z_{t} \sigma_{t}$, where $z_{t}$ is assumed to be white noise, so by construction $\varepsilon_{t}$ should be uncorrelated.

Ljung-Box test    $\varepsilon$    $\varepsilon^{2}$
p-value           0.243            0.000
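The pattern in this table is what a GARCH process produces by construction. The sketch below simulates a GARCH(1,1) series in base R and runs the same two Ljung-Box tests. The parameter values are hypothetical and chosen only for illustration; they are not the estimates for the Dow Jones data.

```r
set.seed(123)

# Hypothetical GARCH(1,1) parameters, chosen only for illustration
omega <- 1e-6; alpha <- 0.10; beta <- 0.85
n <- 5000

z      <- rnorm(n)     # i.i.d. standard normal innovations z_t
sigma2 <- numeric(n)   # conditional variance sigma_t^2
eps    <- numeric(n)   # mean-corrected return eps_t

sigma2[1] <- omega / (1 - alpha - beta)   # start at the unconditional variance
eps[1]    <- sqrt(sigma2[1]) * z[1]

for (t in 2:n) {
  sigma2[t] <- omega + alpha * eps[t - 1]^2 + beta * sigma2[t - 1]
  eps[t]    <- sqrt(sigma2[t]) * z[t]
}

# Ljung-Box tests on the level and on the squared series
lb_eps  <- Box.test(eps,   lag = 5, type = "Ljung-Box")
lb_eps2 <- Box.test(eps^2, lag = 5, type = "Ljung-Box")

print(lb_eps$p.value)
print(lb_eps2$p.value)
```

The squared series should show strong serial correlation, while the level series has no autocorrelation built into it.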

Then we can choose the lag order for $\varepsilon^{2}$ based on its ACF and PACF.

The plots are a bit tricky to interpret. The PACF shows two large spikes and then decays gradually, while the ACF decays gradually after the first spike. I therefore assumed that ARMA(2,1) could be a good fit for $\varepsilon^{2}$, which corresponds to GARCH(2,1). However, after fitting that model I found that the p-value of $\alpha_{2}$ is 1, so I dropped the second ARCH term and estimated GARCH(1,1) instead.

fit = garchFit(~ arma(2, 3) + garch(1, 1), data = DJ_d$r_Dow_Jones, trace = FALSE)
summary(fit)

Error Analysis:
         Estimate  Std. Error  t value Pr(>|t|)    
mu      4.773e-04   1.093e-04    4.366 1.26e-05 ***
ar1     3.967e-01   1.851e-01    2.144 0.032062 *  
ar2    -4.729e-01   1.046e-01   -4.522 6.11e-06 ***
ma1    -2.365e-01   1.846e-01   -1.281 0.200124    
ma2     3.622e-01   1.044e-01    3.469 0.000523 ***
ma3     1.046e-01   2.136e-02    4.899 9.63e-07 ***
omega   1.460e-06   1.328e-07   10.996  < 2e-16 ***
alpha1  1.026e-01   4.664e-03   21.988  < 2e-16 ***
beta1   8.887e-01   4.881e-03  182.064  < 2e-16 ***

Standardised Residuals Tests:
 Jarque-Bera Test   R    Chi^2  38018.25
 Shapiro-Wilk Test  R    W      NA       
 LM Arch Test       R    TR^2   0.08747904
Information Criterion Statistics:
      AIC       BIC       SIC      HQIC
-6.617908 -6.614161 -6.617908 -6.616678

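Based on the results, all estimates are significant at the 5% level except ma1 (p-value 0.200), even though ma2 and ma3 are significant. To be honest, I do not understand why this happens. One thing to note is that ar1 and ma1 have the largest standard errors (about 0.185), which may occur when AR and MA terms partly offset each other.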
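The variance equation estimates can be turned into a few interpretable quantities. The sketch below uses the point estimates of omega, alpha1 and beta1 from the summary table. The persistence, long-run standard deviation and half-life are standard GARCH(1,1) formulas that I add here for illustration; they are not part of the garchFit output shown above.

```r
# Point estimates copied from the summary table above
omega  <- 1.460e-06
alpha1 <- 1.026e-01
beta1  <- 8.887e-01

# Persistence: how slowly a volatility shock dies out (below 1 for a finite variance)
persistence <- alpha1 + beta1

# Unconditional (long-run) variance and standard deviation of eps_t
uncond_var <- omega / (1 - persistence)
uncond_sd  <- sqrt(uncond_var)

# Half-life: number of days until a shock to the variance has decayed by half
half_life <- log(0.5) / log(persistence)

print(c(persistence = persistence, uncond_sd = uncond_sd, half_life = half_life))
```

A persistence close to 1 means that volatility shocks fade slowly, which fits the long high-volatility episodes visible in the data.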


In principle, the following evaluations are needed for the estimated GARCH model:

  1. Check the normality of the standardized residuals $\hat{z}_{t} = \frac{e_{t}}{\hat{\sigma}_{t}}$, where $e_{t} = r_{t} - \hat{\mu}_{t}$.
  2. Check the dynamic properties of $\hat{z}_{t}$ and $\hat{z}_{t}^{2}$ with their autocorrelation functions.
  3. Test for remaining ARCH effects with the ARCH-LM test.
  4. Find other potential optimal lag orders by information criteria.
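To show what the normality check and the ARCH-LM test in this list compute, here is a base R sketch of both statistics. The helper functions jb_test() and arch_lm_test() are hypothetical names of my own, written from the textbook formulas, and the data are simulated. They are not the implementations used by tseries or fGarch.

```r
set.seed(42)

# Jarque-Bera statistic: JB = n/6 * (S^2 + (K - 3)^2 / 4), chi-square(2) under normality
jb_test <- function(x) {
  n <- length(x)
  m <- mean(x)
  s2 <- mean((x - m)^2)
  skew <- mean((x - m)^3) / s2^1.5
  kurt <- mean((x - m)^4) / s2^2
  stat <- n / 6 * (skew^2 + (kurt - 3)^2 / 4)
  list(statistic = stat, p.value = pchisq(stat, df = 2, lower.tail = FALSE))
}

# ARCH-LM statistic: regress x_t^2 on its first `lags` lags;
# T * R^2 is chi-square(lags) under the null of no ARCH effect
arch_lm_test <- function(x, lags = 5) {
  x2 <- x^2
  lagged <- embed(x2, lags + 1)   # column 1 is x_t^2, the other columns are its lags
  y <- lagged[, 1]
  X <- lagged[, -1, drop = FALSE]
  r2 <- summary(lm(y ~ X))$r.squared
  stat <- length(y) * r2
  list(statistic = stat, p.value = pchisq(stat, df = lags, lower.tail = FALSE))
}

z_normal <- rnorm(5000)        # behaves like well-specified standardized residuals
z_heavy  <- rt(5000, df = 4)   # heavy tails, as is common for daily returns

# ARCH(1) series with unmodelled volatility dynamics (hypothetical parameters)
e <- numeric(5000)
e[1] <- rnorm(1)
for (t in 2:5000) e[t] <- sqrt(0.5 + 0.5 * e[t - 1]^2) * rnorm(1)

print(jb_test(z_normal)$p.value)
print(jb_test(z_heavy)$p.value)
print(arch_lm_test(z_normal)$p.value)
print(arch_lm_test(e)$p.value)
```

The heavy-tailed sample and the ARCH(1) series are the two cases where the respective null should be rejected.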

Let's look at them one by one.

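  1. Since we have a large data set, I use the Jarque-Bera (JB) normality test instead of the Shapiro-Wilk test, which is not reported for this sample (it shows NA in the summary above). The null hypothesis is that the data follow a normal distribution.
    # find the z_t
    sigma_t = fit@sigma.t

    # standardized residuals, P27
    # drop the first three sigma_t so that its length matches epi_t,
    # which lost its first three values to na.omit()
    z_t = epi_t / sigma_t[-(1:3)]

    # Jarque-Bera test from the tseries package
    jarque.bera.test(z_t)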


    Jarque Bera Test

    data:  z_t
    X-squared = 38033, df = 2,
    p-value < 2.2e-16

The p-value is below 2.2e-16, so the null of normality is rejected. By assumption, the standardized residual $z_{t}$ should be normally distributed, so ARMA(2,3)-GARCH(1,1) with normal innovations might be a questionable model.
2. Let's look at the ACF of the standardized residuals and of the squared standardized residuals.

There is only one big spike, at lag 0, in both cases, and the ACF stays low afterwards. This looks like a realisation of a white noise process, indicating that the model has captured the dynamics well.
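3. The ARCH-LM test checks whether any ARCH effect remains in the standardized residuals. It regresses $\hat{z}_{t}^{2}$ on its own lags, and the null hypothesis is that all lag coefficients are zero, i.e. there is no remaining ARCH effect. If the null is not rejected, the fitted GARCH model has captured the conditional heteroskedasticity. If it is rejected, some volatility dynamics are left unexplained and the variance equation should be revisited.
The summary output above reports 0.087 for the LM Arch Test, which I read as the p-value. Note that the excerpt shows a single number per test, and for the Jarque-Bera row that number is the test statistic, so it is worth confirming against the full summary which column 0.087 comes from.
If the significance level is 0.1, the null is rejected, which points to some remaining ARCH effect and casts doubt on the fit.
If the significance level is 0.05, the null is not rejected, and GARCH(1,1) looks adequate on this criterion.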
4. The summary output also reports the information criteria. Here I use AIC for reference and compare models with different GARCH orders; lower values are better.

Model         AIC
GARCH(1,1)    -6.617908
GARCH(1,2)    -6.620575
GARCH(1,3)    -6.621514
GARCH(2,1)    -6.617790
GARCH(2,2)    -6.620469
GARCH(2,3)    -6.621408

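By AIC, ARMA(2,3)-GARCH(1,3) has the lowest value (-6.621514), closely followed by ARMA(2,3)-GARCH(2,3). ARMA(2,3)-GARCH(1,1) is slightly better than ARMA(2,3)-GARCH(2,1). All differences are very small, though.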
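A small sketch for ranking the candidate models, using the AIC values from the table above. The vector and variable names are my own.

```r
# AIC values copied from the table above
aic <- c("GARCH(1,1)" = -6.617908, "GARCH(1,2)" = -6.620575, "GARCH(1,3)" = -6.621514,
         "GARCH(2,1)" = -6.617790, "GARCH(2,2)" = -6.620469, "GARCH(2,3)" = -6.621408)

# Lower AIC is better; sort from best to worst
ranked <- sort(aic)
print(ranked)

# Name of the model with the lowest AIC
best <- names(ranked)[1]
print(best)

# Gap of each model to the best one
print(round(ranked - ranked[1], 6))
```

The gaps are all in the third decimal place, so AIC alone gives little reason to prefer a larger model over GARCH(1,1).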

Potential problems and improvements


Potential problems:

  • The LM test result depends on the significance level, so it does not firmly confirm that the estimated GARCH model is a good fit.
  • The standardized residuals fail the normality test, so we may need to try other models.
  • The dataset does show volatility clustering, but the episodes are short and infrequent.

Possible improvements:

  • Try other ARMA models for the conditional mean $\mu_{t}$.
  • Update the Dow Jones log returns so that we can use a larger dataset for modelling.

Thank you for reading. I hope this post is helpful. If you spot a mistake or find anything confusing, please let me know.

