Cointegration analysis case study: Pairs trading

For an introduction to cointegration, read the Wikipedia entry. In the rest of the post, I will assume you know what cointegration means. I will, of course, provide context where needed, but this is an intermediate read, and the assumption is that you’re familiar with time series analysis.

The primary focus here is to show how to correctly apply the cointegration framework to real data. The secondary focus is to apply this framework to pairs trading. The rest of this post will focus on the data considerations and statistical modelling.

1. Data

  • Tickers: EUR/CHF, EUR/GBP, GBP/CHF

  • Type:

    • Spot market: Bid-Ask Close prices
  • Timeframe: 2009 - 2019

  • Frequency: daily

  • Total observations: ~2870

The data is widely available from other sources, but I can provide it upon request (contact links at the end of the article).

2. Analysis scope

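Intuition: cointegration tests whether two or more non-stationary variables share a stable long-run relationship, in the sense that some linear combination of them is stationary. This is a stronger statement than correlation. One of the main points of the framework is to avoid spurious regressions, where unrelated trending series appear to be strongly related.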
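
Stated a little more formally, this is the standard textbook definition (not something specific to this dataset), written with the same $X_t$ and $\beta$ symbols used in section 3. The components of $X_t$ are cointegrated if

$$ X_t \sim I(1) \ \text{componentwise, and there exists} \ \beta \neq 0 \ \text{such that} \ \beta' X_t \sim I(0) $$

Each series wanders on its own, but the combination $\beta' X_t$ keeps reverting to a stable level.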

The three tickers under analysis are the crosses between the currencies of three closely integrated European economies: the euro area, the United Kingdom, and Switzerland. The expectation here is that these pairs are “linked” in the long term because they are a proxy for the economies of the underlying countries. These countries are also closely linked to one another via trade, geography, culture, and history (working hours, holidays, freedom of movement, etc.).

The scope of the analysis will show if this thesis is validated or not by the data.

3. Analysis

To correctly apply this framework to the data, we need to go through the following steps:

  1. pick a significance level for the statistical tests, e.g., 1%, 5%, 10%, etc. I selected $\alpha=0.05$.

  2. perform stationarity tests on the variables to make sure that they are non-stationary in levels but stationary in first differences, i.e. integrated of order 1, or $ I(1) $. All variables need to be integrated of the same order.

  3. select a Vector Autoregression $(VAR)$ model and choose an appropriate lag order $ p $. The $ p $ can be selected with information criteria like Akaike (AIC), Bayesian (BIC), or Hannan-Quinn (HQC), or other underlying theory or belief of the analyst.

  4. estimate the $VAR(p)$ model selected in step 3. and test for the whiteness of residuals.

  5. if the errors are white noise, then the selected $VAR(p)$ correctly describes the data.

  6. perform the Johansen cointegration test on a Vector Error Correction Model $(VECM)$ of order $p -1$ to establish the number of cointegrating relationships between the input variables. For three variables, the outcome of the test can be any of the following:

    • 0 -> variables are integrated but not cointegrated -> a VAR in first differences is the appropriate model;
    • 1 -> 1 cointegrating relationship -> VECM is an appropriate model; (best outcome)
    • 2 -> 2 cointegrating relationships -> VECM is an appropriate model; (second best outcome)
    • 3 -> full rank, so the variables are stationary in levels rather than cointegrated -> a VAR in levels applies and VECM is not an appropriate model.

    In general, for $m$ variables, cointegration requires $0 < r < m$, i.e. $1 \leq r \leq m - 1$, where $r$ is the number of cointegrating relationships. $r$ is also the rank of the long-run impact matrix $\Pi$, which factors as

    $$ \Pi = \alpha\beta' \tag{1} $$

    where $\Pi$ is an $m \times m$ matrix, and $\alpha$ (the loading coefficients) and $\beta$ (the cointegrating vectors) are $m \times r$ matrices. This $\alpha$ is unrelated to the significance level from step 1. $\Pi$ comes from the full VECM below,

    $$ \Delta X_t = c + \Pi X_{t-1} + \sum_{i=1}^{p-1} \delta_i \Delta X_{t-i} + \epsilon_t \tag{2} $$

    where $X_t$ is a vector of $m$ variables, $c$ is a vector of constants, $\delta_i$ are $m \times m$ matrices of coefficients, and $\epsilon_t$ is a multivariate white noise process.

  7. if cointegration is found in the previous step, we can estimate the $VECM(p -1)$ model with cointegration rank $r$. Once we’ve fit the model to the data, we can analyse the results and draw conclusions. In this case the initial thesis was that EUR/CHF, EUR/GBP, and GBP/CHF are cointegrated.

4. Results

I used Python’s statsmodels to implement the steps in section 3; if you use different software, the results may vary slightly. Also, equation (2) includes a constant term, but not all VECMs need one. It makes sense here because the exchange rates of the major European currencies are not going to drop to zero, yet I could not establish a priori what this constant was, so I had to estimate it.

If equities were under analysis, the constant term could’ve been omitted, depending on the actual ticker/symbol. The results I obtained required that assumption on my part, although it was based on a prior analysis of the data. In general, the fewer assumptions the better.

The results from the Johansen test support the existence of $1$ cointegrating relationship between the three currency pairs, so the data supports my thesis. From Table 1, we can extract the coefficients for the cointegrating equation in formula (3).

                 Cointegration relations for loading-coefficients
coef    std err    z    P>|z|    [0.025    0.975]
Beta matrix rank

Table 1. Cointegrating relationship


$$ S_t = 1.2860 + \text{EUR/CHF}_t - 1.4658\ \text{EUR/GBP}_t - 0.8768\ \text{GBP/CHF}_t \tag{3} $$
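
To make formula (3) concrete, here is a small sketch that evaluates $S_t$ for a single day. The coefficients are the ones from formula (3); the quotes are hypothetical round numbers chosen for illustration, not observations from the dataset, and the `spread` helper is my own naming.

```python
# Coefficients of the cointegrating relation, taken from formula (3).
CONST = 1.2860
BETA = {"EUR/CHF": 1.0, "EUR/GBP": -1.4658, "GBP/CHF": -0.8768}


def spread(quotes):
    """Evaluate S_t = const + sum of beta_i * price_i for one observation."""
    return CONST + sum(BETA[ticker] * quotes[ticker] for ticker in BETA)


# Hypothetical quotes (assumption: illustrative values, not real data).
example_quotes = {"EUR/CHF": 1.10, "EUR/GBP": 0.88, "GBP/CHF": 1.25}
s = spread(example_quotes)
print(f"S_t = {s:.6f}")
```

Applying the same function to every row of the price history gives the $S_t$ series discussed next.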


According to the theory, series $S_t$ is stationary, or $ S_t \sim I(0)$. There are different ways to check that:

  1. run stationarity tests as suggested in the previous section;
  2. plot series $ S_t$;
  3. plot the residuals of $ VECM(p - 1)$.


I will opt for option 2. because of its simplicity:

[Figure: $S_t$ series]

For good measure, I also ran a stationarity test on $S_t$, which confirmed the result. On visual inspection, the spikes might seem excessive and could suggest non-stationarity, but if you look at the magnitude of the values on the Y-axis you’ll notice even the “extreme” values are really not extreme.

5. Concluding remarks

My expectations have been matched by the data: the results show that there’s cointegration in the major European currencies from 2009 to 2019. My focus was not to explain this relationship but simply to test whether it exists or not. I’ve also shown how to correctly apply the cointegration framework to real data and extract the most important parts for the application to pairs trading, or trio trading in this case. I hope this helps you in implementing your money-printing trading strategy. I’m not asking for much, just that you remember me when you’re on your megayacht.

What about other pairs or trios?

I only covered those tickers because I’m familiar with them and had a good reason to believe they were linked in the long run. You can apply this framework to other variables, and for purposes outside pairs trading. VAR and VECM models have applications outside Econ/Finance.

If you found this analysis interesting feel free to share on the social media links below. For anything else you can also send an email.
