Predicting the share price of a JSE Top 40 company.

Authors: Blessing Magabane and Masilo Leope.
Can we really predict the share prices on the stock market using machine learning?
Introduction:
Machine learning has revolutionised mankind’s power to predict and has accordingly become an essential tool for forecasting events. This blog explores the wonders of machine learning, specifically its use in predicting share prices on the stock market.
Predicting the stock market is a challenge because of the many factors that come into play. These include, but are not limited to, macroeconomic factors such as recession, economic expansion or changes in legislation, whose impact on the stock market is very difficult to measure.
Out of the companies in the JSE Top 40 we chose MTN Group and set out to predict its share price. MTN is a leading telecommunications provider in emerging markets, with a market capitalisation of approximately R108,36 billion. MTN Group offers a unique value proposition due to its presence in multiple African countries and in the Middle East. For these reasons we found it interesting to investigate its share price performance and draw conclusions from it.
There are several ways to analyse the market and arrive at a prediction. We are going to use technical analysis, whereby historical data is used to predict future patterns in the stock market. These predictions are purely data-driven; they make use of mathematical constructs that rely on prior states of the data, with the end goal of making intelligent guesses about the future.
Advancements in the artificial intelligence space have produced algorithms that are robust enough to predict the share price with some degree of confidence. Notable among these are LSTM (Long Short-Term Memory) and ARIMA (Autoregressive Integrated Moving Average). We will use both algorithms to predict the share price of MTN Group.
Datasets:
Financial data from Yahoo Finance, which covers companies all over the world, will be used to make the predictions. To obtain the share price of MTN Group we use an API that connects to Yahoo Finance; this API comes in the form of a library known as “yfinance”.
Below is a screen-print showing the querying of data from Yahoo Finance.

The trading name for MTN Group on the JSE is simply ‘MTN’, hence we set the tickerSymbol to ‘MTN’. We query data from 2010 to 2019, which will be used for training the model.
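Since the screen-print is not reproduced here, the query could look roughly like the sketch below. It assumes the yfinance `Ticker.history` interface; the function name `fetch_closing_prices` and the exact date strings are our own choices for illustration. Yahoo Finance generally lists JSE shares with a `.JO` suffix, so the symbol may need to be `MTN.JO` rather than `MTN`; it is worth checking before running.

```python
def fetch_closing_prices(ticker_symbol, start, end):
    """Return the daily closing prices for ticker_symbol between start and end."""
    # Imported here so the sketch can be loaded without yfinance installed.
    import yfinance as yf

    ticker = yf.Ticker(ticker_symbol)
    # history() returns a DataFrame indexed by date; `end` is exclusive.
    history = ticker.history(start=start, end=end)
    return history["Close"]


# Symbol and period as described in the text; the date format is an assumption.
TICKER_SYMBOL = "MTN"
TRAIN_START = "2010-01-01"
TRAIN_END = "2020-01-01"  # exclusive, so the data runs to the end of 2019

# Example call (needs network access):
# close = fetch_closing_prices(TICKER_SYMBOL, TRAIN_START, TRAIN_END)
```

The returned series is already indexed by date, which is convenient for the modelling steps later on.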
Data Acquisition process:

Before building the model we do some data analysis; see the plot of the closing share price below.

Early in 2010 the share price of MTN Group was below R50, but it increased steadily over the following years, which suggests that the company was growing well. Between 2010 and 2017 the share price increased by 78%. We excluded 2018 and 2019 from this comparison because the share price dropped sharply in 2018 and only recovered early in 2019.
Below is a plot of the moving average:
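As a quick check of the arithmetic behind a growth figure like this, here is a small sketch. The two prices are hypothetical, chosen only to illustrate a 78% increase; they are not the actual MTN closing prices.

```python
def percentage_change(start_price, end_price):
    """Return the change from start_price to end_price as a percentage."""
    if start_price <= 0:
        raise ValueError("start_price must be positive")
    return (end_price - start_price) * 100.0 / start_price


# Hypothetical prices in rands, for illustration only.
growth = percentage_change(50.0, 89.0)
print(f"{growth:.0f}%")
```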

The moving average gives us the direction of the share price. The direction is positive, which confirms that the price momentum was increasing. However, in 2018 the share price dropped sharply and it recovered early in 2019, which looks like a market correction. The analysis shows, for example, that if you had invested in the company early on and held the investment for a decade, you would have received a high return on investment.
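For readers who want to see how a moving average is computed, here is a minimal sketch of a simple moving average in plain Python. The window length and the sample prices are assumptions for illustration; the window used for the plot is not stated in this blog.

```python
def simple_moving_average(prices, window):
    """Return the average of each run of `window` consecutive prices."""
    if window <= 0 or window > len(prices):
        raise ValueError("window must be between 1 and len(prices)")
    running_total = sum(prices[:window])
    averages = [running_total / window]
    for i in range(window, len(prices)):
        # Slide the window: add the newest price, drop the oldest one.
        running_total += prices[i] - prices[i - window]
        averages.append(running_total / window)
    return averages


# Hypothetical closing prices, for illustration only.
print(simple_moving_average([1, 2, 3, 4, 5], window=3))
```

A longer window smooths out more of the day-to-day noise, at the cost of reacting more slowly to changes such as the 2018 drop.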
Model:
In this section we will predict the share price of MTN Group for the first two months of 2020, using Long Short-Term Memory (LSTM) and Autoregressive Integrated Moving Average (ARIMA).
LSTM is a sequence-based algorithm, while ARIMA is a statistics-based algorithm. We will also compare the two models based on their accuracy and level of prediction.
Model development:
We will start with LSTM, but before we make any predictions we need to index the data by date and isolate the closing price.
See the screen-print below showing the indexing of the date:


An LSTM model can be tuned through various parameters, such as the number of LSTM layers and the number of epochs. For our model we used only one layer, the Adam optimiser, 10 epochs and mean squared error as the measure of accuracy.
See the screen-print below showing the implementation of LSTM:

The dataset is fairly small, hence we used one layer; had we added more, we would likely have overfitted the model.
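A model with these settings could be sketched as follows. This is not the exact code from the screen-print: Keras is an assumed framework, and the window length and the number of LSTM units (50) are illustrative values that are not given in this blog. Only the single layer, the Adam optimiser, the 10 epochs and the mean squared error come from the description above.

```python
def make_windows(prices, window):
    """Split a price list into (inputs, targets) for one-step-ahead prediction."""
    inputs, targets = [], []
    for i in range(len(prices) - window):
        inputs.append(prices[i:i + window])  # the previous `window` prices
        targets.append(prices[i + window])   # the price that follows them
    return inputs, targets


def build_lstm(window, units=50):
    """Build a single-layer LSTM regressor (units=50 is an assumed value)."""
    # Imported here so make_windows can be used without TensorFlow installed.
    from tensorflow import keras

    model = keras.Sequential([
        keras.layers.Input(shape=(window, 1)),
        keras.layers.LSTM(units),
        keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mean_squared_error")
    return model


# Tiny illustration of the windowing step.
print(make_windows([1, 2, 3, 4, 5], window=3))
```

To train, the windows would be converted to an array of shape (samples, window, 1) and passed to `model.fit(..., epochs=10)`.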
Next we look at the ‘Auto Regressive Integrated Moving Average’ or simply ARIMA. ARIMA is a model that is used to fit time series data to forecast future values. It handles time series sequence data quite well.
One of the properties of ARIMA is the ability to analyse seasonality, trends and the residual.
See the plots below of the seasonality, trend and residual of the share price.
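To give an idea of what such a decomposition does, here is a plain-Python sketch of a classical additive decomposition, where each value is treated as trend + seasonal + residual. Plots like these are usually produced with a library routine (for example `seasonal_decompose` in statsmodels, which is our assumption, not something stated in this blog), and the `period` value below is purely illustrative.

```python
def decompose_additive(series, period):
    """Classical additive decomposition: series = trend + seasonal + residual."""
    n = len(series)
    if period < 2 or n < 2 * period:
        raise ValueError("need at least two full periods of data")

    # Trend: centred moving average; None where the window does not fit.
    half = period // 2
    trend = [None] * n
    for i in range(half, n - half):
        window = series[i - half:i + half + 1]
        total = sum(window)
        if period % 2 == 0:
            # Even period: the two end points each get half weight.
            total -= 0.5 * (window[0] + window[-1])
        trend[i] = total / period

    # Seasonal: mean detrended value per position in the cycle,
    # shifted so that the seasonal effects sum to zero over one period.
    buckets = [[] for _ in range(period)]
    for i in range(n):
        if trend[i] is not None:
            buckets[i % period].append(series[i] - trend[i])
    means = [sum(b) / len(b) for b in buckets]
    offset = sum(means) / period
    seasonal = [means[i % period] - offset for i in range(n)]

    # Residual: whatever is left after removing trend and seasonality.
    residual = [
        None if trend[i] is None else series[i] - trend[i] - seasonal[i]
        for i in range(n)
    ]
    return trend, seasonal, residual


# Synthetic series: a straight-line trend plus a repeating pattern of length 4.
pattern = [1, -1, 2, -2]
series = [i + pattern[i % 4] for i in range(12)]
trend, seasonal, residual = decompose_additive(series, period=4)
print(seasonal[:4])
```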

The screen-print below shows the implementation of the ARIMA model:

The auto-ARIMA model automatically selects the combination of (p, d, q) that gives the least error. Both start_p and start_q are set to 0, and both max_p and max_q are set to 3. Because we use auto-ARIMA, the optimal d value is also chosen automatically by the model.
The three parameters (p, d, q) in ARIMA serve different purposes: p is the number of past values used to forecast the next value, q is the number of past forecast errors used to predict future values, and d is the order of differencing.
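The search described above could be set up roughly as in the sketch below. The parameter names match the `auto_arima` function in the pmdarima library, but the choice of library is our assumption, as are `seasonal=False` and the helper name `fit_auto_arima`.

```python
# Search bounds taken from the text; d=None lets the search pick d itself.
SEARCH_SPACE = {
    "start_p": 0,
    "start_q": 0,
    "max_p": 3,
    "max_q": 3,
    "d": None,
}


def fit_auto_arima(close_prices):
    """Fit an ARIMA model, searching p and q within SEARCH_SPACE."""
    # Imported here so the sketch can be loaded without pmdarima installed.
    import pmdarima as pm

    # seasonal=False is an assumption; the text does not say which was used.
    return pm.auto_arima(close_prices, seasonal=False, **SEARCH_SPACE)


# Example call, where `close` is the series of training closing prices:
# forecast = fit_auto_arima(close).predict(n_periods=41)
```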
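Of the three parameters, d is the easiest to illustrate. Differencing replaces each value with its change from the previous value, which removes a trend from the series. The helper below is a minimal sketch with made-up numbers, not part of the model code.

```python
def difference(series, d=1):
    """Apply differencing d times: each pass replaces values with their changes."""
    result = list(series)
    for _ in range(d):
        result = [current - previous
                  for previous, current in zip(result, result[1:])]
    return result


# Made-up series with a curved upward trend.
values = [1, 4, 9, 16, 25]
print(difference(values, d=1))  # changes between consecutive values
print(difference(values, d=2))  # changes of those changes
```

Each pass shortens the series by one value, and a series that is flat after d passes needs no further differencing.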
Results:
To validate our predictions, we queried data from Yahoo Finance from the beginning of January 2020 until the end of February 2020. We predict the first 41 days of 2020. The results of our predictions are plotted below.
We start with the LSTM results:

LSTM produced results that are comparable to the actual closing price. The trend and the fluctuations have characteristics similar to those observed in actual share prices. The limitation of LSTM is its inability to analyse seasonality and the trend.
Next we look at the ARIMA results:

ARIMA produces predictions that are more linear in nature and give the direction of the closing price. This is not surprising since the algorithm makes use of moving averages, so the prediction tends towards an average of the actual values. From the plot above, the model has captured only a linear trend in the series and does not pick up the seasonality in the data.
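Beyond comparing the plots by eye, the two models can be compared with the same error measure used to train the LSTM. The sketch below computes the mean squared error in plain Python; the prices in it are hypothetical and are not our actual results.

```python
def mean_squared_error(actual, predicted):
    """Return the average squared difference between actual and predicted."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sum(squared_errors) / len(actual)


# Hypothetical closing prices and forecasts, for illustration only.
actual = [90.0, 91.0, 89.0]
model_a = [90.0, 90.0, 90.0]
model_b = [88.0, 93.0, 87.0]
print(mean_squared_error(actual, model_a))
print(mean_squared_error(actual, model_b))
```

The model with the lower value is closer to the actual prices on average.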
Anomaly Detection:
In late February 2020, stock markets around the world began to crash as negative news sentiment about Covid-19 spread like wildfire. Due to the interconnectedness of global markets, the JSE also began to tumble, and MTN Group took a huge plunge along with the global trend. Unfortunately, neither the ARIMA nor the LSTM model could detect the crash in the market, specifically the sudden changes in the share price of MTN Group. Anomaly detection offers a way to address this issue, since the technique can detect sudden changes. Although we are not going to look at anomaly detection in this blog, it is worth mentioning that it can be used to spot unusual data patterns.
Conclusion:
In terms of predictive power, LSTM clearly performed better than ARIMA in our plots. But despite its prowess and agility it lacked detail, and that’s where ARIMA shines. ARIMA provided the seasonality, residual and trend, while LSTM gave us only the trend.
Both models, however, struggled to pick up the market crash caused by Covid-19. Anomaly detection methods can be used to identify strange patterns in the data. We can also adjust the parameters of both ARIMA and LSTM to improve their performance.
The code for the above models can be accessed on GitHub,
You can contact me on the following platforms,
Twitter: @blessing3ke