I have been researching for some time how to create trading strategies using Quantitative Finance and Machine Learning.
In this article I present a study made exclusively for the Bitcoin Portal to answer the following question: can a machine learning model beat Buy & Hold in Bitcoin? The question is especially interesting in a year like 2018, when bears are dominating the cryptocurrency market.

What is machine learning?
In plain English, machine learning is a subfield of computer science that evolved from the study of pattern recognition and computational learning theory in artificial intelligence (Wikipedia / Encyclopedia Britannica).
That is, it is the ability to write an algorithm (in a programming language) that uses an input dataset (the training set) to develop a mathematical equation that best represents a certain phenomenon, reducing as much as possible the difference between the predicted and the observed values (checked on the test / validation sets).

What is Quantitative Finance?
According to an excellent definition from FGV, quantitative finance and financial engineering are the areas of finance that apply the tools and methods of traditional finance, mathematics, physics, computer science, economics and econometrics to solve problems in areas such as investment management, corporate finance, risk management, pricing and hedging of derivative instruments, trading, economic finance, structured products and asset allocation.
In a definition I like to use, it is where math meets the investment world.

Do past returns influence future returns?
This is a classic question when we talk about trading. The quest for the holy grail, the moneymaking machine, attracts the brightest minds in the world, all the more so now that processing power, storage and access to data are the greatest in our entire history.
To help answer this question, I selected a dataset of Bitcoin quotes at 15-minute intervals, from May 2017 to July 2018, and calculated the return for each period together with the returns of the 12 preceding periods (3 hours in total, since each interval is 15 minutes). The central idea is to assess whether past returns have any relationship with Bitcoin's current return.
In the chart below, the current return is called Return_P0 (on the Y-axis) and is plotted against each of the previous returns, P1 to P12 (on the X-axis of each panel). The scale is decimal, so 0.01 represents a return of 1%. The blue line is a trend line added to each panel, and it reads as follows:
• If it tilts down, the current return tends to move inversely to past returns. That is, the more strongly the market falls or rises, the more the opposite move is seen afterwards: a tendency to revert to the mean;
• If it is horizontal, past returns do not influence the current return at all. This is the expected behavior according to the efficient market hypothesis;
• If it tilts up, the current return tends to follow the same direction as past returns, which is what we call momentum.
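The lagged returns and the per-lag trend line described above can be sketched in a few lines. The original analysis was done in R. The sketch below uses Python with only the standard library, and the price list, function names and the choice of simple (not log) returns are assumptions made for illustration, not the author's data or code.

```python
from statistics import mean

def simple_returns(prices):
    """Period-over-period simple returns, as decimals (0.01 means 1%)."""
    return [p1 / p0 - 1.0 for p0, p1 in zip(prices, prices[1:])]

def lagged_pairs(returns, lag):
    """Pair the return `lag` periods earlier (X) with the current return P0 (Y)."""
    return [(returns[t - lag], returns[t]) for t in range(lag, len(returns))]

def trend_slope(pairs):
    """Least-squares slope of the current return on the lagged return."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

# Hypothetical 15-minute closing prices, for illustration only.
prices = [100.0, 101.0, 100.5, 101.5, 101.0, 102.0, 101.2, 102.5]
returns = simple_returns(prices)

for lag in (1, 2, 3):
    slope = trend_slope(lagged_pairs(returns, lag))
    if slope > 0:
        reading = "momentum"
    elif slope < 0:
        reading = "mean reversion"
    else:
        reading = "no relationship"
    print(f"P{lag}: slope = {slope:+.3f} ({reading})")
```

A negative slope corresponds to the downward-tilted line in the list above, a positive slope to the upward-tilted one. With only a handful of points, as in this toy input, the sign of the slope carries no statistical weight.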
Analyzing the charts, we see no really significant slope of the blue line in any of the 12 panels. However, the lines are not all exactly horizontal. There is a slight downward slope in some of them, especially for periods P4, P5 and P6, and a slightly positive slope for the first period, P1. This is very interesting because it means that the price tends to follow the momentum of the last period.
The existence of these small upward or downward slopes is the evidence I needed to develop the mathematical model.

Developing the machine learning model
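Using the statistical software R, I developed a simple linear regression model. A linear regression is the kind of equation Y = aX + b that we learn in high school, although hardly anyone explains what it really means: a is the slope, the amount by which Y changes for each unit change in X, and b is the value of Y when X is zero. With several inputs, as in this study, each X gets its own coefficient. If I had known these things back then, well ... that is another story.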
I divided the dataset into training and testing sets. The model was trained on the data from 2017 and tested / validated on the data from 2018. The target or dependent variable, the information we want to predict, is Return_P0, and the independent variables are the returns from P1 to P12. The first result of the model is as follows:
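Independently of the R output, the setup just described (train on 2017, test on 2018, regress the current return on the 12 lagged returns) can be sketched as follows. This is Python with only the standard library, not the author's R code. The returns are synthetic random numbers, the year labels are hypothetical, and solving the normal equations by hand is just one simple way to obtain ordinary least-squares coefficients.

```python
import random

def design_rows(returns, n_lags):
    """Build (features, target) rows: features are returns P1..Pn, target is P0."""
    rows = []
    for t in range(n_lags, len(returns)):
        lags = [returns[t - k] for k in range(1, n_lags + 1)]
        rows.append((lags, returns[t]))
    return rows

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x

def fit_ols(rows):
    """Ordinary least squares; returns [intercept, coef_P1, ..., coef_Pn]."""
    xs = [[1.0] + feats for feats, _ in rows]
    ys = [y for _, y in rows]
    k = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(k)]
    return solve(xtx, xty)

def predict(coefs, feats):
    """Predicted current return for one row of lagged returns."""
    return coefs[0] + sum(c * f for c, f in zip(coefs[1:], feats))

# Synthetic 15-minute returns, for illustration only (not real Bitcoin data).
random.seed(0)
returns = [random.gauss(0.0, 0.01) for _ in range(600)]
years = [2017] * 400 + [2018] * 200  # hypothetical year of each return

rows = design_rows(returns, n_lags=12)
row_years = years[12:]  # year of each target observation
train_rows = [r for r, y in zip(rows, row_years) if y == 2017]
test_rows = [r for r, y in zip(rows, row_years) if y == 2018]

coefs = fit_ols(train_rows)
mse = sum((predict(coefs, f) - y) ** 2 for f, y in test_rows) / len(test_rows)
print(f"intercept = {coefs[0]:+.5f}, out-of-sample MSE = {mse:.6f}")
```

Splitting by calendar year rather than at random keeps the test period strictly after the training period, so the model is never evaluated on data that precedes what it was trained on.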
Leaving aside the mathematical rigor needed to evaluate the results and improve the model (that is not the purpose of this article), what matters at the moment is to understand the last piece of information in the table: the three asterisks (***).
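To make the asterisks concrete: R prints them next to each coefficient according to its p-value, using the thresholds 0.001, 0.01, 0.05 and 0.1. The sketch below is a simplified stand-in written in Python, not the computation R performs for the 12-variable model. It tests the slope of a regression with a single predictor, approximates the t distribution with a normal distribution (reasonable only for large samples), and uses synthetic data with a built-in effect of -0.2 chosen purely for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

def slope_p_value(xs, ys):
    """Two-sided p-value for the slope of y = a*x + b (normal approximation)."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    b = my - a * mx
    rss = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    std_err = sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    t_stat = a / std_err
    return 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))

def signif_code(p):
    """Map a p-value to the symbols of R's 'Signif. codes' legend."""
    for cutoff, symbol in ((0.001, "***"), (0.01, "**"), (0.05, "*"), (0.1, ".")):
        if p <= cutoff:
            return symbol
    return " "

# Synthetic data: the current return depends weakly and negatively on a lag.
random.seed(1)
lagged = [random.gauss(0.0, 0.01) for _ in range(2000)]
current = [-0.2 * x + random.gauss(0.0, 0.01) for x in lagged]

p = slope_p_value(lagged, current)
print(f"p-value = {p:.3g}, code = '{signif_code(p)}'")
```

A smaller p-value means the estimated slope would be harder to explain by chance alone if the true slope were zero, which is why the variables with the most asterisks are the ones worth keeping.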
Notice the fourth line from the bottom of the image, where we have:
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
This means that the variables marked with *** are the most statistically significant in explaining the phenomenon being modeled: our prediction of the current period's return. And note the coincidence: they are the returns P4, P5 and P6, precisely those with the steepest trend lines in the first chart. As the other variables are not statistically significant, I will remove them and run a new regression. The result is as follows, cleaner and more organized.

The moment when Machine Learning Outperforms Buy & Hold
Now I will apply this model to the test set (the year 2018). The result speaks for itself. Whoever bought Bitcoin in early 2018 and has held it until now has a return of -44.6%. Anyone who had followed a model like this would have a return of only -10.2%, without considering operational costs.
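The article does not spell out the trading rule behind this comparison, so the sketch below assumes one: hold Bitcoin during a period only when the model predicts a positive return, and stay in cash otherwise. It is written in Python with the standard library only, the numbers are hypothetical and are not the article's data or results, and costs are ignored, as in the text.

```python
def cumulative_return(period_returns):
    """Compound a sequence of per-period simple returns into a total return."""
    total = 1.0
    for r in period_returns:
        total *= 1.0 + r
    return total - 1.0

def strategy_returns(predicted, actual):
    """Earn the actual return only in periods where the prediction is positive."""
    return [a if p > 0 else 0.0 for p, a in zip(predicted, actual)]

# Hypothetical per-period returns and model predictions, for illustration only.
actual = [0.02, -0.03, 0.01, -0.04, 0.015]
predicted = [0.001, -0.002, 0.0005, -0.001, -0.0003]

buy_and_hold = cumulative_return(actual)
model = cumulative_return(strategy_returns(predicted, actual))

print(f"Buy & Hold: {buy_and_hold:+.2%}")
print(f"Model:      {model:+.2%}")
```

Under this assumed rule the model can only outperform by sitting out periods that turn out negative, so in a falling market even a modest edge in predicting the sign of the next return shows up as a smaller loss.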
Of course, one can argue that the result is still negative. However, we must consider three very important points:
