Our current goal is to introduce Reinforcement Learning into the decision-making component of an existing Bitcoin trading system. To put this into a broader context: what we are about to do is, in the flowery terms of business speak, the upgrade from “Predictive Analytics” to “Prescriptive Analytics”. Please google “Analytic Value Escalator” for a 30,000 ft. view, and think for a minute about this: Is the question “What will happen” really harder to answer than “Why did it happen”? (In data science you have to take market research seriously, even if it hurts.)
Walking up a step on this escalator with the task at hand, we are interested in going from the question “What will happen?” to “How can we make it happen?”. Again, I am not completely convinced that the latter is more difficult to answer, but it requires us to make a big architectural change in the trading system.
To understand this, let’s have a look at the training pipeline that we have used so far.
Old Architecture: Separation of Concerns
We had three servers involved:
- A build server running Jenkins, which is used for multiple projects and not of particular interest for us here and now.
- A server running the trading software, called “Trade Hardware”, executing a stack of shell scripts and Java programs.
- A powerful machine for computationally intense tasks implemented in Matlab and Java, called “Compute Hardware”.
Here is what happens at the blue numbered circles:
- Once a week the build server triggered a training session. We retrained regularly because we wanted the prediction model to reflect current trends in market behavior. Also, each week we had accumulated significantly more training data, and more training data promised better prediction results.
- Input variables are collected and normalized for neural network training.
- Target variables are calculated for training. We used a binary classifier with 19 output variables, each predicting a different event such as: “The BTC/USD rate will go up by 2% in the next 20 minutes”.
- To reduce the size of the input, a PCA was performed and only the strongest components were used as input variables. The PCA eigenvectors and the normalization factors from the input normalization step are stored for later, to transform raw input data in production into a format consistent with the training input.
- The previous neural network model is loaded to be used as the initial model for training.
- The training is run in Matlab. We don’t need to dive deeper into this topic, because in the new architecture, we will use Deeplearning4J instead of Matlab for the training step.
- The newly trained model is stored.
- The new model is tested in an extensive trade simulation.
- The trading software is restarted so it uses the updated model.
- Normal trading goes on for the rest of the week.
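Two of the steps above, storing normalization factors for reuse in production and computing a binary target, can be sketched in a few lines. This is only an illustration in Python (the actual system used Matlab and Java), and all function names are hypothetical. It also assumes that "will go up by 2% in the next 20 minutes" means the rate reaches the threshold at any point within the look-ahead window, which the description above does not specify.

```python
from statistics import mean, pstdev


def fit_normalization(rows):
    """Compute per-column mean and standard deviation from training rows."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # Constant columns have zero deviation; fall back to 1.0 to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def apply_normalization(row, means, stds):
    """Transform one raw input row with the factors stored at training time."""
    return [(x - m) / s for x, m, s in zip(row, means, stds)]


def label_rise(prices, horizon, threshold):
    """Return 1 for each time step where the price rises by at least
    `threshold` (0.02 means 2%) within the next `horizon` steps, else 0.

    Assumption: hitting the threshold at any point in the window counts.
    The last `horizon` steps have no complete window and get no label.
    """
    labels = []
    for t in range(len(prices) - horizon):
        future_max = max(prices[t + 1 : t + 1 + horizon])
        labels.append(1 if future_max >= prices[t] * (1 + threshold) else 0)
    return labels
```

The point of returning `means` and `stds` from `fit_normalization` is that the same factors must be applied to live data in production; recomputing them on production data would make the inputs inconsistent with what the network saw during training.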
New Architecture: Tight Integration
This pipeline was built around the concept of a strict separation between trading execution and prediction. The prediction algorithm was part of a decision-making module, which itself was just a plugin module of the trading software and could be replaced by another implementation encoding a different strategy. This was actually used to assess simulation results: to determine a baseline performance, the decision-making component in the simulation was replaced by one that follows a random trading strategy.
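The plugin idea can be illustrated with a small sketch. It is written in Python for brevity, although the trading software itself is Java, and every name in it (`DecisionModule`, `ThresholdStrategy`, `RandomStrategy`, `run_simulation`) is hypothetical rather than taken from the real system. The threshold rule is a made-up stand-in for the prediction-based strategy.

```python
import random
from typing import Protocol, Sequence


class DecisionModule(Protocol):
    """Hypothetical plugin interface: predictions in, trading decision out."""

    def decide(self, predictions: Sequence[float]) -> str: ...


class ThresholdStrategy:
    """Illustrative prediction-based strategy: buy when the first output is high."""

    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold

    def decide(self, predictions: Sequence[float]) -> str:
        return "BUY" if predictions[0] > self.threshold else "HOLD"


class RandomStrategy:
    """Baseline that ignores the predictions and picks a decision at random."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)

    def decide(self, predictions: Sequence[float]) -> str:
        return self._rng.choice(["BUY", "SELL", "HOLD"])


def run_simulation(module: DecisionModule, prediction_stream):
    """Feed the same prediction stream to whichever module is plugged in."""
    return [module.decide(p) for p in prediction_stream]
```

Because both strategies satisfy the same interface, the simulation can run once with the real strategy and once with `RandomStrategy`, and the difference between the two results is the value added by the predictions.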
With the transition to reinforcement learning, this strict separation goes away. The learning agent learns from the interaction with the environment, so it completely assumes the role of the decision-making component. From a system design perspective, this makes our life much easier, because many hard questions in the decision-making component are now covered by machine learning: How to interpret the predictions? Where to set thresholds for the output variables? What percentage of available assets to use in a trade? The reinforcement learning agent produces a finished decision that can be directly converted into a buy or sell order.
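To make "directly converted" concrete, here is a minimal sketch of such a conversion, again in Python for brevity. The discrete action space, the `Order` type, and the function name are all assumptions made for illustration; the real agent may well use a different action encoding.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical action space: (side, fraction of the available assets to use).
ACTIONS = [
    ("HOLD", 0.0),
    ("BUY", 0.25),
    ("BUY", 1.0),
    ("SELL", 0.25),
    ("SELL", 1.0),
]


@dataclass
class Order:
    side: str          # "BUY" or "SELL"
    amount_btc: float  # order size in BTC


def action_to_order(action_index, usd_balance, btc_balance, price) -> Optional[Order]:
    """Translate the agent's chosen action into an order, or None for no trade."""
    side, fraction = ACTIONS[action_index]
    if side == "BUY":
        # Spend a fraction of the USD balance at the current price.
        amount = usd_balance * fraction / price
    elif side == "SELL":
        # Sell a fraction of the BTC balance.
        amount = btc_balance * fraction
    else:
        return None
    return Order(side, amount) if amount > 0 else None
```

In this encoding the thresholds and position sizes that were previously hand-tuned are folded into the agent's choice of action index, so the remaining business logic is a plain lookup.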
Also, the agent does not stop learning once it is in production. Learning is a permanent background process that takes place during trading. This means that after the initial training phase, we can retire the “Compute Hardware”, because there is no need for weekly retraining.
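The idea of learning as a permanent background process can be sketched with tabular Q-learning. This is a deliberately simplified stand-in: the system described here is meant to train a neural network with Deeplearning4J, not a lookup table, and the class name, state encoding, and parameter defaults below are assumptions for illustration only.

```python
import random
from collections import defaultdict


class OnlineQAgent:
    """Tabular Q-learning agent that keeps updating while it acts."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.95, epsilon=0.1, seed=None):
        self.n_actions = n_actions
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor for future rewards
        self.epsilon = epsilon  # probability of exploring a random action
        self.q = defaultdict(lambda: [0.0] * n_actions)
        self._rng = random.Random(seed)

    def act(self, state):
        """Epsilon-greedy choice: mostly the best known action, sometimes a random one."""
        if self._rng.random() < self.epsilon:
            return self._rng.randrange(self.n_actions)
        values = self.q[state]
        return values.index(max(values))

    def learn(self, state, action, reward, next_state):
        """One Q-learning update from a single observed transition."""
        target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])
```

In a trading loop, `act` would be called before each decision and `learn` once the resulting reward (for example, the change in portfolio value) is known, so every trade in production also serves as a training sample.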
All this looks lean and efficient at first glance, but it will create problems down the road:
- The tight integration between machine learning and business logic results in a monolithic architecture, which will be hard to maintain.
- The interface between data science and software development has grown considerably. In the old architecture, the responsibility of data science ended with the prediction, and software development could take over from there. Both groups worked strictly within the bounds of their traditional realms, each with their established tools and processes, which have very little in common other than the fact that they mostly run on computers. The new architecture leads to a huge overlap of responsibilities, which will require new tools, a common language, mutual understanding and a lot of patience with each other.
Even without looking at the specifics of Reinforcement Learning, we already see that the system design will become much simpler. The machine learning subsystem assumes more responsibility, leaving fewer moving parts in the rest of the system.
The project management, on the other hand, might turn out to be challenging, because the different worlds of data science and software development need to work much more closely together.
- More information about Reinforcement Learning with DL4J
- Andrej Karpathy blog: Deep Reinforcement Learning: Pong from Pixels