What’s next for Bitcoin?

We have seen plenty of drama lately in the Bitcoin arena. Technological questions aside, it is time to re-evaluate our involvement in a market with looming roller-coaster dynamics.

Originally, our motivation to favor cryptocurrency over other asset classes as a subject for automated trading resulted from these four considerations:

  • Bitcoin exchanges like Bitstamp or Gemini are easily accessible from a software developer’s point of view. They have modern, well-maintained and well-documented APIs, and you don’t need to work for a financial services institution to get access.
  • The trading hours are 24×7, so the trading bot does not run idle 2/3 of the time, which seems like a waste of resources.
  • Cryptocurrencies are cool. I don’t blame you if you beg to differ, with Bitcoin going more and more mainstream nowadays. But back in 2014 there was no dispute about it being the coolest thing since the UNIVAC 9000 series.
  • Little regulation. Don’t get me wrong: regulation is a very, very, very good thing. After the 2010 flash crash, regulators in all major markets started to look very closely at automated trading and put sensible restrictions in place. Since then we have seen a few more flash crashes, but none of them was nearly as severe as the 2010 incident, and that was certainly not due to more responsible behavior by the market participants. So regulation is a good thing. That said, if automated trading is what you want to do, it complicates your life. Under German law, Bitcoin is neither a currency nor a security, so it is mostly unregulated, which made our project a little easier.

The last point was originally an advantage, but it is turning into a hassle now, because as it stands today, the market seems to have gone crazy. This is a problem, because a crazy market is inherently unpredictable.


It might not be obvious, but even with a deep neural network the predictive performance, at the end of the day, still depends on the ability to find statistical relationships, however hidden and convoluted they might be. In a market with constantly changing influencing factors, these interrelations are hard to find even under normal conditions. Add craziness as another complexity layer, and your neural network’s only output will be white noise. At least that is the case with mine.

So what can we do?

I hear many people talking about tulips lately. They refer, of course, to the tulip mania in the early 17th century. They draw parallels between today’s Bitcoin exchange rate and the historic tulip prices to argue that Bitcoin is a case of an irrationally inflated bubble that is doomed to burst.

Indeed there seems to be a good portion of irrationality. In the past, whenever we saw a hike in the Bitcoin price, it came with an obvious explanation: the expropriation of Russian bank customers in Cyprus; the Indian demonetization policy; gambling in China. The last event in this series was the cancelled hard fork in November. Although the problem addressed by the proposed fork has by no means been solved, the price has more than tripled since then. I don’t believe that many of the buyers have a firm grasp of blockchain hard forks, and it just does not justify the latest price hike.

Use this finding as input for a little duck testing, and you will likely come to the conclusion that we are, in fact, dealing with a bad case of a speculative bubble.

So is this the time to abandon Bitcoin and blockchain technology and move on to the next cool thing? Or to walk back to something more conservative?

Let’s come back to the tulips: what happened after the bubble burst? Take a walk through almost any neighborhood in almost any western community, and you will see that, while the tulip bubble is gone, the tulips are still there. They can be found in most private and public gardens, they cover a significant share of the land surface of the Netherlands, and they represent a small but notable share of the Dutch economy.

To me, this looks like a blueprint for the way forward for Bitcoin. The current craze has the beautiful side effect that, for the first time, people with no immediate need for and no interest in the technology create Bitcoin wallets and acquire cryptocurrency. No matter whether the price stabilizes at the current level or crashes and then stabilizes at a much lower level: the wallets will still be there, and people will still own Bitcoin and know a lot more about it than they did a few months ago.

No matter how this ends: when it’s over, Bitcoin will likely be ubiquitous in more and more areas, like tulips are today. We might finally enter a phase where Bitcoin is used the way it was intended: as a currency.

In conclusion: this is not the time to leave the field of Bitcoin. If anything, it is a good time to enter the area of cryptocurrencies and blockchain technology, because no matter whether the current market is a bubble or just very healthy growth, it will contribute to a much broader use of the technology in the next few years.

 


Building the Reinforcement Learning Framework

To build our reinforcement learning framework, we are going to follow the basic recipe laid out in the February 2015 Nature article “Human-level control through deep reinforcement learning” (http://dx.doi.org/10.1038/nature14236).
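In a nutshell (and only as a reminder, the article itself is the authoritative source): the recipe trains an action-value function Q by sampling past transitions (s, a, r, s′) from a replay memory D and minimizing the squared difference between the current estimate and a bootstrapped target computed with a periodically updated copy of the network (parameters θ⁻), with γ as the discount factor:

    L(\theta) = \mathbb{E}_{(s,a,r,s')\sim\mathcal{D}}\Big[\big(r + \gamma \max_{a'} Q(s',a';\theta^{-}) - Q(s,a;\theta)\big)^{2}\Big]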

Reinforcement learning has been shown to reach human and superhuman performance in board games and video games. Transferring the methods and experience from these domains to the use case of trading goods or securities seems promising, because trading shares many of their characteristics:

  • interaction with an environment that represents certain aspects of the real world,
  • a limited set of actions to interact with this environment,
  • a well-defined success measure (called the “reward”),
  • the fact that past actions determine future rewards,
  • a finite, semi-structured definition of the state of the environment,
  • the infeasibility of directly determining the future outcome of an action, due to a prohibitively large decision tree, incomplete information and missing knowledge about the interactions between the influencing factors.

Our inference engine is going to be Deeplearning4J (DL4J, see https://deeplearning4j.org/). The DL4J website contains a very brief and well-written introduction to reinforcement learning, which I highly recommend if you are not yet familiar with the concept: https://deeplearning4j.org/reinforcementlearning.
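To give a first impression of what that looks like in code, a Q-network that maps a state vector to one value per action could be configured roughly as follows. This is a minimal sketch with made-up layer sizes and learning rate; the exact builder API differs between DL4J versions.

    import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
    import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
    import org.deeplearning4j.nn.conf.layers.DenseLayer;
    import org.deeplearning4j.nn.conf.layers.OutputLayer;
    import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
    import org.nd4j.linalg.activations.Activation;
    import org.nd4j.linalg.learning.config.Adam;
    import org.nd4j.linalg.lossfunctions.LossFunctions;

    public class QNetworkFactory {

        // stateSize and numActions are placeholders for our actual input/output dimensions
        public static MultiLayerNetwork build(int stateSize, int numActions) {
            MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                    .updater(new Adam(1e-3))                        // learning rate is a guess
                    .list()
                    .layer(0, new DenseLayer.Builder()
                            .nIn(stateSize).nOut(128)
                            .activation(Activation.RELU).build())
                    .layer(1, new OutputLayer.Builder(LossFunctions.LossFunction.MSE)
                            .nIn(128).nOut(numActions)              // one Q-value per action
                            .activation(Activation.IDENTITY).build())
                    .build();
            MultiLayerNetwork net = new MultiLayerNetwork(conf);
            net.init();
            return net;
        }
    }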

The first step in implementing an RL framework for Bitcoin trading is to map the conceptual elements of the process to our use case:

  • Action
  • Actor / Agent
  • Environment
  • State
  • Reward

Action

An action is a distinct operation with a direct impact on the state of the actor and the environment. In the case of game playing, placing a tile on a specific field on the board or moving a joystick in a certain direction are examples of actions. In the case of Bitcoin trading, the obvious actions are placing and cancelling orders to buy or sell certain amounts of Bitcoin at a given cryptocurrency exchange.

A smaller set of actions improves the learning speed. For optimal performance we will restrict our action set to only three possible actions for now:

  1. Cancel all open sell orders and place a buy order at the last market price using 10% of the available USD assets.
  2. Cancel all open buy orders and place a sell order at the last market price using 10% of the available Bitcoin assets.
  3. Hold (do nothing).

In a later version we will likely extend this to

  • treat cancelling and placing orders as distinct actions,
  • allow a larger variety of amounts (other than “10% of available assets”) for buy and sell orders,
  • support different limits above and below the last market price.

But to gain experience, and to go easy on our computational resources, we are going to keep it simple for now.
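As a rough sketch (illustrative names only, not the actual production classes), the restricted action set can be modeled as a plain enum that the agent’s policy indexes into:

    // Illustrative only: the three discrete actions the agent can choose from.
    public enum TradeAction {
        BUY_10_PERCENT_USD,   // cancel open sell orders, buy BTC with 10% of available USD
        SELL_10_PERCENT_BTC,  // cancel open buy orders, sell 10% of available BTC
        HOLD;                 // do nothing

        // The Q-network outputs one value per action; the greedy policy picks the index
        // with the highest Q-value and maps it back to an action.
        public static TradeAction fromIndex(int index) {
            return values()[index];
        }
    }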

Actor

The actor is our trading bot, which uses the Bitstamp API to place and cancel orders. We are going to reuse existing Java code from the old trading system for this.

Environment

Since we don’t want to reinvent the data collection and have already collected several years’ worth of training data, the environment is given by all the data sources that we defined in the old trading system (https://notesonpersonaldatascience.wordpress.com/2016/03/06/wrapping-up-data-collection/).

State

The current state of the environment is the input for the inference engine. We can reuse the format that we used for the old Bitcoin prediction system (https://notesonpersonaldatascience.wordpress.com/2016/03/21/data-re-coding-1/). It has some issues that we might address later, but we don’t want to reinvent the wheel, so we stick with it for now.
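In practice that means turning the re-coded feature vector into something the network can consume, for example an ND4J row vector. A minimal sketch, assuming the features already come out of the existing pipeline as a normalized double array:

    import org.nd4j.linalg.api.ndarray.INDArray;
    import org.nd4j.linalg.factory.Nd4j;

    public final class StateEncoder {

        // features: the normalized values produced by the existing re-coding step
        public static INDArray encode(double[] features) {
            INDArray row = Nd4j.create(features);        // one entry per input variable
            return row.reshape(1, features.length);      // shape [1, n]: a single-example batch
        }
    }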

Reward

We have two possible ways to define the reward:

  • After each executed sell order: the difference between the sell price and the average price previously paid for the sold Bitcoins, minus transaction costs.
  • In each step: the difference between the current net value (USD plus BTC valued in USD) and the net value in the previous step.

The first option maps better onto the game analogy and also takes advantage of one of the key features of reinforcement learning: assessing the future outcomes of current actions. But the second option promises faster convergence, so to begin with, we choose the second option.
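As a sketch of the chosen per-step reward (illustrative names, not the production code): value the portfolio in USD at the last trade price and reward the change since the previous step.

    // Illustrative sketch of the per-step reward: change in total portfolio value in USD.
    public final class RewardCalculator {

        private double previousNetValue = Double.NaN;

        // usdBalance:   available + reserved USD
        // btcBalance:   available + reserved BTC
        // lastPriceUsd: last traded BTC/USD price
        // returns the reward for this step (0 for the very first step)
        public double step(double usdBalance, double btcBalance, double lastPriceUsd) {
            double netValue = usdBalance + btcBalance * lastPriceUsd;
            double reward = Double.isNaN(previousNetValue) ? 0.0 : netValue - previousNetValue;
            previousNetValue = netValue;
            return reward;
        }
    }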

 

Transition to Reinforcement Learning

Our current goal is to introduce reinforcement learning into the decision-making component of an existing Bitcoin trading system. To put this into a broader context: what we are about to do is, in the flowery terms of business speak, the upgrade from “Predictive Analytics” to “Prescriptive Analytics”. Google “Analytic Value Escalator” for a 30,000 ft view, and think for a minute about this: is the question “What will happen?” really harder to answer than “Why did it happen?” (In data science you have to take market research seriously, even if it hurts.)

Walking up a step on this escalator with the task at hand, we are interested in going from the question “What will happen?” to “How can we make it happen?”. Again, I am not completely convinced that the latter is more difficult to answer, but it requires us to make a big architectural change in the trading system.

To understand this, let’s have a look at the training pipeline that we have used so far.

Old Architecture: Separation of Concerns

[Figure: the old Bitcoin prediction training pipeline (BtcOldPredictionPipeline), with numbered steps referenced below]

We had three servers involved:

  • A build server running Jenkins, which is used for multiple projects and is not of particular interest here.
  • A server running the trading software, called “Trade Hardware”, executing a stack of shell scripts and Java programs.
  • A powerful machine for computationally intense tasks implemented in Matlab and Java, called “Compute Hardware”.

Here is what happens at the blue numbered circles:

  1. Once a week, the build server triggered a training session. The reason for regular re-training was that we wanted current trends in market behavior to be reflected in our prediction model. Also, each week we had accumulated significantly more training data, and more training data promised better prediction results.
  2. Input variables are collected and normalized for neural network training.
  3. Target variables are calculated for training. We have used a binary classifier with 19 output variables that predicted different events like this: “The BTC/USD rate will go up by 2% in the next 20 minutes”.
  4. To reduce the size of the input, a PCA was performed and only the strongest factors were used as input variables. The PCA eigenvectors and the normalization factors from step 2 are stored for later use, to transform raw input data in production into a format consistent with the training input (a sketch follows after this list).
  5. The previous neural network model is loaded to be used as initial model for training.
  6. The training is run in Matlab. We don’t need to dive deeper into this topic, because in the new architecture, we will use Deeplearning4J instead of Matlab for the training step.
  7. The new trained model is stored.
  8. The new model is tested in an extensive trade simulation.
  9. The trading software is restarted so it uses the updated model.
  10. Normal trading goes on for the rest of the week.
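To make the stored transform from step 4 a bit more concrete, here is a rough sketch in ND4J notation of how the normalization factors and PCA eigenvectors can be applied to a raw input row in production. Names and shapes are illustrative; the original code did this in Matlab and Java.

    import org.nd4j.linalg.api.ndarray.INDArray;

    public final class InputTransform {

        private final INDArray mean;          // per-feature means from training
        private final INDArray std;           // per-feature standard deviations from training
        private final INDArray eigenvectors;  // strongest PCA components, shape [nFeatures, nComponents]

        public InputTransform(INDArray mean, INDArray std, INDArray eigenvectors) {
            this.mean = mean;
            this.std = std;
            this.eigenvectors = eigenvectors;
        }

        // Normalize a raw feature row exactly as in training, then project it
        // onto the stored principal components. Result shape: [1, nComponents].
        public INDArray apply(INDArray rawRow) {
            INDArray normalized = rawRow.sub(mean).div(std);
            return normalized.mmul(eigenvectors);
        }
    }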

New architecture: Tight Integration

This pipeline was built around the concept of a strict separation between trading execution and prediction. The prediction algorithm was part of a decision-making module, which itself was just a plugin of the trading software and could be replaced by another implementation encoding another strategy. This was actually used to assess simulation results: to determine a baseline performance, the decision-making component in the simulation was replaced by one that follows a random trading strategy.

With the transition to reinforcement learning, this strict separation goes away. The agent learns from its interaction with the environment, so it completely assumes the role of the decision-making component. From a system design perspective, this makes our life much easier, because many hard questions in the decision-making component are now covered by machine learning: How do we interpret the predictions? Where do we set thresholds for the output variables? What percentage of the available assets do we use in a trade? The reinforcement learning agent produces a finished decision that can be directly converted into a buy or sell order.

The agent also does not stop learning once it is in production. Learning is a permanent background process that takes place during trading. This means that after the initial training phase we can retire the “Compute Hardware”, because there is no longer any need for weekly retraining.
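To make this concrete, the combined trading and learning loop could look roughly like the following. This is a heavily simplified sketch of Deep Q-learning with experience replay running during trading; all interfaces and names are illustrative placeholders, not the actual production classes.

    import java.util.List;
    import java.util.Random;

    // Simplified sketch of continuous Deep Q-learning with experience replay during trading.
    public final class TradingAgentLoop {

        // One stored interaction: used for the replay memory and for training minibatches.
        public static final class Transition {
            public final double[] state;
            public final int action;
            public final double reward;
            public final double[] nextState;

            public Transition(double[] state, int action, double reward, double[] nextState) {
                this.state = state;
                this.action = action;
                this.reward = reward;
                this.nextState = nextState;
            }
        }

        // What the loop needs from the environment (market data plus Bitstamp order handling).
        public interface Environment {
            double[] currentState();
            boolean isTradingEnabled();
            Transition execute(double[] state, int action); // place/cancel orders, observe reward
        }

        // What the loop needs from the Q-network.
        public interface QNetwork {
            int numActions();
            int bestAction(double[] state);            // argmax over the predicted Q-values
            void trainOnBatch(List<Transition> batch); // one gradient step on a minibatch
        }

        // What the loop needs from the replay memory.
        public interface ReplayMemory {
            void add(Transition t);
            int size();
            List<Transition> sample(int batchSize);
        }

        public static void run(Environment env, QNetwork qNet, ReplayMemory memory,
                               double epsilon, int batchSize) {
            Random rng = new Random();
            double[] state = env.currentState();

            while (env.isTradingEnabled()) {
                // Epsilon-greedy action selection: mostly exploit, sometimes explore.
                int action = (rng.nextDouble() < epsilon)
                        ? rng.nextInt(qNet.numActions())
                        : qNet.bestAction(state);

                // Execute the action (buy, sell or hold) and observe reward and next state.
                Transition t = env.execute(state, action);

                // Store the transition and learn from a random minibatch of past transitions.
                memory.add(t);
                if (memory.size() >= batchSize) {
                    qNet.trainOnBatch(memory.sample(batchSize));
                }

                state = t.nextState;
            }
        }
    }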

All this looks lean and efficient at first glance, but it will create problems down the road:

  • The tight integration between machine learning and business logic results in a monolithic architecture, which will be hard to maintain.
  • The interface between data science and software development has grown substantially. In the old architecture, the responsibility of data science ended with the prediction, and software development could take over from there. Both groups worked strictly within the bounds of their traditional realms, each with their established tools and processes, which have very little in common other than the fact that they mostly run on computers. The new architecture leads to a huge overlap of responsibilities, which will require new tools, a common language, mutual understanding and a lot of patience with each other.

Conclusion

Even without looking at the specifics of reinforcement learning, we can already see that the system design will become much simpler. The machine learning subsystem assumes more responsibility, leaving fewer moving parts in the rest of the system.

Project management, on the other hand, might turn out to be challenging, because the different worlds of data science and software development need to work much more closely together.

Further Reading

Deep Reinforcement Learning for Bitcoin trading

It’s been more than a year since the last entry regarding automated Bitcoin trading was published here. The series was supposed to cover a project in which we used deep learning to predict Bitcoin exchange rates for fun and profit.

We developed the system in 2014 and operated it all through 2015. It performed very well during the first three quarters of 2015 … and terribly during the last quarter. At the end of the year we stopped it. Despite the serious losses during the last three months, it can still be considered a solid overall success.

I never finished the series, but recently we deployed a new version, which includes some major changes that will hopefully turn out to be improvements:

  • We use Reinforcement Learning, following DeepMind’s basic recipe (Deep Q-learning with Experience Replay) from the iconic Atari article in Nature magazine. This eliminates the separation of prediction and trading as distinct processes. The inference component directly creates a buy/sell decision instead of just a prediction. Furthermore, the new approach eliminates the separation of training and production (after an initial training phase). The neural network is trained continuously on the trading machine. No more downtime is needed for weekly re-training, and no separate compute hardware lies idle with nothing to do for the other six days of the week.
  • We use Deeplearning4J (DL4J) instead of Matlab code for the training of the neural network. DL4J is a Java framework for defining, training and executing machine learning models. It integrates nicely with the trading code, which is written in Java.

This will change the course of this blog. Instead of finishing the report on what we did in 2014, I am now planning to write about the new system. It turns out that most of the code we have looked at so far is also in the new system, so we can just continue where we left off a year ago.