Still struggling with GDPR

The GDPR, and its German derivative, the BDSG-new, is in a sense like a prophecy from the oracle of Delphi. You hear the words, but no matter how hard you try, you can’t understand what they really mean until the course of history knocks you down. For regulations in Germany, the role of the “course of history” is assumed by the courts, and until they provide some clarification about what exactly constitutes compliant behavior, I believe that on this blog we have to take the words of the regulation literally, which leads to a number of restrictions and inconveniences in our communication. Scroll past the next two paragraphs for details.

To provide some context: The German implementation of the European General Data Protection Regulation is very generic in its description of the requirements and, on the other hand, draconian in its measures. The fines for noncompliance are clearly designed to put you out of business forever.

Experience tells us that the most absurd possible interpretation of a regulation will prevail in the courts until, after decades of mindless harassment of all the well-meaning parties involved, a high court cleans up the mess for good. In Germany this is almost always the case when the internet is involved.

As a consequence, there is currently only one way for me to comply, and I have no idea how anyone else gets around it: I refuse to process any personal information in matters regarding this blog. So, like most people, I have turned off the comment function. I also do not accept any direct electronic communication about this blog. If you are a resident of an EU country, please do not even try to send me emails. They will be deleted instantly. Instead, please post your questions, thoughts and comments on Facebook, Twitter, LinkedIn, Google+, etc.

If you need to send me a private message, please encrypt it using this key, and then likewise post it on Facebook, Google+, etc. using the hashtag #notesonpersonaldatascience. I will find it and answer through the same channel.

The point of all this: this mode of communication leaves none of your personal data on any computer, router, firewall, cache or backup disk under my control.

Of course, it will at the same time refine your profile at Facebook or Google. I am truly sorry for that, and I also assume that this is the opposite of what the lawmakers who created the GDPR intended.

If anyone comes up with a better solution, I will happily adopt it. Maybe this should be Watson’s next challenge! Meanwhile, things are what they are.

The good news at the end: Users from the EU are no longer locked out of NotesOnPersonalDataScience.

Timeout for European readers

A few days from now, the site will no longer be available to users in the European Union. I will put it back into normal operation as soon as a few open questions regarding the GDPR have been sorted out by German courts. I am optimistic that this will not take too long. Sorry for the inconvenience.

The Force Awakens: AI and Modern Conflict — #MSC2018 warm up

The 54th Munich Security Conference had an unofficial pre-opening yesterday, with only a handful of the formal attendees and a public panel discussion about the upcoming role of AI in modern warfare. The panelists represented political and military entities and one NGO. This composition distinguished yesterday’s event from a technical conference in a way that was at the same time delightful and disturbing.

The most notable contributions came from the two and a half women on the stage. Kersti Kaljulaid, president of Estonia, offered some advice on how the executive might be able to contain the development of rogue AI. Her proposals filled the whole spectrum from helpless actionism (monitoring energy use, apparently hoping that the developers of bad AI don’t use cloud resources) to pragmatic and feasible, if generic, approaches (build a blockchain-based marketplace for whistleblowers, to generate leads to malicious operations from their own flaky members).

Mary Wareham of Human Rights Watch coordinates the “Campaign to Stop Killer Robots”. She used the discussion to draw attention to the question of what international agreements can do to prevent the development and use of fully autonomous lethal weapons in warfare. Given the scope of the conference (and the fact that many of the people involved in this discussion have to rely on second-hand information when it comes to technical capabilities), this seems to be the only question really leading anywhere.


And then there was Sophia. She has never held public office or exerted much influence on international matters. But she is the first robotic citizen of Saudi Arabia, and she delivered the opening speech of the day. Without an active role in the panel, she spent the rest of the event at the speaker’s desk, and it was quite entertaining to watch her (probably unintentionally) shaking her head when certain topics came up.

The other panelists were Darryl A. Williams, Lieutenant General and Commander of the Land Forces of NATO, and Anders Fogh Rasmussen, former NATO Secretary General. The moderator was NYT columnist David E. Sanger.

Applied AI with DeepLearning, IBM Watson IoT Data Science Certificate

I’ve just (literally minutes ago) completed “Applied AI with DeepLearning, IBM Watson IoT Data Science Certificate”. It is a very well prepared course by IBM — mostly by the very nice people of the Munich Watson IoT Center 🙂 with some important portions by Skymind, the awesome creators of DL4J — delivered through Coursera.

The course covers a lot of ground in a very short time. Details get lost at this speed, so if you are looking for a deep understanding of AI, you will be happier with some of the offerings of academia. But if you are looking for a refresher or an update on industry trends, this course is for you. Even more so if you are an industry practitioner with a software background who needs to come up to speed on AI.

Here is the link to the course. If you have more time and are looking for a solid foundation, I recommend Andrew Ng’s “Machine Learning”. Of course, there is nothing to stop you from taking both courses…



What’s next for Bitcoin?

We have seen plenty of drama lately in the Bitcoin arena. Technological questions aside, it is time to re-evaluate our involvement in a market with looming roller-coaster dynamics.

Originally, our motivation to favor cryptocurrency over other asset classes as a subject for automated trading resulted from these four considerations:

  • Bitcoin exchanges like Bitstamp or Gemini are easily accessible from a software developer’s point of view. They have modern, well maintained and well documented APIs, and you don’t need to work for a financial services institution to get access.
  • The trading hours are 24×7, so the trading bot does not run idle 2/3 of the time, which seems like a waste of resources.
  • Cryptocurrencies are cool. I don’t blame you if you beg to differ, with Bitcoin going more and more mainstream nowadays. But back in 2014 there was no dispute about it being the coolest thing since the UNIVAC 9000 series.
  • Little regulation. Don’t get me wrong: regulation is a very, very good thing. After the 2010 flash crash, regulators in all major markets started to look very closely at automated trading and put sensible restrictions in place. Since then we have seen a few more flash crashes, but none of them was nearly as severe as the 2010 incident. And this was certainly not due to more responsible behavior by the market participants. So regulation is a good thing. That said, if automated trading is what you want to do, it complicates your life. In German law, Bitcoin is neither a currency nor a security, so it is mostly unregulated, which made our project a little easier.

The last point was originally an advantage, but it seems to be turning into a hassle now, because as it stands today, the market seems to have gone crazy. This is a problem, because a crazy market becomes inherently unpredictable.


It might not be obvious, but even with a deep neural network, predictive performance at the end of the day still depends on the ability to find statistical relationships, however hidden and convoluted they might be. In a market with constantly changing influencing factors, these interrelations are hard to find even under normal conditions. Add craziness as another complexity layer, and your neural network’s only output will be white noise. At least with mine, this is the case.

So what can we do?

I hear many people talking about tulips lately. They refer, of course, to the tulip mania of the 17th century. They point out parallels between today’s Bitcoin exchange rate and historic tulip prices, to argue that Bitcoin is an irrationally inflated bubble that is doomed to burst.

Indeed, there seems to be a good portion of irrationality. In the past, whenever we saw a hike in the Bitcoin price, it came with an obvious explanation: the expropriation of Russian bank customers in Cyprus; the Indian demonetization policy; gambling in China. The last event in this series was the cancelled hard fork in November. Although the problem addressed by the proposed fork has by no means been solved, the price has more than tripled since then. I don’t believe that many of the buyers have a firm grasp of blockchain hard forks. It just does not justify the latest price hike.

Use this finding as input for a little duck testing, and you will likely come to the conclusion that we are, in fact, dealing with a bad case of a speculative bubble.

So is this the time to abandon Bitcoin and blockchain technology and move on to the next cool thing? Or walk back to something more conservative?

Let’s come back to the tulips: what happened after the bubble burst? Take a walk through almost any neighborhood in almost any western community, and you will see that, while the tulip bubble is gone, the tulips are still there. They can be found in most private and public gardens. They cover a significant share of the land surface of the Netherlands and represent a small but notable share of the Dutch economy.

To me, this looks like a blueprint for the way forward for Bitcoin. The current craze has the beautiful side effect that, for the first time, people with no immediate need and no interest in the technology are creating Bitcoin wallets and acquiring cryptocurrency. Whether the price stabilizes at the current level or crashes and then stabilizes at a much lower level: the wallets will still be there, and people will still own Bitcoin and know a lot more about it than they did a few months ago.

No matter how this ends: when it’s over, Bitcoin will likely be ubiquitous in more and more areas, like tulips are today. We might finally enter a phase where Bitcoin is used the intended way: as a currency.

In conclusion: this is not the time to leave the field of Bitcoin. If anything, it is a good time to enter the area of cryptocurrencies and blockchain technology, because no matter whether the current market is a bubble or just very healthy growth: it will contribute to a much broader use of the technology in the next few years.


Building the Reinforcement Learning Framework

To build our reinforcement learning framework, we are going to follow the basic recipe laid out in the February 2015 Nature article “Human-level control through deep reinforcement learning”.

Reinforcement learning has been shown to reach human and superhuman performance in board games and video games. Transferring the methods and experience from these domains to the use case of trading goods or securities seems promising, because trading has many similar characteristics:

  • interaction with an environment that represents certain aspects of the real world,
  • a limited set of actions to interact with this environment,
  • a well-defined success measure (called “reward”),
  • past actions determine the future rewards,
  • a finite, semi-structured definition of the state of the environment,
  • the infeasibility of directly determining the future outcome of an action, due to a prohibitively large decision tree, incomplete information and missing knowledge about the interaction between the influencing factors.

Our inference engine is going to be Deeplearning4J (DL4J). The DL4J website contains a very brief and well written introduction to reinforcement learning, which I highly recommend if you are not familiar with the concept yet.

The first step in implementing an RL framework for Bitcoin trading is to map the conceptual elements of the process to our use case:

  • Action
  • Actor / Agent
  • Environment
  • State
  • Reward


An action is a distinct operation with a direct impact on the state of the actor and the environment. In game playing, placing a tile on a specific field of the board or moving a joystick in a certain direction are examples of actions. In Bitcoin trading, the obvious actions are placing and cancelling orders to buy or sell certain amounts of Bitcoin at a given cryptocurrency exchange.

A smaller set of actions improves the learning speed. For optimal performance, we will restrict our action set to only three possible actions for now:

  1. Cancel all open sell orders and place a buy order at the last market price using 10% of the available USD assets.
  2. Cancel all open buy orders and place a sell order at the last market price using 10% of the available Bitcoin assets.
  3. Hold (do nothing).
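For illustration, the “10% of available assets” rule behind this action set can be sketched in a few lines of Java. All names here are my own, not taken from the actual trading code:

```java
// Sketch of the three-action set; names are illustrative, not from the
// real trading bot. BUY also implies cancelling open sell orders, and
// SELL implies cancelling open buy orders (handled by the order executor).
public class ActionSet {

    enum TradeAction { BUY, SELL, HOLD }

    // Fraction of the available assets used per order.
    static final double ORDER_FRACTION = 0.10;

    /** USD amount for a BUY order at the last market price. */
    static double buyAmountUsd(double availableUsd) {
        return ORDER_FRACTION * availableUsd;
    }

    /** BTC amount for a SELL order at the last market price. */
    static double sellAmountBtc(double availableBtc) {
        return ORDER_FRACTION * availableBtc;
    }
}
```

Keeping the amounts as pure functions of the available assets makes the action set trivial to unit-test before any order ever reaches the exchange.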

In a later version we will likely extend this to:

  • treat cancelling and placing orders as distinct actions,
  • allow a larger variety of amounts (other than “10% of available assets”) for buy and sell orders,
  • allow different limits above and below the last market price.

But for gaining experience, and to go easy on our computational resources, we are going to keep it simple for now.


The actor is our trading bot, which uses the Bitstamp API to place and cancel orders. We are going to reuse existing Java code from the old trading system for this.


Since we don’t want to reinvent data collection, and we have already collected several years’ worth of training data, the environment is given by all the data sources that we defined in the old trading system.


The current state of the environment is the input for the inference machine. We can reuse the format that we used for the old Bitcoin prediction system. It has some issues that we might address later, but we don’t want to reinvent the wheel, so we stick with it for now.


We have two possible ways to define the reward:

  • After each executed sell order: the difference between the sell price and the previous average buy price of the sold Bitcoins, minus transaction costs.
  • In each step: the difference between the current net value (USD + BTC) and the net value in the previous step.

The first option maps better to the game analogy and also takes advantage of one of the key features of reinforcement learning: assessing the future outcomes of current actions. But the second option promises faster convergence, so to begin, we choose the second option.
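The second reward option is simple enough to sketch in Java. Names are illustrative; `netValue` marks the BTC holdings to the last market price:

```java
// Sketch of the chosen reward: the change in total net value per step.
// Names are illustrative, not taken from the actual trading code.
public class Reward {

    /** Portfolio value in USD: cash plus BTC at the last market price. */
    static double netValue(double usd, double btc, double btcUsdPrice) {
        return usd + btc * btcUsdPrice;
    }

    /** Reward for one step: current net value minus previous net value. */
    static double stepReward(double previousNetValue, double currentNetValue) {
        return currentNetValue - previousNetValue;
    }
}
```

A nice property of this definition: the rewards over any sequence of steps telescope to the total change in portfolio value, so the agent’s cumulative reward is exactly the trading profit.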


Transition to Reinforcement Learning

Our current goal is to introduce reinforcement learning into the decision making component of an existing Bitcoin trading system. To put this into a broader context: what we are about to do is — in the flowery terms of business speak — the upgrade from “Predictive Analytics” to “Prescriptive Analytics”. Google “Analytic Value Escalator” for a 30,000 ft view, and think for a minute about this: is the question “What will happen?” really harder to answer than “Why did it happen?”? (In data science you have to take market research seriously, even if it hurts.)

Walking up a step on this escalator with the task at hand, we are interested in going from the question “What will happen?” to “How can we make it happen?”. Again, I am not completely convinced that the latter is more difficult to answer, but it requires us to make a big architectural change in the trading system.

To understand this, let’s have a look at the training pipeline that we have used so far.

Old Architecture: Separation of Concerns


We had three servers involved:

  • A build server running Jenkins, which is used for multiple projects and not of particular interest for us here and now.
  • A server running the trading software, called “Trade Hardware”, executing a stack of shell scripts and Java programs.
  • A powerful machine for computationally intense tasks implemented in Matlab and Java, called “Compute Hardware”.

Here is what happens at the blue numbered circles:

  1. Once a week, the build server triggered a training session. The reason for regular re-training was that we wanted current trends in market behavior to be reflected in our prediction model. Also, each week we had aggregated significantly more training data, and more training data promised better prediction results.
  2. Input variables were collected and normalized for neural network training.
  3. Target variables were calculated for training. We used a binary classifier with 19 output variables that predicted events like this: “The BTC/USD rate will go up by 2% in the next 20 minutes.”
  4. To reduce the size of the input, a PCA was performed, and only the strongest factors were used as input variables. The PCA eigenvectors and the normalization factors from step 2 were stored for later, to transform raw input data in production to a format consistent with the training input.
  5. The previous neural network model was loaded to be used as the initial model for training.
  6. The training was run in Matlab. We don’t need to dive deeper into this topic, because in the new architecture we will use Deeplearning4J instead of Matlab for the training step.
  7. The newly trained model was stored.
  8. The new model was tested in an extensive trade simulation.
  9. The trading software was restarted so it used the updated model.
  10. Normal trading went on for the rest of the week.
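One detail of step 4 deserves a sketch: whatever normalization factors are fitted at training time must be stored and reapplied unchanged to raw production inputs, or the model sees inputs on a different scale than it was trained on. A minimal Java illustration (my own names, not the actual pipeline code):

```java
// Sketch of the step-4 bookkeeping: normalization factors are fitted
// once on the training data and then reapplied, unchanged, to every
// raw input row in production. Names are illustrative.
public class Normalizer {
    final double[] mean, std;

    Normalizer(double[] mean, double[] std) { this.mean = mean; this.std = std; }

    /** Fit per-variable mean and standard deviation on the training rows. */
    static Normalizer fit(double[][] rows) {
        int d = rows[0].length;
        double[] mean = new double[d], std = new double[d];
        for (double[] r : rows)
            for (int j = 0; j < d; j++) mean[j] += r[j] / rows.length;
        for (double[] r : rows)
            for (int j = 0; j < d; j++) {
                double diff = r[j] - mean[j];
                std[j] += diff * diff / rows.length;
            }
        for (int j = 0; j < d; j++) std[j] = Math.sqrt(std[j]);
        return new Normalizer(mean, std);
    }

    /** Transform a raw production row with the stored training factors. */
    double[] transform(double[] raw) {
        double[] out = new double[raw.length];
        for (int j = 0; j < raw.length; j++) out[j] = (raw[j] - mean[j]) / std[j];
        return out;
    }
}
```

The stored PCA eigenvectors would be applied the same way: fitted once during training, then used as a fixed projection in production.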

New architecture: Tight Integration

This pipeline was built around the concept of a strict separation between trading execution and prediction. The prediction algorithm was part of a decision making module, which itself was just a plugin of the trading software and could be replaced by another implementation encoding another strategy. We actually used this to assess simulation results: to determine a baseline performance, the decision making component in the simulation was replaced by one that follows a random trading strategy.

With the transition to reinforcement learning, this strict separation goes away. The learning agent learns from the interaction with the environment, so it completely assumes the role of the decision making component. From a system design perspective, this makes our life much easier, because many hard questions in the decision making component are now covered by machine learning: How to interpret the predictions? Where to set thresholds for the output variables? What percentage of available assets to use in a trade? The reinforcement learning agent produces a finished decision that can be directly converted into a buy or sell order.

Also, the agent does not stop learning once it is in production. Learning is a permanent background process that takes place during trading. This means that after the initial training phase, we can retire the “Compute Hardware”, because there is no need for weekly retraining.

All this looks lean and efficient at first glance, but it will create problems down the road:

  • The tight integration between machine learning and business logic results in a monolithic architecture, which will be hard to maintain.
  • The interface between data science and software development has been greatly inflated. In the old architecture, the responsibility of data science ended with the prediction, and software development could take over from there. Both groups worked strictly within the bounds of their traditional realms, each with their established tools and processes, which have very little in common other than the fact that they mostly run on computers. The new architecture leads to a huge overlap of responsibilities, which will require new tools, a common language, mutual understanding and a lot of patience with each other.


Even without looking at the specifics of reinforcement learning, we can already see that the system design will become much simpler. The machine learning subsystem assumes more responsibility, leaving fewer moving parts in the rest of the system.

Project management, on the other hand, might turn out to be challenging, because the different worlds of data science and software development need to work much more closely together.

Further Reading

Deep Reinforcement Learning for Bitcoin trading

It’s been more than a year since the last entry regarding automated Bitcoin trading was published here. The series was supposed to cover a project in which we used deep learning to predict Bitcoin exchange rates for fun and profit.

We developed the system in 2014 and operated it all through 2015. It performed very well during the first three quarters of 2015 … and terribly during the last quarter. At the end of the year we stopped it. Despite the serious losses during those last three months, it can still be considered a solid overall success.

I never finished the series, but recently we deployed a new version with some major changes that hopefully will turn out to be improvements:

  • We use reinforcement learning, following DeepMind’s basic recipe (Deep Q-learning with experience replay) from the iconic Atari article in Nature. This eliminates the separation of prediction and trading as distinct processes: the inference component directly produces a buy/sell decision instead of just a prediction. Furthermore, the new approach eliminates the separation of training and production (after an initial training phase). The neural network is trained continuously on the trading machine. No more downtime is needed for weekly re-training, and no separate compute hardware lies idle with nothing to do for the other six days of the week.
  • We use Deeplearning4J (DL4J) instead of Matlab code for the training of the neural network. DL4J is a Java framework for defining, training and executing machine learning models. It integrates nicely with the trading code, which is written in Java.
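The experience-replay part of that recipe can be sketched independently of DL4J: transitions go into a bounded buffer, and training draws random minibatches from it, which breaks the correlation between consecutive market observations. The following Java is an illustration of the idea, not RL4J’s actual classes:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Sketch of DeepMind-style experience replay: keep a bounded buffer of
// (state, action, reward, nextState) transitions and train on random
// minibatches drawn from it. Names are illustrative, not RL4J's API.
public class ReplayBuffer {

    static class Transition {
        final double[] state, nextState;
        final int action;
        final double reward;
        Transition(double[] s, int a, double r, double[] s2) {
            state = s; action = a; reward = r; nextState = s2;
        }
    }

    private final List<Transition> buffer = new ArrayList<>();
    private final int capacity;
    private final Random rng;
    private int writeIndex = 0;

    ReplayBuffer(int capacity, long seed) {
        this.capacity = capacity;
        this.rng = new Random(seed);
    }

    /** Add a transition, overwriting the oldest one once the buffer is full. */
    void add(Transition t) {
        if (buffer.size() < capacity) buffer.add(t);
        else buffer.set(writeIndex, t);
        writeIndex = (writeIndex + 1) % capacity;
    }

    /** Draw a random minibatch (with replacement, for brevity). */
    List<Transition> sample(int batchSize) {
        List<Transition> batch = new ArrayList<>(batchSize);
        for (int i = 0; i < batchSize; i++)
            batch.add(buffer.get(rng.nextInt(buffer.size())));
        return batch;
    }

    int size() { return buffer.size(); }
}
```

In continuous trading, the buffer is filled during live operation, and the background training thread samples from it between trading decisions.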

This will change the course of this blog. Instead of finishing the report on what we did in 2014, I am now planning to write about the new system. It turns out that most of the code we have looked at so far is also in the new system, so we can just continue where we left off a year ago.

Why Neural Networks work

One popular explanation of the fact that artificial neural networks can do what they do goes along these lines:

  1. A brain is capable of doing these things.
  2. An artificial neural network is a simulation of a brain.
  3. Therefore, an artificial neural network can do these things, too.

Admittedly, most things in computer science that “work”, in the sense that they produce useful output for the real world, are implementations of theoretical models that have been built by other sciences, so this is kind of a valid explanation. I don’t like it anyway.

My first problem with it is this: it’s not quite true that an artificial neural network (ANN) is a simulation of a brain. To be fair, some come impressively close. But in the context of this blog we unambitiously restrict ourselves to the level of sophistication found in most real-world ANNs, which are radical (radical!) simplifications of even the simplest natural neural networks.

Second: it does not help most people understand why an ANN is capable of doing useful work. Unless you already understand the brain, it won’t help you much when I tell you how we are going to map gray matter to mathematical concepts. (And if you already understand the brain: welcome to the blog. You can skip the rest of this post, if you want.)

I want to come from the other side and approach the topic as an engineering problem. Buckle up. We are going to manufacture a special-purpose classification machine, and then (in a later post) we will generalize it and see if the result has any similarity to what the neurosciences know about the brain.

As a basic motivation, let’s assume that your boss has found a webcam that shows a stock market chart (like this), and came up with a brilliant idea: he will become insanely rich with a new piece of software that reads the chart and outputs some kind of likelihood that the market is in an upward trend. Your boss calls this likelihood “Zuversicht”, and we are going to stick with this term for a while, because we like German words, and because the corresponding English word (“confidence”) already has a specific meaning in statistics, and we want to prevent confusion resulting from ambiguous terminology.

OK, now our input is an image from a webcam, so we have a two-dimensional array of pixel colors. To make it easier, you convert the image to grayscale, so you only have to think about the pixels’ brightness and can ignore hue and saturation. You look at some examples of upward trends and can’t help but observe that the lines tend to start in the lower left corner and zigzag their way to the upper right corner.


Breaking the image down into quadrants, you notice that in these cases the average pixel brightness in Q2 and Q3 is higher than in Q1 and Q4. With this insight, you write the following lines of code and declare your job done.

double zuversichtUptrend(double[][] image) {
    int h = image.length, w = image[0].length;
    double q2 = 0, q3 = 0;  // Q2: upper right, Q3: lower left (row 0 is the top)
    for (int r = 0; r < h / 2; r++)           // upper half
        for (int c = w / 2; c < w; c++) q2 += image[r][c];
    for (int r = h / 2; r < h; r++)           // lower half
        for (int c = 0; c < w / 2; c++) q3 += image[r][c];
    int n = (h / 2) * (w / 2);                // pixels per quadrant
    return q2 / n + q3 / n;                   // average brightness of Q2 plus Q3
}


It works great on the test data; your boss is happy, and his boss gives him a raise. But a few weeks later, he tells you that he’s not happy anymore. He has not become insanely rich!

What went wrong?

Apparently, your program has misclassified the trend on several occasions. So you have a look at the chart images for those days and see two major flaws in your approach:


  1. On some days, the chart went almost flat or turned back to negative, but the chart line was just low enough in the early hours to run through Q3 and just high enough in the later hours to run mostly through Q2.
  2. On other days the chart went clearly down, but your software’s Zuversicht value was very high. The reason turns out to be that the overall brightness of the picture was high on those days, illuminating Q2 and Q3 without the chart line covering much space in them.

So you start the second iteration of your engineering endeavor.

To solve problem 1, you obviously need a higher resolution. Let’s try 8×8! This partition conveniently allows us to identify each field with chessboard notation.


A perfect upward trend, which your boss defines as a straight line from the lower left to the upper right, will light up the fields A1, B2, …, H8 while the other fields remain dark. The flat chart from problem 1 will instead light up the fields A4, B4, …, G5, H5. Great, but what about all the other possible charts that show a trend going upward in a non-steady, somewhat chaotic fashion? This is, after all, the norm rather than the exception.


Let’s add some fuzziness to the system. The intuition is this: for each field, you guess the probability that the full chart shows an overall upward trend given that this particular field is lit. For example, if the lower right corner (H1) is lit, the probability of an overall positive trend is zero. If the field to its left (G1) is lit, the probability is close to zero, but there is still a possibility that the chart makes a radical upward turn in the remaining 1/8 of the chart. The closer you get to the perfect upward trend, the higher the probability becomes.


You call the resulting 8×8 numbers a “weight matrix”. You can use it as a filter for the actual chart images as follows:

  1. For each field of the loaded picture, you multiply the actual average brightness by the corresponding value in the weight matrix. The product will be high when the average brightness is high and the weight is high; otherwise it will be low. You repeat this step for each field, 64 times altogether.
  2. You add up all the products.

The closer the actual chart zigzags around the ideal chart, the higher the sum will be. And even when the actual chart goes astray: if it remains on a positive trajectory, we will still get a relatively high result from this calculation.
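The two steps above amount to an elementwise multiplication followed by a sum, i.e. a dot product of the image fields with the weights. A tiny Java sketch:

```java
// Sketch of the two-step filter: multiply each field's brightness by
// the corresponding weight, then sum all the products (a dot product
// of the two matrices viewed as flat vectors).
public class WeightFilter {

    static double apply(double[][] brightness, double[][] weights) {
        double sum = 0;
        for (int r = 0; r < brightness.length; r++)
            for (int c = 0; c < brightness[r].length; c++)
                sum += brightness[r][c] * weights[r][c];
        return sum;
    }
}
```

With a 2×2 toy example: an image lit on the diagonal, filtered with weights that favor exactly that diagonal, yields a high score, while an off-diagonal image yields zero.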

So far so good. Let’s look at the second problem: the tide lifts all boats, and the ceiling light brightens all pixels. When someone turns on the light in the trading room, all pixels in the webcam picture become brighter. Even areas that are not crossed by the line chart appear brighter, which renders our filtering result worthless.

Let’s add a preprocessing step to fix this. If there were no line chart in the picture, all pixels would have approximately the same brightness, and they are supposed to be black (brightness zero). If, in this case, you subtracted the overall average brightness from each field’s measured brightness, the result would be all black fields. Subtracting the overall average brightness normalizes the picture to what is needed for our further processing.

Now add the line chart to your consideration. Because it covers only a very small fraction of the image, it does not change the overall average brightness much. The light noise that illuminated the dark parts of the image also made the bright parts (the lines of the chart) brighter. So if we subtract the overall average brightness from the bright pixels, we also normalize those parts of the image to what is expected as input for the next processing step.

Great, now you know what to do to solve problem 2. The question is: how do you do it? Wouldn’t it be great if you could implement both processing steps in a unified way? In other words: is it possible to define a weight matrix in such a way that applying it to the input data executes the average brightness subtraction of your preprocessing step? Turns out: it is possible.

Imagine the following weight matrix for field A1:

  • Value at position A1: 1-1/64
  • Value at all other positions: -1/64

Please convince yourself that this matrix will do the average subtraction for position A1. Of course, this works just as well for all other positions.
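Instead of convincing yourself by hand, you can check the claim in code: applying this weight vector to a flattened image gives exactly the mean-subtracted brightness at that position. A small sketch over a flattened image of n fields:

```java
// Sketch: the weight vector for position p has 1 - 1/n at p and -1/n
// everywhere else. Applying it to a flattened image equals subtracting
// the overall average brightness from the pixel at position p.
public class MeanSubtraction {

    /** Weight vector implementing mean subtraction for position p. */
    static double[] weightsFor(int p, int n) {
        double[] w = new double[n];
        for (int i = 0; i < n; i++) w[i] = (i == p) ? 1.0 - 1.0 / n : -1.0 / n;
        return w;
    }

    /** Dot product of the flattened image with a weight vector. */
    static double apply(double[] image, double[] w) {
        double sum = 0;
        for (int i = 0; i < image.length; i++) sum += image[i] * w[i];
        return sum;
    }

    /** The same computation done procedurally, for comparison. */
    static double directMeanSubtraction(double[] image, int p) {
        double mean = 0;
        for (double v : image) mean += v / image.length;
        return image[p] - mean;
    }
}
```

Both paths agree for every position, which is exactly the point: the procedural preprocessing step and the matrix formulation are the same operation.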


Hmmm, interesting: you just solved two seemingly totally different problems with the same approach. It feels a little odd to define a huge matrix for a calculation that could easily be done procedurally, but you have a feeling that there might be a systematic advantage in a unified way of tackling problems in this project. Also, of course, you know that vector (and with them, matrix) calculations are the strong point of GPU data processing as well as of highly optimized software packages like Matlab (“Matrix Lab”!) and Octave. You feel that after your initial success, your boss might become greedy, which will ultimately put more load on your software. Having some strong performance afterburners like these in your arsenal might come in handy later.

Your overall process has three steps now:

  1. You create an 8×8 matrix from the image data as the input data layer. (To facilitate vector operations, you "flatten" this matrix to a vector of length 64, but that's an implementation detail.)
  2. For each field you apply the corresponding preprocessing 8×8 weight matrix to the whole input layer 8×8 matrix. The result is a new 8×8 matrix, which you call the "hidden layer". (In the real world, you would again do this with "flattened" vectors and a single large 64×64 weight matrix representing all fields. This is mathematically equivalent and can be well parallelized. Again: just an implementation detail.)
  3. You apply the classification weight matrix to the hidden layer and get the Zuversicht value as output.
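The three steps above can be sketched as two matrix operations (a minimal illustration with hypothetical names, not the project's actual classes):

```java
/** Sketch of the three-step forward pass (all names are illustrative). */
public class ForwardPassDemo {

    /** y = W * x, the only operation the whole pipeline needs. */
    public static double[] multiply(double[][] w, double[] x) {
        double[] y = new double[w.length];
        for (int i = 0; i < w.length; i++)
            for (int j = 0; j < x.length; j++)
                y[i] += w[i][j] * x[j];
        return y;
    }

    /**
     * Step 1: the flattened 8x8 image is the input layer (length 64).
     * Step 2: the 64x64 preprocessing matrix yields the hidden layer.
     * Step 3: a single classification weight row yields the Zuversicht value.
     */
    public static double forward(double[][] wPre, double[] wClass, double[] input) {
        double[] hidden = multiply(wPre, input);   // step 2
        double zuversicht = 0;
        for (int j = 0; j < hidden.length; j++)    // step 3
            zuversicht += wClass[j] * hidden[j];
        return zuversicht;
    }
}
```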


There you go: without thinking much about neurons, synapses and ganglia, you have handcrafted your first artificial neural network. Your new software is actually what people call a Feedforward Neural Network with a linear activation function. When you define a threshold value for the Zuversicht output, you also have a binary linear classifier.

Your neural network is still far from perfect. You will eventually get there, but not today. Let's just mention a few things you would need to think about before going into production:

  • It is not able to learn! It works because you were able to provide a "model" (that is, the weights in the weight matrices). This is good enough for now, but in the future we prefer to let the computer do the work of figuring out the model data.
  • It is not well protected against eccentric input data. Imagine what happens if a camera error or a data transfer problem produces, for a single pixel, a value of 325212498434 instead of a value in the expected range between 0 and 1.
  • It will still fail to make your boss immeasurably rich, because it does not predict anything. It only classifies a chart as close enough to your boss's definition of a perfect chart. This is what he wanted, so it is partly his fault. But we can nevertheless do better.
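A cheap guard against such eccentric values, as a sketch (hypothetical helper, not part of the post's code), is to clamp every incoming pixel into the expected range before it reaches the network:

```java
/** Sketch: clamp raw pixel values into the expected [0, 1] range (illustrative only). */
public class InputGuard {

    public static double clamp(double v) {
        if (Double.isNaN(v)) return 0.0;           // treat broken values as black
        return Math.max(0.0, Math.min(1.0, v));
    }

    /** Returns a sanitized copy of the flattened input layer. */
    public static double[] sanitize(double[] raw) {
        double[] clean = new double[raw.length];
        for (int i = 0; i < raw.length; i++)
            clean[i] = clamp(raw[i]);
        return clean;
    }
}
```

This does not make the classifier smarter, but it keeps a single broken pixel from dominating every weighted sum downstream.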

Even with these shortcomings, you have hopefully built up some comprehension of how a neural network is able to recognize a pattern. We have seen that, even without actively imitating nature, we arrive at a similar result when we simply work our way to the best solution in a straightforward manner.

A little heads-up: in the next post, we will build the software to convert the collected Bitcoin price and market data into a format suitable as the input data layer for a neural network like this one. If your data collector from the previous post is not running yet, please start it soon, so you have some data to play with next time.



Wrapping up data collection


To finally start with the data collection, you need some framework for your quote readers. So this is what we are going to build next.

We use three external libraries to facilitate this task:

  • Google Guava v. 19.0 (Apache License 2.0)
  • Cron4J 2.2 by Sauron Software (LGPL license)
  • JFreeChart 1.0.19 by Object Refinery (LGPL license).

Cron4J provides a lightweight scheduler, which we use to repeatedly execute the following process in a controlled manner:

  1. Load data from online resources using a bunch of QuoteReaders.
  2. Write the data to a new line in an output file.
  3. Visualize the data in a line chart.
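If you want to see the shape of that loop without pulling in cron4j, here is a sketch using only the JDK's ScheduledExecutorService (illustrative names; the actual project uses cron4j's Scheduler):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch: the collection loop on a JDK scheduler (an alternative to cron4j, not the post's code). */
public class LoopSketch {

    final AtomicInteger runs = new AtomicInteger();

    /** One pass: load quotes, write them out, update the chart (stubbed here). */
    public void collectOnce() {
        // 1. load data from the online resources (QuoteReaders)
        // 2. append a line to the output file
        // 3. update the line chart
        runs.incrementAndGet();
    }

    /** Schedules collectOnce() once a minute, like the cron string "* * * * *". */
    public ScheduledExecutorService start() {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        ses.scheduleAtFixedRate(this::collectOnce, 0, 1, TimeUnit.MINUTES);
        return ses;
    }
}
```

cron4j buys us cron-style schedule strings and a very small footprint; the structure of the loop is the same either way.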

From the Guava library we use the EventBus to decouple the main process from the data output, to keep technical problems contained. Please note that the EventBus is marked @Beta in Guava.

JFreeChart gives us an easy way to output the data as a chart, which is sometimes helpful.


Let’s examine the main method.

 /**
  * Main method.
  * @param args Optional: 1. chart|nochart, 2. output filename.
  */
 public static void main(String[] args) {

 // add a shutdown hook for finishing the output file when the process is stopped
 addShutdownHook();

 // Quote Reader for Bitstamp data
 final QuoteReader rdr = new BitstampQuoteReader();

 // ETF tracking the CSI 300
 final QuoteReader yqrSha = new YahooFinanceQuoteReader("ASHR");
 final QuoteReader yqrCac40 = new YahooFinanceQuoteReader("^FCHI");
 final QuoteReader yqrChf = new YahooFinanceQuoteReader("CHFUSD.X");
 final QuoteReader yqrZar = new YahooFinanceQuoteReader("ZARUSD.X");
 final QuoteReader rqrEur = new YahooFinanceQuoteReader("EURUSD.X");
 final QuoteReader yqrRub = new YahooFinanceQuoteReader("RUBUSD.X");
 final QuoteReader yqrCny = new YahooFinanceQuoteReader("CNY.X");
 final QuoteReader yqrGoldFutures = new YahooFinanceQuoteReader("GCJ16.CMX");
 // SPDR Dow Jones Industrial Average ETF (DIA) reflects the Dow Jones index.
 // We need to use this, because Yahoo no longer provides the original index as CSV.
 final QuoteReader yqrDowJones = new YahooFinanceQuoteReader("DIA");

 // create event bus
 final EventBus eventBus = new EventBus("hsecDataCollectorEventBus");

 // register file writer
 eventBus.register(new OutputFileWriter(getOutputFileName(args)));

 // if parameter "chart" is set, register ChartWriter
 if (checkChartArgument(args))
 eventBus.register(new ChartWriter());

 // -------------------
 // here would be the place to register more event handlers
 // like a database writer, RSS publisher ...
 // -------------------

 final IntHolder failedConsecutiveLoops = new IntHolder();

 // a random time offset for scheduled events, to desynchronize running
 // instances: prevents all instances of this program from accessing the
 // APIs in the first second of each minute.
 final long offsetTime = Math.round(Math.random() * 59000d);

 Runnable dataCollectorSchedulerTask = new Runnable() {
 public void run() {
   // let's look at this later ...

 // Scheduler for data collection
 Scheduler dataCollectorScheduler = new Scheduler();

 // keep running as long as everything is ok.
 while (keepRunning && failedConsecutiveLoops.i < 100) {
 try {
 } catch (InterruptedException e) {

Starting in line 16, we initialize the following quote readers:

Symbol Comment
BTC Bitcoin in USD, Quote from Bitstamp.
ASHR An ETF tracking the CSI 300, giving us information about the health of Chinese markets.
^FCHI CAC 40 Quote from Yahoo
CHFUSD.X CHF in USD, Quote from Yahoo
ZARUSD.X ZAR in USD, Quote from Yahoo
EURUSD.X EUR in USD, Quote from Yahoo
RUBUSD.X RUB in USD, Quote from Yahoo
CNY.X CNY in USD, Quote from Yahoo
GCJ16.CMX Gold Futures, Quote from Yahoo
DIA An ETF tracking the Dow Jones Industrial Index.

Some indices are not provided by Yahoo when using the API, but for our purpose, an ETF tracking the index works just as well.

Starting in line 40, we initialize the EventBus and register the data sinks.

In line 68ff, we prepare and start the Scheduler to execute the main loop once a minute.

Next, we take a closer look at the Runnable:

public void run() {
 if (failedConsecutiveLoops.i > 5) {
 // something's stinky. Sleep some extra time to give remote
 // systems time to recover.
 if (failedConsecutiveLoops.i % 7 != 0) {
 // sleep offset time
 try {
 } catch (InterruptedException e1) {
 // ignore

 // read and publish quotes
 QuotesChangedEvent e = new QuotesChangedEvent();
 e.currentBtcQuote = rdr.getCurrentQuote();
 e.quoteSha = yqrSha.getCurrentQuote();
 e.quoteCac40 = yqrCac40.getCurrentQuote();
 e.quoteChf = yqrChf.getCurrentQuote();
 e.quoteZar = yqrZar.getCurrentQuote();
 e.quoteEur = rqrEur.getCurrentQuote();
 e.quoteRub = yqrRub.getCurrentQuote();
 e.quoteGoldFuture = yqrGoldFutures.getCurrentQuote(); 

 e.quoteCny = yqrCny.getCurrentQuote();
 e.quoteDJI = yqrDowJones.getCurrentQuote();
 e.bidBtc = rdr.getBid();
 e.askBtc = rdr.getAsk();
 e.min24Btc = rdr.getMin24();
 e.max24Btc = rdr.getMax24();
 e.volume24Btc = rdr.getVolume24();
 e.vwapBtc = rdr.getVwap();

 // post the event
 eventBus.post(e);

 // reset failed loop counter
 failedConsecutiveLoops.i = 0;

The counter "failedConsecutiveLoops" contains the number of runs in a row that have failed so far. Normally it should be 0. From time to time it will be a small number. When it reaches a big number, we assume that something is broken and will not recover in the foreseeable future. In this case we stop the scheduler and end the program.
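The counter logic can be boiled down to two rules (a simplified sketch with illustrative names, not the actual loop):

```java
/** Sketch of the failure-counter logic (simplified, names illustrative). */
public class FailureCounterDemo {

    /** Give up after this many consecutive failures. */
    static final int MAX_FAILED_LOOPS = 100;

    /** Returns true while the collector should keep running. */
    public static boolean shouldKeepRunning(boolean keepRunning, int failedConsecutiveLoops) {
        return keepRunning && failedConsecutiveLoops < MAX_FAILED_LOOPS;
    }

    /** One simulated loop: on success the counter resets, on failure it grows. */
    public static int nextCounter(int failedConsecutiveLoops, boolean loopSucceeded) {
        return loopSucceeded ? 0 : failedConsecutiveLoops + 1;
    }
}
```

A single successful run wipes the slate clean; only an unbroken streak of failures can reach the threshold and stop the program.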


This class is just a plain data container. For the sake of readability we don’t use getters and setters.

 /**
  * Event to be fired when new quotes have been loaded.
  */
 static final class QuotesChangedEvent {
 double currentBtcQuote;
 double quoteSha;
 double quoteCac40;
 double quoteChf;
 double quoteZar;
 double quoteEur;
 double quoteRub;
 double quoteGoldFuture;
 double quoteCny;
 double quoteDJI;
 double bidBtc;
 double askBtc;
 double min24Btc;
 double max24Btc;
 double volume24Btc;
 double vwapBtc;
 }

Data Sinks

The data sinks are ChartWriter and OutputFileWriter. There is not much to explain. You could use them as templates for other data sinks. The one that probably makes the most sense is a relational database writer.
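As an illustration of what such a template could look like, here is a hypothetical console sink (the event stub is included only to keep the sketch self-contained; with Guava's EventBus, the handler method would additionally carry the @Subscribe annotation):

```java
/** Sketch: a console data sink following the same shape as the other writers (illustrative only). */
public class ConsoleWriter {

    /** Minimal stand-in for the post's QuotesChangedEvent, for illustration only. */
    static final class QuotesChangedEvent {
        double currentBtcQuote;
    }

    /** With Guava's EventBus this method would be annotated with @Subscribe. */
    public String recordQuoteChange(QuotesChangedEvent e) {
        String line = String.format(java.util.Locale.US, "BTC\t%.3f", e.currentBtcQuote);
        System.out.println(line);
        return line;
    }
}
```

Registering it would mirror the other sinks: `eventBus.register(new ConsoleWriter());`.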

 /**
  * Writes a set of quotes to a line chart.
  */
 static final class ChartWriter {

 /** dataset object for the chart */
 DefaultCategoryDataset dataset = new DefaultCategoryDataset();

 /**
  * Initialize the chart.
  * @return JFreeChart object to feed.
  */
 private JFreeChart createLineChart() {
 return ChartFactory.createLineChart(
 "data collection", "", "Range", dataset);
 }

 public ChartWriter() {
 try {
 JFrame frame = new JFrame();
 ChartPanel cp = new ChartPanel(createLineChart());
 frame.setSize(1200, 700);
 } catch (Exception e) {

 public void recordQuoteChange(QuotesChangedEvent e) {
 String lTimeFormatted = DateFormat.getTimeInstance().format(
 new Date());

 // some normalization for the charts, so all values are in the same
 // ballpark
 dataset.addValue(e.currentBtcQuote / 100, "BTC", lTimeFormatted);
 dataset.addValue(e.quoteSha / 30, "SHA", lTimeFormatted);
 dataset.addValue(e.quoteCac40 / 5000, "Cac40", lTimeFormatted);
 dataset.addValue(e.quoteChf, "CHF", lTimeFormatted);
 dataset.addValue(e.quoteZar * 10, "ZAR", lTimeFormatted);
 dataset.addValue(e.quoteEur, "EUR", lTimeFormatted);
 dataset.addValue(e.quoteRub * 10, "RUB", lTimeFormatted);
 dataset.addValue(e.quoteGoldFuture / 20000, "Gold Fut", lTimeFormatted);
 dataset.addValue(e.quoteCny / 10, "CNY", lTimeFormatted);
 dataset.addValue(e.quoteDJI / 1000, "DJI", lTimeFormatted);

 if (dataset.getColumnCount() > 1000) {


 /**
  * Writes a set of quotes to a new line in the output file.
  */
 static final class OutputFileWriter {

 /** writer for collected data */
 private FileWriter w;

 /**
  * Constructor.
  * @param outputFileName not null
  */
 public OutputFileWriter(String outputFileName) {
 try {
 w = new FileWriter(outputFileName);
 + "\tquoteSha\tquoteCac40\tquoteChf\tquoteZar\tquoteEur\tquoteRub\tquoteGold"
 + "\tquoteCny\tquoteDji"
 + "\tbid\task\tmin24\tmax24\tvolume24\tvwap");
 } catch (Exception e) {

 public void recordQuoteChange(QuotesChangedEvent e) {
 try {

 + "\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f", e.currentBtcQuote,
 e.quoteSha, e.quoteCac40, e.quoteChf, e.quoteZar,
 e.quoteEur, e.quoteRub, e.quoteGoldFuture, e.quoteCny,
 e.quoteDJI, e.bidBtc, e.askBtc, e.min24Btc, e.max24Btc,
 e.volume24Btc, e.vwapBtc));
 } catch (IOException ioe) {
 log.warning("failed to write out data");

A few auxiliary methods:

/** check if the chart parameter has been set */
private static boolean checkChartArgument(String[] args) {
return args != null && args.length > 0
&& "chart".equalsIgnoreCase(args[0]);
}

/** read the output filename from the arguments */
private static String getOutputFileName(String[] args) {
return (args != null && args.length > 1) ? args[1] : "output.csv";
}

/** clean up when the VM is about to be terminated */
private static void addShutdownHook() {
final Thread mainThread = Thread.currentThread();
Runtime.getRuntime().addShutdownHook(new Thread() {
public void run() {
log.warning("VM is being shut down now.");
// stop schedulers
keepRunning = false;
try {
} catch (InterruptedException e) {
log.log(Level.SEVERE, "", e);

And for the sake of completeness: here are the statics:

 /** Data collector schedule: once per minute (cron string) */
 private static final String SCHEDULE_EVERY_MINUTE = "* * * * *";

 /** Logger */
 static Logger log = Logger.getLogger(DataCollector.class.getName());

 /** Data collection proceeds while the value is true. */
 static volatile boolean keepRunning = true;

To run the program from the console, type

java de.hsec.datascience.btctrader.DataCollector chart myCollectedData.csv

If you (like me) run the data collection on a remote machine without a GUI, you will want to use "nochart" instead of "chart" as the first parameter.


You have reached the first important milestone. There is still a lot of work ahead:

  • Building the Prediction Neural Network
  • Training the Neural Network
  • Testing the Neural Network
  • Creating a Bitstamp account and depositing funds
  • Building a trading bot that uses the predictions of your Neural Network for automated transactions on the Bitstamp exchange.

That is quite some ground to cover and it will take us a while to get there, but while you take the next steps, you can already collect the data that you will need later for training and testing.