Still struggling with GDPR

The GDPR, and its German derivative BDSG-new, is in a sense like a prophecy from the oracle of Delphi. You hear the words, but no matter how hard you try, you cannot understand what they really mean until the course of history knocks you down. For German regulations, the role of the “course of history” is played by the courts, and until they clarify what exactly constitutes compliant behavior, I believe this blog has to take the words of the regulation literally, which leads to a number of restrictions and inconveniences in our communication. Scroll past the next two paragraphs for details.

To provide some context: the German implementation of the European General Data Protection Regulation is very generic in its description of the requirements, yet draconian in its measures. The fines for noncompliance are clearly designed to put you out of business for good.

Experience tells us that the most absurd possible interpretation of a regulation will prevail in the courts until, after decades of mindless harassment of all well-meaning parties involved, a high court cleans up the mess for good. In Germany this is almost always the case when the internet is involved.

As a consequence, there is currently only one way for me to comply, and I have no idea how anyone else gets around it: I refuse to process any personal information in matters regarding this blog. So, like most people, I have turned off the comment function. I also do not accept any direct electronic communication about this blog. If you are a resident of an EU country, please do not even try to send me emails; they will be deleted instantly. Instead, please post your questions, thoughts and comments on Facebook, Twitter, LinkedIn, Google+, etc.

If you need to send me a private message, please encrypt it using this key, and then likewise post it on Facebook, Google+, etc. using the hashtag #notesonpersonaldatascience. I will find it and answer through the same channel.

The point of this is that this mode of communication leaves none of your personal data on any computer, router, firewall, cache or backup disk under my control.

Of course, it will at the same time refine your profile at Facebook or Google. I am truly sorry for that, and I assume this is the opposite of what the lawmakers who created the GDPR intended.

If anyone comes up with a better solution, I will happily adopt it. Maybe this should be Watson’s next challenge! Meanwhile, things are what they are.

The good news at the end: users from the EU are no longer locked out of NotesOnPersonalDataScience.


Timeout for European readers

A few days from now, the site NotesOnPersonalDataScience.wordpress.com will no longer be available to users in the European Union. I will put it back into normal operation as soon as a few open questions regarding the GDPR have been sorted out by German courts. I am optimistic that this will not take too long. Sorry for the inconvenience.

The Force Awakens: AI and Modern Conflict — #MSC2018 warm up

The 54th Munich Security Conference had an unofficial pre-opening yesterday, with only a handful of the formal attendees and a public panel discussion about the upcoming role of AI in modern warfare. The panelists represented political and military entities and one NGO. This composition distinguished yesterday’s event from a technical conference in a way that was at the same time delightful and disturbing.

The most notable contributions came from the two and a half women on the stage. Kersti Kaljulaid, president of Estonia, offered some advice on how the executive branch might be able to contain the development of rogue AI. Her proposals spanned the whole spectrum from helpless actionism (monitoring energy use, apparently in the hope that the developers of bad AI don’t use cloud resources) to pragmatic and feasible, if generic, approaches (building a blockchain-based marketplace for whistleblowers, to generate leads to malicious operations from their own flaky members).

Mary Wareham of Human Rights Watch coordinates the “Campaign to Stop Killer Robots”. She used the discussion to draw attention to the question of what international agreements can do to prevent the development and use of fully autonomous lethal weapons in warfare. Given the scope of the conference (and the fact that many of the people involved in this discussion have to rely on second-hand information when it comes to technical capabilities), this seems to be the only question that really leads anywhere.


And then there was Sophia. She has never held public office or exerted much influence on international matters. But she is the first robotic citizen of Saudi Arabia, and she delivered the opening speech of the day. Without an active role in the panel, she spent the rest of the event at the speaker’s desk, and it was quite entertaining to watch her (probably unintentionally) shaking her head when certain topics came up.

The other panelists were Darryl A. Williams, Lieutenant-General and Commander of the Land Forces of NATO, and Anders Fogh Rasmussen, former NATO Secretary General. The moderator was NYT columnist David E. Sanger.

Applied AI with DeepLearning, IBM Watson IoT Data Science Certificate

I’ve just (literally minutes ago) completed “Applied AI with DeepLearning, IBM Watson IoT Data Science Certificate”. It is a very well-prepared course by IBM — mostly by the very nice people of the Munich Watson IoT Center 🙂 with some important portions by Skymind, the awesome creators of DL4J — delivered through Coursera.

The course covers a lot of ground in very little time. Details get lost at this speed, so if you are looking for a deep understanding of AI, you will be happier with some of the offerings from academia. But if you are looking for a refresher or an update on industry trends, this course is for you. Even more so if you are an industry practitioner with a software background who needs to come up to speed on AI.

Here is the link to the course. If you have more time and are looking for a solid foundation, I recommend Andrew Ng’s “Machine Learning”. Of course, there is nothing to stop you from taking both courses…


What’s next for Bitcoin?

We have seen plenty of drama in the Bitcoin arena lately. Technological questions aside, it is time to re-evaluate our involvement in a market with looming roller-coaster dynamics.

Originally, our motivation to favor cryptocurrency over other asset classes as a subject for automated trading came down to these four considerations:

  • Bitcoin exchanges like Bitstamp or Gemini are easily accessible from a software developer’s point of view. They have modern, well-maintained and well-documented APIs, and you don’t need to work for a financial services institution to get access.
  • The trading hours are 24×7, so the trading bot does not sit idle two thirds of the time, which would seem like a waste of resources.
  • Cryptocurrencies are cool. I don’t blame you if you beg to differ, with Bitcoin going more and more mainstream nowadays. But back in 2014 there was no dispute about it being the coolest thing since the UNIVAC 9000 series.
  • Little regulation. Don’t get me wrong: regulation is a very, very good thing. After the 2010 flash crash, regulators in all major markets started to look very closely at automated trading and put sensible restrictions in place. Since then we have seen a few more flash crashes, but none of them was nearly as severe as the 2010 incident, and that was certainly not due to more responsible behavior by the market participants. So regulation is a good thing. That said, if automated trading is what you want to do, it complicates your life. Under German law, Bitcoin is neither a currency nor a security, so it is mostly unregulated, which made our project a little easier.

The last point was originally an advantage, but it is turning into a hassle now, because as it stands today, the market seems to have gone crazy. This is a problem, because a crazy market is inherently unpredictable.


It might not be obvious, but even with a deep neural network, predictive performance at the end of the day still depends on the ability to find statistical relationships, however hidden and convoluted they may be. In a market with constantly changing influencing factors, these interrelations are hard to find even under normal conditions. Add craziness as another layer of complexity, and your neural network’s only output will be white noise. At least that is the case with mine.

So what can we do?

I hear many people talking about tulips lately. They refer, of course, to the Dutch tulip mania of the 1630s. They point to parallels between today’s Bitcoin exchange rate and historic tulip prices to argue that Bitcoin is an irrationally inflated bubble that is doomed to burst.

Indeed, there seems to be a good portion of irrationality. In the past, whenever we saw a hike in the Bitcoin price, it came with an obvious explanation: the expropriation of Russian bank customers in Cyprus; the Indian demonetization policy; gambling in China. The last event in this series was the cancelled hard fork in November. Although the problem the proposed fork addressed has not been solved by any means, the price has more than tripled since then. I don’t believe that many of the buyers have a firm grasp of blockchain hard forks. It just does not justify the latest price hike.

Use this finding as input for a little duck testing, and you will likely come to the conclusion that we are, in fact, dealing with a bad case of a speculative bubble.

So is this the time to abandon Bitcoin and blockchain technology and move on to the next cool thing? Or to walk back to something more conservative?

Let’s come back to the tulips: what happened after the bubble burst? Take a walk through almost any neighborhood in almost any western community, and you will see that, while the tulip bubble is gone, the tulips are still there. They can be found in most private and public gardens. They cover a significant share of the land surface of the Netherlands and represent a small but notable share of the Dutch economy.

To me, this looks like a blueprint for the road ahead for Bitcoin. The current craze has the beautiful side effect that, for the first time, people with no immediate need and no interest in the technology are creating Bitcoin wallets and acquiring cryptocurrency. Whether the price stabilizes at the current level or crashes and then stabilizes at a much lower level: the wallets will still be there, and people will still own Bitcoin and know a lot more about it than they did a few months ago.

No matter how this ends: when it’s over, Bitcoin will likely be ubiquitous in more and more areas, just as tulips are today. We might finally enter a phase in which Bitcoin is used the intended way: as a currency.

In conclusion: this is not the time to leave the field of Bitcoin. If anything, it is a good time to enter the area of cryptocurrencies and blockchain technology, because whether the current market is a bubble or just very healthy growth, it will contribute to a much broader use of the technology in the next few years.


Building the Reinforcement Learning Framework

To build our reinforcement learning framework, we are going to follow the basic recipe laid out in the February 2015 Nature article “Human-level control through deep reinforcement learning” (http://dx.doi.org/10.1038/nature14236).

Reinforcement learning has been shown to reach human and superhuman performance in board games and video games. Transferring the methods and experience from these domains to the use case of trading goods or securities seems promising, because trading has many similar characteristics:

  • interaction with an environment that represents certain aspects of the real world,
  • a limited set of actions to interact with this environment,
  • a well-defined success measure (called “reward”),
  • future rewards that are determined by past actions,
  • a finite, semi-structured definition of the state of the environment,
  • the infeasibility of directly determining the future outcome of an action, due to a prohibitively large decision tree, incomplete information, and missing knowledge about the interactions between the influencing factors.

Our inference engine is going to be Deeplearning4J (DL4J, see https://deeplearning4j.org/). The DL4J website contains a very brief and well-written introduction to reinforcement learning, which I highly recommend if you are not yet familiar with the concept: https://deeplearning4j.org/reinforcementlearning.
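To make this concrete, here is a minimal sketch of the kind of Q-network one could define in DL4J for this setup, assuming a recent DL4J version: the state vector goes in, and one Q-value per action comes out. The class name, layer sizes, updater and hyperparameters are illustrative placeholders, not the configuration of the actual system:

```java
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.learning.config.Adam;
import org.nd4j.linalg.lossfunctions.LossFunctions;

public class QNetworkFactory {

    /** Builds a small feed-forward Q-network: state vector in, one Q-value per action out.
     *  All hyperparameters are placeholder choices for illustration. */
    public static MultiLayerNetwork build(int stateSize, int numActions) {
        MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                .seed(42)
                .updater(new Adam(1e-3))
                .list()
                .layer(0, new DenseLayer.Builder()
                        .nIn(stateSize).nOut(128)
                        .activation(Activation.RELU).build())
                .layer(1, new DenseLayer.Builder()
                        .nIn(128).nOut(64)
                        .activation(Activation.RELU).build())
                // Q-learning regresses action values, so the output is linear with MSE loss.
                .layer(2, new OutputLayer.Builder(LossFunctions.LossFunction.MSE)
                        .nIn(64).nOut(numActions)
                        .activation(Activation.IDENTITY).build())
                .build();

        MultiLayerNetwork net = new MultiLayerNetwork(conf);
        net.init();
        return net;
    }
}
```

Note the linear output with MSE loss: unlike a classifier, a Q-network regresses action values, so there is no softmax at the end.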

The first step in implementing an RL framework for Bitcoin trading is to map the conceptual elements of the process to our use case:

  • Action
  • Actor / Agent
  • Environment
  • State
  • Reward

Action

An action is a distinct operation with a direct impact on the state of the actor and the environment. In game playing, placing a tile on a specific field of the board or moving a joystick in a certain direction are examples of actions. In Bitcoin trading, the obvious actions are placing and cancelling orders to buy or sell certain amounts of Bitcoin at a given cryptocurrency exchange.

A smaller set of actions improves the learning speed, so for now we will restrict our action set to only three possible actions:

  1. Cancel all open sell orders and place a buy order at the last market price using 10% of the available USD assets.
  2. Cancel all open buy orders and place a sell order at the last market price using 10% of the available Bitcoin assets.
  3. Hold (do nothing).

In a later version, we will likely extend this to:

  • treat cancelling and placing orders as distinct actions,
  • allow a larger variety of amounts (other than “10% of the available assets”) to use for buy and sell orders,
  • support different limits above and below the last market price.

But for gaining experience, and to go easy on our computational resources, we are going to keep it simple for now. A sketch of this minimal action set in code follows below.
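Here is the promised sketch. The ExchangeClient interface and all its method names are hypothetical stand-ins for our existing Java wrapper around the Bitstamp API, shown only to make the three actions concrete:

```java
/** Minimal view of the exchange wrapper; method names are illustrative placeholders,
 *  not the real Bitstamp API and not our production code. */
interface ExchangeClient {
    double lastPrice();
    double usdBalance();
    double btcBalance();
    void cancelOpenBuyOrders();
    void cancelOpenSellOrders();
    void placeBuyOrder(double price, double usdAmount);
    void placeSellOrder(double price, double btcAmount);
}

/** The three actions the agent can choose from; the ordinal doubles as the index
 *  of the corresponding output neuron of the Q-network. */
enum TradeAction {
    BUY, SELL, HOLD;

    void execute(ExchangeClient client) {
        switch (this) {
            case BUY:
                client.cancelOpenSellOrders();
                // Buy at the last market price using 10% of the available USD.
                client.placeBuyOrder(client.lastPrice(), client.usdBalance() * 0.10);
                break;
            case SELL:
                client.cancelOpenBuyOrders();
                // Sell 10% of the available Bitcoin at the last market price.
                client.placeSellOrder(client.lastPrice(), client.btcBalance() * 0.10);
                break;
            case HOLD:
                // Do nothing.
                break;
        }
    }
}
```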

Actor

The actor is our trading bot, which uses the Bitstamp API to place and cancel orders. We are going to reuse existing Java code from the old trading system for this.

Environment

Since we don’t want to reinvent data collection, and we have already collected several years’ worth of training data, the environment is given by the data sources we defined in the old trading system (https://notesonpersonaldatascience.wordpress.com/2016/03/06/wrapping-up-data-collection/).

State

The current state of the environment is the input for the inference machine. We can reuse the format we used for the old Bitcoin prediction system (https://notesonpersonaldatascience.wordpress.com/2016/03/21/data-re-coding-1/). It has some issues that we might address later, but we don’t want to reinvent the wheel, so we stick with it for now.

Reward

We have two possible ways to define the reward:

  • After each executed sell order: the difference between the sell price and the previous average buy price of the sold Bitcoins, minus transaction costs.
  • In each step: the difference between the current net value (USD + BTC) and the net value in the previous step.

The first option maps better onto the game analogy and also takes advantage of one of the key features of reinforcement learning (assessing the future outcomes of current actions), but the second option promises faster convergence. So, to begin, we choose the second option; a minimal sketch follows.
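Here is the promised sketch of the second reward option. Marking the Bitcoin holdings to market at the last trade price is an assumption for the sake of illustration; one could also use the current bid:

```java
/** Computes the per-step reward as the change in total account value (option 2 above). */
public class NetValueReward {

    private double previousNetValue = Double.NaN;

    /** Net value in USD: cash plus Bitcoin holdings marked at the last trade price. */
    static double netValue(double usdBalance, double btcBalance, double lastPrice) {
        return usdBalance + btcBalance * lastPrice;
    }

    /** Returns the reward for the current step; the first call yields 0 because
     *  there is no previous net value to compare against. */
    public double step(double usdBalance, double btcBalance, double lastPrice) {
        double current = netValue(usdBalance, btcBalance, lastPrice);
        double reward = Double.isNaN(previousNetValue) ? 0.0 : current - previousNetValue;
        previousNetValue = current;
        return reward;
    }
}
```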


Transition to Reinforcement Learning

Our current goal is to introduce Reinforcement Learning into the decision-making component of an existing Bitcoin trading system. To put this into a broader context: what we are about to do is — in the flowery terms of business speak — the upgrade from “Predictive Analytics” to “Prescriptive Analytics”. Google “Analytic Value Escalator” for the 30,000 ft view, and think for a minute about this: is the question “What will happen?” really harder to answer than “Why did it happen?” (In data science you have to take market research seriously, even if it hurts.)

Walking up a step on this escalator with the task at hand, we are interested in going from the question “What will happen?” to “How can we make it happen?”. Again, I am not completely convinced that the latter is more difficult to answer, but it does require a big architectural change in the trading system.

To understand this, let’s have a look at the training pipeline we have used so far.

Old Architecture: Separation of Concerns

[Diagram: the old Bitcoin prediction training pipeline; the blue numbered circles correspond to the steps below]

We had three servers involved:

  • A build server running Jenkins, which is used for multiple projects and is not of particular interest for us here and now.
  • A server running the trading software, called “Trade Hardware”, executing a stack of shell scripts and Java programs.
  • A powerful machine for computationally intense tasks implemented in Matlab and Java, called “Compute Hardware”.

Here is what happens at the blue numbered circles:

  1. Once a week, the build server triggered a training session. The reason for regular re-training is that we wanted current trends in market behavior to be reflected in our prediction model. Also, each week we had aggregated significantly more training data, and more training data promised better prediction results.
  2. Input variables are collected and normalized for neural network training.
  3. Target variables are calculated for training. We used a binary classifier with 19 output variables that predicted events like this: “The BTC/USD rate will go up by 2% in the next 20 minutes”.
  4. To reduce the size of the input, a PCA was performed and only the strongest factors were used as input variables. The PCA eigenvectors and the normalization factors from step 2 are stored for later, to transform raw input data in production into a format consistent with the training input (a sketch of this transform follows after this list).
  5. The previous neural network model is loaded to be used as initial model for training.
  6. The training is run in Matlab. We don’t need to dive deeper into this topic, because in the new architecture, we will use Deeplearning4J instead of Matlab for the training step.
  7. The new trained model is stored.
  8. The new model is tested in an extensive trade simulation.
  9. The trading software is restarted so it uses the updated model.
  10. Normal trading goes on for the rest of the week.
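To illustrate steps 2 and 4: in production, every raw input vector is transformed with the normalization factors and PCA eigenvectors stored during training, conceptually like this. The class and field names are hypothetical; the real pipeline did this in Matlab and Java:

```java
/** Applies the stored normalization and PCA projection to a raw input vector,
 *  so that production inputs match the format the network was trained on.
 *  The mean, std and eigenvector values are assumed to be loaded from the
 *  artifacts stored during the training run. */
public class InputTransform {

    private final double[] mean;            // per-feature mean from training
    private final double[] std;             // per-feature standard deviation from training
    private final double[][] eigenvectors;  // one row per retained principal component

    public InputTransform(double[] mean, double[] std, double[][] eigenvectors) {
        this.mean = mean;
        this.std = std;
        this.eigenvectors = eigenvectors;
    }

    public double[] apply(double[] raw) {
        // 1. Normalize with the factors stored during training (step 2).
        double[] z = new double[raw.length];
        for (int i = 0; i < raw.length; i++) {
            z[i] = (raw[i] - mean[i]) / std[i];
        }
        // 2. Project onto the strongest principal components (step 4).
        double[] reduced = new double[eigenvectors.length];
        for (int c = 0; c < eigenvectors.length; c++) {
            double dot = 0.0;
            for (int i = 0; i < z.length; i++) {
                dot += eigenvectors[c][i] * z[i];
            }
            reduced[c] = dot;
        }
        return reduced;
    }
}
```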

New architecture: Tight Integration

This pipeline was built around the concept of a strict separation between trading execution and prediction. The prediction algorithm was part of a decision-making module, which itself was just a plugin of the trading software and could be replaced by another implementation encoding another strategy. We actually used this to assess simulation results: to determine a baseline performance, the decision-making component in the simulation was replaced by one that follows a random trading strategy.

With the transition to reinforcement learning, this strict separation goes away. The learning agent learns from interaction with the environment, so it completely assumes the role of the decision-making component. From a system design perspective, this makes our life much easier, because many hard questions in the decision-making component are now covered by machine learning: How do we interpret the predictions? Where do we set thresholds for the output variables? What percentage of the available assets do we use in a trade? The reinforcement learning agent produces a finished decision that can be directly converted into a buy or sell order.

The agent also does not stop learning once it is in production. Learning is a permanent background process that takes place during trading. This means that after the initial training phase we can retire the “Compute Hardware”, because there is no longer any need for weekly retraining.

All this looks lean and efficient at first glance, but it will create problems down the road:

  • The tight integration between machine learning and business logic results in a monolithic architecture, which will be hard to maintain.
  • The interface between data science and software development has been greatly inflated. In the old architecture, the responsibility of data science ended with the prediction, and software development took over from there. Both groups worked strictly within the bounds of their traditional realms, each with their established tools and processes, which have very little in common other than the fact that they mostly run on computers. The new architecture leads to a huge overlap of responsibilities, which will require new tools, a common language, mutual understanding and a lot of patience with each other.

Conclusion

Even without looking at the specifics of Reinforcement Learning, we can already see that the system design will become much simpler. The machine learning subsystem assumes more responsibility, leaving fewer moving parts in the rest of the system.

Project management, on the other hand, might turn out to be challenging, because the very different worlds of data science and software development need to work much more closely together.


Deep Reinforcement Learning for Bitcoin trading

It has been more than a year since the last entry regarding automated Bitcoin trading was published here. The series was supposed to cover a project in which we used deep learning to predict Bitcoin exchange rates for fun and profit.

We developed the system in 2014 and operated it all through 2015. It performed very well during the first three quarters of 2015… and terribly during the last quarter. At the end of the year we stopped it. Despite the serious losses during the last three months, it can still be considered a solid overall success.

I never finished the series, but we recently deployed a new version, which includes some major changes that will hopefully turn out to be improvements:

  • We use Reinforcement Learning, following DeepMind’s basic recipe (Deep Q-learning with experience replay) from the iconic Atari article in Nature. This eliminates the separation of prediction and trading as distinct processes: the inference component directly produces a buy/sell decision instead of just a prediction. Furthermore, the new approach eliminates the separation of training and production (after an initial training phase). The neural network is trained continuously on the trading machine. No more downtime is needed for weekly re-training, and no separate compute hardware lies idle with nothing to do for the other six days of the week. (A sketch of the experience replay mechanism follows after this list.)
  • We use Deeplearning4J (DL4J) instead of Matlab code for training the neural network. DL4J is a Java framework for defining, training and executing machine learning models. It integrates nicely with the trading code, which is written in Java.
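To give an idea of the experience replay part of the recipe, here is a minimal, self-contained sketch of a replay buffer in Java. It illustrates the concept; the capacity, field types and class names are illustrative choices, not our production implementation:

```java
import java.util.Random;

/** Fixed-size ring buffer for experience replay: stores (state, action, reward, nextState)
 *  transitions and samples minibatches uniformly at random, as in the DeepMind recipe. */
public class ReplayBuffer {

    /** One observed transition of the agent. */
    public static class Transition {
        final double[] state;
        final int action;
        final double reward;
        final double[] nextState;

        Transition(double[] state, int action, double reward, double[] nextState) {
            this.state = state;
            this.action = action;
            this.reward = reward;
            this.nextState = nextState;
        }
    }

    private final Transition[] buffer;
    private final Random random = new Random();
    private int size = 0;
    private int next = 0;

    public ReplayBuffer(int capacity) {
        this.buffer = new Transition[capacity];
    }

    /** Adds a transition, overwriting the oldest entry once the buffer is full. */
    public void add(double[] state, int action, double reward, double[] nextState) {
        buffer[next] = new Transition(state, action, reward, nextState);
        next = (next + 1) % buffer.length;
        size = Math.min(size + 1, buffer.length);
    }

    /** Samples a random minibatch; sampling at random breaks up the temporal
     *  correlations in the trading stream, which is the whole point of replay. */
    public Transition[] sample(int batchSize) {
        Transition[] batch = new Transition[batchSize];
        for (int i = 0; i < batchSize; i++) {
            batch[i] = buffer[random.nextInt(size)];
        }
        return batch;
    }
}
```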

This will change the course of this blog. Instead of finishing the report on what we did in 2014, I now plan to write about the new system. It turns out that most of the code we have looked at so far is also in the new system, so we can just continue where we left off a year ago.

Machine Learning in the Gartner Hype Cycle

Dear Gartner Inc.,

while there is still some time until you publish the 2017 Hype Cycle for Emerging Technologies, I would like to ask you to correct a terrible mistake you made in the 2016 edition. When you look closely at the summit of the “peak of inflated expectations”, you will find “Machine Learning” (ML) there. You see, I agree with the general progression of the curve, but what you describe here is the world of 1968!

In the late 1960s, the (western) world expected way too much from what was then called “Artificial Intelligence”. It was the Cold War, and the US government couldn’t wait to get a machine that understood the Russian language. In the area of cognition, inventions like the perceptron and backpropagation were promising, but too computationally intensive for the hardware of the time.

The following phases (“trough of disillusionment” and “slope of enlightenment” in your terms) are what we like to call the first and second “AI winter” (1970s to 1990s). This dark time caused the people who still worked in the field to rename it from “AI” to “Machine Learning”, which probably caused your confusion.

In 2017, Machine Learning is a commodity that you can buy for pennies in arbitrary amounts at Amazon. Amazon Web Services, that is. We are now at a point that is way off your chart.

As I see it, we are in a situation similar to the early 1980s with PCs: suddenly, John and Jane Doe can access technology so powerful that nations would have waged wars for it just a few years before. In 2017, organizations either hide their Machine Learning capabilities, or they manage expectations very carefully.

Needless to say, there will of course be some disappointment and disenchantment in some applications of ML, but the field in general will certainly remain as dynamic as it is today for at least a few decades. And there will be no intermediate down-phase on the way to people (and companies) taking advantage of this technology to improve products, processes and many other aspects of work and life.

Yours sincerely,

Helmut Hauschild


Data Re-Coding 1

Previously on BigNotesOnPersonalDataScienceTheory: Leonardo, being the great experimentalist that he is, has been collecting Bitcoin price data for weeks, while Sheldono has figured out why price prediction with artificial neural networks should work in theory. Now they need Howardo: to cobble the theory and the data together into something that is actually usable in the real world. Meanwhile, it remains unknown what Rajesho is up to.

Howardo’s job is difficult. Leonardo has provided him with endless time series data from the past. Sheldono gave him an extended, patronizing, dismissive and snotty lecture on how easy it is to determine whether or not some limited, rather static data matches some idiosyncratic pattern in the present. And as an output, everyone expects a clear, confident prediction of the future. It is clearly impossible to get from his friends’ input to the desired output.


Poor Howardo! His slight despair turns into utter panic when he learns that — of all people in the world — you have been assigned the task of helping him sort things out.

You recognize that Howardo has two distinct problems to solve:

  1. Convert the time series data into a format that can be used as input for the neural network. Like the webcam picture from the previous post, it should have a fixed size and should come as a coherent block of data, not as a continuous data stream.
  2. Turn the pattern matching result into a prediction of the future.

“Wow, I didn’t notice that, thank you (RollEye),” says Howardo, “but how are you going to see the future based on the matching result — without a crystal ball?”

“Well,” you say, “that’s easy. I did this for my boss before.” (Howardo gasps with relief.)

You add: “It was a disaster.” (Howardo hyperventilates.)

Turning the pattern matching result into a prediction of the future

Let’s reconsider: what went wrong with your boss’s trend prediction? The basic idea does not seem to be wrong: you recognize a trend and base your action on the assumption that the trend can be extrapolated.

This works great in daily life, and it is the reason why we are able to walk without falling over, to recognize when it is a good day to take an umbrella to work, and to dodge snowballs that people throw at us. All without a crystal ball. All learned by the neural network between our ears.

Your boss’s artificial neural network was really good at recognizing a straight line, but failed miserably at predicting the future, because the prediction was based on the wrong assumption that a straight line is a good predictor of future price increases. As it turns out, it is not.

Your boss only considered the chart of a whole trading day, which means that he probably bought and sold his shares just before the closing bell. What happened in the early trading hours is rather irrelevant at that point. The overall trend of the day (if there was any) is replaced by a new trend fed by the anticipation of what will happen overnight in other markets.

If your boss had thought it through to the end, the filter would probably not have looked like this:

[Figure: 8×8 filter heatmap for a uniform uptrend]

Instead it would be more like this:

[Figure: 8×8 filter heatmap for an uptrend, with the later trading hours weighted more heavily]

Chances are that, with this filter as the intermediate-layer weight matrix of the neural network, your boss would have earned some money. But the prediction performance would still be far from great. It would just be a tool for pointing out what is already obvious. Instead of a bad predictor, we now have a mediocre predictor.

This is the point where intelligent design reaches its limit. In order to make better predictions, the weights in the weight matrix must be learned from real data, not set by you. You must get the neural network to adapt in response to success and failure, much like you learned to predict whether a snowball’s trajectory ends in your face or not — from failure and success.

“Howardo,” you hear yourself saying, “we must feed the learning algorithm with the first half of the price’s trajectory. We let the neural network predict whether the trajectory leads to a pleasant impact spot or not. When it is wrong, we will punish it.”

Howardo comes back to life. “That sounds like fun,” he says. You are not sure which part of your proposal he is referring to, but it is probably the punishment part.

For reasons that will become apparent in a later post, you decide to call the partial price trajectory a “feature vector” and the information whether the resulting price is pleasant or not a “class label”.

With this insight, it becomes easy to define a data format that is suitable for training the neural network:

  • The feature vector is the content of a sliding window that you pull through Leonardo’s historic Bitcoin price data. For example, you could decide that for each point in time you read the previous 1000 price samples out of the time series into a feature vector.
  • For the class label, you have to peek into the future. Thank goodness time is relative, and from your point in spacetime, the near future of all of Leonardo’s price samples is also in the past. So you decide that for each sample in the time series you compare the price with the price 10 samples later, which corresponds to “10 minutes later”. If the later price is higher, you choose the class label “pleasant”, otherwise “unpleasant”. (A code sketch of this scheme follows below.)
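Here is the promised sketch of this windowing scheme in Java. The window length and look-ahead mirror the numbers from the text; everything else (the class names, the plain double[] price series) is an illustrative assumption:

```java
import java.util.ArrayList;
import java.util.List;

/** Slides a window over the historic price series and emits (feature vector, class label)
 *  pairs: 1000 past samples as features, and "pleasant" if the price is higher
 *  10 samples later, "unpleasant" otherwise. */
public class TrainingSetBuilder {

    static final int WINDOW = 1000;  // price samples per feature vector
    static final int HORIZON = 10;   // look-ahead in samples ("10 minutes later")

    /** One training example: the window content plus the peeked-ahead label. */
    public static class Example {
        final double[] features;
        final boolean pleasant;  // true: the price was higher HORIZON samples later

        Example(double[] features, boolean pleasant) {
            this.features = features;
            this.pleasant = pleasant;
        }
    }

    public static List<Example> build(double[] prices) {
        List<Example> examples = new ArrayList<>();
        // Stop early enough that HORIZON samples remain for the label look-ahead.
        for (int end = WINDOW; end + HORIZON <= prices.length; end++) {
            double[] features = new double[WINDOW];
            // The feature vector is the window of the previous WINDOW samples.
            System.arraycopy(prices, end - WINDOW, features, 0, WINDOW);
            // Peek into the "future" relative to the end of the window.
            boolean pleasant = prices[end + HORIZON - 1] > prices[end - 1];
            examples.add(new Example(features, pleasant));
        }
        return examples;
    }
}
```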

When your neural network is finally fully trained with this data, it will still not be able to look into the future. But it will hopefully be able to classify an aggregation of 1000 consecutive price samples as a member of a class of price trends that — with a certain likelihood — leads to a higher price 10 minutes later. And that is all we can hope for.

This brings us to a little taxonomic oddity. In much of the machine learning literature, “prediction” seems to be used as a synonym for “recognized class”. This leads to funny statements like “the classifier predicts that the picture shows a cat” — after the classifier has processed the picture of the cat. Better get used to it…

In the next post, we will examine the actual Java code for the data re-coding.