Data Re-Coding 1

Previously on BigNotesOnPersonalDataScienceTheory: Leonardo, being the great experimentalist that he is, has been collecting Bitcoin price data for weeks, while Sheldono has figured out, why the price prediction with artificial neural networks should work in theory. Now they need Howardo: to cobble the theory and the data together into something, that is actually usable in the real world. Meanwhile it remains unknown, what Rajesho is up to.

Howardo’s job is difficult. Leonardo has provided him with endless time series data from the past. Sheldono gave him an extended, patronizing, dismissive and snotty lecture on how easy it is, to determine, whether or not some limited, rather static data matches some idiosyncratic pattern in the present. And as an output, everyone expects a clear confident prediction of the future. It is clearly impossible to get from the input of his friends to the desired output.


Poor Howardo! His slight despair turns into utter panic, when he learns, that — of all people in the world — you have been assigned the task of helping him to sort things out.

You recognize, that Howardo has two distinct problems to solve:

  1. Convert the time series data to a format, that can be used as an input for the neural network. Like the webcam picture from the previous post, it should have a fixed size and should come as a coherent block of data, and not as a continuous data stream.
  2. Turn the pattern matching result into a prediction of the future.

“Wow, I didn’t notice that, thank you (RollEye)”, says Howardo, “but how are you going to see the future based on the matching result — without a crystal ball?”

“Well,” you say, “that’s easy. I did this for my boss before.” (Howardo gasps of relief).

You add: “It was a disaster” (Howardo hyperventilates).

Turning the pattern matching result into a prediction of the future

Let’s reconsider: what went wrong with your bosses trend prediction? The basic idea does not seem to be wrong: You recognize a trend and base your action on the assumption, that the trend can be extrapolated.

This works great in daily life and is the reason, why we are able to walk without falling over, to recognize when it is a good day to take an umbrella to work, and to avoid snowballs that people throw at us. All without a crystal ball. All learned by the neural network between our ears.

Your bosses artificial neural network was really good in recognizing a straight line, but failed miserably in predicting the future because the prediction was based on the wrong assumption, that a straight line is a good predictor of future price increases. As it turns out, it is not.

Your boss has only considered a chart of a whole trading day, which means, that he probably bought and sold his shares just before the closing bell. What happened in the early trading hours is rather irrelevant at this point. The overall trend of the day (if there was any) is replaced by a new trend that is fed out of the anticipation of what happens overnight in other markets.

If your boss had thought it to the end, the filter would have probably looked not like this:


Instead it would be more like this:


Chances are, that with this filter as the intermediate layer weight matrix of the neural network, your boss would have earned some money. But the prediction performance still would be far from great. It would just be a tool to point out, what is already obvious. Instead of a bad predictor, we now have a mediocre predictor.

This is the point, where intelligent design reaches it’s limit. In order to make better predictions, the weights in the weight matrix must be learned from real data, and not set by you. You must get the neural network to adapt in response to success and failure, much like you learned to predict, if a snowballs trajectory ends in your face or not — from failure and success.

“Howardo”, you hear yourself saying, “we must feed the learning algorithm with the first half of the price’s trajectory. We let the neural network predict, if the trajectory leads to a pleasant impact spot or not. When it is wrong, we will punish it”.

Howardo comes back to life. “That sounds like fun” he says. You are not sure, what part of your proposal he refers to, but it’s probably the punishment part.

For reasons that will become apparent in a later post, you decide to call the partial price trajectory a “feature vector” and the information, if the resulting price is pleasant or not, “class label”.

With this insight, it becomes easy to define a data format, that is suitable for training the neural network:

  • The feature vector is the content of a sliding window that you pull through Leonardo’s historic Bitcoin price data. For example you could say, that for each point in time you read the previous 1000 price samples out of the time series into a feature vector.
  • For the class label you have to peek into the future. Thank goodness, time is relative and from your point in spacetime, the near future of all of Leonardo’s price samples is also in the past. So you decide that for each sample in the time series you compare the price with the price 10 samples later, which accords to “10 minutes later”. If the later price is higher, you choose the class label “pleasant”, otherwise the class label “unpleasant”.

When your neural network is finally fully trained with this data, it will still not be able to look into the future. But it will hopefully be able to classify an aggregation of 1000 consecutive price samples as member of a class of price trends, that — with a certain likelihood — will lead to a higher price 10 minutes later. And that’s all we can hope for.

This brings us to a little taxonomical oddity. In much of machine learning literature, “prediction” seems to be used as a synonym for “recognized class”. This leads to funny statements like “the classifier predicts that the picture shows a cat” — after the classifier has processed the picture of the cat. Better get used to it…

In the next post, we will examine the actual Java code for the data re-coding.