Getting started

Unfortunately, every data science project starts with the somewhat tedious task of data acquisition and organization.

To accomplish anything at all, the first thing you’ll need is training data. So before anything else, you want to start collecting a lot of it.

Now, in a professional setting, you would first take some time to think about the volume and structure of your input and output data. So I don’t recommend doing what we are about to do next at work.

We postpone the careful thinking for now, because we want to get things moving, and we are pretty sure that whatever the result of our (later) deep thinking might be, it will include Bitcoin exchange rate ticker quotes. Deliberating on this a little further, we convince ourselves that other ticker quotes (currency exchange rates, stock prices, economic indicators) might also help predict the Bitcoin price, and that some data points related to the quotes (such as bid and ask prices or trading volume) may give us a small statistical advantage, too.

While musing about that, it occurs to us that for testing purposes we might also need a mechanism to easily generate random quotes when no data source is available. And to assess the quality of the training results, we might later need a mechanism to replay historical quotes, so that we can run old data against an updated neural network and find out how well it would have performed during a certain time interval.
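A random quote source could look roughly like the following minimal sketch. It implements a local copy of the QuoteReader interface presented below (without the package declaration) so that it compiles on its own. RandomQuoteReader, its constructor parameters and its next() method are hypothetical names, and the price model (a geometric random walk with a fixed relative spread) is an assumption chosen for illustration, not a model of any real exchange.

```java
import java.util.Random;

// Local copy of the QuoteReader interface so that this sketch compiles on its own.
interface QuoteReader {
    double getCurrentQuote();
    double getBid();
    double getAsk();
    double getMin24();
    double getMax24();
    double getVolume24();
    double getVwap();
}

/**
 * Hypothetical test double that generates quotes via a geometric random walk.
 * For simplicity, min, max and volume accumulate since construction instead of
 * over a sliding 24-hour window. The getters only read primitive fields.
 */
class RandomQuoteReader implements QuoteReader {
    private final Random random;
    private final double volatility;   // std. deviation of the per-step log return
    private final double spread;       // relative bid/ask spread, e.g. 0.002 = 0.2 %
    private double price;
    private double min24;
    private double max24;
    private double volume24;
    private double turnover24;         // sum of price * volume, needed for the VWAP

    RandomQuoteReader(long seed, double startPrice, double volatility, double spread) {
        this.random = new Random(seed);
        this.volatility = volatility;
        this.spread = spread;
        this.price = startPrice;
        this.min24 = startPrice;
        this.max24 = startPrice;
    }

    /** Advances the simulated market by one step. */
    void next() {
        price *= Math.exp(volatility * random.nextGaussian());
        double volume = random.nextDouble();   // arbitrary trade size in [0, 1)
        min24 = Math.min(min24, price);
        max24 = Math.max(max24, price);
        volume24 += volume;
        turnover24 += price * volume;
    }

    @Override public double getCurrentQuote() { return price; }
    @Override public double getBid() { return price * (1.0 - spread / 2); }
    @Override public double getAsk() { return price * (1.0 + spread / 2); }
    @Override public double getMin24() { return min24; }
    @Override public double getMax24() { return max24; }
    @Override public double getVolume24() { return volume24; }
    @Override public double getVwap() { return volume24 > 0 ? turnover24 / volume24 : price; }
}
```

Passing a fixed seed makes test runs reproducible, which is handy when comparing two versions of the same training code against identical input.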

All these considerations lead us to our first little class diagram:


We see an interface for some sort of adapter (QuoteReader), with none of our favorite design patterns incorporated and not even a data representation class in sight. I realize that this is scary. Get used to it, because it is not an accident. We will do a lot of number crunching and, believe it or not, in this context it is GOOD practice to sacrifice beauty and object orientation on the altar of performance. The guiding principle of our architecture will be to avoid object creation in critical code paths whenever possible. We will use arrays instead of collections unless external libraries require collections, and we will use primitive data types wherever we can. It will feel a lot like 1985, with one positive side effect: we will be quite happy about these decisions when we start communicating with the GPU later.
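To illustrate the "arrays and primitives" rule, here is a minimal sketch of a fixed-capacity ring buffer for quotes. The class QuoteBuffer and its methods are hypothetical; the point is only that all storage is allocated once, up front, as long[] and double[] arrays, so recording a quote never creates an object (no boxed Double, no per-quote data class).

```java
/**
 * Hypothetical fixed-capacity ring buffer for ticker quotes.
 * Storage is allocated once in the constructor; add() allocates nothing.
 */
class QuoteBuffer {
    private final long[] timestamps;   // epoch milliseconds
    private final double[] quotes;
    private int start;                 // index of the oldest element
    private int size;

    QuoteBuffer(int capacity) {
        timestamps = new long[capacity];
        quotes = new double[capacity];
    }

    /** Appends a quote, overwriting the oldest one when the buffer is full. */
    void add(long timestamp, double quote) {
        int capacity = quotes.length;
        int index = (start + size) % capacity;
        timestamps[index] = timestamp;
        quotes[index] = quote;
        if (size < capacity) {
            size++;
        } else {
            start = (start + 1) % capacity;   // the oldest element was overwritten
        }
    }

    int size() { return size; }

    /** Returns the i-th oldest quote (0 = oldest). */
    double quote(int i) {
        if (i < 0 || i >= size) throw new IndexOutOfBoundsException("index " + i);
        return quotes[(start + i) % quotes.length];
    }

    /** Returns the timestamp of the i-th oldest quote (0 = oldest). */
    long timestamp(int i) {
        if (i < 0 || i >= size) throw new IndexOutOfBoundsException("index " + i);
        return timestamps[(start + i) % timestamps.length];
    }

    /** Copies the buffered quotes, oldest first, into a caller-supplied array. */
    void copyQuotes(double[] target) {
        for (int i = 0; i < size; i++) {
            target[i] = quotes[(start + i) % quotes.length];
        }
    }
}
```

Compared to a List<Double> or a list of quote objects, this avoids per-element allocation and keeps the values contiguous in memory, which is also the layout that is easiest to hand over to GPU code later.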

With this said, and the aesthetically minded among you properly scared, let’s take a closer look at the interface:

```java
package de.hsec.datascience.btctrader;

/**
 * Interface for adapter classes to ticker information sources.
 *
 * @author helmut
 */
public interface QuoteReader {

    /**
     * Returns the current ticker value: either a price fixed
     * by a market maker or the last trading price.
     */
    double getCurrentQuote();

    /** Returns the highest bid price in the order book. */
    double getBid();

    /** Returns the lowest ask price in the order book. */
    double getAsk();

    /** Returns the lowest price of the last 24 hours. */
    double getMin24();

    /** Returns the highest price of the last 24 hours. */
    double getMax24();

    /** Returns the trade volume of the observed exchange over the last 24 hours. */
    double getVolume24();

    /** Returns the volume-weighted average price. */
    double getVwap();
}
```


OK, so we can use an instance of such a QuoteReader to access what appears to be market data from some exchange. The accessor methods take no timestamp or index parameter, so we (correctly) assume that they return current data. We’ll have to discuss our working definition of the word “current” later.
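Since the accessors take no timestamp, each implementation decides for itself what "current" means. As an illustration, the replay mechanism mentioned earlier could be sketched as a hypothetical ReplayQuoteReader for which "current" simply means the recorded row a cursor points at, and time only moves forward when the caller invokes advance(). The flat double[] layout with seven values per row is an assumption made for this sketch, and the interface is again a local copy so that the block compiles on its own.

```java
// Local copy of the QuoteReader interface so that this sketch compiles on its own.
interface QuoteReader {
    double getCurrentQuote();
    double getBid();
    double getAsk();
    double getMin24();
    double getMax24();
    double getVolume24();
    double getVwap();
}

/**
 * Hypothetical replay adapter: serves previously recorded quotes one row at a time.
 * The history is one flat double[] with COLUMNS values per row (no per-row objects).
 */
class ReplayQuoteReader implements QuoteReader {
    // Assumed column order within each recorded row.
    static final int QUOTE = 0, BID = 1, ASK = 2, MIN24 = 3, MAX24 = 4, VOLUME24 = 5, VWAP = 6;
    static final int COLUMNS = 7;

    private final double[] history;
    private final int rowCount;
    private int offset;   // start index of the "current" row in history

    ReplayQuoteReader(double[] history) {
        if (history.length == 0 || history.length % COLUMNS != 0) {
            throw new IllegalArgumentException("history must contain whole rows of " + COLUMNS + " values");
        }
        this.history = history;
        this.rowCount = history.length / COLUMNS;
    }

    /** Moves "now" to the next recorded row; returns false once the history is exhausted. */
    boolean advance() {
        if (offset / COLUMNS + 1 >= rowCount) {
            return false;
        }
        offset += COLUMNS;
        return true;
    }

    @Override public double getCurrentQuote() { return history[offset + QUOTE]; }
    @Override public double getBid() { return history[offset + BID]; }
    @Override public double getAsk() { return history[offset + ASK]; }
    @Override public double getMin24() { return history[offset + MIN24]; }
    @Override public double getMax24() { return history[offset + MAX24]; }
    @Override public double getVolume24() { return history[offset + VOLUME24]; }
    @Override public double getVwap() { return history[offset + VWAP]; }
}
```

A backtest could call advance() once per recorded time step and feed the values to the network through the same QuoteReader methods it would use with a live source.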

In the next post, we’ll take a closer look at the implementations, especially the BitstampQuoteReader.
