Collecting more raw data with the Yahoo Quote Reader

In the previous post we built a quote reader for data from Bitstamp. It provides useful market information about Bitcoin trading on the Bitstamp platform. That is a good start, but certainly not enough for prediction purposes. How shall we proceed?

The Bitcoin market is heavily affected by world events and economic factors. The steep November price hike was arguably induced by Chinese gambling trends that emerged during the decline of the domestic stock and housing markets. The all-time high of the Bitcoin price was a reaction to the imminent expropriation of Cypriot bank customers.

We conclude that knowledge about the world beyond the Bitcoin markets could be useful for a price prediction. There is one particular class of information that is easy to acquire and at the same time reflects relevant political shocks and the macroeconomic situation very well: stock quotes.

YahooFinanceQuoteReader

You have a great range of options for feeding stock ticker data to your software. One of the most prominent is Yahoo Finance. The following Java class reads quotes for arbitrary ticker symbols from a Yahoo service that returns them as comma-separated values (CSV).

 


package de.hsec.datascience.btctrader;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.logging.Logger;

/**
 * Reads quotes from Yahoo Finance.
 *
 * @author helmut hauschild
 */
public class YahooFinanceQuoteReader implements QuoteReader {

  /**
   * Yahoo Finance API URL for the ticker symbol
   */
  private URL yahooFinanceApiUrl;

  /**
   * The quote last read.
   */
  private double currentQuote;

  static Logger log = Logger.getLogger(YahooFinanceQuoteReader.class
      .getName());

  /**
   * Constructor
   *
   * @param pTickerSymbol
   *          ticker symbol to read
   */
  public YahooFinanceQuoteReader(String pTickerSymbol) {
    super();
    String lUrl = String.format(
        "http://finance.yahoo.com/d/quotes.csv?s=%s&f=nl1px",
        pTickerSymbol);
    try {
      yahooFinanceApiUrl = new URL(lUrl);
    } catch (MalformedURLException e) {
      throw new RuntimeException(
          "Cannot initialize YahooQuoteReader class.", e);
    }
  }

  /**
   * Read and return a new ticker quote from Yahoo Finance.
   */
  public double getCurrentQuote() {
    readNextAndUpdate();
    return currentQuote;
  }

  // other getters

  public double getBid() {
    // We don't have this information. Return 0.
    return 0;
  }

  // ...

  /**
   * Read current data from Yahoo Finance and update fields.
   */
  private void readNextAndUpdate() {
    BufferedReader in = null;
    StringBuilder sb = new StringBuilder();
    try {
      in = new BufferedReader(new InputStreamReader(
          yahooFinanceApiUrl.openStream()));

      String inputLine;
      while ((inputLine = in.readLine()) != null) {
        sb.append(inputLine);
      }
      // The second CSV field is the quote we are interested in.
      currentQuote = Double.parseDouble(sb.toString().split(",")[1]);
    } catch (Exception e) {
      // Catch-all, because we don't have a surrounding framework (other
      // than the JRE) to handle unexpected exceptions.
      log.severe(String.format("Error reading data from yahoo api: %s",
          e.getLocalizedMessage()));
      return;
    } finally {
      try {
        if (in != null) {
          in.close();
        }
      } catch (IOException e) {
        log.severe("Unable to close input stream from yahoo api");
      }
    }
  }

  /**
   * main method for simple tests
   */
  public final static void main(String[] args) {
    YahooFinanceQuoteReader rqr = new YahooFinanceQuoteReader("A1YKTG.F");
    log.info("Next Quote: " + rqr.getCurrentQuote());
  }
}

A few remarks on this:

  • The BitstampQuoteReader does not need to be initialized with a ticker symbol, because there is only one price to read. The YahooFinanceQuoteReader is a bit different in this respect. For each ticker symbol you are interested in, you create a new dedicated instance.
  • The CSV service returns only the current value and the opening price. This does not match the QuoteReader interface, but you want to implement it anyway, because you want to handle the input data in a unified way as much as possible. As a consequence, you have to provide empty implementations for methods like getBid, getAsk, etc.

The main method is added as an easy way to test the implementation. If you prefer a unit test, it should be fairly easy to move the code there.
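One caveat about the `split(",")[1]` call in the listing: it only works as long as the first CSV field contains no comma. If the service returns a quoted name such as "Example Corp, Inc.", a plain split picks the wrong field. The following is a minimal sketch of a quote-aware splitter. The class `CsvQuoteLine` and its methods are hypothetical helpers, not part of the original project, and the sample line in the usage note is made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of the original project. */
final class CsvQuoteLine {

    /**
     * Splits one CSV line into fields. Commas inside double quotes are kept
     * as text. Quote characters themselves are dropped; escaped quotes ("")
     * inside a field are not supported by this sketch.
     */
    static List<String> splitFields(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    /** Returns the second field (assumed to be the quote) as a double. */
    static double parseQuote(String line) {
        return Double.parseDouble(splitFields(line).get(1).trim());
    }
}
```

With this helper, `CsvQuoteLine.parseQuote("\"Example Corp, Inc.\",123.45,120.00,\"FRA\"")` picks the field after the quoted name, where a plain `split(",")` would hand the text ` Inc."` to `Double.parseDouble` and fail.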

GoogleFinanceQuoteReader

The implementation for the GoogleFinanceQuoteReader is very similar to the Yahoo version. The URL can be created like this:


new URL(String.format("http://www.google.com/finance/info?q=%s",tickerSymbol));

The response comes as JSON. I will skip the rest of the implementation here to avoid redundancy. Also, it is not clear how much longer the service will be available: the Google Finance API has been deprecated since 2011.

 

 

Start collecting data: the BitstampQuoteReader

The QuoteReader implementations we need for a start are luckily quite simple. They really do just one thing: load information from a URL and return it on request.

The most important data source for our project will be the Bitstamp API, so we will start with this.

It contains public functions that can be used without authentication and without an API key. They have a throughput limitation in place, though. If you send more than 600 requests in 10 minutes, your IP address will be banned.
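A limit of 600 requests in 10 minutes averages out to one request per second, so a client that enforces a minimum pause comfortably above one second between calls stays below it. Here is a minimal sketch of such a guard. The class `RequestThrottle` is a hypothetical helper, not part of the original project, and the 2000 ms interval in the usage note is an arbitrary safety margin.

```java
/** Hypothetical helper, not part of the original project. */
final class RequestThrottle {
    private final long minIntervalMillis;
    private long lastRequestMillis;
    private boolean hasRequested = false;

    RequestThrottle(long minIntervalMillis) {
        this.minIntervalMillis = minIntervalMillis;
    }

    /** Milliseconds that still have to pass at time nowMillis before the next request. */
    long millisToWait(long nowMillis) {
        if (!hasRequested) {
            return 0;
        }
        long elapsed = nowMillis - lastRequestMillis;
        return Math.max(0, minIntervalMillis - elapsed);
    }

    /** Records that a request was sent at time nowMillis. */
    void markRequest(long nowMillis) {
        lastRequestMillis = nowMillis;
        hasRequested = true;
    }

    /** Sleeps until the next request is allowed, then records it. */
    void acquire() throws InterruptedException {
        long wait = millisToWait(System.currentTimeMillis());
        if (wait > 0) {
            Thread.sleep(wait);
        }
        markRequest(System.currentTimeMillis());
    }
}
```

A reader would create one shared `new RequestThrottle(2000)` and call `acquire()` right before opening the URL. Separating `millisToWait` from `acquire` keeps the timing logic testable without actually sleeping.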

For now, we just need one API function: ticker. The URL is

https://www.bitstamp.net/api/ticker/

When you open it in a web browser, you see that the default response format is JSON. I have used org.json-20120521.jar for JSON parsing. Since the format of the ticker is fairly simple, any Java JSON library will probably do the job.

We end up with a function that looks somewhat like this:

 

/**
 * Read current data from the Bitstamp API and update fields.
 */
private void readNextAndUpdate() {
  BufferedReader in = null;
  StringBuilder sb = new StringBuilder();
  try {
    in = new BufferedReader(new InputStreamReader(
        BITSTAMP_API_URL.openStream()));

    String inputLine;
    while ((inputLine = in.readLine()) != null) {
      sb.append(inputLine);
    }
  } catch (IOException e) {
    log.severe(String.format(
        "Error reading data from bitstamp api: %s",
        e.getLocalizedMessage()));
    // IO exceptions will happen from time to time. If there is no systematic
    // problem, the best way for us to deal with them is to log and ignore
    // them. As a consequence, the next time interval will be executed with
    // outdated data. But this data is only a minute old, so it will not result
    // in completely insane predictions.
    return;
  } finally {
    try {
      // in is still null if openStream() failed, so check before closing.
      if (in != null) {
        in.close();
      }
    } catch (IOException e) {
      log.severe("Unable to close input stream from bitstamp api");
    }
  }
  try {
    JSONObject jo = new JSONObject(sb.toString());
    currentQuote = jo.getDouble("last");
    high24 = jo.getDouble("high");
    low24 = jo.getDouble("low");
    volume = jo.getDouble("volume");
    bid = jo.getDouble("bid");
    ask = jo.getDouble("ask");
    vwap = jo.getDouble("vwap");
    log.info("quote from remote service: " + sb + "\nquote-Time: "
        + new Date(1000 * jo.getLong("timestamp")));
  } catch (Exception e) {
    // Catch-all, because we don't have a surrounding framework (other
    // than the JRE) to handle unexpected exceptions.
    log.severe(e.getLocalizedMessage());
  }
}

Note: the last trade price is not necessarily a good representation of the current value of a Bitcoin, because it can easily be manipulated during low-volume times. For the prediction this is OK, because we have a neural network, which will either downgrade the importance of this value if it turns out not to contribute to future prices, or even extract additional predictive power from recognizing the manipulation. Either way is fine for us.

It might cause a problem during trading, though. When you want to sell a Bitcoin at the exchange and the last price is lower than the fair price would be, you might be tempted to offer your coin at a lower price than necessary.

By the time I noticed this, my system had been running stably for quite a while, so I refrained from changing it. But if you start from scratch, you might want to keep this in mind.
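One possible way to act on this remark, which the original system did not do, is to use the midpoint between the best bid and the best ask as the reference price for trading decisions, and to fall back to the last trade price when no order book data is available. The sketch below assumes that the midpoint is an acceptable stand-in for the "fair price". The class `ReferencePrice` and its method are hypothetical.

```java
/** Hypothetical helper, not part of the original project. */
final class ReferencePrice {

    /**
     * Returns the midpoint of bid and ask. Falls back to the last trade
     * price when bid or ask is missing (reported as 0 or less) or when the
     * two values are inconsistent (ask below bid).
     */
    static double midOrLast(double bid, double ask, double last) {
        if (bid <= 0 || ask <= 0 || ask < bid) {
            return last;
        }
        return (bid + ask) / 2.0;
    }
}
```

The fallback matters for readers such as the YahooFinanceQuoteReader, whose getBid method returns 0 because the source has no order book information.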

For the sake of completeness, here is the rest of the class:

package de.hsec.datascience.btctrader;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Date;
import java.util.logging.Logger;

import org.json.JSONObject;

/**
 * Reads a BTC quote from the Bitstamp REST api.
 * 
 * @author helmut hauschild
 *
 */
public class BitstampQuoteReader implements QuoteReader {
  /**
   * Constant API URL
   */
  private static final URL BITSTAMP_API_URL;

  // static initializer for API URL because we must check for
  // MalformedURLException
  static {
    try {
      BITSTAMP_API_URL = new URL("https://www.bitstamp.net/api/ticker/");
    } catch (MalformedURLException e) {
      throw new RuntimeException(
          "Cannot initialize BitstampQuoteReader class.", e);
    }
  }

  /**
   * Logger
   */
  static Logger log = Logger.getLogger(BitstampQuoteReader.class.getName());

  // fields
  private double currentQuote;
  private double high24;
  private double low24;
  private double volume;
  private double bid;
  private double ask;
  private double vwap;

  /**
   * Accessor for current quote. Updates all fields, so it should be called
   * before reading all other fields. The textbook way to do this would be to
   * create a data object with all relevant fields and return that object. We
   * don't do that because we want to prevent object creation for performance
   * reasons.
   */
  public double getCurrentQuote() {
    readNextAndUpdate();
    return currentQuote;
  }

  //Other Accessors
  public double getBid() {
    return bid;
  }
  ...

  /**
   * Read current data from the Bitstamp API and update fields.
   */
  private void readNextAndUpdate() {
  ...
  }

  /**
   * Main method for a simple test run.
   */
  public final static void main(String[] args) {
    BitstampQuoteReader rqr = new BitstampQuoteReader();
    log.info("Next Quote: " + rqr.getCurrentQuote());
  }
}

Getting started

Unfortunately, every data science project starts with the somewhat tedious task of data acquisition and organization.

To accomplish anything at all, the first thing you’ll need is training data. So before anything else, you want to start collecting a lot of it.

Now, in a professional setting, you want to take some time and think about the volume and structure of your input and output data. Therefore I don’t recommend doing at work what we are about to do next.

We postpone the careful thinking for now, because we want to get things moving, and we are pretty sure that, whatever the result of our (later) deep thinking might be, it will contain exchange rate ticker quotes for Bitcoin. Deliberating on this just a little bit further, we convince ourselves that other ticker quotes (currency exchange rates, stock prices, economic indicators) might also be useful for the prediction of the Bitcoin price, and that there could be some data points related to the quotes that may give us a little statistical advantage, too.

While musing about that, it occurs to us that for testing purposes we might also need a mechanism to easily generate random quotes when no data source is available. And for assessing the quality of the training results, we might later need a mechanism to replay historical quotes, so that we can run old data against an updated neural network and find out how well it would have performed during a certain time interval.

All these considerations lead us to our first little class diagram:

[Figure: class diagram (clsdiag_btcquotereader)]

We see an interface for some sort of adapter (QuoteReader) with none of our favorite design patterns incorporated, and not even a data representation class around. I realize that this is scary. Get used to it! Because this is not an accident. We will do a lot of number crunching and, believe it or not, in this context it is good practice to sacrifice beauty and object orientation on the altar of performance. The base rhythm of our architecture will be to prevent object creation in critical areas whenever possible. We will use arrays instead of collections, unless we need collections in external libraries. We will use primitive data types whenever possible. It will feel a lot like 1985, with one positive side effect: we will be quite happy about these decisions when we try to communicate with the GPU later.
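To make the "arrays and primitives" rule concrete, here is a minimal sketch of a fixed-size window over the most recent quotes that allocates its memory once and never creates an object while running. The class `DoubleRingBuffer` is a hypothetical illustration and not taken from the original project.

```java
/** Hypothetical illustration, not part of the original project. */
final class DoubleRingBuffer {
    private final double[] values; // allocated once, reused forever
    private int next = 0;          // index where the next value is written
    private int size = 0;          // number of valid entries

    DoubleRingBuffer(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        values = new double[capacity];
    }

    /** Stores a value, overwriting the oldest one when the buffer is full. */
    void add(double value) {
        values[next] = value;
        next = (next + 1) % values.length;
        if (size < values.length) {
            size++;
        }
    }

    int size() {
        return size;
    }

    /** Returns the value added 'age' steps ago; age 0 is the newest value. */
    double get(int age) {
        if (age < 0 || age >= size) {
            throw new IndexOutOfBoundsException("age: " + age);
        }
        return values[(next - 1 - age + values.length) % values.length];
    }

    /** Mean of the stored values; NaN when the buffer is empty. */
    double mean() {
        if (size == 0) {
            return Double.NaN;
        }
        double sum = 0;
        for (int i = 0; i < size; i++) {
            sum += values[i];
        }
        return sum / size;
    }
}
```

Compared with a `List<Double>`, this avoids boxing every quote into a `Double` object, and the backing `double[]` is already in a layout that can be copied to other numeric code without conversion.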

With this said, and the aesthetically minded among you properly scared, we move on to have a closer look at the interface:

package de.hsec.datascience.btctrader; 
/**
 * Interface for Adapter classes to ticker information sources.
 * @author helmut
 */
public interface QuoteReader {
 /** 
 * Returns the current ticker value. Either a price fixed 
 * by a market maker or the last trading price. 
 */
 public double getCurrentQuote();

 /** 
 * Returns the highest bid price in the order book. 
 */
 public double getBid();

 /** 
 * Returns the lowest ask price in the order book. 
 */
 public double getAsk();

 /** 
 * Returns the lowest price of the last 24 hours. 
 */
 public double getMin24();

 /** 
 * Returns the highest price of the last 24 hours. 
 */
 public double getMax24();

 /** 
 * Returns the trade volume of the observed exchange. 
 */
 public double getVolume24();

 /** 
 * Returns the volume-weighted average price. 
 */
 public double getVwap();

}

 

OK, so we can use an instance of such a QuoteReader to access what seems to be market data from some exchange. The accessor methods come without a timestamp or index parameter, so we (correctly) assume that they return current data. We’ll have to discuss our working definition of the word “current” later.
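The two test helpers mentioned earlier, a generator for random quotes and a replay mechanism for historical quotes, fit behind the same interface. The following is only a sketch of how they might look. The class names `ReplayQuoteReader` and `RandomQuoteReader`, the random-walk model and its parameters are assumptions, and the interface is repeated without comments so that the block compiles on its own.

```java
import java.util.Random;

/** Copy of the QuoteReader interface, repeated so the sketch is self-contained. */
interface QuoteReader {
    double getCurrentQuote();
    double getBid();
    double getAsk();
    double getMin24();
    double getMax24();
    double getVolume24();
    double getVwap();
}

/** Hypothetical: replays recorded quotes from a primitive array, one per call. */
final class ReplayQuoteReader implements QuoteReader {
    private final double[] history;
    private int position = -1;

    ReplayQuoteReader(double[] history) {
        if (history.length == 0) {
            throw new IllegalArgumentException("history must not be empty");
        }
        this.history = history;
    }

    /** Advances to the next recorded quote; stays on the last one at the end. */
    public double getCurrentQuote() {
        if (position < history.length - 1) {
            position++;
        }
        return history[position];
    }

    // Only the quote itself is recorded in this sketch; everything else is 0.
    public double getBid() { return 0; }
    public double getAsk() { return 0; }
    public double getMin24() { return 0; }
    public double getMax24() { return 0; }
    public double getVolume24() { return 0; }
    public double getVwap() { return 0; }
}

/** Hypothetical: produces a bounded random walk, reproducible through the seed. */
final class RandomQuoteReader implements QuoteReader {
    private final Random random;
    private final double maxStep;
    private double current;

    RandomQuoteReader(double start, double maxStep, long seed) {
        this.current = start;
        this.maxStep = maxStep;
        this.random = new Random(seed);
    }

    /** Moves the quote by a uniform random step in [-maxStep, maxStep). */
    public double getCurrentQuote() {
        double step = (random.nextDouble() * 2 - 1) * maxStep;
        current = Math.max(0.01, current + step); // keep the price positive
        return current;
    }

    public double getBid() { return 0; }
    public double getAsk() { return 0; }
    public double getMin24() { return 0; }
    public double getMax24() { return 0; }
    public double getVolume24() { return 0; }
    public double getVwap() { return 0; }
}
```

Because both classes implement QuoteReader, code that consumes quotes does not need to know whether it is running against a live exchange, a recording or generated noise. The fixed seed makes test runs with the random reader repeatable.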

In the next post, we’ll take a closer look at the implementations, especially the BitstampQuoteReader.

Predicting Bitcoin Prices

In this initial blog series, I am going to report on an automated Bitcoin trading system that I built in 2014 and successfully operated during 2015.

The decision making component in this trading system incorporates machine learning methods: mainly a neural network and – in a data preparation step – principal component analysis (PCA).

The code was written in Java and Matlab. It is not always pretty, so when reading through it, please keep in mind that this started as a hobby project.

Some of the code I cannot publish, which I will explain when I come to it. But I will point out how to fill in the gaps.

I would like to encourage people to rebuild the system, use it to try out their own ideas and share them with the rest of us. I also want to point out that, while Bitcoin trading is a good place to start, it is certainly not the only area where these methods are applicable.

Why is Bitcoin a good place to start? Because of an excellent technological infrastructure and immediate financial rewards, to name a few reasons. Also, Bitcoin is cool, which for me has some value of its own.

In the 12 months of operation, the system initiated roughly 11,000 transactions on Bitstamp, a Bitcoin exchange which, among other things, allows trading Bitcoin against fiat currency (USD). The system yielded a gross return a little above 26%. After transaction fees, a pre-tax return of around 20% remained. The result after taxes is a wholly different story, which we will talk about in a later post.

Now, a buy-and-hold strategy would have given me the same return over this time interval, with even fewer transaction fees. But I could not have known that at the beginning of the year.

The approach of the trading system is obviously completely different. It tries to predict small movements in the near future (a few minutes) based on observed market activities, news, economic data and a few other factors. In essence, it exploits the price’s volatility. The beauty of this is that it works almost as well when the overall direction is southwards.

During the first months of the year, while taking its first clumsy, inexperienced trading steps, the system recorded the input data and added it to a growing body of training data. The neural network was trained and retrained several times, each time with more input data, and the results got steadily better. From January to April the trading yielded net negative results while the overall market went sideways. After that the results were positive, even during a severe market decline in November. The last training took place in May. Due to memory constraints (and because the training time had passed 24 hours), training with more data would have required a different approach. Since the results were already satisfactory, I decided to stick with what I had. So that is where we are now: with quite some room for improvement.

In the next few posts, I will very briefly lay out the theoretical foundation of the project, before we take a closer look at the code.