# Quote Arrival Frequency Distribution for Tick Data

High-frequency systems development is built upon the analysis of tick data. A classic example is statistically characterizing the frequency and arrival times of intra-day quotes, useful for building systems which exploit market microstructure effects.

Yet the temporal structure of tick data differs fundamentally from that of traditional quantitative analysis: ticks arrive at irregularly-spaced times (sometimes several at the same instant), with time intervals ranging from zero to a few seconds (or even minutes). This irregular arrival conflicts with the regular-spacing assumption of classic statistical time series methods and their corresponding computational tools.
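To make the irregular spacing concrete, here is a minimal sketch that computes inter-arrival gaps with `diff()`. The timestamps are made-up illustrative values (seconds since midnight), not real quotes, and the variable names are hypothetical.

```r
# Hypothetical quote arrival times, in seconds since midnight (illustrative only)
arrival <- c(11.0, 11.0, 11.4, 13.9, 14.0, 75.2)

# Inter-arrival gaps: one value per consecutive pair of quotes
gaps <- diff(arrival)

# Count of quotes that arrived at exactly the same time as their predecessor
simultaneous <- sum(gaps == 0)

print(gaps)
print(range(gaps))
```

Gaps of zero sit next to gaps of a minute or more, which is exactly the situation a fixed-interval time series cannot represent directly.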

Recent analysis bore out this challenge.

Consider generating a frequency distribution for quote arrival times for a single currency instrument, say EURUSD, stored in a standard CSV file of the following format (with header line):

`pair,date,bid,ask`

`EURUSD,2009-01-26 00:00:11.000,1.294500,1.294800`
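As a rough sketch of getting such a file into R, the snippet below reads the same layout from an inline string so it runs without any file on disk. The second data row is invented for illustration.

```r
# Sample text in the pair,date,bid,ask layout; the second data row is made up
csv_text <- "pair,date,bid,ask
EURUSD,2009-01-26 00:00:11.000,1.294500,1.294800
EURUSD,2009-01-26 00:00:11.500,1.294600,1.294900"

# Keep the date column as character so it can be parsed explicitly later
data <- read.csv(text = csv_text, stringsAsFactors = FALSE)

str(data)
```

With a real file, the `text =` argument would be replaced by a path, for example `read.csv("EURUSD.csv", stringsAsFactors = FALSE)`, where the file name is hypothetical.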

Considering all such irregularly-spaced quotes for 26 Jan 2009, calculate the temporal frequency distribution by both hour and minute. Turns out this is a bit harder in R than one naïvely expects. Readers with R expertise: suggestions for improving the code below (admittedly not the most beautiful) are welcome.

Assume a data frame, named *data*, has been loaded which contains all the above tick data. Begin by parsing the dates assuming a non-standard format, generating an ordered vector of *numeric* timestamps, measured in number of seconds since the epoch:

`datesNum <- as.numeric(as.POSIXct(strptime(as.character(data$date), "%Y-%m-%d %H:%M:%OS")))`
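A small self-contained check of this parsing step, using made-up timestamps. One assumption that is not in the original line: `tz = "UTC"` is passed explicitly so the result does not depend on the local time zone of the machine.

```r
# Hypothetical timestamps in the same format as the CSV date column
stamps <- c("2009-01-26 00:00:11.000",
            "2009-01-26 00:00:11.500",
            "2009-01-26 00:01:02.250")

# %OS reads seconds including the fractional part; tz = "UTC" is an assumption
parsed <- as.POSIXct(strptime(stamps, "%Y-%m-%d %H:%M:%OS", tz = "UTC"))

# Seconds since the epoch, as plain numerics
secs <- as.numeric(parsed)

# Gaps between consecutive quotes, in seconds
print(diff(secs))
```

The sub-second gaps survive the conversion, which matters when several quotes arrive within the same second.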

Next, shift the timestamps to be zero-based and convert to the desired time unit (divide by 60 for minutes, 3600 for hours):

`datesNum <- datesNum - min(datesNum)`

`datesNum <- datesNum / 60`

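Now comes the R magic, made possible by converting the quote arrival timestamps into numerics. The last break point is placed one unit past `floor(max(datesNum))` so that quotes in the final, partial interval are not silently dropped as `NA` by `cut()`:

`plot(cbind(table(cut(datesNum, seq(min(datesNum), floor(max(datesNum)) + 1, by=1), right=FALSE))), type="l", xlab="Minutes", ylab="Quote Frequency")`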

This expression requires a bit of unpacking to see how it fits together (for more detailed explanation, see r-tutor):

- Range: calculate the range of timestamps, as returned by `min()` and `max()`
- Subinterval partitions: partition the range into non-overlapping sub-intervals by defining a sequence of equally spaced break points, via `seq()`
- Classification: classify each of the timestamps according to the sub-intervals, left closed and right open (`right=FALSE`), via `cut()`
- Frequencies: compute the frequency of timestamps in each sub-interval via `table()`
- Column binding: turn the frequency table into a one-column matrix, with the sub-interval labels as row names, via `cbind()`

Given that, `plot()` generates a simple line graph whose x-axis is the sub-interval index in the chosen time unit (*i.e.* minutes or hours) and whose y-axis is the frequency of quotes which arrived in that sub-interval of time.
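The same steps can be sketched one at a time on a tiny, made-up set of arrival times, which makes the intermediate objects easier to inspect. All values and variable names here are illustrative assumptions. The last break is placed at `floor(max(mins)) + 1` so the final partial minute is counted.

```r
# Hypothetical arrival times, in seconds since the first quote (illustrative only)
secs <- c(0, 5, 12, 61, 62, 130, 131, 135, 179)
mins <- secs / 60                      # use secs / 3600 for hours instead

# Break points one minute apart, covering the final partial minute as well
breaks <- seq(floor(min(mins)), floor(max(mins)) + 1, by = 1)

# Assign each timestamp to a left-closed, right-open interval
bins <- cut(mins, breaks = breaks, right = FALSE)

# Count quotes per interval, then convert to a one-column matrix
freq <- table(bins)
counts <- cbind(freq)
print(counts)

# Line chart of quotes per minute
plot(counts, type = "l", xlab = "Minutes", ylab = "Quote Frequency")
```

Checking that `sum(counts)` equals `length(secs)` is a quick way to confirm that no quote fell outside the break points.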

For those conducting high-frequency analysis, R packages suited to irregularly-spaced methods include zoo, xts, its, tseries, and fts.
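For comparison, base R alone can also bin on clock boundaries rather than on elapsed time since the first quote, because `cut()` has a method for date-time objects that accepts unit names such as `"min"` and `"hour"` as breaks. A minimal sketch with made-up timestamps, again assuming `tz = "UTC"`:

```r
# Hypothetical quote timestamps (illustrative only); tz = "UTC" is an assumption
stamps <- as.POSIXct(c("2009-01-26 00:00:11", "2009-01-26 00:00:45",
                       "2009-01-26 00:01:02", "2009-01-26 01:15:30"),
                     tz = "UTC")

# Quotes per clock minute and per clock hour; empty intervals are kept as zeros
per_minute <- table(cut(stamps, breaks = "min"))
per_hour <- table(cut(stamps, breaks = "hour"))

print(per_hour)
```

This is only a sketch for small inputs; the packages listed above are likely the better choice for millions of ticks.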

Hello,

xts is another package suitable for irregularly-spaced time-series. It is heavily optimized for speed and memory, which allows you to quickly manipulate objects of several million rows. The code below should provide similar results to your code above, but the times are aligned by minutes instead of time since the first observation.

See `?endpoints` for details as well as `vignette("xts")` for more information.

Best,

Josh

`# create random bid/ask data`

`require(xts)`

`N <- 1e7`

`data <- 1.2945+rnorm(N)/1000`

`data <- cbind(data,data+runif(N)/1000)`

`colnames(data) <- c("bid","ask")`

`# create and order random times`

`times <- Sys.time()-N:1+rnorm(N)*100`

`times <- times[order(times)]`

`# create xts object from data and times`

`EURUSD <- xts(data, times)`

`# create quote frequency chart`

`plot(diff(endpoints(EURUSD,"minutes")),type='l')`

@Josh: thanks for your comments; nice work on xts. I am updating the post to include reference to it.