Algorithmic trading (also known as black-box trading) is computer-guided trading, where a program with direct market access monitors the market and places orders when certain conditions are met. As of this writing (2009), probably ~40% of all trades are done this way. Companies run by quants using algorithmic trading have produced higher returns than conventional companies, at least during more or less stable times.
Just as computers eventually beat people at chess, it is expected that the same will happen in investing.
Algorithmic trading is usually performed on a relatively short-term basis (from thousandths of a second - to seconds, minutes, hours, days). So the Artificial Intelligence (AI) of those algorithms doesn't compete (yet) with Warren Buffett (long-term investing based on choosing good companies/teams/culture).
Direct Market Access (DMA) is provided today by many brokers, which makes algorithmic trading possible even for small investors. If you set up a personal account with a service like Interactive Brokers, and run a script on your own server that talks to their API to monitor prices and place orders - you can be in business. If, of course, your algorithms are profitable (statistically).
The benefits of algorithmic trading:
- Can do things which humans cannot do - like making decisions based on complicated calculations, grabbing a security as soon as it becomes available (whereas a human trader may be away from the desk), making complex trades (splitting orders, selecting pools of liquidity, etc.), sniffing out "gaming" - and protecting against it, etc.
- Predictability. Algorithms can be extensively tested (historical backtesting) before going live.
- Efficiency, fast reaction, precision.
- Data-based. No emotions.
- Time to market - implementing an algorithm may be as easy as punching several parameters into a form on the screen.
- Working with decreasing margins, high-speed markets, increased data flows (options - ~ 1 mln updates per second), multiplicity of execution venues, cross-asset class opportunities, etc.
There are many kinds of algorithms, tactics and strategies of algorithmic trading.
- Algorithms can be based on price, volume, timing, etc. (for example, trigger a buy order on a certain percentage upward movement in a share price).
- Slicing a big order into many small orders to hide the large order from market participants or to minimize its market impact (CSFB - "Guerrilla" algorithm).
- "Benchmark" algorithms - to achieve a specific benchmark (for example, the volume weighted average price over a certain time period), etc.
- "Participating" algorithms (for example, % of trading volume)
- "smart order routing" - orders are sent out to the destinations offering the best prices or liquidity
- "dark pools of liquidity" - pools of liquidity not provided by conventional platforms (stock exchanges or crossing networks). CSFB "Sniper" algorithm to detect such pools.
- "Gamer" - sniff out large orders - and then try to use that knowledge to trade against the block at a profit.
- "sniffers" - algorithms to detect the presence of algorithmic trading and the algorithms they are using (bespoke, etc.).
- Artificial Intelligence algorithms of different kinds. For example, a program can "read" news and web blogs (NLP = Natural Language Processing) searching for certain factors (hurricanes, wars, economic or political events, public opinions) - and make trading decisions based on that. "Hurricane" could signal to sell insurance stocks, "Drought" could affect wheat prices. Some vendors have started to provide "machine readable news".
- It is possible to run hundreds of algorithms from one server - and to make cooperative algorithms.
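Two of the simplest ideas from the list above (a price trigger and order slicing) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `pct_move_trigger` and `slice_order` are made up for this example, the slices are equal-sized, and nothing here talks to a real broker API.

```python
def pct_move_trigger(ref_price: float, last_price: float, threshold_pct: float) -> bool:
    """Return True when the price has risen by at least threshold_pct percent
    relative to the reference price (hypothetical buy trigger)."""
    if ref_price <= 0:
        raise ValueError("ref_price must be positive")
    return (last_price - ref_price) / ref_price * 100.0 >= threshold_pct


def slice_order(total_qty: int, n_slices: int) -> list[int]:
    """Split a parent order into child orders whose sizes differ by at most 1.
    Equal slicing is an assumption; real algorithms vary size and timing."""
    if total_qty <= 0 or n_slices <= 0:
        raise ValueError("total_qty and n_slices must be positive")
    n_slices = min(n_slices, total_qty)      # never create empty child orders
    base, extra = divmod(total_qty, n_slices)
    return [base + 1 if i < extra else base for i in range(n_slices)]


if __name__ == "__main__":
    if pct_move_trigger(ref_price=100.0, last_price=103.0, threshold_pct=2.0):
        print(slice_order(1000, 3))
```

A production slicer would also randomize the size and timing of the child orders, since equal slices sent at regular intervals are easy for "sniffer" algorithms to detect.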
- DMA - Direct Market Access
- backtesting - process of testing algorithms on historical data
- DOT and SuperDOT - (Super Designated Order Turnaround System) - electronic system used to place orders for stocks
- ECN - Electronic Communication Network - electronic system to execute orders outside of an Exchange
- ATS - Alternative Trading System
- Dark Pool Liquidity - trading volume created from institutional orders done "anonymously" and away from central exchanges.
- VWAP algorithm (Volume Weighted Average Price)
- AMEX, BOX (Boston Options Exchange), CBOE (Chicago Board Options Exchange), ISE (International Securities Exchange), NYSE Arca, NASDAQ OMX PHLX - exchanges
- MiFID (Markets in Financial Instruments Directive) - November 2007 - new rules created to help trading in Europe.
- Regulation NMS - (National Market System) - set of rules from SEC (Securities and Exchange Commission) to improve fairness in price execution.
- FIX protocol (Financial Information eXchange) - Example of the message: 8=FIX.4.2 | 9=67 | 35=8 | 49=PHLX | 56=PERS | 11=ATOMNOCCC9990900 | 52=20071123-05:30:00.000 | 20=3 | 150=E | 39=E | 55=MSFT | 167=CS | 54=1 | 38=15 | 40=2 | 44=15 | 58=PHLX EQUITY TESTING | 59=0 | 47=C | 32=0 | 31=0 | 151=15 | 14=0 | 6=0 | 10=102 |
- FAST protocol (FIX Adapted for STreaming) - optimized for low latency
- Alpha - measure of performance adjusted to risk.
- Smart Order Routing - select the "best" place to do the trade - and route the order there.
- Latency - network latency, response latency, etc. Some algorithms need very low latency (milliseconds).
- Tick data - time series data containing both volume and price (and more) for each point.
- Statistical Arbitrage - A profit situation arising from pricing "inefficiencies" between securities.
- ATD (Automated Trading Desk, LLC) - was bought in 2007 by Citi for $680 million.
- Credit Suisse, Goldman Sachs, Morgan Stanley, Deutsche Bank, Citadel - major algorithmic trading movers
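The FIX example message from the terminology list above is just a sequence of tag=value pairs, so it can be split into a dictionary with very little code. The sketch below assumes the " | " separator used in that printed example (on the wire, FIX uses the SOH character, 0x01); `parse_fix` is a hypothetical helper, not part of any FIX library, and it does not validate the body length (tag 9) or the checksum (tag 10).

```python
def parse_fix(message: str, sep: str = "|") -> dict[int, str]:
    """Parse a tag=value FIX message into {tag: value}.
    Note: repeated tags (repeating groups) would overwrite each other here."""
    fields: dict[int, str] = {}
    for part in message.split(sep):
        part = part.strip()
        if not part:
            continue                      # skip the empty piece after the last separator
        tag, _, value = part.partition("=")
        fields[int(tag)] = value
    return fields


MSG = ("8=FIX.4.2 | 9=67 | 35=8 | 49=PHLX | 56=PERS | 11=ATOMNOCCC9990900 | "
       "52=20071123-05:30:00.000 | 20=3 | 150=E | 39=E | 55=MSFT | 167=CS | "
       "54=1 | 38=15 | 40=2 | 44=15 | 58=PHLX EQUITY TESTING | 59=0 | 47=C | "
       "32=0 | 31=0 | 151=15 | 14=0 | 6=0 | 10=102 |")

if __name__ == "__main__":
    f = parse_fix(MSG)
    # 35 = MsgType (8 is an execution report), 55 = Symbol, 54 = Side (1 is buy),
    # 38 = OrderQty, 44 = Price
    print(f[35], f[55], f[54], f[38], f[44])
```

For real wire traffic, call `parse_fix(raw, sep="\x01")`.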
Two important books:
- Active Portfolio Management: A Quantitative Approach for Producing Superior Returns and Controlling Risk (1999) by Richard Grinold, Ronald Kahn
(This is a very important book - although it may take time to go through.)
- Investment Management: Portfolio Diversification, Risk, and Timing--Fact and Fiction (Wiley Finance) (2003) by Robert L. Hagin
(This book is lighter, should help with intuition for investing.)
More good books:
- Pairs Trading: Quantitative Methods and Analysis (Wiley Finance) (2004) by Ganapathy Vidyamurthy
- Quantitative Strategies for Achieving Alpha (McGraw-Hill Finance & Investing) (2008) by Richard Tortoriello
- High Probability Trading Strategies: Entry to Exit Tactics for the Forex, Futures, and Stock Markets (Wiley Trading) (2008) by Robert Miner
- Fibonacci Trading: How to Master the Time and Price Advantage (2008) by Carolyn Boroden
- Quantitative Trading: How to Build Your Own Algorithmic Trading Business (Wiley Trading) (2008) by Ernie Chan
- Journal of Portfolio Management
- Journal of Financial Engineering
- Quantitative Finance
- Journal of Investing
- High volume of data (millions of data points (ticks) every day).
- Bad ticks (up to 2-3%) cause bad trades. So sophisticated data filtering/data cleaning algorithms are required. And they have to work in real time. Note: overscrubbing data also causes problems. http://www.tickdata.com/FilteringWhitePaper.pdf
- Sometimes information providers fail to deliver data on time
- Patterns - seasonal patterns, patterns during the week, during the day, high volume in the 1st minute
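As a rough illustration of the bad-tick filtering mentioned in the list above, here is a minimal sketch of one common approach: compare each price with the median of the last few prices and drop it if it deviates too much. The window size, the 5% threshold and the name `filter_bad_ticks` are assumptions chosen for the example; this is not the method from the linked white paper.

```python
from collections import deque
from statistics import median


def filter_bad_ticks(prices, window=5, max_dev=0.05):
    """Return the prices with suspected bad ticks removed.
    A tick is dropped when it differs from the median of the previous
    `window` prices by more than `max_dev` (as a fraction of that median)."""
    recent = deque(maxlen=window)   # last `window` raw prices, good or bad
    good = []
    for p in prices:
        if len(recent) == window:
            m = median(recent)
            is_bad = abs(p - m) > max_dev * m
        else:
            is_bad = False          # not enough history yet: accept the tick
        if not is_bad:
            good.append(p)
        # Always remember the raw price: one outlier cannot move the median,
        # but a genuine level shift takes over after a few ticks.
        recent.append(p)
    return good


if __name__ == "__main__":
    ticks = [100.0, 100.1, 99.9, 100.2, 100.0, 10.0, 100.1, 100.3]
    print(filter_bad_ticks(ticks))
```

Keeping rejected prices in the window is a deliberate choice against overscrubbing: a real price jump is rejected only for the first few ticks instead of forever.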
For historical backtesting one needs to store tick data. It is time series data, so a standard SQL database may not be the best solution. There are many approaches and things to pay attention to. Usually for processing people use arrays in memory (C, C++, J, OCaml). For data storage people use flat files, SQL databases, linea database, etc. You can google for: how to present tick data in the database
Here is an excellent discussion:
- RDBMS. Big databases provide for intelligent memory caching, or even "in memory" database solutions, for example: Oracle TimesTen In-Memory Database.
DB2, MS SQL Server, MySQL (Falcon engine), etc. also have fast in-memory solutions. Still, SQL databases have a lot of overhead.
- Specialized database/interface solutions - fast in both storing and retrieving, can work with terabytes of data, and can process thousands of transactions per second:
- onetick.com - very very good
- vhayu.com - Vhayu tick database - used by RMD Server (Reuters).
- SunGard's data server - FAME
- hdf5 - HDF5 is better and faster than flat files for backtesting (HDF5 Packet Table API)
- Commodity Server
- Xenomorph - TimeScape XDB, not very good
- kdb+tick (kx.com) - 64 bit commercial and 32 bit non-commercial (has restrictions)
- MonetDB - Open Source
- StreamSQL - an extension to SQL for processing with sliding time-based windows
- streambase, Esper (can use HDF5 for storage, Esper for analysis).
- NetCDF - Network Common Data Form - a set of software libraries/formats for array-oriented data
- FITS - Flexible Image Transport System - data file format to manipulate/store/transmit images.
- db4o - object database
- Deltix QuantOffice - has its own tick database - and can work with vhayu.com and others.
- ReiserFS - journalling filesystem (supported on Linux)
- K programming language (processing arrays, derivative of APL and a cousin to the J programming language).
- http://www.vertica.com/ - Analytic database
- OCaml - high performance programming language (also persistent library)
- sqlite.org - embedded database written in C that can also run fully in memory - convenient as a buffer between the datafeed and the real storage (say, a MySQL database). The idea is to accumulate ticks in SQLite one by one - and then write them to the real database in blocks (bulk), which increases performance.
- For high frequency data disk speed can be a limitation. One way of solving this problem is to "compress" data by storing only differences - in minimal number of bits per tick (say, 5 bits on average).
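The "store only differences" idea from the last item above can be sketched as follows. The example assumes prices are already expressed as integer numbers of ticks (for example cents), and the names `delta_encode` and `delta_decode` are hypothetical. Packing the small deltas into a minimal number of bits is left out; the point is only that the deltas are much smaller numbers than the prices themselves.

```python
def delta_encode(prices_in_ticks: list[int]) -> tuple[int, list[int]]:
    """Return the first price and the list of successive differences."""
    if not prices_in_ticks:
        raise ValueError("need at least one price")
    first = prices_in_ticks[0]
    deltas = [b - a for a, b in zip(prices_in_ticks, prices_in_ticks[1:])]
    return first, deltas


def delta_decode(first: int, deltas: list[int]) -> list[int]:
    """Rebuild the original prices from the first price and the differences."""
    prices = [first]
    for d in deltas:
        prices.append(prices[-1] + d)
    return prices


if __name__ == "__main__":
    prices = [10000, 10001, 10001, 9999, 10002]   # prices in cents
    first, deltas = delta_encode(prices)
    print(first, deltas)
    print(delta_decode(first, deltas) == prices)
```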
Algorithmic trading (and modeling/backtesting) can be implemented in many different programming languages. See for example:
- http://www.algotradinggroup.com/cgi-bin/yabb2/YaBB.pl?num=1215367637/13 - good discussion
- http://www.janestreet.com/yaron_minsky-cufp_2006.pdf - good presentation/discussion making case for using OCaml
Generally FP (Functional Programming) Languages are very good for fast prototyping, although they are generally slower. But, again, this is not always the case - for example, OCaml works really fast. FPLs are also easier to parallelize (to run on several computers in parallel).
For some projects you can use combinations of languages. For some tasks selecting a language is just a matter of personal preference. For others
Here are some languages to choose from:
- C/C++ - fast. There are libraries to provide FP features: Boost, phoenix, lambda
- Java - fast (if you don't overuse the OOP features).
- Q - based on K - programming language, a query language for KDB+ (see also the Pure programming language).
- K - variant of APL with elements of Scheme and A+. Fast. Used by kx.com for kdb (in-memory column-based database). The executable can be made very small - fit into CPU cache.
- J - from APL and FP, very terse, great for math, statistics, matrices. The executable can be made very small - fit into CPU cache.
- APL - A Programming Language (array-oriented; the notation dates from 1957, was used at IBM, etc.), usually interpreted, slow
- OCaml - very fast (matches C/C++), OOP, FP, static typing
- Scala - good, works with Java, a bit slower
- Haskell - FP, slow
- Erlang - FP, slow, used in telecom industry, designed to run distributed, fault-tolerant, real-time, non-stop applications.
- F# - from Microsoft for .NET platform, similar to OCaml
- Axum - from Microsoft - parallel programming (see also concurrency)
The formal definition of a "tick" is the "minimum change in the price of a security". For stocks it may be as little as one cent; for US Treasuries it is 1/32 of a point, or 31.25 cents per $1,000 of face value. For futures it may be different depending on the contract. The tick size is determined by the exchange. Different products may have different tick sizes.
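The tick-size arithmetic can be checked with a short sketch. It assumes that one point of a Treasury price equals 1% of face value, so on $1,000 of face value one point is $10 and 1/32 of a point is $0.3125. The helper `round_to_tick` is a made-up name, and `Decimal` is used to avoid binary floating-point surprises.

```python
from decimal import Decimal


def round_to_tick(price: Decimal, tick: Decimal) -> Decimal:
    """Round a price to the nearest multiple of the tick size."""
    n_ticks = (price / tick).quantize(Decimal("1"))   # nearest whole number of ticks
    return n_ticks * tick


STOCK_TICK = Decimal("0.01")                 # one cent
TREASURY_TICK = Decimal(1) / Decimal(32)     # 1/32 of a point = 0.03125

# Dollar value of one Treasury tick on $1,000 of face value:
# face * (tick in points) / 100, because a point is 1% of face value.
TICK_VALUE_PER_1000 = Decimal(1000) * TREASURY_TICK / Decimal(100)

if __name__ == "__main__":
    print(round_to_tick(Decimal("10.037"), STOCK_TICK))
    print(round_to_tick(Decimal("99.52"), TREASURY_TICK))
    print(TICK_VALUE_PER_1000)
```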
When people talk about "Tick Data" - they usually mean any type of time series data which includes both volume and price for every point. Additional info may include time (for asynchronous time series), bid/ask, partial Level 2 information (information not only for the best bid/ask, but for several others), etc. Depending on the feed, you can get more or less data. Also you can get absolute values - or only differences (this sometimes is useful to "compress" the data flow).
Sometimes people use the term "Market Data". It is similar. A market data feed may contain more information (for example: Ticker Symbol, Last sale (price), Trade time, Exchange, Volume). The feed may be an aggregation of several feeds.
Tick data can be very big (it is not uncommon to handle terabytes of data for analysis).
To receive high-frequency data in real time you may need special hardware, because a regular Ethernet cable and a regular hard drive may not be fast enough to handle the traffic. So you may need to parallelize your systems.
For many applications when you don't need fine granularity, you can decrease the amount of data 100 times (and more) by using data presented on a "per minute" basis (or even "per day"). People refer to this data as "bars" or "candlesticks".
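A minimal sketch of turning ticks into bars is shown below. It assumes each tick is a `(timestamp_in_seconds, price, volume)` tuple and that the ticks arrive sorted by time; the function name `ticks_to_bars` and the dictionary layout are choices made for this example only.

```python
def ticks_to_bars(ticks, bar_seconds=60):
    """Aggregate (timestamp, price, volume) ticks into OHLCV bars.
    Returns {bar_start_time: {"open", "high", "low", "close", "volume"}}."""
    bars = {}
    for ts, price, volume in ticks:
        start = int(ts // bar_seconds) * bar_seconds   # start of the bar this tick falls into
        bar = bars.get(start)
        if bar is None:
            # first tick of a new bar sets all four prices
            bars[start] = {"open": price, "high": price, "low": price,
                           "close": price, "volume": volume}
        else:
            bar["high"] = max(bar["high"], price)
            bar["low"] = min(bar["low"], price)
            bar["close"] = price                       # the last tick seen so far
            bar["volume"] += volume
    return bars


if __name__ == "__main__":
    ticks = [(0, 10.0, 100), (20, 10.5, 50), (59, 9.8, 25), (60, 10.1, 10)]
    for start, bar in ticks_to_bars(ticks).items():
        print(start, bar)
```

Passing `bar_seconds=86400` gives "per day" bars from the same tick stream, provided the timestamps are aligned to the trading day you want.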
Low Latency Trading - done electronically using network connections to exchanges and ECNs (Electronic Communication Networks). ~ 60-70% of the NYSE volume is done this way. Today with algorithmic trading even a millisecond improvement in latency gives a competitive advantage.
Causes of latency:
- physical distance
- too much traffic causes network delays
- delays in feed/message handling components
- applications can't handle the volume fast enough
- market centers themselves may introduce latency - gateways, different subscriptions (a direct pipe is faster than going through a consolidation process).
- average latency, standard deviation, trends in latency
- bespoke (custom) latency measurement (ground-up - measure each step).
- another way: collect statistics and draw conclusions from them.
- histograms, heat maps, etc.
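The latency measures listed above (average, standard deviation, histogram) can be computed from a list of samples with the standard library alone. In this sketch the samples are assumed to be latencies in microseconds collected elsewhere; `latency_summary` and the 100-microsecond bucket width are illustrative choices.

```python
from collections import Counter
from statistics import mean, stdev


def latency_summary(samples_us, bucket_us=100):
    """Summarize latency samples (microseconds): mean, sample standard
    deviation, maximum, and a histogram keyed by bucket start."""
    if len(samples_us) < 2:
        raise ValueError("need at least two samples for a standard deviation")
    hist = Counter((int(s) // bucket_us) * bucket_us for s in samples_us)
    return {
        "mean": mean(samples_us),
        "stdev": stdev(samples_us),
        "max": max(samples_us),
        "histogram": dict(sorted(hist.items())),
    }


if __name__ == "__main__":
    samples = [120, 180, 250, 900]        # made-up numbers for illustration
    summary = latency_summary(samples)
    for bucket, count in summary["histogram"].items():
        print(f"{bucket:>5} us  {'#' * count}")
    print(summary["mean"], summary["stdev"], summary["max"])
```

The histogram usually tells more than the average, because a few slow outliers (like the 900 here) are what hurts a latency-sensitive strategy.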
Who is affected by latency:
- algorithmic trading
- statistical and arbitrage models' trading
- Rebate Trading - when you trade large volumes on ECN (electronic communication networks), ECN will give you a small rebate (%) for each big transaction. If you do large volume of transactions - you will make money on rebates. The key is not to hold - but to get in/get out ASAP.
Methods to reduce latency:
- Move closer to ECN or exchange (colocate boxes on the exchange sites).
- Use fiber-optic direct pipes with enough bandwidth
- Use diversity in vendors for your connections
- Get fast hardware
- bespoke hardware - custom made to the buyer's specs.
- FPGA hardware acceleration (FPGA = Field-Programmable Gate Array) - semiconductor device that can be reconfigured by the customer.
- multi-core chips
- lots of memory - cache everything, minimize usage of hard drive
- fast hard drives (including solid-state drives)
- fast software to speed up processing, analysis, decision making
- reducing the number of messages (remove unnecessary talk), using native messages (converting to FIX and back takes time), compressed messages.
- http://epchan.blogspot.com - Ernie Chan's excellent blog
- http://www.algotradingpodcast.com - interviews about algorithmic trading
- http://www.algotradinggroup.com - forum
- http://www.thetrade.ltd.uk/resources.asp - good resources
- http://www.marketcetera.com/site/ - open source trading platform
- http://www.datashaping.com - collection of resources.
- http://www.stacresearch.com/ - performance
- http://www.suitellc.com - Suite LLC’s ALib™ analytic library for fixed-income and credit derivatives.
- http://streambase.typepad.com/streambase_stream_process/algorithmic-trading/ -
- http://www.investopedia.com - good place to search for term definitions
Some articles on short-term effects of trades:
- http://weber.ucsd.edu/~mbacci/engle/291.pdf - (2000) - Time and the Price Impact of a Trade - Alfonso Dufour, Robert F. Engle
- http://www.courant.nyu.edu/~almgren/papers/costestim.pdf - (2005) - Direct Estimation of Equity Market Impact - Robert Almgren, Chee Thum, Emmanuel Hauptmann, and Hong Li
- http://www.courant.nyu.edu/~almgren/papers/optliq.pdf - (2000) - Optimal Execution of Portfolio Transactions - Robert Almgren, Neil Chriss
- http://www.santafe.edu/~jdf/papers/mastercurve.pdf - (2003) - Master curve for price-impact function - Fabrizio Lillo, J. Doyne Farmer, Rosario N. Mantegna
- http://arxiv.org/pdf/cond-mat/0406224v2 - (2008) - Random walks, liquidity molasses and critical response in financial markets - Jean-Philippe Bouchaud, Julien Kockelkoren, Marc Potters
- http://papers.ssrn.com/sol3/papers.cfm?abstract_id=931667 - (2006) - Tactical Liquidity Trading and Intraday Volume - Merrell Hora
More misc. resources:
- 29West.com - fast and affordable messaging middleware (up to 2.4 mln messages per second, latency under 50 microsec on a regular PC hardware)