The pursuit of alpha has been, for many years, the secretive quest of a handful of hedge funds.
If you were good at alpha generation (or were thought to be good) you were snapped up by one of these organisations and you fell off the radar, disappearing into a brave and mysterious world known as the “buy-side”.
Meanwhile, more prosaically, alpha is about time series forecasting. A time series is a sequence of data points listed in time order, labelled with corresponding timestamps. If we list the closing prices of IBM stock on multiple days along with the corresponding dates, we obtain a time series:
Nov 20, 2018, 117.20
Nov 21, 2018, 118.57
Nov 23, 2018, 117.19
And so on. We may list both the opening and closing prices next to each other, in which case we obtain a multivariate time series:
Date, Open, Close
Nov 20, 2018, 118.49, 117.20
Nov 21, 2018, 117.61, 118.57
Nov 23, 2018, 118.09, 117.19
In this example we have three observations (n = 3) corresponding to the three dates, the rows in our table, and two features (p = 2): Open and Close. We may also consider the prices of multiple stocks – again leading to a multivariate time series and increasing the number of features in it. Whenever we increase n or p we increase the dimensionality.
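As a small illustration, the table above can be held in plain Python with n and p read off directly. This is only a sketch: the variable names are made up for the example, and in practice a library such as Pandas would be the more natural container.

```python
from datetime import datetime

# Each row: (date, open, close), copied from the table above.
raw_rows = [
    ("Nov 20, 2018", 118.49, 117.20),
    ("Nov 21, 2018", 117.61, 118.57),
    ("Nov 23, 2018", 118.09, 117.19),
]

# Parse the timestamps so that the observations can be kept in time order.
series = sorted(
    (datetime.strptime(date, "%b %d, %Y").date(), open_, close)
    for date, open_, close in raw_rows
)

feature_names = ["Open", "Close"]
n = len(series)         # number of observations (rows)
p = len(feature_names)  # number of features (columns, not counting the timestamp)

# A univariate time series is a single feature paired with its timestamps.
closes = [(day, close) for day, _, close in series]

print(f"n = {n}, p = {p}")
```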
If you can forecast where the value of a particular feature of a time series, such as a stock price or an exchange rate, is going to be at some point in the future, you can (hopefully!) realise this prediction as profit. As either Niels Bohr or Piet Hein has informed us, “prediction is very difficult, especially about the future”. Usually our predictions are uncertain and we have to be able to quantify this uncertainty. In statistics, we often summarise how much of the variation in the data a forecast explains with the coefficient of determination, R^2 (read “R squared”). Time series forecasting is sometimes regarded as a branch of econometrics. In econometrics we usually work with differences, rather than raw time series, so we are forecasting the difference between the price at some future horizon (say five ticks ahead) and the current price.
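The forecasting target described here, the difference between the price a fixed number of ticks ahead and the current price, can be sketched in a few lines. The function name and the tick prices below are invented for the illustration.

```python
def forward_differences(prices, horizon):
    """Return prices[t + horizon] - prices[t] for every t where both exist."""
    if horizon <= 0:
        raise ValueError("horizon must be a positive number of ticks")
    # zip stops at the shorter sequence, so the last `horizon` ticks get no target.
    return [later - now for now, later in zip(prices, prices[horizon:])]


# Made-up tick prices, purely for illustration.
ticks = [100.0, 100.5, 100.25, 101.0, 100.75, 101.5, 102.0, 101.25]

targets = forward_differences(ticks, horizon=5)
print(targets)  # [1.5, 1.5, 1.0]
```

Note that the last five ticks have no target yet: their "future" has not been observed.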
Almost everyone can predict the past with certainty. We want our forecasts to do well on new, as yet unobserved data; we want to achieve a high out-of-sample (rather than in-sample) R^2. We therefore need to minimise under- and overfitting – and this is what the discipline of machine learning is primarily concerned with. Thus alpha generation is not only an exercise in statistics and econometrics, but also an exercise in machine learning.
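The gap between in-sample and out-of-sample R^2 can be seen on synthetic data. The sketch below is an assumption-laden toy, not a trading model: the data are simulated as a weak linear signal plus noise, and the two "models" (a least-squares line and a nearest-neighbour memoriser) are chosen only to contrast a sensible fit with an overfitted one.

```python
import random


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - f) ** 2 for a, f in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x


def fit_memoriser(xs, ys):
    """1-nearest-neighbour: echo the target of the closest training point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda pair: abs(pair[0] - x))[1]


rng = random.Random(0)  # fixed seed so the run is repeatable


def sample(size):
    """Simulated data: a weak linear signal buried in noise."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(size)]
    ys = [0.5 * x + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


x_train, y_train = sample(200)
x_test, y_test = sample(200)

results = {}
for name, fit in [("line", fit_line), ("memoriser", fit_memoriser)]:
    model = fit(x_train, y_train)
    r2_in = r_squared(y_train, [model(x) for x in x_train])
    r2_out = r_squared(y_test, [model(x) for x in x_test])
    results[name] = (r2_in, r2_out)
    print(f"{name}: in-sample R^2 = {r2_in:.3f}, out-of-sample R^2 = {r2_out:.3f}")
```

The memoriser predicts the past perfectly (its in-sample R^2 is exactly 1) yet does worse than the humble line on data it has not seen, which is the overfitting that the paragraph above warns against.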
But how do you turn your, hopefully good, forecasts into profits – how do you realise your alpha? Either by placing orders and trading (aggressing) or by slightly modifying – skewing – the prices that you are quoting to others (known as passive risk management, as opposed to aggressive). In each case you leak some information about your forecast to the market – and interact with the very object that you are trying to predict. Will this interaction have no effect, help realise your “prophecy” (in which case it is a self-fulfilling prophecy) or thwart it (in which case it is a self-defeating prophecy)? Both terms were coined by Robert K. Merton, the father of Robert C. Merton of Black-Scholes-Merton fame.
In dynamical systems – and cybernetics – such interactions are known as positive and negative feedbacks. Indeed, trading strategies are prime examples of cybernetic systems, a fact expressed most vividly by Eugene Durenard in his book Professional Automated Trading. Durenard is interested in applying cybernetic metaphors to trading – but what if we go in the opposite direction and see how ideas from trading can benefit cybernetics?
Norbert Wiener introduced cybernetics in 1948 as “the scientific study of control and communication in the animal and the machine”. The word originates from the Greek kubernetes, “steersman”, via the 1830s French term cybernétique, “the art of governing”. Hence the connotations: government, management, control…
In cybernetics we are considering the inputs and outputs of a particular system over time, possibly in the presence of feedbacks. We are using the inputs to predict – and hopefully control – the outputs. As Peter Drucker pointed out, “what gets measured gets managed”. Before something can get measured, one needs to “express it in numbers” – as pointed out by Lord Kelvin and also by Alex (Oleksandr) Bilokon, who uses Drucker’s ideas to manage fleets of ships. Cyberneticians go further and postulate: what you can measure, you can (sometimes) forecast; what you can forecast, you can (sometimes) manage; and what you can manage, you can (sometimes) prevent.
Notice that traders are also concerned with a cybernetic system: they are trying to realise gains and avoid losses in a market where the input time series is used to forecast an output time series in the presence of feedbacks.
Alpha generation is not easy. Financial time series are difficult to deal with: they are non-stationary (their statistical properties change over time), non-Gaussian (often skewed and exhibiting fat tails, making extreme events far more likely than they would be under a normal distribution), influenced by animal spirits (which Keynes defined as “a spontaneous urge to action rather than inaction” – a property of the human soul), driven by unobservables (or latent variables, such as volatility), affected by human errors (including fat-finger errors), complex and interrelated, often multivariate and high-frequency (consisting of numerous intraday observations arriving at irregular time intervals).
Interestingly, and perhaps surprisingly, many biological, biochemical, and medical time series exhibit similar properties, which may pertain to life in general, rather than just human activities. That there are traders adept at generating alpha on such challenging time series gives us reasons to hope that we may achieve similar successes in other fields, where Wiener, Andrey Kolmogorov, and their colleagues tried to apply cybernetics with limited success in the 1950s and 1960s: economics, insurance, cardiology, oncology, neurology, more generally, the human mind.
Time series often have lower signal-to-noise ratios than images and natural language, where most of the successes in artificial intelligence (AI) have been achieved so far, and are therefore more challenging.
Cybernetics suffered from the same issues as AI in the 1970s, 1980s, and 1990s, and was as affected by the various AI winters, which led to significant decreases in funding. As Steven Strogatz wrote in Sync, “every decade or so, a grandiose theory comes along, bearing similar aspirations and often brandishing an ominous sounding C-name. In the 1960s it was cybernetics. In the 1970s it was catastrophe theory. Then came chaos theory in the ‘80s and complexity theory in the ‘90s.”
We seek to invoke the help of the One and consider neocybernetics instead of classical cybernetics. But avoiding a C-name is only the first step: we need to consider the technical issues that hindered progress in the 1970s, 1980s, and 1990s, and decide how to overcome them.
Cybernetics was explored first and foremost by academics, rather than by engineers or entrepreneurs. It never became a technology, which Stephen Boyd defines as something that “can be reliably used by many people who do not know, and do not need to know, the details” – such as an iPhone or, perhaps, glasses. At times we may even forget that we are wearing a pair of glasses; now, that’s a technology!
The computing power accessible in Wiener’s time was insufficient for most cybernetics applications. The MIT Autocorrelator used by Wiener, Jerome B. Wiesner, and Yuk W. Lee was way off the modern Moore’s Law charts.
The mathematical language of stochastic analysis, while extremely elegant and powerful, is very complex and labour-intensive: it requires a lot of hard work by extremely skilled practitioners to derive intuitively obvious results, let alone something highly nontrivial. Stochastic analysis is so complex that people naturally gravitate towards overly Platonic, overly parsimonious models (with very few parameters), which don’t adequately reflect reality.
In many applications, estimation theory, which tells us how well or poorly we know the estimated parameters, was neglected. Moreover, the focus of classical statistical theory was on optimising results in-sample, rather than out-of-sample.
Parallel programming is highly nontrivial, let alone the low-latency, high-throughput parallel programming needed for many real-time cybernetic systems. Message-driven processing, event-driven architectures (EDA), let alone reactive programming, were unheard of in Wiener’s times – there were simply no suitable software engineering paradigms to realise large-scale cybernetic systems. In particular, there was no straightforward way to represent a system with anything but the most trivial feedback loops in software.
A trading strategy – represented as a directed graph – contains numerous such feedback loops, as do the graphs representing most metabolic pathways.
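In graph terms, a feedback loop is a cycle: a path along the directed edges that returns to the node where it started. A minimal sketch of finding one such loop by depth-first search follows; the node names ("market_data", "forecast", "orders") are hypothetical and stand in for whatever components a real strategy has.

```python
def find_cycle(graph):
    """Return one cycle as a list of nodes (first node repeated at the end),
    or None if the directed graph has no cycle. Uses depth-first search."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {node: WHITE for node in graph}
    path = []

    def visit(node):
        colour[node] = GREY
        path.append(node)
        for successor in graph.get(node, ()):
            state = colour.get(successor, WHITE)
            if state == GREY:
                # The successor is already on the current path: a loop is closed.
                return path[path.index(successor):] + [successor]
            if state == WHITE:
                cycle = visit(successor)
                if cycle:
                    return cycle
        path.pop()
        colour[node] = BLACK
        return None

    for node in graph:
        if colour[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Hypothetical strategy: our own orders move the market that we are forecasting.
strategy = {
    "market_data": ["forecast"],
    "forecast": ["orders"],
    "orders": ["market_data"],
}
# The same pipeline with the feedback edge removed.
pipeline = {"market_data": ["forecast"], "forecast": ["orders"], "orders": []}

print(find_cycle(strategy))  # ['market_data', 'forecast', 'orders', 'market_data']
print(find_cycle(pipeline))  # None
```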
Neocybernetics would be attained by turning cybernetics into a technology through realising it as user-friendly processes, algorithms, software libraries and end-user products. We would need to harness the computing power offered by modern high-performance computing (HPC) technology, including cloud computing and potentially, going forward, quantum computing. We could complement stochastic analysis with the simpler mathematical language of deep learning and deep reinforcement learning, which rely on simpler probabilistic ideas to express uncertainty. We can use novel software engineering methodologies, such as the modified Functional Reactive Programming (FRP) incorporating transactions and making a clear expression of feedbacks possible.
While FRP was unavailable in Wiener’s time, he understood the importance of message-driven systems. In The Human Use of Human Beings (thank you, Michael Sarni, for giving me a copy of this book!), he wrote: “Messages are themselves a form of pattern and organization. Indeed, it is possible to treat sets of messages as having an entropy like sets of states of the external world. Just as entropy is a measure of disorganization, the information carried by a set of messages is a measure of organization. In fact, it is possible to interpret the information carried by a message as essentially the negative of its entropy, and the negative logarithm of its probability. That is, the more probable the message, the less information it gives. Clichés, for example, are less illuminating than great poems.”
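Wiener's "negative logarithm of its probability" can be made concrete. The sketch below uses base-2 logarithms, so information is measured in bits; the base is a convention assumed here, and the probabilities attached to the "cliché" and the "poem" are invented for the illustration.

```python
import math


def information(probability):
    """Information carried by a message, in bits: -log2(p)."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must lie in (0, 1]")
    return -math.log2(probability)


def entropy(probabilities):
    """Shannon entropy in bits: the expected information of a set of messages."""
    return sum(p * information(p) for p in probabilities if p > 0.0)


cliche = information(0.5)     # a message we half expected
poem = information(1 / 1024)  # a message we almost never see

print(cliche, poem)           # 1.0 10.0
print(entropy([0.5, 0.5]))    # 1.0, a fair coin
```

The more probable the message, the fewer bits it carries; a message that is certain carries none.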
We now have the new mathematics that makes neocybernetics accessible; programming languages, such as Python, that simplify the process of data science; numerous libraries for dealing with time series data, such as NumPy, SciPy, Scikit-Learn, Matplotlib, and Pandas; FRP libraries, such as ReactiveX and Sodium; special-purpose databases, such as kdb+/q, suitable for capturing, storing and processing vast amounts of data in real time; and TensorFlow and Keras, with which more or less any data scientist can calibrate a fairly sophisticated neural net.
Kolmogorov and Wiener both recognised that cybernetics would lead to a different view of human beings and a different appreciation of human life – a new anthropology. Something that Master Yoda summarised as “Luminous beings are we, not this crude matter”. Wiener stated, in The Human Use of Human Beings, that the goal of cybernetics is the age-old struggle of humanity against entropy: “In control and communication we are always fighting nature’s tendency to degrade the organized and to destroy the meaningful; the tendency, as Gibbs has shown us, for entropy to increase.” It turns out that alpha-generating traders are best positioned to help out in this quest.