The AI @ Oxford School

The graduates of the AI @ Oxford Summer School 2018 on the steps of the Hall of Christ Church, Oxford


Master Machine Learning (ML) / Artificial Intelligence (AI) in only three working days and one weekend at Oxford’s most beautiful college!

Some people talk about ML, AI, and data science. We actually do it.

Our lecturers hold prestigious academic positions at top universities. All of them have extensive practical experience in the financial services industry: running electronic trading desks and pioneering ML/AI in this highly lucrative and secretive area.

Our condensed, intensive, interactive five-day course will take you from a brief overview of the mathematical foundations to the current state of research and industry practice in ML/AI.

In addition to covering the depths of the theory, we work through real financial, medical, and engineering datasets, showing how highly nontrivial and non-obvious information can be extracted from them. In finance, this leads to superior alpha and better execution. In medicine, this leads to saving lives and money.

Following an introduction to data science and the underlying mathematics, you will go up to Oxford to study the theory and practise machine learning on real financial examples. You will review, learn, and master:

  • probability and statistics
  • linear regression methods
  • dimensionality reduction
  • unsupervised machine learning
  • bias-variance tradeoff
  • model and feature selection
  • model tuning
  • classification
  • neural nets
  • deep learning
  • recurrent neural networks, including LSTM
  • reinforcement learning
  • deep reinforcement learning
  • inverse reinforcement learning
  • generative adversarial networks (GANs)
  • current frontiers in artificial intelligence (AI) and machine learning (ML)

You will also network with other data scientists, fintech industry leaders, and Oxford academics.

Level and prerequisites

Among our delegates we have industry experts with multiple decades of experience and young Kaggle champions.

Most of our delegates have a degree in mathematics, computer science, physics, engineering, economics and/or finance. Some have PhD degrees. Some have extensive experience in ML/AI, whereas others are retraining from another quantitative discipline, such as finance or engineering.

The main mathematical prerequisites are

  • linear algebra
  • probability theory (basic, not measure-theoretic)
  • optimisation theory

However, if you are rusty on any or all of these, we review them on the first day of the training. Some of our delegates left academia a while ago, and we appreciate that you may need a refresher.
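For a flavour of the level involved, here is a minimal sketch touching two of the prerequisites: probability (Bayes's theorem with the law of total probability) and optimisation (gradient descent in one dimension). It is an illustration only, not course material; the function names `bayes_posterior` and `gradient_descent` and all the numbers are assumptions made up for this example.

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """P(H | E) via Bayes's theorem and the law of total probability."""
    # P(E) = P(E | H) P(H) + P(E | not H) P(not H)
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence


def gradient_descent(grad, x0, learning_rate=0.1, steps=200):
    """Minimise a differentiable function of one variable given its gradient."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad(x)  # step against the gradient
    return x


# Hypothetical numbers: 1% base rate, 95% sensitivity, 5% false positives
posterior = bayes_posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)

# Minimise f(x) = (x - 3)^2, whose gradient is 2 (x - 3); the minimiser is x = 3
x_min = gradient_descent(lambda x: 2.0 * (x - 3.0), x0=0.0)

print(f"posterior = {posterior:.4f}, x_min = {x_min:.6f}")
```

If both functions read naturally to you, the first-day review should feel like a refresher rather than new material.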

We expect you to be able to program in Python. If you are not familiar with Python, we recommend that you attend our separate, one-day introductory course: Python for Data Science and Artificial Intelligence.

The course is intensive. We pride ourselves on our ability to take you from foundations to the current state of research and practice in ML and AI in just five days.


  • Dr. Paul Bilokon
  • Prof. Matthew Dixon


The Christ Church (Ædes Christi) college of the University of Oxford.


Accommodation at the historic Christ Church college is included in the price. You will join a distinguished company of scholars who lived in these very rooms: Lewis Carroll, Albert Einstein, William Ewart Gladstone, Robert Hooke, John Locke, Sir Robert Peel, and many others.


And, in case you need a refresher on Python before the course, you can separately attend our Python refresher on Level39:


| Time | Day 1 | Day 2 | Day 3 | Day 4 | Day 5 |
| 08:30 – 09:00 | Registration and welcome | Registration and welcome | Registration and welcome | Registration and welcome | Registration and welcome |
| 09:00 – 10:00 | Lecture 1: Introduction to data science | Lecture 1: Statistical inference and estimation theory | Lecture 1: From statistics to supervised Machine Learning | Lecture 1: Deep Learning | Lecture 1: Reinforcement Learning |
| 10:00 – 10:30 | Tutorial 1 | Tutorial 1 | Tutorial 1 | Tutorial 1 | Tutorial 1 |
| 10:30 – 11:00 | Coffee break | Coffee break | Coffee break | Coffee break | Coffee break |
| 11:00 – 12:00 | Lecture 2: Probability theory | Lecture 2: Linear regression | Lecture 2: Model and feature selection | Lecture 2: Recurrent Neural Networks | Lecture 2: Inverse Reinforcement Learning |
| 12:00 – 12:30 | Tutorial 2 | Tutorial 2 | Tutorial 2 | Tutorial 2 | Tutorial 2 |
| 12:30 – 13:30 | Lunch | Lunch | Lunch | Lunch | Lunch |
| 13:30 – 14:30 | Lecture 3: Linear algebra | Lecture 3: Debugging linear regression | Lecture 3: Classification methods | Lecture 3: Applications of Neural Networks in finance | Lecture 3: Deep Reinforcement Learning |
| 14:30 – 15:00 | Tutorial 3 | Tutorial 3 | Tutorial 3 | Tutorial 3 | Tutorial 3 |
| 15:00 – 15:30 | Coffee break | Coffee break | Coffee break | Coffee break | Coffee break |
| 15:30 – 16:30 | Lecture 4: Optimisation theory | Lecture 4: Principal Components Analysis (PCA) and dimensionality reduction | Lecture 4: Unsupervised Machine Learning | Lecture 4: Prediction from financial time series | Lecture 4: Particle filtering |
| 16:30 – 17:00 | Tutorial 4 | Tutorial 4 | Tutorial 4 | Tutorial 4 | Tutorial 4 |
| 17:00 – 18:00 | Lab | Lab | Lab | Lab | Graduation and leaving drinks at the Battery |
| 18:00 – 19:30 | Tour of Christ Church | Tour of Oxford City Centre | | Visit to Oxford’s historic pub The Eagle & Child, home of The Inklings | |
| 19:30 – 21:00 | Dinner at the Dining Hall | Dinner at the Dining Hall | Dinner at the Dining Hall | Dinner at the Dining Hall | |


  • Introduction to data science
    • Data, information, knowledge, understanding, wisdom
    • Analysis and synthesis
    • Data analysis and data science
    • The process of data science
    • Artificial Intelligence and Machine Learning
    • The language of Machine Learning
    • Machine Learning and statistics
  • Probability theory
    • Random experiment and the sample space
    • The classical interpretation of probability
    • The frequentist interpretation of probability
    • The Bayesian interpretation of probability
    • The axiomatic interpretation of probability
    • Kolmogorov’s axiomatisation
    • Conditional probability
    • The law of total probability
    • Bayes’s theorem
    • Random variables
    • Expectations
    • Variances
    • Covariances and correlations
  • Linear algebra
    • Vectors and matrices
    • Matrix multiplication
    • Inverse matrices
    • Independence, basis, and dimension
    • The four fundamental spaces
    • Orthogonal vectors
    • Eigenvalues and eigenvectors
  • Optimisation theory
    • The optimisation problem
    • Optimisation in one dimension
    • Optimisation in multiple dimensions
    • Grid search
    • Gradient-based optimisation
    • Vector calculus
    • Quasi-Newton methods
    • Gradient descent (stochastic, batch)
    • Evolutionary optimisation
    • Optimisation in practice
  • Statistical inference and estimation theory
    • Point estimation
    • Maximum likelihood estimation
    • Loss functions
    • Bias-variance tradeoff (dilemma)
    • Standard error
    • Fisher information
    • Cramér-Rao lower bound (CRLB)
    • Consistency
    • Hypothesis testing
    • p-values
    • Bayesian estimation
  • Linear regression
    • Linear regression model in matrix form
    • Disturbance versus residual
    • The Ordinary Least Squares (OLS) approach
    • Relationship with maximum likelihood estimation
    • The geometry of linear regression
    • The orthogonal projection with/without the intercept
  • Debugging linear regression
    • Variance partitioning
    • Coefficient of determination (R²)
    • An estimation theory view of linear regression
    • How many data points do we need?
    • Chi-squared distribution
    • Cochran’s theorem
    • Degrees of freedom
    • Student’s t-distribution test
    • Adjusted coefficient of determination
    • The F-statistic
    • The p-value
  • Principal Components Analysis (PCA) and dimensionality reduction
    • Modelling data as a random variable
    • The covariance matrix
    • Key properties of covariance matrices
    • The sample covariance matrix
    • The sample covariance matrix is unbiased
    • The correlation matrix
    • The sample correlation matrix
    • Centered data
    • The application of PCA
    • The interpretation of PCA
    • The advantages of PCA
  • From statistics to supervised Machine Learning
    • The supervised machine learning problem
    • Train/test error and overfitting
    • Underfitting
    • The bias-variance tradeoff revisited
    • Multicollinearity
    • Polynomial regression
  • Model and feature selection
    • Model selection and averaging, drop-out
    • Cross-validation: leave-N-out, K-fold
    • Cross-validation for time series
    • Sliding window for time series
    • Bootstrap
    • Ridge regression, L2 regularisation
    • Derivation and statistical properties of the ridge estimator
    • Examples of ridge regression
    • LASSO regression, L1 regularisation
    • Examples of LASSO regression
    • Applications to market impact models
  • Classification methods
    • The classification problem
    • Generalised Linear Models (GLM)
    • Logistic regression as an example of a GLM
    • Odds
    • Evaluation of a classification model: confusion matrix
    • Evaluation of a classification model: ROC chart
    • Decision tree models
    • Random forests
  • Unsupervised Machine Learning
    • K-means clustering
    • Enhancements to K-means, K-means++
    • Automatic specification of number of clusters
    • Partitioning Around Medoids (PAM)
    • Hierarchical clustering
    • Agglomerative hierarchical clustering
    • Divisive hierarchical clustering
    • Linkage methods
  • Deep Learning
    • The perceptron
    • Feed-forward Neural Networks
    • Other networks
    • Convergence results
    • Stochastic gradient descent
    • Variants of the stochastic gradient descent
    • Weight decay scheduling
    • Weight initialisation
    • A geometrical approach to model interpretability
    • A statistical approach to model interpretability
  • Recurrent Neural Networks
    • Autoregressive Neural Networks
    • Gated Recurrent Networks
    • Long Short-Term Memory (LSTM) Networks
    • Semi-parametric Neural Networks for panel data
    • Hybrid GARCH-ML models
  • Applications of Neural Networks in finance
    • Machine Learning in algorithmic finance
    • Momentum strategies
    • Predicting portfolio returns
    • Data preparation
    • Price impact models
    • Limit Order Book updates
    • Adverse selection
    • Predictive performance comparisons
  • Prediction from financial time series
    • Time series data
    • Classical time series analysis
    • Autoregressive and moving average processes
    • Stationarity
    • Parametric tests
    • In-sample diagnostics
    • Time series cross-validation
    • Predicting events
    • Entropy
    • Confusion matrix
    • ROC chart
    • Performance terminology
    • Practical prediction issues
    • Confusion matrices with oversampling
    • ROC curves with oversampling
    • Kernel regression
    • Kernel estimators
  • Reinforcement Learning
    • Stochastic dynamic programming
    • Markov Decision Process (MDP)
    • Q-learning
    • DeepMind: target networks
    • Replay memory
    • Exploration versus exploitation
    • The market making example
  • Inverse Reinforcement Learning
    • Reinforcement Learning and Inverse Reinforcement Learning: differences and similarities
    • Inverse Reinforcement Learning and Imitation Learning
    • Constraints-based IRL
    • Maximum entropy IRL
    • IRL for linear-quadratic-Gaussian regulation
    • Examples of IRL problems in finance
  • Deep Reinforcement Learning
    • Combination of Reinforcement Learning techniques with Supervised Learning approaches of Deep Neural Networks
    • Combination of Reinforcement Learning techniques with unsupervised learning approaches of Deep Neural Networks
    • Deep Reinforcement Learning techniques for partially observable Markov decision processes
    • State of the art in Deep Reinforcement Learning
  • Particle filtering
    • State-space models
    • Particle filtering methods
    • Applying the particle filter to stochastic volatility model with leverage and jumps
    • The Kalman filter
    • Some examples of linear-Gaussian state-space models: the Newtonian system, the autoregressive moving average models, continuous-time stochastic processes (the Wiener process, geometric Brownian motion (GBM), the Ornstein-Uhlenbeck process)
    • The extended Kalman filter
    • An example application of the extended Kalman filter: modelling credit spread
    • Outlier detection in (extended) Kalman filtering
    • Gaussian assumed density filtering
    • Parameter estimation
    • Relationship with Markov chain Monte Carlo methods
    • Prediction
    • Diagnostics
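
To give a concrete feel for a few of the syllabus items above (Ordinary Least Squares, the coefficient of determination, and ridge regression with L2 regularisation), here is a minimal sketch for a single explanatory variable. It is not taken from the course materials: the helper names `fit_line` and `r_squared`, the toy data, and the penalty value are all assumptions chosen for illustration. The ridge penalty is applied to the slope only, with the intercept left unpenalised.

```python
from statistics import fmean


def fit_line(x, y, ridge_penalty=0.0):
    """Fit y ≈ intercept + slope * x by (penalised) least squares.

    With ridge_penalty = 0 this is OLS; a positive value adds an L2 penalty
    on the slope only, which shrinks the slope towards zero.
    """
    x_bar, y_bar = fmean(x), fmean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / (s_xx + ridge_penalty)
    intercept = y_bar - slope * x_bar
    return intercept, slope


def r_squared(x, y, intercept, slope):
    """Coefficient of determination: 1 - RSS / TSS."""
    y_bar = fmean(y)
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - rss / tss


# Toy data, made up for illustration
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

ols = fit_line(x, y)
ridge = fit_line(x, y, ridge_penalty=5.0)
print("OLS   intercept, slope:", ols, "R^2:", r_squared(x, y, *ols))
print("Ridge intercept, slope:", ridge, "R^2:", r_squared(x, y, *ridge))
```

On the training data the ridge fit always has an R² no higher than the OLS fit, since OLS minimises the residual sum of squares. The hope is that the shrunken estimate generalises better out of sample, which is the bias-variance tradeoff listed in the syllabus.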


The course examines several real-life datasets, including:

  • S&P 500 stock data
  • High-frequency commodity future price data
  • Cryptocurrency order book data
  • FINRA TRACE corporate bond data
  • Car insurance claim data
  • The National Institute of Diabetes and Digestive and Kidney Diseases data on the incidence of diabetes among the Pima people

Our delegates enter the celebrated Dining Hall at Christ Church, where, inter alios, Harry Potter used to dine


Your course is designed to be self-contained. However, should you wish to read up on Artificial Intelligence / Machine Learning before starting the course, we recommend

  • Michael R. Berthold (ed.), David Hand (ed.). Intelligent Data Analysis: An Introduction, second edition. Springer, 2006.
  • Trevor Hastie, Robert Tibshirani, Jerome Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, second edition. Springer, 2009.
  • Ian Goodfellow, Yoshua Bengio, Aaron Courville. Deep Learning. MIT Press, 2016.
  • Murray R. Spiegel, John Schiller, R. Alu Srinivasan. Schaum’s Outlines: Probability and Statistics, second edition. McGraw-Hill, 2000.
  • John B. Fraleigh, Raymond A. Beauregard. Linear Algebra, third edition. Addison Wesley, 1995.
  • Gerard Cornuejols, Reha Tütüncü. Optimization Methods in Finance. Cambridge University Press, 2007.
  • Philip E. Gill, Walter Murray, Margaret H. Wright. Practical Optimization. Emerald Group Publishing Limited, 1982.

If you would like to read up on the mathematical foundations of your course (covered on the first day), we recommend

We also recommend the following video lectures: