### Overview

**Master Machine Learning (ML) / Artificial Intelligence (AI) in only three working days and one weekend at Oxford’s most beautiful college!**

Following an introduction to data science and the underlying mathematics, you will go up to Oxford to study the theory and practise machine learning on real financial examples. You will review, learn, and master:

- probability and statistics
- linear regression methods
- dimensionality reduction
- unsupervised machine learning
- bias-variance tradeoff
- model and feature selection
- classification
- neural nets
- deep learning
- recurrent neural networks, including LSTM
- reinforcement learning
- current frontiers in artificial intelligence (AI) and machine learning (ML)

You will also network with other data scientists, fintech industry leaders, and Oxford academics.

### Level and prerequisites

Among our delegates we have industry experts with multiple decades of experience and young Kaggle champions.

Most of our delegates have a degree in mathematics, computer science, physics, engineering, economics and/or finance. Some have PhD degrees. Some have extensive experience in ML/AI, whereas others are retraining from another quantitative discipline, such as finance or engineering.

The main mathematical prerequisites are:

- linear algebra
- probability theory (basic, not measure-theoretic)
- optimisation theory

However, if you are rusty on any or all of these, we review them on the first day of the training. Some of our delegates left academia a while ago, and we appreciate that you may need a refresher.

We expect you to be able to program in Python. If you are not familiar with Python, we recommend that you attend our separate, one-day introductory course: Python for Data Science and Artificial Intelligence.

**The course is intensive. We pride ourselves on our ability to take you from foundations to current state of research and practice in ML and AI in just five days.**

### Instructors

- Dr. Paul Bilokon
- Prof. Matthew Dixon
- Ed Silantyev

### Venue

Christ Church (*Ædes Christi*), a college of the University of Oxford.

### Accommodation

Accommodation at the historic Christ Church college is **included in the price**. You will join a **distinguished company of scholars** who lived in these very rooms: Lewis Carroll, Albert Einstein, William Ewart Gladstone, Robert Hooke, John Locke, Sir Robert Peel, and many others.

### Registration

### Schedule

| Time | Day 1 | Day 2 | Day 3 | Day 4 | Day 5 |
| --- | --- | --- | --- | --- | --- |
| 08:30 – 09:00 | Registration and welcome | Registration and welcome | Registration and welcome | Registration and welcome | Registration and welcome |
| 09:00 – 10:00 | Lecture 1: Introduction to data science | Lecture 1: Statistical inference and estimation theory | Lecture 1: From statistics to supervised Machine Learning | Lecture 1: Deep Learning | Lecture 1: Reinforcement Learning |
| 10:00 – 10:30 | Tutorial 1 | Tutorial 1 | Tutorial 1 | Tutorial 1 | Tutorial 1 |
| 10:30 – 11:00 | Coffee break | Coffee break | Coffee break | Coffee break | Coffee break |
| 11:00 – 12:00 | Lecture 2: Probability theory | Lecture 2: Linear regression | Lecture 2: Model and feature selection | Lecture 2: Recurrent Neural Networks | Lecture 2: Inverse Reinforcement Learning |
| 12:00 – 12:30 | Tutorial 2 | Tutorial 2 | Tutorial 2 | Tutorial 2 | Tutorial 2 |
| 12:30 – 13:30 | Lunch | Lunch | Lunch | Lunch | Lunch |
| 13:30 – 14:30 | Lecture 3: Linear algebra | Lecture 3: Debugging linear regression | Lecture 3: Classification methods | Lecture 3: Applications of Neural Networks in finance | Lecture 3: Deep Reinforcement Learning |
| 14:30 – 15:00 | Tutorial 3 | Tutorial 3 | Tutorial 3 | Tutorial 3 | Tutorial 3 |
| 15:00 – 15:30 | Coffee break | Coffee break | Coffee break | Coffee break | Coffee break |
| 15:30 – 16:30 | Lecture 4: Optimisation theory | Lecture 4: Principal Components Analysis (PCA) and dimensionality reduction | Lecture 4: Unsupervised Machine Learning | Lecture 4: Prediction from financial time series | Lecture 4: Particle filtering |
| 16:30 – 17:00 | Tutorial 4 | Tutorial 4 | Tutorial 4 | Tutorial 4 | Tutorial 4 |
| 17:00 – 18:00 | Lab | Lab | Lab | Lab | Graduation and leaving drinks at the Battery |
| 18:00 – 19:30 | Tour of Christ Church | Tour of Oxford City Centre | Visit to Oxford’s historic pub The Eagle & Child, home of The Inklings | | |
| 19:30 – 21:00 | Dinner at the Dining Hall | Dinner at the Dining Hall | Dinner at the Dining Hall | Dinner at the Dining Hall | |

### Syllabus

**Introduction to data science**

- Data, information, knowledge, understanding, wisdom
- Analysis and synthesis
- Data analysis and data science
- The process of data science
- Artificial Intelligence and Machine Learning
- The language of Machine Learning
- Machine Learning and statistics

**Probability theory**

- Random experiment and the sample space
- The classical interpretation of probability
- The frequentist interpretation of probability
- Bayesian interpretation of probability
- The axiomatic interpretation of probability
- Kolmogorov’s axiomatisation
- Conditional probability
- The Law of Total probability
- Bayes’s theorem
- Random variables
- Expectations
- Variances
- Covariances and correlations
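
The probability rules above can be made concrete with a short calculation. The sketch below applies the Law of Total Probability and Bayes’s theorem to a hypothetical diagnostic test; all of the numbers are illustrative assumptions, not data from the course.

```python
# Bayes' theorem: posterior probability of a condition given a positive test.
# All probabilities below are hypothetical, chosen for illustration.
p_disease = 0.01          # prior P(D)
p_pos_given_d = 0.99      # sensitivity P(+ | D)
p_pos_given_not_d = 0.05  # false positive rate P(+ | not D)

# Law of Total Probability: P(+) = P(+|D) P(D) + P(+|not D) P(not D)
p_pos = p_pos_given_d * p_disease + p_pos_given_not_d * (1 - p_disease)

# Bayes' theorem: P(D|+) = P(+|D) P(D) / P(+)
p_d_given_pos = p_pos_given_d * p_disease / p_pos
print(round(p_d_given_pos, 4))  # 0.1667: a positive test is still far from conclusive
```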

**Linear algebra**

- Vectors and matrices
- Matrix multiplication
- Inverse matrices
- Independence, basis, and dimension
- The four fundamental spaces
- Orthogonal vectors
- Eigenvalues and eigenvectors
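
As a taste of the linear algebra used throughout the course, the sketch below computes the eigenvalues and eigenvectors of a small symmetric matrix with NumPy and checks the defining property Av = λv; the matrix itself is purely illustrative.

```python
import numpy as np

# Eigen-decomposition of a symmetric matrix: A v = lambda v.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigenvalues, eigenvectors = np.linalg.eigh(A)  # eigh is for symmetric matrices

# Check the defining property for the largest eigenvalue/eigenvector pair.
v = eigenvectors[:, -1]
assert np.allclose(A @ v, eigenvalues[-1] * v)
print(eigenvalues)  # [1. 3.]
```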

**Optimisation theory**

- The optimisation problem
- Optimisation in one dimension
- Optimisation in multiple dimensions
- Grid search
- Gradient-based optimisation
- Vector calculus
- Quasi-Newton methods
- Gradient descent (stochastic, batch)
- Evolutionary optimisation
- Optimisation in practice
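
Gradient-based optimisation can be sketched in a few lines. The example below runs batch gradient descent on a one-dimensional convex function; the function and learning rate are chosen purely for illustration.

```python
# Gradient descent on f(x) = (x - 3)^2, with gradient f'(x) = 2 (x - 3).
# The minimum is at x = 3; the learning rate is an illustrative choice.
x = 0.0
learning_rate = 0.1
for _ in range(200):
    grad = 2.0 * (x - 3.0)
    x -= learning_rate * grad

print(round(x, 6))  # converges to 3.0
```

In practice the same update rule is applied to the gradient of a loss function over many parameters, in stochastic or mini-batch form, as covered in the Deep Learning lectures.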

**Statistical inference and estimation theory**

- Point estimation
- Maximum likelihood estimation
- Loss functions
- Bias-variance tradeoff (dilemma)
- Standard error
- Fisher information
- Cramér-Rao lower bound (CRLB)
- Consistency
- Hypothesis testing
- p-values
- Bayesian estimation

**Linear regression**

- Linear regression model in matrix form
- Disturbance versus residual
- The Ordinary Least Squares (OLS) approach
- Relationship with maximum likelihood estimation
- The geometry of linear regression
- The orthogonal projection with/without the intercept
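
The OLS estimator in matrix form follows directly from the normal equations, which the sketch below solves on simulated data; the true coefficients and noise level are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: y = 2 + 3x + noise (coefficients chosen for illustration).
n = 200
x = rng.uniform(0, 1, n)
y = 2.0 + 3.0 * x + 0.1 * rng.standard_normal(n)

# Design matrix with an intercept column; normal equations:
# beta_hat = (X'X)^{-1} X'y
X = np.column_stack([np.ones(n), x])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

print(beta_hat)  # approximately [2, 3]
```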

**Debugging linear regression**

- Variance partitioning
- Coefficient of determination (R²)
- An estimation theory view of linear regression
- How many data points do we need?
- Chi-squared distribution
- Cochran’s theorem
- Degrees of freedom
- Student’s t-distribution test
- Adjusted coefficient of determination
- The F-statistic
- The p-value

**Principal Components Analysis (PCA) and dimensionality reduction**

- Modelling data as a random variable
- The covariance matrix
- Key properties of covariance matrices
- The sample covariance matrix
- The sample covariance matrix is unbiased
- The correlation matrix
- The sample correlation matrix
- Centered data
- The application of PCA
- The interpretation of PCA
- The advantages of PCA
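
PCA can be sketched as an eigen-decomposition of the sample covariance matrix. The example below centres an illustrative simulated data cloud, computes its covariance matrix, and ranks the components by explained variance.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative data cloud with most variance along the first axis.
n = 500
z = rng.standard_normal((n, 2)) * np.array([3.0, 0.5])
data = z - z.mean(axis=0)                 # centre the data

cov = np.cov(data, rowvar=False)          # sample covariance matrix
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Sort components by explained variance, largest first.
order = np.argsort(eigenvalues)[::-1]
explained = eigenvalues[order] / eigenvalues.sum()
print(explained)  # the first component explains most of the variance
```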

**From statistics to supervised Machine Learning**

- The supervised machine learning problem
- Train/test error and overfitting
- Underfitting
- The bias-variance tradeoff revisited
- Multicollinearity
- Polynomial regression

**Model and feature selection**

- Model selection and averaging, drop-out
- Cross-validation: leave-N-out, K-fold
- Cross-validation for time series
- Sliding window for time series
- Bootstrap
- Ridge regression, L2 regularisation
- Derivation and statistical properties of the ridge estimator
- Examples of ridge regression
- LASSO regression, L1 regularisation
- Examples of LASSO regression
- Applications to market impact models
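
Ridge regression has a convenient closed form, which the sketch below implements on simulated data; the design matrix, true coefficients, and penalty λ = 10 are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Ridge regression (L2 regularisation) in closed form:
# beta_ridge = (X'X + lambda I)^{-1} X'y
n, p = 100, 5
X = rng.standard_normal((n, p))
beta_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])  # illustrative coefficients
y = X @ beta_true + 0.5 * rng.standard_normal(n)

lam = 10.0
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# The penalty shrinks the coefficient vector towards zero.
print(np.linalg.norm(beta_ridge) < np.linalg.norm(beta_ols))  # True
```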

**Classification methods**

- The classification problem
- Generalised Linear Models (GLM)
- Logistic regression as an example of a GLM
- Odds
- Evaluation of a classification model: confusion matrix
- Evaluation of a classification model: ROC chart
- Decision tree models
- Random forests
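
Evaluating a classification model with a confusion matrix reduces to four counts. The sketch below computes them, together with accuracy, precision, and recall, for a hypothetical set of predictions.

```python
import numpy as np

# Confusion matrix for a binary classifier (labels are hypothetical).
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

tp = int(np.sum((y_true == 1) & (y_pred == 1)))  # true positives
tn = int(np.sum((y_true == 0) & (y_pred == 0)))  # true negatives
fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # false positives
fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # false negatives

accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(tp, tn, fp, fn)                # 3 3 1 1
print(accuracy, precision, recall)   # 0.75 0.75 0.75
```

Sweeping the classifier's decision threshold and plotting recall against the false positive rate yields the ROC chart covered in the lecture.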

**Unsupervised Machine Learning**

- K-means clustering
- Enhancements to K-means, K++
- Automatic specification of number of clusters
- Partitioning Around Medoids (PAM)
- Hierarchical clustering
- Agglomerative hierarchical clustering
- Divisive hierarchical clustering
- Linkage methods
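
K-means (Lloyd's algorithm) alternates an assignment step and an update step. The sketch below implements it from scratch on two well-separated simulated clusters; the data and iteration count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

# Two well-separated clusters of illustrative 2-D data.
cluster_a = rng.standard_normal((50, 2))
cluster_b = rng.standard_normal((50, 2)) + np.array([10.0, 10.0])
data = np.vstack([cluster_a, cluster_b])

k = 2
centroids = data[rng.choice(len(data), size=k, replace=False)]
for _ in range(20):
    # Assignment step: each point joins its nearest centroid.
    distances = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
    labels = distances.argmin(axis=1)
    # Update step: each centroid moves to the mean of its cluster.
    centroids = np.array([data[labels == j].mean(axis=0) for j in range(k)])

print(np.sort(centroids[:, 0]).round(1))  # centroids near x = 0 and x = 10
```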

**Deep Learning**

- The perceptron
- Feed-forward Neural Networks
- Other networks
- Convergence results
- Stochastic gradient descent
- Variants of the stochastic gradient descent
- Weight decay scheduling
- Weight initialisation
- A geometrical approach to model interpretability
- A statistical approach to model interpretability

**Recurrent Neural Networks**

- Autoregressive Neural Networks
- Gated Recurrent Networks
- Long Short-Term Memory (LSTM) Networks
- Semi-parametric Neural Networks for panel data
- Hybrid GARCH-ML models

**Applications of Neural Networks in finance**

- Machine Learning in algorithmic finance
- Momentum strategies
- Predicting portfolio returns
- Data preparation
- Price impact models
- Limit Order Book updates
- Adverse selection
- Predictive performance comparisons

**Prediction from financial time series**

- Time series data
- Classical time series analysis
- Autoregressive and moving average processes
- Stationarity
- Parametric tests
- In-sample diagnostics
- Time series cross-validation
- Predicting events
- Entropy
- Confusion matrix
- ROC chart
- Performance terminology
- Practical prediction issues
- Confusion matrices with oversampling
- ROC curves with oversampling
- Kernel regression
- Kernel estimators

**Reinforcement Learning**

- Stochastic dynamic programming
- Markov Decision Process (MDP)
- Q-learning
- DeepMind: target networks
- Replay memory
- Exploration versus exploitation
- The market making example
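
Tabular Q-learning can be sketched on a toy Markov Decision Process. The example below learns to walk right along a four-state chain using an epsilon-greedy policy; the environment is a minimal illustration, not the market-making example from the lecture.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy MDP: states 0..3, actions 0 = left, 1 = right.
# Reaching state 3 yields reward 1 and ends the episode.
n_states, n_actions = 4, 2
q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.5, 0.9, 0.2  # illustrative hyperparameters

def step(state, action):
    next_state = max(0, state - 1) if action == 0 else min(3, state + 1)
    reward = 1.0 if next_state == 3 else 0.0
    return next_state, reward, next_state == 3

for _ in range(500):
    state, done = 0, False
    while not done:
        # Epsilon-greedy: explore with probability epsilon, else act greedily.
        action = rng.integers(n_actions) if rng.random() < epsilon else q[state].argmax()
        next_state, reward, done = step(state, action)
        # Q-learning update: Q(s,a) += alpha (r + gamma max_a' Q(s',a') - Q(s,a))
        q[state, action] += alpha * (reward + gamma * q[next_state].max() - q[state, action])
        state = next_state

print(q.argmax(axis=1)[:3])  # learned policy: always move right
```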

**Inverse Reinforcement Learning**

- Reinforcement Learning and Inverse Reinforcement Learning: differences and similarities
- Inverse Reinforcement Learning and Imitation Learning
- Constraints-based IRL
- Maximum entropy IRL
- IRL for linear-quadratic-Gaussian regulation
- Examples of IRL problems in finance

**Deep Reinforcement Learning**

- Combination of Reinforcement Learning techniques with Supervised Learning approaches of Deep Neural Networks
- Combination of Reinforcement Learning techniques with unsupervised learning approaches of Deep Neural Networks
- Deep Reinforcement Learning techniques for partially observable Markov decision processes
- State of the art in Deep Reinforcement Learning

**Particle filtering**

- State-space models
- Particle filtering methods
- Applying the particle filter to stochastic volatility model with leverage and jumps
- The Kalman filter
- Some examples of linear-Gaussian state-space models: the Newtonian system, the autoregressive moving average models, continuous-time stochastic processes (the Wiener process, geometric Brownian motion (GBM), the Ornstein-Uhlenbeck process)
- The extended Kalman filter
- An example application of the extended Kalman filter: modelling credit spread
- Outlier detection in (extended) Kalman filtering
- Gaussian assumed density filtering
- Parameter estimation
- Relationship with Markov chain Monte Carlo methods
- Prediction
- Diagnostics
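
The Kalman filter in its simplest scalar form, estimating a constant hidden state from noisy measurements, can be sketched as follows; the state value, noise levels, and prior are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)

# Scalar Kalman filter for a minimal linear-Gaussian state-space model:
# the hidden state is constant, observed through additive Gaussian noise.
true_state = 4.0
measurement_noise = 1.0   # R
process_noise = 0.0       # Q: the state does not move

estimate, variance = 0.0, 100.0   # diffuse prior
for _ in range(50):
    measurement = true_state + rng.standard_normal() * np.sqrt(measurement_noise)
    # Predict step: only the variance grows, by Q (here zero).
    variance += process_noise
    # Update step: blend prediction and measurement via the Kalman gain.
    kalman_gain = variance / (variance + measurement_noise)
    estimate += kalman_gain * (measurement - estimate)
    variance *= (1.0 - kalman_gain)

print(round(estimate, 2))  # close to the true state, 4.0
```

With each measurement the posterior variance shrinks, so the filter weights new observations less and less, exactly the behaviour the multivariate filter exhibits in the lectures' state-space examples.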

### Datasets

The course examines several real-life datasets, including:

- S&P 500 stock data
- High-frequency commodity future price data
- Cryptocurrency order book data
- FINRA TRACE corporate bond data
- Car insurance claim data
- The National Institute of Diabetes and Digestive and Kidney Diseases data about the Pima group diabetes tendency

### Bibliography

Your course is designed to be self-contained. However, should you wish to read up on Artificial Intelligence / Machine Learning before starting the course, we recommend:

- Michael R. Berthold (ed.), David Hand (ed.). *Intelligent Data Analysis: An Introduction*, second edition. Springer, 2006.
- Trevor Hastie, Robert Tibshirani, Jerome Friedman. *The Elements of Statistical Learning: Data Mining, Inference, and Prediction*, second edition. Springer, 2009.
- Ian Goodfellow, Yoshua Bengio, Aaron Courville. *Deep Learning*. MIT Press, 2017.

If you would like to read up on the mathematical foundations of your course (covered on the first day), we recommend:

- Murray R. Spiegel, John Schiller, R. Alu Srinivasan. *Schaum’s Outlines: Probability and Statistics*, second edition. McGraw-Hill, 2000.
- John B. Fraleigh, Raymond A. Beauregard. *Linear Algebra*, third edition. Addison Wesley, 1995.
- Gerard Cornuejols, Reha Tütüncü. *Optimization Methods in Finance*. Cambridge University Press, 2007.
- Philip E. Gill, Walter Murray, Margaret H. Wright. *Practical Optimization*. Emerald Group Publishing Limited, 1982.

We also recommend the following video lectures:

- Gilbert Strang. Linear Algebra, course 18.06. MIT, Fall of 1999: https://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/video-lectures/