Seminar Series
Thursday 3-4pm (GMT) (unless otherwise specified)
Participants can join our mailing list to receive notifications about the seminar series.
The seminar runs in a hybrid format: participants on-site at the Oxford-Man Institute of Quantitative Finance are joined by off-site participants via Zoom.
The seminar series covers a wide range of topics at the intersection of statistics, machine learning and finance. Topics include network analysis, limit order books and analysis of order flows, time series forecasting, synthetic data generation, asset pricing, financial econometrics, market microstructure, news sentiment, portfolio management, high-frequency and high-dimensional statistics, and decentralised finance.
Talks (Michaelmas 2024)
Abstract
This paper focuses on the task of detecting local episodes involving violation of the standard Itô semimartingale assumption for financial asset prices in real time that might induce arbitrage opportunities. Our proposed detectors, defined as stopping rules, are applied sequentially to continually incoming high-frequency data. We show that they are asymptotically exponentially distributed in the absence of Itô semimartingale violations. On the other hand, when a violation occurs, we can achieve immediate detection under infill asymptotics. A Monte Carlo study demonstrates that the asymptotic results provide a good approximation to the finite-sample behavior of the sequential detectors. An empirical application to S&P 500 index futures data corroborates the effectiveness of our detectors in swiftly identifying the emergence of an extreme return persistence episode in real time.
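As a stylized companion to the abstract above, the sketch below applies a simple stopping rule to streaming returns: it flags the first time the sum of the last k returns, normalized by a trailing local volatility estimate, exceeds a threshold. The statistic, window sizes, and threshold are illustrative assumptions, not the paper's detectors or critical values.

```python
import numpy as np

def sequential_detector(returns, k=30, vol_window=300, threshold=4.0):
    """Stylized stopping rule: flag the first time the normalized sum of the last k
    returns, scaled by a trailing local volatility estimate, exceeds a threshold.
    An illustration only, not the paper's exact detector."""
    for n in range(vol_window + k, len(returns) + 1):
        local = returns[n - vol_window - k: n - k]      # trailing window for volatility
        sigma = np.sqrt(np.mean(local ** 2))            # simple local volatility proxy
        stat = abs(returns[n - k: n].sum()) / (np.sqrt(k) * sigma)
        if stat > threshold:
            return n, stat                              # stopping time and test statistic
    return None, None

rng = np.random.default_rng(0)
r = 0.001 * rng.standard_normal(5000)                   # 'no violation' regime
r[3000:3060] += 0.0008                                  # short episode of persistent drift
print(sequential_detector(r))
```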
Abstract
TBD
Abstract
TBD
Abstract
TBD
Talks (Hilary 2025)
Abstract
This study introduces a novel suite of historical large language models (LLMs) pre-trained specifically for accounting and finance, utilising a diverse set of major textual resources. The models are unique in that they are year-specific, spanning from 2007 to 2023, effectively eliminating look-ahead bias, a limitation present in other LLMs. Empirical analysis reveals that, in trading, these specialised models outperform much larger models, including the state-of-the-art LLaMA 1, 2, and 3, which are approximately 50 times their size. The findings are further validated through a range of robustness checks, confirming the superior performance of these LLMs.
Abstract
TBD
Abstract
This paper relates jumps in high frequency stock prices to firm-level, industry and macroeconomic news, in the form of machine-readable releases from Thomson Reuters News Analytics. We find that most relevant news, both idiosyncratic and systematic, leads quickly to price jumps, as market efficiency suggests it should. However, in the reverse direction, the vast majority of price jumps do not have identifiable public news that can explain them, in a departure from the ideal of a fair, orderly and efficient market. Microstructure-driven variables have only limited predictive power to help distinguish between jumps with and without news.
Abstract
TBD
Abstract
TBD
Previous Talks
Abstract
Based on options and realized returns, we analyze risk premia in the Bitcoin market through the lens of the Pricing Kernel (PK). We identify that (1) the projected PK into Bitcoin returns is W-shaped and steep in the negative returns region; (2) negative Bitcoin returns account for 33% of the total Bitcoin index premium (BP), in contrast to 70% of the S&P 500 equity premium explained by negative returns. Applying a novel clustering algorithm to the collection of estimated Bitcoin risk-neutral densities, we find that risk premia vary over time as a function of two distinct market volatility regimes. In the low-volatility regime, the PK projection is steeper for negative returns and has a more pronounced W-shape than the unconditional one, implying particularly high BP for both extreme positive and negative returns and a high Variance Risk Premium (VRP). In high-volatility states, the BP attributable to positive and negative returns is more balanced and the VRP is lower. Overall, Bitcoin investors are more worried about variance and downside risk in low-volatility states.
Abstract
We provide necessary and sufficient conditions for an (Unbiased) Block estimator to have Uniformly Minimum Variance. Our theory parallels the theory of UMVU estimation, the main novel insight being the focus on the covariance among blocks. We use this theory to derive lower variance bounds for block estimators of functionals of high-frequency volatility when the block size is fixed. We further show the relevance of the new theory for the classical problem of estimation of homoskedastic nonparametric regressions with varying mean. Finally, we introduce a new test for the presence of drift in financial data which exploits the precision of BUMVU estimators. The test shows abundant presence of drift in financial data.
Abstract
We develop a novel nonparametric estimator of integrated variance that utilizes intraday candlestick information, comprised of the high, low, open, and close prices within short time intervals. The proposed estimator is robust to short-lived extreme return persistence hardly attributable to the diffusion component, such as gradual jumps and flash crashes. By modelling such sharp but continuous price movements following some recent theoretical advances, we demonstrate that our new estimator can provide consistent estimates with variances about four times smaller than those obtained with the differenced-return volatility (DV) estimator. Monte Carlo simulations and empirical applications further validate the practical reliability of our proposed estimator with some finite-sample refinements.
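For readers unfamiliar with candlestick-based variance estimation, the snippet below computes the classical Garman-Klass estimator from open/high/low/close bars and sums it across intraday intervals. It is background for the abstract above only: the paper's new estimator and its robustness to gradual jumps are not reproduced, and the simulated bars are a toy assumption.

```python
import numpy as np

def garman_klass_variance(open_, high, low, close):
    """Classical Garman-Klass variance estimate per candlestick, summed over
    intraday intervals to approximate integrated variance (no jump robustness)."""
    hl = np.log(high / low)
    co = np.log(close / open_)
    per_bar = 0.5 * hl ** 2 - (2.0 * np.log(2.0) - 1.0) * co ** 2
    return per_bar.sum()

# toy example: 78 five-minute candles built from a simulated one-second price path
rng = np.random.default_rng(1)
p = 100 * np.exp(np.cumsum(0.0005 * rng.standard_normal(78 * 60)))
bars = p.reshape(78, 60)
iv_hat = garman_klass_variance(bars[:, 0], bars.max(axis=1), bars.min(axis=1), bars[:, -1])
print(f"estimated integrated variance: {iv_hat:.6f}")
```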
Abstract
I investigate how investors react to volatility shocks in the stock options market. In sharp contrast to the underreaction in the aggregate, investors overreact to less persistent idiosyncratic volatility shocks in the stock options market. Straddles written on stocks with large increases in volatility innovations underperform those with large decreases in volatility innovations by 5.30% per month. Consistent with the overreaction interpretation, higher idiosyncratic volatility shocks predict higher realized variance risk premiums. Moreover, the return predictability result is stronger for straddles written on stocks with an earnings announcement during the holding period or dominated by unsophisticated investors. In response to this overreaction-induced demand pressure, market makers charge higher premiums and bid-ask spreads as compensation for the increased market-making risk. I also rule out the hedging alternative.
Abstract
We generalize the parametric portfolio policy framework to portfolio weight functions of any complexity by using deep neural networks. More complex network-based portfolio policies increase investor utility and achieve between 75 and 276 basis points higher monthly certainty equivalent returns than a comparable linear portfolio policy. Risk aversion serves an important function as an economically motivated model regularization parameter, with higher risk aversion leaning against model complexity. Overall, our findings demonstrate that, looking beyond expected returns, network-based policies better capture the non-linear relationship between investor utility and firm characteristics but the benefits of using complex models vary with investor preferences. Results hold after considering realistic portfolio settings with short sale or weight restrictions and returns after transaction costs.
Abstract
We delve into the computation of fill probabilities for limit orders positioned at various price levels within the limit order book, a critical aspect in execution optimization. We adopt a generic stochastic model to capture the dynamics of the order book as a series of queueing systems. This generic model is state-dependent and encompasses stylized factors. We subsequently derive semi-analytical expressions to compute the relevant probabilities within the context of state-dependent stochastic order flows. These probabilities cover various scenarios, including the probability of a change in the mid-price, the fill probabilities of orders posted at the best quotes, and those posted at a price level deeper in the book, before the opposite best quote moves. Lastly, we conduct extensive numerical experiments using real order book data from the foreign exchange spot market.
Abstract
Hawkes processes have been used to model Limit Order Book (LOB) dynamics in several ways in the literature; however, the focus has been limited to capturing the inter-event times, while the order size is usually assumed to be constant. We propose a novel methodology of using a Compound Hawkes Process for the LOB, where each event has an order size sampled from a calibrated distribution. The process is formulated in a novel way such that the spread of the process always remains positive. Further, we condition the model parameters on the time of day to support empirical observations. We make use of an enhanced non-parametric method to calibrate the Hawkes kernels and allow for inhibitory cross-excitation kernels. We showcase the results and quality of fits for an equity stock's LOB on the NASDAQ exchange and compare them against several baselines. Finally, we conduct a market impact study of the simulator and show that the empirical observation of a concave market impact function is indeed replicated.
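The core idea above, a Hawkes process whose events carry sampled order sizes, can be sketched in a few lines. The snippet below simulates a univariate compound Hawkes process with an exponential kernel via Ogata thinning; the parameters and the lognormal size distribution are illustrative assumptions, and the paper's multi-kernel, spread-preserving LOB formulation is not reproduced.

```python
import numpy as np

def simulate_compound_hawkes(mu, alpha, beta, horizon, size_sampler, rng):
    """Simulate a univariate Hawkes process with intensity
    lambda(t) = mu + sum_i alpha*exp(-beta*(t - t_i)) by Ogata thinning,
    attaching a sampled order size to each accepted event."""
    t, events = 0.0, []
    while True:
        lam_bar = mu + alpha * sum(np.exp(-beta * (t - ti)) for ti, _ in events)
        t += rng.exponential(1.0 / lam_bar)             # candidate next event time
        if t > horizon:
            break
        lam_t = mu + alpha * sum(np.exp(-beta * (t - ti)) for ti, _ in events)
        if rng.uniform() <= lam_t / lam_bar:            # accept with prob lambda(t)/lam_bar
            events.append((t, size_sampler(rng)))       # event time and sampled order size
    return events

rng = np.random.default_rng(42)
orders = simulate_compound_hawkes(
    mu=0.5, alpha=0.8, beta=1.5, horizon=100.0,
    size_sampler=lambda g: int(g.lognormal(mean=4.0, sigma=1.0)), rng=rng)
print(len(orders), orders[:3])
```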
Abstract
We study the dynamics of crypto asset returns through the lens of factor models. Given the limited number of tradable assets and years of data and the rich set of available asset characteristics, we develop novel estimation procedures with supporting econometric theory for a dynamic latent-factor model with high-dimensional asset characteristics, that is, the number of characteristics is on the order of the sample size. Utilizing the Double Selection Lasso estimator, our procedure employs regularization to eliminate characteristics with low signal-to-noise ratios yet maintains asymptotically valid inference for our asset pricing tests. In our empirical panel, we find the new estimator obtains comparable out-of-sample pricing ability and risk-adjusted returns to benchmark methods. We provide an inference procedure for measuring the risk premium of an observable nontradable factor, and employ this to find crypto's inflation-mimicking portfolio has positive risk compensation. Finally, specifying a factor model with nonparametric loadings and factors, we utilize recent methods in deep learning to maximize out-of-sample risk-adjusted returns in an hourly panel, which yields economically significant alphas even after a detailed accounting of transaction costs.
Abstract
This paper addresses a fundamental question in economic forecasting: is variable selection beneficial to predictions? Economists typically conduct variable selection to eliminate noise from predictors. However, we provide a theoretical justification that economic forecast models are not sparse. In addition, we prove a compelling result: in most economic forecasts, including noise in predictions yields greater benefits than its exclusion. Furthermore, if the total number of predictors is not sufficiently large, intentionally adding more noise yields superior forecast performance, outperforming benchmark predictors relying on dimension reduction. The intuition lies in economic predictive signals being densely distributed among regression coefficients, maintaining modest forecast bias while diversifying away overall variance. Therefore, economic forecasts can significantly benefit from 'benign overfitting' even if a significant proportion of predictors constitute pure noise. One of our empirical demonstrations shows that intentionally adding 300-6,000 pure-noise predictors to the Welch and Goyal (2008) dataset achieves a noteworthy 10% out-of-sample R-squared in forecasting the annual U.S. equity premium. The performance surpasses the majority of sophisticated machine learning models.
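The mechanism described above can be illustrated with a toy experiment: append pure-noise columns to a small set of true predictors and fit the minimum-norm least-squares forecast, which interpolates the training data once the number of predictors exceeds the sample size. The data-generating process, sizes, and evaluation below are illustrative assumptions, not the paper's empirical design, and the resulting out-of-sample R-squared depends on the signal-to-noise configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
T, T_oos, k = 240, 120, 5                          # train size, test size, true predictors
beta = 0.25 * rng.standard_normal(k)
X = rng.standard_normal((T + T_oos, k))
y = X @ beta + rng.standard_normal(T + T_oos)      # dense-but-weak signal plus noise

def oos_r2(n_noise):
    """Append n_noise pure-noise columns, fit minimum-norm least squares on the
    training window (p may exceed T), and report the out-of-sample R^2 against
    a zero forecast. A toy sketch of the 'benign overfitting' argument."""
    Z = np.hstack([X, rng.standard_normal((T + T_oos, n_noise))])
    w = np.linalg.pinv(Z[:T]) @ y[:T]              # ridgeless / least-norm fit
    e = y[T:] - Z[T:] @ w
    return 1.0 - np.sum(e ** 2) / np.sum(y[T:] ** 2)

for p in [0, 50, 500, 5000]:
    print(p, round(oos_r2(p), 3))
```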
Abstract
The use of machine learning to generate synthetic data has grown significantly in popularity over the past few years. The core methodology these models use is to learn the distribution of the underlying data, similar to the classical methods of fitting statistical models to data that are common in finance. In this presentation, we discuss the efficacy of using modern machine learning methods, specifically conditional importance weighted autoencoders (a variant of variational autoencoders) and conditional normalizing flows, to model the returns of equities. We apply our method to learn a 500-dimensional joint distribution for S&P 500 members. We show that this generative model has a broad range of applications in finance, including generating realistic synthetic data, volatility and correlation estimation, risk analysis (e.g., value at risk, or VaR, of portfolios), and portfolio optimization.
Abstract
We exploit cutting-edge deep learning methodologies to explore the predictability of high-frequency Limit Order Book mid-price changes for a heterogeneous set of stocks traded on the NASDAQ exchange. In so doing, we release 'LOBFrame', an open-source code base to efficiently process large-scale Limit Order Book data and quantitatively assess state-of-the-art deep learning models' forecasting capabilities. Our results are twofold. We demonstrate that the stocks' microstructural characteristics influence the efficacy of deep learning methods and that their high forecasting power does not necessarily correspond to actionable trading signals. We argue that traditional machine learning metrics fail to adequately assess the quality of forecasts in the Limit Order Book context. As an alternative, we propose an innovative operational framework that evaluates predictions' practicality by focusing on the probability of accurately forecasting complete transactions. This work offers academics and practitioners an avenue to make informed and robust decisions on the application of deep learning techniques, their scope and limitations, effectively exploiting emergent statistical properties of the Limit Order Book.
Abstract
When a company releases earnings results or makes announcements, a sector-wide lead-lag effect from the stock on the entire system may occur. To improve the estimation of a system experiencing system-wide lead-lag effects from a single asset in the presence of short time series, we introduce a model for Large-scale Influencer Structures in Vector AutoRegressions (LISAR). We study the asymptotic properties of the estimator and validate its performance in extensive synthetic data experiments. We study the performance of the LISAR model on high-frequency data for the constituents of the S&P 100, separated by sectors. We find the LISAR model to significantly outperform alternative models on up to 14.7% of the days in terms of forecasting accuracy. Trading strategies with signals derived from the LISAR model achieved up to 60% excess return compared to other strategies. We show in this study that, in the presence of influencer structures within a sector, the LISAR model, compared to alternative models, provides higher accuracy, better forecasting results, and improves the understanding of market movements and sectoral structures.
Abstract
Recent studies document strong performance for machine-learning-based investment strategies. These strategies use anomaly variables discovered ex-post as predictors of stock returns and may not be implementable in real time. We construct real-time machine learning strategies based on a universe of fundamental signals. While positive and significant, the out-of-sample performance of these strategies is significantly weaker than those documented by prior studies, especially in value-weighted portfolios. We find similar results when examining a universe of past return-based signals. Our results offer a more tempered view of the economic gains associated with machine learning strategies relative to prior literature.
Abstract
We introduce a novel measure of weather risk implied from weather option contracts. WIVOL captures risks of future temperature oscillations, increasing with climate uncertainty about physical events and regulatory policies. We find that shocks to weather volatility increase the likelihood of unexpected costs: a one-standard-deviation change in WIVOL increases quarterly operating costs by 2%, suggesting that firms, on average, do not fully hedge exposures to weather risks. We estimate returns' exposure to WIVOL innovations and show that more negatively exposed firms are valued at a discount, with investors demanding higher compensation to hold these stocks. Firms' exposure to local but not foreign WIVOL predicts returns, which confirms the geographic nature of weather risk shocks.
Abstract
We develop a novel methodology for extracting information from option implied volatility (IV) surfaces for the cross-section of stock returns, using image recognition techniques from machine learning (ML). The predictive information we identify is essentially uncorrelated with most of the existing option-implied characteristics, delivers a higher Sharpe ratio, and has a significant alpha relative to a battery of standard and option-implied factors. We show the virtue of ensemble complexity: best results are achieved with a large ensemble of ML models, with the out-of-sample performance increasing in the ensemble size, saturating when the number of model parameters significantly exceeds the number of observations. We introduce principal linear features, an analog of principal components for ML, and use them to show IV feature complexity: a low-rank rotation of the IV surface cannot explain the model performance. Our results are robust to short-sale constraints and transaction costs.
Abstract
Recent developments on joint scoring functions for Value-at-Risk and Expected Shortfall allow for consistent implementation of statistical tests based on the Model Confidence Set (MCS). The MCS test is shown to be a powerful tool for model comparison, both in-sample and out-of-sample. Another branch of the literature has focused on the superior performance of convex forecast combinations, which often outperform stand-alone forecasting models. This paper combines both research directions and proposes a novel approach to forecast combination of Value-at-Risk and Expected Shortfall based on MCS testing. By means of the bagged pretest forecasting combination (BPFC) algorithm, we exploit the statistical properties of bootstrap aggregation (bagging) and combine competing models based on the bootstrapped probability of the model being in the MCS. The resulting forecast combination allows for a flexible and smooth switch between the underlying models and outperforms the corresponding stand-alone forecasts.
Abstract
Signature transforms are iterated path integrals of continuous and discrete-time time series data, and their universal nonlinearity linearizes the problem of feature selection. This paper revisits some statistical properties of the signature transform under stochastic integrals within a Lasso regression framework, both theoretically and numerically. Our study shows that, for processes and time series that are closer to Brownian motion or random walk with weaker inter-dimensional correlations, the Lasso regression is more consistent for their signatures defined by Itô integrals; for mean-reverting processes and time series, their signatures defined by Stratonovich integrals have more consistency in the Lasso regression. Our findings highlight the importance of choosing appropriate definitions of signatures and stochastic models in statistical inference and machine learning. This is joint work with Xin Guo and Chaoyi Zhao.
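The basic pipeline of signature features plus a sparse linear model can be sketched as follows. The snippet uses the iisignature package (an assumption about tooling; esig or signatory would also work) to compute truncated signatures of simulated two-dimensional random-walk paths and fits a cross-validated Lasso; the toy target, the product of the path's endpoints, and all parameters are illustrative, not the paper's experiments.

```python
import numpy as np
import iisignature                       # assumed tooling for truncated signatures
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n_paths, n_steps, dim, depth = 200, 50, 2, 3

X, y = [], []
for _ in range(n_paths):
    increments = rng.standard_normal((n_steps, dim)) / np.sqrt(n_steps)
    path = np.vstack([np.zeros(dim), np.cumsum(increments, axis=0)])  # random-walk path
    X.append(iisignature.sig(path, depth))   # signature features up to level `depth`
    y.append(path[-1, 0] * path[-1, 1])      # toy functional of the path to be learned

X, y = np.asarray(X), np.asarray(y)
model = LassoCV(cv=5).fit(X, y)              # sparse linear regression on signatures
print("non-zero signature coefficients:", int(np.sum(model.coef_ != 0)))
```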
Abstract
LASSO-type shrinkage methods have become increasingly popular in the big data era. However, variable correlations can significantly compromise the stability and validity of such estimators. This paper advances the development of a correlation-robust LASSO-type estimator, the ordered-weighted-LASSO (OWL) estimator (Figueiredo and Nowak, 2016). We develop the (non)asymptotic properties of the OWL estimator, considering less restrictive conditions, including the alpha-mixing condition and accommodating heavier tails than the standard i.i.d. sub-Gaussian setting. Furthermore, we propose a de-biased version of this estimator and establish its asymptotic normality. Through simulated data, we demonstrate that the de-biased OWL estimator significantly reduces estimation errors. Empirically, we apply it to identify crucial factors from the factor zoo, revealing that, despite high correlation with numerous other factors, the market factor is the most influential in driving cross-sectional asset returns. Our findings also highlight the significant impact of liquidity, profitability and momentum-related factors on these returns.
Abstract
I refine the test for clustering of Patton and Weller (2022) to allow for cluster switching. In a multivariate panel setting, clustering on time-averages produces consistent estimators of means and group assignments. Once switching is introduced, we lose this consistency. In fact, under switching the time-averaged k-means clustering converges to equal, indistinguishable means. This causes the test for a single cluster to lose power under the alternative of multiple clusters. Power can be regained by clustering the N × T observations independently and carefully subsampling the time dimension. When applied to the empirical setting of Bonhomme and Manresa (2015) of an autoregression of democracy in a panel of countries, we are able to detect clusters in the data under noisier conditions than the original test.
Abstract
There is extensive literature on stock return predictability in the cross section and time series, but it examines aggregate performance while ignoring heterogeneous stock return predictability. This paper challenges these aggregate return predictability illusions and shows that most stock return observations are empirically unpredictable. By excluding a small set of predictable observations, most positive machine learning forecast-implied investment performances disappear, and so do many significant risk factors. To this end, we develop an asset clustering approach to identify the stock return observations most likely to be predictable in the cross section and time series. Our regression tree approach confirms the heterogeneous return predictability sources with firm characteristic ranges and macroeconomic conditions. By focusing on a small set of predictable observations, forecast-based strategies and risk factors have performed significantly better over the past fifty years.
Abstract
We investigate to what extent the volatility feedback effect contributes to explaining the time-varying sensitivity of stock markets to macroeconomic announcements. By combining the standard Campbell-Shiller log-linear present value framework with a novel two-component volatility model for the conditional variance of cash flow news, we show that news to the long-term volatility component is an important driver of discount rate news. When long-term volatility is high, stock returns are more sensitive to news, and there is an asymmetry in the response to good and bad news. By investigating the instantaneous reaction of the S&P 500 to major U.S. macroeconomic announcements, we empirically confirm our model's predictions.
Abstract
Managing high-frequency market data has been a challenging task in finance. A limit order book is a collection of orders that a trader intends to place, either to buy or sell at a certain price. Traditional approaches often fall short in forecasting future limit orders because of their high frequency and volume. In this study, we propose a modified attention algorithm to analyze the movement patterns in a limit order book. The enormous amount of data with millisecond time stamps is efficiently examined and processed using an attention module, which highlights important aspects of limit orders. We demonstrate that our modified attention algorithm improves the forecasting accuracy of limit orders.
Abstract
Inventory models posit that return autocorrelation is affected by collateral, volume, and expected volatility. We show that daily market autocorrelations are lower on negative return days, consistent with collateral concerns. We use DJIA intraday return data going back to 1933 to obtain more precise volatility estimates and document, unlike previous literature, a strong role of volatility in market autocorrelation. Puzzlingly, anticipated volume, not volume shocks, drives reversals. Sparked by these findings, we construct a liquidity risk factor in accordance with Pastor-Stambaugh (2003) that is based on volatility rather than volume. The volatility-based factor is more robust and has a higher risk premium than the volume-based factor.
Abstract
The recent high-profile failures of a number of crypto firms have reignited the debate on the appropriate policy response to address the risks in crypto. The “shadow financial” functions enabled by crypto markets share many of the vulnerabilities of traditional finance, and risks are often exacerbated by specific features of crypto. Authorities may consider different, and not mutually exclusive, lines of action to tackle the risks in crypto. These include (i) bans, which could tackle specific aspects of the crypto ecosystem, (ii) containment, so that the real economy is insulated from crypto risks, and (iii) the regulation of the crypto sector. The paper highlights the pros and cons of the different approaches and proposes a framework to choose when bans, containment and regulation are most appropriate. In any case, central banks and public authorities could also work to make traditional finance more attractive, thereby allowing responsible innovation to thrive.
Abstract
This paper deals with identification and inference on the unobservable conditional factor space and its dimension in large unbalanced panels of asset returns. The model specification is nonparametric regarding the time-variation of loadings as functions of lagged common shocks and individual characteristics. The number of active factors can also be time-varying as an effect of the changing macroeconomic environment. The method uses instrumental variables which have full-rank covariation with the factor betas in the cross-section, and allows for a high-dimensional vector generating the conditioning information. We use Double Machine Learning to show that average conditional canonical correlations between latent and observed factors, and similar parameters of interest, are asymptotically normal. We find that the conditional factor space extracted from the panel of monthly returns of individual stocks in the CRSP dataset is low-dimensional and overlaps only partly with the span of traditional sets of empirical factors.
Abstract
The extant literature predicts market returns with “simple” models that use only a few parameters. Contrary to conventional wisdom, we theoretically prove that simple models severely understate return predictability compared to “complex” models in which the number of parameters exceeds the number of observations. We empirically document the virtue of complexity in US equity market return prediction. Our findings establish the rationale for modeling expected returns through machine learning.
Abstract
We propose that investment strategies should be evaluated based on their net-of-trading-cost return for each level of risk, which we term the "implementable efficient frontier." While numerous studies use machine learning return forecasts to generate portfolios, their agnosticism toward trading costs leads to excessive reliance on fleeting small-scale characteristics, resulting in poor net returns. We develop a framework that produces a superior frontier by integrating trading-cost-aware portfolio optimization with machine learning. The superior net-of-cost performance is achieved by learning directly about portfolio weights using an economic objective. Further, our model gives rise to a new measure of "economic feature importance".
Abstract
This paper develops a novel method to estimate a latent factor model for a large target panel with missing observations by optimally using the information from auxiliary panel data sets. We refer to our estimator as target-PCA. Transfer learning from auxiliary panel data allows us to deal with a large fraction of missing observations and weak signals in the target panel. We show that our estimator is more efficient and can consistently estimate weak factors, which are not identifiable with conventional methods. We provide the asymptotic inferential theory for target-PCA under very general assumptions on the approximate factor model and missing patterns. In an empirical study of imputing data in a mixed-frequency macroeconomic panel, we demonstrate that target-PCA significantly outperforms all benchmark methods.
Abstract
In this presentation we summarize different modelling and predictive strategies for cryptocurrency. We begin with univariate models that feature time-varying moments up to the fourth order. Then we extend the forecasting to a multivariate setting using time-varying Vector Autoregressive models. Finally, we introduce a new study that combines multivariate models and time-varying higher moments in portfolio allocation.
Abstract
We study a multi-factor block model for variable clustering and connect it to the regularized subspace clustering by formulating a distributionally robust version of the nodewise regression. To solve the latter problem, we derive a convex relaxation, provide guidance on selecting the size of the robust region, and hence the regularization weighting parameter, based on the data, and propose an ADMM algorithm for implementation. We validate our method in an extensive simulation study. Finally, we propose and apply a variant of our method to stock return data, obtain interpretable clusters that facilitate portfolio selection and compare its out-of-sample performance with other clustering methods in an empirical study. This talk is based on joint work with Xunyu Zhou and Xiao Xu.
Abstract
Supply chain business interruption has been identified as a key risk factor in recent years, with high-impact disruptions due to disease outbreaks and logistics issues, such as the recent Suez Canal blockage, showing how disruptions can propagate across complex emergent networks. Researchers have highlighted the importance of gaining visibility into procurement interdependencies between suppliers to develop more informed business contingency plans. However, extant methods such as supplier surveys rely on the willingness or ability of suppliers to share data and are not easily verifiable. In this article, we pose the supply chain visibility problem as a link prediction problem from the field of Machine Learning (ML) and propose the use of an automated method to detect potential links that are unknown to the buyer with Graph Neural Networks (GNN). Using a real automotive network as a test case, we show that our method performs better than existing algorithms. Additionally, we use Integrated Gradients to improve the explainability of our approach by highlighting input features that influence the GNN’s decisions. We also discuss the advantages and limitations of using GNN for link prediction, outlining future research directions.
Abstract
Overnight material news events and sources of stock illiquidity can be potentially important sources of jumps in stock returns. We find that, for the average firm in the cross-section, stock illiquidity is more likely to drive a stock return jump than either day or overnight news flow frequency and content; for larger firms, however, there is a higher likelihood that the stock return jump is driven by overnight news flow frequency. Yet our results find a larger idiosyncratic jump size for a higher number of day news articles than for stock illiquidity for the average and large firms. Our results show how day and overnight news flow, stock illiquidity, and order flow are reflected in stock return jumps and idiosyncratic jump risk.
Abstract
The presence of time series momentum has been widely documented in financial markets across asset classes and countries. In this study, we find a predictable pattern of the realized semivariance estimators for the returns of commodity futures, particularly during the reversals of time series momentum. Based on this finding, we propose a rule-based time series momentum strategy that has a statistically significant higher Sharpe ratio compared to the benchmark of the original time series momentum strategy in the out-of-sample data. The results are robust to different subsamples, lookback windows, volatility scaling, execution lag, and transaction cost.
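For context on the two building blocks mentioned above, the sketch below computes daily upside and downside realized semivariances from intraday returns and a plain time-series momentum sign signal. The paper's rule, which conditions the momentum signal on semivariance behaviour around reversals, is not reproduced; the simulated data and short lookback are toy assumptions.

```python
import numpy as np
import pandas as pd

def daily_semivariances(intraday_returns: pd.Series) -> pd.DataFrame:
    """Upside/downside realized semivariance per day: the sums of squared
    positive and negative intraday returns."""
    df = pd.DataFrame({
        "RS_plus": intraday_returns.clip(lower=0.0) ** 2,
        "RS_minus": intraday_returns.clip(upper=0.0) ** 2,
    })
    return df.groupby(intraday_returns.index.date).sum()

def tsmom_signal(daily_returns: pd.Series, lookback: int = 252) -> pd.Series:
    """Plain time-series momentum sign signal; the paper's rule additionally
    conditions on the semivariance estimators near reversals."""
    return np.sign(daily_returns.rolling(lookback).sum()).shift(1)

# toy example on simulated 5-minute returns
idx = pd.date_range("2024-01-01 09:00", periods=78 * 60, freq="5min")
r = pd.Series(0.0005 * np.random.default_rng(3).standard_normal(len(idx)), index=idx)
print(daily_semivariances(r).head())
print(tsmom_signal(r.groupby(r.index.date).sum(), lookback=5).tail())
```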
Abstract
We develop spectral volume models to systematically estimate, explain, and exploit the high-frequency periodicity in intraday trading activities using Fourier analysis. The framework consistently recovers periodicities at specific frequencies in three steps, despite their low signal-to-noise ratios. This reveals important and universal high-frequency periodicities across 2,573 stocks in the United States (US) and Chinese markets over a full year. The dominant periods are 10, 15, 20, and 30 seconds, 1 minute, and 5 minutes for the US market, and 1, 2.5, 5, and 10 minutes for the Chinese market. They each explain from 1.5 to 10 percent of the variance of de-trended intraday volumes on average. Through three different perspectives, we provide statistically significant evidence that this phenomenon is driven by trading algorithms that rely on periodic information arrivals, rather than trading cost considerations. Finally, we demonstrate the practical value of uncovering these high-frequency periodicities in improving intraday volume predictions, which leads to potential economic gains in intraday execution strategies.
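A minimal sketch of the kind of spectral periodicity detection described above: detrend an intraday volume series and read off the periods of the largest periodogram peaks. The crude moving-average detrending, the bar frequency, and the simulated 60-second cycle below are assumptions for illustration, not the paper's three-step framework.

```python
import numpy as np

def dominant_volume_periods(volume, bar_seconds=1.0, top_k=5):
    """Return the periods (in seconds) of the largest periodogram peaks of a
    detrended intraday volume series."""
    v = np.asarray(volume, dtype=float)
    v = v - v.mean()
    kernel = np.ones(301) / 301
    trend = np.convolve(v, kernel, mode="same")        # crude intraday-profile detrending
    resid = v - trend
    spectrum = np.abs(np.fft.rfft(resid)) ** 2
    freqs = np.fft.rfftfreq(len(resid), d=bar_seconds)  # cycles per second
    peaks = np.argsort(spectrum[1:])[::-1][:top_k] + 1   # skip the zero frequency
    return 1.0 / freqs[peaks]                            # periods in seconds

rng = np.random.default_rng(7)
t = np.arange(int(6.5 * 3600))                           # one trading day of 1-second bars
vol = 100 + 20 * np.cos(2 * np.pi * t / 60) + 10 * rng.standard_normal(t.size)
print(dominant_volume_periods(vol))                      # dominated by the 60-second cycle
```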
Abstract
Understanding stock market instability is a key question in financial management as practitioners seek to forecast breakdowns in asset comovements which expose portfolios to rapid and devastating collapses in value. The structure of these comovements can be described as a graph where companies are represented by nodes and edges capture correlations between their price movements. Learning a timely indicator of comovement breakdowns (manifested as modifications in the graph structure) is central in understanding both financial stability and volatility forecasting. We propose to use the edge reconstruction accuracy of a graph autoencoder (GAE) as an indicator for how spatially homogeneous connections between assets are, which, based on financial network literature, we use as a proxy to infer market volatility. Our experiments on the S&P 500 over the 2015-2022 period show that higher GAE reconstruction error values are correlated with higher volatility. We also show that out-of-sample autoregressive modeling of volatility is improved by the addition of the proposed measure. Our paper contributes to the literature of machine learning in finance, particularly in the context of understanding stock market instability.
Abstract
Neural networks that are able to reliably execute algorithmic computation may hold transformative potential to both machine learning and theoretical computer science. On one hand, they could enable the kind of extrapolative generalisation scarcely seen with deep learning models. On another, they may allow for running classical algorithms on inputs previously considered inaccessible to them. Over the past few years, the pace of development in this area has gradually become intense. As someone who has been very active in its latest incarnation, I have witnessed these concepts grow from isolated toy experiments, through NeurIPS spotlights, all the way to helping detect patterns in complicated mathematical objects (published on the cover of Nature) and supporting the development of generalist reasoning agents. In this talk, I will give my personal account of this journey, and especially how our own interpretation of this methodology, and understanding of its potential, changed with time. It should be of interest to a general audience interested in graphs, (classical) algorithms, reasoning, and building intelligent systems.
Abstract
We consider the estimation of causal effects in panel data settings. During a given time period, one observes units of interest and stores the realized outcomes into a matrix. At a fixed point in time, a subset of the units is exposed to an irreversible treatment, i.e., the data matrix of treated units has a block structure. The objective is to design an estimator for the counterfactual outcomes of the block of treated units. For large sample sizes and under typical statistical settings, we show that the use of matrix completion (MC) estimators for counterfactual recovery yields phase transition (PT) phenomena, where it is possible to distinguish regions of the parameter space where a perfect estimation of the counterfactual is possible from those where it is not. We determine the separating line (the so-called phase transition (PT) curve) between the regions, and show that it admits a closed form expression that directly relates time series and cross-sectional heterogeneity among units to the number of untreated units and the initial time of the treatment. Our methodology is designed to handle settings where the starting time of the treatment and the number of control (untreated) units are not necessarily identical, i.e., where the block of counterfactuals in the matrix of control outcomes is a rectangular matrix. We support our theoretical analysis with numerical simulations, which further indicate that an exact counterfactual recovery is attainable even for fairly small sample sizes. (joint work with Mijailo Stojnic).
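A minimal sketch of the estimation step discussed above: treat the outcomes of the treated block as missing entries and recover them with a generic soft-impute matrix completion routine. The low-rank toy panel, the penalty level, and the estimator itself are illustrative assumptions; the talk's analysis concerns when such recovery is possible, not this particular implementation.

```python
import numpy as np

def soft_impute(Y, observed_mask, rank_penalty=1.0, n_iter=200):
    """Minimal soft-impute style matrix completion: iteratively refill the missing
    (treated) block with a singular-value-thresholded low-rank approximation."""
    X = np.where(observed_mask, Y, 0.0)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        s = np.maximum(s - rank_penalty, 0.0)            # soft-threshold singular values
        low_rank = (U * s) @ Vt
        X = np.where(observed_mask, Y, low_rank)          # keep observed entries fixed
    return low_rank

# toy panel: rank-2 outcomes, last 10 units treated from period 40 onwards
rng = np.random.default_rng(0)
N, T, r = 50, 60, 2
M = rng.standard_normal((N, r)) @ rng.standard_normal((r, T))
Y = M + 0.1 * rng.standard_normal((N, T))
mask = np.ones((N, T), dtype=bool)
mask[-10:, 40:] = False                                   # treated block is unobserved
M_hat = soft_impute(Y, mask)
print(np.abs(M_hat[-10:, 40:] - M[-10:, 40:]).mean())     # counterfactual recovery error
```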
Abstract
This paper proposes a method for model determination in ultra-high dimensional cointegrated systems where the cross-section dimension m can even largely exceed the sample size T. For such ultra-high dimensional cases, we require an adequate non-standard pre-screening step which we develop for the nonstationary cointegration vector but also for the stationary loading matrix. We prove that identified sets for the non-zero loadings and the cointegration space contain the respective true sets with high probability. A feasible algorithm is provided, making the technique easily accessible for practitioners. In a second step, we employ reduced rank regression based on the pre-selected set of variables, and show the cointegration rank selection consistency of the overall procedure. In order to achieve consistent rank selection, we propose a tailored information criterion which is also of general interest for factor models when both strong and weak factors are present. Results of the simulation study demonstrate competitive performance of the proposed methodology. In an empirical study with 1045 NASDAQ stocks, the proposed methodology allows for large-scale multivariate predictive regression for the entire system.
Abstract
Any lead-lag effect in an asset pair implies the future returns on the lagging asset have the potential to be predicted from past and present prices of the leader, thus creating statistical arbitrage opportunities. We utilize robust lead-lag indicators to uncover the origin of price discovery and we propose an econometric model exploiting that effect with level 1 data of limit order books (LOB). We also develop a high-frequency trading strategy based on the model predictions to capture arbitrage opportunities. The framework is then evaluated on six months of DAX 30 cross-listed stocks' LOB data obtained from three European exchanges in 2013: Xetra, Chi-X, and BATS. We show that a high-frequency trader can profit from lead-lag relationships because of predictability, even when trading costs, latency, and execution-related risks are considered. Keywords: Lead-lag relationship, High-frequency trading, Statistical arbitrage, Limit order book, Cross-listed stocks, Econometric models.
Abstract
In Cong, Feng, He, and He (2022), we develop a new class of tree-based models (P-Tree) for analyzing (unbalanced) panel data, utilizing global (instead of local) split criteria that incorporate economic guidance to guard against overfitting while preserving interpretability. We grow a P-Tree top-down to split the cross section of asset returns to construct stochastic discount factors and test assets, generalizing sequential security sorting and visualizing (asymmetric) nonlinear interactions among firm characteristics and macroeconomic states. Data-driven P-Tree models reveal that idiosyncratic volatility and earnings-to-price ratio interact to drive cross-sectional return variations in U.S. equities; market volatility and inflation constitute the most critical regime-switching that asymmetrically interacts with characteristics. P-Trees outperform most known observable and latent factor models in pricing individual stocks and test portfolios, while delivering transparent trading strategies and risk-adjusted investment outcomes (e.g., out-of-sample annualized Sharpe ratios of about 3 and monthly alpha around 0.8%). Time permitting, I will briefly discuss Cong, Feng, He, and Li (2022), a further development of the panel tree framework for jointly clustering asset returns and modeling heterogeneous factor pricing under a Bayesian framework.
Abstract
Recent studies suggest that networks among firms (sectors) play an essential role in asset pricing. However, it is challenging to capture and investigate the implications of networks due to the continuous evolution of networks in response to market micro and macro changes. This paper combines two state-of-the-art machine learning techniques to develop an end-to-end graph neural network model and shows its applicability in asset pricing. First, we apply the graph attention mechanism to learn dynamic network structures of the equity market over time and then use a recurrent convolutional neural network to diffuse and propagate firms' fundamental information into the learned networks. Our model is efficient in both return prediction and portfolio performance. The result persists in different sensitivity tests and simulated data. We also show that the dynamic network learned from our model is able to capture major market events over time.
Abstract
In the last decade, the arrival of new forms of social media has drastically increased the amount of personal data generated online. The massive amount of data available has shown a lot of opportunities for industries and research. In particular, increasing numbers of quantitative investors start to rely on alternative data to adapt their position in the market. However, it is still unclear whether aggregated online data could generate excess returns in active investing and allow refining positions on the stock market. In the present talk, we propose to tackle the question by focusing on three underlying themes. First, we will introduce one of the first viable approaches to the estimation of individual-level ideological positions derived from social media content. Second, we will show how a consensus model can be used to predict opinion evolution in online collective behaviour and how the "wisdom of the crowd" relates to group influence. Finally, we will explore whether aggregated opinion signals have potential to predict financial fundamentals and build an edge on the market.
Abstract
Estimating high dimensional covariance matrices for portfolio optimization is challenging because the number of parameters to be estimated grows quadratically in the number of assets. When the matrix dimension exceeds the sample size, the sample covariance matrix becomes singular. A possible solution is to impose a (latent) factor structure for the cross-section of asset returns as in the popular capital asset pricing model. Recent research suggests dimension reduction techniques to estimate the factors in a data-driven fashion. We present an asymmetric autoencoder neural network-based estimator that incorporates the factor structure in its architecture and jointly estimates the factors and their loadings. We test our method against well established dimension reduction techniques from the literature and compare them to observable factors as benchmark in an empirical experiment using stock returns of the past five decades. Results show that the proposed estimator is very competitive, as it significantly outperforms the benchmark across most scenarios. Analyzing the loadings, we find that the constructed factors are related to the stocks’ sector classification.
Abstract
We propose a Degree-Corrected Block Model with Dependent Multivariate Poisson edges (DCBM-DMP) to study stock co-jump dependency. To estimate the community structure, we extend the SCORE algorithm in Jin (2015) and develop a Spectral Clustering On Ratios-of-Eigenvectors for networks with Dependent Multivariate Poisson edges (SCORE-DMP) algorithm. We prove that SCORE-DMP enjoys strong consistency in community detection. Empirically, using high-frequency data of S&P 500 constituents, we construct two co-jump networks according to whether the market jumps and find that they exhibit different community features than GICS. We further show that the co-jump networks help in stock return prediction.
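As background for the community-detection step above, the snippet below implements the baseline SCORE procedure of Jin (2015) on a simulated co-jump count network with degree heterogeneity: take the leading eigenvectors of the count matrix, form entrywise ratios against the first one, and run k-means. The simulated network and truncation level are assumptions, and the talk's SCORE-DMP extension for dependent multivariate Poisson edges is not implemented.

```python
import numpy as np
from sklearn.cluster import KMeans

def score_communities(A, k):
    """Baseline SCORE: k leading eigenvectors of A, entrywise ratios against the
    first one to remove degree heterogeneity, then k-means on the ratios."""
    vals, vecs = np.linalg.eigh(A)
    lead = vecs[:, np.argsort(np.abs(vals))[::-1][:k]]     # k leading eigenvectors
    ratios = lead[:, 1:] / lead[:, [0]]                    # entrywise ratios
    ratios = np.clip(ratios, -10, 10)                      # standard truncation step
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(ratios)

# toy co-jump count network with two communities and degree heterogeneity
rng = np.random.default_rng(1)
n = 100
labels = np.repeat([0, 1], n // 2)
theta = rng.uniform(0.5, 1.5, n)                           # degree parameters
P = np.where(labels[:, None] == labels[None, :], 0.8, 0.2)
A = rng.poisson(np.outer(theta, theta) * P)
A = np.triu(A, 1); A = A + A.T                             # symmetric count matrix
acc = (score_communities(A, 2) == labels).mean()
print(max(acc, 1 - acc))                                   # accuracy up to label switching
```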
Abstract
We learn from data that volatility is mostly path-dependent. Up to 90% of the variance of the implied volatility of equity indexes is explained endogenously by past index returns, and up to 65% for (noisy estimates of) future daily realized volatility. The path-dependency that we uncover is remarkably simple: a linear combination of a weighted sum of past daily returns and the square root of a weighted sum of past daily squared returns, with different time-shifted power-law weights capturing both short and long memory. This simple model, which is homogeneous in volatility, is shown to consistently outperform existing models across equity indexes and train/test sets for both implied and realized volatility. It suggests a simple continuous-time path-dependent volatility (PDV) model that may be fed historical or risk-neutral parameters. The weights can be approximated by superpositions of exponential kernels to produce Markovian models. In particular, we propose a 4-factor Markovian PDV model which captures all the important stylized facts of volatility, produces very realistic price and volatility paths, and jointly fits SPX and VIX smiles remarkably well. We thus show, for the first time, that a continuous-time Markovian parametric stochastic volatility (actually, PDV) model can practically solve the joint S&P 500/VIX smile calibration problem.
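The two features named above are easy to compute from a return series. The sketch below builds a power-law weighted sum of past returns (R1) and the square root of a power-law weighted sum of past squared returns (R2), then fits a linear volatility model by least squares; the exponents, shifts, lag truncation, and the crude volatility proxy are placeholders, not the paper's calibrated specification.

```python
import numpy as np

def pdv_features(daily_returns, alpha1=1.8, alpha2=1.8, delta1=0.05, delta2=0.1, lags=1000):
    """R1: power-law weighted sum of past daily returns; R2: square root of a
    power-law weighted sum of past squared returns (illustrative parameters)."""
    k = np.arange(lags, dtype=float)
    w1 = (k + delta1 * lags) ** (-alpha1); w1 /= w1.sum()   # time-shifted power-law kernels
    w2 = (k + delta2 * lags) ** (-alpha2); w2 /= w2.sum()
    r = np.asarray(daily_returns, dtype=float)
    R1 = np.convolve(r, w1, mode="full")[: len(r)]          # causal weighted sums
    R2 = np.sqrt(np.convolve(r ** 2, w2, mode="full")[: len(r)])
    return R1, R2

# fit volatility ~ beta0 + beta1*R1 + beta2*R2 by least squares on toy data
rng = np.random.default_rng(0)
r = 0.01 * rng.standard_normal(3000)
rv = np.abs(r)                                              # crude daily volatility proxy
R1, R2 = pdv_features(r)
X = np.column_stack([np.ones_like(R1), R1, R2])
beta, *_ = np.linalg.lstsq(X[:-1], rv[1:], rcond=None)      # predict next-day volatility
print(beta)
```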
Abstract
Industry classification schemes provide a taxonomy for segmenting companies based on their business activities. They are relied upon in industry and academia as an integral component of many types of financial and economic analysis. However, even modern classification schemes have failed to embrace the era of big data and remain a largely subjective undertaking prone to inconsistency and misclassification. To address this, we propose a multimodal neural model for training company embeddings, which harnesses the dynamics of both historical pricing data and financial news to learn objective company representations that capture nuanced relationships. We explain our approach in detail and highlight the utility of the embeddings through several case studies and application to the downstream task of industry classification.
Abstract
The financial economics and econometrics literature demonstrates that limit order book data is useful in predicting short-term volatility in stock markets. In this paper, we are interested in forecasting short-term realized volatility in a multivariate approach based on limit order book data and relational stock market networks. To achieve this goal, we introduce the Graph Transformer Network for Volatility Forecasting. The model allows combining limit order book features and a large number of temporal and cross-sectional relations from different sources. Through experiments based on about 500 stocks from the S&P 500 index, we find better performance for our model than for other benchmarks.
Abstract
In light of micro-scale inefficiencies induced by the high degree of fragmentation of the Bitcoin trading landscape, we utilize a granular data set comprised of orderbook and trades data from the most liquid Bitcoin markets, in order to understand the price formation process at sub-1 second time scales. To achieve this goal, we construct a set of features that encapsulate relevant microstructural information over short lookback windows. These features are subsequently leveraged first to generate a leader-lagger network that quantifies how markets impact one another, and then to train linear models capable of explaining between 10% and 37% of total variation in 500ms future returns (depending on which market is the prediction target). The results are then compared with those of various PnL calculations that take trading realities, such as transaction costs, into account. The PnL calculations are based on natural taker strategies (meaning they employ market orders) that we associate to each model. Our findings emphasize the role of a market’s fee regime in determining its propensity to being a leader or a lagger, as well as the profitability of our taker strategy. Taking our analysis further, we also derive a natural maker strategy (i.e., one that uses only passive limit orders), which, due to the difficulties associated with backtesting maker strategies, we test in a real-world live trading experiment, in which we turned over 1.5 million USD in notional volume. Lending additional confidence to our models, and by extension to the features they are based on, the results indicate a significant improvement over a naive benchmark strategy, which we also deploy in a live trading environment with real capital, for the sake of comparison.
Abstract
Many high-dimensional problems involve reconstruction of a low-rank matrix from incomplete and corrupted observations. Despite substantial progress in designing efficient estimation algorithms, it remains largely unclear how to assess the uncertainty of the obtained low-rank estimates, and how to construct valid yet short confidence intervals for the unknown low-rank matrix. In this talk, I will discuss how to perform inference and uncertainty quantification for two examples of low-rank models, (1) heteroskedastic PCA with missing data, and (2) noisy matrix completion. For both problems, we identify statistically efficient estimators that admit non-asymptotic distributional characterizations, which in turn enable optimal construction of confidence intervals for, say, the unseen entries of the low-rank matrix of interest. All this is accomplished by a powerful leave-one-out analysis framework that originated from probability and random matrix theory. This is based on joint work with Yuling Yan, Cong Ma, and Jianqing Fan.
Biography: Yuxin Chen is currently an associate professor in the Department of Statistics and Data Science at the University of Pennsylvania. Before joining UPenn, he was an assistant professor of electrical and computer engineering at Princeton University. He completed his Ph.D. in Electrical Engineering at Stanford University, and was also a postdoc scholar at Stanford Statistics. His current research interests include high-dimensional statistics, nonconvex optimization, and reinforcement learning. He has received the Alfred P. Sloan Research Fellowship, the ICCM best paper award (gold medal), the AFOSR and ARO Young Investigator Awards, the Google Research Scholar Award, and was selected as a finalist for the Best Paper Prize for Young Researchers in Continuous Optimization. He has also received the Princeton Graduate Mentoring Award.
Abstract
(Volatility forecasting) We apply machine learning models to forecast intraday realized volatility (RV), by exploiting commonality in intraday volatility via pooling stock data together, and by incorporating a proxy for the market volatility. Neural networks dominate linear regressions and tree models in terms of performance, due to their ability to uncover and model complex latent interactions among variables. Our findings remain robust when we apply trained models to new stocks that have not been included in the training set, thus providing new empirical evidence for a universal volatility mechanism among stocks. Finally, we propose a new approach to forecasting one-day-ahead RVs using past intraday RVs as predictors, and highlight interesting diurnal effects that aid the forecasting mechanism. The results demonstrate that the proposed methodology yields superior out-of-sample forecasts over a strong set of traditional baselines that only rely on past daily RVs.
Abstract
Spectral methods are simple but powerful approaches for extracting information from noisy data and have been widely used in various applications. In this talk, we demystify the success of spectral methods by establishing sharp theoretical guarantees for their performance in clustering and synchronization. (1) The first part of the talk is about a novel singular subspace perturbation analysis for spectral clustering. We consider two arbitrary matrices where one is a leave-one-column-out submatrix of the other one and establish a new perturbation upper bound for the distance between their corresponding singular subspaces. Powered by this tool, we obtain an explicit exponential error rate for the performance of spectral clustering in sub-Gaussian mixture models. (2) The second part of the talk is about the exact minimax optimality of a spectral method in the phase synchronization problem with additive Gaussian noises and incomplete data. We prove that it achieves the minimax lower bound of the problem with a matching leading constant under a squared l2 loss. This shows that the spectral method has the same performance as more sophisticated procedures, including maximum likelihood estimation, the generalized power method, and semidefinite programming, when consistent parameter estimation is possible.
Biography: Anderson Ye Zhang is an assistant professor in the Department of Statistics and Data Science at the University of Pennsylvania. Before joining Penn, he was a William H. Kruskal Instructor in the Department of Statistics at the University of Chicago. He completed his Ph.D. in Statistics and Data Science at Yale University. His research includes spectral analysis, network analysis, clustering, ranking, and synchronization.
Abstract
We propose a general modeling and algorithmic framework for discrete structure recovery that can be applied to a wide range of problems. Under this framework, we are able to study the recovery of clustering labels, ranks of players, signs of regression coefficients, cyclic shifts, and even group elements from a unified perspective. A simple iterative algorithm is proposed for discrete structure recovery, which generalizes methods including Lloyd's algorithm and the power method. A linear convergence result for the proposed algorithm is established in this paper under appropriate abstract conditions on stochastic errors and initialization. We illustrate our general theory by applying it to several representative problems, (1) clustering in the Gaussian mixture model, (2) approximate ranking, (3) sign recovery in compressed sensing, (4) multireference alignment, and (5) group synchronization, and show that the minimax rate is achieved in each case.
Biography: Chao Gao is an Assistant Professor in Statistics at the University of Chicago.
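One special case of the iterative scheme mentioned in the abstract above is Lloyd's algorithm for clustering in a Gaussian mixture model, started from a spectral initialization. The sketch below implements that special case on simulated data; the dimensions, separation, and initialization choice are illustrative assumptions, not the talk's general framework.

```python
import numpy as np

def lloyd_gaussian_mixture(X, k, n_iter=20, seed=0):
    """Spectral projection followed by Lloyd's assignment/update iterations."""
    rng = np.random.default_rng(seed)
    U, s, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
    proj = X @ Vt[:k].T                                    # project onto top-k subspace
    centers = proj[rng.choice(len(X), k, replace=False)]   # random initial centers
    for _ in range(n_iter):
        d = ((proj[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(axis=1)                          # assignment step
        new_centers = []
        for j in range(k):
            pts = proj[labels == j]
            new_centers.append(pts.mean(axis=0) if len(pts) else centers[j])
        centers = np.vstack(new_centers)                   # update step
    return labels

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(m, 1.0, size=(150, 20)) for m in (-1.0, 1.0)])
truth = np.repeat([0, 1], 150)
acc = (lloyd_gaussian_mixture(X, 2) == truth).mean()
print(max(acc, 1 - acc))                                   # accuracy up to label switching
```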
Abstract
Globally, capital markets have gone through a paradigm shift towards complete automation through artificial intelligence, turning it into a highly competitive area at the intersection of statistical models from various branches of machine learning. A principled understanding of the interactions between statistical models that operate in a common environment will soon be a key success factor for leaders in the field. In this talk I will first discuss the unique challenges of capital markets through the lens of machine learning and then provide an overview of how Borealis AI addresses them from an atomistic and a holistic point of view. In the second part of the talk I will focus on our recent work on continuous-time modeling of irregular time-series and describe an expressive differential deformation of the Wiener process using neural ordinary differential equations. Finally, we will see how an augmentation of this model with a latent process driven by a stochastic differential equation can further increase the flexibility of this system and allows us to capture non-Markovian dynamics.
Biography: Andreas Lehrmann is a machine learning researcher at Borealis AI. Previously, he held postdoctoral positions at Facebook Reality Labs and Disney Research. He received his Ph.D. at ETH Zurich and the Max-Planck-Institute for Intelligent Systems under a Microsoft Research scholarship.
Abstract
How many samples are needed to accurately learn the covariance matrix, C, of a distribution over d-dimensional vectors? In modern data applications where d is large, the answer is often unacceptably high: the sample complexity of covariance learning inherently depends poorly on dimension. In this talk I will discuss efforts to address this issue by designing data collection methods and learning algorithms which reduce complexity by leveraging a priori knowledge about the covariance matrix. Specifically, I will discuss the setting when C is known to have Toeplitz structure. Toeplitz covariance matrices arise in many applications, from time series analysis, to wireless communications, to medical imaging. In many of these applications, data collection is expensive, so reducing sample complexity is an important goal. We will start by taking a fresh look at classical and widely used algorithms, including methods based on selecting samples according to a sparse ruler. Then, I will introduce a novel sampling and estimation strategy that improves on existing methods in many settings. Our new approach for learning Toeplitz structured covariance utilizes tools from random matrix sketching, non-linear approximation theory, and sparse Fourier transform algorithms. It fits into a broader line of work which seeks to address problems in active learning using tools from theoretical computer science and randomized numerical linear algebra.
Biography: Chris Musco is an Assistant Professor at New York University in the Tandon School of Engineering.
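To make the classical sparse-ruler idea mentioned in the abstract above concrete, the sketch below reads each sample only at the positions of a sparse ruler and still estimates every lag of a Toeplitz covariance, because every lag appears as a difference of two ruler positions. The ruler, dimensions, and AR(1)-style covariance are toy assumptions, and the speaker's improved sampling strategy is not implemented.

```python
import numpy as np
from scipy.linalg import toeplitz

def toeplitz_cov_from_ruler(samples, ruler, d):
    """Average products over all ruler-position pairs realizing each lag,
    then assemble the Toeplitz covariance from the estimated autocovariances."""
    sums, counts = np.zeros(d), np.zeros(d)
    for x in samples:
        for ii, i in enumerate(ruler):
            for j in ruler[ii:]:
                lag = j - i
                sums[lag] += x[i] * x[j]
                counts[lag] += 1
    return toeplitz(sums / counts)

# {0, 1, 2, 5, 8, 9} is a sparse ruler for d = 10: every lag 0..9 is a difference
ruler, d = np.array([0, 1, 2, 5, 8, 9]), 10
C = toeplitz(0.7 ** np.arange(d))                     # AR(1)-style Toeplitz covariance
L = np.linalg.cholesky(C)
rng = np.random.default_rng(0)
samples = (L @ rng.standard_normal((d, 2000))).T      # 2000 Gaussian samples
C_hat = toeplitz_cov_from_ruler(samples, ruler, d)
print(np.abs(C_hat - C).max())                        # estimation error from ruler entries only
```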
Abstract
Estimated covariance matrices are widely used to construct portfolios with variance-minimizing optimization, yet the embedded sampling error produces portfolios with systematically underestimated variance. This effect is especially severe when the number of securities greatly exceeds the number of observations. In this high dimension low sample size (HL) regime, we show that a dispersion bias in the leading eigenvector of the estimated covariance matrix is a material source of distortion in the minimum variance portfolio. We correct the bias with the data-driven GPS (Global Positioning System) shrinkage estimator, which improves with the size of the market, and which is structurally identical to the James-Stein estimator for a collection of averages. We illustrate the power of the GPS estimator with a numerical example, and conclude with open problems that have emerged from our research.
Biography: Lisa Goldberg is Professor of the Practice of Economics at the University of California, Berkeley. She is the co-director of the Berkeley Consortium for Data Analytics in Risk. She is Head of Research at Aperio Group, now part of BlackRock.
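The dispersion bias described in the abstract above is easy to reproduce in simulation: with far fewer observations than assets, the leading eigenvector of the sample covariance matrix is noticeably more dispersed than the true one. The one-factor toy below illustrates only the bias; the GPS correction itself, which shrinks the eigenvector entries in a data-driven, James-Stein-like way, is not implemented, and all parameters are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 500, 60                                        # HL regime: many more assets than observations
beta = rng.uniform(0.8, 1.2, d)                       # market betas
b_true = beta / np.linalg.norm(beta)                  # true leading eigenvector of the covariance
returns = 0.04 * np.outer(rng.standard_normal(T), beta) + 0.08 * rng.standard_normal((T, d))

S = np.cov(returns, rowvar=False)                     # sample covariance (singular, d > T)
h = np.linalg.eigh(S)[1][:, -1]                       # leading sample eigenvector
h = np.sign(h @ b_true) * h                           # resolve the sign ambiguity
print("entry dispersion, true vs estimated:", b_true.std(), h.std())
print("alignment with true eigenvector:", float(h @ b_true))
```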
Abstract
What will happen to Y if we do A? A variety of meaningful social and engineering questions can be formulated this way: What will happen to a patient’s health if they are given a new therapy? What will happen to a country’s economy if policy-makers legislate a new tax? What will happen to a data center’s latency if a new congestion control protocol is used? We explore how to answer such counterfactual questions using observational data, which is increasingly available due to digitization and pervasive sensors, and/or very limited experimental data. The two key challenges are: (i) counterfactual prediction in the presence of latent confounders; (ii) estimation with modern datasets which are high-dimensional, noisy, and sparse. The key framework we introduce is connecting causal inference with tensor completion. In particular, we represent the various potential outcomes (i.e., counterfactuals) of interest through an order-3 tensor. The key theoretical results presented are: (i) Formal identification results establishing under what missingness patterns, latent confounding, and structure on the tensor is recovery of unobserved potential outcomes possible. (ii) Introducing novel estimators to recover these unobserved potential outcomes and proving they are finite-sample consistent and asymptotically normal. The efficacy of our framework is shown on high-impact applications. These include working with: (i) TauRx Therapeutics to identify patient sub-populations where their therapy was effective. (ii) Uber Technologies on evaluating the impact of driver engagement policies without running an A/B test. (iii) The Poverty Action Lab at MIT to make personalized policy recommendations to improve childhood immunization rates across villages in Haryana, India. Finally, we discuss connections between causal inference, tensor completion, and offline reinforcement learning.
Biography: Anish is currently a postdoctoral fellow at the Simons Institute at UC Berkeley. He did his PhD at MIT in EECS where he was advised by Alberto Abadie, Munther Dahleh, and Devavrat Shah. His research focuses on designing and analyzing methods for causal machine learning, and applying it to critical problems in social and engineering systems. He currently serves as a technical consultant to TauRx Therapeutics and Uber Technologies on questions related to experiment design and causal inference. Prior to the PhD, he was a management consultant at Boston Consulting Group. He received his BSc and MSc at Caltech.
Recording
About The Seminar
Organizers: Mihai Cucuringu, Shifan Yu, Chao Zhang
Acknowledgements: Website template from the Stanford MLSys Seminar Series