2023_Master's_thesis
- 2020.9 - 2022.12 | Beijing University of Aeronautics and Astronautics | MBA
- Master's thesis: "Research on Ensemble Learning Stock Selection under Industry Rotation." It uses LightGBM, XGBoost, and Random Forest to select stocks.
Abstract
Among quantitative trading strategies, multi-factor stock selection is a classical method. It identifies factors related to stock returns and combines them to estimate those returns. The author joined a quantitative institution, referred to here as Institution X, which also uses multi-factor models extensively in stock selection.
Industry rotation refers to the phenomenon in which different industries rise and fall at different times. Capturing industry rotation helps achieve excess returns. This thesis starts from the definition of industry rotation, tests whether it exists, constructs industry factors, and builds an industry prediction model to predict the industries with the largest monthly gains. First, it computes the mean change in the return rankings of the Shenwan first-level industries and finds that the mean change is about 9 places at weekly, monthly, and yearly frequencies, which indicates that industry rotation exists. Then it constructs valuation, profitability, and technical industry factors, and tests and screens them. Finally, it builds an industry prediction model to predict the industry with the largest monthly gain. The gain of the industry selected by the model is 1 percentage point higher than the average industry gain, and the cumulative net value of the selected industry consistently outperforms all industries as a whole from 2008.1.1 to 2021.12.31.
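The ranking statistic described above can be illustrated with a minimal sketch. It assumes the statistic is the average absolute change in each industry's return rank between consecutive periods; the abstract does not give the exact formula, and the industry names, returns, and function names below are hypothetical.

```python
def rank_desc(returns):
    """Rank industries by return, 1 = highest return. Ties are broken by name."""
    ordered = sorted(returns, key=lambda name: (-returns[name], name))
    return {name: pos + 1 for pos, name in enumerate(ordered)}


def mean_rank_change(periods):
    """Average absolute rank change per industry between consecutive periods.

    Assumed definition, not taken from the thesis.
    """
    ranks = [rank_desc(p) for p in periods]
    changes = []
    for prev, curr in zip(ranks, ranks[1:]):
        changes.extend(abs(curr[name] - prev[name]) for name in prev)
    return sum(changes) / len(changes)


# Hypothetical toy data: returns of four industries over three periods.
toy_periods = [
    {"Banks": 0.03, "Coal": 0.01, "Media": -0.02, "Steel": 0.00},
    {"Banks": -0.01, "Coal": 0.02, "Media": 0.04, "Steel": 0.00},
    {"Banks": 0.01, "Coal": -0.03, "Media": 0.02, "Steel": 0.05},
]

print(mean_rank_change(toy_periods))  # 1.5
```

A value near zero would mean the rankings barely move between periods; a large value relative to the number of industries means leadership changes frequently.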
Incorporating the industry factors, this thesis constructs an ensemble learning stock selection model. The ensemble includes Random Forest, XGBoost, and LightGBM, and the prediction period runs from 2015.1.1 to 2021.12.31. The factors include price-volume, financial, and technical factors from Institution X, as well as the industry factors. The results show that the ensemble learning stock selection model with industry factors earns higher returns than the CSI 300 and also outperforms similar models that do not consider industry factors. The model performs better at weekly frequency than at monthly frequency. In the factor importance ranking, market capitalization and industry factors are the most important, followed by price-volume and technical factors, while financial factors rank lower.
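The abstract does not say how the predictions of the three models are combined. One common option, assumed here purely for illustration, is to average each stock's rank across models and select the top k stocks. The tickers, scores, and function names below are hypothetical, and the scores stand in for predictions that would come from trained Random Forest, XGBoost, and LightGBM models.

```python
def to_ranks(scores):
    """Map stock -> rank (1 = highest predicted return). Ties are broken by ticker."""
    ordered = sorted(scores, key=lambda s: (-scores[s], s))
    return {s: i + 1 for i, s in enumerate(ordered)}


def select_top_k(model_scores, k):
    """Average each stock's rank across models and return the k best stocks.

    Rank averaging is an assumed combination rule, not the thesis's stated method.
    """
    rank_tables = [to_ranks(scores) for scores in model_scores.values()]
    stocks = list(rank_tables[0])
    avg_rank = {s: sum(t[s] for t in rank_tables) / len(rank_tables) for s in stocks}
    return sorted(stocks, key=lambda s: (avg_rank[s], s))[:k]


# Hypothetical predicted returns for four stocks from three models.
toy_scores = {
    "random_forest": {"A": 0.020, "B": 0.010, "C": -0.010, "D": 0.005},
    "xgboost": {"A": 0.015, "B": 0.030, "C": 0.000, "D": -0.020},
    "lightgbm": {"A": 0.010, "B": 0.020, "C": -0.005, "D": 0.000},
}

picks = select_top_k(toy_scores, k=2)
print(picks)  # ['B', 'A']
```

Averaging ranks rather than raw scores keeps one model with a wider output scale from dominating the selection.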
Since the data and backtesting module come from Institution X, this thesis also offers Institution X some suggestions for improvement.