Home
Welcome to the ML_AccountingFraud wiki!
This paper introduces a new fraud prediction model to the accounting literature using machine learning (ML). We adopt a methodology, which we refer to as LogitBoost, that combines ensemble learning, one of the most powerful ML methods, with logistic regression. The methodology thus brings together ML methods recently introduced in accounting research and the commonly used logistic regression. Using seven alternative measures of the ability to detect fraud, we show that our model outperforms methods based solely on logistic regression as well as the other ML methods used in prior literature. Our model also outperforms the others in predicting fraud beyond the current accounting period. Importantly, our method relies on fewer predictors than prior ML research, which reduces concerns over multicollinearity and the potential overfitting associated with machine learning methods.
N.B.: A preprint will be released in early May. If you would like a preview, please email me.
This project aims to create an early warning system that uses financial ratios from annual statements to detect misstatements leading to an SEC Accounting and Auditing Enforcement Release (AAER).
We first evaluate two alternative sets of predictors for multicollinearity. We then train eight machine learning models (SVM with a financial kernel as in Cecchini et al. 2010, RUSBoost as in Bao et al. 2020, SVM, Logistic Regression as in Dechow et al. 2011, AdaBoost with Logistic Regression, Artificial Neural Network, and a weighted average of all the preceding techniques, which we call fused ML).
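One common way to screen a set of predictors for multicollinearity is the variance inflation factor (VIF). The sketch below is only an illustration under stated assumptions: the wiki does not say which diagnostic the project uses, the function name `variance_inflation_factors` is hypothetical, and the data are synthetic rather than taken from financial statements. It relies on the identity that the VIF of predictor j equals the j-th diagonal element of the inverse of the predictors' correlation matrix.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return one VIF per column of X (rows = observations, columns = predictors).

    Hypothetical helper, not the project's code. Perfectly collinear columns
    make the correlation matrix singular, so the inversion would fail.
    """
    X = np.asarray(X, dtype=float)
    corr = np.corrcoef(X, rowvar=False)   # predictor-by-predictor correlations
    return np.diag(np.linalg.inv(corr))   # VIF_j = [corr^-1]_jj


# Synthetic example (assumption): column 2 is almost a copy of column 0,
# while column 1 is unrelated to both.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + 0.05 * rng.normal(size=500)

vifs = variance_inflation_factors(np.column_stack([a, b, c]))
print(vifs)  # columns 0 and 2 get large VIFs, column 1 stays close to 1
```

A frequently quoted rule of thumb treats a VIF above roughly 10 as a sign of problematic collinearity, although the appropriate threshold depends on the application.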
We find that using raw quantitative figures from financial statements leads to high multicollinearity. AdaBoost with logistic regression as the base model has the highest sensitivity in the top 1 percentile of predictions. The same model is able to predict AAERs up to 4 years ahead of the SEC.
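As a rough illustration of the evaluation measure mentioned above, sensitivity in the top 1 percentile can be read as the share of all true fraud cases that fall among the 1% of observations with the highest predicted scores. This reading is an assumption, since the wiki does not spell out the definition, and the function name `sensitivity_at_top` and the toy data are hypothetical.

```python
import math


def sensitivity_at_top(scores, labels, fraction=0.01):
    """Share of all positives (label 1) found among the top `fraction` of scores.

    Hypothetical helper; the project's exact definition may differ.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    total_fraud = sum(labels)
    if total_fraud == 0:
        raise ValueError("sensitivity is undefined without any positive label")
    # Number of observations flagged: at least one, rounded up.
    n_flagged = max(1, math.ceil(fraction * len(scores)))
    # Indices sorted from highest to lowest predicted score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    caught = sum(labels[i] for i in ranked[:n_flagged])
    return caught / total_fraud


# Toy data (assumption): 200 firm-years, 4 of which are fraudulent.
scores = [i / 200 for i in range(200)]   # higher index = higher predicted risk
labels = [0] * 200
for fraud_index in (10, 50, 198, 199):
    labels[fraud_index] = 1

top1_sensitivity = sensitivity_at_top(scores, labels, fraction=0.01)
print(top1_sensitivity)
```

In this toy setup the top 1% consists of the two highest-scored observations, both fraudulent, so two of the four fraud cases are captured.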