Arman Hassanniakalager edited this page May 10, 2022 · 9 revisions

Welcome to the ML_AccountingFraud wiki!

Abstract

This paper introduces a new fraud prediction model to the accounting literature using machine learning (ML). We adopt a methodology that combines ensemble learning, one of the most powerful ML methods, with logistic regression; we refer to the combined approach as LogitBoost. The methodology thus brings together ML methods recently introduced in accounting research and the commonly used logistic regression. Using seven alternative measures of the ability to detect fraud, we show that our model outperforms methods based solely on logistic regression and the other ML methods used in prior literature. Our model also outperforms the others in predicting fraud beyond the current accounting period. Importantly, our method relies on fewer predictors than prior ML research, thus minimizing concerns over multicollinearity and the potential overfitting associated with machine learning methods.
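LogitBoost, as described above, pairs ensemble learning with logistic regression, and the How? section lists it as AdaBoost with Logistic Regression. The snippet below is a minimal sketch, assuming classic discrete AdaBoost with a gradient-descent logistic regression as the base model. It is not the authors' implementation; the function names, step size, round count and synthetic data are all hypothetical.

```python
import numpy as np

def fit_weighted_logit(X, y, w, lr=0.3, n_iter=500):
    """Base model: logistic regression fit by gradient descent on a
    sample-weighted log-loss. y is in {0, 1}; the weights w sum to 1."""
    coef = np.zeros(X.shape[1])
    intercept = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ coef + intercept)))
        resid = w * (p - y)              # gradient of the weighted log-loss w.r.t. the logit
        coef -= lr * (X.T @ resid)
        intercept -= lr * resid.sum()
    return coef, intercept

def fit_adaboost_logit(X, y, n_rounds=10):
    """Discrete AdaBoost using the weighted logistic regression above as base model."""
    n = len(y)
    y_pm = 2 * y - 1                     # recode labels {0, 1} -> {-1, +1}
    w = np.full(n, 1.0 / n)              # start from uniform sample weights
    models = []
    for _ in range(n_rounds):
        coef, b = fit_weighted_logit(X, y, w)
        pred = np.where(X @ coef + b > 0, 1, -1)
        err = np.clip(w[pred != y_pm].sum(), 1e-10, 1 - 1e-10)
        if err >= 0.5:                   # base model no better than chance: stop boosting
            break
        alpha = 0.5 * np.log((1 - err) / err)
        w = w * np.exp(-alpha * y_pm * pred)   # up-weight misclassified cases
        w /= w.sum()
        models.append((alpha, coef, b))
    return models

def boosted_score(X, models):
    """Weighted vote of the base models; larger values mean 'more likely fraud'."""
    score = np.zeros(len(X))
    for alpha, coef, b in models:
        score += alpha * np.where(X @ coef + b > 0, 1, -1)
    return score

# Usage on synthetic data (hypothetical; not the paper's sample)
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))
y = (X[:, 0] + X[:, 1] + 0.3 * rng.normal(size=400) > 0).astype(int)
models = fit_adaboost_logit(X, y)
train_acc = np.mean((boosted_score(X, models) > 0) == (y == 1))
```

The weighted logistic regression is written by hand only so that the sketch depends on NumPy alone. In scikit-learn, a comparable setup passes a `LogisticRegression` instance to `AdaBoostClassifier` through its `estimator` parameter (named `base_estimator` before version 1.2).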

N.B.: A preprint will be released in early May. If you are interested in a preview, please email me.

What?

This project aims to build an early warning system that uses financial ratios from annual statements to detect misstatements that lead to an SEC Accounting and Auditing Enforcement Release (AAER).

How?

We first evaluate two alternative sets of predictors for multicollinearity. We then train eight machine learning models (SVM with a financial kernel as in Cecchini et al. (2010), RUSBoost as in Bao et al. (2020), SVM, Logistic Regression as in Dechow et al. (2011), AdaBoost with Logistic Regression, Artificial Neural Network, and a weighted average of all previous techniques, named fused ML).
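The wiki does not say which multicollinearity diagnostic is used to compare the two predictor sets. As an illustration only, the sketch below assumes the variance inflation factor (VIF), computed with NumPy least squares; a VIF well above 5 to 10 is a common rule-of-thumb warning sign. The data are synthetic and the column names hypothetical: they mimic raw figures that all scale with firm size, next to the same figures scaled by total assets.

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column of X (n observations x d predictors).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from an OLS regression of
    column j on the remaining columns plus an intercept.
    """
    n, d = X.shape
    out = np.empty(d)
    for j in range(d):
        target = X[:, j]
        design = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ beta
        r2 = 1.0 - (resid @ resid) / ((target - target.mean()) ** 2).sum()
        out[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out

# Hypothetical illustration: raw figures that all scale with firm size
rng = np.random.default_rng(1)
total_assets = rng.lognormal(mean=5.0, sigma=1.5, size=500)
revenue = total_assets * rng.uniform(0.5, 1.5, size=500)
receivables = total_assets * rng.uniform(0.05, 0.2, size=500)

raw_vif = vif(np.column_stack([total_assets, revenue, receivables]))
ratio_vif = vif(np.column_stack([revenue / total_assets, receivables / total_assets]))
```

In this toy setup the raw levels share a common size factor, so their VIFs come out high, while the asset-scaled ratios are close to independent and their VIFs stay near 1.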

Results

We find that using raw quantitative figures from financial statements as predictors leads to high multicollinearity. AdaBoost with LR as the base model achieves the highest sensitivity in the top 1 percent of firm-years ranked by predicted fraud risk. The same model can predict AAERs up to four years ahead of the SEC.
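Sensitivity at the top 1 percentile can be read as follows: rank all firm-years by predicted fraud score, flag the top 1%, and report the share of all actual AAER cases that fall among the flagged firm-years. A minimal sketch under that reading; the function name and toy data are hypothetical.

```python
import numpy as np

def sensitivity_at_top(scores, y, top_frac=0.01):
    """Share of all actual fraud cases (y == 1) that land in the top `top_frac`
    of observations when ranked by predicted fraud score (highest first)."""
    scores = np.asarray(scores, dtype=float)
    y = np.asarray(y, dtype=int)
    n_pos = y.sum()
    if n_pos == 0:
        return float("nan")                          # undefined without any fraud cases
    n_flag = max(1, round(top_frac * len(scores)))   # size of the flagged group
    flagged = np.argsort(-scores, kind="stable")[:n_flag]
    return float(y[flagged].sum() / n_pos)

# Toy example: 200 firm-years, 4 of them fraudulent
scores = np.arange(200, dtype=float)   # hypothetical model scores
y = np.zeros(200, dtype=int)
y[[199, 198, 50, 10]] = 1
sens = sensitivity_at_top(scores, y)   # top 1% = 2 firm-years, both fraud -> 0.5
```

Because the metric only looks at the highest-ranked slice, it rewards models that concentrate true AAER cases at the very top of the ranking, which is where an early warning system would direct scarce investigative effort.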
