Arman Hassanniakalager edited this page May 24, 2022 · 9 revisions

Welcome to the ML_AccountingFraud wiki!

Abstract

This paper introduces a new fraud prediction model to the accounting literature using machine learning (ML). This model, which we refer to as LogitBoost, combines ensemble learning, one of the most powerful families of ML methods, with logistic regression. Using seven alternative measures of fraud-detection performance, we show that our model outperforms methods based solely on logistic regression as well as the other ML methods used in prior literature. Additionally, our model outperforms the others in predicting fraud beyond the current accounting period. Importantly, our method relies on fewer predictors than prior ML research, which mitigates concerns about multicollinearity and the potential overfitting associated with ML methods.
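To make the idea of combining boosting with logistic regression concrete, here is a minimal sketch of discrete AdaBoost whose base learner is a sample-weighted logistic regression fitted by gradient descent. It assumes NumPy and binary labels (1 = AAER firm-year, 0 = otherwise). The function names, learning rate, iteration counts and the signed-vote score are illustrative assumptions, not the paper's LogitBoost code.

```python
import numpy as np

def _add_intercept(X):
    # Prepend a column of ones so the first coefficient acts as the intercept.
    return np.hstack([np.ones((X.shape[0], 1)), X])

def fit_weighted_logit(X, y, w, lr=0.5, n_iter=500):
    """Fit logistic-regression coefficients by gradient descent on the
    sample-weighted log-loss. y holds 0/1 labels; w holds weights summing to 1."""
    Xb = _add_intercept(X)
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ beta))  # current predicted probabilities
        beta -= lr * (Xb.T @ (w * (p - y)))   # step against the weighted gradient
    return beta

def predict_proba(beta, X):
    return 1.0 / (1.0 + np.exp(-_add_intercept(X) @ beta))

def adaboost_logit(X, y, n_rounds=10):
    """Discrete AdaBoost with a weighted logistic regression as base learner."""
    n = len(y)
    w = np.full(n, 1.0 / n)  # start from uniform sample weights
    learners, alphas = [], []
    for _ in range(n_rounds):
        beta = fit_weighted_logit(X, y, w)
        wrong = (predict_proba(beta, X) >= 0.5).astype(int) != y
        err = np.clip(w[wrong].sum(), 1e-10, 1 - 1e-10)  # weighted error rate
        if err >= 0.5:  # no better than chance on the weighted sample: stop
            break
        alpha = 0.5 * np.log((1 - err) / err)  # this learner's vote weight
        # Up-weight misclassified firm-years and down-weight the rest.
        w = w * np.exp(np.where(wrong, alpha, -alpha))
        w /= w.sum()
        learners.append(beta)
        alphas.append(alpha)
    return learners, alphas

def ensemble_score(learners, alphas, X):
    """Alpha-weighted vote scaled to [-1, 1]; higher means more fraud-like."""
    if not alphas:
        raise ValueError("no base learner was accepted")
    votes = sum(a * (2 * (predict_proba(b, X) >= 0.5) - 1)
                for b, a in zip(learners, alphas))
    return votes / sum(alphas)
```

Each round refits the logistic regression on reweighted data, so later rounds concentrate on the firm-years that earlier rounds misclassified. A ranking-oriented variant could average the base learners' probabilities instead of their hard votes.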

You can find a copy of the manuscript on SSRN.

What?

This project aims to build an early-warning system that uses financial ratios from annual statements to detect misstatements that lead to an SEC Accounting and Auditing Enforcement Release (AAER).

How?

We first evaluate two alternative sets of predictors for multicollinearity. We then train the following machine learning models: SVM with a financial kernel (as in Cecchini et al. 2010), RUSBoost (as in Bao et al. 2020), a standard SVM, logistic regression (as in Dechow et al. 2011), AdaBoost with logistic regression as the base learner, an artificial neural network, and a weighted average of all the preceding techniques, which we call fused ML.
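One common way to run a multicollinearity check is the variance inflation factor (VIF); the page does not say which diagnostic the authors use, so treat this as an assumption. The sketch below, assuming NumPy, relies on the identity that the VIF of predictor j equals the j-th diagonal entry of the inverse of the predictors' correlation matrix. The simulated "assets", "revenue" and "liabilities" figures are hypothetical stand-ins, not the project's data.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return the VIF of each column of X (rows = firm-years, columns = predictors).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the
    others; this equals the j-th diagonal entry of the inverse correlation matrix.
    """
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

# Hypothetical simulated data: the raw figures all scale with firm size.
rng = np.random.default_rng(1)
assets = rng.lognormal(mean=10, sigma=1, size=500)
revenue = 0.8 * assets * rng.lognormal(0, 0.05, size=500)
liabilities = 0.5 * assets * rng.lognormal(0, 0.05, size=500)

raw = np.column_stack([assets, revenue, liabilities])
ratios = np.column_stack([revenue / assets, liabilities / assets])  # scale-free ratios

vif_raw = variance_inflation_factors(raw)        # very large: near-collinear columns
vif_ratios = variance_inflation_factors(ratios)  # close to 1
print(vif_raw, vif_ratios)
```

Dividing by a size measure removes the shared firm-size component, which is one plausible reason ratio-based predictors show far less multicollinearity than raw statement figures.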

Results

We find that using raw quantitative figures from financial statements as predictors leads to high multicollinearity. AdaBoost with logistic regression as the base learner achieves the highest sensitivity in the top 1 percentile of predicted fraud risk. The same model can predict AAERs up to four years ahead of the SEC.
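Sensitivity at the top 1 percentile can be read as the share of all AAER firm-years that land among the 1 percent of observations with the highest predicted fraud score. The standard-library sketch below follows that reading; the function name `sensitivity_at_top` and rounding the cutoff up to a whole observation are assumptions, not necessarily the paper's exact definition.

```python
import math

def sensitivity_at_top(scores, labels, top_frac=0.01):
    """Share of all fraud cases (label 1) that fall in the top `top_frac`
    of observations when ranked by predicted score, highest first."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_frauds = sum(labels)
    if n_frauds == 0:
        raise ValueError("no positive cases; sensitivity is undefined")
    # Number of observations flagged as high risk (at least one).
    k = max(1, math.ceil(top_frac * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    caught = sum(labels[i] for i in ranked[:k])  # frauds inside the flagged group
    return caught / n_frauds
```

For example, with 1,000 firm-years the top 1 percent is the 10 highest-scoring observations; if 2 of 4 AAER cases fall among them, sensitivity is 0.5.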
