Can Words Reveal Fraud? A Lexicon Approach to Detecting Fraudulent Financial Reporting

The study introduces a fraud lexicon and uses it as the feature set for a Balanced Random Forest classifier that detects fraudulent financial reporting. Across multiple samples from 2000 to 2017, the classifier predicts fraud accurately, outperforming random guessing by 40 to 48 percent. The lexicon lends itself to "bag-of-words" analysis and can help researchers, practitioners, auditors, regulators, and investors enhance their fraud risk assessment procedures.
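
To make the "bag-of-words" idea concrete, the following is a minimal sketch of how a lexicon might be turned into a feature vector for one document. The lexicon entries, the function name `lexicon_features`, and the choice of relative word frequency as the feature value are all assumptions made for illustration; they are not taken from the study.

```python
import re
from collections import Counter

# Hypothetical lexicon entries, for illustration only (not the study's lexicon).
FRAUD_LEXICON = ["restatement", "litigation", "impairment", "allegation"]


def lexicon_features(text, lexicon=FRAUD_LEXICON):
    """Return one relative-frequency feature per lexicon word.

    Word order is ignored, which is what makes this a bag-of-words
    representation: only how often each lexicon word occurs matters.
    """
    # Lowercase the text and keep runs of letters as tokens.
    tokens = re.findall(r"[a-z]+", text.lower())
    total = len(tokens) or 1  # avoid division by zero for empty text
    counts = Counter(tokens)
    # One feature per lexicon entry: occurrences divided by document length.
    return [counts[word] / total for word in lexicon]


if __name__ == "__main__":
    sample = "The restatement followed litigation; a second restatement is likely."
    print(lexicon_features(sample))
```

Vectors of this kind, computed for each filing, could then be passed to a classifier designed for imbalanced classes, for example the `BalancedRandomForestClassifier` in the imbalanced-learn package; whether the study used that particular implementation is not stated here.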