Methodology: Regulatory Enforcement Action Analysis
This document describes the methodology used to collect, analyse, and report on regulatory enforcement actions with an AML/compliance focus.
Overview
The analysis pipeline has three stages:
- Document Ingestion — Enforcement action documents are processed by an AI analyst to extract structured findings.
- Statistical Analysis — Structured data is aggregated across all actions to identify patterns.
- Report Generation — Statistical outputs and AI-generated narratives are assembled into a final report.
Stage 1: Document Ingestion
Source Documents
Source materials are publicly available enforcement action documents — orders, notices of assessment, and settlement agreements — published by regulators including OFAC, FinCEN, FCA, MAS, FINTRAC, ACPR, DNB, and others.
AI-Powered Extraction
Each document is submitted in full to a large language model instructed to act as an expert compliance analyst. The instructions restrict the model to findings that are explicitly stated or strongly implied in the document, and they prohibit fabrication.
The extraction covers eight analytical areas:
Industry Classification: The primary industry of the sanctioned entity is identified from a defined taxonomy (Banking, Broker-Dealer, VASP, FinTech, Insurance, Gaming, Corporate/Non-Financial, Multi-sector), with a confidence level assigned.
Compliance Domain Findings: Violations are mapped to four primary AML compliance domains:
- KYC & Onboarding — identity verification, beneficial ownership, risk rating, periodic review
- Transaction Monitoring — scenario coverage, threshold calibration, model validation
- Sanctions Screening — list management, screening frequency, match adjudication, ownership calculations
- Investigations & Reporting — alert disposition, SAR filing quality, regulatory reporting, record retention
For each domain, findings capture the gap identified, the root cause, the business impact, and a priority level.
Priority Assignment: Each finding is assigned a priority using a deterministic decision tree:
- High — penalty attributable to this gap exceeds $10M; regulator labelled the conduct “egregious”; the gap enabled actual illegal activity; or a criminal referral was made
- Medium — significant control weakness attracting explicit regulatory criticism; contributed to the penalty but not the primary driver
- Low — documentation or procedural gap only; technical non-compliance with no evidence of exploitation
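The priority rules above can be sketched as a short function. This is a minimal illustration, not the pipeline's actual implementation: the `FindingSignals` fields are hypothetical names for the facts the rules refer to, and the sketch assumes that either Medium criterion alone is sufficient, which the rule text does not state.

```python
from dataclasses import dataclass

HIGH_PENALTY_THRESHOLD_USD = 10_000_000  # the "$10M" in the High criterion


@dataclass
class FindingSignals:
    """Hypothetical per-finding flags; field names are illustrative only."""
    attributable_penalty_usd: float = 0.0
    labelled_egregious: bool = False
    enabled_illegal_activity: bool = False
    criminal_referral: bool = False
    explicit_regulatory_criticism: bool = False
    contributed_to_penalty: bool = False


def assign_priority(s: FindingSignals) -> str:
    # High: any one of the four High criteria is enough.
    if (
        s.attributable_penalty_usd > HIGH_PENALTY_THRESHOLD_USD
        or s.labelled_egregious
        or s.enabled_illegal_activity
        or s.criminal_referral
    ):
        return "High"
    # Medium (assumption): either Medium criterion alone qualifies.
    if s.explicit_regulatory_criticism or s.contributed_to_penalty:
        return "Medium"
    # Low: everything else, i.e. documentation or procedural gaps.
    return "Low"
```

Because the checks run from High to Low, a finding that meets both a High and a Medium criterion is always rated High.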
Root Cause Classification: Root causes are classified into eight categories:
- Insufficient Technology
- Process Design Flaw
- Resource Constraints
- Governance Failure
- Data Quality Issues
- Cultural/Tone Issues
- External Factors (rapid growth, M&A, market disruption)
- Information Siloing
Solution Roadmap: Remediation actions are identified across three time horizons — immediate fix, tactical solution, and strategic transformation — each with associated success metrics.
AI Opportunity Assessment: AI/ML opportunities are identified for each gap and assessed on technical feasibility, regulatory and risk factors, and business case. Each opportunity records the technology type, expected benefit, estimated ROI timeline, and key implementation risks.
Key Facts Extraction: Penalty mechanics are extracted including base amount, adjustments, aggravating factors, mitigating factors, and voluntary self-disclosure status.
Human-Readable Summary: A concise narrative is generated covering entity, regulator, penalty, violation summary, and egregious/non-egregious classification.
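One way to picture the structured output of the extraction is as a small validated record. The sketch below is an assumption about shape only: the class and field names are hypothetical, and it covers just the industry classification and domain findings, not all eight areas.

```python
from dataclasses import dataclass, field

# Values taken from the taxonomy and domain lists described above.
INDUSTRIES = {
    "Banking", "Broker-Dealer", "VASP", "FinTech", "Insurance",
    "Gaming", "Corporate/Non-Financial", "Multi-sector",
}
DOMAINS = {
    "KYC & Onboarding", "Transaction Monitoring",
    "Sanctions Screening", "Investigations & Reporting",
}
PRIORITIES = {"High", "Medium", "Low"}


@dataclass
class Finding:
    """Hypothetical record for one compliance domain finding."""
    domain: str
    gap: str
    root_cause: str
    business_impact: str
    priority: str

    def __post_init__(self):
        # Reject values outside the fixed vocabularies.
        if self.domain not in DOMAINS:
            raise ValueError(f"unknown domain: {self.domain!r}")
        if self.priority not in PRIORITIES:
            raise ValueError(f"unknown priority: {self.priority!r}")


@dataclass
class ExtractedAction:
    """Hypothetical record for one enforcement action."""
    entity: str
    regulator: str
    industry: str
    industry_confidence: str
    findings: list[Finding] = field(default_factory=list)

    def __post_init__(self):
        if self.industry not in INDUSTRIES:
            raise ValueError(f"unknown industry: {self.industry!r}")
```

Validating against fixed vocabularies at construction time means a mislabelled domain or industry fails early, before it reaches the statistical stage.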
Stage 2: Statistical Analysis
Data Cleaning
Before any aggregation, the raw data is cleaned:
- Penalty amounts are converted to USD using exchange rates current at the time of the analysis run.
- Country and regulator names are standardised for consistent display.
- Root cause labels are normalised to the canonical eight categories, correcting for variation in how the extraction model phrased them.
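Root cause label normalisation could be sketched as follows. The matching rule (ignore case, spacing, and punctuation) and the alias table are assumptions made for illustration; the pipeline's actual normalisation rules are not described here.

```python
import re
from typing import Optional

CANONICAL_ROOT_CAUSES = (
    "Insufficient Technology",
    "Process Design Flaw",
    "Resource Constraints",
    "Governance Failure",
    "Data Quality Issues",
    "Cultural/Tone Issues",
    "External Factors",
    "Information Siloing",
)


def _key(label: str) -> str:
    """Lowercase the label and keep letters only, so spacing and punctuation are ignored."""
    return re.sub(r"[^a-z]", "", label.lower())


_LOOKUP = {_key(c): c for c in CANONICAL_ROOT_CAUSES}

# Hypothetical aliases for phrasings a model might produce.
_ALIASES = {
    _key("Information Silos"): "Information Siloing",
    _key("Data Quality"): "Data Quality Issues",
}


def normalise_root_cause(label: str) -> Optional[str]:
    """Return the canonical category, or None if the label is not recognised."""
    k = _key(label)
    return _LOOKUP.get(k) or _ALIASES.get(k)
```

Returning `None` for unrecognised labels, rather than guessing, leaves those cases visible for manual review.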
Currency Conversion
All penalty amounts are converted to USD. Exchange rates are refreshed from a live source before each analysis run to ensure comparability across jurisdictions. Rates are point-in-time snapshots and do not account for exchange rate movements over the full violation period.
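A minimal sketch of the conversion step, assuming the refreshed rates arrive as a mapping from currency code to USD per unit. The function name and the rate values are hypothetical placeholders, not real exchange rates, and the live rate source is not modelled.

```python
from decimal import Decimal, ROUND_HALF_UP

# Placeholder values for illustration only; NOT real exchange rates.
PLACEHOLDER_USD_PER_UNIT = {
    "GBP": Decimal("1.25"),
    "SGD": Decimal("0.75"),
}


def to_usd(amount: Decimal, currency: str, usd_per_unit: dict[str, Decimal]) -> Decimal:
    """Convert a penalty amount to USD using a point-in-time rate snapshot."""
    if currency == "USD":
        return amount
    try:
        rate = usd_per_unit[currency]
    except KeyError:
        # Fail loudly rather than silently mixing currencies in the totals.
        raise ValueError(f"no exchange rate for {currency!r}") from None
    # Round to whole cents.
    return (amount * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

`Decimal` is used instead of `float` so that large penalty amounts do not pick up binary rounding error during conversion.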
Quantitative Analyses
The following analyses are computed across the full dataset:
Penalties by Region: Total and average penalty, and count of enforcement actions, per country.
Compliance Theme Frequency: Count of findings per compliance domain and the number of distinct entities with findings in each domain.
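The two counts differ when one entity has several findings in the same domain. The sketch below illustrates this, assuming each finding is available as an `(entity, domain)` pair; the function name and input shape are hypothetical.

```python
from collections import defaultdict


def theme_frequency(findings):
    """findings: iterable of (entity, domain) tuples, one tuple per finding."""
    counts = defaultdict(int)
    entities = defaultdict(set)
    for entity, domain in findings:
        counts[domain] += 1           # every finding counts
        entities[domain].add(entity)  # each entity counts once per domain
    return {
        d: {"findings": counts[d], "entities": len(entities[d])}
        for d in counts
    }
```

For example, two KYC findings against one entity contribute 2 to the finding count but only 1 to the entity count.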
Root Cause Frequency: Count and percentage share of each root cause category across all findings.
Penalties by Regulator and Industry: Total and average penalty, and count of enforcement actions, computed separately by regulator and by industry sector.
Priority Distribution: Cross-tabulation of finding priority (High / Medium / Low) by compliance domain.
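A cross-tabulation of this kind can be built with the standard library alone. The sketch assumes each finding is a dictionary with hypothetical `domain` and `priority` keys; the same pattern would apply to the root cause by domain table.

```python
from collections import Counter

PRIORITIES = ("High", "Medium", "Low")


def priority_by_domain(findings):
    """Return {domain: {priority: count}} with zero-filled cells."""
    findings = list(findings)
    counts = Counter((f["domain"], f["priority"]) for f in findings)
    domains = sorted({f["domain"] for f in findings})
    # Counter returns 0 for unseen (domain, priority) pairs.
    return {d: {p: counts[(d, p)] for p in PRIORITIES} for d in domains}
```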
Multi-Domain Violation Patterns: Co-occurrence analysis of compliance domains — how frequently pairs of domains fail together within the same enforcement action, identifying systemic cross-functional weaknesses.
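The pairwise co-occurrence count can be sketched as below, assuming the set of domains with findings is known for each enforcement action. The function name, input shape, and action identifiers are hypothetical.

```python
from collections import Counter
from itertools import combinations


def domain_cooccurrence(actions: dict[str, set[str]]) -> Counter:
    """Count how many actions contain each unordered pair of domains."""
    pairs = Counter()
    for domains in actions.values():
        # Sorting gives each pair one canonical order, so (A, B) and (B, A) merge.
        for pair in combinations(sorted(domains), 2):
            pairs[pair] += 1
    return pairs


example = {
    "action-1": {"KYC & Onboarding", "Transaction Monitoring"},
    "action-2": {"KYC & Onboarding", "Transaction Monitoring", "Sanctions Screening"},
    "action-3": {"Sanctions Screening"},
}
cooccurrence = domain_cooccurrence(example)
# ("KYC & Onboarding", "Transaction Monitoring") occurs in 2 of the 3 actions.
```

Because each action contributes a set of domains, a pair is counted at most once per action regardless of how many findings each domain has.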
Root Cause by Domain: Cross-tabulation of root cause categories against compliance domains.
Penalty by Root Cause: Each action's penalty is allocated across its findings in proportion to priority weight (High 3×, Medium 2×, Low 1×), and the allocated amounts are then aggregated by root cause category. This shows which root causes drive the greatest financial exposure.
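As a worked illustration of the weighting, each finding receives the action's penalty multiplied by its weight and divided by the sum of weights for that action. The sketch below assumes hypothetical dictionary keys (`penalty_usd`, `findings`, `priority`, `root_cause`); only the 3/2/1 weights come from the methodology.

```python
from collections import defaultdict

PRIORITY_WEIGHTS = {"High": 3, "Medium": 2, "Low": 1}


def penalty_by_root_cause(actions):
    """Allocate each action's penalty across its findings, then sum by root cause."""
    totals = defaultdict(float)
    for action in actions:
        findings = action["findings"]
        total_weight = sum(PRIORITY_WEIGHTS[f["priority"]] for f in findings)
        if total_weight == 0:
            continue  # no findings, so nothing to allocate
        for f in findings:
            weight = PRIORITY_WEIGHTS[f["priority"]]
            totals[f["root_cause"]] += action["penalty_usd"] * weight / total_weight
    return dict(totals)


example = [{
    "penalty_usd": 6_000_000,
    "findings": [
        {"priority": "High", "root_cause": "Governance Failure"},
        {"priority": "Medium", "root_cause": "Insufficient Technology"},
        {"priority": "Low", "root_cause": "Governance Failure"},
    ],
}]
allocated = penalty_by_root_cause(example)
# Weights sum to 6: Governance Failure receives 3/6 + 1/6 of the penalty,
# Insufficient Technology receives 2/6.
```

The allocated amounts for an action always sum back to its full penalty, so totals by root cause remain comparable with totals by regulator or country.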
AI Use Cases by Root Cause
AI opportunities identified across all enforcement actions are mapped to the root causes that drove each underlying violation. For the most frequently occurring root cause categories, the highest-impact and most distinct AI use cases are selected using a structured LLM ranking that prioritises breadth of coverage, practical feasibility, and distinctness across the selected set. The output surfaces targeted, actionable AI investment opportunities grounded in actual enforcement patterns.
Stage 3: AI-Powered Narrative Insights
After the statistical analysis completes, AI-generated narratives are produced for five sections of the report:
Executive Summary: Key trends in enforcement activity, geographic and regulatory patterns, systemic compliance weaknesses, and implications for financial institutions.
Compliance Theme Analysis: Which domains are most problematic, patterns in multi-domain failures, and recommendations for compliance prioritisation.
Root Cause Analysis: The most significant systemic root causes, how they vary across domains, and strategic recommendations for governance and technology investment.
Regulatory and Geographic Patterns: Enforcement intensity across regulators and regions, and implications for multi-jurisdiction compliance programmes.
Strategic Recommendations: Five to seven specific, data-referenced recommendations, each with supporting context and concrete action items, prioritised by the frequency and financial severity of the underlying compliance failures.
Report Structure
The final report integrates statistical outputs and AI-generated narratives into a single self-contained document, structured as follows:
- Executive Summary
- Penalty Analysis — by country, regulator, and industry
- Compliance Theme Analysis — finding frequency and multi-domain patterns
- Root Cause Analysis — frequency, domain cross-tabulation, and penalty weighting
- Priority Distribution — gap severity by compliance domain
- AI Investment Themes — top use cases mapped to root causes, with expected benefit, ROI, and risks
- Strategic Recommendations
Quality Assurance
All AI-generated content in this report — including extracted findings, narrative insights, and AI investment recommendations — is reviewed for accuracy, completeness, and consistency with the underlying source documents before publication.
Data Quality and Limitations
Source coverage: The dataset reflects only publicly disclosed enforcement actions. Settlements resolved without public notice, informal supervisory actions, and actions by regulators that do not publish full orders are excluded.
Extraction accuracy: Findings are extracted by an AI model from regulatory documents. While the model is instructed to be conservative and avoid fabrication, extraction quality depends on document clarity. Heavily redacted or unusually formatted documents may result in incomplete extraction.
Currency conversion: Penalty comparisons across jurisdictions depend on exchange rates at the time of the analysis run. Rates are not adjusted for intra-period fluctuations over long violation periods.
AI use case attribution: Use cases are attributed to root causes based on the finding they address. Where a single enforcement action contains multiple root causes within the same compliance domain, the most prevalent root cause is used; others may not receive attribution.
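The attribution rule can be illustrated with a short sketch. It assumes "most prevalent" means most frequent among the findings of one action within one domain, and that ties go to the root cause seen first; neither detail is specified above, and the function name is hypothetical.

```python
from collections import Counter
from typing import Optional


def most_prevalent_root_cause(root_causes: list[str]) -> Optional[str]:
    """Return the most frequent root cause; ties go to the one seen first."""
    if not root_causes:
        return None
    # most_common orders equal counts by first occurrence.
    return Counter(root_causes).most_common(1)[0][0]
```

Under this rule, a domain with two Governance Failure findings and one Insufficient Technology finding attributes its use cases to Governance Failure only, which is the attribution loss this limitation describes.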
Root cause categorisation: Root causes are mapped to a fixed set of eight categories. Some findings may have multiple contributing causes; the category recorded reflects the primary cause stated or most strongly implied in the enforcement document.