Isolation Forest on transactional features. STL+ESD anomalies on daily amounts/volumes by branch/product (a minimal sketch follows this timeline).
Weeks 7–8 – Supervised Prototype
If labels exist (historic exceptions/frauds), train a LightGBM classifier and calibrate its scores with isotonic regression (a calibration sketch follows the Section 12 prototype).
Weeks 9–10 – UI & Workflow
Deliver dashboard with alert queue, drill‑downs, evidence bundle export (PDF/CSV), assignments and SLA timers.
Weeks 11–12 – Hardening
Backtesting, threshold tuning, monitoring dashboards, playbooks for incidents, and go‑live checklist.
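The STL+ESD step above can be prototyped directly on the daily aggregates. A minimal sketch, assuming statsmodels is installed and that daily is a series of daily totals for one branch/product; the function name and defaults (weekly period, at most 10 outliers, 5% significance) are illustrative choices, not part of the plan:

import numpy as np
import pandas as pd
from scipy import stats
from statsmodels.tsa.seasonal import STL

def stl_esd_anomalies(daily: pd.Series, period: int = 7, max_outliers: int = 10, alpha: float = 0.05) -> pd.Series:
    # Remove trend and weekly seasonality, then run a generalized ESD test on the remainder
    resid = STL(daily, period=period, robust=True).fit().resid
    x, n, results = resid.copy(), len(resid), []
    for i in range(1, max_outliers + 1):
        z = (x - x.mean()).abs() / x.std(ddof=1)
        idx = z.idxmax()                                   # most extreme remaining day
        p = 1 - alpha / (2 * (n - i + 1))
        t = stats.t.ppf(p, n - i - 1)
        lam = (n - i) * t / np.sqrt((n - i - 1 + t ** 2) * (n - i + 1))
        results.append((idx, z.loc[idx] > lam))
        x = x.drop(idx)
    # Outliers = every point removed up to the last test statistic that exceeded its critical value
    flagged = [d for k, (d, _) in enumerate(results) if any(ok for _, ok in results[k:])]
    return daily.loc[flagged]

# Example usage (illustrative): daily posted amounts for one branch
# daily = txns[txns.branch_id == 'B001'].groupby(txns.posted_at.dt.date)['amount'].sum()
# print(stl_esd_anomalies(daily))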
10) Sample Data Model (Warehouse Views)
-- Fact table: transactions
CREATE VIEW vw_transactions AS
SELECT
t.txn_id,
t.member_id,
t.account_id,
t.product_type,
t.amount,
t.currency,
t.posted_at,
t.channel,
t.branch_id,
t.user_id,
t.is_reversal,
v.invoice_no,
v.vendor_id
FROM raw.transactions t
LEFT JOIN raw.vendor_invoices v USING (txn_id);
-- Dimensions
CREATE VIEW dim_member AS SELECT member_id, join_date, kyc_risk, branch_id FROM raw.members;
CREATE VIEW dim_staff AS SELECT user_id, role, department, is_maker, is_checker, allowed_start, allowed_end FROM raw.staff; -- allowed_start/allowed_end assumed present in raw.staff; required by rule R3 below
CREATE VIEW dim_vendor AS SELECT vendor_id, tax_pin, created_at, is_related_party FROM raw.vendors;
-- Feature view example: rolling stats by member
CREATE VIEW feat_member_txn_30d AS
SELECT
member_id,
COUNT(*) AS txn_30d,
SUM(amount) AS amt_30d,
AVG(amount) AS avg_amt_30d,
STDDEV(amount) AS sd_amt_30d,
MAX(posted_at) AS last_txn_at
FROM raw.transactions
WHERE posted_at >= CURRENT_DATE - INTERVAL '30 day'
GROUP BY 1;
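As a bridge to the Python prototype in Section 12, these views can be pulled into the engineered-feature file along these lines. A minimal sketch, assuming a Postgres warehouse reachable via SQLAlchemy; the connection string and the exact column selection are placeholders:

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('postgresql+psycopg2://user:pass@warehouse:5432/sacco')  # placeholder DSN

# Join the transaction fact view to the trailing 30-day member aggregates to form model features
features = pd.read_sql(
    """
    SELECT t.txn_id, t.amount, t.is_reversal,
           f.txn_30d, f.amt_30d, f.avg_amt_30d, f.sd_amt_30d
    FROM vw_transactions t
    LEFT JOIN feat_member_txn_30d f USING (member_id)
    """,
    engine,
)
features.to_parquet('features/txn_features.parquet', index=False)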
11) Rule Examples (Deterministic CCM)
-- R1: Split purchases just under approval threshold (per vendor per day)
SELECT vendor_id, DATE(posted_at) AS d, COUNT(*) AS n, SUM(amount) AS total
FROM raw.vendor_payments
WHERE amount >= 0.9 * ${approval_threshold} AND amount < ${approval_threshold}
GROUP BY 1, 2
HAVING COUNT(*) >= 3;
-- R2: Duplicate invoices (same vendor, invoice_no, amount)
SELECT vendor_id, invoice_no, amount, COUNT(*) AS n
FROM raw.vendor_invoices
GROUP BY 1, 2, 3
HAVING COUNT(*) > 1;
-- R3: After-hours postings by staff outside their allowed window (assumes the window does not span midnight)
SELECT t.txn_id, t.user_id, t.posted_at
FROM raw.transactions t
JOIN dim_staff s USING (user_id)
WHERE EXTRACT(HOUR FROM t.posted_at) NOT BETWEEN s.allowed_start AND s.allowed_end;
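One way to operationalise these rules is a small runner that substitutes parameters, executes each query, and stamps every hit with its rule ID and the parameters used, so hits can feed the evidence bundle in Section 13. A sketch only; the DSN, the threshold value, the SQL file paths and the output table are illustrative:

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('postgresql+psycopg2://user:pass@warehouse:5432/sacco')  # placeholder DSN
PARAMS = {'approval_threshold': 100000}  # illustrative threshold

RULES = {
    'R1': 'Split purchases just under approval threshold',
    'R2': 'Duplicate invoices',
    'R3': 'After-hours postings',
}

def run_rule(rule_id: str, sql_template: str) -> pd.DataFrame:
    # Substitute ${...} parameters, execute the rule, and tag each hit with its rule ID and parameters
    sql = sql_template
    for k, v in PARAMS.items():
        sql = sql.replace('${%s}' % k, str(v))
    hits = pd.read_sql(sql, engine)
    hits['rule_id'] = rule_id
    hits['rule_desc'] = RULES[rule_id]
    hits['params'] = str(PARAMS)
    return hits

# hits = run_rule('R1', open('rules/r1_split_purchases.sql').read())
# hits.to_sql('rule_hits', engine, schema='ccm', if_exists='append', index=False)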
12) Python Prototype (Training & Scoring)
import pandas as pd
from sklearn.ensemble import IsolationForest
from sklearn.model_selection import train_test_split
from lightgbm import LGBMClassifier
from sklearn.metrics import average_precision_score

# Load your engineered features (joined views); txn_id becomes the index so it is not fed to the models
X = pd.read_parquet('features/txn_features.parquet').set_index('txn_id')

# ---- Unsupervised: Isolation Forest ----
iso = IsolationForest(n_estimators=300, contamination=0.01, random_state=42)
iso.fit(X)
unsup_score = pd.Series(-iso.decision_function(X), index=X.index)  # higher => more anomalous

# ---- Supervised (if labels exist) ----
labels = pd.read_parquet('labels/txn_labels.parquet')  # columns: txn_id, y
XY = X.join(labels.set_index('txn_id')).dropna(subset=['y'])
X_train, X_valid, y_train, y_valid = train_test_split(
    XY.drop(columns=['y']), XY['y'], stratify=XY['y'], test_size=0.2, random_state=42)
clf = LGBMClassifier(n_estimators=600, learning_rate=0.03, max_depth=-1,
                     subsample=0.8, colsample_bytree=0.8)
clf.fit(X_train, y_train)
proba = pd.Series(clf.predict_proba(X_valid)[:, 1], index=X_valid.index)
pr_auc = average_precision_score(y_valid, proba)
print(f'Validation PR-AUC: {pr_auc:.3f}')

# Blend example (weights to be tuned); rank-normalize the anomaly score so both inputs live on [0, 1]
unsup_rank = unsup_score.loc[X_valid.index].rank(pct=True)
blend = 0.5 * unsup_rank + 0.5 * proba

# Threshold at top-K
K = 1000
topk_idxs = blend.sort_values(ascending=False).head(K).index
alerts = X_valid.loc[topk_idxs].copy()
alerts['risk_score'] = blend.loc[topk_idxs]
alerts.to_parquet('outputs/alerts_topk.parquet')
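The isotonic-regression calibration called for in Weeks 7–8 can be bolted onto the classifier above. A minimal sketch using scikit-learn's CalibratedClassifierCV; the 5-fold cross-validated calibration is an illustrative choice:

from sklearn.calibration import CalibratedClassifierCV
from sklearn.metrics import brier_score_loss

# Wrap a fresh LightGBM model so each fold's isotonic mapping is fit on held-out data
calibrated = CalibratedClassifierCV(
    LGBMClassifier(n_estimators=600, learning_rate=0.03,
                   subsample=0.8, colsample_bytree=0.8),
    method='isotonic', cv=5)
calibrated.fit(X_train, y_train)
calibrated_proba = pd.Series(calibrated.predict_proba(X_valid)[:, 1], index=X_valid.index)
print('Calibrated Brier score:', brier_score_loss(y_valid, calibrated_proba))

Calibrated probabilities make the blend weights and alert thresholds easier to reason about, since the supervised component then reads as an exception probability rather than a raw score.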
13) Explainability & Evidence Bundle
For each alert:
Rule hits (IDs and descriptions) + parameters used.
Feature snapshot (values and global percentiles).
Model version, training data timestamp, SHAP top contributors (for supervised models).
Raw data links (transaction, member, loan, vendor records) with immutable IDs.
Decision log (who viewed, actions taken, comments) for full traceability.
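A minimal sketch of assembling such a bundle for one alert, assuming the objects from the Section 12 prototype (X, X_valid, blend, clf, topk_idxs) are still in scope; model_version and the top-5 cutoff are placeholders, and rule hits plus the decision log would be appended from their own stores:

import json

def evidence_bundle(txn_id, top_n=5, model_version='lgbm-0.1'):
    # Assemble a JSON-serializable evidence bundle for one alerted transaction
    row = X_valid.loc[[txn_id]]
    # SHAP-style contributions straight from LightGBM; the last column is the expected value, so drop it
    contrib = clf.booster_.predict(row, pred_contrib=True)[0][:-1]
    top = sorted(zip(row.columns, contrib), key=lambda kv: abs(kv[1]), reverse=True)[:top_n]
    return {
        'txn_id': str(txn_id),
        'risk_score': float(blend.loc[txn_id]),
        'model_version': model_version,
        'feature_snapshot': {
            c: {'value': float(row.iloc[0][c]),
                'global_pctile': float((X[c] <= row.iloc[0][c]).mean())}
            for c in row.columns
        },
        'top_contributors': [{'feature': f, 'shap': float(v)} for f, v in top],
        # rule hits and decision-log entries would be appended here from their own stores
    }

print(json.dumps(evidence_bundle(topk_idxs[0]), indent=2))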