bAbI v2 AI Music Detection Model — Research Deep Dive

Published May 22, 2026 · 12 min read · AI Song Checker team

In May 2026, letssubmit publicly released bAbI v2, one of the few openly documented AI music detection models. The model card claims 87.67% accuracy on held-out test data, well below the 99%+ claimed by competing proprietary detectors, but it comes with full transparency. This article digs into the architecture, what the model gets right, and where it falls short.

What is bAbI v2?

bAbI (Bayesian Authenticity-Based Inference, version 2) is a single-model neural network trained on a labeled dataset of AI-generated vs human-made music tracks. It outputs a probability score between 0 and 1.

Architecture (per published model card)

Input: 30-second audio segment, resampled to 22.05 kHz mono
Feature extractor: pre-trained Wav2Vec2 + custom CNN head
Classifier: 3-layer MLP with dropout
Training data: ~80,000 tracks (40K AI from Suno/Udio/Riffusion, 40K human)
Loss: focal cross-entropy with class weighting
Reported accuracy: 87.67% on holdout (10K tracks)
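
To make the loss line concrete, here is a minimal sketch of a class-weighted binary focal loss for a single track. The model card summary above names the loss but not its hyperparameters, so gamma = 2.0 and alpha_ai = 0.5 are assumptions chosen for illustration, and the function name is hypothetical. This is the standard focal loss formulation, not bAbI v2's actual training code.

```python
import math

def focal_loss(p_ai: float, label: int, gamma: float = 2.0,
               alpha_ai: float = 0.5) -> float:
    """Class-weighted binary focal loss for one track.

    p_ai:     predicted probability that the track is AI-generated (0..1)
    label:    1 for AI-generated, 0 for human-made
    gamma:    focusing parameter (assumed value; gamma = 0 reduces to
              class-weighted cross-entropy)
    alpha_ai: weight on the AI class (assumed value); the human class
              gets 1 - alpha_ai
    """
    eps = 1e-12
    # Probability the model assigned to the true class.
    p_t = p_ai if label == 1 else 1.0 - p_ai
    alpha_t = alpha_ai if label == 1 else 1.0 - alpha_ai
    # Clamp to avoid log(0).
    p_t = min(max(p_t, eps), 1.0 - eps)
    # The (1 - p_t)^gamma factor shrinks the loss on confident, correct predictions.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

easy = focal_loss(0.95, 1)  # confident and correct: small loss
hard = focal_loss(0.30, 1)  # wrong side of 0.5: much larger loss
print(f"easy={easy:.5f} hard={hard:.5f}")
```

The practical effect is that training concentrates on tracks the model still gets wrong, rather than on the many it already classifies confidently.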

Strengths

Transparency: the architecture, training data composition, and limitations are all publicly disclosed. Rare in this space.
Reproducible: researchers can replicate the methodology (though not the exact model weights).
Single-model simplicity: easier to debug and audit than ensemble approaches.

Limitations

Lower accuracy: 87.67% vs the 99%+ claimed by proprietary detectors (authio, AI Song Checker, IRCAM Amplify)
Single-model bias: no ensembling means certain engine-specific signatures get missed
Older engine focus: trained primarily on Suno v3-v4 and Udio v1.0. Less accurate on Suno v5 and Udio v1.5.
No platform attribution: outputs only "AI" vs "human", not which engine
No watermark reading: purely forensic analysis; it does not check C2PA credentials or SynthID watermarks

Architectural comparison

System	Approach	Accuracy	Trade-off
bAbI v2 (letssubmit)	Single neural network	87.67%	Simple, transparent, lower accuracy
AI Song Checker v8.3	Bayesian fusion of 82+ hand-crafted forensic signals	99.1%	High accuracy, interpretable signals, weekly recalibration
authio	Ensemble of 12 specialized models	99.42%	Highest claimed accuracy, slower inference, opaque
IRCAM Amplify	Research-grade CNN + spectral analysis	~99%	Enterprise-only, no public API
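
Accuracy percentages that look close can hide large differences in error volume. The short sketch below converts the reported figures from the table into expected misclassifications per 10,000 tracks. It assumes the figures are comparable, which the table alone does not establish, since each is self-reported and may come from a different test set. IRCAM Amplify is left out because its figure is approximate.

```python
# Reported accuracies copied from the comparison table above (percent).
reported_accuracy = {
    "bAbI v2": 87.67,
    "AI Song Checker v8.3": 99.1,
    "authio": 99.42,
}

def errors_per_10k(accuracy_pct: float) -> int:
    """Expected misclassified tracks per 10,000, given an accuracy in percent."""
    return round((100.0 - accuracy_pct) * 100)

for name, acc in reported_accuracy.items():
    print(f"{name}: {errors_per_10k(acc)} errors per 10,000 tracks")
```

On bAbI v2's 10K holdout, 87.67% corresponds to 1,233 misclassified tracks.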

Why hand-crafted signals beat black-box NN here

For AI music detection specifically, hand-crafted forensic signals (MFCC, CPP, phase coherence, codec residuals) outperform black-box neural networks because:

The signal types are physically grounded (we know why they work)
They generalize to unseen engine versions (signatures persist across versions)
They're interpretable (you can tell users which signal triggered the verdict)
They recalibrate faster (no retraining needed, just signal weighting)
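
The interpretability and recalibration points can be illustrated with a naive-Bayes-style fusion of per-signal log-likelihood ratios. This is a minimal sketch under stated assumptions: the signals are treated as conditionally independent, the function names are hypothetical, and every number is made up for illustration. It is not the implementation of any detector named in this article.

```python
import math

def fuse(signal_llrs: dict, weights: dict, prior_ai: float = 0.5) -> float:
    """Combine per-signal log-likelihood ratios into P(AI).

    signal_llrs: log[P(signal | AI) / P(signal | human)] for each signal
    weights:     per-signal multipliers; recalibration changes only these
    prior_ai:    prior probability that a track is AI-generated (assumed)
    """
    log_odds = math.log(prior_ai / (1.0 - prior_ai))
    for name, llr in signal_llrs.items():
        log_odds += weights.get(name, 1.0) * llr
    # Convert log-odds back to a probability.
    return 1.0 / (1.0 + math.exp(-log_odds))

def top_signal(signal_llrs: dict, weights: dict) -> str:
    """Name of the signal contributing most toward the AI verdict."""
    return max(signal_llrs, key=lambda n: weights.get(n, 1.0) * signal_llrs[n])

# Made-up values for illustration only.
llrs = {"mfcc": 0.8, "cpp": -0.2, "phase_coherence": 1.5, "codec_residual": 0.4}
weights = {"mfcc": 1.0, "cpp": 1.0, "phase_coherence": 1.0, "codec_residual": 1.0}

p = fuse(llrs, weights)
print(f"P(AI) = {p:.3f}, strongest signal: {top_signal(llrs, weights)}")

# "Recalibration" without retraining: down-weight one signal and fuse again.
recalibrated = dict(weights, phase_coherence=0.0)
p_recal = fuse(llrs, recalibrated)
```

Because each signal's contribution is an additive term in the log-odds, the verdict can be traced to named signals, and adjusting a weight changes the output without touching any learned parameters.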

Future of public AI detection models

bAbI v2's main contribution is normative: it sets a precedent for transparency. We expect more research labs to publish similar model cards through 2027, especially as the EU AI Act's "labeling AI content" requirement pushes toward auditable detection methods.

For developers

If you're building your own AI music detector for research, bAbI v2 is a solid starting baseline. For production use cases (DSP, distributor, A&R platform), an ensemble or Bayesian approach (like ours) will outperform single-model architectures.
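
When comparing a baseline against your own detector, it helps to score both with the same harness. The sketch below computes accuracy, precision, and recall from probability scores at a chosen threshold. The function name, the 0.5 threshold, and the toy scores and labels are all assumptions for illustration; they are not outputs of bAbI v2 or of any other system mentioned here.

```python
def evaluate(scores, labels, threshold=0.5):
    """Accuracy, precision, and recall for the 'AI' class (label 1).

    scores:    detector outputs in [0, 1], higher meaning more likely AI
    labels:    ground truth, 1 for AI-generated and 0 for human-made
    threshold: scores at or above this value are predicted as AI (assumed)
    """
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }

# Toy data for illustration only.
scores = [0.92, 0.81, 0.35, 0.10, 0.64, 0.48]
labels = [1, 1, 1, 0, 0, 0]
metrics = evaluate(scores, labels)
print(metrics)
```

Reporting precision and recall alongside accuracy matters in production, where wrongly flagging a human-made track and missing an AI-generated one usually carry different costs.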

Try the AI Song Checker API — 99% accuracy out of the box