bAbI v2 AI Music Detection Model — Research Deep Dive
In May 2026, letssubmit publicly released bAbI v2, one of the few openly documented AI music detection models. The model card claims 87.67% accuracy on held-out test data, well below the figures claimed by competing proprietary detectors, but it comes with full transparency. This article digs into the architecture, what it gets right, and where it falls short.
What is bAbI v2?
bAbI (Bayesian Authenticity-Based Inference, version 2) is a single neural network trained on a labeled dataset of AI-generated and human-made music tracks. It outputs a probability score between 0 and 1.
Architecture (per published model card)
- Input: 30-second audio segment, resampled to 22.05 kHz mono
- Feature extractor: pre-trained Wav2Vec2 + custom CNN head
- Classifier: 3-layer MLP with dropout
- Training data: ~80,000 tracks (40K AI from Suno/Udio/Riffusion, 40K human)
- Loss: focal cross-entropy with class weighting
- Reported accuracy: 87.67% on holdout (10K tracks)
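To make the loss line above concrete, here is a minimal sketch of a class-weighted binary focal loss for a single track. The model card, as summarized here, does not state the focusing parameter or the class weights, so `gamma` and `alpha_ai` below are assumed placeholder values, and the function name is ours, not letssubmit's.

```python
import math

def focal_loss(p_ai: float, label: int, gamma: float = 2.0, alpha_ai: float = 0.5) -> float:
    """Class-weighted binary focal loss for one example.

    p_ai     : predicted probability that the track is AI-generated
    label    : 1 = AI-generated, 0 = human-made
    gamma    : focusing parameter (assumed value, not from the model card)
    alpha_ai : weight on the AI class (assumed value, not from the model card)
    """
    eps = 1e-12
    # Probability the model assigned to the true class
    p_t = p_ai if label == 1 else 1.0 - p_ai
    # Class weight for the true class
    alpha_t = alpha_ai if label == 1 else 1.0 - alpha_ai
    # Clamp to avoid log(0)
    p_t = min(max(p_t, eps), 1.0 - eps)
    # (1 - p_t)^gamma shrinks the loss on examples the model already gets right
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# A confidently correct prediction contributes far less than a borderline one
easy = focal_loss(0.9, label=1)
hard = focal_loss(0.6, label=1)
print(f"easy={easy:.5f} hard={hard:.5f}")
```

With `gamma = 0` and equal class weights this reduces to ordinary cross-entropy scaled by a constant; larger `gamma` values shift training effort toward the tracks the model finds hard.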
Strengths
- Transparency: the architecture, training data composition, and limitations are all publicly disclosed. Rare in this space.
- Reproducibility: researchers can replicate the methodology (though not the exact model weights).
- Single-model simplicity: easier to debug and audit than ensemble approaches.
Limitations
- Lower accuracy: 87.67% vs roughly 99% or more claimed by proprietary detectors (authio, AI Song Checker, IRCAM Amplify)
- Single-model bias: no ensembling means certain engine-specific signatures get missed
- Older engine focus: trained primarily on Suno v3-v4 and Udio v1.0. Less accurate on Suno v5 and Udio v1.5.
- No platform attribution: outputs only "AI" vs "human", not which engine
- No watermark reading: purely forensic, does not check for C2PA credentials or SynthID watermarks
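To put the accuracy gap from the first limitation in concrete terms, the sketch below converts an accuracy figure into an expected number of misclassified tracks and a binomial standard error. The comparison is illustrative only: each detector's figure was reported on its own test set, so applying them all to a 10K-track holdout is an assumption made for the sake of the arithmetic. The function names are ours.

```python
import math

def expected_errors(accuracy: float, n_tracks: int) -> int:
    """Expected number of misclassified tracks at a given accuracy."""
    return round((1.0 - accuracy) * n_tracks)

def accuracy_std_error(accuracy: float, n_tracks: int) -> float:
    """Binomial standard error of an accuracy estimate from n_tracks samples."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n_tracks)

holdout = 10_000  # holdout size stated on the bAbI v2 model card
for name, acc in [("bAbI v2", 0.8767), ("AI Song Checker v8.3", 0.991), ("authio", 0.9942)]:
    errs = expected_errors(acc, holdout)
    se = accuracy_std_error(acc, holdout)
    print(f"{name}: ~{errs} errors per {holdout} tracks, std. error {se * 100:.2f} points")
```

On a 10K-track holdout the sampling error on 87.67% is about a third of a percentage point, so the gap to the claimed 99% figures is far larger than statistical noise, whatever its cause.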
Architectural comparison
| System | Approach | Claimed accuracy | Trade-off |
|---|---|---|---|
| bAbI v2 (letssubmit) | Single neural network | 87.67% | Simple, transparent, lower accuracy |
| AI Song Checker v8.3 | Bayesian fusion of 82+ hand-crafted forensic signals | 99.1% | High accuracy, interpretable signals, weekly recalibration |
| authio | Ensemble of 12 specialized models | 99.42% | Highest claimed accuracy, slower inference, opaque |
| IRCAM Amplify | Research-grade CNN + spectral analysis | ~99% | Enterprise-only, no public API |
Why hand-crafted signals beat black-box NN here
For AI music detection specifically, hand-crafted forensic signals (MFCC, CPP, phase coherence, codec residuals) outperform black-box neural networks because:
- The signal types are physically grounded (we know why they work)
- They generalize to unseen engine versions (signatures persist across versions)
- They're interpretable (you can tell users which signal triggered the verdict)
- They recalibrate faster (no retraining needed, just signal weighting)
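The last two points, interpretability and recalibration by reweighting, can be illustrated with a small naive-Bayes-style fusion sketch. This is not AI Song Checker's actual implementation; the signal names, log-likelihood-ratio values, and weights are hypothetical, and the sketch assumes each signal has already been converted to a log-likelihood ratio log(P(obs | AI) / P(obs | human)).

```python
import math

def fuse_signals(llrs: dict, weights: dict, prior_ai: float = 0.5):
    """Combine per-signal log-likelihood ratios into one AI probability.

    llrs     : signal name -> log(P(obs | AI) / P(obs | human)) (hypothetical values)
    weights  : signal name -> trust weight; missing signals default to 1.0
    prior_ai : prior probability that a track is AI-generated
    Returns (probability of AI, name of the signal with the largest contribution).
    """
    prior_log_odds = math.log(prior_ai / (1.0 - prior_ai))
    # Each signal's weighted contribution to the final log-odds
    contributions = {name: weights.get(name, 1.0) * llr for name, llr in llrs.items()}
    log_odds = prior_log_odds + sum(contributions.values())
    prob_ai = 1.0 / (1.0 + math.exp(-log_odds))
    # Interpretability: report which signal moved the verdict the most
    top_signal = max(contributions, key=lambda name: abs(contributions[name]))
    return prob_ai, top_signal

llrs = {"phase_coherence": 2.0, "codec_residual": -0.5, "mfcc_stats": 0.3}

# Initial calibration: every signal trusted equally
prob, top = fuse_signals(llrs, weights={})
print(f"P(AI)={prob:.3f}, driven by {top}")

# "Recalibration": down-weight one signal without retraining anything
prob2, top2 = fuse_signals(llrs, weights={"phase_coherence": 0.2})
print(f"P(AI)={prob2:.3f}, driven by {top2}")
```

Changing a single weight alters the verdict immediately, and the returned signal name is what a product could surface to users as the reason for the score. A neural network offers neither property without extra tooling.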
Future of public AI detection models
bAbI v2's main contribution is normative: it sets a precedent for transparency. We expect more research labs to publish similar model cards through 2027, especially as the EU AI Act's requirements for labeling AI content increase the demand for auditable detection methods.
For developers
If you're building your own AI music detector for research, bAbI v2 is a solid starting baseline. For production use cases (streaming services, distributors, A&R platforms), an ensemble or Bayesian approach (like ours) will outperform single-model architectures.