Data Sonification Case Study: Turning Econometric Research into Music with AI and Reaper
I took a 44-page academic paper about how weather affects movie ticket sales and turned it into a 2-minute 40-second musical composition where you can literally hear the data. Four musical layers, 8 automation parameters, 48 bars, all generated from a Python pipeline and mixed in Reaper. This is Momentum Cascade, a data sonification project built by Mulhacen Labs.
What Is Data Sonification?
Data sonification maps numbers to sound. Instead of reading a chart, you hear the data. A rising stock price becomes a rising pitch. Accelerating growth becomes an accelerating rhythm. It is not new (sonar is sonification), but AI and modern audio tools make it practical for research, accessibility, and creative applications.
| Aspect | Visualization | Sonification |
|---|---|---|
| Sense | Sight | Hearing |
| Dimensions | 2-3 (x, y, color) | 6+ (pitch, volume, rhythm, timbre, stereo, density) |
| Time perception | Static or animated | Naturally temporal |
| Accessibility | Requires sight | Works for visually impaired |
| Engagement | Analytical | Emotional + analytical |
The Source Data
The source paper is "Something to Talk About: Social Spillovers in Movie Consumption" by Gilchrist and Sands (2016, Journal of Political Economy). The key finding: a positive weather shock on a movie's opening weekend creates a compounding wave of word-of-mouth. By week 6, a $1 shock to opening revenue generates $2.14 in total revenue.
| Week | Coefficient | Cumulative Multiplier |
|---|---|---|
| 1 | 1.000 | 1.000 |
| 2 | 0.474 | 1.474 |
| 3 | 0.269 | 1.743 |
| 4 | 0.188 | 1.931 |
| 5 | 0.112 | 2.043 |
| 6 | 0.096 | 2.139 |
All coefficients are statistically significant at the 1% level. That cascading pattern is what I wanted to make audible.
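The Cumulative Multiplier column is the running sum of the weekly coefficients. Here is a minimal Python sketch of that arithmetic, with the values copied from the table above. The variable names are mine, not taken from the project's code.

```python
from itertools import accumulate

# Weekly momentum coefficients, copied from the table above (weeks 1-6).
coefficients = [1.000, 0.474, 0.269, 0.188, 0.112, 0.096]

# Running sum: revenue generated through week n by a $1 opening-weekend shock.
multipliers = [round(m, 3) for m in accumulate(coefficients)]

for week, (coef, mult) in enumerate(zip(coefficients, multipliers), start=1):
    print(f"week {week}: coefficient {coef:.3f}, cumulative {mult:.3f}")
```

The last value of the running sum is the 2.139 that the text rounds to $2.14.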
The Pipeline
I built a 5-step pipeline: data extraction (Python + pdfplumber), parameter mapping, MIDI generation, automation control, and mixing/mastering in Reaper.
Step 1: Extract. Python scripts pull the tables and figures from the PDF into structured CSVs (momentum coefficients, viewership decay curves, quality splits).
Step 2: Map. Each data dimension gets assigned to a musical parameter, grounded in The Sonification Handbook (Hermann, Hunt & Neuhoff, 2011). I did not make arbitrary choices. Every mapping follows established perceptual principles.
Step 3: Generate. A custom Python script (raw_midi_generator.py, standard library only, no external dependencies) generates a Type 1 multi-track MIDI file at 480 PPQN. The file contains 4 musical tracks plus a conductor track and an automation track.
Step 4: Automate. 8 continuous MIDI CC parameters control effects in real time, including filter cutoff, reverb, delay feedback, EQ, and stereo width. They are sent via the macOS IAC Driver at 32nd-note resolution.
Step 5: Mix. I loaded the MIDI file into Reaper, assigned instruments and effects per track, then mixed and mastered to a final stereo file.
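Steps 3 and 4 can be sketched with the standard library alone. The code below is not raw_midi_generator.py. It is an illustrative reduction that writes a Type 1 file at 480 PPQN with three tracks instead of six: a conductor track, one bass note, and one CC ramp sampled every 32nd note. It rests on several assumptions: 4/4 time, CC 74 as the filter-cutoff controller, C2 = note 36, and the helper and file names. It also writes the CC data into the file rather than streaming it through the IAC Driver.

```python
import struct

PPQN = 480               # ticks per quarter note, as stated in the article
TICKS_32ND = PPQN // 8   # a 32nd note is 1/8 of a quarter note: 60 ticks
BPM = 72


def vlq(value: int) -> bytes:
    """Encode a non-negative int as a MIDI variable-length quantity."""
    out = [value & 0x7F]
    value >>= 7
    while value:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    return bytes(reversed(out))


def track_chunk(events) -> bytes:
    """Build an MTrk chunk from (absolute_tick, message_bytes) pairs."""
    body = bytearray()
    last_tick = 0
    for tick, msg in sorted(events, key=lambda e: e[0]):
        body += vlq(tick - last_tick) + msg   # delta time, then the message
        last_tick = tick
    body += b"\x00\xFF\x2F\x00"               # end-of-track meta event
    return b"MTrk" + struct.pack(">I", len(body)) + bytes(body)


def cc_ramp(controller, start, end, length_ticks, channel=0):
    """Linear control-change ramp with one value per 32nd note."""
    steps = length_ticks // TICKS_32ND
    return [
        (i * TICKS_32ND,
         bytes([0xB0 | channel, controller,
                round(start + (end - start) * i / steps)]))
        for i in range(steps + 1)
    ]


def build_midi() -> bytes:
    bar = 4 * PPQN  # assumption: 4/4 time
    tempo = round(60_000_000 / BPM).to_bytes(3, "big")  # microseconds per quarter
    conductor = [(0, b"\xFF\x51\x03" + tempo)]          # set-tempo meta event
    # One bass note lasting a bar: C2 = 36, assuming the C4 = 60 convention.
    bass = [(0, bytes([0x90, 36, 100])), (bar, bytes([0x80, 36, 0]))]
    # CC 74 is a hypothetical choice for filter cutoff.
    automation = cc_ramp(74, 0, 127, bar)
    tracks = [conductor, bass, automation]
    header = b"MThd" + struct.pack(">IHHH", 6, 1, len(tracks), PPQN)  # format 1
    return header + b"".join(track_chunk(t) for t in tracks)


if __name__ == "__main__":
    with open("sketch.mid", "wb") as fh:   # hypothetical output name
        fh.write(build_midi())
```

Keeping events as absolute ticks and converting to delta times only when the chunk is serialized makes it easy to merge note and automation events from separate generators.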
The Four Musical Layers
| Layer | Data Dimension | Musical Parameter | Effect |
|---|---|---|---|
| Pad | Cumulative multiplier (1.0 to 2.14) | Chord density (1 to 5 voices) + velocity | Texture thickens as the cascade grows |
| Bass | Week coefficient (1.0 to 0.096) | Pitch (C2 to C3, lower = stronger) | Initial shock hits deep, echoes rise |
| Echo | Number of active echoes per week | Fragment count (1 to 6 ascending motifs) | Space fills up as word-of-mouth spreads |
| Pulse | Week progression | Note interval (whole to eighth notes) | Heartbeat accelerates with tension |
The piece is in C natural minor at 72 BPM. 6 sections of 8 bars each, one per weekend. It builds from sparse and quiet (week 1) to dense and urgent (week 6), mirroring the compounding cascade in the data.
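One way to read the Pad and Bass rows of the table as code is a clamped linear map, followed by rounding (for voice count) or snapping to the scale (for pitch). This is a sketch, not the project's mapping code. The linear scaling, the nearest-note snapping, the C2 = 36 note numbering, and every function name are assumptions.

```python
C_NATURAL_MINOR = {0, 2, 3, 5, 7, 8, 10}   # pitch classes of C D Eb F G Ab Bb
C2, C3 = 36, 48                            # assumes the C4 = 60 convention

coefficients = [1.000, 0.474, 0.269, 0.188, 0.112, 0.096]
multipliers = [1.000, 1.474, 1.743, 1.931, 2.043, 2.139]


def linmap(x, in_lo, in_hi, out_lo, out_hi):
    """Map x from [in_lo, in_hi] to [out_lo, out_hi], clamped to the range."""
    t = (x - in_lo) / (in_hi - in_lo)
    t = min(1.0, max(0.0, t))
    return out_lo + t * (out_hi - out_lo)


def pad_voices(multiplier):
    """Pad layer: cumulative multiplier -> chord density of 1 to 5 voices."""
    return round(linmap(multiplier, multipliers[0], multipliers[-1], 1, 5))


def bass_pitch(coefficient):
    """Bass layer: stronger coefficient -> lower pitch, snapped to the scale."""
    target = linmap(coefficient, coefficients[0], coefficients[-1], C2, C3)
    in_scale = [n for n in range(C2, C3 + 1) if n % 12 in C_NATURAL_MINOR]
    return min(in_scale, key=lambda n: abs(n - target))


for week, (coef, mult) in enumerate(zip(coefficients, multipliers), start=1):
    print(f"week {week}: {pad_voices(mult)} pad voices, "
          f"bass note {bass_pitch(coef)}")
```

A perceptual scaling, for example logarithmic, could replace the linear interpolation inside linmap without changing the rest.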
Technical Specs
| Property | Value |
|---|---|
| Duration | 48 bars, ~2 min 40 sec |
| Tempo | 72 BPM |
| Key | C natural minor (Aeolian) |
| MIDI resolution | 480 PPQN |
| Automation | 8 CC parameters, 32nd-note resolution |
| Tools | Python, Reaper, IAC Driver |
| Dependencies | Zero external Python libraries for MIDI generation |
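The figures in this table are consistent with each other if the piece is in 4/4, which is my assumption here; the time signature is not stated above. A short arithmetic check in Python:

```python
BARS = 48
BEATS_PER_BAR = 4   # assumption: 4/4 time
BPM = 72
PPQN = 480
CC_PARAMETERS = 8

total_beats = BARS * BEATS_PER_BAR                 # 192 quarter notes
duration_s = total_beats * 60 / BPM                # 160.0 seconds
minutes, seconds = divmod(round(duration_s), 60)   # 2 min 40 s

ticks_per_32nd = PPQN // 8                         # 60 ticks
total_ticks = total_beats * PPQN                   # 92160 ticks
steps_per_param = total_ticks // ticks_per_32nd    # 1536 automation steps
total_cc_steps = steps_per_param * CC_PARAMETERS   # 12288 across 8 parameters

print(f"{minutes} min {seconds} s, {steps_per_param} steps per CC parameter")
```

At 32nd-note resolution each automation lane gets a new value every 60 ticks, which at 72 BPM is roughly every 104 ms.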
Why This Matters
Data sonification is a growing field with applications in scientific research, accessibility (making data available to visually impaired users), financial monitoring, and creative arts. The techniques I used here (parameter mapping, multi-layer composition, automation from data) apply to any dataset.
If you have data that tells a story (and most data does), sonification can make that story felt, not just understood.
About Mulhacen Labs
I'm Barry Faassen, founder of Mulhacen Labs. I build at the intersection of software engineering, AI, and audio. 25+ years of experience across scientific computing (Deltares), geotechnical software (Fugro), and audio plugin development (C++/JUCE). Based in Granada, Spain.
Have a dataset you want to hear? Book a call.