13 October 2025

Treble and Hugging Face Collaborate to Advance Audio ML

Treble Technologies and Hugging Face have partnered to make physically accurate acoustic simulation data openly accessible to the global research community.
As part of this collaboration, we are releasing the Treble10 dataset, a new open dataset containing broadband room impulse responses (RIRs) and speech-convolved acoustic scenes from ten distinct furnished rooms. The dataset is now freely available on the Hugging Face Hub for non-commercial research.
This collaboration aims to lower the barrier to entry for audio and speech machine learning research by providing high-quality, physics-based acoustic data that was previously difficult to obtain or reproduce.

A new open dataset for acoustic AI and audio ML

The Treble10 dataset provides a high-fidelity, physically grounded foundation for advancing automatic speech recognition (ASR), speech enhancement, dereverberation, source localization, and spatial audio research.

All simulations were created using the Treble SDK, Treble’s cloud-based acoustic simulation engine. The SDK employs a hybrid modeling approach that combines wave-based discontinuous Galerkin finite element (DG-FEM) simulations at low to mid frequencies with geometrical acoustics at high frequencies. This yields broadband RIRs sampled at 32 kHz that capture diffraction, interference, scattering, and reverberation across a wide range of realistic spaces.
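To make the hybrid idea concrete, the two solver outputs can be merged into one broadband response with complementary crossover filters. The sketch below is an illustration only, not Treble’s actual pipeline: the crossover frequency, filter order, and the toy decaying-noise "RIRs" are all assumptions.

```python
import numpy as np
from scipy.signal import butter, sosfilt

FS = 32_000           # sample rate of the full-band RIRs in Treble10
CROSSOVER_HZ = 5_000  # assumed split frequency; the real value is not stated here


def merge_hybrid_rir(rir_wave: np.ndarray, rir_ga: np.ndarray) -> np.ndarray:
    """Combine a wave-based (low-frequency) RIR and a geometrical-acoustics
    (high-frequency) RIR into one broadband RIR via crossover filtering."""
    n = max(len(rir_wave), len(rir_ga))
    low = np.pad(rir_wave, (0, n - len(rir_wave)))
    high = np.pad(rir_ga, (0, n - len(rir_ga)))
    sos_lp = butter(4, CROSSOVER_HZ, btype="lowpass", fs=FS, output="sos")
    sos_hp = butter(4, CROSSOVER_HZ, btype="highpass", fs=FS, output="sos")
    return sosfilt(sos_lp, low) + sosfilt(sos_hp, high)


# Toy inputs: decaying noise tails standing in for the two solvers' outputs.
rng = np.random.default_rng(0)
t = np.arange(FS) / FS
rir_wave = rng.standard_normal(FS) * np.exp(-6 * t)
rir_ga = rng.standard_normal(FS) * np.exp(-6 * t)
broadband = merge_hybrid_rir(rir_wave, rir_ga)
```

The low-passed wave-based part and high-passed geometrical part sum to a single impulse response covering the full band.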

Dataset overview

The Treble10 dataset includes simulated acoustics from 10 furnished rooms, carefully selected to represent diverse real-world environments:

  • 2 bathrooms
  • 2 bedrooms
  • 2 living rooms with hallway
  • 2 living rooms without hallway
  • 2 meeting rooms

Room volumes and reverberation times span a broad range, representing both compact reflective spaces and larger absorptive ones. Every simulation has been validated to ensure geometric and physical consistency.

Dataset structure

The dataset is organized into six subsets, designed to support various machine learning and signal processing tasks:

  • Treble10-RIR-mono – Mono RIRs between 5 sound sources and multiple receiver positions in each room, spaced on 0.5 m grids at three heights (0.5 m, 1.0 m, 1.5 m).
  • Treble10-RIR-HOA8 – 8th-order Ambisonics RIRs using identical spatial configurations as the mono subset.
  • Treble10-RIR-6chDevice – Six-channel RIRs from a cylindrical microphone array placed at each receiver position.
  • Treble10-Speech-mono – Each RIR from the RIR-mono subset is convolved with a speech file from the LibriSpeech test set.
  • Treble10-Speech-HOA8 – Each Ambisonics RIR from the RIR-HOA8 subset is convolved with a speech file from the LibriSpeech test set.
  • Treble10-Speech-6chDevice – Each device RIR from the RIR-6chDevice subset is convolved with a speech file from the LibriSpeech test set.

In total, the dataset contains roughly 3,000 source-receiver configurations, each generated with verified physical accuracy.
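The Speech subsets are built by convolving each RIR with LibriSpeech utterances. A minimal sketch of that step for the mono case is shown below; the speech signal and RIR here are synthetic stand-ins, not files from the dataset.

```python
import numpy as np
from scipy.signal import fftconvolve

FS = 32_000  # dataset sample rate

# Stand-ins for a dry LibriSpeech utterance and a mono RIR from Treble10.
rng = np.random.default_rng(0)
speech = rng.standard_normal(2 * FS)                              # 2 s of "speech"
rir = rng.standard_normal(FS) * np.exp(-6 * np.arange(FS) / FS)   # decaying tail

# Reverberant signal: linear convolution of the dry speech with the RIR.
reverberant = fftconvolve(speech, rir, mode="full")

# For the HOA8 subsets, each RIR carries (order + 1)^2 Ambisonics channels,
# so the same convolution is applied per channel.
hoa_channels = (8 + 1) ** 2  # 81 channels at 8th order
```

A full linear convolution produces len(speech) + len(rir) - 1 samples, so the reverberant tail extends past the end of the dry utterance.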

Why accessibility matters

Collecting large, diverse, and high-quality acoustic data has traditionally been resource-intensive and difficult to scale. Physical measurements require time, space, and specialized equipment, making large-scale data collection for AI research impractical.

By making simulated acoustics freely available, Treble10 enables researchers everywhere to explore room-acoustic diversity without the need for costly measurement campaigns. This accessibility supports faster experimentation, reproducibility, and collaboration across research groups and institutions.

Beyond Treble10: Scaling with the Treble SDK

While Treble10 provides a curated sample of acoustic environments, it represents only a fraction of what can be achieved using the Treble SDK.

The SDK allows research and development teams to programmatically generate thousands of rooms, devices, and acoustic configurations, all simulated with physical accuracy and controlled variability. It enables:

  • Large-scale synthetic dataset generation for training and evaluating ML models
  • Virtual prototyping of audio products and room designs
  • Automated Python-based workflows for acoustic simulation and algorithm development

This combination of physical precision, automation, and scale opens new possibilities for both scientific research and product innovation.

Open access collaboration

The Treble10 dataset is freely available for non-commercial use on the Hugging Face Hub. We encourage the community to use, analyze, and build upon the data, and to share insights that help advance the state of audio and speech technology.

This collaboration underscores our commitment to making accurate, high-quality acoustic data accessible, enabling researchers and engineers worldwide to better understand, model, and design how the world sounds.
