Hear from Industry Leaders
This webinar will feature a range of industry leaders.

Standard ASR benchmarks often miss the conditions that matter most in real-world use: background noise, competing speech, reverberation, distance from the microphone, and other acoustic effects.
The Far-Field ASR Leaderboard, built by Treble and Hugging Face, gives developers and users of ASR engines a more realistic way to evaluate model performance in the environments where speech technology is actually deployed.
Alongside the launch of FFASR, hear from leading voices in speech AI. Dr. George Saon at IBM will share perspectives on the challenges of far-field speech recognition and the importance of robust evaluation in noisy environments. Nithin Rao Koluguri, Senior Research Scientist at NVIDIA, will present NVIDIA’s work in ASR, discuss far-field robustness, and explore the Parakeet family of ASR models. Julian Mack from Cohere will present Cohere’s work in speech modeling, including Cohere Transcribe, and discuss the challenges of evaluating ASR systems in real-world conditions. Dr. Shinji Watanabe, Associate Professor at Carnegie Mellon University, will also share insights from his research in speech and audio processing.
On the agenda for this webinar:
1. Introducing the Far-Field ASR Leaderboard
Hugging Face will introduce FFASR, explain how the leaderboard works, and show how ASR developers and users can evaluate models under realistic far-field conditions.
2. Why real-world acoustic evaluation matters
Treble will explain the acoustic data behind FFASR, why far-field conditions are essential for robust ASR, and how realistic simulations can reveal performance gaps hidden by traditional benchmarks.
3. How to use FFASR and go further
Learn how to submit models, interpret benchmark results, and explore how Treble’s far-field datasets and simulation platform can support model training, evaluation, and custom real-world test scenarios.
4. IBM: Why far-field ASR matters
Dr. George Saon of IBM will introduce IBM’s ASR work, discuss the challenges of far-field and noisy speech recognition, and share perspectives on how realistic benchmarking could influence the future of ASR model development.
5. NVIDIA: Building robust ASR for real-world deployment
Nithin Rao Koluguri, Senior Research Scientist at NVIDIA, will present NVIDIA’s work in ASR, discuss challenges around far-field robustness, and provide insights into the Parakeet family of ASR models.
6. Cohere: Real-world ASR performance and benchmarking
Julian Mack, Staff, Member of Technical Staff, Foundations at Cohere, will introduce Cohere’s speech modeling work, including Cohere Transcribe, and discuss the challenges of far-field ASR and the importance of robust benchmarking.
7. Shinji Watanabe: Advances in speech and audio research
Dr. Shinji Watanabe, Associate Professor at Carnegie Mellon University, will share insights from his work in speech and audio processing and perspectives on the future of ASR research.
Go beyond near-field data
For ASR models used in smart devices, meeting rooms, automotive systems, robotics, and other hands-free applications, clean near-field benchmarks only tell part of the story.
FFASR evaluates models against realistic far-field acoustic scenarios, helping the community understand where models perform well, where they fail, and how they compare under conditions that are difficult to reproduce manually.
Why FFASR matters
Easy to use
Submit a model through Hugging Face and view benchmark results in a transparent public leaderboard. FFASR makes advanced far-field evaluation accessible without requiring teams to build complex acoustic test environments themselves.
Comprehensive
The leaderboard evaluates ASR performance across realistic end-user far-field scenarios, including different levels of acoustic difficulty. Instead of reducing performance to one generic score, FFASR helps users understand how models behave across easier, moderate, and more challenging conditions.
Community-driven
FFASR extends existing evaluation approaches by adding realistic scenarios that have previously been difficult to test at scale. Community members can submit models, compare results publicly, and rely on a hidden test dataset designed to support impartial benchmarking.
Reachy's thoughts on far-field data
Available on Hugging Face
Hugging Face has become a central platform for the machine learning community, setting the standard for open, collaborative development in AI. Known for its model hub, evaluation tools, and commitment to open benchmarks, Hugging Face plays a key role in shaping how models are shared, tested, and improved across the ecosystem. The introduction of the Far-Field ASR Leaderboard continues this direction, bringing more realistic evaluation practices into the open.
FFASR Beta is available on Hugging Face for community members to evaluate their own models and for ASR users to compare the latest benchmark results.
Meet the speakers
Audio ML Engineer - Hugging Face
Dr. Eric Bezzam
Eric Bezzam is an audio ML engineer at Hugging Face. He received his PhD from EPFL, and previously worked at Snips, Sonos, DSP Concepts, and Fraunhofer IDMT. He was one of the main developers of pyroomacoustics.
Senior Product Manager for the Treble SDK
Dr. Daniel Gert Nielsen
Dr. Daniel Gert Nielsen is a specialist in numerical vibro-acoustics, with a PhD focused on loudspeaker modeling and optimization. His expertise spans acoustic simulation for communication devices and synthetic data generation for machine learning applications. With a strong background in numerical methods and audio technology, he plays a key role in shaping advanced acoustic modeling solutions at Treble.
Manager Speech Technologies - IBM Research, AI Language Technology
Dr. George Saon
George Saon received the Engineer Diploma from the Polytechnic University of Bucharest, Bucharest, Romania, in 1995, and the M.Sc. and Ph.D. degrees in computer science from Henri Poincare University, Nancy, France, in 1994 and 1997, respectively. He is currently managing the Speech Technologies Group at the IBM T. J. Watson Research Center and is responsible for the development of Granite Speech, a suite of advanced open-source LLM-based ASR and speech translation models. Since joining IBM in 1998, he has worked on a variety of problems spanning several areas of large vocabulary continuous speech recognition such as discriminative feature processing, acoustic modeling, speaker adaptation and large vocabulary decoding algorithms. He has authored or coauthored more than 150 conference and journal papers and holds several patents in the field of speech recognition. He was an Elected Member of the IEEE Speech and Language Technical Committee and became an IEEE Fellow in 2026.
Senior Research Scientist at NVIDIA
Nithin Rao Koluguri
Nithin Rao Koluguri is a Senior Research Scientist at NVIDIA, where he works on automatic speech recognition and speech-language modeling. He created TitaNet, a speaker-recognition architecture now widely used across the field (~1.5M downloads/month on Hugging Face), and co-built NeMo's first speaker diarization system. He led the billion-parameter scaling of FastConformer ASR and built the Parakeet V2 and V3 models, which are among the most performant ASR systems available and now widely used in production dictation apps. His current work centers on the modeling and data efforts behind NVIDIA's next-generation speech-language models, including the Parakeet and Nemotron SpeechLLM models. He holds a master's degree from the University of Southern California, where he conducted research at the Signal Analysis and Interpretation Laboratory (SAIL) under Prof. Shrikanth Narayanan, and has authored numerous papers at ICASSP, Interspeech, and other leading venues.
Staff, Member of Technical Staff, Foundations
Julian Mack
Julian Mack is a staff researcher working on multimodal LLMs and audio at Cohere. His primary focus is on speech modeling and audio understanding, and he led the development of Cohere Transcribe. He also contributed to Cohere's Command A Vision model release. Previously, at Myrtle.ai, he led a team training ultra-low-latency ASR systems. He has a master's in Machine Learning from Imperial College London and a BA in physics from Cambridge University.
Associate Professor at Carnegie Mellon University
Shinji Watanabe
Shinji Watanabe is an Associate Professor at Carnegie Mellon University, where his research focuses on automatic speech recognition, speech enhancement, spoken language understanding, and machine learning for speech and audio processing. He previously held research positions at NTT Communication Science Laboratories, Mitsubishi Electric Research Laboratories (MERL), and Johns Hopkins University. He has authored over 500 papers in speech and audio processing and received multiple awards, including the Best Paper Award at Interspeech. He is a Fellow of both IEEE and ISCA and serves as a Senior Area Editor for IEEE Transactions on Audio, Speech, and Language Processing.
