
ADEPT: Interactive Visual Analytics for Audio Dataset Exploration and Preparation
Chen Chen, Tica Lin, Josh Kimball
DIS 2026
While visual analytics systems have transformed exploration of structured data and images, unstructured data like audio and video remain underserved despite their growing importance in ML applications. Audio datasets present unique challenges: multi-dimensional semantics that unfold over time, hidden quality issues, and uncertain label validity. Through a formative study with 10 practitioners, we identified gaps including a lack of quality and feature overviews, uncertainty about label validity, and fragmented data workflows.
We present ADEPT (Audio Dataset Exploration and Preparation), which addresses these challenges through three panels: (1) quality-feature visualization combining quality metrics and signal characteristics, (2) semantics validation using audio language models, and (3) provenance tracking with reusable processing specifications. A user study with 15 participants demonstrates that ADEPT enables efficient dataset exploration and preparation, with all participants successfully comprehending distributions, validating labels, and creating subsets with confidence. ADEPT contributes a practical tool for audio data preparation and design principles for other modalities.





