Publication:
EuqAud: Detecting Gender Bias in Audio Datasets Using Polynomial Regression-Based Metric

Abstract

With the growing adoption of audio-based AI systems in high-stakes domains such as healthcare, law enforcement, and social media, ensuring fairness, particularly with regard to gender bias, has become critically important. While prior work on fairness has predominantly focused on disparities in model performance, bias inherent in training datasets remains underexplored. To address this gap, we propose EuqAud, a novel, pre-trained, and traceable fairness metric that quantifies gender bias in audio datasets using raw acoustic features such as pitch, energy, amplitude, and voice activity. Unlike methods that depend on demographic labels such as race, age, or language, EuqAud is designed to be demographic- and language-agnostic, enhancing its applicability across diverse contexts. The score is computed using an equation derived from polynomial regression with L2 regularization (Ridge regression), yielding robust and generalizable outputs. It ranges from −10 to 10, where 0 denotes neutrality, positive scores indicate male-dominant bias, and negative scores indicate female-dominant bias. For clarity, bias severity is categorized into three tiers: Neutral (EuqAud < 2), Moderate Bias (2 ≤ EuqAud ≤ 6), and Strong Bias (EuqAud > 6). Evaluation across multiple datasets demonstrates high predictive performance, with R² values between 0.95 and 0.99. By focusing on dataset-level bias rather than model outcomes, EuqAud offers a scalable and rigorous solution for advancing fairness in audio-based AI systems.
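
The score interpretation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `interpret_euqaud` is hypothetical, and the sketch assumes that the three severity tiers apply to the magnitude of the score, since the abstract states the thresholds without a sign while the score itself ranges from −10 to 10. It does not compute the score; the regression equation is not given in the abstract.

```python
def interpret_euqaud(score):
    """Map a EuqAud score to a (tier, direction) pair.

    Hypothetical helper for illustration only. Assumption: the tier
    thresholds from the abstract are applied to the absolute value
    of the score.
    """
    if not -10.0 <= score <= 10.0:
        raise ValueError("EuqAud scores are described as lying in [-10, 10]")

    # Severity tier, based on the magnitude of the score (assumption).
    magnitude = abs(score)
    if magnitude < 2:
        tier = "Neutral"
    elif magnitude <= 6:
        tier = "Moderate Bias"
    else:
        tier = "Strong Bias"

    # Direction, based on the sign: positive is male-dominant,
    # negative is female-dominant, zero is neutral.
    if score > 0:
        direction = "male-dominant"
    elif score < 0:
        direction = "female-dominant"
    else:
        direction = "neutral"

    return tier, direction


if __name__ == "__main__":
    for s in (0.0, -4.0, 7.5):
        print(s, interpret_euqaud(s))
```

Under these assumptions, a score of −4.0 would fall in the Moderate Bias tier with a female-dominant direction, and a score of 7.5 in the Strong Bias tier with a male-dominant direction.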

Keywords

Audio datasets, bias detection, EuqAud, gender bias, polynomial regression, responsible AI