Frequency reflects region in birdsong recognition: Quantified by mutual information and mitigated through adaptive normalization
Graphical Abstract
Abstract
Geographic and environmental heterogeneity generates pronounced variability in bird vocal dialects, complicating reliable species identification across spatially distinct populations. To quantify such dialectal divergence, maximum mean discrepancy (MMD) analysis was applied to multiple regional birdsong datasets, revealing significant distributional differences between recording locations. To address this cross-regional variability, an adaptive normalization and recognition framework was developed that integrates task-driven feature normalization with a multi-head attention ResNet (MHAResNet) classifier. The normalization module dynamically reweights the frequency, time, and channel dimensions according to their contextual importance, enhancing feature alignment across domains. The classifier jointly predicts species identity and regional provenance. This coupled architecture suppresses domain-induced variability while preserving the acoustic cues critical for recognition. To dissect the contribution of individual feature dimensions, mutual information neural estimation (MINE) was employed to quantify their relevance to species and region classification. Across three geographically diverse birdsong datasets, the proposed method improved species recognition by an average of 2.9% and region recognition by 3.0% relative to non-normalized baselines. MINE results indicated that frequency features were the most predictive of geographic origin, whereas channel features most strongly encoded species discrimination. These results advance understanding of birdsong feature attribution and offer a scalable framework for acoustic biodiversity assessment across biogeographic gradients.
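
For readers unfamiliar with the MMD analysis referenced above, the following is a minimal Python sketch of a kernel MMD estimate between two regional feature sets. It is not the authors' implementation: the function names (rbf_kernel, mmd_squared), the Gaussian-kernel bandwidth, and the placeholder feature arrays (pooled per-recording embeddings of shape n x d) are illustrative assumptions.

# Minimal sketch (assumption, not the paper's code): biased estimate of
# squared maximum mean discrepancy between two samples using a Gaussian
# (RBF) kernel, as commonly used to compare feature distributions from
# two recording regions.
import numpy as np

def rbf_kernel(x, y, bandwidth):
    # Gaussian kernel matrix k(x_i, y_j) = exp(-||x_i - y_j||^2 / (2 * bandwidth^2))
    sq_dists = (np.sum(x**2, axis=1)[:, None]
                + np.sum(y**2, axis=1)[None, :]
                - 2.0 * x @ y.T)
    return np.exp(-sq_dists / (2.0 * bandwidth**2))

def mmd_squared(features_a, features_b, bandwidth=1.0):
    # Biased (V-statistic) estimate: mean k(a,a) + mean k(b,b) - 2 * mean k(a,b)
    k_aa = rbf_kernel(features_a, features_a, bandwidth)
    k_bb = rbf_kernel(features_b, features_b, bandwidth)
    k_ab = rbf_kernel(features_a, features_b, bandwidth)
    return k_aa.mean() + k_bb.mean() - 2.0 * k_ab.mean()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    region_a = rng.normal(0.0, 1.0, size=(200, 64))  # placeholder features, region A
    region_b = rng.normal(0.5, 1.0, size=(200, 64))  # placeholder features, region B
    print(f"MMD^2 estimate: {mmd_squared(region_a, region_b):.4f}")

A larger MMD^2 value indicates greater divergence between the two feature distributions; in practice the bandwidth is often set from a median pairwise-distance heuristic rather than fixed at 1.0.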