This project investigates whether an Audio Spectrogram Transformer (AST) can improve frog species recognition and call segmentation in Neotropical soundscapes. I focus on reliability under long-tailed class distributions and evaluate whether model outputs recover ecologically plausible temporal calling patterns, specifically diel (24-hour) activity.

Passive Acoustic Monitoring (PAM) can capture biodiversity at scale, but the data volume makes manual expert review impractical. In species-rich environments, recordings contain overlapping calls, strong background noise (rain, insects), and a long tail of rare species. The goal of this project is to build a machine listening approach that remains useful under these real-world constraints, especially for rare species, where conservation value is highest. This work builds on the AnuraSet dataset introduced by Cañas et al. (2023), which provides a comprehensive benchmark for Neotropical anuran call identification. Their work highlighted the challenges of multi-label classification in passive acoustic monitoring, particularly class imbalance and overlapping vocalizations. This project addresses those challenges directly with transformer-based architectures.

Kelsey Kiantoro
Museum & Digital Culture student with focus on Data

Dhruvi Mehta
