Presentations - WindEurope Technology Workshop 2026
Resource Assessment & Analysis of Operating Wind Farms 2026


Poster pitch - Zero-Shot Weak Annotation of Avian Species Using Large Vision-Language Models

George Dimas, Head of R&D, nvisionist S.A.

Abstract

Research Motivation and Objectives

The rapid expansion of wind energy infrastructure necessitates advanced solutions for avian protection to ensure biodiversity conservation. While deep learning has shown promise in ecological monitoring, the application of pre-trained foundation models, specifically Large Vision-Language Models (LVLMs), in this domain remains largely unexplored. State-of-the-art foundation architectures have demonstrated remarkable zero-shot capabilities in general computer vision tasks; however, their applicability to bird species classification has not yet been sufficiently investigated. We address this gap by evaluating the capacity of a foundation model, acting as an annotator, to perform automated annotation of avian species within complex Wind Turbine Generator (WTG) environments. By bypassing the need for expensive human labeling, this work establishes a framework for leveraging the inherent semantic reasoning of LVLMs to enable scalable, real-time ecological monitoring.

Methodology

The proposed methodology utilizes a pre-trained LVLM to automate the image annotation of avian species groups. First, detected bird instances are passed to the LVLM along with a curated, context-aware prompt schema designed to exploit the model's implicit knowledge base for accurate categorization of avian species. To evaluate zero-shot generalization and mitigate the stochasticity inherent in the generative sampling of LVLMs, the pipeline uses a deterministic sampling configuration. This ensures reproducibility and yields weak labels with high confidence scores, suitable for downstream reanalysis datasets. The approach effectively eliminates the need for extensive human-in-the-loop supervision, significantly reducing operational costs.
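The annotation pipeline described above can be sketched in code. The abstract does not publish its prompt schema, species taxonomy, or decoding parameters, so the class list, prompt wording, and configuration values below are illustrative assumptions only, showing one way a context-aware prompt, a deterministic (greedy) decoding configuration, and schema-constrained label parsing with a rejection class could fit together:

```python
# Hypothetical species groups and rejection class; the actual taxonomy used
# in the study is not given in the abstract.
SPECIES_GROUPS = ["Raptor", "Gull", "Corvid", "Passerine"]
REJECTION_CLASS = "Unknown"  # out-of-distribution rejection class

# Assumed deterministic sampling configuration: greedy decoding (no sampling,
# temperature 0) so repeated runs on the same detection yield the same label.
DETERMINISTIC_CONFIG = {"do_sample": False, "temperature": 0.0, "top_p": 1.0}

def build_prompt(groups: list[str]) -> str:
    """Context-aware prompt asking the LVLM to pick exactly one group."""
    options = ", ".join(groups + [REJECTION_CLASS])
    return (
        "You are annotating bird detections from cameras mounted near wind "
        f"turbine generators. Classify the bird in the image as one of: {options}. "
        f"If no bird is clearly visible, answer {REJECTION_CLASS}. "
        "Answer with the category name only."
    )

def parse_label(response: str, groups: list[str]) -> str:
    """Map free-form LVLM output to a weak label, rejecting off-schema text."""
    answer = response.strip().rstrip(".").lower()
    for group in groups:
        if group.lower() == answer:
            return group
    return REJECTION_CLASS  # anything outside the schema is rejected
```

In this sketch, each detected bird crop would be sent to the LVLM together with `build_prompt(SPECIES_GROUPS)` under `DETERMINISTIC_CONFIG`, and `parse_label` would convert the model's reply into a reproducible weak label.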
Results

The experimental evaluation was conducted on a bird classification image dataset curated by Cornell University, consisting of 13,730 images in total. The depicted subjects were organized into distinct categories for the avian species classification task, and a rejection class was included to assess out-of-distribution robustness. An 8B-parameter foundation LVLM was used. The proposed methodology demonstrated strong zero-shot classification capabilities, with an average precision, recall, and F1-score of 82%, 87%, and 84%, respectively. Most critically, for priority classes such as Raptors, the primary target for regulatory compliance and collision mitigation, the LVLM exhibited exceptional performance, with an F1-score exceeding 98%. These results confirm that foundation models can generate the weak labels necessary for constructing large-scale reanalysis datasets, effectively bypassing the bottleneck of manual expert review.

Conclusion

These results validate the efficacy of Large Vision-Language Models as robust, zero-shot annotators for avian biodiversity monitoring. While direct deployment of such large-scale architectures remains computationally prohibitive for edge infrastructure, our findings demonstrate that LVLMs can successfully function as high-fidelity weak supervisors. By automating the generation of labeled reanalysis datasets, this framework effectively resolves the data scarcity bottleneck, establishing a scalable pathway for training lightweight model architectures.
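The per-class precision, recall, and F1-scores reported above follow the standard definitions. As a minimal sketch (the label lists below are invented for illustration, not the Cornell evaluation data), the metrics for a single class can be computed from parallel lists of true and predicted labels:

```python
def per_class_scores(true, pred, label):
    """Precision, recall, and F1 for one class from parallel label lists."""
    tp = sum(1 for t, p in zip(true, pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(true, pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(true, pred) if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy example with made-up labels (not the study's data):
true = ["Raptor", "Raptor", "Gull", "Raptor"]
pred = ["Raptor", "Gull", "Gull", "Raptor"]
p, r, f = per_class_scores(true, pred, "Raptor")  # one false negative
```

Averaging these per-class scores across all classes yields the kind of aggregate precision/recall/F1 figures quoted in the abstract.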
