Presentations - WindEurope Technology Workshop 2026

Poster pitch - Preventing Unnoticed Faults: Improving Model Reliability via Distribution-Optimized SCADA Partitioning

Alex Coronati, Renewable Energy Performance Engineer, Jungle AI

Session

Abstract

As wind assets age, the ability to detect creeping underperformance and incipient health issues becomes critical for life cycle optimization. While machine learning (ML) models of normal SCADA behavior are now standard, their operational reliability is frequently undermined by poor validation strategies. Inaccurate performance assessment during the training phase often leads to a high rate of False Positives (alarm fatigue) or, more dangerously, False Negatives, where critical failures go unnoticed because the model was validated on unrepresentative data.

This submission addresses the root cause of this reliability gap: the inherent difficulty of creating training and validation datasets that are statistically equivalent yet temporally distinct. Traditional chronological splitting leaves models vulnerable to seasonal bias (e.g., training in winter, testing in summer), causing normal seasonal drifts to be misclassified as anomalies. Conversely, random shuffling introduces data leakage because sensor data is highly autocorrelated, which inflates accuracy metrics and masks the model's inability to generalize.

We propose a novel, robust framework for SCADA data partitioning: Evolutionary Stratified Blocking. This method moves beyond simple time-based splits and explicitly optimizes the statistical integrity of the validation process. The approach groups data into "blocks" rather than individual timestamps and uses an evolutionary algorithm to iteratively search for the optimal combination of blocks to form Training, Validation, and Test sets that satisfy two critical conditions:

1. Leakage Prevention: The algorithm enforces Temporal Safety Gaps at the boundaries of every data block. This eliminates the risk of information "bleeding" from rolling feature windows into the validation set, ensuring that performance metrics reflect true predictive power.

2. Distribution Matching: The core innovation is the minimization of the Wasserstein distance between datasets. The algorithm ensures that the distribution of critical sensors (temperatures, pressures) in the validation set closely matches that of the training set, regardless of the season.

Finally, we present real-world case studies demonstrating the impact of this methodology on operating wind farms. We compare standard chronological splits against our distribution-optimized approach. The results show that traditional methods frequently either failed to flag subtle deviations in component behavior, allowing creeping failures to persist unnoticed, or flagged normal seasonal variance as critical issues. With Evolutionary Stratified Blocking, we demonstrate a significant reduction in false alarms and the successful early detection of complex component health issues that were previously masked by validation bias. This work offers the industry a standardized pathway to trustworthy, risk-mitigating AI.
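The two conditions described in the abstract (safety gaps between blocks and Wasserstein-based distribution matching) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`make_blocks`, `wasserstein_1d`, `split_cost`, `evolve_split`), the block length, the gap size, the 60/20/20 proportions, the single synthetic "temperature" signal, and the simple swap-mutation search with elitist selection are all assumptions chosen for illustration. The Wasserstein distance is approximated from empirical quantiles so that the example needs only the standard library.

```python
import math
import random

def make_blocks(n_samples, block_len, gap):
    """Cut indices 0..n_samples-1 into blocks, discarding `gap` samples
    after each block (an assumed form of the temporal safety gap)."""
    blocks, start = [], 0
    while start + block_len <= n_samples:
        blocks.append(range(start, start + block_len))
        start += block_len + gap
    return blocks

def wasserstein_1d(a, b, n_quantiles=100):
    """Approximate the 1-D Wasserstein-1 distance as the mean absolute
    difference between the empirical quantiles of the two samples."""
    sa, sb = sorted(a), sorted(b)

    def quantile(s, q):
        pos = q * (len(s) - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, len(s) - 1)
        return s[lo] + (s[hi] - s[lo]) * (pos - lo)

    qs = [(i + 0.5) / n_quantiles for i in range(n_quantiles)]
    return sum(abs(quantile(sa, q) - quantile(sb, q)) for q in qs) / n_quantiles

def split_cost(assignment, blocks, series):
    """Cost of a split: distance train-vs-validation plus train-vs-test.
    Labels: 0 = train, 1 = validation, 2 = test."""
    sets = {0: [], 1: [], 2: []}
    for label, block in zip(assignment, blocks):
        sets[label].extend(series[i] for i in block)
    if any(len(values) == 0 for values in sets.values()):
        return float("inf")
    return wasserstein_1d(sets[0], sets[1]) + wasserstein_1d(sets[0], sets[2])

def chronological_assignment(n_blocks, fractions=(0.6, 0.2, 0.2)):
    """Baseline: first blocks train, next validation, last test."""
    n_train = round(fractions[0] * n_blocks)
    n_val = round(fractions[1] * n_blocks)
    return [0] * n_train + [1] * n_val + [2] * (n_blocks - n_train - n_val)

def evolve_split(blocks, series, generations=300, seed=0):
    """Hypothetical evolutionary search: mutate by swapping the labels of
    two blocks and keep the child only if the cost does not increase."""
    rng = random.Random(seed)
    best = chronological_assignment(len(blocks))
    best_cost = split_cost(best, blocks, series)
    for _ in range(generations):
        child = best[:]
        i, j = rng.sample(range(len(blocks)), 2)
        child[i], child[j] = child[j], child[i]
        child_cost = split_cost(child, blocks, series)
        if child_cost <= best_cost:
            best, best_cost = child, child_cost
    return best, best_cost

# Synthetic two-year daily signal with a seasonal cycle (illustrative data only).
noise = random.Random(42)
series = [15 + 10 * math.sin(2 * math.pi * t / 365) + noise.gauss(0, 1)
          for t in range(730)]
blocks = make_blocks(len(series), block_len=14, gap=2)
chrono_cost = split_cost(chronological_assignment(len(blocks)), blocks, series)
best, best_cost = evolve_split(blocks, series)
print("chronological cost:", chrono_cost)
print("optimized cost:   ", best_cost)
```

Swapping labels keeps the number of blocks in each set fixed, so the search changes only which periods end up in which set. A fuller version would presumably sum the distance over several sensors and use a population with crossover; those details are not specified in the abstract.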