PO022: From Blade to BigQuery: A case study of the Aerosense Data Gateway with open-source, production-ready code.
Marcus Lugg, Senior Software Engineer, Octue
INTRODUCTION The Aerosense project is developing a MEMS-based surface pressure and acoustic measurement system for wind turbine blades. This will help wind turbine manufacturers optimise the aerodynamic design of large, flexible rotors and help wind farm owner/operators detect faults and optimise operation. Whilst this particular sensing setup is unique in the industry, the requirement to collect high-bandwidth data at the edge, then transfer it robustly and securely to a cloud compute provider for storage (in a so-called "data warehouse" or "data lake"), is extremely common. Many mini-tutorials on this subject exist. However, they tend not to address the worst pitfalls around security, authentication, robustness and the chosen storage form. Complete, production-grade end-to-end examples simply do not exist. This talk will address that gap with a case study (complete with fully fledged edge-side and cloud-side code and documentation) detailing the technical obstacles, and their solutions, found while architecting and implementing a production-ready, secure, highly scalable ingress and data lake solution for the Aerosense project.
METHOD Raw measurement data is collected from multiple blade nodes and sent to the tower via a Bluetooth receiver. At the tower, a Gateway system (running on a Raspberry Pi) buffers data from the receiver before streaming it to Google Cloud Platform. A versatile Command Line Interface (CLI) was developed to run on the Gateway (in the field) or on researchers' laptops (in the lab), enabling quick and easy interaction with the hardware (sending commands and reading data), and incorporating a cloud uploader. Internet connectivity and power outages are handled using long-term local storage and by daemonising the reader so that it restarts on reboot.
Buffer overflow problems arose when high-bandwidth sensors (like microphones) were added; a multithreaded buffer was developed to enable the uploader to function without packet loss from the receiver. A process was implemented for registering installations and generating long-lived authentication credentials unique to each gateway, enabling fine-grained access control while allowing devices to remain deployed for long periods through significant outages (during which Bearer-type access tokens cannot be refreshed). Cloud Functions were developed to manage 'events' (blobs of time-series data from the uploader) in a highly scalable way (both in bandwidth and in number of installations). Events are received via a Cloud Endpoint, set up to prevent extreme cost spirals in the event of a malicious DDoS-like attack or other malfunction. A BigQuery database was set up with a schema designed to store raw measurements in a versatile way, enabling new sensors or data types to be added without frequent intervention at the database. Finally, Materialised Views were added to restructure the raw data dynamically, enabling easy and intuitive querying using basic SQL, Pandas/NumPy, and dashboarding tools.
CONCLUSION An end-to-end, production-ready example of data collection, cloud ingress and data lake storage (together with documentation) has been developed and made available to the community. Code and documentation are available at https://github.com/aerosense-ai/data-gateway
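As an illustration of the multithreaded buffer described under METHOD, the following is a minimal sketch of the general producer/consumer pattern using only the Python standard library. It is not taken from the data-gateway repository: the function name run_buffered_pipeline, the batch_size parameter and the upload callback are all hypothetical, and the real Gateway reads from a receiver and uploads to cloud storage rather than working on an in-memory iterable.

```python
import queue
import threading


def run_buffered_pipeline(packets, upload, batch_size=4):
    """Read packets on one thread while another thread uploads them in batches.

    `packets` is any iterable standing in for the receiver (an assumption),
    and `upload` is a callable standing in for the cloud uploader (an assumption).
    """
    # Unbounded FIFO queue: the reader never blocks on a slow uploader,
    # so packets are not dropped while an upload is in progress.
    buffer = queue.Queue()
    sentinel = object()  # marks the end of the packet stream

    def reader():
        for packet in packets:
            buffer.put(packet)
        buffer.put(sentinel)

    def uploader():
        batch = []
        while True:
            item = buffer.get()
            if item is sentinel:
                break
            batch.append(item)
            if len(batch) >= batch_size:
                upload(batch)
                batch = []  # start a fresh list; the uploaded one is left untouched
        if batch:
            upload(batch)  # flush the final partial batch

    threads = [threading.Thread(target=reader), threading.Thread(target=uploader)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
```

Calling run_buffered_pipeline(range(10), uploaded.append, batch_size=4) with an empty list named uploaded would collect the ten stand-in packets in order, in batches of at most four. An unbounded queue trades memory for zero packet loss; a deployed system would presumably also spill to local storage during long outages, as the abstract describes.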