C’mon Feel the Noise
By Zachary Tessler, Head of Data Science
As a maritime technology company focused on IoT data and machine learning, we are often asked how our vessel performance models account for data outliers and noise. Sensor drift and malfunction come up often in our discussions about data quality and the fidelity of our machine learning models. They are also an unavoidable reality of high-frequency sensor data, one that our clients have enlisted Nautilus’s support to help solve. As experts in IoT and the unification of disparate data sources, Nautilus has put automation in place at all stages of the data lifecycle to ensure that our platform surfaces the most reliable data and that our models are trained on data free of erroneous values.
Upon ingestion, we automatically flag outliers in sensor data and, in some cases, prevent that data from being used downstream if there is a clear reason to exclude it (e.g. if we have prior knowledge of a sensor malfunction). Cleaning the data in this way ensures a reliable dataset not only for platform-based decision-making, but also for our machine learning models, which inform many of the insights we surface. Using aggregation, averaging, and interpolation techniques, Nautilus controls for noise in data that is attributed to sensor malfunction or drift.
Noise in weather estimates from vessel sensors is common, as these sensors are often not well maintained. The gridded nature of global weather models also adds uncertainty. If the vessel’s position and timestamp do not fall exactly on the model’s grid coordinates, the data must be interpolated to estimate weather conditions at that precise location and time.
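To illustrate the interpolation step, here is a minimal sketch of bilinear interpolation between the four grid points surrounding a vessel position. It is one common approach, not necessarily the method Nautilus uses; the function names, the argument layout, and the wind speed values are assumptions made for the example. The same `lerp` helper could be applied once more between two model time steps to handle the temporal dimension.

```python
def lerp(a, b, t):
    """Linear interpolation between a and b, with t in [0, 1]."""
    return a + (b - a) * t


def bilinear(lon, lat, lon0, lon1, lat0, lat1, v00, v10, v01, v11):
    """Estimate a gridded value at (lon, lat) inside one grid cell.

    Hypothetical layout: v00 sits at (lon0, lat0), v10 at (lon1, lat0),
    v01 at (lon0, lat1), and v11 at (lon1, lat1).
    """
    tx = (lon - lon0) / (lon1 - lon0)  # fractional position west to east
    ty = (lat - lat0) / (lat1 - lat0)  # fractional position south to north
    south = lerp(v00, v10, tx)         # interpolate along the southern edge
    north = lerp(v01, v11, tx)         # interpolate along the northern edge
    return lerp(south, north, ty)      # blend the two edges by latitude


# Illustrative wind speeds (m/s) at the corners of a 1-degree cell.
wind_at_vessel = bilinear(0.5, 0.5, 0.0, 1.0, 0.0, 1.0, 10.0, 20.0, 30.0, 40.0)
```

A vessel sitting exactly on a grid point gets that point's value back unchanged, which is a quick sanity check for any interpolation routine.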
If a vessel is transmitting data at one-second intervals, Nautilus may aggregate these readings and average them over one-minute bins to reduce noise and better support cross-sensor analyses. This automatic aggregation and averaging results in a clearer, more reliable, and more informative signal from the vessel. Nautilus’s noise reduction depends on the availability of high-frequency sensor data. With higher-frequency data, we can better reduce noise while still providing insight into short-term changes in operational conditions.
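The binning described above can be sketched in a few lines. This is a simplified illustration rather than our production pipeline: the function name, the `(epoch_seconds, value)` sample format, and the sample readings are all assumptions.

```python
from collections import defaultdict
from statistics import fmean


def average_into_bins(samples, bin_seconds=60):
    """Average (epoch_seconds, value) samples into fixed-width time bins.

    Returns a list of (bin_start_seconds, mean_value), sorted by time.
    Samples whose value is None (a dropped reading) are skipped.
    """
    bins = defaultdict(list)
    for ts, value in samples:
        if value is None:
            continue
        bin_start = int(ts // bin_seconds) * bin_seconds
        bins[bin_start].append(value)
    return [(start, fmean(values)) for start, values in sorted(bins.items())]


# Illustrative one-second readings that straddle a minute boundary.
readings = [(0, 10.0), (1, 12.0), (59, 14.0), (60, 20.0), (61, 22.0)]
minute_means = average_into_bins(readings)
```

A wider bin smooths more aggressively but hides short-lived changes, which is why higher-frequency input leaves more room to choose a bin width that suits the analysis.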
After ingestion and prior to use in machine learning processing, Nautilus conducts a second round of cleaning that screens for nonsensical and unrealistic values, such as negative readings (-0.56 MT of fuel consumed) or draft data listed at an order of magnitude outside of the norm (150 meters versus 15 meters).
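A plausibility screen of this kind can be as simple as a table of allowed ranges per field. The sketch below is illustrative only: the field names and the limits are hypothetical, and real limits would depend on the vessel and the reporting interval.

```python
# Hypothetical plausibility limits; real limits would be set per vessel.
PLAUSIBLE_RANGES = {
    "fuel_mt": (0.0, 10.0),   # fuel consumed per interval, metric tons
    "draft_m": (0.0, 30.0),   # draft, meters
}


def implausible_fields(record, ranges=PLAUSIBLE_RANGES):
    """Return the names of fields whose values fall outside their range.

    Missing fields are ignored. NaN fails every comparison, so a NaN
    reading is also reported as implausible.
    """
    bad = []
    for field, (low, high) in ranges.items():
        value = record.get(field)
        if value is None:
            continue
        if not (low <= value <= high):
            bad.append(field)
    return bad


negative_fuel = implausible_fields({"fuel_mt": -0.56, "draft_m": 15.0})
oversized_draft = implausible_fields({"fuel_mt": 0.3, "draft_m": 150.0})
```

Returning the offending field names, rather than a single pass or fail flag, makes it possible to exclude one bad sensor without discarding the rest of the record.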
We also detect instances in which sensor readings contradict each other. For example, if main engine fuel consumption at a given timestamp is low or even zero, then we would presume that speed through water (STW) should also be low. While a zero value for both is not problematic (we would expect to see this fairly regularly), a zero value for one but not the other calls into question the reliability of the data. In that case, we would exclude those data points from model training.
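As a rough sketch of this cross-sensor check, the function below treats a pair of readings as consistent only when fuel consumption and STW are either both near zero or both clearly non-zero. The function name and the two tolerance values are assumptions chosen for illustration, not tuned thresholds.

```python
def is_consistent(fuel_rate, stw_knots, fuel_eps=0.01, stw_eps=0.5):
    """Check that main engine fuel use and speed through water agree.

    Readings within the (hypothetical) tolerances are treated as zero.
    The pair is consistent when both are zero or both are non-zero.
    """
    fuel_is_zero = abs(fuel_rate) <= fuel_eps
    stw_is_zero = abs(stw_knots) <= stw_eps
    return fuel_is_zero == stw_is_zero


# Illustrative (fuel_rate, stw_knots) pairs at four timestamps.
pairs = [(0.0, 0.0), (0.0, 12.0), (1.2, 0.0), (1.2, 12.0)]
usable_for_training = [pair for pair in pairs if is_consistent(*pair)]
```

Only the contradictory pairs are dropped; the both-zero case is kept because, as noted above, it occurs regularly and is not a sign of a faulty sensor.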
Finally, after ingestion and modelling, Nautilus conducts a third round of cleaning that screens for distractions — outputs that, while accurate, are unlikely to provide useful insights or recommendations for the vessel. For example, using historical data, Nautilus can produce expected consumption for a vessel given a speed and potential weather conditions.
Say that we are examining a data set in which the vessel has been in port the majority of the time — it has been maneuvering, only travelling at one knot. Based on the consumption data we have, let’s also say we can accurately predict the vessel will consume 0.25 MT per day whenever its speed is 1 knot on average. That insight, while accurate, is unlikely to provide useful input to an operations manager who cares about vessel behavior while away on passage. By screening out those periods before they get to the platform, we ensure that the recommendations we surface are optimized for underway operating conditions rather than accurate-but-extraneous settings.
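In its simplest form, this screen is a speed filter applied to model outputs before they reach the platform. The sketch below assumes a hypothetical record layout and a hypothetical 3-knot cutoff; an actual screen would likely use richer criteria than speed alone.

```python
def underway_only(records, min_speed_knots=3.0):
    """Keep only outputs from periods when the vessel was underway.

    The 3-knot default is a hypothetical cutoff for this example.
    """
    return [r for r in records if r["speed_knots"] >= min_speed_knots]


# Illustrative model outputs: one maneuvering period, two on passage.
predictions = [
    {"speed_knots": 1.0, "predicted_mt_per_day": 0.25},
    {"speed_knots": 12.0, "predicted_mt_per_day": 21.0},
    {"speed_knots": 14.0, "predicted_mt_per_day": 30.5},
]
surfaced = underway_only(predictions)
```

The 1-knot prediction is still accurate; it is filtered out because it is not useful to someone managing performance on passage.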
Accept sensors for what they are and learn to extract the most value out of them
It is common for clients and software providers to view data noise and fluctuations as a problem. At Nautilus we have built automation, via flatlining indicators and other tools, that enhances data quality and model reliability by flipping that common notion on its head. By accepting that sensor data may include noise and occasional erroneous outputs, we work to isolate instances where something is potentially wrong with the sensor, when data signals are lost, or when the sensor is broken. Nautilus continues to build and refine back-end automation to ensure that our clients are leveraging a high-fidelity understanding of vessel performance in their pursuit of operational excellence.
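To make the idea of a flatlining indicator concrete, here is a minimal sketch that flags runs of readings that stay unchanged for suspiciously long. A live sensor normally shows some fluctuation, so the absence of noise is itself a warning sign. The function name, the `min_run` length, and the `tolerance` are assumptions for illustration, not a description of our actual indicator.

```python
def flatline_flags(values, min_run=5, tolerance=0.0):
    """Flag readings that belong to a run of (near-)constant values.

    A run is a stretch of consecutive readings that all stay within
    `tolerance` of the first reading in the run. Runs with at least
    `min_run` readings are flagged as a possible stuck sensor.
    """
    flags = [False] * len(values)
    run_start = 0
    for i in range(1, len(values) + 1):
        # A run ends at the end of the data or when the value moves away.
        run_ended = (
            i == len(values)
            or abs(values[i] - values[run_start]) > tolerance
        )
        if run_ended:
            if i - run_start >= min_run:
                for j in range(run_start, i):
                    flags[j] = True
            run_start = i
    return flags


# Illustrative readings: the sensor sticks at 2.0 for four samples.
stuck = flatline_flags([1.0, 2.0, 2.0, 2.0, 2.0, 3.0], min_run=4)
```

Flagged stretches can then be reviewed or excluded, depending on whether the constant value is physically plausible for that sensor.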