The Journey of a Data Scientist: Chiller Surge Counts

Author: Adam Ashenfelter


Chillers vibrate routinely, surge occasionally, and excesses in both can be problematic. Distinguishing between benign and detrimental behavior is essential to ensuring successful chiller maintenance and protection.

A surge occurs when refrigerant reverses course and flows from the condenser back to the compressor. This causes the chiller to vibrate more than normal and creates a groaning or squealing sound. Excessive surge events, whether from poor maintenance or poor control of water flow rates or temperature, can reduce a chiller’s reliability and life span.

The first line of defense is often a surge detector that counts chiller surge events; however, uncleansed data leads to poor analysis and conclusions. Plots of raw, unfiltered vibration sensor data typically reveal monotonically increasing surge counts interspersed with spikes, dips, and plunges (Figure 1). False positives are not uncommon, and decisions based on misleading data can be ineffective and costly. Even in simple tasks, such as alerting on a rapid increase of a counter, the devil is in the details.

Figure 1: Raw surge counts reported by a chiller. The counter should be monotonically increasing; the bad data visible here leads to false alarms.

Doing analysis with equipment data requires special expertise. With the right tools and skill sets, steps can be taken to cleanse the data, find the signals through the noise, and generate thresholds for actionable alarms based on meaningful surge information. Tignis helps companies apply data science in this manner to improve preventive and predictive maintenance. Tignis has developed a proprietary technology that enables rapid and agile integration, normalization, and analysis of physical equipment and associated sensor data.

Data science increases trust in the data

There are several reasons surge count data can be “dirty” and prone to misinterpretation:

  • Sensors may register surges even when the chiller is off, whether due to maintenance or another environmental trigger.
  • Surge counts may momentarily drop to zero due to a power or network communication failure, and then bounce back up to where the counter left off.
  • Surge counts may reset to zero if a sensor loses power for too long, is replaced, or needs replacement, causing the counter to start over.
  • Some sensors are more sensitive than others, e.g., one chiller’s detector might register 10 surges while another registers one surge for a similar event.
  • Overly sensitive detectors may produce false alarms when no surge actually occurred, and overactive or malfunctioning sensors may do so repeatedly.

Rules, flags, and filters are cleansing techniques used by data scientists to produce useful information. Instead of relying on an absolute count of every detected chiller surge, data science allows plants to see actual changes over longer periods of time by taking moment-to-moment data (e.g., every five minutes) and comparing the current count to the previous count.

Rules can flag when there are more events than expected so that suspicious values that clutter up relevant data can be dropped out. With filters, zero values from momentary drops or periods when the chiller was not actually running can be dropped, along with the negative values seen in full resets. Identifying and eliminating random spikes in the data helps to focus attention on clumps or clusters of spikes. These measures, which increase trust in the data, enable analysis of well-defined groups of activity for the overall number of surges over a given period of time.
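The rules and filters described above can be sketched in a few lines of Python. This is a minimal illustration, not the tooling Tignis uses: the function name `clean_surge_deltas`, the `(count, running)` reading format, and the `max_step` cutoff of 5 surges per interval are all assumptions made for the example.

```python
from typing import List, Optional, Tuple

# Hypothetical reading format: (raw surge counter, was the chiller running?)
Reading = Tuple[int, bool]


def clean_surge_deltas(readings: List[Reading],
                       max_step: int = 5) -> List[Optional[int]]:
    """Turn raw counter readings into per-interval surge counts.

    Rejected intervals are returned as None so they can be excluded
    from later analysis. max_step is an assumed plausibility limit.
    """
    if not readings:
        return []
    deltas: List[Optional[int]] = [None]  # first reading has no predecessor
    for (prev, _), (curr, running) in zip(readings, readings[1:]):
        step = curr - prev  # compare the current count to the previous count
        if not running:
            deltas.append(None)  # chiller off: ignore whatever the sensor reports
        elif step < 0:
            deltas.append(None)  # momentary drop to zero or a full counter reset
        elif step > max_step:
            deltas.append(None)  # implausible jump, e.g. bounce-back after a dip
        else:
            deltas.append(step)
    return deltas


example = [(100, True), (101, True), (0, True), (101, True), (103, True),
           (110, False), (110, True), (0, True), (1, True), (40, True)]
print(clean_surge_deltas(example))
```

In this sketch the dip to zero and the bounce back are both rejected, as are the reading taken while the chiller was off, the counter reset, and the final suspicious jump. A real deployment would need to tune the cutoff per sensor and decide how to treat the interval that follows a rejected one.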

Once the data is cleaned and filtered, thresholds can be set to generate alerts. The goal for alarm thresholds is to generate alerts when the chiller has a surprising number of surges over a specified time frame. For instance, more than 15 surges in a six-hour period could be an event triggering an alert to the maintenance team (Figure 2).

Figure 2: Surge count over six-hour windows, after filtering bad data. But what’s the right threshold on which to alarm? Different chillers need different thresholds.
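As a rough sketch of the fixed-threshold rule, the cleaned per-interval counts can be summed over a trailing window and compared with a limit. The names `rolling_surge_totals` and `fixed_threshold_alerts` are hypothetical, and the use of a trailing (rolling) window is an assumption; the window of 72 samples corresponds to six hours of five-minute readings, and 15 is the example threshold from the text.

```python
from collections import deque
from typing import Deque, List, Optional


def rolling_surge_totals(deltas: List[Optional[int]],
                         window: int = 72) -> List[int]:
    """Sum cleaned surge counts over a trailing window of samples.

    Rejected intervals (None) contribute nothing to the total.
    """
    totals: List[int] = []
    recent: Deque[int] = deque()
    running_total = 0
    for delta in deltas:
        value = delta or 0
        recent.append(value)
        running_total += value
        if len(recent) > window:
            running_total -= recent.popleft()  # oldest sample leaves the window
        totals.append(running_total)
    return totals


def fixed_threshold_alerts(totals: List[int], threshold: int = 15) -> List[bool]:
    """Flag every window whose surge total exceeds a fixed threshold."""
    return [total > threshold for total in totals]


# Small window so the behaviour is easy to follow by hand.
totals = rolling_surge_totals([1, None, 2, 3, 0, 4], window=3)
print(totals)                                       # [1, 1, 3, 5, 5, 7]
print(fixed_threshold_alerts(totals, threshold=4))  # alerts on the last three
```

A single fixed value such as 15 is easy to reason about, but it is only appropriate if every chiller and detector behaves similarly.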

Using this analysis, a dynamic threshold can be assigned based on how many surge events are normal for a given chiller over a period of time. This is accomplished by finding a high quantile over a historical period, such as the 98th percentile of the previous 30 days, and alerting on surge counts above that value (Figure 3).

Figure 3: Surges over six hours, with a dynamic threshold unique to each chiller. Finally, an alert rule that generates the right number of alerts across all chillers.
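One way to picture the dynamic threshold is the sketch below, which compares each six-hour total with the 0.98 quantile of that chiller's own recent history. The helper names, the linear-interpolation quantile, and the `min_history` warm-up are assumptions for illustration; the default `history_len` of 8640 samples corresponds to 30 days of five-minute readings.

```python
from typing import List, Sequence


def quantile(values: Sequence[float], q: float) -> float:
    """Quantile by linear interpolation between the two nearest ranks."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("quantile of empty data")
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def dynamic_alerts(totals: List[float], history_len: int = 8640,
                   q: float = 0.98, min_history: int = 100) -> List[bool]:
    """Alert when a window total exceeds the chiller's own historical quantile.

    Only samples strictly before the current one are used as history, and
    no alert is raised until min_history samples are available.
    """
    alerts: List[bool] = []
    for i, total in enumerate(totals):
        history = totals[max(0, i - history_len):i]
        if len(history) < min_history:
            alerts.append(False)  # not enough history to judge what is normal
            continue
        alerts.append(total > quantile(history, q))
    return alerts


# Tiny example with a short history so it can be checked by hand.
print(dynamic_alerts([1, 2, 1, 2, 1, 2, 10, 2], history_len=6, min_history=4))
```

Re-sorting the history for every sample keeps the sketch short but is slow for long series; a production version would more likely maintain the quantile incrementally or recompute it on a schedule.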

Dynamic alert thresholds are useful because the same rule can be applied to chillers across the board while the threshold value adapts to each one. Due to variances in chiller design, sensor sensitivity, and operating conditions, the threshold for one chiller may not work well for another. Therefore, alerting thresholds must be selected individually for every chiller based on its own historic activity. Applying dynamic thresholds to the cleansed data is an effective longer-term solution.

Strong tooling and expertise make a difference

Most of the work in data science and machine learning (ML) is in cleaning the data and getting it ready to model. Having robust tooling and deep technical expertise in data analysis and transformation streamlines the process. For instance, Tignis developed an internal tool for data processing, exploration, and modeling of mechanical systems. It establishes a digital twin of the system where all connections and sensors are visualized and modeled, and a data processing layer on top where the rules, filters, and thresholds are formulated and deployed.

Preventing the destructive effects of chiller surge is a must for any maintenance organization, and data science can facilitate an intelligent approach. Tignis is uniquely equipped to help.