Perform Root Cause of Failure Analysis

We do our best to alert you before an unexpected failure occurs, but for various reasons we may not be able to alert you in time, or you may not be able to take the necessary action in time. These failures present an opportunity for continuous improvement.

Any time a machine monitored by Augury fails, we want to know about it. Once notified, we kick off a collaborative process with you to understand whether there was anything we could have done better and where we should focus our improvement efforts. Here’s what we do once we’re notified:

  • Ensure the failure is logged on the machine page for future reference.
  • Contact you to get as many details as possible regarding the event.
  • Once all the available information is collected, we review our data to see whether the event is apparent.
    • First, we check that all hardware was installed and communicating.
    • If the data was flowing as expected, we scour all relevant data sets to see whether any were sensitive to the failure mode.
    • If a data set was sensitive to the failure mode, we check whether the algorithms propagated a detection.
    • If a detection was propagated, we check whether we sent you a direct alert, such as an email, and when our Vibration Analyst reviewed the detection.
    • If the Vibration Analyst reviewed the detection prior to the machine failing, we review the decisions they made at that time.

Each of these findings from our internal review helps point us toward areas of improvement.  Maybe it’s installation practices. Maybe it’s better monitoring or support for our IoT devices.  Maybe we need to make changes to our sensing technology. Maybe the algorithms need tweaking. Maybe we need more direct alerts or additional analyst coverage. Or maybe there is a mentoring opportunity. 
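The review sequence above can be read as an ordered set of checks, where the first check that fails points to an improvement area. The following is a minimal sketch of that reading, not a description of Augury's internal tooling. The `FailureReview` record, its field names, and the `first_gap` function are hypothetical, and the mapping from each check to an improvement area is our interpretation of the order in which the areas are listed.

```python
from dataclasses import dataclass


@dataclass
class FailureReview:
    """Hypothetical record of what an internal review found."""
    hardware_communicating: bool
    sensitive_dataset_found: bool
    detection_propagated: bool
    direct_alert_sent: bool
    analyst_reviewed_before_failure: bool


def first_gap(review: FailureReview) -> str:
    """Walk the checks in order and return the first improvement area hit."""
    if not review.hardware_communicating:
        return "installation practices or IoT device monitoring"
    if not review.sensitive_dataset_found:
        return "sensing technology"
    if not review.detection_propagated:
        return "algorithms"
    if not (review.direct_alert_sent and review.analyst_reviewed_before_failure):
        return "direct alerts or analyst coverage"
    # Every earlier check passed, so the analyst's decisions are reviewed.
    return "analyst decisions (possible mentoring opportunity)"


# Illustrative case: data was flowing and sensitive, but no detection was raised.
example = FailureReview(
    hardware_communicating=True,
    sensitive_dataset_found=True,
    detection_propagated=False,
    direct_alert_sent=False,
    analyst_reviewed_before_failure=False,
)
print(first_gap(example))  # prints: algorithms
```

Returning only the first gap keeps the sketch short. A real review may well surface more than one area at once.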

When there is no obvious way to improve Augury’s products or services, there may still be clues in the data or elsewhere that can aid the investigation and help mitigate future failures of the same type. For example, Augury’s data can be paired with operational and scheduling data to understand exactly when a failure occurred, which may offer clues about how the failure mode developed. Did something go wrong during cleaning or maintenance activities?
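As an illustration of pairing a failure time with scheduling data, the sketch below lists the scheduled activities that overlap a lookback window before the failure. It assumes you already have an estimated failure time and a schedule exported from your own systems. The `Activity` type, the `activities_near_failure` function, the 24-hour default window, and the sample schedule are all hypothetical and invented for this example.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple


class Activity(NamedTuple):
    """Hypothetical scheduled activity, e.g. cleaning or maintenance."""
    name: str
    start: datetime
    end: datetime


def activities_near_failure(
    activities: List[Activity],
    failure_time: datetime,
    lookback: timedelta = timedelta(hours=24),
) -> List[Activity]:
    """Return activities that overlap the window ending at failure_time."""
    window_start = failure_time - lookback
    return [
        a for a in activities
        if a.end >= window_start and a.start <= failure_time
    ]


# Made-up schedule and failure time, for illustration only.
schedule = [
    Activity("lubrication", datetime(2024, 2, 20, 8), datetime(2024, 2, 20, 9)),
    Activity("washdown", datetime(2024, 3, 1, 22), datetime(2024, 3, 1, 23)),
]
failure_time = datetime(2024, 3, 2, 3)

suspects = activities_near_failure(schedule, failure_time)
print([a.name for a in suspects])  # prints: ['washdown']
```

An activity appearing in the window does not establish that it caused the failure. It only narrows down where to look first.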

In the worst-case scenario, there will be nothing of use in our data sets. A thorough root cause of failure analysis is always important, but that is perhaps when it matters most.

Attempt to understand what did and didn’t happen leading up to the failure. Assess the process, the environment, the staff, and the equipment upstream and downstream. When bearings fail, cut them open and examine the wear patterns to understand whether the bearing was suffering from electrical fluting, thrusting, lubrication issues, water intrusion, or something else. When a motor short-circuits, verify that the motor, the wiring, the short circuit protection device, and the overload protection are correct. Does the motor need to be VFD rated? Is the insulation rating sufficient? Is the IP rating sufficient?

There are too many variables and examples to list, but taking the time to investigate and address the root cause(s) of failure helps ensure history does not repeat itself.

