A Library for Metric-based Root Cause Analysis
PyRCA (Python Root Cause Analysis) is an open-source Python machine learning library specifically designed for Root Cause Analysis (RCA) in the field of Artificial Intelligence for IT Operations (AIOps). It provides a comprehensive framework for uncovering complex metric causal dependencies and automatically locating root causes of incidents. With PyRCA, users can efficiently analyse metric data, visualise causal graphs, and assess the performance of RCA models.
One of the key motivations behind developing PyRCA was the need for an automated RCA toolbox. Traditional methods of root cause analysis in IT operations often involve manual and time-consuming processes. PyRCA aims to address this pain point by offering an end-to-end solution that includes data loading, causal graph discovery, root cause localisation, and RCA results visualisation.
PyRCA offers several notable features. First, it provides a user-friendly interface that allows interactive editing and revision of causal graphs, making it easy for users to customise and refine the analysis. Additionally, PyRCA supports multiple causal graph construction and root cause scoring models, enabling users to choose the most suitable approach for their specific use case.
The library also includes a flexible pipeline that evaluates the performance of RCA models quantitatively. This pipeline is complemented by a visualisation module, enabling more qualitative analysis of the results. The interactive dashboard of PyRCA provides a comprehensive view of the causal graph discovery process and allows users to explore and interpret the analysis effectively.
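To make "quantitative evaluation" concrete, here is a minimal sketch of one common RCA metric, top-k recall: the fraction of incidents whose true root cause appears among the first k ranked candidates. It does not use PyRCA's own evaluation pipeline. The function name `top_k_recall`, the metric names, and the sample incidents are all hypothetical and chosen only for illustration.

```python
def top_k_recall(ranked_predictions, true_root_causes, k):
    """Fraction of incidents whose true root cause is in the top-k candidates.

    ranked_predictions: one list per incident, candidates ordered best-first.
    true_root_causes:   one ground-truth root cause label per incident.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    if not true_root_causes:
        raise ValueError("at least one incident is required")
    if len(ranked_predictions) != len(true_root_causes):
        raise ValueError("predictions and ground truth must have equal length")

    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_root_causes)
        if truth in ranked[:k]
    )
    return hits / len(true_root_causes)


# Hypothetical model output for four incidents (illustrative data only).
incidents = [
    ["db_latency", "cpu_usage", "queue_depth"],
    ["cpu_usage", "db_latency", "queue_depth"],
    ["queue_depth", "cpu_usage", "db_latency"],
    ["cpu_usage", "queue_depth", "db_latency"],
]
truths = ["db_latency", "db_latency", "db_latency", "cpu_usage"]

for k in (1, 2, 3):
    print(f"recall@{k} = {top_k_recall(incidents, truths, k):.2f}")
```

Reporting the metric at several values of k shows both how often a model names the right metric first and how often the true cause is at least near the top of the list.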
As an open-source project, PyRCA encourages contributions from the community. The development team at Salesforce, the organisation behind PyRCA, welcomes input, support, and collaboration from industry professionals and researchers. With this input, the team aims to keep improving and expanding the library, including support for log and trace data and the integration of additional RCA models into its benchmark.
Bayesian Networks
One of the key features of PyRCA is its implementation of Bayesian Networks for root cause scoring. Bayesian Networks are probabilistic graphical models that represent and reason about uncertainty. In the context of PyRCA, Bayesian Networks are used to assess the likelihood of a certain metric causing an incident or anomaly. By analysing the causal relationships between metrics, PyRCA can quantify the contribution of each metric to the occurrence of incidents.
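The idea of scoring root causes by likelihood can be sketched with a toy network written in plain Python. This is not PyRCA's implementation. It assumes two hypothetical binary causes (`db_fault`, `cpu_fault`) that both feed a single observed alert, with made-up prior probabilities and a made-up conditional probability table. The posterior probability of each cause, given that the alert fired, is computed by enumerating every combination of cause states.

```python
from itertools import product

# Assumed prior probability that each candidate cause is active (illustrative).
PRIORS = {"db_fault": 0.1, "cpu_fault": 0.2}

# Assumed P(alert = 1 | db_fault, cpu_fault), keyed by (db_fault, cpu_fault).
ALERT_CPT = {
    (0, 0): 0.01,
    (1, 0): 0.90,
    (0, 1): 0.30,
    (1, 1): 0.95,
}


def posterior_given_alert(priors, alert_cpt):
    """Return P(cause = 1 | alert = 1) for every cause, by exact enumeration."""
    causes = list(priors)
    evidence = 0.0                       # accumulates P(alert = 1)
    active = {c: 0.0 for c in causes}    # accumulates P(cause = 1, alert = 1)

    for states in product((0, 1), repeat=len(causes)):
        # Joint probability of this state combination together with the alert.
        joint = alert_cpt[states]
        for cause, state in zip(causes, states):
            joint *= priors[cause] if state else 1.0 - priors[cause]

        evidence += joint
        for cause, state in zip(causes, states):
            if state:
                active[cause] += joint

    return {c: active[c] / evidence for c in causes}


scores = posterior_given_alert(PRIORS, ALERT_CPT)
ranking = sorted(scores, key=scores.get, reverse=True)
for cause in ranking:
    print(f"{cause}: {scores[cause]:.3f}")
```

Exact enumeration grows exponentially with the number of causes, so it only suits a toy example like this one. It does show how a prior belief and the observed evidence combine into a ranked list of candidate root causes.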
For Bayesian Network scoring, the library provides the RCAEngine class, which includes methods for building causal graphs, training Bayesian Networks, and identifying root causes. The causal graph captures the causal relationships between metrics, while the trained Bayesian Network scores candidate root causes by their influence on incidents.
In short, PyRCA gives IT operations professionals and researchers a single toolbox for causal graph discovery, root cause scoring, and the evaluation of RCA models on metric data.
GitHub: https://github.com/salesforce/PyRCA
Source: https://arxiv.org/pdf/2306.11417