AIOps is the next DevOps

In the late 90s and early 2000s, IT operations were run like the back-office jobs of a bank. Applications were simpler, integrations were simpler, and a large share of failures were due to human error rather than anomalies in the applications. The impact of application issues was limited, both in the number of affected users and in lost business. Predefined rules covered how to address various issues, and anything beyond the scope of those rules was immediately referred to the development team. The fix was then documented in the operations manual, ready for the next occurrence.

Application design then took a leap from desktop applications to server-based applications, which needed more complex server-side infrastructure. As the infrastructure kept growing and became harder to handle, the Infrastructure Management team came into the picture. This was a dedicated set of professionals whose responsibility was to make sure that the underlying infrastructure remained up and behaved optimally. The infra team also had its rule books, but it attempted to address issues from the viewpoint of infrastructure changes. For example, when an application showed out-of-memory errors, the team would try to optimize its memory allocation instead of going back to the development team to reduce the memory footprint.

With more applications getting added to the solution mix, there was a need for a team that could take care of the end-to-end process of delivering and operating the solution. That’s where DevOps teams came into the picture. They got involved during the development phase and continued the operations until the end-of-life of the solution. They provided critical inputs regarding the non-functional requirements of the solution, calibrated resource requirements, and set up standard processes that could lead to CI/CD pipelines. This improved the overall reliability of the solution and its environment.

As we move ahead in this journey, the focus is now on predicting the future and prescribing remedies in advance. DevOps enables automation of the processes for managing and operating solutions, from the development environment through to the production environment. However, DevOps needs AI enablement before it can analyze operations data and come up with prescriptive action items for better reliability of the solution.

Gartner describes AIOps as a combination of big data and machine learning that automates IT operations processes, including event correlation, anomaly detection, and causality determination. IBM defines AIOps as the application of artificial intelligence (AI) to enhance IT operations.

IT operations are continuously growing in scale and complexity. It is beyond human capacity to observe, understand, and act on the change dynamics quickly. DevOps brings some control over the entire process; however, AIOps is the way forward, with automation playing an important role in deriving actionable intelligence from the operations data.

How to get started with AIOps?

If you already practice DevOps and want to begin your journey toward intelligence with AIOps, there are three important things to consider.

Collect the Data

Observability is the keyword. Build more observability into your solution, at both the application level and the infrastructure level. Application development, configuration, and deployment must include specific implementations of non-functional requirements that produce data about every step performed in the application. Even data that involves PII should be made available in the logs, with the PII masked. In the same way, capturing the state of infrastructure resources at a high frequency works like checking the heartbeat of a patient in a hospital’s ER.

The key ideas are as follows:

  • The more data you collect, the better the chances of knowing and fixing your solution issues proactively.
  • The higher the frequency of data collection, the higher the probability of catching every little twist and turn in the solution runtime.
  • The longer the period the data covers, the easier it is to establish trends in the non-functional concerns of the solution.
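
As an illustration of logging every step while keeping PII masked, here is a minimal Python sketch. It assumes that structured JSON log lines are acceptable to your log collector, and the two regular expressions are hypothetical placeholders rather than a complete PII rule set. The names mask_pii, JsonFormatter, and build_logger are ours, not part of any particular product.

```python
import json
import logging
import re

# Hypothetical patterns; real PII detection needs a broader, reviewed rule set.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
LONG_NUMBER_RE = re.compile(r"\b\d{12,19}\b")


def mask_pii(text: str) -> str:
    """Replace e-mail addresses and long digit runs with fixed placeholders."""
    text = EMAIL_RE.sub("<email>", text)
    return LONG_NUMBER_RE.sub("<number>", text)


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object with a masked message."""

    def format(self, record: logging.LogRecord) -> str:
        event = {
            "ts": record.created,          # epoch seconds, useful for trends
            "level": record.levelname,
            "component": record.name,
            "message": mask_pii(record.getMessage()),
        }
        return json.dumps(event)


def build_logger(name: str) -> logging.Logger:
    """Create a logger that writes masked JSON lines to stderr."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

A component would call build_logger("checkout") once and then log each step with logger.info(...). Masking inside the formatter means individual call sites cannot forget to do it.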

Analyze the Data

The process of data analysis is the same as building business intelligence from business data. It goes through the steps of data cleansing, data selection, pattern discovery, inference creation, and model training.

The goals of data analysis are:

  • To establish a strong historical knowledge of the solution.
  • To detect anomalies in the solution runtime.
  • To analyze the performance of the solution components in isolation and in integration.
  • To correlate the results with peak performances and breakdown incidents.
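
To make the anomaly detection goal concrete, here is a minimal Python sketch of one simple technique, a rolling z-score. It is an assumption on our part, not the method used by any specific AIOps tool, and the window size and threshold are illustrative defaults that would need tuning against your own history.

```python
from statistics import mean, stdev
from typing import List, Sequence


def find_anomalies(samples: Sequence[float], window: int = 5,
                   threshold: float = 3.0) -> List[int]:
    """Return the indexes of samples that deviate from the mean of the
    preceding `window` samples by more than `threshold` standard deviations."""
    anomalies = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: any change at all counts as a deviation.
            if samples[i] != mu:
                anomalies.append(i)
            continue
        if abs(samples[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Example: CPU utilisation (%) sampled at a fixed interval.
cpu = [50, 51, 49, 50, 52, 51, 95, 50, 51]
spikes = find_anomalies(cpu)  # index 6 (the 95% reading) is flagged
```

Because each sample is compared only with its recent history, a longer data set improves the result mainly by letting you choose a window that matches the normal rhythm of the solution.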

Act Automatically and Immediately

Action is the most important part of AIOps: the process does not end at presenting the information but goes a step further and takes the necessary actions. The biggest difference from traditional DevOps is that AIOps proposes a set of remedial actions and acts upon the most suitable one. It rarely needs to wait for human intervention to address an issue. As AIOps has matured, the scope of action has expanded from fixing resource crunch issues to making configuration changes to the solution.

The key remediation steps include:

  • Identifying the root cause of an issue that has already happened, or predicting an issue that might happen in the near future by studying the trends.
  • Suggesting suitable actions to contain and remediate the issue, such as freeing up memory and storage, providing more CPU time to the processes, and more.
  • Identifying the best possible action and executing pre-configured scripts to undertake the action.
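
The steps above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the trend is a plain least-squares line over equally spaced samples, and the Remedy playbook, its script paths, and its urgency windows are hypothetical examples that you would replace with your own pre-configured scripts.

```python
import subprocess
from dataclasses import dataclass
from typing import List, Optional, Sequence


def steps_until_limit(usage: Sequence[float], limit: float) -> Optional[float]:
    """Fit a least-squares line to equally spaced samples and estimate how many
    more sampling intervals pass before `limit` is reached.
    Returns None when usage is flat or falling, or there is too little data."""
    n = len(usage)
    if n < 2:
        return None
    x_mean = (n - 1) / 2
    y_mean = sum(usage) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(usage))
    den = sum((x - x_mean) ** 2 for x in range(n))
    slope = num / den
    if slope <= 0:
        return None
    intercept = y_mean - slope * x_mean
    # Where the fitted line crosses the limit, relative to the last sample.
    return max(0.0, (limit - intercept) / slope - (n - 1))


@dataclass
class Remedy:
    name: str
    script: str       # hypothetical path to a pre-configured script
    max_steps: float  # apply when the limit is this many intervals away or fewer


def choose_remedy(steps_left: Optional[float],
                  playbook: List[Remedy]) -> Optional[Remedy]:
    """Pick the remedy with the tightest urgency window that covers the forecast."""
    if steps_left is None:
        return None
    candidates = [r for r in playbook if steps_left <= r.max_steps]
    return min(candidates, key=lambda r: r.max_steps, default=None)


def run_remedy(remedy: Remedy, dry_run: bool = True) -> List[str]:
    """Return the command for the remedy; execute it only when dry_run is False."""
    command = [remedy.script]
    if not dry_run:
        subprocess.run(command, check=True)
    return command


# Hypothetical playbook for a disk that is filling up.
PLAYBOOK = [
    Remedy("clean_temp_files", "/opt/remedies/clean_temp.sh", max_steps=24),
    Remedy("expand_volume", "/opt/remedies/expand_volume.sh", max_steps=4),
]
```

With disk usage readings of 60, 62, 64, 66, and 68 percent and a limit of 90 percent, the forecast is 11 intervals, so choose_remedy selects the gentler clean_temp_files action. The more drastic expand_volume action is chosen only when the forecast drops to 4 intervals or fewer. Keeping dry_run as the default is a deliberate choice: it lets you review what the automation would do before you allow it to act without human intervention.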

Recommended Tools

AIOps is a process and requires different tools to address different concerns within the process with or without human intervention. The tools make it easier to adopt the AIOps processes by addressing the common challenges, such as the ever-growing size of the data set, the resource and velocity requirements for processing the data, and the ability to do it all close to the solution at the edge locations.

We recommend considering the following tools when adopting AIOps for your own solution:

  • ELK: A combination of three tools from the open-source world: Elasticsearch, Logstash, and Kibana. The three pieces map loosely onto the phases of AIOps: Logstash collects the data, Elasticsearch stores and analyzes it, and Kibana presents the results so that action can be taken.
  • Splunk: A widely used tool for monitoring and analyzing the log messages produced by your solution components.
  • AppDynamics: Another enterprise-grade application that helps you monitor your solution and take corrective actions. It works the same way for both on-premises and cloud deployments.

There are other managed services offered by the major cloud service providers for solutions hosted on their platforms. These services constitute the building blocks of the AIOps implementations on the cloud. For example, AWS offers CloudWatch and CloudTrail, Azure offers Azure Monitor, and GCP offers Cloud Monitoring, Cloud Logging, and Intelligent Operations.

With the help of such tools and best practices, the effort of operating a solution can be greatly reduced, failure causes can be identified quickly, and remedies can be implemented easily. That is the direction and focus of AIOps. Connect with us if you would like to explore more options and complete solutions around AIOps.