It is difficult for a company today to imagine its technology landscape without some kind of monitoring strategy: solutions that make it possible to detect whether any element – from a node to a network, from a specific service to an API – is working properly, responding within adequate times and performing as expected, among other checks.
Observability emerges as an evolution of monitoring, especially for very large or complex infrastructures such as those typically managed by audiovisual content distributors, large e-commerce platforms or financial services companies. In these cases, monitoring systems often grow out of control to the point where administrators are dealing with tens of thousands of data points, alarms and status indicators.
In this flood of information it is almost impossible to determine what matters, which poses a double risk: first, spending considerable resources on situations that were never really problematic; and second, the possibility of a major issue becoming «invisible» among the many simultaneous alerts.
Ordering, prioritizing and acting
Observability draws on advanced technologies such as machine learning and deep learning to sort this enormous amount of information, prioritize it, extract insights, generate real-time diagnoses and propose preventive or corrective courses of action with the least possible human participation. Where human intervention is necessary, clear dashboards support informed and agile decisions.
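As a minimal sketch of the kind of automated prioritization described above, the following example ranks telemetry samples with an unsupervised anomaly detector so that only the most unusual ones surface for a human. The metric values, thresholds and the choice of scikit-learn's IsolationForest are illustrative assumptions, not a description of any particular observability product.

```python
# Minimal sketch: rank metric samples by how anomalous they are, so only
# the most unusual ones surface as alerts. Data and thresholds are hypothetical.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Hypothetical telemetry: rows are samples, columns are metrics
# (latency in ms, error rate in %, CPU in %). In practice this would
# come from the observability pipeline, not from random numbers.
normal = rng.normal(loc=[120, 0.5, 40], scale=[15, 0.2, 8], size=(5000, 3))
spikes = rng.normal(loc=[450, 6.0, 95], scale=[30, 1.0, 3], size=(10, 3))
metrics = np.vstack([normal, spikes])

# Fit the detector; contamination is the assumed share of anomalies.
model = IsolationForest(contamination=0.01, random_state=0)
model.fit(metrics)

# Lower scores mean more anomalous; surface only the top few samples.
scores = model.decision_function(metrics)
for idx in np.argsort(scores)[:5]:
    lat, err, cpu = metrics[idx]
    print(f"sample {idx}: latency={lat:.0f}ms error_rate={err:.2f}% cpu={cpu:.0f}%")
```

In a real deployment, the ranked output would feed the dashboards mentioned above rather than a console, but the principle is the same: let the model filter thousands of signals down to a handful worth human attention.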
The talent profile needed to carry out an observability strategy also differs from the skills required to manage monitoring: the former puts a premium on analytical skills, whereas the latter calls mainly for a technical view sufficient to interpret simple parameters. The traditional paradigm of accumulating events in logs and producing «post-mortem» analyses of problems that could not be avoided is becoming obsolete.
A process, not a solution
Observability is more of a process than a solution, and it relies on a more proactive approach that seeks to resolve situations from a business perspective. For example, if a company runs servers distributed across three availability zones and detects an anomaly in one of them, observability allows it to define how traffic is shifted to the other two in a natural, unattended way (without a human pressing a button), so that the transition is not abrupt and does not create performance problems. To cite another case, when an e-commerce platform detects in advance that orders are starting to queue up, it can review and optimize its back-office processes before shoppers abandon their purchases. The possibilities are numerous and depend on the specific needs of each company and the characteristics of each industry.
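To make the availability-zone example concrete, here is a minimal sketch of how traffic weights could be drained gradually from an anomalous zone instead of cut over abruptly. The threshold, drain rate and function names (`rebalance`, the error-rate inputs) are hypothetical and do not refer to any specific load balancer or platform.

```python
# Hypothetical sketch: gradually shift traffic away from a zone whose
# error rate exceeds a threshold, avoiding an abrupt cut-over.
from typing import Dict

ERROR_THRESHOLD = 0.05   # assumed: 5% errors marks a zone as anomalous
DRAIN_STEP = 0.25        # assumed: drain 25% of the zone's share per cycle

def rebalance(weights: Dict[str, float], error_rates: Dict[str, float]) -> Dict[str, float]:
    """Return new traffic weights, moving load away from unhealthy zones."""
    new = dict(weights)
    for zone, err in error_rates.items():
        if err > ERROR_THRESHOLD and new[zone] > 0:
            # Drain only part of the zone's traffic in each cycle.
            new[zone] = max(0.0, new[zone] - DRAIN_STEP * weights[zone])
    # Renormalize so the healthy zones absorb the drained traffic.
    total = sum(new.values())
    return {z: w / total for z, w in new.items()} if total else weights

# Example: zone "a" starts misbehaving; "b" and "c" pick up its share.
weights = {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3}
errors = {"a": 0.12, "b": 0.01, "c": 0.02}
for cycle in range(3):
    weights = rebalance(weights, errors)
    print(cycle, {z: round(w, 3) for z, w in weights.items()})
```

In practice this logic would run inside the automation layer of the observability platform and push the new weights to the load balancer; the point of the sketch is the gradual, unattended rebalancing rather than the specific numbers.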
The road to observability
Companies embarking on the path to observability must bear in mind that a sufficient volume of high-quality data is essential. Otherwise, the project will have to start with the creation of a data lake as a repository, followed by the definition of a semantic layer to guarantee efficient access, and only at a later stage can the observability strategy itself be addressed.
Nor is this an implementation that can deliver successful results overnight. One strategy for entering this world is to choose one or two pain points of the current monitoring system, evolve them towards observability and gain value quickly, then carry that success into subsequent projects.
The effort will pay off: according to Nubiral’s estimates, the time to diagnose a problem can be reduced by up to 80%. In real-life terms, that is often enough to avoid a system crash.