Always-on uptime in a data centre is essential to business success, and ensuring uninterrupted service requires constant vigilance and maintenance. This need for constant upkeep, and the reliance on infrastructure behind it, only looks set to increase as organizations deploy ever more business-critical applications.
While new infrastructure management tools are introduced continuously, many still fall short of the enhanced automation and lowered maintenance requirements that the industry covets. As a result, many IT professionals still lose days and nights, possibly even missing important birthdays and anniversaries, dealing with issues that require manual tuning.
A major pain point that surfaces repeatedly in conversations with customers is that maintenance cycles still require human intervention. Maintenance is also a heavy drain on operating budgets, with data centre operators devoting a large share of their spend simply to keeping the lights on.
This raises the question: why is maintenance still keeping operators up at night, despite the constant introduction of new tools to deal with the problem? What are we really missing?
The Shortfalls of Traditional Infrastructure Tools
Truly removing the burden of managing infrastructure requires the foresight to predict problems before they occur, along with deep, insightful intelligence about the underlying workloads and resources for better infrastructure optimization. Before you lose any more sleep over data centre maintenance, consider these four factors to determine whether your tools are falling short:
- They Don’t Learn From Others
Analytics that simply report on local system metrics offer limited value. Instead, look for a tool's ability to learn from the behavior of thousands of peer systems, so that it can detect and diagnose developing issues. If two minds are better than one, a thousand are better still.
A holistic approach to data collection and analysis can pool observations from an immense variety of workloads. This allows rare events identified at one site to be preemptively avoided at another, and more common events to be detected faster and with greater accuracy.
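To make the idea concrete, here is a minimal, hypothetical Python sketch of peer learning: a local metric is scored against telemetry pooled from peer systems, using a robust spread estimate so that the handful of peers already misbehaving do not mask an outlier. The function name and data are invented for illustration, not drawn from any particular product.

```python
import statistics

def peer_anomaly_score(local_value: float, peer_values: list[float]) -> float:
    """Score how far a local metric deviates from the fleet-wide baseline.

    peer_values would come from telemetry pooled across thousands of
    installed systems; here it is just a list of floats.
    """
    median = statistics.median(peer_values)
    # Median absolute deviation: a robust spread estimate that is not
    # skewed by the few peers that are already misbehaving.
    mad = statistics.median(abs(v - median) for v in peer_values)
    if mad == 0:
        return 0.0
    return abs(local_value - median) / mad

# Example: local I/O latency (ms) versus the same metric on peer systems.
peers = [1.9, 2.1, 2.0, 2.2, 1.8, 2.0, 2.3, 1.9]
print(peer_anomaly_score(4.8, peers))  # large score => worth investigating
```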
- Failing to See the Whole Picture
Traditional tools often provide analytics in a siloed fashion, reporting system status per device, which is just one part of the overall story. Because problems that disrupt applications can pop up anywhere in the infrastructure stack, it is important to be able to conduct cross-stack analytics across multiple layers and see the bigger picture. That means spanning crucial components such as applications, compute, virtualization, databases, networks and storage.
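As a rough illustration of what cross-stack correlation looks like, the hypothetical snippet below groups timestamped events from different layers that occur close together in time, so a storage latency spike can be tied to the database timeout and application error it actually caused. The event schema, layer names and time window are assumptions made for the example.

```python
from collections import namedtuple

Event = namedtuple("Event", ["timestamp", "layer", "message"])

def correlate_cross_stack(events: list[Event], window_s: float = 30.0):
    """Group events from different layers that occur within window_s
    seconds of each other into candidate cross-stack incidents."""
    events = sorted(events, key=lambda e: e.timestamp)
    incidents, current = [], []
    for event in events:
        if current and event.timestamp - current[-1].timestamp > window_s:
            incidents.append(current)
            current = []
        current.append(event)
    if current:
        incidents.append(current)
    # Only incidents spanning more than one layer tell a cross-stack story.
    return [i for i in incidents if len({e.layer for e in i}) > 1]

logs = [
    Event(100.0, "storage", "latency spike on volume vol-12"),
    Event(104.0, "database", "query timeout"),
    Event(107.0, "application", "HTTP 500 on /checkout"),
    Event(900.0, "network", "routine link flap"),
]
for incident in correlate_cross_stack(logs):
    print([f"{e.layer}: {e.message}" for e in incident])
```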
- They Don’t Know Enough
Predictive modeling requires deep domain experience – understanding all the operating, environmental, and telemetry parameters within each system in the infrastructure stack. General-purpose analytics can only go so deep. However, pairing domain experts with AI can enable machine-learning algorithms to identify causation from historical events, and in turn, predict the most complex and damaging problems.
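The toy sketch below shows the shape of that pairing: domain experts decide which raw telemetry signals to encode as features, and a standard classifier (here scikit-learn's RandomForestClassifier, chosen only as a familiar example) learns from labeled historical incidents which combinations precede failure. The feature names, values and labels are entirely invented.

```python
from sklearn.ensemble import RandomForestClassifier

# Each row: [avg_queue_depth, write_cache_hit_pct, drive_age_months]
# Features are the kind of domain-expert choices described above.
history_X = [
    [4, 96, 12], [6, 94, 18], [38, 61, 54], [41, 58, 60],
    [5, 97, 9],  [35, 66, 48], [7, 95, 24], [44, 55, 66],
]
history_y = [0, 0, 1, 1, 0, 1, 0, 1]  # 1 = failure followed within 30 days

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(history_X, history_y)

# Score fresh telemetry from a live system.
risk = model.predict_proba([[39, 60, 50]])[0][1]
print(f"Predicted 30-day failure risk: {risk:.0%}")
```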
- They Can’t Act Without You
Perhaps the biggest drawback of traditional tools is their inability to act. In the ideal state of autonomous operations, the data centre would be self-managing, self-healing and self-optimizing: able to avoid a problem or improve the environment without intervention from an administrator. Achieving this level of automation requires a proven history of automated recommendations that builds the necessary trust and confidence.
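One simple, hypothetical way to encode that trust requirement is a confidence gate: a recommendation type is only applied automatically once it has accumulated a sufficient track record under human approval. The thresholds, recommendation names and counts below are illustrative assumptions, not a real product policy.

```python
def decide_action(rec_type: str, history: dict[str, tuple[int, int]],
                  threshold: float = 0.95, min_samples: int = 50) -> str:
    """Auto-apply a recommendation only once its track record earns trust.

    history maps a recommendation type to (times_applied, times_successful),
    accumulated from outcomes where administrators approved it manually.
    """
    applied, succeeded = history.get(rec_type, (0, 0))
    if applied >= min_samples and succeeded / applied >= threshold:
        return "auto-apply"
    return "recommend-to-admin"  # keep a human in the loop until proven

track_record = {
    "rebalance-volumes": (120, 118),  # ~98% success: safe to automate
    "firmware-upgrade": (12, 11),     # too few samples: ask a human
}
print(decide_action("rebalance-volumes", track_record))  # auto-apply
print(decide_action("firmware-upgrade", track_record))   # recommend-to-admin
```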
The Future of Data Centre Maintenance
To overcome the limitations of traditional tools, convincingly reduce maintenance requirements and better automate a data centre, one has to embrace a new generation of AI solutions. This means leveraging tools that are able to observe, learn, predict, recommend and, ultimately, automate.
Through observation, AI can develop a steady-state understanding of the ideal operating environment for various workloads and applications. Deep system telemetry coupled with global connectivity allows for rapid cloud-enabled machine learning, so AI tools can quickly predict problems through pattern-matching algorithms. Application performance can even be modeled and tuned for new infrastructure based on past configurations and workload patterns.
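As a loose illustration of that pattern matching, the hypothetical sketch below compares a live telemetry window against fingerprints of patterns that preceded known incidents elsewhere in an installed base, flagging the closest match when the similarity is strong enough. The pattern names, vectors and threshold are made up for the example.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Fingerprints of telemetry trajectories that preceded known incidents,
# learned across many systems (values invented for illustration).
known_patterns = {
    "failing-drive": [0.2, 0.4, 0.9, 1.5, 2.8],
    "noisy-neighbour-vm": [1.0, 1.1, 0.9, 1.2, 1.0],
}

def predict(window: list[float], min_similarity: float = 0.98):
    """Return the known failure mode whose signature best matches the
    live telemetry window, if the match is strong enough; else None."""
    best = max(known_patterns,
               key=lambda k: cosine_similarity(window, known_patterns[k]))
    if cosine_similarity(window, known_patterns[best]) >= min_similarity:
        return best
    return None

print(predict([0.25, 0.45, 1.0, 1.6, 3.0]))  # likely "failing-drive"
```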
Based on these predictive analytics, AI solutions can determine the appropriate responses needed to improve the data centre environment. The pressure is then taken off IT teams, who no longer have to work through the night to find the source of a problem. More importantly, once the AI proves effective, its recommendations can be applied automatically without the intervention of IT administrators. That, to me, is the holy grail of automation.
At HPE, we have seen customers using AI tools predict and resolve issues automatically 86% of the time. They also spend 85% less time on storage issues and enjoy a 79% reduction in IT storage operating expenditure. The advantage of deploying AI in data centre infrastructure is undeniable.
Furthermore, with technological advancements set to invigorate all sectors of the Asia Pacific economy, this highly diverse region is expected to face a shortage of 2 million IT professionals by 2030. I'm certainly looking forward to the not-so-distant future in which automation becomes the next frontier in data centre management, and, of course, to getting a good night's rest.
By: Vikram K, Senior Director, Hybrid IT, HPE India.