Intelligent Infrastructure

Why do I need Data Center Monitoring?


My purpose with this blog post is to convince you that data center (DC) monitoring matters to you and, particularly if you’re an IT professional, to give you the key requirements for a DC monitoring tool or suite.

I realize that you may be visiting this blog for any number of reasons: maybe you want to learn about the latest and greatest Cloud trends, maybe you use DC monitoring in your workplace, or maybe you have a third reason. Because of this, and also because there is a lot of ambiguous information about the Cloud available online, I want you to read this blog post with the following presumption:

“At the end of the day, the Cloud is usually a piece of metal in a DC near the Cloud user. If that piece of metal fails, the user might lose data, lose temporary access, or, at best, experience a slower service as he or she uses the Cloud.”

I have visited several DCs and can testify that DC personnel put a lot of effort into preventing the service interruptions mentioned above, because downtime and unmet service level agreements (SLAs) are very costly. DCs keep their services up and running in several ways: first, they may buy the most reliable hardware for their application, and second, they provide redundancy in compute and storage. When that’s all said and done, how do you know if you’re utilizing your DC as expected? How do you know if a component is a bottleneck, is too stressed, or has perhaps already failed?

DC Monitoring — My Personal List

My answer is DC monitoring. Simply put, DC monitoring is for a DC what Task Manager is for your laptop. However, let me be a little more technical and give you my personal list of what I believe a DC monitoring tool should provide.

  • Scalability: Under the hood, every DC monitoring solution should be designed with scalability in mind. Scalability requires that the monitoring software itself add minimal performance overhead and that it offer a built-in way to add compute power for monitoring as the DC grows.
  • Monitoring, Alerts and Triaging: A DC monitoring tool should have a central dashboard providing views of your DC for performance evaluation and notifications when anything is out of the ordinary. This central dashboard should also let you triage and identify potential issues ranging from misconfiguration to components that are struggling or offline.
  • Analytics & Predictions: A monitoring tool holds the potential to provide a holistic view of a DC through extensive metrics collection. If the tool also comes with analytics capabilities such as pattern detection and trend analysis, I believe that the DC monitoring tool can and should be the central technical platform where future maintenance and provisioning needs are predicted and planned.
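
To make the alerting and trend-analysis ideas above a little more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the function names, the z-score threshold, and the idea of one sample per day are all hypothetical and not taken from any particular monitoring product. The first function flags readings that sit far from the average of a series, and the second fits a straight line to daily utilization to estimate how many days remain before capacity is reached.

```python
from statistics import mean, stdev


def find_outliers(samples, threshold=2.0):
    """Return the indices of samples whose z-score exceeds `threshold`.

    `threshold` is an illustrative default, not a recommended value.
    """
    if len(samples) < 2:
        return []
    mu = mean(samples)
    sigma = stdev(samples)
    if sigma == 0:
        return []  # all readings identical, nothing stands out
    return [i for i, x in enumerate(samples) if abs(x - mu) / sigma > threshold]


def days_until_full(daily_usage_pct, capacity_pct=100.0):
    """Estimate days from the last sample until usage reaches capacity.

    Fits a least-squares line to one reading per day. Returns None when
    there are too few samples or usage is not growing.
    """
    n = len(daily_usage_pct)
    if n < 2:
        return None
    xs = range(n)
    x_bar = mean(xs)
    y_bar = mean(daily_usage_pct)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, daily_usage_pct)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    if slope <= 0:
        return None
    intercept = y_bar - slope * x_bar
    day_full = (capacity_pct - intercept) / slope
    return max(0.0, day_full - (n - 1))


# Hypothetical CPU utilization readings (percent); the last one is a spike.
cpu = [40, 41, 39, 40, 42, 40, 41, 39, 40, 95]
spikes = find_outliers(cpu)

# Hypothetical storage utilization (percent), one reading per day.
storage = [50, 52, 54, 56, 58]
remaining = days_until_full(storage)
```

A real tool would of course use more robust statistics and far more data, but even this simple pairing shows how the same collected metrics can drive both notifications and provisioning plans.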

As a final remark, note that although DC monitoring tools may include some reset and control features for certain hardware and software stacks, you should not expect such management capabilities to be complete. It is more important to ensure that the DC monitoring tool integrates with your existing DC provisioning and management procedures.

Author: Christian Madsen