Case study: Education

Virtual infrastructure health check

Assessing the VMware virtual infrastructure to ensure operational reliability and efficiency whilst meeting best practice standards

For Newcastle University, the VMware environment is a critical component underpinning many of the University’s IT operations. Following years of reliable service, the platform began to suffer disruption. Newcastle University appointed us, as VMware Enterprise Solution Providers, to assess the situation and find out why the issues were now occurring…

Newcastle University is one of the premier Russell Group Universities in the North East of England. Their strategic goal of providing and maintaining the highest levels of IT service availability made them an ‘early adopter’ of VMware Enterprise server virtualisation technologies: they deployed their first HP LeftHand SAN-based virtual workloads as early as 2006.

The VMware environment is a critical component underpinning many of the University’s IT operations, and has grown to the point where the production virtual platform now supports 640 virtual servers spanning two datacentres. The environment has complex technical and operational interactions with multiple technologies, departments and users. Following years of reliable service, the platform suffered disruption in 2014, caused primarily by unreliable networking. Recognising that a point had been reached where existing assumptions needed to be re-examined, Newcastle University engaged us as VMware Enterprise Solution Providers, technical subject matter experts and a fresh pair of eyes. Our brief was to compare their virtual infrastructure configuration with relevant best practices and determine whether it was operating as effectively, efficiently and reliably as possible.

The business drivers supporting this engagement were simple:

a. Ensuring the virtual infrastructure has the lowest possible total cost of ownership
b. Reducing the risk and impact of service disruption following component or other unplanned failure
c. Reinstating the exemplary reputation of the University IT platform

Our approach began with an initial engagement, or ‘verbal discovery’ exercise, ensuring key stakeholders were able to contribute to the appraisal. It was important that technical staff saw the appraisal as an impartial audit rather than an overly critical exercise to apportion blame for the previous service disruption. The project therefore kicked off with a round-table architectural overview and amnesty session with these technical stakeholders, followed by a brief review of the existing business continuity and disaster recovery strategy, and fact finding around the growth and planned changes expected of the platform to support the University’s strategic objectives.

We then used a toolset to automate point-in-time collection of the University’s vSphere inventory, configuration and utilisation data. The toolset included VMware HealthAnalyzer (a VMware vApp available to VMware consultants and qualified partners), used to analyse the data and present observations and findings, categorised by VMware Health Check best practices, in a report card. In previous engagements we’ve found that software tools improve the efficiency and accelerate the delivery of appraisals, including VMware Health Checks, but on their own they can provide only a proportion of the evidence required to support a thorough appraisal.

Toolset discovery reports were therefore complemented with a review by field-experienced consultants, aware of both the practical implications of various architectural designs and specific configuration decisions, and the optimisation opportunities that might be available to increase the predictability and resilience of the platform.

Toolset and manual review findings were all presented in a formal report. It highlighted areas of risk (weakness, concern or fault), detailed configuration and architectural deviations from best practice (all supported with details of the practices which should be followed), and documented potential areas for improvement, each justified and categorised by priority (immediate, short-term/quick wins, medium/long-term) and criticality.

As the report audience would be both technical operations staff and the non-technical senior management/steering group, key findings and recommendations were highlighted in an executive summary. The document was initially presented informally for IT review, before being finalised to serve both as a method of communicating findings and infrastructure issues between stakeholders, and as a formal reference for the University to work from as part of their policy for continual improvement.

I particularly appreciated that I was able to use the executive summary of the report to reassure the University’s Audit Committee. Writing a single report for two audiences – highly technical IT staff and members of governance boards – is a tough ask. Waterstons delivered exactly that. I’d have no hesitation in recommending them.

The project concluded with a final round-table session with the original technical stakeholders to discuss some of the less tangible areas for improvement documented in the report. These were chiefly suggestions that did not align to a specific best practice, where recommendations were instead drawn from field experience and an awareness of which design, configuration and operational practices offer the lowest cost of virtual infrastructure ownership.

I was most impressed by this piece of work. Waterstons’ lead consultant was skilled and knowledgeable and his report was both technically comprehensive and well communicated. The report both validated a number of improvements which we had already planned and suggested several others.
Steve Williams, Director of University IT