Guest post: Christos Karamanolis, CTO of the Storage and Availability Business Unit, VMware.
As discussed in earlier posts, the latest version of Virtual SAN (6.2), announced on February 10, 2016, is the biggest release of the product since its debut in March 2014. The list of new features is impressive and makes Virtual SAN highly competitive with the most sophisticated storage platforms on the market today. Indeed, with more than 3,000 customers overall and more than 20,000 CPU licenses sold in Q4 2015 alone, Virtual SAN is one of the most widely deployed and mature Software-Defined Storage (SDS) products available.
Virtual SAN is a storage platform – the key software component that enables VMware’s Hyper-Converged Infrastructure (HCI) strategy. HCI is a drastically different model of building and operating IT infrastructure. A quick Google search returns the following definition:
Hyper–convergence (hyperconvergence) is a type of infrastructure system with a software-centric architecture that tightly integrates compute, storage, networking and virtualization resources and other technologies from scratch in a commodity hardware box supported by a single vendor. –Credit: http://searchvirtualstorage.techtarget.com/definition/hyper-convergence-
This new IT architecture has many benefits for the end customer including:
- Streamlined procurement, deployment and support. Customers can build their infrastructure in a gradual and scalable way as demands evolve.
- Adaptable software architecture that takes advantage of commodity technology trends, such as increasing CPU densities, new generations of solid-state storage and non-volatile memory, and evolving interconnects (40 Gb and 100 Gb Ethernet) and protocols (NVMe).
- Last but not least, a uniform operational model that allows customers to manage the entire IT infrastructure with a single set of tools.
It is not surprising that, according to IDC, hyper-converged infrastructure (HCI) is the fastest-growing segment of the converged (commodity-based hardware) infrastructure market.
VMware offers the most flexible and compelling option for customers to adopt the HCI model: a Hyper-Converged Software (HCS) stack based on vSphere and Virtual SAN. Customers can deploy the software on a wide range of pre-certified vendor hardware. They get the benefits of HCI, including strong software-hardware integration and a single point of support, while choosing from an unparalleled range of hardware options. One of those options is now VCE VxRail, a Hyper-Converged Infrastructure Appliance developed jointly by EMC and VMware.
One reason for the strong customer adoption of VMware’s HCS is the unique architecture of Virtual SAN, an enterprise-grade storage platform that has been designed specifically for HCI use cases.
However, we miss the point if we focus only on the (impressive) list of the product’s storage features. The other aspect, perhaps even more important to end users, is the set of operational benefits of the tightly integrated VMware HCS. With VSAN, VMware is introducing a drastically new, scalable paradigm for managing infrastructure and consuming IT resources. Let’s look into each of those areas in more detail.
Future-proof storage platform
Since the early days of Virtual SAN development, our goal has been to build a software architecture that can combine different types of storage devices to deliver the most cost-effective capacity and performance options for a wide range of workloads. With the first release of the product in early 2014, we focused on hybrid configurations, where HDD is used to offer low-cost capacity while SSD-based caching delivers impressive performance at low cost.
However, technology is evolving. The cost of NAND-based flash capacity ($/GB) is dropping rapidly. In 2015, the capacity cost of lower-endurance enterprise SSDs fell below that of 15K RPM HDDs. Most flash vendors are planning to release large-capacity SSDs over the next 6 to 18 months, which will push prices even lower. Think, for example, of 8-16 TB SSDs with a capacity cost of $0.50/GB or less. Moreover, each of those devices can deliver hundreds of thousands of IOPS, even for the most demanding random workloads. Thus, one can use deduplication and erasure coding for substantial space savings without practical concern for the negative effect these features have on caching effectiveness or for the inherent I/O amplification they may incur.
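To make the space-efficiency argument concrete, here is a back-of-the-envelope calculation of cost per usable GB. It is a minimal sketch under a deliberately simple model: the $0.50/GB figure comes from the paragraph above, while the 3+1 erasure-coding layout, the two-way mirroring baseline and the 2:1 deduplication ratio are illustrative assumptions, not measured Virtual SAN results.

```python
def cost_per_usable_gb(raw_cost_per_gb, data_fragments, parity_fragments, dedup_ratio=1.0):
    """Effective $/GB of usable capacity under a simple, illustrative model.

    Protection overhead is (data + parity) / data raw GB per usable GB;
    deduplication divides the raw capacity needed by dedup_ratio.
    """
    overhead = (data_fragments + parity_fragments) / data_fragments
    return raw_cost_per_gb * overhead / dedup_ratio


RAW = 0.50  # $/GB for large, low-endurance SSDs, per the text above

# Two-way mirroring modeled as 1 data copy + 1 extra copy, no deduplication.
mirror = cost_per_usable_gb(RAW, 1, 1)
# Hypothetical 3+1 erasure coding combined with an assumed 2:1 dedup ratio.
ec_dedup = cost_per_usable_gb(RAW, 3, 1, dedup_ratio=2.0)

print(f"mirroring: ${mirror:.2f}/GB, 3+1 EC + 2:1 dedup: ${ec_dedup:.2f}/GB")
```

Under these assumptions the usable cost drops from $1.00/GB to about $0.33/GB. The catch is the extra I/O that erasure coding and deduplication generate, which is only affordable when the media has IOPS to spare.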
The economics of storage are skewed in favor of all-flash for an increasing number of use cases. For me, our experience with the Virtual SAN cluster deployed as part of the Hands-On Lab (HOL) infrastructure at VMworld 2014 was an eye-opener. The storage workload generated by hundreds of concurrent, constantly churning labs is not very cache-friendly (no surprise here). As a result, the VMware IT team used a large number of spindles for the capacity tier of Virtual SAN to handle the workload “escaping” the cache. In other words, the spindles were needed for performance, not capacity. We realized that all-flash hardware would require fewer capacity devices and would cost less! And that was already the case back in 2014.
The main challenge with high-capacity, low-cost SSDs is their low endurance (typically below 1 device-write per day, guaranteed for 5 years). If one is not careful, those SSDs may wear out quickly in a demanding enterprise environment. In the current Virtual SAN architecture, we use a 2-tier approach to address this challenge. Small, high-endurance (and expensive) devices are used as write caches. They are only a fraction of the size of the capacity tier (typically 10%), but they effectively absorb the write workload, so that only a fraction of writes (cold blocks) trickles down to the capacity tier. The capacity tier devices still serve the majority of the read workload.
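The effect of the write cache on endurance can be illustrated with simple device-writes-per-day (DWPD) arithmetic. This is a sketch under stated assumptions: the daily write volume, tier sizes and the 25% trickle-down fraction are hypothetical numbers, and the model ignores write amplification from replication or erasure coding and assumes writes are spread evenly across devices in a tier.

```python
def device_writes_per_day(daily_writes_tb, tier_capacity_tb, fraction_written=1.0):
    """DWPD a tier experiences if it receives fraction_written of the incoming writes."""
    return daily_writes_tb * fraction_written / tier_capacity_tb


DAILY_WRITES_TB = 40.0  # hypothetical write volume per day
CAPACITY_TB = 20.0      # hypothetical capacity tier size
CACHE_TB = 2.0          # write cache sized at ~10% of capacity, as in the text
TRICKLE = 0.25          # hypothetical share of writes (cold blocks) destaged

no_cache = device_writes_per_day(DAILY_WRITES_TB, CAPACITY_TB)
with_cache = device_writes_per_day(DAILY_WRITES_TB, CAPACITY_TB, TRICKLE)
cache_load = device_writes_per_day(DAILY_WRITES_TB, CACHE_TB)

print(f"capacity tier without cache: {no_cache:.2f} DWPD")    # exceeds a <1 DWPD rating
print(f"capacity tier behind cache:  {with_cache:.2f} DWPD")  # within a <1 DWPD rating
print(f"write cache itself:          {cache_load:.2f} DWPD")  # needs a high-endurance device
```

With these numbers the capacity tier would see 2.0 DWPD without a cache but only 0.5 DWPD behind it, while the small cache device absorbs 20 DWPD, which is why it must be a high-endurance part.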
So, even in all-flash configurations, Virtual SAN uses a combination of media to achieve a cost-effective, yet enterprise-grade storage solution that meets the requirements of the most demanding business-critical workloads.
New infrastructure management paradigm
In traditional data centers, IT services have been compartmentalized (compute, network, storage) not only because of organizational reasons, but also because of the diverse tools, operational procedures and expertise in each area.
HCI breaks down those barriers. With commodity hardware platforms (x86 servers) and software delivering all three IT services, customers look for a single operational experience and a unified set of tools to manage their infrastructure. Moreover, they wish to do so at scale with a smaller staff of IT generalists, not experts in one vendor’s storage or network platform.
And this is exactly what we have been doing with our HCS Management Stack. Virtual SAN is blazing the trail. In April 2015, we introduced the Health-Check Plugin. With VSAN 6.1 in September 2015, this became the product’s in-box ‘Health Service’ feature. The service checks the hardware compatibility of storage devices and controllers as well as the consistency of vSphere cluster and network configuration. It monitors and reports storage utilization and even offers performance diagnostics. All of this is natively integrated with vCenter Server and its alarm/event framework, which hundreds of thousands of customers have used and trusted for years. The user can run tests (such as performance burn-in tests) proactively to determine the health and state of their platform, on day 0 or at any other time as needed. As you can see, the health checks and monitoring go well beyond purely storage aspects: they cover the entire infrastructure.
The same applies to the new Performance Service introduced in VSAN 6.2. The service provides detailed performance metrics over a configurable time window, both for physical infrastructure (at the granularity of cluster, host, or individual device) and for virtual machines and all their state: virtual disks, metadata, swap, etc. The service captures metrics such as IOPS, throughput, latency, and outstanding IO.
I encourage you to read a recent white paper that describes in more detail the Health and Performance services of VSAN 6.2.
Using these tools and data, the user has end-to-end visibility of the state of their infrastructure and the consumption of resources by different VMs and workloads. For example, one can troubleshoot performance, pinpoint the root cause of any issue (whether compute, network or storage) and decide on remediation actions, all with a single tool set. Speaking of tools, the Virtual SAN Health and Performance Services are natively integrated with the vSphere Web Client. Their functionality is also fully supported through APIs. They are part of the Virtual SAN API, which is an extension of the vSphere API.
The architecture behind these services is almost as interesting as the great user experience they deliver. The raw configuration, state and performance data are collected on each individual host and persisted in a distributed, highly available database on (as you can guess) the Virtual SAN datastore itself. A service agent running on each host executes a number of processing and analysis tasks to produce the metrics that pertain to the local host. Agents communicate in a peer-to-peer fashion to share and aggregate data at the cluster level. The processed data are also persisted in the distributed database on Virtual SAN.
Aggregated information from across clusters (according to the filtering criteria and timelines specified by the user) is made available to the users through the centralized GUI or API – a single ‘pane of glass’. In the current incarnation of the product, that centralized interface is provided by vCenter Server. However, no processing is done in vCenter and no health/performance data are persisted in vCenter’s database. In fact, most of the host and cluster-level data can also be obtained through host APIs or the ESXi Host Client directly from each host in a Virtual SAN cluster.
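The division of labor described above can be illustrated with a small sketch: each host reduces its raw samples to a compact local summary, and those summaries are then merged into cluster-level figures. This is not VMware’s implementation or data model; the class, function and host names and the sample format are hypothetical, chosen only to show why per-host pre-aggregation leaves nothing for the central interface to compute.

```python
from dataclasses import dataclass


@dataclass
class HostSummary:
    """Per-host metrics a local agent might produce (hypothetical structure)."""
    host: str
    iops: float
    avg_latency_ms: float


def summarize_host(host, samples):
    """Reduce raw samples to a local summary.

    Each sample is a hypothetical (ios_completed, total_latency_ms, interval_s) tuple.
    """
    ios = sum(s[0] for s in samples)
    latency_ms = sum(s[1] for s in samples)
    seconds = sum(s[2] for s in samples)
    return HostSummary(host, ios / seconds, latency_ms / ios if ios else 0.0)


def aggregate_cluster(summaries):
    """Merge host summaries: sum IOPS, IOPS-weighted average latency.

    Assumes all summaries cover the same time window.
    """
    total_iops = sum(s.iops for s in summaries)
    weighted = sum(s.iops * s.avg_latency_ms for s in summaries)
    return total_iops, (weighted / total_iops if total_iops else 0.0)


hosts = [
    summarize_host("esx-01", [(1000, 2000.0, 10)]),  # 100 IOPS at 2.0 ms
    summarize_host("esx-02", [(3000, 3000.0, 10)]),  # 300 IOPS at 1.0 ms
]
print(aggregate_cluster(hosts))  # (400.0, 1.25)
```

Because the summaries are small and mergeable, any peer can combine them without shipping raw samples to a central server. Note that cluster latency must be weighted by I/O rate: a plain mean of the two hosts would report 1.5 ms instead of 1.25 ms.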
So, HCI is more than a scalable, distributed software-defined storage platform. It is also about a new operational paradigm. The technology we are developing around Virtual SAN spearheads VMware’s strategy on HCI management. So what does the future hold for VMware, Virtual SAN and HCI?
The road ahead
There are three main technology areas we are betting on for the future. I am providing a brief overview of each here. You may hear more details from us on these topics over time. However, I should stress that the forward-looking statements in this article do not reflect committed VMware products or features.
1] New emerging storage and networking technologies
The industry is anticipating a number of fundamentally new physical storage technologies to emerge over the next few years. In July 2015, Intel and Micron announced 3D XPoint, a non-volatile memory (NVRAM) technology that may be delivered in different product form factors, including NVMe disk devices and DIMMs. The rumor mill has it that other vendors are working on competitive NVRAM technologies.
These technologies promise “1,000 times the performance of today’s SSDs with 100 times better endurance”. The very high IOPS and throughput of those devices will pose a challenge for (software) storage platforms that do not make the most efficient use of CPUs. Even more interesting, in my opinion, is the fact that these devices will deliver extremely low latencies, on the order of a few microseconds, under heavy sustained workloads. In comparison, some of the best SSDs today offer latencies in the range of 1 millisecond under sustained workloads.
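Why latency matters so much can be shown with Little’s law, a standard queueing result: sustained throughput equals the number of outstanding I/Os divided by the per-I/O latency. The sketch below applies it to an application that issues one I/O at a time; the 5 µs value is an assumed point within the “few microseconds” range mentioned above, not a published device specification.

```python
def max_iops(outstanding_ios, latency_s):
    """Little's law: throughput = concurrency / latency."""
    return outstanding_ios / latency_s


# An application issuing one I/O at a time (queue depth 1).
devices = [("SSD, ~1 ms", 1e-3), ("NVRAM, ~5 us (assumed)", 5e-6)]
for label, latency in devices:
    print(f"{label}: {max_iops(1, latency):,.0f} IOPS at queue depth 1")
```

At queue depth 1 the SSD caps out around 1,000 IOPS while the low-latency device allows roughly 200,000, so even applications with little I/O concurrency stand to benefit. At those rates, the CPU cost per I/O in the software stack becomes the bottleneck, which is the efficiency challenge noted above.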
How do applications take advantage of the performance of these emerging storage devices? And what are the architectural properties of the storage stack that delivers that performance?
The cost of these devices is not clear yet, but it is expected to be somewhere between NAND-based flash and DRAM. So, not cheap. Clearly, some form of multi-tier architecture will be required to combine the performance and endurance benefits of NVRAM with the low-cost capacity of traditional flash or even HDD. Does that sound familiar? It is indeed the core principle behind Virtual SAN’s architecture. As of VSAN 6.2, the product utilizes three tiers of storage: volatile RAM for local caching on the host where the VM runs, high-endurance flash for data caching and metadata, and low-endurance flash or HDD for the capacity tier. We are actively working with hardware vendors to evolve Virtual SAN’s architecture so that it accommodates the new generation of devices while still delivering the most cost-effective HCI storage in the industry.
2] Expand the current management paradigm with Analytics
As IT infrastructures grow, troubleshooting, root-cause analysis and remediation are becoming increasingly difficult tasks. The management model of HCI and integrated tools, such as those offered by Virtual SAN, are compelling. Yet, humans do not scale. IT organizations need automated and scalable tools to replace the majority of the manual, time-consuming tasks required to perform the daily operations of real-world infrastructures.
As described above, Virtual SAN utilizes a scalable, distributed management architecture that already performs a number of data-crunching tasks and helps customers aggregate and analyze data from their HCI infrastructure.
However, we plan to push this model further: to offer powerful tools and analytics algorithms to customers as Software as a Service (SaaS) running in the cloud. This way, VMware can deliver new features and evolve the service at a very rapid pace without requiring customers to install new software or upgrade existing products.
Virtual SAN is already integrated with VMware’s Customer Experience Improvement Program (CEIP), a.k.a. ‘Phone Home’. Currently, the feature uploads support bundles, including logs, traces and system state data, to VMware’s cloud, allowing support engineers to help customers with technical issues in the field, including Virtual SAN-specific support requests.
Our goal is to extend this service with a SaaS model of Infrastructure Analytics covering a range of use cases from Day-0 system sizing and planning to Day-2 performance and capacity analysis and trending, automated troubleshooting and pro-active customer expert advice.
Where applicable, cloud-based features and recommendations will also trickle back into the on-premises product offerings. That will be important for customers who run critical infrastructure on Virtual SAN where Internet access is not an option due to security policies.
3] HCI as the infrastructure model for new use cases
The benefits of HCI and software-defined storage such as Virtual SAN make them the best infrastructure architecture for organizations with a DevOps management culture and cloud-native application use cases. Imagine using CLI tools, APIs and scripts to handle all your storage management needs, from infrastructure sizing, configuration and monitoring all the way to storage policy-based consumption, workload monitoring and analysis. All of this through a distributed, scalable platform architecture without central control points or bottlenecks.
This is not a futuristic pipe dream. The management stack of Virtual SAN is very well integrated with vSphere today. However, from a software architecture perspective, it does not have any strong dependencies on vCenter Server. We are currently working on supporting Photon Controller on VSAN-based hyper-converged infrastructures. Photon Controller is VMware’s new infrastructure stack optimized for cloud-native applications. It consists of Photon Machine and the Photon Controller, a distributed, API-driven, multi-tenant control plane that is designed for extremely high scale and churn.
Virtual SAN already fully exposes all its operational features, both infrastructure management and storage consumption, through APIs. With Photon Controller, we are building logic to provide cluster-wide access to Virtual SAN’s control plane through any host in the cluster. We will still provide single-pane-of-glass visibility into one’s infrastructure. And, of course, we will support all management workflows through DevOps-friendly command-line tools.
In summary, with Virtual SAN and the management solutions designed around it, VMware offers compelling and trustworthy solutions to thousands of vSphere Enterprise customers who are adopting HCI architectures for their data centers. VMware is investing heavily in the evolution of Virtual SAN as a storage platform. Moreover, Virtual SAN and NSX are the building blocks of a software-defined architecture that addresses new use cases and enables customers to manage their infrastructure at large scale.
This post originally appeared on the Storage & Availability blog Virtual Blocks and was written by Christos Karamanolis. Christos is a VMware Fellow and the CTO of the Storage and Availability Business Unit. He has 25 years of research and development experience in distributed systems, fault tolerance, storage, and storage management. He has co-authored more than 20 research papers in peer-reviewed journals and conferences and holds over 25 granted patents with several pending.