Best Processor for Virtualization


To better understand the challenges of running Elasticsearch in a virtualized environment, we need to look beyond conventional hardware problems to the extra layers that virtualization places between Elasticsearch and the physical machine. This article uncovers some common issues you might experience when running Elasticsearch in virtual environments.

A Brief History

Long before Elasticsearch appeared, virtualization was becoming a first-class citizen in computing. Virtualization refers to creating a virtual (rather than actual) version of something, such as a computer hardware platform, an operating system, a storage device or network resources.

Virtualization was born in the late 1960s and early 1970s, when IBM created CP-40/CMS (Conversational Monitor System) as a way to logically divide a mainframe computer's resources between different applications. Since then, the meaning of the term has broadened to what it is today: full virtual machine (VM) implementations that control processing, network and memory, all working together seamlessly in the cloud.

Existing Platforms

There are various platforms for running Elasticsearch in virtual environments, each quite different from the others. Generally, the three main platforms we see used for Elasticsearch are:

  • VMware vSphere
  • Amazon EC2
  • Microsoft Azure

Finally, as a different way to handle a virtualized Elasticsearch infrastructure, Found by Elastic is a hosted, fully managed Elasticsearch Software as a Service (SaaS). Found provides a fast, scalable, reliable and easy-to-operate search service hosted for you in the cloud.

The Architecture

As an example of how complex a virtualized architecture can be, and of how much we need to understand to manage Elasticsearch in a virtual environment, let's take a brief look at VMware's vSphere architecture. VMware vSphere transforms entire datacenters into a single cloud computing infrastructure, virtualizing and aggregating the main physical hardware resources across multiple systems and providing them as virtual resources to the datacenter.

VMware vSphere consists of multiple component layers such as:

  • Infrastructure Services - VMware vCompute, VMware vStorage and VMware vNetwork. VMware ESX and ESXi are hypervisors installed on physical servers; they abstract away the processor and other hardware, manage storage in virtual environments and simplify networking.
  • Application Services - Ensure availability, security and scalability for applications.
  • VMware vCenter Server - A single, central application for managing the datacenter, providing access control, performance monitoring and configuration.
  • Clients - Different types of clients for accessing the VMware vSphere datacenter, through which we can create and access an Elasticsearch node.

Although the architecture is complex, whichever virtualization solution we use will come with tools that make it easy to manage entire datacenters or clusters. These tools help us allocate storage and networking to the physical nodes, parcel out resources (CPU, memory, disk and network bandwidth) as needed, monitor datacenter status, and more, so that we can set up Elasticsearch in a virtual environment exactly as our needs require. Even so, we need to watch out for issues that can crop up with CPU, memory and disk utilization.

Handling Resources

There are various ways to run Elasticsearch in a virtualized environment. Each platform and solution, whether cloud-based or not, has its own complexity when it comes to configuration and operation. Handling resources well is the key to success.

CPU

Every virtualization solution has limits on CPU usage. A physical processor core can support up to 32 virtual CPUs (vCPUs) in both vSphere 6 and Azure, and 36 vCPUs in Amazon EC2. On cloud providers, the more CPU we allocate, the more each instance costs.
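To make the per-core figure concrete, here is a minimal Python sketch that totals the vCPUs of the VMs planned for one host and compares the resulting vCPUs-per-core ratio with a platform limit. The default of 32 simply mirrors the vSphere 6 figure above; the function names and the host and VM sizes in the example are hypothetical.

```python
from typing import List


def vcpus_per_core(vm_vcpus: List[int], physical_cores: int) -> float:
    """Ratio of vCPUs allocated across all VMs to physical cores on the host."""
    if physical_cores <= 0:
        raise ValueError("physical_cores must be positive")
    return sum(vm_vcpus) / physical_cores


def within_limit(vm_vcpus: List[int], physical_cores: int, limit: int = 32) -> bool:
    """True if the vCPU-per-core ratio does not exceed the platform limit.

    The default limit of 32 is the vSphere 6 figure quoted in the text;
    pass a different value for other platforms.
    """
    return vcpus_per_core(vm_vcpus, physical_cores) <= limit


if __name__ == "__main__":
    # Hypothetical host: 16 physical cores running four Elasticsearch VMs.
    vms = [4, 4, 8, 8]
    print(vcpus_per_core(vms, 16))  # 1.5
    print(within_limit(vms, 16))    # True
```

The hard limit is a ceiling rather than a target: the higher the ratio, the more often vCPUs must wait for a free physical core, which Elasticsearch experiences as extra latency.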

Elasticsearch runs on Java, so each node is a Java Virtual Machine (JVM) running inside our VM. A good approach for JVMs is to allocate a minimum of two vCPUs: one to handle garbage collection and JVM housekeeping, and the other to handle application processing.

A good way to keep CPU usage under control is to monitor utilization inside the VM with Marvel. If Elasticsearch is consuming most of the CPU available inside the VM, consider increasing the number of vCPUs.
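Marvel presents these numbers graphically, but similar per-node figures can also be read directly from the Elasticsearch nodes stats API. The sketch below assumes the cluster answers at a hypothetical `http://localhost:9200` and that each node in the response exposes `process.cpu.percent` (check the field against your Elasticsearch version); the 80% threshold and the helper names are illustrative, not recommendations.

```python
import json
import urllib.request
from typing import List, Tuple

# Hypothetical endpoint; adjust host and port for your cluster.
ES_URL = "http://localhost:9200"


def fetch_process_stats(base_url: str = ES_URL) -> dict:
    """Fetch process-level statistics for every node from the nodes stats API."""
    with urllib.request.urlopen(f"{base_url}/_nodes/stats/process") as resp:
        return json.load(resp)


def busy_nodes(stats: dict, threshold: float = 80.0) -> List[Tuple[str, float]]:
    """Return (node name, CPU %) for nodes whose process CPU exceeds threshold."""
    result = []
    for node in stats.get("nodes", {}).values():
        # Assumed response shape: nodes.<id>.process.cpu.percent
        cpu = node.get("process", {}).get("cpu", {}).get("percent")
        if cpu is not None and cpu > threshold:
            result.append((node.get("name", "?"), cpu))
    return result
```

Against a live cluster, `busy_nodes(fetch_process_stats())` lists the nodes above the threshold; a node that stays there consistently is a candidate for more vCPUs.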

Memory

As with CPU, there are limits on the amount of RAM we can allocate on a host, depending on the provider: up to 6 TB on vSphere, 244 GB on Amazon EC2 and 112 GB on Azure. As memory allocation grows, costs generally increase too.

Reaching memory limits can also affect CPU and disk usage: an operating system that runs short of memory starts swapping to disk, and a JVM whose heap is nearly full spends more CPU time on garbage collection. Watch the host and VM status with Marvel to find out whether you need to reduce memory usage, for example by refactoring Elasticsearch queries, or to increase the amount of memory on the host.

Java objects live on the JVM heap, and the amount of memory given to the heap largely determines whether our Elasticsearch cluster behaves well or badly. As the heap fills up, the Java garbage collector starts running to reclaim space. It is a best practice to give half of the total memory to the heap, leaving the rest to the operating system, whose filesystem cache Lucene relies on. Our documentation also has detailed information on how to limit memory usage.
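The half-of-RAM rule reduces to a one-line calculation. This sketch turns it into matching `-Xms`/`-Xmx` JVM flags; setting both to the same value keeps the heap from resizing at runtime. The optional cap of about 31 GB is an assumption taken from Elastic's general heap guidance (staying below the size at which the JVM can no longer use compressed object pointers), not from the text above, and the helper names are hypothetical. How the flags reach Elasticsearch depends on the version, for example the `ES_HEAP_SIZE` environment variable in older releases or the `jvm.options` file in newer ones.

```python
from typing import Optional


def heap_size_gb(total_ram_gb: int, cap_gb: Optional[int] = 31) -> int:
    """Half of the VM's RAM in whole GB, optionally capped (assumed ~31 GB cap)."""
    if total_ram_gb < 2:
        raise ValueError("need at least 2 GB of RAM")
    heap = total_ram_gb // 2  # the other half is left for the OS filesystem cache
    if cap_gb is not None:
        heap = min(heap, cap_gb)
    return heap


def jvm_heap_flags(total_ram_gb: int, cap_gb: Optional[int] = 31) -> str:
    """Equal initial and maximum heap sizes, so the heap never resizes."""
    heap = heap_size_gb(total_ram_gb, cap_gb)
    return f"-Xms{heap}g -Xmx{heap}g"


if __name__ == "__main__":
    print(jvm_heap_flags(16))   # -Xms8g -Xmx8g
    print(jvm_heap_flags(128))  # -Xms31g -Xmx31g
```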

Disk

Disk utilization behaves the same way on a host as in a VM, and we need to eliminate disk contention as we would in any environment. If a set of disks on the host is overused, meaning its utilization (the share of time the disks are busy servicing I/O) is close to 100%, every VM using those disks may be affected. Disk resources can also suffer from "noisy neighbors": generally larger VMs running on the same hardware that consume shared resources in unpredictable ways.
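The utilization figure mentioned above is the same `%util` reported by tools such as `iostat`: the share of wall-clock time during which the device had I/O in flight. This Linux-only sketch computes it from two readings of `/proc/diskstats`, using the counter of milliseconds spent doing I/Os; the default device name `sda` and the helper names are assumptions.

```python
import time

# Each /proc/diskstats line is: major, minor, device name, then counters.
# Index 12 (0-based) is "milliseconds spent doing I/Os", the basis of %util.
IO_TICKS_INDEX = 12


def io_ticks(diskstats_text: str, device: str) -> int:
    """Return the busy-time counter (ms) for `device` from /proc/diskstats text."""
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) > IO_TICKS_INDEX and fields[2] == device:
            return int(fields[IO_TICKS_INDEX])
    raise KeyError(f"device {device!r} not found")


def utilization(before_ms: int, after_ms: int, elapsed_ms: float) -> float:
    """Percentage of the elapsed time during which the device was busy."""
    return 100.0 * (after_ms - before_ms) / elapsed_ms


def sample(device: str = "sda", interval_s: float = 1.0) -> float:
    """Read /proc/diskstats twice, interval_s apart (Linux only), and return %util."""
    with open("/proc/diskstats") as f:
        before = io_ticks(f.read(), device)
    start = time.monotonic()
    time.sleep(interval_s)
    with open("/proc/diskstats") as f:
        after = io_ticks(f.read(), device)
    elapsed_ms = (time.monotonic() - start) * 1000
    return utilization(before, after, elapsed_ms)
```

Calling `sample("sda")` inside a guest measures the virtual disk as the guest sees it; contention caused by noisy neighbors may only be visible from the host, through the hypervisor's own monitoring.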

Backing up your Elasticsearch cluster, whether by snapshotting individual indices or the entire cluster, is incredibly important! Backups taken from the VM also give us a starting point to recover from in case of failure. However, creating snapshots or backups of a VM has a cost and may slow down the VM's response time, which can in turn affect Elasticsearch's responsiveness. Either way, it is good practice to have a backup and snapshot policy for your clusters.
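For the Elasticsearch side of a backup policy, the snapshot and restore API can register a repository and then take snapshots into it. The sketch below builds the two requests and sends them with the Python standard library. The cluster URL, the repository name `my_backup` and its path are hypothetical; a shared-filesystem (`fs`) repository must be reachable from every node, and recent versions also require its location to be listed under `path.repo` in `elasticsearch.yml`.

```python
import json
import urllib.request
from typing import List, Optional, Tuple

# Hypothetical values: adjust to your cluster and shared backup mount.
ES_URL = "http://localhost:9200"
REPO = "my_backup"
LOCATION = "/mnt/es_backups/my_backup"

EsRequest = Tuple[str, str, dict]  # (HTTP method, path, JSON body)


def register_repo_request(repo: str, location: str) -> EsRequest:
    """Build the request that registers a shared-filesystem snapshot repository."""
    body = {"type": "fs", "settings": {"location": location}}
    return "PUT", f"/_snapshot/{repo}", body


def snapshot_request(repo: str, name: str,
                     indices: Optional[List[str]] = None) -> EsRequest:
    """Build a snapshot request; with no indices given, all indices are included."""
    body = {"indices": ",".join(indices)} if indices else {}
    # wait_for_completion=true makes the call block until the snapshot finishes.
    return "PUT", f"/_snapshot/{repo}/{name}?wait_for_completion=true", body


def send(method: str, path: str, body: dict, base_url: str = ES_URL) -> dict:
    """Send one JSON request to the cluster and return the decoded response."""
    req = urllib.request.Request(
        base_url + path,
        data=json.dumps(body).encode("utf-8"),
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Against a live cluster, `send(*register_repo_request(REPO, LOCATION))` registers the repository once, and `send(*snapshot_request(REPO, "snapshot_1"))` takes a snapshot of every index.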

Elasticsearch disk usage depends on the use case. We recommend running stress and performance tests on the server to understand how much disk to allocate for the cluster to work well. As with CPU and memory, some cloud solutions become pricey as you increase disk allocation.
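Once stress tests show how much disk the primary data occupies, a back-of-envelope calculation can turn that figure into an allocation. This sketch assumes every replica is a full copy of its primary and keeps usage below a fill ratio; the 0.85 default mirrors Elasticsearch's default low disk watermark of 85%, above which a node stops receiving new shards. The function name and the example figures are illustrative only.

```python
def required_disk_gb(primary_gb: float, replicas: int = 1,
                     max_fill: float = 0.85) -> float:
    """Cluster-wide disk needed so primaries plus replicas stay under max_fill.

    primary_gb: on-disk size of the primary shards, measured in stress tests.
    replicas:   number of replica copies per primary.
    max_fill:   highest acceptable fraction of disk in use (assumed 85%).
    """
    if not 0 < max_fill <= 1:
        raise ValueError("max_fill must be in (0, 1]")
    total_data = primary_gb * (1 + replicas)  # each replica is a full copy
    return total_data / max_fill


if __name__ == "__main__":
    # 100 GB of primary data with one replica.
    print(round(required_disk_gb(100), 1))  # 235.3
```

For instance, 100 GB of primary data with one replica needs roughly 235 GB across the cluster, before any growth is accounted for.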

Network

Configuring the network is usually straightforward. There are many possible configurations, depending on which cloud provider you choose and what your needs are: you can share the network with the host, or create an independent network for your VM.

You may also consider creating a Virtual Private Network (VPN) to isolate the cluster and help secure it.

The Perils of Virtualization

In addition to the areas outlined above, there are a few other places where we can run into trouble running Elasticsearch in a virtualized environment.
