5000 Persistent User Scale out Test with Citrix XenDesktop and Atlantis Computing

Last update: 28 January 2015

Mike Perks
Kenny Bain
Pawan Sharma

Table of Contents

Executive summary
1 Lenovo Client Virtualization solution
1.1 Hardware components
1.2 Software components
2 Performance test methodology and tools
2.1 Test methodology
2.2 Login VSI
2.3 VMware esxtop
2.4 Superputty
2.5 IBM FlashSystem performance monitor
3 Performance test hardware configuration
3.1 System under test
3.2 Load framework
4 Software configuration and setup
4.1 Setting up Management VMs
4.2 Setting up master user VM
4.3 Setting up master launcher VM
4.4 Setting up Login VSI
4.5 Setting up Atlantis Computing software
4.6 Setting up 5000 persistent desktop VMs
5 Scale out performance results
5.1 Brokerless by using RDP
5.2 Citrix XenDesktop
Resources

Executive summary

This paper documents persistent desktop scale out performance testing that was done at the Lenovo solution lab in Austin, Texas. The goal was to test 5000 persistent desktops by using a combination of Lenovo® servers and networking, IBM® FlashSystem™ storage, and Atlantis Computing software.

This paper includes the following sections:
• Section 1 describes the hardware and software components that were used in the Lenovo Client Virtualization (LCV) solution for Citrix XenDesktop.
• Section 2 describes the methodology and tools that were used for the scale out performance test, including Login VSI.
• Section 3 describes the hardware configuration that was used.
• Section 4 describes the software configuration that was used.
• Section 5 describes the performance results.

The Lenovo results show a best-in-class achievement of 5000 persistent desktops with Citrix XenDesktop. This type of result has not been documented before, partly because 5000 persistent desktops require a significant investment in storage. However, as shown in this paper, a combination of Atlantis Computing software and IBM FlashSystem storage turns the usual I/O performance problem into a non-event. Without data reduction technology, enterprise-class flash storage can be expensive. The Atlantis Computing de-duplication and compression facilities make it cost effective: the storage cost for 5000 users is less than $130 per user, including the Atlantis license. Moreover, logging onto a desktop in 16 seconds and rebooting 5000 desktops in 20 minutes makes this system easy to use for both users and IT staff.

A total of 35 Lenovo Flex System servers in 30U were used to support 5000 users with up to 160 persistent users per server. The Lenovo servers provide a dense and cost-effective solution in terms of CAPEX and OPEX. The performance results show that the usual persistent desktop I/O problem was effectively eliminated and that compute server performance is the driving factor. More compute servers can be added to support more users or more compute-intensive users; it all depends on the individual customer environment and user load.

For more information about the Lenovo Client Virtualization solution for Citrix XenDesktop, contact your Lenovo sales representative or business partner. For more information about the Citrix XenDesktop reference architecture, which includes information about the LCV solution, performance benchmarks, and recommended configurations, see this website: http://lenovopress.com/tips1278

1 Lenovo Client Virtualization solution

The LCV solution includes compute servers, networking, storage, and clients. The compute servers use the VMware ESXi hypervisor for user virtual machines (VMs) or management VMs. In this configuration, 10 Gb Ethernet (10 GbE) networking is used for connectivity between compute servers and clients. A SAN network that uses Fibre Channel is used for connectivity between compute servers and shared storage. Citrix XenDesktop provides a connection broker service between clients and user VMs to support virtual applications or virtual desktops. XenDesktop uses management VMs and supports multiple hypervisors, although only ESXi is used for the scale out performance test. Atlantis Computing software provides an important service that substantially reduces the amount of I/O from the virtual desktops to the shared storage.

Figure 1 shows an overview of the main components in the LCV solution. The rest of this section describes the subset of hardware and software components that are used for the scale out performance test.
[Figure 1 depicts clients (tablets, laptops, thin clients, desktops and all-in-ones, and workstations) connecting through the XenDesktop broker to virtual desktops and applications that run on ESXi hypervisor servers (System x3550 M4/x3650 M4, Flex System x240, and ThinkServer RD350/RD450), backed by storage options that include IBM Storwize, NetApp NAS + DAS, IBM FlashSystem, and EMC VNX.]

Figure 1: Overview of Lenovo Client Virtualization Solution

1.1 Hardware components

This section describes the hardware components that are used for the 5000 persistent user scale out performance test, which include Lenovo compute servers, Lenovo networking components, and IBM storage components.

1.1.1 Flex System elements

Flex System is an enterprise-class platform that is specifically created to meet the demands of a virtualized data center and help clients establish a highly secure private cloud environment. Flex System includes the following features:
• Greatest choice for clients in processor type and OS platform, all in the same chassis and managed from a single point of control.
• Flex System networking delivers a 50% latency improvement by switching node-to-node (east-west) traffic within the chassis rather than routing everything through the top-of-rack (TOR) switch (north-south).

Figure 2: Flex System Enterprise Chassis and Flex System compute nodes

For more information, see the following Flex System website: ibm.com/systems/pureflex/overview.html

1.1.2 Flex System x240 Compute Node

The Flex System x240 Compute Node (as shown in Figure 3) is a high-performance Intel® Xeon® processor-based server that offers outstanding performance for virtualization, with new levels of processor performance and memory capacity and flexible configuration options for a broad range of workloads. The Flex System x240 Compute Node is ideal for virtualization, with maximum memory support (24 DIMMs and up to 768 GB of memory capacity), 10 GbE Integrated Virtual Fabric, and 8 Gb or 16 Gb Fibre Channel for high networking bandwidth. The Flex System x240 Compute Node also supports Flex System Flash for up to eight 1.8-inch solid-state drives (SSDs) for maximum local storage.

Figure 3: Lenovo Flex System x240 Compute Node

1.1.3 Flex System x222 Compute Node

The Flex System x222 Compute Node (as shown in Figure 4) is a high-density blade server that is designed for virtualization, dense cloud deployments, and hosted clients. The Flex System x222 Compute Node has two independent compute nodes in one mechanical package; this double-density design allows up to 28 servers to be housed in a single 10U Flex System Enterprise Chassis. The Flex System x222 Compute Node supports up to 768 GB of memory capacity, 10 GbE Integrated Virtual Fabric, and 16 Gb Fibre Channel for high networking bandwidth. The Flex System x222 Compute Node also supports Flex System Flash for up to four 1.8-inch SSDs for maximum local storage.

Figure 4: Flex System x222 Compute Node

1.1.4 Flex System Fabric EN4093R 10Gb Scalable Switch

The Flex System Fabric EN4093R 10Gb Scalable Switch (as shown in Figure 5) provides unmatched scalability, port flexibility, and performance. It also delivers innovations that address many of today's networking concerns and provides capabilities that help you prepare for the future. This switch can support up to 64 10 Gb Ethernet connections while offering Layer 2/3 switching, in addition to OpenFlow and "easy connect" modes.
The switch is designed to install within the I/O module bays of the Flex System Enterprise Chassis.

Figure 5: Flex System Fabric EN4093R 10 Gb Scalable Switch

For more information, see this website: ibm.com/redbooks/abstracts/tips0864.html

1.1.5 Flex System FC3171 8Gb SAN Switch

The Flex System FC3171 8Gb SAN Switch (as shown in Figure 6) is a full-fabric Fibre Channel component with expanded functionality that is used in the Lenovo Flex System Enterprise Chassis. The SAN switch supports high-speed traffic processing for Flex System configurations, offers scalability in external SAN size and complexity, and provides enhanced systems management capabilities. The FC3171 switch provides 14 internal 8 Gb Fibre Channel ports and six external ports, and supports 2 Gb, 4 Gb, and 8 Gb port speeds.

Figure 6: Flex System FC3171 8 Gb SAN Switch

For more information, see this website: ibm.com/redbooks/abstracts/tips0866.html

1.1.6 Flex System FC5022 16Gb SAN Switch

The Flex System FC5022 16 Gb SAN Scalable Switch (as shown in Figure 7) is a high-density, 48-port, 16 Gbps Fibre Channel switch that is used in the Flex System Enterprise Chassis. The switch provides 28 internal ports to compute nodes and 20 external SFP+ ports. The FC5022 offers end-to-end 16 Gb and 8 Gb Fibre Channel connectivity.

Figure 7: Flex System FC5022 16 Gb SAN Switch

For more information, see this website: ibm.com/redbooks/abstracts/tips0870.html

1.1.7 Lenovo RackSwitch G8264

Designed with top performance in mind, the Lenovo RackSwitch G8264 (as shown in Figure 8) is ideal for today's big data, cloud, and optimized workloads. The G8264 switch offers up to 64 10 Gb SFP+ ports in a 1U form factor and can accommodate future needs with four 40 Gb QSFP+ ports. It is an enterprise-class, full-featured data center switch that delivers line-rate, high-bandwidth switching, filtering, and traffic queuing without delaying data. Large data center grade buffers keep traffic moving. Redundant power and fans and numerous high availability features equip the switch for business-sensitive traffic.

Figure 8: Lenovo RackSwitch G8264

The G8264 switch is ideal for latency-sensitive applications, such as client virtualization. It supports Virtual Fabric to help clients reduce the number of I/O adapters to a single dual-port 10 Gb adapter, which helps reduce cost and complexity. The G8264 switch supports the newest protocols, including Data Center Bridging/Converged Enhanced Ethernet (DCB/CEE) for support of FCoE, in addition to iSCSI and NAS.

For more information, see this website: ibm.com/redbooks/abstracts/tips0815.html

1.1.8 IBM System Storage SAN24B-5

The IBM System Storage SAN24B-5 SAN switch (as shown in Figure 9) is designed to meet the demands of hyper-scalable, private cloud storage environments by delivering 16 Gbps Fibre Channel technology and capabilities that support highly virtualized environments. These switches support autosensing of 2 Gb, 4 Gb, 8 Gb, or 16 Gb port speeds. The SAN24B-5 supports up to 24 ports in a 1U package. A 48-port version (SAN48B-5) is also available.
Figure 9: IBM System Storage SAN24B-5

For more information, see this website: ibm.com/systems/networking/switches/san/b-type/san24b-5

1.1.9 IBM FlashSystem 840

The IBM FlashSystem 840 (as shown in Figure 10) is an all-flash storage system that makes applications and data centers faster and more efficient by providing over 1 million input/output operations per second (IOPS). The FlashSystem 840 storage system has an industry-leading latency of approximately 100 microseconds. This low latency is especially useful for client virtualization, which requires high IOPS and low-latency access to large amounts of data. For enterprise-level availability, the IBM FlashSystem 840 system uses two-dimensional flash RAID with patented IBM Variable Stripe RAID™ technology that maintains system performance and capacity if there are partial or full flash chip failures, which helps reduce downtime and forestall system repairs. It is also extremely compact, with up to 40 TB of usable flash storage in a 2U package with hot-swappable power supplies, backup batteries, and controllers. The IBM FlashSystem 840 supports all industry standard interfaces, including 4 Gb, 8 Gb, or 16 Gb Fibre Channel, 40 Gb InfiniBand®, 10 Gb iSCSI, and 10 Gb FCoE.

Figure 10: IBM FlashSystem 840

For more information, see this website: ibm.com/systems/storage/flash/840/

1.2 Software components

This section describes the software components that are used for the 5000 persistent user scale out performance test: the VMware ESXi hypervisor, Citrix XenDesktop, and Atlantis Computing software.

1.2.1 VMware ESXi hypervisor

VMware ESXi™ is a bare-metal hypervisor. ESXi partitions a physical server into multiple virtual machines. The compute, memory, and networking resources on the server are all virtualized. One advantage is that ESXi can be booted from a small USB key.

For more information, see this website: vmware.com/products/esxi-and-esx/

1.2.2 Citrix XenDesktop

Citrix XenDesktop is an industry-leading connection broker for virtual applications and virtual desktops. It provides a range of services for provisioning, managing, and connecting users to Microsoft Windows virtual machines.

For more information, see this website: citrix.com/products/xendesktop/

1.2.3 Atlantis Computing

Atlantis Computing provides a software-defined storage solution that can deliver better performance than a physical PC and reduce storage requirements by up to 95% in virtual desktop environments of all types. The key is Atlantis HyperDup content-aware data services, which fundamentally change the way VMs use storage. This change reduces the storage footprint by up to 95% while minimizing (and in some cases, entirely eliminating) I/O to external storage. The net effect is reduced CAPEX and a marked increase in performance when starting, logging in to, starting applications on, searching, and otherwise using virtual desktops or hosted desktops and applications.

Atlantis software uses random access memory (RAM) for write-back caching of data blocks, real-time inline de-duplication of data, coalescing of blocks, and compression, which significantly reduces the data that is cached and persistently stored, in addition to greatly reducing network traffic. Atlantis software works with any type of heterogeneous storage, including server RAM, direct-attached storage (DAS), SAN, or network-attached storage (NAS).
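To make the de-duplication idea concrete, the following PowerShell toy (not Atlantis's actual implementation, which is proprietary and operates inline on the write path) hashes a file in 4 KB blocks and reports how many of the blocks are unique; in an inline de-duplicating store, a write whose block digest is already known does not need to be sent to backing storage again:

```powershell
# Toy illustration of content-aware block de-duplication (4 KB blocks).
# Hash each block; duplicate digests collapse to a single stored copy.
param([string]$Path = "$env:windir\explorer.exe")   # any sample file

$blockSize = 4KB
$sha   = [System.Security.Cryptography.SHA1]::Create()
$seen  = @{}        # digests of blocks that are already "stored"
$total = 0

$stream = [System.IO.File]::OpenRead($Path)
try {
    $buffer = New-Object byte[] $blockSize
    while (($read = $stream.Read($buffer, 0, $blockSize)) -gt 0) {
        $digest = [System.BitConverter]::ToString($sha.ComputeHash($buffer, 0, $read))
        $seen[$digest] = $true
        $total++
    }
} finally {
    $stream.Close()
}

"{0} blocks read, {1} unique blocks stored ({2:P0} reduction)" -f `
    $total, $seen.Count, (1 - $seen.Count / $total)
```

Running this against a single file usually shows only a modest reduction; the large reductions that are quoted in this paper come from de-duplicating across thousands of nearly identical desktop images.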
Atlantis software is provided as a VMware ESXi-compatible VM that presents the virtualized storage to the hypervisor as a native data store, which makes deployment and integration straightforward. Atlantis Computing also provides other utilities for managing VMs and backing up and recovering data stores.

For this scale out test, the Atlantis ILIO Persistent VDI product was used in disk-backed mode. This mode provides the optimal solution for desktop virtualization customers that are using traditional or existing storage technologies that are optimized by Atlantis software with server RAM. In this scenario, Atlantis employs memory as a tier and uses a small amount of server RAM for all I/O processing while using the existing SAN, NAS, or all-flash array storage as the primary storage. Atlantis storage optimizations increase the number of desktops that the storage can support by up to 20 times while improving performance. Disk-backed configurations can use various storage types, including host-based flash memory cards, external all-flash arrays, and conventional spinning disk arrays.

For more information, see this website: atlantiscomputing.com/products/

2 Performance test methodology and tools

This section describes the test methodology that was used for the scale out performance test and the tools that were used to run and monitor the test.

2.1 Test methodology

Login VSI, an industry-standard client virtualization test tool, is used to provide a simulated load of up to 5000 users. Normally, Login VSI is used to benchmark compute servers to find the maximum number of users that can be supported in a specific configuration, at which point the CPU load is at 100%. For the scale out performance test, the idea is instead to avoid overloading any part of the system and to ensure that all of the components run at less than 100% utilization. This condition mirrors what is required for customer deployments.

Login VSI supports two launcher modes: serial and parallel. Serial mode is normally used to test the maximum workload for a specific server. For the scale out performance testing, Login VSI was used in parallel mode so that the login interval could be substantially reduced from the default of one every 30 seconds and the simulated load evenly distributed across the Login VSI launchers and compute servers. The user login interval was varied to achieve the best result given the available servers; in many cases, one logon every two seconds was used. At that rate, 5000 users log on over a period of 10,000 seconds (approximately 2.75 hours), and the total test time (including the standard 30-minute Login VSI idle period and the logoffs) is about 3.5 hours. All user VMs were pre-booted before the test so that they were idle and ready to receive users.

The Login VSI medium workload was chosen to represent typical customer workloads. The more intensive heavy workload simply requires more servers to support the extra CPU load.

During the scale out performance test, different performance monitors were used to ensure that no single component was overloaded. The esxtop tool was used for the compute servers, and storage monitoring tools were used for the IBM FlashSystem shared storage. The results from these tools are described in section 5.

After each test run, the user VMs and Login VSI launcher VMs are rebooted and everything is reset and ready for the next run a few hours later. Two or three runs were often done for each test variation.
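As a quick sanity check on the schedule arithmetic above, the following PowerShell snippet recomputes the run length from the logon interval (the 15-minute logoff allowance is an assumption for illustration; the paper states only the approximate 3.5-hour total):

```powershell
# Back-of-envelope timing for the parallel-mode logon schedule.
$users       = 5000
$intervalSec = 2      # one logon every two seconds
$idleMin     = 30     # standard Login VSI idle (steady state) period

$logonHours = ($users * $intervalSec) / 3600          # 10,000 s, about 2.78 hours
$totalHours = $logonHours + ($idleMin / 60) + 0.25    # assumed ~15 min for logoffs

"Logon phase: {0:N2} h; total run: about {1:N1} h" -f $logonHours, $totalHours
```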
2.2 Login VSI

Login VSI is a vendor-independent benchmarking tool that is used to objectively test and measure the performance and scalability of server-based Windows desktop environments (client virtualization). Leading IT analysts recognize and recommend Login VSI as an industry-standard benchmarking tool for client virtualization; it can be used by end-user organizations, system integrators, hosting providers, and testing companies. Login VSI can be used for the following purposes:
• Benchmarking: Make the correct decisions about different infrastructure options based on tests.
• Load testing: Gain insight into the maximum capacity of your current (or future) hardware environment.
• Capacity planning: Decide exactly what infrastructure is needed to offer users an optimally performing desktop.
• Change impact analysis: Test and predict the performance effect of every intended modification before its implementation.

Login VSI measures the capacities of virtualized infrastructures by simulating typical (and atypical) user workloads and application usage. For example, the Login VSI medium workload simulates a medium-level knowledge worker that uses Microsoft Office, Internet Explorer, and PDFs. The medium workload is scripted in a 12- to 14-minute loop that runs after a simulated Login VSI user is logged on, and each test loop performs the following operations:
• Microsoft Outlook: Browse 10 messages.
• Internet Explorer: One instance is left open (BBC.co.uk); one instance browses to Wired.com and Lonelyplanet.com.
• Flash application: gettheglass.com (not used with the MediumNoFlash workload).
• Microsoft Word: One instance to measure response time; one instance to review and edit a document.
• Bullzip PDF Printer and Acrobat Reader: The Word document is printed to PDF and reviewed.
• Microsoft Excel: A large randomized sheet is opened.
• Microsoft PowerPoint: A presentation is reviewed and edited.
• 7-zip: By using the command line version, the output of the session is zipped.

After the loop finishes, it restarts automatically. Each loop takes approximately 14 minutes to run. Within each loop, the response times of specific operations are measured at a regular interval: six times within each loop. The response times of seven specific operations are used to establish the VSImax score. VSImax is the maximum capacity of the tested system, expressed as a number of Login VSI sessions.

For more information, see this website: loginvsi.com/

2.3 VMware esxtop

IOPS distribution and latency are the two most important metrics to consider in the analysis of a storage system. The VMware tool esxtop was used to capture this information from the ESXi hypervisor. Figure 11 shows the command that was used to pipe the esxtop data to a file.

Figure 11: esxtop command line and usage

For more information, see this website: http://pubs.vmware.com/vsphere-55/index.jsp#com.vmware.vsphere.monitoring.doc/GUID-D89E8267-C74A-496F-B58E-19672CAB5A53.html

For more information about interpreting esxtop statistics, see this website: http://communities.vmware.com/docs/DOC-9279

2.4 Superputty

Superputty is a Windows GUI application that allows multiple PuTTY SSH clients to be opened, one per tab. In particular, this tool was used to control multiple SSH sessions simultaneously and start tools (such as esxtop) in each session at the same time.

For more information, see this website: https://code.google.com/p/superputty/
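The esxtop capture and the multi-session SSH control come together naturally in a script. The following sketch is illustrative only: the host names, user, and output datastore path are hypothetical, and the exact esxtop options that were used in the test are the ones shown in Figure 11. It uses PuTTY's command-line client plink.exe (the same client that Superputty wraps) to start a batch-mode capture on every host at once:

```powershell
# Launch a batch-mode esxtop capture on every ESXi host at the same time
# by using PuTTY's command-line client plink.exe.
# esxtop flags: -b = batch (CSV) mode, -d = sampling delay in seconds,
# -n = number of samples; 15 s x 960 samples covers a run of about 4 hours.
$esxHosts = 'esx01', 'esx02', 'esx03'      # hypothetical host names
$user     = 'root'                         # plink prompts for the password

foreach ($h in $esxHosts) {
    $remoteCmd = "esxtop -b -d 15 -n 960 > /vmfs/volumes/results/esxtop_$h.csv"
    Start-Process -FilePath 'plink.exe' -ArgumentList '-ssh', "$user@$h", $remoteCmd
}
```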
2.5 IBM FlashSystem performance monitor

As with other IBM storage platforms, IBM FlashSystem features an integrated web-based GUI that can be used for management and performance analysis, in addition to supporting data collection by external tools. The procedure that is described at the following website was used to export performance metrics into CSV format so that they could be easily reviewed for this study: http://ibm.com/support/docview.wss?uid=tss1td106293&aid=1

3 Performance test hardware configuration

The hardware configuration for the 5000 persistent user scale out performance test consists of two major parts: the "system under test" that runs the 5000 persistent desktop VMs, and the "load framework" that provides the simulated load of 5000 users by using Login VSI. Figure 12 shows an overview of the hardware configuration for the 5000 persistent user performance test.

[Figure 12 shows the G8264 switch and SAN24B-5 switch joining the user, management, storage, and SAN networks; the system under test, which consists of the compute servers (for users and management) and the IBM FlashSystem 840 storage for the 5000 persistent user VMs; and the load framework, which consists of the Active Directory, DHCP, and DNS server, the launcher servers, and NAS storage for results, logs, and the management and launcher VM images.]

Figure 12: Overview of hardware configuration for performance test

3.1 System under test

The system under test consists of 35 compute servers that run the 5000 user VMs and two management servers that run the management VMs. All servers boot ESXi 5.5 from a USB key. The 35 compute servers are various Lenovo Flex System x240 and x222 compute nodes, as listed in Table 1.

Table 1: Compute nodes used in system under test

| Server  | Processor                                  | Memory                          | Count |
|---------|--------------------------------------------|---------------------------------|-------|
| x222    | 2 x E5-2470 (Sandy Bridge EN) in each half | 192 GB each half (384 GB total) | 5 x 2 |
| x240    | 2 x E5-2670 (Sandy Bridge EP)              | 256 GB                          | 18    |
| x240    | 2 x E5-2690 (Sandy Bridge EP)              | 256 GB                          | 5     |
| x240 v2 | 2 x E5-2690v2 (Ivy Bridge EP)              | 384 GB                          | 2     |

Each x240 compute node has a two-port 10 GbE LAN on motherboard (LOM) adapter and a two-port 8 Gb Fibre Channel (FC) adapter (FC3172). Each x222 compute node has a two-port LOM for each half and a shared four-port 16 Gb Fibre Channel adapter (FC5024D).

The 35 compute nodes are placed in three Lenovo Flex chassis, which use a total of 30U in a rack. Each Flex chassis is configured with an EN4093R 10 GbE switch that is connected to a Lenovo G8264 64-port TOR 10 GbE Ethernet switch. Each chassis is connected by using a 40 GbE cable for best performance. An extra EN4093R switch in each chassis and a second G8264 TOR switch can be used for redundancy.

Each Flex chassis also contains an FC3171 or FC5022 FC switch that is configured in pass-thru mode. The chassis switches are connected with four LC-LC fibre cables to an IBM SAN24B-5 TOR SAN switch. An extra FC switch in each chassis and a second SAN24B-5 TOR switch can be used for redundancy. All zoning for the compute nodes and the IBM FlashSystem 840 storage is centralized in the SAN24B-5 switch.

The IBM FlashSystem 840 storage system was configured with a full complement of 12 x 4 TB flash cards for a total of 40 TB of redundant storage (usable capacity after two-dimensional RAID protection). The 5000 persistent virtual desktops used less than 5 TB of FlashSystem capacity after Atlantis Computing data reduction.
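A back-of-envelope check puts this data reduction in context. Assuming that each desktop's 24 GB virtual disk (the size configured in section 4.2) were fully provisioned, the logical capacity would be roughly 117 TB, so fitting the desktops into less than 5 TB implies an effective reduction of about 96%:

```powershell
# Rough logical-versus-physical capacity check for the 5000 persistent desktops.
$desktops   = 5000
$diskGB     = 24       # per-VM virtual disk size (see section 4.2)
$physicalTB = 5        # measured FlashSystem usage after Atlantis data reduction

$logicalTB = ($desktops * $diskGB) / 1024        # about 117 TB logical
$reduction = 1 - ($physicalTB / $logicalTB)      # about 96% effective reduction

"Logical: {0:N0} TB; physical: {1} TB; reduction: {2:P0}" -f $logicalTB, $physicalTB, $reduction
```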
The FlashSystem 840 is connected to the SAN24B-5 switch by using four LC-LC fibre cables, two to each storage controller for redundancy. Another four fibre cables can be used to connect to a second SAN switch for further failover protection. Even with redundancy, there are enough ports on the IBM FlashSystem 840 for a direct FC connection from the Flex chassis FC switches; pass-thru mode to a TOR SAN switch was used to show how a larger SAN network is built.

All of the management VMs that are required by Citrix XenDesktop and Atlantis ILIO Center are split across two x240 compute nodes. The configuration and number of these VMs are listed in Table 2.

Table 2: Characteristics of management VMs

| Management VM           | Virtual processors | System memory | Storage | Operating system | Count                 |
|-------------------------|--------------------|---------------|---------|------------------|-----------------------|
| AD, DNS, and DHCP       | 2                  | 4 GB          | 50 GB   | 2008 R2 SP1      | 1 (+1 for redundancy) |
| Web Interface           | 2                  | 4 GB          | 70 GB   | 2008 R2 SP1      | 2 (1 per 2500 VMs)    |
| Delivery Controller     | 4                  | 16 GB         | 70 GB   | 2008 R2 SP1      | 4 (1 per 1250 VMs)    |
| Citrix licensing server | 2                  | 4 GB          | 20 GB   | 2008 R2 SP1      | 1                     |
| XenDesktop SQL server   | 2                  | 4 GB          | 150 GB  | 2008 R2 SP1      | 1                     |
| vCenter server          | 10                 | 32 GB         | 100 GB  | 2008 R2 SP1      | 1                     |
| vCenter SQL server      | 4                  | 4 GB          | 150 GB  | 2008 R2 SP1      | 1                     |
| Atlantis ILIO Center    | 2                  | 4 GB          | 20 GB   | Linux            | 1                     |

The VM for the Active Directory, DNS, and DHCP services is shared by the servers in the system under test and the load framework, and a second instance is used for redundancy. Windows Server 2012 R2 can be used instead of Windows 2008 R2 SP1. For production purposes, the AD server and the SQL servers should be replicated to provide fault tolerance. Four VMs for the XenDesktop Delivery Controller are used to provide adequate performance under load.

Figure 13 shows the compute servers, shared storage, and networking hardware for the system under test.

[Figure 13 depicts the rack layout for the system under test: an IBM G8264 64-port 10 GbE TOR switch; an IBM SAN24B-5 FC SAN TOR switch; the IBM FlashSystem 840 (40 TB) for storage of the persistent VMs; one IBM Flex System Enterprise Chassis with 5 x222 twin nodes (compute) and 4 x240 nodes (compute); and two IBM Flex System Enterprise Chassis with 26 x240 nodes (compute) and 2 x240 nodes (management).]

Figure 13: Hardware configuration for System under Test

3.2 Load framework

The load framework uses Login VSI 3.7 to simulate a user load of up to 5000 users with the medium workload. The load framework consists of 29 compute servers and one management server, all Lenovo System x3550 rack servers with the VMware ESXi 5.5 hypervisor, plus NAS shared storage for the Login VSI launcher VMs and performance data.
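The launcher-side sizing follows directly from the session count. As a quick check (the 288 launcher VMs are created in section 4.4):

```powershell
# Launcher-side sizing arithmetic for the load framework.
$sessions  = 5000
$launchers = 288      # launcher VMs created in section 4.4
$servers   = 29       # x3550 launcher compute servers

"Sessions per launcher VM: about {0}" -f [math]::Round($sessions / $launchers)   # ~17
"Launcher VMs per server: about {0}" -f [math]::Round($launchers / $servers)     # ~10, within the 8 - 12 guideline below
```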
The compute servers for the load framework must have adequate performance to support the required load of 8 - 12 Login VSI launcher VMs. These compute servers typically have two Westmere EP or better processors and 96 GB or more of memory. Each launcher compute server boots ESXi 5.5 from a USB key and has a two-port 10 GbE adapter that is connected to the same G8264 10 GbE TOR switch that is used by the system under test. There is no need for an FC connection to the IBM FlashSystem storage, although there is nothing preventing centralization of the storage on FlashSystem. Instead, all of the data for the load framework is stored on NAS shared storage, which is connected to the same G8264 10 GbE switch.

The management server for the load framework supports several VMs. The main VM is used to run the Login VSI Launcher and Analyzer tools. In addition, a separate Citrix XenDesktop configuration is used to provision the multiple launcher VMs by using XenDesktop Machine Creation Services (MCS). There are other ways this could have been done, but it was easy to use MCS in dedicated mode to create the launcher VMs.

Figure 14 shows the compute servers and storage hardware for the load framework.

[Figure 14 depicts the rack layout for the load framework: 29 IBM System x3550 nodes (launchers), 1 IBM System x3550 node (management), and an IBM System Storage N6240 populated with 2 TB drives (NAS storage for the launcher VMs).]

Figure 14: Hardware configuration for Load Framework

4 Software configuration and setup

The following software configuration and setup tasks must be done before any performance tests are run:
• Management VMs
• Master user VM
• Master launcher VM
• Login VSI
• Atlantis Computing software
• 5000 persistent desktop VMs

4.1 Setting up Management VMs

The configuration and setup of the management VMs that are required for Citrix XenDesktop should follow the normal procedures as documented by Microsoft and Citrix. The following special considerations apply:
• The Active Directory, DNS, and DHCP server is shared between all compute servers on the network (from the system under test and the load framework).
• There are four XenDesktop Delivery Controllers.
• The mapping between user IDs and the names of the persistent desktop VMs is statically specified to the connection broker rather than being randomly assigned the first time it is needed. This specification makes it easier to remedy any VM setup problems before the first performance test.
If the mapping is not statically specified, the assignment of user IDs to VMs must be rerun until it completes successfully for all 5000 users.

4.2 Setting up master user VM

Windows 7 Professional with SP1 is used as the basis for the master user VM (master image) for the scale out performance testing. The master image was created by completing the following steps:
1. Create a Windows 7 Professional 64-bit with SP1 VM. The following VM parameters should be specified: 1 vCPU, 1024 MB vRAM, and 24 GB disk.
2. Configure Windows 7, networking, and other OS features.
3. Install VMware Tools for access by vCenter and reboot.
4. Join the VM to the Active Directory domain and reboot.
5. Disable all Internet Explorer plug-ins.
6. Ensure that the firewalls are turned off.
7. Enable remote desktop for remote access to the desktop.
8. Install the Windows applications that are needed for the Login VSI medium workload, including Microsoft Office, Adobe Acrobat, and so on.
9. Apply the Citrix recommended optimizations. For more information, see this website: support.citrix.com/article/CTX125874
10. Install the Citrix XenDesktop Virtual Desktop Agent (VDA). This step is not needed for the brokerless RDP test scenario.
11. Add registry entries that point to the FQDNs of the four XenDesktop Delivery Controller VMs (a sketch follows this list). For more information, see this website: support.citrix.com/article/CTX137993. This step is not needed for the brokerless RDP test. The Citrix desktop service randomly selects a controller from the list (grouped or ungrouped) until a successful connection to a controller is established.
12. Shut down the VM and take a snapshot.
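As a hedged sketch of step 11 (the controller FQDNs below are placeholders; CTX137993 describes the supported registry values), the ListOfDDCs value under the VDA registry key can be set as follows:

```powershell
# Point the Virtual Desktop Agent at the four Delivery Controllers (step 11).
# The FQDNs are hypothetical placeholders for the real controllers.
$key  = 'HKLM:\SOFTWARE\Citrix\VirtualDesktopAgent'
$ddcs = 'ddc1.example.com ddc2.example.com ddc3.example.com ddc4.example.com'

New-Item -Path $key -Force | Out-Null                        # create the key if it is missing
Set-ItemProperty -Path $key -Name 'ListOfDDCs' -Value $ddcs  # space-separated controller list
```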
4.3 Setting up master launcher VM

Setting up the master launcher VM for Login VSI is similar to setting up the master user VM, except that the Citrix Receiver should be installed. The Citrix Receiver is not needed for the brokerless RDP test scenario. To save time, an autologon script is added so that the launcher VMs are automatically logged on after being started.

4.4 Setting up Login VSI

Login VSI 3.7 was used to simulate the load of 5000 users for the scale out testing. The process starts by installing Login VSI according to the installation instructions that are available at this website: loginvsi.com/documentation/index.php?title=Installation

A separate management VM is used to run the Login VSI performance tests and analyze the results. As noted earlier, a Citrix MCS environment is used to create the launcher VMs. First, add all of the physical launcher machines to VMware vCenter and to Citrix XenDesktop. Then, by using the master launcher image as a template, the 288 launcher VMs are created by using MCS dedicated mode. The number of launcher VMs per physical server depends on its performance; however, 8 - 12 launcher VMs works well. The Login VSI tool is started to ensure that all of the launchers were created properly and are ready to use. Finally, a script is used to add the 5000 unique user IDs to AD. The password for all of these users is the same for simplicity.

For the brokerless RDP test scenario, the following slightly different steps are used for running Login VSI:
• Ensure that the Login VSI RDP group has access to the master image.
• Use vCenter to copy and paste the IP addresses of the user VMs that are performing the Login VSI test into a CSV file (named %csv_target% in the commandline example below).
• In the Login VSI configuration, replace the commandline with the following:

  C:\Program Files\Login Consultants\VSI\Launcher\RDPConnect.exe %csv_target% <AD domain>\<login vsi user> P@ssword1

4.5 Setting up Atlantis Computing software

Install the Atlantis ILIO Persistent VDI product by following the standard installation procedure. The Atlantis ILIO Center VM can be run on one of the servers that is designated for management VMs.

It is a recommended Atlantis best practice that each ILIO VM has its own logical unit number (LUN) on shared storage. Therefore, 35 volumes (each with 300 GB capacity) were created on the IBM FlashSystem storage. This capacity totals about 10 TB. Each ILIO VM and its datastore of user VMs requires less than 150 GB, for a total of only about 5 TB on the shared storage. However, in production, the de-duplication savings for persistent desktops are more likely to be 80% - 90% instead of the 98% that was achieved by this performance test.

By using vCenter, each physical server is given access to all 35 of the volumes, even though only one is actually used per physical server. The ILIO master VM and the master user VM are then placed in one of those volumes. Scripts that are available from Atlantis Computing are used to clone the ILIO VM and the master user VM across all 35 compute servers in preparation for the next step.

4.6 Setting up 5000 persistent desktop VMs

Each of the 35 compute servers supports persistent desktop VMs. The number of VMs depends on the processor capability of the server. Table 3 lists the number of VMs per compute server and the VM totals.

Table 3: Number of VMs per compute server

| Server  | Processor                                  | Count | VMs | Total VMs |
|---------|--------------------------------------------|-------|-----|-----------|
| x222    | 2 x E5-2470 (Sandy Bridge EN) in each half | 5 x 2 | 100 | 1000      |
| x240    | 2 x E5-2670 (Sandy Bridge EP)              | 18    | 160 | 2880      |
| x240    | 2 x E5-2690 (Sandy Bridge EP)              | 5     | 160 | 800       |
| x240 v2 | 2 x E5-2690v2 (Ivy Bridge EP)              | 2     | 160 | 320       |
| Total   |                                            |       |     | 5000      |

A command line script from Atlantis and a CSV file are used to fast clone the master VM on each compute server to create the required number of VMs on each of the servers. A naming scheme of the server name and VM number is used to create a set of 5000 uniquely named VMs. The cloning process can take half a day to complete, but needs to be done only once for each different master VM image.

Each VM is started to register with Active Directory and to automatically assign the machine name to the VM. This process can be done as a separate step or as part of the fast cloning process described above. The VMs are then shut down by using vCenter.

The dedicated machine catalog is created for Citrix XenDesktop, and a 5000-line CSV file is used to automatically insert all of the named VMs into the machine catalog. A desktop group is created and the 5000 VMs are added to it. XenDesktop automatically starts each VM and ensures that it is accessible from XenDesktop. Sometimes it is necessary to do some manual steps to get all of the VMs into the correct state.

The last step is to perform a standard Login VSI profile run to automatically create the user profile in each persistent desktop. Because of the static assignment of names, any failures can be corrected manually or by rerunning Login VSI. After a final restart of the guest operating systems, the 5000 persistent desktops are ready for a performance test.
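To make the provisioning inputs from sections 4.4 and 4.6 concrete, the following sketch generates a 5000-row CSV that maps uniquely named desktop VMs (server name plus VM number) to statically assigned user IDs, and creates the matching AD accounts with a single shared password. The name prefixes, CSV path, and VM-count ordering are illustrative assumptions, not the actual scripts that were used:

```powershell
# Generate the VM-name/user-ID mapping CSV and create the 5000 AD test users.
Import-Module ActiveDirectory

$n = 0
$rows = foreach ($srv in 1..35) {
    # Assumed ordering: the first 10 "servers" are the x222 halves (100 VMs each);
    # the remaining 25 are x240 nodes (160 VMs each). 10*100 + 25*160 = 5000.
    $perServer = if ($srv -le 10) { 100 } else { 160 }
    foreach ($vm in 1..$perServer) {
        $n++
        [pscustomobject]@{
            VMName = 'srv{0:D2}-vm{1:D3}' -f $srv, $vm   # server name + VM number
            UserID = 'vsiuser{0:D4}' -f $n               # statically mapped user ID
        }
    }
}
$rows | Export-Csv -Path 'C:\vsi\desktops.csv' -NoTypeInformation

# One shared password for all test users, for simplicity (see section 4.4).
$pw = ConvertTo-SecureString 'P@ssword1' -AsPlainText -Force
foreach ($r in $rows) {
    New-ADUser -Name $r.UserID -AccountPassword $pw -Enabled $true
}
```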
5 Scale out performance results

To show the performance of 5000 persistent users, the following test scenarios were run:
• Brokerless with RDP-connected clients (no Citrix XenDesktop)
• Citrix XenDesktop 7.5 with HDX-connected clients

This section describes the results of the scale out tests by examining the performance of the Login VSI test, the compute servers, and the shared storage.

5.1 Brokerless by using RDP

In this test scenario, no connection broker is used and the launcher VMs are connected directly to the user VMs by using the RDP protocol. This test is used as a baseline and comparison for the other tests.

Figure 15 shows the output from Login VSI with a new logon every second. Out of 5000 started sessions, 4998 successfully reported back to Login VSI. The average response time is extremely good, with a VSI baseline of 860 milliseconds (ms). The graph is flat, with only a slight increase in the average between the first and last desktop. As measured by Login VSI, the longest time to log on for any session was 8 seconds.

Figure 15: Login VSI performance result for Brokerless by using RDP

Figure 16 shows the percentage CPU utilization by using representative curves for each of the four different servers that were used in the test. The utilization slowly climbs as more users log on and then sharply drops off after the steady state period ends and the users are logged off. The E5-2470 based server has the lowest utilization because only 100 VMs are started on those servers. The E5-2670 has the highest utilization (92%) because it is the slowest CPU of the three server types that run 160 VMs.

Figure 16: Esxtop CPU utilization for Brokerless by using RDP

Figure 17 shows the total number of server IOPS as reported by esxtop by using representative curves for each of the four different servers that were used in the test. The IOPS slowly climb as more users log on and then sharply drop off after the steady state period ends and the users are logged off. The E5-2470 based server has the lowest IOPS because only 100 VMs are started on those servers. The other three servers have similar curves because all have 160 VMs. The IOPS curves are spiky and show that the number of IOPS at any instant of time can vary considerably; the peaks are most likely because of logons.

Figure 17: Esxtop IOPS for Brokerless by using RDP

Figure 18 shows the total number of storage IOPS as measured by the IBM FlashSystem 840. The write IOPS curve shows the classic Login VSI pattern of a gradual build-up of IOPS (up to 12:56 a.m.), then a steady state period of 30 minutes (12:56 a.m. to 1:26 a.m.), and finally a peak for all of the logoffs at the end. The read IOPS are low because the Atlantis Computing software satisfies most of them out of its in-memory cache. The write IOPS are fairly low, peaking at less than 30,000 IOPS, which is fewer than 6 per persistent desktop. Atlantis Computing software is using its data services to compress, de-duplicate, and coalesce the write IOPS.

Figure 18: FlashSystem storage IOPS for Brokerless by using RDP

Figure 19 shows the server latency in milliseconds as reported by esxtop by using representative curves for each of the four different servers that are used in the test.
The average latency is 300 microseconds (µs) and is constant throughout the whole test. The latency for the servers with the E5-2470 CPU tends to peak higher, but is usually not more than 1 ms.

Figure 19: Esxtop latency for Brokerless by using RDP

Figure 20 shows the storage request latency in milliseconds as measured by the IBM FlashSystem 840. The curve shows that the average read latency is less than 200 µs and even drops to zero during the steady state phase because all of the read requests are satisfied by the Atlantis Computing cache. The write latency is also usually less than 200 µs, with occasional peaks that are still less than 1000 µs (1 millisecond), except during the restart of the 5000 virtual desktops.

Figure 20: FlashSystem storage latency for Brokerless by using RDP

5.2 Citrix XenDesktop

In this test scenario, the Citrix XenDesktop broker is used and the launcher VMs are connected to the XenDesktop Web Interface.

Figure 21 shows the output from Login VSI with a new logon every 2 seconds, which is twice the interval (half the logon rate) of the brokerless RDP scale out test. Out of 5000 started sessions, 4997 successfully reported back to Login VSI, which qualifies as a successful run. The average response time is good, with a VSI baseline of 1356 ms. The graphs for the minimum and average response times are flat, with only a slight increase in the average between the first and last desktop. The graph for the maximum response time increases steadily and shows only the worst case. As measured by Login VSI, the longest time to log on for any session was 16 seconds.

Figure 21: Login VSI performance result for Citrix XenDesktop

Figure 22 shows the percentage CPU utilization by using representative curves for each of the four different servers that were used in the test. The utilization slowly climbs as more users log on and then sharply drops off after the steady state period ends and the users are logged off. The E5-2470 based server has the lowest utilization because only 100 VMs are started on those servers. The E5-2670 and E5-2690 based servers have the highest utilization (95%) compared with the faster E5-2690v2; all three of these server types run 160 VMs.

Figure 22: Esxtop CPU utilization for Citrix XenDesktop

Figure 23 shows the total number of server IOPS as reported by esxtop by using representative curves for each of the four different servers that were used in the test. The IOPS slowly climb as more users log on and then sharply drop off after the steady state period ends and the users are logged off. The E5-2470 based server has the lowest IOPS because only 100 VMs are started on those servers. The other three servers have similar curves because all have 160 VMs. The IOPS curves are spiky, which shows that the number of IOPS at any instant of time can vary considerably; the peaks are most likely because of logons.

Figure 23: Esxtop IOPS for Citrix XenDesktop

Figure 24 shows the total number of storage IOPS as measured by the IBM FlashSystem 840. The write IOPS curve shows the classic Login VSI pattern of a gradual build-up of IOPS. The steady state period is less discernible in this graph and occurs around 9:15 p.m.
The read IOPS are low because the Atlantis Computing software satisfies most of them out of its in-memory cache. The number of read IOPS increases substantially at logoff. The write IOPS are quite low, peaking at less than 35,000 IOPS, which is less than 7 IOPS per persistent desktop. Again, Atlantis Computing software is using its data services to compress, de-duplicate, and coalesce the write IOPS.

Figure 24: FlashSystem storage IOPS for Citrix XenDesktop

Figure 25 shows two successive runs of the 5000 persistent desktop scale out test. Each run shows a similar pattern of IOPS, which culminates with the logoffs and then idles back to a low number. At 10:33 p.m., a reboot of all of the desktops was started; it completed 20 minutes later. In Figure 25, there are jumps in the IOPS log at 7:12 p.m. and 11:47 p.m., which are artifacts of the data collection process on the IBM FlashSystem 840.

Figure 25: Two Citrix XenDesktop runs with reboot in between

Figure 26 shows the server latency in milliseconds as reported by esxtop by using representative curves for each of the four different servers that were used in the test. The average latency is 350 microseconds (µs) and is fairly constant throughout the whole test. The latency for the servers with the E5-2470 CPU tends to peak higher, but is usually not more than 1 ms.

Figure 26: Esxtop latency for Citrix XenDesktop

Figure 27 shows the storage request latency in milliseconds as measured by the IBM FlashSystem 840. The curve shows that the average read latency is less than 200 µs. The write latency is also less than 250 µs, and most peaks are below 750 µs, with occasional peaks to 2.5 ms.

Figure 27: FlashSystem storage latency for Citrix XenDesktop

Resources

• Reference architecture for Lenovo Client Virtualization with Citrix XenDesktop: lenovopress.com/tips1278
• Atlantis Computing: atlantiscomputing.com/products
• IBM FlashSystem 840: ibm.com/storage/flash
• VMware vSphere: vmware.com/products/datacenter-virtualization/vsphere
• Citrix XenDesktop: citrix.com/products/xendesktop

Acknowledgements

Thank you to the teams at Atlantis Computing (Mike Carman, Bharath Nagaraj), IBM (Rawley Burbridge), and ITXen (Brad Wasson) for their tireless work in helping with the performance testing.

Trademarks and special notices

© Copyright Lenovo 2015.

References in this document to Lenovo products or services do not imply that Lenovo intends to make them available in every country.

Lenovo, the Lenovo logo, ThinkCentre, ThinkVision, ThinkVantage, ThinkPlus and Rescue and Recovery are trademarks of Lenovo. IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both. Intel, Intel Inside (logos), MMX, and Pentium are trademarks of Intel Corporation in the United States, other countries, or both. Other company, product, or service names may be trademarks or service marks of others.

Information is provided "AS IS" without warranty of any kind.
All customer examples described are presented as illustrations of how those customers have used Lenovo products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer.

Information concerning non-Lenovo products was obtained from a supplier of these products, published announcement material, or other publicly available sources and does not constitute an endorsement of such products by Lenovo. Sources for non-Lenovo list prices and performance numbers are taken from publicly available information, including vendor announcements and vendor worldwide homepages. Lenovo has not tested these products and cannot confirm the accuracy of performance, capability, or any other claims related to non-Lenovo products. Questions on the capability of non-Lenovo products should be addressed to the supplier of those products.

All statements regarding Lenovo future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only. Contact your local Lenovo office or Lenovo authorized reseller for the full text of the specific Statement of Direction.

Some information addresses anticipated future capabilities. Such information is not intended as a definitive statement of a commitment to specific levels of performance, function, or delivery schedules with respect to any future products. Such commitments are only made in Lenovo product announcements. The information is presented here to communicate Lenovo's current investment and development activities as a good faith effort to help with our customers' future planning.

Performance is based on measurements and projections using standard Lenovo benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput or performance improvements equivalent to the ratios stated here.

Photographs shown are of engineering prototypes. Changes may be incorporated in production models.

Any references in this information to non-Lenovo websites are provided for convenience only and do not in any manner serve as an endorsement of those websites. The materials at those websites are not part of the materials for this Lenovo product and use of those websites is at your own risk.