Thursday, May 18, 2017

VeeamON 2017 Announcements

VeeamON 2017 has turned out to be pretty good. There were lots of updates from Veeam and partners that should keep me busy for quite a while. Here is a summary of some of the announcements from VeeamON 2017.


Veeam Availability Suite v10 – Built to drive business continuity and visibility by leveraging the "Always on Cloud" Availability Platform to manage and protect the private and public cloud.
Here are a few details:
  •  Built-in Management for Veeam Agent for Linux and Veeam Agent for Microsoft Windows - Reduce management complexity and improve usability through the direct integration of agent management, enabling users to manage virtual and physical infrastructures through the Veeam Backup & Replication console.
  •  NAS Backup Support for SMB and NFS Shares - Maintain Availability through the expansion of Veeam’s protected data footprint with file-level backup support for Network Attached Storage (NAS) for SMB and NFS shares. Support includes scalable SMB/NFS backups, flexible data protection through short and long-term retention policies and out-of-place restore to easily restore SMB/NFS files to any target.
  • Scale-Out Backup Repository — Archive Tier - Save primary backup repository space and minimize costs while maintaining compliance with the addition of an archive tier for Scale-Out Backup Repository. The new archive tier will deliver storage-agnostic support by enabling the archiving of backups to any storage or media, deliver backup data management by automatically moving the oldest backup files from primary storage to archive extents and provide broad cloud object storage support.
  • Veeam CDP (Continuous Data Protection) - Preserve critical data during a disaster and ensure the Availability of tier-1 VMs with VMware-certified continuous data protection to reduce RPOs from minutes to seconds, as well as leverage VMware’s VAIO framework to reliably intercept and redirect VM I/O to the replica without a need to create standard VM snapshots.
  •  Additional enterprise scalability enhancements - To save businesses valuable time and improve overall scalability, v10 will additionally feature several enterprise scalability enhancements including role-based access control to establish self-service backup and restore functionality for VMware workloads based on vCenter Server permissions and Oracle RMAN integration, allowing users to seamlessly stream RMAN backups into Veeam repositories and easily perform UI-driven restores from backups using a Veeam console.
  • Primary Storage Integrations — Universal Storage Integration API - Extend Availability and improve recovery time and point objectives (RTPO) through primary storage integrations with leading storage providers through a universal plug-in framework enabling select partners to build integrations with Veeam Backup & Replication. New integrated storage solutions based on the API include IBM Spectrum Virtualize (IBM SAN Volume Controller (SVC) and the IBM Storwize family), Lenovo Storage V Series and INFINIDAT.
  • DRaaS Enhancements (for service providers) - Service providers can help tenants minimize costs and reduce recovery times during a disaster with new Disaster Recovery as a Service (DRaaS) enhancements including vCloud Director integration for Veeam Cloud Connect Replication and Tape as a Service (TaaS) for Veeam Cloud Connect Backups.


Veeam Management Pack (MP) for System Center v8 Update 4 - Veeam MP Update 4 extends your traditional on-premises System Center Operations Manager (SCOM) monitoring of VMware, Microsoft Hyper-V and Veeam Backup & Replication environments out to Microsoft Operations Management Suite (OMS) to allow management and monitoring anywhere, anytime. Veeam MP for System Center provides integration, monitoring, advanced reporting and detailed topology features for virtualized environments, including the virtual machines, the physical hosts and the associated network and storage fabric. Veeam MP allows these advanced features to be leveraged across multiple System Center components, including: System Center Operations Manager (SCOM), Orchestrator (SCORCH), Virtual Machine Manager (SCVMM) and Service Manager (SCSM).

Veeam Availability for AWS - (delivered through a Veeam and N2WS strategic partnership) is a cloud-native, agentless backup and Availability solution designed to protect and recover AWS applications and data. With this solution, companies can mitigate the risk of losing access to applications and they are ensured protection of their AWS data against accidental deletion, malicious activity and outages.

Veeam Agent for Microsoft Windows - Now available. Based on Veeam Endpoint Backup Free and includes two editions — workstation and server. With this solution, you get complete protection for both workstations and Windows physical servers, even those running in the cloud.

Veeam Powered Network (Veeam PN) - A free software-defined networking appliance deployed to your Microsoft Azure environment. The idea is to ease the process of network connectivity when moving workloads to the cloud. Once the appliance is deployed in Azure, another virtual appliance is deployed into your on-premises environment. From there you configure which networks should have access into Azure and import the automatically generated site configuration files at your remote site, providing a secured link between sites.

Veeam Backup for Office 365 v1.5 – This release adds better scalability with new proxies and repositories, speeding up backup times.

Veeam Availability Console – Accelerates migrating, managing, and protecting workloads in public clouds such as AWS and Azure, along with physical servers and endpoints, all from a single-pane-of-glass console. GA is expected in Q3; the release candidate was launched at VeeamON.

VMware vCloud Director Integration - VMware vCloud Director integration with Veeam Cloud Connect Replication to reduce daily management and maintenance costs, enabling multi-tenant configurations that use vCloud Director for Disaster Recovery-as-a-Service.

Tape-as-a-Service  - Tape-as-a-Service (TaaS) for Veeam Cloud Connect Backups. Allows for partners to deliver additional ‘tape out’ services to help customers meet compliance requirements for archival and long-term retention. Air-gapped backups protect against ransomware or insider threats, and it acts as an additional layer of disaster recovery protection.
Veeam Cloud and Service Provider Directory - A free online platform for Veeam customers and partners looking for a cloud or service provider in their area that offers services using Veeam products.

Nimble Secondary Flash Arrays – Nimble announced a Secondary Flash array that combines flash, deduplication, and predictive analytics. The result: a secondary storage array that lets you run real workloads. You get fast flash performance, high effective capacity, 99.9999% measured availability, and simplicity of deployment and use. Designed to simply and efficiently handle tasks like Veeam backups and disaster recovery, Nimble Storage Secondary Flash Arrays offer the flash-optimized performance to run development/test, QA, and analytics on your secondary data – plus production workloads when needed.
That's all for now. Once I go back through my notes I'm sure I will add a few updates!

Saturday, March 4, 2017

Datastore size and VMs per datastore: a look at how disk queue limits affect sizing.

As a consultant I get all kinds of questions, but two of the most commonly asked are: "What size do we need to make these datastores?" and "How many VMs can we put in these datastores?" I believe these two questions are related.
First question: what size do we need to make these datastores? The VMware Configuration Maximums doc for vSphere 6.5 lists a maximum volume size of 64TB, but just because you can doesn’t mean you should. Back in the days of vSphere 4.1 you were limited to a volume size of 2TB, so the choice wasn’t as hard, and most of the datastores I ran into were segregated by disk speed and RAID array under 2TB. Some of the high-performance RAID 10 on 15k disks was carved up into 250GB to 600GB LUNs, and the RAID 5 and 6 on 7.2k or 10k disks was anywhere from 500GB to 2TB. Need a fast disk in your VM? Carve up a chunk of the faster storage. Need slow disk? Use slow storage. Easy, right? Well, not any more. The world is full of storage vendors with huge caches, auto-tiering, dedupe, and all-around magic. So how big do we make them?
That leads us to the second question: how many VMs can we put in these datastores? Now that we can have huge datastores and storage choices are endless, the answer is a little more complicated. I still get a funny feeling in my tummy when I get asked this question. The reasons why are not common knowledge and start to delve into the deep dark corners of vSphere, sometimes turning people off from doing the research themselves. I’m going to try to make it a little easier. This is for existing environments.
One key is to know how queue depth works in VMware. Queue depth is the number of pending input/output (I/O) requests for a volume. For VMware, it is a limit on the number of requests that can be open on a storage port at any one time. It is a hardware-dependent setting on the HBA or iSCSI initiator (software or hardware) that caps the queue depth. It is what allows a vSphere host to have VMs that share disk resources, and it makes multiple VMDKs per LUN possible. If queue depth settings are set too high, the storage ports get congested, leading to poor VM disk performance. Conversely, if set too low, the storage ports, and that nice expensive SAN you bought, sit underutilized.
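To make the sharing idea concrete, here is a minimal back-of-the-napkin sketch of how a LUN's device queue gets divided among VMs. The function name and the per-VM outstanding I/O figure are my own illustrative assumptions, not VMware guidance.

```python
# Rough sketch: estimate how many VMs can share a LUN before the device
# queue depth (DQLEN) saturates. All numbers here are illustrative
# assumptions, not vendor recommendations.

def max_vms_per_lun(dqlen, avg_outstanding_io_per_vm):
    """Number of VMs whose combined outstanding I/O stays within DQLEN."""
    return dqlen // avg_outstanding_io_per_vm

# A common default DQLEN is 32; assume each VM keeps ~2 I/Os in flight.
print(max_vms_per_lun(32, 2))   # -> 16
```

In practice the per-VM number comes from measuring your own workloads (esxtop's ACTV column), not from a guess like this.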
I still didn’t answer the first or second question. Why? It depends. I took the easy way out, huh? I can, however, help you find the answer that works for you. I’m going to blow your mind; read on if you are ready to reach the next level of control over your storage environment.
I will break this down into things you will need to know before you start:
  • Know your environment! What HBAs are you using? What SAN are you using? What storage protocol? What storage vendors are you looking at if acquiring new storage?
  • Know your house and you will own it. Know your tools! esxtop, esxcli, PowerCLI, PowerShell, and your SAN interface will enable you to find the answers that you seek.
  • Know your resources! Google, forums, VMware / SAN support, and experienced consultants can guide you on your journey.

If you are good with that then we can get technical.
First you need to know how your environment is configured now, so it's esxtop time. Run esxtop and press d for the disk adapter view; the QUED column shows the number of commands currently queued. That’s the first puzzle piece.
Now you need to find the max queue depth for your storage adapter. Still in the disk adapter view, press f to toggle fields and enable the AQLEN column. AQLEN is the queue depth of the storage adapter.
Now we need to find the storage device. In esxtop, press u for the disk device view, then press f and look at the DQLEN column. DQLEN is the queue depth of the storage device; it is the maximum number of active VMkernel commands that the device is configured to support.
Now that you are armed with data, you can start making choices. Do you raise the queue limit or keep it where it is? How many VMs can this LUN support without hitting the queue limit? If you are buying new storage, what do the vendors support and what is best practice? What are the physical limits on the storage arrays you are using or plan on using? It is important to determine the queue depth limits of the storage array; all of the HBAs that access a storage port must be configured with this limit in mind, and you can use simple addition to check it. Time to answer both questions, right? Yup, the answer is still “It depends”. There are other factors like storage protocol, SAN fabric / SAN switches, and I/O needs, but now you can make an educated choice on how to size your environment with regard to the subject covered in this post.
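The "simple addition" check above can be sketched out: sum the device queue depths every host can drive at a storage port and compare it to the array's target port limit. The host count, LUN count, and port limit below are hypothetical; substitute your own measurements and your vendor's documented limit.

```python
# Sketch of the addition-based sanity check: the combined queue demand that
# hosts can place on an array storage port should not exceed that port's
# queue limit. All values below are illustrative assumptions.

def port_queue_demand(hosts, luns_per_host, dqlen):
    """Worst-case outstanding commands all hosts could push at one port."""
    return hosts * luns_per_host * dqlen

array_port_limit = 2048   # hypothetical target port queue limit; check your vendor docs
demand = port_queue_demand(hosts=8, luns_per_host=10, dqlen=32)
print(demand, demand <= array_port_limit)   # -> 2560 False (port oversubscribed)
```

When the check fails you either lower DQLEN, present fewer LUNs per port, or spread hosts across more storage ports.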
I generally see good performance and organizational benefits from using multiple 4TB datastores when the SAN has auto-tiering and can handle the I/O your environment requires. You can get the I/O required by working with a VMware partner and having them perform an evaluation using VMware Capacity Planner, or you can do the math yourself by adding up and trending the I/O load from your servers. For the VM count, I find that averaging around 20 standard-load VMs, like small web app servers, file servers, and read-only domain controllers, works well. I prefer to halve the count when using SQL, Exchange, or any other high-I/O server load. If your SAN doesn’t auto-tier well, or policy dictates that you use LUNs in standard RAID groups, then the old way of thinking applies; only now you are not limited to 2TB datastores. Either way, remember to take queue usage and limits into account.
If 4TB LUNs are overkill, then size down, move VMs over, and check all disk stats, not just disk queue. Ultimately every environment is different; I have just averaged my findings as a datacenter virtualization guy. You still have to put in the time to make the most of your environment.

One thing I didn't mention is hyperconverged architecture, which throws another wrench into the mix. Eventually I will get around to (mostly) answering that question as well.


Sources
VMware KB: Controlling LUN queue depth throttling in VMware

VMware KB: Changing the queue depth for QLogic, Emulex and Brocade HBAs

VMware KB: Checking the queue depth of the storage adapter and the storage device

Troubleshooting Storage Performance in vSphere – Storage Queues

VMware® vSphere 6.5 Configuration Maximums

The only method for knowing your true optimum Queue Depth for VMware




Thursday, January 19, 2017

vSphere 6.5 updates.

vSphere 6.5 is here and has had some time to bake in. Here are some of the improvements and changes VMware made to vSphere 6.5 concerning host and resource management:

(From VMware)

vSphere 6.5 brings a number of enhancements to ESXi host lifecycle management as well as some new capabilities to our venerable resource management features, DRS and HA.  There are also greatly enhanced developer and automation interfaces, which are a major focus in this release.  Last but not least, there are some notable improvements to vRealize Operations, since this product is bundled with certain editions of vSphere.  Let’s dig into each of these areas.

Enhanced vSphere Host Lifecycle Management Capabilities

With vSphere 6.5, administrators will find significantly easier and more powerful capabilities for patching, upgrading, and managing the configuration of VMware ESXi hosts.
VMware Update Manager (VUM) continues to be the preferred approach for keeping ESXi hosts up to date, and with vSphere 6.5 it has been fully integrated with the VCSA.  This eliminates the additional VM, operating system license, and database dependencies of the previous architecture, and now benefits from the resiliency of vCenter HA for redundancy.  VUM is enabled by default and ready to handle patching and upgrading tasks of all magnitudes in your datacenter.
Host Profiles has come a long way since the initial introduction way back in vSphere 4!  This release offers much in the way of both management of the profiles, as well as day-to-day operations.  For starters, an updated graphical editor that is part of the vSphere Web Client now has an easy-to-use search function in addition to a new ability to mark individual configuration elements as favorites for quick access.
Administrators now have the means to create a hierarchy of host profiles by taking advantage of the new ability to copy settings from one profile to one or many others.
Although Host Profiles provides a means of abstracting management away from individual hosts in favor of clusters, each host may still have distinct characteristics, such as a static IP address, that must be accommodated.  The process of setting these per-host values is known as host customization, and with this release it is now possible to manage these settings for groups of hosts via CSV file – undoubtedly appealing to customers with larger environments.
Compliance checks are more informative as well, with a detailed side-by-side comparison of values from a profile versus the actual values on a host.  And finally, the process of effecting configuration change is greatly enhanced in vSphere 6.5 thanks to DRS integration for scenarios that require maintenance mode, and speedy parallel remediation for changes that do not.
Auto Deploy – the boot-from-network deployment option for vSphere – is now easier to manage in vSphere 6.5 with the introduction of a full-featured graphical interface.  Administrators no longer need to use PowerCLI to create and manage deploy rules or custom ESXi images.
New and unassigned hosts that boot from Auto Deploy will now be collected under the Discovered Hosts tab as they wait patiently for instructions, and a new interactive workflow enables provisioning without ever creating a deploy rule.
Custom integrations and other special configuration tasks are now possible with the Script Bundle feature, enabling arbitrary scripts to be run on the ESXi hosts after they boot via Auto Deploy.
Scalability has been greatly improved over previous releases and it’s easy to design an architecture with optional reverse proxy caches for very large environments needing to optimize and reduce resource utilization on the VCSA.  And like VUM, Auto Deploy also benefits from native vCenter HA for quick failover in the event of an outage.
In addition to all of that, we are pleased to announce that Auto Deploy now supports UEFI hardware for those customers running the newest servers from VMware OEM partners.
It’s easy to see how vSphere 6.5 makes management of hosts easier for datacenters of all sizes!

Resource Management – HA, FT and DRS

vSphere continues to provide the best availability and resource management features for today’s most demanding applications. vSphere 6.5 continues to move the needle by adding major new features and improving existing features to make vSphere the most trusted virtual computing platform available.  Here is a glimpse of what you can expect to see in vSphere 6.5.

Proactive HA

Proactive HA will detect hardware conditions of a host and allow you to evacuate the VMs before the issue causes an outage.  Working in conjunction with participating hardware vendors, vCenter will plug into the hardware monitoring solution to receive the health status of the monitored components such as fans, memory, and power supplies.  vSphere can then be configured to respond according to the failure.
Once a component is labeled unhealthy by the hardware monitoring system, vSphere will classify the host as either moderately or severely degraded depending on which component failed. vSphere will place that affected host into a new state called Quarantine Mode.  In this mode, DRS will not use the host for placement decisions for new VMs unless a DRS rule could not otherwise be satisfied. Additionally, DRS will attempt to evacuate the host as long as it would not cause a performance issue. Proactive HA can also be configured to place degraded hosts into Maintenance Mode which will perform a standard virtual machine evacuation.

vSphere HA Orchestrated Restart

vSphere 6.5 now allows creating dependency chains using VM-to-VM rules.  These dependency rules are enforced when vSphere HA is used to restart VMs from failed hosts.  This is great for multi-tier applications that do not recover successfully unless they are restarted in a particular order.  A common example of this is a database, app, and web server.
In the example below, VM4 and VM5 restart at the same time because their dependency rules are satisfied. VM7 will wait for VM5 because there is a rule between VM5 and VM7. Explicit rules must be created that define the dependency chain. If that last rule were omitted, VM7 would restart with VM5 because the rule with VM6 is already satisfied.
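A dependency chain like the database/app/web example is effectively a topological ordering problem. Here is a minimal sketch using Python's standard library; the VM names and rules are illustrative, not how vSphere HA is implemented internally.

```python
# Toy model of orchestrated restart: VM-to-VM rules say which VMs must be
# up before a dependent VM restarts. Names and rules are illustrative.

from graphlib import TopologicalSorter

# rules: VM -> set of VMs it depends on (those must restart first)
rules = {"app": {"db"}, "web": {"app"}}

restart_order = list(TopologicalSorter(rules).static_order())
print(restart_order)   # -> ['db', 'app', 'web']
```

The point of explicit rules, as the example above shows, is that omitting one link in the chain changes the computed order.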
In addition to the VM dependency rules, vSphere 6.5 adds two additional restart priority levels named Highest and Lowest providing five total.  This provides even greater control when planning the recovery of virtual machines managed by vSphere HA.

Simplified vSphere HA Admission Control

Several improvements have been made to vSphere HA Admission Control.  Admission control is used to set aside a calculated amount of resources for use in the event of a host failure.  One of three different policies is used to enforce the amount of capacity that is set aside.  Starting with vSphere 6.5, this configuration just got simpler.  The first major change is that the administrator simply defines the number of host failures to tolerate (FTT).  Once the number of hosts is configured, vSphere HA will automatically calculate a percentage of resources to set aside by applying the “Percentage of Cluster Resources” admission control policy.  As hosts are added or removed from the cluster, the percentage will be automatically recalculated.  This is the new default configuration, but it is possible to override the automatic calculation or use another admission control policy.
Additionally, the vSphere Web Client will issue a warning if vSphere HA detects a host failure would cause a reduction in VM performance based on the actual resource consumption, not only based on the configured reservations.  The administrator is able to configure how much of a performance loss is tolerated before a warning is issued.
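The FTT-to-percentage calculation described above is simple enough to sketch. This is my own reading of the behavior, with illustrative numbers; the actual reservation vSphere HA computes may differ in detail.

```python
# Sketch of the simplified admission control math: reserving roughly
# FTT / host_count of cluster capacity, recalculated as hosts come and go.
# Illustrative only; not the exact vSphere HA implementation.

def reserved_percentage(ftt, host_count):
    """Percent of cluster resources set aside for failover capacity."""
    return round(ftt / host_count * 100)

print(reserved_percentage(1, 4))   # -> 25
print(reserved_percentage(1, 5))   # -> 20 (auto-recalculated after adding a host)
```

This is why the new default is appealing: adding a fifth host automatically shrinks the reservation instead of requiring the admin to re-do the percentage by hand.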

Fault Tolerance (FT)

vSphere 6.5 FT has more integration with DRS, which will help make better placement decisions by ranking the hosts based on available network bandwidth as well as recommending which datastore to place the secondary VMDK files on.
There has been a tremendous amount of effort to lower the network latency introduced by the new technology that powers vSphere FT. This will reduce the performance impact to certain types of applications that were sensitive to the additional latency first introduced with vSphere 6.0, and it opens the door to an even wider array of mission-critical applications.
FT networks can now be configured to use multiple NICs to increase the overall bandwidth available for FT logging traffic.  This is a similar configuration to Multi-NIC vMotion to provide additional channels of communication for environments that required more bandwidth than a single NIC can provide.

DRS Advanced Options

Three of the most common advanced options used in DRS clusters are now getting their own checkbox in the UI for simpler configuration.
  • VM Distribution: Enforce an even distribution of VMs. This will cause DRS to spread the count of the VMs evenly across the hosts.  This is to prevent too many eggs in one basket and minimizes the impact to the environment after encountering a host failure. If DRS detects a severe imbalance to the performance, it will correct the performance issue at the expense of the count being evenly distributed.
  • Memory Metric for Load Balancing: DRS uses active memory + 25% as its primary metric when calculating memory load on a host. The new consumed memory option will cause DRS to use the consumed memory metric rather than active memory.  This is beneficial when memory is not over-allocated.  As a side effect, the UI will show the hosts as more balanced.
  • CPU over-commitment: This option enforces a maximum vCPU:pCPU ratio in the cluster. Once the cluster reaches this defined value, no additional VMs will be allowed to power on.
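The CPU over-commitment option above boils down to a ratio gate at power-on time. A minimal sketch, with made-up counts and a hypothetical 4:1 limit:

```python
# Sketch of the CPU over-commitment check: deny power-on if the resulting
# cluster-wide vCPU:pCPU ratio would exceed the configured maximum.
# Counts and the 4:1 limit are illustrative assumptions.

def can_power_on(running_vcpus, new_vm_vcpus, physical_cores, max_ratio):
    """True if powering on the new VM keeps the ratio within the limit."""
    return (running_vcpus + new_vm_vcpus) / physical_cores <= max_ratio

# 32-core cluster, 4:1 limit, 124 vCPUs running; a new 8-vCPU VM is denied.
print(can_power_on(124, 8, 32, 4.0))   # -> False (132/32 = 4.125 > 4.0)
```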

Network-Aware DRS

DRS now considers network utilization, in addition to the 25+ metrics already used when making migration recommendations.  DRS observes the Tx and Rx rates of the connected physical uplinks and avoids placing VMs on hosts that are greater than 80% utilized. DRS will not reactively balance the hosts solely based on network utilization, rather, it will use network utilization as an additional check to determine whether the currently selected host is suitable for the VM. This additional input will improve DRS placement decisions, which results in better VM performance.
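The 80% uplink threshold described above acts as a filter on candidate hosts rather than a rebalancing trigger. A minimal sketch with invented hostnames and utilization figures:

```python
# Sketch of the network-aware placement filter: hosts whose connected
# uplinks are more than 80% utilized are excluded from placement
# candidates. Hostnames and utilization values are made up.

def eligible_hosts(host_net_util, threshold=0.80):
    """Hosts whose uplink utilization is at or below the threshold."""
    return [host for host, util in host_net_util.items() if util <= threshold]

util = {"esx01": 0.45, "esx02": 0.85, "esx03": 0.60}
print(eligible_hosts(util))   # -> ['esx01', 'esx03']
```

Note that in the real feature this is an additional check layered on top of DRS's existing CPU and memory metrics, not a standalone decision.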

SIOC + SPBM

Storage IO Control configuration is now performed using Storage Policies and IO limits enforced using vSphere APIs for IO Filtering (VAIO). Using the Storage Based Policy Management (SPBM) framework, administrators can define different policies with different IO limits, and then assign VMs to those policies. This simplifies the ability to offer varying tiers of storage services and provides the ability to validate policy compliance.

Content Library

Content Library with vSphere 6.5 includes some very welcome usability improvements.  Administrators can now mount an ISO directly from the Content Library, apply a Guest OS Customization during VM deployment, and update existing templates.
Performance and recoverability have also been improved.  Scalability has been increased, and there is a new option to control how a published library stores and syncs content. When enabled, it reduces the sync time between vCenter Servers that are not using Enhanced Linked Mode.
The Content Library is now part of the vSphere 6.5 backup/restore service, and it is part of the VC HA feature set.

Developer and Automation Interfaces

The vSphere developer and automation interfaces are receiving some fantastic updates as well. Starting with the vSphere’s REST APIs, these have been extended to include VCSA and VM based management and configuration tasks. There’s also a new way to explore the available vSphere REST APIs with the API Explorer. The API Explorer is available locally on the vCenter server itself and will include information like what URL the API task is available to be called by, what method to use, what the request body should look like, and even a “Try It Out” button to perform the call live.
Moving over to the CLIs, PowerCLI is now 100% module based! There’s also some key improvements to some of those modules as well. The Core module now supports cross vCenter vMotion by way of the Move-VM cmdlet. The VSAN module has been bolstered to feature 13 different cmdlets which focus on trying to automate the entire lifecycle of VSAN. The Horizon View module has been completely re-written and allows users to perform View related tasks from any system as well as the ability to interact with the View API.
The vSphere CLI (vCLI) also received some big updates. ESXCLI, which is installed as part of vCLI, now features several new storage based commands for handling VSAN core dump procedures, utilizing VSAN’s iSCSI functionality, managing NVMe devices, and other core storage commands. There’s also some additions on the network side to handle NIC based commands such as queuing, coalescing, and basic FCOE tasks. Lastly, the Datacenter CLI (DCLI), which is also installed as part of vCLI, can make use of all the new vSphere REST APIs!
DCLI’s interactive mode is especially powerful, with niceties like tab completion of commands and parameters.

Operations Management

There have been some exciting improvements on the vSphere with Operations Management (vSOM) side of the house as well. vRealize Operations Manager (vR Ops) has been updated to version 6.4, which includes many new dashboards, dashboard improvements, and other key features to help administrators get to the root cause faster and more efficiently. Log Insight for vCenter has also been updated and will be on version 4.0. It contains a new user interface (UI) based on our new Clarity UI, increased API functionality around the installation process, the ability to perform automatic updates to agents, and some other general UI improvements. Also, both of these products will be compatible with vSphere 6.5 on day one.
Digging a little further into the vR Ops improvements, let’s first take a look at the three new dashboards titled: Operations Overview, Capacity Overview, and Troubleshoot a VM. The Operations dashboard will display pertinent environment based information such as an inventory summary, cluster update, overall alert volume, and some widgets containing Top-15 VMs experiencing CPU contention, memory contention, and disk latency. The Capacity dashboard contains information such as capacity totals as well as capacity in use across CPU count, RAM, and storage, reclaimable capacity, and a distributed utilization visualization. The Troubleshoot a VM dashboard is a nice central location to view individual VM based information like its alerts, relationships, and metrics based on demand, contention, parent cluster contention, and parent datastore latency.
One other improvement that isn’t a dashboard but is a new view for each object, is the new resource details page. It closely resembles the Home dashboard that was added in a prior version, but only focuses on the object selected. Some of the information displayed is any active alerts, key properties, KPI metrics, and relational based information.
Covering some of the other notable improvements, there is now the ability to display the vSphere VM folders within vR Ops. There’s also the ability to group alerts so that it’s easy to see what the most prevalent alert might be. Alert groups also enable the functionality to clear alerts in a bulk fashion. Lastly, there are now KPI metric groups available out of the box to help easily chart out and correlate properties with a single click.

Source: What’s New in vSphere 6.5: Host & Resource Management and Operations

