Saturday, March 4, 2017

Datastore size and VMs per datastore: a look at how disk queue limits affect sizing.

As a consultant I get all kinds of questions, but two of the most commonly asked are: "What size do we need to make these datastores?" and "How many VMs can we put in these datastores?" I believe that both of these questions are related.
First question: what size do we need to make these datastores? The VMware Configuration Maximums doc for vSphere 6.5 lists a maximum volume size of 64TB, but just because you can doesn't mean you should. Back in the days of vSphere 4.1 you were limited to a volume size of 2TB, so the choice wasn't as hard, and most of the datastores I ran into were segregated by disk speed and RAID level, all under 2TB. Some of the high-performance RAID 10 on 15k disks was carved up into 250GB to 600GB LUNs, and the RAID 5 and 6 on 7.2k or 10k disks was anywhere from 500GB to 2TB. Need a fast disk for your VM? Carve up a chunk of the faster storage. Need slow disk? Use slow storage. Easy, right? Well, not any more. The world is full of storage vendors with huge caches, auto-tiering, dedupe, and all-around magic. So how big do we make them?
That leads us to the second question: how many VMs can we put in these datastores? Now that we can have huge datastores and the storage choices are endless, the answer is a little more complicated. I still get a funny feeling in my tummy when I get asked this question. The reasons why aren't common knowledge, and explaining them starts to delve into the deep dark corners of vSphere, which sometimes turns people off from doing the research themselves. I'm going to try and make it a little easier. This is aimed at existing environments.
One key is to know how queue depth works in VMware. Queue depth is the number of pending input/output (I/O) requests for a volume; for VMware it is the limit on requests that can be outstanding against a storage port at any one time. It is a hardware-dependent setting on the HBA or iSCSI initiator (software or hardware) that caps the queue depth. It is what lets the VMs on a vSphere host share disk resources and makes having multiple VMDKs per LUN possible. If queue depth settings are set too high, the storage ports get congested, leading to poor VM disk performance. Conversely, if set too low, the storage ports sit idle and that nice expensive SAN you bought never gets pushed to its potential.
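Since the limit itself lives in the HBA driver, you can peek at it without leaving PowerCLI by wrapping esxcli. This is only a minimal, read-only sketch, assuming a QLogic FC HBA (the qlnativefc module) and an existing Connect-VIServer session; the host name is made up, and Emulex or software iSCSI drivers use different module and parameter names, so check the VMware KBs in the sources below before touching anything.

# Peek at the HBA driver's queue depth module parameter -- read-only sketch.
# Module name is an assumption: qlnativefc = QLogic FC driver; Emulex (lpfc) and
# software iSCSI (iscsi_vmk) use different modules and parameter names.
$esxcli = Get-EsxCli -VMHost (Get-VMHost "esx01.lab.local") -V2   # hypothetical host
$esxcli.system.module.parameters.list.Invoke(@{module = "qlnativefc"}) |
    Where-Object { $_.Name -like "*qdepth*" } |
    Select-Object Name, Value, Description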
I still didn't answer the first or second question. Why? Because it depends. I took the easy way out, huh? I can, however, help you find the answer that works for you. I'm going to blow your mind; read on if you are ready to reach the next level of control over your storage environment.
I will break this down into things you will need to know before you start:
  • Know your environment! What HBAs are you using? What SAN are you using? What storage protocol? What storage vendors are you looking at if acquiring new storage?
  • Know your house and you will own it. Know your tools! esxtop, esxcli, PowerCLI, PowerShell, and your SAN interface will enable you to find the answers that you seek (a quick PowerCLI sketch follows this list).
  • Know your resources! Google, forums, VMware / SAN support, and experienced consultants can guide you on your journey.
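To put the tools bullet into practice, here is a minimal PowerCLI sketch that inventories the HBAs and datastores you will be working with. The vCenter name is a placeholder, and this assumes a reasonably current PowerCLI release.

# Minimal PowerCLI inventory sketch -- vCenter name is a placeholder.
Connect-VIServer -Server "vcenter.lab.local"

# What HBAs are you using?
Get-VMHost | Get-VMHostHba -Type FibreChannel, iScsi |
    Select-Object VMHost, Device, Type, Model, Driver, Status

# How big are the datastores and how full are they?
Get-Datastore |
    Select-Object Name,
        @{N = "CapacityGB"; E = { [math]::Round($_.CapacityGB, 0) } },
        @{N = "FreeGB";     E = { [math]::Round($_.FreeSpaceGB, 0) } }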

If you are good with that then we can get technical.
First you need to know how your environment is configured now, so it's esxtop time. Fire up esxtop and press u for the disk device view; the QUED column shows the commands currently queued against each device. That's the first puzzle piece.
Now you need to find the maximum queue depth of your storage adapter. In esxtop, press d for the disk adapter view, press f to add the queue stats fields if they are not shown, and look at the AQLEN column. AQLEN is the queue depth of the storage adapter.
Next is the storage device. Press u for the disk device view, hit f to add the queue stats fields if needed, and look at the DQLEN column. DQLEN is the queue depth of the storage device; this is the maximum number of ESXi VMkernel active commands that the device is configured to support.
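If you would rather not sit in esxtop, the same device limit also shows up in esxcli output as "Device Max Queue Depth". Below is a minimal sketch that pulls it for every device on one host; the host name is a placeholder, and the property name is my assumption about how PowerCLI flattens that esxcli field, so sanity-check it against the raw output of esxcli storage core device list.

# Read the per-device queue depth (DQLEN) without esxtop -- host name is a placeholder.
# DeviceMaxQueueDepth is assumed to be how PowerCLI flattens "Device Max Queue Depth".
$esxcli = Get-EsxCli -VMHost (Get-VMHost "esx01.lab.local") -V2
$esxcli.storage.core.device.list.Invoke() |
    Sort-Object Device |
    Select-Object Device, Vendor, DeviceMaxQueueDepth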
Now that you are armed with data, you can start making choices. Do you raise the queue limit, or do you keep it where it is? How many VMs can this LUN support without hitting the queue limit? If you are buying new storage, what do the vendors support and what do they recommend as best practice? What are the physical limits on the storage arrays you are using or plan on using? It is important to determine the queue depth limits of the storage array, because all of the HBAs that access a storage port must be configured with those limits in mind. A little addition (and division) gets you there. Time to answer both questions, right? Yup, the answer is still "it depends." There are other factors like storage protocol, SAN fabric / SAN switches, and IO needs, but now you can make an educated choice on how to size your environment with regard to the subject covered in this post.
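Here is the addition part as a back-of-the-envelope sketch. Every number in it is made up: a front-end port queue depth of 2048, a per-device DQLEN of 64, and 8 hosts zoned to that port. The point is simply that everything able to queue against a port, added up, should stay at or under what the port can accept.

# Back-of-the-envelope fan-in check -- every number here is an assumption.
$portQueueDepth   = 2048   # what the array front-end port can accept (ask your SAN vendor)
$deviceQueueDepth = 64     # DQLEN per LUN per host (from esxtop / esxcli)
$hostCount        = 8      # hosts zoned to this storage port

# Worst case, each host can have $deviceQueueDepth commands outstanding per LUN,
# so the number of LUNs you can present before the port can be oversubscribed is:
$maxLunsPerPort = [math]::Floor($portQueueDepth / ($deviceQueueDepth * $hostCount))
"Roughly $maxLunsPerPort LUNs per storage port before the queue limit is in play."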
I generally see good performance and organizational benefits from using multiple 4TB datastores when the SAN has auto-tiering and can handle the IO that your environment requires. You can get the IO requirement by working with a VMware partner and having them perform an evaluation using VMware Capacity Planner, or you can do the math yourself by adding up and trending the IO load from your servers. For the VM count, I find that averaging around 20 standard-load server VMs, like small web app servers, file servers, and read-only domain controllers, works well. I prefer to halve that count when using SQL, Exchange, or any other high-IO server load. If your SAN doesn't auto-tier well, or policy dictates that you use LUNs in standard RAID groups, then the old way of thinking applies, only you are not limited to 2TB datastores. Just remember, either way you need to take queue usage and limits into account.
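A quick way to see where your datastores stand against that rough 20-VM guideline is a loop like this; the threshold is just my average, not a hard rule, so adjust it for your workloads.

# Count VMs per datastore and flag anything over a rough (assumed) threshold of 20.
$threshold = 20
Get-Datastore | ForEach-Object {
    $count = ($_ | Get-VM).Count
    [pscustomobject]@{
        Datastore = $_.Name
        VMCount   = $count
        OverLimit = $count -gt $threshold
    }
} | Sort-Object VMCount -Descending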
If 4TB LUNs are overkill, then size them down, move VMs over, and check all the disk stats, not just disk queue. Ultimately every environment is different; I have just averaged my findings as a data center virtualization guy. You still have to put in the time to make the most of your environment.

Now, one thing I didn't mention is hyper-converged architecture, which throws another wrench into the mix. Eventually I will get around to mostly answering that question as well.


Sources
VMware KB: Controlling LUN queue depth throttling in VMware

VMware KB: Changing the queue depth for QLogic, Emulex and Brocade HBAs

VMware KB: Checking the queue depth of the storage adapter and the storage device

Troubleshooting Storage Performance in vSphere – Storage Queues

VMware® vSphere 6.5 Configuration Maximums

The only method for knowing your true optimum Queue Depth for VMware



