Wednesday, December 10, 2014

PernixData FVP accelerating View = Happy users

I've been doing an evaluation of PernixData FVP with a portion of my users, and all I can say is WOW when I look at the latency numbers the desktops are seeing:


The purple line is the latency of the backend datastore (a FibreChannel SAN) and the blue line is what the desktops experience with Pernix FVP utilizing a local SSD on the host to accelerate the LUN.....  pretty much instantaneous response!

Tuesday, November 25, 2014

Cleaning up “Already used” desktops in View

The “Already used” error that appears under Problem Desktops in View Administrator is a common issue with floating (non-persistent) desktops that are set to refresh on log-off.  It can happen for various reasons, but all of them leave the desktop in an error state that requires manual administrator intervention to resolve.
Follow this KB to permanently resolve the issue:
I set all of my floating pools to "pae-DirtyVMPolicy=2" by default.
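For reference, pae-DirtyVMPolicy is an attribute on the pool object in the Connection Server's ADAM (LDAP) database, which you can change with ADSI Edit or apply with ldifde.  A rough sketch of the change in LDIF form (the pool DN and OU path here are from memory, so verify them against your own ADAM instance before applying anything):

dn: cn=<your-pool-id>,ou=Server Groups,dc=vdi,dc=vmware,dc=int
changetype: modify
replace: pae-DirtyVMPolicy
pae-DirtyVMPolicy: 2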

Sunday, August 24, 2014

VMworld 2014

Sitting at the airport waiting for my flight to San Francisco.  This will be my second VMworld; my first was in 2012.  What will we see this year?  VVols?  A VMware-created or endorsed converged appliance architecture (Marvin)?  How is the recent acquisition of CloudVolumes going to change VMware's End User Computing (or is it too early to announce anything)?  Plus, how are the third party vendors going to make a splash?

Stay tuned.......

Thursday, July 24, 2014

Why I feel Hybrid SANs aren't necessarily the full solution for VDI storage

In the early days of VDI, storage immediately became a pain point in any virtual desktop implementation using a SAN.  Basically, the only way to get the IOPS up to an acceptable level was to add more disk spindles, regardless of whether more storage space was actually needed.  More recently, hybrid arrays have become all the rage; they use fast SSD caching to serve common reads and often to buffer writes before they get written to the traditional spinning drives.  The prices on hybrid arrays are often extremely competitive with traditional arrays and way cheaper than going all flash....

So simple decision right?  Go hybrid and problem solved?

Not so fast.....  VDI isn't the same beast as serving databases and web pages.

I have personally seen the write "IO blender" effect of hundreds of virtual desktops doing many different things flood and fill the SSDs of a hybrid array with writes, which caused the overall performance of the array to fall dramatically and latency to increase as the array struggled to flush the SSD cache to spinning disk (so we're back to the original problem) in an effort to free space to cache more writes.

My thoughts on a true solution?  Well, I think I've found a few.... and I'll post about each of them separately in the future.

Friday, July 18, 2014

Purging old events from the View Events database

Do you have a lot of non-persistent floating pool desktops that refresh on logoff?

Do you point View Administrator to a SQL Event Database to store historical events?

If so, I've found that the database can get quite big, since it seems to keep events forever.

There are VMware KBs for purging old records from the vCenter database, but nothing for the View Events database.
Then I found the following SQL query on the VMware Community boards:

https://communities.vmware.com/message/1881999
delete from [View-Event].[dbo].[v_event_data_historical] where EventID in (select EventID from [View-Event].[dbo].[v_event_historical] where Time < DATEADD(day,-30,getdate()))  
delete from [View-Event].[dbo].[v_event_historical] where Time < DATEADD(day,-30,getdate())

and I changed it to the following for my needs:
delete from [ViewEvents].[dbo].[viewevent_data_historical] where EventID in (select EventID from [ViewEvents].[dbo].[viewevent_historical] where Time < DATEADD(day,-365,getdate()))

delete from [ViewEvents].[dbo].[viewevent_historical] where Time < DATEADD(day,-365,getdate())
Here [ViewEvents] is the name of my Events database, and since I have the table prefix set to "view" in View Administrator, [viewevent_data_historical] and [viewevent_historical] are the names of the tables that need to be purged of old events.  If you use the above, you'll need to change those to match your own database and table names.

Also note that I changed "-30" to "-365", which is where you specify how old the data has to be before it gets deleted.  I chose 365 days as a safe starting point because one year's worth of old event data is more than I'll ever need, but it's still there if I need to review the past year for some reason.

Even keeping 1 year of data reduced my ViewEvents database size from almost 20GB to ~7GB once I performed a database shrink after running the query.  There were at least 2-3 extra years of accumulated old data.

One last thing: if you don't have a lot of disk space and this is the first time running the cleanup, change the database's recovery model to Bulk-logged first, or you'll run out of drive space from all of the transaction log growth before it completes.
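Another way to keep the transaction log in check on that first big cleanup, regardless of recovery model, is to delete in smaller batches so each transaction stays small.  A rough sketch, using the same table names and 365-day cutoff as above with an arbitrary batch size of 10,000 rows (adjust for your own environment):

-- delete the detail rows first, 10,000 at a time
WHILE 1 = 1
BEGIN
    DELETE TOP (10000) FROM [ViewEvents].[dbo].[viewevent_data_historical]
    WHERE EventID IN (SELECT EventID FROM [ViewEvents].[dbo].[viewevent_historical]
                      WHERE Time < DATEADD(day,-365,getdate()));
    IF @@ROWCOUNT = 0 BREAK;
END

-- then delete the parent event rows
WHILE 1 = 1
BEGIN
    DELETE TOP (10000) FROM [ViewEvents].[dbo].[viewevent_historical]
    WHERE Time < DATEADD(day,-365,getdate());
    IF @@ROWCOUNT = 0 BREAK;
END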

I also ended up setting up the above query to run as a weekly maintenance cleanup job to keep the Event database size consistent over time.

Wednesday, June 11, 2014

Why I'm moving from blades to traditional rack mount servers for VDI

While blade servers offer numerous advantages in virtualized environments (such as easy scalability, minimal cabling, and ease of setup), as VDI has progressed beyond simple desktop OS virtualization, blades have some significant drawbacks that leave them less than ideal for hosting a modern virtual desktop if one expects near-traditional end user computing performance.

I'm currently architecting the second generation of a VDI deployment that today resides on HP blades.  As the BL685c G6's are approaching five years of age and aren't on the ESXi 5.5 compatibility list, it's time for an update.  With modern developments such as Teradici offload cards, nVidia GRiD accelerated graphics, and affordable PCIe SSDs, here is why I'm planning on moving the desktops onto newer standard rack mount 2U hosts.

1.)  Limited Graphics Capabilities
To provide the display performance that a typical user is accustomed to, additional graphics power is necessary.  Most traditional servers include minimal onboard graphics capability, and to add extra graphics power you need a PCI-e slot.  While those slots are plentiful in other form factors such as tower and rack servers, blade options typically don't include much in the way of advanced graphics capability, and when they do it adds complexity or expense.

2.)  PCI-Express cards not always available as a mezzanine option
The manufacturer-proprietary mezzanine form factor that most blades use for expansion cards limits their expansion capabilities, because it typically takes an extended period of time, sometimes years, for a particular type of card to become available in that form factor (usually due either to the time it takes a server vendor to certify a card or the time it takes a third party vendor to redesign the card for the form factor).  Mezzanine cards are also sometimes not backwards compatible with previous mezzanine slots.  For instance, HP completely redesigned the mezzanine cards in their G8 generation of servers, making them completely incompatible with the mezzanine cards from the older G6 and G7 generations (and vice versa: you cannot use a G8 card in a G6 or G7 slot).  PCI-e is an industry standard, and a newer slot can nearly always use cards designed to an older specification.

3.)  Local disk storage options for blades are limited and/or expensive
This is simply due to a lack of space in the slim blade form factor.  You aren't going to run VSAN easily on a blade (if at all).

Wednesday, June 4, 2014

Disabling Startup Repair on your VDI Golden Images

Although the VMware View Optimization Guide is always a good place to start when creating Golden Images for your View deployment, there are a few things that you pick up in real world deployments that come in handy.  

Sometimes a virtual desktop shows up under Problem Desktops as “Agent Unreachable”, but if you console into that desktop you may find that Windows 7 has simply booted into Startup Repair mode.  You can easily prevent this from happening.  Log into the Golden Image, open a command prompt as an Administrator, and type in the following:

   bcdedit /set {default} bootstatuspolicy ignoreshutdownfailures

This should stop Windows 7 from ever launching startup repair after a shutdown failure.  You could also swap out ignoreshutdownfailures with ignoreallfailures if you wanted it to also ignore boot failures.
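If you want to double-check that the change took, running the following from the same elevated prompt should show bootstatuspolicy listed on the default boot entry:

   bcdedit /enum {default}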

Thursday, May 29, 2014

Fusion IO card not found?

I recently moved an 80GB Fusion IO card from a retired server running ESX 4.1 to a newer server running ESXi 5.1U1.  Upon installing the latest Fusion IO ESXi driver (version 3.2.6), I was greeted with the message "fio-status requires the driver to be loaded on this platform" when trying to check the status of the card.


Using the ESXi shell command lspci -v showed the card present in the host, but why wasn't the driver seeing it and loading?


It turns out that if the firmware of the card is too old, the newer driver doesn't acknowledge the presence of the card.  On a wild guess after poking around the VMware HCL, I uninstalled the 3.2.6 driver and installed version 2.3.9, which was also listed as compatible with ESXi 5.1.
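For anyone doing the same driver swap, removing and installing the driver VIBs can be done from the ESXi shell along these lines (the exact VIB name varies by driver release, so check it with the list command first; the offline bundle path below is just a placeholder for wherever you copied the driver):

esxcli software vib list | grep -i iomemory
esxcli software vib remove -n <iomemory-vib-name>
esxcli software vib install -d /vmfs/volumes/datastore1/<driver-offline-bundle>.zip

Reboot the host after each remove/install.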

Eureka!  The card was found, but a warning was present to update the firmware.


Updated the firmware and the card was good to go with the old 2.3.9 driver.

Then I updated the driver to 3.2.6, flashed the firmware again with the 3.2.6-compatible version, formatted the card, and finally I was up to date with a working card.


Thursday, May 1, 2014

Logon Performance Enhancements

I’ve been experimenting with logons to try to make them faster for my users, and have come across a way to make the initial profile creation go faster…

- Delete the value “StubPath” under this key:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Active Setup\Installed Components\{44BBA840-CC51-11CF-AAFA-00AA00B6015C}

(I actually renamed mine to BACKUP_StubPath in case I ever wanted to reverse it)

Also change the following key:

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Active Setup\Installed Components\{44BBA840-CC51-11CF-AAFA-00AA00B6015C}]
"IsInstalled"=dword:00000001

Modify it to "IsInstalled"=dword:00000000

This will prevent “Windows Mail” from generating around 14MB of content when the user profile is created (in the local folder under AppData, not the roaming one).  It seems to be some sort of database that's created, and who actually uses the Windows Mail application nowadays?  Probably the same number of people that still use Outlook Express....
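If you'd rather script both changes into the Golden Image build instead of editing the registry by hand, something like the following from an elevated command prompt should do it (note this deletes StubPath outright rather than renaming it):

reg delete "HKLM\SOFTWARE\Microsoft\Active Setup\Installed Components\{44BBA840-CC51-11CF-AAFA-00AA00B6015C}" /v StubPath /f
reg add "HKLM\SOFTWARE\Microsoft\Active Setup\Installed Components\{44BBA840-CC51-11CF-AAFA-00AA00B6015C}" /v IsInstalled /t REG_DWORD /d 0 /f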

-----------------

It can also be disabled through group policy as well if you don't want to do it via the registry:

Computer Configuration -> Administrative Templates -> Windows Components -> Windows Mail -> Turn off Windows Mail application - Enabled

This change cut about 15-20 seconds off my login times, and undoubtedly saved IOPS as well.

Monday, April 21, 2014

HP G7 blade Emulex network adapters and ESXi 5.5

I recently tried to update some of my BL685c G7 and BL465c G7 blades from ESXi 5.1 to 5.5.  These all have Emulex OneConnect 10Gb NICs in them.  After a successful upgrade, upon reboot it appeared that some of the vmnic interfaces had vanished (the ones all of my virtual machine traffic was on; fortunately, the management network was still intact).

ESXi 5.5 introduces a new native driver model, and the host was now using the elxnet driver, which is apparently where the problems started.  I tried the usual troubleshooting methods of updating the Emulex firmware to the latest from HP and installing the latest native Emulex NIC driver available from VMware (which was newer than the one on HP's ESXi 5.5 custom ISO), but the problems continued.

I even wasted a few hours doing a fresh install and setup thinking that it was a bad upgrade.  Still no success.

Then, after an extensive Google search and a stop on the VMware Communities site, I came across the following thread:


A poster on that thread had the idea to disable the native driver and enable the legacy driver, which is apparently still included in the ESXi 5.5 install.  Once I did that, the problem stopped and all of my network adapters were back to normal.

Run the following commands directly on the host to disable the native driver and enable the legacy driver:

esxcli system module set --enabled=false --module=elxnet

esxcli system module set --enabled=true --module=be2net

Then reboot the host.  Everything should be back to normal.
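To confirm which driver will be used, the module list should now show be2net enabled and elxnet disabled; something like this from the ESXi shell will show both:

esxcli system module list | grep -E 'elxnet|be2net'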

Tuesday, April 15, 2014

Time to give back

For almost three years now, my day job has been VMware View virtual desktop administrator for a university.  I started with View 4 and around 50 users, back when almost nothing worked, and am now running Horizon View 5.2 with over 500 endpoints and a solid deployment that I don't have to worry about every day.  I decided to start this blog to share some of the expertise I've learned the hard way, so maybe it'll save someone else hours of headaches, plus it helps me document things so I personally don't forget them.