HAV vs. Cloud

I’m tired of the word cloud.  It is so hackneyed at this point that saying it out loud feels like being a “poser” skateboarder in the late ’80s, wearing Vision Street Wear and holding a pristine Gator pro model with rails, copers, and a noseguard.  Sigh.  However, for over a year now I have had to make sure that all my peers at Gap, and those upstream of me, understood the differences between “true” cloud and virtualization.  Infrastructure architecture designs and discussions had to be framed in a way that kept people (even smart people) from being sucked into broad misinterpretations of technology implementations.

Back in July of 2011, my buddy Chris Buben and I sat in the Gap cafeteria, frustrated about how to get everyone on our teams aligned on how to properly consume “cloud”.  We wasted so much time in team meetings correcting people’s widespread misuse of the term that we were determined to fix the vernacular.  One of the biggest impediments to a successful technology project is unclear communication; specifically, the failure to align everyone on the terms.  You need everyone, from technology VP down to analyst, saying the same things the same way.  We needed a clearer way to mass-communicate virtualization versus cloud.  So we came up with a simple shorthand that captures a lot of big infrastructure architectural differences between cloud and pure virtualization: we started describing two different architectural zones, one we called High Availability Virtualization, or HAV, and the other we just called cloud.

We wanted to draw a distinction between the two to keep people from thinking we could simply shove every application we have at Gap into what we considered real cloud-architected infrastructure.  The reality is that you cannot.  Not every app has an architecture and code base amenable to real cloud computing.  The sad truth is that many commercially sold applications are not optimized for real cloud.  Case in point: many Oracle E-Business Suite apps (in fact, Oracle seems to go out of its way to make applications difficult to run on anything but Oracle platforms and software… but I digress).  So we needed a way to give ourselves breathing room to host some apps in a virtualized environment and reap some modern architectural benefits without getting caught up in the “cloud” term-abuse game.  HAV became our mantra for everything non-cloud.  You might ask why we drew the distinction at all.  The answer is that we had our own clear viewpoint about what cloud really is versus mere virtualization.  For the purposes of this analysis, we are using OpenStack as our cloud (IaaS) platform.  To explain our position, I put together a breakdown of HAV versus cloud.  Understand that some features are continuously being developed and improved in the cloud arena (especially in networking), so this chart is changing as I type.

HAV Versus Cloud

  • Private Cloud = cheap, fault tolerant by design, disposable, big scale
  • HAV = more expensive, fault tolerance through licensed features, less scalability

PROFILES

HAV (High Availability Virtualization)

Hardware:

  • Blade servers or rack servers
  • Commodity hardware (premium or cheap)

Storage:

  • Shared SAN or NAS disk for VMs
  • T1 SAN or T2 NAS tier on iSCSI-based block
  • High IOPS performance expectations

Lifetime:

  • Persistent VMs (horizontal scale needs are less)

Hypervisors:

  • VMware ESXi
  • RHEV
  • Citrix XenServer
  • Hyper-V

Patching:  Needed, due to persistent, long-lived VMs

  • WSUS for Windows
  • CM-pushed RPM or YUM updates for Linux

HA:

  • Persistent VMs that are re-deployed on failure
  • Clustering used for HA
  • Live Migration
  • Apps behind load balancer VIPs

Custom Networking:

  • Can have active/passive NICs on HAV hosts
  • Can have two active NICs on hypervisor hosts
  • Dot1q VLAN trunks to HAV hypervisor hosts
  • Bridged networking between HAV VMs and the network
  • Subnets controlled by VLANs and network L3 switches
  • Default gateways are external L3 VLAN interfaces

DHCP:

  • Enterprise DHCP servers

DNS:

  • Enterprise DNS direct to VMs

Cloud (Private IaaS)

Hardware:

  • Rack servers primarily
  • Commodity hardware (cheap)

Storage:

  • Local disk for VMs
  • iSCSI-based block cloud-managed volumes
  • Generally expected “lower” IOPS performance

Lifetime:

  • Disposable/ephemeral VMs that scale out

Hypervisors:

  • KVM
  • Xen
  • Other hypervisors may be supported, but we didn’t want to use them (ESXi)

Patching:  No “traditional” patching; always dispose and rebuild instantly (see the sketch after these profiles)

  • High-risk vulnerabilities require new machine images (AMIs) to be built
  • CM-pushed RPM or YUM updates for Linux

HA:

  • On-demand new VM instances deployed as needed
  • No reliance on Live Migration
  • Apps behind load balancer VIPs

Homogenized Networking:  (Note: this is changing)

  • Two or more active NICs required on hypervisor hosts
  • Dot1q VLAN trunks to cloud hypervisor hosts
  • NAT between cloud VMs and the network is typical
  • VLANs/subnets pre-allocated per cloud hypervisor host or by tenant/project
  • Default gateways are virtual gateways on the cloud hypervisor host
  • Default gateways can also be external L3 VLAN interfaces

DHCP:

  • Local cloud-host DHCP servers or external enterprise DHCP servers

DNS:

  • Cloud-host DNS proxy (dnsmasq) to VMs, or enterprise DNS
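
To make the patching and HA rows concrete, here is a minimal sketch of the dispose-and-rebuild pattern in Python, using the python-novaclient library against an OpenStack endpoint.  The credentials, image name, flavor, pool name, and pool size are all hypothetical placeholders (a real reconciler would also handle health checks and load balancer membership); the core idea is simply: never repair or migrate an instance, just delete it and boot a fresh one from the current golden image.

```python
# Hypothetical sketch of the cloud-zone "dispose and rebuild" pattern using
# python-novaclient. All names and credentials below are placeholders.
from novaclient import client

nova = client.Client('2', 'demo_user', 'demo_password', 'demo_project',
                     'http://openstack.example.com:5000/v2.0/')

GOLDEN_IMAGE = 'web-golden-20130401'  # rebuilt whenever a high-risk vuln lands
FLAVOR = 'm1.small'
POOL_PREFIX = 'web-'                  # instances sitting behind one LB VIP
POOL_SIZE = 4                         # desired instance count

def reconcile_pool():
    """Delete broken instances and boot replacements from the golden image.

    No in-place patching and no live migration: anything unhealthy is
    disposed of, and the load balancer VIP hides the churn from clients.
    """
    image = nova.images.find(name=GOLDEN_IMAGE)
    flavor = nova.flavors.find(name=FLAVOR)

    healthy = []
    for server in nova.servers.list():
        if not server.name.startswith(POOL_PREFIX):
            continue
        if server.status == 'ERROR':           # disposable: no repair attempted
            server.delete()
        else:
            healthy.append(server)

    for n in range(POOL_SIZE - len(healthy)):  # top the pool back up
        nova.servers.create(name='%s%d' % (POOL_PREFIX, n),
                            image=image, flavor=flavor)

if __name__ == '__main__':
    reconcile_pool()
```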

Database Tier

HAV

Physical and virtual (determined by performance requirements)
Linux servers can utilize LXC containers for physical pseudo-virtualization (see the sketch after this section)

Hardware:

  • Blade servers or rack servers

Storage:

  • Local disk
  • T1 SAN or T2 NAS
  • Database tier on iSCSI-based block

Cloud (Private IaaS)

Physical and virtual, predominantly virtual (determined by performance requirements)
Linux servers can utilize LXC containers for physical pseudo-virtualization

Hardware:

  • Rack servers predominantly

Storage:

  • Local disk
  • iSCSI-based block cloud-managed volumes
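
Since both database profiles mention LXC, here is a minimal sketch of what that pseudo-virtualization looks like in practice: carving containers out of a physical Linux host with the stock lxc userspace tools, driven from Python.  The container and template names are placeholders, and the lxc tools must already be installed on the host.

```python
# Hypothetical sketch: LXC pseudo-virtualization on a physical Linux DB host.
# Container and template names are placeholders; requires the lxc tools.
import subprocess

def create_container(name, template='ubuntu'):
    """Create an LXC container from a template and start it daemonized."""
    subprocess.check_call(['lxc-create', '-n', name, '-t', template])
    subprocess.check_call(['lxc-start', '-n', name, '-d'])

def container_info(name):
    """Return lxc-info output (state, PID) for the named container."""
    return subprocess.check_output(['lxc-info', '-n', name])

if __name__ == '__main__':
    create_container('db01')          # a slice of the physical host, not a VM
    print(container_info('db01'))
```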

So, why go to all this trouble to break these out?  Because, when trying to place applications in the most appropriate environment for deployment, you need to take all of these factors into consideration.  You may have an application that does not tolerate network address translation, does not work well behind a load balancer, or needs ultra-high IOPS.  In such cases, you may be better served deploying those apps in HAV-architected infrastructure zones.  However, if you have a modern, true service-oriented architecture (SOA) app that doesn’t require ultra-high IOPS, is designed from the ground up to be stateless, and has built-in failure detection, then cloud-architected zones may be perfectly appropriate.
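
To illustrate, here is a toy Python sketch of that placement reasoning.  The attribute names and the IOPS threshold are made up for illustration, not a Gap standard; the point is simply that each blocker pushes an app toward an HAV zone.

```python
# Hypothetical sketch of the HAV-vs-cloud placement reasoning above.
# Field names and thresholds are illustrative only.
def suggest_zone(app):
    """Return ('cloud', []) or ('HAV', [reasons]) for an app profile."""
    blockers = []
    if not app.get('tolerates_nat'):
        blockers.append('does not tolerate NAT')
    if not app.get('load_balancer_friendly'):
        blockers.append('does not work well behind a load balancer VIP')
    if not app.get('stateless'):
        blockers.append('not stateless / no built-in failure detection')
    if app.get('iops_requirement', 0) > 10000:   # "ultra-high IOPS" cutoff
        blockers.append('needs T1/T2 shared-storage IOPS')
    return ('HAV', blockers) if blockers else ('cloud', [])

legacy_app = {'tolerates_nat': False, 'stateless': False,
              'load_balancer_friendly': True, 'iops_requirement': 25000}
soa_app = {'tolerates_nat': True, 'stateless': True,
           'load_balancer_friendly': True, 'iops_requirement': 2000}

print(suggest_zone(legacy_app))   # ('HAV', [...three blockers...])
print(suggest_zone(soa_app))      # ('cloud', [])
```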

The point is:  Don’t just shove every app (commercial or homegrown) into cloud-architected zones without understanding the implications of doing so.  You may end up with undesired performance or reliability headaches.  Our goal is to move as much of our application workload as possible to OpenStack-based cloud infrastructure zones.  However, on the journey to that nirvana, we have a lot of legacy app crud that just isn’t optimized for true cloud infrastructure.  For those apps, we have opted for a two-hop approach, using an HAV infrastructure zone as a way station on the road to cloud.  This buys us time to either re-architect those apps or simply replace them.  All of this brings up a larger discussion about proper cloud application design.  In the next post, I will cover months of work by our internal architects and app dev teams at Gap on what comprises “cloud architected” application best practices.