Archive for category Google App Engine
At the #awssummit @Werner mentioned the micro instances and how they were useful to support use cases such as simple monitoring roles or to host services that need to always be on and listen for occasional events. @simonmunro cynically suggested it was a way of squeezing life out of old infrastructure.
I would prefer applications to hibernate, release resources and stop billing me but come back on instantly on demand.
Virtual Machine centric compute clouds (whether they be PaaS or IaaS oriented) exhibit this annoying “irreducible minimum” issue, whereby you have to leave some lights on. Not necessarily for storage, cloud databases or queues such as S3, SQS, SimpleDB, SES, SNS, Azure Table Storage etc. – they are always alive and will respond to a request irrespective of volume or frequency. They scale from 0 upwards. Not so for compute. Compute typically scales from 1 or 2 upwards (depending on your SLA view of availability).
This is one feature I really like about Google’s App Engine. The App Engine fabric brokers incoming requests, resolves each one to the serving application and launches instances of the application if none are already running. An application can be idle for weeks consuming no compute resources and then near-instantly burst into life and scale up fast and furiously before returning to 0 when everything goes quiet. This is how elasticity of compute should behave.
My own personal application on App Engine remains idle most of the time. I am unable to detect that my first request has forced App Engine to instantiate an instance of my application first. My application is simple, but a new instance of my Python application will get a memcached cache-miss for all objects and proceed to the datastore to query for my data, put this data into the cache, and then pass the view-model objects to Django for rendering. Brilliantly fast. I can picture a pool of Python EXEs idling and suddenly the fabric picks one, hands it a pointer to my code base and an active request – bam – instant-on application.
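The cold-start path described above (cache miss, datastore query, cache fill) is the usual cache-aside pattern. Here is a minimal sketch of it, assuming plain dictionaries as stand-ins for memcache and the datastore; the class name CacheAside and the key "post:1" are hypothetical and are not taken from my application or from the App Engine API.

```python
class CacheAside:
    """Read-through helper: consult the cache first, fall back to the datastore."""

    def __init__(self, datastore):
        self.datastore = datastore   # stand-in for the real datastore
        self.cache = {}              # stand-in for memcache, empty on a new instance
        self.datastore_reads = 0     # counts how often we had to go to the datastore

    def get(self, key):
        if key in self.cache:
            return self.cache[key]          # warm path: served from the cache
        self.datastore_reads += 1           # cold path: cache miss
        value = self.datastore.get(key)
        if value is not None:
            self.cache[key] = value         # fill the cache for later requests
        return value


store = CacheAside({"post:1": {"title": "Hello"}})
first = store.get("post:1")    # new instance: miss, then datastore read
second = store.get("post:1")   # same instance: cache hit
```

Only the first request on a fresh instance pays for the datastore read, which is why the latency drops once an instance has served a few requests.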
For those applications that cannot give good performance from a cold start, App Engine supports the notion of “Always On” by forcing instances to stay alive with caches all loaded and ready for action: http://code.google.com/appengine/docs/adminconsole/instances.html
The screen shots below show my App Engine dashboard before my first request, how it spins up a single instance to cope with demand followed by termination after ten minutes of being idle.
Stage 1: View of the dashboard – no instances running – no activity in the past 24 hours
Stage 2: A request has created a single instance of the application and the average latency is 147ms. The page appeared PDQ in my browser.
Stage 3: 17 Requests later, and the average latency has dropped. One instance is clearly sufficient to support one user poking around.
Stage 4: I left the application alone for nearly ten minutes. My instance is still alive, but nothing happening.
Stage 5: After about ten minutes of being idle my application instance vanishes. App Engine has reclaimed the resources.
Head over to the new Google App Engine pricing that will come into effect when App Engine comes out of preview later in the year and you see a list of prices similar, in format at least, to pricing for AWS, Azure and other cloud providers. That seems fairly straightforward until you look at the FAQ, which describes the pricing in more detail and, while answering a lot of questions, gives explanations that raise even more.
It seems that Google is switching over to an instance based pricing model from a CPU based one, but there are differences between different frameworks – where Java handles concurrent requests and Python and Go do not (yet). In addition the FAQ makes observations about the change in pricing that will affect current apps that are memory heavy because they have been designed to optimise the CPU pricing and may land up being more expensive under the new model. Then there are reserved instances, API charges, bandwidth, premier accounts and a whole lot of other considerations to add to the confusion. Even if you are not interested in App Engine it is a worthwhile read.
I have done and seen a few spreadsheets to try and work out hosting costs for cloud computing and they reach a point of complexity with so many unknowns that it becomes very difficult to go to the business with a definitive statement on how much it will cost to run an application. This is particularly difficult when development hasn’t even started yet, so there is no indication of the architectural choices (say memory over CPU) that affect the estimates. While AWS may be easier in some sense because the instance is a familiar unit (a machine yay big that we put stuff on), there are still many considerations that affect the cost of hosting. Grace and I struggled with a particular piece of SOLR availability and avoided using a load balancer for internal traffic until we ran the numbers and worked out that it would cost pennies per day in bandwidth costs, so we decided to use ELB after all – and that is one of the simpler pricing architectural decisions. Trying to build a scalable architecture out of loosely coupled components that makes optimal use of the resources available is very difficult to do.
We could ask vendors for better or more flexible pricing models. We could have estimating tools that allow us to estimate costs based on a choice of ‘similar’ application models. We could trade SLAs for cost as S3 reduced redundancy does. We can hedge our costs using reserved instances. We could run simulations (given the on demand availability this is relatively easy). We could have better tools to analyse our bills (as Quest has for Azure). We need all of this but ultimately the pricing of cloud computing is going to remain complex and will increase in complexity in future, leaving the big decisions up to the technical people doing the implementation.
Cloud expertise needs to extend beyond knowing your IaaS from your SaaS and experts need to have a handle on all aspects of cloud computing architectures, for a specific platform, in order to realise the benefits that cloud computing promises. In the context of developers being the new kingmakers, it is developers, software architects and DevOps that are the only ones close enough to the metal to make the decisions that ultimately affect the cost. Where currently developers optimise at the cost of development time (which is largely discouraged), we may want developers to optimise CPU against memory against latency against bandwidth against engineering effort, and even, at a push, against environmental friendliness in future. Let’s not even get into having to adapt to providers changing pricing models periodically. It is going to take some serious skill to pull that together – from the entire team.
So while the cloud computing marketers make it sound easy to put our apps onto the cloud there is a long road ahead in developing the necessary skills to ensure that it is done optimally and at a cost that is reasonable across the life of the application. There are business cases that could collapse under spiralling cloud costs if we pull one lever incorrectly.
Google has announced ‘Backends’ as part of their App Engine 1.5.0 Release. From the Python docs (emphasis mine),
Backends are special App Engine instances that have no request deadlines, higher memory and CPU limits, and persistent state across requests. They are started automatically by App Engine and can run continuously for long periods. Each backend instance has a unique URL to use for requests, and you can load-balance requests across multiple instances.
Resident backends run continuously, allowing you to rely on the state of their memory over time and perform complex initialization. Dynamic backends come into existence when they receive a request, and are turned down when idle; they are ideal for work that is intermittent or driven by user activity.
This allows App Engine to be used in more scenarios and allows architectures that need more than web request/response. I have always thought that the service-web-request-and-terminate-as-soon-as-possible model of App Engine, while relevant to certain types of applications, is limiting for general application requirements.
I wonder how this runs across the Google infrastructure? Moving away from stateless web request serving requires a different type of infrastructure, but I suppose Google has been doing this for a while for their own applications.
It will be interesting to see what developers and architects think and whether or not it will allow App Engine to be considered in more cases. Perhaps @jamessaull, our resident App Engine fan, will give us his comments.
In another post on the same blog, The Year Ahead for Google App Engine!, they announced a new pricing model and that they are finally moving out of preview (beta).
Perhaps App Engine is gearing up to become a serious contender for SME applications, on the heels of their more serious approach to enterprise email and collaboration.
In a classic enterprise IT process a business unit will commission a new system that they will fund mostly up front. The architects will push for the highest spec and capacity to meet anticipated demand over the next three years (and they don’t want to be associated with the system that runs like a dog). Governance will also insist that it is deployed in an HA configuration – multiple web servers, clustered app servers, clustered db servers. It is possible they might be able to put in a request to become a tenant of the multi-tenant high criticality database server; but they will have to jump through all sorts of hoops to prove they are worthy – no one likes noisy and nasty neighbours – especially in critical-ville.
This enterprise now jumps on the cloud. Woohoo. It is awesome. As much as you can eat Virtual Machines via VPN. So in goes some “process” to stop the greedy buffet lunchers, possibly defeating many of the agility hopes. On demand – at some point.
If this “process” gets out of hand it stands to jeopardise many of the gains hoped for. Architects begin to over spec things to avoid the change management processes and to make sure they get the best performance. And the cycle goes on.
I wanted to tackle the process of commissioning a new application though. If the enterprise is made up of thousands of departmental applications then each one will be deployed as a pair of web servers, app servers, database servers etc. If the irreducible minimum configuration is a pair, due to HA standards, then chances are utilisation will remain dreadfully low and some of the cloud promise is lost.
The above app can scale up, but it can’t scale down, leaving huge waste (e.g. overnight and at weekends). Scale this across the enterprise, the web tier, app tier and DB tier and it seriously adds up. Many try to kludge over this by over-committing physical hardware. In the AWS cloud you would probably mitigate some of this by choosing Reserved Instances for the “irreducible minimum pair”. Given cloud automation you might be able to convince some business owners to run down as low as one machine – after all the time to recover a stateless web application in a VM is measured in minutes. Simon Wardley covers this behaviour well here: why enterprise clouds?
I keep harping on about how the notion of a Virtual Machine is a bit archaic in an age of cloud application platforms – see how the likes of Google App Engine move deliberately to avoid it. They feel like a transition state as they are well understood in terms of management, isolation, billing etc. Lean manufacturing would describe them as large batch sizes leading to waste.
Preferable to the previous scenario, and whilst we remain stuck with VM oriented services, would be something more like this:
The above shows a crude form of multi-tenancy. It keeps each VM fully packed with a sufficient number of applications with the irreducible minimum far greater than two. Each application enjoys higher levels of redundancy, by being spread across a necessarily higher number of machines, and utilisation stays nice and high. I am not saying it is a universal model, just one that suits a certain category of applications.
The key problem with this model, though, is that most clouds today use the coarse grained VM as the unit of failure and billing. There is no telemetry to apportion consumption to the application level: how much memory did it use, how many CPU cycles in the web tier, how many cycles in the DB tier, how many messages did it send, how many bytes in the database, how much bandwidth etc.? Therefore people will continue with the previous model – cloud or no cloud.
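If that per-application telemetry did exist, splitting a shared VM bill would be simple arithmetic. A minimal sketch, assuming a single metered quantity per application (CPU-seconds here); the function name apportion, the application names and the figures are all hypothetical and only illustrate the idea.

```python
def apportion(vm_bill, usage_by_app):
    """Split one VM bill across applications in proportion to metered usage."""
    if not usage_by_app:
        return {}
    total = sum(usage_by_app.values())
    if total == 0:
        # Nobody used anything: share the fixed cost equally.
        share = vm_bill / len(usage_by_app)
        return {app: share for app in usage_by_app}
    return {app: vm_bill * used / total for app, used in usage_by_app.items()}


# Hypothetical CPU-seconds metered for three departmental applications on one VM.
shares = apportion(100.0, {"payroll": 600, "wiki": 300, "intranet": 100})
```

A real scheme would have to weigh several dimensions at once (memory, bandwidth, storage), which is exactly the telemetry that is missing today.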
On a secondary note: web applications today continue to rely on things like sticky sessions and have no built in means for quiescence. Elastic web farms go up as well as down. On the way up, new servers need to join the distributed cache and perform any warm up activities before being handed to the users. On the way down, a server being removed from the farm needs a grace period to cease accepting requests and complete the in flight requests, ensuring all asynchronous processes are neatly handed off to queues etc. Most are just hacked off at the knees mid flight.
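The grace period on the way down amounts to a small state machine: stop accepting, let in-flight work finish, then terminate. A minimal single-threaded sketch follows, with a hypothetical class DrainableServer; a real server would need locking and a timeout, and none of this is any particular framework's API.

```python
class DrainableServer:
    """Tracks in-flight requests so the server can be removed without cutting them off."""

    def __init__(self):
        self.accepting = True
        self.in_flight = 0

    def begin_request(self):
        if not self.accepting:
            return False          # draining: the load balancer should route elsewhere
        self.in_flight += 1
        return True

    def end_request(self):
        self.in_flight -= 1

    def start_draining(self):
        self.accepting = False    # existing requests continue, new ones are refused

    def safe_to_terminate(self):
        return not self.accepting and self.in_flight == 0


server = DrainableServer()
server.begin_request()                        # one request is in flight
server.start_draining()                       # the farm is scaling down
refused = not server.begin_request()          # new work is turned away
ready_before = server.safe_to_terminate()     # still busy with the first request
server.end_request()
ready_after = server.safe_to_terminate()      # now it can be removed cleanly
```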
There has been a lot of noise recently around standards in IT. My mind naturally drifts to examples such as:
- .Net, Java, Ruby, Python, PHP, Rails and Django
- SOAP WS* and REST
- HTML, XHTML, CSS, ECMA Script and JQuery
- SQL Server, Oracle, MySQL, DB2, PostgreSQL and SQL 92 then on to HBase, MongoDB
My mind swims between seeing the value in closed/open standards as well as the detractions and how some things happen just fine without them. It seems on some occasions those standards are necessary to foster adoption, ecosystems and innovation whilst at other times they seem to really get in the way. For example SQL 92 only gets databases to agree on a relatively small, but crucial concern of databases; when you look at the complete feature set there aren’t many standards.
There are many clouds in action today: Google App Engine, Amazon Web Services, Microsoft Azure, GoGrid, Heroku, EngineYard, Force.com, PHP Fog, Eucalyptus – a big long list. Out of this real world I see Openstack and Opencompute emerging with a variety of motivations. I have to say I am quite optimistic about it – it tackles the very commoditised and “done” elements of cloud infrastructure and can help the industry generally keep moving to higher abstractions and in pursuit of value. Stabilising the lower layers makes sense. It is a little like how agreeing on TCP/IP within the enterprise helped the web and associated technologies become really widespread, and made it worthwhile stripping out all the IPX/SPX.
However, Openstack is not the only standards initiative for the cloud. There are the more product-oriented ones such as OpenNebula and RedHat DeltaCloud as well as the more “open” bodies approach from IEEE, DMTF, vCloud and the grey area around the AWS API.
So this raises the question. What on earth are all the cloud certifications about that are cropping up? Microsoft’s cloud certification is actually an Azure, SharePoint, Dynamics, HyperV certification – so let’s call it that. That’s fine. Over a decade ago I was a Microsoft Certified Solutions Developer. It told people I had some base level developer knowledge with a specific personal investment in Microsoft VB and SQL Server. Quite specific in the grand scheme of things. I wasn’t a certified developer. I was a Microsoft certified developer. The distinction is clear.
So how can vendor based “cloud” certifications be called “cloud” certifications? Especially when the subject is still so rapidly changing and so varied between vendors. Unless of course you have already worked out your cloud strategy and need people already heavily invested in it.
However, if you are trying to establish your cloud strategy then you’d feel inclined to avoid a certified cloud consultant as ironically their personal investment guarantees bias and a lack of knowledge about the majority of the cloud domain as the set of all cloud knowledge is vastly greater than one vendor’s opinion and product set.
It probably also means they only appreciate the technology elements of cloud too from what I can see.
Focusing on the economics of SaaS depends on a couple of things: efficiency and high resource utilisation. In other words the SaaS supplier wants every one of their assets to be turned to the greatest amount of useful work whilst being able to endlessly scale to accommodate more and more customers; ideally the increasing volume should reflect a decrease in cost to serve each.
Multi-tenancy is one way to statistically multiplex a high volume of non-correlated, efficiently executed workloads to deliver very high customer:asset density – thus leading to high utilisation (only of course where economic well being is known to be a function of utilisation). SaaS usually has another property – self service, especially at sufficient scale. Scaling costs linearly (or worse) with consumer volumes is not a good business. Therefore customer provisioning has to be highly automated, swift and efficient.
This introduces customer-to-asset density. Imagine you chose to serve 10 customers with your enterprise web application. You might follow this evolution starting with the ludicrous-to-make-a-point:
- Procure 10 separate data centres, populate with servers, storage, networking, staff and so forth. Connect each customer to their data centre.
- Procure 1 data centre and 10 racks…
- Rent space for 10 racks in 1 data centre…
- Procure 1 rack with 10 servers…
- Procure 1 server with 10 Virtual Machines…
- Procure 1 server and install the application 10 times (e.g. 10 web sites)…
- Procure 1 server and install the application once and create 10 accounts in the application…
As we go down the list we are increasing the density and utilisation (assuming application efficiency is constant) and we meet different challenges on the way. At the start we have massive over-capacity, huge lead times and the expensive service is dominated by overheads. By stage 4 we have some real multi-tenancy happening via shared networking, storage and physical location etc.
At Stage 5 we notice that, when idle, each Virtual Machine is consuming plenty of RAM and a small amount of CPU. Even when busy serving requests each application is caching much the same data, loading the same libraries into memory etc. The overheads are still dominant. Might not sound too bad for 10 customers, but 10,000 customers each given 1GB of RAM for the operating system means 10TB of RAM. Those customers could be idle most of the time but you are still running 10,000 VMs just in case they make one little HTTP request…
So we are motivated to keep on pushing for more and more density to reach higher levels of utilisation by removing common overheads. Soon we are squarely in the domain of over-provisioning. This is where we know we can’t possibly service all our customers’ needs all at once and we get into the tricky business of statistical multiplexing: one man’s peak is another man’s trough. When one customer is idle another is busy. The net result: sustained high utilisation of resources. This takes a lot of management and some measure of over-capacity to ensure that statistical anomalies can be met without breaching SLA. Just how much over-capacity though?
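The "one man's peak is another man's trough" effect can be shown with a toy calculation: capacity sized for the sum of individual peaks versus capacity sized for the peak of the combined load. This is only a sketch; the two load profiles are invented and deliberately anti-correlated, and real workloads will never line up this neatly.

```python
def sum_of_peaks(loads):
    """Capacity needed if every customer gets dedicated capacity for their own peak."""
    return sum(max(load) for load in loads)


def peak_of_sum(loads):
    """Capacity needed if all customers share one pool, sized for the combined peak."""
    return max(sum(period) for period in zip(*loads))


# Hypothetical load per time period for two customers with opposite busy times.
day_shift = [1, 8, 8, 1]
night_shift = [8, 1, 1, 8]
loads = [day_shift, night_shift]

dedicated = sum_of_peaks(loads)   # 8 + 8
shared = peak_of_sum(loads)       # busiest combined period is 8 + 1
```

The gap between the two numbers is the saving from multiplexing; how much of it has to be handed back as over-capacity depends on how correlated the workloads really are.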
All of a sudden, as a software services provider, you became embroiled in the complexity of running a utility. Building commercial models that take into account usage, usage patterns, load-factors and trying to incentivise customers to move their work away from peak times, trying to sell excess capacity by setting spot prices to keep utilisation high and all assets earning some revenue.
In reality this is the realm of utility compute providers and by running thousands of different workloads across many industries and time zones they stand a far better chance of doing it. Electricity providers provide power to railways and hospitals alike, but they don’t try to sell you a ticket or give you an x-ray. Conversely, railways and hospitals don’t try to generate their own electricity. Not a new point, and not a new analogy.
Above – a quick sketch showing how today it is very hard to achieve very high utilisation, but with the advent of increasingly complete PaaS offerings and “web-scale” technologies this is rapidly changing.
SaaS Maturity Model Level 1
Back to pushing along the density and utilisation curve. Public IaaS has solved the “utility problem” element of the task: without any capital expense I can now near-instantly, 100% automate the provisioning of new customers. Using AWS (as an example) and CloudFormation / Elastic Beanstalk, and maybe a touch of Chef, each customer gets their own isolated deployment that can independently scale up and down. Almost any web application can fit into this model without much re-engineering and by normal standards this is quite far along the utilisation curve. Every month the AWS bill for each customer’s deployment is sent to them with some margin added for any additional services provided such as new features, application monitoring, support desk etc.
Common technologies such as Chef, Capistrano etc. allow us to efficiently keep this elastic fleet of servers running the latest configurations, patches and so forth. Each customer can customise their system and move to newer versions on their own schedule – inevitably the service becomes customised or integrated to other software services and not all meaningful changes are trivial and non-breaking!
This simple model uses traditional technologies and is little more than efficient application hosting. The upside is that it isn’t much of a mental leap and requires off the shelf components and skills.
But is that enough? Are there cost benefits to going deeper?
SaaS Maturity Model Level 2
What if most customers are not BIG customers whose demand is always consuming a bare minimum of 2 Virtual Machines? What if a huge number of customers can barely justify 5% of a single VM? The task is now to multi-tenant each VM. A simple model would be to install the application 32 times on one VM and 32 times on another and then load balance the two machines. Each customer application instance would consume 2.5% on each machine (making 5%) leading to a total of 80% utilisation of each VM. An Auto Scale Elastic Load Balancer could always provision some more machines if needs be for burst / failure occasions.
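The arithmetic in that example is worth spelling out, including what happens when one of the two machines fails. A small sketch, with demand expressed as a percentage of one VM; the function name vm_utilisation is just illustrative.

```python
def vm_utilisation(customers, demand_percent, vms):
    """Utilisation (%) of each VM when every customer is spread evenly over all VMs."""
    return customers * demand_percent / vms


# 32 customers, each needing 5% of a VM, installed on both of two machines.
normal = vm_utilisation(customers=32, demand_percent=5, vms=2)
# If one machine fails, the survivor would be asked for more than it has.
one_failed = vm_utilisation(customers=32, demand_percent=5, vms=1)
```

The second figure comes out above 100%, which is why the pair only works if auto scaling can add machines for burst and failure occasions.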
This is still a simple and easy-to-achieve model and feels a bit like that story about filling a bucket with stones of different sizes. You might only get one big stone in a bucket, but you could get 10 smaller stones and 20 pebbles instead. But what about sand that represents the long tail of customers. Would you install the application 1000 times on a VM? How do you know the difference between a stone, a pebble and a grain of sand in a self service world? Are you going to run a really sophisticated monitoring and analytics system to work out the best way to fill your buckets, adjusting it frequently?
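Filling buckets with stones and pebbles is the classic bin packing problem. One common heuristic is first-fit decreasing: sort the demands largest first and put each into the first VM with room. The sketch below assumes each customer's demand is known as a percentage of one VM, which is precisely the information that is hard to get in a self service world; the demand figures are invented.

```python
def first_fit_decreasing(demands, capacity):
    """Pack demands into bins of the given capacity, largest demand first."""
    bins = []
    for demand in sorted(demands, reverse=True):
        if demand > capacity:
            raise ValueError("a single demand exceeds a whole VM")
        for contents in bins:
            if sum(contents) + demand <= capacity:
                contents.append(demand)     # fits in an existing VM
                break
        else:
            bins.append([demand])           # nothing had room: start a new VM
    return bins


# Hypothetical customer demands as a percentage of one VM: stones, pebbles and sand.
demands = [60, 50, 30, 20, 20, 10, 5, 5]
vms = first_fit_decreasing(demands, capacity=100)
```

This is a heuristic, not an optimal packer, and it ignores the fact that demands change over time, which is where the frequent adjusting comes in.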
Still a simple model, and probably fit for purpose for a whole category of applications, workloads and customer segments. Importantly it relied mainly on IaaS automation and standard applications. However it has limitations and begins to lead back to many of the problems that face a utility provider – just at a different layer.
SaaS Maturity Model Level 3
An improvement would be to deploy the application once into a single VM image sat behind an Auto Scaling Elastic Load Balancer (ASELB). The single application would use a discriminator (such as a secure login token or URL) and this discriminator would flow all the way through the application. Even database access would, for example, be only achieved via database views. These views would be supplied with the discriminator to ensure that each application only ever worked with its own data set.
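To make the discriminator idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. It swaps the database views mentioned above for a small wrapper class that applies the discriminator to every query through a bound parameter; the table, the tenant names and the class TenantScopedOrders are all hypothetical.

```python
import sqlite3


class TenantScopedOrders:
    """Data access object that can only ever see one tenant's rows."""

    def __init__(self, conn, tenant_id):
        self._conn = conn
        self._tenant_id = tenant_id   # the discriminator, fixed at login

    def all(self):
        rows = self._conn.execute(
            "SELECT item FROM orders WHERE tenant_id = ? ORDER BY item",
            (self._tenant_id,),
        )
        return [item for (item,) in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (tenant_id TEXT NOT NULL, item TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("acme", "anvil"), ("acme", "rocket"), ("globex", "reactor")],
)

acme_orders = TenantScopedOrders(conn, "acme").all()
globex_orders = TenantScopedOrders(conn, "globex").all()
```

The point of the design is that application code never writes the tenant filter itself, so it cannot forget it; enforcing the same rule inside the database (views, or row-level security where available) is the stricter variant.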
A huge number of customers of all sizes would be simply balanced across a huge fleet of identical machines establishing very high levels of utilisation. It probably required some hefty application re-engineering to ensure tenant isolation and maybe even cryptography in some circumstances to ensure trespassing neighbours would only find encrypted data and only the customer supplies the key at login. It has to be said there aren’t many obvious pieces in the common application frameworks to help either – e.g. the discriminator concept being strictly enforced.
Whilst multi-tenant web applications can scale wonderfully the database comes into sharp focus. Elastic scaling up and down on-demand from a CPU, IO and storage perspective just isn’t a strong suit. Clearly some further re-engineering can shift the workload towards the application tier via techniques such as caching or data-fabrics that attempt to behave as in memory replicated databases. Perhaps the way the application works can be broken apart to employ asynchronous queuing in places to better level the load against the database. Either way, the classic database could continue to be a barrier.
SaaS Maturity Model Level 4
At level 3 we managed to have one super large database containing all data for all customers, using a discriminator to isolate the tenants. It is very likely that unrelated data entities were placed into separate databases. Instead of having one server scaling we could have several. It is also possible that analytic workloads were also hived off on to separate servers. Maybe we even used database read-replicas to take advantage of the application’s heavy read:write ratio. We probably also dug deep to modify the database transaction isolation levels and indexing. We maybe even separated out “search” into another subsystem of specialised servers (e.g. SOLR).
Sharding came next – attempting to split the database out even further. Maybe each customer got their own collection of databases, and each database server ran several databases. This is a bit like the Bucket analogy again – how do you balance the databases across servers? Some customers may have large amounts of data, but very few queries. Some may have little data but hundreds of users bashing away and running reports. Whilst this is a tough balancing act, it has a huge impact on your commercial model. What is expensive for you to serve? Data volumes? Reports? Writes? Accessing the product tables or the order history tables?
Whilst trying to balance all the customers and shift them around you are also trying to scale up and down with usage. Databases have very coarse controls, and shifting databases around and resizing them can mean downtime.
This is the sort of problem that the controlling fabric in SQL Azure solves and it is far from trivial!
Other partitioning schemes also exist such as customers A-F on one server, G-L on another etc. This too runs into issues of re-partitioning and rebalancing during growth/shrinkage and when hotspots arise. The classic old MySpace example is when Brad Pitt sets up a public page and all of a sudden the “B” partition attracts more heat than all the other partitions put together.
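A range partitioning scheme like this, and the hotspot it invites, can be sketched in a few lines. The four letter ranges and the request mix below are invented for illustration; the lookup uses the standard bisect module to find the range a key falls into.

```python
import bisect
from collections import Counter

# Hypothetical partitions: the upper bound of each letter range and its server name.
UPPER_BOUNDS = ["F", "L", "R", "Z"]
NAMES = ["A-F", "G-L", "M-R", "S-Z"]


def partition_for(key):
    """Return the partition whose letter range contains the key's first letter."""
    letter = key[0].upper()
    if not "A" <= letter <= "Z":
        raise ValueError("key must start with a letter A-Z")
    return NAMES[bisect.bisect_left(UPPER_BOUNDS, letter)]


# One celebrity page dominates the traffic; everyone else is quiet.
requests = ["brad"] * 90 + ["greta"] * 4 + ["maria"] * 3 + ["sam"] * 3
heat = Counter(partition_for(name) for name in requests)
hottest, hits = heat.most_common(1)[0]
```

Here one partition takes more requests than all the others put together, and splitting the range does not help when the heat comes from a single key.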
Level 4 is marked by exploring beyond the RDBMS as the only repository type. At this point it is key to thoroughly understand the data, access patterns, the application requirements and so forth. Are some elements better served by a graph database, a search technology, a key-value store, a triple store, a JSON store etc.? Thorough scrutiny and knowing the options are vital. When Facebook look to introduce a new capability they don’t just try and create a 3rd normal form schema and put it into Oracle. That thinking is no longer sufficient.
This means repositories that support scale-out, multiple-datacentre replication, eventual consistency, massively parallel analytics etc. This is an increasingly well trodden path and many of the technologies have come out of companies that have demonstrated they can solve this problem: Google, Facebook, Amazon, Microsoft etc. BigTable, SimpleDB, Azure Table Storage, MongoDB, Cassandra, HBase and Hadoop are some of the popular names.
SaaS Maturity Model Level 5
Level 4 saw some sophisticated re-engineering work to try and transform some regular technology to elastically scale with high utilisation levels at both the application and data layer and remain manageable!
The next level is probably the biggest discontinuity and probably doesn’t really exist yet as it is a blend of Google and Amazon. It takes a bold step and ignores the old notion of a Virtual Machine as a clunky interim solution. Applications don’t require operating systems, they require application platforms. Google’s App Engine is possibly the best example of an Application Platform. Even Azure as another popular PaaS makes it clear that you are renting Virtual Machines and that these are your units of scale, failure, management and billing. With GAE you pay for resource consumption – the number of CPU cycles your code executes – not how many VMs you have running and have desperately tried to fill up. This is like grinding up all the stones and pebbles and just pouring sand in to the bucket.
In order for this model to work the applications have to exhibit certain characteristics necessary for performance and scale, and the platform enforces it. For many this is a painful engineering discontinuity – and probably means a complete ground up application re-architecting.
PaaS provides the same fine-grained and utterly elastic data store capability too. The same goes for all the other subsystems such as queues, email, storage etc. The key point here is that each element of the application architecture is made up of Application Platform Services that have each solved the utility problem, are endlessly elastically scalable and charge for the value extracted (e.g. message sent, bytes stored, CPU cycles consumed not CPU cycles potentially used).
There is a reason people use IaaS as a foundation for building up PaaS, and a reason SaaS sits more comfortably atop PaaS. Whilst GAE is probably one of the most opinionated PaaS architectures and suits a narrow set of use cases today, no doubt it will improve over time. Amazon and Microsoft will also continue to abolish the Virtual Machine by providing an increasingly complete Application Platform e.g. Simple Queue Service, Simple Notification Service, SimpleDB, Simple Email Service, Route53. At the same time Facebook, Yahoo and others may continue to open source enough technologies for anyone to build their own!
Following my post yesterday where I explained the awkwardness of batch deletes with Google App Engine, AWS have just introduced BatchDeletes for SimpleDB! If only App Engine would release features as continuously as AWS.