Archive for Papers

Cloud: Dark Side, Good Side

I found some good food for thought in Dave Durkee’s article “Why Cloud Computing Will Never Be Free” in the May issue of Communications of the ACM (did I ever say how much I enjoy reading CACM lately?).

Competitive price pressures threaten to drag Cloud service quality into a downward spiral. Those consumers who nickel-and-dime the commodity goods served by the Cloud (e.g., computing cores or gigabytes of storage) will get a taste of their own medicine, as Cloud providers nickel-and-dime them right back, or worse. The supply-side “shop of horrors” that Durkee documents for the Cloud is a scary one:

  • Murky pricing models or time commitments
  • Grossly oversubscribed resources
  • Silent traffic engineering downgrades (e.g., from 10Mbit/s down to 1Mbit/s)
  • Recycling of failed disk drives
  • Unattainable service uptimes (say 4 or 5 nines) with lenient penalty clauses (say a 10% discount) whose only purpose is to give the provider bragging rights on those many nines
  • Undefined performance auditing benchmarks
  • SLA miss refunds that stretch out a customer’s time commitment (e.g., annualized refunds only)

These syndromes spiral the Cloud downward and set it further apart from an enterprise’s needs.

Service level management (SLM) is the key to reversing the direction and pulling the Cloud up to enterprise-class service. Enterprises will want to translate their business imperatives into service level objectives (SLOs), use them in the SLA negotiation with the Cloud provider, then monitor hits and misses. But SLM is no small feat. At either end of the service demarc line, there has to be complex logic to manage those SLOs and apply the proper compensating actions upon exception. It’s no surprise that the crown jewels of companies like FedEx or the Santa Fe Railway are the SLM logic that maximizes shareholder value out of the least amount of commodity resources (Fred Smith invented neither new airports nor new airplanes).
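To make the SLM point concrete, here is a minimal sketch (my own, with hypothetical names and a made-up three-nines target) of the kind of hit/miss bookkeeping that has to live at the customer’s end of the demarc:

```java
// Minimal sketch of client-side SLO tracking; the 99.9% target and
// the notion of a "compensating action" are illustrative placeholders.
public class SloMonitor {
    private final double targetAvailability; // e.g. 0.999 ("three nines")
    private long hits = 0;
    private long misses = 0;

    public SloMonitor(double targetAvailability) {
        this.targetAvailability = targetAvailability;
    }

    public void record(boolean requestMetObjective) {
        if (requestMetObjective) hits++; else misses++;
    }

    public double attainment() {
        long total = hits + misses;
        return total == 0 ? 1.0 : (double) hits / total;
    }

    // True when the provider has drifted below the SLO and a compensating
    // action (failover, renegotiation, refund claim) is due.
    public boolean breached() {
        return attainment() < targetAvailability;
    }

    public static void main(String[] args) {
        SloMonitor monitor = new SloMonitor(0.999);
        for (int i = 0; i < 999; i++) monitor.record(true);
        monitor.record(false);
        monitor.record(false); // 999 hits out of 1001 ≈ 99.8% < 99.9%
        System.out.println(monitor.breached()); // true
    }
}
```

Trivial as it looks, everything hard about SLM hides behind that last method: deciding which compensating action to take, and proving the miss to the provider.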

These aspects didn’t go unnoticed by the Utility Computing camp (UC being the fad just before Cloud). They standardized a protocol, WS-Agreement, to manage an SLA throughout its lifecycle. May some of that experience be leveraged in the new world.



Cloud pulls crypto agendas

What a great monthly publication CACM is. In the 15 years that I’ve been a member of the ACM, this must be the time that I’m getting the most out of CACM (now in soft copy as well, for extra convenience). In recent issues, CACM has featured interesting crypto papers with a Cloud spin.

In the March issue, I dug into Craig Gentry’s paper on homomorphic encryption. In today’s Clouds, we cannot separate delegation of processing from delegation of cleartext access. Enter homomorphic crypto and, voilà, we no longer need to question a Cloud provider’s aptitude for handling sensitive information. With this crypto, one can tap off-the-shelf public compute resources to do the Navier-Stokes for a new wing, or process the interception tracks from some military sightings, without ever revealing a thing. In practice, however, I doubt that there are that many Cloud use cases begging for homomorphic crypto … once I take away those that belong in private Clouds anyhow (e.g., for SLA reasons) and those that can simply be dealt with via anonymization (e.g., for medical records), tokenization (e.g., for select PII elements), or simple tests for equality (for which standard crypto suffices). Regardless, this is one of those jaw-dropping results well worthy of a you-must-be-kidding-me reaction. I give Gentry plenty of kudos for making his material highly accessible and engaging. In the pile of security papers that I have read over the years, Alice has never looked so good and crafty!
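For the curious, the flavor of homomorphic computation is easy to demo with textbook math. The toy below (my own sketch, emphatically not Gentry’s scheme, and never to be used as-is) exploits the well-known fact that unpadded RSA is multiplicatively homomorphic: the product of two ciphertexts decrypts to the product of the plaintexts. Gentry’s breakthrough is doing this for arbitrary computations, not just one multiply:

```java
import java.math.BigInteger;

// Toy demo: textbook (unpadded) RSA with tiny parameters is
// multiplicatively homomorphic. The "Cloud" can multiply the two
// ciphertexts without ever seeing the plaintexts. Insecure by design;
// illustration only.
public class HomomorphicToy {
    static final BigInteger n = BigInteger.valueOf(61 * 53);   // 3233
    static final BigInteger e = BigInteger.valueOf(17);        // public exponent
    static final BigInteger d = BigInteger.valueOf(2753);      // private exponent

    static BigInteger encrypt(BigInteger m) { return m.modPow(e, n); }
    static BigInteger decrypt(BigInteger c) { return c.modPow(d, n); }

    public static void main(String[] args) {
        BigInteger m1 = BigInteger.valueOf(7);
        BigInteger m2 = BigInteger.valueOf(6);
        // Performed "in the Cloud", on ciphertexts only:
        BigInteger product = encrypt(m1).multiply(encrypt(m2)).mod(n);
        // Back home, the key holder recovers 7 * 6:
        System.out.println(decrypt(product)); // 42
    }
}
```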

In the April issue, I’m reading Sergey Yekhanin’s article on crypto protocols that protect the privacy of queries to public databases. It’s not an identity challenge. Rather, it’s about disguising the intention of a query or a set of queries. In the age of real-time analytics, it’s not far-fetched that a database provider or a data aggregator in the Cloud manages to detect, and then leverage, mounting interest in a particular topic. To counter that, the discipline of private information retrieval makes it hard or impossible to infer a subject’s intention, at the expense of some communication and/or data overhead.
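The classic entry point into this discipline is a two-server scheme in the style of Chor et al.: replicate the database at two non-colluding servers, send each a random-looking index mask, and XOR the two answers. Each server sees a mask that is statistically independent of the index of interest. A toy sketch (my own; record and database sizes are illustrative):

```java
import java.util.Random;

// Toy two-server private information retrieval. Each of the two
// non-colluding servers holds a full copy of the database and sees only
// a uniformly random subset of indices, yet the client recovers exactly
// the record it wanted by XORing the two answers.
public class TwoServerPir {
    // Server side: XOR together the records selected by the mask.
    static byte serverAnswer(byte[] db, boolean[] mask) {
        byte acc = 0;
        for (int j = 0; j < db.length; j++) {
            if (mask[j]) acc ^= db[j];
        }
        return acc;
    }

    // Client side: fetch db[i] without either server learning i.
    static byte fetch(byte[] dbCopyA, byte[] dbCopyB, int i, Random rng) {
        int n = dbCopyA.length;
        boolean[] maskA = new boolean[n];
        for (int j = 0; j < n; j++) maskA[j] = rng.nextBoolean();
        boolean[] maskB = maskA.clone();
        maskB[i] = !maskB[i];   // the only difference between masks encodes i
        return (byte) (serverAnswer(dbCopyA, maskA)
                     ^ serverAnswer(dbCopyB, maskB));
    }

    public static void main(String[] args) {
        byte[] db = {10, 20, 30, 40, 50};
        System.out.println(fetch(db, db, 3, new Random())); // 40
    }
}
```

The overhead Yekhanin’s article is about shows up even here: the queries are as long as the database, and the whole trick collapses if the two servers collude.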

In both cases, I’m eager to see how these research results will be reduced to practice. The Cloud can dress up as a transformational technology capable of pulling through some powerful ideas.


Teach programming to your littl’ digital natives

In my monthly CACM issue, I found a delightful and somewhat unusual article on “Scratch”. With Scratch, Mitch Resnick et al. at the MIT Media Lab have created a programming environment with the lowest up-front investment for children and teenagers. As you would expect in a platform that speaks to digital natives, Scratch comes with a host of rich media and social networking components built in.

My children love Scratch. They were able to program in Scratch and do things that appealed to them from the very first session. I like them to spend time with Scratch because it lifts the curtain on how computer games and digital entertainment work. It stimulates their creativity and a can-do attitude towards technology.

In the mid ’90s, I had the fortune of meeting Mitch Resnick at the Media Lab. My company back then was a top-tier sponsor. I saw the first prototypes of what became Lego Mindstorms (whose programming user experience planted the early seeds of Scratch). It’s fascinating how Resnick repeatedly gets it. He might as well be the Steve Jobs of children’s computer-human interfaces.


Cores’ spread raises bar in concurrency

Over the last few quarters, I have spent much time developing the case (ROI, TCO, etc.) for the latest multi-core processors and their yield, measured in transactions/$ and transactions/watt.

Flashback. ’Twas the end of the ’80s and I was a junior engineer hard at work getting a 4-way 68020 SMP Unix box to perform reasonably well by placing locks in a recalcitrant SVR2.4 kernel. David Cheriton (or was it AST?) quipped that one could either work all-nighters for 18 months to figure out all the locks, or else go to the beach for just as long, come back, and expeditiously plug the CPU du jour into a uniprocessor, with a huge gain over the SMPs built from yesteryear’s silicon. This figurative view of Moore’s law hit home. I went on to find some new challenges (note: microkernels; no beach).

Fast forward twenty years, and we have hit our heads on the ceilings of clock frequency and gate density. We have no choice left but to run a multi-socket, multi-core setup flat out. The superior CPU horsepower and memory hierarchy quickly surface the concurrency shortcomings in our code. The performance line tops off and then turns south.

So, let’s take concurrency head on. My colleagues recently went to JavaOne and gave a good, well-received rundown of their lessons learned in Java concurrency, distilled into some practical patterns and anti-patterns. Do try them at home!

Sangjin Lee (eBay), Debashis Saha (eBay), and Mahesh Somani (eBay), “Robust and Scalable Concurrent Programming: Lessons from the Trenches”. Here’s a before/after flashcard gleaned from their presentation. The full presentation is up for free download here.
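Since I can’t paste their slides here, a generic before/after in the same spirit (my own example, not taken from their deck) is the coarse synchronized counter versus a lock-free atomic:

```java
import java.util.concurrent.atomic.AtomicLong;

// Before/after flashcard, generic edition: a hot counter guarded by one
// monitor serializes every caller; an AtomicLong uses compare-and-swap
// and scales far better across cores.
public class CounterPatterns {
    // Anti-pattern: every increment contends for the same monitor.
    static class SyncCounter {
        private long value = 0;
        synchronized void increment() { value++; }
        synchronized long get() { return value; }
    }

    // Pattern: lock-free compare-and-swap, no blocking under contention.
    static class AtomicCounter {
        private final AtomicLong value = new AtomicLong();
        void increment() { value.incrementAndGet(); }
        long get() { return value.get(); }
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicCounter counter = new AtomicCounter();
        Thread[] threads = new Thread[4];
        for (int t = 0; t < threads.length; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 100_000; i++) counter.increment();
            });
            threads[t].start();
        }
        for (Thread t : threads) t.join();
        System.out.println(counter.get()); // 400000, with no lost updates
    }
}
```

Both versions are correct; the difference only shows up when many cores hammer the counter at once, which is exactly the regime the new silicon puts us in.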

There’s another side to this story: the memory wall. It’s just as important to single out and rework those constructs that get in the way of L2/L3 cache efficiency, like HashMaps and the traversals of linked lists. Furthermore, we’d like to have a systemic way to manage and leverage any NUMA-ness in our systems.
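A quick sketch of the memory-wall point (my own, with illustrative sizes): the same summation over a primitive array versus a linked list of boxed Integers. The arithmetic is identical; the memory traffic is not, because the array streams sequentially through the caches while the list chases a pointer (and a boxed object) per element:

```java
import java.util.LinkedList;
import java.util.List;

// Same sum, very different cache behavior: the int[] is contiguous and
// prefetch-friendly; the LinkedList scatters nodes and boxed Integers
// across the heap, costing a dependent load per element. Timings vary
// by machine, so this sketch only shows the structural difference.
public class CacheFriendliness {
    static long sumArray(int[] data) {
        long sum = 0;
        for (int v : data) sum += v;   // sequential walk through one block
        return sum;
    }

    static long sumLinkedList(List<Integer> data) {
        long sum = 0;
        for (int v : data) sum += v;   // pointer chase + unboxing per node
        return sum;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        int[] arr = new int[n];
        List<Integer> list = new LinkedList<>();
        for (int i = 0; i < n; i++) { arr[i] = i; list.add(i); }
        System.out.println(sumArray(arr) == sumLinkedList(list)); // true
    }
}
```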

I list hereafter the topics that I’m highly interested in and will be following:

  • Post core-spread principles for kernel re-design, like Robert Morris’ Corey that I profiled earlier on; I anticipate that this year’s SOSP will feature quite a few papers in this space;
  • Java-only production stacks, for which there is (at least) one layer too many among hypervisor, kernel, and JVM, and which beg for due simplification;
  • Machine-learning techniques to manage the combinatorial explosion of configuration knobs-and-dials and their inter-dependencies, like Ganapathi’s HotPar09 paper;
  • Transactional memory (I read a good article by Drepper in the Feb issue of CACM);
  • Access to all hardware counters that can inform tuning (you can’t manage what you can’t measure);
  • Share-nothing languages like Scala actors or the re-discovered Erlang (which dates back to just about the same time as my flashback in the opening).

Some interesting times, for sure!


As good as it gets…

Renowned DBMS leaders (including DeWitt and Stonebraker) have just published a paper in which they contrast the DBMS magnum opus with the green-ish, increasingly popular MapReduce paradigm. This work will be presented at SIGMOD in a couple of months. Before then, you can get a sneak preview here.

Andrew Pavlo, Erik Paulson, Alexander Rasin, Daniel J. Abadi, David J. DeWitt, Samuel Madden, and Michael Stonebraker, “A Comparison of Approaches to Large-Scale Data Analysis,” in SIGMOD 2009: Proceedings of the 2009 ACM SIGMOD International Conference, July 2009 (Providence, RI).

Back in January 2008, DeWitt and Stonebraker made some waves with their op-ed titled “MapReduce: A major step backwards”. This new paper offers far more nuanced claims, with the benefit of empirical data.

Without venturing into oversimplifying such claims, I was struck by observations such as: “we were impressed by how easy Hadoop was to set up and use in comparison to the databases” and “extensibility was another area where we found the database systems we tested lacking”.
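For readers new to the paradigm under comparison, here is the canonical MapReduce example, word count, collapsed into a single JVM just to show the shape of the computation (my own sketch; Hadoop would shard the map and reduce phases across a cluster):

```java
import java.util.HashMap;
import java.util.Map;

// Word count, the "hello world" of MapReduce, in one JVM: the map step
// emits (word, 1) pairs and the reduce step sums the counts per key.
// A DBMS would express the same computation as
// SELECT word, COUNT(*) FROM docs GROUP BY word.
public class WordCountSketch {
    static Map<String, Integer> wordCount(String[] documents) {
        Map<String, Integer> counts = new HashMap<>();
        for (String doc : documents) {                       // map phase input
            for (String word : doc.toLowerCase().split("\\s+")) {
                if (word.isEmpty()) continue;
                counts.merge(word, 1, Integer::sum);         // reduce by key
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] docs = {"the quick brown fox", "the lazy dog", "the fox"};
        System.out.println(wordCount(docs).get("the")); // 3
    }
}
```

The interesting part of the SIGMOD paper is precisely what this sketch hides: on a real cluster, the loading, partitioning, and shuffling around this little loop dominate the comparison.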

May a constructive tussle benefit both camps, as there seems to be work left on either side, regardless of how long a journey each has been on. Plus, there will be hybrid forms.

In practical terms, I expect that DBMS and MapReduce will continue to exhibit very different TCO models and thus will be quite easy to tell apart for a given use case (with the caveat that everyone’s own TCO model will be different).


Cloud Security Alliance’s Document

With the Security Guidance document, the newly formed Cloud Security Alliance is off to a solid start. I read the white paper with interest. I like to think that many focus areas for the CSA and the Cloud security community at large stem from one simply-stated root cause: Trust ain’t a transitive property.

Among other things, the document addresses the concerns on accountability that I had raised on this blog.

Some musings after reading the CSA document:

We have always built systems in observance of least privilege. What is the actual least privilege for a Cloud provider? Let’s pick a provider of the IaaS persuasion. No root access to guest virtual machines. No root access to virtual load balancers, virtual switches, and virtual firewalls. What else can be meaningfully taken out of a provider’s key chain without compromising site stability and service availability? Meanwhile, a Cloud user will do well with more than one line of defense. For one, I like what the Overshadow researchers are doing to protect application data in the event of OS compromise. It won’t make data impenetrable. It does make it a whole lot harder to get to, forcing a new round of the cat-and-mouse chase.

The argument that in a Cloud one should know one’s neighbor bears some fallacies. Knowledge does not imply control. Yet it’s tempting to blur this line. For example, a false sense of security sets in among some engineers using a Cloud – that they have some deterministic control over resource sharing with neighboring Cloud tenants. Some cubicles away, the procurement/legal colleagues who negotiated that Cloud agreement know all too well that they have no control or leverage. In this example, Cloud tenants change just like the weather does (hmm, maybe the “Cloud” moniker isn’t a bad choice after all!).

Naturally, personally identifiable information (PII) is a defining embodiment of data worth securing against foes. This should not detract from other, more nuanced data types. Take business metadata, for example. The correlation between a Cloud customer’s feature roll-out and the resulting traffic surge (or lack thereof) goes a long way towards revealing strategy, tactics, and competitive stance. Typically, it leads to information (analytics) that the Cloud customer would want to control and keep close to the vest. Would a Cloud provider’s routine telemetry dole out precious insights into a Cloud customer’s business trajectory, and who would have access to this information at the Cloud provider’s end?

I look forward to seeing CSA’s membership grow. Also, I will be interested to track whether CSA will codify best practices and take a stance on specific technology nuggets like the increasingly popular OAuth.


LADiS proceedings on-line, w/ summary paper

Earlier on, I wrote about the workshop on Large-scale Distributed Systems and Middleware (LADiS 2008) that I attended and how much I enjoyed it.

In fact, I’ve joined some esteemed colleagues and co-authored a paper that summarizes the key thoughts and discussion topics that we heard at this event. It has now been published along with the revised version of the material that was originally delivered at LADiS.

The BibTeX for the summary paper is as follows:

@inproceedings{ladis08-summary,
  author = {van Renesse, Robbert and Rodrigues, Rodrigo and Spreitzer, Mike and Stewart, Christopher and Terry, Doug and Travostino, Franco},
  title = {Challenges facing tomorrow's datacenter: summary of the {LADIS} workshop},
  booktitle = {LADIS '08: Proceedings of the 2nd Workshop on Large-Scale Distributed Systems and Middleware},
  year = {2008},
  isbn = {978-1-60558-296-2},
  pages = {1--7},
  location = {Yorktown Heights, New York},
  doi = {},
  publisher = {ACM},
  address = {New York, NY, USA},
}

The LADiS workshop has now “graduated” into a recurring event that in 2009 will be held jointly with the 22nd ACM Symposium on Operating Systems Principles (SOSP 2009). I’m on the Technical Program Committee and will be a strong advocate for the event, starting from this blog.


New directions in datacenter switching

I read a couple of highly intriguing research papers on next-generation datacenter switching: Monsoon and SEATTLE. They develop a Cloud-friendly view wherein a large-scale datacenter features:

  • Flat plug-and-play addressing eliminating any server fragmentation
  • High bisection bandwidth
  • VM enablement
  • Cost efficiencies at large scale

by way of:

  • A huge, STP-free L2 domain with up to ~10^5 servers in it
  • IP presence limited to connecting the datacenter to the Internet
  • Custom control plane and/or custom DHTs

I buy the spirit of these new-wave requirements, albeit with some caveats.

I will work hard to drive requirements strictly top-down, from my applications and their own modus operandi down to the network, before I sign a blank check for an anyone-to-anyone dynamic network environment. As a case in point, let’s assume that I have a design pattern by which my applications are either stateless or have their state fully externalized (in fact, that’s one of the design principles at eBay). From this, I derive that I will not live-migrate virtualized application instances and will use simple create/destroy semantics instead. If I don’t have to worry about live migration, my network closet and my associated processes begin to look a whole lot simpler. [This is quite something to admit for one who set a live-migration benchmark back in 2005!]

In a Cloud provider scenario, the top end does look open to any application style and its opposite. Is it really so, and do we need to be all-inclusive? I believe that we can still handily contain the requirements posed to the network by thinking in terms of abstracted tiers (each tier being what is horizontally scaled for the customer, independently). Furthermore, as we look up the chain, the various PaaS stipulations provide a host of cues in terms of partitionable, directional, tiered workloads.

Lastly, for these ideas to be operationalized at scale, the new control plane(s) will need to earn quite some trust, just like any other foundational piece. After all these years, we are still very scared of STP flaps and their turning into a SPOF for the datacenter.

I enjoyed reading these papers and am grateful for their out-of-the-box, stimulating thoughts.

Is there a rose without thorns, an Ethernet without STP?


Corey: an operating system for many cores

I came across this excellent OSDI ’08 paper, Corey, by Robert Morris and team at MIT. They look into the widening gap between traditional system software and many-core hardware. Their approach is to zero in on needlessly shared kernel fixtures and to seek out the application’s participation, so that the application sanctions what really needs to be shared, and amongst which things, no more and no less.

The problem statement and the solution scope fully resonate with me. We are actively moving from 8-core to 16-core servers and are stumbling on precisely these issues. These days, I keep repeating to myself that “what got you here won’t get you there”. More on this journey in upcoming blog entries.

We often say that a picture is worth a thousand words. Their Figure 2 is just brilliant. It really puts a finger on the disparity in memory access timings among the 16 cores.
