Archive for May, 2009

As good as it gets…

Renowned DBMS leaders (including DeWitt and Stonebraker) just published a paper in which they contrast the DBMS magnum opus and the green-ish, increasingly popular MapReduce paradigm. This work will be presented at SIGMOD in a couple of months. Before then, you can get a sneak preview here.

Andrew Pavlo, Erik Paulson, Alexander Rasin, Daniel J. Abadi, David J. DeWitt, Samuel Madden, and Michael Stonebraker, “A Comparison of Approaches to Large-Scale Data Analysis,” in SIGMOD 2009: Proceedings of the 2009 ACM SIGMOD International Conference, July 2009 (Providence, RI)

Back in January 2008, DeWitt and Stonebraker made some waves with their op-ed titled “MapReduce, a major step backwards”. This new paper offers far more nuanced claims, with the benefit of empirical data.

Without venturing into oversimplifying such claims, I was struck by observations such as: “we were impressed by how easy Hadoop was to set up and use in comparison to the databases” and “extensibility was another area where we found the database systems we tested lacking”.

May a constructive tussle benefit both camps, as there seems to be work left on either side, regardless of how long a journey each has been on. Plus, there will be hybrid forms.

In practical terms, I expect that DBMS and MapReduce will continue to exhibit very different TCO models and thus will be quite easy to tell apart for a given use case (with the caveat that one’s own TCO model will be different).


WolframAlpha live – Numbers as a Service

I couldn’t wait to try Wolfram’s new creation. Today, it opened up to the public. In no time, I could figure out just how many days I have lived and could plot some pesky functions. Really cool. In the next iteration, I hope it will be possible to plug a query to this “computational knowledge engine” (their definition and TM) into the middle of a complex workflow, whether it’s a Map/Reduce one or even a venerable Unix pipe.

So, is this what first officer Mr. Spock queries at work? ;-)


Cloud Security Alliance’s Document

With the Security Guidance document, the newly formed Cloud Security Alliance is off to a solid start. I read the white paper with interest. I like to think that many focus areas for the CSA and the Cloud security community at large stem from one simply-stated root cause: Trust ain’t a transitive property.

Among other things, the document addresses the concerns about accountability that I had raised on this blog.

Some musings after reading the CSA document:

We have always built systems in observance of least privilege. What’s the actual least privilege for a Cloud provider? Let’s pick a provider of the IaaS persuasion. No root access to guest virtual machines. No root access to virtual load balancers, virtual switches and virtual firewalls. What else can be meaningfully taken out of a provider’s key chain, without compromising on site stability and service availability? Meanwhile, a Cloud user will do well with more than one line of defense. For one, I like what the Overshadow researchers are doing to protect application data in the event of OS compromise. It won’t make data impenetrable. It does make it a whole lot harder to get to, forcing a new round of cat & mouse chase.

The argument that in a Cloud one should know the neighbor harbors some fallacies. Knowledge does not imply control. Yet, it’s tempting to blur this line. For example, a false sense of security sets in among some engineers using a Cloud: the belief that they have some deterministic control over resource sharing with neighboring Cloud tenants. Some cubicles away, the procurement/legal colleagues who negotiated that Cloud agreement know all too well that they have neither control nor leverage. In this example, Cloud tenants change just like weather does (uhm, maybe the “Cloud” moniker isn’t a bad choice after all!).

Naturally, personally identifiable information (PII) is a defining embodiment of data worth securing against foes. This should not detract from other, more nuanced data types. Take business meta-data, for example. The correlation between a Cloud customer’s feature roll-out and the resulting traffic surge (or the lack thereof) goes a long way towards revealing strategy, tactics, and competitive stance. Typically, it leads to information (analytics) that the Cloud customer would want to control and keep close to the vest. Would a Cloud provider’s routine telemetry dole out precious insights on a Cloud customer’s business trajectory, and who would have access to this information at the Cloud provider’s end?

I look forward to seeing CSA’s membership grow. Also, I will be interested to track whether CSA will codify best practices and take a stance on specific technology nuggets like the increasingly popular OAuth.
