Archive for Papers

Teach programming to your littl’ digital natives

In my monthly CACM issue, I found a delightful and somewhat unusual article on “Scratch”. With Scratch, Mitch Resnick et al. at the MIT Media Lab have created a programming environment with the lowest up-front investment for children and teenagers. As you would expect of a platform that speaks to digital natives, Scratch comes with a host of rich media and social networking components built in.

My children love Scratch. They were able to program in Scratch and do things that appealed to them from the very first session. I like them to spend time with Scratch because it lifts the curtain on how computer games and digital entertainment work. It stimulates their creativity and a can-do attitude towards technology.

In the mid ’90s, I had the good fortune to meet Mitch Resnick at the Media Lab. My company back then was a top-tier sponsor. I saw the first prototypes of what became Lego Mindstorms (whose programming user experience planted the early seeds for Scratch). It’s fascinating how Resnick repeatedly gets it. He might as well be the Steve Jobs of human-computer interfaces for the under-age set.


Cores’ spread raises bar in concurrency

Over the last few quarters, I spent much time developing the case (ROI, TCO, etc.) for the latest multi-core processors and their yield, measured in transactions/$ and transactions/watt.
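To make the two yield metrics concrete, here is a minimal sketch of the arithmetic. The `ServerProfile` type and every number in it are made up for illustration; a real TCO model carries many more terms than acquisition cost and power draw.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerProfile:
    """Hypothetical figures for one server configuration."""
    name: str
    transactions_per_sec: float  # sustained throughput under load
    cost_usd: float              # acquisition cost only (an assumption)
    power_watts: float           # average draw under that same load

    def transactions_per_dollar(self) -> float:
        return self.transactions_per_sec / self.cost_usd

    def transactions_per_watt(self) -> float:
        return self.transactions_per_sec / self.power_watts


# Made-up numbers, not measurements.
current = ServerProfile("current", 4000.0, 5000.0, 400.0)
candidate = ServerProfile("candidate", 7000.0, 7000.0, 500.0)

for server in (current, candidate):
    print(f"{server.name}: "
          f"{server.transactions_per_dollar():.2f} tx/s per $, "
          f"{server.transactions_per_watt():.2f} tx/s per W")
```

With these invented inputs the candidate wins on both metrics; the point of the exercise is that either ratio can move independently of the other.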

Flashback. ‘Twas the end of the ’80s and I was a junior engineer hard at work to get a 4-way 68020 SMP Unix box to perform reasonably well by placing locks in a recalcitrant SVR2.4 kernel. David Cheriton (or was it AST?) quipped that one could either pull all-nighters for 18 months to figure out all the locks, or else go to the beach for just as long, come back, and expeditiously plug the CPU du jour into a uniprocessor with a huge gain over the SMPs built with yesteryear’s silicon. This figurative view of Moore’s law hit home. I went on to find some new challenges (note: microkernels; no beach).

Fast forward twenty years, and we have hit our heads on the ceilings of clock frequency and gate density. We have no choice left but to run a multi-socket, multi-core setup flat out. The superior CPU horsepower and memory hierarchy quickly surface the concurrency shortcomings in our code. The performance line tops off and then turns south.

So, let’s tackle concurrency head-on. My colleagues recently went to JavaOne and gave a good, well-received rundown of their lessons learned in Java concurrency, resulting in some practical patterns and anti-patterns. Do try them at home!

Sangjin Lee (eBay), Debashis Saha (eBay), Mahesh Somani (eBay), “Robust and Scalable Concurrent Programming: Lessons from the Trenches”. Here’s a before/after flashcard gleaned from their presentation. The full presentation is up for free download here.

[Image: before/after flashcard from the JavaOne presentation]

There’s another side to this story: the memory wall. It’s just as important to single out and rework those constructs that get in the way of L2/L3 cache efficiency, like HashMaps and the traversals of linked lists. Furthermore, we would like to have a systemic way to manage and leverage any NUMA-ness in our systems.

Here are the topics that I’m highly interested in and will be following:

  • Post core-spread principles for kernel re-design, like Robert Morris’ Corey that I profiled earlier on; I anticipate that this year’s SOSP will feature quite a few papers in this space;
  • Java-only production stacks, which have (at least) one layer too many between hypervisor, kernel, and JVM, and which beg for simplification;
  • Machine-learning techniques to manage the combinatorial explosion of configuration knobs-and-dials and their inter-dependencies, like Ganapathi’s HotPar09 paper;
  • Transactional memory (I read a good article by Drepper in the Feb issue of CACM);
  • Access to all hardware counters that can inform tuning (you can’t manage what you can’t measure);
  • Shared-nothing languages like Scala actors or the re-discovered Erlang (which dates back to just about the same time as my flashback in the opening).
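For readers new to the share-nothing style in the last bullet, here is a minimal sketch in Python rather than Scala or Erlang. The `CounterActor` class and its message names are hypothetical. The idea it illustrates: the state belongs to one thread only, and everyone else talks to it through a mailbox, so no lock guards the counter itself.

```python
import queue
import threading


class CounterActor:
    """Owns its state; the outside world can only send it messages."""

    def __init__(self):
        self._mailbox = queue.Queue()
        self._count = 0  # touched only by the actor's own thread
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message, reply_to=None):
        self._mailbox.put((message, reply_to))

    def _run(self):
        # Messages are processed one at a time, in arrival order.
        while True:
            message, reply_to = self._mailbox.get()
            if message == "increment":
                self._count += 1
            elif message == "get":
                reply_to.put(self._count)
            elif message == "stop":
                return


def hammer(actor, n):
    for _ in range(n):
        actor.send("increment")


actor = CounterActor()
senders = [threading.Thread(target=hammer, args=(actor, 1000)) for _ in range(4)]
for t in senders:
    t.start()
for t in senders:
    t.join()

# All increments were enqueued before this request, so the reply reflects them.
reply = queue.Queue()
actor.send("get", reply)
total = reply.get(timeout=5)
actor.send("stop")
print(total)
```

Four senders race to post messages, yet the count comes out exact because only the actor's thread ever mutates it. The contention moves into the mailbox, which is a far smaller surface to get right.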

Some interesting times for sure!!!


As good as it gets…

Renowned DBMS leaders (including DeWitt and Stonebraker) just published a paper in which they contrast the DBMS magnum opus and the green-ish, increasingly popular MapReduce paradigm. This work will be presented at SIGMOD in a couple of months. Before then, you can get a sneak preview here.

Andrew Pavlo, Erik Paulson, Alexander Rasin, Daniel J. Abadi, David J. DeWitt, Samuel Madden, and Michael Stonebraker, “A Comparison of Approaches to Large-Scale Data Analysis,” in SIGMOD 2009: Proceedings of the 2009 ACM SIGMOD International Conference, July 2009 (Providence, RI)

Back in January 2008, DeWitt and Stonebraker made some waves with their op-ed titled “MapReduce, a major step backwards”. This new paper offers far more nuanced claims, with the benefit of empirical data.

Without venturing into oversimplifying such claims, I was struck by observations such as: “we were impressed by how easy Hadoop was to set up and use in comparison to the databases” and “extensibility was another area where we found the database systems we tested lacking”.

May a constructive tussle benefit both camps, as there seems to be work left on either side, regardless of how long a journey each has been on. Plus, there will be hybrid forms.

In practical terms, I expect that DBMS and MapReduce will continue to exhibit very different TCO models and thus will be quite easy to set apart for a given use case (with the caveat that one’s own TCO model will be different).


Cloud Security Alliance’s Document

With the Security Guidance document, the newly formed Cloud Security Alliance is off to a solid start. I read the white paper with interest. I like to think that many focus areas for the CSA and the Cloud security community at large stem from one simply-stated root cause: Trust ain’t a transitive property.

Among other things, the document addresses the concerns about accountability that I had raised on this blog.

Some musings after reading the CSA document:

We have always built systems in observance of least privilege. What’s the actual least privilege for a Cloud provider? Let’s pick a provider of the IaaS persuasion. No root access to guest virtual machines. No root access to virtual load balancers, virtual switches and virtual firewalls. What else can be meaningfully taken out of a provider’s key chain, without compromising on site stability and service availability? Meanwhile, a Cloud user will do well with more than one line of defense. For one, I like what the Overshadow researchers are doing to protect application data in the event of OS compromise. It won’t make data impenetrable. It does make it a whole lot harder to get to, forcing a new round of cat & mouse chase.

The argument that in a Cloud one should know one’s neighbors harbors some fallacies. Knowledge does not imply control. Yet, it’s tempting to blur this line. For example, a false sense of security sets in among some engineers using a Cloud: the belief that they have some deterministic control over resource sharing with neighboring Cloud tenants. Some cubicles away, the procurement/legal colleagues who negotiated that Cloud agreement know all too well that they have neither control nor leverage. In this example, Cloud tenants change just like the weather does (uhm, maybe the “Cloud” moniker isn’t a bad choice after all!).

Naturally, personally identifiable information (PII) is a defining embodiment of data worth securing against foes. This should not detract from other, more nuanced data types. Take business meta-data, for example. The correlation between a Cloud customer’s feature roll-out and the resulting traffic surge (or the lack thereof) goes a long way towards revealing strategy, tactics, and competitive stance. Typically, it leads to information (analytics) that the Cloud customer would want to control and keep close to its vest. Would a Cloud provider’s routine telemetry dole out precious insights on a Cloud customer’s business trajectory, and who would have access to this information at the Cloud provider’s end?

I look forward to seeing CSA’s membership grow. Also, I will be interested to track whether CSA will codify best practices and take a stance on specific technology nuggets like the increasingly popular OAuth.


LADiS proceedings on-line, w/ summary paper

Earlier on, I wrote about the workshop on Large-scale Distributed Systems and Middleware (LADiS 2008) that I attended and how much I enjoyed it.

In fact, I’ve joined some esteemed colleagues and co-authored a paper that summarizes the key thoughts and discussion topics that we heard at this event. It has now been published along with the revised version of the material that was originally delivered at LADiS.

The bibtex for the summary paper is as follows:

@inproceedings{1529976,
author = {van Renesse, Robbert and Rodrigues, Rodrigo and Spreitzer, Mike and Stewart, Christopher and Terry, Doug and Travostino, Franco},
title = {Challenges facing tomorrow’s datacenter: summary of the LADiS workshop},
booktitle = {LADIS ‘08: Proceedings of the 2nd Workshop on Large-Scale Distributed Systems and Middleware},
year = {2008},
isbn = {978-1-60558-296-2},
pages = {1--7},
location = {Yorktown Heights, New York},
doi = {http://doi.acm.org/10.1145/1529974.1529976},
publisher = {ACM},
address = {New York, NY, USA},
}

The LADiS workshop has now “graduated” into a recurring event that in 2009 will be jointly held with the 22nd ACM Symposium on Operating Systems Principles (SOSP 2009). I’m part of the Technical Program Committee and will be a strong advocate for the event, starting from this blog.


New directions in datacenter switching

I read a couple of highly intriguing research papers on next-generation datacenter switching, Monsoon and SEATTLE. They develop a Cloud-friendly view wherein a large-scale datacenter features:

  • Flat plug-and-play addressing eliminating any server fragmentation
  • High bisection bandwidth
  • VM enablement
  • Cost efficiencies at large scale

by way of:

  • Huge and STP-free L2 domain with up to ~10^5 servers in it
  • IP presence limited to connecting the datacenter to the Internet
  • Custom control plane and/or custom DHTs
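The “custom DHTs” item can be pictured with a toy consistent-hashing directory: each host’s flat address hashes to one switch, which then holds the record of where that host is attached. This is only a sketch of the general technique and not the protocol of either paper; the `LocationDirectory` class, the switch names, and the MAC address are all hypothetical.

```python
import bisect
import hashlib


def _ring_position(key: str) -> int:
    """Map a string to a point on a 64-bit hash ring."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class LocationDirectory:
    """Toy DHT: spreads host-location records across switches."""

    def __init__(self, switches):
        self._ring = sorted((_ring_position(s), s) for s in switches)
        self._points = [point for point, _ in self._ring]
        self._tables = {s: {} for s in switches}  # per-switch record store

    def resolver_for(self, mac: str) -> str:
        # First switch clockwise from the address's ring position.
        i = bisect.bisect_right(self._points, _ring_position(mac)) % len(self._ring)
        return self._ring[i][1]

    def publish(self, mac: str, attached_switch: str) -> None:
        self._tables[self.resolver_for(mac)][mac] = attached_switch

    def lookup(self, mac: str):
        return self._tables[self.resolver_for(mac)].get(mac)


switch_names = ["sw-a", "sw-b", "sw-c", "sw-d"]
directory = LocationDirectory(switch_names)
directory.publish("00:16:3e:00:00:01", "sw-c")
print(directory.resolver_for("00:16:3e:00:00:01"))  # which switch holds the record
print(directory.lookup("00:16:3e:00:00:01"))        # where the host is attached
```

Any switch can compute `resolver_for` locally, so a lookup needs no broadcast. What the sketch leaves out (switch failures, re-hashing, caching) is exactly where a new control plane has to earn its trust.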

I buy the spirit of these new-wave requirements, albeit with some caveats.

I will work hard to drive requirements strictly top-down from my applications and their own modus operandi down to the network, before I sign a blank check for an anyone-to-anyone dynamic network environment. As a case in point, let’s assume that I have a design pattern by which my applications are either stateless or have their state fully externalized (in fact, it’s one of the design principles at eBay). From this, I derive that I will not live migrate virtualized application instances and will use simple create/destroy semantics instead. If I don’t have to worry about live migration, my network closet and my associated processes will begin to look a whole lot simpler. [This is quite something to admit for one who set a live migration benchmark back in 2005!]
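A minimal sketch of why externalized state makes create/destroy sufficient. All class names here are hypothetical, and the in-memory dictionary merely stands in for whatever external store a real deployment would use.

```python
class ExternalStateStore:
    """Stand-in for an external store (database, cache tier, ...)."""

    def __init__(self):
        self._data = {}

    def load(self, key):
        return self._data.get(key, 0)

    def save(self, key, value):
        self._data[key] = value


class AppInstance:
    """Holds no session state of its own between requests."""

    def __init__(self, instance_id, store):
        self.instance_id = instance_id
        self._store = store

    def handle_request(self, session_key):
        visits = self._store.load(session_key) + 1
        self._store.save(session_key, visits)
        return visits


class InstancePool:
    """Create/destroy semantics only; there is nothing to migrate."""

    def __init__(self, store):
        self._store = store
        self._next_id = 0
        self.instances = []

    def create(self):
        instance = AppInstance(self._next_id, self._store)
        self._next_id += 1
        self.instances.append(instance)
        return instance

    def destroy(self, instance):
        self.instances.remove(instance)


store = ExternalStateStore()
pool = InstancePool(store)

first = pool.create()
first.handle_request("user-42")
first.handle_request("user-42")

pool.destroy(first)     # the instance goes away, wherever it was running
second = pool.create()  # a fresh one comes up, possibly elsewhere
visits = second.handle_request("user-42")
print(visits)
```

The replacement instance picks up the session where the old one left off, so the network never has to preserve an instance’s identity or address across a move.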

In a Cloud provider scenario, the top end does look open to any application style and its opposite. Is it really so, and do we need to be all-inclusive? I believe that we can still handily contain the requirements posed to the network by thinking in terms of abstracted tiers (each tier being what gets scaled horizontally for the customer, independently of the others). Furthermore, as we look up the chain, the various PaaS stipulations provide a host of cues in terms of partitionable, directional, tiered workloads.

Lastly, for these ideas to be operationalized at scale, the new control plane(s) will need to earn quite some trust, just like any other foundational piece. After all these years, we are still very scared of STP flaps and their turning into a SPOF for the datacenter.

I enjoyed reading these papers and am grateful for their out-of-the-box, stimulating thoughts.

Is there a rose without thorns, an Ethernet without STP?


Corey

I came across this excellent OSDI ’08 paper by Robert Morris & team at MIT. They look into the widening gap between traditional system software and many-core hardware. Their approach is to zero in on needlessly shared kernel fixtures and to seek out the application’s participation, so that the application sanctions what really needs to be shared and among what, no more and no less.

The problem statement and the solution scope fully resonate with me. We are actively moving from 8-core to 16-core servers and are stumbling precisely on these issues. These days, I keep telling myself that “what got you here won’t get you there”. More on this journey in upcoming blog entries.

We often say that a picture is worth a thousand words. Their figure 2 is just brilliant. It really puts a finger on the disparity in memory access timings among the 16 cores.
