Large-Scale Distributed Systems and Middleware (LADiS)

When Ken Birman and his extended research group take a leading role in organizing a workshop, you can rest assured that it's going to be a top-notch workshop. In the early 90s, I had the good fortune to come across Ken Birman, Robbert van Renesse, Werner Vogels, and their group at Cornell, who were working on virtual synchrony, Isis, U-net, Horus, and more. I drew upon their work when I was at the OSF RI developing a real-time distributed Mach OS, and I have kept an eye on their work ever since. It was great to come down to LADiS and mix with that research crowd again. Sadly, just as I was going down memory lane with this group, I learned that Jay Lepreau, another leading light to me and a good, passionate mentor, had passed away the night before.

I had the good fortune to travel to LADiS with an esteemed colleague of mine, Randy Shoup. We co-authored and co-delivered a presentation on eBay's scale-out journey. Judging from the questions and comments during and after our talk, I would say it was well received. At LADiS, I enjoyed meeting James Hamilton of MSR. James' talk and ours resonated on a number of topics related to internet-scale datacenters and their "this is life in a big city" nuances; even when we went down different avenues, our talks seemed to complement one another. I will certainly be reading his blog from now on.

From the LADiS technical program, I single out the sessions on data collection/dissemination and resource management as the most relevant to my work. I will dig into many of these papers as soon as the proceedings are out. I'm still somewhat cold to Byzantine Fault Tolerance (BFT). I appreciate the intellectual challenge of arbitrary faults. However, I like to think that application-specific context and defensive coding practices (e.g., skeptics) go a long way towards addressing these faults without BFT replication. For what it's worth, I cannot see myself producing a compelling TCO case for any of the BFT replication approaches that I have heard about. Specifically, the TCO would need to reflect the added operational complexity. OTOH, I'm not working in an air traffic control environment either…

NOTE: I've agreed to work on a paper that summarizes the key themes and points heard at LADiS.
