The spread of cores raises the bar on concurrency

Over the past few quarters, I have spent much time developing the business case (ROI, TCO, etc.) for the latest multi-core processors and their yield, measured in transactions/$ and transactions/watt.

Flashback. ‘Twas the end of the 80s, and I was a junior engineer hard at work getting a 4-way 68020 SMP Unix box to perform reasonably well by placing locks in a recalcitrant SVR2.4 kernel. David Cheriton (or was it AST?) quipped that one could either work all-nighters for 18 months to figure out all the locks, or go to the beach for just as long, come back, and expeditiously plug the CPU du jour into a uniprocessor, for a huge gain over the SMPs built on yesteryear’s silicon. That figurative take on Moore’s law hit home. I went on to find some new challenges (note: microkernels; no beach).

Fast forward twenty years, and we have hit our heads on the ceilings of clock frequency and gate density. We have no choice left but to run a multi-socket, multi-core setup flat out. The superior CPU horsepower and deeper memory hierarchy quickly surface the concurrency shortcomings in our code: the performance line tops out and then turns south.

So, let’s take concurrency head-on. My colleagues recently went to JavaOne and gave a good, well-received rundown of their lessons learned in Java concurrency, distilled into practical patterns and anti-patterns. Do try them at home!

Sangjin Lee (eBay), Debashis Saha (eBay), and Mahesh Somani (eBay), “Robust and Scalable Concurrent Programming: Lessons from the Trenches”. Here’s a before/after flashcard gleaned from their presentation. The full presentation is available for free download here.
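In the same before/after spirit (my own illustrative sketch, not lifted from their slides), here is one of the most common Java concurrency fixes: replacing a coarse lock around a check-then-act on a `HashMap` with a `ConcurrentHashMap` and an atomic `putIfAbsent`. Class and method names are mine.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class CacheFlashcard {
    // Before: one coarse monitor guards the whole check-then-act
    // sequence, so every reader and writer contends on the same lock.
    private final Map<String, Integer> before = new HashMap<String, Integer>();

    public synchronized Integer getBefore(String key) {
        Integer v = before.get(key);
        if (v == null) {
            v = Integer.valueOf(key.length()); // stand-in for a real computation
            before.put(key, v);
        }
        return v;
    }

    // After: lock-free reads, and putIfAbsent makes the insert atomic;
    // threads only contend when they race on the same key.
    private final ConcurrentMap<String, Integer> after =
            new ConcurrentHashMap<String, Integer>();

    public Integer getAfter(String key) {
        Integer v = after.get(key);
        if (v == null) {
            Integer computed = Integer.valueOf(key.length());
            Integer prior = after.putIfAbsent(key, computed);
            v = (prior != null) ? prior : computed; // someone may have beaten us
        }
        return v;
    }
}
```

The “after” form does admit duplicate computation under a race, which is the usual trade: pay an occasional redundant computation to avoid serializing every lookup.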

There’s another side to this story: the memory wall. It’s just as important to single out and rework the constructs that get in the way of L2/L3 cache efficiency, like HashMaps and linked-list traversals. Furthermore, we’d like a systematic way to manage and leverage any NUMA-ness in our systems.
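A minimal sketch of why linked-list traversal fights the cache (class name mine; the timing printout is illustrative, not a proper benchmark): a contiguous `int[]` is scanned cache line by cache line, while a `LinkedList<Integer>` forces a pointer chase to a heap-scattered node, plus a boxed `Integer`, at every step.

```java
import java.util.LinkedList;
import java.util.List;

public class MemoryWallSketch {
    // Sequential scan over a contiguous int[]: prefetcher-friendly,
    // one cache line serves several elements.
    static long sumArray(int[] a) {
        long sum = 0;
        for (int i = 0; i < a.length; i++) sum += a[i];
        return sum;
    }

    // The same scan over a LinkedList<Integer>: each step dereferences
    // a node pointer and a boxed Integer, scattered across the heap.
    static long sumList(List<Integer> l) {
        long sum = 0;
        for (Integer v : l) sum += v.intValue();
        return sum;
    }

    public static void main(String[] args) {
        final int n = 1000000;
        int[] packed = new int[n];
        List<Integer> chased = new LinkedList<Integer>();
        for (int i = 0; i < n; i++) {
            packed[i] = i;
            chased.add(Integer.valueOf(i));
        }

        long t0 = System.nanoTime();
        long s1 = sumArray(packed);
        long t1 = System.nanoTime();
        long s2 = sumList(chased);
        long t2 = System.nanoTime();

        System.out.println("array:  sum=" + s1 + " in " + (t1 - t0) / 1000 + " us");
        System.out.println("linked: sum=" + s2 + " in " + (t2 - t1) / 1000 + " us");
    }
}
```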

Here are the topics I’m most interested in and will be following:

  • Principles for kernel re-design in the many-core era, like Robert Morris’ Corey, which I profiled earlier; I anticipate that this year’s SOSP will feature quite a few papers in this space;
  • Java-only production stacks, in which there is (at least) one layer too many among hypervisor, kernel, and JVM, and which beg for simplification;
  • Machine-learning techniques to manage the combinatorial explosion of configuration knobs and dials and their inter-dependencies, like Ganapathi’s HotPar ’09 paper;
  • Transactional memory (I read a good article by Drepper in the February issue of CACM);
  • Access to all hardware counters that can inform tuning (you can’t manage what you can’t measure);
  • Share-nothing models like Scala actors or the rediscovered Erlang (which dates back to just about the same time as the flashback in my opening).
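The share-nothing style carries over to plain Java, too. Here is a minimal actor-like sketch (my own, not Scala’s or Erlang’s actual machinery): a worker owns its state outright and the only way in is a message through its mailbox, a `BlockingQueue`, so the hot counter needs no lock at all.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class CounterActor implements Runnable {
    // The mailbox is the sole channel into the actor.
    private final BlockingQueue<String> mailbox = new LinkedBlockingQueue<String>();
    private long count; // owned exclusively by the actor's thread; never shared

    public void send(String msg) throws InterruptedException {
        mailbox.put(msg);
    }

    public void run() {
        try {
            while (true) {
                String msg = mailbox.take();
                if ("stop".equals(msg)) break;
                count++; // no lock needed: only this thread ever touches count
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Safe to read after the actor's thread has been joined.
    public long getCount() { return count; }

    public static void main(String[] args) throws InterruptedException {
        CounterActor actor = new CounterActor();
        Thread t = new Thread(actor);
        t.start();
        for (int i = 0; i < 1000; i++) actor.send("tick");
        actor.send("stop");
        t.join(); // join establishes happens-before, making count visible
        System.out.println("processed " + actor.getCount() + " ticks");
    }
}
```

The design choice is the Erlang one in miniature: serialize all mutation through one owner instead of guarding shared state with locks, trading lock contention for queueing.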

Interesting times, for sure!
