Archive for Opinion

Redress your apps for Cloud

This week, Alex Stamos of iSEC Partners visited and gave a great talk titled “Securely Moving Your Business into the Cloud”.  Much of that material is publicly available here. Alex is a straight shooter and a straight talker.  By the second slide, he’s already warmed up and delivers quite a punch line:  You cannot securely move into the cloud without re-writing your software.

I subscribe to that line, and there's more to it than security. I reached the same conclusion earlier on, when thinking about availability and all the *-abilities that an enterprise needs for its business-critical operations.

Every so often, the IT industry falls for the holy grail of horizontally scaling applications, blindly and effortlessly, without touching a line of code. It happened with Grid Computing before Clouds. The early wins in their respective stomping grounds (HPC for Grids, entrepreneurs for Cloud) don't necessarily scale up into F500 wins. Rather, reality sinks in: one needs to rework the application stack and, worse yet, recruit several PhD types to do it. We can defy neither gravity nor the laws of distributed systems.

In learning this all over again, there is some forward progress. Those who venture into retooling their stack will most likely achieve superior security and *-abilities in general. In their dollars-and-sense considerations, they will have to weigh Cloud savings against the budget and timeline needed to implement and operationalize the new stack. Others will justifiably punt and wait for a Hail Mary pass* from whatever comes next after Grid and Cloud.

*Not quite Dave Patterson's Hail Mary pass, even though there's a striking similarity with what's happening with multi-core at micron scale and the attendant arguments for and against application re-writes.


Cloud: Dark Side, Good Side

I found some good food for thought in Dave Durkee's article "Why Cloud Computing Will Never Be Free" in the May issue of Communications of the ACM (did I ever mention how much I've been enjoying CACM lately?)

Competitive price pressures threaten to drag Cloud service quality into a downward spiral. Those consumers who nickel-and-dime the commodity goods served by the Cloud (e.g., computing cores or gigabytes of storage) will get a taste of their own medicine, as Cloud providers nickel-and-dime them right back, just the same or more. The supply-side "shop of horrors" that Durkee documents for the Cloud is a scary one:

  • Murky pricing models or time commitments
  • Grossly oversubscribed resources
  • Silent traffic engineering downgrades (e.g., from 10Mbit/s down to 1Mbit/s)
  • Recycling of failed disk drives
  • Unattainable service uptimes (say 4 or 5 nines) with lenient penalty clauses (say a 10% discount) whose only purpose is to give the provider bragging rights on those many nines (see the quick arithmetic after this list)
  • Undefined performance auditing benchmarks
  • SLA miss refunds that stretch out a customer’s time commitment (e.g., annualized refunds only)
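
To put those nines in perspective, here's a quick back-of-the-envelope sketch (generic arithmetic of my own, not figures from Durkee's article): four nines of uptime allow roughly 53 minutes of downtime per year, five nines barely 5 — targets that a token 10% refund hardly backs up.

```java
// Back-of-the-envelope: how much yearly downtime does "N nines" of uptime actually allow?
// Illustrative only; not tied to any particular provider's SLA.
public class NinesMath {
    public static void main(String[] args) {
        final double minutesPerYear = 365.25 * 24 * 60;
        for (int nines = 3; nines <= 5; nines++) {
            double uptime = 1.0 - Math.pow(10, -nines);          // e.g., 4 nines -> 0.9999
            double downtimeMinutes = (1.0 - uptime) * minutesPerYear;
            System.out.printf("%d nines: %.5f%% uptime, ~%.1f minutes of downtime per year%n",
                    nines, uptime * 100, downtimeMinutes);
        }
    }
}
```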

These syndromes spiral the Cloud downward and push it further away from an enterprise's needs.

Service level management (SLM) is the key to reversing that direction and pulling the Cloud up to enterprise-class service. Enterprises will want to translate their business imperatives into service level objectives (SLOs), use them in the SLA negotiation with the Cloud provider, then monitor hits and misses. But SLM is no small feat. At either end of the service demarc line, there has to be complex logic to manage those SLOs and apply proper compensating actions upon exception. It's no surprise that the crown jewels of companies like FedEx or the Santa Fe Railway are the SLM logic that maximizes shareholder value out of the least amount of commodity resources (Fred Smith didn't invent new airports or airplanes).
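
To make that a bit more concrete, here's a minimal sketch of what SLO bookkeeping might look like at the consumer's end of the demarc line (the class, names, and thresholds are all hypothetical; they're not drawn from any product, nor from WS-Agreement):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of service level management at the consumer's end of the demarc:
// translate a business imperative into an SLO, record measurements, and flag misses
// so that a compensating action (failover, renegotiation, refund claim) can be triggered.
public class SloMonitor {
    private final String name;
    private final double targetMillis;   // e.g., "checkout latency under 300 ms"
    private final List<Double> samples = new ArrayList<Double>();

    public SloMonitor(String name, double targetMillis) {
        this.name = name;
        this.targetMillis = targetMillis;
    }

    public void record(double observedMillis) {
        samples.add(observedMillis);
        if (observedMillis > targetMillis) {
            // A real deployment would kick off a compensating action here, not just log the miss.
            System.out.printf("SLO MISS [%s]: %.1f ms > %.1f ms target%n",
                    name, observedMillis, targetMillis);
        }
    }

    public double missRate() {
        if (samples.isEmpty()) {
            return 0.0;
        }
        int misses = 0;
        for (double sample : samples) {
            if (sample > targetMillis) {
                misses++;
            }
        }
        return (double) misses / samples.size();
    }

    public static void main(String[] args) {
        SloMonitor checkout = new SloMonitor("checkout-latency", 300.0);
        for (double ms : new double[] {120, 250, 480, 190, 610}) {
            checkout.record(ms);
        }
        System.out.printf("Miss rate so far: %.0f%%%n", checkout.missRate() * 100);
    }
}
```

The bookkeeping is the easy part; the crown-jewel logic lives in the compensating actions and in negotiating SLOs that actually track the business imperatives.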

These aspects didn't go unnoticed by the Utility Computing camp (UC being the fad just before Cloud). They standardized a protocol, WS-Agreement, to manage an SLA throughout its lifecycle. May some of that experience be leveraged in the new world.



Sports that scale: Soccer

I'm a huge fan of football of the round kind. Every four years, I take the time to follow the FIFA World Cup and keep tabs on nearly all of the 32 teams that start off.

The FIFA tournament funnels large, geographically dispersed audiences onto relatively few events (compared to more spread-out calendars like the Olympics'). We are barely mid-way and I'm already seeing the World Cup matches making "dents" in our e-commerce traffic traces, starting with the national-level traces. Their W shape clearly marks the first half of a match (traffic significantly depressed for 45 minutes), then the interval (traffic way, way up), and the second half (traffic down again for some 45 minutes more). Italy was the country showing the most pronounced dents among the ones I surveyed (but no more of that, given their early exit). The colleagues in the NOC need to be aware of this and tease these symptoms apart from, say, a problem with the backbone routers.

As the tournament progresses, those dents surface from the national level to a supra-national level (e.g., pan-European). Eventually, they will make an appearance in the world-wide roll-up of all traces once the semi-finals or finals take place. That will be the pulse of a planet. This year only, let's call it the WWWzela effect :)

It’s interesting to debate whether these dents in the traces will be more or less pronounced compared to 4 years back. Several factors tip my expectations up or down:

(+) increasingly, Internet access is a commodity and overall traffic grows at a good clip year over year, worldwide;
(+) online content has grown manifold too, giving folks more reasons to be online before/after a match (e.g., sports commentaries, friends and family chats, etc.);
(+) the application bias is more significant, meaning that (say) social web features and e-commerce features will exhibit different levels of perturbation before/during/after a match;
(-) compliments of wifi, smartphones, etc., more audiences are untethered and can now multi-task effectively during a match;
(-) DVRs, VOD, and web video services are making tape-delay viewing more practical than ever, thus eroding the synchronization effects around any timed event.

Now, for another scaling dimension…

Some eleven basketball courts or so can be tiled over a soccer pitch. Yet there is a single referee in a pro soccer match vs. three referees in an NBA match. Isn't this a blatant scaling anomaly? Yes, it surely sounds like one, though it's basketball that got it wrong! As Ed Felten aptly puts it, the soccer rules are designed to scale way down and give any amateur team the thrill of playing a match with precisely the same rules that the pros use. Nowhere is this more evident than in Brazil, where I can easily see legions of footballers of all ages and skillsets totally at ease with football's minimalist prerequisites and ways to officiate a match. There will always be blatant mistakes by referees (oops, I just saw one this morning). In the absence of malice and conspiracy, they will even out, despite the immediate heartburn. That's pretty good scaling to me.


10 Issues with smartphone apps

Someone best characterized application vs. platform in just a dozen words: a good application never surprises, a good platform never ceases to surprise (I'd love to give proper credit, if someone is kind enough to provide me with the citation).

I continue to be quite impressed with the two smartphone platforms that I dug into, iPhone and Android. They never cease to surprise me, on the positive side, with their nuggets of enabling technology.

I do have quite a few issues with their applications and the way they are written. Alas, they surprise me when and where they really shouldn't. Here's a list of 10 top-of-mind issues, in no particular order:

  1. Unexpected entitlements. Some applications are more equal than others. For instance, try signing out of your primary Gmail account on Android. It won't work unless the whole device is wiped clean;
  2. Power efficiency. Some applications turn the radio on very often and can be quite chatty whenever they do so. In the absence of a "green rating" for applications, it's a trial-and-error process of loading some applications and then discovering that battery life has suddenly tanked, compliments of a "fat" application in the mix;
  3. Applications work unless they don't. It's hard to know why an application suddenly gets into the habit of aborting at launch. It silently goes back to being a cute square icon, ready to fail again just the same;
  4. Stale coding practices. The application development environments don’t leverage any of the new ideas in software engineering, like Ruby on Rails with its built-in unit/functional testing;
  5. Bloomingdale's and the bazaar. Paraphrasing E. Raymond, there seem to be just two styles of application store emerging: the exclusive, velvety one (iTunes, Ovi) and the open, messy one (Android). It would be nice to see some hybrid concepts emerge. It would be a pity if the smartphone software channels were already fully ossified this early in the game;
  6. Password sprawl. Without a widespread identity infrastructure, I'm forced to set passwords in as many different applications and have their renewals/challenges hanging over me. Intriguingly, the latter too change in frequency and style with the application, making for a really fragmented experience and a race towards lower-grade security policies (i.e., simple passwords with the longest expiration intervals possible);
  7. Back-end password handling. Without a widespread identity infrastructure, chances are that, for a given application, the database of subjects' secrets and the subjects' application data get collocated in the same Cloud and the same logical slice therein. This is what my colleague Gunnar Peterson colorfully describes as loading the dynamite and the detonator onto the same truck;
  8. Porous sandboxes. The sandbox that an application operates in has several back-alley read/write access pathways to free-for-all data (e.g., the keyboard cache and address book on the iPhone, as described here), thus creating opportunities for Trojans and covert channels;
  9. Panta rhei. After I stumble upon a really clever application and make it part of my daily life, it's quite likely that another vendor will pick up on the same good idea and apply some healthy one-upmanship to improve it. Thus, I regularly face the dilemma of whether to stick with the data accrued thus far or start fresh on a brand-new application, without any migration capability in sight;
  10. Cloakers and phishers. Some applications mean big business and naturally attract ill-intentioned copycats. There are only so many pixels to copy. Current defenses are mainly non-technical – e.g., presence in the iTunes store hinges on the relationships between vendor, Apple, and the user community. They are not as effective in the bazaar style of application store.

I don’t believe in the rise of mobile multi-platform application frameworks (other than WebKit, that is), nor do I believe in unicorns.

However, I'm firmly convinced that smartphones will pull through advances in software – be it on the gadget, in the cloud, or in the identity infrastructure – much as they have already done for the 3G telco infrastructure.


iPhone pulls through AT&T infrastructure

As if in a Petri dish, I keep observing how the iPhone single-handedly pulls along the roadmap of a telco infrastructure. Both the iPhone and the AT&T wireless infrastructure are expanding at a torrid pace and beyond the wildest imagination (to an outside observer like me, at least). The reaction is amplified by Apple's single-track mind to perfect a user experience and its exclusive deal with one carrier – in short, a monoculture. No ounce of pull force gets lost. The one-two jolt that has developed from Apple to AT&T is a new baseline for the textbooks.

A recent report confirms that AT&T has made good on its intent to improve its 3G download/upload throughput. Improvements stem from the roll-out of HSPA 7.2 (besides the sheer new capacity thrown at the problem). Broad technology advances in beamforming, multiple-input multiple-output (MIMO) communications, and orthogonal frequency division multiplexing hint that there's quite a bit of headroom for further scale-outs over the next 3-5 years.

I’ve sampled the AT&T improvements directly using the excellent, free Xtreme Speedtest application. For extra credit, I can go multi-platform and run this same application at the same place and time on both my iPhone/AT&T and Droid/Verizon. The speed of a web browsing session would otherwise be highly subjective and dominated by the browser’s own effectiveness.

In a previous blog, I described the “wheel of innovation” looping over the following steps:

  1. New infrastructure build-outs
  2. Leading to faster/broader connectivity
  3. Making it a breeding ground for new applications
  4. Some of them reach viral spread, network effects, etc., resulting in larger addressable markets
  5. Thus creating demand for more/different infrastructure

(loop back to 1.)

We have gone from step 5 to steps 1 and 2 (even though I have no basis to comment on coverage – I will steer clear of blue vs. red maps…). Now that the infrastructure shortcomings are behind us, along with troubling rumors of usage tariffs, I'm eager to see a new breed of applications (steps 3 and 4).

In a subsequent post, I will share my wish list on what iPhone and smartphones in general can and should pull through in software infrastructure.


Web-track me if you can

This week, Slashdot called my attention to the EFF's effort to level-set the community on web tracking — how unique (and traceable) does my browser make me look when I visit a web site? This new EFF site returns my overall score along with a breakdown of its factors (like plugin details, screen size, system fonts, cookie handling). For instance, it tells me that the Safari fingerprint generated off my Mac is still unique among the half-million fingerprints on file at the EFF.

This is a great example of crowd-sourcing at work. The more participants, the better the study. The EFF's work gets a huge boost from being slashdotted. Moreover, the EFF is no .com and doesn't carry the big-brother or world-domination overtones.

How does one know when the samples have hit a critical mass leading to a reasonably accurate model? It’s a recurring conundrum for both frequentists and Bayesians.

I agree with the EFF's view that a smartphone's browser is bound to show less entropy. That kind of browser is less likely to veer from its stock config. To wit, my iPhone browser scored 1 in 1,442 uniqueness (10.49 bits of entropy) and my Android browser scored 1 in 8,513 uniqueness (13.06 bits of entropy). To the previous point, it's unclear how many smartphones have hit the EFF site altogether.
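
For the curious, the bit figures are just the base-2 logarithm of the uniqueness denominator; here's a quick sketch of the arithmetic (my own, not EFF code):

```java
// "1 in N" uniqueness corresponds to log2(N) bits of identifying information.
// The two N values below are the ones my phone browsers scored on the EFF site.
public class FingerprintBits {
    static double bits(double oneInN) {
        return Math.log(oneInN) / Math.log(2);
    }

    public static void main(String[] args) {
        System.out.printf("iPhone browser:  1 in 1,442 -> %.2f bits%n", bits(1442));
        System.out.printf("Android browser: 1 in 8,513 -> %.2f bits%n", bits(8513));
    }
}
```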

This early smartphone-browser conclusion should not be generalized to native apps running on a smartphone. Native apps can yield the richest fingerprint features yet. They can draw upon sophisticated UUID and TPM schemes in system software, with the SDKs exposing programmatic access, resulting in stronger software/hardware linkages than their desktop/laptop equivalents. Today, the limiting factors here have to do with policy – e.g., a vendor's authorization to export off-device the UUID material that is key to its own DRM.


Generativity!!

The word generativity jumps out at me while I'm reading Jonathan Zittrain's new book, "The Future of the Internet – and How to Stop It". Zittrain defines generativity as a "system's capacity to produce unanticipated change through contributions from broad and varied audiences". The Internet, the PC, and wiki/Wikipedia best exemplify generativity. "Generativity" is what I had in mind and was trying to say when I wrote about the Internet's virtuous wheel of innovation.

Generativity hits home. It's the reason why I'm so genuinely interested in the Android platform (I got to carry one such phone alongside my iPhone). It's why I put my TV set into early retirement and replaced it with an Internet-enabled one equipped with a widget SDK – a generative TV in the making, hopefully. I know that I have given, and will keep giving, my 150% in those jobs that have to do with generative artifacts (luckily, I have had a few of those jobs throughout my career).

Generativity is quite a litmus test for new directions in technology. Take cloud computing. Does it mark a new epoch in generativity? Or is it a mere TCO optimizer?

For sure, security, regulations, and net-neutrality pose some great challenges to our collective journey in generativity. I look forward to reading the second half of Zittrain's book and learning about his proposed solutions.

Zittrain came to visit us at eBay and gave an excellent lecture on “Minds for Sale” — an eye opener on both the positive and negative outcomes of long-tail participation in cyberspace.


Google Chrome OS and the Tarte Tatin

The Tarte Tatin is an upside-down apple cake. It used to be my favorite dessert when I lived in France. Yum.

Eating a Tarte Tatin on a lovely summer afternoon while catching up on Google Chrome OS (yeah, I've fallen way behind due to my ever-demanding day job plus a pile of papers to review for conference TPC duties).

Google Chrome OS (and other browser-OS wannabes) makes me think of an upside-down cake, just like the Tarte Tatin. Let me explain. In the mid 90s, the Web browser rocketed onto the scene. It became the pinnacle of our stack. Fast forward 15 years. With the Google Native Client, one can load and launch native x86 code in the browser without giving up on security (what could possibly be worse than PHP anyway…). Application management is quickly moving to the Cloud (SaaS, PaaS, the-whole-Enchilada-as-a-Service). Likewise, resource management has to play out in the Cloud. Thus, the new-wave browser must underpin both application management and resource management. The browser has become a shim layer buried deep near the bottom of the stack. Voilà, the upside-down cake.

Have we seen other examples of upside-down cakes in technology? For sure. Take the Internet. In the 70s, the revolutionary packet networking movement started off as a geeky use case that piggybacked on the very circuit-switched network laid out for telephony. This set-up worked well for a long time, until data traffic outweighed voice traffic, in sheer volume as well as in business pull-through. The packet network then moved to the bottom of the pile, with telephony running as an application (VoIP) atop it, along with many others. Voilà, another upside-down cake.

Legend has it that the Tarte Tatin was the lucky byproduct of a bad day in the kitchen. Unlike the Tarte Tatin, there's little serendipity in what's happening to the browser and what happened to the Internet long before it. Rather, these are huge R&D undertakings. In my career, I want to see some more of these upside-down cakes! Along with chilled passito wine, please, for which I don't have a geeky metaphor just yet.


Cores’ spread raises bar in concurrency

Over the last few quarters, I have spent much time developing the case (ROI, TCO, etc.) for the latest multi-core processors and their yield, measured in transactions/$ and transactions/watt.

Flashback. 'Twas the end of the 80s and I was a jr. engineer hard at work getting a 4-way 68020 SMP Unix box to perform reasonably well by placing locks in a recalcitrant SVR2.4 kernel. David Cheriton (or was it AST?) quipped that one could either work all-nighters for 18 months to figure out all the locks, or else go to the beach for just as long, come back, and expeditiously plug the CPU du jour into a uniprocessor, with a huge gain over the SMPs built on yesteryear's silicon. This figurative view of Moore's law hit home. I went on to find some new challenges (note: microkernels; no beach).

Fast forward twenty years, and we are hitting our heads on the ceilings of clock frequency and gate density. We have no choice left but to run a multi-socket, multi-core setup flat out. The superior CPU horsepower and memory hierarchy quickly surface the concurrency shortcomings in our code. The performance line tops out and then turns south.

So, let's take concurrency head on. My colleagues recently went to JavaOne and gave a good, well-received rundown of their lessons learned in Java concurrency, resulting in some practical patterns and anti-patterns. Do try them at home!

Sangjin Lee (eBay), Debashis Saha (eBay), Mahesh Somani (eBay), “Robust and Scalable Concurrent Programming: Lessons from the Trenches”. Here’s a before/after flashcard gleaned from their presentation. The full presentation is up for free download here.

[Image: before/after flashcard from the JavaOne presentation]
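
In the same spirit as that flashcard, here's a minimal before/after sketch of my own (an illustration of the general pattern, not an excerpt from their slides): a shared hit-counter map guarded by a coarse lock vs. one built on java.util.concurrent.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Before: one coarse lock around a HashMap serializes every reader and writer.
class HitCounterBefore {
    private final Map<String, Long> counts = new HashMap<String, Long>();

    public synchronized void increment(String key) {
        Long current = counts.get(key);
        counts.put(key, current == null ? 1L : current + 1L);
    }

    public synchronized long get(String key) {
        Long current = counts.get(key);
        return current == null ? 0L : current;
    }
}

// After: ConcurrentHashMap plus AtomicLong lets threads updating different keys
// proceed without contending on a single monitor.
class HitCounterAfter {
    private final ConcurrentHashMap<String, AtomicLong> counts =
            new ConcurrentHashMap<String, AtomicLong>();

    public void increment(String key) {
        AtomicLong counter = counts.get(key);
        if (counter == null) {
            AtomicLong fresh = new AtomicLong();
            counter = counts.putIfAbsent(key, fresh);   // another thread may win the race
            if (counter == null) {
                counter = fresh;
            }
        }
        counter.incrementAndGet();
    }

    public long get(String key) {
        AtomicLong counter = counts.get(key);
        return counter == null ? 0L : counter.get();
    }
}
```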
There's another side to this story: the memory wall. It's just as important to single out and rework those constructs that get in the way of L2/L3 cache efficiency, like HashMaps and the traversal of linked lists. Furthermore, we'd like a systematic way to manage and leverage any NUMA-ness in our systems.
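
As a toy illustration of that cache point (my own micro-example, not from the JavaOne material, and a naive single-shot timing at that — no JIT warm-up): summing the same values out of a contiguous int[] vs. a LinkedList<Integer>.

```java
import java.util.LinkedList;
import java.util.List;

// Toy illustration of the memory wall: summing the same values stored in a
// contiguous int[] vs. a LinkedList<Integer>. The array scan streams through
// memory and keeps the L2/L3 caches and prefetcher happy; the list traversal
// chases node pointers (and boxed Integers) scattered across the heap.
public class TraversalDemo {
    public static void main(String[] args) {
        final int n = 2000000;
        int[] contiguous = new int[n];
        List<Integer> chained = new LinkedList<Integer>();
        for (int i = 0; i < n; i++) {
            contiguous[i] = i;
            chained.add(i);
        }

        long t0 = System.nanoTime();
        long arraySum = 0;
        for (int v : contiguous) {
            arraySum += v;
        }
        long t1 = System.nanoTime();
        long listSum = 0;
        for (int v : chained) {
            listSum += v;
        }
        long t2 = System.nanoTime();

        System.out.printf("int[]      sum=%d in %d ms%n", arraySum, (t1 - t0) / 1000000);
        System.out.printf("LinkedList sum=%d in %d ms%n", listSum, (t2 - t1) / 1000000);
    }
}
```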

I list hereafter topics that I’m highly interested in and will be following:

  • Post core-spread principles for kernel re-design, like Robert Morris’ Corey that I profiled earlier on; I anticipate that this year’s SOSP will feature quite a few papers in this space;
  • Java-only production stacks, which have (at least) one layer too many among hypervisor, kernel, and JVM, and beg for due simplification;
  • Machine-learning techniques to manage the combinatorial explosion of configuration knobs-and-dials and their inter-dependencies, like Ganapathi’s HotPar09 paper;
  • Transactional memory (I read a good article by Drepper in the Feb issue of CACM);
  • Access to all hardware counters that can inform tuning (you can’t manage what you can’t measure);
  • Share-nothing languages like Scala actors or the re-discovered Erlang (which dates back to just about the same time as my flashback in the opening).

Some interesting times for sure!!!


Cloud Security Alliance’s Document

With the Security Guidance document, the newly formed Cloud Security Alliance is off to a solid start. I read the white paper with interest. I like to think that many focus areas for the CSA and the Cloud security community at large stem from one simply-stated root cause: Trust ain’t a transitive property.

Among other things, the document addresses the concerns about accountability that I had raised on this blog.

Some musings after reading the CSA document:

We have always built systems in observance of least privilege. What's the actual least privilege for a Cloud provider? Let's pick a provider of the IaaS persuasion. No root access to guest virtual machines. No root access to virtual load balancers, virtual switches, and virtual firewalls. What else can be meaningfully taken off a provider's key chain without compromising site stability and service availability? Meanwhile, a Cloud user will do well to keep more than one line of defense. For one, I like what the Overshadow researchers are doing to protect application data in the event of OS compromise. It won't make data impenetrable. It does make it a whole lot harder to get to, forcing a new round of the cat-and-mouse chase.

The argument that in a Cloud one should know one's neighbors carries some fallacies. Knowledge does not imply control. Yet it's tempting to blur this line. For example, a false sense of security sets in among some engineers using a Cloud – that they have some deterministic control over resource sharing with neighboring Cloud tenants. Some cubicles away, the procurement/legal colleagues who negotiated that Cloud agreement know all too well that they have neither control nor leverage. In this example, Cloud tenants change just like the weather does (uhm, maybe the "Cloud" moniker isn't a bad choice after all!).

Naturally, personally identifiable information (PII) is a defining embodiment of data worth securing against foes. This should not detract from other, more nuanced data types. Take business meta-data, for example. The correlation between a Cloud customer's feature roll-out and the resulting traffic surge (or lack thereof) goes a long way towards revealing strategy, tactics, and competitive stance. Typically, it leads to information (analytics) that the Cloud customer would want to control and keep close to the vest. Would a Cloud provider's routine telemetry dole out precious insights into a Cloud customer's business trajectory, and who would have access to this information at the Cloud provider's end?

I look forward to seeing the CSA's membership grow. I will also be interested to track whether the CSA will codify best practices and take a stance on specific technology nuggets like the increasingly popular OAuth.
