Rackspace versus Amazon: The big data edition

By Derrick Harris

Rackspace is busy building a Hadoop service, giving the company one more avenue to compete with cloud kingpin Amazon Web Services. However, the two services — along with several others on the market — highlight just how different seemingly similar cloud services can be.
Rackspace has been on a tear over the past few months releasing new features that map closely to the core features of the Amazon Web Services platform, only with a Rackspace flavor that favors service over scale. Its next target is Amazon Elastic MapReduce, which Rackspace will be countering with its own Hadoop service in 2013. If AWS and Rackspace are, indeed, the No. 1 and No. 2 cloud computing providers around, choosing between the two platforms might seem easy enough.
In the cloud, however, the choices are never as simple as black or white.

Amazon versus Rackspace is a matter of control

Discussing its forthcoming Hadoop service during a phone call on Friday, Rackspace CTO John Engates highlighted the fundamental product-level differences between his company and its biggest competitor, AWS. Right now, for users, it’s primarily a question of how much control they want over the systems they’re renting — and Rackspace comes down firmly on the side of maximum control.

For Hadoop specifically, Engates said Rackspace’s service will “really put [users] in the driver’s seat in terms of how they’re running it” by giving them granular control over how their systems are configured and how their jobs run (courtesy of the OpenStack APIs, of course). Rackspace is even working on optimizing a portion of its cloud so the Hadoop service will run on servers, storage and networking gear designed specifically for big data workloads. Essentially, he explained, Rackspace wants to give users the experience of owning a Hadoop cluster without actually owning any of the hardware.
“It’s not MapReduce as a service,” he added, “it’s more Hadoop as a service.”
The company partnered with Yahoo spinoff Hortonworks on the service, in part because of Hortonworks’ Hadoop expertise and in part because Hortonworks’ open source vision for Hadoop aligns closely with Rackspace’s vision around OpenStack. “The guys at Hortonworks are really committed to the real open source flavor of Hadoop,” Engates said.
Rackspace’s forthcoming Hadoop service appears to contrast somewhat with Amazon’s three-year-old and generally well-received Elastic MapReduce service. The latter lets users write their own MapReduce jobs and choose the number and types of servers they want, but doesn’t give users system-level control on par with what Rackspace seems to be planning. For the most part, it comports with AWS’s tried-and-true strategy of giving users some control over their underlying resources while trying to offload as much of the operational burden as possible.
Elastic MapReduce also isn’t open source, but is an Amazon-specific service designed around Amazon’s existing S3 storage system and other AWS features. When AWS did choose to offer a version of Elastic MapReduce running a commercial Hadoop distribution, it chose MapR’s high-performance but partially proprietary flavor of Hadoop.

It doesn’t stop with Hadoop

Rackspace is also considering getting into the NoSQL space, perhaps with hosted versions of the open source Cassandra and MongoDB databases, and here too it likely will take a different tack than AWS. For one, Rackspace still has a dedicated hosting business to tie into, where some customers still run EMC storage area networks and NetApp network-attached storage arrays. That means Rackspace can’t afford to lock users into a custom-built service that doesn’t take their existing infrastructure into account or that favors raw performance over enterprise-class features.
Rackspace needs stuff that’s “open, readily available and not unique to us,” Engates said. Pointing specifically to AWS’s fully managed and internally developed DynamoDB service, he suggested, “I don’t think it’s in the fairway for most customers that are using Amazon today.”
Perhaps, but early DynamoDB success stories such as IMDb, SmugMug and Tapjoy suggest the service isn’t without an audience willing to pay for its promise of a high-performance, low-touch NoSQL data store.

Which is better? Maybe neither

There’s plenty of room for debate over whose approach is better, but the answer for many would-be customers might well be neither. When it comes to hosted Hadoop services, both Rackspace and Amazon have to contend with Microsoft’s newly available HDInsight service on its Windows Azure platform, as well as IBM’s BigInsights service on its SmartCloud platform. Google appears to have something cooking in the Hadoop department, as well. For developers who think all these infrastructure-level services are too much work, higher-level services such as Qubole, Infochimps or Mortar Data might look more appealing.
The NoSQL space is rife with cloud services, too, primarily focused on MongoDB but also including hosted Cassandra and CouchDB-based services.
In order to stand apart from the big data crowd, Engates said Rackspace is going to stick with its company-wide strategy of differentiation through user support. Thanks to its partnership with Hortonworks and the hybrid nature of OpenStack, for example, Rackspace is already helping customers deploy Hadoop in their private cloud environments while its public cloud service is still in the works. “We want to go where the complexity is,” he said, “where the customers value our [support] and expertise.”