HANA vs. Exalytics: an Analyst's View

Source: Internet | Published by: 拜伦戴维斯数据 | Editor: 程序博客网 | Time: 2024/06/06 18:13

Introduction

SAP has asked me to comment on the "HANA vs. Exalytics" controversy from an analyst's point of view. I think it's an interesting comparison, so I'm happy to comply. In this piece, I'll try to take you through my thinking about the two. As I think you'll see, I don't quite toe the party line for either company. (Please see the end for full disclosure about my relationship with SAP.)

Exalytics

To begin with, let's start with something that every analyst knows: Oracle does a lot of what in the retail trade are called "knockoffs."

They think this is good business, and I think they're right. The pattern is simple. When someone creates a product and proves a market, Oracle (or, for that matter, SAP) creates a quite similar product. So, when VMware (and others) prove vSphere, Oracle creates Oracle VM; when Red Hat builds a business model around Linux, Oracle creates "Oracle Linux with Unbreakable Enterprise Kernel."

We analysts know this because it's good business for us, too. The knockoffs--oh, let's be kind and call them OFPs for "Oracle Follow-on Products"--are usually feature-compatible with the originals, but have something, some edge, which (Oracle claims) makes them better.

People get confused by these claims, and sometimes, when they get confused, they call us. Like any analyst, I've gotten some of these calls, and I've looked at a couple of the OFPs in detail. Going in, I usually expect that Oracle is offering pretty much what you see at Penneys or Target: an acceptable substitute for people who don't want to or can't pay a premium, who want to limit the number of vendors they work with, or who aren't placing great demands on the product.

I think this, because it's what one would expect in any market. After all, if you buy an umbrella from a sidewalk vendor when it starts to rain, it's a good thing and it's serviceable, but you don't expect it to last through the ages.

Of course, with software, it's much more confusing than it is with umbrellas. The software industry is not particularly transparent, so it's really difficult, sometimes, to figure out the truth of the various claims. Almost always, you have to dig down pretty deep, and when you get "down in the weeds," as my customers have sometimes accused me of being, you may end up being right, but you may also fail to be persuasive.

Which brings me to the current controversy. To me, it has a familiar ring. SAP releases HANA, an in-memory database appliance. Now Oracle has released Exalytics, an in-memory database appliance. And I'm getting phone calls.

HANA: the Features that Matter

I'm going to try to walk you through the differences here, while avoiding getting down in the weeds. This is going to involve some analogies, as you'll see. If you find these unpersuasive, feel free to contact me.

To do this, I'm going to have to step back from phrases like "in-memory" and "analytics," because both SAP and Oracle are now using this language, and look instead at the underlying problem that "in-memory" and "analytics" are trying to solve.

This problem is really a pair of problems. Problem 1. The traditional row-oriented database is great at getting data in, not so good at getting data out. Problem 2. The various "analytics databases," which were designed to solve Problem 1--including, but not limited to, the column-oriented database that SAP uses--are great at getting data out, not so good at getting data in.

What you'd really like is a column-oriented (analytics) database that is good at getting data in, or else a row-oriented database that is good at getting data out.
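
For readers who like to see this kind of thing in miniature, here is a rough Python sketch of the two layouts. It is a toy, not how HANA or any real database stores data; the record fields and function names are made up for illustration.

```python
from typing import Any

# Hypothetical sales records, invented for this illustration.
records = [
    {"order_id": 1, "region": "EMEA", "amount": 120.0},
    {"order_id": 2, "region": "APAC", "amount": 75.5},
    {"order_id": 3, "region": "EMEA", "amount": 210.0},
]

# Row-oriented layout: all the fields of one record are kept together.
row_store: list[dict[str, Any]] = []

def insert_row(record: dict[str, Any]) -> None:
    """Getting data in is one append: the whole record lands in one place."""
    row_store.append(dict(record))

# Column-oriented layout: all the values of one field are kept together.
column_store: dict[str, list[Any]] = {"order_id": [], "region": [], "amount": []}

def insert_columns(record: dict[str, Any]) -> None:
    """Getting data in touches every column: one append per field."""
    for field, values in column_store.items():
        values.append(record[field])

for r in records:
    insert_row(r)
    insert_columns(r)

# Getting data out for a report on a single field:
total_from_rows = sum(row["amount"] for row in row_store)  # walks every full record
total_from_columns = sum(column_store["amount"])           # scans one list only
```

Both totals are the same, of course. The difference is in how much unrelated data each layout has to touch on the way in and on the way out, and that difference grows with the number of fields and rows.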

HANA addressed this problem in a really interesting way. They made a database where you get to choose how to treat the data, as either row-oriented or column-oriented. (If you want, imagine there's a software switch that you can throw.) So, if you want to do something that requires read-optimization, that is, the very fast and flexible analytic reporting that column-oriented databases are designed to do, you throw the switch and in effect tell HANA, "I want to run reports." And if you want to do something that requires write-optimization, like doing the transactions that row-oriented databases are designed to do, you throw the switch and tell HANA, "I'm entering a transaction."

Underneath, the data is the same; what this imaginary switch throws is your mode of access to it.

In explaining this to me, my old analyst colleague, Adam Thier, now an executive at SAP, said, "In effect, it's a trans-analytic database." (This is, I'm sure, not official SAP speak. But it works for me.) How do they make the database "trans-analytic?" Well, this is where you get down into the weeds pretty quickly. Effectively, they use the in-memory capabilities to do the caching and reindexing much more quickly than would have been possible before memory prices fell.

[Hand-Waving Alert: After Hasso read an earlier version of this blog, he stopped me and said, "David, you know this row/column 'switch' idea is wrong. There's no magic wand that you wave. It's more complicated than that." As you can see from the comments on the blog post, others objected to the "switch" word as well. So yes, let me acknowledge it. I'm doing some hand-waving here. So, for those of you interested in a more detailed explanation, check out my next blog post, "Row and column."]

There's one other big problem that the in-memory processing solves. In traditional SQL databases, the only kind of operation you can perform is a SQL operation, which is basically going to be manipulation of rows and fields in rows. The problem with this is that sometimes, you'd like to perform statistical functions on the data: do a regression analysis, etc., etc. But in a traditional database, statistical analysis (or sometimes even simple numerical calculations) can be complicated and difficult.

In HANA, though, business functions (what non-marketers call statistical analysis routines) are built into the database. So if you want to do a forecast, you can just run the appropriate statistical function. It's less cumbersome than a pure SQL database. And it's very, very fast; I have personally seen performance improvements of three orders of magnitude.
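
To make the idea of a function that lives inside the database a little more concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in. SQLite is not HANA, and this is not HANA's function library; the table, the data, and the `slope` function are all assumptions invented for the example. The point is only that the statistical routine runs where the data is, and is called from an ordinary query.

```python
import sqlite3

class Slope:
    """Least-squares slope of y on x, accumulated one row at a time."""

    def __init__(self) -> None:
        self.n = 0
        self.sx = self.sy = self.sxx = self.sxy = 0.0

    def step(self, x: float, y: float) -> None:
        # Called by the database engine once per row.
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.sxy += x * y

    def finalize(self):
        # Called once at the end; returns NULL if the slope is undefined.
        denom = self.n * self.sxx - self.sx ** 2
        if denom == 0:
            return None
        return (self.n * self.sxy - self.sx * self.sy) / denom

conn = sqlite3.connect(":memory:")
conn.create_aggregate("slope", 2, Slope)  # register the function with the engine

# Hypothetical monthly revenue figures.
conn.execute("CREATE TABLE sales (month INTEGER, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(1, 10.0), (2, 12.0), (3, 14.0), (4, 16.0)],
)

# The trend is computed by the query itself; no rows are pulled out first.
(trend,) = conn.execute("SELECT slope(month, revenue) FROM sales").fetchone()
```

With these made-up figures, revenue rises by 2 per month, so `trend` comes back as 2.0.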

Exalytics: the Features that Matter

Now when I point out that HANA is both row-oriented (for transactions) and column-oriented (so that it can be a good analytics database) and then I point out that it has business functions built-in, I am not yet making any claim about the relative merits of HANA and Exalytics.

Why? Well, it turns out that with Exalytics, too, you can enter data into a transaction-oriented database and you can do reporting on the data in an analytics database. And in Exalytics, too, you have a business function library.

But the way it's done is different.

In Exalytics, the transactional capabilities come from an in-memory database (the old TimesTen product that Oracle bought a little more than a decade ago). The analytics capabilities come from Essbase (which Oracle bought about 5 years ago), and the business function library is an implementation of the open-source R statistical programming language.

So, Oracle would argue, it has the same features that matter. But, Oracle would also argue, it also has an edge, something that makes it clearly better. If you get Oracle's Exalytics, you're getting databases and function libraries that are tested, tried, and true. TimesTen has been at the heart of Salesforce.com since its inception. Essbase is at the heart of Hyperion, which is used by much of the Global 2000. And R is used at every university in the country.

Confused? Well, you should be. That's when you call the analyst.

HANA vs. Exalytics

So what is the difference between the two, and does it matter? If you are a really serious database dweeb, you'll catch it right away:

In HANA, all the data is stored in one place. In Exalytics, the data is stored in different places.

So, in HANA, if you want to report on data, you throw that (imaginary) switch. In Exalytics, you extract the data from the TimesTen database, transform it, and load it into the Essbase database. In HANA, if you want to run a statistical program and store the results, you run the program and store the results. In Exalytics, you extract the data from, say, TimesTen, push it into an area where R can operate on it, run the program, then push the data back into TimesTen.
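
Here is a toy Python model of the one-place versus different-places distinction. It is a sketch under my own assumptions, not a description of either product's internals; the class and method names are hypothetical.

```python
class SingleStore:
    """One copy of the data serves both transactions and reports."""

    def __init__(self) -> None:
        self.rows: list[int] = []

    def record(self, amount: int) -> None:
        self.rows.append(amount)

    def report_total(self) -> int:
        return sum(self.rows)


class SplitStore:
    """Transactions land in one store; reports read a separately loaded copy."""

    def __init__(self) -> None:
        self.transactional: list[int] = []
        self.analytic: list[int] = []

    def record(self, amount: int) -> None:
        self.transactional.append(amount)

    def extract_transform_load(self) -> None:
        # Extract whatever has not been copied yet and load it into the
        # analytic copy. A real transform step (type or character-set
        # conversion, say) would sit between the two.
        pending = self.transactional[len(self.analytic):]
        self.analytic.extend(pending)

    def report_total(self) -> int:
        return sum(self.analytic)


single, split = SingleStore(), SplitStore()
for amount in (100, 250):
    single.record(amount)
    split.record(amount)
split.extract_transform_load()

# A transaction arrives after the last load.
single.record(40)
split.record(40)

fresh_total = single.report_total()     # 390: sees all three transactions
stale_total = split.report_total()      # 350: sees only what was loaded
split.extract_transform_load()
caught_up_total = split.report_total()  # 390 again, after another load
```

The gap between `fresh_total` and `stale_total` is the thing someone has to manage when the data lives in more than one place: when to load, which version each report is looking at, and where the extra copy fits.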

So why is that a big deal? Again, if you're a database dweeb, you just kind of get it. (In doing research for this article, I asked one of those dweeb types why it was such a big deal, and I got your basic shrug-and-roll-of-eye.)

I can see it, I guess. Moving data takes time. Since the databases involved are not perfectly compatible, one needs to transform the data as well as move it. (Essbase, notoriously, doesn't handle special characters, or at least didn't use to.) Because it's different data in each database, one has to manage the timing, and one has to manage the versions. When you're moving really massive amounts of data around (multi-terabytes), you have to worry about space. (The 1TB Exalytics machine only has 300 GB of actual memory space, I believe.)

One thing you can say for Oracle. They understand these objections, and in their marketing literature, they do what they can to deprecate them. "Exalytics," Oracle says, "has Infiniband pipes" that presumably make data flow quickly between the databases, and "unified management tools," that presumably allow you to keep track of the data. Yes, there may be some issues related to having to move the data around. But Oracle tries to focus you on the "tried and true" argument. So what, it essentially says, if you have to move the data between containers, when each of the containers is so good, so proven, and has so much infrastructure already there, ready to go.

As long as the multiple databases are in one box, it's OK, they're arguing, especially when our (Oracle's) tools are better and more reliable.

Still confused? Not if you're a database dweeb, obviously. Otherwise, I can see that you might be. And I can even imagine that you're a little irritated. "Here this article has been going on for several hundred lines," I can hear you saying, "and you still haven't explained the differences in a way that's easy to understand."

[Update alert. In the comments below, an Exalytics expert says that I've mischaracterized the way customers would actually use Exalytics. If he's right (and I'm sure he is), the two products are not in fact as comparable as Oracle marketing would seem to have it and you should read the rest of this simply as a description of what HANA is all about, just taking it for granted that HANA is sui generis.]

HANA: the Design Idea

So how can you think of HANA vs. Exalytics in a way that makes the difference between all-in-one-place and all-in-one-box-with-Infiniband-pipes-connecting-stuff completely clear? It seems to me that the right way is to look at the design idea that's operating in each.

Here, I think, there is a very clear difference. In TimesTen or Essbase or other traditional databases, the design idea is roughly as follows: if you want to process data, move it inside engines designed for that kind of processing. Yes, there's a cost. You might have to do some processing to get the data in, and it takes some time. But those costs are minor, because once you get it into the container, you get a whole lot of processing that you just couldn't get otherwise.

This is a very normal, common design idea. You saw much the same idea operating in the power tools I used one summer about forty years ago, when I was helping out a carpenter. His tools were big and expensive and powerful--drill presses and table saws and such like--and they were all the sort of thing where you brought the work to the tool. So if you were building, say, a kitchen, you'd do measuring at the site, then go back to the shop and make what you needed.

In HANA, there's a different design idea: Don't move the data. Do the work where the data is. In a sense, it's very much the same idea that now operates in modern carpentry. Today, the son of the guy I worked for drives up in a truck, unloads a portable table saw and a battery-powered drill, and does everything on site and it's all easier, more convenient, more flexible, and more reliable.

So why is bringing the tools to the site so much better in the case of data processing (as well as carpentry)? Well, you get more flexibility in what you do and you get to do it a lot faster.

To show you what I mean, let me give you an example. I'll start with a demo I saw a couple of years ago of a relatively light-weight in-memory BI tool.

The salesperson/demo guy was pretty dweeby, and he traveled a lot. So he had downloaded all the wait times at every security gate in every airport in America from the TSA web site. In the demo, he'd say, "Let's say you're in a cab. You can fire up the database and pull up a graph of the wait times at each security checkpoint. So now you can tell which gate to stop at."

The idea was great, and so were the visualization tools. But at the end of the day, there were definite limitations to what he was doing. Because the system is basically just drawing data out of the database, using SQL, all you were getting were lists of wait times, which were a little difficult to deal with. What you really wanted was the probability that a delay would occur at each of the gates, based on time of day and a couple of other things. But you sure weren't getting that in the cab.

Perhaps even worse, he wasn't really working with real-time data. For this purpose, by far the most important data is the most recent data, but he didn't have that; he couldn't really handle an RSS feed.

Now, consider what HANA's far more extensive capabilities do for that example. First of all, in HANA, data can be imported pretty much continuously. So if he had an RSS feed going, he could be sure the database was up-to-date. Second, in HANA, he could use the business functions to do some statistical analysis of the gate delay times. So instead of columns of times, he could get a single, simple output containing the probability of a delay at each checkpoint. He can do everything he might want to do in one place. And this gives him better and more reliable information.
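
To show what "a single, simple output" might look like, here is a minimal Python sketch of the calculation. It is not HANA code; the observations, the 20-minute threshold, and the function name are assumptions made up for the example, and a real model would take more than hour of day into account.

```python
from collections import defaultdict

# Hypothetical observations: (checkpoint, hour_of_day, wait_minutes).
observations = [
    ("A", 8, 25), ("A", 8, 12), ("A", 8, 31), ("A", 14, 5),
    ("B", 8, 9), ("B", 8, 22), ("B", 8, 7), ("B", 14, 4),
]

def delay_probability(obs, hour, threshold_minutes=20):
    """Estimate, per checkpoint, the share of past waits at this hour
    that ran longer than the threshold."""
    counts = defaultdict(lambda: [0, 0])  # checkpoint -> [delayed, total]
    for checkpoint, observed_hour, wait in obs:
        if observed_hour != hour:
            continue
        counts[checkpoint][1] += 1
        if wait > threshold_minutes:
            counts[checkpoint][0] += 1
    return {cp: delayed / total for cp, (delayed, total) in counts.items()}

probs = delay_probability(observations, hour=8)
best_checkpoint = min(probs, key=probs.get)
```

With this made-up data, checkpoint A was delayed in two of its three 8 a.m. observations and checkpoint B in one of three, so the guy in the cab gets one answer, B, instead of columns of wait times.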

So What Makes It Better?

Bear with me. The core difference between HANA and Exalytics is that in HANA, all the data is in one place. Is that a material difference? Well, to some people it will be; to some people, it won't be. As an analyst, I get to hold off and say, "We'll see."

Thus far, though, it appears that it is material. Here's why.

When I see a new design idea--and I think it's safe to say that HANA embodies one of those--I like to apply two tests. Is it simplifying? And is it fruitful?

Back when I was teaching, I used to illustrate this test with the following story:

A hundred years ago or so, cars didn't have batteries or electrical systems. Each of the things now done by the electrical system was thought of as an entirely separate function that was performed in an entirely different way. To start the car, you used a hand crank. To illuminate the road in front of the car, you used oil lanterns mounted where the car lights are now.

Then along came a new design idea: batteries and wires. This idea passed both tests with flying colors. It was simplifying. You could do lots of different things (starting the car, lighting up the road) with the same apparatus, in an easier and more straightforward way (starting the car or operating the lights from the dashboard). But it was also fruitful. Once you had electricity, you could do entirely new things with that same idea, like power a heater motor or operate automatic door locks.

So what about HANA? Simplifying and fruitful? Well, let's try to compare it with Exalytics. Simplifying? Admittedly, it's a little mind-bending to be thinking about both rows and columns at the same time. But when you think about how much simpler it is conceptually to have all the data in one database and think about the complications involved when you have to move data to a new area in order to do other operations on it, it certainly seems simplifying.

And fruitful?

Believe it or not, it took me a while to figure this one out, but Exalytics really helped me along. The "Aha!" came when I started comparing the business function library to the "Advanced Visualization" that Oracle was providing. When it came to statistics, they were pretty much one-to-one; the HANA developers very self-consciously tried to incorporate the in-database equivalents of the standard statistical functions, and Oracle very self-consciously gave you access to the R function library.

But the business function library also does…ta da…business functions, things like depreciation or a year-on-year calculation. Advanced Visualization doesn't.

This is important not because HANA's business function library has more features than R, but because HANA is using the same design idea (the Business Function Library) to enrich various kinds of database capabilities. On the analytics side, they're using the statistical functions to enrich analytics capabilities. On the transaction side, they're using the depreciation calculations to enrich the transaction capabilities. For either, they're using the same basic enrichment mechanism.

And that's what Oracle would find hard to match, I think. Sure, they can write depreciation calculation functionality; they've been doing that for years. But to have that work seamlessly with the TimesTen database, my guess is that they'd have to create a new data storage area in Exalytics, with new pipes and changes in the management tools.

Will HANA Have Legs?

So what happens when you have two competing design ideas and one is simpler and more fruitful than the other?

Let me return to my automobile analogy.

Put yourself back a hundred years or so and imagine that some automobile manufacturer or other, caught short by a car with a new electrical system, decides to come to market ASAP with a beautiful hand-made car that does everything that new battery car does, only with proven technology. It has crisp, brass oil lanterns, mahogany cranks, and a picture of a smiling chauffeur standing next to the car in the magazine ad.

The subtext of the ad is roughly as follows. "Why would you want a whole new system, with lots and lots of brand-new failure points, when we have everything they have? Look, they've got light; we've got light, but ours is reliable and proven. They've got a starter; we've got a starter, but ours is beautiful, reliable, and proven, one that any chauffeur can operate."

I can see that people might well believe them, at least for a while. But at some point, everybody figures out that the guys with the electrical system have the right design idea. Maybe it happens when the next version comes out with a heater motor and an interior light. Maybe it happens when you realize that the chauffeur has gone the way of the farrier. But whenever it happens, you realize that the oil lantern and the crank will eventually fall by the wayside.

About the Author

I run a small analyst firm in Cambridge, Massachusetts that does strategy consulting in most areas of enterprise applications. I am not a database expert, but for the past year, I have been doing a lot of work with SAP related to HANA, so I'm reasonably familiar with it. I don't work with Oracle, but I know a fair amount about both the TimesTen database and the Essbase database, because I covered both Salesforce (which uses TimesTen) and Hyperion (Essbase) for many years.

SAP is a customer, but they did not commission this piece, and they did not edit or offer to edit it in any way.