The Data Disconnect

I saw yesterday on Twitter that Sam Gentile had responded to Oren Eini’s “Impedance Mismatch and System Evolution” post, which was itself a response to Stephen Forte’s “Impedance Mismatch” post. Phew, you got that? That is why I love the blogger world: it is like one big distributed, asynchronous conversation! In these posts there was a line from Stephen Forte that Oren had a problem with:

My first problem with ORMs in general is that they force you into a “objects first” box. Design your application and then click a button and magically all the data modeling and data access code will work itself out. This is wrong because it makes you very application centric and a lot of times a database model is going to support far more than your application.

Oren then went on to say that he “absolutely rejected” the statement that the database model is going to support far more than your application. Sam Gentile then chimed in that he fully agreed with Oren’s statement. I also agree with it, but that isn’t really why I am writing this blog post. I am writing it because I think that we are at the edge of a huge schism in the data-centric application development community.

If you read Stephen Forte’s post you will see that it actually started out by talking about the Entity Framework, and I think that is important to note. Why? Because the power of these tools comes from a fundamental shift in who owns an application’s data. So, who does own the application’s data? Well, I think that you can tell which way I lean just by the way I worded my last sentence. In my mind, the data is owned by the application that is reading and writing the data, and hopefully most people will agree with this. But what happens when you have numerous applications reading and writing the same data? Well, you have to move the control over the data into the database, because the database becomes the single point through which all data passes.

Historically, stored procedures have been viewed as the data gatekeepers. You put a bunch of stored procs in your database and they shielded your tables from bad input, incorrect or incomplete data, and so on. In fact, many people have argued for putting business logic in stored procedures, because how else would you make sure valid data is entering the database if you have a multitude of applications hitting these stored procs? Well, you can’t really. If you don’t put business logic into your stored procs, then your data is only as good as the individual application that is entering the data. The proverbial “weakest link” problem.

How do you combat this? It is simple really: you move from a database-centric view to an application-centric view, the very thing that Stephen was speaking out against above. And I know that there are going to be people out there who will throw their hands up in the air and say “blasphemy!” when they read that. But I am not arguing that we throw the database out or marginalize its importance. I am saying that the single point through which all data passes needs to be pushed up. There needs to be application code responsible for transforming and validating data. In fact, most well thought out applications are using an architecture like this already. They have an abstraction layer that sits between the database and the domain model and shields the application from schema changes, and they also have business logic that ensures good data. The issue is that there may be several applications hitting a database, each with its own abstraction layer operating directly against the database schema.

All of that data should be funneled through a single application! If you see the need for two applications to be hitting the same schema, then you should at least define a thin service layer between the database and those two applications. Even if the applications need the same schema now, they most likely won’t need the same schema a few months, or years, from now. Business logic just doesn’t work like that.
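
To make that concrete, here is a minimal sketch of what such a thin service layer might look like. It is only an illustration under my own assumptions: Python and an in-memory SQLite database are used for brevity, and CustomerService, ValidationError, web_signup, and batch_import are hypothetical names, not anything from the posts I am responding to. The point is that the validation rules live in one place and neither application knows the table layout.

```python
import sqlite3
from typing import List, Optional, Tuple


class ValidationError(ValueError):
    """Raised when input fails the service's business rules."""


class CustomerService:
    """Hypothetical thin service layer: the only code that touches the schema."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS customer ("
            "id INTEGER PRIMARY KEY, name TEXT NOT NULL, email TEXT NOT NULL)"
        )

    def add_customer(self, name: str, email: str) -> int:
        # Business rules live here once, not in every calling application.
        name = name.strip()
        email = email.strip().lower()
        if not name:
            raise ValidationError("name is required")
        if "@" not in email:
            raise ValidationError("email looks invalid")
        cur = self._conn.execute(
            "INSERT INTO customer (name, email) VALUES (?, ?)", (name, email)
        )
        self._conn.commit()
        return cur.lastrowid

    def get_customer(self, customer_id: int) -> Optional[dict]:
        row = self._conn.execute(
            "SELECT id, name, email FROM customer WHERE id = ?", (customer_id,)
        ).fetchone()
        if row is None:
            return None
        return {"id": row[0], "name": row[1], "email": row[2]}


# Two hypothetical applications share the service instead of the tables.
def web_signup(service: CustomerService, form: dict) -> int:
    return service.add_customer(form["name"], form["email"])


def batch_import(
    service: CustomerService, rows: List[Tuple[str, str]]
) -> Tuple[List[int], List[Tuple[str, str]]]:
    accepted, rejected = [], []
    for name, email in rows:
        try:
            accepted.append(service.add_customer(name, email))
        except ValidationError:
            # The batch job gets the same rules as the web form.
            rejected.append((name, email))
    return accepted, rejected
```

Both callers go through add_customer, so a rule added there applies to every application at once, which is exactly the "weakest link" problem going away.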

I am not advocating anything new here. People have been discussing it for years; it is called Service Oriented Architecture. And yes, I know that SOA is probably the most overloaded term in the history of software development, but in its most basic form it represents an architecture where you have a bunch of loosely connected services. So, what are those services? Well, they could take many forms, but they are all going to be some application that is sitting in front of a data back-end. And in this layer, all of our bajillion lines of business and application logic reside. The data is abstracted behind many layers so that we can mold, transform, and even query from multiple sources. All of this is hidden from the application or person that requests the data, just as it should be.

If you have a single rich layer of business and application logic that can be as thin or as thick as you need it, that can be changed and molded to produce the data you need, and can keep knowledge of your schema out of the hands of many other applications, then how can this be a bad thing? In my mind I would think that even the data people would love this, because it means that the database schema can be much more flexible and can change more rapidly to suit the needs of the company. Or not change at all, since the service layer can transform the data, if the needs of the enterprise dictate this.
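
Here is a small sketch of that last point, again in Python with SQLite and again with made-up names (Customer, CustomerReader, and both table layouts are assumptions of mine for illustration). The service hands callers the same shape whether the table stores one full_name column or separate first_name and last_name columns, so the schema can change without the callers noticing.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Customer:
    """The shape callers see; it does not change when the tables do."""
    id: int
    display_name: str


class CustomerReader:
    """Hypothetical read service hiding which table layout is in use."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        # PRAGMA table_info yields one row per column; index 1 is the column name.
        columns = {row[1] for row in conn.execute("PRAGMA table_info(customer)")}
        if "first_name" in columns:
            # Newer layout (assumed): name split across two columns.
            self._query = (
                "SELECT id, first_name || ' ' || last_name "
                "FROM customer WHERE id = ?"
            )
        else:
            # Older layout (assumed): a single full_name column.
            self._query = "SELECT id, full_name FROM customer WHERE id = ?"

    def get(self, customer_id: int) -> Optional[Customer]:
        row = self._conn.execute(self._query, (customer_id,)).fetchone()
        return Customer(*row) if row else None
```

A real service would more likely be deployed alongside a schema migration than sniff the columns at runtime; the detection here just keeps the example self-contained.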

But where do the ORM tools come into this? If you are looking at a database as the center of the application, with multiple applications hitting the same database, then an ORM solution would probably look less realistic for you. ORMs’ biggest advantages come into play when they are generating their own SQL (so as to avoid having to maintain separate stored procs) and when you let the database schema stay relatively simple and have the translation layer massage the data into the exact format you need. These goals can often conflict with the database-centric view of an application, which likes to keep more of this control inside the database.
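
For anyone who has not seen the "generating their own SQL" idea up close, the toy below shows the principle. It is emphatically not how the Entity Framework or any real ORM is implemented; it is a sketch in Python where the Invoice class and the insert_sql, select_sql, save, and load helpers are all hypothetical, and the table name is simply assumed to be the lowercased class name. The SQL is derived from the object's fields, so there is no hand-written stored procedure to keep in sync.

```python
import sqlite3
from dataclasses import astuple, dataclass, fields
from typing import Optional, Type


@dataclass
class Invoice:
    id: int
    customer: str
    total_cents: int


def insert_sql(cls: Type) -> str:
    """Build an INSERT statement from the dataclass's field names."""
    cols = [f.name for f in fields(cls)]
    placeholders = ", ".join("?" for _ in cols)
    table = cls.__name__.lower()  # assumed naming convention
    return f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({placeholders})"


def select_sql(cls: Type) -> str:
    """Build a SELECT-by-id statement from the dataclass's field names."""
    cols = ", ".join(f.name for f in fields(cls))
    return f"SELECT {cols} FROM {cls.__name__.lower()} WHERE id = ?"


def save(conn: sqlite3.Connection, obj) -> None:
    # astuple returns values in field order, matching the generated columns.
    conn.execute(insert_sql(type(obj)), astuple(obj))


def load(conn: sqlite3.Connection, cls: Type, obj_id: int) -> Optional[object]:
    row = conn.execute(select_sql(cls), (obj_id,)).fetchone()
    return cls(*row) if row else None
```

Adding a field to Invoice changes the generated statements automatically, which is the maintenance win; it also shows why the approach fits best when the schema stays simple and close to the objects.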

So, is the data divide that I am talking about just an argument between the data and service centric approaches? That is the way that I see it. Did I think that we would still be arguing this point in 2008? Nope.

Comments (2)

  1. "To the new architect, there are many choices. For the master architect, there are but few." One of my favorite quotes from Juval Löwy.

    What these experts are wringing their hands about is the vast array of choices in the design of data-driven systems. But there isn’t a vast array of options if you don’t allow direct database access in the first place. If data were a service, not a language and a protocol as it has been historically, we wouldn’t be having this debate, I think. See my recent rant:

    http://www.gotnet.biz/Blog/post/An-Unfortunate-Consequence-of-History.aspx

    I’m not sure I understand Stephen Forte’s concern, to be frank. If I create services for one application, how does that encumber me when faced with the fact that "a lot of times a database model is going to support far more than your application"? When new applications come along, just write another set of services if you need to. And when you recognize that the now two sets of services share common business logic, refactor it into yet another layer of shared services. I believe you echoed this when you said "define a thin service layer between the database and those two applications" and then make it "as thin or as thick as you need it". Well said.

    SOA is not mysterious and terse, despite IBM’s and Oracle’s attempts to convince us otherwise. It’s not a product strategy. It’s a simple philosophy that says everything’s a service. The degree to which you choose to codify services using an SOA framework like WCF is where Juval and I don’t perfectly agree. But he’s right about the simplicity of the SOA mindset. There is no database. There may be persistence services underneath layers of services implementing business logic. Within the persistence services, there may be an implementation strategy using Object/Relational Mapping tools. But that’s all O/RM should ever be in my mind: an implementation detail near the bottom of the services stack.

    Unfortunately, I think O/RM has become the "Vietnam of Computer Science" as Ted Neward says. People love to argue about what it means but, in my opinion, we need an airlift out of that war zone.
