Object Relational Mapping (ORM) software is a great idea. You can’t deny that the mismatch between objects and relational data has to be dealt with. Instead of all the time, money and effort being spent here, why not get a tool that does most of the work for you? But… One direction that this can lead is towards dumb databases. After all, if putting a piece of software between the object & the db makes things easier, how much easier if the db and the object look exactly the same. Ta-da! Even less code to write & maintain. Unfortunately, TANSTAAFL (There Ain’t No Such Thing As A Free Lunch) still applies. What you save in initial coding you will pay for in reporting, data cleanup, integrity issues, data integration issues… Anyway, I’ve been researching this, since, as I mentioned before, my company is looking to implement ORM and the architects in charge of the project are really excited by the idea of making the database a reflection of the object. Here are the list of concerns and potential issues that I’ve come up regarding ORM. Any comments or suggestions around this would be useful. In no particular order:
- Â I/O increase due to “chattier” applications:
Most documentation indicates a lot more, smaller transactions, not to mention the possibility of frequent requests to verify structure (the app checking to see if the database has changed) prior to generating & running queries.
Mitigation is to ensure appropriate configuration & use of nHibernate. Monitoring can be done with Profiler - I/O increase due to loading larger data sets more frequently
Ensure the use of “lazy” collections to reduce data moved
No other mitigation possible. - General performance issues due to “generic” procedures using less efficient access methods
Generic data types used in queries can lead to indexes not being used - Zero possibility to tune queries in a production setting
Any, all, changes require re-coding and re-deployment. There are no methods available for database only tuning except applying an index or forcing parameterization. - Reporting
This includes transactional level reporting as well as moving data between a model driven design and a more flexible design (normalized or star schema or whatever) that better supports reporting. Coding time reduced on the front end is, to some degree, tacked on to the back-end.
A data cleansing mechanism may be required.
Redundancy of data an authoritative sources for data may require some refinement. - Data integrity
Generated structures are dependent on discovery to determine the proper constraints required on the data, or all constraints are assumed to be in the code.
Without integrity maintained with data the possibility of “dirty” data is increased (“USA”, “U.S.A.”,”US” all values entered through app). - Security
We will have to give over full read/write privileges at the table level to the application. I think, it’s possible, we’d have to give it ‘dbo’ in production. Based on a few statements in some of the research, it’s even possible we’d have to give it ‘sa’ (although that is completely unproven currently). - Integration with other systems at the data level
Depending on the application, this may not be required. But if it is required at any point, it will entail a larger than normal effort to convert the data to a more normalized structure.
You are correct about your point on Security. Can I ask, are there really horror stories out there for data being compromised this way, or is it just a paranoia thing?
I’m a little unsure of this statement: “mismatch between objects and relational data”. Do you mean mismatch because one has evolved at a different pace than the other? Or do you mean that object X should match Table X and Object Y should match Table Y?
If the latter, than I totally disagree that you would want to match them! A relational model serves one primary purpose: To eliminate data anomalies through normalization. This results in, say 5 tables, to represent a single entity (like a product, customer, order, etc.). As you know, relational databases are not the place to embed business logic, let alone behavior. That’s the realm for business objects, which in once instance would represent a Product, Order, or Customer. SO a single object might represent what a dozen or so tables in a relational model represents — and in fact, depending on security, compliance issues, etc., may only represent a small portion of the data that exists in the underlying tables.
Anyway, I’m not too much up on ORM, but I would like to see it discussed more…
To Brendan:
A bit of both really. The fact is, it does expose the database more from a security point of view. That should raise concerns for anyone. However, most of the documentation I’ve read suggests that most ORM products, including nHibernate, use parameterized queries. This will prevent SQL Injection attacks. However, since we’ll have to expose ourselves to a higher risk level, I want to ensure that we make that a part of the known and agreed to issues.
To Tod McKenna:
Actually, we totally agree. The mismatch is that objects store data one way, as they should, and relational databases store the data a different way, as they should. I’m not proposing that the objects should reflect the structure of the database. Our developers are proposing that the structure of the database reflect that of the objects. They say this will make the programming easier, which it will. Unfortunately it will also make integration at the data level and reporting from this system and possibly other things I’m not thinking of, much more difficult. “There ain’t no such thing as a free lunch” still seems very applicable in this situation.