Nothing found, try being more general or more specific

Why ORM is a harmful pattern and should be avoided

Mapping of database storage and its relations to in-memory objects to ease processing can be seen as challenging and noble task, but its the vietnam of CS, a holywar topic.

The good parts

Let me mention the good parts of ORMs that I like, so that you don't think that I dislike them blindly

  1. Easy CRUD​ operations and thus code is less polluted with SQL
  2. Validation of write operations, escaping
  3. Generated objects as results. Useful because of strict schema definition, autocomplete, json generation.
  4. Methods like save(), delete() are more specific than plain query() interface
  5. Filtering can be composed with strict blocks using criteria API (that is more modular) than with SQL string concatenation

The bad parts

Abstraction should be easier than SQL, but its not. It leaks

ORM as a concept is flawed because of inadequate and leaky abstraction. This can be noticed in many details you need to learn and research, which makes up a very steep learning curve for almost any ORM engine.

1. Transaction as a concept is not native to single threaded in-memory process. Its an alien from relational world. You still need to either explicitly state in code that you're starting transaction and add try-catch-retry part, or you must always use transaction per request which usually locks too much. Another problem is that you can't control order of queries, which make it very easy to encounter a deadlock in concurrent requests. So ORM does not abstract the concept of transactions for you

2. Associations in DB are very general and only deal with count – 1:n, n:m. But in process/memory world, it must be  resolved to something more specific – inheritance, incapsulation, reference. It also affects when should linked rows be loaded. For abstraction layer, thats too much details that need to be configured. Usually it is solved by tying object declarations between each other  – Doctrine uses Discriminator Column Annotation, Bookshelf uses association bindings.

3. Equality. The semantics of equal function may depend on where data came from. What should happen if you compare an object that came from DB and a plain object constructed natively if all fields are the same? It depends. In DB you can have multiple rows with same data entries and you can specify what UNIQUE index means. In memory world you can compare by reference or by value, but in reality your model needs context where data came from, so that it can decide if its the same data or not

Every ORM is flawed in its own way

So ORMs as libraries are flawed from the beginning. And every implementation adds problems as it goes along. Some can't remap data fields, some can't store time as precise as database can natively, some don't reference columns in consistent way..

ORMs introduce new terminology. Similar to angular's transclusion, you need to understand what hydration, unit of work pattern, identity mapping, behaviours are.

Java – JPA, Hibernate, EclipseLink, Oracle TopLink, OpenJPA
PHP – Propel, Doctrine, Eloquent, RedBean
Python – SQLAlchemy, Django
Ruby – ActiveRecord, DataMapper
Node – Bookshelf, KnexObjectionSequielizeWaterline

Some ORMs introduce new security vulnerabilities (while fixing SQL injections)

New query language that some mappers introduce, can have security bugs and considering how big these libraries tend to grow, its no surprise

Some ORMs have cascade update/deletion that uses temporary table

Hybernate and Doctrine internally use temporary delete and update tables for cases where you delete/update entries that are affecting other tables due to reference (cascade). This is done to avoid memory limits of WHERE id IN (..ids..). 

This is slow and "temporary" tables are not fast and can get deadlocks

Some ORMs force models to inherit its base class and complicate testing

A domain model class usually (but not always) needs to extend ORM just to have persistence to DB, which is wrong. 

Domain object should not inherit all of the ORM's methods, because it limits domain object's inheritance, pollutes class namespace with inherited methods that should not be overwritten, limits serialization of object, increases memory consumption.

Data persistence must be stored as a property, an adapter incapsulated into domain model class

Example, node / bookshelf:

var User = bookshelf.Model.extend({
  tableName: 'users',
  posts: function() {
    return this.hasMany(Posts);
  }
});

Here is similar stuff in ruby / ActiveRecord:

class Product < ApplicationRecord
end

(Product.methods.sort - Object.methods.sort).count
# => 391 methods !

ORM increases complexity by injecting itself as new declarative language

The problem is the magic that ORM introduces. This magic relies on developer to setup object mapping in declarative way. So you have entities with properties and their types.

Now lets add complexity with associations. Your table A references table B.  Add a new config about that. Now should object load all of its association tree? How about if you delete an object, do you want your ORM to trigger delete methods on associated entities, or do you want only a cascade deletion in DB?

All of those are architectural decisions that your data mapper is doing it instead of you. And its doing it without your explicit, imperative code, just by reading some annotations. I could show you HTML node with 20 attributes that are added for angular to bind its logic and you wouldn't guess what is going on, without digging through angular and directives. ORMs are the same.

Here is Doctrine explaining how to map inheritance..

<?php
// user mapping
namespace MyProject\Model;
/**
 * @MappedSuperclass
 */
class User
{
    //other fields mapping

    /**
     * @ManyToMany(targetEntity="Group", inversedBy="users")
     * @JoinTable(name="users_groups",
     *  joinColumns={@JoinColumn(name="user_id", referencedColumnName="id")},
     *  inverseJoinColumns={@JoinColumn(name="group_id", referencedColumnName="id")}
     * )
     */
    protected $groups;

    /**
     * @ManyToOne(targetEntity="Address")
     * @JoinColumn(name="address_id", referencedColumnName="id")
     */
    protected $address;
}

// admin mapping
namespace MyProject\Model;
/**
 * @Entity
 * @AssociationOverrides({
 *      @AssociationOverride(name="groups",
 *          joinTable=@JoinTable(
 *              name="users_admingroups",
 *              joinColumns=@JoinColumn(name="adminuser_id"),
 *              inverseJoinColumns=@JoinColumn(name="admingroup_id")
 *          )
 *      ),
 *      @AssociationOverride(name="address",
 *          joinColumns=@JoinColumn(
 *              name="adminaddress_id", referencedColumnName="id"
 *          )
 *      )
 * })
 */
class Admin extends User
{
}

ORMs hide real number of queries

Causes N+1 performance problem in code you write

ORMs make it easy to write (not so obviously) inefficient code that is hard to optimize later. Here is a regular cycle that looks fine if you don't take into account that every get does request to DB..

$companies = getAllCompanies();
    $totalValue = 0;
    foreach ($companies as $company) {
        // For each company there is, retrieve the total value of all their orders
        $orders = $company->getOrders();
        foreach ($orders as $order) {
            $totalValue += (float) $order->getValue();
        }

        echo "This company made us ". $totalValue ." euro already.";
    }

You might think that developer should have known his tools! But there is more..

Leads to eventual N+1 performance problem in code you don't see

Because your code relies on schema definition, its very easy to decrease performance by adding associations with other tables. As a result, your simple get() all over your code can suddenly increase ten-fold all over the place.

To quote Roman.

But the true is in Hibernate it can be any number (of queries). Thousands or millions. And the most cool feature is this number may be easily (and often unintentionally)  changed in several different places of the code!

It is because of autoloading. When you designed User refers to a collection of Roles you most probably never thought that one of your colleagues will later add ref from Role to let say Permission. And another one (couple of week later) from Permission to something else.

As a result, your fast and pretty loadUser now executes 100 times slowly. And even when you accidentally find it you will have to analyze the whole model to understand what happened and then start with huge refactoring to make your code fast again without breaking codes of you colleagues (that relies on Permission collection now!)

Adds redundant queries and leads to eventual performance degradation

Same issue can happen if junior developer wants to update one field in DB. So the obvious ORM-way is to get() row by id, set field value and write to DB. But if you're junior you don't know about lazy loading or what get() exactly it triggers. It can be entire object with blob fields or a hierarchy of multiple entities. And on test environment you will not notice it, until it reaches the production.

ORMs can also add minor tax to load metadata, like DB table schema. 

If you load list of results and then process them, your performance does degrade a little, because you traverse same list of data several times.

It forces IoC on entities but does not force domain model layer, leading to broken OOP

Another bad effect of ORMs if they use Data Mapper pattern instead of Active Record, is that they become smart storage layer for entire application while making objects that they operate with, dumb. These vary from Data Transfer Objects or Entities that should not have any logic, to Value Objects which can have minor functions (getters/setters and some simple string/math conversions).

The problem thus is that ORMs force developers to break OOP into separate layers – data manipulation is done with data/entity mapper and DTOs, but ORM says nothing about where should business logic go. And having object storage apart from its processing makes it very hard to see the whole picture later

Here is php / doctrine saving user:

$user = new User();
$user->setName($newUsername);

$entityManager->persist($user);
$entityManager->flush();

Notice that User class doesn't know where it will be saved. It has no control over it. This is IoC in practice and it shows who is more important – domain objects or mapper.

With true OOP, User object would hold both data and functions. Now, it is assumed by the ORM, that developer will write DTO + ORM and a separate model class for the user that will hold all of the manupulation logic. In Symfony, that means writing UserService that will have create() method with the code above. The problem is that its very easy not to do that and to put same DTO manipulation code in other service and cause chaos. Having ORM thus is no different from having direct DB connection in service layer.

ORMs lead to interface lazyness

Without ORM if you were to create raw SQL to update user's name, you would see that its too much code and move it to rename() method.

With ORM however, you have smaller CRUD methods. Thats convinient, but it becomes parameter hell. You rely on .update() and don't bother to move it into a separate method user.rename(). This code becomes too verbose if you have more properties that are updated.

ORM increases complexity by duplicating SQL (twice)

ORM is supposed to hide SQLs from code, but in reality, code is still littered with select-from-where syntax parts. This happens because its hard or inefficient to do more advanced queries with basic get/update/delete. So you want to filter something with AND + OR, order by several columns? You know how SQL is supposed to look, but now try to jump with function chaining and parameters in right places and format just to build it!

In Java / Hybernate this is done with Criteria API or with HQL.
In PHP / Doctrine this is done with Filtering API or with DQL.

SQL gets duplicated twice!

But what really irritates me is that it uses specification pattern which distorts naming of objects and methods. Methods are supposed to be verbs, not "eq" or "and". I've seen it in phpunit too and it sucks. Do you like this syntax, or would you prefer regular query as a string that you can copy, paste and execute to debug?

// $qb instanceof QueryBuilder

$qb->select(array('u')) // string 'u' is converted to array internally
   ->from('User', 'u')
   ->where($qb->expr()->orX(
       $qb->expr()->eq('u.id', '?1'),
       $qb->expr()->like('u.nickname', '?2')
   ))
   ->orderBy('u.surname', 'ASC');
   

// example7: how to define:
// "SELECT u FROM User u WHERE u.id = ? ORDER BY u.name ASC"
// using QueryBuilder using Expr\* instances
$qb->add('select', new Expr\Select(array('u')))
   ->add('from', new Expr\From('User', 'u'))
   ->add('where', new Expr\Comparison('u.id', '=', '?1'))
   ->add('orderBy', new Expr\OrderBy('u.name', 'ASC'));

Its painful to go away or go hybrid

Once you start using ORM, it becomes increasingly hard to use regular queries. Usually because ORM's introduce internal caches and your native code will make it less useful. Another issue is that ORMs provide all of the nice mapping that you are used to, whereas your native queries will not.

9fe2ccb813.1496050133.jpg