SQLAlchemy ORM: On the Road to Maintainability

Many older Python applications started their life relying on raw SQL or simple data access libraries like pyodbc to interact with a database. While this approach can be pragmatic early on, it often leads to maintainability and scalability issues. You can quickly find yourself spending way too much time troubleshooting enigmatic database errors, increasing logging output, and painstakingly debugging, only to find that your sprawling SQL string was missing a crucial comma. Wouldn’t it be nice to let your IDE prevent these errors long before you even execute the query? Or what if your application needs to be switched (or extended) to a new database platform entirely? Do you really want to comb the codebase for all the SQL that needs syntax tweaking? Migrating to a modern Python Object Relational Mapper (ORM), such as SQLAlchemy, brings powerful features like model validation, query abstraction, relationship management, and codebase clarity. And once you go all in on an ORM, your code becomes largely database-agnostic, and you can let SQLAlchemy deal with the particulars of the underlying database syntax without ever worrying about it.

There are varying degrees to which you can migrate to a new database ORM, and you don’t have to bite off the whole migration at one time. At first, SQLAlchemy can be introduced with only some bare-bones connection logic upgrades. When you want to go further, you can take advantage of SQLAlchemy’s Core features, or dive in with the whole ORM. Take, for example, this series of upgrades:

1. Raw SQL using pyodbc
The old way. Does this look familiar?

import pyodbc

conn = pyodbc.connect("DRIVER={ODBC Driver 18 for SQL Server};SERVER=localhost;DATABASE=mydb;UID=sa;PWD=secret")
cursor = conn.cursor()

cursor.execute("SELECT id, name, email FROM users WHERE name = ?", ("Lebowski",))
row = cursor.fetchone()

print(row.id, row.name, row.email)

2. Raw SQL with SQLAlchemy engine
Using SQLAlchemy for connection and execution only — no Core or ORM features yet. Still raw SQL, but with parameter binding and DB-agnostic connection management.

from sqlalchemy import create_engine, text

engine = create_engine("mssql+pyodbc://sa:secret@localhost/mydb?driver=ODBC+Driver+18+for+SQL+Server")

with engine.connect() as conn:
    result = conn.execute(text("SELECT id, name, email FROM users WHERE name = :name"), {"name": "Lebowski"})
    row = result.fetchone()

    print(row.id, row.name, row.email)

3. SQLAlchemy Core (Programmatic SQL)
This avoids raw SQL entirely and uses SQLAlchemy’s Table, Column, and select constructs. This makes it easy to compose queries dynamically.

from sqlalchemy import MetaData, Table, Column, Integer, String, select

metadata = MetaData()

users = Table(
    "users", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String),
    Column("email", String),
)

# Reuses the engine created in the previous example
with engine.connect() as conn:
    stmt = select(users).where(users.c.name == "Lebowski")
    result = conn.execute(stmt).fetchone()

    print(result.id, result.name, result.email)

4. SQLAlchemy ORM
Now we define a User class and use sessions to interact with the database through plain Python objects. This gives you a full object-relational mapping, well suited to large applications and sizable domain models.

from sqlalchemy.orm import declarative_base, Session
from sqlalchemy import Column, Integer, String

Base = declarative_base()

class User(Base):
    __tablename__ = "users"

    id = Column(Integer, primary_key=True)
    name = Column(String)
    email = Column(String)

with Session(engine) as session:
    user = session.query(User).filter_by(name="Lebowski").first()

    print(user.id, user.name, user.email)

Cool, right? Why would anyone NOT want to go the full ORM route? Well, there can be a couple of stumbling blocks on the road to ORM bliss. I’ve been working on a project for a while that uses a SQL Server database. That database schema was largely formed from the ingestion of Pandas dataframes, extracted from a separate database that had no relational or data-integrity constraints to speak of. Most of the tables in the database have no primary key constraints at all, and this can cause problems for SQLAlchemy. In order for the ORM to fully work its magic and abstract all of the finer details away, it needs to rely on some minimal database structure to reason about your data (primary keys and relational constraints are the big ones).

Initially, you could adopt a hybrid approach where you specify primary key constraints in your SQLAlchemy model, even though they don’t actually exist in the underlying database tables. But then you still have to ensure that the data in the database obeys all the rules (unique, non-NULL values) without the database actually enforcing them. SQLAlchemy features like reflection (autoload_with=engine) might get confused, and your results could be “inconsistent” from time to time. Thus, it’s probably not a bad idea to go ahead and do the database work now, to ensure that you get all the sweet, sweet “ORM-ness” from the start.
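For reference, the hybrid approach mentioned above would look something like this (the table and column names here are hypothetical): the model declares a primary key purely for the ORM’s benefit, while the underlying table enforces nothing.

```python
from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class LegacyUser(Base):
    # Hypothetical legacy table; the real table has NO primary key constraint
    __tablename__ = "legacy_users"

    # Declared only so the ORM can identify rows; nothing stops
    # duplicate or NULL ids at the database level
    id = Column(Integer, primary_key=True)
    name = Column(String)
```

This maps cleanly as long as the data actually behaves, but the moment a duplicate or NULL id sneaks in, the ORM’s identity map has no way to tell rows apart.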

The simplest strategy (at least for my use-case) would be to locate every table that needs a new primary key constraint and then:

  • Create an identical table called <table_name>_new that includes a primary key constraint
  • Verify there are no duplicate or NULL values in the candidate primary key column in the original table
  • Insert all the data from <table_name> into <table_name>_new
  • Drop <table_name>
  • Rename <table_name>_new to <table_name>

One thing to keep in mind in this process is the IDENTITY property in SQL Server. Primary key columns in SQL Server are commonly defined as IDENTITY columns. This means that the database will manage an auto-incrementing sequence of integers and will automatically set the primary key value to the next number in the sequence upon row insertion. But, if your old table already has existing values in the primary key column that you want to keep, you have to first allow the insertion of those values (enable IDENTITY insert) on the new table before copying the data from the old table.

This is actually the reason we created a new table in the first place: you can’t just ALTER a column in an existing table and add the IDENTITY property. It has to be set from the start. After the data has been moved to the new table, you can disable IDENTITY insert, and the database will automatically start populating the primary key column for new inserts, starting at the current MAX(primary_key_value) + 1. So the whole process for one table would look like this:

CREATE TABLE users_new (
    id [int] IDENTITY(1,1) PRIMARY KEY,
    name [nvarchar](255)
);

The primary key column can’t have any NULL values, so check for those.

SELECT COUNT(*) FROM users WHERE id IS NULL;

Then verify that there are no duplicates in the primary key column either.

SELECT id, COUNT(*) FROM users GROUP BY id HAVING COUNT(*) > 1;

If the first query returns a count of zero and the second returns no rows, you’re in business. When IDENTITY insert is enabled, you have to specify all of the column names when running INSERT statements. A simple INSERT INTO users_new SELECT * FROM users won’t work.

SET IDENTITY_INSERT users_new ON;
INSERT INTO users_new (id, name) SELECT id, name FROM users;
SET IDENTITY_INSERT users_new OFF;

Now you can just drop the old table and rename the new one.

DROP TABLE users;
EXEC sp_rename 'users_new', 'users';

And, you can verify that the IDENTITY sequence on the new primary key field is ready for your next insert by checking the MAX(id) value against the current value of the sequence.

SELECT MAX(id) FROM users;
DBCC CHECKIDENT('users');

And when that’s all done, just remember to update any of your Python code that is currently supplying new ID values when inserting into those tables. From now on, your database will handle those.
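On the Python side, that change is just a matter of no longer assigning the ID yourself. A minimal sketch, using an in-memory SQLite database whose autoincrementing primary key stands in for the SQL Server IDENTITY column:

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class User(Base):
    __tablename__ = "users"

    id = Column(Integer, primary_key=True)  # database-generated, like IDENTITY
    name = Column(String)

engine = create_engine("sqlite://")  # stand-in for the real database
Base.metadata.create_all(engine)

with Session(engine) as session:
    # Before: User(id=compute_next_id(), name="Walter")  <- no longer needed
    user = User(name="Walter")  # leave id unset; the database assigns it
    session.add(user)
    session.commit()
    print(user.id)  # populated by the database after the flush
```

SQLAlchemy fetches the generated key back into the object automatically, so the rest of your code can keep reading user.id as before.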

Finally, to round out your database upgrades, it wouldn’t hurt to add some referential integrity constraints to allow your database (and the SQLAlchemy ORM) to “keep your code honest.” This step is fairly straightforward and doesn’t require any data movement or table destruction. You just need to have solid knowledge of your underlying data model. If an ID value in one table is already “related” to an ID column in another table, you might as well throw a FOREIGN KEY constraint on the child table to make sure the matching ID values stay in sync. To do this, you can simply follow this pattern on any child table that needs a constraint.

ALTER TABLE <child_table>
ADD CONSTRAINT FK_<child>_<parent> 
FOREIGN KEY (<child_column>) REFERENCES <parent_table>(<pk_column>);

If you run into any errors with these statements, it might mean that your data has already drifted a bit, and you need to make some corrections. This can be a bit of a pain, but just know that once you get this done, the database will manage this from now on, and you shouldn’t run into this problem again.
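Once the database-side constraints exist, your SQLAlchemy models can mirror them with ForeignKey, and relationship() then lets you navigate between objects without writing joins. A sketch with hypothetical users and orders tables, again using in-memory SQLite in place of the real database:

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()

class User(Base):
    __tablename__ = "users"

    id = Column(Integer, primary_key=True)
    name = Column(String)
    orders = relationship("Order", back_populates="user")

class Order(Base):
    __tablename__ = "orders"

    id = Column(Integer, primary_key=True)
    # Mirrors the FOREIGN KEY constraint added in the database
    user_id = Column(Integer, ForeignKey("users.id"))
    user = relationship("User", back_populates="orders")

engine = create_engine("sqlite://")  # stand-in for the real database
Base.metadata.create_all(engine)

with Session(engine) as session:
    # Hypothetical data: one user with two orders
    user = User(name="Lebowski", orders=[Order(), Order()])
    session.add(user)
    session.commit()
    print(len(user.orders))  # navigate the relationship, no manual joins
```

With the constraint enforced on both sides, the database rejects orphaned rows, and the ORM handles the key bookkeeping when related objects are added or removed.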

And with those database adjustments in place, you should be ready to take full advantage of SQLAlchemy’s ORM. Begone, raw SQL! Your time is at an end. From now on, your data model and data access code will be pure Python objects, just like the rest of your codebase. Welcome to the land of blissful code consistency, with a sprinkling of database integrity thrown in for good measure.

Migrating to an ORM like this can take a little work, but the benefits in clarity, structure, and long-term maintainability are well worth it. Treat it as a gradual process: add primary keys where needed, model your entities step-by-step, and consider hybrid techniques as a transitional strategy. In the end, your codebase will be easier to navigate, your queries more consistent, and your data model finally documented in code. Let the migration commence!
