We developers have been working with SQL databases for years. I wouldn't say they are ideal, but they get the job done. Probably the biggest problem with them nowadays is that their data storage model is based on tuples and relations, and those map poorly to the object-oriented programming paradigm. People have tried solving that problem by building sophisticated mapper tools (Object-relational mappers, or ORM's) that help in turning objects to tuples and vice versa. And while ORM's are usually an improvement to the situation, they don't come without a price tag. They bring their own issues like added complexity, and, to add a sweeping generalization, just more crap to deal with.
The above might be the reason so many people turned to NoSQL databases. We saw simpler data models and systems that were easier to program against. We were basking in the glory of schemaless data, JSON transport and storage formats and MapReduce jobs, until we realized that we have mostly traded one set of problems for another. And I doubt the newer set of problems is any smaller. For example, while there are specific real-world problems that certain NoSQL systems excel at, I am yet to see a NoSQL database that is as general as relational ones and can cover the same vast range of applications.
What If We Could Have Relational sans ORMs or SQL?
What if we thought of relational databases in the same way as we do with any new type of NoSQL database that pops up? Let's have a little mental experiment, shall we? Imagine a NoSQL database that is as solid and mature as a traditional SQL one. It has this weird data model of having to define your data collection as tables with predefined attributes and requires that you know those attributes in advance. Well, it also has several advantages. For example, it doesn't require you to write a MapReduce job for every query you'd like to throw at it. In fact the database supports ad-hoc queries since it doesn't necessarily rely on pre-built indexes or views to fetch results. Oh, and it also has full ACID support so that it never loses your data.
How would we query such a database without wasting time with ugly SQL? We would need an API that will let us define our table schema and then allow us to craft queries using simple abstractions like collection maps, filter, joins, etc. I don't mean a heavyweight ORM solution either. If we are after simplicity, we'd better forgo dealing with object mappings and the complexity they bring. All we want is a hassle-free way to model our data and read and write it.
SQLAlchemy Core to the Rescue
SQLAlchemy is a mature SQL database toolkit that may make our dream of quick and easy access to SQL data come true. It is split in two parts: SQLAlchemy Core and SQLAlchemy ORM. "Core" is an amazingly simple API that lets you work productively with any SQL database. It both hides the peculiarities of specific DB dialects away and provides the building blocks for building sophisticated applications and database management tools. Let's see how it does that.
In order to work effectively with Core, you need to make it aware of the database schema. That is you either have to define the structure of your tables or let it inspect them from an active database. In my tests I am using a simple two-table structure that models authors and their published books. Here is how the definition looks like:
1 2 3 4 5 6 7 8 9 10 11 12 13
metadata = MetaData() authors = Table("authors", metadata, Column("id", Integer, primary_key=True), Column("name", String), Column("address", String), ) books = Table("books", metadata, Column("id", Integer, primary_key=True), Column("author_id", None, ForeignKey("authors.id")), Column("title", String, nullable=False), Column("isbn", String, nullable=False), )
The definition API is simple enough: we create two tables with the respective columns, each with its types. We also set up a foreign key linking
authors via the
author_id column. So far so good.
To insert a row, we ask a table to create an insert command for us, and then we execute that command with the needed arguments. Here's how to insert a new author:
1 2 3
insert_author = authors.insert() inserted_author = conn.execute(insert_author, name="J.R.R. Tolkien", address="Middle-earth") first_author_id = inserted_author.inserted_primary_key
Since we are using an autoincrement primary key, it might come in handy if we got the newly-inserted one after the insert command finishes executing. We do that via the
inserted_primary_key property. Note that it is a list of values since we may have multi-column keys.
Let's now insert a book.
1 2 3
insert_book = books.insert() conn.execute(insert_book, author_id=first_author_id, title="The Hobbit", isbn="12535ISBNYAY")
Nothing fancy other than the tricky bit that we're associating the book with our first author by passing the correct
Inserting data is all fine, but how do we get it back? Using the
select API and expressions of course. Here is a simple select command that gets all authors:
1 2 3 4 5
all_authors = select([authors]) authors_result = conn.execute(all_authors) for row in authors_result: print("%d -> %s" % (row["id"], row["address"])) authors_result.close()
select takes a list of columns or (for convenience) entire tables and returns a command that you pass to
execute. Executing that gets you back an iterable result object yielding row objects that are both indexable by column name or index. (Note: remember to close your result once you are done with it, to avoid leaking memory or server-side cursors.)
Let's go for something more complex now. I'd like to get all the books written by our first author. Here's how to do it:
1 2 3 4 5
first_author_books = select([books], authors.c.id == first_author_id) books_result = conn.execute(first_author_books) for row in books_result: print("title: %s" % row["title"]) books_result.close()
The magic happens in the second argument to the
select function. It requires an expression object that defines your query filter. What better way to build that than using the table you just defined. The
c property exposes column objects that have most of their comparison operators predefined so that they return the correct expression object. In the example above we get an expression that compares the
authors.id column to a literal value, our
What if we want to have a more complex filter? We combine expressions with the
not_ functions, of course. Here is how we get all books by our first author that contain the "hobbit" string in their title:
1 2 3 4 5 6
hobbit_books = select([books], and_(authors.c.id == first_author_id, books.c.title.like("%hobbit%"))) books_result = conn.execute(hobbit_books) for row in books_result: print("title: %s" % row["title"]) books_result.close()
Not being restricted by mappings to objects lets us freely use table joins and make the DB return data in the format we like it. Here is a simple join that fetches data from both
1 2 3 4 5
authors_and_books = select([authors, books], authors.c.id == books.c.author_id) books_result = conn.execute(authors_and_books) for row in books_result: print(row) books_result.close()
We are using an implicit join here by specifying a filter that compares columns from two tables, but SQLAlchemy has the API that lets us specify explicit inner and outer joins.
If you expected delete commands to look similar to insert ones, you'd be right. Here is one that deletes all books by our first author:
delete_first_books = books.delete().where(books.c.author_id == first_author_id) conn.execute(delete_first_books)
The only difference from inserts is that we specify a
where clause. That clause isn't strictly necessary, but omitting it will delete all data in the table and that is exactly the SQL noob mistake you'd probably want to avoid.
After using SQL databases with and without fat ORMs for years, I had this short identity crisis that led me to believe that NoSQL DB's are the future and we should be using them for everything. After getting this idea beaten out of my head the hard way, I am back to a more sober view of using relational databases by default and thinking of NoSQL ones only in extreme circumstances. I still wanted to try a different approach to working with databases and I believe I have found it with SQLAlchemy Core. Well, I can't throw all my existing code away yet, so I am still using the Django ORM in my web projects. I rely on tons of third-party Django apps that deal with models directly and I wouldn't want to lose their sweet features. I will be definitely be trying SQLAlchemy Core on something bigger when I have the chance though. I really want to see if one can create a bigger application using the simple relational API and nothing else. Oh, and in the mean time I can dream of better SQLAlchemy support in Django.