A step-by-step SQLAlchemy tutorial

Getting Started

Before we install SQLAlchemy, let's make sure you have the latest version of setuptools, which can make your life a lot easier. At a command prompt, run:

easy_install --help

If you get a list of commands and options, great: setuptools is installed and working on your system, and you can skip to the next paragraph. If you get an error message like "command not found", then you'll need to install setuptools. Download ez_setup.py and run it. (If you're on Linux or OS X, you may need to run it as root by doing "sudo python ez_setup.py").

Now that you have setuptools installed, you can install SQLAlchemy by running:

easy_install SQLAlchemy

On Linux and OS X, you may need to run this as root ("sudo easy_install SQLAlchemy"). This will automatically connect to the Python Package Index, find the latest version of SQLAlchemy, download it, and set it up in your site-packages directory.

You'll also need the latest version of pysqlite installed. Run "easy_install pysqlite" (with sudo if necessary) to fetch it. On Windows, that's all you need; on Mac OS X or Linux, you'll also need the sqlite3 package installed. The exact details of installing a package will vary from system to system: just make sure it's sqlite3 you're installing, and not sqlite or sqlite2.

Once you have everything installed, you're ready to begin!

First Steps

Let's start with a simple SQLAlchemy program. Copy this into a text editor and save it as "firststeps.py":

from sqlalchemy import *

db = create_engine('sqlite', opts={'filename': 'tutorial.db'})

db.echo = False  # Try changing this to True and see what happens

users = Table('users', db,
    Column('user_id', Integer, primary_key=True),
    Column('name', String(40)),
    Column('age', Integer),
    Column('password', String),
)
users.create()

i = users.insert()
i.execute(name='Mary', age=30, password='secret')
i.execute({'name': 'John', 'age': 42},
          {'name': 'Susan', 'age': 57},
          {'name': 'Carl', 'age': 33})

s = users.select()
rs = s.execute()

row = rs.fetchone()
print 'Id:', row[0]
print 'Name:', row['name']
print 'Age:', row.age
print 'Password:', row[users.c.password]

for row in rs:
    print row.name, 'is', row.age, 'years old'

First Steps, in detail

This code sample shows off a lot of the features of SQLAlchemy. Let's go through it step-by-step.

from sqlalchemy import *

db = create_engine('sqlite', opts={'filename': 'tutorial.db'})

The first step in writing SQLAlchemy code is to open a connection to the database you'll be using. In SQLAlchemy, this is done by creating an SQLEngine object, which knows how to talk to one particular type of database (SQLite, PostgreSQL, Firebird, MySQL, Oracle...). The SQLEngine object also doubles as a connection object. Behind the scenes, it will create a pool of database connections and re-use them automatically as needed, to keep your application running quickly. But most of the time, you don't even need to think about the database connections; just use the SQLEngine object that create_engine() returns, and let SQLAlchemy handle the details for you.

users = Table('users', db,
    Column('user_id', Integer, primary_key=True),
    Column('name', String(40)),
    Column('age', Integer),
    Column('password', String),
)
users.create()

This should be pretty self-explanatory: we've just created a users table in our database, with four columns. If the users table already existed, we could instead have done:

users = Table('users', db, autoload=True)

and SQLAlchemy would have automatically figured out the table's structure from the database.

i = users.insert()
i.execute(name='Mary', age=30, password='secret')
i.execute({'name': 'John', 'age': 42},
          {'name': 'Susan', 'age': 57},
          {'name': 'Carl', 'age': 33})

SQLAlchemy's SQL construction methods are beautiful. You'll almost never need to write SQL by hand. Instead, you create an "SQL statement object", build the SQL query you want, and call its execute() method. Here we ask for an INSERT statement referencing the users table:

i = users.insert()

Now when we do i.execute(), SQLAlchemy will generate the appropriate "INSERT INTO users VALUES (...)" statement for the values we pass into execute(). Notice the two different ways of executing an INSERT statement. We can either pass it keyword parameters, to insert a single object:

i.execute(name='Mary', age=30, password='secret')

or else we can pass it multiple dictionaries, to insert multiple objects:

i.execute({'name': 'John', 'age': 42},
          {'name': 'Susan', 'age': 57},
          {'name': 'Carl', 'age': 33})

If you have any "special" characters (such as semicolons or apostrophes) in your data, they will be automatically quoted for you by the SQLEngine object, so you don't have to worry about quoting. This also means that unless you deliberately bypass SQLAlchemy's quoting mechanisms, SQL-injection attacks are basically impossible.

You may have also noticed that we didn't have to specify all the columns of the database. Any columns we didn't specify will get filled with NULL, except for the primary key, which will automatically get a unique value.

s = users.select()
rs = s.execute()

Like INSERT statements, SELECT statements are also done by creating a statement object and calling its execute() method. This particular statement is the basic "SELECT * FROM users" with no WHERE clause. Later on we'll see how to do WHERE clauses, or only return data from certain columns, or JOIN two (or more) tables together.

Calling execute() on a SELECT statement object will return a result set, which has fetchone() and fetchall() methods. As you'd expect, fetchone() returns a single row, while fetchall() returns a list of rows. The rows returned aren't simple tuples or dictionaries, but intelligent row objects, as can be seen below:

row = rs.fetchone()
print 'Id:', row[0]
print 'Name:', row['name']
print 'Age:', row.age
print 'Password:', row[users.c.password]

Here we see some of the various ways you can access the data in a row object. First, you can pretend it's a tuple and access its columns by position. SQLAlchemy guarantees that the column order returned from a "SELECT * FROM (table)" statement will be the same as the order in which the columns were declared in that table, so here we know that row[0] will be the user_id column. We can also access the row as if it were a dictionary (row['name']). Next, my favorite: SQLAlchemy lets us access the columns as if they were attributes of the row object. (Some simple __getattr__() magic behind the scenes makes this work). And finally, we can even use the actual Column objects themselves as keys to lookup results from a row. You probably won't use this very often, but it can be extremely useful in some circumstances.

for row in rs:
    print row.name, 'is', row.age, 'years old'

Finally, we see that we can also iterate through the result set via a simple for loop. This is especially useful when you expect your SELECT query to return a huge result set that would be too large to load into memory: the for loop will only fetch one row at a time from the database.

Select Statements

Now let's take a little time to examine some of the various ways in which we can select rows from our database. There are lots of conditions you might want to put in the WHERE clause of a SELECT statement, and SQLAlchemy makes most of those easy.

Copy the following code into "selectdemo.py":

from sqlalchemy import *

# Let's re-use the same database as before
db = create_engine('sqlite', opts={'filename': 'tutorial.db'})

db.echo = True  # We want to see the SQL we're creating

# The users table already exists, so no need to redefine it. Just
# load it from the database using the "autoload" feature.
users = Table('users', db, autoload=True)

def run(stmt):
    rs = stmt.execute()
    for row in rs:
        print row

# Most WHERE clauses can be constructed via normal comparisons
s = users.select(users.c.name == 'John')
run(s)
s = users.select(users.c.age < 40)
run(s)

# Python keywords like "and", "or", and "not" can't be overloaded, so
# SQLAlchemy uses functions instead
s = users.select(and_(users.c.age < 40, users.c.name != 'Mary'))
run(s)
s = users.select(or_(users.c.age < 40, users.c.name != 'Mary'))
run(s)
s = users.select(not_(users.c.name == 'Susan'))
run(s)

# Or you could use &, | and ~ -- but watch out for priority!
s = users.select((users.c.age < 40) & (users.c.name != 'Mary'))
run(s)
s = users.select((users.c.age < 40) | (users.c.name != 'Mary'))
run(s)
s = users.select(~(users.c.name == 'Susan'))
run(s)

# There's other functions too, such as "like", "startswith", "endswith"
s = users.select(users.c.name.startswith('M'))
run(s)
s = users.select(users.c.name.like('%a%'))
run(s)
s = users.select(users.c.name.endswith('n'))
run(s)

# The "in" and "between" operations are also available
s = users.select(users.c.name.in_('Mary', 'Susan'))
run(s)
s = users.select(users.c.age.between(30,39))  # This might fail
run(s)
s = users.select(between_(users.c.age, 30, 39))  # This will work
run(s)

# If you want to call an SQL function, use "func"
s = users.select(func.substr(users.c.name, 2, 1) == 'a')
run(s)

# You don't have to call select() on a table; it's got a bare form
s = select([users], users.c.name != 'Carl')
run(s)

# This can be handy for things like count()
s = select([func.count(users.c.user_id)])
run(s)

Most of this should be pretty self-explanatory, and should give you an idea of what's possible. A few notes:

# Or you could use &, | and ~ -- but watch out for priority!
s = users.select((users.c.age < 40) & (users.c.name != 'Mary'))
run(s)
s = users.select((users.c.age < 40) | (users.c.name != 'Mary'))
run(s)
s = users.select(~(users.c.name == 'Susan'))
run(s)

In Python, the & (and), | (or), and ~ (not) operators have a higher priority than comparison operators like == and !=. So if you want to use the &, | and ~ operators, you have to be careful to wrap the other clauses in parentheses. If you forget the parentheses, you'll be surprised by the results:

s = users.select(users.c.age < 40 & users.c.name != 'Mary')

will be interpreted as:

s = users.select(users.c.age < (40 & users.c.name) != 'Mary')

which will almost certainly not return the results you were expecting.

Another note:

s = users.select(users.c.age.between(30,39))  # This might fail
run(s)
s = users.select(between_(users.c.age, 30, 39))  # This will work
run(s)

The latest SQLAlchemy release as I write this is 0.1.6, in which the first statement fails because the between() function is not available on table columns. This is an oversight, and the next SQLAlchemy release will fix this. If you're running an SVN checkout of SQLAlchemy, this bugfix is also available. But if you're running the 0.1.6 release or earlier, you won't be able to use between() on table columns, and you'll need to use the second form shown above. (With the final underscore at the end of between_().)

Finally:

# This can be handy for things like count()
s = select([func.count(users.c.user_id)])
run(s)

There doesn't seem to be a way to do COUNT(*) -- but doing COUNT(some_column) will return the same results under most circumstances.

Joins

At this point, you're probably wondering about using multiple tables in a single select() statement. Wonder no more. Copy the following into "joindemo.py":

from sqlalchemy import *

db = create_engine('sqlite', opts={'filename': 'joindemo.db'})

db.echo = True

users = Table('users', db,
    Column('user_id', Integer, primary_key=True),
    Column('name', String(40)),
    Column('age', Integer),
)
users.create()

emails = Table('emails', db,
    Column('email_id', Integer, primary_key=True),
    Column('address', String),
    Column('user_id', Integer, ForeignKey('users.user_id')),
)
emails.create()

i = users.insert()
i.execute(
    {'name': 'Mary', 'age': 30},
    {'name': 'John', 'age': 42},
    {'name': 'Susan', 'age': 57},
    {'name': 'Carl', 'age': 33}
)
i = emails.insert()
i.execute(
    # There's a better way to do this, but we haven't gotten there yet
    {'address': 'mary@example.com', 'user_id': 1},
    {'address': 'john@nowhere.net', 'user_id': 2},
    {'address': 'john@example.org', 'user_id': 2},
    {'address': 'carl@nospam.net', 'user_id': 4},
)

def run(stmt):
    rs = stmt.execute()
    for row in rs:
        print row

# This will return more results than you are probably expecting.
s = select([users, emails])
run(s)

# The reason is because you specified no WHERE clause, so a full join was
# performed, which returns every possible combination of records from
# tables A and B. With an appropriate WHERE clause, you'll get the
# restricted record set you really wanted.
s = select([users, emails], emails.c.user_id == users.c.user_id)
run(s)

# If you're interested in only a few columns, then specify them explicitly
s = select([users.c.name, emails.c.address], 
           emails.c.user_id == users.c.user_id)
run(s)

# There are also "smart" join objects that can figure out the correct join
# conditions based on the tables' foreign keys
s = join(users, emails).select()
run(s)

# If you want all the users, whether or not they have an email address,
# then you want an "outer" join.
s = outerjoin(users, emails).select()
run(s)

# Order of outer joins is important! Default is a "left outer join", which
# means "all records from the left-hand table, plus their corresponding
# values from the right-hand table, if any". Notice how this time, Susan's
# name will *not* appear in the results.
s = outerjoin(emails, users).select()
run(s)

That's enough for a taste. More information can be found in the SQL Construction section of the SQLAlchemy documentation. Now let's move on to the really interesting part: mapping your data objects to SQL database rows.

Mapping your objects to SQL rows

Now for the really interesting part: mapping your objects onto the database.

In other object-relational mappers such as SQLObject, the table definition also doubles as the class whose instances are rows of data from the table. SQLAlchemy, on the other hand, makes a strict distinction between the table definition and the data class. You first create the table definition, then create an (empty) class definition that will hold your data objects, and then create a mapper that will map that class onto the database. It's perhaps easier to show how this works than to explain it. Copy the following into "mapper1.py":

from sqlalchemy import *

db = create_engine('sqlite', opts={'filename': 'joindemo.db'})

db.echo = True

users = Table('users', db, autoload=True)
emails = Table('emails', db, autoload=True)

# These are the empty classes that will become our data classes
class User(object):
    pass
class Email(object):
    pass

usermapper = mapper(User, users)
emailmapper = mapper(Email, emails)

mary = usermapper.get_by(name='Mary')
mary.age += 1

objectstore.commit()

Let's break this down piece-by-piece to see what's going on here.

from sqlalchemy import *

db = create_engine('sqlite', opts={'filename': 'joindemo.db'})

db.echo = True

Here we're using the same SQLite database that we created in the join demo, which contains users and email addresses. Unless you've deleted that file, the data (four users and four email addresses) should still be there as well. We set echo to True so that the SQL will be printed out at each step.

users = Table('users', db, autoload=True)
emails = Table('emails', db, autoload=True)

Because the users and emails tables are already in the database, we don't have to specify them again; we can just let SQLAlchemy fetch their definitions from the database.

# These are the empty classes that will become our data classes
class User(object):
    pass
class Email(object):
    pass

Note that your data classes must be new-style classes (e.g., derived from the base class object). If they aren't, SQLAlchemy will raise an ArgumentError exception when you try to create the mapper. If you don't know what the difference between new-style classes and old-style classes is, don't worry about it; just get in the habit of deriving all your classes either from object or from another base class that descends from object. Most of Python's object-oriented features work much better on classes that ultimately derive from object, and SQLAlchemy makes heavy use of those features.

Moving on:

usermapper = mapper(User, users)
emailmapper = mapper(Email, emails)

This is where all the magic happens. The mapper() function takes a minimum of two parameters: first the data class to modify, and then the table object onto which that data class should be mapped. The data class will have attributes automatically added to it that correspond to the columns of your database table. Thus, the User class now has User.user_id, User.name, User.age, and User.password. The Email class now has Email.email_id, Email.address, and Email.user_id.

mary = usermapper.get_by(name='Mary')

In addition to creating attributes on the data classes, the mapper() function call also returns a mapper object, which has some features of its own. For example, it has select_by() and get_by() methods, which are convenience methods for doing a SELECT statement. select_by() will return a list of results, while get_by() will return exactly one result.

mary.age += 1

Now that we have an instance of the data class (in the object "mary"), we can manipulate its attributes just like a normal object. SQLAlchemy will keep track of the changes we make, but won't actually send them to the database right away. To send our changes to the database, we need to ...

objectstore.commit()

... call the objectstore.commit() method. This will take all the changes we've made to our data objects, and "flush" those changes out to the database. If we've made multiple changes, some of which depend on other changes (e.g., adding a new user and several email addresses for that user), SQLAlchemy is smart enough to write the SQL statements in the correct order.

Note that the objectstore.commit() method should not be confused with a database transaction. Despite the "commit()" name, this function call does not start a transaction, nor will it commit a transaction if you've already opened one. In a future release of SQLAlchemy, this method's name will be changed to "flush()" instead of "commit()" to reduce confusion. Meanwhile, you just need to remember that objectstore.commit() has nothing to do with transactions.

If you want to use a transaction, by the way, you should call the begin() method on your SQLEngine object. In our case, we called that object db, so to start a transaction, we'd call db.begin(). Then db.rollback() and db.commit() would either rollback or commit the transaction, respectively.

There's more information on transactions in the Database Engines section of the SQLAlchemy documentation. Right now, let's move on to the next section of the tutorial, where we'll examine more features of SQLAlchemy's mapper() function.

Data mapping, continued

Remember how, in the joindemo.py tutorial, one of the comments said:

# There's a better way to do this, but we haven't gotten there yet

Well, now's the time to look at the better way to handle one-to-many and many-to-many relations. Copy the following into "mapper2.py":

from sqlalchemy import *

db = create_engine('sqlite', opts={'filename': 'joindemo.db'})
db.echo = True
users = Table('users', db, autoload=True)
emails = Table('emails', db, autoload=True)

# Let's give our User and Email classes a little more smarts
class User(object):
    def __init__(self, name=None, age=None, password=None):
        self.name = name
        self.age = age
        self.password = password
    def __repr__(self):
        return self.name
class Email(object):
    def __init__(self, address=None):
        self.address = address
    def __repr__(self):
        return self.address

# Here we look at several alternate ways to do the same thing. Try
# running this program multiple times, enabling a different one of
# these code blocks each time.
if False:
    # We create the Email mapper first...
    Email.mapper = mapper(Email, emails)
    # ... so that we can use it in the User mapper
    User.mapper = mapper(User, users, properties={
        'emails': relation(Email.mapper),
    })
    mary = User.mapper.get_by(name='Mary')
    print mary.emails

elif False:
    # What if we also want a "user" property on the Email class?
    # Create the Email mapper first
    Email.mapper = mapper(Email, emails)
    # Then reference it in the User mapper
    User.mapper = mapper(User, users, properties={
        'emails': relation(Email.mapper),
    })
    # And now add a property to the original Email mapper
    Email.mapper.add_property('user', relation(User.mapper))
    john = User.mapper.get_by(name='John')
    print john.emails
    carl_address = Email.mapper.get_by(address='carl@nospam.net')
    print carl_address.user

elif False:
    # There's a handy "backref" feature that will do the above for you
    Email.mapper = mapper(Email, emails)
    User.mapper = mapper(User, users, properties={
        'emails': relation(Email.mapper, backref='user'),
    })
    # No need to call add_property(), it's already been done
    john = User.mapper.get_by(name='John')
    print john.emails
    carl_address = Email.mapper.get_by(address='carl@nospam.net')
    print carl_address.user

elif False:
    # Order doesn't actually matter; relation() is quite clever
    User.mapper = mapper(User, users)
    Email.mapper = mapper(Email, emails, properties={
        'user': relation(User.mapper, backref='emails'),
    })
    john = User.mapper.get_by(name='John')
    print john.emails
    carl_address = Email.mapper.get_by(address='carl@nospam.net')
    print carl_address.user

elif True:
    # Finally, let's demonstrate some other clever features
    Email.mapper = mapper(Email, emails)
    User.mapper = mapper(User, users, properties={
        'emails': relation(Email.mapper, backref='user'),
    })

    # ForeignKey references are automatically mapped to the appropriate
    # data class
    harry = User(name='Harry', age=47)
    em1 = Email('harry@nowhere.com')
    em2 = Email('harry@example.org')
    em1.user = harry
    harry.emails.append(em2)
    print em2.user
    print harry.emails

    # If a relation has been defined, then get_by and select_by calls
    # can do the correct joins automatically
    print User.mapper.get_by(address='mary@example.com')
    print Email.mapper.select_by(age=42)
    # This will only work if the column you're looking for is *not*
    # present in the "original" class, but is present in one of its
    # relations. For example, the following does *not* do a join to the
    # User table, but gets the user_id value from the Email table. Notice
    # the difference in the SQL that's printed.
    print Email.mapper.select_by(user_id=2)

Most of this is pretty self-explanatory as well. A couple items are worth noting:

Email.mapper = mapper(Email, emails)

Instead of using names like usermapper and emailmapper as we did in mapper1.py, this time we follow SQLAlchemy convention and store the mapper object in the data class, as User.mapper and Email.mapper respectively. That way every data object will have a reference to the mapper object that controls it. It doesn't make much difference in this example, but it can make a lot of difference in certain circumstances.

There's another helper function that I didn't show in the example above: the assign_mapper() function. It creates the mapper attribute as per convention, and also takes the get_by(), select_by(), and other mapper-based functions and makes them attributes of the data class. This lets you do:

assign_mapper(Email, emails)
print Email.select_by(name='John')
# Equivalent to:
print Email.mapper.select_by(name='John')

assign_mapper() takes all the same parameters that mapper() takes, so our mapper-assignment code could have looked like:

assign_mapper(Email, emails)
assign_mapper(User, users, properties={
    'emails': relation(Email.mapper, backref='user'),
})

Data mapping, part three: many-to-many relationships

There's one more item to cover. We've looked at one-to-many relationships, but we also need to look at many-to-many relationships. As you probably know already, many-to-many relationships in databases are handled by a third table that holds information about the relation. E.g., if you have an "articles" table and a "keywords" table, and you want to be able to associate keywords with articles, you'd need a many-to-many relationship. One-to-many wouldn't work, because one article might need to be tagged with several different keywords, and the same keyword might be used to tag several articles; so this is a classic many-to-many relationship. Thus, you'd use a third table with just two columns, "article_id" and "keyword_id", to keep track of the associations. By convention, such a table is usually named with the names of the two tables it references, separated by an underscore. Thus, the table structure in the following example (call it "manytomany.py"):

from sqlalchemy import *
db = create_engine('sqlite', {'filename': 'keywords.db'})
db.echo = True

articles = Table('articles', db,
    Column('article_id', Integer, primary_key = True),
    Column('headline', String(150)),
    Column('body', String),
)
articles.create()

keywords = Table('keywords', db,
    Column('keyword_id', Integer, primary_key = True),
    Column('keyword_name', String(50)),
)
keywords.create()

association = Table('articles_keywords', db,
    Column('keyword_id', Integer, ForeignKey('articles.article_id')),
    Column('article_id', Integer, ForeignKey('keywords.keyword_id')),
)
association.create()

class Article(object):
    def __init__(self, headline=None, body=None):
        self.headline = headline
        self.body = body
    def __repr__(self):
        return 'Article %d: "%s"' % (self.article_id, self.headline)

class Keyword(object):
    def __init__(self, name=None):
        self.keyword_name = name
    def __repr__(self):
        return self.keyword_name

# To create a many-to-many relation, specify the association table as
# the second parameter to mapper()
Article.mapper = mapper(Article, articles)
Keyword.mapper = mapper(Keyword, keywords, properties = {
    'articles': relation(Article.mapper, association, backref='keywords'),
})

a1 = Article(headline="Python is cool!", body="(to be written)")
a2 = Article(headline="SQLAlchemy Tutorial", body="You're reading it")

k_tutorial = Keyword('tutorial')
k_cool = Keyword('cool')
k_unfinished = Keyword('unfinished')

a1.keywords.append(k_unfinished)
k_cool.articles.append(a1)
k_cool.articles.append(a2)
# Or:
k_cool.articles = [a1, a2]  # This works as well!
a2.keywords.append(k_tutorial)

# Now we write all this out to the database in one single step, and
# SQLAlchemy automatically figures out the correct order for the SQL
# statements.
objectstore.commit()

print a1, a1.keywords
print a2, a2.keywords
print k_tutorial, k_tutorial.articles
print k_cool, k_cool.articles
print k_unfinished, k_unfinished.articles

Again, the code pretty much speaks for itself. If you pass the association table as the second parameter to mapping(), and SQLAlchemy sees that it's in the correct format (it has only two columns, each of which is a foreign key to one of the tables involved in the relation), it will automatically set up a many-to-many relationship for you.

If you want to hold more data in the association object (for example, maybe you want to record the exact date and time when the article was tagged with any given keyword), it's slightly more complicated. The SQLAlchemy documentation has a good explanation of the process at the bottom of the Data Mapping page. That whole page, in fact, is well worth reading, since there are several other features of data mapping that I glossed over or left out entirely, in order to keep the size of this tutorial manageable. And once you've grasped those concepts, you can move on to the Advanced Data Mapping section, which covers subjects like mapping a table onto itself (useful for tracking manager/subordinate relationships in an Employees table, for example).

Conclusion

Hopefully this has given you a taste of what using SQLAlchemy feels like. The examples in this tutorial should be enough to give you a head start in writing your own applications. When you need more advanced features beyond what we've covered here, check out the extensive SQLAlchemy documentation. Just about everything you'll need is there; and if you still don't find what you're looking for, join the SQLAlchemy mailing list and ask!