SECRET OF CSS

Async All The Way Down!. SqlAlchemy 1.4 now supports asyncio | by Peter Bunyan


The Jabberwock, as illustrated by John Tenniel, 1871

Callooh! Callay!¹ SqlAlchemy 1.4 now supports asyncio. The work they have done is spectacular.

For years I have been using SqlAlchemy with Tornado and have had to resort to a ThreadPoolExecutor to make my queries run in parallel. No more. Now it will be async all the way down to the last turtle. Unless, of course, you decide to use the ORM. Then greenlets will solve all your problems. But let us step back and discuss the editorial decision you make as you design an API.

To function or class?

The first editorial decision is often a matter of habit. Some I know would not allow the overhead of classes — that’s all inheritance and unnecessary complexity. They are wrong some of the time. An elegant class hierarchy can often ease the pain of how to navigate the graph of code before you. At other times a bunch of functions will do just fine.

Take for example the ease of learning Java. Everything was a class and everything required the implementation of an interface. Then you look at a python package — what style did this wally use? functional, classes or a — oh, good grief. Documentation helps, but there are times when I could not figure out what they expected without reading the source code.

Rant over, let’s have a look at our options. We’re trying to achieve a simple timesheet application. I’ve chosen to model it as follows:

tables diagram

Why? Because a user will log their time usage and do that throughout the day. Only a user can log time and only a user can edit their time. We’ve isolated the activities to a user’s namespace; activities are not shared. So a Log is just a composition of a timestamp and an activity. The user will also save notes about a day.

So how do we generate views on this model? Are we going to expose a CRUD like interface or assume a context? Let’s have a look at the log view first:

First, who is the current logged in user? Then let’s marshal some dates, which maybe None and may be poorly formatted. I let python-dateutil solve this problem for me. Then we get to our query. We’re using the new 1.4 Session and select statement. First, it’s context aware. It closes as it leaves the context, and instead of passing in an array of selectables, we use the *args to pass them in. It’s clean and intuitive. We join, add a where and order_by. This cursor then has an all() method, and since each row has an items interface, we can deconstruct them into the arguments to our dto constructor. Good grief man, it looks like SQL. Well, is that not the point – to write sql in your python.

Let’s now have a look at the equivalent query in an ORM world.

You can see the query is slightly more compact and that our model class knows about the different views it supports. There is some complexity in using the options for the select – what loading strategy to use? But wait – there is no await. That is because we’re using the greenlet abstraction built into SQLAlchemy 1.4 called: session.run_sync. It allows us to treat our asynchronous model as if it were the old synchronous style. Now here is another fun extension to the library:

Our Log class declares a meta data __table__, which allows use to use both the table in the expression language and the model in the ORM! So if you don’t need the overhead of the ORM, by pass it. If you want the ORM and its session cache – then use it. The sheer effort to bring these worlds into a seamless metaphor is astounding – but the documentation is still fragmented and impenetrable.

So the answer is both. Define your schema with table, augment it with a model and write functions that take advantage of both. Why functions? Because we’re building an api.

Of context and connections

The context is an enriched space in python starting with 3.7. Our contextvars now use both thread.local and task.local. So we can put the current user into a ContextVar and whether we’re running a task or working in a ThreadPoolExecutor each time we appeal to the variable it will be correct. This is also what happens with Session.

We can configure this async_session_factory from our environment variables:

Or, when we are testing, pick them up from our setup.cfg:

You might ask, what is Config and command? It’s Alembic, and we’re preparing the database in a pytest fixture, sort of like django.unittest. By using alembic for our database migration we get testability as well as schema migration, and it is driven from the tables.metadata. Alembic supports async with a new template. I abuse the system by putting the scripts directory into my python package and renaming the alembic.ini to setup.cfg. In today’s world of .cfg and .toml it is hard to know where all the configuration should live. I’ve not been able to get rid of the setup.cfg, although setup.py is gone. pyproject.toml is excellent, but no one has been able to sort out who does what and where. I’m sure it will come clean in the end, but for the moment I still need a setup.cfg, a format that alembic can understand, and a pyproject.toml because black won’t look in a .cfg. Mind you this is only for testing. In production, most configuration is loaded from environment variables or command line arguments.

So now we can test our tables and model and we can initialise our database. So let’s test something.

Testing

To test asynchronous code you’ll need an extension to pytest. I’ve used pytest.torn-async and pytest.asyncio. Because we’re not working with tornado yet, I’ll use pytest.asyncio. It expects a configuration and will look into your setup.cfg – add this:

[tool:pytest]
asyncio_mode = strict

Otherwise it will complain with a warning about depreciation. Now we’re set to go. Let’s write a register function to register a user:

We do an insert, and a select, in order to pick up the defaults from the database. There is a trend to use returning which would make this unnecessary, but my backends (mysql & sqlite)² don’t support it. I’ve abstracted the select as it is used in multiple places. And I raised the IntegrityError – which should possibly be abstracted. We’ve also encrypted the password – that should likely be salted. But we can test this:

Our init_db fixture just raids setup.cfg and downgrades and upgrades the database. The asyncio decorator lets pytest.asyncio know that we’re to be run asynchronously. And we can test the returned class contains the defaults we did not specify. We also cannot register twice.

So, if register were a class method on model.User, what would our test look like?

We’ve written a synchronous function within our test and called session.run_sync. So have we answered the question? Is it models or tables? functions or classes? Still both.

Hopefully, you can see where this is going. We’re build an application where a user is going to be the UML actor on our system. That user is going to log activities — we don’t need to know the activity id, the database primary key, just the text they’ve used.

REST makes endpoints and CRUD exposes the implementation of our database to our actors. Together they make a lazy abstraction of our models. In that world one would need to create an Activity before being able to create a Log. A Log would be:

{id, activity_id, created}

or worse:

{id, activity:{id, name, user:{id, name}}, created}

The context of an api simplifies our results and enables us to expose a functional interface. Each function returns a graph, but it is not just a rip of the database model. So our log is {id,activity, created}.

So what does this mythical log function look like:

This is the model version of log. It’s more compact than the functional version. We create a now, pass it with the current user and the supplied activity_name into class that we add to a session and commit! Then we with rpc.broadcast we tell all my wife’s devices that she’s added a log and even the one she used to add it. If there were any error only one device – the caller – would handle it, otherwise they’re all equal and connected!

If you’ve read this far you’ll want to see the code. It’s on Replit³, along with a working version. This uses the rpc I described in Remote Procedures, Please.

To Conclude

Models are about state and function — encapsulation. They make it less error prone to manage the relations in a graph. This can be seen by there being nothing but relationships added to the model of a table — plus some views of course. But we live in a functional world — a world of remote procedures. And often the understandable cost of an ORM is not needed. If you know the sql, map it with the expression language and await your joy.

The authors of SQLAlchemy deserve great praise for the work they have done. I could not be happier using their package to map my way. I’ve been using 1.4 for a year now and have done nothing but chortle with joy. Combined with tornado and vue, one can do just about anything. A vorpal¹ blade indeed.



News Credit

%d bloggers like this: