SECRET OF CSS

Develop a Rust Macro To Automatically Write SQL Boilerplate Code | by Julien de Charentenay | Sep, 2022


On meta-programming with Rust

Image by Do Exploit from Pixabay

This story describes the writing of a Rust macro that automatically implements wrapper methods around SQL statements to reduce the boilerplate code requirements when using SQL databases. The macro is available on crates.io, but is a work-in-progress at this stage with numerous limitations, including only supporting the rusqlite wrapper. The source code for the procedural macro is available at https://github.com/juliendecharentenay/derive-sql under an MIT open-source license.

I wrote this code and story as I wanted to refine my understanding of procedural macros. I chose to develop something around SQL, as I am anticipating I will need it for my email masking side project — https://1-ml.com — to store and retrieve usage statistics.

The views/opinions expressed in this story are my own. This story relates to my personal experience and choices and is provided with information in the hope that it will be useful but without any warranty.

My learning started from the section ‘How to write a custom derive macro’ of the Rust Programming Language — https://doc.rust-lang.org/book/ch19-06-macros.html#how-to-write-a-custom-derive-macro and the coding journey started with the creation of a library crate called derive-sql, with the following proc-macro=true in Cargo.toml to declare it as a procedural macro….

The objective of the procedural macro is to facilitate the storage and retrieval of a struct in an SQL database. This story uses, as an example, the following data structure to store a contact:

The DeriveSql macro is to implement the following methods for Contact:

I would like to note a couple of aspects:

I chose to have the insert , update , and delete method consume the object when called. There are no specific reasons for that choice apart from allowing method chaining — which may or may not be relevant.

The implementation is based on the rusqlite crate that wraps around SQLite. In the long term, I am hoping to attempt to make it agnostic from the SQL engine so that it is compatible with the rust-postgres crate (and possibly as a stretch target, to add compatibility with IndexedDB API for web client applications).

For this project, I chose to use test-driven development using documentation tests. With this approach, I have to write documentation and tests at the same time. That’s one aspect of Rust that I like — it makes it easy to apply (and even drives me towards) good practice.

The documentation tests take on the following form — the required //! leading comments have been removed for readability:

I started with the implementation of the create_table method first and gradually added the other methods and their associated tests. At this stage, none of the testing succeeds as nothing has been implemented yet.

Troubleshooting

Before talking about the implementation itself, this paragraph discusses what can be done when things don’t go to plan.

Solving issues with procedural macros is not straightforward. Writing a procedural macro is an exercise in ‘thinking like a compiler.’ One writes code that then generates the code to be compiled. This leads to compilation errors in code that are not ‘visible’:

Troubleshooting such problems requires either an excellent developer — which I am not — a lot of trial and error, or making the ‘invisible’ code visible. The latter is easy to achieve using either the cargo-expand crate — see here — or a compiler command (that cargo-expand wraps). The compiler command, shown below, is only available in nightly — i.e., one needs to change the Rust toolchain from stable to nightly using rustup default nightly :

cargo rustc --profile=check -- -Zunpretty=expanded

I prefer to use the compiler command directly as it saves adding yet another dependency. Using it in the above situation led me to the following code extract where my error is obvious — I am attempting to use r.get(0)? (lines 8 and 9) but it gets expanded into a string in place of code:

It is sometimes helpful to output the expanded code to a file. The generated code, after a little bit of tweaking to avoid compilation errors as shown below, can be edited to investigate modifications to be done and see what success looks like.

Implementation structure

I structured the implementation of the DeriveSql macro into (a) a lib.rs that contains the documentation and the entry function derive_sql and (b) a struct ImplDerive — in the file implderive.rs — that takes care of the code generation. The code generation is done in the method generate shown below. The method scaffold on individual methods, that each correspond to a method to be implemented — i.e., one method called impl_create_table that generates the code for the method create_table.

The Book of Rust probably explains how procedural macro works better than I would. But let me attempt to phrase it using my own words and understanding.

The procedural macro writes code that is compiled based on the code on which the procedural macro is applied. In the case of a derive macro, the procedural macro is applied to a struct , enum or union. The code read by the procedural macro is provided as a TokenStream — nominally a stream of tokens. This TokenStream is parsed using the syn library — see docs — to produce, in the derive macro scenario, a useable data structure DeriveInput — see the documentation — representing the struct (or enum or union ) on which the procedural macro is applied, i.e., the struct preceded by #[derive(...)] .

Using the information available in the DeriveInput data structure, the procedural macro is to generate a stream of tokens representing the code generated. For this purpose, the quote crate — see the documentation — is used.

My challenge has been (and still is) to understand and use the DeriveInput data structure and quote::quote! macro appropriately.

In my implementation, the ImplDerive struct stores a reference to the DeriveInput object created from the input TokenStream as shown below:

In this story, I am discussing the implementation of two methods: create_table, as it is a relatively simple, and insert, as it is slightly more complex. The implementation of the other methods can be viewed at https://github.com/juliendecharentenay/derive-sql/blob/main/src/implderive.rs.

Implementation of `create_table`

For our example using Contact , an implementation of the method create_table would look as follows:

The following code shows the procedural macro generating the implementation of the method create_table. For ease of explanation, two method calls self.name() and self.get_fields_named() have been expanded.

The implementation is divided into two sections: (a) preparation of information and statements — line 6 and 37 — and (b) generation of the token stream using the quote! macro — lines 39 to 45.

Understanding how token streams are generated is, in my view, key. The quote! interprets Rust code, using token stream and hence token identifiers. In the above code line 49, #name refers to the content of the variable name defined in line 7, which is the name of the struct on which the derive macro is applied, namely Contact in the example.

Reading the above one, I may assume (wrongly) that Contact can be changed to MyContact by writing impl My#name. Unfortunately, this is not how it works. To make such a change, one needs to define a new Ident using something like syn::Ident::new(format!("My{}", name), name.span()).

The challenge in the generation of the create_table method lays in retrieving the members of the struct on which the derive macro is applied. These members are used to generate the SQL statement that creates the table. This is achieved by (a) checking that the macro is applied to a data struct (lines 10–14) and that the struct fields are named fields (lines 15–19), and (b) building the SQL query statement based on the list of the named fields using their name and converting their type to SQL types (lines 24–25).

Implementation of `select`

For our example, using Contact, an implementation of the method select would look as follows:

The following code shows the procedural macro generating the implementation of the select method:

The main difference between select and create_table function lays in line 39. Line 39 generates a Contact object from the fields returned by the SQL statement. The #( #fields: #fields_assignment ), * results in the expansion of the two vectors: fields and fields_assignement, with a , character separation — resulting in the statement name: r.get(0)?, phone_number: r.get(1)? . This is explained in the Interpolation section of the quote crate doc that is worthwhile reading in detail.

The vector fields is a vector containing the named field identifier, namely name , phone_number , etc., as Ident.

The vector fields_assignment contains the assignment side. Going back to the troubleshooting section, my initial approach was to use a vector of string. But as mentioned in the previous section, the quote! macro operates on a stream of tokens and hence each string in a vector of strings is treated as a string token.

At line 32, the above code shows the selected approach whereby the vector fields_assignment is a vector of token streams — each generated using the quote! macro. It took some time to understand, but I got there in the end.



News Credit

%d bloggers like this: