SECRET OF CSS

Python Modules That Make Handling JSON Even Faster | by Tate Galbraith | Sep, 2022


Parsing JSON is a necessary evil of software development. If you’re interacting with an API, sending messages between hosts or scraping data, then the data is probably JSON. Python ships with the standard json library, and in most cases it is more than sufficient, but in some scenarios you might want something more flexible and a bit faster.

If you’re building an application or a library that does a lot of heavy JSON lifting, then it is worthwhile to invest in a more robust module. In this article we’ll look at some standout modules that let you slice and dice JSON in ways the standard library could only dream of. Some of these modules also offer significant speed improvements over the standard json library.

If you handle a high volume of complex JSON objects and need serialization to be as fast as possible, then the orjson module is for you. It is faster than Python’s built-in json library and can serialize many types that json cannot handle out of the box, such as dataclasses, datetime objects and NumPy arrays (the latter via its OPT_SERIALIZE_NUMPY option).

The primary benefits of this library are speed and efficiency, and it has some pretty impressive statistics:

[Figure: orjson benchmark statistics (source linked in the original article)]

Although orjson isn’t a drop-in replacement for the default json module (for example, orjson.dumps() returns bytes rather than a str), its speed and efficiency make it especially enticing for data pipelines. If you’re struggling with parsing bottlenecks, then this might be your holy grail of modules.
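Because orjson.dumps() returns bytes, swapping it in usually means touching call sites that expect a str. Below is a minimal sketch of one way to smooth that over, assuming you want orjson to stay optional: the helper names dumps_str and loads_any are hypothetical, and the code falls back to the standard json module when orjson is not installed.

```python
import json

try:
    import orjson  # optional third-party dependency
except ImportError:
    orjson = None


def dumps_str(obj) -> str:
    """Serialize obj to a compact JSON str, using orjson when available."""
    if orjson is not None:
        # orjson.dumps() returns bytes, so decode to match json.dumps().
        return orjson.dumps(obj).decode("utf-8")
    # Compact separators mirror orjson's whitespace-free output.
    return json.dumps(obj, separators=(",", ":"), ensure_ascii=False)


def loads_any(data):
    """Parse JSON from str or bytes; both orjson.loads and json.loads accept either."""
    if orjson is not None:
        return orjson.loads(data)
    return json.loads(data)


print(dumps_str({"id": 1, "tags": ["a", "b"]}))  # {"id":1,"tags":["a","b"]}
```

One behavioral difference to keep in mind: json.dumps silently converts non-string dict keys to strings, while orjson rejects them unless you pass its OPT_NON_STR_KEYS option.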

Tortilla is so awesome it will make you say: “where has this been all my life!?” This module takes almost any web API and wraps it so that you can use it just like a normal set of Python methods. Best of all, the results are stored as normal attributes, so you can access them without having to massage any raw JSON into place.
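To get a feel for what attribute-style access to JSON results means, here is a rough standard-library approximation. It is not how Tortilla works internally: it simply passes an object_hook to json.loads() that turns every decoded JSON object into a types.SimpleNamespace. The payload is the same JSONPlaceholder todo that appears in the Tortilla example further down.

```python
import json
from types import SimpleNamespace

raw = '{"userId": 1, "id": 1, "title": "delectus aut autem", "completed": false}'

# object_hook is called for every decoded JSON object (nested ones included),
# so each dict becomes an object whose keys can be read as attributes.
todo = json.loads(raw, object_hook=lambda fields: SimpleNamespace(**fields))

print(todo.title)      # delectus aut autem
print(todo.completed)  # False
```

Keys that are not valid Python identifiers (for example "first-name") still work, but can only be read with getattr().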

This module is magical. If you’ve been building out your own client-side API parsing logic by hand over and over again, then stop, take a step back and try this module right now.

Check out how easily Tortilla works with the JSONPlaceholder API:

import tortilla
api = tortilla.wrap('https://jsonplaceholder.typicode.com')
todo = api.todos(1).get()
print(todo)
>>> {'userId': 1, 'id': 1, 'title': 'delectus aut autem', 'completed': False}

All you need to do is provide the base URL to Tortilla and it will automatically wrap it up for you. Just like a real tortilla!
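The “wrap a base URL” idea from the Tortilla example above can be illustrated with a few lines of standard-library Python. The ApiPath class below is a hypothetical toy, not Tortilla’s code: each attribute access or call appends a URL path segment, and get() fetches the resulting URL and decodes the JSON body.

```python
import json
import urllib.request


class ApiPath:
    """Hypothetical toy: builds URLs from attribute chains such as api.todos(1)."""

    def __init__(self, base_url: str, parts: tuple = ()):
        self._base_url = base_url.rstrip("/")
        self._parts = parts

    def __getattr__(self, name: str) -> "ApiPath":
        # Only called for names not found normally; refuse private/dunder lookups.
        if name.startswith("_"):
            raise AttributeError(name)
        return ApiPath(self._base_url, self._parts + (name,))

    def __call__(self, segment) -> "ApiPath":
        # api.todos(1) -> .../todos/1
        return ApiPath(self._base_url, self._parts + (str(segment),))

    @property
    def url(self) -> str:
        return "/".join((self._base_url,) + self._parts)

    def get(self):
        # Performs a real HTTP GET and parses the JSON response body.
        with urllib.request.urlopen(self.url) as response:
            return json.load(response)


api = ApiPath("https://jsonplaceholder.typicode.com")
print(api.todos(1).url)  # https://jsonplaceholder.typicode.com/todos/1
# api.todos(1).get() would fetch and decode the todo over the network.
```

A production client also needs other HTTP methods, headers, error handling and authentication, which is the kind of boilerplate a wrapper library takes off your hands.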

The jsonpickle module is really cool and makes pickled Python objects readable. Most other pickling libraries, including the standard pickle module, store serialized data in a binary format. Open up one of those pickled files and you’ll be presented with what looks like random garbage.

Instead, jsonpickle stores the same Python objects in a human-readable JSON format. You’ll be able to open up the serialized objects and actually read them instead of just seeing gibberish.

For example, suppose we have created the following class and object:

class Foo:
    def __init__(self):
        # Instance attributes must be assigned inside a method such as __init__.
        self.bar = 1
foo = Foo()

If we want to serialize this basic object to JSON, jsonpickle makes it easy:

import jsonpickle
jsonpickle.dumps(foo)
>>> '{"py/object": "__main__.Foo", "bar": 1}'

This is pretty handy if you want to serialize objects but have them continue to be readable by some external process, indexer or human.
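To show why that tagged format is enough to rebuild the object, here is a toy standard-library sketch of the same idea. It is hypothetical and not jsonpickle’s implementation: to_tagged_json records a "py/object" tag in the module.ClassName form seen in the jsonpickle output above plus the instance attributes, and from_tagged_json restores the object, but only for classes you explicitly allow, since decoding tagged JSON into arbitrary classes is unsafe with untrusted input.

```python
import json


class Foo:
    def __init__(self):
        self.bar = 1


def class_tag(cls) -> str:
    # "module.ClassName", the same convention as the "py/object" tag above.
    return f"{cls.__module__}.{cls.__qualname__}"


def to_tagged_json(obj) -> str:
    """Encode an object's class tag plus its instance attributes as JSON."""
    payload = {"py/object": class_tag(type(obj))}
    payload.update(vars(obj))  # attribute values must already be JSON-serializable
    return json.dumps(payload)


def from_tagged_json(text: str, allowed_classes):
    """Rebuild an object, but only for explicitly allowed classes."""
    payload = json.loads(text)
    lookup = {class_tag(cls): cls for cls in allowed_classes}
    cls = lookup[payload.pop("py/object")]  # KeyError for unknown classes
    obj = cls.__new__(cls)  # skip __init__ and restore state directly
    obj.__dict__.update(payload)
    return obj


encoded = to_tagged_json(Foo())
print(encoded)  # when run as a script: {"py/object": "__main__.Foo", "bar": 1}
restored = from_tagged_json(encoded, allowed_classes=[Foo])
print(restored.bar)  # 1
```

Unlike this toy, jsonpickle also handles nested objects and many built-in types recursively, which is the main reason to reach for the library.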

Let’s say you’re wrangling a huge blob of JSON. Let’s also say that you need to figure out what is different between this blob and yet another blob you just collected. You could scan through the contents yourself and try to pick out the changes, or you could take the slightly more sane approach: use the jsondiff module.

This module provides you with the ability to quickly diff two different JSON objects. Let’s say you had the following JSON blobs:

blob_1 = {'one': 1, 'two': 2, 'three': 3}
blob_2 = {'one': 1, 'two': 22, 'three': 3}

I know it’s pretty easy to spot the difference here, but let’s have the jsondiff module find it for us:

import jsondiff
jsondiff.diff(blob_1, blob_2)
>>> {'two': 22}

Imagine doing this with a blob containing thousands of characters and tons of nested objects. Without jsondiff it would be like finding a needle in a haystack. Another benefit of using this module is that you can look for diffs programmatically and catch unexpected API changes before they make it into production.
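To make the “look for diffs programmatically” idea concrete, here is a small standard-library sketch of a recursive dictionary diff used as a regression check. It is not jsondiff’s algorithm, and its output format (a list of (path, old, new) tuples) is invented for this example; nested_diff, MISSING and the expected/actual payloads are hypothetical names.

```python
from typing import Any

MISSING = object()  # sentinel: the key is absent on one side


def nested_diff(old: dict, new: dict, path: str = "") -> list[tuple[str, Any, Any]]:
    """List (dotted_path, old_value, new_value) for every changed leaf."""
    changes = []
    for key in sorted(set(old) | set(new), key=str):
        where = f"{path}.{key}" if path else str(key)
        a, b = old.get(key, MISSING), new.get(key, MISSING)
        if isinstance(a, dict) and isinstance(b, dict):
            changes.extend(nested_diff(a, b, where))  # recurse into nested objects
        elif a != b:
            changes.append((where, a, b))  # lists are compared as whole values
    return changes


blob_1 = {'one': 1, 'two': 2, 'three': 3}
blob_2 = {'one': 1, 'two': 22, 'three': 3}
print(nested_diff(blob_1, blob_2))  # [('two', 2, 22)]

# Hypothetical regression check: flag a response that drifts from a saved snapshot.
expected = {"user": {"id": 1, "name": "Ada"}}
actual = {"user": {"id": 1, "name": "Ada", "email": None}}
drift = nested_diff(expected, actual)
if drift:
    print("Response changed:", [where for where, _, _ in drift])  # ['user.email']
```

In a test suite, the same check would typically be written as `assert not nested_diff(expected, actual)`.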

When you’re working with a lot of JSON data, you may also want to set up specific formatting and typing rules for your JSON blobs. One way to accomplish this is to generate a JSON schema. With a schema you can specify field types, required fields and other attributes of the overall JSON structure.
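As a rough illustration of how a schema expresses field types and requirements, here is a hand-written schema plus a toy validator that understands only the type, properties and required keywords. The validate function and user_schema are hypothetical; real projects would typically use a complete validator such as the third-party jsonschema package.

```python
# Subset of JSON Schema type names mapped to Python types.
TYPE_MAP = {
    "object": dict,
    "array": list,
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "null": type(None),
}


def validate(instance, schema: dict, path: str = "$") -> list[str]:
    """Return error messages; an empty list means the instance passed."""
    errors = []
    expected = schema.get("type")
    if expected is not None:
        wrong_type = not isinstance(instance, TYPE_MAP[expected])
        # bool is a subclass of int in Python, but not a JSON number.
        if isinstance(instance, bool) and expected in ("integer", "number"):
            wrong_type = True
        if wrong_type:
            errors.append(f"{path}: expected {expected}, got {type(instance).__name__}")
            return errors
    if isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"{path}: missing required key {key!r}")
        for key, subschema in schema.get("properties", {}).items():
            if key in instance:
                errors.extend(validate(instance[key], subschema, f"{path}.{key}"))
    return errors


user_schema = {
    "type": "object",
    "properties": {"id": {"type": "integer"}, "name": {"type": "string"}},
    "required": ["id", "name"],
}

print(validate({"id": 1, "name": "Ada"}, user_schema))  # []
print(validate({"id": "1"}, user_schema))
# ["$: missing required key 'name'", '$.id: expected integer, got str']
```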

A great library for generating JSON schemas in Python is the GenSON module. It lets you generate JSON schemas using a straightforward builder syntax. Let’s build a simple schema now:

from genson import SchemaBuilder
schema_builder = SchemaBuilder()
schema_builder.add_schema({"type": "object", "properties": {}})
schema_builder.add_object({"my_obj": 123})
schema_builder.to_schema()
>>> {'$schema': 'http://json-schema.org/schema#', 'type': 'object', 'properties': {'my_obj': {'type': 'integer'}}, 'required': ['my_obj']}

Here we’ve built a very small schema, but a schema nonetheless. It contains some basic information like the schema reference and type information for the associated objects.
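To demystify what the builder inferred from {"my_obj": 123}, the toy function below maps Python values to JSON Schema type names and marks every key of an object as required, which reproduces the type information in the output above. It is a hypothetical sketch, not GenSON’s algorithm: it looks at a single sample, whereas SchemaBuilder can merge several calls to add_object(), and it does not describe array items.

```python
# Order matters: bool must be checked before int (bool subclasses int).
JSON_TYPES = [
    (bool, "boolean"),
    (int, "integer"),
    (float, "number"),
    (str, "string"),
    (type(None), "null"),
    (list, "array"),
    (dict, "object"),
]


def infer_schema(value) -> dict:
    """Infer a minimal JSON Schema from a single sample value."""
    for py_type, type_name in JSON_TYPES:
        if isinstance(value, py_type):
            break
    else:
        raise TypeError(f"not a JSON value: {value!r}")
    schema = {"type": type_name}
    if type_name == "object":
        # Recurse into each property; every key seen in the sample is required.
        schema["properties"] = {key: infer_schema(item) for key, item in value.items()}
        schema["required"] = sorted(value)
    return schema


print(infer_schema({"my_obj": 123}))
# {'type': 'object', 'properties': {'my_obj': {'type': 'integer'}}, 'required': ['my_obj']}
```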

Building out a JSON schema can become quite complex. For more details on types and other schema conventions, check out the official JSON Schema documentation.


