Very slow JSON serialization and deserialization and blocking event loop #489

Luksalos · 2024-11-25T14:39:53Z

What is the current behavior?

PrerecordedResponse.from_json(result) (link to code) is very slow, especially for larger inputs. The cause is the dataclasses-json library, whose maintainers have been aware of this performance issue since 2020 but haven’t addressed it. Besides .from_json(), the .to_dict() operation is also very slow; that is the call one would use to convert the Deepgram SDK output into their own Pydantic model.

In our case, for recordings lasting around 1 hour:

source = {"url": signed_url}
options = PrerecordedOptions(
    model="nova-2-general",
    diarize=True,
    utterances=True,
    paragraphs=True,
)
deepgram.listen.rest.v("1").transcribe_url(source, options=options)

The .from_json() call takes over 10 seconds, while Pydantic parsing takes ~30 ms.
For a 7-minute recording, the .from_json() call took ~1.7 seconds, while Pydantic parsing took ~5 ms.

This issue also affects the asynchronous version, where the problem is even more significant as it blocks the event loop for a long time.
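As a possible stopgap on the caller's side, the slow synchronous parse can be moved off the event loop thread. The sketch below is only an illustration: `parse_response` and `handle_transcript` are hypothetical names, and `json.loads` stands in for the real `PrerecordedResponse.from_json()` call so the example stays self-contained.

```python
import asyncio
import json


def parse_response(raw: str) -> dict:
    # Hypothetical stand-in for a slow, synchronous parse such as
    # PrerecordedResponse.from_json(raw).
    return json.loads(raw)


async def handle_transcript(raw: str) -> dict:
    # asyncio.to_thread (Python 3.9+) runs the blocking call in a worker
    # thread, so this coroutine awaits the result instead of parsing inline.
    return await asyncio.to_thread(parse_response, raw)


if __name__ == "__main__":
    payload = json.dumps({"results": {"utterances": [{"transcript": "hello"}]}})
    print(asyncio.run(handle_transcript(payload)))
```

Because of the GIL, this does not make a pure-Python parse any faster; it only lets the interpreter switch back to the event loop periodically while the parse runs. A process pool would be needed for real parallelism.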

Expected behavior

JSON serialization and deserialization shouldn't take that long, and CPU-heavy operations should definitely not block the event loop. Please consider using Pydantic or raw dataclasses.
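To illustrate what "raw dataclasses" could look like, here is a minimal sketch that parses with the standard `json` module and builds plain dataclasses by hand. The `Word` and `Utterance` classes and their fields are simplified assumptions for the example, not the SDK's actual response schema.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    word: str
    start: float
    end: float


@dataclass
class Utterance:
    transcript: str
    words: List[Word]

    @classmethod
    def from_dict(cls, data: dict) -> "Utterance":
        # Pick fields explicitly so unknown keys in the payload are ignored.
        words = [
            Word(word=w["word"], start=w["start"], end=w["end"])
            for w in data.get("words", [])
        ]
        return cls(transcript=data["transcript"], words=words)


def parse_utterances(raw: str) -> List[Utterance]:
    # One json.loads call, then direct construction with no reflection.
    payload = json.loads(raw)
    return [Utterance.from_dict(u) for u in payload.get("utterances", [])]
```

Hand-written constructors like this avoid per-field type introspection at parse time, at the cost of more code to maintain when the response shape changes.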


jjmaldonis · 2024-11-25T16:44:29Z

Adding __slots__ to the dataclasses may help -- this is worth a quick try. I have not tested, and I don't know if dataclasses actually support __slots__, but adding the class variable can result in dramatic speed improvements.

Overall, my opinion is that dataclasses begin to break down once the scope of their usage extends past the immediate value proposition of dataclasses, and a different implementation tends to work better. Pydantic tends to be used for input validation, which isn't a critically important feature within this SDK because responses do not need to be validated. That said, I'm a big fan of pydantic in general. But choosing a different class implementation may give us the speed and flexibility wins we're looking for. That said, moving away from dataclasses will be a major breaking change.

Luksalos changed the title ~~Blocking~~ Very slow JSON serialization and deserialization and blocking event loop Nov 25, 2024
