Measure time to hydrate, transform, index #115

hackartisan · 2024-10-03T14:09:34Z

We have requirements on how fast we have to be able to fully perform each of these steps. We need to know whether we're meeting those requirements.

Other things to think about:

- Do we want to collect stats on this in an ongoing way? I.e. will we want to know how long it took last time, or last month? E.g. if there were a lot of network errors during the last run, the timing we see might be anomalous.
- Will it be relevant how many records were hydrated / transformed / indexed, or do we only care about how long it took?
- We'll have to define how we know when we're done. Maybe we record some ID at startup and then check whether we hit that ID?
- Potentially related to the "validate we got everything" ticket, Completeness Validation of the Index #46
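
To make the "record some ID at startup" idea concrete, here is a minimal sketch. It is written in Python purely for illustration, and it assumes record IDs are integers that only increase, which this ticket does not confirm. `CompletionTracker`, `target_id`, and `observe` are hypothetical names, not anything in the codebase.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CompletionTracker:
    """Tracks whether a stage has caught up to the ID recorded at startup."""

    target_id: int  # hypothetical: newest source record ID seen at startup
    last_seen_id: Optional[int] = None

    def observe(self, record_id: int) -> None:
        # Keep the highest ID processed so far; records may arrive out of order.
        if self.last_seen_id is None or record_id > self.last_seen_id:
            self.last_seen_id = record_id

    @property
    def done(self) -> bool:
        # "Done" means we have processed a record at or past the startup ID.
        return self.last_seen_id is not None and self.last_seen_id >= self.target_id
```

A timer for a step would stop the first time `done` turns true. If IDs are not ordered, a timestamp or cache-order marker would have to stand in for `target_id`.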

Acceptance Criteria

There is some page I can go to or log message I can find that says how long each of these steps took the last time it ran.

tpendragon · 2024-10-14T17:35:16Z

Brainstorming a bit:

Some possibly useful metrics I can think of are:

- Time to Poll - this is how long it takes on a fresh start-up for the Hydrator to hit polling.
  - challenges: You can't really do this for the other consumers, because you'd include the previous steps in the measure. You'd have to start the transformer step after the hydrator is polling, and likewise the indexer once the transformer is polling.
- Time to Process 1 Doc. Seems like we'd have to record what the doc is, and then notify when it came out of the indexing consumer. May help us know where to optimize when the time comes, 1 doc vs. the system, but probably not super relevant to this ticket.
- Throughput - records/s indexed while the hydrator's going. Measure time to poll for each producer.

If we measure records / second / stage we could re-write our metrics that we established in these terms.
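
As a rough illustration of restating a time requirement as a per-stage rate, the sketch below converts "process N records within T seconds" into a required records/second figure and compares an observed rate against it. It is Python for illustration only; the function names and every number in the usage note are made up, not the project's actual requirements.

```python
def required_rate(total_records: int, max_seconds: float) -> float:
    """Records/second a stage must sustain to finish total_records in max_seconds."""
    if max_seconds <= 0:
        raise ValueError("max_seconds must be positive")
    return total_records / max_seconds


def meets_requirement(
    records_processed: int,
    elapsed_seconds: float,
    total_records: int,
    max_seconds: float,
) -> bool:
    """Compare an observed rate for one stage against the required rate."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    observed = records_processed / elapsed_seconds
    return observed >= required_rate(total_records, max_seconds)
```

For example, a hypothetical requirement of 3,600,000 records in one hour becomes 1,000 records/s, so a stage observed at 50,000 records in 40 seconds (1,250 records/s) would pass.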

Broadway measures throughput for us in some form. Maybe we could leverage that somehow?

We'd want an average time built up over a large set, e.g.:

- Start from nil, collect stats until you hit polling, then compute an average from those.
- We still have to make sure we only collect the stat when we've started over after the previous stage(s) "finished". One way to do this might be to only write the stat if we processed a certain number of resources before starting to poll.
- Or if we get from nil to a certain ID.
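
The "only write the stat if you processed enough resources" rule could look roughly like the sketch below. It is Python for illustration, not the Broadway/telemetry code this project would actually use; `RunStats`, `min_records`, and the explicit `now` timestamps are all hypothetical, and the threshold value is a placeholder.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunStats:
    """Accumulates counts for one run of a stage, from start until polling."""

    min_records: int  # hypothetical threshold below which a run is ignored
    started_at: Optional[float] = None
    records: int = 0

    def start(self, now: float) -> None:
        # Call when the stage starts from nil.
        self.started_at = now
        self.records = 0

    def record_batch(self, count: int) -> None:
        self.records += count

    def finish(self, now: float) -> Optional[dict]:
        """Call when the stage starts polling; returns a summary or None."""
        summary = None
        elapsed = None if self.started_at is None else now - self.started_at
        # Skip small runs: they are probably not a full from-nil pass.
        if elapsed is not None and elapsed > 0 and self.records >= self.min_records:
            summary = {
                "records": self.records,
                "seconds": elapsed,
                "records_per_second": self.records / elapsed,
            }
        self.started_at = None
        self.records = 0
        return summary
```

A watcher would call `record_batch` from whatever per-batch event it listens to and persist the summary only when `finish` returns one.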

Questions: Where do we store these stats? Do we integrate them into LiveDashboard?

Implementation idea: run a watcher process that uses telemetry events.

hackartisan added the dls-work-cycle label Oct 3, 2024

tpendragon mentioned this issue Oct 21, 2024

Measure Hydration/Transformation/Indexing Times #153

Merged

hackartisan assigned tpendragon Oct 22, 2024

hackartisan added this to the Slice 1 milestone Oct 23, 2024

tpendragon removed their assignment Nov 1, 2024

tpendragon self-assigned this Nov 20, 2024

hackartisan closed this as completed in #153 Dec 4, 2024