Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(api, robot-server): get historical and current command errors #16697

Merged
merged 63 commits into from
Nov 25, 2024

Conversation

TamarZanzouri
Copy link
Contributor

@TamarZanzouri TamarZanzouri commented Nov 5, 2024

Overview

closes https://opentrons.atlassian.net/browse/EXEC-655.
GET historical command errors and refactor current command errors.

Test Plan and Hands on Testing

run a protocol with ER failed commands.

  • GET /runs/{runId}/commandsErros while the run is active and make sure you get the failed commands errors.
  • GET /runs/{runId}/commandsErros when the run is finished and make sure you get the failed commands errors.

Changelog

  • added db schema 8.
  • added command_error, command_status field to commands_table.
  • insert command_error, command_status when inserting a command to commands table.
  • get_commands_errors_slice
  • added missing indexes from previous migrations.

Review requests

changes make sense?

Risk assessment

low.

@TamarZanzouri TamarZanzouri requested a review from a team as a code owner November 5, 2024 21:23
@TamarZanzouri TamarZanzouri marked this pull request as draft November 5, 2024 21:23
@TamarZanzouri TamarZanzouri marked this pull request as ready for review November 6, 2024 21:14
Copy link
Contributor

@SyntaxColoring SyntaxColoring left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

api/src/opentrons/protocol_engine/state/command_history.py Outdated Show resolved Hide resolved
Comment on lines 127 to 128
CREATE INDEX ix_run_command_command_error ON run_command (command_error)
""",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not tested this, but I don't think this index is what we want. (This index is what happens when you just do index=True on the new column declaration.)

This will, across all commands from all runs, maintain an index that is a lexicographic sort of the errors' JSON strings.

That does not help with efficiently serving the SELECT statement that you've written in RunStore, which is something like, "get all the errors from a single run, in order."

So I think what we want instead is either:

  • A compound index on (run_id, has_error, index_in_run), where has_error is a true/false value computed from command_error being non-NULL. I don't know how to do this off the top of my head with SQLAlchemy, but here is SQLite documentation that looks relevant.
  • A compound index on (run_id, command_status, index_in_run), where command_status is a new column storing the command's succeeded/failed status. And then the SELECT statement filters based on command_status == "failed".

Copy link
Contributor

@SyntaxColoring SyntaxColoring Nov 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also test whether it's fast enough to just use the existing (run_id, index_in_run) compound index. From there, filtering for just the failed commands will be an O(n) linear scan, but...it's a linear scan implemented by optimized C, so maybe it's good enough.

@SyntaxColoring SyntaxColoring force-pushed the EXEC-655-store-commands-error-list-in-db branch from bf9b701 to 5af324e Compare November 20, 2024 20:17
@TamarZanzouri TamarZanzouri marked this pull request as ready for review November 20, 2024 20:52
Copy link
Contributor

@SyntaxColoring SyntaxColoring left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly looks good, but I'm not sure about the query.

Comment on lines +54 to +70
add_column(
dest_engine,
schema_8.run_command_table.name,
schema_8.run_command_table.c.command_error,
)

add_column(
dest_engine,
schema_8.run_command_table.name,
schema_8.run_command_table.c.command_status,
)

_add_missing_indexes(dest_transaction=dest_transaction)

_migrate_command_table_with_new_command_error_col_and_command_status(
dest_transaction=dest_transaction
)
Copy link
Contributor

@SyntaxColoring SyntaxColoring Nov 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No action needed, just some thoughts.

Early on in this PR, we made the decision to add the new column with ALTER TABLE, instead of letting SQLAlchemy create the new one from scratch.

I don't think that decision is panning out well.

  1. We have to manually create indexes.
  2. We have to manually create constraints. (And we're not doing that right now—see my # todo comment in schema_8.py).
  3. The command_status column needs to be nullable even when that's not a good match for the data we're storing in it.

1 and 2 are thankfully caught by tests now, and perhaps we can improve them by fleshing out the add_column() function. But 3 is fundamental to SQLite. We talked about fixing 3 with Alembic, but I've looked into it and I'm skeptical that that will be easy.

Meanwhile, is ALTER TABLE really saving that much overhead compared to creating the new table from scratch? We're still iterating over every single command and parsing it as JSON, and I imagine that cost will dominate the savings from not having to copy the data.

I think we should stick with what you have now, but in the future, let's try the other way?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the other way means creating all tables from scratch? we still need to insert the data.

robot-server/robot_server/runs/run_store.py Outdated Show resolved Hide resolved
robot-server/robot_server/runs/run_store.py Show resolved Hide resolved
robot-server/robot_server/runs/run_store.py Outdated Show resolved Hide resolved
robot-server/robot_server/runs/run_store.py Outdated Show resolved Hide resolved
Comment on lines +74 to +75
# todo(2024-11-20): Probably add the indexes missing from prior migrations here.
# https://opentrons.atlassian.net/browse/EXEC-827
Copy link
Contributor

@SyntaxColoring SyntaxColoring Nov 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TamarZanzouri FYI I think that it was correct for you to have had the other fixes in here, but when I was resolving merge conflicts, it was feeling like a bit too much to try to do at once, so I removed them. Let's do it in a follow-up?

Comment on lines +235 to +242
# nullable=True because it was easier for the migration to add the column
# this way. This is not intended to ever be null in practice.
nullable=True,
# todo(mm, 2024-11-20): We want create_constraint=True here. Something
# about the way we compare SQL in test_tables.py is making that difficult--
# even when we correctly add the constraint in the migration, the SQL
# doesn't compare equal to what create_constraint=True here would emit.
create_constraint=False,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TamarZanzouri FYI, I made these changes when resolving merge conflicts:

  1. Made command_status nullable to match the ALTER TABLE statement that you were doing in the migration.
  2. Removed the command_status enum constraint because I couldn't figure out how to get the migration to match it in a way that would pass the tests. We should figure that out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tried setting create_constraint=True but its acting weird in the test as you mentioned. maybe we should add a CREATE CONSTRAINT to the migration script?

@TamarZanzouri
Copy link
Contributor Author

tested the migration on a FLEX and it works as expected. @SyntaxColoring

Copy link
Contributor

@SyntaxColoring SyntaxColoring left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One last fixup, but looks great to me overall. Thank you!

robot-server/robot_server/runs/run_store.py Show resolved Hide resolved
@TamarZanzouri TamarZanzouri merged commit 33100e7 into edge Nov 25, 2024
21 checks passed
@TamarZanzouri TamarZanzouri deleted the EXEC-655-store-commands-error-list-in-db branch November 25, 2024 20:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants