
A db migration does not take Query#apply_auto_limit into account #6767

Closed
eugene-nikolaev opened this issue Feb 15, 2024 · 11 comments · Fixed by #7184

Comments


Issue Summary

I am trying to upgrade a Redash 10.1.0 instance to the current preview version.
The new version is the "preview" Docker image pushed on Feb 10th, 2024.

After migration:

  • my queries with scheduled refreshes stopped refreshing
  • the "Refresh" button runs a refresh in the UI, but after reloading the page the query still has no new query result.

I've discovered that this db migration:
https://github.com/getredash/redash/blob/master/migrations/versions/1038c2174f5d_make_case_insensitive_hash_of_query_text.py
does not take the "apply_auto_limit" setting of queries into account.

So, if a Query's options contain apply_auto_limit: true, this migration recomputes the query hash without applying the limit. But the QueryRunner applies it as usual when generating query results, adding LIMIT 1000, so the results can never be associated with their queries anymore.

I guess this db migration should be rewritten to use something like this:

def apply_auto_limit(self, query_text, should_apply_auto_limit):
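
A minimal sketch of what I mean, reusing helpers that already exist in the codebase (redash.utils.gen_query_hash and BaseSQLQueryRunner.apply_auto_limit); the wrapper function itself is only illustrative:

import json

from redash.query_runner import BaseSQLQueryRunner
from redash.utils import gen_query_hash


def limit_aware_query_hash(query_text, options_json):
    # Respect the per-query "apply_auto_limit" option before hashing,
    # the same way the query runner does when it produces results.
    options = json.loads(options_json)
    should_apply = options.get("apply_auto_limit", False)
    rewritten = BaseSQLQueryRunner({}).apply_auto_limit(query_text, should_apply)
    return gen_query_hash(rewritten)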

Steps to Reproduce

  1. Install 10.1.0. E.g. I launched https://hub.docker.com/layers/redash/redash/10.1.0.b50633/images/sha256-f3e8d95656255d9684051604a586dfd50035fa1e297e68379dc1b557f1885c0f?context=explore. Or it could be pulled from GitHub 🤷

  2. Create a data source (it can be Redash's own database as well).

  3. Create and publish a query, e.g. select current_timestamp;.

  4. Set it to be refreshed once a minute.

  5. Ensure the query results are refreshed by reloading the page after a few minutes.

  6. Stop the instance and update the version (use a preview image, pull the current master, or whatever applies), keeping the same internal Redash database.

  7. Run db migrations.

  8. Run Redash.

  9. Check the query after a few minutes - it will not be refreshed. Nor will the "Refresh" button lead to the expected behaviour.

Technical details:

eugene-nikolaev (Author) commented Feb 16, 2024

This is how I patched it to work, but it seems far from best practice, and I don't have enough context to provide a good fix.
Should some helper be extracted?

"""Make case insensitive hash of query text

Revision ID: 1038c2174f5d
Revises: fd4fc850d7ea
Create Date: 2023-07-16 23:10:12.885949

"""
from alembic import op
import sqlalchemy as sa
from sqlalchemy.sql import table

from redash.utils import gen_query_hash

from redash.query_runner import BaseSQLQueryRunner

import json

# revision identifiers, used by Alembic.
revision = '1038c2174f5d'
down_revision = 'fd4fc850d7ea'
branch_labels = None
depends_on = None


def query_hash(query_text, options):
    options_json = json.loads(options)
    apply_auto_limit = options_json.get('apply_auto_limit', False)
    runner = BaseSQLQueryRunner({})
    limit_aware_query = runner.apply_auto_limit(query_text, apply_auto_limit)
    return gen_query_hash(limit_aware_query)


def change_query_hash(conn, table, query_text_to):
    for record in conn.execute(table.select()):
        query_text = query_text_to(record.query)
        options = record.options
        conn.execute(
            table
            .update()
            .where(table.c.id == record.id)
            .values(query_hash=query_hash(query_text, options)))

def upgrade():
    queries = table(
        'queries',
        sa.Column('id', sa.Integer, primary_key=True),
        sa.Column('query', sa.Text),
        sa.Column('query_hash', sa.String(length=10)),
        sa.Column('options', sa.Text))

    conn = op.get_bind()
    change_query_hash(conn, queries, query_text_to=str)


def downgrade():
    queries = table(
        'queries',
        sa.Column('id', sa.Integer, primary_key=True),
        sa.Column('query', sa.Text),
        sa.Column('query_hash', sa.String(length=10)),
        sa.Column('options', sa.Text))


    conn = op.get_bind()
    change_query_hash(conn, queries, query_text_to=str.lower)

eugene-nikolaev (Author) commented:

NOTE: my migration produces different hashes after an upgrade->revert round trip. Looking into it.

eugene-nikolaev (Author) commented:

I'm tight on time, so I ended up just making the migration irreversible 😬
It turns out there is a hassle with Snowflake queries and I don't want to mess with that at the moment.

def downgrade():
    # This is irreversible mainly because of Snowflake queries which have a grayed out "auto-limit" option in 10.1.0.
    # But in a "preview" version it is available and Snowflake queries suddenly become "auto-limited"
    # and their hashes become different.
    # Also it would take a hash correction, because BaseSQLQueryRunner's apply_auto_limit in the "upgrade branch" has
    # its own opinion on semicolons.
    # To sum up - it is easier to just restore a dump than mess with it.
    raise Exception("Irreversible migration")

eugene-nikolaev (Author) commented:

I've encountered issues while migrating real production data - it was failing on queries which had no statements (e.g. empty, or everything commented out), so I hacked it this way:

"""Make case insensitive hash of query text

Revision ID: 1038c2174f5d
Revises: fd4fc850d7ea
Create Date: 2023-07-16 23:10:12.885949

"""
from alembic import op
import sqlalchemy as sa
from sqlalchemy.sql import table

from redash.utils import gen_query_hash

from redash.query_runner import BaseSQLQueryRunner

import json
import sqlparse

# revision identifiers, used by Alembic.
revision = '1038c2174f5d'
down_revision = 'fd4fc850d7ea'
branch_labels = None
depends_on = None

def is_empty_statement(stmt):
    # copy statement object. `copy.deepcopy` fails to do this, so just re-parse it
    st = sqlparse.engine.FilterStack()
    st.stmtprocess.append(sqlparse.filters.StripCommentsFilter())
    stmt = next(st.run(str(stmt)), None)
    if stmt is None:
        return True
    return str(stmt).strip() == ""


def query_hash(query_text, options):
    options_json = json.loads(options)
    apply_auto_limit = options_json.get('apply_auto_limit', False)
    runner = BaseSQLQueryRunner({})
    limit_aware_query = runner.apply_auto_limit(query_text, apply_auto_limit)
    return gen_query_hash(limit_aware_query)


def change_query_hash(conn, table, query_text_to):
    for record in conn.execute(table.select()):
        if is_empty_statement(record.query):
            continue
        query_text = query_text_to(record.query)
        options = record.options
        conn.execute(
            table
            .update()
            .where(table.c.id == record.id)
            .values(query_hash=query_hash(query_text, options)))


def upgrade():
    queries = table(
        'queries',
        sa.Column('id', sa.Integer, primary_key=True),
        sa.Column('query', sa.Text),
        sa.Column('query_hash', sa.String(length=10)),
        sa.Column('options', sa.Text))

    conn = op.get_bind()
    change_query_hash(conn, queries, query_text_to=str)


def downgrade():
    # This is irreversible mainly because of Snowflake queries which have a grayed out "auto-limit" option in 10.1.0.
    # But in a "preview" version it is available and Snowflake queries suddenly become "auto-limited"
    # and their hashes become different.
    # Also it would take a hash correction, because BaseSQLQueryRunner's apply_auto_limit in the "upgrade branch" has
    # its own opinion on semicolons.
    # To sum up - it is easier to just restore a dump than mess with it.
    raise Exception("Irreversible migration")

😬

arikfr (Member) commented Sep 21, 2024

Thank you for reporting this! I was debugging a similar issue and came to similar conclusions.

The problem is more extensive than just this migration:

  1. Even if the query does not have auto limit applied, calling redash.utils.gen_query_hash generates a different hash from calling the QueryRunner version of gen_query_hash (a quick sketch of the mismatch is below).
  2. There are places in the codebase that still call redash.utils.gen_query_hash...

I will create a fix.
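
To make the mismatch concrete, here is a rough sketch; it assumes (as the patch above does) that the runner-side hash is gen_query_hash applied to the auto-limited text:

from redash.query_runner import BaseSQLQueryRunner
from redash.utils import gen_query_hash

query_text = "select current_timestamp"

# Hash of the raw text, as direct redash.utils.gen_query_hash callers compute it.
raw_hash = gen_query_hash(query_text)

# Hash of the auto-limited text, as the query runner path computes it.
limited_hash = gen_query_hash(BaseSQLQueryRunner({}).apply_auto_limit(query_text, True))

# raw_hash != limited_hash, so results stored under one hash are never found
# when the other code path looks them up.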

arikfr (Member) commented Oct 7, 2024

An update on this:

On the good news side: the place in the code where we call redash.utils.gen_query_hash directly is also applying query limit, so it's probably fine but not ideal.

On the bad news side:

To fix the refresh issue, we need to generate a correct query hash for each query. The problem is that to do so we have a few options:

  1. Reuse model code - most reliable and accurate, but not idiomatic to use in migrations (and can cause issues so better to avoid). We can have some "upgrade" script that you run to fix the hashes. The downside is that it's an extra step admins need to be aware of.
  2. Rewrite the whole logic without model code - too much effort and error prone.
  3. Best effort in migrations + upgrade script.

I think I'll go for 3.

justinclift (Member) commented Oct 7, 2024

What would it take for it to always work 100% automatically and reliably?

On that note, with a "best effort in migrations" approach, what do you reckon the behaviour will be when (if?) it doesn't work, and do you reckon we can catch the failures in some better way? 😄

justinclift (Member) commented Oct 7, 2024

With the:

We can have some "upgrade" script that you run to fix the hashes. The downside is that it's an extra step admins need to be aware of.

Would it be dangerous to run that automatically, so it's a hands off thing for admins?

arikfr (Member) commented Oct 7, 2024

It's quite a lot of rewriting existing code. It will work incorrectly only for some of the parameterized queries. All other queries are fine.

and do you reckon we can catch the failures in some better way? 😄

We can have something that checks the hash of the query when trying to refresh it and updates it if needed. But I'm not sure I want to invest much in this... a better path would be to deprecate this whole concept of query hashes. Not sure it brings enough value.
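
If we did go that way, a rough sketch of the check could look like the following; all of it is illustrative only (none of these helpers exist yet), and it assumes the runner-side hash is gen_query_hash over the auto-limited text:

from redash.query_runner import BaseSQLQueryRunner
from redash.utils import gen_query_hash


def rehash_if_stale(query_text, options, stored_hash):
    # Recompute the hash the same way result generation does and report drift,
    # e.g. after a migration that ignored the auto limit option.
    should_apply = options.get("apply_auto_limit", False)
    expected = gen_query_hash(BaseSQLQueryRunner({}).apply_auto_limit(query_text, should_apply))
    return expected, expected != stored_hash

# At refresh time: if the stored hash drifted, write the recomputed hash back
# to the query row before enqueueing it, so new results can be associated with
# the query again.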

justinclift (Member) commented Oct 7, 2024

Interesting. Could it work so that, if the query hash fails to be found, it automatically runs the query again to get the newest result instead of erroring out?

arikfr (Member) commented Oct 7, 2024

In #7184 you can see:

  1. The migration.
  2. I added a CLI command (can be invoked with docker compose run --rm server manage queries rehash) to rehash all queries.

I think these two should be enough?
