Refactor ColumnVEPField to be simpler data structures #1213

Open
davmlaw opened this issue Dec 17, 2024 · 0 comments

davmlaw commented Dec 17, 2024

Now that we have T2T, the combinatorial explosion of different builds is starting to bite.

Instead of creating multiple entries in the DB, we could have e.g. genome_build = ["GRCh37", "GRCh38"] etc. - can do more with JSON than
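
To make the idea concrete, here is a minimal sketch of what one consolidated entry might look like. Only `genome_build` comes from the suggestion above; the column name, the `applies_to_build` helper, and the rule that a missing list means "all builds" are assumptions for illustration, not the project's actual schema.

```python
import json

# Hypothetical consolidated entry: one record lists every build it applies to.
# "example_column" is a made-up name used only for illustration.
entry = {
    "variant_grid_column": "example_column",
    "genome_build": ["GRCh37", "GRCh38"],
}


def applies_to_build(entry: dict, build_name: str) -> bool:
    """Return True if the entry covers build_name.

    Assumption: a missing or empty genome_build list means "all builds".
    """
    builds = entry.get("genome_build")
    return not builds or build_name in builds


print(json.dumps(entry, indent=2))
print(applies_to_build(entry, "GRCh38"))         # True
print(applies_to_build(entry, "T2T-CHM13v2.0"))  # False
```

Adding a new build would then mean appending to a list rather than duplicating a row.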

This is not the final form, but it is useful to see how much you can remove:

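from collections import defaultdict

# Assumes ColumnVEPField is already imported (e.g. in a Django shell)
records = ColumnVEPField.objects.all().order_by("variant_grid_column").values()

# Maps source_field -> list of stripped-down entries
cleaned_data = defaultdict(list)
for r in records:
    # Drop unset (None) values
    data = {k: v for k, v in r.items() if v is not None}
    del data["id"]
    # Omit the flag when it is False to reduce noise
    if data.get("source_field_has_custom_prefix") is False:
        del data["source_field_has_custom_prefix"]
    del data["column"]
    # Rename the foreign key from its "_id" form
    data["variant_grid_column"] = data.pop("variant_grid_column_id")
    # Rows without a source field are grouped under None
    sf = data.pop("source_field", None)

    cleaned_data[sf].append(data)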
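
As a possible next step, records that differ only in their build could be collapsed into a single record carrying a list of builds. The sketch below works on plain dicts rather than the Django model. The `merge_genome_builds` helper, the sample rows, and the `genome_build` key are hypothetical (the real `.values()` output may name the key differently, e.g. with an `_id` suffix), and all values are assumed to be hashable.

```python
from collections import defaultdict


def merge_genome_builds(records):
    """Collapse records that differ only in genome_build into one record."""
    grouped = defaultdict(list)
    for record in records:
        rest = {k: v for k, v in record.items() if k != "genome_build"}
        # Sort items so the grouping key does not depend on dict ordering
        key = tuple(sorted(rest.items()))
        builds = grouped[key]  # creates the group even if no build is set
        build = record.get("genome_build")
        if build is not None and build not in builds:
            builds.append(build)

    merged = []
    for key, builds in grouped.items():
        record = dict(key)
        if builds:
            record["genome_build"] = builds
        merged.append(record)
    return merged


# Made-up rows for illustration only
sample = [
    {"variant_grid_column": "col_a", "vep_plugin": "x", "genome_build": "GRCh37"},
    {"variant_grid_column": "col_a", "vep_plugin": "x", "genome_build": "GRCh38"},
    {"variant_grid_column": "col_b"},
]
merged = merge_genome_builds(sample)
print(merged)
```

Here the two `col_a` rows become one record with `genome_build` set to `["GRCh37", "GRCh38"]`, while the build-independent `col_b` row is left without a `genome_build` key.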

A few strange things:

Mastermind_counts is processed 3 times (and stored in different fields)
gnomAD_SV_AF is copied to both gnomad_sv_overlap_af and gnomad_sv_overlap_af
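
Oddities like these could be flagged automatically once the data is grouped by source field. The following is a rough sketch that assumes a `cleaned_data`-shaped mapping (source field to list of entries, each with a `variant_grid_column` key); the `find_oddities` helper and the sample field and column names are hypothetical.

```python
from collections import Counter


def find_oddities(cleaned_data):
    """Report source fields with several entries and repeated target columns."""
    repeated = {}           # source_field -> number of entries
    duplicate_targets = {}  # source_field -> columns listed more than once
    for source_field, entries in cleaned_data.items():
        if len(entries) > 1:
            repeated[source_field] = len(entries)
        counts = Counter(e.get("variant_grid_column") for e in entries)
        dupes = sorted(c for c, n in counts.items() if n > 1 and c is not None)
        if dupes:
            duplicate_targets[source_field] = dupes
    return repeated, duplicate_targets


# Made-up data for illustration only
sample = {
    "field_a": [
        {"variant_grid_column": "col_1"},
        {"variant_grid_column": "col_2"},
        {"variant_grid_column": "col_3"},
    ],
    "field_b": [
        {"variant_grid_column": "col_4"},
        {"variant_grid_column": "col_4"},
    ],
    "field_c": [{"variant_grid_column": "col_5"}],
}
repeated, duplicate_targets = find_oddities(sample)
print(repeated)           # {'field_a': 3, 'field_b': 2}
print(duplicate_targets)  # {'field_b': ['col_4']}
```

Multiple entries per source field are not necessarily wrong (they may simply reflect different builds), so the output is a list of candidates to review rather than a list of bugs.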
