Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wizard: collect bugs / improvements #1563

Open
lucasrodes opened this issue Aug 30, 2023 · 55 comments · Fixed by #1593
Open

wizard: collect bugs / improvements #1563

lucasrodes opened this issue Aug 30, 2023 · 55 comments · Fixed by #1593
Assignees
Labels
wizard Issues related to wizard tool

Comments

@lucasrodes
Copy link
Member

lucasrodes commented Aug 30, 2023

After #1539 you can now use the new tool wizard to generate templates for your ETL steps.

Use this issue to report bugs that you may encounter. If the issue is very complex, feel free to create a separate - and more extense - issue.

@pabloarosado
Copy link
Contributor

pabloarosado commented Sep 1, 2023

I'm using wizard now for the first time, so I'll write here all issues I find along the way:

  • When I open wizard, I get the warning:
2023-09-01 08:37:55.275 WARNING streamlit.runtime.caching.cache_data_api: No runtime found, using MemoryCacheStorageManager
['streamlit', 'run', '/Users/prosado/Documents/owid/repos/etl/apps/wizard/app.py', '--server.port', '8053', '--', '--phase', 'all', '--run-checks']
  • I think the "Generate playground notebook" (in the meadow/garden steps) should be unselected by default (I don't think many of us are using it, but I may be wrong). When I unselect it and submit, I get the error:
FileNotFoundError: [Errno 2] No such file or directory: '/Users/prosado/Documents/owid/repos/etl/etl/steps/data/meadow/animal_welfare/2023-09-01/playground.ipynb'
Traceback:
File "/Users/prosado/Documents/owid/repos/etl/.venv/lib/python3.11/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 552, in _run_script
    exec(code, module.__dict__)
File "/Users/prosado/Documents/owid/repos/etl/apps/wizard/templating/meadow.py", line 212, in <module>
    os.remove(notebook_path)
  • After submitting a meadow/garden step, the new files are mentioned on the new page (under "Generated files"). But I think the dag file should also be mentioned, if it has been modified.
  • When loading the resulting snapshot, I was getting a strange error. I realised that snapshot.metadata.license was None, even if license was defined in the .dvc file. It took me a while to realise that the produced .dvc file had license indented too many spaces, hence becoming an item of of origin. I think they are meant to be separate fields in the snapshot metadata. Once I removed the indent of license (and its items), the issue was fixed. So, I suppose it's a bug, and wizard should not store license inside origin (we can discuss if that should be indeed the correct structure).
  • Field citation_producer should be expandable, given that it can be very long (as dataset description fields are).

@lucasrodes lucasrodes mentioned this issue Sep 1, 2023
7 tasks
@lucasrodes
Copy link
Member Author

These issues should be solved in #1567

@lucasrodes
Copy link
Member Author

@Marigold @pabloarosado From our discussions and the documentation on Notion, I had assumed that we agreed that the field license should be under $.meta.origin.

However, from what Pablo is saying, it sounds like this is raising an error. As a solution for now, I've set wizard to export the field license both at $.meta.origin.license and $.meta.license levels.

Do you think this is fine?

@Marigold
Copy link
Collaborator

Marigold commented Sep 4, 2023

It took me a while to realise that the produced .dvc file had license indented too many spaces, hence becoming an item of of origin. I think they are meant to be separate fields in the snapshot metadata.

The original idea was to keep it under origin, not separate from it. license is there only for backward compatibility. Ideally, snapshot meta would have either source & license or origin & origin.license (we could refactor old files if this causes too much confusion).

@lucasrodes
Copy link
Member Author

Thanks for clarifying, Mojmir. Yes, that's what I had in mind.

@pabloarosado could you share code to reproduce the error?

@lucasrodes
Copy link
Member Author

walkthrough bug that might be occurring here too: #1571

@paarriagadap
Copy link
Contributor

paarriagadap commented Sep 11, 2023

  • I see that update_period_days is mandatory in the guideline, but it is not available by default in any of the YAML files

@paarriagadap
Copy link
Contributor

paarriagadap commented Sep 11, 2023

Two related comments to the ones above:

  • description_processing is not shown by default when walkthrough garden generates the yaml, and it's a recommended field
  • description is automatically generated in the yaml file, when I see it is not used anymore

In general it would be good to match that file with the final version of the guidelines.

@lucasrodes
Copy link
Member Author

lucasrodes commented Sep 18, 2023

Hey! Should I just reopen for a small issue then?

  • The create playground notebook option in wizard doesn't generate a Jupyter Notebook

(copied from #1566 (comment))

@spoonerf
Copy link
Contributor

It's not super clear to me whether I should be hitting return after filling in each section or not? It seems like a good thing to somewhat validate each entry, but also the form completed unexpectedly early when pressing it, meaning I skipped the last couple of sections.

@paarriagadap
Copy link
Contributor

paarriagadap commented Sep 18, 2023

  • When the YAML for garden is generated, there are two localhost links that don't work:
# NOTE: To learn more about the fields, hover over their names.


# Learn more about the available fields:
# http://localhost:8000/architecture/metadata/reference/dataset/
dataset:
  update_period_days: 365


# Learn more about the available fields:
# http://localhost:8000/architecture/metadata/reference/tables/

@paarriagadap
Copy link
Contributor

paarriagadap commented Sep 18, 2023

  • The video explaining the harmonization tool is outdated:

Harmonize country names with the following command (assuming country field is called country). Check out a short demo of the tool

  • I think the options to add regions and population data available on walkthrough should also be available on wizard as well.

@lucasrodes
Copy link
Member Author

@spoonerf, good point, but unfortunately, clicking on ENTER is equivalent to submitting the entire form. It will be submitted if the form is valid (i.e., all required fields are present).

@lucasrodes
Copy link
Member Author

@pabloarosado

I think the options to add regions and population data available on walkthrough should also be available on wizard as well.

Which options do you refer to here? I just executed walkthrough and there aren't - at least on my end - any options to add regions or population datasets.

@paarriagadap
Copy link
Contributor

@lucasrodes Maybe it was removed recently, but they were available as check boxes at the # end of walkthrough garden

@Marigold
Copy link
Collaborator

I'll be looking at chart revisions this week, so I will take a look if there's an easy fix for that. (Examples like that are super useful by the way)

@lucasrodes
Copy link
Member Author

Issue raised by pablo might be related to this one: #867 (meant to post this here)

cc. @Marigold

@paarriagadap
Copy link
Contributor

Is the sorting in bar charts in the approval tool fixed here? Because I still find the issue:
image

@paarriagadap
Copy link
Contributor

paarriagadap commented Apr 25, 2024

  • When using wizard snapshots, if I save the title with : it generates an error in the yaml file. It can be solved if the content is rendered in quotes.
image image

@paarriagadap
Copy link
Contributor

  • Minor: The entity harmonizer might look a bit confusing when you add a custom name because the suggested name is still shown. Would it be possible to remove it when filling the custom box?
image

@paarriagadap
Copy link
Contributor

  • Suggestions in the entity harmonizer suggest wrong options when there is a parenthesis after a recognizable country
image

@lucasrodes
Copy link
Member Author

When using wizard snapshots, if I save the title with : it generates an error in the yaml file. It can be solved if the content is rendered in quotes.

Thanks for reporting @paarriagadap. I think this by design, otherwise the linter gets confused with the keyword. As you say, you can solve this by putting the text in quotes (") or using a multiline string (|-, >, etc.)

@paarriagadap
Copy link
Contributor

paarriagadap commented May 22, 2024

ValueError: Property variable.display.numDecimalPlaces has no type!
Traceback:
File "/home/owid/etl/.venv/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 600, in _run_script
    exec(code, module.__dict__)
File "/home/owid/etl/apps/wizard/pages/expert/app.py", line 13, in <module>
    from apps.wizard.pages.expert.prompts import (
File "/home/owid/etl/apps/wizard/pages/expert/prompts.py", line 52, in <module>
    {render_indicator()}
File "/home/owid/etl/etl/docs.py", line 161, in render_indicator
    documentation = render_props_recursive(
File "/home/owid/etl/etl/docs.py", line 115, in render_props_recursive
    text += render_props_recursive(
File "/home/owid/etl/etl/docs.py", line 115, in render_props_recursive
    text += render_props_recursive(
File "/home/owid/etl/etl/docs.py", line 123, in render_props_recursive
    text += render_prop_doc(prop, prop_name=prop_name, level=level)
File "/home/owid/etl/etl/docs.py", line 65, in render_prop_doc
    raise ValueError(f"Property {prop_name} has no type!")

@Marigold
Copy link
Collaborator

@paarriagadap fixed the expert, thanks for reporting!

@paarriagadap
Copy link
Contributor

paarriagadap commented May 23, 2024

  • The express variant of the wizard shows duplicated namespaces in the dropdown when searching:
image

@paarriagadap
Copy link
Contributor

  • The Run snapshot step in Snapshot's Next Steps could integrate a menu to search for the local file.
image

@paarriagadap
Copy link
Contributor

  • (Minor) After running the Snapshot step via streamlit, it could have the button to continue in Meadow
image

Copy link

stale bot commented Aug 14, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix This will not be worked on label Aug 14, 2024
@stale stale bot closed this as completed Aug 22, 2024
@paarriagadap paarriagadap reopened this Nov 19, 2024
@stale stale bot removed the wontfix This will not be worked on label Nov 19, 2024
@paarriagadap
Copy link
Contributor

Indicator Upgrader didn't recognize the version of a dataset updated this year: PR
Image

@lucasrodes
Copy link
Member Author

lucasrodes commented Nov 19, 2024

Indicator Upgrader didn't recognize the version of a dataset updated this year: #3569

@pabloarosado, in case you want/have time to look into the detection algorithm

@pabloarosado
Copy link
Contributor

Hi @paarriagadap thanks for reporting this, I've fixed it in your branch. There were two issues:

  • The archive dag file dag/archive/poverty_inequality.yml was created, but it wasn't added to dag/archive/main.yml, and therefore it was ignored.
  • Then I also realised that there were two steps that were both in the active and in the archive dag, namely:
    • data://explorers/lis/latest/luxembourg_income_study
    • data://explorers/wid/latest/world_inequality_database
      If you think this was caused by wizard (which would be a bug), let me know. Thanks!

@paarriagadap
Copy link
Contributor

Hi @pabloarosado. Ah, thanks for the fix. I should have forgotten about adding the dag file in the archive main.yml

Regarding those latest steps, yes, I think that when I use the tool to update steps, these latest steps are moved to the archive and disappear from the "live" yaml. At least I have seen that I need to re-add those steps to the latter.

@pabloarosado
Copy link
Contributor

pabloarosado commented Nov 20, 2024

Hi @pabloarosado. Ah, thanks for the fix. I should have forgotten about adding the dag file in the archive main.yml

Regarding those latest steps, yes, I think that when I use the tool to update steps, these latest steps are moved to the archive and disappear from the "live" yaml. At least I have seen that I need to re-add those steps to the latter.

I guess this could have been caused by the other issue, of not having the dag/archive/poverty_inequality.yml file properly accounted for. But please let me know if something like this happens again in the future.

@pabloarosado
Copy link
Contributor

pabloarosado commented Nov 28, 2024

  • Bug on Chart animation: Sometimes the first frame of a line chart animation becomes a bar chart (even though "Show single year" is deactivated).
    Image

@paarriagadap
Copy link
Contributor

In Wizard Express we could have the Generate playground notebook option in case we want it. I use to create them in the separate steps.

@lucasrodes
Copy link
Member Author

lucasrodes commented Dec 3, 2024

@paarriagadap

In Wizard Express we could have the Generate playground notebook option in case we want it. I use to create them in the separate steps.

I can add it back.

Some follow-up questions:

  • Do you use it only in Garden?
  • Do you use the interactive shell in VS Code? If so, where does it fall short compared to the playground notebook?

@paarriagadap
Copy link
Contributor

paarriagadap commented Dec 3, 2024

Hi @lucasrodes! Thanks!

  • No, I also can use it in meadow, to check the resulting table and table structure
  • No, I don't use it much

@Marigold
Copy link
Collaborator

Marigold commented Dec 3, 2024

Do you use the interactive shell in VS Code? If so, where does it fall short compared to the playground notebook?

Sorry to chime in! I also use playground.ipynb notebooks a lot (although for different use cases I guess). The main advantage over interactive shell is that it keeps your "history" in cells. If I switch to a different task and then go back, it's harder to get to the same state.

@pabloarosado
Copy link
Contributor

Do you use the interactive shell in VS Code? If so, where does it fall short compared to the playground notebook?

Sorry to chime in! I also use playground.ipynb notebooks a lot (although for different use cases I guess). The main advantage over interactive shell is that it keeps your "history" in cells. If I switch to a different task and then go back, it's harder to get to the same state.

Sorry to chime in as well! :-D
We could actually also achieve the same thing with playground.py files, ignored in .gitignore. The only difference would be that you would not be able to see the output cells. But, unless you are working on very big datasets that take a long time to run, you can simply re-run any playground code to produce the same outputs again.
That said, I have nothing against keeping the option of playground.ipynb, if others find it useful. But I would prefer to keep all ipynb files out of the repos, and stick to .py for simplicity.

@Marigold
Copy link
Collaborator

Marigold commented Dec 3, 2024

Yep, that's a good idea. I'll give it a try next time.

But I would prefer to keep all ipynb files out of the repos, and stick to .py for simplicity.

All playground.* files are in .gitignore anyway, so we're not going to commit anything.

@pabloarosado
Copy link
Contributor

  • Improvement on Producer analytics: Ed pointed out that it would be good to let the user see analytics not only at the data producer level, but also at the data producer & dataset level.
    One option would be to have a toggle to show either data producer or data producer & data product in the main table.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
wizard Issues related to wizard tool
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants