github import cannot update github-imported tasks that no longer come back in queries #155

Open
Dieterbe opened this issue Sep 26, 2021 · 0 comments

Dieterbe commented Sep 26, 2021

The GitHub import is good at importing issues that come back from the GitHub GraphQL API, finding any pre-existing versions of those issues in the dstask database, and updating them. It can find them because the task UUID is derived deterministically from immutable properties: the repo name, the owner and the ticket ID.
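
To illustrate what "deterministically derived" means here, a minimal sketch follows. It is not dstask's actual derivation: the function name `issueUUID`, the `owner/repo#number` name format and the choice of a version-5 (SHA-1, name-based) UUID under the RFC 4122 URL namespace are all assumptions made for the example.

```go
package main

import (
	"crypto/sha1"
	"fmt"
)

// namespace is the RFC 4122 URL namespace UUID. Using it here is an
// assumption for this sketch; any fixed 16-byte value would work.
var namespace = [16]byte{
	0x6b, 0xa7, 0xb8, 0x11, 0x9d, 0xad, 0x11, 0xd1,
	0x80, 0xb4, 0x00, 0xc0, 0x4f, 0xd4, 0x30, 0xc8,
}

// issueUUID (hypothetical helper) builds a version-5 UUID from the
// immutable properties of an issue: owner, repo name and ticket ID.
// The same inputs always produce the same UUID.
func issueUUID(owner, repo string, number int) string {
	name := fmt.Sprintf("%s/%s#%d", owner, repo, number)

	h := sha1.New()
	h.Write(namespace[:])
	h.Write([]byte(name))
	sum := h.Sum(nil)

	var u [16]byte
	copy(u[:], sum[:16])
	u[6] = (u[6] & 0x0f) | 0x50 // set version 5
	u[8] = (u[8] & 0x3f) | 0x80 // set RFC 4122 variant

	return fmt.Sprintf("%x-%x-%x-%x-%x", u[0:4], u[4:6], u[6:8], u[8:10], u[10:16])
}
```

Because the UUID depends only on those three values, a later import can recompute it for every issue in the API response and look up the matching task file directly, without storing any mapping.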

What it is not good at is updating tasks that were once generated by a GitHub import but no longer come back in the query results. Such tasks are simply left untouched, just like all other dstask tasks.

This can happen when:

  • you change your import configuration, e.g. you remove a repository or change the assignee, label or milestone matchers.
  • a ticket has changed so that it no longer matches the query (e.g. your import config looks for issues with a certain assignee, but the assignee was removed or changed on GitHub)

Both cases leave you with local tasks that you no longer care about. There may even be updates for them that never arrive (e.g. the issue has been closed on GitHub while the task is still open/pending in dstask, or there's more information in the summary).

My suggestion here is that all such tasks should simply be deleted from the dstask database.
This means that dstask needs to find all GitHub-imported tasks in its repo, so that it can delete those whose UUID we didn't see in our GitHub API responses. There are two problems here:

  1. For active, paused, etc. tasks we can easily scan all files in those directories. But resolved tasks live in a directory that can grow arbitrarily large. A user generally cares less about the accuracy of tasks that are resolved and "out of their scope" (per the import config), but it's still iffy to leave them out of sync (especially since they might be reopened on the GitHub side).
  2. Once tasks have been imported, we have no way to distinguish GitHub-imported tasks from any others, since users can use arbitrary templates to populate their content.
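
Leaving those two problems aside, the deletion step itself is just a set difference: every known GitHub-imported UUID that was not seen in the API responses is stale. A minimal sketch, where `staleUUIDs` and its parameters are hypothetical names and how `known` gets populated is exactly the open question above:

```go
package main

import "sort"

// staleUUIDs (hypothetical helper) returns every UUID in known that does
// not appear in seen, sorted for stable output.
//   known: UUIDs of all tasks that were created by a GitHub import
//   seen:  UUIDs derived from the issues in the current API responses
func staleUUIDs(known, seen []string) []string {
	seenSet := make(map[string]struct{}, len(seen))
	for _, id := range seen {
		seenSet[id] = struct{}{}
	}

	var stale []string
	for _, id := range known {
		if _, ok := seenSet[id]; !ok {
			stale = append(stale, id)
		}
	}
	sort.Strings(stale)
	return stale
}
```

The tasks behind the returned UUIDs are the ones that would be deleted from the database.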

We can solve this in a few ways:

  1. have the user declare "search patterns" based on summary, tags, etc. to let us find the right files. They have to make sure the patterns match their defined templates, which is clunky, not definitive (they could still manually create tasks that look like GitHub tasks), and still requires scanning.
  2. add a special property to GitHub-derived tasks, or give their UUIDs a special prefix. This still requires some scanning.
  3. create an additional directory which holds symlinks to all GitHub-imported tasks. This would be a bit clunky too: either we update the symlinks as the tasks change state and move directories (code paths that I would like to keep unaware of any GitHub import stuff), or, when a symlink doesn't resolve, we look for that filename in the other directories.
  4. keep a file which holds the UUIDs/filenames of all GitHub-imported tasks (with the same tradeoffs as option 3)

My preferred option is 4 because it's simple, lightweight and doesn't touch any code that is unrelated to the GitHub import.

I think the file should go in the database directory and we should commit (and thus also sync) any updates.
Anything less (e.g. not committing/syncing, or keeping the file in $XDG_CACHE_HOME) means the file may be out of date and/or deleted and need reprovisioning, which brings us back to "how can we recognize the relevant tasks?"

I should also point out that the GitHub import has an option, get_closed, which lets you pull in all closed issues on GitHub that match your query. This can make the UUID file grow large.

But:

  • if you have get_closed = true, any problems will be around the slowness of the sync process/API calls, not the size of the UUID file
  • changing the setting to get_closed = false will wipe all those closed GitHub tickets and clean out the UUID file

... so I'm not really worried about this problem, although it's something we can revisit in the future.
