Query doesn't report results reliably #1
This may have to do with searching for a filename only. Work in recent weeks (including in https://github.com/sdruskat/cff-in-the-wild) has shown that the GitHub Search API does not reliably produce results.
One solution could be to combine the filename search with a string we expect in every (real) CFF file, e.g., …
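A minimal sketch of such a combined query, assuming GitHub's code search endpoint (`https://api.github.com/search/code`) and its `filename:` qualifier. The helper `build_cff_query` is hypothetical, and the string `cff-version` is only our assumption (it is a required key in CFF files), not necessarily the string meant above.

```python
from urllib.parse import urlencode

# Assumption: GitHub's REST code search endpoint.
SEARCH_URL = "https://api.github.com/search/code"


def build_cff_query(expected_string: str, filename: str = "CITATION.cff") -> str:
    """Combine a filename qualifier with a quoted string expected in every real CFF file."""
    q = f'filename:{filename} "{expected_string}"'
    # urlencode percent-encodes ':' and '"' and turns spaces into '+'.
    return f"{SEARCH_URL}?{urlencode({'q': q})}"


# Hypothetical usage: "cff-version" is an assumed marker string.
url = build_cff_query("cff-version")
print(url)
```

Requiring the content string should filter out files that merely happen to be named `CITATION.cff`, though it does not by itself fix any instability on the API side.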
A current search for … This means that, by using the metadata encoded in the corpus, we can now again construct a more reliable history of CFF files on GitHub! /cc @jspaaks
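A minimal sketch of reconstructing such a history, assuming the corpus metadata provides the date each CFF file was first added (the input list and `cumulative_history` are hypothetical, not the corpus schema). It counts files ever added per day and accumulates the totals.

```python
from collections import Counter
from datetime import date
from itertools import accumulate


def cumulative_history(created: list[date]) -> list[tuple[date, int]]:
    """Return (date, total CFF files added so far) for each date a file was added."""
    per_day = Counter(created)
    days = sorted(per_day)
    totals = accumulate(per_day[d] for d in days)
    return list(zip(days, totals))


# Hypothetical creation dates, standing in for values read from corpus metadata.
dates = [date(2021, 7, 1), date(2021, 7, 1), date(2021, 7, 3), date(2021, 8, 2)]
for day, total in cumulative_history(dates):
    print(day.isoformat(), total)
```

Because the corpus does not record deletions, this curve only ever grows; it counts files added, not files still present.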
Currently, the corpus metadata excludes deleted files, although this information could be retrieved during harvesting by inspecting the commits that change a CFF file more carefully and checking whether the file was removed.
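A sketch of that check, assuming each commit is a dict shaped like a GitHub REST API commit response, whose `files` entries carry `filename`, `status` and, for renames, `previous_filename`. The function `cff_removed` and the sample commits are hypothetical.

```python
from posixpath import basename


def cff_removed(commit: dict, cff_name: str = "CITATION.cff") -> bool:
    """Return True if this commit makes a CFF file disappear from the repository."""
    for f in commit.get("files", []):
        status = f.get("status")
        # Plain deletion of a file named CITATION.cff (in any directory).
        if status == "removed" and basename(f.get("filename", "")) == cff_name:
            return True
        # Renaming CITATION.cff to something else also removes it.
        if (
            status == "renamed"
            and basename(f.get("previous_filename", "")) == cff_name
            and basename(f.get("filename", "")) != cff_name
        ):
            return True
    return False


# Hypothetical commits touching a CFF file.
commits = [
    {"sha": "a1", "files": [{"filename": "CITATION.cff", "status": "added"}]},
    {"sha": "b2", "files": [{"filename": "CITATION.cff", "status": "removed"}]},
]
print([c["sha"] for c in commits if cff_removed(c)])
```

Recording the date of such a commit alongside the creation date would let the history subtract deletions instead of only accumulating additions.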
Unfortunately, the GitHub query doesn't retrieve reliable numbers. As suggested by @arfon, this may be due to ongoing work in the GitHub backend at query time.
One workaround would be to clean the results, e.g., by retroactively removing unexpected spikes: delete rows that deviate from the general trend (1 measurement before, 2 after the spike).
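A minimal sketch of that cleaning step, assuming the "trend" is the mean of the 1 row before and 2 rows after each measurement, and that a row counts as a spike when it deviates from that mean by more than a relative tolerance. The 50% tolerance and `remove_spikes` are assumptions, not values from this issue.

```python
from statistics import mean


def remove_spikes(rows, before=1, after=2, rel_tol=0.5):
    """Drop rows whose count deviates from the mean of neighbouring rows by more than rel_tol.

    rows: list of (label, count) tuples in chronological order.
    Rows without a full window of neighbours (at either end) are kept as-is.
    """
    kept = []
    for i, (label, count) in enumerate(rows):
        if i < before or i + after >= len(rows):
            kept.append((label, count))
            continue
        neighbours = [c for _, c in rows[i - before:i]] + [c for _, c in rows[i + 1:i + 1 + after]]
        trend = mean(neighbours)
        if trend and abs(count - trend) / trend > rel_tol:
            continue  # treat as a spike and drop the row
        kept.append((label, count))
    return kept


# Hypothetical daily counts with one obvious dip on d3.
rows = [("d1", 100), ("d2", 102), ("d3", 40), ("d4", 104), ("d5", 105), ("d6", 107)]
print([label for label, _ in remove_spikes(rows)])
```

The window uses the original rows, so two adjacent spikes can pull each other's trend off; a second pass or a median-based trend would be more robust if that happens in practice.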
Another option would be to run the script several times a day and average the results.
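A sketch of the averaging option, assuming each run appends a `(day, count)` pair; `daily_average` and the sample numbers are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def daily_average(runs):
    """runs: iterable of (day, count) from several runs per day; returns {day: mean count}."""
    by_day = defaultdict(list)
    for day, count in runs:
        by_day[day].append(count)
    # Sort by day so the resulting dict iterates chronologically.
    return {day: mean(counts) for day, counts in sorted(by_day.items())}


# Hypothetical results from three runs on one day and one run on the next.
runs = [("2021-08-01", 1200), ("2021-08-01", 1210), ("2021-08-01", 1190), ("2021-08-02", 1230)]
print(daily_average(runs))
```

A single bad run still shifts the mean; swapping in `statistics.median` would be a one-word change if that turns out to be a problem.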