Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ignore files in _scans.tsv that correspond to entries in .bidsignore (#1366) #1914

Open
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

appukuttan-shailesh
Copy link

@appukuttan-shailesh appukuttan-shailesh commented Mar 9, 2024

PR for handling bids-standard/bids-validator#48

Makes use of session storage to save the contents of .bidsignore when it is read originally. This is retrieved later while evaluating _scans.tsv files for ignoring the matching files. All tests with npm run test are passing (except two which were failing even on the unchanged master branch), and two new tests have been added.

For testing purposes, a mock session storage was required to be implemented as Jest is currently configured with "testEnvironment": "node". Could possibly be avoided if set to "testEnvironment": "jsdom". Another option is to employ jest-localstorage-mock package. But the current mock implementation seemed the simplest, and so went with it.

Two tests have been added:

  • should ignore files in scans.tsv that correspond to entries in .bidsignore
    This checks that files to be ignored, based on .bidsignore, are being ignored when evaluating the file list in _scans.tsv

  • should not allow missing files listed in scans.tsv and not accounted for by .bidsignore
    This catches files that are listed in _scans.tsv but are actually missing, and not accounted for by .bidsignore

First PR here; apologies if I have missed anything.
@effigies: Could you take a look?

@marcelzwiers
Copy link

marcelzwiers commented Mar 19, 2024

In bidscoin, I have added extra lines of code to remove those bidsignore files from the scans.tsv file because the bids-validator was complaining about them. Since the specs don't mention anything about this (at least not that I'm aware of), I simply took the validator output as the definition of the standard. So does this mean that I should not have added those extra lines of code to bidscoin and should just ignore the validator here?

@@ -330,6 +330,8 @@ async function getBIDSIgnore(dir) {
if (bidsIgnoreFileObj) {
const content = await readFile(bidsIgnoreFileObj)
ig.add(content)
// Store the .bidsignore content in session storage
sessionStorage.setItem('bidsignoreContent', JSON.stringify(content))
Copy link
Collaborator

@effigies effigies Mar 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is causing an error.

Unhandled rejection (
  reason: ReferenceError: sessionStorage is not defined
    at getBIDSIgnore (/home/runner/work/bids-validator/bids-validator/bids-validator/bids-validator/utils/files/readDir.js:334:5)
    at Object.readDir (/home/runner/work/bids-validator/bids-validator/bids-validator/bids-validator/utils/files/readDir.js:23:14)
    at /home/runner/work/bids-validator/bids-validator/bids-validator/bids-validator/validators/bids/start.js:40:21
).

@effigies
Copy link
Collaborator

@marcelzwiers I would call this undefined behavior. The spec didn't say what to do, and bidsignore is not a spec-level concept, so the interaction turned out to be a problem.

As a practical matter, I think following the validator when it exceeds the specification is a good idea unless your goal is to force the issue and aim your users to complain to the validator issue tracker. I wouldn't consider that a bug, but you could stop removing the lines, once this or a similar solution is released.

Copy link

codecov bot commented Mar 21, 2024

Codecov Report

Attention: Patch coverage is 91.30435% with 2 lines in your changes are missing coverage. Please review.

Project coverage is 85.52%. Comparing base (d626096) to head (6dacdea).
Report is 12 commits behind head on master.

Files Patch % Lines
bids-validator/utils/getSessionStorage.js 87.50% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    bids-standard/legacy-validator#1914      +/-   ##
==========================================
+ Coverage   83.57%   85.52%   +1.94%     
==========================================
  Files          92      132      +40     
  Lines        3890     6285    +2395     
  Branches     1271     1549     +278     
==========================================
+ Hits         3251     5375    +2124     
- Misses        541      806     +265     
- Partials       98      104       +6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@effigies
Copy link
Collaborator

 	1: [ERR] Internal error. SOME VALIDATION STEPS MAY NOT HAVE OCCURRED (code: 0 - INTERNAL ERROR)
		Evidence: ReferenceError: sessionStorage is not defined
    at TSV (/home/runner/work/bids-validator/bids-validator/bids-validator/validators/tsv/tsv.js:617:23)
    at /home/runner/work/bids-validator/bids-validator/bids-validator/validators/tsv/validate.js:33:9

I'm not sure if global storage is the right tool for this job. @rwblair @nellh Your insight would be appreciated here. I don't think I'm qualified to review this PR.

@appukuttan-shailesh
Copy link
Author

I did have my doubts about using sessionStorage for solving this problem, and it was infact a fallback approach to get the web app working. I believe the JS web app and the Python package might be using partly the same codebase (?), and if so this will not suffice. My first approach was to re-open the .bidsignore file when analyzing the _scans.tsv file, but this was problematic as .bidsignore is no longer accessible at that stage of the code (i.e. inside tsv.js), and couldn't force read owing to browser security related issues.

Looking forward to some feedback (@rwblair @nellh ) and suggestions on what would be a useful way to tackle this. Happy to try out a different approach.

@effigies
Copy link
Collaborator

Please ignore the Python package. It is only useful for packaging regular expressions for validating filenames. It will be replaced entirely in the future.

@appukuttan-shailesh
Copy link
Author

appukuttan-shailesh commented Mar 22, 2024

Just trying to debug the remaining errors with the current approach while waiting for further feedback.

Locally, I am testing the following (based on instructions here) before pushing the changes:

  • npm run test
  • npm run lint

But the errors that have been reported above (through the workflows) are never caught/tested via the above. Are there additional instructions for running the entire testing pipeline locally?

I have manually ran the following to ensure previous workflow errors do not arise again:

  • bids-validator/bin/bids-validator bids-validator/tests/data/valid_headers/ --ignoreNiftiHeaders
  • ../../../bin/bids-validator 7t_trt -c /home/shailesh/gits/bids-validator/bids-validator/tests/data/bids-examples/bidsconfig.json --ignoreNiftiHeaders

5 workflows await approval from maintainers to check if all is clear this time around.

Comment on lines +631 to 636
if (ig && ig.ignores(path.relative('/', scanRelativePath))) {
continue
}

// check if scan matches full dataset path list
if (!pathList.includes(scanFullPath)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I get what you're doing here. Thanks for this!

The issue here is that we want to error if a listed file doesn't exist, but we're currently erroring if the file doesn't exist or exists but is ignored. If you're going to follow the strategy you're using here, instead of storing the ignore patterns, you should be storing the ignored files. We can then say

 if (!(pathList.includes(scanFullPath) || ignoreList.includes(scanFullPath))) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback @effigies

That should also work, just that we would need to store much more data in the session storage (storing ignore patterns vs complete list of ignored files across all subjects and sessions). I'm also curious why the current approach of storing patterns would not suffice; maybe I didn't follow your comment entirely.

As per #1366 , the intention was to allow .bidsignore files to be listed in _scans.tsv.
The plan then was to:

doing a match of scans.tsv files against ignore entries

The PR, in its current form, was intending to just add that one extra check, i.e. for each file listed in _scans.tsv, check if the file is to be ignored (based on .bidsignore), if yes then ignore that file entry with no error. This is being done irrespective of whether the file actually exists in the directory or not.

I added two tests for checking the following cases:

  • file is listed in _scans.tsv but isn't part of output of readDir() -> ERROR 129
  • file is listed in _scans.tsv; isn't part of output of readDir(), but is to be ignored -> no error

and both these tests are passing.

Note that there is a difference between the file actually existing (on disk) vs being present in the output of readDir() as the latter function already filters out the files to be ignored, before returning the list of required files (i.e. existing and non-ignored) to other steps in the workflow (e.g. TSV() in tsv.js)

@appukuttan-shailesh
Copy link
Author

appukuttan-shailesh commented Mar 25, 2024

In addition to the discussions above about the nature of the changes, I think the PR is passing most of the checks. 3 of them that it is failing, is found to have failed with similar status in a recent merged PR as well, so not sure if it's something specific to this PR that I should look into. I have made a small change to address the 4th failed test which dealt with code coverage; the current change should marginally improve that.

@appukuttan-shailesh
Copy link
Author

appukuttan-shailesh commented Apr 8, 2024

Could we run the remaining 5 workflows to see if they pass?

5 workflows awaiting approval

Also, happy to receive any feedback regarding the changes here.

@appukuttan-shailesh
Copy link
Author

As mentioned above, I believe it is now passing all the checks that a previous merged PR conformed to. So not sure if the 3 that failed, and 1 skipped, are something that I should look into.

@appukuttan-shailesh
Copy link
Author

I was wondering if we had any feedback on this PR.

@effigies
Copy link
Collaborator

effigies commented May 7, 2024

I still think we want the following cases:

File exists File is valid File is ignored Result
T T * Pass
T F T Pass
T F F Fail
F * * Fail

It seems like a regression to me to stop erroring when a user mentions a file that does not exist. The goal is to

I spent a little time trying to figure out how to pass ignored files around. I got this far:

❯ git --no-pager diff
diff --git a/bids-validator/validators/bids/fullTest.js b/bids-validator/validators/bids/fullTest.js
index edf0b405..efcd6204 100644
--- a/bids-validator/validators/bids/fullTest.js
+++ b/bids-validator/validators/bids/fullTest.js
@@ -76,8 +76,10 @@ const fullTest = (fileList, options, annexed, dir, schema, callback) => {
   }
 
   // remove ignored files from list:
+  const ignoredFiles = {}
   Object.keys(fileList).forEach(function (key) {
     if (fileList[key].ignore) {
+      ignoredFiles[key] = fileList[key]
       delete fileList[key]
     }
   })
@@ -123,6 +125,7 @@ const fullTest = (fileList, options, annexed, dir, schema, callback) => {
         participants,
         phenotypeParticipants,
         stimuli,
+        ignoredFiles,
       )
     })
     .then(({ tsvIssues, participantsTsvContent }) => {
diff --git a/bids-validator/validators/tsv/tsv.js b/bids-validator/validators/tsv/tsv.js
index 0b76e6cd..153b62b7 100644
--- a/bids-validator/validators/tsv/tsv.js
+++ b/bids-validator/validators/tsv/tsv.js
@@ -36,7 +36,7 @@ const filenameEvidence = (filename) => `Filename: ${filename}`
  * specification.
  */
 
-const TSV = (file, contents, fileList, callback) => {
+const TSV = (file, contents, fileList, ignoredFiles, callback) => {
   const issues = []
   const stimPaths = []
   if (contents.includes('\r') && !contents.includes('\n')) {
@@ -628,7 +628,7 @@ const TSV = (file, contents, fileList, callback) => {
         const scanFullPath = scanDirPath + '/' + scanRelativePath
 
         // check if file should be ignored based on .bidsignore content
-        if (ig && ig.ignores(path.relative('/', scanRelativePath))) {
+        if (ignoredFiles.includes(scanFullPath)) {
           continue
         }
 
diff --git a/bids-validator/validators/tsv/validate.js b/bids-validator/validators/tsv/validate.js
index dba18e7a..dfcd725c 100644
--- a/bids-validator/validators/tsv/validate.js
+++ b/bids-validator/validators/tsv/validate.js
@@ -9,6 +9,7 @@ const validate = (
   participants,
   phenotypeParticipants,
   stimuli,
+  ignoredFiles,
   annexed,
   dir,
 ) => {
@@ -34,6 +35,7 @@ const validate = (
           file,
           contents,
           fileList,
+          ignoredFiles,
           function (tsvIssues, participantList, stimFiles) {
             if (participantList) {
               if (file.name.endsWith('participants.tsv')) {

Something's not working right, but I don't understand the validator enough to know what. Perhaps this will help you, though?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants