Ignore files in _scans.tsv that correspond to entries in .bidsignore (#1366) #1914

appukuttan-shailesh · 2024-03-09T16:53:56Z

PR for handling bids-standard/bids-validator#48

Makes use of session storage to save the contents of .bidsignore when it is read originally. This is retrieved later while evaluating _scans.tsv files for ignoring the matching files. All tests with npm run test are passing (except two which were failing even on the unchanged master branch), and two new tests have been added.

For testing, I had to implement a mock session storage because Jest is currently configured with "testEnvironment": "node". This could possibly be avoided by setting "testEnvironment": "jsdom"; another option is the jest-localstorage-mock package. The current mock implementation seemed the simplest, so I went with it.
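
A minimal sketch of what such a mock might look like, assuming nothing beyond the standard Web Storage method names (`getItem`, `setItem`, `removeItem`, `clear`). The `createMockSessionStorage` helper is hypothetical and is not the code in this PR:

```javascript
// Hypothetical in-memory stand-in for the browser's sessionStorage,
// intended for tests that run with "testEnvironment": "node".
function createMockSessionStorage() {
  const store = new Map()
  return {
    // Web Storage returns null (not undefined) for missing keys.
    getItem: (key) => (store.has(key) ? store.get(key) : null),
    // Web Storage stores every value as a string.
    setItem: (key, value) => {
      store.set(key, String(value))
    },
    removeItem: (key) => {
      store.delete(key)
    },
    clear: () => {
      store.clear()
    },
  }
}

// In a test setup file, one could expose it globally:
// global.sessionStorage = createMockSessionStorage()
```

A mock like this only helps the test suite; it does not make `sessionStorage` available when the validator runs from the command line.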

Two tests have been added:

- should ignore files in scans.tsv that correspond to entries in .bidsignore
  This checks that files to be ignored, based on .bidsignore, are ignored when evaluating the file list in _scans.tsv.
- should not allow missing files listed in scans.tsv and not accounted for by .bidsignore
  This catches files that are listed in _scans.tsv but are actually missing and not accounted for by .bidsignore.

First PR here; apologies if I have missed anything.
@effigies: Could you take a look?

marcelzwiers · 2024-03-19T15:45:02Z

In bidscoin, I have added extra lines of code to remove those bidsignore files from the scans.tsv file because the bids-validator was complaining about them. Since the specs don't mention anything about this (at least not that I'm aware of), I simply took the validator output as the definition of the standard. So does this mean that I should not have added those extra lines of code to bidscoin and should just ignore the validator here?

effigies · 2024-03-19T15:55:36Z

bids-validator/utils/files/readDir.js

@@ -330,6 +330,8 @@ async function getBIDSIgnore(dir) {
  if (bidsIgnoreFileObj) {
    const content = await readFile(bidsIgnoreFileObj)
    ig.add(content)
+    // Store the .bidsignore content in session storage
+    sessionStorage.setItem('bidsignoreContent', JSON.stringify(content))


This is causing an error.

Unhandled rejection ( reason: ReferenceError: sessionStorage is not defined at getBIDSIgnore (/home/runner/work/bids-validator/bids-validator/bids-validator/bids-validator/utils/files/readDir.js:334:5) at Object.readDir (/home/runner/work/bids-validator/bids-validator/bids-validator/bids-validator/utils/files/readDir.js:23:14) at /home/runner/work/bids-validator/bids-validator/bids-validator/bids-validator/validators/bids/start.js:40:21 ).
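
One way to avoid this kind of error, sketched under the assumption that the content only needs to be cached when a browser-style `sessionStorage` is present. The helper name `saveBidsignoreContent` and its `env` parameter are hypothetical:

```javascript
// Hypothetical helper: write to sessionStorage only when the environment
// provides one. Looking the property up on an object (globalThis by default)
// yields undefined when it is absent, instead of throwing ReferenceError
// the way a bare `sessionStorage` reference does.
function saveBidsignoreContent(content, env = globalThis) {
  const storage = env.sessionStorage
  if (storage === undefined || storage === null) {
    // Nothing stored; the caller needs another way to pass the data along.
    return false
  }
  storage.setItem('bidsignoreContent', JSON.stringify(content))
  return true
}
```

The return value lets the caller know whether the cached copy can be relied on later.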

effigies · 2024-03-19T16:01:57Z

@marcelzwiers I would call this undefined behavior. The spec didn't say what to do, and bidsignore is not a spec-level concept, so the interaction turned out to be a problem.

As a practical matter, I think following the validator when it exceeds the specification is a good idea, unless your goal is to force the issue and push your users to complain on the validator issue tracker. I wouldn't consider that a bug, but you could stop removing the lines once this or a similar solution is released.

codecov · 2024-03-21T15:54:04Z

Codecov Report

Attention: Patch coverage is 91.30435% with 2 lines in your changes are missing coverage. Please review.

Project coverage is 85.52%. Comparing base (d626096) to head (6dacdea).
Report is 12 commits behind head on master.

Files	Patch %	Lines
bids-validator/utils/getSessionStorage.js	87.50%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    bids-standard/legacy-validator#1914      +/-   ##
==========================================
+ Coverage   83.57%   85.52%   +1.94%     
==========================================
  Files          92      132      +40     
  Lines        3890     6285    +2395     
  Branches     1271     1549     +278     
==========================================
+ Hits         3251     5375    +2124     
- Misses        541      806     +265     
- Partials       98      104       +6

effigies · 2024-03-21T15:58:50Z

 	1: [ERR] Internal error. SOME VALIDATION STEPS MAY NOT HAVE OCCURRED (code: 0 - INTERNAL ERROR)
		Evidence: ReferenceError: sessionStorage is not defined
    at TSV (/home/runner/work/bids-validator/bids-validator/bids-validator/validators/tsv/tsv.js:617:23)
    at /home/runner/work/bids-validator/bids-validator/bids-validator/validators/tsv/validate.js:33:9

I'm not sure if global storage is the right tool for this job. @rwblair @nellh Your insight would be appreciated here. I don't think I'm qualified to review this PR.

appukuttan-shailesh · 2024-03-21T16:28:57Z

I did have my doubts about using sessionStorage to solve this problem; it was in fact a fallback approach to get the web app working. I believe the JS web app and the Python package might partly share a codebase (?), and if so this will not suffice. My first approach was to re-open the .bidsignore file when analyzing the _scans.tsv file, but .bidsignore is no longer accessible at that stage of the code (i.e. inside tsv.js), and I couldn't force a read because of browser security restrictions.

Looking forward to some feedback (@rwblair @nellh ) and suggestions on what would be a useful way to tackle this. Happy to try out a different approach.

effigies · 2024-03-21T16:35:11Z

Please ignore the Python package. It is only useful for packaging regular expressions for validating filenames. It will be replaced entirely in the future.

appukuttan-shailesh · 2024-03-22T12:15:45Z

Just trying to debug the remaining errors with the current approach while waiting for further feedback.

Locally, I am testing the following (based on instructions here) before pushing the changes:

npm run test
npm run lint

But the errors that have been reported above (through the workflows) are never caught/tested via the above. Are there additional instructions for running the entire testing pipeline locally?

I have manually run the following to ensure previous workflow errors do not arise again:

bids-validator/bin/bids-validator bids-validator/tests/data/valid_headers/ --ignoreNiftiHeaders
../../../bin/bids-validator 7t_trt -c /home/shailesh/gits/bids-validator/bids-validator/tests/data/bids-examples/bidsconfig.json --ignoreNiftiHeaders

5 workflows await approval from maintainers to check if all is clear this time around.

effigies · 2024-03-24T11:44:31Z

bids-validator/validators/tsv/tsv.js

+        if (ig && ig.ignores(path.relative('/', scanRelativePath))) {
+          continue
+        }
+
        // check if scan matches full dataset path list
        if (!pathList.includes(scanFullPath)) {


Okay, I get what you're doing here. Thanks for this!

The issue here is that we want to error if a listed file doesn't exist, but we're currently erroring if the file doesn't exist or exists but is ignored. If you're going to follow the strategy you're using here, instead of storing the ignore patterns, you should be storing the ignored files. We can then say

if (!(pathList.includes(scanFullPath) || ignoreList.includes(scanFullPath))) {
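
To illustrate the suggested condition, here is a small sketch. The function name `findMissingScans` is made up for illustration; only the `pathList`/`ignoreList` condition comes from the comment above:

```javascript
// Hypothetical helper applying the suggested condition to each scans.tsv entry.
// pathList: files that exist and were kept by the validator.
// ignoreList: files that exist but were dropped because of .bidsignore.
function findMissingScans(scanPaths, pathList, ignoreList) {
  return scanPaths.filter(
    (scanFullPath) =>
      !(pathList.includes(scanFullPath) || ignoreList.includes(scanFullPath)),
  )
}
```

With this check, an entry that matches an ignore pattern but is absent from both lists is still reported as missing.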

appukuttan-shailesh · 2024-03-25

Thanks for the feedback, @effigies.

That should also work, though we would need to store much more data in session storage (the complete list of ignored files across all subjects and sessions, rather than just the ignore patterns). I'm also curious why the current approach of storing patterns would not suffice; maybe I didn't follow your comment entirely.

As per #1366 , the intention was to allow .bidsignore files to be listed in _scans.tsv.
The plan then was to:

doing a match of scans.tsv files against ignore entries

The PR, in its current form, was intending to just add that one extra check, i.e. for each file listed in _scans.tsv, check if the file is to be ignored (based on .bidsignore), if yes then ignore that file entry with no error. This is being done irrespective of whether the file actually exists in the directory or not.

I added two tests for checking the following cases:

file is listed in _scans.tsv but isn't part of output of readDir() -> ERROR 129

file is listed in _scans.tsv; isn't part of output of readDir(), but is to be ignored -> no error

and both these tests are passing.

Note that there is a difference between the file actually existing (on disk) vs being present in the output of readDir() as the latter function already filters out the files to be ignored, before returning the list of required files (i.e. existing and non-ignored) to other steps in the workflow (e.g. TSV() in tsv.js)
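
The distinction can be sketched as follows. This is a simplified model, not the validator's `readDir()`: `isIgnored` is a hypothetical predicate standing in for the .bidsignore matcher, and both function names are invented for illustration.

```javascript
// Simplified model: split the files found on disk into the list the
// validator keeps and the list filtered out by .bidsignore.
function partitionFiles(filesOnDisk, isIgnored) {
  const kept = []
  const ignored = []
  for (const file of filesOnDisk) {
    if (isIgnored(file)) {
      ignored.push(file)
    } else {
      kept.push(file)
    }
  }
  return { kept, ignored }
}

// Pattern-only check: an entry is accepted when it matches an ignore rule,
// whether or not the file exists on disk, or when it is in the kept list.
function patternCheckPasses(scanPath, kept, isIgnored) {
  return isIgnored(scanPath) || kept.includes(scanPath)
}
```

Under this model, a scans.tsv entry that matches an ignore rule passes the pattern-only check even when no such file is on disk, because the check never consults the `ignored` list.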

appukuttan-shailesh · 2024-03-25T10:35:56Z

In addition to the discussion above about the nature of the changes, I think the PR is passing most of the checks. The 3 checks it is failing also failed with a similar status in a recently merged PR, so I'm not sure whether they are specific to this PR and something I should look into. I have made a small change to address the 4th failed check, which dealt with code coverage; the current change should marginally improve that.

appukuttan-shailesh · 2024-04-08T07:45:49Z

Could we run the remaining 5 workflows to see if they pass?

5 workflows awaiting approval

Also, happy to receive any feedback regarding the changes here.

appukuttan-shailesh · 2024-04-08T08:37:26Z

As mentioned above, I believe it is now passing all the checks that a previous merged PR conformed to. So not sure if the 3 that failed, and 1 skipped, are something that I should look into.

appukuttan-shailesh · 2024-05-07T13:06:59Z

I was wondering if we had any feedback on this PR.

effigies · 2024-05-07T14:35:53Z

I still think we want the following cases:

File exists	File is valid	File is ignored	Result
T	T	*	Pass
T	F	T	Pass
T	F	F	Fail
F	*	*	Fail
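
The table can be read as a single predicate. A minimal sketch, with the function name `scanEntryPasses` and the property names chosen here for illustration:

```javascript
// Encodes the table above: a listed file passes when it exists and is either
// valid or ignored; a listed file that does not exist always fails.
function scanEntryPasses({ exists, valid, ignored }) {
  if (!exists) {
    return false
  }
  return valid || ignored
}
```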

It seems like a regression to me to stop erroring when a user mentions a file that does not exist. The goal is to

I spent a little time trying to figure out how to pass ignored files around. I got this far:

❯ git --no-pager diff
diff --git a/bids-validator/validators/bids/fullTest.js b/bids-validator/validators/bids/fullTest.js
index edf0b405..efcd6204 100644
--- a/bids-validator/validators/bids/fullTest.js
+++ b/bids-validator/validators/bids/fullTest.js
@@ -76,8 +76,10 @@ const fullTest = (fileList, options, annexed, dir, schema, callback) => {
   }
 
   // remove ignored files from list:
+  const ignoredFiles = {}
   Object.keys(fileList).forEach(function (key) {
     if (fileList[key].ignore) {
+      ignoredFiles[key] = fileList[key]
       delete fileList[key]
     }
   })
@@ -123,6 +125,7 @@ const fullTest = (fileList, options, annexed, dir, schema, callback) => {
         participants,
         phenotypeParticipants,
         stimuli,
+        ignoredFiles,
       )
     })
     .then(({ tsvIssues, participantsTsvContent }) => {
diff --git a/bids-validator/validators/tsv/tsv.js b/bids-validator/validators/tsv/tsv.js
index 0b76e6cd..153b62b7 100644
--- a/bids-validator/validators/tsv/tsv.js
+++ b/bids-validator/validators/tsv/tsv.js
@@ -36,7 +36,7 @@ const filenameEvidence = (filename) => `Filename: ${filename}`
  * specification.
  */
 
-const TSV = (file, contents, fileList, callback) => {
+const TSV = (file, contents, fileList, ignoredFiles, callback) => {
   const issues = []
   const stimPaths = []
   if (contents.includes('\r') && !contents.includes('\n')) {
@@ -628,7 +628,7 @@ const TSV = (file, contents, fileList, callback) => {
         const scanFullPath = scanDirPath + '/' + scanRelativePath
 
         // check if file should be ignored based on .bidsignore content
-        if (ig && ig.ignores(path.relative('/', scanRelativePath))) {
+        if (ignoredFiles.includes(scanFullPath)) {
           continue
         }
 
diff --git a/bids-validator/validators/tsv/validate.js b/bids-validator/validators/tsv/validate.js
index dba18e7a..dfcd725c 100644
--- a/bids-validator/validators/tsv/validate.js
+++ b/bids-validator/validators/tsv/validate.js
@@ -9,6 +9,7 @@ const validate = (
   participants,
   phenotypeParticipants,
   stimuli,
+  ignoredFiles,
   annexed,
   dir,
 ) => {
@@ -34,6 +35,7 @@ const validate = (
           file,
           contents,
           fileList,
+          ignoredFiles,
           function (tsvIssues, participantList, stimFiles) {
             if (participantList) {
               if (file.name.endsWith('participants.tsv')) {

Something's not working right, but I don't understand the validator enough to know what. Perhaps this will help you, though?
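
One possible cause, offered as a guess rather than a diagnosis: in the diff above `ignoredFiles` is created as a plain object (`{}`), and plain objects have no `includes` method, so `ignoredFiles.includes(scanFullPath)` would throw a TypeError. A sketch of collecting paths into an array instead, assuming each `fileList` entry carries a `relativePath` property (an assumption about the file objects that this thread does not confirm); `splitIgnored` is a hypothetical name:

```javascript
// Hypothetical variant of the "remove ignored files from list" step that
// returns an array of paths, so that .includes() can be used later.
function splitIgnored(fileList) {
  const ignoredPaths = []
  for (const key of Object.keys(fileList)) {
    if (fileList[key].ignore) {
      // Assumed property name; adjust to whatever the file objects expose.
      ignoredPaths.push(fileList[key].relativePath)
      delete fileList[key]
    }
  }
  return ignoredPaths
}
```

Whether the collected paths are directly comparable to `scanFullPath` depends on how that path is built in tsv.js, which would need to be checked separately.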

appukuttan-shailesh added 7 commits March 8, 2024 18:00

compare _scans.tsv with .bidsignore

67c3c7a

update tests with mock session storage

7195fd2

sessionstorage fix for testing

ad2dd28

update mock session storage for tests

65da18f

add test for .bidsignore and scans.tsv check

5388f27

add test for missing files not in .bidsignore; clean up

9c28a60

fix lint related issues

e2030b5

effigies reviewed Mar 19, 2024

appukuttan-shailesh and others added 2 commits March 21, 2024 16:32

add browser check when reading directory

a7ca052

Merge branch 'master' into bidsignore_scansTSV

c7433af

rework use of session storage

aa04411

effigies reviewed Mar 24, 2024

increase coverage

6dacdea

effigies added the legacy label Apr 23, 2024
