You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
is using a non-existent or deprecated argument / has some inherent syntax error.
However, what's actually happening is that the code assumes reset_index() is operating on a pd.Series (which DOES have the name argument). But something is going wrong in the logic for session_completed_pipeline_data such that it's producing a pd.DataFrame instead (which DOESN'T have the name argument for reset_index()).
Source of problem
Earlier in the code, when pipeline_grouped_data is constructed:
as a result, when we then try to run groupby again on this object to construct session_completed_pipeline_data, that has no effect and still returns a pd.DataFrame, causing the unexpected keyword error when we then try to run reset_index() on it.
If we instead set dropna=False in the groupby when constructing pipeline_grouped_data, there is no longer an error, but the resulting completed_pipelines field for single subject-session looks like this in the response:
"completed_pipelines": {
"null": []
}
Environment
OS:
Python/Node version:
How to reproduce
No response
Anything else?
Some considerations
Why this wasn't caught by our tests
This section of code is not currently covered by any tests - i.e., we don't assert over the new pipeline fields in an n-API response based on a non-aggregated (mocked or real) result from the graph (i.e., checking that the graph response is re-formatted correctly on the API side)
We do have a mocked response from the graph that we currently use in some tests of the API response formatting:
However, even if we did use this, it would not have caught the error because:
a. It assumes a dataset with info containing pipeline metadata
b. It is an aggregated response
To avoid similar issues
We need tests that assert over the API response given all possible graph database types allowed by Neurobagel, i.e. datasets with:
i. ONLY phenotypic data
ii. ONLY phenotypic + bids data (no derivatives)
iii. ONLY phenotypic + derivatives data (no BIDS)
iv. Phenotypic + bids + derivatives
this means either spinning up multiple test graphs or mocking these 4 types of graph responses
we need to make sure the graph responses are also non-aggregated
We may want to simplify our existing code for reformatting graph responses so that they are more explicit, rather than complex lambda funcs or chaining multiple methods together (harder to debug)
The text was updated successfully, but these errors were encountered:
Is there an existing issue for this?
Expected Behavior
No response
Current Behavior
When submitting any cohort query to an n-API v0.4.0
the query results in an
Internal server error
.The same issue does not occur if the n-API is run in aggregated mode, or for a graph database containing at least one subject with pipeline metadata.
Error message
Error in the n-API container logs:
This error is misleading, as it gives the impression that the relevant section of code:
api/app/api/crud.py
Lines 222 to 232 in 9913736
is using a non-existent or deprecated argument / has some inherent syntax error.
However, what's actually happening is that the code assumes
reset_index()
is operating on apd.Series
(which DOES have thename
argument). But something is going wrong in the logic forsession_completed_pipeline_data
such that it's producing apd.DataFrame
instead (which DOESN'T have thename
argument forreset_index()
).Source of problem
Earlier in the code, when
pipeline_grouped_data
is constructed:api/app/api/crud.py
Lines 202 to 220 in 9913736
we are dropping
NaNs
during thegroupby
, meaning that when there are no pipeline names in the data, we get an empty dataframe like:as a result, when we then try to run
groupby
again on this object to constructsession_completed_pipeline_data
, that has no effect and still returns apd.DataFrame
, causing the unexpected keyword error when we then try to runreset_index()
on it.If we instead set
dropna=False
in the groupby when constructingpipeline_grouped_data
, there is no longer an error, but the resultingcompleted_pipelines
field for single subject-session looks like this in the response:Environment
How to reproduce
No response
Anything else?
Some considerations
Why this wasn't caught by our tests
api/tests/conftest.py
Lines 111 to 168 in 9913736
a. It assumes a dataset with info containing pipeline metadata
b. It is an aggregated response
To avoid similar issues
i. ONLY phenotypic data
ii. ONLY phenotypic + bids data (no derivatives)
iii. ONLY phenotypic + derivatives data (no BIDS)
iv. Phenotypic + bids + derivatives
The text was updated successfully, but these errors were encountered: