-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: avoid duplicate mappings #131
Conversation
remote had merge changes from dev into issue_85
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good stuff, thanks a lot. Please check the unit tests and resolve issues. Otherwise only minor comments from my side. Once the tests pass, can you please tag @balajtimate for an additional pair of eyes? 🙏
htsinfer/cli.py
Outdated
"upper limit on the difference in the reference " | ||
"sequence coordinates between the two mates to be " | ||
"considered as coming from a single fragment " | ||
"(Used only when sequence identifiers do not match) " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For consistency I would use all lowercase "used". Also in the next option.
htsinfer/get_library_type.py
Outdated
@@ -116,6 +124,111 @@ def _evaluate_mate_relationship( | |||
self.results.relationship = ( | |||
StatesTypeRelationship.split_mates | |||
) | |||
self.mapping.library_type.relationship \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better to use parentheses for wrapping lines. See PEP 8 style guide.
htsinfer/get_read_orientation.py
Outdated
@@ -416,6 +101,7 @@ def process_alignments( | |||
if star_dirs[0].name == "file_1": | |||
results.file_1 = self.process_single(sam=paths[0]) | |||
elif len(star_dirs) == 2: | |||
print("Getting there") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a leftover from testing. If not, please change into a logging message
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
opsie 😆
@BorisYourich: I've been away for a while. Should I look at this again or was there nothing added/changed since the last review? |
Hey, sorry for such a late response @uniqueg I was busy with other stuff, no update on this yet. |
No problem. Thanks for the update on no update :) I had just asked because I was away for quite a while myself and lost a bit track of progress on different ends |
I made the required changes plus something extra I found, but the unit tests still fail and I have no idea why, and it seems random, take a look at the replayed 3.7 unit test, many more tests failed compared to previous run. I cannot replicate in my local env, even if I downgrade python to 3.7, the tests still pass. |
Hi @BorisYourich, I've fixed some of the errors by stabilizing the build in various ways (remote now installs the exact same package as local for me): https://github.com/zavolanlab/htsinfer/pull/131/files/0a232822922a006fd97ee9d6e94d44db733eca5d..b5bd469df2694f20ba46178425752b036cda2f94 Among other things, I have included Py3.10 in the supported versions, but dropped support for Py3.7 - support for which is phasing out anyway. As you can see, there is still one issue remaining that affects two tests. These two tests also pass on my local machine (I suppose the same issue you faced), which -together with the error message- makes me think that it's a permission problem. Maybe a file is attempted to be written in a location that we do have write access on our local machine, but wouldn't have write access on the remote machine. Or a file is created in a temporary directory that is garbage collected on the remote between tests, but not locally. From the error stack, I would have a look here first: Lines 79 to 142 in b5bd469
This method also had a line that created a file name based off an object (which I first thought was the problem). I have now given the transcripts subset file a hardcoded name (not sure if that could cause problems in some case?). And still this stringified library source object instance is used in the log message, so this isn't handled well. Seems to me that this function needs guarding against edge cases, e.g., when there is no library source determined/available). As I think you know that particular code better from your changes, I would like to pass it back to you if possible. With some debug messages and maybe additional try-except blocks around the code reading and writing that subset transcripts file, I'm sure you will find out what the problem is on the remote. |
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## dev #131 +/- ##
===========================================
+ Coverage 99.80% 100.00% +0.19%
===========================================
Files 12 13 +1
Lines 1047 1078 +31
===========================================
+ Hits 1045 1078 +33
+ Misses 2 0 -2
☔ View full report in Codecov by Sentry. |
Playing around with the tests I realized where the problem is, the tests didn't have the |
Awesome, thanks a lot, @BorisYourich! @balajtimate: Can you please test the feature branch to see if the proposed changes break anything on real world data? Maybe you can run it on a few samples and see if you get the same results as you did previously. If all is good, I will merge. |
Okay, so I tested on a subset of 20 samples from our sample data, random organisms, SE/PE and there was no error, code ran perfectly, and the results match the ones from the run with the dev branch. 😄 I think you can merge now, if the 3.7 integration and unit tests run. |
Thanks! Merging this now, given that it doesn't break anything. We can still do in-depth testing later to see if it actually works as expected. Thanks a lot @BorisYourich |
Co-authored-by: Boris Jurič <[email protected]> Co-authored-by: Alex Kanitz <[email protected]>
* feat: add org param * refactor: avoid duplicate mappings (#131) Co-authored-by: Boris Jurič <[email protected]> Co-authored-by: Alex Kanitz <[email protected]> * fix typo, update pylint config * feat: add org_id param #108 * refactor: get_library_source.py #108 * test: add org param tests #108 * fix: update Pydantic version (#146) * fix pydantic issues * fix: update pydantic version in envs * fix: pin sphinx-rtd-theme into env * fix: update readthedocs config * update readme, gitignore * feat: infer org source if id not in dict #108 * replace json with model_dump * feat: add org_id param #108 * feat: add org_id param #108 * refactor: replace org with tax-id * refactor get_library_source * refactor get_library_source tests * refactor: update models.py * refactor: fix typos --------- Co-authored-by: Boris Jurič <[email protected]> Co-authored-by: Boris Jurič <[email protected]> Co-authored-by: Alex Kanitz <[email protected]>
Description
Part of the code from the
get_read_orientation.py
module was transferred into a new modulemapping.py
which takes care of the STAR mapping process. The instance of theMapping
class is created in theHtsInfer
class and supplemented as an argument to theget_lib_type
andget_orientation
modules, so as to avoid a repeated mapping if the sequence IDs are not the same and the mates are actually not mates.Fixes #130
Type of change
Please delete options that are not relevant.
Checklist
Please carefully read these items and tick them off if the statements are true
or do not apply.
warnings
have added type annotations for any local variables that are non-trivial,
potentially ambiguous or might otherwise benefit from explicit typing.
methods/functions or updated previously existing ones
works
reduced the code coverage relative to the previous state
by the proposed changes
One Issue which I was not able to replicate is the unit test failures in the CI workflow, I will not be able to work on this currently, so I leave it as is for now.