Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Mummer4 tools #1953

Merged
merged 21 commits into from
Dec 4, 2018
Merged

Add Mummer4 tools #1953

merged 21 commits into from
Dec 4, 2018

Conversation

nickeener
Copy link
Contributor

Hey my name's Nick and this is my first project for Dr. Nekrutenko. It's not quite done and there are still a few problems that I will list, but Nekrutenko suggested making a pull request so more people could offer help.

This is my attempt at wrapping the 16 commands of the mummer4 package, which contains sequence alignment and visualization tools. Of the 16 commands, I was able to able to get 10 of them to pass the test run by planemo (nucmer, delta-filter, exact-tandems, mgaps, mummer, repeat-match, show-aligns, show-coords, show-diff, show-snps). Another 3 fail the test but that problem should be fixed by the following pull request by Dave B: mummer4/mummer#68 (promer, dnadiff, mummerplot).

Problems:
mummer - needs to incorporate data manager created by Dave B (https://github.com/davebx/tools-iuc/tree/mummer_dm/data_managers/data_manager_mummer4_index_builder). Not sure how to go about this.

dnadiff - First test gives error: "Syntax error: Unbound variable input". Not sure what's going on here. Second test fails due to foundation.pm error that's fixed by aforementioned pull request.

show-tiling - When running the command, the output is blank. I have have the dependencies and optional utilities installed so I don't know why it's not working.

combineMUMs - When I run this command I get a seg fault. Also no idea why.

annotate - Confused about the input for this command. Man page says it's the output from the gaps command except there is no gaps command and the mgaps output doesn't seem to work.

I'd appreciate any suggestions on fixing these problems or any concerning any of the other commands. I look forward to working with you all.

Nick Keener

@peterjc
Copy link
Contributor

peterjc commented Jun 28, 2018

For reference, I wrapped a sub-set of the suite here:
http://toolshed.g2.bx.psu.edu/view/peterjc/mummer
https://github.com/peterjc/pico_galaxy/tree/master/tools/mummer

What you are doing here will be far more flexible and powerful - good plan 👍

@bgruening
Copy link
Member

Wonderful @nickeener! Many of us are currently at the GCC ... so sorry if we take a little bit longer to review it. Thanks!

Copy link
Member

@bgruening bgruening left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • please add a .shed.yml file

<command detect_errors="exit_code">
<![CDATA[
annotate
$mgaps
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

single quotes for all data-params and text-params

<option value="no">NO</option>
</param>
<when value="yes" >
<param name="qlabel" type="text" argument="-q" value="query" label="Query match label" />
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could remove this conditional and simply make the text parameter optional="True"

<param name="strings" type="boolean" argument="-s" truevalue="-s" falsevalue="" label="Output all differences in strings" />
<param name="qheader" type="boolean" argument="-t" truevalue="-t" falsevalue="" label="Label query matches with query fasta header" />
<param name="verbose" type="select" label="Set verbose level for extra output" >
<option value="0">0</option>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use indentation of 4 spaces everywhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just used a single tab for all my indentation, does that mean I need to go through and replace every tab with 4 spaces?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

</outputs>
<tests>
<test>
<param name="reference_sequence" ftype="fasta" value="K12.fasta" />
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indentation

<param name="query_sequence" type="data" format="fasta" multiple="True" label="Query Sequence(s)" />
</when>
<when value="delta">
<param name="delta_data" type="data" format="txt" label="Alignment file from nucmer" />
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

general question to all of these txt formats. Are they really undefined, or are these maybe tabular etc...?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would I tell if they were tabular? The original file in this case had the suffix ".delta" which wasn't an accepted format so I changed it to .txt.

@nsoranzo nsoranzo changed the title Initial commit Add Mummer4 tools Jun 29, 2018
@nickeener
Copy link
Contributor Author

I have a weird problem with the mgaps command. I initially installed mummer4 on my laptop several months ago and mgaps worked fine. But then just recently I got a new laptop and installed mummer4 on it and now the mgaps command does not exist. Not sure why it didn't get installed the 2nd time or how to get it now.

Also, the nucmer command is now failing even though it passed before making this pull request and I have not updated it since.

@bgruening
Copy link
Member

How did you installed it? With conda create -n mummer mummer4?

@nickeener
Copy link
Contributor Author

I used: conda install -c bioconda mummer4
I have all the other commands, just not mgaps.

@nekrut
Copy link
Contributor

nekrut commented Oct 16, 2018

Great job! Some notes:

  • Let's restrict the toolkit to mummer, mummerplot, nucmer, nucplot, filter-delta, show-coord
  • Combine nucmer and nucplot into one tool with an additional option to output the plot. Same thing for mummer

@nickeener
Copy link
Contributor Author

@bgruening Could you please take a look at my modified code? Travis-CI tests do not pass yet due to some problem with git but we are working on that.

Copy link
Member

@bgruening bgruening left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nickeener thanks, this looks very good!

  • please check all filetypes if you can not use tabular instead of txt
  • please check your requirements, ideally you have just mummer4 as the requirement
  • you could save a lot of lines if you use macros ... e.g. requirements, citations, version numbers and so on can all be moved into a macro file and shared across the tools

<tool id="delta-filter" name="Delta-Filter" version="0.1.0">
<requirements>
<requirement type="package" version="4.0.0beta2">mummer4</requirement>
<requirement type="package" version="1.12.14">sh</requirement>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nickeener why are all the other dependency needed? You just need mummer4 isn't it? And what is sh?

@@ -0,0 +1,176 @@
<tool id="mummer" name="Mummer" version="0.1.0">
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would change all tool versions to the mummer version.

</conditional>
</inputs>
<outputs>
<data name="output" format="txt" from_work_dir="mummer.txt" label="mummer" />
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would remove all label from the outputs ... or use something like label="${tool.name} on ${on_string}: mumplot"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For all tools :)

</requirements>
<command detect_errors="exit_code">
<![CDATA[
#set $query=""+"".join(str($query_sequence).split(','))+""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you also use "'+'" here? Is the leading +"" needed?

<tests>
<test>
<param name="delta" ftype="txt" value="nucmer.txt" />
<output name="output" ftype="txt" compare="diff" value="show-coords.txt" />
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could, if you like, attach the column names actually as metadata.
See here: https://github.com/galaxyproject/tools-iuc/blob/master/tools/deseq2/deseq2.xml#L170

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And it should probably be tabular isn't it?

@nickeener
Copy link
Contributor Author

@bgruening Thanks for the feedback. I believe I've fixed everything you mentioned. Are there any other changes I should make?

Nicholas Keener and others added 14 commits November 5, 2018 21:13
… 'from_work_dir' flag to reflect their correct extension, added nucmerplot and promerplot (combo tools of nucmer and mummerplot and promer and mummerplot), completely re-did the test data using smaller sequences, added labels to outputs, plus several other minor fixes
…axis option for mummerplot, and some other minor changes
…quirements, added macro for citation and mummerplot input, fixed version number, fixed output labels, added column names for certain outputs, and a few other fixes
@bgruening
Copy link
Member

@nickeener I rebased against master for you, but a few tests are still failing. Sorry.

@bgruening
Copy link
Member

@nekrut can you have a final look at this?
@nickeener I added a few things, can you maybe add a <description> to all tools?

@nekrut
Copy link
Contributor

nekrut commented Dec 3, 2018

@nickeener add help section as we discussed. Once done, @bgruening I think we are ready!

Copy link
Member

@bgruening bgruening left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyone else?

Thanks @nickeener!

@nickeener
Copy link
Contributor Author

Thanks for all the help @bgruening!

@bgruening bgruening merged commit 9c835e6 into galaxyproject:master Dec 4, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants