Add extra parameters for MUMme/Circos/dotplot workflow #182

Open
CeciliaDeng opened this issue Nov 20, 2024 · 7 comments
Labels
documentation Improvements or additions to documentation enhancement New feature or request help wanted Extra attention is needed

Comments

@CeciliaDeng
Collaborator

Hi @GallVp ,

For my project, I need to filter the synteny alignments with a strict identity threshold (for example, delta-filter -i 95). How do I configure this in AssemblyQC? One option I'm considering is to add extra_delta-filter_args = "-i 95" to assemblyqc/nextflow.config. Will this work?

@CeciliaDeng CeciliaDeng added the documentation Improvements or additions to documentation label Nov 20, 2024
@GallVp
Member

GallVp commented Nov 20, 2024

Thank you for raising this issue @CeciliaDeng

No, simply adding a parameter to the nextflow.config file won't do what you intend. I can suggest something once I know:

  • Which file do you want to filter?
  • Which tool is needed to filter this file? Is it minimap2 or is it delta-filter?
  • Does the pipeline already have this tool?
  • Can you identify the tool on the pipeline flowchart?

@GallVp GallVp added the awaiting-feedback Waiting for input from user label Nov 20, 2024
@CeciliaDeng
Collaborator Author

Hi @GallVp ,

It's part of the MUMmer package. Usually, for a synteny check, we run an alignment (nucmer or minimap2), filter the result with various options such as minimum identity or aligned length (delta-filter or other tools), and pass the filtered alignments on for visualization (Circos or dotplot).

@GallVp
Member

GallVp commented Nov 20, 2024

@CeciliaDeng

We don't have delta-filter as a separate step in our MUMmer/Circos workflow. We would have to add a delta-filter step before its arguments could be exposed.

The current MUMmer/Circos workflow involves quite a few steps. The key steps are: mummer/4.0.0/nucmer, dnadiff, and a custom bundle links script. You can include args for mummer and dnadiff.pl by adding custom configuration. The bundle links script exposes two arguments:

  • synteny_mummer_max_gap: Mummer alignments within this distance are bundled together, default 1000000
  • synteny_mummer_min_bundle_size: After bundling, any Mummer alignment bundle smaller than this size is filtered out, default 100000
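
To make the two parameters concrete, here is a minimal Python sketch of the bundling idea. It is not the pipeline's actual bundle links script: the Link record, the bundle_links function, and the forward-orientation simplification are all assumptions made for illustration. Only the two default values come from the description above.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """One alignment between a reference and a query sequence (hypothetical layout)."""
    ref: str
    ref_start: int
    ref_end: int
    qry: str
    qry_start: int
    qry_end: int


def bundle_links(links, max_gap=1_000_000, min_bundle_size=100_000):
    """Merge alignments that lie within max_gap of each other, then drop small bundles.

    Simplification (assumption): only forward-oriented alignments are handled.
    """
    bundles = []
    # Sort so alignments on the same sequence pair are adjacent and ordered by position.
    for link in sorted(links, key=lambda l: (l.ref, l.qry, l.ref_start)):
        last = bundles[-1] if bundles else None
        same_pair = last is not None and (last.ref, last.qry) == (link.ref, link.qry)
        if (same_pair
                and link.ref_start - last.ref_end <= max_gap
                and abs(link.qry_start - last.qry_end) <= max_gap):
            # Extend the current bundle so it also covers this alignment.
            last.ref_end = max(last.ref_end, link.ref_end)
            last.qry_start = min(last.qry_start, link.qry_start)
            last.qry_end = max(last.qry_end, link.qry_end)
        else:
            # Start a new bundle from a copy of this alignment.
            bundles.append(Link(**vars(link)))
    # Keep only bundles whose reference span reaches the minimum size.
    return [b for b in bundles if b.ref_end - b.ref_start >= min_bundle_size]
```

With the defaults, two alignments 10 kb apart on the same sequence pair would be merged into one bundle, while an isolated 10 kb alignment several megabases away would be filtered out.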

See the full workflow below.

MUMMER ( FILTERSORTFASTA.out.fasta )
ch_versions = ch_versions.mix(MUMMER.out.versions.first())

// MODULE: GETFASTALENGTH
GETFASTALENGTH ( FILTERSORTFASTA.out.fasta )
ch_versions = ch_versions.mix(GETFASTALENGTH.out.versions.first())

// MODULE: DNADIFF
ch_dnadiff_inputs = FILTERSORTFASTA.out.fasta
    | map { target, reference, target_fasta, ref_fasta ->
        [ "${target}.on.${reference}", target_fasta, ref_fasta ]
    }
    | join(
        MUMMER.out.delta
    )

DNADIFF(
    ch_dnadiff_inputs,
    mummer_m2m_align
)
ch_versions = ch_versions.mix(DNADIFF.out.versions.first())

// MODULE: BUNDLELINKS
BUNDLELINKS(
    DNADIFF.out.coords,
    mummer_max_gap,
    mummer_min_bundle_size
)
ch_versions = ch_versions.mix(BUNDLELINKS.out.versions.first())

// MODULE: COLOURBUNDLELINKS
COLOURBUNDLELINKS(
    BUNDLELINKS.out.links,
    color_by_contig
)
ch_coloured_links = COLOURBUNDLELINKS.out.coloured_links
ch_versions = ch_versions.mix(COLOURBUNDLELINKS.out.versions.first())

// MODULE: RELABELBUNDLELINKS
ch_relabellinks_inputs = ch_coloured_links
    | join(ch_combination_labels)

RELABELBUNDLELINKS ( ch_relabellinks_inputs )
ch_versions = ch_versions.mix(RELABELBUNDLELINKS.out.versions.first())

// MODULE: SPLITBUNDLEFILE
SPLITBUNDLEFILE(
    RELABELBUNDLELINKS.out.relabeled_links,
    plot_1_vs_all
)
ch_split_links = SPLITBUNDLEFILE.out.split_file
    | map { flattenSplitBundles(it) }
    | flatten
    | buffer(size:3)
ch_versions = ch_versions.mix(SPLITBUNDLEFILE.out.versions.first())

// MODULE: RELABELFASTALENGTH
ch_relabelfastalength_inputs = GETFASTALENGTH.out.length
    | join(ch_combination_labels)

RELABELFASTALENGTH ( ch_relabelfastalength_inputs )
ch_versions = ch_versions.mix(RELABELFASTALENGTH.out.versions.first())

// MODULE: GENERATEKARYOTYPE
ch_generate_karyotype_inputs = RELABELFASTALENGTH.out.relabeled_seq_lengths
    | cross(
        ch_split_links
    )
    | map { seq_len_tuple, split_bundle_tuple ->
        def target_on_xref = seq_len_tuple[0]
        def seq_tag = split_bundle_tuple[1]
        def split_bundle_file = split_bundle_tuple[2]
        def target_seq_len = seq_len_tuple[1]
        def ref_seq_len = seq_len_tuple[2]
        [ target_on_xref, seq_tag, split_bundle_file, target_seq_len, ref_seq_len ]
    }

GENERATEKARYOTYPE ( ch_generate_karyotype_inputs )
ch_versions = ch_versions.mix(GENERATEKARYOTYPE.out.versions.first())

// MODULE: CIRCOS
ch_circos_inputs = ( mummer_plot_type in [ 'circos', 'both' ] )
    ? ch_split_links
        | map { target_on_xref, seq_tag, txt ->
            [ "${target_on_xref}.${seq_tag}", txt ]
        }
        | join(GENERATEKARYOTYPE.out.karyotype)
    : Channel.empty()

CIRCOS ( ch_circos_inputs )

@rosscrowhurst
Collaborator

@GallVp 'dnadiff' is just a wrapper script from the MUMmer developers that adds quality-of-life improvements. It effectively just runs other MUMmer commands, including delta-filter, show-coords, etc. My workflow, which you are using, relies on dnadiff because its defaults give me everything I want. You can see the commands in its list of outputs: Output will be...
out.report - Summary of alignments, differences and SNPs
out.delta - Standard nucmer alignment output
out.1delta - 1-to-1 alignment from delta-filter -1
out.mdelta - M-to-M alignment from delta-filter -m
out.1coords - 1-to-1 coordinates from show-coords -THrcl .1delta
out.mcoords - M-to-M coordinates from show-coords -THrcl .mdelta
out.snps - SNPs from show-snps -rlTHC .1delta
out.rdiff - Classified ref breakpoints from show-diff -rH .mdelta
out.qdiff - Classified qry breakpoints from show-diff -qH .mdelta
out.unref - Unaligned reference sequence IDs and lengths
out.unqry - Unaligned query sequence IDs and lengths
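
For anyone who wants to inspect identities in the .1coords or .mcoords files listed above, a minimal Python sketch follows. The column order in COLUMNS is an assumption about the tab-delimited, headerless layout produced by show-coords -THrcl and should be checked against a real file; read_coords is a hypothetical helper, not part of MUMmer or the pipeline. Dropping rows after the fact is also not guaranteed to give the same result as delta-filter -i, because dnadiff has already made its 1-to-1 or M-to-M selection before these files are written.

```python
import csv

# Assumed column order for `show-coords -THrcl` output (tab-delimited, no header).
COLUMNS = ["s1", "e1", "s2", "e2", "len1", "len2", "idy",
           "len_r", "len_q", "cov_r", "cov_q", "ref_id", "qry_id"]


def read_coords(handle, min_identity=0.0):
    """Yield alignment rows as dicts, skipping rows below min_identity (percent)."""
    for fields in csv.reader(handle, delimiter="\t"):
        if len(fields) != len(COLUMNS):
            continue  # skip blank lines or lines that do not match the assumed layout
        row = dict(zip(COLUMNS, fields))
        row["idy"] = float(row["idy"])
        if row["idy"] >= min_identity:
            yield row
```

Calling read_coords(open("out.1coords"), min_identity=95) would then iterate over only those alignments at or above 95% identity.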

@GallVp
Member

GallVp commented Nov 21, 2024

@rosscrowhurst Thank you for the input.

dnadiff does not expose an identity filter such as the -i 95 that @CeciliaDeng needs. Is there a way to do that with dnadiff?

@rosscrowhurst
Collaborator

rosscrowhurst commented Nov 21, 2024

dnadiff.pl: https://github.com/garviz/MUMmer/blob/master/scripts/dnadiff.pl. Here is a fairly simple way to do it. OPTION1: do not use dnadiff; instead, run the command chain it wraps as separate commands and add the optional -i 95 when the user sets it. OPTION2: keep two versions of dnadiff, one called dnadiff (as in the standard script) and one called dnadiffCD95 (you get the picture), in which you hardcode the -i 95 option at lines 152 and 153. Then use an if/else to choose which one to run based on the user's setting.

sub RunFilter()
{
    print STDERR "Filtering alignments\n";
    my $cmd1 = "$DELTA_FILTER -1 $OPT_DeltaFile > $OPT_DeltaFile1";
    my $cmd2 = "$DELTA_FILTER -m $OPT_DeltaFile > $OPT_DeltaFileM";
    my $err = "ERROR: Failed to run delta-filter, aborting.\n";

    system($cmd1) == 0 or die $err;
    system($cmd2) == 0 or die $err;
}

This is the Perl code from dnadiff. The commands are actually just shell statements that are run through Perl's system function, so you could easily run them yourself using the same logic; you just don't get the customized report at the end. My workflow, which you are using, does not actually use that report and only really needs the .1coords or .mcoords output files. (The custom report and other outputs are useful to look at, which is why I use dnadiff, but they are not mandatory for the Circos use case.) Converting from dnadiff to the underlying MUMmer commands it wraps is therefore feasible, and it would let you support passing more options than dnadiff, as written, supports.

my $cmd = "$NUCMER --maxmatch -p $OPT_Prefix $OPT_RefFile $OPT_QryFile";
my $cmd1 = "$DELTA_FILTER -1 $OPT_DeltaFile > $OPT_DeltaFile1";
my $cmd2 = "$DELTA_FILTER -m $OPT_DeltaFile > $OPT_DeltaFileM";
my $cmd = "$SHOW_SNPS -rlTHC $OPT_DeltaFile1 > $OPT_SnpsFile";
my $cmd1 = "$SHOW_COORDS -rclTH $OPT_DeltaFile1 > $OPT_CoordsFile1";
my $cmd2 = "$SHOW_COORDS -rclTH $OPT_DeltaFileM > $OPT_CoordsFileM";
my $cmd1 = "$SHOW_DIFF -rH $OPT_DeltaFileM > $OPT_DiffRFile";
my $cmd2 = "$SHOW_DIFF -qH $OPT_DeltaFileM > $OPT_DiffQFile";
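
As a rough illustration of OPTION1, the Python sketch below assembles the nucmer, delta-filter, and show-coords commands from the list above and adds -i only when an identity cutoff is requested. The build_commands function, its parameter names, and the output file naming are assumptions for illustration; this is not how the pipeline or dnadiff is implemented, and the show-snps and show-diff steps are left out because the Circos use case only needs the coords files.

```python
import shlex


def build_commands(ref, qry, prefix, min_identity=None):
    """Return shell command strings for nucmer -> delta-filter -> show-coords."""
    r, q, p = (shlex.quote(x) for x in (ref, qry, prefix))
    # Add -i only when the user asked for a minimum identity (percent).
    ident = f" -i {min_identity:g}" if min_identity is not None else ""
    return [
        f"nucmer --maxmatch -p {p} {r} {q}",
        f"delta-filter -1{ident} {p}.delta > {p}.1delta",
        f"delta-filter -m{ident} {p}.delta > {p}.mdelta",
        f"show-coords -rclTH {p}.1delta > {p}.1coords",
        f"show-coords -rclTH {p}.mdelta > {p}.mcoords",
    ]
```

For example, build_commands("ref.fa", "qry.fa", "out", min_identity=95) yields a delta-filter step that reads "delta-filter -1 -i 95 out.delta > out.1delta", while omitting min_identity reproduces the unfiltered commands that dnadiff runs.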

@GallVp
Member

GallVp commented Nov 21, 2024

@rosscrowhurst That's very useful. I prefer OPTION1 because it adds a lot of flexibility to the pipeline and also avoids maintaining a custom version of dnadiff.

@GallVp GallVp changed the title Pass extra parameters to minimap2 Add extra parameters for MUMme/Circos/dotplot workflow Nov 21, 2024
@GallVp GallVp added enhancement New feature or request help wanted Extra attention is needed and removed awaiting-feedback Waiting for input from user labels Nov 21, 2024
@GallVp GallVp added this to the 2.3.0 milestone Nov 21, 2024
@GallVp GallVp removed their assignment Nov 21, 2024