Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add blast.ml to lib/bioinfo #38

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
72 changes: 72 additions & 0 deletions lib/bioinfo/blast.ml
Original file line number Diff line number Diff line change
@@ -0,0 +1,72 @@
open Core
open Bistro.Std
open Bistro.EDSL
open Bistro_bioinfo.Std

type db = [`blast_db] directory
let env = docker_image ~account:"pveber" ~name:"ncbi-blast" ~tag:"2.4.0" ()

let db_name = "db"

let fastadb fa dbtype =
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better call it makedb since it's the name of the tool

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also dbtype should be the first argument and named

workflow ~descr:"blast.makedb" [
mkdir_p dest ;
cmd ~env "makeblastdb" [
opt "-in" dep fa ;
opt "-dbtype" ident dbtype ;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the type of dbtype is Bistro.Template.t, which is really not convenient for a user. I'd much prefer have a polymorphic variant type here [nucl | prot]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we have to create the type ? In the .ml ? I don't really see how to call the function then.
Something like type polymorphic_variant = Prot | Nucl in .ml and call Blast.makedb ~dbtype:Prot ref_prot outside ? With a "reconversion" to string somewhere to call the command line.
(it's not exactly like that because it's not working)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually if you use polymorphic variants `Nucl and `Prot, you don't need to declare the type and it's also easier when calling the function. So yes, basically a call would be just like you say Blast.makedb ~dbtype:`Prot ref_prot, and yes you have to provide the conversion function that will take a variant and make an appropriate Bistro.Template.t out of it.

opt "-out" ident (dest // db_name) ;
] ;
]

(* Basic blastn*)

let blastn ?evalue ?word_size ?task ?gapopen ?gapextend ?penalty
?reward ?outfmt ?perc_identity ?qcov_hsp_perc ?max_hsps ?max_target_seqs ?(threads = 4) db query out_name = (*See blastn documentation to know what options are*)
workflow ~descr:"blastn" ~np:threads [
mkdir_p dest ;
cmd "blastn" ~env [
opt "-db" ident (dep db // db_name) ;
opt "-query" dep query ;
opt "-out" ident (dest // out_name) ;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the motivation for creating a directory here? Couldn't we give dest to the -out option?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it's already created in makedb ? Or do we have to create it in the main pipeline ?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well yes, the same could be said about the database, I think in both cases you don't really need to create a directory and put the resulting file in it. The resulting file should be put at the location dest directly I think.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be more precise in think your wrapper could be simplified like this:

opt "-db" dep db ;
opt "-out" ident dest ;

of course the wrapper for makedb should be modified accordingly.

option (opt "-evalue" float) evalue ;
option (opt "-word_size" int) word_size ;
option (opt "-task" string) task ;
option (opt "-gapopen" int) gapopen ;
option (opt "-gapextend" int) gapextend ;
option (opt "-penalty" int) penalty ;
option (opt "-reward" int) reward ;
option (opt "-outfmt" string) outfmt ;
option (opt "-perc_identity" float) perc_identity ;
option (opt "-qcov_hsp_perc" float) qcov_hsp_perc ;
option (opt "-max_hsps" int) max_hsps ;
option (opt "-max_target_seqs" int) max_target_seqs ;
opt "-num_threads" ident np ;
]
]

let blastp ?evalue ?word_size ?task ?gapopen ?gapextend ?penalty
?reward ?outfmt ?perc_identity ?qcov_hsp_perc ?max_hsps ?max_target_seqs ?(threads = 4) db query out_name = (*See blastn documentation to know what options are*)
workflow ~descr:"blastp" ~np:threads [
mkdir_p dest ;
cmd "blastp" ~env [
opt "-db" ident (dep db // db_name) ;
opt "-query" dep query ;
opt "-out" ident (dest // out_name) ;
option (opt "-evalue" float) evalue ;
option (opt "-word_size" int) word_size ;
option (opt "-task" string) task ;
option (opt "-gapopen" int) gapopen ;
option (opt "-gapextend" int) gapextend ;
option (opt "-penalty" int) penalty ;
option (opt "-reward" int) reward ;
option (opt "-outfmt" string) outfmt ;
option (opt "-perc_identity" float) perc_identity ;
option (opt "-qcov_hsp_perc" float) qcov_hsp_perc ;
option (opt "-max_hsps" int) max_hsps ;
option (opt "-max_target_seqs" int) max_target_seqs ;
opt "-num_threads" ident np ;
]
]



11 changes: 8 additions & 3 deletions lib/bioinfo/prokka.ml
Original file line number Diff line number Diff line change
Expand Up @@ -8,15 +8,14 @@ let gram_expr = function
| `Plus -> string "+"
| `Minus -> string "-"

let run ?prefix ?addgenes ?locustag ?increment ?gffver ?compliant
let run ?addgenes ?locustag ?increment ?gffver ?compliant
?centre ?genus ?species ?strain ?plasmid ?kingdom ?gcode ?gram
?usegenus ?proteins ?hmms ?metagenome ?rawproduct ?fast ?(threads = 1)
?mincontiglen ?evalue ?rfam ?norrna ?notrna ?rnammer fa =
workflow ~descr:"prokka" ~np:threads ~mem:(3 * 1024) [
mkdir_p dest ;
cmd "prokka" ~env [
string "--force" ;
option (opt "--prefix" string) prefix ;
string "--force --prefix prokka_res" ;
option (flag string "--addgenes") addgenes ;
option (opt "--locustag" string) locustag ;
option (opt "--increment" int) increment ;
Expand Down Expand Up @@ -47,3 +46,9 @@ let run ?prefix ?addgenes ?locustag ?increment ?gffver ?compliant
dep fa ;
] ;
]


let transcripts = selector ["prokka_res.ffn"]
let proteins = selector ["prokka_res.faa"]
let gff_annotation = selector [ "prokka_res.gff" ]
let gbk_annotation = selector [ "prokka_res.gbk" ]