From ce53c60e6b108ec50ed017058ae751eb764ac63e Mon Sep 17 00:00:00 2001 From: Jakob Nybo Nissen Date: Fri, 22 Sep 2023 16:15:09 +0200 Subject: [PATCH] Add docs --- docs/Project.toml | 4 +++- docs/src/files.md | 24 +++++++++++++++++++++++- docs/src/index.md | 33 +++++++++++++++++++++++---------- 3 files changed, 49 insertions(+), 12 deletions(-) diff --git a/docs/Project.toml b/docs/Project.toml index 08045d2..0fbc2a7 100644 --- a/docs/Project.toml +++ b/docs/Project.toml @@ -1,7 +1,9 @@ [deps] +BioGenerics = "47718e42-2ac5-11e9-14af-e5595289c2ea" BioSequences = "7e6ae17a-c86d-528c-b3b9-7f778a29fe59" Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4" +FASTX = "c2308a5c-f048-11e8-3e8a-31650f418d12" [compat] BioSequences = "3" -Documenter = "0.27" +Documenter = "0.27" diff --git a/docs/src/files.md b/docs/src/files.md index 5ead08e..33fec45 100644 --- a/docs/src/files.md +++ b/docs/src/files.md @@ -7,7 +7,7 @@ end # FASTX formatted files -### Readers and writers +### Readers and writers - basics A `Reader` and a `Writer` are structs that wrap an IO, and allows efficient reading/writing of FASTX `Record`s. For FASTA, use `FASTAReader` and `FASTAWriter`, and for FASTQ - well I'm sure you've guessed it. @@ -75,6 +75,8 @@ UInt8[] ``` To use it correctly, either call `flush`, or close the writer first (which also closes the underlying stream). + +### Readers and writers with do-syntax It is recommended to use readers and writers to `do` syntax in the form: ```julia FASTAWriter(open(my_path, "w")) do writer @@ -97,6 +99,26 @@ end However, this latter syntax does not easily extend to different types of IO, such as gzip compressed streams. +### `rdr` and `wtr` macros +The `rdr` and `wtr` macros use the passed file name to determine the FASTX reader or writer to use - including any compression file extensions. 
+Since these macros rely on heuristics and are a little opaque to users, it is recommended to use them for ephemeral REPL work, and not in packages, where the more explicit forms are preferred. + +The macro call `rdr"seqs.fna.gz"` expands to +```julia +FASTAReader(GzipDecompressorStream(open("seqs.fna.gz"; lock=false))) +``` + +To use the `rdr` and `wtr` macros with `do`-syntax, use the `defer` function. +The only purpose of the `defer` function is to enable `do`-syntax: + +```julia +record = FASTARecord("my_header", "TAGAG") + +defer(wtr"seqs.fna.gz") do writer + write(writer, record) +end +``` + ### Validate files The functions `validate_fasta` and `validate_fastq` can be used to check if an `IO` contains data that can be read as FASTX. diff --git a/docs/src/index.md b/docs/src/index.md index a7713e7..2a96fd4 100644 --- a/docs/src/index.md +++ b/docs/src/index.md @@ -23,17 +23,21 @@ Press `]` to enter pkg mode again, and enter the following: ``` ## Quickstart -"FASTX" is a shorthand for the two related formats FASTA and FASTQ. -See more documentation in the sections in the sidebar. +See more detailed documentation in the sections in the sidebar. 
-### Read FASTA or FASTQ files -It is preferred to use the `do` syntax to automatically close the file when it's done: +### Read and write FASTA or FASTQ files +The idiomatic way uses a `do` block to automatically close the reader or writer +when the block is exited: ```julia FASTAReader(open("seqs.fna")) do reader for record in reader println(identifier(record)) end -end +end + +FASTQWriter(open("reads.fq", "w")) do writer + write(writer, FASTQRecord("abc", "TAG", "ABC")) +end ``` Alternatively, you can open and close the reader manually: @@ -46,18 +50,27 @@ end close(reader) ``` -### Write FASTA or FASTQ files +### Read or write GZip compressed FASTA or FASTQ files +For this you need to use a separate package to read GZip files, such as the `CodecZlib` package: + ```julia -FASTQWriter(open("reads.fq", "w")) do writer - write(writer, FASTQRecord("abc", "TAG", "ABC")) +using CodecZlib + +FASTAReader(GzipDecompressorStream(open("seqs.fna.gz"))) do reader + for record in reader + println(identifier(record)) + end end ``` -### Read and write Gzip compressed FASTA files +For added convenience, you can also use the reader and writer macros `rdr""` and `wtr""`. +These macros use the file extensions to determine the biological sequence reader or writer type, and any file compression. +To use these macros with the `do`-syntax, you can use the `defer` function. Hence, the above code block can also be written in the following equivalent way: + ```julia using CodecZlib -FASTAReader(GzipDecompressorStream(open("seqs.fna.gz"))) do reader +defer(rdr"seqs.fna.gz") do reader for record in reader println(identifier(record)) end