Add docs
jakobnissen committed Sep 22, 2023
1 parent b224ffd commit ce53c60
Showing 3 changed files with 49 additions and 12 deletions.
4 changes: 3 additions & 1 deletion docs/Project.toml
@@ -1,7 +1,9 @@
[deps]
BioGenerics = "47718e42-2ac5-11e9-14af-e5595289c2ea"
BioSequences = "7e6ae17a-c86d-528c-b3b9-7f778a29fe59"
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
FASTX = "c2308a5c-f048-11e8-3e8a-31650f418d12"

[compat]
BioSequences = "3"
Documenter = "0.27"
24 changes: 23 additions & 1 deletion docs/src/files.md
@@ -7,7 +7,7 @@ end

# FASTX formatted files

### Readers and writers - basics
A `Reader` and a `Writer` are structs that wrap an IO and allow efficient reading and writing of FASTX `Record`s.
For FASTA, use `FASTAReader` and `FASTAWriter`, and for FASTQ - well, I'm sure you've guessed it.
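As a minimal sketch, assuming FASTX.jl v2 is installed, a reader can wrap any `IO`, not only a file. Here an in-memory `IOBuffer` stands in for a FASTQ file, and `sequence(String, record)` returns the sequence as a `String`. The reads are made up for illustration.

```julia
using FASTX

# Made-up FASTQ data; any IO (file, pipe, IOBuffer) can be wrapped by a reader
data = "@read1 some description\nTAGGCT\n+\nIIIIHH\n@read2\nACGT\n+\nABCD\n"

ids = String[]
seqs = String[]
FASTQReader(IOBuffer(data)) do reader
    for record in reader
        # identifier is the part of the header line before the first space
        push!(ids, String(identifier(record)))
        push!(seqs, sequence(String, record))
    end
end

println(ids)   # ["read1", "read2"]
println(seqs)  # ["TAGGCT", "ACGT"]
```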

@@ -75,6 +75,8 @@ UInt8[]
```

To use it correctly, either call `flush`, or close the writer first (which also closes the underlying stream).
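A minimal sketch of the `flush` route, assuming FASTX.jl v2: the writer is flushed so its buffered output reaches the `IOBuffer`, the bytes are taken from the buffer, and only then is the writer closed (which also closes the buffer).

```julia
using FASTX

buffer = IOBuffer()
writer = FASTAWriter(buffer)
write(writer, FASTARecord("my_header", "TAGAG"))

# The writer buffers its output internally; flush it so the bytes reach `buffer`
flush(writer)
text = String(take!(buffer))
print(text)

# Closing the writer also closes the underlying IOBuffer
close(writer)
```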

### Readers and writers with do-syntax
It is recommended to use readers and writers with `do` syntax, in the form:
```julia
FASTAWriter(open(my_path, "w")) do writer
@@ -97,6 +99,26 @@ end

However, this latter syntax does not easily extend to different types of IO, such as gzip compressed streams.

### `rdr` and `wtr` macros
The `rdr` and `wtr` macros use the file name passed to them to determine which FASTX reader or writer to use, including any compression indicated by the file extension.
Because this relies on heuristics, and the macros are a little opaque to users, they are recommended for ephemeral REPL work; in packages, the more explicit forms are preferred.

The macro call `rdr"seqs.fna.gz"` expands to:
```julia
FASTAReader(GzipDecompressorStream(open("seqs.fna.gz"; lock=false)))
```

To use the `rdr` and `wtr` macros with `do`-syntax, use the `defer` function.
The only purpose of `defer` is to enable `do`-syntax:

```julia
record = FASTARecord("my_header", "TAGAG")

defer(wtr"seqs.fna.gz") do writer
    write(writer, record)
end
```
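To illustrate why such a function enables `do`-syntax, here is a sketch that is not the package's source. `defer_sketch` is a hypothetical helper, written under the assumption that `defer(f, x)` calls `f(x)` and then closes `x`; it uses only Base Julia, with an `IOBuffer` in place of a FASTX writer.

```julia
# Hypothetical stand-in for `defer`: call `f` on `resource`, then close the
# resource, even if `f` throws. Assumes `resource` supports `close`.
function defer_sketch(f, resource)
    try
        return f(resource)
    finally
        close(resource)
    end
end

io = IOBuffer()
# The do block becomes the first argument `f`
n = defer_sketch(io) do buf
    write(buf, "TAGAG")  # returns the number of bytes written
end

println(n)           # 5
println(isopen(io))  # false
```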

### Validate files
The functions `validate_fasta` and `validate_fastq` can be used to check if an `IO`
contains data that can be read as FASTX.
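A small sketch, assuming FASTX.jl v2, where `validate_fasta` returns `nothing` for valid input and some other value otherwise. Both inputs are made-up in-memory buffers; the second lacks the leading `>` of a FASTA header.

```julia
using FASTX

good = IOBuffer(">seq1\nTAGAG\n>seq2\nGGC\n")
bad = IOBuffer("seq1 without a header marker\nTAGAG\n")

# `nothing` signals valid FASTA; any other return value signals invalid data
good_is_valid = validate_fasta(good) === nothing
bad_is_valid = validate_fasta(bad) === nothing

println(good_is_valid)  # true
println(bad_is_valid)   # false
```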
33 changes: 23 additions & 10 deletions docs/src/index.md
@@ -23,17 +23,21 @@ Press `]` to enter pkg mode again, and enter the following:
```

## Quickstart
"FASTX" is a shorthand for the two related formats FASTA and FASTQ.
See more detailed documentation in the sections in the sidebar.

### Read and write FASTA or FASTQ files
The idiomatic way is to use a `do` block, which automatically closes the reader or writer
when the block is exited:
```julia
FASTAReader(open("seqs.fna")) do reader
    for record in reader
        println(identifier(record))
    end
end

FASTQWriter(open("reads.fq", "w")) do writer
    write(writer, FASTQRecord("abc", "TAG", "ABC"))
end
```

Alternatively, you can open and close the reader manually:
@@ -46,18 +50,27 @@ end
close(reader)
```
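When the reader is opened manually, a `try`/`finally` block ensures it is closed even if processing throws. A sketch assuming FASTX.jl v2; the temporary file and its contents are hypothetical.

```julia
using FASTX

# Hypothetical example file written to a temporary path
path = tempname() * ".fna"
write(path, ">seq1\nTAGAG\n>seq2\nGGC\n")

reader = FASTAReader(open(path))
ids = String[]
try
    for record in reader
        push!(ids, String(identifier(record)))
    end
finally
    close(reader)  # runs even if the loop throws; also closes the file
end

println(ids)  # ["seq1", "seq2"]
```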

### Read or write gzip-compressed FASTA or FASTQ files
For this, you need a separate package that handles gzip compression, such as `CodecZlib`:

```julia
using CodecZlib

FASTAReader(GzipDecompressorStream(open("seqs.fna.gz"))) do reader
    for record in reader
        println(identifier(record))
    end
end
```
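Writing works the same way with `GzipCompressorStream` from `CodecZlib`. A round-trip sketch, assuming FASTX.jl v2 and CodecZlib.jl are installed; the temporary path and records are hypothetical. Closing the writer at the end of the `do` block also closes the compressor stream, which finishes the gzip data before it is read back.

```julia
using FASTX, CodecZlib

path = tempname() * ".fna.gz"

# Writing: wrap the output file in a GzipCompressorStream
FASTAWriter(GzipCompressorStream(open(path, "w"))) do writer
    write(writer, FASTARecord("seq1", "TAGAG"))
    write(writer, FASTARecord("seq2", "GGC"))
end

# Reading back: wrap the input file in a GzipDecompressorStream
ids = String[]
FASTAReader(GzipDecompressorStream(open(path))) do reader
    for record in reader
        push!(ids, String(identifier(record)))
    end
end

println(ids)  # ["seq1", "seq2"]
```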

For added convenience, you can also use the reader and writer macros `rdr""` and `wtr""`.
These macros use the file extension to determine the reader or writer type, as well as any file compression.
To use these macros with `do`-syntax, use the `defer` function. The code block above can then be written equivalently as:

```julia
using CodecZlib

defer(rdr"seqs.fna.gz") do reader
    for record in reader
        println(identifier(record))
    end