Sequence Manipulation
The default way to import sequence data into a GLUE project is the import source command. This command expects a folder of individual sequence files in either FASTA or GenBank XML format. When importing FASTA files, the sequence header is used as the sequence ID, so naming each file after its sequence ID is recommended for clarity and consistency.
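Sequence data often arrives as a single multi-record FASTA file rather than one file per sequence. The following is a minimal Python sketch, not part of GLUE, that splits such a file into per-sequence files named by sequence ID, matching the folder layout described above. The function names parse_fasta and split_fasta are hypothetical, the .fasta extension is an assumption, and the sketch assumes the ID is the first whitespace-delimited token of each header and is safe to use as a file name.

```python
from pathlib import Path


def parse_fasta(text):
    """Yield (sequence_id, sequence) pairs from multi-record FASTA text."""
    seq_id, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            tokens = line[1:].split()
            if not tokens:
                raise ValueError("FASTA header with no sequence ID")
            # Assumption: the ID is the first whitespace-delimited header token.
            seq_id, chunks = tokens[0], []
        elif seq_id is not None:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def split_fasta(multi_fasta_path, out_dir, width=70):
    """Write one FASTA file per record, named <sequence_id>.fasta."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    text = Path(multi_fasta_path).read_text()
    for seq_id, seq in parse_fasta(text):
        # Assumption: seq_id contains no path separators or other unsafe characters.
        target = out_dir / f"{seq_id}.fasta"
        # Wrap the sequence at a fixed line width.
        lines = [seq[i:i + width] for i in range(0, len(seq), width)]
        target.write_text(f">{seq_id}\n" + "\n".join(lines) + "\n")
        written.append(target)
    return written
```

For example, split_fasta("all_sequences.fasta", "sources/my-source") would produce one file per record in a folder that could then be handed to import source; both paths here are placeholders.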
Here is an example of the import source command:
Mode path: /
GLUE> run file buildCoreProject.glue
In this example, the system imports sequences from the specified folder, recognizing each file format (in this case, GenBank XML) and assigning the appropriate sequenceID based on the file contents. If the files were in FASTA format, the sequence header would be treated as the sequenceID.
This approach allows for streamlined sequence data management, especially when working with large datasets organized by sequence identifiers.
Module Type: fastaImporter
The fastaImporter module allows you to import nucleotide data from a FASTA file, creating a set of Sequence objects.
- Type-Specific Commands:
  - import: Imports sequences from a FASTA file.
- Usage Example:
GLUE> import path/to/sequences.fasta
General Module Mode Commands: In addition to the above command, all general module mode commands are available for use after importing.
Exporting sequence data from GLUE allows researchers to save their results in various formats for further analysis or sharing. The primary method for exporting sequences is through the fastaExporter module.
Using the fastaExporter Module
The fastaExporter module provides a command for exporting sequences to a FASTA file. Below are the command options and an example usage:
- Command Syntax:
export (-w <whereClause> | -a) [-o <offset> -b <batchSize>] [-y <lineFeedStyle>] [-r] [-t] (-p | -f <fileName>)
- Options:
  - -y <lineFeedStyle> or --lineFeedStyle <lineFeedStyle>: Specifies the line feed style (LF or CRLF).
  - -f <fileName> or --fileName <fileName>: Name of the output FASTA file.
  - -w <whereClause> or --whereClause <whereClause>: Qualifies the sequences to be exported based on specified criteria.
  - -o <offset> or --offset <offset>: Paged query offset for batch processing.
  - -b <batchSize> or --batchSize <batchSize>: Number of sequences to export in each batch.
  - -a or --allSequences: Exports all sequences in the project.
  - -r or --suppressReverseComplement: Suppresses the reverse complement of sequences in the output.
  - -t or --suppressRotation: Suppresses the rotation of sequences in the output.
  - -p or --preview: Displays a preview of the output without saving the file.
- Example Command:
To export DENV sequences belonging to a specific major lineage and preview the output, you can use the following command:
GLUE> module fastaExporter export -w "major_lineage = '1V_E'" -p
In this example, the -w option filters the sequences to include only those with major_lineage equal to '1V_E', and the -p option previews the output instead of saving it.
Export Location:
The exported file will be saved in a location relative to the current load/save directory in GLUE.
Additional Notes:
- The whereClause can be tailored to suit specific research needs, enabling selective exporting based on various sequence attributes.
- For large datasets, the batch size and offset options can be used to export sequences in pages, which helps manage memory usage during the export process.
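The paging options can be illustrated with a short Python sketch that builds one export command line per batch, following the command syntax shown earlier. This is only an illustration under stated assumptions: the helper name batched_export_commands and the output file naming scheme are hypothetical, the offset is assumed to be zero-based, the total number of matching sequences is assumed to be known in advance, and the where clause is assumed to contain no double quotes.

```python
def batched_export_commands(total, batch_size, where_clause=None,
                            prefix="export_batch"):
    """Build one fastaExporter export command string per page of sequences."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # Select by where clause if given, otherwise export all sequences (-a).
    selector = f'-w "{where_clause}"' if where_clause else "-a"
    commands = []
    # Assumption: offsets are zero-based and advance by batch_size.
    for index, offset in enumerate(range(0, total, batch_size), start=1):
        commands.append(
            f"module fastaExporter export {selector} "
            f"-o {offset} -b {batch_size} -f {prefix}_{index}.fasta"
        )
    return commands


if __name__ == "__main__":
    for command in batched_export_commands(5, 2, "major_lineage = '1V_E'"):
        print(command)
```

Each generated line writes its page to a separate file, so no single export has to hold the whole result set. The commands would still need to be run in a GLUE session or placed in a .glue script.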
By utilizing the fastaExporter module, researchers can efficiently export sequence data from their GLUE projects for further analysis, sharing, or integration with other bioinformatics tools.