Interacting with the Dengue‐GLUE Project
The GLUE engine is a powerful tool for managing virus sequence data resources. Its console-driven command layer serves as the public interface, enabling users to manipulate data, extend the project's data schema, and integrate bioinformatics tools such as BLAST, RAxML, and MAFFT into their workflows.
The GLUE command line interpreter allows you to enter GLUE commands and see their results interactively. Commands may also be run from a batch file using the run file command, as in the example project build.
This page explains how to navigate the GLUE environment, perform database queries, and export data.
Projects within GLUE are structured according to an underlying data model, and users interact with data objects using a command-line interface (the GLUE console).
The GLUE console operates based on the mode path, which reflects the current position within the project data model. Depending on your position in the data model, different commands become available.
For example, in the root mode path (i.e. Mode path: /), the list command is available for listing projects.
Mode path: /
GLUE> list project
+========+========================================+
| name | description |
+========+========================================+
| dengue | A GLUE project for Dengue virus (DENV) |
+========+========================================+
Projects found: 1
You can use tab completion to view available fields and options, such as limiting or filtering data output.
To display a help page on any command, use the help command. For example, to get information on the list command, type:
GLUE> help list
Navigating into Data Objects: You can navigate into a specific data object, such as the dengue project, via the GLUE console. For example, if you type the following command and press return, the mode path will update accordingly:
Mode path: /
GLUE> project dengue
OK
Mode path: /project/dengue
Under this mode path, tab command completion reveals new options for the list command, reflecting the data model.
GLUE> list
alignment almt-member custom-table-row feature
feature-location format member-floc-note module
reference sequence source var-almt-note
variation
For example, alignments can be listed with the following command:
GLUE> list alignment
+===================+=================+======================================+
| name | parent.name | refSequence.name |
+===================+=================+======================================+
| AL_DENV_1 | - | REF_MASTER_DENV1 |
| AL_DENV_1I | AL_DENV_1 | REF_DENV_1I_KU509258 |
| AL_DENV_1III | AL_DENV_1 | REF_DENV_1III_EU179860 |
| AL_DENV_1III_A | AL_DENV_1III | REF_DENV_1III_A_EU179860 |
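The parent.name column in the output above implies that alignments form a hierarchy, with each alignment pointing at its parent. As a minimal sketch of that structure (plain Python, not GLUE code), the rows can be turned into an indented tree. The `ROWS` list is copied by hand from the four rows shown above, and the helper names `build_children` and `render_tree` are hypothetical.

```python
from collections import defaultdict

# (name, parent.name) pairs copied from the `list alignment` output above;
# "-" marks an alignment that has no parent.
ROWS = [
    ("AL_DENV_1", "-"),
    ("AL_DENV_1I", "AL_DENV_1"),
    ("AL_DENV_1III", "AL_DENV_1"),
    ("AL_DENV_1III_A", "AL_DENV_1III"),
]


def build_children(rows):
    """Map each parent name to the list of its direct child alignments."""
    children = defaultdict(list)
    for name, parent in rows:
        if parent != "-":
            children[parent].append(name)
    return children


def render_tree(rows):
    """Return the hierarchy as indented lines, starting from parentless rows."""
    children = build_children(rows)
    lines = []

    def walk(name, depth):
        lines.append("  " * depth + name)
        for child in children.get(name, []):
            walk(child, depth + 1)

    for name, parent in rows:
        if parent == "-":
            walk(name, 0)
    return lines


print("\n".join(render_tree(ROWS)))
```

Each level of indentation corresponds to one step down the parent.name chain, so AL_DENV_1III_A appears two levels below AL_DENV_1.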
We can also navigate into alignment objects, as shown here:
Mode path: /project/dengue
GLUE> alignment AL_DENV_1I
OK
Mode path: /project/dengue/alignment/AL_DENV_1I
This makes new command options available, as revealed by tab command completion:
Mode path: /project/dengue/alignment/AL_DENV_1I
GLUE>
add amino-acid clear commit config
console count data-util demote derive
descendent-tree exit export extract file-util
glue-engine help list member new-context
project-mode quit remove render-object root-mode
run score set show unset
variation web-list
Exiting a Level: To move up one level in the data model:
GLUE> exit
You will return to the previous level:
Mode path: /project/dengue
Shortcut to Project or Root Levels: To return directly to the project or root level:
GLUE> project-mode
Mode path: /project/dengue/
GLUE> root-mode
Mode path: /
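The navigation commands above behave like operations on a stack of type/name pairs. The following is a toy model of that idea in Python, offered only as an illustration: the `ModePath` class and its method names are hypothetical and are not part of GLUE, and the model covers only the four navigation steps described on this page.

```python
class ModePath:
    """Toy model of the console's mode path; not part of GLUE itself."""

    def __init__(self):
        self.parts = []  # an empty list stands for the root mode path "/"

    def enter(self, object_type, name):
        """Model a command such as `project dengue` or `alignment AL_DENV_1I`."""
        self.parts += [object_type, name]

    def exit(self):
        """Model `exit`: move up one level (drop one type/name pair)."""
        self.parts = self.parts[:-2]

    def project_mode(self):
        """Model `project-mode`: keep only the leading project/<name> pair."""
        self.parts = self.parts[:2]

    def root_mode(self):
        """Model `root-mode`: return to the root mode path."""
        self.parts = []

    def __str__(self):
        return "/" + "/".join(self.parts)


path = ModePath()
path.enter("project", "dengue")
path.enter("alignment", "AL_DENV_1I")
print(path)  # /project/dengue/alignment/AL_DENV_1I
path.exit()
print(path)  # /project/dengue
path.root_mode()
print(path)  # /
```

Treating the path as a list makes the difference between exit (one level up) and project-mode or root-mode (jump to a fixed level) easy to see.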
A significant amount of functionality in the command layer is provided via the "module" mechanism. The current release of the GLUE engine includes more than 40 module types, all documented online. When a module is created, the associated commands for that module type become available.
Modules are stored as data objects, each containing a configuration document that modulates the operation of its commands. This flexibility allows module functionality to be customized for each project. For example, module commands can:
- Work with or update the project dataset.
- Operate on data obtained from the local file system or attached to incoming web requests.
- Interface with external bioinformatics programs such as BLAST+.
Modules help adapt built-in functionality on a per-project basis, facilitating a wide range of operations.
Perform a query to list attributes of sequences, filtered by specific conditions.
For example, you can list the data rows in table objects as follows:
- Core Table Query:
GLUE> list sequence
- Custom Table Query: To query a custom table like isolate_data:
GLUE> list custom-table-row isolate_data
For example, to list sequences from a specific source:
GLUE> project dengue list sequence sequenceID source.name serotype genotype major_lineage minor_lineage minor_sublineage -w "source.name = 'ncbi-nuccore-short'"
Example output:
+============+====================+==========+==========+===============+===============+==================+
| sequenceID | source.name | serotype | genotype | major_lineage | minor_lineage | minor_sublineage |
+============+====================+==========+==========+===============+===============+==================+
| A91810 | ncbi-nuccore-short | 2 | - | - | - | - |
| A91814 | ncbi-nuccore-short | 4 | 4II | 4II_B | - | - |
+============+====================+==========+==========+===============+===============+==================+
GLUE allows you to export query results in various formats such as TSV, CSV, and JSON:
- Setting the Output Format: Before running queries, specify the desired output format. For instance, to export in TSV format:
GLUE> console set cmd-output-file-format tab
- Defining the Output File: Specify the output file for the next command's results:
GLUE> console set next-cmd-output-file denv-genotypes-fragment-seqs.tsv
- Exporting Results: After running the query, the results will be saved in the specified file (denv-genotypes-fragment-seqs.tsv), ready for analysis in spreadsheet software or scripting environments.
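As one possible way to pick up the exported file in a scripting environment, the sketch below reads a tab-separated file with Python's standard csv module and counts rows per value of a column. It assumes the export begins with a header row whose column names match the fields requested in the query (for example serotype); that layout is an assumption, so check the actual file first. The helper names `read_tsv` and `count_by` are hypothetical.

```python
import csv
from collections import Counter


def read_tsv(path):
    """Read a tab-separated file into a list of dicts keyed by the header row.

    Assumes the first line of the file holds the column names.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def count_by(rows, column):
    """Count rows per distinct value of the given column."""
    return Counter(row[column] for row in rows)
```

For instance, calling read_tsv("denv-genotypes-fragment-seqs.tsv") and passing the result to count_by(rows, "serotype") would tally sequences per serotype, provided the export contains a serotype column.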
GLUE offers flexibility in querying core and custom tables. You can filter, sort, and join data from various tables to retrieve meaningful insights:
- Filtering: Use the -w (where) clause to filter results:
GLUE> list sequence -w "genotype = '1'"
- Sorting: Add the -sort option to organize results by one or more columns.
- Joining Core and Custom Tables: Custom tables extend the core schema, and their fields can be queried alongside core tables. For example, linking custom isolate_data with core sequence data:
GLUE> list sequence sequenceID isolate_data.country isolate_data.collection_year
- Wildcards: Command 'where' clauses can use wildcards. Here a wildcard is used to create a list command that will selectively list DENV4 alignments:
Mode path: /project/dengue
GLUE> list alignment --where "name like 'AL_DENV_4%'"
+=================+===============+============================+
| name | parent.name | refSequence.name |
+=================+===============+============================+
| AL_DENV_4 | - | REF_MASTER_DENV4 |
| AL_DENV_4I | AL_DENV_4 | REF_DENV_4I_MN018398 |
| AL_DENV_4II | AL_DENV_4 | REF_DENV_4II_KX812530 |
| AL_DENV_4III | AL_DENV_4 | REF_DENV_4III_MW945621 |
| AL_DENV_4II_A | AL_DENV_4II | REF_DENV_4II_A_MN018396 |
| AL_DENV_4II_A1 | AL_DENV_4II_A | REF_DENV_4II_A.1_OL314744 |
| AL_DENV_4II_A2 | AL_DENV_4II_A | REF_DENV_4II_A.2_KC762696 |
| AL_DENV_4II_B | AL_DENV_4II | REF_DENV_4II_B_KT276273 |
| AL_DENV_4I_A | AL_DENV_4I | REF_DENV_4I_A_MN018395 |
| AL_DENV_4I_A1 | AL_DENV_4I_A | REF_DENV_4I_A.1_OQ427041 |
| AL_DENV_4I_A1.1 | AL_DENV_4I_A1 | REF_DENV_4I_A.1.1_MZ976860 |
| AL_DENV_4I_A1.2 | AL_DENV_4I_A1 | REF_DENV_4I_A.1.2_OQ427041 |
| AL_DENV_4I_A2 | AL_DENV_4I_A | REF_DENV_4I_A.2_MN018395 |
| AL_DENV_4I_A3 | AL_DENV_4I_A | REF_DENV_4I_A.3_MN449006 |
| AL_DENV_4I_B | AL_DENV_4I | REF_DENV_4I_B_MN018398 |
| AL_DENV_4I_B1 | AL_DENV_4I_B | REF_DENV_4I_B.1_OP984198 |
| AL_DENV_4I_B2 | AL_DENV_4I_B | REF_DENV_4I_B.2_KU509287 |
+=================+===============+============================+
Alignments found: 17
- list alignment:
  - This part of the command lists all entries in the alignment table of the current GLUE project.
  - alignment refers to a table or entity that holds information about multiple sequence alignments of DENV sequences. In GLUE, alignments can represent viral genetic sequences arranged for comparison.
- --where clause:
  - This clause filters the results by applying a condition. It limits the entries returned to only those that meet the specified condition.
  - The condition in this case is based on the name field of the alignment entries.
- "name like 'AL_DENV_4%'":
  - This is the condition applied within the --where clause.
  - name like 'AL_DENV_4%' specifies that only entries where the name field starts with the prefix AL_DENV_4 should be listed.
  - The % symbol is a wildcard in SQL-like queries, meaning that any characters (or no characters) can follow the prefix AL_DENV_4.
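To make the prefix-matching behaviour of the % wildcard concrete, here is a small Python sketch that mimics it outside GLUE. It is an illustration only, not how GLUE evaluates where clauses. The helpers `like_to_regex` and `filter_like` are hypothetical, and the sketch handles only %. Standard SQL LIKE also treats _ as a single-character wildcard; this page does not say whether GLUE does, so the sketch matches _ literally.

```python
import re


def like_to_regex(pattern):
    """Translate a LIKE-style pattern into a compiled regular expression.

    Only `%` (any run of characters, including none) is treated as a
    wildcard; every other character, including `_`, is matched literally.
    """
    parts = (".*" if ch == "%" else re.escape(ch) for ch in pattern)
    return re.compile("".join(parts), re.DOTALL)


def filter_like(names, pattern):
    """Return the names that match the whole pattern, preserving order."""
    regex = like_to_regex(pattern)
    return [name for name in names if regex.fullmatch(name)]


# A few alignment names taken from the outputs shown on this page.
NAMES = ["AL_DENV_1", "AL_DENV_1I", "AL_DENV_4", "AL_DENV_4I", "AL_DENV_4II_A1"]
print(filter_like(NAMES, "AL_DENV_4%"))
# ['AL_DENV_4', 'AL_DENV_4I', 'AL_DENV_4II_A1']
```

Note that AL_DENV_4 itself is included, because % also matches an empty run of characters.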