Interacting with the Dengue‐GLUE Project

Robert J. Gifford edited this page Oct 16, 2024 · 6 revisions

Overview

The GLUE engine is a powerful tool for managing virus sequence data resources. Its console-driven command layer serves as the public interface, enabling users to manipulate data, extend the project's data schema, and integrate bioinformatics tools such as BLAST, RAxML, and MAFFT into their workflows.

The GLUE command line interpreter allows you to enter GLUE commands and see their results interactively. Commands may also be run from a batch file using the run file command, as in the example project build.

This page explains how to navigate the GLUE environment, perform database queries, and export data.

The GLUE Engine and Command Layer

Projects within GLUE are structured according to an underlying data model, and users interact with data objects using a command-line interface (the GLUE console).

The GLUE console operates based on the mode path, which reflects the current position within the project data model. Depending on your position in the data model, different commands become available.

For example, in the root mode path (i.e. Mode path: /), the list command is available for listing projects.

Mode path: /
GLUE> list project 
+========+========================================+
|  name  |              description               |
+========+========================================+
| dengue | A GLUE project for Dengue virus (DENV) |
+========+========================================+
Projects found: 1

You can use tab completion to view available fields and options, such as limiting or filtering data output.

To display a help page for any command, use the help command. For example, to get information on the list command, type:

GLUE> help list 

Navigating GLUE Projects

Navigating into Data Objects: You can navigate into a specific data object, such as the dengue project, via the GLUE console. For example, if you type the following command and press return, the mode path will update accordingly:

Mode path: /
GLUE> project dengue
OK
Mode path: /project/dengue

Under this mode path, tab completion reveals new options for the list command, reflecting the data model.

GLUE> list 
alignment           almt-member         custom-table-row    feature             
feature-location    format              member-floc-note    module              
reference           sequence            source              var-almt-note       
variation  

For example, alignments can be listed with the following command:

GLUE> list alignment 
+===================+=================+======================================+
|       name        |   parent.name   |           refSequence.name           |
+===================+=================+======================================+
| AL_DENV_1         | -               | REF_MASTER_DENV1                     |
| AL_DENV_1I        | AL_DENV_1       | REF_DENV_1I_KU509258                 |
| AL_DENV_1III      | AL_DENV_1       | REF_DENV_1III_EU179860               |
| AL_DENV_1III_A    | AL_DENV_1III    | REF_DENV_1III_A_EU179860             |

We can also navigate into alignment objects, as shown here:

Mode path: /project/dengue
GLUE> alignment AL_DENV_1I
OK
Mode path: /project/dengue/alignment/AL_DENV_1I

This makes new commands available, as revealed by tab completion:

Mode path: /project/dengue/alignment/AL_DENV_1I
GLUE> 
add                amino-acid         clear              commit             config             
console            count              data-util          demote             derive             
descendent-tree    exit               export             extract            file-util          
glue-engine        help               list               member             new-context        
project-mode       quit               remove             render-object      root-mode          
run                score              set                show               unset              
variation          web-list           

Exiting a Level: To move up one level in the data model:

GLUE> exit

You will return to the previous level:

Mode path: /project/dengue

Shortcut to Project or Root Levels: To return directly to the project or root level:

GLUE> project-mode
Mode path: /project/dengue

GLUE> root-mode
Mode path: /
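The navigation commands above behave like operations on a stack of object type/name pairs. The following Python sketch is a toy model of that behaviour, not GLUE's own implementation; the class and method names are hypothetical, and what exit does at the root level is not modelled here.

```python
class ModePath:
    """Toy model of the GLUE console mode path (hypothetical, not GLUE code)."""

    def __init__(self):
        # An empty list corresponds to the root mode path "/".
        self.parts = []

    def enter(self, object_type, name):
        """Navigate into a data object, e.g. enter('project', 'dengue')."""
        self.parts.extend([object_type, name])
        return self

    def exit(self):
        """Move up one level by dropping the last type/name pair."""
        # Assumption: at the root this model simply leaves the path unchanged.
        self.parts = self.parts[:-2]
        return self

    def project_mode(self):
        """Jump back to the enclosing project level."""
        if self.parts[:1] != ["project"]:
            raise ValueError("not inside a project")
        self.parts = self.parts[:2]
        return self

    def root_mode(self):
        """Jump back to the root level."""
        self.parts = []
        return self

    def __str__(self):
        return "/" + "/".join(self.parts)


path = ModePath()
path.enter("project", "dengue").enter("alignment", "AL_DENV_1I")
print(path)                 # /project/dengue/alignment/AL_DENV_1I
print(path.exit())          # /project/dengue
print(path.root_mode())     # /
```

Modelling the path as pairs makes it clear why exit removes both the object type and the object name in one step.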

GLUE Modules

A significant amount of functionality in the command layer is provided via the "module" mechanism. The current release of the GLUE engine includes more than 40 module types, all documented online. When a module is created, the associated commands for that module type become available.

Modules are stored as data objects, each containing a configuration document that modulates the operation of its commands. This flexibility allows module functionality to be customized for each project. For example, module commands can:

  • Work with or update the project dataset.
  • Operate on data obtained from the local file system or attached to incoming web requests.
  • Interface with external bioinformatics programs such as BLAST+.

Modules help adapt built-in functionality on a per-project basis, facilitating a wide range of operations.

Executing Database Queries

The list command queries the project database. It returns rows from a table, optionally restricted to chosen fields and filtered by a condition.

For example, you can list the data rows in table objects as follows:

  • Core Table Query:
GLUE> list sequence
  • Custom Table Query: To query a custom table like isolate_data:
GLUE> list custom-table-row isolate_data

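To list selected fields of sequences from a specific source, add the field names and a -w (where) clause. Here the command is prefixed with project dengue so that it runs in project mode: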

GLUE> project dengue list sequence sequenceID source.name serotype genotype major_lineage minor_lineage minor_sublineage -w "source.name = 'ncbi-nuccore-short'"

Example output:

+============+====================+==========+==========+===============+===============+==================+
| sequenceID |    source.name     | serotype | genotype | major_lineage | minor_lineage | minor_sublineage |
+============+====================+==========+==========+===============+===============+==================+
| A91810     | ncbi-nuccore-short | 2        | -        | -             | -             | -                |
| A91814     | ncbi-nuccore-short | 4        | 4II      | 4II_B         | -             | -                |
+============+====================+==========+==========+===============+===============+==================+
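When many similar queries are needed, for example when preparing a batch file, the command text can be generated by a script. The sketch below assembles list commands in the form used above (table name, then field names, then a -w clause). The function build_list_command is a hypothetical helper, not part of GLUE, and it only formats strings; it does not validate field names or where-clause syntax.

```python
def build_list_command(table, fields=(), where=None):
    """Return a GLUE 'list' command string (hypothetical helper).

    table  -- table to list, e.g. 'sequence' or 'alignment'
    fields -- field names to display, in order
    where  -- optional where-clause text, without the enclosing quotes
    """
    parts = ["list", table, *fields]
    if where is not None:
        # The clause is wrapped in double quotes, so it must not contain any.
        if '"' in where:
            raise ValueError("where clause must not contain double quotes")
        parts.extend(["-w", f'"{where}"'])
    return " ".join(parts)


command = build_list_command(
    "sequence",
    ["sequenceID", "source.name", "serotype", "genotype"],
    where="source.name = 'ncbi-nuccore-short'",
)
print(command)
# list sequence sequenceID source.name serotype genotype -w "source.name = 'ncbi-nuccore-short'"
```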

Exporting Results

GLUE allows you to export query results in various formats such as TSV, CSV, and JSON:

  • Setting the Output Format: Before running queries, specify the desired output format. For instance, to export in TSV format:
GLUE> console set cmd-output-file-format tab
  • Defining the Output File: Specify the output file for the next command's results:
GLUE> console set next-cmd-output-file denv-genotypes-fragment-seqs.tsv
  • Exporting Results: After running the query, the results will be saved in the specified file (denv-genotypes-fragment-seqs.tsv), ready for analysis in spreadsheet software or scripting environments.
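As an illustration of the scripting route, the Python sketch below loads an exported TSV file and tallies rows by one column. It assumes the file begins with a header row of field names and is tab-delimited; how empty values are written to the file is not stated here, so the code does not treat any value as missing. The function names are hypothetical.

```python
import csv
from collections import Counter


def read_glue_tsv(path):
    """Read a tab-delimited export into a list of dicts keyed by column name.

    Assumption: the first line of the file is a header row.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def count_by(rows, column):
    """Count how many rows carry each distinct value in the given column."""
    return dict(Counter(row[column] for row in rows))
```

For example, count_by(read_glue_tsv("denv-genotypes-fragment-seqs.tsv"), "serotype") would return the number of exported sequences per serotype, provided the query included the serotype field.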

Advanced Querying

GLUE offers flexibility in querying core and custom tables. You can filter and sort results, and combine fields from related tables in a single query:

  • Filtering: Use the -w (where) clause to filter results:
GLUE> list sequence -w "genotype = '1'"
  • Sorting: Use the list command's sort option to order results by one or more columns. The help page for the command (for example, help list sequence) shows the exact syntax.

  • Joining Core and Custom Tables: Custom tables extend the core schema, and their fields can be queried alongside core table fields. For example, to list fields from the custom isolate_data table together with the core sequenceID field:
GLUE> list sequence sequenceID isolate_data.country isolate_data.collection_year
  • Wildcards: Where clauses can use wildcards. Here the % wildcard restricts the listing to DENV4 alignments:
Mode path: /project/dengue
GLUE> list alignment --where "name like 'AL_DENV_4%'"
+=================+===============+============================+
|      name       |  parent.name  |      refSequence.name      |
+=================+===============+============================+
| AL_DENV_4       | -             | REF_MASTER_DENV4           |
| AL_DENV_4I      | AL_DENV_4     | REF_DENV_4I_MN018398       |
| AL_DENV_4II     | AL_DENV_4     | REF_DENV_4II_KX812530      |
| AL_DENV_4III    | AL_DENV_4     | REF_DENV_4III_MW945621     |
| AL_DENV_4II_A   | AL_DENV_4II   | REF_DENV_4II_A_MN018396    |
| AL_DENV_4II_A1  | AL_DENV_4II_A | REF_DENV_4II_A.1_OL314744  |
| AL_DENV_4II_A2  | AL_DENV_4II_A | REF_DENV_4II_A.2_KC762696  |
| AL_DENV_4II_B   | AL_DENV_4II   | REF_DENV_4II_B_KT276273    |
| AL_DENV_4I_A    | AL_DENV_4I    | REF_DENV_4I_A_MN018395     |
| AL_DENV_4I_A1   | AL_DENV_4I_A  | REF_DENV_4I_A.1_OQ427041   |
| AL_DENV_4I_A1.1 | AL_DENV_4I_A1 | REF_DENV_4I_A.1.1_MZ976860 |
| AL_DENV_4I_A1.2 | AL_DENV_4I_A1 | REF_DENV_4I_A.1.2_OQ427041 |
| AL_DENV_4I_A2   | AL_DENV_4I_A  | REF_DENV_4I_A.2_MN018395   |
| AL_DENV_4I_A3   | AL_DENV_4I_A  | REF_DENV_4I_A.3_MN449006   |
| AL_DENV_4I_B    | AL_DENV_4I    | REF_DENV_4I_B_MN018398     |
| AL_DENV_4I_B1   | AL_DENV_4I_B  | REF_DENV_4I_B.1_OP984198   |
| AL_DENV_4I_B2   | AL_DENV_4I_B  | REF_DENV_4I_B.2_KU509287   |
+=================+===============+============================+
Alignments found: 17

Components of the Command:

  1. list alignment:
    • This part of the command lists the entries in the alignment table of the current GLUE project.
    • alignment is the table that holds the project's multiple sequence alignments of DENV sequences.
  2. --where clause:
    • This clause is used to filter the results by applying a condition. It limits the entries returned to only those that meet the specified condition.
    • The condition in this case is based on the name field of the alignment entries.
  3. "name like 'AL_DENV_4%'":
    • This is the condition applied within the --where clause.
    • name like 'AL_DENV_4%' specifies that only entries where the name field starts with the prefix AL_DENV_4 should be listed.
    • The % symbol is a wildcard in SQL-like queries, meaning that any characters (or no characters) can follow the prefix AL_DENV_4.
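To make the wildcard semantics concrete, the Python sketch below emulates a like pattern by translating it to a regular expression and applying it to a few alignment names from the listings on this page. It follows standard SQL LIKE rules, in which _ also matches any single character, and it matches case-sensitively. Both points are assumptions about how GLUE and its underlying database evaluate the clause, and the function names are hypothetical.

```python
import re


def like_to_regex(pattern):
    """Translate a SQL-style LIKE pattern into a compiled regular expression.

    '%' matches any run of characters (including none) and '_' matches
    exactly one character; everything else is matched literally.
    """
    pieces = []
    for char in pattern:
        if char == "%":
            pieces.append(".*")
        elif char == "_":
            pieces.append(".")
        else:
            pieces.append(re.escape(char))
    return re.compile("".join(pieces), re.DOTALL)


def like_filter(names, pattern):
    """Return the names that match the LIKE pattern in full."""
    regex = like_to_regex(pattern)
    return [name for name in names if regex.fullmatch(name)]


names = ["AL_DENV_1", "AL_DENV_1I", "AL_DENV_4", "AL_DENV_4I", "AL_DENV_4II_A1"]
print(like_filter(names, "AL_DENV_4%"))
# ['AL_DENV_4', 'AL_DENV_4I', 'AL_DENV_4II_A1']
```

Note that AL_DENV_4 itself is returned, because % also matches an empty suffix.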
