Bugfix – Inferred cell type field not in SDRF breaks experiment design (#69)

* Move checkDatabaseConnection to common_routines.sh
* Remove unnecessary files
* Add fixtures with fields absent in the SDRF file
* Refactor load_exp_design.sh and AWK script with support for fields absent in the SDRF file
* Restore files needed for testing, but set them in sensible locations
* Use scratch directory for experiment design files
* Set new experiment accession
* Fix typos
* Update version of the image used to run tests
* Update version of the image used to run tests (!)
* Changed scripts to adjust to changes in 1a0c76e
* Use ${SCRATCH_DIR} if set to write the SQL file
* Prepend ${SCRIPT_DIR} to AWK file
* Remove duplicate line in fixture
* Remove unneeded configuration values for solr / zk
* Remove unused configuration values for solr / zk
* Add human experiment to tests
* Fix typo in species when asserting experiment load correctness

Co-authored-by: Karoly Erdos <[email protected]>
1 parent: 6326500
Commit: 908307e

Showing 33 changed files with 315 additions and 98 deletions.
load_exp_design.awk (new file)
@@ -0,0 +1,13 @@
+# Return the index of the first field that matches the given pattern, or 0 if it's not found
+{
+    for (i = 1; i <= NF; ++i) {
+        field = $i;
+        if (field ~ pattern) {
+            print i;
+            exit;
+        }
+    }
+
+    print 0;
+    exit;
+}
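Because the program exits after the first input line, it effectively searches the SDRF header row. A minimal usage sketch with the AWK body inlined instead of read via `-f` (the header line and column name below are invented for illustration):

```shell
# Find the 1-based index of the "Characteristics[cell type]" field in a
# tab-separated header line; print 0 if the pattern matches no field.
result=$(printf 'Source Name\tCharacteristics[organism]\tCharacteristics[cell type]\n' \
  | awk -F '\t' -v pattern='^Characteristics ?\\[cell type\\]$' '
{
    for (i = 1; i <= NF; ++i) {
        field = $i;
        if (field ~ pattern) { print i; exit; }
    }

    print 0;
    exit;
}')
echo "${result}"  # prints 3
```

Note that `awk -v` processes escape sequences in the assigned value, so `\\[` arrives in the dynamic regexp as the literal-bracket escape `\[`; this is why the calling script below passes triple backslashes inside a double-quoted string.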
load_exp_design.sh
@@ -1,32 +1,59 @@
 #!/usr/bin/env bash
 
+set -e
+SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]:-$0}" )" &> /dev/null && pwd )
+source "${SCRIPT_DIR}/common_routines.sh"
-scriptDir=$(cd "$( dirname "${BASH_SOURCE[0]:-$0}" )" && pwd )
-source $scriptDir/db_scxa_common.sh
 
+# Alfonso is bothered about dbConnection, it shouldn't be camelCased because:
+# 1. It's a constant, it should be DB_CONNECTION
+# 2. We use snake_case for Bash variables
 dbConnection=${dbConnection:-$1}
-condensed_sdrf_file=${CONDENSED_SDRF_FILE:-$2}
-sdrf_file=${SDRF_FILE:-$3}
+CONDENSED_SDRF_FILE=${CONDENSED_SDRF_FILE:-$2}
+SDRF_FILE=${SDRF_FILE:-$3}
 
+# Check that necessary environment variables are defined
+require_env_var "dbConnection"
+require_env_var "CONDENSED_SDRF_FILE"
+require_env_var "SDRF_FILE"
+checkDatabaseConnection "${dbConnection}"
-# Check that necessary environment variables are defined.
-[ -z ${dbConnection+x} ] && echo "Env var dbConnection for the database connection needs to be defined. This includes the database name." && exit 1
-[ -z ${CONDENSED_SDRF_FILE+x} ] && echo "Env var CONDENSED_SDRF_FILE for the experiment design data needs to be defined." && exit 1
-[ -z ${SDRF_FILE+x} ] && echo "Env var SDRF_FILE for column sequence of experiment design needs to be defined." && exit 1
 
+EXPERIMENT_ACCESSION=$(head -1 "${CONDENSED_SDRF_FILE}" | cut -f 1)
+DESTINATION_FILE=${SCRATCH_DIR:-${SCRIPT_DIR}}/${EXPERIMENT_ACCESSION}-exp-design.sql
+# Remove DESTINATION_FILE if it exists
+rm -f ${DESTINATION_FILE}
 
+# Create the file and enclose all INSERT statements in a transaction
+echo "BEGIN;" >> ${DESTINATION_FILE}
 
-# for experiment design column table, we need to have a unique experiment accession, column name, and sample type
-# as they are the primary key for the table, and we don't want to insert duplicate rows
-cut -f 1,4,5 "$condensed_sdrf_file" | sort | uniq | while read exp_acc sample_type col_name; do
+# In the experiment design column table we use the experiment accession, column name and sample type as the primary key
+cut -f 1,4,5 "${CONDENSED_SDRF_FILE}" | sort | uniq | while read experiment_accession sample_type column_name; do
     if [ "$sample_type" == 'characteristic' ]; then
-        column_order=$(awk -v val="$search_column" -v pattern="^Characteristics ?\\[${col_name}]$" -F '\t' '{for (i=1; i<=NF; i++) if ($i ~ pattern) {print i} }' "$sdrf_file")
+        sdrf_column_index=$(awk -F '\t' -v pattern="^Characteristics ?\\\[${column_name}\\\]$" -f ${SCRIPT_DIR}/load_exp_design.awk ${SDRF_FILE})
     else
-        column_order=$(awk -v val="$search_column" -v pattern="^Factor ?Value ?\\[${col_name}]$" -F '\t' '{for (i=1; i<=NF; i++) if ($i ~ pattern) {print i} }' "$sdrf_file")
+        sdrf_column_index=$(awk -F '\t' -v pattern="^Factor ?Value ?\\\[${column_name}\\\]$" -f ${SCRIPT_DIR}/load_exp_design.awk ${SDRF_FILE})
     fi
-    echo "INSERT INTO exp_design_column (experiment_accession, column_name, sample_type, column_order) VALUES ('$exp_acc', '$col_name', '$sample_type', '$column_order');" | psql -v ON_ERROR_STOP=1 "$dbConnection"
+    sql_statement="INSERT INTO exp_design_column (experiment_accession, sample_type, column_name, column_order) VALUES ('${experiment_accession}', '${sample_type}', '${column_name}', '${sdrf_column_index}');"
+    echo "${sql_statement}" >> ${DESTINATION_FILE}
 done
 
+# Add the columns from the condensed SDRF file.
+# Fields in the condensed SDRF that aren't in the SDRF are assigned a column_order value of 0 by the AWK script.
+# We need to assign them a value that is greater than the maximum column_order value for the experiment.
+# The column_order value is used to order the columns in the UI and is not used for the primary key, so it's ok to have
+# duplicates; we can order the fields with the same column_order by name if necessary.
+sql_statement="UPDATE exp_design_column SET column_order=(SELECT MAX(column_order) FROM exp_design_column WHERE experiment_accession='${EXPERIMENT_ACCESSION}')+1 WHERE column_order=0 AND experiment_accession='${EXPERIMENT_ACCESSION}';"
+echo "${sql_statement}" >> ${DESTINATION_FILE}
 
+# Insert the experiment design data.
-while IFS=$'\t' read -r exp_acc sample sample_type col_name annot_value annot_url; do
-    echo "INSERT INTO exp_design (sample, annot_value, annot_ont_uri, exp_design_column_id) VALUES ('$sample', '$annot_value', '$annot_url', (SELECT id FROM exp_design_column WHERE experiment_accession='$exp_acc' AND column_name='$col_name' AND sample_type='$sample_type'));" | psql -v ON_ERROR_STOP=1 "$dbConnection"
-done < "$condensed_sdrf_file"
+while IFS=$'\t' read -r experiment_accession sample sample_type column_name annotation_value annotation_url; do
+    sql_statement="INSERT INTO exp_design (sample, annot_value, annot_ont_uri, exp_design_column_id) VALUES ('${sample}', '${annotation_value}', '${annotation_url}', (SELECT id FROM exp_design_column WHERE experiment_accession='${experiment_accession}' AND column_name='${column_name}' AND sample_type='${sample_type}'));"
+    echo "${sql_statement}" >> ${DESTINATION_FILE}
+done < "$CONDENSED_SDRF_FILE"
 
+# Finish the transaction
+echo "COMMIT;" >> ${DESTINATION_FILE}
 
+PSQL_CMD="psql -qv ON_ERROR_STOP=1 ${dbConnection} -f ${DESTINATION_FILE}"
+echo ${PSQL_CMD}
+eval ${PSQL_CMD}
 
-echo "Experiment design data done loading for $condensed_sdrf_file"
+echo "$CONDENSED_SDRF_FILE: finished loading experiment design"
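The refactor above replaces one `psql` invocation per row with a single SQL file wrapped in a transaction and executed once. A minimal sketch of that batching pattern, using an invented table name and inline sample data rather than anything from the repository:

```shell
# Accumulate INSERT statements into one SQL file inside a BEGIN/COMMIT
# block, so a later single psql run applies them atomically.
DESTINATION_FILE=$(mktemp)
echo "BEGIN;" > "${DESTINATION_FILE}"

# Read tab-separated rows and append one INSERT per row.
printf 'alpha\t1\nbeta\t2\n' | while IFS=$'\t' read -r name value; do
    echo "INSERT INTO example_table (name, value) VALUES ('${name}', '${value}');" >> "${DESTINATION_FILE}"
done

echo "COMMIT;" >> "${DESTINATION_FILE}"
cat "${DESTINATION_FILE}"
# The real script then executes the file in one call, roughly:
#   psql -qv ON_ERROR_STOP=1 "${dbConnection}" -f "${DESTINATION_FILE}"
```

Besides cutting per-row connection overhead, `ON_ERROR_STOP=1` plus the transaction means a failing statement aborts the whole load instead of leaving the experiment design tables half-populated.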
4 files renamed without changes.