
Running Single Cell Expression Atlas

Alfonso Muñoz-Pomer Fuentes edited this page Jun 29, 2020 · 2 revisions

Software requirements

  • Java 11 (OpenJDK), to build and test the web application
  • Java 8, to run Solr and ZooKeeper
  • Gradle 5.x
  • Tomcat 8 (or any other Java EE 7 web server)
  • Solr 7.1 + ZooKeeper 3.4.10
  • PostgreSQL 10 (via Docker)
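Because the build uses Java 11 while Solr and ZooKeeper use Java 8, it is easy to end up with the wrong JDK on your PATH. A minimal sketch of a check, written as hypothetical bash helpers (`java_major_version` and `require_java` are our own names, not part of the project). It assumes the usual `version "..."` line printed by `java -version`, where Java 8 reports `1.8.0_x` and Java 9 and later report e.g. `11.0.7`:

```shell
# Hypothetical helper: prints the major Java version found in the text passed
# as $1 (typically the output of `java -version 2>&1`).
java_major_version() {
  local version
  # Take the first quoted version string, e.g. 11.0.7 or 1.8.0_252
  version=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  # Java 8 and earlier use the legacy 1.x scheme: 1.8.0_252 -> 8.0_252
  case $version in
    1.*) version=${version#1.} ;;
  esac
  # Keep only the leading number: 11.0.7 -> 11, 8.0_252 -> 8
  printf '%s\n' "${version%%[._-]*}"
}

# Hypothetical helper: fails unless the java on PATH has the major version in $1.
require_java() {
  local found
  found=$(java_major_version "$(java -version 2>&1)")
  if [ "$found" != "$1" ]; then
    echo "expected Java $1 but found Java ${found:-none}; check JAVA_HOME and PATH" >&2
    return 1
  fi
}
```

For example, run `require_java 11` before the Gradle commands and `require_java 8` before starting Solr and ZooKeeper.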

Code

The application is split into two modules:

  • Atlas Web Core: business logic shared by both (bulk) Expression Atlas and Single Cell Expression Atlas
  • Atlas Web Single Cell: logic specific to single cell experiments and the web layer

Other helper repositories are included as Git submodules: the Gradle profiles we share across projects (development, testing and production), some front-end packages shared with bulk Expression Atlas that need project-specific tweaks, and the relational SQL schemas used to create the in-memory H2 database for testing. More details can be found in the .gitmodules file.

Create an atlas directory and clone both repos:

mkdir atlas
cd atlas
git clone --recurse-submodules https://github.com/ebi-gene-expression-group/atlas-web-core.git
git clone --recurse-submodules https://github.com/ebi-gene-expression-group/atlas-web-single-cell.git
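If a repository was cloned without `--recurse-submodules`, its submodule directories are left empty. `git submodule status` prefixes uninitialized submodules with `-`, so a minimal sketch of a check could look like this (both function names are hypothetical):

```shell
# Hypothetical helper: reads `git submodule status` output on stdin and prints
# the path of every submodule that has not been initialized (lines that start
# with '-').
uninitialized_submodules() {
  awk '/^-/ { print $2 }'
}

# Hypothetical helper: initializes any missing submodules of the repository in $1.
ensure_submodules() {
  local missing
  missing=$(git -C "$1" submodule status --recursive | uninitialized_submodules)
  if [ -n "$missing" ]; then
    # $missing is deliberately unquoted so the paths print on one line
    echo "initializing submodules in $1:" $missing
    git -C "$1" submodule update --init --recursive
  fi
}
```

From the atlas directory, run `ensure_submodules atlas-web-core` and `ensure_submodules atlas-web-single-cell`.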

IMPORTANT: atlas-web-single-cell expects atlas-web-core to be located at the path specified in its settings.gradle file.

To check the sanity of the stack, it's a good idea to run the unit tests first (by convention, unit test class names end in Test and integration test class names end in IT). This requires Java 11:

cd atlas-web-core
./gradlew -PtestResultsPath=ut test --tests '*Test'
cd ../atlas-web-single-cell
./gradlew -PtestResultsPath=ut test --tests '*Test'

For each module, the last lines of Gradle's output should look like this:

...
BUILD SUCCESSFUL in 28s
5 actionable tasks: 5 executed
<-------------> 0% WAITING
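Because each module has its own Gradle wrapper, the test command has to be run from inside each repository. A minimal sketch of a hypothetical bash helper that runs a command in both module directories, using subshells so that each `cd` does not leak into the next iteration, and stopping at the first failure (Gradle exits with a non-zero status when the build fails):

```shell
# Hypothetical helper: runs the given command inside each module directory,
# relative to the current (atlas) directory. Stops at the first failure.
run_in_modules() {
  local module
  for module in atlas-web-core atlas-web-single-cell; do
    # The subshell keeps the cd local to this iteration
    ( cd "$module" && "$@" ) || {
      echo "command failed in $module" >&2
      return 1
    }
  done
}
```

From the atlas directory: `run_in_modules ./gradlew -PtestResultsPath=ut test --tests '*Test'`. Quoting the pattern stops the shell from expanding `*Test` against file names in the current directory before Gradle sees it.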

In order to run the application we’ll need to prepare the data files, PostgreSQL and Solr.

Data files

The web application requires some files that live outside the classpath: some at startup (e.g. properties of supported species) and others during the lifetime of the application (e.g. experiment files). All the critical paths are defined in uk.ac.ebi.atlas.configuration.BasePathsConfig and uk.ac.ebi.atlas.SingleCellFilePathConfig (in atlas-web-core and atlas-web-single-cell, respectively).

The application expects its data files at $HOME/ATLAS3.TEST/integration-test-data, as specified in Gradle's development environment file, profile-dev.gradle; change it there if that location doesn't suit your setup. A test data bundle can be downloaded from http://ftp.ebi.ac.uk/pub/databases/microarray/data/atlas/test/integration-test-data/. We recommend lftp for this, since it has a mirror command:

lftp -e "mirror . $HOME/ATLAS3.TEST/integration-test-data; quit" ftp://ftp.ebi.ac.uk/pub/databases/microarray/data/atlas/test/integration-test-data/

Some of the downloaded data won't be used by Single Cell Expression Atlas, since we keep data for both bulk and single cell experiments together. At the moment there are no plans to split them.
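A quick sanity check after mirroring is to confirm that the data directory exists and is not empty. A minimal sketch as a hypothetical bash helper; the path in the usage example is the default from profile-dev.gradle, so adjust it if you changed it there:

```shell
# Hypothetical helper: fails unless the directory in $1 exists and has at
# least one entry (hidden files included).
require_nonempty_dir() {
  if [ ! -d "$1" ]; then
    echo "$1 does not exist" >&2
    return 1
  fi
  if [ -z "$(ls -A "$1")" ]; then
    echo "$1 is empty; did the download finish?" >&2
    return 1
  fi
}
```

For example: `require_nonempty_dir "$HOME/ATLAS3.TEST/integration-test-data"`.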

PostgreSQL

Download the pre-loaded PostgreSQL data archive from http://ftp.ebi.ac.uk/pub/databases/microarray/data/atlas/test/pgdata-scxa.tgz. Uncompress it to a location of your choice and create a Docker container that mounts the extracted directory at /var/lib/postgresql/data/pgdata:

docker run --name scxa-pg10 \
  -e POSTGRES_USER=atlas3dev -e POSTGRES_PASSWORD=atlas3dev -e POSTGRES_DB=gxpatlasloc \
  -d -p 5432:5432 \
  -e PGDATA=/var/lib/postgresql/data/pgdata \
  -v <the-dir-where-you-extracted-pg-archive>:/var/lib/postgresql/data/pgdata \
  postgres:10.4-alpine
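The directory you mount must be the PostgreSQL data directory itself, not one of its parents. A PostgreSQL data directory contains a `PG_VERSION` file holding the server's major version, so a minimal pre-flight check before running the command above could look like this. It is a hypothetical bash helper and assumes the archive was created by PostgreSQL 10, to match the `postgres:10.4-alpine` image:

```shell
# Hypothetical helper: checks that $1 looks like a PostgreSQL data directory
# created by major version $2 (default 10).
check_pgdata() {
  local dir=$1 expected=${2:-10} found
  if [ ! -f "$dir/PG_VERSION" ]; then
    echo "$dir has no PG_VERSION file; mount the data directory itself" >&2
    return 1
  fi
  found=$(cat "$dir/PG_VERSION")
  if [ "$found" != "$expected" ]; then
    echo "$dir was created by PostgreSQL $found, expected $expected" >&2
    return 1
  fi
}
```

For example: `check_pgdata <the-dir-where-you-extracted-pg-archive>`.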

WARNING: To reduce the time it takes to load datasets into the database, we use some PostgreSQL-specific settings such as table partitioning and disabled write-ahead logging (WAL). Terminating the Docker process, whether forcefully or normally, may cause data integrity issues, and entire tables may be cleared when you restart your container. To avoid this, always stop the PostgreSQL process manually before stopping the container:

docker exec -it scxa-pg10 bash -c "su -c 'pg_ctl stop -m fast' postgres"
docker stop scxa-pg10
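To make the safe shutdown harder to forget, the two commands can be wrapped in a function. A minimal sketch (the function name is hypothetical); it drops `-it` from `docker exec`, since no interactive terminal is needed and `-t` can make the command fail when it runs from a script without one:

```shell
# Hypothetical helper: stops PostgreSQL cleanly inside the container named in
# $1 (default scxa-pg10), then stops the container itself.
stop_scxa_pg() {
  local container=${1:-scxa-pg10}
  # 1. Ask PostgreSQL for a fast (but clean) shutdown inside the container
  docker exec "$container" bash -c "su -c 'pg_ctl stop -m fast' postgres"
  # 2. Then stop the container; harmless if it has already exited
  docker stop "$container"
}
```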

IMPORTANT: The database connection credentials must match the properties in profile-dev.gradle, which are used to filter the jdbc.properties file. If you can't expose port 5432 on your host machine, modify uk.ac.ebi.atlas.configuration.JdbcConfig.

Solr

In this section we'll create a SolrCloud instance composed of a one-node ZooKeeper ensemble and two Solr nodes. ZooKeeper listens on port 2181 by default, and the Solr nodes will listen on ports 8983 and 8984.

Download Solr 7.1.0 and ZooKeeper 3.4.10. Both require Java 8, so for this section point JAVA_HOME at a suitable JDK/JRE and have its bin directory in your path. Lastly, download the prepopulated Single Cell Expression Atlas SolrCloud collections from http://ftp.ebi.ac.uk/pub/databases/microarray/data/atlas/test/solrcloud.tgz.

Create a solr directory anywhere on your filesystem and extract into it the Solr and ZooKeeper binaries and the prepopulated SolrCloud collections, so that you end up with the directory structure below. All the commands in this section are run from within this solr directory:

├── solr-7.1.0          # Solr 7.1.0 binary
├── zookeeper-3.4.10    # ZooKeeper 3.4.10 binary
└── solrcloud           # Contents of solrcloud.tgz
    ├── node1
    ├── node2
    └── zk

The only change needed to correctly run ZooKeeper is to edit the dataDir property in line 11 of solrcloud/zk/zoo.cfg, so that it contains the absolute path of solrcloud/zk/data. Save the changes and run ZooKeeper:

ZOO_LOG_DIR=./solrcloud/zk/log ./zookeeper-3.4.10/bin/zkServer.sh start ./solrcloud/zk/zoo.cfg

If everything went well, you should see the following:

ZooKeeper JMX enabled by default
Using config: ./solrcloud/zk/zoo.cfg
Starting zookeeper ... STARTED
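If you rebuild this environment often, the dataDir edit described above can be scripted instead of done by hand; run it before starting ZooKeeper. A minimal sketch as a hypothetical bash helper, assuming the property appears as a single `dataDir=` line, as in standard zoo.cfg files:

```shell
# Hypothetical helper: rewrites the dataDir property of the zoo.cfg in $1 so it
# points at the absolute path of the directory in $2 (created if missing).
# Assumes the path contains no '|', '&' or '\', which are special in the sed
# replacement below.
set_zk_datadir() {
  local cfg=$1 data
  data=$(mkdir -p "$2" && cd "$2" && pwd) || return 1
  # Write to a temporary file instead of using `sed -i`, whose syntax differs
  # between GNU and BSD sed
  sed "s|^[[:space:]]*dataDir[[:space:]]*=.*|dataDir=$data|" "$cfg" > "$cfg.tmp" &&
    mv "$cfg.tmp" "$cfg"
}
```

From the solr directory: `set_zk_datadir ./solrcloud/zk/zoo.cfg ./solrcloud/zk/data`.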

To further check that ZooKeeper is running properly, run the following:

echo "stat" | nc localhost 2181

And you should see something like this:

Zookeeper version: 3.4.10-39d3a4f269333c922ed3db283be479f9deacaa0f, built on 03/23/2017 10:13 GMT
Clients:
 /0:0:0:0:0:0:0:1:43772[0](queued=0,recved=1,sent=0)

Latency min/avg/max: 0/0/0
Received: 1
Sent: 0
Connections: 1
Outstanding: 0
Zxid: 0x4c0ed
Mode: standalone
Node count: 465
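For scripted checks, the Mode line of the stat output is a convenient thing to test: it reads standalone for the one-node ensemble used here. A minimal sketch of a hypothetical bash helper that extracts it:

```shell
# Hypothetical helper: reads the output of ZooKeeper's `stat` command on stdin
# and prints the server mode (e.g. standalone).
zk_mode() {
  awk -F': ' '/^Mode:/ { print $2 }'
}
```

For example: `[ "$(echo stat | nc localhost 2181 | zk_mode)" = standalone ] && echo "ZooKeeper is up"`.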

Now it's time to start the two Solr nodes:

SOLR_LOGS_DIR=./solrcloud/node1/log ./solr-7.1.0/bin/solr start -c -s ./solrcloud/node1 -p 8983 -m 2g -z localhost:2181 -Denable.runtime.lib=true
SOLR_LOGS_DIR=./solrcloud/node2/log ./solr-7.1.0/bin/solr start -c -s ./solrcloud/node2 -p 8984 -m 2g -z localhost:2181 -Denable.runtime.lib=true

A successful start of the Solr processes above will display a message like:

Waiting up to 180 seconds to see Solr running on port 8983 [|]  
Started Solr server on port 8983 (pid=300471). Happy searching!
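Since the two nodes differ only in their number and port, the start commands can be generated in a loop. A minimal sketch, run from the solr directory; SOLR_BIN is a hypothetical variable introduced here so the location of the Solr script can be overridden:

```shell
# Hypothetical override point for the Solr start script
SOLR_BIN=${SOLR_BIN:-./solr-7.1.0/bin/solr}

# Hypothetical helper: starts node1 on port 8983 and node2 on port 8984 with
# the same options as the commands above. Stops at the first failure.
start_solr_nodes() {
  local n port
  for n in 1 2; do
    port=$((8982 + n))   # node1 -> 8983, node2 -> 8984
    SOLR_LOGS_DIR=./solrcloud/node$n/log "$SOLR_BIN" start -c \
      -s ./solrcloud/node$n -p "$port" -m 2g -z localhost:2181 \
      -Denable.runtime.lib=true || return 1
  done
}
```

To shut everything down later, `./solr-7.1.0/bin/solr stop -all` stops all Solr nodes on the host and `./zookeeper-3.4.10/bin/zkServer.sh stop ./solrcloud/zk/zoo.cfg` stops ZooKeeper.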

If everything went well, you should be able to open Solr's admin web UI at http://localhost:8983. Click on the collections dropdown on the left and check that all the collections are there:

[Screenshot: Single Cell Expression Atlas SolrCloud web UI]
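The same check can be done from the command line with Solr's Collections API, whose LIST action returns the names of all collections. A minimal sketch of a hypothetical helper; it uses crude text processing to avoid depending on a JSON tool such as jq, so treat it as a convenience rather than a robust parser:

```shell
# Hypothetical helper: reads a Collections API LIST response (JSON) on stdin
# and prints one collection name per line. Assumes names contain no commas,
# quotes or whitespace.
solr_collections() {
  tr -d '\n\r\t ' |
    sed -n 's/.*"collections":\[\([^]]*\)].*/\1/p' |
    tr ',' '\n' | tr -d '"' | sed '/^$/d'
}
```

For example: `curl -s 'http://localhost:8983/solr/admin/collections?action=LIST&wt=json' | solr_collections`.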

Web

Generate the WAR of the application:

cd atlas-web-single-cell
./gradlew war

You will find it at atlas-web-single-cell/build/libs/sc.war. We use Tomcat 8 for both development and production, but in principle any web server that implements the Java Servlet API (javax.servlet) will do. If you use a web server other than Tomcat 8, we'd be very interested to hear how it went. Many IDEs can also automate this step, but such workflows fall outside the scope of this guide.
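With Tomcat, one common way to deploy is to copy the WAR into Tomcat's webapps directory, which the default configuration deploys automatically, using the file name as the context path. A minimal sketch, assuming CATALINA_HOME points at your Tomcat installation (the helper name is hypothetical):

```shell
# Hypothetical helper: copies the WAR in $1 into Tomcat's webapps directory.
# Assumes CATALINA_HOME points at the Tomcat installation.
deploy_war() {
  local war=$1
  if [ -z "${CATALINA_HOME:-}" ] || [ ! -d "$CATALINA_HOME/webapps" ]; then
    echo "set CATALINA_HOME to your Tomcat installation directory" >&2
    return 1
  fi
  if [ ! -f "$war" ]; then
    echo "no such WAR: $war (did ./gradlew war succeed?)" >&2
    return 1
  fi
  cp "$war" "$CATALINA_HOME/webapps/"
}
```

For example, `deploy_war atlas-web-single-cell/build/libs/sc.war` followed by `"$CATALINA_HOME/bin/startup.sh"`. With Tomcat's default HTTP port (8080), the application should then be served under http://localhost:8080/sc/.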
