This repository has been archived by the owner on Nov 19, 2024. It is now read-only.

Commit

move from private repo
reisner committed Aug 10, 2022
1 parent 24c4da1 commit 5389c00
Showing 40 changed files with 3,112 additions and 0 deletions.
13 changes: 13 additions & 0 deletions .gitignore
@@ -0,0 +1,13 @@
.DS_Store
.Rproj.user
*.Rproj
app_key_dir
app_vault_dir
.deploy_vars
.configs
decode.sh
.devcontainer/
.bash_history
.local/
.rstudio/
.dockerignore
92 changes: 92 additions & 0 deletions Dockerfile
@@ -0,0 +1,92 @@
# syntax = docker/dockerfile:1.0-experimental # https://docs.docker.com/develop/develop-images/build_enhancements/
FROM rocker/r-ver:4.0.3

RUN export DEBIAN_FRONTEND=noninteractive && apt-get -y update \
&& apt-get install -y \
alien \
bzip2 \
cmake \
curl \
file \
gdal-bin \
gnupg2 \
libaio1 \
libapparmor1 \
libcairo2 \
libcairo2-dev \
libcurl4-openssl-dev \
libedit2 \
libgdal-dev \
libglpk-dev \
libpoppler-cpp-dev \
libproj-dev \
libsqliteodbc \
libssl-dev \
libudunits2-dev \
libxml2-dev \
libxt-dev \
libxt6 \
lsb-release \
odbc-postgresql \
openjdk-8-jdk \
openssh-client \
pandoc \
pandoc-citeproc \
postgresql \
procps \
psmisc \
r-cran-cairo \
swaks \
tcl-dev \
tk-dev \
unixodbc \
zlib1g-dev \
&& rm -rf /var/lib/apt/lists/*

# get from https://packagemanager.rstudio.com/client/#/repos/1/overview
# Freezing packages to April 22, 2021:
RUN echo "options(repos = c(REPO_NAME = 'https://packagemanager.rstudio.com/all/__linux__/focal/2511902'))" >> $R_HOME/etc/Rprofile.site

RUN R -e "install.packages(c('assertthat', \
'data.table', \
'dplyr', \
'DT', \
'elastic', \
'future', \
'future.callr', \
'ggplot2', \
'ggthemes', \
'httr', \
'jsonlite', \
'leaflet', \
'lubridate', \
'memoise', \
'plotly', \
'promises', \
'rmarkdown', \
'rgdal', \
'shiny', \
'shinycssloaders', \
'shinydashboard', \
'shinyjs', \
'shinyWidgets', \
'sf', \
'stringr', \
'timetk', \
'htmltools'))"

# Add certs for accessing elastic search servers that require them
COPY elasticsearch/certificates/*.crt /usr/local/share/ca-certificates/
RUN update-ca-certificates

RUN mkdir /shinyapp
WORKDIR /shinyapp/
ADD ./ ./

RUN useradd shiny -u 5000 -m -b /home
RUN chown -R shiny:shiny /shinyapp
USER shiny

EXPOSE 3838

CMD ["R", "-e", "shiny::runApp('/shinyapp', host = '0.0.0.0', port = 3838)"]
38 changes: 38 additions & 0 deletions README.md
@@ -0,0 +1,38 @@
<img src="www/text_depot_icon/TextDepotIcon_TextImage_S.jpg" width="25%">

Text Depot is a tool to search and analyze topics of interest within a large database of text data. The Text Depot dashboard (this repo) provides a front-end to a set of indexes in Elasticsearch. To use this repository, you must provide one or more [Elasticsearch](https://www.elastic.co) indexes in a particular format (see [elasticsearch/](elasticsearch/)).

## Local Machine Setup

1. Clone this repo.
2. Run `cp .configs_sample .configs` and fill in the relevant values.

### Running Locally

1. Install any missing libraries, for example with `install.packages("DT")`. The required libraries are listed in the included `Dockerfile`.
2. Run `Rscript run_text_depot_dashboard.R`
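
Because the package list lives only in the `Dockerfile`, it can be pulled out mechanically instead of installing packages one at a time. The following is a minimal sketch, not part of this repository: the helper names `list_r_packages` and `r_install_command` are hypothetical, and it assumes the `install.packages(c(...))` call keeps the layout it has in the included `Dockerfile` (package names in single quotes, call closed by `))"`).

```shell
#!/bin/sh
# Hypothetical helpers (not part of this repository).

# Print one R package name per line, taken from the
# install.packages(c(...)) call of a Dockerfile read on stdin.
list_r_packages() {
  # Keep only the lines from the install.packages call to its closing ))"
  sed -n "/install\.packages(c(/,/))\"/p" |
    # Extract the single-quoted names, then drop the quotes.
    grep -o "'[A-Za-z0-9.][A-Za-z0-9.]*'" |
    tr -d "'"
}

# Print (but do not run) an Rscript command that installs only the
# packages from that list which are not installed yet.
r_install_command() {
  # Re-quote each name and join the names with commas: 'a','b','c'
  pkgs=$(list_r_packages | sed "s/.*/'&'/" | paste -s -d, -)
  printf 'Rscript -e "install.packages(setdiff(c(%s), rownames(installed.packages())))"\n' "$pkgs"
}
```

For example, `r_install_command < Dockerfile` prints a one-line `Rscript` command that can be reviewed and then run by hand.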

### Running via Docker

1. Optionally, create a `.dockerignore` file to exclude any local files.
2. Use the provided `Dockerfile` to build and run the app:

```
$ DOCKER_BUILDKIT=1 docker build -t text_depot_dashboard .
$ docker run -it -p 8080:3838 text_depot_dashboard
```

3. Open the dashboard in your browser: [http://localhost:8080](http://localhost:8080)
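
The app may take a moment to start inside the container, so the page might not load on the first attempt. A minimal sketch of a readiness check, assuming `curl` is installed on the host and the `8080:3838` port mapping from step 2; the function name `wait_for_dashboard` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository).
# Poll a URL once per second until it answers successfully or the
# number of attempts runs out.
#   $1: URL to poll (default: http://localhost:8080)
#   $2: maximum number of attempts (default: 30)
wait_for_dashboard() {
  url=${1:-http://localhost:8080}
  tries=${2:-30}
  i=0
  while [ "$i" -lt "$tries" ]; do
    # -f: treat HTTP errors as failures; -sS: quiet, but still show errors
    if curl -fsS -o /dev/null "$url"; then
      echo "dashboard is up at $url"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "dashboard did not respond at $url" >&2
  return 1
}
```

Running `wait_for_dashboard` right after `docker run` (from a second terminal) returns as soon as the dashboard responds, and exits with a non-zero status if it never does.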

## Elasticsearch

Each data source should be stored in its own Elasticsearch index. For more information, see [elasticsearch/](elasticsearch/).

## Notes

Our workflow contained the following components:

![Overall Workflow](workflow.png)

This repository contains the dashboard code for Text Depot (blue above). The green components were scheduled as cron jobs and kept the indexes in the Elasticsearch database up to date. We wrote a custom Parser for each data source and a single Annotator class that adds extra fields to each document before insertion. The orange components were added for authentication and embeddings-based search; to enable embeddings-based search in your dashboard, set `embedding_api_host` in `.configs`.
