From 152b9791846452bc205a660e27e6660bffa59544 Mon Sep 17 00:00:00 2001 From: Ashish Acharya Date: Tue, 18 Jun 2024 23:22:32 -0500 Subject: [PATCH] Improve the README --- README.md | 282 ++++++++++++++++++++---------------------------------- 1 file changed, 104 insertions(+), 178 deletions(-) diff --git a/README.md b/README.md index a76a0bf6..bcbe6c15 100644 --- a/README.md +++ b/README.md @@ -1,38 +1,31 @@ -# SDE Indexing Helper - -Web application to keep track of collections indexed in SDE and help decide what exactly to index from each collection. +# COSMOS: Web Application for Managing SDE Collections [![Built with Cookiecutter Django](https://img.shields.io/badge/built%20with-Cookiecutter%20Django-ff69b4.svg?logo=cookiecutter)](https://github.com/cookiecutter/cookiecutter-django/) [![Black code style](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/ambv/black) -## Settings - -Moved to [settings](http://cookiecutter-django.readthedocs.io/en/latest/settings.html). +COSMOS is a web application designed to manage collections indexed in NASA's Science Discovery Engine (SDE), facilitating precise content selection and allowing metadata modification before indexing. ## Basic Commands ### Building the Project - ```bash - $ docker-compose -f local.yml build - ``` - -### Running the necessary containers +```bash +$ docker-compose -f local.yml build +``` - ```bash - $ docker-compose -f local.yml up - ``` +### Running the Necessary Containers -### Non-docker local setup +```bash +$ docker-compose -f local.yml up +``` -If you want to run the project without docker, you will need the following: +### Non-Docker Local Setup -
-Postgres +If you prefer to run the project without Docker, follow these steps: -Run the following commands: +#### Postgres Setup -```` +```bash $ psql postgres postgres=# create database ; postgres=# create user with password ''; @@ -41,222 +34,155 @@ postgres=# grant all privileges on database to ; # This next one is optional, but it will allow the user to create databases for testing postgres=# alter role with superuser; -```` -
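The placeholders for the database name, user, and password have been lost from the block above (they were stripped along with the original angle-bracket markup). Filled in with purely hypothetical values, the session would look something like this:

```bash
$ psql postgres
postgres=# create database cosmos_db;
postgres=# create user cosmos_user with password 'change-me';
postgres=# grant all privileges on database cosmos_db to cosmos_user;
# This next one is optional, but it will allow the user to create databases for testing
postgres=# alter role cosmos_user with superuser;
```

Substitute your own database name, user, and password; they must match the `DATABASE_URL` you configure later.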
-
-Environment variables - -Now copy .env_sample in the root directory to .env. Note that in this setup we don't end up using the .envs/ directory, but instead we use the .env file. - -Replace the variables in this line in the .env file: `DATABASE_URL='postgresql://:@localhost:5432/'` with your user, password and database. Change the port if you have a different one. - -You don't need to change any other variable, unless you want to use specific modules (like the GitHub code will require a GitHub token etc). - -There is a section in `config/settings/base.py` which reads environment variables from this file. The line should look like `READ_DOT_ENV_FILE = env.bool("DJANGO_READ_DOT_ENV_FILE", default=True)`. Make sure either the default is True here (which it should already be), or run `export DJANGO_READ_DOT_ENV_FILE=True` in your terminal. - -
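In the project itself, the `READ_DOT_ENV_FILE` toggle described above is read by django-environ's `env.bool("DJANGO_READ_DOT_ENV_FILE", default=True)`. As a rough standalone sketch of what such a boolean environment flag amounts to (the helper name and the exact set of accepted truthy strings are assumptions of this sketch, not django-environ's actual implementation):

```python
import os

# Hypothetical stand-in for a boolean env flag reader such as
# env.bool("DJANGO_READ_DOT_ENV_FILE", default=True): common truthy strings
# count as True, anything else as False, and an unset variable yields the default.
def read_bool_env(name: str, default: bool = True) -> bool:
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}

if __name__ == "__main__":
    os.environ["DJANGO_READ_DOT_ENV_FILE"] = "True"
    print(read_bool_env("DJANGO_READ_DOT_ENV_FILE"))  # True
```

This is why either the in-code default or an exported `DJANGO_READ_DOT_ENV_FILE=True` is enough to make the `.env` file take effect.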
- -### How to run - -Run `python manage.py runserver` to test if your setup worked. You might have to run an initial migration with `python manage.py migrate`. - - -### Setting up your users - -- To create a **superuser account**, use this command: - ```bash - $ docker-compose -f local.yml run --rm django python manage.py createsuperuser - ``` - -- To create further users, go to the admin (/admin) and create them from the "Users" section. - -### Loading fixtures -Please note that currently loading fixtures will not create a fully working database. If you are starting the project from scratch, it is probably preferable to skip to the Loading the DB from a Backup section. -- To load collections - ```bash - $ docker-compose -f local.yml run --rm django python manage.py loaddata sde_collections/fixtures/collections.json - ``` - -### Loading scraped URLs into CandidateURLs - -- First make sure there is a folder in scraper/scraped_urls. There should already be an example folder. - -- Then create a new spider for your Collection. An example is mast_spider.py in spiders. In the future, this will be replaced by base_spider.py - -- Run the crawler with `scrapy crawl -o scraped_urls//urls.jsonl - -- Then run this: - ```bash - $ docker-compose -f local.yml run --rm django python manage.py load_scraped_urls - ``` - -### Loading the DB from a backup - -- If a database backup is made available, you wouldn't have to load the fixtures or the scrapped URLs anymore. This changes a few steps necessary to get the project running. - -- Step 1 : Build the project (Documented Above) - -- Step 2 : Run the necessary containers (Documented Above) - -- Step 3 : Clear Out Contenet Types Using Django Shell - - -- Enter the Django shell in your Docker container. - ```bash - $ docker-compose -f local.yml run --rm django python manage.py shell - ``` +``` - -- In the Django shell, you can now delete the content types. 
- ```bash - from django.contrib.contenttypes.models import ContentType - ContentType.objects.all().delete() - ``` +#### Environment Variables - -- Exit the shell. +Copy `.env_sample` to `.env` and update the `DATABASE_URL` variable with your Postgres credentials. -- Step 4 : Load Your Backup Database +```plaintext +DATABASE_URL='postgresql://:@localhost:5432/' +``` - Assuming your backup is a `.json` file from `dumpdata`, you'd use `loaddata` command to populate your database. +Ensure `READ_DOT_ENV_FILE` is set to `True` in `config/settings/base.py`. - -- If the backup file is on the local machine, make sure it's accessible to the Docker container. If the backup is outside the container, you will need to copy it inside first. - ```bash - $ docker cp /path/to/your/backup.json container_name:/path/inside/container/backup.json - ``` +### Running the Application - -- Load the data from your backup. - ```bash - $ docker-compose -f local.yml run --rm django python manage.py loaddata /path/inside/the/container/backup.json - ``` +```bash +$ python manage.py runserver +``` - -- Once loaded, you may want to run migrations to ensure everything is aligned. - ```bash - $ docker-compose -f local.yml run -rm django python manage.py migrate - ``` +Run initial migration if necessary: +```bash +$ python manage.py migrate +``` -### Type checks +### Setting Up Users -Running type checks with mypy: - ```bash - $ mypy sde_indexing_helper - ``` +#### Creating a Superuser Account -### Test coverage +```bash +$ docker-compose -f local.yml run --rm django python manage.py createsuperuser +``` -To run the tests, check your test coverage, and generate an HTML coverage report: - ```bash - $ coverage run -m pytest - $ coverage html - $ open htmlcov/index.html - ``` +#### Creating Additional Users -#### Running tests with pytest +Create additional users through the admin interface (/admin). 
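Before starting the app, the `DATABASE_URL` value from the environment-variables step can be sanity-checked with the standard library alone. A minimal sketch; the credentials below are hypothetical placeholders, not values from this project:

```python
from urllib.parse import urlsplit

# Hypothetical credentials; substitute the database, user, and password you created.
database_url = "postgresql://cosmos_user:change-me@localhost:5432/cosmos_db"

parts = urlsplit(database_url)
assert parts.scheme == "postgresql"
user = parts.username             # "cosmos_user"
password = parts.password         # "change-me"
host = parts.hostname             # "localhost"
port = parts.port                 # 5432
db_name = parts.path.lstrip("/")  # "cosmos_db"
print(user, host, port, db_name)
```

If any component comes back empty or the port is wrong, fix the `.env` file before debugging Django itself.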
- ```bash - $ pytest - ``` +### Loading Fixtures -### Live reloading and Sass CSS compilation +To load collections: -Moved to [Live reloading and SASS compilation](https://cookiecutter-django.readthedocs.io/en/latest/developing-locally.html#sass-compilation-live-reloading). +```bash +$ docker-compose -f local.yml run --rm django python manage.py loaddata sde_collections/fixtures/collections.json +``` -### Install Celery +### Loading the Database from a Backup -Make sure Celery is installed in your environment. To install : - ```bash - $ pip install celery - ``` +1. Build the project and run the necessary containers (as documented above). +2. Clear out content types using the Django shell: -### Install all requirements +```bash +$ docker-compose -f local.yml run --rm django python manage.py shell +>>> from django.contrib.contenttypes.models import ContentType +>>> ContentType.objects.all().delete() +>>> exit() +``` -Install all packages listed in a 'requirements' file - ```bash - pip install -r requirements/*.txt - ``` +3. Load your backup database: -### Celery +```bash +$ docker cp /path/to/your/backup.json container_name:/path/inside/container/backup.json +$ docker-compose -f local.yml run --rm django python manage.py loaddata /path/inside/the/container/backup.json +$ docker-compose -f local.yml run --rm django python manage.py migrate +``` -This app comes with Celery. +## Additional Commands -To run a celery worker: +### Type Checks ```bash -cd sde_indexing_helper -celery -A config.celery_app worker -l info -```` +$ mypy sde_indexing_helper +``` -Please note: For Celery's import magic to work, it is important _where_ the celery commands are run. If you are in the same folder with _manage.py_, you should be right. +### Test Coverage -To run [periodic tasks](https://docs.celeryq.dev/en/stable/userguide/periodic-tasks.html), you'll need to start the celery beat scheduler service. 
You can start it as a standalone process:
+To run tests and check coverage:

```bash
-cd sde_indexing_helper
-celery -A config.celery_app beat
+$ coverage run -m pytest
+$ coverage html
+$ open htmlcov/index.html
```

-or you can embed the beat service inside a worker with the `-B` option (not recommended for production use):
+#### Running Tests with Pytest

```bash
-cd sde_indexing_helper
-celery -A config.celery_app worker -B -l info
+$ pytest
```

-### Pre-Commit hook instructions
+### Live Reloading and Sass CSS Compilation

-Hooks have to be run on every commit to automatically take care of linting and structuring.
+Refer to the [Cookiecutter Django documentation](https://cookiecutter-django.readthedocs.io/en/latest/developing-locally.html#sass-compilation-live-reloading).

-To install pre-commit package manager :
+### Installing Celery

- ```bash
- $ pip install pre-commit
- ```
+```bash
+$ pip install celery
+```

-Install the git hook scripts :
+### Running a Celery Worker

- ```bash
- $ pre-commit install
- ```
+```bash
+$ cd sde_indexing_helper
+$ celery -A config.celery_app worker -l info
+```

-Run against the files :
+Please note: For Celery's import magic to work, it is important where the celery commands are run. If you run them from the same directory as manage.py, you should be fine.

- ```bash
- $ pre-commit run --all-files
- ```
+### Running Celery Beat Scheduler

- It's usually a good idea to run the hooks against all of the files when adding new hooks (usually `pre-commit` will only run on the chnages files during git hooks).
+```bash
+$ cd sde_indexing_helper
+$ celery -A config.celery_app beat
+```

-### Sentry
+### Pre-Commit Hook Instructions

-Sentry is an error logging aggregator service. You can sign up for a free account at or download and host it yourself.
-The system is set up with reasonable defaults, including 404 logging and integration with the WSGI application.
+To install pre-commit hooks:

-You must set the DSN url in production.
+```bash +$ pip install pre-commit +$ pre-commit install +$ pre-commit run --all-files +``` -## Deployment +### Sentry Setup -The following details how to deploy this application. +Sign up for a free account at [Sentry](https://sentry.io/signup/?code=cookiecutter) and set the DSN URL in production. -### Docker +## Deployment -See detailed [cookiecutter-django Docker documentation](http://cookiecutter-django.readthedocs.io/en/latest/deployment-with-docker.html). +Refer to the detailed [Cookiecutter Django Docker documentation](http://cookiecutter-django.readthedocs.io/en/latest/deployment-with-docker.html). -### How to import candidate URLs from the test server +## Importing Candidate URLs from the Test Server Documented [here](https://github.com/NASA-IMPACT/sde-indexing-helper/wiki/How-to-bring-in-Candidate-URLs-from-the-test-server). -## Adding new features/fixes - -New features and bugfixes should start with a [GitHub issue](https://github.com/NASA-IMPACT/sde-indexing-helper/issues). Then on local, ensure that you have the [GitHub CLI](https://cli.github.com/). Branches are made based off of existing issues, and no other way. Use the CLI to reference your issue number, like so `gh issue develop -c `. This will create a local branch linked to the issue, and allow GitHub to handle all the relevant linking. +## Adding New Features/Fixes -Once on the branch, create a PR with `gh pr create`. You can leave the PR in draft if it's still WIP. When done, take it out of draft with `gh pr ready`. +1. Start with a [GitHub issue](https://github.com/NASA-IMPACT/sde-indexing-helper/issues). +2. Use the GitHub CLI to create branches and pull requests (`gh issue develop -c `). ## Job Creation Eventually, job creation will be done seamlessly by the webapp. Until then, edit the `config.py` file with the details of what sources you want to create jobs for, then run `generate_jobs.py`. 
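The README does not document the schema of `config.py`, so the following is only a hypothetical sketch of the pattern the Job Creation section describes: list the sources you want jobs for, then let a generator script expand each entry into a job. Every name here (`SOURCES`, `generate_jobs`, the field names) is invented for illustration and is not the project's actual API:

```python
# Hypothetical config.py contents: one entry per source to create a job for.
SOURCES = [
    {"name": "mast", "base_url": "https://archive.stsci.edu/"},
    {"name": "heasarc", "base_url": "https://heasarc.gsfc.nasa.gov/"},
]

# Hypothetical stand-in for generate_jobs.py: turn each source entry into a job spec.
def generate_jobs(sources):
    return [
        {"job_name": f"index-{src['name']}", "start_url": src["base_url"]}
        for src in sources
    ]

if __name__ == "__main__":
    for job in generate_jobs(SOURCES):
        print(job["job_name"])
```

Consult the actual `config.py` in the repository for the real field names before editing it.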
-## Code structure for the SDE_INDEXING_HELPER - -The frontend pages can be found in /sde_indexing_helper -- The html for [collection_list, collection_detail, candidate_urls_list] can be found in /sde_indexing_helper/templates/sde_collections -- The javascript that controls these pages can be found in /sde_indexing_helper/static/js +## Code Structure for SDE_INDEXING_HELPER -The main backend files like 'views.py' can be found in /sde_collections +- Frontend pages: + - HTML: `/sde_indexing_helper/templates/` + - JavaScript: `/sde_indexing_helper/static/js` + - CSS: `/sde_indexing_helper/static/css` + - Images: `/sde_indexing_helper/static/images`