ReSpecT find out what it means to me
Vebop committed Nov 18, 2024
1 parent 56352f0 commit 07ca1a4
Showing 7 changed files with 123 additions and 108 deletions.
30 changes: 15 additions & 15 deletions doc/operator-guide/deployment.rst
Original file line number Diff line number Diff line change
@@ -5,35 +5,35 @@ Deployment
Database
========

- Deployments of the Consolidated Database are currently located at
- Deployments of the Consolidated Database are currently located at

- Summit
- USDF (+ dev, use the same underlying database, a replication of Summit)
- Base Test Stand (BTS)
- Tucson Test Stand (TTS)
- Summit
- USDF (+ dev, use the same underlying database, a replication of Summit)
- Base Test Stand (BTS)
- Tucson Test Stand (TTS)

- Updates to these deployments may be needed when there are edits to the schema for any of the cdb_* tables defined in <link to> sdm_schemas.
- Updates to these deployments may be needed when there are edits to the schema for any of the cdb_* tables defined in <link to> sdm_schemas.

Tools:
------

- Argo-CD
- LOVE
- Felis
- Argo-CD
- LOVE
- Felis

Repositories:
-------------

- ``phalanx`` (https://github.com/lsst-sqre/phalanx)
- ``sdm_schemas`` (https://github.com/lsst/sdm_schemas)
- ``consdb`` (https://github.com/lsst-dm/consdb)
- ``phalanx`` (https://github.com/lsst-sqre/phalanx)
- ``sdm_schemas`` (https://github.com/lsst/sdm_schemas)
- ``consdb`` (https://github.com/lsst-dm/consdb)

Access needed:
--------------

- NOIRLab VPN
- Summit VPN
- USDF
- NOIRLab VPN
- Summit VPN
- USDF

Process:
--------
18 changes: 9 additions & 9 deletions doc/operator-guide/monitoring.rst
@@ -5,26 +5,26 @@ Monitoring
Reporting channels
==================

- Users of ConsDB and ConsDBClient (``pqserver``) usually report issues in #consolidated-database on rubin-obs.slack.com.
- ConsDB operators should monitor this channel, along with #ops-usdf and #ops-usdf-alerts, for reported issues and outages, and escalate verified database issues.
- Users of ConsDB and ConsDBClient (``pqserver``) usually report issues in #consolidated-database on rubin-obs.slack.com.
- ConsDB operators should monitor this channel, along with #ops-usdf and #ops-usdf-alerts, for reported issues and outages, and escalate verified database issues.

Database
========

- The ConsDB team is responsible for verifying whether or not the database is up when issues are reported.
- They can check the method reported by the users, check using ``psql``/``pgcli``, and check in the #ops-usdf Slack channel for currently reported issues.
- The ConsDB team is responsible for verifying whether or not the database is up when issues are reported.
- They can check the method reported by the users, check using ``psql``/``pgcli``, and check in the #ops-usdf Slack channel for currently reported issues.

- Once the ConsDB team has confirmed there is an issue with the database, they should notify the #ops-usdf Slack channel; the USDF DBAs are responsible for fixing or restarting the database.
- Once the ConsDB team has confirmed there is an issue with the database, they should notify the #ops-usdf Slack channel; the USDF DBAs are responsible for fixing or restarting the database.
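
As a rough first check before opening ``psql`` or ``pgcli``, an operator could confirm that the database port accepts TCP connections at all. The sketch below is only an illustration: the function name ``is_port_open`` is hypothetical, and the host and port in the usage comment are placeholders rather than real ConsDB endpoints. A successful connection shows only that something is listening; it does not prove that the database answers queries.

```python
import socket


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Usage (placeholder host and port, not real ConsDB values):
#     is_port_open("consdb-db.example.org", 5432)
```

If the port is unreachable, that is worth mentioning when notifying #ops-usdf; if it is reachable, continue with the ``psql``/``pgcli`` checks described above.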

REST API Server
===============

- If we suspect the API server died, the ConsDB team is responsible for checking and restarting it.
- Use the appropriate argo-cd deployment graph to check deployment logs, and potentially restart the service.
- If we suspect the API server died, the ConsDB team is responsible for checking and restarting it.
- Use the appropriate argo-cd deployment graph to check deployment logs, and potentially restart the service.


Other issues
------------

- K8s infrastructure died: the ConsDB team can verify that this is the problem, but wider issues are likely to be seen.
- USDF or Summit K8s/IT support should be responsible for fixing.
- K8s infrastructure died: the ConsDB team can verify that this is the problem, but wider issues are likely to be seen.
- USDF or Summit K8s/IT support should be responsible for fixing.
5 changes: 3 additions & 2 deletions doc/operator-guide/runbook.rst
@@ -82,8 +82,9 @@ Periodic tasks
Documentation and Training
==========================

Links to documentation and training resources
Here: (https://consdb.lsst.io)
- Links to documentation and training resources

- Here: (https://consdb.lsst.io)

Support
=======
114 changes: 57 additions & 57 deletions doc/operator-guide/schema-migration-process.rst
@@ -5,32 +5,32 @@ Schema Migration Process
Add columns to sdm_schemas
==========================

- Add the requested database additions, justifications, and where they are generated to (https://rubinobs.atlassian.net/wiki/spaces/DM/pages/246644760/Consolidated+Database+Non-EFD+Entries)
- Create a ticket and edit the repository at https://sdm-schemas.lsst.io/ to apply your schema changes to any of the ``cdb_*.yml`` schemas.
- If your ``sdm_schemas`` PR has issues, check that the schema conforms to Felis's data model and that valid SQL tables can be created, using ``felis validate`` and ``felis create`` (https://felis.lsst.io/user-guide/cli.html#felis-validate).
- Alembic migrations should be automatically created by a git workflow after your ``sdm_schemas`` pull request completes.
- Add the requested database additions, justifications, and where they are generated to (https://rubinobs.atlassian.net/wiki/spaces/DM/pages/246644760/Consolidated+Database+Non-EFD+Entries)
- Create a ticket and edit the repository at https://sdm-schemas.lsst.io/ to apply your schema changes to any of the ``cdb_*.yml`` schemas.
- If your ``sdm_schemas`` PR has issues, check that the schema conforms to Felis's data model and that valid SQL tables can be created, using ``felis validate`` and ``felis create`` (https://felis.lsst.io/user-guide/cli.html#felis-validate).
- Alembic migrations should be automatically created by a git workflow after your ``sdm_schemas`` pull request completes.
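
The validation step above could be scripted along the following lines. This is a minimal sketch under stated assumptions: the helper names ``find_cdb_schemas`` and ``felis_validate_command`` are hypothetical, the location of the ``cdb_*`` files inside the clone is not assumed (the search is recursive), and passing several files to one ``felis validate`` call is an assumption about the CLI that should be checked against the Felis documentation. The command is only built here, not run.

```python
from pathlib import Path


def find_cdb_schemas(schemas_dir):
    """Return the cdb_* YAML schema files under schemas_dir, sorted by path."""
    root = Path(schemas_dir)
    # Accept both .yml and .yaml; the layout of the clone is not assumed.
    found = [p for p in root.rglob("cdb_*") if p.suffix in (".yml", ".yaml")]
    return sorted(found)


def felis_validate_command(schema_files):
    """Build (but do not run) a `felis validate` command for the given files."""
    return ["felis", "validate", *[str(p) for p in schema_files]]


# Usage, from a machine where Felis is installed:
#     import subprocess
#     cmd = felis_validate_command(find_cdb_schemas("/path/to/sdm_schemas"))
#     subprocess.run(cmd, check=True)
```

Building the command as a list, rather than a single string, avoids shell quoting problems if the clone lives under a path containing spaces.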

Create an Alembic Migration (manually)
======================================

- Alembic (https://alembic.sqlalchemy.org/en/latest/front.html) keeps track of versioning by autogenerated migrations to sync the test stands and summit databases.
- Versioning our database schema changes allows us to apply edits and move the database’s state forward or backward as needed.
- Alembic (https://alembic.sqlalchemy.org/en/latest/front.html) keeps track of versioning by autogenerated migrations to sync the test stands and summit databases.
- Versioning our database schema changes allows us to apply edits and move the database’s state forward or backward as needed.

- How to create an Alembic migration to deploy your new ``sdm_schema`` edits:
- How to create an Alembic migration to deploy your new ``sdm_schema`` edits:

1. Create an Alembic migration on your ``consdb`` ticket branch.
2. Use the script ``consdb/alembic-autogenerate.py`` to generate Alembic migrations. You’ll need the following environment variables:
1. Create an Alembic migration on your ``consdb`` ticket branch.
2. Use the script ``consdb/alembic-autogenerate.py`` to generate Alembic migrations. You’ll need the following environment variables:

- ``SDM_SCHEMAS_DIR`` - Points to a local clone of ``sdm_schemas`` with your ``cdb_*`` schema changes.
- ``SDM_SCHEMAS_DIR`` - Points to a local clone of ``sdm_schemas`` with your ``cdb_*`` schema changes.

3. Run ``alembic-autogenerate`` to create version files in respective database-named directories in ``consdb``.
3. Run ``alembic-autogenerate`` to create version files in respective database-named directories in ``consdb``.

- Follow the directions in the header of the script.
- Follow the directions in the header of the script.

4. Manually edit the generated files in ``consdb/alembic/<table-name>/`` to:
4. Manually edit the generated files in ``consdb/alembic/<table-name>/`` to:

- Remove the ``visit1`` and ``ccdvisit1`` views.
- Ensure constraints and renamed columns are correct.
- Remove the ``visit1`` and ``ccdvisit1`` views.
- Ensure constraints and renamed columns are correct.


Test alembic migration
@@ -42,73 +42,73 @@ Test alembic migration
1. Update the deployment on the test stand:
-------------------------------------------

- Choose the appropriate test stand (TTS, BTS)
- Create a branch in ``phalanx`` and edit the corresponding test stand environment file ``phalanx/applications/consdb/values-<test stand>.yaml`` to point to your branch's built docker image (tickets-DM-###).
- Coordinate and announce in the appropriate slack channel that you will begin testing your migrations.
- Update the consdb deployment in ``<url.to.teststand>/argo-cd`` to use your ``phalanx`` branch in the ``Target Revision``. Refresh and check pod logs.
- Verify the tables that you will be upgrading exist using ``psql``
- From the ``consdb/`` directory (where the ``alembic.ini`` file is), use the alembic commands to upgrade the existing database tables: ``alembic upgrade head -n <database name>``
- Deploy new consdb software (``hinfo``, ``pqserver``) and check the initial logs.
- Choose the appropriate test stand (TTS, BTS)
- Create a branch in ``phalanx`` and edit the corresponding test stand environment file ``phalanx/applications/consdb/values-<test stand>.yaml`` to point to your branch's built docker image (tickets-DM-###).
- Coordinate and announce in the appropriate slack channel that you will begin testing your migrations.
- Update the consdb deployment in ``<url.to.teststand>/argo-cd`` to use your ``phalanx`` branch in the ``Target Revision``. Refresh and check pod logs.
- Verify the tables that you will be upgrading exist using ``psql``
- From the ``consdb/`` directory (where the ``alembic.ini`` file is), use the alembic commands to upgrade the existing database tables: ``alembic upgrade head -n <database name>``
- Deploy new consdb software (``hinfo``, ``pqserver``) and check the initial logs.
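
When several databases need upgrading, the Alembic invocation from the steps above could be generated by a small helper instead of being retyped. This is a sketch, not the project's tooling: the function name is hypothetical and the database name in the example is a placeholder for a section of ``alembic.ini``. Alembic's own documentation shows ``--name`` as a general option given before the subcommand, so this sketch uses that ordering; treat the ordering as something to confirm against your installed Alembic version.

```python
import shlex


def alembic_upgrade_command(database_name, revision="head"):
    """Build the argument list for upgrading one named Alembic environment."""
    if not database_name or database_name.startswith("-"):
        raise ValueError(f"invalid database name: {database_name!r}")
    # -n/--name selects the section of alembic.ini to use.
    return ["alembic", "-n", database_name, "upgrade", revision]


# Example with a placeholder name:
#     shlex.join(alembic_upgrade_command("latiss"))
#     -> "alembic -n latiss upgrade head"
# Run it from the directory containing alembic.ini, for example with
#     subprocess.run(alembic_upgrade_command("latiss"), check=True)
```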

2. Test with LATISS imaging in ATQueue:
---------------------------------------

- Access LOVE via ``<url.to.teststand>/love`` and use the 1Password admin information to sign in, or your SLAC username and password.
- Navigate to the ATQueue or Auxiliary Telescope (AuxTel) Script Queue.
- See the `TTS Start Guide <https://rubinobs.atlassian.net/wiki/spaces/LSSTCOM/pages/53739987/Tucson+Test+Stand+Start+Guide>`_ for guidelines on using the test stands.
- Before editing these scripts, note their starting configurations, as we will return the configuration to that when we are done.
- Take a test/simulated picture with LATISS through the ATQueue using these three scripts:
- Access LOVE via ``<url.to.teststand>/love`` and use the 1Password admin information to sign in, or your SLAC username and password.
- Navigate to the ATQueue or Auxiliary Telescope (AuxTel) Script Queue.
- See the `TTS Start Guide <https://rubinobs.atlassian.net/wiki/spaces/LSSTCOM/pages/53739987/Tucson+Test+Stand+Start+Guide>`_ for guidelines on using the test stands.
- Before editing these scripts, note their starting configurations, as we will return the configuration to that when we are done.
- Take a test/simulated picture with LATISS through the ATQueue using these three scripts:

1. ``set_summary_state.py`` Change the configuration to set ATHeaderService and ATCamera to ENABLED.
2. ``enable_latiss.py`` Remove any existing configuration.
3. ``take_image_latiss.py`` Update the configuration to remove anything that is not 'nimages' (1) and 'image_type' (BIAS or DARK or FLAT)
1. ``set_summary_state.py`` Change the configuration to set ATHeaderService and ATCamera to ENABLED.
2. ``enable_latiss.py`` Remove any existing configuration.
3. ``take_image_latiss.py`` Update the configuration to remove anything that is not 'nimages' (1) and 'image_type' (BIAS or DARK or FLAT)

- Once you have put these three scripts in the queue, click ``run``.
- Watch for errors in the Script Queue and the ``argo-cd`` ``consdb`` pod logs and ``hinfo-latiss`` deployment.
- Address any errors and retest.
- Check the database with ``psql``: use ``\dt`` to display the table names, and a query such as ``SELECT * FROM cdb_latiss.exposure WHERE day_obs = <YYYYMMDD>;`` to view the most recent data.
- Once you have put these three scripts in the queue, click ``run``.
- Watch for errors in the Script Queue and the ``argo-cd`` ``consdb`` pod logs and ``hinfo-latiss`` deployment.
- Address any errors and retest.
- Check the database with ``psql``: use ``\dt`` to display the table names, and a query such as ``SELECT * FROM cdb_latiss.exposure WHERE day_obs = <YYYYMMDD>;`` to view the most recent data.

- Run set_summary_state to set ATHeaderService and ATCamera back to STANDBY, and return LATISS back to STANDBY.
- Then return these three scripts to their original configurations.
- Run set_summary_state to set ATHeaderService and ATCamera back to STANDBY, and return LATISS back to STANDBY.
- Then return these three scripts to their original configurations.

- If you have encountered errors in this process, do not proceed to the summit, but address those errors and retest them with your ``phalanx`` branch pointing to your ``consdb`` branch with the updates that fix the errors.
- If you have encountered errors in this process, do not proceed to the summit, but address those errors and retest them with your ``phalanx`` branch pointing to your ``consdb`` branch with the updates that fix the errors.


- If tests are successful, create a pull request for the Alembic migration in ``consdb``. Tag the release according to ``standards-practices`` guidelines.
- Update your existing ``phalanx`` branch to point the environment-based deployments to this ``consdb`` tag. You can retest on the test stand at this point; if your ``consdb`` pull request did not change, this step is trivial.
- If tests are successful, create a pull request for the Alembic migration in ``consdb``. Tag the release according to ``standards-practices`` guidelines.
- Update your existing ``phalanx`` branch to point the environment-based deployments to this ``consdb`` tag. You can retest on the test stand at this point; if your ``consdb`` pull request did not change, this step is trivial.
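
The database check in the LATISS test procedure above filters ``cdb_latiss.exposure`` by ``day_obs``; note that SQL uses a single ``=`` for equality. The sketch below shows the shape of that query using an in-memory SQLite database as a stand-in for the real PostgreSQL instance, so it runs anywhere. The ``exposure_id`` column and the sample rows are assumptions made for the demonstration, not the real schema or data, and the ``?`` placeholder is SQLite's style (a PostgreSQL driver such as psycopg uses ``%s``).

```python
import sqlite3


def demo_connection():
    """Create an in-memory stand-in for the cdb_latiss.exposure table."""
    conn = sqlite3.connect(":memory:")
    # Attaching a second in-memory database gives a schema-like prefix.
    conn.execute("ATTACH DATABASE ':memory:' AS cdb_latiss")
    conn.execute(
        "CREATE TABLE cdb_latiss.exposure "
        "(exposure_id INTEGER PRIMARY KEY, day_obs INTEGER)"
    )
    # Made-up rows for illustration only.
    conn.executemany(
        "INSERT INTO cdb_latiss.exposure VALUES (?, ?)",
        [
            (2024111700005, 20241117),
            (2024111800001, 20241118),
            (2024111800002, 20241118),
        ],
    )
    return conn


def exposures_for_day(conn, day_obs):
    """Return (exposure_id, day_obs) rows for one observing day."""
    query = (
        "SELECT exposure_id, day_obs FROM cdb_latiss.exposure "
        "WHERE day_obs = ? ORDER BY exposure_id"
    )
    return conn.execute(query, (day_obs,)).fetchall()
```

An empty result for the current ``day_obs`` after taking a test image suggests that the new exposure was not inserted, which is the condition the procedure asks you to catch before moving on.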


Deploy migration in synchrony at Summit (if necessary), USDF, and Prompt Release (if necessary)
-----------------------------------------------------------------------------------------------

- What is prompt release?
- What is prompt release?


Deploy code to populate db at Summit and/or USDF
------------------------------------------------

- Follow the testing steps above for testing the Alembic migration and code at TTS/BTS before you consider deploying at the summit.
- Follow the testing steps above for testing the Alembic migration and code at TTS/BTS before you consider deploying at the summit.

- The steps to deploy at the summit mirror the steps to test on a test stand, but require coordination with, and permission from, the observers and site teams.
- Access to argo-cd deployments is available via the Summit OpenVPN.
- To coordinate your deployment update on the summit, you must attend the Coordination Activities Planning (CAP) meeting on Tuesday mornings and announce your request.
- The steps to deploy at the summit mirror the steps to test on a test stand, but require coordination with, and permission from, the observers and site teams.
- Access to argo-cd deployments is available via the Summit OpenVPN.
- To coordinate your deployment update on the summit, you must attend the Coordination Activities Planning (CAP) meeting on Tuesday mornings and announce your request.

- Add it to the agenda here: https://rubinobs.atlassian.net/wiki/spaces/LSSTCOM/pages/53765933/Agenda+Items+for+Future+CAP+Meetings
- Add it to the agenda here: https://rubinobs.atlassian.net/wiki/spaces/LSSTCOM/pages/53765933/Agenda+Items+for+Future+CAP+Meetings

- The CAP members may tell you a time frame that is acceptable for you to perform these changes.
- The CAP members may tell you a time frame that is acceptable for you to perform these changes.

- They may also tell you specific people to coordinate with to help you take images to test LATISS and LSSTCOMCAMSIM tables. There will be more tables to test eventually.
- Some important channels to note: #rubinobs-test-planning; #summit-announce; #summit-auxtel, https://obs-ops.lsst.io/Communications/slack-channel-usage.html.
- They may also tell you specific people to coordinate with to help you take images to test LATISS and LSSTCOMCAMSIM tables. There will be more tables to test eventually.
- Some important channels to note: #rubinobs-test-planning; #summit-announce; #summit-auxtel, https://obs-ops.lsst.io/Communications/slack-channel-usage.html.

- When you get your final approval and designated time to perform the changes to ConsDB, announce on #summit-announce, and follow steps similar to the test stand procedure above.
- When you get your final approval and designated time to perform the changes to ConsDB, announce on #summit-announce, and follow steps similar to the test stand procedure above.

* Summit Deployment Steps
Summit Deployment Steps
-----------------------

1. Use a branch in ``phalanx`` to point to the ``consdb`` tag for deployment.
2. Set the target revision of the ``consdb`` application in ``argo-cd`` to your ``phalanx`` branch.
3. Refresh the consdb application and review pod logs.
4. Have an image taken with the observing team, then verify database entries with a SQL query or Jupyter notebook.
5. Check your new entries in the database using a jupyter notebook or SQL query in RSP showing your new image has been inserted to the database as expected.
1. Use a branch in ``phalanx`` to point to the ``consdb`` tag for deployment.
2. Set the target revision of the ``consdb`` application in ``argo-cd`` to your ``phalanx`` branch.
3. Refresh the consdb application and review pod logs.
4. Have an image taken with the observing team, then verify database entries with a SQL query or Jupyter notebook.
5. Check your new entries in the database using a jupyter notebook or SQL query in RSP showing your new image has been inserted to the database as expected.


- Once deployment succeeds, set the ``Target Revision`` in ``argo-cd`` back to ``main`` and complete the ``phalanx`` PR for the tested ``consdb`` tag.
- Once deployment succeeds, set the ``Target Revision`` in ``argo-cd`` back to ``main`` and complete the ``phalanx`` PR for the tested ``consdb`` tag.