From fa7a513a63853e983151594309d33b442393e82b Mon Sep 17 00:00:00 2001 From: Bruce Irschick Date: Mon, 12 Dec 2022 11:07:19 -0800 Subject: [PATCH] [AD-1014] Developer Guide. (#451) * [AD-1014] Developer Guide. * Commit Code Coverage Badge * [AD-1014] Updates to use existing GETTING_STARTED.md and added schema-caching.md * Commit Code Coverage Badge Co-authored-by: birschick-bq --- GETTING_STARTED.md | 102 +++++++++++++++++------ README.md | 7 +- src/markdown/index.md | 6 ++ src/markdown/schema/schema-caching.md | 111 ++++++++++++++++++++++++++ 4 files changed, 202 insertions(+), 24 deletions(-) create mode 100644 src/markdown/schema/schema-caching.md diff --git a/GETTING_STARTED.md b/GETTING_STARTED.md index a13bf849..34c4aa24 100644 --- a/GETTING_STARTED.md +++ b/GETTING_STARTED.md @@ -156,8 +156,45 @@ rather than the cluster endpoint since we have set up the SSH tunnel. ~~~ mongo --host 127.0.0.1:27017 --username --password ~~~ + +## Database User Account Definitions + +The integration tests assume that the following two user accounts exist +in the target database server. + +### Administrative User + +User: `documentdb` + +#### Definition + +```json +{ + "user" : "documentdb", + "roles" : [ { + "db" : "admin", + "role" : "root" + } ] +} +``` + +### Restricted Access User + +User: `docDbRestricted` + +#### Definition + +```json +{ + "user" : "docDbRestricted", + "roles" : [ { + "db" : "admin", + "role" : "readAnyDatabase" + } ] +} +``` -##### Connect with TLS +## Connect with TLS When connecting to a TLS-enabled cluster you can follow the same steps to set up an SSH tunnel but will need to also download the Amazon DocumentDB Certificate Authority (CA) file before trying to connect. 1. Download the CA file. 
@@ -178,8 +215,8 @@ access the cluster from localhost, the server certificate does not match the hos mongo --host 127.0.0.1:27017 --username --password --tls --tlsCAFile rds-combined-ca-bundle.pem --tlsAllowInvalidHostnames ~~~ -##### Connect Programmatically -###### Without TLS +### Connect Programmatically +#### Without TLS Connecting without TLS is very straightforward. We essentially follow the same steps as when connecting using the `mongo` shell. 1. Setup the SSH tunnel. See step 3 in section [Setting Up Environment Variables](#setting-up-environment-variables) for @@ -201,7 +238,7 @@ Make sure to set the hostname, username, password and target database. The targe } ~~~ -###### With TLS +#### With TLS Connecting with TLS programmatically is slightly different from how we did it with the `mongo` shell. 1. Create a test or simple main to run. 2. Use either the Driver Manager, Data Source class or Connection class to establish a connection to `localhost:27017`. @@ -224,36 +261,57 @@ class: } ~~~ -#### Setting Up Environment Variables -1. Create and set the Environment Variables: +## Integration Testing - ~~~ - DOC_DB_USER_NAME= - DOC_DB_PASSWORD= - DOC_DB_LOCAL_PORT=27019 - DOC_DB_USER=@ - DOC_DB_HOST= - DOC_DB_PRIV_KEY_FILE=~/.ssh/.pem - ~~~ +By default, integration testing is disabled for local development. To enable +integration testing, follow the directions below. + +### Setting Up Environment Variables -2. Ensure the private key file .pem is in the location set by the environment variable +The following environment variables let you customize the credentials and +DocumentDB cluster settings used by the integration tests. + +1. 
Create and set the following environment variables: + +| Variable | Description | Example | +|------------------------|--------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------| +| `DOC_DB_USER_NAME` | This is the DocumentDB user. | `documentdb` | +| `DOC_DB_PASSWORD` | This is the DocumentDB password. | `aSecret` | +| `DOC_DB_LOCAL_PORT` | This is the port number used locally via an SSH Tunnel. It is recommended to use a different value than the default 27017. | `27019` | +| `DOC_DB_USER` | This is the user and host of the SSH Tunnel EC2 instance. | `ec2-user@254.254.254.254` | +| `DOC_DB_HOST` | This is the host of the DocumentDB cluster server. | `docdb-jdbc-literal-test.cluster-abcdefghijk.us-east-2.docdb.amazonaws.com` | +| `DOC_DB_PRIV_KEY_FILE` | This is the path to the SSH Tunnel private key-pair file. | `~/.ssh/ec2-literal.pem` | + +### SSH Tunnel + +1. Ensure the private key file .pem is in the location set by the environment variable `DOC_DB_PRIV_KEY_FILE`. -3. Start an SSH port-forwarding tunnel: +2. Assuming you have set up the environment variables above, starting an SSH tunnel from the command line should look like this: + ~~~shell + ssh [-f] -N -i $DOC_DB_PRIV_KEY_FILE -L $DOC_DB_LOCAL_PORT:$DOC_DB_HOST:27017 $DOC_DB_USER ~~~ - ssh [-f] -N -i ~/.ssh/.pem -L $DOC_DB_LOCAL_PORT:$DOC_DB_HOST:27017 $DOC_DB_USER - ~~~ - + - The `-L` flag defines the port forwarded to the remote host and remote port. Adding the `-N` flag means do not execute a remote command, you will not get a shell in this case. The `-f` switch instructs SSH to run in the background. -#### Bypass Testing DocumentDB +### Enable Integration Testing of Amazon DocumentDB + +To enable integration testing in the IDE, update the Gradle property as instructed below. + 1. 
Modify the */gradle.properties* file in the source code and uncomment the following line: -`runRemoteIntegrationTests=false` +`runRemoteIntegrationTests=true` + +### Project Secrets + +For the purposes of automated integration testing in **GitHub**, this project maintains the values of the environment variables above +as project secrets. See the workflow file [gradle.yml](https://github.com/aws/amazon-documentdb-jdbc-driver/blob/1edd9e21fdcccfe62d366580702f2904136298e5/.github/workflows/gradle.yml). ## Troubleshooting + ### Issues with JDK + 1. Confirm project SDK is Java Version 1.8 via the IntelliJ top menu toolbar under *File → Project Structure → Platform Settings -> SDK* and reload the JDK home path by browsing to the path and click *apply* and *ok*. Restart IntelliJ IDEA. @@ -277,5 +335,3 @@ class: below. Go to EC2 Dashboard → **Network & Security** Group in the left menu → **Security** Group. ![Security Policy for EC2 Instance](src/markdown/images/getting-started/security-policy-ec2-instance.png) - - \ No newline at end of file diff --git a/README.md b/README.md index 9799a5b3..467c8c14 100644 --- a/README.md +++ b/README.md @@ -67,4 +67,9 @@ your issue. ## Security Notice -If you discover a potential security issue in this project, please consult our [security guidance page](SECURITY.md). \ No newline at end of file +If you discover a potential security issue in this project, please consult our [security guidance page](SECURITY.md). + +## Contributor's Getting Started Guide + +If you're a developer who wants to contribute to this project, be sure to read and follow the +[Getting Started as a Developer](GETTING_STARTED.md) guide. diff --git a/src/markdown/index.md b/src/markdown/index.md index fc4c9a1d..d1b6d7d6 100644 --- a/src/markdown/index.md +++ b/src/markdown/index.md @@ -51,6 +51,12 @@ The Amazon DocumentDB JDBC driver can perform automatic schema discovery and gen DocumentDB schema mapping. 
See the [schema discovery documentation](schema/schema-discovery.md) for more details of this process. + +## Schema Caching + +Once the schema is discovered, it is cached in the database to improve performance for subsequent connections. +See the [schema caching documentation](schema/schema-caching.md) to learn +more about schema caching behaviour and access requirements. + ## Schema Management The SQL to DocumentDB schema mapping can be managed in the following ways: diff --git a/src/markdown/schema/schema-caching.md b/src/markdown/schema/schema-caching.md new file mode 100644 index 00000000..7547af7d --- /dev/null +++ b/src/markdown/schema/schema-caching.md @@ -0,0 +1,111 @@ +# Schema Caching + +## Schema Caching Behaviour + +When a connection is made to an Amazon DocumentDB database, the Amazon DocumentDB JDBC driver +checks for a previously cached version of the mapped schema. If a previous version exists, +the latest version of the cached schema is read and used for all further interaction with the database. + +If a previously cached version does not exist, the process of [schema discovery](schema-discovery.md) is automatically +started on all the accessible collections in the database. The discovery process uses the properties +`scanMethod` (default `random`) and `scanLimit` (default `1000`) when sampling documents from the database. +At the end of the discovery process, the resulting schema mapping is written to the cache using the name +associated with the property `schemaName` (default `_default`). + +If for some reason the resulting schema cannot be saved to the cache, the resulting schema is still used +in memory for the life of the connection. Without access to a cached version of the +schema, however, schema discovery must be performed for each new connection, which can seriously +degrade performance.
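The behaviour above can be summarized in a small, self-contained Java sketch. It does not use the driver's real classes. `SchemaCache`, `discover`, and `resolveSchema` are hypothetical names, the cache is an in-memory stand-in for the cache collections described below, and a `SecurityException` stands in for whatever error a user without write permission would actually receive. Only the property names and their defaults (`scanMethod`, `scanLimit`, `schemaName`) come from this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch of the schema-caching flow; these are not the driver's actual classes. */
class SchemaCacheFlowSketch {

    /** Counts discoveries so the cost of an unwritable cache is visible. */
    static int discoveryCount = 0;

    /** Hypothetical in-memory stand-in for the schema cache collections. */
    static class SchemaCache {
        private final Map<String, List<String>> versionsByName = new HashMap<>();
        private final boolean writable;

        SchemaCache(boolean writable) {
            this.writable = writable;
        }

        /** Returns the latest cached version of the named schema, if one exists. */
        Optional<String> readLatest(String schemaName) {
            List<String> versions = versionsByName.get(schemaName);
            if (versions == null || versions.isEmpty()) {
                return Optional.empty();
            }
            return Optional.of(versions.get(versions.size() - 1));
        }

        /** Appends a new version; fails when the user lacks write permission. */
        void write(String schemaName, String schema) {
            if (!writable) {
                throw new SecurityException("no write permission on the cache collections");
            }
            versionsByName.computeIfAbsent(schemaName, k -> new ArrayList<>()).add(schema);
        }
    }

    /** Hypothetical stand-in for sampling documents and building the SQL mapping. */
    static String discover(List<String> collections, String scanMethod, int scanLimit) {
        discoveryCount++;
        return "mapping of " + collections + " (" + scanMethod + ", " + scanLimit + ")";
    }

    /** Mirrors the steps described above for a single connection. */
    static String resolveSchema(SchemaCache cache, List<String> collections,
                                String schemaName, String scanMethod, int scanLimit) {
        Optional<String> cached = cache.readLatest(schemaName);
        if (cached.isPresent()) {
            return cached.get(); // Reuse the latest cached version.
        }
        String discovered = discover(collections, scanMethod, scanLimit);
        try {
            cache.write(schemaName, discovered); // Save under schemaName.
        } catch (SecurityException e) {
            // Could not save: the schema lives in memory for this connection only.
        }
        return discovered;
    }

    public static void main(String[] args) {
        List<String> collections = Arrays.asList("orders", "customers"); // hypothetical names

        SchemaCache writable = new SchemaCache(true);
        resolveSchema(writable, collections, "_default", "random", 1000);
        resolveSchema(writable, collections, "_default", "random", 1000);
        System.out.println("writable cache, discoveries: " + discoveryCount); // prints 1

        discoveryCount = 0;
        SchemaCache readOnly = new SchemaCache(false);
        resolveSchema(readOnly, collections, "_default", "random", 1000);
        resolveSchema(readOnly, collections, "_default", "random", 1000);
        System.out.println("read-only cache, discoveries: " + discoveryCount); // prints 2
    }
}
```

Running `main` illustrates the performance point above. With a writable cache, discovery runs once across two connections. With a cache the user cannot write to, discovery runs again for every connection.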
+ +## Cache Location + +The SQL schema mapping cache is stored in two collections in the same database as +the sampled collections. The collection `_sqlSchemas` stores the names and versions of +all the sampled schemas for the given database. The collection `_sqlTableSchemas` stores the +column-to-field mappings for all the cached SQL schema mappings. The two cache collections +have a strong parent/child relationship and must be kept consistent. Always use +the [schema management CLI](manage-schema-cli.md) to ensure consistency in the cache collections. + +## User Permissions for Creating and Updating the Schema Cache + +To store or update the SQL schema mappings in the cache collections, the connected +Amazon DocumentDB user account must have write permission to create and update the +cache collections. Once the schema is cached, users need only read permission on the +cache collections. + +To allow access for an Amazon DocumentDB user, be sure to set or add the appropriate roles as +described below. + +### Enable Access per Database + +To allow read and write access to specific databases in your server, add +a `readWrite` [built-in role](https://www.mongodb.com/docs/manual/reference/built-in-roles/#mongodb-authrole-readWrite) +for each database in which the user needs to create and update +the cached schema. + +```javascript +roles: [ + {role: "readWrite", db: "yourDatabase1"}, + {role: "readWrite", db: "yourDatabase2"} ... +] +``` + +### Enable Access for Any Database + +To allow read and write access to any database in your server, add +a `readWriteAnyDatabase` [built-in role](https://www.mongodb.com/docs/manual/reference/built-in-roles/#mongodb-authrole-readWriteAnyDatabase) +on the `admin` database so the user can create and update the cached schema in any database.
+ +```javascript +roles: [ + {role: "readWriteAnyDatabase", db: "admin"} +] +``` + +### Collection-Level Access Control + +If [collection-level access control](https://www.mongodb.com/docs/manual/core/collection-level-access-control/) +is in use, ensure that the `find`, `insert`, and `update` actions are +allowed on the cache collections (`_sqlSchemas` and `_sqlTableSchemas`). + +## User Permissions for Reading an Existing Schema Cache + +To read the SQL schema mappings from the cache collections, the connected +Amazon DocumentDB user account must have read permission on the +cache collections. + +To allow access for an Amazon DocumentDB user, be sure to set or add the appropriate roles as +described below. + +### Enable Access per Database + +To allow read access to specific databases in your server, add +a `read` [built-in role](https://www.mongodb.com/docs/manual/reference/built-in-roles/#mongodb-authrole-read) +for each database in which the user needs to read +the cached schema. + +```javascript +roles: [ + {role: "read", db: "yourDatabase1"}, + {role: "read", db: "yourDatabase2"} ... +] +``` + +### Enable Access for Any Database + +To allow read access to any database in your server, add +a `readAnyDatabase` [built-in role](https://www.mongodb.com/docs/manual/reference/built-in-roles/#mongodb-authrole-readAnyDatabase) +on the `admin` database so the user can read the cached schema in any database. + +```javascript +roles: [ + {role: "readAnyDatabase", db: "admin"} +] +``` + +### Collection-Level Access Control + +If [collection-level access control](https://www.mongodb.com/docs/manual/core/collection-level-access-control/) +is in use, ensure that the `find` action is +allowed on the cache collections (`_sqlSchemas` and `_sqlTableSchemas`). +
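For collection-level access control, a user-defined role can grant exactly the actions listed above on the two cache collections. The Java helper below is a hypothetical sketch that prints MongoDB `createRole` command documents using the `privileges`/`resource`/`actions` shape from the MongoDB documentation. It assumes that your Amazon DocumentDB engine version supports user-defined roles with collection-scoped privileges, so verify this before relying on it. The role and database names are placeholders. The role covers only the cache collections, so the user still needs separate read access to the data collections being queried.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper: builds a createRole command limited to the schema cache collections. */
class CacheRoleCommandSketch {

    /** The two cache collections named in this document. */
    static final List<String> CACHE_COLLECTIONS = Arrays.asList("_sqlSchemas", "_sqlTableSchemas");

    /**
     * Returns a createRole command (as JSON text) for one database.
     * canWrite = true grants find/insert/update; false grants find only.
     * Assumes roleName and database contain no characters that need JSON escaping.
     */
    static String createRoleCommand(String roleName, String database, boolean canWrite) {
        List<String> actions = canWrite
                ? Arrays.asList("find", "insert", "update")
                : Arrays.asList("find");
        String actionsJson = actions.stream()
                .map(action -> "\"" + action + "\"")
                .collect(Collectors.joining(", ", "[", "]"));
        // One privilege entry per cache collection, scoped to the given database.
        String privilegesJson = CACHE_COLLECTIONS.stream()
                .map(collection -> "{ \"resource\": { \"db\": \"" + database
                        + "\", \"collection\": \"" + collection + "\" }, \"actions\": "
                        + actionsJson + " }")
                .collect(Collectors.joining(", ", "[", "]"));
        return "{ \"createRole\": \"" + roleName + "\", \"privileges\": "
                + privilegesJson + ", \"roles\": [] }";
    }

    public static void main(String[] args) {
        // Hypothetical role and database names.
        System.out.println(createRoleCommand("sqlSchemaCacheWriter", "yourDatabase1", true));
        System.out.println(createRoleCommand("sqlSchemaCacheReader", "yourDatabase1", false));
    }
}
```

The printed document could be passed to `db.runCommand(...)` in the `mongo` shell, and the resulting role could then be granted with `db.grantRolesToUser(...)`. Generating the command from one list of collections keeps the writer and reader roles aligned on the same two cache collections.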