Allow dumps as migrations (#190)

Allow migrations that themselves contain `codd_schema`, in some cases, so that pg dumps can be used as migrations
mzabani · Jul 15, 2024 · b85e262
1 parent ae858b2
Showing 7 changed files with 3,385 additions and 2,857 deletions.
diff --git a/README.md b/README.md
@@ -202,10 +202,10 @@ We recommend following these instructions closely to catch as many possible issu
 ## Frequently Asked Questions
 
 1. ### Why does taking and restoring a database dump affect my expected codd schema?
-   `pg_dump` does not dump all of the schema state that codd checks. A few examples include (at least with PG 13) role related state, the database's default transaction isolation level and deferredness, among possibly others. So check that it isn't the case that you get different schemas when that happens. We recommend using `pg_dumpall` to preserve more when possible instead. If you've checked with `psql` and everything looks to be the same please report a bug in codd.
+   `pg_dump` does not dump all of the schema state that codd checks. A few examples include (at least with PG 13) role-related state and the database's default transaction isolation level and deferredness, among possibly others. So first check whether one of these is the reason your schemas differ after a restore. If you've checked with `psql` and everything looks to be the same, please report a bug in codd.
 
 2. ### Will codd run out of memory or system resources if my migration files are too large or too many?
-    Most likely not. Codd reads migrations from disk in streaming fashion and keeps in memory only a single statement at a time. For `COPY` statements, codd uses a constant-size buffer to stream-read the contents and achieve bounded memory usage while staying fast. Also, codd does not open more than one migration file simultaneously to stay well below typical file handle limits imposed by the shell or operating system, and that is also assured through an automated test that runs in CI with `strace`.
+    Most likely not. Codd reads migrations from disk in streaming fashion and keeps in memory only a single statement at a time. For `COPY` statements, codd uses a constant-size buffer to stream-read the contents and achieve bounded memory usage while staying fast. Also, codd does not open more than one migration file simultaneously to stay well below typical file handle limits imposed by the shell or operating system, and that is also assured through an automated test that runs in CI with `strace`. Codd does keep metadata about all pending migrations in memory, but that should be fairly small.
 
 3. ### Will codd handle SQL errors nicely?
     Codd tries to do the "best possible thing" even in rather unusual situations. It will retry sets of consecutive in-txn migrations atomically so as not to leave your database in an intermediary state. Even for no-txn migrations, codd will retry the failing statement instead of entire migrations, and _even_ if you write explicit `BEGIN..COMMIT` sections in no-txn migrations, codd will be smart enough to retry from the `BEGIN` if a statement inside that section fails. See the [retry examples](/docs/SQL-MIGRATIONS.md#examples) if you're interested. What codd currently cannot handle well is having its connection killed by an external agent while it's applying a _no-txn_ migration, a scenario which should be extremely rare. Basically, we hope you should be able to write your migrations however you want and rely comfortably on the fact that codd should do the reasonable thing when handling errors.
diff --git a/docs/START-USING.md b/docs/START-USING.md
@@ -4,17 +4,17 @@ If you already have a Database and would like to start using codd, here's a guid
 
 1. Configure your environment variables as explained in the [README](../README.md) and in [CONFIGURATION.md](CONFIGURATION.md).
 2. In that configuration make sure you have that extra `dev-only` folder to hold SQL migrations that will only run in developers' machines.
-3. Run `pg_dump your_database > dump-migration.sql` locally. Do not use `pg_dumpall` because it includes _psql_'s meta-commands that codd_ doesn't support.
+3. Run `pg_dump your_database > dump-migration.sql` **locally**. Do not use `pg_dumpall` because it includes _psql_'s meta-commands that codd doesn't support.
 4. Run `dropdb your_database` to drop your DB **locally**.
 5. Add a bootstrap migration similar to the one exemplified in [BOOTSTRAPPING.md](BOOTSTRAPPING.md), but with ownership, encoding and locale equal to your Production DB's. The database's and the _public_'s Schema ownership might need some manual intervention to match in different environments.
    - **What do we mean?** Cloud services such as Amazon's RDS will create Schemas and DBs owned by users managed by them - such as the `rdsadmin` user -, that we don't usually replicate locally. We can either replicate these locally so we don't need to touch our Prod DB or change our Prod DB so only users managed by us are ever referenced in any environment.
 6. Make sure the bootstrap migrations added in the previous step create the database, and that roles and ownership match what you get in Production.
    - Use _psql_'s `\dg` to view roles in your Prod DB.
    - Use _psql_'s `\l` to check DB ownership and permissions of your Prod DB.
    - Use _psql_'s `\dn+` to check the _public_ schema's ownership and permissions in your Prod DB.
-7. Edit `dump-migration.sql` (created in step 3) and add `-- codd: no-txn` as its very first line.
-8.  Run `codd add dump-migration.sql --dest-folder your-dev-only-folder`
-9.  You should now have your database back and managed through codd.
-10. Make sure your Production environment variable `CODD_MIGRATION_DIRS` does not contain your `dev-only` folder. Add any future SQL migrations to your `all-migrations` folder.
-11. Before deploying with codd, we strongly recommend you run `codd verify-schema` with your environment variables connected to your Production database and make sure schemas match.
-12. In Production, we strongly recommend running `codd up --lax-check` (the default, so equivalent to `codd up`) to start with until you get acquainted enough to consider strict-checking. Make sure you read `codd up --help` to better understand your options.
+Once your bootstrapping migration is ready, run `codd add bootstrap-migration.sql --dest-folder your-dev-only-folder`. This will create your database with no tables or data in it.
+7. Run `codd add dump-migration.sql --dest-folder your-dev-only-folder`. Dumps can sometimes fail to apply because of privileges enforced by PostgreSQL itself, so edit the dump file as needed until it can be applied. This often means adding a custom `-- codd-connection` comment at the top so that it runs as a sufficiently privileged user, such as the `postgres` user.
+8. You should now have your database back and managed through codd.
+9. Make sure your Production environment variable `CODD_MIGRATION_DIRS` does not contain your `dev-only` folder. Add any future SQL migrations to your `all-migrations` folder.
+10. Before deploying with codd, we strongly recommend you run `codd verify-schema` with your environment variables connected to your Production database and make sure schemas match.
+11. In Production, we strongly recommend running `codd up --lax-check` (the default) to start with until you get acquainted enough to consider strict-checking. Make sure you read `codd up --help` to better understand your options.
diff --git a/src/Codd/Internal.hs b/src/Codd/Internal.hs
diff --git a/src/Codd/Parsing.hs b/src/Codd/Parsing.hs
@@ -34,22 +34,22 @@ module Codd.Parsing
     isCommentPiece,
     isTransactionEndingPiece,
     isWhiteSpacePiece,
-    manyStreaming,
     piecesToText,
+    sqlPieceText,
     parsedSqlText,
     parseSqlMigration,
     parseWithEscapeCharProper,
     parseAddedSqlMigration,
     parseAndClassifyMigration,
     parseMigrationTimestamp,
     parseSqlPiecesStreaming,
-    sqlPieceText,
     substituteEnvVarsInSqlPiecesStream,
     toMigrationTimestamp,
     -- Exported for tests
     ParserState (..),
     coddConnStringCommentParser,
     copyFromStdinAfterStatementParser,
+    manyStreaming,
     parseSqlPiecesStreaming',
   )
 where
@@ -93,6 +93,7 @@ import qualified Data.Attoparsec.Text as Parsec
 import Data.Bifunctor (first)
 import qualified Data.Char as Char
 import qualified Data.DList as DList
+import Data.Int (Int64)
 import Data.Kind (Type)
 import Data.List
   ( nub,
@@ -114,6 +115,7 @@ import Data.Time
   )
 import Data.Time.Clock (UTCTime (..))
 import Database.PostgreSQL.Simple (ConnectInfo (..))
+import qualified Database.PostgreSQL.Simple.FromRow as DB
 import qualified Database.PostgreSQL.Simple.Time as DB
 import Network.URI
   ( URI (..),
@@ -166,14 +168,27 @@ data AddedSqlMigration m = AddedSqlMigration
 -- | Holds applied status and number of applied statements.
 data MigrationApplicationStatus = NoTxnMigrationFailed Int | MigrationAppliedSuccessfully Int
 
+instance DB.FromRow MigrationApplicationStatus where
+  fromRow = do
+    numAppliedStmts :: Maybe Int <- DB.field
+    noTxnFailedAt :: Maybe UTCTime <- DB.field
+    case (numAppliedStmts, noTxnFailedAt) of
+      (Nothing, _) ->
+        -- old codd_schema version where only fully applied migs were registered
+        pure $ MigrationAppliedSuccessfully 0
+      (Just n, Nothing) -> pure $ MigrationAppliedSuccessfully n
+      (Just n, Just _) -> pure $ NoTxnMigrationFailed n
+
 data AppliedMigration = AppliedMigration
   { appliedMigrationName :: FilePath,
     -- | The migration's timestamp as extracted from its file name.
     appliedMigrationTimestamp :: DB.UTCTimestamp,
     -- | When the migration was effectively applied.
     appliedMigrationAt :: UTCTime,
     appliedMigrationDuration :: DiffTime,
-    appliedMigrationStatus :: MigrationApplicationStatus
+    appliedMigrationStatus :: MigrationApplicationStatus,
+    appliedMigrationTxnId :: Int64,
+    appliedMigrationConnId :: Int
   }
 
 data FileStream m = FileStream
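
The case analysis in the new `FromRow MigrationApplicationStatus` instance can be tried out without a database. The following is a minimal sketch, not code from this commit: `statusFromColumns` is a hypothetical helper that takes the two column values as plain arguments, and the failure timestamp is left polymorphic so that the example depends only on `base` (the instance in the diff reads a `Maybe UTCTime`).

```haskell
module Main where

-- Mirrors the type shown in the diff; the deriving clause is added here
-- only so the example can print and compare values.
data MigrationApplicationStatus
  = NoTxnMigrationFailed Int
  | MigrationAppliedSuccessfully Int
  deriving (Eq, Show)

-- Hypothetical pure helper with the same three cases as the FromRow
-- instance. The second argument stands in for the nullable timestamp
-- column; only its presence matters, so its type is left polymorphic.
statusFromColumns :: Maybe Int -> Maybe failedAt -> MigrationApplicationStatus
statusFromColumns Nothing _ =
  -- Rows written by an old codd_schema version, where only fully
  -- applied migrations were registered.
  MigrationAppliedSuccessfully 0
statusFromColumns (Just n) Nothing = MigrationAppliedSuccessfully n
statusFromColumns (Just n) (Just _) = NoTxnMigrationFailed n

main :: IO ()
main = do
  -- Strings stand in for timestamps in this sketch.
  print (statusFromColumns Nothing (Nothing :: Maybe String))
  print (statusFromColumns (Just 3) (Nothing :: Maybe String))
  print (statusFromColumns (Just 2) (Just "2024-07-15 12:00:00"))
```

Note that a row with no statement count is treated as successfully applied even if a failure timestamp is present, which matches the `(Nothing, _)` branch in the diff.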

diff --git a/src/Codd/Query.hs b/src/Codd/Query.hs
@@ -7,6 +7,7 @@ module Codd.Query
     , query
     , txnStatus
     , unsafeQuery1
+    , queryMay
     , withTransaction
     ) where
 
@@ -58,6 +59,20 @@ unsafeQuery1 conn q r = liftIO $ do
         [x] -> return x
         _   -> error "More than one result for query1"
 
+-- | Returns `Nothing` if the query returns no rows. Throws an exception if more than one result is returned by the query.
+queryMay
+    :: (DB.FromRow b, MonadIO m, DB.ToRow a)
+    => DB.Connection
+    -> DB.Query
+    -> a
+    -> m (Maybe b)
+queryMay conn q r = liftIO $ do
+    res <- DB.query conn q r
+    case res of
+        []  -> pure Nothing
+        [x] -> pure $ Just x
+        _   -> error "More than one result for queryMay"
+
 
 -- | Returns a Query with a valid "BEGIN" statement that is READ WRITE and has
 -- the desired isolation level.
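
The row-count handling in `queryMay` can be illustrated as a pure function over a list of rows. This is a sketch under stated assumptions rather than the commit's code: `atMostOne` is a hypothetical name, and it returns `Left` with a message where `queryMay` calls `error`, so that the three outcomes can be inspected without catching exceptions.

```haskell
module Main where

-- Hypothetical pure counterpart of the case analysis in queryMay:
-- no rows -> Nothing, exactly one row -> Just that row,
-- more than one row -> failure (reported here as Left, not via error).
atMostOne :: [a] -> Either String (Maybe a)
atMostOne []  = Right Nothing
atMostOne [x] = Right (Just x)
atMostOne _   = Left "More than one result"

main :: IO ()
main = do
  print (atMostOne ([] :: [Int]))
  print (atMostOne [42 :: Int])
  print (atMostOne [1, 2 :: Int])
```

Compared with `unsafeQuery1` in the same module, the difference shown here is that an empty result is a valid outcome rather than a failure.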

diff --git a/test/DbDependentSpecs/ApplicationSpec.hs b/test/DbDependentSpecs/ApplicationSpec.hs
diff --git a/test/DbDependentSpecs/SchemaVerificationSpec.hs b/test/DbDependentSpecs/SchemaVerificationSpec.hs