-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(rootfs): refactor database init process #112
Conversation
By analyzing the blame information on this pull request, we identified @kmala to be a potential reviewer |
63fc8db
to
6c41179
Compare
I rebased this patch on my repository and tests on Travis-CI. Tests were failed randomly. Sometimes they finish with success.
|
yeah this is still a WIP, it's not expected to work yet. |
@@ -0,0 +1,14 @@ | |||
#!/usr/bin/env bash | |||
|
|||
until psql -l -t >/dev/null 2>&1; do sleep 1; done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line should be in "if then - fi" blocks? (e.g. Line 7)
(refs #123 (comment))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how come? we want to wait here until the database is ready to accept connections. When I get back to this PR, I'd like to rewrite this to
until is_running; do sleep 1; done
So it's more clear what the intent of this line is.
6c41179
to
ea66780
Compare
Travis shows errors. I guess it may caused by timeout. Should be patched to the test driver? |
ea66780
to
b7814ba
Compare
I think it's because I'm out-of-date with master. Trying again |
k, got travis to pass, now we just need e2e to pass. |
b7814ba
to
350b15e
Compare
@@ -35,6 +30,9 @@ archive_mode = on | |||
archive_command = 'envdir "${WALE_ENVDIR}" wal-e wal-push %p' | |||
archive_timeout = 60 | |||
listen_addresses = '*' | |||
archive_mode = on | |||
archive_command = 'envdir "${WALE_ENVDIR}" wal-e wal-push %p' | |||
archive_timeout = 60 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duplicate with L31? (L33, L34 are similar)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, must've missed it in the rebase
3396707
to
72390e8
Compare
re-labeling as in progress until I identify the travis failures |
I guess the failure on Travis is caused by some timing issues on tests only. I also got random failures on my Travis builds. |
I don't think these are random this time, but I'll eventually look into it. |
FYI I managed to get #131 in so half of the work here has already been done. |
bd6699a
to
d181497
Compare
When you complete a recovery of the database for the first time, a new log timeline is started with an ID of 2. When you restore again, the timeline used when the last backup occurred will be replayed. Because of this, if you restored the database and did not perform a backup, all data from that successful recovery will be lost because only WAL logs from the first timeline (the timeline that the database was last backed up) will be restored. In order to fix this, after completing a database recovery we create a fresh backup in order to establish a new recovery baseline. That way we can now replay from timeline 2. Other fixes that this refactor includes: - database attempting to start halfway through recovery - /bin/is_running identifying the database as running during recovery - cleaning up database boot script
917e795
to
37a75b5
Compare
@bacongobbler is this PR still worth pursuing? Should we prioritize it for the next release, or maybe mark it "help wanted?" |
I think it's still worth it to pursue eventually to get rid of PGCTLTIMEOUT on long recovery times, otherwise there's no real end benefit. It's mostly cleanup so it's not a high priority. |
I'm going to close this PR since it's old and rusty, but @bacongobbler please re-open if this code gets resurrected. |
This refactor includes:
Note that this is not required for 2.0.
Manual testing procedure:
kubectl delete po
closes #55
closes #108