Research possibility of running Cloud Function into Cloud Storage #4
@yoiang what's the thinking here? In a func you would use this lib to back up your firestore db to a gcs file?
That's one possibility! Another that comes to mind is offloading the (currently small amount of) processing done on each document. I'm honestly not well acquainted with Cloud Functions yet, so I don't know their limitations. For example, would it be possible to spawn additional processes to divide the work of querying and recording collections, fork again on subcollections, sub-subcollections, and so on?
Let me know if I diverge from the original idea.

In the context of cloud functions, this is the perfect fan-out model, and I use it all the time during normal operation of Firestore (denormalization, near-time backup to BigQuery).

Using pubsub, a message is published whose attributes name the class and method to be called, and whose payload carries the `class.method` params. In this functional model you treat each method as a functional RPC call:

$ backup
• send pubsub -> backup.getCollections
• receive pubsub -> doThings -> pubsub -> backup.getDocs

You can break it out as far as you want, really. In addition, the cli could technically do all the setup required: create the pubsub topic, publish the funcs to handle pubsub messages, etc.
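To make that concrete, here's a rough TypeScript sketch of the publish/dispatch side, assuming the `@google-cloud/pubsub` client. The topic name, handler names, and collection list are placeholders for illustration, not anything this repo actually ships:

```typescript
import { PubSub } from '@google-cloud/pubsub';

const pubsub = new PubSub();
const topic = pubsub.topic('backup'); // placeholder topic name

// Publish a "functional RPC": attributes carry the class/method to invoke,
// the payload carries its params.
async function call(className: string, method: string, params: object): Promise<void> {
  await topic.publishMessage({
    attributes: { class: className, method },
    data: Buffer.from(JSON.stringify(params)),
  });
}

// A subscriber dispatches on the attributes and can fan out further:
// backup.getCollections publishes one backup.getDocs message per collection.
const handlers: Record<string, (params: any) => Promise<void>> = {
  'backup.getCollections': async () => {
    const collections = ['users', 'posts']; // placeholder: list root collections here
    await Promise.all(
      collections.map((c) => call('backup', 'getDocs', { collection: c })),
    );
  },
  'backup.getDocs': async ({ collection }) => {
    // query and record the documents of `collection` here
  },
};

// Entry point wired to the pubsub subscription.
export async function onBackupMessage(message: {
  attributes: Record<string, string>;
  data: Buffer;
}): Promise<void> {
  const key = `${message.attributes.class}.${message.attributes.method}`;
  await handlers[key]?.(JSON.parse(message.data.toString()));
}
```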
@yoiang this is kinda what I was thinking with regard to uploading to GCS. Because the backup flow is serial, the time to back up the db is much longer, but we can tackle parallelization next.
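For the GCS side, a minimal sketch assuming the `@google-cloud/storage` client; the bucket and object naming are made up for illustration:

```typescript
import { Storage } from '@google-cloud/storage';

const storage = new Storage();

// Write the serialized backup to a timestamped object in GCS.
async function uploadBackup(json: string): Promise<void> {
  const file = storage
    .bucket('my-backup-bucket') // placeholder bucket
    .file(`backups/${new Date().toISOString()}.json`);
  await file.save(json, { contentType: 'application/json' });
}
```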
Yeah, I agree that local parallelization (as opposed to the remote parallelization we're discussing) should be the next task, along with further work towards restoring.
@yoiang have you made progress here? I was thinking of implementing parallelization by having the cli call itself, passing a document path for context. It seems like this would allow a good amount of reuse, plus the ability to offload the work via a different mechanism later. Thoughts?
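Something like this is what I mean by the cli calling itself; note the `backup` subcommand and `--path` flag are hypothetical, not the cli's real interface:

```typescript
import { spawn } from 'child_process';

// Re-invoke this same cli as a child process, scoped to one collection path.
function backupCollection(path: string): Promise<void> {
  return new Promise((resolve, reject) => {
    const child = spawn(
      process.execPath,
      [process.argv[1], 'backup', '--path', path], // hypothetical flags
      { stdio: 'inherit' },
    );
    child.on('exit', (code) =>
      code === 0 ? resolve() : reject(new Error(`backup of ${path} exited with ${code}`)),
    );
  });
}

// Fan out locally: one child process per root collection, run in parallel.
async function backupAll(collections: string[]): Promise<void> {
  await Promise.all(collections.map(backupCollection));
}
```

The nice part is the same path-scoped entry point could later be triggered by pubsub instead of a child process, so the remote fan-out discussed above reuses the same code.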