Research possibility of running Cloud Function into Cloud Storage #4
@yoiang what's the thinking here? In a func you would use this lib to back up your firestore db to a gcs file?
That's one possibility! Another that comes to mind is offloading the (currently small amount of) processing done on each document. I'm honestly not well acquainted with Cloud Functions yet, so I don't know their limitations. For example, would it be possible to spawn additional processes to divide the work of querying and recording collections, fork again on subcollections, sub-subcollections, and so on?
Let me know if I diverge from the original idea.

In the context of cloud functions, this is the perfect fan-out model, and I use it all the time during normal operation of Firestore (denormalization, near-time backup to BigQuery).

Using pubsub, a message is published whose attributes name the class and method to be called, and whose payload carries the `class.method` params. In this functional model you treat each method as a functional RPC call:

$ backup
• send pubsub -> backup.getCollections
• receive pubsub -> doThings -> pubsub -> backup.getDocs

You can break it out as far as you want, really. In addition, the cli could technically do all the setup required: create the pubsub topic, publish the funcs to handle pubsub messages, etc.
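To make that concrete, here's a rough TypeScript sketch of the publish/dispatch side, assuming the `@google-cloud/pubsub` client. The topic name, handler names, and collection list are placeholders for illustration, not anything this repo actually ships:

```typescript
import { PubSub } from '@google-cloud/pubsub';

const pubsub = new PubSub();
const topic = pubsub.topic('backup'); // placeholder topic name

// Publish a "functional RPC": attributes carry the class/method to invoke,
// the payload carries its params.
async function call(className: string, method: string, params: object): Promise<void> {
  await topic.publishMessage({
    attributes: { class: className, method },
    data: Buffer.from(JSON.stringify(params)),
  });
}

// A subscriber dispatches on the attributes and can fan out further:
// backup.getCollections publishes one backup.getDocs message per collection.
const handlers: Record<string, (params: any) => Promise<void>> = {
  'backup.getCollections': async () => {
    const collections = ['users', 'posts']; // placeholder: list root collections here
    await Promise.all(
      collections.map((c) => call('backup', 'getDocs', { collection: c })),
    );
  },
  'backup.getDocs': async ({ collection }) => {
    // query and record the documents of `collection` here
  },
};

// Entry point wired to the pubsub subscription.
export async function onBackupMessage(message: {
  attributes: Record<string, string>;
  data: Buffer;
}): Promise<void> {
  const key = `${message.attributes.class}.${message.attributes.method}`;
  await handlers[key]?.(JSON.parse(message.data.toString()));
}
```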
@yoiang this is kinda what I was thinking with regard to uploading to GCS. Because the backup flow is serial, the time to back up the db is much longer, but we can tackle parallelization next.
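For the GCS side, a minimal sketch assuming the `@google-cloud/storage` client; the bucket and object naming are made up for illustration:

```typescript
import { Storage } from '@google-cloud/storage';

const storage = new Storage();

// Write the serialized backup to a timestamped object in GCS.
async function uploadBackup(json: string): Promise<void> {
  const file = storage
    .bucket('my-backup-bucket') // placeholder bucket
    .file(`backups/${new Date().toISOString()}.json`);
  await file.save(json, { contentType: 'application/json' });
}
```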
Yeah, I agree that local parallelization (as opposed to the remote parallelization we're discussing) should be the next task, along with further work towards restoring.
@yoiang have you made progress here? I was thinking of implementing parallelization by having the cli call itself, passing a document path for context. It seems like this would allow a good amount of reuse, plus the ability to offload the work via a different mechanism later. Thoughts?
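Something like this is what I mean by the cli calling itself; note the `backup` subcommand and `--path` flag are hypothetical, not the cli's real interface:

```typescript
import { spawn } from 'child_process';

// Re-invoke this same cli as a child process, scoped to one collection path.
function backupCollection(path: string): Promise<void> {
  return new Promise((resolve, reject) => {
    const child = spawn(
      process.execPath,
      [process.argv[1], 'backup', '--path', path], // hypothetical flags
      { stdio: 'inherit' },
    );
    child.on('exit', (code) =>
      code === 0 ? resolve() : reject(new Error(`backup of ${path} exited with ${code}`)),
    );
  });
}

// Fan out locally: one child process per root collection, run in parallel.
async function backupAll(collections: string[]): Promise<void> {
  await Promise.all(collections.map(backupCollection));
}
```

The nice part is the same path-scoped entry point could later be triggered by pubsub instead of a child process, so the remote fan-out discussed above reuses the same code.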