Ability to restore backup #2

Open
yoiang opened this issue Nov 2, 2017 · 30 comments

Comments

@yoiang
Member

yoiang commented Nov 2, 2017

It'd be great to go both ways. Would preferably allow specifying overwriting and merging rules on documents and collections separately.

@Slavrix

Slavrix commented Nov 7, 2017

One issue with the current backup format is that it JSON.stringifies all of the data in the document.

This makes it hard to keep the data type of each field, primarily for timestamps, geoPoints, references and null-typed fields.
Arrays and maps can run into the same problem at any depth, so they need to be checked recursively for these values to ensure the data is saved correctly.
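
To illustrate the type loss described above, here is a minimal sketch in plain JavaScript. It uses a `Date` and a hypothetical `FakeGeoPoint` class as stand-ins for Firestore's timestamp and GeoPoint types, so the class and field names are assumptions and not the real Firestore API.

```javascript
// Hypothetical stand-in for a Firestore GeoPoint; not the real class.
class FakeGeoPoint {
  constructor(latitude, longitude) {
    this.latitude = latitude;
    this.longitude = longitude;
  }
}

const original = {
  created: new Date(1510025554424),
  geo: new FakeGeoPoint(-34.202716, 151.171875),
};

// Round-trip through JSON, as a naive backup and restore would.
const restored = JSON.parse(JSON.stringify(original));

console.log(original.created instanceof Date);     // true
console.log(restored.created instanceof Date);     // false
console.log(typeof restored.created);              // "string"
console.log(restored.geo instanceof FakeGeoPoint); // false
```

The values survive the round trip, but the information about which type each field had does not.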

Here is a demo document that I put all the types in so we could see the values.
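To get this object, I used this snippet:

    // Assumes `firestore` is an initialised Firestore client and
    // `theCollection` is the name of the collection to inspect.
    firestore.collection(theCollection).get()
    .then((snapshots) => {
        if (!snapshots.empty) {
            snapshots.forEach((document) => {
                // _fieldsProto is a private field holding the typed values
                console.log(document._fieldsProto);
            });
        }
    });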
...
{
 teamName: { stringValue: 'Under 9s', valueType: 'stringValue' },
  arrrr: { arrayValue: { values: [Array] }, valueType: 'arrayValue' },
  team: { stringValue: 'U09', valueType: 'stringValue' },
  teamLimits: { stringValue: '16', valueType: 'stringValue' },
  null: { nullValue: 'NULL_VALUE', valueType: 'nullValue' },
  maxAge: { integerValue: '9', valueType: 'integerValue' },
  obj: { mapValue: { fields: [Object] }, valueType: 'mapValue' },
  created: 
   { timestampValue: { seconds: '1510025554', nanos: 424000000 },
     valueType: 'timestampValue' },
  geo: 
   { geoPointValue: { latitude: -34.202716, longitude: 151.171875 },
     valueType: 'geoPointValue' }
 }
...

Saving this to the files, instead of just the pure JSON of document.data(), makes it easier to retain the data types.

These can then be handled much more easily in a restore function, as an action can be taken on each object based on its type to ensure that the correct data type is retained on write.

I have yet to find out whether it is possible to just take this and write it as is with document.set or document.add or anything like that.

Hope this helps in some way.

@Slavrix

Slavrix commented Nov 7, 2017

There doesn't seem to be a way to just add a document using the above format.

However, looking through the firestore package, at
@google-cloud/firestore/src/document.js line 402
there is a method that decodes the above _fieldsProto into what is returned by document.data().

It's private so it can't be called directly, but that's the switch to use to decode the above format and retain the types.

  /**
   * Decodes a single Firestore 'Value' Protobuf.
   *
   * @private
   * @param proto - A Firestore 'Value' Protobuf.
   * @returns {*} The converted JS type.
   */
  _decodeValue(proto) {
    switch (proto.valueType) {
      case 'stringValue': {
        return proto.stringValue;
      }
      case 'booleanValue': {
        return proto.booleanValue;
      }
      case 'integerValue': {
        return parseInt(proto.integerValue, 10);
      }
      case 'doubleValue': {
        return parseFloat(proto.doubleValue, 10);
      }
      case 'timestampValue': {
        return new Date(
          proto.timestampValue.seconds * 1000 +
            proto.timestampValue.nanos / MS_TO_NANOS
        );
      }
      case 'referenceValue': {
        return new DocumentReference(
          this.ref.firestore,
          ResourcePath.fromSlashSeparatedString(proto.referenceValue)
        );
      }
      case 'arrayValue': {
        let array = [];
        for (let i = 0; i < proto.arrayValue.values.length; ++i) {
          array.push(this._decodeValue(proto.arrayValue.values[i]));
        }
        return array;
      }
      case 'nullValue': {
        return null;
      }
      case 'mapValue': {
        let obj = {};
        let fields = proto.mapValue.fields;

        for (let prop in fields) {
          if (fields.hasOwnProperty(prop)) {
            obj[prop] = this._decodeValue(fields[prop]);
          }
        }

        return obj;
      }
      case 'geoPointValue': {
        return GeoPoint.fromProto(proto.geoPointValue);
      }
      case 'bytesValue': {
        return proto.bytesValue;
      }
      default: {
        throw new Error(
          'Cannot decode type from Firestore Value: ' + JSON.stringify(proto)
        );
      }
    }
  }
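
The same switch can be sketched as a standalone function for use on a saved backup. This is a minimal sketch, not the library's code: `decodeValue` and the `makeReference` and `makeGeoPoint` callbacks are hypothetical names, and the input is assumed to be in the `_fieldsProto` shape logged earlier in the thread.

```javascript
// Hypothetical standalone decoder for values saved in the typed format.
// makeReference and makeGeoPoint are caller-supplied (assumed) factories, so
// this sketch does not depend on any private Firestore classes.
function decodeValue(proto, makeReference, makeGeoPoint) {
  switch (proto.valueType) {
    case 'stringValue':
    case 'booleanValue':
    case 'bytesValue':
      return proto[proto.valueType];
    case 'integerValue':
      return parseInt(proto.integerValue, 10);
    case 'doubleValue':
      return parseFloat(proto.doubleValue);
    case 'nullValue':
      return null;
    case 'timestampValue':
      // seconds arrive as a string; nanos are converted to milliseconds.
      return new Date(
        Number(proto.timestampValue.seconds) * 1000 +
          (proto.timestampValue.nanos || 0) / 1e6
      );
    case 'referenceValue':
      return makeReference(proto.referenceValue);
    case 'geoPointValue':
      return makeGeoPoint(proto.geoPointValue);
    case 'arrayValue':
      // Recurse so nested timestamps, references and so on keep their types.
      return proto.arrayValue.values.map((v) =>
        decodeValue(v, makeReference, makeGeoPoint)
      );
    case 'mapValue': {
      const obj = {};
      for (const [key, value] of Object.entries(proto.mapValue.fields)) {
        obj[key] = decodeValue(value, makeReference, makeGeoPoint);
      }
      return obj;
    }
    default:
      throw new Error('Cannot decode value: ' + JSON.stringify(proto));
  }
}

// Usage with part of the demo document from earlier in the thread.
const decoded = decodeValue(
  {
    valueType: 'mapValue',
    mapValue: {
      fields: {
        maxAge: { integerValue: '9', valueType: 'integerValue' },
        created: {
          timestampValue: { seconds: '1510025554', nanos: 424000000 },
          valueType: 'timestampValue',
        },
      },
    },
  },
  (path) => path, // placeholder: a real restore would build a DocumentReference
  (geo) => geo    // placeholder: a real restore would build a GeoPoint
);
console.log(decoded.maxAge);            // 9
console.log(decoded.created.getTime()); // 1510025554424
```

Passing the factories in keeps the type dispatch testable without a Firestore connection.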

@Slavrix

Slavrix commented Nov 7, 2017

Reference data types still need more work, though.

The reference can be reimported through
let reference_to_import = firebase.doc(the_saved_reference_string)

The saved string has this form, and the document path has to be extracted from it, however deep it goes:
'projects/{PROJECT_NAME}/databases/(default)/documents/{COLLECTION_NAME}/{DOC_ID}'

To get the reference we only need the collection and doc ID, plus anything after that for depth, IF it is in the same database.
Currently, I don't believe it's possible to reference documents in other databases, but the beginning section will probably be used for that in the future in some way.

So we need a split that removes the beginning part. The best way I can think of is to build the beginning part of the string from the initialised firestore object, match it, and remove it that way, or something.

@Slavrix

Slavrix commented Nov 7, 2017

It's janky but it works:

        let toRemove = 'projects/'+firestore._referencePath._projectId+'/databases/'+firestore._referencePath._databaseId+'/documents/';

        let toFind = proto.referenceValue.replace(toRemove, '');
        console.log(toFind);
        return firestore.doc(toFind);

This goes in the referenceValue case of the switch.
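
A slightly stricter variant of the snippet above, as a sketch: it only strips the prefix when the saved string actually starts with it, and throws otherwise instead of passing an absolute path on. `toRelativePath` is a hypothetical helper, and the project and database IDs are passed in explicitly here rather than read from the private `_referencePath` fields.

```javascript
// Hypothetical helper: turn a saved absolute reference string into a path
// relative to the database root, suitable for passing to firestore.doc().
function toRelativePath(referenceValue, projectId, databaseId) {
  const prefix =
    'projects/' + projectId + '/databases/' + databaseId + '/documents/';
  if (!referenceValue.startsWith(prefix)) {
    // The reference belongs to another project or database.
    throw new Error('Reference points outside this database: ' + referenceValue);
  }
  return referenceValue.slice(prefix.length);
}

console.log(
  toRelativePath(
    'projects/my-project/databases/(default)/documents/teams/U09',
    'my-project',
    '(default)'
  )
); // teams/U09
```

Anything after the collection and doc ID (sub collections) is kept, since only the leading prefix is removed.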

@Slavrix

Slavrix commented Nov 7, 2017

Hopefully this helps if you guys haven't already got restoring working better.
There is probably a better way to do it, but this works for me for now.

@yoiang
Member Author

yoiang commented Nov 8, 2017

@Slavrix !! Thank you for all this research! This is way farther than I've gotten in thinking about it!

Hmmmm, off the top of the skull the switch you wrote will likely have to be the direction we go.

Running through other possibilities that come to mind: one assumption would be that anything received from the server, if somehow maintained in its exact form, can be pushed back safely. To that end, if we could preserve all information about the object, we should be able to write each document object's data to a binary file rather than JSON. However, because some of these data types are functional objects and classes, and JavaScript being a modern language, the information that makes up said object or class is likely stored elsewhere and only referred to. In that case we would be writing an incomplete copy to disk.

@Slavrix

Slavrix commented Nov 8, 2017

Yea that switch itself came from the official node firebase package with just the one tweak for reference files. I figured if that's how they are doing it it's probably the correct way to go also.

If there is a way to add an entire documentRef straight to firestore that would be the best as then we can retain the rest of the metadata also, but I haven't been able to see one yet.

@lpellegr

This project is a really great initiative. However, be careful.

It seems you do not provide any restoration mechanism and you claim on the front page:

Relax: That's it! ✨🌈

Are you joking? Maybe I missed something, but at first sight it looks like a business killer! A backup that cannot be restored, or has not been tested as restorable, is just the same as no backup...

@yoiang
Member Author

yoiang commented Jan 22, 2018

Hey @lpellegr , thank you for the input!

I don't think this is the best forum for a discussion of the merits or usefulness of the project, especially in an issue where we are discussing how to accomplish exactly what you feel is lacking 😉 Feel free to reach out to me directly if you would like to further discuss!

And please contribute to the conversation or submit a PR, we can use the help!

@yoiang
Member Author

yoiang commented Jan 22, 2018

@s-shiva1995 created PR #18, which enumerates most of the types, similar to what you described, @Slavrix. I just got a chance to give it a look and it looks like a good start!

@jeremylorino
Contributor

Began the basics of the restore functionality. Luckily @s-shiva1995 set us up nicely with the backup format.

Going to approach this the same way as the backup.

  1. Serial. No merging; will write over existing docs if they exist
  2. Retry options to allow a partial or full retry if documents fail. (Prerequisite for handling failed sub collection restores)
  3. Option to drop all data before restore (ensure no orphaned sub collections)
  4. Then approach the merge restore
  5. Then a parallel restore
  6. Then this will lead into using Pub/Sub and cloud functions for the restore (need to mind the Firestore API quota when dealing with large datasets in async mode)
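
The first two steps of the plan above (serial overwrite plus retries) could look roughly like this. It is only a sketch under stated assumptions: `restoreSerially` is a hypothetical name, and `writeDocument(path, data)` is an assumed caller-supplied function returning a Promise, for example a thin wrapper around `firestore.doc(path).set(data)`.

```javascript
// Hypothetical sketch: restore documents one at a time, overwriting any
// existing document, and retry the failed ones a limited number of times.
async function restoreSerially(documents, writeDocument, maxRetries = 2) {
  let pending = documents;
  for (let attempt = 0; attempt <= maxRetries && pending.length > 0; attempt++) {
    const failed = [];
    for (const doc of pending) {
      try {
        await writeDocument(doc.path, doc.data); // serial: one write at a time
      } catch (err) {
        failed.push(doc); // keep for the next attempt
      }
    }
    pending = failed;
  }
  return pending; // documents that still failed after all retries
}

// Usage with an in-memory store and a writer that fails once per document.
const store = new Map();
const seen = new Set();
function flakyWrite(path, data) {
  if (!seen.has(path)) {
    seen.add(path);
    return Promise.reject(new Error('simulated failure'));
  }
  store.set(path, data);
  return Promise.resolve();
}

restoreSerially(
  [
    { path: 'teams/U09', data: { teamName: 'Under 9s' } },
    { path: 'teams/U10', data: { teamName: 'Under 10s' } },
  ],
  flakyWrite
).then((stillFailed) => {
  console.log(stillFailed.length, store.size); // 0 2
});
```

Returning the documents that still failed gives the caller what it needs for a later partial retry.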

@stewones

stewones commented Apr 3, 2018

The funniest part is that Google should be the one providing this =D

@jeremylorino
Contributor

@stewwan, do you mean that this is currently a feature they provide for Firestore? Or that this is something they should provide?

@stewones

stewones commented Apr 4, 2018

@jeremylorino no no, I meant that it is something they should provide. Not sure if they plan to release something. What is funny is a Google database as a service without a backup tool. lol.

@jeremylorino
Contributor

@stewwan I figured that is what you meant ;)

My personal opinion is that if I have time to build on top of any company's product to bridge a gap, then I will. Give help; get help.

That being said, Firestore is a pre-GA product, which ultimately means there will be features that may seem like must-haves that are not yet included.

FYI, I have not heard of the current priority of this feature in any public Google channel.

@stewones

stewones commented Apr 4, 2018

Yeah, makes sense. I built an app that is running in production with thousands of records, and I'm afraid of doing something stupid when running database migrations, so I'll end up having to write some logic to restore data. I'll try to PR something as soon as I can.

@elitan

elitan commented Apr 13, 2018

Google will most probably add backup/restore functionality in the future. But for now, this is the best we got.

@jeremylorino
Contributor

FYI There is no indication that the Firebase team will have this functionality in 2018

@yoiang going to PR this feature, but first I'm going to slap some tests around some of this repo. There's beginning to be a fair amount of code piling up, and I want to get some tests in before it gets too heavy and we end up in a test hole.

@yoiang
Member Author

yoiang commented May 7, 2018

Let me know if I can help out on this PR, it's an important one

@jeremylorino
Contributor

@yoiang I am not very confident in putting a good suite of tests around the current repo using flow and babel. I converted to typescript and started a few tests. Before I go any further what are your thoughts?

https://github.com/now-ims/node-firestore-backup/tree/to-typescript

@yoiang
Member Author

yoiang commented May 12, 2018

Hey @jeremylorino ! Switching over to Typescript is a bit out of scope, and comes with its new set of pluses and minuses vs Javascript + Flowtype. I feel if other consumers and contributors feel strongly about Typescript we should discuss it but it should be a decision made independent of these features. It would be great to hear your concerns!

@jeremylorino
Contributor

@yoiang totally agree, which is why I went ahead and dropped y'all a line to get everyone's temperature here.

The largest factor that led me to do a TypeScript conversion was that in my "day job" we run all TypeScript, so my toolchain setup for slapping together some unit tests was already done. I normally run fairly tight tslint rules to make sure my code walks a straight line and allows me to break down the pieces for testing.

Not to mention the Firebase team is heavy on TypeScript for their Node.js projects.

What toolchain are you using for dev of this repo?
I am completely spoiled with vscode and all their fancy extensions.

@atlanteh

Any update on this?

@lpellegr

@atlanteh Restoration works perfectly with https://www.npmjs.com/package/firestore-backup-restore

@atlanteh

Oh great! Is that project deliberately separate from this one? Or is it just waiting on a PR?

@jeremylorino
Contributor

jeremylorino commented Jul 28, 2018 via email

@atlanteh

Ohh greatt! And when will that be? Or they don't disclose that yet?

@steve8708
Contributor

Looks like the official import/export tool for firestore was just released moments ago today! 🎉

Docs - Announcement

@jeremylorino
Contributor

jeremylorino commented Aug 9, 2018 via email

@rupertbulquerin

Help on installing firestore-backup-restore

npm ERR! @google-cloud/[email protected] compile: tsc -p . && cp -r dev/protos build && cp -r dev/test/fake-certificate.json build/test/fake-certificate.json && cp dev/src/v1beta1/firestore_client_config.json build/src/v1beta1/ && cp dev/conformance/test-definition.proto build/conformance && cp dev/conformance/test-suite.binproto build/conformance
