-
-
Notifications
You must be signed in to change notification settings - Fork 336
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds initial support for external attachments using the Filesystem or MinIO #1320
base: main
Are you sure you want to change the base?
Conversation
Avoids returning `undefined`, which simplifies a lot of the parent logic.
Helps document and prevent programmer errors by adding a utility function to turn it into a docId.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a batch of superficial comments from a skim. Feels like it is coming along pretty well!
app/common/DocumentSettings.ts
Outdated
@@ -2,6 +2,7 @@ export interface DocumentSettings { | |||
locale: string; | |||
currency?: string; | |||
engine?: EngineCode; | |||
idOfDefaultAttachmentStore?: string; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small nit: there's a lot of places in Grist where something could be called idOfFoo
and is instead called fooId
. I think that would make sense here too to rhyme with the rest of the codebase, defaultAttachmentStoreId
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(attachmentStoreId
would be fine too, locale and currency are also defaults but we don't spell that out)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My motivation for phrasing it this way specifically was the ambiguity of the variable if it ends in id
.
It could be:
defaultAttachmentStore Id
or
default attachmentStoreId
Which are semantically different interpretations of the same name! I'd be happy to swap it around if you think defaultAttachmentStoreId
is clear enough though, despite the above :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
attachmentStoreId
is my favorite now I'm afraid
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good enough for me - it's less clear for purpose, but at least resolves the ambiguity. I wasn't a big fan of idOf
anyway. 🙂
app/server/lib/ActiveDoc.ts
Outdated
@@ -2822,17 +2841,21 @@ export class ActiveDoc extends EventEmitter { | |||
return this._dataEngine; | |||
} | |||
|
|||
private async _getDocumentSettings(): Promise<DocumentSettings | undefined> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there are no settings, should that be an error? In what circumstances might that happen?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see the method gets used in a migration. That answers that. Might still make sense to throw an error, but accept that error as a non-erroneous possibility in the migration. Feels like in all other circs it should be an error? No big deal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can switch this to throwing an error, then wrap the code below in _makeEngine
in a try-catch block to handle this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that'd be fine
app/server/lib/AttachmentStore.ts
Outdated
* Gets the correct pool id for a given document, given the document's id and trunk id. | ||
* Avoids other areas of the codebase having to understand how documents are mapped to pools. | ||
* This is a key security point - documents which share a pool can access each others' attachments. | ||
* Therefore documents with different security domains (e.g from different teams) need to be sure not to share |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good now to talk about our strategy, for snapshots and forks to have the same pool id and share attachments?
app/server/lib/AttachmentStore.ts
Outdated
* This is a general-purpose interface that should abstract over many different storage providers, | ||
* so shouldn't have methods which rely on one the features of one specific provider. | ||
* | ||
* This is distinct from `ExternalStorage`, in that it's more generic and doesn't support versioning. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm "more generic" feels wrong here, since an interface mentioning docPoolIds: DocPoolId
vs key: string
feels much more specific in some ways. Not many things in the world have DocPoolIds. Maybe drop that text?
app/server/lib/AttachmentStore.ts
Outdated
export class MinIOAttachmentStore implements IAttachmentStore { | ||
constructor( | ||
public id: string, | ||
private _storage: MinIOExternalStorage, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the reason this is using MinIOExternalStorage specifically and won't work with S3ExternalStorage or AzureExternalStorage is new Stream methods, correct? It would still help I think to anticipate that developers will add those methods for those types of storage. It is totally fine just to implement for MinIO here, since this is core, but ideally the follow up work needed would be adding the methods and making small updates, and not doing a significant refactor. Trying to bear in mind for reminder of review to watch for specific dependencies on MinIO creeping in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Read rest of PR. Feels fine, no MinIO creep 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
100% - I can't remember if I left the "Make an interface for streamable external storage" in a TODO or not somewhere, but it's planned, so we can turn "MinIOAttachmentStore" into "ExternalStorageAttachmentStore"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is done now - any external storage should be usable, if wrapped in ExternalStorageAttachmentStore
, and made to implement the streaming interface
storeExists(id: AttachmentStoreId): Promise<boolean>; | ||
} | ||
|
||
export interface IAttachmentStoreBackendFactory { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extreme Java code vibes just here. If you search your heart, can you find a shorter name? I do see what you're going for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked deep into my heart and gave it a shot, let me know if this is better!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks :) yes
} | ||
|
||
public async getFileData(fileIdent: string): Promise<Buffer | null> { | ||
this._log.debug({ fileIdent }, "retrieving file data"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm concerned this logging might be a bit much. When a document is opened, it loads attachment previews. That results in 1 or 2 log lines for every attachment in a document.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That might indeed be a bit much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed 1 of the log lines - the other one is quite useful for debugging as it mentions the attachment pool, and which storage (document / external) the file is going into.
# Conflicts: # app/server/generateInitialDocSql.ts # app/server/lib/ActiveDoc.ts # app/server/lib/DocApi.ts # app/server/lib/FlexServer.ts # app/server/lib/coreCreator.ts # test/server/lib/HostedStorageManager.ts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is copying of documents with external attachments addressed? Might have missed it. If not, would want to be sure we don't miss that in follow-up.
@@ -636,6 +648,7 @@ export class ActiveDoc extends EventEmitter { | |||
useExisting?: boolean, // If set, document can be created as an overlay on | |||
// an existing sqlite file. | |||
}): Promise<ActiveDoc> { | |||
console.log(`OPENOPENOPENOPENOPENOPENOPENOPEN ${this.docName} ANYTHING ELSE`); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stray debugging message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, this and others. Thanks for catching this 😅
app/server/lib/ActiveDoc.ts
Outdated
@@ -388,6 +392,14 @@ export class ActiveDoc extends EventEmitter { | |||
loadTable: this._rawPyCall.bind(this, 'load_table'), | |||
}); | |||
|
|||
// This will throw errors if _doc or externalAttachmentStoreProvider aren't provided, and ActiveDoc tries to use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is _doc?
} | ||
|
||
/** | ||
* Instantiated on a per-document to basis to provide a document with access to its attachments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
strike "to" in "to basis"?
// TODO - Decide if we keep an entry in SQLite after an upload error or not. Probably not? | ||
await store.upload(this._getDocPoolId(), fileIdent, Readable.from(fileData)); | ||
|
||
// TODO - Confirm in doc storage that it's successfully uploaded? Need to decide how to handle a failed upload. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since external storage is not entirely under our control (an admin might reasonably delete some files there) we need to degrade gracefully if some attachments are mysteriously missing from expected location. Presumably that means that even without this TODO, a failure is non-catastrophic in the sense that the document will continue to work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not well explained in this PR, but the transfer PR cleans up these errors, and explicitly re-orders operations to ensure we're always in a valid state (if the upload fails, the whole "Upload an attachment" flow fails, and no entries go into the DB).
In the case that the file isn't accessible in external storage, it should show that the attachment isn't available in the UI. I think that's handled on the clientside when the API returns an error response?
app/server/lib/AttachmentStore.ts
Outdated
type FileId = string; | ||
|
||
|
||
// Minimum required info from a document |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Minimum required info from a document" for what purpose?
* Generally, the pool id of a document should be its trunk id if available (because it's a fork), | ||
* or the document's id (if it isn't a fork). | ||
* | ||
* This means that attachments need copying to a new pool when a document is copied. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we've talked explicitly about the consequences of document copying now being a potentially slow operation. As long as a copy fails cleanly if there's a problem, I think that would be the main thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we have either - might be worth a chat?
I think the copy behaviour is missing from this PR to be honest (:grimacing:). It might be worth opening as a separate PR at this point?
_installationUuid: string | ||
) { | ||
// In the current setup, we automatically generate store IDs based on the installation ID. | ||
// The installation ID is guaranteed to be unique, and we only allow one store of each backend type. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we only allow one store of each backend type
- I thought we agreed to move on from this as too restrictive? It makes no sense to me to allow say an s3 bucket and an azure bucket but then to kick up a fuss if the azure bucket is replaced with a distinct s3 bucket.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember discussing permitting a small label that would take precedence over storeSpec.name if given.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've since discussed this, and will use a "JSON in environment variable" approach (adding comment here for posterity)
This PR is still being worked on, but open to initial review.
Context
When attachments are uploaded to Grist, they're stored in an SQLite table within the document.
The result is that large files (multi-GB) substantially increase the size of the SQLite file, and cause performance problems (for example, with snapshots).
Proposed solution
The current design / full scope of changes can be seen in this Google document..
This PR contains a minimal proof of concept for external attachments, with the following features:
No UI exists for configuring anything related to attachment storage at this time. To use a store, the API or an SQLite client needs using to set the
idOfDefaultAttachmentStore
document setting to the ID of the desired attachment store. Attachment store IDs are logged to the console.Currently only a single store of each backend type (minio, filesystem) can exist, as they're automatically configured using the existing setting.
Related issues
#1021
Review guidance / code overview
Three significant components are added in this PR, with the rest of the PR being generally glue around them:
AttachmentFileManager
(in "AttachmentFileManager.ts") handles attachment fetching / saving logic on behalf ofActiveDoc
IAttachmentStore
is an instantiated store, that handles saving / loading attachments from a specific provider.AttachmentStoreProvider
is used to resolve store IDs to stores, and is the main way of providing other components (e.g AttachmentFileManager) with access to store instances.The available types of attachment store are defined in
coreCreator.ts
, under the name "attachmentStoreBackends".Additionally:
Migrations / schema changes
_gristsys_files
has been added, which adds a column to track which store a document is in.idOfDefaultAttachmentStore
has been added to theDocumentSettings
JSON blob.Both of these should be non-breaking on older installations, as both should be ignored.
Has this been tested?
Behaviour has been manually tested in the following ways:
There are currently no unit tests. These should be added before merging, but the goal.
Recommended local testing setup
MinIO
"idOfDefaultAttachmentStore": "store id"
to_grist_DocInfo
->documentSettings
Known issues