Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds initial support for external attachments using the Filesystem or MinIO #1320

Merged
merged 62 commits into from
Jan 8, 2025
Merged
Show file tree
Hide file tree
Changes from 37 commits
Commits
Show all changes
62 commits
Select commit Hold shift + click to select a range
2473375
Enables checksumFile to handle streams
Spoffy Oct 27, 2024
ff2fe69
Routes attachment handling through AttachmentFileManager
Spoffy Oct 27, 2024
fc35744
Switches ActiveDoc to using AttachmentFileManager for attachments
Spoffy Nov 4, 2024
ef13def
Fixes a casing error in DocStorage
Spoffy Nov 4, 2024
d19ff54
Adds working filesystem store (in use by default)
Spoffy Nov 4, 2024
8295f5e
Avoids storing external attachments in the sqlite file
Spoffy Nov 4, 2024
8f09573
Renames some parameters in AttachmentStore
Spoffy Nov 4, 2024
72f2e5c
Adds "removeAllWithPrefix?" to ExternalStorage and MinIOExternalStorage
Spoffy Nov 12, 2024
5e26ec8
Adds a streaming API to MinIOExternalStorage
Spoffy Nov 12, 2024
20a1a79
Adds prototype MinIO attachment store
Spoffy Nov 15, 2024
482b5c7
Makes AttachmentStoreProvider function correctly
Spoffy Nov 15, 2024
59d1633
Enables setting document attachment store
Spoffy Nov 16, 2024
c1058da
Makes store creation error if not setup
Spoffy Nov 18, 2024
74102d0
Refactors DocInfo to be reusable
Spoffy Nov 19, 2024
6640133
Fixes pool deletion not working for MinIO
Spoffy Nov 19, 2024
85ba845
Adds lifecycle management to external attachments
Spoffy Nov 19, 2024
3902f82
Adds attachment upload/download logging
Spoffy Nov 22, 2024
4085c06
Fixes test build errors
Spoffy Nov 22, 2024
65bc7a9
Misc. cleanup for external attachments.
Spoffy Nov 22, 2024
ce5282f
Fixes import ordering issues
Spoffy Nov 25, 2024
e53f6eb
Improves documentation of AttachmentFileManager
Spoffy Nov 25, 2024
819e0d6
Switches AttachmentFileManager to structured logging
Spoffy Nov 26, 2024
52b7e24
Tidies up imports
Spoffy Nov 26, 2024
79396dc
Adds doc pool description to AttachmentStore
Spoffy Nov 26, 2024
d527e59
Improves naming of attachment store backend options
Spoffy Nov 26, 2024
c3c82bd
Adds explanation on store concepts
Spoffy Nov 26, 2024
dc19e54
Changes MinIOAttachmentStore to ExternalStorageAttachmentStore
Spoffy Nov 26, 2024
4d7dc11
Makes attachment store API stream based
Spoffy Nov 27, 2024
340c5b3
Adds tests for FilesystemAttachmentStore.ts
Spoffy Dec 1, 2024
6af1818
Makes attachment store creation async, removes check methods
Spoffy Dec 2, 2024
8ca7196
Adds tests for AttachmentStoreProvider
Spoffy Dec 2, 2024
2d59def
Fixes type for FileInfo.storageId
Spoffy Dec 2, 2024
fc9db0c
Adds listAllStoreIds to IAttachmentStoreProvider
Spoffy Dec 2, 2024
72eecf3
Fixes typing on AttachmentFileManagerLogInfo
Spoffy Dec 2, 2024
cdcfb0d
Make makeTestingFilesystemStoreSpec reuse the directory
Spoffy Dec 3, 2024
0bddba2
Adds AttachmentFileManager unit tests
Spoffy Dec 3, 2024
b9d2f8c
Adds attachment store backend availability checks
Spoffy Dec 3, 2024
e5b508c
Merge branch 'main' into spoffy/external-attachments-prototype
Spoffy Dec 3, 2024
f712ef8
Cleans up linter issues
Spoffy Dec 4, 2024
364ac08
Fixes findOrAttachFile test
Spoffy Dec 4, 2024
ae78bba
Removes unnecessary log line
Spoffy Dec 4, 2024
4730d99
Makes _getDocumentSettings error if not found.
Spoffy Dec 4, 2024
c39806f
Undoes erroneous storage schema change
Spoffy Dec 4, 2024
58a101d
Fix getDocumentSettings bad error condition
Spoffy Dec 6, 2024
785971f
Merge branch 'main' into spoffy/external-attachments-prototype
Spoffy Dec 6, 2024
f93948f
Updates migration tests with additional fixtures
Spoffy Dec 7, 2024
a4768a5
Fixes an error with missing storageId in doc schema
Spoffy Dec 7, 2024
c41b4db
Attempt to change BlobMigrationV8 to V9 using upgradeDocument
Spoffy Dec 9, 2024
f0222ca
Updates BlobMigrationV9 and associated tooling
Spoffy Dec 10, 2024
3008fdd
Fixes failing DocStorage tests
Spoffy Dec 10, 2024
4a1c112
Renames "idOfDefaultAttachmentStore" to "attachmentStoreId"
Spoffy Dec 10, 2024
adac94b
Makes getFileData throw an error if file is missing
Spoffy Dec 10, 2024
3b0f603
Updates comments to address feedback
Spoffy Dec 30, 2024
2ba5848
Removes leftover debugging log
Spoffy Jan 2, 2025
635a92c
Updates AttachmentStore comments
Spoffy Jan 2, 2025
dc55180
Makes removeAllWithPrefix check clearer
Spoffy Jan 2, 2025
ae87fa1
Simplifies MinIOExternalStorage "_listObjects" method
Spoffy Jan 2, 2025
b46b320
Fixes test import
Spoffy Jan 2, 2025
e44aadf
Removes TODO from coreCreator.ts
Spoffy Jan 3, 2025
bc2a440
Fixes linter issue
Spoffy Jan 6, 2025
3c2c828
Tidies imports and adds comments
Spoffy Jan 8, 2025
e1678dd
More import fixes
Spoffy Jan 8, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions app/common/DocumentSettings.ts
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@ export interface DocumentSettings {
locale: string;
currency?: string;
engine?: EngineCode;
idOfDefaultAttachmentStore?: string;
Spoffy marked this conversation as resolved.
Show resolved Hide resolved
}

/**
Expand Down
12 changes: 7 additions & 5 deletions app/server/generateInitialDocSql.ts
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
import { ActiveDoc } from 'app/server/lib/ActiveDoc';
import { AttachmentStoreProvider } from "app/server/lib/AttachmentStoreProvider";
Spoffy marked this conversation as resolved.
Show resolved Hide resolved
import { create } from 'app/server/lib/create';
import { DocManager } from 'app/server/lib/DocManager';
import { makeExceptionalDocSession } from 'app/server/lib/DocSession';
Expand Down Expand Up @@ -32,11 +33,12 @@ export async function main(baseName: string) {
if (await fse.pathExists(fname)) {
await fse.remove(fname);
}
const docManager = new DocManager(storageManager, pluginManager, null as any, {
create,
getAuditLogger() { return createDummyAuditLogger(); },
getTelemetry() { return createDummyTelemetry(); },
} as any);
const docManager = new DocManager(storageManager, pluginManager, null as any, new AttachmentStoreProvider([], ""), {
create,
getAuditLogger() { return createDummyAuditLogger(); },
getTelemetry() { return createDummyTelemetry(); },
} as any
);
const activeDoc = new ActiveDoc(docManager, baseName);
const session = makeExceptionalDocSession('nascent');
await activeDoc.createEmptyDocWithDataEngine(session);
Expand Down
51 changes: 37 additions & 14 deletions app/server/lib/ActiveDoc.ts
Original file line number Diff line number Diff line change
Expand Up @@ -95,7 +95,6 @@ import {AssistanceSchemaPromptV1Context} from 'app/server/lib/Assistance';
import {AssistanceContext} from 'app/common/AssistancePrompts';
import {AuditEventProperties} from 'app/server/lib/AuditLogger';
import {Authorizer, RequestWithLogin} from 'app/server/lib/Authorizer';
import {checksumFile} from 'app/server/lib/checksumFile';
import {Client} from 'app/server/lib/Client';
import {getMetaTables} from 'app/server/lib/DocApi';
import {DEFAULT_CACHE_TTL, DocManager} from 'app/server/lib/DocManager';
Expand Down Expand Up @@ -127,6 +126,7 @@ import assert from 'assert';
import {Mutex} from 'async-mutex';
import * as bluebird from 'bluebird';
import {EventEmitter} from 'events';
import { readFile } from 'fs-extra';
import {IMessage, MsgType} from 'grain-rpc';
import imageSize from 'image-size';
import * as moment from 'moment-timezone';
Expand All @@ -137,6 +137,8 @@ import tmp from 'tmp';
import {ActionHistory} from './ActionHistory';
import {ActionHistoryImpl} from './ActionHistoryImpl';
import {ActiveDocImport, FileImportOptions} from './ActiveDocImport';
import {AttachmentFileManager} from "./AttachmentFileManager";
Spoffy marked this conversation as resolved.
Show resolved Hide resolved
import {IAttachmentStoreProvider} from "./AttachmentStoreProvider";
import {DocClients} from './DocClients';
import {DocPluginManager} from './DocPluginManager';
import {DocSession, makeExceptionalDocSession, OptDocSession} from './DocSession';
Expand Down Expand Up @@ -268,6 +270,7 @@ export class ActiveDoc extends EventEmitter {
private _onlyAllowMetaDataActionsOnDb: boolean = false;
// Cache of which columns are attachment columns.
private _attachmentColumns?: AttachmentColumns;
private _attachmentFileManager: AttachmentFileManager;

// Client watching for 'product changed' event published by Billing to update usage
private _redisSubscriber?: RedisClient;
Expand All @@ -283,7 +286,12 @@ export class ActiveDoc extends EventEmitter {
private _doShutdown?: Promise<void>;
private _intervals: Interval[] = [];

constructor(docManager: DocManager, docName: string, private _options?: ICreateActiveDocOptions) {
constructor(
docManager: DocManager,
docName: string,
externalAttachmentStoreProvider?: IAttachmentStoreProvider,
private _options?: ICreateActiveDocOptions,
) {
super();
const {forkId, snapshotId} = parseUrlId(docName);
this._isSnapshot = Boolean(snapshotId);
Expand Down Expand Up @@ -378,6 +386,14 @@ export class ActiveDoc extends EventEmitter {
loadTable: this._rawPyCall.bind(this, 'load_table'),
});

// This will throw errors if _doc or externalAttachmentStoreProvider aren't provided, and ActiveDoc tries to use
paulfitz marked this conversation as resolved.
Show resolved Hide resolved
// an external attachment store.
this._attachmentFileManager = new AttachmentFileManager(
this.docStorage,
externalAttachmentStoreProvider,
this._doc
);

// Our DataEngine is a separate sandboxed process (one sandbox per open document,
// corresponding to one process for pynbox, more for gvisor).
// The data engine runs user-defined python code including formula calculations.
Expand Down Expand Up @@ -912,7 +928,7 @@ export class ActiveDoc extends EventEmitter {
}
}
}
const data = await this.docStorage.getFileData(fileIdent);
const data = await this._attachmentFileManager.getFileData(fileIdent);
if (!data) { throw new ApiError("Invalid attachment identifier", 404); }
this._log.info(docSession, "getAttachment: %s -> %s bytes", fileIdent, data.length);
return data;
Expand Down Expand Up @@ -2347,13 +2363,16 @@ export class ActiveDoc extends EventEmitter {
dimensions.height = 0;
dimensions.width = 0;
}
const checksum = await checksumFile(fileData.absPath);
const fileIdent = checksum + fileData.ext;
const ret: boolean = await this.docStorage.findOrAttachFile(fileData.absPath, fileIdent);
this._log.info(docSession, "addAttachment: file %s (image %sx%s) %s", fileIdent,
dimensions.width, dimensions.height, ret ? "attached" : "already exists");
const attachmentStoreId = (await this._getDocumentSettings())?.idOfDefaultAttachmentStore;
const addFileResult = await this._attachmentFileManager
.addFile(attachmentStoreId, fileData.ext, await readFile(fileData.absPath));
this._log.info(
docSession, "addAttachment: store: '%s', file: '%s' (image %sx%s) %s",
attachmentStoreId ?? 'local document', addFileResult.fileIdent, dimensions.width, dimensions.height,
addFileResult.isNewFile ? "attached" : "already exists"
);
return ['AddRecord', '_grist_Attachments', null, {
fileIdent,
fileIdent: addFileResult.fileIdent,
fileName: fileData.origName,
// We used to set fileType, but it's not easily available for native types. Since it's
// also entirely unused, we just skip it until it becomes relevant.
Expand Down Expand Up @@ -2822,17 +2841,21 @@ export class ActiveDoc extends EventEmitter {
return this._dataEngine;
}

private async _getDocumentSettings(): Promise<DocumentSettings | undefined> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there are no settings, should that be an error? In what circumstances might that happen?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see the method gets used in a migration. That answers that. Might still make sense to throw an error, but accept that error as a non-erroneous possibility in the migration. Feels like in all other circs it should be an error? No big deal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can switch this to throwing an error, then wrap the code below in _makeEngine in a try-catch block to handle this case?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that'd be fine

const docInfo = await this.docStorage.get('SELECT documentSettings FROM _grist_DocInfo');
const docSettingsString = docInfo?.documentSettings;
return docSettingsString ? safeJsonParse(docSettingsString, undefined) : undefined;
}

private async _makeEngine(): Promise<ISandbox> {
// Figure out what kind of engine we need for this document.
let preferredPythonVersion: '2' | '3' = process.env.PYTHON_VERSION === '3' ? '3' : '2';

// Careful, migrations may not have run on this document and it may not have a
// documentSettings column. Failures are treated as lack of an engine preference.
const docInfo = await this.docStorage.get('SELECT documentSettings FROM _grist_DocInfo').catch(e => undefined);
const docSettingsString = docInfo?.documentSettings;
if (docSettingsString) {
const docSettings: DocumentSettings|undefined = safeJsonParse(docSettingsString, undefined);
const engine = docSettings?.engine;
const docSettings = await this._getDocumentSettings().catch(e => undefined);
if (docSettings) {
const engine = docSettings.engine;
if (engine) {
if (engine === 'python2') {
preferredPythonVersion = '2';
Expand Down
204 changes: 204 additions & 0 deletions app/server/lib/AttachmentFileManager.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,204 @@
import {
AttachmentStoreDocInfo,
DocPoolId,
getDocPoolIdFromDocInfo,
IAttachmentStore
} from "app/server/lib/AttachmentStore";
import { AttachmentStoreId, IAttachmentStoreProvider } from "app/server/lib/AttachmentStoreProvider";
import { checksumFileStream } from "app/server/lib/checksumFile";
import { DocStorage } from "app/server/lib/DocStorage";
import log from "app/server/lib/log";
import { LogMethods } from "app/server/lib/LogMethods";
import { MemoryWritableStream } from "app/server/utils/MemoryWritableStream";
import { Readable } from "node:stream";

export interface IAttachmentFileManager {
addFile(storeId: AttachmentStoreId, fileExtension: string, fileData: Buffer): Promise<AddFileResult>;
getFileData(fileIdent: string): Promise<Buffer | null>;
}

export interface AddFileResult {
fileIdent: string;
isNewFile: boolean;
}

export class StoresNotConfiguredError extends Error {
constructor() {
super('Attempted to access a file store, but AttachmentFileManager was initialized without store access');
}
}

export class StoreNotAvailableError extends Error {
public readonly storeId: AttachmentStoreId;

constructor(storeId: AttachmentStoreId) {
super(`Store '${storeId}' is not a valid and available store`);
this.storeId = storeId;
}
}

export class AttachmentRetrievalError extends Error {
public readonly storeId: AttachmentStoreId;
public readonly fileId: string;

constructor(storeId: AttachmentStoreId, fileId: string, cause?: any) {
const causeError = cause instanceof Error ? cause : undefined;
const causeDescriptor = causeError ? `: ${cause.message}` : '';
super(`Unable to retrieve '${fileId}' from '${storeId}'${causeDescriptor}`);
this.storeId = storeId;
this.fileId = fileId;
this.cause = causeError;
}
}


interface AttachmentFileManagerLogInfo {
fileIdent?: string;
storeId?: string | null;
}

/**
* Instantiated on a per-document to basis to provide a document with access to its attachments.
paulfitz marked this conversation as resolved.
Show resolved Hide resolved
* Handles attachment uploading / fetching, as well as trying to ensure consistency with the local document database,
* which tracks attachments and where they're stored.
*
* This class should prevent the document code from having to worry about accessing the underlying stores.
*/
export class AttachmentFileManager implements IAttachmentFileManager {
// _docPoolId is a critical point for security. Documents with a common pool id can access each others' attachments.
paulfitz marked this conversation as resolved.
Show resolved Hide resolved
private readonly _docPoolId: DocPoolId | null;
private readonly _docName: string;
private _log = new LogMethods(
"AttachmentFileManager ",
(logInfo: AttachmentFileManagerLogInfo) => this._getLogMeta(logInfo)
);

/**
* @param _docStorage - Storage of this manager's document.
* @param _storeProvider - Allows instantiating of stores. Should be provided except in test scenarios.
* @param _docInfo - The document this manager is for. Should be provided except in test scenarios.
*/
constructor(
private _docStorage: DocStorage,
private _storeProvider: IAttachmentStoreProvider | undefined,
_docInfo: AttachmentStoreDocInfo | undefined,
) {
this._docName = _docStorage.docName;
this._docPoolId = _docInfo ? getDocPoolIdFromDocInfo(_docInfo) : null;
}

public async addFile(
storeId: AttachmentStoreId | undefined,
fileExtension: string,
fileData: Buffer
): Promise<AddFileResult> {
const fileIdent = await this._getFileIdentifier(fileExtension, Readable.from(fileData));
this._log.info({ fileIdent, storeId }, `adding file to ${storeId ? "external" : "document"} storage`);
if (storeId === undefined) {
return this._addFileToLocalStorage(fileIdent, fileData);
}
const store = await this._getStore(storeId);
if (!store) {
this._log.info({ fileIdent, storeId }, "tried to fetch attachment from an unavailable store");
throw new StoreNotAvailableError(storeId);
}
return this._addFileToAttachmentStore(store, fileIdent, fileData);
}

// TODO Given we throw if the underlying store can't retrieve the file, maybe this should throw if we're missing
// the fileInfo from doc storage too, instead of returning null?
// That, or we return a result type which might have an error in.
public async getFileData(fileIdent: string): Promise<Buffer | null> {
this._log.debug({ fileIdent }, "retrieving file data");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned this logging might be a bit much. When a document is opened, it loads attachment previews. That results in 1 or 2 log lines for every attachment in a document.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That might indeed be a bit much.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed 1 of the log lines - the other one is quite useful for debugging as it mentions the attachment pool, and which storage (document / external) the file is going into.

const fileInfo = await this._docStorage.getFileInfo(fileIdent);
if (!fileInfo) {
this._log.warn({ fileIdent }, "cannot find file metadata in document");
return null;
}
this._log.debug(
{ fileIdent, storeId: fileInfo.storageId },
`fetching attachment from ${fileInfo.storageId ? "external" : "document "} storage`
);
if (!fileInfo.storageId) {
return fileInfo.data;
}
const store = await this._getStore(fileInfo.storageId);
if (!store) {
this._log.warn({ fileIdent, storeId: fileInfo.storageId }, `unable to retrieve file, store is unavailable`);
throw new StoreNotAvailableError(fileInfo.storageId);
}
return this._getFileDataFromAttachmentStore(store, fileIdent);
}

private async _addFileToLocalStorage(fileIdent: string, fileData: Buffer): Promise<AddFileResult> {
const isNewFile = await this._docStorage.findOrAttachFile(fileIdent, fileData);

return {
fileIdent,
isNewFile,
};
}

private async _getStore(storeId: AttachmentStoreId): Promise<IAttachmentStore | null> {
if (!this._storeProvider) {
throw new StoresNotConfiguredError();
}
return this._storeProvider.getStore(storeId);
}

private _getDocPoolId(): string {
if (!this._docPoolId) {
throw new StoresNotConfiguredError();
}
return this._docPoolId;
}

private async _getFileIdentifier(fileExtension: string, fileData: Readable): Promise<string> {
const checksum = await checksumFileStream(fileData);
return `${checksum}${fileExtension}`;
}

private async _addFileToAttachmentStore(
store: IAttachmentStore, fileIdent: string, fileData: Buffer
): Promise<AddFileResult> {
const isNewFile = await this._docStorage.findOrAttachFile(fileIdent, undefined, store.id);

// Verify the file exists in the store. This allows for a second attempt to correct a failed upload.
const existsInRemoteStorage = !isNewFile && await store.exists(this._getDocPoolId(), fileIdent);

if (!isNewFile && existsInRemoteStorage) {
return {
fileIdent,
isNewFile: false,
};
}

// Possible issue if this upload fails - we have the file tracked in the document, but not available in the store.
// TODO - Decide if we keep an entry in SQLite after an upload error or not. Probably not?
await store.upload(this._getDocPoolId(), fileIdent, Readable.from(fileData));

// TODO - Confirm in doc storage that it's successfully uploaded? Need to decide how to handle a failed upload.
paulfitz marked this conversation as resolved.
Show resolved Hide resolved
return {
fileIdent,
isNewFile,
};
}

private async _getFileDataFromAttachmentStore(store: IAttachmentStore, fileIdent: string): Promise<Buffer> {
try {
const outputStream = new MemoryWritableStream();
await store.download(this._getDocPoolId(), fileIdent, outputStream);
return outputStream.getBuffer();
} catch(e) {
throw new AttachmentRetrievalError(store.id, fileIdent, e);
}
}

private _getLogMeta(logInfo?: AttachmentFileManagerLogInfo): log.ILogMeta {
return {
docName: this._docName,
docPoolId: this._docPoolId,
...logInfo,
};
}
}
Loading