Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add MongoDB GridFS Adapter #1794

Merged
merged 10 commits into from
May 22, 2024
Merged

Add MongoDB GridFS Adapter #1794

merged 10 commits into from
May 22, 2024

Conversation

GromNaN
Copy link
Contributor

@GromNaN GromNaN commented May 13, 2024

GridFS is a file system that uses MongoDB as storage. I would like to propose this adapter which uses the MongoDB PHP Library.

Docs PR: #1795

The legacy GridFS Flysystem adapter had some significant number of installations, even if it used the legacy Mongo extension. It shows there's some interest.

Important notes:

  • Directories does not exist in GridFS. I use the same technic as for S3 when explicitly creating a directory: an empty file with a trailing / is created. The creation of a file with a name ending in a / is therefore prohibited.
  • Permissions are not supported by GridFS. When explicitly set, I store the visibility in the metadata.flysystem_visibility attribute of the file.
  • GridFS' contentType attribute is deprecated. The attribute metadata.contentType is used instead.
  • GridFS supports versioning by default. Writing to an existing file creates a new revision. Deleting removes all the revisions. Moving renames all the revisions.
  • For GitHub Action, the mongodb extension is installed and a mongodb server is started using Docker Compose.
  • I added some tests in the shared FilesystemAdapterTestCase in order to improve coverage on mimeType and visibility.

src/GridFS/.github/FUNDING.yml Outdated Show resolved Hide resolved
src/GridFS/GridFSAdapter.php Show resolved Hide resolved
}

try {
$this->bucket->getFilesCollection()->updateMany(
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted that this is the same approach used in mongodb/mongo-php-library#1207 (PHPLIB-1324), despite there being no standard approach documented in the GridFS spec.

src/GridFS/GridFSAdapter.php Outdated Show resolved Hide resolved
src/GridFS/GridFSAdapter.php Outdated Show resolved Hide resolved
}
}

public function createDirectory(string $path, Config $config): void
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the same approach used in other storage drivers that don't have a native concept of directories (e.g. S3)? I wonder if it'd be sensible to add a boolean metadata field to identify this as a directory, as there is otherwise nothing to distinguish this from an empty file.

I'm thinking more about other tooling that may work with the same GridFS bucket, even if such a field isn't directly usable by this adapter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using this adapter, trailing / are forbidden. So a file with a trailing / is a directory.

I just added a metadata.flysystem_directory as marker.

src/GridFS/LICENSE Outdated Show resolved Hide resolved
src/GridFS/composer.json Outdated Show resolved Hide resolved
@GromNaN GromNaN force-pushed the gridfs branch 3 times, most recently from 511da2b to 8c7daa1 Compare May 14, 2024 06:59
@frankdejonge
Copy link
Member

Hi all, I'm out this week but will get on this ASAP to get this out. Thanks for the effort and the interest to contribute. Would there be anybody interested from your end to do continued maintenance in this adapter? And if so, who? Then I know who to rope in with issues.

@GromNaN
Copy link
Contributor Author

GromNaN commented May 14, 2024

Hello Frank, we are a team of currently 3 devs, led by @alcaeus. We use an open Jira to organise our work and Flysystem is one of the packages we want to maintain.

}

// Deleting a file that does not exist is no-op
while ($file = $this->findFile($path)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. why is this a while loop?
  2. this block may be affected by a race condition when two deletions are happening concurrently. In other implementations we try to delete and ignore the exception if it was caused by a "file not found" situation. Can we inspect the exception to see if occurs and ignore the error when that's the case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The while was necessary to remove all the revisions of the file name. I replaced the code with a "find all revisions" followed by "delete" on each revision. Good point about race-condition, I added a try-catch.

GromNaN added 2 commits May 20, 2024 10:15
Fix ext-mongodb explicit dependency for exception

Use generic type for file object
@GromNaN GromNaN requested review from frankdejonge and jmikola May 20, 2024 11:29
src/GridFS/GridFSAdapter.php Show resolved Hide resolved
src/GridFS/GridFSAdapter.php Outdated Show resolved Hide resolved
];

if ($visibility = $config->get(Config::OPTION_VISIBILITY)) {
$options['metadata'][self::METADATA_VISIBILITY] = $visibility;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose of this option? I don't follow why it's only injected into metadata when truthy. I noted this same logic appears earlier in the write and writeStream methods.

It seems contrary to the logic in setVisibility() below, which could end up storing a true or false value to the field.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to store visibility unless it's explicitly set. setVisibility requires a value. For write, writeStream, createDirectory and copy, this is an option.

@frankdejonge Since there's no concept of file-level permission in GridFS, would it be better to disable the functionality altogether, as is done for WebDAV?

src/GridFS/GridFSAdapter.php Outdated Show resolved Hide resolved
src/GridFS/GridFSAdapter.php Outdated Show resolved Hide resolved
src/GridFS/GridFSAdapter.php Show resolved Hide resolved
src/GridFS/GridFSAdapter.php Show resolved Hide resolved
class GridFSAdapter implements FilesystemAdapter
{
private const METADATA_DIRECTORY = 'flysystem_directory';
private const METADATA_VISIBILITY = 'flysystem_visibility';
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose of this metadata field? I only see it being set during write operations and incorporated into returned FileAttributes or DirectoryAttributes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to #1794 (comment)

Copy link

@jmikola jmikola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I'll defer to you both to decide whether to remove the the visibility option (https://github.com/thephpleague/flysystem/pull/1794/files#r1606938015).

@frankdejonge
Copy link
Member

Looks like the test case GridFSAdapterTest::move_all_revisions is failing.

@GromNaN
Copy link
Contributor Author

GromNaN commented May 21, 2024

Looks like the test case GridFSAdapterTest::move_all_revisions is failing.

Where do you see that? It seems to work in GitHub Actions: https://github.com/thephpleague/flysystem/actions/runs/9161336038/job/25242807530?pr=1794

@frankdejonge
Copy link
Member

@GromNaN this is an example; https://github.com/thephpleague/flysystem/actions/runs/9180056390/job/25244027378?pr=1794 might be because of eventual consistency in MongoDB? Is that a thing? Previous run had more errors on the same test case.

@GromNaN
Copy link
Contributor Author

GromNaN commented May 21, 2024

Ok, I suspect that it's because 2 files are created in the same millisecond and revisions are sorted using this date.

@frankdejonge
Copy link
Member

@GromNaN still one failing, is there another way to deal with this or to configure it? https://github.com/GromNaN/flysystem/actions/runs/9181466558/job/25248713035

@GromNaN
Copy link
Contributor Author

GromNaN commented May 22, 2024

It seems that my previous guess was correct but the fix incorrectly implemented. I added usleep(1000) (1ms) between each insertion of a revision to have a different uploadDate.

@frankdejonge frankdejonge merged commit 17f5fab into thephpleague:3.x May 22, 2024
6 checks passed
@GromNaN GromNaN deleted the gridfs branch July 13, 2024 06:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants