Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PHPLIB-1324 Implement rename for GridFS stream wrapper #1207

Merged
merged 3 commits into from
Jan 9, 2024

Conversation

GromNaN
Copy link
Member

@GromNaN GromNaN commented Dec 15, 2023

Fix PHPLIB-1324

Rename all the revisions as defined in the spec

To rename multiple revisions of the same filename, users must retrieve the full list of files collection documents for a given filename and execute “rename” on each corresponding “_id”.

Example

file_put_contents('gridfs://bucket/filename', 'Hello, GridFS!');
rename('gridfs://bucket/filename', 'gridfs://bucket/newname');
echo file_get_contents('gridfs://bucket/newname');

@GromNaN GromNaN changed the title PHPLIB-1324 Implement rename using stream wrapper PHPLIB-1324 Implement rename using stream wrapper Dec 15, 2023
@GromNaN GromNaN force-pushed the PHPLIB-1324 branch 2 times, most recently from a7f503a to 6e98e2e Compare December 15, 2023 23:01
@GromNaN GromNaN changed the title PHPLIB-1324 Implement rename using stream wrapper PHPLIB-1324 Implement rename for GridFS stream wrapper Dec 15, 2023
@GromNaN GromNaN force-pushed the PHPLIB-1324 branch 3 times, most recently from dc3627a to cfeae45 Compare December 18, 2023 10:08
@GromNaN GromNaN marked this pull request as ready for review December 18, 2023 10:08
@GromNaN GromNaN requested review from alcaeus and jmikola December 18, 2023 10:08
Comment on lines 188 to 199
$revisions = $this->collectionWrapper->findFiles(
['filename' => $this->file->filename],
['typeMap' => ['root' => 'bson'], 'projection' => ['_id' => 1]],
);
$count = 0;

foreach ($revisions as $revision) {
$this->collectionWrapper->updateFilenameForId($revision->get('_id'), $newFilename);
$count++;
}

return $count;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of reading results and updating them individually, we could run an updateMany operation. We'd have to expose it in the collection wrapper, but I think it would be beneficial for the performance, especially for files with lots of revisions. The number of revisions changed can also be retrieved from the WriteResult instance returned by updateMany.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's explicitly written in the spec to perform the update one-by-one.

Drivers construct and execute an update_one command on the files collection using { _id: @id } as the filter and { $set : { filename : "new_filename" } } as the update parameter.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be worth bringing up with the wider DBX team, as that GridFS spec language definitely dates back a while and could stand to be updated. Re-reading it, it's clear to me that one of the priorities was raising an error if a file doesn't exist for a given _id; however, I see no technical reason why an updateMany() with all _id values expressed in a single filter (e.g. using $in) couldn't be used, provided the method also asserted that the expected number of documents were modified by the bulk update.

What you certainly don't want to do is attempt to update many documents based on the filename directly, since that could end up hitting additional file records that we didn't expect. That's certainly not happening here, but I wanted to point it out regardless.

Comment on lines 188 to 199
$revisions = $this->collectionWrapper->findFiles(
['filename' => $this->file->filename],
['typeMap' => ['root' => 'bson'], 'projection' => ['_id' => 1]],
);
$count = 0;

foreach ($revisions as $revision) {
$this->collectionWrapper->updateFilenameForId($revision->get('_id'), $newFilename);
$count++;
}

return $count;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be worth bringing up with the wider DBX team, as that GridFS spec language definitely dates back a while and could stand to be updated. Re-reading it, it's clear to me that one of the priorities was raising an error if a file doesn't exist for a given _id; however, I see no technical reason why an updateMany() with all _id values expressed in a single filter (e.g. using $in) couldn't be used, provided the method also asserted that the expected number of documents were modified by the bulk update.

What you certainly don't want to do is attempt to update many documents based on the filename directly, since that could end up hitting additional file records that we didn't expect. That's certainly not happening here, but I wanted to point it out regardless.

src/GridFS/StreamWrapper.php Outdated Show resolved Hide resolved
@@ -90,6 +93,30 @@ public static function register(string $protocol = 'gridfs'): void
stream_wrapper_register($protocol, static::class, STREAM_IS_URL);
}

/**
* Rename all revisions of a filename.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this only supports the bucket alias API, since we still only have a single Bucket::rename() method. Is this something that Drupal requires?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if Drupal needs rename.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a rename from stream wrapper only. If you do rename('original.txt', 'renamed.txt), you expect original.txt to no longer exist.

return $count;
$result = $this->collectionWrapper->updateFilenameForFilename($this->file->filename, $newFilename);

return $result->isAcknowledged();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would typically be true, unless a user specified a writeConcern of 0. In other cases there would be an exception. You can keep the previous return value by returning $result->getModifiedCount() (which can be null in case of a writeConcern of 0), or always return true as the exception will tell people if it didn't succeed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to return always true. The writeConcern = 0 shouldn't change the return.

*/
public function updateFilenameForFilename(string $filename, string $newFilename): UpdateResult
{
return $this->filesCollection->updateMany(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this warrants a spec change to Renaming Stored Files before doing something wildly different in PHPLIB. I don't recall why filename matching was not advised in the original spec, but selecting IDs in advance allows for error checking (see below).

As I suggested in a previous comment, I think the least controversial spec change would be to use updateMany on the list of collected IDs. If we can't wait for that, though, sticking with the spec's existing, inefficient approach would seem the right course of action.

With regard to the discussion of successful return values in #1207 (comment), you could assert that updateMany modifies exactly as many documents as the size of the ID list being updated and raise an exception if the write result reports an unexpected modified count. Users can decide for themselves if that's an error, but I reckon it'd only come up if a revision was deleted in another process while the rename was executing (between ID collection and issuing the updateMany).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sincerely, I don't see the difference between collecting IDs first then modifying filenames, vs making 1 single request that finds and modifies filenames.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack - I think since renaming files by filename is not part of the spec and the spec only contains a suggested implementation for users, we should be fine adding this implementation. I'd suggest we amend the spec to add APIs to work on all revisions of a file (i.e. remove and rename by filename instead of by ID).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renaming files by filename is not part of the spec and the spec only contains a suggested implementation for users

@alcaeus: Good point. I overlooked that the instructions were telling users to run the rename() operation for each ID, instead of discussing how a driver should implement filename-based renames.

@GromNaN GromNaN merged commit 401bbd7 into mongodb:master Jan 9, 2024
24 checks passed
@GromNaN GromNaN deleted the PHPLIB-1324 branch January 9, 2024 08:48
alcaeus added a commit to alcaeus/mongo-php-library that referenced this pull request Jan 15, 2024
* master:
  PHPLIB-1323 Implement `unlink` for GridFS stream wrapper (mongodb#1206)
  PHPLIB-1330: Sync tests for failCommand errorLabels reqs (mongodb#1214)
  PHPLIB-1246: Test PHP 8.3 on Evergreen (mongodb#1213)
  PHPLIB-1324 Implement `rename` for GridFS stream wrapper  (mongodb#1207)
  PHPLIB-1248 Add examples on GridFS (mongodb#1196)
  Deprecate setting GridFS disableMD5 to false explicitly (mongodb#1205)
  PHPLIB-1326: Use more permissive top-level runOnRequirements (mongodb#1210)
  PHPLIB-1206 Add bucket alises for context resolver using GridFS StreamWrapper (mongodb#1138)
  Bump actions/upload-artifact from 3 to 4 (mongodb#1208)
  PHPLIB-1275: Replace apiargs usage in docs with extracts (mongodb#1203)
  Fix title formatting in Client::removeSubscriber() docs (mongodb#1204)
  PHPLIB-1304: Pull mongohouse image from ECR repo (mongodb#1202)
  Fix evergreen failures (mongodb#1200)
  Enable workflows to run for GitHub Merge Queue (mongodb#1199)
  PHPLIB-1313 Ensure the GridFS stream is saved when the script ends (mongodb#1197)
  PHPLIB-1309 Add addSubscriber/removeSubscriber to Client class to ease configuration (mongodb#1195)
  Master is now 1.18-dev
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants