-
Notifications
You must be signed in to change notification settings - Fork 264
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PHPLIB-1324 Implement rename
for GridFS stream wrapper
#1207
Conversation
rename
using stream wrapper
a7f503a
to
6e98e2e
Compare
rename
using stream wrapper rename
for GridFS stream wrapper
dc3627a
to
cfeae45
Compare
src/GridFS/ReadableStream.php
Outdated
$revisions = $this->collectionWrapper->findFiles( | ||
['filename' => $this->file->filename], | ||
['typeMap' => ['root' => 'bson'], 'projection' => ['_id' => 1]], | ||
); | ||
$count = 0; | ||
|
||
foreach ($revisions as $revision) { | ||
$this->collectionWrapper->updateFilenameForId($revision->get('_id'), $newFilename); | ||
$count++; | ||
} | ||
|
||
return $count; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of reading results and updating them individually, we could run an updateMany
operation. We'd have to expose it in the collection wrapper, but I think it would be beneficial for the performance, especially for files with lots of revisions. The number of revisions changed can also be retrieved from the WriteResult
instance returned by updateMany
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's explicitly written in the spec to perform the update one-by-one.
Drivers construct and execute an update_one command on the files collection using
{ _id: @id }
as the filter and{ $set : { filename : "new_filename" } }
as the update parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may be worth bringing up with the wider DBX team, as that GridFS spec language definitely dates back a while and could stand to be updated. Re-reading it, it's clear to me that one of the priorities was raising an error if a file doesn't exist for a given _id
; however, I see no technical reason why an updateMany()
with all _id
values expressed in a single filter (e.g. using $in
) couldn't be used, provided the method also asserted that the expected number of documents were modified by the bulk update.
What you certainly don't want to do is attempt to update many documents based on the filename directly, since that could end up hitting additional file records that we didn't expect. That's certainly not happening here, but I wanted to point it out regardless.
src/GridFS/ReadableStream.php
Outdated
$revisions = $this->collectionWrapper->findFiles( | ||
['filename' => $this->file->filename], | ||
['typeMap' => ['root' => 'bson'], 'projection' => ['_id' => 1]], | ||
); | ||
$count = 0; | ||
|
||
foreach ($revisions as $revision) { | ||
$this->collectionWrapper->updateFilenameForId($revision->get('_id'), $newFilename); | ||
$count++; | ||
} | ||
|
||
return $count; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may be worth bringing up with the wider DBX team, as that GridFS spec language definitely dates back a while and could stand to be updated. Re-reading it, it's clear to me that one of the priorities was raising an error if a file doesn't exist for a given _id
; however, I see no technical reason why an updateMany()
with all _id
values expressed in a single filter (e.g. using $in
) couldn't be used, provided the method also asserted that the expected number of documents were modified by the bulk update.
What you certainly don't want to do is attempt to update many documents based on the filename directly, since that could end up hitting additional file records that we didn't expect. That's certainly not happening here, but I wanted to point it out regardless.
@@ -90,6 +93,30 @@ public static function register(string $protocol = 'gridfs'): void | |||
stream_wrapper_register($protocol, static::class, STREAM_IS_URL); | |||
} | |||
|
|||
/** | |||
* Rename all revisions of a filename. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this only supports the bucket alias API, since we still only have a single Bucket::rename()
method. Is this something that Drupal requires?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if Drupal needs rename
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a rename from stream wrapper only. If you do rename('original.txt', 'renamed.txt)
, you expect original.txt
to no longer exist.
src/GridFS/ReadableStream.php
Outdated
return $count; | ||
$result = $this->collectionWrapper->updateFilenameForFilename($this->file->filename, $newFilename); | ||
|
||
return $result->isAcknowledged(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would typically be true
, unless a user specified a writeConcern
of 0. In other cases there would be an exception. You can keep the previous return value by returning $result->getModifiedCount()
(which can be null
in case of a writeConcern
of 0), or always return true as the exception will tell people if it didn't succeed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to return always true
. The writeConcern = 0
shouldn't change the return.
Rename all the revisions as defined in the spec https://github.com/mongodb/specifications/blob/0b47194538aa817978fae0f77f684f6d5e62ebab/source/gridfs/gridfs-spec.rst#renaming-stored-files
*/ | ||
public function updateFilenameForFilename(string $filename, string $newFilename): UpdateResult | ||
{ | ||
return $this->filesCollection->updateMany( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this warrants a spec change to Renaming Stored Files before doing something wildly different in PHPLIB. I don't recall why filename
matching was not advised in the original spec, but selecting IDs in advance allows for error checking (see below).
As I suggested in a previous comment, I think the least controversial spec change would be to use updateMany
on the list of collected IDs. If we can't wait for that, though, sticking with the spec's existing, inefficient approach would seem the right course of action.
With regard to the discussion of successful return values in #1207 (comment), you could assert that updateMany
modifies exactly as many documents as the size of the ID list being updated and raise an exception if the write result reports an unexpected modified count. Users can decide for themselves if that's an error, but I reckon it'd only come up if a revision was deleted in another process while the rename was executing (between ID collection and issuing the updateMany
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sincerely, I don't see the difference between collecting IDs first then modifying filenames, vs making 1 single request that finds and modifies filenames.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack - I think since renaming files by filename is not part of the spec and the spec only contains a suggested implementation for users, we should be fine adding this implementation. I'd suggest we amend the spec to add APIs to work on all revisions of a file (i.e. remove and rename by filename instead of by ID).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
renaming files by filename is not part of the spec and the spec only contains a suggested implementation for users
@alcaeus: Good point. I overlooked that the instructions were telling users to run the rename()
operation for each ID, instead of discussing how a driver should implement filename-based renames.
* master: PHPLIB-1323 Implement `unlink` for GridFS stream wrapper (mongodb#1206) PHPLIB-1330: Sync tests for failCommand errorLabels reqs (mongodb#1214) PHPLIB-1246: Test PHP 8.3 on Evergreen (mongodb#1213) PHPLIB-1324 Implement `rename` for GridFS stream wrapper (mongodb#1207) PHPLIB-1248 Add examples on GridFS (mongodb#1196) Deprecate setting GridFS disableMD5 to false explicitly (mongodb#1205) PHPLIB-1326: Use more permissive top-level runOnRequirements (mongodb#1210) PHPLIB-1206 Add bucket alises for context resolver using GridFS StreamWrapper (mongodb#1138) Bump actions/upload-artifact from 3 to 4 (mongodb#1208) PHPLIB-1275: Replace apiargs usage in docs with extracts (mongodb#1203) Fix title formatting in Client::removeSubscriber() docs (mongodb#1204) PHPLIB-1304: Pull mongohouse image from ECR repo (mongodb#1202) Fix evergreen failures (mongodb#1200) Enable workflows to run for GitHub Merge Queue (mongodb#1199) PHPLIB-1313 Ensure the GridFS stream is saved when the script ends (mongodb#1197) PHPLIB-1309 Add addSubscriber/removeSubscriber to Client class to ease configuration (mongodb#1195) Master is now 1.18-dev
Fix PHPLIB-1324
Rename all the revisions as defined in the spec
Example