-
Notifications
You must be signed in to change notification settings - Fork 264
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PHPLIB-1220: Support codec option for GridFS buckets #1149
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test codec is a good simple example of object mapping 👏🏻.
LGTM. It's up to you to allow null
to unset the codec option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
c8dd067
to
d6dce41
Compare
src/GridFS/WritableStream.php
Outdated
@@ -136,6 +136,8 @@ public function __construct(CollectionWrapper $collectionWrapper, string $filena | |||
'_id' => $options['_id'], | |||
'chunkSize' => $this->chunkSize, | |||
'filename' => $filename, | |||
'length' => 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not initialized until fileCollectionInsert()
. Is there any reason to use 0
here instead of null
like you do for uploadDate
below? null
more clearly communicates that it's unset vs. being a potentially empty file, just as a null
upload date infers that we're in a pre-upload state.
The only reason I can think otherwise might be if you were defining an array shape on this, but I don't see any evidence of that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I discovered this when calling getFileDocumentForStream
, which tried to operate on the partial data available. Adding null
values here allowed me to call $document->get('...')
in the codec. While GridFS files are expected to always have a length
and uploadDate
, during the state of uploading (i.e. when the file hasn't been persisted to the database yet) this data is missing entirely, making the previous assumption false. I'm not sure what the better course of action is: ensuring that the data always has all fields, or making the codec accept partial documents that have some fields missing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noted that you changed length
to null
, which is actually all I was curious about.
The explanation is very helpful, though ✌️
@@ -166,6 +175,7 @@ public function __construct(Manager $manager, string $databaseName, array $optio | |||
$this->databaseName = $databaseName; | |||
$this->bucketName = $options['bucketName']; | |||
$this->chunkSizeBytes = $options['chunkSizeBytes']; | |||
$this->codec = $options['codec'] ?? null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand your explanation for not injecting codec
into $collectionOptions
and only applying on-demand in specific methods, but now I'm curious if we're handling typeMap
incorrectly. That is set directly into $collectionOptions
and ends up applying to most calls on the files and chunks collections.
Bucket uses getRawFileDocumentForStream()
internally as needed to access a raw document (without the type map). Does that not also cover us for a codec? Various CollectionWrapper methods explicitly set a simple typeMap
when we need consistent access to a document, and that should avoid a codec
being inherited.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main problem with the codec
is that it also applies on incoming data for operations like insertOne
, while typeMap
only applies to outgoing data. Applying a codec on the collection broke inserting GridFS files, which is when I noticed that something was up.
In an ideal world, we'd be handling everything as raw BSON data internally and only decoding it using typeMap or codec when necessary to do so. However, I believe this goes beyond the scope of what we want to do here. I'd be in favour of revisiting this when we make changes to our GridFS API to accommodate the recent request to make the stream protocol available to consumers. The refactoring necessary to accommodate this would also allow us to change the Bucket
to internally always work with BSON documents and removing this special handling entirely. The collections would then receive a ['root' => 'bson']
type map, and all handling regarding codecs and type maps for the file document would be handled by the Bucket
class.
$fileObject->filename = $value->get('filename'); | ||
|
||
$uploadDate = $value->get('uploadDate'); | ||
$fileObject->uploadDate = $uploadDate ? DateTimeImmutable::createFromMutable($uploadDate->toDateTime()) : null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did this prompt PHPC-2286?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prompted by #1154 (comment).
$fileObject = new TestFile(); | ||
$fileObject->id = $value->get('_id'); | ||
$fileObject->length = (int) $value->get('length'); | ||
$fileObject->chunkSize = (int) $value->get('chunkSize'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the purpose of the int
casts here? I notice you don't do casts for filename
, but perhaps that's handled by PHP enforcing TestFile's property type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're operating on raw BSON, we may get Int64
instances, however unlikely they may be (especially for chunkSize). I decided to be prudent and cast these to int
to avoid that issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was my assumption. Thanks for confirming!
$fileObject->uploadDate = $uploadDate ? DateTimeImmutable::createFromMutable($uploadDate->toDateTime()) : null; | ||
|
||
if ($value->has('metadata')) { | ||
$fileObject->metadata = $value->get('metadata'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this assert that metadata
is a document? I don't think any other type is expected.
Or is that possibly a can of worms because then we'd want a codec for that as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assertion added so I can call toPHP()
on it, see comment below.
tests/Fixtures/Document/TestFile.php
Outdated
public int $chunkSize; | ||
public ?DateTimeImmutable $uploadDate = null; | ||
public string $filename; | ||
public $metadata; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per my earlier question, should this be typed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a ?stdClass
doctype and called toPHP
on the metadata document to show that the "old" way of decoding data is still available.
|
||
final class TestFile | ||
{ | ||
public $id; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noted that GridFS permits custom _id
values, so we'd be wrong to force ObjectId here.
fca2389
to
a326ea4
Compare
a326ea4
to
f3a3be3
Compare
@@ -381,6 +404,10 @@ public function getFileDocumentForStream($stream) | |||
{ | |||
$file = $this->getRawFileDocumentForStream($stream); | |||
|
|||
if ($this->codec) { | |||
return $this->codec->decode(Document::fromPHP($file)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here you re-create a BSON document that have already been decoded into PHP?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, yes: the document may not have come from the database but may be stored by the WritableStream
instance as a hash.
PHPLIB-1220
Note that the
Bucket
class does not forward the codec option to the underlying collections, as they can't both work with the same codec (it's only applicable to file documents). Setting a codec on the file collection also wouldn't work, as the collection would expect to apply the codec when inserting a file document. To work around these issues, the codec is manually applied in thefindOne
,find
, andgetFileDocumentForStream
methods.