Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PHPLIB-1022: Codec implementation for better BSON decoding/encoding #1059

Closed
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions phpcs.xml.dist
Original file line number Diff line number Diff line change
Expand Up @@ -172,9 +172,11 @@
<rule ref="PSR1.Classes.ClassDeclaration.MultipleClasses">
<exclude-pattern>/examples</exclude-pattern>
<exclude-pattern>/tests/PHPUnit/ConstraintTrait.php</exclude-pattern>
<exclude-pattern>tests/Collection/CodecCollectionTest</exclude-pattern>
</rule>
<rule ref="Squiz.Classes.ClassFileName.NoMatch">
<exclude-pattern>/examples</exclude-pattern>
<exclude-pattern>tests/Collection/CodecCollectionTest</exclude-pattern>
</rule>
<rule ref="Squiz.Classes.ValidClassName.NotCamelCaps">
<exclude-pattern>/tests/SpecTests/ClientSideEncryption/Prose*</exclude-pattern>
Expand Down
5 changes: 5 additions & 0 deletions psalm.xml.dist
Original file line number Diff line number Diff line change
Expand Up @@ -14,4 +14,9 @@
<directory name="vendor" />
</ignoreFiles>
</projectFiles>
<stubs>
jmikola marked this conversation as resolved.
Show resolved Hide resolved
<file name="stubs/BSON/BSON.stub.php"/>
<file name="stubs/BSON/Document.stub.php"/>
<file name="stubs/BSON/PackedArray.stub.php"/>
</stubs>
</psalm>
15 changes: 15 additions & 0 deletions src/Codec/Codec.php
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
<?php

namespace MongoDB\Codec;

/**
* @api
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this imply the class is something we want documented?

I know we use @api elsewhere, but I've forgotten the motivation behind that. When we're thinking about preserving BC, I know everything in the library is implicitly part of the API unless we've annotated it with @internal, so that might lead one to think @api is redundant.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be superfluous, in which case I'd go through and remove instances of @api across the board. 👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*
* @psalm-template B
* @psalm-template T
* @template-extends Decoder<B, T>
* @template-extends Encoder<B, T>
*/
interface Codec extends Decoder, Encoder
{
}
137 changes: 137 additions & 0 deletions src/Codec/CodecLibrary.php
Original file line number Diff line number Diff line change
@@ -0,0 +1,137 @@
<?php

namespace MongoDB\Codec;

use MongoDB\Exception\UnexpectedValueException;

use function array_map;
use function get_debug_type;
use function sprintf;

class CodecLibrary implements Codec
{
/** @var array<Decoder> */
private $decoders = [];

/** @var array<Encoder> */
private $encoders = [];

/** @param Decoder|Encoder $items */
public function __construct(...$items)
{
array_map(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason to use this over foreach? You're not actually modifying the array elements, so the callback doesn't seem necessary.

function ($item) {
if ($item instanceof Decoder) {
$this->attachDecoder($item);
}

if ($item instanceof Encoder) {
$this->attachEncoder($item);
}

// Yes, we'll silently discard everything. Please let me already have union types...
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want to replace this with an InvalidArgumentException - for now this is fine.

},
$items
);
}

/** @return static */
final public function attachCodec(Codec $codec): self
{
$this->decoders[] = $codec;
$this->encoders[] = $codec;
if ($codec instanceof KnowsCodecLibrary) {
$codec->attachLibrary($this);
}

return $this;
}

/** @return static */
final public function attachDecoder(Decoder $decoder): self
{
$this->decoders[] = $decoder;
if ($decoder instanceof KnowsCodecLibrary) {
$decoder->attachLibrary($this);
}

return $this;
}

/** @return static */
final public function attachEncoder(Encoder $encoder): self
{
$this->encoders[] = $encoder;
if ($encoder instanceof KnowsCodecLibrary) {
$encoder->attachLibrary($this);
}

return $this;
}

/** @param mixed $value */
final public function canDecode($value): bool
{
foreach ($this->decoders as $decoder) {
if ($decoder->canDecode($value)) {
return true;
}
}

return $value === null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does null get special treatment? Likewise for canEncode().

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea was for each codec to decide how to handle a null value. For example, an ODM may decide that for a list of embedded documents it may not want to encode null as an empty array. If a single codec can't deal with a null value, it should return false in its canEncode method.

On second thought, the special handling for null seems misguided. If null deserves special treatment, other scalar values should probably enjoy the same. By that logic, it would make more sense to include a separate ScalarValueCodec that handles all scalar values that PHP offers, including null and returns them as-is. This would also do away with the encodeOrReturn topic in other places. Same goes for array, where a separate codec should support any array and traverse the array to encode all items within the array. I think that would make it more explicit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would ScalarValueCodec require additional changes in PHPC? If so, this may be a good opportunity to try and provide a way to differentiate 32-bit and 64-bit integers.

}

/** @param mixed $value */
final public function canEncode($value): bool
{
foreach ($this->encoders as $encoder) {
if ($encoder->canEncode($value)) {
return true;
}
}

return $value === null;
}

/**
* @param mixed $value
* @return mixed
*/
final public function decode($value)
{
foreach ($this->decoders as $decoder) {
if (! $decoder->canDecode($value)) {
continue;
}

return $decoder->decode($value);
Comment on lines +103 to +107
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (! $decoder->canDecode($value)) {
continue;
}
return $decoder->decode($value);
if ($decoder->canDecode($value)) {
return $decoder->decode($value);
}

IMO, the guard statement makes sense if we were doing something else here that would need a break afterwards, but that's not the case with return. This just seems more concise unless you expect the body of this loop to get more complicated down the line.

}

if ($value === null) {
return null;
}
Comment on lines +110 to +112
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this intentionally after the foreach since individual Decoders might handle null?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is correct - it was thought as a fallback. I might end up refactoring this though, see my comment above.


throw new UnexpectedValueException(sprintf('No decoder found for value of type "%s"', get_debug_type($value)));
}

/**
* @param mixed $value
* @return mixed
*/
final public function encode($value)
{
foreach ($this->encoders as $encoder) {
if (! $encoder->canEncode($value)) {
continue;
}

return $encoder->encode($value);
}

if ($value === null) {
return null;
}
Comment on lines +123 to +133
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comments as I left on decode() apply here as well.


throw new UnexpectedValueException(sprintf('No encoder found for value of type "%s"', get_debug_type($value)));
}
}
26 changes: 26 additions & 0 deletions src/Codec/Decoder.php
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
<?php

namespace MongoDB\Codec;

/**
* @api
*
* @psalm-template B
* @psalm-template T
Comment on lines +8 to +9
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume "T" refers to a PHP value and "B" is for BSON. Can we use more descriptive names (e.g. TBson) like you did in AsListIterator?

*/
interface Decoder
{
/**
* @param mixed $value
* @psalm-assert-if-true B $value
*/
public function canDecode($value): bool;

/**
* @param mixed $value
* @psalm-param B $value
* @return mixed
* @psalm-return T
*/
public function decode($value);
}
26 changes: 26 additions & 0 deletions src/Codec/Encoder.php
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
<?php

namespace MongoDB\Codec;

/**
* @api
*
* @psalm-template B
* @psalm-template T
*/
interface Encoder
{
/**
* @param mixed $value
* @psalm-assert-if-true T $value
jmikola marked this conversation as resolved.
Show resolved Hide resolved
*/
public function canEncode($value): bool;

/**
* @param mixed $value
* @psalm-param T $value
* @return mixed
* @psalm-return B
*/
public function encode($value);
}
8 changes: 8 additions & 0 deletions src/Codec/KnowsCodecLibrary.php
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
<?php

namespace MongoDB\Codec;

interface KnowsCodecLibrary
{
public function attachLibrary(CodecLibrary $library): void;
}
68 changes: 68 additions & 0 deletions src/Codec/LazyBSONArrayCodec.php
Original file line number Diff line number Diff line change
@@ -0,0 +1,68 @@
<?php

namespace MongoDB\Codec;

use MongoDB\BSON\PackedArray;
use MongoDB\Exception\UnexpectedValueException;
use MongoDB\Model\LazyBSONArray;

use function array_values;
use function sprintf;

/**
* Codec for lazy decoding of BSON PackedArray instances
*
* @template-implements Codec<PackedArray, LazyBSONArray>
*/
class LazyBSONArrayCodec implements Codec, KnowsCodecLibrary
{
/** @var CodecLibrary|null */
private $library = null;

public function attachLibrary(CodecLibrary $library): void
{
$this->library = $library;
}

/** @inheritDoc */
jmikola marked this conversation as resolved.
Show resolved Hide resolved
public function canDecode($value): bool
{
return $value instanceof PackedArray;
}

/** @inheritDoc */
public function canEncode($value): bool
{
return $value instanceof LazyBSONArray;
}

/** @inheritDoc */
public function decode($value): LazyBSONArray
{
if (! $value instanceof PackedArray) {
throw new UnexpectedValueException(sprintf('"%s" can only decode from "%s" instances', self::class, PackedArray::class));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you adamant about using a RuntimeException here? If not, I think InvalidArgumentException::invalidType() would work well here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading the documentation, InvalidArgumentException might indeed work better. I'll update this 👍

}

return new LazyBSONArray($value, $this->getLibrary());
}

/** @inheritDoc */
public function encode($value): PackedArray
{
if (! $value instanceof LazyBSONArray) {
throw new UnexpectedValueException(sprintf('"%s" can only encode "%s" instances', self::class, LazyBSONArray::class));
}

$return = [];
foreach ($value as $offset => $offsetValue) {
$return[$offset] = $this->getLibrary()->canEncode($offsetValue) ? $this->getLibrary()->encode($offsetValue) : $offsetValue;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently codecs are designed to throw an exception if encode is called with an unsupported object. However, we may want to consider a encodeOrReturn kind of method that contains this logic to avoid repeating this all over the place.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

encodeOrReturn() seems sensible.

}

return PackedArray::fromPHP(array_values($return));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than call array_values() here, would it make sense to disregard offsets during iteration of the LazyBSONArray instance and just use $return[] = ... to collect processed values?

As I'm writing this, I've yet to read through LazyBSONArray::getIterator() but a quick glance there tells me that you're relying on AsListIterator to maintain the key sequence of a packed array.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤦‍♂️

Yes, that seems like an easier way to accomplish the same thing. And yes, AsListIterator ensures the correct key sequence of a packed array. I'll update this and also give the whole stack of encoding the lazy array another look to ensure we're not needlessly iterating this multiple times.

}

private function getLibrary(): CodecLibrary
{
return $this->library ?? $this->library = new CodecLibrary(new LazyBSONDocumentCodec(), $this);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this written as such to avoid constructing a CodecLibrary that might get discarded if attachLibrary() is called before decode() or encode()?

Code style nit: I'm in favor of rewriting this with an explicit if statement to initialize the null property before an unconditional return.

Full disclosure: Avoiding one-liners in PHP from @localheinz was fresh on my mind.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Originally I wanted to use a null-coalescing assignment, but those aren't available in 7.2. Your assumption however is correct: in the absence of a codec library (which we assume will be attached to the codec earlier) we create our own codec library consisting of the two codecs necessary to handle the lazy objects. I can refactor this if you'd like.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assignment within a return statement is really the thing that rubbed me the wrong way. That feels a bit like doing assignments within conditionals, which we also try to avoid.

I wouldn't object to:

$this->library = $this->library ?? new CodecLibrary(new LazyBSONDocumentCodec(), $this);

Once we require PHP 7.4+, I expect the CS tools would suggest a refactoring to use ??=.

}
}
67 changes: 67 additions & 0 deletions src/Codec/LazyBSONDocumentCodec.php
Original file line number Diff line number Diff line change
@@ -0,0 +1,67 @@
<?php

namespace MongoDB\Codec;

use MongoDB\BSON\Document;
use MongoDB\Exception\UnexpectedValueException;
use MongoDB\Model\LazyBSONDocument;

use function sprintf;

/**
* Codec for lazy decoding of BSON Document instances
*
* @template-implements Codec<Document, LazyBSONDocument>
*/
class LazyBSONDocumentCodec implements Codec, KnowsCodecLibrary
{
/** @var CodecLibrary|null */
private $library = null;

public function attachLibrary(CodecLibrary $library): void
{
$this->library = $library;
}

/** @inheritDoc */
public function canDecode($value): bool
{
return $value instanceof Document;
}

/** @inheritDoc */
public function canEncode($value): bool
{
return $value instanceof LazyBSONDocument;
}

/** @inheritDoc */
public function decode($value): LazyBSONDocument
{
if (! $value instanceof Document) {
throw new UnexpectedValueException(sprintf('"%s" can only decode from "%s" instances', self::class, Document::class));
}

return new LazyBSONDocument($value, $this->getLibrary());
}

/** @inheritDoc */
public function encode($value): Document
{
if (! $value instanceof LazyBSONDocument) {
throw new UnexpectedValueException(sprintf('"%s" can only encode "%s" instances', self::class, LazyBSONDocument::class));
}

$return = [];
foreach ($value as $field => $fieldValue) {
$return[$field] = $this->getLibrary()->canEncode($fieldValue) ? $this->getLibrary()->encode($fieldValue) : $fieldValue;
}
Comment on lines +55 to +58
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recall that Document::fromPHP() accepts an array or object. If you were to pass the LazyBSONDocument instance directly, how would LazyBSONDocument::bsonSerialize() differ from the explicit iteration/encoding you're doing here?

Also, I noted that PackedArray::fromPHP() requires a list array (not array|object), so passing a LazyBSONArray directly isn't even a possibility; however, I suppose you could explicitly pass the result of LazyBSONArray::bsonSerialize().

I imagine there's a clear reason you're not doing this in both encode() implementations, but knowing why would help me better understand the codec design.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LazyBSONDocument::bsonSerialize is implemented for compatibility to BSONDocument, but it lacks the ability to support additional codecs (apart from LazyBSONArrayCodec). I'll give this another look though to see if I can break up the hard coupling from the document to the codec itself somehow. Either way, I'd rather not open up the ability to inject a CodecLibrary into the LazyBSONDocument instance and instead keep it as a lazy value holder while doing the heavy lifting in the codec.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd appreciate the opportunity to discuss this over video, so please make a note to bring it up in our next 1:1.


return Document::fromPHP($return);
}

private function getLibrary(): CodecLibrary
{
return $this->library ?? $this->library = new CodecLibrary($this, new LazyBSONArrayCodec());
}
}
Loading