Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PHPLIB-1180: Add basic infrastructure for codecs #1125

Merged
merged 16 commits into from
Jul 13, 2023

Conversation

alcaeus
Copy link
Member

@alcaeus alcaeus commented Jul 7, 2023

PHPLIB-1180

This follows the prototyping work done in #1059.

This PR only introduces the codec interfaces and two standard codecs to traverse arrays and objects. Support for lazy BSON deserialisation will be added in a separate PR.

Compared to the prototype PR, the following changes have been made to the codecs:

  • Introduced encodeIfSupported and decodeIfSupported methods (suggested during review)
  • Special treatment for null in CodecLibrary has been removed

Other comments in the prototype PR concerned other areas (such as lazy BSON structures), those will be worked into future pull requests.

@alcaeus alcaeus requested review from jmikola and GromNaN July 7, 2023 08:54
@alcaeus alcaeus self-assigned this Jul 7, 2023
@alcaeus alcaeus changed the title PHPLIB-1180: Add basic infrastructure to codecs PHPLIB-1180: Add basic infrastructure for codecs Jul 7, 2023
@alcaeus alcaeus force-pushed the phplib-1180_codec-infrastructure branch from 36c0c20 to 37eb67c Compare July 7, 2023 10:10
stubs/BSON/Document.stub.php Show resolved Hide resolved
src/Codec/architecture.md Outdated Show resolved Hide resolved
src/Codec/architecture.md Outdated Show resolved Hide resolved
src/Codec/architecture.md Show resolved Hide resolved
src/Codec/architecture.md Outdated Show resolved Hide resolved
src/Codec/architecture.md Outdated Show resolved Hide resolved
src/Codec/architecture.md Outdated Show resolved Hide resolved
src/Codec/KnowsCodecLibrary.php Show resolved Hide resolved
@alcaeus alcaeus force-pushed the phplib-1180_codec-infrastructure branch from 8fa0518 to 74cf9bb Compare July 7, 2023 12:27
@alcaeus
Copy link
Member Author

alcaeus commented Jul 7, 2023

@GromNaN thank you for the feedback, especially on the architecture doc. I've reworded a lot in the document and better outlined future work.

I've also added descriptions to the codec interfaces and the methods therein.

@alcaeus alcaeus force-pushed the phplib-1180_codec-infrastructure branch from 74cf9bb to 78dc173 Compare July 7, 2023 12:50
src/Codec/Decoder.php Outdated Show resolved Hide resolved
src/Codec/DocumentCodec.php Outdated Show resolved Hide resolved
use MongoDB\Exception\InvalidArgumentException;

/**
* @internal
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it internal?
I expect there will be cases where you only need to encode or decode, and not both. For example: you want to decode the result of an aggregation with a particular structure: you don't need to write the code to encode.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I was aware of an encoder-only use case for the aggregation pipeline builder, but that would use a more specific interface extending Encoder. The use case of only needing to Decode documents for results from an aggregation pipeline completely slipped my mind. I'll remote the internal designation from the two interfaces.

tests/Codec/ArrayCodecTest.php Outdated Show resolved Hide resolved
@@ -86,7 +87,8 @@ public function provideProjectClassNames()
continue;
}

$classNames[][] = 'MongoDB\\' . str_replace(DIRECTORY_SEPARATOR, '\\', substr($file->getRealPath(), strlen($srcDir) + 1, -4));
$className = 'MongoDB\\' . str_replace(DIRECTORY_SEPARATOR, '\\', substr($file->getRealPath(), strlen($srcDir) + 1, -4));
$classNames[$className][] = $className;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted this this ensures the class names gets used as a more descriptive data provider name.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could yield the result directly.

Suggested change
$classNames[$className][] = $className;
yield $className => [$className];

Copy link
Member

@GromNaN GromNaN Jul 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An evil code shortcut:

Suggested change
$classNames[$className][] = $className;
yield $className => $className = 'MongoDB\\' . str_replace(DIRECTORY_SEPARATOR, '\\', substr($file->getRealPath(), strlen($srcDir) + 1, -4));

https://3v4l.org/ntqfH

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd leave that out to not have more changes than necessary here. That said, I've created PHPLIB-1191 to track making data providers static and refactoring them to return Generator instances instead of arrays.

stubs/BSON/Document.stub.php Show resolved Hide resolved
src/Codec/architecture.md Outdated Show resolved Hide resolved
src/Codec/architecture.md Outdated Show resolved Hide resolved
src/Codec/architecture.md Outdated Show resolved Hide resolved
Comment on lines 16 to 19
$value = [
'magic',
'cigam',
];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On my first read through the file, I didn't realize that the magic and cigam strings were being encoded and decoded, respectively. I think this would benefit from a comment (at least on the first test case).

Alternatively, replace the string values with something more self-documenting, such as "encoded" and "decoded", so it's clear that decode() ends up changing the only "encoded" value to "decoded" and leaves the other "decoded" value as-is.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I've updated the test values.

tests/Codec/CodecLibraryTest.php Show resolved Hide resolved
tests/Codec/CodecLibraryTest.php Outdated Show resolved Hide resolved
src/Codec/Decoder.php Outdated Show resolved Hide resolved
tests/Codec/ObjectCodecTest.php Outdated Show resolved Hide resolved
@alcaeus alcaeus force-pushed the phplib-1180_codec-infrastructure branch from 78dc173 to bc40f96 Compare July 11, 2023 10:59
@alcaeus
Copy link
Member Author

alcaeus commented Jul 11, 2023

@jmikola @GromNaN Thank you for your reviews. I went through the comments and have made numerous changes. I've removed the ArrayCodec and ObjectCodec classes - if necessary I'll re-introduce them when adding codecs for lazy BSON values. At this time I'm not 100% sure they are useful the way I created them.

@alcaeus alcaeus requested review from GromNaN and jmikola July 11, 2023 11:05
Copy link
Member

@GromNaN GromNaN left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@jmikola jmikola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed the ArrayCodec and ObjectCodec classes

Glad to hear. I wasn't sure what those were actually used for (within the scope of this PR).

Just a few more comments/thoughts about the exception classes.

use function get_debug_type;
use function sprintf;

class UnencodableValueException extends InvalidArgumentException implements Exception
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some thoughts on this.

For GridFS, we lumped its three exceptions under a MongoDB\GridFS\Exception namespace. Is that something we'd want to do for Codec as well, assuming these exceptions will only be used by classes in MongoDB\Codec?

I think the only trade-off is that it makes the use statement a bit longer, but it does convey that these are codec exceptions.

And a separate question for @GromNaN, since you suggested using dedicated exceptions. Do you feel strongly about having a separate exception for each method, or could there be a common UnsupportedValueException thrown by both encode() and decode()? I can't predict all ways the codec feature will be used, but I don't expect it'd be common for applications to encode and decode within the same try/catch scope.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏻 for moving to MongoDB\Codec\Exception namespace.

This exceptions are LogicException, meaning they may pop in dev and never in prod. The message is more important than the exception class. I don't really know if it's better to have 1 or 2 classes.
I find it useful that @alcaeus added the $value property, so that it can be inspected.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In hindsight, I regret that we introduced so many exceptions in the initial release of PHPC. There are cases where different exceptions made sense (e.g. CommandException, ServerException, WriteException), but there are an equal number of mistakes (e.g. SSLConnectionException).

In this case, both exceptions expose a $value property and there should be little confusion about the context. I'd very much be in favor of using a common exception to reduce the cognitive and documentation overhead.

private $value;

/** @param mixed $value */
public function __construct($value)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we end up sharing exceptions for encode and decode, I'd suggest making the constructor private and creating factory methods like we do for other exceptions.

I might suggest doing this regardless, as it keeps this non-internal class more flexible. By overriding the constructor here, you're making it difficult to ever pass original constructor arguments to InvalidArgumentException (e.g. code, previous exception).

My suggested factory method names would be:

  • If a common UnsupportedValueException: invalidEncodableType() and invalidDencodableType()
  • If separate classes: invalidType()

stubs/BSON/Document.stub.php Show resolved Hide resolved
tests/Codec/CodecLibraryTest.php Outdated Show resolved Hide resolved
tests/Codec/CodecLibraryTest.php Outdated Show resolved Hide resolved
@alcaeus alcaeus force-pushed the phplib-1180_codec-infrastructure branch from bc40f96 to 283eb0c Compare July 12, 2023 11:39
@alcaeus alcaeus requested a review from jmikola July 12, 2023 11:39
src/Codec/architecture.md Outdated Show resolved Hide resolved
src/Exception/UndecodableValueException.php Outdated Show resolved Hide resolved
@alcaeus alcaeus force-pushed the phplib-1180_codec-infrastructure branch from 283eb0c to 09f5475 Compare July 12, 2023 11:45
@alcaeus alcaeus requested a review from GromNaN July 12, 2023 11:45
@@ -0,0 +1,16 @@
<?php

Copy link
Member

@jmikola jmikola Jul 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't forget to add copyright copypasta to all of the source files:

/*
 * Copyright 2023-present MongoDB, Inc.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *   https://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

Not needed in tests and I think we can skip this for the stubs, too.

Copy link
Member

@jmikola jmikola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM once you add copyright notices to source files and respond to my question about the CodecLibrary constructor.

private $encoders = [];

/** @param Decoder|Encoder $items */
public function __construct(...$items)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before we settle on this signature, I just want to confirm that you don't expect we'd ever need to solicit additional params in the constructor (e.g. options array).

If so, consider making this an array argument.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so - this particular library is a first-come-first-serve library. If we have the need for different behaviour, I'd rather create a separate library class for the purpose.

use function get_debug_type;
use function sprintf;

class UnsupportedValueException extends InvalidArgumentException implements Exception
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed this wasn't in the Codec namespace. SGTM as this seems generally useful for casts where the argument isn't invalid simply because of its type and the actual value might be relevant.

For the existing cases in InvalidArgumentException, I think we only care about the type itself so there's no need to attach the value to the exception.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack - I figured a new namespace for this one exception might be overkill.

instance to guarantee that data always encodes to a BSON document and decodes to a PHP object.

All operations that return documents will use the codec to decode the documents into PHP objects. This includes
the various `find` and `findAndModify` operations in collections as well as the `aggregate` and `watch` operations in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alcaeus: outstanding question above. It doesn't affect my review, but I'd be curious to know before this gets picked up in a future PR.

@alcaeus alcaeus force-pushed the phplib-1180_codec-infrastructure branch from 09f5475 to 6a70032 Compare July 13, 2023 06:03
@alcaeus alcaeus merged commit 3e2e42f into mongodb:master Jul 13, 2023
@alcaeus alcaeus deleted the phplib-1180_codec-infrastructure branch July 13, 2023 06:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants