-
-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LIU-390: Move plasma drop implementation to this repository. #1
Conversation
Reviewer's Guide by SourceryThis pull request moves the Plasma IO and PlasmaDrops implementation from the daliuge-engine repository to this repository. It introduces new classes for handling Plasma IO and PlasmaDrops, including PlasmaIO, PlasmaFlightIO, PlasmaDROP, and PlasmaFlightDROP. Additionally, it adds a new PlasmaFlightClient class for accessing plasma-backed arrow flight data servers. Comprehensive test cases for these new classes have been added, and the README and init.py files have been updated accordingly. File-Level Changes
Tips
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @myxie - I've reviewed your changes and they look great!
Here's what I looked at during the review
- 🟡 General issues: 4 issues found
- 🟢 Security: all looks good
- 🟢 Testing: all looks good
- 🟡 Complexity: 1 issue found
- 🟡 Documentation: 2 issues found
Help me be more useful! Please click 👍 or 👎 on each comment to tell me if it was helpful.
self._expected_size = ( | ||
expected_size if expected_size and expected_size > 0 else None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: Consider simplifying the conditional assignment.
The conditional assignment for _expected_size
could be simplified to self._expected_size = expected_size if expected_size > 0 else None
since expected_size
is already optional.
self._expected_size = ( | |
expected_size if expected_size and expected_size > 0 else None | |
self._expected_size = ( | |
expected_size if expected_size > 0 else None | |
) |
|
||
@overrides | ||
def _open(self, **kwargs): | ||
return plasma.connect(self._plasma_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (bug_risk): Handle potential connection errors.
Consider adding error handling for the connection to the plasma store. This will help in diagnosing issues if the connection fails.
daliuge_plasma_components/data.py
Outdated
assert isinstance( | ||
data, Union[memoryview, bytes, bytearray, pyarrow.Buffer].__args__ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: Use isinstance with a tuple.
Instead of using Union.__args__
, you can pass a tuple of types directly to isinstance
: assert isinstance(data, (memoryview, bytes, bytearray, pyarrow.Buffer))
.
assert isinstance( | |
data, Union[memoryview, bytes, bytearray, pyarrow.Buffer].__args__ | |
assert isinstance( | |
data, (memoryview, bytes, bytearray, pyarrow.Buffer) |
scheme (str, optional): [description]. Defaults to "grpc+tcp". | ||
connection_args (dict, optional): [description]. Defaults to {}. | ||
""" | ||
self.plasma_client: plasma.PlasmaClient = plasma.connect(socket) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (bug_risk): Handle potential connection errors.
Consider adding error handling for the connection to the plasma store. This will help in diagnosing issues if the connection fails.
@@ -17,11 +17,7 @@ There are multiple options for the installation, depending on how you are intend | |||
```bash | |||
pip install daliuge_plasma_components | |||
``` | |||
This will only work after releasing the project to PyPi. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question (documentation): Clarify if the project has been released to PyPi.
The line about PyPi release has been removed. Has the project been released to PyPi, making this line obsolete?
Add your functionality in the run method and optional additional | ||
methods. | ||
|
||
class PlasmaFlightClient: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (complexity): Consider breaking down the new functionality into smaller, more manageable pieces.
The new code introduces significant complexity compared to the original. Here are some points to consider:
- Increased Code Size: The new code is significantly longer, making it more complex to read and understand.
- Additional Dependencies: The new code introduces several new dependencies (
pyarrow
,pyarrow.flight
,pyarrow.plasma
,hashlib
,BytesIO
, andOptional
), which increases the complexity of the codebase. - New Functionality: The new code adds a lot of new functionality related to Plasma and Flight clients, which adds to the cognitive load required to understand the code.
- Error Handling: The new code includes more error handling, which adds to the complexity.
- Logging: The new code includes more logging statements, which, while useful for debugging, add to the overall complexity.
- Documentation: The new code has more extensive documentation, which adds to the length and complexity of the file.
To maintain simplicity while adding new functionality, consider breaking down the new functionality into smaller, more manageable pieces. For example, you could introduce a separate PlasmaClient
class to handle Plasma-related operations, keeping it separate from the MyAppDROP
class. This approach keeps the code more modular and easier to manage.
if inputDrop.status == DROPStates.COMPLETED: | ||
if inputDrop.checksum: | ||
crcSum += inputDrop.checksum |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (code-quality): Merge nested if conditions (merge-nested-ifs
)
if inputDrop.status == DROPStates.COMPLETED: | |
if inputDrop.checksum: | |
crcSum += inputDrop.checksum | |
if inputDrop.status == DROPStates.COMPLETED and inputDrop.checksum: | |
crcSum += inputDrop.checksum | |
Explanation
Too much nesting can make code difficult to understand, and this is especiallytrue in Python, where there are no brackets to help out with the delineation of
different nesting levels.
Reading deeply nested code is confusing, since you have to keep track of which
conditions relate to which levels. We therefore strive to reduce nesting where
possible, and the situation where two if
conditions can be combined using
and
is an easy win.
self._writer.close() | ||
else: | ||
self._desc.seal(self._object_id) | ||
self._writer.close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (code-quality): Hoist repeated code outside conditional statement (hoist-statement-from-if
)
daliuge_plasma_components/data.py
Outdated
return "plasma://%s" % ( | ||
binascii.hexlify(self.object_id).decode("ascii") | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (code-quality): Replace interpolated string formatting with f-string (replace-interpolation-with-fstring
)
return "plasma://%s" % ( | |
binascii.hexlify(self.object_id).decode("ascii") | |
) | |
return f'plasma://{binascii.hexlify(self.object_id).decode("ascii")}' |
daliuge_plasma_components/data.py
Outdated
return "plasmaflight://%s" % ( | ||
binascii.hexlify(self.object_id).decode("ascii") | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (code-quality): Replace interpolated string formatting with f-string (replace-interpolation-with-fstring
)
return "plasmaflight://%s" % ( | |
binascii.hexlify(self.object_id).decode("ascii") | |
) | |
return f'plasmaflight://{binascii.hexlify(self.object_id).decode("ascii")}' |
Summary by Sourcery
This pull request moves the plasma drop implementation from the
daliuge/daliuge-engine
repository to this repository. It introduces new classes for handling shared-memory IO operations using plasma store and Arrow Flight protocol, and adds corresponding unit tests. The README.md has been updated to reflect these changes.PlasmaIO
andPlasmaFlightIO
classes for handling shared-memory IO operations using plasma store and Arrow Flight protocol.PlasmaDROP
andPlasmaFlightDROP
classes to represent data stored in a Plasma Store and managed by Arrow Flight protocol.PlasmaFlightClient
class for accessing plasma-backed Arrow Flight data server.daliuge/daliuge-engine
to this repository, consolidating related functionality.daliuge_plasma_components
package and its installation instructions.PlasmaDROP
andPlasmaFlightDROP
classes, including tests for writing data and dynamic writing with staging buffers.