Add best practices doc section for string manipulation #385

Open
mothslaw opened this issue Jun 17, 2021 · 0 comments

mothslaw commented Jun 17, 2021

Is your feature request related to a problem? Please describe.
In Python 2, it's non-obvious how to handle strings correctly, such that plugin code doesn't have problems when encountering non-ASCII text. Code like this can cause problems:

# Problematic: the template below is a str, but result.stdout may contain non-ASCII characters.
result = libs.run_bash(connection, "cat settings.txt | grep FIRST_NAME | awk '{print $2}'")
first_name = result.stdout
message = "The first name is {}".format(first_name)

The problem happens when the remote-side output contains non-ASCII characters. When we create message, we call format on a str object, not a unicode object, so Python has to convert those non-ASCII characters into bytes. We never specified which encoding to use, so Python 2 falls back to its default codec (ASCII unless it has been changed), and the call fails with a UnicodeEncodeError.
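A minimal sketch of that failure, assuming result.stdout arrives as a unicode object. The value of first_name below is a hypothetical stand-in, not real plugin output, and implicit_py2_conversion is an illustrative helper, not part of any Delphix API. It performs explicitly the ASCII encode that Python 2 would perform implicitly, so the same error also shows up when the snippet is run under Python 3.

```python
# Hypothetical stand-in for result.stdout; real plugin code gets this from run_bash.
first_name = u"Jos\u00e9"


def implicit_py2_conversion(text):
    """Spell out the conversion Python 2 attempts when a unicode value is
    formatted into a str template: an encode with the ASCII codec."""
    return text.encode("ascii")


try:
    implicit_py2_conversion(first_name)
    failed = False
except UnicodeEncodeError as err:
    # The ASCII codec cannot represent the non-ASCII character.
    failed = True
    print("conversion failed: {}".format(err.reason))
```

Pure-ASCII input passes through this conversion unchanged, which is why this kind of bug tends to stay hidden until the first non-ASCII value appears.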

Describe the solution you'd like

A general best practice is the so-called "Unicode Sandwich". This says to always use unicode objects, not str objects.

The only exception is when interacting directly with other code that really expects or produces sequences of bytes (not characters). Even then, you should decode any received bytes immediately, before the rest of your code sees them, and encode characters to bytes at the last possible moment before sending them out. For plugins, any strings passed to or from Delphix code already support unicode objects, so this exception does not apply to plugins.
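For code outside a plugin that does sit on a byte boundary, the sandwich could look roughly like the sketch below. The function names, the in-memory streams, and the choice of UTF-8 are all assumptions made for illustration; none of them come from the Delphix SDK.

```python
import io


def read_names(byte_stream, encoding="utf-8"):
    # Bottom slice: decode as soon as the bytes arrive.
    text = byte_stream.read().decode(encoding)
    # Filling: everything from here on works with unicode text only.
    return [line.strip().upper() for line in text.splitlines() if line.strip()]


def write_names(names, byte_stream, encoding="utf-8"):
    # Top slice: encode at the last moment, right before the bytes leave.
    byte_stream.write(u"\n".join(names).encode(encoding))


# Hypothetical byte source and sink standing in for a file or socket.
source = io.BytesIO(u"jos\u00e9\nzo\u00eb\n".encode("utf-8"))
sink = io.BytesIO()
write_names(read_names(source), sink)
```

The encoding is named exactly twice, once on the way in and once on the way out, so nothing in between depends on a default codec.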

So, we want to encourage plugin authors to:

  • Always use Unicode objects (u"Hello, World", not "Hello, World")
  • Never call encode or decode.
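Applied to the example from the problem description, the two rules might look like the sketch below. The first_name value is a hypothetical stand-in for result.stdout, and build_message is an illustrative helper rather than an SDK function.

```python
def build_message(first_name):
    # The template is a unicode literal, so formatting never needs an
    # implicit conversion to bytes, and no encode/decode call is needed.
    return u"The first name is {}".format(first_name)


# Hypothetical stand-in for result.stdout containing a non-ASCII character.
first_name = u"Jos\u00e9"
message = build_message(first_name)
```

The only change from the problematic version is the u prefix on the template.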

We should:

  1. Document this as a best practice. This includes giving examples of problematic code, as above.
  2. Change our documentation examples so that they actually follow this best practice.
  3. Change dvp init so that the code it generates also follows this best practice.
@mothslaw mothslaw self-assigned this Jun 17, 2021
@mothslaw mothslaw changed the title Add best practices doc section for string manipuation Add best practices doc section for string manipulation Jun 17, 2021