Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(to_string): change the behaviour of bytes to string #1141

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

qazxcdswe123
Copy link
Contributor

BREAKING CHANGE for bytes to string

Copy link

peter-jerry-ye-code-review bot commented Oct 18, 2024

From the provided git diff output, I observe several potential issues and suggestions for improvement in the MoonBit language code. Here are the top three points:

  1. Redundant Code in to_string Function:

    • In the builtin/bytes.mbt file, the to_string function has been significantly refactored to include handling for converting bytes to a string with escape sequences. However, the original implementation directly used self.sub_string(0, self.length()), which seems to have been replaced by a more detailed and potentially redundant implementation. The new implementation includes functions to_hex_digit and printable, which are also defined in the Show implementation for Bytes. This redundancy could be simplified by reusing the existing Show implementation or consolidating the logic to avoid duplication.
    pub fn to_string(self : Bytes) -> String {
      self.sub_string(0, self.length())
    }

    Suggestion: Consider if the detailed handling in to_string is necessary or if it can be simplified by leveraging existing utility functions or the Show trait.

  2. Inconsistent Byte Representation in Tests:

    • Several tests in different files (builtin/bytes_test.mbt, bytes/bytes_test.mbt, string/string_test.mbt) have been modified to change the representation of byte sequences. For example, the original hex representation b"\x41\x42\x43" has been changed to the literal ASCII characters b"ABC". This change might simplify readability but could also lead to confusion if the intention was to explicitly test hex-to-byte conversion or if the original hex values were significant for the test case.
    inspect!(
      b"ABC",
      content=
        #|b"\x41\x42\x43"
      ,
    )

    Suggestion: Ensure that the representation of byte sequences in tests aligns with the testing goals, whether focusing on hex values or literal ASCII characters.

  3. Potential Typographical Error in to_hex_digit:

    • In the builtin/bytes.mbt file, the to_hex_digit function converts an integer to its hexadecimal character representation. The function assumes that Char::from_int will correctly convert an integer to a character. However, if the integer is out of the expected range (0-15), the resulting character might not be a valid hexadecimal digit.
    fn to_hex_digit(i : Int) -> Char {
      if i < 10 {
        Char::from_int('0'.to_int() + i)
      } else {
        Char::from_int('a'.to_int() + (i - 10))
      }
    }

    Suggestion: Add a validation check to ensure i is within the expected range (0-15) before conversion to avoid potential invalid character outputs.

These observations highlight areas where code redundancy, test consistency, and input validation could be improved to enhance the reliability and maintainability of the MoonBit codebase.

Char::from_int('a'.to_int() + (i - 10))
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we print ascii as is?
e.g, b"a\0b\0"

Copy link
Contributor Author

@qazxcdswe123 qazxcdswe123 Oct 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO it's kinda hard to find the ascii inside a bunch of \00.

Also encounter: Fatal error: exception File "inlined_snapshot/_build/Core_format.ml", line 200, characters 8-14: Assertion failed. Investigating

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we print ascii as is? e.g, b"a\0b\0"

displaying some byte as ASCII may not be a good idea. Bytes are not always converted from a String, they may come from data of an image or some pieces of memory. ASCII is meaningless in such cases.

Copy link
Contributor Author

@qazxcdswe123 qazxcdswe123 Oct 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second this as UTF-16 encoded string has lots of \x00 and printing them is very ugly.

And Bytes should just be Bytes, it should print in Byte format, not readble format.

@coveralls
Copy link
Collaborator

coveralls commented Oct 21, 2024

Pull Request Test Coverage Report for Build 3820

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 20 unchanged lines in 2 files lost coverage.
  • Overall coverage increased (+0.05%) to 80.258%

Files with Coverage Reduction New Missed Lines %
/Users/runner/work/core/core/builtin/bytes.mbt 5 89.36%
/Users/runner/work/core/core/builtin/show.mbt 15 66.0%
Totals Coverage Status
Change from base Build 3812: 0.05%
Covered Lines: 4358
Relevant Lines: 5430

💛 - Coveralls

inspect!(
"\n\t\"\\\r\b".to_bytes(),
content=
#|b"\n\x00\t\x00\"\x00\\\x00\r\x00\b\x00"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not quite sure whether \n should yield \\n or \n like this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

\\n should never be used unless the user absolutely wants to see the backslash, which is typically not the case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants