Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor parse_script function for readability and maintainability #90

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
34 changes: 24 additions & 10 deletions src/rawdog/parsing.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,25 +5,39 @@
def parse_script(response: str) -> tuple[str, str]:
"""Split the response into a message and a script.

Expected use is: run the script if there is one, otherwise print the message.
The function splits the response by '```' delimiters, extracting the message and script parts.
It checks for common mistakes, such as 'python' prefix and attempts to parse the script as JSON.
Finally, it validates the script as valid Python code using ast module.

Args:
response (str): The input response containing a message and optionally a script.

Returns:
tuple[str, str]: A tuple containing the message (or error message) and the script.
"""
# Parse delimiter
n_delimiters = response.count("```")
if n_delimiters < 2:
delimiter_count = response.count("```")
if delimiter_count < 2:
return response, ""

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line attempts to concatenate a message with the last segment after splitting, which might not necessarily be always the part of the original message, especially if there are an uneven number of ''. Ensure that relevant test cases are checking for varying counts of '' to avoid potential errors in message reconstruction.

segments = response.split("```")
message = f"{segments[0]}\n{segments[-1]}"
script = "```".join(segments[1:-1]).strip() # Leave 'inner' delimiters alone

# Check for common mistakes
if script.split("\n")[0].startswith("python"):
script = "\n".join(script.split("\n")[1:])
try: # Make sure it isn't json
if script.startswith("python"):
script = script[len("python"):] # Remove 'python' prefix

# Attempt to parse script as JSON
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The transformation from the prefix 'python' might not handle additional whitespace or characters that could follow the prefix, such as 'python3'. Consider using a more robust method to strip out not just 'python' but also variations to cater for possible user inputs. We could use regex for this purpose to ensure a standard and safe removal of the prefix.

try:
script = json.loads(script)
except Exception:
except json.JSONDecodeError:
pass
try: # Make sure it's valid python

# Validate the script as valid Python code
try:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the script variable may have been transformed into a JSON object at line 33, this call to ast.parse(script) might raise an error since ast.parse expects a string as an input, not a Python object like a dictionary or list. We should ensure that ast.parse is only called if script is a string. Consider adding a check to verify the type of script before parsing it as Python code, and adjust the error handling accordingly.

ast.parse(script)
except SyntaxError:
return f"Script contains invalid Python:\n{response}", ""
except SyntaxError as e:
return f"Script contains invalid Python:\n{e}", ""

return message, script