RET504 false-positive when variable built up with assignments and returned #2950

gdub · 2023-02-16T06:07:01Z

Example:

def get_options(option1, option2):
    options = []
    if option1:
        options = options + ["option1"]
    if option2:
        options = options + ["option2"]
    return options

Output:

$ ruff check --isolated --select RET ret_test.py 
ret_test2.py:16:12: RET504 Unnecessary variable assignment before `return` statement
Found 1 error.

In this case, there is no unnecessary variable assignment that could be removed and cannot do an earlier return as need to get through all the conditionals.

That said, the error does not trigger if one were to instead use an in-place operator such as +=:

def get_options(option1, option2):
    options = []
    if option1:
        options += ["option1"]
    if option2:
        options += ["option2"]
    return options

Version:

$ ruff --version
ruff 0.0.247

The text was updated successfully, but these errors were encountered:

charliermarsh · 2023-02-16T14:01:28Z

Possibly not respecting overrides, does look like a bug.

henryiii · 2023-02-16T17:04:00Z

Just noticed this too!

def f(y):
    x = 1
    if y:
        x = 2

    return x

$ ruff tmp.py --isolated --select RET
tmp.py:6:12: RET504 Unnecessary variable assignment before `return` statement
Found 1 error.

It seems the RET checks don't handle branches very well (the "else after" ones also are broken when you have a chain).

charliermarsh · 2023-02-16T17:07:17Z

It may wants you to do this, even though it's awkward:

def f(y):
    if y:
        return 2

    return 1

the "else after" ones also are broken when you have a chain

Is this on main? I removed enforcement for elif chains in the most recent release.

charliermarsh · 2023-02-16T17:09:18Z

(I find these rules really tedious lol. Maybe they should just be removed.)

henryiii · 2023-02-16T17:19:39Z

They will come back again when adding pylint rules. :) I like these in pylint a lot, which I is why I'm happy to see them showing up in a faster pre-commit friendly form, but the pylint version is solid and doesn't trigger falsely, while RET seems to not properly handle branching (I expect it's a problem with the flake8 plugin).

It may wants you to do this, even though it's awkward

It's not practical in the actual code that caused me to look into this. The fix should be to just only enforce x = value; return value and not if the assignment is in an if statement (with statement is 90% of the time fine, just not if statements)

henryiii · 2023-02-16T17:20:37Z

I haven't rerun with the most recent release yet for the RET 507 and similar rules, quite possibly fixed!

charliermarsh · 2023-02-16T17:22:01Z

Ah yeah, I mostly meant that I have issues with the implementation. It's a direct port of the flake8 plugin, and it just has a bunch of limitations (in addition to being hard to reason about).

henryiii · 2023-02-16T17:36:08Z

Ahh, I see what you mean.

def f(y):
    x = 1
    if y:
        return 2

    return x

Would pass. I could live with that in some/most cases.

charliermarsh · 2023-02-16T17:38:34Z

I think it will still complain about x=1 :(

henryiii · 2023-02-16T18:04:27Z

In a realistic example (where other things are happening to the x before the final if), it seems pretty happy. I didn't end up with any false positives that I couldn't handle knowing that it was fine just changing the assignment in the if to a return. There were several when converting Plumbum and scikit-hep/particle@10caa90 was the one that prompted me to search for this issue.

martinlehoux · 2023-03-04T13:18:04Z

Just to add one more (almost) real life example: I have this pattern that seems reasonable in Django to build a queryset based on different inputs

def get_queryset(option_1, option_2):
    queryset: Any = None

    queryset = queryset.filter(a=1)

    if option_1:
        queryset = queryset.annotate(b=Value(2))

    if option_2:
        queryset = queryset.filter(c=3)

    return queryset

main.py:15:12: RET504 Unnecessary variable assignment before return statement

henryiii · 2023-03-04T14:24:34Z

This would be how you’d make it happy:

def get_queryset(option_1, option_2):
    queryset: Any = None

    queryset = queryset.filter(a=1)

    if option_1:
        queryset = queryset.filter(b=2)

    if option_2:
        return queryset.filter(c=3)

    return queryset

Not quite as symmetric, but not really “worse” from a readers standpoint either. I’d probably disable the check if this was a lot of very symmetric conditions (like you have described), though in practice most of them are not that symmetric and are fine written with the extra return. Overriding a variable is a bit ugly, so avoiding an unnecessary one isn’t that bad.

gdub · 2023-03-06T17:13:45Z

Personally, I dislike the resulting work-around because:

The lack of symmetry feels unnatural.
If you need to add an additional condition block, you would have churn turning the return statement back to an assignment.
It does not save any lines, as is the case for simple assignment then return.

One idea would be to only flag the single-assignment and return scenarios, e.g.:

def get_queryset():
    queryset = Model.filter(a=1)
    return queryset

...and not flag scenarios where there are multiple assignments to the same variable before returning, e.g.:

def get_queryset(option_1, option_2):
    queryset = Model.filter(a=1)
    if option_1:
        queryset = queryset.filter(b=2)
    if option_2:
        queryset = queryset.filter(c=3)
    return queryset

Though, I guess that would also mean that something like this would pass not get flagged:

def get_queryset():
    queryset = Model.filter(a=1)
    queryset = queryset.filter(c=3)
    return queryset

Which raises the question of what is it that this rule is really meant to be targeting? Cases where would save a line of code due to an unnecessary assignment? Should check be simplified to only look for the simple assign/return case that happens at the same nesting level, and ignore if there is more complex branching/looping involved?

Czaki · 2023-03-06T17:21:44Z

Though, I guess that would also mean that something like this would pass not get flagged:

If the rule should only be ignored if the assignment is under if clause.

bellini666 · 2023-03-06T17:24:23Z

I agree with @gdub . In cases similar to the one mentioned by him, I usually do # noqa: RET504, but it would be very good if that kind of usage got exempted from the rule.

Don't know if it is simple for ruff to detect it though =P

charliermarsh · 2023-03-06T17:50:52Z

The multiple assignments heuristic seems like it could work... pretty well? In my personal opinion I actually prefer this to inlining return queryset.filter(c=3):

def get_queryset():
    queryset = Model.filter(a=1)
    queryset = queryset.filter(c=3)
    return queryset

It also has the nice property that if you add another operation to the fluent chain, the diff much clearer. But, it's all a bit subjective of course.

I think the intent of the rule is to capture "trivial" errors like:

x = 1
return x

Or:

x = 1
print("Got 1")
return x

Part of me is like... is this even a rule worth having? I wish I had data on how many people are using it and how often it's # noqa'd or ignored, etc.

NeilGirdhar · 2023-03-06T18:08:11Z

I wish I had data on how many people are using it and how often it's # noqa'd or ignored, etc.

Can you run Ruff on a corpus of projects and gather rule hit counts and a random sampling of examples? MyPy does something like that with its MyPy Primer, which they use to evaluate pull requests. It might be nice to see how various changes make rules more or less sensitive, and examples of lines that would be newly flagged, and also examples of lines would no longer be flagged.

charliermarsh · 2023-03-06T18:09:58Z

I'd like to do something like this: index projects using Ruff, and get some data on their configurations (and on how often they noqa various rules).

gdub · 2023-03-06T18:38:48Z

Interesting looking through some results from:
https://github.com/search?q=noqa%3A+RET504&type=code
https://github.com/search?q=RET504&type=Code

Shows up in some ignores with comments, e.g.:

# RET504: Unnecessary variable before return (ugly)

    # suppress what appears to be false-positives with these checks
    'RET504',
    'RET505',
    'RET506',
    'RET507',
    'RET508',

See many examples with similar chaining-style assignment with conditionals that we've touched on already.

Here's an interesting case where variable is assigned to, referenced, and then returned:
https://github.com/KyleKing/pytest_cache_assert/blob/47d0b538528a9c9cdbc182d2b276462a2aa0fd4d/tests/test_flask.py#L20

def _create_app() -> Flask:
    app = Flask(__name__)

    @app.route('/hello')
    def hello() -> str:
        """Hello endpoint."""
        return 'Hello, World!'

    return app  # noqa: RET504

Seems like something that shouldn't be flagged.

Perhaps instead of an assignment count, heuristic could be reference count. In other words, excluding return statement, flag if the only reference was assignment.

charliermarsh · 2023-03-08T00:26:29Z

This PR productionizes that suggestion (avoid RET504 for variables with multiple assignments): #3393.

Feedback welcome, I won't be merging tonight.

charliermarsh · 2023-03-08T00:32:37Z

(Fixed the decorator thing in #3395, separate bug!)

charliermarsh · 2023-03-08T23:18:34Z

Ok, I looked through the GitHub Code Search, and I do see a lot of examples of this kind of "fluent interface"-like assignment style, e.g.:

command = self.extract_command(text)
self.validate_prefix(command=command)
await self.validate_mention(bot=bot, command=command)
command = self.validate_command(command)
command = self.do_magic(command=command)
return command  # noqa: RET504

This strikes me as very reasonable code, so I'm gonna go ahead and merge that PR.

henryiii · 2023-03-09T19:50:09Z

Personally, like moving the return up one, reducing the number of variable overwrites there (and that sort of code isn't very good for static typing unless every command returns the same thing!)

I don't see any problem with:

command = self.extract_command(text)
self.validate_prefix(command=command)
await self.validate_mention(bot=bot, command=command)
command = self.validate_command(command)
return self.do_magic(command=command)

In fact, there's only one overwrite now, so I'd actually do:

command = self.extract_command(text)
self.validate_prefix(command=command)
await self.validate_mention(bot=bot, command=command)
validated_command = self.validate_command(command)
return self.do_magic(command=validated_command)

IMO, this now reads much better. The point of style checks is to force better style, not to try not to bother anyone or conform to what everyone is doing if it's wrong. I think the biggest problem with this check was that it wasn't clear how to "fix" it from the message (and also some false positives, like the function false positive above). And it's totally fine to disable a style check you don't agree with! Whenever I talk about style checking, I emphasize this (especially WRT PyLint).

Also, one of the points of this sort of check is to help two people writing the same logic get the same code. Returning without assigning is perfectly valid, so it's reasonable to force this rather than force an assignment always before return.

For example, I think this would have read much better without an extra assignment: https://github.com/scikit-build/scikit-build-core/pull/197/files#diff-5a9598893dbb4007601522cfb26b27c92d790c18bf35d7bfe86205ae1955fa0bR47-R48

I'm not sure why Ruff didn't find that, as I'd guess #3393 wasn't in the version of Ruff that ran on that, but I wish it did.

## Summary The `RET504` rule, which looks for unnecessary assignments before return statements, is a frequent source of issues (#4173, #4236, #4242, #1606, #2950). Over time, we've tried to refine the logic to handle more cases. For example, we now avoid analyzing any functions that contain any function calls or attribute assignments, since those operations can contain side effects (and so we mark them as a "read" on all variables in the function -- we could do a better job with code graph analysis to handle this limitation, but that'd be a more involved change.) We also avoid flagging any variables that are the target of multiple assignments. Ultimately, though, I'm not happy with the implementation -- we just can't do sufficiently reliable analysis of arbitrary code flow given the limited logic herein, and the existing logic is very hard to reason about and maintain. This PR refocuses the rule to only catch cases of the form: ```py def f(): x = 1 return x ``` That is, we now only flag returns that are immediately preceded by an assignment to the returned variable. While this is more limiting, in some ways, it lets us flag more cases vis-a-vis the previous implementation, since we no longer "fully eject" when functions contain function calls and other effect-ful operations. Closes #4173. Closes #4236. Closes #4242.

## Summary This PR extends the new `RET504` implementation to handle cases like: ```py def foo(): with open("foo.txt", "r") as f: x = f.read() return x ``` This was originally suggested in #2950 (comment).

## Summary The `RET504` rule, which looks for unnecessary assignments before return statements, is a frequent source of issues (#4173, #4236, #4242, #1606, #2950). Over time, we've tried to refine the logic to handle more cases. For example, we now avoid analyzing any functions that contain any function calls or attribute assignments, since those operations can contain side effects (and so we mark them as a "read" on all variables in the function -- we could do a better job with code graph analysis to handle this limitation, but that'd be a more involved change.) We also avoid flagging any variables that are the target of multiple assignments. Ultimately, though, I'm not happy with the implementation -- we just can't do sufficiently reliable analysis of arbitrary code flow given the limited logic herein, and the existing logic is very hard to reason about and maintain. This PR refocuses the rule to only catch cases of the form: ```py def f(): x = 1 return x ``` That is, we now only flag returns that are immediately preceded by an assignment to the returned variable. While this is more limiting, in some ways, it lets us flag more cases vis-a-vis the previous implementation, since we no longer "fully eject" when functions contain function calls and other effect-ful operations. Closes #4173. Closes #4236. Closes #4242.

## Summary This PR extends the new `RET504` implementation to handle cases like: ```py def foo(): with open("foo.txt", "r") as f: x = f.read() return x ``` This was originally suggested in #2950 (comment).

charliermarsh added the bug Something isn't working label Feb 16, 2023

charliermarsh mentioned this issue Mar 8, 2023

Ignore multiply-assigned variables in RET504 #3393

Merged

charliermarsh mentioned this issue Mar 8, 2023

RET504 false positive via decorator #3394

Closed

charliermarsh closed this as completed in #3393 Mar 9, 2023

charliermarsh mentioned this issue Mar 15, 2023

Warning from VS Code extension but not from CLI #3551

Closed

This was referenced Jun 10, 2023

Refactor RET504 to only enforce assignment-then-return pattern #4997

Merged

Extend revised RET504 implementation to with statements #4998

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RET504 false-positive when variable built up with assignments and returned #2950

RET504 false-positive when variable built up with assignments and returned #2950

gdub commented Feb 16, 2023

charliermarsh commented Feb 16, 2023

henryiii commented Feb 16, 2023

charliermarsh commented Feb 16, 2023

charliermarsh commented Feb 16, 2023

henryiii commented Feb 16, 2023

henryiii commented Feb 16, 2023

charliermarsh commented Feb 16, 2023

henryiii commented Feb 16, 2023 •

edited

Loading

charliermarsh commented Feb 16, 2023

henryiii commented Feb 16, 2023

martinlehoux commented Mar 4, 2023 •

edited

Loading

henryiii commented Mar 4, 2023 •

edited

Loading

gdub commented Mar 6, 2023

Czaki commented Mar 6, 2023

bellini666 commented Mar 6, 2023

charliermarsh commented Mar 6, 2023

NeilGirdhar commented Mar 6, 2023 •

edited

Loading

charliermarsh commented Mar 6, 2023

gdub commented Mar 6, 2023

charliermarsh commented Mar 8, 2023

charliermarsh commented Mar 8, 2023

charliermarsh commented Mar 8, 2023

henryiii commented Mar 9, 2023

RET504 false-positive when variable built up with assignments and returned #2950

RET504 false-positive when variable built up with assignments and returned #2950

Comments

gdub commented Feb 16, 2023

charliermarsh commented Feb 16, 2023

henryiii commented Feb 16, 2023

charliermarsh commented Feb 16, 2023

charliermarsh commented Feb 16, 2023

henryiii commented Feb 16, 2023

henryiii commented Feb 16, 2023

charliermarsh commented Feb 16, 2023

henryiii commented Feb 16, 2023 • edited Loading

charliermarsh commented Feb 16, 2023

henryiii commented Feb 16, 2023

martinlehoux commented Mar 4, 2023 • edited Loading

henryiii commented Mar 4, 2023 • edited Loading

gdub commented Mar 6, 2023

Czaki commented Mar 6, 2023

bellini666 commented Mar 6, 2023

charliermarsh commented Mar 6, 2023

NeilGirdhar commented Mar 6, 2023 • edited Loading

charliermarsh commented Mar 6, 2023

gdub commented Mar 6, 2023

charliermarsh commented Mar 8, 2023

charliermarsh commented Mar 8, 2023

charliermarsh commented Mar 8, 2023

henryiii commented Mar 9, 2023

henryiii commented Feb 16, 2023 •

edited

Loading

martinlehoux commented Mar 4, 2023 •

edited

Loading

henryiii commented Mar 4, 2023 •

edited

Loading

NeilGirdhar commented Mar 6, 2023 •

edited

Loading