Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

B007 guidance when variable used outside the loop only #10723

Open
jaraco opened this issue Apr 1, 2024 · 10 comments
Open

B007 guidance when variable used outside the loop only #10723

jaraco opened this issue Apr 1, 2024 · 10 comments
Labels
needs-decision Awaiting a decision from a maintainer

Comments

@jaraco
Copy link
Contributor

jaraco commented Apr 1, 2024

Bugbear returns a B007 for this loop when i is used solely outside the loop, a usage that appears to be valid and have endorsement from the community.

jaraco/postgres/__init__.py:507:13: B007 Loop control variable `i` not used within loop body

Somewhat strangely, renaming the variable to _i in both places does suppress the error, but that feels like the wrong "fix", since prefixing with _ is an indicator of "unused".

In this particular example, it's legacy code that I've inherited and I plan to rewrite it using a retry wrapper and avoiding an index variable altogether.

Nevertheless, it seems to me that in general, B007 should not alert here or if it does, the documentation should include guidance on what to do in this scenario.

Related: #2509.
Ruff 0.3.4

@charliermarsh charliermarsh added the needs-decision Awaiting a decision from a maintainer label Apr 2, 2024
@charliermarsh
Copy link
Member

There's some discussion around this pattern in PyCQA/flake8-bugbear#68. It looks like bugbear explicitly decided not to support this. I welcome more opinions here.

@jaraco
Copy link
Contributor Author

jaraco commented Apr 2, 2024

In my opinion, since it's language feature that's been extensively discussed and deemed supported by consensus, default linter settings should be lenient to that position. It feels a bit tyrannical to disallow a pattern just because one tool developer finds the pattern risky.

It also feels like B007 is trying to do too much and too little at the same time:

  • It raises when the loop variable is not used inside the loop, but
  • It allows the variable to be marked as private, then not used in the loop, but still used outside the loop, and
  • It doesn't protect against masking another variable in the local namespace, and
  • If the variable is used in the loop, there's no longer any concern with it being used outside the loop.

I feel like this rule should be refactored into its separate concerns and check against:

  • Loop variable overrides variable in the local scope.
  • Loop variable is not used at all in the surrounding scope.
  • Loop variable is used solely outside the loop (and thus may be unbound).

I'd further argue that only the second of those should be enabled by default and the other two should be opt-in, as that use is legitimized by consensus, but even if they're on by default, that would give users the ability to opt out of that behavior. Currently, a user or system cannot opt out of B007 for outside access without opting out for inside access.

@Nodd
Copy link

Nodd commented Apr 2, 2024

I agree that B007 is smelly, but maybe it should be kept as-is for compatibility, and new rules added following @jaraco's proposal ?

The first one could still yield false positives :

Loop variable overrides variable in the local scope.

Wouldn't it raise if there are two loops with the same variable name ?

for i in range(10):
    f(i)
# i exists here

# This loop triggers "Loop variable overrides variable in the local scope."
for i in range(10, 20):
    g(i)

@charliermarsh
Copy link
Member

@AlexWaygood - wondering if you have an opinion here?

@AlexWaygood
Copy link
Member

AlexWaygood commented Apr 2, 2024

Thanks for opening the issue @jaraco!

I'm not sure I agree that this pattern has been "supported by consensus". If anything, I feel like I see more and more linters coming out against this kind of style because it can be difficult for static-analysis tools to reason about. For example, take the following snippet. A static-analysis tool such as Ruff or mypy cannot know whether i is bound after the loop has completed or not, as an empty iterable might have been passed to the function (in which case i will be unbound), or a nonempty iterable might have been passed (in which case i will be bound):

from collections.abc import Iterable

def foo(it: Iterable[int]) -> int:
    for i in it:
        pass
    return i + 42
(env) % python -i flake8-pyi/foo.py                                                                                                                                                                              ~/dev
>>> foo([])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/alexw/dev/flake8-pyi/foo.py", line 6, in foo
    return i + 42
           ^
UnboundLocalError: cannot access local variable 'i' where it is not associated with a value
>>> foo([1])
43

It's for this reason that pyright will emit an error on this snippet (reportPossiblyUnboundVariable). (Mypy's possibly-undefined optional error code doesn't emit an error on this exact snippet -- I actually think that's possibly a mypy bug, but I digress -- but it does emit an error on a small variation of it.)

A small variation which mypy will also complain about if you pass `--enable-error-code=possibly-undefined` to mypy on the command line
from collections.abc import Iterable

def foo(it: Iterable[int]) -> int:
    for i in it:
        x = i
    return x + 42

A safer pattern (albeit a slightly annoying one for situations where you know the iterable is non-empty even though the static-analysis tool doesn't, such as in your original snippet) is to do something like this:

from collections.abc import Iterable

def foo(it: Iterable[int]) -> int:
    i: int | None = None
    for i in it:
        pass
    assert i is not None, "Empty iterable was passed to foo()!"
    return i + 42

The flake8-bugbear rules are intended to be opinionated rules, they're not enabled by default, and I think there are good reasons to want to avoid this pattern. So I would personally vote for leaving this rule as-is. (We could certainly explore improving the documentation for the rule, however.)

@jaraco
Copy link
Contributor Author

jaraco commented Apr 2, 2024

[tools] cannot know that i is bound

Indeed, and using the variable in the loop doesn't help. Modifying your example slightly:

from collections.abc import Iterable

def foo(it: Iterable[int]) -> int:
    for i in it:
        log(i)
    return i + 42
from collections.abc import Iterable

def foo(it: Iterable[int]) -> int:
    for _i in it:
        pass
    return _i + 42

In either case, the code no longer fails B007, but still is subject to the same failure with foo([]). If the goal is to ensure people don't access a possibly unbound loop variable outside of the loop, that should probably be a different check.

they're not enabled by default

Oh. Maybe that's my mistake. I encountered this issue by setting --select B, and I was assuming that meant opt into the recommended B rules, but maybe it means opt into all B rules, in which case, my opt-in suggestion was misguided. Reading the docs briefly, that seems to be the case.

We could certainly explore improving the documentation for the rule, however.

I'd be happy to help draft the change. Let me know if that would be valuable or assign it to me and I'll make a PR.

@AlexWaygood
Copy link
Member

AlexWaygood commented Apr 3, 2024

If the goal is to ensure people don't access a possibly unbound loop variable outside of the loop, that should probably be a different check.

Agreed that if this were the goal, then it should be enforced in a more uniform way via a separate rule. It doesn't look like we currently have such a rule, but I think we should consider adding one.

However, the question (as I see it) isn't whether it should be the goal of this check to enforce that people don't access a possibly-unbound loop variable outside of the loop -- B007 is not currently complaining on the use of the variable outside of the loop. The original report was that using the variable after the loop did not affect whether ruff (via B007) complained that the loop variable was unused within the loop. So the question, as I see it, is whether we should add complexity -- both complexity in terms of the rule's implementation, and conceptual complexity in what the rule is doing -- in order to account for a coding pattern that is reasonably rare, and is probably not advisable in the first place (accessing a loop variable outside the loop).

Why do I say this would add complexity to the implementation? Because currently, the rule only analyses the loop itself to see whether the variable is used within the loop:

let used_names = {
let mut finder = NameFinder::default();
for stmt in &stmt_for.body {
finder.visit_stmt(stmt);
}
finder.names
};

If we wanted to make sure that the rule was not emitted if the variable was used after the loop, we would have to analyse the whole scope in which the loop was defined to see whether it was used after the loop without having previously been redefined after the loop.

Why do I say this would add conceptual complexity? Because, while the documentation might be lacking, I feel like this rule is currently quite easy to explain: "As a matter of style, loop control variables should always be used inside the loop". If we start making special cases for things like this, it becomes harder to explain to users. (If we make a change here without flake8-bugbear also making this change, then we also need to start explaining to users why our implementation differs from theirs.)

I encountered this issue by setting --select B, and I was assuming that meant opt into the recommended B rules, but maybe it means opt into all B rules

Yeah, that will enable all B rules. I'm currently working on a proposal for recategorising ruff's rules, which should help with this kind of confusion.

I'd be happy to help draft the change. Let me know if that would be valuable or assign it to me and I'll make a PR.

Yeah, I think it would be valuable to add a short note explaining that this rule doesn't care if the loop variable is used outside the loop, and illustrating alternative, safer patterns people can use if they encounter this problem.

@MichaReiser
Copy link
Member

If the goal is to ensure people don't access a possibly unbound loop variable outside of the loop, that should probably be a different check.

Agree, although I'm not sure if that should be a lint rule. I think that's something that fits nicely into a type checker (or other correctness static analysis tooling). E.g. I think Pyright can do that for you

@LincolnPuzey
Copy link

If the goal is to ensure people don't access a possibly unbound loop variable outside of the loop, that should probably be a different check.

I was going to open an issue requesting this rule, then saw #13742 already did so.

@xixixao
Copy link

xixixao commented Nov 12, 2024

Can we tell people:

Loop control variable foo not used within loop body. If it's needed outside of it, assign it explicitly to another variable.

As presumably the simple fix is going from:

for a in foo:
  ... # some code here I'm not showing
print(a)

to:

a = None
for b in foo:
  a = b
  ... # some code here I'm not showing
print(a)

(although "simple" is perhaps inaccurate, as the order of the code in the loop and the assignment matters)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-decision Awaiting a decision from a maintainer
Projects
None yet
Development

No branches or pull requests

7 participants