-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
B007 guidance when variable used outside the loop only #10723
Comments
There's some discussion around this pattern in PyCQA/flake8-bugbear#68. It looks like bugbear explicitly decided not to support this. I welcome more opinions here. |
In my opinion, since it's language feature that's been extensively discussed and deemed supported by consensus, default linter settings should be lenient to that position. It feels a bit tyrannical to disallow a pattern just because one tool developer finds the pattern risky. It also feels like B007 is trying to do too much and too little at the same time:
I feel like this rule should be refactored into its separate concerns and check against:
I'd further argue that only the second of those should be enabled by default and the other two should be opt-in, as that use is legitimized by consensus, but even if they're on by default, that would give users the ability to opt out of that behavior. Currently, a user or system cannot opt out of B007 for outside access without opting out for inside access. |
I agree that B007 is smelly, but maybe it should be kept as-is for compatibility, and new rules added following @jaraco's proposal ? The first one could still yield false positives :
Wouldn't it raise if there are two loops with the same variable name ? for i in range(10):
f(i)
# i exists here
# This loop triggers "Loop variable overrides variable in the local scope."
for i in range(10, 20):
g(i) |
@AlexWaygood - wondering if you have an opinion here? |
Thanks for opening the issue @jaraco! I'm not sure I agree that this pattern has been "supported by consensus". If anything, I feel like I see more and more linters coming out against this kind of style because it can be difficult for static-analysis tools to reason about. For example, take the following snippet. A static-analysis tool such as Ruff or mypy cannot know whether from collections.abc import Iterable
def foo(it: Iterable[int]) -> int:
for i in it:
pass
return i + 42 (env) % python -i flake8-pyi/foo.py ~/dev
>>> foo([])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/alexw/dev/flake8-pyi/foo.py", line 6, in foo
return i + 42
^
UnboundLocalError: cannot access local variable 'i' where it is not associated with a value
>>> foo([1])
43 It's for this reason that pyright will emit an error on this snippet ( A small variation which mypy will also complain about if you pass `--enable-error-code=possibly-undefined` to mypy on the command linefrom collections.abc import Iterable
def foo(it: Iterable[int]) -> int:
for i in it:
x = i
return x + 42 A safer pattern (albeit a slightly annoying one for situations where you know the iterable is non-empty even though the static-analysis tool doesn't, such as in your original snippet) is to do something like this: from collections.abc import Iterable
def foo(it: Iterable[int]) -> int:
i: int | None = None
for i in it:
pass
assert i is not None, "Empty iterable was passed to foo()!"
return i + 42 The flake8-bugbear rules are intended to be opinionated rules, they're not enabled by default, and I think there are good reasons to want to avoid this pattern. So I would personally vote for leaving this rule as-is. (We could certainly explore improving the documentation for the rule, however.) |
Indeed, and using the variable in the loop doesn't help. Modifying your example slightly: from collections.abc import Iterable
def foo(it: Iterable[int]) -> int:
for i in it:
log(i)
return i + 42 from collections.abc import Iterable
def foo(it: Iterable[int]) -> int:
for _i in it:
pass
return _i + 42 In either case, the code no longer fails
Oh. Maybe that's my mistake. I encountered this issue by setting
I'd be happy to help draft the change. Let me know if that would be valuable or assign it to me and I'll make a PR. |
Agreed that if this were the goal, then it should be enforced in a more uniform way via a separate rule. It doesn't look like we currently have such a rule, but I think we should consider adding one. However, the question (as I see it) isn't whether it should be the goal of this check to enforce that people don't access a possibly-unbound loop variable outside of the loop -- B007 is not currently complaining on the use of the variable outside of the loop. The original report was that using the variable after the loop did not affect whether ruff (via B007) complained that the loop variable was unused within the loop. So the question, as I see it, is whether we should add complexity -- both complexity in terms of the rule's implementation, and conceptual complexity in what the rule is doing -- in order to account for a coding pattern that is reasonably rare, and is probably not advisable in the first place (accessing a loop variable outside the loop). Why do I say this would add complexity to the implementation? Because currently, the rule only analyses the loop itself to see whether the variable is used within the loop: ruff/crates/ruff_linter/src/rules/flake8_bugbear/rules/unused_loop_control_variable.rs Lines 88 to 94 in d467aa7
If we wanted to make sure that the rule was not emitted if the variable was used after the loop, we would have to analyse the whole scope in which the loop was defined to see whether it was used after the loop without having previously been redefined after the loop. Why do I say this would add conceptual complexity? Because, while the documentation might be lacking, I feel like this rule is currently quite easy to explain: "As a matter of style, loop control variables should always be used inside the loop". If we start making special cases for things like this, it becomes harder to explain to users. (If we make a change here without flake8-bugbear also making this change, then we also need to start explaining to users why our implementation differs from theirs.)
Yeah, that will enable all
Yeah, I think it would be valuable to add a short note explaining that this rule doesn't care if the loop variable is used outside the loop, and illustrating alternative, safer patterns people can use if they encounter this problem. |
Agree, although I'm not sure if that should be a lint rule. I think that's something that fits nicely into a type checker (or other correctness static analysis tooling). E.g. I think Pyright can do that for you |
I was going to open an issue requesting this rule, then saw #13742 already did so. |
Can we tell people:
As presumably the simple fix is going from: for a in foo:
... # some code here I'm not showing
print(a) to: a = None
for b in foo:
a = b
... # some code here I'm not showing
print(a) (although "simple" is perhaps inaccurate, as the order of the code in the loop and the assignment matters) |
Bugbear returns a B007 for this loop when
i
is used solely outside the loop, a usage that appears to be valid and have endorsement from the community.Somewhat strangely, renaming the variable to
_i
in both places does suppress the error, but that feels like the wrong "fix", since prefixing with_
is an indicator of "unused".In this particular example, it's legacy code that I've inherited and I plan to rewrite it using a retry wrapper and avoiding an index variable altogether.
Nevertheless, it seems to me that in general, B007 should not alert here or if it does, the documentation should include guidance on what to do in this scenario.
Related: #2509.
Ruff 0.3.4
The text was updated successfully, but these errors were encountered: