B007 guidance when variable used outside the loop only #10723

jaraco · 2024-04-01T21:34:57Z

Bugbear returns a B007 for this loop when i is used solely outside the loop, a usage that appears to be valid and have endorsement from the community.

jaraco/postgres/__init__.py:507:13: B007 Loop control variable `i` not used within loop body

Somewhat strangely, renaming the variable to _i in both places does suppress the error, but that feels like the wrong "fix", since prefixing with _ is an indicator of "unused".

In this particular example, it's legacy code that I've inherited and I plan to rewrite it using a retry wrapper and avoiding an index variable altogether.

Nevertheless, it seems to me that in general, B007 should not alert here or if it does, the documentation should include guidance on what to do in this scenario.

Related: #2509.
Ruff 0.3.4

The text was updated successfully, but these errors were encountered:

charliermarsh · 2024-04-02T03:47:10Z

There's some discussion around this pattern in PyCQA/flake8-bugbear#68. It looks like bugbear explicitly decided not to support this. I welcome more opinions here.

jaraco · 2024-04-02T15:15:44Z

In my opinion, since it's language feature that's been extensively discussed and deemed supported by consensus, default linter settings should be lenient to that position. It feels a bit tyrannical to disallow a pattern just because one tool developer finds the pattern risky.

It also feels like B007 is trying to do too much and too little at the same time:

It raises when the loop variable is not used inside the loop, but
It allows the variable to be marked as private, then not used in the loop, but still used outside the loop, and
It doesn't protect against masking another variable in the local namespace, and
If the variable is used in the loop, there's no longer any concern with it being used outside the loop.

I feel like this rule should be refactored into its separate concerns and check against:

Loop variable overrides variable in the local scope.
Loop variable is not used at all in the surrounding scope.
Loop variable is used solely outside the loop (and thus may be unbound).

I'd further argue that only the second of those should be enabled by default and the other two should be opt-in, as that use is legitimized by consensus, but even if they're on by default, that would give users the ability to opt out of that behavior. Currently, a user or system cannot opt out of B007 for outside access without opting out for inside access.

Nodd · 2024-04-02T15:48:57Z

I agree that B007 is smelly, but maybe it should be kept as-is for compatibility, and new rules added following @jaraco's proposal ?

The first one could still yield false positives :

Loop variable overrides variable in the local scope.

Wouldn't it raise if there are two loops with the same variable name ?

for i in range(10):
    f(i)
# i exists here

# This loop triggers "Loop variable overrides variable in the local scope."
for i in range(10, 20):
    g(i)

charliermarsh · 2024-04-02T16:37:53Z

@AlexWaygood - wondering if you have an opinion here?

AlexWaygood · 2024-04-02T17:34:24Z

Thanks for opening the issue @jaraco!

I'm not sure I agree that this pattern has been "supported by consensus". If anything, I feel like I see more and more linters coming out against this kind of style because it can be difficult for static-analysis tools to reason about. For example, take the following snippet. A static-analysis tool such as Ruff or mypy cannot know whether i is bound after the loop has completed or not, as an empty iterable might have been passed to the function (in which case i will be unbound), or a nonempty iterable might have been passed (in which case i will be bound):

from collections.abc import Iterable

def foo(it: Iterable[int]) -> int:
    for i in it:
        pass
    return i + 42

(env) % python -i flake8-pyi/foo.py                                                                                                                                                                              ~/dev
>>> foo([])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/alexw/dev/flake8-pyi/foo.py", line 6, in foo
    return i + 42
           ^
UnboundLocalError: cannot access local variable 'i' where it is not associated with a value
>>> foo([1])
43

It's for this reason that pyright will emit an error on this snippet (reportPossiblyUnboundVariable). (Mypy's possibly-undefined optional error code doesn't emit an error on this exact snippet -- I actually think that's possibly a mypy bug, but I digress -- but it does emit an error on a small variation of it.)

A small variation which mypy will also complain about if you pass `--enable-error-code=possibly-undefined` to mypy on the command line

from collections.abc import Iterable

def foo(it: Iterable[int]) -> int:
    for i in it:
        x = i
    return x + 42

A safer pattern (albeit a slightly annoying one for situations where you know the iterable is non-empty even though the static-analysis tool doesn't, such as in your original snippet) is to do something like this:

from collections.abc import Iterable

def foo(it: Iterable[int]) -> int:
    i: int | None = None
    for i in it:
        pass
    assert i is not None, "Empty iterable was passed to foo()!"
    return i + 42

The flake8-bugbear rules are intended to be opinionated rules, they're not enabled by default, and I think there are good reasons to want to avoid this pattern. So I would personally vote for leaving this rule as-is. (We could certainly explore improving the documentation for the rule, however.)

jaraco · 2024-04-02T20:27:38Z

[tools] cannot know that i is bound

Indeed, and using the variable in the loop doesn't help. Modifying your example slightly:

from collections.abc import Iterable

def foo(it: Iterable[int]) -> int:
    for i in it:
        log(i)
    return i + 42

from collections.abc import Iterable

def foo(it: Iterable[int]) -> int:
    for _i in it:
        pass
    return _i + 42

In either case, the code no longer fails B007, but still is subject to the same failure with foo([]). If the goal is to ensure people don't access a possibly unbound loop variable outside of the loop, that should probably be a different check.

they're not enabled by default

Oh. Maybe that's my mistake. I encountered this issue by setting --select B, and I was assuming that meant opt into the recommended B rules, but maybe it means opt into all B rules, in which case, my opt-in suggestion was misguided. Reading the docs briefly, that seems to be the case.

We could certainly explore improving the documentation for the rule, however.

I'd be happy to help draft the change. Let me know if that would be valuable or assign it to me and I'll make a PR.

AlexWaygood · 2024-04-03T13:58:50Z

If the goal is to ensure people don't access a possibly unbound loop variable outside of the loop, that should probably be a different check.

Agreed that if this were the goal, then it should be enforced in a more uniform way via a separate rule. It doesn't look like we currently have such a rule, but I think we should consider adding one.

However, the question (as I see it) isn't whether it should be the goal of this check to enforce that people don't access a possibly-unbound loop variable outside of the loop -- B007 is not currently complaining on the use of the variable outside of the loop. The original report was that using the variable after the loop did not affect whether ruff (via B007) complained that the loop variable was unused within the loop. So the question, as I see it, is whether we should add complexity -- both complexity in terms of the rule's implementation, and conceptual complexity in what the rule is doing -- in order to account for a coding pattern that is reasonably rare, and is probably not advisable in the first place (accessing a loop variable outside the loop).

Why do I say this would add complexity to the implementation? Because currently, the rule only analyses the loop itself to see whether the variable is used within the loop:

ruff/crates/ruff_linter/src/rules/flake8_bugbear/rules/unused_loop_control_variable.rs

Lines 88 to 94 in d467aa7

    
           let used_names = { 
        
               let mut finder = NameFinder::default(); 
        
               for stmt in &stmt_for.body { 
        
                   finder.visit_stmt(stmt); 
        
               } 
        
               finder.names 
        
           };

If we wanted to make sure that the rule was not emitted if the variable was used after the loop, we would have to analyse the whole scope in which the loop was defined to see whether it was used after the loop without having previously been redefined after the loop.

Why do I say this would add conceptual complexity? Because, while the documentation might be lacking, I feel like this rule is currently quite easy to explain: "As a matter of style, loop control variables should always be used inside the loop". If we start making special cases for things like this, it becomes harder to explain to users. (If we make a change here without flake8-bugbear also making this change, then we also need to start explaining to users why our implementation differs from theirs.)

I encountered this issue by setting --select B, and I was assuming that meant opt into the recommended B rules, but maybe it means opt into all B rules

Yeah, that will enable all B rules. I'm currently working on a proposal for recategorising ruff's rules, which should help with this kind of confusion.

I'd be happy to help draft the change. Let me know if that would be valuable or assign it to me and I'll make a PR.

Yeah, I think it would be valuable to add a short note explaining that this rule doesn't care if the loop variable is used outside the loop, and illustrating alternative, safer patterns people can use if they encounter this problem.

MichaReiser · 2024-04-04T09:13:49Z

If the goal is to ensure people don't access a possibly unbound loop variable outside of the loop, that should probably be a different check.

Agree, although I'm not sure if that should be a lint rule. I think that's something that fits nicely into a type checker (or other correctness static analysis tooling). E.g. I think Pyright can do that for you

LincolnPuzey · 2024-10-23T08:27:40Z

If the goal is to ensure people don't access a possibly unbound loop variable outside of the loop, that should probably be a different check.

I was going to open an issue requesting this rule, then saw #13742 already did so.

xixixao · 2024-11-12T14:39:41Z

Can we tell people:

Loop control variable foo not used within loop body. If it's needed outside of it, assign it explicitly to another variable.

As presumably the simple fix is going from:

for a in foo:
  ... # some code here I'm not showing
print(a)

to:

a = None
for b in foo:
  a = b
  ... # some code here I'm not showing
print(a)

(although "simple" is perhaps inaccurate, as the order of the code in the loop and the assignment matters)

charliermarsh added the needs-decision Awaiting a decision from a maintainer label Apr 2, 2024

jaraco mentioned this issue Apr 2, 2024

B007: False positive if loop control variable is used after the loop PyCQA/flake8-bugbear#68

Closed

roelofvandijk mentioned this issue Jun 27, 2024

fix: unused loop variables tinygrad/tinygrad#5177

Closed

LordAro mentioned this issue Nov 5, 2024

B007 does not detect unused variables when they are shadowed or appear in inner scope #14113

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

B007 guidance when variable used outside the loop only #10723

B007 guidance when variable used outside the loop only #10723

jaraco commented Apr 1, 2024

charliermarsh commented Apr 2, 2024

jaraco commented Apr 2, 2024 •

edited

Loading

Nodd commented Apr 2, 2024

charliermarsh commented Apr 2, 2024

AlexWaygood commented Apr 2, 2024 •

edited

Loading

jaraco commented Apr 2, 2024

AlexWaygood commented Apr 3, 2024 •

edited

Loading

MichaReiser commented Apr 4, 2024

LincolnPuzey commented Oct 23, 2024

xixixao commented Nov 12, 2024 •

edited

Loading

B007 guidance when variable used outside the loop only #10723

B007 guidance when variable used outside the loop only #10723

Comments

jaraco commented Apr 1, 2024

charliermarsh commented Apr 2, 2024

jaraco commented Apr 2, 2024 • edited Loading

Nodd commented Apr 2, 2024

charliermarsh commented Apr 2, 2024

AlexWaygood commented Apr 2, 2024 • edited Loading

jaraco commented Apr 2, 2024

AlexWaygood commented Apr 3, 2024 • edited Loading

MichaReiser commented Apr 4, 2024

LincolnPuzey commented Oct 23, 2024

xixixao commented Nov 12, 2024 • edited Loading

jaraco commented Apr 2, 2024 •

edited

Loading

AlexWaygood commented Apr 2, 2024 •

edited

Loading

AlexWaygood commented Apr 3, 2024 •

edited

Loading

xixixao commented Nov 12, 2024 •

edited

Loading