Skip to content

Conversation

ilevkivskyi
Copy link
Member

@ilevkivskyi ilevkivskyi commented Oct 12, 2025

Ref #933

Instead of processing SCCs layer by layer, we will now process an SCC as soon as it is ready. This logic is easier to adapt for parallel processing, and should get us more benefit from parallelization (as more SCCs can be processed in parallel). I tried to make order with single worker stable and very similar (or maybe even identical) to the current order.

Note I already add some methods to the build manager to emulate parallel processing, but they are not parallel yet.

This comment has been minimized.

@ilevkivskyi
Copy link
Member Author

Hm, the mypy_primer looks unexpected. The previous errors look like obvious false positives, not sure what caused those. I will try to dig a bit deeper tomorrow, but if I will not find anything, I think it is fine to ignore (since as I said errors were false positives).

Copy link
Collaborator

@JukkaL JukkaL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's exciting to see the foundation for parallel checking moving forward!

Not a full review -- did a quick pass only. I wonder if the missing errors might indicate that some modules aren't processed -- seems like worth double checking that no modules aren't silently ignored, just in case.

mypy/build.py Outdated
if fresh:
done = fresh
else:
done, processing = manager.get_done(graph)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add comment explaining that this processes some SCCs / waits for some SCCs to be done? When reading this for the first time, I thought this would just get whatever is already done right now, but it actually does some work. Or maybe the method name could be more descriptive.

Maybe rename processing to has_more or similar -- at first I was confused by it.

If we'd be doing parallel processing, would this perform a busy loop while waiting for things to happen? Or would get_done wait until something has finished processing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we'd be doing parallel processing, would this perform a busy loop while waiting for things to happen? Or would get_done wait until something has finished processing?

I was thinking about something like select() waiting for worker sockets. So this will block until some stale SCC(s) get finished (since we know there is nothing else we can progress). IIUC select() should not busy loop (and also I guess there is a Windows equivalent for this if we would use named pipes).

Btw I think my initial implementation will be Linux/Mac only, I was reading more about it, and the situation there is a bit of a mess. There are already a bunch of other things to worry, so I guess it is OK to add Windows support later.

@ilevkivskyi
Copy link
Member Author

I wonder if the missing errors might indicate that some modules aren't processed -- seems like worth double checking that no modules aren't silently ignored, just in case

Yeah, I was going to clone that repo and do this.

@ilevkivskyi
Copy link
Member Author

ilevkivskyi commented Oct 13, 2025

@JukkaL

I wonder if the missing errors might indicate that some modules aren't processed -- seems like worth double checking that no modules aren't silently ignored, just in case

Yeah, I was going to clone that repo and do this.

OK, so I checked and I think this is a bug in mypy_primer.

When I check:

  • I see a bunch of obvious false positives (of the kind module has no attribute) on both 1.18.2 and on this PR.
  • I see that on my PR that file is indeed processed (by introducing a type error there).

IIUC the problem is that we are supposed to install scipy-stubs itself, before we can run assert_type()-style tests on it. Otherwise we will pick up scipy package (which it depends on). That said:

  • I am not sure why this PR makes mypy "prefer" some of the stubs without installing them. Maybe this is some subtlety caused by module processing order?
  • I noticed that after I installed scipy-stubs some caches were not invalidated, so I see Library stubs not installed after I literally just installed them. I guess it is confusing for mypy to have all three: scipy, scipy-stubs installed, and scipy-stubs passed as sources.

If you look at the scipy-stubs CI, they do everything right, they have two separate jobs:

  • One does mypy scipy-stubs only (i.e. type-checks the stubs)
  • Another does pip install . && mypy tests (i.e. tests the installed stubs)

so mypy_primer should do something similar instead of mypy . (maybe also for other xxx-stubs repos) cc @hauntsaninja

This comment has been minimized.

@ilevkivskyi
Copy link
Member Author

@JukkaL it would be great if you can take a look at this rather sooner than later. I have two other PRs stacked on top of this that make cache meta files ~4x smaller.

@ilevkivskyi
Copy link
Member Author

@JukkaL also ping on this one. I f there are no more comments, I will be merging this later today.

@JukkaL
Copy link
Collaborator

JukkaL commented Oct 16, 2025

I wanted to try test this on the big codebase at work, but I can test it after merging as well. I'll review this now (feel free to merge if I haven't left a review in the next few hours).

Copy link
Collaborator

@JukkaL JukkaL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found two potential performance issues, otherwise looks good.

mypy/build.py Outdated
if fresh:
manager.trace(f"Queuing {fresh_msg} SCC ({scc_str})")
manager.trace(f"Found {fresh_msg} SCC ({scc_str})")
scc = order_ascc_ex(graph, ascc)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be pretty slow for large SCCs. If there are no errors, we can skip the order_ascc_ex call (which is most of the time, esp. for third-party deps). I.e. add a check if any module has errors, and only perform the ordering op if that's true. I think we'd only need to do the ordering if more than one module has errors?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good point.

if missing_sccs:
# Load missing SCCs from cache.
fresh_sccs_to_load = [
manager.scc_by_id[sid] for sid in manager.top_order if sid in missing_sccs
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is linear (does a for loop over all SCCs). I think this means that the incremental logic is O(n**2). I'm not sure if this will be a real issue, but at least it would be good to have a TODO comment about it. We could make this faster.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess an alternative would be making top_order a dict, and doing something like sorted(missing_sccs, key=lambda scc: manager.top_order[scc]), or do you have something else in mind?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyway, I added a TODO for now, I can fix it in a follow up PR if needed.

Copy link
Contributor

Diff from mypy_primer, showing the effect of this PR on open source code:

scipy-stubs (https://github.com/scipy/scipy-stubs)
- tests/datasets/test_utils.pyi:6: error: Expression is of type "Any", not "None"  [assert-type]
- tests/datasets/test_utils.pyi:6: error: Module has no attribute "clear_cache"  [attr-defined]
- tests/datasets/test_utils.pyi:9: error: Expression is of type "Any", not "None"  [assert-type]
- tests/datasets/test_utils.pyi:9: error: Module has no attribute "clear_cache"  [attr-defined]
- tests/datasets/test_utils.pyi:9: error: Module has no attribute "ascent"  [attr-defined]
- tests/datasets/test_utils.pyi:10: error: Expression is of type "Any", not "None"  [assert-type]
- tests/datasets/test_utils.pyi:10: error: Module has no attribute "clear_cache"  [attr-defined]
- tests/datasets/test_utils.pyi:10: error: Module has no attribute "ascent"  [attr-defined]
- tests/datasets/test_utils.pyi:11: error: Expression is of type "Any", not "None"  [assert-type]
- tests/datasets/test_utils.pyi:11: error: Module has no attribute "clear_cache"  [attr-defined]
- tests/datasets/test_utils.pyi:11: error: Module has no attribute "ascent"  [attr-defined]
- tests/datasets/test_utils.pyi:14: error: Expression is of type "Any", not "None"  [assert-type]
- tests/datasets/test_utils.pyi:14: error: Module has no attribute "clear_cache"  [attr-defined]
- tests/datasets/test_utils.pyi:14: error: Module has no attribute "electrocardiogram"  [attr-defined]
- tests/datasets/test_utils.pyi:15: error: Expression is of type "Any", not "None"  [assert-type]
- tests/datasets/test_utils.pyi:15: error: Module has no attribute "clear_cache"  [attr-defined]
- tests/datasets/test_utils.pyi:15: error: Module has no attribute "electrocardiogram"  [attr-defined]
- tests/datasets/test_utils.pyi:16: error: Expression is of type "Any", not "None"  [assert-type]
- tests/datasets/test_utils.pyi:16: error: Module has no attribute "clear_cache"  [attr-defined]
- tests/datasets/test_utils.pyi:16: error: Module has no attribute "electrocardiogram"  [attr-defined]
- tests/datasets/test_utils.pyi:19: error: Expression is of type "Any", not "None"  [assert-type]
- tests/datasets/test_utils.pyi:19: error: Module has no attribute "clear_cache"  [attr-defined]
- tests/datasets/test_utils.pyi:19: error: Module has no attribute "face"  [attr-defined]
- tests/datasets/test_utils.pyi:20: error: Expression is of type "Any", not "None"  [assert-type]
- tests/datasets/test_utils.pyi:20: error: Module has no attribute "clear_cache"  [attr-defined]
- tests/datasets/test_utils.pyi:20: error: Module has no attribute "face"  [attr-defined]
- tests/datasets/test_utils.pyi:21: error: Expression is of type "Any", not "None"  [assert-type]
- tests/datasets/test_utils.pyi:21: error: Module has no attribute "clear_cache"  [attr-defined]
- tests/datasets/test_utils.pyi:21: error: Module has no attribute "face"  [attr-defined]
- tests/datasets/test_utils.pyi:24: error: Expression is of type "Any", not "None"  [assert-type]
- tests/datasets/test_utils.pyi:24: error: Module has no attribute "clear_cache"  [attr-defined]
- tests/datasets/test_utils.pyi:24: error: Module has no attribute "ascent"  [attr-defined]
- tests/datasets/test_utils.pyi:24: error: Module has no attribute "electrocardiogram"  [attr-defined]
- tests/datasets/test_utils.pyi:24: error: Module has no attribute "face"  [attr-defined]
- tests/datasets/test_utils.pyi:26: error: Expression is of type "Any", not "None"  [assert-type]
- tests/datasets/test_utils.pyi:26: error: Module has no attribute "clear_cache"  [attr-defined]
- tests/datasets/test_utils.pyi:26: error: Module has no attribute "ascent"  [attr-defined]
- tests/datasets/test_utils.pyi:26: error: Module has no attribute "electrocardiogram"  [attr-defined]
- tests/datasets/test_utils.pyi:26: error: Module has no attribute "face"  [attr-defined]
- tests/datasets/test_fetchers.pyi:7: error: Expression is of type "Any", not "ndarray[tuple[int, int], dtype[unsignedinteger[_8Bit]]]"  [assert-type]
- tests/datasets/test_fetchers.pyi:7: error: Module has no attribute "ascent"  [attr-defined]
- tests/datasets/test_fetchers.pyi:9: error: Expression is of type "Any", not "ndarray[tuple[int], dtype[float64]]"  [assert-type]
- tests/datasets/test_fetchers.pyi:9: error: Module has no attribute "electrocardiogram"  [attr-defined]
- tests/datasets/test_fetchers.pyi:11: error: Expression is of type "Any", not "ndarray[tuple[int, int, int], dtype[unsignedinteger[_8Bit]]]"  [assert-type]
- tests/datasets/test_fetchers.pyi:11: error: Module has no attribute "face"  [attr-defined]
- tests/datasets/test_fetchers.pyi:12: error: Expression is of type "Any", not "ndarray[tuple[int, int, int], dtype[unsignedinteger[_8Bit]]]"  [assert-type]
- tests/datasets/test_fetchers.pyi:12: error: Module has no attribute "face"  [attr-defined]
- tests/datasets/test_fetchers.pyi:13: error: Expression is of type "Any", not "ndarray[tuple[int, int], dtype[unsignedinteger[_8Bit]]]"  [assert-type]
- tests/datasets/test_fetchers.pyi:13: error: Module has no attribute "face"  [attr-defined]
- tests/datasets/test_download_all.pyi:6: error: Expression is of type "Any", not "None"  [assert-type]
- tests/datasets/test_download_all.pyi:6: error: Module has no attribute "download_all"  [attr-defined]
- tests/datasets/test_download_all.pyi:7: error: Expression is of type "Any", not "None"  [assert-type]
- tests/datasets/test_download_all.pyi:7: error: Module has no attribute "download_all"  [attr-defined]
- tests/datasets/test_download_all.pyi:8: error: Expression is of type "Any", not "None"  [assert-type]
- tests/datasets/test_download_all.pyi:8: error: Module has no attribute "download_all"  [attr-defined]
- tests/datasets/test_download_all.pyi:9: error: Expression is of type "Any", not "None"  [assert-type]
- tests/datasets/test_download_all.pyi:9: error: Module has no attribute "download_all"  [attr-defined]

@ilevkivskyi ilevkivskyi merged commit e1643ae into python:master Oct 16, 2025
20 checks passed
@ilevkivskyi ilevkivskyi deleted the dynamic-scc-order branch October 16, 2025 15:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants