Route choice additions #508
Conversation
* Update graph.py
* .
…ro/improves_dependencies
…pendencies Improves dependencies
I think a summary of my reading today would be useful. As I see it, we have three real approaches to this.
Overall option Optimistically I think …
EXCELLENT summary, @Jake-Moss. I am inclined to say that option 2 is indeed the ideal, but there is no absolute need for the saving of path files to be incredibly fast when performing assignment. Maybe writing choice sets to be used later would be a welcome option, @jamiecook.
I've got an idea for using …
Nice summary @Jake-Moss. Batching sounds reasonable; another approach could be to not worry about the number of files and related RowGroups while processing data, and instead add a post-processing step at the end.
I've just pushed the first form of real saving; test compatibility will come in a few moments. I've managed to implement all the desired features with one real catch: writing to disk requires the GIL. It is, however, multithreaded on PyArrow's end. I attempted to keep all the disk writing and data transformation in Cython, interoperating with the PyArrow Cython and C++ APIs, but I found the Cython API to be incredibly undocumented and fragile. Unfortunately, working with the C++ API has its own caveats and difficulties. I wasn't able to decipher it; perhaps someone with more C++ experience might have been, but I'm not sure the investment would be worth the reward. I don't expect having to reacquire the GIL after each batch to have a severe impact on runtime.
Due to the inflexibilities of the API, or rather my inability to bend it to my will, I've changed the data model slightly. Instead of having a row per origin ID, each holding a MapArray of destination IDs to lists, we now have a row per OD pair with the list as a top-level value. I think this is both more flexible from a user perspective and easier to reason about. AFAIK this is just an overall better way to store things. A small implementation detail to be aware of is that the batching for the disk writing requires that a whole partition be written at once. That is, all OD pairs with a specific origin must be computed and dumped to disk together; otherwise the previous results will be overwritten. This isn't really an issue and is handled already, but it's worth noting that the batching is no more granular than it previously was. It also means that attempting to append to an existing dataset using an existing origin value will overwrite the stored results. The current implementation allows for …
In what I consider a bad move, the PyPI wheel installation of pyarrow requires modifying the installation environment to create symlinks for the linker.
For the modifications to CI …
The code is unprofiled and I'm not sure this is the best approach, but it works well. It works by stacking all the paths in a route set into one big vector, then sorting it. Once sorted, identical links are grouped together and we can simply count their occurrences. This has the added benefit that the resulting frequency arrays are sorted, so we can bisect them later. Generally this has really simple memory access patterns and is easy to read. Another possible implementation might be to sort each path individually, then walk and merge them all (skipping duplicates). This is trickier and requires a lot of bookkeeping to walk n arrays correctly; the upside is lower memory usage provided we sort in place, and if not, it should be about the same.
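The sorted-stack counting can be sketched in NumPy terms (made-up link IDs; the PR's Cython implementation is not shown here):

```python
import numpy as np

# Hypothetical route set: each path is an array of link IDs.
route_set = [np.array([3, 7, 1]), np.array([7, 2, 3]), np.array([1, 3])]

# Stack all paths into one big vector and sort it, so that identical
# links become adjacent and can simply be counted.
stacked = np.sort(np.concatenate(route_set))

# np.unique on sorted input returns the links in sorted order with counts.
links, counts = np.unique(stacked, return_counts=True)

# Because `links` is sorted, a link's frequency can later be found by bisection.
idx = np.searchsorted(links, 3)
assert links[idx] == 3 and counts[idx] == 3  # link 3 appears in all three paths
```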
Regarding the CI, I attempted to fix it locally but had no luck. It's all build issues related to gcc failing to find …
Despite the CI failures, all tests pass locally with 78.80% coverage.
Moves the computation of the path-sized logit inside the main multithreaded loop; this lets us batch it as well and dump it to disk along with the rest of the tables. Enables multithreading by default. Catches a handful of memory leaks.
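For context, a path-sized logit in its standard textbook formulation can be sketched as below. All link costs, the coefficient, and the route set here are hypothetical illustrations; this is not the PR's Cython implementation.

```python
import numpy as np
from collections import Counter

# Hypothetical route set for one OD pair, and per-link costs.
route_set = [[3, 7, 1], [7, 2, 3], [1, 3]]
link_cost = {1: 2.0, 2: 1.5, 3: 1.0, 7: 2.5}

# How many paths in the set use each link (the link frequencies).
usage = Counter(link for path in route_set for link in path)

def path_size(path):
    # Standard path-size factor: links shared with other paths in the set
    # contribute less, discounting heavily overlapping routes.
    total = sum(link_cost[l] for l in path)
    return sum((link_cost[l] / total) / usage[l] for l in path)

beta = -1.0  # hypothetical cost coefficient
utility = np.array([beta * sum(link_cost[l] for l in p) + np.log(path_size(p))
                    for p in route_set])
prob = np.exp(utility) / np.exp(utility).sum()  # choice probabilities
```

Because the path-size factor depends only on the route set for one OD pair, it can be computed inside the same per-origin loop and batched to disk with the other tables.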
This reverts commit e3ea0c6.
* OSM Geo-processing using GeoPandas
* Processing with chunking
* .
* pandas deprecations
* improves griding function
* adds network cleaning
* Allows for use of polygons with rings inside
* adjusts types
* Update pyproject.toml

Co-authored-by: pveigadecamargo <[email protected]>
Co-authored-by: Jake Moss <[email protected]>
* Removes legacy code * updates versions --------- Co-authored-by: pveigadecamargo <[email protected]>
commit 9337fb611463606b0ca89d1270210fa7fdf46714 (Jake-Moss, 2024-02-27): .
commit 3f2c01b (Jake-Moss, 2024-02-27): I give up
commits c77d5e6, 9dc3650, b4b945c, e0c32f2, eedb859, d0def11 (Jake-Moss, 2024-02-27): .
commit a6b2a8e (Jake Moss, 2024-02-27): Merge branch 'develop' into pedro/ci_test
commit 0548be6 (Jake-Moss, 2024-02-27): Macos test
commit a10c791 (Jake-Moss, 2024-02-21): Maybe fix MacOS again
commit cad0579 (Jake-Moss, 2024-02-21): Maybe MacOS fix
commit 24de8f7 (Jake-Moss, 2024-02-21): Hopefully fix CI
commits 66ffcc3, f934c34, 9f2ca20, ba0d882 (pveigadecamargo, 2024-02-21): .
commits 315cbce, d58e7cc (Pedro Camargo, 2024-02-21): Update pyproject.toml
commits dd6723f, f7ae37e, 0f954e4, 3dafd88, ebe3a19 (pveigadecamargo, 2024-02-20): .
commit 9f4413e (pveigadecamargo, 2024-02-20): Merge branch 'develop' of github.com:AequilibraE/aequilibrae into pedro/ci_test
commits daf48a5, 3df73da (pveigadecamargo, 2024-02-20): .
commits 9448e76, c9f2aaa, b2d5d3d, 2458f9a, e9e660b (pveigadecamargo, 2024-02-20): adds emulation
commit f112262 (pveigadecamargo, 2024-02-20): Add ARM architectures for Linux and mac
commits bae7d0d, 293457a, 06b6a44, 0ebce09 (pveigadecamargo, 2024-02-20): tests cibuildwheels
Merging this to make a clean slate for new API and assignment changes.
Migrated from Jake-Moss#1
I don't think the approach I currently have is the way to go but it functions as a POC. I'll revamp it tomorrow utilising the table more now that I have a better understanding.
Ideally we'd be able to
Currently, the approach I have can't be partitioned, although I think it does satisfy the other two.
To allow partitioning I think I'll have to move the origin index up to the main table; this would also let me drop the `column index` field and the struct entirely. But I think this would force the use of string keys for the columns in the table.