The goal of this repo is to study the speed increases that I, as one particular Python dev, can achieve by going from 'naive Python' to more Pythonic tools, to C-powered libraries like numpy, or, at the extreme, to straight-up C code. The differences are benchmarked with a simple recursive Sudoku solver.
If you want to read about my (mis)adventures in a more verbose form, turn to ProgressDescription.md, which is a more diary-like description of what I've been doing.
C is faster than Python. There is very little question about that. But the difference is much smaller when the Python program correctly uses e.g. numpy, pandas or one of the other fast libraries with a C back-end. In particular, in many cases (including mine) the important question is whether I, the Python programmer, could write C code that runs faster than implementations relying on Python libraries whose C back-ends have been written by professional C developers.
We're gonna look at a Sudoku solver. Right now there are the following Python versions (in the ./src folder):
- Simple naive solver (naive_sudoku_solver.py)
- OOP version of the naive solver (OOP_sudoku.py)
- Improved/optimized version of the OOP solver (OOP_sudoku_improved.py)
- Numpy-based version of the solver (numpy_sudoku.py)
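For orientation, a simple recursive (backtracking) solver of the kind benchmarked here could look roughly like the sketch below. This is not the code in naive_sudoku_solver.py; the function names, the list-of-lists grid and the use of 0 for empty cells are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Grid = List[List[int]]  # 9x9 grid; 0 marks an empty cell (assumed convention)


def find_empty(grid: Grid) -> Optional[Tuple[int, int]]:
    """Return (row, col) of the first empty cell, or None if the grid is full."""
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                return r, c
    return None


def is_allowed(grid: Grid, r: int, c: int, digit: int) -> bool:
    """Check that digit is not already in the row, the column or the 3x3 box."""
    if digit in grid[r]:
        return False
    if any(grid[i][c] == digit for i in range(9)):
        return False
    br, bc = 3 * (r // 3), 3 * (c // 3)  # top-left corner of the 3x3 box
    return all(grid[br + i][bc + j] != digit for i in range(3) for j in range(3))


def solve(grid: Grid) -> bool:
    """Solve the grid in place; return True if a solution was found."""
    cell = find_empty(grid)
    if cell is None:
        return True  # no empty cells left, so the grid is solved
    r, c = cell
    for digit in range(1, 10):
        if is_allowed(grid, r, c, digit):
            grid[r][c] = digit
            if solve(grid):
                return True
            grid[r][c] = 0  # undo and try the next digit
    return False  # no digit fits here, backtrack to the previous cell
```

Almost all the time in a solver like this goes into the repeated row, column and box checks, which is the part that the OOP, numpy and C variants can each attack in their own way.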
Besides these, we have a C implementation (./C-version/sudoku_solver.c) which accepts sudokus either as .csv file paths or as CLI input strings. Currently the only way to call this from Python is the naive approach of using Python to run bash commands, see ./src/CLI_C_caller.py.
Still in progress
- Version of the C solver that can be called from Python as a package, either as a whole or as some of its subparts.
- Cython implementation of the OOP solver.
To run the more naive C approaches, you'll probably have to be on Linux, as the simplest way of using C from Python just runs bash commands of the type "./C-version/a.out -r {input_sudoku} -S". For these to work, run gcc sudoku_solver.c in the folder ./C-version. (I am very far from being a C developer, so take all my advice on this with caveats.)
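As a rough sketch of this kind of call (not the actual contents of ./src/CLI_C_caller.py), the compiled binary can be started from Python with the standard subprocess module. The function names here are hypothetical, the flags -r and -S are simply copied from the command above, and the format of the sudoku string and of the program's output are not assumed, so the raw stdout is returned as-is. Unlike a bash command string, this version passes an argument list, which avoids shell quoting issues.

```python
import subprocess
from typing import List


def build_command(sudoku: str, binary: str = "./C-version/a.out") -> List[str]:
    """Build the argument list for the compiled C solver (hypothetical helper)."""
    # Flags copied from the README's example command; their meaning is not assumed.
    return [binary, "-r", sudoku, "-S"]


def solve_with_c(sudoku: str, binary: str = "./C-version/a.out") -> str:
    """Run the C solver on one sudoku string and return its raw stdout."""
    result = subprocess.run(
        build_command(sudoku, binary),
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode the output as str rather than bytes
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Starting a new process for every sudoku adds overhead that has nothing to do with the solver itself, so benchmark numbers for this approach will include process start-up time.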
To get the Python C-extension module running, activate your virtual environment and then run python setup.py install in the folder ./C-version.
conda create --name SudokuSpeedBenchmark python=3.10
conda activate SudokuSpeedBenchmark
pip install -r requirements.txt
cd C-version
python setup.py install
cd ..
python benchmark.py
For Python:
coverage run -m pytest
coverage report -m
Currently the Python tests reach 100% coverage.
For the C code, go to the folder ./C-version, compile, and run the result with the option -t, e.g.:
gcc sudoku_solver.c
./a.out -t
If this doesn't crash, the tests are probably okay. The tests here are more troubleshooting tests than TDD tests, and can be turned on or off by commenting out lines in the C source code.
- Fixtures should not be imported in many places; a conftest.py should be made instead.
- The testing philosophy should probably move from only unit tests to something more workflow-based, especially as coverage lets us make sure that all edge cases are still covered.
- Could the hypothesis library be used here?
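For the first point, a shared conftest.py could look something like the sketch below. pytest automatically discovers fixtures defined in a conftest.py for all tests in the same directory and below, so no imports are needed in the test files. The fixture name, the example grid and the use of 0 for empty cells are illustrative assumptions, not taken from the existing tests.

```python
# conftest.py (assumed to sit next to the test files)
import copy

import pytest

# Illustrative puzzle; 0 marks an empty cell (assumed convention).
EXAMPLE_PUZZLE = [
    [5, 3, 0, 0, 7, 0, 0, 0, 0],
    [6, 0, 0, 1, 9, 5, 0, 0, 0],
    [0, 9, 8, 0, 0, 0, 0, 6, 0],
    [8, 0, 0, 0, 6, 0, 0, 0, 3],
    [4, 0, 0, 8, 0, 3, 0, 0, 1],
    [7, 0, 0, 0, 2, 0, 0, 0, 6],
    [0, 6, 0, 0, 0, 0, 2, 8, 0],
    [0, 0, 0, 4, 1, 9, 0, 0, 5],
    [0, 0, 0, 0, 8, 0, 0, 7, 9],
]


@pytest.fixture
def example_puzzle():
    """Give each test its own copy, so in-place solvers cannot leak state."""
    return copy.deepcopy(EXAMPLE_PUZZLE)
```

A test anywhere below that folder can then simply take example_puzzle as an argument, e.g. def test_solver(example_puzzle): ..., without importing anything.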