This repository has been archived by the owner on Jan 26, 2022. It is now read-only.
Commit
* First draft affine batch ops & wNAF
* changes to mutability and lifetimes
* delete superfluous files
* crazy direction: passing a FnMut to generate an iterator locally
* unsuccessful further attempts
* compile success using index approach
* fixes for mutable borrows
* Successfully passed scalar mul test
* benchmarks + prefetching
* stash
* generic impl of batch arith for all AffineCurves
* batched affine formulas for TE - too expensive
* improved TE affine
* cleanup batch inversion
* fmt...
* fix minor error
* remove debugging scaffolding
* fmt...
* delete batch arith bench as not suitable for criterion or bench
* fix bench removal errors
* fmt...
* added missing coeff_a
* refactor BatchGroupArithmetic to be a separate trait
* Batch verification with radix sort
* Cache-locality & parallelisation
* Successfully impl batch verify
* added tests and bench for batch_ver, parallel_random_gen, & thread util
* fmt
* enabled missing test
* remove voracious_radix_sort
* commented out unneeded Instant::now()
* Fixed batch_ver tests for curves of small or unit cofactor
* split recursive and non-recursive, tidy up shared functionality
* reduce max_logn
* adjust max_logn further
* Batch MSM, speedup only for bw6 due to poor cache performance
* fmt...
* GLV iBiginteger
* stash
* stash
* GLV with Parameter-based specialisation
* GLV lattice basis script success
* Successfully passed tests and benched
* Improvements to MSM and bucketed adds using lightweight index sort
* changed rng to be external parameter for non-parallel batch verify
* remove bench print scaffolding
* remove old batch_bucketed_add using vectors instead of fixed offsets
* retain parallel batch_add_split
* Comments for batch arith
* remove need for hashmap for no_std for batch_bucketed_add
* minor changes
* cleanup
* cleanup
* fmt + use no_std Vec
* removed std::
* add scratch space
* Add GLV for non-batched SW mul
* fix for glv_scalar_decomposition when k == MODULUS (subgroup check)
* Fixed performance BUG: unnecessary table generation
* GLV -> has_glv(), bigint slice bound check, refactor batch loops, u32 index
* clean removal of batch_verify
* fix mistake with elems indexing, unused arg for future recursion PR
* trivial errors
* more minor fixes
* fix issues with batch_ver (.is_zero(), TE affine->proj mul)
* fix issue with batch_bucketed_add_split
* misname
* Success in test and bench \(*v*)/
* tmp commit to cache experimental batch_add_write_shift_..
* remove batch_add_write_shift..
* optional dep, fmt...
* undo accidental deletion of dlsd sort
* fmt...
* cleanup batch bucket add, unify impl
* no_std...
* fixed tests
* fixed unimplemented for TE, swapped wnaf table row/col for batch_add_write
* wnaf table generation uses fewer copies, remove timing instrumentation
* Minor cleanup
* Add feature-activated timing instrumentation, reduce code bloat (wnaf)
* unused var, no_std
* Make timing macros defined globally, instrument more code
* instrument w/ tid, better num_rounds est. f64, timing black/whitelisting
* Minor changes
* refactor tests, generic MSM test
* 2D test matrix :)
* batchaffine
* tests
* additive features
* big_n feature for test-benching
* prefetch unroll
* minor adjustments
* extension(s -> "")_fields
* remove artifacts, fix asm
* uncomment subgroup checks, glv param sources
* gpu scalar mul
* fix dependency issues
* Extend GPU scalar mul to all curves
* refactor
* CPU + GPU coprocessing
* With suboptimal BW6 assembly
* add static partitioning
* profiling-based static partitioning
* statically partition between multiple gpus
* comments
* BBaseField -> BaseFieldForBatch
* Outline of basic traits
* Remove sw_proj, add gpu support for all SW projective curves
* impl gpu kernels for all curves
* feature-gate with "cuda"
* rename curves/gpu directory to curves/cuda
* Fix merge errors
* Use github rather than local jon-chuang/accel
* again
* again
* update README
* feature = "cuda"
* gpu_standalone (good for non-generic), feature-gate under cuda too
* fix merging errors
* make helpers a same-file module
* remove cancerous --all-features from github yml
* Use dummy accel_dummy crate when not compiling as CUDA
* feature-gate accel import
* fix no_std
* fix: gpu-standalone does not depend on algebra-core/cuda
* lazy_static optional
* kernel-specific static profile data
* cuda test, cached profile data (in OS cache dir) for all curves
* rectify omission of NAMESPACE, minor errors
* fix no_std, group size in bits too large for 2 groups (mnt6, cp6 - Fq3)
* toml fixes
* update README
* remove extraneous file
* bake in check for oversized group elems
* typo
* remove boilerplate/compactify
* remove standalone
* fmt
* fix println and comments
* fix: typo
* Update README.md

  Co-authored-by: Kobi Gurkan <[email protected]>
* Make GPUScalarMulInternal APIs, only expose two APIs; exposing more APIs is future work
* add CI to test cuda compilation/link and cuda scalar mul when no gpu
* change kernel accel compile branch to master
* fix ci
* use unreachable instead of empty implementation
* install required toolchain
* Empty commit to get CI working
* try to fix ci
* fmt
* fix ci
* safer error handling in gpu code
* fix ci
* handle dirs crate not available without cuda
* don't check early intermediate results
* fix no_std and nightly
* fix remaining errors
* No for_tests
* Feature-gate clear profile data
* install cuda library to successfully link
* change the order of CI jobs
* change the order of CI again
* cd ..
* Get rid of caching
* Never all features
* Put back caching
* Remove cuda .deb to save disk space
* Increase max-parallel
* check examples with all features

Co-authored-by: Kobi Gurkan <[email protected]>
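Several commits above ("cleanup batch inversion", "batched affine formulas") revolve around Montgomery's batch-inversion trick, which makes affine-coordinate batch addition viable by sharing one field inversion across many points. A minimal sketch of the trick, using toy `u64` arithmetic modulo a small prime in place of the crate's field types (`P`, `mul`, `inv`, and `batch_inverse` here are all hypothetical stand-ins, not the repository's API):

```rust
// Toy prime field: arithmetic mod a small prime stands in for the
// curve's base field.
const P: u64 = 1_000_000_007;

fn mul(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % P as u128) as u64
}

// Square-and-multiply exponentiation.
fn pow(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1u64;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = mul(acc, base);
        }
        base = mul(base, base);
        exp >>= 1;
    }
    acc
}

// Single inversion via Fermat's little theorem: a^(P-2) = a^-1 mod P.
fn inv(a: u64) -> u64 {
    pow(a, P - 2)
}

/// Invert every (nonzero) element of `v` in place using ONE field
/// inversion plus 3(n-1) multiplications, instead of n inversions.
fn batch_inverse(v: &mut [u64]) {
    let n = v.len();
    if n == 0 {
        return;
    }
    // Forward pass: prefix[i] = v[0] * v[1] * ... * v[i].
    let mut prefix = Vec::with_capacity(n);
    let mut acc = 1u64;
    for &x in v.iter() {
        acc = mul(acc, x);
        prefix.push(acc);
    }
    // One inversion of the total product.
    let mut inv_acc = inv(acc);
    // Backward pass: peel off one inverse at a time.
    for i in (1..n).rev() {
        let vi = v[i];
        v[i] = mul(inv_acc, prefix[i - 1]);
        inv_acc = mul(inv_acc, vi);
    }
    v[0] = inv_acc;
}
```

In the batched affine additions above, the denominators of many point additions are inverted together this way, which is what makes affine formulas cheaper than projective ones at scale.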
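The "affine batch ops & wnaf" and "wnaf table generation" commits rely on signed windowed-NAF recoding, which rewrites a scalar into sparse signed digits so scalar multiplication needs fewer additions. A hedged sketch of the standard recoding (the function name and signature are illustrative, not the crate's):

```rust
/// Recode a non-negative scalar into width-w signed NAF digits
/// (least significant first). Each nonzero digit is odd and lies in
/// (-2^(w-1), 2^(w-1)], so a precomputed table of odd multiples
/// {1P, 3P, 5P, ...} suffices for scalar multiplication.
fn wnaf(mut k: i64, w: u32) -> Vec<i64> {
    assert!(k >= 0 && w >= 2);
    let modulus = 1i64 << w; // 2^w
    let mut digits = Vec::new();
    while k != 0 {
        if k & 1 == 1 {
            // Centered remainder mod 2^w: forces the next w-1 digits
            // to be zero, giving the sparsity wNAF is known for.
            let mut d = k % modulus;
            if d > modulus / 2 {
                d -= modulus;
            }
            digits.push(d);
            k -= d;
        } else {
            digits.push(0);
        }
        k >>= 1;
    }
    digits
}
```

Reconstructing the scalar as `sum(digits[i] * 2^i)` recovers `k`; the batched variant in the log amortizes the table generation ("wnaf table row/col" swaps) across many points at once.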
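The GLV commits ("GLV lattice basis script success", "glv_scalar_decomposition") decompose a scalar k into two half-length scalars k1, k2 with k ≡ k1 + k2·λ (mod n), using a short basis of the lattice {(a, b) : a + b·λ ≡ 0 (mod n)} and Babai rounding. A sketch with toy parameters (n = 103 and λ = 56 satisfy λ² + λ + 1 ≡ 0 mod n, mimicking the cube-root-of-unity endomorphism; all constants and names here are hypothetical, chosen only so the arithmetic checks out):

```rust
// Toy group order and endomorphism eigenvalue: 56^2 + 56 + 1 = 0 mod 103.
const N: i64 = 103;
const LAMBDA: i64 = 56;

// Short basis of the lattice {(a, b) : a + b*LAMBDA = 0 mod N};
// in the real code this comes from a lattice-reduction script.
// Check: -9 + 2*56 = 103, and -2 + (-11)*56 = -618 = -6*103.
const V1: (i64, i64) = (-9, 2);
const V2: (i64, i64) = (-2, -11);

// Round a/b to the nearest integer, assuming b > 0.
fn round_div(a: i64, b: i64) -> i64 {
    (2 * a + b).div_euclid(2 * b)
}

/// Babai rounding: solve (k, 0) = b1*V1 + b2*V2 over the rationals,
/// round to integers (c1, c2), and subtract. The remainder (k1, k2)
/// satisfies k1 + k2*LAMBDA = k (mod N) and is short.
fn glv_decompose(k: i64) -> (i64, i64) {
    let det = V1.0 * V2.1 - V1.1 * V2.0; // = N for a unimodular basis
    let c1 = round_div(k * V2.1, det);
    let c2 = round_div(-k * V1.1, det);
    let k1 = k - c1 * V1.0 - c2 * V2.0;
    let k2 = -c1 * V1.1 - c2 * V2.1;
    (k1, k2)
}
```

With real curve parameters the same shape yields |k1|, |k2| ≈ √n, so a double-scalar multiplication k1·P + k2·λ(P) halves the number of doublings; the "k == MODULUS" fix in the log guards the edge case where the input scalar equals the group order.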
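The "Batch MSM" and "batch_bucketed_add" commits implement the bucket method (Pippenger's algorithm) for multi-scalar multiplication: split scalars into w-bit windows, sort point indices into 2^w - 1 buckets per window, and fold the buckets with a running sum. A compact sketch of the control flow, with plain integer addition standing in for the batched affine group additions the repository actually uses (function name and the integer "group" are illustrative only):

```rust
/// Compute sum(scalars[i] * bases[i]) with the bucket method.
/// The group here is just (i128, +) so the windowing logic is the
/// whole story; real code replaces `+=` with batched curve adds.
fn msm_pippenger(scalars: &[u64], bases: &[i128], w: usize) -> i128 {
    assert_eq!(scalars.len(), bases.len());
    let num_windows = (64 + w - 1) / w;
    let mut total: i128 = 0;
    // Process windows from most to least significant.
    for win in (0..num_windows).rev() {
        // "Double" w times to shift the running total up one window.
        for _ in 0..w {
            total += total;
        }
        // Bucket accumulation: bucket d-1 collects bases whose current
        // window digit is d (digit 0 contributes nothing).
        let mut buckets = vec![0i128; (1usize << w) - 1];
        for (s, b) in scalars.iter().zip(bases) {
            let digit = ((s >> (win * w)) as usize) & ((1usize << w) - 1);
            if digit != 0 {
                buckets[digit - 1] += b;
            }
        }
        // Running-sum trick: sum_d d * bucket[d] using only additions.
        let mut running = 0i128;
        let mut acc = 0i128;
        for bkt in buckets.iter().rev() {
            running += bkt;
            acc += running;
        }
        total += acc;
    }
    total
}
```

The "lightweight index sort" in the log refers to grouping point indices by bucket before this accumulation step, so that each bucket's points can be added with one cache-friendly batched-affine pass instead of scattered individual additions.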