Releases · NVIDIA/cccl

03 Mar 21:03

wmaxey

v2.8.0

6d02e11

CCCL 2.8.0 Latest

What's Changed

Adds benchmarks for DeviceSelect::Unique by @elstehle in #2359
CUB - Enable DPX Reduction by @fbusato in #2286
[CUDAX] add a small c++17 implementation of std::execution (aka P2300) by @ericniebler in #2301
Add thrust::transform_inclusive_scan with init value by @gonidelis in #2326
Widen histogram agent constructor to more types by @bernhardmgruber in #2380
Use a constant for the amount of static SMEM by @bernhardmgruber in #2374
Add cub::DeviceTransform by @bernhardmgruber in #2086
Update toolkit to CTK 12.6 by @miscco in #2348
implement make_integer_sequence in terms of intrinsics whenever possible by @ericniebler in #2384
Implement cuda::mr::cuda_async_memory_resource by @miscco in #1637
Drop implementation of thrust::pair and thrust::tuple by @miscco in #2395
Pull out _LIBCUDACXX_UNREACHABLE into its own file by @miscco in #2399
Share common compiler flags in new CCCL-level targets. by @alliepiper in #2386
conditionally include <crt/host_defines.h> from __cccl/execution_space.h header by @ericniebler in #2406
add some simple utilities for manipulating lists of types by @ericniebler in #2370
Drop Thrust's diagnostic suppression warnings by @miscco in #2392
[PoC]: Implement cuda::experimental::uninitialized_async_buffer by @miscco in #1854
Fix thrust package to work with newer FindOpenMP.cmake. by @alliepiper in #2421
Introduce cccl_configure_target cmake function. by @alliepiper in #2388
Fix sccache errors in RAPIDS builds by @trxcllnt in #2417
Replace CUDA C++ Core Libraries with CUDA Core Compute Libraries (only in README.md). by @rwgk in #2424
Minor cleanup with cuda/atomic by @miscco in #2418
uninitialized_buffer::get_resource returns a ref to an any_resource that can be copied by @ericniebler in #2431
Refactor cuda::ceil_div to take two different types by @miscco in #2376
Reduce PR testing matrix. by @alliepiper in #2436
Implement cudax::shared_resource by @miscco in #2398
Increase the libcu++ timeout by @miscco in #2435
Move c/include/cccl/*.h files to c/include/cccl/c/*.h by @rwgk in #2428
Make any_resource emplacable by @miscco in #2425
Fix issues with __host__ and __device__ definitions by @miscco in #2413
Make bit_cast play nice with extended floating point types by @miscco in #2434
Do not include our own string.h file by @miscco in #2444
Move nightly time by @bdice in #2437
Remove a ton of lines in thrust tests by @gonidelis in #2356
[CUDAX] Add placeholder green context type and logical device that can hold both a green ctx and a device by @pciolkosz in #2446
Fix typo in CCCLBuildCompilerTargets.cmake by @alliepiper in #2453
Drop superfluous compile definition from thrust tests by @miscco in #2450
Consolidate packages and install rules by @alliepiper in #2456
Prune CUB's ChainedPolicy by CUDA_ARCH_LIST by @bernhardmgruber in #2154
fixes merge conflict for policy pruning by @elstehle in #2466
Add CCCL_ENABLE_WERROR flag. by @alliepiper in #2463
Add CUB tests for segmented sort/radix sort with 64-bit num. items and segments by @fbusato in #2254
Propagate compiler flags down to libcu++ LIT tests by @Artem-B in #2420
Drop remaining uses of _LIBCUDACXX_COMPILER_* by @miscco in #2467
Avoid C++17 extension in c++11 tests by @miscco in #2469
Add span to example and templated block size by @Kh4ster in #2470
Drop Objective C++ support by @miscco in #2468
removes superfluous template keyword in call to Dereference by @andrewcorrigan in #2482
Improve build times in several heavyweight libcudacxx tests. by @wmaxey in #2478
Drop __availability header by @miscco in #2484
Replace a few more instances of CUDA C++ Core Libraries with CUDA Core Compute Libraries. by @rwgk in #2447
Fix common_type specialization for extended floating point types by @miscco in #2483
Implement some CUDA API calls for async_memory_pool by @miscco in #2455
Move cudax example project to CCCL project examples. by @alliepiper in #2462
Disable system header for narrowing conversion check by @miscco in #2465
Require resources to always provide at least one execution space property by @miscco in #2489
Rework builtin handling by @miscco in #2461
Disable execution checks for std::equal by @miscco in #2491
replace _CCCL_ALWAYS_INLINE with _CCCL_FORCEINLINE by @ericniebler in #2439
Drop 2 relative includes that snuck in by @miscco in #2492
re-express the cudax::__tupl::__apply member to make nvc++ happy by @ericniebler in #2493
Drop badly named _One_of concept by @miscco in #2490
Unify assert handling in cccl by @miscco in #2382
Reduce scope of Thrust linkage in cudax. by @alliepiper in #2496
Centralize CPM logic. by @alliepiper in #2495
Fix typo in presets. by @alliepiper in #2497
Refactor away per-project TOPLEVEL flags. by @alliepiper in #2498
[FEA]: Validate cuda.parallel type matching in build and execution by @rwgk in #2429
avoid gcc optimizer bug by not force inlining part of thrust::transform by @ericniebler in #2509
Cleanup and modularize <cuda/std/barrier> by @miscco in #2443
Consolidate header testing infra. by @alliepiper in #2460
Add ForEachN from CUB to cccl/c. by @wmaxey in #2378
Adds support for large number of items in DeviceSelect and DevicePartition by @elstehle in #2400
Adds support for large number of items to DeviceScan::*ByKey family of algorithms by @elstehle in #2477
Integrate c/parallel with CCCL build system and CI. by @alliepiper in #2514
Create a command list utility for nvrtc/jitlink steps. by @wmaxey in #2511
Fix the example project which the documentation refers to by @caugonnet in #2531
Enable tests/headertests for c/parallel in all-dev presets. by @alliepiper in #2566
Rename cudax test targets to match CCCL conventions. by @alliepiper in #2568
Update project list in issue template by @alliepiper in #2532
Disable compiler extensions on CCCL targets. by @alliepiper in #2559
Fixes cub::DeviceMemcpy::Batched to be able to copy from const source pointers by @elstehle in #2573
Fix documentation error in ci/build_common.sh for -arch by @caugonnet in #2574
gcc-14 gained the ability to mangle noexcept expressions by @ericniebler in #2565
Miscellaneous simple fixes by @rwgk in #2575
Avoid including yvals.h when the compiler is not MSVC. by @wmaxey in #2545
Fix popc.h when architecture is not x86 on MSVC. by @wmaxey in #2524
test for exceptions support on msvc with the _CPPUNWIND macro by @ericniebler in https://github.co...

Contributors

alliepiper, trxcllnt, and 31 other contributors

06 Jan 22:12

wmaxey

v2.7.0

b5fe509

CCCL 2.7.0

What’s New

C++

Thrust / CUB

Inclusive scan now supports initial value #1940
Inclusive and exclusive scan now support problem sizes exceeding 2^31 elements #2171
New cub::DeviceMerge::MergeKeys and cub::DeviceMerge::MergePairs algorithms #1817
New thrust::tabulate_output_iterator fancy iterator #2282
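To make the first item above concrete, here is a minimal host-only sketch of what an inclusive scan with an initial value computes. It is sequential standard C++, not the Thrust or CUB implementation; the helper name `inclusive_scan_with_init` and the choice of `long long` for the running sum are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-only helper (not a Thrust or CUB API):
// out[i] = init + in[0] + ... + in[i].
std::vector<long long> inclusive_scan_with_init(const std::vector<int>& in, long long init)
{
  std::vector<long long> out(in.size());
  long long running = init; // the initial value seeds the running sum
  for (std::size_t i = 0; i < in.size(); ++i)
  {
    running += in[i];
    out[i] = running; // inclusive: element i is already part of out[i]
  }
  return out;
}

// Usage: scanning {1, 2, 3, 4} with an initial value of 10.
void inclusive_scan_example()
{
  const std::vector<int> input{1, 2, 3, 4};
  const std::vector<long long> expected{11, 13, 16, 20};
  assert(inclusive_scan_with_init(input, 10) == expected);
}
```

Without an initial value the same input would scan to 1, 3, 6, 10; the initial value simply shifts every partial result.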

Libcudacxx

Assertions can now be enabled on host and device, depending on the user's choice
C++26 inplace_vector has been implemented and backported to C++14
Improved support for extended floating point types __half and __nv_bfloat16 both for cmath functions and complex
cuda::std::tuple is now trivially copyable if the stored types are trivially copyable
Reworked our atomics implementation
Improved <cuda/std/bit> conformance
Implemented <cuda/std/bitset> and backported to C++14
Implemented and backported C++20 bit_cast. It is available in all standard modes and constexpr with compiler support
Various backports and constexpr improvements (bool_constant, cuda::std::max)
Moved the experimental memory resources from <cuda/memory_resource> into <cuda/experimental/memory_resource.cuh>
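As an illustration of the bit_cast item in the list above, the following host-only sketch shows the operation a bit cast performs: reinterpreting the object representation of one trivially copyable type as another of the same size. It is a `memcpy`-based stand-in written for this note, not the libcu++ implementation; the name `bit_cast_sketch` is hypothetical, and the expected bit pattern assumes an IEEE 754 binary32 `float`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical stand-in for bit_cast: copies the bytes of `from` into a new To.
template <class To, class From>
To bit_cast_sketch(const From& from)
{
  static_assert(sizeof(To) == sizeof(From), "source and target sizes must match");
  static_assert(std::is_trivially_copyable<To>::value, "To must be trivially copyable");
  static_assert(std::is_trivially_copyable<From>::value, "From must be trivially copyable");
  To to; // this sketch also requires To to be default constructible
  std::memcpy(&to, &from, sizeof(To));
  return to;
}

// Usage: round-trip a float through its 32-bit pattern.
void bit_cast_example()
{
  const std::uint32_t bits = bit_cast_sketch<std::uint32_t>(1.0f);
  assert(bits == 0x3f800000u); // assumes IEEE 754 binary32 float
  assert(bit_cast_sketch<float>(bits) == 1.0f);
}
```

Unlike this sketch, a real bit_cast can be `constexpr` when the compiler provides the necessary builtin, which is what the list entry means by "constexpr with compiler support".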

Python

cuda.cooperative

The best practices CCCL uses to make CUDA kernels easier to write and faster to execute are now available in Python through the cuda.cooperative module. The module currently supports block- and warp-level algorithms within numba.cuda kernels, offering speed-of-light reductions, prefix sums, radix sort, and merge sort. You can customize cuda.cooperative algorithms with user-defined data types and operators implemented directly in Python.

Block and warp-level cooperative algorithms are now available in Python #1973.
Experimental versions of reduce, scan, merge and radix sort are available in numba.cuda kernels.

cuda.parallel

Apart from device-side cooperative algorithms, CCCL 2.7 provides an experimental version of host-side parallel algorithms as part of the cuda.parallel module. This release includes parallel reduction.

What's Changed

Fix documentation generation for thrust::pair by @bernhardmgruber in #1976
Correct typo in a launch configuration header name by @pciolkosz in #1972
Fix thrust::sort for large problem sizes by @gevtushenko in #1952
Avoid SIGPIPE when truncating verbose output in CI scripts. by @alliepiper in #1971
Clarify compiler support by @bernhardmgruber in #1970
Experimental Python cooperative algorithms by @gevtushenko in #1973
[pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #1928
Guard against an overflow in sort tests by @bernhardmgruber in #1980
Remove obsolete Thrust function traits by @bernhardmgruber in #1962
Python: Add version string & wheel build command by @leofang in #1985
Add device inclusive scan with init_value by @gonidelis in #1845
Fix BWUtil report on early exit by @gonidelis in #1994
Use libcu++ void_t everywhere by @bernhardmgruber in #1977
Drop zipped_binary_op by @bernhardmgruber in #1988
Clarify PtxVersion and SmVersion by @bernhardmgruber in #2004
More simplifications for CUB util_device by @bernhardmgruber in #1948
fix some typos in <cuda/stream_ref> by @ericniebler in #2003
Add CI slack notifications. by @alliepiper in #1961
Allow nightly workflow to be manually invoked. by @alliepiper in #2007
Need to use a different approach to reuse secrets in reusable workflows vs. actions. by @alliepiper in #2008
Enable RAPIDS builds for manually dispatched workflows. by @alliepiper in #2009
clean up complex.inl by @ZelboK in #1655
Add github token to nightly workflow-results action. by @alliepiper in #2012
Remove obsolete build system glue from the Thrust/CUB submodule structure. by @alliepiper in #2016
Benchmark thrust::copy with non-trivially relocatable type by @bernhardmgruber in #1989
Make bool_constant available in C++11 by @bernhardmgruber in #1997
Spell value initialization where used in thrust vectors by @bernhardmgruber in #1990
Do not redefine __ELF__ macro by @miscco in #2018
Port thrust::merge[_by_key] to CUB by @bernhardmgruber in #1817
Simplify some pointer traits by @bernhardmgruber in #2020
Simplify test data setup by @bernhardmgruber in #2023
Add tests to ensure that we properly propagate common_type for complex types by @miscco in #2025
Update Thrust CMake README to use CCCL repo. by @alliepiper in #2026
Include container toolkit in manual prereqs by @bryevdv in #2064
Avoid ADL issues with thrust::distance by @miscco in #2053
Simplify thrust::detail::wrapped_function by @bernhardmgruber in #2019
Add a test for Thrust scan with non-commutative op by @bernhardmgruber in #2024
Update memory_resource docs by @miscco in #1883
Temporarily switch nightly H100 CI to build-only. by @alliepiper in #2060
Do not rely on conversions between float and extended floating point types by @miscco in #2046
experimental wrapper types for cudaEvent_t that provide a modern C++ interface. by @ericniebler in #2017
[CUDAX] Add a dummy device struct for now by @pciolkosz in #2066
Allow (somewhat) different input value types for merge by @bernhardmgruber in #2075
Avoid ::result_type for partial sums in TBB reduce_by_key by @bernhardmgruber in #1998
Fix formatting by @bernhardmgruber in #2090
Rename and refactor transform_iterator_base by @bernhardmgruber in #1987
Benchmark analysis: Print all top rows when asked for by @bernhardmgruber in #2089
Makes user-provided functors in our examples use __device__ instead of CUB_RUNTIME_FUNCTION by @elstehle in #2088
Separate cuda/experimental when sorting includes by @bernhardmgruber in #2094
add support to cudax::device for querying a device's attributes by @ericniebler in #2084
[CUDAX] Add experimental owning abstraction for cudaStream_t by @pciolkosz in #2093
Do not query NVRTC for cuda runtime header by @miscco in #2102
Cleanup CUB block/thread load and exchange by @bernhardmgruber in #1946
Improve binary function objects and replace thrust implementation by @srinivasyadav18 in #1872
Replace _LIBCUDACXX_CPO_ACCESSIBILITY with _CCCL_GLOBAL_VARIABLE by @miscco in #1881
Add script to update RAPIDS version. by @bdice in #2082
Update bad links by @bryevdv in #2080
Fix line break issues that break doxygen code examples by @miscco in #2103
Add internal wrapper for cuda driver APIs by @pciolkosz in #2070
Use common_type for complex pow by @miscco in #1800
[CUDAX] rename device to device_ref, add immovable device as a place to cache properties by @ericniebler in #2110
Use the float flavors of the cmath functions in the extended floating point fallbacks by @miscco in #2106
[PoC]: Implement cuda::experimental::uninitialized_buffer by @miscco in #1831
Ensure that we avoid ABI Version conflicts by @miscco in #2137
Ensure that cuda_memory_resource allocates memory on the proper device by @miscco in #2073
Clarify compatibility wrt. template specializations by @bernhardmgruber in #2138
Implement a cudax::get_stream CPO by @miscco in #2135
Make cuda::std::tuple trivially copyable by @miscco in #2127
Fix missing copy of docs artifacts by @miscco in #2162
...

Contributors

alliepiper, ericniebler, and 19 other contributors

10 Sep 18:45

wmaxey

v2.6.1

9019a6a

CCCL 2.6.1

This release includes backports for PRs #2332 and #2341. Please see release 2.6.0 for the full list of changes included in the release.

What's Changed

Backport PR #2332 and #2341 by @wmaxey in #2368

Full Changelog: v2.6.0...v2.6.1

Contributors

wmaxey

04 Sep 17:42

wmaxey

v2.6.0

c67b1c3

CCCL 2.6.0

What's Changed

Restrict active histogram channels to channel count by @bernhardmgruber in #1796
Cleanup internal thrust CUDA utils by @bernhardmgruber in #1802
Use variadic interfaces in agent launcher by @bernhardmgruber in #1804
Use nullptr over NULL by @bernhardmgruber in #1805
Rework the documentation to be built with Sphinx by @miscco in #1753
Let Catch2 report cudaError descriptions by @bernhardmgruber in #1808
Check size-querying CUB API invocation in tests by @bernhardmgruber in #1809
Update docs link by @gevtushenko in #1812
Add missing inline specifiers by @bernhardmgruber in #1813
Upgrade actions that use node16 to versions that use node20 by @trxcllnt in #1779
Document NVTX range behavior during graph capture by @bernhardmgruber in #1814
Clean up AliasTemporaries by @bernhardmgruber in #1815
Drop removed clang-tidy option by @bernhardmgruber in #1810
Exclude docs from cccl infra changes. by @alliepiper in #1821
Clean up thrust merge unit tests by @bernhardmgruber in #1819
Fix atomic performance regressions by avoiding use of memcpy with natively supported atomic types. by @wmaxey in #1801
Clean up merge_by_key and merge_key_value tests by @bernhardmgruber in #1824
Restore the old thrust api documentation in rst by @miscco in #1818
Drop all internal implementations of exceptions by @miscco in #1806
Fix span for non-ranges by @miscco in #1836
Cleanup thrust test special types by @bernhardmgruber in #1837
Add inclusive_scan with initial value support (warp/block) by @gonidelis in #1749
Fix loading from incorrect URI on 404 page. by @wmaxey in #1843
Port CUB temporary storage layout test to Catch2 by @bernhardmgruber in #1835
Port CUB thread operators test to Catch2 by @bernhardmgruber in #1834
Adds ceil_div by @gonzalobg in #1825
Split workflow into multiple dispatch groups to avoid skipped jobs. by @alliepiper in #1797
Fix broken CUB doc build and add 404 page to Sphinx. by @wmaxey in #1846
Port CUB thread sort test to Catch2 by @bernhardmgruber in #1838
Cleanup CUB temporary storage layout test by @bernhardmgruber in #1848
Propagate error when docs build fails, add docs build to CI. by @alliepiper in #1852
Cleanup CUB util_macro.cuh by @bernhardmgruber in #1849
Provide libcu++ transparent functors in C++11 by @bernhardmgruber in #1851
Roll back upload-pages-artifact to v2. by @alliepiper in #1861
Port CUB iterator test to Catch2 by @bernhardmgruber in #1822
Symbol visibility is now invariant in regards to __cuda_std__ definition by @robertmaynard in #1832
Add dimensions description functionality to CUDA Experimental library by @pciolkosz in #1743
Document Asynchronous Operations by @gonzalobg in #1781
Remove cpp11_required.h by @bernhardmgruber in #1860
Add workflow to build RAPIDS from source with local CCCL by @trxcllnt in #1667
Refactor CI matrix. by @alliepiper in #1844
Adds tests for large number of items in cub::DeviceScan by @elstehle in #1830
Make CUB test launch wrappers functor instances by @bernhardmgruber in #1850
Improve CUB test overview docs by @bernhardmgruber in #1867
Skip devcontainer validation jobs if not needed. by @alliepiper in #1853
Improve CUB device-scope documentation by @bernhardmgruber in #1862
Make integer sequence et al. available in C++11 by @bernhardmgruber in #1859
Minimize template instantiations in CUB thread_load by @bernhardmgruber in #1857
Create major version 2.6.0 by @wmaxey in #1880
Drop facilities deprecated in CUB 1.x by @bernhardmgruber in #1868
Make thrust::sort use radix sort with more comparators by @bernhardmgruber in #1884
Make cuda::ptx::*_multicast pass on all architectures by @ahendriksen in #1874
Replace typedef by alias declarations in CUB by @bernhardmgruber in #1885
Remove legacy benchmarks and other dvs/p4 remnants by @alliepiper in #1901
Qualify call to distance in thrust::async_reduce by @bernhardmgruber in #1904
Rename CUB uninitialized_copy by @bernhardmgruber in #1913
Sanitizer fixes by @alliepiper in #1916
Use c2h::vectors in all non-example CUB tests by @bernhardmgruber in #1914
Renamed overlooked uninitialized_copy by @bernhardmgruber in #1920
Add assert implementation for device side testing by @pciolkosz in #1918
Thrust and CUB: README: Fix copy-paste from libcu++ and links by @pauleonix in #1878
Follow-up fixes to CUB iterator test by @bernhardmgruber in #1875
Replace typedef by alias declarations in Thrust by @bernhardmgruber in #1915
Cleanup CUB util_type.cuh by @bernhardmgruber in #1863
Fix include for in cub/util_type.cuh by @bernhardmgruber in #1929
Fix issues with comments in the concept emulation by @miscco in #1931
Deprecate and reduce use of old functional stuff by @bernhardmgruber in #1925
Deprecate more nested aliases in thrust functors by @bernhardmgruber in #1932
Fix various typos in CUB documentation and comments. by @brycelelbach in #1933
Add BabelStream flavors as thrust::transform benchmarks by @bernhardmgruber in #1921
Some cleanup in Thrust config headers by @bernhardmgruber in #1934
Update to CUDA 12.5 containers by @jrhemstad in #1935
Check that the current version of CMake supports policy 141 before se… by @alliepiper in #1924
Fix memmove optimization by @miscco in #1937
Fixes thrust::unique_by_key examples by @elstehle in #1943
Use only explicit NVTX3 V1 API in CUB by @bernhardmgruber in #1751
Suppress a clang warning on array size computation by @bernhardmgruber in #1942
Add a benchmark for thrust::equal by @bernhardmgruber in #1944
Strip prefix paths to improve doc rendering by @bdice in #1954
Modernize Thrust's alignment.h and triple_chevron_launch by @bernhardmgruber in #1905
Restore RAPIDS devcontainer by @bdice in #1955
Fix for in-place DeviceSelect & thrust::remove_if by @elstehle in #1782
Drop Thrust's cstdint.h by @bernhardmgruber in #1959
Use make_devcontainers.sh --clean when validating. by @alliepiper in #1963
Fix missing binary_pred in thrust::unique_by_key by @bernhardmgruber in #1957
cuda::launch and launch configuration object with minimal functionality by @pciolkosz in #1950
Backport PR #2046 - Fixing FP16 conversions. by @wmaxey in #2222

Full Changelog: v2.5.0...v2.6.0

Contributors

alliepiper, trxcllnt, and 14 other contributors

17 Jun 18:00

wmaxey

v2.5.0

69be18c

CCCL 2.5.0

What's New

This release includes several notable improvements and new features:

CUB device-level algorithms now support NVTX ranges in Nsight Systems. This integration makes it easier to identify and analyze the time spent in CUB algorithms. Please note that profiling with this feature requires at least C++14.
We have added a new cub::DeviceSelect::FlaggedIf API, which selects items by applying a predicate to their associated flags. This gives you more flexibility and control over item selection.
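The selection rule described for FlaggedIf can be sketched on the host as follows. This is a sequential illustration of the semantics only, not the CUB API or its signature; `select_flagged_if` and `is_even` are hypothetical names, and the sketch assumes one flag per item and keeps the selected items in input order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-only helper: keeps items[i] whenever pred(flags[i]) is true.
template <class T, class Flag, class Pred>
std::vector<T> select_flagged_if(const std::vector<T>& items,
                                 const std::vector<Flag>& flags,
                                 Pred pred)
{
  assert(items.size() == flags.size()); // one flag per item
  std::vector<T> selected;
  for (std::size_t i = 0; i < items.size(); ++i)
  {
    if (pred(flags[i])) // the predicate inspects the flag, not the item
    {
      selected.push_back(items[i]);
    }
  }
  return selected;
}

// Example predicate: a flag selects its item when it is even.
struct is_even
{
  bool operator()(int flag) const { return flag % 2 == 0; }
};

// Usage: items 20 and 40 carry even flags, so only they are kept.
void flagged_if_example()
{
  const std::vector<int> items{10, 20, 30, 40, 50};
  const std::vector<int> flags{1, 2, 3, 4, 5};
  const std::vector<int> expected{20, 40};
  assert(select_flagged_if(items, flags, is_even()) == expected);
}
```

The difference from a plain flagged selection is that the flags need not be booleans: any flag type works as long as the predicate can map it to a keep-or-drop decision.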

What's Changed

Clean up libcu++ docs landing page by @jrhemstad in #1492
PTX: Add cuda::ptx::elect_sync by @ahendriksen in #1537
Print a summary of all tests sorted by execution time. by @alliepiper in #1539
Fix unused variable warning for __can_use_complete_tx by @wmaxey in #1547
Fix usage of naked array with 0 elements in sm90 barrier tests. by @wmaxey in #1546
Add support for stream operators for complex by @miscco in #1538
Fix __half for older architectures by @miscco in #1543
Feat 565 remove redundant thrust dialect conditional by @ZelboK in #566
fix missing device hint in WarpMergeSort Documentation by @MARD1NO in #1553
Minor fixes and additions on cub developer guides by @gonidelis in #1559
Consolidate handling of constexpr and if constexpr by @miscco in #1562
Ensure that cuda::aligned_size_t is usable in a constexpr context by @miscco in #1564
Group CUB docs by @gevtushenko in #1565
Update toolkit to 12.4 by @miscco in #1554
Work around change in cuTensorMapEncode by @miscco in #1567
Remove stdlib arg from .clangd. by @alliepiper in #1569
Add the DeviceSelect::FlaggedIf algorithm by @gonidelis in #1533
Catch2 segmented sort by @alliepiper in #1484
Do not emit diagnostic with extended device lambdas with preserved re… by @Revaj in #1495
Use absolute includes for libcu++ by @miscco in #1560
[NFC] Modularize <exception> by @miscco in #199
Add test support for launching kernels with cluster size > 1 by @ahendriksen in #416
Fix typo in README.md by @bprb in #1574
[FEA]: Modularize <cuda/memory_resource> by @miscco in #1532
Cleanup_complex by @miscco in #1555
Add missing comma in barrier __try_wait by @miscco in #1593
Segmented sort test fix by @alliepiper in #1591
Add pre-commit configuration by @bdice in #1596
Preserve .devcontainer/img/ when cleaning. by @alliepiper in #1604
Add some documentation for recent additions to libcu++ by @miscco in #1594
Ensure cuda::std::nullopt is visible in device code by @trxcllnt in #1598
Fix ordering of alignas and __shared__ by @miscco in #1601
Update Thrust CI tests. by @alliepiper in #1605
Implement tuple interface for cuda vector types by @miscco in #1410
Inspect PR changes to determine if subproject builds are needed. by @alliepiper in #1572
Apply clang-format to cub by @bdice in #1602
Add missing non-volatile atomic overloads. by @wmaxey in #1582
Drop unused libcxx files by @miscco in #1606
Apply formatting to libcudacxx by @miscco in #1610
Add conda documentation to the README. by @bdice in #1581
Allow jobs to be skipped. by @alliepiper in #1611
Make libcu++ work with exceptions by @miscco in #1607
Implement cuda::mr::cuda_memory_resource by @miscco in #1578
Implement cuda::mr::managed_memory_resource by @miscco in #1579
Apply formatting to thrust by @miscco in #1616
Update example_device_radix_sort.cu by @eriktedhamre in #1608
Implement cuda::mr::pinned_memory_resource by @miscco in #1580
Set the devcontainers to format on save. by @miscco in #1624
Enable internal use of std::allocator related functionality by @miscco in #1583
Adds tests for large number of items for cub::DeviceSelect by @elstehle in #1612
Add pre-commit docs to CONTRIBUTING.md. by @bdice in #1627
Move visibility attributes to cccl by @miscco in #1595
Work around thrust/memory.h circular include by @dkolsen-pgi in #1634
Fix mbarrier.init addressing by @ahendriksen in #1636
Trim trailing whitespace and normalize newlines. by @bdice in #1633
Add a git-blame-ignore-revs file by @miscco in #1629
Revert "PTX: Add cuda::ptx::elect_sync (#1537)" by @ahendriksen in #1638
Address potential oob in cub when passing in an invalid device counter by @miscco in #1641
Allow ninja_summary to fail by @jrhemstad in #1644
Mostly flatten the folder structure of libcu++ by @miscco in #1630
Make --cmake-options="" always override others. by @alliepiper in #1648
Fix invalid _CCCL_CUDACC definition for clang cuda by @miscco in #1656
Add missing #pragma once in some headers by @bernhardmgruber in #1668
Add NVTX ranges for all CUB algorithms by @bernhardmgruber in #1657
Implement LWG-3843 and LWG-3940 by @miscco in #1621
Modularize <memory> by @miscco in #1639
Expose <cuda/std/numeric> to be publicly available by @miscco in #1671
Add nsight support for automated debugging by @gonidelis in #1660
Format core headers by @miscco in #1670
Guard resource_ref and friends behind feature flag by @miscco in #1675
Create major version 2.5.0 by @wmaxey in #1677
Install CUB headers with .hpp extension by @bernhardmgruber in #1687
Update CMakePresets.json by @alliepiper in #1686
Fix deprecated status by @gevtushenko in #1692
Test combined internal/user-side use of NVTX by @bernhardmgruber in #1690
CI Overhaul, new nightly workflow by @alliepiper in #1654
Fix CMake option handling. by @alliepiper in #1698
Fix issues that came up with building cuDF with main by @miscco in #1643
Drop new properties until we are certain about the design by @miscco in #1681
Remove more uses of __cuda_std__ by @miscco in #1669
Fix usage of result_of in thrust by @miscco in #1705
Fix thrust::optional<T&>::emplace() by @Snektron in #1707
Remove old f(void) function signatures by @bernhardmgruber in #1708
Fix code sample in README and docs by @pauleonix in #1652
Format libcudacxx/include files without extensions by @bdice in #1676
Several improvements to zip_iterator/zip_function by @bernhardmgruber in #1710
Expose thrust's contiguous iterator unwrap helpers by @bernhardmgruber in #1717
Fix flakey heterogeneous tests by @wmaxey in #1712
Ensure that we can use cuda::std::optional with types that are not __host__ __device__ by @miscco in #1663
Fix a typo in barrier docs and update the godbolt link by @PointKernel in #1718
Massively improve test times in heterogeneous atomics tests by @wmaxey in #1719
Consolidate more common functi...

Contributors

alliepiper, trxcllnt, and 19 other contributors

23 Apr 21:30

wmaxey

v2.4.0

1c009d2

v2.4.0

What’s New

We are still hard at work in CCCL on paying down lots of technical debt, improving infrastructure, and various other simplifications as part of the unification of Thrust/CUB/libcu++. In addition to various fixes and documentation improvements, the following notable improvements have been made to Thrust, CUB, and libcudacxx.

Thrust

As part of our kernel consolidation effort, the kernels of the thrust::unique_by_key, thrust::copy_if, and thrust::partition algorithms are now consolidated in CUB. Kernel consolidation achieves two goals. First, it delivers the latest optimizations of CUB algorithms to Thrust users. Second, it introduces support for large problem sizes (64-bit offsets) into Thrust algorithms.

CUB

cub::DeviceSelect::UniqueByKey now supports an equality operator and large problem sizes.
The new cub::DeviceFor family of algorithms goes beyond the conventional cub::DeviceFor::ForEach. For example, cub::DeviceFor::ForEachCopy can provide additional performance benefits from vectorized memory accesses.
Many CUB algorithms now support CUDA graph capture mode.
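To show what an equality operator changes for a unique-by-key operation, here is a sequential host-only sketch. It is not the CUB or Thrust implementation; `unique_by_key_sketch`, `unique_result`, and `same_tens` are hypothetical names, and the sketch assumes the operator is an equivalence relation so that comparing each key with its predecessor identifies runs of equal keys.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type: the surviving keys and their values.
template <class K, class V>
struct unique_result
{
  std::vector<K> keys;
  std::vector<V> values;
};

// Hypothetical host-only helper: for each run of consecutive keys that compare
// equal under `eq`, keeps only the first key/value pair of the run.
template <class K, class V, class Eq>
unique_result<K, V> unique_by_key_sketch(const std::vector<K>& keys,
                                         const std::vector<V>& values,
                                         Eq eq)
{
  assert(keys.size() == values.size());
  unique_result<K, V> out;
  for (std::size_t i = 0; i < keys.size(); ++i)
  {
    if (i == 0 || !eq(keys[i - 1], keys[i])) // a new run starts here
    {
      out.keys.push_back(keys[i]);
      out.values.push_back(values[i]);
    }
  }
  return out;
}

// Example equality operator: keys are "equal" when they share the same tens digit.
struct same_tens
{
  bool operator()(int a, int b) const { return a / 10 == b / 10; }
};

// Usage: 10 and 11 form one run, 25 and 27 another; 30 and 12 stand alone.
void unique_by_key_example()
{
  const std::vector<int> keys{10, 11, 25, 27, 30, 12};
  const std::vector<char> values{'a', 'b', 'c', 'd', 'e', 'f'};
  const unique_result<int, char> r = unique_by_key_sketch(keys, values, same_tens());
  const std::vector<int> expected_keys{10, 25, 30, 12};
  const std::vector<char> expected_values{'a', 'c', 'e', 'f'};
  assert(r.keys == expected_keys);
  assert(r.values == expected_values);
}
```

Note that the trailing key 12 is kept even though it matches the first run under `same_tens`: only consecutive equal keys are collapsed.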

libcudacxx

Added a new cuda::ptx namespace with wrappers for inline PTX instructions
Added cuda::std::complex specializations for the CUDA types bfloat and half

What's Changed

Implement remaining ranges iterator concepts and modernize array by @miscco in #627
Fix C++11 support of recently added tests by @ahendriksen in #651
Update CUDA newest to CTK 12.3 by @jrhemstad in #629
Add cuda::ptx::* namespace by @ahendriksen in #574
The test seems to pass just fine by @miscco in #654
Fixes discard_memory compilation failure for pre-Volta by @elstehle in #637
Reduce benchmarking time by @gevtushenko in #657
Add CCCL_VERSION and script for updating version by @jrhemstad in #652
Fixes compiler error for extended fp type data gen by @elstehle in #666
fixup ___CUDA_VPTX -> _CUDA_VPTX by @wmaxey in #664
Attempt to WAR CUB / RDC / MSVC issue by @gevtushenko in #669
Rework our system header approach to be more error proof by @miscco in #661
Project automation - fix sync action and draft setting step by @jarmak-nv in #625
Fix fallback when checking git repo by @wmaxey in #1085
Currently the verbose option does not work because of a typo in the argument handling by @miscco in #1088
Adds virtual shared memory helper and tests by @elstehle in #619
Add cuda::ptx::st_async by @ahendriksen in #1078
Add cuda::ptx::red_async by @ahendriksen in #1080
Remove libcudacxx symlinks by @wmaxey in #1075
Move PTX tests that missed the symlink PR by @wmaxey in #1098
Fix truncation of constant value by @gevtushenko in #1097
Add cuda::ptx::mbarrier_{try/test}_wait{_parity} by @ahendriksen in #674
Initial CUB/NVRTC support by @gevtushenko in #1081
Fix cuda::ptx::red.async for int32_t types by @ahendriksen in #1102
Fix local test runs with lit by @miscco in #1108
Fix config when only non-CDPv1 arches are enabled. by @alliepiper in #1109
Do not replace the sccache binary for windows by @miscco in #1115
Test cuda graph capture by @gevtushenko in #1112
Fix overflow bug for >2^32 elements in thrust::shuffle by @djns99 in #1074
Introduce CUB transform reduce by @gevtushenko in #1091
Add infrastructure for compile-time CUB tests by @gevtushenko in #1124
Fix GCC6 / FP8 warning by @gevtushenko in #1130
Fix thrust transform reduce bench by @gevtushenko in #1133
Fix ptx.st.async.compile.pass.cpp failing in C++11. by @wmaxey in #1132
Fix _LIBCUDACXX_UNREACHABLE for old MSVC by @miscco in #1114
Allow filtering P0 benchmarks by @gevtushenko in #1135
Update barrier_arrive_tx.md docs by @gonzalobg in #1147
Update std iterators by @miscco in #672
Fix argument name in windows CI by @miscco in #1145
Fix XFAIL condition for subsumption tests by @miscco in #1144
Project Automation - remove draft automation + reduce permissions by @jarmak-nv in #1154
Use rst in block-scope docs by @gevtushenko in #1150
Fix errors when find_package(CCCL) is called twice. by @alliepiper in #1157
Fix icc / cub by @gevtushenko in #1152
Abort testing on unsupported dialect flags by @wmaxey in #1158
Run with latest nvbench by @robertmaynard in #583
Set finer-grain workflow permissions by @jrhemstad in #1163
Port device docs to rst by @gevtushenko in #1160
CI log improvements by @jrhemstad in #621
Setup documentation and corresponding github action by @wmaxey in #1118
Update Docs links in README.md by @wmaxey in #1169
Fix GCC 13 by @gevtushenko in #1175
Add missing exit from run-as-coder by @jrhemstad in #1176
Adds new virtual shared memory facility to DeviceMergeSort by @elstehle in #1117
Add first batch of Catch2 tests for DeviceRadixSort by @alliepiper in #1164
Implement math functions for thrust::complex by @miscco in #1178
Use anchors in matrix.yaml by @jrhemstad in #1193
Ensure the targets that Thrust creates are global. by @robertmaynard in #1182
Fix availability of is_constant_evaluated on old MSVC by @miscco in #1180
Enable std::variant for libcu++ by @miscco in #1076
Implement enable_borrowed_range by @miscco in #1196
Reduce thrust benchmarks noise by @gevtushenko in #1203
Prepare more algorithms by @miscco in #1161
Add icc compiler to CI matrix by @jrhemstad in #1159
Unify handling of dialects by @miscco in #1200
Add argument to build/test scripts for additional cmake options by @jrhemstad in #620
Move definitions of execution space macros into cccl by @miscco in #1199
Adds new virtual shared memory facility to DeviceSelect::UniqueByKey by @elstehle in #1197
Add Catch2 tests for cub::DeviceSegmentedRadixSort by @alliepiper in #1214
Fix the example on README.md by @so298 in #1220
Add missing overloads for thrust::pow by @miscco in #1222
Fix 'nvc++ -stdpar' by @dkolsen-pgi in #1224
Fix examples in reduce docs by @gevtushenko in #1230
Do not benchmark small problem sizes by @gevtushenko in #1243
Implement enable_view by @miscco in #1208
Refactors thrust::unique_by_key to use cub::DeviceSelect::UniqueByKey by @elstehle in #1245
Fix merge conflict from incoming PR by @miscco in #1250
Disable fast-math for ICC by @miscco in #1252
Fix a typo in thrust-config.cmake by @valgur in #1259
Implement ranges::{c}begin and ranges::{c}end by @miscco in #1256
Switch to entropy-based stopping criterion by @gevtushenko in #1280
Fix a sync bug in stream_ref::wait by @PointKernel in #1238
Silence some static asserts in ptx helpers by @miscco in #1257
Restore docs images...

Contributors

alliepiper, robertmaynard, and 23 other contributors

12 Mar 20:22

wmaxey

v2.3.2

64d3a5f

v2.3.2

What's Changed

[BACKPORT]: Silence some static asserts in ptx helpers (#1257) by @miscco in #1284
[BACKPORT]: Ensure that pair is trivially copyable (#1249) by @miscco in #1292
[BACKPORT]: Properly test internal headers (#1258) by @miscco in #1299
[Backport]: Fix errors when find_package(CCCL) is called twice. (#1157) by @miscco in #1298
[BACKPORT] Fix MSVC issues (#1261) by @miscco in #1297
[backport] thrust/mr: fix the case of reusing a block for a smaller alloc. (#1232) by @griwes in #1317
[BACKPORT]: Fix ptx usage to account for PTX ISA availability (#1359) by @miscco in #1421
Create patch 2.3.2 by @wmaxey in #1530

Full Changelog: v2.3.1...v2.3.2

Contributors

griwes, miscco, and wmaxey

23 Apr 21:29

wmaxey

v2.3.1

299eb62

v2.3.1

What's Changed

[BACKPORT]: Fix bug in stream_ref::wait by @miscco in #1283
Revert "Refactor thrust::complex as a struct derived from cuda::std::complex (#454)" by @miscco in #1286
Create patch 2.3.1 by @wmaxey in #1287

Full Changelog: v2.3.0...v2.3.1

Contributors

miscco and wmaxey

28 Feb 18:36

wmaxey

v2.3.0

c4eda1a

CCCL 2.3.0

What’s New

In addition to various fixes and documentation improvements, the following notable improvements have been made to Thrust, CUB, and libcudacxx.

System Headers and Warnings

Users don't want to see warnings from CCCL headers. The typical way to accomplish this with header libraries is to use -isystem. However, this causes problems when using CCCL from GitHub, because those headers will conflict with the CCCL headers in the CTK. Therefore, you should always include CCCL headers via -I.

To achieve the same effect as -isystem, CCCL headers will now use the system_header pragma. For more information, see #527.

TL;DR: You should never see warnings emitted from a CCCL header ever again!
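
As a rough illustration of the mechanism described above, a header can mark itself as a system header so that the compiler suppresses warnings originating inside it, even when it is found via -I. The sketch below is not CCCL's actual implementation: the header name, the include guard, and `demo_truncating_sum` are hypothetical, and the pragma spellings are the ones GCC, Clang, and MSVC document for their own compilers.

```cpp
// my_library.h (hypothetical header, for illustration only)
#ifndef MY_LIBRARY_H
#define MY_LIBRARY_H

// Check __clang__ first because Clang also defines __GNUC__.
#if defined(__clang__)
#  pragma clang system_header
#elif defined(__GNUC__)
#  pragma GCC system_header
#elif defined(_MSC_VER)
#  pragma system_header
#endif

// When this file is pulled in via #include, the compiler treats the rest of
// it as system-header code for diagnostic purposes.
inline int demo_truncating_sum(long long a, long long b)
{
  // This implicit narrowing can trigger -Wconversion when that warning is enabled.
  int result = a + b;
  return result;
}

#endif // MY_LIBRARY_H
```

Note that these pragmas only take effect in included files; compilers typically ignore them (with a notice) when they appear in the main translation unit.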

Linkage Issues

Using CUB and Thrust in shared libraries is a known source of issues. Previously, the solution to these issues consisted of using the THRUST_CUB_WRAPPED_NAMESPACE macro so that different shared libraries have different symbol names. However, this solution has poor discoverability, since the issues present themselves as segmentation faults, hangs, wrong results, etc. As of the 2.3 release, linkage issues are addressed by default without the need for THRUST_CUB_WRAPPED_NAMESPACE. Although the fix is API compatible, it might cause ABI compatibility issues. For more details, see issue #443.
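
To make the symbol-name idea concrete, the sketch below shows how wrapping the same library code in different outer namespaces yields distinct symbols. It is a plain C++ analogy, not how Thrust or CUB are implemented: `DEMO_LIBRARY_BODY`, `demo_algo`, `tuning_value`, `lib_a_wrapped`, and `lib_b_wrapped` are all hypothetical names chosen for illustration.

```cpp
// Hypothetical stand-in for a header-only library. Each expansion defines
// demo_algo::tuning_value() inside whatever namespace encloses it.
#define DEMO_LIBRARY_BODY(value)                \
  namespace demo_algo {                         \
  inline int tuning_value() { return value; }   \
  }

// "Shared library A" and "shared library B" were built against different
// versions of the library. Without an outer namespace, both would define
// ::demo_algo::tuning_value with different bodies, and the linker or loader
// could silently pick either one.
namespace lib_a_wrapped { DEMO_LIBRARY_BODY(1) }
namespace lib_b_wrapped { DEMO_LIBRARY_BODY(2) }

// With wrapping, the two definitions have different fully qualified names,
// so each library keeps calling its own version.
inline int value_seen_by_lib_a() { return lib_a_wrapped::demo_algo::tuning_value(); }
inline int value_seen_by_lib_b() { return lib_b_wrapped::demo_algo::tuning_value(); }
```

With the real macro, the usual pattern is to define it before any Thrust or CUB header is included, for example through a compiler definition such as -DTHRUST_CUB_WRAPPED_NAMESPACE=my_lib, where `my_lib` is a name of your choosing.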

Thrust

thrust::tuple, thrust::pair, and thrust::complex have been replaced with cuda::std alternatives. This can be a breaking change, but should be source compatible.

CUB

Performance of cub::DeviceSelect::UniqueByKey, cub::DeviceScan::ExclusiveSumByKey, and cub::DeviceReduce::ReduceByKey has improved by up to 60% on A100. cub::DeviceSegmentedReduce now supports 64-bit indexing.
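
To see why 64-bit indexing matters, consider a segmented input whose total item count exceeds what a 32-bit signed offset can represent. The arithmetic sketch below does not call CUB; `total_items` and `fits_in_int32` are hypothetical helpers, and the segment sizes are made-up numbers.

```cpp
#include <cstdint>
#include <limits>

// Total number of items across equally sized segments, computed in 64 bits.
constexpr std::int64_t total_items(std::int64_t num_segments, std::int64_t items_per_segment)
{
  return num_segments * items_per_segment;
}

// True if a non-negative count can be stored in a 32-bit signed integer.
constexpr bool fits_in_int32(std::int64_t n)
{
  return n >= 0 && n <= std::numeric_limits<std::int32_t>::max();
}

// 1024 segments of 2^20 items each is 2^30 items: still addressable with int32.
static_assert(fits_in_int32(total_items(1024, 1048576)), "2^30 fits in int32");

// 4096 segments of 2^20 items each is 2^32 items: a 32-bit offset would overflow.
static_assert(!fits_in_int32(total_items(4096, 1048576)), "2^32 does not fit in int32");
```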

libcudacxx

The cuda::ptx namespace and <cuda/ptx> header are now available and provide access to inline PTX functions that expose various async memcpy and barrier intrinsics.
#379 - Added experimental bulk TMA memcpy under <cuda/barrier>

What's Changed

Port cub::DeviceSegmentedReduce tests to catch2 by @elstehle in #303
Branch/2.2.x by @gevtushenko in #305
Tune unique by key on A100 by @gevtushenko in #306
Merge branch/2.2.x to main by @jrhemstad in #308
Add example cmake project by @jrhemstad in #177
Adds catch2 tests for reduce-by-key by @elstehle in #311
Tune scan by key on A100 by @gevtushenko in #325
Replace diag_suppress by nv_diag_suppress in documentation by @ahendriksen in #281
Fix MSVC / CUB tests build by @gevtushenko in #336
gdb pretty printer: handle non-cuda device vectors by @siboehm in #264
Add a nvrtc configuration for libcu++ by @miscco in #202
GH Infra: project automation and issue template fixes by @jarmak-nv in #297
Tune reduce by key on A100 by @gevtushenko in #346
Merge commits from 2.2 branch by @miscco in #350
Fix a shadow warning in thrust's execute_with_dependencies.h by @hageboeck in #334
Assorted fixes for MSVC 2017 by @miscco in #341
[skip-tests] Guard inline variables with _LIBCUDACXX_INLINE_VAR macro by @miscco in #355
Port cub::DeviceScan tests to catch2 by @elstehle in #347
Remove _NOEXCEPT macro in favor of noexcept in libcu++ by @Blonck in #349
Project Automation: add conditional steps due to context errors by @jarmak-nv in #353
Work around strange gcc bug by @miscco in #363
Implement iter_swap CPO by @miscco in #332
Replace default, constexpr, and delete macros by original keywords by @Blonck in #360
Add clang16 devcontainer and CI job by @miscco in #362
[skip-tests] Skip merge conflict from old iter_swap PR by @miscco in #369
[skip-tests] Also skip all CI runs that require a GPU when [skip-tests] is set by @miscco in #370
Remove _LIBCUDACXX_CXX03_LANG macro and all encapsulated code by @Blonck in #368
Remove checks against _LIBCUDACXX_STD_VER < 11 by @Blonck in #375
Use copy-pr-bot by @ajschmidt8 in #381
Implement the permutable concept by @miscco in #367
[NFC] We missed some _NOEXCEPT_ macro uses by @miscco in #371
Implement identity changes for c++20 by @miscco in #383
Hide third party cmake options in our cmake developer builds. by @allisonvacanti in #300
Port cub::DeviceScanByKey tests to Catch2 by @elstehle in #380
Fixes a race in DeviceRunLengthEncode::NonTrivialRuns by @elstehle in #399
Add commit information to the test output by @miscco in #401
Project Automation: Handle PRs opened as non-draft + multiple bug fixes by @jarmak-nv in #387
Project Automation: set Roadmap project value on issue/pr close and Auto-type new issues by @jarmak-nv in #389
Add support for tests that should fail at runtime by @ahendriksen in #418
Port DeviceAdjacentDifference::SubtractRight tests to catch2 by @miscco in #390
Project automation - Fix indentation for continue-on-error by @jarmak-nv in #425
[BUG] Ensure that all headers build on their own by @miscco in #200
Remove util_device.cuh from iterator headers to enable online compilation by @leofang in #412
Fix ci-overview example by @gevtushenko in #428
Port cub::DeviceRunLengthEncode tests to catch2 by @miscco in #411
Add cuda::device::barrier_arrive_tx by @ahendriksen in #358
Fix CubDebug by @gevtushenko in #430
Do not use static member functions to initialize static member variables. by @miscco in #438
Implement the projected helper struct by @miscco in #385
Add PTX wrapping functions for TMA features by @ahendriksen in #379
Clarify docstring for num_items parameter of DeviceSegmentedRadixSort by @HapeMask in #320
Enable lit to determine the compute architectures by @miscco in #447
Add NVRTC_SKIP_KERNEL_RUN tag to compile, but skip running NVRTC test by @ahendriksen in #434
Improve documentation of cuda::barrier by @ahendriksen in #440
Extend thrust::complex unit tests to prepare for upcoming replacement with std::complex by @Blonck in #413
Remove having two install rules for -header-search.cmake by @robertmaynard in #298
Run .devcontainer/launch.sh with bash + add error checking by @wence- in #407
Remove C++03 compatibility from unit tests by @Blonck in #378
[libcu++] Fix use of __ppc64__ by @miscco in #451
Update the README by @jrhemstad in #291
[libcu++] Try to avoid gcc miscompilation issues by @miscco in #452
Consolidate matrix logic into single script/job by @jrhemstad in #361
Implement the indirectly_comparable concept by @miscco in #445
Fix compute matrix dropping trailing zeros by @jrhemstad in #466
Avoid integer promotion warnings with MSVC by @miscco in #460
Implement ranges comparison objects by @miscco in #464
Fix CUB/MSVC/RDC tests by @gevtushenko in #469
Fix Thrust/CUB Linkage Issues by @gevtushenko in #443
Script for Running CUB Benchmarks by @gevtushenko in #472
[skip ci] Add list of CCCL users to README by @jrhemstad in #474
constexpr all the things by @pb-dseifert in #476
Add Gonzalo/Allard to trustees by @jrhemstad in #482
Implement the sortable concept by @miscco in #471
[libcu++] Add _LIBCUDACXX_CUDACC_BELOW_12_3 macro by @gonzalobg in #479
Refactor thrust::complex as a struct derived from cuda::std::complex by @Blonck in #454
Add ci scripts for windows by...

Contributors

robertmaynard, jecs, and 20 other contributors

07 Sep 19:09

jrhemstad

v2.2.0

36f379f

CCCL 2.2.0

(Note that these release notes are not yet finalized. They do not reflect any PRs that were merged to Thrust/CUB/libcudacxx before migrating to the NVIDIA/cccl repo.)

What's Changed

Add axis for docker builds by @raydouglass in #1
Docker: Add support for ICPC and NVC++, install newer CMake, and add curl by @brycelelbach in #4
Update excludes by @raydouglass in #5
Docker: OS and CUDA upgrades, support for additional configurations by @brycelelbach in #9
Docker: Add Thrust/CUB documentation toolchain to Ubuntu docker images by @brycelelbach in #15
Re-enable CentOS images. by @allisonvacanti in #16
Add sccache to dockerfile by @msadang in #17
Update base containers. by @allisonvacanti in #18
Update sccache version by @ajschmidt8 in #19
Build 11.5.1 containers by @ajschmidt8 in #20
Add ops-bot.yaml by @jrhemstad in #80
Monorepo workflow by @jrhemstad in #99
Add devcontainers by @jrhemstad in #105
Update the libcu++ submodule by @miscco in #109
Update libcudacxx again by @miscco in #110
Remove submodules from CI workflow by @jrhemstad in #115
Fix CUB CI by @senior-zero in #114
Fix async scan / counting iterator tests by @senior-zero in #118
Make sccache work locally by @jrhemstad in #113
Fix compilation of thrust and cub by @miscco in #120
Fix segfault in cub::CachingDeviceAllocator by @senior-zero in #119
Initial GH Infra Setup by @jarmak-nv in #23
Visualize variant space coverage by @senior-zero in #125
Fix broken issue templates by @jarmak-nv in #124
Tune scan by key for SM90 by @senior-zero in #121
Update PR template to more explicitly prompt for a linked issue closed by the PR by @jrhemstad in #134
Change component section to more general "area" by @jrhemstad in #132
Try and fix CI for old CTK by @miscco in #116
Fix tuple_cat for std:: qualified types by @miscco in #144
Add ccache to lit invocation by @miscco in #147
Benchmark batched memcpy by @senior-zero in #136
Properly query CMAKE_CUDA_COMPILER_LAUNCHER for ccache support by @miscco in #152
Implement Three-Way Partition Tuning / Benchmark by @senior-zero in #155
Port three-way partition to use Catch2 by @senior-zero in #156
Add gcc-6 to the test matrix by @miscco in #160
Tune reduce / unique by key for SM90 by @senior-zero in #163
Remove unused folders by @miscco in #145
Fix documentation of atomic_ref by @miscco in #164
New iterator traits by @miscco in #158
Improve implementation of destructible by @miscco in #157
Build script improvements by @jrhemstad in #149
Fix icpc / denormals by @senior-zero in #185
Enable tests by @jrhemstad in #167
Monorepo by @jrhemstad in #194
Multi-benchmark tuning by @senior-zero in #208
Fixes universal_vector test failure on CTK 11.1 & gcc-6 by @elstehle in #209
Delete several directories for older CI infra. by @wmaxey in #218
Memory-safe radix sort test by @senior-zero in #222
[FEA] Implement iter_move CPO by @miscco in #197
Build cub benchmarks in build_cub.sh by @jrhemstad in #216
[skip-tests] Do not run tests when skip-tests is part of the latest commit message by @miscco in #224
Factor out build job logic into a "run-as-coder" reusable workflow. by @jrhemstad in #205
Fix instances of 'scan' copy-pasted into reduction documentation by @milesvant in #221
Add clangd to devcontainer by @senior-zero in #225
Add initial CODEOWNERS file by @jrhemstad in #226
Attempt to fix codeowners by @jrhemstad in #231
Make libcudacxx respect CMake options for CUDA archs. by @wmaxey in #235
Optimize Three-Way Partition by @senior-zero in #228
[BUG] Rework how we handle feature test macros by @miscco in #195
Enable use of cudaMemcpyAsync for thrust::copy by @miscco in #211
Enable additional arguments in build_common.sh by @wmaxey in #236
[BUG] Properly uglify all qualifiers in product headers by @miscco in #201
Port cub::Device{Select, Partition} tests to catch2 by @miscco in #229
Fix CUB tests / MSVC 2022 by @senior-zero in #255
Ensure that any CMake re-rooting doesn't break our find_file by @miscco in #257
[BUG] Fix compilation issues with MSVC 2017 by @miscco in #196
Implement iterator concepts by @miscco in #223
Tune Histogram on H100 by @senior-zero in #266
Add WarpExchangeAlgorithm customization for WarpExchange class by @pb-dseifert in #256
[BUG]: Avoid deprecation warning for std::aligned_storage when building with c++23 by @miscco in #258
Port cub::DeviceReduce tests to catch2 by @elstehle in #267
Add support for nvcc-specific matrix. by @jrhemstad in #243
Fix anchor link to cooperative groups in CUDA programming guide by @wence- in #274
Fix BibTeX syntax in CITATION.md [skip-tests] by @wence- in #276
Enforce C++17 for benches by @senior-zero in #275
Project Automation: Move PR and Linked Issues to In Progress by @jarmak-nv in #170
Update to 23.08 devcontainers and CUDA 12.2 by @jrhemstad in #270
[skip-tests] CTK 12.2 tuning image by @senior-zero in #282
Fix single-thread block reduction by @senior-zero in #287
Tune Select and Partition on A100 by @senior-zero in #289
Fix CUB tests / MSVC by @senior-zero in #292
Allow building CUB tests without cuRand by @senior-zero in #250
Fixup to CUB build - s/curand/cudart/ by @wmaxey in #301
Fix OOB in cub::DeviceRunLengthEncode::NonTrivialRuns by @senior-zero in #294
Tune RLE on A100 by @senior-zero in #295
Tune scan on A100 by @senior-zero in #302
Add new CCCL:: CMake targets by @allisonvacanti in #244
Fix cudacc and nvcc mixup. by @wmaxey in #329
[skip-tests] Use builtin for destructible concept on MSVC by @miscco in #333
Fix merge conflict from two inflight PRs by @miscco in #338

New Contributors

@raydouglass made their first contribution in #1
@brycelelbach made their first contribution in #4
@msadang made their first contribution in #17
@wmaxey made their first contribution in #218
@milesvant made their first contribution in #221
@pb-dseifert made their first contribution in #256
@wence- made their first contribution in #274

Full Changelog: https://github.com/NVIDIA/cccl/commits/v2.2.0

Contributors

alliepiper, brycelelbach, and 12 other contributors
