Releases · axonn-ai/axonn

What's Changed

Update README by @bhatele in #16
fix evaluation bug for inter-layer by @siddharth9820 in #18
Support for intra-layer parallelism by @siddharth9820 in #21
add checkpointing and post backward hook support by @siddharth9820 in #24
docs: fix readthedocs.org build issues by @bhatele in #26
fix g_intra print by @zsat in #27
Tests: convert memopt to int before bool by @adityaranjan in #28
Docs: installation and running mnist test by @adityaranjan in #29
add 2D tensor parallelism for FC layers by @siddharth9820 in #30
readme: add slack link by @bhatele in #31
CI/CD tests for intra-layer parallelism by @siddharth9820 in #33
add AxoNN logo by @bhatele in #34
changes to the intra-layer API for the GPT benchmark by @siddharth9820 in #36
add dependencies between workflows by @bhatele in #41
[WIP] ILP Conv Layer support by @prajwal1210 in #38
Intra-layer - Overlap communication in backward pass by @siddharth9820 in #44
[WIP] A tensor parallel API for beginners by @siddharth9820 in #40
first iteration of 3D tensor parallelism by @siddharth9820 in #49
Initialize layers on the GPU by @siddharth9820 in #51
add option to change batch dimension in drop by @siddharth9820 in #52
change outer variables by @siddharth9820 in #53
A context manager to optimize communication by @siddharth9820 in #54
Rebase axonn-cpu to master by @Avuxon in #56
More communication optimizations by @siddharth9820 in #57
Parallel transformers by @jwendlan in #59
Added Depth Tensor Parallelism to Conv Layer by @prajwal1210 in #60
Change overlap for depth tp and do not initialize MPI unless absolutely needed by @siddharth9820 in #62
removed mpi4py dependency by @S-Mahua in #63
adding parallelize context for opt by @jwendlan in #65
Removing the drop and gathers in depth tensor parallelism for the easy API by @siddharth9820 in #66
change parallelize context to use AutoConfig by @siddharth9820 in #67
Bugfix: Initialize grad_input, grad_weight to None by @adityaranjan in #68
docs: fix build issues and add sub-sections by @bhatele in #69
added automatic_parallelism by @S-Mahua in #70
Shard the Dataloader across both depth and data parallel ranks by @siddharth9820 in #74
Make monkeypatching more efficient and change easy API to a single argument by @siddharth9820 in #72
Add API for tensor parallel model checkpointing by @siddharth9820 in #77
Changes to fix issues in IFT by @siddharth9820 in #78
AxonnStrategy for Lightning Fabric backend by @anishbh in #76
initial doc for EasyAPI, Accelerate, and FT example by @jwendlan in #73
User guide Changes by @siddharth9820 in #80
Update advanced.rst by @siddharth9820 in #81
More lightning features by @siddharth9820 in #82
Supporting init_module, load/save checkpoint by @siddharth9820 in #83
make no-grad-sync yield None by @siddharth9820 in #88
create an engine for all things pipelining and deprecate custom mixed precision by @siddharth9820 in #91
Tensor parallel embedding by @siddharth9820 in #93
Improving AxoNN's memory consumption by @siddharth9820 in #95
Correct url of ci tests badge by @siddharth9820 in #99
reorg code and first implementation of the new easy API by @siddharth9820 in #96
Minor changes for Release 0.2.0 by @siddharth9820 in #100