This PR contains the following updates:
- isort: 5.11.5 -> 5.12.0
- paddlepaddle: 2.5.0rc1 -> 2.5.1
- 7.3.1 -> 7.4.2
Release Notes
pycqa/isort (isort)
v5.12.0
Compare Source
paddlepaddle/paddle (paddlepaddle)
v2.5.1
Compare Source
v2.5.0: PaddlePaddle 2.5.0 Release Note
Compare Source
PaddlePaddle 2.5.0 Release Note
1. Important Updates
Clear semantics are now defined for tensors of shape [1,] and for tensors of shape [].
2. Incompatible Upgrades
- The loss output during model training is now a 0-dim tensor, so code such as loss.numpy()[0] is no longer needed: loss.numpy() alone retrieves or prints the loss, which is shorter, clearer, and consistent with common practice (see the sketch after this list).
- Full retirement of the paddle.fluid API. As announced in the previous release, 1,116 paddle.fluid APIs and related internal interfaces are removed in this release; the small number of remaining internal interfaces will be fully cleaned up in the next release. The fluid APIs are legacy APIs that PaddlePaddle 2.0 had planned to remove but postponed for compatibility reasons, so this cleanup does not affect programs developed against PaddlePaddle 2.0, and the PaddlePaddle API surface becomes simpler and easier to understand.
- Deprecated and removed the paddle.static.ParallelExecutor and paddle.static.CompiledProgram().with_data_parallel() interfaces, because they only support single-machine multi-GPU training (not multi-machine multi-GPU) and their underlying execution performance is poor. The recommended approach is multi-process multi-GPU training, i.e. the paddle.distributed.launch interface, for data-parallel distributed training. This change only affects static graph mode; dynamic graph and dynamic-to-static training are unaffected. If you used the deprecated interfaces, please update your model code following the Data Parallel documentation. #50351,#50501,#51240,#51701,#51616,#51369,#52671
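A minimal sketch of the 0-dim loss change, assuming a toy paddle.nn.Linear model and MSE loss that are not part of the release note:

```python
import paddle

# Before 2.5, a reduced loss had shape [1], so code indexed into it:
#     loss_value = loss.numpy()[0]
# From 2.5 on, the reduced loss is a 0-dim tensor with shape [],
# and loss.numpy() already yields the scalar value.

x = paddle.rand([4, 3])
y = paddle.rand([4, 1])
linear = paddle.nn.Linear(3, 1)          # hypothetical toy model
loss = paddle.nn.functional.mse_loss(linear(x), y)

print(loss.shape)        # [] -- 0-dim tensor
print(float(loss))       # scalar, no [0] indexing needed
print(loss.numpy())      # 0-dim numpy array holding the same value
```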
3. Training Framework (including Distributed)
Python API
APIs supporting 0-dim tensor
- 286 APIs, including paddle.reshape, paddle.trace, and paddle.linalg.norm. #53208, #53592, #47074, #53186, #47677, #49357, #50237, #46555, #47219, #47501, #47858, #47961, #48058, #48007, #49755, #51024, #51566, #51899, #49813, #47812, #47849, #47251, #53125, #53828, #51265, #47689, #48452, #49072, #48638, #49175, #49279, #50857, #49805, #47734, #45992, #49616, #49959, #50536, #49544, #49842, #46909, #49361, #50169, #48314, #48735, #49122, #49122, #49177, #49501, #49562, #49340, #49550, #49596, #49730, #49667, #49692, #49854, #49845, #49803, #49889, #49904, #49518, #49884, #49880, #49862, #49921, #49260, #49929, #49570, #49882, #50213, #49780, #50271, #50289, #50293, #49735, #50433, #49847, #50635, #50950, #50947, #49460, #53087, #51687, #52185, #54649
- 90 APIs, including paddle.sum, paddle.min/max, and paddle.any/all. #52891, #52861, #52775, #52850, #52843, #52857, #51721, #53051, #53192, #52739, #52741, #53175, #51889, #53199, #53242, #53421
New APIs
- paddle.sparse.reshape, paddle.sparse.sum, paddle.sparse.slice, and others. #46694, #51513, #53794, #51406
- paddle.optimizer.LBFGS, paddle.index_put, paddle.logaddexp, and others (see the sketch after this list). #53314, #51912, #52886, #50843, #47282, #52284
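A hedged sketch exercising two of the APIs named above (paddle.logaddexp, paddle.index_put); the exact signatures are assumed to mirror their NumPy/PyTorch counterparts and are not taken from the release note:

```python
import paddle

# paddle.logaddexp: numerically stable log(exp(x) + exp(y))
# (signature assumed to mirror numpy.logaddexp)
x = paddle.to_tensor([-100.0, 0.0, 100.0])
y = paddle.to_tensor([-100.0, 0.0, 100.0])
print(paddle.logaddexp(x, y))

# 0-dim tensor support: full reductions now return shape [] instead of [1]
s = paddle.sum(x)
print(s.shape)       # []

# paddle.index_put: write `value` into `x` at the given indices
# (argument order assumed to follow the PyTorch counterpart)
x = paddle.zeros([3, 3])
indices = (paddle.to_tensor([0, 2]), paddle.to_tensor([1, 1]))
value = paddle.to_tensor([10.0, 20.0])
print(paddle.index_put(x, indices, value))
```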
Dynamic Graph
New features
Function improvements
Bug fixes
Performance optimization
Static Graph
The new static graph executor is fully launched
The new static graph executor implements a number of functional and performance optimizations and completes the unification and replacement of the previous multiple legacy executors. It is now the Python-side entry point for static graph single-GPU and distributed training, and the default execution engine for backends such as dynamic-to-static, control flow, and CINN. This greatly improves the framework's scheduling performance, makes the functional architecture clearer, and significantly strengthens secondary-development capability. #45913,#46025,#48911,#50239,#45696,#46092,#48158,#51389,#49708,#49275,#48789,#49939,#51149,#52652
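The new executor is picked up transparently by existing user code; a minimal static-graph sketch, with a hypothetical toy network, that runs on the default engine described above:

```python
import numpy as np
import paddle

paddle.enable_static()

# Build a trivial static graph: y = relu(fc(x))
main_prog = paddle.static.Program()
startup_prog = paddle.static.Program()
with paddle.static.program_guard(main_prog, startup_prog):
    x = paddle.static.data(name="x", shape=[None, 4], dtype="float32")
    y = paddle.nn.functional.relu(paddle.static.nn.fc(x, size=2))

# exe.run() is dispatched to the new standalone executor by default in 2.5
exe = paddle.static.Executor(paddle.CPUPlace())
exe.run(startup_prog)
out, = exe.run(main_prog,
               feed={"x": np.random.rand(8, 4).astype("float32")},
               fetch_list=[y])
print(out.shape)  # (8, 2)
```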
Operator Library
Enhanced custom operators and related features
This includes: brand-new support for a custom extension mechanism that binds C++ extension functions to the Python side, further improving the framework's secondary-development capability; extended support for the custom operator mechanism on custom hardware, meeting hardware vendors' need to implement operators not already available in Paddle; extended support for advanced mechanisms in custom operators, such as inplace, vector<Tensor> outputs, and optional<Tensor> inputs; optimized scheduling performance of custom operators in dynamic graph mode, improving operators with multiple input parameters by 25.4%; and new common operators and APIs for the custom-operator Tensor extension, with support for chained calls, simplifying the code. The operator kernel selection mechanism was optimized; some operator kernels received logic improvements, broader data-type support, and performance optimizations; 100+ XPU kernels were added or improved; and 170+ bugs were fixed in total. #49222, #51773, #51923, #53080, #50731, #50563, #50840, #50983, #51713, #48733, #50558, #50764, #51973, #52216, #51027, #50745, #50756, #50886, #50813, #50869, #51085, #51646, #51620, #51844, #52421, #52872, #52597, #50582, #52114, #52915, #50928, #48272, #48702, #52191, #52191, #47374, #47375, #47378, #54126, #47638, #47661, #50606, #53528, #50599, #51727, #50825, #50773, #50979, #53336, #53555, #53716, #53753, #53981, #53977, #53980, #54043, #54066, #52866, #53043, #53325, #54323, #54367, #51353, #53749, #50013, #47570, #50997, #51241, #49537
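On the Python side, custom C++ operators are usually compiled and loaded with paddle.utils.cpp_extension; a sketch of that glue, where relu_op.cc and custom_relu are hypothetical placeholders rather than anything shipped in this release:

```python
# Hypothetical sketch: JIT-compile a C++ custom operator and call it
# from dynamic graph mode. "relu_op.cc" / "custom_relu" are placeholders.
import paddle
from paddle.utils.cpp_extension import load

custom_ops = load(
    name="custom_jit_ops",        # name of the generated Python module
    sources=["relu_op.cc"],       # C++ sources registering PD_BUILD_OP(custom_relu)
)

x = paddle.randn([4, 8])
y = custom_ops.custom_relu(x)     # the bound C++ kernel, callable like a normal API
print(y.shape)
```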
Unified operator architecture
Specifically: the remaining 350+ operator kernels of the original operator system were all unified into the PHI operator library, and operator definitions in the original system were unified into the PHI definition form (operators defined via YAML configuration), improving architectural consistency and lowering the cost of understanding framework development; all Fluid header dependencies of the PHI operator library were decoupled and independently compiled into a dynamic link library, providing a lighter-weight way to reuse the operator library for secondary development of the framework; and non-conforming operators and operator kernels in the PaddlePaddle framework continued to be standardized, making them easier for developers to understand and reducing the cost of hardware integration.
#47856, #49328, #49138, #52014, #52044, #52116, #52486, #52101, #52882, #53003, #53034, #51914, #49116, #52626, #52878, #52879, #52880, #52875, #51600, #51601, #51590, #51887, #51891, #52036, #52130, #52134, #51951, #51886, #52274, #52263, #51913, #52145, #52347, #52370, #52437, #52424, #52231, #52522, #52529, #52802, #52799, #52855, #52711, #52940, #53309, #47817, #48001, #48063, #48049, #48168, #48415, #48696, #48970, #50183, #50407, #50498, #50419, #50282, #50870, #50911, #50865, #51288, #53735, #47248, #47787, #52202,
#47579, #49444, #45772, #51264, #51634, #51631, #47385, #46342, #47510, #47532, #47702, #47860, #49470, #50358, #49121, #50190, #52374, #52372, #52375, #52371
Dynamic-to-Static plus Composite Operators
New features
- A new parameter for dynamic-to-static conversion can be specified as CINN or None; when it is set to CINN, the CINN compiler is used to accelerate training and inference (see the sketch after this list). #52596
- OpTest adds composite testing to guarantee operator accuracy; added unit tests for elementwise-type basic operators; added a CINN unit test for batch_norm. #50509, #50807, #52815
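A hedged sketch of the CINN option described above, assuming it is exposed as a backend argument of paddle.jit.to_static (check the 2.5 docs; a CINN-enabled build is assumed):

```python
import paddle

def forward(x):
    return paddle.nn.functional.relu(x * 2.0 + 1.0)

# backend can be "CINN" or None (the default); with "CINN" the converted
# program is compiled by the CINN compiler. Requires a CINN-enabled build.
static_forward = paddle.jit.to_static(forward, backend="CINN")

x = paddle.randn([4, 16])
print(static_forward(x).shape)
```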
Function improvements
- Adjusted grad var name handling to fix errors in higher-order gradient computation. #53250
- Improved the print function conversion so that Tensor arguments can be printed during the network-building stage; upgraded the parameter collection mechanism. #48672, #50336
Bug fixes
Performance optimization
Distributed Training
Dynamic graph distributed training
Auto parallel
Parameter server
Configuration
📅 Schedule: Branch creation - "before 4am on Monday" (UTC), Automerge - At any time (no schedule defined).
🚦 Automerge: Enabled.
♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.
👻 Immortal: This PR will be recreated if closed unmerged. Get config help if that's undesired.
This PR has been generated by Mend Renovate. View repository job log here.