feat: Self-Rewarding Algorithm with TRT Support #321

trias702 · 2024-09-26T22:14:33Z

What does this PR do?

Adds support for the Self-Rewarding and Meta-Rewarding algorithms from the following two papers:

https://arxiv.org/abs/2401.10020
https://arxiv.org/abs/2407.19594
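
For readers unfamiliar with the first paper, here is a minimal sketch of its preference-pair step: the model samples several candidate responses per prompt, scores them itself as a judge, and keeps the highest- and lowest-scoring responses as a chosen/rejected pair for preference training. This is not the code in this PR. The names `build_preference_pair`, `generate`, `judge`, and `num_candidates` are hypothetical and used only for illustration.

```python
from typing import Callable, Optional, Tuple


def build_preference_pair(
    prompt: str,
    generate: Callable[[str], str],      # hypothetical: samples one response for a prompt
    judge: Callable[[str, str], float],  # hypothetical: the same model scoring (prompt, response)
    num_candidates: int = 4,
) -> Optional[Tuple[str, str]]:
    """Return (chosen, rejected) for one prompt, or None if no preference can be formed."""
    # Sample several candidate responses from the current model.
    candidates = [generate(prompt) for _ in range(num_candidates)]
    # Score each candidate with the model acting as its own judge.
    scores = [judge(prompt, response) for response in candidates]

    best = max(range(num_candidates), key=scores.__getitem__)
    worst = min(range(num_candidates), key=scores.__getitem__)

    # If every candidate received the same score there is no signal, so skip the prompt.
    if scores[best] == scores[worst]:
        return None
    return candidates[best], candidates[worst]
```

The resulting pairs would then feed a preference-optimization step such as DPO. The tutorial referenced under Usage describes the actual configuration.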

Changelog

Please update the CHANGELOG.md under next version with high level changes in this PR.

Usage

Please see the new tutorial document at: docs/user-guide/self_rewarding.rst

Before your PR is "Ready for review"

Pre checks:

Make sure you have read and followed the Contributor guidelines
Did you write any new necessary tests?
Did you add or update any necessary documentation? Make sure to also update the NeMo Framework User Guide which contains the tutorials

Checklist when contributing a new algorithm

Does the trainer resume and restore all model states?
Does the trainer support all parallelism techniques (PP, TP, DP)?
Does the trainer support max_steps=-1 and validation?
Does the trainer only call APIs defined in alignable_interface.py?
Does the trainer have proper logging?

Additional Information

Related to # (issue)


gshennvm added 30 commits March 20, 2024 17:17

add logging

e7ea083

Signed-off-by: Gerald Shen <[email protected]>

fix critic client

6cd8923

Signed-off-by: Gerald Shen <[email protected]>

fix

887874d

Signed-off-by: Gerald Shen <[email protected]>

fix

6e2ca2d

Signed-off-by: Gerald Shen <[email protected]>

fix

bacd786

Signed-off-by: Gerald Shen <[email protected]>

fix

17b04ca

Signed-off-by: Gerald Shen <[email protected]>

fix typo

9fee58b

Signed-off-by: Gerald Shen <[email protected]>

better train timing

61ba204

Signed-off-by: Gerald Shen <[email protected]>

remove prints

c9dad2a

Signed-off-by: Gerald Shen <[email protected]>

with timing

e5f68e2

Signed-off-by: Gerald Shen <[email protected]>

delete unused func

74791ae

Signed-off-by: Gerald Shen <[email protected]>

add critic logging

e40ebd6

Signed-off-by: Gerald Shen <[email protected]>

add

72ba6c6

Signed-off-by: Gerald Shen <[email protected]>

cleanup

bfb61e4

Signed-off-by: Gerald Shen <[email protected]>

update

21206c5

Signed-off-by: Gerald Shen <[email protected]>

fix

148acf4

Signed-off-by: Gerald Shen <[email protected]>

fix bug

d7c9990

Signed-off-by: Gerald Shen <[email protected]>

fix bug

537d6e5

Signed-off-by: Gerald Shen <[email protected]>

test

47400ba

Signed-off-by: Gerald Shen <[email protected]>

fix bug

d7b2b23

Signed-off-by: Gerald Shen <[email protected]>

fix

ce76226

Signed-off-by: Gerald Shen <[email protected]>

add

8edf534

Signed-off-by: Gerald Shen <[email protected]>

fix

6379a2e

Signed-off-by: Gerald Shen <[email protected]>

fix again

eadae31

Signed-off-by: Gerald Shen <[email protected]>

fix

e2b97d9

Signed-off-by: Gerald Shen <[email protected]>

fix mean

d9bdf7c

Signed-off-by: Gerald Shen <[email protected]>

fix

1c7d215

Signed-off-by: Gerald Shen <[email protected]>

add debug

3638301

Signed-off-by: Gerald Shen <[email protected]>

fix

4cca85f

Signed-off-by: Gerald Shen <[email protected]>

add data iter for VP

1b19bdd

Signed-off-by: Gerald Shen <[email protected]>

trias702 added 5 commits September 24, 2024 16:08

Moved the templates to the conf file instead

acd4d07

Signed-off-by: Daniel Egert <[email protected]>

Merged to latest aligner main

a249a44

Signed-off-by: Daniel Egert <[email protected]>

Merge remote-tracking branch 'origin' into degert/self-rewarding-trt

ce687ed

Fixes to yaml tab issues

005702b

Signed-off-by: Daniel Egert <[email protected]>

Far enough along that I can file the PR

a35e359

Signed-off-by: Daniel Egert <[email protected]>

trias702 added the Algorithms label Sep 26, 2024

trias702 requested a review from odelalleau September 26, 2024 22:14

github-actions bot added the documentation and Utils labels Sep 26, 2024

pre-commit-ci bot and others added 11 commits September 26, 2024 22:14

[pre-commit.ci] auto fixes from pre-commit.com hooks

fc1def7

for more information, see https://pre-commit.ci

Added self-rewarding rst doc

eab62af

Signed-off-by: Daniel Egert <[email protected]>

Merge branch 'degert/self-rewarding-trt' of https://github.com/NVIDIA…

8805db5

…/NeMo-Aligner into degert/self-rewarding-trt

Fixed bad merge in utils.py

fb61b86

Signed-off-by: Daniel Egert <[email protected]>

Removed trt patch file

a7817b3

Signed-off-by: Daniel Egert <[email protected]>

Fixed ordering of clamp bug

71029d9

Signed-off-by: Daniel Egert <[email protected]>

Merge remote-tracking branch 'origin' into degert/self-rewarding-trt

9051695

[pre-commit.ci] auto fixes from pre-commit.com hooks

35f8ee4

for more information, see https://pre-commit.ci

Merge branch 'degert/self-rewarding-trt' of https://github.com/NVIDIA…

ab2d3ce

…/NeMo-Aligner into degert/self-rewarding-trt

Merge remote-tracking branch 'origin' into degert/self-rewarding-trt

294afcd

Added CI test for SPIN

c465f30

Signed-off-by: Daniel Egert <[email protected]>

github-actions bot added the CI label Oct 26, 2024

trias702 added the Run CICD (Set + un-set to retrigger) label Oct 26, 2024

ko3n1g added and removed the Run CICD (Set + un-set to retrigger) label Oct 28, 2024

trias702 changed the title ~~Self-Rewarding Algorithm with TRT Support~~ feat: Self-Rewarding Algorithm with TRT Support Oct 28, 2024

trias702 and others added 5 commits October 29, 2024 13:00

Added CI tests for self-rewarding

2092084

Signed-off-by: Daniel Egert <[email protected]>

[pre-commit.ci] auto fixes from pre-commit.com hooks

99c7bcd

for more information, see https://pre-commit.ci

Signed-off-by: NeMo-Aligner CI <[email protected]>

Added CI tests for generation

4e435f7

Signed-off-by: Daniel Egert <[email protected]>

Merged in v13 TRT changes

3d429fb

Signed-off-by: Daniel Egert <[email protected]>

Removed max_input_tokens from TRT algos

b460a14

Signed-off-by: Daniel Egert <[email protected]>
