…into feature/selftrain-data-generation
ruiyiw committed Nov 13, 2023
2 parents 8aa47ba + 4515a91 commit d92c41b
Showing 145 changed files with 2,464,984 additions and 4 deletions.
167 changes: 166 additions & 1 deletion .gitignore
@@ -5,6 +5,13 @@ __pycache__
dist
.venv

# Byte-compiled / optimized / DLL files
*.py[cod]
*$py.class

# C extensions
*.so

# Log
*.log
*.log.*
@@ -13,6 +20,7 @@ llm_ft/checkpoints/*
llm_ft/*_checkpoints/*
!**/dummy_conversation.json
!llm_ft/deepspeed_config_s2.json
!llm_rl/data/*.json

# Editor
.idea
@@ -33,4 +41,161 @@ tests/state_of_the_union.txt

# Build
build
!dummy_file
!dummy_file

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock

# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/#use-with-ide
.pdm.toml

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/

./llm_rl/preprocess/GPT4-4_Redis_Easy_No_Slide

llm_rl/*cache/
4 changes: 2 additions & 2 deletions README.md
@@ -2,9 +2,9 @@

We split our overall framework into multiple parts

0. Data Generate --> Input None / Output new data on redis
1. Data Processing --> Output general form of sotopia train and test data
2. Together AI Finetuning --> Input the train and test data / Output model checkpoint
3. LLM Finetuning --> Input the train and test data / Output model checkpoint
4. LLM Deployment --> Input LLM Finetuned model checkpoint / Output Deployable OpenAI type API
5. Eval --> Input model checkpoint / Output evaluation scores
6. Generate --> Input None / Output new data on redis
5. Eval --> Input model checkpoint / Output evaluation scores
20 changes: 20 additions & 0 deletions data_generate/README.md
@@ -0,0 +1,20 @@
# Data Generation

In the first step, we generate EnvProfiles (each including a scenario, social goals, and a relationship restriction) from an inspiring prompt.

In step 2.1, we load the original AgentProfile and RelationshipProfile records into our new Redis database.

In step 2.2, we combine them into combos through conditional sampling (the condition is the relationship restriction).
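
The conditional sampling in step 2.2 can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the repository's implementation: the function name `sample_combos` and the dictionary fields (`id`, `relationship`, `agent_1`, `agent_2`, `level`) are hypothetical stand-ins for the real EnvProfile and RelationshipProfile records.

```python
import random
from typing import Dict, List, Tuple


def sample_combos(
    envs: List[Dict],
    relationships: List[Dict],
    per_env: int = 2,
    seed: int = 0,
) -> List[Tuple[str, str, str]]:
    """Pair each environment with agent pairs whose relationship satisfies
    the environment's relationship restriction.

    All field names are assumptions made for illustration only.
    """
    rng = random.Random(seed)
    combos = []
    for env in envs:
        # Condition: keep only agent pairs whose relationship level matches
        # the restriction attached to this environment.
        eligible = [
            (rel["agent_1"], rel["agent_2"])
            for rel in relationships
            if rel["level"] == env["relationship"]
        ]
        # Sample without replacement, capped by the number of eligible pairs.
        for a1, a2 in rng.sample(eligible, min(per_env, len(eligible))):
            combos.append((env["id"], a1, a2))
    return combos


# Hypothetical toy data: two environments, three known relationships.
envs = [{"id": "env_a", "relationship": 2}, {"id": "env_b", "relationship": 4}]
relationships = [
    {"agent_1": "alice", "agent_2": "bob", "level": 2},
    {"agent_1": "carol", "agent_2": "dave", "level": 2},
    {"agent_1": "erin", "agent_2": "frank", "level": 3},
]
combos = sample_combos(envs, relationships, per_env=1)
```

An environment whose restriction matches no known relationship (`env_b` here) simply yields no combos, which is one possible way to handle unsatisfiable restrictions.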

All EnvProfile (newly generated), AgentProfile (original Sotopia), RelationshipProfile (original Sotopia), and envagentcombo records are stored in the newly created Redis database.

In the third step, we use another version of Redis to convert the database contents to JSON and save the entire database as a file on the local machine.
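
One possible shape for this export step is sketched below. It assumes a client that behaves like a redis-py client with RedisJSON support (`scan_iter` and `json().get`); the function name `export_json_documents`, the key pattern, and the output path are hypothetical and chosen only for illustration.

```python
import json
from typing import Any, Dict


def export_json_documents(client: Any, pattern: str, out_path: str) -> Dict[str, Any]:
    """Read every JSON document whose key matches `pattern` and write all of
    them to a single local JSON file, keyed by Redis key.

    `client` is expected to offer `scan_iter(match=...)` and `json().get(key)`,
    as a redis-py client does when the RedisJSON module is loaded.
    """
    dump: Dict[str, Any] = {}
    for key in client.scan_iter(match=pattern):
        # redis-py returns bytes keys unless decode_responses=True is set.
        name = key.decode() if isinstance(key, bytes) else key
        dump[name] = client.json().get(name)
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(dump, f, indent=2, ensure_ascii=False)
    return dump
```

Against a real server, the client could be created with `redis.Redis(host="localhost", port=6379, decode_responses=True)` from the redis-py package and passed in together with a key pattern that matches the stored profiles.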

In the final step, we convert the data into Ruiyi's format.

# Local Redis Setting
Since redis-server cannot store JSON data directly, the RedisJSON module must be loaded into redis-server to enable this. We therefore run a Docker container based on RedisJSON:

docker run -p 6379:6379 --name redis-stack redis/redis-stack:latest

Link: <https://github.com/RedisJSON/RedisJSON>
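
Once the container is running, a quick round trip can confirm that JSON documents are accepted. The sketch below assumes a redis-py style client exposing `json().set`, `json().get`, and `delete`; the function name `redisjson_roundtrip` and the key `healthcheck:json` are hypothetical.

```python
from typing import Any


def redisjson_roundtrip(client: Any, key: str = "healthcheck:json") -> bool:
    """Write a small JSON document, read it back, and delete it.

    Returns True when the server stores and returns the document unchanged,
    which requires the RedisJSON module to be loaded.
    """
    doc = {"ok": True, "items": [1, 2, 3]}
    client.json().set(key, "$", doc)   # "$" addresses the document root
    fetched = client.json().get(key)   # default path returns the whole document
    client.delete(key)                 # clean up the temporary key
    return fetched == doc
```

With redis-py installed, this could be called as `redisjson_roundtrip(redis.Redis(host="localhost", port=6379))`, matching the port published by the `docker run` command above. On a plain redis-server without the module, the `set` call would be expected to raise an error instead of returning.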