Commit 569381a: fix merge conflicts

epwalsh committed Dec 7, 2023
2 parents 148ca06 + 1dbc346

Showing 438 changed files with 2,979 additions and 1,322,098 deletions.
27 changes: 0 additions & 27 deletions .flake8

This file was deleted.

4 changes: 2 additions & 2 deletions CONTRIBUTING.md → .github/CONTRIBUTING.md
@@ -129,9 +129,9 @@ When you're ready to contribute code to address an open issue, please follow the

     isort .
     black .

-Our CI also uses [`flake8`](https://github.com/allenai/LLM/tree/main/tests) to lint the code base and [`mypy`](http://mypy-lang.org/) for type-checking. You should run both of these next with
+Our CI also uses [`ruff`](https://docs.astral.sh/ruff/) to lint the code base and [`mypy`](http://mypy-lang.org/) for type-checking. You should run both of these next with

-    flake8 .
+    ruff check .

 and
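Since this commit deletes the `.flake8` config file while switching CI to `ruff`, the lint configuration presumably moves into `pyproject.toml`. A hypothetical sketch of what such a `[tool.ruff]` table might look like (the rule selection, line length, and target version here are illustrative, not taken from this repo):

```toml
# Hypothetical ruff configuration standing in for the deleted .flake8 file;
# the repo's actual settings may differ.
[tool.ruff]
line-length = 100
target-version = "py310"

[tool.ruff.lint]
select = ["E", "F"]  # the pycodestyle/pyflakes core that flake8 also checked
```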
2 changes: 1 addition & 1 deletion .github/actions/setup-venv/action.yml
@@ -19,7 +19,7 @@ runs:
     - shell: bash
       run: |
         # Install prerequisites.
-        pip install --upgrade pip setuptools wheel virtualenv
+        pip install --upgrade pip setuptools build wheel virtualenv
     - shell: bash
       run: |
7 changes: 3 additions & 4 deletions .github/workflows/main.yml
@@ -32,7 +32,7 @@ jobs:
       task:
         - name: Lint
           run: |
-            flake8 .
+            ruff check .
   include:
     - python: '3.10'
@@ -49,8 +49,7 @@ jobs:
       task:
         name: Build
         run: |
-          python setup.py check
-          python setup.py bdist_wheel sdist
+          python -m build
     - python: '3.10'
       task:
@@ -175,7 +174,7 @@ jobs:

     - name: Install requirements
       run: |
-        pip install --upgrade pip setuptools wheel
+        pip install --upgrade pip setuptools wheel build
         pip install -r dev-requirements.txt
     - name: Prepare environment
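The switch from `python setup.py bdist_wheel sdist` to `python -m build` in this workflow uses the standard PEP 517 build front end, which reads a `[build-system]` table from `pyproject.toml`. A minimal sketch of such a table, assuming a setuptools backend (the repo's actual table may differ):

```toml
# Hypothetical [build-system] table consumed by `python -m build`: the front
# end installs these requirements into an isolated environment, then asks the
# backend to produce the sdist and wheel under dist/.
[build-system]
requires = ["setuptools>=61", "wheel"]
build-backend = "setuptools.build_meta"
```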
2 changes: 1 addition & 1 deletion Makefile
@@ -11,7 +11,7 @@ TEST_IMAGE = $(shell beaker workspace images $(BEAKER_WORKSPACE) --format=json
 run-checks :
 	isort --check .
 	black --check .
-	flake8 .
+	ruff check .
 	mypy .
 	CUDA_VISIBLE_DEVICES='' pytest -v --color=yes tests/
13 changes: 5 additions & 8 deletions README.md
@@ -5,7 +5,7 @@
 After cloning this repository, first install the latest [PyTorch](https://pytorch.org) according to the official instructions relevant to your environment. Then install the remaining dependencies and code base by running:

 ```
-pip install -e .[dev]
+pip install -e .
 ```
## Running LM pre-training jobs
@@ -31,25 +31,22 @@ torchrun --nproc-per-node=8 scripts/train.py configs/c4-tiny.yaml \

 #### Running on Cirrascale via [beaker-gantry](https://github.com/allenai/beaker-gantry)

-Check the script at [`scripts/olmo-small-ablation-on-gantry.sh`](scripts/beaker/olmo-small-ablation-on-gantry.sh) for an example on how to run a training job on Cirrascale.
-
-After installing `beaker-gantry`, you can launch a training job like this:
+Check the script at [`scripts/beaker/olmo-small-ablation-on-gantry.sh`](scripts/beaker/olmo-small-ablation-on-gantry.sh) for an example on how to run a training job on Cirrascale. Using that script, you can launch a training job like this:

 ```bash
 CONFIG_PATH=configs/choose_a_config.yml \
 LOAD_PATH=/optional/path/to/checkpoint/ \
 bash scripts/olmo-small-ablation-on-gantry.sh
 ```

-if `CONFIG_PATH` is not specified, the default config is `configs/olmo-small-ablation.yaml`;
-if `LOAD_PATH` is not specified, the training will start from scratch.
+If `CONFIG_PATH` is not specified, the default config is `configs/olmo-small-ablation.yaml`. If `LOAD_PATH` is not specified, the training will start from scratch.

 #### Running on LUMI via Slurm

 First read our [LUMI](docs/LUMI.md) documentation, but submitting a new job essentially just boils down to running this:

 ```bash
-sbatch scripts/c4-tiny-on-lumi.sh
+sbatch scripts/lumi/c4-small-on-lumi.sh
 ```

### Restarting a training job from a checkpoint
@@ -62,7 +59,7 @@ There are also symlinks for the latest checkpoints in the form of `latest` and `

 Sharded checkpoints are the default type of checkpoint that's saved during training since these are the fastest, but you can also save unsharded checkpoints by setting `--save_interval_unsharded [INT]`.

 If you plan to restart a training run using a *different* world size, you can only restart from an *unsharded* checkpoint.
-However, you can convert a sharded checkpoint into an unsharded checkpoint by launching the script [scripts/unshard_checkpoint.sh](./scripts/unshard_checkpoint.sh) in the same way you launched the training script. Note that this needs to be launched with the exact same world size as when the *sharded* checkpoint was saved.
+However, you can convert a sharded checkpoint into an unsharded checkpoint by launching the script [scripts/unshard.sh](./scripts/unshard.sh) in the same way you launched the training script. Note that this needs to be launched with the exact same world size as when the *sharded* checkpoint was saved.
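As an illustration of the unsharding step described above, here is a dry-run sketch that prints the launch command rather than executing it. The checkpoint path and script arguments are hypothetical; what is grounded in the README is that the script is launched the same way as training (via `torchrun`) and that the process count must match the world size the sharded checkpoint was saved with.

```shell
# Dry-run: print the unshard command instead of running it.
NPROC_PER_NODE=8                        # must equal the original world size
SHARDED_CKPT=checkpoints/run1/step1000  # hypothetical sharded checkpoint dir
echo "torchrun --nproc-per-node=${NPROC_PER_NODE} scripts/unshard.sh ${SHARDED_CKPT}"
```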

## Finding official runs and checkpoints

File renamed without changes.