Commit 3dbcab5 ("merge")

1 parent: cd1df18

189 files changed: +7236 -14068 lines


.dockerignore

Lines changed: 1 addition & 0 deletions
@@ -6,6 +6,7 @@
 !setup.cfg
 !Megatron-LM
 !fast_llm
+!fast_llm_external_models
 !examples
 !tools
 !tests

Dockerfile

Lines changed: 2 additions & 0 deletions
@@ -34,6 +34,7 @@ RUN MAX_JOBS=2 pip install --no-build-isolation "causal-conv1d@git+https://gith
 RUN MAX_JOBS=2 pip install --no-build-isolation "mamba_ssm[causal-conv1d]@git+https://github.com/jxiw/varlen_mamba@varlen_mamba"
 # Copy dependency files with universal write permissions for all users.
 COPY --chmod=777 setup.py setup.cfg pyproject.toml ./
+COPY --chmod=777 ./fast_llm_external_models/__init__.py fast_llm_external_models/
 COPY --chmod=777 ./fast_llm/__init__.py fast_llm/
 COPY --chmod=777 ./fast_llm/csrc/ fast_llm/csrc/
 
@@ -45,4 +46,5 @@ COPY --chmod=777 ./Megatron-LM Megatron-LM
 COPY --chmod=777 ./examples examples
 COPY --chmod=777 ./tests tests
 COPY --chmod=777 ./tools tools
+COPY --chmod=777 ./fast_llm_external_models fast_llm_external_models
 COPY --chmod=777 --exclude=./fast_llm/csrc/ ./fast_llm/ fast_llm/

docs/contributing/contributing.md

Lines changed: 2 additions & 2 deletions
@@ -40,15 +40,15 @@ Before diving into code, [open an issue](https://github.com/ServiceNow/Fast-LLM/
 Here are some tips to ensure your pull request gets reviewed and merged promptly:
 
 - **Follow our coding standards**: Stick to our [style guide and conventions](https://servicenow.github.io/Fast-LLM/developers/style-guide) to keep the code clean and consistent.
-- **Write tests**: Verify your changes with unit tests for new features or bug fixes.
+- **Write tests**: Verify your changes with unit tests for new features or bug fixes. See our [testing guide](https://servicenow.github.io/Fast-LLM/contributing/testing) for tips and recommendations on testing.
 - **Test on GPUs and real-world workloads**: Since Fast-LLM is all about training large language models, make sure your changes work smoothly in GPU environments and on typical training setups.
 - **Run benchmarks and performance tests**: Make sure your changes don't slow things down. If there's any impact on performance, provide benchmark results to back it up.
 - **Avoid introducing new issues**: Check that there are no new runtime warnings, type checker errors, linting problems, or unhandled edge cases.
 - **Comment non-trivial code**: Make your code easy to understand for others.
 - **Keep sensitive data out**: Make sure your code or commit messages don't expose private or proprietary information.
 - **Use a clear and descriptive title**: The PR title should summarize the key change or feature introduced. Avoid vague titles like "Fix bug" or "Update code." Start with a keyword like `[feat]`, `[fix]`, `[docs]`, etc. to categorize the change. Reference the issue number if applicable (e.g., `[fix] resolve #123 memory leak in training loop`). This title will become the commit message for the squashed merge.
 - **Use the [PR template](https://github.com/ServiceNow/Fast-LLM/blob/main/.github/PULL_REQUEST_TEMPLATE.md)**: Complete the checklist to make sure everything is in order before hitting submit.
-- **Make sure all tests pass before merging**: Run the tests with `pytest tests/ -v -ra -n 10`, and fix any failures before merging. If possible, please run the tests in an environment with at least 4 GPUs.
+- **Make sure all tests pass before merging**: Run the tests with `pytest tests/ -v -ra -n 10`, and fix any failures before merging. If possible, please run the tests in an environment with at least 4 GPUs. See our [testing guide](https://servicenow.github.io/Fast-LLM/contributing/testing) for more details on testing and debugging.
 
 ## 🆘 Seeking Help or Clarification
 

docs/contributing/testing.md

Lines changed: 24 additions & 1 deletion
@@ -1,7 +1,30 @@
 ---
-title: Writing tests
+title: Writing and running tests
 ---
 
+## Debugging with tests
+
+### Selecting tests
+
+When debugging, it is often advisable to target specific tests that can be executed efficiently. Although pytest allows targeting specific tests or files, complex parameterization and dependencies in our suite often make explicit selection difficult. To address this, several options for test selection are available:
+
+* `--skip-slow`: Executes a subset of fast tests that cover much of the codebase. This option is effective for quickly checking for major regressions prior to executing the comprehensive test suite. Note that parallel testing (`-n`) is typically unnecessary—and may even be counterproductive—with this argument.
+* `--run-extra-slow`: Certain tests are disabled by default due to their lengthy execution times (e.g., complex integration tests) or limited criticality. Use this flag to re-enable them.
+* `--models MODEL0 MODEL1 ...`: Targets one or more specific models within the model testing suite. This is particularly useful for model-specific debugging. For instance, running `pytest tests/models/test_models/test_checkpoint.py -v -ra --models llama` will test checkpointing functionality for the llama model only. Note that parallelization (`-n`) may be unnecessary in this context, as model tests for a given model are only partially distributed due to dependency constraints.
+
+### Monitoring distributed tests
+
+Distributed tests are generally the slowest due to the overhead of starting processes and process groups. To mitigate this, Fast-LLM bundles several tests that execute multiple subtests within a single subprocess call. As bundled calls can generate substantial output and reduce report readability, Fast-LLM captures the output from each subtest and forwards it to an associated test. If necessary, this output capture can be disabled with `--no-distributed-capture`—for instance, if a severe crash hinders output capture, or to disable pytest capture entirely (`-s`). Captured logs are stored in the testing cache directory; please consult individual tests for specific locations.
+
+For example, `test_run_model_distributed[llama]` tries various distributed configurations for the `llama` model, each reported under an associated test such as `test_model_distributed[llama-distributed]`. Should a distributed subtest, say `tp2` (tensor-parallel), fail, `test_run_model_distributed` will log the issue, continue executing the remaining subtests, and ultimately raise an error to mark the bundled test as failed. The associated test, `test_model_distributed[llama-tp2]`, will also fail and display the captured output (retrieved from `/tmp/fast_llm_tests/models/llama/tp2/`), separated by type (stdout, stderr, and traceback) as for a normal test (minus some advanced formatting), but also by rank.
+
+### Other options
+
+* `--show-gpu-memory N`: Monitors GPU memory use and reports the top N tests (default 10). Mainly helps ensure tests don't exceed memory limits; results may not be precise.
+* `--show-skipped`: Many tests skipped for obvious reasons (e.g., marked as slow or extra-slow, or in a skipped model testing group; see below) are removed entirely from the report to reduce clutter. Use this flag to display them.
+
+## Best practices
+
 ## Testing models
 
 [Model integration tests](https://github.com/ServiceNow/Fast-LLM/blob/main/tests/models) are the most important part of our testing suite, ensuring that Fast-LLM works and yields consistent results for a variety of models, training configurations, optimizations, etc.
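
As an illustration of the selection flags documented in this new section, here is a minimal sketch that drives the same runs through pytest's Python entry point. It assumes the custom flags (`--skip-slow`, `--models`) are registered by the repository's `conftest.py` and that it runs from the repository root; `pytest.main` is standard pytest API and returns the usual exit code.

```python
# Minimal sketch: the test-selection flags above, invoked programmatically.
# Assumes Fast-LLM's conftest.py registers --skip-slow and --models.
import sys

import pytest

# Quick regression pass over the fast subset; `-n` is usually unnecessary here.
# Equivalent shell command: pytest tests/ -v -ra --skip-slow
exit_code = pytest.main(["tests/", "-v", "-ra", "--skip-slow"])

# Model-specific debugging (checkpoint tests, llama only) would instead use:
#   pytest tests/models/test_models/test_checkpoint.py -v -ra --models llama
sys.exit(int(exit_code))
```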

docs/developer_guide/conversion.md

Lines changed: 15 additions & 15 deletions
@@ -230,21 +230,21 @@ Continuing our `AwesomeModel` handler example, we define:
 
 ```python
 def _create_weight_converters(self) -> list[WeightConverter]:
-    converters = []
-    # The set of converters may depend on the base model configuration, which is accessible through `self._model.base_model_config`.
-    num_layers = self._model.config.base_model.transformer.num_layers
-
-    # A simple renaming example, for the word embeddings.
-    converters.append(WeightConverter("layers.0.word_embeddings_weight", "model.embed_tokens.weight"))
-
-    # We usually want to loop dynamically over layers
-    for i in range(num_layers):
-        # A `SplitWeightConverter` example, splitting a weight in two.
-        converters.append(SplitWeightConverter(
-            f"layers.{i + 1}.weight",
-            (f"model.layers.{i}.weight_1", f"model.layers.{i}.weight_2"),
-        ))
-    return converters
+    converters = []
+    # The set of converters may depend on the base model configuration, which is accessible through `self._model.base_model_config`.
+    num_layers = len(self._model.config.base_model.decoder)
+
+    # A simple renaming example, for the word embeddings.
+    converters.append(WeightConverter("layers.0.word_embeddings_weight", "model.embed_tokens.weight"))
+
+    # We usually want to loop dynamically over layers
+    for i in range(num_layers):
+        # A `SplitWeightConverter` example, splitting a weight in two.
+        converters.append(SplitWeightConverter(
+            f"layers.{i + 1}.weight",
+            (f"model.layers.{i}.weight_1", f"model.layers.{i}.weight_2"),
+        ))
+    return converters
 ```
 
 And that's it! We're ready to use the new checkpoint format in Fast-LLM.
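
The only functional change in this hunk is the layer count: the new config schema replaces the scalar `transformer.num_layers` with a `decoder` section whose length gives the number of blocks. To make the loop concrete, here is a self-contained sketch of the name mapping it produces for a three-block model; the converter classes below are illustrative stand-ins, not the real ones from Fast-LLM's conversion module.

```python
# Illustrative stand-ins: the real WeightConverter / SplitWeightConverter
# classes live in Fast-LLM's conversion module and carry more machinery.
class WeightConverter:
    def __init__(self, fast_llm_name, export_name):
        self.fast_llm_name, self.export_name = fast_llm_name, export_name


class SplitWeightConverter(WeightConverter):
    pass


num_layers = 3  # stands in for len(self._model.config.base_model.decoder)
converters = [WeightConverter("layers.0.word_embeddings_weight", "model.embed_tokens.weight")]
for i in range(num_layers):
    converters.append(
        SplitWeightConverter(
            f"layers.{i + 1}.weight",
            (f"model.layers.{i}.weight_1", f"model.layers.{i}.weight_2"),
        )
    )

for c in converters:
    print(c.fast_llm_name, "->", c.export_name)
# layers.0.word_embeddings_weight -> model.embed_tokens.weight
# layers.1.weight -> ('model.layers.0.weight_1', 'model.layers.0.weight_2')
# ...
```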

docs/recipes/generate.md

Lines changed: 2 additions & 2 deletions
@@ -21,12 +21,12 @@ Below is a step-by-step example of how to generate text using a Fast-LLM model c
 import huggingface_hub
 from transformers import AutoTokenizer
 from fast_llm.engine.checkpoint.config import CheckpointLoadConfig
-from fast_llm.models.gpt.config import LlamaGPTHuggingfaceCheckpointFormat
+from fast_llm.models.gpt.conversion.config import LlamaCheckpointFormat
 from fast_llm.models.gpt.huggingface import HuggingfaceGPTModelForCausalLM
 
 # Specify model and configuration
 model = "HuggingFaceTB/SmolLM2-135M-Instruct"
-checkpoint_format = LlamaGPTHuggingfaceCheckpointFormat
+checkpoint_format = LlamaCheckpointFormat
 max_new_tokens = 50
 
 # Download model checkpoint from the Hugging Face Hub to a local directory
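
For context on this rename, here is a hedged sketch of how the imported names plausibly fit together later in the recipe; the checkpoint `path` is hypothetical and the exact call site in the full recipe may differ.

```python
# Hedged sketch only; "smollm2_checkpoint" is a hypothetical local directory.
from fast_llm.engine.checkpoint.config import CheckpointLoadConfig
from fast_llm.models.gpt.conversion.config import LlamaCheckpointFormat
from fast_llm.models.gpt.huggingface import HuggingfaceGPTModelForCausalLM

# Load the converted checkpoint with the renamed format class.
model_fl = HuggingfaceGPTModelForCausalLM.from_pretrained(
    CheckpointLoadConfig(path="smollm2_checkpoint", format=LlamaCheckpointFormat)
)
```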

examples/mistral.yaml

Lines changed: 30 additions & 20 deletions
@@ -27,32 +27,42 @@ optimizer:
   beta_2: 0.95
 model:
   base_model:
-    transformer:
+    embeddings_layer:
+      hidden_size: 4096
+      vocab_size: 32000
+      dropout: 0.0
+    decoder:
+      block:
+        mixer:
+          type: attention
+          rotary:
+            type: default
+            theta: 10000
+          heads: 32
+          head_groups: 8
+          head_size: 128
+          add_linear_biases: false
+          window_size: 4096
+          dropout: 0.0
+        mlp:
+          intermediate_size: 14336
+          add_linear_biases: false
+          gated: true
+          activation: silu
+        normalization:
+          type: rms_norm
+          epsilon: 1.0e-05
+        dropout: 0.0
+      num_blocks: 32
+    output_layer:
+      tied_weight: false
     normalization:
       type: rms_norm
       epsilon: 1.0e-05
-      rotary:
-        type: default
-        theta: 10000
-      num_layers: 32
-      hidden_size: 4096
-      ffn_hidden_size: 14336
-      num_attention_heads: 32
-      head_groups: 8
-      add_linear_biases: false
-      gated: true
-      activation_type: silu
-      kv_channels: 128
-      window_size: 4096
-      init_method_std: 0.009021
-      attention_dropout: 0.0
-      hidden_dropout: 0.0
-    vocab_size: 32000
-    tie_word_embeddings: false
   multi_stage:
     zero_stage: 2
   distributed:
-    training_dtype: bf16
+    compute_dtype: bf16
     seed: 984059
 run:
   experiment_dir: mistral_example
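
The renames in this hunk follow a consistent pattern, summarized below as a Python mapping derived from the diff itself (not an exhaustive reference). The old `attention_dropout`/`hidden_dropout` values reappear as the nested `dropout` fields, while `init_method_std` is dropped with no visible replacement in this hunk.

```python
# Old flat key -> new nested key, as visible in the hunk above.
# Paths are relative to `model.base_model` except where noted.
KEY_MAP = {
    "transformer.hidden_size": "embeddings_layer.hidden_size",
    "vocab_size": "embeddings_layer.vocab_size",
    "transformer.num_layers": "decoder.num_blocks",
    "transformer.num_attention_heads": "decoder.block.mixer.heads",
    "transformer.kv_channels": "decoder.block.mixer.head_size",
    "transformer.window_size": "decoder.block.mixer.window_size",
    "transformer.ffn_hidden_size": "decoder.block.mlp.intermediate_size",
    "transformer.activation_type": "decoder.block.mlp.activation",
    "tie_word_embeddings": "output_layer.tied_weight",
    "distributed.training_dtype": "distributed.compute_dtype",  # relative to `model`
}
```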

fast_llm/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-__version__ = "0.2.0"
+__version__ = "0.3.0"

fast_llm/config.py

Lines changed: 38 additions & 41 deletions
@@ -759,57 +759,32 @@ def from_dict(
         return cls._from_dict(default, strict)
 
     @classmethod
-    def from_flat_dict(
-        cls,
-        default: dict[str, typing.Any],
-        strict: bool = True,
-    ) -> typing.Self:
-        # TODO v0.3: Remove flat format
-        return cls._from_dict(default, strict, True)
-
-    @classmethod
-    def _from_dict(
-        cls,
-        default: dict[str, typing.Any],
-        strict: bool = True,
-        flat: bool = False,
-    ) -> typing.Self:
-        # TODO v0.3: Remove flat format
+    def _from_dict(cls, default: dict[str, typing.Any], strict: bool = True) -> typing.Self:
         out_arg_dict = {"_from_dict_check": True}
-
-        # TODO v0.3: Remove backward compatibility fix
-        if "__class__" in default:
-            del default["__class__"]
-
         try:
             actual_cls = cls.get_subclass(default.get("type"))
-            if actual_cls is not None and actual_cls is not cls:
-                return actual_cls._from_dict(default, strict=strict, flat=flat)
         except KeyError:
-            # Postpone error to validation.
-            pass
+            # Try to postpone error to validation.
+            actual_cls = cls
+
+        if actual_cls is not None and actual_cls is not cls:
+            return actual_cls._from_dict(default, strict=strict)
 
         # Do not validate yet in case the root class sets cross-dependencies in validation.
         with NoAutoValidate():
             for name, field in cls.fields():
                 if not field.init or field._field_type != dataclasses._FIELD:  # noqa
                     continue
-                if flat:
-                    if isinstance(field.type, type) and issubclass(field.type, Config):
-                        out_arg_dict[name] = field.type._from_dict(default, False, True)
-                    elif name in default:
-                        out_arg_dict[name] = default.pop(name)
-                else:
-                    # Check for nested configs to instantiate.
-                    try:
-                        value = cls._from_dict_nested(default.pop(name, MISSING), field.type, strict)
-                        if value is not MISSING:
-                            out_arg_dict[name] = value
-                    except FieldTypeError as e:
-                        raise FieldTypeError(
-                            f"Invalid field type `{get_type_name(field.type)}` in class {cls._get_class_name()}: "
-                            + ", ".join(e.args)
-                        )
+                # Check for nested configs to instantiate.
+                try:
+                    value = cls._from_dict_nested(default.pop(name, MISSING), field.type, strict)
+                    if value is not MISSING:
+                        out_arg_dict[name] = value
+                except FieldTypeError as e:
+                    raise FieldTypeError(
+                        f"Invalid field type `{get_type_name(field.type)}` in class {cls._get_class_name()}: "
+                        + ", ".join(e.args)
+                    )
         out = cls(**out_arg_dict)  # noqa
         if strict and default:
             out._unknown_fields = default.copy()
 
@@ -1028,6 +1003,28 @@ def __init__(self, config: ConfigType, *args, **kwargs):
         # Handle multiple inheritance.
         super().__init__(*args, **kwargs)
 
+    def __init_subclass__(cls):
+        # Automatically set `config_class` based on the bound type.
+        # Make sure `ConfigType` is bound and respects class hierarchy.
+        try:
+            config_class = None
+            for base in types.get_original_bases(cls):
+                if hasattr(base, "__origin__") and issubclass(base.__origin__, Configurable):
+                    for arg in base.__args__:
+                        if arg.__name__ == "ConfigType":
+                            if config_class is None:
+                                config_class = arg.__bound__
+                            else:
+                                assert arg.__bound__ is config_class
+            assert config_class is not None
+        except Exception as e:
+            raise TypeError(
+                f"Could not determine the configuration class for the configurable class {cls.__name__}: {e.args}. "
+                "Please make sure to declare in the format "
+                f"`class {cls.__name__}[ConfigType: ConfigClass](BaseConfigurable[ConfigType])`."
+            )
+        cls.config_class = config_class
+
     @property
     def config(self) -> ConfigType:
         return self._config
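
The new `__init_subclass__` hook derives `config_class` from the bound of the `ConfigType` type parameter, so `Configurable` subclasses no longer need to set it by hand. Below is a minimal, self-contained sketch of the mechanism with hypothetical class names (`MyConfig`, `MyRunner`); it requires Python 3.12+ for the PEP 695 class syntax and `types.get_original_bases`, and omits the hierarchy checks of the real hook.

```python
# Hypothetical names; simplified version of the hook in the diff above.
import types
import typing


class Config:
    pass


class MyConfig(Config):
    pass


class Configurable[ConfigType: Config]:
    config_class: typing.ClassVar[type[Config]] = Config

    def __init_subclass__(cls):
        # Find the bound of `ConfigType` on the parameterized base and store
        # it, so subclasses no longer declare `config_class` by hand.
        for base in types.get_original_bases(cls):
            if hasattr(base, "__origin__"):
                for arg in base.__args__:
                    if getattr(arg, "__name__", None) == "ConfigType":
                        cls.config_class = arg.__bound__


class MyRunner[ConfigType: MyConfig](Configurable[ConfigType]):
    pass


assert MyRunner.config_class is MyConfig
```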
