Commit 3dbcab5 ("merge")

1 parent: cd1df18

189 files changed: +7236 -14068 lines


.dockerignore

Lines changed: 1 addition & 0 deletions
@@ -6,6 +6,7 @@
 !setup.cfg
 !Megatron-LM
 !fast_llm
+!fast_llm_external_models
 !examples
 !tools
 !tests

Dockerfile

Lines changed: 2 additions & 0 deletions
@@ -34,6 +34,7 @@ RUN MAX_JOBS=2 pip install --no-build-isolation "causal-conv1d@git+https://gith
 RUN MAX_JOBS=2 pip install --no-build-isolation "mamba_ssm[causal-conv1d]@git+https://github.com/jxiw/varlen_mamba@varlen_mamba"
 # Copy dependency files with universal write permissions for all users.
 COPY --chmod=777 setup.py setup.cfg pyproject.toml ./
+COPY --chmod=777 ./fast_llm_external_models/__init__.py fast_llm_external_models/
 COPY --chmod=777 ./fast_llm/__init__.py fast_llm/
 COPY --chmod=777 ./fast_llm/csrc/ fast_llm/csrc/
 
@@ -45,4 +46,5 @@ COPY --chmod=777 ./Megatron-LM Megatron-LM
 COPY --chmod=777 ./examples examples
 COPY --chmod=777 ./tests tests
 COPY --chmod=777 ./tools tools
+COPY --chmod=777 ./fast_llm_external_models fast_llm_external_models
 COPY --chmod=777 --exclude=./fast_llm/csrc/ ./fast_llm/ fast_llm/

docs/contributing/contributing.md

Lines changed: 2 additions & 2 deletions
@@ -40,15 +40,15 @@ Before diving into code, [open an issue](https://github.com/ServiceNow/Fast-LLM/
 Here are some tips to ensure your pull request gets reviewed and merged promptly:
 
 - **Follow our coding standards**: Stick to our [style guide and conventions](https://servicenow.github.io/Fast-LLM/developers/style-guide) to keep the code clean and consistent.
-- **Write tests**: Verify your changes with unit tests for new features or bug fixes.
+- **Write tests**: Verify your changes with unit tests for new features or bug fixes. See our [testing guide](https://servicenow.github.io/Fast-LLM/contributing/testing) for tips and recommendations on testing.
 - **Test on GPUs and real-world workloads**: Since Fast-LLM is all about training large language models, make sure your changes work smoothly in GPU environments and on typical training setups.
 - **Run benchmarks and performance tests**: Make sure your changes don't slow things down. If there's any impact on performance, provide benchmark results to back it up.
 - **Avoid introducing new issues**: Check that there are no new runtime warnings, type checker errors, linting problems, or unhandled edge cases.
 - **Comment non-trivial code**: Make your code easy to understand for others.
 - **Keep sensitive data out**: Make sure your code or commit messages don't expose private or proprietary information.
 - **Use a clear and descriptive title**: The PR title should summarize the key change or feature introduced. Avoid vague titles like "Fix bug" or "Update code." Start with a keyword like `[feat]`, `[fix]`, `[docs]`, etc. to categorize the change. Reference the issue number if applicable (e.g., `[fix] resolve #123 memory leak in training loop`). This title will become the commit message for the squashed merge.
 - **Use the [PR template](https://github.com/ServiceNow/Fast-LLM/blob/main/.github/PULL_REQUEST_TEMPLATE.md)**: Complete the checklist to make sure everything is in order before hitting submit.
-- **Make sure all tests pass before merging**: Run the tests with `pytest tests/ -v -ra -n 10`, and fix any failures before merging. If possible, please run the tests in an environment with at least 4 GPUs.
+- **Make sure all tests pass before merging**: Run the tests with `pytest tests/ -v -ra -n 10`, and fix any failures before merging. If possible, please run the tests in an environment with at least 4 GPUs. See our [testing guide](https://servicenow.github.io/Fast-LLM/contributing/testing) for more details on testing and debugging.
 
 ## 🆘 Seeking Help or Clarification
 

docs/contributing/testing.md

Lines changed: 24 additions & 1 deletion
@@ -1,7 +1,30 @@
 ---
-title: Writing tests
+title: Writing and running tests
 ---
 
+## Debugging with tests
+
+### Selecting tests
+
+When debugging, it is often advisable to target specific tests that can be executed efficiently. Although pytest allows targeting specific tests or files, complex parameterization and dependencies in our suite often make explicit selection difficult. To address this, several options for test selection are available:
+
+* `--skip-slow`: Executes a subset of fast tests that cover much of the codebase. This option is effective for quickly checking for major regressions prior to executing the comprehensive test suite. Note that parallel testing (`-n`) is typically unnecessary—and may even be counterproductive—with this argument.
+* `--run-extra-slow`: Certain tests are disabled by default due to their lengthy execution times (e.g., complex integration tests) or limited criticality. Use this flag to re-enable them.
+* `--models MODEL0 MODEL1 ...`: Targets one or more specific models within the model testing suite. This is particularly useful for model-specific debugging. For instance, running `pytest tests/models/test_models/test_checkpoint.py -v -ra --models llama` will test checkpointing functionality for the llama model only. Note that parallelization (`-n`) may be unnecessary in this context, as model tests for a given model are only partially distributed due to dependency constraints.
+
+### Monitoring distributed tests
+
+Distributed tests are generally the slowest due to the overhead of starting processes and process groups. To mitigate this, Fast-LLM bundles several tests that execute multiple subtests within a single subprocess call. As bundled calls can generate substantial output and reduce report readability, Fast-LLM captures the output from each subtest and forwards it to an associated test. If necessary, this output capture can be disabled with `--no-distributed-capture`—for instance, if a severe crash hinders output capture, or to disable pytest capture entirely (`-s`). Captured logs are stored in the testing cache directory; please consult individual tests for specific locations.
+
+For example, `test_run_model_distributed[llama]` tries various distributed configurations for the `llama` model, each reported under an associated test such as `test_model_distributed[llama-distributed]`. Should a distributed subtest, say `tp2` (tensor-parallel), fail, `test_run_model_distributed` will log the issue, continue executing the remaining subtests, and ultimately raise an error to mark the bundled test as failed. The associated test, `test_model_distributed[llama-tp2]`, will also fail and display the captured output (retrieved from `/tmp/fast_llm_tests/models/llama/tp2/`), separated by type (stdout, stderr, and traceback) as for a normal test (minus some advanced formatting), but also by rank.
+
+### Other options
+
+* `--show-gpu-memory N`: Monitors GPU memory use and reports the top N tests (default 10). Mainly helps ensure tests don't exceed memory limits; results may not be precise.
+* `--show-skipped`: Many tests skipped for obvious reasons (e.g., marked as slow or extra-slow, or in a skipped model testing group; see below) are removed entirely from the report to reduce clutter. Use this flag to display them.
+
+## Best practices
+
 ## Testing models
 
 [Model integration tests](https://github.com/ServiceNow/Fast-LLM/blob/main/tests/models) are the most important part of our testing suite, ensuring that Fast-LLM works and yields consistent results for a variety of models, training configurations, optimizations, etc.
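
As an illustration of the selection flags documented in this new section, here is a minimal sketch that drives the same runs through pytest's Python entry point. It assumes the custom flags (`--skip-slow`, `--models`) are registered by the repository's `conftest.py` and that it runs from the repository root; `pytest.main` is standard pytest API and returns the usual exit code.

```python
# Minimal sketch: the test-selection flags above, invoked programmatically.
# Assumes Fast-LLM's conftest.py registers --skip-slow and --models.
import sys

import pytest

# Quick regression pass over the fast subset; `-n` is usually unnecessary here.
# Equivalent shell command: pytest tests/ -v -ra --skip-slow
exit_code = pytest.main(["tests/", "-v", "-ra", "--skip-slow"])

# Model-specific debugging (checkpoint tests, llama only) would instead use:
#   pytest tests/models/test_models/test_checkpoint.py -v -ra --models llama
sys.exit(int(exit_code))
```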

docs/developer_guide/conversion.md

Lines changed: 15 additions & 15 deletions
@@ -230,21 +230,21 @@ Continuing our `AwesomeModel` handler example, we define:
 
 ```python
 def _create_weight_converters(self) -> list[WeightConverter]:
-    converters = []
-    # The set of converters may depend on the base model configuration, which is accessible through `self._model.base_model_config`.
-    num_layers = self._model.config.base_model.transformer.num_layers
-
-    # A simple renaming example, for the word embeddings.
-    converters.append(WeightConverter("layers.0.word_embeddings_weight", "model.embed_tokens.weight"))
-
-    # We usually want to loop dynamically over layers
-    for i in range(num_layers):
-        # A `SplitWeightConverter` example, splitting a weight in two.
-        converters.append(SplitWeightConverter(
-            f"layers.{i + 1}.weight",
-            (f"model.layers.{i}.weight_1", f"model.layers.{i}.weight_2"),
-        ))
-    return converters
+    converters = []
+    # The set of converters may depend on the base model configuration, which is accessible through `self._model.base_model_config`.
+    num_layers = len(self._model.config.base_model.decoder)
+
+    # A simple renaming example, for the word embeddings.
+    converters.append(WeightConverter("layers.0.word_embeddings_weight", "model.embed_tokens.weight"))
+
+    # We usually want to loop dynamically over layers
+    for i in range(num_layers):
+        # A `SplitWeightConverter` example, splitting a weight in two.
+        converters.append(SplitWeightConverter(
+            f"layers.{i + 1}.weight",
+            (f"model.layers.{i}.weight_1", f"model.layers.{i}.weight_2"),
+        ))
+    return converters
 ```
 
 And that's it! We're ready to use the new checkpoint format in Fast-LLM.
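
The only functional change in this hunk is the layer count: the new config schema replaces the scalar `transformer.num_layers` with a `decoder` section whose length gives the number of blocks. To make the loop concrete, here is a self-contained sketch of the name mapping it produces for a three-block model; the converter classes below are illustrative stand-ins, not the real ones from Fast-LLM's conversion module.

```python
# Illustrative stand-ins: the real WeightConverter / SplitWeightConverter
# classes live in Fast-LLM's conversion module and carry more machinery.
class WeightConverter:
    def __init__(self, fast_llm_name, export_name):
        self.fast_llm_name, self.export_name = fast_llm_name, export_name


class SplitWeightConverter(WeightConverter):
    pass


num_layers = 3  # stands in for len(self._model.config.base_model.decoder)
converters = [WeightConverter("layers.0.word_embeddings_weight", "model.embed_tokens.weight")]
for i in range(num_layers):
    converters.append(
        SplitWeightConverter(
            f"layers.{i + 1}.weight",
            (f"model.layers.{i}.weight_1", f"model.layers.{i}.weight_2"),
        )
    )

for c in converters:
    print(c.fast_llm_name, "->", c.export_name)
# layers.0.word_embeddings_weight -> model.embed_tokens.weight
# layers.1.weight -> ('model.layers.0.weight_1', 'model.layers.0.weight_2')
# ...
```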

docs/recipes/generate.md

Lines changed: 2 additions & 2 deletions
@@ -21,12 +21,12 @@ Below is a step-by-step example of how to generate text using a Fast-LLM model c
 import huggingface_hub
 from transformers import AutoTokenizer
 from fast_llm.engine.checkpoint.config import CheckpointLoadConfig
-from fast_llm.models.gpt.config import LlamaGPTHuggingfaceCheckpointFormat
+from fast_llm.models.gpt.conversion.config import LlamaCheckpointFormat
 from fast_llm.models.gpt.huggingface import HuggingfaceGPTModelForCausalLM
 
 # Specify model and configuration
 model = "HuggingFaceTB/SmolLM2-135M-Instruct"
-checkpoint_format = LlamaGPTHuggingfaceCheckpointFormat
+checkpoint_format = LlamaCheckpointFormat
 max_new_tokens = 50
 
 # Download model checkpoint from the Hugging Face Hub to a local directory
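
For context on this rename, here is a hedged sketch of how the imported names plausibly fit together later in the recipe; the checkpoint `path` is hypothetical and the exact call site in the full recipe may differ.

```python
# Hedged sketch only; "smollm2_checkpoint" is a hypothetical local directory.
from fast_llm.engine.checkpoint.config import CheckpointLoadConfig
from fast_llm.models.gpt.conversion.config import LlamaCheckpointFormat
from fast_llm.models.gpt.huggingface import HuggingfaceGPTModelForCausalLM

# Load the converted checkpoint with the renamed format class.
model_fl = HuggingfaceGPTModelForCausalLM.from_pretrained(
    CheckpointLoadConfig(path="smollm2_checkpoint", format=LlamaCheckpointFormat)
)
```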

examples/mistral.yaml

Lines changed: 30 additions & 20 deletions
@@ -27,32 +27,42 @@ optimizer:
   beta_2: 0.95
 model:
   base_model:
-    transformer:
+    embeddings_layer:
+      hidden_size: 4096
+      vocab_size: 32000
+      dropout: 0.0
+    decoder:
+      block:
+        mixer:
+          type: attention
+          rotary:
+            type: default
+            theta: 10000
+          heads: 32
+          head_groups: 8
+          head_size: 128
+          add_linear_biases: false
+          window_size: 4096
+          dropout: 0.0
+        mlp:
+          intermediate_size: 14336
+          add_linear_biases: false
+          gated: true
+          activation: silu
+        normalization:
+          type: rms_norm
+          epsilon: 1.0e-05
+        dropout: 0.0
+      num_blocks: 32
+    output_layer:
+      tied_weight: false
     normalization:
       type: rms_norm
       epsilon: 1.0e-05
-      rotary:
-        type: default
-        theta: 10000
-      num_layers: 32
-      hidden_size: 4096
-      ffn_hidden_size: 14336
-      num_attention_heads: 32
-      head_groups: 8
-      add_linear_biases: false
-      gated: true
-      activation_type: silu
-      kv_channels: 128
-      window_size: 4096
-      init_method_std: 0.009021
-      attention_dropout: 0.0
-      hidden_dropout: 0.0
-    vocab_size: 32000
-    tie_word_embeddings: false
   multi_stage:
     zero_stage: 2
   distributed:
-    training_dtype: bf16
+    compute_dtype: bf16
     seed: 984059
 run:
   experiment_dir: mistral_example
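
The renames in this hunk follow a consistent pattern, summarized below as a Python mapping derived from the diff itself (not an exhaustive reference). The old `attention_dropout`/`hidden_dropout` values reappear as the nested `dropout` fields, while `init_method_std` is dropped with no visible replacement in this hunk.

```python
# Old flat key -> new nested key, as visible in the hunk above.
# Paths are relative to `model.base_model` except where noted.
KEY_MAP = {
    "transformer.hidden_size": "embeddings_layer.hidden_size",
    "vocab_size": "embeddings_layer.vocab_size",
    "transformer.num_layers": "decoder.num_blocks",
    "transformer.num_attention_heads": "decoder.block.mixer.heads",
    "transformer.kv_channels": "decoder.block.mixer.head_size",
    "transformer.window_size": "decoder.block.mixer.window_size",
    "transformer.ffn_hidden_size": "decoder.block.mlp.intermediate_size",
    "transformer.activation_type": "decoder.block.mlp.activation",
    "tie_word_embeddings": "output_layer.tied_weight",
    "distributed.training_dtype": "distributed.compute_dtype",  # relative to `model`
}
```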

fast_llm/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-__version__ = "0.2.0"
+__version__ = "0.3.0"

fast_llm/config.py

Lines changed: 38 additions & 41 deletions
@@ -759,57 +759,32 @@ def from_dict(
         return cls._from_dict(default, strict)
 
     @classmethod
-    def from_flat_dict(
-        cls,
-        default: dict[str, typing.Any],
-        strict: bool = True,
-    ) -> typing.Self:
-        # TODO v0.3: Remove flat format
-        return cls._from_dict(default, strict, True)
-
-    @classmethod
-    def _from_dict(
-        cls,
-        default: dict[str, typing.Any],
-        strict: bool = True,
-        flat: bool = False,
-    ) -> typing.Self:
-        # TODO v0.3: Remove flat format
+    def _from_dict(cls, default: dict[str, typing.Any], strict: bool = True) -> typing.Self:
         out_arg_dict = {"_from_dict_check": True}
-
-        # TODO v0.3: Remove backward compatibility fix
-        if "__class__" in default:
-            del default["__class__"]
-
         try:
             actual_cls = cls.get_subclass(default.get("type"))
-            if actual_cls is not None and actual_cls is not cls:
-                return actual_cls._from_dict(default, strict=strict, flat=flat)
         except KeyError:
-            # Postpone error to validation.
-            pass
+            # Try to postpone error to validation.
+            actual_cls = cls
+
+        if actual_cls is not None and actual_cls is not cls:
+            return actual_cls._from_dict(default, strict=strict)
 
         # Do not validate yet in case the root class sets cross-dependencies in validation.
         with NoAutoValidate():
             for name, field in cls.fields():
                 if not field.init or field._field_type != dataclasses._FIELD:  # noqa
                     continue
-                if flat:
-                    if isinstance(field.type, type) and issubclass(field.type, Config):
-                        out_arg_dict[name] = field.type._from_dict(default, False, True)
-                    elif name in default:
-                        out_arg_dict[name] = default.pop(name)
-                else:
-                    # Check for nested configs to instantiate.
-                    try:
-                        value = cls._from_dict_nested(default.pop(name, MISSING), field.type, strict)
-                        if value is not MISSING:
-                            out_arg_dict[name] = value
-                    except FieldTypeError as e:
-                        raise FieldTypeError(
-                            f"Invalid field type `{get_type_name(field.type)}` in class {cls._get_class_name()}: "
-                            + ", ".join(e.args)
-                        )
+                # Check for nested configs to instantiate.
+                try:
+                    value = cls._from_dict_nested(default.pop(name, MISSING), field.type, strict)
+                    if value is not MISSING:
+                        out_arg_dict[name] = value
+                except FieldTypeError as e:
+                    raise FieldTypeError(
+                        f"Invalid field type `{get_type_name(field.type)}` in class {cls._get_class_name()}: "
+                        + ", ".join(e.args)
+                    )
         out = cls(**out_arg_dict)  # noqa
         if strict and default:
             out._unknown_fields = default.copy()
 
@@ -1028,6 +1003,28 @@ def __init__(self, config: ConfigType, *args, **kwargs):
         # Handle multiple inheritance.
         super().__init__(*args, **kwargs)
 
+    def __init_subclass__(cls):
+        # Automatically set `config_class` based on the bound type.
+        # Make sure `ConfigType` is bound and respects class hierarchy.
+        try:
+            config_class = None
+            for base in types.get_original_bases(cls):
+                if hasattr(base, "__origin__") and issubclass(base.__origin__, Configurable):
+                    for arg in base.__args__:
+                        if arg.__name__ == "ConfigType":
+                            if config_class is None:
+                                config_class = arg.__bound__
+                            else:
+                                assert arg.__bound__ is config_class
+            assert config_class is not None
+        except Exception as e:
+            raise TypeError(
+                f"Could not determine the configuration class for the configurable class {cls.__name__}: {e.args}. "
+                "Please make sure to declare in the format "
+                f"`class {cls.__name__}[ConfigType: ConfigClass](BaseConfigurable[ConfigType])`."
+            )
+        cls.config_class = config_class
+
     @property
     def config(self) -> ConfigType:
         return self._config
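
The new `__init_subclass__` hook derives `config_class` from the bound of the `ConfigType` type parameter, so `Configurable` subclasses no longer need to set it by hand. Below is a minimal, self-contained sketch of the mechanism with hypothetical class names (`MyConfig`, `MyRunner`); it requires Python 3.12+ for the PEP 695 class syntax and `types.get_original_bases`, and omits the hierarchy checks of the real hook.

```python
# Hypothetical names; simplified version of the hook in the diff above.
import types
import typing


class Config:
    pass


class MyConfig(Config):
    pass


class Configurable[ConfigType: Config]:
    config_class: typing.ClassVar[type[Config]] = Config

    def __init_subclass__(cls):
        # Find the bound of `ConfigType` on the parameterized base and store
        # it, so subclasses no longer declare `config_class` by hand.
        for base in types.get_original_bases(cls):
            if hasattr(base, "__origin__"):
                for arg in base.__args__:
                    if getattr(arg, "__name__", None) == "ConfigType":
                        cls.config_class = arg.__bound__


class MyRunner[ConfigType: MyConfig](Configurable[ConfigType]):
    pass


assert MyRunner.config_class is MyConfig
```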
