**docs/contributing/contributing.md** (+2 −2)
@@ -40,15 +40,15 @@

Here are some tips to ensure your pull request gets reviewed and merged promptly:

- **Follow our coding standards**: Stick to our [style guide and conventions](https://servicenow.github.io/Fast-LLM/developers/style-guide) to keep the code clean and consistent.
- **Write tests**: Verify your changes with unit tests for new features or bug fixes. See our [testing guide](https://servicenow.github.io/Fast-LLM/contributing/testing) for tips and recommendations on testing.
- **Test on GPUs and real-world workloads**: Since Fast-LLM is all about training large language models, make sure your changes work smoothly in GPU environments and on typical training setups.
- **Run benchmarks and performance tests**: Make sure your changes don't slow things down. If there's any impact on performance, provide benchmark results to back it up.
- **Avoid introducing new issues**: Check that there are no new runtime warnings, type checker errors, linting problems, or unhandled edge cases.
- **Comment non-trivial code**: Make your code easy to understand for others.
- **Keep sensitive data out**: Make sure your code or commit messages don't expose private or proprietary information.
- **Use a clear and descriptive title**: The PR title should summarize the key change or feature introduced. Avoid vague titles like "Fix bug" or "Update code." Start with a keyword like `[feat]`, `[fix]`, `[docs]`, etc. to categorize the change. Reference the issue number if applicable (e.g., `[fix] resolve #123 memory leak in training loop`). This title will become the commit message for the squashed merge.
- **Use the [PR template](https://github.com/ServiceNow/Fast-LLM/blob/main/.github/PULL_REQUEST_TEMPLATE.md)**: Complete the checklist to make sure everything is in order before hitting submit.
- **Make sure all tests pass before merging**: Run the tests with `pytest tests/ -v -ra -n 10`, and fix any failures before merging. If possible, please run the tests in an environment with at least 4 GPUs. See our [testing guide](https://servicenow.github.io/Fast-LLM/contributing/testing) for more details on testing and debugging.
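
The pre-merge checks above can be combined into a short routine. This is an illustrative sketch, assuming the commands are run from the repository root with the test dependencies installed; the `--skip-slow` flag is described in the testing guide:

```shell
# Quick regression check first: a fast subset covering much of the
# codebase, where parallelism (-n) is usually unnecessary.
pytest tests/ -v -ra --skip-slow

# Then the full suite, parallelized across 10 workers,
# ideally on a machine with at least 4 GPUs.
pytest tests/ -v -ra -n 10
```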
**docs/contributing/testing.md** (+24 −1)

@@ -1,7 +1,30 @@
---
title: Writing and running tests
---

## Debugging with tests

### Selecting tests

When debugging, it is often advisable to target specific tests that can be executed efficiently. Although Pytest allows targeting specific tests or files, complex parameterization and dependencies in our suite often make explicit selection difficult. To address this, several options for test selection are available:

* `--skip-slow`: Runs a subset of fast tests that cover much of the codebase. This is useful for quickly checking for major regressions before running the full test suite. Note that parallel testing (`-n`) is typically unnecessary, and may even be counterproductive, with this option.
* `--run-extra-slow`: Some tests are disabled by default because of their long execution times (e.g., complex integration tests) or limited criticality. Use this flag to re-enable them.
* `--models MODEL0 MODEL1 ...`: Targets one or more specific models in the model testing suite, which is particularly useful for model-specific debugging. For instance, running `pytest tests/models/test_models/test_checkpoint.py -v -ra --models llama` will specifically test checkpointing functionality for the llama model. Note that parallelization (`-n`) may be unnecessary in this context, as model tests for a given model are only partially distributed due to dependency constraints.

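For contributors curious how selection flags like these plug into pytest, a custom command-line option can be registered in `conftest.py`. The following is a hypothetical, minimal sketch of the general mechanism, not Fast-LLM's actual implementation; the option and marker names are illustrative:

```python
import pytest


def pytest_addoption(parser):
    # Register the custom command-line flag with pytest.
    parser.addoption(
        "--skip-slow",
        action="store_true",
        default=False,
        help="skip tests marked as slow",
    )


def pytest_collection_modifyitems(config, items):
    # When --skip-slow is given, attach a skip marker to every
    # collected test carrying the "slow" marker.
    if not config.getoption("--skip-slow"):
        return
    skip_slow = pytest.mark.skip(reason="--skip-slow given")
    for item in items:
        if "slow" in item.keywords:
            item.add_marker(skip_slow)
```

Because the hooks run at collection time, skipped tests never start executing, which is what makes this kind of flag cheap to use for quick iteration.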
### Monitoring distributed tests

Distributed tests are generally the slowest, due to the overhead of starting processes and process groups. To mitigate this, Fast-LLM bundles several tests that execute multiple subtests within a single subprocess call. Since bundled calls can generate substantial output and reduce report readability, Fast-LLM captures the output of each subtest and forwards it to an associated test. If necessary, this capture can be disabled with `--no-distributed-capture`, for instance when a severe crash prevents output capture, or when disabling pytest capture entirely (`-s`). Captured logs are stored in the testing cache directory; please consult individual tests for specific locations.

For example, `test_run_model_distributed[llama]` tries various distributed configurations for the `llama` model, each reported under an associated test such as `test_model_distributed[llama-distributed]`. Should a distributed subtest, say `tp2` (tensor-parallel), fail, `test_run_model_distributed` will log the issue, continue executing the remaining subtests, and finally raise an error to mark the bundled test as failed. The associated test, `test_model_distributed[llama-tp2]`, will also fail and display the captured output (retrieved from `/tmp/fast_llm_tests/models/llama/tp2/`), separated by type (stdout, stderr and traceback) as would happen for a normal test (minus some advanced formatting), but also by rank.

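In practice, a failed distributed subtest can be investigated either from its captured logs or by re-running without capture. A sketch for the `llama`/`tp2` example above, assuming the default testing cache location:

```shell
# Inspect the rank-separated logs captured for the failed tp2 subtest.
ls /tmp/fast_llm_tests/models/llama/tp2/

# Re-run the llama model tests without output capture, e.g. when a
# severe crash prevents the capture mechanism from working.
pytest tests/models -v -ra --models llama --no-distributed-capture -s
```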
### Other options

* `--show-gpu-memory N`: Monitors GPU memory use and reports the top `N` tests (default 10). This mainly helps ensure tests don't exceed memory limits, though results may not be precise.
* `--show-skipped`: Many tests that are skipped for obvious reasons (e.g., marked as slow or extra slow, or in skipped model testing groups; see below) are removed entirely from the report to reduce clutter. Use this flag to display them.

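These reporting flags compose with the selection options described earlier; for example, one might run the fast subset while watching memory use and listing every skipped test. This is an illustrative invocation, not a prescribed workflow:

```shell
# Fast subset, reporting the top 5 GPU-memory consumers
# and showing tests that would normally be hidden as skipped.
pytest tests/ -v -ra --skip-slow --show-gpu-memory 5 --show-skipped
```
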
## Best practices
## Testing models
[Model integration tests](https://github.com/ServiceNow/Fast-LLM/blob/main/tests/models) are the most important part of our testing suite, ensuring that Fast-LLM works and yields consistent results for a variety of models, training configurations, optimizations, etc.