Parallel Computing Loss

The parallel loss function in InternEvo is adapted from Apex. For additional speed, users can replace it with the loss function from Flash-Attention, but this replacement may cause the loss to diverge.
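To make the idea of a parallel loss concrete, here is a minimal single-process sketch. It assumes the loss in question is a cross-entropy whose vocabulary dimension is sharded across ranks, which this document does not state explicitly. The function names (`shard_stats`, `parallel_cross_entropy`, `reference_cross_entropy`) are hypothetical and are not taken from InternEvo, Apex, or Flash-Attention. Plain Python reductions stand in for the collective all-reduce calls a real implementation would use.

```python
import math

def shard_stats(logit_shard, vocab_start, target):
    """Work done on one simulated rank: local max and the target logit, if owned."""
    local_max = max(logit_shard)
    idx = target - vocab_start
    owned = 0 <= idx < len(logit_shard)
    # Ranks that do not own the target contribute 0 to the later sum.
    target_logit = logit_shard[idx] if owned else 0.0
    return local_max, target_logit

def parallel_cross_entropy(shards, target):
    """Cross-entropy for one token, with logits split along the vocabulary.

    `shards` is a list of logit slices, one per simulated rank (an assumption
    made for illustration; real code would hold one slice per process).
    """
    starts, offset = [], 0
    for shard in shards:
        starts.append(offset)
        offset += len(shard)

    stats = [shard_stats(s, st, target) for s, st in zip(shards, starts)]

    # Stand-in for all-reduce(MAX): a shared max keeps exp() numerically stable.
    global_max = max(m for m, _ in stats)
    # Stand-in for all-reduce(SUM): exactly one rank owns the target logit.
    target_logit = sum(t for _, t in stats)
    # Stand-in for all-reduce(SUM) over each rank's local sum of exponentials.
    sum_exp = sum(sum(math.exp(x - global_max) for x in s) for s in shards)

    return math.log(sum_exp) + global_max - target_logit

def reference_cross_entropy(logits, target):
    """Unsharded cross-entropy, used only to check the sharded result."""
    m = max(logits)
    return math.log(sum(math.exp(x - m) for x in logits)) + m - logits[target]

logits = [2.0, -1.0, 0.5, 3.0, 0.0, 1.5]
shards = [logits[:3], logits[3:]]  # two simulated ranks
for target in range(len(logits)):
    diff = parallel_cross_entropy(shards, target) - reference_cross_entropy(logits, target)
    assert abs(diff) < 1e-9
```

The sharded and unsharded results agree because the log-sum-exp can be assembled from per-shard partial sums once all ranks share the same maximum. A fused kernel may order these reductions differently or use lower precision, which is one plausible reason two implementations can behave differently during training.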

For details of the modifications made in InternEvo, please refer to the code: InternEvo-parallel-loss