[BugFix] GPT inference error when pipeline_para_size > 1 and int8_mode != 0 #750
The problem is caused by the empty-tensor check in the quant ops in `FasterTransformer/examples/pytorch/gpt/utils/gpt.py` (lines 46 to 47 at commit f8e42aa).
When pipeline_para_size > 1, the current rank does not load the weights of all layers, so the quant ops are called on empty tensors and raise errors. An `is_load()` check therefore needs to be added when traversing the layers, as sketched below.
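A minimal sketch of the intended fix, assuming a per-layer weight-processing loop like the one in `gpt.py`. The `is_load()` name follows the check proposed in this PR, but its exact signature, and the helpers `quantize_layer` / `num_layers` used here, are illustrative rather than the actual FasterTransformer API:

```python
def quantize_loaded_layers(weights, num_layers, int8_mode):
    """Quantize only the layers owned by the current pipeline-parallel rank.

    weights.is_load(layer_idx) is assumed to report whether this rank actually
    loaded the weights of layer_idx; layers belonging to other pipeline stages
    only hold empty placeholder tensors on this rank.
    """
    for layer_idx in range(num_layers):
        if not weights.is_load(layer_idx):
            # Skip layers not loaded on this rank; quantizing their empty
            # tensors triggers the reported inference error.
            continue
        weights.quantize_layer(layer_idx, int8_mode)  # hypothetical helper
```

With this guard in place, ranks in a pipeline-parallel setup only quantize the layer weights they actually hold, which avoids the failure when int8_mode != 0.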