Closed as not planned
Labels
bug (Something isn't working), stale (Over 90 days of inactivity)
Description
Your current environment
vllm 0.5.4
🐛 Describe the bug
AutoAWQ's Marlin format must be used with no zero point, but vLLM has:
def query_marlin_supported_quant_types(has_zp: bool,
                                       min_capability: Optional[int] = None):
    if min_capability is None:
        major, minor = current_platform.get_device_capability()
        min_capability = major * 10 + minor
    if min_capability < 80:
        return []
    if has_zp:
        # AWQ style, unsigned + runtime zero-point
        return [scalar_types.uint4, scalar_types.uint8]
    else:
        # GPTQ style, unsigned + symmetric bias
        # TODO: once fp8_marlin is merged into "gptq_marlin" we should be able
        #  to add `scalar_types.float8_e4m3fn` here
        return [scalar_types.uint4b8, scalar_types.uint8b128]

This would raise an error.
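To make the branching easier to see, here is a minimal standalone sketch that mirrors the function quoted above. It is not vLLM code: plain strings stand in for `scalar_types`, the device capability is passed in explicitly instead of being read from `current_platform`, and `is_supported` is a hypothetical helper added only for illustration. It shows which types each branch returns and does not reproduce the actual error.

```python
from typing import List

# Hypothetical stand-ins for vLLM's scalar_types, so the sketch runs without vLLM.
UINT4, UINT8 = "uint4", "uint8"                 # unsigned + runtime zero point
UINT4B8, UINT8B128 = "uint4b8", "uint8b128"     # unsigned + symmetric bias


def query_supported_quant_types(has_zp: bool, device_capability: int) -> List[str]:
    """Mirror the branching of the quoted function (capability = major * 10 + minor)."""
    if device_capability < 80:
        return []
    if has_zp:
        # Branch labelled "AWQ style" in the quoted code.
        return [UINT4, UINT8]
    # Branch labelled "GPTQ style" in the quoted code.
    return [UINT4B8, UINT8B128]


def is_supported(quant_type: str, has_zp: bool, device_capability: int = 80) -> bool:
    """Hypothetical helper: is quant_type accepted for the given zero-point setting?"""
    return quant_type in query_supported_quant_types(has_zp, device_capability)


if __name__ == "__main__":
    print(query_supported_quant_types(has_zp=True, device_capability=80))
    print(query_supported_quant_types(has_zp=False, device_capability=80))
    print(is_supported(UINT4, has_zp=False))
```

In this sketch a plain `uint4` type is only accepted when `has_zp` is true, so a checkpoint that declares no zero point but carries that type would not match either list.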
liangzelang