Skip to content

Group Query Attention #22

@SimJeg

Description

@SimJeg

Hello,

Could you clarify how you handle group query attention ? For instance in Mistral 7B, there are 8 key value heads and 32 heads. So a given key-value pair is associated with 4 different queries and hence 4 different attention weights. How do you aggregate these 4 values ? I do see the num_key_value_groups variable in the update_kv method but it is not used.

Thanks !

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions