What happens when the total KV length exceeds the max-capacity length during response generation? #23

@PengWenChen

Hi, thanks for your great work!

It's impressive to compress the KVs of a long prompt into a constant length.
I'm wondering whether the method also considers the case where the generated response exceeds the maximum capacity?

As I read the code, it takes the branch at line 127 only during the prefill stage, and during the generation stage it always takes the branch at line 131.
Is my understanding correct?
https://github.com/FasterDecoding/SnapKV/blob/main/snapkv/monkeypatch/mistral_hijack_4_37.py#L127-L133
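
To make the question concrete, here is a minimal toy sketch of the behavior I think I am seeing. It is not the repository code: `simulate_cache_length` and its parameters are hypothetical names, and it assumes that compression runs once at prefill and that every decoding step afterwards only appends one KV entry, which is exactly the assumption I would like confirmed.

```python
def simulate_cache_length(prompt_len, max_capacity, num_new_tokens):
    """Toy model of the KV cache length per layer (hypothetical, not SnapKV code).

    Assumption: the cache is compressed once at prefill and is only
    appended to during decoding.
    """
    # Prefill branch: the prompt KVs are compressed down to at most max_capacity.
    cache_len = min(prompt_len, max_capacity)
    lengths = [cache_len]

    # Decode branch: each generated token appends one KV entry, with no re-compression.
    for _ in range(num_new_tokens):
        cache_len += 1
        lengths.append(cache_len)
    return lengths


lengths = simulate_cache_length(prompt_len=8000, max_capacity=1024, num_new_tokens=2000)
print(lengths[0], lengths[-1])  # 1024 3024
```

Under that assumption the cache starts at the maximum capacity and then grows linearly with the response, so a long enough response ends up well above the maximum capacity.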
