
Questions about inconsistent evaluation results #392

@coorful

Hi, I used the DeepSpeed framework to train a GPT 117M model.
When I evaluate the model on WikiText-103, I get a large gap in perplexity (PPL) between two evaluation paths: (1) running tasks/eval_harness/evaluate.py on the DeepSpeed checkpoint, and (2) first converting the checkpoint to Megatron format and then running tasks/main.py.
What could cause this discrepancy? @mayank31398
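
To make the comparison concrete, here is a minimal sketch of one thing that may be worth ruling out: the same summed negative log-likelihood gives different PPL values depending on whether it is averaged per subword token or per whitespace-separated word. Whether the two scripts actually differ in this way is an assumption I have not verified. The `perplexity` helper and all numbers below are hypothetical and are not taken from either script.

```python
import math


def perplexity(total_nll: float, num_units: int) -> float:
    """Return exp of the average negative log-likelihood (in nats) per unit."""
    if num_units <= 0:
        raise ValueError("num_units must be positive")
    return math.exp(total_nll / num_units)


# Hypothetical values for illustration only, not measurements.
total_nll = 3000.0  # summed NLL over the evaluation text, in nats
num_tokens = 1000   # subword tokens after tokenization
num_words = 800     # whitespace-separated words in the original text

# Same model, same loss, two different normalization units.
token_ppl = perplexity(total_nll, num_tokens)
word_ppl = perplexity(total_nll, num_words)

print(f"token-level PPL: {token_ppl:.2f}")
print(f"word-level PPL:  {word_ppl:.2f}")
```

If both paths report the same summed loss and unit count on identical input, normalization can be excluded and the checkpoint conversion itself becomes the more likely suspect.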
