Where to find the benchmark datasets (MemPrivacy-Bench / PersonaMem-v2 .jsonl)? #3

@HongguangLi

Hi MemPrivacy team, thanks for the great work!

I'd like to run the evaluation in this repo, but I can't find the benchmark data. A few questions:

1. MemPrivacy-Bench / PersonaMem-v2 data

evaluation/eval.py reads from a local file:

input_file = 'test_mem_privacy_annotated_final.jsonl'

But this .jsonl file does not seem to be committed to the repo, and I can't locate it. The appendix (supplemental_material/MemPrivacy_Appendices.pdf) describes:

  • MemPrivacy-Bench: 200 synthetic users from PersonaHub seeds, bilingual (zh/en), train split 160 users, plus a test split
  • PersonaMem-v2 evaluation split: 20 users, 2,521 turns, 2,378 privacy instances, 563 QA pairs

Could you please release these datasets (the annotated .jsonl files), or point to where they are hosted?

2. Hosting location

The HuggingFace collection (IAAR-Shanghai/memprivacy) currently lists only the 4 models (1.7B/4B SFT/RL) and the paper — no datasets. Is the benchmark planned for release on HuggingFace Datasets / ModelScope, or elsewhere?

3. Expected schema

Could you confirm the expected fields of each JSONL record? From the code it looks like: dialogues, uuid, metadata, questions. A small sample record would be very helpful for reproducing the eval.
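
For concreteness, here is a minimal Python sketch of the kind of schema check this question is about. It assumes the four top-level keys listed above (dialogues, uuid, metadata, questions), which are only inferred from reading the code and are not confirmed. The helper name check_jsonl_schema is hypothetical and not part of the repo.

```python
import json
from pathlib import Path

# Assumed top-level keys, inferred from reading evaluation/eval.py.
# These are NOT confirmed by the maintainers.
EXPECTED_KEYS = {"dialogues", "uuid", "metadata", "questions"}


def check_jsonl_schema(path, expected_keys=EXPECTED_KEYS):
    """Return (line_number, missing_keys) pairs for records lacking expected keys.

    Hypothetical helper for illustration; it only checks that the top-level
    keys are present and says nothing about the nested structure.
    """
    problems = []
    with Path(path).open(encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            record = json.loads(line)
            missing = set(expected_keys) - set(record)
            if missing:
                problems.append((line_number, sorted(missing)))
    return problems


if __name__ == "__main__":
    # File name taken from evaluation/eval.py; the file itself is not in the repo.
    input_file = "test_mem_privacy_annotated_final.jsonl"
    if Path(input_file).exists():
        for line_number, missing in check_jsonl_schema(input_file):
            print(f"line {line_number}: missing {missing}")
    else:
        print(f"{input_file} not found")
```

An empty result would mean every record has the four assumed keys. A sample record would still be needed to confirm the nested layout of dialogues and questions.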

Thanks a lot!
