Hi MemPrivacy team, thanks for the great work!
I'd like to run the evaluation in this repo, but I can't find the benchmark data. A few questions:
1. MemPrivacy-Bench / PersonaMem-v2 data
evaluation/eval.py reads from a local file:
input_file = 'test_mem_privacy_annotated_final.jsonl'
But this .jsonl file does not seem to be committed to the repo, and I can't locate it. The appendix (supplemental_material/MemPrivacy_Appendices.pdf) describes:
- MemPrivacy-Bench: 200 synthetic users from PersonaHub seeds, bilingual (zh/en), train split 160 users, plus a test split
- PersonaMem-v2 evaluation split: 20 users, 2,521 turns, 2,378 privacy instances, 563 QA pairs
Could you please release these datasets (the annotated .jsonl files), or point to where they are hosted?
2. Hosting location
The HuggingFace collection (IAAR-Shanghai/memprivacy) currently lists only the 4 models (1.7B/4B SFT/RL) and the paper, but no datasets. Is the benchmark planned for release on HuggingFace Datasets / ModelScope, or elsewhere?
3. Expected schema
Could you confirm the expected fields of each JSONL record? From the code it looks like: dialogues, uuid, metadata, questions. A small sample record would be very helpful for reproducing the eval.
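For context, here is a rough sketch of how I would sanity-check the file once it is available. The four field names are only my guess from reading evaluation/eval.py, and find_missing_fields is a hypothetical helper of mine, not something from the repo:

```python
import json

# Field names guessed from evaluation/eval.py; not confirmed by the authors.
EXPECTED_FIELDS = {"dialogues", "uuid", "metadata", "questions"}


def find_missing_fields(path, expected=EXPECTED_FIELDS):
    """Return (line_number, missing_fields) for each record lacking an expected key.

    Hypothetical helper for illustration; it only checks top-level keys,
    not the nested structure of each field.
    """
    problems = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            record = json.loads(line)
            missing = sorted(set(expected) - set(record))
            if missing:
                problems.append((line_no, missing))
    return problems


if __name__ == "__main__":
    # File name taken from evaluation/eval.py.
    for line_no, missing in find_missing_fields("test_mem_privacy_annotated_final.jsonl"):
        print(f"line {line_no}: missing {missing}")
```

If the real schema differs (extra or renamed fields, or nested requirements), I'd be happy to adjust on my side once a sample record is available.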
Thanks a lot!