GitHub - KAIST-Visual-AI-Group/DFP: Official Implementation of Drifting Field Policy: A One-Step Generative Policy via Wasserstein Gradient Flow

Drifting Field Policy: A One-Step Generative Policy via Wasserstein Gradient Flow

Juil Koo · Mingue Park · Jiwon Choi · Yunhong Min · Minhyuk Sung

KAIST

Preprint

Drifting Field Policy (DFP) is a novel one-step generative policy that avoids the trajectory-level credit assignment of diffusion policies in RL fine-tuning.

Overview

Drifting Field Policy (DFP) is a one-step generative policy framework for reinforcement learning. Instead of relying on iterative diffusion or ODE-based sampling, DFP represents the policy as a direct pushforward map from noise to actions. Policy improvement is formulated as a Wasserstein gradient flow that moves the action distribution toward high-value regions under a critic. This enables fast, multimodal action generation while keeping the training objective directly aligned with reward-guided policy improvement.
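To make the overview concrete, here is a minimal toy sketch of a one-step pushforward policy trained to follow a Wasserstein gradient flow. It is not the DFP drifting loss in `utils/`. Everything in it is an illustrative assumption:

- a 1-D action;
- a hypothetical quadratic critic Q(a) = -(a - a*)^2;
- an affine map a = w*z + b.

The flow is the standard particle form for the functional F(mu) = -E_mu[Q], whose velocity field is the action gradient of Q. Each step moves sampled actions a short way along that field and then regresses the one-step map onto the moved samples. The names `critic_grad`, `policy`, and `drift_step` are hypothetical.

```python
import random

A_STAR = 1.0  # hypothetical optimal action under the toy critic


def critic_grad(a: float) -> float:
    """Action gradient of the toy critic Q(a) = -(a - A_STAR)**2."""
    return -2.0 * (a - A_STAR)


def policy(z: float, w: float, b: float) -> float:
    """One-step pushforward map from noise z to action a (affine, for illustration)."""
    return w * z + b


def drift_step(w, b, rng, eta=0.1, lr=0.5, n=256):
    """One update: drift sampled actions along grad Q, then regress the map onto them."""
    zs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    acts = [policy(z, w, b) for z in zs]
    # Explicit Euler step of the particle flow da/dt = grad_a Q(a).
    # The targets are held fixed (a stop-gradient) in the regression below.
    targets = [a + eta * critic_grad(a) for a in acts]
    # Gradient of mean((w*z + b - target)**2) with respect to w and b.
    gw = sum(2.0 * (a - t) * z for a, t, z in zip(acts, targets, zs)) / n
    gb = sum(2.0 * (a - t) for a, t in zip(acts, targets)) / n
    return w - lr * gw, b - lr * gb


rng = random.Random(0)
w, b = 1.0, 0.0  # start from a wide action distribution centred at 0
for _ in range(100):
    w, b = drift_step(w, b, rng)
print(f"w={w:.4f}, b={b:.4f}")
```

Sampling stays a single function call, with no iterative denoising. The toy critic has one maximum and the sketch has no regularizer, so the flow collapses the action distribution onto a* (w shrinks toward 0 and b approaches 1.0). This README does not describe how DFP keeps the policy multimodal or close to the data, so that part is deliberately left out.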

This release contains:

drift: Drifting Field Policy.
meanflow: Mean Velocity Policy (MVP), the main comparison baseline.
acfql: QC/FQL baseline retained from the original action-chunking codebase.

Environment and Requirements

Tested Environment

Python: 3.10
CUDA: 12.x
Benchmarks: Robomimic, OGBench

Installation

conda env create -f environment.yml
conda activate dfp

Or install the pip dependencies manually:

conda create -n dfp python=3.10 pip -y
conda activate dfp
pip install -r requirements.txt
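After installing, a quick sanity check of the active environment can save a failed run. The following is a minimal sketch. It compares the interpreter against the tested Python 3.10 and checks that a few packages are importable. The package names (`jax`, `flax`, `mujoco`) are assumptions inferred from the Flax utilities and `MUJOCO_GL` usage mentioned in this README, not a copy of `requirements.txt`.

```python
import importlib.util
import sys

# Assumed package names; requirements.txt is the authoritative list.
EXPECTED_PACKAGES = ("jax", "flax", "mujoco")


def check_environment(packages=EXPECTED_PACKAGES):
    """Return human-readable warnings about the active Python environment."""
    warnings = []
    if sys.version_info[:2] != (3, 10):
        found = f"{sys.version_info.major}.{sys.version_info.minor}"
        warnings.append(f"tested with Python 3.10, found {found}")
    for name in packages:
        # find_spec returns None when a top-level package cannot be imported.
        if importlib.util.find_spec(name) is None:
            warnings.append(f"package not importable: {name}")
    return warnings


if __name__ == "__main__":
    problems = check_environment()
    print("\n".join(problems) if problems else "environment looks OK")
```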

Datasets

Robomimic

Place the low-dimensional Robomimic datasets under the standard Robomimic directory:

~/.robomimic/lift/mh/low_dim_v15.hdf5
~/.robomimic/can/mh/low_dim_v15.hdf5
~/.robomimic/square/mh/low_dim_v15.hdf5

If your datasets live elsewhere, set:

export ROBOMIMIC_DATASET_DIR=/path/to/robomimic

The datasets can be downloaded from the Robomimic dataset page: https://robomimic.github.io/docs/datasets/robomimic_v0.1.html
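Before launching, it can help to confirm that the three files are where the training code will look. Here is a small sketch that resolves `<root>/<task>/mh/low_dim_v15.hdf5`, preferring `ROBOMIMIC_DATASET_DIR` and falling back to `~/.robomimic`. The assumption that the variable replaces the `~/.robomimic` root (rather than, say, pointing at a single task folder) is not confirmed by this README. The helper names are hypothetical.

```python
import os
from pathlib import Path
from typing import List, Optional

# Low-dimensional multi-human (mh) datasets listed in this README.
TASKS = ("lift", "can", "square")


def robomimic_dataset_path(task: str, root: Optional[str] = None) -> Path:
    """Resolve <root>/<task>/mh/low_dim_v15.hdf5.

    root defaults to $ROBOMIMIC_DATASET_DIR, then to ~/.robomimic.
    """
    if root is None:
        root = os.environ.get("ROBOMIMIC_DATASET_DIR", "~/.robomimic")
    return Path(root).expanduser() / task / "mh" / "low_dim_v15.hdf5"


def missing_datasets(root: Optional[str] = None) -> List[Path]:
    """Return the expected dataset files that are not on disk yet."""
    paths = [robomimic_dataset_path(task, root) for task in TASKS]
    return [p for p in paths if not p.is_file()]


if __name__ == "__main__":
    for path in missing_datasets():
        print(f"missing: {path}")
```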

OGBench Cube-Quadruple

For cube-quadruple, we use the 100M-scale offline dataset:

wget -r -np -nH --cut-dirs=2 -A "*.npz" \
  https://rail.eecs.berkeley.edu/datasets/ogbench/cube-quadruple-play-100m-v0/

Pass the downloaded directory with:

--ogbench_dataset_dir=/path/to/cube-quadruple-play-100m-v0
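The `wget` command above mirrors only `*.npz` files, and `--cut-dirs=2` drops the `datasets/ogbench` prefix. As a result, the files should land directly in a local `cube-quadruple-play-100m-v0/` directory. Here is a minimal sketch for checking that directory before passing it to `--ogbench_dataset_dir`. The helper `find_npz_shards` is hypothetical, and it assumes the shards sit at the top level of the directory.

```python
from pathlib import Path
from typing import List


def find_npz_shards(dataset_dir: str, require: bool = True) -> List[Path]:
    """Return the sorted .npz files directly inside dataset_dir."""
    root = Path(dataset_dir).expanduser()
    shards = sorted(root.glob("*.npz")) if root.is_dir() else []
    if require and not shards:
        # Fail early instead of at dataset-loading time inside a long run.
        raise FileNotFoundError(f"no .npz files found in {root}; check the wget step")
    return shards
```

For example, `find_npz_shards("/path/to/cube-quadruple-play-100m-v0")` should return a non-empty list once the download has finished.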

MVP Baseline Note

MVP is our main comparison baseline. Since no official implementation was available, we implemented the MVP baseline ourselves for reproduction. Most hyperparameters follow the MVP paper, but for cube-triple experiments we set ivc_lambda=0 because it gave the strongest performance in our runs.

We could not fully reproduce the performance reported in the MVP paper in every setting, so the MVP results in our experiments use the best-performing configuration we found.

Usage

The main results are offline-to-online runs. Each command first trains on the offline dataset and then continues online fine-tuning in the same run.

# DFP
MUJOCO_GL=egl python main.py --agent_config=drift --run_group=reproduce --env_name=cube-triple-play-singletask-task2-v0 --sparse=False --horizon_length=5

# MVP
MUJOCO_GL=egl python main.py --agent_config=meanflow --run_group=reproduce --env_name=cube-triple-play-singletask-task2-v0 --sparse=False --horizon_length=5

# QC-BFN
MUJOCO_GL=egl python main.py --run_group=reproduce --agent.actor_type=best-of-n --agent.actor_num_samples=32 --env_name=cube-triple-play-singletask-task2-v0 --sparse=False --horizon_length=5

# QC-FQL
MUJOCO_GL=egl python main.py --run_group=reproduce --agent.alpha=100 --env_name=cube-triple-play-singletask-task2-v0 --sparse=False --horizon_length=5

# BFN
MUJOCO_GL=egl python main.py --run_group=reproduce --agent.actor_type=best-of-n --agent.actor_num_samples=4 --env_name=cube-triple-play-singletask-task2-v0 --sparse=False --horizon_length=1

# FQL
MUJOCO_GL=egl python main.py --run_group=reproduce --agent.alpha=100 --env_name=cube-triple-play-singletask-task2-v0 --sparse=False --horizon_length=1
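The six commands differ only in a few flags. Among the baselines, QC-FQL and QC-BFN differ from FQL and BFN through `--horizon_length` (5 vs 1), and the BFN variants also differ in `--agent.actor_num_samples` (32 vs 4). Here is a small sketch that regenerates the commands from a table of those flags. The flag values are copied from the commands above. The flag order differs from the README, which assumes `main.py` parses flags by name. `METHOD_FLAGS` and `build_command` are hypothetical helpers.

```python
import shlex

# Per-method flags copied from the commands above. acfql is the default agent,
# so the QC-BFN, QC-FQL, BFN, and FQL entries carry no --agent_config flag.
METHOD_FLAGS = {
    "DFP": ["--agent_config=drift", "--horizon_length=5"],
    "MVP": ["--agent_config=meanflow", "--horizon_length=5"],
    "QC-BFN": ["--agent.actor_type=best-of-n", "--agent.actor_num_samples=32", "--horizon_length=5"],
    "QC-FQL": ["--agent.alpha=100", "--horizon_length=5"],
    "BFN": ["--agent.actor_type=best-of-n", "--agent.actor_num_samples=4", "--horizon_length=1"],
    "FQL": ["--agent.alpha=100", "--horizon_length=1"],
}


def build_command(method: str, env_name: str, run_group: str = "reproduce") -> str:
    """Return the shell command for one offline-to-online run of `method`."""
    argv = ["python", "main.py", *METHOD_FLAGS[method],
            f"--run_group={run_group}", f"--env_name={env_name}", "--sparse=False"]
    return "MUJOCO_GL=egl " + shlex.join(argv)


if __name__ == "__main__":
    for name in METHOD_FLAGS:
        print(build_command(name, "cube-triple-play-singletask-task2-v0"))
```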

The default agent is acfql, so the QC-BFN, QC-FQL, BFN, and FQL commands do not need an explicit --agent_config=acfql. To run on a different environment, override --env_name; for cube-quadruple, also pass --ogbench_dataset_dir:

MUJOCO_GL=egl python main.py \
  --agent_config=drift \
  --run_group=reproduce \
  --env_name=cube-quadruple-play-100m-singletask-task3-v0 \
  --ogbench_dataset_dir=/path/to/cube-quadruple-play-100m-v0 \
  --seed=42
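Results are usually reported over several seeds. The following is a minimal sketch of a sequential seed sweep that uses only the flags shown in the command above and sets `MUJOCO_GL=egl` in the child environment. The helper names and the seed list are illustrative assumptions. `run_sweep` only prints by default (`dry_run=True`), so nothing is launched by accident.

```python
import os
import subprocess
from typing import List, Optional


def sweep_commands(env_name: str, seeds: List[int], dataset_dir: Optional[str] = None,
                   agent_config: str = "drift") -> List[List[str]]:
    """Build one argv list per seed, using only flags that appear in this README."""
    commands = []
    for seed in seeds:
        argv = ["python", "main.py", f"--agent_config={agent_config}",
                "--run_group=reproduce", f"--env_name={env_name}", f"--seed={seed}"]
        if dataset_dir is not None:  # e.g. the cube-quadruple 100M download
            argv.append(f"--ogbench_dataset_dir={dataset_dir}")
        commands.append(argv)
    return commands


def run_sweep(commands: List[List[str]], dry_run: bool = True) -> None:
    """Run each command in turn with MUJOCO_GL=egl; only print when dry_run is set."""
    env = {**os.environ, "MUJOCO_GL": "egl"}
    for argv in commands:
        print("MUJOCO_GL=egl " + " ".join(argv))
        if not dry_run:
            subprocess.run(argv, env=env, check=True)


if __name__ == "__main__":
    run_sweep(sweep_commands("cube-quadruple-play-100m-singletask-task3-v0",
                             seeds=[42, 43, 44],
                             dataset_dir="/path/to/cube-quadruple-play-100m-v0"))
```

Running the seeds one after another keeps GPU memory use predictable. Launching them in parallel is only worthwhile if the hardware has room for several runs at once.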

Online-Only From an Offline Checkpoint

To skip offline training and start online fine-tuning from a saved offline checkpoint, pass the checkpoint with --restore_path and set --restore_epoch to the number of offline training steps the checkpoint was trained for:

MUJOCO_GL=egl python main.py \
  --agent_config=drift \
  --run_group=reproduce \
  --env_name=cube-triple-play-singletask-task3-v0 \
  --restore_path=/path/to/params_offline_final.pkl \
  --restore_epoch=1000000 \
  --seed=42
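A wrong checkpoint path only surfaces after the run has started. Here is a small sketch that checks the path first and then builds the online-only command. It takes 1,000,000 from the example above as the default `--restore_epoch`. If your checkpoint was trained for a different number of offline steps, that value must change. The helpers `checkpoint_problems` and `online_only_argv` are hypothetical, and the `.pkl` suffix check simply mirrors the example file name.

```python
from pathlib import Path
from typing import List

OFFLINE_STEPS = 1_000_000  # --restore_epoch value used in the example above


def checkpoint_problems(restore_path: str) -> List[str]:
    """Return reasons the checkpoint path looks wrong (empty list if it looks usable)."""
    ckpt = Path(restore_path).expanduser()
    problems = []
    if ckpt.suffix != ".pkl":
        problems.append(f"expected a .pkl file: {ckpt}")
    if not ckpt.is_file():
        problems.append(f"checkpoint not found: {ckpt}")
    return problems


def online_only_argv(restore_path: str, env_name: str, seed: int = 42,
                     restore_epoch: int = OFFLINE_STEPS) -> List[str]:
    """Build argv that restores an offline checkpoint and continues online."""
    return ["python", "main.py", "--agent_config=drift", "--run_group=reproduce",
            f"--env_name={env_name}", f"--restore_path={restore_path}",
            f"--restore_epoch={restore_epoch}", f"--seed={seed}"]
```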

Repository Layout

| Path | Description |
| --- | --- |
| `agents/` | DFP, MVP, and QC/FQL baseline agents |
| `config/` | Main, evaluation, optimizer, and agent configs |
| `envs/` | Robomimic, OGBench, and D4RL environment utilities |
| `utils/` | Datasets, networks, drifting loss, logging, and Flax utilities |

Citation

If you find our work useful, please consider citing:

@article{koo2026drifting,
  title={Drifting Field Policy: A One-Step Generative Policy via Wasserstein Gradient Flow},
  author={Koo, Juil and Park, Mingue and Choi, Jiwon and Min, Yunhong and Sung, Minhyuk},
  journal={arXiv preprint arXiv:2605.07727},
  year={2026}
}

Acknowledgements

This repository builds on the Q-chunking/FQL codebase.
