feat: add JoyImage edit plus #14032
Hi @tangyanf, thanks for the PR! It does not appear to link an issue it fixes. If this PR addresses an existing issue, please add a closing keyword (e.g. |
🤗 Serge says:
This PR adds the JoyImage Edit Plus model and pipeline. There are several blocking issues that need to be addressed before merging.
Blocking — Debug artifacts left in production code
Multiple torch.save() calls, a print() statement, and a commented-out exit(0) are left in pipeline_joyimage_edit_plus.py. These will write files to the user's working directory and print to stdout during every inference call.
Blocking — einops dependency
Per .ai/models.md: "No new mandatory dependency without discussion (e.g. einops). Optional deps guarded with is_X_available() and a dummy in utils/dummy_*.py." The pipeline directly imports from einops import rearrange — this is the only non-comment usage of einops in src/diffusers/. The rearrange calls should be rewritten with native PyTorch (reshape, permute, unflatten).
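As a rough illustration of the rewrite suggested above, the sketch below replaces one einops pattern with reshape and permute. The patchify pattern and the function name patchify are assumptions chosen for illustration; the actual rearrange calls in the pipeline are not shown in this thread.

```python
import torch


def patchify(x: torch.Tensor, p: int) -> torch.Tensor:
    """Native PyTorch equivalent of the (assumed) einops pattern
    rearrange(x, "b c (h p) (w q) -> b (h w) (c p q)", p=p, q=p).
    """
    b, c, height, width = x.shape
    h, w = height // p, width // p
    # Split each spatial axis into (grid index, offset inside the patch).
    x = x.reshape(b, c, h, p, w, p)
    # Move the grid indices forward: (b, h, w, c, p, q).
    x = x.permute(0, 2, 4, 1, 3, 5)
    # Merge the grid into a token axis and the rest into a feature axis.
    return x.reshape(b, h * w, c * p * p)


x = torch.arange(2 * 3 * 4 * 4).reshape(2, 3, 4, 4)
tokens = patchify(x, p=2)  # shape (2, 4, 12)
```

The order of the axes in permute has to match the order of the names on the right-hand side of the einops pattern, which is the part most easily gotten wrong when converting by hand.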
Blocking — sglang integration code in model forward
The transformer's forward method contains sglang-specific code: list-unwrapping for "SglangXvideo CFG branches" (lines 272-276) and a try: from sglang... fallback (lines 279-287). Per .ai/AGENTS.md: "No defensive code, unused code paths, or legacy stubs — do not add fallback paths, safety checks, or configuration options 'just in case'." This code doesn't belong in the diffusers model — the pipeline always passes the required arguments.
Blocking — Missing dummy objects
JoyImageEditPlusTransformer3DModel, JoyImageEditPlusPipeline, and JoyImageEditPlusPipelineOutput are not registered in dummy_pt_objects.py / dummy_torch_and_transformers_objects.py. This will cause ImportError when torch/transformers are not installed.
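For readers unfamiliar with the dummy-object pattern, here is a simplified, standalone sketch of the idea: a placeholder class that raises a descriptive ImportError when a backend is missing. It is not the code from diffusers' utils; the helper below is a stand-in written for this example, and the class name JoyImageEditPlusPipelineStub and the backend name hypothetical_missing_backend are made up so the snippet behaves the same on any machine.

```python
import importlib.util


def requires_backends(obj, backends):
    """Stand-in helper: raise ImportError if a backend package is not installed."""
    name = getattr(obj, "__name__", obj.__class__.__name__)
    missing = [b for b in backends if importlib.util.find_spec(b) is None]
    if missing:
        raise ImportError(f"{name} requires the following backends: {', '.join(missing)}")


class JoyImageEditPlusPipelineStub:
    # Hypothetical backend name, chosen so the placeholder always triggers here.
    _backends = ["hypothetical_missing_backend"]

    def __init__(self, *args, **kwargs):
        requires_backends(self, self._backends)

    @classmethod
    def from_pretrained(cls, *args, **kwargs):
        requires_backends(cls, cls._backends)


try:
    JoyImageEditPlusPipelineStub.from_pretrained("some/checkpoint")
except ImportError as err:
    message = str(err)  # names the class and the missing backend
```

The point of the placeholder is that the class name can still be imported at the top level, and the failure only surfaces, with a readable message, when someone tries to use it.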
Blocking — Missing tests
No test files were added for the new model or pipeline.
Blocking — Hardcoded device_type="cuda" in torch.autocast
torch.autocast(device_type="cuda", ...) is hardcoded in two places in the pipeline. This will fail on MPS, XPU, and other non-CUDA devices.
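A minimal sketch of the usual fix, assuming the autocast region wraps a computation on a tensor whose device is already known. The function name matmul_in_autocast is hypothetical and the matmul stands in for whatever the pipeline actually runs inside the context.

```python
import torch


def matmul_in_autocast(x: torch.Tensor, dtype: torch.dtype = torch.bfloat16) -> torch.Tensor:
    # Derive the autocast device type from the tensor instead of hardcoding "cuda".
    with torch.autocast(device_type=x.device.type, dtype=dtype):
        return x @ x.T


x = torch.randn(4, 4)  # lives on the CPU here, so device.type is "cpu"
out = matmul_in_autocast(x)
```

Whether autocast is supported for a given device type and dtype still depends on the installed PyTorch version, so this removes the hardcoded assumption rather than guaranteeing support everywhere.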
Non-blocking — Inlined scheduler sigma math
Per .ai/pipelines.md gotcha #3, the pipeline manually computes shifted sigmas and temporarily overrides self.scheduler.shift — this is exactly what FlowMatchEulerDiscreteScheduler does with its shift config. The scheduler should own this logic.
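For context, the static shift that flow-matching schedulers apply is, as far as I understand it, sigma' = shift * sigma / (1 + (shift - 1) * sigma). The sketch below evaluates that formula in plain Python; the linearly spaced base schedule and the function name shift_sigmas are assumptions for illustration, not the pipeline's code.

```python
def shift_sigmas(sigmas, shift):
    """Apply the static flow-matching shift to each sigma in [0, 1]."""
    return [shift * s / (1 + (shift - 1) * s) for s in sigmas]


num_steps = 4
# Assumed base schedule: linearly spaced from 1.0 down to 1 / num_steps.
sigmas = [1 - i / num_steps for i in range(num_steps)]
shifted = shift_sigmas(sigmas, shift=3.0)
# sigmas  -> [1.0, 0.75, 0.5, 0.25]
# shifted -> [1.0, 0.9, 0.75, 0.5]
```

A shift greater than 1 pushes the schedule toward higher noise levels while keeping the endpoints 0 and 1 fixed, which is why it can live in the scheduler config instead of being recomputed in the pipeline.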
Non-blocking — Unused imports and parameters
- import inspect in transformer_joyimage_edit_plus.py is unused.
- enable_denormalization parameter is declared in prepare_latents and __call__ but never read.
- retrieve_timesteps is duplicated from the existing pipeline without a # Copied from annotation.
serge v0.1.0 · model: claude-opus-4-6 · 29 LLM turns · 50 tool calls · 190.2s · 1602502 in / 7369 out tokens
- Remove einops dependency: replace rearrange with reshape/permute
- Remove sglang-specific code from transformer forward
- Remove unused import inspect from transformer
- Fix hardcoded device_type="cuda" to use device.type
- Simplify scheduler sigma math: delegate to retrieve_timesteps
- Remove unused enable_denormalization parameter
- Fix callback latents variable binding
- Fix output_type="pt" to return stacked tensor
- Set return_dict default to True in transformer forward
- Add dummy objects for JoyImageEditPlus classes
- Add transformer and pipeline test files
6f2763a to 8a911e5
@claude can you do a review here?
Claude encountered an error. I'll analyze this and get back to you.
yiyixuxu left a comment
thanks, i left some feedback
from ..modeling_utils import ModelMixin
from ..normalization import FP32LayerNorm
from .transformer_joyimage import (
    JoyImageAttention,
we should not cross-import between transformer models.
for the exact same blocks you want to reuse, can you copy them over with # Copied from .... with JoyImage->JoyImagePlus statements, or create new ones for joy image edit?
encoder_hidden_states: torch.Tensor = None,
image_rotary_emb: tuple[torch.Tensor, torch.Tensor] | None = None,
attention_mask: torch.Tensor | None = None,
**kwargs,
Suggested change:
- **kwargs,
if cos.ndim == 2:
    # unbatched: [S, D] -> [1, S, 1, D]
    cos = cos.unsqueeze(0).unsqueeze(2)
    sin = sin.unsqueeze(0).unsqueeze(2)
Suggested change:
- if cos.ndim == 2:
-     # unbatched: [S, D] -> [1, S, 1, D]
-     cos = cos.unsqueeze(0).unsqueeze(2)
-     sin = sin.unsqueeze(0).unsqueeze(2)
if shape_list is None:
    raise ValueError(
        "shape_list must be provided either as an argument or via forward_batch.vae_image_sizes"
    )
Suggested change:
- if shape_list is None:
-     raise ValueError(
-         "shape_list must be provided either as an argument or via forward_batch.vae_image_sizes"
-     )
let's just make sure shape_list is a required argument
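One way to read this request, sketched with a hypothetical function: if shape_list has no default, Python itself rejects a call that omits it, so the explicit None check becomes unnecessary. The name count_tokens and the assumption that shape_list holds one (height, width) pair per image are made up for this example.

```python
def count_tokens(hidden_states, *, shape_list: list[tuple[int, int]]) -> int:
    """Hypothetical stand-in for a forward that needs per-image sizes.

    shape_list is keyword-only and has no default, so callers must pass it.
    """
    return sum(h * w for h, w in shape_list)


total = count_tokens("latents", shape_list=[(2, 3), (4, 4)])  # 2*3 + 4*4 = 22

try:
    count_tokens("latents")  # missing required argument
except TypeError as err:
    error_text = str(err)
```

The failure now happens at the call site with a standard TypeError, and the signature documents the requirement without any runtime branch.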
if vec.shape[-1] > self.hidden_size:
    vec = vec.unflatten(1, (6, -1))
Suggested change:
- if vec.shape[-1] > self.hidden_size:
-     vec = vec.unflatten(1, (6, -1))
+ vec = vec.unflatten(1, (6, -1))
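To make the effect of the unconditional call concrete, here is a small sketch of what unflatten(1, (6, -1)) does to a 2D tensor. The sizes are arbitrary, and reading the six chunks as per-block modulation vectors is an assumption based on common DiT designs, not something stated in this thread.

```python
import torch

hidden_size = 4
batch = 2
# A conditioning vector that packs six chunks of size hidden_size per sample.
vec = torch.arange(batch * 6 * hidden_size, dtype=torch.float32).reshape(batch, 6 * hidden_size)

# Split dim 1 into (6, hidden_size); -1 lets PyTorch infer the chunk size.
chunks = vec.unflatten(1, (6, -1))  # shape (2, 6, 4)

# Each chunk can then be pulled out along the new axis.
parts = chunks.unbind(dim=1)  # tuple of six (2, 4) tensors
```

Because the split is a pure view of the same data, flattening dim 1 and 2 again recovers the original tensor, which makes the operation easy to check in a test.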
"""


def retrieve_timesteps(
can we add a # Copied from ... here?
    img = img.crop((left, top, left + bw, top + bh))
    return img

def _get_bucket_size(self, img: Image.Image) -> tuple[int, int]:
can you move this to image_processor.py?
    latent = latent / self.vae.config.scaling_factor
    return latent

def _resize_center_crop(self, img: Image.Image, target_size: tuple[int, int]) -> Image.Image:
you already have a _resize_center_crop method on the image processor? can you just make it work with self.image_processor.preprocess(...)?
Description
We are the JoyAI Team, and this is the Diffusers implementation for the JoyAI-Image-Edit-Plus model.
GitHub Repository: [https://github.com/jd-opensource/JoyAI-Image]
Hugging Face Model: [https://huggingface.co/jdopensource/JoyAI-Image-Edit-Plus-Diffusers]
Original opensource weights: [https://huggingface.co/jdopensource/JoyAI-Image-Edit-Plus]
Fixes #14049
Model Overview
JoyAI-Image-Edit-Plus extends JoyAI-Image-Edit with multi-image editing capabilities. While JoyAI-Image-Edit operates on a single reference image, Edit-Plus accepts multiple reference images as input and performs instruction-guided editing across them, enabling tasks such as subject composition, style transfer from multiple sources, and multi-view consistent editing. It combines an 8B Multimodal Large Language Model (MLLM) with a 16B Multimodal Diffusion Transformer (MMDiT), supporting variable-resolution reference images that are independently encoded and jointly denoised.
Key Features
and dog images).