[feat]: support wan2.2 s2v, support dist infer, pose-audio by Watebear · Pull Request #1113 · ModelTC/LightX2V

Watebear · 2026-06-02T07:18:20Z

No description provided.

gemini-code-assist

Code Review

This pull request introduces support for the Wan2.2 S2V (Speech-to-Video) model, adding configuration files, a dedicated runner, networks, and utilities for audio encoding, audio injection, and frame packing. The review feedback highlights several key improvements: fixing an AttributeError in weight synchronization in casual_audio.py, replacing a float duration comparison with an integer frame-count check in audio_encoder.py to avoid precision issues, determining the number of latent channels dynamically in framepack.py, avoiding hardcoded device strings in pre_infer.py for multi-device portability, removing an unused variable in transformer_infer.py, and correcting a misused type annotation (num_heads=int instead of num_heads: int) in wan_causal_audio_module.py.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-02T07:19:59Z

+        dst.final_linear.weight.data.copy_(src.final_linear.weight.t())
+        dst.final_linear.bias.data.copy_(src.final_linear.bias)


src.final_linear is registered as an MMWeight module, which wraps the underlying PyTorch tensors in WeightTensor objects. Accessing .weight and .bias directly on src.final_linear therefore returns these wrapper objects rather than raw PyTorch tensors: calling .t() on the weight wrapper raises an AttributeError, and copy_() expects a tensor rather than the bias wrapper. Access .tensor on them instead, matching how the other weights are copied in this function.

Suggested change

-        dst.final_linear.weight.data.copy_(src.final_linear.weight.t())
-        dst.final_linear.bias.data.copy_(src.final_linear.bias)
+        dst.final_linear.weight.data.copy_(src.final_linear.weight.tensor.t())
+        dst.final_linear.bias.data.copy_(src.final_linear.bias.tensor)
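To make the failure mode concrete, here is a minimal, torch-free sketch. FakeTensor, WeightTensor, and unwrap are hypothetical stand-ins modeled on the review's description (a wrapper exposing the real tensor as .tensor), not LightX2V's actual classes. It assumes raw tensors have no .tensor attribute, so unwrap passes them through unchanged.

```python
class FakeTensor:
    """Hypothetical stand-in for torch.Tensor: a 2-D matrix with a .t() transpose."""

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]

    def t(self):
        # Swap rows and columns, mirroring torch.Tensor.t() for 2-D input.
        return FakeTensor(zip(*self.rows))


class WeightTensor:
    """Hypothetical wrapper that keeps the real tensor in .tensor, as the review describes."""

    def __init__(self, tensor):
        self.tensor = tensor


def unwrap(weight):
    # Return the raw tensor behind a wrapper exposing .tensor, else the object itself.
    return getattr(weight, "tensor", weight)


w = WeightTensor(FakeTensor([[1, 2, 3], [4, 5, 6]]))
try:
    w.t()  # the wrapper has no .t(), which is the AttributeError the review points out
except AttributeError as exc:
    print("AttributeError:", exc)
print(unwrap(w).t().rows)  # [[1, 4], [2, 5], [3, 6]]
```

In the actual code the suggested change simply accesses .tensor directly; a helper like unwrap would only matter if the same copy routine had to accept both wrapped and raw weights.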

gemini-code-assist · 2026-06-02T07:19:59Z

+    if required_duration > total_frames / original_fps:
+        raise ValueError("required_duration must be less than video length")


Comparing float durations (required_duration > total_frames / original_fps) can lead to precision issues (e.g., when they are mathematically equal but float representation makes one slightly larger), causing unexpected ValueError exceptions. Comparing the integer frame counts (required_origin_frames > total_frames) is much more robust.

Suggested change

-    if required_duration > total_frames / original_fps:
-        raise ValueError("required_duration must be less than video length")
+    if required_origin_frames > total_frames:
+        raise ValueError("required_duration must be less than video length")
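A minimal sketch of the precision issue, using hypothetical helper names. It assumes required_origin_frames is obtained by rounding required_duration * original_fps; the hunk does not show how the PR actually derives it, so treat that step as illustrative.

```python
def exceeds_by_duration(required_duration: float, total_frames: int, original_fps: float) -> bool:
    # Original check: compares two floats that may differ only by rounding error.
    return required_duration > total_frames / original_fps


def exceeds_by_frames(required_duration: float, total_frames: int, original_fps: float) -> bool:
    # Suggested check: compare integer frame counts (rounding here is an assumption).
    required_origin_frames = round(required_duration * original_fps)
    return required_origin_frames > total_frames


# 0.1 + 0.2 evaluates to 0.30000000000000004, while 3 / 10 evaluates to 0.3.
duration = 0.1 + 0.2
print(exceeds_by_duration(duration, total_frames=3, original_fps=10))  # True: spurious rejection
print(exceeds_by_frames(duration, total_frames=3, original_fps=10))    # False: 3 frames fit in 3
```

Mathematically the clip needs exactly 3 of the 3 available frames, so only the integer comparison gives the intended answer.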

gemini-code-assist · 2026-06-02T07:19:59Z

+
+    for m in motion_latents:
+        lat_height, lat_width = m.shape[2], m.shape[3]
+        padd_lat = torch.zeros(16, zip_frame_buckets.sum(), lat_height, lat_width, device=m.device, dtype=m.dtype)


The number of latent channels is hardcoded to 16. To make the frame packing logic robust to different VAE architectures or configurations, it is better to dynamically use m.shape[0] instead of a magic number.

Suggested change

-        padd_lat = torch.zeros(16, zip_frame_buckets.sum(), lat_height, lat_width, device=m.device, dtype=m.dtype)
+        padd_lat = torch.zeros(m.shape[0], zip_frame_buckets.sum(), lat_height, lat_width, device=m.device, dtype=m.dtype)
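The shape logic behind the suggestion can be sketched without torch. padding_shape is a hypothetical helper; it assumes each motion latent is laid out as (C, T, H, W), which is consistent with the hunk reading the height and width from m.shape[2] and m.shape[3]. The bucket sizes and latent shapes below are illustrative values only.

```python
from typing import Sequence, Tuple


def padding_shape(latent_shape: Sequence[int], zip_frame_buckets: Sequence[int]) -> Tuple[int, int, int, int]:
    """Shape of the zero padding for one motion latent laid out as (C, T, H, W).

    The channel count is taken from the latent itself instead of a hardcoded 16.
    """
    channels, _, lat_height, lat_width = latent_shape
    return (channels, sum(zip_frame_buckets), lat_height, lat_width)


print(padding_shape((16, 5, 60, 104), [1, 2, 16]))  # (16, 19, 60, 104)
print(padding_shape((48, 5, 30, 52), [1, 2, 16]))   # (48, 19, 30, 52)
```

With a hardcoded 16, the second call would produce padding whose channel dimension no longer matches the latent, which would break any later concatenation with it.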

gemini-code-assist · 2026-06-02T07:20:00Z

+from lightx2v.models.networks.wan.infer.transformer_infer import WanTransformerInfer
+from lightx2v_platform.base.global_var import AI_DEVICE
+
+torch_device_module = getattr(torch, AI_DEVICE)


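torch_device_module is defined but never used anywhere in this file. It can be safely removed; besides cleaning up the code, this avoids a potential AttributeError at import time, because getattr(torch, AI_DEVICE) runs when the module is loaded and fails on devices for which torch has no attribute of that name.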
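Deleting the line is the fix here. If some other file does need the backend module, one option is to look it up lazily so the failure happens at the point of use, with a clear message. This is a sketch only: resolve_device_module is a hypothetical helper, and a types.SimpleNamespace stands in for the torch package.

```python
import types


def resolve_device_module(torch_mod, device_name: str):
    """Look up torch.<device_name> only when it is actually needed.

    A module-level getattr(torch, AI_DEVICE) executes at import time and raises
    AttributeError whenever that attribute is missing, even if it is never used.
    """
    module = getattr(torch_mod, device_name, None)
    if module is None:
        raise RuntimeError(f"torch has no device module named {device_name!r}")
    return module


# Hypothetical stand-in for the torch package that only exposes a "cuda" backend.
fake_torch = types.SimpleNamespace(cuda=types.SimpleNamespace(name="cuda"))
print(resolve_device_module(fake_torch, "cuda").name)  # cuda
```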

gemini-code-assist · 2026-06-02T07:20:00Z

+
+
+class MotionEncoder_tc(nn.Module):
+    def __init__(self, in_dim: int, hidden_dim: int, num_heads=int, need_global=True, dtype=None, device=None):


In the method signature, num_heads=int uses the assignment operator = instead of a colon : for the type annotation. This makes the default value of num_heads the int class itself, which will cause runtime errors (e.g., in rearrange) if the argument is ever omitted. It should be written as num_heads: int.

Suggested change

-    def __init__(self, in_dim: int, hidden_dim: int, num_heads=int, need_global=True, dtype=None, device=None):
+    def __init__(self, in_dim: int, hidden_dim: int, num_heads: int, need_global=True, dtype=None, device=None):
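The difference between the two signatures can be checked with the standard inspect module. init_buggy and init_fixed are hypothetical stand-ins for MotionEncoder_tc.__init__, and hidden_dim // num_heads is an assumed stand-in for whatever per-head computation the real module performs (the review mentions rearrange).

```python
import inspect


def init_buggy(in_dim: int, hidden_dim: int, num_heads=int):
    # "=int" sets a default value (the int class itself) instead of annotating the type.
    return hidden_dim // num_heads


def init_fixed(in_dim: int, hidden_dim: int, num_heads: int):
    # ": int" is an annotation; num_heads is now a required argument.
    return hidden_dim // num_heads


print(inspect.signature(init_buggy).parameters["num_heads"].default)  # <class 'int'>
print("num_heads" in init_buggy.__annotations__)                      # False
print(init_fixed(128, 1024, 8))                                        # 128
try:
    init_buggy(128, 1024)  # omitted argument silently falls back to the int class
except TypeError as exc:
    print("TypeError:", exc)
```

With the fix, omitting num_heads fails at the call site with a "missing required argument" error instead of surfacing later as a confusing type error deep inside the computation.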

[feat]: support wan2.2 s2v, support dist infer, pose-audio

dd0acbb

gemini-code-assist Bot reviewed Jun 2, 2026


Watebear force-pushed the s2v branch 2 times, most recently from 24d4f7e to cd75cec on June 2, 2026 09:19

fix causal

39dc72f

Watebear force-pushed the s2v branch from cd75cec to 39dc72f on June 2, 2026 09:31

helloyongyang approved these changes Jun 2, 2026


helloyongyang merged commit 92b2f1b into main Jun 2, 2026
2 checks passed

helloyongyang deleted the s2v branch June 2, 2026 09:49

helloyongyang merged 2 commits into main from s2v
2 participants