ATI: Any Trajectory Instruction for Controllable Video Generation

Angtian Wang Haibin Huang Jacob Zhiyuan Fang Yiding Yang Chongyang Ma

ByteDance Intelligent Creation

🔥 Highlight

🔥 A motion transfer demo and tools have been added.
🔥 Any Trajectory Instruction (ATI) is a video generation approach that turns any trajectories you draw on an image into a realistic video.
🔥 The ATI-Wan2.1 14B model is publicly available on Hugging Face.
🔥 Thanks to Kijai for developing the ComfyUI nodes for ATI: ComfyUI-WanVideoWrapper.
🔥 Guide by Benji: ComfyUI Wan 2.1 With Any Trajectory Instruction Motion Control For AI Video.

Abstract

We present a trajectory-based motion control framework that unifies object, local and camera movements in video generation. By embedding user-defined keypoint paths into pretrained image-to-video models via a lightweight motion injector, our approach produces temporally coherent, semantically aligned motion. It excels across tasks—motion brushes, dynamic viewpoints and precise local deformations—offering superior controllability and visual quality compared to prior and commercial methods, while remaining compatible with diverse state-of-the-art backbones.

Feature-level Instruction of Trajectories

ATI enables fine-grained, feature-level instruction of trajectories. Specifically, ATI introduces a Gaussian-based motion injector that encodes trajectory signals, spanning local, object-level, and camera motion, directly into the latent space of a pretrained image-to-video diffusion model. This enables unified and continuous control over both object and camera dynamics.

ATI generates videos from flexible input trajectories, enabling free-form object motion or camera control. ATI takes an image and user-specified trajectories as inputs. The point-wise trajectories are injected into the latent condition for generation. Videos are decoded from the latents denoised by the DiT.

The Trajectory Instruction module computes a latent feature from a point’s trajectory. During inference, given the point’s location in the first frame (i.e., the input image), we sample the feature at that location using bilinear interpolation. We then compute a spatial Gaussian distribution for each visible point, centered on its corresponding location in every subsequent frame.
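The two steps described above (bilinear sampling at the first-frame location, then a spatial Gaussian around the point in each later frame) can be sketched in NumPy. This is only an illustrative sketch, not the released implementation: the function names, the `(C, H, W)` feature layout, the `sigma` value, and the choice to return the Gaussian-weighted feature as a separate tensor are all assumptions made for the example.

```python
import numpy as np

def bilinear_sample(feat, x, y):
    """Sample a (C, H, W) feature map at a continuous location (x, y)."""
    C, H, W = feat.shape
    x = float(np.clip(x, 0, W - 1))
    y = float(np.clip(y, 0, H - 1))
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
    wx, wy = x - x0, y - y0
    top = (1 - wx) * feat[:, y0, x0] + wx * feat[:, y0, x1]
    bottom = (1 - wx) * feat[:, y1, x0] + wx * feat[:, y1, x1]
    return (1 - wy) * top + wy * bottom

def gaussian_weights(H, W, cx, cy, sigma):
    """Unnormalized 2D Gaussian over an (H, W) grid, centered at (cx, cy)."""
    ys, xs = np.mgrid[0:H, 0:W]
    d2 = (xs - cx) ** 2 + (ys - cy) ** 2
    return np.exp(-d2 / (2 * sigma ** 2))

def inject_trajectory(first_frame_feat, track, visible, sigma=1.0):
    """Spread one point's first-frame feature along its trajectory.

    first_frame_feat: (C, H, W) features of the input image (assumed layout).
    track: (T, 2) sequence of (x, y) positions in feature-map coordinates.
    visible: (T,) booleans; frames where the point is hidden are skipped.
    Returns a (T, C, H, W) array; frame 0 and hidden frames stay zero.
    """
    C, H, W = first_frame_feat.shape
    T = len(track)
    # Feature of the point, taken at its first-frame location.
    point_feat = bilinear_sample(first_frame_feat, track[0][0], track[0][1])
    out = np.zeros((T, C, H, W), dtype=float)
    for t in range(1, T):  # subsequent frames only
        if not visible[t]:
            continue
        w = gaussian_weights(H, W, track[t][0], track[t][1], sigma)
        out[t] = point_feat[:, None, None] * w[None]
    return out
```

With several points, one would typically accumulate the per-point outputs before combining them with the latent condition; how that combination is done is not specified here and is left out of the sketch.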

Input Any Trajectory Instruction

We provide an interactive editor tool for creating trajectories. The tool lets users draw free-form trajectories, create camera zoom-in or zoom-out trajectories, edit existing trajectories, and apply a global camera pan.
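To make the zoom and pan options concrete, the sketch below builds such trajectories as plain point tracks. It is a hypothetical illustration, not the editor's code: the helper names `zoom_trajectories` and `apply_pan`, the `(T, N, 2)` track layout, and the linear interpolation over frames are assumptions chosen for the example.

```python
import numpy as np

def zoom_trajectories(points, center, scale_end, num_frames):
    """Move N points radially about `center` as the scale goes 1 -> scale_end.

    scale_end > 1 pushes points away from the center (as a zoom-in would);
    scale_end < 1 pulls them toward it. Returns an array of shape (T, N, 2).
    """
    points = np.asarray(points, dtype=float)
    center = np.asarray(center, dtype=float)
    scales = np.linspace(1.0, scale_end, num_frames)
    return center + (points - center)[None] * scales[:, None, None]

def apply_pan(tracks, total_shift):
    """Add a global pan to every track, ramping linearly up to `total_shift`."""
    tracks = np.asarray(tracks, dtype=float)
    frac = np.linspace(0.0, 1.0, tracks.shape[0])
    return tracks + frac[:, None, None] * np.asarray(total_shift, dtype=float)

# Two points zooming about (20, 20) over 3 frames, then panned right by 6 px.
tracks = zoom_trajectories([[10, 10], [30, 10]], (20, 20), 2.0, 3)
panned = apply_pan(tracks, (6, 0))
```

Each resulting track is a per-frame list of (x, y) positions, which matches the point-wise trajectory input described in the previous section.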

Trajectory Editor

Trajectory Editor Guideline

Left: process of user drawing trajectories. Right: generated video.

ATI WanX 14B Results

Note: each result was generated in a single run, without cherry-picking.

ATI Seaweed 7B Results

Note: each result was generated in a single run, without cherry-picking.

ATI for Video Motion Transfer

ATI can mimic a video by extracting its motion dynamics along with its first-frame image. Moreover, by leveraging powerful image-editing tools, it also enables "video-editing" capabilities.

Reference Video (for Extracting Motion)	First Frame Image	Generated Video

Acknowledgments

We sincerely acknowledge the insightful discussions with Bo Liu, Yizhi Wang, Alex, Xueqin Deng, and Linjie Yang. We greatly appreciate Liming Jiang's help with building the website.

Citation BibTeX

If you find ATI useful for your research or applications, please cite our paper:

@article{wang2025ati,
  title={{ATI}: Any Trajectory Instruction for Controllable Video Generation},
  author={Wang, Angtian and Huang, Haibin and Fang, Zhiyuan and Yang, Yiding and Ma, Chongyang},
  journal={arXiv preprint},
  volume={arXiv:2505.22944},
  year={2025}
}