Haozhe Yang, Zhiyang Dou, Zekai Gu, Cheng Lin, Wenping Wang, Yuan Liu, Taku Komura
Submitted to Advances in Neural Information Processing Systems (NeurIPS), 2026. Under review.
We introduce TKCAM, a text- and keyframe-conditioned camera trajectory generation framework based on generative masked modeling. Camera dynamics are formulated as continuous 12-DoF kinematic sequences and discretized into hierarchical motion tokens via a Residual Vector Quantizer (RVQ), enabling a two-stage masked transformer to reconstruct temporally coherent trajectories from free-form text prompts and sparse key poses. TKCAM significantly surpasses recent state-of-the-art baselines on FID, text-motion matching, and retrieval metrics (R@K).
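As a rough illustration of the RVQ discretization step mentioned above, the following sketch quantizes a single 12-dimensional vector (standing in for one frame of the 12-DoF sequence) into one token per codebook level. It is not the TKCAM implementation: the function names, the codebook sizes, and the random codebooks are all assumptions made for this example, whereas a real system would learn the codebooks from data.

```python
# Illustrative residual vector quantization (RVQ) on one 12-D vector.
# All names and sizes are hypothetical; codebooks are random, not learned.
import random


def nearest(codebook, vec):
    """Return the index of the codebook entry closest to vec (squared L2)."""
    def dist(entry):
        return sum((e - v) ** 2 for e, v in zip(entry, vec))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def rvq_encode(vec, codebooks):
    """Quantize level by level; each level encodes the previous residual."""
    residual = list(vec)
    tokens = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        tokens.append(idx)
        # Subtract the chosen entry so the next level sees what is left over.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tokens


def rvq_decode(tokens, codebooks):
    """Reconstruct by summing the selected entry from every level."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(tokens, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


rng = random.Random(0)
DIM, LEVELS, SIZE = 12, 3, 16  # assumed sizes, chosen only for the demo
codebooks = [
    [[rng.gauss(0, 1.0 / (2 ** level)) for _ in range(DIM)] for _ in range(SIZE)]
    for level in range(LEVELS)
]
pose = [rng.gauss(0, 1) for _ in range(DIM)]
tokens = rvq_encode(pose, codebooks)   # one token index per level
recon = rvq_decode(tokens, codebooks)  # approximate 12-D reconstruction
```

The token list is hierarchical in the sense that the first index gives a coarse approximation and each later index refines the remaining residual. In a setup like the one described above, such per-level tokens are what a masked transformer would predict.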
[Paper (Coming Soon)] [Code (Coming Soon)]