MotionDiffuse
Mingyuan Zhang et al. | 200M parameters

MotionDiffuse is a pioneering diffusion model developed by Mingyuan Zhang and collaborators that generates realistic 3D human motion sequences from natural-language text descriptions. Given a prompt such as 'a person walks forward and waves' or 'someone performs a backflip', the model produces corresponding 3D skeleton-based animation data with natural body dynamics and physical plausibility.

Built on a diffusion architecture with approximately 200 million parameters, MotionDiffuse generates motion probabilistically, capturing the inherent diversity of human movement: the same text input can yield multiple plausible motion variations (the first sketch below walks through this sampling idea). The model also supports both single-action and sequential multi-action generation, enabling complex motion sequences that transition smoothly between different activities.

MotionDiffuse was trained on large-scale motion capture datasets, including HumanML3D and KIT-ML, learning to map semantic descriptions to physically realistic joint rotations and translations across the full-body skeleton. The generated motion data can be exported in standard formats compatible with 3D animation software such as Blender, Maya, and Unity, making it practical for professional production workflows.

Released under the MIT license, the model is fully open source and available for both research and commercial applications. Key use cases include generating character animations for games and films, creating training data for pose estimation models, prototyping choreography, producing VR and AR avatar movements, and automating repetitive animation tasks that traditionally require skilled motion capture artists and extensive studio equipment.
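To make the probabilistic generation concrete, here is a minimal, self-contained sketch of DDPM-style ancestral sampling over a motion tensor conditioned on a text embedding. Everything in it is a hypothetical illustration of the technique, not the actual MotionDiffuse architecture or API: the `MotionDenoiser` stand-in, the 66-dimensional pose representation (22 joints with 3 coordinates each), the 512-dimensional text embedding, and the linear noise schedule are all assumptions.

```python
import torch
import torch.nn as nn


class MotionDenoiser(nn.Module):
    """Hypothetical stand-in for the text-conditioned denoising network."""

    def __init__(self, motion_dim=66, text_dim=512, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(motion_dim + text_dim + 1, hidden),
            nn.SiLU(),
            nn.Linear(hidden, motion_dim),
        )

    def forward(self, x_t, t, text_emb):
        # Concatenate noisy motion, text condition, and timestep, then predict noise.
        batch, frames, _ = x_t.shape
        t_feat = t.float().view(1, 1, 1).expand(batch, frames, 1)
        cond = text_emb.expand(batch, frames, -1)
        return self.net(torch.cat([x_t, cond, t_feat], dim=-1))


@torch.no_grad()
def sample_motion(model, text_emb, frames=120, motion_dim=66, steps=50):
    """DDPM ancestral sampling: start from Gaussian noise, denoise step by step."""
    betas = torch.linspace(1e-4, 0.02, steps)        # linear noise schedule (assumed)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    x = torch.randn(1, frames, motion_dim)           # a motion clip of pure noise
    for t in reversed(range(steps)):
        eps = model(x, torch.tensor(t), text_emb)    # predicted noise at step t
        # Posterior mean of x_{t-1} given x_t and the predicted noise.
        mean = (x - betas[t] / torch.sqrt(1.0 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
        if t > 0:                                    # add fresh noise except at the final step
            mean = mean + torch.sqrt(betas[t]) * torch.randn_like(x)
        x = mean
    return x.squeeze(0)                              # (frames, motion_dim)


denoiser = MotionDenoiser()
text_embedding = torch.randn(1, 1, 512)  # stand-in for a CLIP-style text encoding
motion = sample_motion(denoiser, text_embedding)
print(motion.shape)  # torch.Size([120, 66]): 120 frames, 22 joints x 3 coordinates
```

Because the loop starts from fresh Gaussian noise on every call, rerunning it with the same text embedding yields a different but plausible motion, which is where the diversity described above comes from.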
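For sequential multi-action generation, one simple way to picture a smooth transition is crossfading the overlap between two independently sampled clips. The blending below is an assumed illustration of the idea, not MotionDiffuse's actual interpolation scheme.

```python
import torch


def crossfade(clip_a, clip_b, overlap=15):
    """Blend the tail of clip_a into the head of clip_b over `overlap` frames."""
    assert clip_a.shape[1] == clip_b.shape[1], "motion dimensions must match"
    weights = torch.linspace(0.0, 1.0, overlap).unsqueeze(-1)  # (overlap, 1) blend ramp
    blended = (1.0 - weights) * clip_a[-overlap:] + weights * clip_b[:overlap]
    return torch.cat([clip_a[:-overlap], blended, clip_b[overlap:]], dim=0)


walk = torch.randn(120, 66)  # stand-in for a sampled 'a person walks' clip
wave = torch.randn(90, 66)   # stand-in for a sampled 'a person waves' clip
sequence = crossfade(walk, wave)
print(sequence.shape)  # torch.Size([195, 66]): 120 + 90 - 15 frames
```

A longer overlap trades clip fidelity for smoother transitions; 15 frames here is an arbitrary choice for illustration.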
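Finally, a hedged sketch of handing motion off to animation tools: dumping per-frame joint positions to JSON that an import script in Blender or another DCC tool could read. The file layout is an assumption for illustration; production pipelines would typically target BVH or FBX.

```python
import json

import torch


def export_motion(motion, path, fps=20):
    """Write per-frame joint positions as nested lists with basic metadata."""
    num_frames, dim = motion.shape
    payload = {
        "fps": fps,                                 # assumed frame rate
        "num_frames": num_frames,
        "num_joints": dim // 3,                     # (x, y, z) per joint
        "frames": motion.reshape(num_frames, -1, 3).tolist(),
    }
    with open(path, "w") as f:
        json.dump(payload, f)


export_motion(torch.randn(120, 66), "motion.json")
```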