MAGI-1 is an autoregressive video generation model created by Sand AI. It generates high-quality videos by predicting sequences of video chunks in temporal order: trained to denoise video chunks, it supports causal temporal modeling and naturally enables streaming generation. MAGI-1 excels at image-to-video (I2V) tasks, offering high temporal consistency and scalability thanks to its algorithmic innovations and dedicated infrastructure.
Model Features
MAGI-1 uses a Transformer-based variational autoencoder (VAE) with 8x spatial and 4x temporal compression, which yields fast decoding and competitive reconstruction quality. Generation proceeds chunk by chunk through an autoregressive denoising algorithm: each 24-frame chunk is denoised holistically, and denoising of the next chunk begins as soon as the current one reaches a certain noise level. This pipelined design allows up to four chunks to be processed concurrently for efficient video generation.
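The pipelined schedule can be illustrated with a short simulation. This is a minimal sketch: CHUNK_FRAMES and MAX_ACTIVE match the 24-frame chunks and four-way concurrency described above, but TOTAL_STEPS and START_THRESHOLD are illustrative values (the released step counts may differ), and the loop only mimics the admission logic, not any actual denoising.

```python
# Illustrative simulation of MAGI-1's pipelined chunk-wise denoising.
CHUNK_FRAMES = 24        # frames denoised together as one chunk
MAX_ACTIVE = 4           # up to four chunks denoised concurrently
TOTAL_STEPS = 8          # assumed denoising steps per chunk (illustrative)
START_THRESHOLD = 2      # assumed progress before the next chunk may start

def schedule(num_chunks: int):
    """Yield (step, per-chunk progress) to show how chunks overlap."""
    progress = []        # denoising steps completed for each admitted chunk
    admitted, step = 0, 0
    while admitted < num_chunks or any(p < TOTAL_STEPS for p in progress):
        active = sum(1 for p in progress if p < TOTAL_STEPS)
        # Admit the next chunk once the newest one has passed the
        # threshold and fewer than MAX_ACTIVE chunks are in flight.
        if (admitted < num_chunks and active < MAX_ACTIVE
                and (admitted == 0 or progress[-1] >= START_THRESHOLD)):
            progress.append(0)
            admitted += 1
        # One denoising step on every unfinished chunk (the real model
        # does this in parallel; it is sequential here for clarity).
        progress = [min(p + 1, TOTAL_STEPS) for p in progress]
        step += 1
        yield step, list(progress)

for step, state in schedule(num_chunks=3):
    print(f"step {step:2d}: {state}")
```

With a start threshold of one quarter of the total steps, a new chunk is admitted each time the newest one is 25% denoised, so at steady state exactly four chunks are in flight.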
MAGI-1's diffusion backbone is based on the Diffusion Transformer and incorporates several architectural innovations: Block-Causal Attention, Parallel Attention Blocks, QK-Norm, and grouped-query attention (GQA), together with sandwich normalization in the FFN, SwiGLU activations, and Softcap Modulation, all of which improve training efficiency and stability at scale. In addition, MAGI-1 uses shortcut distillation to train a single velocity-based model that supports variable inference budgets, enabling efficient inference with minimal loss of fidelity.
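To make two of these components concrete, here is a toy PyTorch sketch of QK-Norm (normalizing queries and keys before the attention dot product) and a sandwich-normalized SwiGLU FFN. Module structure, dimensions, and the exact norm placement are assumptions for illustration; this is not the MAGI-1 source code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QKNormAttention(nn.Module):
    """Self-attention with QK-Norm: Q and K are normalized per head
    before the dot product, bounding attention logit magnitudes."""
    def __init__(self, dim: int, heads: int):
        super().__init__()
        self.heads, self.head_dim = heads, dim // heads
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)
        self.q_norm = nn.LayerNorm(self.head_dim)   # QK-Norm on queries
        self.k_norm = nn.LayerNorm(self.head_dim)   # QK-Norm on keys
        self.out = nn.Linear(dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, T, dim)
        B, T, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        shape = (B, T, self.heads, self.head_dim)
        q = self.q_norm(q.view(shape)).transpose(1, 2)    # (B, H, T, hd)
        k = self.k_norm(k.view(shape)).transpose(1, 2)
        v = v.view(shape).transpose(1, 2)
        o = F.scaled_dot_product_attention(q, k, v)
        return self.out(o.transpose(1, 2).reshape(B, T, -1))

class SandwichSwiGLU(nn.Module):
    """FFN with SwiGLU activation and sandwich normalization:
    a norm both before and after the FFN body."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.pre = nn.LayerNorm(dim)
        self.w_gate = nn.Linear(dim, hidden, bias=False)
        self.w_up = nn.Linear(dim, hidden, bias=False)
        self.w_down = nn.Linear(hidden, dim, bias=False)
        self.post = nn.LayerNorm(dim)                # second slice of bread

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.pre(x)
        h = self.w_down(F.silu(self.w_gate(h)) * self.w_up(h))  # SwiGLU
        return x + self.post(h)                      # residual connection
```

Both tricks target the same failure mode: normalizing Q and K keeps attention logits bounded, while the extra post-norm in the sandwich keeps residual-stream activations from growing with depth, which is why such components often appear together in stability-focused large-scale training recipes.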
Model Variants
MAGI-1 offers pre-trained weights for both a 24B and a 4.5B model, along with corresponding distilled and distilled+quantized variants. The 24B model targets high-fidelity video generation, while the 4.5B model suits resource-constrained environments; the distilled and quantized variants offer faster inference.
Evaluation Results
In human evaluations, MAGI-1 outperforms open-source models such as Wan-2.1 and HunyuanVideo, as well as the commercial Hailuo, in instruction following and motion quality, making it a strong competitor to closed-source commercial models. In physics-focused evaluations, MAGI-1 predicts physical behavior through video continuation with markedly higher precision, significantly outperforming existing models.
Applications
MAGI-1 is designed for a range of applications, including content creation, game development, film post-production, and education. Its 'Infinite Video Expansion' feature allows seamless continuation of video content; combined with second-level control of the timeline (prompts can change roughly every second), it enables smooth scene transitions and fine-grained editing through chunk-by-chunk prompting, meeting the needs of film production and storytelling.
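A small sketch shows how chunk-by-chunk prompting yields second-level control: at 24 fps, each 24-frame chunk spans roughly one second, so a prompt schedule keyed in seconds maps directly onto chunk indices. The schedule format and generate_chunk() below are hypothetical placeholders, not the MAGI-1 API.

```python
FPS = 24
CHUNK_FRAMES = 24
SECONDS_PER_CHUNK = CHUNK_FRAMES / FPS   # = 1.0 second per chunk

# Hypothetical schedule: (start time in seconds, prompt for that span).
prompt_schedule = [
    (0, "a sailboat drifts across a calm bay at sunrise"),
    (4, "the wind picks up and waves begin to swell"),
    (8, "the boat rides out the storm under dark clouds"),
]

def prompt_for_chunk(chunk_idx: int) -> str:
    """Return the most recently scheduled prompt at this chunk's start."""
    t = chunk_idx * SECONDS_PER_CHUNK
    return [p for start, p in prompt_schedule if start <= t][-1]

for chunk_idx in range(12):
    prompt = prompt_for_chunk(chunk_idx)
    # chunk = generate_chunk(previous_chunks, prompt)  # hypothetical call
    print(f"chunk {chunk_idx:2d} (~{chunk_idx}s): {prompt}")
```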
Running MAGI-1
MAGI-1 can be run with Docker (recommended for ease of setup) or directly from the source code. Input and output settings are controlled by editing the parameters in the provided run.sh scripts. The model is released under the Apache License 2.0.
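As a rough orientation, the snippet below shells out to one of the provided scripts from Python; the path example/24B/run.sh and the idea of editing its parameters beforehand reflect the description above, but the exact script location and contents should be taken from the release itself.

```python
import subprocess

# Assumed script path; edit the prompt/input/output parameters inside
# run.sh before launching, as described above.
subprocess.run(["bash", "example/24B/run.sh"], check=True)
```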
MAGI-1 represents a significant advance in video generation, offering high-quality, scalable, and efficient synthesis; its innovative design and strong evaluation results make it a valuable tool across a wide range of applications.