Commit

Cosmos text to video example.
comfyanonymous committed Jan 17, 2025
1 parent be1ba46 commit 75b660e
Showing 4 changed files with 706 additions and 0 deletions.
2 changes: 2 additions & 0 deletions README.md
@@ -62,6 +62,8 @@ Here are some more advanced examples:

[Hunyuan Video](hunyuan_video)

[Nvidia Cosmos](cosmos)

[Audio Models](audio)


41 changes: 41 additions & 0 deletions cosmos/README.md
@@ -0,0 +1,41 @@
# Nvidia Cosmos Models

[Nvidia Cosmos](https://www.nvidia.com/en-us/ai/cosmos/) is a family of "World Models". ComfyUI currently supports the 7B and 14B text to video diffusion models and the 7B and 14B image to video diffusion models.

## Files to Download

You will first need:

#### Text encoder and VAE:

[oldt5_xxl_fp8_e4m3fn_scaled.safetensors](https://huggingface.co/comfyanonymous/cosmos_1.0_text_encoder_and_VAE_ComfyUI/tree/main/text_encoders) goes in: ComfyUI/models/text_encoders/

[cosmos_cv8x8x8_1.0.safetensors](https://huggingface.co/comfyanonymous/cosmos_1.0_text_encoder_and_VAE_ComfyUI/blob/main/vae/cosmos_cv8x8x8_1.0.safetensors) goes in: ComfyUI/models/vae/

Note: oldt5_xxl is not the same as the t5xxl used in Flux and other models.
oldt5_xxl is t5xxl 1.0, while the one used in Flux and others is t5xxl 1.1.

#### Video Models

The video models can be found [in safetensors format here.](https://huggingface.co/mcmonkey/cosmos-1.0/tree/main)

The workflows on this page use [Cosmos-1_0-Diffusion-7B-Text2World.safetensors](https://huggingface.co/mcmonkey/cosmos-1.0/blob/main/Cosmos-1_0-Diffusion-7B-Text2World.safetensors) and [Cosmos-1_0-Diffusion-7B-Video2World.safetensors](https://huggingface.co/mcmonkey/cosmos-1.0/blob/main/Cosmos-1_0-Diffusion-7B-Video2World.safetensors).

These files go in: ComfyUI/models/diffusion_models

Note: "Text to World" means text to video and "Video to World" means image/video to video.

If you want the original diffusion models in .pt format instead of the repacked safetensors, the official links are: [7B-Text2World](https://huggingface.co/nvidia/Cosmos-1.0-Diffusion-7B-Text2World), [7B-Video2World](https://huggingface.co/nvidia/Cosmos-1.0-Diffusion-7B-Video2World), [14B-Text2World](https://huggingface.co/nvidia/Cosmos-1.0-Diffusion-14B-Text2World), and [14B-Video2World](https://huggingface.co/nvidia/Cosmos-1.0-Diffusion-14B-Video2World).
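
To double-check the file placement described above, here is a minimal sketch that reports which of the files used by the workflows on this page are not yet in place. It is not part of ComfyUI: the `find_missing_files` helper and the `EXPECTED_FILES` table are hypothetical names introduced for illustration, and the `"ComfyUI"` root path is an assumption about where your install lives.

```python
from pathlib import Path

# Folder names and file names are taken from the download instructions above.
# Only the 7B models used by the workflows on this page are listed.
EXPECTED_FILES = {
    "text_encoders": ["oldt5_xxl_fp8_e4m3fn_scaled.safetensors"],
    "vae": ["cosmos_cv8x8x8_1.0.safetensors"],
    "diffusion_models": [
        "Cosmos-1_0-Diffusion-7B-Text2World.safetensors",
        "Cosmos-1_0-Diffusion-7B-Video2World.safetensors",
    ],
}


def find_missing_files(comfyui_root):
    """Return the expected model paths that do not exist under <comfyui_root>/models."""
    models_dir = Path(comfyui_root) / "models"
    missing = []
    for subdir, names in EXPECTED_FILES.items():
        for name in names:
            path = models_dir / subdir / name
            if not path.is_file():
                missing.append(path)
    return missing


if __name__ == "__main__":
    # "ComfyUI" is an assumed install location; adjust it to your setup.
    for path in find_missing_files("ComfyUI"):
        print(f"missing: {path}")
```

If the script prints nothing, all four files were found where the instructions say they should go.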

## Workflows

### Text to Video

This workflow requires the 7B text to video model that you can download above.

![Example](text_to_video_cosmos_7B.webp)

[Workflow in JSON format](text_to_video_cosmos_7B.json)

### Image to Video
