Data prepare

Data preparation aligns with T2V section.

Training

Training on GPUs:

bash scripts/text_condition/gpu/train_inpaint_v1_3.sh

Training on NPUs:

bash scripts/text_condition/npu/train_inpaint_v1_3.sh

There are additional parameters you need to understand beyond those introduced in the T2V section.

Argparse	Usage
`--default_text_ratio` 0.5	During I2V training, a portion of the text is replaced with a default text to account for cases where the user provides an image without accompanying text.
`--mask_config`	The path of the `mask_config` file.
`--add_noise_to_condition`	Adding a small amount of noise to conditional frames during training to improve generalization.

In Open-Sora Plan V1.3, all mask ratio settings are specified in the mask_config file, located at scripts/train_configs/mask_config.yaml. The parameters include:

Argparse	Usage
`min_clear_ratio`	The minimum ratio of frames retained during continuation and random masking.
`max_clear_ratio`	The maximum ratio of frames retained during continuation and random masking.
`mask_type_ratio_dict_video`	During training, specify the ratio for each mask task. For video data, there are six mask types: `t2iv`, `i2v`, `transition`, `continuation`, `clear`, and `random_temporal`. These inputs will be normalized to ensure their sum equals one.
`mask_type_ratio_dict_image`	During training, specify the ratio for each mask task. For image data, there are two mask types: `t2iv` and `clear`. These inputs will be normalized to ensure their sum equals one.

Inference

Inference on GPUs:

bash scripts/text_condition/gpu/sample_inpaint_v1_3.sh

Inference on NPUs:

bash scripts/text_condition/npu/sample_inpaint_v1_3.sh

In the current version, we have only open-sourced the 93x480p version of the Image-to-Video (I2V) model. We recommend configuration --guidance_scale 7.5 --num_sampling_steps 100 --sample_method EulerAncestralDiscrete for sampling.

Inference on 93×480p, the speed on H100 and Ascend 910B.

Size	1 H100	1 Ascend 910B
93×480p	150s/100step	292s/100step

During inference, you can specify --nproc_per_node and set the --sp parameter to choose between single-gpu/npu mode, DDP (Distributed Data Parallel) mode, or SP (Sequential Parallel) mode for inference.

The following are key parameters required for inference:

Argparse	Usage
`--height` 352 `--width` 640 `--crop_for_hw`	When `crop_for_hw` is specified, the I2V model operates in fixed-resolution mode, generating outputs at the user-specified height and width.
`--max_hxw` 236544	When `crop_for_hw` is not specified, the I2V model operates in arbitrary resolution mode, resizing outputs to the greatest common divisor of the resolutions in the input image list. In this case, the `--max_hxw` parameter must be provided, with a default value of 236544.
`--text_prompt`	The path to the `prompt` file, where each line represents a prompt. Each line must correspond precisely to each line in `--conditional_pixel_values_path`.
`--conditional_pixel_values_path`	The input path for control information can contain one or multiple images or videos, with each line controlling the generation of one video. It must correspond precisely to each prompt in `--text_prompt`.
`--mask_type`	Specify the mask type used for the current inference; available types are listed in the `MaskType` class in `opensora/utils/mask_utils.py`, which are six mask types: `t2iv`, `i2v`, `transition`, `continuation`, `clear`, and `random_temporal`. This parameter can be omitted when performing I2V and Transition tasks.
`--noise_strength`	The noise strength added to conditional frames, which defaults to 0 (no noise added).

Before inference, you need to create two text files: one named prompt.txt and another named conditional_pixel_values_path. Each line of text in prompt.txt should correspond to the paths on each line in conditional_pixel_values_path.

For example, if the content of prompt.txt is:

this is a prompt of i2v task.
this is a prompt of transition task.

Then the content of conditional_pixel_values_path should be:

/path/to/image_0.png
/path/to/image_1_0.png,/path/to/image_1_1.png

This means we will execute a image-to-video task using /path/to/image_0.png and "this is a prompt of i2v task." For the transition task, we'll use /path/to/image_1_0.png and /path/to/image_1_1.png (note that these two paths are separated by a comma without any spaces) along with "this is a prompt of transition task."

After creating the files, make sure to specify their paths in the sample_inpaint_v1_3.sh script.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

I2V.md

I2V.md

Data prepare

Training

Inference

Files

I2V.md

Latest commit

History

I2V.md

File metadata and controls

Data prepare

Training

Inference