Video Training

Preprocessing Videos

Selection of Training Videos

  • Use videos with consistent content, actions, or visual effects, but different main subjects.

  • Prioritize using videos; images can be used as supplementary data.

  • Videos must be high-resolution and watermark-free.

Number of Videos

  • 4 to 10 videos are sufficient. (Image-only training is not recommended.)

Frame Rate

  • Convert videos to 16fps, with a total of 81 frames (i.e., 5 seconds in duration).

  • You can use video editing tools to trim clips to 5 seconds and then extract frames at 16fps (a scripted preprocessing sketch follows the Resolution notes below).

  • Shorter videos (e.g., 2s or 3s) are also acceptable, but they must be processed to 16fps.

Resolution

  • 480p works well. You can also reduce it to 320p to speed up training.

  • (Training will likely fail if the resolution is too high.)
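
The frame-rate and resolution steps above can also be scripted. A minimal sketch, assuming ffmpeg is installed and on the PATH; the file names and the preprocess helper are placeholders, not part of any training tool:

```python
import subprocess

def preprocess(src: str, dst: str, fps: int = 16, max_frames: int = 81, height: int = 480) -> None:
    """Resample to 16fps, keep at most 81 frames (about 5 seconds), downscale to 480p, drop audio."""
    subprocess.run(
        [
            "ffmpeg", "-y",
            "-i", src,
            "-vf", f"fps={fps},scale=-2:{height}",  # resample frame rate, then scale height to 480
            "-frames:v", str(max_frames),           # stop after 81 output frames (~5 s at 16fps)
            "-an",                                  # the audio track is not needed for training
            dst,
        ],
        check=True,
    )

preprocess("clip.mp4", "clip_16fps_480p.mp4")
```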

Video Tagging

Automatic Tagging

Manual Tagging

Key Points: Secondary Features + Main Features

Main Features: Actions/effects to be learned; Secondary Features: Characters in the video, where they are, what they're doing.

Example: In the video, a woman wearing a black formal suit is presented. The person raises her hand and showers colorful confetti in celebration with a smile. The person then reveals a bikini, causing a b1k1n1 bikini up effect. The person continues celebrating, further showing the b1k1n1 bikini up effect.

The part before the trigger phrase describes the video content, and the trigger phrase "b1k1n1 bikini up effect" summarizes the action effect being learned (i.e., the trigger phrase is the main feature; the rest are secondary features).


Online Training

Video Model Introduction

Hunyuan Video

Text-to-video: hunyuanvideo-fp8

Wan Video

Text-to-video: Wan2.1-14B

Image-to-video: Wan2.1-14B-480P, Wan2.1-14B-720P

Difference between text-to-video and image-to-video: when setting the model effect preview prompt during parameter adjustment, text-to-video only needs text similar to the training-set tags to generate a preview.

Image-to-video requires an input image and a corresponding prompt to generate a preview.

Online Parameter Settings

Image-to-video

Image-to-video: Wan2.1-14B-480P, Wan2.1-14B-720P (mainly selected based on training video resolution).

For training material at 216*320 (below 480p), choose the 480p model (there is little difference in the final training effect between 720p and 480p, so 480p is recommended).

| Resolution | Specific Size | Total Pixels |
| --- | --- | --- |
| 480p | 854*480 | About 410,000 |
| 720p | 1280*720 | About 920,000 |
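
The "Total Pixels" column is just the product of the two dimensions; a quick check in Python:

```python
# Quick arithmetic check of the "Total Pixels" column above.
sizes = {"480p": (854, 480), "720p": (1280, 720)}
for name, (w, h) in sizes.items():
    print(f"{name}: {w}*{h} = {w * h:,} pixels")
# 480p: 854*480 = 409,920 pixels  (about 410,000)
# 720p: 1280*720 = 921,600 pixels (about 920,000)
```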

Complete Dataset Upload

Parameter Settings

Frames to Extract: Number of frames to extract from a single video segment.

Example: For each segment at 16fps, setting Frames to Extract to 9 means only 9 frames are sampled from the segment, so not every frame is learned.

Number of Slices: Dividing each video material.

Example: For a 5-second video at 16fps, setting Number of Slices to 5 means each segment is 16 frames; if set to 4, each segment is 20 frames (see the worked sketch after this parameter list).

Times per Image: The number of times each video is learned.

Cycles: Number of cycles based on Times per Image.

Model Effect Preview Prompt: The prompt used to generate the example video (modify it based on the dataset tags, combined with the content of the initial frame image).

Initial Frame: For image-to-video, the image required to generate the example video.
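
A worked sketch of the Frames to Extract and Number of Slices examples above; the only thing the code does is the arithmetic already described:

```python
# Worked example of "Number of Slices" and "Frames to Extract" from the text above.
fps = 16
duration_s = 5
total_frames = fps * duration_s              # 80 frames in a 5-second clip

for num_slices in (5, 4):
    frames_per_slice = total_frames // num_slices
    print(f"{num_slices} slices -> {frames_per_slice} frames per segment")
# 5 slices -> 16 frames per segment
# 4 slices -> 20 frames per segment

frames_to_extract = 9                        # with 9, only 9 frames of each segment are learned
```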

Advanced Parameter Settings

The only setting that needs to be modified is Flow Shift.

Use 5 for 720p and 3 for 480p (the training materials must also be 480p).

Text-to-Video Parameters

Text-to-video parameters are consistent with image-to-video parameters; Flow Shift keeps its default value.

Model Selection

Choose the saved model whose real-time sample previews best match the effects or actions shown in the training-set videos.

Model Testing

Image-to-Video Testing

kijai Workflow: kj wan testing.json

AI App Testing: SeaArt AI | kj wan testing

Parameter Settings

Model Selection: The training model should match the testing model.

Select LoRA: Choose saved LoRA from your models.

Weight: LoRA weight.

Width: The size after the input image is compressed and cropped.

Height: The size after the input image is compressed and cropped.

Frames: Total frame count of the output; it must have the form 4*n+1, and at 16fps a 5-second clip is 81 frames (see the sketch after this parameter list).

Shift: 5 for 720p, 3 for 480p.

CFG: Default cfg is 6, can be adjusted to 5.
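
A small sketch of the frame-count rule above: valid frame counts have the form 4*n+1, and, as in the example, 81 frames at 16fps is about 5 seconds. The helper name is illustrative, not an API:

```python
def frames_for_seconds(seconds: float, fps: int = 16) -> int:
    """Return the nearest valid frame count (form 4*n + 1) for the requested duration."""
    raw = round(seconds * fps)
    n = round((raw - 1) / 4)
    return 4 * n + 1

print(frames_for_seconds(5))  # 81 frames -> about 5 seconds at 16fps
print(frames_for_seconds(3))  # 49 frames -> about 3 seconds at 16fps
```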

Official Workflow: wan official workflow.json

AI App Testing: SeaArt AI | wan official workflow

Default cfg is 6, can be adjusted to 5.

Sampler: uni-pc, with scheduler normal or simple.

Alternatively, sampler dpmpp_2m with scheduler sgm_uniform.

Note: Other parameters are consistent with the kj parameter settings.

Text-to-Video Testing

Wan Creation Flow Testing

Model: Select wan2.1.

Additional: Select saved trained model.

Select Text to Video.

Hunyuan Creation Flow Testing

Model: Hunyuan Video.

Additional: Select saved trained model.

Select Text to Video.
