Motion Control
Motion Control enables precise control of a character's movements and facial expressions based on a reference image. When generating video from an image, you can assign motion to one character in the image. The motion can be extracted from an uploaded video or selected directly from the motion library. The result is a character video with accurately controlled actions and expressions.
Kling AI
Mar 5, 2026
7 min read

Kling VIDEO 3.0 Motion Control

The newly released Kling VIDEO 3.0 Motion Control builds upon the Motion Control introduced in VIDEO 2.6, delivering key capability upgrades. VIDEO 3.0 Motion Control enhances facial consistency across scenarios, ensuring stable facial features and smooth expressions even in complex, multi-angle, long-duration motions.

This upgrade expands Motion Control into cinematic performance, high-precision motion capture, and diverse entertainment scenarios, delivering more powerful and reliable video generation.

Showcases

1) Consistent Facial Identity from Any Angle

Reference Image | Element | Output with Element Binding | Output without Element Binding

@Soccer Boy: [video comparison]
@Martial Arts Girl: [video comparison]

2) Complex Emotions, Faithfully Reproduced

Reference Image | Element | Output with Element Binding | Output without Element Binding

@Fresh Girl: [video comparison]
@Elegant Woman: [video comparison]

3) Face Occlusion, High-Fidelity Restoration

Reference Image | Element | Output with Element Binding | Output without Element Binding

[video comparisons]

4) Consistent Facial Clarity Across Dynamic Framing

Reference Image | Element | Output with Element Binding | Output without Element Binding

[video comparisons]

How to Use Kling VIDEO 3.0 Motion Control

WEB / APP

  1. Upload the reference action video you want to imitate and the character image.
  2. Click "Bind Facial Element to Enhance Facial Consistency" below the character image. Bind an existing element or create a new one to generate an action video with a consistent facial identity.
  3. To create an element quickly, upload a set of images or upload/record a short video; this enriches the element with multi-angle, multi-expression facial information.
  • Element Binding is supported only when the character's orientation matches the video orientation.

How to Achieve the Desired Outputs

  1. The Motion Control Element Library uses only facial information for reference; it does not include clothing, hairstyle, makeup, or props. We therefore recommend uploading clear facial close-ups to provide sufficient facial data.
  2. Whether you upload images or videos, follow this core principle: upload facial references that match the result you want to generate.
    1. Head Turn Accuracy. For more accurate head turns, upload a front-facing view plus side views (left and/or right).
    2. Facial Expression Accuracy. To better match facial expressions (such as smiling), upload a neutral front-facing image plus a smiling front-facing image.
    3. 360° Smiling Rotation. For a seamless 360° smiling rotation, upload a front-facing smile, left-profile smile, right-profile smile, upward-facing smile, and downward-facing smile.
    4. Complex Emotional Transitions with Head Movement. For complex emotional changes (e.g. happy to sad) combined with head turns, upload a front-facing image, a smiling expression, a sad expression, and side views (left or right).
    5. If you need complex facial expressions while maintaining high identity accuracy, we strongly recommend uploading a video, which provides richer and more continuous facial information.
  3. Edge Cases
    1. The first frame in Motion Control may contain multiple people, but only one element is supported. The system selects the person with the largest on-screen presence as the element; if the candidates occupy similar portions of the frame, no element is selected.
    2. If the element's face differs significantly from the face in the first frame (for example, using a cat's face as a reference for a human), there is a small chance that facial quality may degrade.
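The selection rule in edge case 1 can be sketched in a few lines. This is an illustrative approximation only: the `select_element` helper and the 10% similarity threshold are assumptions for the sake of the example, not documented behavior.

```python
# Illustrative sketch of the documented selection rule: among multiple detected
# people, bind the one occupying the largest portion of the frame; if the largest
# candidates occupy similar portions, bind no one. The 10% threshold is an
# assumption, not a documented value.

def select_element(area_ratios, similarity=0.10):
    """area_ratios[i] is the fraction of the frame occupied by person i.
    Returns the index of the selected person, or None if no element is selected."""
    if not area_ratios:
        return None
    ranked = sorted(range(len(area_ratios)), key=lambda i: area_ratios[i], reverse=True)
    if len(ranked) > 1:
        largest = area_ratios[ranked[0]]
        runner_up = area_ratios[ranked[1]]
        if largest - runner_up <= similarity * largest:
            return None  # candidates too similar in size: no element selected
    return ranked[0]

print(select_element([0.40, 0.15]))  # 0: the clearly largest person is bound
print(select_element([0.30, 0.29]))  # None: similar sizes, no element selected
```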

Model Pricing

Model | Mode | Credit Usage
Kling VIDEO 3.0 Motion Control | Professional | 12 Credits/s
Kling VIDEO 3.0 Motion Control | Standard | 9 Credits/s

Pricing Principle
  1. Pricing is per second, with the video duration rounded to the nearest whole second.
  2. For example, a 3.4s Standard video is billed as 3s x 9 Credits/s = 27 Credits.
  3. A 3.6s Standard video is billed as 4s x 9 Credits/s = 36 Credits.
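The rounding rule above can be expressed as a small calculation. A minimal sketch, assuming standard round-to-nearest behavior; `credit_cost` is an illustrative helper, not part of any Kling API:

```python
# Minimal sketch of the per-second pricing rule: round the duration to the
# nearest whole second, then multiply by the mode's per-second rate.
# credit_cost is an illustrative helper; the rates come from the pricing tables.

RATES = {
    ("3.0", "Professional"): 12,
    ("3.0", "Standard"): 9,
    ("2.6", "Professional"): 8,
    ("2.6", "Standard"): 5,
}

def credit_cost(duration_s, version="3.0", mode="Standard"):
    billed_seconds = round(duration_s)  # 3.4s -> 3s, 3.6s -> 4s
    return billed_seconds * RATES[(version, mode)]

print(credit_cost(3.4))         # 3 x 9 = 27 Credits
print(credit_cost(3.6))         # 4 x 9 = 36 Credits
print(credit_cost(3.4, "2.6"))  # 3 x 5 = 15 Credits
```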

Kling VIDEO 2.6 Motion Control 

Showcases

1) Perfectly Synchronized Full-Body Motions

Motion Reference | Image Reference | Output

[video examples]

2) Masterful Performance of Complex Motions

Motion Reference | Image Reference | Output

[video examples]

3) Precision in Hand Performances

Motion Reference | Image Reference | Output

[video examples]

4) 30s One-Shot Action

Motion Reference | Image Reference | Output

[video examples]

5) Scene Details at Your Command

Motion Reference | Image Reference | Output

Prompt: A girl wearing a white tank top and a denim skirt. [video]
Prompt: A corgi runs in, circling around a girl's feet. [video]

How to Use Kling VIDEO 2.6 Motion Control

WEB / APP

  1. Add a video of the character actions to mimic; you can upload a video from local files or choose one from the Motion Library.
  2. Add the character image, ensuring the character's proportions match those in the video for optimal results.
  3. By default, the video is generated with "Character Orientation Matches Video". You can instead select "Character Orientation Matches Image", which supports camera movement.
  4. Enter a prompt to control the background and other details, and the motion-controlled video will be generated.

How to Achieve the Desired Outputs

1. Match the full-body/half-body framing of the character in the image reference with that of the motion reference.

2. Use a motion reference that features a wide range of motion, moderate speed, and minimal displacement.

3. For large-motion references, ensure there is enough space in the image reference for the character to move freely.

 

Image Reference

Half-Body | Full-Body [image examples]

 

Motion Reference

Half-Body | Full-Body

[video examples]
  1. Ensure the character's entire body and head are clearly visible and not obstructed.
  2. Upload a single-character motion reference. For a motion reference with two or more characters, the motion of the character occupying the largest portion of the frame will be used for generation.
  3. Real human motion is recommended; some stylized characters with humanoid body proportions can also be recognized.
  4. The motion reference must be a single continuous shot, with the character consistently visible in the frame. Avoid cuts, shot changes, and camera movements; otherwise, the video may be truncated.
  5. Avoid overly fast motions; steady, moderate movements yield the best results.
  6. The short edge must be at least 340px, and the long edge must not exceed 3850px.
  7. The supported duration of the uploaded motion reference is 3 to 30 seconds, and the generated video length will match the uploaded video's duration. If the motions are complex or fast-paced, the output may be shorter than the upload, because the model extracts only the valid, continuous action segments for generation; a minimum of 3 seconds of usable continuous motion must be extracted for the video to be generated. Please note that in such cases the consumed Credits cannot be refunded, so we recommend adjusting the complexity and speed of the actions accordingly.
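The size and duration limits in items 6 and 7 can be checked before uploading. A minimal sketch; `check_motion_reference` is a hypothetical helper for illustration, not part of any Kling tool or API:

```python
# Sketch of a pre-upload check for the motion-reference limits listed above:
# short edge >= 340 px, long edge <= 3850 px, duration between 3 and 30 seconds.
# check_motion_reference is a hypothetical helper, not an official API.

def check_motion_reference(width, height, duration_s):
    """Returns a list of constraint violations; an empty list means acceptable."""
    problems = []
    if min(width, height) < 340:
        problems.append("short edge must be at least 340 px")
    if max(width, height) > 3850:
        problems.append("long edge must not exceed 3850 px")
    if not 3 <= duration_s <= 30:
        problems.append("duration must be between 3 and 30 seconds")
    return problems

print(check_motion_reference(1920, 1080, 12))  # [] -> acceptable
print(check_motion_reference(4096, 300, 45))   # all three limits violated
```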

 

Character Orientation

1. By default, the video is generated with "Character Orientation Matches Video": the character's movements, expressions, camera movements, and orientation all follow the motion reference. Other details can be controlled via prompts.

2. When you choose "Character Orientation Matches Image", the character's movements and expressions follow the motion reference, while the orientation aligns with the character's orientation in the reference image. Camera movements and other elements can be customized through prompts.

"Character Orientation Matches Image" Camera Movement Showcase

Zoom In | Zoom Out | Camera Up | Camera Down | Fixed Position

[video examples]

Model Pricing

Model | Mode | Credit Usage
Kling VIDEO 2.6 Motion Control | Professional | 8 Credits/s
Kling VIDEO 2.6 Motion Control | Standard | 5 Credits/s

Pricing Principle
  1. Pricing is per second, with the video duration rounded to the nearest whole second.
  2. For example, a 3.4s Standard video is billed as 3s x 5 Credits/s = 15 Credits.
  3. A 3.6s Standard video is billed as 4s x 5 Credits/s = 20 Credits.