POST /v2/videos/generations
Method: POST
Endpoint: /v2/videos/generations
The Tongyi Wanxiang text-to-video model generates a smooth video from a text prompt. Supported capabilities include:
Core capabilities: flexible durations (5s/10s), specified video resolution (480P/720P/1080P), smart prompt rewriting, and watermark support.
Audio capabilities: supports automatic dubbing or a custom audio file for audio-video synchronization. Available only on wan2.5.
Request Parameters
Header Parameters
Authorization
string
Optional
Default Value:
Bearer {{YOUR_API_KEY}}
Body Parameters application/json
prompt
string
Required
The text prompt. Supports Chinese and English, with a maximum length of 800 characters. Each Chinese character or letter counts as one character. Content beyond this limit is truncated.
Example: A kitten running in the moonlight.
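Because over-length prompts are truncated rather than rejected, it can help to check the length before sending. The following is a minimal sketch, not part of the API: `fit_prompt` is a hypothetical helper, and it assumes the service counts characters the way Python's `len` counts code points, which the text above does not confirm for punctuation or emoji.

```python
MAX_PROMPT_CHARS = 800  # limit stated for the prompt parameter


def fit_prompt(prompt: str, limit: int = MAX_PROMPT_CHARS) -> tuple[str, bool]:
    """Return (text, was_truncated), counting each character once.

    Hypothetical client-side helper; the server performs its own truncation.
    """
    if len(prompt) <= limit:
        return prompt, False
    # Keep only the first `limit` characters, mirroring the documented behavior.
    return prompt[:limit], True
```

Calling `fit_prompt(text)` before building the request lets a client warn the user when the second element is `True`, instead of silently losing the tail of the prompt.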
model
enum<string>
Required
Model name. Example: wan2.5-t2v-preview.
Value:
wan2.5-t2v-preview
duration
enum<integer>
Optional
Duration of the generated video in seconds. Supported values are 5 and 10.
Enum Values:
5
10
audio_url
string
Optional
Supported only by wan2.5-t2v-preview. Audio file URL used by the model to generate the video. See audio settings for usage.
Supports HTTP or HTTPS. Local files can be uploaded first to obtain a temporary URL.
Audio limits:
Formats: wav and mp3.
Duration: 3 to 30 seconds.
File size: up to 15 MB.
Over-limit handling: if the audio is longer than the duration value of 5s or 10s, only the first 5s or 10s are kept and the rest is discarded. If the audio is shorter than the video duration, the remaining part of the video is silent. For example, if the audio is 3s and the video is 5s, the output has sound for the first 3s and is silent for the last 2s.
Example Value:
https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250923/hbiayh/%E4%BB%8E%E5%86%9B%E8%A1%8C.mp3
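The over-limit handling described above reduces to a small calculation. The sketch below is illustrative only: `audio_coverage` is a hypothetical helper name, and it covers just the length limits and the trim-or-pad rule, not file format or size checks.

```python
def audio_coverage(audio_seconds: float, video_seconds: int) -> tuple[float, float]:
    """Return (audible_seconds, silent_seconds) for the output video.

    Hypothetical helper that mirrors the documented rules:
    audio longer than the video is cut, shorter audio leaves a silent tail.
    """
    if not 3 <= audio_seconds <= 30:
        raise ValueError("audio must be 3 to 30 seconds long")
    if video_seconds not in (5, 10):
        raise ValueError("duration must be 5 or 10")
    audible = min(audio_seconds, video_seconds)
    return audible, video_seconds - audible
```

For the 3-second audio and 5-second video case mentioned above, `audio_coverage(3, 5)` gives 3 audible seconds followed by 2 silent seconds.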
audio
string
Optional
Supported only by wan2.5-t2v-preview. Specifies whether to add audio to the video automatically. audio_url takes priority over audio: this parameter takes effect only when audio_url is empty.
true: default value, automatically adds audio to the video.
false: does not add audio.
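The priority rule between `audio_url` and `audio` can be sketched as a small decision function. This is an illustration, not the service's implementation: `effective_audio_mode` and its return labels are hypothetical, and treating `audio=False` as silent output is an assumption.

```python
from typing import Optional


def effective_audio_mode(audio_url: Optional[str] = None, audio: bool = True) -> str:
    """Return which audio behavior a request would get.

    Hypothetical helper: audio_url wins whenever it is non-empty;
    otherwise the audio flag decides.
    """
    if audio_url:
        return "custom"  # the supplied audio file is used
    # Assumption: audio=False yields a video without sound.
    return "auto" if audio else "silent"
```

With this rule, a request that sets both `audio_url` and `audio: false` still uses the supplied audio file.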
size
string
Optional
480P tier. Available video resolutions and their aspect ratios:
832*480: 16:9
480*832: 9:16
624*624: 1:1
720P tier. Available video resolutions and their aspect ratios:
1280*720: 16:9
720*1280: 9:16
960*960: 1:1
1088*832: 4:3
832*1088: 3:4
1080P tier. Available video resolutions and their aspect ratios:
1920*1080: 16:9
1080*1920: 9:16
1440*1440: 1:1
1632*1248: 4:3
1248*1632: 3:4
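Since `size` is a plain string, a typo such as `1920x1080` is easy to make. A lookup table built from the values listed above lets a client catch that before sending. This is a minimal sketch; `SIZES` and `describe_size` are hypothetical names, and the table only reflects the resolutions on this page.

```python
# Resolution tiers and aspect ratios, copied from the size parameter list.
SIZES = {
    "480P": {"832*480": "16:9", "480*832": "9:16", "624*624": "1:1"},
    "720P": {
        "1280*720": "16:9", "720*1280": "9:16", "960*960": "1:1",
        "1088*832": "4:3", "832*1088": "3:4",
    },
    "1080P": {
        "1920*1080": "16:9", "1080*1920": "9:16", "1440*1440": "1:1",
        "1632*1248": "4:3", "1248*1632": "3:4",
    },
}


def describe_size(size: str) -> tuple[str, str]:
    """Return (tier, aspect_ratio) for a listed size, or raise ValueError."""
    for tier, sizes in SIZES.items():
        if size in sizes:
            return tier, sizes[size]
    raise ValueError(f"unsupported size: {size!r}")
```

For example, `describe_size("832*480")` identifies the value used in the request example as a 480P, 16:9 resolution.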
watermark
boolean
Optional
Specifies whether to add a watermark. The watermark appears in the lower-right corner and reads "Generated by AI".
template
string
Optional
negative_prompt
string
Optional
The negative prompt describes content that you do not want to appear in the video, which helps constrain the result.
It supports Chinese and English, with a maximum length of 500 characters. Content beyond this limit is truncated.
Example: low resolution, errors, worst quality, low quality, defects, extra fingers, bad proportions.
prompt_extend
boolean
Optional
Specifies whether prompt rewriting is enabled. When enabled, a large language model (LLM) intelligently rewrites the input prompt. This significantly improves results for shorter prompts but increases processing time.
seed
integer
Optional
A random seed used to control the randomness of the generated content. The value must be in the range [0, 2147483647].
If this parameter is not provided, the algorithm automatically generates a random seed. To keep the generated content relatively stable, reuse the same seed value.
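To reuse a seed later, a client has to know which seed was used. One option, sketched below, is to pick the seed on the client side and log it. `resolve_seed` is a hypothetical helper; the service generates its own seed when the parameter is omitted, so this is a convenience rather than a requirement.

```python
import random
from typing import Optional

SEED_MAX = 2147483647  # upper bound of the documented range [0, 2147483647]


def resolve_seed(seed: Optional[int] = None) -> int:
    """Return a valid seed, drawing a random one when none is given."""
    if seed is None:
        return random.randint(0, SEED_MAX)  # both bounds are inclusive
    if isinstance(seed, bool) or not isinstance(seed, int):
        raise TypeError("seed must be an integer")
    if not 0 <= seed <= SEED_MAX:
        raise ValueError(f"seed must be in [0, {SEED_MAX}]")
    return seed
```

Storing the returned value alongside the `task_id` makes it possible to resubmit the same prompt with the same seed when a relatively stable result is wanted.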
Example
{
  "model": "wan2.5-t2v-preview",
  "prompt": "An epically cute scene. A tiny cartoon kitten general in detailed golden armor and an oversized helmet stands bravely on a cliff. Riding a small but heroic warhorse, he declares: \"Dark clouds gather above the Snow Mountain, and from the lone city we gaze toward Yumenguan. After a hundred battles in yellow sand, the golden armor is worn through; we will not return until Loulan is broken.\" Below the cliff, a vast and endless army of mice charges forward with improvised weapons. It is a dramatic large-scale battle scene inspired by ancient Chinese war epics. Dark clouds hang above the distant Snow Mountain, blending comedy, cuteness, and epic grandeur.",
  "audio_url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250923/hbiayh/%E4%BB%8E%E5%86%9B%E8%A1%8C.mp3",
  "size": "832*480",
  "prompt_extend": true,
  "duration": 10
}
Example Request
Shell
bash
curl --location --request POST '/v2/videos/generations' \
--header 'Authorization: Bearer {{YOUR_API_KEY}}' \
--header 'Content-Type: application/json' \
--data-raw '{
"model": "wan2.5-t2v-preview",
"prompt": "An epically cute scene. A tiny cartoon kitten general in detailed golden armor and an oversized helmet stands bravely on a cliff. Riding a small but heroic warhorse, he declares: \"Dark clouds gather above the Snow Mountain, and from the lone city we gaze toward Yumenguan. After a hundred battles in yellow sand, the golden armor is worn through; we will not return until Loulan is broken.\" Below the cliff, a vast and endless army of mice charges forward with improvised weapons. It is a dramatic large-scale battle scene inspired by ancient Chinese war epics. Dark clouds hang above the distant Snow Mountain, blending comedy, cuteness, and epic grandeur.",
"audio_url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250923/hbiayh/%E4%BB%8E%E5%86%9B%E8%A1%8C.mp3",
"size": "832*480",
"prompt_extend": true,
"duration": 10
}'
Response
🟢 200 Success
Content Type: application/json
Response Schema
task_id
string
Required
Example
json
{
"task_id": "e7bed961-d1b9-4b3f-8ef9-5f441bde28c8"
}
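The same request and response can be sketched in Python using only the standard library. This is a rough equivalent of the curl example, under stated assumptions: `build_request` and `submit` are hypothetical helper names, and `base_url` must be supplied by the caller because this page does not give the host.

```python
import json
import urllib.request


def build_request(base_url: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build the POST request for /v2/videos/generations without sending it."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v2/videos/generations",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit(base_url: str, api_key: str, payload: dict) -> str:
    """Send the request and return the task_id from the 200 response body."""
    request = build_request(base_url, api_key, payload)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)["task_id"]
```

Splitting request construction from sending keeps `build_request` easy to inspect or test offline. `submit` returns only the `task_id`, which is the single field shown in the response schema above.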