Text-to-Video
Method: POST
Endpoint: /qwen/api/v1/services/aigc/video-generation/video-synthesis
The Tongyi Wanxiang text-to-video model generates a smooth video from a text prompt. Supported capabilities include:
Core capabilities: choose a video duration of 5s or 10s, specify the resolution as 480P, 720P, or 1080P, enable smart prompt rewriting, and add a watermark.
Audio capabilities: supports automatic dubbing or a custom audio file for audio-video synchronization. Available only on wan2.5.
Request Parameters
Header Parameters
X-DashScope-Async
string
Required
Example:
enable
Content-Type
string
Required
Example:
application/json
Authorization
string
Optional
Default Value:
Bearer {{YOUR_API_KEY}}
Body Parameters
Content Type: application/json
Required
model
string
Required
input
object
Required
prompt
string
Required
Text prompt describing the elements and visual characteristics expected in the generated video.
Chinese and English are both supported. Each Chinese character or Latin letter counts as one character, and content beyond the limit is truncated automatically. Length limits vary by model version:
For wan2.2 and earlier models: maximum length is 800 characters.
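The truncation rule above can be previewed on the client before a request is sent. The following is a minimal sketch, assuming the service counts one character per Unicode code point (which matches "each Chinese character or Latin letter counts as one character" but is not confirmed for other scripts). The helper name `truncate_prompt` is hypothetical and not part of the API.

```python
# Limit stated in this document for wan2.2 and earlier models.
WAN22_PROMPT_LIMIT = 800


def truncate_prompt(prompt: str, limit: int = WAN22_PROMPT_LIMIT) -> tuple:
    """Return (prompt cut to `limit` characters, whether anything was cut).

    Assumption: one character equals one Unicode code point, so Python's
    len() and slicing mirror the server-side count.
    """
    if len(prompt) <= limit:
        return prompt, False
    return prompt[:limit], True
```

Checking the returned flag lets a caller warn the user instead of silently losing the end of a long prompt.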
negative_prompt
string
Optional
Negative prompt used to describe what should not appear in the video, helping constrain the visual result.
Supports both Chinese and English, with a maximum length of 500 characters. Excess content is truncated automatically.
Example Value: low resolution, errors, worst quality, low quality, mutilation, extra fingers, bad proportions, etc.
audio_url
string
Optional
Supported only by wan2.5-t2v-preview. Audio file URL used by the model to generate the video. See audio settings for usage.
Supports HTTP or HTTPS. Local files can be uploaded first to obtain a temporary URL.
Audio limits:
Formats: wav and mp3.
Duration: 3 to 30 seconds.
File size: up to 15 MB.
Over-limit handling: if the audio is longer than the duration value of 5s or 10s, only the first 5s or 10s are kept and the rest is discarded. If the audio is shorter than the video duration, the remaining part of the video is silent. For example, if the audio is 3s and the video is 5s, the output has sound for the first 3s and is silent for the last 2s.
Example Value:
https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250923/hbiayh/%E4%BB%8E%E5%86%9B%E8%A1%8C.mp3
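The audio limits and the over-limit handling described above can be expressed as a small pre-flight check. This is a sketch only: the function names are hypothetical, the format is inferred from the file extension rather than the file contents, and "15 MB" is assumed to mean 15 × 1024 × 1024 bytes.

```python
ALLOWED_FORMATS = {"wav", "mp3"}
MAX_AUDIO_BYTES = 15 * 1024 * 1024  # assumption: MB is read as MiB


def audio_problems(filename: str, duration_seconds: float, size_bytes: int) -> list:
    """Return a list of violated limits; an empty list means the file fits."""
    problems = []
    extension = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if extension not in ALLOWED_FORMATS:
        problems.append("format must be wav or mp3")
    if not 3 <= duration_seconds <= 30:
        problems.append("duration must be 3 to 30 seconds")
    if size_bytes > MAX_AUDIO_BYTES:
        problems.append("file size must not exceed 15 MB")
    return problems


def audio_track_split(audio_seconds: float, video_seconds: float) -> tuple:
    """Return (seconds with sound, silent seconds) for the output video.

    Audio longer than the video is cut at the video duration; audio shorter
    than the video leaves the remainder silent.
    """
    audible = min(audio_seconds, video_seconds)
    return audible, video_seconds - audible
```

For the example in the text, `audio_track_split(3, 5)` gives 3 seconds of sound followed by 2 seconds of silence.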
parameters
object
Optional
size
string
Optional
480P tier: available video resolutions and their aspect ratios:
832*480: 16:9
480*832: 9:16
624*624: 1:1
720P tier: available video resolutions and their aspect ratios:
1280*720: 16:9
720*1280: 9:16
960*960: 1:1
1088*832: 4:3
832*1088: 3:4
1080P tier: available video resolutions and their aspect ratios:
1920*1080: 16:9
1080*1920: 9:16
1440*1440: 1:1
1632*1248: 4:3
1248*1632: 3:4
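The resolution list above lends itself to a lookup table for validating the size value on the client. The sketch below copies the values from this section; the names `SIZES_BY_TIER` and `tier_for_size` are hypothetical, and which tiers a given model accepts is not covered here.

```python
# Resolution -> nominal aspect ratio, grouped by tier, as listed above.
SIZES_BY_TIER = {
    "480P": {"832*480": "16:9", "480*832": "9:16", "624*624": "1:1"},
    "720P": {
        "1280*720": "16:9",
        "720*1280": "9:16",
        "960*960": "1:1",
        "1088*832": "4:3",
        "832*1088": "3:4",
    },
    "1080P": {
        "1920*1080": "16:9",
        "1080*1920": "9:16",
        "1440*1440": "1:1",
        "1632*1248": "4:3",
        "1248*1632": "3:4",
    },
}


def tier_for_size(size: str):
    """Return the tier name for a listed size string, or None if unlisted."""
    for tier, sizes in SIZES_BY_TIER.items():
        if size in sizes:
            return tier
    return None
```

Note that the separator is an asterisk, so a value such as `1920x1080` is not found in the table.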
prompt_extend
boolean
Optional
Whether to enable prompt rewriting. When enabled, a large model rewrites the input prompt automatically. This often improves results for short prompts, but it also increases latency.
true: default value, enables smart rewriting.
false: disables smart rewriting.
Example Value: true
duration
integer
Optional
Video duration in seconds. The allowed values depend on the model; the capability overview on this page lists 5 and 10.
Example Value: 5
audio
boolean
Optional
Supported only by wan2.5-t2v-preview. Whether to add audio. audio_url takes priority over audio: this parameter takes effect only when audio_url is empty.
true: default value, automatically adds audio to the video.
false: no audio is added; the output video is silent.
Example Value: true
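The interaction between audio_url and audio can be summarized in a few lines. This is an illustrative sketch of the priority rule as described above, not the service's implementation; the function name and the labels "custom", "auto", and "silent" are hypothetical.

```python
from typing import Optional


def resolve_audio_mode(audio_url: Optional[str] = None, audio: bool = True) -> str:
    """Describe which audio source the output video would use.

    audio_url wins whenever it is non-empty; the audio flag is consulted
    only when no URL is given. audio defaults to True, as documented.
    """
    if audio_url:
        return "custom"  # the supplied file is used for the soundtrack
    return "auto" if audio else "silent"
```

In particular, `audio=False` together with a non-empty audio_url still yields a video with sound, because the flag is ignored in that case.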
watermark
boolean
Optional
Whether to add a watermark. The watermark appears in the lower-right corner of the video and always reads "AI-generated".
false: default value, no watermark is added.
true: add a watermark.
seed
integer
Optional
Random seed. Range: [0, 2147483647].
If omitted, the system generates a random seed automatically. To improve reproducibility, keep the seed fixed.
Because model generation is probabilistic, using the same seed does not guarantee identical results every time.
Example Value: 12345
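A caller who wants to log the seed for later reruns can choose it on the client instead of leaving it to the service. The following sketch assumes nothing beyond the documented range; `pick_seed` is a hypothetical helper.

```python
import random
from typing import Optional

SEED_MIN = 0
SEED_MAX = 2147483647  # upper bound of the documented range


def pick_seed(seed: Optional[int] = None) -> int:
    """Return a valid seed: the given one if in range, else a fresh random one."""
    if seed is None:
        # random.randint includes both endpoints.
        return random.randint(SEED_MIN, SEED_MAX)
    if not SEED_MIN <= seed <= SEED_MAX:
        raise ValueError(f"seed must be in [{SEED_MIN}, {SEED_MAX}]")
    return seed
```

Storing the returned value alongside the request makes it possible to resubmit with the same seed, keeping in mind that identical output is still not guaranteed.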
Example
{
  "model": "wan2.5-t2v-preview",
  "input": {
    "prompt": "An epically cute scene. A tiny cartoon kitten general in detailed golden armor and an oversized helmet stands bravely on a cliff. Riding a small but heroic warhorse, he declares: \"Dark clouds gather above the Snow Mountain, and from the lone city we gaze toward Yumenguan. After a hundred battles in yellow sand, the golden armor is worn through; we will not return until Loulan is broken.\" Below the cliff, a vast and endless army of mice charges forward with improvised weapons. It is a dramatic large-scale battle scene inspired by ancient Chinese war epics. Dark clouds hang above the distant Snow Mountain, blending comedy, cuteness, and epic grandeur."
  },
  "parameters": {
    "size": "832*480",
    "prompt_extend": true,
    "duration": 10,
    "audio": true
  }
}
Example Request
Shell
bash
curl --location --request POST '/qwen/api/v1/services/aigc/video-generation/video-synthesis' \
--header 'X-DashScope-Async: enable' \
--header 'Authorization: Bearer {{YOUR_API_KEY}}' \
--header 'Content-Type: application/json' \
--data-raw '{
"model": "wan2.5-t2v-preview",
"input": {
"prompt": "An epically cute scene. A tiny cartoon kitten general in detailed golden armor and an oversized helmet stands bravely on a cliff. Riding a small but heroic warhorse, he declares: \"Dark clouds gather above the Snow Mountain, and from the lone city we gaze toward Yumenguan. After a hundred battles in yellow sand, the golden armor is worn through; we will not return until Loulan is broken.\" Below the cliff, a vast and endless army of mice charges forward with improvised weapons. It is a dramatic large-scale battle scene inspired by ancient Chinese war epics. Dark clouds hang above the distant Snow Mountain, blending comedy, cuteness, and epic grandeur."
},
"parameters": {
"size": "832*480",
"prompt_extend": true,
"duration": 10,
"audio": true
}
}'
Response
🟢 200 Success
Content Type: application/json
Response Schema
object
Example
json
{}
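For callers who prefer Python to curl, the same request can be assembled with the standard library. This is a sketch under stated assumptions: the documentation gives only a relative endpoint path, so `base_url` is a placeholder the caller must supply, and `build_request` is a hypothetical helper. The request is only constructed here, not sent, because the response schema is not described on this page.

```python
import json
import urllib.request

ENDPOINT = "/qwen/api/v1/services/aigc/video-generation/video-synthesis"


def build_request(base_url: str, api_key: str, prompt: str,
                  size: str = "832*480", duration: int = 5) -> urllib.request.Request:
    """Assemble the POST request shown in the curl example.

    base_url is an assumption: the gateway host is not given in this document.
    """
    body = {
        "model": "wan2.5-t2v-preview",
        "input": {"prompt": prompt},
        "parameters": {
            "size": size,
            "prompt_extend": True,
            "duration": duration,
            "audio": True,
        },
    }
    headers = {
        "X-DashScope-Async": "enable",
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(
        base_url.rstrip("/") + ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would submit it; how to interpret the reply depends on the response format, which this page leaves empty.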