Veo 3.1 Prompts for AI Shorts in 2026: 12 Working Templates

A five-part structure, a negative block, and ready-made recipes for talking-head, food, urban, and sci-fi clips. No filler.

The essentials in 30 seconds

Veo 3.1 does not want "describe the scene in words". It wants a structured prompt in five parts: shot → subject → action → context → style. The model expects you to speak like a director: shot size, camera movement, lighting, duration. Without that you get the generic AI slop that every second faceless channel turns out.

What it costs in May 2026: Veo 3.1 Standard is $0.40 per second, Fast is $0.15, Lite is $0.05 (Vertex AI pricing). An eight-second clip costs $3.20 on Standard and $0.40 on Lite. Over the long run, prompt structure decides whether or not you overpay for rerolls.
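The arithmetic is easy to sanity-check, including what rerolls do to the bill. Here is a minimal sketch that assumes the per-second rates quoted above; `PRICE_PER_SECOND`, `clip_cost`, and the `attempts` parameter are illustrative names, not part of any Google SDK.

```python
from decimal import Decimal

# Per-second rates as quoted in this article (assumption: still current when you run this).
PRICE_PER_SECOND = {
    "standard": Decimal("0.40"),
    "fast": Decimal("0.15"),
    "lite": Decimal("0.05"),
}


def clip_cost(tier: str, seconds: int, attempts: int = 1) -> Decimal:
    """Cost of one clip, counting every attempt (the first render plus rerolls)."""
    if seconds <= 0 or attempts < 1:
        raise ValueError("seconds must be positive and attempts at least 1")
    return PRICE_PER_SECOND[tier] * seconds * attempts


if __name__ == "__main__":
    for tier in PRICE_PER_SECOND:
        # One clean render versus three attempts at the same 8-second clip.
        print(tier, clip_cost(tier, 8), clip_cost(tier, 8, attempts=3))
```

Three attempts at one Fast clip already cost more than a single Standard render, which is the practical argument for getting the prompt right the first time.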

Below: the five-part anatomy, a working negative prompt, 12 ready-made templates for AI Shorts genres, and a comparison of Veo 3.1, Sora 2, and Kling 3.0 syntax. Welder is mentioned as a pipeline that assembles these prompts automatically, but the templates work without it.

Why "describe a beautiful scene" does not work

By 2026 the major video generation models no longer reward vibe-only prompts. Each one interprets the input text in its own way:

Sora 2: a physics simulator. Describe the forces acting on objects rather than how the objects look (syntax breakdown on Medium, March 2026).
Veo 3.1: a rendering engine. It likes structured data and reference images. The prompt reads like a JSON schema: cinematography → subject → action → context → style.
Kling 3.0: an audio-visual choreographer. It works with timeline scripts and supports separate negative prompts for audio and video.

If you paste the same text into all three models, you are not really using any of them. This article is about Veo 3.1 because it is the most common backend in the Russian-language AI Shorts scene: Welder, VEED, Invideo, and Leonardo all render through it. We have already published a piece on Sora 2 and Veo 3.1 as the new quality standard; here we go deeper into syntax.

The five-part anatomy of a Veo 3.1 prompt

Google DeepMind's official guide recommends 3–6 sentences, 100–150 words. With fewer, the model fills in the blanks for you. With more, it loses focus.

The canonical structure:

Shot composition. Shot size + angle + lens. Example: "Medium close-up, eye level, 35mm lens, shallow depth of field".
Subject. Who or what is in the frame, without vibe adjectives. "A weathered fisherman in his sixties, wearing a faded yellow raincoat".
Action. What the subject does: verb + object + adverb. "Slowly hauling a fishing net out of the water, breath visible in cold air".
Context. Where, when, and in what light. "Foggy harbor at dawn, soft golden rim light from low sun, wooden dock under his feet".
Style. Genre and reference. "70mm documentary feel, muted Kodachrome palette, grain similar to The Lighthouse (2019)".

Add the negative prompt at the end as a separate line; more on that below.

This structure gives you consistency between scenes in a series: the model latches onto shot and style, so the second clip does not drift into a different universe.

A skeleton template you can copy for any scene

[Shot]: <size> + <angle> + <lens/movement>
[Subject]: <noun + 2 specific descriptors>
[Action]: <verb + object + adverb>
[Context]: <location + time + lighting>
[Style]: <genre> + <reference film/look>
[Negative]: <comma-separated artifacts to avoid>

Do not drift into poetry. Vague prompt = generic output. Veo 3.1 works better than most people think, but only if you speak to it in the language of a treatment rather than a Pinterest board.
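If you assemble prompts in code rather than by hand, the skeleton maps naturally onto a small data structure. The sketch below is one possible shape, not an official schema: `VeoPrompt` and its field names are assumptions made for illustration, and the example values are the fisherman lines from the anatomy section.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class VeoPrompt:
    """Five prompt parts plus an optional negative list (names are illustrative)."""

    shot: str
    subject: str
    action: str
    context: str
    style: str
    negative: Tuple[str, ...] = ()

    def render(self) -> str:
        # Keep the order used in this article: shot, subject, action, context, style.
        parts = [self.shot, self.subject, self.action, self.context, self.style]
        lines = [part.strip().rstrip(".") + "." for part in parts]
        if self.negative:
            # The negative block goes last, on its own line.
            lines.append("Negative: " + ", ".join(self.negative) + ".")
        return "\n".join(lines)


fisherman = VeoPrompt(
    shot="Medium close-up, eye level, 35mm lens, shallow depth of field",
    subject="A weathered fisherman in his sixties, wearing a faded yellow raincoat",
    action="Slowly hauling a fishing net out of the water, breath visible in cold air",
    context="Foggy harbor at dawn, soft golden rim light from low sun, wooden dock under his feet",
    style="70mm documentary feel, muted Kodachrome palette, grain similar to The Lighthouse (2019)",
    negative=("no extra limbs", "no face distortion", "no text overlays"),
)

print(fisherman.render())
```

Keeping each part in its own field makes it hard to blend Style into Subject by accident, and it lets you swap one part without retyping the rest.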

12 templates for AI Shorts genres

All 12 were tested on Veo 3.1 Fast in April–May 2026. In each one you fill in only the variable in <...>.

1. Faceless talking head

Medium shot, slight low angle, 50mm lens, locked tripod.
A <object, e.g. a steaming espresso cup> on a marble counter.
Camera holds steady for 3 seconds, then a slow, slight push-in toward the subject as ambient steam rises.
Urban café at golden hour, soft window light from camera-left, shallow DOF blurring the background.
Cinematic product film aesthetic, Kinfolk magazine palette, subtle film grain.
Negative: no logos, no text overlays, no extra hands, no morphing.

2. Dynamic city b-roll

Wide tracking shot, low angle, 24mm lens, dolly-left along a wet street.
<City, e.g. Tokyo Shibuya or Moscow Zaryadye> at night, neon reflections on asphalt.
Crowd walks past camera in the opposite direction, creating parallax.
Light rain, strong rim light from neon signs, atmospheric haze.
Blade Runner 2049 inspired, teal-and-magenta palette, anamorphic lens flares.
Negative: no readable text on signs, no static crowd, no lens distortion at edges.

3. Food / macro close-up

Macro extreme close-up, 90° top-down, 100mm macro lens, locked.
<Dish, e.g. a fresh cut of dragonfruit> on a dark slate plate.
A single drop of water falls onto the surface in slow motion (240fps look), splash crowns at impact.
Studio black background, single soft key light from above, controlled specular highlights.
Food-porn editorial style, MasterChef-grade lighting, hyper-detailed texture.
Negative: no plastic look, no oversaturation, no floating particles.

4. Sci-fi / fantasy scene

Sweeping crane shot, low to high arc, 35mm lens, anamorphic 2.39:1.
A lone <character, e.g. astronaut in a dust-covered orange suit> walking across a red desert.
They plant a flag, then look up as a ringed gas giant rises on the horizon.
Martian-like terrain at dusk, two suns casting long shadows, fine dust particles in the air.
70mm sci-fi epic, Denis Villeneuve aesthetic, IMAX-grade composition.
Negative: no Earth-like vegetation, no extra figures, no helmet reflection artifacts.

5. UGC-style talking head with an avatar

Medium close-up, eye level, 35mm lens, slight handheld micro-jitter.
A <character, e.g. woman in her late 20s, casual hoodie, light makeup> looking directly at the camera.
She gestures naturally with her right hand while speaking; subtle blinks every 3 seconds.
Home office at noon, soft window light from camera-left, blurred bookshelf in background.
UGC vlog aesthetic, iPhone 15 Pro look, no color grading.
Negative: no plastic skin, no robotic blink, no static gaze, no face warping.

6. Dramatic reveal (first-second hook)

Extreme close-up on <detail, e.g. a vintage pocket watch>, then quick rack-focus to reveal a face behind it.
First 1.5 seconds: only the object, ticking sound.
Then camera pulls back 30cm in 0.8s, focus shifts to a man in a fedora staring intently at the lens.
Dimly lit interrogation room, single overhead bulb, hard shadows.
Film noir aesthetic, black-and-white-leaning palette, 1940s detective drama.
Negative: no motion blur on watch, no jump cut, no soft focus on final face.

The full collection of hooks is in a separate guide on first-second retention; here I give only the prompt form.

7. Slow-motion action

Medium shot, eye level, 85mm lens, locked.
A <subject, e.g. basketball player> mid-jump, ball about to leave their fingertips toward an off-screen hoop.
Action unfolds in 480fps slow motion: muscle tension visible, sweat droplet trajectories rendered.
Indoor gym at night, harsh top-down arena lights, polished wooden floor reflecting silhouette.
Sports broadcast cinematic, ESPN documentary feel, high contrast.
Negative: no extra limbs, no warped ball, no audience in background.

8. Nature / wildlife

Long lens shot, 400mm equivalent, slight handheld breathing.
A <animal, e.g. red fox> stepping carefully through fresh snow, ears alert.
It pauses, sniffs the air, then turns its head 90° toward the camera over 2 seconds.
Boreal forest at blue hour, ambient moonlight from above-left, snowfall in foreground.
BBC Planet Earth aesthetic, naturalistic palette, 4K wildlife documentary look.
Negative: no human hands, no zoom artifacts, no unnatural fur direction.

9. Urban architecture / drone

Aerial drone shot, slow orbit, 24mm lens at 80m altitude.
<Building, e.g. a brutalist concrete tower> rising from a misty river bend.
Camera completes a quarter rotation in 8 seconds, revealing the city skyline behind it.
Early morning fog, low golden sun cutting through haze, calm water below.
Architectural film aesthetic, Wes Anderson symmetry, muted earth-tone palette.
Negative: no other drones, no drone shadows on building, no text on signage.

10. Fashion / lookbook

Full body shot, slight high angle, 50mm lens, slow dolly-in 1m over 6 seconds.
A <model, e.g. woman in a flowing red silk dress> walking toward the camera through a long marble corridor.
Dress catches air with each step, hair moves naturally, gaze fixed on camera.
Luxury hotel interior, warm chandelier light, subtle gold reflections on walls.
Vogue editorial film, Steven Meisel aesthetic, painterly grade.
Negative: no extra people, no logo on dress, no awkward gait, no static fabric.

11. ASMR / micro-interaction

Extreme macro close-up, 100mm lens, locked.
A <object, e.g. single match> being struck against a rough surface.
Sparks fly outward, flame ignites and stabilizes over 1.5 seconds.
Dark room, only the match flame illuminates a small radius.
ASMR product film aesthetic, hyper-detailed texture on match head.
Negative: no smoke beyond natural amount, no extra hands, no flame morphing.

12. Documentary opener

Medium shot transitioning to wide, 35mm lens, slow camera rise on stabilized rig.
<Protagonist, e.g. a craftsman in his workshop> sanding a wooden chair leg.
Camera starts on his hands, then rises 1.2m over 7 seconds to reveal the entire workshop.
Warm afternoon light through dusty window, sawdust particles visible in beams.
Netflix documentary aesthetic, Chef's Table grading, intimate observational tone.
Negative: no other people in workshop, no modern devices visible, no overlay text.

How to use them: fill in the variables, bring the prompt up to 100–150 words, and add the negative block. If the shot is still "off", adjust the Style line, not the Subject line. Style is what breaks most often.
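Filling the variable and checking the length can both be scripted. This is a minimal sketch that assumes each template line carries exactly one `<...>` placeholder, as in the templates above; `fill_template`, `word_count`, and `within_target` are hypothetical helper names.

```python
import re

# Matches one <...> placeholder such as "<animal, e.g. red fox>".
PLACEHOLDER = re.compile(r"<[^<>]+>")


def fill_template(template: str, value: str) -> str:
    """Replace the single <...> placeholder with a concrete noun phrase."""
    found = PLACEHOLDER.findall(template)
    if len(found) != 1:
        raise ValueError(f"expected exactly one placeholder, found {len(found)}")
    # A function as the replacement avoids backslash escapes being interpreted.
    return PLACEHOLDER.sub(lambda _match: value, template)


def word_count(prompt: str) -> int:
    """Count whitespace-separated words."""
    return len(prompt.split())


def within_target(prompt: str, low: int = 100, high: int = 150) -> bool:
    """True when the prompt falls inside the 100-150 word range quoted in this article."""
    return low <= word_count(prompt) <= high


template = "A <animal, e.g. red fox> stepping carefully through fresh snow, ears alert."
filled = fill_template(template, "red fox")
print(filled, word_count(filled), within_target(filled))
```

A single line is far below the target, so run `within_target` on the full assembled prompt rather than on each part.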

Negative prompt: what goes on the avoid list

Google does not officially offer "-1.5 weighting" the way Stable Diffusion does, but Veo 3.1 picks up on the words avoid, no, and not in the final line of the prompt, and in our tests that cut the artifact rate by 30–50%.

The base negative block to put in every prompt:

Negative: no extra limbs, no face distortion, no morphing, no text overlays, no logos, no jump cuts within clip, no soap opera effect, no over-sharpening, no plastic skin.

Then add targeted ones:

| Artifact | What to write |
| --- | --- |
| Extra fingers | `no six fingers, consistent hand anatomy` |
| Doubled face | `no face duplication, single subject only` |
| Shimmering clothing texture | `stable fabric texture, no shimmering cloth` |
| Jumping background | `static background, no scene shifts` |
| Floating objects | `no floating objects, all objects grounded` |
| Soap opera look | `no 60fps motion smoothing, cinematic 24fps feel` |

If a defect keeps coming back, move it into the positive prompt. Instead of no warped hands, write consistent five-finger hand anatomy in the Subject line. Sider's field guide to Veo 3.1 techniques confirms it: positive rephrasing works better than a long negative.
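The base block plus targeted additions can be assembled mechanically. The sketch below copies the phrases from this section; the dictionary keys (`extra_fingers` and so on) are made-up labels for illustration, and `negative_line` is a hypothetical helper.

```python
# Base block, copied from the line recommended for every prompt.
BASE_NEGATIVE = [
    "no extra limbs", "no face distortion", "no morphing", "no text overlays",
    "no logos", "no jump cuts within clip", "no soap opera effect",
    "no over-sharpening", "no plastic skin",
]

# Targeted phrases from the table; the keys are illustrative labels only.
TARGETED = {
    "extra_fingers": "no six fingers, consistent hand anatomy",
    "face_duplication": "no face duplication, single subject only",
    "shimmering_cloth": "stable fabric texture, no shimmering cloth",
    "jumping_background": "static background, no scene shifts",
    "floating_objects": "no floating objects, all objects grounded",
    "soap_opera": "no 60fps motion smoothing, cinematic 24fps feel",
}


def negative_line(artifacts=()):
    """Build the final 'Negative:' line: base block first, then targeted phrases."""
    items = list(BASE_NEGATIVE)
    for name in artifacts:
        for phrase in TARGETED[name].split(", "):
            if phrase not in items:  # skip phrases that are already listed
                items.append(phrase)
    return "Negative: " + ", ".join(items) + "."


print(negative_line(["extra_fingers", "floating_objects"]))
```

Add targeted phrases only for artifacts you have actually seen in a render; a long negative is exactly what the advice above warns against.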

Veo 3.1 vs Sora 2 vs Kling 3.0: one scene, three syntaxes

So that you do not mix up which model likes what, here is the same shot in three syntaxes. The scene: "sunset over a wheat field, a young woman walks through the wheat, a light breeze".

| Parameter | Veo 3.1 | Sora 2 | Kling 3.0 |
| --- | --- | --- | --- |
| What to emphasize | Cinematography (lens, shot size) | Physics (forces, inertia, speed) | Timeline + audio timecodes |
| Prompt length | 100–150 words | 80–120 words | 60–100 words + sound script |
| Negative | At the end, via `Negative:` | Dislikes negatives; rephrase positively | Separate negatives for video and audio |
| Reference image | Supported (frames-to-video) | Text-to-video only | Supported + style transfer |
| Price of an 8 s clip | $1.20–$3.20 | ≈$1.60 (Pro) | ≈$0.96 |
| Where to get it | Vertex AI, Flow, Welder | ChatGPT Plus / Pro | Kling.ai, PiAPI |

Veo 3.1 prompt:

Medium tracking shot, low angle, 35mm lens, slow dolly-back. A young woman in a white linen dress walking through a wheat field at sunset, fingertips brushing the tops of the wheat. Golden hour light from camera-right, long shadows, light breeze moving the wheat in waves. Terrence Malick aesthetic, warm Kodak Portra palette. Negative: no extra people, no harsh wind, no face distortion.

Sora 2 prompt:

A wheat field at sunset. Wind pushes the wheat in waves moving north-east at ~3 m/s. A woman walks through the field at 0.8 m/s; her dress responds to wind delay (~200ms behind body motion). Sun is 10° above horizon, casting shadows ~6m long. Ground is uneven, her gait adjusts to terrain. Light scatters through wheat dust particles in the air.

Kling 3.0 prompt:

[0-2s] Wide shot, wheat field at golden hour, no movement.
[2-5s] Camera dolly back, woman in white dress enters frame from left, walks toward right.
[5-8s] Close on her hand brushing wheat tops.
Audio: ambient wind 0-8s, light rustle 2-8s, no music.
Style: Malick film, Portra palette.
Negative video: extra people, face warping. Negative audio: voices, music.

This is not a matter of taste; the models have different architectures. One prompt for all = a weak result in all three.
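The timeline form is the easiest of the three to get wrong by hand, because segments can overlap or leave gaps. Here is a small sketch that renders `[start-end s]` lines in the style of the Kling 3.0 example and checks that they cover the clip; the format follows this article's example, not an official Kling schema, and `render_timeline` is a hypothetical helper.

```python
def render_timeline(segments, total_seconds):
    """Render (start, end, description) tuples as timeline lines.

    Raises ValueError if segments overlap, leave a gap, or miss the clip end.
    """
    expected_start = 0
    lines = []
    for start, end, text in segments:
        if start != expected_start or end <= start:
            raise ValueError(f"segment {start}-{end}s overlaps or leaves a gap")
        lines.append(f"[{start}-{end}s] {text}")
        expected_start = end
    if expected_start != total_seconds:
        raise ValueError("timeline does not cover the full clip")
    return "\n".join(lines)


wheat_field = render_timeline(
    [
        (0, 2, "Wide shot, wheat field at golden hour, no movement."),
        (2, 5, "Camera dolly back, woman in white dress enters frame from left, walks toward right."),
        (5, 8, "Close on her hand brushing wheat tops."),
    ],
    total_seconds=8,
)
print(wheat_field)
```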

We have a separate piece on Sora 2 and Veo 3.1 as the new quality standard: that one covers output quality, this one covers syntax.

How to stitch 10 scenes into one Short

A single Veo 3.1 clip usually runs up to 8 seconds, or up to 60 seconds in Pro mode via scene extension. But a real AI Short in 2026 is a series of 6–10 short clips cut together with the same style and lighting.

A concrete workflow:

Fix the Style line as a constant. From scene to scene, change only Shot, Subject, Action, and Context.
For a recurring character, reuse the same Subject block word for word; otherwise the model changes the face.
Between scenes, use Veo 3.1 scene extension (frames-to-video), feeding in the last frame of the previous clip as the reference. Up to 20 chained clips are supported natively, about 140 seconds of continuous narrative (Vertex AI / Gemini API pricing).
In the edit, cut to the rhythm of the voiceover, not to the end of each clip.
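The first two rules of the workflow can be enforced in code so that a locked line never drifts by accident. The sketch below assumes a series with one recurring character, so Subject is locked alongside Style; `Scene`, `LOCKED_FIELDS`, and `build_series` are illustrative names, and the scene text is invented for the example.

```python
from dataclasses import dataclass, replace

# Fields repeated verbatim in every clip of the series (assumption: recurring character).
LOCKED_FIELDS = {"subject", "style"}


@dataclass(frozen=True)
class Scene:
    shot: str
    subject: str
    action: str
    context: str
    style: str


def build_series(base, variations):
    """Return one Scene per variation dict, refusing changes to locked fields."""
    scenes = []
    for changes in variations:
        touched = LOCKED_FIELDS & set(changes)
        if touched:
            raise ValueError(f"locked fields cannot vary: {sorted(touched)}")
        # replace() copies the base scene and overrides only the given fields.
        scenes.append(replace(base, **changes))
    return scenes


base = Scene(
    shot="Medium shot, eye level, 35mm lens",
    subject="A craftsman in his fifties, grey beard, canvas apron",
    action="Sanding a wooden chair leg",
    context="Workshop in the afternoon, warm light through a dusty window",
    style="Observational documentary look, warm muted palette",
)
series = build_series(base, [
    {},
    {"shot": "Close-up on hands, 85mm lens", "action": "Brushing sawdust off the chair leg"},
    {"shot": "Wide shot, 24mm lens", "context": "Same workshop at dusk, a single work lamp on"},
])
print(len(series), series[1].shot)
```

Raising an error on a locked field is a deliberate choice: a silent override is how a series ends up with two different faces.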

We have already published a detailed stitching guide: 10 Veo 3.1 + Sora 2 scenes in one coherent narrative.

7 mistakes everyone repeats

Adjectives instead of nouns. "Beautiful, stunning, amazing": Veo 3.1 discards them. A specific worn leather jacket works ten times better than cool jacket.
Style and Subject on one line. The model gets confused. Keep the blocks separate.
Long causal chains in one clip. Veo 3.1 is not Sora 2. Do not write "a man walks, then stops, picks up a cup, takes a sip". One verb per clip.
Text in the frame. Up to 80% of prompts that ask for readable lettering fail. Add text in the edit; we compared Submagic, Captions, and CapCut in a separate article.
Ignoring aspect ratio and resolution. For Reels/TikTok, state vertical 9:16, 1080x1920 explicitly; otherwise the model returns 16:9 and you lose half the frame to cropping.
No duration on camera moves. If you write dolly-back, add over 4 seconds. Otherwise Veo makes an abrupt jump.
The same seed across a series. Change the seed between clips in a series, or you get near-copies of one shot. In Welder, seed rotation is on by default.
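Several of these mistakes can be caught before you spend money on a render. Below is a rough pre-flight check covering three of them; the word lists are illustrative heuristics chosen for this sketch, not anything Veo defines, and `lint_prompt` is a hypothetical helper.

```python
import re

# Illustrative lists; extend them for your own niche.
VAGUE_ADJECTIVES = {"beautiful", "stunning", "amazing", "cool"}
CAMERA_MOVES = ("dolly", "push-in", "pull back", "orbit", "crane", "tracking")
DURATION = re.compile(r"over \d+(\.\d+)? seconds?")


def lint_prompt(prompt: str) -> list:
    """Return warnings for a few of the mistakes listed above (heuristics only)."""
    text = prompt.lower()
    warnings = []

    # Mistake 1: vague adjectives instead of concrete nouns.
    words = set(re.findall(r"[a-z]+", text))
    vague = sorted(words & VAGUE_ADJECTIVES)
    if vague:
        warnings.append("vague adjectives: " + ", ".join(vague))

    # Mistake 5: no explicit aspect ratio.
    if "9:16" not in text and "16:9" not in text:
        warnings.append("no aspect ratio specified")

    # Mistake 6: a camera move with no stated duration.
    if any(move in text for move in CAMERA_MOVES) and not DURATION.search(text):
        warnings.append("camera move without a duration")

    return warnings


print(lint_prompt("Beautiful woman, slow dolly-back through a cool field."))
print(lint_prompt("Medium shot, vertical 9:16, slow dolly-back over 4 seconds."))
```

The checks are substring and keyword matches, so treat a warning as a prompt to reread the line, not as a verdict.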

Workflow in Welder: how we removed manual prompt engineering

Honestly, half of people do not want to write 100-word treatments for every clip. So Welder does it for you: you enter a topic and a niche, the model assembles a five-part prompt for each scene, repeats Style/Subject across clips, adds the negative block, and sends it to Veo 3.1 (or Veo 3.1 Lite if you are on the starter plan).

Key features for prompt engineering:

Style lock. Set the palette and reference once, and they hold across every clip in the series.
Voice lock. The ElevenLabs voice is pinned to the channel and does not drift from video to video.
Auto-negative. The base block is added automatically; you add targeted artifacts in one click.
Scene extension. Stitches 6–10 clips together by their last frame.

See the /pricing page for plans. The Lite plan renders on Veo 3.1 Lite. At the per-second rates quoted above, a series of ten 8-second clips comes to about $4.00 in raw cost on Lite, versus $32.00 for the same ten clips on Veo 3.1 Standard directly through Vertex, which makes Lite 8 times cheaper.

Otherwise the structure is the same as described above. Prompt engineering is not magic; it is a format. Master the five parts and the negative block, and in 80% of cases you will not need AI Studio.

What to do right now

Pick one of the 12 templates above for your genre.
Fill in the variables. Skip the epithets and use concrete nouns.
Add the base negative block.
Generate a series of 6 clips with one Style block and one Subject block.
Cut it to the rhythm of the voice.

If you would rather not do it by hand, build the same series in one click in Welder. Open the dashboard and try it.