Seedance 2
ByteDance's multimodal video generation family on Vidgo API with prompt-led creation, first/last frame guidance, reference image/video/audio control, optional native audio, and pay-per-second pricing.

Kling 2.6 Motion Control
Kuaishou's motion control model that transfers motion from reference videos to character images while maintaining identity and adapting environments.


Wan 2.6
Alibaba's Wan 2.6 video generation family for text-to-video, image-to-video, and video-to-video with multi-shot 1080p output.

GPT Image 1.5
OpenAI's latest image model with 4x speed, precision editing, and superior text rendering.

Hailuo 02
MiniMax's video model, ranked #2 globally, with NCR architecture, ultra-realistic physics, and 1080p cinematic output.

Seedance 1.0 Pro
ByteDance's #1 ranked video model with multi-shot storytelling, cinema-grade motion, and bilingual text-to-video generation.

Z-Image
Alibaba's efficient 6B-parameter image model with sub-second generation and exceptional Chinese-English bilingual text rendering capabilities.

Kling 2.6
Kuaishou's video model that generates visuals together with synchronized dialogue, sound effects, and ambient audio in a single pass.

Seedream 4.5
ByteDance's unified 4K image generation and editing model with professional-grade text rendering and commercial photography quality.

Seedream 5.0 Lite
ByteDance Seedream 5.0 Lite family on Vidgo API: seedream-5.0-lite for text-to-image and seedream-5.0-lite-edit for image-to-image editing, with 2K/3K presets, custom sizes, and up to 10 reference images.


FLUX.2
Black Forest Labs' production-grade model combining 4MP image generation and editing with multi-reference support, precise typography, and hex color control.


Wan Animate
Alibaba's 14B-parameter character animation model that transfers motion from reference videos to static characters with exceptional identity preservation.

GPT-4o Image
OpenAI's native multimodal image generator with exceptional text rendering, precise prompt following, and conversational editing capabilities.

Nano Banana
Google's leaderboard-topping image model (Gemini 2.5 Flash Image), excelling in natural language editing, character consistency, and multi-image blending.

Nano Banana Pro
Google's Nano Banana Pro image model powered by Gemini 3 Pro for 4K generation, strong text rendering, multi-image blending, and production-ready image editing.

Sora 2
OpenAI's advanced video model with realistic physics simulation, synchronized audio generation, and innovative Cameo feature for personalized content.

Sora 2 Pro
Premium Sora 2 variant delivering professional-grade 1024p video with enhanced fidelity, extended duration, and sophisticated audio-visual coherence.

Veo 3.1
Google DeepMind's 1080p video model with native audio generation, scene extension to 60+ seconds, and advanced creative controls for cinematic storytelling.


Suno v5
Suno's most advanced AI music model with studio-quality audio, authentic vocals, 10x faster generation, and up to 8-minute track support.


Suno Music
AI music generator with customizable styles, vocals, and full creative control over musical characteristics and quality.


Extend Music
Extend or modify existing music tracks by generating continuations based on the source audio. Supports custom mode with full parameter control or simple mode that inherits the original parameters. Specify continuation points and maintain style consistency across extensions of up to 8 minutes.


Upload and Cover Audio
Transform audio tracks into new styles while preserving original melodies. Upload your audio files (up to 2 minutes) and convert them with AI-powered style transfer. Supports custom and simplified modes with vocal/instrumental options and audio weight controls.


Upload and Extend Audio
Upload audio files and extend them while maintaining the original style and characteristics. AI generates seamless continuations from specified time points. Supports multiple model versions with style weight and creative controls for natural extensions.


Add Instrumental
Generate musical accompaniment for uploaded audio files containing vocals or melodies. AI creates matching instrumental backing tracks with customizable style tags, genre preferences, and quality controls. Perfect for adding professional-quality backing to vocal recordings.


Add Vocals
Layer AI-generated vocals onto existing instrumental tracks. Provide lyrics or descriptions and the API generates matching vocal performances with customizable gender, style, and expression. Transform instrumental music into complete songs with professional AI singing.


Get Timestamped Lyrics
Retrieve lyrics synchronized with precise timestamps from generated music. Returns word-by-word timing data, waveform visualization, and alignment accuracy scores. Essential for karaoke applications, lyric videos, and music synchronization projects.


Boost Music Style
AI-enhanced music style description generator. Transform simple style inputs like 'pop, mysterious' into detailed, comprehensive musical descriptions. Optimize your prompts for better music generation results with enriched genre, mood, and instrumentation details.


Generate Music Cover
Generate alternative cover versions of existing music tracks. Create variations with automatic style changes while maintaining the essence of the original composition. Perfect for producing multiple versions or exploring different interpretations of your generated music.


Replace Section
Replace specific sections of generated music tracks with precision timing control. Modify choruses, verses, or any segment by specifying start and end times. Maintains overall coherence while allowing targeted changes to lyrics, style, or musical elements.


Generate Persona
Create reusable music personas from existing audio tracks. Save distinctive vocal characteristics, musical styles, and personality traits for consistent use across multiple generations. Build your own AI artist profiles for brand consistency and style continuity.


Generate Lyrics
AI-powered lyrics generation based on themes, moods, and descriptions. Create original song lyrics from simple prompts of up to 200 characters. Generate creative, coherent lyrics for any genre or emotional tone with professional songwriting quality.


Convert to WAV
Obtain high-quality WAV format files from your generated music. Convert any PoYo-generated audio to lossless WAV format for professional use, further editing, or high-fidelity playback. Essential for production workflows requiring uncompressed audio.


Vocal Remover
Separate vocals from instrumentals or split audio into multiple stem tracks. Two modes available: vocal separation for isolating vocals and backing tracks, or stem splitting for extracting drums, bass, vocals, and other instruments individually. Professional-grade audio source separation.


AI Music Video
Generate visualized music videos from audio tracks. Create engaging visual content automatically synchronized to your music with optional author attribution and brand watermarks. Perfect for social media content, promotional materials, and music distribution.

Sora 2 Official
OpenAI Sora 2 on PoYo with synced audio, improved physics, optional reference image input, and fixed 4-second, 8-second, 12-second, 16-second, and 20-second tiers.

Kling 3.0 Motion Control
Kling 3.0 Motion Control is a reference-driven motion transfer model that combines one character image with one source video, billed at transparent per-second pricing.

Hailuo 2.3
MiniMax's Hailuo 2.3 video model for realistic human motion, expressive characters, and text-to-video or first-frame guided generation at 768p and 1080p.

Kling 2.1
Kling 2.1 on PoYo provides Standard and Pro image-to-video modes with 5-second and 10-second clips, start-frame control, and optional end-frame control in Pro.

Kling 2.5 Turbo Pro
Kling 2.5 Turbo Pro is a flexible short-form video model with text-to-video, optional frame guidance, smooth motion, cinematic depth, and fixed 5-second and 10-second tiers.

Wan 2.2 Fast
Wan 2.2 Fast provides fast text-to-video and image-to-video generation with low-cost 480p and 720p tiers for quick iteration.

Wan 2.5
Wan 2.5 combines text-to-video and image-to-video generation with 5-second and 10-second output, synchronized audio support, and multiple size and resolution tiers.

Runway Gen-4.5
Runway Gen-4.5 is a high-fidelity video model focused on prompt adherence, cinematic motion, visual fidelity, and optional reference image guidance.

Nano Banana 2
Google's next-gen image model powered by Gemini 3.1 Flash with native 2K/4K resolution, chain-of-thought reasoning, precise multi-language text rendering, and up to 14 reference images.

Kling 3.0
Kuaishou's most advanced video model with native 4K/60fps output, multi-shot storyboarding, multilingual audio, and character consistency for up to 3 people.

Seedance 1.5 Pro
ByteDance's latest video model with synchronized audio generation, flexible aspect ratios, and enhanced motion control.

Grok Imagine
xAI's Aurora-powered visual AI for image generation and video creation with Fun, Normal, and Spicy creative modes.

Seedream 4
ByteDance Seedream 4 image generation and editing with selectable aspect ratios, 1K to 4K output, and batch generation support.


Flux Kontext
Flux Kontext Pro and Max image generation and editing with a unified submit API, configurable aspect ratio, and PNG or JPG output.