**Image**

| # | Type | Abbr. | Description |
|---|------|-------|-------------|
| 1 | Text-to-Image | T2I | Generates an image from a text prompt (e.g. DALL·E 3, Midjourney; see the sketch below this table) |
| 2 | Image-to-Text | I2T | Analyzes an image to generate a text description (e.g. the Midjourney /describe function) |
| 3 | Image-to-Image | I2I | Creates a new image by transforming or stylizing an existing image (e.g. Stable Diffusion, Midjourney Style Reference) |
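To make the text-to-image (T2I) row concrete, here is a minimal sketch that generates an image from a prompt with a Stable Diffusion checkpoint through the Hugging Face diffusers library. The checkpoint ID, prompt, and output file name are illustrative assumptions, not something prescribed by the tools listed above.

```python
# Minimal text-to-image (T2I) sketch using Hugging Face diffusers.
# Assumes a CUDA GPU; the checkpoint, prompt, and output path are illustrative.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")

# One text prompt in, one PIL image out.
image = pipe("a watercolor painting of a lighthouse at dawn").images[0]
image.save("lighthouse.png")
```

Image-to-image (I2I) follows the same pattern in diffusers, swapping in an img2img pipeline that additionally takes a source image to transform.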
**Video**

| # | Type | Abbr. | Description |
|---|------|-------|-------------|
| 4 | Text-to-Video | T2V | Generates a video from a text prompt (e.g. Gen-2, Pika, Sora, Veo; see the sketch below this table) |
| 5 | Image-to-Video | I2V | Generates continuous video from a source image (e.g. Gen-3, Pika, EMO, Microsoft VASA-1) |
| 6 | Video-to-Video | V2V | Converts or restyles an existing video to create a new one (e.g. HeyGen, A1111, Domo) |
| 7 | Video-to-Text | V2T | Analyzes the content of a video to generate a text description |
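Most of the video tools above are hosted services, but the text-to-video (T2V) idea can be sketched locally with an open-source model in diffusers. The checkpoint, prompt, frame count, and output path below are illustrative assumptions, not the services named in the table, and the exact output format can vary between diffusers versions.

```python
# Minimal text-to-video (T2V) sketch using Hugging Face diffusers and an
# open-source ModelScope checkpoint; assumes a CUDA GPU. The model ID,
# prompt, and output path are illustrative.
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

pipe = DiffusionPipeline.from_pretrained(
    "damo-vilab/text-to-video-ms-1.7b",
    torch_dtype=torch.float16,
).to("cuda")

# Prompt in, a short sequence of frames out, written to an MP4 file.
frames = pipe("a panda playing an electric guitar on stage", num_frames=24).frames[0]
export_to_video(frames, "panda.mp4")
```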
**Sound**

| # | Type | Abbr. | Description |
|---|------|-------|-------------|
| 8 | Sound-to-Text | S2T | Generates a text transcript or description from sound or speech (e.g. Clova Notes, ChatGPT Voice Mode; see the sketch below this table) |
| 9 | Text-to-Sound | T2S | Generates sound, voice, or music from a text description (e.g. Suno, Udio, ElevenLabs) |
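The sound-to-text (S2T) row can be illustrated with OpenAI's open-source Whisper model, which is not listed above but is a common stand-in for speech transcription. The model size and audio file name are illustrative assumptions.

```python
# Minimal sound-to-text (S2T) sketch using the open-source openai-whisper package.
# Whisper stands in here for the hosted tools named in the table; the model size
# and the audio file are illustrative.
import whisper

model = whisper.load_model("base")          # small multilingual checkpoint
result = model.transcribe("meeting.mp3")    # path to any local audio file
print(result["text"])                       # plain-text transcript
```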