CogVLM2

Used Model: cogvlm2-video

CogVLM2 is a model that combines image and video understanding, enabling tasks like captioning, visual question answering, and multimodal analysis.

Demo
Top P

When decoding text, samples only from the most likely tokens whose cumulative probability reaches p; lower the value to ignore less likely tokens

0.1
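
As a rough illustration of what the Top P setting does, the sketch below keeps the smallest set of most likely tokens whose cumulative probability reaches p, then renormalizes. The function name `top_p_filter` and the toy distribution are assumptions made for illustration; they are not part of CogVLM2 or this demo.

```python
def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose cumulative
    probability reaches top_p, then renormalize the kept probabilities."""
    if not 0.0 < top_p <= 1.0:
        raise ValueError("top_p must be in (0, 1]")
    # Rank tokens from most to least likely.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


# Toy next-token distribution (hypothetical values).
toy = {"elephant": 0.6, "animal": 0.25, "rock": 0.1, "car": 0.05}
print(top_p_filter(toy, 0.1))  # only the single most likely token survives
print(top_p_filter(toy, 0.9))  # the three most likely tokens survive
```

With a value as low as 0.1, sampling is restricted to a very small set of candidates, which in this toy case is a single token.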
Prompt

Input prompt

Describe this video.
Input Video

Input video

Temperature

Adjusts the randomness of the output: values above 1 make sampling more random, values close to 0 make it nearly deterministic, and 0 is deterministic

0.1
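
To make the effect of Temperature concrete, here is a minimal sketch of temperature scaling: logits are divided by the temperature before the softmax, so small values sharpen the distribution and large values flatten it. The function name and the example logits are hypothetical, and this sketch treats a temperature of 0 as a separate greedy (argmax) case rather than dividing by zero.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities after dividing them by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0; handle 0 as greedy decoding")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens
sharp = softmax_with_temperature(logits, 0.1)  # nearly all mass on the top token
flat = softmax_with_temperature(logits, 2.0)   # mass spread across all tokens
print(sharp)
print(flat)
```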
Max New Tokens

Maximum number of tokens to generate. In English, a word is typically 1-2 tokens, depending on the tokenizer

2048
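
The Max New Tokens setting acts as a hard cap on the decoding loop. The sketch below shows the idea with a stand-in "model" that replays a fixed script; `generate`, `scripted`, and the `<eos>` marker are all hypothetical names used only for illustration.

```python
def generate(next_token, max_new_tokens, eos="<eos>"):
    """Call next_token repeatedly until it returns eos or the cap is reached."""
    output = []
    for _ in range(max_new_tokens):
        token = next_token(output)
        if token == eos:
            break
        output.append(token)
    return output


# Stand-in for a model: replays a fixed script, one token per call.
script = ["A", "large", "elephant", "walks", "<eos>"]


def scripted(generated):
    return script[len(generated)]


print(generate(scripted, 2048))  # stops early at the end-of-sequence marker
print(generate(scripted, 3))     # cut off by the cap after three tokens
```

A cap of 2048 leaves plenty of room for a description like the one in the result, so generation normally ends when the model emits its end-of-sequence token rather than at the cap.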
Result


"In the video, we see a large elephant walking across a dry grassland. The elephant's skin is covered in a vibrant, rainbow-colored pattern. The elephant's ears are large and floppy, and it has a long, curved trunk. The elephant's eyes are visible, and it appears to be moving purposefully. The background is a clear blue sky, and there are no other objects or creatures in sight. The elephant's colorful skin stands out against the natural surroundings, creating a striking visual contrast."