When OpenAI first demoed their text-to-video model Sora, we, along with a large swathe of the media industry, thought “wow, this is going to be a game-changer for B-roll.”
Generative AI video is still far too random and inconsistent to be used for A-roll. Characters, objects and settings look different from shot to shot, so reliable continuity is basically impossible. We learned a few months after the Sora unveil that even one of the featured videos – Air Head by Shy Kids – required substantial correction in post-production to remove inconsistencies and genAI weirdness. Shy Kids estimated they generated 300 minutes of Sora footage for every usable minute – a 300:1 shooting ratio.
As with all AI efforts right now, we’re seeing huge progress towards more usable systems, and generative video AI startups like Odyssey have already appeared specifically promising the consistency and continuity necessary for good storytelling.
So for now, genAI video isn’t ready to tell stories all by itself. But maybe it can be part of the storytelling process by producing B-roll. Any generative AI system is only as good as its training data, and there are millions of hours of establishing shots, landscapes, cityscapes and more out there. So let’s put it to the test.
I’m going to test six of the most popular free generative AI systems on the market right now with a few different styles of prompt. I’ll only use systems that generate full video from a text prompt, not systems that animate still images.
Every generative AI system is unique and responds to different types of prompt in different ways, so this shouldn’t be seen as a test of which text-to-video AI is “the best” – you can certainly get better results from each system by experimenting with its prompts and settings.
The most important test for genAI systems right now – whether text, image or video – is whether their output can pass as something that didn’t come from a genAI system. That’s the benchmark we’ll be applying.
The systems we’ll be using:
Test 1 – A cityscape at night
The prompt
A panning shot of a present-day city at night. Streets, buildings and billboards fill the entire frame.
The results
The verdict
There are elements from a few videos that could be usable, specifically the middle-distance and skyline shots. The buildings created by Runway and Luma are very close to realistic, and the skylines in all shots that contain them are passable.
However, without fail, the traffic is a disaster – complex moving elements remain the Achilles’ heel of generative AI video, and it will be interesting to see whether the upcoming models from larger providers (particularly Sora from OpenAI and Veo from Google) can make improvements here.
Test 2 – A forest at sunset
The prompt
The camera pans upwards from the treeline of a pine forest to reveal rolling hills beyond covered in trees, with mist resting in valleys between the hills. On the right side of the frame the sun is setting behind the trees in the distance, partially obscured by wisps of cloud, while a small flock of birds flies on the left side of the frame.
The results
The verdict
Now, these are much better results. There are a few genAI artifacts (Pixverse and Haiper’s birds, in particular), but overall these shots are usable. And perhaps more importantly for people generating footage for use in projects, these shots look like what I was picturing in my head when I wrote the prompt.
I purposely included multiple instructions in the prompt to see which model would follow them best. The individual elements were:
- Camera movement
- Type of trees
- Misty valleys
- Position of the birds
- Position of the sun, with clouds and trees in front
I was pleasantly surprised to see that most of the models followed most of these instructions – a few missed the birds, but all of them nailed the forest, the misty valleys and the sunset. One notable curiosity: only Kling followed the instruction to pan the shot correctly; every other model went for more of a drone or dolly shot with some movement. Kling’s generation interface specifically includes camera controls, so it makes sense it would understand this part of the prompt better.
Test 3 – A stormy seascape
The prompt
The camera flies quickly over a calm sea, we see the water moving with a few waves as we pass close above it. The camera pans upwards to reveal the horizon with a thunderstorm brewing in the distance.
The results
The verdict
In this test we can clearly see some unintended video hallucinations – in particular from Haiper, which included the wake of a boat, and from Pixverse, whose shot has been invaded by an unwanted seagull.
But again, much of the visual fidelity of these shots is close to good enough. Luma did a particularly good job of following the prompt. With the right color matching and editing, I think half of these shots could be used without being recognized as genAI. And for a technology that is hardly a year old, that is incredible.
What’s the future for AI-generated B-roll?
The simple answer is that, as with everything in the generative AI space, it’s going to get a lot better. The industry is realising that a simple text prompt isn’t enough to give filmmakers the control they need, so we’re already seeing these tools integrate camera movements, zoom controls and more, letting creatives direct a shot in many of the same ways they would a live crew.
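To make that concrete, here’s a minimal sketch of what structured shot direction could look like once it’s exposed programmatically rather than buried in free text. Everything here – the field names, the values and the request object itself – is hypothetical, meant only to illustrate separating the scene description from the camera directives.

```python
# Hypothetical request for a text-to-video system that accepts explicit
# camera directives alongside the scene description. None of these field
# names come from a real product – they illustrate the idea of structured
# control rather than any actual API.
shot_request = {
    "scene": "A pine forest at sunset, mist resting in the valleys",
    "camera": {
        "movement": "pan",    # pan / dolly / crane / static
        "direction": "up",    # axis of the move
        "speed": "slow",      # pacing of the move
    },
    "duration_seconds": 5,
    "seed": 42,               # fixed seed so retries stay comparable
}
```

The important shift is that camera language stops being a phrase the model may ignore and becomes a parameter the system is built to honour – which is essentially what Kling’s camera controls already hint at.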
Visual quality will continue to improve, as will the speed of models, lessening the issue of having to generate reels and reels of renders to find something useful.
It’s also worth thinking about how generative AI will impact the use of B-roll more broadly. Of course it will always be important for artistic reasons, but covering a cut or a spoiled shot could become a thing of the past. Adobe recently announced features coming to Premiere that extend shots and remove objects via smart masking, so maybe you’ll no longer need to plaster over that interview shot where someone walks behind your subject – you could just have Firefly cut them out and recreate the background.
AI search is improving B-roll usability too
We’ll never – or probably never – reach a point where there’s no demand for filmed B-roll, so your huge back catalogue of material will always retain its value. And AI isn’t all about generating new material: it can help you understand, index and search your library too. In fact, that’s exactly what we’re building here. Check out the demo below to see how we’re unlocking archives for broadcasters, documentarians and more.
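For a feel of how that kind of search works under the hood, here’s a minimal sketch using an off-the-shelf CLIP model via the sentence-transformers library: embed one representative frame per shot, embed an editor’s natural-language query into the same space, and rank shots by cosine similarity. The file paths and query are placeholders, and a real system would use many frames per shot plus transcripts and metadata – this is an illustration of the principle, not our product.

```python
# Minimal semantic B-roll search: embed representative frames with CLIP,
# then rank them against a natural-language query.
# pip install sentence-transformers pillow
from PIL import Image
from sentence_transformers import SentenceTransformer, util

# CLIP maps images and text into the same embedding space.
model = SentenceTransformer("clip-ViT-B-32")

# One representative frame per shot (placeholder paths).
frame_paths = [
    "library/city_night_001.jpg",
    "library/forest_sunset_002.jpg",
    "library/storm_sea_003.jpg",
]
frame_embeddings = model.encode(
    [Image.open(p) for p in frame_paths], convert_to_tensor=True
)

# Embed the editor's query into the same space.
query = "misty pine forest at golden hour"
query_embedding = model.encode(query, convert_to_tensor=True)

# Cosine similarity gives a ranking over the whole library.
scores = util.cos_sim(query_embedding, frame_embeddings)[0].tolist()
for score, path in sorted(zip(scores, frame_paths), reverse=True):
    print(f"{score:.3f}  {path}")
```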