Maximizing monetization potential for VOD services
Subscribe to NCS for the latest news, project case studies and product announcements in broadcast technology, creative design and engineering delivered to your inbox.
To increase profitability, VOD services need to extract as much value as possible from their existing content libraries. This is leading many to rethink both engagement strategies and monetization models. Ad-based and hybrid models offer promise, but personalization remains a sticking point. Video services draw from huge libraries of content, yet the metadata behind that content can be difficult to extract or can lack the detail needed to drive smarter workflows. Asset-level metadata provides a surface-level understanding, but it doesn’t go deep enough to deliver the granular insight needed for smarter ad placement and more tailored user experiences.
For this level of insight, providers need to mine much deeper into the content to uncover scene-level metadata such as scene transitions, mood, key objects, setting, keywords and character presence. Multimodal AI analysis makes this possible at scale. Access to this type of deep scene-level information is the key to creating more personalized and engaging user experiences, improving recommendations, and strategically inserting ads for a better user experience as well as enhanced ad revenue.
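To make the idea concrete, a scene-level metadata record might look something like the following sketch. The field names and values here are purely illustrative, not a standard schema or any particular vendor's output format:

```python
from dataclasses import dataclass, field

# Hypothetical scene-level metadata record. Field names are illustrative
# assumptions, not an industry-standard schema.
@dataclass
class SceneMetadata:
    start_s: float                       # scene start time, in seconds
    end_s: float                         # scene end time, in seconds
    mood: str                            # e.g. "tense", "light-hearted"
    setting: str                         # e.g. "kitchen", "city street"
    keywords: list = field(default_factory=list)
    characters: list = field(default_factory=list)
    objects: list = field(default_factory=list)

# One scene as a multimodal analysis pass might describe it:
scene = SceneMetadata(
    start_s=512.0, end_s=587.5,
    mood="light-hearted", setting="kitchen",
    keywords=["cooking", "family"],
    characters=["Ana"],
    objects=["knife", "vegetables"],
)
```

Compared with a single asset-level record (title, genre, cast), a title becomes hundreds of such records, one per scene, which is what downstream recommendation and ad-decisioning systems can then query.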
Unpicking the problem
The challenge for many services is that their metadata is still structured for the early days of streaming. A movie or series is tagged by title, genre, runtime, cast, and perhaps a brief synopsis. That’s enough to drive basic search and recommendation functions, but not nearly sufficient for a market where every second of audience attention is contested. The more a video service knows about what is happening within a specific scene, the more it can align that scene to a user’s preferences or an advertiser’s target criteria.
Scene-level intelligence makes this possible by identifying the natural rhythms of a piece of content, in terms of emotional peaks, transitions as the tone shifts from light-hearted to tense, as well as determining the characters dominating the frame, and the objects creating the atmosphere. By mapping these details, services can begin to unlock a far richer dataset that supports both engagement and monetization.
Enhancing personalization
On the personalization side, this depth of analysis enables services to make recommendations that are more specific than the standard “viewers who liked this title also liked…”. One viewer might respond well to fast-paced action sequences, while another may prefer emotional character dialogue scenes. If a video service understands which scenes deliver the experiences a viewer engages with most, it is better able to surface unexpected content from the library that aligns with those tastes.
Being able to access this deep scene-level metadata also creates opportunities for more responsive playback features, such as surfacing highlight clips and thumbnails, or reorganizing chapter markers around natural scene breaks. This allows playback features to be much more dynamic and responsive to what is happening on screen, rather than being dependent on manual intervention or rigid, time-based frameworks.
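The chapter-marker idea can be sketched in a few lines: instead of dropping markers on a fixed time grid, pick them from detected scene-start times, with a minimum spacing so every short cut doesn't become a chapter. The threshold and function name are assumptions for illustration:

```python
def chapter_markers(scene_starts, min_gap_s=300.0):
    """Pick chapter markers from detected scene-start times (in seconds),
    keeping markers at least min_gap_s apart so chapters follow natural
    scene breaks rather than a fixed time grid. Illustrative sketch only."""
    markers = []
    for t in sorted(scene_starts):
        # Keep this scene break as a chapter only if it is far enough
        # from the previous accepted marker.
        if not markers or t - markers[-1] >= min_gap_s:
            markers.append(t)
    return markers

# Scene breaks detected at irregular intervals:
print(chapter_markers([0, 95, 310, 615, 640, 980]))  # [0, 310, 615, 980]
```

A time-grid approach would put markers at 0, 300, 600, 900 regardless of what is on screen; this version lands every marker on an actual scene boundary.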
Strategic ad insertion
The availability of scene-level metadata also brings a number of advantages for ad insertion. Firstly, by detecting scene transitions and understanding tone, it helps video providers get the timing of ad breaks right. This addresses a long-standing pain point for video services: ads inserted at inopportune moments, such as mid-scene or in the middle of an important conversation, annoy viewers and lead to disengagement. Conversely, ads inserted at a point that feels natural have far less negative impact on the viewing experience, and viewers are more likely to stay engaged.
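One simple way to act on this: only ever consider scene boundaries as candidate break points, skip boundaries where the emotional beat continues across the cut, and enforce a minimum spacing between breaks. The tuple format and spacing value below are assumptions, a sketch of the idea rather than a production ad-decisioning rule:

```python
def ad_break_points(scenes, min_spacing_s=600.0):
    """Given (start_s, end_s, mood) tuples in play order, return times
    suitable for ad breaks: always at a scene boundary (never mid-scene),
    never inside a continuous emotional beat, and spaced at least
    min_spacing_s apart. Illustrative sketch only."""
    breaks = []
    for (_, end_s, mood), (_, _, next_mood) in zip(scenes, scenes[1:]):
        if mood == next_mood:
            # Same tone on both sides of the cut: likely one continuous
            # beat, so do not interrupt it with an ad.
            continue
        if breaks and end_s - breaks[-1] < min_spacing_s:
            continue  # too close to the previous break
        breaks.append(end_s)
    return breaks

scenes = [
    (0, 300, "light-hearted"),
    (300, 900, "tense"),
    (900, 1000, "tense"),
    (1000, 1600, "light-hearted"),
]
print(ad_break_points(scenes))  # [300, 1000]
```

Note that the boundary at 900 seconds is rejected even though it is a real scene cut, because the tension carries straight through it, which is exactly the mid-beat interruption the article describes.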
Secondly, it enables video providers to deliver contextual ads that are relevant to the content being viewed at that precise moment, for better ad targeting. For example, if scene-level analysis determines that a scene features a person chopping fresh ingredients in a kitchen, contextual ads could promote fresh vegetable delivery services, healthy meal-kit subscriptions, or even kitchen knife sets. This approach delivers ads that align with what the viewer is watching, for a more enjoyable and relevant ad experience. The same scene context can also help maintain brand safety by avoiding inappropriate and insensitive ad insertion, for example a car ad appearing immediately after a road traffic collision on screen.
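At its simplest, this kind of contextual matching is tag overlap between the current scene and an ad catalog, as in the following sketch. The catalog entries and scoring rule are invented for illustration; real contextual ad decisioning involves far richer signals:

```python
def match_ads(scene_tags, ad_catalog):
    """Rank catalog ads by keyword overlap with the current scene's tags,
    dropping ads with no overlap at all. Illustrative sketch only."""
    scene = set(scene_tags)
    scored = [(len(scene & set(tags)), ad) for ad, tags in ad_catalog.items()]
    return [ad for score, ad in sorted(scored, reverse=True) if score > 0]

# Hypothetical catalog mapping each ad to its targeting keywords:
catalog = {
    "vegetable delivery": {"kitchen", "vegetables", "cooking"},
    "meal-kit subscription": {"kitchen", "cooking"},
    "car dealership": {"road", "driving"},
}

# For the kitchen scene from the article's example:
print(match_ads(["kitchen", "cooking", "knife"], catalog))
```

The same mechanism supports a basic brand-safety check in reverse: tags such as "collision" on the scene could be matched against an exclusion list per ad, suppressing the car ad in the article's example.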
Unlocking the true potential
There’s real potential that capturing this type of granular, scene-level metadata and integrating it into video workflows could help providers create more personalized and engaging user experiences, improve recommendations, and deliver more targeted ads. Yet, just as with all data, it’s not the data alone that holds the value but rather how it is used and applied. That’s where the real magic lies. As VOD services deepen their understanding of what’s happening within the content, they’re laying the groundwork for even better viewer experiences, and applying these insights creatively may well open doors to engagement strategies we haven’t even imagined yet.



tags
Ad-Supported Video on Demand (AVOD), Adtech, Advertising, Bitmovin, Broadcast Monetization, data analytics, dynamic ad insertion, Metadata, Personalization, VOD
categories
Advertising, Featured, Streaming, Thought Leadership, Voices