BeyondBeings
Under the hood·May 22, 2026·8 min read

What research agents and design agents actually do in a graphics workflow

“Agent” is heading toward the same fate as “AI”: soon it'll be a marketing label slapped on every product, and the word will stop meaning anything. Before that happens, here's what an agent is in a graphics workflow, and what each of the four BeyondBeings agents does when you hit generate.

The working definition

An agent, the way we use the word, is a system that makes decisions and takes actions across multiple steps without you driving each one. Two things distinguish it from a plain AI call:

  • It chooses. The system decides what to do — which sources to weight, which model to invoke, which framing to use — rather than executing a single instruction.
  • It acts. The system runs the step it decided on, evaluates the result, and either continues to the next step or revises.
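
To make the two properties concrete, here is a minimal sketch of a choose, act, evaluate loop in Python. It is an illustration of the definition above, not BeyondBeings code: `run_step`, `choose_framing`, `write_draft`, and `check_draft` are hypothetical names, and the toy stand-ins simply revise once after a rejected first attempt.

```python
from typing import Callable, Optional, Tuple


def run_step(
    task: str,
    choose: Callable[[str, Optional[str]], str],
    act: Callable[[str], str],
    evaluate: Callable[[str], Tuple[bool, Optional[str]]],
    max_attempts: int = 3,
) -> str:
    """Run one agent step: choose a plan, act on it, evaluate, revise if needed."""
    feedback: Optional[str] = None
    result = ""
    for _ in range(max_attempts):
        plan = choose(task, feedback)    # "It chooses": pick what to do
        result = act(plan)               # "It acts": run the chosen step
        ok, feedback = evaluate(result)  # check the result
        if ok:
            break                        # good enough, move to the next step
    return result


# Toy stand-ins (hypothetical) so the loop can be run end to end.
def choose_framing(task: str, feedback: Optional[str]) -> str:
    # Switch framing once the evaluator has pushed back.
    return "contrarian" if feedback else "listicle"


def write_draft(plan: str) -> str:
    return f"[{plan}] draft"


def check_draft(draft: str) -> Tuple[bool, Optional[str]]:
    if draft.startswith("[contrarian]"):
        return True, None
    return False, "too generic"


final = run_step("Quibi's $1.75B collapse", choose_framing, write_draft, check_draft)
print(final)
```

A plain AI call would be the `act` line on its own; the loop around it is what the definition adds.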

That's the bar. If a product calls itself agentic but just exposes a prompt box for an LLM, that's not an agent — that's an AI tool with a buzzword on top.

BeyondBeings runs four agents end-to-end across the editorial graphics pipeline. Each owns a real decision boundary. Here's what they actually do.

1. The Agentic Research Engine

What it owns

You pick a topic — “Quibi's $1.75B collapse,” “why Meta's Q4 ad revenue beat,” “the OpenAI board crisis timeline.” The research engine takes that input and decides what angle to pursue.

The decisions

Three layers of decisions happen in the research stage:

  • What hasn't been done. Which angles are saturated and which are still fresh.
  • What source weighting matters. A pop-culture topic weights social and press differently than a business topic.
  • What depth the story needs. A 4-slide carousel calls for a different research depth than an 8-slide carousel.

An AI tool would expose this as “write your research query.” The research engine decides it for you. That's the agentic part.

2. Agentic Headline & Positioning

What it owns

Once the angle is set, the headline agent owns the editorial voice — the slide titles, the narrative sequencing, and the positioning. It writes the way a top media operator would write: hook on slide one, three story beats across the middle slides, payoff on the last slide.

The decisions

The headline agent picks the hook pattern — numbered listicle, contrarian claim, hidden-story angle, “what everyone got wrong,” or one of eight others — based on the topic shape. It decides how much to give away on slide one versus hold back for the payoff. It calibrates the editorial register: tabloid for entertainment pages, sober for finance, urgent for breaking news.
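
The register calibration can be pictured as a lookup from page type to tone. The sketch below encodes only the three pairings named above; the dictionary keys and the `pick_register` helper are hypothetical, and the real agent presumably weighs more signals than a single label.

```python
# Register per page type, taken from the three examples in the text.
# The keys are hypothetical labels, not BeyondBeings identifiers.
REGISTER_BY_PAGE_TYPE = {
    "entertainment": "tabloid",
    "finance": "sober",
    "breaking_news": "urgent",
}


def pick_register(page_type: str) -> str:
    """Return the editorial register for a page type; fail loudly on unknown types."""
    try:
        return REGISTER_BY_PAGE_TYPE[page_type]
    except KeyError:
        raise ValueError(f"no register defined for page type {page_type!r}") from None


print(pick_register("finance"))
```

Raising on an unknown page type, rather than guessing a default, is a design choice of this sketch.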

This is the most underestimated agent in the stack. Editorial title-writing is the kind of skill editors at top publications spend careers refining; encoding it into an agent is the work most AI graphic tools don't bother with.

3. The Agentic Carousel Designer

What it owns

The carousel designer composes the visuals. This is the stage most products stop at — generate an image, done — but for editorial output the image is half the job. The designer also owns model selection, composition, typography overlay, and the visual continuity across slides.

The decisions

The single most visibly agentic decision in the system: which image model to invoke for each slide. The designer routes across three state-of-the-art models depending on what the slide needs.

  • Nano Banana Pro: Google's Gemini 3 Pro Image. Flagship for editorial realism, real subjects, magazine-cover composition.
  • GPT Image 2: OpenAI's newest premium image model. Strongest in the stack at legible on-image text and recognizable likenesses.
  • FLUX 2 Pro: Black Forest Labs' Pro tier. Extremely photoreal for text-only generations where pure image quality is the priority.

A slide that's mostly about a real person's face goes to GPT Image 2. A slide that needs magazine-cover editorial framing goes to Nano Banana Pro. A slide that's a pure visual scene with no embedded text goes to FLUX 2 Pro. You don't pick; the designer does. Best-model-for-the-job routing is the most concrete example of an agentic behavior in the system.
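
As a rough sketch, the routing rules above could be written as a small function. Everything here is illustrative: `SlideBrief` and its fields are hypothetical, the priority order between rules is an assumption, and sending text-heavy slides to GPT Image 2 is inferred from its description rather than stated as a rule.

```python
from dataclasses import dataclass
from enum import Enum


class ImageModel(Enum):
    NANO_BANANA_PRO = "Nano Banana Pro"
    GPT_IMAGE_2 = "GPT Image 2"
    FLUX_2_PRO = "FLUX 2 Pro"


@dataclass(frozen=True)
class SlideBrief:
    """Hypothetical summary of what a slide needs from its image."""
    real_person_focus: bool       # slide is mostly about a real person's face
    embedded_text: bool           # image itself must contain legible text
    magazine_cover_framing: bool  # slide needs magazine-cover editorial framing


def route_model(brief: SlideBrief) -> ImageModel:
    # Assumed priority: likeness and on-image text first, then cover framing.
    if brief.real_person_focus or brief.embedded_text:
        return ImageModel.GPT_IMAGE_2
    if brief.magazine_cover_framing:
        return ImageModel.NANO_BANANA_PRO
    # Pure visual scene with no embedded text.
    return ImageModel.FLUX_2_PRO


scene = SlideBrief(real_person_focus=False, embedded_text=False, magazine_cover_framing=False)
print(route_model(scene).value)
```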

After image generation, the designer composites the bold Anton editorial typography on the lower third — making sure the title overlay never crops the subject. That layout decision is also agentic: the system reasons about where the subject sits in the frame, not just where to drop the text.
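
One simple way to picture the "never crops the subject" check is a bounding-box test against the lower-third band. This is a geometric sketch under stated assumptions, not the designer's actual layout logic: the `Box` type, the helper names, and the 1080x1350 frame size are all hypothetical, and the subject box is assumed to come from some upstream detector.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in pixels, origin at the top-left of the frame."""
    left: int
    top: int
    right: int
    bottom: int

    def overlaps(self, other: "Box") -> bool:
        return (
            self.left < other.right
            and other.left < self.right
            and self.top < other.bottom
            and other.top < self.bottom
        )


def lower_third(width: int, height: int) -> Box:
    """Band covering the bottom third of the frame, where the title goes."""
    return Box(0, height - height // 3, width, height)


def title_clears_subject(width: int, height: int, subject: Box) -> bool:
    """True if a lower-third title overlay would not cover the subject."""
    return not lower_third(width, height).overlaps(subject)


# Hypothetical 1080x1350 portrait frame with two candidate subject positions.
high_subject = Box(200, 100, 880, 850)
low_subject = Box(200, 100, 880, 1000)
print(title_clears_subject(1080, 1350, high_subject))
print(title_clears_subject(1080, 1350, low_subject))
```

When the check fails, a system could regenerate the image with different framing or move the overlay; which of those BeyondBeings does is not stated here.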

4. The Agentic Engagement Optimizer

What it owns

Most graphics tools end at the visual. The engagement optimizer keeps going: caption, CTA, post structure, hashtag weave. The agent owns everything between “the graphic is rendered” and “the post is ready to publish.”

The decisions

The optimizer writes the caption in the same editorial voice as the slide titles — not a generic AI summary, but a 2-3 sentence elaboration that reads as analysis. It decides what kind of question or CTA closes the caption (a save-driving question, a share-driving statement, a comment-driving provocation). It threads hashtags without diluting the editorial voice.

These are small decisions individually. Stacked across every post, they can be the difference between a carousel that gets 200 likes and one that gets 200,000.

How they hand off

The four agents don't run as a single LLM call with four subprompts. They run as a pipeline with handoffs: the research agent's output becomes the headline agent's input; the headline agent's output becomes the designer's input; the designer's output flows to the engagement optimizer.
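
The handoff chain can be sketched as four functions whose output types line up, so each stage consumes exactly what the previous one produced. All type and function names below are hypothetical, and the stage bodies are stubs that stand in for the real agents.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ResearchBrief:
    topic: str
    angle: str
    slide_count: int


@dataclass(frozen=True)
class HeadlinePlan:
    brief: ResearchBrief
    slide_titles: List[str]


@dataclass(frozen=True)
class DesignedCarousel:
    plan: HeadlinePlan
    image_files: List[str]


@dataclass(frozen=True)
class ReadyPost:
    carousel: DesignedCarousel
    caption: str


def research(topic: str, slide_count: int) -> ResearchBrief:
    # Stub: the real stage would pick the angle, sources, and depth.
    return ResearchBrief(topic, angle=f"fresh angle on {topic}", slide_count=slide_count)


def write_headlines(brief: ResearchBrief) -> HeadlinePlan:
    titles = [f"Slide {i}: {brief.topic}" for i in range(1, brief.slide_count + 1)]
    return HeadlinePlan(brief, titles)


def design(plan: HeadlinePlan) -> DesignedCarousel:
    files = [f"slide-{i}.png" for i in range(1, len(plan.slide_titles) + 1)]
    return DesignedCarousel(plan, files)


def optimize(carousel: DesignedCarousel) -> ReadyPost:
    caption = f"Analysis: {carousel.plan.brief.angle}"
    return ReadyPost(carousel, caption)


def run_pipeline(topic: str, slide_count: int) -> ReadyPost:
    # Each stage's output is the next stage's only input.
    return optimize(design(write_headlines(research(topic, slide_count))))


post = run_pipeline("the OpenAI board crisis timeline", 6)
print(post.caption)
```

Nesting each stage's input inside its output keeps earlier context available downstream without any shared state.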

Where possible, stages parallelize. For a 6-slide carousel, slide images all generate at once — the designer fires six image-model calls in parallel. Headline writing happens while the research is still finishing for later slides. The whole pipeline lands in a few minutes end-to-end.
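
Firing the slide image calls in parallel maps naturally onto `asyncio.gather`. In this sketch `generate_image` is a hypothetical stand-in that sleeps instead of calling an image model; only the concurrency pattern is the point.

```python
import asyncio
from typing import List


async def generate_image(slide_index: int) -> str:
    """Hypothetical stand-in for one image-model call."""
    await asyncio.sleep(0.01)  # simulates network latency
    return f"slide-{slide_index}.png"


async def generate_carousel(slide_count: int) -> List[str]:
    # Start every call at once; gather returns results in submission order.
    calls = (generate_image(i) for i in range(1, slide_count + 1))
    return list(await asyncio.gather(*calls))


images = asyncio.run(generate_carousel(6))
print(images)
```

Because the calls overlap, the wall-clock time is roughly that of the slowest single call rather than the sum of all six.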

Why this is the design

You could imagine building this differently. One giant LLM call. A single prompt-to-image pipeline with no agent layer. A wrapper around Midjourney.

None of those would do what an agentic pipeline does, for one reason: they don't make decisions across the workflow. They produce outputs. A user who wants a finished editorial post still has to assemble the workflow themselves.

The decisions are the product. The four agents — research, headline, design, engagement — exist because each one of those stages requires real judgment, and encoding that judgment is what makes BeyondBeings an agentic editorial graphics platform rather than a tool with an AI label.

The clearest way to feel this is to use it. Open the Content Terminal and direct the agents on a topic of your own — or read the full how-it-works page for the deep technical pass.
