For most of the last few years, the rule was simple: never ask an image model for text. You would type "a poster that says SUMMER SALE" and get back SUMMR SAEL in a font that does not exist. So designers generated the art in AI and added every word by hand in a layout tool, because the model could not be trusted with a single letter.

That rule expired in 2026. Typography accuracy on the top models now runs around ninety-five percent, across Latin and several non-Latin scripts, because the newer models reason about the text instead of drawing it as texture. You can put real words on a poster, a label, a logo lockup. But the old habits and the old prompts still produce garbage, because getting clean text out is a specific skill. Here is how.

Why the models can spell now

The leap is reasoning. Older models treated letters as shapes to approximate, so they produced text-flavored squiggles. The newer ones, GPT Image 2 chief among them, run the prompt through a kind of thinking step that treats the words as words, plans the layout, and renders the actual characters. That is why typography went from a punchline to production-usable in one generation of models.

The practical consequence: text is now an instruction you give carefully, not a thing you avoid. And like any instruction, the quality of the output tracks the precision of the ask.

Put the exact words in quotes

The first rule is to quote the literal text. Do not describe it; state it. Write the headline you want inside quotation marks (a poster with the headline "Summer Sale, 40% Off") so the model treats it as a string to render, not a concept to paraphrase.

Keep each block of text short and explicit. The model is far more reliable on a tight headline than on a paragraph, and far more reliable when you give it the exact characters than when you say "some sale text." If you need several text elements, name each one and its words separately: the headline says this, the subhead says this, the button says this.
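Here is a minimal sketch of that habit as a prompt builder. It is plain string assembly, not any model's API: the function names, the scene description, and the subhead and button wording are made up for illustration, and the only thing it shows is each text element being named and given its exact words in quotes.

```python
def quote_text(words: str) -> str:
    """Wrap the literal string in double quotes so it reads as text to render."""
    # Swap inner double quotes for single ones so the quoted span stays unambiguous.
    return '"' + words.replace('"', "'") + '"'


def build_text_prompt(scene: str, elements: dict[str, str]) -> str:
    """Name each text element and state its exact words, one clause per element."""
    clauses = [f"the {role} says {quote_text(words)}" for role, words in elements.items()]
    return f"{scene}. Text elements: " + "; ".join(clauses) + "."


# Hypothetical poster: only the headline wording comes from the example above.
prompt = build_text_prompt(
    "A bright retail poster with a beach background",
    {
        "headline": "Summer Sale, 40% Off",
        "subhead": "This weekend only",
        "button": "Shop Now",
    },
)
print(prompt)
```

The point of the structure is that nothing in the prompt says "some sale text": every string the model has to render appears character for character.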

Specify placement and hierarchy

Text in design is not just words; it is where the words go and which ones dominate. Tell the model the layout: headline across the top, product name centered, fine print along the bottom. Without placement, the model will scatter the text wherever the composition has room, which is rarely where you want it.

State the hierarchy too: which line is largest, which is secondary. "Large headline at the top, smaller tagline beneath it" gives the model a typographic structure to build, and structure is what makes the difference between a design and a ransom note of mismatched type.
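One way to keep placement and hierarchy from getting lost is to write them down as data before you write the prompt. The sketch below is an assumption about how you might organize that, not a format any model requires; the `TextElement` fields, the size wording, and the product name are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TextElement:
    role: str       # what the element is, e.g. "headline"
    words: str      # the exact characters to render
    placement: str  # where it sits in the layout
    rank: int       # 1 = most dominant line


# Illustrative size wording per rank; adjust to taste.
SIZE_WORDS = {1: "largest", 2: "smaller", 3: "smallest"}


def describe_layout(elements: list[TextElement]) -> str:
    """List elements from most to least dominant, each with words, size, and position."""
    ordered = sorted(elements, key=lambda e: e.rank)
    parts = [
        f'{e.role} "{e.words}", {SIZE_WORDS.get(e.rank, "small")} type, {e.placement}'
        for e in ordered
    ]
    return "Text layout: " + "; ".join(parts) + "."


layout = describe_layout([
    TextElement("fine print", "Terms apply", "along the bottom", 3),
    TextElement("headline", "Summer Sale, 40% Off", "across the top", 1),
    TextElement("product name", "Coastline Sunscreen", "centered", 2),
])
print(layout)
```

Sorting by rank means the dominant line always leads the description, whatever order you jotted the elements down in.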

Logos and packaging want tight, bounded text

Logos and product labels are the highest-stakes text because they are small, central, and unforgiving. On a packaging mockup the label is the whole point of the image, and one garbled word ruins it.

Keep these asks minimal and bounded. A logo is usually one or two words, so give exactly those, specify the style (clean sans-serif, vintage script), and resist piling on extra elements that crowd the letters. For packaging, name the product text and the few supporting lines, and keep the rest of the prompt about the container and scene, not more words. The fewer competing text elements, the cleaner each one renders. If you already have a wordmark or a label to match, hand the model that reference so it builds the new text against your real type, not a guess at it.

Infographics and data want live information

One genuinely new capability in 2026: some models, Seedream among them, pull live web information into the image, which makes data-driven visuals (infographics, charts, time-sensitive callouts) possible without you hand-typing every figure. For content that needs current numbers, this is the difference between a mockup and a usable asset.

Treat the data as text you still verify. A clean rendering does not guarantee a correct figure, so check the numbers, especially for anything you publish. The model gets you a designed, populated infographic fast, and you confirm the numbers before it goes out.
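That check can be partly mechanical. The sketch below assumes you have already transcribed the infographic's text (by hand or with whatever OCR tool you use) and that you hold a set of figures you trust; the sample sentence and numbers are invented. It flags any number in the rendered text that is not in the trusted set.

```python
import re

# Matches integers and decimals, allowing thousands separators like 1,204.
NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")


def extract_figures(text: str) -> list[float]:
    """Pull every number out of the transcribed text as a float."""
    return [float(match.replace(",", "")) for match in NUMBER.findall(text)]


def unverified_figures(rendered_text: str, trusted: set[float]) -> list[float]:
    """Return the figures that appear in the image but not in the trusted set."""
    return [value for value in extract_figures(rendered_text) if value not in trusted]


# Hypothetical data: what the image shows vs. what your source says.
rendered = "Sales rose 13.5% to 1,204 units in 2026"
trusted = {12.5, 1204.0, 2026.0}
print(unverified_figures(rendered, trusted))
```

An empty list means every figure on the image matches something you trust. It does not prove each number is attached to the right label, so a human read is still the last step.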

Fix wrong letters with an edit, not a reroll

Even at ninety-five percent, roughly one word in twenty comes out wrong, and the fix is not to reroll the whole image and lose a composition you liked. It is to edit the text in place. The same instruction-editing models that spell can also correct: "change the headline to read exactly 'Summer Sale' and keep everything else unchanged."
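A quick back-of-the-envelope calculation shows why short text blocks matter and why an occasional miss is expected. It rests on two assumptions that are simplifications, not measurements: that the ninety-five percent figure applies per word, and that each word succeeds or fails independently of the others.

```python
# Assumption: the article's ninety-five percent figure, read as per-word accuracy.
PER_WORD_ACCURACY = 0.95


def chance_all_correct(word_count: int, accuracy: float = PER_WORD_ACCURACY) -> float:
    """Probability that every word renders correctly, assuming independent words."""
    return accuracy ** word_count


for n in (2, 4, 10, 40):
    print(f"{n:>2} words: {chance_all_correct(n):.0%} chance of a clean render")
```

Under those assumptions a four-word headline comes out fully clean about four times in five, while a forty-word paragraph does so only about one time in eight. That gap is the argument for tight text and for budgeting one edit pass.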

This is the workflow that makes text reliable in practice. Generate the design, accept that a letter or a word may miss, then run a targeted edit to fix only the text. Quote the correct string, protect the rest of the image, and verify at full zoom. A near-perfect poster with one wrong letter is one edit from done, not a reroll.
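The generate, check, edit loop can be sketched in a few lines. This assumes you transcribe what the image actually says (by eye at full zoom, or with OCR) and keep the intended strings alongside; the function names are hypothetical, and the instruction wording mirrors the edit prompt quoted earlier. It produces edit instructions only and calls no model.

```python
def find_text_errors(expected: dict[str, str], rendered: dict[str, str]) -> dict[str, str]:
    """Return {role: correct_text} for every element whose rendered text differs."""
    return {role: text for role, text in expected.items() if rendered.get(role) != text}


def build_edit_instruction(role: str, correct_text: str) -> str:
    """Quote the correct string and protect the rest of the image."""
    return (
        f'Change the {role} to read exactly "{correct_text}" '
        "and keep everything else unchanged."
    )


# Hypothetical check: the headline missed a letter, the tagline is fine.
expected = {"headline": "Summer Sale", "tagline": "40% Off"}
rendered = {"headline": "Summr Sale", "tagline": "40% Off"}

edits = [build_edit_instruction(role, text)
         for role, text in find_text_errors(expected, rendered).items()]
for instruction in edits:
    print(instruction)
```

Only the element that missed gets an instruction, so the composition you liked is never part of the ask.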

Where this runs

Text work is iterative (generate the layout, fix a misspelled word, swap the headline for an A/B test, localize it into another language), and each of those steps usually means exporting the image into a different editor, where either the type gets redone by hand or the image degrades.

On Vilva the model that spells and the model that edits text live on the same canvas, so generating a poster and then fixing one wrong letter is a clean branch in one graph, not an export into a layout tool. You can run the same headline through GPT Image 2 and another model side by side when one renders your script more cleanly, fix letters with a protected in-place edit, and spin one approved design into multiple headlines or languages without rebuilding the layout each time.

Two things make this easier when you do need to step in. First, the typography controls. Pick the font, set the weight and size, nudge placement and color so a headline sits the way your brand expects, all without leaving the canvas. The model carries the heavy lifting and the controls are right there for the last ten percent, so getting hands-on is a small move, not a different tool. Second, the agentic side. Build a workflow that carries your references (a brand logo, a font sample, an earlier poster, the exact product shot) and the agent generates against them, so the text and the look land closer to what you meant on the first pass instead of the fifth. Reference-anchored runs are how you get accuracy without re-prompting from scratch every time.

Free to try at vilva.ai (200 credits on signup).

When to still reach for a layout tool

Honesty matters: the model is not a replacement for a typesetter on everything. Vilva's own type controls close a lot of the gap, since you can set the font, weight, size, and placement right on the canvas. But for long-form text, exact brand fonts, precise kerning, and legal copy that must be letter-perfect, a dedicated layout tool still wins, and the smart move is to generate the art and the short display text in AI, then drop it into a layout for the fine typographic control.

The shift is not "AI does all the type now." It is that the headline, the logo treatment, the packaging callout, and the data viz (the display text that used to force a hand-built layout for every concept) now come out of the model clean enough to use or to fix in one edit. That is what collapses a five-tool design pass into one canvas.

The takeaway

AI could not spell, so we built a whole workflow around avoiding text. That workflow is now obsolete for display type.

Quote the exact words, specify placement and hierarchy, keep logos and labels tight, and fix the occasional wrong letter with an edit instead of a reroll. The model will spell for you in 2026. You just have to ask it like you mean the exact letters, because now it renders exactly what you ask.