AI Product Photography: From Flat-Lay to On-Model in Your Browser
AI product photography now runs the whole shoot in a browser: flat-lay to on-model, one product photo spun into dozens of lifestyle scenes, virtual try-on, and per-platform exports. Here is the full workflow, and the consistency fixes that keep a catalog from looking like fifty different shoots.
A real product shoot costs a few hundred to a few thousand dollars per session, and you get one set, one model, one location, and a week of waiting. AI product photography can produce the same catalog for cents an image, in a browser, in an afternoon. The cost gap is not subtle: roughly eighty to ninety-five percent.
The catch is that doing it badly is just as easy as doing it cheaply. Most AI catalogs fall apart in two places: the product subtly changes between shots, and the fifty images look like fifty different shoots stapled together. This is the full workflow, with the consistency fixes built into the steps where they matter, so you get a catalog that looks like one brand, not a pile of generations.
What you need before you start
- One or more clean product photos, ideally a flat-lay or a plain-background shot with the product sharp and well lit. This is your ground truth.
- An image model for generation and editing, and, if you sell apparel or anything worn, a model with virtual try-on or on-model generation.
- A short list of the contexts you need (studio white, lifestyle scene, in-use), plus the platforms you ship to (Amazon, Shopify, your site, the social channels).
- 200 free credits if you are doing this in Vilva. Sign up at vilva.ai.
Step 1, lock the product as ground truth
Before you generate a single scene, fix the product itself. The number one tell of AI catalog work is a product that drifts: the logo moves, the color shifts, a strap appears that was never there. Every later image must reference the real product, not a regenerated version of it.
Start from your cleanest real photo. If you only have one angle, generate the missing angles once, carefully, and treat those as part of your locked set. From here on, every scene references this set. The product never gets reinvented per image; it gets placed into each new context.
Expected result: a small, consistent set of the real product from the angles you need, saved as the reference every later image points back to.
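One way to make "locked" concrete is to record a checksum for every file in the reference set, so you can tell later whether a reference was swapped, re-exported, or quietly replaced by a regenerated version. The sketch below is only an illustration: the function names and the manifest layout are assumptions, not part of any particular tool.

```python
import hashlib
from pathlib import Path


def build_manifest(reference_dir):
    """Map each file in the locked reference folder to its SHA-256 digest."""
    manifest = {}
    for path in sorted(Path(reference_dir).iterdir()):
        if path.is_file():
            manifest[path.name] = hashlib.sha256(path.read_bytes()).hexdigest()
    return manifest


def changed_files(reference_dir, manifest):
    """Return names that were edited, added, or removed since the manifest was built."""
    current = build_manifest(reference_dir)
    names = set(current) | set(manifest)
    return sorted(name for name in names if current.get(name) != manifest.get(name))
```

Build the manifest once when you finish Step 1, then run `changed_files` before each batch. An empty list means every scene is still pointing at the same ground truth.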
Step 2, generate clean studio shots first
Get the boring, essential shots done before the creative ones. White-background pack shots are what marketplaces require and what your product page leads with, and they are the easiest to get consistent because there is no scene to manage.
Generate the product on clean backgrounds in the angles your listing needs, all referencing the Step 1 set so the product is identical across them. These also double as a sanity check: if the product is drifting here, on the simplest possible shot, fix it before you move to complex scenes.
Expected result: a set of marketplace-ready white-background shots in the required angles, all showing the identical, on-brand product.
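The sanity check above can be made slightly less subjective for color. The sketch below compares the average color of pixels sampled from the product in the reference photo against the same region in a generated shot. It assumes you have already extracted those pixels as `(r, g, b)` tuples with whatever image library you use, and the threshold is a hypothetical starting value, not a standard. It only catches color shift; a moved logo or an invented strap still needs your eyes.

```python
def mean_rgb(pixels):
    """Average a list of (r, g, b) tuples channel by channel."""
    n = len(pixels)
    return tuple(sum(p[i] for p in pixels) / n for i in range(3))


def color_drift(reference_pixels, candidate_pixels):
    """Euclidean distance between the two mean colors, in 0-255 RGB units."""
    ref = mean_rgb(reference_pixels)
    cand = mean_rgb(candidate_pixels)
    return sum((a - b) ** 2 for a, b in zip(ref, cand)) ** 0.5


DRIFT_THRESHOLD = 12.0  # hypothetical cutoff; tune it against your own product


def has_drifted(reference_pixels, candidate_pixels, threshold=DRIFT_THRESHOLD):
    """True if the candidate's mean color is further from the reference than the threshold."""
    return color_drift(reference_pixels, candidate_pixels) > threshold
```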
Step 3, spin one product into many lifestyle scenes
This is where AI earns its keep. From the locked product you can generate dozens of lifestyle contexts (on a kitchen counter, in a bathroom, outdoors, in use) without staging a single set. The product stays fixed; only the world around it changes.
Decide the lighting and mood once and hold it across the scenes so they feel like one campaign. Reference the product set in every scene so the thing on the counter is the thing on the shelf is the thing in the listing. Variety in context is the goal; drift in the product is the failure.
Expected result: a batch of lifestyle scenes that vary the setting while keeping the product, lighting mood, and brand feel consistent across all of them.
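If you script your batches, the rule "vary the scene, hold everything else" can be enforced by construction. The sketch below builds one job description per scene while passing the same lighting string and the same product references into every job. The `SceneJob` fields, the prompt wording, and the file names are all assumptions for illustration; how a job is actually submitted depends on the tool you use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneJob:
    scene: str
    lighting: str
    product_refs: tuple
    prompt: str


def build_scene_jobs(scenes, lighting, product_refs):
    """One job per scene; lighting and product references are identical in all of them."""
    refs = tuple(product_refs)
    return [
        SceneJob(
            scene=scene,
            lighting=lighting,
            product_refs=refs,
            prompt=f"The referenced product, unchanged, {scene}. Lighting: {lighting}.",
        )
        for scene in scenes
    ]


jobs = build_scene_jobs(
    scenes=["on a kitchen counter", "on a bathroom shelf", "outdoors on a picnic table"],
    lighting="soft morning window light, warm tone",
    product_refs=["front.png", "side.png"],
)
```

Because the lighting and references are arguments to the batch rather than to each job, there is no per-image place for them to drift.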
Step 4, on-model and virtual try-on for anything worn
If you sell apparel, accessories, or anything held or worn, you need it on a person, and this is the highest-stakes shot. Generate or reuse a consistent model, then place the garment on them with virtual try-on, or generate the on-model shot directly with the product referenced.
Two things make or break it. The garment has to keep its real cut, color, and print, so pass the locked product photos as the reference in every try-on generation. And the model should be consistent if you want a coherent catalog: one or a small set of recurring models reads as a brand, while a fresh stranger per item reads as a stock-photo bin. Lock the model the same way you locked the product.
Expected result: on-model shots where the garment keeps its true cut and print, worn by a consistent model or small model set across the line.
Step 5, fix the contact points, hands and drape
The realism tells in product photography are specific. For held products it is the hand: the grip floats, the fingers pass through, the scale is subtly wrong. For worn products it is the drape: fabric that hangs like plastic or clings where it should fall.
Spend your QC attention here. For held items, reference a real hand-holding-object image so the finger geometry is right and check the product is the correct size for the hand. For worn items, look at how the fabric folds at the shoulders and waist. These contact points are where a viewer's eye goes first and where "AI" gets clocked fastest.
Expected result: held products with believable grip and scale, and worn products whose fabric drapes naturally at the contact points.
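The scale check for held items can be reduced to one number: how wide the product is relative to the hand, measured in a real photo and again in the generated one. The sketch below assumes you measure those pixel widths yourself (by eye in an editor is enough), and the 15% tolerance is an arbitrary starting point rather than an established figure.

```python
def width_ratio(product_width_px, hand_width_px):
    """Product width relative to hand width, both measured in the same image."""
    return product_width_px / hand_width_px


def scale_is_plausible(real_ratio, generated_ratio, tolerance=0.15):
    """True if the generated ratio is within a relative tolerance of the real one."""
    return abs(generated_ratio - real_ratio) <= tolerance * real_ratio
```

For example, if the product is 300 px wide against a 400 px hand in the real photo, a generated shot at 320 px against 400 px passes, while one at 480 px against 400 px does not.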
Step 6, export the right size for every platform
The same image needs to ship at different dimensions: Amazon's preferred square format, Shopify's ratios, your social channels' verticals. Do not crop blindly and chop off the product. Generate or reframe each crop so the product stays composed in frame at every ratio.
Batch this. One master image becomes a set of platform-correct exports, each with the product well placed, not a center-crop that amputates it. Per-channel sizing is unglamorous, and getting it wrong is half of why catalogs look unprofessional.
Expected result: every image exported at the correct dimensions for each platform, with the product properly composed in each crop rather than blind-cropped.
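The difference between a blind center-crop and a product-aware crop is easy to see in numbers. The sketch below, which assumes you already know the product's bounding box in the master image, computes the largest crop of a target ratio, centers it on the product instead of the image, and clamps it to the image bounds. The function names are hypothetical, and the actual pixel cropping is left to your image tool.

```python
def crop_for_ratio(image_size, product_box, ratio):
    """Return (left, top, right, bottom) of the largest crop with the given
    width:height ratio, centered on the product and clamped to the image."""
    img_w, img_h = image_size
    ratio_w, ratio_h = ratio
    # Largest crop of the target ratio that fits inside the image.
    crop_w = min(img_w, img_h * ratio_w / ratio_h)
    crop_h = crop_w * ratio_h / ratio_w
    # Center the crop on the product, then push it back inside the image.
    p_left, p_top, p_right, p_bottom = product_box
    center_x = (p_left + p_right) / 2
    center_y = (p_top + p_bottom) / 2
    left = min(max(center_x - crop_w / 2, 0), img_w - crop_w)
    top = min(max(center_y - crop_h / 2, 0), img_h - crop_h)
    return (round(left), round(top), round(left + crop_w), round(top + crop_h))


def contains(crop, box):
    """True if box lies entirely inside crop."""
    return (crop[0] <= box[0] and crop[1] <= box[1]
            and crop[2] >= box[2] and crop[3] >= box[3])
```

Take a 2000 x 1000 master with the product at `(1400, 200, 1800, 800)`. A centered square crop would span x = 500 to 1500 and cut the product off, while `crop_for_ratio` shifts the square to cover x = 1000 to 2000 and keeps it whole. When `contains` returns False, the product is simply too large for that ratio, and the image needs to be reframed or regenerated rather than cropped.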
Where this runs
The reason AI product photography usually looks like a patchwork is the tooling. You remove backgrounds in one app, generate scenes in another, do try-on in a third, and resize in a fourth, and the product degrades a little at every export, so by image fifty it has quietly drifted.
On Vilva the whole shoot is one canvas. The product you lock in Step 1 feeds the studio shots, the lifestyle scenes, and the on-model try-on as the same reference, so it stays identical from pack shot to lifestyle to model. Background work, scene generation, try-on, and per-platform export are nodes in one graph, and the agent carries the locked product and model across every image, which is what keeps a fifty-image catalog looking like one brand. Free to try at vilva.ai (200 credits on signup).
Troubleshooting and next steps
- The product changes between shots. You regenerated it instead of placing it. Reference the Step 1 locked set in every image.
- The catalog looks like different shoots. Your lighting mood drifts (Step 3). Decide the light once and hold it across scenes.
- The garment loses its print on-model. Weak product reference in try-on (Step 4). Use the locked Step 1 photos as the garment reference, not a regenerated version.
- The hand holding it looks fake. Grip and scale (Step 5). Reference a real hand-holding-object image and check size against a real photo.
- The crops cut off the product. You center-cropped (Step 6). Reframe per ratio so the product stays composed.
- Next step: once one product runs the full chain clean, save the model and the lighting setup as reusable assets. The next SKU is mostly swapping the product into the same scenes.
The takeaway
AI product photography is cheap and fast, and that is exactly why most of it looks cheap and fast. The difference is consistency: the product and the model locked once and reused everywhere, not reinvented per image.
Lock the product, hold the light, fix the hands and the drape, and size it right for every channel. A catalog that looks like one brand does not come from a better model; it comes from the same product placed into many worlds instead of generated fifty times.