Authors: Eric Rubin and Dmitriy Pinskiy
Date: October 2025
This paper introduces a novel generative workflow that combines 3D Gaussian Splatting (3DGS) with Low-Rank Adaptation (LoRA) of latent diffusion models such as Stable Diffusion (SD). The method is designed not to guarantee exact product reproduction, but to strongly bias the generative model toward maintaining the product’s recognizable geometry and appearance across multiple views and stylistic transformations.
This framework unlocks three core use cases:
● Generate new interpretations, embellishments, or alternative versions inspired by the captured physical object.
● Rapidly explore product-line ideas, shape variants, colorways, and structural modifications without manual 3D modeling.
● Reimagine the product across photographic, editorial, artistic, or material styles while preserving its core identity cues.
Two complementary approaches are presented:
● DreamBooth-style LoRA, fine-tuned on multi-view synthetic renders produced from 3DGS.
● Geometry-Injected LoRA (GeoLoRA), which conditions the diffusion model on explicit 3DGS geometry signals.
Both approaches are practical, scalable, and cost-efficient for product visualization, eCommerce pipelines, and generative design tools.
Generative diffusion models excel at style, lighting, mood, and composition, but are weak at:
● preserving exact geometry,
● maintaining multi-view consistency,
● controlling identity across styles.
DreamBooth improves identity retention, but 2D training images are inherently limited:
● restricted view coverage
● inconsistent lighting
● segmentation noise
● no depth or structural cues
This motivates the integration of 3D representations into the generative process.
3DGS reconstructs an object from casual video into a dense set of Gaussians capturing both geometry and appearance. This makes it uniquely suited to support generative models:
● Render any camera angle → perfect for multi-view LoRA training.
● Depth, normal, and mask maps aligned with the RGB render.
● Captures fine-grained details such as reflections, microstructure, gemstone behavior, and material roughness.
● Unlike meshes, 3DGS requires no UVs, topology, or retopology.
● Supports large-scale batch workflows.
3DGS + LoRA:
● does not enforce exact shape reproduction,
● does not act as a hard renderer,
● cannot guarantee pixel-accurate identity preservation.
Instead, it provides a strong bias toward the captured object's identity, dramatically reducing, but not eliminating, distortions.
This honesty is essential for real-world applications.
This method treats 3DGS as an infinite clean data generator, producing multiview synthetic photographs used to fine-tune LoRA weights.
From a single 3DGS:
● ~180 views (36 azimuth × 5 elevation)
● controlled lighting variation
● solid, neutral, and blurred photographic backgrounds
● optional depth, normals, and masks
This yields a dataset of 300–600 high-quality images ideal for LoRA.
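The following sketch enumerates such a camera orbit. The look_at helper, the sampling parameters, and the camera convention are illustrative assumptions; each returned pose would be fed to your 3DGS rasterizer to produce the RGB image plus optional depth, normal, and mask maps.

import numpy as np

def look_at(eye, target, up=np.array([0.0, 0.0, 1.0])):
    # Build a world-to-camera rotation and translation (convention is illustrative).
    fwd = target - eye; fwd /= np.linalg.norm(fwd)
    right = np.cross(fwd, up); right /= np.linalg.norm(right)
    down = np.cross(fwd, right)
    R = np.stack([right, down, fwd])   # camera axes as rows
    return R, -R @ eye

def build_camera_orbit(n_azimuth=36, n_elevation=5, radius=2.5,
                       elev_range=(-10.0, 60.0)):
    # 36 azimuth steps × 5 elevation rings = 180 viewpoints around the object.
    poses = []
    for elev in np.linspace(elev_range[0], elev_range[1], n_elevation):
        for azim in np.linspace(0.0, 360.0, n_azimuth, endpoint=False):
            theta, phi = np.radians(azim), np.radians(elev)
            eye = radius * np.array([np.cos(phi) * np.cos(theta),
                                     np.cos(phi) * np.sin(theta),
                                     np.sin(phi)])
            poses.append(look_at(eye, target=np.zeros(3)))
    return poses

Varying lighting and backgrounds per pose then multiplies the 180 base views into the 300–600 training images quoted above.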
LoRA updates only low-rank matrices in SD’s U-Net:
● main SD weights remain frozen
● geometry influence comes implicitly from consistent imagery
● LoRA learns a token representing the product identity (e.g. [RING123])
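As a point of reference, here is a minimal sketch of a LoRA-wrapped linear layer of the kind applied to SD's attention projections; the rank and scaling values are illustrative, not prescriptive.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    # Frozen base projection W plus a trainable low-rank update scale * B @ A.
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)   # main SD weights stay frozen
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero-init: no-op at start
        self.scale = alpha / rank

    def forward(self, x):
        # Only A and B receive gradients during fine-tuning.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)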
Example prompts:
● “Reimagine [RING123] in an organic Art Nouveau style.”
● “Produce thinner, thicker, or more angular versions of [RING123].”
● “Editorial fashion photo of [RING123] under neon lighting.”
The product remains recognizable, though exact reproduction is not guaranteed.
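One plausible inference setup with Hugging Face diffusers is sketched below; the model ID and LoRA path are placeholders, and the identity token must match the one used during training.

import torch
from diffusers import StableDiffusionPipeline

# Load the frozen base model, then attach the product-specific LoRA weights.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights("./ring123_lora")   # hypothetical output dir of LoRA training

image = pipe("Editorial fashion photo of [RING123] under neon lighting").images[0]
image.save("ring123_editorial.png")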
Though powerful, DreamBooth LoRA suffers from:
● shape drift in extreme styles
● no explicit geometric constraints
● difficulty controlling exact viewpoint
● inability to enforce multi-view consistency
This motivates an architecture that directly injects 3D geometry into diffusion.
Geometry-Injected LoRA introduces explicit 3DGS signals into SD’s architecture.
We present three variants, with increasing strength of geometric conditioning:
Depth + normals → geometry encoder → bias in cross-attention logits. Text tokens attend differently depending on geometry. This increases the likelihood of geometric consistency without fully constraining the model.
3DGS Geometry (Depth, Normals)
        |
        v
Geometry Encoder (F_geo)
        |
        v
SD Cross-Attention Block
logits = (QKᵀ)/√d + B_geo(F_geo, T)
(Pseudocode for this block is given at the end of this paper.)
Use cases:
● Consistent perspective across variations
● Style-prototyping while reducing shape drift
Add geometry encoder output directly into U-Net latent input.
z0 = z_noisy + conv_geo(F_geo)
The U-Net receives spatial structure directly, increasing structural fidelity while still allowing stylistic transformation (a minimal sketch follows the list below).
Use cases:
● Material swaps
● Colorway changes
● Controlled shape alterations
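A minimal sketch of the latent-injection step, assuming F_geo is a geometry feature map already resized to the latent resolution; conv_geo is the only new module shown and is zero-initialized so training starts from the unmodified model.

import torch
import torch.nn as nn

class GeoLatentInjection(nn.Module):
    # Adds projected geometry features to the noisy latent before the U-Net.
    def __init__(self, geo_channels: int = 64, latent_channels: int = 4):
        super().__init__()
        # 1x1 conv mapping geometry features into the SD latent space.
        self.conv_geo = nn.Conv2d(geo_channels, latent_channels, kernel_size=1)
        nn.init.zeros_(self.conv_geo.weight)   # start as a no-op: z0 == z_noisy
        nn.init.zeros_(self.conv_geo.bias)

    def forward(self, z_noisy, F_geo):
        # z0 = z_noisy + conv_geo(F_geo), matching the formula above
        return z_noisy + self.conv_geo(F_geo)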
A lightweight geometry-processing branch (LoRA-powered mini-U-Net) injects multi-scale feature maps into SD’s U-Net.
Geometry Branch (LoRA mini U-Net)
     |      |      |
     R1     R2     R3 ...
      \     |     /
U-Net Down / Middle / Up Blocks
   + geometry residuals
Most accurate geometric consistency among the LoRA models, though still not a hard constraint (a sketch of the residual injection follows the list below).
Use cases:
● Camera-controlled generation
● Consistent product animation frames
● Design exploration with structural coherence
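A hedged sketch of the geometry branch producing multi-scale residuals R1–R3; the channel widths, input layout, and ControlNet-style zero initialization are illustrative assumptions rather than SD internals.

import torch
import torch.nn as nn

class GeometryBranch(nn.Module):
    # Mini encoder producing residuals R1, R2, R3 from stacked geometry maps.
    def __init__(self, in_ch: int = 5, widths=(320, 640, 1280)):
        super().__init__()
        chans = [in_ch, *widths]                 # in_ch = depth(1) + normals(3) + mask(1)
        self.stages = nn.ModuleList(
            nn.Sequential(nn.Conv2d(chans[i], chans[i + 1], 3, stride=2, padding=1),
                          nn.SiLU())
            for i in range(len(widths)))
        # Zero-initialized output convs so every residual starts at zero.
        self.out = nn.ModuleList(nn.Conv2d(w, w, kernel_size=1) for w in widths)
        for conv in self.out:
            nn.init.zeros_(conv.weight); nn.init.zeros_(conv.bias)

    def forward(self, geo):
        residuals, x = [], geo
        for stage, out in zip(self.stages, self.out):
            x = stage(x)
            residuals.append(out(x))             # R1, R2, R3 at decreasing resolution
        return residuals

During denoising, each residual would be added to the U-Net activation of matching resolution, with only this branch and the LoRA matrices receiving gradients.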
This section focuses on the three core use cases.
Produce novel artistic or concept-driven reinterpretations of the original product:
● ethnic-inspired ring motifs
● minimalist reinterpretations
● futuristic reinterpretations
● “luxury editorial” flavor
● abstract or sculptural variations
3DGS LoRA ensures the creative output remains inspired by the original product, even if not perfectly preserved.
Use the model as a rapid exploratory engine:
● explore thickness, curvature, and silhouette
● test new gemstone arrangements
● reshape handles, straps, clasps, and folds
● try alternative body shapes for apparel
● iterate on industrial designs
This drastically reduces iteration time for artistic and manufacturing prototyping.
The system excels at applying stylistic transformations while retaining key identity cues:
● studio lighting concepts
● photography style boards
● campaign looks (high fashion, indie editorial, cinematic)
● color palettes, materials, surface finishes
● eCommerce hero shots
This enables rapid creative direction work.
A crucial caveat: LoRA combined with 3DGS cannot guarantee perfect identity preservation.
Instead, the method:
● increases the likelihood of recognizable identity,
● stabilizes geometry across views,
● reduces shape drift,
● provides controllable generative variation.
For exact preservation, one must use renderers or differentiable geometry pipelines, which are beyond the scope of LoRA.
3DGS-driven LoRA is a powerful and practical approach to building geometry-aware generative tools for product content creation, creative design, and stylistic prototyping.
While it does not guarantee perfect structural preservation, it significantly biases diffusion models toward:
● consistent multi-view geometry
● recognizable product identity
● high-quality stylistic variations
● scalable automation
● rapid creative exploration
This combination of flexibility + structural bias makes it uniquely effective for modern eCommerce, digital art, and design R&D workflows.
3DGS Reconstruction
        |
        v
Synthetic Dataset Builder (RGB, Depth, Normals, Masks)
        |
        v
Choose LoRA Approach:
  • DreamBooth-Style LoRA
  • GeoLoRA v1 (Attention Bias)
  • GeoLoRA v2 (Latent Injection)
  • GeoLoRA v3 (Hybrid)
        |
        v
Train LoRA (Low-Rank Updates Only)
        |
        v
Geometry-Biased Generative Model
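Appendix: geometry-biased cross-attention (GeoLoRA v1). The projection and embedding layers (proj_q, proj_k, proj_v, lin_geo, lin_text) are assumed to be defined elsewhere, with LoRA applied to the projections; tensor shapes are annotated inline.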
import math
import torch
import torch.nn.functional as F

def cross_attn_with_geo(x, T, F_geo, proj_q, proj_k, proj_v, lin_geo, lin_text):
    # x: (b, n, d) image latent tokens; T: (b, l, d) text tokens;
    # F_geo: (b, c, h, w) geometry features with h * w == n.
    Q, K, V = proj_q(x), proj_k(T), proj_v(T)
    d = Q.shape[-1]
    G = lin_geo(F_geo.flatten(2).transpose(1, 2))   # (b, n, d) geometry tokens
    U = lin_text(T)                                  # (b, l, d)
    B_geo = torch.einsum("bpd,bld->bpl", G, U)       # bias per (position, text token)
    logits = Q @ K.transpose(-2, -1) / math.sqrt(d) + B_geo
    A = F.softmax(logits, dim=-1)
    return A @ V                                     # (b, n, d)