Demo video: SPRITE transforms raw Game UI screenshots into editable, engine-native assets through semantic scaffolding and precision grounding
Game UI implementation requires translating stylized mockups into interactive engine entities. However, current "Screenshot-to-Code" tools often struggle with the irregular geometries and deep visual hierarchies typical of game interfaces. To bridge this gap, we introduce SPRITE, a pipeline that transforms static screenshots into editable engine assets. By combining Vision-Language Models (VLMs) with a structured YAML intermediate representation, SPRITE explicitly captures complex container relationships and non-rectangular layouts. We evaluated SPRITE on a curated Game UI benchmark and conducted expert reviews with professional developers to assess reconstruction fidelity and prototyping efficiency. Our findings demonstrate that SPRITE streamlines development by automating tedious coding and resolving complex nesting, effectively blurring the boundary between artistic design and technical implementation in game development.
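To make the idea of a hierarchical intermediate representation concrete, here is a minimal Python sketch under stated assumptions. The node schema (fields `id`, `type`, `bbox`, `polygon`, `children`) is hypothetical and illustrative, not SPRITE's published schema. The dict `spec` stands in for what `yaml.safe_load` might return from such a YAML file: containment is explicit through `children`, and a non-rectangular element carries an outline polygon alongside its bounding box. The helper `overflowing_children` is also hypothetical. It flags elements that extend past their container. In game UIs this can be intentional, as with an ornamental close button that overhangs a frame, so it is worth encoding explicitly rather than clipping.

```python
from dataclasses import dataclass, field
from typing import Optional


# Hypothetical IR node; field names are illustrative, not SPRITE's actual schema.
@dataclass
class UINode:
    id: str
    type: str                                # semantic role, e.g. "Panel", "Button"
    bbox: tuple[int, int, int, int]          # (x, y, width, height) in screenshot pixels
    polygon: Optional[list[tuple[int, int]]] = None  # outline for non-rectangular elements
    children: list["UINode"] = field(default_factory=list)  # explicit containment


def parse_node(d: dict) -> UINode:
    """Recursively build a UINode tree from a dict (e.g. the result of yaml.safe_load)."""
    polygon = [tuple(p) for p in d["polygon"]] if "polygon" in d else None
    return UINode(
        id=d["id"],
        type=d["type"],
        bbox=tuple(d["bbox"]),
        polygon=polygon,
        children=[parse_node(c) for c in d.get("children", [])],
    )


def contains(outer: tuple[int, int, int, int], inner: tuple[int, int, int, int]) -> bool:
    """True if box `inner` lies entirely within box `outer`."""
    ox, oy, ow, oh = outer
    ix, iy, iw, ih = inner
    return ox <= ix and oy <= iy and ix + iw <= ox + ow and iy + ih <= oy + oh


def overflowing_children(node: UINode) -> list[str]:
    """Hypothetical check: ids of elements that extend past their parent's box."""
    flagged = []
    for child in node.children:
        if not contains(node.bbox, child.bbox):
            flagged.append(child.id)
        flagged.extend(overflowing_children(child))
    return flagged


spec = {
    "id": "inventory_panel", "type": "Panel", "bbox": [0, 0, 400, 300],
    "children": [
        {"id": "slot_grid", "type": "Grid", "bbox": [20, 40, 360, 200],
         "children": [{"id": "slot_0", "type": "Button", "bbox": [30, 50, 64, 64]}]},
        # Diamond-shaped close button that overhangs the panel's top-right corner
        {"id": "close_btn", "type": "Button", "bbox": [370, -10, 40, 40],
         "polygon": [[370, 10], [390, -10], [410, 10], [390, 30]]},
    ],
}
tree = parse_node(spec)
print(overflowing_children(tree))  # ['close_btn']
```

Keeping the hierarchy as data rather than code has a plausible benefit: later stages could validate or correct geometry before any engine markup is produced.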
SPRITE overview. Our system transforms mockups into engine assets in three stages: (1) Semantic Scaffolding, where a VLM infers a schema-guided hierarchical YAML representation; (2) Precision Grounding, which uses 2D models for pixel-perfect extraction and geometric calibration; and (3) Engine-Native Synthesis, where a multimodal large language model (MLLM) generates executable UXML/USS code and interaction logic.
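UXML and USS are the markup and stylesheet formats of Unity's UI Toolkit: UXML declares the element tree and USS styles it. The Python sketch below shows one deterministic way a hierarchical node tree could be lowered into those two files. It is not SPRITE's generator, which the description above attributes to an MLLM. Several details are assumptions: the function `to_uxml_and_uss`, the type-to-tag table `TAG_FOR_TYPE`, and the convention of reusing each node's `id` as its USS class. Boxes are given in screenshot pixels and converted to parent-relative `left`/`top` offsets under absolute positioning. Attaching the stylesheet to the document and wiring interaction logic (for example, C# event handlers) are left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from IR node types to Unity UI Toolkit element tags.
TAG_FOR_TYPE = {"Panel": "ui:VisualElement", "Label": "ui:Label", "Button": "ui:Button"}


def to_uxml_and_uss(root: dict) -> tuple[str, str]:
    """Lower a node tree (bbox = x, y, w, h in screenshot pixels) to UXML and USS strings."""
    document = ET.Element("ui:UXML", {"xmlns:ui": "UnityEngine.UIElements"})
    rules: list[str] = []

    def emit(node: dict, parent_el: ET.Element, parent_origin: tuple[int, int]) -> None:
        x, y, w, h = node["bbox"]
        px, py = parent_origin
        # Unknown node types fall back to a plain container element.
        tag = TAG_FOR_TYPE.get(node["type"], "ui:VisualElement")
        el = ET.SubElement(parent_el, tag, {"name": node["id"], "class": node["id"]})
        if "text" in node:
            el.set("text", node["text"])
        # USS offsets are relative to the parent element, so subtract the parent's origin.
        rules.append(
            f".{node['id']} {{ position: absolute; left: {x - px}px; top: {y - py}px; "
            f"width: {w}px; height: {h}px; }}"
        )
        for child in node.get("children", []):
            emit(child, el, (x, y))

    # The root is placed at the origin of its own coordinate frame.
    emit(root, document, (root["bbox"][0], root["bbox"][1]))
    return ET.tostring(document, encoding="unicode"), "\n".join(rules)


screen = {
    "id": "pause_menu", "type": "Panel", "bbox": [100, 50, 300, 200],
    "children": [
        {"id": "title", "type": "Label", "bbox": [120, 60, 260, 40], "text": "Paused"},
        {"id": "resume_btn", "type": "Button", "bbox": [150, 120, 200, 50], "text": "Resume"},
    ],
}
uxml, uss = to_uxml_and_uss(screen)
print(uxml)
print(uss)
```

Absolute, parent-relative positioning reproduces a static mockup closely but tends to be brittle when resolution or aspect ratio changes. A production generator would likely prefer flex layout wherever the hierarchy allows it.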
If you find this research useful, please cite our paper: