Vector graphics are essential in design, providing artists with a versatile medium for creating resolution-independent and highly editable visual content. Recent advancements in vision-language and diffusion models have fueled interest in text-to-vector graphics generation. However, existing approaches often suffer from over-parameterized outputs or treat the layered structure, a core feature of vector graphics, as a secondary goal, which diminishes their practical usefulness. Recognizing the importance of layered SVG representations, we propose NeuralSVG, an implicit neural representation for generating vector graphics from text prompts. Inspired by Neural Radiance Fields (NeRFs), NeuralSVG encodes the entire scene into the weights of a small MLP network, optimized using Score Distillation Sampling (SDS). To encourage a layered structure in the generated SVG, we introduce a dropout-based regularization technique that strengthens the standalone meaning of each shape. We additionally demonstrate that a neural representation provides the added benefit of inference-time control, enabling users to dynamically adapt the generated SVG based on their inputs, all with a single learned representation. Through extensive qualitative and quantitative evaluations, we demonstrate that NeuralSVG outperforms existing methods in generating structured and flexible SVGs.
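To make the idea of an implicit shape representation with shape dropout concrete, the following is a minimal sketch, not the actual NeuralSVG implementation. Everything in it is an assumption made for illustration: the sinusoidal index encoding, the layer sizes, the four-point shape with an RGB fill, and in particular the dropout scheme, which is modeled here as sampling a truncation length and keeping only the first shapes. The SDS objective and the differentiable rasterizer are omitted entirely.

```python
import math
import random


def init_mlp(sizes, rng):
    """Create random weights for a small fully connected network (hypothetical sizes)."""
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def encode_index(i, num_freqs=4):
    """Sinusoidal encoding of a shape index (an assumed choice of embedding)."""
    feats = []
    for f in range(num_freqs):
        feats.append(math.sin((2 ** f) * i))
        feats.append(math.cos((2 ** f) * i))
    return feats


def mlp_forward(layers, x):
    """Apply the network: ReLU on hidden layers, linear output layer."""
    for depth, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        if depth < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def decode_shape(layers, i, num_points=4):
    """Map a shape index to normalized control points and an RGB color."""
    out = [sigmoid(v) for v in mlp_forward(layers, encode_index(i))]
    points = [(out[2 * j], out[2 * j + 1]) for j in range(num_points)]
    color = tuple(out[2 * num_points: 2 * num_points + 3])
    return {"points": points, "color": color}


def sample_truncated_scene(layers, num_shapes, rng):
    """Assumed dropout scheme: keep only the first k shapes for a random k."""
    k = rng.randint(1, num_shapes)
    return [decode_shape(layers, i) for i in range(k)]


rng = random.Random(0)
# Input: 8 encoding features; output: 4 points * 2 coordinates + 3 color channels.
layers = init_mlp([8, 32, 11], rng)
scene = sample_truncated_scene(layers, num_shapes=16, rng=rng)
```

In a training loop under these assumptions, each truncated scene would be rasterized and scored, so that every prefix of the shape list has to depict the prompt on its own.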
We show results generated by our method when keeping a varying number of learned shapes in the final rendering. Even with a small number of shapes, our approach effectively captures the coarse structure of the scene.
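Keeping a varying number of shapes can be illustrated with a short sketch. It assumes that the learned shapes are available as an ordered list and that dropping shapes means truncating that list. The shape format (normalized polygon vertices plus an RGB fill) and the helper name `shapes_to_svg` are hypothetical and chosen only for this example.

```python
def shapes_to_svg(shapes, keep, size=256):
    """Serialize the first `keep` shapes as filled polygons in an SVG string.

    Each shape is a dict with normalized "points" in [0, 1] and an RGB
    "color" in [0, 1]; this format is an assumption made for the sketch.
    """
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 {size} {size}">']
    for shape in shapes[:keep]:
        pts = " ".join(f"{x * size:.1f},{y * size:.1f}" for x, y in shape["points"])
        r, g, b = (round(255 * c) for c in shape["color"])
        parts.append(f'  <polygon points="{pts}" fill="rgb({r},{g},{b})"/>')
    parts.append("</svg>")
    return "\n".join(parts)


# Three hand-written placeholder shapes, ordered from coarse to fine.
example_shapes = [
    {"points": [(0.1, 0.1), (0.9, 0.1), (0.9, 0.9), (0.1, 0.9)], "color": (0.2, 0.4, 0.8)},
    {"points": [(0.3, 0.3), (0.7, 0.3), (0.5, 0.7)], "color": (1.0, 0.8, 0.0)},
    {"points": [(0.45, 0.45), (0.55, 0.45), (0.5, 0.55)], "color": (0.0, 0.0, 0.0)},
]

coarse_svg = shapes_to_svg(example_shapes, keep=1)
full_svg = shapes_to_svg(example_shapes, keep=3)
```

Because earlier shapes appear first in the file, they are drawn first and sit beneath later ones, so truncating the list removes the topmost details while leaving the coarse layers intact.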
Given a learned representation, we render the result using different background colors specified by the user, which yields varying color palettes in the generated SVGs. The top row shows colors observed during training, while the bottom row shows unobserved (generalized) colors.
We present results from optimizing NeuralSVG with aspect ratios of 1:1 and 4:1. In each pair, the left image displays the generated SVG with a 1:1 aspect ratio (square format), while the right image shows the model's output with a 4:1 aspect ratio.
NeuralSVG generates sketches with varying numbers of strokes using a single network, without any modifications to the framework.