NeuralSVG

Abstract

Vector graphics are essential in design, providing artists with a versatile medium for creating resolution-independent and highly editable visual content. Recent advancements in vision-language and diffusion models have fueled interest in text-to-vector graphics generation. However, existing approaches often suffer from over-parameterized outputs or treat the layered structure, a core feature of vector graphics, as a secondary goal, diminishing their practical use. Recognizing the importance of layered SVG representations, we propose NeuralSVG, an implicit neural representation for generating vector graphics from text prompts. Inspired by Neural Radiance Fields (NeRFs), NeuralSVG encodes the entire scene into the weights of a small MLP network, optimized using Score Distillation Sampling (SDS). To encourage a layered structure in the generated SVG, we introduce a dropout-based regularization technique that strengthens the standalone meaning of each shape. We additionally demonstrate that utilizing a neural representation provides the added benefit of inference-time control, enabling users to dynamically adapt the generated SVG based on user-provided inputs, all with a single learned representation. Through extensive qualitative and quantitative evaluations, we demonstrate that NeuralSVG outperforms existing methods in generating structured and flexible SVGs.

How Does it Work?

We learn an implicit neural representation for generating vector graphics from text prompts. We encode the SVG into the weights of a small MLP network, optimized using Score Distillation Sampling (SDS).
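To make "encoding the SVG into the weights of a small MLP" concrete, here is a minimal sketch of one way such a network could look. Everything in it is an assumption for illustration: the NeRF-style Fourier encoding of the shape index, the single hidden layer, and the two heads (tanh for control points in [-1, 1], sigmoid for RGB in [0, 1]) are hypothetical choices, not the authors' architecture. The weights are random here; in the actual method they would be optimized with SDS, which is not shown.

```python
import math
import random


def fourier_encode(index, num_shapes, num_freqs=4):
    """NeRF-style positional encoding of a shape index (assumed input format)."""
    t = index / max(num_shapes - 1, 1)  # normalise the index to [0, 1]
    feats = []
    for k in range(num_freqs):
        freq = (2.0 ** k) * math.pi
        feats.extend([math.sin(freq * t), math.cos(freq * t)])
    return feats


def init_linear(n_in, n_out, rng):
    """Random dense layer stored as (weights, bias)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(layer, x):
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


class ShapeMLP:
    """Hypothetical tiny MLP: shape index -> (control points, RGB colour)."""

    def __init__(self, num_shapes, num_points=4, num_freqs=4, hidden=32, seed=0):
        rng = random.Random(seed)
        self.num_shapes = num_shapes
        self.num_freqs = num_freqs
        self.hidden_layer = init_linear(2 * num_freqs, hidden, rng)
        self.point_head = init_linear(hidden, 2 * num_points, rng)
        self.color_head = init_linear(hidden, 3, rng)

    def __call__(self, index):
        x = fourier_encode(index, self.num_shapes, self.num_freqs)
        h = [max(0.0, v) for v in linear(self.hidden_layer, x)]  # ReLU
        coords = [math.tanh(v) for v in linear(self.point_head, h)]  # in [-1, 1]
        points = list(zip(coords[0::2], coords[1::2]))  # (x, y) pairs
        rgb = [1.0 / (1.0 + math.exp(-v)) for v in linear(self.color_head, h)]  # in [0, 1]
        return points, rgb

    def decode_scene(self):
        """Query the network once per shape; the whole SVG lives in the weights."""
        return [self(i) for i in range(self.num_shapes)]
```

For example, `ShapeMLP(num_shapes=16).decode_scene()` returns 16 (points, colour) pairs that a differentiable rasterizer could turn into an image for the SDS loss. The shape count is fixed by how many indices are queried, not by the number of parameters, which is one reason such a representation avoids over-parameterized outputs.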

To promote an ordered representation, we use a dropout-based technique that encourages each learned shape to have a meaningful and ordered role in the overall scene.
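The following is a minimal sketch of the nested-dropout idea under stated assumptions: at each optimization step a truncation point k is sampled and only the first k shapes are kept, so the kept set is always a prefix. The sampling schedule (keep all shapes with probability `p_keep_all`, otherwise draw k uniformly) and the function names are hypothetical, not taken from the paper.

```python
import random


def sample_truncation(num_shapes, rng, p_keep_all=0.5):
    # Hypothetical schedule: keep every shape with probability p_keep_all,
    # otherwise draw the truncation point k uniformly from 1..num_shapes.
    if rng.random() < p_keep_all:
        return num_shapes
    return rng.randint(1, num_shapes)


def nested_dropout_mask(num_shapes, k):
    # Unlike ordinary dropout, the surviving shapes always form a prefix:
    # shape i is kept iff i < k, so low-index shapes are kept most often.
    return [i < k for i in range(num_shapes)]


def truncate_scene(shapes, rng, p_keep_all=0.5):
    """Drop a random suffix of the ordered shape list for one training step."""
    k = sample_truncation(len(shapes), rng, p_keep_all)
    mask = nested_dropout_mask(len(shapes), k)
    return [shape for shape, keep in zip(shapes, mask) if keep]
```

Because shape 0 is present in every step while the last shape is present only when nothing is dropped, the optimization pressure pushes the earliest shapes to capture the coarse layout on their own and later shapes to add detail.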

Our neural representation enables inference-time control over the generated asset, such as dynamically adjusting the color palette or aspect ratio of the generated SVG, all with a single learned representation.

What Can it Do?

Encouraging Ordered Representation with Nested Dropout

We show results generated by our method when keeping a varying number of learned shapes in the final rendering. Even with a small number of shapes, our approach effectively captures the coarse structure of the scene.
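Keeping "a varying number of learned shapes" amounts to rendering only a prefix of the ordered shape list. The sketch below illustrates this with a painter's-algorithm rasterizer on a character grid. As a simplifying assumption, shapes are filled circles `(cx, cy, r, fill)` rather than the closed paths an SVG would contain, and the `render` function is hypothetical.

```python
def render(shapes, width, height, background=".", k=None):
    """Rasterise shapes back-to-front (painter's algorithm), keeping only the first k."""
    if k is not None:
        shapes = shapes[:k]  # truncate to a prefix, matching the nested-dropout order
    canvas = [[background] * width for _ in range(height)]
    for cx, cy, r, fill in shapes:  # later shapes are painted on top of earlier ones
        for y in range(height):
            for x in range(width):
                if (x - cx) ** 2 + (y - cy) ** 2 <= r * r:
                    canvas[y][x] = fill
    return ["".join(row) for row in canvas]


# A coarse body first, then two small details painted over it.
shapes = [(5, 5, 4, "#"), (3, 4, 1, "o"), (7, 4, 1, "o")]
for row in render(shapes, 11, 11, k=1):  # only the coarse structure
    print(row)
```

With an ordered representation, `k=1` still yields a recognizable silhouette, and increasing `k` only adds detail on top of it.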

Color Palette Control

Given a learned representation, we render the result using different background colors specified by the user, producing different color palettes in the output SVGs. The top row shows colors observed during training, while the bottom row shows colors not observed during training, demonstrating generalization.
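One way a single network can yield different palettes is to feed the background color to the color prediction as an extra input. The sketch below assumes exactly that. `ConditionedColorHead`, the `TRAIN_BACKGROUNDS` palette, and the per-step sampling in `training_condition` are hypothetical illustrations, not the authors' implementation; the idea is only that colors sampled during training are a finite set, while at inference any RGB value can be passed.

```python
import math
import random

# Hypothetical palette of background colours seen during training (RGB in [0, 1]).
TRAIN_BACKGROUNDS = {
    "white": (1.0, 1.0, 1.0),
    "black": (0.0, 0.0, 0.0),
    "red": (1.0, 0.0, 0.0),
    "blue": (0.0, 0.0, 1.0),
}


class ConditionedColorHead:
    """Hypothetical colour head: (shape features, background RGB) -> shape RGB."""

    def __init__(self, feat_dim, seed=0):
        rng = random.Random(seed)
        n_in = feat_dim + 3  # shape features plus the 3 background channels
        self.weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(3)]
        self.bias = [0.0, 0.0, 0.0]

    def __call__(self, features, background_rgb):
        x = list(features) + list(background_rgb)  # concatenate the condition
        logits = [sum(w * xi for w, xi in zip(row, x)) + b
                  for row, b in zip(self.weights, self.bias)]
        return tuple(1.0 / (1.0 + math.exp(-v)) for v in logits)  # RGB in [0, 1]


def training_condition(rng):
    # Assumed training scheme: pick one known background per step and render
    # the SVG over it, so the guidance sees shapes against that colour.
    name = rng.choice(sorted(TRAIN_BACKGROUNDS))
    return name, TRAIN_BACKGROUNDS[name]
```

At inference, calling the head with an unseen color such as `(0.2, 0.6, 0.3)` requires no retraining; whether the resulting palette looks sensible depends on how well the learned weights generalize, which is what the bottom row probes.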

Aspect Ratio Control

We present results from optimizing NeuralSVG with aspect ratios of 1:1 and 4:1. In each pair, the left image displays the generated SVG with a 1:1 aspect ratio (square format), while the right image shows the model's output with a 4:1 aspect ratio.
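The sketch below shows the geometric side of aspect-ratio control under stated assumptions: the network emits control points in a normalized [-1, 1] square, and these are stretched onto a canvas whose width is `aspect_ratio` times its height, with a matching SVG `viewBox`. The `condition_vector` function, which feeds the log aspect ratio to the network so 1:1 maps to zero, is a hypothetical conditioning choice; the canvas height of 256 is arbitrary.

```python
import math


def canvas_size(aspect_ratio, height=256):
    """Pixel size of a width:height canvas, e.g. aspect_ratio=4.0 for 4:1."""
    return round(aspect_ratio * height), height


def to_canvas(points, aspect_ratio, height=256):
    """Map control points from [-1, 1]^2 to pixel coordinates on the canvas."""
    width, height = canvas_size(aspect_ratio, height)
    return [((x + 1.0) * 0.5 * width, (y + 1.0) * 0.5 * height) for x, y in points]


def condition_vector(aspect_ratio):
    # Hypothetical conditioning input: log ratio, so 1:1 -> 0.0 and 4:1 -> log(4).
    return [math.log(aspect_ratio)]


def svg_header(aspect_ratio, height=256):
    """Opening <svg> tag whose viewBox matches the requested aspect ratio."""
    width, height = canvas_size(aspect_ratio, height)
    return (f'<svg xmlns="http://www.w3.org/2000/svg" '
            f'viewBox="0 0 {width} {height}" width="{width}" height="{height}">')
```

Stretching alone would simply distort a 1:1 design; the point of conditioning the network on the ratio is to let it rearrange shapes so the composition fits the wider canvas.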