Skip to main content

Command Palette

Search for a command to run...

Building PetPaint (Part 1): From Local POC to Production Deployment

I asked Stable Diffusion for a cat portrait. It gave me a vase.

Updated
9 min read
Building PetPaint (Part 1): From Local POC to Production Deployment

This is a two-part engineering retrospective on PetPaint. Part 1 covers Phase 1 and Phase 2. Part 2 covers the Phase 3 image quality improvement. Live product: pet-paint.vercel.app

Why PetPaint

This project started with curiosity. I was interested in how models actually work under the hood, and I was exploring a new career direction at the same time. I found fast.ai's Practical Deep Learning for Coders course. By lesson three, I was getting bored — watching videos, running notebooks, rinse and repeat. I felt like I was learning things, but nothing was sticking. Around that time I was listening to Naval Ravikant's podcast, where he talked about learn by doing and live in the arena — real learning happens through doing, because that's how you find out where and when to apply specific knowledge. So I decided to start building an AI project.

Why pet oil paintings? My cat Mijiang passed away unexpectedly during that same period. I saw AI products that could transform photos into different artistic styles, and I thought — I'll build one myself, specifically for turning cat and dog photos into oil paintings. Why oil paintings? They feel timeless to me, more classic and more meaningful than a photograph. And that's how PetPaint started.


Phase 1: Getting It to Run

The first step was to write a local generate.py, feed it a cat photo, call the model, and see if it could produce an image. Good or bad, I just needed it to run.

I used runwayml/stable-diffusion-v1-5, an open-source model from 2022. It uses an img2img pipeline. The general idea is: the original image gets compressed into a smaller numerical representation (latent) through a VAE, then noise is added — the strength parameter controls how much. The more noise, the more original image information is lost. The model then denoises step by step, guided by the prompt, generating a new image. Style transfer happens during this denoising process.

I wrote the code and ran it:

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
)

result = pipe(
    prompt="oil painting, impressionist, brush strokes, artistic",
    image=image,
    strength=0.75,
    guidance_scale=7.5,
).images[0]

After a short wait, an image came out. I opened it —

A vase. With daisies next to it. Where's my cat?

I traced the problem back to strength=0.75. This parameter controls how much of the original image is replaced by noise. At 0.75, 75% of the original image was wiped out. The model had almost no signal that there was a cat in the picture, and since the prompt only described a style without any content, the model just painted whatever it wanted.

Since it was too high, I started lowering it. I tried values like 0.5, 0.45, 0.3, checking each output — did a cat appear? Was the face distorted? Did it look like an oil painting? Too low and the oil painting style was barely visible, the output still looked like a photo. Too high and it went back to vases. I eventually found a reasonable range between 0.45 and 0.5, and a cat finally appeared.

The face was crooked, the fur color was off, but at least it was a cat.

Along the way, I also ran into a completely black image. No error — just pure black. After investigating, I found it was a known issue with float16 precision on Apple M1 Pro's MPS backend. Intermediate values in the matrix computations exceeded float16's maximum representable value (65504), overflowing to inf. When inf entered subsequent operations, it became NaN, and NaN has a property: any number that touches it also becomes NaN. Once a single NaN appeared, it spread like ink dropped into water, until nearly every value was NaN. When converting to pixel values, NaN was treated as 0 — and RGB all zeros is pure black.

Switching to float32 fixed it:

torch_dtype=torch.float32

Slower and more memory-hungry, but at least the images came out normal. The cat was still ugly though. The distortion was still severe, so I started experimenting with different parameters to improve the output.

First, the prompt. I changed it from the simple initial version to a more detailed description:

prompt="oil painting, thick impasto brushstrokes, painterly, warm golden tones, detailed fur texture, fine art"

The background gained a noticeable oil painting texture after the change, but the cat itself was still bad — the shape was different every time. It looked nothing like the original cat, and couldn't even maintain basic cat anatomy.

Raising strength from 0.45 to 0.55 made the distortion even worse. After trying different combinations of prompt and strength, I couldn't find a balance between "looks like an oil painting" and "the cat isn't distorted."

Objective Result
Technical feasibility ✅ Model runs, images are generated
Quality feasibility ❌ Severe cat face distortion, parameter tuning can't fix it

Phase 1's goal was to validate technical feasibility, and that goal was met. The quality hit a wall, but that wasn't the problem to solve right now — first, get the product up and running. Quality could come later.


Phase 2: Turning It Into a Real Product

Phase 1 proved the model could run. Phase 2's goal was to make it usable — a frontend where users can upload photos, a backend API that receives images, runs the model, and returns results, with both deployed to production.

Tech Stack

Backend: FastAPI Native async/await support, clean file upload handling — for a single API endpoint backend like PetPaint, it was more than enough.

Deployment: HuggingFace Spaces + Docker HuggingFace is purpose-built for AI/ML deployment. Model downloads are fast, the free CPU tier is enough to get started, GPU is available on-demand with hourly billing, and CUDA drivers come pre-installed. Docker ensures environment consistency between local and server — my local machine runs macOS while the server runs Linux, and without Docker, environment differences would cause all kinds of issues.

Frontend: React + Vercel Frontend is where I'm most experienced. I built an upload interface — users select a photo, click generate, the frontend sends the image to the backend API, and displays the oil painting result when it comes back. Vercel makes deployment straightforward, so this part was relatively smooth.

Unlike a typical backend, an AI backend has higher demands on the runtime environment — hardware, dependencies, and inference speed can all become problems. Deploying the backend to HuggingFace Spaces, I ran into three issues back to back.

Three Deployment Issues

Issue 1: Deployment environment doesn't support MPS

After deploying to HuggingFace, the container threw an error on startup:

RuntimeError: PyTorch is not linked with support for mps devices

During local development, I had hardcoded device = "mps" — since I was always running on my M1 Pro, where MPS is the PyTorch backend for Apple Silicon's GPU, everything worked fine locally. But after deploying to HuggingFace Spaces, the server was a Linux environment on the free CPU tier, which doesn't support MPS. The code failed the moment it hit pipe.to("mps").

I realized that code that runs locally won't necessarily work in deployment — different environments have different hardware and configurations, so nothing should be hardcoded. I switched to using environment variables:

device = os.getenv("DEVICE", "mps")  # Defaults to MPS locally, overridden by env var on server
pipe.to(device)

I set DEVICE=cpu in HuggingFace's Settings. Now the same code uses MPS locally and CPU on the server, without any code changes between deployments.

Issue 2: Missing python-multipart

After fixing the MPS issue and restarting, another error appeared:

RuntimeError: Form data requires "python-multipart" to be installed

FastAPI needs this library to handle file uploads (UploadFile), but it doesn't install automatically with FastAPI. My local conda environment happened to have it, so everything worked locally. It wasn't until deploying to the server — a clean environment that only installs what's listed in requirements.txt — that the missing dependency surfaced.

Fix: explicitly add python-multipart to requirements.txt. Lesson learned: if you use it, list it. Don't rely on your local environment happening to have it.

Issue 3: CPU inference timeout

After fixing the first two issues, the backend was finally up. I uploaded a photo from the frontend, clicked generate, and waited. And waited. Then the page showed an error:

ERR_CONNECTION_CLOSED

The connection was dropped. No image returned. But this error only says the connection was closed — it doesn't tell you why. One clue was the wait time — it took several minutes before failing, which meant the request itself wasn't broken. It was just taking too long. My guess was that CPU inference was too slow, and the network layer timed out and cut the connection.

To test this, I temporarily upgraded HuggingFace's hardware from the free CPU tier to an Nvidia T4 small GPU. After the upgrade, the same request returned an image in seconds. Confirmed: the problem was CPU inference being too slow, causing a connection timeout. After testing, I switched back to CPU to save costs.

Phase 2 Conclusion

Objective Result
Backend deployed to HuggingFace
Frontend deployed to Vercel
User uploads photo → returns oil painting

The product was live. Users could upload a photo, and the API would return an oil painting style image.

But the image quality problem from Phase 1 was still there. The product's foundation was in place — now it was time to focus on the output quality. That's what Part 2 is about.