Video: generate synthetic data with Stable Diffusion to augment computer vision datasets
Building image datasets is hard work. Instead of scraping, cleaning, and labeling images, why not generate them directly with a Stable Diffusion model?
In this video, I show you how to generate new images with a Stable Diffusion model and the diffusers library in order to augment an image classification dataset. Then, I add the new images to the original dataset and push the augmented dataset to the Hugging Face Hub. Finally, I fine-tune an existing model on the augmented dataset.
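The generation step described above can be sketched with the diffusers library. This is a minimal sketch, not the exact code from the video: the prompt template, the number of images per class, and the output folder are illustrative assumptions, and it requires a CUDA GPU with diffusers and torch installed.

```python
import os


def build_prompts(label: str, count: int) -> list[str]:
    """Build simple text prompts for one food class.
    The template is an illustrative choice, not the one from the video."""
    template = "a photo of {}, food photography, realistic"
    return [template.format(label.replace("_", " ")) for _ in range(count)]


def generate_images(label: str, count: int, out_dir: str = "generated") -> None:
    """Generate `count` synthetic images for `label` with Stable Diffusion."""
    # Heavy imports kept local so the helper above stays importable without a GPU.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    )
    pipe = pipe.to("cuda")
    os.makedirs(out_dir, exist_ok=True)
    for i, prompt in enumerate(build_prompts(label, count)):
        image = pipe(prompt).images[0]
        image.save(f"{out_dir}/{label}_{i:04d}.png")


if __name__ == "__main__":
    # Dry run of the prompt builder only; generation needs a GPU.
    print(build_prompts("apple_pie", 2))
```

In practice you would loop `generate_images` over the Food101 class names to produce a batch of synthetic images per class.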
Code: https://gitlab.com/juliensimon/huggingface-demos/-/tree/main/food102
Food101 dataset: https://huggingface.co/datasets/food101
Original model: https://huggingface.co/juliensimon/autotrain-food101-1471154053
How the original model was created with AutoTrain:
Stable Diffusion model: https://huggingface.co/runwayml/stable-diffusion-v1-5
Stable Diffusion Space: https://huggingface.co/spaces/runwayml/stable-diffusion-v1-5
Diffusers library: https://github.com/huggingface/diffusers
Food102 dataset: https://huggingface.co/datasets/juliensimon/food102
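Merging the generated images into the original dataset and pushing the result to the Hub can be sketched with the datasets library. This is a minimal sketch under stated assumptions: it assumes the synthetic images were saved locally as "<label>_<index>.png", and the repo id passed to push_to_hub is illustrative.

```python
import os
import re


def label_from_path(path: str) -> str:
    """Recover the class label from a file name like 'apple_pie_0003.png'."""
    stem = os.path.splitext(os.path.basename(path))[0]
    return re.sub(r"_\d+$", "", stem)


def push_augmented_dataset(image_dir: str, repo_id: str) -> None:
    """Merge generated images with the Food101 train split and push to the Hub."""
    from datasets import Dataset, concatenate_datasets, load_dataset

    original = load_dataset("food101", split="train")
    class_names = original.features["label"].names

    paths = sorted(
        os.path.join(image_dir, f)
        for f in os.listdir(image_dir)
        if f.endswith(".png")
    )
    labels = [class_names.index(label_from_path(p)) for p in paths]

    # Build a dataset from file paths, then cast to the original's features
    # (Image + ClassLabel) so the two datasets can be concatenated.
    synthetic = Dataset.from_dict({"image": paths, "label": labels})
    synthetic = synthetic.cast(original.features)

    augmented = concatenate_datasets([original, synthetic])
    augmented.push_to_hub(repo_id)  # e.g. "your-username/food102"
```

Pushing requires being logged in to the Hub (huggingface-cli login) with write access to the target repo.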
