Research on the technology of image generation based on text description using the Stable Diffusion model
Abstract
This article presents a study of text-to-image generation with the Stable Diffusion model. The forward and reverse diffusion processes that underlie the model, in which noise is gradually added to an image and then removed from it, are examined in detail. The implementation uses Python with the PyTorch and Hugging Face diffusers libraries, which made it possible to generate images efficiently from given text prompts. A software module was developed that demonstrates how the Stable Diffusion architecture operates. The module implements the full generation cycle, from the user entering a text prompt to obtaining the finished image. The system is built from three components: a U-Net, a Variational Autoencoder (VAE), and the CLIP text encoder. The module lets the user set the generation parameters (number of diffusion steps, guidance scale, i.e. the strength of the text's influence, resolution, etc.) and visualizes the results. The experiments show that the number of diffusion steps can be reduced without loss of image quality, provided that the noise coefficients and the guidance scale are chosen appropriately. They also confirm that working in latent space significantly reduces computational cost without sacrificing the photorealism of the result.
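The mechanisms named in the abstract can be illustrated with a minimal sketch in plain Python. It is not the module described in the article. It assumes a DDPM-style linear noise schedule (the values 1e-4 and 0.02 are common defaults, not taken from the article), the standard closed-form forward step x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε, the usual classifier-free guidance rule for combining noise predictions, and the Stable Diffusion v1 latent layout (8× spatial downsampling, 4 latent channels). All function names are hypothetical.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (assumed DDPM-style defaults)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def forward_diffuse(x0, t, alpha_bars, rng):
    """Forward diffusion: sample x_t from q(x_t | x_0) in closed form."""
    a = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, noise)]
    return xt, noise


def recover_x0(xt, noise, t, alpha_bars):
    """Invert the forward step given a noise estimate.

    In the real model the noise is predicted by the U-Net; here the exact
    noise is passed in, so the original signal is recovered exactly.
    """
    a = alpha_bars[t]
    return [(x - math.sqrt(1.0 - a) * e) / math.sqrt(a) for x, e in zip(xt, noise)]


def guided_noise(eps_uncond, eps_cond, guidance_scale):
    """Classifier-free guidance: push the prediction toward the text-conditioned one."""
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def latent_reduction(height, width, channels=3, factor=8, latent_channels=4):
    """Ratio of pixel-space values to latent-space values for one image."""
    pixel_values = height * width * channels
    latent_values = (height // factor) * (width // factor) * latent_channels
    return pixel_values / latent_values


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
    x0 = [0.5, -0.25, 1.0]                      # toy "image" of three values
    xt, eps = forward_diffuse(x0, 500, alpha_bars, rng)
    print(recover_x0(xt, eps, 500, alpha_bars))  # close to x0
    print(latent_reduction(512, 512))            # 48.0
```

With a guidance scale of 1 the combined prediction equals the text-conditioned one, and larger values strengthen the influence of the prompt. The last function shows where the savings from latent space come from: under the stated assumptions, a 512×512 RGB image holds 48 times more values than its 64×64×4 latent representation.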


