Research on the technology of image generation based on text description using the Stable Diffusion model

Keywords: Stable Diffusion, CLIP Text Encoder, U-Net & Scheduler, Autoencoder & Decoder, Token

Abstract

The article presents the results of research on the technology of image generation from text descriptions using the Stable Diffusion model. The principles of the forward and reverse diffusion mechanisms, which consist of gradually adding noise to images and then removing it, are considered in detail. The implementation was carried out in the Python programming language using the PyTorch and Hugging Face diffusers libraries, which enabled efficient generation of images from given text queries. A software module was developed that demonstrates the operation of the Stable Diffusion architecture. The module implements the full generation cycle, from the user's text query to the finished image. The system is built from a U-Net, a Variational Autoencoder (VAE), and a CLIP text encoder. The module allows the user to set generation parameters (number of diffusion steps, guidance scale, resolution, etc.) and visualizes the results obtained. The forward and reverse diffusion algorithms underlying the model were investigated. Experiments showed that reducing the number of diffusion steps preserves image quality provided that the noise coefficients and the guidance scale parameter are correctly selected. It was also confirmed that the use of latent space significantly reduces computational costs without loss of photorealism in the result.
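For illustration, the generation cycle described above can be sketched with the Hugging Face diffusers library, assuming the StableDiffusionPipeline API; the checkpoint name, prompt, and parameter values below are illustrative assumptions, not details taken from the article.

import torch
from diffusers import StableDiffusionPipeline

# Load a pretrained latent diffusion checkpoint (assumed model id).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")

# Generate an image from a text query; the parameters mirror those the
# article's module exposes: number of diffusion steps, guidance scale
# (level of text influence), and output resolution.
image = pipe(
    prompt="a photorealistic mountain lake at sunrise",  # example text query
    num_inference_steps=30,   # number of reverse-diffusion (denoising) steps
    guidance_scale=7.5,       # strength of the text conditioning
    height=512,
    width=512,
).images[0]

image.save("result.png")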

Published
2025-12-05
How to Cite
Pekh, P., & Frolov, O. (2025). Research on the technology of image generation based on text description using the Stable Diffusion model. COMPUTER-INTEGRATED TECHNOLOGIES: EDUCATION, SCIENCE, PRODUCTION, (61), 58-63. https://doi.org/10.36910/6775-2524-0560-2025-61-08
Section
Computer science and computer engineering