Comparative evaluation of training strategies for transformer models on public cloud platforms

Keywords: LLM, Transformer, cloud computing, model training, fine-tuning, PEFT, LoRA, TPU

Abstract

This paper presents a comprehensive comparative evaluation of modern training strategies for Transformer-based models, particularly Large Language Models (LLMs), on the leading public cloud platforms: Amazon Web Services, Google Cloud Platform, and Microsoft Azure. The study systematizes and analyzes the key technical, economic, environmental, and scalability challenges facing developers and researchers. It examines major LLM architectures such as BERT, GPT, LLaMA, and Falcon, and provides a detailed analysis of training methodologies: full pretraining, fine-tuning, and Parameter-Efficient Fine-Tuning (PEFT) methods, including LoRA and QLoRA. The paper features a comparative analysis of provider-specific computational resources (NVIDIA A100/H100 GPUs, Google TPUs, AWS Trainium), MLOps tools, networking solutions, and pricing models. Based on a synthesis of empirical data and benchmarks, scientifically grounded recommendations are formulated for selecting optimal training strategies and cloud configurations depending on the use-case scenario and resource constraints. The work aims to provide actionable insights for practitioners and to identify directions for future research on optimizing LLM training.
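
As an illustration of the PEFT approach mentioned in the abstract, the sketch below shows what a minimal LoRA fine-tuning setup might look like using the Hugging Face transformers and peft libraries; the base model name, rank, and other hyperparameters are illustrative assumptions rather than configurations taken from the paper.

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

# Assumed base checkpoint; any causal LM could be substituted.
base_model = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

# LoRA injects low-rank adapter matrices into selected projections,
# so only a small fraction of parameters is updated during fine-tuning.
lora_config = LoraConfig(
    r=8,                                  # rank of the low-rank update (illustrative)
    lora_alpha=16,                        # scaling factor for the adapter output
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # reports trainable vs. total parameter counts

The same adapter configuration can be combined with 4-bit quantization of the frozen base model to obtain a QLoRA-style setup, which further reduces GPU memory requirements during fine-tuning.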
