New ChatGPT image model finally fixes AI text problem
It was once relatively easy to tell the difference between human-created and AI-generated images. Just a couple of years ago, image models struggled even with simple tasks like producing a restaurant menu, often inventing nonsensical words such as “enchuita,” “churiros,” “burrto,” and “margartas.”
Now, the latest ChatGPT Images 2.0 model can generate a Mexican restaurant menu realistic enough to be used in an actual restaurant without raising suspicion, though a $13.50 ceviche might still prompt questions about quality, News.Az reports, citing TechCrunch.
For comparison, earlier tools such as DALL-E 3 had notable difficulty rendering accurate text when generating images.
Historically, AI image generators struggled with spelling because they relied on diffusion models, which produce an image by starting from random noise and removing it step by step. As Asmelash Teka Hadgu explained in 2024, these models prioritize broader visual patterns, making small details like written text harder to reproduce accurately.
To address these limitations, researchers have explored alternative approaches, including autoregressive models, which build an image piece by piece, predicting each part from what has already been generated, much as large language models predict the next word in a sentence.
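To make the autoregressive idea concrete, here is a minimal toy sketch in Python. It is not OpenAI's method, which has not been disclosed, and it is not a trained model: the function `next_pixel_prob` is a hypothetical, hand-written stand-in for a learned predictor, and the 8x4 grid size is an arbitrary assumption. The point is only the loop structure, in which each pixel is sampled conditioned on the pixels generated before it.

```python
import random


def next_pixel_prob(pixels, width):
    """Hypothetical stand-in for a trained model.

    Returns the probability that the next pixel is 1, given every pixel
    generated so far (stored in raster order, left to right, top to bottom).
    """
    i = len(pixels)  # index of the pixel about to be generated
    left = pixels[i - 1] if i % width != 0 else None
    up = pixels[i - width] if i >= width else None
    neighbors = [p for p in (left, up) if p is not None]
    if not neighbors:
        return 0.5  # first pixel: no context yet
    # Favor agreeing with the already-generated neighbors.
    return 0.1 + 0.8 * sum(neighbors) / len(neighbors)


def generate(width, height, seed=0):
    """Generate a binary image one pixel at a time, autoregressively."""
    rng = random.Random(seed)
    pixels = []
    for _ in range(width * height):
        p = next_pixel_prob(pixels, width)
        pixels.append(1 if rng.random() < p else 0)
    # Reshape the flat list into rows.
    return [pixels[r * width:(r + 1) * width] for r in range(height)]


if __name__ == "__main__":
    for row in generate(8, 4):
        print("".join("#" if v else "." for v in row))
```

Because every pixel depends on earlier ones, the output is built sequentially rather than refined all at once from noise, which is the contrast with diffusion described above. Real systems predict learned image tokens rather than single binary pixels.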
OpenAI has not disclosed the exact architecture behind Images 2.0, declining to comment during a recent press briefing.
However, the company said the model includes “thinking capabilities,” allowing it to search the web, generate multiple images from a single prompt, and verify its outputs. These features enable it to create marketing materials in various formats, as well as more complex visuals such as multi-panel comic strips.
OpenAI also noted improvements in rendering non-Latin scripts, including Japanese, Korean, Hindi, and Bengali. The model’s knowledge cutoff is December 2025, so it may not reflect more recent developments.
According to the company, Images 2.0 delivers a higher level of precision and detail in image generation. It can follow instructions closely, maintain requested design elements, and accurately render components that have traditionally challenged image models, such as small text, icons, user interface elements, and dense visual compositions—all at resolutions up to 2K.
While these advanced capabilities mean generating an image may take longer than getting an answer to a standard text query, even complex outputs like multi-panel comics can be produced within minutes.
Access to Images 2.0 is being rolled out to all ChatGPT and Codex users, with paid users receiving access to more advanced features. OpenAI is also introducing the gpt-image-2 API, with pricing based on output quality and resolution.
By Nijat Babayev





