OpenAI is best known for its advanced large language models (LLMs) used to power some of the most popular AI chatbots, such as ChatGPT and Copilot. However, multimodal models can take chatbot capabilities to new heights by unleashing a new range of visual applications, and OpenAI just made one available to developers.
On Tuesday, via an X (formerly Twitter) post, OpenAI announced that GPT-4 Turbo with Vision, the latest GPT-4 Turbo model with vision capabilities, is now generally available to developers via the OpenAI API.
Also: How to use ChatGPT
This model maintains GPT-4 Turbo's 128,000-token context window and December 2023 knowledge cutoff, with the only significant difference being its vision capabilities, which allow it to understand images and other visual content.
Before this model was made available, developers had to call separate models for text and images. Now they can call a single model that handles both, simplifying the process and opening the door to a wide range of use cases.
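To make the single-model workflow concrete, here is a minimal sketch of a Chat Completions request that sends text and an image together, following the message format the OpenAI API uses for vision input. The model alias `gpt-4-turbo`, the prompt, and the image URL are illustrative assumptions, and the network call itself is omitted so the sketch runs without an API key.

```python
# Sketch: one request body that mixes text and an image, assuming the
# Chat Completions vision message format. Names and URLs are illustrative.

def build_vision_request(prompt: str, image_url: str) -> dict:
    """Build a single request body carrying both text and an image."""
    return {
        "model": "gpt-4-turbo",  # assumed alias for the vision-capable model
        "messages": [
            {
                "role": "user",
                # A list of content parts lets one message combine modalities.
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "max_tokens": 300,
    }

request = build_vision_request(
    "What is in this image?",
    "https://example.com/meal.jpg",
)
# With the official SDK, this body would be sent via
# client.chat.completions.create(**request); that call is omitted here.
print(len(request["messages"][0]["content"]))  # two content parts: text + image
```

The key point is the `content` list on a single user message: rather than routing the text to one model and the image to another, both parts travel in one request to one model.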
OpenAI shared some of the ways developers are already using the model, and they are pretty fascinating.
Also: The best AI image generators of 2024: Tested and reviewed
For example, Devin, an AI software engineering assistant, leverages GPT-4 Turbo with Vision to better assist with coding. The health and fitness app Healthify uses it to scan photos of users' meals and provide nutritional insights through photo recognition. Lastly, Make Real uses GPT-4 Turbo with Vision to convert a user's drawing into a working website powered by real code.
Even though the GPT-4 Turbo with Vision model is not yet available inside ChatGPT or to the general public, OpenAI teased that it will soon come to ChatGPT. If you are a developer looking to get started with OpenAI's GPT-4 Turbo with Vision API, you can learn how to get started here.