OpenAI is rolling out new voice and image capabilities for ChatGPT. These additions make interacting with the AI more natural and intuitive.
Voice Capability
Speak with it on the go, request a bedtime story for your family, or settle a dinner table debate.
- Users can now engage in back-and-forth conversations with ChatGPT using voice commands.
- Five different voices are available for users to choose from, each created in collaboration with professional voice actors.
- Whisper, OpenAI’s open-source speech recognition system, is used to transcribe spoken words into text.
- The voice capability aims to provide a more natural and human-like conversation experience.
- To activate voice, users can go to Settings → New Features in the mobile app and opt in to voice conversations.
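Because Whisper is open source, the transcription step can be tried locally. The snippet below is a minimal sketch, assuming the `openai-whisper` Python package and `ffmpeg` are installed. The helper name `transcribe` and the file name `question.wav` are placeholders, and nothing here describes how the ChatGPT app itself calls Whisper.

```python
from pathlib import Path


def transcribe(audio_path: str, model_name: str = "base") -> str:
    """Return the text spoken in an audio file, using open-source Whisper.

    Hypothetical helper for illustration only.
    """
    # Check the input first so a bad path fails before the model is loaded.
    if not Path(audio_path).is_file():
        raise FileNotFoundError(audio_path)

    # Imported lazily: the package is large and optional.
    # Install with `pip install openai-whisper`; ffmpeg must be on the PATH.
    import whisper

    model = whisper.load_model(model_name)   # downloads weights on first use
    result = model.transcribe(audio_path)    # dict that includes a "text" key
    return result["text"].strip()


if __name__ == "__main__":
    # "question.wav" is a placeholder; point this at any short recording.
    print(transcribe("question.wav"))
```

The smaller models such as "base" run faster, while the larger ones are generally more accurate.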
Image Capability
You can now show ChatGPT one or more images. Troubleshoot why your grill won’t start, explore the contents of your fridge to plan a meal, or analyze a complex graph for work-related data. To focus on a specific part of the image, you can use the drawing tool in the mobile app.
- Users can now share one or more images with ChatGPT for various tasks, such as troubleshooting, image analysis, or planning meals based on fridge contents.
- There is also a drawing tool available in the mobile app to help guide the AI’s understanding of specific parts of an image.
- The image understanding is powered by multimodal GPT-3.5 and GPT-4 models, which can reason about text and images together.
Voice and image capabilities are rolling out to Plus and Enterprise users first, over the next two weeks.
Enabling users to interact with images in a meaningful way represents a significant step forward in AI’s versatility and usefulness.