ChatGPT's annual update summary, revisiting the evolution of the "big model vane"

01-15

This article is machine translated

Show original

Here is the English translation of the text, with the specified terms preserved:

In 2024, OpenAI's ChatGPT continued to break new ground in the field of large language models, launching a series of innovative features such as a personalized chatbot marketplace, enhanced memory capabilities, and multimodal processing abilities. They also made continuous optimizations in terms of security, stability, and efficiency. Let's take a look back at the key updates!

In 2024, large language models have become deeply integrated into our daily lives.

As the frontrunner, ChatGPT has always been the compass in the world of large language models. Whether it was the o1-pro, Sora Turbo, or various small features like video mode and interrupting speech, released last year, or even the high-priced $200 monthly subscription, every OpenAI launch event brings new shocks and new ideas to the vast community of AI enthusiasts.

Let's review the key updates released by OpenAI in 2024 and witness the evolution of large language models!

January

GPT Marketplace: Users can publish their customized chatbots (GPTs) and search by categories such as writing, lifestyle, and education.

Guardian Tools (Election-related): OpenAI updated its policy to prohibit users and creators of ChatGPT, DALL-E, and others from impersonating candidates or local governments, using them for campaign activities or lobbying, or using them to obstruct or distort the voting process.

Inline Tagging: Users can trigger the GPT mention function by typing "@" in the chat box, and the system will display a list of available GPT models, allowing users to integrate and interact with multiple AI models in a single conversation.

Voice Playback (Mobile App): Added the ability to play back responses in audio, improving the convenience for users to access information.

GPT Self-Service Appeal Process: Users can submit appeals for issues encountered when using GPT.

Team Plan: The beta version of the ChatGPT plugin has been discontinued.

February

Memory Function (Sunshine) Release: Enhances the model's memory of past conversations, making interactions more coherent and better understanding user context and needs.

New Appearance (Hedgehog) Release

Feedback Function: Introduced a user feedback mechanism for GPT, allowing users to rate and provide suggestions for different GPTs to facilitate improvements.

Author Verification: Implemented social verification for GPT creators' personal profiles, enhancing the credibility of creators and the authority of content.

Sora Release: Able to quickly generate high-quality videos up to one minute long based on simple text descriptions, better following user instructions and producing visually realistic scenes, complex interactions, and specific types of motion.

Dark and Light Modes: Optimized the visual effects of the interface to adapt to different usage scenarios and user preferences.

GPT Version History: Allows users to understand the iteration of GPT and track functional changes.

March

Custom Instructions (GPT-4): Users can customize instructions for ChatGPT at the system level, including personal background information and response format requirements.

DALL·E 3 Controls (Style & Aspect Ratio), Editor & Inpainting: Provided users with a rich set of predefined style options; users can refine specific areas using natural language prompts, such as adding, removing, or modifying elements.

Text-to-Speech (Web): Automatically detects the language of the text being read and reads it in the corresponding language; provides five different voices.

Monetization Program: Shares revenue with developers based on GPT usage, providing a new monetization channel to incentivize the creation of higher-quality GPT services.

April

No-Account Access: Provides a more convenient experience for ChatGPT, but only the free GPT-3.5 version can be used, while advanced features like DALL-E 3 still require an account.

Data Control v2: Users can choose whether to allow their data to be used for model training without affecting their ability to view chat history; a new mobile voice data option has been added, which is disabled by default.

The domain has been unified to chatgpt.com, consolidating the brand and service entry point.

GPT-4 Turbo Release: Generates content twice as fast as GPT-4, with a larger context window of 128k tokens, and is priced at only 1/3 of the cost.

May

Free users can also choose the default conversation model, such as switching to GPT-4o-mini and GPT-4o, to customize the conversation model based on their own needs, improving efficiency and consistency.

Connected Apps: Available only for ChatGPT Plus, team, and enterprise users, allowing them to directly upload files from Google Drive and Microsoft OneDrive to ChatGPT, making it easier for users to analyze and process files stored in the cloud.

A desktop application was launched for macOS system users.

GPT-4o was released, with multimodal capabilities to simultaneously process text, audio, and visual information. It performs exceptionally well in voice conversations, with natural and fluent responses that can express emotions and understand the underlying sentiment. It supports 50 languages and has a lower API price, 2x performance improvement, and 5x higher rate limit.

The ChatGPT interface was redesigned, codenamed Fruit Juice.

Users can use different models to regenerate responses for the same prompt.

The "Sky" voice option is no longer provided to users, and the specific reason has not been made public.

Users can switch models within the same conversation based on the progress and needs, improving the flexibility and effectiveness of the dialogue.

Free users can now access some tools and GPTs that were previously only available to paid users, such as internet access, image upload and analysis, chart creation, advanced data analysis, enabling memory function, and accessing the GPT Marketplace.

June

At the 2024 Worldwide Developers Conference (WWDC), Apple announced a partnership with OpenAI to integrate ChatGPT into Siri. User requests will not be stored by OpenAI, and user IP addresses will be obfuscated, with the option to connect a ChatGPT account.

The Sidekick desktop application, previously only available for ChatGPT Plus users, is now accessible to all users. It allows users to take screenshots and discuss code snippets or interpret complex charts with GPT-4o within the application.

July

GPT-4o mini (Chive) was released, with fewer parameters than GPT-4o, supporting 128k input tokens and 16k API, priced over 60% cheaper than GPT-3.5 Turbo. It is also OpenAI's first AI model to use the new "Instruction Hierarchy" safety strategy, which prioritizes executing pre-set commands to prevent malicious users from inducing the model to perform illegal operations.

After the release of GPT-4o and GPT-4o mini, GPT-3.5 became significantly weaker in terms of multilingual support, response speed, and processing capabilities, and was officially retired on July 19th.

The new ChatGPT interface (Fruit Juice) became the default for all users.

OpenAI released a prototype product called SearchGPT, which can accurately understand users' complex queries and provide more relevant search results, overcoming the shortcomings of traditional search engines in handling complex and ambiguous queries. It not only provides relevant search results but can also use its powerful language generation capabilities to directly generate detailed answers. Users can ask follow-up questions as if conversing with a person, and the search results highlight and link to the sources of information, with clear inline attribution in the responses. Users can also quickly access more source links from the sidebar.

August

Based on GPT-4o's video and audio capabilities, Advanced voice(gpt-4o-s2s) can perceive and respond to user emotions, providing a more natural and real-time dialogue experience, and users can interrupt at any time.

Free users can use DALL・E 3 to generate two images per day.

The maximum Token length of the model's memory has been increased to 8k, which can better retain context information when processing long texts and complex dialogues, avoiding incomplete responses or forgetting previous content due to memory limitations.

Starter Prompts v2: Provides updated and more enriched starting prompts to better guide users in asking high-quality questions and requests.

ChatGPT announced that it is developing new sync connectors with Google Drive and Slack, allowing users to seamlessly access document content and improve team efficiency.

September

OpenAI has updated the advanced voice mode of ChatGPT, adding video and screen sharing features, which can understand various accents and tones and accurately convert them into text, and also support real-time translation, making it convenient for international users to communicate.

OpenAI released o1-preview, designed specifically for handling high-complexity tasks that require deep reasoning, such as legal analysis, academic research, and complex decision-making; it can process various data formats including images and audio; developers can highly customize the model based on specific business needs, adapting to scenarios such as e-commerce product recommendations and educational training course design.

o1-mini is more economical, with costs reduced by about 80% compared to o1-preview, suitable for environments with limited computing resources but requiring structured reasoning capabilities, and performs well on basic reasoning tasks such as mathematics and programming.

Two shortcut commands have been added: "/picture" can call the DALL-E model to generate images; "/search" can convert user input into a search query.

October

Advanced voice features have been launched for the macOS and Windows desktop versions, allowing users to set custom commands to customize the model's voice style, speech speed, and more.

Based on GPT-4o, the Canvas feature (gpt-4o-canmore) has been launched, allowing users to draw, create mind maps, flowcharts, etc.; providing developers with a visual code structure tool, where users can draw software architectures or function structures on the canvas; able to organize thoughts intuitively, drag and drop document structures, add annotations, and optimize text for users; users can brainstorm, organize key points, and create slides.

Users can quickly search through chat history (Fanny Pack) for specific content, questions, answers, and more.

November

Paid users of the ChatGPT web version can use the advanced voice feature, which can perceive subtle differences in user voice tone and speed; users can set custom commands to customize the model's speaking style, such as speaking at a specific rhythm, clear pronunciation, slow speed, and periodically including the user's name.

The Windows desktop application (Sidetron) supports voice input, screen capture, and local file uploads.

The ChatGPT desktop version on macOS supports calling ChatGPT to get code explanations and solve errors in IDEs like Xcode, VSCode, TextEdit, and integration with applications like the terminal.

December

The advanced voice mode has added video and screen sharing features, allowing ChatGPT to see the user's operations and displayed content, and make more accurate responses, suitable for online meetings, remote collaboration, online teaching, and other scenarios.

Users can directly execute Python code in the canvas, providing a more convenient data analysis and processing environment for data scientists and analysts.

OpenAI released the o1 official version, with a 50% speed increase and a 50% reduction in the probability of major errors; o1-pro requires ChatGPT Pro to use, with a monthly fee of $200, and can think more deeply to provide higher-quality answers.

OpenAI showcased the o3 model, which scored 75.7% on the ARC-AGI benchmark test, demonstrating strong reasoning, coding, and math problem-solving capabilities, approaching or even exceeding human expert levels in some areas; the o3-mini-preview is relatively more cost-effective, and the official o3-mini version is planned for release at the end of January 2025.

To ensure the safety and reliability of the o3 and o3-mini models before release, OpenAI has adopted a multi-layered security testing approach, combining internal evaluation with external research programs, recruiting security researchers to participate in testing to identify potential security risks and vulnerabilities in a timely manner.

OpenAI released Sora Turbo, which supports text, image, and video inputs, and can generate videos up to 1080p resolution and 20 seconds long, with options for widescreen, portrait, or square formats; it supports 5 creative tools, allowing users to precisely control the content of each frame, add multiple shots to the video, replace, delete or restructure elements in the video, use loop editing, and create seamless repeating videos.

Reference materials:

https://x.com/btibor91/status/1873391215980527840

This article is from the WeChat public account "New Intelligence", author: New Intelligence, authorized by 36Kr for release.

Source

Disclaimer: The content above is only the author's opinion which does not represent any position of Followin, and is not intended as, and shall not be understood or construed as, investment advice from Followin.

Add to Favorites

Comments

Relevant content