In San Francisco in 1968, computer scientist Douglas Engelbart held up a small wooden box with two metal wheels and introduced a new species of device to the world, at a demonstration later remembered as "The Mother of All Demos."
That was the first time humans publicly used a mouse to guide a digital cursor on a screen. In the decades that followed, this little arrow became virtually ubiquitous. It traversed office software, game interfaces, browser windows, and countless spreadsheets, becoming humanity's most familiar yet silent guide as we entered the digital world.
Yet over the past half-century, while the computing power, form factors, and uses of computers have changed almost beyond recognition, the essence of the mouse cursor has barely changed: it knows its coordinates on the screen, its X and Y, but it does not know whether you are pointing at a line of code, an invoice, or a landscape photo.
Faced with the constantly flashing pixels, it can only do very basic things: click, drag, and wait for the next click.
Today, Google is going to reinvent the mouse cursor with Gemini.
At the recently concluded Android Show, Google laid out almost all of its plans surrounding Android, AI, and the hardware ecosystem. Among them, a new feature called "Magic Pointer" gives the old mouse cursor "eyes" and a "brain."
Google's intentions are clear: future AI interaction should not rely on lengthy prompts, but simply point to the screen and say, "Move this over there," just like in real life. So the question is, when the mouse cursor finally learns to "understand" the screen, where will it lead human-computer interaction?
What exactly can this AI arrow with its eyes open do?
To understand the significance of this technology, we must first look at the most awkward aspect of current AI tools: interaction cost.
Over the past few years, the capabilities of large language models have skyrocketed, but the barrier to entry for using them remains high. To get AI to accurately understand intent, users are forced to learn the complex discipline of "prompt engineering": setting roles, adding background information, and constraining the output format. Writing a several-hundred-word mini-essay for a simple request is commonplace.
Furthermore, typical AI tools run in separate web pages or application windows, constantly interrupting the user's workflow. For example, when you are reading a 50-page PDF and want AI to create a chart, you usually have to go through the following steps: take a screenshot -> save it -> open your browser -> go to the AI webpage -> upload the image -> type the prompt.
Google calls this cumbersome cross-application operation "AI detours." This kind of switching is not only inefficient, but it can also easily interrupt people's focused attention, the so-called "flow" state.
To this end, Google's first interaction principle is "flow." In their experimental AI cursor prototype, the AI's capabilities are no longer limited to a specific app or webpage, but are attached to the mouse cursor, ready to be used at any time.
The triggering method is also kept to a minimum: no keyboard shortcuts need to be memorized; simply "shake" the mouse, and the AI interface automatically appears based on the content currently under the cursor, offering highly contextual suggestions. Select an image and it asks whether you want to "compare"; hover over a paragraph and it proactively offers polishing suggestions.
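How might such a shake trigger work under the hood? Google has published no details, but as a rough illustration, a client could watch for rapid direction reversals in recent pointer movement. The TypeScript sketch below is a minimal, hypothetical version; the thresholds and the showAiPointerMenu handler are our own inventions, not anything Google has described.

```typescript
// Hypothetical sketch: detecting a mouse "shake" from raw pointer movement.
// Google hasn't documented Magic Pointer's trigger; thresholds here are guesses.
type Sample = { x: number; t: number };

const WINDOW_MS = 500; // sliding window of recent movement to inspect
const REVERSALS = 4;   // horizontal direction flips needed to count as a shake

let samples: Sample[] = [];

document.addEventListener("mousemove", (e: MouseEvent) => {
  const now = performance.now();
  samples.push({ x: e.clientX, t: now });
  // Keep only samples inside the sliding window.
  samples = samples.filter((s) => now - s.t <= WINDOW_MS);

  // Count sign changes in horizontal direction within the window.
  let flips = 0;
  let prevDir = 0;
  for (let i = 1; i < samples.length; i++) {
    const dir = Math.sign(samples[i].x - samples[i - 1].x);
    if (dir !== 0 && prevDir !== 0 && dir !== prevDir) flips++;
    if (dir !== 0) prevDir = dir;
  }

  if (flips >= REVERSALS) {
    samples = []; // reset so the menu doesn't immediately re-trigger
    showAiPointerMenu(e.clientX, e.clientY);
  }
});

// Hypothetical handler: in a real client this would open the contextual AI menu.
function showAiPointerMenu(x: number, y: number): void {
  console.log(`AI pointer menu at (${x}, ${y})`);
}
```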
The entire process requires no instruction and is entirely guided by intuition. Let's look at a few extremely intuitive scenarios:
First, the ultimate form of pointing at images.
When you're browsing a cartoon cityscape, a traditional mouse lets you do little more than click and zoom. Now, you can simply hover the AI cursor over a building in the background of the photo and say into the microphone, "Move this element of the image over here."
There's no need to explain what "this" is or to describe the building's appearance. The AI cursor understands the exact pixel you're pointing at, identifies the corresponding element, and carries out the move.
In the past, a mouse could only tell the system "where I clicked"; now, it has begun to tell the system "what I am referring to".
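Google hasn't revealed how Magic Pointer grounds a pointed-at pixel internally, but the public Gemini API already offers the core ingredients: a screenshot, a coordinate, and a spoken instruction. The sketch below shows one plausible way to wire them together; the prompt framing, the model choice, and the resolvePointedElement function are our own assumptions, not Google's implementation.

```typescript
// Hypothetical sketch: resolving "this" by pairing a screenshot with the cursor
// position and letting a multimodal model identify the pointed-at element.
// Prompt framing, coordinates, and model choice are our assumptions, not
// Google's published Magic Pointer internals.
import { GoogleGenerativeAI } from "@google/generative-ai";
import { readFileSync } from "node:fs";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);
const model = genAI.getGenerativeModel({ model: "gemini-1.5-flash" });

async function resolvePointedElement(
  screenshotPath: string,
  x: number,
  y: number,
  utterance: string
): Promise<string> {
  const data = readFileSync(screenshotPath).toString("base64");
  const result = await model.generateContent([
    { inlineData: { mimeType: "image/png", data } },
    {
      text:
        `The user's cursor is at pixel (${x}, ${y}) in this screenshot. ` +
        `They said: "${utterance}". Identify the on-screen element they are ` +
        `pointing at and describe the edit they want.`,
    },
  ]);
  return result.response.text();
}

// Example: pointing at a background building and saying "Move this over here."
resolvePointedElement("screen.png", 640, 312, "Move this over here")
  .then(console.log);
```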
Second, fewer framing words, more natural references.
When you see an extremely complex baking recipe on a webpage, you don't need to copy and paste, nor do you need to write something like "Please multiply all the ingredient quantities in the following recipe by two." You just need to highlight the text with your cursor and casually say, "Double the quantities of 'these'."
In a flash, the AI rewrites a new version of the recipe for you right there on the spot.
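As a toy illustration of how a deictic word like "these" could be grounded, a browser-side client might simply fold the current text selection into the model prompt. This is our own sketch, not Google's implementation; buildGroundedPrompt is a hypothetical helper.

```typescript
// Hypothetical sketch: grounding a deictic word like "these" in the user's
// current text selection, so speech needs no copied-in context.
// buildGroundedPrompt is our own illustrative helper, not a real API.
function buildGroundedPrompt(utterance: string): string | null {
  const selection = window.getSelection()?.toString().trim();
  if (!selection) return null; // nothing highlighted, nothing to ground

  // The selection supplies the referent of "these"; the user never restates it.
  return [
    `Selected text:\n"""${selection}"""`,
    `User instruction: "${utterance}"`,
    `Apply the instruction to the selected text and return the rewritten result.`,
  ].join("\n\n");
}

// Example: the user highlights an ingredient list and says:
const prompt = buildGroundedPrompt("Double the quantities of these.");
if (prompt) console.log(prompt); // ready to send to any LLM endpoint
```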
Third, converting pixels into interactive entities.
To a computer, a screen is just a few million glowing pixels. But an AI cursor can transform those static pixels into living entities.
For example, you're watching a travel vlog, and a restaurant that looks amazing flashes by in the video. You pause it, point your cursor at it, and the previously lifeless video instantly transforms into a real, interactive location, with a reservation link for the restaurant popping up next to it.
Or you casually snap a photo of a sticky note covered in scribbles, and with a flick of your mouse the ink turns into a checkable to-do list. Notice the shift? Before, you had to go looking for AI; now AI follows your mouse, arriving obediently at your fingertip.
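Extracting structure from a messy photo like that is something today's multimodal APIs can already approximate. Below is a hedged sketch using the public Gemini SDK's JSON output mode; the schema and the stickyNoteToTodos function are our own assumptions about how such a feature might be built, not Google's actual pipeline.

```typescript
// Hypothetical sketch: turning a photo of a scribbled sticky note into a
// structured to-do list via the Gemini SDK's JSON output mode. The schema
// and stickyNoteToTodos are our own illustration, not Google's pipeline.
import { GoogleGenerativeAI } from "@google/generative-ai";
import { readFileSync } from "node:fs";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);
const model = genAI.getGenerativeModel({
  model: "gemini-1.5-flash",
  generationConfig: { responseMimeType: "application/json" },
});

interface Todo {
  text: string;
  done: boolean;
}

async function stickyNoteToTodos(photoPath: string): Promise<Todo[]> {
  const data = readFileSync(photoPath).toString("base64");
  const result = await model.generateContent([
    { inlineData: { mimeType: "image/jpeg", data } },
    {
      text:
        "Extract every handwritten task from this photo as a JSON array of " +
        '{"text": string, "done": boolean} objects.',
    },
  ]);
  return JSON.parse(result.response.text()) as Todo[];
}

stickyNoteToTodos("sticky-note.jpg").then((todos) =>
  todos.forEach((t) => console.log(`${t.done ? "[x]" : "[ ]"} ${t.text}`))
);
```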
Kill AI prompts, return to human intuition
Look closely, and humankind's most powerful communication tool turns out to be the pronoun.
When you and your colleagues are sitting in front of the screen revising a design, you would never say in a clear, articulate voice, "Please move the blue rectangle at the top left corner of the screen (X:120, Y:350) 50 pixels to the right." You would simply point at the screen and say:
"Move this a little to the right, and dilute it a bit."
"That restaurant looks nice, how do we get there?"
"What does this error message in the code mean?"
In our daily lives, we rely heavily on "this" and "that." Gestures combined with minimal spoken language are the most efficient communication code for humans. The reason for this is that we live in the same physical space and share the same visual context.
Google astutely grasped this point and distilled it into a product principle: Embrace the power of "This" and "That".
Instead of forcing humans to learn complex prompt frameworks, it does the opposite: it takes the grunt work of expressing intent off our hands and lets the machine adapt to humanity's laziest, most instinctive habit of pointing.
The good news is that this interaction method is already being implemented. Gemini in the Chrome browser is the first to support it starting today; Google's newly launched Googlebook laptop line has "Magic Pointer" directly integrated into the operating system, covering all applications.
Googlebook's ambitions extend beyond just a mouse. Google defines this product line as "the perfect companion to Android phones."
Similar to Apple's iPhone mirroring, users can seamlessly project Android apps onto their Googlebook desktop, running them in native aspect ratio and freely navigating between devices in the file manager, completely breaking down the ecosystem barriers between phones, tablets, and laptops. Furthermore, Gemini can generate custom dynamic widgets on the desktop as needed (such as a passenger's real-time flight card).
In terms of hardware design, all Googlebook models integrate a "Glowbar" light strip on the body, allowing you to distinguish it from traditional Chromebooks or Windows laptops at a glance.
The first batch of Googlebooks will be manufactured by Acer, Asus, Dell, HP, and Lenovo, and are expected to be available this fall.
Interestingly, Samsung is absent from this list. Recent reports suggest that Samsung may be preparing a Galaxy laptop running Google's new operating system, and its next Unpacked event is rumored to be scheduled for July 22.
As for the underlying core driving all of this, Google did not name it, but the repeated emphasis on "a modern operating system born for intelligence" and the deep integration of Android and ChromeOS all point to the long-rumored "Aluminum" system.
This means AI is becoming operating-system-level infrastructure. And when AI truly becomes your mouse cursor, it gains the authority to intervene in everything: what you see is what you get, and what you point to is what you control.
AI human-computer interaction is at a crossroads.
Looking back to 1968, the first mouse that amazed the world had an incredibly simple function: tracking position. Over the past fifty-odd years, the mouse has gained scroll wheels, side buttons, even fans and counterweights, but its soul has never changed: it accurately marks coordinates, yet can never comprehend the meaning behind them.
Google's AI cursor marks an unprecedented evolution in the history of interaction: it knows not only where you are pointing, but also what you are pointing at.
Over the past year, countless freshly funded startups have scrambled to build the next "super gateway to the AI era," obsessing over the polish of chat boxes and the complexity of agent workflows. But Google has now handed the entire industry a stark lesson:
What does the best technology look like? It exerts a subtle, pervasive influence. Chat boxes are not the final form of AI; they are merely a compromise of a transitional period. The best AI should recede into the background, becoming infrastructure embedded in your daily actions, rather than a separate application that must be opened.
From command-line interfaces (CLI) with white text on a black background, to graphical user interfaces (GUI) driven by mouse clicks, to the touchscreen swipes of the mobile era (NUI), interfaces kept getting more natural; yet in the past few years, large language models briefly dragged us back to an era of typed communication, leaving countless people suffering from prompt anxiety.
But after today, we know that it was just a detour before dawn. Truly useful AI must eventually learn to think like humans: to understand your every glance and to comprehend every "put this there" you say.
Fifty-eight years ago, when Douglas Engelbart held up that simple wooden mouse, his ultimate dream was to "augment human intellect."
Fifty-eight years later, as AI is integrated into this ancient pointer, machines are finally beginning to truly "understand" the world. The era of prompt engineers is coming to an end, and the ultimate closed loop of human-computer interaction will take a historic leap forward with each ambiguous "this" and "that."
Here are the links to experience it:
https://aistudio.google.com/apps/bundled/ai-pointer-create?showPreview=true&showAssistant=true&fullscreenApplet=true
https://aistudio.google.com/apps/bundled/ai-pointer-find?showPreview=true&showAssistant=true&fullscreenApplet=true
This article is from the WeChat official account "APPSO", authored by Discover Tomorrow's Products, and published with authorization from 36Kr.