Google’s Gemini on Android: A Glimpse into the Future and Past of Mobile Computing

Nearly a decade ago, Google showed off a feature called Now on Tap in Android Marshmallow—tap and hold the home button and Google will surface helpful contextual information related to what’s on the screen. Talking about a movie with a friend over text? Now on Tap could get you details about the title without having to leave the messaging app. Looking at a restaurant in Yelp? The phone could surface OpenTable recommendations with just a tap.

I was fresh out of college, and Now on Tap felt exciting and magical—its ability to understand what was on the screen and predict the actions you might want to take felt future-facing. It was one of my favorite Android features. It slowly morphed into Google Assistant, which was great in its own right, but not quite the same.

Today, at Google’s I/O developer conference in Mountain View, California, the new features Google is touting in its Android operating system feel like the Now on Tap of old—allowing you to harness contextual information around you to make using your phone a bit easier. Except this time, these features are powered by a decade’s worth of advancements in large language models.

“I think what’s exciting is we now have the technology to build really exciting assistants,” Dave Burke, vice president of engineering on Android, tells me over a Google Meet video call. “We need to be able to have a computer system that understands what it sees and I don’t think we had the technology back then to do it well. Now we do.”

I got a chance to speak with Burke and Sameer Samat, president of the Android ecosystem at Google, about what’s new in the world of Android, the company’s new AI assistant Gemini, and what it all holds for the future of the OS. Samat referred to these updates as a “once-in-a-generational opportunity to reimagine what the phone can do, and to rethink all of Android.”

The upgraded Circle to Search in action.

It starts with Circle to Search, which is Google’s new way of approaching Search on mobile. Much like the experience of Now on Tap, Circle to Search—which the company debuted a few months ago—is more interactive than just typing into a search box. (You literally circle what you want to search on the screen.) Burke says, “It’s a very visceral, fun, and modern way to search … It skews younger as well because it’s so fun to use.”

Samat claims Google has received positive feedback from consumers, but Circle to Search’s latest feature comes specifically from student feedback. Circle to Search can now be used on physics and math problems: when a user circles one, Google will spit out step-by-step instructions for solving it, without the user leaving the syllabus app.

Samat made it clear Gemini wasn’t just providing answers but was showing students how to solve the problems. Later this year, Circle to Search will be able to tackle more complex problems involving diagrams and graphs. This is all powered by Google’s LearnLM models, which are fine-tuned for education.

Gemini is Google’s AI assistant that is in many ways eclipsing Google Assistant. Really—when you fire up Google Assistant on most Android phones these days, there’s an option to replace it with Gemini instead. So naturally, I asked Burke and Samat whether this meant Assistant was heading to the Google Graveyard.

“The way to look at it is that Gemini is an opt-in experience on the phone,” Samat says. “I think obviously over time Gemini is becoming more advanced and is evolving. We don’t have anything to announce today, but there is a choice for consumers if they want to opt into this new AI-powered assistant. They can try it out and we are seeing that people are doing that and we’re getting a lot of great feedback.”


With a coming update, you’ll be able to drag AI-generated images into emails and messages.

At I/O, the updates to Gemini on Android are designed to make it more contextually aware, just like Now on Tap nearly a decade ago. Later this year, you’ll be able to generate images with Gemini and drag and drop them into apps like Gmail or Google Messages. Burke showed me an example of Gemini generating an image of tennis played with pickles; he was responding to someone’s text about playing pickleball. He called up Gemini—which popped up as an overlay over the messaging app—asked it to generate the image, and then dragged one of the results into the chat.

You’ll be able to ask Gemini to pull specific bits of information out of a video.

He then pulled up a YouTube video on pickleball rules. Call up Gemini while watching and you’ll see a prompt to “Ask this video.” This lets you employ Gemini to find specific information in the video without scrubbing through the whole thing yourself. (Who has time for that?) Burke asked about a specific pickleball rule, and Gemini quickly spat out an answer based on the video. This “summarize” functionality has been the hallmark of many AI tools—summarizing PDFs, videos, memos, and news stories (yay).


Text summaries of videos may prove helpful.

Speaking of PDFs, you’ll soon be able to attach a PDF to Gemini (there will be a prompt for “Ask this PDF”) and have it pull out specific information, so you don’t have to scroll through page after page yourself. Burke says these features are rolling out to millions of devices over the next few months, though the PDF feature will only be available to Gemini Advanced users—folks paying the $20-per-month subscription for access to the cutting-edge capabilities of Google’s AI models.

Gemini in general will show more “dynamic suggestions” based on what’s happening on the screen. These will pop up right above the Gemini overlay when you activate the assistant.

Gemini Nano is the Google large language model that powers select on-device features on certain phones, like the Pixel 8 series, Samsung’s Galaxy S24 range, and even the new Pixel 8A. Because these features run entirely on the device, data never needs to be sent to the cloud, which makes them more private; they also work offline.

Nano currently powers the Summarize feature in Google’s Recorder app and Smart Reply in select messaging apps. An upgraded version of the model, Gemini Nano with Multimodality, is expected to launch this year, arriving first on Pixel phones. That means Gemini Nano will be able to handle more than just text.

Burke describes the upgraded model as multimodal, with 3.8 billion parameters, and says it’s the first model of its kind to run on-device. He calls it extremely potent, achieving roughly 80 percent of the quality of Gemini 1.0, a remarkable feat for such a small model.
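Google hasn’t published a developer API for these Nano-powered features, so the Kotlin sketch below is purely illustrative; the OnDeviceLlm class and its methods are hypothetical stand-ins, meant only to show why on-device inference keeps data local and works without a connection.

```kotlin
// Hypothetical sketch only: OnDeviceLlm and everything in it is invented
// for illustration and is not a real Google SDK. The point is architectural:
// the model weights live on the phone, so prompts never leave the device.
class OnDeviceLlm(private val modelPath: String) {

    // In a real runtime this would map quantized weights from local
    // storage into memory; no network access is involved at any point.
    fun load(): OnDeviceLlm {
        println("Loading weights from $modelPath (local storage only)")
        return this
    }

    // Inference runs locally too. A real implementation would call into a
    // native runtime; here we just return a placeholder string.
    fun generate(prompt: String): String {
        return "[on-device response to: $prompt]"
    }
}

fun main() {
    // Everything below happens on the handset: a recording's transcript is
    // summarized without being uploaded anywhere, even in airplane mode.
    val llm = OnDeviceLlm("/data/local/models/nano.bin").load()
    println(llm.generate("Summarize this meeting transcript: ..."))
}
```

The design point Burke emphasizes is the same one the sketch makes: nothing in the flow requires a network round-trip.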

Google’s screen reader is also slated for an upgrade that will help it better understand and describe images.

That model will also power TalkBack, the Android screen reader that helps users with visual impairments understand what’s on the screen. With Gemini Nano, image descriptions will be richer and more detailed. Google claims TalkBack users encounter “90 unlabeled images per day”; because Gemini Nano can interpret the images on the screen directly, it can fill that gap with detailed descriptions, even when the user is offline.

Google has poured many of its AI smarts over the past few years into improving its call-screening technology to limit robocalls, and Gemini Nano with Multimodality will soon help you avoid phone scams—in real time. A new feature called Scam Detection will have Gemini listening in on your phone calls, and if it picks up on certain phrases or requests from the person on the other end, it will issue an alert that you’re likely in the middle of a scam call. Burke says the model was trained on data from websites like BanksNeverAskThat.com to learn what a bank wouldn’t ask you—and the types of things scammers typically ask for. He says all of this listening and detection happens on-device, so it’s private. We’ll hear more about this “opt-in feature” later this year.
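Google hasn’t detailed how the model scores a conversation, and the real system is a fine-tuned LLM rather than a phrase list. Still, as a drastically simplified stand-in, here’s the shape of on-device detection in Kotlin; every phrase and function name below is invented for illustration.

```kotlin
// Toy illustration only: the real Scam Detection feature uses Gemini Nano,
// not a keyword list. This sketch just shows the shape of the idea: the
// transcript is checked locally, and nothing is sent off the device.
val redFlagPhrases = listOf(
    "read me the code we just sent",        // one-time-passcode requests
    "move your money to a safe account",    // classic bank-impersonation line
    "buy gift cards and read me the numbers",
)

// Returns true if a chunk of live transcript matches a known red flag.
fun looksLikeScam(transcriptChunk: String): Boolean =
    redFlagPhrases.any { phrase -> transcriptChunk.lowercase().contains(phrase) }

fun main() {
    val liveTranscript = "To protect your funds, move your money to a safe account now."
    if (looksLikeScam(liveTranscript)) {
        // In the real feature, this is where Android would raise an alert.
        println("Warning: this caller is asking for things banks never ask for.")
    }
}
```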


Unusually, Google says it will be unveiling a few new Android features tomorrow rather than compressing all of the new stuff into today’s announcements, so stay tuned for more.

With the rise of AI hardware gadgets vying to replace your smartphone—and the talk of app-less generative interfaces—I asked Samat how he sees Android changing in the next five years. He’s excited to see the innovation from new and existing companies trying new things—and that Google is “trying a lot of things internally” too. But he boiled things down to an analogy with the automotive space.

If you buy a car, you’ve come to expect certain standard features, like a steering wheel. But with AI, one giant leap would be to take away those features—no steering wheel, no interfaces. “Some people would be excited by that, some people would not be excited by that.” He believes AI will make many of the functions we perform on our phones more assistive than ever, and that some existing interfaces will be replaced as a result.

“As that continues, what we will find—and we’re already seeing this in our own testing—is there are opportunities to fundamentally transform the UI in certain areas where it tips over from the point of, ‘OK, that’s really assistive,’ to ‘Actually, there should be an entirely new way of doing this.’ That’s what’s fun and exciting about right now. It’s an amazing time to be working on this technology.”
