Posted by Terence Zhang – Developer Relations Engineer and Kristi Bradford – Product Manager
Google Pixel’s Recorder app lets people record, transcribe, save, and share audio. To make it easier for users to manage and revisit their recordings, Recorder’s developers turned to Gemini Nano, a powerful on-device large language model (LLM). This integration introduces an AI-powered audio summarization feature that helps users find the right recordings more easily and quickly grasp key points.
Earlier this month, Gemini Nano got a power boost with the introduction of the new Gemini Nano with Multimodality model. The Recorder app is already using this upgrade to summarize longer voice recordings, with improved handling of grammar and nuance.
Meeting user needs with on-device AI
Recorder developers initially experimented with a cloud-based solution, achieving impressive levels of performance and quality. However, to prioritize accessibility and privacy for their users, they sought an on-device solution. The development of Gemini Nano presented a perfect opportunity to build the concise audio summaries users were looking for, all while keeping data processing on the device.
Gemini Nano is Google’s best model for on-device tasks. “Having the LLM on-device is beneficial to users because it provides them with more privacy, less latency, and it works wherever they need since there’s no internet required,” said Kristi Bradford, the product manager for Pixel’s essential apps.
To achieve better results, Recorder also fine-tuned the model using data that matches its use case. This is done using low-rank adaptation (LoRA), which enables Gemini Nano to consistently output three-bullet-point descriptions of the transcript that include any speaker names, key takeaways, and themes.
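For readers new to LoRA, the idea can be sketched in a few lines: the pretrained weight matrix W stays frozen, and training only updates two small matrices A and B whose product is added to W. The snippet below is a minimal, dependency-free illustration of that general technique, not Recorder's or Gemini Nano's actual fine-tuning code. The layer size, rank, and `alpha` scaling value are assumptions chosen purely for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * (B @ A), the weight a LoRA-adapted layer uses.

    w: frozen pretrained weights, shape d_out x d_in
    a: trainable down-projection, shape r x d_in
    b: trainable up-projection, shape d_out x r
    """
    rank = len(a)
    scale = alpha / rank
    delta = matmul(b, a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def count_params(matrix):
    """Number of scalar entries in a matrix."""
    return len(matrix) * len(matrix[0])


# Hypothetical sizes, chosen only to keep the example small.
d_out, d_in, rank = 64, 64, 4
rng = random.Random(0)

w = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(d_out)]  # frozen
a = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]   # trainable
b = [[0.0] * rank for _ in range(d_out)]                               # trainable, starts at zero

w_eff = lora_effective_weight(w, a, b, alpha=8.0)

full_params = count_params(w)
lora_params = count_params(a) + count_params(b)
print(full_params, lora_params)
```

Because B starts at zero, the adapted layer initially behaves exactly like the pretrained one, and fine-tuning only has to learn the small A and B matrices rather than the full weight matrix.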
AICore, an Android system service that centralizes runtime, delivery, and critical safety components for LLMs, significantly streamlined Recorder’s adoption of Gemini Nano. The availability of a developer SDK for running GenAI workloads allowed the team to build the transcription summary feature in just four months, with only four developers. This efficiency was achieved by eliminating the need to maintain in-house models.
Since its launch, Recorder users have been using the new AI-powered summarization feature an average of 2 to 5 times daily, and the number of overall saved recordings has increased by 24%. The feature has contributed to a significant increase in app engagement and user retention overall. The Recorder team also noted that feedback about the new feature has been positive, with many users citing the time it saves them.
The next big evolution: Gemini Nano with multimodality
Recorder developers also implemented the latest Gemini Nano model, known as Gemini Nano with multimodality, to further improve the summarization feature on Pixel 9 devices. The new model is significantly larger than the previous one on Pixel 8 devices, and it’s more capable, accurate, and scalable. It also has expanded token support that lets Recorder summarize much longer transcripts than before. Gemini Nano with multimodality is currently only available on Pixel 9 devices.
Integrating Gemini Nano with multimodality required another round of fine-tuning. However, Recorder developers were able to use the original Gemini Nano model’s fine-tuning dataset as a foundation, streamlining the development process.
To fully leverage the new model’s capabilities, Recorder developers expanded their dataset with support for longer voice recordings, implemented refined evaluation methods, and established launch criteria metrics focused on grammar and nuance. The inclusion of grammar as a new metric for assessing inference quality was made possible solely by the enhanced capabilities of Gemini Nano with Multimodality.
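The post doesn't describe how the team's evaluation works, so the following is only a hypothetical sketch of one simple automated check such a pipeline could include: verifying that a generated summary has the expected three-bullet structure. The function name, the accepted bullet markers, and the sample summaries are all assumptions made for illustration. Grammar and nuance would need far richer evaluation than a structural check like this.

```python
import re

# Assumption: a bullet line starts with "-", "*", or "•" followed by text.
BULLET = re.compile(r"^\s*[-*•]\s+\S")


def is_three_bullet_summary(summary: str) -> bool:
    """Return True if the summary is exactly three non-empty bullet lines."""
    lines = [line for line in summary.splitlines() if line.strip()]
    return len(lines) == 3 and all(BULLET.match(line) for line in lines)


# Made-up sample outputs, not real model responses.
good = (
    "- Alice and Bob reviewed the launch plan.\n"
    "- Key takeaway: ship in two phases.\n"
    "- Theme: reducing release risk."
)
too_short = "- Alice and Bob reviewed the launch plan.\n- Ship in two phases."
unstructured = "The meeting covered the launch plan and next steps."

print(is_three_bullet_summary(good))
print(is_three_bullet_summary(too_short))
print(is_three_bullet_summary(unstructured))
```

A check like this could run over a held-out set of transcripts to catch formatting regressions before any human review of content quality.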
Doing more with on-device AI
“Given the novelty of GenAI, the whole team had fun learning how to use it,” said Kristi. “Now, we’re empowered to push the boundaries of what we can accomplish while meeting emerging user needs and opportunities. It’s truly brought a new level of creativity to problem-solving and experimentation. We’ve already demoed at least two more GenAI features that help people get time back internally for early feedback, and we’re excited about the possibilities ahead.”
Get started
Learn more about how to bring the benefits of on-device AI with Gemini Nano to your apps.