It looks like Google is going all in on AI agents.
The latest Google Cloud Next keynote presentation, which covers some of the new product developments across Google, had one major theme: AI agents. AI agents for customer service. AI agents for helping your employees get more done. Agents for marketing, for driving, for analyzing vast quantities of data, for creating content, podcasts, and videos. AI agents for cybersecurity, for coding, for deploying new e-commerce pages. AI agents for everything. In this video, I’ve cut together the most important announcements and AI agent demos they showed so you can better understand what Google is about to roll out. Our biggest announcements today are focused on generative AI. You know, customers have quickly gone from experimenting with generative AI, using it to answer questions and make predictions, and are now building generative AI agents.
Agents process multimodal information simultaneously, conversing, reasoning, learning, and making decisions. Agents can connect with other agents and with humans, and they will transform how each of you interacts with computing devices and the web itself. You know, our customers are building early versions of AI agents with our foundation model, Gemini, which has advanced reasoning skills across multimodal information. We’re the only cloud to offer widely used first-party, third-party, and open-source models, and Vertex AI to access, tune, augment, and deploy custom models and to build agents. Shortly before this presentation, Google rolled out its latest model, Gemini 1.5 Pro, with a 1 million token context window. And here’s the important thing to understand: Google has seemingly been struggling in the AI race, but just in the last few months, I think they made a pretty substantial breakthrough with the 1.5 Pro model.
Google adopts the mixture-of-experts architecture for its updated AI
They’ve adopted the mixture-of-experts architecture, so the model is now similar to GPT-4, which uses the same architecture, and which, by the way, Google pioneered along with Transformers. They pioneered and created a lot of this technology. OpenAI was able to take it, put it together better, innovate, iterate, and get it out faster. But now Google is reclaiming that technology and advancing their models. And here’s the thing: Gemini 1.5 Pro with the 1 million token context window is good. If you’ve played around with some of Google’s other stuff before and found it lackluster, you might be dubious about some of the claims they’re making in this keynote. But as I’ve been using the model, and as many other people online have been using it and reporting back their results, it’s becoming very clear that it is very, very good.
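To make the mixture-of-experts idea concrete, here is a minimal, illustrative NumPy sketch of the routing step. It is a toy example, not Google’s or OpenAI’s implementation: a small gating function scores all experts, only the top few run for a given token, and their outputs are blended, so model capacity can grow without every parameter being used on every token.

```python
import numpy as np

def moe_layer(x, experts, gate_w, top_k=2):
    """Route a token to its top-k experts and mix their outputs."""
    logits = x @ gate_w                # one gating score per expert
    top = np.argsort(logits)[-top_k:]  # indices of the k best-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()           # softmax over the chosen experts only
    # Only the selected experts actually run, so compute scales with k,
    # not with the total number of experts.
    return sum(w * experts[i](x) for w, i in zip(weights, top))

# Toy usage: four tiny "experts", each just a random linear map.
rng = np.random.default_rng(0)
d, n_experts = 8, 4
experts = [(lambda v, W=rng.normal(size=(d, d)): v @ W) for _ in range(n_experts)]
gate_w = rng.normal(size=(d, n_experts))
print(moe_layer(rng.normal(size=d), experts, gate_w))
```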
Now, I’m sure there are some areas where GPT-4 is better and some where it’s worse, and there are some areas where the Gemini model is clearly ahead, like, for example, the context window. So let’s take a look at all the big announcements from Google, and keep in mind that this will be rolling out very rapidly to all of Google’s customers, their enterprise partners, and the big businesses that will want to give this technology to all of their employees. And as those employees and customers get their hands on it and start using it, Google will get more and more data on how people are using it and how to improve the products, kind of starting that flywheel of improvement. So whether or not you’re cheering for Google to win this AI race, it’s important to understand that they’re gunning for the enterprise space.
They’re really looking to put their AI products into these massive global companies and get everyone using their products. You know, the opportunity for customer agents is tremendous. To help each of you build customer agents faster, we’re introducing Vertex AI Agent Builder. You can now create customer agents that are amazingly powerful in just three key steps. First, you can use Gemini Pro to create free-flowing, human-like conversations with text, voice, images, and video as inputs, and personalize them with custom voice models. Second, you can use natural language instructions to control the conversation flow and steer it away from specific topics you don’t want it to discuss, such as current events, in the same way that you train your human agents. You can also control when it hands over to a human agent, with transcription and summarization of its conversation history to make these transitions extremely smooth.
Third, you can improve response quality with vector-based and keyword-based search to connect to your internal information and the entire web. You can also use extensions to complete tasks for customers like updating contact information, booking a flight, ordering food, and many more. Let’s take a look at an example of a customer agent in action. Let’s head over to my favorite store, Symbol Fashion. They just launched a customer agent and it leverages Gemini and Vector Search to deliver a seamless shopping experience. So last night I was watching a video of this band and I love the keyboard player’s shirt. So I was thinking, I’d really like to be wearing that shirt tomorrow night. But can I find it in my size and in time to be rocking it at the concert here in Vegas?
Find me a checkered shirt like the keyboard player is wearing. I’d like to see prices, where to buy it, and how soon I can be wearing it. I’m going to include the video. Now, the customer agent is using Gemini’s multimodal reasoning to analyze the text and video to identify exactly what I’m looking for. Then Gemini turns it into a searchable format. How cool is this? It found the checkered shirt I’m looking for, right? And some other great options, in no time. Of course, it never fails: they only have three left in my size. I don’t want to miss out on wearing this shirt, so I’m going to go ahead, give the store a call, and ask them to set it aside for me.
I’m so close to having this shirt for the concert. Let’s give the store a call. Hi there. This is the Symbol Fashion customer agent at South Las Vegas Boulevard. Am I speaking with Amanda? Yes, this is Amanda. Great. Thanks for reaching out, Amanda. I see you had a session on another device. I’ve sent you an SMS message with a link to our live chat companion. If you would like to switch to chat, please click the link. How can I help you today? I’d like to purchase the shirt in my cart with the card I have on file. Absolutely. I see you’re also a Symbol Fashion Rewards member. Looks like you have a 20% off voucher available to you. Would you like to apply it to this purchase?
Yes, please. That would be great. The shirt you’re purchasing goes well with these items, also available for pickup in your preferred size. Would any of these be interesting to you? Absolutely. Please add the white shirt and the boots to my cart. Great. Your total is $203.73. Okay to proceed with the card on file? Yes. Your purchase is confirmed. Do you need anything else today? No, I’m all set. Thank you. Watching that last video about customer service agents, as an e-commerce owner myself, I am very excited about what it can do for my business. But as a normal human being, I get very frustrated with how companies go about marketing, especially online: the pop-ups, and the pages injected into your history so that when you click the back button, it actually takes you to a different page.
Then there are the autoplay ads that force you to watch before you can use certain features in apps or on websites, and the massive amounts of email they send. I feel like that trend will continue. And now with AI assistants, as you’re talking to them over the phone, there’s really nothing preventing them from giving you the hard pitch to buy something. So we’ll see how this gets implemented. It could be a nightmare, but hopefully there will be some counterbalance that prevents these companies from really abusing it. We’ll see. But in this next part, let’s take a look at employee agents. These are agents that will help employees be more productive, but they will also help you as an individual be productive.
And some of this stuff is extremely exciting. If you’ve ever had to comb through massive legal files, like when you’re trying to lease a car or an apartment, or if you’ve ever been confused by certain medical bills here in the US: the medical billing industry is insane. It’s very complicated, it’s very annoying, and I don’t know how half the stuff they do can even be legal. Having an intelligent AI assistant seems like it could be a big, big help. Take a look. There are lots of ways Gemini for Workspace is helping our customers save time. Let me show you the magic of one example in action. I’ve been asked to evaluate two proposals for a new payroll system. I open up my Drive, and I see that there are two statements of work here, or SOWs, that outline the vendor proposals.
Google Drive is AI-ready without any additional pre-work, and everything here is protected with our industry-leading security. All I need to do to get started is drag and drop the two files over into the Gemini side panel, which is currently in preview. In just a few seconds, Gemini for Workspace will give me an outline of these proposals so that I can have a good idea of what they’re about. And I could go even further and ask, compare the price of the two offers. Each of these documents is over 70 pages. It would have taken me hours to review these docs. But instead, Gemini is going to help me find a clean answer to save me a ton of time. And as you can see, I have the answer right here in the Gemini side panel.
Let’s go ahead and dive into one of these proposals. So I’m going to pick this one. One of the things that you’ll notice about Gemini in Docs is that it will proactively show me a summary of this file, so that I can get a sense of what’s going on without needing to take the time to read the whole thing. But before I proceed with this vendor, I need to ensure that no compliance issues exist. And I’m going to be honest, I have not memorized every rule in our compliance rulebook, because it is over 100 pages. I would have to scour the 80 pages of this proposal and compare it manually with 100 pages of the rulebook. So instead, in the side panel, I ask, does this offer comply?
Whoops, with the following. And I’m gonna just @-mention our compliance rulebook, hit enter, and see what Gemini has to say. Okay, so interesting. Gemini has found an issue because the supplier does not list their security certifications. Because Gemini is grounded in my company’s data, with source citations to specific files, I can trust this response and start to troubleshoot before selecting a vendor. Gemini for Workspace was terrific in this example, saving me a lot of time without compromising on accuracy. And this technology can be applied in so many other use cases: a sales team analysing RFPs, a recruiter developing interview questions, and so much more. Many of our customers are building employee agents to automate workflows that are tedious and repetitive. One of the nation’s leading healthcare providers, HCA Healthcare,
is using our generative AI technology to help caregivers spend more time with patients and less on paperwork. They’re testing Katie. So Katie is a nurse handoff digital assistant, helping to ensure continuity of care when one caregiver’s shift ends and another begins. It delivers summaries in everyday language to ensure that caregivers can share information easily, and it provides them with up-to-the-minute patient data to expedite the discharge process. Hi, folks. So I know you all want to hear about awesome AI stuff that’s coming, but I need to talk to you for a minute about my annual benefits enrolment. See, I forgot. I have to finish signing up by today, and as you can see, I might be a little bit busy. So if you don’t mind, let’s go ahead and look at this open enrolment email together.
Okay, yep, I’ve got a deadline. I knew that. Thank you. I’ve got FSA stuff. I’ve got an online portal for my company. Okay, there’s a lot here. Ooh, they included video. Let’s see if this makes my life easier. And the video is… Okay, so it’s almost an hour long. Yeah, I’m not going to have time to review all of this stuff. Let’s see how this employee agent that we’ve developed using Google Workspace, Gemini Models, and Vertex AI might be able to help me. As you can see, it’s integrated directly into my Google Chat, so I don’t have to context switch while I’m figuring all this stuff out. First things first, let’s have it summarize the email and the video that it sent me.
Summarize the body and attached video from my recent email with the subject “open enrolment closing.” So behind the scenes, the agent is referencing that email body and its attachments as context in the prompt, using retrieval-augmented generation. That way, its response is limited to the content that matters to me. The Gemini model’s multimodal capabilities allow the agent to understand and reason across text, audio, and video from a single prompt. I mean, this is a way quicker read. Okay, good. And I can immediately see that the medical plans have been completely revamped this year. Let’s go ahead and jump into the benefits portal to see more. Now, I’ve already done my dental and my vision, but I procrastinated, I mean, saved the most important plan for last: my medical plan.
Let’s see how this option stacks up against my existing coverage. Compare these coverage options to the PDF doc I have on the Platinum plan. The Gemini model’s long context window, paired with Vertex extensions, enables the agent to cross-reference large amounts of data from a variety of sources, including unstructured data like PDFs. Leveraging Gemini’s advanced reasoning capabilities, the agent is able to understand the complex details of my current plan and compare it with the new options for 2025. And since the enterprise grounding feature links me to the exact data that Gemini used to draw its conclusions, which you can see linked here, I can confidently trust its recommendation that the Gold plan is best for me. And done. So one last thing. Now that I’ve officially completed enrolment,
My daughter’s going to need braces this year. Let’s use the agent to help me find the right orthodontist. My daughter will need braces this year. Help me find the best in-network orthodontist near my home. With Vertex extensions, the Gemini model can integrate with any external or internal API. This allows the agent to join my dental coverage details directly with real-time Google Maps and Places data to help determine the best-ranked in-network providers near me. Okay, so it looks like Symbol Orthodontics is the only one that offers Saturday appointments, which works best for my family’s schedule. But what do I really know about them? How do I know that the agent actually picked the best orthodontist? A quick scan of the summary shows me that they’re great with kids, which is great.
I particularly love this customer review. I’ll go with that. Let’s schedule a call. I don’t know how it’s going to find a time, though. I mean, I’m here. The agent knows that I’m at Google Cloud Next because it’s integrated with Google Calendar, and it can find a time to schedule the call. Yes, please. Awesome. I’ve got a call. Do I need anything else? Nope, I’m good. No, thanks. Perfect. Nice and easy. Just imagine what else an employee agent can help you with. So overall, I’m pretty excited about that. It lets everyone, everyday people, spend less time doing boring manual tasks and tediously going through various documents, and it’s kind of empowering us to do more with less time, offloading some of the cognitive effort that we have to spend on tasks that are important but kind of boring.
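For developers curious how an agent like the one in this demo can call external services such as Maps or a benefits system, here is a rough sketch of Gemini function calling through the Vertex AI Python SDK. The project ID, tool name, and provider data are hypothetical, and this is only an approximation of the pattern, not the demo’s actual implementation.

```python
import vertexai
from vertexai.generative_models import (
    FunctionDeclaration, GenerativeModel, Part, Tool,
)

vertexai.init(project="my-project", location="us-central1")  # hypothetical project

# A hypothetical tool the model may ask us to run (e.g. backed by a Places lookup).
find_providers = FunctionDeclaration(
    name="find_in_network_orthodontists",
    description="Look up in-network orthodontists near a location",
    parameters={
        "type": "object",
        "properties": {
            "location": {"type": "string"},
            "plan_id": {"type": "string"},
        },
    },
)

model = GenerativeModel(
    "gemini-1.5-pro",  # exact model id may differ by region and release
    tools=[Tool(function_declarations=[find_providers])],
)
chat = model.start_chat()

response = chat.send_message(
    "My daughter needs braces. Find the best in-network orthodontist near my home."
)
call = response.candidates[0].content.parts[0].function_call
print(call.name, dict(call.args))  # the model asks us to run the tool

# Run the lookup ourselves (stubbed here), then return the result to the model.
result = {"providers": [{"name": "Symbol Orthodontics", "saturday_hours": True}]}
response = chat.send_message(
    Part.from_function_response(name=call.name, response={"content": result})
)
print(response.text)
```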
Next up, they’re talking about a brand new thing that they’re rolling out: just like they have Google Sheets and Google Docs, now they have Google Vids, which is going to be pretty interesting for making video presentations. So think PowerPoint, but with video, and largely automated with AI. Unfortunately, this will also make creating various marketing campaigns online a lot, a lot faster and easier. Take a look. How many times have you heard, should we start with a doc or a deck? Well, we can do a lot better. I’m absolutely thrilled to announce our newest workspace app, Google Vids. Sitting alongside Google Docs, Sheets, and Slides, Google Vids is an AI-powered video creation app for work. With Gemini and Vids, you have a video writing, production, and editing assistant all in one.
Let me show you how simple it is to get started with Vids. Now, after a full week with all of you here at Next, I’m going to want to share a recap video to share all the excitement with my organization. When I open up Vids, Gemini helps me get started. I simply type in a prompt, using an existing document for context. Now, based on that prompt, Gemini suggests a narrative outline for the story that I can easily customize and edit. I choose an expressive style, and Vids works its magic. Let’s see what I get. So, wow, just like that, I get the first draft with beautifully designed, fully animated scenes, complete with relevant stock media and music, and even a generated script. I’ll be able to add fun videos and photos that I captured this week directly from my media in Google Drive and Google Photos.
And even without any video expertise, I can build out scenes like a pro just as easily as I make slides. But it doesn’t have to stop there. With Vids, you can personalize your story your way by recording yourself or by using one of our preset voices powered by Gemini. Beyond Vids, many customers are building agents for their marketing teams, audio and video production teams, and all of the creative people out there who could use a hand. We provide a tremendously powerful platform and stack to build creative agents. First, it starts with the best image generation model, Imagen 2.0. Our most advanced text-to-image technology helps businesses create images that match their specific brand requirements. This is now generally available in Vertex AI. Imagen’s strong language comprehension and photorealism capabilities contributed to Google securing the leadership position in the recent Forrester Computer Vision Wave.
Second, as part of Imagen, we’re now introducing text-to-live image, in preview today. Marketing and creative teams can generate animated images from a text prompt, including product images, ads, GIFs, and storyboards. Here’s an example of a daylily opening in the morning dew. And a pot of something delicious simmering on the stove. Or an aerial view of a mountain range. Third, we are proud to announce the general availability of digital watermarking for AI-generated images produced by Imagen. It’s powered by Google DeepMind’s SynthID. Finally, we’re announcing new editing modes for Imagen 2.0, which will make it really easy to remove unwanted elements in an image, add new elements, and expand the borders of the image to create a wider view for all of you. Let’s see how the creative agent we developed
with Gemini models, Imagen 2.0’s API, and Google Workspace can help. First things first, let’s get a sense for the look and feel of our brand. What are the key styles and themes of our brand? The creative agent can analyse previous campaigns to understand our unique brand style and apply it to new ideas. In this case, the creative agent has analysed over 3,000 brand images, descriptions, videos, and documents of other products in our catalogue, contained within Google Drive, to create this summary. And as you can see, it summarized our brand identity as bold, adventurous, and aspirational. This is exactly the look and feel that we’re going for. Next, I’m going to prompt our creative agent, using Imagen, to dynamically generate amazing new images of our tents in iconic and real outdoor settings.
Check those out. Machu Picchu. Not bad. Half Dome. These are real landmarks, accurately presented. Now, let’s work on a narrative to really help us land the message. The creative agent leveraged Gemini Pro’s multimodal capabilities to brainstorm creative captions that match each photo. These look great too. Generate a storyboard and share it with an email group. The creative agent’s going to move this into Slides and share it out with that group. Looks great, but there are three requests. First, something for less experienced campers. Second, content for the Symbol Outfitters podcast. And third, something more visually dynamic for social media. Let’s start with that new storyboard. And you’ll see, in this case, we’re asking for a new image, new captions, and a new storyboard for that new audience, that relaxed camping audience.
Take a look. That looks pretty relaxing to me. Looks pretty good. And there’s the storyboard. Great. Just like that, the creative agent modified our ideas for a new audience. It understood our context and made the changes that we asked for. Now, let’s generate that podcast. The Gemini model’s complex reasoning allows me to generate a script and an audio clip in a single prompt. This much longer prompt highlights our ability to do ordered operations. We’re asking the agent to create a podcast script in SSML, or Speech Synthesis Markup Language, and then passing the output to our text-to-speech API. The result will be an episode with two people discussing tents. So we’ll see the script here, which we could expand if we wanted, but let’s just listen to a little bit of it.
Welcome to the show. Thanks, Freya. It’s great to be here. I’m always happy to geek out about gear. That sounds really good. And I want you to notice the regionalisms in the speakers’ voices. Finally, we were asked for some engaging content for social media. Let’s see what the creative agent can do using our new live image generation capability. We’ve run this prompt before, so you’ll be seeing cached results to speed things up. But these… are simply beautiful. Creating content like this could take days or weeks of scouting and shooting, but our creative agent used Imagen’s new text-to-live-image capability to make them in minutes. The creative agent shows how Google’s generative AI capability can make producing custom content easy, seamless, and a lot of fun.
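As a rough illustration of the pattern described in that podcast step, where a model writes an SSML script that is then handed to a text-to-speech API, here is a sketch using the google-cloud-texttospeech client library. The SSML is hard-coded for brevity (in the demo it is generated by Gemini), and the voice name and output path are assumptions.

```python
from google.cloud import texttospeech

# In the demo this script comes from Gemini; it is hard-coded here for brevity.
ssml_script = """
<speak>
  <p>Welcome to the show.</p>
  <p>Thanks, Freya. It's great to be here. I'm always happy to geek out about gear.</p>
</speak>
"""

client = texttospeech.TextToSpeechClient()
response = client.synthesize_speech(
    input=texttospeech.SynthesisInput(ssml=ssml_script),
    voice=texttospeech.VoiceSelectionParams(
        language_code="en-US",
        name="en-US-Neural2-F",  # assumed voice; any available voice works
    ),
    audio_config=texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    ),
)

with open("podcast_clip.mp3", "wb") as f:  # hypothetical output file
    f.write(response.audio_content)
```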
Next up, we have coding. There’s a piece earlier in this keynote that I’ve cut out where the CEO of Goldman Sachs talks about the various ways in which they’re applying this generative AI. One thing that jumped out at me was how various coding assistants can easily add a 40% productivity boost for software developers. And I’ve been seeing similar numbers across the board. People are saying how much easier it is to do software development, to do coding, when you have some sort of AI assistant. With Gemini 1.5 Pro, the model that has a 1 million token context window and the ability to intelligently write code, test code, and offer improvement suggestions, this truly does seem like a game changer.
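To put that 1 million token figure in perspective, here is a back-of-the-envelope sketch for estimating how many tokens a codebase occupies. The roughly-four-characters-per-token ratio is only a heuristic, and the repository path and file glob are hypothetical.

```python
from pathlib import Path

repo = Path("symbol-outfitters")  # hypothetical local checkout
chars = sum(
    len(p.read_text(errors="ignore"))
    for p in repo.rglob("*.go")   # adjust the glob to your languages
)
print(f"~{chars / 4:,.0f} tokens")
# Roughly: 100,000 lines at ~40 characters per line is ~4 million characters,
# or about a million tokens, right at the edge of a 1M-token window.
```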
This is where I expect some pretty drastic changes, some pretty drastic disruptions, to occur in the software development field. Our enterprise-focused AI code assistant is now called Gemini Code Assist. First, we’re integrating the power of Gemini 1.5 Pro into Gemini Code Assist. This upgrade brings a massive 1 million token context window to coding, which is the largest in the industry. This revolutionizes coding, allowing you to perform large-scale changes across your entire codebase. I’m a new developer with Symbol Outfitters, and today we show recommended products to customers only after they’ve made an initial selection. These suggestions are powered by our custom-built recommendation service based on previous purchases. But now the marketing department has asked me to move this feature to our homepage so that customers can see products they might be interested in as soon as they get to our site.
Our design department has created a mock-up of what they would want this experience to look like in Figma. And for the developers out there, you know that this means we’re going to need to add padding on the home page, modify some views, and make sure that the configs are changed for our microservices. Typically, it would take me a week or two just to get familiar with our company’s codebase, which has over 100,000 lines of code across 11 services. But now, with Gemini Code Assist, as a new engineer on the team, I can be more productive than ever and can accomplish all of this work in just a matter of minutes. This is because Gemini’s code transformations with full codebase awareness allow us to easily reason through our entire codebase.
In comparison, other models out there can’t handle anything beyond 12,000 to 15,000 lines of code, and even then, they struggle to get it right. Gemini inside of Code Assist is so intelligent that we can just give it our business requirements, including the visual design. So let’s ask. Here I am prompting Gemini to add a “For You” recommendation section on the homepage, along with an image of the future state to show the improved design. Almost immediately, Gemini Code Assist starts by reasoning about the code changes it needs to make, with the insight an experienced teammate would have. For example, because we asked Gemini Code Assist to change the recommendation service, it was able to find the recommendation function and extract the exact details needed to make the call to the recommendation service.
It highlights the files that need to be changed and reveals the reasoning behind its recommendations, using our own codebase for context. Gemini Code Assist doesn’t just suggest code edits. It provides clear recommendations and makes sure that all of these recommendations are aligned with Symbol Outfitters’ security and compliance requirements. In Code Assist, we’ve also added an option to apply the edit, which keeps me, as the developer, in the driver’s seat. So let’s take a look at the source code changes that Gemini Code Assist has made in our codebase. It looks like we have multiple edits across two files, handlers.go and home.html. Gemini Code Assist even applied these changes to the full repository. To put this in context, no pun intended, it would have taken me over 70 hours nonstop just to read through all of these files.
And just like I would with any code change, my next step is to check the work by testing the modified app locally. So let’s try it. And there we go. The For You recommendation section is exactly what our marketing team was asking for. And as a new developer, I was able to make a meaningful contribution on my very first day. So let’s recap. Behind the scenes, Gemini has analysed my entire codebase in GitLab. It’s implemented a new feature and has ensured that all of the generated code is compatible with my company’s standards and requirements. We are so excited for you to try out Gemini Code Assist. A while back, OpenAI released Code Interpreter. It’s now called Advanced Data Analysis, I believe.
And when I started messing around with it, I was pretty blown away. I did not think it was going to be as good as it was. I could upload a massive Excel spreadsheet with thousands of customer orders and then just ask questions about it. I would talk to ChatGPT, just chatting with it, asking questions like, what is the lifetime value of all my customers? What is the lifetime value if we separate them into different segments? And very, very quickly, Code Interpreter would answer back with those details. As long as I could formulate the question in English, or whatever language, in natural language, it would create the code, run the code, parse the Excel spreadsheet, and figure out the answer.
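To give a feel for what happens under the hood, here is a sketch of the kind of pandas code such a tool typically writes and runs for a question like that. The file name and column names are hypothetical, not taken from the video.

```python
import pandas as pd

# Hypothetical order export with columns: customer_id, segment, order_total
orders = pd.read_excel("orders.xlsx")

# Lifetime value = total spend per customer
ltv = orders.groupby("customer_id")["order_total"].sum()
print("Average lifetime value:", round(ltv.mean(), 2))

# Average lifetime value within each customer segment
ltv_by_segment = (
    orders.groupby(["segment", "customer_id"])["order_total"].sum()
          .groupby(level="segment")
          .mean()
)
print(ltv_by_segment)
```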
Understanding data all of a sudden became much, much simpler and easier. If you knew what questions to ask, the AI would do the hard, tedious, boring work of combing through that data and figuring out the answer for you. A year later, less than a year later, here are Google’s data agents. We’re thrilled to be helping Walmart use data agents to serve their customers. We’re helping many customers build data agents. For example, Bayer is harnessing BigQuery and Vertex AI to develop digital medical solutions and drugs more efficiently, as well as to streamline the creation of regulatory documentation. News Corp is using Vertex AI to search data across 30,000 sources and 2.5 billion news articles, updated daily. All right, let’s explore with our data agent.
What you’re seeing here are real-time KPIs, the metrics that matter most, being continuously monitored from my enterprise data in BigQuery to surface anomalies and trends. Let’s dig in. Let’s explore cloud shoe sales across customer segments and territories in a heat map. Our data agent uses the Looker semantic layer to effectively ground its AI in specific business data and data definitions that give us trusted answers. And in this case, it’s right down to the territory and even the customer segment. How does demand correlate with other core business metrics? Okay. Our data agent again uses the Looker semantic layer to find other business metrics that are similar and looks for anomalies. In fact, I’ve got three helpful insights. I can actually dig into supply and demand levels over the next three months by clicking this first tile.
Our data agent is now using BigQuery machine learning to build a forward-looking forecast. And it looks like it’s found a gap in supply versus demand, and it’s only growing. I might need a substitute product to help me meet demand. We’re going to upload an image of our cloud shoe, and with that, our agent has all of the information it needs to do that search. It’s found three top matches and combined these images with structured data about pricing and delivery schedules. Well, given that I need a similar-looking shoe and delivery times are important, I think option one looks great. One big concern that a lot of people have is the growing risk of cybersecurity attacks. In a previous video, we covered the ex-CIA director, I believe, giving a talk in front of Congress on C-SPAN, talking about how
our cybersecurity has to get a whole lot better very fast because of the avalanche of potential cyberattacks that are coming. Not only will AI empower hackers everywhere to carry out these attacks, but there’s also a lot of the kind of random, harmless data that we generate and share online. In the past it wasn’t a big deal, because it was just little pieces here and there that couldn’t really harm us. But with AI’s ability to take in large quantities of data and draw meaningful insights, there’s almost a new kind of attack vector, where somebody hoovering up all the data that’s online could potentially find damaging ways of using it, you know, by using AI, by using neural nets to process that data.
In this next segment, Google talks about security agents: actual AI agents that work on your behalf 24/7, trying to spot cybersecurity attacks and also helping you defend against them. The demand for agents, or cybersecurity systems, like this will likely spike rapidly in the upcoming decade. Now let’s talk about security agents. With security and generative AI, we focused on three important areas. You know, security agents will help each of you in every stage of the security lifecycle, from prevention to detection and response. Let’s start with threat intelligence. We’re able to take the experience and intelligence we gather from protecting Google’s own services and combine it with Mandiant’s leading frontline insight from their work in incident response to show you emerging threats, their severity, and risk factors. We’re also committing to train over 200,000 professionals on our gen AI products.
We’re at a pivotal moment in our industry. Together, we’re creating a new era of generative AI agents built on a new, truly open platform for AI. And we’re reinventing infrastructure to support it. My name is Wes Roth, and thank you for watching.