From OpenAI's hyper-realistic GPT-4o to Google's latest annual I/O conference, it may seem that the robot apocalypse is looming sooner than we know. But who knows, maybe it will be for our own good.
At Google’s yearly event for developers in Mountain View, California, held on Tuesday, 14 May 2024, CEO Sundar Pichai revealed the company's big plan: to use artificial intelligence (AI) in pretty much everything it does, from helping people work and plan their lives to navigating the physical world and getting answers to questions swiftly and concisely.
Today, Google’s search engine is used by over two billion people around the world and contributed $175 billion in revenue last year. Even so, Pichai explained that the company wants to make all of its products smarter still by adding AI features.
This is not just talk; Google has been working hard to make it happen. As we all know, AI brings risks wherever it is involved, such as spreading false information, but the company is still pushing forward.
Google Unveils New AI Technology in Search Engine
One of Google’s big achievements this year is a new feature called AI Overviews. When you search for something on Google, instead of just getting a bunch of links, you’ll also see a short summary at the top, generated by AI.
This innovation, which aims to enhance the user experience, is expected to launch in the US soon, with plans to roll it out to more than a billion people globally by the end of the year.
According to a blog post written by Google, people have been using AI Overviews “billions of times” via their experiment in Search Labs. It was found that the users appreciate the ability to receive a brief summary of a topic along with links for further exploration.
Users will also have the ability to customize their AI Overview, with options to either simplify the language or delve into more detail. This feature can be especially handy if you’re new to a topic or looking to explain something in layman's terms.
However, no matter how intuitive, this move has raised concerns among some website owners, who worry that users might rely more on Google’s summaries than visiting their sites, potentially reducing their traffic.
In response, Google has assured stakeholders that it remains committed to directing traffic to publishers and creators, noting that links within AI Overviews receive more clicks than when the same pages appear as conventional web listings for that query.
Google’s Gemini Tool Steps Up
When ChatGPT came out towards the end of 2022, some folks in the tech world saw it as a real challenge to Google’s search engine, which is like the go-to spot for finding stuff online.
Since then, Google has been on a mission to stay ahead in the AI game. They introduced a bunch of new tech called Gemini at the end of 2023, which includes fancy AI models for developers and a chatbot for regular users. They’ve also made AI a big part of YouTube, Gmail, and Docs, making it easier for people to create videos, emails, and documents.
Google has been showing off how they’re planning to weave AI even deeper into our lives. They talked about Project Astra, which is all about testing how AI can talk to us, understand images and videos, and basically act like a helpful assistant. Some of these features will start showing up in Google’s Gemini chatbot soon, according to Demis Hassabis, the boss of DeepMind, Google’s AI lab.
DeepMind also introduced Gemini 1.5 Flash, a new AI model that’s speedy and efficient, but not as hefty as the previous Gemini 1.5 Pro model. Dr Hassabis says this new model is really good at thinking things through, summarizing stuff, chatting with you, and describing what’s in images and videos.
Google’s small but powerful on-device language model for phones, Gemini Nano, has received an upgrade as well.
According to Pichai, this upgrade (branded with the new name Gemini Nano with Multimodality) allows it to “take any kind of input and give any kind of output.”
What does that mean? It means this model can understand information from text, pictures, sound, websites, social media videos, and even live videos from your phone’s camera.
Then, it can summarize what it finds or answer questions about it. Google showed how it works with a video in which someone used a phone camera to scan the titles of all the books on a shelf, which were then saved as a list so they could be recalled later.
New AI Video, Audio and Photo Creation Tools
Some say that an arts degree is very niche, since few people possess those skill sets, but it seems we may soon have no choice but to compete with AI.
Companies developing generative AI, like Google, are aiming to change how people make things like pictures, sound, and movies.
At the I/O event, Google introduced VideoFX, a new generative video tool powered by Veo, its DeepMind video generation model. VideoFX generates high-quality 1080p videos from text prompts and offers more flexibility in the production process than previous methods.
Additionally, Google has enhanced ImageFX, an image generator that produces high-resolution images. Google claims that ImageFX has fewer problems creating unwanted digital artifacts in pictures compared to previous versions and is better at understanding and generating text based on user prompts.
Google also unveiled DJ Mode in MusicFX, an AI-powered music generator, during its presentation. This feature enables musicians to create song loops and samples using prompts.
The demonstration of DJ Mode occurred during an entertaining performance by musician Marc Rebillet, which set the stage for the I/O keynote.
Finally, there is Imagen 3, Google’s latest text-to-image generator. According to Google, this model generates the highest quality images to date, with greater detail and fewer imperfections, resulting in more lifelike images.
Similar to Veo, Imagen 3 has enhanced natural language processing capabilities, enabling it to better understand user prompts and their underlying intentions. Google claims that Imagen 3 excels in rendering text, addressing one of the major challenges faced by AI image generators.
Imagen 3 is currently in a limited release, accessible through a private preview within ImageFX for specific creators. However, Google plans to make the model more widely available soon through Vertex AI.
The public can sign up to join a waitlist for access to Imagen 3.
Search Tools for Google Photos
Google has integrated powerful visual search capabilities into Google Photos. One notable addition is a feature called Ask Photos, which allows users to ask Gemini, Google’s AI, to search their photos and provide more detailed results than ever before.
For instance, users can ask for their license plate number, and Gemini will use contextual clues to pick out their car across all of their photos.
According to a blog post by Jerem Selier, a software engineer at Google Photos, this feature does not collect data from users’ photos for advertising purposes or to train other Gemini AI models, except for what is used within Google Photos itself.
Ask Photos is set to be rolled out during the summer this year.
Google also introduced a new scam detection feature for Android during the keynote, designed to listen in on phone calls and flag language patterns commonly associated with scams, such as requests to transfer money to a different account.
If the feature detects suspicious activity, it will interrupt the call and prompt the user to hang up. Google emphasizes that this feature operates on the device itself, ensuring privacy as phone calls are not sent to the cloud for analysis.
In addition, Google has expanded its SynthID watermarking tool, which helps differentiate media created using AI.
This tool aids in detecting misinformation, deepfakes, or phishing attempts. SynthID embeds an imperceptible watermark in media content, which can only be detected through software analysis of pixel-level data.
Hopefully, with this implementation, online scams will slowly become a thing of the past.
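To make the watermarking idea concrete, here is a deliberately simplified toy sketch of an imperceptible, pixel-level watermark. This is not SynthID's actual technique (which is far more sophisticated and designed to survive edits like cropping and compression); it only illustrates the general principle that a marker can be hidden in pixel data without visibly changing the image.

```python
# Toy least-significant-bit (LSB) watermark: a hypothetical illustration
# of hiding bits in pixel data, NOT SynthID's real method.

def embed_watermark(pixels, bits):
    """Hide a bit string in the least significant bit of each pixel."""
    out = list(pixels)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & ~1) | bit  # overwrite only the LSB
    return out

def extract_watermark(pixels, n_bits):
    """Recover the hidden bits by reading each pixel's LSB."""
    return [p & 1 for p in pixels[:n_bits]]

image = [200, 131, 54, 77, 90, 12, 245, 68]  # fake 8-pixel grayscale image
mark = [1, 0, 1, 1]
stamped = embed_watermark(image, mark)

# Imperceptible: no pixel value changed by more than 1 out of 255.
assert all(abs(a - b) <= 1 for a, b in zip(image, stamped))
# Yet software analysis of the pixel data recovers the mark.
assert extract_watermark(stamped, len(mark)) == mark
```

A scheme this naive would be destroyed by re-compression, which is why production systems like SynthID spread the signal robustly across the image rather than in raw low-order bits.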
Gemini is also set to take over from Google Assistant as the primary AI assistant on Android phones, accessible with a long press of the power button.
Over time, Gemini will expand its presence across different services and apps, offering multimodal support upon request. Additionally, Gemini Nano’s multimodal features will be integrated into Android’s TalkBack feature, providing more detailed responses for users with visual impairments.
Looking forward, Google’s vision for AI is huge. They want to change how we use technology and how it affects our lives. With AI leading the way, Google is shaping what the future might look like.