April 20, 2024

Why Google SGE is stuck at Google Labs and what’s next

The Google Search Generative Experience (SGE) was scheduled to expire as a Google Labs experiment at the end of 2023, but its time as an experiment was quietly extended, making it clear that SGE won’t be running search anytime soon. Surprisingly, letting Microsoft take the lead may have been the best, perhaps unintended, approach for Google.

Google’s artificial intelligence strategy for search

Google’s decision to keep SGE as a Google Labs project fits into the broader trend in Google’s history of preferring to integrate AI in the background.

The presence of AI is not always evident, but it has been a part of Google Search in the background for longer than most people realize.

The first use of AI in search was as part of Google’s ranking algorithm, a system known as RankBrain. RankBrain helped ranking algorithms understand how words in search queries relate to real-world concepts.

According to Google:

“When we launched RankBrain in 2015, it was the first deep learning system implemented in Search. At the time, it was groundbreaking… RankBrain (as the name suggests) is used to help rank (or decide the best order of) the top search results.”

The next implementation was Neural Matching, which helped Google’s algorithms understand broader concepts in search queries and web pages.

One of the best-known artificial intelligence systems that Google has implemented is the Multitask Unified Model, also known as Google MUM. MUM is a multimodal AI system that understands both images and text and can place them in the context in which they appear in a sentence or a search query.

SpamBrain, Google’s anti-spam AI, is most likely one of the most important implementations of AI as part of Google’s search algorithm because it helps eliminate low-quality sites.

These are all examples of Google’s approach to using AI in the background to solve different problems within search as part of a larger core algorithm.

Google would likely have continued using AI in the background until transformer-based large language models (LLMs) could come to the fore.

But Microsoft’s integration of ChatGPT into Bing forced Google to take steps to add AI in a more prominent way with its Search Generative Experience (SGE).

Why keep SGE in Google Labs?

Considering that Microsoft has integrated ChatGPT into Bing, it may seem curious that Google hasn’t taken a similar step and instead keeps SGE in Google Labs. There are good reasons for Google’s approach.

One of Google’s guiding principles for AI is to use it only once the technology has been proven successful and can be implemented in a way that is trusted to be responsible. Generative AI does not meet either of those standards today.

There are at least three big problems that need to be solved before AI can be successfully integrated into the foreground of search:

  1. LLMs cannot be used as an information retrieval system because they need to be completely retrained to add new data.
  2. Transformer architecture is inefficient and expensive.
  3. Generative AI tends to create erroneous facts, a phenomenon known as hallucination.

Why AI can’t be used as a search engine

One of the most important problems to be solved before AI can be used as a backend and frontend of a search engine is that LLMs cannot function as a search index where new data is continuously added.

In simple terms, when a normal search engine adds new web pages, it calculates the semantic meaning of the words and phrases within the text (a process called “embedding”), which makes the pages searchable and ready to be integrated into the index.

The search engine then has to update the entire index to understand (so to speak) where the new web pages fit into the overall search index.

Adding new web pages can change how the search engine understands and relates all the other web pages it knows about, so it reviews all the web pages in its index and updates their relationships to each other if necessary. This is a simplification intended to convey the general meaning of what it means to add new web pages to a search index.
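To make the embedding step a little more concrete, here is a minimal sketch in Python. It is not how Google's index works: the hashed bag-of-words `embed` function, the `ToyIndex` class, and the example URLs are all hypothetical stand-ins chosen for illustration. It also skips the index-wide update described above. The point is only that a new page can be embedded and added without rebuilding the embedding function itself.

```python
import math
import re
import zlib

DIM = 64  # illustrative vector size, chosen arbitrarily


def embed(text):
    """Toy embedding: hash each word into one of DIM buckets, then L2-normalize."""
    vec = [0.0] * DIM
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(word.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class ToyIndex:
    """Hypothetical in-memory index mapping URLs to embedding vectors."""

    def __init__(self):
        self.vectors = {}

    def add(self, url, text):
        # Only the new page is embedded; existing entries are left alone.
        self.vectors[url] = embed(text)

    def search(self, query, k=3):
        # Rank pages by cosine similarity (vectors are already normalized).
        q = embed(query)
        scored = [
            (sum(a * b for a, b in zip(q, vec)), url)
            for url, vec in self.vectors.items()
        ]
        return [url for _, url in sorted(scored, reverse=True)[:k]]


index = ToyIndex()
index.add("example.com/a", "google search generative experience in labs")
index.add("example.com/b", "recipes for sourdough bread")
index.add("example.com/c", "bing chat market share report")  # added later
top = index.search("sourdough bread recipes", k=1)
print(top)
```

A production system would use a learned neural encoder and an approximate nearest-neighbor index rather than a dictionary scan, but the add-then-search flow is the same in spirit.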

Unlike current search technology, LLMs cannot add new web pages to an index because the act of adding new data requires a complete retraining of the entire LLM.

Google is investigating how to solve this problem to create a transformer-based LLM search engine, but the problem is not solved by any means.

To understand why this happens, it’s helpful to take a quick look at a recent Google research paper co-authored by Marc Najork and Donald Metzler (and several other co-authors). I mention their names because both researchers are almost always associated with some of the most important research coming out of Google. So if a paper has either of their names on it, the research is probably very important.

In the following explanation, the search index is called a memory because a search index is a memory of what has been indexed.

The research paper is titled “DSI++: Updating Transformer Memory with New Documents.”

Using an LLM as a search engine relies on a technology called differentiable search indexes (DSI). Current search index technology is based on what is called a dual encoder.

The research article explains:

“…constructing indices using a DSI involves training a Transformer model. Therefore, the model must be retrained from scratch every time the underlying corpus is updated, incurring prohibitively high computational costs compared to dual encoders.”
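The cost gap described in that passage can be illustrated with some back-of-the-envelope arithmetic. The sketch below is a toy cost model, not a measurement. It counts only how many documents must pass through the model on each update, and the corpus size, batch size, and function names are all assumptions made up for the example.

```python
def dual_encoder_update_cost(corpus_size, new_docs):
    """Documents processed per update: only the new ones get encoded."""
    return new_docs


def dsi_retrain_cost(corpus_size, new_docs, epochs=1):
    """Documents processed when retraining from scratch on the grown corpus."""
    return (corpus_size + new_docs) * epochs


corpus = 1_000_000  # hypothetical starting corpus size
batch = 1_000       # hypothetical number of new pages per update

dual_total = 0
dsi_total = 0
for _ in range(10):  # ten successive updates
    dual_total += dual_encoder_update_cost(corpus, batch)
    dsi_total += dsi_retrain_cost(corpus, batch)
    corpus += batch

print(dual_total)  # 10000
print(dsi_total)   # 10055000
```

Even with a single training pass per update, the retrain-from-scratch approach touches roughly a thousand times more documents in this made-up scenario. Real training runs take many passes over the data, which would widen the gap further.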

The paper goes on to explore ways to solve the problem of LLMs “forgetting,” but at the end of the study the authors state that they have only moved toward a better understanding of what needs to be solved in future research.

They conclude:

“In this study, we explore the phenomenon of forgetting in relation to the addition of new and different documents to the indexer. It is important to note that when a new document refutes or modifies a previously indexed document, the behavior of the model becomes unpredictable and requires further analysis.

Furthermore, we examine the effectiveness of our proposed method on a larger data set, such as the full MS MARCO data set. However, it is worth noting that with this larger data set, the method introduces significant forgetting. As a result, additional research is needed to improve model performance, particularly when dealing with larger scale data sets.”

LLMs cannot verify facts themselves

Google and many others are also investigating multiple ways to have AI verify facts to avoid giving false information (known as hallucinations). But so far that research is not making significant progress.

Bing’s experience with AI in the spotlight

Bing took a different path by incorporating AI directly into its search interface in a hybrid approach that married a traditional search engine with an AI interface. This new type of search engine revamped the search experience and differentiated Bing from the competition for search engine users.

Bing’s AI integration initially generated a stir, attracting users intrigued by the novelty of an AI-powered search interface. This resulted in an increase in Bing user engagement.

But after almost a year of buzz, Bing’s market share saw only a marginal increase. Recent reports, including one from the Boston Globe, indicate less than 1% growth in market share since the introduction of Bing Chat.

Google’s strategy is validated in retrospect

Bing’s experience suggests that AI in the foreground of a search engine may not be as effective as expected. The modest increase in market share raises questions about the long-term viability of a chat-based search engine and validates Google’s cautious approach to using AI in the background.

Google’s focus on AI at the core of search is justified in light of Bing’s failure to get users to abandon Google for Bing.

The strategy of keeping AI in the background, where it currently works best, allowed Google to keep users while AI search technology matures in Google Labs, where it belongs.

Bing’s approach of using AI at the forefront now serves almost as a warning about the dangers of quickly launching a technology before its benefits are fully understood, providing insight into the limitations of that approach.

Ironically, Microsoft is finding better ways to integrate AI as a back-end technology in the form of useful features added to its cloud-based office products.

The future of AI in search

The current state of AI technology suggests that it is most effective as a tool that supports the functions of a search engine, rather than serving as both the front end and back end of a search engine or even as a hybrid approach, which users have refused to adopt.

Google’s strategy of releasing new technologies only when they have been fully tested explains why Search Generative Experience belongs in Google Labs.

AI will undoubtedly take on a bolder role in search, but that day is definitely not today. Expect to see Google add more AI-based features to more of its products and it may not be surprising to see Microsoft continue down that path as well.

See also: Google SGE and generative AI in search: what to expect in 2024

