Google Algorithm Leak 2024: Google’s Been Lying to us (pretend to be shocked)


The SEO community and the broader internet are buzzing with news of the Google Search algorithm leak. Internal documentation describing Google’s proprietary ranking mechanisms has been exposed, offering long-awaited insight into the tech giant’s search algorithms. The leak sheds light on long-debated practices and confirms several suspicions held by SEO experts.

So, what happened? 

An internal leak has revealed Google’s Search Content Warehouse API documentation, which mirrors the services offered by Google Cloud Platform. Outdated documentation for Document AI Warehouse was accidentally made public and remains accessible despite attempts to remove it on May 7th. The documentation was published under the Apache 2.0 license, which permits anyone with access to use, modify, and distribute it freely, adding another layer of complexity to the leak. The situation is still evolving, and the full picture has yet to emerge.

17 Key Takeaways from Google’s Search Algorithm Leak

Of the many notable misrepresentations and revelations included in the leaked documents, these are the highlights thus far:  

Internal Algorithm Documentation Leak

Google Search Content Warehouse API documentation was accidentally made public, revealing a wealth of information on data storage and system features. Staying up-to-date on this information is essential, especially for businesses looking to rank higher on Google’s search algorithm, as it provides valuable insights into how Google’s ranking systems work. This knowledge allows businesses to optimize their strategies for improved search visibility and customer acquisition. 

This documentation leaked details about data stored for content, links, and user interactions, emphasizing the complexity and depth of data management. It sheds light on the specific elements that Google considers in its ranking process, allowing businesses to tailor their SEO efforts more effectively. 

Search Results Systems and Features

The leak also revealed how many systems and features are stored and manipulated, showing how these systems influence search engine results pages (SERPs). The documentation details 2,596 modules and 14,014 attributes related to various Google services, including YouTube, Assistant, and web documents. These modules are housed in a monolithic repository, meaning all the code is stored in a single location and can be accessed by any machine on the network. This matters for businesses because it underscores the interconnectedness of Google’s services and the need for a comprehensive, integrated approach to SEO and digital marketing.

Domain Authority

Google has repeatedly denied using “domain authority,” though those denials may refer narrowly to third-party metrics like Moz’s “Domain Authority.” However, the leaked docs reveal a feature called “siteAuthority” used in Google’s ranking systems, indicating that some form of domain authority is indeed considered. This is crucial for businesses aiming to improve their visibility on Google, as understanding and building site authority can significantly impact search rankings and help attract more customers.

Clicks for Rankings

Despite Google’s public claims that it doesn’t use clicks for rankings, the leaked docs and DOJ antitrust testimony confirm the use of click-driven measures through systems like NavBoost and Glue. These systems use click data to adjust rankings, validating long-held suspicions within the SEO community. For businesses, engaging and retaining visitors on their site can directly influence their search rankings, making user experience and click-through rates critical factors in drawing more customers through Google.

Sandboxing

Sandboxing is a temporary limitation placed on new or untrusted websites to prevent them from ranking highly in search results until they establish credibility and trustworthiness. It involves isolating these sites in a metaphorical “sandbox,” where visibility is restricted until they prove their legitimacy through quality content, user engagement, and adherence to SEO best practices. Google has denied the existence of a “sandbox” for new or untrusted websites. However, the PerDocData module includes an attribute called “hostAge,” used to sandbox fresh spam, indicating that such a mechanism exists. This revelation implies that new websites might initially face ranking challenges, making it essential for businesses to establish trust and authority quickly to avoid being sidelined and improve their chances of being found by potential customers on Google.

Chrome Data for Rankings

While Google representatives have stated that Chrome data isn’t used in search rankings, the documentation shows that page quality scores and other ranking factors incorporate Chrome data, contradicting these public assertions. Given this, businesses should recognize that user behavior and performance metrics gathered from Chrome can influence their search rankings, making it crucial to optimize site performance and user experience to enhance visibility on Google. 

Algorithm Architecture

Google’s ranking system consists of multiple microservices rather than a single algorithm, showcasing the complexity and distributed nature of Google’s systems. For businesses, optimizing for search rankings requires a comprehensive approach that addresses various aspects of SEO, as different microservices may affect different elements of their online presence.

Twiddlers

Twiddlers are re-ranking functions that run after the primary Ascorer algorithm and adjust search results, much like filters in WordPress. They can change a document’s retrieval score or ranking just before display, promoting diversity by limiting result types (e.g., only allowing three blog posts in a SERP). Functions like NavBoost and QualityBoost operate as Twiddlers. For businesses, understanding Twiddlers is helpful because these adjustments can impact how and where their content appears in search results.
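
To make the idea concrete, here is a minimal Python sketch of a re-ranking step in the spirit of what the leak describes. It is purely illustrative and not Google’s code: the `Result` fields, the `limit_kind` and `boost_urls` helpers, and the scores are all hypothetical names and values we made up for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class Result:
    url: str
    score: float  # hypothetical score from the primary ranking stage
    kind: str     # hypothetical content-type label, e.g. "blog"


# In this sketch a twiddler is any function that takes a ranked list
# and returns a new ranked list.
Twiddler = Callable[[List[Result]], List[Result]]


def limit_kind(kind: str, max_count: int) -> Twiddler:
    """Keep at most max_count results of one kind (a diversity limit)."""
    def twiddle(results: List[Result]) -> List[Result]:
        kept, seen = [], 0
        for r in results:
            if r.kind == kind:
                if seen >= max_count:
                    continue  # drop the extra result of this kind
                seen += 1
            kept.append(r)
        return kept
    return twiddle


def boost_urls(urls: Set[str], factor: float) -> Twiddler:
    """Multiply the score of selected URLs, then re-sort by score."""
    def twiddle(results: List[Result]) -> List[Result]:
        adjusted = [
            Result(r.url, r.score * factor if r.url in urls else r.score, r.kind)
            for r in results
        ]
        return sorted(adjusted, key=lambda r: r.score, reverse=True)
    return twiddle


def apply_twiddlers(results: List[Result], twiddlers: List[Twiddler]) -> List[Result]:
    """Sort by the primary score, then run each twiddler in order."""
    ranked = sorted(results, key=lambda r: r.score, reverse=True)
    for twiddle in twiddlers:
        ranked = twiddle(ranked)
    return ranked


results = [
    Result("a.example/blog-1", 0.9, "blog"),
    Result("b.example/blog-2", 0.8, "blog"),
    Result("c.example/product", 0.7, "product"),
    Result("d.example/blog-3", 0.6, "blog"),
    Result("e.example/blog-4", 0.5, "blog"),
]
final = apply_twiddlers(
    results,
    [boost_urls({"c.example/product"}, 1.5), limit_kind("blog", 3)],
)
print([r.url for r in final])
```

In this toy run the product page moves to the top after the boost, and the fourth blog post is dropped by the diversity limit. The practical point is that a page’s final position can depend on adjustments applied after the initial scoring.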

Authors and Content Quality

The leak highlighted that Google explicitly tracks and evaluates authors and provides metrics for content originality and keyword stuffing. As we always say, quality content is king. Businesses must ensure they only publish high-quality, original content authored by credible writers to improve their chances of ranking higher on Google. 

Demotions

Various factors lead to content demotion, including link quality and user satisfaction. Understanding this is crucial for businesses because addressing potential issues proactively can prevent their content from being downgraded in search rankings, ensuring better visibility and customer reach.

Links

Links remain highly important, and the documentation describes detailed link analysis and metrics. Indexing tier impacts link value, with higher tiers indicating more valuable links. This underscores the need to focus on acquiring high-quality backlinks and ensuring content is indexed in high tiers, as this can significantly enhance search rankings and drive more organic traffic to a site.

Content Measures

Content is evaluated based on various factors like originality and relevance to queries. Businesses must prioritize creating unique, high-quality content that directly addresses user search intent. By doing so, they can improve search rankings and attract more targeted traffic. Additionally, understanding these evaluation criteria allows businesses to fine-tune their content strategies to better align with Google’s standards, ultimately leading to greater visibility and customer engagement.

Dates and Freshness

The leak revealed that multiple date-related attributes affect content ranking and freshness. Ensuring your content is current and up-to-date is vital for maintaining high rankings and providing relevant information for users. Regularly updating your content can also enhance user engagement and build trust with your audience.

Domain Registration

Registration information is stored, which impacts content trustworthiness. The details of a domain’s registration, such as the length of time it has been registered and the transparency of the registrant’s information, can influence how trustworthy Google perceives a website. For businesses, having clear and consistent registration details can enhance their credibility in Google’s eyes, potentially leading to better search rankings. Additionally, domains that have been registered for longer periods may be seen as more stable and reliable, further boosting their trustworthiness and ranking potential. 

YMYL (Your Money or Your Life) Scoring

The leaked document also highlighted that there are specific scoring metrics for content related to health, finance, and other critical areas. YMYL content, which includes topics that can significantly impact a person’s health, financial stability, or overall well-being, is held to higher standards of accuracy, trustworthiness, and reliability. Google strongly emphasizes evaluating the expertise, authority, and trustworthiness (E-A-T) of such content. For businesses operating in these sectors, it is essential to provide high-quality, well-researched, and accurate information to meet the stringent criteria and improve their rankings. 

Embeddings and Topics Relevance

Google’s algorithm leak also disclosed the use of vector embeddings to measure how on-topic a piece of content is relative to the rest of the site. This means that Google’s algorithms analyze content in a sophisticated way to determine its relevance to the website’s overall theme. For businesses, understanding how these embeddings work is essential: keeping content aligned with the site’s core topics can help improve its relevance and ranking on Google, ensuring that it reaches the right audience effectively.
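
As a rough illustration of how embeddings can measure topical fit, the Python sketch below compares each page’s vector to the average vector for the whole site using cosine similarity. This is an assumption-laden toy, not Google’s method: the three-number vectors, the page paths, and the `topical_fit` helper are all hypothetical, and real embeddings have hundreds of dimensions and come from a language model.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors: List[List[float]]) -> List[float]:
    """Element-wise average of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def topical_fit(pages: Dict[str, List[float]]) -> Dict[str, float]:
    """Score each page by its similarity to the site's average vector."""
    site_vector = centroid(list(pages.values()))
    return {url: cosine(vec, site_vector) for url, vec in pages.items()}


# Hypothetical toy embeddings for three pages on an SEO agency site.
pages = {
    "/seo-audit": [0.9, 0.1, 0.0],
    "/link-building": [0.8, 0.2, 0.0],
    "/banana-bread": [0.0, 0.1, 0.9],
}
scores = topical_fit(pages)
for url, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{url}: {score:.2f}")
```

In this example the off-topic recipe page scores lowest against the site average, which is the kind of signal the leaked attributes suggest Google can compute at scale.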

Ultimately, the documentation reveals significant discrepancies between Google’s public statements and actual practices, highlighting the need for the SEO community to rely on empirical evidence and experimentation rather than official claims. 

Cadence SEO’s Insight on the Google Search Document Leak

In general, the article validates our strategy and approach. Google inadvertently confirmed many things we and the SEO community have been mostly sure of for some time. It also mentioned some somewhat surprising things, though we have doubts about how much weight many of them carry in rankings. So, if there is one big takeaway, it is that most of what we’ve been doing has been spot-on and accurate.

As evidenced in this article, Google has a long-running reputation for obscuring details from the SEO community to prevent us from being effective in our ranking efforts. But we’re very smart; we conduct extensive research and test strategies on our own sites before recommending them to clients, and our results speak for themselves. Moreover, we don’t just take the latest recommendations at face value, meaning we don’t have knee-jerk reactions to rumors and gossip. We are scientists and treat SEO like a science for our clients.

The Future of SEO

In the long run, it’s important to work with an agency that not only stays on top of evolving trends in SEO (which is what this article and our interpretation represent) but has also done enough real-world testing to prove the things we know are likely to be true. Most SEO agencies are not surprised by what came out of Google’s leak. However, when making the case to clients for why they need certain things, it helps to have yet another example of Google not being very truthful in its public statements. These leaks, while interesting, have led to little or no change for us, and we expect that to continue moving forward. Rather, they reinforce things we have already shown and proven in our testing.

Curious about how these revelations can impact your SEO strategy moving forward? Get in touch with us today, and let’s elevate your digital presence together. 


Christy Olsen

Christy is the Co-Founder and Managing Partner of CadenceSEO. As a self-proclaimed SEO Nerd, she is extremely passionate about all things SEO. With over a decade of service in the SEO space, she has helped hundreds of clients get where they want to go. Outside of work, she is a proud mother of 6, triathlete, ultra-runner, and Cross Country Coach.
