If you operate a business in an industry other than search engine optimization (SEO) or search engine marketing (SEM), the massive Google API document leak might have understandably escaped your notice. But in a world not too far away, earth-shattering events were taking place around the end of May 2024.
Turns out that 2,569 internal documents providing the ingredients to Google Search’s secret sauce were leaked on GitHub by a bot (bad robot!) back on March 13, 2024. A search marketer noticed them and contacted retired SEO field general Rand Fishkin on May 5, 2024. After verifying the validity of the documents and the source who led him to them, Rand reached out to API expert and technical SEO giant Mike King to help him decipher the code contained in the documents. They published their initial findings on May 27, 2024, on their respective blogs.
With the smoke cleared and the dust settled, we take a look at what those documents reveal about the attributes Search’s algorithm evaluates, the impact of important Google updates, and how they affect the ranking of websites on a Google search engine results page (SERP). Let me start, however, by using the information gleaned to clarify common SEO practices whose validity was cast into doubt by years of Google denials.
For the most part, these leaked documents are vindication for many SEOs. But because of the understandable secrecy around Google’s SERP ranking factors, and the less-understandable denials by Googlers, there has always been a level of uncertainty we had to convey to clients when discussing SEO practices that were otherwise tried and true.
One of the unanticipated benefits of this leak is that we can confirm the following attributes are measured by Google in the context of global SEO efforts. The amount of weight given to each, however, is not included in the leaked documents, and in fact, they may even “fluctuate” due to “Twiddlers.” Fluctuate may not be the right word: Twiddlers are re-ranking functions within Google’s search algorithm framework. They adjust search results after the primary ranking algorithm, Ascorer, has processed them. Twiddlers operate similarly to filters on a search, modifying the information retrieval score or changing the ranking of pages before they are presented to searchers.
Twiddlers can also impose category constraints to instill diversity by limiting the types of results shown. For example, a Twiddler might limit the number of blog posts appearing on a search results page for a query about repairing a faucet to show more YouTube videos on this topic instead.
They are part of a broader system that may include various “Boost” functions, such as NavBoost, QualityBoost, and RealTimeBoost, which adjust rankings based on their associated criteria. These functions show that attributes can be weighted differently depending on the search request. NavBoost, for example, is meant to surface more relevant results based on location. So if a searcher in the Washington, D.C. area googles the word “capitals,” they will likely get results related to the hockey team, whereas a searcher in Chicago would likely see a SERP with a featured snippet and sites listing all the state capitals.
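To make the mechanism concrete, here is a minimal Python sketch of how this kind of re-ranking pass could work. It is an illustration only: the Result structure, function names, demotion factor, and boost values are all assumptions for demonstration, not anything taken from the leaked documents.

```python
from dataclasses import dataclass

@dataclass
class Result:
    url: str
    category: str   # e.g. "blog", "video", "sports", "reference"
    score: float    # information-retrieval score from the primary ranking pass

def diversity_twiddler(results, category, cap):
    """Demote results of one category beyond a cap so other formats can surface."""
    shown = 0
    for r in sorted(results, key=lambda r: r.score, reverse=True):
        if r.category == category:
            shown += 1
            if shown > cap:
                r.score *= 0.5  # illustrative demotion factor, not a real Google value
    return results

def location_boost(results, local_topics, factor=1.3):
    """Boost results whose topic matches the searcher's locale."""
    for r in results:
        if r.category in local_topics:
            r.score *= factor
    return results

# The primary ranker ("Ascorer" in the leaked documents) has already scored these
# results; the twiddler-style functions only re-rank them before they are displayed.
serp = [
    Result("list-of-state-capitals", "reference", 0.82),
    Result("capitals-hockey-news", "sports", 0.80),
    Result("capitals-game-tickets", "sports", 0.78),
]
serp = location_boost(serp, local_topics={"sports"})        # searcher is in Washington, D.C.
serp = diversity_twiddler(serp, category="sports", cap=1)   # don't flood the page with one format
serp.sort(key=lambda r: r.score, reverse=True)
print([r.url for r in serp])
```

Run as-is, the hockey-news result rises to the top for the D.C. searcher while the second sports result is demoted below the list of state capitals, which is the general shape of behaviour the leak attributes to Twiddlers and Boost functions.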
With that in mind, here is a more accurate assessment of the attributes Google’s Search algorithm considers when presenting SERPs to searchers and a look at how this information ties in with, or alters, global SEO practices.
One of the most eye-opening revelations from the leaks is the nuanced role that different types of clicks play in Google’s ranking algorithms. Not all clicks are created equal, and Google uses a sophisticated framework to evaluate the quality of these interactions. The leaked documents introduce several key metrics (much of the data for which is collected via the Chrome browser) that are now understood to be vital in determining search rankings:
Boiled down, what this click data confirms is that, assuming competing pages all answer a searcher’s question equally well, user experience (UX) is a key factor in Google’s ranking algorithm. Sites that deliver a superior user experience are more likely to generate successful clicks: clicks that lead to high engagement and low bounce rates. Several factors contribute to a positive UX, and, thanks to algorithm updates like the Page Experience Update, Google’s algorithms are designed to assess these elements closely:
The leaks have also highlighted the importance of content freshness and originality. They indicate that a Twiddler is engaged to surface recent and recently updated pages (from trusted sites; more on this below). Content originality is also measured at several stages of the ranking process.
Content Freshness: Regularly updated content is a key factor in maintaining high rankings. Aside from factoring in the date on a page, Google stores the last 20 changes made to a webpage and can tell when it has been substantially updated (a brief sketch of how such change tracking might work follows this list). Its Search algorithm favours fresh content because it’s more likely to be relevant and accurate.
Originality: The leaks reinforced that Google’s algorithms are designed to prioritize original content over duplicate or repurposed material. So, while AI writing tools can be helpful with certain tasks, pumping out content written entirely by ChatGPT, for example, can lead to pages that sound and are structured much like others in your niche. Since Google’s Spam Content Update, it may also get your page, and even your site, flagged as unhelpful.
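As a purely illustrative sketch of the change-tracking idea mentioned above, the Python snippet below keeps the last 20 versions of a page and flags an update as substantial when the new text differs markedly from the previous version. The similarity measure, the 0.8 threshold, and the class itself are assumptions for demonstration; the leak only tells us that roughly the last 20 changes are stored and that substantial updates can be detected.

```python
from collections import deque
from difflib import SequenceMatcher

MAX_VERSIONS = 20  # the leak suggests roughly the last 20 changes per page are kept

class PageHistory:
    """Illustrative change tracker: stores recent versions and flags substantial edits."""

    def __init__(self):
        self.versions = deque(maxlen=MAX_VERSIONS)  # oldest versions fall off automatically

    def record(self, content: str) -> bool:
        """Store a new version; return True if the change looks substantial."""
        substantial = False
        if self.versions:
            similarity = SequenceMatcher(None, self.versions[-1], content).ratio()
            substantial = similarity < 0.8  # illustrative threshold, not Google's
        self.versions.append(content)
        return substantial

history = PageHistory()
history.record("How to fix a leaky faucet, updated for 2023.")
print(history.record("How to fix a leaky faucet, updated for 2024."))           # minor edit -> False
print(history.record("Complete faucet repair guide with step-by-step video."))  # rewrite    -> True
```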
Google has long touted the value of content written with experience, expertise, authoritativeness, and trustworthiness (E-E-A-T). This is especially true for searches that fall under Google’s “Your Money or Your Life” (YMYL) category. YMYL content includes pages that could impact a person’s future happiness, health, financial stability, or safety. Because of the potential consequences of YMYL content, Google holds these sites to an especially high standard.
Key Factors for YMYL Content: The leaked documents show that Google stores information on authors and their associated works. This makes it more likely that pages written by authors with a track record of writing on specific topics will be preferred in related searches. In other words, having a respected digital bibliography on a specific topic is more important than ever, especially for YMYL sites.
Staying current is also critical for YMYL pages, as outdated information can have serious implications for users.
One of the most intriguing revelations from the leaked documents is the concept of “siteAuthority” – especially considering years of denials by Google about its existence. This metric represents a site’s overall authority, in Google’s eyes, on a specific topic or niche.
We already knew that Google analyzes various factors, such as the quality of a page’s content, the volume and quality of inbound links, user engagement metrics, the relevance of a page’s content to the rest of the site, and the site’s historical performance. While we can’t say for certain which of these factors (if any) feed into siteAuthority, or how the score is used, it’s nice to finally have confirmation that Google assigns a site-wide siteAuthority score, as many professional SEOs have long advised their clients.
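Because the documents confirm only that a site-wide score exists, any formula is guesswork. The snippet below is a purely hypothetical illustration of what “aggregating page-level signals into a site-level score” could look like; every signal name and weight is an assumption, not something found in the leak.

```python
# Hypothetical only: the leak confirms a siteAuthority score exists, not how it is
# computed. The signal names and weights below are assumptions for illustration.
SIGNAL_WEIGHTS = {
    "content_quality": 0.35,
    "inbound_link_quality": 0.30,
    "engagement": 0.20,
    "topical_relevance": 0.15,
}

def hypothetical_site_authority(pages: list[dict]) -> float:
    """Average a weighted combination of per-page signals into one site-level score."""
    if not pages:
        return 0.0
    page_scores = [
        sum(weight * page.get(signal, 0.0) for signal, weight in SIGNAL_WEIGHTS.items())
        for page in pages
    ]
    return sum(page_scores) / len(page_scores)

site_pages = [
    {"content_quality": 0.9, "inbound_link_quality": 0.7, "engagement": 0.8, "topical_relevance": 0.95},
    {"content_quality": 0.6, "inbound_link_quality": 0.4, "engagement": 0.5, "topical_relevance": 0.90},
]
print(round(hypothetical_site_authority(site_pages), 3))
```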
Established brands typically enjoy strong rankings and engagement thanks to name recognition, high-quality backlinks, and plenty of clicks. They can also afford SEO agencies that help them rank well across a broad range of topics, even those outside their core expertise.
One of the revelations of the leak is that Google uses an attribute called smallPersonalSite. There is no indication from the leaks of how it is used, but considering how Google’s latest updates have decimated niche sites, there’s an argument to be made that it may currently be used to demote smaller niche sites rather than boost them.
Thanks to the hostAge attribute, there has also been confirmation of a long-denied “sandbox” for newer sites, which keeps them from being displayed in SERPs until they can be verified.
For niche websites, the concept of siteAuthority presents both challenges and opportunities. On the one hand, these sites may struggle to compete with established brands in broad categories due to their lower authority. On the other hand, sites that consistently produce high-quality, relevant content within their specialized domain, written by recognized authors, can build their brand over time.
Niche websites should remain focused on building their specialized knowledge to gain authority within their domain and use all available tools to do so.
As the search landscape evolves, the distribution of content across various media platforms is becoming more critical than ever. The leaks highlighted the need for businesses of all sizes to diversify how people get to their sites. Google’s Search algorithm is under constant construction, with tweaks having major impacts on the visibility of countless sites. It’s worth devoting a portion of your resources to understanding how your content is shared and engaged with across different channels, such as social media. This shift underscores the need for a more holistic approach to SEO, one that integrates content distribution into the overall strategy.
In this new era of SEO, businesses must think beyond their website. Effective media distribution, diversification of traffic sources, and strategic partnerships are becoming essential components of a successful SEO strategy. It’s about crafting a comprehensive digital strategy that spans multiple platforms, engages users wherever they are, and leverages every available channel to build and maintain authority. Those who adapt to these changes and integrate them into their overall approach will be less reliant on the whims of Google and better positioned to succeed in the competitive global market.
Businesses need to adapt their strategies to align with the latest insights and best practices. This section provides a comprehensive toolkit for enhancing your global SEO approach, helping you navigate the SEO terrain with confidence.
Before diving into specific strategies, it’s crucial to assess where your current SEO efforts stand through professional website audits. Below is an interactive checklist designed to help you gauge your practices against the insights revealed by the leaks:
Once you’ve assessed your current practices, it’s time to implement changes that align with the latest SEO insights, preferably by training with a true SEO expert. Here are a few more insights to help you enhance your user experience and engagement based on what we learned from the leaked API documents:
When all is said and done, the information contained in the leaked Google API documents was absolutely enlightening, helping to solidify and provide more insight into practices developed by SEOs over years of testing and refining.
That said, the key to success, as always, lies in giving potential customers the helpful information they commonly seek during consultations, running an efficient, engaging website, staying agile, focusing on delivering genuine value to users, and leveraging every tool at your disposal to maintain a competitive edge globally. To find out how that applies to your website, contact Paul today.