Is Google About to Kill Its Penguin?

TL;DR – A theory: The next Google Penguin update, expected to roll out before year’s end, will kill link spam outright by eliminating the negative signals associated with inorganic backlinks. Google will selectively pass link equity based on the topical relevance of linked sites, made possible by semantic analysis. Google will reward organic links and perhaps even mentions from authoritative sites in any niche. As a side effect, link-based negative SEO and Penguin “penalization” will be eliminated.

Is the End of Link Spam Upon Us?

Google’s Gary Illyes has recently gone on record regarding Google’s next Penguin update. What he’s saying has many in the SEO industry taking note:

  1. The Penguin update will launch before the end of 2015. (Since it’s been more than a year since the last update, this would be a welcome release.)
  2. The next Penguin will be a “real-time” version of the algorithm.

Many anticipate that once Penguin is rolled into the standard ranking algorithm, ranking decreases and increases will be doled out in near real-time as Google considers negative and positive backlink signals. Presumably, this would include a more immediate impact from disavow file submissions — a tool that has been the topic of much debate in the SEO industry.

But what if Google’s plan is to actually change the way Penguin works altogether? What if we lived in a world where inorganic backlinks didn’t penalize a site, but were instead simply ignored by Google’s algorithm and offered no value? What if the next iteration of Penguin, the one that is set to run as part of the algorithm, is actually Google’s opportunity to kill the Penguin algorithm altogether and change the way they consider links by leveraging their knowledge of authority and semantic relationships on the web?

We at Bruce Clay, Inc. have arrived at this theory after much discussion, supposition and, like any good SEO company, reverse engineering. Let’s start with the main problems that the Penguin penalty was designed to address, leading to our hypothesis on how a newly designed algorithm would deal with them more effectively.

Working Backwards: The Problems with Penguin

Of all the algorithmic changes aimed at addressing webspam, the Penguin penalty has been the most problematic for webmasters and Google alike.

It’s been problematic for webmasters because of how difficult it is to get out from under it. If some webmasters had known in April of 2012 just how difficult it would be to recover from a Penguin penalty, they might have decided to scrap their sites and start from scratch. Unlike manual webspam penalties, where (we’re told) a Google employee reviews link pruning and disavow file work, algorithmic actions are lifted only when Google refreshes the algorithm. Refreshes have happened only four times since the original Penguin penalty was released, making opportunities for contrition few and far between.

Penguin has been problematic for Google because, at the end of the day, Penguin penalizations and the effects they have on businesses both large and small have been a PR nightmare for the search engine. Many would argue that Google couldn’t care less about negative sentiment among the digital marketing (specifically SEO) community, but the ire toward Google doesn’t stop there; major mainstream publications like The Wall Street Journal, Forbes and CNBC have featured articles that highlight Penguin penalization and its negative effect on small businesses.

Dealing with Link Spam & Negative SEO Problems

Because link building was so effective before 2012 (and, to a degree, has been since), Google has been dealing with a huge link spam problem. Let’s be clear about this: Google created this monster when it rewarded inorganic links in the first place. For quite some time, link building worked like a charm. If I can borrow a quote from my boss, Bruce Clay: “The old way of thinking was he who dies with the most links wins.”

This tactic was so effective that it literally changed the face of the Internet. Blog spam, comment spam, scraper sites: none of them would exist if Google’s algorithm hadn’t, for quite some time, rewarded the acquisition of links (regardless of source) with higher rankings.

And then there’s negative SEO — the problem that Google has gone on record as saying is not a problem, while there have been many documented examples that indicate otherwise. Google even released the disavow tool, designed in part to address the negative SEO problem they deny exists.

The Penguin algorithm, intended to address Google’s original link spam issues, has fallen well short of solving the problem of link spam; when you add in the PR headache that Penguin has become, you could argue that Penguin has been an abject failure, ultimately causing more problems than it has solved. All things considered, Google is highly motivated to rethink how they handle link signals. Put simply, they need to build a better mousetrap – and the launch of a “new Penguin” is an opportunity to do just that.

A Solution: Penguin Reimagined

Given these problems, what is the collection of PhDs in Mountain View, CA, to do? What if, rather than policing spammers, they could change the rules and disqualify spammers from the game altogether?

By changing their algorithm to neither penalize nor reward inorganic linking, Google could, in one fell swoop, solve their link problem once and for all. The motivation for spammy link building would be removed because it simply would not work any longer. Negative SEO based on building spammy backlinks to competitors would also stop working if inorganic links ceased to pass negative trust signals.
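To make the difference concrete, here is a minimal sketch of the two scoring policies. Everything in it is an assumption made for illustration: the `Link` class, the `organic` label, and the one-point reward and penalty values are hypothetical, and nothing here reflects how Google actually scores links.

```python
from dataclasses import dataclass


@dataclass
class Link:
    source: str
    organic: bool  # hypothetical label; how Google would classify a link is unknown


def score_with_penalty(links, reward=1.0, penalty=1.0):
    """Penalty-style model: every inorganic link subtracts from the score."""
    return sum(reward if link.organic else -penalty for link in links)


def score_ignoring_spam(links, reward=1.0):
    """Hypothesized model: inorganic links are skipped and contribute nothing."""
    return sum(reward for link in links if link.organic)


# A site with two organic links, then the same site after a competitor
# points five spammy links at it (a link-based negative SEO attack).
profile = [Link("news-site.example", True), Link("trade-blog.example", True)]
attack = [Link(f"spam-{i}.example", False) for i in range(5)]

before_penalty = score_with_penalty(profile)
after_penalty = score_with_penalty(profile + attack)
before_ignore = score_ignoring_spam(profile)
after_ignore = score_ignoring_spam(profile + attack)
```

Under the penalty model the attack drags the score down; under the "ignore" model the score does not move, so there is nothing to gain from building spam links for yourself or against a competitor.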

Search Engine Technologies Defined

Knowledge Graph, Hummingbird and RankBrain — Oh My!

What is the Knowledge Graph?
The Knowledge Graph is Google’s database of semantic facts about people, places and things (called entities). Knowledge Graph can also refer to a boxed area on a Google search results page where summary information about an entity is displayed.

What is Google Hummingbird?
Google Hummingbird is the name of the Google search algorithm. It was launched in 2013 as an overhaul of the engine powering search results, allowing Google to understand the meaning behind words and relationships between synonyms (rather than matching results to keywords) and to process conversational (spoken style) queries.

What is RankBrain?
RankBrain is the name of Google’s artificial intelligence technology used to process search results with machine learning capabilities. Machine learning is the process where a computer teaches itself by collecting and interpreting data; in the case of a ranking algorithm, a machine learning algorithm may refine search results based on feedback from user interaction with those results.

What prevents Google from simply ignoring inorganic links is that doing so requires the ability to judge accurately which links are relevant for any site or, as the case may be, subject. Developing this ability to judge link relevance is easier said than done, you say, and I agree. But, looking at the most recent changes that Google has made to their algorithm, we see that the groundwork for this type of algorithmic framework may already be in place. In fact, one could infer that Google has been working towards this solution for quite some time now.

The Semantic Web, Hummingbird & Machine Learning

In case you haven’t noticed, Google has made substantial investments to increase their understanding of the semantic relationships between entities on the web.

With the introduction of the Knowledge Graph in May of 2012, the launch of Hummingbird in September of 2013 and the recent confirmation of the RankBrain machine learning algorithm, Google has taken major leaps forward in their ability to recognize the relationships between objects and their attributes.

Google understands semantic relationships by examining and extracting data from existing web pages and by leveraging insights from the queries that searchers use on their search engine.

Google’s search algorithm has been getting “smarter” for quite some time now, but as far as we know, these advances are not being applied to one of Google’s core ranking signals: external links. We’ve had no reason to suspect that the main tenets of PageRank have changed since they were first introduced by Sergey Brin and Larry Page back in 1998.
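For readers who have not looked at it in a while, the textbook PageRank idea can be sketched in a few lines. This is the commonly cited normalized power-iteration form, not Google’s production code; the `pagerank` function name, the tiny `web` graph and the handling of pages without outlinks are assumptions made for this illustration.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    graph maps each page to the list of pages it links to.
    """
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Assumption: pages with no outlinks spread their rank evenly over all pages.
        dangling = sum(rank[node] for node in nodes if not graph.get(node))
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        for source, targets in graph.items():
            if not targets:
                continue
            # Each outlink receives an equal share, regardless of topic.
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: three pages link to "c", and "c" links back to "a".
web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(web)
```

The point to notice is the `share` line: in this classic model every outlink passes the same fraction of the source page’s rank, with no regard for whether the two pages have anything to do with each other.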

Why not now?

What if Google could leverage their semantic understanding of the web to identify not only the relationships between keywords, topics and themes, but also the relationships between the websites that discuss them? Now take things a step further: is it possible that Google could identify whether a link should pass equity (link juice) to its target based on topic relevance and authority?
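As a thought experiment, relevance-gated link equity might look something like the sketch below. It is purely hypothetical: the topic weights, the cosine similarity measure, the `min_relevance` threshold and the `link_equity` function are all assumptions chosen for illustration, not anything Google has described.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse topic vectors (dicts of topic -> weight)."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def link_equity(source_topics, target_topics, source_authority, min_relevance=0.3):
    """Pass equity in proportion to topical relevance; off-topic links pass nothing."""
    relevance = cosine(source_topics, target_topics)
    if relevance < min_relevance:
        return 0.0  # ignored rather than penalized
    return source_authority * relevance


# Hypothetical topic profiles for three sites.
plumbing_blog = {"plumbing": 0.9, "home repair": 0.4}
plumber_site = {"plumbing": 0.8, "home repair": 0.5}
casino_forum = {"poker": 0.9, "slots": 0.6}

on_topic = link_equity(plumbing_blog, plumber_site, source_authority=10.0)
off_topic = link_equity(casino_forum, plumber_site, source_authority=10.0)
```

In this toy model a link from the plumbing blog passes most of its authority to the plumber’s site, while a link from the unrelated casino forum passes exactly zero: no reward and no penalty.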

Bill Slawski, the SEO industry’s foremost Google patent analyzer, has written countless articles about the semantic web, detailing Google’s process for extracting and associating facts and entities from web pages. It is fascinating (and complicated) analysis with major implications for SEO.

For our purposes, we will simplify things a bit. We know that Google has developed a method for understanding entities and the relationship that they have to specific web pages. An entity, in this case, is “a specifically named person, place, or thing (including ideas and objects) that could be connected to other entities based upon relationships between them.” This sounds an awful lot like the type of algorithmic heavy lifting that would need to be done if Google intended to leverage its knowledge of the authoritativeness of websites in analyzing the value of backlinks based on their relevance and authority to a subject.

Moving Beyond Links

SEOs are hyper-focused on backlinks, and with good reason; correlation studies that analyze ranking factors continue to score quality backlinks as one of Google’s major ranking influences. It was this correlation that started the influx of inorganic linking that landed us in our current state of affairs.

But what if Google could move beyond links to a model that also rewarded mentions from authoritative sites in any niche? De-emphasizing links while still rewarding references from pertinent sources would expand the signals that Google relies on to gauge relevance and authority, and would help move them away from their dependence on links as a ranking factor. It would also, presumably, be harder to “game,” as true authorities on any subject would be unlikely to reference brands or sites that weren’t worthy of the mention.

This is an important point. In the current environment, websites have very little motivation to link to outside sources. This has been a problem that Google has never been able to solve. Authorities have never been motivated to link out to potential competitors, and the lack of organic links in niches has led to a climate where the buying and selling of links can seem to be the only viable link acquisition option for some websites. Why limit the passage of link equity to a hyperlink? Isn’t a mention from a true authority just as strong a signal?

There is definitely precedent for this concept. “Co-occurrence” and “co-citation” are terms that have been used by SEOs for years now, but Google has never confirmed that they are ranking factors. Recently, however, Google began to list unlinked mentions in the “latest links” report in Search Console. John Mueller indicated in a series of tweets that Google does in fact pick up URL mentions from text, but that those mentions do not pass PageRank.

What’s notable here is not only that Google is monitoring text-only domain mentions, but also that they are associating those mentions with the domain that they reference. If Google can connect the dots in this fashion, can they expand beyond URLs that appear as text on a page to entity references, as well? The same references that trigger Google’s Knowledge Graph, perhaps?
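Picking up text-only domain mentions of the kind described above is not hard to imagine. The sketch below uses Python’s standard `html.parser` and a deliberately simple regular expression; the `MentionParser` class, the `unlinked_mentions` function and the domain pattern are assumptions for illustration and say nothing about how Google actually does it.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

# Deliberately simple pattern for things that look like domain names.
DOMAIN_RE = re.compile(
    r"\b(?:https?://)?(?:www\.)?((?:[a-z0-9-]+\.)+[a-z]{2,})\b", re.IGNORECASE
)


class MentionParser(HTMLParser):
    """Collects linked domains (from hrefs) and domains mentioned in plain text."""

    def __init__(self):
        super().__init__()
        self.linked = set()
        self.mentioned = set()
        self._anchor_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._anchor_depth += 1
            host = urlparse(dict(attrs).get("href") or "").netloc.lower()
            if host.startswith("www."):
                host = host[4:]
            if host:
                self.linked.add(host)

    def handle_endtag(self, tag):
        if tag == "a" and self._anchor_depth:
            self._anchor_depth -= 1

    def handle_data(self, data):
        if self._anchor_depth:
            return  # anchor text belongs to a real link, not a mention
        for match in DOMAIN_RE.finditer(data):
            self.mentioned.add(match.group(1).lower())


def unlinked_mentions(html):
    """Return domains that appear as text on the page but are never linked."""
    parser = MentionParser()
    parser.feed(html)
    parser.close()
    return parser.mentioned - parser.linked


page = (
    '<p>See <a href="https://www.linked.example/page">this guide</a> '
    "and also othersite.example for details.</p>"
)
mentions = unlinked_mentions(page)
```

Extending this from domain strings to named entities (a brand or a person rather than a URL) is the much harder step, and that is exactly where an entity database like the Knowledge Graph would come in.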

In Summary

We’ve built a case based on much supposition and conjecture, but we certainly hope that this is the direction in which Google is taking their algorithm. Whether Google acknowledges it or not, the link spam problem has not yet been resolved. Penguin penalties are punitive in nature and exceedingly difficult to escape from, and the fact of the matter is that penalizing wrongdoers doesn’t address the problem at its source. The motivation to build inorganic backlinks will exist as long as the tactic is perceived to work. Under the current algorithm, we can expect to continue seeing shady SEOs selling snake oil, and unsuspecting businesses finding themselves penalized.

Google’s best option is to remove the negative signals attached to inorganic links and only reward links that they identify as relevant. By doing so, they immediately eviscerate spam link builders, whose only quick, scalable option for building links is placing them on websites that have little to no real value.

By tweaking their algorithm to only reward links that have expertise, authority and trust in the relevant niche, Google can move closer than ever before to solving their link spam problem.

