
What Google's Helpful Content Classifier Actually Looks At

Plain English on the eight signals Google's helpful content system scores — search intent, facts, E-E-A-T, originality, and more — and what to do about each one.

By Brian Diamond

Plain English on the algorithm that's been quietly demoting content farms since 2022 — and what your articles are being graded against whether you know it or not.


In August 2022, Google rolled out the first version of what it called the Helpful Content Update. Three years later, it's the most consequential change to how content gets ranked in a decade, and most people who publish content for a living couldn't tell you what it actually measures.

The reason is partly Google's fault. Their public communication about it has been carefully vague — talk of "people-first content," "demonstrating expertise," "showing original analysis." Vague enough that it could mean almost anything, which is exactly when SEO advice fills the vacuum with confident speculation.

This post is the opposite of that. We're going to walk through the eight signals the Helpful Content classifier actually scores against, based on what Google has published in the Search Quality Rater Guidelines, the official Search Central documentation, and what the algorithm updates from 2022 through 2026 demonstrably targeted. Each signal gets a plain-English explanation, an example of what failing it looks like, and what you can do about it.

If you publish content of any kind — your own blog, your company's, AI-assisted articles, agency-written posts — your work is being scored against these whether you know it or not. The point of this post is making sure you know it.

What the Helpful Content classifier actually is

First, terminology. "Helpful Content" started life in 2022 as a discrete algorithm update. By March 2024, Google had integrated it into the core ranking system — meaning it's no longer something that runs periodically, it's something every page is scored against, all the time.

The classifier doesn't ask "is this article well-written?" It asks "would a person who clicked this from search results feel like they got what they came for?" Those are different questions. Well-written content can be unhelpful. Helpful content can be mediocre writing.

The eight signals below are the dimensions Google uses to grade that question: did the reader get what they came for?

Signal 1: Search intent match

When someone searches for "best espresso machines under $500," they're not looking for a history lesson on espresso. They're shopping. That's commercial intent. If your article on that query is a 2,000-word explainer about how espresso machines work, you've failed the intent match — regardless of how good the explainer is.

Google sorts queries into four broad intent categories:

  • Informational: "how does X work," "what is X," "why does X happen"
  • Commercial: "best X," "X vs Y," "top X for Y"
  • Transactional: "buy X," "X coupon," "X near me"
  • Navigational: looking for a specific brand or site

Each intent type wants a different format. Informational queries want explainers and how-to guides. Commercial queries want comparisons, reviews, or buying frameworks. Transactional queries want product pages with clear purchase paths. Navigational queries want the actual brand site.

What failing this looks like: an article titled "Best Beef Chili Recipe" that delivers a meditation on the philosophy of chili without a clear recipe. The title promises commercial-intent content; the article delivers informational content. Google's classifier picks up the gap, the reader bounces, the article doesn't rank.

The fix: before writing, identify the target query and its intent type. Match the article format to the intent. If you want to write the philosophy piece, title it differently — "What Makes Chili Award-Winning," informational, gets to deliver an explainer.
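To make the "identify the intent first" step concrete, here is a minimal sketch in Python. It assumes you can get a rough intent label from keyword patterns alone, using only the example phrasings listed above. The pattern table and the `guess_intent` name are hypothetical, and this is in no way how Google classifies queries.

```python
import re

# Hypothetical keyword patterns, taken from the example queries in this post.
# Checked in order: transactional first, then commercial, then informational.
INTENT_PATTERNS = [
    ("transactional", r"\b(buy|coupon|near me)\b"),
    ("commercial", r"\b(best|top|vs)\b"),
    ("informational", r"^(how|what|why)\b"),
]


def guess_intent(query: str) -> str:
    """Return a rough intent label for a search query, or 'unknown'."""
    q = query.lower().strip()
    for label, pattern in INTENT_PATTERNS:
        if re.search(pattern, q):
            return label
    # Navigational queries need a list of brand names, which this sketch lacks.
    return "unknown"


if __name__ == "__main__":
    for q in ["best espresso machines under $500", "how does espresso work"]:
        print(q, "->", guess_intent(q))
```

A heuristic like this is only a prompt to stop and check. Reading the actual results page for the query is still the more reliable way to see which format is expected.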

Signal 2: Factual accuracy

Google's quality systems are increasingly aggressive about factual accuracy, particularly in YMYL categories ("your money or your life" — health, finance, legal, anything affecting wellbeing). The classifier doesn't fact-check in real time, but the Quality Rater Guidelines explicitly instruct evaluators to mark down content with "inaccurate information" — and rater data trains the algorithm.

What this means in practice: specific factual claims — numbers, dates, standards, technical assertions — get evaluated for whether authoritative sources support them. Articles where claims contradict published expert sources score low on this dimension. So do articles that make specific assertions without citation, which the classifier interprets as low-confidence content.

What failing this looks like: an article confidently states a published standard ("the SCA Gold Cup ratio is 1:16") when the actual standard is different (the SCA's published Golden Cup spec is 55g/L, approximately 1:18). Both feel authoritative when read; only one is right.

The fix: every specific factual claim in your article should be one of three things — supported by an authoritative source you can cite, reframed as your own observation ("in our experience," "in our shop"), or removed. Confidently wrong is worse than vague.

Signal 3: E-E-A-T signals

E-E-A-T stands for Experience, Expertise, Authoritativeness, Trustworthiness. It was originally E-A-T; the extra E, for Experience, was added in December 2022, the same year as the Helpful Content Update, which is not a coincidence. It's the framework Google's quality raters use to score content, especially in YMYL categories.

Each letter measures something different:

  • Experience — first-hand markers. "I tested this for six months." "When we deployed this in production." "In our shop." Dated, specific, falsifiable.
  • Expertise — appropriate technical depth, accurate domain vocabulary, named credentials.
  • Authoritativeness — citations to authoritative external sources, clear positioning of who is writing and why they would know.
  • Trustworthiness — named author with bio, transparent methodology, balanced views, disclosure of stakes (affiliate links, sponsorships).

The pattern Google's quality raters have documented in their guidelines: pages that score well on E-E-A-T have all four dimensions. Pages that score poorly are usually missing the same things — no named author, no first-hand experience, no external citations, no methodology.

What failing this looks like: an article with no byline, no bio, no "about the methodology" notes, dozens of internal links and zero external authoritative ones. Looks fine on first read. Scores red on every E-E-A-T dimension because the structural signals Google needs to grade trustworthiness simply aren't present.

The fix: the single highest-leverage E-E-A-T fix is adding a named author template across your site. Byline, bio, schema.org Person markup. One change, applied at the template level, lifts every existing and future article. We'll come back to this in a later post.
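As a rough illustration of the template-level markup, the sketch below builds a schema.org Article with a nested Person author and wraps it in a JSON-LD script tag. The function names, the author details, and the URLs are placeholders invented for the example. Which properties your template should carry depends on your CMS and on what you can truthfully state about the author.

```python
import json


def article_jsonld(headline: str, author_name: str, author_url: str, bio: str) -> str:
    """Build schema.org Article markup with a named Person author, as JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {
            "@type": "Person",
            "name": author_name,
            "url": author_url,    # placeholder: link to the author's bio page
            "description": bio,   # short bio text shown on the byline
        },
    }
    return json.dumps(data, indent=2)


def script_tag(jsonld: str) -> str:
    """Wrap JSON-LD in the script tag that goes in the page head."""
    return '<script type="application/ld+json">\n' + jsonld + "\n</script>"


if __name__ == "__main__":
    markup = article_jsonld(
        "Example headline",
        "Jane Example",                      # hypothetical author
        "https://example.com/authors/jane",  # hypothetical URL
        "Jane has tested espresso machines since 2015.",
    )
    print(script_tag(markup))
```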

Signal 4: Originality vs the SERP

If your article is essentially the same as what's already ranking for the target query, Google has no reason to rank it higher than what's there. The classifier explicitly looks for "original analysis," "first-hand information," and "value beyond what's available in the existing search results."

This is the dimension that catches AI-generated content most reliably. When you ask an LLM to write an article on a popular topic, it pulls heavily from the patterns most represented in its training data — which are the same articles already ranking for that topic. The result reads fine but doesn't add anything the SERP doesn't already have.

What failing this looks like: semantic similarity scores above 0.85 (on a 0-to-1 scale, for example cosine similarity between text embeddings) between your article and the top three results for your target query. You've written a paraphrase of what already ranks. Google treats it as redundant.

The fix: before writing, read the top 10 results for your target query. Identify what they don't cover — the gap, the contrarian angle, the missing data, the alternative framing. Build your article around that gap. If you can't find a gap, either find a different topic or accept the article is a commodity and adjust expectations.
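If you want a quick first pass before reading the top results by hand, a similarity check can be sketched as below. This is a crude stand-in: it compares word counts (bag-of-words cosine similarity), not meaning, so its scores are not comparable to an embedding-based semantic similarity, and the 0.85 threshold would need recalibrating. The function names are made up for this example.

```python
import math
import re
from collections import Counter


def term_counts(text: str) -> Counter:
    """Lowercase the text and count each word."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the word-count vectors of two texts (0 to 1)."""
    va, vb = term_counts(a), term_counts(b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_to_competitors(article: str, competitors: list[str]) -> list[float]:
    """Score the article against each competing page's text."""
    return [cosine_similarity(article, c) for c in competitors]
```

To use it, paste in the body text of your draft and of the top-ranking pages. High scores across the board suggest the draft covers the same ground in the same words.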

Signal 5: Helpful content patterns (the rubric itself)

There's a specific set of stylistic patterns the classifier penalizes. Google has hinted at these in their guidance, but the actual list is observable in what got demoted in the 2022, 2023, 2024, and 2025 updates.

The patterns:

  • Throat-clearing intros — "In today's fast-paced digital world..." "More than ever before, businesses must..." These are filler that signals AI default style.
  • Transition stuffing — overuse of "moreover," "furthermore," "additionally," "in addition," especially as paragraph openers.
  • Conclusion-as-summary — final paragraphs that restate what was already said without adding value. The reader knows what they just read; you don't need to recap.
  • Formulaic intro-body-conclusion structure — every article hitting the same three-act pattern with identical pacing.
  • Vague generalities — "many studies show," "experts agree," "some people believe." Specifics signal expertise; vagueness signals padding.
  • Excessive hedging — "could potentially be," "may possibly," "it could be argued." Used too often, these undermine authority.
  • Defining the obvious — paragraphs that explain what a common term means before using it, padding word count without adding information.

What failing this looks like: an article that hits five or more of these patterns simultaneously. Reads competent on first scan, scores poorly on signal density and pattern-avoidance when measured carefully.

The fix: read your article aloud. The phrases that sound like throat-clearing on a second read are the ones to cut. Most articles tighten by 15-20% without losing any substance — and the tightened version scores materially better on this signal.
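Some of these patterns are plain phrase matches, so a first-pass counter is easy to sketch. The phrase lists below come straight from the examples in this section and are nowhere near complete. The structural patterns (conclusion-as-summary, formulaic structure, defining the obvious) need a human read and are not covered. `PATTERNS` and `count_patterns` are names invented for the example.

```python
import re

# Phrase lists drawn from the examples in this post; extend them for your own use.
PATTERNS = {
    "throat_clearing": [r"in today's fast-paced", r"more than ever before"],
    "transition_stuffing": [
        r"\bmoreover\b", r"\bfurthermore\b", r"\badditionally\b", r"\bin addition\b",
    ],
    "vague_generalities": [r"many studies show", r"experts agree", r"some people believe"],
    "excessive_hedging": [r"could potentially", r"may possibly", r"it could be argued"],
}


def count_patterns(text: str) -> dict[str, int]:
    """Count how often each pattern family appears in the text."""
    # Normalize curly apostrophes so "today's" matches either way.
    lowered = text.lower().replace("\u2019", "'")
    return {
        name: sum(len(re.findall(p, lowered)) for p in phrases)
        for name, phrases in PATTERNS.items()
    }
```

A count of zero does not mean the article is clean. It only means these specific phrases are absent.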

Signal 6: Internal linking quality

This one's straightforward but often neglected. Google's quality systems evaluate the internal link structure of an article as a signal of editorial quality and topical authority. The dimensions that matter:

  • Link count — too few internal links signals an "orphan" article. Three to five contextually relevant internal links is the rough target.
  • Anchor descriptiveness — "click here" and "read more" are wasted anchors. Descriptive anchors ("our guide to chili technique") tell Google what the linked page is about, strengthening both the source and target.
  • Contextual relevance — links that relate to the article's actual topic, not random cross-promotion.
  • Pillar page hits — articles that link to your authoritative pillar pages on related topics are passing authority signal correctly.

What failing this looks like: an article with zero internal links, or 50 internal links all to category pages with generic "explore more" anchors. Both are bad in different ways.

The fix: before publishing, identify three to five relevant pillar pages on your site. Add contextual links to each with anchor text that describes the target page's topic.
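The link count and anchor checks above can be roughed out with Python's standard-library HTML parser. This is a sketch under assumptions: the list of generic anchors is ours, "internal" means a relative URL or one on the host you pass in, and contextual relevance and pillar-page hits are left to human judgment.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumed list of low-value anchor texts; adjust to taste.
GENERIC_ANCHORS = {"click here", "read more", "explore more", "learn more", "here"}


class LinkCollector(HTMLParser):
    """Collect (href, anchor text) pairs for every <a href=...> in a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            anchor = " ".join("".join(self._text).split())
            self.links.append((self._href, anchor))
            self._href = None


def is_internal(href: str, site_host: str) -> bool:
    """True for relative URLs and URLs on site_host; skips #fragments and mailto:."""
    p = urlparse(href)
    return (p.scheme in ("", "http", "https")
            and p.netloc in ("", site_host)
            and bool(p.netloc or p.path))


def audit_internal_links(html: str, site_host: str) -> dict:
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    internal = [(h, t) for h, t in parser.links if is_internal(h, site_host)]
    return {
        "internal_count": len(internal),
        "generic_anchors": [t for _, t in internal if t.lower() in GENERIC_ANCHORS],
        "in_target_range": 3 <= len(internal) <= 5,  # the rough target from this post
    }
```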

Signal 7: Brand voice consistency

Less obvious than the others, but real. Google's quality systems include signals related to site-level coherence — whether all articles on a site appear to be from a unified editorial voice or whether they read as a grab-bag of disconnected pieces.

This catches one specific failure mode of AI-assisted publishing: articles that default to a generic explainer voice instead of matching the brand's established tone. A recipe site's articles should sound like a recipe site. A technical SaaS blog's articles should sound like a technical SaaS blog. When every article reads like "default helpful internet article voice," the site looks like a content farm.

What failing this looks like: comparing an article to ten others on the same site reveals stark differences in tone, sentence rhythm, vocabulary register, and stylistic signature. The article was likely AI-drafted without a brand voice prompt.

The fix: before drafting (or before approving drafted content), pull three of your strongest existing articles and explicitly instruct your writing process — human or AI — to match their voice. Sentence rhythm, vocabulary register, signature phrases.

Signal 8: Technical SEO fundamentals

The boring stuff. Title tags, meta descriptions, schema markup, heading hierarchy, image alt text, canonical URLs. These don't directly drive helpfulness, but the absence of them signals to Google that the page wasn't carefully published — which correlates strongly with low-quality content.

The specific checks:

  • Title tag present and in the 30-60 character range
  • Meta description present and in the 120-160 character range
  • Exactly one H1, semantically matching the title intent
  • Heading hierarchy clean (no skipped levels, such as an H2 followed by an H4 with no H3)
  • All images have descriptive alt text
  • Schema.org Article markup present and valid
  • Canonical URL correctly set
  • OpenGraph tags for social sharing
  • Word count above 600 (penalty for thin content)

What failing this looks like: any of the above missing. They're small individually; cumulatively they signal "this page wasn't taken seriously."

The fix: these are the easiest wins on the entire list. Most are template-level fixes — apply once, lift every article on the site.
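Several of the checks above are mechanical enough to script. The sketch below covers a subset (title length, meta description length, single H1, skipped heading levels, image alt text, canonical link) using only Python's standard library. Schema validation, OpenGraph tags, and word count are left out to keep it short. The class and function names are invented for the example, and the ranges are the ones stated in this post.

```python
from html.parser import HTMLParser


class PageFacts(HTMLParser):
    """Collect the handful of page facts the checks below need."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta_description = None
        self.headings = []            # heading levels in document order
        self.images_missing_alt = 0
        self.has_canonical = False
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (a.get("name") or "").lower() == "description":
            self.meta_description = a.get("content") or ""
        elif tag == "link" and (a.get("rel") or "").lower() == "canonical":
            self.has_canonical = bool(a.get("href"))
        elif tag in ("h1", "h2", "h3", "h4", "h5", "h6"):
            self.headings.append(int(tag[1]))
        elif tag == "img" and not (a.get("alt") or "").strip():
            self.images_missing_alt += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def run_checks(html: str) -> dict[str, bool]:
    facts = PageFacts()
    facts.feed(html)
    facts.close()
    levels = facts.headings
    # A skipped level is any step down of more than one, e.g. H2 straight to H4.
    skipped = any(b - a > 1 for a, b in zip(levels, levels[1:]))
    desc = facts.meta_description
    return {
        "title_length_ok": 30 <= len(facts.title.strip()) <= 60,
        "meta_description_ok": desc is not None and 120 <= len(desc) <= 160,
        "single_h1": levels.count(1) == 1,
        "no_skipped_headings": not skipped,
        "all_images_have_alt": facts.images_missing_alt == 0,
        "canonical_set": facts.has_canonical,
    }
```

Run it against the rendered HTML of a published page, not the CMS draft, since templates often add or drop these tags.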

Why these eight, why now

Three years into the Helpful Content era, the data on what gets demoted is consistent. The sites that lost the most traffic in the 2023-2025 updates shared specific patterns:

  • AI content farms scaling articles without any of signals 2, 3, 4, or 5 — confidently wrong claims, no E-E-A-T, derivative content, AI-tell patterns
  • Forbes Advisor and similar publisher-affiliate hybrids that produced volume content with thin authorship despite owned-brand authority
  • Recipe content aggregators that scaled by paraphrasing existing recipes without original testing or verifiable methodology
  • Health and finance content sites without genuine medical or financial expertise in the byline

The pattern across all of them: they optimized for signals search engines used to care about — keywords, backlinks, basic on-page SEO — while ignoring the signals search engines now care about. The Helpful Content classifier is what caught them.

The sites that gained traffic during the same period were the opposite: long-running publications with named author bylines, transparent methodology, original analysis or testing, modest volume but high signal density per article.

The lesson isn't subtle. The Helpful Content classifier rewards what looks like genuine human publishing. It demotes what looks like content production at scale without the human signals.

What to do with this

Three concrete actions, ordered by leverage:

1. Audit your highest-traffic existing articles against these eight signals. Pick the five articles that drive the most organic traffic. Score each against the eight signals above. The patterns that emerge — the same signal failing across multiple articles — are the systemic problems worth fixing first.

2. Fix at the template level where possible. Signals 3 (E-E-A-T), 6 (internal linking), and 8 (technical SEO) all have template-level fixes that lift every article on your site at once. Named author template, internal linking checklist, schema markup. One change, lasting impact.

3. Build the eight signals into your publishing process. Whether you write yourself, use AI, or hire agencies, the publishing checklist should include all eight. The article doesn't ship until each signal is addressed. This is the prevention move that compounds over time.
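The audit in step 1 comes down to finding signals that fail across several articles. A minimal sketch, assuming you have already scored each article from 0 to 100 per signal by whatever method you use. The signal keys, the failure threshold of 60, and the function name are all hypothetical choices for the example.

```python
from collections import Counter

# Hypothetical keys for the eight signals in this post.
SIGNALS = [
    "search_intent", "factual_accuracy", "eeat", "originality",
    "helpful_patterns", "internal_linking", "brand_voice", "technical_seo",
]


def systemic_failures(
    scores: dict[str, dict[str, int]],
    fail_below: int = 60,    # assumed cut-off; pick your own
    min_articles: int = 2,
) -> list[str]:
    """Return signals failing on at least `min_articles` articles, worst first."""
    failures = Counter()
    for article_scores in scores.values():
        for signal in SIGNALS:
            # Signals with no score for this article are skipped, not failed.
            if signal in article_scores and article_scores[signal] < fail_below:
                failures[signal] += 1
    return [s for s, n in failures.most_common() if n >= min_articles]


if __name__ == "__main__":
    audit = {
        "/post-a": {"eeat": 40, "originality": 50, "technical_seo": 90},
        "/post-b": {"eeat": 30, "originality": 80, "technical_seo": 85},
        "/post-c": {"eeat": 55, "originality": 45, "technical_seo": 50},
    }
    print(systemic_failures(audit))
```

Signals that show up in this list are the ones worth a template-level fix. One-off failures can wait.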

For a worked example of what these eight checks look like on real sites — including the uncomfortable findings — see our case study auditing ChiliStation, Onaro, and day9.coffee.

A small disclosure

I built a tool called Revylo that audits content against exactly these eight signals. It's at revylo.app — you can run one URL through it free, no signup. The tool is the operationalization of this article: turn each signal into a check, run them against any URL, produce a scorecard with specific findings and fixes.

I'm mentioning it here as a disclosure, not a pitch. If you read this article and want to act on it, you'll need a way to score articles against the eight signals. There are three options: do it manually (~30 minutes per article, requires deep SEO fluency), pay an expert consultant ($150-500 per audit), or use a tool. I built the tool because the first two options didn't scale for me running six properties.

But the eight signals are public. The Google guidelines are public. The Helpful Content Update history is documented. You can apply this framework without my tool. The point of this article is making sure you can — whether or not Revylo is what you end up using.

What matters is that you stop optimizing for an SEO model from 2019. The system you're being graded by now is the one above.


This article was audited by Revylo.

Check | Score | Status
Search Intent | 100 | green
Fact Grounding | 73 | yellow
Helpful Content | 45 | red
E-E-A-T | 72 | yellow
Originality | 50 | yellow
Internal Linking | 86 | green
Brand Voice | |
Technical SEO | 80 | green
