Document Freshness As A Ranking Factor

Search engines closely monitor how “fresh” each document on a website is. When a document is no longer considered fresh, it is considered “stale”. To a search engine, staleness signals that a document is low quality and should be recommended less.

There are metrics in place to measure a document’s staleness. Put simply, a document’s freshness can be measured in two ways:

  • When the document was published
  • When the document was updated

In this article, we will take a deeper look into what it means to “update” a document.

When the document was published (Inception date)

Inception date refers to the date when the document is first recognized by the search engine. It doesn’t necessarily mean the date when the document was actually published, because the publish date a document reports about itself can be falsified.

There are multiple ways a search engine can triangulate a document’s inception date:

  • The document’s self-described publish date
  • The inception date of a link to another document
  • The inception date of a link pointing to the document
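As a rough illustration, the three sources above can be combined into a simple consistency check. This is only a sketch under stated assumptions, not how any search engine is known to work: the function name `estimate_inception_date` and its parameters are hypothetical, and the two bounding rules (a document is no older than the newest page it links to, and no newer than the first link pointing to it) are simplifications.

```python
from datetime import date
from typing import Optional


def estimate_inception_date(
    self_described: Optional[date],
    outbound_target_dates: list[date],
    inbound_link_dates: list[date],
) -> Optional[date]:
    """Pick an inception date that is consistent with the link evidence."""
    # Assumption: a document cannot be older than the newest page it links to.
    lower = max(outbound_target_dates) if outbound_target_dates else None
    # Assumption: a document must already exist when the first link to it appears.
    upper = min(inbound_link_dates) if inbound_link_dates else None

    if self_described is not None:
        too_early = lower is not None and self_described < lower
        too_late = upper is not None and self_described > upper
        if not too_early and not too_late:
            # The document's own claim agrees with the link evidence.
            return self_described

    # The claim is missing or implausible: fall back to the link evidence,
    # preferring the date of the first link pointing to the document.
    return upper if upper is not None else lower
```

With this sketch, a document claiming a 2015 publish date while linking to a page first seen in 2018 would have its claim rejected in favour of the link-based date.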

Content updates and changes to the document

Content that hasn’t been updated in some time will be considered stale and will be scored lower in search results. This score can be compared against the other documents shown in the same search results, based on how often those documents are updated. Documents that have been updated more recently are considered fresher.
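To illustrate the idea of comparing a document against the other results for the same query, here is a minimal sketch. The function name `relative_staleness` and the choice of the median as the point of comparison are assumptions made for illustration; the actual scoring used by a search engine is not known.

```python
from datetime import date
from statistics import median


def relative_staleness(
    doc_updated: date, competitor_updates: list[date], today: date
) -> float:
    """Ratio of this document's age since update to the typical competitor's."""
    if not competitor_updates:
        return 1.0  # nothing to compare against, so treat as neutral

    doc_age = (today - doc_updated).days
    competitor_ages = [(today - d).days for d in competitor_updates]
    typical_age = median(competitor_ages)

    # Guard against division by zero when competitors were updated today.
    return doc_age / max(typical_age, 1)
```

A value above 1 means the document was last updated longer ago than the typical competing result, and a value below 1 means it is fresher than its competitors.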

Content that doesn’t require much updating is considered static content, but it can still be considered stale and scored lower regardless. To counteract this negative status, the document’s performance is taken into account.

This leads us to look at the deeper metrics that are considered when determining if a document is stale. For every metric being measured, there is also the RATE at which that metric changes. This tells us that search engines are measuring not only the value of each metric, but also the TREND of those values over time.

  • Click-through rate – When the document is shown in a result, does it still appear to be the most relevant result?
  • Bounce rate – When a user clicks through, do they stay on the page or immediately leave, signalling the document is out of date?
  • Time on page – Does the user stay on the page, signalling they found what they are looking for?
  • New links to the document – Is the document still receiving credit for content within? Is the linking document from a real website or was it manually placed in an attempt to spam the search engine?
  • Lost links to the document – Has the document lost links, signalling that it is no longer relevant, or a better document took its place?
  • New links from the document – Has the document added or removed links within, signalling an update to its cited sources?
  • Change in anchor text – This can apply to links featured within the document as well as links pointing to the document. An update in anchor text can signal the document’s meaning has changed.
  • Changing the content – Content on a page is given a “content signature”, where a search engine can create an abstract view of a page, not necessarily parsing the content, but understanding what the content “looks like”, much like a screenshot. The signature is used to compare subsequent crawls to see if any changes took place.
  • Changes in the title – A change in the document’s title signals a large change as the meaning of the page may have changed significantly.
  • All metrics and trends for linking documents – Every document that links to yours has its own set of metrics being measured, along with the trends for those metrics. Linking documents can themselves become stale, at which point they are no longer considered valid.
  • Social signals – Does the document get shared and produce engagement on social networks? How often are these signals produced? Do they maintain a trend?
  • Traffic (All sources) – Does the document still receive a healthy amount of traffic, signalling the document is still useful? Traffic from advertising is considered, as this contributes towards collecting data on all other metrics.
  • Trusted domain – Search engines can score the legitimacy of a domain based on domain registration records, such as WHOIS information compared to information presented on the website. Domain renewal periods are also taken into consideration. Domains renewed on a yearly basis are trusted less than domains that are registered for 5-10+ years.
  • Keywords ranking for the document – How many keywords does the document rank for? If it gained too many keywords at once, is it spam? If it is losing keywords, has the meaning of the keywords changed?
  • Links to and from irrelevant documents – Does the document link to completely unrelated content? Does the document receive links from unrelated content? A document that is associated with unrelated content signals spam or staleness.
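The distinction between a metric’s value and its TREND can be sketched with a simple least-squares slope. This is an illustration only: the function name `trend_slope` is hypothetical, the samples are assumed to be taken at equal intervals (for example, once per week), and a straight-line fit is just one of many ways to describe a trend.

```python
def trend_slope(values: list[float]) -> float:
    """Least-squares slope of a metric sampled at equal intervals."""
    n = len(values)
    if n < 2:
        return 0.0  # a single sample has no trend

    mean_x = (n - 1) / 2  # mean of the sample indices 0..n-1
    mean_y = sum(values) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    variance = sum((x - mean_x) ** 2 for x in range(n))
    return covariance / variance
```

A negative slope for a metric such as click-through rate or new links would indicate a downward trend, even if the most recent value still looks healthy on its own. The same function could be applied to each metric in the list above.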

Visualizing the metrics

Below is a flowchart of sorts, visualizing how the metrics are measured. There are many moving parts to be mindful of, but ultimately, for an individual, this can be simplified.

[Figure: historical data used to determine document freshness and staleness (Peter Krysik)]

Given enough time, if a web page hasn’t seen any changes or new data, then the document is considered stale.
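Detecting whether a page has changed between crawls relates to the “content signature” described earlier. How search engines actually build such a signature is not public, so the sketch below substitutes a well-known technique: word shingling with Jaccard similarity. The function names `content_signature` and `similarity`, and the shingle size of 3, are assumptions for illustration.

```python
import hashlib


def content_signature(text: str, shingle_size: int = 3) -> set[str]:
    """Hash every run of `shingle_size` consecutive words into a set."""
    words = text.lower().split()
    if len(words) < shingle_size:
        shingles = [" ".join(words)] if words else []
    else:
        shingles = [
            " ".join(words[i:i + shingle_size])
            for i in range(len(words) - shingle_size + 1)
        ]
    # Short hashes keep the signature compact without storing the text itself.
    return {hashlib.sha1(s.encode("utf-8")).hexdigest()[:12] for s in shingles}


def similarity(a: set[str], b: set[str]) -> float:
    """Jaccard similarity: 1.0 means identical signatures, 0.0 means no overlap."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)
```

Comparing the signature from one crawl with the signature from the next gives a rough measure of how much the page changed. A similarity that stays at 1.0 crawl after crawl would match the “no changes” condition described above.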

Testing the findings

A website containing 130 articles published over two years ago has lost its position for nearly all keywords. Many articles have not been updated or promoted in any way since their inception date.

Hypothesis: If a few simple changes are made to a document, they can produce a signal that the document is no longer stale.

Testing (Phase One): Change the page’s title and add 2-3 new sections of content. Publish the update and do nothing else. Monitor the changes.

Testing (Phase Two): Add internal and external links to documents. Change anchor text for existing links.

In the example below, a document that had been ranking in the top 5 positions for a given keyword was completely removed from search results. Upon updating the content, the document immediately regained its previous position.

[Figure: keyword positions return after content updating]

When applied to several pages, the rankings return overnight. The documents in question have strong historical data. Not only do the pages have historical data, but they also carry with them the trend of historical data over many years. When the pages are considered “stale”, they are simply scored very low.

[Figure: keyword positions return for several pages after content updating]

It is not surprising that content updating works, but it helps to understand why it works. By knowing what the search engine is looking for, we can take the guesswork out of our content updating processes.

This is not about tricking the search engine or spamming it. It is about knowing what the machine actually cares about. A webmaster may produce a fantastic piece of content, but if the search engine deems it to be stale, then that content becomes invalid. For a business that may publish static content, it is valuable to understand how static content can be made “evergreen” with a few simple updating processes.
