With offline interactions still severely disrupted, it is critical that your content is indexed on Google to ensure that your audience can find you online.
While optimizing websites for search engines consists of myriad factors, many take place behind the scenes of what users actually see on the page.
Among the moving parts, deindexation can easily slip under your radar and devalue your site.
Deindexation is when content that previously ranked on Google stops ranking because Google no longer includes it in its pool of eligible answers for a searcher’s query.
You may not realize a page has been deindexed, because your site is running as if everything is normal: You can still access the domain directly and by following links from third-party sites, and you can still see traffic figures because most pages remain searchable on Google.
What you don’t see is that some pages have been deindexed and no longer appear in the search results.
Even a giant social media platform is not immune to this SEO mishap. On May 6, 2020, LinkedIn’s entire site got deindexed from Google search results for around 10 hours before someone noticed and the issue was rectified.
Here, we share how to check whether any of your content has been deindexed and what commonly causes deindexation, so that you can avoid the issue, detect and fix it if it does happen, and get back to business online as quickly as possible.
How can I check if my content has been deindexed?
Do a site: operator search
This is by no means a comprehensive check, but it does give you a quick answer as to whether your content is on Google.
Simply search “site:yourdomain.com” on Google for a site-wide check, or search “site:yourdomain.com/your/page/path” for a particular page. Do not put a space after the colon, or Google will treat the query as an ordinary keyword search.
If your content is deindexed, it won’t show in the search results.
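If you check many pages this way, it can help to generate the search URLs rather than type each query. The following is a minimal sketch, assuming the standard Google search URL format; the function name `site_query_url` is hypothetical, and the result is meant to be opened in a browser, not scraped.

```python
from urllib.parse import quote_plus


def site_query_url(target: str) -> str:
    """Return a Google search URL for a site: query on a domain or page path."""
    target = target.strip()
    # The site: operator takes a bare domain or path, so drop any scheme.
    for scheme in ("https://", "http://"):
        if target.startswith(scheme):
            target = target[len(scheme):]
    # Note: no space between "site:" and the target.
    return "https://www.google.com/search?q=" + quote_plus(f"site:{target}")


if __name__ == "__main__":
    print(site_query_url("yourdomain.com"))
    print(site_query_url("yourdomain.com/your/page/path"))
```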
Access the Google Search Console (GSC) Coverage Report
This is the most comprehensive method to detect deindexation for each URL of your site.
The GSC Coverage Report classifies your pages into 4 statuses:
- Error
- Valid with warnings
- Valid
- Excluded
Clicking on Error and Excluded gives you a breakdown of status type and the reason your pages are not crawled or indexed. You can click on each status type to get a list of URLs that the issue applies to.
GSC Coverage Report: Error and Excluded Status
Inspect URL on Google Search Console (GSC)
Using the URL inspection tool, simply plug one URL at a time into the inspection bar to check indexation status and any reasons for deindexation.
If a URL is not indexed, the report will say “URL is not on Google.”
GSC Inspect URL Example: Page is Not Indexed
What causes Google to deindex my content?
Deindexed content typically results from 3 main issues:
- Your site code or status code is giving signals to Google to deindex your page.
- You misused the Google removal tool.
- You received an algorithm penalty/manual action.
Your site code or status code signals Google to deindex your page
Your site code and status code determine how Google treats your pages.
Below are some situations where changes to your site code or status code can cause a page to be deindexed. Each cause corresponds to a status type in the GSC Coverage Report:
Duplicate Content
The deindexed page is a duplicate of another page on your site. If Google does not accept the deindexed page as the canonical version, and/or you didn’t select the page as the canonical version, it gets excluded from indexation.
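To see which canonical version a page declares, you can read the `rel="canonical"` link from its HTML. The sketch below uses only the Python standard library and assumes you already have the page source as a string; `find_canonical` is a hypothetical helper name. Keep in mind that this shows only the canonical you declared, while Google may still choose a different one.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel can hold several space-separated values.
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr.get("href")


def find_canonical(html: str):
    """Return the declared canonical URL, or None if the page declares none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


if __name__ == "__main__":
    page = '<head><link rel="canonical" href="https://yourdomain.com/page"></head>'
    print(find_canonical(page))
```

If the declared canonical points at a different URL than the page you expected to rank, that page is a likely candidate for exclusion as a duplicate.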
Redirect Error
You recently implemented a new set of redirects on your site, which may have created confusion for Google. This can include a redirect chain that goes on for 5 pages or more or a redirect loop where page A redirects to page B and vice versa.
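Before deploying a new set of redirects, you can look for loops and long chains in the redirect map itself. This is a minimal sketch, assuming the redirects are available as a plain source-to-target dictionary; `trace_redirects` and the `max_hops` limit of 5 are illustrative assumptions that mirror the figure above, not a limit taken from Google's documentation.

```python
def trace_redirects(redirects: dict, start: str, max_hops: int = 5):
    """Follow redirects from `start` and return (path, verdict).

    verdict is "loop", "chain too long", or "ok".
    """
    path = [start]
    seen = {start}
    current = start
    while current in redirects:
        current = redirects[current]
        path.append(current)
        if current in seen:
            # We came back to a URL already visited: A -> B -> A.
            return path, "loop"
        seen.add(current)
        if len(path) - 1 >= max_hops:
            return path, "chain too long"
    return path, "ok"


if __name__ == "__main__":
    rules = {"/a": "/b", "/b": "/a", "/old": "/new"}
    print(trace_redirects(rules, "/a"))
    print(trace_redirects(rules, "/old"))
```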
URL Not Found
Google cannot index your content because your page is giving a 4xx status code (e.g., status 404 page is not found).
Server Error
Google cannot index your content because your server is giving a 5xx status code (e.g., status 500 internal server error).
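If you export status codes from a crawl or from server logs, a small helper can sort them into the groups described above. This is a sketch only; the function name and the returned labels are hypothetical and do not match the exact wording used in the GSC report.

```python
def classify_status(code: int) -> str:
    """Group an HTTP status code by how it is likely to affect indexing."""
    if 200 <= code <= 299:
        return "ok"
    if 300 <= code <= 399:
        return "redirect"
    if 400 <= code <= 499:
        # Includes 404 (page not found).
        return "client error (4xx)"
    if 500 <= code <= 599:
        # Includes 500 (internal server error).
        return "server error (5xx)"
    return "unknown"


if __name__ == "__main__":
    for status in (200, 301, 404, 500):
        print(status, classify_status(status))
```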
Crawl Anomaly
Google found an unknown issue that prevents it from crawling your site for indexing. This could be a status 4xx, 5xx, or something that needs further troubleshooting by your developers.
Blocked by robots.txt
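You accidentally told Google to stay away from some pages. A Disallow rule in robots.txt blocks Google from crawling them, while a noindex directive in a meta robots tag or an X-Robots-Tag HTTP header tells Google to drop them from its index. Google does not support noindex rules placed inside robots.txt itself.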
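Both kinds of signal can be checked offline once you have the robots.txt text, the page HTML, and the response headers. The sketch below uses Python's standard `urllib.robotparser` and `html.parser`; the helper names are hypothetical, and the standard-library parser only approximates Google's own robots.txt matching (for example, it does not handle wildcards the way Google does), so treat the result as a first pass.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


def blocked_by_robots(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if the robots.txt text disallows `agent` from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


class _MetaRobots(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        if tag == "meta" and name in ("robots", "googlebot"):
            content = attr.get("content") or ""
            self.directives += [d.strip().lower() for d in content.split(",")]


def has_noindex(html: str, headers=None) -> bool:
    """Return True if the HTML or an X-Robots-Tag header carries noindex."""
    finder = _MetaRobots()
    finder.feed(html)
    if "noindex" in finder.directives:
        return True
    for key, value in (headers or {}).items():
        if key.lower() == "x-robots-tag":
            # Values may look like "noindex, nofollow" or "googlebot: noindex".
            for piece in value.lower().split(","):
                if piece.split(":")[-1].strip() == "noindex":
                    return True
    return False


if __name__ == "__main__":
    rules = "User-agent: *\nDisallow: /private/"
    print(blocked_by_robots(rules, "https://yourdomain.com/private/page"))
    print(has_noindex('<meta name="robots" content="noindex, follow">'))
```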
Search Engine Journal gives a good guide on how to fix deindexation issues depending on status type.
You misused the Google removal tool
Google Search Console has a removal tool to temporarily block pages from Google search results; for example, if you want to quickly take down sensitive information.
According to Google representative John Mueller on Twitter, if you misuse this tool to remove a pre-existing HTTP version of your site, Google will also deindex the other variations of your site (http/https/www/non-www).
GSC Removal Tool
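To see which addresses a single removal request could affect, you can list the four protocol and hostname variations of a URL. This is an illustrative sketch using the standard library; `site_variants` is a hypothetical helper and does not call any Google API.

```python
from urllib.parse import urlsplit, urlunsplit


def site_variants(url: str) -> list:
    """Return the http/https and www/non-www variations of `url`, sorted."""
    parts = urlsplit(url)
    host = parts.netloc
    # Strip a leading "www." to get the bare hostname.
    bare = host[4:] if host.startswith("www.") else host
    variants = {
        urlunsplit((scheme, hostname, parts.path, parts.query, ""))
        for scheme in ("http", "https")
        for hostname in (bare, "www." + bare)
    }
    return sorted(variants)


if __name__ == "__main__":
    for variant in site_variants("https://www.yourdomain.com/"):
        print(variant)
```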
You received an algorithm penalty/manual action
Your content may get deindexed if it doesn’t comply with Google’s Webmaster Guidelines.
Google continuously tweaks its search algorithm to keep up with how we search and consume information. Several times a year, Google releases a core algorithm update, which covers more significant, broader changes to the search algorithm.
You can always check what the new update addresses to see if your content could have been penalized due to noncompliance with recent changes.
***
As the world reopens and recovers, it’s crucial to routinely check whether you have any deindexed pages and take action to get them indexed and ranking again. This will ensure that your audience can find your content and safely, easily, and quickly interact with your brand online.