What is the Difference Between Crawling and Indexing?

Read Time: 3 minutes

There is one strange thing about SEO you should know:

Google can index without crawling.

Weird, we know.

Let’s take a closer look at the difference between the two. Special thanks to Mark Brown for this explanation:

Crawling and indexing are two distinct things, and this is commonly misunderstood in the SEO industry. Crawling means that Googlebot fetches a page and analyzes the content and code on it. Indexing means that the page is eligible to show up in Google’s search results. One does not require the other.

We look at it as if Googlebot were a tour guide walking down a hallway that has many closed doors. If he is allowed to crawl a page (a room), he can open the door and actually look at what’s inside (crawling). Once inside the room, there might be a sign that says he’s allowed to show people the room (able to index; the page shows up in SERPs), or the sign might say that he’s not allowed to show people the room (“noindex” meta robots tag; the page was crawled since he was able to look inside, but will NOT show up in SERPs since he’s instructed not to show people the room).

If he’s blocked from crawling a page (let’s say there’s a sign on the outside of the door that says “Google, don’t come in here”), then he won’t go inside and look around. Because of that, he doesn’t know whether or not he’s supposed to show people the room, since those instructions are inside the room. So he won’t look inside the room, but he’ll still point out the room (index) to people and tell them they can go inside if they want. Even if there’s an instruction inside the room telling him not to show it to people (“noindex” meta robots tag), he’ll never see it, since he was instructed not to go into the room in the first place.
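The tour guide’s decision can be written down as a tiny function. This is only a sketch of the analogy, not how Google actually works; the function name and both parameters are made up for illustration.

```python
def can_appear_in_results(crawl_allowed: bool, has_noindex_tag: bool) -> bool:
    """Return True if the page (room) is eligible to be shown in search results.

    Hypothetical model of the tour-guide analogy only.
    """
    if not crawl_allowed:
        # Sign on the outside of the door: he never goes in, so any
        # "noindex" sign inside the room is never read. He can still
        # point the room out to people.
        return True
    # He went inside, so he follows whatever the sign in the room says.
    return not has_noindex_tag


# The three situations described above:
print(can_appear_in_results(crawl_allowed=True, has_noindex_tag=False))   # crawled, indexable
print(can_appear_in_results(crawl_allowed=True, has_noindex_tag=True))    # crawled, noindex obeyed
print(can_appear_in_results(crawl_allowed=False, has_noindex_tag=True))   # blocked, noindex never seen
```

The third case is the surprising one: blocking the crawl hides the “noindex” instruction, so the page stays eligible.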

So blocking a page via robots.txt means it IS eligible to be indexed, regardless of whether you have an “index” or “noindex” meta robots tag within the page itself. Google can’t see that tag because it’s blocked from crawling the page, so by default it treats the page as indexable. Of course, this means that the page’s ranking potential is lessened: Google can’t actually analyze the content on the page, so the only ranking signals it has are off-page signals and domain authority. If you’ve ever seen a search result where the description says something like “This page’s description is not available because of robots.txt”, that’s why.
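To see why a robots.txt block hides a “noindex” tag, here is a minimal sketch using only Python’s standard library (`urllib.robotparser` and `html.parser`). The robots.txt rules, the example.com URLs, and the helper names are assumptions made up for this illustration, and the logic is a simplified model of a well-behaved crawler, not Google’s actual implementation.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks everything under /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


class MetaRobotsParser(HTMLParser):
    """Looks for <meta name="robots" content="...noindex...">."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            if "noindex" in (attributes.get("content") or "").lower():
                self.noindex = True


def has_noindex(html: str) -> bool:
    parser = MetaRobotsParser()
    parser.feed(html)
    return parser.noindex


def noindex_is_visible(robots_txt: str, url: str, html: str,
                       user_agent: str = "Googlebot") -> bool:
    """True only if the crawler may fetch the page AND the page says noindex."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    if not rules.can_fetch(user_agent, url):
        # Blocked from crawling: the HTML is never fetched,
        # so the meta robots tag is never read.
        return False
    return has_noindex(html)


PAGE = '<html><head><meta name="robots" content="noindex"></head></html>'

print(noindex_is_visible(ROBOTS_TXT, "https://example.com/public/page.html", PAGE))
print(noindex_is_visible(ROBOTS_TXT, "https://example.com/private/page.html", PAGE))
```

Both pages carry the same “noindex” tag, but it only takes effect on the page the crawler is allowed to fetch. In practice, that means a page you want kept out of search results generally needs to stay crawlable so the tag can be read.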

See how Google crawls your website using Sure Oak’s SEO Site Crawler free tool. Then, if you’d like to take more control over your website’s indexing, take advantage of our free robots.txt generator tool. And, for all the rest of your meta tags, we have a free meta tag generator tool, too!

