In one of my previous posts, I discussed Custom Robots Header Tags for Blogger. If you have read that post, then you are already aware of their importance in search rankings.
Today, we will cover another very useful, must-know blogging term: Robots.txt.
In Blogger, it is known as Custom Robots.txt, which means you can customize this file according to your needs.
In today’s tutorial, we will discuss this term in depth and learn about its uses and benefits. I will also show you how to add a custom robots.txt file in Blogger.
So let’s start the guide.
What is Robots.txt?
Robots.txt is a text file that contains a few lines of simple code.
It is saved on your website or blog’s server and instructs web crawlers which parts of your blog to crawl and index in search results.
That means you can restrict any web page on your blog from web crawlers so that it doesn’t get indexed by search engines, such as your blog’s label pages, your demo page, or any other pages that are not important enough to be indexed.
Always remember that search crawlers scan the robots.txt file before crawling any web page.
Each blog hosted on Blogger has its own default robots.txt file, which looks something like this:
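For reference, a typical default Blogger robots.txt looks like the following, with example.blogspot.com standing in for your blog’s actual address (the exact sitemap line may vary depending on when your blog was created):

```txt
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
Allow: /

Sitemap: https://example.blogspot.com/feeds/posts/default?orderby=UPDATED
```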
This code is divided into three sections. Let’s study each of them first; after that, we will learn how to add a custom robots.txt file to Blogspot blogs.
User-agent: Mediapartners-Google
This line is for the Google AdSense robot, which helps it serve better ads on your blog. Whether or not you use Google AdSense on your blog, simply leave it as it is.
This section applies to all robots, marked with an asterisk (*). In the default settings, our blog’s label links are restricted from indexing, which means web crawlers will not index our label page links because of the Disallow: /search rule.
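Assuming the default file, the section in question is the one that begins with the wildcard user-agent:

```txt
User-agent: *
Disallow: /search
Allow: /
```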
That means any link with the keyword search immediately after the domain name will be ignored, such as the link of a label page named SEO.
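As a hypothetical illustration, on a blog hosted at example.blogspot.com, the SEO label page would have the following URL, which crawlers would skip because of the /search rule:

```txt
https://example.blogspot.com/search/label/SEO
```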
If we remove Disallow: /search from the above code, then crawlers will access our entire blog and index and crawl all of its content and web pages.
Here, Allow: / refers to the homepage, meaning web crawlers can crawl and index our blog’s homepage.
Disallow Particular Post
Now suppose we want to exclude a particular post from indexing. We can do this by adding a Disallow line with the post’s URL path to the code.
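Assuming Blogger’s standard post URL structure, the line takes this form, with post-url.html as a placeholder for the post’s actual file name:

```txt
Disallow: /yyyy/mm/post-url.html
```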
Here, yyyy and mm refer to the publishing year and month of the post respectively. For example, if we published a post in March 2020, then yyyy is 2020 and mm is 03.
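So for a post published in March 2020, the line would look like this (post-url.html is again a placeholder for the post’s actual file name):

```txt
Disallow: /2020/03/post-url.html
```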
To make this task easier, you can simply copy the post URL and remove the blog address from the beginning.
Disallow Particular Page
If we need to disallow a particular page, we can use the same method as above: simply copy the page URL and remove the blog address from it.
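Since Blogger serves static pages under the /p/ path, the resulting line would look like this, with page-url.html as a placeholder for the page’s actual file name:

```txt
Disallow: /p/page-url.html
```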
This line refers to the sitemap of our blog. By adding a sitemap link here, we simply optimize our blog’s crawling rate: whenever the web crawlers scan our robots.txt file, they will find a path to our sitemap, where all the links of our published posts are present.
This makes it easy for web crawlers to find all of our posts, so there is a better chance that they crawl every blog post without ignoring a single one.
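The sitemap directive being described is the last line of the default file; assuming a blog at example.blogspot.com, it looks like this:

```txt
Sitemap: https://example.blogspot.com/feeds/posts/default?orderby=UPDATED
```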
Note: This sitemap only tells the web crawlers about the 25 most recent posts. If you want to increase the number of links in your sitemap, replace the default sitemap with one that covers up to the 500 most recent posts.
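A commonly used replacement points crawlers at the blog’s Atom feed with a larger result limit; again, example.blogspot.com stands in for your blog’s actual address:

```txt
Sitemap: https://example.blogspot.com/atom.xml?redirect=false&start-index=1&max-results=500
```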