Multiple spiders crawling a site increases server burden.
To reduce that load and keep pages responsive, sites in China can identify and allow only spiders from Baidu, China’s biggest search engine, and block spiders from smaller search engines.
Here are 2 easy steps to identify Baidu spiders.
Step 1: Check the user agent
A spider is not from Baidu if its user agent (UA) is not in the list below. The UAs of Baidu’s spiders fall into 3 categories: mobile, desktop, and application.
Baidu Mobile UA
Mozilla/5.0(Linux;u;Android 4.2.2;zh-cn;) AppleWebKit/534.46 (KHTML,like Gecko)Version/5.1 Mobile Safari/10600.6.3 (compatible; Baiduspider/2.0;+http://www.baidu.com/search/spider.html)
Or
Mozilla/5.0 (iPhone;CPU iPhone OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko)Version/9.0 Mobile/13B143 Safari/601.1 (compatible; Baiduspider-render/2.0;+http://www.baidu.com/search/spider.html)
Baidu Desktop UA
Mozilla/5.0(compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)
Or
Mozilla/5.0(compatible; Baiduspider-render/2.0; +http://www.baidu.com/search/spider.html)
Baidu Applications UA
Mozilla/5.0 (iPhone;CPU iPhone OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko)Version/9.0 Mobile/13B143 Safari/601.1 (compatible; Baiduspider-render/2.0;Smartapp; +http://www.baidu.com/search/spider.html)
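The user agent check in Step 1 could be automated along the following lines. This is only a minimal sketch: the function names looks_like_baidu_ua and classify_baidu_ua are hypothetical, and the matching rule (a "compatible; Baiduspider/2.0" or "compatible; Baiduspider-render/2.0" token, with "Smartapp" and "Mobile" used to tell the categories apart) is an assumption inferred from the UA strings listed above, not a rule published by Baidu.

```python
import re
from typing import Optional

# Assumption: every UA listed above carries one of these two tokens.
_BAIDU_TOKEN = re.compile(r"\(compatible;\s*Baiduspider(-render)?/2\.0;", re.IGNORECASE)


def looks_like_baidu_ua(user_agent: str) -> bool:
    """Return True if the UA string carries a Baiduspider token."""
    return bool(_BAIDU_TOKEN.search(user_agent or ""))


def classify_baidu_ua(user_agent: str) -> Optional[str]:
    """Return 'application', 'mobile', or 'desktop', or None if not a Baidu UA."""
    if not looks_like_baidu_ua(user_agent):
        return None
    if "Smartapp" in user_agent:   # marker seen only in the application UA
        return "application"
    if "Mobile" in user_agent:     # marker seen in both mobile UAs
        return "mobile"
    return "desktop"
```

A matching user agent only shows that the request claims to be from Baidu. User agents are easy to forge, so the IP check in Step 2 is still needed.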
Step 2: Reverse look up the IP
Webmasters can confirm whether a spider is from Baidu by doing a reverse DNS lookup on its IP address. The command to use depends on your operating system. Below are the validation methods for 3 operating systems: Linux, Windows, and Mac OS.
Linux
In Linux, you can use the command host IP to reverse look up the spider. The hostname of a Baidu spider matches *.baidu.com or *.baidu.jp, that is, it ends in .baidu.com or .baidu.jp. If the returned hostname does not match, the spider is not from Baidu. The image below shows 2 examples of Baidu spiders:
Windows
In Windows or IBM OS/2, you can use the command nslookup IP to reverse look up the spider. Open CMD and type in nslookup xxx.xxx.xxx.xxx (IP). If the returned hostname does not match *.baidu.com or *.baidu.jp, then the spider is not from Baidu.
Mac
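In Mac OS, you can use the command dig -x IP to reverse look up the spider (the -x option tells dig to do a reverse lookup; without it, dig treats the IP as a name to resolve). Open Terminal and type in dig -x xxx.xxx.xxx.xxx (IP). If the returned hostname does not match *.baidu.com or *.baidu.jp, then the spider is not from Baidu.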
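The same reverse lookup can also be scripted, which avoids depending on the operating system's command-line tools. The sketch below uses Python's standard socket.gethostbyaddr. The names BAIDU_SUFFIXES, is_baidu_hostname, and is_baidu_ip are hypothetical, and the resolver parameter is an assumption added so the lookup can be swapped out (for example, in tests). The suffix check treats "*.baidu.com" as "ends in .baidu.com", so a hostname such as crawl.baidu.com.example.net is rejected.

```python
import socket
from typing import Callable

# Hostname suffixes named in this article for Baidu spiders.
BAIDU_SUFFIXES = (".baidu.com", ".baidu.jp")


def _reverse_lookup(ip: str) -> str:
    # socket.gethostbyaddr returns (hostname, aliaslist, ipaddrlist).
    return socket.gethostbyaddr(ip)[0]


def is_baidu_hostname(hostname: str) -> bool:
    """True if the hostname ends in .baidu.com or .baidu.jp."""
    host = hostname.rstrip(".").lower()  # drop the trailing dot of an FQDN
    return host.endswith(BAIDU_SUFFIXES)


def is_baidu_ip(ip: str, resolver: Callable[[str], str] = _reverse_lookup) -> bool:
    """Reverse look up the IP and check the returned hostname."""
    try:
        hostname = resolver(ip)
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False  # no reverse record: treat as not from Baidu
    return is_baidu_hostname(hostname)
```

Calling is_baidu_ip with the IP taken from a server log line returns True only when the reverse record points at a Baidu hostname. Lookups can be slow, so in practice the result for each IP might be cached for a short time.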
***
Pro-tip: Another question we’re frequently asked is whether we can add the IPs of Baidu spiders to a crawling whitelist. Unfortunately, Baidu doesn’t give its spiders fixed IPs, because the crawlers’ IPs change dynamically.