Search robots (also called crawlers, bots, or web spiders) are programs that discover and index site pages by following links from pages that are already indexed.
Bot operation scheme:
Scanning – collecting all data from a page, including images, text and video. The process is repeated because pages can change.
Indexing – adding the collected information to the search engine database.
Search results – searching the index and ranking pages by relevance to the query.
How search robots work and their functions
Search results are formed in three stages:
Scanning – bots collect all data from web pages, including text, pictures and videos. The process is repeated regularly, at a rate that depends on how often the resource is updated.
Indexing – the collected information is entered into the search engine's database and assigned an index entry for fast lookup. On major news portals, content is indexed almost immediately after publication.
Delivery of results – the engine searches the index and ranks pages by relevance to the query.
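The three stages can be sketched as a toy pipeline. This is only an illustration under stated assumptions: the in-memory PAGES dictionary, the URLs and the word-count ranking are all hypothetical stand-ins, and real search engines use far more elaborate crawling and ranking.

```python
from collections import defaultdict

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
PAGES = {
    "/home": ("welcome to the robot guide", ["/crawl", "/index"]),
    "/crawl": ("a robot follows links to crawl pages", ["/index"]),
    "/index": ("the index maps words to pages", []),
    "/orphan": ("no page links here", []),
}

def crawl(start):
    """Scanning: follow links from a start page and collect page text."""
    seen, queue, collected = set(), [start], {}
    while queue:
        url = queue.pop(0)
        if url in seen or url not in PAGES:
            continue
        seen.add(url)
        text, links = PAGES[url]
        collected[url] = text
        queue.extend(links)
    return collected

def build_index(collected):
    """Indexing: map each word to the set of pages that contain it."""
    index = defaultdict(set)
    for url, text in collected.items():
        for word in text.split():
            index[word].add(url)
    return index

def search(index, collected, query):
    """Results: rank candidate pages by how often the query words occur."""
    words = query.lower().split()
    candidates = set().union(*(index.get(w, set()) for w in words))
    scored = [(sum(collected[u].split().count(w) for w in words), u)
              for u in candidates]
    # Highest score first; ties are broken alphabetically by URL.
    return [u for _, u in sorted(scored, key=lambda s: (-s[0], s[1]))]

collected = crawl("/home")
index = build_index(collected)
print(search(index, collected, "robot pages"))
```

Note that "/orphan" never reaches the index: no crawled page links to it, so the robot has no way to discover it.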
Sometimes a page is indexed without having been scanned first. The robots.txt file specifies rules for crawling pages, not for indexing them. Therefore, if the search robot learns about a page in another way, for example because third-party resources link to it, it can still add the page to the database.
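To see how a well-behaved robot reads crawling rules, here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt content, the example.com URLs and the may_crawl helper are assumptions made up for illustration. The check only answers "may this bot fetch this URL?" and says nothing about whether the page can end up in the index.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; bot names and paths are illustrative.
ROBOTS_TXT = """\
User-agent: YandexImages
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def may_crawl(user_agent, url):
    """Return True if the parsed rules allow this bot to fetch the URL."""
    return parser.can_fetch(user_agent, url)

# The image bot is blocked everywhere; other bots only under /private/.
print(may_crawl("YandexImages", "https://example.com/photo.jpg"))
print(may_crawl("Googlebot", "https://example.com/page.html"))
print(may_crawl("Googlebot", "https://example.com/private/a.html"))
```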
What bots do Google and Yandex have?
Each search engine has its own search bots. Let’s take a look at Google and Yandex as examples.
Googlebot – the main bot. It crawls both the desktop and mobile versions of standard sites. Since July 2019, mobile versions of sites have been scanned with priority, so most crawling is done by the mobile robot.
For example, the code below in the robots.txt file prevents the Yandex.Images robot from indexing all images.
User-agent: YandexImages
Disallow: /
And this one prohibits Google's main robot from indexing the page on which the tag is located:
<meta name="googlebot" content="noindex">
What’s on the dark side?
It is undoubtedly cool that you can find the information you need through a search in a couple of seconds. But let’s see how this can be used for evil purposes:
OSINT – personal information is not hard to find through search, which makes it easy to build up a stock of compromising material on an opponent.
Inability to delete – many people think that deleting personal information will be easy, but they are mistaken. Search engines such as Google are often unwilling to act on removal requests.
Summary
Bots process different kinds of content in different sequences, which allows huge amounts of data to be processed simultaneously. Thanks to crawlers, we can find the information we need every day. A robot finds pages on its own, so no staff costs are required for this work. But there are also dark sides, such as OSINT through search and refused requests to delete information.
It is better to block information from indexing with the robots meta tag or the X-Robots-Tag HTTP header, since the robots.txt file contains only crawling recommendations, not direct commands.
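As a rough sketch of how these two signals could be checked for a page, the helper below looks for a noindex directive in the response headers and in the page's meta tags. The names is_noindex and RobotsMetaParser are hypothetical, and the parsing is deliberately simplified: it ignores the form of the X-Robots-Tag header that prefixes a directive with a bot name.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and bot-specific tags."""

    def __init__(self, bot_name):
        super().__init__()
        self.names = {"robots", bot_name.lower()}
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        if (attrs.get("name") or "").lower() in self.names:
            content = attrs.get("content") or ""
            self.directives.update(d.strip().lower() for d in content.split(","))

def is_noindex(headers, html, bot_name="googlebot"):
    """Return True if the header or a meta tag asks bots not to index."""
    header_value = ""
    for key, value in headers.items():
        if key.lower() == "x-robots-tag":  # header names are case-insensitive
            header_value = value
    header_directives = {d.strip().lower() for d in header_value.split(",")}
    meta = RobotsMetaParser(bot_name)
    meta.feed(html)
    return "noindex" in header_directives or "noindex" in meta.directives

print(is_noindex({"X-Robots-Tag": "noindex, nofollow"}, "<html></html>"))
print(is_noindex({}, '<meta name="googlebot" content="noindex">'))
print(is_noindex({}, '<meta name="robots" content="nofollow">'))
```

A robot can only see either signal if it is allowed to fetch the page, so a page blocked in robots.txt cannot communicate noindex this way.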