Crawl-first SEO is a methodology for SEO auditing of websites which privileges crawl data analysis. You can learn more about this methodology from the slides at this URL: Crawl-first SEO
Crawl instructions are rules defined by website owners to guide a crawler's behaviour on their sites. They can be specified in the robots.txt file, in the source code of pages (e.g. robots meta tags) and in the HTTP headers of pages.
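As a minimal sketch, the three channels can be inspected with Python's standard library. The robots.txt rules below are hypothetical example rules, not taken from any real site:

```python
from urllib import robotparser

# Hypothetical robots.txt rules for an example site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

print(rp.can_fetch("*", "https://example.com/private/page"))  # False
print(rp.can_fetch("*", "https://example.com/public/page"))   # True
print(rp.crawl_delay("*"))                                    # 5

# Page-level instructions: a robots meta tag in the page source ...
META_TAG = '<meta name="robots" content="noindex, nofollow">'

# ... or the equivalent directive sent as an HTTP response header.
HTTP_HEADER = "X-Robots-Tag: noindex, nofollow"
```

All three channels express the same kind of directives; robots.txt acts at crawl time, while the meta tag and header act at indexing time.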
A crawler trap is a group of pages that may cause a web crawler or bot to request pages infinitely, and can sometimes cause crawlers that are not robust to crash. They may be created intentionally to crash unwanted bots, or unintentionally by time-dependent pages such as calendars.
Crawl rate is the number of requests per second a crawler makes to a website while crawling it.
Crawl rate limit
Crawl rate limit is how much a crawler (a bot) is allowed to crawl pages on a site, to avoid overloading or harming the server.
It is usually expressed as the number of pages per second a crawler may request from a server, i.e. the number of simultaneous parallel connections the bot may use to crawl the site, but a time to wait between fetches can also be specified.
Servers can be configured to restrict crawl requests, for example by responding with an HTTP status code other than 200 once a certain number of requests per second is exceeded. For instance, a server may respond with HTTP status code 503 (Service Temporarily Unavailable) when a crawler's requests exceed 10 pages per second.
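On the crawler side, respecting such a limit can be sketched with a small client-side throttle. The class below is an illustration, not any crawler's actual implementation; `max_rps` stands in for whatever limit the site publishes (e.g. a Crawl-delay directive):

```python
import time

class CrawlRateLimiter:
    """Throttle requests to at most `max_rps` requests per second (a sketch)."""

    def __init__(self, max_rps: float):
        self.min_interval = 1.0 / max_rps  # seconds between requests
        self.last_request = 0.0

    def wait_time(self, now: float) -> float:
        """Seconds to wait before the next request is allowed."""
        return max(0.0, self.last_request + self.min_interval - now)

    def acquire(self) -> None:
        """Block until a request may be sent, then record the send time."""
        delay = self.wait_time(time.monotonic())
        if delay > 0:
            time.sleep(delay)
        self.last_request = time.monotonic()
```

A polite crawler would also back off when the server answers 503, for example by honoring a Retry-After header before resuming.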
Crawl access blockings (Crawl blockings)
Servers can be configured to block crawl requests from a crawler based on elements of the crawler's requests, such as its user-agent or IP address.
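A server-side blocking decision can be sketched as a simple check against denylists. The bot names and the IP (a TEST-NET documentation address) below are hypothetical:

```python
BLOCKED_USER_AGENTS = {"BadBot", "ScraperBot"}  # hypothetical denylist
BLOCKED_IPS = {"203.0.113.7"}                   # documentation-range example IP

def should_block(user_agent: str, ip: str) -> bool:
    """Return True if the request should be denied (e.g. with HTTP 403)."""
    if ip in BLOCKED_IPS:
        return True
    return any(bot.lower() in user_agent.lower() for bot in BLOCKED_USER_AGENTS)
```

In practice this logic usually lives in the web server or firewall configuration rather than in application code.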
Preferred crawl days or hours
At a client's request, a crawler can be configured to crawl a website only on specified days or at specified hours of the day.
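Such a schedule can be expressed as a simple time-window check. The weekday and hour preferences below are invented for illustration:

```python
from datetime import datetime

# Hypothetical client preference: crawl only on weekdays, between 01:00 and 05:59.
ALLOWED_WEEKDAYS = {0, 1, 2, 3, 4}  # Monday=0 .. Friday=4
ALLOWED_HOURS = range(1, 6)

def crawl_allowed(now: datetime) -> bool:
    """Return True if the crawler may run at the given moment."""
    return now.weekday() in ALLOWED_WEEKDAYS and now.hour in ALLOWED_HOURS
```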
Crawl budget is the allocation of crawl requests to a host.
Crawl frequency is determined by a scheduling program that decides which sites to crawl, how often, and how many pages to fetch from each site.
Crawl rank is the frequency at which a page is crawled by a search engine bot, compared to the ranking position of that page on that search engine.
Crawl space is the totality of possible URLs for a website.
Crawl ratio is the number of pages crawled by a search engine bot compared to the total number of crawlable pages on a website. A ratio of 100% means the search engine knows all the pages on that website.
Effective crawl ratio
Effective crawl ratio is the number of pages of a specific type crawled by a search engine bot within the crawl window for that page type, compared to the total number of crawlable pages of that type on the website.
Crawl window is the timeframe during which a search engine keeps sending visitors to a URL after it has crawled it. That period varies depending on the type of page. Knowing it allows you to estimate the effective crawl ratio.
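The two ratios reduce to simple set arithmetic. A minimal sketch, with invented URL sets:

```python
def crawl_ratio(crawled_urls: set, site_urls: set) -> float:
    """Share of the site's crawlable URLs the bot has crawled (1.0 = all known)."""
    return len(crawled_urls & site_urls) / len(site_urls)

def effective_crawl_ratio(crawled_in_window: set, type_urls: set) -> float:
    """Same ratio, restricted to one page type and to URLs crawled
    within that page type's crawl window."""
    return len(crawled_in_window & type_urls) / len(type_urls)

site = {"/", "/a", "/b", "/c"}
crawled = {"/", "/a", "/old"}      # "/old" is no longer on the site
print(crawl_ratio(crawled, site))  # 0.5
```

The intersection matters: bots often keep requesting URLs that no longer belong to the site, and those should not inflate the ratio.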
Depth is the shortest path, i.e. the minimal number of clicks, from the homepage to a particular page. Crawl depth is how deep a crawler is programmed to explore a website.
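Since depth is a shortest path from the homepage, it can be computed with a breadth-first search over the link graph. The link graph below is a made-up example:

```python
from collections import deque

def page_depths(links: dict, home: str) -> dict:
    """Minimal number of clicks from the homepage to each reachable page (BFS)."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

# Hypothetical link graph: page -> pages it links to.
links = {"/": ["/cat", "/about"], "/cat": ["/product"], "/product": ["/"]}
print(page_depths(links, "/"))  # {'/': 0, '/cat': 1, '/about': 1, '/product': 2}
```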
Crawl waste is the set of crawlable pages on a website that have no unique content, no SEO purpose, and add no value for users or for search engines.
Crawl simulation is crawling not the live website but a previously performed crawl of that website.
Crawl efficiency is the number of useful pages crawled by a search engine bot compared to all pages crawled by the same bot in a defined period.
Crawl retention is the timeframe after a crawl by a search engine bot before a visit is recorded from that search engine.
Crawl optimization is the intelligent use of the crawl budget (allocation) on a website.
Crawl performance is the average time a crawler spends downloading a page (in milliseconds).
Useful crawl is the set of pages on a website crawled by a search engine bot that bring at least one visit from that search engine in a defined period.
Useless crawl is the set of pages on a website crawled by a search engine bot that bring no visits from that search engine in a defined period.
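Given a crawl log and a visit log for the same period, useful crawl, useless crawl and crawl efficiency can all be derived together. The URL sets below are invented for illustration:

```python
def crawl_outcome(crawled_urls: set, visited_urls: set):
    """Split crawled URLs into useful (>= 1 organic visit in the period)
    and useless (no visits), and compute crawl efficiency."""
    useful = crawled_urls & visited_urls
    useless = crawled_urls - visited_urls
    efficiency = len(useful) / len(crawled_urls) if crawled_urls else 0.0
    return useful, useless, efficiency

crawled = {"/a", "/b", "/c", "/d"}
visits = {"/a", "/c", "/x"}  # "/x" got a visit but was not crawled in the period
useful, useless, eff = crawl_outcome(crawled, visits)
print(eff)  # 0.5
```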
Simulate empty crawl
A simulated empty crawl covers URLs which are known and kept by a crawler but not requested from the host, typically URLs blocked by a website's robots.txt. A crawler is sometimes configured on purpose to perform this kind of crawl in order to analyze a website's links.
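A sketch of the idea: known URLs can be partitioned into fetchable and blocked using only the robots.txt rules, without sending a single page request to the host. The URLs and rules are hypothetical:

```python
from urllib import robotparser

def partition_by_robots(urls, robots_lines, agent="MyCrawler"):
    """Split known URLs into (fetchable, blocked) using robots.txt rules only,
    without requesting any page from the host."""
    rp = robotparser.RobotFileParser()
    rp.parse(robots_lines)
    fetchable = [u for u in urls if rp.can_fetch(agent, u)]
    blocked = [u for u in urls if not rp.can_fetch(agent, u)]
    return fetchable, blocked

urls = ["https://example.com/", "https://example.com/private/x"]
rules = ["User-agent: *", "Disallow: /private/"]
print(partition_by_robots(urls, rules))
```

The blocked list is the "empty crawl": those URLs stay in the crawler's link graph for analysis but are never fetched.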
Partial crawl is crawling specific, selected parts of a website.
Unique crawl is the set of unique URLs a crawler crawls on a single website in a defined period.
A crawled page is a page on a site which has been crawled by a crawler. In Crawl-first SEO, it is a page crawled by a search engine bot in the studied period.
Determining a schedule for recrawling pages.
Methods and systems, including computer programs encoded on a computer storage medium, for scheduling crawl requests by determining the integrity of a document and the crawl frequency for it, e.g. by using the popularity of the document or the frequency of content changes on it.
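The scheduling idea can be sketched as a toy priority score. The weights and document scores below are arbitrary assumptions for illustration, not the actual formula from any patent or search engine:

```python
def recrawl_priority(popularity: float, change_rate: float) -> float:
    """Toy priority score: popular, frequently-changing documents are
    recrawled sooner. The 50/50 weighting is an arbitrary assumption."""
    return 0.5 * popularity + 0.5 * change_rate

# Hypothetical documents: url -> (popularity, change_rate), both in [0, 1].
docs = {"/home": (0.9, 0.8), "/archive/2001": (0.1, 0.0)}
ranked = sorted(docs, key=lambda u: recrawl_priority(*docs[u]), reverse=True)
print(ranked)  # ['/home', '/archive/2001']
```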
A browser-specific rendering crawler which can determine browser-specific behavior.
Thanks for taking the time to read this post. I offer consulting, architecture and hands-on development services in web/digital to clients in Europe & North America. If you'd like to discuss how my offerings can help your business, please contact me via LinkedIn
Have comments, questions or feedback about this article? Please do share them with us here.