Crawl dictionary

Crawl-first SEO

Crawl-first SEO is a methodology for auditing websites' SEO which privileges crawl data analysis. You can learn more about this methodology from the slides at this URL: Crawl-first SEO

Crawl instructions

Crawl instructions are the rules defined by website owners to guide a crawler's behaviour on their sites. They can be specified in the robots.txt file, in the source code of pages and in the HTTP headers of pages.
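As a sketch, the robots.txt part of these instructions can be read with Python's standard-library `urllib.robotparser`; the rules and the example.com URLs below are invented for illustration:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules for an example site
rules = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A polite crawler checks each URL against the rules before fetching
print(parser.can_fetch("*", "https://example.com/private/report.html"))  # False
print(parser.can_fetch("*", "https://example.com/blog/post.html"))       # True
print(parser.crawl_delay("*"))  # 5 (seconds to wait between fetches)
```

Instructions placed in page source (meta robots tags) or in HTTP headers (X-Robots-Tag) have to be read from the fetched responses themselves.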

Crawler trap

A crawler trap is a group of pages that may cause a web crawler or bot to request pages infinitely, and can sometimes crash crawlers which are not robust. Traps may be created intentionally to crash unwanted bots, or unintentionally by time-dependent pages such as calendars.
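A minimal heuristic guard against such traps might flag URLs whose paths are suspiciously deep or repetitive; the thresholds and URLs below are illustrative assumptions, not standard values:

```python
from urllib.parse import urlparse

def looks_like_trap(url, max_depth=8, max_repeats=3):
    """Heuristic: flag URLs whose path is suspiciously deep or
    repeats the same segment, as infinite calendar-style pages do."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    if len(segments) > max_depth:
        return True
    return any(segments.count(s) >= max_repeats for s in set(segments))

print(looks_like_trap("https://example.com/blog/2017/seo-audit"))      # False
print(looks_like_trap("https://example.com/cal/next/next/next/next"))  # True
```

Production crawlers usually combine such URL heuristics with per-host page budgets and content-duplication checks.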

Crawl rate

Crawl rate is the number of requests per second a crawler makes to a website while crawling it.

Crawl rate limit

A crawl rate limit caps how fast a crawler (a bot) is allowed to request pages on a site, to avoid overloading or harming the server.

It is usually expressed as the number of pages per second a crawler may request from the server, or as the number of simultaneous parallel connections the bot may open; a wait time between fetches can also be specified.
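A limit of N pages per second translates into a minimum delay of 1/N seconds between fetches. A minimal single-connection sketch (the class and its API are hypothetical, not from any crawler framework):

```python
import time

class RateLimiter:
    """Enforce a minimum interval between fetches so a crawler
    never exceeds `rate` requests per second (single connection)."""

    def __init__(self, rate):
        self.min_interval = 1.0 / rate
        self.last = None

    def wait(self):
        """Sleep just long enough to respect the rate limit."""
        if self.last is not None:
            elapsed = time.monotonic() - self.last
            if elapsed < self.min_interval:
                time.sleep(self.min_interval - elapsed)
        self.last = time.monotonic()

limiter = RateLimiter(rate=5)  # at most 5 pages/sec -> 0.2 s between fetches
```

Each fetch would then be preceded by `limiter.wait()`; a crawler using several parallel connections would divide the budget across its workers.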

Crawl restrictions 

Servers can be configured to restrict crawl requests, for example by responding with an HTTP status code other than 200 once a certain number of requests per second is exceeded. For instance, a server may respond with HTTP status code 503 (Service Temporarily Unavailable) when a crawler's requests exceed 10 pages per second.
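The server side of that 503 policy can be sketched with a one-second sliding window; the 10 requests/second limit mirrors the example above, and the class is hypothetical:

```python
from collections import deque

class RequestThrottle:
    """Return 503 once a client exceeds `limit` requests
    within any one-second sliding window."""

    def __init__(self, limit):
        self.limit = limit
        self.stamps = deque()  # timestamps of recently accepted requests

    def status_for(self, now):
        # Drop requests that fell out of the one-second window
        while self.stamps and now - self.stamps[0] >= 1.0:
            self.stamps.popleft()
        if len(self.stamps) >= self.limit:
            return 503  # Service Temporarily Unavailable
        self.stamps.append(now)
        return 200

throttle = RequestThrottle(limit=10)
# 12 requests arriving 10 ms apart: the 11th and 12th are rejected
statuses = [throttle.status_for(i * 0.01) for i in range(12)]
print(statuses.count(200), statuses.count(503))  # 10 2
```

Real servers implement the same idea with modules such as rate-limiting rules in the web server or a reverse proxy, usually keyed per client IP.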

Crawl access blockings (Crawl blockings)

Servers can be configured to block crawl requests outright, based on elements of the crawler's requests such as its user-agent or IP address.

Preferred crawl days or hours

At a client's request, a crawler can be configured to crawl a website only on specified days or at specified hours.

Crawl budget

Crawl budget is an allocation of crawl requests to a host.

Crawl frequency

Crawl frequency is how often a crawler revisits a site, as determined by its scheduling policy: which sites to crawl, how often, and how many pages to fetch from each site.

Crawl rank

Crawl rank is the frequency at which a page is crawled by a search engine bot, compared to the ranking position of that page on that search engine.

Crawl space

Crawl space is the totality of possible URLs for a website.

Crawl ratio

Crawl ratio is the number of pages crawled by a search engine bot compared to the total number of crawlable pages on a website. A ratio of 100% means the search engine knows all the pages on that website.
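Measured from crawl data, the ratio is a simple set computation; the URL lists below are made up for illustration:

```python
def crawl_ratio(crawled_urls, site_urls):
    """Crawl ratio in %: pages crawled by the bot vs. all crawlable pages."""
    site = set(site_urls)
    return 100.0 * len(set(crawled_urls) & site) / len(site)

site = ["/", "/products", "/products/a", "/products/b", "/about"]
crawled_by_bot = ["/", "/products", "/products/a", "/old-page"]
print(crawl_ratio(crawled_by_bot, site))  # 60.0
```

URLs the bot requested but which are no longer on the site (like "/old-page" here) do not raise the ratio; they are candidates for crawl waste instead.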

Effective crawl ratio

Effective crawl ratio is the number of pages of a given type crawled by a search engine bot within the crawl window for that page type, compared to the total number of crawlable pages of that type on the website.

Crawl window

The crawl window is the timeframe during which a search engine keeps sending visitors to a URL after crawling it. That period varies with the type of page. Knowing it allows one to estimate the effective crawl ratio.

Crawl depth

Depth is the shortest path (minimal number of clicks) from the homepage to a particular page. Crawl depth is how deep a crawler is programmed to explore a website.
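Depth can be computed with a breadth-first traversal of the site's link graph; the small graph below is invented for illustration:

```python
from collections import deque

def page_depths(links, home="/"):
    """Shortest click distance from the homepage to every
    reachable page, via breadth-first search."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

links = {
    "/": ["/category", "/about"],
    "/category": ["/category/page-1", "/about"],
    "/category/page-1": ["/category/page-2"],
}
print(page_depths(links))
# {'/': 0, '/category': 1, '/about': 1, '/category/page-1': 2, '/category/page-2': 3}
```

Pages that never appear in the result are unreachable by clicks from the homepage, which is itself a useful audit signal.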

Crawl waste

Crawl waste consists of crawlable pages on a website which have no unique content, no SEO purpose, and no added value for either users or search engines.

Crawl simulation

Crawl simulation is crawling not the live website but the data of a crawl of that website which has already been performed.

Crawl efficiency

Crawl efficiency is the number of useful pages crawled by a search engine bot compared to all pages crawled by the same bot in a defined period.

Crawl retention

Crawl retention is the timeframe between a page's crawl by a search engine bot and the first visit recorded from that search engine.

Crawl optimization

Crawl optimization is the intelligent use of the crawl budget (allocation) on a website.

Crawl performance

Crawl performance is the average time a crawler spends downloading a page (in milliseconds).

Useful crawl

Useful crawl refers to the pages of a website crawled by a search engine bot which bring at least one visit from that search engine in a defined period.

Useless crawl 

Useless crawl refers to the pages of a website crawled by a search engine bot which bring no visits from that search engine in a defined period.
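Both notions fall out of a single set split between crawl-log URLs and organic-visit URLs, with crawl efficiency as a by-product; the data here is hypothetical:

```python
def split_crawl(crawled_urls, visited_urls):
    """Partition crawled URLs into useful (>= 1 organic visit)
    and useless (no visits) in the studied period."""
    crawled, visited = set(crawled_urls), set(visited_urls)
    return crawled & visited, crawled - visited

crawled = ["/", "/products/a", "/products/b", "/print/a"]
organic_visits = ["/", "/products/a"]

useful, useless = split_crawl(crawled, organic_visits)
efficiency = 100.0 * len(useful) / len(set(crawled))
print(sorted(useless))  # ['/print/a', '/products/b']
print(efficiency)       # 50.0
```

In practice the two URL lists come from server log files: bot hits identified by user-agent on one side, organic referrer visits on the other.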

Simulate empty crawl

A simulated empty crawl covers URLs which are known and kept by a crawler but not requested from the host, typically URLs blocked by the website's robots.txt. A crawler is sometimes configured on purpose to perform this kind of crawl in order to analyse the links of a website.

Partial crawl

Partial crawl is crawling specific, selected parts of a website.

Unique crawl 

Unique crawl is the set of unique URLs a crawler fetches on a single website in a defined period.

Crawled page

A crawled page is a page on a site which has been crawled by a crawler. In Crawl-first SEO, it is a page crawled by a search engine bot during the studied period.

Crawl schedule

A crawl schedule determines when the pages of a website should be recrawled.

Crawl scheduler

A crawl scheduler comprises the methods and systems, including computer programs on a computer storage medium, for scheduling crawl requests by determining a document's priority and its crawl frequency, e.g. from the document's popularity or the frequency of content changes on it.

Rendering crawler

A rendering crawler renders pages in a specific browser engine and can thus determine browser-specific behaviour.

Thanks for taking time to read this post. I offer consulting, architecture and hands-on development services in web/digital to clients in Europe & North America. If you'd like to discuss how my offerings can help your business please contact me via LinkedIn

Have comments, questions or feedback about this article? Please do share them with us here.


