SEO web server log files

Web Server Logs, SEO and Science

When you talk about SEO with consultants, or with people in general, you often hear statements such as "SEO is not an exact science", "Search engines are driven by algorithms that are updated constantly, so it is impossible to follow them", or "Nobody can fully understand how Google works, not even the engineers who work on it". I could agree with all of those statements only if web server logs did not exist.

There is one simple fact that everyone, SEO professional or not, will confirm: "If a page is not crawled by a search engine bot, it cannot bring in any organic visits from that search engine". This basic truth alone is the entry point of SEO into science. Although we cannot completely perceive how Google works as a search engine, there is a way to find out how Google perceives a website as a whole: analyzing that website's web server logs.

Web server logs can answer two important questions.

In a given period of time, which pages of the website

  1. Have been crawled by a search engine bot?
  2. Are bringing in organic traffic from that search engine?

Web server logs are the best place, in fact the only place, to check and gain insight into how your crawl budget is spent.
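As a rough illustration of the two questions above, here is a minimal Python sketch. It assumes the log lines have already been parsed into dictionaries. The field names (path, user_agent, referrer), the sample records and both function names are hypothetical. Matching on the user agent string is a simplification, since that string can be spoofed.

```python
from urllib.parse import urlparse

# Hypothetical, already-parsed log records (field names are assumptions).
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0 Safari/537.36"

records = [
    {"path": "/", "user_agent": GOOGLEBOT_UA, "referrer": "-"},
    {"path": "/products", "user_agent": GOOGLEBOT_UA, "referrer": "-"},
    {"path": "/products", "user_agent": BROWSER_UA, "referrer": "https://www.google.com/"},
    {"path": "/about", "user_agent": BROWSER_UA, "referrer": "-"},
]


def crawled_pages(records, bot_token="Googlebot"):
    """Question 1: pages requested at least once by the search engine bot."""
    token = bot_token.lower()
    return {r["path"] for r in records if token in r["user_agent"].lower()}


def active_pages(records, engine_domain="google"):
    """Question 2: pages visited at least once with a search engine referrer."""
    pages = set()
    for r in records:
        host = urlparse(r["referrer"]).hostname  # None for "-" or an empty referrer
        # Prefixing a dot means "notgoogle.com" does not match ".google."
        if host and f".{engine_domain}." in f".{host}":
            pages.add(r["path"])
    return pages


crawled = crawled_pages(records)
active = active_pages(records)
print("Crawled:", sorted(crawled))
print("Active:", sorted(active))
print("Crawled but not active:", sorted(crawled - active))
```

Comparing the two sets is the interesting part: pages that are crawled but never active, and pages that are active but rarely crawled, are the ones worth a closer look.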

Introduction to Web Server Logs

What is a server?

A server is simply a computer that provides data to other computers.

Many types of servers exist, including web servers, mail servers and file servers.

What is a web server?

In computer science, the term "web server" refers to both a physical machine and software.

In the first case, it is a computer connected to the Internet and hosting resources. These resources can be files, programs or databases.

In the second case, a web server is a set of programs that runs and serves a website or web application.

What are the popular web server software packages?

Apache

• Open Source

• PHP, Perl, Python, Ruby

• Linux, Unix, Windows, Apple OS X

Nginx

• Open Source

• PHP, Perl, Python, Ruby

• Linux, Unix, Windows, Apple OS X

IIS

• Commercial

• ASP.NET

• Windows

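How to identify the web server software?

Before analyzing a website's logs, it is essential to know which web server software is running on its web server.

There are several methods:

1) Curl

From the command line, curl -I followed by the page URL prints the HTTP response headers of that page, including the Server header when the server sends one.

2) Live HTTP Headers

With this add-on for Chrome or Firefox, you can view the HTTP headers of a page directly in the browser.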

3) Online Tools

Netcraft.com

Netcraft is also useful for learning the history of the web server software used by a site.

http://toolbar.netcraft.com/site_report?url=www.bing.com

Builtwith.com

Builtwith is also handy for getting an overall idea of the technologies used on a website, including its web server software.
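The header-based methods above all come down to reading the Server response header. As a minimal sketch of the same check in Python, using only the standard library (the function names are hypothetical, and many sites hide or alter this header, so the result is a hint rather than a proof):

```python
import urllib.request


def server_software(headers):
    """Return the value of the Server header from (name, value) pairs, or None."""
    for name, value in headers:
        if name.lower() == "server":  # header names are case-insensitive
            return value
    return None


def fetch_server_software(url, timeout=10):
    """Send a HEAD request to url and return its Server header, if any."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return server_software(response.headers.items())


if __name__ == "__main__":
    # Needs network access; the value depends on what the site chooses to send.
    print(fetch_server_software("https://www.bing.com/"))
```

A HEAD request is used so that only the headers are transferred, which is also what curl -I does.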

What Is A Web Server Log File?

A web server log file is a file that keeps track of all requests sent to that web server. The requests can come from humans through browsers or from bots (such as search engine bots) through crawlers. Each request, whether from a human or a robot, produces a single line in the file. Web server log files are also very useful for analyzing the audience of a website because they provide precise information on site traffic.

What Does A Web Server Log File Look Like?

What Is The Information In One Line Of A Web Server Log File?
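As an illustration, the sketch below splits one log line into its fields. It assumes the Apache "combined" log format, and the sample line is the example given in the Apache documentation. Other servers or custom configurations may use a different layout, and the field names chosen here are not standard.

```python
import re
from datetime import datetime

# Apache "combined" format: host ident user [time] "request" status size "referrer" "user agent"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

# Example line from the Apache documentation.
EXAMPLE = (
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
    '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
    '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"'
)


def parse_line(line):
    """Return a dict of fields for one combined-format line, or None if it does not match."""
    match = COMBINED.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    parts = fields.pop("request").split()
    if len(parts) != 3:
        return None  # malformed request line
    fields["method"], fields["path"], fields["protocol"] = parts
    fields["status"] = int(fields["status"])
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    return fields


for name, value in parse_line(EXAMPLE).items():
    print(f"{name}: {value}")
```

For SEO, the most useful fields are usually the requested path, the status code, the referrer and the user agent, together with the timestamp.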

Web Server Software and Web Server Log Formats

Apache: http://httpd.apache.org/docs/current/logs.html#accesslog

Nginx: http://nginx.org/en/docs/http/ngx_http_log_module.html

IIS: https://www.microsoft.com/technet/prodtechnol/WindowsServer2003/Library/IIS/676400bc-8969-4aa7-851a-9319490a9bbb.mspx?mfr=true

Why Do We Need Web Server Log File Analysis For SEO?

Analytics tools provide information about the pages that bring visits to a website, but none of them provides complete information about the pages crawled by search engine bots.
The information Google gives about a website's indexed pages is not accurate, and it is not possible to access or extract those pages completely.

The only tool that gives information on Google's crawl is Google Search Console, but ...

Why Is Google Search Console Not Sufficient For SEO?

In SEO, what we focus on is how Googlebot crawls the content, and the strategic content in particular.

1) However, the crawl stats graph in Search Console shows the crawl of all Google bots and of all web resources (CSS, JavaScript, Flash, PDF, images) on a website.

2) We cannot identify the status codes of the crawled pages on that crawl stats graph.

3) The crawl stats graph shows unique crawls, but in SEO we also want to know how frequently (how many times) a given page or group of pages is crawled.

4) The pages crawled by Googlebot cannot be accessed or extracted from Search Console, therefore we cannot

  • Categorize them
  • Analyze them
  • Keep historical crawl data

5) We do not have access to data on all active pages (pages receiving organic search engine visits). In Search Console we only have access to the pages bringing in top and middle tail traffic, not to the active pages bringing in long tail traffic.
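By contrast, once the log lines are parsed, points 1 to 4 become simple counting. The sketch below is a minimal Python illustration under stated assumptions: each hit is a hypothetical (path, status, user_agent) tuple, the list of asset extensions is arbitrary, and the bot is recognised by a plain substring match on the user agent. That match also catches other Google crawlers whose user agent contains "Googlebot", and it does not verify that the request really came from Google.

```python
from collections import Counter
from pathlib import PurePosixPath

# Assumed list of non-page resources to leave out of the report.
ASSET_EXTENSIONS = {".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".svg", ".ico", ".pdf", ".swf"}


def is_page(path):
    """True when the path (query string ignored) does not end with an asset extension."""
    suffix = PurePosixPath(path.split("?", 1)[0]).suffix.lower()
    return suffix not in ASSET_EXTENSIONS


def crawl_report(hits, bot_token="Googlebot"):
    """Count bot crawls per URL and per status code, for page URLs only."""
    per_url = Counter()
    per_status = Counter()
    token = bot_token.lower()
    for path, status, user_agent in hits:
        if token not in user_agent.lower() or not is_page(path):
            continue
        per_url[path] += 1
        per_status[status] += 1
    return per_url, per_status


if __name__ == "__main__":
    bot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    sample_hits = [  # hypothetical data
        ("/", 200, bot),
        ("/", 200, bot),
        ("/old-page", 404, bot),
        ("/style.css", 200, bot),
        ("/", 200, "Mozilla/5.0 Chrome/120.0"),
    ]
    per_url, per_status = crawl_report(sample_hits)
    print(per_url.most_common())
    print(per_status.most_common())
```

Because the counters are ordinary Python objects, they can be grouped by URL category or saved per day to keep the historical crawl data that Search Console does not provide.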

