Web Server Logs As Technical SEO Key Data Source

1. Web Server Logs As Technical SEO Key Data Source Aysun Akarsu // SearchDatalogy https://www.searchdatalogy.com/blog/brightonseo/ @aysunakarsu

2. @aysunakarsu @searchdatalogy #brightonseo

4. @aysunakarsu @searchdatalogy #brightonseo Web Server Logs as Key Data Source in Technical SEO

5. @aysunakarsu @searchdatalogy #brightonseo There are many data sources in technical SEO but two are essential

6. @aysunakarsu @searchdatalogy #brightonseo ● Crawl ● Web server logs

7. @aysunakarsu @searchdatalogy #brightonseo If obliged to choose one of them, my choice would be ...

8. @aysunakarsu @searchdatalogy #brightonseo Because ...

9. @aysunakarsu @searchdatalogy #brightonseo Crawl of a website

10. @aysunakarsu @searchdatalogy #brightonseo Web server logs

11. @aysunakarsu @searchdatalogy #brightonseo What are web servers?

12. @aysunakarsu @searchdatalogy #brightonseo Web server

13. @aysunakarsu @searchdatalogy #brightonseo Web server: clients send an HTTP request to the server, and the server sends back an HTTP response

14. @aysunakarsu @searchdatalogy #brightonseo Market share of web servers (active sites), July 2019 Web Server Survey
Developer | June 2019 | Percent | July 2019 | Percent | Change
Apache | 54,879,492 | 29.39% | 56,937,841 | 29.91% | 0.52
nginx | 38,382,083 | 20.55% | 37,218,850 | 19.55% | -1.00
Google | 15,584,272 | 8.34% | 16,266,687 | 8.54% | 0.20
Microsoft | 11,210,548 | 6.00% | 11,308,526 | 5.94% | -0.06

15. @aysunakarsu @searchdatalogy #brightonseo Main web servers
● Apache: open source, 30% ○ PHP, Perl, Python, Ruby ○ Linux, Unix, Windows, Apple OS X
● Nginx: open source, 20% ○ PHP, Perl, Python, Ruby ○ Linux, Unix, Windows, Apple OS X
● Google: closed source, 9% ○ A custom server used only by Google
● Microsoft: closed source, paid, 6% ○ ASP.NET ○ Windows

16. @aysunakarsu @searchdatalogy #brightonseo How to identify a web server?

17. @aysunakarsu @searchdatalogy #brightonseo Curl
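
Whichever tool is used, the server software is usually announced in the `Server` response header, although a site can hide or change it. A minimal Python sketch of that lookup follows; the function names `server_header` and `fetch_server` are made up for this illustration, and `fetch_server` needs network access, so it is defined here but not called.

```python
from typing import Optional
import urllib.request


def server_header(raw_headers: str) -> Optional[str]:
    """Return the value of the Server header in raw HTTP response headers, if any."""
    for line in raw_headers.splitlines():
        name, sep, value = line.partition(":")
        # The status line ("HTTP/1.1 200 OK") has no colon and is skipped.
        if sep and name.strip().lower() == "server":
            return value.strip()
    return None


def fetch_server(url: str, timeout: float = 10.0) -> Optional[str]:
    """Send a HEAD request and return the Server header (requires network access)."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers.get("Server")


# Illustrative response headers, not captured from a real site.
sample = "HTTP/1.1 200 OK\r\nServer: nginx\r\nContent-Type: text/html\r\n"
print(server_header(sample))
```

Calling `fetch_server("https://www.searchdatalogy.com/")` would do the same against a live site.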

18. @aysunakarsu @searchdatalogy #brightonseo Live HTTP Headers on Chrome

19. @aysunakarsu @searchdatalogy #brightonseo Cool apps https://www.wappalyzer.com/

20. @aysunakarsu @searchdatalogy #brightonseo Netcraft https://toolbar.netcraft.com/site_report?url=https%3A%2F%2Foutlook.live.com%2F

21. @aysunakarsu @searchdatalogy #brightonseo What is a log file?

22. @aysunakarsu @searchdatalogy #brightonseo What is a web server log file? HTTP request: GET URL HTTP/1.1 … HTTP response: HTTP/1.1 200 OK …

23. @aysunakarsu @searchdatalogy #brightonseo An entry
66.249.76.126 - - [02/Aug/2019:01:29:06 -0400] "GET /blog/brightonseo/ HTTP/1.1" 200 8790 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
  1. IP address of the client (remote host) which made the request to the server.
  2. The time of the request, with the server's UTC offset (in Apache, %t records the time the request was received).
  3. Request line from the client. The method used by the client is GET, the requested resource is /blog/brightonseo/ and the protocol is HTTP/1.1.
  4. Status code that the server sends back to the client.
  5. Size of the object returned to the client, not including the response headers.
  6. Referrer.
  7. User-agent.
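
The seven fields described in this entry can be pulled out with a regular expression. The sketch below assumes the combined log format shown in the entry; the pattern and the name `parse_entry` are illustrative, not taken from the talk.

```python
import re

# Combined log format: ip ident user [time] "request" status size "referrer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_entry(line):
    """Return the fields of one log entry as a dict, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Split the request line into method, URL and protocol.
    method, _, rest = fields["request"].partition(" ")
    url, _, protocol = rest.rpartition(" ")
    fields.update(method=method, url=url, protocol=protocol)
    return fields


ENTRY = (
    '66.249.76.126 - - [02/Aug/2019:01:29:06 -0400] '
    '"GET /blog/brightonseo/ HTTP/1.1" 200 8790 "-" '
    '"Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) '
    'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile '
    'Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)

entry = parse_entry(ENTRY)
print(entry["ip"], entry["time"], entry["method"], entry["url"], entry["status"], entry["size"])
```

The pattern is not anchored at the end of the line, so entries with extra trailing fields still match on their first nine fields.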

24. @aysunakarsu @searchdatalogy #brightonseo A web server log file
66.249.66.91 - - [22/Jan/2019:03:58:20 +0330] "GET /filter/b656%2Cb703%2Cb67%2Cb226%2Cb41%2Cb598%2Cb168%2Cb723%2Cb597%2Cb88%2Cb548%2Cb6%2Cb679%2Cb215%2Cb105%2Cb194%2Cb74%2Cb542%2Cb35%2Cb113%2Cb820%2Cb574%2Cb442%2Cb880%2Cb645%2Cb724%2Cb118%2Cb482%2Cb400%2Cb95%2Cb135%2Cb249%2Cb435%2Cb221%2Cb523%2Cb854%2Cb126%2Cstexists%2Cb216%2Cb217%2Cb152%2Cb99%2Cb188%2Cb209%2Cb192%2Cb213%2Cb136%2Cb218%2Cb4%2Cb648%2Cb454%2Cb258%2Cb270%2Cb180?page=41 HTTP/1.1" 200 41505 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"
66.249.66.194 - - [22/Jan/2019:03:58:20 +0330] "GET /m/filter/b118,b126,b148,b183,b186,b20,b212,b213,b219,b221,b224,b249,b3,b32,b484,b485,b523,b542,b613,b63,b67,b724,b734,b820,b874,b95,b99 HTTP/1.1" 200 24862 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"
130.185.74.243 - - [22/Jan/2019:03:58:20 +0330] "GET /filter/stexists,p18554 HTTP/1.1" 200 31168 "-" "Mozilla/5.0 (Windows NT 6.1; rv:42.0) Gecko/20100101 Firefox/42.0" "-"
5.211.97.39 - - [22/Jan/2019:03:58:21 +0330] "HEAD /amp_preconnect_polyfill_404_or_other_error_expected._Do_not_worry_about_it?1548117180000 HTTP/1.1" 404 0 "https://www.zanbil.ir/m/browse/food-preparation/%D8%A2%D9%85%D8%A7%D8%AF%D9%87-%D8%B3%D8%A7%D8%B2%DB%8C-%D8%BA%D8%B0%D8%A7" "Mozilla/5.0 (iPhone; CPU iPhone OS 10_3_2 like Mac OS X) AppleWebKit/603.2.4 (KHTML, like Gecko) Version/10.0 Mobile/14F89 Safari/602.1" "-"
34.247.132.53 - - [22/Jan/2019:03:58:21 +0330] "GET / HTTP/1.1" 200 30697 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36" "-"
66.249.66.194 - - [22/Jan/2019:03:58:21 +0330] "GET /m/filter/b105,b126,b135,b148,b168,b180,b183,b212,b213,b216,b219,b252,b35,b484,b597,b598,b605,b612,b613,b615,b655,b718,b719,b724,b8,b879,b880,b884,p1 HTTP/1.1" 200 23446 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"
5.209.200.218 - - [22/Jan/2019:03:58:22 +0330] "GET /settings/logo HTTP/1.1" 200 4120 "https://www.zanbil.ir/m/filter/b99%2Cp4510%2Cstexists%2Ct116" "Mozilla/5.0 (Linux; Android 5.1.1; SM-G361H Build/LMY48B) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.91 Mobile Safari/537.36" "-"
130.185.74.243 - - [22/Jan/2019:03:58:22 +0330] "GET /filter/stexists,p24656 HTTP/1.1" 200 28929 "-" "Mozilla/5.0 (Windows NT 6.1; rv:42.0) Gecko/20100101 Firefox/42.0" "-"
207.46.13.136 - - [22/Jan/2019:03:58:22 +0330] "GET /product/19210?model=40967 HTTP/1.1" 200 42352 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)" "-"
66.249.66.194 - - [22/Jan/2019:03:58:22 +0330] "GET /product/12705 HTTP/1.1" 404 33559 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"
207.46.13.104 - - [22/Jan/2019:03:58:23 +0330] "GET /filter/b152,b74 HTTP/1.1" 200 36098 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)" "-"
157.55.39.245 - - [22/Jan/2019:03:58:23 +0330] "GET /product/31991/%D9%85%D8%AE%D9%84%D9%88%D8%B7-%DA%A9%D9%86-%DA%A9%D9%86%D9%88%D9%88%D8%AF-%D9%85%D8%AF%D9%84-BLP402 HTTP/1.1" 200 42063 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)" "-"
130.185.74.243 - - [22/Jan/2019:03:58:23 +0330] "GET /filter/stexists,p24657 HTTP/1.1" 200 30628 "-" "Mozilla/5.0 (Windows NT 6.1; rv:42.0) Gecko/20100101 Firefox/42.0" "-"
66.249.66.91 - - [22/Jan/2019:03:58:23 +0330] "GET /filter/b481%2Cb874%2Cb32%2Cb67%2Cb36%2Cb226%2Cb41%2Cb136%2Cb570%2Cb598%2Cb180%2Cb615%2Cb168%2Cb648%2Cb103%2Cb80%2Cb213%2Cb597%2Cb724%2Cb613%2Cb135%2Cb877%2Cb183%2Cb497%2Cb435%2Cb194%2Cb861%2Cb256%2Cb854%2Cb198%2Cb647%2Cb679%2Cb88%2Cb441%2Cb6%2Cb221%2Cb645%2Cb219%2Cb50%2Cb151%2Cb192%2Cstexists%2Cb203?page=11 HTTP/1.1" 200 39549 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"
207.46.13.136 - - [22/Jan/2019:03:58:24 +0330] "GET /filter/b180,p34 HTTP/1.1" 200 31579 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)" "-"
207.46.13.136 - - [22/Jan/2019:03:58:24 +0330] "GET /filter/b111,b404,b614 HTTP/1.1" 200 35722 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)" "-"
207.46.13.104 - - [22/Jan/2019:03:58:23 +0330] "GET /filter/b152,b74 HTTP/1.1" 200 36098 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)" "-"

25. @aysunakarsu @searchdatalogy #brightonseo What are log files for?

26. @aysunakarsu @searchdatalogy #brightonseo Log files can tell us about ● Past: What has happened? Trend, seasonality... ● Present: Crawl-first SEO metrics. What shall we do now? ● Future: Predictions of SEO data.

27. @aysunakarsu @searchdatalogy #brightonseo Before

28. @aysunakarsu @searchdatalogy #brightonseo How many months? ● Minimum number of months of data ○ Past: 4+ months ○ Present: 1+ month ○ Future: 24+ months

29. @aysunakarsu @searchdatalogy #brightonseo Send, receive, check ● Send a questionnaire ● Send the format of the logs you are expecting ● Receive a sample of logs ● Check the sample

30. @aysunakarsu @searchdatalogy #brightonseo Collect ● Ask for the required number of months of log files ● Collect them

31. @aysunakarsu @searchdatalogy #brightonseo Verify ● Some issues to check ○ Are all the log files collected? ○ Which host? ○ Which protocol?
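
One simple way to check whether everything was collected is to look for calendar days with no entries at all between the first and last day in the logs. The sketch below is an assumption about how such a check could look; `missing_days` is a hypothetical helper, and a day with no entries may also just be a day with no traffic.

```python
from datetime import date, timedelta


def missing_days(days_seen):
    """Return the dates between the first and last day seen that have no log entries."""
    days = set(days_seen)
    if not days:
        return []
    current, last = min(days), max(days)
    gaps = []
    while current <= last:
        if current not in days:
            gaps.append(current)
        current += timedelta(days=1)
    return gaps


# Illustrative input: the days found in a collected set of log files.
seen = [date(2019, 1, 1), date(2019, 1, 2), date(2019, 1, 5)]
print(missing_days(seen))
```

Host and protocol usually cannot be read from a combined-format entry, so those two questions generally have to be answered by whoever supplies the logs.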

32. @aysunakarsu @searchdatalogy #brightonseo After

33. @aysunakarsu @searchdatalogy #brightonseo Filter ● Real Googlebot ● Search engine visits
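
A user-agent string is easy to fake, so filtering for real Googlebot visits usually means checking the IP address as well. Google documents a two-step check: a reverse DNS lookup on the IP should give a host under googlebot.com or google.com, and a forward lookup on that host should return the same IP. The sketch below follows that idea under some assumptions: the function names are made up for this illustration, it handles IPv4 only, and the lookups are passed in as parameters so that the logic can be tried without network access.

```python
import socket

# Host suffixes accepted as Google's, per the check described above.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def _reverse(ip):
    """Reverse DNS: IP address to host name (needs network access)."""
    return socket.gethostbyaddr(ip)[0]


def _forward(host):
    """Forward DNS: host name to list of IPv4 addresses (needs network access)."""
    return socket.gethostbyname_ex(host)[2]


def is_real_googlebot(ip, reverse_lookup=_reverse, forward_lookup=_forward):
    """Return True if ip reverse-resolves to a Google host that resolves back to ip."""
    try:
        host = reverse_lookup(ip)
        if not host.endswith(GOOGLE_SUFFIXES):
            return False
        return ip in forward_lookup(host)
    except OSError:
        # Covers failed lookups (socket.herror and socket.gaierror are OSError subclasses).
        return False
```

In practice the result would be cached per IP address, since the same crawler addresses appear many times in a log file.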

34. @aysunakarsu @searchdatalogy #brightonseo Parse ● Command line: https://www.searchdatalogy.com/blog/is-my-site-in-googles-mobile-first-index/ ● Python: https://github.com/rory/apache-log-parser

35. @aysunakarsu @searchdatalogy #brightonseo Parse: Example
66.249.76.126 - - [13/Sep/2019:06:35:45 -0400] "GET /brightonseo/confs?year=2019 HTTP/1.1" 200 9381 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
{'remote_ip': '66.249.76.126',
 'request_first_line': 'GET /brightonseo/confs?year=2019 HTTP/1.1',
 'request_header_referer': '-',
 'request_header_user_agent': 'Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)',
 'request_http_ver': '1.1',
 'request_method': 'GET',
 'request_url': '/brightonseo/confs?year=2019',
 'request_url_fragment': '',
 'request_url_hostname': None,
 'request_url_netloc': '',
 'request_url_password': None,
 'request_url_path': '/brightonseo/confs',
 'request_url_port': None,
 'request_url_query': 'year=2019',
 'request_url_query_dict': {'year': ['2019']},
 'request_url_query_list': [('year', '2019')],
 'request_url_query_simple_dict': {'year': '2019'},
 'request_url_scheme': '',
 'request_url_username': None,
 'response_bytes_clf': '9381',
 'status': '200',
 'time_received': '[13/Sep/2019:06:35:45 -0400]',
 'time_received_datetimeobj': datetime.datetime(2019, 9, 13, 6, 35, 45),
 'time_received_isoformat': '2019-09-13T06:35:45',
 'time_received_tz_datetimeobj': datetime.datetime(2019, 9, 13, 6, 35, 45, tzinfo='0400'),
 'time_received_tz_isoformat': '2019-09-13T06:35:45-04:00',
 'time_received_utc_datetimeobj': datetime.datetime(2019, 9, 13, 10, 35, 45, tzinfo='0000'),
 'time_received_utc_isoformat': '2019-09-13T10:35:45+00:00'}
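
Two of the derived fields in this example, the URL split into path and query and the timestamp converted to UTC, can be reproduced with the Python standard library alone. The sketch below is not how apache-log-parser does it internally; `split_request_url` and `to_utc_iso` are hypothetical helper names.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit


def split_request_url(url):
    """Split a requested URL into its path, raw query string and parsed query."""
    parts = urlsplit(url)
    return {
        "path": parts.path,
        "query": parts.query,
        "query_dict": parse_qs(parts.query),
    }


def to_utc_iso(time_received):
    """Convert a log timestamp such as '[13/Sep/2019:06:35:45 -0400]' to UTC ISO 8601."""
    local = datetime.strptime(time_received.strip("[]"), "%d/%b/%Y:%H:%M:%S %z")
    return local.astimezone(timezone.utc).isoformat()


print(split_request_url("/brightonseo/confs?year=2019"))
print(to_utc_iso("[13/Sep/2019:06:35:45 -0400]"))
```

Normalising timestamps to UTC matters when logs from servers in different time zones are combined.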

36. @aysunakarsu @searchdatalogy #brightonseo Store structured data
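
The talk does not say which store is used. As one possible sketch, parsed entries can go into a SQLite table with Python's built-in `sqlite3` module; the table name `hits`, its columns and the helper `store_entries` are assumptions made for this illustration, and the two rows are taken from the sample log file shown earlier.

```python
import sqlite3


def store_entries(connection, entries):
    """Create the hits table if needed and insert one row per parsed log entry."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS hits ("
        "ip TEXT, time TEXT, method TEXT, url TEXT, "
        "status INTEGER, size INTEGER, referrer TEXT, user_agent TEXT)"
    )
    connection.executemany(
        "INSERT INTO hits VALUES "
        "(:ip, :time, :method, :url, :status, :size, :referrer, :user_agent)",
        entries,
    )
    connection.commit()


connection = sqlite3.connect(":memory:")  # in-memory database for the example
store_entries(connection, [
    {"ip": "66.249.66.194", "time": "22/Jan/2019:03:58:22 +0330", "method": "GET",
     "url": "/product/12705", "status": 404, "size": 33559, "referrer": "-",
     "user_agent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"},
    {"ip": "207.46.13.104", "time": "22/Jan/2019:03:58:23 +0330", "method": "GET",
     "url": "/filter/b152,b74", "status": 200, "size": 36098, "referrer": "-",
     "user_agent": "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"},
])

# Count hits per status code.
rows = connection.execute(
    "SELECT status, COUNT(*) FROM hits GROUP BY status ORDER BY status"
).fetchall()
print(rows)
```

Once the data is structured like this, questions such as "which URLs returned 404 to Googlebot" become a single query.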

37. @aysunakarsu @searchdatalogy #brightonseo Visualize data https://www.searchdatalogy.com/blog/seo-data-analysis/

38. @aysunakarsu @searchdatalogy #brightonseo Past: Example In 2012 I carried out an SEO audit

39. @aysunakarsu @searchdatalogy #brightonseo Past: Example After analysing 8 months of web server logs, I asked the client whether ...

40. @aysunakarsu @searchdatalogy #brightonseo Past: Example The answer was ...

41. @aysunakarsu @searchdatalogy #brightonseo Past: Trend, seasonality https://www.searchdatalogy.com/blog/seo-data-analysis/
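
Trend and seasonality only become visible once hits are aggregated over time. As a rough sketch of that first step, the function below counts log entries per calendar month from their timestamps; `hits_per_month` is a hypothetical helper, months are taken in the server's local time, and the timestamps are the ones that appear in the examples in this deck.

```python
from collections import Counter
from datetime import datetime


def hits_per_month(timestamps):
    """Count entries per 'YYYY-MM' month, given combined-log-format timestamps."""
    counts = Counter(
        datetime.strptime(ts, "%d/%b/%Y:%H:%M:%S %z").strftime("%Y-%m")
        for ts in timestamps
    )
    return dict(sorted(counts.items()))


monthly = hits_per_month([
    "22/Jan/2019:03:58:20 +0330",
    "22/Jan/2019:03:58:21 +0330",
    "02/Aug/2019:01:29:06 -0400",
    "13/Sep/2019:06:35:45 -0400",
])
print(monthly)
```

A series like this, over enough months, is the input for the trend and seasonality analysis described in the linked article.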

42. @aysunakarsu @searchdatalogy #brightonseo Past: Crawled & active pages distribution https://www.searchdatalogy.com/blog/seo-data-distribution-analysis/

43. @aysunakarsu @searchdatalogy #brightonseo Present: Crawl-first SEO metrics 1. Crawl ratio 2. Active pages ratio 3. Crawl frequency
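
The slide names the three metrics without defining them, so the sketch below rests on assumed definitions: crawl ratio as the share of the site's pages requested by the bot at least once, active pages ratio as the share of the site's pages that received at least one organic search visit, and crawl frequency as the average number of bot hits per day. The function name and its parameters are hypothetical, and the author's own definitions may differ.

```python
def crawl_first_metrics(site_pages, crawled_hits, active_pages, days):
    """Compute three crawl-first metrics under the assumed definitions above.

    site_pages:   URLs found by crawling the site
    crawled_hits: URLs requested by the bot, one item per hit (repeats allowed)
    active_pages: URLs with at least one organic search visit
    days:         length of the log period in days
    """
    site = set(site_pages)
    crawled = set(crawled_hits) & site
    active = set(active_pages) & site
    return {
        "crawl_ratio": len(crawled) / len(site) if site else 0.0,
        "active_pages_ratio": len(active) / len(site) if site else 0.0,
        "crawl_frequency": len(crawled_hits) / days if days else 0.0,
    }


# Made-up URLs, for illustration only.
metrics = crawl_first_metrics(
    site_pages=["/", "/a", "/b", "/c"],
    crawled_hits=["/", "/a", "/a", "/x"],
    active_pages=["/a"],
    days=2,
)
print(metrics)
```

Computing the same metrics per page type, rather than for the whole site, shows which sections are crawled heavily but bring few visits.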

44. @aysunakarsu @searchdatalogy #brightonseo Present: Recommendations Block some types of pages from being crawled because of their very low crawl and active pages ratios
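
The slide does not say how the pages are closed to crawling. One common mechanism is a robots.txt rule, and Python's `urllib.robotparser` can check which URLs a rule would block before it goes live. In the sketch below the `Disallow: /filter/` rule and the host www.example.com are purely illustrative, as is the helper name `is_crawlable`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that closes one page type to all crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /filter/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(path, user_agent="Googlebot"):
    """Return True if the rules above allow user_agent to fetch the given path."""
    return parser.can_fetch(user_agent, "https://www.example.com" + path)


print(is_crawlable("/filter/b152,b74"))
print(is_crawlable("/product/12705"))
```

Running the URLs from the log files through such a check gives an estimate of how many bot hits a rule would remove.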

45. @aysunakarsu @searchdatalogy #brightonseo Present: Recommendations Make structural changes to increase the crawl and active pages ratios of some types of pages

46. @aysunakarsu @searchdatalogy #brightonseo Future: Predictions https://www.searchdatalogy.com/blog/forecasting-seo-data/

47. @aysunakarsu @searchdatalogy #brightonseo Thank you. Photos on Unsplash by Christopher Burns, Cristian Escobar, Darius Bashar, Gabriel Santiago, Gian d JZxairpkhho, Krissia Cruz, Ludomil, Marcus Wallis, Mark Kamalov, Maxime Valcarce, Rene Bohmer