Posts by Mad_Max

1) Message boards : Questions and problems : False positive network errors detection due to goggle CAPTCHA on reference site (Message 96224)
Posted 29 Feb 2020 by Mad_Max
Post:
Catch it again on another computer with other (never and latest stable release - 7.14.2 ) BOINC client version.
Much more detail in debug in this BOINC ver but exactly same behavior and result - any responses from google other than "200 - OK" is interpreted for some reason as the lack of an Internet connection. Although the simple fact of receiving at least some kind of response (including error codes from the server) means the exactly opposite - that the Internet connection is in order and these are just some errors on the server’s side (in this case, Google banned my ip again due to many BOINC clients often send requests to it and trigger google anti-bot protection).

01/03/2020 00:48:06 | Rosetta@home | [http] [ID#10522] Info: Operation too slow. Less than 10 bytes/sec transferred the last 300 seconds
01/03/2020 00:48:06 | Rosetta@home | [http] [ID#10522] Info: Closing connection 3014
01/03/2020 00:48:06 | Rosetta@home | [http] HTTP error: Timeout was reached
01/03/2020 00:48:06 | Rosetta@home | Temporarily failed download of PKY1232uM_gly_00722_127_2_SSC_matched_9_FR_C_R_B_0001_notail.zip: transient HTTP error
01/03/2020 00:48:06 | Rosetta@home | Backing off 03:56:16 on download of PKY1232uM_gly_00722_127_2_SSC_matched_9_FR_C_R_B_0001_notail.zip
01/03/2020 00:48:09 | | Project communication failed: attempting access to reference site
01/03/2020 00:48:09 | | [http] HTTP_OP::init_get(): https://www.google.com/
01/03/2020 00:48:09 | | [http] HTTP_OP::libcurl_exec(): ca-bundle set
01/03/2020 00:48:10 | | [http] [ID#0] Info: Connection 3015 seems to be dead!
01/03/2020 00:48:10 | | [http] [ID#0] Info: Closing connection 3015
01/03/2020 00:48:10 | | [http] [ID#0] Info: Connection 3016 seems to be dead!
01/03/2020 00:48:10 | | [http] [ID#0] Info: Closing connection 3016
01/03/2020 00:48:10 | | [http] [ID#0] Info: Trying 74.125.205.147...
01/03/2020 00:48:10 | | [http] [ID#0] Info: Connected to www.google.com (74.125.205.147) port 443 (#3017)
01/03/2020 00:48:10 | | [http] [ID#0] Info: ALPN, offering http/1.1
01/03/2020 00:48:10 | | [http] [ID#0] Info: Cipher selection: ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:@STRENGTH
01/03/2020 00:48:10 | | [http] [ID#0] Info: successfully set certificate verify locations:
01/03/2020 00:48:10 | | [http] [ID#0] Info: CAfile: D:\Boinc\ca-bundle.crt
01/03/2020 00:48:10 | | [http] [ID#0] Info: CApath: none
01/03/2020 00:48:10 | | [http] [ID#0] Info: TLSv1.2 (OUT), TLS header, Certificate Status (22):
01/03/2020 00:48:10 | | [http] [ID#0] Info: TLSv1.2 (OUT), TLS handshake, Client hello (1):
01/03/2020 00:48:10 | | [http] [ID#0] Info: TLSv1.2 (IN), TLS handshake, Server hello (2):
01/03/2020 00:48:10 | | [http] [ID#0] Info: TLSv1.2 (IN), TLS handshake, Certificate (11):
01/03/2020 00:48:10 | | [http] [ID#0] Info: TLSv1.2 (IN), TLS handshake, Server key exchange (12):
01/03/2020 00:48:10 | | [http] [ID#0] Info: TLSv1.2 (IN), TLS handshake, Server finished (14):
01/03/2020 00:48:10 | | [http] [ID#0] Info: TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
01/03/2020 00:48:10 | | [http] [ID#0] Info: TLSv1.2 (OUT), TLS change cipher, Client hello (1):
01/03/2020 00:48:10 | | [http] [ID#0] Info: TLSv1.2 (OUT), TLS handshake, Finished (20):
01/03/2020 00:48:10 | | [http] [ID#0] Info: TLSv1.2 (IN), TLS change cipher, Client hello (1):
01/03/2020 00:48:10 | | [http] [ID#0] Info: TLSv1.2 (IN), TLS handshake, Finished (20):
01/03/2020 00:48:10 | | [http] [ID#0] Info: SSL connection using TLSv1.2 / ECDHE-ECDSA-AES128-GCM-SHA256
01/03/2020 00:48:10 | | [http] [ID#0] Info: ALPN, server accepted to use http/1.1
01/03/2020 00:48:10 | | [http] [ID#0] Info: Server certificate:
01/03/2020 00:48:10 | | [http] [ID#0] Info: subject: C=US; ST=California; L=Mountain View; O=Google LLC; CN=www.google.com
01/03/2020 00:48:10 | | [http] [ID#0] Info: start date: Feb 12 11:47:41 2020 GMT
01/03/2020 00:48:10 | | [http] [ID#0] Info: expire date: May 6 11:47:41 2020 GMT
01/03/2020 00:48:10 | | [http] [ID#0] Info: subjectAltName: www.google.com matched
01/03/2020 00:48:10 | | [http] [ID#0] Info: issuer: C=US; O=Google Trust Services; CN=GTS CA 1O1
01/03/2020 00:48:10 | | [http] [ID#0] Info: SSL certificate verify ok.
01/03/2020 00:48:10 | | [http] [ID#0] Sent header to server: GET / HTTP/1.1
01/03/2020 00:48:10 | | [http] [ID#0] Sent header to server: Host: www.google.com
01/03/2020 00:48:10 | | [http] [ID#0] Sent header to server: User-Agent: BOINC client (windows_x86_64 7.14.2)
01/03/2020 00:48:10 | | [http] [ID#0] Sent header to server: Accept: */*
01/03/2020 00:48:10 | | [http] [ID#0] Sent header to server: Accept-Encoding: deflate, gzip
01/03/2020 00:48:10 | | [http] [ID#0] Sent header to server: Content-Type: application/x-www-form-urlencoded
01/03/2020 00:48:10 | | [http] [ID#0] Sent header to server: Accept-Language: en_GB
01/03/2020 00:48:10 | | [http] [ID#0] Sent header to server:
01/03/2020 00:48:11 | | [http] [ID#0] Received header from server: HTTP/1.1 302 Found
01/03/2020 00:48:11 | | [http] [ID#0] Received header from server: Location: https://www.google.com/sorry/index?continue=https://www.google.com/&q=EgRZs32xGJi46_IFIhkA8aeDS0dnNqSxPMmj06j7bUxA4n2V4HsWMgFy
01/03/2020 00:48:11 | | [http] [ID#0] Received header from server: Date: Sat, 29 Feb 2020 21:48:08 GMT
01/03/2020 00:48:11 | | [http] [ID#0] Received header from server: Pragma: no-cache
01/03/2020 00:48:11 | | [http] [ID#0] Received header from server: Expires: Fri, 01 Jan 1990 00:00:00 GMT
01/03/2020 00:48:11 | | [http] [ID#0] Received header from server: Cache-Control: no-store, no-cache, must-revalidate
01/03/2020 00:48:11 | | [http] [ID#0] Received header from server: Content-Type: text/html; charset=UTF-8
01/03/2020 00:48:11 | | [http] [ID#0] Received header from server: Server: HTTP server (unknown)
01/03/2020 00:48:11 | | [http] [ID#0] Received header from server: Content-Length: 327
01/03/2020 00:48:11 | | [http] [ID#0] Received header from server: X-XSS-Protection: 0
01/03/2020 00:48:11 | | [http] [ID#0] Received header from server: Alt-Svc: quic=":443"; ma=2592000; v="46,43",h3-Q050=":443"; ma=2592000,h3-Q049=":443"; ma=2592000,h3-Q048=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000
01/03/2020 00:48:11 | | [http] [ID#0] Received header from server:
01/03/2020 00:48:11 | | [http] [ID#0] Received header from server: <HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
01/03/2020 00:48:11 | | [http] [ID#0] Received header from server: <TITLE>302 Moved</TITLE></HEAD><BODY>
01/03/2020 00:48:11 | | [http] [ID#0] Received header from server: <H1>302 Moved</H1>
01/03/2020 00:48:11 | | [http] [ID#0] Received header from server: The document has moved
01/03/2020 00:48:11 | | [http] [ID#0] Received header from server: <A HREF="https://www.google.com/sorry/index?continue=https://www.google.com/&amp;q=EgRZs32xGJi46_IFIhkA8aeDS0dnNqSxPMmj06j7bUxA4n2V4HsWMgFy">here</A>.
01/03/2020 00:48:11 | | [http] [ID#0] Received header from server: </BODY></HTML>
01/03/2020 00:48:11 | | [http] [ID#0] Info: Ignoring the response-body
01/03/2020 00:48:11 | | [http] [ID#0] Info: Connection #3017 to host www.google.com left intact
01/03/2020 00:48:11 | | [http] [ID#0] Info: Issue another request to this URL: 'https://www.google.com/sorry/index?continue=https://www.google.com/&q=EgRZs32xGJi46_IFIhkA8aeDS0dnNqSxPMmj06j7bUxA4n2V4HsWMgFy'
01/03/2020 00:48:11 | | [http] [ID#0] Info: Found bundle for host www.google.com: 0x59e28a0 [can pipeline]
01/03/2020 00:48:11 | | [http] [ID#0] Info: Re-using existing connection! (#3017) with host www.google.com
01/03/2020 00:48:11 | | [http] [ID#0] Info: Connected to www.google.com (74.125.205.147) port 443 (#3017)
01/03/2020 00:48:11 | | [http] [ID#0] Sent header to server: GET /sorry/index?continue=https://www.google.com/&q=EgRZs32xGJi46_IFIhkA8aeDS0dnNqSxPMmj06j7bUxA4n2V4HsWMgFy HTTP/1.1
01/03/2020 00:48:11 | | [http] [ID#0] Sent header to server: Host: www.google.com
01/03/2020 00:48:11 | | [http] [ID#0] Sent header to server: User-Agent: BOINC client (windows_x86_64 7.14.2)
01/03/2020 00:48:11 | | [http] [ID#0] Sent header to server: Accept: */*
01/03/2020 00:48:11 | | [http] [ID#0] Sent header to server: Accept-Encoding: deflate, gzip
01/03/2020 00:48:11 | | [http] [ID#0] Sent header to server: Referer: https://www.google.com/
01/03/2020 00:48:11 | | [http] [ID#0] Sent header to server: Content-Type: application/x-www-form-urlencoded
01/03/2020 00:48:11 | | [http] [ID#0] Sent header to server: Accept-Language: en_GB
01/03/2020 00:48:11 | | [http] [ID#0] Sent header to server:
01/03/2020 00:48:11 | | [http] [ID#0] Received header from server: HTTP/1.1 429 Too Many Requests
01/03/2020 00:48:11 | | [http] [ID#0] Received header from server: Date: Sat, 29 Feb 2020 21:48:08 GMT
01/03/2020 00:48:11 | | [http] [ID#0] Received header from server: Pragma: no-cache
01/03/2020 00:48:11 | | [http] [ID#0] Received header from server: Expires: Fri, 01 Jan 1990 00:00:00 GMT
01/03/2020 00:48:11 | | [http] [ID#0] Received header from server: Cache-Control: no-store, no-cache, must-revalidate
01/03/2020 00:48:11 | | [http] [ID#0] Received header from server: Content-Type: text/html
01/03/2020 00:48:11 | | [http] [ID#0] Received header from server: Server: HTTP server (unknown)
01/03/2020 00:48:11 | | [http] [ID#0] Received header from server: Content-Length: 2801
01/03/2020 00:48:11 | | [http] [ID#0] Received header from server: X-XSS-Protection: 0
01/03/2020 00:48:11 | | [http] [ID#0] Received header from server: Alt-Svc: quic=":443"; ma=2592000; v="46,43",h3-Q050=":443"; ma=2592000,h3-Q049=":443"; ma=2592000,h3-Q048=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000
01/03/2020 00:48:11 | | [http] [ID#0] Received header from server:
01/03/2020 00:48:11 | | [http] [ID#0] Received header from server: <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
01/03/2020 00:48:11 | | [http] [ID#0] Received header from server: <html>
01/03/2020 00:48:11 | | [http] [ID#0] Received header from server: <head><meta http-equiv="content-type" content="text/html; charset=utf-8"><meta name="viewport" content="initial-scale=1"><title>https://www.google.com/</title></head>
01/03/2020 00:48:11 | | [http] [ID#0] Received header from server: <body style="font-family: arial, sans-serif; background-color: #fff; color: #000; padding:20px; font-size:18px;" onload="e=document.getElementById('captcha');if(e){e.focus();}">
01/03/2020 00:48:11 | | [http] [ID#0] Received header from server: <div style="max-width:400px;">
01/03/2020 00:48:11 | | [http] [ID#0] Received header from server: <hr noshade size="1" style="color:#ccc; background-color:#ccc;"><br>
01/03/2020 00:48:11 | | [http] [ID#0] Received header from server: <form id="captcha-form" action="index" method="post">
01/03/2020 00:48:11 | | [http] [ID#0] Received header from server: <script src="https://www.google.com/recaptcha/api.js" async defer></script>
01/03/2020 00:48:11 | | [http] [ID#0] Received header from server: <script>var submitCallback = function(response) {document.getElementById('captcha-form').submit();};</script>
01/03/2020 00:48:11 | | [http] [ID#0] Received header from server: <div id="recaptcha" class="g-recaptcha" data-sitekey="6LfwuyUTAAAAAOAmoS0fdqijC2PbbdH4kjq62Y1b" data-callback="submitCallback" data-s="8ckgsqksZQfD5k9zBVdIUa5lo_spGlIHsnvk
01/03/2020 00:48:11 | | [http] [ID#0] Info: Connection #3017 to host www.google.com left intact
01/03/2020 00:48:11 | | BOINC can't access Internet - check network connection or proxy configuration.
2) Message boards : Questions and problems : False positive network errors detection due to goggle CAPTCHA on reference site (Message 96221)
Posted 29 Feb 2020 by Mad_Max
Post:
Yes, definitely something wrong at Rosetta (their servers can not send some of files properly) AND something wrong with BOINC client at same time. Because after communication error with one of attached projects (R@H in this example but it can be any other) it try to check Internet connection by "pinging" google but handle goggle responses wrong and declare internet connection is down while it actually works fine (and issues only with just one file from one of the projects, not with internet connection):
29/02/2020 21:37:51 | | BOINC can't access Internet - check network connection or proxy configuration.
3) Message boards : Questions and problems : Stalled downloads (Message 96220)
Posted 29 Feb 2020 by Mad_Max
Post:
Yes, it is a two separate issues:
First: stalled downloads of some files from R@H. Its not related to BOINC client (i also cant download such files in browser or any download manager). Although this may be due to errors in the BOINC server code running on R@H server. Or some problems with R@H servers itself. I and few other volunteers have already reported it on R@H. Its theirs responsibility to check and fix (or report to BOINC devs if they think servers OK)

Second: BOINC client handle such "stuck download" situation very badly - it stucks in indefinite loop trying (and failing each time) to download same file every ~3 hours and completely stop requesting new work from a project where single stalled download is present until the user manual intervention. Even if work queue gone empty it still refuse ask new work from a project if it see any stalled download.
If there was a "backup" project set by user - BOINC switch to backup project completely. If there is no "backup" project (or it have some issues too) client just idle.
4) Message boards : Questions and problems : Stalled downloads (Message 96216)
Posted 29 Feb 2020 by Mad_Max
Post:
Yep. it continue to repeat from time to time. Just became more rare compared to beginning of Feb.
There is a fresh example with stuck R@H download.
It still does not get any new work while even single file download is stalling.
And still no any logs entries even with http_debug when i abort such stale downoads manually.
And still get new work after stale download already aborted.
29/02/2020 21:32:46 | Rosetta@home | [http] HTTP_OP::init_get(): http://boinc.bakerlab.org/rosetta/download/fc/rb_02_24_16848_16671_ab_t000__h002_robetta.zip
29/02/2020 21:32:46 | Rosetta@home | [http] HTTP_OP::libcurl_exec(): ca-bundle 'D:\Boinc\ca-bundle.crt'
29/02/2020 21:32:46 | Rosetta@home | [http] HTTP_OP::libcurl_exec(): ca-bundle set
29/02/2020 21:32:46 | Rosetta@home | Started download of rb_02_24_16848_16671_ab_t000__h002_robetta.zip
29/02/2020 21:32:46 | Rosetta@home | [http] [ID#10370] Info:  Connection 4149 seems to be dead!
29/02/2020 21:32:46 | Rosetta@home | [http] [ID#10370] Info:  Closing connection 4149
29/02/2020 21:32:46 | Rosetta@home | [http] [ID#10370] Info:  Connection 4150 seems to be dead!
29/02/2020 21:32:46 | Rosetta@home | [http] [ID#10370] Info:  Closing connection 4150
29/02/2020 21:32:47 | Rosetta@home | [http] [ID#10370] Info:    Trying 128.95.160.157...
29/02/2020 21:32:47 | Rosetta@home | [http] [ID#10370] Info:  Connected to boinc.bakerlab.org (128.95.160.157) port 80 (#4151)
29/02/2020 21:32:47 | Rosetta@home | [http] [ID#10370] Sent header to server: GET /rosetta/download/fc/rb_02_24_16848_16671_ab_t000__h002_robetta.zip HTTP/1.1
29/02/2020 21:32:47 | Rosetta@home | [http] [ID#10370] Sent header to server: Host: boinc.bakerlab.org
29/02/2020 21:32:47 | Rosetta@home | [http] [ID#10370] Sent header to server: User-Agent: BOINC client (windows_x86_64 7.6.22)
29/02/2020 21:32:47 | Rosetta@home | [http] [ID#10370] Sent header to server: Accept: */*
29/02/2020 21:32:47 | Rosetta@home | [http] [ID#10370] Sent header to server: Accept-Encoding: deflate, gzip
29/02/2020 21:32:47 | Rosetta@home | [http] [ID#10370] Sent header to server: Content-Type: application/x-www-form-urlencoded
29/02/2020 21:32:47 | Rosetta@home | [http] [ID#10370] Sent header to server: Accept-Language: en_GB
29/02/2020 21:32:47 | Rosetta@home | [http] [ID#10370] Sent header to server:
29/02/2020 21:32:48 | Rosetta@home | [http] [ID#10370] Received header from server: HTTP/1.1 200 OK
29/02/2020 21:37:48 | Rosetta@home | [http] [ID#10370] Info:  Operation too slow. Less than 10 bytes/sec transferred the last 300 seconds
29/02/2020 21:37:48 | Rosetta@home | [http] [ID#10370] Info:  Closing connection 4151
29/02/2020 21:37:48 | Rosetta@home | [http] HTTP error: Timeout was reached
29/02/2020 21:37:48 | Rosetta@home | Temporarily failed download of rb_02_24_16848_16671_ab_t000__h002_robetta.zip: transient HTTP error
29/02/2020 21:37:48 | Rosetta@home | Backing off 03:06:40 on download of rb_02_24_16848_16671_ab_t000__h002_robetta.zip
29/02/2020 21:37:49 |  | Project communication failed: attempting access to reference site
.......................
I aborted stuck download of rb_02_24_16848_16671_ab_t000__h002_robetta.zip and press "update project"
29/02/2020 22:17:16 | Rosetta@home | update requested by user
29/02/2020 22:17:16 |  | [http] HTTP_OP::init_get(): http://boinc.bakerlab.org/rosetta/notices.php?userid=xxxx
29/02/2020 22:17:16 |  | [http] HTTP_OP::libcurl_exec(): ca-bundle set
29/02/2020 22:17:16 |  | [http] [ID#0] Info:  Connection 4172 seems to be dead!
29/02/2020 22:17:16 |  | [http] [ID#0] Info:  Closing connection 4172
29/02/2020 22:17:16 |  | [http] [ID#0] Info:  Connection 4173 seems to be dead!
29/02/2020 22:17:16 |  | [http] [ID#0] Info:  Closing connection 4173
29/02/2020 22:17:16 |  | [http] [ID#0] Info:  Connection 4174 seems to be dead!
29/02/2020 22:17:16 |  | [http] [ID#0] Info:  Closing connection 4174
29/02/2020 22:17:17 |  | [http] [ID#0] Info:    Trying 128.95.160.156...
29/02/2020 22:17:17 |  | [http] [ID#0] Info:  Connected to boinc.bakerlab.org (128.95.160.156) port 80 (#4175)
29/02/2020 22:17:17 |  | [http] [ID#0] Sent header to server: GET /rosetta/notices.php?userid=xxxxx
29/02/2020 22:17:17 |  | [http] [ID#0] Sent header to server: Host: boinc.bakerlab.org
29/02/2020 22:17:17 |  | [http] [ID#0] Sent header to server: User-Agent: BOINC client (windows_x86_64 7.6.22)
29/02/2020 22:17:17 |  | [http] [ID#0] Sent header to server: Accept: */*
29/02/2020 22:17:17 |  | [http] [ID#0] Sent header to server: Accept-Encoding: deflate, gzip
29/02/2020 22:17:17 |  | [http] [ID#0] Sent header to server: Content-Type: application/x-www-form-urlencoded
29/02/2020 22:17:17 |  | [http] [ID#0] Sent header to server: Accept-Language: en_GB
29/02/2020 22:17:17 |  | [http] [ID#0] Sent header to server:
29/02/2020 22:17:17 |  | [http] [ID#0] Received header from server: HTTP/1.1 200 OK
29/02/2020 22:17:17 |  | [http] [ID#0] Received header from server: Date: Sat, 29 Feb 2020 19:17:08 GMT
29/02/2020 22:17:17 |  | [http] [ID#0] Received header from server: Server: Apache/2.4.18
29/02/2020 22:17:17 |  | [http] [ID#0] Received header from server: Expires: Sat, 29 Feb 2020 19:17:08 GMT
29/02/2020 22:17:17 |  | [http] [ID#0] Received header from server: Last-Modified: Sat, 29 Feb 2020 19:17:08 GMT
29/02/2020 22:17:17 |  | [http] [ID#0] Received header from server: Vary: Accept-Encoding
29/02/2020 22:17:17 |  | [http] [ID#0] Received header from server: Content-Length: 1968
29/02/2020 22:17:17 |  | [http] [ID#0] Received header from server: Content-Type: application/xml
29/02/2020 22:17:17 |  | [http] [ID#0] Received header from server:
29/02/2020 22:17:17 |  | [http] [ID#0] Info:  Connection #4175 to host boinc.bakerlab.org left intact
29/02/2020 22:17:20 | Rosetta@home | Sending scheduler request: Requested by user.
29/02/2020 22:17:20 | Rosetta@home | Not requesting tasks: some download is stalled
29/02/2020 22:17:20 | Rosetta@home | [http] HTTP_OP::init_post(): http://srv4.bakerlab.org/rosetta_cgi/cgi
29/02/2020 22:17:20 | Rosetta@home | [http] HTTP_OP::libcurl_exec(): ca-bundle set
29/02/2020 22:17:21 | Rosetta@home | [http] [ID#1] Info:    Trying 128.95.160.156...
29/02/2020 22:17:21 | Rosetta@home | [http] [ID#1] Info:  Connected to srv4.bakerlab.org (128.95.160.156) port 80 (#4176)
29/02/2020 22:17:21 | Rosetta@home | [http] [ID#1] Sent header to server: POST /rosetta_cgi/cgi HTTP/1.1
29/02/2020 22:17:21 | Rosetta@home | [http] [ID#1] Sent header to server: Host: srv4.bakerlab.org
29/02/2020 22:17:21 | Rosetta@home | [http] [ID#1] Sent header to server: User-Agent: BOINC client (windows_x86_64 7.6.22)
29/02/2020 22:17:21 | Rosetta@home | [http] [ID#1] Sent header to server: Accept: */*
29/02/2020 22:17:21 | Rosetta@home | [http] [ID#1] Sent header to server: Accept-Encoding: deflate, gzip
29/02/2020 22:17:21 | Rosetta@home | [http] [ID#1] Sent header to server: Content-Type: application/x-www-form-urlencoded
29/02/2020 22:17:21 | Rosetta@home | [http] [ID#1] Sent header to server: Accept-Language: en_GB
29/02/2020 22:17:21 | Rosetta@home | [http] [ID#1] Sent header to server: Content-Length: 15400
29/02/2020 22:17:21 | Rosetta@home | [http] [ID#1] Sent header to server: Expect: 100-continue
29/02/2020 22:17:21 | Rosetta@home | [http] [ID#1] Sent header to server:
29/02/2020 22:17:22 | Rosetta@home | [http] [ID#1] Received header from server: HTTP/1.1 100 Continue
29/02/2020 22:17:22 | Rosetta@home | [http] [ID#1] Info:  We are completely uploaded and fine
29/02/2020 22:17:22 | Rosetta@home | [http] [ID#1] Received header from server: HTTP/1.1 200 OK
29/02/2020 22:17:22 | Rosetta@home | [http] [ID#1] Received header from server: Date: Sat, 29 Feb 2020 19:17:12 GMT
29/02/2020 22:17:22 | Rosetta@home | [http] [ID#1] Received header from server: Server: Apache/2.4.18
29/02/2020 22:17:22 | Rosetta@home | [http] [ID#1] Received header from server: Vary: Accept-Encoding
29/02/2020 22:17:22 | Rosetta@home | [http] [ID#1] Received header from server: Content-Length: 2389
29/02/2020 22:17:22 | Rosetta@home | [http] [ID#1] Received header from server: Content-Type: text/xml
29/02/2020 22:17:22 | Rosetta@home | [http] [ID#1] Received header from server:
29/02/2020 22:17:22 | Rosetta@home | [http] [ID#1] Info:  Connection #4176 to host srv4.bakerlab.org left intact
29/02/2020 22:17:22 | Rosetta@home | Scheduler request completed


Work fetch resume only after manual stalled download abort AND BOINC restart.
5) Message boards : Questions and problems : False positive network errors detection due to goggle CAPTCHA on reference site (Message 96211)
Posted 29 Feb 2020 by Mad_Max
Post:
I have caught it another time, now with http_debug on.
No BOINC does not just get headers, it try to retrieve page, failed it (due to google ban bot activity) and declare that no internet connection is available (while it just actually get connection to goggle successfully).
Because BOINC want only HTTP "200 - OK" server response as working internet connection, and interpret all other server responses (like 302, 403, 429 etc) as if there is not internet connection (lol, how it get these error codes from server then? )

29/02/2020 21:32:35 |  | Re-reading cc_config.xml
29/02/2020 21:32:35 |  | Config: use all coprocessors
29/02/2020 21:32:35 |  | log flags: file_xfer, sched_ops, task, http_debug
29/02/2020 21:32:35 | Rosetta@home | Found app_config.xml
29/02/2020 21:32:46 | Rosetta@home | [http] HTTP_OP::init_get(): http://boinc.bakerlab.org/rosetta/download/fc/rb_02_24_16848_16671_ab_t000__h002_robetta.zip
29/02/2020 21:32:46 | Rosetta@home | [http] HTTP_OP::libcurl_exec(): ca-bundle 'D:\Boinc\ca-bundle.crt'
29/02/2020 21:32:46 | Rosetta@home | [http] HTTP_OP::libcurl_exec(): ca-bundle set
29/02/2020 21:32:46 | Rosetta@home | Started download of rb_02_24_16848_16671_ab_t000__h002_robetta.zip
29/02/2020 21:32:46 | Rosetta@home | [http] [ID#10370] Info:  Connection 4149 seems to be dead!
29/02/2020 21:32:46 | Rosetta@home | [http] [ID#10370] Info:  Closing connection 4149
29/02/2020 21:32:46 | Rosetta@home | [http] [ID#10370] Info:  Connection 4150 seems to be dead!
29/02/2020 21:32:46 | Rosetta@home | [http] [ID#10370] Info:  Closing connection 4150
29/02/2020 21:32:47 | Rosetta@home | [http] [ID#10370] Info:    Trying 128.95.160.157...
29/02/2020 21:32:47 | Rosetta@home | [http] [ID#10370] Info:  Connected to boinc.bakerlab.org (128.95.160.157) port 80 (#4151)
29/02/2020 21:32:47 | Rosetta@home | [http] [ID#10370] Sent header to server: GET /rosetta/download/fc/rb_02_24_16848_16671_ab_t000__h002_robetta.zip HTTP/1.1
29/02/2020 21:32:47 | Rosetta@home | [http] [ID#10370] Sent header to server: Host: boinc.bakerlab.org
29/02/2020 21:32:47 | Rosetta@home | [http] [ID#10370] Sent header to server: User-Agent: BOINC client (windows_x86_64 7.6.22)
29/02/2020 21:32:47 | Rosetta@home | [http] [ID#10370] Sent header to server: Accept: */*
29/02/2020 21:32:47 | Rosetta@home | [http] [ID#10370] Sent header to server: Accept-Encoding: deflate, gzip
29/02/2020 21:32:47 | Rosetta@home | [http] [ID#10370] Sent header to server: Content-Type: application/x-www-form-urlencoded
29/02/2020 21:32:47 | Rosetta@home | [http] [ID#10370] Sent header to server: Accept-Language: en_GB
29/02/2020 21:32:47 | Rosetta@home | [http] [ID#10370] Sent header to server:
29/02/2020 21:32:48 | Rosetta@home | [http] [ID#10370] Received header from server: HTTP/1.1 200 OK
29/02/2020 21:37:48 | Rosetta@home | [http] [ID#10370] Info:  Operation too slow. Less than 10 bytes/sec transferred the last 300 seconds
29/02/2020 21:37:48 | Rosetta@home | [http] [ID#10370] Info:  Closing connection 4151
29/02/2020 21:37:48 | Rosetta@home | [http] HTTP error: Timeout was reached
29/02/2020 21:37:48 | Rosetta@home | Temporarily failed download of rb_02_24_16848_16671_ab_t000__h002_robetta.zip: transient HTTP error
29/02/2020 21:37:48 | Rosetta@home | Backing off 03:06:40 on download of rb_02_24_16848_16671_ab_t000__h002_robetta.zip
29/02/2020 21:37:49 |  | Project communication failed: attempting access to reference site
29/02/2020 21:37:49 |  | [http] HTTP_OP::init_get(): http://www.google.com/
29/02/2020 21:37:49 |  | [http] HTTP_OP::libcurl_exec(): ca-bundle set
29/02/2020 21:37:50 |  | [http] [ID#0] Info:    Trying 64.233.165.105...
29/02/2020 21:37:50 |  | [http] [ID#0] Info:  Connected to www.google.com (64.233.165.105) port 80 (#4154)
29/02/2020 21:37:50 |  | [http] [ID#0] Sent header to server: GET / HTTP/1.1
29/02/2020 21:37:50 |  | [http] [ID#0] Sent header to server: Host: www.google.com
29/02/2020 21:37:50 |  | [http] [ID#0] Sent header to server: User-Agent: BOINC client (windows_x86_64 7.6.22)
29/02/2020 21:37:50 |  | [http] [ID#0] Sent header to server: Accept: */*
29/02/2020 21:37:50 |  | [http] [ID#0] Sent header to server: Accept-Encoding: deflate, gzip
29/02/2020 21:37:50 |  | [http] [ID#0] Sent header to server: Content-Type: application/x-www-form-urlencoded
29/02/2020 21:37:50 |  | [http] [ID#0] Sent header to server: Accept-Language: en_GB
29/02/2020 21:37:50 |  | [http] [ID#0] Sent header to server:
29/02/2020 21:37:51 |  | [http] [ID#0] Received header from server: HTTP/1.1 302 Found
29/02/2020 21:37:51 |  | [http] [ID#0] Received header from server: Location: http://www.google.com/sorry/index?continue=http://www.google.com/&q=EgRZs32xGPXe6vIFIhkA8aeDS8WqECNzKWsTIfjUf6t7VcsFxSoZMgFy
29/02/2020 21:37:51 |  | [http] [ID#0] Received header from server: Date: Sat, 29 Feb 2020 18:37:42 GMT
29/02/2020 21:37:51 |  | [http] [ID#0] Received header from server: Pragma: no-cache
29/02/2020 21:37:51 |  | [http] [ID#0] Received header from server: Expires: Fri, 01 Jan 1990 00:00:00 GMT
29/02/2020 21:37:51 |  | [http] [ID#0] Received header from server: Cache-Control: no-store, no-cache, must-revalidate
29/02/2020 21:37:51 |  | [http] [ID#0] Received header from server: Content-Type: text/html; charset=UTF-8
29/02/2020 21:37:51 |  | [http] [ID#0] Received header from server: Server: HTTP server (unknown)
29/02/2020 21:37:51 |  | [http] [ID#0] Received header from server: Content-Length: 325
29/02/2020 21:37:51 |  | [http] [ID#0] Received header from server: X-XSS-Protection: 0
29/02/2020 21:37:51 |  | [http] [ID#0] Received header from server:
29/02/2020 21:37:51 |  | [http] [ID#0] Info:  Ignoring the response-body
29/02/2020 21:37:51 |  | [http] [ID#0] Info:  Connection #4154 to host www.google.com left intact
29/02/2020 21:37:51 |  | [http] [ID#0] Info:  Issue another request to this URL: 'http://www.google.com/sorry/index?continue=http://www.google.com/&q=EgRZs32xGPXe6vIFIhkA8aeDS8WqECNzKWsTIfjUf6t7VcsFxSoZMgFy'
29/02/2020 21:37:51 |  | [http] [ID#0] Info:  Found bundle for host www.google.com: 0x3b7fc10
29/02/2020 21:37:51 |  | [http] [ID#0] Info:  Re-using existing connection! (#4154) with host www.google.com
29/02/2020 21:37:51 |  | [http] [ID#0] Info:  Connected to www.google.com (64.233.165.105) port 80 (#4154)
29/02/2020 21:37:51 |  | [http] [ID#0] Sent header to server: GET /sorry/index?continue=http://www.google.com/&q=EgRZs32xGPXe6vIFIhkA8aeDS8WqECNzKWsTIfjUf6t7VcsFxSoZMgFy HTTP/1.1
29/02/2020 21:37:51 |  | [http] [ID#0] Sent header to server: Host: www.google.com
29/02/2020 21:37:51 |  | [http] [ID#0] Sent header to server: User-Agent: BOINC client (windows_x86_64 7.6.22)
29/02/2020 21:37:51 |  | [http] [ID#0] Sent header to server: Accept: */*
29/02/2020 21:37:51 |  | [http] [ID#0] Sent header to server: Accept-Encoding: deflate, gzip
29/02/2020 21:37:51 |  | [http] [ID#0] Sent header to server: Referer: http://www.google.com/
29/02/2020 21:37:51 |  | [http] [ID#0] Sent header to server: Content-Type: application/x-www-form-urlencoded
29/02/2020 21:37:51 |  | [http] [ID#0] Sent header to server: Accept-Language: en_GB
29/02/2020 21:37:51 |  | [http] [ID#0] Sent header to server:
29/02/2020 21:37:51 |  | [http] [ID#0] Received header from server: HTTP/1.1 429 Too Many Requests
29/02/2020 21:37:51 |  | [http] [ID#0] Received header from server: Date: Sat, 29 Feb 2020 18:37:42 GMT
29/02/2020 21:37:51 |  | [http] [ID#0] Received header from server: Pragma: no-cache
29/02/2020 21:37:51 |  | [http] [ID#0] Received header from server: Expires: Fri, 01 Jan 1990 00:00:00 GMT
29/02/2020 21:37:51 |  | [http] [ID#0] Received header from server: Cache-Control: no-store, no-cache, must-revalidate
29/02/2020 21:37:51 |  | [http] [ID#0] Received header from server: Content-Type: text/html
29/02/2020 21:37:51 |  | [http] [ID#0] Received header from server: Server: HTTP server (unknown)
29/02/2020 21:37:51 |  | [http] [ID#0] Received header from server: Content-Length: 2798
29/02/2020 21:37:51 |  | [http] [ID#0] Received header from server: X-XSS-Protection: 0
29/02/2020 21:37:51 |  | [http] [ID#0] Received header from server:
29/02/2020 21:37:51 |  | [http] [ID#0] Info:  Connection #4154 to host www.google.com left intact
29/02/2020 21:37:51 |  | BOINC can't access Internet - check network connection or proxy configuration.

Also page it get from redirect URL ( http://www.google.com/sorry/index?continue=http://www.google.com/&q=EgRZs32xGPXe6vIFIhkA8aeDS8WqECNzKWsTIfjUf6t7VcsFxSoZMgFy) was saved to lookup_website.html file again.

Now i know why Google so often ask me solve CAPTCHA and block me for a "bot activity" on my personal computer - it because of few BOINC clients in the same home network with same public IP keep triggering it.
6) Message boards : Questions and problems : False positive network errors detection due to goggle CAPTCHA on reference site (Message 96014)
Posted 23 Feb 2020 by Mad_Max
Post:
If BOINC does not actually connect to Google, how lookup_website.html file gets it content?
If i delete it BOINC recreates this file after some time and it has fresh content inside each time - my IP and date/time of last access - all generated by Goggle and somehow downloaded and saved to disk by BOINC client. Here is contend of one o such files:
 <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head><meta http-equiv="content-type" content="text/html; charset=utf-8"><meta name="viewport" content="initial-scale=1"><title>http://www.google.com/</title></head>
<body style="font-family: arial, sans-serif; background-color: #fff; color: #000; padding:20px; font-size:18px;" onload="e=document.getElementById('captcha');if(e){e.focus();}">
<div style="max-width:400px;">
<hr noshade size="1" style="color:#ccc; background-color:#ccc;"><br>
<form id="captcha-form" action="index" method="post">
<script src="https://www.google.com/recaptcha/api.js" async defer></script>
<script>var submitCallback = function(response) {document.getElementById('captcha-form').submit();};</script>
<div id="recaptcha" class="g-recaptcha" data-sitekey="6LfwuyUTAAAAAOAmoS0fdqijC2PbbdH4kjq62Y1b" data-callback="submitCallback" data-s="DoSc4_miAI-qVjUtEoUlnteeT9Q9Bf59QLu8jDTpyQ8CT90ihe7HbLY1pJGX2CFt9RvIz7YDJhdJABGnQKclpySQ0SbcW9sf0JwrcQ8W6OxQKapvD6sk_caRd43IZIp6JjqtzfDYXy8ta1w90ATF47xsn2T1QVhA5cGOBQr8iWOdx_wmeqWkVw9Pz7ri0alB7jb0T4f_O_uDeRgSlKzGgla9mp3fQ6DsET5X1HV02urupwe_fooQtrA"></div>
<input type='hidden' name='q' value='EgRZs32xGK_Hu_IFIhkA8aeDS4AG37biC27dRj3UUyv57cYQe7YNMgFy'><input type="hidden" name="continue" value="http://www.google.com/">
</form>
<hr noshade size="1" style="color:#ccc; background-color:#ccc;">
 
<div style="font-size:13px;">
<b>About this page</b><br><br>
 
Our systems have detected unusual traffic from your computer network.  This page checks to see if it&#39;s really you sending the requests, and not a robot.  <a href="#" onclick="document.getElementById('infoDiv').style.display='block';">Why did this happen?</a><br><br>
 
<div id="infoDiv" style="display:none; background-color:#eee; padding:10px; margin:0 0 15px 0; line-height:1.4em;">
This page appears when Google automatically detects requests coming from your computer network which appear to be in violation of the <a href="//www.google.com/policies/terms/">Terms of Service</a>. The block will expire shortly after those requests stop.  In the meantime, solving the above CAPTCHA will let you continue to use our services.<br><br>This traffic may have been sent by malicious software, a browser plug-in, or a script that sends automated requests.  If you share your network connection, ask your administrator for help &mdash; a different computer using the same IP address may be responsible.  <a href="//support.google.com/websearch/answer/86640">Learn more</a><br><br>Sometimes you may be asked to solve the CAPTCHA if you are using advanced terms that robots are known to use, or sending requests very quickly.
</div>
 
IP address: 89.179.***.***<br>Time: 2020-02-20T19:53:20Z<br>URL: http://www.google.com/<br>
</div>
</div>
</body>
</html>

As for http_debug i turned it on only on one of my computers. And already turned it off as it caused BOINC manager to work unstable as described in other thread.
I searched if any reference site contacts were logged while http_debug was on but looks like last is already deleted by BOINC - stdoutdae.txt and stdoutdae.old contains only last ~30 hours and there were no reference site contacts during this time on that computer. Probable BOINC delete schedule of "old" logs is based not on time but on number of lines or file size. As on other machines where is no any debug flags set it > month worth of logs.

Is any way to trigger this check intentionally without waiting for next http error on one of the attached projects? [/code]
7) Message boards : Questions and problems : Stalled downloads (Message 96013)
Posted 23 Feb 2020 by Mad_Max
Post:
Good to know. Thanks.
Although i highly doubt i actually ever hit 1000 lines per second.
I have run only one debug option for this time (http_debug only).
Machine is quite active (16 thread CPU + 2 GPU each running 2 tasks so 20 tasks is running in parallel) but i skim trough stdoutdae.txt and looks like 200-300 lines per second was max logging speed.

May be same can happen if something delay polling between the client and the manager?
So not > 1000 lines per second but > 1000 lines since last polling?

P.S.
Have found a workaround - looks like BOINC could apply http_debug flag without restart (via "read config files" after cc_config edit). So i will try this method next time if i see more stuck downloads from Rosetta @ Home.
8) Message boards : Questions and problems : Stalled downloads (Message 96009)
Posted 23 Feb 2020 by Mad_Max
Post:
And i can not catch it. It does not repeats each day and looks like running BOINC with <http_debug>1</http_debug> cause problems to BOINC itself.
After some time it loses international connection (between BOINC and BOINC Manager - red dot on tray icon) and hangs.
Looks like it caused by too many messages in event log due to very high verbose level of <http_debug> and very high BOINC network activity on machine.
I checked stdoutdae.txt and it generate ~ 50 000 log lines per day.

Mat be will try later on less active machine.
9) Message boards : Questions and problems : False positive network errors detection due to goggle CAPTCHA on reference site (Message 96008)
Posted 23 Feb 2020 by Mad_Max
Post:
Hello.

I have notices a lot of such error on all of my BOINC machines (it just few examples from just one computer - actually there were much more of same errors on all my computers) :
14-Feb-2020 19:13:41 [---] Project communication failed: attempting access to reference site
14-Feb-2020 19:13:43 [---] BOINC can't access Internet - check network connection or proxy configuration.
---------- cut ----------
15-Feb-2020 17:45:56 [---] Project communication failed: attempting access to reference site
15-Feb-2020 17:45:58 [---] BOINC can't access Internet - check network connection or proxy configuration.
---------- cut ----------
17-Feb-2020 06:54:05 [---] Project communication failed: attempting access to reference site
17-Feb-2020 06:54:07 [---] BOINC can't access Internet - check network connection or proxy configuration.
---------- cut ----------
19-Feb-2020 02:09:37 [---] Project communication failed: attempting access to reference site
19-Feb-2020 02:09:39 [---] Internet access OK - project servers may be temporarily down.
---------- cut ----------
19-Feb-2020 02:10:25 [---] Project communication failed: attempting access to reference site
19-Feb-2020 02:10:26 [---] BOINC can't access Internet - check network connection or proxy configuration.
---------- cut ----------
20-Feb-2020 22:53:19 [---] Project communication failed: attempting access to reference site
20-Feb-2020 22:53:20 [---] BOINC can't access Internet - check network connection or proxy configuration.

BOINC thinks that there is no internet connection so delay all project requests(Backing off). But it all were false positives - connection is OK, i monitor it independently and know there were no connection issues at all that times.

I opened "lookup_website.html" in BOINC folder (which AFAIK saves last result get from reference site) file and have found the reason: google banned BOINC clients as a "bot activity":
ERROR:
invalid key domain

About this page

Our systems have detected unusual traffic from your computer network. This page checks to see if it's really you sending the requests, and not a robot. Why did this happen?
10) Message boards : Questions and problems : Stalled downloads (Message 95955)
Posted 19 Feb 2020 by Mad_Max
Post:
I can dig some data next time it happen.
I already added <http_debug>1</http_debug> in config.
But it will just show why downloads from R@H server is stalling, but it will not show why BOINC refuse to get new fork even after such stale download is already aborted. isn't it?
11) Message boards : Questions and problems : Stalled downloads (Message 95952)
Posted 19 Feb 2020 by Mad_Max
Post:
Unfortunately, the log doesn't show you cancelling the download - which is probably the log's fault, not yours.
..............

Will do. But restarting the machine (or presumably just the Boinc client) always clears the problem. My problem is it doesn't clear without restarting.

Same problem here.
Some downloads keeps stalling (= stop flow of work for R@H as BOINC stops getting new work from R@H and switch to backup project - WCG in my case) every few fays (~ one file stuck of few hundreds downloads clearly ).
Some of the cases, retry transfer does not help (is probable a R@H server bug), aborting stuck transfer usually work, but it some cases it does not help either so i need to restart BOINC completely to resume normal work flow.

There are also no any entries in BOINC event log about aborting transfer:
19-Feb-2020 13:32:36 [Einstein@Home] Requesting new tasks for AMD/ATI GPU
19-Feb-2020 13:32:43 [Einstein@Home] Scheduler request completed: got 1 new tasks
-------------here i have aborted stalled download, but it is not logged -----------
19-Feb-2020 13:33:24 [Rosetta@home] update requested by user
19-Feb-2020 13:33:27 [Rosetta@home] Sending scheduler request: Requested by user.
19-Feb-2020 13:33:27 [Rosetta@home] Reporting 1 completed tasks    - note: this task has stalled download and BOINC recognize it as aborted after stale download canceled, but refuse ask for more work   
19-Feb-2020 13:33:27 [Rosetta@home] Not requesting tasks: some download is stalled
19-Feb-2020 13:33:29 [Rosetta@home] Scheduler request completed
-------------------------------- i restart BOINC and getting new work resume  -------------------- 
19-Feb-2020 13:33:51 [---] Exiting
19-Feb-2020 13:34:44 [---] Starting BOINC client version 7.14.2 for windows_x86_64
19-Feb-2020 13:34:44 [---] log flags: file_xfer, sched_ops, task
19-Feb-2020 13:34:44 [---] Libraries: libcurl/7.47.1 OpenSSL/1.0.2g zlib/1.2.8
-------------- cut --------------------
19-Feb-2020 13:35:06 [Rosetta@home] Sending scheduler request: To fetch work.
19-Feb-2020 13:35:06 [Rosetta@home] Requesting new tasks for CPU
19-Feb-2020 13:35:09 [Rosetta@home] Scheduler request completed: got 7 new tasks
19-Feb-2020 13:35:11 [Rosetta@home] Started download of flags_rb_02_12_15913_15773__t000__2_C1_robetta
19-Feb-2020 13:35:11 [Rosetta@home] Started download of input_rb_02_12_15913_15773__t000__2_C1_robetta.zip
19-Feb-2020 13:35:11 [Rosetta@home] Started download of flags_rb_02_13_14003_15805__t000__2_C1_robetta
19-Feb-2020 13:35:13 [Rosetta@home] Finished download of flags_rb_02_12_15913_15773__t000__2_C1_robetta
19-Feb-2020 13:35:13 [Rosetta@home] Finished download of flags_rb_02_13_14003_15805__t000__2_C1_robetta
12) Message boards : BOINC client : 7.14.2 and 7.12.1 both fail to get work units on very fast systems (Message 91378)
Posted 4 May 2019 by Mad_Max
Post:
Limit for tasks in progress is 200 per GPU plus some (40 if i recall correctly) per CPU core.

But i don't interested in such workarounds. I already found own some time ago.
Just a very simple script which force updates of the project every 2 min via boinccmd.
:start
timeout 120
boinccmd.exe --project milkyway.cs.rpi.edu/milkyway update
goto start

Works very well for me: every few updates client has nothing to report (no completed tasks yet) and thus request to get new work succeed. And cache newer run dry while this script is working.

But it just workaround, not a fix. I want to find root of the problem and pass it to BOINC devs so they can smash this bug for good.
13) Message boards : BOINC client : 7.14.2 and 7.12.1 both fail to get work units on very fast systems (Message 91355)
Posted 2 May 2019 by Mad_Max
Post:
I have got some spare time to test. And now can confirm that problem with getting work from MW happens only with combined requests (report completed tasks + request new).
And fast machines and/or short WUs affected through this: if machine fasts enough or WU short enough almost every request will include report of completed WUs, and fail due to it.

I did not find any BOINC options to suspend reporting of completed task, so i did it manually. I suspend all tasks close to 100%, forcing BOINC to start new ones. This way i managed to get about half an hour of work without a single task completed (and thus no reported tasks in requests too). And during this time - ALL requests to get new work was successful:
02/05/2019 23:33:33 | Milkyway@Home | Sending scheduler request: To fetch work.
02/05/2019 23:33:33 | Milkyway@Home | Requesting new tasks for AMD/ATI GPU
02/05/2019 23:33:35 | Milkyway@Home | Scheduler request completed: got 58 new tasks
02/05/2019 23:35:11 | Milkyway@Home | Sending scheduler request: To fetch work.
02/05/2019 23:35:11 | Milkyway@Home | Requesting new tasks for AMD/ATI GPU
02/05/2019 23:35:13 | Milkyway@Home | Scheduler request completed: got 58 new tasks
02/05/2019 23:36:49 | Milkyway@Home | Sending scheduler request: To fetch work.
02/05/2019 23:36:49 | Milkyway@Home | Requesting new tasks for AMD/ATI GPU
02/05/2019 23:36:51 | Milkyway@Home | Scheduler request completed: got 58 new tasks
02/05/2019 23:38:26 | Milkyway@Home | Sending scheduler request: To fetch work.
02/05/2019 23:38:26 | Milkyway@Home | Requesting new tasks for AMD/ATI GPU
02/05/2019 23:38:28 | Milkyway@Home | Scheduler request completed: got 58 new tasks
02/05/2019 23:40:04 | Milkyway@Home | Sending scheduler request: To fetch work.
02/05/2019 23:40:04 | Milkyway@Home | Requesting new tasks for AMD/ATI GPU
02/05/2019 23:40:06 | Milkyway@Home | Scheduler request completed: got 58 new tasks
02/05/2019 23:41:42 | Milkyway@Home | Sending scheduler request: To fetch work.
02/05/2019 23:41:42 | Milkyway@Home | Requesting new tasks for AMD/ATI GPU
02/05/2019 23:41:44 | Milkyway@Home | Scheduler request completed: got 14 new tasks

Full log available here: https://pastebin.com/xgEU63uv
Note 1: Last request bring only 14 tasks (and few after it - zero) because at this point i hit the server limit of max tasks in progress (600 per host for MW). It is normal and expected.
Note 2: only reporting of completed tasks or task aborted during computation lead to a failure with getting new tasks. Reporting of tasks which was aborted before start does not influence getting new work. There is such example: at lines 82-87 of log i aborted task which did not start yet and at lines 88-94 i aborted task which was already running.

So my current best guess - there may be something in tasks results that trigger this bug. Like Stderr output or other data from completed tasks included in combined requests.
14) Message boards : BOINC client : 7.14.2 and 7.12.1 both fail to get work units on very fast systems (Message 91345)
Posted 2 May 2019 by Mad_Max
Post:
Also, two other things are not included in their (BOINC standard) template SSP:

Number of tasks available per app - NBody or separation. It wouldn't help if most of the 11683 were NBody.
Number of tasks reported per hour - with a maximum runtime of 43 seconds, that could be huge.

(22377 recent computers all doing a task every 40 seconds is over 2 million tasks reported every hour - you get an idea how much is actually happening under the hood)

Well looks like what you even do not try to look (just barely skim on top and jumps to conclusions). Not in logs provided not at status pages.
There IS info about how much work RTS per app basis - it just in the block below.
Tasks by application
Application 	                         Unsent 	In progress 	Runtime of last 100 tasks in hours: average, min, max 	Users in last 24 hours
Milkyway@home N-Body Simulation 	1002 	       129827 	    2.34 (0.01 - 109.98) 	2954
Milkyway@home Separation 	        [b]10104[/b] 	       441163 	0.2 (0.01 - 52.65) 	5743

Most of the work RTS is from Separation app and it was this way for long (few month at least). Then i was saying there are about 10 000 tasks ready to sent all the time - i refereed to the Separation app only not total RTS count. Total is about 10k+1k=11k where ~10k are separation tasks and ~1k NBody tasks.

And again - where you get this nonsense about 40 seconds is a maximum runtime and millions of tasks per hour? Stats show runtimes in HOURS units, not seconds.
0.2 hours (~700 seconds) is an average value for separation runtimes. While max is a few dozen hours ("stuck" tasks and some old mobile devices based on weak ARM CPUs trying to compete with modern powerful GPUs)
And current total throughput of the MW project not a millions of tasks per hour but only 20k-30k tasks per hour (based on RAC and GFLOPS count - about 170 millions cobblestones per day, 220-250 per 1 tasks, ~700k tasks per day, 29k per hour ) .
So 10к RTS witch server tries to mains is enough to feed all clients for 20-30 min even if no new tasks being generated at all. But work generator is running online and replenish this supply every few minutes if it goes <10к.
15) Message boards : BOINC client : 7.14.2 and 7.12.1 both fail to get work units on very fast systems (Message 91344)
Posted 2 May 2019 by Mad_Max
Post:

Which may also have a bearing on this problem. With the Milkyway tasks finishing so quickly, anyone who gets the 'no work' reply and goes into backoff woll clear the backoff via a completion and will ask again - the delay being solely governed by the 91 second server delay. It's possible that you're seeing a whole group of hosts in lockstep - all asking (and asking again) at the same time. The client backoffs were deliberately designed with a randomisation factor to avoid that problem, but there's nothing random about the server delay.

The advantage of the 'update' click could be as simple as that slight extra randomness in the timing.

Oh, wait. The 'zero sent' continues even after the cache is empty and there's no completed work either to clear the backoff, or needing to be reported? That destroys both our hypotheses!

Well if you check my log you will see that my computers not so fast and finish MW tasks only in about 3 minutes (actually 11-13 min to finish task but it runs 4 tasks in parallel, so one task finished every 3 mins or so). It more compared to 91 sec server issued delay. Plus there is a some variation in timings it 2-5 min between requests, not constant exact 3 min delay . So "synchronization" between client should not be the case - as there is enough randomness without manual updates.

And no - why you think that 'zero sent' continues even after the cache is empty? Log shows the opposite (https://pastebin.com/8LCxm5RN):
29/04/2019 05:07:31 | Milkyway@Home | Computation for task de_modfit_80_bundle4_4s_south4s_0_1555431910_4469895_0 finished
29/04/2019 05:08:16 | Milkyway@Home | Computation for task de_modfit_84_bundle4_4s_south4s_0_1555431910_4347759_1 finished

29/04/2019 05:08:30 | Milkyway@Home | Sending scheduler request: To fetch work.
29/04/2019 05:08:30 | Milkyway@Home | Reporting 2 completed tasks
29/04/2019 05:08:30 | Milkyway@Home | Requesting new tasks for AMD/ATI GPU
29/04/2019 05:08:32 | Milkyway@Home | Scheduler request completed: got 0 new tasks

29/04/2019 05:19:12 | Milkyway@Home | Sending scheduler request: To fetch work.
29/04/2019 05:19:12 | Milkyway@Home | Requesting new tasks for AMD/ATI GPU
29/04/2019 05:19:14 | Milkyway@Home | Scheduler request completed: got 64 new tasks

05:07:31 and 05:08:16 - last 2 task from cache completed.
05:08:30 - last 2 completed tasks reported + request to get new work failed (as few dozen previous too)
05:19:12 - back off times has expired (as there were no completed work to cancel it earlier client wait full back-off)and first request got a lot of work.

Same in the next cycle - then all work client got finished again:
29/04/2019 08:34:03 | Milkyway@Home | Scheduler request completed: got 0 new tasks
29/04/2019 08:35:33 | Milkyway@Home | Computation for task de_modfit_81_bundle4_4s_south4s_0_1555431910_4552914_0 finished

29/04/2019 08:35:39 | Milkyway@Home | Sending scheduler request: To fetch work.
29/04/2019 08:35:39 | Milkyway@Home | Reporting 1 completed tasks
29/04/2019 08:35:39 | Milkyway@Home | Requesting new tasks for AMD/ATI GPU
29/04/2019 08:35:41 | Milkyway@Home | Scheduler request completed: got 0 new tasks

29/04/2019 08:48:33 | Milkyway@Home | Sending scheduler request: To fetch work.
29/04/2019 08:48:33 | Milkyway@Home | Requesting new tasks for AMD/ATI GPU
29/04/2019 08:48:36 | Milkyway@Home | Scheduler request completed: got 57 new tasks

08:35:33 - last task from cache finished
08:35:39 - last task reported to server, request to get new work failed (as all previous while there was some work in cache)
08:48:33 - firs request with empty cache and without reporting completed succeed.
And so on - see same behavior in all cycles i checked. Cache empty ==> last tasks reported ==> back-off timer expired ==> successful request
16) Message boards : BOINC client : 7.14.2 and 7.12.1 both fail to get work units on very fast systems (Message 91343)
Posted 2 May 2019 by Mad_Max
Post:
Read it. But am still sure that problem is not with the supply of the work.
When I look at
https://milkyway.cs.rpi.edu/milkyway/server_status.php it says at the bottom Task data as of 29 Apr 2019, 10:41:21 UTC. That's about 20 minutes ago. A LOT can change in 20 minutes, so it now showing to have 11683 tasks RTS means that they were that number at 10:41:21 UTC, not now. Aside from that, even if that number is a constant 11K+, you still don't know how many tasks were in the feeder at the given moment your client asked for work. If there were none, or way fewer than you're asking for, it won't give you work.

Perhaps we need an average feeder number on the SSP?

Why feeder should not give work if client asks more than it have it moment? It should give less (up to max it has) but not zero.
Oh, wait - it is actually already works this way on all projects i have tried - if client ask too much work - server just send "some" work, not refuse to give work at all. And give some more at next request after delay expire and so on - until client thinks it has enough or hit server daily quota.
17) Message boards : BOINC client : 7.14.2 and 7.12.1 both fail to get work units on very fast systems (Message 91252)
Posted 29 Apr 2019 by Mad_Max
Post:
P.S.
One of my thought after digging few such logs (also with additional debug info turned on) there may be a problem with combined request: reporting completed work + requesting new work.
As all successful work fetch i saw in logs was pure work requests (without reporting completed tasks)
- when all work was finished and reported (so nothing to report and client only request new work)
- at manual updates when nothing to report because there are no finished tasks yet (so client also only request new work)

Is there any option in current BOINC client to disable / pause automatic work reporting to test this hypothesis?
18) Message boards : BOINC client : 7.14.2 and 7.12.1 both fail to get work units on very fast systems (Message 91251)
Posted 29 Apr 2019 by Mad_Max
Post:
Read it. But am still sure that problem is not with the supply of the work.

Because as you wrote yourself - scheduled request and manual updates should have same chance of getting work. So if there was any shortfall of work supply on the servers failure rate(no getting new work) of scheduled requests and manual updates should be at about same level.

But it is not the case: almost all scheduled requests fail while almost all manual updates succeed.
Also almost all scheduled requests succeed if work cache is empty.
I added example of log from one of my machines in prev post but you probable miss it as it was in edit (did not expect response so fast). Here it again: https://pastebin.com/8LCxm5RN

As you can see there is about 100 failed scheduled requests before cache is empty. This is a last WU from cache is finished, and very first request after it got a lot of new work.
29/04/2019 05:08:16 | Milkyway@Home | Computation for task de_modfit_84_bundle4_4s_south4s_0_1555431910_4347759_1 finished
29/04/2019 05:08:30 | Milkyway@Home | Sending scheduler request: To fetch work.
29/04/2019 05:08:30 | Milkyway@Home | Reporting 2 completed tasks
29/04/2019 05:08:30 | Milkyway@Home | Requesting new tasks for AMD/ATI GPU
29/04/2019 05:08:32 | Milkyway@Home | Scheduler request completed: got 0 new tasks
29/04/2019 05:19:12 | Milkyway@Home | Sending scheduler request: To fetch work.
29/04/2019 05:19:12 | Milkyway@Home | Requesting new tasks for AMD/ATI GPU
29/04/2019 05:19:14 | Milkyway@Home | Scheduler request completed: got 64 new tasks

Then next ~50 scheduled requests failed until cache is empty again. And got lot of work in next request after cache emptied.
And so on - such cycle repeats for many days on many machines of different users.
19) Message boards : BOINC client : 7.14.2 and 7.12.1 both fail to get work units on very fast systems (Message 91249)
Posted 29 Apr 2019 by Mad_Max
Post:
OK, I can manage the next bit now. The next request was

15823 Milkyway@Home 4/28/2019 10:14:06 AM Scheduler request completed: got 0 new tasks
That's effectively 91 seconds after the one in my last post. In this mode, the client will set a backoff every time it gets a got 0 new tasks response, and will clear it every time one of your previously cached tasks finishes computation. Those are both known design features, like 'em or not.

At the moment, you're completing cached work much more often than once every 91 seconds, so you'll keep asking at every possible occasion. But as soon as you run dry, there's nothing to clear the backoffs (no more work finishing), and they'll build up.

The only way to cure this one is to persuade the project to provide more tasks.

A possible mitigation would be to set a cron job to trigger boinccmd to issue a project update every five minutes, but you might get very unpopular with the server administrators. They obviously aren't set up to satisfy your voracious appetite.

It is not a problem with supply of work from a project. I also have this problem with getting no work from MW. And many other people too (discussion thread at MW forum about it - https://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=4424 and in few other threads)
There are plenty of work available to sent (usually server maintain about 10 000 tasks ready to sent, sometimes it goes down by 1к-2к task but almost never close to zero), but some users can not get any of them until their local BOINC work cache is empty getting "got 0 new tasks" all the time until this point.
But if you press "update" - client receives a lot(few dozen task per request) of work immediately. Only automatic (scheduled) work fetch is failing.

Here is an example of BOINC log https://pastebin.com/8LCxm5RN

Problem also exit with old BOINC clients - i used 7.6.22 for example.
So may be it is not a client issue but within the server part of BOINC code.




Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.