BOINC (windows) >= 5.10.40 + squid = no scheduler replies

Chris Sutton

Joined: 29 Aug 05
Posts: 117
Message 17120 - Posted: 2 May 2008, 23:44:04 UTC

It seems we have a problem that affects the current stable release (5.10.45) of the Windows client when it is operated behind a Squid proxy.

Symptoms are:
03/05/2008 01:31:26|malariacontrol.net|Sending scheduler request: Requested by user. Requesting 0 seconds of work, reporting 0 completed tasks
03/05/2008 01:32:30||Project communication failed: attempting access to reference site
03/05/2008 01:32:32||Access to reference site succeeded - project servers may be temporarily down.
03/05/2008 01:32:32|malariacontrol.net|Scheduler request failed: Server returned nothing (no headers, no data)

My testing has narrowed the problem down to versions after 5.10.39; 5.10.39 itself does not exhibit the problem. I haven't tested the Linux versions yet, focusing only on the Windows ones at this time.

The major difference between the versions (gathered from the checkin notes) is the switch to libcurl 7.18.0, but there are some other minor modifications as well.

Interestingly, some preliminary packet sniffing seems to reveal that the scheduler is replying to the request, but the reply is being dropped or otherwise ignored by the client. This is also supported by the "Access to reference site succeeded" message.

I haven't dived into the source yet. It's late. Maybe someone will have found and fixed the problem while I sleep on it. :)
ID: 17120
SekeRob

Joined: 25 Aug 06
Posts: 1596
Message 17128 - Posted: 3 May 2008, 7:30:32 UTC - in response to Message 17120.  

It's maybe not a coincidence: 5.10.40 and .41 were the two iterations that fixed most proxy issues, introduced the <force_auth> flag, and finally made things work through SSL/port 443 for WCG.

Coelum Non Animum Mutant, Qui Trans Mare Currunt
ID: 17128
MikeMarsUK

Joined: 16 Apr 06
Posts: 386
United Kingdom
Message 17133 - Posted: 3 May 2008, 9:15:19 UTC
Last modified: 3 May 2008, 9:20:11 UTC

I have a vague memory of someone having problems with a Squid proxy a few months ago on the CPDN project; they discovered that Squid was creating fake ACKnowledgements to packets in order to make the latency look better. I don't know if your problem is similar or not. Also note that my memory is somewhat unreliable...
ID: 17133
Chris Sutton

Joined: 29 Aug 05
Posts: 117
Message 17153 - Posted: 3 May 2008, 21:01:55 UTC

Ok, so here's the news so far.

The HTTP 1.1 POST of the scheduler request in v 5.10.40 includes a new header, Expect: 100-continue, which is not present in the same request made with v 5.10.39.

Where a v 5.10.39 scheduler request receives an HTTP/1.0 200 OK reply with the scheduler reply in the body, v 5.10.40 is sent an HTTP/1.0 100 Continue with no body, after which it seems to time out; the reference-site GET to Google then succeeds.

So that seems to be the difference, but I'm not really any closer to understanding the problem, and there are various possibilities.

What is the point of Expect: 100-continue and is it necessary?

Could suppressing the Expect: 100-continue work and if so, what are the consequences?

Is BOINC not handling a response of '100 Continue' properly?

libcurl docs say:
CURLOPT_POST
Using POST with HTTP 1.1 implies the use of a "Expect: 100-continue" header.
What happens if we force HTTP 1.0?

Is squid the problem?

Why does this appear to happen on some projects only?
Is the project http server the problem?

Any HTTP experts reading this feel free to jump right in... ;-)
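For anyone who wants to see the shape of the difference, here's a rough Python sketch of the two requests. The URL and body are placeholders modelled on the captures later in this thread; this is an illustration, not BOINC's actual code:

```python
def scheduler_post(http11: bool) -> bytes:
    """Build a raw scheduler-request POST, with or without the HTTP/1.1 Expect header."""
    body = b"<scheduler_request>...</scheduler_request>"
    version = b"HTTP/1.1" if http11 else b"HTTP/1.0"
    headers = [
        b"POST /malariacontrol_cgi/cgi " + version,
        b"Host: www.malariacontrol.net",
        b"Content-Type: application/x-www-form-urlencoded",
        b"Content-Length: " + str(len(body)).encode(),
    ]
    if http11:
        # per the CURLOPT_POST docs quoted above, libcurl adds this for HTTP/1.1 POSTs
        headers.append(b"Expect: 100-continue")
    return b"\r\n".join(headers) + b"\r\n\r\n" + body

print(b"Expect: 100-continue" in scheduler_post(True))   # True
print(b"Expect: 100-continue" in scheduler_post(False))  # False
```

On the suppression question: if I read the CURLOPT_HTTPHEADER docs right, an application can also remove the header by passing an empty "Expect:" entry, so forcing HTTP 1.0 wouldn't be the only conceivable workaround.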
ID: 17153
Thyme Lawn

Joined: 2 Sep 05
Posts: 103
United Kingdom
Message 17157 - Posted: 3 May 2008, 22:07:42 UTC - in response to Message 17153.  
Last modified: 3 May 2008, 22:11:06 UTC

Any HTTP experts reading this feel free to jump right in... ;-)

Not an expert by any means Chris, but I'll give it a go ;-)
What is the point of Expect: 100-continue and is it necessary?

"Expect: 100-continue" is normally used to split the POST: the client first sends the headers (announcing "Content-Length" bytes of body) and transmits the request body only after receiving a "100 Continue" response (or after a timeout, in case the server doesn't support the expectation).

If the request body is included with the "Expect: 100-continue" POST the server should send a "100 Continue" response followed by a "200 OK" response when the body has been received.
What happens if we force HTTP 1.0?

That might suppress the "Expect: 100-continue" header and make things work as before. Hopefully without breaking anything else.
Is squid the problem?

I doubt it, otherwise all projects would be failing. The expectation should be passed unchanged through proxies (unless there's an HTTP 1.0 server in the chain, in which case you should have got a "417 Expectation failed" response).
Why does this appear to happen on some projects only?
Is the project http server the problem?

I'd guess so. Probably running a version that's not sending "200 OK" when the request body is in the same message as the "Expect: 100-continue" header.

Edit: See RFC2616 section 8.2.3.
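The handshake described above can be seen end-to-end with a toy server and client over plain sockets (localhost only; a sketch of the protocol exchange itself, not of BOINC, Squid, or any real server):

```python
import socket
import threading

def toy_server(listener: socket.socket) -> None:
    """Accept one request; honour Expect: 100-continue, then answer 200 OK."""
    conn, _ = listener.accept()
    with conn:
        data = b""
        while b"\r\n\r\n" not in data:          # read the request headers
            data += conn.recv(4096)
        head, body = data.split(b"\r\n\r\n", 1)
        if b"Expect: 100-continue" in head:
            conn.sendall(b"HTTP/1.1 100 Continue\r\n\r\n")
        while len(body) < 4:                    # client declares Content-Length: 4
            body += conn.recv(4096)
        conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok")

listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
threading.Thread(target=toy_server, args=(listener,), daemon=True).start()

client = socket.create_connection(listener.getsockname())
client.sendall(b"POST /cgi HTTP/1.1\r\nHost: x\r\n"
               b"Content-Length: 4\r\nExpect: 100-continue\r\n\r\n")
reply = b""
while b"100 Continue" not in reply:             # wait for the interim response
    reply += client.recv(4096)
client.sendall(b"body")                         # only now send the request body
while b"200 OK" not in reply:                   # then the final response arrives
    reply += client.recv(4096)
client.close()
print(reply.decode().splitlines()[0])           # HTTP/1.1 100 Continue
```

In the captures above, the client gets as far as the interim response; it's the final "200 OK" leg that never arrives.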
"The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer
ID: 17157
Chris Sutton

Joined: 29 Aug 05
Posts: 117
Message 17162 - Posted: 3 May 2008, 22:45:15 UTC - in response to Message 17157.  
Last modified: 3 May 2008, 23:18:32 UTC

Not an expert by any means Chris, but I'll give it a go ;-)

Thanks. Your help confirms my research. :)

Is squid the problem?

I doubt it, otherwise all projects would be failing. The expectation should be passed unchanged through proxies (unless there's an HTTP 1.0 server in the chain, in which case you should have got a "417 Expectation failed" response).

Sadly, in my particular case, the answer here is yes.
It took me a little while to see the light, and I needed to sniff both sides of the proxy to see it. The reality, quoted from http://www.squid-cache.org/Intro/, is: "Squid is a fully-featured HTTP/1.0 proxy which is almost (but not quite - we're getting there!) HTTP/1.1 compliant."

It seems that while Squid passes the server's '100 Continue' response back, it doesn't pass the subsequent '200 OK' response and body back, even though the server sends them.

What is the point of Expect: 100-continue and is it necessary?

"Expect: 100-continue" is normally used to split the POST: the client first sends the headers (announcing "Content-Length" bytes of body) and transmits the request body only after receiving a "100 Continue" response (or after a timeout, in case the server doesn't support the expectation).

All true, plus one more crucial aspect: it's an HTTP/1.1 feature, and Squid seems currently incapable of handling it, or maybe it's just my version. I see there's a version 3.0, but I haven't looked at that yet.

What happens if we force HTTP 1.0?

That might suppress the "Expect: 100-continue" header and make things work as before. Hopefully without breaking anything else.

Yup, I made a cc_config.xml and forced HTTP 1.0 with v 5.10.40, and it did a scheduler request OK. I haven't sniffed that, but I expect, as you do, that the 100-continue is suppressed.

Why does this appear to happen on some projects only?
Is the project http server the problem?

I'd guess so. Probably running a version that's not sending "200 OK" when the request body is in the same message as the "Expect: 100-continue" header.

That's a possibility, but since in my test the server is sending this and it's just not being passed along by Squid, I suspect there may be some transparent proxies in play that aren't fully HTTP/1.1 compliant.

So where does this leave us?
If you are using BOINC >= 5.10.40 behind a Squid 2.6 proxy and are getting messages like this in your BOINC client:
Scheduler request failed: Server returned nothing (no headers, no data)
create a cc_config.xml in the BOINC data directory with the following:
<cc_config>
<options>
<http_1_0>1</http_1_0>
</options>
</cc_config>
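One small tip: the block above is plain XML, so if the client seems to ignore it you can rule out a typo by running the file through any XML parser (a sketch using Python's standard library; BOINC itself reads cc_config.xml from its data directory):

```python
import xml.etree.ElementTree as ET

cc_config = """\
<cc_config>
<options>
<http_1_0>1</http_1_0>
</options>
</cc_config>
"""

root = ET.fromstring(cc_config)          # raises ParseError on malformed XML
print("http_1_0 =", root.findtext("options/http_1_0"))   # http_1_0 = 1
```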

[edit]
Just in case anyone wants the gory details:

BOINC Sends to SQUID:
POST http://www.malariacontrol.net/malariacontrol_cgi/cgi HTTP/1.1
User-Agent: BOINC client (windows_intelx86 5.10.40)
Host: www.malariacontrol.net
Pragma: no-cache
Accept: */*
Accept-Encoding: deflate, gzip
Proxy-Connection: Keep-Alive
Content-Type: application/x-www-form-urlencoded
Content-Length: 11572
Expect: 100-continue

<scheduler_request>
<authenticator>xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx</authenticator>
<hostid>xxxxx</hostid>
<rpc_seqno>3904</rpc_seqno>
[...snipped..]


SQUID Sends to PROJECT:
POST /malariacontrol_cgi/cgi HTTP/1.0
User-Agent: BOINC client (windows_intelx86 5.10.40)
Host: www.malariacontrol.net
Pragma: no-cache
Accept: */*
Accept-Encoding: deflate, gzip
Content-Type: application/x-www-form-urlencoded
Content-Length: 11572
Expect: 100-continue
Via: 1.1 proxy.company.net:3030 (squid/2.6.STABLE5)
X-Forwarded-For: zzz.zzz.zzz.zzz
Cache-Control: max-age=259200
Connection: keep-alive

<scheduler_request>
<authenticator>xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx</authenticator>
<hostid>xxxxx</hostid>
<rpc_seqno>3904</rpc_seqno>
[...snipped..]


PROJECT Sends to SQUID:
HTTP/1.0 100 Continue
Date: Sun, 04 May 2008 02:23:39 GMT
Connection: keep-alive
Proxy-Connection: keep-alive
Via: 1.1 cache (NetCache NetApp/6.0.4)


SQUID Sends to BOINC:
HTTP/1.0 100 Continue
Date: Sun, 04 May 2008 02:23:39 GMT
X-Cache: MISS from proxy.company.net
X-Cache-Lookup: MISS from proxy.company.net:3030
Via: 1.1 cache (NetCache NetApp/6.0.4), 1.0 proxy.company.net:3030 (squid/2.6.STABLE5)
Proxy-Connection: keep-alive


PROJECT Sends to SQUID:
HTTP/1.0 200 OK
Date: Sat, 03 May 2008 22:03:39 GMT
Content-Type: text/xml
Server: Apache
Via: 1.1 cache (NetCache NetApp/6.0.4)

<scheduler_reply>
<scheduler_version>601</scheduler_version>
<master_url>http://www.malariacontrol.net/</master_url>
[...snipped..]


This last response is not passed on by squid, hence the situation we find ourselves in.
[further edit]
I don't quite get why the dates/times in the project server responses are so odd. They appear sequentially in the dump file, exactly as here, but the 100-continue seems to take on the local time and the 200-OK the remote time?!?
[/further edit]
[/edit]
ID: 17162
Thyme Lawn

Joined: 2 Sep 05
Posts: 103
United Kingdom
Message 17175 - Posted: 4 May 2008, 8:47:07 UTC - in response to Message 17162.  
Last modified: 4 May 2008, 8:49:09 UTC

Nice bit of detective work Chris :)
I don't quite get why the dates/times in the project server responses are so odd. They appear sequentially in the dump file, exactly as here, but the 100-continue seems to take on the local time and the 200-OK the remote time?!?

I'd hazard a guess that Squid is keeping track of the 'Date:' values and filtering out the '200 OK' because it appears to have been sent before the '100 Continue' (i.e. it's out of sequence). The question is: what's causing the 260-minute discrepancy?
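The figure comes straight from the two 'Date:' headers in the capture:

```python
from email.utils import parsedate_to_datetime

# Dates exactly as captured: the '100 Continue' vs the '200 OK' response.
continue_date = parsedate_to_datetime("Sun, 04 May 2008 02:23:39 GMT")
ok_date = parsedate_to_datetime("Sat, 03 May 2008 22:03:39 GMT")
print((continue_date - ok_date).total_seconds() / 60)   # 260.0
```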

The 'Date:' header is optional in '100 Continue' messages, but RFC2616 includes a mechanism which could cause the date to be added by an intermediate proxy or gateway (section 14.18).

   A received message that does not have a Date header field MUST be
   assigned one by the recipient if the message will be cached by that
   recipient or gatewayed via a protocol which requires a Date. An HTTP
   implementation without a clock MUST NOT cache responses without
   revalidating them on every use. An HTTP cache, especially a shared
   cache, SHOULD use a mechanism, such as NTP [28], to synchronize its
   clock with a reliable external standard.

So either the project server is inserting the wrong date in the first place or a system between the project and your proxy server is adding it. If it's the latter the obvious solution would be for the project server to include the 'Date:' header in the '100 continue' message.
"The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer
ID: 17175
Chris Sutton

Joined: 29 Aug 05
Posts: 117
Message 17176 - Posted: 4 May 2008, 9:37:09 UTC - in response to Message 17175.  
Last modified: 4 May 2008, 9:50:33 UTC

So either the project server is inserting the wrong date in the first place or a system between the project and your proxy server is adding it.

This got me thinking. Can there be another proxy in the mix?
Reviewing the header data below indicates quite clearly that there are two proxies at play here:
Via: 1.1 cache (NetCache NetApp/6.0.4), 1.0 proxy.company.net:3030 (squid/2.6.STABLE5)

The squid one is mine, the NetCache one isn't.

A bit more research and a quick trip to http://whatismyipaddress.com/ confirms it: my ISP or ADSL provider is operating a transparent proxy. Hmmm... not sure if I should be annoyed or impressed. :)

Regarding the date, I suspect you are correct. The transparent proxy is likely adding it, since it's local time, whereas the 200 OK appears to carry the unchanged remote server time.
[edit]
I just checked a standard 200 OK for an HTTP GET to the project server, and there's only a 2 hour 4 minute difference in time, which corresponds to timezone plus standard clock deviation (on my end), so I'm still none the wiser.
[/edit]
ID: 17176
Nicolas

Joined: 19 Jan 07
Posts: 1179
Argentina
Message 17184 - Posted: 4 May 2008, 18:39:00 UTC - in response to Message 17162.  

It seems that while Squid passes the server's '100 Continue' response back, it doesn't pass the subsequent '200 OK' response and body back, even though the server sends them.

The "Expect" header and the "100" status code are not defined by the HTTP 1.0 protocol; they are new features of HTTP 1.1. Squid is an HTTP 1.0 proxy, so it's completely expected (pun intended?) that it doesn't handle them in the right way.
ID: 17184
Chris Sutton

Joined: 29 Aug 05
Posts: 117
Message 17187 - Posted: 4 May 2008, 20:40:33 UTC - in response to Message 17184.  
Last modified: 4 May 2008, 20:50:55 UTC

The "Expect" header and the "100" status code are not defined by the HTTP 1.0 protocol; they are new features of HTTP 1.1. Squid is an HTTP 1.0 proxy, so it's completely expected (pun intended?) that it doesn't handle them in the right way.

That's the conclusion I eventually reached. :)

Thing is, I think this is going to be taking some people by surprise.

Previously I never had to worry about HTTP/1.0 vs HTTP/1.1. Even though BOINC was using the default 1.1, it never included the 'Expect' header; the update to libcurl in v 5.10.40 now introduces this 'Expect' header, which causes communication failures when talking to projects through Squid.

Luckily I'm savvy enough to diagnose my problem, accept it, and make the necessary changes to get things working normally again.

With the higher costs of international bandwidth, many small (and some not so small) ISPs are implementing transparent proxy servers without telling their customers. If they use Squid for this purpose, as some small ISPs may be prone to, we're looking at confused BOINC users not understanding why they can no longer connect when all they did was update to a client >= 5.10.40.

I think this needs more awareness, and when diagnosing communication problems we should have people confirm that they're not behind a proxy by visiting a site like http://whatismyipaddress.com/
ID: 17187
Thyme Lawn

Joined: 2 Sep 05
Posts: 103
United Kingdom
Message 17189 - Posted: 4 May 2008, 22:31:45 UTC - in response to Message 17187.  
Last modified: 4 May 2008, 22:34:50 UTC

I think this needs more awareness

Which should preferably be built into the client rather than expecting users to self-diagnose.

If an HTTP/1.1 scheduler request gets no response it would be much better to try again using HTTP/1.0. If that works the client would automatically switch to using HTTP/1.0 for the remainder of its current instance, possibly after a number of consecutive requests have reverted to HTTP/1.0 (in case the HTTP/1.1 failure was due to a transient condition).

Automatic switches to HTTP/1.0 should not be persistent (i.e. no value is recorded in client_state.xml). This would force the client to revert to HTTP/1.1 when it restarts, automatically allowing for upgrade of servers which don't support HTTP/1.1.
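To make the suggestion concrete, here's a hypothetical sketch of that fallback logic in Python. Everything in it (request_fn, the threshold of 3, using None for "server returned nothing") is illustrative, not BOINC's actual API:

```python
FALLBACK_THRESHOLD = 3  # consecutive HTTP/1.0 successes before switching over

class SchedulerTransport:
    def __init__(self, request_fn):
        self.request_fn = request_fn      # request_fn(url, http10) -> reply or None
        self.use_http10 = False           # in-memory only: never saved to client_state.xml
        self.http10_successes = 0

    def request(self, url):
        if not self.use_http10:
            reply = self.request_fn(url, http10=False)
            if reply is not None:
                self.http10_successes = 0
                return reply
        # HTTP/1.1 returned nothing (or we already fell back): retry with 1.0.
        reply = self.request_fn(url, http10=True)
        if reply is not None:
            self.http10_successes += 1
            if self.http10_successes >= FALLBACK_THRESHOLD:
                self.use_http10 = True    # stick with 1.0 for this client instance
        return reply

# Simulate a proxy that swallows every HTTP/1.1 reply but passes HTTP/1.0.
transport = SchedulerTransport(lambda url, http10: "reply" if http10 else None)
results = [transport.request("http://example.invalid/cgi") for _ in range(4)]
print(results.count("reply"), transport.use_http10)   # 4 True
```

Keeping use_http10 out of client_state.xml is what makes the switch non-persistent, as suggested above: a restart always retries HTTP/1.1 first.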
"The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer
ID: 17189
Chris Sutton

Joined: 29 Aug 05
Posts: 117
Message 17196 - Posted: 5 May 2008, 9:09:53 UTC - in response to Message 17189.  
Last modified: 5 May 2008, 9:12:17 UTC

Which should preferably be built into the client rather than expecting users to self-diagnose.

Agreed. I was thinking along the lines of a radio button to switch between HTTP 1.0/1.1 on the proxy options page, but your solution seems much more elegant. :)

I'd also like to see squid send back a 417 if it can't handle the 'Expect' header. At least users could then diagnose further.
ID: 17196


Copyright © 2022 University of California. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.