Dual core

Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5081
United Kingdom
Message 15716 - Posted: 7 Mar 2008, 19:12:09 UTC - in response to Message 15715.  

You wrote before that the SETI.Beta job takes 202 hours. My guess is that the remaining run time on that job is the reason the project is not entitled to backfill the buffer: 86,400 STD, less the remaining run time on the job, with the relevant weighting, broadly fills the bill.

SETI.Beta jobs actually take ~144 hours. 202 hours was BOINC's estimate, at that time, of the time remaining to completion (the estimate is incorrect because the RDCF is messed up - it's a Beta project). Time remaining is currently estimated at 131 hours 15 mins.

It's about 2 hours since I upgraded. STD has remained limit-stopped at 86,400: LTD has increased to 591,333. LHC has requested work six times, Orbit has requested work 11 times.

SETI Beta has not requested work at all. It is unable to fulfil the resource share I requested, because it isn't clever enough to recognise that the work in its buffer is indivisible and can't be split across two cores. I (and, I think, Mike Gibson) believe that it should recognise that a second core is available to honour the resource share, that the task already downloaded can't use that core because it can't be split, and that it should download more work as permitted by debt etc.
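
For anyone not steeped in the debt numbers: here is a rough sketch of the accounting as I understand it (simplified and inferred from the figures above - not the actual client code). Each accounting period a project earns debt in proportion to the CPU time its resource share entitles it to, minus the CPU time it actually received; short-term debt (STD) is clamped at +/-86,400 seconds (one day), while long-term debt (LTD) is unbounded.

STD_LIMIT = 86_400.0  # one day: the clamp visible in the numbers above

def update_debts(std, ltd, entitled_cpus, running_cpus, wall_secs):
    # Debt earned = CPU seconds the share entitles the project to,
    # minus the CPU seconds it actually got, this accounting period.
    earned = (entitled_cpus - running_cpus) * wall_secs
    std = max(-STD_LIMIT, min(STD_LIMIT, std + earned))
    ltd += earned
    return std, ltd

# A project entitled to two cores but only able to run one task:
std, ltd = 86_400.0, 500_000.0
for _ in range(10):                      # ten one-hour periods
    std, ltd = update_debts(std, ltd, 2.0, 1.0, 3_600.0)
print(std, ltd)  # STD stays pinned at 86,400; LTD keeps climbing

That is exactly the pattern above: the indivisible task caps what the project can run, so STD sits at the limit and LTD grows without bound.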
ID: 15716
John McLeod VII

Joined: 29 Aug 05
Posts: 147
Message 15717 - Posted: 7 Mar 2008, 21:21:08 UTC

Please read this page and create a cc_config.xml file with the following flags turned on:

<task>
<sched_ops>
<cpu_sched>
<cpu_sched_debug>
<work_fetch_debug>

Please report on the output messages.
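
A minimal cc_config.xml along those lines might look like this (a sketch assuming the standard <cc_config><log_flags> layout - check the page linked above for the exact format for your client version):

<cc_config>
    <log_flags>
        <task>1</task>
        <sched_ops>1</sched_ops>
        <cpu_sched>1</cpu_sched>
        <cpu_sched_debug>1</cpu_sched_debug>
        <work_fetch_debug>1</work_fetch_debug>
    </log_flags>
</cc_config>

The file goes in the BOINC data directory; restart the client (or use the Manager's option to re-read the config file, if your version has one) for the flags to take effect.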

BOINC WIKI
ID: 15717
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5081
United Kingdom
Message 15718 - Posted: 7 Mar 2008, 23:05:07 UTC - in response to Message 15717.  
Last modified: 7 Mar 2008, 23:12:40 UTC

Please read this page and create a cc_config.xml file with the following flags turned on:

<task>
<sched_ops>
<cpu_sched>
<cpu_sched_debug>
<work_fetch_debug>

Please report on the output messages.

Please look further back in the thread: I have a cc_config.xml file, I know how to use it, and I had <work_fetch_debug> turned on for most of last night.

With <cpu_sched_debug> turned on as well, I'm getting a huge amount of output - what would be most useful to you? Here's a sample - this is repeated every 5 seconds.

07/03/2008 23:00:11||[cpu_sched_debug] Request enforce CPU schedule: Checkpoint reached
07/03/2008 23:00:11||[cpu_sched_debug] enforce_schedule(): start
07/03/2008 23:00:11|SETI@home Beta Test|[cpu_sched_debug] want to run: ap_22mr07ac_B7_P1_00398_20080208_13385.wu_3
07/03/2008 23:00:11|climateprediction.net|[cpu_sched_debug] want to run: hadcm3istd_026w_1920_160_15925025_0
07/03/2008 23:00:11|climateprediction.net|[cpu_sched_debug] want to run: hadcm3istd_0bob_1920_160_05936405_5
07/03/2008 23:00:11|SETI@home|[cpu_sched_debug] want to run: 30mr07ac.30604.3079.12.7.174_0
07/03/2008 23:00:11|SETI@home|[cpu_sched_debug] want to run: 30mr07aa.11888.2526.12.7.173_1
07/03/2008 23:00:11|Einstein@Home|[cpu_sched_debug] want to run: h1_0836.30_S5R3__316_S5R3b_0
07/03/2008 23:00:11|Einstein@Home|[cpu_sched_debug] want to run: h1_0836.30_S5R3__315_S5R3b_1
07/03/2008 23:00:11|SETI@home|[cpu_sched_debug] want to run: 30mr07ac.20541.1085042.6.7.171_3
07/03/2008 23:00:11|SETI@home Beta Test|[cpu_sched_debug] processing ap_22mr07ac_B7_P1_00398_20080208_13385.wu_3
07/03/2008 23:00:11|SETI@home Beta Test|[cpu_sched_debug] ap_22mr07ac_B7_P1_00398_20080208_13385.wu_3 is already running
07/03/2008 23:00:11|climateprediction.net|[cpu_sched_debug] processing hadcm3istd_026w_1920_160_15925025_0
07/03/2008 23:00:11|climateprediction.net|[cpu_sched_debug] hadcm3istd_026w_1920_160_15925025_0 is already running
07/03/2008 23:00:11|climateprediction.net|[cpu_sched_debug] processing hadcm3istd_0bob_1920_160_05936405_5
07/03/2008 23:00:11|climateprediction.net|[cpu_sched_debug] hadcm3istd_0bob_1920_160_05936405_5 is already running
07/03/2008 23:00:11|SETI@home|[cpu_sched_debug] processing 30mr07ac.30604.3079.12.7.174_0
07/03/2008 23:00:11|SETI@home|[cpu_sched_debug] 30mr07ac.30604.3079.12.7.174_0 is already running
07/03/2008 23:00:11|SETI@home|[cpu_sched_debug] processing 30mr07aa.11888.2526.12.7.173_1
07/03/2008 23:00:11|SETI@home|[cpu_sched_debug] 30mr07aa.11888.2526.12.7.173_1 is already running
07/03/2008 23:00:11|Einstein@Home|[cpu_sched_debug] processing h1_0836.30_S5R3__316_S5R3b_0
07/03/2008 23:00:11|Einstein@Home|[cpu_sched_debug] h1_0836.30_S5R3__316_S5R3b_0 is already running
07/03/2008 23:00:11|Einstein@Home|[cpu_sched_debug] processing h1_0836.30_S5R3__315_S5R3b_1
07/03/2008 23:00:11|Einstein@Home|[cpu_sched_debug] didn't preempt 30mr07ac.30604.1081361.12.7.33_0: tr 2137.500601 tsc 6.099602
07/03/2008 23:00:11|SETI@home|[cpu_sched_debug] processing 30mr07ac.20541.1085042.6.7.171_3
07/03/2008 23:00:11|SETI@home|[cpu_sched_debug] 30mr07ac.20541.1085042.6.7.171_3 is already running
07/03/2008 23:00:11||[cpu_sched_debug] finished preempt loop, nrunning 8
07/03/2008 23:00:11|climateprediction.net|[cpu_sched_debug] hadcm3istd_026w_1920_160_15925025_0 sched state 2 next 2 task state 1
07/03/2008 23:00:11|climateprediction.net|[cpu_sched_debug] hadcm3istd_0bob_1920_160_05936405_5 sched state 2 next 2 task state 1
07/03/2008 23:00:11|SETI@home Beta Test|[cpu_sched_debug] ap_22mr07ac_B7_P1_00398_20080208_13385.wu_3 sched state 2 next 2 task state 1
07/03/2008 23:00:11|Einstein@Home|[cpu_sched_debug] h1_0836.30_S5R3__316_S5R3b_0 sched state 2 next 2 task state 1
07/03/2008 23:00:11|Einstein@Home|[cpu_sched_debug] h1_0836.30_S5R3__315_S5R3b_1 sched state 1 next 1 task state 9
07/03/2008 23:00:11|SETI@home|[cpu_sched_debug] 30mr07ac.30604.3079.12.7.174_0 sched state 2 next 2 task state 1
07/03/2008 23:00:11|SETI@home|[cpu_sched_debug] 30mr07aa.11888.2526.12.7.173_1 sched state 2 next 2 task state 1
07/03/2008 23:00:11|SETI@home|[cpu_sched_debug] 30mr07ac.20541.1085042.6.7.171_3 sched state 2 next 2 task state 1
07/03/2008 23:00:11|SETI@home|[cpu_sched_debug] 30mr07ac.30604.1081361.12.7.33_0 sched state 2 next 2 task state 1
07/03/2008 23:00:11||[cpu_sched_debug] enforce_schedule: end

Periodically, I get this block as well:

07/03/2008 23:02:34||[work_fetch_debug] Request work fetch: timer
07/03/2008 23:02:34||[work_fetch_debug] compute_work_requests(): start
07/03/2008 23:02:34||[cpu_sched_debug] CPU efficiency old 0.991033 new 0.991026 wall 487.668808 CPU 482.714105 w 0.994372 e 0.989840
07/03/2008 23:02:34||[work_fetch_debug] compute_work_requests(): cpu_shortfall 0.000000, overall urgency OK
07/03/2008 23:02:34|climateprediction.net|[work_fetch_debug] work fetch: project not contactable; skipping
07/03/2008 23:02:34|CPDN Beta|[work_fetch_debug] work fetch: project not contactable; skipping
07/03/2008 23:02:34|Einstein@Home|[work_fetch_debug] project has no shortfall; skipping
07/03/2008 23:02:34|lhcathome|[work_fetch_debug] work fetch: project not contactable; skipping
07/03/2008 23:02:34|orbit@home|[work_fetch_debug] work fetch: project not contactable; skipping
07/03/2008 23:02:34|SETI@home|[work_fetch_debug] project is overworked; skipping
07/03/2008 23:02:34|SETI@home Beta Test|[work_fetch_debug] work fetch: project not contactable; skipping
07/03/2008 23:02:40||[work_fetch_debug] Request work fetch: Project backoff ended
07/03/2008 23:02:40||[work_fetch_debug] compute_work_requests(): start
07/03/2008 23:02:40||[cpu_sched_debug] CPU efficiency old 0.991026 new 0.991013 wall 48.671997 CPU 47.065467 w 0.999437 e 0.966993
07/03/2008 23:02:40||[work_fetch_debug] compute_work_requests(): cpu_shortfall 0.000000, overall urgency OK
07/03/2008 23:02:40|climateprediction.net|[work_fetch_debug] work fetch: project not contactable; skipping
07/03/2008 23:02:40|CPDN Beta|[work_fetch_debug] work fetch: project not contactable; skipping
07/03/2008 23:02:40|Einstein@Home|[work_fetch_debug] project has no shortfall; skipping
07/03/2008 23:02:40|lhcathome|[work_fetch_debug] work fetch: project not contactable; skipping
07/03/2008 23:02:40|orbit@home|[work_fetch_debug] work fetch: project not contactable; skipping
07/03/2008 23:02:40|SETI@home|[work_fetch_debug] project is overworked; skipping
07/03/2008 23:02:40|SETI@home Beta Test|[work_fetch_debug] project has no shortfall; skipping

Edit: Here's a section from last night, on one of the rare occasions when it decided to fetch work from SETI Beta:

2008-03-06 20:35:18 [---] [work_fetch_debug] Request work fetch: Project backoff ended
2008-03-06 20:35:18 [---] [work_fetch_debug] compute_work_requests(): start
2008-03-06 20:35:18 [---] [work_fetch_debug] compute_work_requests(): cpu_shortfall 0.000000, overall urgency OK
2008-03-06 20:35:18 [climateprediction.net] [work_fetch_debug] work fetch: project not contactable; skipping
2008-03-06 20:35:18 [CPDN Beta] [work_fetch_debug] work fetch: project not contactable; skipping
2008-03-06 20:35:18 [Einstein@Home] [work_fetch_debug] project has no shortfall; skipping
2008-03-06 20:35:18 [lhcathome] [work_fetch_debug] work fetch: project not contactable; skipping
2008-03-06 20:35:18 [SETI@home] [work_fetch_debug] project has no shortfall; skipping
2008-03-06 20:35:18 [SETI@home Beta Test] [work_fetch_debug] project has no shortfall; skipping
2008-03-06 20:35:18 [orbit@home] [work_fetch_debug] work fetch: project not contactable; skipping
2008-03-06 20:36:19 [---] [work_fetch_debug] Request work fetch: timer
2008-03-06 20:36:19 [---] [work_fetch_debug] compute_work_requests(): start
2008-03-06 20:36:19 [---] [work_fetch_debug] compute_work_requests(): cpu_shortfall 13.950803, overall urgency Need
2008-03-06 20:36:19 [climateprediction.net] [work_fetch_debug] work fetch: project not contactable; skipping
2008-03-06 20:36:19 [CPDN Beta] [work_fetch_debug] work fetch: project not contactable; skipping
2008-03-06 20:36:19 [Einstein@Home] [work_fetch_debug] best project so far
2008-03-06 20:36:19 [lhcathome] [work_fetch_debug] work fetch: project not contactable; skipping
2008-03-06 20:36:19 [SETI@home] [work_fetch_debug] project has less LTD than Einstein@Home
2008-03-06 20:36:19 [SETI@home Beta Test] [work_fetch_debug] best project so far
2008-03-06 20:36:19 [orbit@home] [work_fetch_debug] work fetch: project not contactable; skipping
2008-03-06 20:36:19 [SETI@home Beta Test] [work_fetch_debug] compute_work_requests(): work req 4.650268, shortfall 0.000000, urgency OK
2008-03-06 20:36:23 [---] [work_fetch_debug] time_until_work_done(): est 0.000000 ssr 800.000000 apr 7.931308 prs 200.000000
2008-03-06 20:36:23 [SETI@home Beta Test] Sending scheduler request: To fetch work
2008-03-06 20:36:23 [SETI@home Beta Test] Requesting 5 seconds of new work
2008-03-06 20:36:48 [SETI@home Beta Test] Scheduler RPC succeeded [server version 601]
2008-03-06 20:36:48 [SETI@home Beta Test] Deferring communication for 7 sec
2008-03-06 20:36:48 [SETI@home Beta Test] Reason: requested by project
2008-03-06 20:36:48 [---] [work_fetch_debug] Request work fetch: RPC complete
2008-03-06 20:36:49 [---] [work_fetch_debug] compute_work_requests(): start
2008-03-06 20:36:49 [---] [work_fetch_debug] compute_work_requests(): cpu_shortfall 87.374924, overall urgency Need
2008-03-06 20:36:49 [climateprediction.net] [work_fetch_debug] work fetch: project not contactable; skipping
2008-03-06 20:36:49 [CPDN Beta] [work_fetch_debug] work fetch: project not contactable; skipping
2008-03-06 20:36:49 [Einstein@Home] [work_fetch_debug] best project so far
2008-03-06 20:36:49 [lhcathome] [work_fetch_debug] work fetch: project not contactable; skipping
2008-03-06 20:36:49 [SETI@home] [work_fetch_debug] project has less LTD than Einstein@Home
2008-03-06 20:36:49 [SETI@home Beta Test] [work_fetch_debug] work fetch: project not contactable; skipping
2008-03-06 20:36:49 [orbit@home] [work_fetch_debug] work fetch: project not contactable; skipping
2008-03-06 20:36:49 [Einstein@Home] [work_fetch_debug] compute_work_requests(): work req 14.562487, shortfall 0.000000, urgency OK
2008-03-06 20:36:51 [SETI@home Beta Test] [file_xfer] Started download of file 11oc06aa.18571.545437.14.10.88
2008-03-06 20:36:54 [---] [work_fetch_debug] time_until_work_done(): est 80451.142322 ssr 800.000000 apr 7.931366 prs 100.000000
2008-03-06 20:36:54 [Einstein@Home] Sending scheduler request: To fetch work
2008-03-06 20:36:54 [Einstein@Home] Requesting 15 seconds of new work
2008-03-06 20:36:56 [---] [work_fetch_debug] Request work fetch: Project backoff ended
2008-03-06 20:36:56 [---] [work_fetch_debug] compute_work_requests(): start
2008-03-06 20:36:56 [---] [work_fetch_debug] compute_work_requests(): cpu_shortfall 237.182610, overall urgency Need
2008-03-06 20:36:56 [climateprediction.net] [work_fetch_debug] work fetch: project not contactable; skipping
2008-03-06 20:36:56 [CPDN Beta] [work_fetch_debug] work fetch: project not contactable; skipping
2008-03-06 20:36:56 [Einstein@Home] [work_fetch_debug] best project so far
2008-03-06 20:36:56 [lhcathome] [work_fetch_debug] work fetch: project not contactable; skipping
2008-03-06 20:36:56 [SETI@home] [work_fetch_debug] project has less LTD than Einstein@Home
2008-03-06 20:36:56 [SETI@home Beta Test] [work_fetch_debug] best project so far
2008-03-06 20:36:56 [orbit@home] [work_fetch_debug] work fetch: project not contactable; skipping
2008-03-06 20:36:56 [SETI@home Beta Test] [work_fetch_debug] compute_work_requests(): work req 79.060870, shortfall 0.000000, urgency OK
2008-03-06 20:36:57 [SETI@home Beta Test] [file_xfer] Finished download of file 11oc06aa.18571.545437.14.10.88
2008-03-06 20:36:57 [SETI@home Beta Test] [file_xfer] Throughput 82665 bytes/sec
2008-03-06 20:36:59 [Einstein@Home] Scheduler RPC succeeded [server version 601]
2008-03-06 20:36:59 [Einstein@Home] Deferring communication for 1 min 0 sec
2008-03-06 20:36:59 [Einstein@Home] Reason: requested by project
2008-03-06 20:36:59 [---] [work_fetch_debug] Request work fetch: RPC complete
2008-03-06 20:36:59 [---] [work_fetch_debug] compute_work_requests(): start
2008-03-06 20:36:59 [---] [work_fetch_debug] compute_work_requests(): cpu_shortfall 0.000000, overall urgency OK
2008-03-06 20:36:59 [climateprediction.net] [work_fetch_debug] work fetch: project not contactable; skipping
2008-03-06 20:36:59 [CPDN Beta] [work_fetch_debug] work fetch: project not contactable; skipping
2008-03-06 20:36:59 [Einstein@Home] [work_fetch_debug] work fetch: project not contactable; skipping
2008-03-06 20:36:59 [lhcathome] [work_fetch_debug] work fetch: project not contactable; skipping
2008-03-06 20:36:59 [SETI@home] [work_fetch_debug] best project so far
2008-03-06 20:36:59 [SETI@home Beta Test] [work_fetch_debug] project has no shortfall; skipping
2008-03-06 20:36:59 [orbit@home] [work_fetch_debug] work fetch: project not contactable; skipping
2008-03-06 20:36:59 [SETI@home] [work_fetch_debug] compute_work_requests(): work req 1173.277255, shortfall 1173.277255, urgency Need
2008-03-06 20:37:04 [---] [work_fetch_debug] time_until_work_done(): est 211151.817560 ssr 800.000000 apr 7.931474 prs 300.000000
2008-03-06 20:37:04 [SETI@home] Sending scheduler request: To fetch work
2008-03-06 20:37:04 [SETI@home] Requesting 1173 seconds of new work
2008-03-06 20:37:09 [SETI@home] Scheduler RPC succeeded [server version 601]
2008-03-06 20:37:09 [SETI@home] Deferring communication for 11 sec
2008-03-06 20:37:09 [SETI@home] Reason: requested by project
2008-03-06 20:37:09 [---] [work_fetch_debug] Request work fetch: RPC complete
2008-03-06 20:37:09 [---] [work_fetch_debug] compute_work_requests(): start
2008-03-06 20:37:09 [---] [work_fetch_debug] compute_work_requests(): cpu_shortfall 0.000000, overall urgency OK
2008-03-06 20:37:09 [climateprediction.net] [work_fetch_debug] work fetch: project not contactable; skipping
2008-03-06 20:37:09 [CPDN Beta] [work_fetch_debug] work fetch: project not contactable; skipping
2008-03-06 20:37:09 [Einstein@Home] [work_fetch_debug] work fetch: project not contactable; skipping
2008-03-06 20:37:09 [lhcathome] [work_fetch_debug] work fetch: project not contactable; skipping
2008-03-06 20:37:09 [SETI@home] [work_fetch_debug] work fetch: project not contactable; skipping
2008-03-06 20:37:09 [SETI@home Beta Test] [work_fetch_debug] project has no shortfall; skipping
2008-03-06 20:37:09 [orbit@home] [work_fetch_debug] work fetch: project not contactable; skipping
2008-03-06 20:37:11 [SETI@home] [file_xfer] Started download of file 30mr07aa.8116.23794.10.7.128
2008-03-06 20:37:18 [SETI@home] [file_xfer] Finished download of file 30mr07aa.8116.23794.10.7.128
2008-03-06 20:37:18 [SETI@home] [file_xfer] Throughput 69525 bytes/sec
2008-03-06 20:37:20 [---] [work_fetch_debug] Request work fetch: Project backoff ended
2008-03-06 20:37:20 [---] [work_fetch_debug] compute_work_requests(): start
2008-03-06 20:37:20 [---] [work_fetch_debug] compute_work_requests(): cpu_shortfall 0.000000, overall urgency OK
2008-03-06 20:37:20 [climateprediction.net] [work_fetch_debug] work fetch: project not contactable; skipping
2008-03-06 20:37:20 [CPDN Beta] [work_fetch_debug] work fetch: project not contactable; skipping
2008-03-06 20:37:20 [Einstein@Home] [work_fetch_debug] work fetch: project not contactable; skipping
2008-03-06 20:37:20 [lhcathome] [work_fetch_debug] work fetch: project not contactable; skipping
2008-03-06 20:37:20 [SETI@home] [work_fetch_debug] project has no shortfall; skipping
2008-03-06 20:37:20 [SETI@home Beta Test] [work_fetch_debug] project has no shortfall; skipping
2008-03-06 20:37:20 [orbit@home] [work_fetch_debug] work fetch: project not contactable; skipping
ID: 15718
John McLeod VII

Joined: 29 Aug 05
Posts: 147
Message 15719 - Posted: 8 Mar 2008, 0:09:17 UTC

You seem to have hit the highlights.

In any case, BOINC does not guarantee anything about the short term. However, it should approximately balance in the long run. As more and more other projects drop below the work fetch cutoff, this project will be asked for a larger and larger share of the queue.
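
A toy sketch of that selection as I understand it (the cutoff value and the data layout here are my assumptions, not the real client code): projects below the work fetch cutoff are treated as overworked and skipped, so the high-LTD project ends up first in line.

OVERWORKED_CUTOFF = -86_400.0  # assumed cutoff: one day of negative LTD

def fetch_order(projects):
    # projects: dict of name -> {"ltd": debt in seconds, "contactable": bool}
    eligible = [n for n, p in projects.items()
                if p["contactable"] and p["ltd"] > OVERWORKED_CUTOFF]
    # Ask the project with the highest LTD first.
    return sorted(eligible, key=lambda n: projects[n]["ltd"], reverse=True)

projects = {
    "SETI@home":           {"ltd": -90_000.0, "contactable": True},  # overworked
    "Einstein@Home":       {"ltd":  12_000.0, "contactable": True},
    "SETI@home Beta Test": {"ltd": 591_333.0, "contactable": True},
}
print(fetch_order(projects))  # Beta first; SETI@home skipped entirely

This matches the log lines above: "project is overworked; skipping" and "project has less LTD than Einstein@Home".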

BOINC WIKI
ID: 15719
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5081
United Kingdom
Message 15720 - Posted: 8 Mar 2008, 0:34:14 UTC - in response to Message 15719.  

You seem to have hit the highlights.

In any case, BOINC does not guarantee anything about the short term. However, it should approximately balance in the long run. As more and more other projects drop below the work fetch cutoff, this project will be asked for a larger and larger share of the queue.

Ack, no short-term guarantees. But this failure to fetch work distorts (unnecessarily, in my view) the work-flow pattern for BOINC as a whole, and could in the end be detrimental to production science: it doesn't matter on a Beta project, of course.

For example: STD is limit-stopped, but LTD continues to grow. So now SETI general is at -10,856 LTD, and fetch-inhibited. Because I was inveigled into 'upgrading' to v5.10.45 seven hours ago, I now have ten completed SETI tasks which are reporting-inhibited as well.

When, eventually, SETI Beta starts requesting new work, the debt mountain means that many cores (probably at least six) will start working on the same project at the same time. I don't know if you've been following threads like Peter Stoll's SETI and Einstein Cooperation on a Q6600, but that's a very inefficient way of utilising a multicore.

----------------------

Bed-time in the UK now. I'll leave both debug flags active overnight, and see if we can catch another of the elusive Beta requests in action: but unless you have any specific further requests, I'll turn them off in the morning so I can keep a longer, less detailed log history (and save some CPU cycles for science).
ID: 15720
MikeMarsUK

Joined: 16 Apr 06
Posts: 386
United Kingdom
Message 15722 - Posted: 8 Mar 2008, 9:37:18 UTC


While projects which offer no work (or are set to 'no more work') do cause the LTD to increase, and this is the intended and designed behaviour, personally I've always felt that in this situation they shouldn't increase the LTD any further. But no doubt every cruncher in Boinc-land has their own preferences :-)


ID: 15722
John McLeod VII

Joined: 29 Aug 05
Posts: 147
Message 15788 - Posted: 11 Mar 2008, 12:44:39 UTC - in response to Message 15722.  


While projects which offer no work (or are set to 'no more work') do cause the LTD to increase, and this is the intended and designed behaviour, personally I've always felt that in this situation they shouldn't increase the LTD any further. But no doubt every cruncher in Boinc-land has their own preferences :-)


Projects that are labeled NNW (No New Work) participate in LTD calculations UNTIL THE WORK IS COMPLETED. After the work is completed, the project's LTD is frozen. Projects that are suspended also have their LTD frozen. The only exception to the frozen LTD is rebalancing. If a project is detached, its LTD is removed from the LTD calculation and the LTDs of all CPU-intensive projects are shifted so that the mean is 0 again. A similar thing happens if a project is reset: its LTD is momentarily set to 0, and the rebalance shifts the LTDs of all CPU-intensive projects so that the mean is 0 again.

Projects that do not offer work do not increase LTD as long as they are being asked for work promptly at the end of the communications deferral. This is the best compromise we have been able to come up with.
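
That rebalancing step, sketched (illustrative only - the real client works inside its own data structures): shift every CPU-intensive project's LTD by the same amount so that the mean returns to zero.

def rebalance(ltd):
    # ltd: dict mapping project name -> long-term debt in seconds
    mean = sum(ltd.values()) / len(ltd)
    return {name: debt - mean for name, debt in ltd.items()}

debts = {"einstein": 50_000.0, "seti": -20_000.0, "cpdn": -30_000.0}
del debts["cpdn"]          # detach: its LTD leaves the pool
debts = rebalance(debts)   # remaining debts shifted; mean is 0 again
print(debts)               # {'einstein': 35000.0, 'seti': -35000.0}

A reset works the same way, except the project stays in the pool with its LTD set to 0 before the shift.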

An example of why you want a project whose last communication failed to supply work to keep accruing LTD once its communications deferral is over:

I had the following set of interactions happen in order:

S@H failed to get work.
Einstein provided work.
CPDN went into EDF for 6 months (the work was actually reported late).
Einstein eventually finished its task in EDF.

Now if S@H had not increased its LTD during those 6 months, it would not have been right. However, S@H was not being asked for work - no project was being asked for work, as the computer couldn't handle any more.

BOINC WIKI
ID: 15788
Metod, S56RKO

Joined: 9 Sep 05
Posts: 128
Slovenia
Message 15791 - Posted: 11 Mar 2008, 13:10:50 UTC - in response to Message 15789.  

If CPDN gives out work with a deadline far enough away that it will stick to its Resource Share all the way through, I'd consider computing there again.


Some work given out by CPDN recently really did have too-short deadlines. But things are changing for the better ... the last WUs I've received from CPDN have an estimated CPU time of 25 days on my P4 Xeons and a deadline set 5 months in the future. That looks fine to me.

Metod ...
ID: 15791
John McLeod VII

Joined: 29 Aug 05
Posts: 147
Message 15793 - Posted: 11 Mar 2008, 14:04:07 UTC - in response to Message 15791.  

If CPDN gives out work with a deadline far enough away that it will stick to its Resource Share all the way through, I'd consider computing there again.


Some work given out by CPDN recently really did have too-short deadlines. But things are changing for the better ... the last WUs I've received from CPDN have an estimated CPU time of 25 days on my P4 Xeons and a deadline set 5 months in the future. That looks fine to me.

It sort of depends on the capabilities of the computer. The one where that happened was only just above minimum specification. I now have a task that is due sometime in 2010; the current estimate is that it will take 280 days of CPU time. I expect that sooner or later this CPU is going to go into EDF for a while.

The computer that went into EDF for 6 months now has an LTD of about -14,000,000, so I am not expecting any work requests for CPDN on that computer for a while.

BOINC WIKI
ID: 15793
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5081
United Kingdom
Message 15806 - Posted: 11 Mar 2008, 19:45:41 UTC

I'd been wondering whether CPDN had any sort of 'special case' treatment in the work fetch algorithm.

I have a second machine, a quad core, which had got itself into a very similar situation. I ran it 100% on Einstein for a while, as a timing test for a new Beta application: other projects were flushed down to zero tasks remaining, but CPDN had a single part-crunched task in suspended animation so LTD grew to a massive value. Then, when I opened up the machine to other projects at the end of the test, Einstein was inhibited from fetching work, the single CPDN task (controlled by NNT) wasn't enough to use up the resource share, and STD grew as well.

I ended up with CPDN in a very similar state to the SETI Beta saga I described earlier in this thread: STD up against the stop at 86,400: LTD somewhere in the 400,000s: one task in progress: additional cores available. But this time, when I released NNT, BOINC initiated a CPDN work fetch within a few seconds.

I'm wondering whether there's anything in the work fetch calculation which considers the size of the running task against the time remaining before deadline.

In CPDN's case, those figures are about 96 days to complete a full task, against a 30-month deadline. So there is plenty of time available to complete two tasks, even if for some reason they have to follow each other on a single core.

But for the SETI Beta/Astropulse example, the figures are 268 hours estimate/134 hours actual computation time (RDCF permanently messed up by the other SETI application), against 2 weeks deadline. So in the initial stages at least, there would not be enough time to slot in a second task on the same core as the first (after it completes), and still meet the deadline for that second task. This may be the point at which BOINC fails to take the availability of a second core into account: if the calculation was done on the basis that any hypothetical new task wouldn't start until the current one has finished, that would account for the behaviour I've observed.
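
A toy version of that hypothesis (my assumption about the behaviour, not the actual scheduler code): with the inflated 268-hour estimate and a two-week deadline, a serial 'new task waits behind the running one' check fails, while a check that knew about the idle second core would pass.

HOUR = 3_600.0

def fits_serially(remaining_secs, new_est_secs, deadline_secs):
    # Pessimistic: the new task is queued behind the running one.
    return remaining_secs + new_est_secs <= deadline_secs

def fits_on_idle_core(new_est_secs, deadline_secs):
    # Optimistic: the idle second core starts the new task immediately.
    return new_est_secs <= deadline_secs

remaining = 134 * HOUR      # actual time left on the running task
estimate  = 268 * HOUR      # BOINC's RDCF-inflated estimate for a new task
deadline  = 14 * 24 * HOUR  # two weeks

print(fits_serially(remaining, estimate, deadline))  # False -> no fetch
print(fits_on_idle_core(estimate, deadline))         # True  -> could fetch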

The machine is in a state of transition at the moment. The long-running Beta task has just finished, and I have about a day's worth of short-running tasks to work through, as per my cache preferences. I also have one brand-new long-running task, with initial timings so tight that it spent a couple of brief periods running at high priority. When the short tasks are exhausted, I'll watch to see if it fetches any more, or whether I'm in for several more days running under share allocation, watching STD bump against the stop and LTD climbing inexorably.
ID: 15806
John McLeod VII

Joined: 29 Aug 05
Posts: 147
Message 15809 - Posted: 11 Mar 2008, 20:37:27 UTC - in response to Message 15806.  

I'd been wondering whether CPDN had any sort of 'special case' treatment in the work fetch algorithm.

I have a second machine, a quad core, which had got itself into a very similar situation. I ran it 100% on Einstein for a while, as a timing test for a new Beta application: other projects were flushed down to zero tasks remaining, but CPDN had a single part-crunched task in suspended animation so LTD grew to a massive value. Then, when I opened up the machine to other projects at the end of the test, Einstein was inhibited from fetching work, the single CPDN task (controlled by NNT) wasn't enough to use up the resource share, and STD grew as well.

I ended up with CPDN in a very similar state to the SETI Beta saga I described earlier in this thread: STD up against the stop at 86,400: LTD somewhere in the 400,000s: one task in progress: additional cores available. But this time, when I released NNT, BOINC initiated a CPDN work fetch within a few seconds.

I'm wondering whether there's anything in the work fetch calculation which considers the size of the running task against the time remaining before deadline.

In CPDN's case, those figures are about 96 days to complete a full task, against a 30-month deadline. So there is plenty of time available to complete two tasks, even if for some reason they have to follow each other on a single core.

But for the SETI Beta/Astropulse example, the figures are 268 hours estimate/134 hours actual computation time (RDCF permanently messed up by the other SETI application), against 2 weeks deadline. So in the initial stages at least, there would not be enough time to slot in a second task on the same core as the first (after it completes), and still meet the deadline for that second task. This may be the point at which BOINC fails to take the availability of a second core into account: if the calculation was done on the basis that any hypothetical new task wouldn't start until the current one has finished, that would account for the behaviour I've observed.

The machine is in a state of transition at the moment. The long-running Beta task has just finished, and I have about a day's worth of short-running tasks to work through, as per my cache preferences. I also have one brand-new long-running task, with initial timings so tight that it spent a couple of brief periods running at high priority. When the short tasks are exhausted, I'll watch to see if it fetches any more, or whether I'm in for several more days running under share allocation, watching STD bump against the stop and LTD climbing inexorably.

Based on your description, you are in for a bit of running other projects on the other cores. When it is clear to BOINC that it is safe to download more S@H Beta work, you will get more.

BOINC WIKI
ID: 15809
Nicolas

Joined: 19 Jan 07
Posts: 1179
Argentina
Message 15813 - Posted: 11 Mar 2008, 22:23:08 UTC - in response to Message 15789.  

Thanks for confirming that, John McLeod VII. The CPDN project, though supposedly having an open deadline, does not have one as far as BOINC is concerned, and effectively shuts out other projects and becomes plain greedy. If CPDN gives out work with a deadline far enough away that it will stick to its Resource Share all the way through, I'd consider computing there again.

It does stick to resource share, but over the long term. If you have it on a 50/50 share with another project, on a slow computer that may mean 6 months of CPDN and then 6 months of the other project, instead of switching every hour as you would expect/want.
ID: 15813
MikeMarsUK

Joined: 16 Apr 06
Posts: 386
United Kingdom
Message 15814 - Posted: 11 Mar 2008, 22:45:25 UTC - in response to Message 15813.  

... If CPDN gives out work with a deadline far enough away that it will stick to its Resource Share all the way through, I'd consider computing there again.



CPDN recently tried to issue some models with a long deadline, but this caused big problems - older versions of the BOINC manager couldn't cope with a deadline more than 1000 days away (the deadline was reset to 1901).
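
If the mechanism is a signed 32-bit time_t overflowing (my guess - not confirmed), the 1901 date falls straight out of the arithmetic:

import datetime

# Epoch minus 2**31 seconds: where a signed 32-bit time_t wraps to.
wrapped = datetime.datetime(1970, 1, 1) + datetime.timedelta(seconds=-2**31)
print(wrapped)  # 1901-12-13 20:45:52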


ID: 15814
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5081
United Kingdom
Message 15862 - Posted: 13 Mar 2008, 11:09:47 UTC

Well, much to my surprise, I woke up this morning to find I had a cache of SETI Beta work to keep my second allocated core working.

Well, not quite a second allocated core. Apart from the 'upgrade' to BOINC v5.10.45, there has been one other change in the experimental conditions - LHC has started sending out lots of work, so the allocation is now 2/9 (22.22%), instead of 2/8 (25%).

So this is what I saw this morning:


[two screenshots of the BOINC Manager task list - direct links in the original post]

That's SEVEN SETI Beta tasks running at the same time, including the Astropulse one at the top. Then they task-switched, and now I'm on SIX SETI main and two CPDN.

I didn't get the debt values when they downloaded (I was asleep at the time), but when they switched out after an hour's run, Beta was at STD -11,650, LTD +645,133; the next highest LTD on the system is LHC at +299,351.

The log is curious. I ran dry on the second core at 05:33:13, but it didn't fetch any work for three hours - in fact, it actually reported six tasks (at 4.5 hours each, that's a day's work) and requested nothing.

Then it started requesting about 1.5 hours of work, and receiving a 4.5-hour task, again and again and again. Here's the filtered log:

SETI@home Beta Test 13/03/2008 05:33:13 Computation for task 27mr07ag.15695.11524.3.10.182_2 finished
SETI@home Beta Test 13/03/2008 05:33:15 Started upload of 27mr07ag.15695.11524.3.10.182_2_0
SETI@home Beta Test 13/03/2008 05:33:22 Finished upload of 27mr07ag.15695.11524.3.10.182_2_0
SETI@home Beta Test 13/03/2008 05:51:31 Sending scheduler request: To report completed tasks. Requesting 0 seconds of work, reporting 6 completed tasks
SETI@home Beta Test 13/03/2008 05:51:37 Scheduler request succeeded: got 0 new tasks
SETI@home Beta Test 13/03/2008 08:36:21 Sending scheduler request: To fetch work. Requesting 7502 seconds of work, reporting 0 completed tasks
SETI@home Beta Test 13/03/2008 08:36:26 Scheduler request succeeded: got 1 new tasks
SETI@home Beta Test 13/03/2008 08:36:28 Started download of 27mr07ag.15695.15205.3.10.180
SETI@home Beta Test 13/03/2008 08:36:34 Finished download of 27mr07ag.15695.15205.3.10.180
SETI@home Beta Test 13/03/2008 08:41:05 Sending scheduler request: To fetch work. Requesting 4401 seconds of work, reporting 0 completed tasks
SETI@home Beta Test 13/03/2008 08:41:10 Scheduler request succeeded: got 1 new tasks
SETI@home Beta Test 13/03/2008 08:41:12 Started download of 27mr07ag.15695.15205.3.10.168
SETI@home Beta Test 13/03/2008 08:41:18 Finished download of 27mr07ag.15695.15205.3.10.168
SETI@home Beta Test 13/03/2008 08:41:26 Sending scheduler request: To fetch work. Requesting 5338 seconds of work, reporting 0 completed tasks
SETI@home Beta Test 13/03/2008 08:41:31 Scheduler request succeeded: got 1 new tasks
SETI@home Beta Test 13/03/2008 08:41:33 Started download of 27mr07ag.15695.15205.3.10.182
SETI@home Beta Test 13/03/2008 08:41:39 Finished download of 27mr07ag.15695.15205.3.10.182
SETI@home Beta Test 13/03/2008 08:41:46 Sending scheduler request: To fetch work. Requesting 5394 seconds of work, reporting 0 completed tasks
SETI@home Beta Test 13/03/2008 08:41:51 Scheduler request succeeded: got 1 new tasks
SETI@home Beta Test 13/03/2008 08:41:53 Started download of 27mr07ag.15695.15205.3.10.137
SETI@home Beta Test 13/03/2008 08:41:59 Finished download of 27mr07ag.15695.15205.3.10.137
SETI@home Beta Test 13/03/2008 08:42:07 Sending scheduler request: To fetch work. Requesting 5450 seconds of work, reporting 0 completed tasks
SETI@home Beta Test 13/03/2008 08:42:12 Scheduler request succeeded: got 1 new tasks
SETI@home Beta Test 13/03/2008 08:42:14 Started download of 27mr07ag.15695.15205.3.10.193
SETI@home Beta Test 13/03/2008 08:42:20 Finished download of 27mr07ag.15695.15205.3.10.193
SETI@home Beta Test 13/03/2008 08:42:27 Sending scheduler request: To fetch work. Requesting 5510 seconds of work, reporting 0 completed tasks
SETI@home Beta Test 13/03/2008 08:42:32 Scheduler request succeeded: got 1 new tasks
SETI@home Beta Test 13/03/2008 08:42:34 Started download of 27mr07ag.15695.15205.3.10.96
SETI@home Beta Test 13/03/2008 08:42:40 Finished download of 27mr07ag.15695.15205.3.10.96
SETI@home Beta Test 13/03/2008 08:42:47 Sending scheduler request: To fetch work. Requesting 5565 seconds of work, reporting 0 completed tasks
SETI@home Beta Test 13/03/2008 08:42:52 Scheduler request succeeded: got 1 new tasks
SETI@home Beta Test 13/03/2008 08:42:55 Started download of 27mr07ag.15695.15205.3.10.176
SETI@home Beta Test 13/03/2008 08:43:01 Finished download of 27mr07ag.15695.15205.3.10.176
SETI@home Beta Test 13/03/2008 08:43:08 Sending scheduler request: To fetch work. Requesting 5619 seconds of work, reporting 0 completed tasks
SETI@home Beta Test 13/03/2008 08:43:13 Scheduler request succeeded: got 1 new tasks
SETI@home Beta Test 13/03/2008 08:43:15 Started download of 27mr07ag.15695.15205.3.10.147
SETI@home Beta Test 13/03/2008 08:43:21 Finished download of 27mr07ag.15695.15205.3.10.147
SETI@home Beta Test 13/03/2008 08:43:28 Sending scheduler request: To fetch work. Requesting 2093 seconds of work, reporting 0 completed tasks
SETI@home Beta Test 13/03/2008 08:43:33 Scheduler request succeeded: got 1 new tasks
SETI@home Beta Test 13/03/2008 08:43:35 Started download of 27mr07ag.15695.15205.3.10.190
SETI@home Beta Test 13/03/2008 08:43:41 Finished download of 27mr07ag.15695.15205.3.10.190
SETI@home Beta Test 13/03/2008 08:45:14 Starting 27mr07ag.15695.15205.3.10.180_2
SETI@home Beta Test 13/03/2008 08:45:14 Starting task 27mr07ag.15695.15205.3.10.180_2 using setiathome_enhanced version 600
SETI@home Beta Test 13/03/2008 09:24:37 Starting 27mr07ag.15695.15205.3.10.168_2
SETI@home Beta Test 13/03/2008 09:24:37 Starting task 27mr07ag.15695.15205.3.10.168_2 using setiathome_enhanced version 600
SETI@home Beta Test 13/03/2008 09:25:27 Starting 27mr07ag.15695.15205.3.10.182_1
SETI@home Beta Test 13/03/2008 09:25:27 Starting task 27mr07ag.15695.15205.3.10.182_1 using setiathome_enhanced version 600
SETI@home Beta Test 13/03/2008 09:25:33 Starting 27mr07ag.15695.15205.3.10.137_2
SETI@home Beta Test 13/03/2008 09:25:33 Starting task 27mr07ag.15695.15205.3.10.137_2 using setiathome_enhanced version 600
SETI@home Beta Test 13/03/2008 09:26:23 Starting 27mr07ag.15695.15205.3.10.193_2
SETI@home Beta Test 13/03/2008 09:26:23 Starting task 27mr07ag.15695.15205.3.10.193_2 using setiathome_enhanced version 600
SETI@home Beta Test 13/03/2008 09:28:12 Starting 27mr07ag.15695.15205.3.10.96_2
SETI@home Beta Test 13/03/2008 09:28:12 Starting task 27mr07ag.15695.15205.3.10.96_2 using setiathome_enhanced version 600
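
One reading of that pattern (an assumption on my part, not confirmed client or server logic): the client requests only its momentary shortfall in seconds, but the scheduler can only grant whole tasks, so every reply overshoots the request by most of a 4.5-hour task, and the cycle repeats until the cache limit is hit.

import math

TASK_EST_SECS = 4.5 * 3_600  # one Beta task, per the estimates above

def grant(request_secs):
    # The server rounds a seconds request up to whole tasks.
    tasks = max(1, math.ceil(request_secs / TASK_EST_SECS))
    return tasks, tasks * TASK_EST_SECS

for req in (7502, 4401, 5338):  # request sizes from the log above
    n, secs = grant(req)
    print(f"asked {req:>5}s -> granted {n} task(s), {secs:.0f}s")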
ID: 15862
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5081
United Kingdom
Message 15864 - Posted: 13 Mar 2008, 12:17:56 UTC - in response to Message 15863.  

Are you using the 'Connect every X days' setting to buffer, or the 'Additional Buffer X Days' (aka 'Cache') option?

Settings are 0.01 'Connect every', 1.0 days 'Additional Cache'.
ID: 15864
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5081
United Kingdom
Message 16003 - Posted: 19 Mar 2008, 19:23:10 UTC

Cross-referencing to the SETI thread boinc resource share, where I've posted some data-logged graphs of BOINC STD and (indirectly) task switching. It doesn't immediately answer the opening question about work fetch (although my data logger tracks LTD as well), but it certainly demonstrates that the current BOINC isn't as efficient as it could be on a multicore.
ID: 16003