Dual core

Message boards : BOINC client : Dual core

Mike Gibson
Joined: 5 Mar 08
Posts: 24
United Kingdom
Message 15668 - Posted: 5 Mar 2008, 0:50:01 UTC

Hi, folks

I get the impression that when BOINC checks whether there is enough time available to complete the offered units, it does not allow for multiple cores and assumes that there is only one core operational. Am I correct?

I seem to have had units refused when I reckon that there would have been time after allowing for dual cores. Also, this assessment does not seem to allow for other projects being out of units, thereby increasing the % available for the remaining projects!

Will these anomalies be corrected soon?

Cheers

Mike
ID: 15668

rroonnaalldd
Joined: 7 Jan 08
Posts: 31
Germany
Message 15675 - Posted: 5 Mar 2008, 12:06:21 UTC - in response to Message 15668.  

Nope. How many processors have you allowed BOINC to use? I think you have told BOINC to use only one. Check your preferences and set it to use 2 cores...

ID: 15675

Mike Gibson
Joined: 5 Mar 08
Posts: 24
United Kingdom
Message 15680 - Posted: 5 Mar 2008, 19:36:02 UTC - in response to Message 15675.  

That is what it looks like but the settings are for 100% of 2 processors, so that can't be the reason!

Mike
ID: 15680

Les Bayliss
Help desk expert
Joined: 25 Nov 05
Posts: 1654
Australia
Message 15681 - Posted: 5 Mar 2008, 20:00:54 UTC

Mike

That is what it looks like but the settings are for 100% of 2 processors, so that can't be the reason!

You're not, perchance, wanting several cores to work together on a single Work Unit (WU), are you?

How many cores do you have?
And how many cores have you told BOINC it can use?
(This is the setting "On multiprocessors, use at most", which refers to the number of cores.)
And it's 100% of THIS NUMBER OF CORES, not 100% of the total number in the cpu.

ID: 15681

Mike Gibson
Joined: 5 Mar 08
Posts: 24
United Kingdom
Message 15682 - Posted: 5 Mar 2008, 21:11:19 UTC - in response to Message 15681.  

Les

I have 2 cores and BOINC runs 2 WU's at any one time. The problem is that when BOINC is checking whether any potential new WU's can be completed within deadline, it doesn't allow for the fact that 10% of 2 cores is equivalent to 20% of 1 core and just uses 10% for the test and therefore thinks that the work will take twice as long as it actually does.
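A rough sketch of the arithmetic behind that suspicion. This is not BOINC's actual code: the function name and the 10-hour task are made up for illustration, and the single-core case simply models the behaviour being suspected here.

```python
def estimated_wall_hours(cpu_hours, resource_share, ncores=1):
    """Wall-clock hours for one task, given the project's share of the machine.

    resource_share is the project's fraction of total CPU capacity (0..1).
    A single task cannot use more than one core, hence the cap at 1.0.
    Hypothetical helper, for illustration only.
    """
    core_equivalents = min(1.0, resource_share * ncores)
    return cpu_hours / core_equivalents


# A 10 CPU-hour task for a project with a 10% share:
single = estimated_wall_hours(10, 0.10, ncores=1)  # share treated as 10% of one core
dual = estimated_wall_hours(10, 0.10, ncores=2)    # share treated as 10% of two cores
print(single, dual)
```

If the deadline check used the first figure on a dual-core machine, the task would look as if it needed twice the wall-clock time it really does.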

Mike
ID: 15682

Les Bayliss
Help desk expert
Joined: 25 Nov 05
Posts: 1654
Australia
Message 15683 - Posted: 5 Mar 2008, 21:24:03 UTC

It's a bit complicated, but as far as I know, BOINC uses things like the Result duration correction factor for a project/computer to see how long a WU has taken in the past. (And some other similar factors.)

It's probably in the Wiki somewhere. No doubt one of the more knowledgeable will show up soon with better detail.

ID: 15683

Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15480
Netherlands
Message 15689 - Posted: 5 Mar 2008, 23:33:24 UTC - in response to Message 15683.  

It's a bit complicated, but as far as I know, BOINC uses things like the Result duration correction factor for a project/computer to see how long a WU has taken in the past. (And some other similar factors.)

Better said, to learn how long tasks take in general. Different task run-times can throw the number off. It should be 1 or below.

Where to find it? In your computers list, hostID details, at the bottom of the page.
ID: 15689

Mike Gibson
Joined: 5 Mar 08
Posts: 24
United Kingdom
Message 15690 - Posted: 6 Mar 2008, 0:00:10 UTC - in response to Message 15689.  

Hi, Jord

The RDCF is 1.195342. Hasn't that already been incorporated in the Time to Completion? I had been doing my comparison of what was possible using the Time to Completion, which seems to have been fairly accurate. A 20% shift on that would not have been sufficient to cause the problem.

Regards

Mike
ID: 15690

Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15480
Netherlands
Message 15691 - Posted: 6 Mar 2008, 0:27:40 UTC - in response to Message 15690.  

The RDCF is 1.195342. Hasn't that already been incorporated in the Time to Completion?

That's quite a healthy number.
It has been incorporated into the time to completion, but it changes during and after each individual task. So if you have one task that runs for an hour, while the estimate was 2 hours, the RDCF goes down. If then the next task takes 3 hours, where the ETC is again 2 hours, it goes back up.
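To illustrate the up-and-down movement, here is a minimal sketch that treats the correction factor as a simple weighted average of observed "actual / raw estimate" ratios. The smoothing weight of 0.1 and the function name are assumptions for illustration; the real client's update rule may well differ.

```python
def update_dcf(dcf, raw_estimate_hours, actual_hours, weight=0.1):
    """Nudge the duration correction factor towards the observed ratio.

    Hypothetical smoothing rule, NOT taken from the BOINC source.
    raw_estimate_hours is the estimate before the factor is applied.
    """
    ratio = actual_hours / raw_estimate_hours
    return (1 - weight) * dcf + weight * ratio


dcf = 1.195342                  # the RDCF quoted in this thread
raw = 2.0 / dcf                 # raw estimate behind a displayed 2-hour estimate
after_fast = update_dcf(dcf, raw, 1.0)          # task took 1 hour: factor drops

raw_next = 2.0 / after_fast     # next task is again displayed as 2 hours
after_slow = update_dcf(after_fast, raw_next, 3.0)  # task took 3 hours: factor rises

print(round(after_fast, 4), round(after_slow, 4))
```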

The biggest question is, on which project do you see this? (Or projects, if more).
ID: 15691

Richard Haselgrove
Volunteer tester
Help desk expert
Joined: 5 Oct 06
Posts: 5081
United Kingdom
Message 15699 - Posted: 6 Mar 2008, 13:48:08 UTC
Last modified: 6 Mar 2008, 13:54:24 UTC

Came across this thread while pondering a work-fetch problem. I think it's a related issue.

Bear with me, this is going to get complicated.

I have 8 cores. Neglecting projects which currently have no work or are disabled, they have been allocated:

Share___Project__________STD____________LTD
200 ... CPDN ........ -6,253 ... -1,129,611
100 ... Einstein ... -21,185 ....... 74,984
300 ... SETI ....... -14,491 ....... 52,856
200 ... SETI Beta ... 41,931 ...... 495,423

Those shares add up to 800, so the cores should be working 2, 1, 3, 2 on the four projects.
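The split can be checked with a few lines of arithmetic. This is only a sketch of the proportional calculation using the shares from the table above; the variable names are illustrative.

```python
# Resource shares from the table above (active projects only).
shares = {"CPDN": 200, "Einstein": 100, "SETI": 300, "SETI Beta": 200}
ncores = 8

total = sum(shares.values())
# Each project's proportional slice of the machine, expressed in cores.
cores = {name: ncores * share / total for name, share in shares.items()}

print(total)
print(cores)
```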

By the debt figures, SETI Beta should be the next to both fetch work and schedule a task. Yet I have only one SETI Beta task on board, and BOINC shows no sign of fetching another one.

The single SETI Beta task I have is an Astropulse WU, currently estimating 202 hours remaining for a 19 March deadline. Tight, but with a 0.9888 efficiency for the machine as a whole, and a low CI/cache, there's no deadline pressure. In fact, it will finish in plenty of time, but BOINC doesn't know that (SETI Beta RDCFs are confused by having two separate applications under test).

From previous experience - this has been nagging me for some time - BOINC will only start to fetch Beta work for a second core when the time remaining on the AP task has dropped below my cache setting (1 day). By then, STD will have bumped up against the limit stop of 86,400, and LTD will be up around 850,000 - both figures have gone up by over 1,000 since I started typing this.

As Mike Gibson says, BOINC doesn't seem to be correctly allowing for the availability of extra cores when making work fetch calculations/decisions.

My BOINC - as Ageless well knows - is my old faithful v5.10.13 for Windows: but I've checked the change log, and there's no report of any adjustment to this part of the scheduler since then.

Edit - I've turned on <work_fetch_debug> in cc_config.xml - let's see what that throws up.
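For anyone wanting to do the same, the flag goes inside a log_flags section of cc_config.xml in the BOINC data directory. A small sketch that builds the file contents; the helper name is made up, and the client has to re-read the file (for example on restart) before the flag takes effect.

```python
import xml.etree.ElementTree as ET


def build_cc_config(flags):
    """Return cc_config.xml contents with the given log flags switched on.

    Hypothetical helper for illustration; each flag becomes <flag>1</flag>
    inside <cc_config><log_flags>...</log_flags></cc_config>.
    """
    root = ET.Element("cc_config")
    log_flags = ET.SubElement(root, "log_flags")
    for name in flags:
        ET.SubElement(log_flags, name).text = "1"
    return ET.tostring(root, encoding="unicode")


xml_text = build_cc_config(["work_fetch_debug"])
print(xml_text)
```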

At the moment, it's saying:

06/03/2008 13:52:03||[work_fetch_debug] compute_work_requests(): cpu_shortfall 0.000000, overall urgency OK
06/03/2008 13:52:03|SETI@home Beta Test|[work_fetch_debug] project has no shortfall; skipping
ID: 15699

Mike Gibson
Joined: 5 Mar 08
Posts: 24
United Kingdom
Message 15700 - Posted: 6 Mar 2008, 20:08:22 UTC - in response to Message 15691.  

I am mostly getting it on LHC. I wouldn't worry about any others because they have all utilised LHC time and catch up eventually. LHC only has work for brief periods so it is particularly galling to be refused work unnecessarily when there is a hefty debt. LHC, of course, never catches up.

By the way, where can I find the official debt figures? I have done my own figures to check.

Regards

Mike

ID: 15700

Mike Gibson
Joined: 5 Mar 08
Posts: 24
United Kingdom
Message 15701 - Posted: 6 Mar 2008, 22:19:14 UTC - in response to Message 15699.  

Richard

I have a 50% allocation to CPDN and with 2 cores, it seemed odd that BOINC wouldn't let CPDN have 1 core and divide up the other core between my other projects. Instead CPDN had 2 WUs part of the time and none at others. I thought this must be wasting computing time with all the stopping and starting, so at one point I was forcing it by setting CPDN to not get more tasks and by suspending the second task. This seemed to work well and could probably work for 8 cores but it would require a lot of monitoring. However, I had to abandon that approach when I started doing LHC which only has work available part-time. Sometimes the WUs are finished in seconds, but BOINC has supplied as many as 40 WUs over a period to keep my machine crunching as long as possible.

Cheers

Mike
ID: 15701

mo.v
Joined: 13 Aug 06
Posts: 778
United Kingdom
Message 15702 - Posted: 7 Mar 2008, 0:20:00 UTC

I don't think BOINC ever designates a specific core for a particular project or task.
ID: 15702

Mike Gibson
Joined: 5 Mar 08
Posts: 24
United Kingdom
Message 15703 - Posted: 7 Mar 2008, 1:36:29 UTC - in response to Message 15702.  

Mo

That was what we were saying, but also that it is possible to manually force it to happen.

However, the main issue was that BOINC is not allowing for multi cores correctly in assessing WU fetches.

Mike
ID: 15703

Mike Gibson
Joined: 5 Mar 08
Posts: 24
United Kingdom
Message 15704 - Posted: 7 Mar 2008, 2:23:07 UTC - in response to Message 15699.  

Richard

I have just realised that BOINC is allowing for your projects that are out of work in case work becomes available by the next fetch, so your total is more than 800. However, the debt aspect should kick in, but it is unlikely to be continuous until the debt is cleared. The incorrect allowance for multicores in the fetch calculations could be why the debt is not being recovered.

Try suspending other projects one at a time and doing an update on the project with the debt until BOINC asks for the right WU. Then reinstate the projects that you suspended. You may find that High Priority labels are assigned to some WUs again because of the incorrect allowance for multicores. However these are likely to be short-lived and you should find that the debt eases gradually. The High Priority labels could return for reducing periods for a while.

Mike
ID: 15704

Richard Haselgrove
Volunteer tester
Help desk expert
Joined: 5 Oct 06
Posts: 5081
United Kingdom
Message 15708 - Posted: 7 Mar 2008, 9:31:09 UTC
Last modified: 7 Mar 2008, 9:31:28 UTC

Mike,

I've been round the blocks a few times with BOINC, and I think I've used most of the tricks you describe. My debt values are deliberately skewed: I run with a baseline of -1,000,000 for CPDN (and No New Tasks set as well) so I will never download a new CPDN task without it being a deliberate choice on my part: and a baseline of +200,000 for all other active projects, so they should never be inhibited from fetching work by the "LTD less than negative of task switch interval" rule. And as you say, there are debt values for CPDN Beta, LHC and Orbit (all of which have been inactive for the last few hours) which I left out for clarity - that's why the figures don't balance to zero as they should.

LHC is a special case. In order to distribute the limited amount of available work fairly among the crunchers, they have unusually strict rules about issuing work (maximum two tasks per request, no repeat request for 15 minutes, quota 10 per CPU per day, quota limited to 4 CPUs). I think any problem with LHC is more likely to be supply restriction, rather than BOINC request calculations.

You asked about checking debt figures. I know of two tools: BoincDV, which displays a snapshot of the current values on request (and can also reset them to zero if you wish): and BoincView, which can monitor/control/log multiple BOINC hosts across a network. BoincView doesn't display debt by default, but you can select columns for both STD and LTD on the projects tab - they update in real time, every 60 seconds. (Incidentally, you can find a bug report from me - Trac ticket #136 - on these pages saying that debt isn't updated when networking is suspended, and a contradiction from David Anderson saying that debt is only updated once an hour anyway. He was wrong).

Anyway, back to the problem at hand. The current values from BV are:

SETI Beta - STD 86,400 LTD 559,335

BOINC believes that the single SETI Beta task running on the host will finish in 6 days' time, on 13 March - deadline is 19 March. It has downloaded, completed and reported one additional short (1 hour) task overnight, and requested 40 seconds or so of work three more times: each time, it then asked for Einstein work instead, and stopped asking for SETI Beta work (SETI Beta is out of work at the moment, which confuses things - but BOINC didn't even continue trying). Einstein is at STD -44,969 LTD +43,810 cache 30 hours.

So I think you're right: there is a bug in the work fetch algorithm for dual-cores and upwards, but it's obscure and I don't think I can put it into clear enough language to make it an issue for the main BOINC development team. However, if we could attract the attention of John McLeod VII (who wrote this particular bit of code) to this thread, he might be more interested.
ID: 15708

John McLeod VII
Joined: 29 Aug 05
Posts: 147
Message 15709 - Posted: 7 Mar 2008, 12:12:47 UTC

I don't have all of the information, but it is entirely possible that BOINC is blocking downloads from one of the projects because of time pressure. If BOINC cannot finish a task within 90% of the time from now to the computation deadline based on round robin scheduling, it is in time trouble. The computation deadline is the report deadline - (Connect every X + task switch interval + safety margin). The safety margin is either 24 hours or 0 depending on the version of BOINC. This is to avoid making an already bad situation worse. A project that has a task under time pressure is not allowed to download more work.

If there is sufficient work to cover all of the cores of a machine for the queue time, and there is any task with deadline pressure, no project will be asked for work. Again, no sense in making a bad situation worse.

If a project has enough work to cover its share of the work queue, it will not be asked for work. It already has its share downloaded. Of course, if there is no task with time pressure, enough work from someplace will be downloaded to keep the queue full for all cores. It just may not be the exact mix that you thought you specified.

BOINC WIKI
ID: 15709

Richard Haselgrove
Volunteer tester
Help desk expert
Joined: 5 Oct 06
Posts: 5081
United Kingdom
Message 15710 - Posted: 7 Mar 2008, 12:54:42 UTC - in response to Message 15709.  
Last modified: 7 Mar 2008, 12:55:26 UTC

Thanks for dropping by.

I don't have all of the information, but it is entirely possible that BOINC is blocking downloads from one of the projects because of time pressure. If BOINC cannot finish a task within 90% of the time from now to the computation deadline based on round robin scheduling, it is in time trouble. The computation deadline is the report deadline - (Connect every X + task switch interval + safety margin). The safety margin is either 24 hours or 0 depending on the version of BOINC. This is to avoid making an already bad situation worse. A project that has a task under time pressure is not allowed to download more work.

Plugging figures into the formula, I get:

Computation deadline = [19 March 2008 07:06:50] minus (0.01 day + 1 hour + 24 hours)

or about 05:52 on 18 March 2008.

Current completion estimate is 12:28:16 on 13 March 2008 (and dropping), so I think we can rule out time pressure.
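The same sums as a small sketch, applying the formula and the 90% rule quoted above. The function names are illustrative, and "now" is assumed to be roughly the time of this post; none of this is taken from the BOINC source.

```python
from datetime import datetime, timedelta


def computation_deadline(report_deadline, connect_interval, switch_interval, safety_margin):
    """Report deadline minus (connect interval + task switch interval + safety margin)."""
    return report_deadline - (connect_interval + switch_interval + safety_margin)


def in_time_trouble(now, estimated_finish, comp_deadline, fraction=0.9):
    """True if the task needs more than `fraction` of the time left to the deadline."""
    return (estimated_finish - now) > fraction * (comp_deadline - now)


report_deadline = datetime(2008, 3, 19, 7, 6, 50)
comp_deadline = computation_deadline(
    report_deadline,
    timedelta(seconds=864),   # connect interval: 0.01 day
    timedelta(hours=1),       # task switch interval
    timedelta(hours=24),      # safety margin
)

now = datetime(2008, 3, 7, 12, 54, 42)            # assumption: time of posting
estimated_finish = datetime(2008, 3, 13, 12, 28, 16)

print(comp_deadline)
print(in_time_trouble(now, estimated_finish, comp_deadline))
```

On these figures the computation deadline comes out at 05:52:26 on 18 March, and the task is well inside the 90% limit.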

If there is sufficient work to cover all of the cores of a machine for the queue time, and there is any task with deadline pressure, no project will be asked for work. Again, no sense in making a bad situation worse.

Work, in general, is being fetched as expected. Requests are regularly going out for typically 70 seconds of SETI general work (cache currently at 2 days 17 hours, against 1 day x 3 cores requested), and around 62,500 seconds of each of LHC and Orbit (caches currently empty).

If a project has enough work to cover its share of the work queue, it will not be asked for work. It already has its share downloaded. Of course, if there is no task with time pressure, enough work from someplace will be downloaded to keep the queue full for all cores. It just may not be the exact mix that you thought you specified.

The question is, why is no work being requested for SETI Beta, which has by far the highest debt values (both STD and LTD), no unstarted work, and only one active task against a requested resource share of 2/8?

I can only explain that by assuming that BOINC is looking at the "6 days to completion" figure for the current task, converting that into 3 days x 2 cores resource share as the measure of work available, and concluding that there is sufficient to meet my request of 1 day additional cache.

What BOINC seems not to be taking into account is that the 6 day completion figure is a single monolithic task, and cannot be processed across two cores as resource share would specify. I think it should.
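A minimal sketch of the two ways of counting, to make the suspicion concrete. Both functions are hypothetical illustrations of the reasoning in this post (pooling a project's remaining work across its share of cores, versus pinning each task to one core); neither is taken from the BOINC source.

```python
def shortfall_pooled(task_days, cores, cache_days):
    """Shortfall in core-days if remaining work is spread evenly over the cores."""
    covered_days = sum(task_days) / cores
    return max(0.0, cache_days - covered_days) * cores


def shortfall_per_core(task_days, cores, cache_days):
    """Shortfall in core-days if each task can only occupy a single core."""
    loads = [0.0] * cores
    # Place the longest tasks first, each on the currently least-loaded core.
    for days in sorted(task_days, reverse=True):
        idx = loads.index(min(loads))
        loads[idx] += days
    return sum(max(0.0, cache_days - load) for load in loads)


tasks = [6.0]  # one monolithic task with about 6 days remaining
pooled = shortfall_pooled(tasks, cores=2, cache_days=1.0)
per_core = shortfall_per_core(tasks, cores=2, cache_days=1.0)
print(pooled, per_core)
```

The pooled count sees 3 days of cover on each of 2 cores and reports no shortfall, which would match the "cpu_shortfall 0.000000" log line; the per-core count sees one core with nothing queued and reports a shortfall of 1 core-day.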
ID: 15710

Richard Haselgrove
Volunteer tester
Help desk expert
Joined: 5 Oct 06
Posts: 5081
United Kingdom
Message 15713 - Posted: 7 Mar 2008, 16:39:53 UTC - in response to Message 15712.  

Well, there is the more radical solution to go to the project tab, select each project that is allowed to fetch work, but has none in progress or buffer and reset those projects. That should right them for the most part but some residual LTD. OR ditch 5.10.13 and get on a current recommended version.... old trusty is up for long overdue retirement.... why else are we now on like 5.10.45.

As somebody just wrote in a message board far, far away .... this is a bug report.

Sure, I can work round it. Or I can ignore it. (Which I've been doing for weeks). But what I'm trying to do is to document the bug, so that somebody can fix it.

Apart from the (irrelevant) bugfix at v5.10.17, can you point to one bit of evidence that this bug has been corrected between v5.10.13 and v5.10.45?

Sigh. OK, I know: "Upgrade to the latest version, re-test, report again if the problem is still apparent."
ID: 15713

Richard Haselgrove
Volunteer tester
Help desk expert
Joined: 5 Oct 06
Posts: 5081
United Kingdom
Message 15714 - Posted: 7 Mar 2008, 17:03:47 UTC

OK, I've upgraded. v5.10.45 it is.

Current debt report:

Share___Project__________STD____________LTD
200 ... CPDN .......... 1280 ... -1,132,438
100 ... CPDN Beta ........ 0 ...... 200,000 (inactive)
100 ... Einstein ... -47,285 ....... 27,571
100 ... LHC .............. 0 ...... 307,296
100 ... Orbit ............ 0 ........ 9,009
300 ... SETI ....... -40,471 ........ 4,503
200 ... SETI Beta ... 86,400 ...... 584,057

Now please tell me why since the upgrade, BOINC has requested new work (in sequence) from SETI, Orbit (none available) and LHC (none available): but it has NOT requested any work from SETI Beta. SETI Beta project is active (running one task), and NOT in comms deferral. BUG.
ID: 15714

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.