resource sharing

Message boards : Questions and problems : resource sharing
BOINC Enthused

Joined: 29 Oct 05
Posts: 14
United States
Message 100002 - Posted: 19 Jul 2020, 19:27:22 UTC

Preliminaries:
I use BOINC 7.16.7 (x64) on Win10. BOINC is not a service.
My current projects are Rosetta and Einstein.
Related threads:
(1) Mine: https://setiathome.berkeley.edu/forum_thread.php?id=85155
(2) Mine: https://einsteinathome.org/content/processor-sharing
(3) Mine: https://setiathome.berkeley.edu/forum_thread.php?id=84954
(4) https://boinc.berkeley.edu/forum_thread.php?id=13802

Background: I suspect the BOINC Scheduler isn't functioning on my computer the way I understand it is advertised to function. I'm collecting data to test that suspicion.

Question: What can I do to achieve a controlled balance between projects?


Short Summary

I collected all lines from every BOINC project job log for the past 19 weeks. Accepting that each line represents BOINC's results for one "task" (aka work unit) and that the "fe" field records the actual processor time spent on the task in seconds, I totaled the times for tasks by project and between configuration changes. I expect this to approximately reflect the results of the BOINC Scheduler's decisions on my computer. The numbers do not inspire confidence that the BOINC Scheduler is working as advertised. I'll describe here my continuing attempts to find the balance among projects that I seek.
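
For anyone who wants to reproduce the tally, here is a minimal sketch of the kind of script involved (not the exact one I used). It assumes the job_log_<project>.txt files have been copied into a local "logs" folder and sums a per-task time field ("ct", the CPU-time entry, in this sketch); splitting the totals by date to match the configuration periods is left out.

# Minimal sketch: total per-task CPU seconds from BOINC job logs.
# Assumes job_log_*.txt files copied into ./logs; the field choice ("ct") is this sketch's assumption.
import glob
from collections import defaultdict

totals = defaultdict(float)   # log file -> summed seconds
counts = defaultdict(int)     # log file -> number of tasks

for path in glob.glob("logs/job_log_*.txt"):
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) < 3:
                continue
            # After the leading timestamp, the line is "key value" pairs: ue, ct, fe, nm, et, ...
            fields = dict(zip(parts[1::2], parts[2::2]))
            if "ct" in fields:
                totals[path] += float(fields["ct"])
                counts[path] += 1

for path in sorted(totals):
    print(f"{path}: {counts[path]} tasks, {totals[path]:.0f} CPU seconds")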

Background

I'll never make any BOINC leaderboard in terms of the amount of processing contributed, but I like contributing. My earliest contributions were to SETI before BOINC existed. I stayed on the original SETI application for a few months after BOINC started, to let BOINC harden a bit before I swapped over.

Before the period I report on here, I was overactive in tweaking configurations. More experienced BOINC folks correctly told me, "That way lies madness." Now I change configurations seldom and carefully, and I document each change.

My current computer uses an Intel Core i5 (8 threads). The Core Temp software (a recommended addition) told me I was running too hot, so I don't run it wide open.

I previously ran three projects: SETI, Rosetta, and Einstein. SETI no longer sends data; I run only Rosetta and Einstein. Perhaps SETI will send more data, so I'm leaving the computer configured to run SETI tasks, should any arrive.

My Goals

I wanted each project to get roughly an equal amount of time on my computer; more precisely, I want to be able to control the ratio of time the projects get.

My Experience

Before starting the current data-analysis effort, I felt confident the time wasn't equal. It seemed (I have no data to back this) that each project got roughly an equal number of tasks. SETI sent very small tasks. Einstein sent very large tasks in big groups that required far more than Einstein's intended share of the available computing to meet their due dates. In some cases, it looked like those groups required twice or more of Einstein's intended share.

Rosetta's tasks were between SETI's and Einstein's in size, but they also came in groups that required more than Rosetta's share of computing.

The Rosetta project offers a setting for the average size of the tasks it sends (a highly desirable attribute in a BOINC project). I adjusted it to my advantage.

I convinced myself that the BOINC "<rec_half_life_days>" default setting of 10 days is too low for current task sizes and task groups the Scheduler accepts.

A motivating example. Suppose a Scheduler has instructions to balance time equally between two projects. Suppose Project A sends a group of eight tasks, each of which needs eight thread-days to complete (eight tasks times eight thread-days is 64 thread-days, or eight full days on my eight-thread processor). One way the Scheduler can meet all goals is to alternate eight-day periods between the two projects. When Project A has the processor, it completes all the tasks in the group it sent. In theory, when Project B has the processor, Project A must wait its turn.

What happens in this example if Project A gets the processor first? Project B starts during day 9, but the processing done on day 1 is quickly disappearing from the Scheduler's "memory". (By day 11, it will be about half gone, by the definition of half-life.) Sometime well before Project B has received its full allotted time, the Scheduler will have "forgotten" enough of Project A's run that it perceives a balance between the projects. If it then accepts another group of eight tasks from Project A, each needing eight thread-days, it will presumably share the processor until Project B finishes the work it has already sent, and then give the processor entirely to Project A until its tasks are done. If Project A's tasks are at risk of being late, the Scheduler will not accept further Project B tasks. This cycle can continue indefinitely.
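
To make the "forgetting" concrete, here is a small numeric sketch of exponential decay with a 10-day half-life. It's a simplified model of the decay idea only, not BOINC's actual REC bookkeeping.

# Simplified decay model (not BOINC's actual REC code):
# weight remaining after d days = 0.5 ** (d / half_life_days)
half_life_days = 10.0

for day in (1, 5, 9, 11, 17, 21):
    elapsed = day - 1                          # days since the work was done on day 1
    remaining = 0.5 ** (elapsed / half_life_days)
    print(f"day {day:2d}: {remaining:6.1%} of the day-1 work is still 'remembered'")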

The example ought to be too extreme to occur in practice. Yet I believe I saw that behavior.

Now to collect data.

During some of the above experiences, I had the processor running wide open. It was always at 100% utilization and I had no way to know the processor temperature.

I installed "Core Temp" and realized I was harming my processor. I set the BOINC Computing Preferences to use at most 50% of the CPUs and at most 100% of the CPU time. Temperatures dropped, and the number of tasks accepted into my queue fell by about half.

As positive as the change was, the allocation among projects still seemed poor.

Configurations During the Period of This Data

Config #1: At the start of this data, the configuration was:
- Resource share ratio (SETI : Rosetta : Einstein) of 3000 : 1000 : 300.
- Store at least 0.04 days (about 1 hour) of work.
- Store up to 0.01 days (about 15 min) of additional work.
- GPU tasks off. (The on-board GPU on this processor isn't great for BOINC tasks.)
- "Use at most 50% of the CPUs."
- "Use at most 100% of CPU time."
- rec_half_life_days: 10 days (the default; no cc_config.xml entry).
- Concurrent tasks unrestricted.
- Switch between tasks every 360 minutes.

As to resource share: If the Scheduler works right, I'd expect SETI to get more time than Einstein in a ratio of 3000:300, and Rosetta to get more time than Einstein in a ratio of 1000:300. The ratios equate to percentages near SETI 70%, Rosetta 23%, and Einstein 7%.
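
The arithmetic behind those percentages is simply each share divided by the sum of the shares. A quick check, which also shows the two-project split that applies once SETI stops sending work:

# Resource shares -> expected long-term time fractions (share / sum of shares).
shares = {"SETI": 3000, "Rosetta": 1000, "Einstein": 300}

total = sum(shares.values())
for name, s in shares.items():
    print(f"{name:9s}: {100 * s / total:5.1f}%")    # SETI 69.8%, Rosetta 23.3%, Einstein 7.0%

# With SETI no longer sending work, the split between the remaining two projects:
remaining = {k: v for k, v in shares.items() if k != "SETI"}
total2 = sum(remaining.values())
for name, s in remaining.items():
    print(f"{name:9s}: {100 * s / total2:5.1f}%")   # Rosetta 76.9%, Einstein 23.1%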

Config #1a (a data change, not a configuration change): Seven weeks later, SETI stopped sending tasks. During those seven weeks, my computer didn't get enough SETI tasks to give the Scheduler a chance to balance SETI against the others, so this analysis draws no conclusions about SETI. Perhaps the balance between Rosetta and Einstein remains usable; both of those projects sent plenty of tasks.

Config #2: Two weeks later, I put
<rec_half_life_days>20</rec_half_life_days>

in the <options> section of cc_config.xml. This doubled the half-life of the Scheduler's accounting data from 10 days to 20.

Config #3: Ten weeks later (now), I put
<rec_half_life_days>40</rec_half_life_days>

in cc_config.xml. This doubled the half-life of scheduler data again, from 20 days to 40.

The Data

This data set has a lot of variation in it. One project ended. The others sent tasks, and groups of tasks, of varying sizes at varying intervals. My computer was powered on for varying amounts of time per day. My non-BOINC processing also varied.

Analysis at the granularity of a single week is too fine; too much variation remains. As I'll discuss, that's probably also true at three weeks.

A more sophisticated analysis could account for the variation in the data and support more assertive conclusions. This analysis is sufficient for my purpose.

The columns give the period length in weeks, processing in seconds per week, and percent of total processing.

Between    | weeks | sec/wk SETI | sec/wk Rosetta | sec/wk Einstein | sec/wk Total | pct SETI | pct Rosetta | pct Einstein
#1 to #1a  |     7 |   67882.334 |     329567.177 |      280842.696 |   678292.208 |    10.0% |       48.6% |        41.4%
#1a to #2  |     2 |       0.000 |     423968.460 |      301551.490 |   725519.950 |     0.0% |       58.4% |        41.6%
#2 to #3   |    10 |       0.000 |     518712.810 |      265800.734 |   784513.544 |     0.0% |       66.1% |        33.9%


Expected percentages:
Config #1: SETI 70%, Rosetta 23%, and Einstein 7%.
Later (after SETI stopped sending work): SETI 0%, Rosetta about 77%, and Einstein about 23% (from the 1000:300 resource shares).

Interpretation:
1. SETI was low because my computer did not get enough SETI tasks for the Scheduler to balance it against the other projects.
2. The first row uses the 10-day half-life. It probably demonstrates that the allocation was far from the configured 70%, 23%, 7%. At the same time, it shows some success in achieving a roughly equal balance between the projects that had work (Rosetta and Einstein).
3. The last row uses the 20-day half-life. The differences between row 3 and the earlier rows may demonstrate that the 20-day half-life helped the Scheduler make better decisions.

The last row in greater detail:

Between         | weeks | sec/wk Rosetta | sec/wk Einstein | sec/wk Total | pct Rosetta | pct Einstein
#2 to #3, early |     4 |     602745.672 |      296468.840 |   899214.511 |       67.0% |        33.0%
#2 to #3, mid   |     3 |     312289.830 |      154594.398 |   466884.228 |       66.9% |        33.1%
#2 to #3, late  |     3 |     613091.973 |      336116.262 |   949208.235 |       64.6% |        35.4%
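
As a sanity check that the two tables agree, the week-weighted average of the three sub-periods should reproduce the aggregated "#2 to #3" row, and it does:

# Week-weighted average of the sub-periods should match the aggregated "#2 to #3" row.
periods = [  # (weeks, sec/wk Rosetta, sec/wk Einstein)
    (4, 602745.672, 296468.840),
    (3, 312289.830, 154594.398),
    (3, 613091.973, 336116.262),
]

weeks = sum(w for w, _, _ in periods)
rosetta = sum(w * r for w, r, _ in periods) / weeks
einstein = sum(w * e for w, _, e in periods) / weeks
total = rosetta + einstein
print(f"{rosetta:.3f} + {einstein:.3f} = {total:.3f} sec/wk")   # ~518712.8 + ~265800.7 = ~784513.5
print(f"Rosetta {100 * rosetta / total:.1f}%, Einstein {100 * einstein / total:.1f}%")   # 66.1% / 33.9%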


Interpretation:
1. When aggregated (the "#2 to #3" row in the first table), this same data may demonstrate better performance with the 20-day half-life (closer to the expected split of roughly 77% and 23%).
2. When less aggregated (this table), the data appears to show declining performance across successive parts of the 10-week period. That's probably a coincidence due to variability in the underlying data; distinguishing a trend from noise will need additional long periods of data.

Next

I've started another period of data collection, this time with a half-life of 40 days. Otherwise, I'm holding the configuration unchanged. Sometime later, I intend to post another of my infrequent reports.
ID: 100002 · Report as offensive
Profile Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15478
Netherlands
Message 100003 - Posted: 19 Jul 2020, 20:04:43 UTC - in response to Message 100002.  

I convinced myself that the BOINC "<rec_half_life_days>" default setting of 10 days is too low for current task sizes and task groups the Scheduler accepts.
What do you expect <rec_half_life_days> to do, or what it stands for? What does it measure, in your opinion?
ID: 100003 · Report as offensive
BOINC Enthused

Joined: 29 Oct 05
Posts: 14
United States
Message 100004 - Posted: 19 Jul 2020, 23:20:15 UTC - in response to Message 100003.  

What do you expect <rec_half_life_days> to do, or what it stands for? What does it measure, in your opinion?

Well, you would certainly know about https://boinc.berkeley.edu/wiki/Client_configuration. It says, "A project's scheduling priority is determined by its estimated credit in the last X days. Default is 10; set it larger if you run long high-priority jobs."

That's all I "know". It's not much.

If it's consistent with the concept of a half-life, half of the information about past work will be lost over this period of time (default 10 days).

Said another way, a job of a given size completed at the end of such a period (that is, 10 days more recently) will have twice the effect on a current decision as the same job completed at the start of the period.

A common implementation of half-life could involve keeping track of past credit. Maintaining this "past credit" number might involve a daily multiplication by a factor and adding the day's credit.

If the "factor" above was (1/2)^(1/n), n multiplications would yield the same result as multiplying by 1/2.

Speculation: This past credit value could be in a file like client_state.xml. It would make sense to track it for each project, maybe in a field like host_expavg_credit.

The "exp" in the cited field name suggests involvement of an exponential process. The above is one.

If there's a place I can read for more detail, I'd love to know the place. If the <rec_half_life_days> field does something different than that, I'm eager to know it! Thanks in advance ...
ID: 100004 · Report as offensive
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5078
United Kingdom
Message 100011 - Posted: 20 Jul 2020, 9:24:16 UTC

<rec_half_life_days> adjusts the rate of change of the relative scheduling priorities of different projects. Explore the results with the <priority_debug> event log flag.
ID: 100011 · Report as offensive
Profile Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15478
Netherlands
Message 100013 - Posted: 20 Jul 2020, 9:35:54 UTC - in response to Message 100004.  

What do you expect <rec_half_life_days> to do, or what it stands for? What does it measure, in your opinion?

Well, you would certainly know about https://boinc.berkeley.edu/wiki/Client_configuration. It says, "A project's scheduling priority is determined by its estimated credit in the last X days. Default is 10; set it larger if you run long high-priority jobs."

That's all I "know". It's not much.
So if you know that, then why are you tracking the run time of tasks and not the amount of credit they gather?
Also, why do you double the value of rec_half_life_days? Neither Einstein nor Rosetta run tasks that take months to finish while running in high priority. If anything, you're better off lowering that value; many people set it to 1 immediately, and that works quite a bit better than 10.

As to where REC is being tracked, I suspect it's at the projects, because that makes it easier to keep safe than a user-editable file. But we're still looking into that.
ID: 100013 · Report as offensive
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5078
United Kingdom
Message 100015 - Posted: 20 Jul 2020, 10:17:57 UTC - in response to Message 100013.  

Background is at https://boinc.berkeley.edu/trac/wiki/ClientSchedOctTen#Proposal:credit-drivenscheduling. Note that we are using the second part of that proposal (recent estimated credit, REC, and its half-life), having rejected the use of actual (granted) credit as unreliable.
ID: 100015 · Report as offensive
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5078
United Kingdom
Message 100017 - Posted: 20 Jul 2020, 11:58:02 UTC - in response to Message 100013.  

REC is tracked in the client_state.xml file, i.e. locally. For example,

<project>
    <master_url>http://einstein.phys.uwm.edu/</master_url>
    <project_name>Einstein@Home</project_name>
    ...
    [snip ~20 lines]
    ...
    <rec>7688.584041</rec>
    <rec_time>1595244898.740450</rec_time>
    <resource_share>100.000000</resource_share>
    ...
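
For anyone who wants to inspect their own values, a minimal sketch (assuming the default Windows data directory; adjust the path for your installation) that prints each project's REC and resource share:

# Minimal sketch: list per-project REC and resource share from client_state.xml.
# The path is the default Windows BOINC data directory; adjust for your installation.
import xml.etree.ElementTree as ET

state = ET.parse(r"C:\ProgramData\BOINC\client_state.xml")
for proj in state.getroot().iter("project"):
    name = proj.findtext("project_name", default="?")
    rec = float(proj.findtext("rec", default="0"))
    share = float(proj.findtext("resource_share", default="0"))
    print(f"{name:25s} rec={rec:12.2f} resource_share={share:.0f}")
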
ID: 100017 · Report as offensive
Profile Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15478
Netherlands
Message 100018 - Posted: 20 Jul 2020, 12:37:56 UTC - in response to Message 100017.  

Thanks for tracking that down, Richard. I owe you a pint.
ID: 100018 · Report as offensive
Profile Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15478
Netherlands
Message 100027 - Posted: 20 Jul 2020, 20:38:49 UTC

Now that it's been fixed, what you can do is run your scenarios through the client simulator: https://boinc.berkeley.edu/trac/wiki/ClientSim
That way you don't have to run them in actual fact, but can simulate what happens. Do make sure that all projects are allowed to fetch work; my test just moments ago didn't have that, so it ran to a stop within 8 hours.
ID: 100027 · Report as offensive
BOINC Enthused

Joined: 29 Oct 05
Posts: 14
United States
Message 100029 - Posted: 21 Jul 2020, 3:59:34 UTC - in response to Message 100011.  

For Richard Haselgrove: This question has reached high levels indeed. Thanks for taking the time.

<rec_half_life_days> adjusts the rate of change of the relative scheduling priorities of different projects.


I'm unsure how to interpret this.

Maybe my understanding, explained earlier, is largely sound? And this statement means that higher values of <rec_half_life_days> slow "the rate of change of the relative scheduling priorities ..."?

If so, my understanding may be consistent with the statement. Said another way, scheduling information captured in REC decays to insignificance more slowly for higher values of <rec_half_life_days>?

Explore the results with the <priority_debug> event log flag.

(Perhaps someone else would like to answer this part, so that Richard needn't take more time on it? 🙏)

I'm unfamiliar with this flag and with the process for using it.

Perhaps I put
<priority_debug>1</priority_debug>
in the <log_flags> section of cc_config.xml. Perhaps that will activate project-scheduling messages I can later read in the Event Log?

Thanks in advance for any replies that come!
ID: 100029 · Report as offensive
BOINC Enthused

Joined: 29 Oct 05
Posts: 14
United States
Message 100030 - Posted: 21 Jul 2020, 5:54:55 UTC - in response to Message 100013.  

For Jord: Thanks for the kind replies. Well done.

So if you know that, then why are you tracking the run time of tasks and not the amount of credit they gather?


As I said, my goal is "to get roughly an equal amount of time [for each project] on my computer." Perhaps more accurately, I want to be able to control the ratio of time for the projects on my computer.

Credit is not part of my goal. At the same time, I don't object to using credit if it's the best proxy available for time.

In fact, I'm already using credit as a proxy for time. For Rosetta and Einstein, my current resource shares are 1000 and 300, respectively. I came up with those a while ago and I'm not sure of the method, but the data indicates they're my most successful technique for balancing time yet.

Perhaps a valuable result of the design document will be more incentive for projects to grant credit consistently. Obviously, there's a lot of room for improvement in this area, and that probably contributes to my allocation problems.

Also, why do you double the value of rec_half_life_days?

Neither of us is yet confident that I correctly understand the purpose of tracking REC. To the extent I do understand it, there is value in keeping past scheduling information inside the REC value for longer. This would dampen the swings that come from having little information when projects supply work inconsistently.

There's a downside to doing this: if I change resource shares, the REC values adjust to my new priorities more slowly. In concept, I'm happy to accept that in trade for more control over the time each project gets. If it doesn't work well in practice, I'll keep searching.

If anything you're better off lowering that value, many people set it to 1 immediately and that works quite a bit better than 10.

Great! I'd like to explore this idea, which is new to me. In what ways does "1" work "quite a bit better than 10"? If those ways are consistent with my goals, this is for me.
ID: 100030 · Report as offensive
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5078
United Kingdom
Message 100033 - Posted: 21 Jul 2020, 7:59:24 UTC - in response to Message 100029.  

Unsure how to interpret this.

Maybe my understanding explained earlier is largely sound? And this statement means that higher values for <rec_half_life_days> slow "the rate of change of the relative scheduling priorities ..."?
The length of time it takes to shift half-way to the new value. https://en.wikipedia.org/wiki/Half-life

I'm unfamiliar with this flag and with the process for using it.

Perhaps I put
<priority_debug>1</priority_debug>
in cc_config.xml. Perhaps it will activate project scheduling statements I can later read in the event log?
Or you could let Ctrl+Shift+F do it for you.
ID: 100033 · Report as offensive
BOINC Enthused

Joined: 29 Oct 05
Posts: 14
United States
Message 100053 - Posted: 22 Jul 2020, 2:53:28 UTC - in response to Message 100033.  

The length of time it takes to shift half-way to the new value. https://en.wikipedia.org/wiki/Half-life
Thanks. It sounds like we're saying the same thing.
ID: 100053 · Report as offensive
