Message boards : Questions and problems : GPU tasks skipped after scheduler overcommits CPU cores
Author | Message |
---|---|
Send message Joined: 31 Dec 18 Posts: 298 |
And same issue shows itself here too: Nidle is, I believe, the number of idle cores |
Send message Joined: 9 Apr 06 Posts: 302 |
And same issue shows itself here too: I believed so too, but evidently not. There are 4 cores and only 3 running tasks (even if the GPU task takes a full core, that makes 3, not 4). And still that field is zero. |
Send message Joined: 31 Dec 18 Posts: 298 |
And same issue shows itself here too: Just to be picky, the fact that the machine has 4 cores does not mean that Boinc believes that it has 4 cores available to it :-p |
Send message Joined: 9 Apr 06 Posts: 302 |
Indeed. But anyway, I learnt how BOINC deals with "waiting for memory" and gave up on this mode of operation completely. BOINC's behavior is too dumb: it doesn't try to keep the computing devices filled, it just suspends one of the tasks and leaves the device idle. From time to time it even suspended the GPU task (!), leaving the GPU idle while a few memory-hungry CPU GW tasks kept running. So, back to app_config. For now I despair of finding an adequate mode of operation without micro-managing. |
Send message Joined: 5 Oct 06 Posts: 5129 |
OK, plan of action, after yet another report of the same problem (thread 14182, from someone who wants to gripe about the problem, but not to be part of the solution). I'm going to try and work up my hack into something presentable, and let a few trusted beta testers take it for a spin. Windows only, at this stage.
With any luck, and provided real life doesn't intrude, I may have something for you by the weekend. |
Send message Joined: 9 Apr 06 Posts: 302 |
OK, plan of action, after yet another report of the same problem (thread 14182, from someone who wants to gripe about the problem, but not to be part of the solution). I'm going to try and work up my hack into something presentable, and let a few trusted beta testers take it for a spin. Windows only, at this stage. Richard, as I understand it, your biggest issue is with the char string and passing it into the inner functions. Above I proposed replacing it with a boolean variable. So, just add ,bool work_fetch=false) instead of ) in each inner function declaration. That way the parts outside work fetch need not be touched at all (the default value is false, so if the parameter is not listed in a call it is assumed false). And the initial initialization: void rr_simulation(const char* why) { static double last_time=0; bool work_fetch=(strcmp(why,"work fetch")==0); (strcmp rather than ==, because == on two char pointers compares addresses, not contents). I have no handy build environment currently, so building on you... |
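To make the suggestion above concrete, here is a minimal sketch of the two ideas: a defaulted bool parameter that existing callers can ignore, and a content comparison of the reason string. The function names `inner_helper` and `sim_is_for_work_fetch` are hypothetical stand-ins, not names from the BOINC source; only the string "work fetch" comes from the post and the logs.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical inner function: the defaulted flag means call sites that
// do not pass it behave exactly as before (work_fetch is false).
bool inner_helper(bool work_fetch = false) {
    return work_fetch;
}

// Hypothetical wrapper around the reason string handed to the simulation.
bool sim_is_for_work_fetch(const char* why) {
    assert(why != nullptr);
    // Compare the characters; `why == "work fetch"` would compare addresses.
    return std::strcmp(why, "work fetch") == 0;
}
```

A call such as `inner_helper(sim_is_for_work_fetch(why))` would then pass the flag down only where it matters, while every other call site stays untouched.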
Send message Joined: 5 Oct 06 Posts: 5129 |
I chose to use an integer - allows for future expansion of the rr_sim space ;-) It builds, without warnings and - after some effort - errors. I've re-created the problem case, but I'm going to leave it running on the old app overnight, so I can test properly with fresh eyes in the morning. |
Send message Joined: 5 Oct 06 Posts: 5129 |
I finished tidying up the re-worked hack yesterday, and it's been running for about 18 hours without problems. Here's a slightly artificial log to show the effect - I manually boosted the number of CPU tasks cached, to increase the numbers. 11/03/2021 11:46:03 | | [rr_sim] doing sim: CPU sched 11/03/2021 11:46:03 | | [rr_sim] start: work_buf min 864 additional 864 total 1728 on_frac 0.997 active_frac 1.000 11/03/2021 11:46:03 | Einstein@Home | [rr_sim] 9.23: h1_0676.05_O2C02Cl4In0__O2MDFS3a_Spotlight_676.85Hz_929_2 finishes (1.00 CPU + 1.00 NVIDIA GPU) (2585.35G/280.08G) 11/03/2021 11:46:03 | Einstein@Home | [rr_sim] 378.46: h1_0676.05_O2C02Cl4In0__O2MDFS3a_Spotlight_676.85Hz_925_2 finishes (1.00 CPU + 1.00 NVIDIA GPU) (105999.42G/280.08G) 11/03/2021 11:46:03 | Einstein@Home | [rr_sim] 932.30: h1_0676.05_O2C02Cl4In0__O2MDFS3a_Spotlight_676.85Hz_926_2 finishes (1.00 CPU + 1.00 NVIDIA GPU) (258535.16G/280.08G) 11/03/2021 11:46:03 | Einstein@Home | [rr_sim] 1301.53: h1_0648.65_O2C02Cl4In0__O2MDFS3a_Spotlight_649.20Hz_511_2 finishes (1.00 CPU + 1.00 NVIDIA GPU) (258535.16G/280.08G) 11/03/2021 11:46:03 | NumberFields@home | [rr_sim] 1758.54: wu_sf3_DS-16x271-2_Grp197929of1000000_1 finishes (1.00 CPU) (11087.93G/6.31G) 11/03/2021 11:46:03 | Einstein@Home | [rr_sim] 1855.37: h1_0648.65_O2C02Cl4In0__O2MDFS3a_Spotlight_649.20Hz_510_2 finishes (1.00 CPU + 1.00 NVIDIA GPU) (258535.16G/280.08G) 11/03/2021 11:46:03 | NumberFields@home | [rr_sim] at app max concurrent for GetDecics 11/03/2021 11:46:03 | NumberFields@home | [rr_sim] at app max concurrent for GetDecics 11/03/2021 11:46:03 | NumberFields@home | [rr_sim] at app max concurrent for GetDecics 11/03/2021 11:46:03 | NumberFields@home | [rr_sim] at app max concurrent for GetDecics 11/03/2021 11:46:03 | NumberFields@home | [rr_sim] at app max concurrent for GetDecics 11/03/2021 11:46:03 | NumberFields@home | [rr_sim] at app max concurrent for GetDecics 11/03/2021 11:46:03 | NumberFields@home | [rr_sim] at app max 
concurrent for GetDecics 11/03/2021 11:46:03 | NumberFields@home | [rr_sim] at app max concurrent for GetDecics 11/03/2021 11:46:03 | NumberFields@home | [rr_sim] at app max concurrent for GetDecics 11/03/2021 11:46:03 | NumberFields@home | [rr_sim] at app max concurrent for GetDecics 11/03/2021 11:46:03 | NumberFields@home | [rr_sim] at app max concurrent for GetDecics 11/03/2021 11:46:03 | NumberFields@home | [rr_sim] at app max concurrent for GetDecics 11/03/2021 11:46:03 | NumberFields@home | [rr_sim] at app max concurrent for GetDecics 11/03/2021 11:46:03 | NumberFields@home | [rr_sim] at app max concurrent for GetDecics 11/03/2021 11:46:03 | NumberFields@home | [rr_sim] at app max concurrent for GetDecics 11/03/2021 11:46:03 | NumberFields@home | [rr_sim] at app max concurrent for GetDecics 11/03/2021 11:46:03 | Einstein@Home | [rr_sim] 1922.87: p2030.20170627.G31.90+02.51.S.b0s0g0.00000_2911_3 finishes (0.50 CPU + 1.00 Intel GPU) (31419.20G/16.34G) 11/03/2021 11:46:03 | NumberFields@home | [rr_sim] at app max concurrent for GetDecics 11/03/2021 11:46:03 | Einstein@Home | [rr_sim] 2224.60: h1_0648.05_O2C02Cl4In0__O2MDFS3a_Spotlight_648.60Hz_532_2 finishes (1.00 CPU + 1.00 NVIDIA GPU) (258535.16G/280.08G) 11/03/2021 11:46:03 | NumberFields@home | [rr_sim] at app max concurrent for GetDecics 11/03/2021 11:46:03 | NumberFields@home | [rr_sim] 2424.21: wu_sf3_DS-16x271-2_Grp618165of1000000_0 finishes (1.00 CPU) (15285.09G/6.31G) 11/03/2021 11:46:03 | NumberFields@home | [rr_sim] 6516.53: wu_sf3_DS-16x271-2_Grp617510of1000000_0 finishes (1.00 CPU) (30000.00G/6.31G) 11/03/2021 11:46:05 | | choose_project(): 1615463165.647471 11/03/2021 11:46:05 | | [rr_sim] doing sim: work fetch 11/03/2021 11:46:05 | | [rr_sim] start: work_buf min 864 additional 864 total 1728 on_frac 0.997 active_frac 1.000 11/03/2021 11:46:05 | Einstein@Home | [rr_sim] 9.23: h1_0676.05_O2C02Cl4In0__O2MDFS3a_Spotlight_676.85Hz_929_2 finishes (1.00 CPU + 1.00 NVIDIA GPU) (2585.35G/280.08G) 
11/03/2021 11:46:05 | Einstein@Home | [rr_sim] 377.46: h1_0676.05_O2C02Cl4In0__O2MDFS3a_Spotlight_676.85Hz_925_2 finishes (1.00 CPU + 1.00 NVIDIA GPU) (105718.49G/280.08G) 11/03/2021 11:46:05 | Einstein@Home | [rr_sim] 932.30: h1_0676.05_O2C02Cl4In0__O2MDFS3a_Spotlight_676.85Hz_926_2 finishes (1.00 CPU + 1.00 NVIDIA GPU) (258535.16G/280.08G) 11/03/2021 11:46:05 | Einstein@Home | [rr_sim] 1300.53: h1_0648.65_O2C02Cl4In0__O2MDFS3a_Spotlight_649.20Hz_511_2 finishes (1.00 CPU + 1.00 NVIDIA GPU) (258535.16G/280.08G) 11/03/2021 11:46:05 | NumberFields@home | [rr_sim] 1758.54: wu_sf3_DS-16x271-2_Grp197929of1000000_1 finishes (1.00 CPU) (11087.90G/6.31G) 11/03/2021 11:46:05 | Einstein@Home | [rr_sim] 1855.37: h1_0648.65_O2C02Cl4In0__O2MDFS3a_Spotlight_649.20Hz_510_2 finishes (1.00 CPU + 1.00 NVIDIA GPU) (258535.16G/280.08G) 11/03/2021 11:46:05 | Einstein@Home | [rr_sim] 1921.76: p2030.20170627.G31.90+02.51.S.b0s0g0.00000_2911_3 finishes (0.50 CPU + 1.00 Intel GPU) (31400.97G/16.34G) 11/03/2021 11:46:05 | Einstein@Home | [rr_sim] 2223.60: h1_0648.05_O2C02Cl4In0__O2MDFS3a_Spotlight_648.60Hz_532_2 finishes (1.00 CPU + 1.00 NVIDIA GPU) (258535.16G/280.08G) 11/03/2021 11:46:05 | NumberFields@home | [rr_sim] 2423.21: wu_sf3_DS-16x271-2_Grp618165of1000000_0 finishes (1.00 CPU) (15278.75G/6.31G) 11/03/2021 11:46:05 | NumberFields@home | [rr_sim] 6516.53: wu_sf3_DS-16x271-2_Grp617510of1000000_0 finishes (1.00 CPU) (30000.00G/6.31G) 11/03/2021 11:46:05 | NumberFields@home | [rr_sim] 6613.36: wu_sf3_DS-16x271-2_Grp618075of1000000_0 finishes (1.00 CPU) (30000.00G/6.31G) 11/03/2021 11:46:05 | NumberFields@home | [rr_sim] 6981.59: wu_sf3_DS-16x271-2_Grp616684of1000000_0 finishes (1.00 CPU) (30000.00G/6.31G) 11/03/2021 11:46:05 | NumberFields@home | [rr_sim] 7181.20: wu_sf3_DS-16x271-2_Grp619492of1000000_0 finishes (1.00 CPU) (30000.00G/6.31G) 11/03/2021 11:46:05 | NumberFields@home | [rr_sim] 11274.52: wu_sf3_DS-16x271-2_Grp616806of1000000_0 finishes (1.00 CPU) (30000.00G/6.31G) 
11/03/2021 11:46:05 | NumberFields@home | [rr_sim] 11371.35: wu_sf3_DS-16x271-2_Grp618515of1000000_0 finishes (1.00 CPU) (30000.00G/6.31G) 11/03/2021 11:46:05 | NumberFields@home | [rr_sim] 11739.58: wu_sf3_DS-16x271-2_Grp617586of1000000_0 finishes (1.00 CPU) (30000.00G/6.31G) 11/03/2021 11:46:05 | NumberFields@home | [rr_sim] 11939.19: wu_sf3_DS-16x271-2_Grp619493of1000000_0 finishes (1.00 CPU) (30000.00G/6.31G) 11/03/2021 11:46:05 | NumberFields@home | [rr_sim] 16032.51: wu_sf3_DS-16x271-2_Grp619476of1000000_0 finishes (1.00 CPU) (30000.00G/6.31G) 11/03/2021 11:46:05 | NumberFields@home | [rr_sim] 16129.34: wu_sf3_DS-16x271-2_Grp619477of1000000_0 finishes (1.00 CPU) (30000.00G/6.31G) 11/03/2021 11:46:05 | NumberFields@home | [rr_sim] 16497.57: wu_sf3_DS-16x271-2_Grp618817of1000000_0 finishes (1.00 CPU) (30000.00G/6.31G) 11/03/2021 11:46:05 | NumberFields@home | [rr_sim] 16697.18: wu_sf3_DS-16x271-2_Grp618644of1000000_0 finishes (1.00 CPU) (30000.00G/6.31G) 11/03/2021 11:46:05 | NumberFields@home | [rr_sim] 20790.50: wu_sf3_DS-16x271-2_Grp619016of1000000_0 finishes (1.00 CPU) (30000.00G/6.31G) 11/03/2021 11:46:05 | NumberFields@home | [rr_sim] 20887.33: wu_sf3_DS-16x271-2_Grp619479of1000000_0 finishes (1.00 CPU) (30000.00G/6.31G) 11/03/2021 11:46:05 | NumberFields@home | [rr_sim] 21255.56: wu_sf3_DS-16x271-2_Grp619478of1000000_0 finishes (1.00 CPU) (30000.00G/6.31G) 11/03/2021 11:46:05 | NumberFields@home | [rr_sim] 21455.17: wu_sf3_DS-16x271-2_Grp619017of1000000_0 finishes (1.00 CPU) (30000.00G/6.31G) 11/03/2021 11:46:05 | | [work_fetch] ------- start work fetch state ------- 11/03/2021 11:46:05 | | [work_fetch] target work buffer: 864.00 + 864.00 sec 11/03/2021 11:46:05 | | [work_fetch] --- project states --- 11/03/2021 11:46:05 | Einstein@Home | [work_fetch] REC 824078.357 prio -2.159 can request work 11/03/2021 11:46:05 | NumberFields@home | [work_fetch] REC 1860.297 prio -0.075 can request work 11/03/2021 11:46:05 | | [work_fetch] --- state for CPU --- 
11/03/2021 11:46:05 | | [work_fetch] shortfall 0.00 nidle 0.00 saturated 20790.50 busy 0.00 11/03/2021 11:46:05 | Einstein@Home | [work_fetch] share 0.000 blocked by project preferences 11/03/2021 11:46:05 | NumberFields@home | [work_fetch] share 1.000 11/03/2021 11:46:05 | | [work_fetch] --- state for NVIDIA GPU --- 11/03/2021 11:46:05 | | [work_fetch] shortfall 0.00 nidle 0.00 saturated 1855.37 busy 0.00 11/03/2021 11:46:05 | Einstein@Home | [work_fetch] share 1.000 11/03/2021 11:46:05 | NumberFields@home | [work_fetch] share 0.000 blocked by project preferences 11/03/2021 11:46:05 | | [work_fetch] --- state for Intel GPU --- 11/03/2021 11:46:05 | | [work_fetch] shortfall 0.00 nidle 0.00 saturated 1921.76 busy 0.00 11/03/2021 11:46:05 | Einstein@Home | [work_fetch] share 1.000 11/03/2021 11:46:05 | NumberFields@home | [work_fetch] share 0.000 no applications 11/03/2021 11:46:05 | | [work_fetch] ------- end work fetch state ------- David only ran a single type of [rr_sim], the first one shown here (which I've left untouched). You can see that it concludes with the last NumberFields task - my CPU app - finishing after 6516 seconds. My hack runs a separate version of [rr_sim] for work fetch, giving a more realistic buffer size of 21,455 seconds, which is reflected in the 'saturated' work fetch figure for the CPU. I haven't done anything special for GPUs - they're normally controlled by the number of GPUs in the system, rather than by app_config.xml. If anyone wants to test it (and has experience of the CPU overfetch we've been discussing), let me know. |
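The difference between the two simulations can be illustrated with a toy model. This is not BOINC's rr_simulation(); it is a sketch under the assumption that the "at app max concurrent" lines mean those tasks are left out of the simulated schedule altogether. The function name `last_finish` and its parameters are made up for the illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

// Toy model: returns the time at which the last simulated task finishes
// when tasks are started in order on `ninstances` CPU instances.
// With skip_over_limit == true, tasks beyond `max_concurrent` are dropped
// from the simulation (assumed behaviour, see the lead-in).
double last_finish(const std::vector<double>& durations, int ninstances,
                   int max_concurrent, bool skip_over_limit) {
    assert(ninstances > 0 && max_concurrent > 0);
    // Min-heap of the times at which each instance becomes free.
    std::priority_queue<double, std::vector<double>, std::greater<double>> free_at;
    for (int i = 0; i < ninstances; ++i) free_at.push(0.0);

    double last = 0.0;
    int counted = 0;
    for (double d : durations) {
        if (skip_over_limit && counted >= max_concurrent) break;
        double finish = free_at.top() + d;  // start on the earliest free instance
        free_at.pop();
        free_at.push(finish);
        last = std::max(last, finish);
        ++counted;
    }
    return last;
}
```

With eight equal tasks on four instances, the skipping variant sees only the first batch, while the non-skipping variant sees both batches and so reports a buffer twice as long. That is the same direction of effect as the 6516 versus 21,455 seconds in the log.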
Send message Joined: 9 Apr 06 Posts: 302 |
Well, you run at least 2 projects, so the cache stays busy. With only a single project I expect idle CPU cores after some time of working with max_concurrent. And sure, I want to test. |
Send message Joined: 28 Aug 19 Posts: 50 |
BOINC gets work but can't finish on time? What? Plus, I won't get **ANY** work units from a project unless I stop another project. Example: if TN-Grid is scheduled to get work, then only TN-Grid will get work. If WCG is scheduled to get work, then only WCG will get work; it takes charge and pushes out TN-Grid. This is happening on 2 of my computers, one Linux and one Windows 10. **"Days overdue; you may not get credit for it. Consider aborting it". ???** **"Tasks won't finish in time: BOINC runs 99.6% of the time; computation is enabled 100.0% of that". ???** Unless I stop it from getting other projects. 3/9/2021 3:57:05 AM | TN-Grid Platform | Task 181047_Hs_T154140-OXCT1_wu-277_1614546821865_1 is 3.11 days overdue; you may not get credit for it. Consider aborting it. |
Send message Joined: 28 Jun 10 Posts: 2709 |
BOINC gets work but can't finish on time? What? BOINC should prioritise work that is going to miss the deadline unless it does so. Are you overcommitting your computer by having too big a cache? If projects have short deadlines and you go for the maximum of 10 days of work plus 10 additional days, some tasks are going to run out of time before the 20 days are up. I haven't run the projects you mention, apart from WCG, on my computer, so I don't know whether their deadlines are short, but I would suggest running at the minimum cache size. I have mine at 0.1 + 0.1 days. It is very rare that I have anything go over the time limit. |
Send message Joined: 29 Aug 05 Posts: 15569 |
Are you...? Sandman isn't looking for help (from us). He's been specifically asked by Richard to come over here and post his logs, in order to hunt for the bug(let) in this thread. Expect more posts and logs in the future, but no need to try to help him. He's in good hands. |
Send message Joined: 5 Oct 06 Posts: 5129 |
Sorry, I was away from the screen for a while. I've got a couple of PMs in my inbox too. I'll get on to them. |
Send message Joined: 5 Oct 06 Posts: 5129 |
@Sandman192 - you have replies to your PMs. |
Send message Joined: 28 Aug 19 Posts: 50 |
I can have 10 projects scheduled, but WCG or TN-Grid is all I get unless I stop WCG or TN-Grid from getting any work at all; then things seem to be normal. And again, I never had this problem before I updated my BOINC version. It is also happening on my second computer, running Linux. |
Send message Joined: 5 Oct 06 Posts: 5129 |
You probably pushed up the priority of the other projects by being locked into working on TN-Grid for so long by deadline pressure. It will return to normal gradually, but over a period of several days. See the Configuration Options page of the User Manual. Try setting the line <rec_half_life_days>X</rec_half_life_days> to something much smaller: one day, instead of the default 10, would sort things out quicker. |
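To see why a shorter half-life "sorts things out quicker", the generic exponential-decay formula is enough. The sketch below is only that formula; it does not reproduce BOINC's actual REC bookkeeping, and the function name `remaining_fraction` is made up for the illustration.

```cpp
#include <cassert>
#include <cmath>

// Fraction of an accumulated value still remaining after `elapsed_days`
// under exponential decay with the given half-life (generic formula,
// assumed here to describe how old credit fades from the estimate).
double remaining_fraction(double elapsed_days, double half_life_days) {
    assert(half_life_days > 0.0);
    return std::pow(0.5, elapsed_days / half_life_days);
}
```

After three days, a 10-day half-life still leaves roughly 81% of the old imbalance in place, whereas a 1-day half-life leaves 12.5%.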
Send message Joined: 9 Apr 06 Posts: 302 |
I have been running Richard's bugfixed build with work fetch through the last week. The initial bug is definitely fixed - there is no overfetch. Unfortunately, the signs of the other issue I mentioned earlier are getting stronger: currently the host has only GW tasks in the cache + 1 running FGRP task. Because only 2 GW tasks are allowed at once and the host has 4 cores, there will be idle ones soon... And the GPU part suffers from an inability to honor project shares. But that issue is worth a separate thread. EDIT: Unfortunately, it's not "signs", it has happened already... 2 GW tasks, 1 FGRP task... and 1 GPU MW task that doesn't need a full CPU core! So, 1 CPU core is already idle. When the FGRP task finishes there will, with high probability, be 2 idle cores.... So, this bugfix isn't enough to use max_concurrent as expected. EDIT2: And BOINC doesn't react to the idle device (CPU): 3/21/2021 22:48:01 PM | | [work_fetch] ------- start work fetch state ------- 3/21/2021 22:48:01 PM | | [work_fetch] target work buffer: 129600.00 + 8640.00 sec 3/21/2021 22:48:01 PM | | [work_fetch] --- project states --- 3/21/2021 22:48:01 PM | Einstein@Home | [work_fetch] REC 19077.814 prio -6075.662 can request work 3/21/2021 22:48:01 PM | Milkyway@Home | [work_fetch] REC 12328.567 prio -0.513 can request work 3/21/2021 22:48:01 PM | SETI@home Beta Test | [work_fetch] REC 0.000 prio 0.000 can't request work: suspended via Manager 3/21/2021 22:48:01 PM | | [work_fetch] --- state for CPU --- 3/21/2021 22:48:01 PM | | [work_fetch] shortfall 0.00 nidle 0.00 saturated 153096.94 busy 0.00 3/21/2021 22:48:01 PM | Einstein@Home | [work_fetch] share 1.000 3/21/2021 22:48:01 PM | Milkyway@Home | [work_fetch] share 0.000 blocked by project preferences 3/21/2021 22:48:01 PM | SETI@home Beta Test | [work_fetch] share 0.000 3/21/2021 22:48:01 PM | | [work_fetch] --- state for NVIDIA GPU --- 3/21/2021 22:48:01 PM | | [work_fetch] shortfall 0.00 nidle 0.00 saturated 139709.31 busy 0.00 3/21/2021 22:48:01 PM | Einstein@Home | [work_fetch] share 0.000 3/21/2021 22:48:01 PM | 
Milkyway@Home | [work_fetch] share 1.000 3/21/2021 22:48:01 PM | SETI@home Beta Test | [work_fetch] share 0.000 3/21/2021 22:48:01 PM | | [work_fetch] ------- end work fetch state ------- Manual update didn't help (as expected): 3/21/2021 22:50:17 PM | Einstein@Home | piggyback: resource CPU 3/21/2021 22:50:17 PM | Einstein@Home | piggyback: don't need CPU 3/21/2021 22:50:17 PM | Einstein@Home | piggyback: resource NVIDIA GPU 3/21/2021 22:50:17 PM | Einstein@Home | piggyback: don't need NVIDIA GPU 3/21/2021 22:50:17 PM | Einstein@Home | [work_fetch] request: CPU (0.00 sec, 0.00 inst) NVIDIA GPU (0.00 sec, 0.00 inst) 3/21/2021 22:50:17 PM | Einstein@Home | Sending scheduler request: Requested by user. 3/21/2021 22:50:17 PM | Einstein@Home | Not requesting tasks: don't need (CPU: job cache full; NVIDIA GPU: job cache full) 3/21/2021 22:50:18 PM | Einstein@Home | Scheduler request completed 3/21/2021 22:50:18 PM | Einstein@Home | Project requested delay of 60 seconds 3/21/2021 22:50:18 PM | | [work_fetch] Request work fetch: RPC complete 3/21/2021 22:50:23 PM | | choose_project(): 1616356223.263348 3/21/2021 22:50:23 PM | | [work_fetch] ------- start work fetch state ------- 3/21/2021 22:50:23 PM | | [work_fetch] target work buffer: 129600.00 + 8640.00 sec 3/21/2021 22:50:23 PM | | [work_fetch] --- project states --- 3/21/2021 22:50:23 PM | Einstein@Home | [work_fetch] REC 19076.165 prio -15117.019 can't request work: scheduler RPC backoff (54.94 sec) 3/21/2021 22:50:23 PM | Milkyway@Home | [work_fetch] REC 12330.310 prio -1.120 can request work 3/21/2021 22:50:23 PM | SETI@home Beta Test | [work_fetch] REC 0.000 prio 0.000 can't request work: suspended via Manager 3/21/2021 22:50:23 PM | | [work_fetch] --- state for CPU --- 3/21/2021 22:50:23 PM | | [work_fetch] shortfall 0.00 nidle 0.00 saturated 153031.73 busy 0.00 3/21/2021 22:50:23 PM | Einstein@Home | [work_fetch] share 0.000 3/21/2021 22:50:23 PM | Milkyway@Home | [work_fetch] share 0.000 blocked by project 
preferences 3/21/2021 22:50:23 PM | SETI@home Beta Test | [work_fetch] share 0.000 3/21/2021 22:50:23 PM | | [work_fetch] --- state for NVIDIA GPU --- 3/21/2021 22:50:23 PM | | [work_fetch] shortfall 0.00 nidle 0.00 saturated 139559.98 busy 0.00 3/21/2021 22:50:23 PM | Einstein@Home | [work_fetch] share 0.000 3/21/2021 22:50:23 PM | Milkyway@Home | [work_fetch] share 1.000 3/21/2021 22:50:23 PM | SETI@home Beta Test | [work_fetch] share 0.000 3/21/2021 22:50:23 PM | | [work_fetch] ------- end work fetch state ------- |
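The numbers in the log above hint at why no work is requested even with a core idle. The following is a simplified check, not BOINC's actual work-fetch code: it assumes a resource counts as covered once the simulated 'saturated' time reaches the whole buffer (minimum plus additional). The function name `buffer_covered` is hypothetical.

```cpp
#include <cassert>

// Simplified check: true when the simulated saturated time already spans
// the full target buffer, in which case no shortfall would be reported.
bool buffer_covered(double saturated_sec, double buf_min_sec,
                    double buf_additional_sec) {
    assert(buf_min_sec >= 0.0 && buf_additional_sec >= 0.0);
    return saturated_sec >= buf_min_sec + buf_additional_sec;
}
```

Here the target is 129600 + 8640 = 138240 seconds, and both the CPU (153096.94) and the NVIDIA GPU (139709.31) figures exceed it. A check of this shape looks only at queued work, so it cannot see that max_concurrent is keeping one core idle right now.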
Send message Joined: 9 Apr 06 Posts: 302 |
Currently (as expected) 2 GW CPU tasks + 1 NV FGRP task are running. 1 core sits completely idle. And there are no MW tasks on the host.... It seems I have no option other than to set E@h as a backup (zero share) project again and crunch without a cache on my second-power host... Given the instability of my current router, this almost certainly means idle host time :/ Richard, if no more info about the modded build is required, I would prefer to return to the stock one, because it will ask for work when a CPU is idle. |
Send message Joined: 5 Oct 06 Posts: 5129 |
Fair enough. I think we've established what I set out to achieve - that there is a bug (several bugs!), and that a simple hack eliminates the massive overfetch that's given in the thread title. To go further, and eliminate the extra bugs that pertain to your setup, would require re-writing the whole of rr_sim to keep track of max_concurrent at every step of the way. I don't think I'm skilled enough to do that. We'll have to stop at the proof-of-concept. |
Send message Joined: 9 Apr 06 Posts: 302 |
Yep, the choice between over-fetch and idle cores is a sad choice... BOINC has suffered for many years from a wrong definition of the "atomic entity", I would say. I first said that 15 (?) years ago, when GPU computing first appeared; it was on the mailing list in those days, not only on the forums... There have been many ad-hoc additions since then, but no real rewrite with a redesign. And it is needed just as much as before. The "atomic entity" here is the app_version/plan class, not the project. And we still suffer from the initial project-centric approach. It's just everywhere, from per-project server requests to per-project shares. Initially there was only 1 app per project (and just a single project - SETI). And the "atomic project" was == the "atomic app" (it appeared so because no other apps were there). But then AstroPulse was added... and the whole credit system went mad. Then GPGPU emerged - wow, 2 different devices in a single host, simply incomparable in computing ability... And so on... BOINC does not manage projects; it manages tasks for particular apps as atomic entities. Those apps are then grouped in different ways, by resource usage and by owning project.... What I mean is: everything that can be done between 2 apps belonging to 2 different projects should also be possible between 2 apps belonging to a single project. And that is definitely still not the case... |
Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,
Version 1.2 or any later version published by the Free Software Foundation.