Message boards : Questions and problems : "Use at most x% of the CPUs" not working - sometimes
Joined: 9 Feb 08 | Posts: 54
Testing.

Initial conditions:
Preferences set to use 62.5% (5) of the CPUs.
There are no LHC tasks in the queue.
5 CPU & 1 GPU Einstein tasks running. OK
Downloaded 1 LHC ATLAS mt task. It did not start. OK

To test: change the number of CPUs used. I made a mistake - I meant to set the CPUs to 50% (4) immediately, but only set "When computer is not in use" to 50% of CPUs. There was no immediate change. But the next step - restarting BOINC (no strays), with "When computer is in use" still (ignorantly) set to 62.5% (5) of the CPUs - showed these results:

3 CPU & 1 GPU Einstein tasks running = 3.9 CPUs
1 ATLAS mt LHC task running = 5.0 CPUs
Total: 8.9 CPUs - which is where I spotted my "in use/not in use" preferences mistake.

Windows Task Manager showing 100% in use: the LHC mt task is using 62~63% (5 CPUs), with the other 3.9 CPUs of tasks vying for the remaining ~37% (3) of the CPUs.

Further tests:
I suspended the LHC@home project, leaving 5.9 CPUs running Einstein. OK
I set the Local Preferences to use 50% (4) of the CPUs "When computer is in use". Immediately: 4.9 CPUs running Einstein. OK
Windows Task Manager showing ~67% in use. (~7% is average OS background-task noise.)

More tests: resumed LHC.
Result: the LHC mt task did not start - OK. 4.9 CPUs running Einstein. OK

Einstein@home Running Binary Radio Pulsar Search (Arecibo,GBT,long) 1.33 (BRP4X64)
Einstein@home Running Binary Radio Pulsar Search (Arecibo,GBT,long) 1.33 (BRP4X64)
Einstein@home Running Binary Radio Pulsar Search (Arecibo,GBT,long) 1.33 (BRP4X64)
Einstein@home Running Binary Radio Pulsar Search (Arecibo,GBT,long) 1.33 (BRP4X64)
Einstein@home Running (0.9 CPUs + 1 NVIDIA GPU) All-Sky Gravitational Wave search on O3 1.07 (GW-opencl-nvidia-2)
LHC@home Waiting to run (4 CPUs) ATLAS Simulation 3.01 (vbox64_mt_mcore_atlas)

Even more tests: after ~90 minutes of (the above) stability, I set both Local Preferences to use 62.5% (5) of the CPUs.

Result:

Einstein@home Running Binary Radio Pulsar Search (Arecibo,GBT,long) 1.33 (BRP4X64)
Einstein@home Running Binary Radio Pulsar Search (Arecibo,GBT,long) 1.33 (BRP4X64)
Einstein@home Running Binary Radio Pulsar Search (Arecibo,GBT,long) 1.33 (BRP4X64)
Einstein@home Running Binary Radio Pulsar Search (Arecibo,GBT,long) 1.33 (BRP4X64)
Einstein@home Running (0.9 CPUs + 1 NVIDIA GPU) All-Sky Gravitational Wave search on O3 1.07 (GW-opencl-nvidia-2)
LHC@home Running (4 CPUs) ATLAS Simulation 3.01 (vbox64_mt_mcore_atlas)

A total of 8.9 CPUs. But looking at the "Details" pane of Windows Task Manager: the LHC mt task is using 62~63% (5 CPUs ????), the Einstein GPU task uses 12% (1 CPU), and the 4 remaining Einstein tasks are scratching around with 25% (2 CPUs). In reality, 9.9 CPUs of tasks in total.

NOTE: Running in this state for 5:30 hours produced further observations in Windows Task Manager:
The LHC mt task has now reduced to ~50% (4 CPUs), despite the Local Preferences being set to use 62.5% (5) of the CPUs.
The Einstein GPU task still uses ~12% (1 CPU).
The 4 remaining Einstein tasks are sharing ~36% (3 CPUs).
Which all adds up to 8.9 CPUs of tasks on 8 CPUs. Windows Task Manager registers 100% of CPUs in use.

I suspended the Einstein@home project, and the LHC mt task is still running with ~50% (4 CPUs). Nearly OK... it should be 5 CPUs.
Joined: 9 Feb 08 | Posts: 54
With only Einstein CPU tasks running: changed the local preferences from 50% (4) CPUs to 75% (6) of the CPUs. Immediately 6 tasks running:

Einstein@home Running Gamma-ray pulsar search #5 1.08 (FGRPSSE)
Einstein@home Running Gamma-ray pulsar search #5 1.08 (FGRPSSE)
Einstein@home Running Gamma-ray pulsar search #5 1.08 (FGRPSSE)
Einstein@home Running Gamma-ray pulsar search #5 1.08 (FGRPSSE)
Einstein@home Running Gamma-ray pulsar search #5 1.08 (FGRPSSE)
Einstein@home Running Gamma-ray pulsar search #5 1.08 (FGRPSSE)

Resumed LHC:

Einstein@home Running Gamma-ray pulsar search #5 1.08 (FGRPSSE)
Einstein@home Waiting to run Gamma-ray pulsar search #5 1.08 (FGRPSSE)
Einstein@home Waiting to run Gamma-ray pulsar search #5 1.08 (FGRPSSE)
Einstein@home Waiting to run Gamma-ray pulsar search #5 1.08 (FGRPSSE)
Einstein@home Waiting to run Gamma-ray pulsar search #5 1.08 (FGRPSSE)
Einstein@home Running Gamma-ray pulsar search #5 1.08 (FGRPSSE)
LHC@home Running (4 CPUs) ATLAS Simulation 3.01 (vbox64_mt_mcore_atlas)

The previously downloaded LHC mt task is still using only 4 CPUs. It seems this is because I downloaded that LHC mt task earlier, when the preferences were set to use 50% (4) CPUs, and BOINC permanently assigned 4 CPUs to the mt task then.

To demonstrate, after I changed the preferences to use 37.5% (3) CPUs:

LHC@home Running (4 CPUs) ATLAS Simulation 3.01 (vbox64_mt_mcore_atlas)

Windows Task Manager registers 4 CPUs in use.

And at 75% (6) CPUs:

Einstein@home Running Gamma-ray pulsar search #5 1.08 (FGRPSSE)
Einstein@home Running Gamma-ray pulsar search #5 1.08 (FGRPSSE)
LHC@home Running (4 CPUs) ATLAS Simulation 3.01 (vbox64_mt_mcore_atlas)

Windows Task Manager registers 6 CPUs in use.
Joined: 9 Feb 08 | Posts: 54
And curiously, with 75% (6) CPUs set in the local preferences, newly downloaded multi-threaded (mt) tasks are still being assigned only 4 cores:

LHC@home Ready to start (4 CPUs) CMS Simulation 70.30 (vbox64_mt_mcore_cms)

Confession: I have since changed the local and web preferences to 75% (6) CPUs. My buffer's full - I'm going to let it run.
Joined: 5 Oct 06 | Posts: 5139
When you choose to run an application which the project has designated as 'MT' (multi-threaded), it is the project server which decides how to configure each task. This is fixed at the moment the server decides to assign the task to your computer, and those values stay unchanged throughout the running time of that particular task - even if you change the settings later. The server sets three values:

'avg_ncpus' tells your machine how to expect this task to run, and to keep enough space in BOINC's workload for it to run.
Joined: 9 Feb 08 | Posts: 54
That would be nice, thank you Richard.

My most prominent problem is with the LHC@home virtual machines. All the LHC@home tasks are Linux native; to run them on Windows they need a Linux "virtual machine". My Windows 10 box is (mostly) ten-year-old hardware. This may or may not be a factor in why it struggles to start and stop these VMs - there are newer CPUs with dedicated virtualisation features.

My CPU:
Processor: i7-4790K CPU @ 4.00GHz [Family 6 Model 60 Stepping 3] (4th Gen Hyper-Threading CPU, 2014)
Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes f16c rdrand syscall nx lm avx avx2 vmx tm2 pbe fsgsbase bmi1 smep bmi2
32 GB DDR3 RAM, 2 TB SSD

The symptoms are excessive Windows system-disk access - especially immediately after stopping a VM. I have to wait 90s for it to subside. If I stop two at once they get knotted. The workaround I'm using is to stop/start each task manually, but this takes time and patience. So I use the preferences to limit the total number of CPUs, and an app_config.xml to limit the number of LHC@home VM tasks running concurrently.

app_config.xml for LHC@home:

<app_config>
    <app>
        <name>CMS</name>
        <max_concurrent>1</max_concurrent>
    </app>
    <app>
        <name>Theory</name>
        <max_concurrent>3</max_concurrent>
    </app>
</app_config>

Simply so the manual control takes less time... Can this be improved? Looking at the app_config.xml manual, I fear that lengthy testing needs to be done to find the best setup. At first glance, useful additional elements seem to be:

<app_config>
    <project_max_concurrent>4</project_max_concurrent> <!-- 4 tasks - easier to start/stop manually -->

and

<app_version>
    <app_name>CMS</app_name>
    <cmdline>--nthreads 4</cmdline> <!-- leaves space for other tasks to run -->

I've also heard rumours about an automatic script that starts/stops LHC tasks in a timed fashion.
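The rumoured timed start/stop idea could be sketched with boinccmd, BOINC's command-line control tool. This is only a rough illustration of the approach, not the actual script: the install path, the project URL, the timings, and the choice to suspend the whole project (boinccmd's `--project <URL> suspend/resume` operation) rather than individual tasks are all my assumptions.

```python
import subprocess
import time

# Default Windows install path - adjust to your installation (assumption).
BOINCCMD = r"C:\Program Files\BOINC\boinccmd.exe"
PROJECT_URL = "https://lhcathome.cern.ch/lhcathome/"

def project_cmd(action: str) -> list:
    """Build a boinccmd invocation that suspends or resumes a whole project."""
    if action not in ("suspend", "resume"):
        raise ValueError("action must be 'suspend' or 'resume'")
    return [BOINCCMD, "--project", PROJECT_URL, action]

def cycle(run_minutes: float, pause_minutes: float, repeats: int) -> None:
    """Alternately resume and suspend the project, leaving at least 90 s
    after each suspend for the VM disk activity to settle."""
    for _ in range(repeats):
        subprocess.run(project_cmd("resume"), check=True)
        time.sleep(run_minutes * 60)
        subprocess.run(project_cmd("suspend"), check=True)
        time.sleep(max(pause_minutes * 60, 90))  # the observed 90 s settle time

# Example: cycle(run_minutes=60, pause_minutes=5, repeats=8)
```

boinccmd can also target single tasks (`--task <URL> <task_name> suspend`), which would map more closely onto stopping one VM at a time; the project-level form above is just the simplest sketch.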
Joined: 9 Feb 08 | Posts: 54
Had a go. app_config.xml for LHC@home:

<app_config>
    <project_max_concurrent>4</project_max_concurrent>
    <app>
        <name>ATLAS</name>
        <max_concurrent>1</max_concurrent>
    </app>
    <app_version>
        <app_name>ATLAS</app_name>
        <plan_class>mt</plan_class>
        <avg_ncpus>4</avg_ncpus>
        <cmdline>--nthreads 4</cmdline>
    </app_version>
    <app>
        <name>CMS</name>
        <max_concurrent>1</max_concurrent>
    </app>
    <app_version>
        <app_name>CMS</app_name>
        <plan_class>mt</plan_class>
        <avg_ncpus>4</avg_ncpus>
        <cmdline>--nthreads 4</cmdline>
    </app_version>
    <app>
        <name>Theory</name>
        <max_concurrent>4</max_concurrent>
    </app>
</app_config>

Started BOINC suspended. Event Log:

24/07/2024 13:06:02 | LHC@home | Found app_config.xml
24/07/2024 13:06:02 | LHC@home | Entry in app_config.xml for app 'ATLAS', plan class 'mt' doesn't match any app versions
24/07/2024 13:06:02 | LHC@home | Entry in app_config.xml for app 'CMS', plan class 'mt' doesn't match any app versions
24/07/2024 13:06:02 | LHC@home | Max 4 concurrent jobs
24/07/2024 13:06:02 | LHC@home | ATLAS: Max 1 concurrent jobs
24/07/2024 13:06:02 | LHC@home | Theory: Max 4 concurrent jobs
24/07/2024 13:06:02 | LHC@home | CMS: Max 1 concurrent jobs

The ATLAS & CMS notices - what do they mean? They are multi-threaded jobs:

ATLAS Simulation 3.01 (vbox64_mt_mcore_atlas)
CMS Simulation 70.30 (vbox64_mt_mcore_cms)

That's what the "mt" in the application name means...

Test it by running. Resume LHC@home first:

24/07/2024 13:16:20 | LHC@home | project resumed by user

LHC@home Running (4 CPUs) CMS Simulation 70.30 (vbox64_mt_mcore_cms)

This job was downloaded as a 4-thread task. OK

Resume Einstein@home:

24/07/2024 13:22:25 | Einstein@Home | project resumed by user

Einstein@home Running Gamma-ray pulsar search #5 1.08 (FGRPSSE)
Einstein@home Running Gamma-ray pulsar search #5 1.08 (FGRPSSE)
Einstein@home Running (0.9 CPUs + 1 NVIDIA GPU) All-Sky Gravitational Wave search on O3 1.07 (GW-opencl-nvidia-2)

OK - 6.9 CPUs total. Windows Task Manager registers 93% of CPUs in use.
(6.9 CPUs is 86.25%; the extra ~7% is OS background tasks.) It runs.
Joined: 5 Oct 06 | Posts: 5139
OK, you're close, but not quite there yet. It turns out that LHC don't use the basic plan_class labels, but "something much longer containing the letters mt in the middle". You need to use the long version. You've found the long names yourself, so the current ones become:

<plan_class>vbox64_mt_mcore_atlas</plan_class>
<plan_class>vbox64_mt_mcore_cms</plan_class>

Sorry about that.

For the record: you can add or change an app_config file while BOINC is running - just prepare the file, and activate it by going to the Options menu in BOINC Manager's Advanced view and choosing "Read config files". Then look in the Event Log: it should confirm that it's found the file, and if the app name is wrong (as in this case), it'll tell you which names it does know (i.e., the names of the apps you've run before). If you want to work out in advance what names are needed, every project has an 'Applications' page on their website. In this case, it's https://lhcathome.cern.ch/lhcathome/apps.php - use the strings in brackets in the 'version' column.
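Folding that correction into the earlier file, the app_version entries would read as follows (a sketch: the avg_ncpus and --nthreads values are kept as the poster chose them, only the plan_class strings change):

```xml
<app_version>
    <app_name>ATLAS</app_name>
    <plan_class>vbox64_mt_mcore_atlas</plan_class>
    <avg_ncpus>4</avg_ncpus>
    <cmdline>--nthreads 4</cmdline>
</app_version>
<app_version>
    <app_name>CMS</app_name>
    <plan_class>vbox64_mt_mcore_cms</plan_class>
    <avg_ncpus>4</avg_ncpus>
    <cmdline>--nthreads 4</cmdline>
</app_version>
```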
Joined: 9 Feb 08 | Posts: 54
Thank you! That runs with no warning notices.
Joined: 9 Feb 08 | Posts: 54
The new setup initially showed a good range of tasks. I let BOINC run for 8 hours, and:

Einstein@home Running Binary Radio Pulsar Search (Arecibo,GBT,long) 1.33 (BRP4X64)
Einstein@home Running Binary Radio Pulsar Search (Arecibo,GBT,long) 1.33 (BRP4X64)
Einstein@home Running Binary Radio Pulsar Search (Arecibo,GBT,long) 1.33 (BRP4X64)
Einstein@home Running Binary Radio Pulsar Search (Arecibo,GBT,long) 1.33 (BRP4X64)
Einstein@home Running Binary Radio Pulsar Search (Arecibo,GBT,long) 1.33 (BRP4X64)
Einstein@home Running (0.9 CPUs + 1 NVIDIA GPU) All-Sky Gravitational Wave search on O3 1.07 (GW-opencl-nvidia-2)
LHC@home Running (4 CPUs) CMS Simulation 70.30 (vbox64_mt_mcore_cms)

9.9 of 8 CPUs - not the desired 6.9 of 8 CPUs.

Windows Task Manager shows:
The LHC CMS task is using 4 cores. OK
The Einstein GPU task is using 1 core. OK
The 5 Einstein BRP4X64 tasks are sharing the remaining 3 cores.

9.9 of 8 CPUs. It seems that BOINC sees the LHC CMS 4-CPU task as just a 1-CPU job.
Joined: 5 Oct 06 | Posts: 5139
OK. I think the next stage of the investigation involves getting down and dirty with BOINC's Event Log.

Warning: what I'm about to suggest writes a lot of information into the Event Log in a very short time. Be prepared to set it, and then remove it, as quickly as possible - we can study what it reports at leisure.

I find the easiest way to do this - if you have a large enough monitor - is to open BOINC Manager in Advanced view, and from there open the Event Log window. Arrange the two windows side by side on the screen. From the main window, open the Event Log options dialog (Ctrl+Shift+F) and leave it open. Then, in sequence:

Check cpu_sched_debug
Click 'Apply'
Watch the Event Log window until a large block of text appears
UNcheck cpu_sched_debug
Click 'Save'

That should get you one cycle of messages (you only want one!), which has the framework:

25/07/2024 12:24:19 | | [cpu_sched_debug] Request CPU reschedule: Core client configuration
25/07/2024 12:24:20 | | [cpu_sched_debug] schedule_cpus(): start
...
25/07/2024 12:24:20 | | [cpu_sched_debug] enforce_run_list(): start
...
25/07/2024 12:24:20 | | [cpu_sched_debug] final job list:
...
25/07/2024 12:24:20 | | [cpu_sched_debug] enforce_run_list: end

with lists of tasks in the gaps. Post that entire cycle here, and we can all look at it, to see if we can identify the problem.
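For reference, the same flag can be toggled without the dialog: the BOINC client reads log flags from cc_config.xml in its data directory, re-read via Options -> "Read config files". A minimal sketch (set the flag back to 0 and re-read as soon as the block of text has appeared, for the same flood-of-output reason):

```xml
<cc_config>
    <log_flags>
        <cpu_sched_debug>1</cpu_sched_debug>
    </log_flags>
</cc_config>
```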
Joined: 9 Feb 08 | Posts: 54
25/07/2024 13:09:12 | [cpu_sched_debug] Request CPU reschedule: Core client configuration
25/07/2024 13:09:13 | [cpu_sched_debug] schedule_cpus(): start
Einstein@Home | 25/07/2024 13:09:13 | [cpu_sched_debug] domino prevention: mark p2030.20200319.G185.10-00.34.C.b6s0g0.00000_1272_0 as deadline miss
Einstein@Home | 25/07/2024 13:09:13 | [cpu_sched_debug] domino prevention: mark p2030.20200319.G185.36+00.11.C.b0s0g0.00000_664_0 as deadline miss
Einstein@Home | 25/07/2024 13:09:13 | [cpu_sched_debug] domino prevention: mark p2030.20200319.G199.24-00.53.S.b6s0g0.00000_3296_0 as deadline miss
Einstein@Home | 25/07/2024 13:09:13 | [cpu_sched_debug] domino prevention: mark p2030.20200319.G199.60+00.17.S.b4s0g0.00000_1744_0 as deadline miss
Einstein@Home | 25/07/2024 13:09:13 | [cpu_sched_debug] reserving 1.000000 of coproc NVIDIA
Einstein@Home | 25/07/2024 13:09:13 | [cpu_sched_debug] add to run list: h1_1551.60_O3aC01Cl1In0__O3ASHF1d_1552.00Hz_40741_0 (NVIDIA GPU, FIFO) (prio -2.468008)
Einstein@Home | 25/07/2024 13:09:13 | [cpu_sched_debug] add to run list: p2030.20200319.G185.10-00.34.C.b6s0g0.00000_1272_0 (CPU, EDF) (prio -2.570108)
Einstein@Home | 25/07/2024 13:09:13 | [cpu_sched_debug] add to run list: p2030.20200319.G185.36+00.11.C.b0s0g0.00000_664_0 (CPU, EDF) (prio -2.570513)
Einstein@Home | 25/07/2024 13:09:13 | [cpu_sched_debug] add to run list: p2030.20200319.G199.24-00.53.S.b6s0g0.00000_3296_0 (CPU, EDF) (prio -2.570918)
LHC@home | 25/07/2024 13:09:13 | [cpu_sched_debug] add to run list: CMS_167286_1721823814.462743_0 (CPU, FIFO) (prio -0.021328)
LHC@home | 25/07/2024 13:09:13 | [cpu_sched_debug] add to run list: Theory_2743-2829666-331_1 (CPU, FIFO) (prio -0.022409)
LHC@home | 25/07/2024 13:09:13 | [cpu_sched_debug] add to run list: Theory_2743-2790498-325_2 (CPU, FIFO) (prio -0.022679)
LHC@home | 25/07/2024 13:09:13 | [cpu_sched_debug] add to run list: Theory_2743-2790649-331_1 (CPU, FIFO) (prio -0.022949)
25/07/2024 13:09:13 | [cpu_sched_debug] enforce_run_list(): start
25/07/2024 13:09:13 | [cpu_sched_debug] preliminary job list:
Einstein@Home | 25/07/2024 13:09:13 | [cpu_sched_debug] 0: h1_1551.60_O3aC01Cl1In0__O3ASHF1d_1552.00Hz_40741_0 (0.90 CPU + 1.00 NVIDIA GPU; MD: no; UTS: yes)
Einstein@Home | 25/07/2024 13:09:13 | [cpu_sched_debug] 1: p2030.20200319.G185.10-00.34.C.b6s0g0.00000_1272_0 (1.00 CPU; MD: yes; UTS: no)
Einstein@Home | 25/07/2024 13:09:13 | [cpu_sched_debug] 2: p2030.20200319.G185.36+00.11.C.b0s0g0.00000_664_0 (1.00 CPU; MD: yes; UTS: no)
Einstein@Home | 25/07/2024 13:09:13 | [cpu_sched_debug] 3: p2030.20200319.G199.24-00.53.S.b6s0g0.00000_3296_0 (1.00 CPU; MD: yes; UTS: yes)
LHC@home | 25/07/2024 13:09:13 | [cpu_sched_debug] 4: CMS_167286_1721823814.462743_0 (4.00 CPU; MD: no; UTS: no)
LHC@home | 25/07/2024 13:09:13 | [cpu_sched_debug] 5: Theory_2743-2829666-331_1 (1.00 CPU; MD: no; UTS: no)
LHC@home | 25/07/2024 13:09:13 | [cpu_sched_debug] 6: Theory_2743-2790498-325_2 (1.00 CPU; MD: no; UTS: no)
LHC@home | 25/07/2024 13:09:13 | [cpu_sched_debug] 7: Theory_2743-2790649-331_1 (1.00 CPU; MD: no; UTS: no)
25/07/2024 13:09:13 | [cpu_sched_debug] final job list:
Einstein@Home | 25/07/2024 13:09:13 | [cpu_sched_debug] 0: p2030.20200319.G199.24-00.53.S.b6s0g0.00000_3296_0 (1.00 CPU; MD: yes; UTS: yes)
Einstein@Home | 25/07/2024 13:09:13 | [cpu_sched_debug] 1: p2030.20200319.G185.10-00.34.C.b6s0g0.00000_1272_0 (1.00 CPU; MD: yes; UTS: no)
Einstein@Home | 25/07/2024 13:09:13 | [cpu_sched_debug] 2: p2030.20200319.G185.36+00.11.C.b0s0g0.00000_664_0 (1.00 CPU; MD: yes; UTS: no)
Einstein@Home | 25/07/2024 13:09:13 | [cpu_sched_debug] 3: h1_1551.60_O3aC01Cl1In0__O3ASHF1d_1552.00Hz_40741_0 (0.90 CPU + 1.00 NVIDIA GPU; MD: no; UTS: yes)
Einstein@Home | 25/07/2024 13:09:13 | [cpu_sched_debug] 4: p2030.20200319.G199.60+00.17.S.b4s0g0.00000_1744_0 (1.00 CPU; MD: no; UTS: yes)
LHC@home | 25/07/2024 13:09:13 | [cpu_sched_debug] 5: CMS_167286_1721823814.462743_0 (4.00 CPU; MD: no; UTS: no)
LHC@home | 25/07/2024 13:09:13 | [cpu_sched_debug] 6: Theory_2743-2829666-331_1 (1.00 CPU; MD: no; UTS: no)
LHC@home | 25/07/2024 13:09:13 | [cpu_sched_debug] 7: Theory_2743-2790498-325_2 (1.00 CPU; MD: no; UTS: no)
LHC@home | 25/07/2024 13:09:13 | [cpu_sched_debug] 8: Theory_2743-2790649-331_1 (1.00 CPU; MD: no; UTS: no)
Einstein@Home | 25/07/2024 13:09:13 | [cpu_sched_debug] scheduling p2030.20200319.G199.24-00.53.S.b6s0g0.00000_3296_0 (high priority)
Einstein@Home | 25/07/2024 13:09:13 | [cpu_sched_debug] scheduling p2030.20200319.G185.10-00.34.C.b6s0g0.00000_1272_0 (high priority)
Einstein@Home | 25/07/2024 13:09:13 | [cpu_sched_debug] scheduling p2030.20200319.G185.36+00.11.C.b0s0g0.00000_664_0 (high priority)
Einstein@Home | 25/07/2024 13:09:13 | [cpu_sched_debug] scheduling h1_1551.60_O3aC01Cl1In0__O3ASHF1d_1552.00Hz_40741_0
Einstein@Home | 25/07/2024 13:09:13 | [cpu_sched_debug] scheduling p2030.20200319.G199.60+00.17.S.b4s0g0.00000_1744_0
LHC@home | 25/07/2024 13:09:13 | [cpu_sched_debug] scheduling CMS_167286_1721823814.462743_0
LHC@home | 25/07/2024 13:09:13 | [cpu_sched_debug] all CPUs used (8.90 >= 6), skipping Theory_2743-2829666-331_1
LHC@home | 25/07/2024 13:09:13 | [cpu_sched_debug] all CPUs used (8.90 >= 6), skipping Theory_2743-2790498-325_2
LHC@home | 25/07/2024 13:09:13 | [cpu_sched_debug] all CPUs used (8.90 >= 6), skipping Theory_2743-2790649-331_1
25/07/2024 13:09:13 | [cpu_sched_debug] enforce_run_list: end
Joined: 9 Feb 08 | Posts: 54
Current:

Einstein@home Running Binary Radio Pulsar Search (Arecibo,GBT,long) 1.33 (BRP4X64)
Einstein@home Running Binary Radio Pulsar Search (Arecibo,GBT,long) 1.33 (BRP4X64)
Einstein@home Running Binary Radio Pulsar Search (Arecibo,GBT,long) 1.33 (BRP4X64)
Einstein@home Running Binary Radio Pulsar Search (Arecibo,GBT,long) 1.33 (BRP4X64)
Einstein@home Running (0.9 CPUs + 1 NVIDIA GPU) All-Sky Gravitational Wave search on O3 1.07 (GW-opencl-nvidia-2)
LHC@home Running (4 CPUs) CMS Simulation 70.30 (vbox64_mt_mcore_cms)

6 jobs using 8.9 CPUs.
Joined: 5 Oct 06 | Posts: 5139
Ah. Bingo! Follow the p2030 tasks through the stages. They go through:

mark p2030... as deadline miss
add to run list: p2030... (CPU, EDF)
scheduling p2030... (high priority)

"deadline miss", "EDF", and "high priority" are all synonyms for the same issue - you have too much work in your cache. These tasks have relatively short deadlines (7 days, from memory), and they may not be completed in time. BOINC deals with them as quickly as possible, and schedules them first - before applying other limits, like your chosen number of cores to run.

Reduce the number of days' work you request, and let them work through. You should see the number of cores in use reduce as BOINC feels the deadline pressure reducing.
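The pattern in the log can be mimicked with a toy version of the run-list enforcement: deadline-pressed (EDF) tasks go to the front, and the "all CPUs used" check only stops further tasks from being added once the budget is already met or exceeded - which is how a 4-CPU task can push the total to 8.9 against a 6-CPU limit. This is a deliberate simplification for illustration, not BOINC's real enforce_run_list code.

```python
def enforce_run_list(jobs, ncpus_limit):
    """Toy run-list enforcement.

    jobs: list of (name, cpus, deadline_miss) tuples in priority order.
    Deadline-miss (EDF) jobs are moved to the front; jobs are then added
    while the CPU budget is not yet reached. The budget is checked only
    *before* adding a job, so one multi-CPU job can overshoot the limit.
    """
    ordered = sorted(jobs, key=lambda j: not j[2])  # EDF jobs first (stable sort)
    scheduled, used = [], 0.0
    for name, cpus, _ in ordered:
        if used >= ncpus_limit:   # mirrors "all CPUs used (x >= n), skipping ..."
            continue
        scheduled.append(name)
        used += cpus

    return scheduled, used

# Shaped after the first log: three EDF p2030 tasks, the 0.9-CPU GPU task,
# one more p2030, the 4-CPU CMS task, then three 1-CPU Theory tasks.
jobs = [
    ("h1_gpu", 0.9, False),
    ("p2030_a", 1.0, True), ("p2030_b", 1.0, True), ("p2030_c", 1.0, True),
    ("p2030_d", 1.0, False),
    ("CMS", 4.0, False),
    ("Theory_1", 1.0, False), ("Theory_2", 1.0, False), ("Theory_3", 1.0, False),
]
scheduled, used = enforce_run_list(jobs, ncpus_limit=6)
# used comes to 8.9 CPUs and the Theory tasks are skipped,
# matching "all CPUs used (8.90 >= 6)" in the log.
```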
Joined: 9 Feb 08 | Posts: 54
OK - Thank you. I adjusted "Options -> Computing preferences". They were:
Store at least [2] days and up to an additional [1] days of work.
Changed to:
Store at least [1] days and up to an additional [1] days of work.

This produced 6.9 CPUs. OK

Einstein@home Running Binary Radio Pulsar Search (Arecibo,GBT,long) 1.33 (BRP4X64)
Einstein@home Running (0.9 CPUs + 1 NVIDIA GPU) All-Sky Gravitational Wave search on O3 1.07 (GW-opencl-nvidia-2)
LHC@home Running Theory Simulation 300.30 (vbox64_theory)
LHC@home Running (4 CPUs) CMS Simulation 70.30 (vbox64_mt_mcore_cms)

and the "CPU Scheduler Debug" output:

25/07/2024 14:41:17 | [cpu_sched_debug] Request CPU reschedule: Core client configuration
25/07/2024 14:41:18 | [cpu_sched_debug] schedule_cpus(): start
Einstein@Home | 25/07/2024 14:41:18 | [cpu_sched_debug] thrashing prevention: mark p2030.20200319.G199.60+00.17.S.b4s0g0.00000_1744_0 as deadline miss
Einstein@Home | 25/07/2024 14:41:18 | [cpu_sched_debug] thrashing prevention: mark p2030.20200310.G200.41-00.24.C.b2s0g0.00000_1336_2 as deadline miss
Einstein@Home | 25/07/2024 14:41:18 | [cpu_sched_debug] reserving 1.000000 of coproc NVIDIA
Einstein@Home | 25/07/2024 14:41:18 | [cpu_sched_debug] add to run list: h1_1551.60_O3aC01Cl1In0__O3ASHF1d_1552.00Hz_40749_0 (NVIDIA GPU, FIFO) (prio -2.467961)
LHC@home | 25/07/2024 14:41:18 | [cpu_sched_debug] add to run list: Theory_2743-2829666-331_1 (CPU, FIFO) (prio -0.021359)
LHC@home | 25/07/2024 14:41:18 | [cpu_sched_debug] add to run list: CMS_167286_1721823814.462743_0 (CPU, FIFO) (prio -0.021629)
LHC@home | 25/07/2024 14:41:18 | [cpu_sched_debug] add to run list: Theory_2743-2790498-325_2 (CPU, FIFO) (prio -0.022710)
LHC@home | 25/07/2024 14:41:18 | [cpu_sched_debug] add to run list: Theory_2743-2790649-331_1 (CPU, FIFO) (prio -0.022980)
Einstein@Home | 25/07/2024 14:41:18 | [cpu_sched_debug] add to run list: p2030.20200319.G199.60+00.17.S.b4s0g0.00000_1744_0 (CPU, FIFO) (prio -2.570062)
Einstein@Home | 25/07/2024 14:41:18 | [cpu_sched_debug] add to run list: p2030.20200319.G185.10-00.34.C.b6s0g0.00000_1272_0 (CPU, FIFO) (prio -2.570467)
Einstein@Home | 25/07/2024 14:41:18 | [cpu_sched_debug] add to run list: p2030.20200319.G185.36+00.11.C.b0s0g0.00000_664_0 (CPU, FIFO) (prio -2.570872)
25/07/2024 14:41:18 | [cpu_sched_debug] enforce_run_list(): start
25/07/2024 14:41:18 | [cpu_sched_debug] preliminary job list:
Einstein@Home | 25/07/2024 14:41:18 | [cpu_sched_debug] 0: h1_1551.60_O3aC01Cl1In0__O3ASHF1d_1552.00Hz_40749_0 (0.90 CPU + 1.00 NVIDIA GPU; MD: no; UTS: yes)
LHC@home | 25/07/2024 14:41:18 | [cpu_sched_debug] 1: Theory_2743-2829666-331_1 (1.00 CPU; MD: no; UTS: yes)
LHC@home | 25/07/2024 14:41:18 | [cpu_sched_debug] 2: CMS_167286_1721823814.462743_0 (4.00 CPU; MD: no; UTS: no)
LHC@home | 25/07/2024 14:41:18 | [cpu_sched_debug] 3: Theory_2743-2790498-325_2 (1.00 CPU; MD: no; UTS: no)
LHC@home | 25/07/2024 14:41:18 | [cpu_sched_debug] 4: Theory_2743-2790649-331_1 (1.00 CPU; MD: no; UTS: no)
Einstein@Home | 25/07/2024 14:41:18 | [cpu_sched_debug] 5: p2030.20200319.G199.60+00.17.S.b4s0g0.00000_1744_0 (1.00 CPU; MD: no; UTS: yes)
Einstein@Home | 25/07/2024 14:41:18 | [cpu_sched_debug] 6: p2030.20200319.G185.10-00.34.C.b6s0g0.00000_1272_0 (1.00 CPU; MD: no; UTS: no)
Einstein@Home | 25/07/2024 14:41:18 | [cpu_sched_debug] 7: p2030.20200319.G185.36+00.11.C.b0s0g0.00000_664_0 (1.00 CPU; MD: no; UTS: no)
25/07/2024 14:41:18 | [cpu_sched_debug] final job list:
Einstein@Home | 25/07/2024 14:41:18 | [cpu_sched_debug] 0: h1_1551.60_O3aC01Cl1In0__O3ASHF1d_1552.00Hz_40749_0 (0.90 CPU + 1.00 NVIDIA GPU; MD: no; UTS: yes)
LHC@home | 25/07/2024 14:41:18 | [cpu_sched_debug] 1: Theory_2743-2829666-331_1 (1.00 CPU; MD: no; UTS: yes)
Einstein@Home | 25/07/2024 14:41:18 | [cpu_sched_debug] 2: p2030.20200319.G199.60+00.17.S.b4s0g0.00000_1744_0 (1.00 CPU; MD: no; UTS: yes)
LHC@home | 25/07/2024 14:41:18 | [cpu_sched_debug] 3: CMS_167286_1721823814.462743_0 (4.00 CPU; MD: no; UTS: no)
LHC@home | 25/07/2024 14:41:18 | [cpu_sched_debug] 4: Theory_2743-2790498-325_2 (1.00 CPU; MD: no; UTS: no)
LHC@home | 25/07/2024 14:41:18 | [cpu_sched_debug] 5: Theory_2743-2790649-331_1 (1.00 CPU; MD: no; UTS: no)
Einstein@Home | 25/07/2024 14:41:18 | [cpu_sched_debug] 6: p2030.20200319.G185.10-00.34.C.b6s0g0.00000_1272_0 (1.00 CPU; MD: no; UTS: no)
Einstein@Home | 25/07/2024 14:41:18 | [cpu_sched_debug] 7: p2030.20200319.G185.36+00.11.C.b0s0g0.00000_664_0 (1.00 CPU; MD: no; UTS: no)
Einstein@Home | 25/07/2024 14:41:18 | [cpu_sched_debug] scheduling h1_1551.60_O3aC01Cl1In0__O3ASHF1d_1552.00Hz_40749_0
LHC@home | 25/07/2024 14:41:18 | [cpu_sched_debug] scheduling Theory_2743-2829666-331_1
Einstein@Home | 25/07/2024 14:41:18 | [cpu_sched_debug] scheduling p2030.20200319.G199.60+00.17.S.b4s0g0.00000_1744_0
LHC@home | 25/07/2024 14:41:18 | [cpu_sched_debug] scheduling CMS_167286_1721823814.462743_0
LHC@home | 25/07/2024 14:41:18 | [cpu_sched_debug] all CPUs used (6.90 >= 6), skipping Theory_2743-2790498-325_2
LHC@home | 25/07/2024 14:41:18 | [cpu_sched_debug] all CPUs used (6.90 >= 6), skipping Theory_2743-2790649-331_1
Einstein@Home | 25/07/2024 14:41:18 | [cpu_sched_debug] all CPUs used (6.90 >= 6), skipping p2030.20200319.G185.10-00.34.C.b6s0g0.00000_1272_0
Einstein@Home | 25/07/2024 14:41:18 | [cpu_sched_debug] all CPUs used (6.90 >= 6), skipping p2030.20200319.G185.36+00.11.C.b0s0g0.00000_664_0
25/07/2024 14:41:18 | [cpu_sched_debug] enforce_run_list: end
Joined: 9 Feb 08 | Posts: 54
And, after a few hours with "Won't get new tasks" set for both projects: 6.9 CPUs

Einstein@home Running Binary Radio Pulsar Search (Arecibo,GBT,long) 1.33 (BRP4X64)
Einstein@home Running Binary Radio Pulsar Search (Arecibo,GBT,long) 1.33 (BRP4X64)
Einstein@home Running (0.9 CPUs + 1 NVIDIA GPU) All-Sky Gravitational Wave search on O3 1.07 (GW-opencl-nvidia-2)
LHC@home Running (4 CPUs) CMS Simulation 70.30 (vbox64_mt_mcore_cms)

And the "CPU Scheduler Debug" output:

25/07/2024 21:56:16 | [cpu_sched_debug] Request CPU reschedule: Core client configuration
25/07/2024 21:56:17 | [cpu_sched_debug] schedule_cpus(): start
Einstein@Home | 25/07/2024 21:56:17 | [cpu_sched_debug] reserving 1.000000 of coproc NVIDIA
Einstein@Home | 25/07/2024 21:56:17 | [cpu_sched_debug] add to run list: h1_1553.60_O3aC01Cl1In0__O3ASHF1d_1554.00Hz_56530_0 (NVIDIA GPU, FIFO) (prio -2.467535)
LHC@home | 25/07/2024 21:56:17 | [cpu_sched_debug] add to run list: CMS_167286_1721823814.462743_0 (CPU, FIFO) (prio -0.021643)
Einstein@Home | 25/07/2024 21:56:17 | [cpu_sched_debug] add to run list: p2030.20200319.G185.36+00.11.C.b0s0g0.00000_664_0 (CPU, FIFO) (prio -2.569635)
Einstein@Home | 25/07/2024 21:56:17 | [cpu_sched_debug] add to run list: p2030.20200319.G199.24-00.53.S.b6s0g0.00000_3296_0 (CPU, FIFO) (prio -2.570041)
Einstein@Home | 25/07/2024 21:56:17 | [cpu_sched_debug] add to run list: p2030.20200319.G199.60+00.17.S.b4s0g0.00000_1744_0 (CPU, FIFO) (prio -2.570446)
Einstein@Home | 25/07/2024 21:56:17 | [cpu_sched_debug] add to run list: p2030.20200310.G200.41-00.24.C.b2s0g0.00000_1336_2 (CPU, FIFO) (prio -2.570851)
Einstein@Home | 25/07/2024 21:56:17 | [cpu_sched_debug] add to run list: LATeah2102F_1048.0_291588_0.0_0 (CPU, FIFO) (prio -2.571256)
Einstein@Home | 25/07/2024 21:56:17 | [cpu_sched_debug] add to run list: LATeah2102F_1048.0_291544_0.0_0 (CPU, FIFO) (prio -2.571661)
25/07/2024 21:56:17 | [cpu_sched_debug] enforce_run_list(): start
25/07/2024 21:56:17 | [cpu_sched_debug] preliminary job list:
Einstein@Home | 25/07/2024 21:56:17 | [cpu_sched_debug] 0: h1_1553.60_O3aC01Cl1In0__O3ASHF1d_1554.00Hz_56530_0 (0.90 CPU + 1.00 NVIDIA GPU; MD: no; UTS: yes)
LHC@home | 25/07/2024 21:56:17 | [cpu_sched_debug] 1: CMS_167286_1721823814.462743_0 (4.00 CPU; MD: no; UTS: no)
Einstein@Home | 25/07/2024 21:56:17 | [cpu_sched_debug] 2: p2030.20200319.G185.36+00.11.C.b0s0g0.00000_664_0 (1.00 CPU; MD: no; UTS: no)
Einstein@Home | 25/07/2024 21:56:17 | [cpu_sched_debug] 3: p2030.20200319.G199.24-00.53.S.b6s0g0.00000_3296_0 (1.00 CPU; MD: no; UTS: yes)
Einstein@Home | 25/07/2024 21:56:17 | [cpu_sched_debug] 4: p2030.20200319.G199.60+00.17.S.b4s0g0.00000_1744_0 (1.00 CPU; MD: no; UTS: no)
Einstein@Home | 25/07/2024 21:56:17 | [cpu_sched_debug] 5: p2030.20200310.G200.41-00.24.C.b2s0g0.00000_1336_2 (1.00 CPU; MD: no; UTS: no)
Einstein@Home | 25/07/2024 21:56:17 | [cpu_sched_debug] 6: LATeah2102F_1048.0_291588_0.0_0 (1.00 CPU; MD: no; UTS: no)
Einstein@Home | 25/07/2024 21:56:17 | [cpu_sched_debug] 7: LATeah2102F_1048.0_291544_0.0_0 (1.00 CPU; MD: no; UTS: no)
25/07/2024 21:56:17 | [cpu_sched_debug] final job list:
Einstein@Home | 25/07/2024 21:56:17 | [cpu_sched_debug] 0: h1_1553.60_O3aC01Cl1In0__O3ASHF1d_1554.00Hz_56530_0 (0.90 CPU + 1.00 NVIDIA GPU; MD: no; UTS: yes)
Einstein@Home | 25/07/2024 21:56:17 | [cpu_sched_debug] 1: p2030.20200319.G199.24-00.53.S.b6s0g0.00000_3296_0 (1.00 CPU; MD: no; UTS: yes)
LHC@home | 25/07/2024 21:56:17 | [cpu_sched_debug] 2: CMS_167286_1721823814.462743_0 (4.00 CPU; MD: no; UTS: no)
Einstein@Home | 25/07/2024 21:56:17 | [cpu_sched_debug] 3: p2030.20200319.G185.36+00.11.C.b0s0g0.00000_664_0 (1.00 CPU; MD: no; UTS: no)
Einstein@Home | 25/07/2024 21:56:17 | [cpu_sched_debug] 4: p2030.20200319.G199.60+00.17.S.b4s0g0.00000_1744_0 (1.00 CPU; MD: no; UTS: no)
Einstein@Home | 25/07/2024 21:56:17 | [cpu_sched_debug] 5: p2030.20200310.G200.41-00.24.C.b2s0g0.00000_1336_2 (1.00 CPU; MD: no; UTS: no)
Einstein@Home | 25/07/2024 21:56:17 | [cpu_sched_debug] 6: LATeah2102F_1048.0_291588_0.0_0 (1.00 CPU; MD: no; UTS: no)
Einstein@Home | 25/07/2024 21:56:17 | [cpu_sched_debug] 7: LATeah2102F_1048.0_291544_0.0_0 (1.00 CPU; MD: no; UTS: no)
Einstein@Home | 25/07/2024 21:56:17 | [cpu_sched_debug] scheduling h1_1553.60_O3aC01Cl1In0__O3ASHF1d_1554.00Hz_56530_0
Einstein@Home | 25/07/2024 21:56:17 | [cpu_sched_debug] scheduling p2030.20200319.G199.24-00.53.S.b6s0g0.00000_3296_0
LHC@home | 25/07/2024 21:56:17 | [cpu_sched_debug] scheduling CMS_167286_1721823814.462743_0
Einstein@Home | 25/07/2024 21:56:17 | [cpu_sched_debug] scheduling p2030.20200319.G185.36+00.11.C.b0s0g0.00000_664_0
Einstein@Home | 25/07/2024 21:56:17 | [cpu_sched_debug] all CPUs used (6.90 >= 6), skipping p2030.20200319.G199.60+00.17.S.b4s0g0.00000_1744_0
Einstein@Home | 25/07/2024 21:56:17 | [cpu_sched_debug] all CPUs used (6.90 >= 6), skipping p2030.20200310.G200.41-00.24.C.b2s0g0.00000_1336_2
Einstein@Home | 25/07/2024 21:56:17 | [cpu_sched_debug] all CPUs used (6.90 >= 6), skipping LATeah2102F_1048.0_291588_0.0_0
Einstein@Home | 25/07/2024 21:56:17 | [cpu_sched_debug] all CPUs used (6.90 >= 6), skipping LATeah2102F_1048.0_291544_0.0_0
25/07/2024 21:56:17 | [cpu_sched_debug] enforce_run_list: end
Send message Joined: 5 Oct 06 Posts: 5139 |
I take it from that last post that BOINC is currently running as you would expect from your current settings? I'm sure you'll keep an eye on it, but we're probably done for now.

I'll just pass on one final thought - it involves the fractional CPU usage shown for the Einstein Gravitational Wave app. GPU hardware and GPU applications are very variable. Actual observed CPU usage can range from minuscule (one or two percent) to 100% - BOINC is very bad at detecting and reacting to the extremes. In particular, the combination of the Windows OS + NVidia hardware + the OpenCL programming framework responds best when a full CPU core is available for it to use - and the GW app falls into that trap. Einstein have deployed it with a 90% CPU setting, but that's not really enough: with a fractional setting, no matter how high, BOINC allows another CPU task to run.

You can upgrade the GW app to request a full CPU core by using another app_config file. Use the 'Application' settings, and a <cpu_usage> of 1. As before, ask if you need help. |
Send message Joined: 29 Aug 05 Posts: 15585 |
Great, new terminology: "domino prevention", "trashing prevention". What's the difference between the two? |
Send message Joined: 9 Feb 08 Posts: 54 |
Thanks, that's great. So with just

<app_config>
  <app>
    <name>einstein_O3AS</name>
    <max_concurrent>1</max_concurrent>
    <cpu_usage>1</cpu_usage>
  </app>
</app_config>

I still got

Einstein@home | Running (0.9 CPUs + 1 NVIDIA GPU) | All-Sky Gravitational Wave search on O3 1.07 (GW-opencl-nvidia-2) | h1_1551.60_O3aC01Cl1In0__O3ASHF1d_1552.00Hz_5839_2

So I expanded the app_config.xml for Einstein@home:

<app_config>
  <!-- All-Sky Gravitational Wave search on O3 1.07 (GW-opencl-nvidia-2) -->
  <!-- (0.9 CPUs + 1 NVIDIA GPU) default -->
  <!-- make it use 1 CPU -->
  <app>
    <name>einstein_O3AS</name>
    <max_concurrent>1</max_concurrent>
    <cpu_usage>1</cpu_usage>
  </app>
  <app_version>
    <app_name>einstein_O3AS</app_name>
    <plan_class>GW-opencl-nvidia-2</plan_class>
    <avg_ncpus>1</avg_ncpus>
    <ngpus>1</ngpus>
    <cmdline>--nthreads 1</cmdline>
  </app_version>
</app_config>

"I think" I momentarily got

Einstein@home | Running (1 CPUs + 1 NVIDIA GPU) | All-Sky Gravitational Wave search on O3 1.07 (GW-opencl-nvidia-2) | h1_1551.60_O3aC01Cl1In0__O3ASHF1d_1552.00Hz_5839_2

before it was kicked -

Einstein@home | Computation error (0.9 CPUs + 1 NVIDIA GPU) | All-Sky Gravitational Wave search on O3 1.07 (GW-opencl-nvidia-2) | h1_1551.60_O3aC01Cl1In0__O3ASHF1d_1552.00Hz_5839_2

Event log:

Einstein@Home | 26/07/2024 15:05:04 | project resumed by user
Einstein@Home | 26/07/2024 15:05:07 | Computation for task h1_1551.60_O3aC01Cl1In0__O3ASHF1d_1552.00Hz_5839_2 finished
Einstein@Home | 26/07/2024 15:05:07 | Output file h1_1551.60_O3aC01Cl1In0__O3ASHF1d_1552.00Hz_5839_2_1 for task h1_1551.60_O3aC01Cl1In0__O3ASHF1d_1552.00Hz_5839_2 absent

Sorry about that - I should have been quicker to observe. And just by chance there were no more GPU apps in the queue.
So - "Allow new tasks" got me -

Einstein@home | Running | Binary Radio Pulsar Search (Arecibo,GBT,long) 1.33 (BRP4X64) | p2030.20200310.G200.41-00.24.C.b2s0g0.00000_1336_2
Einstein@home | Running | Gamma-ray pulsar search #5 1.08 (FGRPSSE) | LATeah2102F_1048.0_291588_0.0_0
Einstein@home | Running (0.2 CPUs + 0.33 NVIDIA GPUs) | Binary Radio Pulsar Search (MeerKAT) 0.12 (BRP7-cuda55) | Ter5_1_dns_cfbf00021_segment_6_dms_200_40000_156_7250000_1
Einstein@home | Running (0.2 CPUs + 0.33 NVIDIA GPUs) | Binary Radio Pulsar Search (MeerKAT) 0.12 (BRP7-cuda55) | Ter5_1_dns_cfbf00021_segment_6_dms_200_40000_156_7400000_0
Einstein@home | Running (0.2 CPUs + 0.33 NVIDIA GPUs) | Binary Radio Pulsar Search (MeerKAT) 0.12 (BRP7-cuda55) | Ter5_1_dns_cfbf00021_segment_6_dms_200_40000_156_6950000_1
LHC@home | Running (4 CPUs) | CMS Simulation 70.30 (vbox64_mt_mcore_cms)

.... LOL |
Send message Joined: 5 Oct 06 Posts: 5139 |
Always keep the User Manual open beside you while you're working, and refer to it often... I make that

<app_config>
  <app>
    <name>einstein_O3AS</name>
    <gpu_versions>
      <gpu_usage>1</gpu_usage>
      <cpu_usage>1</cpu_usage>
    </gpu_versions>
  </app>
</app_config>

The sub-entries inside <gpu_versions> aren't optional. |
Send message Joined: 9 Feb 08 Posts: 54 |
OK, MeerKATs aside... With the app_config.xml I posted, I just got some GPU tasks and they look like this -

Einstein@home | Ready to start (1 CPUs + 1 NVIDIA GPU) | All-Sky Gravitational Wave search on O3 1.07 (GW-opencl-nvidia-2)

Out of curiosity I'll let it run a few before trying your version. The small litter of MeerKATs have yet to finish... |
Copyright © 2025 University of California.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,
Version 1.2 or any later version published by the Free Software Foundation.