Thread '"Use at most x% of the CPUs" not working - sometimes'

ProfileGuy
Joined: 9 Feb 08
Posts: 54
United Kingdom
Message 114316 - Posted: 22 Jul 2024, 18:20:27 UTC
Last modified: 22 Jul 2024, 18:29:12 UTC

Testing -

Initial conditions:
Preferences set to use 62.5% (5) of the CPUs.
There are no LHC tasks in the queue.
5 CPU & 1 GPU Einstein tasks running.  OK


Downloaded 1 LHC ATLAS mt task.
It did not start.
OK


To test:
Change the number of CPUs used.
I made a mistake - I meant to set the CPUs to 50% (4) immediately
- but only set the "When computer is not in use" to 50% of CPUs.
There was no immediate change.

But the next step -

Restart BOINC, (no strays) with "When computer is in use" still (ignorantly) set to 62.5% (5) of the CPUs

showed these results:

3 CPU & 1 GPU Einstein tasks running   3.9 CPUs
1 CPU ATLAS mt LHC task running        5.0 CPUs
8.9 CPUs - and where I spotted my "in use/not in use" preferences mistake.

Windows Task Manager showing 100% in use:
The LHC mt task is using 62~63% (5 CPUs) with the other 3.9 CPU tasks vying for the remaining ~37% (3) of the CPUs.



Further Tests:
I suspended the LHC@home project
and
5.9 CPUs running Einstein.  OK

I set the Local Preferences to use 50% (4) of the CPUs "When computer is in use".
Immediately -
4.9 CPUs running Einstein.  OK

Windows Task Manager showing ~67% in use.  (~7% is average OS background task noise)



More tests:
Resumed LHC

Result:
LHC mt did not start -      OK
4.9 CPUs running Einstein.  OK -

Einstein@home   Running                                 Binary Radio Pulsar Search (Arecibo,GBT,long) 1.33 (BRP4X64)            
Einstein@home   Running                                 Binary Radio Pulsar Search (Arecibo,GBT,long) 1.33 (BRP4X64)            
Einstein@home   Running                                 Binary Radio Pulsar Search (Arecibo,GBT,long) 1.33 (BRP4X64)            
Einstein@home   Running                                 Binary Radio Pulsar Search (Arecibo,GBT,long) 1.33 (BRP4X64)            
Einstein@home   Running (0.9 CPUs + 1 NVIDIA GPU)       All-Sky Gravitational Wave search on O3 1.07 (GW-opencl-nvidia-2)       

LHC@home        Waiting to run (4 CPUs)                 ATLAS Simulation 3.01 (vbox64_mt_mcore_atlas)                           



Even More Tests:
After ~90 minutes of (the above) stability -
I set both the Local Preferences to use 62.5% (5) of the CPUs.

Result:
Einstein@home   Running                                 Binary Radio Pulsar Search (Arecibo,GBT,long) 1.33 (BRP4X64)            
Einstein@home   Running                                 Binary Radio Pulsar Search (Arecibo,GBT,long) 1.33 (BRP4X64)            
Einstein@home   Running                                 Binary Radio Pulsar Search (Arecibo,GBT,long) 1.33 (BRP4X64)            
Einstein@home   Running                                 Binary Radio Pulsar Search (Arecibo,GBT,long) 1.33 (BRP4X64)            
Einstein@home   Running (0.9 CPUs + 1 NVIDIA GPU)       All-Sky Gravitational Wave search on O3 1.07 (GW-opencl-nvidia-2)       

LHC@home        Running (4 CPUs)                        ATLAS Simulation 3.01 (vbox64_mt_mcore_atlas)   

A total of 8.9 CPUs.

But looking at the "Details" pane of Windows Task Manager -

The LHC mt task is using 62~63% (5 CPUs ????).
The Einstein GPU task uses 12% (1) CPUs.
The 4 remaining Einstein tasks are scratching around with 25% (2) CPUs.

In reality 9.9 CPU tasks total.

************
NOTE
Running in this state for 5:30 hours produced -
Further Observations with the Windows Task Manager
--------------------
The LHC mt task has now reduced to ~50% (4) CPUs.  (Despite the Local Preferences set to use 62.5% (5) of the CPUs.)
The Einstein GPU task still uses ~12% (1) CPUs.
The 4 remaining Einstein tasks are sharing ~36% (3) CPUs.
Which all adds up to 8.9 CPU tasks on 8 CPUs.
Windows Task Manager registers 100% of CPUs in use.
************

I suspended the Einstein@home project
and the LHC mt is still running with ~50% (4) CPUs. Nearly OK... Should be 5 CPUs.
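
For anyone following the numbers, here is a minimal Python sketch of the arithmetic behind the figures in this post. It assumes 8 logical CPUs (the total quoted above) and assumes the percentage is truncated to a whole number of CPUs with a minimum of 1. The function name `cpus_allowed` is made up for illustration; this is not BOINC's own code.

```python
import math

def cpus_allowed(total_cpus: int, percent: float) -> int:
    """Whole CPUs allowed by a 'use at most x% of the CPUs' setting.

    Assumption: the result is truncated, with a minimum of 1.
    """
    return max(1, math.floor(total_cpus * percent / 100))

TOTAL_CPUS = 8  # logical CPUs on the machine described in this thread

for pct in (50, 62.5):
    print(f"{pct}% of {TOTAL_CPUS} CPUs -> {cpus_allowed(TOTAL_CPUS, pct)}")

# CPU claims as listed in the "Even More Tests" result above:
# four 1-CPU Einstein tasks, one 0.9-CPU GPU task, one 4-CPU ATLAS task.
claimed = 4 * 1.0 + 0.9 + 4.0
limit = cpus_allowed(TOTAL_CPUS, 62.5)
print(f"claimed {claimed:.1f} CPUs against a limit of {limit}")
```

The point of the comparison is that the sum of the per-task CPU claims (8.9) exceeds the limit derived from the preference (5), which is the over-commitment being reported.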

ProfileGuy
Joined: 9 Feb 08
Posts: 54
United Kingdom
Message 114319 - Posted: 23 Jul 2024, 11:29:56 UTC

With only Einstein CPU tasks running:

Changed the local preferences from 50% (4) CPUs to 75% (6) of the CPUs.

Immediately 6 tasks running:
Einstein@home   Running                                 Gamma-ray pulsar search #5 1.08 (FGRPSSE)
Einstein@home   Running                                 Gamma-ray pulsar search #5 1.08 (FGRPSSE)
Einstein@home   Running                                 Gamma-ray pulsar search #5 1.08 (FGRPSSE)
Einstein@home   Running                                 Gamma-ray pulsar search #5 1.08 (FGRPSSE)
Einstein@home   Running                                 Gamma-ray pulsar search #5 1.08 (FGRPSSE)
Einstein@home   Running                                 Gamma-ray pulsar search #5 1.08 (FGRPSSE)

Resumed LHC:
Einstein@home   Running                                 Gamma-ray pulsar search #5 1.08 (FGRPSSE)
Einstein@home   Waiting to run                          Gamma-ray pulsar search #5 1.08 (FGRPSSE)
Einstein@home   Waiting to run                          Gamma-ray pulsar search #5 1.08 (FGRPSSE)
Einstein@home   Waiting to run                          Gamma-ray pulsar search #5 1.08 (FGRPSSE)
Einstein@home   Waiting to run                          Gamma-ray pulsar search #5 1.08 (FGRPSSE)
Einstein@home   Running                                 Gamma-ray pulsar search #5 1.08 (FGRPSSE)

LHC@home        Running (4 CPUs)                        ATLAS Simulation 3.01 (vbox64_mt_mcore_atlas)

The previously downloaded LHC mt task is still using only 4 CPUs.

It seems this is because I downloaded that LHC mt task earlier,
when the preferences were set to use 50% (4) CPUs
and BOINC permanently assigned 4 CPUs to the mt task then.

To demonstrate; after I changed the preferences to use 37.5% (3) CPUs:
LHC@home        Running (4 CPUs)                        ATLAS Simulation 3.01 (vbox64_mt_mcore_atlas)
Windows Task Manager registers 4 CPUs in use.

And 75% (6) CPUs:
Einstein@home   Running                                 Gamma-ray pulsar search #5 1.08 (FGRPSSE)
Einstein@home   Running                                 Gamma-ray pulsar search #5 1.08 (FGRPSSE)

LHC@home        Running (4 CPUs)                        ATLAS Simulation 3.01 (vbox64_mt_mcore_atlas)
Windows Task Manager registers 6 CPUs in use.

ProfileGuy
Joined: 9 Feb 08
Posts: 54
United Kingdom
Message 114320 - Posted: 23 Jul 2024, 12:28:21 UTC
Last modified: 23 Jul 2024, 12:38:12 UTC

And curiously, with 75% (6) CPUs set in the local preferences, newly downloaded multi-thread (mt) tasks are still being assigned only 4 cores:
LHC@home        Ready to start (4 CPUs)                 CMS Simulation 70.30 (vbox64_mt_mcore_cms)

Confession:
I have since changed local and web preferences to 75% (6) CPUs.
My buffer's full - I'm going to let it run.

Richard Haselgrove
Volunteer tester
Help desk expert
Joined: 5 Oct 06
Posts: 5139
United Kingdom
Message 114321 - Posted: 23 Jul 2024, 13:45:29 UTC - in response to Message 114320.  

When you choose to run an application which the project has designated as 'MT' (multi-threaded), it is the project server which decides how to configure each task. This is fixed at the moment the server decides to assign the task to your computer, and those values stay unchanged throughout the running time of that particular task - even if you change the settings later. The server sets three values:

  • <plan_class> mt
  • <avg_ncpus> x
  • <cmdline> --nthreads y

'avg_ncpus' tells your machine how to expect this task to run, and to keep enough space in BOINC's workload for it to run.
'nthreads' controls the actual behaviour of this project task.

If you want, you can override the server settings, and set absolute values yourself locally. This is done by creating a file called app_config.xml, as described in the User Manual.

Take care: BOINC is very picky about that file being exactly correct, and the manual assumes that you are familiar with certain conventions, like 'items in square brackets ([ ]) are optional'. Probably best to ask for a guiding hand here, and we can help you cook one up.
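
Since the file has to be exactly right, one option is to generate it rather than type it by hand. The following is only a sketch under stated assumptions: the helper name `build_app_config` is hypothetical, the app name and plan class are copied from task names quoted elsewhere in this thread, and the element names are the ones discussed in this thread (`app_version`, `app_name`, `plan_class`, `avg_ncpus`, `cmdline`). It guarantees well-formed XML, not that BOINC will accept the values.

```python
import xml.etree.ElementTree as ET

def build_app_config(app_name: str, plan_class: str, ncpus: int) -> str:
    """Return app_config.xml text containing a single <app_version> entry."""
    root = ET.Element("app_config")
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    ET.SubElement(version, "plan_class").text = plan_class
    # avg_ncpus is what BOINC budgets for; --nthreads is what the task uses.
    ET.SubElement(version, "avg_ncpus").text = str(ncpus)
    ET.SubElement(version, "cmdline").text = f"--nthreads {ncpus}"
    ET.indent(root, space="  ")  # requires Python 3.9 or later
    return ET.tostring(root, encoding="unicode")

# Illustrative values only: check the project's Applications page for the
# exact app name and plan class before using them.
xml_text = build_app_config("ATLAS", "vbox64_mt_mcore_atlas", 4)
print(xml_text)
```

Keeping `avg_ncpus` and `--nthreads` driven by the same number avoids the situation where BOINC budgets for one thread count while the task actually runs another.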


ProfileGuy
Joined: 9 Feb 08
Posts: 54
United Kingdom
Message 114322 - Posted: 24 Jul 2024, 12:44:20 UTC

That would be nice, thank you Richard.

My most prominent problem is with the LHC@home virtual machines:

All the LHC@home tasks are Linux native. To run them on Windows they need a Linux "virtual machine".
My Windows 10 box is (mostly) ten-year-old hardware. This may or may not be a factor in why it struggles to start and stop these VMs - there are newer CPUs with special facilities for virtualisation.

My CPU:
Processor: i7-4790K CPU @ 4.00GHz [Family 6 Model 60 Stepping 3] (4th Gen Hyper-Threading CPU 2014)
Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movebe popcnt aes f16c rdrandsyscall nx lm avx avx2 vmx tm2 pbe fsgsbase bmi1 smep bmi2

32 GB DDR3 RAM
2 TB SSD

The symptoms are excessive Windows system disk access - especially immediately after stopping a vm. I have to wait 90s for it to subside. If I stop two at once they get knotted.

The solution I'm using is to stop/start each task manually. But this takes time and patience.

So I use the preferences to limit the total number of CPUs and an app_config.xml to limit the number of LHC@home vm tasks running concurrently:

app_config.xml for LHC@home:
<app_config>
  <app>
    <name>CMS</name>
    <max_concurrent>1</max_concurrent>
  </app>
  <app>
    <name>Theory</name>
    <max_concurrent>3</max_concurrent>
  </app>
</app_config>

Simply so the manual control takes less time...



Can this be improved?

Looking at the app_config.xml manual, I fear that lengthy testing needs to be done to find the best setup.

At first glance, useful additional elements seem to be

<app_config>
  <project_max_concurrent>4</project_max_concurrent>            <!-- 4 tasks - easier to start/stop manually -->
and
  <app_version>
    <app_name>CMS</app_name>
    <cmdline>--nthreads 4</cmdline>                             <!-- leaves space for other tasks to run -->

I've also heard rumours about an automatic script that starts/stops LHC tasks in a timed fashion.

ProfileGuy
Joined: 9 Feb 08
Posts: 54
United Kingdom
Message 114323 - Posted: 24 Jul 2024, 12:53:31 UTC

Had a go -
app_config.xml for LHC@home:
<app_config>

  <project_max_concurrent>4</project_max_concurrent>

  <app>
    <name>ATLAS</name>
    <max_concurrent>1</max_concurrent>
  </app>
  <app_version>
    <app_name>ATLAS</app_name>
    <plan_class>mt</plan_class>
    <avg_ncpus>4</avg_ncpus>
    <cmdline>--nthreads 4</cmdline>
  </app_version>

  <app>
    <name>CMS</name>
    <max_concurrent>1</max_concurrent>
  </app>
  <app_version>
    <app_name>CMS</app_name>
    <plan_class>mt</plan_class>
    <avg_ncpus>4</avg_ncpus>
    <cmdline>--nthreads 4</cmdline>
  </app_version>

  <app>
    <name>Theory</name>
    <max_concurrent>4</max_concurrent>
  </app>

</app_config>

Started BOINC suspended.

Event Log:
24/07/2024 13:06:02 | LHC@home | Found app_config.xml
24/07/2024 13:06:02 | LHC@home | Entry in app_config.xml for app 'ATLAS', plan class 'mt' doesn't match any app versions
24/07/2024 13:06:02 | LHC@home | Entry in app_config.xml for app 'CMS', plan class 'mt' doesn't match any app versions
24/07/2024 13:06:02 | LHC@home | Max 4 concurrent jobs
24/07/2024 13:06:02 | LHC@home | ATLAS: Max 1 concurrent jobs
24/07/2024 13:06:02 | LHC@home | Theory: Max 4 concurrent jobs
24/07/2024 13:06:02 | LHC@home | CMS: Max 1 concurrent jobs


The ATLAS & CMS notices.
What does it mean?
They are multi thread jobs:

ATLAS Simulation 3.01 (vbox64_mt_mcore_atlas)
CMS Simulation 70.30 (vbox64_mt_mcore_cms)

That's what the "mt" in the application name means...

Test it by running.
Resume LHC@home first.

24/07/2024 13:16:20 | LHC@home | project resumed by user

LHC@home        Running (4 CPUs)                        CMS Simulation 70.30 (vbox64_mt_mcore_cms)
This job was downloaded as a 4 thread task.

OK


Resume Einstein@home -

24/07/2024 13:22:25 | Einstein@Home | project resumed by user

Einstein@home   Running                                 Gamma-ray pulsar search #5 1.08 (FGRPSSE)
Einstein@home   Running                                 Gamma-ray pulsar search #5 1.08 (FGRPSSE)
Einstein@home   Running (0.9 CPUs + 1 NVIDIA GPU)       All-Sky Gravitational Wave search on O3 1.07 (GW-opencl-nvidia-2)

OK

6.9 CPUs total.

Windows Task Manager registers 93% of CPUs in use. (6.9 CPUs is 86.25% - the ~7% is OS background tasks)

It runs.

Richard Haselgrove
Volunteer tester
Help desk expert
Joined: 5 Oct 06
Posts: 5139
United Kingdom
Message 114324 - Posted: 24 Jul 2024, 14:28:23 UTC - in response to Message 114323.  

OK, you're close, but not quite there yet.

It turns out that LHC don't use the basic plan_class labels, but "something much longer containing the letters mt in the middle". You need to use the long version.

You've found the long names yourself, so the current ones become

<plan_class>vbox64_mt_mcore_atlas</plan_class>
<plan_class>vbox64_mt_mcore_cms</plan_class>
Sorry about that.

For the record:
You can add or change an app_config file while BOINC is running - just prepare the file, and activate it by going to the options menu in BOINC Manager advanced view, and choosing "Read config files". Then look in the Event Log: it should confirm that it's found the file, and if the app name is wrong (as in this case), it'll tell you which names it does know (i.e., the names of the apps you've run before).

If you want to work out in advance what names are needed, every project has an 'Applications' page on their website. In this case, it's https://lhcathome.cern.ch/lhcathome/apps.php - use the strings in brackets in the 'version' column.
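
As a rough illustration of the mismatch, here is a small Python sketch that checks the `<plan_class>` values in an app_config against a table of known ones. The table is an assumption: it only contains the strings that appear in brackets in the task names quoted in this thread, and the function name `unmatched_entries` is made up. It mimics the "doesn't match any app versions" notice; it is not how the BOINC client itself does the check.

```python
import xml.etree.ElementTree as ET

# Plan classes as they appear in brackets in task names quoted in this thread.
KNOWN_PLAN_CLASSES = {
    "ATLAS": {"vbox64_mt_mcore_atlas"},
    "CMS": {"vbox64_mt_mcore_cms"},
    "Theory": {"vbox64_theory"},
}

def unmatched_entries(app_config_xml: str) -> list[tuple[str, str]]:
    """Return (app_name, plan_class) pairs that match no known app version."""
    root = ET.fromstring(app_config_xml)
    problems = []
    for version in root.findall("app_version"):
        app = (version.findtext("app_name") or "").strip()
        plan = (version.findtext("plan_class") or "").strip()
        if plan not in KNOWN_PLAN_CLASSES.get(app, set()):
            problems.append((app, plan))
    return problems

# One entry uses the short label 'mt', the other the long project name.
example = """
<app_config>
  <app_version>
    <app_name>ATLAS</app_name>
    <plan_class>mt</plan_class>
    <avg_ncpus>4</avg_ncpus>
    <cmdline>--nthreads 4</cmdline>
  </app_version>
  <app_version>
    <app_name>CMS</app_name>
    <plan_class>vbox64_mt_mcore_cms</plan_class>
    <avg_ncpus>4</avg_ncpus>
    <cmdline>--nthreads 4</cmdline>
  </app_version>
</app_config>
"""
print(unmatched_entries(example))
```

Only the ATLAS entry is reported, because 'mt' is not one of the plan class strings LHC actually uses.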

ProfileGuy
Joined: 9 Feb 08
Posts: 54
United Kingdom
Message 114337 - Posted: 25 Jul 2024, 10:21:54 UTC


Thank you!

That runs with no warning notices.

ProfileGuy
Joined: 9 Feb 08
Posts: 54
United Kingdom
Message 114338 - Posted: 25 Jul 2024, 10:52:51 UTC
Last modified: 25 Jul 2024, 11:44:54 UTC

The new set up initially showed a good range of tasks.
I let BOINC run for 8 hours and:

Einstein@home   Running                                 Binary Radio Pulsar Search (Arecibo,GBT,long) 1.33 (BRP4X64)
Einstein@home   Running                                 Binary Radio Pulsar Search (Arecibo,GBT,long) 1.33 (BRP4X64)
Einstein@home   Running                                 Binary Radio Pulsar Search (Arecibo,GBT,long) 1.33 (BRP4X64)
Einstein@home   Running                                 Binary Radio Pulsar Search (Arecibo,GBT,long) 1.33 (BRP4X64)
Einstein@home   Running                                 Binary Radio Pulsar Search (Arecibo,GBT,long) 1.33 (BRP4X64)
Einstein@home   Running (0.9 CPUs + 1 NVIDIA GPU)       All-Sky Gravitational Wave search on O3 1.07 (GW-opencl-nvidia-2)

LHC@home        Running (4 CPUs)                        CMS Simulation 70.30 (vbox64_mt_mcore_cms)

9.9 of 8 CPUs - not the desired 6.9 of 8 CPUs.


Windows Task Manager shows:

The LHC CMS is using 4 cores. OK

The Einstein GPU task is using 1 core. OK

The 5 Einstein BRP4X64's are sharing the remaining 3 cores.

9.9 of 8 CPUs
It seems that BOINC sees the 4-CPU LHC CMS task as just a 1-CPU job.

Richard Haselgrove
Volunteer tester
Help desk expert
Joined: 5 Oct 06
Posts: 5139
United Kingdom
Message 114339 - Posted: 25 Jul 2024, 11:44:29 UTC - in response to Message 114338.  

OK. I think the next stage of the investigation involves getting down and dirty with BOINC's Event Log. Warning: what I'm about to suggest writes a lot of information into the Event Log in a very short time. Be prepared to set it, and then remove it, as quickly as possible - we can study what it reports at leisure.

I find the easiest way to do this - if you have a large enough monitor - is to open BOINC Manager in Advanced View, and from there open the Event Log window. Arrange the two windows side by side on the screen.

From the main window, open the Event Log options dialog (Ctrl+Shift+F) and leave it open. Then in sequence:

Check cpu_sched_debug
Click 'Apply'
Watch the Event Log window until a large block of text appears
UNcheck cpu_sched_debug
Click 'Save'.

That should get you one cycle of messages (you only want one!), which has the framework

25/07/2024 12:24:19 |  | [cpu_sched_debug] Request CPU reschedule: Core client configuration
25/07/2024 12:24:20 |  | [cpu_sched_debug] schedule_cpus(): start
...
25/07/2024 12:24:20 |  | [cpu_sched_debug] enforce_run_list(): start
...
25/07/2024 12:24:20 |  | [cpu_sched_debug] final job list:
...
25/07/2024 12:24:20 |  | [cpu_sched_debug] enforce_run_list: end
with lists of tasks in the gaps. Post that entire cycle here, and we can all look at it, to see if we can identify the problem.
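
If more than one cycle ends up in the log, a short script can pull out a single complete one. This is a sketch only: it assumes the log has been saved as plain text lines, and it uses the start and end markers shown in the framework above. The function name `extract_cycle` is hypothetical.

```python
def extract_cycle(lines):
    """Return the first complete cpu_sched_debug cycle found in `lines`.

    A cycle runs from 'schedule_cpus(): start' to 'enforce_run_list: end'.
    """
    cycle = []
    recording = False
    for line in lines:
        if "[cpu_sched_debug]" not in line:
            continue  # ignore unrelated Event Log messages
        if "schedule_cpus(): start" in line:
            cycle = [line]  # (re)start collecting at each new cycle
            recording = True
        elif recording:
            cycle.append(line)
            if "enforce_run_list: end" in line:
                return cycle
    return []  # no complete cycle found

sample = [
    "25/07/2024 12:24:19 |  | [cpu_sched_debug] Request CPU reschedule: Core client configuration",
    "25/07/2024 12:24:20 |  | [cpu_sched_debug] schedule_cpus(): start",
    "25/07/2024 12:24:20 |  | [cpu_sched_debug] enforce_run_list(): start",
    "25/07/2024 12:24:20 |  | [cpu_sched_debug] final job list:",
    "25/07/2024 12:24:20 |  | [cpu_sched_debug] enforce_run_list: end",
    "25/07/2024 12:25:20 |  | [cpu_sched_debug] schedule_cpus(): start",
]
for line in extract_cycle(sample):
    print(line)
```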

ProfileGuy
Joined: 9 Feb 08
Posts: 54
United Kingdom
Message 114340 - Posted: 25 Jul 2024, 12:13:45 UTC
Last modified: 25 Jul 2024, 12:54:20 UTC

 1                 | 25/07/2024 13:09:12 | [cpu_sched_debug] Request CPU reschedule: Core client configuration
 2                 | 25/07/2024 13:09:13 | [cpu_sched_debug] schedule_cpus(): start
 3  Einstein@Home  | 25/07/2024 13:09:13 | [cpu_sched_debug] domino prevention: mark p2030.20200319.G185.10-00.34.C.b6s0g0.00000_1272_0 as deadline miss
 4  Einstein@Home  | 25/07/2024 13:09:13 | [cpu_sched_debug] domino prevention: mark p2030.20200319.G185.36+00.11.C.b0s0g0.00000_664_0 as deadline miss
 5  Einstein@Home  | 25/07/2024 13:09:13 | [cpu_sched_debug] domino prevention: mark p2030.20200319.G199.24-00.53.S.b6s0g0.00000_3296_0 as deadline miss
 6  Einstein@Home  | 25/07/2024 13:09:13 | [cpu_sched_debug] domino prevention: mark p2030.20200319.G199.60+00.17.S.b4s0g0.00000_1744_0 as deadline miss
 7  Einstein@Home  | 25/07/2024 13:09:13 | [cpu_sched_debug] reserving 1.000000 of coproc NVIDIA
 8  Einstein@Home  | 25/07/2024 13:09:13 | [cpu_sched_debug] add to run list: h1_1551.60_O3aC01Cl1In0__O3ASHF1d_1552.00Hz_40741_0 (NVIDIA GPU, FIFO) (prio -2.468008)
 9  Einstein@Home  | 25/07/2024 13:09:13 | [cpu_sched_debug] add to run list: p2030.20200319.G185.10-00.34.C.b6s0g0.00000_1272_0 (CPU, EDF) (prio -2.570108)
10  Einstein@Home  | 25/07/2024 13:09:13 | [cpu_sched_debug] add to run list: p2030.20200319.G185.36+00.11.C.b0s0g0.00000_664_0 (CPU, EDF) (prio -2.570513)
11  Einstein@Home  | 25/07/2024 13:09:13 | [cpu_sched_debug] add to run list: p2030.20200319.G199.24-00.53.S.b6s0g0.00000_3296_0 (CPU, EDF) (prio -2.570918)
12  LHC@home       | 25/07/2024 13:09:13 | [cpu_sched_debug] add to run list: CMS_167286_1721823814.462743_0 (CPU, FIFO) (prio -0.021328)
13  LHC@home       | 25/07/2024 13:09:13 | [cpu_sched_debug] add to run list: Theory_2743-2829666-331_1 (CPU, FIFO) (prio -0.022409)
14  LHC@home       | 25/07/2024 13:09:13 | [cpu_sched_debug] add to run list: Theory_2743-2790498-325_2 (CPU, FIFO) (prio -0.022679)
15  LHC@home       | 25/07/2024 13:09:13 | [cpu_sched_debug] add to run list: Theory_2743-2790649-331_1 (CPU, FIFO) (prio -0.022949)
16                 | 25/07/2024 13:09:13 | [cpu_sched_debug] enforce_run_list(): start
17                 | 25/07/2024 13:09:13 | [cpu_sched_debug] preliminary job list:
18  Einstein@Home  | 25/07/2024 13:09:13 | [cpu_sched_debug] 0: h1_1551.60_O3aC01Cl1In0__O3ASHF1d_1552.00Hz_40741_0 (0.90 CPU + 1.00 NVIDIA GPU; MD: no; UTS: yes)
19  Einstein@Home  | 25/07/2024 13:09:13 | [cpu_sched_debug] 1: p2030.20200319.G185.10-00.34.C.b6s0g0.00000_1272_0 (1.00 CPU; MD: yes; UTS: no)
20  Einstein@Home  | 25/07/2024 13:09:13 | [cpu_sched_debug] 2: p2030.20200319.G185.36+00.11.C.b0s0g0.00000_664_0 (1.00 CPU; MD: yes; UTS: no)
21  Einstein@Home  | 25/07/2024 13:09:13 | [cpu_sched_debug] 3: p2030.20200319.G199.24-00.53.S.b6s0g0.00000_3296_0 (1.00 CPU; MD: yes; UTS: yes)
22  LHC@home       | 25/07/2024 13:09:13 | [cpu_sched_debug] 4: CMS_167286_1721823814.462743_0 (4.00 CPU; MD: no; UTS: no)
23  LHC@home       | 25/07/2024 13:09:13 | [cpu_sched_debug] 5: Theory_2743-2829666-331_1 (1.00 CPU; MD: no; UTS: no)
24  LHC@home       | 25/07/2024 13:09:13 | [cpu_sched_debug] 6: Theory_2743-2790498-325_2 (1.00 CPU; MD: no; UTS: no)
25  LHC@home       | 25/07/2024 13:09:13 | [cpu_sched_debug] 7: Theory_2743-2790649-331_1 (1.00 CPU; MD: no; UTS: no)
26                 | 25/07/2024 13:09:13 | [cpu_sched_debug] final job list:
27  Einstein@Home  | 25/07/2024 13:09:13 | [cpu_sched_debug] 0: p2030.20200319.G199.24-00.53.S.b6s0g0.00000_3296_0 (1.00 CPU; MD: yes; UTS: yes)
28  Einstein@Home  | 25/07/2024 13:09:13 | [cpu_sched_debug] 1: p2030.20200319.G185.10-00.34.C.b6s0g0.00000_1272_0 (1.00 CPU; MD: yes; UTS: no)
29  Einstein@Home  | 25/07/2024 13:09:13 | [cpu_sched_debug] 2: p2030.20200319.G185.36+00.11.C.b0s0g0.00000_664_0 (1.00 CPU; MD: yes; UTS: no)
30  Einstein@Home  | 25/07/2024 13:09:13 | [cpu_sched_debug] 3: h1_1551.60_O3aC01Cl1In0__O3ASHF1d_1552.00Hz_40741_0 (0.90 CPU + 1.00 NVIDIA GPU; MD: no; UTS: yes)
31  Einstein@Home  | 25/07/2024 13:09:13 | [cpu_sched_debug] 4: p2030.20200319.G199.60+00.17.S.b4s0g0.00000_1744_0 (1.00 CPU; MD: no; UTS: yes)
32  LHC@home       | 25/07/2024 13:09:13 | [cpu_sched_debug] 5: CMS_167286_1721823814.462743_0 (4.00 CPU; MD: no; UTS: no)
33  LHC@home       | 25/07/2024 13:09:13 | [cpu_sched_debug] 6: Theory_2743-2829666-331_1 (1.00 CPU; MD: no; UTS: no)
34  LHC@home       | 25/07/2024 13:09:13 | [cpu_sched_debug] 7: Theory_2743-2790498-325_2 (1.00 CPU; MD: no; UTS: no)
35  LHC@home       | 25/07/2024 13:09:13 | [cpu_sched_debug] 8: Theory_2743-2790649-331_1 (1.00 CPU; MD: no; UTS: no)
36  Einstein@Home  | 25/07/2024 13:09:13 | [cpu_sched_debug] scheduling p2030.20200319.G199.24-00.53.S.b6s0g0.00000_3296_0 (high priority)
37  Einstein@Home  | 25/07/2024 13:09:13 | [cpu_sched_debug] scheduling p2030.20200319.G185.10-00.34.C.b6s0g0.00000_1272_0 (high priority)
38  Einstein@Home  | 25/07/2024 13:09:13 | [cpu_sched_debug] scheduling p2030.20200319.G185.36+00.11.C.b0s0g0.00000_664_0 (high priority)
39  Einstein@Home  | 25/07/2024 13:09:13 | [cpu_sched_debug] scheduling h1_1551.60_O3aC01Cl1In0__O3ASHF1d_1552.00Hz_40741_0
40  Einstein@Home  | 25/07/2024 13:09:13 | [cpu_sched_debug] scheduling p2030.20200319.G199.60+00.17.S.b4s0g0.00000_1744_0
41  LHC@home       | 25/07/2024 13:09:13 | [cpu_sched_debug] scheduling CMS_167286_1721823814.462743_0
42  LHC@home       | 25/07/2024 13:09:13 | [cpu_sched_debug] all CPUs used (8.90 >= 6), skipping Theory_2743-2829666-331_1
43  LHC@home       | 25/07/2024 13:09:13 | [cpu_sched_debug] all CPUs used (8.90 >= 6), skipping Theory_2743-2790498-325_2
44  LHC@home       | 25/07/2024 13:09:13 | [cpu_sched_debug] all CPUs used (8.90 >= 6), skipping Theory_2743-2790649-331_1
45                 | 25/07/2024 13:09:13 | [cpu_sched_debug] enforce_run_list: end

ProfileGuy
Joined: 9 Feb 08
Posts: 54
United Kingdom
Message 114342 - Posted: 25 Jul 2024, 13:05:58 UTC

Current:
Einstein@home   Running                                 Binary Radio Pulsar Search (Arecibo,GBT,long) 1.33 (BRP4X64)
Einstein@home   Running                                 Binary Radio Pulsar Search (Arecibo,GBT,long) 1.33 (BRP4X64)
Einstein@home   Running                                 Binary Radio Pulsar Search (Arecibo,GBT,long) 1.33 (BRP4X64)
Einstein@home   Running                                 Binary Radio Pulsar Search (Arecibo,GBT,long) 1.33 (BRP4X64)
Einstein@home   Running (0.9 CPUs + 1 NVIDIA GPU)       All-Sky Gravitational Wave search on O3 1.07 (GW-opencl-nvidia-2)

LHC@home        Running (4 CPUs)                        CMS Simulation 70.30 (vbox64_mt_mcore_cms)
6 jobs -using- 8.9 CPUs.

Richard Haselgrove
Volunteer tester
Help desk expert
Joined: 5 Oct 06
Posts: 5139
United Kingdom
Message 114343 - Posted: 25 Jul 2024, 13:27:34 UTC - in response to Message 114340.  

Ah. Bingo!

Follow the p2030 tasks through the stages. They go through:

mark p2030... as deadline miss
add to run list: p2030... (CPU, EDF)
scheduling p2030... (high priority)

"deadline miss", "EDF" (earliest deadline first), and "high priority" are all synonyms for the same issue - you have too much work in your cache. These tasks have relatively short deadlines (7 days, from memory), and they may not be completed in time. BOINC deals with them as quickly as possible, and schedules them first - before applying other limits, like your chosen number of cores to run.

Reduce the number of days' work you request, and let them work through. You should see the number of cores in use reduce as BOINC feels the deadline pressure is reducing.
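
To spot these tasks without reading the whole log by eye, a few lines of Python will do. This is a sketch, assuming the "add to run list" message format shown in the debug output posted in this thread. The regular expression and the function name `deadline_pressure_tasks` are illustrative, not part of BOINC.

```python
import re

# Matches e.g. "add to run list: <task name> (CPU, EDF) (prio -2.57)"
RUN_LIST = re.compile(r"add to run list: (\S+) \((.+?), (EDF|FIFO)\)")

def deadline_pressure_tasks(log_lines):
    """Return names of tasks that were added to the run list in EDF mode."""
    names = []
    for line in log_lines:
        match = RUN_LIST.search(line)
        if match and match.group(3) == "EDF":
            names.append(match.group(1))
    return names

sample = [
    "[cpu_sched_debug] add to run list: h1_1551.60_O3aC01Cl1In0__O3ASHF1d_1552.00Hz_40741_0 (NVIDIA GPU, FIFO) (prio -2.468008)",
    "[cpu_sched_debug] add to run list: p2030.20200319.G185.10-00.34.C.b6s0g0.00000_1272_0 (CPU, EDF) (prio -2.570108)",
    "[cpu_sched_debug] add to run list: CMS_167286_1721823814.462743_0 (CPU, FIFO) (prio -0.021328)",
]
print(deadline_pressure_tasks(sample))
```

An empty result on a later cycle would suggest the deadline pressure has cleared.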

ProfileGuy
Joined: 9 Feb 08
Posts: 54
United Kingdom
Message 114344 - Posted: 25 Jul 2024, 14:14:53 UTC

OK - Thank you.

I adjusted "Options -> Computing preferences".

They were:

Store at least [2] days and up to an additional [1] days of work.

Changed to:

Store at least [1] days and up to an additional [1] days of work.

This produced:

6.9 CPUs OK
Einstein@home   Running                                 Binary Radio Pulsar Search (Arecibo,GBT,long) 1.33 (BRP4X64)
Einstein@home   Running (0.9 CPUs + 1 NVIDIA GPU)       All-Sky Gravitational Wave search on O3 1.07 (GW-opencl-nvidia-2)

LHC@home        Running                                 Theory Simulation 300.30 (vbox64_theory)
LHC@home        Running (4 CPUs)                        CMS Simulation 70.30 (vbox64_mt_mcore_cms)

and the "CPU Scheduler Debug" output:

1                   | 25/07/2024 14:41:17 | [cpu_sched_debug] Request CPU reschedule: Core client configuration
2                   | 25/07/2024 14:41:18 | [cpu_sched_debug] schedule_cpus(): start
3   Einstein@Home   | 25/07/2024 14:41:18 | [cpu_sched_debug] thrashing prevention: mark p2030.20200319.G199.60+00.17.S.b4s0g0.00000_1744_0 as deadline miss
4   Einstein@Home   | 25/07/2024 14:41:18 | [cpu_sched_debug] thrashing prevention: mark p2030.20200310.G200.41-00.24.C.b2s0g0.00000_1336_2 as deadline miss
5   Einstein@Home   | 25/07/2024 14:41:18 | [cpu_sched_debug] reserving 1.000000 of coproc NVIDIA
6   Einstein@Home   | 25/07/2024 14:41:18 | [cpu_sched_debug] add to run list: h1_1551.60_O3aC01Cl1In0__O3ASHF1d_1552.00Hz_40749_0 (NVIDIA GPU, FIFO) (prio -2.467961)
7   LHC@home        | 25/07/2024 14:41:18 | [cpu_sched_debug] add to run list: Theory_2743-2829666-331_1 (CPU, FIFO) (prio -0.021359)
8   LHC@home        | 25/07/2024 14:41:18 | [cpu_sched_debug] add to run list: CMS_167286_1721823814.462743_0 (CPU, FIFO) (prio -0.021629)
9   LHC@home        | 25/07/2024 14:41:18 | [cpu_sched_debug] add to run list: Theory_2743-2790498-325_2 (CPU, FIFO) (prio -0.022710)
10  LHC@home        | 25/07/2024 14:41:18 | [cpu_sched_debug] add to run list: Theory_2743-2790649-331_1 (CPU, FIFO) (prio -0.022980)
11  Einstein@Home   | 25/07/2024 14:41:18 | [cpu_sched_debug] add to run list: p2030.20200319.G199.60+00.17.S.b4s0g0.00000_1744_0 (CPU, FIFO) (prio -2.570062)
12  Einstein@Home   | 25/07/2024 14:41:18 | [cpu_sched_debug] add to run list: p2030.20200319.G185.10-00.34.C.b6s0g0.00000_1272_0 (CPU, FIFO) (prio -2.570467)
13  Einstein@Home   | 25/07/2024 14:41:18 | [cpu_sched_debug] add to run list: p2030.20200319.G185.36+00.11.C.b0s0g0.00000_664_0 (CPU, FIFO) (prio -2.570872)
14                  | 25/07/2024 14:41:18 | [cpu_sched_debug] enforce_run_list(): start
15                  | 25/07/2024 14:41:18 | [cpu_sched_debug] preliminary job list:
16  Einstein@Home   | 25/07/2024 14:41:18 | [cpu_sched_debug] 0: h1_1551.60_O3aC01Cl1In0__O3ASHF1d_1552.00Hz_40749_0 (0.90 CPU + 1.00 NVIDIA GPU; MD: no; UTS: yes)
17  LHC@home        | 25/07/2024 14:41:18 | [cpu_sched_debug] 1: Theory_2743-2829666-331_1 (1.00 CPU; MD: no; UTS: yes)
18  LHC@home        | 25/07/2024 14:41:18 | [cpu_sched_debug] 2: CMS_167286_1721823814.462743_0 (4.00 CPU; MD: no; UTS: no)
19  LHC@home        | 25/07/2024 14:41:18 | [cpu_sched_debug] 3: Theory_2743-2790498-325_2 (1.00 CPU; MD: no; UTS: no)
20  LHC@home        | 25/07/2024 14:41:18 | [cpu_sched_debug] 4: Theory_2743-2790649-331_1 (1.00 CPU; MD: no; UTS: no)
21  Einstein@Home   | 25/07/2024 14:41:18 | [cpu_sched_debug] 5: p2030.20200319.G199.60+00.17.S.b4s0g0.00000_1744_0 (1.00 CPU; MD: no; UTS: yes)
22  Einstein@Home   | 25/07/2024 14:41:18 | [cpu_sched_debug] 6: p2030.20200319.G185.10-00.34.C.b6s0g0.00000_1272_0 (1.00 CPU; MD: no; UTS: no)
23  Einstein@Home   | 25/07/2024 14:41:18 | [cpu_sched_debug] 7: p2030.20200319.G185.36+00.11.C.b0s0g0.00000_664_0 (1.00 CPU; MD: no; UTS: no)
24                  | 25/07/2024 14:41:18 | [cpu_sched_debug] final job list:
25  Einstein@Home   | 25/07/2024 14:41:18 | [cpu_sched_debug] 0: h1_1551.60_O3aC01Cl1In0__O3ASHF1d_1552.00Hz_40749_0 (0.90 CPU + 1.00 NVIDIA GPU; MD: no; UTS: yes)
26  LHC@home        | 25/07/2024 14:41:18 | [cpu_sched_debug] 1: Theory_2743-2829666-331_1 (1.00 CPU; MD: no; UTS: yes)
27  Einstein@Home   | 25/07/2024 14:41:18 | [cpu_sched_debug] 2: p2030.20200319.G199.60+00.17.S.b4s0g0.00000_1744_0 (1.00 CPU; MD: no; UTS: yes)
28  LHC@home        | 25/07/2024 14:41:18 | [cpu_sched_debug] 3: CMS_167286_1721823814.462743_0 (4.00 CPU; MD: no; UTS: no)
29  LHC@home        | 25/07/2024 14:41:18 | [cpu_sched_debug] 4: Theory_2743-2790498-325_2 (1.00 CPU; MD: no; UTS: no)
30  LHC@home        | 25/07/2024 14:41:18 | [cpu_sched_debug] 5: Theory_2743-2790649-331_1 (1.00 CPU; MD: no; UTS: no)
31  Einstein@Home   | 25/07/2024 14:41:18 | [cpu_sched_debug] 6: p2030.20200319.G185.10-00.34.C.b6s0g0.00000_1272_0 (1.00 CPU; MD: no; UTS: no)
32  Einstein@Home   | 25/07/2024 14:41:18 | [cpu_sched_debug] 7: p2030.20200319.G185.36+00.11.C.b0s0g0.00000_664_0 (1.00 CPU; MD: no; UTS: no)
33  Einstein@Home   | 25/07/2024 14:41:18 | [cpu_sched_debug] scheduling h1_1551.60_O3aC01Cl1In0__O3ASHF1d_1552.00Hz_40749_0
34  LHC@home        | 25/07/2024 14:41:18 | [cpu_sched_debug] scheduling Theory_2743-2829666-331_1
35  Einstein@Home   | 25/07/2024 14:41:18 | [cpu_sched_debug] scheduling p2030.20200319.G199.60+00.17.S.b4s0g0.00000_1744_0
36  LHC@home        | 25/07/2024 14:41:18 | [cpu_sched_debug] scheduling CMS_167286_1721823814.462743_0
37  LHC@home        | 25/07/2024 14:41:18 | [cpu_sched_debug] all CPUs used (6.90 >= 6), skipping Theory_2743-2790498-325_2
38  LHC@home        | 25/07/2024 14:41:18 | [cpu_sched_debug] all CPUs used (6.90 >= 6), skipping Theory_2743-2790649-331_1
39  Einstein@Home   | 25/07/2024 14:41:18 | [cpu_sched_debug] all CPUs used (6.90 >= 6), skipping p2030.20200319.G185.10-00.34.C.b6s0g0.00000_1272_0
40  Einstein@Home   | 25/07/2024 14:41:18 | [cpu_sched_debug] all CPUs used (6.90 >= 6), skipping p2030.20200319.G185.36+00.11.C.b0s0g0.00000_664_0
41                  | 25/07/2024 14:41:18 | [cpu_sched_debug] enforce_run_list: end
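
The "all CPUs used (x >= 6), skipping" lines suggest a simple rule: jobs are started in final-list order until the CPUs already committed reach the limit. The Python sketch below is a simplified model inferred from those messages alone, not BOINC's actual scheduler, and the function name `enforce` and the shortened task labels are made up. It does reproduce the totals from both debug cycles posted in this thread.

```python
def enforce(job_list, cpu_limit):
    """Start jobs in order until the CPUs already committed reach the limit."""
    used = 0.0
    running, skipped = [], []
    for name, cpus in job_list:
        if used >= cpu_limit:
            skipped.append(name)
        else:
            running.append(name)
            used += cpus
    return used, running, skipped

# Final job list from the 13:09 cycle (three p2030 tasks in high priority first).
cycle_1 = [("p2030_3296", 1.0), ("p2030_1272", 1.0), ("p2030_664", 1.0),
           ("h1_GPU", 0.9), ("p2030_1744", 1.0), ("CMS", 4.0),
           ("Theory_a", 1.0), ("Theory_b", 1.0), ("Theory_c", 1.0)]

# Final job list from the 14:41 cycle (no high-priority tasks).
cycle_2 = [("h1_GPU", 0.9), ("Theory_a", 1.0), ("p2030_1744", 1.0),
           ("CMS", 4.0), ("Theory_b", 1.0), ("Theory_c", 1.0),
           ("p2030_1272", 1.0), ("p2030_664", 1.0)]

for label, jobs in (("13:09", cycle_1), ("14:41", cycle_2)):
    used, running, skipped = enforce(jobs, cpu_limit=6)
    print(f"{label}: {used:.2f} CPUs committed, {len(skipped)} jobs skipped")
```

In this model the 4-CPU CMS task is started whenever fewer than 6 CPUs are committed, so how far the total overshoots depends on how many 1-CPU tasks sit ahead of it in the list.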

ProfileGuy
Joined: 9 Feb 08
Posts: 54
United Kingdom
Message 114346 - Posted: 25 Jul 2024, 21:15:09 UTC

And, after a few hours with "Won't get new tasks" for both projects:

6.9 CPUs
Einstein@home   Running                                 Binary Radio Pulsar Search (Arecibo,GBT,long) 1.33 (BRP4X64)
Einstein@home   Running                                 Binary Radio Pulsar Search (Arecibo,GBT,long) 1.33 (BRP4X64)
Einstein@home   Running (0.9 CPUs + 1 NVIDIA GPU)       All-Sky Gravitational Wave search on O3 1.07 (GW-opencl-nvidia-2)

LHC@home        Running (4 CPUs)                        CMS Simulation 70.30 (vbox64_mt_mcore_cms)


And the "CPU Scheduler Debug" output:
                | 25/07/2024 21:56:16 | [cpu_sched_debug] Request CPU reschedule: Core client configuration
                | 25/07/2024 21:56:17 | [cpu_sched_debug] schedule_cpus(): start
Einstein@Home   | 25/07/2024 21:56:17 | [cpu_sched_debug] reserving 1.000000 of coproc NVIDIA
Einstein@Home   | 25/07/2024 21:56:17 | [cpu_sched_debug] add to run list: h1_1553.60_O3aC01Cl1In0__O3ASHF1d_1554.00Hz_56530_0 (NVIDIA GPU, FIFO) (prio -2.467535)
LHC@home        | 25/07/2024 21:56:17 | [cpu_sched_debug] add to run list: CMS_167286_1721823814.462743_0 (CPU, FIFO) (prio -0.021643)
Einstein@Home   | 25/07/2024 21:56:17 | [cpu_sched_debug] add to run list: p2030.20200319.G185.36+00.11.C.b0s0g0.00000_664_0 (CPU, FIFO) (prio -2.569635)
Einstein@Home   | 25/07/2024 21:56:17 | [cpu_sched_debug] add to run list: p2030.20200319.G199.24-00.53.S.b6s0g0.00000_3296_0 (CPU, FIFO) (prio -2.570041)
Einstein@Home   | 25/07/2024 21:56:17 | [cpu_sched_debug] add to run list: p2030.20200319.G199.60+00.17.S.b4s0g0.00000_1744_0 (CPU, FIFO) (prio -2.570446)
Einstein@Home   | 25/07/2024 21:56:17 | [cpu_sched_debug] add to run list: p2030.20200310.G200.41-00.24.C.b2s0g0.00000_1336_2 (CPU, FIFO) (prio -2.570851)
Einstein@Home   | 25/07/2024 21:56:17 | [cpu_sched_debug] add to run list: LATeah2102F_1048.0_291588_0.0_0 (CPU, FIFO) (prio -2.571256)
Einstein@Home   | 25/07/2024 21:56:17 | [cpu_sched_debug] add to run list: LATeah2102F_1048.0_291544_0.0_0 (CPU, FIFO) (prio -2.571661)
                | 25/07/2024 21:56:17 | [cpu_sched_debug] enforce_run_list(): start
                | 25/07/2024 21:56:17 | [cpu_sched_debug] preliminary job list:
Einstein@Home   | 25/07/2024 21:56:17 | [cpu_sched_debug] 0: h1_1553.60_O3aC01Cl1In0__O3ASHF1d_1554.00Hz_56530_0 (0.90 CPU + 1.00 NVIDIA GPU; MD: no; UTS: yes)
LHC@home        | 25/07/2024 21:56:17 | [cpu_sched_debug] 1: CMS_167286_1721823814.462743_0 (4.00 CPU; MD: no; UTS: no)
Einstein@Home   | 25/07/2024 21:56:17 | [cpu_sched_debug] 2: p2030.20200319.G185.36+00.11.C.b0s0g0.00000_664_0 (1.00 CPU; MD: no; UTS: no)
Einstein@Home   | 25/07/2024 21:56:17 | [cpu_sched_debug] 3: p2030.20200319.G199.24-00.53.S.b6s0g0.00000_3296_0 (1.00 CPU; MD: no; UTS: yes)
Einstein@Home   | 25/07/2024 21:56:17 | [cpu_sched_debug] 4: p2030.20200319.G199.60+00.17.S.b4s0g0.00000_1744_0 (1.00 CPU; MD: no; UTS: no)
Einstein@Home   | 25/07/2024 21:56:17 | [cpu_sched_debug] 5: p2030.20200310.G200.41-00.24.C.b2s0g0.00000_1336_2 (1.00 CPU; MD: no; UTS: no)
Einstein@Home   | 25/07/2024 21:56:17 | [cpu_sched_debug] 6: LATeah2102F_1048.0_291588_0.0_0 (1.00 CPU; MD: no; UTS: no)
Einstein@Home   | 25/07/2024 21:56:17 | [cpu_sched_debug] 7: LATeah2102F_1048.0_291544_0.0_0 (1.00 CPU; MD: no; UTS: no)
                | 25/07/2024 21:56:17 | [cpu_sched_debug] final job list:
Einstein@Home   | 25/07/2024 21:56:17 | [cpu_sched_debug] 0: h1_1553.60_O3aC01Cl1In0__O3ASHF1d_1554.00Hz_56530_0 (0.90 CPU + 1.00 NVIDIA GPU; MD: no; UTS: yes)
Einstein@Home   | 25/07/2024 21:56:17 | [cpu_sched_debug] 1: p2030.20200319.G199.24-00.53.S.b6s0g0.00000_3296_0 (1.00 CPU; MD: no; UTS: yes)
LHC@home        | 25/07/2024 21:56:17 | [cpu_sched_debug] 2: CMS_167286_1721823814.462743_0 (4.00 CPU; MD: no; UTS: no)
Einstein@Home   | 25/07/2024 21:56:17 | [cpu_sched_debug] 3: p2030.20200319.G185.36+00.11.C.b0s0g0.00000_664_0 (1.00 CPU; MD: no; UTS: no)
Einstein@Home   | 25/07/2024 21:56:17 | [cpu_sched_debug] 4: p2030.20200319.G199.60+00.17.S.b4s0g0.00000_1744_0 (1.00 CPU; MD: no; UTS: no)
Einstein@Home   | 25/07/2024 21:56:17 | [cpu_sched_debug] 5: p2030.20200310.G200.41-00.24.C.b2s0g0.00000_1336_2 (1.00 CPU; MD: no; UTS: no)
Einstein@Home   | 25/07/2024 21:56:17 | [cpu_sched_debug] 6: LATeah2102F_1048.0_291588_0.0_0 (1.00 CPU; MD: no; UTS: no)
Einstein@Home   | 25/07/2024 21:56:17 | [cpu_sched_debug] 7: LATeah2102F_1048.0_291544_0.0_0 (1.00 CPU; MD: no; UTS: no)
Einstein@Home   | 25/07/2024 21:56:17 | [cpu_sched_debug] scheduling h1_1553.60_O3aC01Cl1In0__O3ASHF1d_1554.00Hz_56530_0
Einstein@Home   | 25/07/2024 21:56:17 | [cpu_sched_debug] scheduling p2030.20200319.G199.24-00.53.S.b6s0g0.00000_3296_0
LHC@home        | 25/07/2024 21:56:17 | [cpu_sched_debug] scheduling CMS_167286_1721823814.462743_0
Einstein@Home   | 25/07/2024 21:56:17 | [cpu_sched_debug] scheduling p2030.20200319.G185.36+00.11.C.b0s0g0.00000_664_0
Einstein@Home   | 25/07/2024 21:56:17 | [cpu_sched_debug] all CPUs used (6.90 >= 6), skipping p2030.20200319.G199.60+00.17.S.b4s0g0.00000_1744_0
Einstein@Home   | 25/07/2024 21:56:17 | [cpu_sched_debug] all CPUs used (6.90 >= 6), skipping p2030.20200310.G200.41-00.24.C.b2s0g0.00000_1336_2
Einstein@Home   | 25/07/2024 21:56:17 | [cpu_sched_debug] all CPUs used (6.90 >= 6), skipping LATeah2102F_1048.0_291588_0.0_0
Einstein@Home   | 25/07/2024 21:56:17 | [cpu_sched_debug] all CPUs used (6.90 >= 6), skipping LATeah2102F_1048.0_291544_0.0_0
                | 25/07/2024 21:56:17 | [cpu_sched_debug] enforce_run_list: end
ID: 114346
Richard Haselgrove
Volunteer tester
Help desk expert
Joined: 5 Oct 06
Posts: 5139
United Kingdom
Message 114347 - Posted: 26 Jul 2024, 11:20:51 UTC - in response to Message 114346.  

I take it from that last show that BOINC is currently running as you would expect from your current settings? I'm sure you'll keep an eye on it, but we're probably done for now.

I'll just pass on one final thought - it involves the fractional CPU usage shown for the Einstein Gravitational Wave app.

GPU hardware and GPU applications are very variable. Actual observed CPU usage can range from minuscule (one or two percent) to 100% - BOINC is very bad at detecting and reacting to the extremes. In particular, the combination of

Windows OS + NVidia hardware + OpenCL programming language

responds best when a full CPU core is available for it to use - and the GW app falls into that trap. Einstein have deployed it with a 90% CPU setting, but that's not really enough: with a fractional setting, no matter how high, BOINC allows another CPU task to run.

You can upgrade the GW app to request a full CPU core by using another app_config file. Use the 'Application' settings, and a <cpu_usage> of 1. As before, ask if you need help.
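
The arithmetic behind "all CPUs used (6.90 >= 6)" can be sketched as follows. This is a simplified model inferred from the debug log above, not the BOINC client's actual code: it assumes a job is started whenever the CPUs already committed are below the limit, with the check made before the job's own usage is added. The function and job names are hypothetical, and the real scheduler applies further checks (memory, GPU availability and so on) that are ignored here.

```python
def enforce_run_list(jobs, ncpus):
    """Greedy pass over an ordered job list (simplified, assumed model).

    A job starts while the committed CPU count is still below ncpus;
    the comparison is made before the job's own usage is added.
    """
    used = 0.0
    running, skipped = [], []
    for name, cpus in jobs:
        if used >= ncpus:
            skipped.append(name)
        else:
            running.append(name)
            used += cpus
    return running, skipped, used


def job_list(gw_cpus):
    """First five entries of the 21:56:17 final job list (names shortened)."""
    return [
        ("GW O3AS (GPU)", gw_cpus),
        ("BRP4 _3296_0", 1.0),
        ("CMS", 4.0),
        ("BRP4 _664_0", 1.0),
        ("BRP4 _1744_0", 1.0),
    ]


# Compare the project default (0.9 CPU) with a full core for the GW task.
for gw_cpus in (0.9, 1.0):
    running, skipped, used = enforce_run_list(job_list(gw_cpus), ncpus=6)
    print(f"GW at {gw_cpus} CPU: {len(running)} running, {used:.2f} CPUs committed")
```

With the GW task at 0.9 CPU the total after three jobs is 5.90, which is still below 6, so a fourth job starts and the total reaches 6.90. With the GW task at a full core the total hits 6.00 after three jobs and nothing further is started.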
ID: 114347
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15585
Netherlands
Message 114348 - Posted: 26 Jul 2024, 12:31:27 UTC - in response to Message 114343.  

Great, new terminology.

"domino prevention"
"trashing prevention"

What's the difference between the two?
ID: 114348
Guy
Joined: 9 Feb 08
Posts: 54
United Kingdom
Message 114350 - Posted: 26 Jul 2024, 15:01:02 UTC
Last modified: 26 Jul 2024, 15:27:03 UTC

Thanks, that's great.

So with just

<app_config>
  <app>
    <name>einstein_O3AS</name>
    <max_concurrent>1</max_concurrent>
    <cpu_usage>1</cpu_usage>
  </app>
</app_config>

I still got

Einstein@home | Running (0.9 CPUs + 1 NVIDIA GPU)     | All-Sky Gravitational Wave search on O3 1.07 (GW-opencl-nvidia-2)     | h1_1551.60_O3aC01Cl1In0__O3ASHF1d_1552.00Hz_5839_2

So I expanded the

app_config.xml for Einstein@home:
<app_config>

  <!-- All-Sky Gravitational Wave search on O3 1.07 (GW-opencl-nvidia-2) -->
  <!-- (0.9 CPUs + 1 NVIDIA GPU) default -->
  <!-- make it use 1 CPUs -->
  <app>
    <name>einstein_O3AS</name>
    <max_concurrent>1</max_concurrent>
    <cpu_usage>1</cpu_usage>
  </app>
  <app_version>
    <app_name>einstein_O3AS</app_name>
    <plan_class>GW-opencl-nvidia-2</plan_class>
    <avg_ncpus>1</avg_ncpus>
    <ngpus>1</ngpus>
    <cmdline>--nthreads 1</cmdline>
  </app_version>

</app_config>


"I think" I momentarily got

Einstein@home | Running (1 CPUs + 1 NVIDIA GPU)             | All-Sky Gravitational Wave search on O3 1.07 (GW-opencl-nvidia-2) | h1_1551.60_O3aC01Cl1In0__O3ASHF1d_1552.00Hz_5839_2

before it was kicked -

Einstein@home | Computation error (0.9 CPUs + 1 NVIDIA GPU) | All-Sky Gravitational Wave search on O3 1.07 (GW-opencl-nvidia-2) | h1_1551.60_O3aC01Cl1In0__O3ASHF1d_1552.00Hz_5839_2

Event log:
Einstein@Home | 26/07/2024 15:05:04 | project resumed by user
Einstein@Home | 26/07/2024 15:05:07 | Computation for task h1_1551.60_O3aC01Cl1In0__O3ASHF1d_1552.00Hz_5839_2 finished
Einstein@Home | 26/07/2024 15:05:07 | Output file h1_1551.60_O3aC01Cl1In0__O3ASHF1d_1552.00Hz_5839_2_1 for task h1_1551.60_O3aC01Cl1In0__O3ASHF1d_1552.00Hz_5839_2 absent

Sorry about that - I should have been quicker to observe.

And just by chance there were no more GPU apps in the queue.

So - "Allow new tasks" got me -

Einstein@home   Running                                 Binary Radio Pulsar Search (Arecibo,GBT,long) 1.33 (BRP4X64)            p2030.20200310.G200.41-00.24.C.b2s0g0.00000_1336_2
Einstein@home   Running                                 Gamma-ray pulsar search #5 1.08 (FGRPSSE)                               LATeah2102F_1048.0_291588_0.0_0

Einstein@home   Running (0.2 CPUs + 0.33 NVIDIA GPUs)   Binary Radio Pulsar Search (MeerKAT) 0.12 (BRP7-cuda55)                 Ter5_1_dns_cfbf00021_segment_6_dms_200_40000_156_7250000_1
Einstein@home   Running (0.2 CPUs + 0.33 NVIDIA GPUs)   Binary Radio Pulsar Search (MeerKAT) 0.12 (BRP7-cuda55)                 Ter5_1_dns_cfbf00021_segment_6_dms_200_40000_156_7400000_0
Einstein@home   Running (0.2 CPUs + 0.33 NVIDIA GPUs)   Binary Radio Pulsar Search (MeerKAT) 0.12 (BRP7-cuda55)                 Ter5_1_dns_cfbf00021_segment_6_dms_200_40000_156_6950000_1

LHC@home        Running (4 CPUs)                        CMS Simulation 70.30 (vbox64_mt_mcore_cms)

.... LOL
ID: 114350
Richard Haselgrove
Volunteer tester
Help desk expert
Joined: 5 Oct 06
Posts: 5139
United Kingdom
Message 114353 - Posted: 26 Jul 2024, 15:37:01 UTC - in response to Message 114350.  
Last modified: 26 Jul 2024, 15:38:33 UTC

Always keep the User Manual open beside you while you're working, and refer to it often...

I make that

<app_config>
   <app>
      <name>einstein_O3AS</name>
      <gpu_versions>
         <gpu_usage>1</gpu_usage>
         <cpu_usage>1</cpu_usage>
      </gpu_versions>
   </app>
</app_config>

The sub-entries inside <gpu_versions> aren't optional.
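
A quick way to catch a malformed app_config.xml before restarting the client is to parse it first. The sketch below uses Python's standard xml.etree.ElementTree; the function name check_app_config and the specific checks are illustrative assumptions, not a BOINC tool. It flags XML that is not well-formed, a <gpu_versions> block missing either sub-entry, and a <cpu_usage> placed directly under <app>, which had no effect in the attempt earlier in this thread.

```python
import xml.etree.ElementTree as ET


def check_app_config(text):
    """Return a list of problems found in an app_config.xml string."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError as err:
        # Covers unclosed or mismatched tags, e.g. a missing </app_config>.
        return [f"not well-formed XML: {err}"]

    problems = []
    if root.tag != "app_config":
        problems.append(f"root element is <{root.tag}>, expected <app_config>")

    for app in root.findall("app"):
        name = app.findtext("name", default="(unnamed)")
        # Assumption based on this thread: <cpu_usage> only takes effect
        # inside <gpu_versions>, not as a direct child of <app>.
        if app.find("cpu_usage") is not None:
            problems.append(f"{name}: <cpu_usage> directly under <app>")
        gpu = app.find("gpu_versions")
        if gpu is not None:
            for tag in ("gpu_usage", "cpu_usage"):
                if gpu.find(tag) is None:
                    problems.append(f"{name}: <gpu_versions> lacks <{tag}>")
    return problems


example = """
<app_config>
   <app>
      <name>einstein_O3AS</name>
      <gpu_versions>
         <gpu_usage>1</gpu_usage>
         <cpu_usage>1</cpu_usage>
      </gpu_versions>
   </app>
</app_config>
"""
print(check_app_config(example) or "no problems found")
```

An empty list means only that the file passed these few checks; the client's own event log at startup remains the authority on whether the file was accepted.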
ID: 114353
Guy
Joined: 9 Feb 08
Posts: 54
United Kingdom
Message 114354 - Posted: 26 Jul 2024, 16:21:09 UTC

OK
MeerKATs aside...

With the app_config.xml I posted,
I just got some GPU tasks and they look like this -

Einstein@home   Ready to start (1 CPUs + 1 NVIDIA GPU)        All-Sky Gravitational Wave search on O3 1.07 (GW-opencl-nvidia-2)

Out of curiosity I'll let it run a few before trying your version.
The small litter of MeerKATs have yet to finish...
ID: 114354

Copyright © 2025 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.