Thread 'Multi core tasks alongside single core tasks.'

Dave
Help desk expert
Joined: 28 Jun 10
Posts: 2636
United Kingdom
Message 111880 - Posted: 27 May 2023, 14:34:21 UTC
Last modified: 27 May 2023, 14:52:19 UTC

I was running one old resend from CPDN and decided to allow some Amicable Numbers (AN) tasks to run as well. I downloaded a few, one of which started on 7 cores. I increased the percentage of cores to be used to 50 to allow the CPDN task to keep going. It wouldn't start until I had paused all the AN tasks; then I was able to un-pause them and it continued to run.

1. Is this a known behaviour?
2. Is there an option to limit the number of cores multi core tasks will use?

Edit: Even after I increased the number of cores available to BOINC to 60%, the CPDN task stops and shows as "Waiting to run" after a minute or two of the AN 7-core task running alongside it. While waiting to hear views on this, I will nip over to GitHub and see if there is a bug filed for this. As the CPDN task is a priority for me, I have aborted the AN tasks, since they wouldn't restart before the deadline if paused to let the CPDN one complete.

Edit 2: Issue opened on GitHub: #5254
ID: 111880
Dave
Help desk expert
Joined: 28 Jun 10
Posts: 2636
United Kingdom
Message 111881 - Posted: 27 May 2023, 15:13:49 UTC - in response to Message 111880.  

AenBleidd commented May 27, 2023

This is the expected default behavior.
If you want to limit number of cores for multithreaded tasks to let BOINC run it alongside with singlethreaded tasks, you should configure every particular project as described here:
https://boinc.berkeley.edu/forum_thread.php?id=13170


The bit I don't understand is that if I increase the cores available after the multicore task has started, the CPDN one still stops within two minutes after an Amicable Numbers one is restarted.
ID: 111881
Richard Haselgrove
Volunteer tester
Help desk expert
Joined: 5 Oct 06
Posts: 5121
United Kingdom
Message 111882 - Posted: 27 May 2023, 16:33:19 UTC - in response to Message 111881.  
Last modified: 27 May 2023, 16:46:08 UTC

I got involved at github because Vitalii's response includes a suggested resolution from me, and we need to do some proper testing on the latest code (Dave's running v7.23.0 self-build from master).

I've joined Amicable, and found I could set the number of CPUs to use in Project Preferences, as well as the workaround I suggested. I chose 3, and got a task allocated with:

    <avg_ncpus>3.000000</avg_ncpus>
    <plan_class>mt</plan_class>
    <cmdline>--nthreads 3</cmdline>
so that bit's working as documented. I'll monitor what it gets up to when running.

But this is with a pretty well-tested v7.20.5 from the PPA. Looks like I might have to do some building, too.

Edit: And now my 6-core machine is running 3 cores for Amicable, 1 single-core CPU task, and 2 GPU tasks with a full core assigned to each (ugh - OpenCL). That's what I expect and want, but it took a while for BOINC to respond and pause the other two single-core tasks, so for a while I was running 8 cores.
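
For anyone who would rather pin the thread count locally than through Project Preferences, a per-project app_config.xml is one way to do it. Below is a minimal sketch in Python that writes out such a file's contents. The values 3, mt and --nthreads mirror the task fields shown above; the app name amicable_10_21 is an assumption taken from the event log lines elsewhere in this thread, so check client_state.xml for the real name before using it. The helper build_app_config is a made-up name for illustration, not part of BOINC.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, plan_class: str, ncpus: int) -> str:
    """Return app_config.xml text limiting one app version to `ncpus` threads."""
    root = ET.Element("app_config")
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    ET.SubElement(version, "plan_class").text = plan_class
    # avg_ncpus tells the client how many CPUs to budget for the task;
    # cmdline passes the matching thread count to the application itself.
    ET.SubElement(version, "avg_ncpus").text = str(ncpus)
    ET.SubElement(version, "cmdline").text = f"--nthreads {ncpus}"
    ET.indent(root)  # pretty-print; needs Python 3.9 or later
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # App name is an assumption; verify it against client_state.xml.
    print(build_app_config("amicable_10_21", "mt", 3))
```

The output would be saved as app_config.xml in that project's folder under the BOINC data directory, and the client asked to re-read its config files. Whether this behaves any differently from the Project Preferences setting on the build under test is exactly what has not been checked yet.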
ID: 111882
Dave
Help desk expert
Joined: 28 Jun 10
Posts: 2636
United Kingdom
Message 111883 - Posted: 27 May 2023, 17:49:11 UTC - in response to Message 111882.  

First test: if I cut down the number of CPUs BOINC can use, the CPDN single-core task keeps running. When the downloaded tasks, which are now set at 4 threads, are finished, I will increase to 5 and try again. So this seems like another anomaly: the AN task runs using 4 cores and the CPDN one uses one, making 5 in total, so 5/16 rather than the 1/4 which is what I set in BOINC. I have not currently got an app_config for any projects.
ID: 111883
Richard Haselgrove
Volunteer tester
Help desk expert
Joined: 5 Oct 06
Posts: 5121
United Kingdom
Message 111884 - Posted: 27 May 2023, 18:36:02 UTC

OK, I've pulled down the artifacts for #5251 - 2 days old, so probably pretty close to yours. I'll take them for a spin tomorrow.
ID: 111884
Dave
Help desk expert
Joined: 28 Jun 10
Posts: 2636
United Kingdom
Message 111885 - Posted: 27 May 2023, 20:08:32 UTC - in response to Message 111884.  

OK, I've pulled down the artifacts for #5251 - 2 days old, so probably pretty close to yours. I'll take them for a spin tomorrow.


14th May was the date for my master.
ID: 111885
Richard Haselgrove
Volunteer tester
Help desk expert
Joined: 5 Oct 06
Posts: 5121
United Kingdom
Message 111886 - Posted: 27 May 2023, 20:41:48 UTC - in response to Message 111885.  

v7.22/23 is all about a whole new set of global preferences, separate ones for 'not in use'. #5251 was a late afterthought by David, who thought a late and undocumented change - which he couldn't remember why he put in - might lead to "Otherwise BOINC will stop computing after 60 minutes of idleness."

That would be stopping everything, not just one project, but it's not a million miles from the question you're asking. That's why I want to go on poking and prodding until we understand exactly what's going on, and why.
ID: 111886
Dave
Help desk expert
Joined: 28 Jun 10
Posts: 2636
United Kingdom
Message 111887 - Posted: 27 May 2023, 20:46:26 UTC - in response to Message 111886.  

That would be stopping everything, not just one project, but it's not a million miles from the question you're asking. That's why I want to go on poking and prodding until we understand exactly what's going on, and why.


I shall try and get a more precise summary of the issue here. Most of the time I don't do the sort of thing that has alerted me to the issue, so it's not a big issue for me, but I can see it might be for some, especially if they want their system to crunch 24/7 without any intervention. I look at what is happening on this machine at least two or three times most days, so I tend to notice quite quickly if something is not behaving as expected.
ID: 111887
Dave
Help desk expert
Joined: 28 Jun 10
Posts: 2636
United Kingdom
Message 111889 - Posted: 28 May 2023, 7:04:00 UTC
Last modified: 28 May 2023, 7:05:55 UTC

May have something. I increased %CPUs to 40 and downloaded a bunch of AN tasks. I got rid of the NVIDIA ones, which always crash on my machine and would otherwise run alongside the multicore tasks until they crashed. The CPDN task didn't start. I increased CPU% to 50, i.e. 8 cores. It still didn't start.
Extract from event log.

Sun 28 May 2023 07:53:29 BST |  | [cpu_sched_debug] Request CPU reschedule: periodic CPU scheduling
Sun 28 May 2023 07:53:29 BST |  | [cpu_sched_debug] schedule_cpus(): start
Sun 28 May 2023 07:53:29 BST | climateprediction.net | [cpu_sched_debug] add to run list: hadam4h_a05m_200011_5_932_012141214_1 (CPU, FIFO) (prio -0.175154)
Sun 28 May 2023 07:53:29 BST |  | [cpu_sched_debug] enforce_run_list(): start
Sun 28 May 2023 07:53:29 BST |  | [cpu_sched_debug] preliminary job list:
Sun 28 May 2023 07:53:29 BST | climateprediction.net | [cpu_sched_debug] 0: hadam4h_a05m_200011_5_932_012141214_1 (MD: no; UTS: no)
Sun 28 May 2023 07:53:29 BST |  | [cpu_sched_debug] final job list:
Sun 28 May 2023 07:53:29 BST | climateprediction.net | [cpu_sched_debug] 1: hadam4h_a05m_200011_5_932_012141214_1 (MD: no; UTS: no)
Sun 28 May 2023 07:53:29 BST | climateprediction.net | [cpu_sched_debug] all CPUs used (6.00 >= 6), skipping hadam4h_a05m_200011_5_932_012141214_1
Sun 28 May 2023 07:53:29 BST |  | [cpu_sched_debug] enforce_run_list: end


It seems like the client hasn't increased the number of CPUs in use. Could this line be the smoking gun?
Sun 28 May 2023 07:53:29 BST | climateprediction.net | [cpu_sched_debug] all CPUs used (6.00 >= 6), skipping hadam4h_a05m_200011_5_932_012141214_1
I aborted the AN tasks in the queue to see if it was to do with task priority, but again, once I had stopped the Amicable task the CPDN one restarted, then stopped again once the AN task started. There is a risk that I might lose the CPDN task, but given that it is almost a year old and so probably not going to be used, I was wondering about stopping the client and restarting it to see if that makes any difference. Is there anything else I should look at first?
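
For picking these lines out of a long event log, a small filter can help. This is a rough sketch, assuming the log has been saved as plain text in the same layout as the extract above; the function name find_skips and the sample list are made up for illustration.

```python
import re

# Matches scheduler lines such as:
#   ... [cpu_sched_debug] all CPUs used (6.00 >= 6), skipping <task name>
SKIP_RE = re.compile(
    r"\[cpu_sched_debug\] all CPUs used "
    r"\((?P<used>[\d.]+) >= (?P<limit>\d+)\), skipping (?P<task>\S+)"
)


def find_skips(lines):
    """Yield (task, cpus_used, cpu_limit) for every 'all CPUs used' line."""
    for line in lines:
        match = SKIP_RE.search(line)
        if match:
            yield match["task"], float(match["used"]), int(match["limit"])


# Two lines copied from the extract above; only the second should match.
sample = [
    "Sun 28 May 2023 07:53:29 BST |  | [cpu_sched_debug] final job list:",
    "Sun 28 May 2023 07:53:29 BST | climateprediction.net | [cpu_sched_debug] "
    "all CPUs used (6.00 >= 6), skipping hadam4h_a05m_200011_5_932_012141214_1",
]
skips = list(find_skips(sample))
for task, used, limit in skips:
    print(f"{task}: {used:.2f} CPUs in use against a limit of {limit}")
```

The limit value is the interesting part: if it stays at 6 after the preference has been raised, the scheduler is still working from the old number.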
ID: 111889
Richard Haselgrove
Volunteer tester
Help desk expert
Joined: 5 Oct 06
Posts: 5121
United Kingdom
Message 111890 - Posted: 28 May 2023, 7:33:58 UTC - in response to Message 111889.  

Can you show us the bit of the log that says what those 6 cpus are busy with, please? Is that your current Amicable ncpus setting?
ID: 111890
Dave
Help desk expert
Joined: 28 Jun 10
Posts: 2636
United Kingdom
Message 111891 - Posted: 28 May 2023, 8:11:37 UTC - in response to Message 111890.  

Can you show us the bit of the log that says what those 6 cpus are busy with, please? Is that your current Amicable ncpus setting?


Will scroll back and find it in a minute. First:

Sun 28 May 2023 09:08:30 BST |  | [cpu_sched_debug] final job list:
Sun 28 May 2023 09:08:30 BST | climateprediction.net | [cpu_sched_debug] 0: hadam4h_a05m_200011_5_932_012141214_1 (MD: no; UTS: yes)
Sun 28 May 2023 09:08:30 BST | climateprediction.net | [cpu_sched_debug] scheduling hadam4h_a05m_200011_5_932_012141214_1
Sun 28 May 2023 09:08:30 BST |  | [cpu_sched_debug] using 1.00 out of 6 CPUs
Sun 28 May 2023 09:08:30 BST | climateprediction.net | [css] running hadam4h_a05m_200011_5_932_012141214_1 ( )
Sun 28 May 2023 09:08:30 BST |  | [cpu_sched_debug] enforce_run_list: end

Now the Amicable Numbers task is finished. As you can see, the event log is showing
Sun 28 May 2023 09:08:30 BST |  | [cpu_sched_debug] using 1.00 out of 6 CPUs
despite the Manager showing 8 as being in use.
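
To put numbers on the 6 versus 8 mismatch, here is a rough sketch of the percentage arithmetic on this 16-thread machine. The rounding rule (truncate, with a floor of 1) is an assumption on my part and has not been checked against the client source; usable_cpus is a made-up helper name.

```python
def usable_cpus(total_cpus: int, max_ncpus_pct: float) -> int:
    """Assumed rule: truncate total * pct / 100, but never go below 1."""
    return max(1, int(total_cpus * max_ncpus_pct / 100))


# Percentages tried in this thread, on a host reporting 16 CPUs.
for pct in (25, 40, 50, 60):
    print(f"{pct}% of 16 -> {usable_cpus(16, pct)} CPUs")
```

Under that assumption 40% gives 6 and 50% gives 8, so a limit of 6 in the log would be consistent with the scheduler still using the earlier 40% setting while the Manager shows the new 50% value. That is only a reading of the numbers, not a diagnosis.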
ID: 111891
Dave
Help desk expert
Joined: 28 Jun 10
Posts: 2636
United Kingdom
Message 111892 - Posted: 28 May 2023, 8:13:48 UTC

Sun 28 May 2023 08:54:17 BST | | [cpu_sched_debug] final job list:
Sun 28 May 2023 08:54:17 BST | Amicable Numbers | [cpu_sched_debug] 0: amicable_10_21_32464_1685246402.619422_63_0 (MD: no; UTS: yes)
Sun 28 May 2023 08:54:17 BST | Amicable Numbers | [cpu_sched_debug] scheduling amicable_10_21_32464_1685246402.619422_63_0
Sun 28 May 2023 08:54:17 BST | Amicable Numbers | [css] running amicable_10_21_32464_1685246402.619422_63_0 (6 CPUs)
Sun 28 May 2023 08:54:17 BST | | [cpu_sched_debug] enforce_run_list: end
ID: 111892
Richard Haselgrove
Volunteer tester
Help desk expert
Joined: 5 Oct 06
Posts: 5121
United Kingdom
Message 111893 - Posted: 28 May 2023, 8:31:47 UTC

That really does look like a problem and a smoking gun. I've pulled down a 2-week old artifact as well, and I'll deploy that when my caffeine levels have reached optimum - see if I can reproduce it. Then, it's probably off to the simulator - that will be David's first response.

I did get a change made to MT handling (#4992) for CPDN/IFS, but that should have been for the server only - and it looks like it is (sched/sched_send.cpp is a server file). The conversation in that PR rather tails off, but I think we got it tested in the end, thanks to LHC.
ID: 111893
Dave
Help desk expert
Joined: 28 Jun 10
Posts: 2636
United Kingdom
Message 111894 - Posted: 28 May 2023, 8:55:08 UTC

I think I am going to need to go back to school, as it were, and learn some programming in whatever version of C BOINC uses. My only formal learning was in ALGOL, which I haven't seen any evidence of being used for at least 30 years.
ID: 111894
Richard Haselgrove
Volunteer tester
Help desk expert
Joined: 5 Oct 06
Posts: 5121
United Kingdom
Message 111895 - Posted: 28 May 2023, 9:19:50 UTC - in response to Message 111894.  

Likewise! I learned Algol 60 at the back of my mother's classes in the school holidays, and we used Algol W as the main language on my diploma course. Algol W was a Stanford University product, so Berkeley probably doesn't recognise it.
ID: 111895
Dave
Help desk expert
Joined: 28 Jun 10
Posts: 2636
United Kingdom
Message 111896 - Posted: 28 May 2023, 9:28:12 UTC - in response to Message 111895.  

Likewise! I learned Algol 60 at the back of my mother's classes in the school holidays, and we used Algol W as the main language on my diploma course. Algol W was a Stanford University product, so Berkeley probably doesn't recognise it.


I can't even remember the differences between 60 and W now. W I spent a couple of terms doing while still at school, 60 I did some of at St. Andrew's Uni. before going off in a different direction and working in child and adolescent mental health nursing for 25 years.
ID: 111896
Richard Haselgrove
Volunteer tester
Help desk expert
Joined: 5 Oct 06
Posts: 5121
United Kingdom
Message 111897 - Posted: 28 May 2023, 9:40:17 UTC
Last modified: 28 May 2023, 10:05:29 UTC

OK, back to business - prepping up.

This is a 6-core i5-9600KF, which I think is capable of hyperthreading to 12 cores, but I have that turned off in hardware. So, first prep job is to turn that down in preferences, so I can turn it back up later.

Sun 28 May 2023 10:31:18 BST |  | Number of usable CPUs has changed from 6 to 5.
Sun 28 May 2023 10:31:18 BST |  | max CPUs used: 5
Sun 28 May 2023 10:31:19 BST | NumberFields@home | [cpu_sched] Preempting wu_sf3_DS-16x271-21_Grp890573of1000000_0 (left in memory)
Note that I have activity set to 'run always', not according to preferences, but that preference is acted on anyway.

With the 2-week old artifact,

Sun 28 May 2023 10:48:22 BST |  | Starting BOINC client version 7.23.0 for x86_64-pc-linux-gnu
Sun 28 May 2023 10:48:22 BST |  | This a development version of BOINC and may not function properly
Sun 28 May 2023 10:48:23 BST |  | -     max CPUs used: 5
Now off to find an MT task...

Got a couple. Now waiting for something to finish so MT starts up.
ID: 111897
Richard Haselgrove
Volunteer tester
Help desk expert
Joined: 5 Oct 06
Posts: 5121
United Kingdom
Message 111898 - Posted: 28 May 2023, 10:14:35 UTC

I started my course in 1973, after this happened:

1969 Move to new building on an adjacent site. Titan airlifted by crane ('the computing service is suspended').

1971 IBM 370/165 installed for the Computing Service.

1973 Titan switched off. IBM memory doubled from 1 Mbytes to 2 ...
(quotes from the official history. Upgrades, eh?)

According to Wikipedia, Algol W was a key language on the IBM 360 range, and I presume 370 as well. That's the one I used.
ID: 111898
Richard Haselgrove
Volunteer tester
Help desk expert
Joined: 5 Oct 06
Posts: 5121
United Kingdom
Message 111899 - Posted: 28 May 2023, 10:31:30 UTC

Back to business.

Sun 28 May 2023 11:09:50 BST | NumberFields@home | Computation for task wu_sf3_DS-16x271-21_Grp890573of1000000_0 finished
Sun 28 May 2023 11:09:50 BST | Amicable Numbers | [cpu_sched] Starting task amicable_10_21_2426_1685252702.324398_984_1 using amicable_10_21 version 300 (mt) in slot 2
Sun 28 May 2023 11:10:50 BST | NumberFields@home | [cpu_sched] Preempting wu_sf3_DS-16x271-21_Grp883275of1000000_0 (left in memory)
Sun 28 May 2023 11:10:50 BST | NumberFields@home | [cpu_sched] Preempting wu_sf3_DS-16x271-21_Grp881899of1000000_0 (left in memory)
Amicable is running on 3 cores, as yesterday. Note the one-minute time delay before pausing the two other tasks.

And this is definitely bonkers, and a bug.

Sun 28 May 2023 11:20:24 BST |  | Number of usable CPUs has changed from 5 to 6.
Sun 28 May 2023 11:20:25 BST | Amicable Numbers | Starting task amicable_10_21_2426_1685252702.324398_998_1
Sun 28 May 2023 11:20:25 BST | Amicable Numbers | [cpu_sched] Starting task amicable_10_21_2426_1685252702.324398_998_1 using amicable_10_21 version 300 (mt) in slot 5
I'm now running on 8 cores - 2x3 for Amicable, 1 each supporting the 2 GPUs.

cpu_sched_debug...

And here's the evidence.

Sun 28 May 2023 11:25:55 BST | NumberFields@home | [cpu_sched_debug] all CPUs used (8.00 >= 6), skipping wu_sf3_DS-16x271-21_Grp883275of1000000_0
Sun 28 May 2023 11:25:56 BST |  | [cpu_sched_debug] Request CPU reschedule: application exited
Sun 28 May 2023 11:25:56 BST |  | [cpu_sched_debug] final job list:
Sun 28 May 2023 11:25:56 BST | Einstein@Home | [cpu_sched_debug] 0: LATeah4021L07_940.0_0_0.0_34824363_2 (MD: no; UTS: yes)
Sun 28 May 2023 11:25:56 BST | Einstein@Home | [cpu_sched_debug] 1: LATeah4021L08_1116.0_0_0.0_6637911_0 (MD: no; UTS: no)
Sun 28 May 2023 11:25:56 BST | Amicable Numbers | [cpu_sched_debug] 2: amicable_10_21_2426_1685252702.324398_984_1 (MD: no; UTS: yes)
Sun 28 May 2023 11:25:56 BST | Amicable Numbers | [cpu_sched_debug] 3: amicable_10_21_2426_1685252702.324398_998_1 (MD: no; UTS: yes)
Sun 28 May 2023 11:25:56 BST | NumberFields@home | [cpu_sched_debug] 4: wu_sf3_DS-16x271-21_Grp883275of1000000_0 (MD: no; UTS: no)
Sun 28 May 2023 11:25:56 BST | NumberFields@home | [cpu_sched_debug] 5: wu_sf3_DS-16x271-21_Grp881899of1000000_0 (MD: no; UTS: no)
Sun 28 May 2023 11:25:56 BST | NumberFields@home | [cpu_sched_debug] 6: wu_sf3_DS-16x271-21_Grp890082of1000000_0 (MD: no; UTS: no)
Sun 28 May 2023 11:25:56 BST | NumberFields@home | [cpu_sched_debug] 7: wu_sf3_DS-16x271-21_Grp889806of1000000_0 (MD: no; UTS: no)
Sun 28 May 2023 11:25:56 BST | NumberFields@home | [cpu_sched_debug] 8: wu_sf3_DS-16x271-21_Grp890572of1000000_0 (MD: no; UTS: no)
Sun 28 May 2023 11:25:56 BST | NumberFields@home | [cpu_sched_debug] 9: wu_sf3_DS-16x271-21_Grp898646of1000000_0 (MD: no; UTS: no)
Sun 28 May 2023 11:25:56 BST | Einstein@Home | [cpu_sched_debug] scheduling LATeah4021L07_940.0_0_0.0_34824363_2
Sun 28 May 2023 11:25:56 BST | Einstein@Home | [cpu_sched_debug] scheduling LATeah4021L08_1116.0_0_0.0_6637911_0
Sun 28 May 2023 11:25:56 BST | Amicable Numbers | [cpu_sched_debug] scheduling amicable_10_21_2426_1685252702.324398_984_1
Sun 28 May 2023 11:25:56 BST | Amicable Numbers | [cpu_sched_debug] scheduling amicable_10_21_2426_1685252702.324398_998_1
Sun 28 May 2023 11:25:56 BST | NumberFields@home | [cpu_sched_debug] all CPUs used (8.00 >= 6), skipping wu_sf3_DS-16x271-21_Grp883275of1000000_0
Sun 28 May 2023 11:25:56 BST | Einstein@Home | Starting task LATeah4021L08_1116.0_0_0.0_6637911_0
Sun 28 May 2023 11:25:56 BST | Einstein@Home | [cpu_sched] Starting task LATeah4021L08_1116.0_0_0.0_6637911_0 using hsgamma_FGRPB1G version 128 (FGRPopencl2Pup-nvidia) in slot 1
Sun 28 May 2023 11:25:56 BST |  | [cpu_sched_debug] enforce_run_list: end
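
To make the overcommit easier to see, here is a toy model of the pass through the job list. It is inferred only from the log lines above (a job is skipped once the CPUs already claimed reach the limit, and started otherwise) and is not the client's actual code, which does a good deal more. The Job class, the task labels and the function body are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    cpus: float  # CPUs the task claims (avg_ncpus)


def enforce_run_list(jobs, ncpus):
    """Toy model: start a job whenever fewer than `ncpus` CPUs are claimed so far."""
    used = 0.0
    started, skipped = [], []
    for job in jobs:
        if used >= ncpus:
            # Same wording as the log message, for easy comparison.
            skipped.append((job.name, f"all CPUs used ({used:.2f} >= {ncpus})"))
            continue
        # No check that the job's own CPU count still fits in what is left.
        started.append(job.name)
        used += job.cpus
    return started, skipped, used


# Same order as the final job list above: 2 GPU tasks with a full core each,
# 2 Amicable mt tasks at 3 cores each, then a single-core task.
jobs = [
    Job("Einstein GPU task 1", 1),
    Job("Einstein GPU task 2", 1),
    Job("Amicable mt task 1", 3),
    Job("Amicable mt task 2", 3),
    Job("NumberFields task", 1),
]
started, skipped, used = enforce_run_list(jobs, ncpus=6)
for name, reason in skipped:
    print(f"{reason}, skipping {name}")
```

With 5 CPUs already claimed and a limit of 6, the second 3-core task still starts because the test only asks whether the limit has been reached, not whether the new task fits. That takes the total to 8, matching the "8.00 >= 6" line in the log.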
ID: 111899
Dave
Help desk expert
Joined: 28 Jun 10
Posts: 2636
United Kingdom
Message 111900 - Posted: 28 May 2023, 10:36:40 UTC

Elliott 22,000 here, though I couldn't find it on the Wikipedia page for Elliott computers.
ID: 111900

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.