Profile recommendations for HPC

JWMED

Joined: 9 Sep 20
Posts: 10
Germany
Message 100683 - Posted: 10 Sep 2020, 14:06:04 UTC
Last modified: 10 Sep 2020, 14:35:06 UTC

I am running BOINC on two headless HPCs (80 and 192 logical cores, 2x hyperthreaded I believe). We are a research institution where people log in and run both small daily statistical tasks and, from time to time, big multi-core computations. As my boss has kindly allowed our server infrastructure to be used for OpenPandemics, I'd like to make sure that BOINC does not interfere with anyone's work, while at the same time getting the maximum contribution out of this.

I was wondering what the recommendations are for this use case. So far I have limited BOINC to about 85% of the logical cores. I am especially interested in recommendations for the CPU time setting. Will values below 100% let user tasks start more quickly because they get access to a core more easily (if that's a thing; I don't know much about scheduling and the like)? And will the constant temperature changes, and the resulting mechanical stress from expansion and contraction, wear out the CPUs more quickly than running at 100% CPU time?

Also, users are instructed to run all their tasks with niceness 19, to make logins and RStudio sessions smooth.

Any other considerations I should make?
Bryn Mawr
Help desk expert

Joined: 31 Dec 18
Posts: 285
United Kingdom
Message 100685 - Posted: 10 Sep 2020, 15:53:37 UTC - in response to Message 100683.  

The received wisdom is to use 100% CPU time, for exactly the reason you give. Throttling also slows down the work units for no real gain in flexibility for other users.

You could consider listing the big, heavy jobs as exclusive applications to guarantee that BOINC cannot interfere with them. The smaller jobs should fit in fine.
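
For anyone wanting to try the exclusive-application route, here is a minimal sketch that builds the relevant cc_config.xml fragment. The `<exclusive_app>` option is part of the BOINC client configuration; the executable names `big_solver` and `mpirun` are purely hypothetical placeholders for whatever the heavy jobs are called on your machines. Note that BOINC suspends all of its computing while a listed application is running, so only list the jobs that really need the whole machine.

```python
import xml.etree.ElementTree as ET


def build_cc_config(exclusive_apps):
    """Return cc_config.xml content listing the given executables as exclusive apps."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    for name in exclusive_apps:
        # One <exclusive_app> element per executable name.
        ET.SubElement(options, "exclusive_app").text = name
    return ET.tostring(root, encoding="unicode")


# Hypothetical executable names; replace with the real ones.
xml_text = build_cc_config(["big_solver", "mpirun"])
print(xml_text)
```

The output would be merged into cc_config.xml in the BOINC data directory (taking care not to overwrite options already set there), and the client can then be told to re-read it with `boinccmd --read_cc_config`.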
Ant Evans

Joined: 2 Mar 06
Posts: 13
Germany
Message 101028 - Posted: 9 Oct 2020, 15:11:44 UTC - in response to Message 100685.  
Last modified: 9 Oct 2020, 15:15:05 UTC

Run BOINC tasks at 100% as suggested and trust the scheduler. Maybe let users run their jobs at a slightly higher priority (a lower nice value, on Linux) so that BOINC tasks are pushed into the background. It works.
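
To illustrate the priority idea, here is a small sketch (Unix only) that starts a job at nice 10 relative to the caller, instead of the nice 19 mentioned in the first post. The helper name `run_niced` and the value 10 are only examples, and the assumption that BOINC's own tasks sit at the lowest priority should be checked on your machines (for instance with `ps -o ni,comm`).

```python
import os
import subprocess
import sys


def run_niced(cmd, niceness=10):
    """Run cmd with its nice value raised by `niceness` relative to the caller."""
    return subprocess.run(
        cmd,
        # Runs in the child just before exec, so only the child is reniced.
        preexec_fn=lambda: os.nice(niceness),
        capture_output=True,
        text=True,
        check=True,
    )


# Demo job: a child Python process that simply reports its own nice value.
result = run_niced([sys.executable, "-c", "import os; print(os.nice(0))"])
print("child niceness:", result.stdout.strip())
```

In everyday use the same effect comes from starting jobs with `nice -n 10 <command>` in the shell.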

One other thing: think in physical cores, not threads, so you don't allocate resources you don't have. You will get more total throughput if you load all threads, but you will also increase the clock time to completion (see below) of any given task.
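
As a rough worked example of the physical-core arithmetic: the sketch below converts a logical core count into the percentage of CPUs that corresponds to one BOINC task per physical core. It assumes 2 threads per core, which the original poster only believes to be the case, and the `reserve_cores` parameter is a hypothetical knob for keeping a few physical cores free for users.

```python
def physical_core_pct(logical_cores, threads_per_core=2, reserve_cores=0):
    """Percentage of logical CPUs equal to the usable physical cores."""
    physical = logical_cores // threads_per_core
    # Always leave BOINC at least one core to work with.
    usable = max(1, physical - reserve_cores)
    return 100.0 * usable / logical_cores


for logical in (80, 192):
    pct = physical_core_pct(logical)
    print(f"{logical} logical cores -> use {pct:.1f}% of CPUs")
```

With 2x hyperthreading both machines come out at 50%, compared with the 85% of logical cores currently configured.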

Memory, on the other hand, is less ephemeral than CPU time and will be handled less elegantly. You want tasks to complete in the shortest clock time possible to reduce memory thrashing and contention. There are two ways to do this: reduce the number of tasks, or radically increase the time between task switches. Only uncheck 'leave non-GPU tasks in memory' if you have done the latter.

This all requires familiarity with boinccmd or a remote connection for the manager. No esoteric settings are needed.
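
For a headless box, these settings can be put in global_prefs_override.xml rather than clicked in the manager. The sketch below generates such a file with 100% CPU time, a limit on the share of CPUs, and a long switch interval. The element names are the standard BOINC preference tags as far as I know; the concrete values (50% of CPUs, 600 minutes between switches) are illustrative assumptions, not recommendations from this thread.

```python
import xml.etree.ElementTree as ET


def build_prefs_override(max_ncpus_pct, switch_minutes, leave_in_memory):
    """Return global_prefs_override.xml content for the given settings."""
    root = ET.Element("global_preferences")
    settings = {
        "max_ncpus_pct": f"{max_ncpus_pct:.1f}",   # share of CPUs BOINC may use
        "cpu_usage_limit": "100.0",                # 100% CPU time, no throttling
        "cpu_scheduling_period_minutes": f"{switch_minutes:.1f}",
        "leave_apps_in_memory": "1" if leave_in_memory else "0",
    }
    for tag, value in settings.items():
        ET.SubElement(root, tag).text = value
    return ET.tostring(root, encoding="unicode")


# Illustrative values only.
prefs_xml = build_prefs_override(50.0, 600.0, leave_in_memory=True)
print(prefs_xml)
```

After saving the result as global_prefs_override.xml in the BOINC data directory, `boinccmd --read_global_prefs_override` makes the running client pick it up.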


Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.