Incorrect CPU threshold

Message boards : Questions and problems : Incorrect CPU threshold


flakinho

Joined: 5 Dec 12
Posts: 49
United States
Message 59291 - Posted: 3 Jan 2015, 22:18:21 UTC

Hi,
for several days now, BOINC hasn't been applying the option "On multiprocessor systems, use at most xx% of the processors" when vLHC@home and ATLAS@home have tasks running at the same time.

In detail: I have that option set to 85%. On my Intel Core i7 (8 processors), that means 6 tasks running on 6 cores.
It has been this way for years.
However, when 2 vLHC tasks and 1 ATLAS task are running at the same time, only 1 more task runs alongside them (so 4 tasks, i.e. 50% of the available cores), and the total CPU usage, as measured by the operating system, hovers around 40-60%.
It is not a memory problem: I have 16 GB, and 8 GB are still free in this situation.
It is not a disk space problem either; there are 20 GB free for BOINC.

Whenever I suspend ATLAS and vLHC, more tasks from other projects are downloaded and everything goes back to normal, with 6 tasks running on 6 cores at 100% CPU (~75% of the total number of cores).
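A minimal sketch of the arithmetic behind the 85% setting, assuming the client rounds the usable CPU count down and never goes below 1. That rounding rule is an assumption, but it matches the numbers reported in this thread (6 of 8 processors here, 10 of 12 on the workstation described later). The function name usable_cpus is hypothetical and not part of BOINC.

```python
def usable_cpus(n_cpus: int, max_ncpus_pct: int) -> int:
    """Hypothetical model of "use at most X% of the processors".

    Assumes the result is rounded down and never falls below 1 CPU.
    This is an illustration, not BOINC's actual client code.
    """
    return max(1, (n_cpus * max_ncpus_pct) // 100)


if __name__ == "__main__":
    for n in (8, 12):
        used = usable_cpus(n, 85)
        print(f"{n} processors at 85% -> {used} CPUs ({used / n:.0%} of the total)")
```

Under that assumption, 85% of 8 processors gives 6 CPUs (75% of the total) and 85% of 12 gives 10 CPUs, so the 40-60% load described above is clearly below what the setting allows.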

One week ago, the same situation was not causing any trouble at all.
Does anybody know what is going on?

More specs; I am running:
- Windows 7 SP1 64-bit
- BOINC 7.4.27 (x64)
- Oracle VM VirtualBox 4.3.12 r93733
- an app_config.xml file limiting ATLAS to 1 concurrently running task.

Thanks for any help!
Jord
Volunteer tester
Help desk expert

Joined: 29 Aug 05
Posts: 15483
Netherlands
Message 59308 - Posted: 5 Jan 2015, 17:32:59 UTC - in response to Message 59291.  

The virtual machines use CPU cores for themselves as well. The project will set how much CPU each VM uses. As far as I know, vLHC uses 2 cores per one of their tasks, so if you have 2 running, you use 4 cores already. If ATLAS uses 2 cores as well, that's 6 cores already.

But you'd best ask the projects how many CPU cores they use.
flakinho

Joined: 5 Dec 12
Posts: 49
United States
Message 59403 - Posted: 7 Jan 2015, 22:57:43 UTC

Well, I really think that vLHC uses 1 core per task, just as ATLAS does.
At least that is what it does on all my other machines.
I think vLHC introduced 2 cores per task some time ago, but after testing it they decided it was not a good idea.
When they did, you could clearly see in BOINC when a task was using 2 cores, the same way you can see whether it is using a GPU.
I think they stopped using several cores per task because for a big part of the run a vLHC task cannot work in parallel, so 1 of the cores sits idle when it could be allocated to another BOINC task.
I tried to find that post but couldn't.
On the other hand, I have never noticed ATLAS using 2 cores per task.

So I think that is not the answer. Correct me if I am wrong.

It seems to happen with the combination of climateprediction, ATLAS and vLHC tasks at the same time (that computer is multicore).
It is really weird.

Thanks for the idea.

Yacob
flakinho

Joined: 5 Dec 12
Posts: 49
United States
Message 60409 - Posted: 19 Feb 2015, 15:14:06 UTC

Hi,
I guess nobody is paying attention to this problem, but it has now appeared on another of my computers as well, and this one is really powerful.
It seems to be related to the projects that use Oracle VM VirtualBox.
Briefly: when I have two VM tasks from different projects running (usually vLHC and ATLAS, but also ATLAS and RNA World), the client stops asking for more tasks.

My new computer is a Dell Workstation 7810:
- Ubuntu 14.04
- BOINC 7.2.42 (x64)
- Oracle VM VirtualBox 4.3.22 r98236
- Intel Xeon E5-2620 2.14 GHz, 12 CPUs
- 32 GB RAM
- 4 TB HDD
- NVIDIA Quadro K4200

At a certain point, instead of using 10 CPUs as BOINC usually does, it stops asking for more tasks and keeps running only 2 tasks, as it did this morning: one for ATLAS and one for RNA World.
Whenever I suspend one of these 2 projects, the client starts downloading more tasks and resumes working on 10 CPUs. Then I can resume the suspended project.
It happens whether I suspend ATLAS or RNA World (and on my other computer with vLHC and ATLAS).
Really weird.

Have I explained it well?
Any clue what is happening here?
Is anybody else seeing the same problem?

Yacob
flakinho

Joined: 5 Dec 12
Posts: 49
United States
Message 60413 - Posted: 19 Feb 2015, 15:21:53 UTC

The point is that it doesn't make any sense:
- when I arrive in the morning, BOINC is running only 2 VM tasks of different projects.
- then I suspend one of them; BOINC starts asking for more work, downloads new tasks for other projects and starts working on them. I resume the suspended project and then get the same 2 VM tasks running plus 8 new ones, without making any change to the configuration.

If something in my configuration is wrong, it is not at all obvious what.
What is wrong here?
flakinho

Joined: 5 Dec 12
Posts: 49
United States
Message 60457 - Posted: 22 Feb 2015, 1:15:26 UTC - in response to Message 60413.  

I think I've figured it out on my own.
It takes a while to explain, but I think the BOINC people should fix this:

I ran an experiment 2 hours ago.
BOINC was set to use at most 85% of the processors.
On my system that is ~10 CPUs.

At t=0 I had the following WUs:
1 RNA World, 2 vLHC, 10 ATLAS, 10 MindModeling, 1 WUProp (non-CPU-intensive) and 6 GPU tasks for Einstein.
Of these, the following were running:
1 RNA World, 1 vLHC, 1 ATLAS, 7 MindModeling, 1 WUProp and 1 Einstein GPU task.
That made a total of ~10 CPUs at 100% load, i.e. 85% of the processors.

1 hour later, and here comes the problem, all the MindModeling tasks had finished.
Then only these were running: 1 RNA World, 1 vLHC, 1 ATLAS, 1 WUProp and 1 Einstein GPU task.
According to a system monitor program, my system load was ~35%, far below the maximum threshold of 85%, even though many projects had tasks available.
There were still 10 CPU tasks in the queue (1 for vLHC, 9 for ATLAS), ready to start but not running, and the client was not asking any project for more work.

Then I suspended the 9 ATLAS tasks that were not running, and BOINC automatically downloaded 38 new POGS@home tasks.
Seconds later 8 of them were running and the load was ~85% again.

The explanation:
As you probably know, the VM tasks (vLHC, ATLAS, RNA World, etc.) are very demanding; ATLAS especially needs several GB of RAM and disk space per running task.
An additional problem is that the virtual machines sometimes stop with the error message "Scheduler wait: VM Job unmanageable, restarting later".
Sometimes this error repeats far too often for the same task.
To avoid these errors, I tried to limit the number of simultaneously running VM tasks to 1 per VM project by creating these files:

config.xml for the projects ATLAS and vLHC:
<boinc>
 <config>
  <max_wus_in_progress>1</max_wus_in_progress>
 </config>
</boinc>

app_config.xml for the same projects:
<app_config>
 <app>
  <name>ATLAS</name>
  <max_concurrent>1</max_concurrent>
 </app>
</app_config>
For vLHC, the app_config.xml is the same except that the application name is <name>vboxwrapper</name>.

The result is that ATLAS usually downloads 10 tasks on my system but runs only one.
It seems that BOINC doesn't download more tasks because it sees the ATLAS tasks filling the queue, so it thinks it doesn't need more (on my system ATLAS has a high priority).
BOINC doesn't run more than 1 ATLAS task because of the config.xml and app_config.xml files, but it seems to get confused and doesn't ask for more tasks, which keeps the workload really low.
When I suspend the remaining 9 "ready to start" tasks, BOINC requests more tasks from other projects and raises the CPU load to my defined 85%.
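To make this explanation concrete, here is a deliberately simplified model. It is not BOINC's real work-fetch logic (which estimates queued runtime per resource rather than counting tasks); it only compares a check that treats every queued task as able to occupy a CPU with one that first caps each application at its max_concurrent value from app_config.xml. The class and function names are hypothetical; the task counts are the ones from the experiment above at t = 1 h.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AppQueue:
    """Tasks of one CPU application on the client (running + waiting)."""
    name: str
    tasks: int
    max_concurrent: Optional[int] = None  # from app_config.xml; None = no limit


def naive_busy_cpus(apps: list[AppQueue]) -> int:
    """Assume every queued task can keep a CPU busy (the suspected flaw)."""
    return sum(app.tasks for app in apps)


def capped_busy_cpus(apps: list[AppQueue]) -> int:
    """Count at most max_concurrent tasks per application."""
    total = 0
    for app in apps:
        limit = app.tasks if app.max_concurrent is None else app.max_concurrent
        total += min(app.tasks, limit)
    return total


def should_fetch(busy_cpus: int, usable_cpus: int) -> bool:
    """Ask for more work only if some usable CPU would otherwise sit idle."""
    return busy_cpus < usable_cpus


if __name__ == "__main__":
    usable = 10  # 85% of 12 CPUs
    # CPU tasks left after the MindModeling tasks finished; WUProp
    # (non-CPU-intensive) and the Einstein GPU task are left out.
    apps = [
        AppQueue("RNA World", tasks=1),
        AppQueue("vLHC", tasks=2, max_concurrent=1),
        AppQueue("ATLAS", tasks=10, max_concurrent=1),
    ]
    for label, busy in (("naive", naive_busy_cpus(apps)),
                        ("capped", capped_busy_cpus(apps))):
        print(f"{label}: {busy} busy CPUs, fetch work: {should_fetch(busy, usable)}")
```

In this model the naive count sees 13 tasks for 10 usable CPUs and requests nothing, while the capped count sees only 3 busy CPUs and would request work for the other 7, which is roughly what suspending the 9 waiting ATLAS tasks achieved by hand.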

I think this is going to happen to somebody else soon, if it isn't happening already.
Somebody, either the BOINC developers or the ATLAS developers, should fix this problem.
ATLAS could act like vLHC@home and download only a limited number of tasks (2 or 3), considering that most personal computers won't meet the RAM requirements for many simultaneous tasks.
Or, even better, the BOINC client should take the value and meaning of <max_wus_in_progress> into account when deciding whether to request more tasks.

I didn't find any parameter in app_config.xml, config.xml, or the other .xml configuration files to limit the number of tasks downloaded for a project.
Parameters such as <feeder_query_size> or <shmem_work_items> don't serve this purpose.
For now, to work around this problem I'll have to write a script that uses boinccmd to suspend/resume the queued ATLAS tasks automatically. Otherwise, my computer will always work below my defined maximum load.
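A rough sketch of such a boinccmd script in Python, assuming the commands "boinccmd --get_tasks" and "boinccmd --task <project URL> <task name> suspend|resume". The parser assumes that the --get_tasks output starts each task with a numbered header like "1) -----------" followed by "key: value" lines named "name", "project URL", "active_task_state", "suspended via GUI" and "ready to report"; check those field names against your own client version before relying on it. The script and function names are hypothetical.

```python
import re
import subprocess
import sys

# Assumed layout of "boinccmd --get_tasks": a numbered header line per task,
# followed by indented "key: value" lines. Verify against your BOINC version.
TASK_HEADER = re.compile(r"^\s*\d+\) -+\s*$")


def parse_tasks(text: str) -> list[dict[str, str]]:
    """Split boinccmd --get_tasks output into one dict per task."""
    tasks: list[dict[str, str]] = []
    current = None
    for line in text.splitlines():
        if TASK_HEADER.match(line):
            current = {}
            tasks.append(current)
        elif current is not None and ": " in line:
            key, _, value = line.strip().partition(": ")
            current[key] = value.strip()
    return tasks


def select_tasks(tasks, project_url: str, action: str):
    """Pick the tasks of one project that the action should apply to."""
    selected = []
    for task in tasks:
        if task.get("project URL") != project_url or task.get("ready to report") == "yes":
            continue
        suspended = task.get("suspended via GUI") == "yes"
        running = task.get("active_task_state") == "EXECUTING"
        if action == "suspend" and not suspended and not running:
            selected.append(task)  # queued and not currently running
        elif action == "resume" and suspended:
            selected.append(task)
    return selected


def boinccmd_commands(tasks, project_url: str, action: str) -> list[list[str]]:
    """Build one "boinccmd --task URL NAME action" call per selected task."""
    return [["boinccmd", "--task", project_url, t["name"], action]
            for t in select_tasks(tasks, project_url, action)]


if __name__ == "__main__":
    # Usage (hypothetical file name): python queue_toggle.py <project URL> suspend|resume
    url, action = sys.argv[1], sys.argv[2]
    if action not in ("suspend", "resume"):
        sys.exit("action must be 'suspend' or 'resume'")
    output = subprocess.run(["boinccmd", "--get_tasks"], capture_output=True,
                            text=True, check=True).stdout
    for cmd in boinccmd_commands(parse_tasks(output), url, action):
        subprocess.run(cmd, check=True)
```

Running it with "suspend" reproduces the manual step from the experiment (suspend the ATLAS tasks that are waiting so the client fetches other work); running it later with "resume" puts them back in the queue.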

Please, give me some feedback.

Thanks,
Yacob
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5082
United Kingdom
Message 60462 - Posted: 22 Feb 2015, 9:57:31 UTC - in response to Message 60457.  

I wrote to the boinc_alpha bug reporting mailing list on 30 November 2014, under the title "Max_concurrent, work fetch, and idle resources":

... some users might set max_concurrent for a project which was known to place heavy resource (memory, disk?) demands on the system, and intend to donate the remaining resources to light-weight projects like, as it happens, NumberFields and SIMAP. Should there be a special work_fetch for the idle resource under these circumstances?

David Anderson replied:

I understand the problem.
Fixing it would [be] a fairly big task, which I don't want to undertake right now.
I created a Trac ticket, and will hopefully get to it next month.
-- David

That was a couple of days later, into December, so I assumed he meant January. But it hasn't happened yet.

https://boinc.berkeley.edu/trac/ticket/1373
Jord
Volunteer tester
Help desk expert

Joined: 29 Aug 05
Posts: 15483
Netherlands
Message 60468 - Posted: 22 Feb 2015, 16:07:37 UTC - in response to Message 60462.  

Quoting Richard Haselgrove:
"That was a couple of days later, into December, so I assumed he meant January. But it hasn't happened yet.
https://boinc.berkeley.edu/trac/ticket/1373"

You can wait quite a long time if you're keeping an eye on Trac. Tickets there are no longer updated. The Issues list on GitHub has taken over these tasks.

https://github.com/BOINC/boinc/issues/1344
https://github.com/BOINC/boinc/issues?q=is%3Aopen+is%3Aissue
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5082
United Kingdom
Message 60469 - Posted: 22 Feb 2015, 16:19:50 UTC - in response to Message 60468.  

Quoting Jord:
"You can wait quite a long time if you're keeping an eye on Trac. Tickets there are no longer updated. The Issues list on GitHub has taken over these tasks."

Thanks - I knew about the migration from trac tickets to git issues, but didn't know the urls (they may not have been there the last time I tried to look - it was a while ago, when the announcement was made).

But I wasn't looking there for a resolution, just to show Yacob the state of play. I'd have seen the code being checked into git itself if it had been - but it hasn't.
flakinho

Joined: 5 Dec 12
Posts: 49
United States
Message 60487 - Posted: 23 Feb 2015, 14:38:59 UTC

Hey Richard,
thanks so much for your help!
This is exactly the kind of interaction I was waiting for.
I don't know anything about how to report a bug or check if it was reported, so I needed help from somebody to make the next step.

Thanks Richard and Ageless!

Yacob


Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.