Posts by Paul Schauble

1) Message boards : Questions and problems : Invalid client RPC password Try reinstalling Boinc. (Message 106061)
Posted 10 Nov 2021 by Paul Schauble
Post:
There was recently a thread about this happening on Linux. I am having the same problem, except on Windows 10.

Roughly every other day, the BOINC manager stops working with the message "Invalid client RPC password Try reinstalling Boinc". Most of the time, I can make the manager reconnect by doing File / Select Computer and selecting the local machine. But the disconnect keeps happening.

This started with an older version of BOINC, but updating to 7.16.20 has not stopped the problem.

Is there anything I can do to fix this?

Thanks.
2) Message boards : Documentation : Format of job log files (Message 104436)
Posted 22 May 2021 by Paul Schauble
Post:
Where can I find documentation on the format of the job log files in the BOINC directory?

Thanks
3) Message boards : Questions and problems : I just don't understand how BOINC schedules tasks (Message 100343)
Posted 19 Aug 2020 by Paul Schauble
Post:
I thought it was something like that. In other words, the whole scheduler needs to be rethought in the light of multi-CPU tasks.

It would likely not be that hard to make the scheduler not preempt a task that was in danger of missing its deadline. The scheduler supposedly already handles tasks like that. But doing so would likely just cause a different set of weirdness.

Just have to put up with it.

++PLS
4) Message boards : Questions and problems : I just don't understand how BOINC schedules tasks (Message 100250)
Posted 9 Aug 2020 by Paul Schauble
Post:
I am running BOINC 7.16.7 (x64) on Windows 10 v1909 x64, with projects SRBase, LHC, and Asteroids. The CPU has 4 hyperthreaded cores, with BOINC set to allow 3 cores (yes, different definition of "cores").

I just watched BOINC suspend two Asteroids tasks that were very near their deadline, should have been high priority, and had less than 1 hour left to run, in favor of an LHC ATLAS task with an 8-hour estimate that is due in 4 days.

The two Asteroids tasks were the last two in the queue, so everything would have run successfully had they been allowed to finish.

The BOINC event log shows this when the Asteroids tasks were suspended:
2020-08-08 10:00:33 | Asteroids@home | Computation for task ps_200721_input_411087_1_0 finished
2020-08-08 10:00:33 | LHC@home | Starting task hEKKDmLfLNxnsSi4apGgGQJmABFKDmABFKDmEoBPDmABFKDmq6rqen_0
2020-08-08 10:00:33 | LHC@home | [cpu_sched] Starting task hEKKDmLfLNxnsSi4apGgGQJmABFKDmABFKDmEoBPDmABFKDmq6rqen_0 using ATLAS version 200 (vbox64_mt_mcore_atlas) in slot 0
2020-08-08 10:00:35 | Asteroids@home | Started upload of ps_200721_input_411087_1_0_0
2020-08-08 10:00:38 | Asteroids@home | Finished upload of ps_200721_input_411087_1_0_0
2020-08-08 10:00:43 | Asteroids@home | Sending scheduler request: To report completed tasks.
2020-08-08 10:00:43 | Asteroids@home | Reporting 1 completed tasks
2020-08-08 10:00:43 | Asteroids@home | Not requesting tasks: don't need (CPU: not highest priority project; NVIDIA GPU: )
2020-08-08 10:00:44 | Asteroids@home | Scheduler request completed
2020-08-08 10:00:44 | Asteroids@home | Project requested delay of 7 seconds
2020-08-08 10:01:33 | Asteroids@home | [cpu_sched] Preempting ps_200721_input_411087_2_0 (removed from memory)
2020-08-08 10:01:33 | Asteroids@home | [cpu_sched] Preempting ps_200721_input_411152_2_1 (removed from memory)

I think the sequence here was

  1. The last three Asteroids tasks are running. One completes.
  2. BOINC starts the LHC ATLAS task, which requires 3 CPUs, without taking notice of the 3 CPU requirement.
  3. After running for 1 minute, BOINC notices the requirement for 3 CPUs and preempts the two Asteroids tasks to provide the other two CPUs. This preemption is without regard to priorities or deadlines.
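To make the complaint concrete, here is a minimal Python sketch of a deadline-aware choice. It is not BOINC's scheduler code: the `Task` class, the `pick_to_run` function, and the least-slack-first policy are all hypothetical, and the 2-hour Asteroids deadlines are made-up stand-ins for "very near deadline". Only the 8-hour estimate and the 4-day (96-hour) deadline of the ATLAS task come from the post.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    ncpus: int                # CPUs the task occupies while running
    remaining_hours: float    # estimated run time still needed
    hours_to_deadline: float  # time left until the report deadline

    @property
    def slack(self) -> float:
        """Spare hours before the deadline if the task runs uninterrupted."""
        return self.hours_to_deadline - self.remaining_hours


def pick_to_run(tasks, allowed_cpus):
    """Greedy least-slack-first selection (hypothetical policy, not BOINC's)."""
    chosen, free = [], allowed_cpus
    for task in sorted(tasks, key=lambda t: t.slack):
        if task.ncpus <= free:
            chosen.append(task)
            free -= task.ncpus
    return chosen


tasks = [
    # Illustrative numbers: under 1 hour to run, deadline assumed 2 hours away.
    Task("asteroids_a", ncpus=1, remaining_hours=1.0, hours_to_deadline=2.0),
    Task("asteroids_b", ncpus=1, remaining_hours=1.0, hours_to_deadline=2.0),
    # From the post: 8-hour estimate, due in 4 days, needs 3 CPUs.
    Task("atlas_mt", ncpus=3, remaining_hours=8.0, hours_to_deadline=96.0),
]

running = pick_to_run(tasks, allowed_cpus=3)
print([t.name for t in running])
```

Under this toy policy the two Asteroids tasks keep their CPUs and the 3-CPU ATLAS task waits until all three CPUs are free, which is the behavior I expected.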


It seems obviously wrong to sacrifice two tasks that could have finished within deadline to run a task that will finish within deadline in any case.

Can someone please explain why it works this way?

Thanks,
++PLS

5) Message boards : Questions and problems : Problem suspending tasks (Message 91668)
Posted 30 May 2019 by Paul Schauble
Post:
Suspending BOINC activity, then suspending individual tasks, is also a good workaround.

I'm glad the issue is already known.

++PLS
6) Message boards : Questions and problems : Problem suspending tasks (Message 91662)
Posted 29 May 2019 by Paul Schauble
Post:
I am on Windows 10 v1809. And the key is to highlight EVERYTHING, including the running tasks. This has happened to me twice now. The project I'm running is SRBase. Perhaps part of the problem is that this project uses a fairly small program and can start a task quite quickly.

++PLS
7) Message boards : Questions and problems : Problem suspending tasks (Message 91659)
Posted 29 May 2019 by Paul Schauble
Post:
Yes, at the moment just one project. I also should have said that I'm running BOINC 7.14.2.

++PLS
8) Message boards : Questions and problems : Problem suspending tasks (Message 91656)
Posted 29 May 2019 by Paul Schauble
Post:
I needed to limit the BOINC work being done to reduce CPU overheating. I decided to force one particularly long-running task to complete and then restart with a reduced workload.

So I did the following:
- Set the project to no new work
- Return to the Tasks window, highlight all tasks, and click Suspend.
- Intending to return and resume the one task I wanted to finish.

However, I think what happened is that BOINC suspended the first task in the window. BOINC then instantly started a new task. BOINC then came back and suspended the second task, which immediately started a replacement task. And so on.

I ended up with about half the waiting tasks going from "not started and holding no resources" to "started, holding resources, and suspended with 1 second running time".

I now know that I have to suspend all non-running tasks before suspending any running tasks to avoid this mess. But I think this is a bug in how suspend is handled.
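The effect of the ordering can be shown with a small Python simulation. This is a toy model, not BOINC code: `suspend_all`, the task names, and the assumption that the client refills a free slot immediately after every single suspend are all hypothetical.

```python
def suspend_all(order, queue, running, slots):
    """Suspend tasks one at a time in `order`.

    After each suspend, a toy scheduler fills free slots with the first
    waiting task in `queue` that is not yet suspended.
    Returns the tasks that were started as replacements and then suspended.
    """
    running = set(running)
    originally_running = set(running)
    suspended = set()
    wasted = []
    for name in order:
        suspended.add(name)
        if name in running:
            running.remove(name)
            if name not in originally_running:
                wasted.append(name)
        for candidate in queue:
            if len(running) >= slots:
                break
            if candidate not in suspended and candidate not in running:
                running.add(candidate)
    return wasted


queue = ["t1", "t2", "t3", "t4", "t5", "t6"]  # t1 and t2 are running

# Highlight everything and click Suspend: tasks are handled in window order.
naive = suspend_all(queue, queue, running={"t1", "t2"}, slots=2)

# Suspend the waiting tasks first, then the running ones.
careful = suspend_all(["t3", "t4", "t5", "t6", "t1", "t2"], queue,
                      running={"t1", "t2"}, slots=2)

print(naive)
print(careful)
```

In this model every waiting task gets started and then suspended in the naive order, while the careful order wastes none. The real client affected only about half of my waiting tasks, so it evidently does not refill slots quite this eagerly.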

Is there a way to report BOINC bugs, other than here?

Thanks,
++PLS
9) Message boards : Questions and problems : Better control of subprojects (Message 89794)
Posted 25 Jan 2019 by Paul Schauble
Post:
Yeah, it probably would require server changes. Right now, the client can decline to request jobs, but the only granularity is CPU vs. GPU. If the client requests CPU jobs, it gets what it gets. The request would have to be changed to "give me CPU tasks for these subprojects". Still, I see other uses for that capability.

S
10) Message boards : Questions and problems : Better control of subprojects (Message 89757)
Posted 23 Jan 2019 by Paul Schauble
Post:
I hadn't thought of that aspect. But I did say that using <app_version> instead of <app> would allow for finer control. So, consider:

<app_version>
<app_name>camb_boinc2docker</app_name>
<plan_class>vbox64_mt</plan_class>
<no_download>1</no_download>
</app_version>


Here the <no_download> is conditioned on both full app name and plan class. The project could define plans based on runtime, for example
run1 = 1 to 10 hours on a standard machine
run2 = 10 to 100
run3 = 100 to 300

and so on, as appropriate.


Maybe to make this work right we need to have <no_download> at both the <app> and <app_version> level with <app_version> overriding. Then you can use <no_download> at the <app> level to block everything, and use it at the <app_version> level to allow the plans you want.
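A minimal Python sketch of that override rule follows. Note that <no_download> is only the option proposed in this thread, not an existing BOINC setting, and `download_allowed` is a hypothetical helper; the sketch assumes the entries sit under an <app_config> root and reuses the run1/run3 plan names from above.

```python
import xml.etree.ElementTree as ET

# Hypothetical config: block the whole app, but allow the short "run1" plan.
CONFIG = """
<app_config>
  <app>
    <name>camb_boinc2docker</name>
    <no_download>1</no_download>
  </app>
  <app_version>
    <app_name>camb_boinc2docker</app_name>
    <plan_class>run1</plan_class>
    <no_download>0</no_download>
  </app_version>
</app_config>
"""


def download_allowed(root, app_name, plan_class):
    """Return True if new tasks for this app and plan class may be fetched.

    An <app_version> entry matching both the app name and the plan class
    wins; otherwise the <app> entry applies; otherwise downloads are allowed.
    """
    for av in root.findall("app_version"):
        if (av.findtext("app_name") == app_name
                and av.findtext("plan_class") == plan_class):
            flag = av.findtext("no_download")
            if flag is not None:
                return flag.strip() != "1"
    for app in root.findall("app"):
        if app.findtext("name") == app_name:
            return (app.findtext("no_download") or "0").strip() != "1"
    return True


root = ET.fromstring(CONFIG)
print(download_allowed(root, "camb_boinc2docker", "run1"))  # allowed by override
print(download_allowed(root, "camb_boinc2docker", "run3"))  # blocked at app level
```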
11) Message boards : Questions and problems : Better control of subprojects (Message 89711)
Posted 21 Jan 2019 by Paul Schauble
Post:
I'm personally of the opinion that subprojects should only be used when the characteristics of the work are very similar. Sadly, that's not the way they are being used. Look at the number of projects that have a subproject with native tasks and another with vbox tasks.

This is a problem because the current mechanisms for controlling which subprojects a computer runs are painfully limited. Most project web sites allow selecting subprojects, but only using the Home/Work/School/Other categories. First, four categories is too few for people with multiple computers. Second, the Gridcoin people have no access to this mechanism.

I'd like to request an extension to the project config file to allow control of subprojects. Something like
<app>
<name>camb</name>
<no_download>1</no_download>
</app>

Putting the entry in <app_version> might allow even finer control.

This should do exactly the same thing as the project web site setting: Existing tasks will run, but no new tasks will be downloaded. This should be easy to implement.

Thanks
12) Message boards : BOINC Manager : [Request] Show account manager set status (Message 89375)
Posted 28 Dec 2018 by Paul Schauble
Post:
The account manager can set status on a project, for example, the account manager can set a project to "no new tasks". Currently, the manager shows "no new tasks" when set from the manager, but doesn't show it when this status is set by the account manager.

This should be displayed, along with an indication that it was set by the account manager.

Thanks
13) Message boards : Questions and problems : 7.14.2 does not suspend running tasks when CPU usage limit is lowered (Message 88786)
Posted 8 Nov 2018 by Paul Schauble
Post:
There is a mistake at the end of my last post. It should be:

7.14.2
Allowed CPUs 5% (of 8)
BOINC starts 1 LHC Atlas task (3 CPUs) and 1 LHC Theory task (3 CPUs)


If BOINC will run 6 CPUs of work while the allowed CPU usage is set to less than 1/2 CPU, then the CPU limit means nothing.

++PLS
14) Message boards : Questions and problems : 7.14.2 does not suspend running tasks when CPU usage limit is lowered (Message 88785)
Posted 8 Nov 2018 by Paul Schauble
Post:
Ok, I went back to 7.12.1, with these results:

7.14.2
Allowing 6 CPUs
Running 1 LHC Atlas 3 CPU, 1 LHC Theory 3 CPU.
Drop allowed CPUs to 4
3 minutes later, same two tasks running.

7.12.1 Installed
verify running 7.12.1
Allowing 6 CPUs
Running 2 LHC Theory 3 CPU each.
Drop allowed CPUs to 4
3 minutes later, same two tasks are running
Drop allowed CPUs to 2
3 minutes, same two tasks are running
Drop allowed CPUs to 5% (of 8 cores)
3 minutes later, same two tasks running

7.14.2 installed
5% of CPUs (of 8 cores) are allowed
BOINC starts 1 LHC Atlas task 3 CPUs.

This last result may indicate the problem. If the allowed CPUs are about 1/2 CPU, then BOINC should just not start a 3 CPU task at all.

Conclusions
1. There is no difference between the two versions
2. There is a difference in behavior between native tasks and VBox multi-threaded tasks when the allowed CPUs are dropped.
3. IMHO, both versions are wrong. When allowed nCPUs is dropped, BOINC should shed load accordingly.
4. BOINC should not start tasks that would place usage above the allowed nCPUs
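Conclusions 3 and 4 amount to a simple budget check, sketched below in Python. This only illustrates the behavior I am asking for, not how the client actually computes things: `cpu_budget` and `shed_load` are hypothetical names, and the sketch assumes no rounding of the allowed CPU count.

```python
def cpu_budget(percent, ncpus):
    """Allowed CPUs as a plain fraction of the machine (assumed: no rounding)."""
    return ncpus * percent / 100


def shed_load(running, budget):
    """Split running tasks into (keep, suspend) so kept CPUs fit the budget.

    `running` is a list of (task_name, cpus_used) in priority order.
    """
    keep, suspend, used = [], [], 0
    for name, cpus in running:
        if used + cpus <= budget:
            keep.append(name)
            used += cpus
        else:
            suspend.append(name)
    return keep, suspend


running = [("LHC Atlas", 3), ("LHC Theory", 3)]

# Allowed CPUs dropped from 6 to 4: one of the two 3-CPU tasks should stop.
print(shed_load(running, budget=4))

# 5% of 8 cores is 0.4 CPU: no 3-CPU task fits, so nothing should run.
print(cpu_budget(5, 8))
print(shed_load(running, cpu_budget(5, 8)))
```

The same check applied before starting a task would cover conclusion 4: a task whose CPU count does not fit in the remaining budget is simply not started.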
15) Message boards : Questions and problems : 7.14.2 does not suspend running tasks when CPU usage limit is lowered (Message 88771)
Posted 7 Nov 2018 by Paul Schauble
Post:
Nice to know. The project in question is LHC. I've been running LHC for a long time and every previous version would automatically suspend tasks when the allowed CPU was lowered.

If a task doesn't checkpoint, what happens if you hit the Suspend button on it? I ask because that does suspend the task, and it later resumes with little change in time remaining, so I doubt the task is being restarted.

Now what is new is that I have only just started getting multi-CPU tasks. But I'm pretty sure that auto-suspend was working for the short period I was running the previous BOINC.

I'm fine with dropping back a BOINC version and testing to be sure. So, questions:
1. Where can I find an install package for the previous BOINC release?
2. Are there any problems with installing the older package over a newer?

++PLS
16) Message boards : Questions and problems : Repeatedly running the same job (Message 88770)
Posted 7 Nov 2018 by Paul Schauble
Post:
I'd like to do some configuration on the BOINC projects.

Is there a way to manually pull one job from a project and repeatedly run it to completion in different configurations? I don't care if the results are reported.

Thanks
17) Message boards : Questions and problems : 7.14.2 does not suspend running tasks when CPU usage limit is lowered (Message 88746)
Posted 6 Nov 2018 by Paul Schauble
Post:
When I'm not actively using my computer I let BOINC use 75% (6 of 8 cores). When I want to actively use it, I go into the BOINC menu and lower the CPU usage limit to 50%.

All previous versions would suspend tasks to bring CPU usage below the new limit. 7.14.2 does not do this; it does not suspend anything. It does, sort of, honor the new limit in that if I manually suspend a task, BOINC will not start a replacement.

I think the old way of automatically suspending tasks is correct.

++PLS
18) Message boards : BOINC client : How does BOINC get WU CPU time (Message 88740)
Posted 4 Nov 2018 by Paul Schauble
Post:
First, I'm referring to Windows here.

So, if I understand correctly

1. If BOINC runs the app doing the work directly, it depends on the worker app to report CPU time. BOINC has a process handle to the worker process and could get CPU time from this handle, but doesn't do that.

2. If a wrapper is used, or if the worker process starts child worker processes, perhaps for multiple levels, then BOINC doesn't have a process handle on the children and depends on all intermediate processes to report CPU time up the ladder.

3. BOINC and/or the wrapper could cover all these cases by creating a Windows JOB object before it starts the first worker process. BOINC would create the first worker process/wrapper and add it to the JOB. The JOB object then tracks all child processes created subsequently and adds up the CPU time used by all of them. BOINC then reads the real total CPU time from the JOB object after the wrapper returns and gets everything. Of course, this is Windows-specific code, but I'm sure it's not the only Windows-specific code in BOINC.
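The difference between items 2 and 3 can be modeled in a few lines of Python. This is a platform-independent toy model, not the Windows code itself: `Proc`, `reported_up_the_ladder`, `job_total`, the process names, and the CPU figures are all hypothetical. On Windows, a real version would presumably use CreateJobObject, AssignProcessToJobObject, and QueryInformationJobObject with JobObjectBasicAccountingInformation.

```python
from dataclasses import dataclass, field


@dataclass
class Proc:
    name: str
    cpu_seconds: float              # CPU time used by this process alone
    reports_children: bool = True   # does it pass child CPU time upward?
    children: list = field(default_factory=list)


def reported_up_the_ladder(proc):
    """CPU time seen at the top when each level must forward its children's time."""
    total = proc.cpu_seconds
    if proc.reports_children:
        total += sum(reported_up_the_ladder(c) for c in proc.children)
    return total


def job_total(proc):
    """CPU time a job-style container would account: every descendant counts."""
    return proc.cpu_seconds + sum(job_total(c) for c in proc.children)


# Made-up tree: wrapper -> launcher (forwards nothing) -> two workers.
tree = Proc("wrapper", 1.0, children=[
    Proc("launcher", 2.0, reports_children=False, children=[
        Proc("worker_1", 100.0),
        Proc("worker_2", 150.0),
    ]),
])

print(reported_up_the_ladder(tree))
print(job_total(tree))
```

One intermediate process that forwards nothing is enough to lose almost all of the CPU time in the first scheme, while the job-style total does not depend on any process cooperating.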

If all this is correct, would you like me to do the code for item 3? Do you think the JOB handling should be in BOINC, where it's needed only once, or in the wrapper, where each wrapper needs it?

++PLS
19) Message boards : BOINC client : How does BOINC get WU CPU time (Message 88733)
Posted 4 Nov 2018 by Paul Schauble
Post:
Since a lot of projects have BOINC start a wrapper, which runs another process, which runs another process, which in some cases starts many other processes, how does BOINC get the total CPU time?

Could someone please give me a short summary, perhaps with pointers to the source code?

Thanks,
++PLS
20) Message boards : Questions and problems : Users of grcpool beware! (Message 86519)
Posted 10 Jun 2018 by Paul Schauble
Post:
BeemerBiker:

Are those messages that you show there between BOINC and grcpool? If so, what did you do to get them?

Thanks



Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.