Posts by Joseph Stateson

1) Message boards : Questions and problems : Only 1 out of seven projects reporting work (Message 113127)
Posted 17 Nov 2023 by Profile Joseph Stateson
Post:
I am attached to seven projects:
GPUGrid.net
Rosetta@home

This is not what I signed up for. I want to see work in the other projects. They all have work available.

What is wrong here? What do I need to do?


I cannot remember the last time I got work from them.

https://boinc.bakerlab.org/rosetta/server_status.php
https://www.gpugrid.net/server_status.php

Nothing to send

Sign up to yoyo they have plenty

WCG comes and goes.

Climate is a waste of energy: if CO2 levels drop any lower, most life could end. We are at about 417 and should be at 10x that to turn everything green.
2) Message boards : The Lounge : Windows 11 23H2? (Message 113124)
Posted 17 Nov 2023 by Profile Joseph Stateson
Post:
I'm just curious -- has anyone upgraded their system to Win 11 23H2, and did they have any problems with it?


I upgraded 6 systems using the Rufus trick and 23H2. None of the systems were "blessed" by Microsoft. All were in-place upgrades.


Lenovo S20: from 10->23h2
The USB-3 PCIe adapter that was working in 10 no longer woke up from sleep. Solved by going to its adapter power management and not allowing it to be turned off.

MSI x229m-a: 22h2 -> 23h2
Search no longer worked: Solved by "restart the Start Menu Experience Host"
"iqvw64e.sys" was banned by 23h2 so I just deleted it when I could not find any device using it.

No Problem:
HP Z400
Dell 435mt
EVGA SLI 131-GT-e767
MSI Z270 SLI Plus


All systems working fine. On the other hand, I replaced the broken motor for my car passenger window and it goes down instead of up and up instead of down and I am waiting to hear from the vendor if they sold me the wrong side door motor.
3) Message boards : Questions and problems : 7.24.1: install dialog box blocks win11 taskbar search input (Message 112729)
Posted 21 Sep 2023 by Profile Joseph Stateson
Post:
Not sure if this is a Windows 11 22H2 problem or a problem in 7.24.1.

New install, existing user account. When asked for a password, I tried to run a password app by entering the name of the app in the search tab. The search tab was highlighted when I clicked the mouse, but any text I typed went into the password box of the BOINC Manager instead. I had to close the username/password dialog box before I could run my password manager.
4) Message boards : Projects : World Community Grid needs help. can they get any here? (Message 111697)
Posted 1 May 2023 by Profile Joseph Stateson
Post:
A couple of weeks ago WCG did make an announcement that the stats pages would be out of synch for some time as they were having to rebuild a lot of users & computers stats data from scratch.


IMHO, out of sync is not the same as missing devices. I have devices registered before 2019 that are crunching away but not listed. Users are posting their host IDs and the staff is forwarding the IDs for manual entry into the database. That is NFG, and they need help.
5) Message boards : Projects : World Community Grid needs help. can they get any here? (Message 111695)
Posted 1 May 2023 by Profile Joseph Stateson
Post:
I just realized that only 4 of my 9 systems show up as active over at WCG. The missing 5 systems have been crunching away for several weeks with no problems indicated.

Their forum is full of users reporting the same problem. Some users have 1000s of missing devices. Probably that charity engine gang is worried about the gridcoins they are losing.

There are several threads, all too long for me to go through. I was unable to find an official statement about the problem and what to do.
6) Message boards : Projects : News on Project Outages (Message 111685)
Posted 29 Apr 2023 by Profile Joseph Stateson
Post:
Milkyway@home is down. AGAIN.


Yea, they seem to have problems and when they do come back up there is no mention of what went wrong. My AMD boards are only useful on Milkyway. Their superior DP float does not provide any benefit on Einstein. Asteroids had no OpenCL apps and Numberfield's OpenCL seems to run only on newer AMD cards with UEFI BIOS. When I asked about the problem, the admin did not know how the controlling XML was generated, let alone how to fix it... There was a bad thunderstorm with heavy hail last night and I shut them all off. I may leave them off the whole summer. My nVidia boards run fine on all projects. I really miss SETI.
7) Message boards : BOINC client : "SSL Connect Error" BOINC 7.20.2 for Windows 10 22H2 (Message 111581)
Posted 15 Apr 2023 by Profile Joseph Stateson
Post:
However, it would be nice if BOINC updated ALL_PROJECTS_LIST.XML to have the correct attachment url.
David normally does this, but it requires that someone, project admin, trusted users etc., tell him about it. It can't do it on its own. Unless we add AutoGPT to BOINC and make BOINC sentient. 😂


I recall this was discussed some time ago. WUProp@home is not in the list because they did not bother to ask to have it put in.

As to why 7.16 is working I am guessing it does not rely on windows to handle the certificate whereas 7.20 does and an expired cert somewhere triggers an error in 22h2
8) Message boards : BOINC client : "SSL Connect Error" BOINC 7.20.2 for Windows 10 22H2 (Message 111579)
Posted 15 Apr 2023 by Profile Joseph Stateson
Post:
But this is the message line that you only see when adding http_debug, otherwise it's silent. It's not the same one as the one where you add the project and BOINC then first (or second, or 100th) talks to the scheduler which checks what project URL is used. This URL is used by Curl, I think.


You are probably correct. I did a "find" and https shows up on 22h2 systems that are attached correctly using http.
However, it would be nice if BOINC updated ALL_PROJECTS_LIST.XML to have the correct attachment url.

This is what I am guessing is happening and IANE on networks

http and https difference is that ssl and tls certificates are checked when using https

A connection is being made to wcg but it goes through a driver that has a problem. For example, when I enabled core isolation (an hour ago on a new system) I got an iqvw64e.sys driver error

https://answers.microsoft.com/en-us/windows/forum/all/iqvw64esys-a-driver-cannot-load-on-this-device/dcc336f6-8815-4346-952b-fff97fe81523?page=2

That is a VPN driver, but I do not use VPN and never have and there is no problem with boinc on that system. I had to uninstall intel pro networking client to remove the driver and allow core isolation to be enabled.
Possibly a connection to WCG goes through a problem driver on SoCrunchy's system. A good check is to enable core isolation and see if any network driver has a problem.

I never got that app to work on my Dell system in 22h2. Dell let the certificate expire and was not interested in updating it.
9) Message boards : BOINC client : "SSL Connect Error" BOINC 7.20.2 for Windows 10 22H2 (Message 111577)
Posted 15 Apr 2023 by Profile Joseph Stateson
Post:
I have 100's of these messages, they only go away after attaching using http

JYSArea51

86	World Community Grid	4/14/2023 10:37:06 PM	This project seems to have changed its URL.  When convenient, remove the project, then add http://www.worldcommunitygrid.org/	
87	World Community Grid	4/14/2023 11:12:06 PM	This project seems to have changed its URL.  When convenient, remove the project, then add http://www.worldcommunitygrid.org/	
88	World Community Grid	4/14/2023 11:14:09 PM	This project seems to have changed its URL.  When convenient, remove the project, then add http://www.worldcommunitygrid.org/	
89	World Community Grid	4/14/2023 11:16:13 PM	This project seems to have changed its URL.  When convenient, remove the project, then add http://www.worldcommunitygrid.org/	


EDIT: I should have mentioned that I have several 22H2 systems, Win 10 and 11, that show http as well as https (like you), and I do not have any problem connecting to worldcommunitygrid.
10) Message boards : BOINC client : "SSL Connect Error" BOINC 7.20.2 for Windows 10 22H2 (Message 111574)
Posted 15 Apr 2023 by Profile Joseph Stateson
Post:
BOINC 7.20.2 on Windows 7 Pro SP1 64-bit works fine:

4/15/2023 8:53:43 AM | World Community Grid | update requested by user
4/15/2023 8:53:43 AM |  | [http] HTTP_OP::init_get(): https://www.worldcommunitygrid.org/viewNoticesRSSFeed.action?userIdHash=blah



I just noticed you are using https

Detach and use http://www.worldcommunity
The list in the BOINC folder is incorrect and shows https; it should be http

maybe this makes a difference in 22h2
just guessing

edit: if "global_prefs:" has worldcommunity make sure it is http and not https
11) Message boards : BOINC client : "SSL Connect Error" BOINC 7.20.2 for Windows 10 22H2 (Message 111573)
Posted 15 Apr 2023 by Profile Joseph Stateson
Post:
I had a similar problem upgrading to 22H2: a required driver had an expired certificate and "core isolation" declared it explicitly revoked. I was unable to run the app except on a pre-22H2 system.

If you have core isolation enabled, then disable it. If it is disabled, then enable it.

Look in the (boinc) event log and see if the error message changes when you make changes in core isolation. windows event log may show more info so you might check that.

If it is not possible to enable core isolation, then click on "details" and see why it cannot be enabled. I had about 6 problem drivers that I had to update or remove to enable core isolation.

edit: changed protection to isolation.
12) Message boards : Projects : News on Project Outages (Message 111477)
Posted 1 Apr 2023 by Profile Joseph Stateson
Post:
Things have gone downhill ever since SETI folded up.
I went over to BoincStats and made the following notes

150 Retired projects
43 active projects (unaccountably including collatz!)
24 projects that have at least 1 work unit of data.

Out of those 24 projects with data only 9 have enough to be candidates for the next crunching contest.
Out of those 9 there are 2 that currently have data and statistics collection problems and another 3 that have their servers hosted in Russia.


Does anyone have or know of a chart of BOINC project growth / loss?
I am guessing a bell curve would show SETI at the top.
If BOINC was a company there would be shareholders looking for new management.

My 0.02c

Sorry if I offended anyone.
Feel free to delete this post or move it somewhere where management won't see it.
13) Message boards : Projects : certificate problem at asteroids at home (Message 111267)
Posted 12 Mar 2023 by Profile Joseph Stateson
Post:
It's all good now.
https://asteroidsathome.net/boinc/


Took 6 hours for all my completed tasks to finally upload. I had a huge queue when the cert expired.
Problem was exacerbated by the WCG tasks that cannot upload yet. BOINC tried and failed each upload of WCG before going on to Asteroids.
I should have requested a restart of the transfers for just Asteroids.

I suspect all my WCG have passed their deadline.
14) Message boards : Projects : certificate problem at asteroids at home (Message 111257)
Posted 11 Mar 2023 by Profile Joseph Stateson
Post:
Cannot easily log in. Seems their certificate expired

Unable to connect

An error occurred during a connection to asteroidsathome.com.

    The site could be temporarily unavailable or too busy. Try again in a few moments.
    If you are unable to load any pages, check your computer’s network connection.
    If your computer or network is protected by a firewall or proxy, make sure that Firefox is permitted to access the web.



Neither windows nor ubuntu

I can log in using Edge but have to confirm I know it is dangerous.

Websites prove their identity via certificates, which are valid for a set time period. The certificate for asteroidsathome.net expired on 3/9/2023.
 
Error code: SEC_ERROR_EXPIRED_CERTIFICATE
15) Message boards : Questions and problems : Not possible to recover from "unable to parse account file" (Message 111007)
Posted 6 Feb 2023 by Profile Joseph Stateson
Post:
I had a video card overheat and freeze the system. The card problem was fixed but after booting up the message log shows the unable to parse message for both the milkyway account and statistics xml files.

In addition, there were 900 messages about "project not in state file". I assume one message for each of the 900 jobs waiting to run. The Milkyway job "txt" file was 43 mb in size.

Both those xml files were empty although their size was not 0. Since the account files for milkyway are identical on all my systems, I copied over both an account file and a statistics file to replace the corrupted files.

Although this "worked", all 900 jobs were marked as "abandoned" on the server and another 900 were downloaded. By "worked" I mean I did not have to delete the project and reattach it to fix the corrupted files

There must have been more files corrupted other than the account and statistics.
I am not sure why the account file was being rewritten since it never changes unless the project changes their html. It seems to me the account file should be opened in read-only mode. I am guessing it was opened in read-write mode and was still open when the graphics board hung. The account and statistics files for wuprop were also corrupted. I did not bother to copy those files over and the project disappeared completely after rebooting. Unlike milkyway, I will have to add that project back in if I want it.

Since the 900 files were "abandoned" I assume they cannot be re-sent as a "ghost task"
I am tempted to have the OS set the attribute of the account file to be read-only to prevent it from being corrupted.
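A minimal sketch of that read-only idea, in Python. The helper names are hypothetical, and whether the BOINC client tolerates a read-only account file is an untested assumption: the client may legitimately need to rewrite it.

```python
import os
import stat

def make_read_only(path):
    """Clear every write bit on `path`, leaving the read bits untouched."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))

def is_read_only(path):
    """True if the owner write bit is cleared (the read-only flag on Windows)."""
    return not (os.stat(path).st_mode & stat.S_IWUSR)
```

It would be called with the full path of the account file in the BOINC data folder, with the client stopped first.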
16) Message boards : Projects : How can "Advances in GPUs make the BOINC program redundant" ? (Message 110967)
Posted 26 Jan 2023 by Profile Joseph Stateson
Post:
A quote from the POEM@Home Wikipedia article

POEM@Home was a volunteer computing project hosted by the Karlsruhe Institute of Technology and running on the Berkeley Open Infrastructure for Network Computing (BOINC) software platform. It modeled protein folding using Anfinsen's dogma. POEM@Home was started in 2007 and, due to advances using GPUs that rendered the BOINC program redundant, concluded in October 2016.[1][2] The POEM@home applications were proprietary.


I recall that POEM did protein folding. Perhaps the "Folding@home" project made POEM redundant?

Perhaps one of the Wikipedia editors here at BOINC can correct their conclusions.
17) Message boards : GPUs : OpenCL missing after Nvidia re-install: fixed a really strange way (Message 110921)
Posted 15 Jan 2023 by Profile Joseph Stateson
Post:
Hope this does not happen to anyone else. I am posting my solution in case someone needs it.

background:
I was debugging a problem with a video board and in the process lost OpenCL. Windows failed to boot and tried a repair.
I got Windows working after removing the defective video board. Obviously, this caused problems.

BOINC was unable to process any work units requiring OpenCl. The log shows that OpenCL was available for the CPU and the Intel GPU but not for the Nvidia boards. However, the log showed that CUDA was still available and had the correct version and date.

Running clinfo.exe showed the exact same thing as Boinc did: OpenCl only for the Intel CPU.

I did a re-install of Nvidia driver 526.86 but that did not fix the OpenCL problem. I then got the latest 528.02 Nvidia driver and did a clean install but that did not fix the problem either. Googling, I read that windows\system32\OpenCL.dll was where all the good stuff is kept. That Nvidia version was dated 2016. I replaced that DLL with the Nvidia OpenCL.dll dated 2022 from another system and rebooted. OpenCL is now working fine.

Unaccountably, an install of latest Nvidia driver did not update that DLL.
18) Message boards : Questions and problems : BOINC calculation on custom event? API? (Message 110919)
Posted 14 Jan 2023 by Profile Joseph Stateson
Post:

At one time I had been sending a text message to myself if there was a problem but google made it difficult to script mail by requiring an app password for gmail.


For what it's worth I finally got it to work using powershell as shown here
19) Message boards : Questions and problems : BOINC calculation on custom event? API? (Message 110918)
Posted 14 Jan 2023 by Profile Joseph Stateson
Post:
Hello there,
is it possible to create a custom event where a script is able to tell BOINC when to start or pause?
For example if it's a cold weather... to start BOINC, but if it's not then it pauses it.

Like is there some sort of API that BOINC is exposing?


As Ian&Steve C mentioned, you can use the BOINC command tool in any script.
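A rough sketch of such a script in Python, using boinccmd's --set_run_mode option (auto or never). The temperature threshold and the function names are made up for illustration, and where the temperature reading comes from is left to you.

```python
import subprocess

def run_mode_command(outdoor_temp_c, threshold_c=10.0, boinccmd="boinccmd"):
    """Build the boinccmd call: compute when it is cold, pause otherwise."""
    mode = "auto" if outdoor_temp_c <= threshold_c else "never"
    return [boinccmd, "--set_run_mode", mode]

def apply_run_mode(outdoor_temp_c):
    """Send the command to the running client.

    Assumes boinccmd is on the PATH and can authenticate to the client,
    for example by being run from the BOINC data folder.
    """
    subprocess.run(run_mode_command(outdoor_temp_c), check=True)
```

Calling apply_run_mode() from a scheduled task every few minutes would be one way to use it.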


However, if you are only concerned about overheating the CPU or GPU then
If running Windows, there is another option: you might consider using eFMer's tthrottle

It slows down "pauses" the CPU or GPU when it detects too high a temperature.
It does not control fans like MSI's Afterburner but is a good app to use as a failsafe.

Your script can also be run under eFMer's boinctasks app. For example, you can set a boinctask rule to run your script when the temperature exceeds a certain value.
Boinctasks can supply arguments to identify the project that is running hot and your script can handle the problem. There are also options in Boinctasks to do any of the following

Allow new work 
No more work 
Resume network 
Resume project 
Run program 
Snooze 
Snooze GPU 
Suspend network 
Suspend project 
Suspend task


based on any of the following conditions

Elapsed Time
CPU %
Progress %
Time Left
Progress / min %
Use
Temperature
Status
Wall-clock Time
Connection
Deadline
Time Left Project


For example, you would select "temperature" and a value and then "Run Program" and enter the path to your script.

At one time I had been sending a text message to myself if there was a problem but google made it difficult to script mail by requiring an app password for gmail.
20) Message boards : Questions and problems : Boinc.exe terminates at start on W10 (Message 110459)
Posted 15 Nov 2022 by Profile Joseph Stateson
Post:
It seems to work sometimes, actually now it is working, like every 3rd boot or something like that.
The repair installation: downloaded the latest client, opened up the installer, selected repair.

The output from the client alone, was in the first post, no manager running at that time.

Running from an elevated command window should tell me if there are any issues with rights, i.e. if I can run it from an elevated window but not from a standard window.


When looking in the task manager for an app, be sure to look under "users" not just processes. It is possible for BOINC to be missing from processes but be running just fine under "users"

When you installed BOINC I assume you installed for all users and did not do a service install

As RobSmith mentioned you should not run BOINC in elevated mode especially if the manager is running in standard mode.

You might want to verify that all 3 BOINC apps are NOT running in elevated mode.


Looking at your original post I spotted a few problem projects
- setiathome is not operational and should be suspended or removed
- cosmology at home no longer had a valid certificate. You might consider attaching to "universe" instead
- einstein: I recommend you detach from einstein and then use the elevated command prompt to make sure that all einstein files have been deleted from both "data" and "data\project"

If you continue having problems I recommend you use a free tool such as revo uninstaller to uninstall boinc and revo's scan capability to find and delete all references. I would also use elevated mode to verify boinc is gone from the D drive.
21) Message boards : Questions and problems : Boinc.exe terminates at start on W10 (Message 110444)
Posted 14 Nov 2022 by Profile Joseph Stateson
Post:
Having problem running BOINC, since it terminates when it starts.
Starting boinc from admin terminal gives this output:
How do I solve this issue?

14-Nov-2022 17:02:55 [] Couldn't parse account file account_einstein.phys.uwm.edu.xml
14-Nov-2022 17:02:55 [---] Couldn't parse statistics_einstein.phys.uwm.edu.xml


Possibly two problems

BOINC, or what remains of BOINC, may reside in memory from the last time it was run due to a crash.

BOINC may have established a connection to port 31416 and that connection was not properly closed.
You can run the command netstat from an administrator prompt to determine if a port is open or closed. For example


    C:\Windows\system32>netstat -aon | find "31416"
    TCP 0.0.0.0:31416 0.0.0.0:0 LISTENING 1236
    TCP 127.0.0.1:31416 127.0.0.1:49870 ESTABLISHED 1236
    TCP 127.0.0.1:49870 127.0.0.1:31416 ESTABLISHED 7420
    TCP 127.0.0.1:55649 127.0.0.1:31416 TIME_WAIT 0
    TCP 127.0.0.1:55663 127.0.0.1:31416 TIME_WAIT 0
    TCP 192.168.1.184:31416 192.168.1.241:50430 ESTABLISHED 1236



If the port is established, then it cannot be bound (listened) to by another process.
AFAIK only BOINC uses port 31416

If you are to run BOINC from the command line, make sure a copy is not already running and that port 31416 is NOT being listened to already.
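The same check can be scripted. This is only a sketch in Python: it asks whether anything accepts a TCP connection on the port, which is a looser test than reading the netstat table, and the function name is my own.

```python
import socket

def port_is_listening(port=31416, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise
        return s.connect_ex((host, port)) == 0
```

If port_is_listening() returns True before you start boinc.exe from the command line, a client (or something else) already owns the port.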

The second problem is that the Einstein files are possibly corrupted. Either that or when BOINC crashed it had opened the Einstein files for reading and crashed before those files could be closed.
I recommend you reboot and then bring up the windows event viewer and look under "applications" for problems. If boinc is crashing on startup, then uninstall it and re-install.

I assume when you ran BOINC.EXE from the command line you used something like "boinc.exe --dir d:\boinc\data"

22) Message boards : GPUs : Bad or incompatible GPU? (Message 110423)
Posted 12 Nov 2022 by Profile Joseph Stateson
Post:
Finally got a handle on the problem. I booted Windows 10 and the system worked fine.
The problem was the 22H2 update to Windows 11.
Googling "22H2 slowdown" brought up a mother lode of complaints.

I put the Win11 drive back in and tried every suggestion that Google dished up but was stuck at under 50% utilization no matter what option I enabled or disabled. It has been 10 days since 22H2 went in so there is no easy way to get back to when the system was working. All Windows 11 recovery points are after 22H2. Going to beg on answers.microsoft.com. This system is old but was qualified for Windows 11.
23) Message boards : GPUs : Bad or incompatible GPU? (Message 110406)
Posted 10 Nov 2022 by Profile Joseph Stateson
Post:
Problem is the CPU / Motherboard or recent BIOS upgrade / Windows 11 feature update that all happened over a 2 day period

The problem was not the GPU.
24) Message boards : GPUs : Bad or incompatible GPU? (Message 110389)
Posted 9 Nov 2022 by Profile Joseph Stateson
Post:
Think I have a bad or incompatible RTX-3070.
RTX-2080 went dead in a Win11 gen3 express system (Area51) and Gigabyte replaced it under warranty with RTX-3070. They did not have any more 2080
I observed the following problem briefly discussed over at Boinctasks

All my CPU tasks are running at less than 50% utilization.
The top 6 Universe tasks were run on an i9-7900x at 3.3 GHz.
The bottom Universe task ran on an old Xeon x5650 at 2.67 GHz and completed 2x as fast as on the i9 CPU.
All CPUs were allowed 100%. There is no thermal throttling. On the previous setup with the RTX-2080 the CPU was always 100% busy.
Either the graphics board is defective or the gen4 express is causing problems in a gen3 system.
The GPU tasks run fine; the only problem is the CPU utilization.

25) Message boards : Projects : Cosmology's certificate has expired again (Message 110385)
Posted 9 Nov 2022 by Profile Joseph Stateson
Post:
http works fine but not https
https://www.cosmologyathome.org/

Existing systems with that project work fine but I cannot log into my account or view results as https is required

I have read that this has happened several times over the years.
26) Message boards : Questions and problems : Install of client failed to obtain the service script (Message 110368)
Posted 8 Nov 2022 by Profile Joseph Stateson
Post:
OK, found problem (there was none)
Some Ubuntu releases put the script into
/usr/lib/systemd/system
others use
/lib/systemd/system

when the app did not run, I looked at /lib and it was not there and I assumed that was the problem.
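A small Python sketch that checks both locations before concluding the script is missing. The unit file name boinc-client.service is my assumption about what the package installs; adjust it if yours differs.

```python
from pathlib import Path

# The two directories mentioned above
UNIT_DIRS = ("/lib/systemd/system", "/usr/lib/systemd/system")

def find_unit(name="boinc-client.service", dirs=UNIT_DIRS):
    """Return the first existing path for the unit file, or None."""
    for d in dirs:
        candidate = Path(d) / name
        if candidate.is_file():
            return candidate
    return None
```

A result of None from find_unit() would mean the file is in neither place, which is the case I wrongly assumed I had.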
27) Message boards : Questions and problems : Install of client failed to obtain the service script (Message 110359)
Posted 7 Nov 2022 by Profile Joseph Stateson
Post:
sudo add-apt-repository ppa:costamagnagianfranco/boinc
followed by install of boinc-client failed to install the service script at /lib/systemd/system

This occurred after a clean install of 18.04.6
The script was installed just fine when I tried 22.04.1
I assume I can no longer use 18 and will have to find out why my RX570 drivers failed to install in 22.04
Is there any reason why the service script did not get installed using the franco repository?
28) Message boards : Questions and problems : Illegal opcode on new install of 22.04 (Message 110357)
Posted 7 Nov 2022 by Profile Joseph Stateson
Post:
Too late to edit my post.

I assume the problem of the illegal opcode is related to the android version being placed into the distribution as discussed over at Github
No longer using 22.04 as I had problems with AMD drivers for OpenCL in addition to BOINC permission problems. However, the BOINC version from
sudo add-apt-repository ppa:costamagnagianfranco/boinc
worked fine
29) Message boards : Questions and problems : Illegal opcode on new install of 22.04 (Message 110346)
Posted 6 Nov 2022 by Profile Joseph Stateson
Post:
Consistent every time I run the start script or even ask for version

    kern.log:Nov 6 17:05:59 dual-linux kernel: [ 1840.265297] traps: boinc[3522] trap invalid opcode ip:55de226c06a8 sp:7ffe42695620 error:0 in boinc[55de226b6000+c3000]



Somewhat similar discussion here
https://forums.linuxmint.com/viewtopic.php?t=379747


Going to try another repository costamagnagianfranco

[edit] OK, can get version info after getting that better repository. Probably working

30) Message boards : Questions and problems : Duplicate CPID (Message 106583)
Posted 28 Dec 2021 by Profile Joseph Stateson
Post:
For the machine you need to change the CPID to the correct one, just edit its client_state.xml file and change the CPID in the <external_cpid>{CPID}</external_cpid> field and save the file.
The external CPID field is the one that the stats sites and GridPool pay attention to.


I've just checked one of the offending machines, and that field is blank. Can I just put anything in there? I suppose I could make something that looks similar to what they should be.

The file has:
<cross_project_id>d8ffd2cc24b930b1c79b0041a5f21fc5</cross_project_id>
<external_cpid></external_cpid>

But grcpool.com shows that machine as dd731609735947bc6aed4d89a2205adc


I just went through this as, unaccountably, einstein had not put a value in the external_cpid field

The value in there is the one for all your systems, the one that starts with 53...

https://www.gridcoinstats.eu/cpid/53ed9d9b7d568cb7eb1ccc25a7dc4492

Hopefully you do not have different external CPIDs which is a PITA to fix.

[edit] I do not see any of your system addresses. Either you have another account or the systems have not propagated their CPID into the eu stats table. if you put in my name you will see 5 active addresses each corresponding to a system CPID and under that the four main projects I work on. You should have a single external CPID good for all your work.
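To spot blank fields like the one quoted above without opening the file by hand, something like this Python sketch could be run against the text of client_state.xml. It only looks at the external_cpid tag shown in the quote; the function names are mine.

```python
import re

def external_cpids(text):
    """Return every <external_cpid> value found, including empty ones."""
    return re.findall(r"<external_cpid>(.*?)</external_cpid>", text)

def blank_cpid_count(text):
    """Count <external_cpid> fields that are empty or whitespace only."""
    return sum(1 for value in external_cpids(text) if not value.strip())
```

Any count above zero points at a project entry that still needs its external CPID filled in.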
31) Message boards : Questions and problems : Boinc Manager Keep Stop Workinig (Message 106550)
Posted 27 Dec 2021 by Profile Joseph Stateson
Post:
Look in the windows event viewer at both Applications and System for warnings or errors.
Often the logs are so large it is hard to find things. I would delete the logs, reboot then
try to add a project then go back to the event log and look for error messages near
the time you attempted to add a project. I suspect some type of system problem.
32) Message boards : Questions and problems : Windows 11 ssh and remote desktop still not work with GPU? (Message 106546)
Posted 27 Dec 2021 by Profile Joseph Stateson
Post:

I'm dealing with that right now. Is there any difference between the free version of VNC and the professional version of VNC in terms of security? I can't find anything on their website that lays out the differences.


Do not use the pro version, sorry.

I did pay for the Splashtop remote subscription so I can use my cell phone to access my systems but I am using the "5 free one" version even on Splashtop.
33) Message boards : Questions and problems : Getting too may WCG tasks on systems that had been working ok (Message 106541)
Posted 26 Dec 2021 by Profile Joseph Stateson
Post:
Follow-up on this problem. All my WCG systems have stabilized after a week of 24/7 and I got a "solution" to setting Priority to 0 with that max_concurrent app option.

Recap: Setting Priority to 0 normally means the queue never exceeds 1 work unit even if each core is working on a WCG task.

During initial configuration of BOINC it is possible a lot of unwanted work units will download but eventually the system will get to where a new download occurs only when there are not other tasks of the same type (CPU) in the queue. This is a different problem.

When using "max_concurrent" in WCG's app_config file, I was able to demonstrate that Priority of "0" is ignored if the number of cores allocated to the system is greater than the value of that max_concurrent parameter.


On my test system, I left # cores at 11 with max_concurrent at 8 and the number of WCG tasks increased to several hundred. However, at no time were there more waiting work units than could finish by the deadline. As long as I left the system running 24/7 they would all finish within the deadline. When I set the number of cores down to 8, the same value as max_concurrent, there were no more downloads of work units and eventually the queue got down to 0, at which time a single download occurred. This is the expected behavior for Priority of 0.

Probably not many users have priority set to 0. Should this problem be reported as an issue over at github? Can someone else verify this behavior?

Thanks for looking!
34) Message boards : Questions and problems : Inconsistency in "Project List" (Message 106456)
Posted 18 Dec 2021 by Profile Joseph Stateson
Post:
BM is independent of BOINC so you need to address this question with them, bearing in mind that some projects do not share all their information with third parties for any one of a great number of reasons/excuses.


I am confused. I thought the Boinc Manager (BM) was part of the Boinc project since there is only one github "boinc" project.

[EDIT] You are probably thinking of "BAM!" a 3rd party. I mean the Boinc Manager that comes with Berkeley's boinc download.
35) Message boards : Questions and problems : Inconsistency in "Project List" (Message 106454)
Posted 18 Dec 2021 by Profile Joseph Stateson
Post:
I am working on a pet project where I analyze the job_log files. Some are old retired projects, others active. The job log is named in a way that partially identifies the actual project and I wanted to look up the info and get the "real name" of the project.

BM uses "get_all_project_list" an RPC call that I do not want to replicate plus it does not get everything.

In the boinc folder there is a file "all_projects_list.xml" but it is missing, for example, Asteroids@home.
I only noticed that because it starts with "A" and was one of the first I looked for. It shows up in the manager when "add project" is selected so I am not sure why it is missing from that all_projects_list file.

If BM creates that "all_project_list.xml" why is it missing that Asteroids project?
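For checking which names are actually in that file, here is a Python sketch. It assumes the file is a list of project elements, each with a name and a url child; I have not confirmed that layout against every client version, so treat the tag names as assumptions.

```python
import xml.etree.ElementTree as ET

def project_urls(xml_text):
    """Map project name -> url for every <project> element in the text."""
    root = ET.fromstring(xml_text)
    return {
        p.findtext("name", "").strip(): p.findtext("url", "").strip()
        for p in root.iter("project")
    }
```

Reading all_projects_list.xml into a string and testing whether "Asteroids@home" is a key in the result would confirm whether the entry is really absent or just named differently.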
36) Message boards : Questions and problems : Inconsistency in project url for WCG (Message 106453)
Posted 18 Dec 2021 by Profile Joseph Stateson
Post:
When using BM to add World Community Grid the prefix https is used.
Unaccountably, WCG complains that it needs to be "http" and I get 1000's of error messages that show up like this:
lenovos20

56	World Community Grid	12/17/2021 2:02:52 PM	This project seems to have changed its URL.  When convenient, remove the project, then add http://www.worldcommunitygrid.org/	

That message originates in the client and is not from the project. It is generated when the reply url is not the same as the current url.

I scanned some of the xml in ProgramData\boinc in my lenovo system and found the following
--------- SCHED_REPLY_WWW.WORLDCOMMUNITYGRID.ORG.XML
<master_url>http://www.worldcommunitygrid.org/</master_url>
      <url>https://www.worldcommunitygrid.org/research/viewAllProjects.do</url>
---------- SCHED_REQUEST_WWW.WORLDCOMMUNITYGRID.ORG.XML
   <source_project>https://www.worldcommunitygrid.org/</source_project>
C:\ProgramData\BOINC>find /i "http:" *world*.xml
---------- ACCOUNT_WWW.WORLDCOMMUNITYGRID.ORG.XML
---------- MASTER_WWW.WORLDCOMMUNITYGRID.ORG.XML
---------- SCHED_REPLY_WWW.WORLDCOMMUNITYGRID.ORG.XML
<master_url>http://www.worldcommunitygrid.org/</master_url>
---------- SCHED_REQUEST_WWW.WORLDCOMMUNITYGRID.ORG.XML
---------- STATISTICS_WWW.WORLDCOMMUNITYGRID.ORG.XML


On the systems where I did the detach and re-attach I found the following differences
D:\ProgramData\Boinc>find /i "http:" *world*.xml

---------- ACCOUNT_WWW.WORLDCOMMUNITYGRID.ORG.XML
    <master_url>http://www.worldcommunitygrid.org/</master_url>

---------- MASTER_WWW.WORLDCOMMUNITYGRID.ORG.XML

---------- SCHED_REPLY_WWW.WORLDCOMMUNITYGRID.ORG.XML
<master_url>http://www.worldcommunitygrid.org/</master_url>

---------- SCHED_REQUEST_WWW.WORLDCOMMUNITYGRID.ORG.XML
   <source_project>http://www.worldcommunitygrid.org/</source_project>
    <source_project>http://www.worldcommunitygrid.org/</source_project>

---------- STATISTICS_WWW.WORLDCOMMUNITYGRID.ORG.XML
    <master_url>http://www.worldcommunitygrid.org/</master_url>


So, after detaching and re-attaching, all the request, reply, and sched XML files are consistently "http". I have had to do this on every system, as that error message is printed for every transaction, which makes the log file badly cluttered. What is strange is that client_state.xml uses https and does not generate any problems and, for that matter, not making the HTTP change has only a cosmetic effect: a lot of error messages.
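The manual "find" comparison above could be scripted along these lines. This is only a sketch: the tag names are the two that show up in the find output, the function names are made up, and the glob pattern is case-sensitive on Linux, so it may need adjusting for upper-case file names.

```python
import re
from pathlib import Path

# Only the two tags seen in the find output above are checked.
URL_TAG = re.compile(
    r"<(master_url|source_project)>\s*(https?)://([^<\s]+)", re.IGNORECASE
)


def url_schemes(data_dir, pattern="*world*.xml"):
    """Map each matching file name to the set of (tag, scheme) pairs it contains."""
    found = {}
    for path in sorted(Path(data_dir).glob(pattern)):
        text = path.read_text(errors="replace")
        pairs = {
            (m.group(1).lower(), m.group(2).lower())
            for m in URL_TAG.finditer(text)
        }
        if pairs:
            found[path.name] = pairs
    return found


def mixed_schemes(found):
    """True when both http and https appear across the scanned files."""
    schemes = {scheme for pairs in found.values() for _, scheme in pairs}
    return len(schemes) > 1
```

Running url_schemes on the BOINC data directory before and after the detach and re-attach should show the same http/https split as the listings above.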

I assume the fix to this is to have WCG use "https" in their reply.
Alternately, it would be nice if once the user is notified there are no further error messages from the same project.
For that matter there should be no back-to-back identical error messages.

Observation: If WCG got the "request" and "replied" then why is there a problem? If they cannot reply because the url is wrong there is obviously a real problem.
37) Message boards : Questions and problems : missing project in BM project list (Message 106452)
Posted 18 Dec 2021 by Profile Joseph Stateson
Post:
The project "WUProp@home" is missing from BM's "add project"
It is listed here https://boinc.berkeley.edu/wiki/Project_list
How does one go about getting a project added so it shows up when the manager does an RPC "get_all_project_list" call?

There is a contact number at this location for adding https://boinc.berkeley.edu/projects.php but I suspect that does not get onto the BM RPC call database and someone at the project needs to make the request in any event.
38) Message boards : Questions and problems : Getting too may WCG tasks on systems that had been working ok (Message 106446)
Posted 16 Dec 2021 by Profile Joseph Stateson
Post:


I find it unlikely that WCG would issue 1600 tasks in response to a single request: most projects set a lower limit in their server's feeder configuration (100 or 200).

Even without the current log, you can still track the history.
* Under Windows, in files stdoutdae.txt and stdoutdae.old in the data folder. You can configure those to retain any size you like.
* In the task list (either in the BOINC Manager, or on the project website), by inspecting the deadlines of the allocated tasks. WCG requests a delay of two minutes between fetches: you would be able to see a discontinuity of 2+ minutes between batches if multiple fetches were involved.


[edit-2] All three systems used app_config to limit the number of concurrent WCG tasks. I replaced boinc.exe with that version you posted. Maybe this will fix the problem?
Yes, that's exactly what the #4592 patch was designed to fix. That's why it's called "client: fix work-fetch logic when max concurrent limits are used". The problem makes itself apparent by causing multiple, repeated, limitless, work fetch requests. Which is why I keep asking if multiple, repeated, limitless, work fetch requests are visible in your logs.


I think the only exception to large download limit is that "lost task" download but my tasks were not lost, they were aborted due to no possibility of finishing.

An observation that might be a clue:
One of my win10 + NVidia systems has app_config with max concurrent of 9 but does not and never had a problem with too many WCG downloads. However, I set max number of cores to 9 on that system and it also runs one Einstein. I am guessing the max_concurrent is not used as the # of cores limit takes precedence in the fetch algorithm???
My linux system uses max cores to limit wcg and not app_config and it does not and never had a problem.
39) Message boards : Questions and problems : Getting too may WCG tasks on systems that had been working ok (Message 106445)
Posted 16 Dec 2021 by Profile Joseph Stateson
Post:
Just checked my Lenovo again. More tasks have downloaded but the deadline has not changed. There are 135 tasks waiting. At 4 hours per task and 8 cores that is 68 hours of work and is still within the deadline of 12/23. However, there should have been no downloads with project priority of 0. Perhaps that feature (the "0") is not a "client" specification anymore if it ever was. I have been using it as a fallback project so if Milkyway runs dry (like just happened recently) then Einstein gets to run but as soon as one Einstein finishes Milkyway can take over since it is 100% and Einstein is %0. AFAICT WCG is the only project where the "0" has a problem.

I do not want to babysit WCG. If it wants to download 1700 apps I do not want to crunch apps that will not be used. I have spotted 100's of their apps as "aborted by project" on my system and have been trying to figure out how to prevent it. Project priority of "0" does not work on some systems and on others it does. As I have been writing this post that WCG app count went from 135 to 145. If I shut the system down for a long weekend about 1/2 will be expired before they even start.
40) Message boards : Questions and problems : Getting too may WCG tasks on systems that had been working ok (Message 106443)
Posted 16 Dec 2021 by Profile Joseph Stateson
Post:
After installing that "max fix" version on 3 systems I got one system that responded after "allow new work"

On the LenovoS20, which had 8 apps running (max concurrent is 8) and no apps waiting, there were two back-to-back downloads that totaled 14 days / 84 work units.
That can actually be done: at 4 hours per task and 8 cores, with deadlines of 12/22 through 12/23, all 84 apps should finish in about 42 hours.
The problem is that NONE should have downloaded with share of 0.
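For anyone checking the numbers, the arithmetic above can be sketched as follows. The 4 hours per task and 8 cores come from this post; the function names and the 144-hour deadline window in the usage note are only placeholders for illustration.

```python
def cpu_days(tasks, hours_per_task):
    """Total CPU time of the queue, expressed in days of single-core work."""
    return tasks * hours_per_task / 24.0


def hours_to_drain(tasks, hours_per_task, cores):
    """Wall-clock hours to finish the queue if every core stays busy 24/7."""
    return tasks * hours_per_task / cores


def fits_deadline(tasks, hours_per_task, cores, hours_until_deadline):
    """True when the whole queue can finish before the deadline."""
    return hours_to_drain(tasks, hours_per_task, cores) <= hours_until_deadline
```

With 84 tasks this gives 14 days of CPU time and 42 hours of wall-clock time on 8 cores. Something like fits_deadline(84, 4, 8, 144) would then confirm the batch is doable, where 144 hours is a made-up window standing in for the real deadline.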

The other two systems I put the "max fix" on had a day of WCG already waiting so I assume that affected the "allow new work" differently and they were not tempted into downloading more stuff.

The net effect is that (1) I am confident my 7.16.3 "special" that contains a coding "mod" for the Milkyway idle problem did not cause the WCG problem. I do plan to update that app eventually.
(2) there is a problem with WCG and/or the client config as some of my systems work perfectly with share=0 on WCG and others do not. I suspect most users do not use share=0 so no complaints.
41) Message boards : Questions and problems : Getting too may WCG tasks on systems that had been working ok (Message 106442)
Posted 16 Dec 2021 by Profile Joseph Stateson
Post:
Oh dear. The resource share you read in a <work_fetch_debug> segment of the event log has NOTHING TO DO with the resource share you set on a project web site.
To a first approximation, a dump indicates a server problem: a trickle indicates a client problem. We need to know where to start looking.


Well, at least I was correct about the 2c.

Hmm - I do not remember problems like this in other projects NOR in WCG before they implemented GPU for that COVID app.
There seems to be no rhyme nor reason to this problem as some systems are not affected:

- 16 core linux with 7.16.3 and 2 AMD boards never has a problem. When one WU is uploaded another is downloaded
- Pair of windows 10 with NVIDIA, likewise no problem. Been running perfectly for a long time. A single download for every upload.

All the new systems I recently built have problems except one

The one with no problem runs win10 and BOINC as a service and WCG is %100 share. 3 cores of 4 are allocated and checking I just saw that 20 tasks are waiting which is OK.

There are 3 systems with problems: two are newly minted win10 and one is my main desktop that I just upgraded to win11. All have a single NVidia and are set for "No New Tasks" on WCG until the problem gets fixed.

I suspect there is a dump of WCG tasks. I did not see the 1600 download all at once as a dump as the event log was too big and got truncated.

On one system (my desktop, share = 0) I watched about 10 days worth download while 20 days worth were waiting to run.
After it stabilized at 30 days worth I increased the core count and watched another 10 days worth download.
I had a limit of only 6 concurrent WCG tasks. There should have been no need to download anything on account of share=0 and the limit of 6
Not sure if this counts as a trickle.

Should I be running that version you posted about two weeks ago? I tested it out on one of the new systems but the problem was the initial startup after installing BOINC which is not the same as I am seeing here.

[edit] I deleted my boinc.exe and copied over your version, the "max fix" one to try it.
Question: when building the x64 release I get an executable that is 2x as big as the 7.16.20 that Berkeley has. There must be some setting in my VS2019 that is different from Berkeley's. Usually the debug version is the size hog.

[edit-2] All three systems used app_config to limit the number of concurrent WCG tasks. I replaced boinc.exe with that version you posted. Maybe this will fix the problem?
42) Message boards : Questions and problems : Getting too may WCG tasks on systems that had been working ok (Message 106438)
Posted 16 Dec 2021 by Profile Joseph Stateson
Post:
The code change had no real effect. While the 1.000 no longer showed up in the log file, the system with only 8 cores went and got 10 days worth of work. The system with 20 cores got just one day. Neither system should have downloaded more than 1 WCG task at a time with share set to 0. None of my other projects have this behavior. There is a problem somewhere.
43) Message boards : Questions and problems : Getting too may WCG tasks on systems that had been working ok (Message 106436)
Posted 16 Dec 2021 by Profile Joseph Stateson
Post:
Some thoughts on the following code, worth about 2c (my thoughts, not the code)

        if (!p->rsc_pwf[j].rsc_project_reason) {
                p->rsc_pwf[j].fetchable_share = rsc_work_fetch[j].total_fetchable_share?p->resource_share/rsc_work_fetch[j].total_fetchable_share:1;
...
...
        msg_printf(p, MSG_INFO,
            "[work_fetch] share %.3f %s %s",
            rpwf.fetchable_share,
            rsc_reason_string(rpwf.rsc_project_reason),
            buf


The following indicates that a "1" was not put into the resource
That means the IF part reason was "false" and consequently the project_reason was "true"
[work_fetch] share 0.000 zero resource share



The following indicates that not only was the IF true (project_reason was false)
but in addition the "rsc_reason_string" is empty as nothing was printed.
[work_fetch] share 1.000 


Anyway, I edited that code and changed "1" to "0" and put a copy "7.16.19" on two of my worst WCG offender systems.
After rebooting, the system with 20 cores downloaded 4 new apps and the system with only 8 cores downloaded only 2 apps. This was after I aborted about 75 days of work, most of which could not have been completed by the deadline.

Will know tomorrow for sure if my "fix" worked.
44) Message boards : Questions and problems : Getting too may WCG tasks on systems that had been working ok (Message 106434)
Posted 15 Dec 2021 by Profile Joseph Stateson
Post:
Found something strange in the code

Looking for "[work_fetch] share 0.000 "

I found the above was printed by the function
void RSC_WORK_FETCH::print_state(const char* name) {
...
....
        msg_printf(p, MSG_INFO,
            "[work_fetch] share %.3f %s %s",
            rpwf.fetchable_share,
            rsc_reason_string(rpwf.rsc_project_reason),
            buf
...


where the variable that has the value of 0.000 (or 1.0 or 0.5) is defined here
double fetchable_share;
        // this project's share relative to projects from which
        // we could probably get work for this resource;
        // determines how many instances this project deserves


and it can be set to "1" here based on "project reason"
           if (!p->rsc_pwf[j].rsc_project_reason) {
                p->rsc_pwf[j].fetchable_share = rsc_work_fetch[j].total_fetchable_share?p->resource_share/rsc_work_fetch[j].total_fetchable_share:1;
 

so if "project reason" is true (just noticed the negation) then share is set to 1.0

I do not know where the 0.5 came from. However, someone has hard coded a 1.0 for the project share which is suspicious. If I knew more about "project reason" maybe there is a "reason"

[edit] just realized that "rsc_reason_string(rpwf.rsc_project_reason)" is null since nothing was printed after the 1.0000 so it is "false" ?? and a 1.0 seems to have been assigned to project share ?
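To keep the negation straight, here is a small model of just the quoted assignment. It is a sketch of that one expression only, with a made-up function name; how the client arrives at total_fetchable_share and rsc_project_reason is not shown in the snippets above and is not modeled here.

```python
def fetchable_share(resource_share, total_fetchable_share, rsc_project_reason):
    """Model of the quoted ternary.

    Returns None when the assignment is skipped (non-zero project reason),
    otherwise the value that would be stored in fetchable_share.
    """
    if rsc_project_reason:
        # A reason is set, so the 'if (!...)' body does not run.
        return None
    if total_fetchable_share:
        # Normal case: this project's share relative to the fetchable total.
        return resource_share / total_fetchable_share
    # Total is zero: the expression falls through to the hard-coded 1.
    return 1.0
```

So the 1.0 only appears when no reason is set AND the total is zero, for example fetchable_share(0, 0, 0). A value like 0.5 is what the proportional branch would give for a project holding half of the total, e.g. fetchable_share(50, 100, 0). Whether those are the inputs on my systems is a guess.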

HTH
45) Message boards : Questions and problems : Getting too may WCG tasks on systems that had been working ok (Message 106432)
Posted 15 Dec 2021 by Profile Joseph Stateson
Post:
[edit] I had to delete most of what I wrote as I had been looking at the wrong system.
The system that had downloaded just one task has now gone and downloaded a few more for a total of 4. That is probably ok. During that time another 7.16.20 system downloaded another week's worth
46) Message boards : Questions and problems : Getting too may WCG tasks on systems that had been working ok (Message 106431)
Posted 15 Dec 2021 by Profile Joseph Stateson
Post:
WHAT DID THE EVENT LOG SAY ABOUT FETCHING?


[EDIT] I fixed the version numbers I had garbled up. Note that ALL systems had share set to 0 and had been that way for a long time.

OK, I turned on <work_fetch_debug> on three systems. One I had to stop and restart as the chatter went off the event screen and the "top" was missing
LOOKS LIKE I DUPLICATED THE PROBLEM FROM 7.16.3 ON 7.16.20!!

The two I just upgraded to 7.16.20 and the one I just recently restarted. There was a difference ON ALL THREE

This one running 7.16.20 downloaded one task. I had just aborted 1600+ and was afraid I would not get any because of daily limit, but I did get one. So actually, this is normal

bjysdualx2

84			12/15/2021 1:43:40 PM	[work_fetch] target work buffer: 86400.00 + 0.00 sec	
85			12/15/2021 1:43:40 PM	[work_fetch] --- project states ---	
91	World Community Grid	12/15/2021 1:43:40 PM	[work_fetch] REC 26763.703 prio -0.000 can request work	
92			12/15/2021 1:43:40 PM	[work_fetch] --- state for CPU ---	
93			12/15/2021 1:43:40 PM	[work_fetch] shortfall 1869495.34 nidle 0.00 saturated 230.49 busy 0.00	
99	World Community Grid	12/15/2021 1:43:40 PM	[work_fetch] share 0.000 zero resource share 	
100			12/15/2021 1:43:40 PM	[work_fetch] --- state for AMD/ATI GPU ---	
101			12/15/2021 1:43:40 PM	[work_fetch] shortfall 344087.02 nidle 0.00 saturated 230.49 busy 0.00	
107	World Community Grid	12/15/2021 1:43:40 PM	[work_fetch] share 0.000 zero resource share 	
108			12/15/2021 1:43:40 PM	[work_fetch] ------- end work fetch state -------	
120	World Community Grid	12/15/2021 1:43:40 PM	choose_project: scanning	
121	World Community Grid	12/15/2021 1:43:40 PM	can't fetch CPU: zero resource share	
122	World Community Grid	12/15/2021 1:43:40 PM	can't fetch AMD/ATI GPU: zero resource share	
123			12/15/2021 1:43:40 PM	[work_fetch] No project chosen for work fetch	
124			12/15/2021 1:44:41 PM	choose_project(): 1639597481.509739	


The above does not show any download because I had to restart to get the "TOP"

The next is for another 7.16.20 that unfortunately downloaded more stuff. I had just restarted after putting in 7.16.20 and then I aborted 50 days worth and that must have triggered more downloads. I did not have work_fetch_debug in the cc_config so I missed what happened when it got extra stuff. I then changed %cpu to allow more tasks and got more downloads THAT SHOULD NOT HAVE HAPPENED (note the 14 cpu, a change from 12, caused more tasks)

JYSArea51

1779			12/15/2021 1:57:59 PM	   max CPUs used: 14	
1780			12/15/2021 1:57:59 PM	   (to change preferences, visit a project web site or select Preferences in the Manager)	
1781			12/15/2021 1:57:59 PM	[work_fetch] Request work fetch: Prefs update	
1782			12/15/2021 1:57:59 PM	[work_fetch] Request work fetch: Preferences override	
1783			12/15/2021 1:58:00 PM	choose_project(): 1639598280.665096	
1784			12/15/2021 1:58:00 PM	[work_fetch] ------- start work fetch state -------	
1785			12/15/2021 1:58:00 PM	[work_fetch] target work buffer: 8640.00 + 43200.00 sec	
1786			12/15/2021 1:58:00 PM	[work_fetch] --- project states ---	
1810	World Community Grid	12/15/2021 1:58:00 PM	[work_fetch] REC 6981.661 prio -1000.053 can't request work: scheduler RPC backoff (13.04 sec)	
1812			12/15/2021 1:58:00 PM	[work_fetch] --- state for CPU ---	
1813			12/15/2021 1:58:00 PM	[work_fetch] shortfall 700695.25 nidle 7.00 saturated 0.00 busy 0.00	
1837	World Community Grid	12/15/2021 1:58:00 PM	[work_fetch] share 0.000  	
1839			12/15/2021 1:58:00 PM	[work_fetch] --- state for NVIDIA GPU ---	
1840			12/15/2021 1:58:00 PM	[work_fetch] shortfall 51647.39 nidle 0.00 saturated 192.61 busy 0.00	
1864	World Community Grid	12/15/2021 1:58:00 PM	[work_fetch] share 0.000 zero resource share 	
1866			12/15/2021 1:58:00 PM	[work_fetch] ------- end work fetch state -------	
1914	World Community Grid	12/15/2021 1:58:00 PM	choose_project: scanning	
1915	World Community Grid	12/15/2021 1:58:00 PM	skip: scheduler RPC backoff	
1919			12/15/2021 1:58:00 PM	[work_fetch] No project chosen for work fetch	
1920			12/15/2021 1:58:13 PM	[work_fetch] Request work fetch: Backoff ended for World Community Grid	
1921			12/15/2021 1:58:15 PM	choose_project(): 1639598295.784178	
1922			12/15/2021 1:58:15 PM	[work_fetch] ------- start work fetch state -------	
1923			12/15/2021 1:58:15 PM	[work_fetch] target work buffer: 8640.00 + 43200.00 sec	
1924			12/15/2021 1:58:15 PM	[work_fetch] --- project states ---	
1948	World Community Grid	12/15/2021 1:58:15 PM	[work_fetch] REC 6981.661 prio -1000.052 can request work	
1950			12/15/2021 1:58:15 PM	[work_fetch] --- state for CPU ---	
1951			12/15/2021 1:58:15 PM	[work_fetch] shortfall 700709.42 nidle 7.00 saturated 0.00 busy 0.00	
1975	World Community Grid	12/15/2021 1:58:15 PM	[work_fetch] share 1.000  	
1977			12/15/2021 1:58:15 PM	[work_fetch] --- state for NVIDIA GPU ---	
1978			12/15/2021 1:58:15 PM	[work_fetch] shortfall 51661.53 nidle 0.00 saturated 178.47 busy 0.00	
2002	World Community Grid	12/15/2021 1:58:15 PM	[work_fetch] share 1.000  	
2004			12/15/2021 1:58:15 PM	[work_fetch] ------- end work fetch state -------	
2052	World Community Grid	12/15/2021 1:58:15 PM	choose_project: scanning	
2053	World Community Grid	12/15/2021 1:58:15 PM	can fetch CPU	
2054	World Community Grid	12/15/2021 1:58:15 PM	CPU needs work - buffer low	
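A quick way to pick the suspicious lines out of a long event log is to look for a non-zero "share" with no reason text after it, like lines 1975 and 2002 above. The sketch below works on the message text as pasted here; the function names are made up and the log format is assumed to match these excerpts.

```python
import re

# Matches e.g. "[work_fetch] share 0.000 zero resource share" and
# "[work_fetch] share 1.000" (nothing after the number).
SHARE_LINE = re.compile(r"\[work_fetch\] share\s+([0-9]+\.[0-9]+)\s*(.*?)\s*$")


def parse_share(line):
    """Return (share, reason) for a share line, or None for any other line."""
    m = SHARE_LINE.search(line)
    if m is None:
        return None
    return float(m.group(1)), m.group(2)


def unexplained_nonzero(lines):
    """Return the lines that report a non-zero share with an empty reason."""
    hits = []
    for line in lines:
        parsed = parse_share(line)
        if parsed is not None and parsed[0] > 0 and parsed[1] == "":
            hits.append(line)
    return hits
```

Feeding it the JYSArea51 excerpt should flag only the two "share 1.000" lines from 1:58:15 PM, the point where the fetch was allowed.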





The system still running 7.16.3 downloaded another week's worth. This is the chatter:

lenovos20

43			12/15/2021 1:24:01 PM	choose_project(): 1639596241.273872	
44			12/15/2021 1:24:01 PM	[work_fetch] ------- start work fetch state -------	
45			12/15/2021 1:24:01 PM	[work_fetch] target work buffer: 86400.00 + 0.00 sec	
46			12/15/2021 1:24:01 PM	[work_fetch] --- project states ---	
48	World Community Grid	12/15/2021 1:24:01 PM	[work_fetch] REC 4124.711 prio -0.112 can request work	
49			12/15/2021 1:24:01 PM	[work_fetch] --- state for CPU ---	
50			12/15/2021 1:24:01 PM	[work_fetch] shortfall 695894.09 nidle 1.00 saturated 0.00 busy 0.00	
52	World Community Grid	12/15/2021 1:24:01 PM	[work_fetch] share 1.000  	
53			12/15/2021 1:24:01 PM	[work_fetch] --- state for NVIDIA GPU ---	
54			12/15/2021 1:24:01 PM	[work_fetch] shortfall 18361.75 nidle 0.00 saturated 68038.25 busy 0.00	
56	World Community Grid	12/15/2021 1:24:01 PM	[work_fetch] share 0.500  	
57			12/15/2021 1:24:01 PM	[work_fetch] ------- end work fetch state -------	
58	World Community Grid	12/15/2021 1:24:01 PM	choose_project: scanning	
59	World Community Grid	12/15/2021 1:24:01 PM	can fetch CPU	
60	World Community Grid	12/15/2021 1:24:01 PM	CPU needs work - buffer low	
61	World Community Grid	12/15/2021 1:24:01 PM	checking CPU	
62	World Community Grid	12/15/2021 1:24:01 PM	[work_fetch] using MC shortfall 591132.340164 instead of shortfall 695894.087949	
63	World Community Grid	12/15/2021 1:24:01 PM	[work_fetch] set_request() for CPU: ninst 10 nused_total 227.00 nidle_now 1.00 fetch share 1.00 req_inst 0.00 req_secs 591132.34	
64	World Community Grid	12/15/2021 1:24:01 PM	CPU set_request: 591132.340164	
65	World Community Grid	12/15/2021 1:24:01 PM	checking NVIDIA GPU	
66	World Community Grid	12/15/2021 1:24:01 PM	[work_fetch] using MC shortfall 18361.747788 instead of shortfall 18361.747788	
67	World Community Grid	12/15/2021 1:24:01 PM	[work_fetch] set_request() for NVIDIA GPU: ninst 1 nused_total 0.00 nidle_now 0.00 fetch share 0.50 req_inst 0.00 req_secs 18361.75	
68	World Community Grid	12/15/2021 1:24:01 PM	NVIDIA GPU set_request: 18361.747788	
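To see how much work a request like the ones above is asking for, the req_secs value from the set_request() line can be pulled out and converted to days. A minimal sketch, assuming the line layout shown in this log; the function names are made up.

```python
import re

SET_REQUEST = re.compile(
    r"set_request\(\) for (?P<resource>.+?): "
    r"ninst (?P<ninst>\d+) .*?"
    r"fetch share (?P<share>[0-9.]+) .*?"
    r"req_secs (?P<req_secs>[0-9.]+)"
)


def parse_set_request(line):
    """Extract resource, instance count, fetch share and requested seconds."""
    m = SET_REQUEST.search(line)
    if m is None:
        return None
    return {
        "resource": m.group("resource"),
        "ninst": int(m.group("ninst")),
        "fetch_share": float(m.group("share")),
        "req_secs": float(m.group("req_secs")),
    }


def seconds_to_days(seconds):
    """Convert a request size in seconds to days."""
    return seconds / 86400.0
```

For the CPU line above, 591132.34 seconds works out to a little under 7 days, which matches the "another week's worth" that came down.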
47) Message boards : Questions and problems : Getting too may WCG tasks on systems that had been working ok (Message 106429)
Posted 15 Dec 2021 by Profile Joseph Stateson
Post:
Going to switch to latest version as I cannot account for why too many tasks are being downloaded when share is set to 0.

I have several 7.16.3 and the linux ones do not show a problem. Three win10 systems:
- 70 days, 834 tasks
- 322 days, 1658 tasks and I had to abort 700+ tasks a few days ago.
- 2 days, 16 tasks

The above was not on new builds where share is set to 100 for a few minutes.

I went over to the WCG forum but did not see any similar problems. They do not have a "question and problems" forum so I had to poke around
It does not look like a problem at their end caused by the move from IBM. If it happens with 7.16.20 then I can try to debug it if I knew what to look for.

[edit] I just started boinc back up on a windows system that rebooted due to windows feature update. It has 7.16.3 and I just watched it download additional WCG tasks when there was no need. Share was 0 and there was already a week's worth of tasks. Maybe when rebooting the %0 is not noticed ???
48) Message boards : Questions and problems : All Milkyway@Home GPU WU"s get Computation error (Message 106400)
Posted 13 Dec 2021 by Profile Joseph Stateson
Post:
I'm trying to get some GPU work going on my Mac Pro. It is running 10.13.6 and has a GeForce GT120.


that board does not support double precision float. All GPU tasks will fail.
49) Message boards : Questions and problems : WCG: new systems download 100s of CPU work units, not possible to work all (Message 106386)
Posted 11 Dec 2021 by Profile Joseph Stateson
Post:
[edit] I didn't wait long enough. Got additional tasks. Maybe this fixes the 91 second minimum delay problem!!! Will let it run for a while

Wow!! Could it be as simple as that? What I would like to see is a reported task and requested work during the same scheduler connection being filled.


Sorry, just got around to reading this.

No, that option did not cause new work units to be downloaded after a "finished" upload.
The work count starts at 300 for a single board and slowly drops to 0, and then there is that 91 second + up to 5 minute wait and occasionally an even longer idle.

I think what happened was I requested an update and it just so happened that 91 seconds had elapsed since the last request so I actually got serviced.

On my "racks" with multiple GPUs an MW work unit finishes on the average of every 15 seconds so the 91 second requirement never happens. This test system had 1 board and all 4 tasks finish at about exactly the same time and 2.5 minutes apart so there is a good chance the 91 seconds have elapsed. The net effect is I still have to use my boinc client "mod" to avoid the long idle time.
50) Message boards : Questions and problems : WCG: new systems download 100s of CPU work units, not possible to work all (Message 106383)
Posted 10 Dec 2021 by Profile Joseph Stateson
Post:
The option
<fetch_on_update>0</fetch_on_update>

is not working like I expected. I added it to cc_config.xml "options"
I think it works the way the developers intended:

<fetch_on_update>0|1</fetch_on_update>
When updating a project, request work even if not highest priority project.
Setting it to 1 adds extra fetching, but 0 doesn't block normal fetches. That quote comes from the User Manual.


IMHO the "Extra Fetch" was clearly added, as shown by the quote "Sending scheduler request: Requested by user"

I set the option to >1< and restarted the client and did an update after a few minutes and got essentially the same thing
hp3400

57	Milkyway@Home	12/10/2021 1:48:11 PM	update requested by user	
58	Milkyway@Home	12/10/2021 1:48:15 PM	Sending scheduler request: Requested by user.	
59	Milkyway@Home	12/10/2021 1:48:15 PM	Requesting new tasks for AMD/ATI GPU	
60	Milkyway@Home	12/10/2021 1:48:33 PM	Scheduler request completed: got 0 new tasks	
61	Milkyway@Home	12/10/2021 1:48:33 PM	Not sending work - last request too recent: 35 sec	
62	Milkyway@Home	12/10/2021 1:48:33 PM	Project requested delay of 91 seconds	


Unless I am missing something, there is no difference on either update I requested other than I did get additional tasks with the >0<

so work is always requested, with or without the option.

[edit] I didn't wait long enough. Got additional tasks. Maybe this fixes the 91 second minimum delay problem!!! Will let it run for a while

hp3400

57	Milkyway@Home	12/10/2021 1:48:11 PM	update requested by user	
58	Milkyway@Home	12/10/2021 1:48:15 PM	Sending scheduler request: Requested by user.	
59	Milkyway@Home	12/10/2021 1:48:15 PM	Requesting new tasks for AMD/ATI GPU	
60	Milkyway@Home	12/10/2021 1:48:33 PM	Scheduler request completed: got 0 new tasks	
61	Milkyway@Home	12/10/2021 1:48:33 PM	Not sending work - last request too recent: 35 sec	
62	Milkyway@Home	12/10/2021 1:48:33 PM	Project requested delay of 91 seconds	
63	Milkyway@Home	12/10/2021 1:50:04 PM	Sending scheduler request: To fetch work.	
64	Milkyway@Home	12/10/2021 1:50:04 PM	Requesting new tasks for AMD/ATI GPU	
65	Milkyway@Home	12/10/2021 1:50:07 PM	Scheduler request completed: got 36 new tasks	
66	Milkyway@Home	12/10/2021 1:50:07 PM	Project requested delay of 91 seconds	
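The timing in the log above can be checked with a few lines of code. This is only a sketch: it reads the "last request too recent" message as meaning the server refuses work until the requested delay has passed since the previous contact, which is my interpretation and not something the log states outright. The function names are made up.

```python
from datetime import datetime

# Timestamp layout used in the event log lines pasted above.
STAMP_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def seconds_between(earlier, later):
    """Seconds elapsed between two event log timestamps."""
    t0 = datetime.strptime(earlier, STAMP_FORMAT)
    t1 = datetime.strptime(later, STAMP_FORMAT)
    return (t1 - t0).total_seconds()


def delay_satisfied(gap_seconds, required_delay=91):
    """True when the gap since the last request meets the project's delay."""
    return gap_seconds >= required_delay
```

The user-requested update at 1:48:15 PM was reported as only 35 sec after the previous request, so it got nothing. The next request at 1:50:04 PM came 109 seconds later, past the 91, and got 36 tasks.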
51) Message boards : Questions and problems : WCG: new systems download 100s of CPU work units, not possible to work all (Message 106381)
Posted 10 Dec 2021 by Profile Joseph Stateson
Post:
The option
<fetch_on_update>0</fetch_on_update>


is not working like I expected. I added it to cc_config.xml "options"

<cc_config>
    <options>
        <use_all_gpus>1</use_all_gpus>
      <allow_remote_gui_rpc>1</allow_remote_gui_rpc>
      <fetch_on_update>0</fetch_on_update>
    </options>
</cc_config>


and restarted the client, waited a while, then requested an update and got over 100 tasks

hp3400

68	Milkyway@Home	12/10/2021 1:20:31 PM	update requested by user	
69	Milkyway@Home	12/10/2021 1:20:34 PM	Sending scheduler request: Requested by user.	
70	Milkyway@Home	12/10/2021 1:20:34 PM	Requesting new tasks for AMD/ATI GPU	
71	Milkyway@Home	12/10/2021 1:20:36 PM	Scheduler request completed: got 119 new tasks	


However, the project Milkyway has a known problem: It does not download new work units until 91 seconds after all existing work units have finished so getting 100+ tasks was doubly unexpected!
52) Message boards : Questions and problems : WCG: new systems download 100s of CPU work units, not possible to work all (Message 106361)
Posted 9 Dec 2021 by Profile Joseph Stateson
Post:
IMHO that problem with gpugrid is going to be hard to debug. I would not expect a GPU task to be swapped out for another from the same project.

Thinking about that reminds me of a problem that showed up over at Milkyway earlier that I tried to help with.
An n-body (cpu needs 4 threads) was totally idle while four cpu tasks were running (system had only 4 cores).
My guess was the n-body was swapped out but would never get a time slice again because of all the smaller cpu tasks that finish at different times. All tasks were MW.
I suggest running either one or the other but not both from the same project.

In other news I was able to verify that a new install of BOINC needed "WUprop" so that adding Einstein or WCG would not cause 100s of downloads

Einstein is my fallback project with share = 0 and Milkyway is my %100 as I can run 4 concurrent tasks.

I tried running two Einstein concurrently. Saw a tiny improvement but not enough to justify having to use a bigger fan to cool my rack of GPUs.

I recently joined that supersecret GPU club and have some ideas to work on. One is to try to arrange my "boinc mod" so that if gpugrid gets suspended the GPUs get assigned to the same slot they were using.
When running my rack of three gpugrid tasks (p102-100, gtx1070 and gtx1660ti) all three can die when resumed from suspension, as the CUDA compiler does not know the meta data is different and tries to pick up where it left off, which causes a failure. The alternative is to run 3 instances of BOINC but that is a PITA.
53) Message boards : Questions and problems : WCG: new systems download 100s of CPU work units, not possible to work all (Message 106350)
Posted 8 Dec 2021 by Profile Joseph Stateson
Post:

Why is share being set to 100%? It is shown as 0 in the manager but 100 is listed in the log (Boinctasks log)

10 12/7/2021 2:24:43 PM All projects have zero resource share; setting to 100

Wild guess:
"If All projects are set to zero, then there's no point in trying to do anything. So obviously this person doesn't know what he's doing. I'll be helpful and set them to 100% for him."


What I find strange is that of all the settings the user can control, the parameter that determines a project "share" is controlled at the project account and not at the boinc manager.

My first thought was that setting all to %100 allowed bundled Charity Engine to start crunching on un-suspecting users who would never have a project account nor know the definition of "share". However, after reading what Richard wrote about "fix my last checkin" I decided that Hanlon's razor is applicable here

I think there is a fix that does not involve adding an option to cc_config nor deleting that code. I run WUProp@home on systems that do not crunch CPU tasks so that I observe the CPU temperature that boinctasks displays. I just need to install WUProp on all new builds. It always runs at %100 and only one app ever runs. That will fix the "set all projects to %100" It just needs to be the first project added on new builds.
54) Message boards : Questions and problems : several cores flash 99 degree temps at 20 percent utilization (Message 106346)
Posted 7 Dec 2021 by Profile Joseph Stateson
Post:
I was running 60 and 70 percent of cores and cpu times and temps of system was in 80's. It started to get hot with fans roaring and found that there were various cores that flashed hot....even running at 20 per cent cores and 20 percent cpu....cpu temp bounces from 50 to 98 degrees.....any ideas


Look and see if there is a gap between the CPU heat spreader and the bottom of the heat sink. That actually happened to me and a copper shim fixed the problem until I found the correct cooler. I had the exact symptoms you mention.
55) Message boards : Questions and problems : WCG: new systems download 100s of CPU work units, not possible to work all (Message 106344)
Posted 7 Dec 2021 by Profile Joseph Stateson
Post:
I've just been having the same conversation with another user by email. So this is conveniently on my clipboard:

https://drive.google.com/drive/folders/14C1sfF9wDbG1U0fPSwkXx3jq_M1HrxwB?usp=sharing

You'll need both a .ZIP handler and a 7-zip handler to unpack boinc.exe - so good they compressed it twice.


?????

This must be your test hander that showed the problem
I had to suspend Einstein as it was downloading days' worth of data with share set to "0", which is not right. I have 151 Einstein tasks waiting to run. I can actually do that as the 2 GPUs are good and the deadline is not tomorrow.

Why is share being set to 100%? It is shown as 0 in the manager but 100 is listed in the log (Boinctasks log).


xps-435t

1			12/7/2021 2:24:42 PM	Starting BOINC client version 7.19.0 for windows_x86_64	
2			12/7/2021 2:24:42 PM	This a development version of BOINC and may not function properly	
3			12/7/2021 2:24:42 PM	Libraries: libcurl/7.80.0-DEV Schannel zlib/1.2.11	
4			12/7/2021 2:24:42 PM	Data directory: C:\ProgramData\BOINC	
5			12/7/2021 2:24:42 PM	Running under account josep	
6			12/7/2021 2:24:43 PM	CUDA: NVIDIA GPU 0: GeForce GTX 1060 3GB (driver version 456.71, CUDA version 11.1, compute capability 6.1, 3072MB, 2488MB available, 3936 GFLOPS peak)	
7			12/7/2021 2:24:43 PM	CUDA: NVIDIA GPU 1: GeForce GTX 1060 3GB (driver version 456.71, CUDA version 11.1, compute capability 6.1, 3072MB, 2488MB available, 3936 GFLOPS peak)	
8			12/7/2021 2:24:43 PM	OpenCL: NVIDIA GPU 0: GeForce GTX 1060 3GB (driver version 456.71, device version OpenCL 1.2 CUDA, 3072MB, 2488MB available, 3936 GFLOPS peak)	
9			12/7/2021 2:24:43 PM	OpenCL: NVIDIA GPU 1: GeForce GTX 1060 3GB (driver version 456.71, device version OpenCL 1.2 CUDA, 3072MB, 2488MB available, 3936 GFLOPS peak)	
10			12/7/2021 2:24:43 PM	All projects have zero resource share; setting to 100	
11			12/7/2021 2:24:43 PM	Version change (7.16.20 -> 7.19.0)	


Why is the following code in cs_statefile.cpp?

    // if total resource share is zero, set all shares to 1
    //
    if (projects.size()) {
        unsigned int i;
        double x=0;
        for (i=0; i<projects.size(); i++) {
            x += projects[i]->resource_share;
        }
        if (!x) {
            msg_printf(NULL, MSG_INFO,
                "All projects have zero resource share; setting to 100"
            );
            for (i=0; i<projects.size(); i++) {
                projects[i]->resource_share = 100;
            }
        }
    }
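
For anyone who does not read C++, here is a rough Python paraphrase of what that block does. The Project class and the function name are made up for illustration; only the logic (sum the shares, and if the total is zero set every share to 100) comes from the snippet above. It is a sketch, not the client's code.

```python
from dataclasses import dataclass


@dataclass
class Project:
    # Hypothetical stand-in for the client's per-project record.
    name: str
    resource_share: float


def fix_zero_total_share(projects):
    """Mirror the quoted logic: if every share is zero, set them all to 100.

    Returns True when the shares were overwritten.
    """
    if not projects:
        return False
    total = sum(p.resource_share for p in projects)
    if total == 0:
        print("All projects have zero resource share; setting to 100")
        for p in projects:
            p.resource_share = 100
        return True
    return False


if __name__ == "__main__":
    attached = [Project("Einstein", 0), Project("WCG", 0)]
    fix_zero_total_share(attached)
    print([(p.name, p.resource_share) for p in attached])
```

Going by this logic, a single project with a non-zero share is enough to leave the zero-share projects alone, because the total is then no longer zero.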


Is this something that can be turned in as an issue?
56) Message boards : Questions and problems : WCG: new systems download 100s of CPU work units, not possible to work all (Message 106341)
Posted 7 Dec 2021 by Profile Joseph Stateson
Post:
I deliberately put one machine into the state where it was fetching the same quantum of new work every 30 seconds, and getting it, every time - so it was disregarding the new work when calculating what to fetch next time. Is that how your excess tasks arrive?

I downloaded and installed the CI test build of #4592: that cured it.


Doing something wrong: got the code that did not have the changes.

Clicked on that 4592 issue.
Clicked on "dpa_max_concurrent".
Observed the 6-day-old change at client, so I think I am looking at the mod you tested.
Selected "CODE" (the green box) and clicked on "Open with GitHub desktop".
Put the download in my project folder using my GitHub desktop.
Built using VS2019 release x64, no errors, under win11.
Looked at work_fetch.cpp and none of the changes were there.

Went back and re-looked at the green box and it is downloading from github.com/BOINC/boinc.git, which I suspect is not what I wanted. I am not up to speed on using GitHub for anything more than sharing my code.

Wanted to test that new boinc fix on my system as I want to enable WCG and do not want another 500+ downloads.

I built 3 systems in the last two weeks, one for a nephew and 2 for one of my kids. I forgot about the problem on the first system and was too slow getting around to stopping the WCG downloads on the next two.

57) Message boards : Questions and problems : WCG: new systems download 100s of CPU work units, not possible to work all (Message 106334)
Posted 7 Dec 2021 by Profile Joseph Stateson
Post:
Still having problems and I tried 7.16.20. I tried to make sure the share = 0 was recognized and configured only for Einstein instead of WCG

Rebuild of old system XPS-435t with three gtx-1060

Installed win10x64 21h2
Installed all Visual C Runtime (all versions)
Installed 7.16.20 and set advanced view
Added Einstein (my project default is GPU and share = 0)
Saw 100% appear under share and set "no new tasks" as soon as that option was enabled.
After a minute or two I saw a single task executing and that share had gone to 0.
I looked at the event log and the two GPUs that had only 3gb of memory were being ignored. I edited cc_config so that all 3 GPUs work and rebooted

Next time I looked there were 3 tasks executing but there were 12 GPU tasks waiting to execute. Should have been none waiting to execute.
The CPU has 12 threads. I checked but the 12 waiting tasks were all GPU tasks, none were CPU.
Just checked again and only 11 are left. Eventually will get down to 0 and then will be getting 1 for each one I turn in which is correct for share=0

Two days ago I aborted over 700 WCG tasks (total of 1200 in last 2 weeks) but it was my old 7.16.3 and so I decided to try 7.16.20 on a rebuild of an old system.
58) Message boards : Questions and problems : WCG: new systems download 100s of CPU work units, not possible to work all (Message 106245)
Posted 30 Nov 2021 by Profile Joseph Stateson
Post:
I recently assembled a pair of Windows systems with WCG pre-configured as "0" share. Normally only 1 wu per cpu gets downloaded.

Both systems have the older 7.16.3 boinc. Would the newer 7.20 handle this initialization correctly? I am guessing the project sees 12 threads and downloads a boatload of tasks and never notices that the share is supposed to be 0 till after the download.

I end up aborting 400+ files: about 58 days of work where the deadline was only about 3 days in the first place.
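
To put a number on how lopsided that is, here is a back-of-envelope sketch. It assumes the "days of work" figure is the wall-clock time this host would need to clear the whole queue and that the tasks are all roughly the same size; neither is confirmed above, and the function name is just for illustration.

```python
def tasks_finishable(total_tasks, queue_days, deadline_days):
    """Estimate how many queued tasks can finish before the deadline.

    Assumes queue_days is the wall-clock time the host needs to clear the
    whole queue and that all tasks are about the same size.
    """
    if queue_days <= 0:
        return total_tasks
    fraction = min(1.0, deadline_days / queue_days)
    return int(total_tasks * fraction)


if __name__ == "__main__":
    # 400 tasks, about 58 days of work, about a 3 day deadline.
    print(tasks_finishable(400, 58, 3))
```

Under those assumptions only about 20 of the 400 tasks could make the deadline, so nearly all of the download is wasted.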
59) Message boards : Questions and problems : The BOINC client has exited unexpectedly 3 times within the last 3 minutes (Message 106243)
Posted 30 Nov 2021 by Profile Joseph Stateson
Post:
One of the problems I have with Linux is looking for informative error messages. About all I can do is grep for "boinc" in the var/log folder and then grep for "error". Usually the files are so large I need to delete them and reboot to be able to spot a problem the next time it happens.
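
A small script can do the same double grep without the files having to be deleted first. This is only a sketch: the function name is made up, it reads plain text logs (rotated .gz files would come out as garbage), and it simply looks for both words on the same line.

```python
import os


def find_boinc_errors(log_dir, needle="boinc", level="error"):
    """Yield (path, line number, text) for lines mentioning both words.

    Files are read line by line, so very large logs are not a problem.
    Files that cannot be opened (permissions, etc.) are skipped.
    """
    for root, _dirs, files in os.walk(log_dir):
        for name in sorted(files):
            path = os.path.join(root, name)
            try:
                with open(path, "r", errors="replace") as handle:
                    for number, line in enumerate(handle, start=1):
                        lowered = line.lower()
                        if needle in lowered and level in lowered:
                            yield path, number, line.rstrip()
            except OSError:
                continue


if __name__ == "__main__":
    for path, number, text in find_boinc_errors("/var/log"):
        print(f"{path}:{number}: {text}")
```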

Nowhere near as clean as looking in the Windows event viewer and spotting "memory resource" warnings and finding I had too many apps running to leave suspended apps in RAM.

AFAICT there are no linux apps that break logs into "apps" and "system" and then organize the results into critical, error, warning, and "info" like windows does.

I run about 10 systems and monitor the boinc event log using boinctasks. There are so many meaningless messages from all systems that I did a mod of boinc to filter out the worst ones. I have no need to be told 500 times that I can get CPU tasks but that I chose not to. Those push real error messages off the bottom of the log file before I can read them.
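
The same kind of filtering can be tried outside the client on a saved copy of the event log. The sketch below is not my boinc mod, just the idea: the noise patterns are placeholders I picked for illustration and would need to be adjusted to whatever messages are flooding your own log.

```python
import re

# Hypothetical patterns for messages judged to be noise; adjust to taste.
NOISE_PATTERNS = [
    re.compile(r"not requesting tasks", re.IGNORECASE),
    re.compile(r"don't need", re.IGNORECASE),
]


def filter_log(lines, noise=NOISE_PATTERNS):
    """Drop lines matching a noise pattern and collapse exact repeats."""
    seen = set()
    kept = []
    for line in lines:
        text = line.strip()
        if not text or any(p.search(text) for p in noise):
            continue
        if text in seen:
            continue
        seen.add(text)
        kept.append(text)
    return kept
```

Collapsing exact repeats is a design choice: it keeps the first copy of a message so a real error is still visible, but it will also hide how often it happened.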
60) Message boards : Questions and problems : Can not get Rosetta Python (Vbox) tasks (Message 106109)
Posted 15 Nov 2021 by Profile Joseph Stateson
Post:

There is a big difference between a project disabling a feature and a project not enabling a feature


Just like the difference between a glass half empty and same one half full. You don't have all the resources you could have.
61) Message boards : Questions and problems : Can not get Rosetta Python (Vbox) tasks (Message 106093)
Posted 14 Nov 2021 by Profile Joseph Stateson
Post:
Apologies, I wasn't clear. I do not want to be able to choose sub-projects. I am happy to run whatever the project throws my way. But some of my machines do not download the VirtualBox tasks and I cannot work out why - that is my issue.


That's the problem: there's no way to tell it to send only vbox tasks.

LHC at home has vbox and you can pick and choose.
If you want, you can select vbox (Atlas or Theory) and nothing else will be downloaded except vbox apps.

Default computer location	---
Run only the selected applications
SixTrack: yes
sixtracktest: yes
CMS Simulation: no
Theory Simulation: yes
ATLAS Simulation: no
ATLAS (long simulation): no


Rosetta does not list their apps where you can pick and choose.

This also makes debugging difficult: your vbox config might be wrong but you don't know whether it's BOINC or the project not sending what you want.
62) Message boards : Questions and problems : Can not get Rosetta Python (Vbox) tasks (Message 106091)
Posted 14 Nov 2021 by Profile Joseph Stateson
Post:
I'm having the same issues. One machine is running a few Rosetta Python tasks and the other's queue is completely empty. Neither have any other projects enabled - only Rosetta.

PC 1: 3700X, 32GB RAM, Win 10, BOINC 7.16.11, VirtualBox 6.1.12, local account install.
PC 2: 2700X, 24GB RAM, Win 10, BOINC 7.16.20, VirtualBox 6.1.28, local account install.

I'm going to try uninstalling VirtualBox on PC 2 and then installing 6.1.12.

My first conclusion is that BOINC isn't currently sufficiently helpful when trying to run VirtualBox tasks - is that a fair assessment? It would be very helpful to have it show clearly whether VirtualBox is accessible. For example, on a service installation it would show that VB isn't available, or if virtualisation is disabled in the BIOS. Same for the GPU I guess.

As I'm typing this, Rosetta has downloaded a full queue of Rosetta 4.20 tasks (so non-VirtualBox tasks).


This project disabled the mechanism that allows users to select what sub-projects they want to run. A side effect is that one cannot distribute resources to prevent one subproject from "hogging" all the resources. There is nothing wrong with Berkeley's included version of Virtual Box. If you want a more recent version you can easily update.

The only way I see to run just the subprojects you want is to have a script that automatically aborts or deletes any download from Rosetta that is not wanted.

[EDIT] I recall something like that was done back in the days of the classic "seti cheaters". One could determine how long it took to process a work unit ahead of time and it could then be deleted to get to a faster one.


You might want to ask the admin to allow excluding 4.2, as you want to run Python, and see what they say, if anything. Good luck!
63) Message boards : Questions and problems : Projects need to do more to ensure proper downloads (Message 106082)
Posted 13 Nov 2021 by Profile Joseph Stateson
Post:
I swapped out two power supplies and two motherboards before I discovered that one of Milkyway's "star" parameter files was empty. It had 983 kilobytes of empty spaces.

Possibly it was corrupted by my system. It would have to have been corrupted during the initial download as the star map is never changed. Only new ones added.

Occasionally Boinc has an empty "state" file. At least I get a warning right away. This is the first time I have had a project with a critical file that is empty of data.
64) Message boards : Questions and problems : In the last couple days some tasks rapidly consume all RAM and freeze Windows requiring a hard restart. (Message 106043)
Posted 9 Nov 2021 by Profile Joseph Stateson
Post:
I have virtually the same system: Pair of RX570 and using 21.10.2 with three projects:
WCG, Milkyway and Einstein. Unaccountably, all of my Einstein tasks are failing with "Computation error".
The Milkyway and WCG have been running fine since I built the system 12 hours ago.
I just started looking at the Einstein problem when I read your post.

My system is not locking up, just Einstein tasks dropping dead yet over 500 valid Milkyway. Your system behavior is a lot different.

[edit] Posted over at Einstein about the driver


Problem solved for me over at Einstein. Their beta app does not work with "older AMD" cards like my RX-570 !
65) Message boards : Questions and problems : In the last couple days some tasks rapidly consume all RAM and freeze Windows requiring a hard restart. (Message 106042)
Posted 9 Nov 2021 by Profile Joseph Stateson
Post:
I have virtually the same system: Pair of RX570 and using 21.10.2 with three projects:
WCG, Milkyway and Einstein. Unaccountably, all of my Einstein tasks are failing with "Computation error".
The Milkyway and WCG have been running fine since I built the system 12 hours ago.
I just started looking at the Einstein problem when I read your post.

My system is not locking up, just Einstein tasks dropping dead yet over 500 valid Milkyway. Your system behavior is a lot different.

What projects are you running?

Where / how did you notice the memory usage?

Can you suspend all projects then let one at a time start up with only one task?
To do that, set the project to No New Work and then do a user suspend of the tasks.
That assumes you can get to the manager before the system crashes.

[edit] Posted over at Einstein about the driver
66) Message boards : Questions and problems : Can not get Rosetta Python (Vbox) tasks (Message 106032)
Posted 8 Nov 2021 by Profile Joseph Stateson
Post:

1. Create a venue over at RAH: "School" and select only Python <-- not possible. RAH does not have that option. If it did I would have tried that already. You can not isolate Python from 4.2 tasks. I already tried to exclude 4.2. BOINC just coughed up errors. I can do the rest however. New profile and suspend everything else.


Bummer : I was not aware of that. Just went over there and read the following quote

Rosetta@home does not currently divide its tasks into sub-projects that users can select

Obviously there is injustice in having some projects suffer because only a few users want to contribute to them. Clearly this project and its administrators are fully Woke to this injustice!

That means my way or the highway and I took the highway long ago.

The following might not work in R@H project directory (app_config.xml) but you could try it.

<app_config>
<app>
<name>Woke</name>
<max_concurrent>1</max_concurrent>
</app>
</app_config>


There will be an error message "No Woke projects" and POSSIBLY there will be a list of sub-project names that can be used to exclude.

At least this works on other projects. They print out the names that can be used. If they spell out something like "RH4_2" then put that in place of "Woke" and only one of those 4.2 will be run. Hopefully you will get a viperload of Pythons.
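
If you end up trying several app names, a few lines of Python can write the file for you. This is a sketch: the function name is made up, "RH4_2" is the hypothetical example name from above and not a confirmed Rosetta app name, and ET.indent needs Python 3.9 or newer.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, max_concurrent=1):
    """Return app_config.xml text limiting one app to max_concurrent tasks."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    ET.indent(root, space="    ")  # pretty-print; Python 3.9+
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # "RH4_2" is only the hypothetical example name, not a confirmed app name.
    print(build_app_config("RH4_2"))
```

Save the output as app_config.xml in the project directory and have the client re-read its config files for it to take effect.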

HTH
67) Message boards : Questions and problems : Can not get Rosetta Python (Vbox) tasks (Message 106025)
Posted 8 Nov 2021 by Profile Joseph Stateson
Post:
RAH could easily stop sending 4.2 tasks and put Python in and there would be enough memory for a mix of Python and 4.2. But I get 0 Python these days


RAH is native to Windows. Python runs in Vbox only. The project cannot just swap a Python for an RAH like you suggest. Vbox has a say in what it wants and allows. It is not allowing any Python tasks and only it knows why?

Taking a guess: Can you log onto the guest OS and check resources? I assume it is Linux or whatever or (God help you) Mac OS. I cannot help with any of those OSes.


What you should do if you want to sell car, house, spouse, and everything you own for the love of PYTHON

1. Create a venue over at RAH: "School" and select only Python
2. Set your system to use the "School" venue
3. Suspend all other BOINC and FAH tasks
4. Reboot
5. Wait (don't hold your breath) for some PYTHON to start up

Once you have a bunch of PYTHON running, resume all other tasks and start looking for error messages and / or tasks that refuse to run. If FAH no longer wants to run, go ask over at FAH what is going on.
68) Message boards : Questions and problems : Windows 11 ssh and remote desktop still not work with GPU? (Message 106023)
Posted 8 Nov 2021 by Profile Joseph Stateson
Post:
I wouldn't expect them to work as long as Microsoft uses its own - non-updatable - driver for remote desktop. That's what is causing this. A universal driver that only does 2D desktop rendering on all videocards, doing away with things as DirectX3D, OpenGL, OpenCL, and CUDA.


I do not understand what Microsoft is trying to do. They used to charge license fees for remote access to their servers and you had to be licensed additionally for Office apps, not just to log in. Each person that logged in got their own desktop. I assume they still do that. I am guessing that if you licensed their remote access you would be able to keep your GPU apps running.

So VNC, Splashtop and TeamViewer all work fine but Microsoft's remote access stops GPU tasks. I am guessing they consider this a security feature.

I ran out of free licenses for VNC and enabled Remote Desktop. Then I spent an hour trying to debug why Milkyway and Einstein were not running before I remembered that the GPU is suspended when remoting in. Very likely, win11 will not change anything.
69) Message boards : Questions and problems : Can not get Rosetta Python (Vbox) tasks (Message 106020)
Posted 8 Nov 2021 by Profile Joseph Stateson
Post:
Just trying to help. I assumed you asked over at Rosetta and got sent here or, more likely, no one answered.
I no longer run vbox. It takes too many resources from the host system. I have seen tasks waiting for both CPU and Virtual Memory and I have several dual xeon systems with 24gb.

What about Vbox and your guest OS? All that adds up.
I am not familiar with all you are running. At first glance it looks like CPUs are allocated as follows:
einstein: 2
wcg: 2
sidock: 3
rah: 4
lhc: 4
fah: 2

Above adds to 17
Whatever is left over then includes the host OS, Vbox and the guest OS in Vbox
If CPU tasks are suspended are you allowing them to remain in memory?
Allowing enough disk space for the virtual memory?
AFAICT Boinc Manager does not show total CPU usage like Boinctasks. If I look at my CPU usage the tasks are all near 99%, and if not there is a problem somewhere, as not enough CPU is available.
Can you tell if all your CPU tasks are really busy?
BT can also display virtual and physical memory in tabular form but it cannot show the virtual memory that the guest OS allocates. I don't think fah is a boinc app, so boinc does not know about its resource needs, correct?
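
A quick way to check the tally against the machine is to let a script add up the per-project figures listed above. The thread count is not stated anywhere in this thread, so total_threads is a placeholder you have to fill in yourself; the function name is made up for illustration.

```python
# Per-project CPU counts as listed in the post.
ALLOCATED = {"einstein": 2, "wcg": 2, "sidock": 3, "rah": 4, "lhc": 4, "fah": 2}


def cpu_headroom(allocated, total_threads):
    """Return threads left for the host OS, VirtualBox, and the guest OS.

    A negative result means the projects are oversubscribed.
    """
    return total_threads - sum(allocated.values())


if __name__ == "__main__":
    threads = 16  # placeholder: substitute the real thread count of the host
    print("allocated:", sum(ALLOCATED.values()))
    print("headroom:", cpu_headroom(ALLOCATED, threads))
```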

[edit] asked about fah

70) Message boards : Questions and problems : Can not get Rosetta Python (Vbox) tasks (Message 106015)
Posted 8 Nov 2021 by Profile Joseph Stateson
Post:
Something does not add up. Vbox needs CPU and so does Windows. I see you have 20 "in progress" Rosetta. How many of those are actually running? I am not running that project right now but when I did I recall each took 1.25 gb of RAM. Check the properties of a few tasks, see how much RAM is required, add it up, and then add in the memory usage that was excluded. A quick test would be to suspend all apps except Rosetta, but you would also have to create a venue that allowed only Python apps through.

Possibly Event Viewer has a warning message; not sure about that.
There are almost 5000 python apps ready to be sent by their server, but I don't see anything at Rosetta that shows how much ram each needs.
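
The RAM tally suggested above is just multiplication, sketched here. The 1.25 GB default is only my recollection for the older Rosetta tasks, not a figure for the Python tasks, and reserved_gb stands for whatever memory is excluded from BOINC's use; both names are illustrative.

```python
def ram_needed_gb(running_tasks, gb_per_task=1.25, reserved_gb=0.0):
    """Rough RAM demand: running tasks times per-task RAM, plus reserved RAM."""
    return running_tasks * gb_per_task + reserved_gb


if __name__ == "__main__":
    # If all 20 "in progress" tasks were actually running at once.
    print(ram_needed_gb(20))
```

With all 20 tasks running that comes to 25 GB before the host OS or VirtualBox is counted, which is why it matters how many are really running.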
71) Message boards : Questions and problems : Windows 11 ssh and remote desktop still not work with GPU? (Message 106014)
Posted 8 Nov 2021 by Profile Joseph Stateson
Post:
I do not have windows 11 to test and was wondering if Remote Desktop and SSH still stop Boinc GPU apps. VNC and Splashtop work fine. I do not know why Microsoft's remote access tools are designed not to work.
72) Message boards : Questions and problems : SLI computing (Message 105983)
Posted 5 Nov 2021 by Profile Joseph Stateson
Post:
y=2 if SLI is active or y=1 if SLI is INactive?
I haven't been following all the most recent developments in GPU design and project application development, but I've never seen - or read discussion about - multi-GPU applications running under BOINC.

BOINC itself is not SLI aware. It will always see and report an SLI pair as two separate devices, and assign a task to run on one or the other. I'll look at the code, but I don't think it's possible (yet) to schedule a task to wait until two GPUs are both available, and then launch a single task to occupy both.

In my (only brief) experience of running an SLI pair, my observation was that SLI operation locked both GPUs to the same clock rate. This rather negated the benefit of modern developments like 'Graphics boost clock'.


Years ago, the project DNETC had an app that used both GPUs.

I had SLI and Crossfire enabled for testing at one time and noticed that the project I was testing on used only one of the two boards. At one time ATI turned on crossfire automatically if a second board was discovered. I remember having to disable crossfire in their driver to stop that.
73) Message boards : Questions and problems : Low CPU usage (Message 105982)
Posted 5 Nov 2021 by Profile Joseph Stateson
Post:
Temp throttling? Are you monitoring the temperatures? WCG tasks use 100% of CPU. 32 threads 24/7 in a 1U case? I have two dual xeon servers, motherboards pulled from 1U racks. On one (16 threads) I had to use high speed CPU coolers. On another one, 24 threads, I got tired of the noise and put in DIY liquid cooling. Both are in open air racks. TThrottle can show temps and if you use Boinctasks it can report temperatures back to you.

74) Message boards : Questions and problems : Run Boinc as a service on windows 10? (Message 105925)
Posted 1 Nov 2021 by Profile Joseph Stateson
Post:
No reason why not. The restriction (no GPU in service mode) came about because of a Microsoft security restriction in the GPU driver model: I doubt they'll remove that now, even in Windows 11.

The big win is that service mode allows BOINC to do science when no user is logged into the console - ideal for servers.


This worked out really well. The system I put the service on is headless, no keyboard, has no usable GPU (old HP 6000) but does have a core 2 quad. It is running rsync and is the backup for a NAS.

I set Boinc to use 3 cores and not run if the system is 25% busy or if there is any keyboard / mouse activity.

When I logged in using remote desktop Boinc suspended the 3 WCG tasks. After 2 minutes of no activity the apps resumed.
When I started the backup app on the NAS Boinc immediately suspended the WCG apps. The apps started back up when the backup was done.
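
The behavior I saw boils down to two checks, sketched below. This is not BOINC's code, only my reading of how the settings behaved; the function and parameter names are made up, and the 25% and 2 minute defaults are the values from my setup.

```python
def should_run(non_boinc_cpu_percent, seconds_since_input,
               busy_limit_percent=25.0, idle_seconds_required=120.0):
    """Return True when the described rules would let tasks run.

    Tasks are held while other programs use busy_limit_percent or more of
    the CPU, or until the keyboard and mouse have been idle long enough.
    """
    if non_boinc_cpu_percent >= busy_limit_percent:
        return False
    return seconds_since_input >= idle_seconds_required


if __name__ == "__main__":
    print(should_run(5, 300))   # quiet system, nobody at the console
    print(should_run(40, 300))  # backup job hogging the CPU
    print(should_run(5, 30))    # someone just logged in remotely
```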

This worked out nicely. It is unfortunate that a GPU cannot be used when in service mode. I wonder if that is a problem with the OpenCL or CUDA implementations, at the driver level (ATI, NVidia), or at the Microsoft driver handler.
75) Message boards : Projects : WU-PROP is having some strange errors (Message 105905)
Posted 31 Oct 2021 by Profile Joseph Stateson
Post:
Yeah, can't post. They probably have no one to monitor the project and keep it orderly. I recall a UOTD posted an inappropriate comment and they stopped doing UOTD. Not allowing posts makes it easy to control spam.
My hours are there, sorry about yours.

The project is missing from the BOINC "add project" list and I have to poke around to figure out how to add it. I use it on systems where I do not have enough CPU cores to crunch. That way it (WUProp) shows up in BoincTasks and I can see the temperature of the CPU on the BoincTasks tasks page. I have a couple of systems where the GPU needs a full CPU and I cannot crunch any CPU tasks, so I don't know the temp of the CPU w/o WUProp.

Else the project is useless AFAICT.
76) Message boards : Questions and problems : Run Boinc as a service on windows 10? (Message 105882)
Posted 29 Oct 2021 by Profile Joseph Stateson
Post:
I know there is a problem (or was) with Boinc accessing the GPU when installed as a service. I assume this is still true even with 7.16.20 on Windows 10.

What if only the CPU is being used by Boinc? Will a service work ok then?
77) Message boards : The Lounge : Apple M1, M1 Pro and M1 max (Message 105879)
Posted 28 Oct 2021 by Profile Joseph Stateson
Post:
Apple removed even the soft OpenCL support that it had. now it only supports Apple's Metal. no projects use Metal, and I doubt any will in the future.


Adobe had gotten into bed with Apple pulling "metal" covers over themselves. I don't think that "open source" is even in the vocabulary of either of these companies.

Does Premiere Pro support metal?
Starting with the 14.0 release, Premiere Pro and Adobe Media Encoder default to Apple Metal graphics rendering on macOS. This applies to new and existing projects. Apple Metal provides a modern and unified render pipeline for all users on that platform and will be the focus of our development on macOS going forward.
78) Message boards : Questions and problems : Invalid client RPC Password (Message 105876)
Posted 28 Oct 2021 by Profile Joseph Stateson
Post:
Add yourself to the user group boinc and give yourself rw permissions to /var/lib/boinc-client.

I hope I have that right from memory. It has been discussed here in the past.


I know enough about Linux to get into trouble but not always enough to get out. On any system I put together, 18.04 or 20.2, I always use my account "jstateson" to get boinc and the GPU drivers.
When I edit cc_config or want to make changes to app_config I have to use sudo.

Looking at ownership and other properties I don't see any consistency and assume that is just the way Linux works. For the executable binary


jstateson@jysdualxeon1:/usr/bin$ ls -l boinc
-rwxr-xr-x 1 root root 20579552 Oct 22 20:43 boinc


but different here

jstateson@jysdualxeon1:/etc/boinc-client$ ls -l
total 24
-rw-rw-r-- 1 boinc boinc 2792 Oct 22 20:45 cc_config.xml
-rw-rw-r-- 1 boinc boinc  467 Apr 10  2020 cc_config.xml.bu
-rw-r--r-- 1 root  boinc   31 Jun 11 13:39 config.properties
-rw-rw-r-- 1 root  boinc 1498 Oct 11 08:51 global_prefs_override.xml
-rw-r----- 1 root  boinc   11 Oct 11 19:25 gui_rpc_auth.cfg
-rw-r--r-- 1 root  boinc  306 Apr 10  2020 remote_hosts.cfg
jstateson@jysdualxeon1:/etc/boinc-client$ cat config.properties
data_dir=/var/lib/boinc-client


and even the links from the data directory have different ownership than the targets which is confusing

jstateson@jysdualxeon1:/var/lib/boinc-client$ ls -l
total 1136
-rw-r--r--  1 boinc boinc   1569 Oct  8 18:51 account_boinc.bakerlab.org_rosetta.xml
-rw-r--r--  1 boinc boinc   5511 Oct 27 06:59 account_einstein.phys.uwm.edu.xml
-rw-r--r--  1 boinc boinc   2129 Oct  4 22:19 account_milkyway.cs.rpi.edu_milkyway.xml
-rw-r--r--  1 boinc boinc   5502 Oct 27 06:59 account_www.worldcommunitygrid.org.xml
-rw-r--r--  1 boinc boinc   1951 Sep 18  2020 acct_mgr_reply.xml
-rw-r--r--  1 boinc boinc   9200 Sep 18  2020 acct_mgr_request.xml
-rw-r--r--  1 boinc boinc  49812 Oct 22 20:43 all_projects_list.xml
lrwxrwxrwx  1 boinc boinc     34 Apr 10  2020 ca-bundle.crt -> /etc/ssl/certs/ca-certificates.crt
lrwxrwxrwx  1 root  root      31 Oct 11 17:40 cc_config.xml -> /etc/boinc-client/cc_config.xml
-rw-r--r--  1 boinc boinc 209336 Oct 28 08:14 client_state_prev.xml
-rw-r--r--  1 boinc boinc 209446 Oct 28 08:14 client_state.xml
-rw-r--r--  1 boinc boinc   4256 Oct 27 06:53 coproc_info.xml
-rw-r--r--  1 boinc boinc  20217 Oct 28 08:14 daily_xfer_history.xml
-rw-r--r--  1 boinc boinc  12642 Oct 22 20:44 get_current_version.xml
-rw-r--r--  1 boinc boinc   2710 Oct  5 22:53 get_project_config.xml
lrwxrwxrwx  1 root  root      43 Oct 11 17:40 global_prefs_override.xml -> /etc/boinc-client/global_prefs_override.xml
-rw-r--r--  1 boinc boinc   6022 Oct 22 20:03 global_prefs.xml
lrwxrwxrwx  1 root  root      34 Oct 11 17:40 gui_rpc_auth.cfg -> /etc/boinc-client/gui_rpc_auth.cfg


The following two items are obviously different from the rest and also the target ownerships are just the opposite.

lrwxrwxrwx  1 boinc boinc     34 Apr 10  2020 ca-bundle.crt -> /etc/ssl/certs/ca-certificates.crt
lrwxrwxrwx  1 root  root      31 Oct 11 17:40 cc_config.xml -> /etc/boinc-client/cc_config.xml
-rw-rw-r-- 1 boinc boinc 2792 Oct 22 20:45 cc_config.xml
-rw-r--r-- 1 root root 199113 Sep 25 06:41 ca-certificates.crt


This system seems to be working fine. On some Linux systems apt-get chooses an old version, and on other occasions a suggestion is made (here at Berkeley) to use a repository run by someone who keeps the latest and greatest. I assume they (the repository gurus) know what they are doing.

Anyway, I assume all the above is correct since it works but it seems a mix of too many hands in the pot.

[edit] AFAICT I am not a "member of boinc" and do not even know how to join a boinc group but this system seems to work.
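
For comparing a link's ownership with its target's, and for checking group membership, something like the following could be used instead of eyeballing ls -l. It is a sketch for Linux only (pwd and grp do not exist on Windows) and the function names are made up. As far as I know, access through a symlink is generally decided by the target's owner and mode, not the link's, which would explain why the mismatch is harmless.

```python
import grp
import os
import pwd
import stat


def _owner(st):
    """Return 'user:group' for a stat result, falling back to numeric ids."""
    try:
        user = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        user = str(st.st_uid)
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        group = str(st.st_gid)
    return f"{user}:{group}"


def describe(path):
    """Report mode and owner of a path and, for a symlink, of its target."""
    link = os.lstat(path)  # the entry itself; links are not followed
    info = {"mode": stat.filemode(link.st_mode), "owner": _owner(link)}
    if stat.S_ISLNK(link.st_mode):
        target = os.stat(path)  # follows the link to the real file
        info["target"] = os.path.realpath(path)
        info["target_mode"] = stat.filemode(target.st_mode)
        info["target_owner"] = _owner(target)
    return info


def in_group(user, group_name):
    """True if user is listed in group_name or has it as primary group."""
    try:
        group = grp.getgrnam(group_name)
    except KeyError:
        return False
    if user in group.gr_mem:
        return True
    try:
        return pwd.getpwnam(user).pw_gid == group.gr_gid
    except KeyError:
        return False


if __name__ == "__main__":
    print(describe("/var/lib/boinc-client/cc_config.xml"))
    print(in_group("jstateson", "boinc"))
```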
79) Message boards : GPUs : Opencl does not always to report the the correct gpu name to the client (Message 105861)
Posted 27 Oct 2021 by Profile Joseph Stateson
Post:
This is what I get for using really old graphics boards like the S9050 (AMD HD-7950 equivalent with more memory)

Windows 10 correctly reports I have a pair of RX-570 and a single S9050

AMD FirePro S9050
Radeon RX 570 Series
Radeon RX 570 Series


clinfo gets the name wrong, thinks I have three RX-570 BUT all the rest of the clinfo is correct: speed, ram, 1.2 instead of 2.0, etc


C:\ProgramData\BOINC>  clinfo | find /i "board name"
  Board name:                                    Radeon RX 570 Series
  Board name:                                    Radeon RX 570 Series
  Board name:                                    Radeon RX 570 Series

C:\ProgramData\BOINC>


and from event viewer: note the "GPU 2" RX-570 has unusual statistics!
BOINC (7.16.11) gets the same data as clinfo
Apps crunch just fine on all three boards with no errors, but the "third RX-570" takes longer as it is a little slower than a "real" RX-570

10			10/26/2021 9:17:17 PM	Starting BOINC client version 7.16.11 for windows_x86_64	
14			10/26/2021 9:17:18 PM	OpenCL: AMD/ATI GPU 0: Radeon RX 570 Series (driver version 2671.3, device version OpenCL 2.0 AMD-APP (2671.3), 4096MB, 4096MB available, 5095 GFLOPS peak)	
15			10/26/2021 9:17:18 PM	OpenCL: AMD/ATI GPU 1: Radeon RX 570 Series (driver version 2671.3, device version OpenCL 2.0 AMD-APP (2671.3), 4096MB, 4096MB available, 5243 GFLOPS peak)	
16			10/26/2021 9:17:18 PM	OpenCL: AMD/ATI GPU 2: Radeon RX 570 Series (driver version 2671.3, device version OpenCL 1.2 AMD-APP (2671.3), 12288MB, 12288MB available, 3226 GFLOPS peak)	


Thought the Boinc developers might want to know that it is not always the Boinc app with the problem; it is the OpenCL stuff causing the problem. At least here.
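
A mislabeled board like this can be spotted automatically by comparing the numbers reported under the same name. The sketch below parses event log lines of the form shown above; the regular expression is my own guess at that format, based only on these samples, and the function names are made up.

```python
import re
from collections import Counter, defaultdict

# Pattern written against the sample log lines only; other clients may differ.
PATTERN = re.compile(
    r"OpenCL: (?P<vendor>.+?) GPU (?P<index>\d+): (?P<name>.+?) \("
    r"driver version (?P<driver>[^,]+), "
    r"device version OpenCL (?P<opencl>[\d.]+) .*?, "
    r"(?P<ram_mb>\d+)MB, \d+MB available, "
    r"(?P<gflops>\d+) GFLOPS peak\)"
)


def parse_gpus(lines):
    """Extract index, name, OpenCL version, RAM and GFLOPS from log lines."""
    gpus = []
    for line in lines:
        m = PATTERN.search(line)
        if m:
            gpus.append({
                "index": int(m["index"]),
                "name": m["name"],
                "opencl": m["opencl"],
                "ram_mb": int(m["ram_mb"]),
                "gflops": int(m["gflops"]),
            })
    return gpus


def suspicious(gpus):
    """Return indexes of GPUs whose OpenCL version or RAM differs from the
    most common combination reported under the same name."""
    by_name = defaultdict(list)
    for gpu in gpus:
        by_name[gpu["name"]].append(gpu)
    flagged = []
    for group in by_name.values():
        common, _count = Counter(
            (g["opencl"], g["ram_mb"]) for g in group
        ).most_common(1)[0]
        flagged += [g["index"] for g in group
                    if (g["opencl"], g["ram_mb"]) != common]
    return flagged
```

Fed the three OpenCL lines above, this flags GPU 2, the board that is really the S9050. With only two boards under one name there is no majority, so the result should be taken as a hint and nothing more.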
80) Message boards : Questions and problems : Want info about the latest client: VS2019? 7.18? (Message 105850)
Posted 24 Oct 2021 by Profile Joseph Stateson
Post:
In theory: The BOINC master branch - the code you see when you first visit GitHub, or when you update a cloned copy, has the most recent accepted code contributions. It should compile and run, but that's not guaranteed: it's treated as 'untrusted', and given an odd version number - 7.17, 7.19, etc.


Which of course means, my 7.19 that I have compiled on my Ubuntu box may have differences in the code from anyone else's. Mine was from code downloaded from git-hub on the second of September. Worth making a note of the date your code is from in order to check whether issues have been resolved since then or even introduced!


I made a mod to 7.16.3 to improve the version number ;<)

root@h110btc:/usr/bin# ./boinc --version
7.16.3 x86_64-pc-linux-gnu Build:2020-02-22T12:14:55
root@h110btc:/usr/bin#

Changes automatically on every build!
81) Message boards : Questions and problems : Want info about the latest client: VS2019? 7.18? (Message 105845)
Posted 24 Oct 2021 by Profile Joseph Stateson
Post:
VS2019 builds OK on my Windows 10 test machine: it downloads and builds the dependency sources as required. Note - that makes it very slow for a first-time build. Make sure it's got a stable internet connection. It gets those sources via a git transfer, so make sure you've got a Git client and a 7-zip decompresser installed before you start.

But: v7.16.20 wasn't - and can't be - built using VS2019. I tried - my uploaded error messages led to https://github.com/BOINC/boinc/issues/4544#issuecomment-935787091


Thanks Richard!

I looked at your install.txt file but did not want to install either Git 2.33.0.2 or Tortoise because I have been using the Desktop Git. It does not have a command-line git function.

However, VS2019 actually has a GIT.EXE but it had to be added to the path. Once I did that I was able to build 7.16.20 (But it is named 7.19.0!)

The additions I had to make to Windows 10x64 (v) 21h1
--> VS2019 community
--> SDK: 22000.194.210911-1543.co_release_svc_prod1_WindowsSDK

In system path the following:
C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\Common7\IDE\CommonExtensions\Microsoft\TeamFoundation\Team Explorer\Git\cmd


I also made a few other changes before I figured out that git had to be in the path. I am going to undo those changes and verify that all I needed was for git.exe to be the right one.

The "debug" build took a long time, most of which was git'ing but I managed to build the client

3>Generating Code...
3>boinc_cli_vs2019.vcxproj -> D:\Projects\VSrepository\b_7_16_20\boinc-master\win_build\Build\x64\Debug\boinc.exe
========== Build: 3 succeeded, 0 failed, 0 up-to-date, 0 skipped ==========


I then executed the client and asked for the version:

D:\Projects\VSrepository\b_7_16_20\boinc-master\win_build\Build\x64>dir *.exe /s
 Volume in drive D is DATA
 Volume Serial Number is 7A49-C614

 Directory of D:\Projects\VSrepository\b_7_16_20\boinc-master\win_build\Build\x64\Debug

10/24/2021  02:23 PM         9,728,000 boinc.exe
               1 File(s)      9,728,000 bytes

     Total Files Listed:
               1 File(s)      9,728,000 bytes
               0 Dir(s)  1,801,046,863,872 bytes free

D:\Projects\VSrepository\b_7_16_20\boinc-master\win_build\Build\x64>cd debug

D:\Projects\VSrepository\b_7_16_20\boinc-master\win_build\Build\x64\Debug>boinc.exe --version
7.19.0 windows_x86_64



It seems to me the version should have been 7.17, by adding 1 to the 16.

So why is it 7.19?

On an unrelated question, if I found a boinc binary that was 7.17 what build was it taken from?
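
For reference, the odd/even convention quoted above in this thread (master builds get an odd minor number such as 7.17 or 7.19) can be checked mechanically. This is only a sketch of that stated convention; the function name is made up and it does not say which commit a given binary came from.

```python
def is_development_version(version):
    """Return True when the minor number is odd (e.g. 7.19.0).

    Assumes a "major.minor[.release]" string as printed by "boinc --version".
    """
    parts = version.split(".")
    minor = int(parts[1])
    return minor % 2 == 1


for v in ("7.16.20", "7.19.0"):
    label = "development (master)" if is_development_version(v) else "release branch"
    print(v, "->", label)
```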
82) Message boards : Projects : WCG OPNG sans OPN1 (Message 105843)
Posted 24 Oct 2021 by Profile Joseph Stateson
Post:
We had a long discussion on that right at the beginning of OPNG. You can't separate the CPU and GPU work units, so if you want 50, you will get 50.
I think that is total, though they probably won't send you enough GPU work anyway, so you will end up doing the CPU stuff.

It is a waste of everyone's time and resources, but that is the way they do it.
(I do Folding.)


On one of my NVidia systems, I set the venue to "no cpu" and "allow nvidia" but have not received any tasks from WCG.
On a linux ATI system that is open for CPU and GPU I get a boatload of CPU tasks and a dinghy load of GPU.

I was guessing there are no NVidia tasks available but a few ATI. Maybe there is work for Linux ATI but none for Windows NVidia?

Or is the problem getting the NVidia because I specified no CPU?

I poked around WCG but cannot find a list of applications. Their site is so different from other projects it is difficult to even find my own account. Is there a list of apps or even a server status page?
83) Message boards : Questions and problems : Want info about the latest client: VS2019? 7.18? (Message 105841)
Posted 24 Oct 2021 by Profile Joseph Stateson
Post:
I recently found out that VS2019 can be used to build Boinc. That is a huge jump from VS2013.

Was 7.16.20 built using VS2019?

I noticed there is a 7.18 but it is listed as an Android download. Is it correct to assume there will be a 7.18 release for Windows and Linux?

I just put VS2019 on my system. Is there anything special that needs to be done to build the latest? Not looking for a walkthrough but more of a heads-up.

Thanks for looking!

(Thanks for the upgrade to VS2019 !!!)
84) Message boards : GPUs : COVID-19 Project Not Sending GPU Tasks - My Config or a Project Issue? (Message 105832)
Posted 22 Oct 2021 by Profile Joseph Stateson
Post:
GPU work is intermittent. Maybe a coincidence, but I had the same question myself and just went to their site and poked around looking. In the meantime I picked up a few ATI tasks from them without having to do anything.

I assume you don't want to use your Vega for crunching, only the other, slower card.
Currently, I have 10 cpu tasks and 2 ati tasks running WCG with 7 other ati tasks ready to run. A few minutes ago there were no ati tasks.

My default profile for the above system is nvidia,ati: yes; intel: no; cpu: yes
I only do the covid


Some projects will not send tasks to systems with cards that are too slow and cannot complete before the deadline. I do not know if WCG does that.
85) Message boards : Questions and problems : Collatz Conjecture issue for many users (Message 105776)
Posted 16 Oct 2021 by Profile Joseph Stateson
Post:
Richard Haselgrove posted a link to a file on google drive that can fix the problem
https://boinc.berkeley.edu/forum_thread.php?id=14413&postid=105552#105552

put that file "ca-bundle.crt" at

\Program Files\Boinc
or at
\Program Files (x86)\Boinc

depending on what windows you are running, or where you put the executable. You need to be in elevated mode to be able to replace the old, bad file.

Collatz is a waste of computer time, a project good only for points, IMHO.
86) Message boards : Questions and problems : Hibernate requires manager to be restarted on resume. (Message 105772)
Posted 16 Oct 2021 by Profile Joseph Stateson
Post:
How are you hibernating? Did you assign "closing the lid" to hibernate instead of suspend?

I am guessing that both apps treat a "hibernate" as a shutdown. When starting back up, the client app runs because it is a service but the manager does not.

Looking at "app_control.cpp" I see the following case
            switch (got_signal) {
            case SIGHUP:
            case SIGINT:
            case SIGQUIT:
            case SIGKILL:
            case SIGTERM:
            case SIGSTOP:
--
"handle_exited_app"
--


Looks like the app just exits no matter what. When Boinc restarts it just attempts to pick up where it left off.
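
To make the guess above concrete, here is a small sketch of the check in that excerpt: a set holding the six signals from the switch statement and a test for membership. It runs on POSIX systems only, the function name is made up, and it does not model what the client actually does afterwards in handle_exited_app.

```python
import signal

# The six signals listed in the quoted switch statement, stored as plain ints.
LISTED_SIGNALS = {
    int(s) for s in (
        signal.SIGHUP, signal.SIGINT, signal.SIGQUIT,
        signal.SIGKILL, signal.SIGTERM, signal.SIGSTOP,
    )
}


def is_listed_signal(got_signal):
    """Return True if the terminating signal is one of the listed cases."""
    return int(got_signal) in LISTED_SIGNALS


print(is_listed_signal(signal.SIGTERM))  # True: SIGTERM is in the list
print(is_listed_signal(signal.SIGUSR1))  # False: not one of the listed cases
```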

IANE, just taking a guess as there is not much else to do since SETI closed down.

my 2c
87) Message boards : Questions and problems : Asteroids project (Message 105653)
Posted 4 Oct 2021 by Profile Joseph Stateson
Post:
I thought asteroids was an excellent project. If the principal investigator had written up a proposal to work with the near earth orbit program maybe he or she could have gotten a NASA or other type of grant. Just a guess. I was sorry to see SETI go. I would rather have seen SETI get a "hit" than asteroids-at-home get a "hit". That being said, the asteroid hit is much more likely than finding an Alien IMHO.
88) Message boards : Questions and problems : Suspend when non-BOINC CPU question. (Message 105643)
Posted 4 Oct 2021 by Profile Joseph Stateson
Post:
I had a similar problem with DVDFAB transcoding from Blu Ray to mp4. I found I had to exclude boinc from running when DVDFAB was processing or I would have problems. I am not using DVDFAB media server, just the ripper.

Do you have a coprocessor like NVidia CUDA or are you just using the CPUs?
89) Message boards : Questions and problems : HTTP error: Peer certificate cannot be authenticated with given CA certificates (with workaround) (Message 105615)
Posted 3 Oct 2021 by Profile Joseph Stateson
Post:
Ubuntu with 7.16.11 did not have a problem with the certificate, but 7.16.11 on Windows 10 did.
Removing the expired certificate "dst ca x3" from the Windows store had no effect, so I put it back into that store.
Removing it from ca-bundle.crt worked fine for gpugrid. No warning about the cert, but I did not get a work unit as none were available.
90) Message boards : Questions and problems : BOINC reports disk usage as just the space taken by the slots directory. (Message 104089)
Posted 24 Apr 2021 by Profile Joseph Stateson
Post:
Things look suspicious. I checked two projects "properties" and what they report agreed for Einstein but not for WCG.
WCG claimed 978mb but file sizes add up to only 230mb

Project Einstein@Home

This project disk usage 8.20 GB
All projects disk usage 17.15 GB
Allowed to use 1,676.60 GB
BOINC is using 32.15 MB
Free disk space 1,677.12 GB
Total disk space 1,862.89 GB
====from dos dir/s in that einstein project directory====
2329 File(s) 8,803,230,864 bytes

Total Files Listed:
2329 File(s) 8,803,230,864 bytes
2 Dir(s) 1,800,714,088,448 bytes free

D:\ProgramData\Boinc\projects\einstein.phys.uwm.edu>




Project World Community Grid

This project disk usage 978.47 MB
All projects disk usage 17.15 GB
Allowed to use 1,676.60 GB
BOINC is using 32.14 MB
Free disk space 1,677.11 GB
Total disk space 1,862.89 GB
===from dos dir/s in that wcg project directory=====
288 File(s) 230,854,116 bytes

Total Files Listed:
288 File(s) 230,854,116 bytes
2 Dir(s) 1,800,780,283,904 bytes free

D:\ProgramData\Boinc\projects\www.worldcommunitygrid.org>
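
The comparison above can be worked through with a few lines of arithmetic. This sketch assumes the Manager's "GB" and "MB" are binary units (1024-based); that assumption is not confirmed anywhere in this post, although it does reproduce the Einstein figure.

```python
def bytes_to_gib(n):
    """Convert a byte count to binary gigabytes (1024**3 bytes)."""
    return n / 1024**3


def bytes_to_mib(n):
    """Convert a byte count to binary megabytes (1024**2 bytes)."""
    return n / 1024**2


# Byte totals taken from the two "dir /s" listings above.
einstein_bytes = 8_803_230_864
wcg_bytes = 230_854_116

print(f"Einstein: {bytes_to_gib(einstein_bytes):.2f} GB on disk, 8.20 GB reported")
print(f"WCG: {bytes_to_mib(wcg_bytes):.2f} MB on disk, 978.47 MB reported")
# Einstein: 8.20 GB on disk, 8.20 GB reported
# WCG: 220.16 MB on disk, 978.47 MB reported
```

So Einstein agrees to two decimals, while WCG reports more than four times what the project directory actually holds.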
91) Message boards : Questions and problems : Hardware problems running BOINC finally debugged (Message 104086)
Posted 24 Apr 2021 by Profile Joseph Stateson
Post:
3.3V is on the 24 pin connector and also on any SATA connector.


I reconnected the old power supply to run some tests and the system is working fine as if there was no problem to start with.

I am guessing the 4+4 motherboard connector was not making good contact. The radiator of the CPU cooler presses hard against the wiring as that connector is directly underneath the radiator. Since this was not a modular power supply any contact problem has got to be on mombo or video board. Alternately, stress on the cables can open a solder joint at the connector. Usually smoke shows up when that happens. Right now the cables are all unstressed as the power supply is hanging above the system. System is fully loaded and working fine. I am going to poke the cables around and will use a magnifier to examine the contacts for any discoloration and put it back together.

When I tried to measure "ripple" I got 0.025 volts a/c. I also got the same 0.025 with voltmeter leads hanging loose in the air.
92) Message boards : Questions and problems : Hardware problems running BOINC finally debugged (Message 104076)
Posted 23 Apr 2021 by Profile Joseph Stateson
Post:
I was unable to check the 3.3 volt but I may look at it later. Not sure where to test that voltage on the motherboard. I replaced the Seasonic focus 650 bronze (non modular) with a Seasonic focus 850 platinum and that appears to solve the problem.

I found that when I pulled the pair of gtx1060 and put in a gtx1070-ti the system was unstable even with no CPU tasks running. It crashed within seconds of starting BOINC. This system, even with all CPUs working at 100% and the pair of gtx1060, never pulled over 400 watts and was usually good for several hours before rebooting.

That old bronze power supply must have a problem even rated at 650. I checked for ripple using an A/C voltmeter on the 12 and 5 but there was nothing obvious. Maybe the 3.3 volt had the problem?

I am currently running the 1070-ti with 8 or so cpu tasks and all seems OK. The voltage shown by CPU-ID is the same value as shown when using that older power supply.

93) Message boards : Questions and problems : Hardware problems running BOINC finally debugged (Message 104050)
Posted 21 Apr 2021 by Profile Joseph Stateson
Post:
Have a system with a 6 core Xeon (12 threads), 24gb ram and a pair of gtx1060 running WCG Covid apps that was consistently rebooting. Did not have this problem before with Einstein, Milkyway and WCG. I had swapped out a single RX570 for the pair of 1060s for testing purposes.

I found the problem was due to the eVga motherboard not handling transients and/or poor power supply regulation.

I had 6 CPU tasks suspended. When I resumed all 6 using a single command from Boinctasks, the system rebooted instantly. I connected the system to a wattmeter and powered it back up. As I resumed each CPU task, one at a time, the wattage jumped by 13 watts then settled down to plus 5 watts. I am guessing that surge was not handled properly by the x5675 power regulator.

I then looked at the pair of gtx 1060 using Tech Power Up's GPU-z. One of the GPUs (on the left side) went immediately into the PerfCap Reason warning: the blue color. The other GPU was ok until the Memory Controller Load went to 55%, then that warning kicked in. The warning is that the performance of the gpu is "Limited by Operating Voltage". I looked at another system I was running and the slot voltage and the 6 pin voltage were consistently 12.1. This system was in the mid to low 11 volts.

hope this helps someone.

94) Message boards : Questions and problems : Boinc refuses to get new work from Primegrid, it says "don't need". (Message 103996)
Posted 17 Apr 2021 by Profile Joseph Stateson
Post:
Newer versions of Boinc managers don't show about the panic mode anymore. It was considered unsettling to the users. Boinc will go to panic mode if required but just doesn't show it.
I use Boinctasks, not Boinc Manager. I don't know how anyone can use that poor quality simple pile of rubbish. In Boinctasks, I see "running" or "running high priority" under the status column. Why would that scare people? I guess they'd also remove the petrol gauge from a car in case it makes people nervous. The strange thing is, it only shows it on some of the tasks, and not the ones it should.


Select the tasks and look under properties for anything suspicious. BT has a message dialog box that can get long and many messages are ignorable. Filter on primegrid and look for anything suspicious. I once found a message in there that was too long, and when I stretched the box out I found a warning about not enough virtual memory that had been hidden.

I have not run primegrid for years and, in addition, they were not whitelisted in gridcoin the last time I looked. I do recall trying more than one task at a time and did not see an improvement worth the effort. However, it is possible for other projects to use the "other half" of the GPU if that project is allowed "halves". It gets more complicated when set to 0.2 and one gets 3 of one type, and the project using 0.5 cannot run and the project set to 0.2 is out of work. But that fixes itself when one of the three finishes.

Reading your comment about petrol reminds me of the MGB I bought new in 1970. The handbook recommended not checking the petrol with a naked torch. I always thought of that as a reflection on the manufacturer not updating the manual, but it could have applied to the drivers across the pond.
95) Message boards : Questions and problems : How in the hell do you stop the BOINC client from putting a password in gui_rpc_auth.cfg? (Message 103989)
Posted 17 Apr 2021 by Profile Joseph Stateson
Post:
Alternatively, you could download the source code and build your very own copy - it's all publicly available. You could even remove the generated alert message.


I have now done it on a number of occasions so it isn't that difficult.


I got tired of looking at error messages and did just that. The problem that triggered me to mod the BOINC app was literally 100's of ignorable error messages that obscured the only message that was worth reading.
96) Message boards : Questions and problems : Finding Bottlenecks (Message 103988)
Posted 17 Apr 2021 by Profile Joseph Stateson
Post:
I have been crunching for Rosetta@Home. On one host, real-time and CPU time are almost the same, but real-time is 1.5 to 2 times higher than the CPU time for another host. The host with the real-time longer has a better CPU and faster memory. Both BOINC directories are on SSDs. I cannot think what the bottleneck is.



According to this thread, the real time should be about the same. I do not run Rosetta anymore so I cannot check the validity of that post (dated April Fools' Day), but I did check the real time of some leaders and they were all in the 8 hour range regardless of the CPU speed.

[edit] I looked for your computers over at Rosetta but could not find you. I suspect, that if both of your systems are finishing the jobs in about 8 hours then all is ok.
97) Message boards : Questions and problems : host ID not matching ID at boincstats: cannot set resource share (Message 101595)
Posted 16 Nov 2020 by Profile Joseph Stateson
Post:
I asked about this back in August over at boinc stats. I got no response, but since things seemed to work OK it was no big deal.

By "work OK" I mean I can attach, detach projects, etc. We are using 7.16.11. This is my son's account that has the problem. He uses BAM!

His external CPID is fb4efe.... that matches his gridcoin address, the address shown at the BAM! main account page and a "find " shows the XML files in the boinc folder have the correct external cpid. I assume that fb4efe.. is correct.

However, the account manager xml files in that boinc folder show a "host_cpid" that is 90bc86d... I have no idea what that is and I never got a response at the BAM! forum. Is that the external_cpid? If so, it does not match the actual fb4efe... He has only 2 hosts at BAM!: one has a CPID of c4fd555... and the other's CPID is 2ae7f835. Neither is 90bc8ed...

The contents of the acct_mgr_request.xml and acct_mgr_reply.xml files show that 100% is used for all resource shares. The host IDs all match the host IDs at the project. I suspect the value 90bc86d... in those acct_mgr files needs to be fb4efe... Is that correct?

At the BAM! main account page there is an option to "change cpid" but the cpid they show in the dialog box is the external cpid. I do not want to change that as it takes a long time for gridcoin to sync and start paying and all indications are that the fb4efe.... is correct

I read the release notes for 7.16.11 and noticed the following "Client: if AM reply includes a project we're attached to under a different account, honor the params in the AM reply, e.g resource share"

I am not sure what that means. This system used to be mine. I deleted all my projects and added the BAM! manager to the boinc manager using his account at BAM! His account at BAM! was used to create project accounts. I looked through all my projects and I never had a "host_cpid" of 90bc86d...

I cannot figure out why he or I cannot set the resource share. We have no problem adding projects or setting other values.
98) Message boards : Questions and problems : possible to use boinccmd --quit when multiple clients are running? (Message 100881)
Posted 28 Sep 2020 by Profile Joseph Stateson
Post:
Read the User Manual for the Boinccmd tool.

The GUI rpc port can be specified as part of the hostname argument:

hostname can be a domain name, an IPv4 address, or an IPv6 address. If the client uses a non-default GUI RPC port, you can specify it as hostname:port, IPv4_addr:port, or [IPv6_addr]:port.



thanks, forgot about "--host hostname:port "
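
For anyone scripting this, here is a minimal sketch that builds the boinccmd argument list for a client on a non-default GUI RPC port, following the hostname:port and [IPv6_addr]:port forms quoted above. The helper name and the port 31418 are only examples; --host, --passwd and --quit are the boinccmd options being relied on.

```python
def build_quit_command(host, port, password=None):
    """Build the argument list for "boinccmd --host host:port --quit"."""
    # IPv6 literals contain colons, so they go in brackets before the port.
    target = f"[{host}]:{port}" if ":" in host else f"{host}:{port}"
    cmd = ["boinccmd", "--host", target]
    if password is not None:
        cmd += ["--passwd", password]
    cmd.append("--quit")
    return cmd


# Example: second client listening on port 31418 (hypothetical port number).
print(" ".join(build_quit_command("localhost", 31418)))
# boinccmd --host localhost:31418 --quit
```

The list form can be passed straight to subprocess.run() if the command should actually be executed.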

"boinccmd --help" is not the same as RTFM

I was setting up a second client so that gridcoin research could obtain project info for "user2" while the system was crunching for "user1". There was no need for user2 to crunch, just needed login info for gridcoin.

Has anyone ever considered allowing the manager to add the same project with a different username, i.e. account? Probably no need since multiple clients can run.
99) Message boards : Questions and problems : possible to use boinccmd --quit when multiple clients are running? (Message 100878)
Posted 28 Sep 2020 by Profile Joseph Stateson
Post:
I looked at the boinccmd command line arguments (7.16.11) but do not see a gui_rpc_port option.
Is there a way to cause one of the client tasks to gracefully quit using the boinccmd program?

If I use kill task does that cause the client tasks to write a checkpoint and exit?

I assume I can use the manager and then quit the manager causing one of the clients to stop but that is awkward

thanks for looking
100) Message boards : Projects : recaptcha not allowing new accounts but boincstats did it (Message 100858)
Posted 25 Sep 2020 by Profile Joseph Stateson
Post:
OK, seems that this is a known problem and a solution given there.

read a complaint here with no response by management

Tried Firefox, Chrome and the newest Edge and got the same error. One difference: Chrome did not pop up the "select [car] [train] [airplane] [crossroads]" dialog box. I have the Chrome extension PrivacyPass, which is supposed to allow skipping that once a valid login is done.

When connecting to Cosmology via boincstats I errored out 41 vbox tasks before I got around to looking at the task queue. Guess I should have looked at that first before trying to create a profile and join a team. VT is enabled and vbox installed. Not sure what the problem is, but 23 cpus were assigned out of the 24 available.

this shows up on all three browsers, hope it helps someone
Warning: Creating default object from empty value in /home/boincadm/project/html/user/create_profile.php on line 205
101) Message boards : Projects : external_cpid is empty for some projects (Message 100836)
Posted 23 Sep 2020 by Profile Joseph Stateson
Post:
gpugrid and einstein do not fill in
<external_cpid></external_cpid>


as a result the client_state.xml file has "missing" cpids.


I suspect they have no interest in fixing this as the problem only shows up on gridcoin apps.

Is there any harm in editing that file and filling in the CPID value, as I know what it is?

thanks for looking
102) Message boards : Projects : recaptcha not allowing new accounts but boincstats did it (Message 100835)
Posted 23 Sep 2020 by Profile Joseph Stateson
Post:
Not sure what is going on but I tried creating accounts for one of my kids at amicable, cas and cosmology and the recaptcha failed for edge and chrome. Ublock was disabled so should have worked.

I used boincstats to create his accounts and was able to log in for him at those sites.

Is this a problem with the projects or are chrome and edge misconfigured?

AFAICT active x (virus transfer protocol) is enabled on both browsers.
103) Message boards : Questions and problems : BOINC Manager V7.16.5 is issuing "annoying" messages which I would like to suppress (Message 97534)
Posted 12 Apr 2020 by Profile Joseph Stateson
Post:
With a number of "boinc rigs" I got tired of having to scroll through 1000's of annoying messages just to find that the ones I wanted to see had already been purged off the 2000 limit that boinc tasks has.

FWIW, I added a message filtering capability to cc_config:
<exclude_proj_msg>       <!-- put this in the flags section, not the options section -->
<proj_name></proj_name>  <!-- project names with the correct syntax are listed in event messages -->
<msg_type>low</msg_type>
<msg_content></msg_content>  <!-- if empty, all are excluded -->
</exclude_proj_msg>


the messages that were particularly annoying:
<msg_content>settings do not allow fetching tasks</msg_content>
<msg_content>No work available</msg_content>
<msg_content>No work sent</msg_content>
<msg_content>No work is available</msg_content>
<msg_content>Resent lost</msg_content>
<msg_content>resend lost</msg_content>
<msg_content>already reported</msg_content>



Unfortunately, it only works with my app
https://github.com/JStateson/MSboinc
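
The idea behind the filter can be sketched in a few lines. This is not the code from the repository above, just an illustration under two assumptions: matching is a case-sensitive substring test, and an empty msg_content excludes every message from that project, as the comment in the config says. The function and variable names are made up, and the msg_type field is ignored.

```python
def is_excluded(project, text, rules):
    """Return True if a message from the project matches any exclusion rule."""
    for rule in rules:
        if rule["proj_name"] != project:
            continue
        content = rule.get("msg_content", "")
        # An empty msg_content means every message from this project is dropped.
        if content == "" or content in text:
            return True
    return False


# Hypothetical rules using message fragments from the list above.
rules = [
    {"proj_name": "Einstein@Home", "msg_content": "No work sent"},
    {"proj_name": "Einstein@Home", "msg_content": "already reported"},
    {"proj_name": "Milkyway@Home", "msg_content": ""},
]

messages = [
    ("Einstein@Home", "No work sent"),
    ("Einstein@Home", "Computation for task finished"),
    ("Milkyway@Home", "Sending scheduler request"),
]

for project, text in messages:
    if not is_excluded(project, text, rules):
        print(project, "|", text)
# Einstein@Home | Computation for task finished
```

The list above includes both "Resent lost" and "resend lost", which is why the sketch keeps the match case-sensitive.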
104) Message boards : The Lounge : The Seti is Slumbering Cafe (Message 97297)
Posted 5 Apr 2020 by Profile Joseph Stateson
Post:
Just passed to say hello. I'm still alive.

My alcohol stock are depleting faster than the Seti WU.
At this rate it will last for only about one week only and i can refill it due the prohibition.
Did any one have a good link where i could read about how to produce moonshine in doors?


I picked up a water distiller for use with my Keurig machine as I got tired of cleaning the coffee maker. This was shortly before the corona virus hit. Timing worked out nicely as it is hard to find distilled water. About 1/3 of the distillers listed by amazon did spirits and more show up googling "spirit distillers". My brother attended a renaissance fair 20 years ago and got into making mead and has a belly to prove it.

Have a nice day!



"Class 6" store at Ft Sam Houston
105) Message boards : Questions and problems : rosetta@home tasks don't download (Message 96810)
Posted 16 Mar 2020 by Profile Joseph Stateson
Post:
Network managers have a lot of tools at their disposal. In addition to executables, the firewall can be programmed to block zips or allow zips but not those with executables inside. Depends on how secure or "locked down" they want the system to be. I am guessing that there is also overhead (time) expended in examining the data which might cause the sender or receiver to give up.
106) Message boards : The Lounge : Best Mining Motherboards for BOINC? (Message 96704)
Posted 12 Mar 2020 by Profile Joseph Stateson
Post:
Got to looking at a 7 slot PCIe server: dual Xeon 1366 for mining. Multicore Xeons are considerably cheaper than any comparable gen 6 or gen 7.
https://www.ebay.com/itm/Xyratex-0944037-02-Dual-Socket-1366-Server-System-Motherboard/392343147646?ssPageName=STRK%3AMEBIDX%3AIT&_trksid=p2057872.m2749.l2649

Worked fine with Linux, although I did have a problem with the AMD driver, but that was due to the GPUs being unusual: "s9050". Windows 7 and 8.x worked fine with those low power HD7950 equivalents.
Currently have 8.1 installed only because I had an unused license.

DDR3 ECC server memory is cheap. The mombo is strictly for mining as it has no x16 slots and is not ATX or EATX size. Screw holes do not line up even with the mining rack.
Power supply requires an extra 8 pin but otherwise uses standard ATX power, and some supplies have the extra 8 pin.
It can do both Gen2 and Gen1 but not Gen3.

I looked at inserting a license into the bios but didn't try. If one of my TB85 boards craps out I may get another and try my bios mod.
https://www.bios-mods.com/forum/Thread-xyrantex-two-different-bios-onboard-mombo
107) Message boards : Projects : Can we Cure Corona (Message 96676)
Posted 11 Mar 2020 by Profile Joseph Stateson
Post:
Insane, grocery here is out of Corona beer. How silly can people get!


FOUND PROBLEM!

poteet strawberry festival bought out all the corona beer from local grocery

https://www.mysanantonio.com/food/article/Deal-of-the-day-Pandemic-deal-includes-six-15123089.php
108) Message boards : Projects : Can we Cure Corona (Message 96626)
Posted 10 Mar 2020 by Profile Joseph Stateson
Post:
Insane, grocery here is out of Corona beer. How silly can people get!
109) Message boards : Questions and problems : Highest PPD project on Nvidia GPU outside of Collatz? (Message 96609)
Posted 10 Mar 2020 by Profile Joseph Stateson
Post:
Unaccountably, GPUgrid has work. Starting about 2 weeks ago I noticed my queue was full on all systems. I had set GPUgrid to 100 resource and Einstein to 0 months ago and I rarely saw a GPUgrid task. Big change. IMHO GPUGrid pays more GRC coins than other projects.

Currently, I have 18 GPUgrid executing (really need GTX-1060 or higher) and 22 queued. About 2 out of 3 are Linux.

My ATI boards all run Milkyway.
110) Message boards : BOINC client : purpose of XMLs: global_prefs and global_prefs_override on "leave apps in memory" (Message 96313)
Posted 3 Mar 2020 by Profile Joseph Stateson
Post:
Never paid much attention to any of those files. They show up during an install of boinc.
My "global_prefs.xml" is from WCG. Just looked at 4 Linux systems and one windows. They are all from WCG and I assume the rest of my "farm" also have WCG as the default source for parameters in global_prefs.xml. The first project I ever signed up for was SETI so unsure of how WCG got the default.

LHC at home has a warning that it does not checkpoint multi-core apps and recommends they be left in memory when restarting boinc. This applies only to Linux systems, and I was unaware of this until after I restarted boinc. It seems that the tasks resided in memory as the times did not get reset to 0, so I got to looking at why they were not reset.
jstateson@jysdualxeon:/var/lib/boinc$ grep "leave_apps" *.xml
global_prefs_override.xml:   <leave_apps_in_memory>0</leave_apps_in_memory>
global_prefs.xml:  <leave_apps_in_memory>0</leave_apps_in_memory>
global_prefs.xml:    <leave_apps_in_memory>1</leave_apps_in_memory>
global_prefs.xml:    <leave_apps_in_memory>1</leave_apps_in_memory>
global_prefs.xml:    <leave_apps_in_memory>0</leave_apps_in_memory>
sched_request_einstein.phys.uwm.edu.xml:   <leave_apps_in_memory>0</leave_apps_in_memory>
sched_request_einstein.phys.uwm.edu.xml:  <leave_apps_in_memory>0</leave_apps_in_memory>
sched_request_einstein.phys.uwm.edu.xml:    <leave_apps_in_memory>1</leave_apps_in_memory>
sched_request_einstein.phys.uwm.edu.xml:    <leave_apps_in_memory>1</leave_apps_in_memory>
sched_request_einstein.phys.uwm.edu.xml:    <leave_apps_in_memory>0</leave_apps_in_memory>
sched_request_lhcathome.cern.ch_lhcathome.xml:   <leave_apps_in_memory>0</leave_apps_in_memory>
sched_request_lhcathome.cern.ch_lhcathome.xml:  <leave_apps_in_memory>0</leave_apps_in_memory>
sched_request_lhcathome.cern.ch_lhcathome.xml:    <leave_apps_in_memory>1</leave_apps_in_memory>
sched_request_lhcathome.cern.ch_lhcathome.xml:    <leave_apps_in_memory>1</leave_apps_in_memory>
sched_request_lhcathome.cern.ch_lhcathome.xml:    <leave_apps_in_memory>0</leave_apps_in_memory>
sched_request_milkyway.cs.rpi.edu_milkyway.xml:   <leave_apps_in_memory>0</leave_apps_in_memory>
sched_request_milkyway.cs.rpi.edu_milkyway.xml:  <leave_apps_in_memory>0</leave_apps_in_memory>
sched_request_milkyway.cs.rpi.edu_milkyway.xml:    <leave_apps_in_memory>1</leave_apps_in_memory>
sched_request_milkyway.cs.rpi.edu_milkyway.xml:    <leave_apps_in_memory>1</leave_apps_in_memory>
sched_request_milkyway.cs.rpi.edu_milkyway.xml:    <leave_apps_in_memory>0</leave_apps_in_memory>
sched_request_www.worldcommunitygrid.org.xml:   <leave_apps_in_memory>0</leave_apps_in_memory>
sched_request_www.worldcommunitygrid.org.xml:  <leave_apps_in_memory>0</leave_apps_in_memory>
sched_request_www.worldcommunitygrid.org.xml:    <leave_apps_in_memory>1</leave_apps_in_memory>
sched_request_www.worldcommunitygrid.org.xml:    <leave_apps_in_memory>1</leave_apps_in_memory>
sched_request_www.worldcommunitygrid.org.xml:    <leave_apps_in_memory>0</leave_apps_in_memory>


From the above it looks like the default WCG caused all projects to have the same parameters
(global),generic, home, school, work respectively (0),0,1,1,0 for "leave apps in memory"
I recall going to WCG and setting the 0,1,1,0 so can I assume that caused all the other projects to get set the same way ?
Looking at the "override" which shows "0" and reading this
https://boinc.berkeley.edu/trac/wiki/PrefsOverride
It appears that the override is not applied automatically as an RPC call needs to be made to force it. Is that correct?
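
The grep output above is easier to compare when the values are grouped per file. The sketch below does only that: it parses lines of the form "file.xml: <leave_apps_in_memory>N</leave_apps_in_memory>" and collects the values in order. Which value belongs to which venue is not read from the files here; the function name is made up and the sample is a subset of the output above.

```python
import re
from collections import defaultdict

# Matches one line of the grep output shown above.
GREP_RE = re.compile(
    r"^(?P<file>[^:]+):\s*<leave_apps_in_memory>(?P<value>[01])</leave_apps_in_memory>"
)

SAMPLE = """\
global_prefs_override.xml:   <leave_apps_in_memory>0</leave_apps_in_memory>
global_prefs.xml:  <leave_apps_in_memory>0</leave_apps_in_memory>
global_prefs.xml:    <leave_apps_in_memory>1</leave_apps_in_memory>
global_prefs.xml:    <leave_apps_in_memory>1</leave_apps_in_memory>
global_prefs.xml:    <leave_apps_in_memory>0</leave_apps_in_memory>
"""


def values_by_file(grep_output):
    """Group the 0/1 settings by file name, keeping their order of appearance."""
    result = defaultdict(list)
    for line in grep_output.splitlines():
        m = GREP_RE.match(line.strip())
        if m:
            result[m["file"]].append(int(m["value"]))
    return dict(result)


for name, values in values_by_file(SAMPLE).items():
    print(name, values)
# global_prefs_override.xml [0]
# global_prefs.xml [0, 1, 1, 0]
```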

What is strange is that the Atlas app was "generic" venue and that was "0" in all of the above so it should have started from scratch but it seems to have been left in memory and picked up where it left off which is OK with me. OTOH, the project may have implemented checkpoint and didn't bother to update the warning.

If the venue is "work" and a "work" app is running on project X, and that venue requires the app be left in memory, I assume that BOINC will only leave that app in memory and not other venues or other project apps.
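For anyone wanting to repeat the comparison, here is a minimal sketch of grouping that grep output per file and labelling the five values. The venue order is taken from the description above. The function names are made up for illustration, and it assumes each file contributes exactly five lines in that order.

```python
import re
from collections import defaultdict

# Order of the five values per file, as described in the post
# (an assumption about how the grep lines map to venues).
VENUES = ["(global)", "generic", "home", "school", "work"]

LINE_RE = re.compile(
    r"^(?P<file>[^:]+):\s*"
    r"<leave_apps_in_memory>(?P<val>[01])</leave_apps_in_memory>\s*$"
)

def group_by_file(grep_lines):
    """Collect the 0/1 values per sched_request file, in order of appearance."""
    grouped = defaultdict(list)
    for line in grep_lines:
        match = LINE_RE.match(line)
        if match:
            grouped[match.group("file")].append(int(match.group("val")))
    return dict(grouped)

def label_venues(values):
    """Pair the five values with the venue names; refuse anything else."""
    if len(values) != len(VENUES):
        raise ValueError("expected %d values, got %d" % (len(VENUES), len(values)))
    return dict(zip(VENUES, values))

# The Milkyway lines from the listing above.
SAMPLE = [
    "sched_request_milkyway.cs.rpi.edu_milkyway.xml:   <leave_apps_in_memory>0</leave_apps_in_memory>",
    "sched_request_milkyway.cs.rpi.edu_milkyway.xml:  <leave_apps_in_memory>0</leave_apps_in_memory>",
    "sched_request_milkyway.cs.rpi.edu_milkyway.xml:    <leave_apps_in_memory>1</leave_apps_in_memory>",
    "sched_request_milkyway.cs.rpi.edu_milkyway.xml:    <leave_apps_in_memory>1</leave_apps_in_memory>",
    "sched_request_milkyway.cs.rpi.edu_milkyway.xml:    <leave_apps_in_memory>0</leave_apps_in_memory>",
]

if __name__ == "__main__":
    for name, values in group_by_file(SAMPLE).items():
        print(name, label_venues(values))
```

If every project prints the same mapping, that would be consistent with one set of preferences having been propagated to all of them, though it does not prove which project it came from.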
111) Message boards : BOINC client : Client lost track of how many CPUs were allocated. (Message 96296)
Posted 3 Mar 2020 by Profile Joseph Stateson
Post:
Do not want to repost everything that is over here
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5328#41807

If you add up running tasks you are 7 short of the possible 24. I think the client allocated 8 CPUs for the first "atlas" but that project app only picked up one cpu, not 8. If it has only 1 CPU (not clear from looking at mpstat) then it will take weeks to finish. In any event, there are 7 unused threads and waiting WCG tasks that are unaccountably idle.
The image below is from boinctasks, but the manager shows exactly the same thing.

112) Message boards : Questions and problems : 200 tasks downloaded for a project with a resource share of 0? (Message 96026)
Posted 24 Feb 2020 by Profile Joseph Stateson
Post:
I think what happens is that when your system contacts the projects it asks for data before doing anything else.

For example: Your system is in venue "default" and you go to the project and you set resource to 0.0 for venue "school" and select "school" for your system.
The next time your system contacts the project it asks for data and gets a boatload and then is told to switch to "school"
Thereafter, the next time it contacts the project its "school" venue says it can have only 1 work unit but only after finishing all tasks.

This has happened to me more times than I can remember. Suggest let a few tasks finish before aborting them all or you might get on the 24 hour blacklist.
113) Message boards : Questions and problems : AMD's Radeon open compute (RocM) has problem with boinc (Message 96003)
Posted 22 Feb 2020 by Profile Joseph Stateson
Post:
yea, next big thing if you buy the best AMD motherboard and GPUs money can buy. Not going to repost everything from that thread over at Einstein; summary below as it applies to Linux:

ROCm or rocM or WTF it is called requires you have as many GEN3 PCIe lanes to the GPU as you have GPUs. It is part of all recent amdpro drivers, just needs to be activated to work.

Once installed, if you do not have enough "PCIe atomics" (nice word for gen3) then those GPUs not having a gen3 lane to the CPU disappear for apps like boinc. They will show up using, for example, "clinfo" or "sensors" or any diagnostics program. So, like me, if you have 5 GPUs but boinc only sees a single one and all the diagnostics you run indicate there are 4 more GPUs available, you start suspecting that something is wrong with BOINC, not realizing the problem is the AMD driver that wants to run only on a superhighway and ignores all the little x1 back roads.
114) Message boards : Questions and problems : AMD's Radeon open compute (RocM) has problem with boinc (Message 95996)
Posted 21 Feb 2020 by Profile Joseph Stateson
Post:
Been running the tests at Einstein using their beta app that relies on the RocM driver. Boinc does not see multiple GPUs, only the one in the X16 slot. Going to post url to the message over at Einstein rather than duplicate it all here. I actually was unaware of this driver prior to seeing the complaint over there that the app was not working.

https://einsteinathome.org/content/clbuildprogramfailure-02mdf-gw-opencl-ati?page=1#comment-175716
115) Message boards : GPUs : PCI express risers to use multiple GPUs on one motherboard - not detecting card? (Message 95782)
Posted 9 Feb 2020 by Profile Joseph Stateson
Post:
This thread is really off topic. I am declaring it a troll thread and am providing tools to identify abusers.
My pick for Peter is the "Wank - o - meter". Rest of you can decide what you are
Simplified - Numeric
    TROLL-O-METER
0 1 2 3 [4] 5 6 7 8 9 10 


Simplified - Legend
    T R O L L - O - M A T I C 
PATHETIC ---------+---------INSPIRED 


Simplified - with watchdog
 0  1  2  3  4  5  6  7  8  9 10 
+------------------------------+ 
|**********************        | 
|**********************        | 
+------------------------------+ 
|            (o o) 
| -------oOO--(_)--OOo-------- 


Move along folks. Nothing to see here. Just a troll having a seizure.  Show's over. Keep it moving.
TROLL-O-METER 
1 2 3 4 5 6 7 8 9 +10 +20 
|||||||||||||||||-||||||| 

BULLSHIT-O-METER 
1 2 3 4 5 6 7 8 9 +10 +20 
|||||||||||||||||||||||| 

DUMBASS-O-METER 
1 2 3 4 5 6 7 8 9 +10 +20 
|||||||||||||||||||||||| 


Try to be a bit more subtle next time. Thanks for playing!
---------------------- 
0-1-2-3-4-5-6-7-8-9-10 
---------------------- 
^

Better, but no bite. I am uninterested in getting into a flame war with you no matter how many personal attacks you make. If you insist on having a fight, go ahead and start without me.
.----------------------------------------. 
[ reeky neighborhood watch Troll-O-Meter] 
[---0---1---2---3--4--5--6--7--8---9---] 
[||||||||| ] 
'----------------------------------------'


 Not bad, as trolls go. The Lame-o-meter was off scale on the highest range and the Clue-o-meter wouldn't register. 
0  1  2  3  4  5  6  7  8  9  10 
_________________________________ 
|  |  |  |  |  |  |  |  |  |  | 
---------------------------------
                                ^ 
                                | 


OMG! They said it couldn't be done, but this post rates: 
+--------------------------+ 
|           4 5            |
|        3       6         | 
|     2             7      | 
|   1                 8    | 
|  0         o         9   | 
|  -1      /               | 
|   -2    /                | 
|     -3 /                 | 
+--------------------------+ 
|      Troll-O-Meter       | 
'--------------------------'
Warning, troll-o-meters are susceptible to particularly stupid statements, and may give inaccurate readings under such conditions.  A shrill response generally indicates an effective complaint. 

 
 /_______________________/| 
| TROLL-O-METER(tm)      || 
|                        || 
| .---  .---             || 
|     |     | millitrolls|| 
| .---'     |    _  _    ||
| |         | . | || |   || 
|  ---'     '   `-'`-'   |/
`------------------------'



A bit better, but still not enough to hook me. Better luck on your next cast! 
+----------------------+ +----------------------+ 
|0 1 2 3 4 5 6 7 8 9 10| |0 1 2 3 4 5 6 7 8 9 10| 
| \ TROLL-O-METER      | | WANK-O-METER    /    | 
|  \                   | |                /     | 
|   \                  | |               /      | 
|    \                 | |              /       | 
|     \                | |             /        | 
|      \               | |            /         | 
|       \              | |           /          | 
|Certifie\  next cal:  | |Certified / next cal: | 
| NIST    \ date 12/5  | | NIST    / date 12/5  | 
+----------------------+ +----------------------+ 



*BANG* 

That was the Troll-O-Meter exploding. 
Fell for that one hook, line, and sinker.

                    # 
                   ; 
                  ; 
                       @ 
+----------------+   ,.' 
| .0 .2 .4 .8 1.0|   ,.' 
|              ' | 
|              ` | ''..''..''% 
+------------`    | 
| Troll-O-Met`  (((o 
+-----------`      '., 
                      '., 
             ;            # 
               ; 
                * 
       _ 
 _____|_|_____ 
|   PLEASE!   | 
|-------------| 
| Do NOT Feed | 
|  The Troll  | 
|_____________| 
      | | 
      | | 
  \ \ | | / /

                                            .:\:/:.
         +-------------------+            .:\:\:/:/:.
         |   PLEASE DO NOT   |           :.:\:\:/:/:.:
         |  FEED THE TROLLS  |           :=.' - - '.=:
         |                   |           '=(\ 9 9 /)='
         | Thank you,        |             (  (_)  )
         |     Management    |             /`-vvv-'\
         +-------------------+            /         \
                 | |         @@@         / /|,,,,,|\ \
                 | |         @@@        /_//  /^\  \\_\
   @x@@x@        | |          |/        WW(  (  )  )WW
   \||||/        | |         \|          __\,,\ /,,/__
    \||/         | |          |     jgs (______Y______)
/\/\/\/\/\/\/\/\//\/\\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\
116) Message boards : Questions and problems : Manager without client (Message 95749)
Posted 8 Feb 2020 by Profile Joseph Stateson
Post:
I am guessing you already tried

sudo apt-get install boinc-manager

If you want the latest boinc version be sure to do the below first

sudo add-apt-repository ppa:costamagnagianfranco/boinc

However, there might be a problem with accidentally getting the boinc client when doing an update.

I assume the systems you want to control are on the local subnet. You might want to put in a remote desktop program such as VNC and use that.
117) Message boards : GPUs : PCI express risers to use multiple GPUs on one motherboard - not detecting card? (Message 95736)
Posted 7 Feb 2020 by Profile Joseph Stateson
Post:

I can't even service one GPU with that CPU. It's doing MW and Einstein Gamma instead. No matter how many Gravity tasks I put onto one GPU, the CPU can't keep up. I can max out all four cores of the CPU, while the GPU sits at 30%


You mention gravity and also gamma. I quit doing gravity due to low credit and problems with drivers on older boards. With "weak" CPU, the gamma ray pulsar seems to run nicely on ATI boards, not so on NVidia. This was discussed over at Einstein and it seems to be the way the polling is implemented as well as the hardware.

Following is a Linux system with a really cheap 2 core Celeron G1840. The motherboard is an Hl61BTC that I got down from the attic, and I put in a pair of RX560 and a pair of RX570.

The 11 minute 44 sec completion is typical for the RX570. It is not obvious from the picture, but the RX560 typically takes 25 - 29 minutes. Note the CPU usage is at most 22%, which works out nicely for the Celeron.



On the other hand, the nvidia board uses up a full cpu. Both systems run 18.04

118) Message boards : Questions and problems : boinc.exe Phishing detection (Message 95701)
Posted 5 Feb 2020 by Profile Joseph Stateson
Post:
have same malwarebytes version but not seeing any problem with boinc. what projects are you subscribed to?
119) Message boards : Questions and problems : Preventing BOINC from using GPU (Message 95667)
Posted 4 Feb 2020 by Profile Joseph Stateson
Post:
I would go about in reverse.

There should be a setting in config, where you can (only) use a specific graphics card (or cards).
See:
https://boinc.berkeley.edu/wiki/GPU_computing


I agree, but you should not stop at just a config setting

First, the <exclude_gpu> should be renamed to <use_gpu> so that it can be toggled between "0" and "1". Currently, to re-enable use of an excluded GPU the xml code must be commented out or deleted and a "read config" command issued unless you want to reboot.
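To make the current mechanism concrete, here is a rough sketch that turns a per-GPU on/off map into the existing <exclude_gpu> entries for cc_config.xml. The function name, the on/off map and the example URL are made up for illustration; the url, device_num and type child elements are the ones I understand the client to accept, so check the client configuration docs before relying on it.

```python
import xml.etree.ElementTree as ET

def build_cc_config(project_url, gpu_type, enabled):
    """Build cc_config XML that excludes every GPU whose flag is False.

    enabled maps a device number to True (use it) or False (exclude it).
    """
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    for device_num in sorted(enabled):
        if enabled[device_num]:
            continue  # a GPU in use needs no entry at all
        entry = ET.SubElement(options, "exclude_gpu")
        ET.SubElement(entry, "url").text = project_url
        ET.SubElement(entry, "device_num").text = str(device_num)
        ET.SubElement(entry, "type").text = gpu_type
    return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    # Placeholder URL: substitute the project's real master URL.
    print(build_cc_config("https://example.org/project/", "NVIDIA",
                          {0: False, 1: True, 2: False}))
```

Re-enabling a GPU then means regenerating the file with that flag set to True and having the client re-read its config, which is the same round trip described above.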

What could be done is up to the imagination:

From boinc manager, select a project
Select "properties" of that project
Imagine observing a bunch of check-boxes, one for each GPU and in group boxes named "NVidia" or "ATI" or "Intel". If you uncheck a box [X] boinc is told to put a '0' on the appropriate <use_gpu> and issue a "read config". Would look something like the following
120) Message boards : BOINC client : Concerns on windows development using older VS compilers: they have bugs (Message 95609)
Posted 1 Feb 2020 by Profile Joseph Stateson
Post:
Have been reading about changes to Visual Studio after 2013 at various web sites and how the improvements have helped.

I did try once to compile boinc client under VS2017 but gave up as way too many compiler problems. Clearly the software needs to be compatible with several platforms and a lot of different hardware and the newer compilers deviate a lot from Linux compiler.

Want to mention a problem I had a few days ago when using VS2013 with latest patches. AFAICT this is the latest compiler that works with BOINC.

I was using the VS2013 debugger to step through x64 code as I wanted to see what had been read in from the coproc_info.xml file. I had done a "clean" followed by a build all and the program seems to be working just fine.

When stepping through I noticed I was stepping through grayed out code. That should not have happened. Grayed out code is supposed to be excluded. I stopped the debugger to investigate. The code below the "SIM" was grayed out.
#ifndef SIM
// alert user if any jobs need more RAM than available
// (based on RAM estimate, not measured size)
//
static void check_too_large_jobs() {
    unsigned int i, j;
    double m = gstate.max_available_ram();


According to VS2013 IntelliSense, the macro "SIM" was set to 1. That was not possible as I was not running a simulation and clearly the code was being executed even though it was grayed out.

I added "#undef SIM" right before the "#ifndef SIM" and that cleared up the grayed out text.
Wanting to figure out what was going on and where SIM got set to "1", I started moving my "#undef SIM" higher up in the source code. When I got to the following, the grayed out code returned

#endif
#undef SIM
    if (strlen(host_info.virtualbox_version)) {
        msg_printf(NULL, MSG_INFO,
            "VirtualBox version: %s",
            host_info.virtualbox_version
        );
    } else {
#if defined (_WIN32) && !defined(_WIN64)
        if (!strcmp(get_primary_platform(), "windows_x86_64")) {
            msg_printf(NULL, MSG_USER_ALERT,
                "Can't detect VirtualBox because this is a 32-bit version of BOINC; to fix, please install a 64-bit version."
            );
        }
#endif
    }
}
---
lot of code is here
---
#ifndef SIM
// alert user if any jobs need more RAM than available
// (based on RAM estimate, not measured size)
//
static void check_too_large_jobs() {
    unsigned int i, j;
    double m = gstate.max_available_ram();


I got the grayed out code to disappear (back to normal) after moving the #undef SIM down below the following #endif.
Something was confusing the compiler. I also noticed it took maybe 10-15 seconds** before VS2013 "grayed" or "ungrayed" any code. I have never seen delays that long, but this is a huge program.

** it was taking 10-15 seconds depending on how far away the #undef SIM was from the #ifndef SIM problem area.

Want to point out that the program compiles just fine, it is just the VS2013 "IntelliSense" that was unable to figure out what to color as gray or not.
The next day when I looked at this, the problem was gone. VS2013 had been closed, and the system had been rebooted after a Microsoft update to win10.
121) Message boards : BOINC client : 7.16.4 release info? (Message 95531)
Posted 26 Jan 2020 by Profile Joseph Stateson
Post:
Did not see anything about any 7.16

Poking around at GitHub I noticed that changes to source involve updating virtual box from 5 to 6. Think that was just 16.3 => 16.4
I know a full release notice will be made eventually, but is there a way I can check at GitHub to see what has been changed from 0->1->2->3 and then the 4?
122) Message boards : Projects : GPUgrid not always resuming tasks correctly (Message 95359)
Posted 20 Jan 2020 by Profile Joseph Stateson
Post:
Verified a solution to the problem that I had guessed

6 GPUs: d0 is 1660Ti (cc7 not supported by Asteroids at home) rest are gtx class (CC 6.0)

d0 was an excluded GPU for Asteroids.

d1..d5 were excluded GPUs for Einstein as it was OK for Einstein to use the gtx1660Ti

Unaccountably, d0 was idle. This is a different problem for a different post.

A gpugrid was running on d1 and in slot 0
There were 4 Asteroids running.

I did not want an idle gpu so I suspended all tasks, went to slot 0 and deleted the checkpoints. This forces gpugrid to start over.

After rebooting, I resumed gpugrid first. That got it d0, the best GPU, and it started up in slot 0 which is where the app was suspended in the first place. All the other projects resume from suspension just fine and there are no idle gpus and no gpugrid "computation error" since it started from scratch, not where it was left off on a gtx1070.
123) Message boards : GPUs : PCI express risers to use multiple GPUs on one motherboard - not detecting card? (Message 95349)
Posted 20 Jan 2020 by Profile Joseph Stateson
Post:
circling back to the original discussion. I signed up for Einstein at home and did some PCIe testing.

I had always heard from other users that Einstein was PCIe dependent, to the point that anything less than x16 links caused tasks to run slower. but actual testing on numerous different cards and PCIe lane widths shows that's not true. Einstein is even less PCIe dependent on both the Gamma Ray and Gravity Wave tasks. I saw about 1% PCIe bus use on both types of tasks on just a PCIe 3.0 x1 link, so it's no surprise that you haven't seen a slowdown. In light of this, it looks like SETI actually uses more PCIe bandwidth (at least on the optimized CUDA special App). Maybe in the past with old tasks Einstein used to have more reliance on PCIe, but it does not appear to be the case anymore.

As far as how many cards you can run, you will have to test and find the limiting factor of how many GPUs can be attached before the system will no longer boot. my guess is it will be somewhere between 3-7 GPUs. no way to tell without

The next limit will be CPU resources to support the GPU tasks. you only have a 4c/4t CPU, and a rather old/weak one at that compared to modern chips. luckily Gamma-ray tasks don't seem to mind running on a weak CPU, but you may have bad results with the Gravity wave tasks.

you'll have to test the impact to Milkyway though. I'm not going to attach to that one with the machines I have now since it relies so heavily on DP performance and recent Nvidia cards like the ones I have have abysmal DP performance for the cost/power use. I might build a Radeon VII based system in the future for Milkyway though, that's the best bang for buck card on that project.


That gravity wave "2.07" consistently takes 100% of CPU on my 4/8t and I had to limit concurrent tasks to 6 (not just 8) and also exclude the Zotac P106-90 card which was OK on SETI but not too useful on Einstein. However, Asteroids at home uses only 0.01 CPU and that seems to work OK on the two slowest GPUs. Currently running 6 of the gravity and 2 Asteroids. Maybe you can comment on this post and bump up my question.

[edit] Should be running 3 Asteroids as there are a total of 9 GPUs. the cpu count should be six of the 1.0 and three of the 0.01 but unaccountably only 2 Asteroids are running. Took a while and scheduling priority went from -1,000 to only -0.29 but I am not running the additional Asteroids. Something in 7.16.3 I think
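A quick sanity check of that CPU count, as a sketch: six tasks at 1.0 CPU plus three at 0.01 CPU need 6.03 threads, which fits in 8. The function names are invented here, and this is only the arithmetic, not how the BOINC scheduler actually decides what to run.

```python
def cpu_demand(task_mix):
    """Sum CPU needed for a list of (task_count, cpu_per_task) pairs."""
    return sum(count * cpu for count, cpu in task_mix)

def fits(task_mix, threads):
    """True if the combined CPU demand does not exceed the thread count."""
    return cpu_demand(task_mix) <= threads

if __name__ == "__main__":
    mix = [(6, 1.0), (3, 0.01)]   # 6 gravity wave tasks, 3 Asteroids tasks
    print(round(cpu_demand(mix), 2), fits(mix, 8))
```

So on paper there is room for the third Asteroids task, which is why the idle GPU looks like a scheduling issue and not a shortage of threads.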
124) Message boards : GPUs : PCI express risers to use multiple GPUs on one motherboard - not detecting card? (Message 95320)
Posted 19 Jan 2020 by Profile Joseph Stateson
Post:
Well, there are no publishable pictures of the complete beast. Indeed the complete beast is very boring to look at, just a large box with power, data and cooling connections.
Initially a small system was tested using RTX2080 to give an idea of what feeders were going to be needed. Next tests were with earlier Quadro which left the RTX2080 behind, after six months (and some mods to the cooling) the RTX8000 were installed, and they are a step up again. The trouble with bench marks and specs is they don't always reflect what happens in real life under very high stress.

Being a totally air-cooled system the GPUs were obtained without their fans, etc. blast air at ~4C keeps everything in check.

But we digress.


I dare you to run Boinc on it, just for a day.


Some Russian scientists tried something similar on their own "super beast". It did not go well.
125) Message boards : GPUs : PCI express risers to use multiple GPUs on one motherboard - not detecting card? (Message 95281)
Posted 18 Jan 2020 by Profile Joseph Stateson
Post:
Also, why do people bother mining? I've tried it on GPUs and ASICs, it just is not profitable. The electricity cost is approximately twice the coins you earn.


I have been "mining" since classic SETI but it was not called that back in 1999. Three (?) years ago I quit the Texas A&M club and joined the Gridcoin club. At the time I joined, a single GRC was just under a quarter USD as I recall. If it had risen to a full quarter I would have 61,000 * 0.25 = $15,250. Unfortunately it is currently worth less than 1/4 cent. I will let you do the math. The conclusion of this exercise is that I get a small return of something more valuable than just mining for "credits".
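For anyone who does want the math done, a minimal sketch using only the figures in the post. Since the price is only given as "less than 1/4 cent", the second number is an upper bound and not an actual valuation.

```python
from decimal import Decimal

GRC_HELD = Decimal("61000")

def usd_value(coins, usd_per_coin):
    """Value of a coin balance at a given price, using exact decimal math."""
    return coins * usd_per_coin

# The hoped-for case: a full quarter per GRC.
hoped = usd_value(GRC_HELD, Decimal("0.25"))

# Upper bound for the current case: just under 1/4 cent per GRC.
ceiling_now = usd_value(GRC_HELD, Decimal("0.0025"))

if __name__ == "__main__":
    print("at $0.25 per GRC:", hoped)
    print("below this at under 1/4 cent per GRC:", ceiling_now)
```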
126) Message boards : Questions and problems : 7.16.3 has idle GPU: which parameter is causing the delay? (Message 95271)
Posted 18 Jan 2020 by Profile Joseph Stateson
Post:
Did not notice the "debug" so I set just the sched_op to 1 not the debug flag.

I think the problem is the project out of work / server busy and a coincidence it happened at this time.

However, the "resetting" of the parameters after requesting a "read config" was not expected. Why would backoff times be reset to 0?
[edit]

Scheduling priority is back to -1,000.97 as shown by both boinctasks and boinc manager.
BT shows a 20 minute backoff interval for nvidia. I assume this is all correct as the server has problems. Should have checked their servers before posting. My other systems were crunching SETI just fine but I didn't check to see if they were getting new work.
127) Message boards : Questions and problems : 7.16.3 has idle GPU: which parameter is causing the delay? (Message 95268)
Posted 18 Jan 2020 by Profile Joseph Stateson
Post:
Set <sched_op_debug> and see what you're actually asking for. You need to distinguish between "SETI doesn't have any work available" and "I didn't even ask for any work, available or not".


Issuing "read config" seems to have messed with those parameters. I did not expect to see "0" for priority. I also verified using boinc manager, not just boinctasks

108			1/18/2020 10:29:14 AM	Re-reading cc_config.xml	
---
140	Einstein@Home	1/18/2020 10:29:19 AM	Sending scheduler request: To report completed tasks.	
141	Einstein@Home	1/18/2020 10:29:19 AM	Reporting 1 completed tasks	
142	Einstein@Home	1/18/2020 10:29:19 AM	Not requesting tasks: "no new tasks" requested via Manager	
143	Einstein@Home	1/18/2020 10:29:21 AM	Scheduler request completed	
144	SETI@home	1/18/2020 10:29:47 AM	update requested by user	
145	SETI@home	1/18/2020 10:29:51 AM	Sending scheduler request: Requested by user.	
146	SETI@home	1/18/2020 10:29:51 AM	Requesting new tasks for NVIDIA GPU	
147	SETI@home	1/18/2020 10:30:36 AM	Scheduler request completed: got 0 new tasks

Duration correction factor	1.0000000000
Scheduling priority	0.00
CPU backoff time	--
Backoff Interval	-
NVIDIA backoff time	--
Backoff Interval	-

163	SETI@home	1/18/2020 10:37:52 AM	Sending scheduler request: To fetch work.	
164	SETI@home	1/18/2020 10:37:52 AM	Requesting new tasks for NVIDIA GPU	
165	SETI@home	1/18/2020 10:38:34 AM	Scheduler request completed: got 0 new tasks	


So my theory that getting the priority positive would fix things seems too simple a solution to a complex problem.

Not getting any tasks and not a lot of help from the "project properties" toward diagnosing the problem. This might even be a problem with the project servers. I just got a timeout trying to access my account at the site
128) Message boards : Questions and problems : 7.16.3 has idle GPU: which parameter is causing the delay? (Message 95266)
Posted 18 Jan 2020 by Profile Joseph Stateson
Post:
I am trying to reduce my count of Einstein tasks that I accidentally downloaded and had set SETI to NNT to concentrate on getting rid of the backlog of Einstein tasks. Due to limited CPU / threads, the Einstein project is set to a maximum of 6 concurrent tasks, one for each of the first 6 GPUs. This leaves 3 GPUs idle and there are two threads available.

After about 18 hours I decided to let SETI start downloading but I set the resource to 0 by selecting the "work=0" venue and requesting an update. Unlike my attempt at Einstein, I verified the resource was 0 before allowing more tasks. I am not getting any tasks, three GPUs are idle so I requested another update and nothing happened. I looked at the SETI project properties and am posting some of what I see as I suspect something there is causing the lack of work.
Duration correction factor	1.0000000000
Scheduling priority	-1,012.61
CPU backoff time	--
Backoff Interval	-
NVIDIA backoff time	1/18/2020 9:57:21 AM
Backoff Interval	00:10:00


in the time it took me to write this post (10 minutes?) the above changed as follows:
Duration correction factor	1.0000000000
Scheduling priority	-1,000.97
CPU backoff time	--
Backoff Interval	-
NVIDIA backoff time	1/18/2020 10:12:21 AM
Backoff Interval	00:20:00


I noticed the scheduling priority is slowly getting back to a positive number. When that becomes positive will I start getting tasks? What can be done to speed this up, assuming my guess is correct?

Assuming it increased 10 points in 10 minutes, that is 1 point per minute, so it looks like about 1000 minutes (close to 17 hours) to wait. Can the priority be set to a value not a huge amount under zero?
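The estimate can be written out as a straight-line extrapolation from the two readings above. This is only a sketch: the function name is made up, the 10 minute gap between readings is the guess from the post, and nothing says the client actually moves the priority linearly.

```python
def minutes_until_zero(earlier, later, elapsed_minutes):
    """Linear guess at how long until a rising priority reaches zero."""
    if later >= 0:
        return 0.0
    rate = (later - earlier) / elapsed_minutes   # points per minute
    if rate <= 0:
        raise ValueError("priority is not rising; cannot extrapolate")
    return -later / rate

if __name__ == "__main__":
    # Readings from the two project-properties snapshots above.
    wait = minutes_until_zero(-1012.61, -1000.97, 10.0)
    print("about %.0f minutes, or %.1f hours" % (wait, wait / 60.0))
```

With a rate near one point per minute and roughly 1000 points to go, the wait comes out in the range of many hours, so waiting for the number to climb on its own is not much of a plan.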
129) Message boards : GPUs : PCI express risers to use multiple GPUs on one motherboard - not detecting card? (Message 95261)
Posted 18 Jan 2020 by Profile Joseph Stateson
Post:

I wonder if you can daisychain the 4 way splitters to get infinite cards?


I suspect you can have an infinite number of "bus id's" but, as suggested by pro digit, if a unique lane must be associated with each "bus id" then there is a limit.

On the other hand, if the driver is smart enough, it could use the same lane for all the traffic to the multiplexer (the 4-in-1) but that is a guess as I have no knowledge of the workings of the multiplexer.

Looking at this and assuming it is not "fake news" one would think that 104 boards on risers would need 104 lanes.
https://videocardz.com/newz/biostar-teasing-motherboard-with-104-usb-risers-support-for-mining
130) Message boards : GPUs : PCI express risers to use multiple GPUs on one motherboard - not detecting card? (Message 95237)
Posted 17 Jan 2020 by Profile Joseph Stateson
Post:
I finally received the x1 to x16 USB risers. I don't yet have the 4 way version, it's in the post.

I connected an AMD R9 280x via one of the risers to the PCI Express 2.0 x16 slot, and it ran Milkyway or Einstein at full speed (two tasks per GPU). Same full speed when connecting it to the PCI Express 1.0 x1 slot.

I'm not sure it is really only PCI Express 1.0 though. This specs page doesn't state the version for the x1 slots:
https://www.asus.com/Motherboards/P5ND/specifications/
I find it hard to believe they'd use both versions on the same motherboard, I'm just going by what someone wrote above.


I ran SETI on P5K, P5E and P7N using core-2-quad for years and gave most away. I did get one down from the attic that had a lot of x1 slots and tried risers, but the problem when adding more GPUs was the CPU, not the risers, and more so on windows.

One test I would like to run, but I no longer have socket 775 boards, would be to run a load test on Einstein to see if the problem is the number of boards:

Using a core 2 duo, run 2 concurrent tasks on 2 boards and compare that to 1 task each on 4 boards. I have been wondering if dedicating a core to a single board with 2 tasks is more efficient than 2 cores allocated to 4 boards.

[edit] Forgot to mention in my earlier post: I bought a second set of 1-16 risers from the same company as the first set. The new purchase came with a warning that the manufacturer had released a number of risers that had the polarity reversed on the capacitors. He included a picture of an incorrect assembly: the shaded top 1/2 at the top of the capacitor was not on the same side as the colored design on the board where it was soldered. This would mean the + and - were reversed. The seller said to return any defective to him for replacement. I went and checked all my risers and all were ok.
131) Message boards : GPUs : PCI express risers to use multiple GPUs on one motherboard - not detecting card? (Message 95230)
Posted 17 Jan 2020 by Profile Joseph Stateson
Post:
I had mixed results with risers on old motherboards and especially those 4-in-1 risers.

An older X8DTL (1366 socket) required a board in the X16 slot to install Ubuntu 18.04. After installing ubuntu I was able to replace it with a riser. A 4-in-1 riser only showed one board when more than 1 ATI was used so I never got more than 4 boards to work.

my TB-85 (8 slot) worked fine with 8 risers, all gtx1060, ubuntu 18.04. For a "seti wow event" I temporarily added first a gtx1070 and then a 1070Ti. Things quickly went south, probably because of the different mix of boards.

I would see the following about twice a week
"Unable to determine the device handle for GPU 0000:01:00.0: GPU is lost. Reboot the system to recover this GPU**" 
In addition, the fan sensors frequently reported "ERR" instead of RPM.

I saw that "reboot" message daily when I added the second "extra" board. I tried a 2nd splitter thinking that keeping similar boards on the same splitter would help. I ended up getting an H110BTC that has 12 x1 slots and the 4-in-1s are in the scrap pile.

The TB85 had settings for lane speed and I tried a lot of variations where I set the lane speed to spec 2 for the slot that had the 4-in-1 but eventually left it all as "default" as things got even worse.

The problem I have now with TB-85 and H110BTC are projects like Einstein and GPUgrid that use almost a full CPU while SETI and Milkyway use a small fraction. My gen 6 & 7 CPUs only support 8 threads so there is a problem on the H110BTC as I cannot feed Einstein fast enough and 10-minute work units stretch to 30+ minutes with 9 boards. I solve this by limiting the number of concurrent tasks and reporting fewer GPUs to the project than I have.

** I created a program that shuts down the GPUs and reports using a text message here but I have not had a problem since I quit using those 4-in-1 risers and I have a mix of 1660, 1060, 1070, p102-100, p104-100, p104-90 and all work fine. I had to do this because more often than not, the work units would "time out" and another job was assigned and very quickly I would have 100's of errored out tasks.
132) Message boards : Questions and problems : Move data dir on Ubuntu ? (Message 95168)
Posted 15 Jan 2020 by Profile Joseph Stateson
Post:
You might want to add the "ReadWritePaths" and also the "EnvironmentFile" as shown below. Change the paths "/var/lib/boinc" to what you want and move the files there.
After editing "/lib/systemd/system/boinc-client.service" you will have to run "systemctl daemon-reload"
A discussion of systemctl is here
https://www.digitalocean.com/community/tutorials/how-to-use-systemctl-to-manage-systemd-services-and-units

if something goes wrong use this for debugging
journalctl -xe

I have not moved my files but I have used the environment file at "/etc/default/boinc-client" to pass parameters to boinc.
Post if problems and also confirm if you got it working.

[Unit]
Description=Berkeley Open Infrastructure Network Computing Client
Documentation=man:boinc(1)
After=network-online.target

[Service]
Type=simple
ProtectHome=true
PrivateTmp=true
ProtectSystem=strict
ProtectControlGroups=true
ReadWritePaths=-/var/lib/boinc -/etc/boinc-client
Nice=10
User=boinc
WorkingDirectory=/var/lib/boinc
ExecStart=/usr/bin/boinc 
EnvironmentFile=/etc/default/boinc-client
ExecStop=/usr/bin/boinccmd --quit
ExecReload=/usr/bin/boinccmd --read_cc_config
ExecStopPost=/bin/rm -f lockfile
IOSchedulingClass=idle
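
For anyone who would rather script the path change than edit by hand, here is a rough Python sketch. It only rewrites the ReadWritePaths and WorkingDirectory lines of unit text like the above. The function name retarget_unit and the target /data/boinc are made up for illustration, and it does not move any files or touch systemd.

```python
def retarget_unit(unit_text, old_dir, new_dir):
    """Return unit_text with old_dir swapped for new_dir in the path settings."""
    out = []
    for line in unit_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key in ("ReadWritePaths", "WorkingDirectory"):
            tokens = []
            for tok in value.split():
                # systemd allows a leading "-" meaning "ignore if missing"
                prefix = "-" if tok.startswith("-") else ""
                path = tok[len(prefix):]
                tokens.append(prefix + (new_dir if path == old_dir else path))
            line = key + "=" + " ".join(tokens)
        out.append(line)
    return "\n".join(out) + "\n"


# Trimmed copy of the unit shown above, just enough to exercise the function.
UNIT = """[Service]
ReadWritePaths=-/var/lib/boinc -/etc/boinc-client
WorkingDirectory=/var/lib/boinc
ExecStart=/usr/bin/boinc
"""

# /data/boinc is a hypothetical new location.
new_unit = retarget_unit(UNIT, "/var/lib/boinc", "/data/boinc")
print(new_unit)
```

You would still have to move the files yourself and run "systemctl daemon-reload" afterwards. Keep in mind that with ProtectHome=true a new directory under /home will not be reachable by the service.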

133) Message boards : The Lounge : The Seti is Slumbering Cafe (Message 95092)
Posted 15 Jan 2020 by Profile Joseph Stateson
Post:
Collatz has always made me feel stupid.


41 valid tasks and an RAC of 114k or so.


How about 320,000 credits every 5 and 1/2 seconds?

http://www.ukboincteam.org.uk/newforum/viewtopic.php?t=6221

The project is good for credit points only and ranks up there with bitcoin utopia. No scientific value whatsoever, but that is just my honest opinion worth about 2c. I did run up a lot of points on it and also on bitcoin utopia but could have been finding solutions for medical problems over at WCG or other more useful work. Again, just IMHO but I didn't know better.
134) Message boards : The Lounge : The Seti is Slumbering Cafe (Message 95084)
Posted 15 Jan 2020 by Profile Joseph Stateson
Post:
I do not know what method or program you use to spoof the GPU count, but I can tell you for sure that max concurrent and the scheduler work totally differently (not broken) in 7.16 Boinc than in the previous versions. That is why we do not use that with the spoofed client we use. Instead, we manage the number of active cores/threads with CPU usage.

BTW I will remain at the outrage pub for about 1/2 hour, need to work tomorrow, hope that will be enough to satisfy the SETI Gods and bring the servers back to life. Tried to find a virgin here to sacrifice at the volcano and that was impossible.


I made a change to my program as I had been applying the 64 to all projects. I am now using the project app_config and setting the # of gpus depending on the project. Since this system has 9 GPUs then the below just limits the count to 4 instead of 9. Seti still has 64 to get through the off-line time. However, the 4000 limit I use did not get me over the 13+ hours.
root@h110btc:/var/lib/boinc/projects/einstein.phys.uwm.edu# cat app_config.xml
<app_config>
 <app>
  <name>einstein_O2MDF</name>
  <max_concurrent>4</max_concurrent>
 </app>
 <spoofedgpus>4</spoofedgpus>
</app_config>

I set the value in cs_scheduler
    // update hardware info, and write host info
    //
    host_info.get_host_info(false);
    set_ncpus();
    iGPU = (gstate.spoof_gpus == -1) ? 0 : gstate.spoof_gpus;
    if(p->app_configs.spoofedgpus > 0) iGPU = p->app_configs.spoofedgpus;
    host_info.write(mf, !cc_config.suppress_net_info, false, iGPU);
135) Message boards : The Lounge : The Seti is Slumbering Cafe (Message 95080)
Posted 15 Jan 2020 by Profile Joseph Stateson
Post:
My contention is that GPUs should never sit idle, regardless of any perceived debt. Apparently, the software feels otherwise.
I'd be interested to see if you experience anything like this.


Exactly what I have been looking at in the last 2 hours and trying to figure out. I had 4 GPUs idle that should have been running Einstein and the other 5 GPUs are running Milkyway. This system normally runs SETI and GPUgrid at 100% and Einstein at 0%. I added Milkyway at 0 and after a while the Einstein GPUs went idle.

The work count in excess of 64 seems to be "lost work units" and I am guessing that number is not used when checking the GPU count. Both mining systems had a lot of "lost work units". However, I cannot account for something like 300 lost units. I only run Einstein when SETI is offline. I clicked on Einstein's "www host schedule log" which duplicates the info shown in the event viewer: "...lost tasks..." However, I also saw a strange message "..[CRITICAL] … two instances of the scheduler running.." or something to that wording. I am not running two instances of Boinc. The so-called "schedule" is an Einstein app that (my understanding) arranges to download database items, not just project work units.

There is no reason for the 4 GPUs to be idle. I aborted the Milkyway tasks as I didn't want them stopping Einstein from running. Einstein then started up and, !INCREDIBLY! I got 3 GPUgrid work units. Probably been a week or more since any showed up. 7 of the 9 GPUs are at 100% utilization but I have 2 idle due to the CPU not having enough threads.
136) Message boards : The Lounge : Help Desk Expert? (Message 95076)
Posted 15 Jan 2020 by Profile Joseph Stateson
Post:
Yea, happened to me too here. At DVDFab I posted CUDA BluRay movie "rip" times for various NVidia boards and became their first "Knowledgebase Contributor". Not sure if that was a good idea but I think I am still allowed one backup for each movie I buy.
137) Message boards : The Lounge : The Seti is Slumbering Cafe (Message 95072)
Posted 15 Jan 2020 by Profile Joseph Stateson
Post:
The extended outage allowed me to notice that a 4 core (8 thread) CPU cannot feed 9 GPUs running Einstein. I had to configure for 4 concurrent Einstein and 5 concurrent Milkyway and in addition had to scrap the "64" spoofed GPUs as that got too many Einstein tasks. I had resources set to 0 but got way more than 64 work units. Should have gotten 1 for each GPU but I am looking at 110 on one mining system and 241 on another. Resource on both for Einstein was 0, so something is not right.
138) Message boards : Questions and problems : Building client only, fails because of missing libnotify (Message 94650)
Posted 2 Jan 2020 by Profile Joseph Stateson
Post:
Thanks for your answer. I'm building in a simili linux-from-scratch, with no packages available.

I'm more concerned by the fact that libnotify is required while I'm trying to not build the manager :)

As mentioned by keith, _autosetup is the key, and if errors show up then there is a problem

There is no configure file. Those 2 lines of code are in configure.ac
running _autosetup uses configure.ac and possibly other files to create "configure"
configure.ac is supposed to have that test for the manager as that is how it is determined if the manager gets created or not.
If you see an error message like "can't find wxwidgets" then you accidentally included the manager.

walk-through below
https://boinc.berkeley.edu/forum_thread.php?id=13059&postid=92381#92381

I have no idea what "simili linux-from-scratch" is. Do you have bash or dash or something else? What version?
I myself got caught by a script that behaved differently than what I expected as it used sh instead of bash. Does your configure file have "#! /bin/sh" at the top or something else?

If you have
./configure --disable-server --disable-manager
that test of line 36044 will take the no path and continue on
else it will take the yes path and continue on

IN NO EVENT WILL IT GENERATE A SYNTAX ERROR UNLESS THE OS CANNOT PARSE IT.

suggestion: write a small batch file and run it with those lines of code something like
if test (whatever) = yes; then
echo "found a yes"
fi
If you get a syntax error then problem with whatever is running the script.

also good is "bash -x ./configure" assuming you have bash. else source, else I don't know.

HTH

[EDIT] My opinion is only worth 2c. It used to be worth a lot less but I got promoted to "Help Desk Expert" so maybe the 2c is good.

The above assumes you got the source from GitHub after having selected the branch "client 7.16.3", else all bets are off
139) Message boards : Projects : Access Android desktop remotely? (Message 94646)
Posted 1 Jan 2020 by Profile Joseph Stateson
Post:
You can use add-ons such as BOINCTasks Mobile (requires Windows) and AndroBOINC



1) Was not aware of AndroBOINC. Went there and looked around and found screenshots
https://code.google.com/archive/p/androboinc/wikis/ScreenShots.wiki

they will not display even with JavaScript enabled on Google's chrome. Edge requires a policy change to run JavaScript and I suspect the problem is something else. /svn/www/projects.png is not a valid URL, it is a folder. I tried the export to GitHub but that failed.

Where can I see screen shots? I do not have android devices.

2) I use splashtop to access the windows system running Boinctasks. Probably not much different than using mobile boinctasks on iPad under safari except the tiny screen on the iPhone requires zoom and panning. $10 a year gets me remote access from anywhere, not just the subnet.
140) Message boards : BOINC client : Support for Visual Studio versions newer than 2013? (Message 94640)
Posted 1 Jan 2020 by Profile Joseph Stateson
Post:
I have been building the client using VS2013 and also the latest Linux gcc (GitHub)
Recently was able to build the milkyway app for windows on Linux using mingw cross compiler (GitHub)
Also built TBar's "special seti" source on Linux using latest gcc and CUDA libs (found zip at forum)

Was looking at building a windows version of that seti app and found a problem:

1>CUDACOMPILE : nvcc warning : nvcc support for Microsoft Visual Studio 2013 and earlier has been deprecated and is no longer being maintained
1>  support for this version of Microsoft Visual Studio has been deprecated! Only the versions between 2015 and 2019 (inclusive) are supported!


This is not a problem for the client as it does not run any CUDA code. However, the app clearly needs to be built with CUDA and I am guessing the newer libraries from NVidia might not be linkable with object code built by VS2013. That seti app uses source code from the client, especially include files, and using VS2017 will require mods to the sources. I can try building the seti app using VS2017 and was wondering if there is any active work in making the client compatible with VS2015 or later? Possibly the seti app is best built with the mingw cross compiler instead of any MS product.
141) Message boards : Questions and problems : Data breach notification on Boinc.berkley.edu? (Message 94626)
Posted 31 Dec 2019 by Profile Joseph Stateson
Post:
I have never seen that pic nor was I even aware of this capability. I do use chrome for first visits or searching as I have chrome locked down. My other browser is Edge. I don't like it but it does work better on forms mainly because I keep chrome on a tight leash. Sometimes chrome won't even show a required "captcha" popup because I loaded it with so many blocking extensions.

My normal desktop "office" system has McAfee via Dell and I pay for a subscription to McAfee. OTOH my surface pro has only windows 10 plus I do pay for Malware Bytes premium. One thing I noticed on the surface pro: if I browse to Seti@home and select "Number Crunching" and then the most popular thread "server panic" I ALWAYS get a warning that a trojan was found. Some site in u.nu had or has a trojan or is well known for poor security and is on the Malwarebytes list. McAfee shows no problem, but who knows? The following is a screen grab from my SP4. BTW SETI has a "server panic" so often they start a new thread as the messages are too long. Currently #118. If you read the message behind the "trojan warning" you can understand why they constantly have panics: a 20,000 WU cache size and any # of gpus you want (as long as you are a member of the club).



[edit] Thanks for letting me correct this post.
142) Message boards : Questions and problems : Big-little configuration and Boinc setup (Message 94625)
Posted 31 Dec 2019 by Profile Joseph Stateson
Post:
Thanks for posting this. I was unaware of the big little terminology and went and read up on it here
https://en.wikipedia.org/wiki/ARM_big.LITTLE

My take: Intel extends battery life by reducing the clock speed when cpu not being used much.
ARM has the potential of switching to a core that has fewer transistors in addition to reducing the clock speed.
However, the OS has to implement the strategy and the applications need to be tailored.
The article indicates that if one app needs a big core then all switch and vice-versa, but better operating systems and better tailored apps can be more efficient.

I remember running a boinc app on a blackberry, forget what the Android version was but it made for a really good hand warmer when crunching.
143) Message boards : Questions and problems : iPhone credit app (Message 94616)
Posted 31 Dec 2019 by Profile Joseph Stateson
Post:
Just realized something is missing. Picture is screen grab from iPhone X and looks like there is more info under the "Average Credit .." Does anyone know what is there or if there is something to click on like to go to the project forum?

144) Message boards : Questions and problems : iPhone credit app (Message 94605)
Posted 31 Dec 2019 by Profile Joseph Stateson
Post:
Installed it. Nice that it has a link to the forum here.

Some possibilities as the source code is on GitHub:

Forum link for each project
Something like a stock ticker showing rise or drop for each project
Need eyeball not "X" on the password line plus should prefill email once first email is entered

Have never developed for iOS. Do not even know if apple has open source tools like gnu C or not.
145) Message boards : Questions and problems : "Phantom" GPU devices showing up in 7.16.3 and 441.66 again (Message 94601)
Posted 30 Dec 2019 by Profile Joseph Stateson
Post:
Ran some more tests after talking with Dell and it turned out the fan was not the problem. The NVidia board is running the fan at 100% which is ruining my hearing as well as the fan.

Just removed the "read only" coproc file and started boinc and it wrote out a good coproc_info.xml file that actually matched the one I had edited.

The board arrangement is the same. Maybe it needed another reboot for the "cleaner" to work.

Turned out the "basic" warranty (have 40 days left) covers the video board so they wanted proof so I took a lot of pictures. GPUz was helpful as it showed 5000 rpm and "no load" on the bad board and 1100 rpm on the good one also at no load. It also shows the history which is as good as a video.

I think an issue should be brought up about that coproc_info file. The GPU detection should never write out identical GPUs at the same address. If boinc has no control over the program doing the writing (which I suspect) then for sure when the client reads in the info file to see what is there it should ignore duplicates at the same bus address. Unfortunately, the ATI behavior is different.

https://stateson.net/images/coproc_normal.png
146) Message boards : Questions and problems : "Phantom" GPU devices showing up in 7.16.3 and 441.66 again (Message 94594)
Posted 30 Dec 2019 by Profile Joseph Stateson
Post:
Went back to Feb 2019 and got the AMD RX-570 zipped coproc_info that I had provided earlier in the year when the problem first arose.

There is a difference, although both coproc info files have an extra pair of GPUs, the arrangement is not the same as nvidia. In this case I deleted the last two sections before making the file read-only.

	device_num, device_index
OCLati0		0	0
OCLati1		1	1
OCLati2		2	0
OCLati3		3	1



C:\Users\josep\Desktop\debug coproc>fc OCLat0.txt OCLat1.txt
Comparing files OCLat0.txt and OCLAT1.TXT
***** OCLat0.txt
      <opencl_driver_version>2766.5</opencl_driver_version>
      <device_num>0</device_num>
      <peak_flops>5095424000000.000000</peak_flops>
***** OCLAT1.TXT
      <opencl_driver_version>2766.5</opencl_driver_version>
      <device_num>1</device_num>
      <peak_flops>5095424000000.000000</peak_flops>
*****

***** OCLat0.txt
      <opencl_available_ram>4294967296.000000</opencl_available_ram>
      <opencl_device_index>0</opencl_device_index>
      <warn_bad_cuda>0</warn_bad_cuda>
***** OCLAT1.TXT
      <opencl_available_ram>4294967296.000000</opencl_available_ram>
      <opencl_device_index>1</opencl_device_index>
      <warn_bad_cuda>0</warn_bad_cuda>
*****


The nvidia coproc info lists 2 CUDA devices, so more than 2 OpenCL devices is a clue that there is a problem. There is no count of actual cards, nor do any of the OpenCL entries have duplicate sections, so the ATI problem is harder to solve if just analyzing the file.
147) Message boards : Questions and problems : "Phantom" GPU devices showing up in 7.16.3 and 441.66 again (Message 94593)
Posted 30 Dec 2019 by Profile Joseph Stateson
Post:
I know how it happened and what can be done to fix it but not why.

How: Had to replace blower fan on one of two boards on my office desktop, long story, but ended up with the two boards back in but the slots were reversed. Installed 441 after Microsoft put in 3xx as it seems reversing the PCIe slots confuses windows.

Boinc showed 2 CUDA and 4 OpenCL devices with the pair of extra "phantom" GPU's attempting to crunch. Revo Uninstaller, clean install of 441 did not solve the problem. The Revo showed a mix of 339 and 441 but the clean install should have worked.

Looked at the coproc_info xml file
header
cuda0
cuda1  
opencl  num,index
OCLnv0  ===> 0,0
OCLnv1  ===> 0,0
OCLnv2  ===> 1,1
OCLnv3  ===> 1,1


C:\Users\josep\Desktop\debug coproc>fc OCLnv0.txt OCLnv1.txt
Comparing files OCLnv0.txt and OCLnv1.TXT
FC: no differences encountered


C:\Users\josep\Desktop\debug coproc>fc OCLnv2.txt OCLnv3.txt
Comparing files OCLnv2.txt and OCLnv3.TXT
FC: no differences encountered


C:\Users\josep\Desktop\debug coproc>fc OCLnv1.txt OCLnv3.txt
Comparing files OCLnv1.txt and OCLnv3.TXT
***** OCLnv1.txt
      <opencl_driver_version>441.66</opencl_driver_version>
      <device_num>0</device_num>
      <peak_flops>8186112000000.000000</peak_flops>
***** OCLnv3.TXT
      <opencl_driver_version>441.66</opencl_driver_version>
      <device_num>1</device_num>
      <peak_flops>8186112000000.000000</peak_flops>
*****

***** OCLnv1.txt
      <opencl_available_ram>3726508031.000000</opencl_available_ram>
      <opencl_device_index>0</opencl_device_index>
      <warn_bad_cuda>0</warn_bad_cuda>
***** OCLnv3.TXT
      <opencl_available_ram>3726508031.000000</opencl_available_ram>
      <opencl_device_index>1</opencl_device_index>
      <warn_bad_cuda>0</warn_bad_cuda>
*****


The gpu detect program wrote out duplicate entries for the same GPU. My fix was to delete the OCLnv1 and OCLnv3 sections and set the attributes of the coproc_info.xml file to read only.

Suggestion: The program that writes out that file should check for duplicates. Alternately, the program that reads it in should do a check.
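
A rough Python sketch of what such a duplicate check could look like on the reading side. The device_num and opencl_device_index fields are the ones shown in the fc output above; the wrapper tag names (coprocs, nvidia_opencl) and the function name are assumptions for illustration, and this is not how the client actually parses the file.

```python
import xml.etree.ElementTree as ET

# Cut-down stand-in for coproc_info.xml with the 0,0 / 0,0 / 1,1 / 1,1 layout.
# Tag names other than device_num and opencl_device_index are assumed.
SAMPLE = """
<coprocs>
  <nvidia_opencl><device_num>0</device_num><opencl_device_index>0</opencl_device_index></nvidia_opencl>
  <nvidia_opencl><device_num>0</device_num><opencl_device_index>0</opencl_device_index></nvidia_opencl>
  <nvidia_opencl><device_num>1</device_num><opencl_device_index>1</opencl_device_index></nvidia_opencl>
  <nvidia_opencl><device_num>1</device_num><opencl_device_index>1</opencl_device_index></nvidia_opencl>
</coprocs>
"""


def drop_duplicate_devices(root, tag="nvidia_opencl"):
    """Remove entries that repeat an earlier (device_num, index) pair."""
    seen = set()
    removed = 0
    for dev in list(root.findall(tag)):
        key = (dev.findtext("device_num"), dev.findtext("opencl_device_index"))
        if key in seen:
            root.remove(dev)
            removed += 1
        else:
            seen.add(key)
    return removed


root = ET.fromstring(SAMPLE.strip())
removed = drop_duplicate_devices(root)
remaining = len(root.findall("nvidia_opencl"))
print(removed, "duplicate entries dropped,", remaining, "left")
```

This only catches entries that repeat both values exactly, which is the arrangement seen here with the nvidia driver.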

other thoughts: clean uninstall should have worked. Possibly I should have disconnected the ethernet to prevent windows from re-downloading the same 339 (?) driver. I was instructed to reboot several times to remove 441 and 339 stuff. Since I was busy with replacing the fan I may not have responded in time to continue the uninstall.
148) Message boards : Questions and problems : problem setting up anonymous platform - need help (Message 94583)
Posted 28 Dec 2019 by Profile Joseph Stateson
Post:
Once I put
<dont_check_file_sizes>1</dont_check_file_sizes>

into the cc_config.xml then there was no urgency for an anonymous platform and I deleted the app_info.xml

From memory I think I had the following
<app_info>
<app>
<name>milkyway</name>
</app>
<file_info>
<name>ati_milkyway_separation.exe</name>
<executable/>
</file_info>
<app_version>
<app_name>milkyway</app_name>
<version_num>1.46</version_num>
<platform>windows_x86_64</platform>
<plan_class>opencl_ati_101</plan_class>
</app_version>
</app_info>


I put the above (or something like it) together after looking at
https://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=3987 and comparing it to the one that Tbar released for SETI.

starting with the first <app> above the first "101" and I changed nvidia to ati
the "factory" app I have been using is milkyway_1.46_windows_x86_64__opencl_ati_101
so I guessed and broke that down into the 4 parts name, ver, platform, class
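
The same breakdown can be written as a short Python sketch. The layout name_version_platform__planclass is only my guess from this one file name, not a documented rule, and the function name is made up for illustration.

```python
import re

# Guessed layout: <name>_<major.minor>_<platform>__<plan_class>
PATTERN = re.compile(
    r"^(?P<name>.+?)_(?P<version>\d+\.\d+)_(?P<platform>.+?)__(?P<plan_class>.+)$"
)


def split_app_filename(filename):
    """Split an app file name into name, version, platform and plan class."""
    stem = filename[:-4] if filename.lower().endswith(".exe") else filename
    match = PATTERN.match(stem)
    if match is None:
        raise ValueError("name does not follow the guessed layout: " + filename)
    return match.groupdict()


parts = split_app_filename("milkyway_1.46_windows_x86_64__opencl_ati_101")
print(parts)
```

A file name with no double underscore (no plan class) will not match and raises ValueError, so treat this as a starting point only.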

In order to try the above xml I will have to run down my WU count from 850 to zero, set resources to "0" and exclude all but 1 gpu and only process 1 work unit at a time else I might dump a lot of good workunits due to my misconfiguration of the anonymous platform. However, I got plenty of free time and can try getting it to work.
149) Message boards : Questions and problems : problem setting up anonymous platform - need help (Message 94567)
Posted 28 Dec 2019 by Profile Joseph Stateson
Post:
I am attempting to create an anonymous platform at Milkyway and
am using the Boinc documentation here

https://boinc.berkeley.edu/wiki/Anonymous_platform

Looking at the example's first 8 lines is confusing:
<app_info>
<app>
    <name>setiathome_enhanced</name>
</app>
<file_info>
    <name>setiathome_6.6_windows_intelx86.exe</name>
    <executable/>
</file_info>


"setiathome_enhanced" is not really the "app name". When actually coding up the app_info file one
would use "setiathome_v8" if you wanted to acquire v8 data

"setiathome_6.6_windows_intelx86.exe" is the name of the executable that is to be used to process the data.
It may or may not have the same name as an existing program at SETI.

Is the above analysis correct?

Here is the problem at Milkyway:

My app_info.xml file is recognized as an anonymous platform by milkyway
Milkyway@Home	12/27/2019 11:01:18 PM	Found app_info.xml; using anonymous platform	


I created an app that I call "test". If I rename "test.exe" to have the same name as an existing Milkyway app
" milkyway_1.46_windows_x86_64__opencl_ati_101.exe" then when the boinc client starts there is the
message "milkyway_1.46_windows_x86_64__opencl_ati_101" has size X bytes but only Y bytes was expected
That executable is then deleted and Milkyway downloads the 1.46 version to replace my app.

If I just put "test.exe" at <file_info> then there is no message about the wrong length,
but test.exe is deleted anyway and nothing is downloaded.

Does not look like they support anonymous platform or I am doing something wrong.
I have not yet figured out what goes where "setiathome_v8" is.
I tried "Milkway@home separation" and have been guessing but have not
hit the right name that they call their data yet.

[edit] the download can be stopped using cc_config "don't check size" but there is no reason to delete the test.exe app
150) Message boards : Questions and problems : Server "msg_to_host" out of control on some projects + how it works? (Message 94551)
Posted 25 Dec 2019 by Profile Joseph Stateson
Post:
Fixed and tested under Ubuntu and Win10

cc_config.xml looks like this:
<exclude_proj_msg>
<proj_name>Einstein@Home</proj_name>
<msg_type></msg_type>
<msg_content>no longer needed</msg_content>
</exclude_proj_msg>

<exclude_proj_msg>
<proj_name>GPUGRID</proj_name>
<msg_type></msg_type>
<msg_content>No tasks</msg_content>
</exclude_proj_msg>

<exclude_proj_msg>
<proj_name>GPUGRID</proj_name>
<msg_type></msg_type>
<msg_content>no tasks available</msg_content>
</exclude_proj_msg>


I print one message and never print any more.
12/25/2019 1:08:50 AM	Not showing project messsage from Einstein@Home of type "ALL" with content "no longer needed"	
12/25/2019 1:08:50 AM	Not showing project messsage from GPUGRID of type "ALL" with content "No tasks"	
12/25/2019 1:08:50 AM	Not showing project messsage from GPUGRID of type "ALL" with content "no tasks available"	
12/25/2019 1:08:50 AM	Config: use all coprocessors	
...
12/25/2019 1:23:46 AM	For project GPUGRID,  excluded this message: "No tasks sent" of priority "low"	
12/25/2019 1:23:46 AM	For project GPUGRID,  excluded this message: "Project has no tasks available" of priority "low"	
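
The matching rule is simple enough to sketch in Python. This is an illustration of the idea only, not the C++ that went into the client: it treats an empty msg_type as "ALL" and msg_content as a case-sensitive substring, which is an assumption that happens to be consistent with the log lines above.

```python
# Rules mirroring the three <exclude_proj_msg> entries above.
RULES = [
    {"proj_name": "Einstein@Home", "msg_type": "", "msg_content": "no longer needed"},
    {"proj_name": "GPUGRID", "msg_type": "", "msg_content": "No tasks"},
    {"proj_name": "GPUGRID", "msg_type": "", "msg_content": "no tasks available"},
]


def is_excluded(project, priority, text, rules=RULES):
    """Return True if a server message matches one of the exclude rules."""
    for rule in rules:
        if rule["proj_name"] != project:
            continue
        # Empty msg_type is assumed to mean "any priority".
        if rule["msg_type"] and rule["msg_type"] != priority:
            continue
        if rule["msg_content"] in text:
            return True
    return False


print(is_excluded("GPUGRID", "low", "No tasks sent"))
print(is_excluded("GPUGRID", "low", "Sending scheduler request"))
```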

151) Message boards : Questions and problems : Server "msg_to_host" out of control on some projects + how it works? (Message 94525)
Posted 24 Dec 2019 by Profile Joseph Stateson
Post:
I am getting 1000's of messages from Einstein and 100's from GPUgrid. The event log only holds 2000 and right now the log starts at 2081 and ends at 4081.

My guess is 1,500 of " Einstein … boinc will delete file xxx (no longer needed)"
and about 500 from GPUgrid telling me in no uncertain terms there is no project work available.

What is missing from the event log? All the debugging stuff I wanted to look at to spot a problem that was in the first 2000 or so messages I cannot see anymore.

Looking at module sched_locality I see
        sprintf(buf, "BOINC will delete file %s (no longer needed)", fi.name);
        g_reply->insert_message(buf, "low");

That is server code, not client, but there is a filter: "low" and also the project "Einstein"

The client handles those messages as
    // show messages from server
    //
    bool got_notice = false;
    for (i=0; i<sr.messages.size(); i++) {
        USER_MESSAGE& um = sr.messages[i];
        int prio = MSG_INFO;
        if (!strcmp(um.priority.c_str(), "notice")) {
            prio = MSG_SCHEDULER_ALERT;
            got_notice = true;
        }
        msg_printf(project, prio, "%s", um.message.c_str());
    }


The only msg filtering is to send the notices to the notice dialog box and let the event log show all the rest ie: "low". (I did not see a "high" but there might be one)

A) Feature request: This message handler should do some filtering.

Some ideas

1. Have the project put additional classes in the message such as "ignorable" in addition to "low" and "high", assuming that exists. The client can have a debug flag <show_ignorable>1</show_ignorable> , etc. I suspect the projects have a lot of inertia and this won't happen.

2. Mod the client: Count the number of similar messages such as "Einstein" + "no longer needed" and only display the first message. If a subsequent message from Einstein changes, then print the total "CNT" of similar messages along with the new message. This may or may not work with GPUgrid as they display 5 lines of messages all stating what projects have no work available. A filter such as "low" + "gpugrid" would work.

If staff thinks any of this is useful I can code it (#2) up and present it as a fix to an "issue" Otherwise I plan to drop any "low" on my "special mod" which will be put on GitHub.
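
To make #2 concrete, here is a minimal Python sketch of the counting idea. It is not client code (the real thing would be C++ in the message loop shown earlier); the function name and the (project, substring) pattern format are made up for illustration.

```python
def collapse_messages(messages, patterns):
    """Show the first message of a run, then one count line for the rest.

    messages: list of (project, text) in arrival order
    patterns: list of (project, substring) that mark "similar" messages
    """
    shown = []
    current = None  # pattern of the run being suppressed, if any
    count = 0       # how many repeats of that run were hidden
    for project, text in messages:
        key = next((p for p in patterns if p[0] == project and p[1] in text), None)
        if key is not None and key == current:
            count += 1
            continue
        if count:
            shown.append(f"{current[0]}: {count} similar message(s) not shown")
        current, count = key, 0
        shown.append(f"{project}: {text}")
    if count:
        shown.append(f"{current[0]}: {count} similar message(s) not shown")
    return shown


msgs = [
    ("Einstein@Home", "BOINC will delete file a (no longer needed)"),
    ("Einstein@Home", "BOINC will delete file b (no longer needed)"),
    ("Einstein@Home", "BOINC will delete file c (no longer needed)"),
    ("Einstein@Home", "Sending scheduler request"),
]
lines = collapse_messages(msgs, [("Einstein@Home", "no longer needed")])
for line in lines:
    print(line)
```

With the four sample messages this prints three lines: the first delete notice, a count of 2 hidden repeats, and the scheduler request.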

B) Is there sample code that shows how to send these messages to the client? I looked at "boinccmd --help" and did not see a "send message to client:" The reason I bring this up is that I have a python script that reads temperatures from the GPU and the CPU and if I receive the NVidia driver request "lost GPU: please reboot system" it would be nice to send that message and have it show up either at BM or BT. Currently I issue an order to stop all GPU work using boinccmd and send a text message to my phone. This is Linux, not windows of course.
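
For reference, the stop-on-trouble part of such a script can be sketched as below. boinccmd --set_gpu_mode and the nvidia-smi temperature query are real commands; the 85 C limit, the function names and the idea of triggering on temperature alone are assumptions for illustration, and the text-message step is left out.

```python
import subprocess

TEMP_LIMIT_C = 85  # assumed threshold, pick your own


def parse_temps(smi_output):
    """Turn nvidia-smi output (one number per line) into a list of ints."""
    return [int(s) for s in (line.strip() for line in smi_output.splitlines())
            if s.isdigit()]


def read_gpu_temps():
    """Ask nvidia-smi for the temperature of every GPU it can see."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=temperature.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_temps(result.stdout)


def too_hot(temps, limit=TEMP_LIMIT_C):
    """True if any GPU is at or above the limit."""
    return any(t >= limit for t in temps)


def stop_gpu_work():
    """Tell the local BOINC client to stop using the GPUs."""
    subprocess.run(["boinccmd", "--set_gpu_mode", "never"], check=True)


# Typical use from a polling loop (not run here):
#     if too_hot(read_gpu_temps()):
#         stop_gpu_work()
```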
152) Message boards : Questions and problems : Unable to revert to "legacy" driver (Message 94470)
Posted 22 Dec 2019 by Profile Joseph Stateson
Post:
At the "normal" download site, https://www.nvidia.com/Download/index.aspx the only types listed are GRD and Studio.
On that page, scroll down to "Beta and Older Drivers". That's a link, although it doesn't look like it. Click through, and you have a better search tool.


Yea, I saw that, but it looks like 441.66 is working so going to stick with it. I had read your suggestion on the n00bish thread and went and found that old driver thinking that would solve the problem of the second GPU not getting work.

At some point, not sure when or why, my NVidia system ended up with "phantom" OpenCL devices. This is unique to OpenCL and I have seen this before on an ATI video board. I think we discussed this some time ago. It was originally fixed by editing the clinfo xml file to remove the extra devices and then marking it read only so boinc did not try to re-create it. I have not seen this problem on ATI boards ever since better drivers came out for the RX500 class. This is the first time since maybe a year ago.

I just checked BT's "long term history" file and the problem of 1-2 hour Einstein goes back a week. GW's take at most 30 minutes and the 1k-nvidia are normally 12-15 minutes not 2 hours. There are no hour long jobs on any of my NVidia boards. Whatever caused the problem is gone. Either the upgrade to 7.16.3 fixed it or the RevoUninstaller cleaned out an NVidia mess.

I know other projects are reporting problems with the newer driver, could it be related to this? The symptom here is it simply takes way too long to complete a task with the GPU running at 12.5 watts and averaging 1% utilization but the result eventually validates. I think other projects are getting wrong answers which is not happening here but I will keep an eye on it.
153) Message boards : Questions and problems : Unable to revert to "legacy" driver (Message 94463)
Posted 22 Dec 2019 by Profile Joseph Stateson
Post:
If you do a full search for Beta / older drivers at NVidia.com, and specify Windows 10, you get a new search term box for "Windows Driver Type:", with a choice of Standard or DCH.

Driver 431.60 is available for download in DCH format.


Looks like my collection of previously working drivers is not usable.

This is unreal. At the "normal" download site, https://www.nvidia.com/Download/index.aspx the only types listed are GRD and Studio. Clicking on "search" for gtx1070ti shows only 441.66 WHQL. There is nothing mentioned about DCH. However, when I click on download the name is "-dch-whql.exe" so the default is dch.

I have noticed for some time that the GeForce Experience wants me to use it to "keep drivers up to date". I have been avoiding installing Experience in the past as there are no games to monitor plus they want me to log in and in addition verify a captcha. The last couple of upgrades I have done the Experience checkbox was grayed out which causes it to go in.
154) Message boards : Questions and problems : Unable to revert to "legacy" driver (Message 94460)
Posted 22 Dec 2019 by Profile Joseph Stateson
Post:
Ran out of GPUGRID tasks and the system brought in Einstein. GPUGRID uses CUDA and Einstein uses OpenCL. Those CUDA tasks finished ok but one of the Einstein tasks was taking way too long, indicating a problem with the 2nd gtx1070 board. GPUz said the 2nd board was used between 0% and 2% (ie: no usage)

I restarted BOINC so I could look at the startup and noticed 2 CUDA but 4 OpenCL. This is identical to a problem I had with ATI board some time ago.

Nvidia driver was 441.66 and I was running the stock 7.14.2

I tried that 431.60, a known good driver but WTF !!!



My system now requires the DCH class of drivers. So, in addition to Studio and Game Ready, there is now a DCH version!
I used RevoUninstallerPro to get rid of all NVidia and at the same time I put in 7.16.3. One of those or maybe both worked. Windows upgraded the driver for me to 441.66 which is what I had earlier and I noticed the "ghost" pair of GTX1070 are no longer showing up. I have 2 CUDA and 2 OpenCL as is normal, Einstein is working fine and both boards show running warm and usage 90% as is normal.

I am guessing that last Tuesday's Microsoft feature update started requiring DCH drivers but didn't put one in. The other thing that is strange is that I did not have to go and download a driver. Windows put in 441.66 w/o me asking to look for a driver. I assume it found it on the disk and used that one.
155) Message boards : Questions and problems : view multiple hosts (Message 94444)
Posted 22 Dec 2019 by Profile Joseph Stateson
Post:
There are 3rd party apps to do that and they are listed under "add-ons" on the main Compute for Science page. I have been using BoincTasks since it was available. It can show temperatures on the remote systems if the companion TThrottle is installed on the remotes.
156) Message boards : Projects : GPUgrid not always resuming tasks correctly (Message 94377)
Posted 18 Dec 2019 by Profile Joseph Stateson
Post:
Found solution

One of my boards overheated and I needed to reboot and issue new fan speed settings as I had forgotten to do that when I powered it up earlier. The board overheating was the one vnc was using so unable to use vnc to run speed settings. Speed settings required a $DISPLAY and cannot be done from putty.

There was one GPUgrid task left over from the four I had and it had about an hour left. It was on d4 so I excluded d0..d3 and d5..d8 in cc_config and rebooted

GPUGRID got d4 as all others were excluded. This gave me the idea:

If there is a pair of RTX-2070 and a single gtx1060, then use one boinc service for the pair of RTX and another service for the gtx. Since the client sees all the boards, the exclude is used to deny access to the other service's boards.
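
The exclude lists for either case (one leftover task on d4, or one client per board type) can be generated rather than typed. A small Python sketch: exclude_gpu with url and device_num is the standard cc_config.xml option, while the function name is made up and the project URL is an assumption that has to match whatever the client has for the project.

```python
def exclude_gpu_entries(project_url, device_nums, keep):
    """Build <exclude_gpu> blocks for every device not listed in keep."""
    blocks = []
    for dev in device_nums:
        if dev in keep:
            continue
        blocks.append(
            "<exclude_gpu>\n"
            f"  <url>{project_url}</url>\n"
            f"  <device_num>{dev}</device_num>\n"
            "</exclude_gpu>"
        )
    return "\n".join(blocks)


# 9 GPUs numbered 0..8, leave only d4 for GPUGrid (URL assumed).
xml = exclude_gpu_entries("https://www.gpugrid.net/", range(9), keep={4})
print(xml)
```

The output goes inside the <options> section of cc_config.xml. For the two-client idea, each client would get the complement of the other's keep set.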

It is a PITA to set up multiple clients with the existing boinc Windows and Linux versions. However, a script can be created to simplify the procedure. I actually have a script I tested on Milkyway that worked fine. It split my 6 GPUs into two groups of 3 each, which allowed me to obtain the project max of 900 work units for each client. I have no need to get 1800 work units, it was just a test and I am back to 900 for all 6 GPUs. My script was simplified as I was able to use my special boinc client "mod" to supply a different hostname to the client, which gets a unique project id for the new host. Without that option the script would be much more complicated. If anyone is truly interested I can put a script together and submit it as 3rd party but I would need the feature of setting the hostname. This was discussed in issue 3337 and marked as to-be-determined.

Different GPUs in one system are not that common among regular users, and a professional gridcoin miner would have all identical boards per system. The GPUGrid project could modify their code to "start over". They already know the board is different as a message was printed to that effect. This whole discussion is a storm in a teacup.
157) Message boards : Projects : GPUgrid not always resuming tasks correctly (Message 94366)
Posted 18 Dec 2019 by Profile Joseph Stateson
Post:
No: the server does not know that you have GPUs of different specifications. Look in the file 'sched_request_www.gpugrid.net.xml' - that's the only way that GPUGrid (or any project) gets information about our machines. The file contains all the data you see on the website (and then some), but it goes on to say

<coproc_cuda>
   <count>2</count>
   <name>GeForce GTX 970</name>
- and that's from one of my GTX 970 + GTX 750 Ti combos. There's no reference to the lesser card at all.


Then a module in the gpugrid project folder decides to use s_52 for the lower class and s_60 for the better.

I suspect if it simply picked the s_52 then the problem goes away since either co-processor can process s_52

This could be suggested to the project.

It would be nice to prove this was the case.
158) Message boards : Projects : GPUgrid not always resuming tasks correctly (Message 94363)
Posted 18 Dec 2019 by Profile Joseph Stateson
Post:

I think it's more like

d0 starts working on task 1
d1 starts working on task 2
d0 finishes task, starts working on task 3
task 1 reports, and everything shifts up one: d1 is working on task 1, d0 is working on task 2

I do not follow this. I would think that if task 1 completed then it is gone and done with and d1 would get task 3, 4, 5 etc.
However, if a "task" is considered a class, such as task 1 is s_52 and task 2 is s_60, then there are only "2 tasks".
There is no task 3, just data 1, 2, 3, etc. for only the two classes
task 1 reports and another s_52 arrives and starts running
so far:
 d0 is on s_60 
 d1 is on s_52
 reboot occurs  T1 asks for a device before T2 does
s_52 gets d0 as d0 is top of list ==>fail 
s_60 gets d1 as d1 is next in line ==>fail



restart
d0 is allocated to the new task 1
d1 is allocated to task 2

- which is a swap from before the restart. I think it's a simple 'first come, first served' for each device, each task.


Another possibility I was thinking of was that one of Keith's special SETI cuda jobs comes in and after 60 minutes d0 is switched to it.
If the resource share for GPUGRID is 0 and SETI is 100 (likely for Keith) then I believe GPUGRID will run to completion and the 60 minute time slice does not apply. I run a lot of backup projects, normally Einstein, and I have never seen them give up their time to a higher priority project when they are at zero. I see time slicing at 60 minutes when both projects are at 50% or thereabouts. However, I might not have noticed a 0-100 exchange so I cannot be 100% sure.

[EDIT] The whole thread was moved, not just the part that deviated. Anyway, I am glad that I did not get a private message for each of the "moved" messages in the thread like what happened to me in SETI recently.

[EDIT-2]. If indeed, the data is downloaded as s_52 and s_60 then the problem could be fixed by only sending the lower class as the better device will be able to handle any of the lower classes.

Question: How does the project know there is more than one type of GPU? If the scheduler request identifies what is available then that accounts for different classes being sent. In that case the GPU identification could be "faked" to indicate that all the GPUs were lower class and all would get s_52 instead of a mix. That could easily be done as the SETI people already fake the number of GPUs and all that is necessary is to force all the identities to be the weaker GPU. Just a guess, and it would only work if the checkpoint file contains science data only and not unique gpu parameters.
159) Message boards : Projects : GPUgrid not always resuming tasks correctly (Message 94345)
Posted 18 Dec 2019 by Profile Joseph Stateson
Post:
While the thread is still around, I will brag: I got 4 gpugrid tasks running on my SETI mining machine. SETI has been out of tasks for hours and I lucked out and snagged a few.
160) Message boards : Projects : GPUgrid not always resuming tasks correctly (Message 94333)
Posted 17 Dec 2019 by Profile Joseph Stateson
Post:
Sadly, no go. Found an error message:
ERROR: src\mdsim\context.cpp line 322: Cannot use a restart file on a different device!



Does this scenario describe what is happening:
Premise: at start  d0 is faster than d1 and boinc assigns faster GPU first 
d0 using long-data-0 in slot-0
d1 using short-data-1 in slot-1 and has a short deadline
d1 finished first as data-1 is simple
d1 working on short-data-2 in slot-1 and also has a short deadline

====tasks suspended and system reboots===

on startup tasks resumed are data0 and data2
data0 is in slot 0
data2 is in slot 1
so far no problem
boinc looks at priorities to decide which to run first: the short tasks have short deadlines
boinc chooses the faster gpu for the short deadline and "d0" starts working on data-2 in slot-1 which
is a GPU mismatch, not a slot or data mismatch


just a guess, trying to figure out what has happened.
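To make the guess concrete, here is a toy model of it. This is not how the boinc client actually schedules; it is only a sketch of the scenario above, assuming devices are handed out in list order to tasks sorted by earliest deadline. The deadline numbers are made up.

```python
def assign_devices(tasks, devices):
    """Hand out devices in list order to tasks sorted by earliest deadline."""
    ordered = sorted(tasks, key=lambda t: t["deadline_hours"])
    return {t["name"]: dev for t, dev in zip(ordered, devices)}

# Which device wrote each task's checkpoint before the reboot.
before = {"data-0": "d0", "data-2": "d1"}

tasks = [
    {"name": "data-0", "deadline_hours": 120},  # long task (made-up deadline)
    {"name": "data-2", "deadline_hours": 24},   # short task, short deadline
]

# After the reboot the short-deadline task is served first and gets d0.
after = assign_devices(tasks, ["d0", "d1"])

for name in sorted(after):
    status = "ok" if after[name] == before[name] else "MISMATCH"
    print(name, before[name], "->", after[name], status)
```

In this toy model both tasks come back on the other board, which is the GPU mismatch described above with the slots and data left untouched.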
I have 3 systems set up to get gpugrid but not a single task has shown up in days, so I am just speculating. Even if I did get some tasks in I would have to move a gtx1060 into a system with a gtx1070 to get a mismatch and I have had bad experiences moving boards needlessly.

[edit] This thread is so far off the original subject that Keith should request the moderator to move just about everything to a new thread "GPUgrid not always resuming tasks correctly" or something like that and put that into "projects"
161) Message boards : GPUs : Linux only uses Nvidia GPU, not Intel IGP? (Message 94283)
Posted 15 Dec 2019 by Profile Joseph Stateson
Post:
Is the Intel Atom N3650 supported for IGP?



I don't think it is. Looking here
https://en.wikipedia.org/wiki/List_of_Intel_Atom_microprocessors

There is no mention of the 3650 and OpenCL or GL.

However, I did see my old bay trail n2808 listed.

3 years ago I was experimenting with a Liva-X and attached an asic miner to it and also installed the Intel OpenCL. The N2808 is listed as having support for OpenCL in that wiki but the newer (?) 3650 is not.

I stopped using the OpenCL on the N2808 as the video driver with that library was not as good as the driver without it, plus it was overheating. Bitcoin Utopia was an interesting crunch while it lasted but it totally screwed up my "credits" to where no other projects would even show up on a statistical graph because their numbers were so small. Should have been banned.
162) Message boards : Projects : GPUgrid not always resuming tasks correctly (Message 94244)
Posted 13 Dec 2019 by Profile Joseph Stateson
Post:
No slot contents got moved during the course of this conversation. It's the device that matters, not the storage location.


Possibly may work if the apps are all the same executable code and can handle sm_52 or sm_60 or anything.

Consider this: The app that was running in slot 17 is brought back up in 17 by boinc. However, it is given a different GPU device.

If the data files in the slot have been moved there correctly, they now match the device that is going to do the crunching.

This gets difficult with more than two boards. It can be easily tested, but it is not anything I am really interested in doing and work units are few and far between.

Looking in the gpugrid folder there seems to be a lot of stuff and possibly different apps. If there is one app for each class of device then moving the slot data will fail, because the app cannot handle the data even if the data matches the device.
163) Message boards : Projects : GPUgrid not always resuming tasks correctly (Message 94241)
Posted 13 Dec 2019 by Profile Joseph Stateson
Post:
The problem is that ACEMD spits out so many blasted files - I was trying to find the right ones.

To me, a "checkpoint" file is written out (or added to) as the science progresses. I'd expect that device hardware enumeration would take place only once, at the start of the run - and the most likely candidate is the compilation stage. If we can prove that, we have something to offer the admins.


Sounds good! Hopefully I will get some gpugrid tasks in to look at.

[EDIT] This link shows which boards handle which CUDA architecture
https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/

sm_52 is only good for gtx970 class or "below", although at some depth "below" will no longer be an option. Clearly, the file you have works only with the 970 board. Petri over at SETI just built an app that uses the latest 10.2 CUDA libraries and works with all boards CUDA 5 or later. Maybe they can hire Petri or convince the SETI folks that gpugrid can help find ET. More accurately: Tbar built the executable and Petri coded up the app using new features in CUDA.
164) Message boards : Projects : GPUgrid not always resuming tasks correctly (Message 94239)
Posted 13 Dec 2019 by Profile Joseph Stateson
Post:
Now, here's a thought. All those hexadecimal hash files in the slot directory are actually plain text content, and they start

//
// Generated by NVIDIA NVVM Compiler
//
// Compiler Build ID: CL-26218862
// Cuda compilation tools, release 10.1, V10.1.168
// Based on LLVM 3.4svn
//

.version 6.4
.target sm_52
.address_size 64
Just as a test, we could try deleting those for a paused task. My guess is that the app will re-compile them if it finds they're missing. And if the .target sm_52 is different on a different device, the binary compiler output might be different, and might run on the new hardware. Worth a punt?

(edit - that target value is on my GTX 970. Is yours different, for a different card?)


Just saw this. If the machine class is not sm_52 then deleting the checkpoint file will not help. In addition to the delete, the class needs to be changed as you mentioned or it won't run on the card.
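Checking which class those generated files were built for can be scripted. A minimal sketch, assuming the files are plain text with a ".target sm_NN" line near the top as in the header quoted above. The helper names are my own, nothing here comes from gpugrid.

```python
import re
from pathlib import Path

# Matches a PTX header line such as ".target sm_52"
TARGET_RE = re.compile(r"^\.target\s+(sm_\d+)", re.MULTILINE)

def ptx_target(text):
    """Return the sm_NN class named in a PTX header, or None if absent."""
    match = TARGET_RE.search(text)
    return match.group(1) if match else None

def scan_slot(slot_dir):
    """Map file name -> target class for every file in a slot that has one."""
    found = {}
    for path in Path(slot_dir).iterdir():
        if not path.is_file():
            continue
        # Only the first few KB are read; the header is at the top.
        with path.open("r", errors="ignore") as fh:
            target = ptx_target(fh.read(4096))
        if target:
            found[path.name] = target
    return found

sample_header = ".version 6.4\n.target sm_52\n.address_size 64\n"
print(ptx_target(sample_header))
```

Running scan_slot() on each slot directory and comparing the result with the board now assigned to that slot would show whether the class really is the mismatch.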

I saw this problem on the new SETI app. The SETI app includes everything above SM_30, as 30 and below is not CUDA 5.0 and won't run on the older boards. Maybe the staff at gpugrid built the app exactly for the particular device they had, used sm_52 just for those devices, and did not include the library for sm_60 and higher like the SETI folks did. If they put all the libraries in then it would work. Maybe this is the problem and not the checkpoint file?

[edit] Both that checkpoint and that header file have to match the gpu. I would assume the libraries for various classes of co-processors are embedded in the executable as that makes configuration control easier (one app) but who knows. The SETI app has just about everything and is 229 MB in size. There is nothing that big in the gpugrid folder but adding all the DLLs up gets up high enough.

All my gpugrid tasks finished and the Einstein backup is at work. The slots were wiped clean of any gpugrid residuals.
165) Message boards : Projects : GPUgrid not always resuming tasks correctly (Message 94238)
Posted 13 Dec 2019 by Profile Joseph Stateson
Post:
OK, so the real problem seems to be that GPUGrid's 'New version of ACEMD' app evaluates the hardware it's running on when it first starts, and then remembers it. If restart hardware doesn't match the original evaluation, it crashes.

That means their new application is "not fit for BOINC". We need to convince them that the hardware evaluation has to be re-done from scratch when resuming from a pause, so that computation can continue.

Is that a fair form of words? If so, we have to work out whether they "don't care", or "don't understand". I suspect it's the latter, for which the appropriate penalty is re-education.


Yea, they open the checkpoint file and read in OpenCL values like "compute units: 28" but the board actually has only 14 "compute units" and their algorithm does not compensate for the change, so the process quickly dies. I don't think their code is public and even if it was I have had bad experiences compiling project code.

My guess is the easy fix is to delete the checkpoint file. Either way all is lost

[edit] The app is CUDA not OpenCL but the idea is the same: parameters in the checkpoint file are incompatible with the new gpu.
166) Message boards : Questions and problems : Questions about BoincTasks columns meaning (Message 94236)
Posted 13 Dec 2019 by Profile Joseph Stateson
Post:
Here are my boinc stats:

https://www.boincstats.com/stats/-1/user/detail/4877

and here are my boincstats:

https://www.boincstats.com/stats/-1/user/detail/58271033255

the first one is my set of retired projects and the second one are my current projects. is there anyway to combine these two like it used to be? 3 months ago my retired projects separated from my current projects I dunno why. I have always used the same account and I have asked this question before on another message board but no answer and nothing solved


Boincstats is not same as boinctasks
Your CPIDs are different. They need to be the same. Not sure if it is possible to do that as it is. You might ask on their forum.

When mining for gridcoins, the CPID is used as the "account". You can transfer your coins to another CPID but I suspect that cannot be done with "credits"

If 3 months ago you got a new computer or reinstalled boinc then maybe the following will work:

Go to each project and do a "merge" to put the credits on the new computer. Eventually they will disappear from the old CPID and all be on the new one. Just a guess.
167) Message boards : Projects : GPUgrid not always resuming tasks correctly (Message 94234)
Posted 13 Dec 2019 by Profile Joseph Stateson
Post:
Going back to the very beginning of this conversation, you said

I am guessing that when the system reboots boinc has lost track of which app was running in which slot and assign the first device it finds to the first suspended app.
If you'd written "... boinc has lost track of which task was running on which device ..." I'd have agreed with you. BOINC doesn't lose track of which task's files are in which slot.


That was a bad choice of words and I have done worse. Unlike the project that seems not to give a hoot, I "own" my mistakes.
168) Message boards : Projects : GPUgrid not always resuming tasks correctly (Message 94232)
Posted 13 Dec 2019 by Profile Joseph Stateson
Post:
Both the initial data files, and the checkpoint files, will be in that slot the whole time - neither BOINC nor I moved them.

The problem with GPUGrid is that their new app wants to run on the same model of card after a restart, and BOINC doesn't guarantee that: all it guarantees is that CUDA tasks will run on 'a' NVidia GPU - any NVidia GPU.


This is no different from what I have been saying. A different GPU is given a working directory of slot "17". The files have not been moved so they are the same checkpoint files that were created by the previous, different, GPU and when read in they cause problems resuming.
169) Message boards : Projects : GPUgrid not always resuming tasks correctly (Message 94230)
Posted 13 Dec 2019 by Profile Joseph Stateson
Post:
Again, no. It's BOINC which manages the slot directories, not the project.


Richard: I don't have a problem with anything you have written here. It is the project that has to manage the resume and it is not working correctly due to some design fault on their part. I assume it is finding the wrong checkpoint files and that is causing the failure.

[edit] I got the idea of the wrong checkpoint file being used from something Keith told me last week. He said that you need to unsuspend the gpugrid tasks in the same order they were suspended. That implies they are finding the "right stuff" in the "right place". This whole problem is the project's fault and they have a lot bigger problems than this.
170) Message boards : Projects : GPUgrid not always resuming tasks correctly (Message 94228)
Posted 13 Dec 2019 by Profile Joseph Stateson
Post:
if it did that it would probably try to start a task from a different project.


I follow you on all of this but the problem seems to be with the project and it is the project's responsibility to do the resume.

 Directory of D:\ProgramData\Boinc\slots\0
12/13/2019  09:13 AM        24,184,239 restart.chk
               1 File(s)     24,184,239 bytes
 Directory of D:\ProgramData\Boinc\slots\1
12/13/2019  09:14 AM        24,184,239 restart.chk
               1 File(s)     24,184,239 bytes


The files are identical sizes but different binary contents. It seems logical to me that the files are used to restore the state of the app. I am guessing that the app looks for "restart.chk" and if there it attempts to resume and gets the wrong state info.
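Same size but different contents is easy to confirm with a hash. A small sketch using only the standard library; the demo writes two throwaway files instead of touching real restart.chk files, and the file names in it are placeholders.

```python
import hashlib
import os
import tempfile

def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so a 24 MB checkpoint is no problem."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def same_contents(path_a, path_b):
    """True only if both files hash to the same value."""
    return file_digest(path_a) == file_digest(path_b)

# Demo: two files of equal size with different bytes.
with tempfile.TemporaryDirectory() as tmp:
    file_a = os.path.join(tmp, "slot0_restart.chk")
    file_b = os.path.join(tmp, "slot1_restart.chk")
    with open(file_a, "wb") as fh:
        fh.write(b"A" * 1000)
    with open(file_b, "wb") as fh:
        fh.write(b"B" * 1000)
    same_size = os.path.getsize(file_a) == os.path.getsize(file_b)
    identical = same_contents(file_a, file_b)

print("same size:", same_size, "identical:", identical)
```

Pointing same_contents() at the restart.chk in two slots would tell whether each task really has its own state file.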

One thing nice about my "theory" is that it is falsifiable. You showed that if put into a different slot it ran without failing. Did it restart from the checkpoint or did it just start over when it could not find the checkpoint file? Does it even need that file to resume from where it left off?
If that file is needed to resume from where it left off then how does it find it? What is the "working directory" of the app? I don't have the answers to all of these.

Unlike the global warming theory that cannot be falsified (too little snow proves global warming and so does too much snow) if swapping the slots still causes gpugrid to fail then definitely, my "guess" was wrong.
171) Message boards : Projects : GPUgrid not always resuming tasks correctly (Message 94225)
Posted 13 Dec 2019 by Profile Joseph Stateson
Post:
Check the timestamps and other clues.

--device n is an old way of doing things. With GPUGrid, you might be looking at the start instruction given by the wrapper to the science app. In turn, BOINC will have passed the device instruction to the wrapper in init_data.xml - which will probably have been re-written at the start of the 'resume' session.

And the slot number is of no effect. It's just the first scratch folder which happened to be available when the task was first launched - which may have been hours earlier.


The checkpoint file is in the slot. Clearly, the device assigned must be getting the wrong slot on restart. The dates of all files are all current except the app.
12/13/2019  07:42 AM        24,184,238 restart.chk
12/13/2019  07:40 AM        24,184,238 restart.chk.bkp
12/13/2019  06:28 AM               123 stderr.txt


As mentioned, this can easily be tested. I cannot do it myself as I have identical GPUs and it is difficult to even get gpugrid work units.


And the slot number is of no effect. It's just the first scratch folder which happened to be available when the task was first launched - which may have been hours earlier.


Actually, I think that is the problem. On restart the app is assigned the first slot and gets the wrong checkpoint file.
172) Message boards : Projects : GPUgrid not always resuming tasks correctly (Message 94222)
Posted 13 Dec 2019 by Profile Joseph Stateson
Post:
I can shed some light on this problem and offer a possible solution but I think it is up to the project to do a proper resume.

I have been looking at the possibility of removing a defective GPU from the pool of available GPUs and learned a few things as to which modules are responsible for assigning a gpu to an app. Being able to assign the same app to the same GPU on a resume is similar to assigning it to a different GPU due to failure of the one it was on.

I looked at the slots on my system that is currently running two gpugrid tasks (lucky me!).

The stderr file in slot 0 shows "boinc input --device 1"
The one in slot 1 shows "boinc input --device 0"

I am guessing that when the system reboots boinc has lost track of which app was running in which slot and assigns the first device it finds to the first suspended app.

I am pretty sure this is the case as module "app_start" calls "coproc_cmdline" and that module finds the first "N" through iteration and puts it into "--device N".
Later on a slot is assigned. coproc_cmdline simply gets the first # and only checks to see if it is "out of range".

I am guessing, though it could be verified, that the problem of the failing gpugrid task could be solved by a cut and paste of the contents of the "slot" into a different slot based on ascending number.
KISS: if stderr of slot 0 shows device 1 and stderr of slot 1 shows device 0 then swap the contents of the slots after stopping boinc and before rebooting.
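The check for the crossed case can be sketched as below. It only reports, it does not move any files. It assumes stderr.txt contains a "--device N" string as seen in my two slots; the helper names are made up for illustration.

```python
import re

DEVICE_RE = re.compile(r"--device\s+(\d+)")

def device_from_stderr(text):
    """Return the device number found in a slot's stderr text, or None."""
    match = DEVICE_RE.search(text)
    return int(match.group(1)) if match else None

def crossed_pairs(slot_stderr):
    """slot_stderr maps slot number -> stderr text.

    Returns pairs (a, b) where slot a ran on device b and slot b on device a.
    """
    devices = {slot: device_from_stderr(text) for slot, text in slot_stderr.items()}
    pairs = []
    for a, dev_a in devices.items():
        if dev_a is None or dev_a <= a:
            continue  # only look forward so each pair is listed once
        if devices.get(dev_a) == a:
            pairs.append((a, dev_a))
    return pairs

# The two slots described above.
observed = {0: "boinc input --device 1", 1: "boinc input --device 0"}
print(crossed_pairs(observed))
```

Any swap of slot contents would still have to be done by hand with boinc stopped, and as said above it is only a guess that it helps.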
173) Message boards : Projects : Need Recommendation for Long deadline projects - going on vacation with no network for longer than 2 weeks (Message 94126)
Posted 9 Dec 2019 by Profile Joseph Stateson
Post:
I'm going on vacation from Dec 21st to Jan 5. The computer will be running, but no network while I'm gone. My current crop of projects (WCG, Asteroids@home) all have tasks that deadline before I get back.

Any recommendations for projects that I can run while I'm gone that have deadlines greater than 2 weeks? I'd heard climateprediction.net might? Any other recommendations?


SETI has really long deadlines. In fact, the deadlines are so long that some lunatics save the results for several months and only upload them after a contest starts. I plan to do that myself on the next WOW event as I got left behind on the last one.

If you find a project with the correct deadline but it does not give you enough work units, you can artificially raise your CPU count to get more. There is a setting in cc_config to allow the cpu count to be raised. I have not done that myself as I don't run that many CPU bound tasks.
174) Message boards : Questions and problems : Questions about BoincTasks columns meaning (Message 94087)
Posted 8 Dec 2019 by Profile Joseph Stateson
Post:
Dear All,

I'm using BoincTasks to control my remote computers.
Fred has some explanations on the webpage, but not for all columns.

I find that under projects tab, task column, Einstein shows 1/104, LHC shows 53/0, WUProp shows 0/0/1.

I'm just wondering that what does these mean.
Where can I found more explanations about these columns?

Cheers



those values are useful in the gadget when computers are selected, not too useful in tasks column
Einstein@Home	Gridcoin	545,767,226.8	340,509.08	44,941,130.8	66,055.55	100 (9.99%)	0 / 5	- / 04:10:24 (1)		movieserver	school	JStateson	

0/5 => no cpu tasks / 5 gpu tasks
-/04:10:24 => not applicable / 4 hours 10 minutes 24 secs estimated remaining time gpu
(1) => not sure about that guessing 1 einstein is executing or 1 gpu exists

send fred a donation and get your name in the about credits
175) Message boards : Questions and problems : All GPU boinc projects return computation error? (Message 94062)
Posted 6 Dec 2019 by Profile Joseph Stateson
Post:
My pc runs Linux (Lubuntu), and has been working flawlessly for the past few weeks.
From a working condition, I closed Boinc, turned off the PC, and a few days later restart boinc, and all my projects return a computation error.
Nvidia drivers are found, everything was exactly the same as before.


It is up to the project to implement a restart mechanism. Some projects have a robust method and others, to put it nicely, do not.

Some projects (like "A") take longer to write checkpoints (recovery files) than others. If you "close the lid" on your laptop or tell the OS to shut down, there is a good chance that project "B" will not get to write its checkpoints and when you restart, some of "A" will have errored out as well as all of "B".

GPUgrid: If you have two GPUs and they are different, there is a 50% chance that GPU0 will use GPU1's checkpoint and GPU1 will use GPU0's. This causes both work units to report computation errors. If you have 3 different GPUs there is far less than a 33% chance of them all matching up.
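The numbers can be checked by counting. This assumes, and it is only an assumption, that after a restart every way of handing the tasks to the boards is equally likely, and it takes the 33% figure to be about the chance that every task lands back on the board that wrote its checkpoint.

```python
from fractions import Fraction
from itertools import permutations

def chance_all_match(n):
    """Chance that n tasks all get their original GPU under random assignment."""
    perms = list(permutations(range(n)))
    good = sum(1 for p in perms if all(p[i] == i for i in range(n)))
    return Fraction(good, len(perms))

for n in (2, 3, 4):
    print(n, "GPUs:", chance_all_match(n))
```

Under that assumption the chance is 1 in n factorial: 1/2 for two boards and 1/6 for three.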

Depending on which system I need to power down I do the following:
Issue command for NO NEW TASKS
Suspend all work units that have not started
wait for all GPU tasks to finish
I have not had a problem with CPU bound tasks like WCG but you may want to suspend CPU tasks if there is a problem
Exit the gridcoin "research" program (this is a must as they have a really terrible handler for sigterm or win shutdown )
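The first two steps can also be done with boinccmd instead of the manager. The sketch below only builds the command lines and does not run anything; the project URL and task name are placeholders, not real ones.

```python
def no_new_tasks_cmd(project_url):
    """boinccmd command line that sets 'no new tasks' for one project."""
    return ["boinccmd", "--project", project_url, "nomorework"]

def suspend_task_cmd(project_url, task_name):
    """boinccmd command line that suspends a single task."""
    return ["boinccmd", "--task", project_url, task_name, "suspend"]

# Placeholder URL and task name, for illustration only.
commands = [
    no_new_tasks_cmd("https://example.org/project/"),
    suspend_task_cmd("https://example.org/project/", "example_task_0"),
]
for cmd in commands:
    print(" ".join(cmd))
```

Each list could be handed to subprocess.run() on the machine running the client. The task names would come from "boinccmd --get_tasks".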


I ended up needing to reinstall Boinc, for the issue to be fixed on SOME projects.
Asteroids, Einstein, all worked fine before. Now they just error out :(
Meanwhile, new downloads of Collatz and Milkyway seem to work fine...
What could be the cause of this?


The only time a re-install of BOINC is needed is if there is a disk drive problem and boinc does not start.
On rare occasions (gpugrid comes to mind) there is a bug in the project startup like a null account or maybe a null (empty) reply file caused by power going off when the file was written. Very likely all subsequent work units will error out. Just reset the project instead of reinstalling boinc.
176) Message boards : Questions and problems : optional arguments not being passed to boinc client in ubuntu (Message 94033)
Posted 5 Dec 2019 by Profile Joseph Stateson
Post:
Discovered how to pass command line arguments to boinc using the existing ubuntu "default" way. A mod to the service is required

First: the /etc/init.d/boinc-client <start | restart | stop > runs the service but cannot pass arguments.
According to the bible, arguments need to be in an environment file.
That file is not identified in the service and possibly that init.d file is going to be deprecated.

in /lib/systemd/system/boinc-client.service

where
WorkingDirectory=/var/lib/boinc
ExecStart=/usr/bin/boinc
ExecStop=/usr/bin/boinccmd --quit


change it to the following (adding $BOINC_OPTS and the EnvironmentFile line)
WorkingDirectory=/var/lib/boinc
ExecStart=/usr/bin/boinc $BOINC_OPTS
EnvironmentFile=/etc/default/boinc-client
ExecStop=/usr/bin/boinccmd --quit


Where that "default" file has the following
# Here you can specify additional options to pass to the BOINC core client.
# Type 'boinc --help' or 'man boinc' for a full summary of allowed options.
#BOINC_OPTS="--allow_remote_gui_rpc"
BOINC_OPTS=""
# Scheduling options


add your optional arguments at that BOINC_OPTS
save service changes with "systemctl daemon-reload"
then start client with
sudo systemctl start boinc-client

If you made a mistake in the arguments, use "journalctl -xe" to see what systemctl did
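An alternative I have not tried on this system is a drop-in override, which is what "sudo systemctl edit boinc-client.service" creates, so the packaged service file is left alone. The sketch below just prints what such an override could contain. The empty "ExecStart=" line is there because systemd needs the old command cleared before a new one is set in a drop-in. The paths and variable name are taken from the files shown above.

```python
def render_override(binary="/usr/bin/boinc",
                    env_file="/etc/default/boinc-client",
                    var="BOINC_OPTS"):
    """Return the text of a systemd drop-in that passes $BOINC_OPTS to boinc."""
    lines = [
        "[Service]",
        f"EnvironmentFile={env_file}",
        "ExecStart=",                    # clear the packaged ExecStart first
        f"ExecStart={binary} ${var}",    # then set the one with the options
    ]
    return "\n".join(lines) + "\n"

override_text = render_override()
print(override_text)
```

The text would go in the editor that "systemctl edit" opens, followed by a restart of the service.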

Not sure who decides how Boinc is installed in Linux, but I think this change needs to be incorporated in any Linux releases.

[EDIT] I do not see any of this code at the Boinc GitHub so I assume it is controlled elsewhere. I can make a suggestion about the change there but it might just clutter up the "issue" database.
----just looked over there and other people have requested changes to the service so I will add this in.
177) Message boards : Questions and problems : optional arguments not being passed to boinc client in ubuntu (Message 93988)
Posted 1 Dec 2019 by Profile Joseph Stateson
Post:

Why are you trying to pass arguments to the BOINC client, anyway?


I am fine-tuning a special "mod" of the client for super-secret event and only Juan BFP and Keith Myers know what I am up to.

178) Message boards : Questions and problems : optional arguments not being passed to boinc client in ubuntu (Message 93984)
Posted 1 Dec 2019 by Profile Joseph Stateson
Post:
Making some progress: I now know that I really don't know much about Linux and how startup scripts work.

From a discussion here months ago I was told that /systemd/system/boinc-client.service is what really starts boinc

that script has the following

[Unit]
Description=Berkeley Open Infrastructure Network Computing Client
Documentation=man:boinc(1)
After=network-online.target

[Service]
Type=simple
ProtectHome=true
PrivateTmp=true
ProtectSystem=strict
ProtectControlGroups=true
ReadWritePaths=-/var/lib/boinc -/etc/boinc-client
Nice=10
User=boinc
WorkingDirectory=/var/lib/boinc
ExecStart=/usr/bin/boinc
ExecStop=/usr/bin/boinccmd --quit
ExecReload=/usr/bin/boinccmd --read_cc_config
ExecStopPost=/bin/rm -f lockfile
IOSchedulingClass=idle
# The following options prevent setuid root as they imply NoNewPrivileges=true
# Since Atlas requires setuid root, they break Atlas
# In order to improve security, if you're not using Atlas,
# Add these options to the [Service] section of an override file using
# sudo systemctl edit boinc-client.service
#NoNewPrivileges=true
#ProtectKernelModules=true
#ProtectKernelTunables=true
#RestrictRealtime=true


guess what? that service script has no place for an argument. As it is, there is no way to pass any arguments using /etc/default/boinc-client or even /etc/init.d/boinc-client !!!

at least when using ppa:costamagnagianfranco which is where I get anything later than version 7.9
maybe another ppa has a better solution???

IMHO of course

[EDIT]
I am guessing I could edit the file and add the arguments behind the /usr/bin/boinc
if so, then obviously that start script I have been using does not do what it says it can do.
A suggestion by Richard was to just use systemctl to start boinc
maybe that is all that init.d script does and nothing else.

[EDIT2]
After adding my boinc optional argument behind
ExecStart=/usr/bin/boinc

I had to run
systemctl daemon-reload 

before the client would start with the new argument. A real PITA. However, it was nice of the OS to advise me to use that reload command. Saved googling for the problem
179) Message boards : BOINC client : Need help identifying bug(s): ubuntu or boinc or both? (Message 93928)
Posted 27 Nov 2019 by Profile Joseph Stateson
Post:
Just ran another test of disable and enable the exclude_gpu mechanism in cc_config.

It worked - or at least it behaved differently than I expected.
I booted up with gpu8 disabled
edited cc_config to change the 8 to 18 (there is no gpu 18)
when I issued a "read cc_config" the #8 gpu was enabled. I had expected to have to re-start boinc to get it to work.

This has become more complicated. What I am guessing is that the various failures (like the failure to enable gpu0 .. gpu7) were caused by the NVidia driver not being able to handle the stuck gpu8 and so being unable to complete the request to enable 0..7. That is just a guess.
Will have to run some more tests to see WTF is going on.
180) Message boards : BOINC client : Need help identifying bug(s): ubuntu or boinc or both? (Message 93927)
Posted 27 Nov 2019 by Profile Joseph Stateson
Post:
Normally works fine but definitely can be a problem when a GPU has hardware problems.

ubuntu 18.04

Will have to go to AskUbuntu and see what can cause a kill -9 to be ignored. Hopefully there is NOT a simple explanation that will cause me to lose the few "reputation points" I got there. I always thought that the only stupid question is the one not asked, but that can cause a points loss with some critical moderators.

However, I will try that in the future.

Running that /etc/init.d/boinc-client actually passes the arguments to "start-stop-daemon" whatever that is.
My guess is it works its way back to systemctl
181) Message boards : BOINC client : Need help identifying bug(s): ubuntu or boinc or both? (Message 93924)
Posted 27 Nov 2019 by Profile Joseph Stateson
Post:
Using 7.16.3 from the recommended repository with a newly reconfigured system of various cheap, used, ebay, nvidia boards. After a few hours gpu8 got stuck. I had a problem, obviously hardware. I edited cc_config to exclude gpu8 but used <device>8</device> instead of <device_num>8</device_num>. That caused all nvidia boards to be excluded. That is perfectly understandable. I then corrected my mistake and issued another "read config" command and got the message that 8 was excluded. Unfortunately, that did not un-exclude gpu0..gpu7. I assume this is a bug. Looking at the references I read:
If you change GPU exclusions, you must restart the BOINC client for these changes to take effect

So it seems I can exclude a GPU but if I "change the exclusion" I must restart the client. I am guessing that this could be fixed in a future version.***
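
A mistyped tag like the one above is easy to miss, so here is a minimal sketch of a check that could be run on cc_config.xml before issuing "read config". It assumes the child tags documented for exclude_gpu are url, device_num, type and app; the function name and the sample project URL are only placeholders for illustration, and this is not part of BOINC.

```python
import xml.etree.ElementTree as ET

# Child tags assumed to be valid inside <exclude_gpu> (per the cc_config docs).
ALLOWED_TAGS = {"url", "device_num", "type", "app"}


def check_exclude_gpu(cc_config_text):
    """Return a list of warnings about <exclude_gpu> blocks in cc_config text."""
    warnings = []
    root = ET.fromstring(cc_config_text)
    for block in root.iter("exclude_gpu"):
        tags = {child.tag for child in block}
        for tag in sorted(tags - ALLOWED_TAGS):
            warnings.append("unrecognized tag <%s>" % tag)
        if "url" not in tags:
            warnings.append("missing <url>")
        if "device_num" not in tags:
            # Matches what was seen above: without device_num all boards went away.
            warnings.append("no <device_num>, so every GPU may be excluded")
    return warnings


# Sample with the same typo described above; the URL is a placeholder.
SAMPLE = """<cc_config><options>
  <exclude_gpu>
    <url>http://setiathome.berkeley.edu/</url>
    <device>8</device>
  </exclude_gpu>
</options></cc_config>"""

for line in check_exclude_gpu(SAMPLE):
    print(line)
```

With <device> corrected to <device_num> the check returns an empty list. It only looks at the file; it says nothing about whether the client will re-enable a GPU without a restart.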

Restarting the client caused a problem
sudo /etc/init.d/boinc-client restart
did not exit and I had to ctrl-c to get back to the bash command prompt.
I did a "stop" to make sure that boinc was stopped and then looked at htop to make sure it was stopped. Things got worse from there on:


There were 9 processes all executing: the cpu% and shared mem change, not just the elapsed time. All of them are accessing that stuck GPU8. I am guessing they timed out during the night as I see 8 errored tasks listed at the seti web site but the tasks continue to execute in background. I tried killing them:

jstateson@h110btc:/usr/bin$ boinccmd --quit
can't connect to local host

root@h110btc:/var/lib/boinc/projects# sudo killall -v boinc
boinc: no process found

sudo kill -9 12374


None of those worked, not even the kill -9
Looking at htop I see PID 12374 getting a time slice: the cpu% changes from 77-99 percent frequently as do all the other instances. Pretty sure this is a problem in ubuntu. The only thing I can think of is to reboot and I suspect the reboot will hang and I will have to power it off.
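
One thing worth checking (a guess, not a diagnosis): on Linux a signal, SIGKILL included, is only acted on when the process returns from kernel code, so a task stuck inside a driver call can survive kill -9. The sketch below reads the one-letter process state from /proc/<pid>/stat, where "D" means uninterruptible sleep and "R" means running. The helper names and the sample line are made up for illustration.

```python
def parse_stat_state(stat_line):
    """Return (command name, state letter) from a /proc/<pid>/stat line."""
    # The command name sits in parentheses and may itself contain spaces
    # or parentheses, so split on the first "(" and the last ")".
    open_paren = stat_line.index("(")
    close_paren = stat_line.rindex(")")
    comm = stat_line[open_paren + 1:close_paren]
    state = stat_line[close_paren + 1:].split()[0]
    return comm, state


def read_state(pid):
    """Read the state of a live process (Linux only)."""
    with open("/proc/%d/stat" % pid) as f:
        return parse_stat_state(f.read())


if __name__ == "__main__":
    # Made-up sample line; on the real system one would call read_state(12374).
    sample = "12374 (some_gpu_app) D 1 12374 12374 0 -1 4194560"
    print(parse_stat_state(sample))
```

If the state stays at D, or at R with the time being spent in the kernel, the process is most likely waiting on the driver and only a driver reset or a reboot will clear it.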

***There is a discussion at github "computing prefs 2.0" issue #2993


https://github.com/BOINC/boinc/issues/2993

about adding useful features of enabling or disabling GPU. I added a comment a couple of weeks ago. Looks like the disable tool "exclude_gpu" works but re-enabling has a problem. Obviously if GPU had hardware issue it should not be re-enabled but one should be able to re-enable the GPUs that were working.

Was wondering if the ubuntu experts here can shed some light on why the kill -9 didn't work. Also, if the task was timed out by the project, should not it have been killed? If it could not be killed one would hope it would not be re-assigned subsequent tasks.

I spent some time looking at where tasks were assigned to GPUs and it is not clear to me where that is done. Some information such as ignoring is passed to the co_proc handler which only runs when boinc is started. That could explain why a restart is required. The command to "read cc_config" calls a gpu handler but it seems that program can only disable or remove the gpu and not add it back in.
182) Message boards : GPUs : Linux only uses Nvidia GPU, not Intel IGP? (Message 93864)
Posted 22 Nov 2019 by Profile Joseph Stateson
Post:
Like you say, the gflops are low.
However, it's for my portable pc, so the electricity I don't have to pay.
It runs with an RTX 2060, and a Celeron CPU right now. The RTX 2060 is only loaded halfway.
The CPU is even slower.
From what I could read, the celeron has 12 GPU cores running at 600-700Mhz,
The cpu has 2 cores running at 3,1Ghz.
When GPU crunching, only 10% of a CPU core is utilized, leaving the remaining of the core to another project.
In this case, a lot more work can be done, even if the GPU isn't that efficient. The CPU isn't either.


Problem is not always the electricity. I was lucky to get Microsoft to replace my surface pro for free even after the warranty period. I had run the Einstein intel app on it for a while and stopped when I noticed the screen bulging out. The problem was the charger had to be on to keep the app running and that constant charging while running the app overheated the battery. Will not be running any apps like that again.

183) Message boards : GPUs : Linux only uses Nvidia GPU, not Intel IGP? (Message 93823)
Posted 22 Nov 2019 by Profile Joseph Stateson
Post:
Is there a way to be able to use both Intel's IGP (of a Celeron G4900 series CPU), and an Nvidia GTX/RTX GPU?


The official release of intel OpenCL for Linux is 18.1
https://registrationcenter.intel.com/en/products/download/3599/

It is called 18.1 but it is only good for ubuntu 16.
Here is what an intel guru has to say about 18.04:
https://software.intel.com/en-us/forums/opencl/topic/797941


Out of curiosity, can you click on the download and see if the file actually ends in .tgz
Unaccountably, I got something else and had to rename it to .tgz
OTOH you might not want to register for that download as it is a PITA plus it is no good anyway.

I got intel OpenCL installed on my Linux 18.04 by visiting the GitHub, non-official release. My motherboard is h110-BTC with i7-6700 and it has (so far) a gtx-1060 and a p106-100 running the Linux special SETI app. Since that special app is the anonymous platform it seems I cannot download the intel app that SETI has available. That is a guess. I have another thought**.


The following works: it installs intel OpenCL, which is then recognized by BOINC

https://github.com/intel/compute-runtime/releases

I put in the 19.45.14764 package, the latest. All I had to do was a copy and paste of all the wget at once and then did that sudo dpkg -i *.deb
and rebooted and boinc shows the following:
1			11/21/2019 10:33:51 PM	Starting BOINC client version 7.16.3 for x86_64-pc-linux-gnu	
2			11/21/2019 10:33:51 PM	log flags: file_xfer, sched_ops, task	
3			11/21/2019 10:33:51 PM	Libraries: libcurl/7.58.0 OpenSSL/1.1.1 zlib/1.2.11 libidn2/2.0.4 libpsl/0.19.1 (+libidn2/2.0.4) nghttp2/1.30.0 librtmp/2.3	
4			11/21/2019 10:33:51 PM	Data directory: /var/lib/boinc-client	
5			11/21/2019 10:33:57 PM	CUDA: NVIDIA GPU 0: P106-100 (driver version 440.26, CUDA version 10.2, compute capability 6.1, 4096MB, 3974MB available, 4374 GFLOPS peak)	
6			11/21/2019 10:33:57 PM	CUDA: NVIDIA GPU 1: GeForce GTX 1060 3GB (driver version 440.26, CUDA version 10.2, compute capability 6.1, 3019MB, 2945MB available, 3936 GFLOPS peak)	
7			11/21/2019 10:33:57 PM	OpenCL: NVIDIA GPU 0: P106-100 (driver version 440.26, device version OpenCL 1.2 CUDA, 6081MB, 3974MB available, 4374 GFLOPS peak)	
8			11/21/2019 10:33:57 PM	OpenCL: NVIDIA GPU 1: GeForce GTX 1060 3GB (driver version 440.26, device version OpenCL 1.2 CUDA, 3019MB, 2945MB available, 3936 GFLOPS peak)	
9			11/21/2019 10:33:57 PM	OpenCL: Intel GPU 0: Intel(R) Gen9 HD Graphics NEO (driver version 19.45.14764, device version OpenCL 2.1 NEO, 2908MB, 2908MB available, 100 GFLOPS peak)	


Since I was unable to get any SETI INTEL work units I tried Einstein.
802	Einstein@Home	11/21/2019 11:14:19 PM	Requesting new tasks for Intel GPU	
803	Einstein@Home	11/21/2019 11:14:21 PM	Scheduler request completed: got 0 new tasks	


Didn't work either.

** Looking at both SETI and Einstein, both projects claim I have a 9th generation intel HD graphics CPU. In actuality the i7-6700 is only 6th generation and only 100 GFLOPS. That is roughly 40 times less than the nvidia boards (about 4000 GFLOPS each). Not worth trying to figure out why it is not working.

both seti and Einstein show the following
CPU type: 
GenuineIntel Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz [Family 6 Model 94 Stepping 3] 
Number of processors:  8 
Coprocessors:  [2] NVIDIA P106-100 (4095MB) driver: 440.26
INTEL Intel(R) Gen9 HD Graphics NEO (2908MB)  
Operating system:  Linux Ubuntu Ubuntu 18.04.3 LTS [5.0.0-36-generic|libc 2.27 (Ubuntu GLIBC 2.27-3ubuntu1)] 
BOINC client version:  7.16.3


You have a much newer CPU. Maybe it is better than 100 GFLOPS, which is pathetic compared to an NVidia or ATI.

If you get that OpenCL installed please post the GFLOPS. I am curious about the latest intel chips' potential.
184) Message boards : Projects : What projects NEED CPU crunching? (Message 93787)
Posted 19 Nov 2019 by Profile Joseph Stateson
Post:
What projects are most dependent on CPU crunching?
I'm thinking of putting a dual Xeon server to work, but would like to focus projects that are most in need of CPU crunching.

Preferably projects with lots of work, and that have no GPU support (GPU crunching goes a lot faster, if supported, so projects that preferably ONLY have CPU support).

I do run from Linux.


Been running my dual X5675 on World Community Grid. Really useful research being done there. I keep 22 work units crunching and a pair of Milkyway running on the RX-570

System is liquid cooled but in the garage, during the summer, I had to drop the speed from 3.06 to 2.89. While not much of a drop it made a huge difference in temp. The board has only 4 slots, all x4 electrical.
185) Message boards : BOINC client : found a "fix" for an issue in github but.. (Message 93780)
Posted 18 Nov 2019 by Profile Joseph Stateson
Post:
As far as I know, no documentation needs updating. I am retired and have plenty of time available but am not wasting it. I am sorry if I am wasting your time by asking for advice here. I thought it was appropriate to ask here and not at GitHub.
I'm in the same boat - retired, and with time on my hands. It's absolutely fine to ask questions here, or at any of the multiple other places where BOINCers hang out - but if you refer in one place to a problem described in another place, it's helpful to tie the two together with a link or a reference number.


Sent private message about the problem with issue 3347 and just posted an update there
186) Message boards : BOINC client : found a "fix" for an issue in github but.. (Message 93777)
Posted 18 Nov 2019 by Profile Joseph Stateson
Post:
Without knowing for certain which issue you're talking about, it's hard to advise. Why not at least post the issue number here, in your question?

From what you've actually said, I'd say:

Leave the problem that you first encountered visible, for others to read.
Remove all the guesses that turned out to be false trails.
Write a clear statement about what the true solution was. Possibly make that the first thing that people will read, below the initial statement of the problem.


I do not normally contribute to issues on GitHub concerning Boinc and was concerned that making a lot or even a few changes in an issue or a comment might be unwelcome. Your advice was what I was looking for: It is OK to remove the false trails and restate the problem clearly.



Tell us - here, if you like - if any documentation needs updating, to avoid other people wasting their time as you appear to have done.


As far as I know, no documentation needs updating. I am retired and have plenty of time available but am not wasting it. I am sorry if I am wasting your time by asking for advice here. I thought it was appropriate to ask here and not at GitHub.
187) Message boards : BOINC client : found a "fix" for an issue in github but.. (Message 93769)
Posted 18 Nov 2019 by Profile Joseph Stateson
Post:
My original guess at the problem was long and misleading as I didn't know exactly what caused the build failure. Basically all I did was point it out. Have finally figured it out.

Question: Should I delete most of my original guess or add a new comment about what is really happening and leave the old guess there?

Are there any rules for changing comments? Unlike this forum, it seems GitHub allows for a lot more editing
188) Message boards : BOINC client : Suspend network activity does not work for "lost tasks" (Message 93765)
Posted 17 Nov 2019 by Profile Joseph Stateson
Post:
What is 'resent' for a lost task after an update is the job specification


That makes sense especially in light of what happened when I switched platform to anonymous on SETI. I lost all the cuda60 work units. They were actually deleted from the hard drive. Later, I tried to restore them and was told to use the "ghost" protocol because SETI no longer re-sends lost tasks. However, Einstein does "send" lost tasks, but just the "job spec" and nothing is actually downloaded.

What if you don't want the lost tasks to be downloaded as there was a legitimate reason in banishing them? I don't see a way to stop the download and even a detach had no effect.
189) Message boards : BOINC client : Suspend network activity does not work for "lost tasks" (Message 93763)
Posted 17 Nov 2019 by Profile Joseph Stateson
Post:
I think that is normal behaviour. I have seen it many times over the years. If network activity suspended, projects will still update, it seems to be only the file transfer that is suspended rather than all network activity.


FWIW: I would consider the project sending a "lost" file to the host to be a file transfer.
190) Message boards : Questions and problems : windows build boinc error (Message 93762)
Posted 17 Nov 2019 by Profile Joseph Stateson
Post:
I think because the buildenv file does not work correctly, what should I do with it.
I want to build a client with RPC communication, which one should I choose.


You should be able to build the client w/o running that buildenv file.

With following structure
src
------boinc_master
------boinc_depends_win_vs2013

You should be able, from vs2013, to open
" \src\boinc-master\win_build\boinc_vs2013.sln"
right click on the client "boinc" and select build. This will generate "win32" and "debug" options



If eventually you want to make a Linux version of your program, and plan to use GIT for source control, you might want to look at this
191) Message boards : BOINC client : Suspend network activity does not work for "lost tasks" (Message 93747)
Posted 17 Nov 2019 by Profile Joseph Stateson
Post:
Yesterday I used the boinc "suspend network activity" for some test I was doing and then forgot to turn it back on.

Today, 24 hours later, I noticed there were 3 tasks "stuck" downloading. I could not get them to download. Tried suspending and resuming both the tasks and the project

Did a project reset and got message "lost tasks resent" and the same 3 tasks were back in "transfer" and stuck.

Did a detach and re-attach, same problem: lost tasks were downloaded and stuck in "transfer"

Finally checked network and realized I had suspended network activity.

Well, even though it was suspended, it still sent me some tasks albeit "lost"
They started crunching as soon as network was enabled.
192) Message boards : BOINC Manager : Task priorities? (Message 93737)
Posted 16 Nov 2019 by Profile Joseph Stateson
Post:
Hi!

I'd like to run tasks from a specific project, until they're ran out.
Then I'd like to load tasks from a second project, until they're out of tasks.
And so on...

How can I do this?


This can be done with boinccmd and bash script running in the background but it is not worth the effort.
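
For anyone who wants to try anyway, here is a rough sketch of the decision logic only. It assumes the boinccmd project operations "allowmorework" and "nomorework" are used to steer work fetch; the function name, the project URLs and the idea of counting queued tasks per project are illustrative assumptions, not a tested tool.

```python
def plan_commands(projects, queued):
    """Build boinccmd argument lists for a strict project order.

    projects: project URLs, highest priority first.
    queued:   dict of URL -> number of tasks currently on this host.
    The first project that still has tasks is the "active" one. It and
    every higher-priority project may fetch work; the rest may not.
    """
    active = len(projects) - 1  # nobody has work: let every project ask
    for i, url in enumerate(projects):
        if queued.get(url, 0) > 0:
            active = i
            break
    cmds = []
    for i, url in enumerate(projects):
        op = "allowmorework" if i <= active else "nomorework"
        cmds.append(["boinccmd", "--project", url, op])
    return cmds


if __name__ == "__main__":
    # Placeholder URLs; the counts would come from parsing "boinccmd --get_tasks".
    order = ["http://first.example/", "http://second.example/"]
    for cmd in plan_commands(order, {"http://first.example/": 3}):
        print(" ".join(cmd))
```

A background loop would rebuild the counts every few minutes and pass each list to subprocess.run. The hard part, and the reason it is hardly worth the effort, is telling "project is out of work" apart from "client has not asked yet".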
193) Message boards : Questions and problems : GPU usage low, can't find app_config.xml (Message 93728)
Posted 15 Nov 2019 by Profile Joseph Stateson
Post:
Can't access it unless you are in elevated mode, i.e. "sudo"

What project are you using the GPU on? Some projects do not make a lot of use of the GPU.
Some projects, milkyway for example, can have multiple tasks running on the same gpu which "ups" its usage.

SETI has a command argument "-nobs" to force the CPU to dedicate itself to the GPU. That goes into the app_config.xml and you probably should ask over at SETI for advice
194) Message boards : BOINC client : Misconfiguration or identification of required client sources in ".gitignore" (Message 93650)
Posted 12 Nov 2019 by Profile Joseph Stateson
Post:
Solved - sort of.

Should have started with the Linux one first, pushed it to the upstream then cloned the windows version.

However, if you start with the windows version then decide you want to clone the Linux you will need to remove or comment out the following files from .gitignore or you will be unable to build for Linux.
#pkginfo
#prototype
#client/scripts/boinc-client
#client/scripts/boinc-client.service

#py/Boinc/version.py
#py/setup.py
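
The edit above can be scripted. This is a small sketch that comments out the listed entries in a .gitignore text; it assumes each entry appears on a line of its own exactly as written in the list, and the function name is made up for illustration.

```python
# Entries taken from the list above (without the leading "#").
ENTRIES = [
    "pkginfo",
    "prototype",
    "client/scripts/boinc-client",
    "client/scripts/boinc-client.service",
    "py/Boinc/version.py",
    "py/setup.py",
]


def comment_out(gitignore_text, entries):
    """Return the .gitignore text with the given entries commented out."""
    wanted = set(entries)
    out = []
    for line in gitignore_text.splitlines():
        if line.strip() in wanted:
            out.append("#" + line.strip())  # disable this ignore rule
        else:
            out.append(line)                # keep everything else untouched
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(comment_out("*.o\npkginfo\npy/setup.py\n", ENTRIES), end="")
```

Lines that are already commented are left alone, so running it twice does no harm.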


Probably just needs a warning in the build wiki to clone the windows from the Linux.
195) Message boards : Questions and problems : GPU dummy plug still needed? (Message 93642)
Posted 12 Nov 2019 by Profile Joseph Stateson
Post:
should be ok as long as you don't let it go to sleep or close the lid

At one time there was a program that kept laptops running even with the lid closed.


If you do buy an HDMI dummy plug be aware that some ebay'ers show 3 plugs in the ad but the fine print says you are getting only one.
196) Message boards : BOINC client : Misconfiguration or identification of required client sources in ".gitignore" (Message 93640)
Posted 12 Nov 2019 by Profile Joseph Stateson
Post:
Ran into a problem building both windows and Linux versions of the boinc-master
Not going to submit this as an "issue" as maybe I didn't order the builds correctly. Maybe someone can decide if there is a problem or not
If so I will submit it as an issue unless it is already known

Background: Wanted to have single copy of Boinc source at my upstream and create working windows and Linux versions of same source.

Created private upstream "boinc-master" empty
Created local git repository on my windows system and downloaded the full "zip" from the boinc repository (thanks!)
Initialized and pushed the local repository to my upstream
On my Linux system I did a git clone of my upstream and attempted a build which failed

The Linux build failed because (among other missing items) the file "client/scripts/boinc-client.in" was missing
I looked in .gitignore and sure enough that file was excluded with the comment "build by configure"
that means when I did my upstream "push" that file was not pushed and as a result did not show up in the clone.

I don't believe that file is created by "configure" and assume the comment in .gitignore is incorrect.

The order of building the client on Linux is to run the scripts
./_autosetup
./configure --disable-server --disable-manager


that "_autosetup" script fails and reports (among other stuff)
configure.ac:1289: error: required file 'client/scripts/boinc-client.in' not found
configure.ac:1289: error: required file 'client/scripts/boinc-client.service.in' not found


Since "configure" is done after _autosetup it is not possible for those files to be created by "configure" (unless there is a neat trick).

I believe those two files (there are others) need to be removed from .gitignore, or "_autosetup" needs a change to create them.

I suspect that the developers work from a single copy of the sources when finalizing both Linux and windows versions. I am trying to do the same thing and it is strange that a "windows push" excludes required Linux files.
197) Message boards : Projects : Some projects will not get tasks (Message 93633)
Posted 11 Nov 2019 by Profile Joseph Stateson
Post:

(not sure if the above will show or not as it does not in the preview).


Some sites require https and other require http
for images and urls

Not sure why but if the preview does not work with https then try http and vice-versa
198) Message boards : Questions and problems : Some projects get compensated by google? (Message 93627)
Posted 11 Nov 2019 by Profile Joseph Stateson
Post:
Out of 26 projects my main system is watching, 6 of them have what I assume is google tracking code.

I discovered this by using find /i "google-analytics" mas*.xml in the boinc data folder

The 6 projects are:

Cas
Nfs
Universe
Cosmology
Gpugrid
Milkyway

for example:

---------- MASTER_WWW.GPUGRID.NET.XML
})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');
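
The Windows "find /i" search above can be repeated on any platform with a few lines of Python. This is only a sketch: the function name is made up, and it assumes the cached project home pages sit in the BOINC data folder under names matching mas*.xml, as in the command above.

```python
import glob
import os


def find_in_master_files(data_dir, needle="google-analytics"):
    """Return names of mas*.xml files in data_dir that contain needle.

    The content match is case-insensitive, like "find /i".
    """
    hits = []
    for path in sorted(glob.glob(os.path.join(data_dir, "mas*.xml"))):
        with open(path, "r", errors="replace") as f:
            if needle.lower() in f.read().lower():
                hits.append(os.path.basename(path))
    return hits


if __name__ == "__main__":
    # "." is a placeholder; point it at the BOINC data folder.
    for name in find_in_master_files("."):
        print(name)
```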

I assume they signed up for this and get some compensation


FWIW, according to https://www.similarweb.com/website/boinc.berkeley.edu SETI is the biggest traffic referral to Boinc at Berkeley. My guess is they are looking for ET both at SETI and at Berkeley. They will definitely find some at Berkeley.

Maybe this tracking code was added by my computer and not the project?
199) Message boards : BOINC client : problem building linux client: binary code is 20x bigger than before (Message 93622)
Posted 10 Nov 2019 by Profile Joseph Stateson
Post:
Made some progress tracking down why debug was enabled

the " -g " is controlled by

ac_cv_prog_cc_g

if = yes then debug is on
if = no then off

that variable is in the script "configure" which is created by _autosetup

In one folder "boinc" if I run autosetup I get that flag set to "yes" and get debug stuff "-g -O2"

in another folder "boinc-master" if I run autosetup I get that flag set to "no" and all I get is -O3

using diff there is no difference between the two _autosetup
nor the two Makefile.in
nor the two Makefile.am
nor the two Makefile.incl
there is a difference between the final "Makefile" as the -g -O2 is in one and the -O3 is in the other but that is expected due to that flag being set to "yes"

One folder came from going to GitHub and a download of a zip
the other folder came from
git clone https://github.com/BOINC/boinc boinc

Maybe that is how the difference came about, maybe not.
Not going to pursue this any further as that strip command you suggested works fine
Going to "smashin crab" for late lunch as I have given up on smashing this bug.
200) Message boards : Questions and problems : Reporting timer? (Message 93613)
Posted 10 Nov 2019 by Profile Joseph Stateson
Post:
Great detective work. Interesting workaround. Wish the MW administrators would look at your code examples and see where they have the server software misconfigured. I think you have asked for the server configuration files from the project to examine. Richard could probably work out what they have done wrong.


Thanks Keith!

Going to restate my conclusion as it is too late to edit the previous post and it might be misleading.

The client properly recognizes and uses the 91 second RPC delay. No problem there. All requests to the project have a 91 second delay whether results are attached or not.

What I found was that the project requires at least one request to HAVE NO RESULTS ATTACHED. That time delay of 256 seconds before I allow results to be attached will cause at least one request (all these requests are for data) to be sent WITH NO RESULTS ATTACHED. So, if I had actually used 91 seconds for my delay (instead of 256) the project would exhibit the same behavior and nothing would have been downloaded.
201) Message boards : BOINC client : problem building linux client: binary code is 20x bigger than before (Message 93610)
Posted 10 Nov 2019 by Profile Joseph Stateson
Post:

./configure CXX='g++ -no-pie' --disable-server --disable-manager


Best would be to use compiler flags to not use the debug symbols in the first place but you can strip the symbols out afterwards with the strip command.


the above flags had no effect on size
jstateson@jysdualxeon:~/boinc/client$ ls -l boinc
-rwxr-xr-x 2 root root 20073976 Nov  9 20:41 boinc



strip --strip-debug boinc



this worked!
jstateson@jysdualxeon:~/boinc/client$ sudo strip --strip-debug boinc
jstateson@jysdualxeon:~/boinc/client$ ls -l boinc
-rwxr-xr-x 2 root root 1187568 Nov  9 20:44 boinc
jstateson@jysdualxeon:~/boinc/client$


the grep I did only got hits at
config.status:old_striplib='strip --strip-debug'
configure:  test -z "$old_striplib" && old_striplib="$STRIP --strip-debug"
libtool:old_striplib="strip --strip-debug"

and all 3 of those "hits" showed up in the old dev system and the new one. Also I don't know the significance of those "hits" FWIW.
202) Message boards : Questions and problems : Reporting timer? (Message 93607)
Posted 10 Nov 2019 by Profile Joseph Stateson
Post:

Milkyway has
<request_delay>91.000000</request_delay>
I'm more interested in

<min_sendwork_interval> N </min_sendwork_interval>
Minimum number of seconds between sending jobs to a given host. You can use this to limit the impact of faulty hosts.
I'm not yet certain that this is the one which emerges as <request_delay>, but I think it's a more plausible candidate.
Edit - candidacy confirmed (I think) by https://github.com/BOINC/boinc/blob/master/sched/sched_types.cpp#L784


I have been looking at this and found a code change to the client to correct for the deficiency, or more likely, improper setup at the Milkyway project. I am not proposing to change the client, rather want to know what is wrong at the project end that allows my change to cause the work flow to work properly.

The problem as stated (many times): A block of MW work units arrives with the scheduler reply which has that 91 second delay requirement. A number of work units are processed, which usually takes a minute each, and, a minimum of 91 seconds later, results can be returned. Unlike other projects I am familiar with, Milkyway does not download any new work when results are uploaded. No work is downloaded until the last of the work units are uploaded and only after a 10 minute delay.

I looked in cs_scheduler.cpp at
// Write a scheduler request to a disk file,
// to be sent to a scheduling server
//
int CLIENT_STATE::make_scheduler_request(PROJECT* p) {


and noticed that results, if any, are attached to the scheduler request at the location
    p->nresults_returned = 0;
    for (i = 0; i<results.size(); i++) {
        rp = results[i];
        if (rp->project == p && rp->ready_to_report) {
            p->nresults_returned++;
            rp->write(mf, true);
        }
    }


On a hunch, I allowed the results to be attached only if 91 seconds had elapsed since the last actual upload of results. However, the 91 second value was not available as that variable was in scheduler_reply, not scheduler_request so I used a local constant that could be obtained from the cc_config file for testing purposes.
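
The idea behind that change can be sketched in a few lines. This is a Python stand-in for the logic only, not the actual C++ patch; the class and function names are made up, and the interval is passed in as a plain number the way the test constant from cc_config would be.

```python
class ReportGate:
    """Allow results to be attached only every min_interval seconds."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self.last_report = None  # time of the last request that carried results

    def may_attach(self, now):
        if self.last_report is None:
            return True
        return now - self.last_report >= self.min_interval

    def record_report(self, now):
        self.last_report = now


def results_to_attach(gate, ready_results, now):
    """Return the results to put in this scheduler request (possibly none)."""
    if not ready_results or not gate.may_attach(now):
        return []  # request goes out with no results attached
    gate.record_report(now)
    return list(ready_results)


if __name__ == "__main__":
    gate = ReportGate(91)
    for t in (0, 50, 91):
        print(t, results_to_attach(gate, ["result"], t))
```

Any request sent while the gate is closed is a pure work request, which is the case the project seems to need before it sends new data.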

I made a linux version in addition to win32 and win64 and put the source code here along with printouts of work flow.

Looking at those event messages (work flow) from the three different systems running milkyway, you can see that new data is downloaded concurrently with uploads as is the normal behavior on other projects.

To restate my solution: The project only "honors" a request for work if no existing results are attached to the scheduler request for at least 91 seconds.
Perhaps this can be a clue to find the real problem.
203) Message boards : BOINC client : problem building linux client: binary code is 20x bigger than before (Message 93601)
Posted 9 Nov 2019 by Profile Joseph Stateson
Post:
It is big because it has debug stuff. I grepped for "--strip-debug" and it was there so I am at a loss as to why all the debug stuff was in the executable. My older project from 2 months ago was obtained the same way and has no debug stuff

stateson@jysdualxeon:~/boinc/client$ file boinc
boinc: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/l, for GNU/Linux 3.2.0, BuildID[sha1]=753642cbdfe8381bf86e41d736eece30774dd318, with debug_info, not stripped


so how do I strip the debug from it?
204) Message boards : BOINC client : problem building linux client: binary code is 20x bigger than before (Message 93599)
Posted 9 Nov 2019 by Profile Joseph Stateson
Post:
I am not an expert on gcc compilers

I have two previous builds of the linux boinc client and both builds were small:
jstateson@jyslinux1:~/Downloads/boinc-master/client$ ls -l boinc
-rwxrwxr-x 2 jstateson jstateson 1517832 Nov  1 01:22 boinc
jstateson@jyslinux1:~/Downloads/boinc-master/client$


I decided to use a better system and did a new install of git and the client
I followed the same procedure as here except that package gm4 was not found.

Used https://boinc.berkeley.edu/trac/wiki/SourceCodeGit to get the master.

Unaccountably, the final executable was 20x bigger. I assume it is full of debugging stuff? Why is it that big? I want it reduced down to the smaller size but do not know enough about gcc or the make files to do that
jstateson@jysdualxeon:~/boinc$ cd client
jstateson@jysdualxeon:~/boinc/client$ ls -l boinc
-rwxr-xr-x 2 root root 20075752 Nov  9 08:00 boinc


The program runs ok but is way too big. Must have the proverbial kitchen sink in it.

[edit] using the old makefiles did not help. I copied the 4 "Makefile" from the other Linux system and "touched" version.h but still ended up with a huge executable
205) Message boards : Questions and problems : API for downloading and uploading, and offline (Message 93578)
Posted 7 Nov 2019 by Profile Joseph Stateson
Post:
The closest thing to an actual "API" are the commands you can send to the app (the client) using boinccmd.exe. If you want to roll your own code to interact with the client then the add on tools here might be useful and especially the c# code at GitHub here. I was told it can run under Linux but AFAICT there is no native C# compiler for Linux.
206) Message boards : Questions and problems : Reporting timer? (Message 93573)
Posted 7 Nov 2019 by Profile Joseph Stateson
Post:

What do you mean by "most discussion boards"? If you mean outside of BOINC, then I disagree. Most forums I've used, if you change your mind or make a mistake, you can delete it. I see no advantage of forcing people to leave it there.


There are no advertisements in any of the boinc or project websites and they are not selling any products. I am happy with that. Ford, Toyota communities are funded using advertisements AFAICT
Apple, Microsoft and big players have plenty of money for bells and whistles.

Not sure how stackoverflow gets funded. They have over 80 affiliated sites and have a lot of bells and whistles.

However, if the "right to be forgotten" gets extended to "the right to be erased" then you might get your wish to be able to delete your posts
207) Message boards : Questions and problems : Reporting timer? (Message 93562)
Posted 6 Nov 2019 by Profile Joseph Stateson
Post:
Well, this would be the place to look: https://boinc.berkeley.edu/trac/wiki/ProjectOptions

But I must say I've never heard anyone discussing the need for a project setting like that, nor remember seeing one when I've been looking through for something else.


There has been an ongoing problem at milkyway with 10-15 minute delays before getting data. A lot of discussion but one of the key points was the moderator, Tom, who posted that they knew about the problem and it was "some obscure boinc setting somewhere"
https://milkyway.cs.rpi.edu/milkyway/forum_user_posts.php?userid=1351529

It looks like you just found those "obscure boinc settings"

the following looks interesting

<min_sendwork_interval> N </min_sendwork_interval>
Minimum number of seconds between sending jobs to a given host. You can use this to limit the impact of faulty hosts. 


I think the problem is that this value, probably about 160 seconds ??? is OK but the project starts counting from the last time the user uploaded results. They need to start counting from the time they last downloaded. That is just a guess. I did not see anything else in that scheduler configuration that would cause the count to start at the last upload. If they start the count from the last time the user asked for data then that is OK but only if data was actually sent to the user. None is and I think that is the problem.

Are these files available to examine? I assume they are on the server and hidden.

[EDIT]
Was looking at
<next_rpc_delay>x</next_rpc_delay>
In each scheduler reply, tell the clients to do another scheduler RPC after at most X seconds, regardless of whether they need work. This is useful, e.g., to ensure that in-progress jobs can be canceled in a bounded amount of time. 


I wonder if setting that value to be greater than the "min_sendwork_interval" would fix the problem? That should cause the client to wait a minimum of 160 seconds (or whatever) before uploading results and attaching the "piggyback" work fetch request.

I asked Tom to send me a copy of the file.
208) Message boards : Questions and problems : Ryzen 2600 not 100% utilized, not thermal throttling (Message 93516)
Posted 4 Nov 2019 by Profile Joseph Stateson
Post:
Tools used to perform calculations on GPUs (OpenCL, CUDA) have a limitation of 4gb address space. At startup, the boinc message box can show (NVidia) how much video memory is available and it is never more than 4gb no matter how much ram is on the board. If you are running gpugrid and have 63c temp that is excellent.

I added an additional 1070 to my existing pair on my area51 system and had to retrofit a liquid cooling system (eVga hybrid) as it got too hot. My hottest board is 80c and that is close to the thermal limit (83c). The other boards (not shown) are 73c and 55c, with the 55c board being the water cooled eVga. The fan noise is pretty bad during the summer. Memory used on gridcoin ranges from 700mb to 900mb with controller load 42. The GPU load moves a lot during computations and can swing from 90 down to 40 and then back up quickly.

209) Message boards : GPUs : Two projects on one GPU? (Message 93507)
Posted 4 Nov 2019 by Profile Joseph Stateson
Post:

A long time ago (I think on a Radeon HD 290) I used to run more than one (about 3) Milkyway tasks at once (just using the one client). I got a reasonable speed increase. But now I don't, the GPU is already running at about 95% anyway with just one task. Either Milkyway has changed, or this card is different.



not sure when but a few years ago milkyway started doubling up the number of work units each job has. looking in a result file one finds
<number_WUs> 4 </number_WUs>
so currently each job is 4 simple work units
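
A small sketch in Python of pulling that count out of a result file. The tag name is the one quoted above. The function name is made up, and the rest of the file layout is not assumed.

```python
import re

# Matches the tag exactly as quoted above, allowing optional spaces around the number.
NUMBER_WUS_RE = re.compile(r"<number_WUs>\s*(\d+)\s*</number_WUs>")


def number_of_wus(result_text):
    """Return the bundled work unit count, or None if the tag is absent."""
    match = NUMBER_WUS_RE.search(result_text)
    return int(match.group(1)) if match else None


print(number_of_wus("<number_WUs> 4 </number_WUs>"))  # 4
```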
210) Message boards : Questions and problems : Any recommendation on avoiding linux upgrades that break drivers (Message 93319)
Posted 26 Oct 2019 by Profile Joseph Stateson
Post:
This does not happen very often but when it does it can be a PITA to recover from.

Just rebooted an 18.04 system and had to reinstall NVidia drivers. Unlike the AMD ones the NVidia frequently survive an upgrade.

From googling the difference between update and upgrade it seems I need to avoid the upgrade. I think I know what happened: I put in an ftp server and did an update followed by an upgrade. I assume the problem was the upgrade. I can't find where I got the walkthrough for the ftp server but I suspect they did it and I just copied what they did and pasted it. The problem with searching for help on Linux is there are numerous OS changes and a walkthrough for xxx.23 does not work the same as when you are running a different version or kernel.

I also get notices occasionally of "updates waiting". I assume those are needed. Have also noticed the list of updates seem to grow quickly if I don't apply them.

I assume that if I stop doing "upgrade" I will not have to reinstall any NVidia or ATI drivers in the future. Is that correct?
211) Message boards : Server programs : bug in server: report to project or to boinc? (Message 93314)
Posted 26 Oct 2019 by Profile Joseph Stateson
Post:
I have been trying to debug why no work is downloaded from milkyway when completed results are uploaded.

AFAICT other projects download a few (or a bunch!) on every upload. Not milkyway, and that was the topic of that thread I listed.

From trial and error, I, and others, have observed that MW does not respond to an update unless about 160 seconds have elapsed since the last request for data, and that request must come after all the data is uploaded (or lost).

Users consider this a bug in the server and that it is compounded by boinc waiting for about 15-20 minutes to elapse before asking a second time.

My guess was that Milkyway was considering the upload of "completed results" to be the start time of the "last request".

I got to looking at the windows code since I can finally build the client in VS2013.
Looked at: schedule_op, cs_schedule and work_fetch

I noticed that the field "req_secs" was used by the client to report "not asking for tasks". That field was defined as the number of seconds of data that the device wants.

Tried the following: Made a mod to the client so that "req_secs" was always 0 when sent to the project. Added another client mod to the "piggyback" routine so that when I clicked on "update" the field req_secs was set to a big number of seconds. The idea being that every time data was uploaded to Milkyway there would be no "want more data" but when I manually did an update it would ask for data (req_secs > 0).

Anyway, it didn't work, nothing got downloaded. A previous mod I tried was also unsuccessful: I allowed 3 minutes of data to accumulate before allowing an upload. Again, nothing got downloaded.

I have decided there are other challenges more interesting. I do have my own solution to the MW problem: I set a rule in BoincTasks that waits 160 seconds after the last Milkyway task is completed and then issues an RPC update. Since BT itself only checks every couple of minutes there is a worst case of 5-6 minutes of idle time before additional stuff gets downloaded and executed as shown in picture below. Other users simply use a dos script on the local system and loop boinccmd.exe to issue an update every 160 seconds. While this works, I do have 6 GPUs that are idle for 7 minutes and that represents just over 42 work units. I used to let Einstein run with "0" resource but recently some of their tasks take an unusual amount of time to finish. My idle time of 6 minutes is livable. The 15 to 20 minute wait was unacceptable.
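
For anyone who prefers Python to a dos script, the loop other users run can be sketched as below. This is a sketch under assumptions: boinccmd must be on the PATH, the project URL is a guess taken from the forum address and should be replaced with whatever the client's project list shows, and the function names are made up.

```python
import subprocess
import time

# Assumptions: boinccmd is on PATH; PROJECT_URL must match the URL shown in
# the client's project list; 160 s is the interval observed by trial and error.
PROJECT_URL = "https://milkyway.cs.rpi.edu/milkyway/"
INTERVAL_SECS = 160


def build_update_command(project_url=PROJECT_URL):
    """Command line equivalent to clicking "Update" for one project."""
    return ["boinccmd", "--project", project_url, "update"]


def update_loop(max_updates=None, interval=INTERVAL_SECS):
    """Issue a project update every `interval` seconds.

    max_updates=None runs until interrupted; a number stops after that many.
    """
    count = 0
    while max_updates is None or count < max_updates:
        subprocess.run(build_update_command(), check=False)
        count += 1
        time.sleep(interval)
    return count
```

Calling update_loop() with no arguments runs until stopped with ctrl-c. The idle time between the last finished task and the next update is still there, it is just bounded by the interval.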

212) Message boards : Server programs : bug in server: report to project or to boinc? (Message 93312)
Posted 25 Oct 2019 by Profile Joseph Stateson
Post:
The server code is contained in the exact same BOINC/boinc repository at GitHub - you can also choose server_release/1/1.0 and server_release/1/1.2 branches if you want confirmed, stable, code.


Thanks, that helped me realize my problem. I did not know what to look for. After reading this guide I realized I had already downloaded the server code.

What got me confused in the first place was when I was tracking down a possible bug in the milkyway project. I wanted to start with the message "last request too recent" and see where the problem came from. I could not find that phrase when using VS2013 "find in entire solution". I found that phrase on my ubuntu system. There is probably a way to do it under windows 10 but not with win10 "find" as it is not recursive AFAICT.
jstateson@jyslinux1:~/Downloads/boinc-master/sched$ grep -r "last request too recent" .
./handle_request.cpp:                        "Not sending work - last request too recent: %f\n", diff
./handle_request.cpp:                        "Not sending work - last request too recent: %d sec", (int)diff


I do not know how to get that source code into VS2013. I did the following search: find -i "handle_request" *.vcxproj and could not find any reference in any (c++) project. Since it ends in cpp I would have thought I could bring up VS2013 and look through the code. I am pretty sure I know what is going on and could help the project out. The last info I read about the project looking at this was here and no answer has been given since.

I did not want to post what I think is wrong unless I can read through the server code and would like to use VS2013 for its GUI features.
I just did a find /I "win32" in the "sched" folder and it appears that code is all Linux so I can't use VS2013 for debugging.
I have a suspicion that it is not a bug but a way to reduce load on the database but that is a guess. If a real bug I think I know what has happened.
213) Message boards : BOINC Manager : VS2013 Boinc build: build tools missing (Message 93276)
Posted 23 Oct 2019 by Profile Joseph Stateson
Post:
As I said in that thread, I did update Compile Client for Windows with a note on that limitation. I feel your pain in discovering it afresh - it took me some time to work out what was going wrong, too!



I looked into this and the problem is actually the short name for the win_build. The "Z" would have worked OK.

I made an "issue" out of it and I suspect the "fix" is to just make another note on that wiki you listed above.


https://github.com/BOINC/boinc/issues/3347
214) Message boards : Server programs : bug in server: report to project or to boinc? (Message 93254)
Posted 21 Oct 2019 by Profile Joseph Stateson
Post:
Occasionally I find something strange but it could be just the way the project sets up the server.

One bug I am sure about is that there seems to be a timing issue :

(1) I make a venue change, the venue is picked up on the next project update but the contents of the venue are not acted upon during that data exchange, i.e. if the new venue says "do not send x" I can get an "x" anyway but never again. This is difficult to repeat but it has happened on several occasions.

(2) If I create an app_config.xml that requires only "Y" be downloaded I get an "X" anyway but thereafter only the correct "Y".

I think the server looks to see what to send me before it notices the venue changed or the existence of an app_config.xml but it has already made up its mind what to send me.

For what it is worth I am pretty sure I can duplicate (2). It could be that some of these bugs are in earlier server versions and projects are not always running the latest.

I could not find any server code at GitHub but I might have missed it.
215) Message boards : BOINC Manager : VS2013 Boinc build: build tools missing (Message 93249)
Posted 20 Oct 2019 by Profile Joseph Stateson
Post:
There is a thread here back in Feb 2018 about this exact problem.

That thread is too old to post to but the problem has not changed.
The conclusion was that there is a limit of 13 characters for a folder name.

I had to change this...
Z:\Src\boinc\win_build>buildenv type release platform x64
Initializing BOINC Build Environment for Windows
Software Platform Detected: Visual Studio 2013
Software NOT Detected: Build Tools...


to this...
c:\Src\boinc\win_build>buildenv type release platform x64
Initializing BOINC Build Environment for Windows
Software Platform Detected: Visual Studio 2013



While none of the above folder names exceed the 13 character limit mentioned in the original post, the drive "Z" is actually the network path
\\xxx\yyy\VS2013projects
so it seems the 14 character VS2013projects is not kosher even in a network path mapped as a drive letter.

FINALLY GOT A WIN BUILD!
2>------ Build started: Project: boinc, Configuration: Release x64 ------
…
2>  boinc_cli_vs2013.vcxproj -> c:\Src\boinc\win_build\.\Build\x64\Release\boinc.exe
216) Message boards : Questions and problems : Is there a "wish list" (Message 93237)
Posted 16 Oct 2019 by Profile Joseph Stateson
Post:
You asked where you can request a feature.

The developers don't read this forum, no. Or maybe once in a very blue moon. Or when someone points out a post, and even then I wouldn't hold my breath. We never said they read this forum. Or any project forum for that matter.
Your best bet is always at Github, make a new issue on https://github.com/BOINC/boinc/issues and use the Label option to name it Feature Request. Perhaps that someone looks at it then, but even that's not guaranteed.


Thanks Jord. I did not mean to insinuate that no developers ever read the forums. I have interacted in the past with devs on this and other forums.
I posted my suggestion as a "feature" where you recommended

https://github.com/BOINC/boinc/issues/3337

Hope someone comments on it especially if there is a side effect for what I was doing.

[edit] I commented on another issue that was first brought up here but no one seems to be aware of or want to resolve

https://github.com/BOINC/boinc/issues/3246
217) Message boards : Questions and problems : Is there a "wish list" (Message 93232)
Posted 16 Oct 2019 by Profile Joseph Stateson
Post:
Been two weeks and no one commented on my wish. Maybe not the place any developers look or more likely no interest.

Had another wish but this time I implemented it. I have boinc "client" code I built and tested on Linux that seems to work. Where can I discuss this "feature" I added? I have a GitHub account and can put the few files I changed on it but I think the developers might want to make recommendations first.
218) Message boards : Questions and problems : GPU Missing, Waiting to run (Message 93223)
Posted 15 Oct 2019 by Profile Joseph Stateson
Post:
I just upgraded the GPU in my Mac Pro from the stock GPU to a Sapphire RX580. I don't seem to have any problems save for one: GPU tasks on BOINC are suspended: "GPU Missing, Waiting to run." Has anyone had this issue before? I've read about it on Windows computers, but I can't seem to find any Mac specific info.

Thanks!


Yea, your old mac pro had (I assume) an NVidia board and any task downloaded for it won't run. You should be able to get rid of those error messages by doing a project reset. If it wasn't a different manufacturer then it is some other problem, like OpenCL missing.
219) Message boards : Questions and problems : permission problem: if client cannot run the app (wrong owner) why does it delete the work unit? (Message 93199)
Posted 14 Oct 2019 by Profile Joseph Stateson
Post:
So, if the programmers decide to chase down that bug maybe they might consider "fixing" the "feature" that allows users to fake or spoof the number of GPUs from just 1 physical up to 96 or as many as they want. Something along the lines of

<number_virtual_gpus>128</number_virtual_gpus>


I never understood why it is possible to spoof the number of cpu cores either. I think from memory when I tried it to see what happens it resulted in tasks crashing but that may just be my experience. I didn't actually try to download extra work with it though.


I did figure out the "how" but like you did not want to run any work units.

In cs_scheduler.cpp the following code segment obtains actual system information including number of GPUs:
// send master global preferences if present and not host-specific
    //
    if (!global_prefs.host_specific && boinc_file_exists(GLOBAL_PREFS_FILE_NAME)) {
        FILE* fprefs = fopen(GLOBAL_PREFS_FILE_NAME, "r");
        if (fprefs) {
            copy_stream(fprefs, f);
            fclose(fprefs);
        }


Further down, below, where the client sends messages to the project, I wrote over that data
    FILE* fprefs = fopen("spoof.txt", "r");
    if (fprefs) {
        copy_stream(fprefs, f);
        fclose(fprefs);
    }


The file "spoof.txt" had a fake number of GPUs but probably had a lot of wrong stuff as I was guessing, but at this point in cs_scheduler it seems only the number of GPUs are used.

Like the SETI GPU Users group I cannot give away all my secrets so the contents of spoof.txt is my little secret.

I have not decided whether to follow through on my "calling" to bunker up work units before the next WOW event but I have already come up with a system name: "NumberOfBeasts". While I have not changed my domain name yet, I have picked an appropriate boinc client version number as shown here https://setiathome.berkeley.edu/show_host_detail.php?hostid=8830364
220) Message boards : Questions and problems : permission problem: if client cannot run the app (wrong owner) why does it delete the work unit? (Message 93197)
Posted 14 Oct 2019 by Profile Joseph Stateson
Post:
I'd be very careful about automating this, outside the control of a tight user group. If an automated tool became widely available and increases the server load as before, it could cause the devs to hunt down and fix the server programming bug which opened the loophole in the first place


Thanks for explaining this! I was unaware of any server problem. I thought things were hunky-dory and did not know chasing down lost work units was akin to running amuck through the data base. Sorry for my American slang.

So, if the programmers decide to chase down that bug maybe they might consider "fixing" the "feature" that allows users to fake or spoof the number of GPUs from just 1 physical up to 96 or as many as they want. Something along the lines of
 <number_virtual_gpus>128</number_virtual_gpus>

would make it a lot easier to download a lot more work units than is normally allowed for just 1 or 2 GPUs.

The problem of my lost tasks originated from my attempt to duplicate the "bunkering" of work units that a few (?) users do before the SETI WOW event that is held yearly. Discussion of that topic is buried midway down this thread

I was able to duplicate the work unit bunkering using a pair of boinc clients on the same machine and 1 or 2 GPUs for testing. The idea being to set "NumClients" to something like 1000 or more and download, accumulate, and process (but not upload) work units during the approximately 2 or 3 months before the WOW event starts:
# Print (not run) the start command for each client; each gets its own port and folder
let NumClients=2
let BasePort=31416
for (( n=0; n < NumClients; n++ ))
do
    NumPort=$((BasePort+n))
    echo sudo /usr/bin/boinc --gui_rpc_port $NumPort --dir /home/jstateson/nuke$n --detach
done


The process of holding back work units until the WOW event but releasing those before and after the event time period can be handled by a BoincTasks "rule" and an app I have.
In the process of testing out my idea (I have nothing better or more interesting to do), some s**t did hit the fan so to speak, but once it gets working it can be upscaled from 2 clients to what is best for my Linux box.
I thought I was helping the project out by recovering my lost tasks but it was a waste of time though a learning experience.

[EDIT] As far as the original question on this forum, it seems to me that the client should determine if the app is truly missing. If not accessible it should report that in an event message or notification and NOT delete the work units.
221) Message boards : Questions and problems : permission problem: if client cannot run the app (wrong owner) why does it delete the work unit? (Message 93192)
Posted 14 Oct 2019 by Profile Joseph Stateson
Post:
Follow up.

I looked deeper into stdoutdae.txt and picked the first file that was downloaded and then disappeared. The filename was
blc64_2bit_guppi_58692_57863_HIP21594_0020.9424.818.21.44.28.vlar

root@jyslinux1:/home/jstateson/nuke1# grep -i "blc64_2bit_guppi_58692_57863_HIP21594_0020.9424.818.21.44.28.vlar" *.txt
stdoutdae.txt:13-Oct-2019 19:15:09 [SETI@home] Started download of blc64_2bit_guppi_58692_57863_HIP21594_0020.9424.818.21.44.28.vlar
stdoutdae.txt:13-Oct-2019 19:15:12 [SETI@home] Finished download of blc64_2bit_guppi_58692_57863_HIP21594_0020.9424.818.21.44.28.vlar
stdoutdae.txt:13-Oct-2019 19:29:01 [SETI@home] Starting task blc64_2bit_guppi_58692_57863_HIP21594_0020.9424.818.21.44.28.vlar_0
stdoutdae.txt:13-Oct-2019 22:14:24 [SETI@home] No application found for task blc64_2bit_guppi_58692_57863_HIP21594_0020.9424.818.21.44.28.vlar: platform x86_64-pc-linux-gnu version 801 plan class cuda90; discarding
stdoutdae.txt:13-Oct-2019 22:14:24 [SETI@home] State file error: result blc64_2bit_guppi_58692_57863_HIP21594_0020.9424.818.21.44.28.vlar_0 not found for task
stdoutdae.txt:13-Oct-2019 22:20:29 [SETI@home] Couldn't delete file projects/setiathome.berkeley.edu/blc64_2bit_guppi_58692_57863_HIP21594_0020.9424.818.21.44.28.vlar.gzt


The above info from that log file makes it look like boinc was unable to delete the file after all. However, it was attempting to delete
blc64_2bit_guppi_58692_57863_HIP21594_0020.9424.818.21.44.28.vlar.gzt
I assume gzt is the compressed file and it must get uncompressed before being stuffed into /projects/setiathome.berkeley.edu
I have never seen a .vlar.gzt, only the .vlar

Be that as it may, while it appears the client was unable to delete the file, THE FILE ALONG WITH ANOTHER 100 WERE ACTUALLY DELETED. I am guessing that it deleted the uncompressed file and then, for who knows why, tried to delete the compressed file which probably did not exist anymore. Just a hunch.
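
The same grep can be done in Python with the timestamps parsed, which makes it easy to see how long the file sat between download and discard. This is only a sketch. The sample lines are copied from the grep output above, the helper names are made up, and the month parsing assumes an English locale.

```python
import re
from datetime import datetime

NAME = "blc64_2bit_guppi_58692_57863_HIP21594_0020.9424.818.21.44.28.vlar"

# Lines copied from the stdoutdae.txt grep output above.
LOG = [
    f"13-Oct-2019 19:15:09 [SETI@home] Started download of {NAME}",
    f"13-Oct-2019 19:15:12 [SETI@home] Finished download of {NAME}",
    f"13-Oct-2019 19:29:01 [SETI@home] Starting task {NAME}_0",
    f"13-Oct-2019 22:14:24 [SETI@home] No application found for task {NAME}: "
    "platform x86_64-pc-linux-gnu version 801 plan class cuda90; discarding",
]

# timestamp, [project], message
LINE_RE = re.compile(
    r"^(\d{2}-[A-Za-z]{3}-\d{4} \d{2}:\d{2}:\d{2}) \[([^\]]*)\] (.*)$"
)


def parse_line(line):
    """Return (timestamp, message) for a log line, or None if it does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    # %b expects English month abbreviations such as "Oct".
    stamp = datetime.strptime(match.group(1), "%d-%b-%Y %H:%M:%S")
    return stamp, match.group(3)


def first_time(lines, phrase):
    """Timestamp of the first line whose message contains `phrase`."""
    for line in lines:
        parsed = parse_line(line)
        if parsed and phrase in parsed[1]:
            return parsed[0]
    return None


downloaded = first_time(LOG, "Finished download")
discarded = first_time(LOG, "No application found")
print(discarded - downloaded)  # 2:59:12
```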

According to Keith Myers lost tasks can be recovered. Sure enough, I was able to recover all 2 tasks out of the 100. One has about 0.75 to 1.50 seconds to click "no network activity" after seeing "Reported xx tasks" but before "Scheduler request completed" and it took me 5 tries before I got the timing correct. I only got 2 lost tasks recovered because I had restored the priority to 100 from 0, but boinc checks the current resource and sees "0" before it discovers that the project had specified "100", and by then it is too late and only one lost task for each gpu gets downloaded.

Anyway, I can eventually get the rest of the lost tasks, if I want to. Click on Keith's name above to see his procedures which I tried.

That brings me to a second question. Instead of quickly clicking on "no network activity" within a second (after waiting maybe 15 minutes for the chance), could the program scheduler_op.cpp be modified as follows:
        if (p->nresults_returned) {
            msg_printf(p, MSG_INFO,
                "Reporting %d completed tasks", p->nresults_returned
            );
        }
        request_string(buf, sizeof(buf));
        if (strlen(buf)) {
            msg_printf(p, MSG_INFO, "Requesting new tasks for %s", buf);
===========> PUT IN AN EXIT HERE<==========
        } else {
            if (p->pwf.project_reason) {
                msg_printf(p, MSG_INFO,
                    "Not requesting tasks: %s", project_reason_string(p, buf, sizeof(buf))
                );
            } else {
                msg_printf(p, MSG_INFO, "Not requesting tasks");
            }


I do not know how that "ghost protocol" came about but if one is supposed to stop all network activity immediately after seeing that "Requesting new tasks for …" message it seems to me an easy fix is to have a special build of the client and just have it exit right after that message is displayed. I have been able to build the Linux version of boinc and was wondering if my idea would work. I have found it very difficult to perform Keith's procedure to recover lost tasks as I have a fast computer and the system is remote which has some latency.

[EDIT] Amazing - I totally forgot to post this message after preview. I then went off to other web pages and discovered it had not been posted when I clicked on my back link. I then hit the back arrow until I got to an "are you sure you want to re-submit". It worked! I did not lose my post! I guess I have Microsoft to thank for this.
222) Message boards : Questions and problems : permission problem: if client cannot run the app (wrong owner) why does it delete the work unit? (Message 93191)
Posted 14 Oct 2019 by Profile Joseph Stateson
Post:
I started BOINC manually and failed to use sudo. Consequently, the client was running under my name instead of root (I am guessing this is the problem).

It could not find any app even though I can see them with "ls -l" or whatever (ubuntu). OK, I forgot the sudo. It appears the client can delete the work units!!! So how can it get away with deleting the work units after it finds it cannot access the app? Seems like it should have just reported the work units were not there! That caused 100 work units to be trashed.

ls -l
-rwxr-xr-x 1 boinc boinc 181979256 Oct 12 19:06 setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda90


13-Oct-2019 22:14:24 [SETI@home] State file error: missing application file setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda90

13-Oct-2019 22:14:24 [SETI@home] No application found for task blc64_2bit_guppi_58692_57863_HIP21594_0020.9424.818.21.44.28.vlar: platform x86_64-pc-linux-gnu version 801 plan class cuda90; discarding


Maybe permissions and ownership are wrong. Maybe it could not access inside "/projects". I am not an expert on Linux. Since I am admin it seems boinc would be able to find and run that seti app even if I forgot to use sudo.

Is it necessary to use sudo when running boinc? Maybe I should not have used sudo in the first place when I caused the boinc folders to get created.
This is what I used to create the boinc folders.
sudo /usr/bin/boinc --gui_rpc_port 31418 --dir /home/jstateson/nuke1 --detach

The above created an alternate implementation of boinc at "nuke1" in my home directory. I am running some tests using multiple instances of the client after reading this windows discussion as a reference. There is no "sudo" problem in windows.

What is the best way in linux to set up multiple clients? Is sudo necessary? Maybe I am going where not many have gone before!

Is there any way to recover the work units? I remember reading something about recovering "ghost" work units over at SETI. They call it "ghost protocol". Is that applicable here?
223) Message boards : Questions and problems : use all CPU power on 1 task at a time (Message 93120)
Posted 7 Oct 2019 by Profile Joseph Stateson
Post:
Is there a way to get one task to run at a time and use all the CPUs available, I presume the task would finish faster??


Only for those projects specifically coded to use more than one CPU.

Milkyway: nbody

gpugrid: quantum chemistry (if they have any work units)

Amicable Numbers

there may be others.
224) Message boards : Questions and problems : GPU and CPU Temperatures. (Message 93050)
Posted 4 Oct 2019 by Profile Joseph Stateson
Post:

Now, I got a I7-8700 in a case with 5 120mm fans (2 in and 3 out), with a Stryx RX-580 GPU... Oh! And the CPU has a aftermarket cooler not the crappy original CPU fan.


Pick a more aggressive curve for fan control and make sure there is nothing blocking the 580's fan intake. On a mining system with several rx570 I had to set all their fans at 100% for 24/7 crunching.
225) Message boards : Questions and problems : Is there a "wish list" (Message 93037)
Posted 3 Oct 2019 by Profile Joseph Stateson
Post:
Is there a place where I can request a feature? I looked here and at GitHub but didn't see anything obvious. Some developers have stuff spread over all types of communities such as Reddit, Steemit, Twitter, Discord and Slack. Looking under BOINC's "web resources" I see subreddit, LinkedIn and Facebook listed but there is no description of what that media is to be used for or its purpose.
226) Message boards : Questions and problems : Compute error - SIGSEGV: segmentation violation (Message 92884)
Posted 21 Sep 2019 by Profile Joseph Stateson
Post:
With CPDN, it stopped I think after a newer version of the particular model type so not a lot you can do at the user end. I can't comment on the Seti@home because I have never seen it with them.


I haven’t seen any work being downloaded but I will wait a week and see what happens.

Aside from trying different projects to see what happens, how can I test the possibility hardware issues? I see a memtest86+ but it comes with mixed reviews.



That memtest works fine. I have used it on everything from the latest Dell Area51 back to old dual opteron servers. Usually the ubuntu install comes with it.
227) Message boards : GPUs : AMD GPU Task Turns Computer Off Immediately (Message 92805)
Posted 15 Sep 2019 by Profile Joseph Stateson
Post:
.... So this tells me it is not a heat issue, I would expect the computer to run for a little while before shutting down.
Whilst you would not expect the temperature to rise instantly and therefore might expect to see a bit of a delay, maybe the firmware is using something other than a temperature change to invoke a protection mechanism. I have no idea if this is ever done but perhaps it might be current draw that triggers the response. You don't mention what project is supplying the GPU work but if that work is really compute intensive, maybe some current limit is being tripped. I could imagine that happening quite quickly - almost instantly.



Try underclocking the GPU using msi afterburner or, if supported, AMD's wattman. If it works at low speed then that current surge could be the problem.
228) Message boards : BOINC client : strange error msg: could not assign boinc user to group render (Message 92645)
Posted 31 Aug 2019 by Profile Joseph Stateson
Post:
I did an update to Ubuntu 18.04 followed by an upgrade and saw the following error messages:
Setting up boinc-client (7.16.1+dfsg+201908161115~ubuntu18.04.1) ...
usermod: group 'render' does not exist
Could not assign boinc user to group 'render'


Everything seems to be working fine. The BOINC client did terminate during that upgrade but a restart worked fine and I rebooted just to make sure.

I assume the errors are ignorable.
229) Message boards : GPUs : Client wont start Getting Stuck at OpenCL (Message 92644)
Posted 30 Aug 2019 by Profile Joseph Stateson
Post:
How you resolved that? Same issue is happening with me.


never heard back from original poster, be nice to know even if the answer seems "stupid". The saying "the only stupid question is the one that is not asked" can apply to the answer. OTOH the post had been up almost a week maybe they gave up and left.

my suggestion was to look in the windows event log for errors. the point where the client got hung up appears to be where it is asking for hardware info which it gets from the OS: 16-Jul-2019 01:07:01

pressing ctrl-c a couple of times could actually cause that exit at 16-Jul-2019 01:15:21

there are debug flags that can provide help but i am not familiar with how to use them or how to interpret the results.

what problem are you yourself seeing?
230) Message boards : Questions and problems : Windows install issues (Message 92634)
Posted 29 Aug 2019 by Profile Joseph Stateson
Post:
On more than one occasion I have accidentally installed 32bit BOINC on a 64 bit windows. I remember seeing errors with libraries. Just a guess. However, I was looking at the following


Faulting module name: LIBEAY32.dll, version: 1.0.2.7, time stamp: 0x56d5fc8e


Using https://www.freeformatter.com/epoch-timestamp-to-date-converter.html

Your LIBEAY32.dll is dated 3/1/2016, 2:33:18 PM
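
The same conversion can be done without the web page. A minimal sketch in Python, assuming the "time stamp" field in the fault report is seconds since the Unix epoch written in hex. The function name is made up.

```python
from datetime import datetime, timezone


def fault_timestamp_to_utc(hex_stamp):
    """Convert a hex "time stamp" such as 0x56d5fc8e to a UTC datetime."""
    seconds = int(hex_stamp, 16)  # int() accepts the 0x prefix when base is 16
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


stamp = fault_timestamp_to_utc("0x56d5fc8e")
print(stamp.isoformat())  # 2016-03-01T20:33:18+00:00
```

That is 20:33:18 UTC, which agrees with the 2:33:18 PM shown above for a local time zone six hours behind UTC.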

I just checked two of my 7.14.2 win10x64 systems and they show 12/18/2016 4:46 PM for the same version 1.0.2.7

Not sure of the significance. If your version includes VM, possibly the dll package is older.

I have not used service installs of BOINC since I got rid of my XP systems so I cannot advise, other than to say netplwiz can be used to log in automatically.
231) Message boards : Questions and problems : Windows install issues (Message 92625)
Posted 28 Aug 2019 by Profile Joseph Stateson
Post:
Is this 64 bit windows? 32 bit runtime problem?
232) Message boards : Questions and problems : need help debugging a problem: Linux 7.16.1 (Message 92621)
Posted 28 Aug 2019 by Profile Joseph Stateson
Post:
Have had this happen again. AFAICT it is caused by the GPUs that are on a splitter.

Was thinking about something along this line:

Instead of
 boinc_temporary_exit(180,"Cuda device initialisation failed");


do this instead since the app knows which failed


 boinc_temporary_exit(180,"Cuda device=7 initialisation failed");


Unless I am mistaken, that string is passed back to the BOINC client as it shows up in the message log.
The device id could be extracted by the client and it would know which CUDA device was defective.
Could this info be used to prevent tasks being assigned to that device?
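
A sketch in Python of what the client side could do with such a string. This is hypothetical. The "device=N" wording is only my proposal above, nothing here is an existing BOINC API, and the function names are made up.

```python
import re

# Matches the proposed (hypothetical) "device=N" wording in the exit message.
DEVICE_RE = re.compile(r"device=(\d+)")


def failed_device(exit_message):
    """Return the device id named in the message, or None if there is none."""
    match = DEVICE_RE.search(exit_message)
    return int(match.group(1)) if match else None


def usable_devices(all_devices, exit_messages):
    """Drop every device that some exit message reported as failed."""
    bad = {failed_device(msg) for msg in exit_messages} - {None}
    return [dev for dev in all_devices if dev not in bad]


print(failed_device("Cuda device=7 initialisation failed"))  # 7
print(failed_device("Cuda device initialisation failed"))    # None
print(usable_devices(range(1, 8), ["Cuda device=7 initialisation failed"]))
# [1, 2, 3, 4, 5, 6]
```

The current message carries no device id at all, which is why the second call returns None and why the client has nothing to act on.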

Another piece of the puzzle from this task (not sure how long in SETI database)

The SETI app reports the following:
In cudaAcc_initializeDevice(): Boinc passed DevPref 7
setiathome_CUDA: CUDA Device 7 specified, checking...
   Device cannot be used
  Cuda device initialisation retry 1 of 6, waiting 5 secs...
setiathome_CUDA: Found 6 CUDA device(s):
  Device 1: GeForce GTX 1660 Ti, 5944 MiB, regsPerBlock 65536
     computeCap 7.5, multiProcs 24 
     pciBusID = 4, pciSlotID = 0
  Device 2: GeForce GTX 1070 Ti, 8117 MiB, regsPerBlock 65536
     computeCap 6.1, multiProcs 19 
     pciBusID = 1, pciSlotID = 0
  Device 3: GeForce GTX 1070, 8119 MiB, regsPerBlock 65536
     computeCap 6.1, multiProcs 15 
     pciBusID = 5, pciSlotID = 0
  Device 4: GeForce GTX 1060 3GB, 3019 MiB, regsPerBlock 65536
     computeCap 6.1, multiProcs 9 
     pciBusID = 3, pciSlotID = 0
  Device 5: GeForce GTX 1060 3GB, 3019 MiB, regsPerBlock 65536
     computeCap 6.1, multiProcs 9 
     pciBusID = 8, pciSlotID = 0
  Device 6: GeForce GTX 1060 3GB, 3019 MiB, regsPerBlock 65536
     computeCap 6.1, multiProcs 9 
     pciBusID = 10, pciSlotID = 0
In cudaAcc_initializeDevice(): Boinc passed DevPref 7


I assume the Stderr output is from the app, not boinc, when writing the following:

Clearly, the client says to use device 7 (DevPref 7). It does not know that the device is defective. If it did, it would not have recommended that device. It needs some feedback from the app to make that decision. Complicating this, the "Device cannot be used" message might apply only to this project's app, and some other project's app might not have a problem using the device. However, the above series of messages seems strange: why is the app even trying other devices if it was given a preference of "7"? This is best answered by the project, but the BOINC developers should be aware that the app is trying devices other than the one recommended because 7 had a problem. The client (IMHO) would never launch an app unless a resource was available. ******

Another factor is the pciBusID. As above they are numbered: 4,1,5,3,8,10. Note that "2" is missing.
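
A rough sketch of how the missing bus ID could be found automatically, assuming the app's stderr keeps the "pciBusID = N" wording shown above. The helper names are made up, and the expected list has to come from somewhere else, for example a healthy nvidia-smi run.

```python
import re

def bus_ids_from_stderr(text):
    """Collect every 'pciBusID = N' value the app printed, in order."""
    return [int(n) for n in re.findall(r"pciBusID\s*=\s*(\d+)", text)]

def missing_bus_ids(seen, expected):
    """Bus IDs that should be present but were not reported by the app."""
    return sorted(set(expected) - set(seen))

if __name__ == "__main__":
    seen = [4, 1, 5, 3, 8, 10]          # from the stderr listing above
    expected = [1, 2, 3, 4, 5, 8, 10]   # bus IDs from a healthy nvidia-smi run
    print(missing_bus_ids(seen, expected))
```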

When I ran nvidia-smi on the system that generated the above Stderr output, I got the following ***
jstateson@tb85-nvidia:~$ nvidia-smi
Unable to determine the device handle for GPU 0000:02:00.0: GPU is lost.  Reboot the system to recover this GPU 


What is interesting is that BOINC runs just fine and the SETI apps on the other 6 GPUs also run fine, but the nvidia-smi app cannot get the handle to one of its GPUs and simply says to reboot the system. Handles are provided by the OS (Ubuntu 18.04). That GPU, "Device 7", is hung. nvidia-smi says the bus id is 2, the client in the messages log uses devices D0...D6 and the SETI app uses 1..7. I do not know how the numbering of the bus-id works. One would think that the device driver's numbering would be used rather than a made up number (1..7) or (0..6) etc.

Also, I suspect this forum is not the place to offer constructive criticism. It is a public forum for questions / problems about running the client or manager and criticism here tends to bring out tribal instincts from non-programmers. Maybe there is a better place. For Gridcoin, the programmers tend to use steemit or reddit. GitHub also has a forum. Maybe there is a better place to discuss this, assuming anyone really wants to.

*** I thought that was funny. It reminded me of a project for the Canadian Navy I worked on. The contract specified that the system had to run a minimum of 24 hours without rebooting. Here, there is a problem with the GPU and the driver has lost communication so nvidia-smi recommends a reboot of the system. If the GPUs were each assigned target acquisition that could be a real problem in a naval conflict. Fortunately, BOINC is not a mission critical app, nor is SETI.

****** If a resource is available and an app is launched and that app fails to use that resource and then repeatedly tries to find another resource, it seems this could cause a race between itself and the client, as I assume the client is also looking for open resources. If a resource is freed, say GPU-x, and the client gets x as a resource, possibly the app could also get that same x, which could cause a conflict. I am also seeing leftover tasks that the client cannot terminate:
7209	SETI@home	8/10/2019 3:34:45 PM	[error] garbage_collect(); still have active task for acked result blc32_2bit_guppi_58643_76143_HIP73005_0101.26078.409.23.46.97.vlar_0; state 5

It is just a guess/speculation that these are related but they only show up on my systems that have splitters to add additional GPUs.
233) Message boards : Questions and problems : need help debugging a problem: Linux 7.16.1 (Message 92595)
Posted 26 Aug 2019 by Profile Joseph Stateson
Post:

From what I read around, if you set environment variable
export CUDA_DEVICE_ORDER=PCI_BUS_ID
the GPU IDs will be ordered by pci bus IDs and will show the same output as in nvidia-smi.


Thanks Jord!

Tried that, first in bash and then ran /etc/init.d/boinc-client restart
Did not work so I then edited "profile" and rebooted
Had same problem but at least it was not missing when I logged in using xterm.
I then ran /etc/init.d/boinc-client restart while in bash but no change.
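
One thing worth checking, as a sketch only: whether the running client process actually has the variable in its own environment, since a variable exported in a login shell is not necessarily inherited by a service. On Linux the environment of a process can be read from /proc/<pid>/environ (NUL-separated, usually needs root for another user's process). The helper names here are my own.

```python
def parse_environ(raw):
    """Turn the NUL-separated bytes of /proc/<pid>/environ into a dict."""
    env = {}
    for entry in raw.split(b"\0"):
        if b"=" in entry:
            key, _, value = entry.partition(b"=")
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env

def process_env(pid):
    """Read the environment of a running process (Linux only)."""
    with open(f"/proc/{pid}/environ", "rb") as handle:
        return parse_environ(handle.read())

if __name__ == "__main__":
    import sys
    env = process_env(int(sys.argv[1]))  # pass the PID of the boinc client
    print(env.get("CUDA_DEVICE_ORDER", "<not set in this process>"))
```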

Get the following all the time. As you can see nvidia reports different order. The boinc manager matches the coproc_info.xml file.

TERM=xterm
SHELL=/bin/bash
CUDA_DEVICE_ORDER=PCI_BUS_ID
SHLVL=1
LOGNAME=jstateson
DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/1000/bus
XDG_RUNTIME_DIR=/run/user/1000
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
LESSOPEN=| /usr/bin/lesspipe %s
_=/usr/bin/printenv
jstateson@tb85-nvidia:~$ cd /var/lib/boinc-client/
jstateson@tb85-nvidia:/var/lib/boinc-client$ grep -i gtx coproc_info.xml
   <name>GeForce GTX 1660 Ti</name>
   <name>GeForce GTX 1070 Ti</name>
   <name>GeForce GTX 1070</name>
   <name>GeForce GTX 1070</name>
   <name>GeForce GTX 1060 3GB</name>
   <name>GeForce GTX 1060 3GB</name>
   <name>GeForce GTX 1060 3GB</name>
      <name>GeForce GTX 1660 Ti</name>
      <name>GeForce GTX 1070 Ti</name>
      <name>GeForce GTX 1070</name>
      <name>GeForce GTX 1070</name>
      <name>GeForce GTX 1060 3GB</name>
      <name>GeForce GTX 1060 3GB</name>
      <name>GeForce GTX 1060 3GB</name>
jstateson@tb85-nvidia:/var/lib/boinc-client$ nvidia-smi
Mon Aug 26 15:52:26 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.40       Driver Version: 430.40       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 107...  Off  | 00000000:01:00.0 Off |                  N/A |
|100%   41C    P8    13W / 180W |     12MiB /  8117MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1070    Off  | 00000000:02:00.0 Off |                  N/A |
|100%   46C    P8    12W / 151W |      9MiB /  8119MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 106...  Off  | 00000000:03:00.0 Off |                  N/A |
|100%   40C    P8     8W / 120W |      9MiB /  3019MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX 166...  Off  | 00000000:04:00.0  On |                  N/A |
|100%   42C    P8    16W / 120W |     17MiB /  5944MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  GeForce GTX 1070    Off  | 00000000:05:00.0  On |                  N/A |
|100%   37C    P8     9W / 151W |     18MiB /  8119MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  GeForce GTX 106...  Off  | 00000000:08:00.0 Off |                  N/A |
|100%   41C    P5     7W / 120W |      9MiB /  3019MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   6  GeForce GTX 106...  Off  | 00000000:0A:00.0 Off |                  N/A |
|100%   41C    P8     9W / 120W |      9MiB /  3019MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+


I then went to boinc-master and did a recursive grep for CUDA_DEVICE_ORDER
Nothing showed up but I did get a hit on PCI_BUS_ID but pretty sure it is not used for ranking
---------- GPU_NVIDIA.CPP
    CU_DEVICE_ATTRIBUTE_PCI_BUS_ID = 33,
        (*p_cuDeviceGetAttribute)(&cc.pci_info.bus_id, CU_DEVICE_ATTRIBUTE_PCI_BUS_ID, device);


It would really be useful for debugging purposes (hardware or software) if the GPU0...GPU6 shown by nVidia matched the D0..D6 as shown by BT or BM.
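
Until the numbering matches, a small helper can at least turn the nvidia-smi side into a lookup table. This is a sketch that parses the output of "nvidia-smi --query-gpu=index,pci.bus_id,name --format=csv,noheader"; the function name is made up, and matching the result against BOINC's D0..D6 still has to be done by hand using the bus id.

```python
def parse_gpu_csv(text):
    """Parse 'index, pci.bus_id, name' rows into (index, bus_number, name)."""
    rows = []
    for line in text.strip().splitlines():
        index, bus_id, name = [field.strip() for field in line.split(",", 2)]
        # pci.bus_id looks like 00000000:0A:00.0 (domain:bus:device.function);
        # the bus field is hexadecimal.
        bus = int(bus_id.split(":")[1], 16)
        rows.append((int(index), bus, name))
    return rows

if __name__ == "__main__":
    import subprocess
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,pci.bus_id,name",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    for index, bus, name in parse_gpu_csv(out):
        print(f"GPU{index}: bus {bus}: {name}")
```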

Back around 2007, before I retired, I took a picture of myself standing in front of a 4096 blade system that took up an entire bay. Maybe 8 huge racks of servers that were being shipped to an Okinawa army base. There was a problem with the 1394a control interface. No one pointed fingers or complained about hardware. It just had to be fixed and fixed it was, in software. I know of no way other than stopping the fan and making a note of which device stopped to identify GPUs. Will be more careful of where I put my finger in the future.
234) Message boards : Questions and problems : need help debugging a problem: Linux 7.16.1 (Message 92576)
Posted 25 Aug 2019 by Profile Joseph Stateson
Post:
I'd be prepared to place a small bet that this is a hardware problem, not related to the software version (of either BOINC or SETI) in use.


I agree 100% as this has never happened to me on a machine without GPU risers card and cables.


Risers and cables are a symptom of adding more GPUs on a motherboard than it was designed to use or the OS to manage or the drivers to handle.

I can run nvidia-smi in a loop all day with 2 or 3 video boards and the fan speeds and usage are reported just fine. When I add additional GPUs I start seeing "ERR" under fan speed at random GPUs and usage varies erratically.

We are pushing the envelope: "going where no BOINC program has gone before" At least, for the 2 week WOW mission.
235) Message boards : Questions and problems : need help debugging a problem: Linux 7.16.1 (Message 92569)
Posted 24 Aug 2019 by Profile Joseph Stateson
Post:

My dual GTX 1660 Ti machine is currently drawing about 360W from the wall, falling to a little over 300W when the CPU is idled


This system draws 670W at the wall and the power supply is either a 750 or 850 Seasonic gold. Will have to pull it out to see exactly what it is. There are two gtx1060 on a 4-in-1 splitter and possibly those are the problem. Next time it fails I will remove the splitter and go with just 4.

[EDIT] is 850 watt. I used a DeWalt inspection camera to read the info. I managed to avoid knocking any of the x1 adapters loose on the rig under the power supply.
236) Message boards : Questions and problems : need help debugging a problem: Linux 7.16.1 (Message 92564)
Posted 24 Aug 2019 by Profile Joseph Stateson
Post:
Using the following two error messages
   Device cannot be used
  Cuda device initialisation retry 1 of 6, waiting 5 secs


I cannot find any matching phrase looking recursively through 7.16.1

grep -r "Cuda device initialisation retry"

grep -r "Device cannot be used" .

I did find the "Cuda device initialisation retry" in the SETI source and spotted the following as an exit during Cuda initialization:
	  boinc_temporary_exit(180,"Cuda device initialisation failed");


Somehow this error needs to get more visibility to the user. Possibly it is buried in the event queue. All I see in the manager is a lot of tasks "waiting to run" which is NOT an error but a symptom.

I was unable to find "Device cannot be used" anywhere but if
	  boinc_temporary_exit(180,"Cuda device initialisation failed");

is reported to the client then they did their job even if not much.
237) Message boards : Questions and problems : What happened to "requested" and "granted" credits? (Message 92563)
Posted 24 Aug 2019 by Profile Joseph Stateson
Post:
WCG does as well.


Thanks, I knew I had seen it somewhere.

I have a program that graphs work unit elapsed time by GPU. I had the idea of graphing credit instead. According to Richard (another post somewhere but I don't remember where) the credit estimate is based on expected work to be done which goes into calculating the time estimate. If I could figure out how to get that original "requested credit" it might make a more accurate plot of credit runtime. There is a nice plot of actual credit runtime here but it was done manually by looking up 100 values from the project web site. I would like to implement something like this in my Boinctasks history analyzer but need something close to the actual credit since I cannot get the true credit.

Another problem: for some reason, WCG is the only project I cannot scrape for statistics.

It appears they do more security checking than any other project and my program cannot access my data even though I have auto logon enabled. There is probably a way to access it but it is not worth the trouble to debug.
238) Message boards : Questions and problems : need help debugging a problem: Linux 7.16.1 (Message 92562)
Posted 24 Aug 2019 by Profile Joseph Stateson
Post:
This is likely a hardware problem. It is solved by rebooting but I would like to know what could cause this. I now have the capability of building the (Linux) client and could look at where this occurs and possibly come up with an error message that could notify the user the problem has started.

---once every couple of days----

On a 5 GPU rig, one of the GPUs crunches for 0-1 seconds then goes on to another work unit. A queue of "waiting to run" starts building up. Because there are 4 other working GPUs, they pull from this queue so the queue grows only slowly. After about an hour or two there might be 40 items in the queue.

There are no error messages in the event queue and the work units all eventually finish and report back OK. There is just no productivity from the GPU that has the problem (assuming it is the same GPU).

There are "error" messages in the stderr file associated with the task.
https://setiathome.berkeley.edu/result.php?resultid=7986887720

Another problem (may be a feature): the GPUs are numbered 0..X where 0 is given to the "best" GPU and larger numbers to the "weaker" ones. I do not know why BOINC bothers to rank GPUs. There seems to be no need and it makes it difficult to find which GPU is causing the problem, assuming the problem is a unique GPU. Why can't BOINC use the same GPU number that nVidia uses in nvidia-smi or that ATi uses in "sensors"? Currently, I have to stop the fan spinning on a GPU, look at nvidia-smi to see which GPU has the fan stopped, make a note of the BUS-ID, look up that BUS-ID in the file coproc_info.xml and then look at the Boinc Manager to see if that "Dx" is the same "Dx" that is crunching only 0-1 seconds. This is very awkward and also dangerous, depending on the fan type. I can post a picture of my bloody finger if anyone wants to see it.

[edit] I may have looked in the wrong event queue using Boinctasks. Next time I will check the event queue more carefully for an error message.
239) Message boards : Questions and problems : What happened to "requested" and "granted" credits? (Message 92515)
Posted 17 Aug 2019 by Profile Joseph Stateson
Post:
At one time there was a report that showed credits requested & granted, and, after validation, the granted credits were filled in. Was that classic seti or one of the current projects? I looked at some project stats but didn't see any statistical column for "granted". I thought I had seen that somewhere. If that feature was available it would be nice to know at the time of upload what the requested credit was. Even if the value was wrong, it would be useful for a rough estimate of credit performance.
240) Message boards : Questions and problems : Ubuntu: how to kill a task owned by boinc? (Message 92466)
Posted 12 Aug 2019 by Profile Joseph Stateson
Post:
The task is (pardon the screen/text grab)
3376 jstateson	20	0 29696	-3808	3284 S	0.0	0.1	0:00.03 -bash
3407 boinc	30	10 78.8G	79014	36014 S	0.0 10.0	0:00.00 ../../projects/setiathome.berkeley.edu/setiathome_x41p_V0.98bl_x86_64-pc-linux-gnu_cuda90


so, the owner of task 3407 is "boinc" and I own 3376

using sudo kill -9 3407 has no effect when that task is "hung"

can I log in as "boinc" to terminate it? Is there a password?
Maybe it is hung so bad it can't receive the terminate signal.
Is there another way to terminate?

$ sudo reboot

If a sigkill is not working, it isn't a BOINC issue. Are you sure that the reaper (launchd PID 1) is able to run? It may take the O/S a while to get to it.


I think that is what happened - the O/S was unable to respond. This was a 4 core system with 10 GPUs and as the motherboard had only 6 slots I had to use splitters. I was even getting periodic messages from nVidia that it had "lost" a GPU or two when reporting fan speeds. I am back to a single GPU for each slot and no problems. It did work well for a long time with 8 but when I added the last pair of gtx1060 and a second splitter all hell broke loose. It was difficult to determine which board had the problem as they had the same name "gtx1060" and I had to stop the fan with my finger and check the fan speed report to identify which board I was testing. On rare occasions I have seen windows 10 task manager unable to terminate a task but usually I don't have to manually reboot windows when this happens as it reboots itself within a second or two after a blue screen. This was the first time kill -9 did not work, which I did not expect.
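
One possible explanation, offered as a guess: on Linux a process in uninterruptible sleep (state D, typically stuck inside a driver call) does not act on any signal, SIGKILL included, until it returns from the kernel. A sketch for checking the state of a PID by reading /proc/<pid>/stat follows; the function names are my own.

```python
def state_from_stat(stat_line):
    """Extract the one-letter state from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the LAST ')' before reading the state.
    after_comm = stat_line[stat_line.rfind(")") + 1:].split()
    return after_comm[0]

def process_state(pid):
    """Return the current state letter of a running process (Linux only)."""
    with open(f"/proc/{pid}/stat") as handle:
        return state_from_stat(handle.read())

if __name__ == "__main__":
    import sys
    state = process_state(int(sys.argv[1]))
    print(state, "(uninterruptible, kill -9 will wait)" if state == "D" else "")
```

If the hung task shows D, the only practical fixes are whatever unblocks the driver, or a reboot.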
241) Message boards : Questions and problems : Ubuntu: how to kill a task owned by boinc? (Message 92456)
Posted 11 Aug 2019 by Profile Joseph Stateson
Post:
The task is (pardon the screen/text grab)
3376 jstateson	20	0 29696	-3808	3284 S	0.0	0.1	0:00.03 -bash
3407 boinc	30	10 78.8G	79014	36014 S	0.0 10.0	0:00.00 ../../projects/setiathome.berkeley.edu/setiathome_x41p_V0.98bl_x86_64-pc-linux-gnu_cuda90


so, the owner of task 3407 is "boinc" and I own 3376

using sudo kill -9 3407 has no effect when that task is "hung"

can I log in as "boinc" to terminate it? Is there a password?
Maybe it is hung so bad it can't receive the terminate signal.
Is there another way to terminate?
242) Message boards : Questions and problems : strange error: garbage_collect ?cannot collect? (Message 92454)
Posted 11 Aug 2019 by Profile Joseph Stateson
Post:
Trying to debug the problem as it is happening once or twice a day.

it would appear that memory is not a problem.

Looking here
if (rp->got_server_ack) {
            // see if - for some reason - there's an active task
            // for this result.  don't want to create dangling ptr.
            //
            ACTIVE_TASK* atp = active_tasks.lookup_result(rp);
            if (atp) {
                msg_printf(rp->project, MSG_INTERNAL_ERROR,
                    "garbage_collect(); still have active task for acked result %s; state %d",
                    rp->name, atp->task_state()


State 5 means finished ok from what I understand. Looks like the Linux seti app does not realize it finished.

On my boinc manager, under status I see the following typical behavior
....running....uploading....ready-to-report

(1) At what point is the status set to 5? Is it after the upload? after the "ready to report"
I am guessing the error occurs as the 5 is generated just after finishing the "running" but "uploading" does not take place for some reason. So it is got the server ack but is marked as still running or a dangling "active task".

(2) what exactly does "uploading" mean?

(3) what exactly does "reporting" mean?

Could there be a timing problem in the app when looking for the ack from the server? Who handles the ack: boinc or the app?
Even if this is not a boinc problem I would like to know answers to 1,2 and 3 before going over to SETI and stirring the pot.

==============some other observations=============
kill and kill -9 do not kill the "dangling" task even under sudo. I am not an expert but kill -9 has always worked for me. I do see that "boinc" is the owner of the dangling task. Is that what is keeping me from being able to kill it? I would rather kill it than reboot. boinccmd --quit stops boinc but not that dangling task. A restart of the service fails: I see the task with command "boinc --detectgpu xx" (don't remember exactly) and the task disappears and reappears as the service keeps trying to start, but boinc never gets past that detectgpu. I end up rebooting the system and often have to power off and on as it never totally shuts down.
243) Message boards : Questions and problems : strange error: garbage_collect ?cannot collect? (Message 92453)
Posted 10 Aug 2019 by Profile Joseph Stateson
Post:
Two of my GPUs on a 10 GPU mining rig are stuck: 0% utilization with work unit showing 100% done

error messages:

7209	SETI@home	8/10/2019 3:34:45 PM	[error] garbage_collect(); still have active task for acked result blc32_2bit_guppi_58643_76143_HIP73005_0101.26078.409.23.46.97.vlar_0; state 5	
10233	SETI@home	8/10/2019 4:20:49 PM	[error] garbage_collect(); still have active task for acked result blc33_2bit_guppi_58643_86349_HIP33332_0131.3725.0.23.46.188.vlar_0; state 5	
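
For anyone collecting these from a saved event log, a small sketch that pulls out the result name and state from each garbage_collect line. It assumes the message wording shown above; the function name is made up.

```python
import re

GC_RE = re.compile(
    r"garbage_collect\(\); still have active task for acked result "
    r"(\S+); state (\d+)"
)

def stuck_results(log_text):
    """Return (result_name, state) for every garbage_collect error found."""
    return [(name, int(state)) for name, state in GC_RE.findall(log_text)]

if __name__ == "__main__":
    import sys
    for name, state in stuck_results(sys.stdin.read()):
        print(f"{name}\tstate {state}")
```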


what's happening?

googling I found a previous report dated 2010 over at SETI.

[EDIT] Cannot even kill boinc. tried sudo kill -9 8109 (boinc) and just kill 8109 and task 8109 never disappears from top or htop. Argument shows boinc with command line --detectgpu so it (7.16.1) seems stuck trying to detect the gpu and not bothering to accept the kill signal.

This was after using the /etc/init.d/boinc-client stop
to try to stop

going to reboot

[EDIT 2] Suspended and NNT and rebooted. The two "stuck" tasks were assigned GPUs 0 and 1 and finished in under minutes. resumed rest of tasks look back to normal.

maybe I ran out of memory with only 8gb and 10 gpus.
244) Message boards : BOINC client : Seems 7.16.1 has been released but not shown on download page (Message 92436)
Posted 9 Aug 2019 by Profile Joseph Stateson
Post:
Effectively, Gianfranco's PPA is "the" bleeding-edge test repository for BOINC under Linux. By adding that PPA, you are giving prior consent to receiving unconfirmed code.

Having said that, Gianfranco does build from official release branch code. The v7.16.1 release code branch was forked (without announcement) 10 days ago, but as yet no formal testing processes have been initiated. I installed the same version myself this morning, and I've initiated some conversations with other members of the development team. That's as much as I'm prepared to say until I hear back from the other developers.


Just wanted to get something other than the 7.9 that comes from 18.04 default install.

Noticed the following:

8/9/2019 8:21:09 AM	GUI RPC request from non-allowed address 2.0.206.175	

That IP address is on your side of the pond. This really concerns me but I do not know how to follow up on this, and it is scary that every install of Linux boinc shows an attempted probe. Maybe it is just a way to count how many deployments have been made, but I have seen probes from more than one location and always just after a fresh install.
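
A quick sketch for listing these rejected sources from a saved event log and flagging whether each one is a private (LAN) address or not. It assumes the message wording shown above, and the function name is my own.

```python
import ipaddress
import re

RPC_RE = re.compile(r"GUI RPC request from non-allowed address (\S+)")

def rejected_rpc_sources(log_text):
    """Return (address, is_private) for each rejected GUI RPC source."""
    sources = []
    for addr in RPC_RE.findall(log_text):
        ip = ipaddress.ip_address(addr)
        sources.append((addr, ip.is_private))
    return sources

if __name__ == "__main__":
    import sys
    for addr, private in rejected_rpc_sources(sys.stdin.read()):
        print(addr, "LAN" if private else "outside the LAN")
```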

[EDIT]
Some thoughts on dependencies and how things can quickly change.

I keep a list of what I did so as to quickly install ubuntu followed by either AMD or NVIDIA (gave up on Intel).

Things change:
Wanted to convert AMDGPU-PRO system to NVidia (ubuntu 18.04) but amdgpu--uninstall does not exist even though recent posts on askubuntu and stackoverflow still show that as how to uninstall.

Anyway, I simply put in nvidia and nvidia took over which was nice.
However my driver install for 390 that worked last week does not work anymore and googling around found driver-430

Anyway, got it working
245) Message boards : BOINC client : Seems 7.16.1 has been released but not shown on download page (Message 92433)
Posted 9 Aug 2019 by Profile Joseph Stateson
Post:
sudo add-apt-repository ppa:costamagnagianfranco/boinc

followed by apt-get install boinc-client picked up 7.16.1

I checked at the boinc download page and only 7.14 is shown.
There is no warning about developmental, must be the real McCoy

1			8/9/2019 7:52:43 AM	Starting BOINC client version 7.16.1 for x86_64-pc-linux-gnu	
2			8/9/2019 7:52:43 AM	log flags: file_xfer, sched_ops, task	
3			8/9/2019 7:52:43 AM	Libraries: libcurl/7.58.0 OpenSSL/1.1.1 zlib/1.2.11 libidn2/2.0.4 libpsl/0.19.1 (+libidn2/2.0.4) nghttp2/1.30.0 librtmp/2.3	
4			8/9/2019 7:52:43 AM	Data directory: /var/lib/boinc-client	
5			8/9/2019 7:52:44 AM	CUDA: NVIDIA GPU 0: GeForce GTX 1070 (driver version 430.40, CUDA version 10.1, compute capability 6.1, 4096MB, 3972MB available, 6852 GFLOPS peak)	
6			8/9/2019 7:52:44 AM	OpenCL: NVIDIA GPU 0: GeForce GTX 1070 (driver version 430.40, device version OpenCL 1.2 CUDA, 8118MB, 3972MB available, 6852 GFLOPS peak)	


Since GPUGrid has serious problems I am switching those GPUs to the new fast seti app that uses Linux. The WOW event starts in a week. Maybe not go back to GPUGrid ever.
246) Message boards : Questions and problems : Trouble running 2 RX 570 on Einstein@home (Message 92419)
Posted 7 Aug 2019 by Profile Joseph Stateson
Post:
I recall that AMD can enable crossfire mode by default, unlike nvidia. If enabled, try disabling it.
247) Message boards : GPUs : Seems SLI is causing GPU problems (Message 92415)
Posted 7 Aug 2019 by Profile Joseph Stateson
Post:
I put the SLI connector back on a pair of gtx1070 and have about 50 error'ed out GPUGrid tasks since. I assume the problem is SLI related but other people are reporting bad workunits at the same time over there.

Is this a known problem? Maybe it is particular to the CUDA implementation by GPUGrid. Anyway it is coming off.

[EDIT] Well what do you know! There is a suggestion they failed to update some security certificate. Murphy's law: things left to themselves tend to get worse.

Leaving my SLI connector off just to be sure.
248) Message boards : BOINC client : Cannot find release 7.15.0 (Message 92409)
Posted 6 Aug 2019 by Profile Joseph Stateson
Post:
Spoof worked. SETI thinks I have 99 RX560s

Would there be a problem if I used 6.66.666 for the boinc version?
Going to think about this for a while before I actually do any processing. I don't want the GRC people to take away my grid coins thinking I am cheating.
Actually, this whole thing has become a joke. It should NOT be possible to do what I just did in the first place. I suspect that GPU users group over at SETI has opened a can of worms trying to get an edge on other users.

[EDIT] Have thought about this. Not going to pursue the spoofing. It was just a challenge to do the spoofing. A real challenge would be to get the client working under Visual Studio community but I don't think that would be useful to anyone either.

249) Message boards : BOINC client : Cannot find release 7.15.0 (Message 92393)
Posted 6 Aug 2019 by Profile Joseph Stateson
Post:
I am going to try and duplicate the "7.15.0 spoofed client" that was done by the GPU users group over at SETI. When (if) I get it working I will post here (unless there is an objection) on how to make the changes.
From what I see so far the following is an easy change

in file client_state.h use a bigger number say 10000
#define WF_MAX_RUNNABLE_JOBS    1000
    // don't fetch work from a project if it has this many runnable jobs.


more complicated is to ensure that when the XML sched_request is written out to force a big number to spoof SETI

<host_info>
    <n_usable_coprocs>2</n_usable_coprocs>
<coproc_cuda>
   <count>2</count>


TBar over at SETI suggested limiting upload when debugging to prevent tons of erroring out work units

If anyone with "inside" knowledge of the code changes, please PM me as I hate debugging with Linux (I retired when my company switched to Linux and CORBA). Far easier to set a breakpoint and see WTF is going on than to throw in console printouts

BTW, I downloaded two zips from GitHub at different times. The one I put in my windows VS2017 Dev system had version.h of 7.16.1 but I must have gotten the Linux download from a different spot at GitHub as it had 7.15.0 although the filename included the phrase 7.16

Starting with VS2015 code is binary compatible with future code (so I have read) unlike all earlier versions of VS. Additionally after VS2013 there were huge changes. I spent a long time trying to get boinc to compile with VS2017 community before giving up. The problem was the (older) 3rd party libraries in addition to where they had to be found and downloaded from. I also found I had to make code changes in the headers as I was unable to use command line "defines" to fix problems. That would be a no-no for any backwards compatibility. I saw the same problem with the GRC "wallet". Someone tried doing a windows build but there was so much work required to maintain Linux compatibility it was a hopeless task. Far easier to stay in Linux and do a cross build for windows binary. IMHO of course.


my Linux "breadboard" just for this test.
250) Message boards : BOINC client : Cannot find release 7.15.0 (Message 92381)
Posted 6 Aug 2019 by Profile Joseph Stateson
Post:
OK, got Boinc Penguin to work!!

Proof: I modded version.h and changed number to "the beast" 7.16.666

Saved my history and copied what should be the necessary commands to build boinc under 18.04
downloaded the boinc_client zip from github for 7.16.1  (but version.h had 7.15.0)

   13  sudo apt-get update
   14  sudo apt-get upgrade

   11  sudo apt-get install build-essential
   12  sudo apt-get install  checkinstall

   21  sudo apt-get install m4
   22  sudo apt-get install gm4

   24  sudo apt-get install pkg-config

   30  sudo apt-get install autoconf

(use 42 instead of libtoolize)
   42  sudo apt-get install libtool m4 automake

   48  sudo apt-get install openssl

not all of 59 worked but 62 & 66 seemed to do it
   59  sudo apt-get install clibcurl4-openssl-dev pkg-config libssl-dev libsslcommon2-dev

   62  sudo apt install libssl-dev
   66  sudo apt-get install libcurl4-gnutls-dev
   75  sudo apt-get install libz-dev
   80  sudo apt-get install libnotify-dev

I scattered the following between the above lines and then searched for the missing packages, but it should work here with no errors
   37  ./_autosetup 

configure asks for that unwanted wxWidgets so I needed to disable the manager
   69  ./configure --disable-server --disable-manager

   83  make

note:  i edited version.h and put in the "beast" 666

   94  ./client/boinc --version


Look at bottom left of picture for printout of version #


got it up and running also!!

z400-linux

1			8/5/2019 10:32:25 PM	Starting BOINC client version 7.16.666 for x86_64-pc-linux-gnu	
2			8/5/2019 10:32:25 PM	This a development version of BOINC and may not function properly	
3			8/5/2019 10:32:25 PM	log flags: file_xfer, sched_ops, task	
4			8/5/2019 10:32:25 PM	Libraries: libcurl/7.58.0 GnuTLS/3.5.18 zlib/1.2.11 libidn2/2.0.4 libpsl/0.19.1 (+libidn2/2.0.4) nghttp2/1.30.0 librtmp/2.3	
5			8/5/2019 10:32:25 PM	Data directory: /var/lib/boinc-client	
6			8/5/2019 10:32:25 PM	No usable GPUs found	
7			8/5/2019 10:32:25 PM	Version change (7.15.0 -> 7.16.666)	
8			8/5/2019 10:32:25 PM	[libc detection] gathered: 2.27, Ubuntu GLIBC 2.27-3ubuntu1	
9			8/5/2019 10:32:25 PM	Host name: jyslinux1	
10			8/5/2019 10:32:25 PM	Processor: 12 GenuineIntel Intel(R) Xeon(R) CPU X5650 @ 2.67GHz [Family 6 Model 44 Stepping 2]	
11			8/5/2019 10:32:25 PM	Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmper	
12			8/5/2019 10:32:25 PM	OS: Linux Ubuntu: Ubuntu 18.04.3 LTS [5.0.0-23-generic|libc 2.27 (Ubuntu GLIBC 2.27-3ubuntu1)]	
251) Message boards : Questions and problems : resource "share' not working as expected when set to zero (Message 92372)
Posted 5 Aug 2019 by Profile Joseph Stateson
Post:
Not every project updates their server code in sync with BOINC releases - many of them leave it untouched for months or years. For a long time, there were reports that specific projects wouldn't allow the special 0 value to be entered via their websites.


YoYo for some reason doesn't allow 0, so it's never a backup project for me. Admins have confirmed 0 is not allowed. That's the only project I've come across that doesn't allow 0.


That is a shame. Just another excuse for someone to mod the BOINC code to make up for a deficiency in the project code.
252) Message boards : BOINC client : Cannot find release 7.15.0 (Message 92369)
Posted 5 Aug 2019 by Profile Joseph Stateson
Post:
The numbering policy is:

Even numbers are used for releases
Odd numbers are used for development work, not yet ready for release.

If you see v7.15.0 out in the wild, it's a private build, made for testing and possibly with modifications not intended for general use.

The next release branch is 7.16.1, and that can be selected under 'branches' in GitHub: I've made a private Windows build for testing, but I don't know of anybody else admitting to doing that.
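
The even/odd rule above can be sketched in a few lines of Python. This is only an illustration, assuming the rule applies to the second (minor) number of a major.minor.release string, which is what the examples in this thread suggest (7.14.2 and 7.16.1 are release versions, 7.15.0 is a development build). The function name is made up for the example.

```python
def is_release_version(version: str) -> bool:
    """Return True when the minor number is even (release branch).

    Assumption: the even/odd rule refers to the second field of
    'major.minor.release', e.g. 7.14.2 -> 14 -> even -> release.
    """
    parts = version.strip().split(".")
    if len(parts) < 2:
        raise ValueError(f"expected major.minor[.release], got {version!r}")
    minor = int(parts[1])
    return minor % 2 == 0


# Versions mentioned in this thread.
for v in ("7.14.2", "7.15.0", "7.16.1"):
    kind = "release" if is_release_version(v) else "development"
    print(v, kind)
```

Note that a private build can carry any number its builder types into version.h, so the rule says nothing about where a binary actually came from.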


Yes, I have been alerted via PM about this version. It seems the project, SETI, has some deficiency or feature (not sure which) that required fixing with a different version of BOINC. I myself had a problem where I was getting way too many work units from Einstein when SETI went off-line, but that was fixed when Willie found a bug in !BAM that prevented resource% from being set to 0. I will find out next Tuesday. Hopefully with resource=0 I will get exactly the number of Einstein work units I need to remain busy until SETI comes back on-line. I think we discussed this a few months ago but I was unaware that !BAM was affecting the value.

At Milkyway, I had a problem where my S9100 GPU was not being recognized or used efficiently. With the source code available (but very difficult for me to build), I used a binary editor to change one of the recognized boards' IDs to "S9100" (Hawaii actually) and I changed the number of ALUs and the FP64 ratio to match that of the Hawaii GPU. This allowed the board to be recognized and efficiently utilized by the Milkyway app. This app, like Berkeley's BOINC, was created under the GNU Lesser General Public License. I am not sure if this "binary hack" is considered a "modification" under the terms of the LGPL, but I did post how to make the change here and it was used by several other enthusiasts who had S9150 boards.

From what I see, version 7.15.0 is being used by a select group of SETI users and is not available to general users. I do not have a problem with that myself. However, I think it is quite hypocritical for staff at Berkeley (not going to mention any names but you can google) to complain about Fred not releasing his BoincTasks source code to the public and then not do anything about their own code when it suits them just fine.
253) Message boards : BOINC client : Cannot find release 7.15.0 (Message 92367)
Posted 4 Aug 2019 by Profile Joseph Stateson
Post:
Was looking for the "client release" here
https://github.com/BOINC/boinc/releases

Found 7.14.2 under tags but not 7.15 anywhere.

Was also wondering if building might require older versions of tools such as mingw, openssh, libcurl, etc., as I once tried to build an older Linux app and found that it would not build with the latest versions of 3rd party apps, libraries, and databases, and it was difficult to find old versions.
254) Message boards : Questions and problems : resource "share' not working as expected when set to zero (Message 92363)
Posted 4 Aug 2019 by Profile Joseph Stateson
Post:

I noticed that setting resource to "0" over at !BAM it had actually changed to "-1" when I refreshed their webpage


This was a bug that Willie just fixed!!!!

https://www.boincstats.com/forum/7/12283,1

I think the problem was also compounded when I changed the venue to "home" for testing purposes and then synced with the account manager before doing an update. The update was needed to get the venue into the client, else the account manager would not know WTF was going on.

Willie's site has been upgraded and looks really nice. I have used !BAM since I joined BOINC and possibly this bug has been there all this time???
255) Message boards : Questions and problems : resource "share' not working as expected when set to zero (Message 92360)
Posted 3 Aug 2019 by Profile Joseph Stateson
Post:

I noticed that setting resource to "0" over at !BAM it had actually changed to "-1" when I refreshed their webpage


OK, I disconnected and then reconnected to !BAM and now have resources = 0 showing up on the BOINC client (my PC) for Einstein and 100% for SETI, so will see what happens when SETI goes offline next Tuesday.

Not sure why I had to do this to get the 0 to show up. Before I disconnected I was showing 100 on the PC after setting resource to 0 both at the project web site (Einstein) and at !BAM even after doing an update and a sync. But a disconnect and a reconnect seemed to fix it.

Looks like it might work. Will find out next time SETI goes off line.
256) Message boards : Questions and problems : APP_CONFIG: Is there a wiki for various projects "app_config" (Message 92357)
Posted 3 Aug 2019 by Profile Joseph Stateson
Post:
Googling around, I have found in the past a number of web sites where various "app_config" files are listed for various projects. I would be interested in helping maintain a wiki for something like that. This would involve coordinating with various project principals to find what is currently being used, such as command line arguments to their client apps. Probably not something a single person could do, although I am a retired programmer with not much else to do.

I noticed there is a SETI users group who maintain a "secret" BOINC client.

Maybe there is already a group that is doing this?

If so, I would like to hear from them.
257) Message boards : Questions and problems : resource "share' not working as expected when set to zero (Message 92356)
Posted 3 Aug 2019 by Profile Joseph Stateson
Post:
Hate to bring this up but I am having to abort tasks that I know will not finish before the deadline.

I asked for help with this over at Einstein
https://einsteinathome.org/content/looking-way-limit-number-work-units

Basically setting resource to 1 did not work as expected but I will try again next time SETI goes down. Possibly the change to "1" did not propagate (sync?) in time to catch on.

I noticed that after setting resource to "0" over at !BAM it had actually changed to "-1" when I refreshed their webpage. That caused the project to show up with 100%. I have since set it to 1 and verified all are synced, and hopefully I will not have to abort the, usually, 75 or so Einstein tasks that will never finish after SETI goes back online.

I read where a number of SETI users have a special (even secret!) BOINC client mod that makes their queue large enough that it will not "run dry". I have no interest in using a mod unless I can do the mod myself and would rather fall back on another project.
258) Message boards : Questions and problems : "Use x% CPU": Cores get too hot in bursts, fan noise kick in (Message 92355)
Posted 3 Aug 2019 by Profile Joseph Stateson
Post:
I failed to mention that even a slight change in CPU speed multipliers can make a huge difference in temperature and still allow hyperthreading. I only know how to do this on Windows, using the sleep and power management tool, but I am able to keep 14 threads busy with WCG tasks and another 2 threads on GPUGRID by setting processor power maximum to 96%. This runs my I9 at 3100 MHz, a multiplier of 31. The maximum multiplier is 43, which corresponds to a boost speed of 4000 (have never been there). The CPU is rated by CPU-Z at 3300. I no longer have high temps and run the Dell fans at auto instead of manual.

I don't know of a Linux tool to do this, so you may have to set the multiplier in BIOS setup.

For me, setting the speed down made a huge difference and I keep all threading enabled.
259) Message boards : Questions and problems : "Use x% CPU": Cores get too hot in bursts, fan noise kick in (Message 92319)
Posted 30 Jul 2019 by Profile Joseph Stateson
Post:
Something is not right. Temps should not shoot up that high so fast. Check for dust. Take the case off and go over all fans, CPU cooler fins (or radiator if it has one), the power supply, and any filters (behind the front panel? or under the case?). Dust bunnies can pack into the corners of the power supply over time. You have a professional workstation and it probably has sound suppression padding on the inside surfaces of the case, which contributes to heat retention. I have a Dell Area51 with an I9 and I used their thermal control app to set the fans to almost 100% and the noise was deafening. I had to blow dust out every two months or so. If you do not use all of your 12 threads, disable hyperthreading. That allowed me to run the fans at a slower speed. Setting fan to 50% may still allow one or two cores to hyperthread, so it is best to disable it entirely. I looked on eBay at the Precision 3630 and it looks like it has two side cutouts for air flow. I suspect there is no fan. You might add a pair of low-noise fans to pull cool air in, but I suspect a good cleaning and disabling hyperthreading, plus running TThrottle like Ageless mentioned, is all you need.
260) Message boards : GPUs : Client wont start Getting Stuck at OpenCL (Message 92264)
Posted 22 Jul 2019 by Profile Joseph Stateson
Post:
What I see looks OK. I have a slightly newer driver (2906.9) but it should not make a difference. Something else is happening. Bring up the Event Viewer and look for anything suspicious in "Applications" and also in "System". Was this working before? If you installed as a service, that could be a problem.

1			7/22/2019 10:45:10 AM	Starting BOINC client version 7.14.2 for windows_x86_64	
2			7/22/2019 10:45:10 AM	log flags: file_xfer, sched_ops, task	
3			7/22/2019 10:45:10 AM	Libraries: libcurl/7.47.1 OpenSSL/1.0.2g zlib/1.2.8	
4			7/22/2019 10:45:10 AM	Data directory: C:\ProgramData\BOINC	
5			7/22/2019 10:45:10 AM	Running under account JStateson	
6			7/22/2019 10:45:11 AM	OpenCL: AMD/ATI GPU 0: Radeon RX 580 Series (driver version 2906.9, device version OpenCL 2.0 AMD-APP (2906.9), 8192MB, 8192MB available, 6474 GFLOPS peak)	
7			7/22/2019 10:45:11 AM	Host name: StatesonFamily	
8			7/22/2019 10:45:11 AM	Processor: 12 GenuineIntel Intel(R) Xeon(R) CPU X5690 @ 3.47GHz [Family 6 Model 44 Stepping 2]	
9			7/22/2019 10:45:11 AM	Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 popcnt aes syscall nx lm vmx smx tm2 dca pbe	
10			7/22/2019 10:45:11 AM	OS: Microsoft Windows 10: Professional x64 Edition, (10.00.18362.00)	
11			7/22/2019 10:45:11 AM	Memory: 11.99 GB physical, 23.99 GB virtual	
261) Message boards : GPUs : Warning: Microsoft's OpenSSH has problems with GPU just like remote desktop (Message 92263)
Posted 22 Jul 2019 by Profile Joseph Stateson
Post:
I tried using Windows 10 OpenSSH server on a headless Windows system and within seconds of the installation I lost 500+ Milkyway tasks before I could suspend. I then stopped the daemon "sshd" in services but that had no effect and I lost another 120 before I could suspend.

There is no uninstall for OpenSSH and I had to go back to a restore point that was before "boinc". I did not see any other way to delete it. I had no idea that OpenSSH on windows had the same problem as Microsoft's RDP implementation. I use Putty all the time on Linux. The Microsoft Windows implementation also works with Putty but has problems with OpenCL on graphics boards.

Maybe it would have worked if I put in OpenSSH before installing BOINC, but I am not interested in testing that out. If someone has it working I would like to know how, as I have run out of free licenses for VNC as well as Splashtop and don't like TeamViewer.
262) Message boards : Questions and problems : Possible to move "leftover" work units into a project directory for processing? (Message 92242)
Posted 20 Jul 2019 by Profile Joseph Stateson
Post:
I have possibly 100 work units at (for example) "projects/seti.old" and was wondering if I can simply move them into "projects/seti", restart the BOINC client, and allow them to finish. I vaguely remember trying something like this a year ago and I lost all the work units, not just the few I tried to copy, but I think the problem then was an OS change, which didn't happen here.
263) Message boards : Questions and problems : gpu tasks postponing for 30 seconds, not sure what is going on (Message 92234)
Posted 19 Jul 2019 by Profile Joseph Stateson
Post:
I posted over at SETI as there is a recent thread about this problem. It just started on a new 18.04 system I put together yesterday. I googled around and found several of the "postponed for 30 seconds" problems, but no solution seems to work and I probably need to enable some log flags to get more info about what is happening.

Ati drivers were installed from amdgpu-pro-19.10-785425-ubuntu-18.04.tar
using
sudo ./amdgpu-install --opencl=legacy

WCG tasks run fine but gpu tasks have problems:
all Milkyway tasks error out (no 30 second stuff)
all setiathome run 2 seconds and generate that 30 second error message
[edit] - all seti tasks finally error'ed out, just took longer for them to think about it.
I would like to debug this but don't know what debug flags to set.
I have issued the command
boinccmd --set_gpu_mode always

where in cc_config.xml can I verify that the above worked?

This is the first Linux system I have built that uses those S9000 "pro" AMD cards.
I noticed that "amdgpu-pro-install" is just a link to amdgpu-install, so there is no longer any specific "pro" driver from AMD for Linux, unlike Windows.
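
On the question of which debug flags to set: a minimal sketch, assuming the log flag names task_debug and coproc_debug from the BOINC client configuration documentation (check them against the documentation for your client version). It only builds and prints a log_flags fragment; it does not touch any existing cc_config.xml. The helper name is invented for the example, and ET.indent needs Python 3.9 or later.

```python
import xml.etree.ElementTree as ET


def build_cc_config(flags):
    """Build a cc_config document that switches the given log flags on."""
    root = ET.Element("cc_config")
    log_flags = ET.SubElement(root, "log_flags")
    for name in flags:
        # Each flag is an element whose text is 1 (enabled) or 0 (disabled).
        ET.SubElement(log_flags, name).text = "1"
    ET.indent(root)  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


# Assumed flag names: task_debug (task start/stop details) and
# coproc_debug (GPU detection and assignment details).
xml_text = build_cc_config(["task_debug", "coproc_debug"])
print(xml_text)
```

The printed fragment would be merged by hand into the existing cc_config.xml in the data directory, after which the client has to re-read its configuration (for example with boinccmd --read_cc_config) before the extra messages show up in the event log.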

1			7/19/2019 12:00:16 PM	Starting BOINC client version 7.14.2 for x86_64-pc-linux-gnu	
2			7/19/2019 12:00:16 PM	log flags: file_xfer, sched_ops, task	
3			7/19/2019 12:00:16 PM	Libraries: libcurl/7.58.0 OpenSSL/1.1.1 zlib/1.2.11 libidn2/2.0.4 libpsl/0.19.1 (+libidn2/2.0.4) nghttp2/1.30.0 librtmp/2.3	
4			7/19/2019 12:00:16 PM	Data directory: /var/lib/boinc-client	
5			7/19/2019 12:00:17 PM	OpenCL: AMD/ATI GPU 0: ATI FirePro V (FireGL V) Graphics Adapter (driver version 2841.4, device version OpenCL 1.2 AMD-APP (2841.4), 3072MB, 3072MB available, 1613 GFLOPS peak)	
6			7/19/2019 12:00:17 PM	OpenCL: AMD/ATI GPU 1: ATI FirePro V (FireGL V) Graphics Adapter (driver version 2841.4, device version OpenCL 1.2 AMD-APP (2841.4), 3072MB, 3072MB available, 1613 GFLOPS peak)	
7			7/19/2019 12:00:17 PM	OpenCL: AMD/ATI GPU 2: ATI FirePro V (FireGL V) Graphics Adapter (driver version 2841.4, device version OpenCL 1.2 AMD-APP (2841.4), 3072MB, 3072MB available, 1613 GFLOPS peak)	
8			7/19/2019 12:00:17 PM	[libc detection] gathered: 2.27, Ubuntu GLIBC 2.27-3ubuntu1	
9			7/19/2019 12:00:17 PM	Host name: jysdualxeon	
10			7/19/2019 12:00:17 PM	Processor: 24 GenuineIntel Intel(R) Xeon(R) CPU X5675 @ 3.07GHz [Family 6 Model 44 Stepping 2]	
11			7/19/2019 12:00:17 PM	Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmper	
12			7/19/2019 12:00:17 PM	OS: Linux Ubuntu: Ubuntu 18.04.2 LTS [4.18.0-25-generic|libc 2.27 (Ubuntu GLIBC 2.27-3ubuntu1)]	
13			7/19/2019 12:00:17 PM	Memory: 11.72 GB physical, 2.00 GB virtual	
14			7/19/2019 12:00:17 PM	Disk: 109.53 GB total, 95.54 GB free	
15			7/19/2019 12:00:17 PM	Local time is UTC -5 hours	
16			7/19/2019 12:00:17 PM	Config: GUI RPC allowed from any host	
17			7/19/2019 12:00:17 PM	Config: GUI RPCs allowed from:	
18			7/19/2019 12:00:17 PM	    jysarea51	
19			7/19/2019 12:00:17 PM	Config: use all coprocessors	
20	Milkyway@Home	7/19/2019 12:00:17 PM	URL http://milkyway.cs.rpi.edu/milkyway/; Computer ID 810090; resource share 100	
21	SETI@home	7/19/2019 12:00:17 PM	URL http://setiathome.berkeley.edu/; Computer ID 8772790; resource share 100	


and 100's of the following but no error'ed tasks except for milkyway

70	SETI@home	7/19/2019 12:01:51 PM	Task 17jl19ab.1022.17245.15.42.242.vlar_1 postponed for 30 seconds: 	
71	SETI@home	7/19/2019 12:01:52 PM	Task blc45_2bit_guppi_58642_03722_HIP55210_0016.25435.818.22.45.130.vlar_1 postponed for 30 seconds: 	
72	SETI@home	7/19/2019 12:01:53 PM	Task blc42_2bit_guppi_58642_03722_HIP55210_0016.26020.409.22.45.219.vlar_1 postponed for 30 seconds: 	
73	SETI@home	7/19/2019 12:01:54 PM	Task blc42_2bit_guppi_58642_03722_HIP55210_0016.26020.409.22.45.216.vlar_1 postponed for 30 seconds: 	
74	SETI@home	7/19/2019 12:01:55 PM	Task blc42_2bit_guppi_58642_04042_HIP55426_0017.26054.409.22.45.119.vlar_1 postponed for 30 seconds: 	
75	SETI@home	7/19/2019 12:01:56 PM	Task blc42_2bit_guppi_58642_04362_HIP55210_0018.26043.409.22.45.161.vlar_1 postponed for 30 seconds: 	
76	SETI@home	7/19/2019 12:01:57 PM	Task 17jl19ab.1022.17245.15.42.252.vlar_0 postponed for 30 seconds: 	
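
With hundreds of these messages, a short script can summarize them. This is a sketch only, assuming the event log has been saved as text with the same column order as above (message number, project, timestamp, message); the function name is made up for the example.

```python
import re
from collections import Counter

# message number, project name, then the "postponed" text from the client.
POSTPONED = re.compile(
    r"^\d+\s+(\S+)\s+.*Task (\S+) postponed for (\d+) seconds"
)


def summarize_postponed(lines):
    """Return (postponements per project, set of distinct task names)."""
    per_project = Counter()
    tasks = set()
    for line in lines:
        match = POSTPONED.search(line)
        if match:
            project, task, _seconds = match.groups()
            per_project[project] += 1
            tasks.add(task)
    return per_project, tasks


sample = [
    "20\tMilkyway@Home\t7/19/2019 12:00:17 PM\tURL http://milkyway.cs.rpi.edu/milkyway/; Computer ID 810090; resource share 100",
    "70\tSETI@home\t7/19/2019 12:01:51 PM\tTask 17jl19ab.1022.17245.15.42.242.vlar_1 postponed for 30 seconds: ",
    "76\tSETI@home\t7/19/2019 12:01:57 PM\tTask 17jl19ab.1022.17245.15.42.252.vlar_0 postponed for 30 seconds: ",
]
per_project, tasks = summarize_postponed(sample)
print(dict(per_project))
print(sorted(tasks))
```

If the number of distinct tasks is close to the number of messages, every task is being postponed once rather than a few tasks looping.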


I asked over at the AMD community if the S9000 is supported in 18.04, as it is not clear from the release info. In any event, I will be returning the boards to Windows once I repair the motherboard damage caused by 18 AWG wiring on an adapter.

I have ordered 16 AWG adapter wiring, but I need to solder the wires as the pins on the mombo are no longer any good, and I still need to verify the mombo (HP Z400) will work.

264) Message boards : Questions and problems : More apparent hack attempts on new installs of client (Message 92232)
Posted 19 Jul 2019 by Profile Joseph Stateson
Post:
Just been reading through the previous thread.
Out of interest did you get a better router in the end? I have an Asus DSL-AC56U and get at least a couple of requests a week blocked.


No, same router, and same very poor syslog support. I cannot filter messages to remove "info" messages at the modem and would have to buy a real syslog monitor program in addition to a better modem.

I did buy an edge router and put all of the Chinese made cameras on its subnet. I suspect they could still "phone home", but any hacking in would have to go thru the blue-iris system that is locked down on that subnet. If they somehow "phone home", all anyone would see are the feral hogs, coyotes, foxes, hawks, skunks and raccoons in the area around my home. If they "phone home" and provide a tunnel back into my subnet, they will be stuck at the blue-iris system and not have access to anything else.

I just checked all 3 of my Linux boxes and the only one showing any GUI RPC attempts is the new one. In below picture rx560 has been running for a month. tb85-nvidia for 3 days now and the one on the left with the non-allowed requests only 12 or so hours. I did reboot so those messages occurred in the last hour.

One thing that puzzles me is why the Windows installs generate a key for gui_rpc_auth.cfg but the same is not done on Linux.
I do not run BOINC Manager on these systems and use BoincTasks exclusively from my Windows desktop.
BoincTasks connects as soon as I add its IP address to the remote_hosts.cfg in /etc/boinc-client.

[EDIT] I just brought up syslog on that Linux system using the vi editor (not grep) and even though I rebooted, all the old entries are there. IANE on Linux and assumed they were deleted after every reboot, but no, there must be some aging mechanism before they are deleted. In any event, the other Ubuntu 18.04 systems show no unauthorized access, and I know for a fact that the tb85-nvidia showed non-allowed attempts from 2.x shortly after I set it up.

265) Message boards : Questions and problems : More apparent hack attempts on new installs of client (Message 92225)
Posted 19 Jul 2019 by Profile Joseph Stateson
Post:
Was first noticed and discussed here

I recently converted two Windows systems to Linux and immediately after
sudo apt-get install boinc-client

I went to /var/log/syslog and looked for attempts to RPC in, and sure enough both systems showed GUI RPC requests that were denied. The attempts, like the ones I posted over a year ago, came from 2.x, which is controlled by RIPE. I am in the USA and using whois I see that 2.0.215.127 is in France. Same as before.

Here is part of the log from most recent new install
syslog:Jul 19 02:15:55 jysdualxeon boinc[7033]: 19-Jul-2019 02:15:55 [---] GUI RPC request from non-allowed address 2.0.214.214
syslog:Jul 19 02:21:00 jysdualxeon boinc[20493]: 19-Jul-2019 02:21:00 [---] Config: GUI RPCs allowed from:
syslog:Jul 19 02:21:56 jysdualxeon boinc[20493]: 19-Jul-2019 02:21:56 [---] GUI RPC request from non-allowed address 2.0.215.127
syslog:Jul 19 02:27:31 jysdualxeon kernel: [    0.304000] NetLabel:  unlabeled traffic allowed by default
syslog:Jul 19 02:27:36 jysdualxeon /usr/lib/gdm3/gdm-x-session[1096]: (==) Max clients allowed: 256, resource mask: 0x1fffff
syslog:Jul 19 02:27:39 jysdualxeon boinc[1211]: 19-Jul-2019 02:27:39 [---] Config: GUI RPCs allowed from:
syslog:Jul 19 02:27:58 jysdualxeon boinc[1211]: 19-Jul-2019 02:27:58 [---] GUI RPC request from non-allowed address 2.0.216.44


I failed to copy the log from the new install of ubuntu I did 2 days ago. The first reboot erased the log but I remember the ip addresses also started with 2 but I failed to make note of the exact number.

I have never seen any attempts to RPC in after any initial install and as the log is erased one would never know unless the log was examined immediately after the install.

I got Ubuntu 18.04 from ubuntu.com
I have no idea where "sudo apt-get install boinc-client" came from. If it came from a French repository then I suspect something nefarious.

Maybe there is a valid explanation for this.

[EDIT] Getting more from 2.x
syslog:Jul 19 02:27:58 jysdualxeon boinc[1211]: 19-Jul-2019 02:27:58 [---] GUI RPC request from non-allowed address 2.0.216.44
syslog:Jul 19 02:36:11 jysdualxeon boinc[1211]: dir_open: Could not open directory '/dev/input/mice' from '/var/lib/boinc-client'.
syslog:Jul 19 02:38:43 jysdualxeon boinc[1211]: 19-Jul-2019 02:38:43 [---] GUI RPC request from non-allowed address 2.0.218.45
syslog:Jul 19 02:38:43 jysdualxeon boinc[1211]: 19-Jul-2019 02:38:43 [---] 6 connections rejected in last 10 minutes
syslog:Jul 19 02:50:42 jysdualxeon boinc[1211]: 19-Jul-2019 02:50:42 [---] GUI RPC request from non-allowed address 2.0.220.82
syslog:Jul 19 02:50:42 jysdualxeon boinc[1211]: 19-Jul-2019 02:50:42 [---] 5 connections rejected in last 10 minutes
syslog:Jul 19 03:00:42 jysdualxeon boinc[1211]: 19-Jul-2019 03:00:42 [---] GUI RPC request from non-allowed address 2.0.221.67
syslog:Jul 19 03:00:42 jysdualxeon boinc[1211]: 19-Jul-2019 03:00:42 [---] 4 connections rejected in last 10 minutes
syslog:Jul 19 03:12:42 jysdualxeon boinc[1211]: 19-Jul-2019 03:12:42 [---] GUI RPC request from non-allowed address 2.0.222.179
syslog:Jul 19 03:12:42 jysdualxeon boinc[1211]: 19-Jul-2019 03:12:42 [---] 5 connections rejected in last 10 minutes
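
Rather than reading the log by eye, the rejected addresses can be tallied with a short script. This is a sketch under the assumption that the client always writes the message exactly as shown in the excerpts above; the function name is invented for the example.

```python
import re
from collections import Counter

# Matches the client message seen in the log excerpts above.
PATTERN = re.compile(
    r"GUI RPC request from non-allowed address (\d{1,3}(?:\.\d{1,3}){3})"
)


def rejected_addresses(lines):
    """Count how often each non-allowed address appears in the log lines."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


sample = [
    "syslog:Jul 19 02:38:43 jysdualxeon boinc[1211]: 19-Jul-2019 02:38:43 [---] GUI RPC request from non-allowed address 2.0.218.45",
    "syslog:Jul 19 02:38:43 jysdualxeon boinc[1211]: 19-Jul-2019 02:38:43 [---] 6 connections rejected in last 10 minutes",
    "syslog:Jul 19 02:50:42 jysdualxeon boinc[1211]: 19-Jul-2019 02:50:42 [---] GUI RPC request from non-allowed address 2.0.220.82",
]
counts = rejected_addresses(sample)
for address, count in sorted(counts.items()):
    print(address, count)
```

To run it against the real log, pass an open file instead of the sample list, for example open("/var/log/syslog", errors="replace"), which usually needs read permission on that file.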


[edit again]
Just checked my other Linux box and there are no attempts to log in. Only this new one that I got working just an hour ago and have not rebooted since putting in boinc client.
266) Message boards : BOINC client : Wish List: Allow {app d=N} to assign different GPU WUs to same GPU (Message 92221)
Posted 18 Jul 2019 by Profile Joseph Stateson
Post:
That already works by default but it can be messy (inefficient).

With MW set to 0.19 and Einstein set to 0.33, I ended up after a while with one MW taking only 1/5 of the GPU and one Einstein taking only 1/3 of the same one, and wasted GPU time.

Normally 3 Einstein on one gpu or 5 MW but if both enabled then things get haphazard quickly.
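
The numbers above can be checked with a little arithmetic. This sketch assumes the 0.19 and 0.33 values are per-task GPU fractions (as set by gpu_usage in app_config.xml) and that tasks are packed until their fractions add up to no more than one GPU. The helper names are invented for the example, and the "unused share" is only the unclaimed fraction, not a measured utilization.

```python
import math

MW_FRACTION = 0.19        # Milkyway per-task GPU fraction from the post
EINSTEIN_FRACTION = 0.33  # Einstein per-task GPU fraction from the post


def max_tasks(fraction: float) -> int:
    """How many tasks of one kind fit on a single GPU."""
    # The small epsilon guards against float error when 1/fraction is whole.
    return math.floor(1.0 / fraction + 1e-9)


def unused_share(task_fractions) -> float:
    """GPU share left unclaimed by the given mix of running tasks."""
    return round(1.0 - sum(task_fractions), 2)


print(max_tasks(MW_FRACTION))        # 5 Milkyway tasks on one GPU
print(max_tasks(EINSTEIN_FRACTION))  # 3 Einstein tasks on one GPU
print(unused_share([MW_FRACTION, EINSTEIN_FRACTION]))  # 0.48 left unclaimed
```

One task of each kind claims only about half of the board, which matches the wasted GPU time described above.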
267) Message boards : Web interfaces : Projects are not updating their webpages to lastest browsers (Message 92212)
Posted 17 Jul 2019 by Profile Joseph Stateson
Post:
This is really good! I posted over at gpugrid about the problem with captcha and noticed that a few words from my post were missing which is a good example of the problem I mentioned here on the same forum

Anyway, over there I posted the following and took a screen print of the missing words. Then I pressed the refresh key and the words showed up. Subsequent F5 refresh the words did not show up and the sentence stopped at that "f"

Note that in the picture below the last word ends in the letter "f" and it should be
"free to use their captcha"



This is really aggravating. I can believe it is my Win10 system causing the problem, but why is only gpugrid's site showing the symptom? I am tempted to put a network analyzer on my enet port to see if there is some strange control character being returned by their site or if it is actually missing the rest of the sentence.
268) Message boards : Web interfaces : On some projects "sort" has unexpected side effects (Message 92209)
Posted 17 Jul 2019 by Profile Joseph Stateson
Post:
Follow-up on previous post. I did not want to leave incorrect or misleading information when I found out I was wrong.

The problem also occurs with "oldest post first". When I restored to "oldest" I saw my post no longer had missing words and assumed that was the problem. Much later on, I made a second post and noticed that a few words were missing and I was in the default "oldest post first" setting so the problem is not with the sort mechanism.

I post quite a bit and learned long ago to use Notepad and make a copy of large posts before clicking "submit". I also check carefully before (the preview) and after (the submit). AFAICT none of the BOINC websites offer a "delete" option, so I am especially careful to check my post before submitting. GPUgrid is the only site where I have noticed words or phrases missing. I have tried Edge and Chrome, both up-to-date, on Windows 10 with the latest patches and a McAfee subscription.

I am thinking this is a problem with my system but I cannot account for why I don't see it here, at seti, at milkyway, or Einstein nor at askubuntu nor at answers.microsoft, nor at any market site (paypal,ebay, etc) nor social "comments" NYTimes, JPost, Facebook, etc.

Here is an example of the problem

This worked under bash in 18.04 Ubuntu and I am not sure about
Also, that login screen I saw was just the screensaver lockout.


The word "others." is missing after the word about.

I hit refresh several times but it does not show up.
I select "edit" and its there. I still paste my text back into the box, Check preview and the word "others." is there, but when I save my edits it is gone.

I come back hours later or a day later and nothing is missing.

This worked under bash in 18.04 Ubuntu and I am not sure about others.
Also, that login screen I saw was just the screensaver lockout.
269) Message boards : Web interfaces : Projects are not updating their webpages to lastest browsers (Message 92196)
Posted 16 Jul 2019 by Profile Joseph Stateson
Post:
Just minutes ago found out that milkyway is aware of problem and is implementing upgrade to 1.x to fix it.
270) Message boards : Web interfaces : On some projects "sort" has unexpected side effects (Message 92195)
Posted 16 Jul 2019 by Profile Joseph Stateson
Post:
Normally I use the "newest post first" setting. Recently I noticed on gpugrid that occasionally some words or phrases that show up in the "preview" are missing after being posted. Depending on whether I used Google Chrome or Edge, the number of words or their location changes. I discovered that if I restore the setting to the default "oldest post first", the preview always matches the posted article.

I was wondering if this artifact is a problem on my computer or with the web site and if anyone else has seen this behavior before.

thanks for looking
271) Message boards : Web interfaces : Projects are not updating their webpages to lastest browsers (Message 92168)
Posted 15 Jul 2019 by Profile Joseph Stateson
Post:
Thought I would post about the problem here, maybe someone here can provide help to the project maintainers / moderators as I suspect the project principals are too involved in the science.


Two projects, Milkyway and GPUgrid, have a problem with the way they handle account profile edits. When I went to their forums to point out the problem, I noticed that the same complaints had been raised months earlier.

GPUgrid does not provide a captcha for Microsoft Edge or Chrome. Without the captcha it is not possible to edit or even add a profile.

Milkyway shows a 500 http error when attempting to edit.

I tried Microsoft Edge and Chrome on two different computers. These are up-to-date computers and browsers.

Verified that active-x (virus transfer protocol) is enabled.
Verified that popups are enabled
Verified that ublock was disabled

Was unable to edit profile on either of those two projects listed above.
no problem with boinc and seti
no problem with amicable numbers or primegrid
no problem with asteroids at home or TN-Grid
no problem with collatz or cosmology
no problem with latinsquares or Einstein
no problem with enigma

When I get a chance I will try Firefox on Ubuntu 18.04 to access milkyway and gpugrid. I have Safari on my iPhone but the screen is too small to be useful.

Some projects do not support user profiles or have only limited support. Profiles are nice, as one can provide information to other users when posting at their forum.

[edit]
project numberfield also generates 500 error just like milkyway
project Universe has the same captcha problem
272) Message boards : Projects : Combining BOINC with cryptocurrency (Message 92145)
Posted 14 Jul 2019 by Profile Joseph Stateson
Post:
Carbon Dioxide footprints:

crypto mining: 22-23 megatons
https://cointelegraph.com/news/bitcoin-generates-more-carbon-emissions-than-some-countries-study-warns

Streaming Adult Video (porn): 100 megatons
https://sputniknews.com/science/201907131076235927-porn-carbon-dioxide-emissions-research/

That puts porn at the same level as "inactive but always on" USA electronic devices. I suspect porn affects the brain the same way.
273) Message boards : Questions and problems : systemctl start = no GPU (Message 92137)
Posted 11 Jul 2019 by Profile Joseph Stateson
Post:
I am guessing that manually starting boinc (user? root?) is different than starting via the service.

I see "boinc" as owner for just about all in var/boinc but
"root" seems to be owner of script in /etc

How does your service compare to mine?
[Unit]
Description=Berkeley Open Infrastructure Network Computing Client
Documentation=man:boinc(1)
After=network-online.target

[Service]
ProtectHome=true
Type=simple
Nice=10
User=boinc
WorkingDirectory=/var/lib/boinc
ExecStart=/usr/bin/boinc
ExecStop=/usr/bin/boinccmd --quit
ExecReload=/usr/bin/boinccmd --read_cc_config
ExecStopPost=/bin/rm -f lockfile
IOSchedulingClass=idle

[Install]
WantedBy=multi-user.target
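
Comparing two service files line by line gets tedious, so here is a minimal sketch that does it with Python's configparser. The second unit (THEIRS) is hypothetical, invented only to show the comparison, and the function names are made up. Real unit files may repeat a key within a section; with strict=False the last occurrence wins, so this is a rough check rather than a full systemd parser.

```python
import configparser


def load_unit(text):
    """Parse unit-file text into a {(section, key): value} dict."""
    parser = configparser.ConfigParser(
        delimiters=("=",), interpolation=None, strict=False
    )
    parser.optionxform = str  # keep key case such as ExecStart
    parser.read_string(text)
    return {
        (section, key): value
        for section in parser.sections()
        for key, value in parser.items(section)
    }


def diff_units(mine, theirs):
    """Return {(section, key): (my_value, their_value)} for each difference."""
    a, b = load_unit(mine), load_unit(theirs)
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# A few lines taken from the unit posted above.
MINE = """\
[Service]
ProtectHome=true
User=boinc
WorkingDirectory=/var/lib/boinc
ExecStart=/usr/bin/boinc
"""

# Hypothetical second unit, invented only to show the comparison.
THEIRS = """\
[Service]
User=boinc
WorkingDirectory=/var/lib/boinc-client
ExecStart=/usr/bin/boinc
"""

differences = diff_units(MINE, THEIRS)
for (section, key), (my_value, their_value) in differences.items():
    print(f"[{section}] {key}: {my_value!r} vs {their_value!r}")
```

A value of None on one side means the key is missing from that unit.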
274) Message boards : GPUs : Seems I can't mix RX570 with HD7950 (Message 92112)
Posted 9 Jul 2019 by Profile Joseph Stateson
Post:
I put an unused HD7950 in with several RX570s and Einstein tasks failed immediately. Was able to suspend fast enough to avoid running through my allotment for the day.


This was a new Linux system and I used

sudo ./amdgpu-install --opencl=legacy

It is not clear exactly what that does. The AMD release info says to use legacy for anything before "vega-10", and I have never seen a vega-10; I have only heard of the 56 and 64 Vegas.

Anyway, the RX boards are all crunching SETI & Einstein fine; I just can't have that HD board in.

I just checked command line options and =legacy,pal is another option but I don't know if that will fix the HD problem
275) Message boards : Projects : Combining BOINC with cryptocurrency (Message 92058)
Posted 4 Jul 2019 by Profile Joseph Stateson
Post:
Most recent bitcoin electricity comparison:

...Bitcoin accounts for roughly 0.25 percent of the world’s entire electricity consumption...


...more than the country of Switzerland uses over the same time period...

but as bad as the above is consider the following :

...the electricity wasted each year by always-on but inactive electronic devices in the US could power the Bitcoin network four times over...


One percent of the entire world's electricity consumption is taken up by "always on but inactive" electronic devices in the USA!!!

https://www.msn.com/en-us/news/technology/bitcoin-consumes-more-energy-than-switzerland-according-to-new-estimate/ar-AADR3c2?ocid=spartandhp

Did my good deed: I unplugged the wall wart to my electric shaver and toothbrush which, if everyone does their part, will save the planet from the expected collapse in the year 2100. I could start using that sun dial I have in the garden and power off all my clocks. Unfortunately, the HOA does not allow roosters so I need at least one for an alarm.

Just read where the Russian submarine that caught on fire a few days ago was suspected of mapping internet cables so they could easily be cut. I think there are more immediate problems to worry about.
276) Message boards : Questions and problems : It appears that CPU tasks take way too long to crunch when GPU is active (Message 92043)
Posted 2 Jul 2019 by Profile Joseph Stateson
Post:
As far as I always understood, one OpenCL application should be able to be used on any OpenCL capable piece of hardware out there, just in the same way that OpenGL can be used on any capable hardware out there without having to write a specific API for that piece of hardware.


Was wondering about this myself. Why can't Einstein or Seti tasks run on whichever GPU is available? I am guessing it is only for verification purposes.

What about resuming from a checkpoint? If a task on a GTX-1080Ti is resumed from its checkpoint I assume it could be assigned to any other 1080Ti, but what about other models like the much weaker 1050?
277) Message boards : GPUs : Lost OpenCL on NVidia when adding ATI to PCIe splitter (Message 92040)
Posted 2 Jul 2019 by Profile Joseph Stateson
Post:
I added an RX-570 to a 4-in-1 riser that had 2 gtx-1060 and lost OpenCL

System was working with pair of RX570 on individual risers and 3 gtx1060 on splitter but I needed one of the gtx1060 for use elsewhere and the RX570 was available so I added it where the 1060 was.

Will try putting all 3 RX570 on the splitter and the pair of remaining gtx1060 on their own risers. The ATI OpenCL works fine, but the apps I am using required OpenCL on the NVidia boards.

1			7/2/2019 10:16:55 AM	Starting BOINC client version 7.14.2 for windows_x86_64	
2			7/2/2019 10:16:55 AM	log flags: file_xfer, sched_ops, task	
3			7/2/2019 10:16:55 AM	Libraries: libcurl/7.47.1 OpenSSL/1.0.2g zlib/1.2.8	
4			7/2/2019 10:16:55 AM	Data directory: C:\ProgramData\BOINC	
5			7/2/2019 10:16:55 AM	Running under account jstateson	
6			7/2/2019 10:16:58 AM	CUDA: NVIDIA GPU 0: GeForce GTX 1060 3GB (driver version 430.86, CUDA version 10.2, compute capability 6.1, 3072MB, 2488MB available, 3936 GFLOPS peak)	
7			7/2/2019 10:16:58 AM	CUDA: NVIDIA GPU 1: GeForce GTX 1060 3GB (driver version 430.86, CUDA version 10.2, compute capability 6.1, 3072MB, 2488MB available, 3936 GFLOPS peak)	
8			7/2/2019 10:16:58 AM	OpenCL: AMD/ATI GPU 0: Radeon RX 570 Series (driver version 2841.5, device version OpenCL 2.0 AMD-APP (2841.5), 4096MB, 4096MB available, 5095 GFLOPS peak)	
9			7/2/2019 10:16:58 AM	OpenCL: AMD/ATI GPU 1: Radeon RX 570 Series (driver version 2841.5, device version OpenCL 2.0 AMD-APP (2841.5), 4096MB, 4096MB available, 5095 GFLOPS peak)	
10			7/2/2019 10:16:58 AM	OpenCL: AMD/ATI GPU 2: Radeon RX 570 Series (driver version 2841.5, device version OpenCL 2.0 AMD-APP (2841.5), 4096MB, 4096MB available, 5243 GFLOPS peak)	
11			7/2/2019 10:16:58 AM	App version needs OpenCL but GPU doesn't support it	
12	Einstein@Home	7/2/2019 10:16:58 AM	Application uses missing NVIDIA GPU	
13	Einstein@Home	7/2/2019 10:16:58 AM	Missing coprocessor for task LATeah1061L00_188.0_0_0.0_22721461_0	
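The symptom in the log above (NVIDIA boards listed under CUDA but with no matching "OpenCL: NVIDIA GPU" lines) can be checked mechanically. The following is only a rough sketch: the function names are made up for illustration, the sample lines are shortened copies of the startup messages, and it assumes the event log text has been saved or pasted as plain text.

```python
import re

# Matches BOINC startup lines such as
# "CUDA: NVIDIA GPU 0: GeForce GTX 1060 3GB (driver version ...)"
GPU_LINE = re.compile(r"(CUDA|OpenCL): (NVIDIA|AMD/ATI|Intel) GPU (\d+):")

def gpus_by_api(log_text):
    """Return {(api, vendor): set of device indexes} found in the log text."""
    found = {}
    for line in log_text.splitlines():
        m = GPU_LINE.search(line)
        if m:
            api, vendor, index = m.groups()
            found.setdefault((api, vendor), set()).add(int(index))
    return found

def nvidia_missing_opencl(log_text):
    """NVIDIA device indexes reported under CUDA but not under OpenCL."""
    found = gpus_by_api(log_text)
    cuda = found.get(("CUDA", "NVIDIA"), set())
    opencl = found.get(("OpenCL", "NVIDIA"), set())
    return sorted(cuda - opencl)

# Shortened sample lines (tab separated, as in the event log)
SAMPLE = """\
6\t\t\t7/2/2019 10:16:58 AM\tCUDA: NVIDIA GPU 0: GeForce GTX 1060 3GB (driver version 430.86)
7\t\t\t7/2/2019 10:16:58 AM\tCUDA: NVIDIA GPU 1: GeForce GTX 1060 3GB (driver version 430.86)
8\t\t\t7/2/2019 10:16:58 AM\tOpenCL: AMD/ATI GPU 0: Radeon RX 570 Series (driver version 2841.5)
"""

print(nvidia_missing_opencl(SAMPLE))  # [0, 1]
```

An empty list would mean every NVIDIA board seen by CUDA was also seen by OpenCL, which is what the earlier log from 6/27 shows for the two 3GB boards.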
278) Message boards : Questions and problems : Had to limit # concurrent tasks due to "hi priority" WCG downloads (Message 91986)
Posted 30 Jun 2019 by Profile Joseph Stateson
Post:
Unaccountably they were all deadline'd for 48 hours even though these were fresh downloads


I am guessing these were "lost tasks" that were resent, as I had a problem with this system and replaced the power supply. Alternatively, they were on the disk drive all this time but didn't show up as the WCG account file was corrupted. This system had corrupted gpugrid account & statistics files and the work units did not show up. However, I don't recall seeing any WCG error messages and I saw plenty of error messages from gpugrid.
279) Message boards : Questions and problems : Had to limit # concurrent tasks due to "hi priority" WCG downloads (Message 91981)
Posted 29 Jun 2019 by Profile Joseph Stateson
Post:
Fixed, but do have a question.

First want to show the problem.
One of my systems had been doing only GPUGRID and I allowed WCG tasks as I solved the CPU cooling problem. Unaccountably they were all deadline'd for 48 hours even though these were fresh downloads. With 12 threads that was not a problem as they would all complete, but I did notice that my other BOINC system did not have this problem, which was caused by all of the WUs having about 48 hours to finish on this system. I guess this was my luck and I got the "last" of the bunch.

One of the two GPUGRID work units (have pair of gtx1070) was "ready to run" so I decreased the # of cpus to allow it to run. That did not work. Only when I stopped all of the WCG tasks did the second GPU task start up.

Seems the high priority tasks preempt other tasks but there is no consistency. With all resources set at 100%, either both GPU tasks should run or none should run if high priority really preempts even with unused CPU threads. Why was just one allowed to run? In addition I had SETI gpu tasks in the queue but none of them got that leftover GTX1070 either.

Some thoughts: It would be nice if BM had a form to set various configurations and thus avoid editing app_config. Alternatively, a separate program to set parameters on various projects. I looked through some of the sources dealing with RPC calls but did not see anything related to setting a value in app_config, but I might have missed it. I used project_max_concurrent to ensure that only 8 tasks run and now my pair of GPUGRID are crunching in addition to the 8 WCG.
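For reference, the project_max_concurrent setting lives in an app_config.xml file in the project's folder under the BOINC data directory. Below is a minimal sketch of generating that file's content instead of hand-editing it; the helper name is made up, and where the output gets written is left to the reader since the project folder name differs per project.

```python
import xml.etree.ElementTree as ET

def build_app_config(max_concurrent):
    """Build a minimal app_config.xml body that caps running tasks for one project."""
    root = ET.Element("app_config")
    ET.SubElement(root, "project_max_concurrent").text = str(max_concurrent)
    return ET.tostring(root, encoding="unicode")

# Limit the project to 8 running tasks, as described in the post
print(build_app_config(8))
# <app_config><project_max_concurrent>8</project_max_concurrent></app_config>
```

The client has to be told to re-read its config files (or be restarted) before the new limit takes effect.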
280) Message boards : GPUs : client not detecting all GPUs on risers (Message 91968)
Posted 28 Jun 2019 by Profile Joseph Stateson
Post:
This is all working after all. I am guessing what happened was I failed to reboot after looking at the device manager and seeing 5 devices but only 4 show up in boinc. I did restart boinc but should have rebooted. However, I have also noticed that the riser x1 adapters have variations in manufacturer. Some fit loose in the socket to where their cable weight can tip them out. Others fit tight and require a good pull to remove. I would think that if the 4-in-1 was loose then all the devices would fail, not just one.

Short Story: The 3 gtx1060 are all recognized in the 4-in-1 risers unlike when I first posted about the problem.
281) Message boards : GPUs : client not detecting all GPUs on risers (Message 91966)
Posted 28 Jun 2019 by Profile Joseph Stateson
Post:
Does CLinfo.exe find all cards? https://boinc.berkeley.edu/dl/clinfo.zip



Will try later. I moved one of the NVidia boards out of the 4-in-1 and put it into that empty slot-2 and all 5 were recognized by boinc.

This mombo, X8DTL-iF has a pair of x8 and a pair of x4 sockets. I will try the 4-in-1 in an x8 and see if that helps.

This was actually pretty cheap as I got it for under 90 with a pair of xeons and fans three years ago before the mining craze took off. The 6-gpu frame cost me $19 but was a PITA to assemble.

There is also a chance that it could have worked earlier, as possibly another reboot was needed. I need to do more testing before posting here. Usually the driver install requests a reboot but occasionally there is no notification. I will look at the clinfo.
282) Message boards : GPUs : client not detecting all GPUs on risers (Message 91960)
Posted 27 Jun 2019 by Profile Joseph Stateson
Post:
Been testing a 4-in-1 riser on motherboard that has only 4 slots.

Windows sees the correct count:

slot 1: 4-in-1 riser: GTX1060 (3gb), GTX1060 (3gb), GTX1060 (6gb)
slot 3: RX-570
slot 4: RX-570

Client does not see the 6gb nvidia board
1			6/27/2019 3:57:50 PM	Starting BOINC client version 7.14.2 for windows_x86_64	
2			6/27/2019 3:57:50 PM	log flags: file_xfer, sched_ops, task	
3			6/27/2019 3:57:50 PM	Libraries: libcurl/7.47.1 OpenSSL/1.0.2g zlib/1.2.8	
4			6/27/2019 3:57:50 PM	Data directory: C:\ProgramData\BOINC	
5			6/27/2019 3:57:50 PM	Running under account jstateson	
6			6/27/2019 3:57:53 PM	CUDA: NVIDIA GPU 0: GeForce GTX 1060 3GB (driver version 430.86, CUDA version 10.2, compute capability 6.1, 3072MB, 2488MB available, 3936 GFLOPS peak)	
7			6/27/2019 3:57:53 PM	CUDA: NVIDIA GPU 1: GeForce GTX 1060 3GB (driver version 430.86, CUDA version 10.2, compute capability 6.1, 3072MB, 2488MB available, 3936 GFLOPS peak)	
8			6/27/2019 3:57:53 PM	OpenCL: NVIDIA GPU 0: GeForce GTX 1060 3GB (driver version 430.86, device version OpenCL 1.2 CUDA, 3072MB, 2488MB available, 3936 GFLOPS peak)	
9			6/27/2019 3:57:53 PM	OpenCL: NVIDIA GPU 1: GeForce GTX 1060 3GB (driver version 430.86, device version OpenCL 1.2 CUDA, 3072MB, 2488MB available, 3936 GFLOPS peak)	
10			6/27/2019 3:57:53 PM	OpenCL: AMD/ATI GPU 0: Radeon RX 570 Series (driver version 2841.5, device version OpenCL 2.0 AMD-APP (2841.5), 4096MB, 4096MB available, 5095 GFLOPS peak)	
11			6/27/2019 3:57:53 PM	OpenCL: AMD/ATI GPU 1: Radeon RX 570 Series (driver version 2841.5, device version OpenCL 2.0 AMD-APP (2841.5), 4096MB, 4096MB available, 5095 GFLOPS peak)	
12			6/27/2019 3:57:54 PM	Host name: jysdualxeon	


copro_info.xml file looks good but is missing the gtx1060 6gb board.



For what it is worth, when I tried a 4th nvidia board, a gtx1070, windows device manager also missed the 6gb as well as the 1070. It knew there was something attached but could not recognize the device.
283) Message boards : The Lounge : New project suggestion: DoNKEY (Message 91914)
Posted 19 Jun 2019 by Profile Joseph Stateson
Post:
After visiting with the folks over at Gridcoin I came up with an idea for a new project. There have been a lot of complaints about the amount of electricity that is used by crypto-miners especially ASIC that do not provide anything useful except a block chain value and I discovered a way to reduce or eliminate that waste of electricity.

There is a program available VanityGen that generates valid keys for transactions for various projects. The program runs on CPU or GPU (OpenCL) and is quite efficient. All one needs is X users using Y resources to produce enough keys of Z length to cause transactions to slow down and eventually stop due to running out of keys or timing out trying to get one. I picked the name DoNKEY as this would be like a DDoS attack (Denial of service) but would be Denial of New KEYS.

Maybe one of the project scientists here with a math background could calculate how long it would take to slow down the new key creation by "X" seconds. Note that we don’t want to create a given unique key (3.3 E+33 years from above URL) but only want to generate as many as possible to get the chain so big it won’t fit on the hard drive. I am sure there would be enough BOINC enthusiasts to jump on the DoNKEY project and put a dent into crypto-mining.

I hope this is a good topic for the lounge. Maybe one of the lounge lizards here can come up to the bar and take a few shots at this.
284) Message boards : Projects : Combining BOINC with cryptocurrency (Message 91912)
Posted 19 Jun 2019 by Profile Joseph Stateson
Post:
Calling someone a climate denialist likens them to holocaust deniers and is an insult and lack of civility.


Not when those denying the scientific consensus are doing so because of vested interests rather than a genuine belief that the scientific consensus is wrong. (I know people who genuinely believe that all the talk about climate change is a conspiracy along with vaccination, the introduction of 5G and much else besides.) I have long since given up arguing with them about it.


The last time I got into a discussion about this topic I got put on the stop forum spam list and it took me a week to get off. I no longer crunch any climate related projects and never will as only a moderator or project insider could do that.
285) Message boards : Projects : Combining BOINC with cryptocurrency (Message 91910)
Posted 19 Jun 2019 by Profile Joseph Stateson
Post:
Bitcoin is responsible for the same amount of carbon dioxide emissions as a city like Las Vegas or Hamburg, and it's time to consider how to reduce its climate footprint, researchers said Thursday.


It is worse than that
https://www.cbsnews.com/news/bitcoin-mining-energy-consumption/
I completely agree 100% as far as ASIC (ant mining) is concerned: calculating a blockchain just for the sake of the calculation.

The only thing good about it is the development of the blockchain technology

A blockchain is a decentralized, distributed and public digital ledger
that is used to record transactions across many computers so that any involved
record cannot be altered retroactively, without the alteration of all subsequent blocks.


This is quite useful and potentially can reduce costs in financial and banking industries.
Not always mentioned is the capability for anonymous or untraceable transactions, which makes it easy for criminals to take advantage.
I personally have received extortion spam and there is not much that can be done about it.

That all being said, carbon dioxide is not a demonic gas and satellite studies show the earth is greening from more of a gas that is essential to life.
I personally do not know of anyone who believes the earth is not warming or that humans don’t share some of the blame.
Why are people who disagree with the “global consensus” on climate warming called denialists?
Because believers are intolerant of other people's viewpoints and demonize anyone who disagrees with them.
Calling someone a climate denialist likens them to holocaust deniers and is an insult and lack of civility.
The primary disagreement is the belief that CO2 is the control knob that can cure global warming.
286) Message boards : Questions and problems : bug? computation error on restarting based on time of day (Message 91888)
Posted 18 Jun 2019 by Profile Joseph Stateson
Post:
I ran some tests to try to see how often this problem occurs. It does happen, but rarely, and it seems the problem is the project's checkpoint handling like you suggested.

Tried doing a few manual suspensions and resumes: This did not cause an error so I was thinking the problem is the suspension was issued "all at once" when the time of day hits the stop deadline.

I then repeatedly set the deadline and resumed (set start and end to the same time of day). After about the 4th attempt I managed to get a single seti gpu task to go bad.

I then tried a stop of the service (ubuntu: sudo ./boinc-client restart) without suspending any tasks, which pretty much interrupts the processing with little or no warning. All gpu (SETI) tasks started up just fine but two WCG CPU tasks generated computation problems.


The above just shows that the projects have difficulty with their checkpoint implementation.

I did notice the following that concerns me. I have two systems that I would like to run only at night when cool. The ubuntu system does not shut down the fans on the six RX560 gpus. The windows system shut down the fans on its Rx570s. I have replaced fans on a number of occasions. It is always a PITA trying to locate the exact replacement, more often than not from mainland china. Usually the entire heat sink needs to be removed. Both of these systems use about 115 watts or so with no load whether the fans are turning or not. I think I now want to shut them down when not being used. I suspect there is a way in bios to turn the system on at a certain time and I suspect there is a network management or remote procedure call I can make to shut them down. Some programs like Nero and Acronis have a shutdown procedure but AFAICT neither Boinc nor its manager have that option.
287) Message boards : The Lounge : Another unwanted "feature" update from Microsoft (Message 91885)
Posted 18 Jun 2019 by Profile Joseph Stateson
Post:
Will have to change my nickname.
ex-Beemerbiker?
Beemerlessbiker?
Beemerbikerless?

😁


Bikerless Beemer is correct as a "Beemer" is BMW motorcycle rider whereas a "Beamer" drives the BMW auto.
However, I am not going by any of those and discovered that when I changed my name it shows up immediately on all my posts even historical. Seems not possible to update nickname at Einstein at home.

Difference between a man and a boy has always been the price of his toys and I finally outgrew one.
288) Message boards : Questions and problems : bug? computation error on restarting based on time of day (Message 91866)
Posted 17 Jun 2019 by Profile Joseph Stateson
Post:
Due to heat during summer and several systems in the garage with no cooling I was testing running only at night, 23:00-06:00, on a system with 6 GPUs. Have never used this feature before. I had two tasks running on a pair of RX560 when I enabled the 23:00-06:00 filter and they both paused just fine. After a few minutes I set the time filter to 23-23 to let these last two finish as there were no others queued up. Unaccountably, one reported a compute error instantly; the other gpu continued to run just fine. If this is a one-off occurrence then no problem. I allowed more work to make sure the problem was not the device and got 150 tasks before I could stop the downloads. All 6 gpus are working and I assume this was a random bug in the resume feature and at most a single task would be lost per device, if any.
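A night-only window such as 23:00-06:00 wraps past midnight, which is where a simple start/end comparison tends to go wrong. The sketch below is not the client's actual code, only an illustration of the logic; in particular, treating start equal to end as "no restriction" is an assumption based on tasks running after the filter was set to 23-23.

```python
from datetime import time

def in_window(now, start, end):
    """Return True if computing is allowed at 'now' for a daily start/end window."""
    if start == end:
        # Assumption: identical start and end means no time restriction
        return True
    if start < end:
        # Ordinary same-day window, e.g. 08:00-17:00
        return start <= now < end
    # Window wraps past midnight, e.g. 23:00-06:00
    return now >= start or now < end

print(in_window(time(23, 30), time(23, 0), time(6, 0)))  # True
print(in_window(time(12, 0), time(23, 0), time(6, 0)))   # False
```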
289) Message boards : Questions and problems : CPU Benchmarks (Message 91859)
Posted 17 Jun 2019 by Profile Joseph Stateson
Post:
Looks O.K. compared to my slightly faster CPU. My "Mapping Cancer Markers" take 2-5 hours on this Haswell CPU running under 7.14.2 and same 18.04

rx560
6/17/2019 12:57:15 AM	Processor: 8 GenuineIntel Intel(R) Core(TM) i7-4790S CPU @ 3.20GHz [Family 6 Model 60 Stepping 3]	
6/17/2019 1:02:03 AM	   Number of CPUs: 7	
6/17/2019 1:02:03 AM	   4481 floating point MIPS (Whetstone) per CPU	
6/17/2019 1:02:03 AM	   113560 integer MIPS (Dhrystone) per CPU	
290) Message boards : Questions and problems : no project url in task state file ??? (Message 91853)
Posted 16 Jun 2019 by Profile Joseph Stateson
Post:
Windows system rebooted unexpectedly (maybe I should start expecting this). Did not see anything in event log other than it was an unexpected shutdown.

Boinc seems to be working fine even with the following error messages. Not sure of their significance and not sure if I need to do anything about it.

GPUGRID	6/16/2019 8:36:27 AM	[error] no project URL in task state file	
GPUGRID	6/16/2019 8:36:27 AM	[error] no project URL in task state file	
World Community Grid	6/16/2019 8:36:27 AM	[error] no project URL in task state file	
World Community Grid	6/16/2019 8:36:27 AM	[error] no project URL in task state file	
291) Message boards : The Lounge : Why would anyone want to run Linux apps on window? (Message 91840)
Posted 14 Jun 2019 by Profile Joseph Stateson
Post:
Just wondering. My brother "heard" I was into Linux and sent me this article where Microsoft is making it easier to run Linux apps. You have to be an insider, which I used to be until I found I was getting buggy releases.

I would think the only reason would be because the app was free on Linux but the author wanted payment to run under windows. I run Ubuntu as it is free but only if the motherboard does not have an SLIC 2.1 for that "free" windows 10.

Thinking about this for a while I remembered I had an android app for signal strength interference and it allowed picking the best channel but when I replaced the phone with an iPhone the app was not available. I suspect if I install Ubuntu 19 using this new (?) technology my AMD video boards will still have problems with OpenCL.

After reading that article about the free Ubuntu on windows 10, I feel much better about running my free windows 10 on older, unwanted motherboards. More often than not, if the mombo is "missing" the SLIC2 one can easily be added. ;<)
292) Message boards : The Lounge : Anyone using immersion tech for cooling? (Message 91839)
Posted 14 Jun 2019 by Profile Joseph Stateson
Post:
Not sure what I am looking at. Soap bubbles? Co2 rising from dry ice?

I am impressed.

My experience with liquid cooling includes removing the "Made In China" label from the bottom of the water block and failing to see the clear plastic protector that was under the label. Nothing got damaged, it just took a while before I figured out why it was overheating. My experiment with cheap tubing for water cooling did not go as well.

I am thinking that the bitcoin miners in Siberia have it easy as they can just stick the hardware outside "immersed in snow".
293) Message boards : Projects : Combining BOINC with cryptocurrency (Message 91812)
Posted 13 Jun 2019 by Profile Joseph Stateson
Post:
1) The mathematical problem must be divisible into smaller subproblems that can allow contributors of computational power to "prove" that they in fact contributed computational power to the BOINC project.


Some mathematical problems are considered a waste of resources and are avoided by some users.

2) The solution to the smaller subproblems must be "easy to verify" by anyone.


I agree, that would be nice to have. Unfortunately, IMHO the more complicated the problem, the less likely the solution is simple. Frequently there are many solutions to a problem but an exhaustive search must be done to find the best.

Gridcoin requires a central party to verify that the computational work was actually done


Unfortunately, humans can lie, cheat and steal in addition to simple "erring". The projects verify the work was done, but what if they are mistaken, or an abuser has inside knowledge or support from a project insider? SETI had their cheaters back in '99. Collatz only found out about fake results being returned when the gridcoin people spotted an anomaly in coinage.

Not mentioned in your post is how one would handle malware that installs BOINC or bots that users install but are unaware of.
294) Message boards : The Lounge : Another unwanted "feature" update from Microsoft (Message 91810)
Posted 12 Jun 2019 by Profile Joseph Stateson
Post:
This week, all my windows 10 systems are installing or have installed the huge "Version 1903" release even though I thought I had set group policy to prevent such stuff. I assume it includes required security updates that override the policy but my gut feeling is they needed to update the advertising.

I do occasionally update my Ubuntu 18.04 but have never had a problem although I know better than to upgrade to 19.

Finally got rid of my R1100RT. 80000 miles on it in 19 years with only 5000 since I retired 7 years ago. Passed it down to my grandson who I hope sells it. Will have to change my nickname.

[EDIT] Check that -- seems an October 2018 update snuck in, not just the 1903 one. Need to find WTF is going on. Since these are headless possibly something went wrong, I didn't see it, and windows restored itself to the time before the group policy change. ?? Only Redmond knows for sure.

295) Message boards : GPUs : Request for Intel iGPU support to be a working feature again. (Message 91787)
Posted 10 Jun 2019 by Profile Joseph Stateson
Post:
Hey, it's been months now since BOINC could ever see the iGPU in my i5 8400, and talking with other users I'm far from alone here. It's just a shame to waste the iGPU's unit crunching power. It sits there idle, month after month, while my CPU and Radeon do all the work...

BOINC version 7.14.2
Windows 10 Version 10.0.17763 Build 17763


BOINC only sees what the OpenCL says is there. I ran intel's OpenCL on i5-6300U for a year and a half, right up to the time the overheating caused the LCD screen to pop out. I am sticking with open frame if I can't get a desktop to run cool and will never use a laptop again. I think their tools are here but I personally can't get the ubuntu version to work, but then I have 18.04 and they clearly state only 16.04 works (Debian)





296) Message boards : Promotion : Encouraging and promoting 3rd party apps? (Message 91748)
Posted 6 Jun 2019 by Profile Joseph Stateson
Post:
Posted same info over at https://cryptocurrencytalk.com/forum/2436-projects/
and had over 800 views in 30 hours. Normally slow board there and 800 views might take a year or more on the average
297) Message boards : GPUs : GPU run project (Message 91731)
Posted 5 Jun 2019 by Profile Joseph Stateson
Post:
I am no expert on Linux and AFAICT Linux still sucks, but...

You ain't much of an expert on anything.
Windows, a quaint American word, meaning: a real OS is too hard for me.



Your selective editing of my post obscured my conclusion that
"windows sucks more" and it's guaranteed!


I am no expert on Linux and AFAICT Linux still sucks, but...

298) Message boards : GPUs : GPU usage utility available? (Message 91727)
Posted 5 Jun 2019 by Profile Joseph Stateson
Post:
Nope, not working. sensor shows no load and freq is too low.

Boinc Manager should show something like the following



What project are you attached to? I can look at the project and see if anything is out of the ordinary.

Also, look in your event log for a message like the following
-
6			6/4/2019 4:59:15 PM	CUDA: NVIDIA GPU 0: GeForce GTX 1070 (driver version 430.86, CUDA version 10.2, compute capability 6.1, 4096MB, 3556MB available, 6852 GFLOPS peak)	
7			6/4/2019 4:59:15 PM	CUDA: NVIDIA GPU 1: GeForce GTX 1070 (driver version 430.86, CUDA version 10.2, compute capability 6.1, 4096MB, 3556MB available, 6463 GFLOPS peak)	
8			6/4/2019 4:59:15 PM	OpenCL: NVIDIA GPU 0: GeForce GTX 1070 (driver version 430.86, device version OpenCL 1.2 CUDA, 8192MB, 3556MB available, 6852 GFLOPS peak)	
9			6/4/2019 4:59:15 PM	OpenCL: NVIDIA GPU 1: GeForce GTX 1070 (driver version 430.86, device version OpenCL 1.2 CUDA, 8192MB, 3556MB available, 6463 GFLOPS peak)	
10			6/4/2019 4:59:15 PM	Host name: jysHPZ400	
11			6/4/2019 4:59:15 PM	Processor: 12 GenuineIntel Intel(R) Xeon(R) CPU X5680 @ 3.33GHz [Family 6 Model 44 Stepping 2]	
12			6/4/2019 4:59:15 PM	Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 popcnt aes syscall nx lm vmx smx tm2 dca pbe	
299) Message boards : GPUs : Intel OpenCL on Linux (Message 91726)
Posted 5 Jun 2019 by Profile Joseph Stateson
Post:
I cannot help with your problem but I too want to get opencl working in Ubuntu 18.04 with my Intel CPU. This Haswell has "HD Graphics 4600" which should be capable of opencl.

I am currently running 5 RX-560 and 1 RX-570 on a BT85 motherboard and had enough trouble getting the AMD driver to work and am concerned that if I try to get the Intel version of opencl to work I may end up with a broken system with only the CPU crunching.

I am going to follow this thread, maybe someone with more knowledge of Linux drivers can help. Actually just being told that both can work together in version xx with drivers x and y would be helpful.
300) Message boards : GPUs : Not getting any work for my Anonymous platform at Einstein@home (Message 91697)
Posted 4 Jun 2019 by Profile Joseph Stateson
Post:
Still debugging the Nano platform. Errored out all of my daily allotment of tasks when libraries weren't found. Having to wait out the 24 hour penalty box before trying again.


I have this happen all too often and the worst is usually 600+ errored tasks returned to milkyway.


====wish list====
Was wondering if there was a boinc client option such as

"Allow new tasks for 10 seconds"

However, that might be a problem on projects such as milkyway so..

"Allow 1 new task"

It seems to me that the 10 second time could easily be implemented but I do not know enough about the client to know if it can even ask for a single task or even if the project would honor a request for just 1.
301) Message boards : GPUs : GPU usage utility available? (Message 91696)
Posted 4 Jun 2019 by Profile Joseph Stateson
Post:
GPU-z from Tech Power Up is a good choice.

I once found that when I added a 2nd RX-570 the AMD driver on its own saw fit to enable cross-fire and that second GPU was never used. I had to disable cross-fire. As I recall BOINC assigned a task to it but it never finished and GPU-z showed it had no load.

If you are a game player and use SLI you may have to juggle enable / disable SLI. That is just a guess. Perhaps the AMD drivers are unable to work with boinc in crossfire mode and the nvidia might in SLI. GPU-z will tell you in any event.

https://www.techpowerup.com/gpuz/
302) Message boards : BOINC Manager : Finish all jobs and close (Message 91684)
Posted 2 Jun 2019 by Profile Joseph Stateson
Post:
Is there, or could there be an option to let all the current Boinc jobs calculate, upload and finish, and then close the program? I ask because there are times I have to shut down for a week, and I often wonder if the work being done when I restart is a complete waste, or gets done by someone else's computer and mine gets deleted when uploading because I was too late to the party.


Sadly, I am unaware of any way to do that using the Boinc Manager. However, you can set "no new work" on all projects and, using an app on a smart phone such as splashtop, log in, see if they are done, and shut down the system.

I always wanted a function like that. The problem with selecting "no new work" is I have to remember which projects need to be "allowed" new work out of several dozen possible projects listed, when I power the system back up.
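One way around having to remember which projects to re-enable is to script it with boinccmd, which accepts per-project operations "nomorework" and "allowmorework". The sketch below only builds the command lines from a saved list of project URLs; the URLs shown are placeholders, the helper name is made up, and actually running the commands (for example with subprocess.run) is left out.

```python
def project_commands(urls, op):
    """Build boinccmd command lines applying one operation to each project URL."""
    if op not in ("nomorework", "allowmorework"):
        raise ValueError("op must be 'nomorework' or 'allowmorework'")
    return [["boinccmd", "--project", url, op] for url in urls]

# Placeholder URLs: list only the projects that should get work again later
ACTIVE_PROJECTS = [
    "https://example.org/projectA/",
    "https://example.org/projectB/",
]

# Before shutting down: stop fetching new work
for cmd in project_commands(ACTIVE_PROJECTS, "nomorework"):
    print(" ".join(cmd))

# After powering back up: allow work for the same projects only
for cmd in project_commands(ACTIVE_PROJECTS, "allowmorework"):
    print(" ".join(cmd))
```

Keeping the URL list in one place means the same list drives both steps, so nothing has to be remembered when the system comes back up.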
303) Message boards : Projects : tired of "just exclude boinc folder from virus scan" (Message 91681)
Posted 1 Jun 2019 by Profile Joseph Stateson
Post:

Anyway, if you don't want to join fun side, *and* The Scary Stuff isn't my problem, I'm totally fine with this.


hmm. I failed to put [farcical][/farcical] around my post, my bad.
304) Message boards : Promotion : Encouraging and promoting 3rd party apps? (Message 91671)
Posted 30 May 2019 by Profile Joseph Stateson
Post:
I think if there were some interesting 3rd party apps available and being promoted it might get more people interested.

BoincTasks (Fred) just created a 3rd party forum and allowed me to promote several performance tools. There is a description of the program here and the sources and executables are at github

The program BThistoryReader works only with BoincTasks, uses C# and runs only on windows. The program HostProjectStats creates a web based performance tool that runs using any browser and accesses project data on the web. That page is hosted (by me) here. These programs were built with VS2017.

Here is a sample plot of Elapsed Time for the Milkyway Separation program. This is only one feature, refer to the above link for the other things it does.

305) Message boards : Questions and problems : resource "share' not working as expected when set to zero (Message 91637)
Posted 26 May 2019 by Profile Joseph Stateson
Post:
I failed to mention that this system is Ubuntu and runs an older version 7.9.3. Maybe that is the problem?

Going to try to upgrade to 7.14 although I hate "fixing" a working system.

Put in 7.14.2 using that site mentioned earlier. Also had to add
After=multi-user.target to get all 5 boards recognized on power up

updated image from THIS post so refresh page to see changes
306) Message boards : Questions and problems : Cant seem to find 7.14 for linux (Message 91630)
Posted 25 May 2019 by Profile Joseph Stateson
Post:
I had similar questions for installing Ubuntu on a laptop earlier this month. May I suggest I point you over here on the Seti boards. I don't think it is an official UC-Berkeley build, but it works for me. I'm curious to see Richard's response.


Thanks! I went over there and looked. I did read quote " … You can update your system with unsupported packages from this untrusted PPA by adding..."

I am not too worried about that warning since this is a Linux box. I am not yet ready to set up that other motherboard but will keep this in mind.
307) Message boards : Questions and problems : resource "share' not working as expected when set to zero (Message 91628)
Posted 25 May 2019 by Profile Joseph Stateson
Post:
I was under the impression that if the resource share on project "A" was 100% and on project "B" was 0, then project B would only get data when A had none and would be limited to a single task at a time.

Well, maybe that works if there are only two projects, but I got a boatload of Milkyway work units when its resource share was 0% and SETI was 100% out of data. I had also set Einstein to 0% as I wanted it as a backup in addition to Milkyway. This system has only 2 threads and Milkyway requires a full CPU, unlike some of my other systems that easily run on 8-9% CPU utilization per job. SETI came back with more jobs, but in the aftermath I have 125 Milkyway work units that cannot possibly finish, as I cannot run more than 2 out of the 5 RX560s concurrently. I did notice the deadline is a couple of weeks away, but that will just extend the problem for another two weeks before they start executing at priority, and 3 of the GPUs remain idle.

I looked at the implementation plan here
https://scienceunited.org/doc/implementation.pdf

I did not see anything that indicated ZERO is a special number that allows only 1 task.

Maybe the problem is with the project handing out a couple of hundred tasks when only 1 was asked for?

If this is a bug then I can submit it, else I assume it is a feature request. I went to GitHub and poked around but all I could find was that PDF on implementation.

I set share to ZERO by going to my manager BAM! and changing from 0 to 1 and back to 0. I think that is a bug in BAM! but in any event when I run my local manager I see 0% share for those two backup projects, so it got set to 0 on my PC. Maybe Milkyway does not use that? This was a new system and I was unaware Milkyway needs 95% or so of a CPU for each GPU until it "happened".
308) Message boards : Questions and problems : Cant seem to find 7.14 for linux (Message 91627)
Posted 25 May 2019 by Profile Joseph Stateson
Post:
Which iteration of Ubuntu do you have?

On 19.04 with a fresh install I got 7.14.2. (I installed manager at the same time.)


Have 18.04, as when I tried 19.04 I was unable to get the amdgpu-pro driver to install. Looking at AMD drivers I see only 18.04 is supported.

However, the system I am considering for Linux has a pair of gtx1060. Maybe the nvidia driver is up-to-date.

[EDIT] I just checked my attempt on May 1 and there is a record at boincstats of my connection and 7.14.2 was running in 19.04
https://boincstats.com/en/bam/host/872975

but I could not get AMD drivers to install so I figured a way to activate win7 and upgraded to 10.
309) Message boards : Projects : tired of "just exclude boinc folder from virus scan" (Message 91625)
Posted 25 May 2019 by Profile Joseph Stateson
Post:
On the bright side -- Unix for all! You can always find (or build) perfect match. Anecdotal evidence -- I have no problems with viruses, anti-viruses, trojans, worms etc etc for decades. Pity, vapourware is still a thing though :(


You could be part of the problem and not even know it! Your Linux box could be part of the problem (or solution, or feature), one of millions of zombie system "bots" inflicting chaos on (or fixing, or upgrading, depending on the script kiddie ideals) social media platforms all over the free world. (Russia & China are switching to private internet as we converse, so bots can't get in there eventually but probably will always get out.)

I once thought to myself "whynot (pun intended) stick with my 1976 Zilog Z80 and its CP/M and use it for the next several decades".

One day, a lot of the secretaries got together and told the boss they wanted to stick with Electric Pencil and not use WordPerfect as it was too confusing. I liked WordPerfect myself and agreed with the boss and realized that I also had to go with the flow, and my Z80 CP/M was replaced.

[EDIT] Yes, I am aware that Raspberry Pi boots CP/M but I am not into retro computing.
310) Message boards : Questions and problems : Cant seem to find 7.14 for linux (Message 91622)
Posted 25 May 2019 by Profile Joseph Stateson
Post:
I looked at the change log info for 7.14.2 and Linux is listed but the downloads are only for windows and mac.

I get Linux from here
https://www.ubuntu.com/download/desktop

and using

apt-get install boinc-client

I get the boinc client 7.9.3 from that distribution location. The Berkeley download is only 7.4

This system, 7.9.3, seems to be working fine, with SETI at 100% and Milkyway & Einstein backups at 0%. I have no other projects in mind as the CPU is rather limited (2 threads). I see other users running the latest version, 7.14.2, under Mint. Where did they get that? I assume it is real Linux and not the Windows client under Wine. I did spend a short time googling for 7.14 before coming here. I do not plan on upgrading my 7.9.3 unless there is a security problem or real benefit, but I have a Windows system I might want to convert as it has better resources (I could actually use it) and would consider using 7.14.
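
A side note on comparing these version numbers: 7.14.2 is newer than 7.9.3 even though "14" sorts before "9" as plain text. Below is a minimal sketch of a version-aware comparison, assuming GNU sort (for its -V option) is available. The helper name version_lt is made up for this illustration.

```shell
# version_lt A B: succeeds (exit 0) when version A sorts strictly before B.
# Relies on the -V (version sort) option of GNU sort.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

installed="7.9.3"    # version the Ubuntu package gave in this post
wanted="7.14.2"      # version other users report running

if version_lt "$installed" "$wanted"; then
    echo "$installed is older than $wanted"
else
    echo "$installed is not older than $wanted"
fi
```

On the Ubuntu box itself, apt-cache policy boinc-client shows the installed version and the candidate version offered by the configured repositories.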
311) Message boards : Projects : tired of "just exclude boinc folder from virus scan" (Message 91613)
Posted 24 May 2019 by Profile Joseph Stateson
Post:
This is just my opinion, but I am getting tired of being told to just prevent the boinc directory from being scanned to fix "false positives". That is like putting a sign outside your home "this is a gun free area". Just tells the bad guys where to go to do their thing.

I lost all Moo! Wrapper work units due to McAfee on a system that I like to keep secure but still on the internet.

I went over to "moo" people and made a wish and was informed that they get their "stuff" from dnetc and I would have to talk to them. So, the "moo" people don't even know what they are downloading to unsuspecting users. The projects need to step up and work with the antivirus folks.
312) Message boards : Projects : LHC@home moves SixTrack 5.02 to production (Message 91612)
Posted 24 May 2019 by Profile Joseph Stateson
Post:
Recent announcement from Alessio Mereghetti... https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5011

This is a major rebuild of LHC's SixTrack app moved from testing to production. WinXP is not supported but they're working on fixing that.
ATM there are 750K tasks available. Get 'em while they're hot!!


Thanks, was a fan of CERN hoping they could find a small black hole to prove the critics were correct and they could inadvertently destroy the world.

..I quit reading when I got to the "huge source files and Fortran90" statement.

They need to get on the GPU band wagon.
That being said, I am crunching on their stuff.
313) Message boards : GPUs : GPU run project (Message 91610)
Posted 23 May 2019 by Profile Joseph Stateson
Post:
Does any one know of a Project that runs on a Linux Mint OS via a AMD Radeon 7700 GPU?
If so, are there any special commands needed to make it work?
I have a PC that works the same GPU via Windows7 OS, but with Linux I can't make it work.

Thanks



You might want to read this article here especially the part about AMD ditching support for older boards.

I was not successful in getting the proprietary amdgpu-pro-19.10-782345-ubuntu-18.04.tar to work under 19.04 minimal server. I didn't pay much attention to the filename and assumed the 19.10 was ubuntu but actually it was just the AMD version. The driver only worked on 18. I did get it to work with five RX560s on risers under 18.04 minimal desktop.

Since your board came out in 2012 I would think the open source "amdgpu" that comes with all distributions would work instead of the "pro".

That being said, you might want to just upgrade to Windows 10 in the meantime. At least that would get you a newer GUI that IMHO is superior to 7, unlike that fiasco 8. Micro$oft does update the antivirus and fix defects and security problems, unlike with Windows 7. If your 7 is legit, activated and "registered" with a Micro$oft account then you can still do the free upgrade.

I am no expert on Linux and AFAICT Linux still sucks, but...

314) Message boards : Questions and problems : Need delay in Ubuntu else client does not see GPUs. (Message 91600)
Posted 22 May 2019 by Profile Joseph Stateson
Post:
Wanted to update this thread with what was learned at the GPU forum and also at the AskUbuntu community

In the file /lib/systemd/system/boinc-client.service

replace

After=network-online.target

with

After=multi-user.target

That file might be in a slightly different location depending on flavor of Linux.
Only make the change if GPUs are not being recognized on first power up but are recognized if boinc is restarted.
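
The replacement above can be sketched with sed. This is only an illustration run against a scratch copy of the [Unit] section (copied from the unit file posted in the GPU forum thread). The temp file is a stand-in for the real unit file, so nothing on the system is touched.

```shell
# Work on a scratch copy so the sketch is safe to run anywhere.
unit="$(mktemp)"
cat > "$unit" <<'EOF'
[Unit]
Description=Berkeley Open Infrastructure Network Computing Client
Documentation=man:boinc(1)
After=network-online.target
EOF

# Swap the ordering dependency; -i.bak keeps the original as a backup file.
sed -i.bak 's/^After=network-online\.target$/After=multi-user.target/' "$unit"

# Show the resulting ordering line.
grep '^After=' "$unit"
```

On a real system, point sed at the actual unit file using sudo, then run sudo systemctl daemon-reload so systemd rereads it.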
315) Message boards : Questions and problems : Manager opens when restarting computer (Message 91599)
Posted 22 May 2019 by Profile Joseph Stateson
Post:
... depending (I am guessing) on whether the boinctray program is running...
Wrong guess.

boinctray.exe is the 'user active' (mouse, keyboard, or sundry HID activity) detection utility. It has no visible interface components.


What I have seen, and is confusing, is sometimes BM is in the "hidden" area which I call the "tray" where I have to click on the ^ symbol before I can select it, and sometimes it is adjacent to that ^ or out in the open where I can see its icon. I think now that behavior is a function of how many icons can fit in that area. I was guessing that boinctray decided if it got hidden in that ^ area or just left adjacent to it.

I have never seen BM go desktop when rebooting. Debi0662 probably has it in some type of "windows startup" location like you mentioned.

If boinc was uninstalled and the system rebooted I suspect an error message will show up "cant find boinc" or something like that, but one would have to have some debugging ability to make use of that info to find where it was being run from. That would probably lead to editing the registry which can cause real problems, not just inconvenience. Probably better just to put up with the inconvenience.
316) Message boards : Projects : Enigma@Home Requires Invitation (Message 91575)
Posted 21 May 2019 by Profile Joseph Stateson
Post:
Strange - their account creation page says an invite is needed but the same sentence says not to use the form!

http://www.enigmaathome.net/create_account_form.php

I am a member but it has been a long time and I do not remember if I had to get an invite.

There is a way to double-check.

If you are a member of boincstats then you can go to their "sign up for projects" page and see if there is a space for invites as shown here
317) Message boards : Questions and problems : Manager opens when restarting computer (Message 91573)
Posted 21 May 2019 by Profile Joseph Stateson
Post:
I do not use BM and delete it from the registry after any install. With several BOINC systems, BoincTasks is more convenient than BM.

When BM starts there is a switch "-s" that tells it to start minimized and / or in the tray depending (I am guessing) on whether the boinctray program is running.

Try the following.
Bring up BM and exit so it is no longer running.

Bring up a CMD prompt and navigate to Program Files (if 64 bit) or Program Files (x86) if 32 bit and find boinc

then execute the manager as follows (boinc is on my D drive)

 Directory of D:\ProgramFiles\Boinc

10/11/2018  03:54 AM         1,532,192 boinc.exe
10/11/2018  03:54 AM           641,824 boinccmd.exe
10/11/2018  03:54 AM         9,063,712 boincmgr.exe
10/10/2018  09:55 PM         2,161,152 boincscr.exe
10/11/2018  03:54 AM            16,672 boincsvcctrl.exe
10/11/2018  03:54 AM            69,920 boinctray.exe
09/27/2018  06:18 PM            36,235 boinc_logo_black.jpg
               7 File(s)     13,521,707 bytes
               0 Dir(s)  1,956,268,883,968 bytes free

D:\ProgramFiles\Boinc>boincmgr -s

This should launch BM and put it adjacent or maybe in the tray.

If you get full screen then someone else will have to help. If it is in or near the tray
then possibly the registry got changed. You can check the registry by typing
regedit in the command line
navigate to this location (look at the top line): HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Run
your image will be different than mine as I have BoincTasks.
You should have the boincmgr.exe and the -s argument.
adjacent to it should be boinctray.exe

There is another possibility but it is just a guess. I have seen on occasions a program go desktop because I had inadvertently registered for a notification. I rarely use BM (it is a desktop icon for my occasional use) and do not know if that is even a possibility.

An even less likely possibility is there is a duplicate startup. Make sure you have nothing in the folder as shown below.
C:\ProgramData\Microsoft\Windows\Start Menu\Programs\StartUp>dir
 Volume in drive C is OS
 Volume Serial Number is 9CCA-45E7

 Directory of C:\ProgramData\Microsoft\Windows\Start Menu\Programs\StartUp

09/15/2018  02:33 AM    <DIR>          .
09/15/2018  02:33 AM    <DIR>          ..
               0 File(s)              0 bytes
               2 Dir(s)  702,051,364,864 bytes free

C:\ProgramData\Microsoft\Windows\Start Menu\Programs\StartUp>



318) Message boards : GPUs : Is there a way in Linux Mint 19.1 to delay boinc start at boot? (Message 91568)
Posted 20 May 2019 by Profile Joseph Stateson
Post:
Needed to add
TimeoutSec=infinity

else time out default is 1min 30sec which is too short

I had actually used 300 not the 60 I wrote earlier. The 60 would not have triggered the timeout but it was not long enough for GPU detection.

This did work fine
1			5/20/2019 10:22:37 AM	Starting BOINC client version 7.9.3 for x86_64-pc-linux-gnu	
2			5/20/2019 10:22:37 AM	log flags: file_xfer, sched_ops, task	
3			5/20/2019 10:22:37 AM	Libraries: libcurl/7.58.0 OpenSSL/1.1.0g zlib/1.2.11 libidn2/2.0.4 libpsl/0.19.1 (+libidn2/2.0.4) nghttp2/1.30.0 librtmp/2.3	
4			5/20/2019 10:22:37 AM	Data directory: /var/lib/boinc-client	
5			5/20/2019 10:22:37 AM	OpenCL: AMD/ATI GPU 0: Radeon RX 560 Series (driver version 2841.4, device version OpenCL 1.2 AMD-APP (2841.4), 3812MB, 3812MB available, 2449 GFLOPS peak)	
6			5/20/2019 10:22:37 AM	OpenCL: AMD/ATI GPU 1: Radeon RX 560 Series (driver version 2841.4, device version OpenCL 1.2 AMD-APP (2841.4), 4059MB, 4059MB available, 2449 GFLOPS peak)	
7			5/20/2019 10:22:37 AM	OpenCL: AMD/ATI GPU 2: Radeon RX 560 Series (driver version 2841.4, device version OpenCL 1.2 AMD-APP (2841.4), 4059MB, 4059MB available, 2449 GFLOPS peak)	
8			5/20/2019 10:22:37 AM	OpenCL: AMD/ATI GPU 3: Radeon RX 560 Series (driver version 2841.4, device version OpenCL 1.2 AMD-APP (2841.4), 4059MB, 4059MB available, 2449 GFLOPS peak)	
9			5/20/2019 10:22:37 AM	OpenCL: AMD/ATI GPU 4: Radeon RX 560 Series (driver version 2841.4, device version OpenCL 1.2 AMD-APP (2841.4), 4059MB, 4059MB available, 2449 GFLOPS peak)	
10			5/20/2019 10:22:37 AM	[libc detection] gathered: 2.27, Ubuntu GLIBC 2.27-3ubuntu1	
11			5/20/2019 10:22:37 AM	Host name: rx560	
12			5/20/2019 10:22:37 AM	Processor: 2 GenuineIntel Intel(R) Celeron(R) CPU G1840 @ 2.80GHz [Family 6 Model 60 Stepping 3]	
13			5/20/2019 10:22:37 AM	Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmper	
14			5/20/2019 10:22:37 AM	OS: Linux Ubuntu: Ubuntu 18.04.2 LTS [4.18.0-20-generic|libc 2.27 (Ubuntu GLIBC 2.27-3ubuntu1)]	
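
For anyone wanting to reproduce the delay plus timeout combination described at the top of this post, here is a minimal sketch. It assumes a systemd drop-in is used instead of editing the packaged unit file, which is my assumption and not what was done above. The file name delay.conf is hypothetical, and the sketch writes into a scratch directory so it changes nothing on the system.

```shell
# Build the drop-in in a scratch directory. On a real system the directory
# would be /etc/systemd/system/boinc-client.service.d (root required).
dropin_dir="$(mktemp -d)/boinc-client.service.d"
mkdir -p "$dropin_dir"

cat > "$dropin_dir/delay.conf" <<'EOF'
[Service]
# Wait 300 seconds before starting the client so the GPUs can settle.
ExecStartPre=/bin/sleep 300
# The default start timeout (1min 30s) is shorter than the sleep above.
TimeoutSec=infinity
EOF

cat "$dropin_dir/delay.conf"
```

After installing such a file for real, sudo systemctl daemon-reload followed by sudo systemctl restart boinc-client would be needed for it to take effect.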
319) Message boards : GPUs : Is there a way in Linux Mint 19.1 to delay boinc start at boot? (Message 91567)
Posted 20 May 2019 by Profile Joseph Stateson
Post:
There is a similar query about delaying service startup on StackOverflow. See https://stackoverflow.com/questions/43001223/how-to-ensure-that-there-is-a-delay-before-a-service-is-started-in-systemd

There are a few suggestions in there but I think using a timer and disabling the service is probably the easier one for you.

It would be better if BOINC could do it as a config option as not all of us will want a delay and then there are the many different ways of doing it (just look at the message thread I linked to) and the different systemd/initd used by different flavours of Linux.

BOINC already has a start_delay config option, however that's after it's got going and before running tasks. Can we move that delay so that it occurs before the GPU detection? That way we don’t need another config option.



I looked at that thread and tried
ExecStartPre=/bin/sleep 60
but BOINC did not start. Found following error in syslog
May 20 07:30:16 rx560 systemd[1]: boinc-client.service: Start-pre operation timed out. Terminating.
May 20 07:30:16 rx560 systemd[1]: boinc-client.service: Failed with result 'timeout'.
May 20 07:30:16 rx560 systemd[1]: Failed to start Berkeley Open Infrastructure Network Computing Client.


Following was my mod:

[Unit]
Description=Berkeley Open Infrastructure Network Computing Client
Documentation=man:boinc(1)
After=network-online.target

[Service]
ProtectHome=true
Type=simple
Nice=10
User=boinc
WorkingDirectory=/var/lib/boinc
ExecStartPre=/bin/sleep 60
ExecStart=/usr/bin/boinc
ExecStop=/usr/bin/boinccmd --quit
ExecReload=/usr/bin/boinccmd --read_cc_config
ExecStopPost=/bin/rm -f lockfile
IOSchedulingClass=idle

[Install]
WantedBy=multi-user.target


I would have thought this would work. I verified that sleep was at /bin
jstateson@rx560:/bin$ ls -l sleep
-rwxr-xr-x 1 root root 35000 Jan 18  2018 sleep
jstateson@rx560:/bin$



Why didn't this work? Ownership? Attributes? I assume it terminated because it could not be executed??

[EDIT] I have 4 AMD S9x00 cards on a Win10x64 system and had to add a 5 minute delay in the task scheduler as recommended by Jord. Your suggestion about using the cc_config is the way to go. How would this get into the Ubuntu release? I have 7.9.3 and that is not even listed as a download at Berkeley. All I see there is 7.4.22. Is there even a 7.14.2 for Linux?
320) Message boards : GPUs : Is there a way in Linux Mint 19.1 to delay boinc start at boot? (Message 91562)
Posted 19 May 2019 by Profile Joseph Stateson
Post:
I see one possible point of confusion: BeemerBiker is showing us logs relating to an old, development version of BOINC (v7.9.3 - note the 'odd' middle integer), whereas Germano will be seeking to build onwards and upwards from the current release version 7.14.2


Yes, I posted about that earlier but it was an edit "afterthought" about 1/2 hour after the original post. Is there a way to get the latest? I use "apt-get" for everything.

quote"
Maybe I have an old distribution. I got 18.04 from here
https://www.ubuntu.com/download/desktop
all I did after getting it to work was
apt-get install boinc-client

have no idea where that came from but AFAICT it is not the latest and greatest
"
321) Message boards : GPUs : Is there a way in Linux Mint 19.1 to delay boinc start at boot? (Message 91559)
Posted 19 May 2019 by Profile Joseph Stateson
Post:
Don't see much useful in that file

I do see instead. I had to understand if that happened even with BOINC upstream systemd unit file.
Can you also show us the output of
# systemctl status boinc-client
# lsmod


http://stateson.net/images/boinc_info.txt

There is a message when I log in about packages that can be updated and security concerns. I am going to update.
 * Canonical Livepatch is available for installation.
   - Reduce system reboots and improve kernel security. Activate at:
     https://ubuntu.com/livepatch

0 packages can be updated.
0 updates are security updates.

Your Hardware Enablement Stack (HWE) is supported until April 2023.
Last login: Sun May 19 13:27:55 2019 from 192.168.1.212
jstateson@rx560:~$


not sure about livepatch
322) Message boards : GPUs : Is there a way in Linux Mint 19.1 to delay boinc start at boot? (Message 91557)
Posted 19 May 2019 by Profile Joseph Stateson
Post:
Don't see much useful in that file
323) Message boards : GPUs : Is there a way in Linux Mint 19.1 to delay boinc start at boot? (Message 91556)
Posted 19 May 2019 by Profile Joseph Stateson
Post:
I have been told that on Ubuntu systemd unit file should be under /lib/systemd/system
Please find boinc systemd unit file and paste it here


working on it. did find searching from root.
[sudo] password for jstateson:
root@rx560:/# find . -name "boinc-client.service" -print
./lib/systemd/system/boinc-client.service
./etc/systemd/system/multi-user.target.wants/boinc-client.service
find: ‘./run/user/1000/gvfs’: Permission denied
./sys/fs/cgroup/pids/system.slice/boinc-client.service
./sys/fs/cgroup/devices/system.slice/boinc-client.service
./sys/fs/cgroup/systemd/system.slice/boinc-client.service
./sys/fs/cgroup/unified/system.slice/boinc-client.service
./var/lib/systemd/deb-systemd-helper-enabled/multi-user.target.wants/boinc-client.service


===========here========
HOWEVER, THIS IS PROBABLY FROM THE RESTART
SHOULD I REBOOT TO GENERATE THE PROBLEM??
======================
root@rx560:/lib/systemd/system# ls -l boin*
-rw-r--r-- 1 root root 418 May 22  2018 boinc-client.service
root@rx560:/lib/systemd/system# cat boinc-client.service
[Unit]
Description=Berkeley Open Infrastructure Network Computing Client
Documentation=man:boinc(1)
After=network-online.target

[Service]
ProtectHome=true
Type=simple
Nice=10
User=boinc
WorkingDirectory=/var/lib/boinc
ExecStart=/usr/bin/boinc
ExecStop=/usr/bin/boinccmd --quit
ExecReload=/usr/bin/boinccmd --read_cc_config
ExecStopPost=/bin/rm -f lockfile
IOSchedulingClass=idle

[Install]
WantedBy=multi-user.target
root@rx560:/lib/systemd/system#
324) Message boards : GPUs : Is there a way in Linux Mint 19.1 to delay boinc start at boot? (Message 91554)
Posted 19 May 2019 by Profile Joseph Stateson
Post:
BeemerBiker, could you please show us the output of
# cat /usr/lib/systemd/system/boinc-client.service


Something wrong with that, there is no "system" nor any reference to boinc

jstateson@rx560:/usr/lib$ cd systemd
jstateson@rx560:/usr/lib/systemd$ ls
boot           system-environment-generators  user-environment-generators
catalog        tests                          user-generators
logind.conf.d  user                           user-preset
jstateson@rx560:/usr/lib/systemd$ find . -name "*boinc*" -print
jstateson@rx560:/usr/lib/systemd$


[EDIT] there are refs to boinc in syslog, will that do??

syslog:May 19 11:33:17 rx560 boinc[1636]: 19-May-2019 11:33:17 [---] Exiting
syslog:May 19 11:33:54 rx560 boinc[838]: 19-May-2019 11:33:54 [---] Starting BOINC client version 7.9.3 for x86_64-pc-linux-gnu
syslog:May 19 11:33:54 rx560 boinc[838]: 19-May-2019 11:33:54 [---] log flags: file_xfer, sched_ops, task
syslog:May 19 11:33:54 rx560 boinc[838]: 19-May-2019 11:33:54 [---] Libraries: libcurl/7.58.0 OpenSSL/1.1.0g zlib/1.2.11 libidn2/2.0.4 libpsl/0.19.1 (+libidn2/2.0.4) nghttp2/1.30.0 librtmp/2.3
syslog:May 19 11:33:54 rx560 boinc[838]: 19-May-2019 11:33:54 [---] Data directory: /var/lib/boinc-client
syslog:May 19 11:33:55 rx560 boinc[838]: 19-May-2019 11:33:55 [---] No usable GPUs found
syslog:May 19 11:33:55 rx560 boinc[838]: 19-May-2019 11:33:55 [---] app version refers to missing GPU type ATI
syslog:May 19 11:33:55 rx560 boinc[838]: 19-May-2019 11:33:55 [SETI@home] Application uses missing ATI GPU
syslog:May 19 11:33:55 rx560 boinc[838]: 19-May-2019 11:33:55 [---] app version refers to missing GPU type ATI
syslog:May 19 11:33:55 rx560 boinc[838]: 19-May-2019 11:33:55 [SETI@home] Application uses missing ATI GPU
syslog:May 19 11:33:55 rx560 boinc[838]: 19-May-2019 11:33:55 [---] app version refers to missing GPU type ATI
syslog:May 19 11:33:55 rx560 boinc[838]: 19-May-2019 11:33:55 [SETI@home] Application uses missing ATI GPU
syslog:May 19 11:33:55 rx560 boinc[838]: 19-May-2019 11:33:55 [SETI@home] Missing coprocessor for task blc25_2bit_guppi_58340_41306_GJ876_0038.19434.409.19.28.102.vlar_0
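
A quick way to tell from a log whether the client saw the GPUs at startup is to look for the "No usable GPUs found" line versus the "OpenCL: ... GPU n" lines. Below is a minimal sketch. The function name gpu_status is made up for illustration, and the sample input is two lines copied from the syslog excerpt above.

```shell
# gpu_status FILE: report whether a BOINC startup log shows detected GPUs.
gpu_status() {
    if grep -q 'No usable GPUs found' "$1"; then
        echo "no GPUs detected"
    elif grep -q 'OpenCL: .* GPU [0-9]' "$1"; then
        echo "GPUs detected: $(grep -c 'OpenCL: .* GPU [0-9]' "$1")"
    else
        echo "no GPU lines in log"
    fi
}

# Sample input taken from the syslog lines in this post.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
May 19 11:33:54 rx560 boinc[838]: 19-May-2019 11:33:54 [---] Data directory: /var/lib/boinc-client
May 19 11:33:55 rx560 boinc[838]: 19-May-2019 11:33:55 [---] No usable GPUs found
EOF

gpu_status "$sample"
```

On the real machine this could be pointed at /var/log/syslog. Note that a syslog covering several client starts will report "no GPUs detected" if any one of those starts failed, so it is a rough check only.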


Maybe I have an old distribution. I got 18.04 from here
https://www.ubuntu.com/download/desktop
all I did after getting it to work was
apt-get install boinc-client

have no idea where that came from but AFAICT it is not the latest and greatest.
325) Message boards : GPUs : Is there a way in Linux Mint 19.1 to delay boinc start at boot? (Message 91552)
Posted 19 May 2019 by Profile Joseph Stateson
Post:
I have run into the same problem with five RX560s and 18.04 Minimal Desktop. Googling, I see where boinc-client can run behind all other scripts. Tried it:

Renamed all boinc_client in Rc0...Rc6 from "01" to "91" which forces it as the last script to run but that did not fix the problem.

I can put a sleep 300 or so into the boinc_client script but I think that blocks. Since this is the last script anyway, maybe that is ok?

There are five RX560 and it seems I need about 4-5 minutes or so to allow 18.04 to stabilize.

I did not see an "autostart" at ./configure so I am not running that version with the autostart.

I am guessing I could always add a script that runs in background

MyScript &

where MyScript is actually the boinc-client with the sleep 300 delay.

I would need to pass arguments too but I don't want the delay when running from command line. This gets confusing and I am no expert on Linux.

amdgpu-pro-19.10-785425-ubuntu-18.04.tar
on
ubuntu-18.04.2-desktop-amd64.iso
with RX560s

[EDIT] Just thought more about this. I would have to move boinc-client out of init.d into some /bin
In init.d I need to add a script that starts another script in background. That other script would do the delay and then run the real boinc-client.

As usual I would have to debug ownership and access properties for all scripts which is a PITA.
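
The background wrapper idea described in this post can be sketched as below. This is only an illustration of the "sleep, then launch, without blocking" pattern. The function name delayed_start is made up, and a harmless echo stands in for the real client so the sketch can be run anywhere.

```shell
# delayed_start SECONDS COMMAND [ARGS...]
# Sleeps in a background subshell, then runs COMMAND, so the caller
# (for example an init script) is not blocked during the delay.
delayed_start() {
    delay="$1"
    shift
    ( sleep "$delay" && "$@" ) &
}

# Demo with a 1-second delay; the post suggests about 300 seconds.
out="$(mktemp)"
delayed_start 1 sh -c "echo started > '$out'"
echo "launcher returned immediately"

wait            # only needed here so the demo can show the result
cat "$out"
```

Because the delay is an argument, the same function could be called with 0 from the command line, which avoids the unwanted wait mentioned above.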
326) Message boards : The Lounge : Microsoft making it easier to switch to Linux or Mac (Message 91551)
Posted 19 May 2019 by Profile Joseph Stateson
Post:
Last Tuesday's major update from Micro$oft, KB449441, seems to have enabled rollover advertising. It took effect Friday morning when I discovered my 24/7 system had rebooted. First thing I noticed was Edge had a really different background and the settings are different, with "advanced setting" not listed (it is an icon now).

The big push to Linux started when I tried looking for a fix for a problem I had with the AMD forum registration. When I moved the mouse to click on the topic I wanted I happened to move the mouse over three advertisements. Each of those advertisements posted a full page on the tab line of Edge. When I went to close those windows I found those 3 had posted a second time as I had moved the mouse over the ads on the way to close those tabs. I got with Microsoft support, who advised me to repair Edge. That didn't work, but replacing my "host" table with a list of bad guys seems to have fixed the problem, although I do see a lot of white space.

I finally found out the problem with the AMD forum registration. They require you to actually read their "limitation" but don't tell you it has to be done. Even the Microsoft support MVP failed to realize the problem and fell for the trap as I did.
327) Message boards : Questions and problems : Need delay in Ubuntu else client does not see GPUs. (Message 91549)
Posted 19 May 2019 by Profile Joseph Stateson
Post:
Renamed all boinc_client in Rc0...Rc6 from "01" to "91" which forces it as the last script to run but that did not fix the problem

I can put a sleep 300 or so into the boinc_client script but I think that blocks. Since this is the last script anyway, maybe that is ok?

There are five RX560 and it seems I need about 4-5 minutes or so to allow 18.04 to stabilize.

I did not see an "autostart" at ./configure so I am not running that version with the autostart.

Jord's task scheduler worked fine in windows on my windows system but this is Linux and I am no expert on Linux.
328) Message boards : Questions and problems : Manager opens when restarting computer (Message 91548)
Posted 19 May 2019 by Profile Joseph Stateson
Post:
If this is caused by windows re-opening applications (such as BM) you can stop that behavior as explained here
329) Message boards : Questions and problems : want to report some success with Ubuntu 18.04 and AMD's RX560 (Message 91546)
Posted 19 May 2019 by Profile Joseph Stateson
Post:
I made another stab at getting 18.04 working and was successful only with the minimal Desktop. The server consistently gave dependency errors when using the AMD approved
amdgpu-pro-19.10-785425-ubuntu-18.04.tar
on
ubuntu-18.04.2-desktop-amd64.iso
using rufus to create a USB3 flash install

Tried twice to do full install with 3rd party driver. Both attempts failed and install package made a report to developers.

I then did a minimal desktop but still with 3rd party drivers. This worked but...
"apt-get update" hung at "gdm3" but ctrl-c allowed the update to continue.
I then did the same update again (after a reboot) but this time it got past "gdm3" whatever that is.
I am guessing that something was out of order in the update and whatever was missing got put in
and that is why the gdm3 completed the second time.
I then did
apt-get install build-essential dpkg
as I read somewhere that both were needed.
I then put in openssh and tftpd for remote access
I untarred amdgpu-pro-19.10-785425-ubuntu-18.04.tar
and the install script worked perfectly unlike when I tried the minimal server.
boinc installed just fine but I need to put in a delay as boinc does not see the
five gpus on initial boot.
Somewhere in the above, I had to put in a "sudo dpkg --configure -a"
but I was asked to do that which was nice. I think it was after the ctrl-c
330) Message boards : Questions and problems : AMD driver discovery problem revisited (Message 91543)
Posted 18 May 2019 by Profile Joseph Stateson
Post:
This system has the gpedit app, unlike Windows "Home", so I will give another try to blocking updates.
Item 3 here says to enable update configuration (I had it disabled, which had no effect). I suspect it won't work as the suggestion is dated 2015, but maybe??

My modem, an Arris BCW210, does not allow blocking, and my edge router will require a cable run to the problem system, which is a PITA.
Putting "127.0.0.1 microsoft.com" into the host table will block browsers but I suspect Windows bypasses that.
331) Message boards : Questions and problems : AMD driver discovery problem revisited (Message 91537)
Posted 18 May 2019 by Profile Joseph Stateson
Post:
Do not know how to get windows to stop updating.


I have suggested this on another forum but no one who uses windows has commented on my suggestion. How about blocking the M$ domain(s) in your router?


Yes, could do it, but for specific systems. I have a Ubiquiti edge router that has IP cameras. I could not figure out how to stop them from phoning home so I put them on a subnet by themselves. It didn't stop them from phoning home but they can't access my network, which was a security concern. I could move the problem systems to the subnet but do not know how to block a domain. May ask at the Ubiquiti forum. I am not a net expert.
332) Message boards : Questions and problems : AMD driver discovery problem revisited (Message 91531)
Posted 17 May 2019 by Profile Joseph Stateson
Post:
What project or app is complaining of using the four year old opencl.dll? I believe almost all the projects are written to use OpenCL 1.2 and I don't know of any that require OpenCL 2.0.


Wanted to use newer opencl as old one possibly has bug. About 1 out of 5 work units are invalid on this driver (the 2015) but I have gotten newer drivers to run w/o any invalids. Very difficult to repeat as after a microsoft update the opencl seems to have been removed and I have trouble re-installing the one that gave no invalid errors.

Virtually every time a major release reboots I lose the working opencl driver. I sometimes see a notification that AMD has restored something (can't find out what, as it appears only momentarily in the bottom right corner of windows 10) and I end up doing a clean install of, usually, the 2018Q4 driver to fix the problem. This just happened with the Tuesday release this week, but I cannot seem to get that driver to work again. It used to do a work unit in an average of 10 seconds (4 boards) with no invalids but now I have 200+ invalid results with 8500 valid ones. I am guessing the 2018 driver (opencl) has something in it that works better with milkyway but windows keeps rebooting and AMD then changes things.

I did go to Task Scheduler and stopped the AMD updater from doing its thing. Do not know how to get windows to stop updating.
I would use ubuntu but had problem with the latest AMD ubuntu install.

[EDIT] Going to try to get system to work with no invalids. This particular system has unusual boards, three S9000 and one S9100, which is not common. However, AMD does support them in 2019server and I tried that install (the 2019svr AMD update), which installed with no error, but BOINC did not see any GPU so I put the 2015 back in. I may give ubuntu a try but will avoid the more recent release and try to find one that matches the latest AMD driver. I think you run Linux, can you recommend a release that is known to work with the latest AMD drivers? This assumes the latest drivers still work with S9x00 cards.
333) Message boards : Questions and problems : AMD driver discovery problem revisited (Message 91529)
Posted 17 May 2019 by Profile Joseph Stateson
Post:
Follow-up on my previous post. It would be nice if one could add comments to an existing message instead of posting a new one.

I have my S9x00 boards working fine with AMD's 2015 driver and have been able to set the clock speed, whereas I could not do that with AMD's latest "Pro Series". However, the opencl is old, 2015, as shown by clinfo.exe. The 2015 driver is the one you get if you ask microsoft to get the latest.

I then downloaded and extracted AMD_OpenCL64.dll from both AMD's 2018Q4 and their 2019Q2 releases and put those at \windows\system32 and also at \windows\SysWOW64 replacing the 2015 opencl.dll

clinfo.exe showed I had the latest opencl but all milkyway tasks errored out on either of those two. I had suspended all but a few MW tasks as I did not want 800+ tasks to error out in a couple of seconds like they did a week ago. Looks like I am stuck with the correct driver but a 4 year old opencl library.

Maybe a developer could shed some light on this. How can I upgrade the opencl library but still keep video drivers that were designed for the board?
334) Message boards : Questions and problems : AMD driver discovery problem revisited (Message 91515)
Posted 16 May 2019 by Profile Joseph Stateson
Post:
Follow-up info, maybe this might be useful for debugging.

1. The file I listed, the working "coproc_info.xml", actually had some wrong information but it seems there was enough correct info for opencl to work. All boards were identified as S9100 with clock speed 900 but only the 3rd from the top was an S9100. The rest were S9000. That coproc_info.xml was created using the 10-10-2018 AMD driver, not the new one I am testing, the "19Q2".

2. The following Microsoft updates came in at 4am or so this morning



About 10am I had to shut down that system and it did not survive the reboot. All those updates were "uninstalled" as windows recovered to the last valid state. Unfortunately, that state was the one that had the problem of the BOINC client not seeing the GPUs.
I looked in windows\system32 and there was an opencl dated about 10-10-2018 so that seemed to match the 2018 driver.

I re-installed the AMD "19Q2" driver and noticed that the opencl.dll was now dated 5-6-2019, which is the date of the 19Q2 AMD release.
Boinc client started up and ran fine and I noticed that the coproc_info.xml file had all the correct info. All boards were correctly identified and the clock speeds were now correct. It seems that the 2019 AMD drivers do a better job of identifying their own boards. Everything seems to be working fine BUT BUT BUT the "windows 10 update center" has 3 installs (those same 3 above) pending a reboot.

I am going to try the following but it is just a guess that it will work.
1. Going to prevent boinc from starting automatically. I am guessing that boinc and/or the drivers crash and windows thinks there was a problem with the install, which caused a fallback.
2. If I can, I will install each update individually to see if I can figure out where the problem is. Not sure how to do this as they are already downloaded and "pending". Probably a script somewhere that I can edit to prevent the install.
3. Once the update is OK I will verify that opencl.dll is still in system32 and if the drivers look ok then I will start BOINC.

QUESTION FOR DEVELOPERS (or anyone): What happens if the system has GPUs from 3 different manufacturers, i.e. an Intel GPU on the mobo, an nVidia board and a Radeon board? Since the opencl.dll is in system32 and it seems to be supplied by the vendor, which one is the "better one"? Putting in an older nvidia driver might toss the newer and better AMD one, or vice versa.
335) Message boards : Questions and problems : AMD driver discovery problem revisited (Message 91479)
Posted 14 May 2019 by Profile Joseph Stateson
Post:
On a working system with this coproc_info file, I did a disk cleanup, removed all windows 10 restore points, and created a new restore point as I wanted to try a different AMD driver for the S9x00 cards I have.

I rebooted and brought up boinc to make sure all was ok before installing AMD's 19Q2 package.
Unaccountably, boinc did not see any video boards. That coproc_info file had the following:
    <coprocs>
<warning>No NVIDIA library found</warning>
<warning>calInit() returned 1</warning>
<warning>clGetPlatformIDs() failed to return any OpenCL platforms</warning>
    </coprocs>


I had made a copy of that coproc_info file before the cleanup so I put it back in and marked it Read Only so it would not be overridden. I restarted boinc and the GPU's were recognized.
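
A minimal sketch (Python, not part of BOINC) for pulling the warnings out of a coproc_info file so a bad detection run can be spotted before starting work. It assumes only the `<coprocs>` / `<warning>` layout shown above; the function names `coproc_warnings` and `opencl_missing` are made up for illustration, and a real file will hold more elements than this sample.

```python
import xml.etree.ElementTree as ET

# Sample content copied from the post above; a real file may hold more elements.
SAMPLE = """<coprocs>
<warning>No NVIDIA library found</warning>
<warning>calInit() returned 1</warning>
<warning>clGetPlatformIDs() failed to return any OpenCL platforms</warning>
</coprocs>"""


def coproc_warnings(xml_text):
    """Return the text of every <warning> element in the document."""
    root = ET.fromstring(xml_text)
    return [w.text.strip() for w in root.iter("warning") if w.text]


def opencl_missing(xml_text):
    """True when a warning reports that clGetPlatformIDs() found no platform."""
    return any("clGetPlatformIDs" in w for w in coproc_warnings(xml_text))


if __name__ == "__main__":
    for warning in coproc_warnings(SAMPLE):
        print(warning)
    print("OpenCL platform missing:", opencl_missing(SAMPLE))
```

To check a file on disk, read it with `open(path).read()` and pass the text to the same functions.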

This system does not have any nVidia boards so obviously (I would think) the call to find the library is going to fail. OR MAYBE SINCE I DID A CLEANUP SOME OLD VERSION OF THE LIBRARY GOT REMOVED AND BOINC IS CONFUSED AND DOES NOT LOOK FOR THE AMD STUFF.

This really needs to be fixed.

[EDIT] If I could build the client under VS2017 or m$oft's latest & greatest, possibly I could debug this problem. It might help if there was a "lightweight" windows-only version available that does not have the proverbial kitchen sink.
336) Message boards : Questions and problems : AMD drivers taking too long to load: How to delay BOINC startup? (Message 91429)
Posted 7 May 2019 by Profile Joseph Stateson
Post:
Installed 7.14 and so far seems to be working. I did not lose any projects using revo uninstaller.

The uninstall got rid of the coproc file which I had set read-only but after putting in 7.14 and rebooting it was written out correctly for exactly 4 GPUs, not 8 like the previous.

All GPUs were recognized. If I have that problem again, I will enable that task (in the task scheduler) and try again.
337) Message boards : Questions and problems : AMD drivers taking too long to load: How to delay BOINC startup? (Message 91428)
Posted 7 May 2019 by Profile Joseph Stateson
Post:
Thanks Jord

Still have problems but did make some progress.

Unaccountably, I could not use my regular sign-in. I logged out and back in to make sure I had the correct username / password, as some time ago I used netplwiz to automate login.

Never got past the error "the account could not be used" or something like that. After poking and googling I used "administrators" with my password and was able to launch boinc.

Did not work, same problem. Tried 5 minutes then 7. I noticed I was using that old 7.12 as I forgot to restore to 7.14

Will do that in a bit.

I was watching the autologin and never saw the screen show the "cmd prompt" that is normal when running boinc.exe
Could this be a session problem? Maybe the GPUs cannot start properly but the CPU tasks work ok. Could this be the "session 0" problem?

I set boinc to run after "log on" as I could not get the "at start" to use my account. After restoring to 7.14 I will try again. I will also disable the autologin and use a keyboard and monitor at the system instead of splashtop to access it.

[edit] Just when I thought things could not get any worse -- cannot uninstall 7.12.1, get error 2503 "Called RunScript when not marked in progress". Just looked up that error - indicates a permission problem.

Will have to run revo uninstaller. Hope projects are not deleted.
338) Message boards : Questions and problems : AMD drivers taking too long to load: How to delay BOINC startup? (Message 91410)
Posted 7 May 2019 by Profile Joseph Stateson
Post:
Looks like I have a problem loading AMD drivers on systems with multiple GPUs on risers.

Boinc does not see any GPUs and works only if I stop and restart it. Sometimes it works ok and sees all of them, other times none and I do a stop/start. At least it does not hang like GPU-Z.

GPU-Z also has problems: I cannot configure GPU-Z to start automatically with windows. It has to be started after windows has "stabilized" (for lack of a better word).

Read WiKi, there is a delay parameter in cc_config but it delays starting the projects, not the client
Read THIS over at WCG but I don't think that works on windows 10 as I recall a problem with GPUs and remote access when running boinc as a service.

Is that still the case?

Also read this (last item)
https://www.thewindowsclub.com/set-delay-time-startup-programs-windows

QUOTE: If something runs as a service, just change it to delayed startup. If something runs from a startup group or registry key, just create a new scheduled task to start at login, set the delay to a minute or 2, and delete the original startup shortcut/run reg key. No need for more 3rd party bloat.

I think I can figure this out but was wondering if there is a walkthrough at the WiKi for boinc for something like the above.
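
The quoted advice boils down to "wait a while after login, then launch". A minimal Python sketch of that idea follows; it is not a BOINC feature, and `delayed_start` with its `sleep` and `launch` parameters is a hypothetical helper written only for illustration. The command line is the one I use from the registry (given at the end of this post) and the path is specific to my machine. The 120 second delay is just the "minute or 2" from the quote, not a tested value.

```python
import subprocess
import time

# Command line from the registry entry in this post; the path is machine-specific.
BOINC_CMD = [r"D:\ProgramFiles\Boinc\boinc.exe", "--detach", "--allow_remote_gui_rpc"]


def delayed_start(cmd, delay_seconds, sleep=time.sleep, launch=subprocess.Popen):
    """Wait delay_seconds, then start cmd.

    sleep and launch are injectable so the logic can be dry-run
    without waiting or starting a real process.
    """
    sleep(delay_seconds)
    return launch(cmd)


if __name__ == "__main__":
    # Dry run: print what would happen instead of sleeping or launching.
    delayed_start(
        BOINC_CMD,
        120,
        sleep=lambda s: print(f"would wait {s} s"),
        launch=lambda c: print("would run:", " ".join(c)),
    )
```

Calling `delayed_start(BOINC_CMD, 120)` with the defaults would do the real wait and launch; a scheduled task with a built-in delay does the same job without any script.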

[edit] Want to mention that the client occasionally reports twice as many AMD GPUs as I actually have, but I can fix this by editing that coproc_info file. This problem, counting drivers twice, really needs to be fixed. Maybe this is contributing to the problem even though I have "fixed" that file so it cannot be re-written incorrectly.

[edit again]
Forgot to mention that I am not using BM but start boinc directly from the registry using
"D:\ProgramFiles\Boinc\boinc.exe" --detach --allow_remote_gui_rpc
339) Message boards : BOINC client : What does busy mean in work_fetch event log entry, also rr_sim doing something strange with time slice. (Message 91389)
Posted 5 May 2019 by Profile Joseph Stateson
Post:
I cannot help with the sources, maybe one of the developers can.

However, I have used 720 minutes myself for various projects: srbase, WCG.

Unless I am mistaken, you have a non-gpu system and are currently working only WCG and Rosetta.

What problem are you seeing?
340) Message boards : BOINC client : 7.14.2 and 7.12.1 both fail to get work units on very fast systems (Message 91388)
Posted 5 May 2019 by Profile Joseph Stateson
Post:
Limit for tasks in progress is 200 per GPU plus some (40 if i recall correctly) per CPU core.

But I'm not interested in such workarounds. I already found my own some time ago.
Just a very simple script which forces an update of the project every 2 min via boinccmd.
:start
timeout 120
boinccmd.exe --project milkyway.cs.rpi.edu/milkyway update
goto start

Works very well for me: every few updates the client has nothing to report (no completed tasks yet) and thus the request to get new work succeeds. And the cache never runs dry while this script is working.

But it is just a workaround, not a fix. I want to find the root of the problem and pass it to the BOINC devs so they can smash this bug for good.


I have too many tasks completing all the time. Running that loop would generate "Last request too recent". It would eventually work right near when the queue was empty, which is actually an improvement over my method that waits until all tasks are complete.

I don't think this "bug" will get fixed anytime soon. Lot of developers are grad students or volunteers. At one time MW sent tasks to GPUs that did not have double precision hardware and there would be 1000's of errored out tasks as they kept sending more work units. I suspect that some users and staff on other projects might consider this bug to actually be a feature as it allows other projects to get some work.

I have systems I need to access remotely so it is convenient to be able to issue the update across the internet which I now have the tools to do.

Here is before and after image of improvement. However, I will try putting my program in a loop like yours and see if I can cut the idle time down more.
341) Message boards : BOINC client : 7.14.2 and 7.12.1 both fail to get work units on very fast systems (Message 91362)
Posted 3 May 2019 by Profile Joseph Stateson
Post:
I have got some spare time to test. And now can confirm that problem with getting work from MW happens only with combined requests (report completed tasks + request new).
And fast machines and/or short WUs are affected by this: if the machine is fast enough or the WU short enough, almost every request will include a report of completed WUs, and fail due to it.

I did not find any BOINC option to suspend reporting of completed tasks, so I did it manually. I suspended all tasks close to 100%, forcing BOINC to start new ones. This way I managed to get about half an hour of work without a single task completed (and thus no reported tasks in requests either). And during this time ALL requests to get new work were successful:


I tried a few things after reading your post. I discovered that if I suspend a single task that the client no longer asks for data after an upload. However, that dead ended as the task had to be un-suspended and I was down to having to do an update after that one task reported. However, I did find an expedient method to get Milkyway work units but it requires BoincTasks.

BoincTasks can determine when a project has zero time left. If I wait a minimum of 91 seconds after the empty project signal and issue an update request then the project should provide more work. I ran a test of the basic premise and it seems to work. Unfortunately, the BoincTasks program has to go on the computer that is running milkyway as an update cannot be issued to a remote system, although I am putting a tool together to do that.

The following works:

On the system running BoincTasks I created the batch file "d:\runtest.bat" and that file contained the following commands

D:\ProgramFiles\boinc\boinccmd.exe --project "http://milkyway.cs.rpi.edu/milkyway/" update
time /T >> d:\results.txt


In BoincTasks I created the following rule
On project TimeLeft < 1 then wait 1 minute 45 seconds before running d:\runtest.bat
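
A rough Python sketch of the timing behind that rule, for anyone who wants to script it without BoincTasks. The 91 second minimum and the 1 minute 45 second wait are the figures above; the boinccmd path and project URL are the ones from runtest.bat. The names `update_due` and `update_command` are hypothetical, and the sketch only builds the command, it does not run it.

```python
from datetime import datetime, timedelta

PROJECT_DELAY = 91  # seconds; the minimum wait mentioned above
RULE_WAIT = 105     # seconds; the 1 minute 45 seconds used in the rule

# Path and project URL are the ones from the batch file above.
BOINCCMD = r"D:\ProgramFiles\boinc\boinccmd.exe"
PROJECT_URL = "http://milkyway.cs.rpi.edu/milkyway/"


def update_due(queue_empty_at, wait_seconds=RULE_WAIT):
    """Return the earliest time to issue an update after the queue empties."""
    if wait_seconds < PROJECT_DELAY:
        raise ValueError("wait is shorter than the project's requested delay")
    return queue_empty_at + timedelta(seconds=wait_seconds)


def update_command():
    """Build (but do not run) the boinccmd call used in runtest.bat."""
    return [BOINCCMD, "--project", PROJECT_URL, "update"]


if __name__ == "__main__":
    empty = datetime(2019, 5, 3, 7, 32, 35)  # example timestamp only
    print("update due at", update_due(empty).time())
    print(" ".join(update_command()))
```

Waiting a little longer than 91 seconds leaves a margin so the request is not rejected as too recent.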

This worked fine and I found the following in the rule log
03 May 2019 - 00:20:54 Rule(s) ---- Active: 1
03 May 2019 - 00:20:54 Rule: MWempty ---- z400-4-s9x00, Milkyway@Home, 1.46 Milkyway@home Separation (opencl_ati_101),  | Time Left Project < 00d,00:00:01
03 May 2019 - 00:20:54 ============================================================================== ---- 
03 May 2019 - 02:00:41 Rule: MWempty, trigger ---- z400-4-s9x00, Milkyway@Home, (Time left project <00d,00:00:00), 
03 May 2019 - 02:00:41 Rule: MWempty ---- Program executed ok: d:\runtest.bat ()
03 May 2019 - 02:14:42 Rule: MWempty ---- No longer active: z400-4-s9x00, Milkyway@Home, (Time left project <00d,00:00:00)
03 May 2019 - 04:48:29 Rule: MWempty, trigger ---- z400-4-s9x00, Milkyway@Home, (Time left project <00d,00:00:00), 
03 May 2019 - 04:48:29 Rule: MWempty ---- Program executed ok: d:\runtest.bat ()
03 May 2019 - 05:02:29 Rule: MWempty ---- No longer active: z400-4-s9x00, Milkyway@Home, (Time left project <00d,00:00:00)
03 May 2019 - 07:35:21 Rule: MWempty, trigger ---- z400-4-s9x00, Milkyway@Home, (Time left project <00d,00:00:00), 
03 May 2019 - 07:35:21 Rule: MWempty ---- Program executed ok: d:\runtest.bat ()
03 May 2019 - 07:39:22 Rule: MWempty ---- No longer active: z400-4-s9x00, Milkyway@Home, (Time left project <00d,00:00:00)


The event log for my system showed the request to do the update.
190	Milkyway@Home	5/3/2019 2:00:41 AM	update requested by user	
191	Milkyway@Home	5/3/2019 2:00:45 AM	Sending scheduler request: Requested by user.	
---
205	Milkyway@Home	5/3/2019 4:48:29 AM	update requested by user	
206	Milkyway@Home	5/3/2019 4:48:33 AM	Sending scheduler request: Requested by user.	
---
311	Milkyway@Home	5/3/2019 7:35:21 AM	update requested by user	
312	Milkyway@Home	5/3/2019 7:35:23 AM	Sending scheduler request: Requested by user.	


However, the update occurred on my PC that has BoincTasks not on the system that had run out of data, but at least this shows the concept works.
Looking at the 7:35 time (logs do not go back far enough for the other two) on the offending system I see the following
78694	Milkyway@Home	5/3/2019 7:32:33 AM	Reporting 13 completed tasks	
78695	Milkyway@Home	5/3/2019 7:32:33 AM	Requesting new tasks for AMD/ATI GPU	
78696	Milkyway@Home	5/3/2019 7:32:35 AM	Scheduler request completed: got 0 new tasks	
78697	Milkyway@Home	5/3/2019 7:38:11 AM	Sending scheduler request: To fetch work.	
78698	Milkyway@Home	5/3/2019 7:38:11 AM	Requesting new tasks for AMD/ATI GPU	
78699	Milkyway@Home	5/3/2019 7:38:16 AM	Scheduler request completed: got 602 new tasks	
…
78720	Milkyway@Home	5/3/2019 7:39:51 AM	Sending scheduler request: To fetch work.	
78721	Milkyway@Home	5/3/2019 7:39:51 AM	Requesting new tasks for AMD/ATI GPU	
78722	Milkyway@Home	5/3/2019 7:39:55 AM	Scheduler request completed: got 298 new tasks	


There was only a 5:43 delay from getting "0" to getting "602" tasks, and about a minute later the additional 298 came in. This system has 4 GPUs so I expect 200*4 = 800, but the close to 900 is ok.
I have a program HERE that is capable of issuing an update request across the internet and I will mod it to run from that batch file.
However, all that will get me on the 7:32:35 empty signal is to reduce the idle time to about 2 or 3 minutes. This particular idle gap is only 5:43 and I have seen others much longer.
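
The gap and the task totals can be worked out from the event log lines directly. A small Python sketch follows, assuming the "date time AM/PM message" layout of the lines above; the function names are made up for illustration, and the gap is measured from the first line (the 7:32:33 report) to the first reply that actually delivered tasks.

```python
import re
from datetime import datetime

# Lines copied from the event log above (whitespace-separated here).
LOG = """78694 Milkyway@Home 5/3/2019 7:32:33 AM Reporting 13 completed tasks
78695 Milkyway@Home 5/3/2019 7:32:33 AM Requesting new tasks for AMD/ATI GPU
78696 Milkyway@Home 5/3/2019 7:32:35 AM Scheduler request completed: got 0 new tasks
78697 Milkyway@Home 5/3/2019 7:38:11 AM Sending scheduler request: To fetch work.
78698 Milkyway@Home 5/3/2019 7:38:11 AM Requesting new tasks for AMD/ATI GPU
78699 Milkyway@Home 5/3/2019 7:38:16 AM Scheduler request completed: got 602 new tasks
78720 Milkyway@Home 5/3/2019 7:39:51 AM Sending scheduler request: To fetch work.
78721 Milkyway@Home 5/3/2019 7:39:51 AM Requesting new tasks for AMD/ATI GPU
78722 Milkyway@Home 5/3/2019 7:39:55 AM Scheduler request completed: got 298 new tasks"""

STAMP = re.compile(r"(\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M)\s+(.*)")
GOT = re.compile(r"got (\d+) new tasks")


def parse(log):
    """Return a list of (timestamp, message) pairs from event log text."""
    events = []
    for line in log.splitlines():
        m = STAMP.search(line)
        if m:
            when = datetime.strptime(m.group(1), "%m/%d/%Y %I:%M:%S %p")
            events.append((when, m.group(2).strip()))
    return events


def idle_gap(events):
    """Seconds from the first event to the first reply with more than 0 tasks."""
    start = events[0][0]
    for when, msg in events:
        m = GOT.search(msg)
        if m and int(m.group(1)) > 0:
            return (when - start).total_seconds()
    return None


def total_new_tasks(events):
    """Sum of all 'got N new tasks' counts."""
    return sum(int(n) for _, msg in events for n in GOT.findall(msg))


if __name__ == "__main__":
    events = parse(LOG)
    gap = int(idle_gap(events))
    print(f"idle gap: {gap // 60}:{gap % 60:02d}")
    print("tasks received:", total_new_tasks(events))
```

For these lines the gap comes to 343 seconds (5:43) and the two replies add up to 900 tasks.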

This graph shows the results. The 7:32:35 idle is the one that goes up to about 9 minutes. That is because completion time on this system is 3 minutes (5 concurrent tasks) so 3 + 5:43 is almost 9 minutes. Note the almost 25 minute delay about 9 hours back. When I implement this scheme the idle times will drop but will still be present.
342) Message boards : Questions and problems : Queue size (preferences) not being used on SETI - why? (Message 91353)
Posted 2 May 2019 by Profile Joseph Stateson
Post:
My Bad - I set priority to 0 (zero) which caused the problem in the first place
Discussion here
https://setiathome.berkeley.edu/forum_thread.php?id=84173&postid=1992318#1992318
343) Message boards : Questions and problems : Queue size (preferences) not being used on SETI - why? (Message 91348)
Posted 2 May 2019 by Profile Joseph Stateson
Post:
Noticed for some time that SETI had exactly 1 task running on each of the 3 GPUs. There is no queue depth. Since generally there are 100,000 or so at the project then something is wrong.

All my systems use account manager BAM! but it seems that preferences at BAM! are not used (they show .1 and .25), the same as the local client preference (according to BoincTasks)
I went to SETI and set preferences there for .25 daily queue with .50 additional (used to be .1 and .25) just to see what happened.

Did an update as that was required by the project and event queue reported

    3347 SETI@home 5/2/2019 10:53:52 AM update requested by user
    3348 SETI@home 5/2/2019 10:53:52 AM Sending scheduler request: Requested by user.
    3349 SETI@home 5/2/2019 10:53:52 AM Not requesting tasks: don't need (CPU: ; AMD/ATI GPU: )
    3350 SETI@home 5/2/2019 10:53:54 AM Scheduler request completed
    3351 SETI@home 5/2/2019 10:53:54 AM General prefs: from SETI@home (last modified 02-May-2019 10:53:54)
    3352 SETI@home 5/2/2019 10:53:54 AM Host location: none
    3353 SETI@home 5/2/2019 10:53:54 AM General prefs: using your defaults
    3354 5/2/2019 10:53:54 AM Reading preferences override file
    3355 5/2/2019 10:53:54 AM Preferences:
    3356 5/2/2019 10:53:54 AM max memory usage when active: 6139.56 MB
    3357 5/2/2019 10:53:54 AM max memory usage when idle: 11051.20 MB
    3358 5/2/2019 10:53:54 AM max disk usage: 116.17 GB
    3359 5/2/2019 10:53:54 AM max CPUs used: 20
    3360 5/2/2019 10:53:54 AM (to change preferences, visit a project web site or select Preferences in the Manager)



As far as I could tell, not only did the increase have no effect but I actually lost a work unit, as the update asked for data too soon, which caused (I am guessing) the project to ask for a backoff. So, an UPDATE needs to happen before the preferences get updated, and during an UPDATE the client asks for more data? Is this correct?

After 5-6 minutes (backoff is 300 seconds as I recall) I finally got an extra workunit and all three of my RX560 are busy.

However, what happened to the request for additional buffer? Are the project preferences being overridden by the general client? In any event, exactly 1 work unit for a GPU is a queue of exactly ZERO. Where is the original .1 day or the new .25?

What has control over preference? client? bam!? project?

Maybe this should be asked over at SETI??

[EDIT]
I just changed the local (client) preferences and read the following that indicates I need to go to the project (which I did earlier)

    jysdualxeon

    3603 SETI@home 5/2/2019 11:42:31 AM General prefs: from SETI@home (last modified 02-May-2019 10:53:55)
    3604 SETI@home 5/2/2019 11:42:31 AM Host location: none
    3605 SETI@home 5/2/2019 11:42:31 AM General prefs: using your defaults
    3606 5/2/2019 11:42:31 AM Reading preferences override file
    3607 5/2/2019 11:42:31 AM Preferences:
    3608 5/2/2019 11:42:31 AM max memory usage when active: 6139.56 MB
    3609 5/2/2019 11:42:31 AM max memory usage when idle: 11051.20 MB
    3610 5/2/2019 11:42:31 AM max disk usage: 116.17 GB
    3611 5/2/2019 11:42:31 AM max CPUs used: 20
    3612 5/2/2019 11:42:31 AM (to change preferences, visit a project web site or select Preferences in the Manager)



In any event, nothing happened though I did not lose a work unit because no update was actually done.


Here is an image from Boinc Manager (not BoincTasks). It shows only 2 tasks running, one GPU is idle and no queue size.

344) Message boards : Questions and problems : Multiple boinc clients are shown in "Startup Task" page (Message 91346)
Posted 2 May 2019 by Profile Joseph Stateson
Post:
You're over-engineering things. BOINC has its own control in the Options menu in BOINC Manager:

Other options --> Run Manager at login?

Uncheck that - job done.

Task Manager shouldn't be used or necessary unless things have got seriously screwed up. Some of those entries will be for supporting tools most people have never heard of, like boinctray.exe


Using BoincTasks as I have several systems. I do cut back # of systems during the summer months however.

This problem seems to have fixed itself. Since this was a new windows 10 system (I could not get AMD RX drivers to work with ubuntu 18 or 19) then, after a few hours, the first of several huge updates to windows came in and after the first reboot all those extra "start ups" disappeared. This was a dual xeon and in my first attempt to do that "free" upgrade to win10 I forgot that win10-Home does not handle more than one physical CPU so I had to repeat the upgrade to the "pro" version. This might have messed things up but I tried rebooting several times and those extra boinc "start ups" stayed in the task manager until that (maybe) "October" update got finally installed.

That free upgrade from 7 to 10 still works (thanks micro$oft!) but it creates the original windows 10 from a few years ago and there have been major updates.
345) Message boards : Questions and problems : Multiple boinc clients are shown in "Startup Task" page (Message 91336)
Posted 2 May 2019 by Profile Joseph Stateson
Post:
Not sure why this happens to me, but I just converted a Linux system to win7 then that free upgrade to win10 and immediately ran into my unusual GPU problem that seems no one but I have experienced.

1. Have 2 physical RX-560 but 4 show up in event log. I did a complete uninstall of AMD software, rebooted and back to two RX-560, system seems running fine: CPU tasks run OK and the two SETI tasks on the RX560s run ok.

I wanted to reboot to make some changes and did not want BOINC to start so I brought up the task manager startup page to disable boinc.

There were 3 listings for boinc. I brought up regedit, did not see a problem, so I then searched the registry for "boinc.exe" but found nothing except the one in RUN that is supposed to be there. I have that in all my windows systems.
"C:\ProgramFiles\Boinc\boinc.exe" --detach --allow_remote_gui_rpc
I have no idea why there are 3 entries in the startup. There is only one executable running.



I have rebooted since but still have those entries in the task manager startup. I even looked in that old "start menu" folder that older windows used but didn't see anything. I have no clue how this happened. System seems to be running OK.

[EDIT] Looked thru the event logs but saw nothing related to boinc except the successful install.
346) Message boards : GPUs : ubuntu 19.04 and AMD drivers (Message 91330)
Posted 1 May 2019 by Profile Joseph Stateson
Post:
Get:1 file:/var/opt/amdgpu-pro-local ./ amdgpu-core 19.10-785425 [2,416 B]


No graphics card here so I am probably missing something but I am impressed that there even is a driver for 19.10 when Ubuntu 19.10 doesn't come out till October!


Actually, I missed it: The 19.10 is the version of the AMD driver, not Ubuntu

https://askubuntu.com/questions/1139717/amd-latest-driver-18-04-install-on-19-04-min-server
347) Message boards : GPUs : ubuntu 19.04 and AMD drivers (Message 91325)
Posted 1 May 2019 by Profile Joseph Stateson
Post:
Hmm - tried just the opencl install as explained here
https://linuxconfig.org/install-opencl-for-the-amdgpu-open-source-drivers-on-debian-and-ubuntu

Didn't get too far. The AMD package for 19.10 said it could only be installed on 18.04 !!!

Get:1 file:/var/opt/amdgpu-pro-local ./ amdgpu-core 19.10-785425 [2,416 B]
(Reading database ... 100259 files and directories currently installed.)
Preparing to unpack .../amdgpu-core_19.10-785425_all.deb ...
ERROR: This package can only be installed on Ubuntu 18.04.


Should have checked out the available drivers before putting in the latest & greatest Ubuntu.
348) Message boards : GPUs : ubuntu 19.04 and AMD drivers (Message 91311)
Posted 1 May 2019 by Profile Joseph Stateson
Post:
I put ubuntu 19.04 on one of my systems then realized (after everything was working) that latest AMD drivers were for 18.30.

The AMD install fails with "broken packages" messages.

This was just a minimal server install. I assume the problem is 19 and not that I put in a minimal install.

Has anyone gotten those "18" AMD ubuntu drivers to work with 19? If not, then I will have to install 18.30

[EDIT] The video board, RX-560, works fine but BOINC sees no useable GPU so I assume I need that driver.
349) Message boards : BOINC client : 7.14.2 and 7.12.1 both fail to get work units on very fast systems (Message 91259)
Posted 29 Apr 2019 by Profile Joseph Stateson
Post:
Hope this information gets passed on to Eric or Tom. Since Jake is leaving the project shortly he won't be our project contact anymore.


This whole thing is just a storm in a teacup. One should be contributing to other projects that are just as deserving, whether there is a deficiency in scheduling or not. All my programs are at GitHub\BeemerBiker and can be built with VS2017. Hope someone finds them useful. I had a lot of fun writing them, especially the Spider Solitaire solver.
350) Message boards : BOINC client : 7.14.2 and 7.12.1 both fail to get work units on very fast systems (Message 91256)
Posted 29 Apr 2019 by Profile Joseph Stateson
Post:
With not much to do since I retired, I am having fun looking at this problem. I discovered the rpcClient library (thanks a lot!) and wrote a program to check the number of tasks remaining using a 2 minute timer and then automatically issue an update to my system on the next 2 minute tick. That guarantees (assuming not off-line) about a 4-6 minute maximum empty gap. Sure enough it worked about 2am, 6 hours ago exactly.
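
The timer logic can be sketched in a few lines of Python. This is only an illustration of the two-tick idea, not the actual program: `simulate` is a hypothetical name, and the task counts fed to it are invented sample data standing in for what the RPC library would report at each tick.

```python
TICK_SECONDS = 120  # the 2 minute timer


def simulate(task_counts):
    """Given the task count seen at each tick, return the tick indices
    at which an update is issued.

    An empty queue seen on one tick arms the update; the update itself
    goes out on the following tick.
    """
    updates = []
    pending = False
    for i, count in enumerate(task_counts):
        if pending:
            updates.append(i)
            pending = False
        elif count == 0:
            pending = True
    return updates


def worst_case_wait(tick_seconds=TICK_SECONDS):
    """Longest time from the queue emptying to the update being issued:
    just under one tick to notice, plus one full tick before updating."""
    return 2 * tick_seconds


if __name__ == "__main__":
    # Invented sample: queue drains at tick 2, update goes out at tick 3.
    print("update ticks:", simulate([5, 2, 0, 0, 600]))
    print("worst-case wait (s):", worst_case_wait())
```

Two ticks is 240 seconds at most before the update goes out; the time for the scheduler to answer and the tasks to arrive comes on top of that, which is where the 4-6 minute figure comes from.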

Picture of "count down" is HERE

Graph of last 8 hours is HERE

I didn't restart the program so there were two additional gaps of 13 minutes that occurred in the hours afterwards.
351) Message boards : BOINC client : 7.14.2 and 7.12.1 both fail to get work units on very fast systems (Message 91243)
Posted 28 Apr 2019 by Profile Joseph Stateson
Post:
Richard is the expert in decrypting the work fetch debug output. Everything appears normal for intervals. The work requests look normal. What I don't understand is why you are getting backoffs for 10 and 5 minutes directly after the scheduler acknowledges receipt of reported work. That is coming from the scheduler and not from your host or client. Normally the scheduler backs off if there are issues in contacting the servers or the client has issues downloading work and the client can't acknowledge correct reception of the sent tasks. Have you looked at the Transfers tab in the Manager after you have requested work and see if you have task downloads in backoff?
No need. Look at that event log again, without the clutter:

15402 Milkyway@Home 4/28/2019 10:12:30 AM Scheduler request completed: got 0 new tasks
15404 Milkyway@Home 4/28/2019 10:12:30 AM Project requested delay of 91 seconds
15416 Milkyway@Home 4/28/2019 10:12:30 AM [work_fetch] backing off AMD/ATI GPU 723 sec
The project requested 91 seconds. The backoff was done by the client, as a normal reaction to the lack of available work. And if no work was assigned by the server, there'll be no files to download, and nothing will show in the transfers tab.

Sorry, I've had a busy weekend showing visitors round Yorkshire. They've moved on to London now, but I found myself surprisingly tired (and I've got a watercooler appointment with the TV later tonight). I should be back to normal tomorrow, and I'll try to look through the rest of the log before your morning starts.


Another question might be: Why were 0 tasks sent when the project had about 11,000** tasks ready to send? If the project does not want to send tasks (for whatever reason) then the problem is the project and not the client.

If I wait out the seconds (723 or whatever) then I eventually get some new work. I have had other systems with nVidia cards running milkyway. They run much slower and I don't see them run out of data unless the project is off-line.

** Not sure how often the server status is updated but I checked it when my last milkyway task finished and the delay started. I did not get any new work for a few minutes so I issued a project update and got work immediately. It is looking like the project is not sending stuff that it has and the client is backing off thinking there is no work, which would be the correct procedure IF and only IF the project actually had no work. My guess is the problem is on the server side. Going to put 7.14.2 back on that system.
352) Message boards : BOINC client : 7.14.2 and 7.12.1 both fail to get work units on very fast systems (Message 91240)
Posted 28 Apr 2019 by Profile Joseph Stateson
Post:
Richard is the expert in decrypting the work fetch debug output. Everything appears normal for intervals. The work requests look normal. What I don't understand is why you are getting backoffs for 10 and 5 minutes directly after the scheduler acknowledges receipt of reported work. That is coming from the scheduler and not from your host or client. Normally the scheduler backs off if there are issues in contacting the servers or the client has issues downloading work and the client can't acknowledge correct reception of the sent tasks. Have you looked at the Transfers tab in the Manager after you have requested work and see if you have task downloads in backoff?


I will check that possibility. Be nice if that info was in the event log. AFAICT there is no transfer "history" to review so I got to be ready to catch it. I have good bandwidth here at home but very rarely downloads hang up if too many concurrent. Conceivably, if a number of GPUGRID tasks complete all at once then the upload can be bottlenecked.

I had asked Fred at BoincTasks about implementing a rule for a project being out of data as I could then use the rule to run a batch file and send a text message to my phone. He has a lot on his plate so not sure about when or if that gets implemented. AFAIK that is the only way to find out in real time if project is out of data (other than editing boinc code and building a test program). If the client could put transfer info such as number of pending and estimated time into the event log that would be a real help.

[EDIT] Actually, I can babysit the last few work units and, when they hit 0 tasks, bring up the Transfers tab to see WTF is going on.
353) Message boards : BOINC client : 7.14.2 and 7.12.1 both fail to get work units on very fast systems (Message 91236)
Posted 28 Apr 2019 by Profile Joseph Stateson
Post:
OK, found a way to remove clutter, using BoincTasks "select project" to see only milkyway

====at line 15819=====
At 10:14:04 was last report of completed tasks. 9 reported. THE QUEUE IS EMPTY AT THIS TIME

At 10:20:58 got 621 new tasks. Delay of 6 minutes. Not too bad compared to the 15 minutes I have seen in the past.

Printout from line 15395 to 16650 is here
stateson.net\images\15395.txt

I have the whole 9 yards available if needed.
HTH !!!

[EDIT] Looks like project requested a 6 minute delay! Could this be the problem? Was it the client that wants a delay? I don't know how to read this info. Is it explained somewhere? If so I don't mind doing an analysis.

15835 Milkyway@Home 4/28/2019 10:14:06 AM [work_fetch] backing off AMD/ATI GPU 381 sec
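
For what it is worth, the arithmetic on that line can be checked with a few lines of Python. This is only a sketch: it assumes the event log line has exactly the layout shown above, and the 10:20:58 time is the one quoted earlier in this post. It says nothing about whether the client or the project asked for the backoff.

```python
import re
from datetime import datetime, timedelta

# Line copied from the event log excerpt above.
line = ("15835 Milkyway@Home 4/28/2019 10:14:06 AM "
        "[work_fetch] backing off AMD/ATI GPU 381 sec")

# Pull out the timestamp, the resource name and the backoff length.
pattern = re.compile(
    r"(?P<stamp>\d+/\d+/\d+ \d+:\d+:\d+ [AP]M) "
    r"\[work_fetch\] backing off (?P<resource>.+) (?P<secs>\d+) sec"
)
m = pattern.search(line)

start = datetime.strptime(m.group("stamp"), "%m/%d/%Y %I:%M:%S %p")
backoff = timedelta(seconds=int(m.group("secs")))
expires = start + backoff

# Time the 621 new tasks arrived, as reported earlier in this post.
got_work = datetime(2019, 4, 28, 10, 20, 58)
gap_after_expiry = int((got_work - expires).total_seconds())

print(m.group("resource"))           # AMD/ATI GPU
print(expires.strftime("%H:%M:%S"))  # 10:20:27
print(gap_after_expiry)              # 31
```

So the backoff would have expired at 10:20:27, about half a minute before the new tasks arrived, which suggests the 381 second backoff accounts for most of the 6 minute delay.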
354) Message boards : BOINC client : 7.14.2 and 7.12.1 both fail to get work units on very fast systems (Message 91235)
Posted 28 Apr 2019 by Profile Joseph Stateson
Post:
Yes, <work_fetch_debug> is a bit of a blunderbuss. Best to set it once, wait until it's done just one cycle, and then unset it again while you pick over the pieces. [That's why I got them to put an 'apply' button on the dialog :-)]

But it is powerful - if you could fillet out that one complete cycle from

[work_fetch] ------- start work fetch state -------

to

[work_fetch] ------- end work fetch state -------

and post it here, we could take a look. Might contain some clues.



I managed to find
    if (found) {
        p->sched_rpc_pending = RPC_REASON_NEED_WORK;
    } else {
        if (log_flags.work_fetch_debug) {
            msg_printf(0, MSG_INFO, "[work_fetch] No project chosen for work fetch");


at THIS location but there was no selection for projects. I really need to exclude projects that are not active. I will delete all unused projects from this system to clean up the message log. I gave up trying to build boinc under VS2017 some time ago.
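
In case it helps anyone else, picking one complete cycle out of a saved event log can be scripted. The sketch below assumes the log has been saved as plain text and that the start and end marker lines look exactly like the two quoted above. The function name and the sample lines are made up for illustration.

```python
START = "[work_fetch] ------- start work fetch state -------"
END = "[work_fetch] ------- end work fetch state -------"


def first_work_fetch_cycle(lines):
    """Return the lines of the first complete cycle, markers included."""
    cycle = []
    inside = False
    for line in lines:
        if START in line:
            cycle = [line]  # restart if a second start marker shows up
            inside = True
        elif inside:
            cycle.append(line)
            if END in line:
                return cycle
    return []  # no complete cycle found


# Made-up sample lines, only to show the call; a real log will differ.
sample = [
    "unrelated line before the cycle",
    "[work_fetch] ------- start work fetch state -------",
    "[work_fetch] (state lines would appear here)",
    "[work_fetch] ------- end work fetch state -------",
    "unrelated line after the cycle",
]

cycle = first_work_fetch_cycle(sample)
print("\n".join(cycle))
```

With a real log the same function could be fed `open(path)` directly, since it only needs something that yields lines.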
355) Message boards : BOINC client : 7.14.2 and 7.12.1 both fail to get work units on very fast systems (Message 91233)
Posted 28 Apr 2019 by Profile Joseph Stateson
Post:
I think we need to understand a little bit better where this delay is coming from.

I'd always suggest setting the <sched_op_debug> Event Log flag

https://ci.appveyor.com/project/BOINC/boinc/builds/23992763/artifacts

I will set those log flags and try to get a better picture of what is happening. I did look at that appveyor but I don't think it applies as I do not use max concurrent in cc_config. I do have an app_config for milkyway that I discovered long ago using google. I am not sure what all it does but it does list more info about the GPU and supposedly allows tasks to run faster. I assume it is not causing the problem.

<app_config>
<app_version>
<app_name>milkyway</app_name>
<plan_class>opencl_ati_101</plan_class>
<avg_ncpus>0.20</avg_ncpus>
<ngpus>0.19</ngpus>
<cmdline>--non-responsive --verbose --gpu-target-frequency 1 --gpu-polling-mode -1 --gpu-wait-factor 0 --process-priority 4 --gpu-disable-checkpointing</cmdline>
</app_version>
</app_config>
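
As a rough check on what the two fractions in that file imply, here is a small Python sketch. It assumes the client keeps starting tasks on a GPU until the ngpus fractions add up to 1 (the same reasoning as setting 0.5 to get two tasks per board), and it uses the 4 GPUs in this system. The function name is just for illustration.

```python
import math


def tasks_per_gpu(ngpus):
    """How many tasks fit on one GPU if each claims `ngpus` of it."""
    # The small epsilon guards against float rounding, e.g. for 0.1.
    return math.floor(1.0 / ngpus + 1e-9)


ngpus = 0.19      # from <ngpus> in the app_config above
avg_ncpus = 0.20  # from <avg_ncpus> in the app_config above
gpu_count = 4     # GPUs in this system

per_gpu = tasks_per_gpu(ngpus)
total_tasks = per_gpu * gpu_count
cpus_reserved = total_tasks * avg_ncpus

print(per_gpu)                  # 5
print(total_tasks)              # 20
print(round(cpus_reserved, 2))  # 4.0
```

Under that assumption the file asks for 5 concurrent milkyway tasks per GPU, 20 in total, with about 4 CPU cores budgeted to feed them.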


[EDIT] going to use the following
<cc_config>
<log_flags>
	<work_fetch_debug>1</work_fetch_debug>
	<sched_op_debug>1</sched_op_debug>
</log_flags>
</cc_config>


[EDIT AGAIN]
Getting message "no project chosen for work fetch". I looked at wiki for cc_config and did not see how to restrict work fetch to just milkyway else I get a lot of messages from projects that are not active
356) Message boards : BOINC client : 7.14.2 and 7.12.1 both fail to get work units on very fast systems (Message 91230)
Posted 28 Apr 2019 by Profile Joseph Stateson
Post:
There was a discussion about this at milkyway and also at seti. Basically my 4 GPUs finish a work unit in 10 seconds on average. The queue when full is typically 600 - 800 tasks, but after it empties (milkyway project) no work units are provided for anywhere from 5 - 15 minutes. The suggestion was to downgrade to 7.12.1 but that did not fix the problem. This is inconvenient because boinc schedules other projects in, whereas I have set priorities so that they should not run unless the primary project is down, off line, etc. A manual "update" gets work immediately, so the project has data but won't send it on its own.

Tried 7.12.1 : got a 10.5 minute delay as shown at 7:43. Longer delays with 7.14.2 as shown here going back 24 hours of history.

1			4/27/2019 5:02:43 PM	Starting BOINC client version 7.12.1 for windows_x86_64	
2			4/27/2019 5:02:43 PM	log flags: file_xfer, sched_ops, task	
3			4/27/2019 5:02:43 PM	Libraries: libcurl/7.47.1 OpenSSL/1.0.2g zlib/1.2.8	
4			4/27/2019 5:02:43 PM	Data directory: C:\ProgramData\BOINC	
5			4/27/2019 5:02:43 PM	Running under account josephy@stateson.net	
6			4/27/2019 5:02:45 PM	OpenCL: AMD/ATI GPU 0: AMD FirePro S9100 (driver version 2671.3, device version OpenCL 1.2 AMD-APP (2671.3), 6144MB, 6144MB available, 3226 GFLOPS peak)	
7			4/27/2019 5:02:45 PM	OpenCL: AMD/ATI GPU 1: AMD FirePro S9100 (driver version 2671.3, device version OpenCL 1.2 AMD-APP (2671.3), 6144MB, 6144MB available, 3226 GFLOPS peak)	
8			4/27/2019 5:02:45 PM	OpenCL: AMD/ATI GPU 2: AMD FirePro S9100 (driver version 2671.3, device version OpenCL 2.0 AMD-APP (2671.3), 12288MB, 12288MB available, 4608 GFLOPS peak)	
9			4/27/2019 5:02:45 PM	OpenCL: AMD/ATI GPU 3: AMD FirePro S9100 (driver version 2671.3, device version OpenCL 1.2 AMD-APP (2671.3), 6144MB, 6144MB available, 3226 GFLOPS peak)	
-
-
-
-
2213	Milkyway@Home	4/27/2019 7:31:21 PM	Computation for task de_modfit_85_bundle4_4s_south4s_0_1555431910_4124594_0 finished	
2214	Milkyway@Home	4/27/2019 7:32:32 PM	Sending scheduler request: To fetch work.	
2215	Milkyway@Home	4/27/2019 7:32:32 PM	Reporting 7 completed tasks	
2216	Milkyway@Home	4/27/2019 7:32:32 PM	Requesting new tasks for AMD/ATI GPU	
2217	Milkyway@Home	4/27/2019 7:32:34 PM	Scheduler request completed: got 0 new tasks	
2218	Milkyway@Home	4/27/2019 7:43:14 PM	Sending scheduler request: To fetch work.	
2219	Milkyway@Home	4/27/2019 7:43:14 PM	Requesting new tasks for AMD/ATI GPU	
2220	Milkyway@Home	4/27/2019 7:43:20 PM	Scheduler request completed: got 598 new tasks	
2221	Milkyway@Home	4/27/2019 7:43:23 PM	Starting task de_modfit_80_bundle5_4s_south4s_0_1554998626_1474893_2	
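
To measure these gaps without eyeballing the log, something like the following Python sketch could be run over a saved copy. It assumes the timestamp and message layout shown in the excerpt above, and the function name is made up. It times each stretch that starts with a "got 0 new tasks" reply and ends with the next reply that actually delivers work.

```python
import re
from datetime import datetime

# Lines copied from the excerpt above (columns separated by whitespace).
LOG = """\
2214 Milkyway@Home 4/27/2019 7:32:32 PM Sending scheduler request: To fetch work.
2215 Milkyway@Home 4/27/2019 7:32:32 PM Reporting 7 completed tasks
2216 Milkyway@Home 4/27/2019 7:32:32 PM Requesting new tasks for AMD/ATI GPU
2217 Milkyway@Home 4/27/2019 7:32:34 PM Scheduler request completed: got 0 new tasks
2218 Milkyway@Home 4/27/2019 7:43:14 PM Sending scheduler request: To fetch work.
2219 Milkyway@Home 4/27/2019 7:43:14 PM Requesting new tasks for AMD/ATI GPU
2220 Milkyway@Home 4/27/2019 7:43:20 PM Scheduler request completed: got 598 new tasks
"""

PATTERN = re.compile(
    r"(\d+/\d+/\d+ \d+:\d+:\d+ [AP]M)\s+"
    r"Scheduler request completed: got (\d+) new tasks"
)


def dry_spells(text):
    """Yield (start, end, seconds) for each run of empty replies."""
    empty_since = None
    for stamp, count in PATTERN.findall(text):
        when = datetime.strptime(stamp, "%m/%d/%Y %I:%M:%S %p")
        if int(count) == 0:
            if empty_since is None:
                empty_since = when  # first empty reply of this run
        elif empty_since is not None:
            yield empty_since, when, (when - empty_since).total_seconds()
            empty_since = None


spells = list(dry_spells(LOG))
for start, end, seconds in spells:
    print(start.time(), end.time(), int(seconds))  # 19:32:34 19:43:20 646
```

For the excerpt above this gives 646 seconds (10 min 46 s) between the empty reply and the 598 new tasks, in line with the roughly 10.5 minute delay mentioned.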
357) Message boards : The Lounge : The Seti is Slumbering Cafe (Message 90969)
Posted 7 Apr 2019 by Profile Joseph Stateson
Post:
Neither setting one's hair on fire nor howling at the moon has brought it back. It has been down long enough to declare a
Super Grand Mal outrage
Gary is correct it is party time!


It has been down long enough to get put on the "no pay" list which means that all of my pending uploads are quote "not available for reward".
358) Message boards : BOINC client : Problem building client with VS2017 (Message 90347)
Posted 28 Feb 2019 by Profile Joseph Stateson
Post:
Driver update from AMD fixed a few things.

I combined two systems into one by using pair of 1x risers: One RX560 on an x16 and two RX560 on 1x risers.

Device manager showed all 3, Radeon 19.1.1 showed only 2 and boinc only saw 2. The crossfire option was missing. GPU-z showed 2 boards at 100% and one idling.

I then put all 3 boards on 3@x16 (electrical 16,8,8) which cannot normally be used as there is a heat problem (no fans on this open system). Anyway, this was worse. Device manager shows 3 but Radeon only showed 1 board and boinc showed only 1. gpu-z showed two idling. I assume this was a cross fire problem. I looked again for crossfire selection but not there.

I then upgraded to 19.2.3 and used the pair of 1x risers again. Crossfire option showed up and I was able to unselect crossfire. Boinc saw all 3 RX560 and gpu-z showed that all 3 were under 100% load. The risers are nice because they separate the graphics boards, which then run cool w/o any fans. They all run SETI. The increase in speed over CPU is so large it was more efficient to put all 3 boards on a single mombo.

I have now put VS2013 on a separate system along with that required SDK and will look at a boinc build just on that system.
359) Message boards : BOINC client : Problem building client with VS2017 (Message 90044)
Posted 12 Feb 2019 by Profile Joseph Stateson
Post:
On the subject of VS2017 ….

Googling around I read where VS2013 apps can be built on VS2017 if the 120v_xp SDK is available as a retarget. The recommendation was to install VS2013 community and then upgrade to VS2017 community. This was because the SDK was not available or at least a few members on stackoverflow and other forums were unable to find an installable SDK (120v_xp).

I tried just building the boinc client (module libboinc and boinc) but retargeted to latest sdk and for x64 only as that was what I was interested in. There were 96 source programs that compiled correctly (an obj was built). Module boinc could not be built because of a diagnostics_win.cpp problem (one file). libboinc could not be built because of problems in 4 files (same diagnostics_win and 3 includes).

Certainly looks like VS2017 could build the client. That is easier said than done. Does anyone have an installable SDK that provides the "120v_xp" target? If so, PM me where to find it.

Thanks for looking!
360) Message boards : BOINC client : Problem building client with VS2017 (Message 90033)
Posted 12 Feb 2019 by Profile Joseph Stateson
Post:
Finally got it working w/o having to modify that coproc_info.xml file: using driver 17.12.2 but probably works with later version.

This is what I think caused the problem but it is just a guess: An unwanted update from microsoft came in causing a driver problem. I installed a driver for the RX570 from my set of drivers but probably failed to uninstall the one that came in. I am guessing that left an older (or maybe newer?) opencl on the system. I tried several drivers, uninstalling and reinstalling and, I assume, must have uninstalled one that happened to match the leftover opencl. Then, when I put in 17.12.2, there were no other opencl libraries.
My guess is that AMD's uninstall only removes the opencl that was installed with the package, ie: no "cleanup" of opencl. Possibly the custom install would give an option for a clean install. I consistently used the express install.
361) Message boards : BOINC client : Problem building client with VS2017 (Message 90028)
Posted 11 Feb 2019 by Profile Joseph Stateson
Post:
The problem has always been the drivers. That being said, Windows and GPU-Z have never reported more GPUs than actually exist unlike BOINC.

I ran that clinfo test but leave the analysis to you. The results are in this zip file.
clinfo_nocrossfire.txt was run with crossfire disabled (more on that later)
clinfo_dif.txt is the difference between crossfire and no crossfire (FWIW).
coproc_info.xml list 4 gpus: First 2 on the more recent opencl driver, the second two on an older driver that I CANNOT YET FIND A WAY TO UNINSTALL.

One of the driver installs, probably one that windows did on its own, enabled crossfire. I cannot use crossfire and it causes a problem with one of my apps, "DVDFAB". I noticed it was enabled when GPU-z showed the frequency was "0" on the second RX570, so I disabled crossfire and GPU-z is back to normal as shown. By back to normal I mean that both boards are running near 100% and the second RX570 eventually will finish a work unit, and quickly, unlike when crossfire was enabled.

However, I still have 2 fake GPUs (on driver 2671.3) and must resort to the trick of editing that xml file to ensure that BOINC sees only the first 2 GPUs.
Currently I have been using revo uninstaller to remove excess stuff. revo executes the AMD uninstall package and then it scans the registry and allows me to clean up all references left over after the uninstall. It seems I need something better. I would like to avoid "cleaner" which I think is spyware. I will look for a better AMD cleaner. What concerns me is that windows and gpuz seem to have no problem with the extraneous driver. The pulldown box for gpuz only had 2 boards listed. Windows device manager shows only 2 devices but does allow me to roll back to the previous driver, so it knows there is another.

Thanks for looking.



[EDIT] This may be windows thing. This is on my boinc farm and I could have used Linux but I noticed the motherboard had a license in the bios so I put in windows for free. I can probably put in a flash drive with ubuntu and solve this problem in an hour.
362) Message boards : BOINC client : Problem building client with VS2017 (Message 90026)
Posted 11 Feb 2019 by Profile Joseph Stateson
Post:
Made some progress in locating the problem.

No, did not get vs2017 working with boinc. Did not get VS2013 working although I did find an ISO and downloaded it and will look at its installation later.

Ageless: I got your message and will (eventually) go over to GitHub to report the problem, which I suspect is AMD driver related.

My problem: Usually any system that has an RX560 or RX570 or an RX Vega has twice as many GPU show up in boinc as there really are. This locks up the VNC service which then makes it difficult to get in and see what is happening.

Sometimes using revo-uninstaller gets rid of all AMD drivers and things are back to normal until the next microsoft upgrade when a new amd goes in and all hell breaks loose. Since VNC server hangs I had to install openssh server to get in to allow a termination of boinc.

read the following at boinc\client\gpu_detect.cpp

    // client-specific GPU code. Mostly GPU detection
    //
    // theory of operation:
    // there are two ways of detecting GPUs:
    // - vendor-specific libraries like CUDA and CAL,
    // which detect only that vendor's GPUs
    // - OpenCL, which can detect multiple types of GPUs,
    // including nvidia/amd/intel as well was new types
    // such as ARM integrated GPUs
    //
    // These libraries sometimes crash,
    // and we've been unable to trap these via signal and exception handlers.
    // So we do GPU detection in a separate process (boinc --detect_gpus)
    // This process writes an XML file "coproc_info.xml" containing
    // - lists of GPU detected via CUDA and CAL
    // - lists of nvidia/amd/intel GPUs detected via OpenCL
    // - a list of other GPUs detected via OpenCL
    //
    // When the process finishes, the client parses the info file



So that file is created and then re-read to see what was found!

Sure enough, I looked at coproc_info.xml and there were 4 gpus. The first two had the latest opencl driver, the bottom two had the previous which was supposedly uninstalled.

I used notepad to edit that xml and removed the two at the bottom. I then made the coproc_info.xml read only so it could not be updated. When I started boinc I was back to my two gpu's but BUT B.U.T.

Unfortunately, the 2nd gpu was fake. At least there are only 2 to worry about, not 4.

This is what I mean by fake: note the values on the 2nd board. The flops should be identical.



I am going to look into editing coproc_info.xml, maybe that can fix the problem. I have another system with an RX570 and RX560 that does not have a problem (it used to). Going to look into its coproc_info.xml for some clue. I am not letting it update its drivers. Unaccountably I cannot restore the system with 2 RX570 to make it work again.

This is far from being fixed as VNC locks up accessing the system nor does that fake board even run tasks correctly: they never finish.

363) Message boards : BOINC client : Problem building client with VS2017 (Message 90002)
Posted 11 Feb 2019 by Profile Joseph Stateson
Post:
I have been using VS2017 for some time and have several repositories at GitHub.

I wanted to debug a problem I am seeing with the RX-570 amd boards so I forked boinc and attempted to build the client.

The build requires VS2013 which I don't want to install since I am using 2017. I changed the properties on boinc and libboinc to point to the latest SDK and retargeted both to VS2017. There was a boatload of errors. The first one was that curl.h was missing. I downloaded that and put it at one of the include paths. That fixed the compile error but I suspect the program won't find the library if I ever get it to run. I then started on the next error (see below): "CLIENT_ID" type redefinition. That showed up under a comment about MinGW_W64 defining this. Well, this system does have MinGW_W64 as it was needed for a project that did not use Visual Studio. I am guessing that VS2017 somehow picked up the MinGW env (include) variables??? That is suspicious.

Questions:
1. Has this been built with VS2017?

2. I have another system I can put vs2013 and that 120V_xp sdk on. (that's the SDK for vs2013). Will that solve all the problems including the missing curl stuff?

3. How is the actual client for windows built? If MinGW_W64 is how Berkeley is building the windows version, should I switch to MinGW? Unfortunately, its debugger is nothing like MSVC and I suspect I will have to throw in print statements to see what is happening.
364) Message boards : Questions and problems : Boinc drains battery faster than charger can supply. How to resolve? (Message 89793)
Posted 25 Jan 2019 by Profile Joseph Stateson
Post:
You have my sympathy.

I have a Surface Pro 4 with a weak charger. I looked at Best Buy into a docking station but specs said it could not be used to charge through the USB3 ports while attached to the surface pro. Best buy did not carry a charger for surface pro as it was proprietary. I asked at the microsoft store and they said I had to get a charger for the "surface book" if I wanted a heavy duty charger for the "pro"
365) Message boards : Questions and problems : more "phantom" AMD gpus: got 2x as many rx560 as I really have (Message 89777)
Posted 24 Jan 2019 by Profile Joseph Stateson
Post:
Think I solved this problem, not sure exactly how I got into this mess or if the problem will stay fixed.

I went back to 2016 and got an old boinc and an old amd driver. Used revo uninstaller to get rid of all boinc and amd software.

Same problem. Had 2x as many RX560 as I should have. I remembered that the RX560 was released in mid 2017 but I had just put in 2016 drivers. This was suspicious.

I uninstalled amd software again, this time I removed the ethernet cable before rebooting. There was no possibility of windows 10 nor amd updating. Sure enough, I am back to a single amd RX560. No longer have that phantom.

I am guessing that during the boot up windows goes off to find a new driver and puts it in and that is why I got stuck with that extra gpu board.

Now I got to update boinc to the latest and make sure, somehow, that I don't get a 2nd phantom gpu when I go back online.

Would like to know if anyone else has seen this problem.

thanks for looking, hope I didn't bore anyone.

[edit] Note to myself: Be sure the system is offline during the first boot after adding additional riser video cards, as microsoft goes off to find new drivers instead of using the existing ones that work fine.
366) Message boards : Questions and problems : more "phantom" AMD gpus: got 2x as many rx560 as I really have (Message 89770)
Posted 24 Jan 2019 by Profile Joseph Stateson
Post:
Tried clean install of BOINC using revo uninstaller. Still have 4 RX560s. However, I checked another system that had HD7950 also a pair of RX560 in risers and that was working fine. It did NOT have that Jan 20 driver, the 19.1.1 which seems to have a problem. I need to find the 9/22/2017 one whatever that was.

367) Message boards : Questions and problems : more "phantom" AMD gpus: got 2x as many rx560 as I really have (Message 89769)
Posted 24 Jan 2019 by Profile Joseph Stateson
Post:
Have a pair of RX 560 in a Dell XPS-9100. One is in a 16x slot, the other in a 1x with a riser. All was working fine on windows 7. I even installed the latest 19.1.1 Adrenalin for win7, dated Jan 10 2019.

Decided windows 10 had better security so I upgraded using that still freebie 10Upgrade program. As soon as the final boot occurred I spotted 4 RX-560s in the device manager instead of 2 and also found a boatload of error'ed milkyway tasks in boinc. Clearly this is a windows driver problem but am I the only one that has run into this?

I had the same problem on a pair of RX570 and gave up and put in Linux. Had same problem with Vega-64 on a system for one of my kids. Ended up removing boinc as I was just using boinc to burn in his computer.

I have had a lot of systems with HD7950, HD7850, S9x00 and windows 10 and never saw a problem like this.
right now, I am using revo uninstaller to get rid of all amd stuff and will try that jan20 win10- version of the 19.1.1 amd driver.

[EDIT] Made some progress: that REVO uninstaller took out the extra pair of RX560 that showed up in the device manager BUT BOINC STILL SEES 4 OF THEM!

368) Message boards : Questions and problems : got another "virtual gpu" - how does this happen? (Message 89643)
Posted 16 Jan 2019 by Profile Joseph Stateson
Post:
cannot get rid of the extra GPU. I ran that revo uninstaller that took out all amd drivers including cleaning the registry. Installed the recommended 19.1.1 Vega driver.

Still show two RX Vega boards.

    Starting BOINC client version 7.14.2 for windows_x86_64
    log flags: file_xfer, sched_ops, task
    Libraries: libcurl/7.47.1 OpenSSL/1.0.2g zlib/1.2.8
    Data directory: C:\ProgramData\BOINC
    Running under account JStateson
    OpenCL: AMD/ATI GPU 0: Radeon RX Vega (driver version 2766.5 (PAL,HSAIL), device version OpenCL 2.0 AMD-APP (2766.5), 8176MB, 8176MB available, 14336 GFLOPS peak)
    OpenCL: AMD/ATI GPU 1: Radeon RX Vega (driver version 2766.5 (PAL,HSAIL), device version OpenCL 2.0 AMD-APP (2766.5), 8176MB, 8176MB available, 14336 GFLOPS peak)
    Host name: jysevga
    Processor: 12 GenuineIntel Intel(R) Xeon(R) CPU X5670 @ 2.93GHz [Family 6 Model 44 Stepping 2]
    Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 popcnt aes syscall nx lm vmx smx tm2 dca pbe
    OS: Microsoft Windows 10: Professional x64 Edition, (10.00.17763.00)
    Memory: 23.99 GB physical, 47.99 GB virtual
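
A quick way to compare what the client detects against what is physically installed is to parse those two OpenCL lines. The Python sketch below assumes the startup lines always follow the layout shown above. The count of 1 installed card comes from this thread, and the function name is made up.

```python
import re

# The two OpenCL lines copied from the startup messages above.
STARTUP = """\
OpenCL: AMD/ATI GPU 0: Radeon RX Vega (driver version 2766.5 (PAL,HSAIL), device version OpenCL 2.0 AMD-APP (2766.5), 8176MB, 8176MB available, 14336 GFLOPS peak)
OpenCL: AMD/ATI GPU 1: Radeon RX Vega (driver version 2766.5 (PAL,HSAIL), device version OpenCL 2.0 AMD-APP (2766.5), 8176MB, 8176MB available, 14336 GFLOPS peak)
"""

GPU_LINE = re.compile(
    r"OpenCL: (?P<vendor>.+?) GPU (?P<index>\d+): (?P<name>.+?) "
    r"\(driver version (?P<driver>.+?), device version (?P<device>.+?), "
    r"(?P<mem>\d+)MB, (?P<avail>\d+)MB available, "
    r"(?P<gflops>\d+) GFLOPS peak\)"
)


def detected_gpus(text):
    """Return one dict of fields per 'OpenCL: ... GPU n:' line."""
    return [m.groupdict() for m in GPU_LINE.finditer(text)]


gpus = detected_gpus(STARTUP)
installed = 1  # physical RX Vega cards in this system
phantoms = len(gpus) - installed
drivers = sorted({g["driver"] for g in gpus})

print(len(gpus), phantoms)  # 2 1
print(drivers)              # ['2766.5 (PAL,HSAIL)']
```

Here both entries report the same driver, so the driver version alone does not tell the real board from the phantom one on this system.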



Not only does that phantom GPU run slow, the display is blank as if it is in sleep mode. Something is seriously wrong.

I let it crunch setiathome for an hour but only the CPU tasks. I then allowed the GPU ones to run and got tasks on "both" gpu. After a while system became unresponsive with a blank screen but it was still crunching according to BoincTasks. This system is for my son who is not into boinc and I was just burning it in so boinc is coming out.

369) Message boards : Questions and problems : got another "virtual gpu" - how does this happen? (Message 89627)
Posted 15 Jan 2019 by Profile Joseph Stateson
Post:
This is a new computer I put together for one of my kids. I have seen this problem only on the RX series GPUs. I assume a reboot will fix the problem eventually. I thought an upgrade of all systems to 7.14 would help but it seems the 3 finger windows salute is required

[edit] I do have the amd "beta" driver, (v)19 or something like that. If a reboot fails to fix problem then will do a cleanup and put in the blessed version.
370) Message boards : Questions and problems : got another "virtual gpu" - how does this happen? (Message 89624)
Posted 15 Jan 2019 by Profile Joseph Stateson
Post:
got only a single RX Vega, but 7.14.2 thinks I got two of them

    OpenCL: AMD/ATI GPU 0: Radeon RX Vega (driver version 2766.5 (PAL,HSAIL), device version OpenCL 2.0 AMD-APP (2766.5), 8176MB, 8176MB available, 14336 GFLOPS peak)
    OpenCL: AMD/ATI GPU 1: Radeon RX Vega (driver version 2766.5 (PAL,HSAIL), device version OpenCL 2.0 AMD-APP (2766.5), 8176MB, 8176MB available, 14336 GFLOPS peak)
    Processor: 12 GenuineIntel Intel(R) Xeon(R) CPU X5670 @ 2.93GHz [Family 6 Model 44 Stepping 2]
    Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 popcnt aes syscall nx lm vmx smx tm2 dca pbe
    OS: Microsoft Windows 10: Professional x64 Edition, (10.00.17134.00)



It seems I am running tasks on them also!



EXCEPT THAT ONE OF THE TASKS FINISH IN 2-3 MINUTES, THE OTHER IN 2-3 HOURS!!!

371) Message boards : Questions and problems : ubuntu: apt-get says that 7.8.3 is the lastest (Message 89612)
Posted 15 Jan 2019 by Profile Joseph Stateson
Post:
Thanks Keith & Dave!

I first tried GitHub (I have my own repository there) but saw this error

The repository 'https://github.com/BOINC/boinc artful Release' does not have a Release file.
N: Updating from such a repository can't be done securely, and is therefore disabled by default

I then did the following that worked better

add-apt-repository ppa:costamagnagianfranco/boinc
apt-get update
apt-get install boinc-client

This got me 7.14.2 on 18.04 but only 7.12.0 on 17.10 (Artful Aardvark)

Artful has a pair of nVidia boards and the driver is a PITA to install. An OS upgrade requires a reinstall of the nVidia driver, which I am not interested in doing again. OTOH my pair of RX-570 in Bionic Beaver were recognized immediately. I will stick with 7.12 on Artful but the rest of my farm is on 14.

[EDIT] I am running minimal systems, no xwindows or xdisplay. I recently found that nvidia-smi reports back board temperatures, not sure what to use to check the RX-570 but they seem to run much cooler than nVidia.
372) Message boards : Questions and problems : ubuntu: apt-get says that 7.8.3 is the lastest (Message 89600)
Posted 14 Jan 2019 by Profile Joseph Stateson
Post:
Running minimal version 17 (ubuntu) and tried to upgrade client to 7.14.2

jstateson@jyslinux3:~$ sudo apt-get install boinc-client
[sudo] password for jstateson:
Reading package lists... Done
Building dependency tree
Reading state information... Done
boinc-client is already the newest version (7.8.3+dfsg-1).
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
jstateson@jyslinux3:~$

tried on another system and got almost the same result


jstateson@jstatesonxps730:~$ sudo apt-get install boinc-client
[sudo] password for jstateson:
Reading package lists... Done
Building dependency tree
Reading state information... Done
boinc-client is already the newest version (7.9.3+dfsg-5ubuntu2).
The following packages were automatically installed and are no longer required:
libblkid1:i386 libuuid1:i386
Use 'sudo apt autoremove' to remove them.
0 upgraded, 0 newly installed, 0 to remove and 311 not upgraded.
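
For comparing what apt reports against the version I am after, a small Python sketch follows. It assumes the "already the newest version (...)" wording shown in the two outputs above, and the function name is made up. The version is split into numbers before comparing because a plain string comparison would rank "7.8.3" above "7.14.2".

```python
import re


def upstream_version(apt_line):
    """Extract the leading dotted version, e.g. (7, 8, 3), from apt's message."""
    m = re.search(r"newest version \((\d+(?:\.\d+)*)", apt_line)
    if m is None:
        return None
    return tuple(int(part) for part in m.group(1).split("."))


# The two messages copied from the apt-get output above.
lines = [
    "boinc-client is already the newest version (7.8.3+dfsg-1).",
    "boinc-client is already the newest version (7.9.3+dfsg-5ubuntu2).",
]
wanted = (7, 14, 2)

results = []
for line in lines:
    have = upstream_version(line)
    results.append((have, have < wanted))
    print(have, "older than wanted" if have < wanted else "ok")
```

Both systems come out older than 7.14.2, which matches what apt-get showed.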
373) Message boards : Questions and problems : How to clean up a manual install on Linux? (Message 89219)
Posted 15 Dec 2018 by Profile Joseph Stateson
Post:
Was this a minimal install or did you configure desktop and graphics?

What happened when you ran Dave's suggestion?

Project files (on my minimal systems) are at
/var/lib/boinc-client

diagnostic logs at
/var/log

Programs at
/usr/bin
374) Message boards : Questions and problems : unwanted "virtual" gpu after RADEON upgrade (Message 89211)
Posted 14 Dec 2018 by Profile Joseph Stateson
Post:
Perhaps clean out all drivers with display driver uninstaller in windows safe mode, reboot and only install one driver version.

Edit, I posted links to those in https://setiathome.berkeley.edu/forum_thread.php?id=83672&postid=1970110



I don't trust that "driverinstaller". Is there a way to download the app w/o using that generic installer?
I did find an ati driver cleaner but it was for much older versions such as the HD series of boards.

This is an HP Z400 workstation so kinda old. I am going to pull one of the RX boards, the older one, and just run with the newer RX570 and see what happens. I cannot account for why there are 4 gpu tasks running. One usually has to set app_config to 0.5 gpu to get two on same board. Here I got a pair of "extra" RX-570s which is strange but working.

going to use the revo uninstaller, the free one.

[EDIT] FIXED!!! Used that revo uninstaller to remove drivers and the advanced scan that removed registry items as well. 17 was uninstalled and after rebooting I put in 18 and am back to seeing 2 gpus. Unaccountably, one is rated at 1/4 the gflops of the other. Not sure what is happening there. I will be testing each GPU individually to see if there is a defect in the newer RX-570 I got recently.

I didn't know about revo until I had a problem with my Area51 that was under warranty and a dell tech installed it remotely to fix the thermal sensor program that was failing. I used the same free one that their tech support used.
375) Message boards : Questions and problems : unwanted "virtual" gpu after RADEON upgrade (Message 89209)
Posted 14 Dec 2018 by Profile Joseph Stateson
Post:
I uninstalled 18 and put in 17.6.3 "crimson"

When 18 was uninstalled the drivers reverted to what windows 10 had. Think I saw "25" in version details. There was no reboot required so there were still 4 gpus recognized. I stopped and started boinc and now only 2 gpus were recognized but one was much lower gflops than the other.
I then put in crimson which required a reboot AND I AM BACK WITH 4 GPUs again!!!

Something is terribly wrong. It is taking over an hour to do a collatz should be only 20 minutes max.

However, there is a difference in the recognition. There are two drivers being recognized, not just one.



I have no idea what is going on. I deleted that cc xml thinking it may be a problem. This should not be happening.

The reason I put in adrenalin and later crimson was because the stock windows 10 drivers did not seem to work. It was taking 15 minutes or so on gpu0 and an hour on gpu1 so I thought I needed the Radeon control panel to configure the two boards.

Could this be some type of crossfire problem? Tried w and w/o crossfire and still got 4 gpus. GPU_Z and CPU-ID report only 2 RX-570
376) Message boards : Questions and problems : unwanted "virtual" gpu after RADEON upgrade (Message 89208)
Posted 14 Dec 2018 by Profile Joseph Stateson
Post:
I upgraded to 7.14.2 and powered off and back on and still have 4 GPUs.
Looking at task manager, there is only 1 BOINC client running but there are 4 of those collatz GPU tasks running.

Looking at Radeon settings, at the top I see "RX570 primary/Discrete" for one RX 570 and just "Discrete" for the other but I do not recall seeing this nomenclature "discrete" on previous versions of Radeon software.

[EDIT] Both BoincTasks and manager are now consistent and show same 5095 value.
377) Message boards : Questions and problems : unwanted "virtual" gpu after RADEON upgrade (Message 89206)
Posted 14 Dec 2018 by Profile Joseph Stateson
Post:
Never seen this before. I have a pair of RX 570 and after upgrading to adrenalin 18.12.2-dec12 an extra pair of GPUs appeared. I don't want this as completion times are longer with the "extra" pair. Googling, I can't find any reports of this. I got to poking around using Radeon settings and was lucky to get the system restored to where it was usable.

Pictures below show 4 GPUs recognized. There are only 2 and they are not in crossfire mode. Before I upgraded, only the 2 GPUs were found. The last pic is from BOINC Manager instead of BoincTasks, as it shows slightly different (???) parameters.





378) Message boards : Questions and problems : Users of grcpool beware! (Message 89142)
Posted 8 Dec 2018 by Profile Joseph Stateson
Post:
How did you get so many coins in two weeks on the pool? I have 8 GPUs going, and I have nowhere near that in almost a month worth of crunching!


just saw this post.

Had several HD7950 & S9000 radeon systems most liquid cooled running milkyway. Gave them away recently as older systems need a lot of baby sitting to keep running plus I got interested in other things that dont involve baby sitting computers. Have a few mostly gtx1070 systems running gpugrid now.

Gridcoin used to be 21 cents just a few years ago but now only .005 (1/2 a cent)

Switched to solo mining after I hit 2000 and am now at 31000. Most of that came from milkyway and gpugrid. Look me up here

379) Message boards : Questions and problems : gpu stopped processing when 2nd user signed in (Message 89140)
Posted 8 Dec 2018 by Profile Joseph Stateson
Post:
I assume this is designed behavior, just never noticed it before. I created an account for a friend and when she signed in my pair of GPU running milkyway (or collatz) showed status of "waiting to run" but in any event they only started back up when I logged her out.

CPU tasks were all left running unlike the GPU.

I noticed that BOINC sometimes started a new GPU task, for example Milkyway, instead of continuing a partially run task such as Collatz, i.e. Collatz was left "waiting to run". But that might be a function of whether a CPU or %50 of a CPU was available, as CPU tasks continued to execute when she was logged in, unlike the GPU ones.


I also noticed that when she logged in I could momentarily see the boinc command shell run which is configured in the registry as
"C:\ProgramFiles\Boinc\boinc.exe" --detach --allow_remote_gui_rpc

Was wondering if there was some way to configure my system so the GPU tasks would continue to run just like the CPU tasks.

[EDIT] boinc (V) 7.8.3
380) Message boards : BOINC Manager : Linux version seems to require a password for local host (Message 85976)
Posted 24 Apr 2018 by Profile Joseph Stateson
Post:
I started using passwords for BOINC a few months ago when I spotted an attempt to access from an overseas location. The password mechanism worked fine up until yesterday when I started using Ubuntu 17.10.1 as a workstation and not just a headless server.

BM comes up "blank" and I have to go to FILE, then SELECT COMPUTER, and then type in LOCALHOST and set the password. BM then works just fine. However it does not store those settings. Quitting BM and restarting I have to do this all over again. It either should remember the last computer selected or not bother to ask for a password for "localhost" since there is no "remote" access as the computer is actually "local".

Maybe there is some setting I failed to use. IANE on Linux.

Thanks for looking!
381) Message boards : Projects : Are projects responsible for "cheaters" ? (Message 85720)
Posted 4 Apr 2018 by Profile Joseph Stateson
Post:
I remember years ago there were SETI cheaters who got credit for uploading the same work unit. That was one of many possibilities for cheating.

It appears Collatz has a problem and they are offline until it is fixed. I thought BOINC provided a framework for data to upload / download to prevent fake work units. Maybe that was just a guideline and the project had to implement the mechanism? If so, then Collatz has been hit really bad due to a few gridcoin hacker / cheaters.
382) Message boards : BOINC client : task switching limitation when running multiple tasks on same GPU? (Message 85719)
Posted 4 Apr 2018 by Profile Joseph Stateson
Post:
I think this is a case of the project (GPUGrid) not always having work units available. I found that if I wait long enough eventually they will arrive and the low priority for the other projects eventually ensures they will download and run. I can speed things up by suspending other projects and requesting an update but I have other things to do besides babysit. Keeping a short queue size also helps as it ensures checking for work more often.
383) Message boards : BOINC client : task switching limitation when running multiple tasks on same GPU? (Message 85540)
Posted 29 Mar 2018 by Profile Joseph Stateson
Post:
I occasionally have a problem keeping GPUs fully occupied on the best "paying" project and have been assigning various resource values to ensure the best project is running at all times.

question: If one has "switch between tasks" set to one hour, but (Milkyway for example) the tasks take only 3 minutes to complete, and that video board is running 10 tasks concurrently, will a task switch ever occur to cause the GPU to switch to another project?

Here is the problem I ran into that caused the question:

I have set Gpugrid to resource %100 on a GTX1070 which is a pretty good CUDA processor. Occasionally, Gpugrid runs out of data as it is a scientific project. To prevent an idle system, I have set Einstein to resource %0 so that it will run when Gpugrid is not sending WUs. Einstein WUs take 15 or so minutes and I am running 4 concurrently. Einstein downloads and runs only if Gpugrid is out of data. This system has three of the 1070s and occasionally I discover that 12 Einstein tasks are running and I can only get a Gpugrid task if I suspend the Einstein project and request a project update for Gpugrid. Possibly, this scheduling problem is with the project and not the boinc program, as Gpugrid may just happen to be out of data at the exact time boinc asks for some. However, I have seen another of my systems, with a slower single gtx1060, get 1 or even 2 Gpugrid tasks while the system with the three 1070s keeps running Einstein and downloading more Einstein to keep its queue full instead of asking for a Gpugrid task.

Things get worse when both Gpugrid and Einstein are out of data simultaneously. I then run Collatz, a math project that is never out of data, but that can be the subject of a different thread.
384) Message boards : Web interfaces : account manager re-attach bug: sync not picking up resource values (Message 85311)
Posted 19 Mar 2018 by Profile Joseph Stateson
Post:
Here is the problem I ran into. I first noticed it on GRCPOOL, then went to BAM and discovered the same problem.

What works: At either GRCPOOL or BAM, you set up an account and attach boinc to that account. You then get projects, either the default ones from BAM, or ones you add at GRCPOOL (no defaults there). The resource is 100% on GRCPOOL and "-1" on BAM, which gives 100% also. All is fine until you reattach. Either you accidentally removed the project manager (easy to do in Boincstats if you are not watching what you are doing), or you did it on purpose to get out of or into pool mining.

The bug: If you reattach, you pick up %100 for resource instead of the last assigned resource values. Forcing a project sync from BM or BoincTasks does not change the value from 100 to whatever it last was.

for example: I have Milkyway set to 100% and Einstein set to 0% on a system that has advanced double precision arithmetic hardware (S9000 & 7950 ATI Tahiti GPUs) When Milkyway goes offline (usually every Tuesday for maintenance) my system switches to Einstein. This happens only because of the resource values of 100 & 0 respectively. It is more productive to do Milkyway than Einstein on double precision efficient systems.
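
The 100 / 0 arrangement described above can be pictured with a small toy model. This is only a sketch of the idea that a zero resource share marks a backup project; it is not how the BOINC client actually schedules work, and the function name `pick_project` and its arguments are made up for illustration.

```python
def pick_project(shares, has_work):
    """Toy model: return the project to fetch work from, or None.

    shares:   project name -> resource share (0 marks a backup project)
    has_work: project name -> True if the project currently has tasks to send
    """
    available = [name for name in shares if has_work.get(name, False)]

    # Projects with a non-zero share are always preferred.
    primary = [name for name in available if shares[name] > 0]
    if primary:
        return max(primary, key=lambda name: shares[name])

    # Zero-share projects are used only when no primary project has work.
    backup = [name for name in available if shares[name] == 0]
    return backup[0] if backup else None


shares = {"Milkyway": 100, "Einstein": 0}
print(pick_project(shares, {"Milkyway": True, "Einstein": True}))
print(pick_project(shares, {"Milkyway": False, "Einstein": True}))
```

With Milkyway online the model picks Milkyway; when Milkyway is offline it falls back to Einstein. If a reattach resets both shares to 100, that distinction is lost.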

I recently reattached to BAM and got %100 for all projects (GRCPOOL ditto). To fix this I discovered that I had to change the value of “0” to a positive integer such as “1”. If I then issue a sync command the value on my system finally changes from 100 to 1. I can then go to BAM and change the “1” back to “0” and issue a synch to finally get the desired “0” on my system.

I suspect the problem is that neither BAM nor GRCPOOL will send the value of the resource if it has not changed AT THE WEB SITE. Thus I must force a change AT THE WEB SITE to get my local system account to be updated with the value I need. Alternately, "0" must have some special significance and is not reported after a synchronization command from BM.
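
To make that suspicion concrete, here is a toy model of a "send only what changed" sync. It is purely hypothetical: I have not seen the BAM or GRCPOOL code, and the class `ToyAccountManager`, the `reattach` helper and the default of 100 are assumptions chosen to match the behavior described above.

```python
DEFAULT_SHARE = 100  # assumed value a freshly attached client starts with


class ToyAccountManager:
    """Hypothetical account manager that only sends edited values."""

    def __init__(self, shares):
        self.shares = dict(shares)
        self.changed = set()  # projects edited at the web site since last sync

    def edit(self, project, share):
        self.shares[project] = share
        self.changed.add(project)

    def sync(self, client):
        # Untouched values are never sent, so the client keeps what it has.
        for project in self.changed:
            client[project] = self.shares[project]
        self.changed.clear()


def reattach(manager):
    """A newly attached client starts every project at the default share."""
    return {project: DEFAULT_SHARE for project in manager.shares}


am = ToyAccountManager({"Milkyway": 100, "Einstein": 0})
client = reattach(am)
am.sync(client)
print(client["Einstein"])  # still the default, the stored 0 was never sent

am.edit("Einstein", 1)
am.sync(client)
am.edit("Einstein", 0)
am.sync(client)
print(client["Einstein"])  # now 0, after forcing a change at the web site
```

In this model the 0 to 1 to 0 workaround succeeds for the same reason it did on my system: each edit marks the value as changed, so the next sync sends it.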

This gets worse for unsuspecting users, especially newcomers on GRCPOOL where there is no help or advice available. Not only do you get %100 resource, but you "CAN" also get CPU, ATI, NVIDIA and INTEL tasks even if you have those checked as “do not use”. If you have unchecked ATTACH on BAM or checked DETACH on GRCPOOL then those projects are still attached to your system and issuing a SYNC command does not detach them as the web site does not sense a change. I wrote "CAN" because the projects are smart enough not to send NVIDIA to an ATI only system and vice versa.

What can go wrong in the hour to two (or never for newbies at GRCPOOL) it takes to fix the problem: you can download boat loads of unwanted tasks such as N-Body that take all of your CPU threads (Milkyway and GpuGrid) or double precision tasks for single precision only GPUs (milkyway).

The forum at Milkyway is full of complaints about GRCPOOL users causing so many failed tasks that they are looking at a reliability test to prevent systems constantly reporting failed work units. I suspect users, especially GRCPOOL users, are unaware of the problem they are causing.
385) Message boards : Questions and problems : what is Science United? (Message 84821)
Posted 17 Feb 2018 by Profile Joseph Stateson
Post:
Looks like the redirect was removed. From boinctasks, clicking on "boinc" goes to the correct page. For about 45 minutes it went to ScienceUnited.
386) Message boards : Questions and problems : what is Science United? (Message 84819)
Posted 17 Feb 2018 by Profile Joseph Stateson
Post:
I typed "boinc" into the M$ofts Edge browser, pressed return, and ended up at scienceunited.org. This was unexpected and I thought a virus had taken over my browser. Looks like boinc but with another name. I assume it is legit as I ended up here in this forum when poking around.
387) Message boards : Questions and problems : New motherboard fried, not sure if BOINC is to blame (Message 84810)
Posted 16 Feb 2018 by Profile Joseph Stateson
Post:
Hi guys. I've been a volunteer for a few years. Recently, I bought a Dell XPS 8920 with 16 GiB RAM, an i7-7700K CPU @ 4.20GHz and a GeForce GTX 1070.


Dell makes good products and has good tech support and forums both "Owner forums" and Dell community.

That being said, I discovered that my "Sensor Control" program does not regulate and only reports temps on my 1070 GPUs. It does a fine job with the CPU sensor, disk sensor, motherboard sensor etc. It automatically scales all the fan speeds up or down BUT THE GPU NEEDS PRECISION X OR AFTERBURNER to set its fan speed. Within hours of posting on the AlienOwners forum I got responses advising me to use MSI's Afterburner. Dell supposedly uses MSI as the OEM for its graphics cards.

Easiest thing to do for tech is test the power supply and if it is good, then simply replace the motherboard. You get to reinstall windows which probably fixes the original problem. They run a diagnostic on the motherboard and then sell it as refurbished.
388) Message boards : Questions and problems : I need help with reinstalling boinc on Linux (Message 84809)
Posted 16 Feb 2018 by Profile Joseph Stateson
Post:
I have 3 systems running Ubuntu 17.4 (as I recall) and use grcpool. There is nothing special about grcpool; it is just another account manager, albeit nowhere near as good as BAM!, but it serves its purpose.

For my systems, apt-get purge boinc-client (or something like that) gets rid of the client, and the manager can be removed the same way. The account manager disappears along with the boinc code.

I have never used another distro so sorry, I know nothing about pacman or Arch, but I recall Fedora has apt-get like Ubuntu. AFAICT the boinc stuff is in /var/lib and /etc, in directories named boinc-client.

Make sure you detach from grcpool both on their web site and on your computer. If you stay on the gridcoin team and fail to detach you will contribute to the pool (thank you for the free gift).
389) Message boards : Questions and problems : BOINC freezes computer (Message 84807)
Posted 16 Feb 2018 by Profile Joseph Stateson
Post:
I read this thread because I occasionally have systems lock up when running BOINC. It is invariably due to overheating: usually the GPU, occasionally the CPU, and sometimes the disk drive goes bad, also due to overheating. I run 24/7 and also have older systems like you. I used to have Semprons, Athlons, Opterons but ended up with all Intel systems due to price drops and motherboards getting too old and popping too many capacitors too often.

HD Sentinel has a free version that can check your disk drive for problems.
Efmer's TThrottle can monitor your system, provide CPU & GPU throttling, and even alert you to a problem by email or text message.
I have a pair of HD 7850 and rarely had a problem with them. They are in adjacent slots but there is a large gap and air cooling works fine.

I suspect the problem is "fx 8350 thermal throttling" google that and look at the problems and suggestions especially in the overclocker forums where they have stability issues with that 8350 and Bios fixes for problems with ASUS motherboards and other MB manufacturers.

I used to turn off cool-and-quiet because BOINC was being throttled to death on my Tyan opteron motherboards. I had to put in really huge heat sinks and eventually got rid of all of them due to old capacitors going bad. Your system is much newer. I read there is a bios feature called "line conditioning" or something like that. The overclockers use that to help stabilize their systems and even reported associated BIOS bugs to Asus.

I have a number of ATI graphics boards. They are owned by AMD now and they all run a lot cooler than similar nVidia boards of the same generation. I suspect the problem is the load on the cpu.
390) Message boards : Questions and problems : BOINC is slowing down SSD write speed even when suspended in Windows 10? (Message 84800)
Posted 15 Feb 2018 by Profile Joseph Stateson
Post:
Not sure if this applies to you, but I was able to use the location property to move ProgramData\BOINC to my large "D" drive. I have only a 256gb SSD but a 2TB D drive. This put all the data writes to the D.

I did not use LOCATION to move the executables. Instead I installed boinc to the D first and then moved the boinc data afterwards.

Possibly the problem is the swap or paging file used for virtual memory. I dont think that LOCATION can be used to move that file. You might try increasing the amount of memory to %90.
I show 8GB of pagefile and 16MB of swapfile on my SSD. I allow %90 of memory for BOINC and %75 of the page file. I am not sure what the swapfile is good for. I normally see only the hibernation file (13gb) and the pagefile (8gb):

    02/13/2018 11:41 PM 13,693,136,896 hiberfil.sys
    02/15/2018 09:23 AM 8,623,251,456 pagefile.sys
    02/13/2018 11:10 PM <DIR> ProgramData
    02/13/2018 11:41 PM 16,777,216 swapfile.sys
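
For reference, the byte counts in the listing above can be converted to binary units with a few lines of Python. This is just arithmetic on the numbers shown; the helper name `to_gib` is mine.

```python
GIB = 2**30  # bytes in one gibibyte

# Sizes in bytes, copied from the directory listing.
sizes = {
    "hiberfil.sys": 13_693_136_896,
    "pagefile.sys": 8_623_251_456,
    "swapfile.sys": 16_777_216,
}


def to_gib(num_bytes):
    """Convert a byte count to GiB."""
    return num_bytes / GIB


for name, num_bytes in sizes.items():
    print(f"{name}: {to_gib(num_bytes):.2f} GiB")
```

The hibernation file comes out a little under 13 GiB, the pagefile at about 8 GiB, and the swapfile at exactly 16 MiB.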



Looking at "properties" for a single LHC vbox64 work unit one sees the following
Virtual memory size: 85mb
Working set size: 2400mb
slot: 12

Directory of D:\ProgramData\Boinc\slots\12
...
02/15/2018 07:40 AM 3,248,488,448 vm_image.vdi

That virtual image feeds the working set of pages. This all happens on your SSD unless you can move the data to another disk drive.

just my 2c and I am no expert of m$oft internals, but it could be that suspending BOINC leaves fragmented pages in memory and normal writes to/from the ssd are causing page faults. After a while one would think it would clear up.

[EDIT] your 512mb speed may be sequential and random access is probably lower. This is another guess of course.

391) Message boards : Projects : Which projects, if any, use CUDA but not OpenCL? (Message 84787)
Posted 14 Feb 2018 by Profile Joseph Stateson
Post:
I mentioned CUDA, but this might also apply to ATI boards.

I have a pair of GTX1070 that have 8gb of memory. Looking at the event messages, it is clear that OpenCL only uses just under 4gb of memory. I assume that is a restriction based on 32 bit addressing. That is just a guess.
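
The arithmetic behind that guess is short. This only shows what a 32 bit address can reach; whether that is really the cause of the limit I see is still a guess on my part.

```python
GIB = 2**30                 # bytes in one gibibyte

addressable = 2**32         # bytes reachable with a 32 bit address
card_memory = 8 * GIB       # memory on one GTX1070

print(addressable / GIB)            # size of a 32 bit address space in GiB
print(addressable / card_memory)    # fraction of the card that would be usable
```

A 32 bit address space is exactly 4 GiB, which is half of an 8 GiB card.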

It appears I can run 4 Einstein or 5 milkyway on any GPU that has 4gb of memory. Less concurrency for 3 and 2gb graphics boards. I read that miners have specific programs they run to mine on the larger gtx1080 that have 11gb of memory and they use all of the 11gb. I was wondering if any BOINC projects are coded to use all available GPU memory and not be restricted to 4gb. I assume that would be more efficient but perhaps a larger burden (heat) on the cooler.

I tried running more than 4 tasks on my gtx1070 and the tasks started erroring out. I assume the problem is the 4gb OpenCL. My systems are Core2 quad so I only have 4 cores. Maybe they errored out because they were not fed properly by the cpus and not the 4gb OpenCL limit?
392) Message boards : Questions and problems : separate install of virtual box possible? (Message 84780)
Posted 14 Feb 2018 by Profile Joseph Stateson
Post:
OK, it worked after all. I forgot to reboot after installing VBOX. So one can download 5.2.6 to add VBOX and reboot to get it recognized.
393) Message boards : Questions and problems : separate install of virtual box possible? (Message 84778)
Posted 14 Feb 2018 by Profile Joseph Stateson
Post:
I have 7.8.3 but not vbox on a system that I now want virtual box.

On a system that DID have virtual box, I received a notice that an upgrade was available. I failed to make a note of the version but I recall the upgrade was recommended and it seemed important for security. I looked on that system for the upgrade in "Downloads" but cant find it. Must have been a temp download. I do not wish to re-install boinc 7.8.3 just to get vbox and then have to upgrade plus have to fix items that need to be changed (customized) with that re-install.

I went to the VirtualBox org and they hide the .32 build and want to install 5.2.6. Looking at boinc's repository I see 5.1.26, which is older than .32.

Should I put in the recommended 5.2.6 or that last update for the old builds the 5.1.32 which is not recommended?

[EDIT] I ran the VirtualBox program on that system and it reported 5.2.6 so that must work with boinc. Unaccountably, I have another system that is still at 5.1.26. Strange that I was advised to upgrade only on one system.

I will download the latest and greatest VBOX and see if a separate install works.

[EDIT] Didn't work - Started up LHCathome and got their message from boinc that virtual box was not installed. Seems I will have to install boinc+VB.
394) Message boards : Questions and problems : Users of grcpool beware! (Message 84643)
Posted 3 Feb 2018 by Profile Joseph Stateson
Post:
Thought I would show how badly things can get screwed up. First of all, grcpool has no forum of their own. The closest forum I can find is gridcoin's forum but they (moderators) do not provide help for grcpool as it is a different entity.

Here is an example of a screw-up by grcpool. I am going to show gpugrid's "request" and "reply" but the problem is the same on all projects, not just gpugrid. AFAICT the problem is grcpool.

From "sched_request_www.gpugrid.net.xml"

    <global_preferences>
    <source_project>https://grcpool.com/</source_project>
    <mod_time>0.000000</mod_time>
    <battery_charge_min_pct>90.000000</battery_charge_min_pct>



From "sched_reply_www.gpugrid.net.xml"


    <source_project>http://www.worldcommunitygrid.org/</source_project>
    <source_scheduler>https://scheduler.worldcommunitygrid.org/boinc/wcg_cgi/fcgi</source_scheduler>
    <mod_time>1503442910</mod_time>
    <run_on_batteries>0</run_on_batteries>



On every service request, from every project, the reply from grcpool provides parameter info from "world community grid" instead of the actual project !!!
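
The comparison above can be repeated on any project with a few lines of Python. This is a minimal sketch: the two strings are fragments copied from the excerpts above (in practice one would read the sched_request and sched_reply files from the BOINC data directory), and the helper name `source_project` is mine. A regular expression is used rather than an XML parser because the excerpts are fragments, not complete documents.

```python
import re

# Fragments copied from the request and reply excerpts in this post.
REQUEST = """<global_preferences>
<source_project>https://grcpool.com/</source_project>
<mod_time>0.000000</mod_time>
"""

REPLY = """<source_project>http://www.worldcommunitygrid.org/</source_project>
<mod_time>1503442910</mod_time>
"""


def source_project(text):
    """Return the first <source_project> value in text, or None."""
    match = re.search(r"<source_project>(.*?)</source_project>", text)
    return match.group(1).strip() if match else None


sent = source_project(REQUEST)
received = source_project(REPLY)
print("request:", sent)
print("reply:  ", received)
print("mismatch" if sent != received else "match")
```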

OTOH, grcpool is the only pool available for users who cannot afford to buy (with real USD$ ) into gridcoin mining. It does provide a needed function for boinc users to get started who do not have cash to make the initial investment.

[EDIT] For what it is worth, I am not running WCG on any of my systems since switching to grcpool from BAM!

395) Message boards : Questions and problems : security hack? unauthorized gui_rpc attempt seconds after a new installation (Message 84641)
Posted 3 Feb 2018 by Profile Joseph Stateson
Post:
This is a followup.

Out of curiosity I did a re-install of Ubuntu & Boinc and did not see any unauthorized grc connection. This sort of rules out anyone at the ubuntu repository monitoring downloads of boinc and attempting to connect. Just a stupid thought or guess on my part.

However, quite by chance I noticed that other users have spotted the same problem over at gpugrid and posted about it about middle of january.

In trying to lock down my systems, I started looking at the syslogs from my AT&T Arris BGW210-700 router. That router is practically worthless as far as examining threats. I requested help with the AT&T community but it is obvious I will have to buy a real router if I want to see what the H is going on. If someone here uses AT&T maybe they can recommend a better router, at least one that actually has a manual.
396) Message boards : GPUs : Inconsistent GPU enumeration: gpus: 0,1,2.. different for additional cards (Message 84617)
Posted 31 Jan 2018 by Profile Joseph Stateson
Post:
The card sequence is reported by the BOINC client.

If the sequence is not correct add tthrottle.xml to the TThrottle folder.
An example is here: C:\Program Files\eFMer\TThrottle\examples
Use <Device_position>1;0</Device_position> (From what I remember) This should switch the two cards.


Thanks Fred! That fixed the problem with the temperatures and they now correspond to the correct GPU.

I also found that nVidia boards can also show inconsistent enumeration, not just ATI. I had a motherboard with an x1 socket closest to the CPU and the two adjacent X16 slots were filled with a pair of gtx 670s. I put an x1 to x16 riser in that x1 socket with a gtx 650 TI in the riser. I expected that the two gtx670s should be renamed to 1 and 2 respectively with the x1 becoming gpu0. Instead, boinc assigned the board so it showed up in the middle of the order: 670, 650ti, 670 for 0,1,2 respectively. However, it was not necessary to renumber using your Device_position. Both tthrottle and boinctasks had the correct order as shown below. Note that the 45 degree is GPU1 which is the 650ti. The ATIs required me to use your xml file to set the ordering.


However, this does not fix the problem where the cc_config.xml has to be edited to change the gpu_exclusions as they become incorrect when the gpus are renumbered.
397) Message boards : GPUs : Inconsistent GPU enumeration: gpus: 0,1,2.. different for additional cards (Message 84583)
Posted 29 Jan 2018 by Profile Joseph Stateson
Post:
I observed this with AMD/ATI video boards. If the motherboard has only 1 ATI card, that GPU is numbered gpu0. If a second board is added to adjacent slot, the first board becomes gpu1 and the new one is known as gpu0 to boinc 7.8.3. I have not seen this problem on nvidia boards.

For example:
Dell Z400 with RX-570 closest to cpu and HD7950 in adjacent X16 slot
Windows 10 driver (amd v17) "location path" shows "SLT2" and "SLT4" for the RX-570 and HD7950 respectively, those are the correct slots for this motherboard and SLT2 is the first X16 slot closest to the CPU. SLT4 is the next X16 slot and is further from the CPU. This system was built with the RX-570 and the HD added later.

However, boinc clearly shows the HD7950 as the "first" gpu:

    1/29/2018 7:38:26 AM OpenCL: AMD/ATI GPU 0: AMD Radeon HD 7900 Series (driver version 2527.7, device version OpenCL 1.2 AMD-APP (2527.7), 3072MB, 3072MB available, 3315 GFLOPS peak)

    1/29/2018 7:38:26 AM OpenCL: AMD/ATI GPU 1: Radeon RX 570 Series (driver version 2527.7, device version OpenCL 2.0 AMD-APP (2527.7), 4096MB, 4096MB available, 5095 GFLOPS peak)



Another example
MSI-7380 with Pair of 7950s, Gigabyte in slot closest to CPU and Tahiti LE in adjacent slot further away. The Tahiti has fewer shaders and its GFLOPS are less. Otherwise it is not possible to tell which board is 1 or 2 as boinc does not report board names. Windows drivers do not show SLTx info for this much older motherboard. Instead bus and device number are shown which I do not know how to interpret. Notice that the "weaker" board is listed first as GPU 0. The Gigabyte should have been listed first as the weaker Tahiti was added after the system was built.


    1/29/2018 9:07:32 AM OpenCL: AMD/ATI GPU 0: AMD Radeon HD 7900 Series (driver version 2527.8, device version OpenCL 1.2 AMD-APP (2527.8), 3072MB, 3072MB available, 2842 GFLOPS peak)

    1/29/2018 9:07:32 AM OpenCL: AMD/ATI GPU 1: AMD Radeon HD 7900 Series (driver version 2527.8, device version OpenCL 1.2 AMD-APP (2527.8), 3072MB, 3072MB available, 3315 GFLOPS peak)
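
To compare what boinc reports across systems without reading the event log by eye, the startup lines can be parsed. This is a rough sketch that only handles the "OpenCL: AMD/ATI GPU" line format shown in this post; the function name `parse_gpus` is mine.

```python
import re

# Matches lines like the ones quoted above and captures
# the device number, the reported name, and the peak GFLOPS.
GPU_LINE = re.compile(
    r"OpenCL: AMD/ATI GPU (\d+): (.+?) \(driver version .*?, (\d+) GFLOPS peak\)"
)


def parse_gpus(lines):
    """Return {device_number: (name, gflops)} for every matching log line."""
    gpus = {}
    for line in lines:
        match = GPU_LINE.search(line)
        if match:
            gpus[int(match.group(1))] = (match.group(2), int(match.group(3)))
    return gpus


# The two lines from the first example in this post.
log = [
    "1/29/2018 7:38:26 AM OpenCL: AMD/ATI GPU 0: AMD Radeon HD 7900 Series "
    "(driver version 2527.7, device version OpenCL 1.2 AMD-APP (2527.7), "
    "3072MB, 3072MB available, 3315 GFLOPS peak)",
    "1/29/2018 7:38:26 AM OpenCL: AMD/ATI GPU 1: Radeon RX 570 Series "
    "(driver version 2527.7, device version OpenCL 2.0 AMD-APP (2527.7), "
    "4096MB, 4096MB available, 5095 GFLOPS peak)",
]

for number, (name, gflops) in sorted(parse_gpus(log).items()):
    print(number, name, gflops)
```

When both boards report the same name, as in the second example, the GFLOPS figure is the only field in these lines that tells them apart.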



This bug is important because if you are using TThrottle to monitor temperatures, the wrong temperatures are assigned to the graphics cards. On the remote system being monitored by BoincTasks: TThrottle, Gpu-z, Radeon and windows correctly enumerate the graphics board order. Unfortunately, the temperature reported back by TThrottle is then associated with the wrong board. This also invalidates cc_config.xml configurations when gpu_exclude is being used and additional boards are being added.

Again, this does not happen with nVidia boards.

[EDIT] I only noticed this problem because I was getting hot temperatures reported for the RX-570 but the card that was really hot was the HD7950.

398) Message boards : Server programs : newbie question (Message 84557)
Posted 26 Jan 2018 by Profile Joseph Stateson
Post:
I posted a reply over at the grcpool forum. I see where maybe 32 people looked at your request for help there and no one answered. About the same number as here. I think users there think they are losing money if they help someone which is unfortunate.
399) Message boards : Questions and problems : security hack? unauthorized gui_rpc attempt seconds after a new installation (Message 84555)
Posted 26 Jan 2018 by Profile Joseph Stateson
Post:
I have the feeling it is to do with the grcpool AM.
Can you enable the gui_rpc_debug flag in cc_config.xml options, please?


No need to enable the gui_rpc_debug:

If the remote_hosts.cfg file is created and one attempts to log into that system using BM's "Select Computer" then the warning message shows up:

    jstatesonxps730 1101 1/26/2018 12:03:29 AM GUI RPC request from non-allowed address 192.168.1.220



However, I did go ahead and put that debug flag in to see what happens, but I took it out as fast as I could on account of all the normal traffic showing up in the event file.

===================================BUG REPORT=================
I found a bug when I did the above test: The computer I used when I ran the "Select Computer" BM function popped up a dialog box about once every 0.5 seconds when attempting to connect. I had trouble reading it as it lasted only about 0.25 seconds, but it was an info message to the effect that it was trying to connect to the boinc manager. It appeared and disappeared so fast that I was unable to click on either of the two button widgets in its dialog box. I ended up closing [X] the boinc manager to stop this.

400) Message boards : Questions and problems : security hack? unauthorized gui_rpc attempt seconds after a new installation (Message 84547)
Posted 25 Jan 2018 by Profile Joseph Stateson
Post:
All systems are in my home and are controlled by boinctasks from either my tablet or a desktop.
All systems running boinc use grcpool instead of BAM! That is required in order to mine using the pool.
I have a magnitude of 257 but it appears that I have a couple of weeks to go before I get their "decent balance" and can return to BAM! as my project manager. I look forward to that as I will then have full control of the project. With grcpool I cannot specify which sub-projects to run among other problems.

The desktop runs the grcresearchclient or "wallet" 24x7 and also mines.

I read where GridCoin is wanting to hire consultants. I assume they are paying in more than gridcoins. It is obvious they have problems with their manager and could use some help.

[EDIT] I just added a syslog server so I can observe traffic at my router. The log at the router showed only a few hours of activity, it should have showed days. The syslog server will record traffic I am interested to check for security problems.
401) Message boards : Questions and problems : security hack? unauthorized gui_rpc attempt seconds after a new installation (Message 84544)
Posted 25 Jan 2018 by Profile Joseph Stateson
Post:
You should also describe how this new computer is connected to the internet. Yes, IP addresses are used for GUI RPCs, just as they are used for RPCs to project servers.

But an incoming call from France? That should be caught and blocked at least twice: once by the NAT translation in your router, and again by the firewall in your operating system.


Yes, you are correct - Firewall at router should have stopped any probing. I have had a discussion with at&t about brute force hacks. Those stop at the router.

Here is a log from my offending "minimal ssh server, ubuntu 17" It shows two more attempts both from a "2" country which is registered with RIPE as I remember.


    boinc.log:24-Jan-2018 19:47:52 [---] GUI RPC request from non-allowed address 2.0.199.23
    boinc.log:24-Jan-2018 21:30:43 [---] GUI RPC request from non-allowed address 2.0.199.157
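
A quick way to separate attempts from my own LAN from outside ones is to pull the addresses out of the log and classify them. This is a minimal sketch using Python's standard `ipaddress` module; the first two sample lines are the ones above and the third is the LAN test line quoted elsewhere in this thread. The function name `rejected_addresses` is mine.

```python
import ipaddress
import re

LOG_LINES = [
    "boinc.log:24-Jan-2018 19:47:52 [---] GUI RPC request from non-allowed address 2.0.199.23",
    "boinc.log:24-Jan-2018 21:30:43 [---] GUI RPC request from non-allowed address 2.0.199.157",
    "jstatesonxps730 1101 1/26/2018 12:03:29 AM GUI RPC request from non-allowed address 192.168.1.220",
]

PATTERN = re.compile(r"GUI RPC request from non-allowed address (\S+)")


def rejected_addresses(lines):
    """Return a list of (address, is_private) for each rejected request."""
    results = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            addr = ipaddress.ip_address(match.group(1))
            results.append((str(addr), addr.is_private))
    return results


for address, is_private in rejected_addresses(LOG_LINES):
    print(address, "LAN" if is_private else "outside")
```

The two 2.0.199.x addresses are reported as outside and 192.168.1.220 as LAN.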



Note the IP addresses are different from the one I listed. The logs get replaced after a restart of boinc. My other 2 linux systems do not show any RPC requests.
What concerns me is that the "2" address may be from a proxy or some access through an existing system in my home. My other systems in house run win10 with standard windows defender and are all up to date as I am retired and have plenty of time, and interest, in keeping them that way. One of my kids has all MAC and is not into boinc. The other has win7 with security essentials and also does gridcoin and has a wallet like I do.

This is the sequence of events that happened:
1. I installed minimal ssh server naming system "jyslinux3"
2. I went to my tablet that is running win10x64 and boinctasks and added "jyslinux3" to boinctasks
3. I went back to the ubuntu system and did that apt-get install. When it completed I
edited init.d/default/boinc-client and removed the # sign from "# allow_remote_gui_rpc"
and then did a stop and start of boinc
4. I checked the boinc.log at /var/logs and spotted that "2" address. I expected to see only
192.168.whatever from the tablet but even that is a maybe as I usually have to stop and
start boinctasks to fix a connection problem when a system comes back on line.

===========what I have done since then============
1. I added password to gui_rpc_auth.cfg on all 14 systems.
2. On the 3 ubuntu systems I had to add remote_hosts.cfg. There must be a difference in how boinc handles RPC calls as this file was not needed for windows.
3. I am switching to two step authentication for google as well as other services that keep track of my passwords. I ordered a yubico neo security key to simplify the two step.
4. I attempted to change the passwords on my 3 Chinese made Zmodo cameras that are on the internet. They are behind my router but can be accessed by logging into a server that I assume is in China. I was unable to change the password. Any attempt using their Zviewer software resulted in a "blank" field for the new password and the default remained (something illegible, probably Chinese characters). I have not received authorization from them to join their forum nor have I received a response from my request for how to get rid of the default user/password. I have 3 other cameras, Amcrest, but the default user/password was changed long ago.
5. I scanned my win10 tablet but after a while I stopped it and instead will buy or use a 3rd party program in addition to, or maybe to replace, Defender.

On one occasion I added a port scan program to observe traffic on one of my windows systems. I do not know how to do that on ubuntu. Is there a debug feature I can turn on so my windows systems report unauthorized rpc access? I have never seen a message like that from windows so I assume it is disabled.

402) Message boards : Questions and problems : security hack? unauthorized gui_rpc attempt seconds after a new installation (Message 84537)
Posted 24 Jan 2018 by Profile Joseph Stateson
Post:
This seems strange and I was wondering how the person at 2.0.198.94 (French site) could possibly have known I had just installed boinc on a new ubuntu system here in USA.

I did a sudo apt-get install boinc-client on a new ubuntu 17 server system I just put together.
After that install completed I immediately rebooted and then went and checked the boinc.log; maybe 30 seconds elapsed including the normal reboot.

In the log I spotted an attempt to make a gui_rpc connection to my client that was denied by boinc.

The only thing I can think of is that the archive is being monitored by someone and they know if an install takes place.

Maybe there is a simple explanation for this, maybe not???
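
For anyone wanting to check their own log for the same thing, here is a minimal sketch that pulls IPv4 addresses out of log text and lists the ones that are not on a private or loopback range. The two sample lines are made up for illustration (the real client message wording may differ), and `non_local_addresses` is a hypothetical helper name, not part of BOINC.

```python
import ipaddress
import re

# Loose IPv4 matcher; a dotted four-part version string would also match,
# so treat the result as a list of candidates to look at, not as proof.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

# Hypothetical sample lines; the actual boinc.log wording may differ.
SAMPLE_LOG = """\
24-Jan-2018 09:15:02 [---] GUI RPC request from 192.168.1.20
24-Jan-2018 09:15:31 [---] GUI RPC request from non-allowed address 2.0.198.94
"""


def non_local_addresses(log_text):
    """Return addresses seen in the log that are neither private nor loopback."""
    found = []
    for token in IPV4.findall(log_text):
        try:
            address = ipaddress.ip_address(token)
        except ValueError:
            continue  # e.g. 999.1.1.1 is not a valid address
        if address.is_private or address.is_loopback:
            continue
        if token not in found:
            found.append(token)
    return found


print(non_local_addresses(SAMPLE_LOG))
```

An address showing up this way only says that something outside the LAN reached the RPC port; it does not say how it found the machine.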
403) Message boards : GPUs : exclude_gpu not handled properly by scheduler: gpu allowed to run dry (Message 84525)
Posted 24 Jan 2018 by Profile Joseph Stateson
Post:
I found a second case where the boinc scheduler fails to provide work units to a secondary GPU when the project has work units available. The GPU is allowed to run empty unless the work load changes on the primary (or just "other") GPU.

The first case I reported here was for 7.8.3 + vbox and has no solution other than manual intervention. That case did not involve "exclude_gpu" however.

This second case is for 7.8.3 (no vbox) but the problem seems to be related to "exclude_gpu" and "gpu_usage". I read here that for gpu_usage and cpu_usage, quote: "Note: there is no provision for specifying this per GPU type or per device"

Some history first: I have a GTX1070 TI that is set to run a single GPUGRID work unit on an older 4 core system. Tasks take about 10 hours. Frequently, GPUGRID is out of data so I enabled 3 EINSTEIN and set its resource to 0 so that when GPUGRID is out of data then Einstein takes over with

    <gpu_usage>.333</gpu_usage>
    <cpu_usage>1.0</cpu_usage>


I did not use 0.25 as this system is used for other stuff and I wanted a free core.
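
The arithmetic behind that choice can be sketched as follows. This is a simplified model, assuming the client keeps starting tasks on a GPU while the summed gpu_usage stays at or below 1.0 and budgets cpu_usage cores per running task; `gpu_plan` is a made-up helper name.

```python
import math


def gpu_plan(gpu_usage, cpu_usage, cores):
    """Return (tasks that fit on one GPU, CPU cores left over) under the simple model."""
    # Small epsilon so values like 0.25 do not round down because of float noise.
    tasks = math.floor(1.0 / gpu_usage + 1e-9)
    free_cores = cores - tasks * cpu_usage
    return tasks, free_cores


for usage in (0.333, 0.25):
    tasks, free_cores = gpu_plan(usage, cpu_usage=1.0, cores=4)
    print(f"gpu_usage={usage}: {tasks} tasks, {free_cores} cores free")
```

Under these assumptions .333 gives 3 tasks and leaves one of the 4 cores free, while 0.25 gives 4 tasks and leaves none.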

All was well and fine until I remembered I had an old gtx770 that was not being used. I knew that GPUGRID could not finish by a deadline and EINSTEIN took too many cores so I attached MilkyWay since it uses about .15 core and would easily run 3 units on that 770. I used the following cc_config to route the work units to the proper GPU:


    <cc_config>
    <log_flags>
    </log_flags>
    <options>
    <use_all_gpus>1</use_all_gpus>
    <exclusive_gpu_app>DVDFab.exe</exclusive_gpu_app>
    <allow_remote_gui_rpc>1</allow_remote_gui_rpc>
    <exclude_gpu>
    <url>www.gpugrid.net</url>
    <device_num>1</device_num>
    </exclude_gpu>
    <exclude_gpu>
    <url>https://milkyway.cs.rpi.edu/milkyway/</url>
    <device_num>0</device_num>
    </exclude_gpu>
    <exclude_gpu>
    <url>http://einstein.phys.uwm.edu/</url>
    <device_num>1</device_num>
    </exclude_gpu>
    </options>
    </cc_config>
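
A quick way to sanity-check a set of exclusions like this is to tabulate which device each project is still allowed on. The sketch below works on a trimmed copy of the config above and assumes two GPUs numbered 0 and 1; `allowed_devices` is a hypothetical helper name, and treating a missing device_num as "all devices excluded" is my reading of the client configuration docs.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the exclusions from the cc_config above.
CC_CONFIG = """
<cc_config>
  <options>
    <use_all_gpus>1</use_all_gpus>
    <exclude_gpu>
      <url>www.gpugrid.net</url>
      <device_num>1</device_num>
    </exclude_gpu>
    <exclude_gpu>
      <url>https://milkyway.cs.rpi.edu/milkyway/</url>
      <device_num>0</device_num>
    </exclude_gpu>
    <exclude_gpu>
      <url>http://einstein.phys.uwm.edu/</url>
      <device_num>1</device_num>
    </exclude_gpu>
  </options>
</cc_config>
"""


def allowed_devices(xml_text, device_count=2):
    """Map each excluded project URL to the device numbers it may still use."""
    root = ET.fromstring(xml_text)
    all_devices = set(range(device_count))
    excluded = {}
    for node in root.iter("exclude_gpu"):
        url = (node.findtext("url") or "").strip()
        device_text = node.findtext("device_num")
        if device_text is None:
            # Assumption: no device_num means every device is excluded.
            excluded.setdefault(url, set()).update(all_devices)
        else:
            excluded.setdefault(url, set()).add(int(device_text))
    return {url: sorted(all_devices - devices) for url, devices in excluded.items()}


for url, devices in allowed_devices(CC_CONFIG).items():
    print(url, "-> devices", devices)
```

With the config above this leaves GPUGRID and Einstein on device 0 only and MilkyWay on device 1 only, which is the intended routing.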



All seemed well and good for several days. The MilkyWay queue had about 70 tasks, Einstein had about 40 or so. It even worked fine with GPUGrid when that project had work available. The Einstein tasks stopped and were waiting to run, of course, and even MilkyWay seemed to work perfectly UP UNTIL ITS QUEUE RAN OUT. The boinc scheduler failed to ask the milkyway project for more work!

I noticed that GPUGrid was running, there were 7 Einstein "waiting to run" but there were no MilkyWay tasks running on the gtx770. On a hunch I suspended the "waiting to run" Einstein and sure enough, the boinc scheduler then asked Milkyway to provide tasks.

I think this is a design problem in the scheduler. I can get around this on my end by allowing a larger queue size for milkyway but I think there is a limit to how many tasks they send out.

A coding hack would be to have the scheduler check to see if THERE ARE ANY EMPTY GPUs and authorize a download for that GPU from the last project that used it. A better fix would be to allow a provision for specifying usage per GPU type, which is currently not done if I understand this correctly.

[EDIT] I will look at setting various resource values differently among these 3 projects to see if that helps but I suspect there is a scheduling problem.

404) Message boards : Questions and problems : Moo! Wrapper can't create account (Message 84500)
Posted 23 Jan 2018 by Profile Joseph Stateson
Post:
I would like to add two thoughts to this discussion.

(1) I got into a dispute about how bad the effect of the increase in CO2 was to global warming. Someone on the thread said I was a climate denier and put me in that stop spam list. It took me a long time to get out of that list as I could not even register with stop spam as my email was blacklisted. I was also blocked from several forums in the meantime. It took 2 weeks to get out of that list.

(2) moo wrapper is a waste of computing power. As soon as I find out how much it will cost me to cast a vote, I will vote to blacklist them.

https://www.reddit.com/r/gridcoin/comments/7qre20/moo_wrapper_is_this_a_scientific_project/
405) Message boards : Questions and problems : is there an exclude_cpu or ignore_cpu? (Message 84499)
Posted 23 Jan 2018 by Profile Joseph Stateson
Post:
Wanted to follow up on this post to explain more fully how I fixed the problem and why it is a grcpool defect.

I used boinctasks but I assume boinc manager behaves the same way: When using a project manager such as BAM!, I have noticed that I cannot detach a project from my local manager. I have to go through the project manager to do a detach. Apparently, this is not the case with grcpool's manager.

I checked the box to detach milkyway from the grcpool manager. That didn't work; it should have. I then found out I could issue a local command to detach. This should not have happened as grcpool is the manager. Unless policies have changed recently, BAM! would have never let me detach that way.
406) Message boards : Questions and problems : don't need (CPU: job cache full; NVIDIA GPU:) NOT TRUE:one of the GPUs is unused (Message 84498)
Posted 23 Jan 2018 by Profile Joseph Stateson
Post:

    5775 PrimeGrid 1/23/2018 10:28:59 AM Not requesting tasks: don't need (CPU: job cache full; NVIDIA GPU: )



I have seen this on a number of occasions. It eventually fixes itself, but when it does show up, forcing an update still results in the message that the NVIDIA queue is full when it isn't.

This system has 2 identical gpus, both gtx670, and in addition I set cc_config to allow multiple gpus.

This problem can be solved by waiting out the remaining PrimeGrid work unit. When it completes the following shows up


    JYS-EVGA-670

    5783 PrimeGrid 1/23/2018 10:43:43 AM Computation for task pps_sr2sieve_107990833_1 finished
    5784 PrimeGrid 1/23/2018 10:43:44 AM Sending scheduler request: To fetch work.
    5785 PrimeGrid 1/23/2018 10:43:44 AM Requesting new tasks for NVIDIA GPU
    5786 PrimeGrid 1/23/2018 10:43:45 AM Started upload of pps_sr2sieve_107990833_1_0
    5787 PrimeGrid 1/23/2018 10:43:46 AM Finished upload of pps_sr2sieve_107990833_1_0
    5788 PrimeGrid 1/23/2018 10:43:46 AM Scheduler request completed: got 2 new tasks
    5789 PrimeGrid 1/23/2018 10:43:48 AM Starting task pps_sr2sieve_107991039_1
    5790 PrimeGrid 1/23/2018 10:43:48 AM Starting task pps_sr2sieve_107991329_0
    5791 PrimeGrid 1/23/2018 10:43:56 AM Sending scheduler request: To report completed tasks.
    5792 PrimeGrid 1/23/2018 10:43:56 AM Reporting 1 completed tasks
    5793 PrimeGrid 1/23/2018 10:43:56 AM Not requesting tasks: don't need (CPU: job cache full; NVIDIA GPU: )



Look at 5783 to 5788: that one task completed and then I got two more. Seems ok.
Look at 5792 to 5793: one of the PrimeGrid tasks completed pretty fast but it thinks the NVIDIA queue is full so that GPU is now empty! Could it be checking if the queue is full before releasing the finished work unit?

It will continue running that single task until it completes and then download another 2.
I had suspended all other tasks on the core 2 quad system to ensure the cores were all available.
Not sure if it matters, but 3 einstein nvidia tasks had completed about an hour earlier but due to project maintenance they could not be uploaded. I don't think this caused this problem but I thought I would mention it. I am also excluding all other PrimeGrid tasks using cc_config. Don't think this matters though. Boinc (v)7.8.3
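
To find every place this happens in a long event log, the "don't need" lines can be scanned for a resource whose reason is blank, as with "NVIDIA GPU: )" in 5793. This is only a sketch that flags such entries; it assumes the line layout shown in the excerpt above, and `empty_reasons` is a made-up helper name. It does not explain why the client left the reason empty.

```python
import re

# Two lines copied from the excerpt above.
LOG = """\
5792 PrimeGrid 1/23/2018 10:43:56 AM Reporting 1 completed tasks
5793 PrimeGrid 1/23/2018 10:43:56 AM Not requesting tasks: don't need (CPU: job cache full; NVIDIA GPU: )
"""

PATTERN = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+.*Not requesting tasks: don't need \((.*)\)\s*$"
)


def empty_reasons(log_text):
    """Return (message number, project, resource) for each resource listed with no reason."""
    hits = []
    for line in log_text.splitlines():
        match = PATTERN.match(line)
        if not match:
            continue
        number, project, reasons = match.groups()
        # Reasons look like "CPU: job cache full; NVIDIA GPU: ".
        for part in reasons.split(";"):
            resource, _, reason = part.partition(":")
            if not reason.strip():
                hits.append((int(number), project, resource.strip()))
    return hits


print(empty_reasons(LOG))
```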

407) Message boards : Questions and problems : is there an exclude_cpu or ignore_cpu? (Message 84487)
Posted 23 Jan 2018 by Profile Joseph Stateson
Post:
I detached and then re-attached. This fixed it. I waited until most of the tasks completed. AFAICT there is no way to exclude non-gpu tasks. This was NOT a problem with "lost" tasks. It is a grcpool manager implementation.
408) Message boards : Questions and problems : is there an exclude_cpu or ignore_cpu? (Message 84480)
Posted 22 Jan 2018 by Profile Joseph Stateson
Post:
Jord: the option is "No CPU" and I have it checked. All my other system have "No CPU" checked for MilkyWay and they do NOT get cpu tasks. Something is wrong.

Could this be a case of "lost tasks" that are being resent? Maybe that avoids looking at the [X] that says do not send CPU tasks.

409) Message boards : Questions and problems : is there an exclude_cpu or ignore_cpu? (Message 84478)
Posted 22 Jan 2018 by Profile Joseph Stateson
Post:
Unaccountably, I cannot stop downloading n-body cpu type tasks on one of my systems processing MilkyWay. This is unique as on my other systems I am able to prevent cpu tasks from downloading when I don't want them. Rather than detach and re-attach to fix the problem I was hoping to figure out what went wrong and why this particular system has a problem in case the others start misbehaving.

Temporarily I deleted the executable applications from the milkyway project directory. These CPU tasks no longer run, but they do keep downloading.

I looked through the wiki on client configuration and see only gpu exclusions.

Is there a way to specify a cpu app from being downloaded in cc_config or app_config?

I assume this is punishment for using GRCPOOL to handle my downloads.
410) Message boards : Questions and problems : Is there a "device_nums" for app_config? (Message 84443)
Posted 20 Jan 2018 by Profile Joseph Stateson
Post:
I want 4 Einstein on the 4gb board but only 2 on the 2gb board.
Make sure the 4GB board is #0, then use app_config to set a maximum of 6 concurrent tasks. If you're lucky #0 is filled first, resulting in a 4+2 distribution.



This worked but I could not use it as I only had 4 cores and 6 einstein was too many for this system.

So I set it to a max of 4 concurrent tasks and ran into a scheduler problem with milkyway.
When the last milkyway job completes it reports the nvidia job queue is full (or some such wording) and no more milkyway jobs are downloaded. If I temporarily suspend einstein then milkyway downloads a boatload of tasks but eventually they are all processed and the second video board is not being used.

I think this is a bug in the scheduler as the other project is not aware the first one is limited to only 4 tasks and thinks the gpu queue is full

I ended up excluding einstein from the smaller board. The 2 additional milkyway tasks consumes very little cpu so this system works ok with 4+2
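
For reference, an app_config.xml with a concurrency cap can be generated with a few lines of Python. This is a sketch only: "example_gpu_app" is a placeholder and must be replaced with the project's real application name, and `build_app_config` is a hypothetical helper. Note that max_concurrent caps the application as a whole, not per device, which is the limitation discussed in this thread.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, max_concurrent, gpu_usage, cpu_usage):
    """Return app_config.xml text for one application."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    gpu = ET.SubElement(app, "gpu_versions")
    ET.SubElement(gpu, "gpu_usage").text = str(gpu_usage)
    ET.SubElement(gpu, "cpu_usage").text = str(cpu_usage)
    if hasattr(ET, "indent"):  # pretty-printing needs Python 3.9 or later
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# Placeholder app name; 4 concurrent tasks at a quarter of a GPU each.
print(build_app_config("example_gpu_app", 4, 0.25, 1.0))
```

The resulting file goes in the project's directory, and the client has to re-read its config files (or be restarted) before the change takes effect.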
411) Message boards : Questions and problems : Is there a "device_nums" for app_config? (Message 84434)
Posted 19 Jan 2018 by Profile Joseph Stateson
Post:
Looking through the wiki here, I do not see anything like <device_number> or <device_nums> for app_config.xml

I have two similar nvidia boards but one has half the memory of the other. I can run 4 concurrent tasks from a certain project on the larger board but only 2 concurrent on the smaller board from that same project (they would all error out if 4 tried to run).

I do not see how to do that. The best I can think of is to exclude the project from the smaller board and use another project instead

for example: I want 4 Einstein on the 4gb board but only 2 on the 2gb board.

what I managed to do was 4 Einstein on the larger and 2 Milkyway on the smaller. I had to exclude each project from the corresponding "other" board.
412) Message boards : Questions and problems : Users of grcpool beware! (Message 84431)
Posted 18 Jan 2018 by Profile Joseph Stateson
Post:
You are absolutely correct about grcpool, as I found out by asking some simple questions at their forum and not getting any answers.

I joined 2 weeks ago and am 1/2 way to my 2000 coin minimum before I can get out and go back to boincstats and do solo mining. In the mean time I have been helping newbies like myself at their forum but it seems I am the only one interested in helping.
413) Message boards : BOINC client : ubuntu: password for boinc? (Message 84034)
Posted 27 Dec 2017 by Profile Joseph Stateson
Post:
Yes, that worked. Just was wondering if a real user account was created where I could just change identity to boinc.

I just switched to grcpool project manager and unlike boincstats they take ownership of the system at the various projects. This caused problems at Einstein as I had specified 1.0, .5, 0.33 and 0.25 for various venues depending on the system. grcpool does not support venues so all my projects defaulted to 1 WU per nvidia board. I had to go to each Einstein project directory and put in the app_config with the correct gpu/wu count. I have not used ubuntu for a while and had forgotten even where the directory was.
414) Message boards : BOINC client : ubuntu: password for boinc? (Message 84025)
Posted 25 Dec 2017 by Profile Joseph Stateson
Post:
I read that a user named BOINC is created during installation and has ownership of boinc items. I assume there is a password?

I wish to create an app_config.xml file at
/var/lib/boinc-client/projects/..whatever..

Generally, when I mess around in places other than my home directory I get permissions so wrong that things that used to work don't work anymore. I got to thinking that if I could log in as "boinc", the permissions would be good to start with and then I can create that file with no worries. I tried logging in and also "su boinc" but alas I need a password it seems.

other suggestions are welcome.

thanks for looking
415) Message boards : GPUs : 1 of 2 WUs suspended due to activity on another GPU. (Message 83920)
Posted 17 Dec 2017 by Profile Joseph Stateson
Post:
I have a mixed pair of GPUs: gtx670 and gtx770 with both enabled for use with BOINC on a win10x64 system. The system is core 2 quad so only 4 CPUs available. The GPUs both have 2gb and the same architecture so no large mismatch on the graphics boards.

They have been running a pair of Einstein tasks each using 0.5 for GPU and 1.0 for CPU. While each task ran slower, the fact they were running simultaneously made up for the difference in speed.

I allowed one PrimeGrid to run as a test. I expected that the board it would run on would have its 2 Einstein tasks bumped off into suspension. Not only were they bumped off, but one of the 2 Einstein tasks on the other board was also suspended. Thus I had only 2 active tasks, 1 on each GPU and 2 unused CPU cores.

This should not have happened. Why was one of the Einstein tasks suspended from that other graphics board?

[EDIT]
I am running BOINC 7.8.3. Also want to mention that I did not suspend any of the Einstein tasks, I simply enabled "allow more work" on the PrimeGrid and a total of 6 WUs downloaded before I could revert back to "no more work". However, only 1 of the 6 WUs started.
416) Message boards : Questions and problems : remote connection failed unless I started using REMOTE_HOSTS.CFG (Message 74689)
Posted 8 Dec 2016 by Profile Joseph Stateson
Post:
[SOLVED] Juha's thought about the cc_config.xml had me look at that file and I discovered that when I used the boincmgr dialog box option to exclude a program that cc_config file got created, with all defaults, and sure enough the remote gui option was set to "0". This negated my launch command --detach --allow_remote_gui_rpc. Also want to point out I was a member of seti since 1999, not 2009.
417) Message boards : Questions and problems : remote connection failed unless I started using REMOTE_HOSTS.CFG (Message 74668)
Posted 7 Dec 2016 by Profile Joseph Stateson
Post:
I use "\boinc.exe" --detach --allow_remote_gui_rpc in the windows registry to start boinc and do not use that cc file unless boinc ignores one or more of my GPUs. The boincmgr and boinctray keys are deleted as these systems are part of a farm and only need boinc.exe.

I cannot account for why this system requires that hosts file. I had something strange happen while testing the first install. I got an error -155 when attempting to execute boinccmd.exe, "Authorization Denied" or something like that. That was in the program file\boinc directory so it should have worked. However, after reinstalling boinc a second time, that error went away. I originally executed the boinc install program using "run as administrator" but since I am the administrator that should have made no difference whether I selected it or not (one would think).
418) Message boards : Questions and problems : Seti@home installed on 4xCPU AMD Athlon II, not doing much (Message 74634)
Posted 6 Dec 2016 by Profile Joseph Stateson
Post:
Are you using the screen saver option? If running all the time, check that the threshold for "non boinc cpu" busy is 90%, not 25%, and/or set the number of cores to use at 75% so one is always a "non boinc" core.

HTH
419) Message boards : Questions and problems : remote connection failed unless I started using REMOTE_HOSTS.CFG (Message 74633)
Posted 6 Dec 2016 by Profile Joseph Stateson
Post:
Been running a SETI farm since about 2009 and never had this problem before. My second windows 10 system, also running 7.6.22, would not allow a remote connection unless I created REMOTE_HOSTS.CFG and put in the ip address of the remote. I spent two days looking at this, trying re-installs, removing / adding firewall, testing with telnet to port 31416, etc. Tried both boinctasks and the "select computer" option in boincmgr. I did notice that boinctasks tried connecting to both boinc and the associated tthrottle and that tthrottle did make a connection (port 31417) but the connection timed out for boinc (31416). I then tried creating that remote host config file and sure enough was able to make a connection. Was wondering if anyone has seen this problem before. I have created at least 100+ SETI and later BOINC systems and this is the first time I had to use that cfg file. My normal setup is an empty (0 size) gui_rpc_auth.cfg file and a non-existent hosts cfg file so I can check my systems from any computer running boinctasks.
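
The telnet test mentioned above can be scripted when there are many hosts to check. A minimal sketch, assuming the default ports named in the post (31416 for boinc, 31417 for tthrottle); `rpc_port_open` is a made-up helper and 192.168.1.151 is just an example address. A successful TCP connect only shows the port is reachable, not that the client will accept RPCs from that machine.

```python
import socket


def rpc_port_open(host, port=31416, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Example usage (address is illustrative):
#   for port in (31416, 31417):
#       print(port, rpc_port_open("192.168.1.151", port))
```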
420) Message boards : Questions and problems : boinccmd --host --passwd problem (Message 63248)
Posted 28 Jul 2015 by Profile Joseph Stateson
Post:
gui_rpc_auth.cfg is empty on the systems in my boinc farm as I don't use a password. I was unable to use boinccmd.exe to access them and I tried the following before giving up and putting in a password.

boinccmd --host 192.168.1.151 --quit
boinccmd --host 192.168.1.151 --passwd "" --quit
boinccmd --host 192.168.1.151 --passwd '' --quit

The first 2 above gave failure to read() error, the 3rd one said the password was wrong.

I also tried 192.168.1.151:31416

I went and put 1234 in that gui_rpc_auth file on that 151 host, rebooted, and the following worked
boinccmd --host 192.168.1.151 --passwd 1234 --quit

It seems to me that leaving off the --passwd should have worked but that is just my 2c
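
When driving boinccmd from a script, building the argument list directly avoids shell quoting surprises. On Windows, cmd.exe does not treat single quotes as quoting characters, so the '' in the third attempt was probably passed through as a literal two-character password, which may be why that one reported a wrong password instead of a read error. The helper below is a sketch; `boinccmd_args` is a made-up name, and only the --host, --passwd and --quit options already used in this post appear.

```python
import subprocess  # only needed if the commented-out call below is enabled


def boinccmd_args(host, command, password=None, port=None):
    """Build an argv list for boinccmd; --passwd is omitted when no password is given."""
    target = f"{host}:{port}" if port else host
    args = ["boinccmd", "--host", target]
    if password is not None:
        args += ["--passwd", password]
    args.append(command)
    return args


print(boinccmd_args("192.168.1.151", "--quit", password="1234"))

# To actually run it (needs boinccmd on the PATH and a reachable client):
#   subprocess.run(boinccmd_args("192.168.1.151", "--quit", password="1234"), check=True)
```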
421) Message boards : BOINC client : another task switching problem (Message 58245)
Posted 27 Nov 2014 by Profile Joseph Stateson
Post:
Obviously, this is a problem with project code but I am trying to get a handle on what is happening.

Occasionally a bitcoin utopia GPU task gets hung at 99 or 100 % and runs for hours instead of seconds. This can be cleared up by using BM to pause the GPU or using
    boinccmd --set_gpu_mode never 10

as suggested here; both restart the task. However, I have other GPU projects enabled on this system and if they had been given a time slice as is usual, the restart (BCU does not checkpoint) should have cleared the bitcoin task for me. Why didn't they get a slice? I plan on bringing this problem up at the project forum.

422) Message boards : BOINC client : possible problem task switching every hour?? (Message 52939)
Posted 3 Mar 2014 by Profile Joseph Stateson
Post:
OK, I now know what happened and how I caused it. Yesterday, on a Linux system, I noticed that DistRT was using very little CPU processing so I set BC to 100% cpu to add the 4th core to the processing capability. This linux system is USB based and only runs asteroids and DistRT under cuda 1.1. That change, from 75% to 100%, caused the benchmark to run; I could tell from watching BoincTasks, as BT showed the suspension and the benchmark. This morning I noticed the DistRT was taking too long and after aborting, I then checked my dual ATI 5850 and sure enough DistRT has been running for 3 days with progress stuck at 99%. I had done the same thing on that system a few days ago: made a change to the number of cores which caused the benchmarks to run.

What got my attention to the problem was the very low temperatures that the GPU was reporting plus very low CPU utilization.

BTW, my Linux box reports temps to BoincTasks from the ubuntu "sensors" app. I had to mod that app to send the temp data to BT's 31417 port on my monitoring system that runs BT.

Anyway, I now have a handle on what happened and how to avoid it.
423) Message boards : BOINC client : possible problem task switching every hour?? (Message 52936)
Posted 3 Mar 2014 by Profile Joseph Stateson
Post:
Thanks Richard, I think that explains what happened. I also discovered that after I aborted the two stuck DistRT tasks, the milkyway tasks all got computation errors. Seems aborting the DistRT must have left their code in the gpu, which caused milkyway to fault.

I rebooted instead of just "re reading" the cc_config file and the milkyway tasks are now running correctly.

As far as GPUGRID is concerned, I still have problems with their "long" tasks that lock the display and the entire computer following a power glitch or outage. As a result, I only run GPUGRID tasks on systems that I have a monitor on as it is difficult to reset their project before the NVidia driver crashes on powering up. Their suggestion was to put a 60 second delay before the project starts up to allow time to reset the project but that is not always workable as well as a PITA. It may be that they have fixed this problem since last month but I don't want to test it by pulling the plug.
424) Message boards : BOINC client : possible problem task switching every hour?? (Message 52932)
Posted 3 Mar 2014 by Profile Joseph Stateson
Post:
ok, so there is no preemptive switching of tasks when the gpu is involved. BC must ask the app to give up and never gets a response.


Anyway, it happened again and this time I looked at the DistRT web site, as their project is the cause, and sure enough there are complaints and a solution. It seems that if a cpu benchmark is run then the gpu dries up and gets locked waiting for data to arrive and cannot recover from the temporary lack of data. This is a bug in their code but they claim other projects do the same so that excuses it. I made the recommended change to the cc file to not do any benchmarks.

However, I do not remember specifically running any benchmarks but it seems that must have happened, which caused their apps to hang "forever"
425) Message boards : BOINC client : possible problem task switching every hour?? (Message 52709)
Posted 22 Feb 2014 by Profile Joseph Stateson
Post:
I run boinc from start using --detach and do not use BM. I assume the client forces the task switch and not BM.

OK, the problem: I just checked one system and noticed that two tasks which normally take 3 hours each (rainbow tables) on my 2 HD-5850 were obviously hung as the combined time was shown by boinctasks to be over 3 days. There were 43 milkyway tasks and 15 rainbow ready. I aborted the two hung tasks and observed 2 milkyway start up. I then checked "messages" and also checked at the milkyway site and for the last 24 hours there were no milkyway tasks uploaded. Milkyway tasks take only 15 minutes to execute so it would appear that they never got a time slice while the rainbow table tasks were hung. Obviously, there is a problem, but it seems to me that the other tasks should have received a slice every hour.

I just recently started processing those rainbow tables as they perform very well on my (old) 5850. However, looking thru their web site, I am concerned that they are providing tables for hackers as all of the big crunchers are in china.
426) Message boards : BOINC client : Android cannot use BAM (Message 52230)
Posted 30 Jan 2014 by Profile Joseph Stateson
Post:
Thanks Jord!

I don't know what it was I downloaded as the play store showed several "boinc" apps. I have since uninstalled.

I suspect tablets are better for this app than cell phones. Asteroidsathome seems to be one of the apps that runs better on cpu than on gpu so I was wondering if my android would be useful. I leave it plugged in over night so I thought I would try it out. I suspect it is too soon, and the software is not ready yet for my phone.

I looked at my media player "pivos aios" which runs a version of android but it is only 500mhz and not an ARM.

Hmm..... I just reviewed the AIOS specs and the company seems to have picked up an injunction to prevent selling it as the AIOS allows playing ISO BluRay images. That is what I got it for, didn't know it was illegal! I will not chance burning it out with boinc 24/7.
427) Message boards : BOINC client : Android cannot use BAM (Message 52228)
Posted 30 Jan 2014 by Profile Joseph Stateson
Post:
This is an old post but I am new to boinc on android and would like to add my 2c and then some.

I brought up play store and downloaded "boinc" on my galaxy IIIs and got "nativeboinc" which I assume is the official boinc.

The status screen displayed
"suspended until battery is over %90: currently %100"

and "waiting project initialization" on the project page even though the task (asteroids) had been running for many hours.

Event log showed the task had been running for several hours and checkpointing about every 60 seconds so obviously the status page is incorrect.

I tried suspending the project to see if the dialog button would change to "resume" but it didn't and boinc is currently stuck "reading projects" and a full reset failed to fix the problem. After restarting, the status page now no longer shows anything other than "reading". The event log shows null pointers when accessing the status page and I will be uninstalling the app as it seems too many problems too soon for my android which has 2 cores and 2gb memory (the sprint version of the IIIs).
428) Message boards : BOINC client : Does BOINC check the remote manager at startup? (Message 51724)
Posted 14 Dec 2013 by Profile Joseph Stateson
Post:
Yea - pretty sure now that boinc just could not make the connection. At the gpugrid project there was a suggestion to put a 30 second delay somewhere into cc_config to allow more time to "fix" the problem.

Setting the "project suspend" flag at BAM! would have worked and did on my systems right on the router but failed on the ones I had on the A/C power line ethernet as that, I am guessing, had not recovered properly.
429) Message boards : BOINC client : Does BOINC check the remote manager at startup? (Message 51710)
Posted 12 Dec 2013 by Profile Joseph Stateson
Post:
I am using BAM! and boinc.exe is launched using --detach --allow_remote_gui_rpc

Note: BM and BoincTray are not being used and were edited out of the registry.

For local management I use BT (Boinctasks).

Recently, due to a bug in the gpugrid CUDA handler, I went to BAM! and checked the box that should have caused the gpugrid project to become suspended. This worked fine on 2 out of 5 systems, but on the 3 it didn't work on I had to attach a monitor and keyboard, bring up safe mode and delete any gpugrid files down in ProjectData. These systems ran the gpugrid project seemingly before checking to see if the project was suspended. This caused the system to freeze because of the CUDA bug.

Anyway, I was wondering if boinc.exe checks the remote manager before launching the project. Maybe this feature is only in BM and I am not running BM.
430) Message boards : GPUs : problem uninstalling ATI's opencl need help (Message 51142)
Posted 4 Nov 2013 by Profile Joseph Stateson
Post:
Solved - Thanks!

Re-install of nVidia driver fixed it.

Took a while to verify that the driver install worked because of MURPHY'S LAW. After the re-install I happened to look at the event log.

A quick check of the event log showed numerous memory errors that were corrected as well as a huge number of IDE controller (not disk) errors. There was no drive on the controller port generating the error which was suspicious. Opening the case (probably 6 months since last time) showed several of the ERCC modules were lifting out of their socket. The locking clips were clearly moved back from their locked position. I pulled all the chips and cleaned them and blew out the slots. One pad on a stick was partially corroded but I managed to clean it and booted "memtest86" with ercc enabled and all sticks passed. I cloned the boot disk from IDE to SATA and disabled the IDE entirely. Took 10 hours before system was back processing. MW tasks are running just fine. I need to check event logs on headless systems more frequently.

431) Message boards : GPUs : problem uninstalling ATI's opencl need help (Message 51127)
Posted 3 Nov 2013 by Profile Joseph Stateson
Post:
Thanks Claggy. Unfortunately, it didn't help. After rebooting I "resumed" milkyway and got some more tasks and they immediately had a computation error. The project has already been reset. I don't know what else to try. Currently the GPUs work fine with collatz and prime grid. The same type of milkyway "fit" work units all process fine on another pair of gtx460 but that system is a new one with an intel cpu and never had any ATI stuff.

I put two links above for working and non working but I don't see any difference like library names, versions, etc, so I am at a loss as to what is happening.

EDIT - Just detached and re-attached. Still getting errors and giving up on MW for this system.
432) Message boards : GPUs : problem uninstalling ATI's opencl need help (Message 51125)
Posted 3 Nov 2013 by Profile Joseph Stateson
Post:
I tried removing the AMD catalyst set using "express uninstall manager" but am still having apparent opencl problems on project milkyway. There was a suggestion at their forum
Many people in BOINC-land advise that you remove all traces of all AMD drivers before you change to a different one. AMD seem to have heard them, and produced a driver removal tool. The only official-looking link Google could find for me is http://sites.amd.com/us/game/downloads/Pages/catalyst-uninstall-utility.aspx, but that's asking me for a user/password login, which I don't have. Maybe somebody else here does?

but I had the same problem accessing the utility.

It would appear that there is still some opencl ATI driver somewhere causing a problem. The offending system is an old Tyan S2892 server with onboard ATI video (unused and disabled via jumper) with two gtx460 and win7x64pro. It would appear I will have to use some driver cleaner to get rid of ATI opencl. Alternatively, there is some other problem. Maybe a guru can spot another problem here. This system runs 7.0.28. I can try a newer version of BOINC. It seems to me that either the project or BOINC should be handling this issue.
433) Message boards : GPUs : CUDA work properly when video board has artifacts on screen? (Message 47584)
Posted 30 Jan 2013 by Profile Joseph Stateson
Post:
Milkyway failed - never got past 0.0 % and the amd driver reset. The driver reset occurred first which probably caused the hang at 0.0

Anyway, the board is no good. It used to be an xfx gtx280 but it failed under warranty and xfx sent me this hd4890 as replacement. All together they lasted for about 3 years running 24/7.
434) Message boards : GPUs : CUDA work properly when video board has artifacts on screen? (Message 47567)
Posted 30 Jan 2013 by Profile Joseph Stateson
Post:
I have several old video boards that don't work. After a suggestion on the eVga forum, I baked the boards at 200C for 10 minutes. One of the boards (hd4890) actually seems to have been salvaged. It used to not work, but now it works with artifacts. I am using it as "device 1" and have a gts250 as "device 0" so the display is good, i.e. if I put a monitor on the hd4890 it shows green dots and other artifacts.

Currently it is running collatz but has not yet completed a work unit. Collatz will validate against a wingman so if the artifacts cause a problem then I assume I will find out eventually. Are there any other gpu projects that can validate a workunit w/o a wingman so I can tell if the board is going to work?

BTW, all the boards that had chips on the back side had the chips fall off when they were baked.
435) Message boards : GPUs : Dummy monitor connector caused analog to die (Message 47480)
Posted 23 Jan 2013 by Profile Joseph Stateson
Post:
I just lost the analog video on an HD5850. No telling when it first went out as it seemed to be working fine, and is still working perfectly, crunching GPU tasks. I only noticed the problem when I did some GPU board shuffling and it ended up being the primary monitor. It works fine when connected to an HDMI, DP or DVI monitor. I don't have any DP or DVI monitors, but I used adapters to an HDMI TV to verify the video was working.

I suspect it would be cheaper to buy a DVI monitor than to send the board in to HISDigital to get the analog video fixed. It has been 4 days since I asked their support about a repair fee and they have not answered.

I bought two dummy adapters on amazon for nvidia boards some time ago. Later, nvidia drivers changed behavior and those plugs were not needed. At some point I remember putting one on the ATI board and I must have left it on there, which seems to have been a bad idea. The plug did allow me to bring up the Catalyst control panel to make a change, but I should have pulled the plug off afterwards. Their control panel won't show performance tools unless the plug (or a monitor) is connected.
436) Message boards : BOINC client : BOINC 7.0.40-42 and new app_config.xml (Message 47410)
Posted 20 Jan 2013 by Profile Joseph Stateson
Post:
What I was looking for was not a gui change but just another RPC call such as

get_project_file("www.worldcommunitygrid.org","app_config.xml", char* etc

or

set_project_file("boinc.fzk.de_poem","app_data.xml",char* etc

However, over at boinctasks, Fred just posted that something like this had already been rejected.

I can always share project folders across a network for drag and drop, but IMHO boinctasks could easily handle this and be convenient and safe if the rpc was available.
437) Message boards : BOINC client : BOINC 7.0.40-42 and new app_config.xml (Message 47399)
Posted 19 Jan 2013 by Profile Joseph Stateson
Post:
I have not used app_config.xml yet as I am still on 7.0.28. However, I was wondering if there will be a get_app_config and set_app_config rpc call like the get and set cc_config? BTW, I don't see the set_cc_config documented in the rpc wiki but that is a separate problem.

The reason I was asking was I was thinking it would be nice if the set and get cc_config rpc calls could be used to obtain (read) and be able to update the app_config and the app_info files. Seeing as both go into the project directories it seems a convenient way to update any of them.

For Example --

I was advised at the project POEM that if I wanted to run POEM on a system that had two nVidia (or ATI) boards and not get constant "restart" errors (they claim a boinc bug) that I need to add XXX to cc_config and YYY to app_info (refer to thread above for the XXX and YYY)

Anyway, some of my systems are headless and I use BOINCTASKS which can retrieve cc_config.xml, allow edits, and update it all using rpc calls.

It would be nice to have app_info and app_config sections in "cc_config" to allow Fred's program to update those xml files.

my 2c.


[EDIT]
1. get_cc_config would obtain each (if any) app_info and app_config from each project directory and put it into a new section "ExternalAppsXML" in cc_config. The corresponding set_cc_config would store the contents of that "ExternalAppsXML" section at the corresponding project locations (listed in that file) or something like this :-)
438) Message boards : GPUs : Can BOINC tell the difference between ATI & NVIDIA Open CL? (Message 47360)
Posted 18 Jan 2013 by Profile Joseph Stateson
Post:
After pulling an HD5850 from a system that also had a gtx460, opencl would not work even though the utility "gpu cap viewer" said it was available and it had worked on the other gtx460 before the ATI card was pulled.


The above utility has an "OpenCL demos" popup that would not run, claiming that I "did not have a platform that supported open cl". MilkyWay also said I had no opencl.

The above demo, and MilkyWay, worked fine after I uninstalled the ATI video driver. Obviously, I should not have a driver for a missing video card. However, it seems to me that a 3rd party program should be able to determine if opencl is available for nvidia even if it is not available for ati.

my 2c tells me there are a lot more important things to fix besides this.
439) Message boards : GPUs : "This GPU does not support openCL" (Message 47357)
Posted 17 Jan 2013 by Profile Joseph Stateson
Post:
It might be useful to followers here to look at a thread I posted over at MilkyWay

Basically, after I pulled an ATI HD5850, MW no longer processes any gtx460 tasks even though it used to do it. I assumed that MW app is mis-identifying the gpu but it now seems that BOINC is reporting no opencl but associating that "error" with nvidia instead of the missing ATI board.
440) Message boards : BOINC client : Anyone running ubuntu in 4gb flash with latest boinc & cuda? (Message 47252)
Posted 14 Jan 2013 by Profile Joseph Stateson
Post:
OK, I found this article about minimum installation

http://maketecheasier.com/install-a-minimal-ubuntu-on-old-laptop/2012/02/24

and a 40mb download for the minimal install 12.04 even.

https://help.ubuntu.com/community/Installation/MinimalCD


It will be painful, but I suspect I can get the 32bit libraries (for 32 bit apps) and also the cuda library. I will see how far I can get. Maybe there is a boinc support group at that newbie ubuntu help site. Hopefully apt-get is included in the minimal install.
441) Message boards : BOINC client : Anyone running ubuntu in 4gb flash with latest boinc & cuda? (Message 47248)
Posted 14 Jan 2013 by Profile Joseph Stateson
Post:
Yea - probably could switch to USB. I am currently using CompactFlash in an IDE to CF adapter that plugs right onto the motherboard on these old systems. I think some can boot USB.
442) Message boards : BOINC client : Anyone running ubuntu in 4gb flash with latest boinc & cuda? (Message 47239)
Posted 14 Jan 2013 by Profile Joseph Stateson
Post:
I have been running Ubuntu 8.04 (Dotsch/UX) out of 4gb flash with BOINC 6.10.56 but some BOINC projects now require a newer version of BOINC. I cannot upgrade and stay in 8.04 on account of latest boinc uses dynamic libs and 8.04 is too old to upgrade to newer libs with latest CUDA. I have several rigs that are part of a BOINC farm like this one here that all boot with 4gb flash.

IANE on ubuntu, but I could install 12.04 onto a hard disk if there is an easy way to then build a 4gb flash boot.

Thanks for looking!
443) Message boards : BOINC Manager : Remove leftover tasks and projects? (Message 47232)
Posted 14 Jan 2013 by Profile Joseph Stateson
Post:
Thanks Claggy!

I didn't try reset because the projects are being used but I will reset them as soon as there is a stopping point (I am running in a challenge right now).

I reset two projects that were long ago "retired" and not even listed on BAM! anymore and it took a while but they finally got removed from BM.
444) Message boards : BOINC Manager : Remove leftover tasks and projects? (Message 47218)
Posted 14 Jan 2013 by Profile Joseph Stateson
Post:
Problem 1: somehow this fixed itself. just took a while before BAM! got it right.

Problem 2: Removed ATI video board and getting "Application uses missing ATI GPU". I dont see any tasks "waiting for ati". Everything seems to be working even the projects that claim the ATI board is missing. BM:7.0.28

    19 Collatz Conjecture 1/9/2013 1:27:17 PM Application uses missing ATI GPU
    20 Collatz Conjecture 1/9/2013 1:27:17 PM Application uses missing ATI GPU
    21 Milkyway@Home 1/9/2013 1:27:17 PM Application uses missing ATI GPU
    22 Milkyway@Home 1/9/2013 1:27:17 PM Application uses missing ATI GPU
    23 Milkyway@Home 1/9/2013 1:27:17 PM Application uses missing ATI GPU
    24 Moo! Wrapper 1/9/2013 1:27:17 PM Application uses missing ATI GPU
    25 PrimeGrid 1/9/2013 1:27:17 PM Application uses missing ATI GPU



Right now, PrimeGrid is using the gtx460 just fine but somehow is also complaining the ATI board is missing. Maybe there is some files I can delete or edit to fix this?

On a system that used to have a gtx280 and now has a gtx460, I get the error message "Milkyway - Application uses missing NVIDIA GPU".

445) Message boards : GPUs : Memory size not calculated correctly (Message 46967)
Posted 1 Jan 2013 by Profile Joseph Stateson
Post:
Programs seem to run fine, but way more memory is shown than is even in the entire system.

    jstateson1quad

    13 2012-12-31 11:40:35 PM NVIDIA GPU 0: GeForce GTX 460 (driver version 306.97, CUDA version 5.0, compute capability 2.1, 1024MB, 8381362MB available, 1025 GFLOPS peak)
    14 2012-12-31 11:40:35 PM NVIDIA GPU 1: GeForce GTX 460 (driver version 306.97, CUDA version 5.0, compute capability 2.1, 1024MB, 951MB available, 941 GFLOPS peak)
    15 2012-12-31 11:40:35 PM NVIDIA GPU 2: GeForce GTX 460 (driver version 306.97, CUDA version 5.0, compute capability 2.1, 1024MB, 946MB available, 941 GFLOPS peak)
    16 2012-12-31 11:40:35 PM OpenCL: NVIDIA GPU 0: GeForce GTX 460 (driver version 306.97, device version OpenCL 1.1 CUDA, 1024MB, 8381362MB available)
    17 2012-12-31 11:40:35 PM OpenCL: NVIDIA GPU 1: GeForce GTX 460 (driver version 306.97, device version OpenCL 1.1 CUDA, 1024MB, 951MB available)
    18 2012-12-31 11:40:35 PM OpenCL: NVIDIA GPU 2: GeForce GTX 460 (driver version 306.97, device version OpenCL 1.1 CUDA, 1024MB, 946MB available)


There are only 2 boards.
Device 0 & 1 are a single board (dual 460)
Device 2 is a single board single 460
Both are eVga.
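
A quick way to spot this kind of anomaly in a startup log is to compare the reported "available" figure with the card's total. The following is only a minimal sketch: the sample lines are copied from the log above (message numbers and timestamps dropped), and the regular expression and the function name `suspicious_gpus` are illustrative assumptions, not part of BOINC.

```python
import re

# Sample lines copied from the startup log above (numbering and timestamps dropped).
LOG_LINES = [
    "NVIDIA GPU 0: GeForce GTX 460 (driver version 306.97, CUDA version 5.0, "
    "compute capability 2.1, 1024MB, 8381362MB available, 1025 GFLOPS peak)",
    "NVIDIA GPU 1: GeForce GTX 460 (driver version 306.97, CUDA version 5.0, "
    "compute capability 2.1, 1024MB, 951MB available, 941 GFLOPS peak)",
    "NVIDIA GPU 2: GeForce GTX 460 (driver version 306.97, CUDA version 5.0, "
    "compute capability 2.1, 1024MB, 946MB available, 941 GFLOPS peak)",
]

# Assumed line shape: "GPU <n>: ... <total>MB, <available>MB available"
PATTERN = re.compile(r"GPU (\d+):.*?(\d+)MB, (\d+)MB available")


def suspicious_gpus(lines):
    """Return (device, total_mb, available_mb) for devices reporting more free than total memory."""
    flagged = []
    for line in lines:
        match = PATTERN.search(line)
        if match is None:
            continue
        device, total_mb, available_mb = (int(g) for g in match.groups())
        if available_mb > total_mb:
            flagged.append((device, total_mb, available_mb))
    return flagged


if __name__ == "__main__":
    for device, total_mb, available_mb in suspicious_gpus(LOG_LINES):
        print(f"GPU {device}: {available_mb}MB available exceeds {total_mb}MB total")
```

On the lines above, only device 0 is flagged; devices 1 and 2 report plausible values.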

446) Message boards : GPUs : can dual gpu in one card run 2 tasks? (Message 45406)
Posted 22 Aug 2012 by Profile Joseph Stateson
Post:
Thanks Mitrichr

I have a pair of Asus gtx570's that take up 3 slots each. They run very cool 24/7 but I found on prime grid that the fans must be set manually at 100% to keep the temps down. I had them in SLI mode for a while but ran two tasks. I didn't think to try running just one like you did.

However, I took them out of SLI mode when I ran into a performance issue with DVDFab's blu ray copy program as shown here. Using a gtx570 in sli mode to compress a movie from 50gb down to 25gb was not as good as just using my Q9550 CPU by itself with no assistance from the GPU. A single gtx570 was always better than my Q9550 CPU by itself. I was also getting motion sickness with elder scrolls and plants-vs-zombies did not need sli.

I borrowed a gtx670 for testing two weeks ago. It did not work with DVDFab nor with setiathome. DVDFab just released 8.202 version that works with kepler but I no longer have the 670 as it went to one of my kids.
447) Message boards : GPUs : gtx670 compatibility (Message 45242)
Posted 12 Aug 2012 by Profile Joseph Stateson
Post:
I tried the 4.x driver with the 670 and it didn't work, just like the 5 didn't work.
448) Message boards : GPUs : gtx670 compatibility (Message 45240)
Posted 12 Aug 2012 by Profile Joseph Stateson
Post:
nVidia's forum has been down for some time so I thought I would post my question here.

I borrowed a gtx670 for testing. It would not run DVDFab's bluray copy program unlike the gtx570 or the gtx460 or any earlier CUDA boards.

It would not run setiathome either but it did run gpugrid stuff and milkyway and prime (as I recall?). I only had it for 24 hours.

Anyway, why can't the gtx670 run programs that work fine on gtx570, 460, etc.? Maybe the nVidia forum is down because there are real problems with the 6xx series and not on account of the password hacking.

my 2c.
449) Message boards : GPUs : can dual gpu in one card run 2 tasks? (Message 45239)
Posted 12 Aug 2012 by Profile Joseph Stateson
Post:
Some nVidia gpu's are "duals" such as this eVGA gtx460

Does this type of card run a single task faster, or is it capable of running two tasks?

I have 2 gtx570 and each runs a task even in SLI mode. I am considering buying a "duallie" but only if it runs two tasks. I searched at gpugrid.net but didn't see any info about duallies.

thanks for looking!
450) Message boards : BOINC client : cpu not being assigned on one system (Message 42859)
Posted 4 Mar 2012 by Profile Joseph Stateson
Post:
Unaccountably, one of my systems is using one less CPU than it has assigned to it. I run WUProp and FreeHAL on all my systems. These are never assigned any cpu resources. I use the 75% processor assignment to limit all my quad cores to 3 cpus as I have found that gpu tasks get starved for data if all are assigned. I just noticed that ONE system was only using 2 cpus: project DNA was running only two work units. It should have been 3. I set the processor percentage up to 100 and now DNA is running 3 cpus. It should be running all 4 if I understand how cpu assignments are made.



The above shows that milkyway and primegrid use 0.97 and 0.89 of a cpu respectively. Together that is over 1 full cpu. Is this a new feature of 7.0.18 that when it sees gpus using over 1 cpu it limits cpu tasks? I had not noticed this before. But then, I never read the release history. If this is the case then I no longer need to set aside a cpu for the gpu.

Another system is running collatz on three nvidia boards. Collatz shows 0.01 cpu usage. That is even less than FreeHAL or WUProp. That seems way too low. However, I assume it is correct and that might explain why all 3 cpus are being used in the following system (set to 75% processor, not 100%)



I plan on changing all my systems to use 100% processor (all cores) unless someone can point out a GPU project that will be starved for data.
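
The arithmetic behind the percentage setting discussed in this post can be sketched as follows. This is a rough illustration only: it assumes the client multiplies the core count by the percentage, truncates, and never goes below one CPU, and the function name `usable_cpus` is made up for the example. It does not model the separate question of whether CPU time reserved for GPU tasks further reduces the number of CPU tasks.

```python
def usable_cpus(total_cpus, percent):
    """CPUs available for tasks under a "use at most N% of processors" setting.

    Assumption: the result is truncated and never drops below one CPU.
    """
    return max(1, int(total_cpus * percent / 100))


if __name__ == "__main__":
    for pct in (75, 100):
        print(f"quad core at {pct}%: {usable_cpus(4, pct)} cpus")
```

Under these assumptions a quad core at 75% should run 3 CPU tasks and at 100% should run 4, which is why seeing only 2 and then 3 looked wrong.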
451) Message boards : BOINC client : nvidia driver reset problem: boinccmd --quit (Message 42844)
Posted 3 Mar 2012 by Profile Joseph Stateson
Post:
When issuing the command "boinccmd --quit" I have found that on 2 of 3 systems, I get an nvidia driver reset. About 1 in every 5 resets gives me a Win7 BSOD.

I assume this is a problem with the driver version 285. However, I notice that if I bring up an "exclusive app" that I can stop the GPU programs and they restart when the "exclusive app" terminates. This is the way it should be and there is no nvidia driver reset. However, I am wondering if when boinccmd sees the --quit command if it could pause (or whatever) the gpu tasks and then wait a few seconds before doing the remainder of whatever it does when the --quit command is finally processed. Maybe this could mitigate the driver problem.
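
Until something like that exists in the client, the same effect might be approximated from a script: pause GPU work, wait a few seconds, then quit. The sketch below is an untested workaround idea, not a confirmed fix for the driver reset. It assumes `boinccmd` is on the PATH and supports `--set_gpu_mode never <seconds>`; the 60 second duration and 5 second delay are arbitrary values, and the function name `graceful_quit` is hypothetical.

```python
import subprocess
import time


def graceful_quit(run=subprocess.run, sleep=time.sleep, delay=5):
    """Suspend GPU computing, wait, then ask the client to quit.

    The `run` and `sleep` parameters exist only so the sequence can be
    exercised without a real client.
    """
    # Temporarily disable GPU work; the duration (seconds) is an assumed value
    # so the mode is not left permanently on "never".
    run(["boinccmd", "--set_gpu_mode", "never", "60"], check=True)
    # Give the GPU applications a moment to exit before the client shuts down.
    sleep(delay)
    run(["boinccmd", "--quit"], check=True)


if __name__ == "__main__":
    graceful_quit()
```

Whether this avoids the reset would have to be tried on the affected systems; it only mimics by hand what the exclusive-app mechanism appears to do.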
452) Message boards : BOINC client : exclusive app not shown in BM (Message 42843)
Posted 3 Mar 2012 by Profile Joseph Stateson
Post:
In cc_config.xml I have DFDFab.exe defined as an exclusive app. This program does not show up in the BM dialog box for BOINC-Preferences. However, I must enable "Use GPU based on preferences" before BOINC stops using my GPUs.

Suggestion: If cc_config.xml has an exclusive app defined, then it should be honored even if boinc preferences allow GPU to run all the time.

The following image shows two programs: Boinctask and Boinc 7.0.18. Boinctasks shows that DFDFab.exe is an exclusive app for jstateson-pc-7 but the boinc program shows no such application.
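
For readers unfamiliar with the setting, the relevant part of cc_config.xml can be generated as in the sketch below. It is a minimal illustration, assuming the `<exclusive_app>` and `<exclusive_gpu_app>` options inside `<options>`; the executable name is the one given in this post, and the helper name `build_cc_config` is made up for the example.

```python
import xml.etree.ElementTree as ET


def build_cc_config(exclusive_apps=(), exclusive_gpu_apps=()):
    """Return cc_config.xml text listing exclusive applications."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    # Programs that suspend all computing while they run.
    for name in exclusive_apps:
        ET.SubElement(options, "exclusive_app").text = name
    # Programs that suspend only GPU computing while they run.
    for name in exclusive_gpu_apps:
        ET.SubElement(options, "exclusive_gpu_app").text = name
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_cc_config(exclusive_gpu_apps=["DFDFab.exe"]))
```

The generated file would go in the BOINC data directory and be re-read by the client; it does not change the behaviour described above, where the entry is only honored when GPU use is set to follow preferences.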

453) Message boards : BOINC client : Device partitioning GPU with OpenCL 1.2 (Message 42842)
Posted 3 Mar 2012 by Profile Joseph Stateson
Post:
How about sending double precision tasks to the gpu that actually has DP hardware and avoiding the single precision GPU? Can that be included in the device partitioning? I still have this problem with milkyway project getting assigned to a single precision gpu.


In the alpha test versions of BOINC there is the ability to exclude a gpu from a project or an app. It's worked from about 7.0.8 onwards. But before you rush out to try the alpha versions I suggest you read the change log message thread, in particular the incompatibility warnings.

Further out there is an idea to get BOINC to recognize different gpu capabilities and schedule accordingly. No idea when that is slated for, but at least its on the drawing board.


This seemed to work fine. I posted my cc_config and observed results here at the milkyway forum.
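
The cc_config actually posted at the milkyway forum is not reproduced here. As a rough idea of what such an exclusion looks like, the sketch below builds an `<exclude_gpu>` entry; the project URL, the device number 1 and the helper name `exclude_gpu_config` are assumptions for illustration, not the values from the linked post.

```python
import xml.etree.ElementTree as ET


def exclude_gpu_config(project_url, device_num, gpu_type="NVIDIA"):
    """Return cc_config.xml text that keeps one GPU away from one project."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    exclude = ET.SubElement(options, "exclude_gpu")
    ET.SubElement(exclude, "url").text = project_url
    ET.SubElement(exclude, "device_num").text = str(device_num)
    ET.SubElement(exclude, "type").text = gpu_type
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Assumed values: the single precision card is taken to be device 1.
    print(exclude_gpu_config("http://milkyway.cs.rpi.edu/milkyway/", 1))
```

The device number has to match what the client reports at startup for the single precision card.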
454) Message boards : BOINC client : Device partitioning GPU with OpenCL 1.2 (Message 42614)
Posted 18 Feb 2012 by Profile Joseph Stateson
Post:
Software such as DVDFab uses CUDA for video encoding. Currently, I stop boinc when I bring up DVDFab. This is done automatically using that cc_config file. However, it would be nice if they could both run at the same time. Ideally, one would think the device partitioning could handle this.
455) Message boards : BOINC client : Device partitioning GPU with OpenCL 1.2 (Message 42613)
Posted 18 Feb 2012 by Profile Joseph Stateson
Post:
How about sending double precision tasks to the gpu that actually has DP hardware and avoiding the single precision GPU? Can that be included in the device partitioning? I still have this problem with milkyway project getting assigned to a single precision gpu.
456) Message boards : GPUs : both gpus usable in SLI or Crossfire? (Message 42183)
Posted 22 Jan 2012 by Profile Joseph Stateson
Post:
I am not running SLI or crossfire but was wondering if both gpu's can continue to be used for BOINC when in SLI or crossfire mode. At one time I recall only one gpu was useable and I was wondering if that has changed recently.
457) Message boards : Projects : need invalid signing key solution (Message 42145)
Posted 19 Jan 2012 by Profile Joseph Stateson
Post:
I am getting an invalid signing key from primaboinca. Their solution is to detach and re-attach the project. I did this over at boincstats where I am using BAM!. It didn't work as discussed in this thread. In order to detach it appears I have to stop using a project manager (BAM! in this case) then detach from primaboinca and then restore my project manager back to BAM!.

I removed primaboinca from all my computers over at boincstats. I then sync'ed my boinc farm which should have detached but it didn't.

Anyway, is there a walkthru to remove traces of primaboinca (or any offending perp) where the signing key is invalid and not have to remove the host from BAM! ? I have had problems in the past after removing a host and then re-enabling the project manager.

thanks!
458) Message boards : BOINC client : 6.13: Exit BM not stopping milkyway but have other successes (Message 39155)
Posted 21 Jul 2011 by Profile Joseph Stateson
Post:
I thought I would mention the following problems / successes. I upgraded 6.12.26 on two systems gpu's: hd5850+gtx570 and hd4890, to 6.13.0 and a day later to 6.13.1.

(1) I got my first successful result on the hd4890 for milkyway. Previously, all the work units timed out after about 80 seconds. The hd4890 was a warranty replacement for my burned out gtx280 as xfxforce was out of gtx280s.

(2) PrimeGrid tasks never start on the hd4890. I understand this is a problem with ATI 11.6 driver from what I read at PrimeGrid forum. Moo! wrapper runs fine on hd4890. The 5850 does not have any of these problems, it seems only the 4890 series.

For both 13.0 and 13.1 closing BM stopped all CPU tasks and most gpu tasks for all projects but the CUDA version of milkyway failed to stop. Starting BM back up gave me two instances of milkyway cuda, but the one left running when BM exited quickly got into "waiting to run". This was easily repeatable.

When the ATI version of milkyway terminates "ready to report" both monitors flicker noticeably, with a second flicker about 1/2 sec later when the next milkyway work unit starts up. Hotfix 11.6b was supposed to fix flickering, but it had no effect on BOINC 6.13 (milkyway project).

Anyway, I was very happy to be able to crunch milkyway using my ATI 4890. Only Moo! Wrapper worked when I first put the board into the system to replace the gtx280.

[EDIT] Something in the ATI SDK was needed to make 11.6 work correctly with PrimeGrid. It is working now on hd4890
459) Message boards : BOINC client : wish: venue change be recognized before asking for tasks (Message 39108)
Posted 19 Jul 2011 by Profile Joseph Stateson
Post:
Ok, I see now that the sync request I made from jstateson2duo to BAM! at 3:44:18 did not obtain a new venue for my system and it took the project update at 3:51:24 to actually get the venue change. That is just batch processing at its worst: A bunch of punched cards get read in and processed by collatz and collatz inserts a "next time get an ATI job" card that sure enough, works the next time the cards are read again. The session should have just asked "give me what I am allowed to get for the jstateson2duo venue".

my 2c says this is a design change that goes thru a committee and will never happen.
460) Message boards : BOINC client : wish: venue change be recognized before asking for tasks (Message 39101)
Posted 19 Jul 2011 by Profile Joseph Stateson
Post:
I changed the venue for collatz to one that allowed ATI tasks and then did a sync. There was no request for tasks so after a while I did an update. The new computer location was the LAST message as shown here.

    jstateson2duo
    1701 Collatz Conjecture 2011-07-19 3:51:24 AM update requested by use
    1702 Collatz Conjecture 2011-07-19 3:51:29 AM Sending scheduler request: Requested by user.
    1703 Collatz Conjecture 2011-07-19 3:51:29 AM Not reporting or requesting tasks
    1704 Collatz Conjecture 2011-07-19 3:51:31 AM Scheduler request completed
    1705 Collatz Conjecture 2011-07-19 3:51:31 AM New computer location:



I did a second update and BOINC then asked for ATI tasks. It failed, of course, because the request was too recent. I believe that if the venue change had been done first, then it would have asked and received ATI tasks.

Second update shown below


    jstateson2duo
    1706 Collatz Conjecture 2011-07-19 3:52:37 AM update requested by use
    1707 Collatz Conjecture 2011-07-19 3:52:42 AM Sending scheduler request: Requested by user.
    1708 Collatz Conjecture 2011-07-19 3:52:42 AM Requesting new tasks for ATI GPU
    1709 Collatz Conjecture 2011-07-19 3:52:43 AM Scheduler request completed: got 0 new tasks
    1710 Collatz Conjecture 2011-07-19 3:52:43 AM Not sending work - last request too recent: 73 sec



Seems to me that the first update should have received some tasks since the venue change was done during a sync and both the server and the client knew there was a change.
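
The "73 sec" in the last log line can be checked against the timestamps of the two scheduler requests above. This is just a small worked check; the helper name `seconds_between` is made up for the example and the timestamp format is inferred from the log lines.

```python
from datetime import datetime

# Format inferred from the message log, e.g. "2011-07-19 3:51:29 AM".
LOG_FORMAT = "%Y-%m-%d %I:%M:%S %p"


def seconds_between(first, second):
    """Whole seconds elapsed between two message-log timestamps."""
    delta = datetime.strptime(second, LOG_FORMAT) - datetime.strptime(first, LOG_FORMAT)
    return int(delta.total_seconds())


if __name__ == "__main__":
    # Scheduler requests from messages 1702 and 1707.
    gap = seconds_between("2011-07-19 3:51:29 AM", "2011-07-19 3:52:42 AM")
    print(gap)
```

The gap between the two scheduler requests is 73 seconds, matching the server's message, so the second request was refused only because it came too soon after the first one, which had asked for nothing.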

461) Message boards : The Lounge : Crysis 2's legacies (Message 38154)
Posted 4 Jun 2011 by Profile Joseph Stateson
Post:
I used to play Duke Nukem when my kids were in high school. They have all graduated from college some time ago and I (we) are still waiting for Duke Nukem Forever.

Last year, I was given "Plants vs Zombies" by my brother and got hooked on it. I then bought copies for myself and my kids but discovered that the new version did not have Michael Jackson as the "Dancing Zombie". Supposedly, they were forced to remove that character when Michael died. That is a shame as I looked forward to Jackson showing up and dancing while playing that game.
462) Message boards : The Lounge : End of the world (Message 38153)
Posted 4 Jun 2011 by Profile Joseph Stateson
Post:
I drove thru Dallas that weekend and they had bought advertizing on billboards downtown warning of the doom.


At least I wasn't spammed by them and the ad was safer than the corona beer ad that had a lady on her back drinking beer with leg up in the air. That ad actually caused a couple of deaths, unlike the doomsday one.
463) Message boards : The Lounge : 98% of people have no interest in the projects (Message 38152)
Posted 4 Jun 2011 by Profile Joseph Stateson
Post:
I am one of the 2% (but not a motorcycle 2%'er) and I check projects to see if they seem legit. If it looks like they are trying to crack a key to win a prize for the programmer (and not the end user) then I am not interested in working with them. I have some suspicions like when quantumfire closed down right after that PS3 key got cracked but I can't be sure they had anything to do with it ;-)

I quit processing WUs for climatechange when they got caught fudging the numbers (the "hide the decline" email crap). LHC is another bad one, they admitted the project would never provide real data for BOINC users as it was "too big" and "in fortran" and the amount of credit they provided was so small I thought it was because of a micro-blackhole they had implemented in their code.

Anyway, I suspect that if you had provided Billy with a link to a Jodi Foster "Contact" screensaver he would have stuck around for SETI a while longer.

my 2c
464) Message boards : The Lounge : Strange problem solved! (Message 38151)
Posted 4 Jun 2011 by Profile Joseph Stateson
Post:
Thought I would share this story ---

For several weeks I had been hearing a high pitched whistle when one of my dual opteron systems powered up. It would fade away and be gone by the time windows was running. It only occurred during a warm reboot such as after a windows update that required a reboot. I never heard it after a cold reboot (several hours of being turned off). The system ran slow but I assumed that was because of two gpu tasks and 3 cpu tasks that boinc was running. When I suspended boinc the system ran better and I assumed all was OK.

I checked the cpu fans while they were spinning to see if they were causing the whistle and found one of the cpu heat sinks was no longer attached to the motherboard. The tab on the retainer broke where the HSF clip was to be attached to and the silver grease was all that was holding the HSF to the cpu.

I replaced the plastic retainer and the slowing down problem disappeared. Apparently, the silver grease was sufficient to allow the system to boot into windows but not sufficient to allow it to work after boinc started the cpu tasks. The whistle was the bios warning that the cpu was too hot and the noise goes away after thermal shutdown. It was not easy to remove the HSF as the grease, which had NOT dried, had a good vacuum hold on the cpu surface.

Anyway, I thought this was interesting.
465) Message boards : BOINC client : 6.12.26: insufficient nVidia tasks (Message 38106)
Posted 1 Jun 2011 by Profile Joseph Stateson
Post:
I am certain that this is a boinc problem or feature. The system with both ati and nvidia is now processing only milkyway tasks. According to this thread, there is a limit of 12 tasks per gpu for milkyway. I am actually seeing this right now as I have 15 Milkyway nVidia, 7 Milkyway ATI and two Milkyway gpu tasks are running, one is nVidia, the other is ATI.

Apparently, the boinc scheduler is not letting me process any PrimeGrid or Collatz and thinks I need to do more Milkyway. It took 24 hours for this to stabilize. My very first post showed 24 milkyway tasks and 24 collatz, all ATI, with a single Proth Prime CUDA, and a day later I am down to only 24 Milkyway with a mix of nVidia and ATI and no other gpu tasks.

So perhaps this is the way it is supposed to be. To get some other projects (Prime or Collatz) I guess I will have to set preferences at Milkyway for ATI only.

So what caused the scheduler to select milkyway only?

I think I know how this happened. I let all work finish (set NNT) before I went on vacation and shut the farm off until I got back. When I got back last week I turned the farm on and enabled new work. I suspect each project maxed out with no regard to other gpu projects on the initial request after startup. However, my other systems with similar gpu's (all nVidia) seemed to handle the restart better and collatz, milkyway, prime, einstein and seti were all represented somewhat equally.
466) Message boards : BOINC client : 6.12.26: insufficient nVidia tasks (Message 38099)
Posted 1 Jun 2011 by Profile Joseph Stateson
Post:
Peter - Proth Prime CUDA on gtx570 typically run 15 minutes and generate 4500 credits

This system has two nvidia gpu's: a gtx280 and a gtx460 and is maxed out at 24 tasks each for proth prime and milkyway




The following is what you asked for. The system has ati 5850 and nvidia 570. It should also have 24 or so proth primes but only has 1.

467) Message boards : BOINC client : 6.12.26: insufficient nVidia tasks (Message 38073)
Posted 31 May 2011 by Profile Joseph Stateson
Post:
I have one system with ATI 5850 and nVidia gtx570. For preferences, Collatz and Milkyway are set to allow both ATI and nVidia tasks and PrimeGrid is set for nVidia only. As shown below, I have 44 gpu tasks ready to start, all of which are ATI. There is only 1 nVidia task running, and when it completes it gets only one more task. My work buffer is set for one (1) day.

Unaccountably, I have no backlog for nVidia tasks. There should be about equal numbers of nVidia and ATI. I checked the servers and there are plenty of WU's ready to be downloaded. Something seems wrong.



from message log


    jstateson2quad

    11171 Milkyway@home 2011-05-31 5:12:25 PM Reporting 1 completed tasks, requesting new tasks for ATI GPU
    11172 Milkyway@home 2011-05-31 5:12:26 PM Scheduler request completed: got 1 new tasks
    11173 PrimeGrid 2011-05-31 5:13:01 PM Computation for task pps_sr2sieve_21291464_1 finished
    11174 PrimeGrid 2011-05-31 5:13:01 PM Sending scheduler request: To fetch work.
    11175 PrimeGrid 2011-05-31 5:13:01 PM Requesting new tasks for NVIDIA GPU
    11176 PrimeGrid 2011-05-31 5:13:03 PM Started upload of pps_sr2sieve_21291464_1_0
    11177 PrimeGrid 2011-05-31 5:13:04 PM Finished upload of pps_sr2sieve_21291464_1_0
    11178 PrimeGrid 2011-05-31 5:13:15 PM Scheduler request completed: got 1 new tasks
    11179 PrimeGrid 2011-05-31 5:13:17 PM Starting pps_sr2sieve_21291865_1
    11180 PrimeGrid 2011-05-31 5:13:17 PM Starting task pps_sr2sieve_21291865_1 using pps_sr2sieve version 139
    11181 Milkyway@home 2011-05-31 5:13:30 PM Sending scheduler request: To fetch work.
    11182 Milkyway@home 2011-05-31 5:13:30 PM Requesting new tasks for ATI GPU
    11183 Milkyway@home 2011-05-31 5:13:31 PM Scheduler request completed: got 0 new tasks
    11184 Milkyway@home 2011-05-31 5:13:31 PM No tasks sent
    11185 Milkyway@home 2011-05-31 5:13:31 PM This computer has reached a limit on tasks in progress



I do not see milkyway requesting any nVidia tasks at all, and, as shown, PrimeGrid only gets 1 task each time it completes the previous one.
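A quick way to check this over a longer log is to count which GPU type each project asks for. The following is only a sketch: it assumes the message log lines look exactly like the excerpt above (sequence number, project name without spaces, date, time, AM/PM, message), and the function name is made up for illustration.

```python
import re
from collections import Counter

# Matches "Requesting new tasks for X" and "..., requesting new tasks for X".
# Assumes the project name contains no spaces, as in the excerpt above.
REQUEST = re.compile(
    r"^\s*\d+\s+(\S+)\s+\S+\s+\S+\s+[AP]M\s+.*requesting new tasks for (.+?)\s*$",
    re.IGNORECASE,
)


def count_requests(lines):
    """Count work requests per (project, resource) pair."""
    counts = Counter()
    for line in lines:
        match = REQUEST.match(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


sample = [
    "11171 Milkyway@home 2011-05-31 5:12:25 PM Reporting 1 completed tasks, requesting new tasks for ATI GPU",
    "11175 PrimeGrid 2011-05-31 5:13:01 PM Requesting new tasks for NVIDIA GPU",
    "11182 Milkyway@home 2011-05-31 5:13:30 PM Requesting new tasks for ATI GPU",
    "11184 Milkyway@home 2011-05-31 5:13:31 PM No tasks sent",
]
for (project, resource), n in sorted(count_requests(sample).items()):
    print(project, resource, n)
```

On the three request lines taken from the log above, Milkyway only ever asks for ATI GPU work, which matches what I was seeing.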

468) Message boards : BOINC client : tasks being sent to wrong gpu card (Message 37983)
Posted 25 May 2011 by Profile Joseph Stateson
Post:
Thanks Jord.

I have not decided what to do as I was hoping the project would fix the problem.

I also want to correct the computer id I listed above. It should have been
http://milkyway.cs.rpi.edu/milkyway/show_host_detail.php?hostid=258308

Having two BOINCs running complicates stopping / suspending them when I want to use an application that uses CUDA such as DVDFab.

469) Message boards : BOINC client : tasks being sent to wrong gpu card (Message 37961)
Posted 24 May 2011 by Profile Joseph Stateson
Post:
Is there any way to direct milkyway tasks to my gtx460 and avoid the 9800gtx card which is single precision? I had 3 tasks fail within seconds because they were assigned the wrong card. Here is one of them and the card is only single precision capable. The Computer ID shows a pair of gtx460 but that is not true, there is only one and the other is 9800gtx. This combination works fine for primegrid and collatz but not milkyway.

I recently added some single slot gtx460s to my systems and now have double precision math capability with these newer cards.

[EDIT] I saved the result image since milkyway deletes results usually within minutes. Click here for result http://stateson.net/images/milkyway_save_result.png

According to this thread http://forums.nvidia.com/index.php?showtopic=173877 the gtx460 has only 1/6 or maybe 1/12 the double precision capability of other cards.
470) Message boards : Questions and problems : is reserving one core for CUDA still adviseable? (Message 37646)
Posted 29 Apr 2011 by Profile Joseph Stateson
Post:
I have several 4 core systems: three Intel and 5 Opteron. All have accelerators except one of the Opterons. I have been setting aside 1 cpu that is not to be used by boinc on all systems with CUDA. I found a long time ago that almost all gpu bound projects complete much faster with 75% of cpus assigned.

Is this still required even with the 6.12.26?

471) Message boards : GPUs : GPU mis-identified in dual gpu system. (Message 37564)
Posted 23 Apr 2011 by Profile Joseph Stateson
Post:
In a system with two GTX-280, I replaced a defective GTX280 with a GTX570 and AFAICT all projects are reporting my computer to have two GTX-570. I assume this problem is with the project web software?

I set cc_config to force all gpu's to be used as BM was only using the 570. Both boards are crunching fine but project computer info shows, for example, the following incorrect identifications:

472) Message boards : GPUs : Dual gpu problem: 99% utilization hangs two gtx-280s (Message 37563)
Posted 23 Apr 2011 by Profile Joseph Stateson
Post:
Followup: Always want to correct a post that was wrong. The problem was NOT the driver. The heatsink compound dried out on the defective GTX 280. In the process of swapping the board to different systems it got into a slot that was cooler and it simply took longer before it finally failed. I had thought the problem was the driver since it worked a couple of days before it failed again.

XFXForce has a lifetime warranty but only for the original buyer, not a used board from ebay, so I disassembled it and determined that all the grease on the memory chips had totally dried out, turning to a white fluffy powder, and the main nVidia chip with the silver stuff had dried out but had stayed silvery. The board ran very hot to the touch, hotter than its other pair, but its temp sensor never got very high. The temp pickup must have been near the big chip that still had the silver stuff but the rest of the chips overheated.

Anyway, I replaced it with a GTX 570 (another story) and ordered a replacement cooler from zalman to try to salvage the board.
473) Message boards : GPUs : Dual gpu problem: 99% utilization hangs two gtx-280s (Message 37542)
Posted 20 Apr 2011 by Profile Joseph Stateson
Post:
I installed nVidia 270.61 and it started working again in a win7-64 system. I think the problem was 266.58.

The gtx280 that I thought was defective worked perfectly in a linux system processing collatz and that was when I went back to nVidia and discovered the even newer 270.61 drivers.

Getting the dust out helped but the system still hung with 266.58 and a new power supply I had bought thinking it was a lack of power.

Some other thoughts - The software DVDFab uses CUDA for encoding / decoding of dvd & bluray. It is significantly faster than just the CPU. They do not support ATI's equivalent software. It took me over 4 hours to copy a bluray on a system with a cypress 5850 and a core 2 duo. A similar bluray movie copied in 30 minutes on my gtx280 system with a core 2 quad. Currently, I stop BOINC when using DVDFab as I am unsure if they can coexist.
474) Message boards : GPUs : Dual gpu problem: 99% utilization hangs two gtx-280s (Message 37458)
Posted 10 Apr 2011 by Profile Joseph Stateson
Post:
I suspect the problem is lack of power as Jord suggested. This system has an xfx 850 watt modular pwr supply. However, I recently added a 3rd 2TB drive for a total of 4 HD (6.5TB) and 2 opticals. According to
http://www.eggxpert.com/forums/thread/493743.aspx at the newegg forum, 800 watt is needed for a pair of gtx280. This system worked fine for about 6 months since I added the 2nd GPU, but I suspect adding the additional 2TB drive created too great a power load.

I saw a better power calculating tool somewhere but I don't remember where I found it. The above link is probably just a guess. I can measure A/C current flow into the xfx power supply, but the 850 watt is the delivery rating, not the consumed power. There is probably some way to actually measure how close I am to the 850 watt rating. In the mean time I can start adding up HD power requirements and assume worst case with all disks spinning.

I have a single ATI 5850 that is having its fan replaced under warranty. I may stick that in place of the gtx280 that I pulled when I get it back. According to the above link the 5850 only requires 600 watts for a pair whereas the gtx280 requires 800.
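The adding-up itself is simple enough to script. This is a sketch only: every wattage in the example is a made-up placeholder, not a measurement or a vendor spec, and the component names are hypothetical. Only the 850 watt delivery rating comes from this post.

```python
def worst_case_load(components):
    """Sum peak draw in watts over {name: (count, watts_each)} entries."""
    return sum(count * watts_each for count, watts_each in components.values())


def headroom(rating_watts, load_watts):
    """Watts left under the supply's delivery rating (negative means overloaded)."""
    return rating_watts - load_watts


# Every wattage below is a placeholder for illustration, not real data.
example = {
    "gpu": (2, 200.0),
    "hard_drive": (4, 10.0),
    "optical_drive": (2, 20.0),
    "cpu_and_board": (1, 150.0),
}
load = worst_case_load(example)
print(load, headroom(850.0, load))
```

Swapping in real peak figures for each part would show how close the system is to the rating with all disks spinning.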
475) Message boards : GPUs : Dual gpu problem: 99% utilization hangs two gtx-280s (Message 37442)
Posted 9 Apr 2011 by Profile Joseph Stateson
Post:
This is possibly a hardware problem but I am unsure how to solve it. Starting about 2 days ago (April 6th) my core 2 quad with a pair of gtx280 started getting sluggish and then began hanging. About this same time, the project milkyway went offline and my system started processing PrimeGrid (99% gpu utilization) and Collatz (86% gpu utilization) instead of milkyway. This system can no longer process two PrimeGrid tasks concurrently without hanging. No errors in windows event file. Temps easily under 70C. Windows 7/64, 6.12.13 and 260.99. I upgraded to 6.12.23 and 266.58. Things got worse. Instead of taking 60 seconds or so before the system hung after bringing up boincmgr, it hung within seconds. I pulled the boards and swapped them and eventually put one board in another system. Both boards work fine in separate systems. They are very sluggish when processing PrimeGrid and I suspend the GPU when I want to do other things. Collatz at 86% utilization is not sluggish and the system is usable when collatz is running, i.e. I do not have to suspend the gpu.

Is there a way to decrease utilization of gpu or to restrict the same project from using both gpu's at the same time?

This system has DVDFab installed. That program uses CUDA to handle video encoding/decoding for dvd & bluray copying. AFAICT it is only active when I am running the program itself although there is a service associated with it.

PrimeGrid has not changed their CUDA app since January and I don't see anything unusual in the Microsoft win7 updates that could cause this problem.

any help would be appreciated

thanks for looking


[EDIT] Want to clarify that with both boards and the system set for use GPU after 1 minute of idle, the system hangs instantly upon the minute expiring with two primegrid tasks and requires a hard reset. Prior to upgrade to 266.58 and 6.12.23 it would hang within 30 seconds and if I moved the mouse quick enough I could suspend the project before the system completely hung. The system also hangs with two collatz tasks running, but it runs longer before it finally hangs.
I have another system with a pair of 9800gtx+ that run PrimeGrid without hanging. However, they are not double precision video boards like the gtx280.
476) Message boards : BOINC Manager : replaced motherboard but got a second account with same name (Message 36753)
Posted 6 Feb 2011 by Profile Joseph Stateson
Post:
I replaced a motherboard, which required a clean install of Win-7. I set the system name to exactly what it used to be before I installed BOINC. When I connected to my account manager BAM! I got the same name, but it showed up as another computer in the host list. This creates an orphan and I will eventually have to merge all projects that have this system. The merging will create a new project wide identity which takes a long time to propagate thru their data base.

How can I avoid this in the future when doing a clean install?

thanks for looking
477) Message boards : BOINC client : Observation: 6.12.13 nicely round-robin's 4 "ATI" tasks (Message 36752)
Posted 6 Feb 2011 by Profile Joseph Stateson
Post:
I just installed 6.12.13 on a new system with clean install of Win-7 and I noticed that the 4 "ATI" projects I signed up for were all taking turns running on my ATI CYPRESS 5850: Milkyway, Prime, Collatz, Seti-Beta. Normally, I see one project run for days with a huge queue and then another project starts up and does the same with a bigger or smaller queue.

Is this an artifact of a clean install or a better scheduler?

I will be looking at this system and will post back here when/if I see changes.
478) Message boards : Questions and problems : Resume from suspension a problem for ATI apps? (Message 36751)
Posted 6 Feb 2011 by Profile Joseph Stateson
Post:
I just replaced a motherboard, it required a clean install of Win-7 and I forgot that the default is to set a 20 minute suspension. Of course, I cannot use that since I run a farm 24/7. I had 4 crashes in 6 hours of running my new system - EVGA 760 chipset with HIS 5850 and BOINC 6.12.13. I have not installed any other software so far.

I almost returned the motherboard as I thought it was defective until I realized it was suspending itself. The event viewer simply shows a bugcheck (crash occurred) and previous to that, hours earlier, are statements about going into suspension.

Since I disabled the suspension, it has not crashed and I completed my first 24 hour burnin.

AFAICT all the devices are capable of being woken up after a suspension so I am guessing that one of my ATI apps (Prime, Collatz, Milkyway, or Seti-Beta) has problems when the GPU awakes from suspension.

I am just guessing but was wondering if any other users have seen this problem.
479) Message boards : Questions and problems : 64 bit version sometimes goes into (x86) (Message 36661)
Posted 29 Jan 2011 by Profile Joseph Stateson
Post:
Running the 32 bit version on ubuntu-64 generates really weird errors. See the last two posts on this thread
http://boinc.berkeley.edu/dev/forum_thread.php?id=5562&nowrap=true#31690

480) Message boards : Questions and problems : wish: Install to allow rpc option (remote control) (Message 36660)
Posted 29 Jan 2011 by Profile Joseph Stateson
Post:
Ageless: boincmgr --help shows the /b option will pass commands to boinc but I was never able to get it to work. I downloaded the source code and looked at the module that does the parsing and the command arg parser would not validate --allow_remote_gui_rpc because it was not one of the command args to boincmgr. I tried putting quotes about the args to "/b" but that generated an error. The args to "-b" or "/b" are not forwarded to boinc.exe as AFAICT they fail the "rules" for boincmgr. It was simply ignored. That was two years ago and I remember a developer posting, about that time, that he thought that command "-b" had been removed long ago as it no longer worked. If it is now working, that is news to me.

Also, I have a farm of PC's and use boinctasks for monitoring instead of boincmgr.

I tried it again and got the same error message I got 2 years ago
481) Message boards : Questions and problems : wish: Install to allow rpc option (remote control) (Message 36657)
Posted 29 Jan 2011 by Profile Joseph Stateson
Post:
An install (or update) to vista-64 or windows 7-64 creates the following at
    Windows Registry Editor Version 5.00
    [HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Windows\CurrentVersion\Run]
    "boincmgr"="\"C:\\Program Files\\BOINC\\boincmgr.exe\" /a /s"
    "boinctray"="\"C:\\Program Files\\BOINC\\boinctray.exe\""



It would be nice if the user could [x] an option to allow remote control (doing away with automatic install of boincmgr and tray) to get something like the following:

    Windows Registry Editor Version 5.00
    [HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Windows\CurrentVersion\Run]
    "boinc"="\"C:\\Program Files\\BOINC\\boinc.exe\" --detach --allow_remote_gui_rpc"



In addition, subsequent installs would default to having the [x] checked or not depending on what the user did on the previous install.

There is also the question of why the install goes to Wow6432 in the first place for 64bit programs.
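For anyone writing such a Run entry by hand: in a .reg file the backslashes inside a string value are doubled and embedded quotes are written as \". The following sketch builds one value line with that escaping. The helper names are made up for illustration, and it only produces text; it does not touch the registry.

```python
def reg_string(value):
    """Quote a string the way a .reg file expects for a REG_SZ value."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def run_entry(name, command):
    """Build one 'name=command' line for a Run key in a .reg file."""
    return reg_string(name) + "=" + reg_string(command)


# The command is the one proposed in this post; the path is the default install location.
command = r'"C:\Program Files\BOINC\boinc.exe" --detach --allow_remote_gui_rpc'
print(run_entry("boinc", command))
```

The printed line can be pasted under the Run key header in a .reg file.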

482) Message boards : Questions and problems : 64 bit version sometimes goes into (x86) (Message 36656)
Posted 29 Jan 2011 by Profile Joseph Stateson
Post:
I assume this does not matter as long as it runs correctly (which is the case)?

I downloaded boinc_6.12.12_windows_x86_64 and installed on Vista 64 Ultimate and Windows 7 Home Premium (Upgrade from Vista HP) and a Windows 7 Pro that was an original build. Two of the installs went in "Program Files" the one from Win7 HP went into "Program Files (x86)"

The system that has this problem had (4 years ago?) the 32 bit version of Boinc. I assume there is something left over in "Program Data" or the registry that wants subsequent installs to go into x86 instead of the normal 64 bit location. In any event, it is working. I only noticed this when I went to add --detach --allow_remote_gui_rpc to the registry where the run key is located and noticed it was running out of x86. I uninstalled, re-installed and noticed that the install defaulted to x86. After uninstalling I rebooted and verified that the directory "boinc" was not in either Program Files or the x86 program files.

It would be nice if the install for windows and linux checked the architecture so that a 32 bit build cannot go on a 64 bit system, or at least warned the user.
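On the linux side such a check is not hard. A minimal sketch, assuming the binary is an ELF file: the fifth byte of the header (EI_CLASS) is 1 for 32 bit and 2 for 64 bit. The host check via platform.machine() is a rough heuristic of mine, and the function names are made up. Windows executables use a different format and are not covered here.

```python
import platform

ELF_MAGIC = b"\x7fELF"


def elf_bits(header):
    """Return 32 or 64 from the first bytes of an ELF file, or None if not ELF."""
    if len(header) < 5 or header[:4] != ELF_MAGIC:
        return None
    return {1: 32, 2: 64}.get(header[4])


def host_bits():
    """Rough guess at the host word size from the machine name (x86_64, i686, ...)."""
    return 64 if platform.machine().endswith("64") else 32


def mismatch_warning(path):
    """Return a warning string if the binary's word size differs from the host's."""
    with open(path, "rb") as handle:
        bits = elf_bits(handle.read(5))
    if bits is not None and bits != host_bits():
        return f"{path} is {bits} bit but this system looks like {host_bits()} bit"
    return None
```

An install script could call mismatch_warning() on the unpacked boinc binary before going any further.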

I also noticed the following on all 3 systems running windows: boincmgr and boinctray are located in the registry at

    [HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Windows\CurrentVersion\Run]



as shown here:


it seems to me it should have gone here:

    [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Run]



which is where boinctasks64 is installed as shown here:


Is this a known problem that is not worth fixing because the program still runs fine?

483) Message boards : Questions and problems : linux: problem installing 6.12.8 (Message 36221)
Posted 27 Dec 2010 by Profile Joseph Stateson
Post:
I have not had time to look at this again, but I suspect the static linking of 6.10 is what is working and 6.12 is not linked statically. These dotsch systems have minimum tools available so I cannot confirm the problem.

What bothers me is the boinc error "cant find 2.11 glibc" and ubuntu claiming that I can only upgrade my rev glibc to 2.6.x.y. One of those is not correct. 2.6 sounds like a kernel version and 2.11 seems like a real version number. All these libraries came in (I think) when I did a get of "build-essential". That was required to run nvidia stuff and was not required for just cpu bound boinc projects. RPM is not in ubuntu so I cannot use that tool to find the glibc version number. However, this system has no nvidia. I did an "apt-get install build-essential" on it, but got a huge stream of ip address errors. The repository for this old version of ubuntu must no longer exist and I will have to set the default repository somewhere else when I figure out how to do that. In the meantime I am running 6.10.56 just fine. I even selected the "update" and got the same ip address errors. Maybe I waited too long before doing an update and this old ubuntu version is not supported.

Dotsch is working on a version 2 based on ubuntu 10.x but no telling when it is coming out and since the official boinc release is back at 6.10 then I suspect his new version may not work with 6.12 if 6.12 is not statically bound as suggested by Chris.
484) Message boards : Questions and problems : linux: problem installing 6.12.8 (Message 36207)
Posted 24 Dec 2010 by Profile Joseph Stateson
Post:
OK - I posted over there. However, I looked at that synaptic package manager again and my Dotsch/UX system (ubuntu actually) shows

glibc version is 2.6.27-16.44 and can be upgraded to 2.6.27.17.46

AFAICT, glibc 2.6 is a long way from 2.11

It looks like I have to upgrade to a new release of ubuntu, not just upgrade my current release to its latest and greatest. That does not seem correct. According to boinc release info linux x64 works with ubuntu 7.1 that includes dotsch 1.2.

That makes the 2.11 version seem suspicious if 7.1 (dotsch ux) can only be upgraded to 2.6.27.17.46. That number (2.6) looks like a kernel version but that is all that the package manager shows.

I think I did something wrong in the install and it should be working.

On re-reading the Linux info at the boinc site they state, quote, "The current release is known to work with these Linux versions:"

However, the "current release" is not 6.12.8, it is 6.10.58 if I understand the instructions and 6.10.56 works fine and I suspect that 6.10.58 will also work fine.
485) Message boards : Questions and problems : linux: problem installing 6.12.8 (Message 36205)
Posted 24 Dec 2010 by Profile Joseph Stateson
Post:
I am getting the error "missing glibc 2.11" when starting boinc up in /home/boinc/BOINC. I assume I need to upgrade 1.2 Dotsch/UX but that will be a problem on this 4gb flash drive system. It is one of 5 linux systems running Dotsch/UX 1.2 using 4gb flash and 6.10.56. I have been avoiding any Ubuntu upgrades because of space limitation. It is a PITA, but can be done. I have to delete downloads after an install to allow more downloads and a "clean" to get rid of binaries and intermediates not needed anymore. Anyway, the following is suspicious:

There is no error message about glibc being missing if I run boinc from /home/boinc/Desktop/BOINC_12. That location is where the install for the 6.12.8 got unpacked by the install script. However, that location does not have any of the project stuff (slots, etc) that Dotsch defaults to /home/boinc/BOINC. I tried /sbin/ldconfig thinking that the new library needed to be put into the cache. I was guessing that the new ca-bundle.crt needed to be cached but glibc is in 6.so, not in ca-bundle.

Anyway, I can run boinc (it shows no attached projects naturally) from either /home/boinc/Desktop/BOINC_10 or /home/boinc/Desktop/BOINC_12 and neither shows any error about glibc at startup.

If I run boinc from /home/boinc/BOINC then it works only if the executable is 6.10.56. Not only does it start up, it starts processing. Running the 6.12.8 terminates immediately with that error message about glibc 2.11 being missing.

AFAICT all permissions, ownership are set correctly and I compared a working Dotsch with the nonworking to verify ownerships and permissions. The working of course had 6.10.56

Question: Was glibc 2.11 put in after 6.10.56 or is there something in the new executable where it cannot find glibc anymore? FWIW, A grep thru the output of ldconfig does not list glibc.
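A side note on checking the version. ldconfig -p lists libraries by soname (libc.so.6), so a grep for "glibc" coming up empty does not say much. Where Python is available, platform.libc_ver() reports the C library it is linked against. The sketch below also compares versions numerically, since 2.6 sorts after 2.11 when compared as text. The function names are made up for illustration.

```python
import platform
import re


def parse_version(text):
    """Turn '2.11' or '2.6.27-16.44' into a tuple of integers."""
    return tuple(int(number) for number in re.findall(r"\d+", text))


def meets_requirement(found, required):
    """True if version string 'found' is at least 'required', compared numerically."""
    return parse_version(found) >= parse_version(required)


def running_glibc():
    """Return the glibc version Python is linked against, or None if not glibc."""
    library, version = platform.libc_ver()
    return version if library == "glibc" and version else None


print(meets_requirement("2.6", "2.11"))   # numeric comparison
print("2.6" >= "2.11")                    # text comparison gives the opposite answer
```

This does not settle whether the 2.6.27 shown by the package manager is really a glibc version; it only shows how to compare the numbers once the real one is known.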
486) Message boards : BOINC Manager : Cannot connect to BAM! on one system anymore (Message 36160)
Posted 19 Dec 2010 by Profile Joseph Stateson
Post:
This has been solved by Willy. The team name "Texas A&M University" contained an unescaped ampersand which caused the signature to not be accepted by BAM!.
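I do not know exactly where BAM! tripped over the name, so the following is only a general illustration of why a bare & is a problem in XML and how escaping avoids it. The element name team_name is hypothetical and not taken from any BOINC or BAM! file.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed(xml_text):
    """True if the text parses as XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def team_element(name):
    """Wrap a team name in a (hypothetical) element, escaping &, < and >."""
    return "<team_name>" + escape(name) + "</team_name>"


raw = "<team_name>Texas A&M University</team_name>"
fixed = team_element("Texas A&M University")
print(is_well_formed(raw), is_well_formed(fixed))
print(ET.fromstring(fixed).text)
```

The escaped form parses cleanly and reads back as the original name.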

I posted a question about improperly constructed Team Names back in March HERE. The two teams involved, yoyo and cosmology, seem to be the only two that have an improper & in the name. Maybe it was just the way they were displaying the name and they didn't bother fixing it. Other projects with A&M don't have the same problem.
487) Message boards : BOINC Manager : Cannot connect to BAM! on one system anymore (Message 36139)
Posted 18 Dec 2010 by Profile Joseph Stateson
Post:
I went and posted my problem at that thread although the error message was not exactly the same and their OS was linux.

It seems likely a BAM! bug dealing with a disconnect problem.

However, if BM reports to the effect "now using account manager" then why is "sync with account manager" missing and the menu item "connect to account manager" present.

If BM reports using the account manager it should show the menu item to allow disconnect, so it seems to me that BM is not handling the signing error as it should.


If I put in what I know is an INCORRECT password, I should not get a message to the effect "now using account manager"
488) Message boards : BOINC Manager : Cannot connect to BAM! on one system anymore (Message 36134)
Posted 18 Dec 2010 by Profile Joseph Stateson
Post:
Did a full uninstall of boinc 6.12.4 and then installed 6.12.8 but still cannot connect from one of my Vista64's to my BAM! account. No matter what password I put in (correct or even an incorrect one!) BOINCMGR tells me that "I am now connected to the account manager". The menu item "Synch with account manager" is missing from the menu bar. In the event log I see the following
    2010-12-18 12:50:40 PM | | Contacting account manager at http://bam.boincstats.com/
    2010-12-18 12:50:42 PM | | Account manager contact succeeded
    2010-12-18 12:50:42 PM | | [error] No signing key from account manager



I put total garbage in the account manager password box and BOINCMGR 6.12.8 still tells me I am "now using the account manager" which is incorrect.

At the boincstats site, they show my system as making a connection. ie: at the time that I was told "now using account manager" I see that time show up under the host "last connect". However, BM does not show "sync with project manager" and does not show "disconnect from project manager". If I was really connected then the "disconnect" should have been displayed.

Another problem: One of the Collatz XML files has 0 length and BOINC reports it cannot read it. OK, but since the project does not show up, there is no way to reset it. I had to go to "\ProgramData\Boinc" and delete all the collatz files to clear the problem. I think problems with BAM! started when I upgraded to 6.12.4 but the system was crunching OK so I didn't bother finding out why the option "Sync with project manager" was missing.

Anyway, I do not know how to get BAM! to start working again. I did an uninstall of BOINC and also deleted all the stuff at "Local Setting\temp". After I did the uninstall of 6.12.4 I checked the "Microsoft uninstall cleanup" and 6.12.4 was not there so the uninstall supposedly went OK.

thanks for looking

489) Message boards : BOINC client : need option to limit # of gpu's per task (Message 33625)
Posted 1 Jul 2010 by Profile Joseph Stateson
Post:
I just realized that dnetc was using 2 gpu's for each task. I was not aware that was even possible so I learned something today. There is no preference option at their website to limit the number of gpu's to one. I don't mind if another dnetc task is running on my other gpu, I just want to make one available for other projects to use. Is there some way to do this using boinc options?



Also want to point out that the device number is listed as 0. For my 2 nvidia system, the gpu devices used should be 0 and 1, not just 0.

ie: instead of ..(device 0)) it should be something like
..(device 0)(device 1))

etc

ps: I am getting 10x the number of credits when 2 gpu's are used (as opposed to 1 gpu), but I don't need the points that bad. Their admin never answered the question about who gets the prize money if the RC code is cracked. I would rather have some of the money instead of the extra credits.
490) Message boards : Questions and problems : submitted 2 alpha tests, but 4 show up (Message 32972)
Posted 23 May 2010 by Profile Joseph Stateson
Post:
Was wondering if I did something wrong. I submitted an alpha test for a 9.1 ubuntu system then hit the "back button" a couple of times, changed my comment to "8.1 ubuntu" and submitted the results for my 8.1 test (slightly different results than 9.1 but no bugs for the few items I was testing).

I then looked here and noticed my name (Joseph Stateson) show up with 4 "results". There should have been only 2 as I clicked on "submit" only twice
491) Message boards : BOINC Manager : Poll: BOINC Manager remote control (Message 32902)
Posted 20 May 2010 by Profile Joseph Stateson
Post:

Run it with a cc_config.xml file that has the following in it:

<cc_config>
<options>
<allow_remote_gui_rpc>1</allow_remote_gui_rpc> 
</options>
</cc_config>


You can add that into an existing cc_config.xml file. Since the file is saved in your BOINC Data directory, it won't be removed on a BOINC uninstall/upgrade install.
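For a farm of machines, adding the option to an existing cc_config.xml can be scripted. This is a rough sketch, not an official tool: it works on the file's text, keeps whatever options are already there, and sets allow_remote_gui_rpc to 1. The function name is made up; use_all_gpus in the example is just another existing cc_config option.

```python
import xml.etree.ElementTree as ET


def enable_remote_gui_rpc(xml_text):
    """Return cc_config.xml text with <allow_remote_gui_rpc>1</allow_remote_gui_rpc> set."""
    if xml_text.strip():
        root = ET.fromstring(xml_text)
        if root.tag != "cc_config":
            raise ValueError("root element is not <cc_config>")
    else:
        root = ET.Element("cc_config")  # start from scratch for an empty file
    options = root.find("options")
    if options is None:
        options = ET.SubElement(root, "options")
    flag = options.find("allow_remote_gui_rpc")
    if flag is None:
        flag = ET.SubElement(options, "allow_remote_gui_rpc")
    flag.text = "1"
    return ET.tostring(root, encoding="unicode")


existing = "<cc_config><options><use_all_gpus>1</use_all_gpus></options></cc_config>"
print(enable_remote_gui_rpc(existing))
```

The result would be written back to cc_config.xml in the BOINC Data directory, and the client has to re-read its config file or be restarted before the change takes effect.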


Thanks, I was not aware of that cc_config option.

I have had problems with boinc or boincmgr being started both in the registry and in the startup folder. That was some time ago and perhaps it was caused by some other problem. I may look at doing that again.

I do have two desktop tasks that I use. One starts boinc just like you suggested, the other runs boinccmd and issues the --quit. I needed these two since boinctasks does not have a way to stop and restart boinc like boincmgr does. I do not use the suspend on 25% property. Rarely (ie: whenever Netflix arrives) I burn a dvd and that generally requires that I terminate all boinc tasks. Opening too many windows and applications such as MSVS can require that all boinc tasking exit.
492) Message boards : Projects : run time and other problems at collatz (Message 32901)
Posted 20 May 2010 by Profile Joseph Stateson
Post:
This is a repost of a report on bad wu's that I posted at the collatz forum. The complaint is that some wu's never finish.

This just happened to me. I had two wu's in a dual 9800gtx+ box running xp-32 pro that take normally 30 minutes showing that almost 24 hours had elapsed. I rebooted and they finished within minutes.

There are no gpu nor cpu restrictions. Running 6.10.55. Temps were down in the low 50's, a sure sign that neither gpu was crunching.

What is strange is the following:

Before rebooting I very clearly saw 23:xx:xx (00:00:xx) for one of them. I don't remember the value of the xx, just the 23 hours and the low temps. This indicates 23 hours elapsed time and less than 1 minute cpu time (Elapsed time, BoincTasks column)

After rebooting, the tasks finished within minutes and are "ready to report" with the time for each one showing

00:35:41 (00:00:17) and
00:35:46 (00:00:18)

so what happened to the actual elapsed time of 23 or so hours? I think collatz was hung in the gpu and the time counter was not incrementing. It would seem to me that elapsed time should have been reported correctly and not the 35 minutes. Something is wrong here

I just did an update that released the two to the project. I went to the project and the run time is about the 30 minutes, not anywhere near the almost 24 hours that had actually occurred.

------

Collatz admin had mentioned earlier that he is reporting cpu time instead of gpu time because of boinc scheduling problems. Seems like run time does not include wall clock elapsed time. In this case it would appear that neither task started in either one of the gpus.
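Something like the following could flag tasks like these before a whole day is lost. It is only a sketch: it parses the "elapsed (cpu)" column format quoted above and compares elapsed time with the usual run time. The factor of 4 is an arbitrary threshold chosen for illustration, and the function names are made up.

```python
import re

# "elapsed (cpu)" as shown in the post, e.g. "00:35:41 (00:00:17)".
TIMES = re.compile(r"(\d+):(\d{2}):(\d{2})\s*\((\d+):(\d{2}):(\d{2})\)")


def parse_times(text):
    """Return (elapsed_seconds, cpu_seconds) from an 'HH:MM:SS (HH:MM:SS)' string."""
    match = TIMES.search(text)
    if match is None:
        raise ValueError(f"no elapsed/cpu pair in {text!r}")
    h1, m1, s1, h2, m2, s2 = (int(group) for group in match.groups())
    return h1 * 3600 + m1 * 60 + s1, h2 * 3600 + m2 * 60 + s2


def looks_stalled(elapsed_seconds, typical_seconds, factor=4.0):
    """Flag a task whose wall clock time is far beyond its usual run time."""
    return elapsed_seconds > factor * typical_seconds


elapsed, cpu = parse_times("00:35:41 (00:00:17)")
print(elapsed, cpu, looks_stalled(elapsed, 30 * 60))
print(looks_stalled(23 * 3600, 30 * 60))
```

CPU time is no help here since a healthy GPU task also shows only a few seconds of it, which is why the check uses elapsed time against the normal 30 minutes.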
493) Message boards : BOINC Manager : Poll: BOINC Manager remote control (Message 32898)
Posted 20 May 2010 by Profile Joseph Stateson
Post:
I use boinctasks for remote control and, except for debugging, I rarely use boincmgr any more. Since this is essentially a boinc farm, I do not do any graphics.

I used to list the complete range of ip addresses because my linux systems don't understand netbios and the ubuntu network manager on several systems seems broken to where only dhcp works. ie: there is a problem setting static IP's. I also swap flash drives in and out and the linux hostname and ip, especially, changes when I do that.
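Listing the range by hand can be avoided with a small script. A sketch, assuming the clients listen on BOINC's default GUI RPC port 31416 and allow remote connections. The subnet in the usage note is an example, and the function names are made up.

```python
import ipaddress
import socket

GUI_RPC_PORT = 31416  # BOINC's default GUI RPC port


def candidate_hosts(cidr):
    """List every usable host address in a subnet such as '192.168.1.0/24'."""
    return [str(ip) for ip in ipaddress.ip_network(cidr, strict=False).hosts()]


def accepts_gui_rpc(host, port=GUI_RPC_PORT, timeout=0.5):
    """True if something accepts a TCP connection on the GUI RPC port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def find_clients(cidr):
    """Return the addresses in the subnet that answer on the GUI RPC port."""
    return [host for host in candidate_hosts(cidr) if accepts_gui_rpc(host)]
```

Calling find_clients("192.168.1.0/24") walks the subnet one address at a time, so with the half second timeout a full /24 can take a couple of minutes. An open port only shows that something is listening there, not that the password will be accepted.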

On all linux systems I have set my daemon for --allow_remote_gui_rpc
It would be helpful if the default password was "boinc" at gui_rpc_auth.cfg which is what Dotsch_UX uses.

On all vista & 7 systems I have replaced "boincmgr" at HKLM/Wow6432Node/Microsoft/Windows/CurrentVersion/Run
(the normal install place) with "boincclient" and
"C:\Program Files\BOINC\boinc.exe" --detach --allow_remote_gui_rpc
and got rid of that tray applet.

Unfortunately, I have to do this after every new install (get rid of boincmgr)

On my one xp pro system I use the service mode and the -allow_remote_gui_rpc but again, I have to edit that back in after every new install.

If you are going to change the manager code you might want to remove the message that " /b --boincargs " can be used to pass arguments to boinc. I looked at the code and boincmgr parses its arguments using some type of template scheme and does not have access to the set of rules that boinc uses. ie: no args to boinc are valid since they are parsed by boincmgr for boincmgr and will fail boincmgr rules. (Unless there has been a change since I last looked about 9 months ago)

It would be nice if " /b ' -allow_remote_gui_rpc ' " could pass that
-allow_remote_gui_rpc to boinc
494) Message boards : BOINC Manager : 6.10.43 bug? entire message buffer being duplicated (Message 32803)
Posted 14 May 2010 by Profile Joseph Stateson
Post:
I had to update this old thread because I finally figured out what was causing the error. I had inadvertently installed the 32bit version of linux boinc on several 64 bit ubuntu systems. It actually ran some of the cpu tasks correctly, but on systems having a gpu that was where I had that strange message buffer problem. When I reverted back to 8.1 (Dotsch unix) that was a clean install so it got rid of the 32 bit boinc and that was why it started working. The problem was not the 9.1 like I thought originally.

I found out because just an hour ago I did the same wrong install and got the exact same buffer response problem. This time I noticed the download was missing the "64" and realized my error.
495) Message boards : Web interfaces : virus at boincstats? (Message 32404)
Posted 26 Apr 2010 by Profile Joseph Stateson
Post:
Willy - I sent you a message about this, please check your inbox.
496) Message boards : Web interfaces : virus at boincstats? (Message 32394)
Posted 26 Apr 2010 by Profile Joseph Stateson
Post:
Happened again, this time I snagged a copy of the "microsoft" section along with the comments by McAfee


All these seem to be generic download "trojans" that are used in advertising. I will not post anymore here. Just wanted to expose the wording of the microsoft section.
497) Message boards : Web interfaces : virus at boincstats? (Message 32340)
Posted 23 Apr 2010 by Profile Joseph Stateson
Post:
BOINCstats works for me.
Have you read this: Security update hits Window PCs, or similar from other sites?



Yea - thanks for that article, it was interesting to say the least. I tried about 30 seconds later and got onto boincstats w/o any problem. However, this problem seems different because (if you can believe the message) microsoft states that the site was reported as unsafe. The attribution to microsoft is behind the McAfee popup, as I moved the popup to center it. It asked the viewer to click to send a message stating that the site was known to be safe. I wonder if they take a vote, and if the majority decides the site is safe then it is safe? However, since I can't replicate that warning, I cannot be positively sure that microsoft was attributed.

I have never seen that all-red screen show up, and I put a lot of time into surfing.

Whatever the problem was, it is gone now. My McAfee dat is 2 versions later than the one reported in that security bulletin.
498) Message boards : Web interfaces : virus at boincstats? (Message 32336)
Posted 23 Apr 2010 by Profile Joseph Stateson
Post:
Can't get to boincstats anymore - microsoft and McAfee won't let me. Anyone know what is going on? Maybe McAfee had a brain freeze like the time it deleted my vnc server.


499) Message boards : Questions and problems : Linux (Karmic) not running low cpu tasks at nice=10 anymore (Message 32242)
Posted 18 Apr 2010 by Profile Joseph Stateson
Post:
Ok - I finally got collatz to run at nice=0 in Karmic, which solved my GPU feeding problem. I also learned something (possibly).

I installed the recent (2.3 actually) linux cuda toolkit from nvidia. After doing ldconfig, the tool ldd ...collatz... showed that I was now using the 2.3 version and that seems to have got me nice=0. So - I am assuming the problem was using an older library, 2.2, that came with the collatz linux package. That one was showing nice=19.

500) Message boards : Questions and problems : Linux (Karmic) not running low cpu tasks at nice=10 anymore (Message 32235)
Posted 17 Apr 2010 by Profile Joseph Stateson
Post:
I am guessing that the problem is Karmic (Ubuntu 9.1) because Dotsch_UX (8.1) runs collatz at nice=10 just fine. Unfortunately, in 9.1, collatz is running at nice=19 and cannot get enough of the cpu to feed the GTS250 as I just discovered.

Rather than repost everything, here is a link to the problem.

There is also an interesting discussion here where it is pointed out that the GPU time is substituted for the CPU time when reporting time back to boinc. It would appear that is one way to avoid the low cpu time problem.

If boinc "dispatches" a task, such as collatz, could it not set nice to a smaller number such as 10 or even 0 if it knew that the cpu load was very low?
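The idea in that question can be sketched roughly as follows. This is only an illustration of the policy being asked about, not how boinc actually dispatches tasks: the function names, the 0.5 threshold and the nice values are all assumptions made up for the example.

```python
import os


def pick_nice(cpu_fraction, default_nice=19, feeder_nice=10, threshold=0.5):
    """Return the nice value to use for a task.

    cpu_fraction is the share of one CPU the task is expected to use
    (for example 0.18 for a collatz GPU task as reported in these posts).
    The threshold and both nice values are assumptions for illustration.
    """
    if cpu_fraction < threshold:
        # Low CPU load: the task mostly feeds a GPU, so give it more priority.
        return feeder_nice
    return default_nice


def renice(pid, nice_value):
    """Apply a nice value to a running process (Unix only).

    Lowering a nice value below its current setting normally needs root.
    """
    os.setpriority(os.PRIO_PROCESS, pid, nice_value)


print(pick_nice(0.18))
print(pick_nice(1.0))
```

A dispatcher that already knows the cpu fraction of each task would only need the first function; renice is included to show where the value would be applied.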
501) Message boards : Questions and problems : BoincTasks alternative BOINC manager (Message 32136)
Posted 12 Apr 2010 by Profile Joseph Stateson
Post:
.48 installed just fine on my vista-64. I have been comparing it to boincview and it looks great. Thanks!

.49 will not install, it brings up my VS2008 debugger. If I position the address location to the next subroutine return and let it continue then it seems to run. I uninstalled it and went back to .48

You have done a lot of work there, thanks again!

502) Message boards : Questions and problems : BOINC 6.10.43 - Runs two task on single gpu (Message 32089)
Posted 10 Apr 2010 by Profile Joseph Stateson
Post:
Please ignore the stated "gpu load is 0" I posted above. There is a load and the two boards are working, but I am getting 0 for the gpu load, which is incorrect. I checked another system (XP and single gts250) and gpuz and msi both show 0 for its gpu load, which I know is incorrect. From the test I ran it would appear that it simply takes about 1-2 minutes for one of the gpu's to switch from device 0 to device 1 after one device is resumed. During that time both collatz and seti seemingly were on device 0. This may not be the same problem as reported in this thread. HTH.

[EDIT]

I have two collatz tasks supposedly running on device 0 for the last 10 minutes. By alternately suspending and resuming tasks I was able to get two tasks stuck on device 0. Since both are crunching and both gpu's are running warm, I suspect both are being used, although I do see "Device 0" for both.

again, HTH.
503) Message boards : Questions and problems : BOINC 6.10.43 - Runs two task on single gpu (Message 32087)
Posted 10 Apr 2010 by Profile Joseph Stateson
Post:
I was able to duplicate the problem on vista 64, 6.10.43. I had two 6.08 tasks running and "resumed" collatz. Collatz immediately went to device 0.



I brought up gpuz and the load on both my gts250 and 9800gtx+ is "0". After about 2 minutes one of the tasks switched to device 1. Currently, both tasks seem to be making progress, but gpuz and msi afterburner both show 0 gpu load.

[EDIT] I do not remember if collatz was originally on device 0 when suspended. Perhaps it just took a minute or two before the seti task was switched to 1.


504) Message boards : BOINC Manager : linux 6.10.43: cherry picking bug (Message 31966)
Posted 5 Apr 2010 by Profile Joseph Stateson
Post:
Test if 6.10.44 does the same thing. It'll be available from the normal BOINC download page within the hour.


Yes, 44 has the same problem. I downloaded it to Dotsch_UX (8.1 ubuntu) and ran a test. I used snagit to capture the screen. The movie is really bad because I was using ultravnc to access the linux desktop and snagit on my vista system to record.

Movie is here

I started recording with 8th from the bottom already suspended and the cursor is positioned on the 3rd one above the suspended task. Latency is terrible.

I held the shift and moved the cursor down. Shortly after I passed the suspended line item, the button at the top left, "Suspend", becomes inactive as expected. Then, still holding down the shift, I moved the cursor up. If you look closely you can see the cursor motion, but the blue highlight stays on. As I move the cursor key up, there should be fewer and fewer collatz items selected and one should see fewer of the blue highlighted items. Eventually, I pass and go above the item I started with. Note that the suspend button is left inactive.

On a windows system, when I get thru there should be only 3 items highlighted. Instead, the linux system shows all items highlighted. Only the ones between the start and the finish should be selected IMHO.
505) Message boards : BOINC Manager : linux 6.10.43: cherry picking bug (Message 31961)
Posted 5 Apr 2010 by Profile Joseph Stateson
Post:
AFAIK, BOINC Manager follows the standard (or is it Windows only?) convention.

If you hold down the 'Shift' key, the selection becomes "this one, the first one you clicked, and everything in between."

If you hold down the 'Ctrl' key, you can select or deselect individual items in the list, without affecting the ones already selected/deselected.


Yes, one would hope they should work the same way especially with the same program and version number.

There are actually two bugs here.

(1) When backing the cursor off, the unselected items should have their highlight removed. This does not happen. The blue highlight is still shown even though the items appear to have been deselected (you can see the cursor moving back the other direction).

(2) When backing the cursor off, the item that was in the wrong context (ie: it was already suspended) is deselected. However, the resume button is not re-activated.

The above two bugs are not in the windows version. Of course, this could be a feature to make it harder to cherry pick the data.
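The Shift-selection convention quoted above, and the button behavior the windows version shows, can be sketched like this. It is only a model of the expected behavior, assuming rows are numbered from the top; the function names are made up for the example and are not taken from the boincmgr source.

```python
def shift_select(anchor, cursor):
    """Rows selected while Shift is held: the anchor row (the first one
    clicked), the row under the cursor, and everything in between."""
    low, high = sorted((anchor, cursor))
    return set(range(low, high + 1))


def suspend_enabled(selected, suspended):
    """The Suspend button is usable only when something is selected and
    none of the selected rows is already suspended."""
    return bool(selected) and not (selected & suspended)


suspended = {5}  # one task was suspended beforehand

# Shift + cursor down from row 2 to row 6 passes over the suspended row.
down = shift_select(2, 6)
print(sorted(down), suspend_enabled(down, suspended))

# Shift + cursor back up to row 4: the range shrinks and the button
# should become usable again.
back = shift_select(2, 4)
print(sorted(back), suspend_enabled(back, suspended))
```

Both reported bugs are cases where the linux manager does not recompute the highlight and the button state from the current range when the cursor moves back.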
506) Message boards : BOINC Manager : linux 6.10.43: cherry picking bug (Message 31955)
Posted 5 Apr 2010 by Profile Joseph Stateson
Post:
This bug occurs only in linux and is in 6.10.17 as well as 6.10.43. If you hold down the shift and start selecting tasks and one of the tasks conflicts with what you are trying to do (like it is already suspended and you are trying to suspend some more) then you have to release the shift button, clear the selection list, and start over.

It can be easily demonstrated, make sure you have some tasks that are ready to start. I am assuming 10 tasks that have not yet started and all have 0 progress and are marked "ready to start".


Select the task tab in BM and sort progress from 100% down to 0%. Assume you have 10 tasks at the bottom that are ready to start.

Select the task 5th from the bottom and click on "suspend"; note that it changes to "resume".

Select the first of the "ready to run" tasks, then hold down the shift key and start clicking the down cursor key. Note that the selections are all highlighted and the "Suspend" button is NOT grayed out.

When you move the cursor over the task that is already suspended (the 5th one) the "Suspend" button becomes inactive and is grayed out.

Still holding the shift key down, if you then click the up cursor key the inactive "suspend" button does NOT become activated. It remains grayed out. In addition, the 5th task stays highlighted even though it was supposedly de-selected.


If you do the above in any windows version it works perfectly: moving the cursor up off of the already suspended line item de-selects that task, un-highlights it, and the "suspend" button is then re-activated and can be used on the items you have selected. In linux you have to release the shift key and start over.


At one time I was trying to resume about 100 tasks that were suspended (don't ask me why) and I was holding down the shift key and hitting the page down, but I passed thru a task that was running. That one task inactivated the "resume" button and I had to start all over again because the "resume" button would not become active again.
507) Message boards : Questions and problems : possible sched bug: einstein needs 1 cpu + 1 gpu (Message 31916)
Posted 2 Apr 2010 by Profile Joseph Stateson
Post:
Got 7 tasks running on my windows 7 system after all. I didn't think it was possible. This is a dual opteron (4 cores total) with one 9800gtx (BFG) and one 9800gtx+ (XFX), although boinc 6.10.43 thinks they are the same.

Anyway, here are my 7 active tasks:



I can see where scheduling can get confused. Note that einstein (the 1 + 1 guy) is not running anymore.

Supposedly, milkyway can run two tasks at once on the same gpu. Unfortunately, I have only single precision gpu's on this system, tho I read where somebody actually got single precision to work on that project.
508) Message boards : Questions and problems : possible sched bug: einstein needs 1 cpu + 1 gpu (Message 31913)
Posted 2 Apr 2010 by Profile Joseph Stateson
Post:
I used the term "non cpu" for freehal but I have only seen that term used on my linux systems as shown here:



[EDIT AGAIN]

Not sure who / what is doing this, but when I view all my systems using boincview then all freehal show up as non-cpu intensive, not just the linux ones.

509) Message boards : Questions and problems : possible sched bug: einstein needs 1 cpu + 1 gpu (Message 31912)
Posted 2 Apr 2010 by Profile Joseph Stateson
Post:
I have not seen this problem before. I just started crunching freehal recently, which seems not to use the cpu pool. I typically get 2 extra cpu tasks with freehal one of them and a gpu the other (on a system with 1 gpu). On systems with 2 gpus I see 6 tasks running (4 cpu + 2 gpu) and freehal is not running as it is waiting to upload. I have never seen 7 tasks. Then when freehal gets a slice I still have 6 tasks but have lost one of the others.

I am just guessing that the freehal program is the problem. Possibly the scheduler marks it as non-cpu when it really is?
510) Message boards : Questions and problems : possible sched bug: einstein needs 1 cpu + 1 gpu (Message 31910)
Posted 2 Apr 2010 by Profile Joseph Stateson
Post:
Hi Jord

Yea - I sort of knew that as I am shown "1.0 cpu + 1 nvidia" by BM. However, I suspect there is some type of scheduling problem going on when more than 1 gpu is being used. For example, this windows 7 system had an idle 9800gtx+ for some reason and I do not know how to debug it.



So where is the other nvidia gpu? I would think that one of the einstein or one of the seti would be running based on the following "all tasks".





So what is causing the other nvidia not to run? It is shown in messages



I just closed boinc and restarted it and picked up an einstein



Note that the above shows only 4 tasks running, whereas the one with the unused nvidia had 6 tasks running. There are only 4 cpus on this system and all are allocated. I wonder if that supposedly "not cpu-intensive" freehal is pulling more cpu work than it is supposed to? I just checked again and I picked up two more tasks for a total of 6, including 2 gpu.

511) Message boards : Questions and problems : possible sched bug: einstein needs 1 cpu + 1 gpu (Message 31904)
Posted 2 Apr 2010 by Profile Joseph Stateson
Post:
On a quad system with 3 cpu tasks running and 2 collatz (.18 cpu + 1 nvidia each) I decided to suspend one of the collatz so that the einstein could start up, finish, and get out the door. It had been waiting for a free gpu. Since I already had 1 cpu free, I assumed that the newly free nvidia and the unused cpu would allow the einstein to start up. It didn't happen, and after about 15 minutes of waiting I closed boinc and restarted it. That worked.

So counting my cpu's, even with one supposedly free, part of it (.36 exactly) was probably running the two collatz. Suspending one of the collatz freed up a full gpu, but it seems that only .82 of a cpu was really available for einstein and it would not start. Restarting boinc must have added the same numbers up in a different order, thus seemingly breaking symmetry.
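Written out, the arithmetic looks like this. It is only my own bookkeeping of the numbers in this post, under the assumption that the client simply sums the cpu fractions of the running tasks; it is not taken from the boinc scheduler code.

```python
NCPUS = 4
CPU_TASK = 1.0       # each of the 3 ordinary cpu tasks
COLLATZ_CPU = 0.18   # cpu share of each collatz gpu task
EINSTEIN_CPU = 1.0   # einstein needs 1 cpu + 1 gpu


def free_cpus(ncpus, running):
    """CPUs left over after summing the shares of the running tasks."""
    return round(ncpus - sum(running), 2)


# 3 cpu tasks plus 2 collatz: the "free" cpu is partly eaten by .36.
before = free_cpus(NCPUS, [CPU_TASK] * 3 + [COLLATZ_CPU] * 2)

# One collatz suspended: a whole gpu is free, but not a whole cpu.
after = free_cpus(NCPUS, [CPU_TASK] * 3 + [COLLATZ_CPU])

print(before, after, after >= EINSTEIN_CPU)
```

With strict accounting like this, einstein never gets its full 1.0 cpu while a collatz is running, which would explain the wait but not why a restart fixed it.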

thanks for looking at my new math.
6.10.43
512) Message boards : Questions and problems : where to ask about boinc develop code and bugs? (Message 31839)
Posted 29 Mar 2010 by Profile Joseph Stateson
Post:
I have some general questions to ask about the source code to boinc. I am registered and receive email from boinc.dev and boinc.alpha, but neither of those feeds seems appropriate for a newbie.

(1)

Why is the difference between the "trunk" that I downloaded recently, and the latest tag "6.10.43" so great? The piece of code I am analyzing, CViewProjects::OnProjectWebsiteClicked, was patched in 6.2.12 or earlier, but the trunk does not show it.

Since I just want to debug a problem or two and submit a bug report, perhaps I should be working with the tag?

I have read thru this to get up to speed on SVN. I usually work with source safe, an M$ product.

(2)
apt-get source boinc returns 6.2.12 under Dotsch_UX, which is 8.1 ubuntu. OK, I read here that wxLaunchDefaultBrowser was replaced with a debian patch that seems to have caused problems with the CViewProjects::OnProjectWebsiteClicked code. That patch is not in the trunk but it is in 6.2.12 and 6.10.43. However, according to the above launchpad bug, ExecuteLink has ALSO been patched. I assume that patch is in the debian code, not boinc. So if I grab the latest 6.10.43 from boinc downloads and replace boinc and boincmgr (in 8.1), could that cause the problem I am seeing where I cannot launch the project url from boincmgr? This problem was also observed by the Dotsch_UX developer. Note that ubuntu is not exactly the same as debian: the sensible-browser is in debian utils, whereas ubuntu seems to use environment variables such as $BROWSER. IANE on linux.
513) Message boards : BOINC Manager : ubuntu 9.1: cannot launch browser from project url (Message 31758)
Posted 25 Mar 2010 by Profile Joseph Stateson
Post:
I did get it working, see below.

BOINC Manager follows the default browser set in the operating system. So you will have to figure out how to set the default browser in Ubuntu. Check if this FAQ is of use.


That FAQ is correct, the problem seems to be wxLaunchDefaultBrowser has been patched to handle "default-browser" or so I have been reading here. Boinc uses wxWidgets.

After reading the above (which didn't apply since firefox was never launched) I then read where /usr/share/mime-info/gnome-vfs.keys had to be edited and Epiphany replaced with the desired browser. IANE on linux. Unfortunately, that didn't apply since I was not in an email and clicking on a url. I then restored default-application-id to Epiphany, installed Epiphany and selected Epiphany as the default browser. That works, and if I click on any project link buttons in boincmgr it brings up Epiphany and I get to the web site. I just can't get firefox launched the same way.

I am going to post the above stuff over at the Dotsch/UX forum since he is supplying boinc and ubuntu but has firefox as the default. Maybe someone over there can figure out how to get firefox to show up instead of Epiphany. The first thing I tried after installing Dotsch/UX were those project url links and they failed.

The gnome mime definition file lists about 6 browsers but none are firefox. That needs to be fixed, one would think.
514) Message boards : BOINC Manager : 6.10.43 bug? entire message buffer being duplicated (Message 31755)
Posted 25 Mar 2010 by Profile Joseph Stateson
Post:
This was solved (sort of) by installing Dotsch as explained here. Dotsch 1.2 is actually 8.1, so essentially I reverted 9.1 back to 8.1 and the problems I had been seeing disappeared.
515) Message boards : BOINC Manager : problem with & in team name (Message 31742)
Posted 24 Mar 2010 by Profile Joseph Stateson
Post:
I noticed that every project listed in the client_state.xml file has the same faulty name

ie: for AQUA at home

    <master_url>http://aqua.dwavesys.com/</master_url>
    <project_name>AQUA@home</project_name>
    <symstore>http://aqua.dwavesys.com/symstore/</symstore>
    <user_name>BeemerBiker</user_name>
    <team_name>Texas A&amp;M University</team_name>




However, aqua is NOT one of the projects that displays the &amp. They correctly show A&M at their web site and also in boincmgr.

So is the &amp ok for use in client_state.xml?
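A quick check with Python's standard XML parser suggests the escaped form is what a well-formed XML file has to contain, and that a parser turns it back into a plain "&". This sketch only uses a cut-down copy of the fragment above wrapped in a made-up <project> element; it says nothing about how boinc or boincmgr themselves read the file.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Cut-down copy of the client_state.xml fragment; the <project> wrapper
# is added here only so the fragment has a single root element.
fragment = """<project>
    <project_name>AQUA@home</project_name>
    <team_name>Texas A&amp;M University</team_name>
</project>"""

team = ET.fromstring(fragment).findtext("team_name")
print(team)

# Writing the name into XML escapes the ampersand again.
print(escape("Texas A&M University"))

# A raw ampersand is not well-formed XML and is rejected by the parser.
try:
    ET.fromstring("<team_name>Texas A&M University</team_name>")
    raw_ampersand_ok = True
except ET.ParseError:
    raw_ampersand_ok = False
print(raw_ampersand_ok)
```

If that holds for the client too, the stored &amp; is fine and the stray text would come from something displaying the raw string without decoding it, or from a name that was escaped twice somewhere along the way.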

thanks for looking

516) Message boards : BOINC Manager : 6.10.43 bug? entire message buffer being duplicated (Message 31717)
Posted 22 Mar 2010 by Profile Joseph Stateson
Post:
Found Bug (or at least I know how to duplicate the problem). I ran these tests from a Windows Vista 64 system using boinc 6.10.43. The two target systems were 8.04 and 9.1 ubuntu. The 8.04 worked correctly.

I have a total of 4 ubuntu systems, two of them are 9.1 the others are 8.04. The older OS work fine. Both of the 9.1 have the problem.

I ran netmon to monitor traffic between two of my systems which helped spot the problem. I then ran boinccmd to duplicate the problem.

For example: boinccmd --host 192.168.0.9 --get_message
This retrieves all the messages from an 8.04 ubuntu system I have that is running 6.10.32.

As shown below, the sequence number of the last message is 114. If you then make the same request but specify sequence number 114 (an offset) you should then get NO messages (unless there are new messages). This works correctly. Note that there is no text between the command that was typed and the empty prompt underneath it.



OK the above worked correctly, there are no additional messages beyond <seqno>114</seqno>
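The behavior I expect from that request can be written down as a tiny model. This is only my reading of how the sequence number offset should work, with a made-up in-memory message list; it is not the boinc client code.

```python
def messages_after(buffer, seqno):
    """Return only the messages newer than the given sequence number."""
    return [msg for msg in buffer if msg["seqno"] > seqno]


# Made-up buffer holding 114 messages, numbered 1 through 114.
buffer = [{"seqno": n, "body": f"message {n}"} for n in range(1, 115)]

everything = messages_after(buffer, 0)    # first request, no offset
nothing_new = messages_after(buffer, 114) # repeat with the last seqno
print(len(everything), len(nothing_new))
```

A repeat request that returns the whole buffer again behaves as if the offset had been ignored or reset to 0.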

Now look what happens when I make the same request to a system running ubuntu 9.1 and boinc 6.10.17. Here is the first response to
boinccmd --host 192.168.0.11 --get_message
note that the message number ends in 71



Look at the last command I typed (above). When one presses Enter there should be no response (unless there are additional messages after seqno 71).



BINGO! The entire message buffer was re-displayed! That should not have happened!

[EDIT]
There is something else going on here. I am not getting consistent failures. For example, I can run the command 3 or 4 times in a row and it works perfectly: no results after the last seqno. Then all of a sudden I get the entire message buffer, then I am back to where it works ok and no messages are displayed if there are no new messages.
517) Message boards : BOINC Manager : 6.10.43 bug? entire message buffer being duplicated (Message 31694)
Posted 21 Mar 2010 by Profile Joseph Stateson
Post:
Just went back to 6.10.17 and got the same problem. I checked the resource manager and I am only running boinc and boincmgr once.

This problem is on 9.1 ubuntu and I am running 2.6.31.20. Just put in a kernel update a few minutes ago thinking that might fix it, but the update did not require an nvidia kernel build (unlike the previous 9.1 updates) and the problem continues.

I do not see this problem on my 8.04 ubuntu system. Using 6.10.43 on vista 64 I did a "select computer" to the 8.04 ubuntu and examined its message logs. I see a contiguous string of dates with no jumbled dates out of order or any duplicates ie:


    3/19/2010 11:58:21 PM Starting BOINC client version 6.10.32 for x86_64-pc-linux-gnu
    3/19/2010 11:58:21 PM log flags: file_xfer, sched_ops, task



all the way down to the following with all dates in correct order


    3/21/2010 11:53:47 AM FreeHAL@home Reporting 1 completed tasks, not requesting new tasks
    3/21/2010 11:53:52 AM FreeHAL@home Scheduler request completed



If I try to do the same on ubuntu 9.1 the dates are all jumbled with plenty of duplicate entries. Something is amiss.
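For anyone wanting to check their own log, a rough way to count jumbled dates and duplicate entries is sketched below. It assumes every line starts with a timestamp in the same form as the lines quoted above; the function names are my own, and the "bad" log is built artificially by repeating two lines.

```python
from collections import Counter
from datetime import datetime

STAMP_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def parse_stamp(line):
    """Read the leading 'date time AM/PM' part of a log line."""
    date, time, ampm = line.split()[:3]
    return datetime.strptime(f"{date} {time} {ampm}", STAMP_FORMAT)


def check_log(lines):
    """Return (number of backward time jumps, number of repeated lines)."""
    stamps = [parse_stamp(line) for line in lines]
    out_of_order = sum(1 for a, b in zip(stamps, stamps[1:]) if b < a)
    duplicates = sum(n - 1 for n in Counter(lines).values() if n > 1)
    return out_of_order, duplicates


good = [
    "3/19/2010 11:58:21 PM Starting BOINC client version 6.10.32 for x86_64-pc-linux-gnu",
    "3/19/2010 11:58:21 PM log flags: file_xfer, sched_ops, task",
    "3/21/2010 11:53:47 AM FreeHAL@home Reporting 1 completed tasks, not requesting new tasks",
    "3/21/2010 11:53:52 AM FreeHAL@home Scheduler request completed",
]
bad = good + good[:2]  # the startup lines show up a second time

print(check_log(good))
print(check_log(bad))
```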

In 8.04 I also have boinc as a daemon. I set boinc up the same for both systems, other than that I do not have any gpu devices in 8.04 as I could not get them to work.

Want to point out that the jumbled dates and duplicate entries in the message log seem to be cosmetic. The projects crunch along just fine. I would not have noticed the problem except I was examining the logs in detail due to a problem with collatz.

518) Message boards : BOINC Manager : 6.10.43 bug? entire message buffer being duplicated (Message 31693)
Posted 21 Mar 2010 by Profile Joseph Stateson
Post:
sorry - duplicate post
519) Message boards : BOINC Manager : 6.10.43 bug? entire message buffer being duplicated (Message 31690)
Posted 21 Mar 2010 by Profile Joseph Stateson
Post:
Not sure what is going on here. It seemed to start when I put in 6.10.43 on my Ubuntu 9.1 systems.

I should see my nvidia gpu's listed just once, at the top of the dialog box for messages. Instead, they are listed multiple times. It would appear that every time a block of messages are written to the boinc message buffer, the entire message buffer is replicated. The longer I leave boincmgr running, the worse it gets. I did a grep for "nvidia" it is listed 100's of times. Should have been just listed twice - once for each gpu at startup of boincmgr.

I went and deleted the stdout files that have the messages, stopped boincmgr, and restarted. Seemed to be ok: about 150 messages all dated to the same 1-2 seconds (the startup messages). Then slowly but surely the entire message buffer started being duplicated. Instead of a message that a task completed, I got the entire message buffer duplicated.

I copied and pasted the crap here

I only spotted this problem when I attempted to debug why collatz tasks were being (supposedly) rejected because the app was wrong. AFAICT all my collatz wu's are being processed and reported to the project correctly even though boincmgr indicated 100's of collatz tasks are wrong. That is also shown in the above mentioned crap.

This problem occurred on two ubuntu systems, both 9.1 and 6.10.43.
520) Message boards : BOINC Manager : ping Ageless: question about wiki RAC spreadsheet (Message 31607)
Posted 16 Mar 2010 by Profile Joseph Stateson
Post:
Jord: I downloaded the excel spreadsheet at
http://www.boinc-wiki.info/Recent_Average_Credit and I cannot make it work. All I see is #NAME? indicating an unknown name. I noticed you were listed in the wiki credits as the one to shoot down.

Anyway, the following is what I am staring at:



So is the spreadsheet broken?

521) Message boards : BOINC Manager : problem with & in team name (Message 31585)
Posted 15 Mar 2010 by Profile Joseph Stateson
Post:
I am seeing the html "&amp;" show up in yoyo and cosmology in the linux version of boincmgr, but the windows one shows the correct text. The top figure is 9.1 Ubuntu; the one below is vista 64. Yoyo is not shown on the linux one, but it has the same problem.
Team name should be "Texas A&M University" but, as shown below, "Texas A&amp;M University" shows up. When I go to the actual projects I do not see any problem in the team name, i.e., at the yoyo team page there seems to be nothing wrong with the name of the team.



For what it is worth, boincview shows the &amp; on cosmology and yoyo for ALL platforms, not just the linux platform. Since boincview interfaces with boinc directly, maybe this is a boinc core bug.




522) Message boards : BOINC client : 6.10.32 linux: pct cpu not working with AQUA MT (Message 31207)
Posted 25 Feb 2010 by Profile Joseph Stateson
Post:
The number of CPUs used by AQUA is set by the number of CPUs available at the time the task is allocated by the scheduler (there's a command line parameter in the workunit specification) - you'll probably find that the next AQUA task you download is set to 3 CPU