Posts by Wedge009

1) Message boards : Questions and problems : BOINC Manager memory saturation upon adding project (Message 112786)
Posted 2 Oct 2023 by Wedge009
Post:
It's not a new client, it's been running on all of my Linux machines. But that's a digression from the main topic here, which was the memory consumption.
2) Message boards : Questions and problems : BOINC Manager memory saturation upon adding project (Message 112752)
Posted 26 Sep 2023 by Wedge009
Post:
I worked around the issue by using the command-line. But I still find it worrying that something could cause BOINC Manager to consume memory indefinitely.
3) Message boards : Questions and problems : BOINC Manager memory saturation upon adding project (Message 112751)
Posted 25 Sep 2023 by Wedge009
Post:
Got a weird problem I haven't seen before. After hitting the 'outdated code signing key' issue, I removed the project and tried to re-add it (as per the server instructions). For the first time, I'm not able to re-add the project: BOINC Manager becomes unresponsive at the 'Communicating with project. Please wait...' dialogue, and memory usage climbs steadily, which would presumably saturate my system if I didn't kill the BOINC Manager process first.

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   8084 ........  20   0  106.4g   4.3g  76148 R 100.0  27.6   3:26.59 boincmgr
(4.3g and climbing...)

What could cause this?

(I've tried rebooting, re-installing BOINC, etc)
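
For anyone wanting to log the growth rather than watch top, here is a minimal sketch. It assumes a Linux host, where /proc/<pid>/status carries a VmRSS line in kB; the function names parse_vm_rss_kib and read_vm_rss_kib are made up for illustration and are not part of BOINC.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Extract the resident set size in kB from the text of /proc/<pid>/status.
// Returns -1 if no parsable VmRSS line is present.
long parse_vm_rss_kib(const std::string& status_text) {
    std::istringstream in(status_text);
    std::string line;
    const std::string key = "VmRSS:";
    while (std::getline(in, line)) {
        if (line.compare(0, key.size(), key) == 0) {
            std::istringstream fields(line.substr(key.size()));
            long kib = 0;
            if (fields >> kib) return kib;
            return -1;
        }
    }
    return -1;
}

// Read the status file for a given PID (hypothetical helper, Linux only).
// Returns -1 if the file cannot be opened, e.g. the process has exited.
long read_vm_rss_kib(int pid) {
    std::ifstream f("/proc/" + std::to_string(pid) + "/status");
    if (!f) return -1;
    std::stringstream buf;
    buf << f.rdbuf();
    return parse_vm_rss_kib(buf.str());
}
```

Calling read_vm_rss_kib(8084) every few seconds and printing the result would give a record of how fast boincmgr grows, assuming 8084 is still the PID.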
4) Message boards : Questions and problems : BOINC on Android: Problems with suspension of computation and network (Message 112543)
Posted 16 Aug 2023 by Wedge009
Post:
Um, my point is that whether this is checked or not, BOINC thinks network is suspended. It was working fine before the 2-day outage.

What do you mean by 'set the network type'? (I just checked - earlier I had reset it so 'Transfer tasks on WiFi only' is checked again, still nothing.)
5) Message boards : Questions and problems : BOINC on Android: Problems with suspension of computation and network (Message 112541)
Posted 16 Aug 2023 by Wedge009
Post:
Greetings, two problems I've been having recently:

The first is the more pressing concern. With the recent (14-15 August 2023) power outage for the Einstein@Home project, I bumped my work buffer to 2 days. But now that the project has been restored, one phone is insisting 'Suspending network activity - user request' while the other has no such communication problem. Any idea what could be blocking task upload? Otherwise I'll have a bunch of completed tasks that can't be submitted back to the project and may expire.

I have already tried Projects -> Retry transfers (it currently says 'Transfers suspended (10 Upload)'), and Preferences -> Transfer tasks on WiFi only (unchecked - there's no SIM anyway), Daily transfer limit: 99999 MB (0 default), Daily transfer limit: 99999 Days (0 default).

The second problem is intermittent but increasingly frequent: the BOINC client thinks the battery charge level is 0% - and suspends computation - but the OS and BOINC GUI both say battery level is 100% (or whatever the current state is, still far greater than the suspend computation threshold). It usually happens after completing one task, and then it's stuck doing nothing while another task is ready.

Currently running BOINC 7.24.1 on two phones but I don't think the issues are specific to that version - certainly I've run into the suspended computation issue with 7.22.x, maybe even with 7.18.x.
6) Message boards : Questions and problems : BOINC 7.18.x and later: Computation error oddly specific to ROCm (Message 110044)
Posted 8 Oct 2022 by Wedge009
Post:
It turns out that disabling some of the systemd hardening is a work-around for this issue. I consider it only a work-around because it wasn't necessary for BOINC 7.16.17, and presumably the hardening is there for good reason.

https://github.com/BOINC/boinc/issues/4948
7) Message boards : Questions and problems : BOINC 7.18.x and later: Computation error oddly specific to ROCm (Message 109915)
Posted 30 Sep 2022 by Wedge009
Post:
I did some digging - initialize_ocl() seems to be a function in the Einstein code, not BOINC. For whatever reason, though, newer BOINC versions trigger a failure in it. According to the source code for Einstein BRP (which may well be out of date), error code 2013 is defined in demod_binary.h as RADPUL_OCL_MEM_ALLOC_DEVICE. It's one of the error codes returned in response to a failed clCreateCommandQueue() call, which is an OpenCL function. The -6 in the log corresponds to CL_OUT_OF_HOST_MEMORY. That code seems to be reported for a variety of reasons, so I suspect it's not really out of memory, just some weird interaction between potentially old Einstein code and new BOINC versions. Why and how newer BOINC versions cause this is still a mystery to me, however.
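
To make the numeric codes in such logs easier to read, a small lookup like the following could help. It's only a sketch: the function name opencl_error_name is made up, and it covers just the first few runtime error values as they appear in the Khronos CL/cl.h header, not the full list.

```cpp
#include <string>

// Map a handful of OpenCL status codes to their symbolic names.
// Values follow the Khronos CL/cl.h header; this is a subset only.
std::string opencl_error_name(int code) {
    switch (code) {
        case 0:  return "CL_SUCCESS";
        case -1: return "CL_DEVICE_NOT_FOUND";
        case -2: return "CL_DEVICE_NOT_AVAILABLE";
        case -3: return "CL_COMPILER_NOT_AVAILABLE";
        case -4: return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
        case -5: return "CL_OUT_OF_RESOURCES";
        case -6: return "CL_OUT_OF_HOST_MEMORY";
        default: return "unknown OpenCL error " + std::to_string(code);
    }
}
```

With this, the "(error: -6)" line in the task output reads as CL_OUT_OF_HOST_MEMORY, which is the name discussed above.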
8) Message boards : Questions and problems : BOINC 7.18.x and later: Computation error oddly specific to ROCm (Message 109857)
Posted 22 Sep 2022 by Wedge009
Post:
This issue is oddly specific to AMD GPUs running ROCr-based OpenCL on Linux. It doesn't appear to be a problem for NV GPUs or AMD's legacy OpenCL on Linux, or for any Windows-based set-up. (AMD OpenCL support for Linux requires ROCm for Vega GPUs and later.)

When attempting to run GPU tasks for Einstein@Home, the result is a 'computation error' within ~10 seconds. An example:
<message>
process exited with code 69 (0x45, -187)</message>
<stderr_txt>
09:31:22 (11580): [normal]: This Einstein@home App was built at: Jan 16 2017 08:09:16

09:31:22 (11580): [normal]: Start of BOINC application '../../projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.18_x86_64-pc-linux-gnu__FGRPopencl1K-ati'.
09:31:22 (11580): [debug]: 1e+16 fp, 5.9e+09 fp/s, 1785112 s, 495h51m52s17
command line: ../../projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.18_x86_64-pc-linux-gnu__FGRPopencl1K-ati --inputfile ../../projects/einstein.phys.uwm.edu/LATeah3012L12220912.dat --alpha 2.59819959601 --delta -0.694603692878 --skyRadius 1.890770e-06 --ldiBins 15 --f0start 836.0 --f0Band 8.0 --firstSkyPoint 0 --numSkyPoints 1 --f1dot -1e-13 --f1dotBand 1e-13 --df1dot 1.69860773e-15 --ephemdir ../../projects/einstein.phys.uwm.edu/JPLEPH --Tcoh 2097152.0 --toplist 10 --cohFollow 10 --numCells 1 --useWeights 1 --Srefinement 1 --CohSkyRef 1 --cohfullskybox 1 --mmfu 0.1 --reftime 56100 --model 0 --f0orbit 0.005 --mismatch 0.1 --demodbinary 1 --BinaryPointFile ../../projects/einstein.phys.uwm.edu/templates_LATeah3012L12220912_0844_11462382.dat --debug 0 --device 1 -o LATeah3012L12220912_844.0_0_0.0_11462382_1_0.out
output files: 'LATeah3012L12220912_844.0_0_0.0_11462382_1_0.out' '../../projects/einstein.phys.uwm.edu/LATeah3012L12220912_844.0_0_0.0_11462382_1_0' 'LATeah3012L12220912_844.0_0_0.0_11462382_1_0.out.cohfu' '../../projects/einstein.phys.uwm.edu/LATeah3012L12220912_844.0_0_0.0_11462382_1_1'
09:31:22 (11580): [debug]: Flags: X64 SSE SSE2 GNUC X86 GNUX86
09:31:22 (11580): [debug]: glibc version/release: 2.35/stable
09:31:22 (11580): [debug]: Set up communication with graphics process.
boinc_get_opencl_ids returned [0x1e97b40 , 0x7fabc0742d90]
Using OpenCL platform provided by: Advanced Micro Devices, Inc.
Using OpenCL device "gfx900:xnack-" by: Advanced Micro Devices, Inc.
Max allocation limit: 7287183768
Global mem size: 8573157376
Couldn't create OpenCL command queue (error: -6)!
OpenCL shutdown complete!
initialize_ocl returned error [2013]
OCL context null
OCL queue null
Error generating generic FFT context object [5]
09:31:22 (11580): [CRITICAL]: ERROR: MAIN() returned with error '5'
FPU status flags:
mv: cannot stat 'LATeah3012L12220912_844.0_0_0.0_11462382_1_0.out': No such file or directory
mv: cannot stat 'LATeah3012L12220912_844.0_0_0.0_11462382_1_0.out': No such file or directory
mv: cannot stat 'LATeah3012L12220912_844.0_0_0.0_11462382_1_0.out': No such file or directory
mv: cannot stat 'LATeah3012L12220912_844.0_0_0.0_11462382_1_0.out': No such file or directory
mv: cannot stat 'LATeah3012L12220912_844.0_0_0.0_11462382_1_0.out': No such file or directory
mv: cannot stat 'LATeah3012L12220912_844.0_0_0.0_11462382_1_0.out': No such file or directory
mv: cannot stat 'LATeah3012L12220912_844.0_0_0.0_11462382_1_0.out': No such file or directory
mv: cannot stat 'LATeah3012L12220912_844.0_0_0.0_11462382_1_0.out': No such file or directory
mv: cannot stat 'LATeah3012L12220912_844.0_0_0.0_11462382_1_0.out.cohfu': No such file or directory
mv: cannot stat 'LATeah3012L12220912_844.0_0_0.0_11462382_1_0.out.cohfu': No such file or directory
mv: cannot stat 'LATeah3012L12220912_844.0_0_0.0_11462382_1_0.out.cohfu': No such file or directory
mv: cannot stat 'LATeah3012L12220912_844.0_0_0.0_11462382_1_0.out.cohfu': No such file or directory
mv: cannot stat 'LATeah3012L12220912_844.0_0_0.0_11462382_1_0.out.cohfu': No such file or directory
09:31:34 (11580): [normal]: done. calling boinc_finish(69).
09:31:34 (11580): called boinc_finish

</stderr_txt>

This issue appears to be specific to BOINC: I can confirm it's a problem with BOINC 7.18.1 and 7.20.2, but not with 7.16.17. All other hardware and software remained the same - even the same desktop session (ie no rebooting between BOINC installations). I wonder if it's a permissions issue, given all the 'No such file or directory' messages - is there a change in how BOINC runs GPU tasks between 7.16.x and 7.18.x that ROCm might be sensitive to?

Here are some of my Linux hosts:
Ubuntu 20.04, ROCr-based OpenCL, can only run successfully up to BOINC 7.16.17:
https://einsteinathome.org/host/12803029

Ubuntu 22.04 (issue also occurs on 20.04), ROCr-based OpenCL, can only run successfully up to BOINC 7.16.17:
https://einsteinathome.org/host/12918837

Ubuntu 22.04, legacy OpenCL, running just fine with BOINC 7.20.2:
https://einsteinathome.org/host/12887570

On the other hand, I've found a host that's using BOINC 7.18.1 and appears to be running its AMD GPU fine, but I can't tell what the amdgpu set-up is. (I've attempted to contact the owner before but never got an answer.)
https://einsteinathome.org/host/12941414
9) Message boards : GPUs : BOINC 7.10.2 - Windows 7 - OpenCL GPU Detection (Message 86522)
Posted 10 Jun 2018 by Wedge009
Post:
For future reference: for whatever reason, in my particular dual-GPU set-up, a 64-bit compilation fails in COPROCS::get_opencl(), gpu_opencl.cpp:343. After finding two OpenCL GPU platform IDs, it fails when attempting to find the AMD GPU Device ID:

        ciErrNum = (*p_clGetDeviceIDs)(
            platforms[platform_index],
            (CL_DEVICE_TYPE_GPU | CL_DEVICE_TYPE_ACCELERATOR),
            MAX_COPROC_INSTANCES, devices, &num_devices
        );

        if (ciErrNum == CL_DEVICE_NOT_FOUND) continue;  // No devices

It doesn't seem like it should be necessary, but I'll stick with 32-bit compilation as a work-around.
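
One thing that might help narrow this down is reporting the status code for every platform instead of only skipping the 'no devices' case. Below is a rough sketch of that idea. To stay self-contained it does not link against OpenCL: GetDeviceIdsFn and probe_platforms are hypothetical stand-ins for the real clGetDeviceIDs call and the loop in gpu_opencl.cpp, and only the values 0 (CL_SUCCESS) and -1 (CL_DEVICE_NOT_FOUND) are taken from the OpenCL headers.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for calling clGetDeviceIDs on one platform:
// returns an OpenCL-style status code and writes the device count.
using GetDeviceIdsFn = std::function<int(int platform_index, unsigned* num_devices)>;

const int kClSuccess = 0;          // CL_SUCCESS
const int kClDeviceNotFound = -1;  // CL_DEVICE_NOT_FOUND

// Probe every platform and record what happened, so that an unexpected
// error on one platform is visible rather than silently skipped.
std::vector<std::string> probe_platforms(int num_platforms,
                                         const GetDeviceIdsFn& get_ids) {
    std::vector<std::string> report;
    for (int i = 0; i < num_platforms; ++i) {
        unsigned num_devices = 0;
        const int err = get_ids(i, &num_devices);
        const std::string prefix = "platform " + std::to_string(i) + ": ";
        if (err == kClDeviceNotFound) {
            report.push_back(prefix + "no devices");
        } else if (err != kClSuccess) {
            report.push_back(prefix + "error " + std::to_string(err));
        } else {
            report.push_back(prefix + std::to_string(num_devices) + " device(s)");
        }
    }
    return report;
}
```

In a real build, the callback would wrap the (*p_clGetDeviceIDs)(...) call shown above; comparing the report from a 32-bit and a 64-bit build would show which platform fails and with what code.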
10) Message boards : GPUs : BOINC 7.10.2 - Windows 7 - OpenCL GPU Detection (Message 86521)
Posted 10 Jun 2018 by Wedge009
Post:
Okay, totally bizarre but I've found why my compilation didn't match the BOINC release - my compilations were 32-bit only while the BOINC release I was using was the 64-bit version. I just tried a 32-bit release and the GPUs are being detected okay.

Might anyone know why 64-bit BOINC can't detect Vega properly (apparently)?
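
Since the mismatch came down to 32-bit versus 64-bit builds, a tiny check like this could be logged at start-up to make the difference obvious. It's just a sketch; the function names build_pointer_bits and describe_build are made up and nothing like this is claimed to exist in BOINC.

```cpp
#include <climits>
#include <string>

// Pointer width, in bits, of the build this function is compiled into.
int build_pointer_bits() {
    return static_cast<int>(sizeof(void*) * CHAR_BIT);
}

// Human-readable form, e.g. for a start-up log line.
std::string describe_build() {
    return std::to_string(build_pointer_bits()) + "-bit build";
}
```

Printing describe_build() from both a self-compiled client and a stand-alone test program would confirm whether they are actually being built for the same target.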
11) Message boards : GPUs : BOINC 7.10.2 - Windows 7 - OpenCL GPU Detection (Message 86520)
Posted 10 Jun 2018 by Wedge009
Post:
So I managed to compile the boinc client from branch client_release/7/7.10.2 (commit f6033b09) and it's detecting the Vega while the BOINC 7.10.2 binary release does not. I'm getting quite confused by now. Does anyone have any information on who does the compilation for the releases on boinc.berkeley.edu?
12) Message boards : GPUs : BOINC 7.10.2 - Windows 7 - OpenCL GPU Detection (Message 86512)
Posted 9 Jun 2018 by Wedge009
Post:
Thanks for the quick responses - I didn't have time to go into too much detail earlier.

I don't recall ever having an issue with GPU detection in Windows, only in Linux and that was years ago - I think the driver situation has improved much since then.

Under BOINC I've run NV Pascal, Maxwell, Kepler, and Fermi of all sizes without issue in Windows (OpenCL as well as CUDA), and AMD Fiji, Hawaii, Bonaire, Tahiti, Pitcairn, and even pre-GCN Cayman, Barts, RV730 and RV630. I don't remember BOINC having issues detecting any of those under Windows. In my current case, GPU1 is NV Pascal and GPU2 is AMD Vega. Vega has been around for nearly a year now, so I'd be surprised if there's something wrong with the driver side - as I said I can run OpenCL applications off-line apart from BOINC just fine, plus my cobbled-together stand-alone BOINC GPU detection could read both the Pascal and the Vega.

That pre-compiled clinfo reproduced the curious scenario I described earlier - the May 2018 clinfo that was put into my Windows/System32 directory (I think that's from the AMD installation) only picks up the NV Pascal and the CPU. The 2011 copy you linked to picks up those two as well as the AMD Vega.

I also did a quick test under Linux with AMD Fiji and AMD Vega together. Only the Fiji was detected by BOINC so I doubt mixed-vendor set-up is a relevant concern. Really puzzled why BOINC is having difficulty with Vega when there is more than one GPU involved.
13) Message boards : GPUs : BOINC 7.10.2 - Windows 7 - OpenCL GPU Detection (Message 86503)
Posted 9 Jun 2018 by Wedge009
Post:
I have been running BOINC for years and am currently in the middle of trying to upgrade a dual-GPU system. Quick summary:

Have two GPUs - GPU1 and GPU2. GPU1 and GPU2 are detected on their own just fine. GPU1 is detected fine together with any other GPU. GPU2, however, isn't detected in combination with GPU1 (have swapped physical PCIe slots with the same result).

coproc_debug doesn't give much info. I pulled the OpenCL detection routines (gpu_opencl.cpp) from BOINC into a stand-alone application for debugging purposes (I didn't have the time to compile and debug the entire BOINC application). The crazy thing is that even with the hardware set-up that BOINC doesn't detect correctly, my stand-alone application does.

Running current releases of NV and AMD drivers respectively. Another curiosity is that clinfo doesn't appear to detect GPU2, but an older version of clinfo does. Both GPUs concerned here are current generation, current architectures.

I have to rush off now but I'll provide more details later - just wondering if anyone has any ideas/thoughts on this because I'm starting to run out of them.

Edit: I'll add that stand-alone OpenCL applications work just fine as well. BOINC just isn't detecting the GPU for some reason.




Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.