Posts by JStateson

1) Message boards : GPUs : PCI express risers to use multiple GPUs on one motherboard - not detecting card? (Message 95261)
Posted 13 minutes ago by Profile JStateson
Post:

I wonder if you can daisy-chain the 4-way splitters to get infinite cards?


I suspect you can have an unlimited number of "bus IDs", but, as ProDigit suggested, if a unique lane must be associated with each bus ID, then there is a limit.

On the other hand, if the driver is smart enough, it could use the same lane for all the traffic to the multiplexer (the 4-in-1), but that is a guess, as I have no knowledge of how the multiplexer works.

Looking at this, and assuming it is not "fake news", one would think that 104 boards on risers would need 104 lanes.
https://videocardz.com/newz/biostar-teasing-motherboard-with-104-usb-risers-support-for-mining
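For reference, Linux tools such as lspci report each GPU's location as a PCI address of the form domain:bus:device.function (the 0000:01:00.0 in the driver error quoted further down this page is one). The bus field is 8 bits wide, so a single PCI domain can address at most 256 buses however the splitters are chained, and each PCIe switch in the chain also uses up bus numbers for its own bridges. The minimal C++17 sketch below assumes that text format, splits an address into its fields and rejects values that do not fit; `PciAddress` and `parse_pci_address` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <optional>
#include <string>

// Fields of a PCI address written as "domain:bus:device.function",
// e.g. "0000:01:00.0". Hypothetical type for illustration.
struct PciAddress {
    unsigned domain;    // PCI segment (domain) number
    unsigned bus;       // 8 bits: at most 256 buses per domain
    unsigned device;    // 5 bits: 0..31
    unsigned function;  // 3 bits: 0..7
};

// Parse the four hex fields; reject trailing junk and values that do not
// fit the bus/device/function widths listed above.
std::optional<PciAddress> parse_pci_address(const std::string& text) {
    PciAddress a{};
    int consumed = 0;
    if (std::sscanf(text.c_str(), "%x:%x:%x.%x%n",
                    &a.domain, &a.bus, &a.device, &a.function, &consumed) != 4)
        return std::nullopt;
    if (static_cast<std::size_t>(consumed) != text.size())
        return std::nullopt;
    if (a.bus > 0xFF || a.device > 0x1F || a.function > 0x7)
        return std::nullopt;
    return a;
}
```

An address such as 0000:100:00.0 is rejected because bus 0x100 (256) does not fit in 8 bits.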
2) Message boards : GPUs : PCI express risers to use multiple GPUs on one motherboard - not detecting card? (Message 95237)
Posted 13 hours ago by Profile JStateson
Post:
I finally received the x1 to x16 USB risers. I don't yet have the 4-way version; it's in the post.

I connected an AMD R9 280x via one of the risers to the PCI Express 2.0 x16 slot, and it ran Milkyway or Einstein at full speed (two tasks per GPU). Same full speed when connecting it to the PCI Express 1.0 x1 slot.

I'm not sure it is really only PCI Express 1.0 though. This specs page doesn't state the version for the x1 slots:
https://www.asus.com/Motherboards/P5ND/specifications/
I find it hard to believe they'd use both versions on the same motherboard; I'm just going by what someone wrote above.


I ran SETI on P5K, P5E and P7N boards with Core 2 Quad CPUs for years and have given most of them away. I did get one down from the attic that had a lot of x1 slots and tried risers, but when adding more GPUs the problem was the CPU, not the risers, and more so on Windows.

One test I would like to run, but can't because I no longer have socket 775 boards, is a load test on Einstein to see if the problem is the number of boards:

Using a Core 2 Duo, run 2 concurrent tasks on each of 2 boards and compare that to 1 task each on 4 boards. I have been wondering if dedicating a core to a single board with 2 tasks is more efficient than allocating 2 cores to 4 boards.

[edit] Forgot to mention in my earlier post: I bought a second set of x1-to-x16 risers from the same company as the first set. The new purchase came with a warning that the manufacturer had released a number of risers with the capacitors soldered in with reversed polarity. He included a picture of an incorrect assembly: the shaded half at the top of the capacitor was not on the same side as the colored marking on the board where it was soldered, which means the + and - were reversed. The seller said to return any defective ones to him for replacement. I checked all my risers and all were OK.
3) Message boards : GPUs : PCI express risers to use multiple GPUs on one motherboard - not detecting card? (Message 95230)
Posted 15 hours ago by Profile JStateson
Post:
I had mixed results with risers on old motherboards and especially those 4-in-1 risers.

An older X8DTL (socket 1366) required a board in the x16 slot to install Ubuntu 18.04. After installing Ubuntu, I was able to replace it with a riser. A 4-in-1 riser only showed one board when more than one ATI board was used, so I never got more than 4 boards to work.

My TB-85 (8-slot) worked fine with 8 risers, all GTX 1060s, on Ubuntu 18.04. For a "SETI WOW event" I temporarily added first a GTX 1070 and then a 1070 Ti. Things quickly went south, probably because of the mix of different boards.

I would see the following about twice a week:
"Unable to determine the device handle for GPU 0000:01:00.0: GPU is lost. Reboot the system to recover this GPU**"
In addition, the fan sensors frequently reported "ERR" instead of an RPM reading.

I saw that "reboot" message daily when I added the second "extra" board. I tried a second splitter, thinking that keeping similar boards on the same splitter would help. I ended up getting an H110BTC, which has 12 x1 slots, and the 4-in-1s are in the scrap pile.

The TB85 had settings for lane speed, and I tried a lot of variations, such as setting the lane speed to Gen 2 for the slot that held the 4-in-1, but things got even worse, so I eventually left it all at "default".

The problem I have now with the TB-85 and H110BTC is projects like Einstein and GPUgrid, which use almost a full CPU thread per task, while SETI and Milkyway use a small fraction. My 6th- and 7th-generation CPUs only support 8 threads, so on the H110BTC I cannot feed Einstein fast enough, and 10-minute work units stretch to 30+ minutes with 9 boards. I solve this by limiting the number of concurrent tasks and reporting fewer GPUs to the project than I have.
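As a rough illustration of that limit: if each Einstein GPU task keeps close to one CPU thread busy, the number of tasks the CPU can feed is about the usable thread count divided by the CPU used per task, capped at the number of GPUs. The C++17 sketch below assumes that simple model (it does not measure anything) and writes the result as a stock `<max_concurrent>` entry for a project's app_config.xml. The function names and the idea of "reserved" threads for the OS and other projects are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <sstream>
#include <string>

// Rough model: each GPU task keeps `cpu_per_task` threads busy, and
// `reserved_threads` are left for the OS and other projects.
// Returns how many tasks the CPU can feed, capped at the GPU count.
int feedable_tasks(int cpu_threads, double cpu_per_task,
                   int reserved_threads, int gpu_count) {
    assert(cpu_per_task > 0.0 && gpu_count >= 0);
    int usable = std::max(0, cpu_threads - reserved_threads);
    int by_cpu = static_cast<int>(std::floor(usable / cpu_per_task));
    return std::clamp(by_cpu, 0, gpu_count);
}

// Build an app_config.xml body limiting one application to `limit`
// concurrent tasks, using the <app>/<name>/<max_concurrent> elements.
std::string app_config_xml(const std::string& app_name, int limit) {
    std::ostringstream out;
    out << "<app_config>\n"
        << " <app>\n"
        << "  <name>" << app_name << "</name>\n"
        << "  <max_concurrent>" << limit << "</max_concurrent>\n"
        << " </app>\n"
        << "</app_config>\n";
    return out.str();
}
```

For example, feedable_tasks(8, 1.0, 4, 9) returns 4: 8 threads minus 4 reserved for other work, at one thread per Einstein task, on a 9-GPU system.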

** I created a program that shuts down the GPUs and reports the problem by text message, but I have not had a problem since I quit using those 4-in-1 risers. I now have a mix of 1660, 1060, 1070, P102-100, P104-100 and P104-90 boards, and all work fine. I had to write it because, more often than not, the work units would "time out", another job was assigned, and very quickly I would have hundreds of errored-out tasks.
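The program itself is not shown here. As a hedged sketch of only its detection step, the C++ below scans log text line by line for the "GPU is lost" message quoted above and extracts the PCI bus ID of each affected GPU. It assumes the message keeps exactly the wording "for GPU <bus id>: GPU is lost"; `lost_gpu_bus_ids` is a hypothetical name, and the follow-up actions (stopping GPU work, sending a text) are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Collect bus IDs from lines such as
// "Unable to determine the device handle for GPU 0000:01:00.0: GPU is lost. ..."
// Assumes this exact wording; other lines are ignored.
std::vector<std::string> lost_gpu_bus_ids(std::istream& log) {
    const std::string prefix = "for GPU ";
    const std::string suffix = ": GPU is lost";
    std::vector<std::string> ids;
    std::string line;
    while (std::getline(log, line)) {
        std::size_t end = line.find(suffix);
        if (end == std::string::npos) continue;
        // last "for GPU " that starts before the suffix
        std::size_t start = line.rfind(prefix, end);
        if (start == std::string::npos) continue;
        start += prefix.size();
        ids.push_back(line.substr(start, end - start));
    }
    return ids;
}
```

The colons inside the bus ID are never followed by " GPU is lost", so the first match of the suffix is the one after the full address.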
4) Message boards : Questions and problems : Move data dir on Ubuntu ? (Message 95168)
Posted 2 days ago by Profile JStateson
Post:
You might want to add "ReadWritePaths" and also "EnvironmentFile" as shown below. Change the path "/var/lib/boinc" (in both ReadWritePaths and WorkingDirectory) to where you want the data, and move the files there.
After editing "/lib/systemd/system/boinc-client.service" you will have to run "systemctl daemon-reload".
A discussion of systemctl is here
https://www.digitalocean.com/community/tutorials/how-to-use-systemctl-to-manage-systemd-services-and-units

If something goes wrong, use this for debugging:
journalctl -xe

I have not moved my files, but I have used the environment file at "/etc/default/boinc-client" to pass parameters to BOINC.
Post if you have problems, and also confirm if you got it working.

[Unit]
Description=Berkeley Open Infrastructure Network Computing Client
Documentation=man:boinc(1)
After=network-online.target

[Service]
Type=simple
ProtectHome=true
PrivateTmp=true
ProtectSystem=strict
ProtectControlGroups=true
# if you move the data dir, change /var/lib/boinc here and in WorkingDirectory
ReadWritePaths=-/var/lib/boinc -/etc/boinc-client
Nice=10
User=boinc
WorkingDirectory=/var/lib/boinc
ExecStart=/usr/bin/boinc
# sets environment variables for the service from this file
EnvironmentFile=/etc/default/boinc-client
ExecStop=/usr/bin/boinccmd --quit
ExecReload=/usr/bin/boinccmd --read_cc_config
ExecStopPost=/bin/rm -f lockfile
IOSchedulingClass=idle

[Install]
WantedBy=multi-user.target

5) Message boards : The Lounge : The Seti is Down Cafe (Message 95092)
Posted 3 days ago by Profile JStateson
Post:
Collatz has always made me feel stupid.


41 valid tasks and an RAC of 114k or so.


How about 320,000 credits every 5 and 1/2 seconds?

http://www.ukboincteam.org.uk/newforum/viewtopic.php?t=6221

The project is good for credit points only and ranks up there with Bitcoin Utopia. No scientific value whatsoever, but that is just my honest opinion, worth about 2c. I did run up a lot of points on it and also on Bitcoin Utopia, but I could have been finding solutions to medical problems over at WCG or doing other more useful work. Again, just IMHO, but I didn't know better.
6) Message boards : The Lounge : The Seti is Down Cafe (Message 95084)
Posted 3 days ago by Profile JStateson
Post:
I don't know what method or program you use to spoof the GPU count, but I can tell you for sure that max concurrent and the scheduler work totally differently (not broken) in BOINC 7.16 than in previous versions. That is why we don't use it with the spoofed client we use. Instead, we manage the number of active cores/threads through CPU usage.

BTW, I will remain at the outrage pub for about 1/2 hour; I need to work early tomorrow. I hope that will be enough to satisfy the SETI Gods and bring the servers back to life. I tried to find a virgin here to sacrifice at the volcano, and that was impossible.


I made a change to my program, as I had been applying the spoofed count of 64 to all projects. I am now using each project's app_config.xml to set the number of GPUs per project. This system has 9 GPUs, so the config below limits the reported count to 4 instead of 9. SETI still gets 64 to get through the offline time. However, the 4000-task limit I use did not get me through the 13+ hours.
root@h110btc:/var/lib/boinc/projects/einstein.phys.uwm.edu# cat app_config.xml
<app_config>
 <app>
  <name>einstein_O2MDF</name>
  <max_concurrent>4</max_concurrent>
 </app>
 <spoofedgpus>4</spoofedgpus>
</app_config>

I set the value in cs_scheduler.cpp:
    // update hardware info, and write host info
    //
    host_info.get_host_info(false);
    set_ncpus();
    // global spoofed GPU count (spoof_gpus == -1 means spoofing is off)
    iGPU = (gstate.spoof_gpus == -1) ? 0 : gstate.spoof_gpus;
    // a per-project <spoofedgpus> from app_config.xml overrides the global value
    if(p->app_configs.spoofedgpus > 0) iGPU = p->app_configs.spoofedgpus;
    host_info.write(mf, !cc_config.suppress_net_info, false, iGPU);
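The precedence in that change can be restated as a small standalone function, which makes it easy to test on its own: a per-project `<spoofedgpus>` value above 0 wins, otherwise the global spoof setting applies, and -1 (spoofing off) maps to 0. This is a sketch of the logic only, with a hypothetical function name; `spoof_gpus` and `spoofedgpus` belong to the modified client, not stock BOINC, the "0 when absent" default for the project value is an assumption, and what `host_info.write()` does with 0 is not shown in the post.

```cpp
#include <cassert>

// Same decision as the two iGPU lines in cs_scheduler.cpp above.
// global_spoof:  client-wide setting, -1 when spoofing is off
// project_spoof: per-project <spoofedgpus>, assumed 0 when absent
int reported_gpu_count(int global_spoof, int project_spoof) {
    int count = (global_spoof == -1) ? 0 : global_spoof;
    if (project_spoof > 0) count = project_spoof;  // project value wins
    return count;
}
```

With the global value at 64 and the Einstein app_config.xml setting 4, Einstein sees 4 while SETI, with no per-project value, still sees 64.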
7) Message boards : The Lounge : The Seti is Down Cafe (Message 95080)
Posted 3 days ago by Profile JStateson
Post:
My contention is that GPUs should never sit idle, regardless of any perceived debt. Apparently, the software feels otherwise.
I'd be interested to see if you experience anything like this.


That is exactly what I have been looking at for the last 2 hours, trying to figure it out. I had 4 GPUs idle that should have been running Einstein, while the other 5 GPUs were running Milkyway. This system normally runs SETI and GPUgrid at a resource share of 100 and Einstein at 0. I added Milkyway at 0, and after a while the Einstein GPUs went idle.

The work count in excess of 64 seems to be "lost work units", and I am guessing that number is not used when checking the GPU count. Both mining systems had a lot of "lost work units"; however, I cannot account for something like 300 lost units. I only run Einstein when SETI is offline. I clicked on Einstein's "www host schedule log", which duplicates the info shown in the event viewer: "...lost tasks...". However, I also saw a strange message, "..[CRITCAL] … two instances of the scheduler running..", or something to that effect. I am not running two instances of BOINC. The so-called "scheduler" is an Einstein app that (as I understand it) arranges downloads of database items, not just project work units.

There is no reason for the 4 GPUs to be idle. I aborted the Milkyway tasks, as I didn't want them stopping Einstein from running. Einstein then started up and, !INCREDIBLY!, I got 3 GPUgrid work units; it has probably been a week or more since any showed up. 7 of the 9 GPUs are at 100% utilization, but 2 are idle because the CPU does not have enough threads.
8) Message boards : The Lounge : Help Desk Expert? (Message 95076)
Posted 3 days ago by Profile JStateson
Post:
Yeah, it happened to me here too. At DVDFab I posted CUDA Blu-ray movie "rip" times for various NVidia boards and became their first "Knowledgebase Contributor". Not sure that was a good idea, but I think I am still allowed one backup of each movie I buy.
9) Message boards : The Lounge : The Seti is Down Cafe (Message 95072)
Posted 3 days ago by Profile JStateson
Post:
The extended outage let me notice that a 4-core (8-thread) CPU cannot feed 9 GPUs running Einstein. I had to configure 4 concurrent Einstein and 5 concurrent Milkyway tasks, and in addition had to scrap the "64" spoofed GPUs, as that fetched too many Einstein tasks. I had the resource share set to 0 but got way more than 64 work units. I should have gotten 1 for each GPU, but I am looking at 110 on one mining system and 241 on another. The Einstein resource share on both was 0, so something is not right.
10) Message boards : Questions and problems : Building client only, fails because of missing libnotify (Message 94650)
Posted 16 days ago by Profile JStateson
Post:
Thanks for your answer. I'm building in a simili linux-from-scratch, with no packages available.

I'm more concerned by the fact that libnotify is required while I'm trying to not build the manager :)

As Keith mentioned, _autosetup is the key, and if errors show up there, that is the problem.

There is no configure file in the source tree; those 2 lines of code are in configure.ac.
Running _autosetup uses configure.ac (and possibly other files) to create "configure".
configure.ac is supposed to contain that test for the manager, as that is how it is determined whether the manager gets built or not.
If you see an error message like "can't find wxWidgets", then you accidentally included the manager.

A walk-through is here:
https://boinc.berkeley.edu/forum_thread.php?id=13059&postid=92381#92381

I have no idea what "simili linux-from-scratch" is. Do you have bash or dash or something else? What version?
I myself got caught by a script that behaved differently than I expected because it used sh instead of bash. Does your configure file have "#! /bin/sh" at the top, or something else?

If you run
./configure --disable-server --disable-manager
then the test at line 36044 will take the "no" path and continue on;
otherwise it will take the "yes" path and continue on.

IN NO EVENT WILL IT GENERATE A SYNTAX ERROR UNLESS THE OS CANNOT PARSE IT.

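Suggestion: write a small shell script containing those lines of code and run it, something like:
#!/bin/sh
whatever=yes   # placeholder: use the variable your configure tests
if test "$whatever" = yes; then
  echo "found a yes"
fi
If you get a syntax error, then the problem is with whatever is running the script.

Also good is "bash -x ./configure", assuming you have bash, as it prints each command as it runs; otherwise try "sh -x ./configure", else I don't know.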

HTH

[EDIT] My opinion is only worth 2c. It used to be worth a lot less but I got promoted to "Help Desk Expert" so maybe the 2c is good.

The above assumes you got the source from GitHub after selecting the branch "client 7.16.3"; otherwise all bets are off.
11) Message boards : Projects : Access Android desktop remotely? (Message 94646)
Posted 16 days ago by Profile JStateson
Post:
You can use add-ons such as BOINCTasks Mobile (requires Windows) and AndroBOINC



1) Was not aware of AndroBOINC. Went there and looked around and found screenshots
https://code.google.com/archive/p/androboinc/wikis/ScreenShots.wiki

They will not display, even with JavaScript enabled in Google Chrome. Edge requires a policy change to run JavaScript, and I suspect the problem is something else: /svn/www/projects.png is not a valid URL, it is a folder. I tried the export to GitHub, but that failed.

Where can I see screen shots? I do not have android devices.

2) I use Splashtop to access the Windows system running BoincTasks. That is probably not much different from using BoincTasks Mobile on an iPad under Safari, except that the tiny iPhone screen requires zooming and panning. $10 a year gets me remote access from anywhere, not just the subnet.
12) Message boards : BOINC client : Support for Visual Studio versions newer than 2013? (Message 94640)
Posted 17 days ago by Profile JStateson
Post:
I have been building the client using VS2013 and also the latest Linux gcc (GitHub).
Recently I was able to build the Milkyway app for Windows on Linux using the MinGW cross compiler (GitHub).
I also built TBar's "special SETI" source on Linux using the latest gcc and CUDA libs (found the zip on the forum).

I was looking at building a Windows version of that SETI app and found a problem:

1>CUDACOMPILE : nvcc warning : nvcc support for Microsoft Visual Studio 2013 and earlier has been deprecated and is no longer being maintained
1>  support for this version of Microsoft Visual Studio has been deprecated! Only the versions between 2015 and 2019 (inclusive) are supported!


This is not a problem for the client, as it does not run any CUDA code. However, the app clearly needs to be built with CUDA, and I am guessing the newer libraries from NVidia might not be linkable with object code built by VS2013. That SETI app uses source code from the client, especially include files, and using VS2017 will require mods to the sources. I can try building the SETI app using VS2017, and was wondering if there is any active work on making the client compatible with VS2015 or later? Possibly the SETI app is best built with the MinGW cross compiler instead of any MS product.
13) Message boards : Questions and problems : Data breach notification on Boinc.berkley.edu? (Message 94626)
Posted 17 days ago by Profile JStateson
Post:
I have never seen that pic, nor was I even aware of this capability. I use Chrome for first visits or searching, as I have Chrome locked down. My other browser is Edge. I don't like it, but it does work better on forms, mainly because I keep Chrome on a tight leash. Sometimes Chrome won't even show a required "captcha" popup because I loaded it with so many blocking extensions.

My normal desktop "office" system has McAfee via Dell, and I pay for a McAfee subscription. OTOH my Surface Pro has only Windows 10, plus I pay for Malwarebytes Premium. One thing I noticed on the Surface Pro: if I browse to SETI@home and select "Number Crunching" and then the most popular thread, "server panic", I ALWAYS get a warning that a trojan was found. Some site in u.nu had or has a trojan, or is well known for poor security, and is on Malwarebytes' list. McAfee shows no problem, but who knows? The following is a screen grab from my SP4. BTW, SETI has a "server panic" so often that they start a new thread when the messages get too long; currently # 118. If you read the message behind the "trojan warning", you can understand why they constantly have panics: a 20,000 WU cache size and any # of GPUs you want (as long as you are a member of the club).



[edit] Thanks for letting me correct this post.
14) Message boards : Questions and problems : Big-little configuration and Boinc setup (Message 94625)
Posted 17 days ago by Profile JStateson
Post:
Thanks for posting this. I was unaware of the big.LITTLE terminology and went and read up on it here:
https://en.wikipedia.org/wiki/ARM_big.LITTLE

My take: Intel extends battery life by reducing the clock speed when the CPU is not being used much.
ARM can, in addition to reducing the clock speed, switch to a core that has fewer transistors.
However, the OS has to implement the strategy, and the applications need to be tailored to it.
The article indicates that if one app needs a big core then all switch, and vice versa, but better operating systems and better-tailored apps can be more efficient.

I remember running a BOINC app on a BlackBerry; I forget what the Android version was, but it made a really good hand warmer when crunching.
15) Message boards : Questions and problems : iPhone credit app (Message 94616)
Posted 17 days ago by Profile JStateson
Post:
Just realized something is missing. The picture is a screen grab from an iPhone X, and it looks like there is more info under "Average Credit ..". Does anyone know what is there, or whether there is something to click on, such as a link to the project forum?

16) Message boards : Questions and problems : iPhone credit app (Message 94605)
Posted 18 days ago by Profile JStateson
Post:
Installed it. Nice that it has a link to the forum here.

Some possibilities, as the source code is on GitHub:

Forum link for each project
Something like a stock ticker showing the rise or drop for each project
An "eyeball" (show password) instead of an "X" on the password line, plus prefill the email once it has been entered the first time

I have never developed for iOS. I do not even know if Apple has open-source tools like GNU C or not.
17) Message boards : Questions and problems : "Phantom" GPU devices showing up in 7.16.3 and 441.66 again (Message 94601)
Posted 18 days ago by Profile JStateson
Post:
I ran some more tests after talking with Dell, and it turned out the fan was not the problem. The NVidia board is running its fan at 100%, which is ruining my hearing as well as the fan.

I just removed the "read-only" coproc file and started BOINC, and it wrote out a good coproc_info.xml file that actually matched the one I had edited.

The board arrangement is the same. Maybe it needed another reboot for the "cleaner" to work.

It turned out the "basic" warranty (I have 40 days left) covers the video board. They wanted proof, so I took a lot of pictures. GPU-Z was helpful, as it showed 5000 rpm and "no load" on the bad board and 1100 rpm on the good one, also at no load. It also shows the history, which is as good as a video.

I think an issue should be raised about that coproc_info file. The GPU detection should never write out the same GPU twice at the same address. If BOINC has no control over the program doing the writing (which I suspect), then at least when the client reads in the info file it should ignore duplicates at the same bus address. Unfortunately, the ATI behavior is different.

https://stateson.net/images/coproc_normal.png
18) Message boards : Questions and problems : "Phantom" GPU devices showing up in 7.16.3 and 441.66 again (Message 94594)
Posted 18 days ago by Profile JStateson
Post:
I went back to Feb 2019 and got the zipped AMD RX-570 coproc_info that I had provided earlier, when the problem first arose.

There is a difference: although both coproc info files have an extra pair of GPUs, the arrangement is not the same as with NVidia. In this case I deleted the last two sections before making the file read-only.

         device_num  device_index
OCLati0      0           0
OCLati1      1           1
OCLati2      2           0
OCLati3      3           1



C:\Users\josep\Desktop\debug coproc>fc OCLat0.txt OCLat1.txt
Comparing files OCLat0.txt and OCLAT1.TXT
***** OCLat0.txt
      <opencl_driver_version>2766.5</opencl_driver_version>
      <device_num>0</device_num>
      <peak_flops>5095424000000.000000</peak_flops>
***** OCLAT1.TXT
      <opencl_driver_version>2766.5</opencl_driver_version>
      <device_num>1</device_num>
      <peak_flops>5095424000000.000000</peak_flops>
*****

***** OCLat0.txt
      <opencl_available_ram>4294967296.000000</opencl_available_ram>
      <opencl_device_index>0</opencl_device_index>
      <warn_bad_cuda>0</warn_bad_cuda>
***** OCLAT1.TXT
      <opencl_available_ram>4294967296.000000</opencl_available_ram>
      <opencl_device_index>1</opencl_device_index>
      <warn_bad_cuda>0</warn_bad_cuda>
*****


The NVidia coproc info lists 2 CUDA devices, so more than 2 NVidia OpenCL devices is a clue that there is a problem. For ATI there is no count of actual cards, nor are any of the OpenCL sections duplicates of each other, so the ATI problem is harder to solve just by analyzing the file.
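That NVidia heuristic is easy to automate. The C++ sketch below counts CUDA sections and NVidia OpenCL sections in the text of coproc_info.xml and flags the file when the OpenCL sections outnumber the CUDA devices. The tag names `<coproc_cuda>` and `<nvidia_opencl>` are assumptions about the file layout, so check them against your own file; as noted above, the ATI case has no CUDA-style count to compare against, so this check does not cover it. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Count non-overlapping occurrences of `tag` in `xml`.
std::size_t count_tag(const std::string& xml, const std::string& tag) {
    if (tag.empty()) return 0;  // avoid an endless loop on an empty pattern
    std::size_t n = 0;
    for (std::size_t pos = xml.find(tag); pos != std::string::npos;
         pos = xml.find(tag, pos + tag.size()))
        ++n;
    return n;
}

// An NVidia card normally shows up once as a CUDA device and once as an
// OpenCL device, so more OpenCL sections than CUDA sections hints at
// phantom duplicates. Tag names are assumptions, not checked against BOINC.
bool nvidia_opencl_looks_duplicated(const std::string& coproc_xml) {
    const std::size_t cuda = count_tag(coproc_xml, "<coproc_cuda>");
    const std::size_t opencl = count_tag(coproc_xml, "<nvidia_opencl>");
    return cuda > 0 && opencl > cuda;
}
```

Closing tags such as `</coproc_cuda>` are not counted, because the search string starts with `<` followed directly by the tag name.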
19) Message boards : Questions and problems : "Phantom" GPU devices showing up in 7.16.3 and 441.66 again (Message 94593)
Posted 18 days ago by Profile JStateson
Post:
I know how it happened and what can be done to fix it but not why.

How: I had to replace the blower fan on one of the two boards in my office desktop (long story) and ended up with the two boards back in, but in reversed slots. I installed 441 after Microsoft put in 3xx, as it seems reversing the PCIe slots confuses Windows.

BOINC showed 2 CUDA and 4 OpenCL devices, with the extra pair of "phantom" GPUs attempting to crunch. Revo Uninstaller and a clean install of 441 did not solve the problem. Revo showed a mix of 339 and 441, but the clean install should have worked.

Looked at the coproc_info.xml file:
header
cuda0
cuda1
opencl  device_num, device_index
OCLnv0  ===> 0,0
OCLnv1  ===> 0,0
OCLnv2  ===> 1,1
OCLnv3  ===> 1,1


C:\Users\josep\Desktop\debug coproc>fc OCLnv0.txt OCLnv1.txt
Comparing files OCLnv0.txt and OCLnv1.TXT
FC: no differences encountered


C:\Users\josep\Desktop\debug coproc>fc OCLnv2.txt OCLnv3.txt
Comparing files OCLnv2.txt and OCLnv3.TXT
FC: no differences encountered


C:\Users\josep\Desktop\debug coproc>fc OCLnv1.txt OCLnv3.txt
Comparing files OCLnv1.txt and OCLnv3.TXT
***** OCLnv1.txt
      <opencl_driver_version>441.66</opencl_driver_version>
      <device_num>0</device_num>
      <peak_flops>8186112000000.000000</peak_flops>
***** OCLnv3.TXT
      <opencl_driver_version>441.66</opencl_driver_version>
      <device_num>1</device_num>
      <peak_flops>8186112000000.000000</peak_flops>
*****

***** OCLnv1.txt
      <opencl_available_ram>3726508031.000000</opencl_available_ram>
      <opencl_device_index>0</opencl_device_index>
      <warn_bad_cuda>0</warn_bad_cuda>
***** OCLnv3.TXT
      <opencl_available_ram>3726508031.000000</opencl_available_ram>
      <opencl_device_index>1</opencl_device_index>
      <warn_bad_cuda>0</warn_bad_cuda>
*****


The GPU detect program wrote out duplicate entries for the same GPU. My fix was to delete the OCLnv1 and OCLnv3 sections and set the coproc_info.xml file to read-only.

Suggestion: The program that writes out that file should check for duplicates. Alternately, the program that reads it in should do a check.

Other thoughts: the clean uninstall should have worked. Possibly I should have disconnected the Ethernet to prevent Windows from re-downloading the same 339 (?) driver. I was instructed to reboot several times to remove the 441 and 339 stuff; since I was busy replacing the fan, I may not have responded in time to continue the uninstall.
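The suggestion to check for duplicates could look something like this minimal C++17 sketch, which assumes each OpenCL section has already been parsed into a small struct: it keeps the first entry for each (device_num, opencl_device_index) pair and drops the rest, preserving order. Applied to the NVidia listing above (0,0 / 0,0 / 1,1 / 1,1), it keeps OCLnv0 and OCLnv2, the same result as deleting OCLnv1 and OCLnv3 by hand. It would not catch the ATI case from the RX-570 file, where device_num differs for every entry. `OpenClEntry` and `drop_duplicate_entries` are hypothetical names, not BOINC's own types.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// One OpenCL section from coproc_info.xml, reduced to the fields that
// differ in the fc output above. Hypothetical type for illustration.
struct OpenClEntry {
    std::string label;  // e.g. "OCLnv0"
    int device_num;
    int device_index;
};

// Keep the first entry for each (device_num, device_index) pair,
// preserving the original order.
std::vector<OpenClEntry> drop_duplicate_entries(const std::vector<OpenClEntry>& in) {
    std::set<std::pair<int, int>> seen;
    std::vector<OpenClEntry> out;
    for (const auto& e : in) {
        // insert().second is false when the pair was already seen
        if (seen.insert({e.device_num, e.device_index}).second)
            out.push_back(e);
    }
    return out;
}
```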
20) Message boards : Questions and problems : problem setting up anonymous platform - need help (Message 94583)
Posted 20 days ago by Profile JStateson
Post:
Once I put
<dont_check_file_sizes>1</dont_check_file_sizes>

into the <options> section of cc_config.xml, there was no urgency for an anonymous platform, and I deleted the app_info.xml.

From memory I think I had the following
<app_info>
<app>
<name>milkyway</name>
</app>
<file_info>
<name>ati_milkyway_separation.exe</name>
<executable/>
</file_info>
<app_version>
<app_name>milkyway</app_name>
<version_num>146</version_num>
<platform>windows_x86_64</platform>
<plan_class>opencl_ati_101</plan_class>
<coproc>
<type>ATI</type>
<count>1</count>
</coproc>
<file_ref>
<file_name>ati_milkyway_separation.exe</file_name>
<main_program/>
</file_ref>
</app_version>
</app_info>


I put the above (or something like it) together after looking at
https://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=3987 and comparing it to the one that TBar released for SETI.

I started with the first <app> section above the first "101" and changed nvidia to ati.
The "factory" app I have been using is milkyway_1.46_windows_x86_64__opencl_ati_101,
so I guessed and broke that down into the 4 parts: name, version, platform and plan class.
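The same split can be written out as code. This C++17 sketch assumes the pattern <name>_<version>_<platform>__<plan class> seen in milkyway_1.46_windows_x86_64__opencl_ati_101, with a double underscore before the plan class and no underscore inside the name or version; app names that contain underscores (such as einstein_O2MDF elsewhere on this page) would need a different rule. It also converts the dotted version to the integer form (1.46 becomes 146) that app_info.xml uses for <version_num>. The type and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>

// The four parts of an app file name. Hypothetical type.
struct AppFileParts {
    std::string name, version, platform, plan_class;
};

// Split "<name>_<version>_<platform>__<plan_class>".
// Assumes name and version contain no '_' and "__" marks the plan class.
std::optional<AppFileParts> split_app_file_name(const std::string& s) {
    const std::size_t dunder = s.find("__");
    if (dunder == std::string::npos) return std::nullopt;
    const std::size_t first = s.find('_');              // end of name
    const std::size_t second = s.find('_', first + 1);  // end of version
    if (second == std::string::npos || second >= dunder) return std::nullopt;
    return AppFileParts{s.substr(0, first),
                        s.substr(first + 1, second - first - 1),
                        s.substr(second + 1, dunder - second - 1),
                        s.substr(dunder + 2)};
}

// Integer version number: "1.46" -> 146.
// Assumes "<major>.<two-digit minor>".
std::optional<int> version_num(const std::string& dotted) {
    const std::size_t dot = dotted.find('.');
    if (dot == std::string::npos || dotted.size() != dot + 3) return std::nullopt;
    try {
        return std::stoi(dotted.substr(0, dot)) * 100 + std::stoi(dotted.substr(dot + 1));
    } catch (const std::exception&) {  // stoi throws on non-numeric input
        return std::nullopt;
    }
}
```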

In order to try the above XML, I will have to run my WU count down from 850 to zero, set resources to "0", exclude all but 1 GPU and process only 1 work unit at a time; otherwise I might dump a lot of good work units due to a misconfigured anonymous platform. However, I have plenty of free time and can try getting it to work.



Copyright © 2020 University of California. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.