Posts by Gary Roberts

1) Message boards : Projects : News on Project Outages (Message 112535)
Posted 15 Aug 2023 by Gary Roberts
Post:
Thanks very much for the update. I'm pleased for you guys that things seem to be going to plan :-).

I'll be in bed by then so hopefully when I next check around 7:30 - 8:00 pm UTC that evening (5:30 - 6:00 am UTC+10 Wednesday my time), I'll be able to set networking back to 'always' and upload all the accumulated results. My script allows me to stagger the uploads (and I have plenty of spare time) so I intend to put a gap of say at least 90 secs between hosts. It will take 3-4 hours (or more) for all the hosts to finish the job. I won't need to download anything until well after the uploads have all finished, so I have no need to push things. After that, I can gradually return to the usual work cache size in small steps over the next couple of days if necessary. The script can do all that for me so it's very easy to get back on track.
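
The staggering idea can be sketched as a simple loop (a sketch only — the host addresses are invented, run_remote stands in for the real ssh/boinccmd call, and the actual script does much more):

```shell
#!/bin/sh
# Sketch of staggering the return to 'network always' across a fleet.
# run_remote is a stand-in for something like:
#   ssh "$ip" "cd BOINC ; ./boinccmd --set_network_mode always"
run_remote() {
    echo "network activity set to 'always' on $1"
}

stagger_hosts() {
    for ip in "$@"; do
        run_remote "$ip"
        # sleep 90   # at least 90 secs between hosts; uncomment for real use
    done
}

# Example host addresses (hypothetical):
stagger_hosts 192.168.1.10 192.168.1.11 192.168.1.12
```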
2) Message boards : Projects : News on Project Outages (Message 112525)
Posted 14 Aug 2023 by Gary Roberts
Post:
We are shutting down Einstein@Home.
Yes, pretty much right at 5.00am UTC.

I had my hosts store ~3 days of work and now they all have network activity suspended for 2 days until approximately 6.00am UTC on Wednesday. Hopefully the project will be back up by then and at least some of the initial flood of connection attempts will have subsided before any of mine add to the chaos :-). If things go faster than planned, I can easily turn on network activity early and if there are delays I can easily extend the suspension. Hopefully I have enough work on hand to avoid adding to the congestion when service resumes.

I'm sure glad that years ago I developed scripts to give me lots of control over the full fleet of hosts. I'd hate to be doing this on a host by host basis :-).
3) Message boards : Projects : MLC@home output file absent (Message 102356)
Posted 28 Dec 2020 by Gary Roberts
Post:
My own personal policy as a moderator is to ignore basic help requests delivered by PM. If someone persists, it's little trouble to request them to start a forum thread, giving full details. I point out that they'll likely get better answers from the opinions of a wider population where sub-standard answers tend to get 'corrected by peer review' :-).

And just to add to what Keith mentioned, and since my earlier comment referred to a message posted at Einstein a long time ago with no response there, my comments in this thread related to what to do at Einstein and not necessarily at any other project.

That advice was based on the fact that Einstein staff rarely get involved in the day to day answering of basic help questions. However, they do respond positively if a moderator (a user stupid enough to have volunteered to try to answer basic questions) has asked them to look into something that seems quite likely to need attention. I have done that a couple of times recently with positive results. I regarded the BOINC wide teams thing as something trivial enough to NOT justify a request from me. I had no experience with it so wasn't prepared to give potentially unreliable advice at the time. I thought that some other volunteer might be able to comment about it.

Finally, I probably should have expanded on the term "staff". Staff doesn't include moderators UNLESS there is also an extra term like "Administrator" attached as well. At Einstein, there should be no issue with knowing who is staff and who is just another volunteer/user. I don't know about other projects. The term I should have used is "Technical Staff" because that very small sub-group are the ones that get to fix technical issues. The much larger group of academics and researchers, doctoral and post-doctoral students, etc, do not handle the day-to-day technical issues. Undoubtedly some of them may write specialist code but it is people like Bernd and Oliver who get to fix the technical issues when they occur.
4) Message boards : Projects : MLC@home output file absent (Message 102351)
Posted 28 Dec 2020 by Gary Roberts
Post:
In e.g. Einstein@home forum I opened a thread about a month ago, zero reply. Last reply in complete forum is 9 weeks ago there.
If by "complete forum" you are referring to all the Einstein boards, or even if you are only referring to the 'Problems' forum where you posted, you can't have looked very hard because questions get answered almost every day there.

Let me give you a tip. Probably at most mature projects, the staff/developers tend to leave the grunt work of answering everyday questions to volunteers. The staff tend not to have the time once some volunteers start shouldering the load. So, if you don't get an answer, it surely means that none of the volunteers understands the question or has the experience to answer it. Even blindingly obvious and trivial questions get answers posted. Nobody deliberately withholds known information, no matter how obvious the answer might be. Nobody wants to snub you.

So, if you don't get an answer in a day or two, and if you are sure the problem is not a trivial one with an easy workaround, you could attempt to attract staff attention by sending a PM. If you browse the News or Technical News channels you will easily find the staff who post there.

Or, you could dig a bit deeper for yourself. In what you posted at Einstein, there was a link to the Boinc wide teams site. Did you read what was there very carefully? Did you follow all the instructions given? The very first paragraph tends to suggest that perhaps Boinc wide teams don't work anymore and perhaps you should abandon the concept and manage each project separately. The 4th paragraph instructs you to contact the project staff about updating server software. Right there is the perfect reason for a PM to the Einstein staff in case there is still some sort of option for the mechanism to work. The volunteers who answer questions don't know.

It's common knowledge that Einstein uses an old and highly modified version of the server software so the likely answer is the 'feature' no longer exists or can't be implemented in Einstein's modified version. But at least you would know and could then stop banging on about it. Surely updating team details is not something you would need to do very often? Fix things manually and forget about it.
5) Message boards : Questions and problems : Problem on Linux related to the message "gui_rpc_auth.cfg not found. Try reinstalling BOINC" (Message 102224)
Posted 18 Dec 2020 by Gary Roberts
Post:
I run Einstein@Home on a very large bunch of computers using a 'much overlooked and under-appreciated' Linux distro called PCLinuxOS - PCLOS to its friends. It's a "rolling-release" distro - install once, update forever. The potential show-stopper with that release model is that a poorly implemented update can wreck the whole system. I've been running it since 2006 and I've never had that problem happen to me.

There is a single, well maintained repository. Nothing gets into it without the say-so of the boss who does the bulk of the packaging. I don't know how he finds the time, but the quality of the packages is spot on and issues get resolved extremely quickly. To overcome any potential 'rolling-release' headaches I maintain my own full local copy of the PCLOS repo and clone it at times of known good stability. Currently, a clone copy is ~36GB and I have about 20 dated clone copies going way back to around 2012, all on a 2TB external USB drive that's only half full.
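
For anyone curious, a dated clone can be as simple as a sync into a date-stamped directory (a sketch — the mirror URL and mount point below are invented for illustration, not PCLOS specifics):

```shell
#!/bin/sh
# Sketch: keep dated snapshots of a distro repo on an external drive.
# The mirror URL and destination path are hypothetical examples.
DEST="/mnt/usb/pclos-repo-$(date +%Y-%m-%d)"
echo "would clone into: $DEST"
# rsync -a --delete rsync://mirror.example.org/pclinuxos/ "$DEST/"
```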

I created a new clone (dated 29 Nov 2020) and chose to test it by updating a machine that was last updated in March 2018, and had been working fine. It was one of a bunch I'd decided to shut down to limit my summer heat problem. I don't usually try updating after such a long interval (I usually fresh install from a fully kitted out remaster) so I was interested to see what happened. I actually expected it to fail. There wasn't a single problem. I chose several others (different hardware) with the same result. 29 Nov 2020 is obviously a keeper.

PCLOS refuses to package BOINC. They did many years ago but the boss was frustrated with what he referred to as "crap that isn't even alpha quality". I've always used the self-extracting archive on the BOINC website until they stopped making those. The last one I used was 7.2.42. In early 2017 I bit the bullet and followed the instructions for building BOINC on unix that I found on the website and was able to work out the full list of development packages needed. That first one was quite an adventure but it's very simple these days. I had built 7.16.5 earlier this year and with the PCLOS 29 Nov 20 repo a keeper, it seemed like the perfect time to build something a little more recent. I chose 7.16.11 which I built yesterday.

I tested the new build on two widely different machines and had no issues with the way I launch the client or with a local or remote manager connecting with the running client. I also updated the OpenCL capability on those machines using the Red Hat flavour of amdgpu-pro 20.40 from which I had extracted the OpenCL libs. One machine was using an RX 460 GPU, the other an RX 570 and everything was normal - so I added the 2 machines to the list of hosts for automatic control and went home.

I use a series of scripts that check all aspects of all 'production' machines. The two prime functions are to monitor regularly for 'misbehaviour' (once per hour, 24/7) and to control work fetch and data file download behaviour. To find hosts that need attention, the first thing is to make sure the host is 'pingable' and that both BOINC and the science apps are running. To make sure nothing is 'spinning its wheels', I use the kernel's tools for tracking the use of CPU cycles. My main interest is in GPU tasks and the kernel can show exactly how many 'clock ticks' per second get used by the CPU in supporting a GPU task. This has proved extremely reliable for detecting a stuck GPU.
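
For what it's worth, the kernel data I'm referring to is readable straight from /proc — a minimal sketch (field numbers per the proc(5) layout; the real scripts are more involved):

```shell
#!/bin/sh
# Sketch: CPU clock ticks consumed by a process, from /proc/<pid>/stat.
# Fields 14 and 15 are utime and stime, both in clock ticks (see proc(5)).
cpu_ticks() {
    awk '{print $14 + $15}' "/proc/$1/stat"
}

t=$(cpu_ticks $$)
echo "ticks so far for this shell: $t"
# Sampling this periodically for the CPU process that supports a GPU task
# gives a reliable 'stuck GPU' indicator: a healthy task shows a steady
# tick rate, while a stuck one flatlines.
```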

For work fetch aspects, I cache data files so that any particular file gets downloaded once and deployed to all machines that subsequently might need it. This works because the control script manipulates the work cache setting. Normal is 0.05 days. Six times a day, the control script verifies a host has all current data and then increases the cache setting to the desired value (currently ~1.5 days) to trigger a work fetch. There is a suitable interval between each host and once all have had time to finish 'feeding', they have the cache size returned to 0.05 days. If any host receives new data, it is deployed to all others and also sent to the cache. The stats tell me that many hundreds of potential data file downloads get saved every day so I feel this is worthwhile.

After going home last night, the script that controls work fetch reported an error with both the hosts that had the new 7.16.11 BOINC version. The particular error message I got was:
Can't get RPC password: gui_rpc_auth.cfg not found. Try reinstalling BOINC.
Only operations not requiring authorization will be allowed.

This was a bit of a surprise because the file has always existed and never previously caused a problem with any self-built BOINC versions. I found several reports (here and here), but these seemed more to do with the Manager not being able to connect to the client and I'd already confirmed that I had no such issue, since I'd run the manager (local and remote) before adding the machines to the auto-control group. Error messages like this get logged and usually it means that there has been a failure. In this case the message was quite bogus because, on closely inspecting the two hosts involved, both had downloaded new work at the appropriate time and both had been successfully returned to the default 0.05 days setting as per normal.

Since I use boinccmd over the LAN to control each client, I went and read the boinccmd docs again and the following quote gave me a clue:-
If you run boinccmd in the same directory as the BOINC client, you don't need to supply either a host name or a password.

I decided I'd better look at exactly how my bash script uses boinccmd to make the cache size adjustments. Firstly there is the command to adjust the value in the global_prefs_override.xml file. The secure shell (ssh) is used. $ip is the variable containing the target host's IP address and sed is the stream editor that finds the appropriate field and changes the default value (0.05) into the desired value (1.50). There is no problem with that.
ssh $ip "sed -i /min_days/s/0.05/1.50/ BOINC/global_prefs_override.xml"

Secondly, there is the command to get the client to take notice of the change. Once again ssh delivers the command which launches boinccmd from the home directory by specifying the BOINC data directory where the boinccmd app resides. The variable $passwd contains the contents of gui_rpc_auth.cfg.
ssh $ip "BOINC/boinccmd --passwd $passwd --read_global_prefs_override"

I've never had a problem with this before but was immediately struck by the above quote from the docs. So I made the following change to the command string that the secure shell launches and it has completely solved the problem. Essentially, the fix was to cd into the data directory before launching boinccmd.
ssh $ip "cd BOINC ; ./boinccmd --passwd $passwd --read_global_prefs_override"
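
Incidentally, boinccmd can also reach a remote client directly over its GUI RPC port, which would avoid ssh for this step altogether — a sketch only, and it requires the remote client to permit it (the controlling machine must be listed in the remote client's remote_hosts.cfg):

```shell
#!/bin/sh
# Sketch: drive a remote client over GUI RPC instead of ssh.
# $ip and $passwd mean the same as in the ssh commands above;
# the values here are hypothetical examples.
ip=192.168.1.10
passwd=example_password
echo boinccmd --host "$ip" --passwd "$passwd" --read_global_prefs_override
```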

I decided to post all this just in case anyone else happens to run across these error messages and might not realise that boinccmd can trigger them as well. But, as Richard mentioned in one of his posts about this that I'd read, the message was bogus for me because the operation actually succeeded, despite the error message.
6) Message boards : BOINC Manager : Unfinished Tasks Question. (Message 99259)
Posted 12 Jun 2020 by Gary Roberts
Post:
Keith how can I see the deadline on each of the multiple tasks.
The BOINC client runs in the background. To see exactly what it is doing, you need to open a separate program called BOINC Manager. In addition to observation, the Manager allows you to make changes to, or exert some control over, how the client is behaving. Until you know what you are doing, it's best to just use the Manager to observe.

If you have the Manager open, make sure you set it to "Advanced view" if you want to see and understand what is really happening. If you see just pretty pictures and little info, go to the 'View' menu and select "Advanced".

In the advanced view, there are a number of different 'tabs' labeled "Notices", "Projects", "Tasks", etc. Select the Tasks tab and you will be able to see lots of columns of information for a whole page of current tasks for your different projects. One of the columns will be headed "Deadline". That is how you see what all the competing deadlines are. You can easily adjust the window size and drag the column separators (between column headings) to see all the data for a particular column if the column has an inappropriate width.

You really should look at the available information on all the tabs if you want to have an understanding of what the client is doing and how it is handling different tasks. To know what different projects you are running, look at the Projects tab. If you want to know what applications you are running (quite a different matter) look at the "Application" column on the Tasks tab. In addition to the different tabs, the very top line of the screen has a range of menu items which you should make yourself familiar with.

Exploring the Advanced view in this manner will often allow you to answer your own questions. If there are things you don't understand, you could always consult the User manual which goes into detail about this very basic stuff, complete with pretty pictures.

Some event on my computer is causing an older task to be suspended and a new task to be started.
Could it be a restart after a computer crash?
No.

Maybe you have too large a work cache size and the client has gone into high priority mode rather than the more normal FIFO (First In First Out) mode so as not to miss deadlines if possible. That could easily happen if your machine has been off for a while and the client thinks that this low on-time will be normal behaviour for the future as well. That is the assumption it will make if there has been a period of not running.
7) Message boards : Questions and problems : Certain Projects Hijack Boinc (Message 98411)
Posted 10 May 2020 by Gary Roberts
Post:
The main reason I’ve seen for this behaviour is the project %age setting. If a project has just been added or has had a slack period then the scheduler will give ....
This thread has the title "Certain Projects Hijack Boinc". This is likely to attract others who are equally misinformed. Many think that a 'nasty' project, through its scheduler, can force extra work onto the client. I just wanted to make sure that such readers, seeing your use of the term "scheduler" will understand that the client makes these decisions, not the project's scheduler. The comments here are really directed at the general readership, attracted by the thread title, rather than yourself.

It's certainly true that the "project %age setting" - more properly known as 'resource share' - will affect how much time a given project is allocated by the BOINC client (the bit that runs on the user's machine). It's not the only reason. If a user considers that a project is running too frequently, the initial action should be to go to the project website and check that the setting for resource share is appropriate, and perhaps reduce it if necessary.

The problem is in the second sentence where the scheduler is mentioned. Whilst it's true that the client arranges (or schedules) the order in which work is done on the user machine, the "scheduler" is more likely to be interpreted by the user as that program running at the server end (not the client end) whose job is to respond to client requests for specific amounts of work. It just provides (if it can) the work requested by the client. It cannot have any effect or control on when the client decides to run that work. Of course, if the user has set too large a cache size and the client requests too much work, then the scheduler, being an obedient servant, will simply respond to exactly what is requested. It's then up to the client to process the work whilst meeting both the resource shares set by the user and the deadlines of the tasks allocated.

So that is the second important reason for why a "project hijack" claim might be being made. The work cache size might be too large for the client to be able to process the work in an orderly fashion, within the various deadlines. The client switches to high priority mode and concentrates on one project to the exclusion of others. The answer in this case is to reduce the size of the work cache, rather than to change the resource share.

So, in both situations, it's up to the user to check both resource share and work cache size if they believe that a particular project is "hijacking Boinc". Projects DON'T force extra non-requested work on clients.
8) Message boards : BOINC client : BOINC 7.16.5 for Win, 7.16.6 for Mac released to the public (Message 97565)
Posted 14 Apr 2020 by Gary Roberts
Post:
That's actually correct, because it's also labelled as pre-release on https://boinc.berkeley.edu/download_all.php - he's consistent.
I built my 7.16.5 when it was announced that 7.16.5/7.16.6 were released. I noticed DA's actual words specified version 7.16.6 for the shell archive which was what drew my attention when you posted about your "pre-release" description. I didn't look at the download_all page - I just assumed the new archive would be an official release version and not something on the way towards an eventual 7.18 release.

After I had seen that DA called his 7.16.6, I wondered (for a microsecond) if I should rebuild mine as 7.16.6. I decided that was crazy since the only likely difference between the two would be specific to MacOS - so a bit pointless. Your descriptions of your experience made me a bit curious so I decided to answer your post.

I know you are more than capable of working things out for yourself - far more capable than me. My main reason for documenting what I had done was to encourage Linux users in general to 'build their own' since it's actually a trivial exercise once the build environment is set up. The mantra of "... for Linux, get the new version from your package manager ..." always annoyed me because, believe it or not, there are distros out there that don't package BOINC. My distro used to (a long time ago) but they decided that the dozens and dozens of alpha quality versions on the way to a final release effectively disqualified it from serious consideration. I was quite happy to use the shell archives - until they stopped. That eventually forced me to become self reliant - for which I'm eternally grateful :-).

I have actually built a client from sources, but I failed with the Manager - I don't think I've got all the dependencies properly installed yet.
I did build 7.16.3 when it first came out and used it quite a bit. It identified itself as 7.15.0-pre-release which caused me to investigate and find the set-version script. It was either that build or perhaps the aborted 7.16.4 version (built but never used) where I found a new dependency on wayland-protocols-devel, if I remember correctly. I could easily send you a full list of packages I install, if that is of any use to you. The names won't be quite the same but something might stand out as a likely candidate.
9) Message boards : BOINC client : BOINC 7.16.5 for Win, 7.16.6 for Mac released to the public (Message 97546)
Posted 13 Apr 2020 by Gary Roberts
Post:
And having done some more tinkering, I've now got BOINC Manager to open, and display my running service client. It says it's a pre-release copy, but it appears to work - and it's got the standard File menu and the exit options dialog straight from the BOINC code, whereas the repo Manager has had useful bits stripped out. That's enough for today.
Is the manager extracted from the shell archive describing itself as pre-release or is it telling you it is connected to a pre-release client? Or perhaps even both situations are happening? The file menu of the manager will show you what the manager is. The "connected to ..." message in the bottom RH corner should tell you about the client.

If the manager is pre-release, then DA hasn't quite built it properly :-). If the client is pre-release, the repo you got it from hasn't built it properly.

If both the client and manager belong to a proper release (eg 7.16.5) and the latest source has been cloned from github, then there is a script that needs to be run in the subsequent procedure in order to set the proper version string rather than have the product identify itself as pre-release stuff. These are the steps I use on my build machine, starting from the top of the source tree (/home/gary/src/) and with the cloned code in the subdir boinc :-

[gary@host src]$ cd boinc
[gary@host boinc]$ git tag --list '*/7.16/*'                                  (lists all the 7.16.x versions available)
[gary@host boinc]$ git checkout client_release/7.16/7.16.5                     (specify the specific version to checkout)
[gary@host boinc]$ ./set-version 7.16.5                                       (step to avoid "pre-release" - also runs ./_autosetup)
[gary@host boinc]$ ./configure --disable-server --enable-client CXXFLAGS="-O3 "
[gary@host boinc]$ make
[gary@host boinc]$ cd packages/generic/sea                          (sea stands for self extracting archive - the .sh file you mentioned)
[gary@host sea]$ make                                               (you don't need to build this unless you need it)
[gary@host sea]$                                                    (I always do to keep as a single backup of everything in case needed)

And that's all there is to it. Of course, there's lots of terminal output at various stages but as long as it goes to completion without errors (warnings are fine) the binaries produced will run. The biggest problem would be if all the required development packages aren't installed on the build machine. For my distro (pclinuxos) that amounted to something like 150 extra packages on top of the standard build tools (latest gcc, autoconf, automake, libtool, git, etc). I discovered that the majority of the devel pkgs get pulled in automatically by specifying just one - lib64wxgtku3.1-devel. The pkg manager I use can install from a list kept in an external text file. On an earlier build, I just saved the full list so if I choose a different machine, the setup is trivial.

Different distros use different packaging systems and different naming conventions so you have to create your own list. Many are probably fairly similarly named though. Sometimes, a single package in one distro might be represented by several in another - or vice versa. It's usually not too hard to work that out :-).

So, when are you going to start building on Linux ;-) :-).
10) Message boards : Questions and problems : How do I 0.25 CPU and 0.25 GPU? (Message 97543)
Posted 13 Apr 2020 by Gary Roberts
Post:
The only thing the wiki how-to doesn't explain is project name.
Do you have any clue about the topic of project level configuration?

The app_config.xml mechanism that was linked to doesn't have any entry for "project name".
It's not needed since such a user constructed file only needs to be placed in the correct project sub-directory.
Perhaps you meant <app_name> or just <name>? If so, those are documented so you're wrong again.

Before firing off blatantly incorrect information, maybe it would be wise to check your 'facts' first?
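
For the record, a minimal app_config.xml of the documented form looks like this (the app name shown is just a placeholder — use the short name of the actual application, which appears in the project's client_state.xml entries):

```xml
<app_config>
   <app>
      <name>example_app_name</name>
      <gpu_versions>
         <gpu_usage>0.25</gpu_usage>
         <cpu_usage>0.25</cpu_usage>
      </gpu_versions>
   </app>
</app_config>
```

The file goes in the relevant project's own sub-directory under the BOINC data directory, which is exactly why no "project name" element is needed.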
11) Message boards : Questions and problems : No tasks being supplied (Message 97510)
Posted 11 Apr 2020 by Gary Roberts
Post:
This all seems to happen without error, but the boinc process receives no tasks.
You need to find the file Jord mentioned - stdoutdae.txt. According to your script, it should be in $JOB_DIR. During startup, the client should write all the startup messages and the response from Rosetta in that file. If the client exits for some reason, your final command would remove all the evidence so perhaps that's why you can't find it. Have you looked in the $JOB_DIR directory while the client is running?

The client is stored in $BOINC_DIR but you cd to $JOB_DIR and run it from there. You don't specify a PATH when launching the client and you don't copy the client into $JOB_DIR. I assume $BOINC_DIR must be in the path that your script has access to. The --redirectio option seems to be just for specifying a different filename (or path) than ./stdoutdae.txt and so shouldn't be needed. I'm not familiar with how you are running the client. I'm using Linux and I compile my own clients and run them from ~/BOINC. I have built and run 7.16.5 and there is nothing unusual about stdoutdae.txt. It continues to be used in the BOINC directory where the client is installed.

The query I would have about your script is to do with using both --fetch_minimal_work and --exit_after_finish. According to the documentation, exit_after_finish is some sort of debugging option, whereas the usual option to use with --fetch_minimal_work should be --exit_when_idle.

That sounds like a better fit so perhaps you should try that. If you still get no work, you need to examine the startup messages in stdoutdae.txt, before $JOB_DIR gets deleted.
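
In other words, the launch step would become something like the sketch below ($BOINC_DIR and $JOB_DIR are the variables from your script; the default values here are placeholders, and the command is only echoed rather than run):

```shell
#!/bin/sh
# Sketch: the suggested client invocation, built as a string so the
# option substitution is visible. Paths are hypothetical defaults.
BOINC_DIR=${BOINC_DIR:-/opt/boinc}
JOB_DIR=${JOB_DIR:-/tmp/boinc-job}
CMD="$BOINC_DIR/boinc --fetch_minimal_work --exit_when_idle"
echo "cd $JOB_DIR && $CMD"
```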
12) Message boards : The Lounge : Wishing important projects would start supporting GPU crunching! (Message 97451)
Posted 10 Apr 2020 by Gary Roberts
Post:
Hey all you guys insisting on wandering down memory lane ... don't you have any protocols about staying on topic?? :-) ;-).

Few do GPU crunching, and when they do (like Einstein), they don't do it well.
What was that saying about the poor workman and his tools???
I don't seem to be able to quite remember it ...
13) Message boards : Questions and problems : AMD's Radeon open compute (RocM) has problem with boinc (Message 96947)
Posted 20 Mar 2020 by Gary Roberts
Post:
Been running the tests at Einstein using their beta app that relies on the RocM driver.
I've highlighted the bit that is just plain wrong. There is no app that needs ROCm. Maybe it would be a good idea to read the whole thread in its entirety and the following key points would then emerge.

* A user who had a properly configured ROCm system posted about the GW app crashing.
* The precise error message (clearly listed) pointed to a coding style in the Einstein app that wasn't supported under ROCm.
* When Bernd's attention was eventually attracted, the problem got passed to the app author/developer.
* A 'fix' was quickly developed and a test app was distributed that solved the problem for ROCm.
* I ran this app deliberately on non-ROCm systems to make sure there were no regressions. There was no problem.
* Bernd has since taken the app out of beta. It became the default app. To my knowledge, there continue to be no app related problems.

I suspect any problem you have is down to you trying to use non-ROCm-compliant hardware configurations. Have you really tried reading some documentation? Here are a couple of key points from that link that might be appropriate to your setup. You should read the whole document carefully to really be sure your setup is fully compliant with all the restrictions.

As such, by default ROCm requires that these GPUs be installed in PCIe slots with PCI Express 3.0 or higher capabilities with transfer rates of 8.0 GT/s in either x16 or x8 lanes. The system configuration can have the PCIe slots directly on the CPU's root port or a PCIe switch, but everything between the CPU and the GPU must support atomics.

Note that the physical PCIe slot size does not guarantee support for ROCm. Some motherboards have physical x16 PCIe slots, but the PCIe connector is electrically connected as PCI Express 2.0 to the southbridge. Since the PCIe slot connector matters to the GPU, care must be taken to not place them on motherboards configured this way.

The ROCm kernel driver logs if ROCm capable GPUs are installed on a system that does not support PCIe atomics. Example text from kernel log:

kfd: skipped device 1002:7300, PCI rejects atomics

You actually posted that very message (shown above) over at Einstein. I know nothing about ROCm but the documentation (which I've just now found and read) seems to be saying that your problem is your non-ROCm-compliant hardware setup. In other words, nothing to do with any purported conflict between ROCm and BOINC. You should try to be less misleading in your choice of thread title.
14) Message boards : The Lounge : The Seti is Slumbering Cafe (Message 96459)
Posted 6 Mar 2020 by Gary Roberts
Post:
On April 1st, ET will come surfing in on a continuous gravitational wave.

I wonder where that might be detected .... :-)
15) Message boards : BOINC Manager : Boinc manager does not connect in openSUSE 15.2 (Message 95364)
Posted 21 Jan 2020 by Gary Roberts
Post:
Gary, I do not even have such a file anywhere and if you look back in my posts I listed the output of "service boinc-client status" that says that the client is running.
Provided your boinc-client hasn't been modified by whoever packaged it, it will write its startup messages to the file I indicated. A previous version of that file (if it has been rotated) would be called stdoutdae.old. If you have neither of those two files, there are two possibilities that spring to mind. Either, "service boinc-client status" is lying to you and the client has never started -OR- the packager has altered the standard behaviour of the client to direct the startup messages elsewhere. For that second option, you would need to ask on the OpenSUSE forums. It's quite possible they may have done something like that.

If the client is running normally, and if the manager can't connect to it, there would have to be a reason, which would probably show up in the startup messages. If you are certain the client is running, you need to find those messages to diagnose the problem - as was asked by the very first person who responded to your question.

Can you open a terminal session and run something like 'ps -A | grep boinc' to get the process ID of the client? That would be one way of confirming that there is a running client.
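For the record, the two checks I'm suggesting boil down to something like this. The data directory path is an assumption here - it varies by distro, and /var/lib/boinc is just a common default for packaged installs.

```shell
# 1. Is there actually a running boinc client process?
ps -A | grep -i boinc

# 2. If so, its startup messages should be at the top of stdoutdae.txt
#    (path is distro-dependent; /var/lib/boinc is a common packaged default)
head -n 20 /var/lib/boinc/stdoutdae.txt 2>/dev/null
```

If the first command shows a client process but the second finds no file, the packager has most likely redirected the client's output, which is exactly the question for the openSUSE folks.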
16) Message boards : BOINC Manager : Boinc manager does not connect in openSUSE 15.2 (Message 95337)
Posted 20 Jan 2020 by Gary Roberts
Post:
No it is still running, see reply from the status command ::
If the client is actually running, wouldn't it have added the startup information to the file stdoutdae.txt in the data directory?

Have you tried browsing that file? If the startup messages aren't there, I don't think it can be running.
17) Message boards : Questions and problems : Inaccurate "time left" (Message 94733)
Posted 7 Jan 2020 by Gary Roberts
Post:
.... I had cynically assumed this was to encourage people to flock to Einstein to get their global stats up :-)
You obviously didn't think the situation through at all :-).

Richard has succinctly summarised the situation very well, but he has left out one important detail - Locality Scheduling (LS) :-). This is the current description. As I remember it, there used to be a lot more about it. Notice that it's very brief these days and suggests there are two versions, Limited and Standard. Whilst there's some description about Limited, what exists for Standard is just a "highly project-specific version ... used by Einstein@home."

My understanding is that LS was developed by Bruce Allen and David Anderson in the very early days (circa 2004/5, maybe even earlier) as a means of making the Einstein project viable in the first place. Bruce is no slouch as a programmer: he's the author of smartmontools, the utility that reads S.M.A.R.T. data to monitor the health of (and predict potential failures in) hard disks. These days he's just the Director of the AEI which runs the E@H project :-).

E@H kicked off in early 2005 and LS allowed volunteers to cope with downloading only a very limited subset of the data. The data was split into lots of discrete frequency bins so that an individual host could be sent many consecutive 'discrete frequency' tasks, all based on the same small subset of large data files. In other words, the bandwidth needed by both the project and the volunteers could be effectively managed and minimised. The problem is that no other BOINC project seems to need this (I don't know of any, at least) so it's not really surprising that a more general version has never seen the light of day.

It's also not surprising that E@H Devs have put a lot of time and effort into getting their specialised version to work the way they need it to. They did a lot of development and tuning work in the years between 2005 and 2010. They were then faced with the option of changing to new server code and porting all their modifications to that new code base or staying with the "devil they knew". They would have considered the amount of work needed to keep porting that special code every time a new version of the server code came out. I remember seeing comments from Bernd at the time to the effect that porting to the new code on a continuing basis was simply untenable and they needed to stay with what they had.

I'm not a programmer so I make no comment about that. DCF (the duration correction factor) has basically worked OK for many years since that time. GPU apps started appearing around 2011 and there really hasn't been much of a problem with the swings generated by DCF changes, because the estimates built into the workunits used to keep the DCF relatively close to 1. Until fairly recently, that is. The estimate for the gamma-ray pulsar (GRP) GPU work was always a bit too high, so the DCF would always settle below 1. When paired with CPU work of any description, it didn't seem to matter much - as long as the work cache size wasn't too high. There would be a few more CPU tasks than could be crunched in the configured time, but still within the deadline for even a 3 or 4 day work cache size. The much faster finishing GPU tasks would quickly counter any slow CPU tasks that pushed the DCF higher, towards or even above 1.

Issues started with the GW GPU app. The estimate is quite wrong but the real problem is that it's wrong in the completely opposite direction to the way it is for the GRP app. There is effectively something like an order of magnitude difference in a stable DCF for GRP tasks compared to that required for GW tasks. I've seen DCF values for GW tasks in the 4 to 6 range. The Devs must surely be seeing this. The real puzzling bit is that there seems to be no attempt being made to reduce this mismatch. Even if they were both a bit wrong but in the same general direction that would be a significant improvement to what currently exists. Maybe it's just a matter of too many irons in the fire and too few people to tend to them. Lots of users complaining might get some attention.
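To illustrate why one badly-mismatched estimate dominates, here's a simplified sketch of a DCF-style update rule. The 10% downward step and the immediate upward jump are illustrative assumptions only (not the actual client code), and the runtime/estimate ratios are made-up numbers in the ranges I described: GRP-like tasks around 0.6, GW-like tasks in the 4 to 5 range.

```shell
# Simplified sketch of a duration-correction-factor update (NOT the real
# client algorithm): jump up immediately on an underestimate, ease down
# slowly (10% of the gap per task) on an overestimate.
out=$(awk 'BEGIN {
  dcf = 1.0
  # hypothetical runtime/estimate ratios: GRP-like (~0.6) mixed with GW-like
  n = split("0.6 0.6 4.5 0.6 4.8", r, " ")
  for (i = 1; i <= n; i++) {
    if (r[i] > dcf)
      dcf = r[i]                        # underestimate: raise at once
    else
      dcf += 0.1 * (r[i] - dcf)         # overestimate: ease down slowly
    printf "after task %d: DCF = %.2f\n", i, dcf
  }
}')
echo "$out"
```

The point of the sketch is that a single GW-like task slams the DCF up to its ratio, and a long run of GRP-like tasks only drags it back down slowly - so a host running both searches never settles, and the "time left" estimates swing wildly in both directions.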

So back to the relevance of the original quote to which this message is responding ;-). If the cynical view was correct, ie., E@H is buying extra participation with credits, then they are doing it in a rather stupid fashion. If you look at the split of tasks you get when having both GPU searches enabled, I'm sure you would see more of the lower credit GW tasks than the higher credit GRP tasks. The reason for that bias (my opinion) is that the project does want to process the GW stuff as quickly as possible since the whole point of the project (for many long years) is to detect continuous GW. They are doing that through getting the scheduler to preferentially send GW tasks when preferences allow it to do so. If the project were trying to 'buy' participation, the GW tasks would be much higher in credit worth and there would be no need to tweak the scheduler to prefer GW tasks.

And finally, shouldn't you be making the complaint from your original message at Einstein and perhaps trying to get other voices to join the chorus? The Einstein Devs aren't likely to notice it here.
18) Message boards : Questions and problems : Trying to maximize the performance of my Mac Pro (Message 93506)
Posted 3 Nov 2019 by Gary Roberts
Post:
... In the meantime, how do I do more work with it (i.e. get my daily credit count up?)
On just the RX 580 alone, you would be able to get a RAC of around 500K at Einstein (gamma-ray pulsar search). Your CPU cores are old and slow (and probably power hungry) and wouldn't contribute a whole lot more. The gamma-ray pulsar search needs very little CPU support when run on AMD cards. It's a different story with nvidia. It's also a different story if your preference was to run the new gravitational wave search using GPUs, currently in beta test. It is hampered by older and slower CPUs.
19) Message boards : BOINC Manager : 7.16.3 on Ermine (Message 93368)
Posted 28 Oct 2019 by Gary Roberts
Post:
... I had assumed the latest gtk3 would be the one ...
Don't know why you had a problem with that. lib64gtk3.0-devel (3.24.12-1pclos2019) was the only gtk lib I had installed, so it seems unlikely to be due to some gtk2/gtk3 incompatibility. I'm not a programmer so I wouldn't really have a clue about things like that :-).
20) Message boards : BOINC Manager : 7.16.3 on Ermine (Message 93348)
Posted 28 Oct 2019 by Gary Roberts
Post:
As a side note about building the latest BOINC version, I'd like to comment about something else that's been 'fixed' for my builds. On previous builds I've done (eg 7.14.2) snippets from the event log look like the following:-

Mon 28 Oct 2019 06:56:27 PM EST | Einstein@Home | <![CDATA[Sending scheduler request: To fetch work.]]>
Mon 28 Oct 2019 06:56:27 PM EST | Einstein@Home | <![CDATA[Requesting new tasks for AMD/ATI GPU]]>
Mon 28 Oct 2019 06:56:32 PM EST | Einstein@Home | <![CDATA[Scheduler request completed: got 2 new tasks]]>
Mon 28 Oct 2019 06:56:34 PM EST | Einstein@Home | <![CDATA[Started download of templates_LATeah1062L11_0364_7323190.dat]]>
Mon 28 Oct 2019 06:56:34 PM EST | Einstein@Home | <![CDATA[Started download of templates_LATeah1062L11_0364_7324821.dat]]>
Mon 28 Oct 2019 06:56:36 PM EST | Einstein@Home | <![CDATA[Finished download of templates_LATeah1062L11_0364_7323190.dat]]>
Mon 28 Oct 2019 06:56:36 PM EST | Einstein@Home | <![CDATA[Finished download of templates_LATeah1062L11_0364_7324821.dat]]>
Mon 28 Oct 2019 06:56:56 PM EST | Einstein@Home | <![CDATA[Computation for task LATeah1062L09_396.0_0_0.0_5504625_2 finished]]>
Mon 28 Oct 2019 06:56:57 PM EST | Einstein@Home | <![CDATA[Starting task LATeah1062L10_404.0_0_0.0_14980735_2]]>
Mon 28 Oct 2019 06:56:59 PM EST | Einstein@Home | <![CDATA[Started upload of LATeah1062L09_396.0_0_0.0_5504625_2_0]]>
Mon 28 Oct 2019 06:56:59 PM EST | Einstein@Home | <![CDATA[Started upload of LATeah1062L09_396.0_0_0.0_5504625_2_1]]>
Mon 28 Oct 2019 06:57:02 PM EST | Einstein@Home | <![CDATA[Finished upload of LATeah1062L09_396.0_0_0.0_5504625_2_0]]>
Mon 28 Oct 2019 06:57:02 PM EST | Einstein@Home | <![CDATA[Finished upload of LATeah1062L09_396.0_0_0.0_5504625_2_1]]>

On the latest build, all the extraneous opening and closing square brackets and CDATA stuff are gone. The event log is much cleaner now :-). Many thanks to whoever/whatever sorted all that out :-).



Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.