Posts by rvp

1) Message boards : Questions and problems : Does the new scheduler suspend jobs silently? (Message 44194) Posted 20 May 2012 by rvp_lan Post: Hi, Thanks for pointing me to your thread. Bravo for your patience in observing and noting all this! As I said in my previous post, I am trying to let the new scheduler do its job for as long as possible; after that I can decide whether all projects are running smoothly and gently together. Currently, my RAC is getting lower, because the hosts are not getting as many tasks as before. I don't care about the score, but about fair crunching for all projects. Currently, with the job reserve at 0.2d min and 0.2d res, one of my 24/24 hosts no longer receives any job for any project??? It just computes a task for Climate... I'm hoping that it's not due to the very long deadline of Climate... My good old Linux 24/24 host on 6.12.22 is a good basis for comparison, because it does receive jobs continuously... Wait and see. To be continued!
2) Message boards : Questions and problems : Does the new scheduler suspend jobs silently? (Message 44127) Posted 13 May 2012 by rvp_lan Post: Hello, There is already a lot of talk about the new scheduler. I've tried to read a lot. -- I got different hosts @ home under Windows XP 32 or 64. -- All upgraded to 7.0.27. -- I crunch for more than 20 projects. Not all of them do have work. -- The host which gives me the worst result with new version is the quad cores CPU + one nvidia GPU always up and connected. I started a thread at Einstein because I repeatedly received a special message in log from their server. http://einstein.phys.uwm.edu/forum_thread.php?id=9438&nowrap=true#117085 The not (yet) resolved (for me) conclusion of the thread is that I can't figure out what could be the most correct settings for "minimum work" and "reserve work" for a host connected 24h/24 with quad cores CPU and a nvidia GPU, for which the scheduler will behave (almost) the same way the older does (I know: not same programming, not same behavior). "Almost" the same way! In this thread, I explain that, as advised, I put debug booleans in the cc_config.xml, tried to disable GPU, re-enabled it, etc. Actually, I have the feeling that the new scheduler suspend projects when it reaches a limit. What limit? I don't understand. There was this chain of actions: -- upgrade from 7.0.25 to 7.0.27 (to follow advice of Albert@Home) -- reset ALL non active projects (those with no WU pending or crunching) -- client received to many WUs at a time on a single request for some projects -- let the scheduler do its job (don't touch anything and wait more than a week) -- After that, it's as if projects don't update themselves -- No more job asked for CPU, only NVIDIA -- I suspended job with to much WUs -- Instantaneously, other jobs start their reset (after a week!!!) and ask for job With the previous scheduler, reset orders for all (almost all) projects terminate within 15~30 mins. 
Whether there was a job to crunch or not. What I don't get here is why, because ONE project received too many jobs, it saturates and stops all the other projects? It even stops them from polling and updating (with or without asking for jobs). Einstein has always been the most regular project, from which I always received jobs. Because of this new message from their server, to test the response of Einstein (which had been denied jobs for a week), I suspended all other projects; Einstein then instantly asked for jobs and got 4 WUs, with a request for 138240 secs = 4 cores * (0.2d min + 0.2d res) * 24 * 3600. OK with that: with all the others suspended, it's the only one working, so it asks for all cores. But after that, if I resume all other projects, why do they all seem to be denied polling again? The previous scheduler was able to deal with these 20 projects on four cores, and all projects seemed to receive jobs fairly and regularly. I found at GPUGRID that the "best" values for a 24/24 connected host would be 0.2d min + 0.2d res. But obviously, the 2 hosts connected 24/24 do not behave as they did under the 6.x scheduler. Unless there's a bug in 7.0.27, what would be the best values for such a host? Or what would be the best values for a nice distribution of work between ALL projects? Thanx. Cheers.
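The arithmetic behind the 138240-second request quoted in this post can be checked with a short Python sketch. It only reproduces the formula written in the post (cores times buffer days times seconds per day); the function name and parameters are made up for illustration, and this is not how the BOINC client itself is coded.

```python
SECONDS_PER_DAY = 24 * 3600


def work_request_seconds(ncpus, min_days, extra_days):
    """Seconds of work for ncpus cores over a (min + extra) day buffer.

    Hypothetical helper: it mirrors the formula from the post,
    4 cores * (0.2d min + 0.2d res) * 24 * 3600, nothing more.
    """
    buffer_days = min_days + extra_days
    # round() absorbs floating-point noise from fractional day values
    return round(ncpus * buffer_days * SECONDS_PER_DAY)


print(work_request_seconds(4, 0.2, 0.2))  # prints 138240
```

With the buffer values from the post, a quad-core host asking for a full buffer on every core lands on the same figure that appeared in the log.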
3) Message boards : BOINC client : massive work fetch bug in 7.0.25 (Message 44075) Posted 10 May 2012 by rvp_lan Post: I do know that his has been working on a new version of the website for about 18 months and the new site is expected to go live around the end of this month Thx for this info. So I'll wait til further updates of Bam.
4) Message boards : BOINC client : massive work fetch bug in 7.0.25 (Message 43965) Posted 4 May 2012 by rvp_lan Post: The minimum work buffer setting sets the minimum amount of work you're going to request. The maximum additional work buffer sets the additional days worth of work you want to have. Not quite. The meaning of the prefs is: - The client requests work for a given resource when the amount of buffered work falls below min. - It requests (from the highest-priority project) enough work to bring the amount up to min + additional. Nice, thx for this. Now I (guess I) understand correctly what it does. It is not: min < work to do < max, but: min < work to do < min + max (additional). So may I suggest that the international versions of the client be improved to better reflect this. In the French client, these settings have been translated as "minimum reserve work" and "maximum reserve work". Following these wrong translations, I initially set 1 day for min and 5 days for max, but it should be 5 min and 0.1 additional. Is that right? If yes, the correct labels should be: "minimum reserve work" and "additional reserve work". Likewise, there may be a mistranslation in the Bam prefs settings, because I cannot find these new ones there. I guess that the Bam prefs labels haven't been updated to follow the new 7.x version. Anyway, should I put my 5 and 0.1 values into: "Connect to network about every" 5 days, "Maintain enough work for an additional" 0.1 day? Correct? Regards
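The rule quoted in this post (ask for work only when the buffer falls below min, then fill up to min + additional) can be written down as a tiny Python sketch. This is a simplified model of the two sentences quoted above, for one resource and one project; the function name is an assumption, and the real client considers much more (project priority, per-resource queues, deadlines).

```python
def work_to_request(buffered_days, min_days, additional_days):
    """Days of work to request under the rule quoted in the post.

    Hypothetical helper, not BOINC code:
    - nothing is requested while the buffer is at or above min;
    - otherwise enough is requested to reach min + additional.
    """
    if buffered_days >= min_days:
        return 0.0
    return (min_days + additional_days) - buffered_days


# Buffer of half a day, min 1 day, additional 0.25 day
print(work_to_request(0.5, 1.0, 0.25))  # prints 0.75
```

So the buffer moves between min and min + additional, which is why a label like "maximum reserve work" is misleading for the second setting.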
5) Message boards : Questions and problems : Windows client: File permissions and UNC path (Message 42866) Posted 5 Mar 2012 by rvp_lan Post: Thx for the tip. Actually, only the data part is to be located on the network share. The Boinc binaries stay local to the computer. In the installer, you can choose these two locations separately.
6) Message boards : Questions and problems : Windows client: File permissions and UNC path (Message 42852) Posted 4 Mar 2012 by rvp_lan Post: Boinc versions: 6.12.34 and test with new 7.0.18 OS: Windows XP 64bits SP2 Hi there, I've tried to uninstall and reinstall Boinc client and manager in order to have my data on a shared network folder (Linux Samba share with full writable browseable permissions for everyone). Between the uninstall/reinstall, I deleted all users and groups created by Boinc previous installation. Also set files permissions of Boinc's data folder for "full control" to "everyone" and removed all other users/groups. With new install, it seems that installer tries to set permissions on the data folder, EVEN if it is set with: UNSELECT "protected application execution"; SELECT "ALLOW all users to control". Error message from dialog box and event viewer is: Product: BOINC -- Error 1926.Could not set file security for file \\server\boinc\. Error: 5. Verify that you have sufficient privileges to modify the security permissions for this file. I've also tried to connect the remote share to a drive letter B:, doesn't work either. Error message from dialog box and event viewer is: Product: BOINC -- Error 1926.Could not set file security for file B:\. Error: 3. Verify that you have sufficient privileges to modify the security permissions for this file. So, I've tried with local data, but still with NO protected mode and ALLOW all users control: installer still create boinc_users, boinc_masters groups, and RESET permissions on folder data (everyone is dismissed)???? Isn't it inconsistent with: allow ALL users to manage? Then I created an unpriviledge user to check: Once logged, trying to launch boinc or boincmanager, it ends with this error: You are not allowed to manage Boinc. Ask your administrator to add you in the boinc_users group. So, it seems to me inconsistent with "allow ALL users to manage Boinc". 
If I check this first in the installer, I don't want to spend time afterwards adding users to a group... I fully understand that when Boinc runs in service mode, we need special privileges to stop/start the service, but what about when Boinc is run in standard executable mode? Two questions: 1) Whatever the options, are UNC paths supported for Boinc's data? (like a Linux mounted drive, in my mind) 2) If we don't want to run Boinc in protected mode (through the special boinc user), why does the installer still create the boinc groups and set filesystem permissions on Boinc's data? Regards Seems related to this post: http://boinc.berkeley.edu/dev/forum_thread.php?id=7344&nowrap=true#42830
7) Message boards : BOINC client : Being able to switch off high priority feature of BOINC (Message 41751) Posted 23 Dec 2011 by rvp_lan Post: What you do forget here is that there are scientists behind all the projects who expect results back. I mostly crunch for biological projects!!! So I well understand this. Malariacontrol has a deadline of 3 days. Not much and it may make BOINC go into high priority even when you've been attached to that project for more than a year Crunch for it too since long, it does not go into high priority for me. What is it with the overwhelming need to tell BOINC what to work on next? Huh... No... I don't want to tell Boinc what to do!!! I want it to be fair. I think after the two previous posts, my point was clear: I don't want to interfere, except when I suspect something wrong. Why are you running BOINC, to help the science or to gain all those credits? Does it then matter where you get them from? I could have asked you the same question when you argueing around a technical point of view, rather than admitting that some projects doesn't set their deadlines correctly. Which is a fact, not a conspiracy theory. As you asked, I'm crunching quietly but surely since years, I've been quite coherent with my different posts in different boinc's forums to assure you that credits doesn't matter, science does, and particularly projects for fighting disease or biological research. I think you may have sense this through my previous words. Is someone forcing you to stay with the project(s) that in your eyes do things so damn wrong? Nope!!! It's exactly what I concluded in previous post!!! It's set to a connect to of 0.1 days and cache of 0.5 days. Furthermore, BOINC only runs between the hours of 9pm and 7am when electricity is cheap (5pm and 7am in weekends). It will suspend all work when I start games Glad for this, we have the same parameters! And the same behavior in scheduling time for electricty and games. 
Except my home servers which run 24/7. I have yet to see any project's tasks run in high-priority. Have you give EoN a try? Have you played with Volpex? I think I was clear enough by precising that there was two kind of high priority projects: newbies and dummies. Again, as previously said: I give time to newbies, I quit the dummies! But, but, but... You are forgetting that you have set project resource shares. Yes, Lattice is getting extra time now but eventually BOINC will not download more Lattice tasks and then your other projects will be paid back the time Lattice took from them. Therefore it doesn't matter if Lattice gets extra time now. What matters is what happens over the longterm and over the longterm your project shares will be honored. Ok... Both Dagorath and Ageless made your point: I though that I was precisely considering the long term, because for me the goal is that the project get all its WUs done (science), not that my clients have their WUs done (credits). In this way of thinking, going into high priority wasn't needed because if the client won't finish a WU in time, the project can remove it and re-assigne it to another client. So, if you both assure me/us that the scheduler is well balancing shares, ressources and downloads, even after a run in high priority, and that there is no stealed or spoiled time possible. I will monitor my clients with a different point of view. Regards
8) Message boards : BOINC client : Being able to switch off high priority feature of BOINC (Message 41746) Posted 22 Dec 2011 by rvp_lan Post: If you do not look in BOINC all the time, then how do you know you get high-priority work all the time (as you claim to have in that same post)? Sorry for the lack of precision in the use of the terms. I should rather have written: I see SOME WUs or some project WUs going in high-priority mode ALL THE TIME. It's not ALL the WUs, on ALL clients, ALL THE TIME. The assertion is correct: I do not look at my clients all the time, but when I do, on the 6 hosts, there's always at least one WU in high priority. The Lattice one I mentionned is still under computation in high-priority mode... My scheduler is set to swap projects every hour. The high-priority mode overload the swapping. This WU has taken time of other projects... In my point of view, it's wrong. That's all. What's your response for Lattice when Einstein or QMC, for the same amount of time to crunch (~29h) give one week more?!!! They, do never go into high-priority... When you see BOINC run your work in high-priority all the time, either: - you're seen as a high-value user with very much trusted computers by the project who send you work with a very short deadline; - or you've got too large a cache for all the projects you're attached to; Some projects seem "being subscribed" to the high-priority mode... If I refer to what you already pointed out to me many times in previous discussions: The WUs I see in high-priority aren't (seem not be) related to a cache problem... (If I well understand what you explain!!! ;-) ) They are (seem to be) related to too short deadlines and only a few projects produce that. - or you just added a project that BOINC hasn't a clue how long the tasks run for, whose estimate is way way way off and for which BOINC has to run some tasks to get a feel of how long these tasks run. 
Give time to a new project so that stats between the server and the scheduler are stabilized, of course! But if after 1 year, on different computers, a project continues to send ridiculously short deadlines, which in turn send WUs into high priority mode, I think it's also reasonable not to conclude by asking only TO THE USER to do something! My long-term vision of this is to verify what happens when you do not touch anything! From this strict point of view, you can not always return to: check your cache of work etc. I insist: ON LONG TERM, some projects never go into high-priority mode. I have the same cache and scheduler parameters since 2007... Projects which tend to produce high-prioriy WUs are: newbies and dummies... You could not conclude that this is ONLY a matter of settings at client/user side. It's precisely what I'm trying to point out: when healthy projects have stabilized the client performances, when healthy deadlines are calculated, there's no sign of high-priority on my clients... - or you've got too much work from one project whose deadline is very short; Indeed... So what's your response if the project administrators do not care with their ridiculously short deadlines? While other administrators do it well. So, high priority is inherent to a fine scheduler. In no survey mode, to avoid projects with abusive deadlines which spoil time, the only solution is: do not sign in with this project. Ok. Regards
9) Message boards : BOINC client : Being able to switch off high priority feature of BOINC (Message 41744) Posted 22 Dec 2011 by rvp_lan Post: I think in theory the high priority function can suspend tasks and switch to others resulting that none of the tasks are finished before the deadline. If that ever happens then you have valid reason to complain. At that point the suggestion will be to decrease your cache. In all the years I have being using BOINC I have never had a task miss a deadline. There are 2 reasons for that Hello happy boincers and electricity taxpayers! (in case you would not have noticed the two go hand in hand) I come in peace, so do not misinterpret my words if they seem staggered or irrelevant. Thus, english isn't my native language, so I hope to not make too much mistranslation... I have carefully read the various responses and, as often, a convenient point raised by a user becomes a discussion of gurus level 2345, practicing the config file tuning! As usual, in most discussions I have had in the forums, I am committed to (wrongly?) have the perspective of the non-"boinc user". One that uses the client because his neighbor told him it was "good". This person does not understand the computer; does not understand science projects for which it will put a computer available; ultimately, the boinc screen saver is "pretty". I'm sure you see what I mean. It is not derogatory or condescending, this is the case of many users. So, I, who understand (almost all) what you explain, I'm still not agree with your expert explanations, which lead to dive into the configuration file. I agree with Iconized because me neither I do not understand why there is a high-priority mode. Those who have never seen a wu going into high-priority mode on their client, I think it's because you have previously set up your configuration file and you have previously signed for projects that you know that they have a certain maturity and stability. 
Again, what you argue may be right, but not for the normal user. The normal user does not modify its configuration file, the normal user will sign for simple projects which are "fun", not knowing at all what means "long-term debt" or "amount of work cache"... So the normal user will sign for completely different project, with completely different WU duration, WU quantity or project maturity. Still, the normal user, because he has a today machine, will have a multi-cores processor and a correct memory amount. Nevertheless, should the brilliant boinc's scheduler let plunder the resources made available? I don't think so... I myself have 3 computers that work 24/7 and 3 others working in office hours. I GET ALL THE TIME HIGH PRIORITY UNITS! On the two types of computers. So come and tell that it will never happens when you know how to set your work cache is swollen! No offense. I have great respect for the programmer of boinc's scheduler. I have much less respect for administrators of some projects, which I think, understood very well how one could turn to its advantage the stats and calculations made by the scheduler to switch to high priority. Basically, look at the EON project, it has been 2 years that users are asking for longer deadlines, administrators do not care... I stopped to calculate for EON, the WU were ALWAYS in high priority... Just now, I receive a WU for Lattice which goes immediately in high priority: 29h45 to crunch before 29/12... This deadline is nonsense. It could make sense, if I only crunch for Lattice, which it's definitively not the case. The respectable projects, those that work since a looooong time (Climate, Einstein, QMC, Docking, Enigma, etc) do not behave like that. Even if they have evolved, offered different types of calculation, different binaries, their WUs always come on time, correctly, regardless of the computer on which they run. And I think it's (almost) just because they know how to prepare a deadline!!! 
So you can not think seriously about "how to fine tuning your configuration file" and completely ignore that there are bad project managers. Some are even unfair, since they (seems to) use the subtleties of the scheduler to their advantage. I'm not in "conspiracy theory" mode, I am in a logic of common sense and a rational use of resources made available by users. I am convinced that there is also a "blinkers effect" (not sure of the term) because administrators do their computation tests on powerful computers with only their own project! Regardless that most users are involved in several projects on much less powerfull computers. So as a conclusion: as a simple user, but an aware one, who have a look to its boinc clients from time to time, without the nose on it all day, who don't want to modify the config file, I would like to have a switch/option in menus to prevent high-priority mode. If WU does not finish on time on my computers, it is because my computers are not powerful enough or not working hard enough (schedule time) and, hey: too bad for me and my stats. OR it's because this WU was not well prepared, rightly or wrongly, and, hey: so much for this (bad) project. But in both case, there is no need of a high priority mode, which can lead to spoil ressources for other projects. Happy Christmas to all.
10) Message boards : Questions and problems : BOINC 6.10.43/6.10.44 no longer released for public (Message 32146) Posted 12 Apr 2010 by rvp_lan Post: Hi Ageless, as always: precious and pertinent arguments delivered. Please, think outside your own box. It's built in for ... people who were complaining that despite BOINC's applications running on the lowest possible priority, it was taking up CPU cycles that would slow down their computer. Indeed, this is what some people objected to me when I tried (years ago) to deploy CNET across the whole computer fleet! And now, outside of my box, I get the point. Now that I see the potential of this parameter, I still think that it needs refinement. I insist: when just ONE core is heavily used, the others aren't always used too. Boinc could stop cores progressively, by monitoring whether there is still heavy CPU usage after/during a given time. When the CPU stays at 25% for more than 2 minutes on a multi-core machine, it could be 'nice' (arf) to stop one core for Boinc, but not ALL cores at the same time. We lose valuable cycles available on the other cores. This is particularly true under Unix kernels, where the load is finely distributed. You can use the <exclusive_app> and <exclusive_gpu_app> functions of BOINC for suspending BOINC when it detects any of the games entering Windows memory. More interesting parameters that I didn't know about. But it has always been my point to stay a "simple" user of Boinc! By not becoming a tuning "guru", I try to keep a "simple" vision of what Boinc should do to run quietly without disturbing the user. Regards
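The "stop cores progressively" idea from this post could look roughly like the Python sketch below. It is purely illustrative: the function, its parameters and the 12-sample window (for example 12 samples taken 10 seconds apart, about 2 minutes) are assumptions made here, and nothing like this is claimed to exist in the BOINC client.

```python
import math


def cores_for_boinc(ncpus, usage_samples, min_samples=12):
    """Number of cores a client could keep using, under this sketch.

    usage_samples: recent non-BOINC CPU usage, as fractions (0.0 to 1.0)
    of the whole machine, oldest first. Hypothetical input; how it is
    measured is left out.
    """
    if len(usage_samples) < min_samples:
        return ncpus  # not enough history yet: give nothing up
    window = usage_samples[-min_samples:]
    sustained = min(window)  # load that held for the whole window
    # Small epsilon so that 0.25 * 4 counts as exactly one busy core
    busy_cores = math.ceil(sustained * ncpus - 1e-9)
    return max(0, ncpus - busy_cores)


# Quad core with 25% non-BOINC load held over the whole window
print(cores_for_boinc(4, [0.25] * 12))  # prints 3
```

A short burst of load leaves all cores to the client, while a sustained load on one core gives up only that core instead of suspending everything.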
11) Message boards : Questions and problems : BOINC 6.10.43/6.10.44 no longer released for public (Message 32102) Posted 11 Apr 2010 by rvp_lan Post: Hi, Thanks for telling me where I needed to make changes for the suspend computation, since that part seems to be awfully twitchy. Same here... At first, I thought it could be an interesting tweak, but on multi-core machines, when doing something "usual" with a browser and boring non-optimized Flash, just ONE core is heavily used; the others aren't. So there are plenty of other cores for Boinc to play with. So I just changed that value to 0. Works just like it used to now. All the same here! I don't get it anyway. Since Boinc is tied to the idle CPU time, why the need for this additional parameter? Multi-core or not, if my system needs CPU, Boinc just cools down naturally and progressively, following the idle time available. So I would rather have put this parameter on the GPU. My kids are complaining about GPU WUs which interfere with their GPU-hungry games! But most of the time, the CPU cores aren't very busy. So, on modern multi-core machines, Boinc always has a bit of resources available to crunch. Curious to see how all of this will evolve when everything is OpenCL compliant!!! I hope you will not transform us, end users, into gurus of parameter tweaking...
12) Message boards : BOINC client : What's the trick ?? Boincmgr handling GPU AND CPU WU's (Message 30267) Posted 19 Dec 2009 by rvp_lan Post: Hello Ageless, Thx for the clarification. The way that debt is calculated changed between BOINC 5 and BOINC 6.6, so you would have to reset all projects after you update to 6.6 to get a more correct view on things. I'm aware of that. Numerous threads deal with (almost) the same problem, and I read many of them before posting myself. But it doesn't work... This has to be more complicated than that. On the Windows host which I updated from 6.6.x to 6.10.x, I already performed the zero_debts trick you gave me earlier, AND reset some projects. Still, Malaria for example keeps coming back with this "Message from server: CPU app exists for malariacontrol.net but no CPU work requested". I'll let this host finish the heavy Aqua WU it has to do. Then I will consider re-installing it from scratch with 6.10.24, meaning erasing everything prior to the new install.
13) Message boards : BOINC client : What's the trick ?? Boincmgr handling GPU AND CPU WU's (Message 30263) Posted 19 Dec 2009 by rvp_lan Post: I don't know if the following example could lead to something but: I was able to retrieve and re-install a 5.10.45 x86_64 client. Once done, everything went fine because this client has resetted the CT/LT balances to acceptable values. Each project was now requesting an amount of seconds of work AND GET JOB! This client don't handle GPU, no matter. Now let's install the 6.6.38 x86_64 client over the 5.10.45. It starts to ask for GPU job for each project, but no more CPU job is asked. After 24h of working, no more job AT ALL is downloaded, only the remaining tasks previously downloaded compute with alternance. Let's install the 6.10.24 x86_64 last client over the 6.6.38. It doesn't change anything: no more job asked, no more job got. Let's erase all client binaries and config files, but always keep projects file. Let's install again the 5.10.45 x86_64. Same pattern: Each project is now requesting an amount of seconds of work AND GET JOB! (Edges, Malaria, Ibercivis, WCG, etc, have always some jobs to compute) Let's install the 6.10.24 over the 5.10.45. After 24h, the client is still asking and getting jobs. May be, I step too fast into conclusion, but it seems that there is something really wrong in the 6.6.x trunk which corrupt the config files. The next client generation being not able to correct this and follows the previous config files corruption. This is corroborated on the others Windows hosts. Too of them were upgraded directly from 5.10.x to 6.10.24, these ones ask and receive jobs. The another one from 6.6.36 to 6.10.24, this last one only ask for GPU jobs. In order to be very precise: only one of my windows hosts is running 24/24h. The three other ones describe above are shutdown many times a day. 
I don't know if this is relevant, but I would guess that the scheduler could be more "in pain" running 24/24h than running a few hours and being reset by a shutdown. The exception which proves the rule: the 6.6.36 Linux (Mandriva) client has no scheduler problem at all and receives jobs fine 24/24h 7/7d, but it has no CUDA/ATI graphics controller. To be continued!
14) Message boards : BOINC client : What's the trick ?? Boincmgr handling GPU AND CPU WU's (Message 29486) Posted 16 Dec 2009 by rvp_lan Post: Following this message, that is, uninstalling Boinc completely and emptying the data and config files, the client is now functioning as it should: "requesting new tasks for CPU and GPU". When the client is not following the user's preferences (BAM or local), there is definitely something wrong somewhere in: cc_config.xml, global_prefs.xml, client_state.xml, etc. Whatever you do with zero_debts, resetting prefs, going back to an older version, etc., somewhere in the config files something takes precedence over the preferences; this is a bug.
15) Message boards : BOINC client : What's the trick ?? Boincmgr handling GPU AND CPU WU's (Message 29470) Posted 15 Dec 2009 by rvp_lan Post: Hi, I've started a thread about multi-threaded and single-threaded wu, but it is now going into a scheduler problem between CPU and GPU job. I've also read some other threads talking about this to get some clues. I completely agree with Seigell when he/she (!) notices that it's now (since 6.10.x) heavy duty to control Boinc client so that it behaves as the preferences are set. I'm fore fully agree also with the term "babysitting" Boinc. In addition, it seems that we could (should?) put our hands in the file cc_config to have better control, but this may interfere with the (BAM) preferences... In the meantime, be assured that current BOINC clients will indeed keep all your CPU cores busy No. It doesn't... Now each minute, the 6.10.18/24 client send a "Sending scheduler request: to fetch work" / "Not reporting or requesting task". But in fact, it never ask for new job... Ageless has already explain the trick with the zero_debts option to clear things, but my client doesn't ask for job anymore, whatever I put aside the cc_config file or the global prefs file. I'm in no position to argue at a technical level. Because I don't want to become a Boinc guru level #32768 to manage my hosts... For years now, I've played the grinchy old user in forums when I face project which ask me (or force me) to become alternately a guru of computer or a project babysitter... None of this is welcome. Boinc was and should remain a simple tool to use, even if what's underneath becomes heavily complex. But as it has been said before in the forum, many users don't know a clue of computer's inside and don't want to play with that. They just want to participate in something brillant for mankind or science. Remember that SETI started this as a "funny" screensaver!!! 
;-) My perfect wish of a perfect use is that once I've been warned that I should set my work preferences ONE time at my first subscribing to Boinc network, my wait is that all projects follow these preferences without babysitt ALL my hosts/clients. If I set that I'm agree to participate in both CPU and GPU projects, I shouldn't have to monitor if I'm receiving well GPU WU or CPU ones. No more should I monitor the client when it receives WU multi-threaded or high priority, and become "anxious" at the next rotation of WUs to see if all cores are used... I'm not angry. I could be patient and very regarding about the heavy and nice job done by Boinc's dev teams and waiting for the next version. Until there, just now, what simple move could I perform to have a "x86_64bits quad cores CPU and CUDA compliant GPU" client asking, receiving and computing 24h/24 7d/7 on ALL available cores WITHOUT rummage into the config files, WITHOUT babysitt my debts and especially WITHOUT going into each project preferences to choose CPU/GPU?! Regards
16) Message boards : BOINC client : Scheduler: alternance between multi-thread and single thread WU (Message 29469) Posted 15 Dec 2009 by rvp_lan Post: Hi, I finally let more than one day pass to observe how 6.10.24 behaves. I installed it on the other Windows boxes I have at home. They have all been monitored through BoincView for years, but only my box has a cc_config.xml??? After four days of running, the multi-threaded Aqua WU has finished. I have no more multi-threaded WUs, nor any high-priority ones. The Boinc client uses only one CPU, but this is because it has only one task to compute!!! And above all, the client does not ask for jobs anymore?!?!? No CPU job, no GPU job... Even if I restart the client, it still doesn't ask for or receive jobs. Does it have something to do with the trick you pointed out with the zero_debts option? All my balance CT values are at more than 60000 secs!!!! How should I clear this? I presume it's not realistic at all. On another box, all balance CT counters are at -1800 secs??? The only box which asks for, receives and computes jobs is the Linux one with the 6.6.36 client. On this one, balance CT counters are normally spread between -500 and 500 secs, and most projects are at zero. I tried putting the cc_config.xml file aside and then restarting the client, in case it would behave in a more "standard" (default) way, but the client doesn't ask for jobs either. Maybe ALL my boxes are "overworked", but this is the first time I see such behavior generalized across all boxes??? Normally, among the twenty projects in which I participate, there is always at least one project that has something to compute, or one of my boxes that isn't overworked and can compute... I do not understand this... I will try to go back to version 6.10.18, maybe 6.6.38, and see what happens. Thanx for any clue. Regards
17) Message boards : BOINC client : Scheduler: dead and orphan processes (Message 29342) Posted 10 Dec 2009 by rvp_lan Post: Hi, "You'll have to contact the projects about this behaviour." OK. "Your plan on using boinc_project is a no-go, as that userID is only used when BOINC is installed as a service." OK, got it. "Since you are using a service installation, you could write a batch file that closes BOINC every so often" I understand the principle, but I'm a firm advocate of zero interaction with the BOINC client!!! I mean: I agree that this could be a bug on the project side, but if BOINC is to spread around the universe (!), it has to offer some tools so that the ordinary user keeps all of his computer's resources available. In this particular scenario, the ordinary user cannot be expected to understand anything about services, nor to write a batch file launched from a crontab!!! Think of my garbage collector proposal as the idea of a polite BOINC client that kindly makes sure the place it found on arrival is still clean when it leaves! First of all, don't take my proposal and the above as criticism! I'm sure there are already a lot of control mechanisms in BOINC that I don't "see" for lack of knowledge. I quite understand [as a former developer] that the BOINC client dev team has other things to do than guard against every kind of crap that could arrive from the project side. But... but... In my opinion, since BOINC should be considered an entity that manages programs that are potentially harmful (to the host computer), it has to provide mechanisms that keep these (bad) programs in a kind of "virtual space" that is completely "watertight" with respect to the real system space. After all, the BOINC client is a kind of system within the system, which handles and shares resources such as CPU, GPU and memory. These days, in my mind, BOINC is a virtual machine like VirtualBox or VMware.
I am all the more concerned about the point above when I see that some projects adopt the Russian-dolls idea themselves by letting other projects compute through them (Yoyo, Docking, Ibercivis, WCG) [a wrapper in a wrapper in a wrapper, etc. Not to mention those that use libraries or dev tools that interfere with hosted applications: Hydrogen with Cygwin...]. We therefore have virtualization within the virtualization of the host system! So if BOINC has no strong control over process inheritance and leftover processes, a large number of WUs can potentially escape the supervision of the manager and scheduler, even if BOINC doesn't run in service mode. Thanks for the info about the user ID. Regards
18) Message boards : BOINC client : Scheduler: alternance between multi-thread and single thread WU (Message 29340) Posted 10 Dec 2009 by rvp_lan Post: Hi, Having updated to 6.10.24, I added the <zero_debts> line to cc_config.xml, since the file already existed. [We did have a thread about this file and <ncpus>-1</ncpus>, which was presumably mis-written by BoincView and interferes with the number of GPUs/CPUs available.] First restart, with <zero_debts>1</zero_debts>: same behavior, only the high-priority WU starts, on ONE core. Second restart, with <zero_debts>0</zero_debts>: seems OK, since all cores are used. I will report back after a day, to see what happens after a few rotations between single- and multi-threaded WUs. Thanks a lot for your prompt answer! Regards
19) Message boards : BOINC client : Scheduler: dead and orphan processes (Message 29335) Posted 10 Dec 2009 by rvp_lan Post: Platform: Win XP64 SP2, quad-core CPU, CUDA-capable GPU. Hello, Even with a boinc_project user to confine the activity of BOINC projects, some WUs (Simap, Malaria, etc.) sometimes leave their processes in Task Manager once finished. I can't link these orphan processes to a computation error or an abnormal termination. The WUs seem to finish and report fine. When I stop the client, manually kill the orphan processes and then restart the client, no error appears. So I'm quite sure that these are orphan processes and not processes that decide to stay in memory when they should not. But these orphan processes, stuck in Task Manager, sometimes lead to excessive memory consumption... Obviously, this is annoying because my preferences are set to "do NOT leave processes in memory". So if these are dead orphan processes, and since they are tagged with the "boinc_project" user ID, could we have a control mechanism that checks for unattached processes (no longer scheduled) and force-kills them? A kind of "garbage collector"! Regards
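A minimal sketch of the "garbage collector" check proposed above, assuming we already have a snapshot of the process table. This is not how the BOINC client works: the `Proc` record, the `find_orphans` function, the PIDs and the `boinc_master` account in the sample are all made up for illustration, and a real version would have to read the process list from the operating system and do the actual kill.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proc:
    """One row of a (hypothetical) process-table snapshot."""
    pid: int
    ppid: int
    user: str
    name: str


def find_orphans(procs, client_pid, project_user="boinc_project"):
    """Return processes owned by project_user that do not descend from the client."""
    parent_of = {p.pid: p.ppid for p in procs}

    def descends_from_client(pid):
        seen = set()  # guards against cycles in a corrupt snapshot
        while pid in parent_of and pid not in seen:
            if pid == client_pid:
                return True
            seen.add(pid)
            pid = parent_of[pid]
        # Also covers the case where the client itself is not in the snapshot.
        return pid == client_pid

    return [
        p for p in procs
        if p.user == project_user and not descends_from_client(p.pid)
    ]


# Illustrative snapshot: the client (pid 100), a task it started, a child of
# that task, a leftover task whose parent is gone, and an unrelated process.
sample = [
    Proc(100, 1, "boinc_master", "boinc.exe"),
    Proc(200, 100, "boinc_project", "simap.exe"),
    Proc(201, 200, "boinc_project", "simap_worker.exe"),
    Proc(300, 999, "boinc_project", "malaria.exe"),
    Proc(400, 1, "alice", "notepad.exe"),
]

orphans = find_orphans(sample, client_pid=100)
print([p.pid for p in orphans])  # [300]
```

Only the leftover process is reported. The wrapper-in-a-wrapper case stays safe because the check walks the whole parent chain instead of looking at the direct parent only.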
20) Message boards : BOINC client : Scheduler: alternance between multi-thread and single thread WU (Message 29334) Posted 10 Dec 2009 by rvp_lan Post: Platform: Win XP64 SP2, quad-core CPU, CUDA-capable GPU. Hello, Since I installed client 6.10.18, I have noticed problems with the scheduler. When the scheduler handles single-threaded WUs, there is almost no problem. When the scheduler has to handle multi-threaded WUs (i.e. AQUA), after some alternation between single- and multi-threaded WUs, only ONE core is running for ALL projects. To be more precise:
Client start: uses any WU available, on ALL cores.
Rotation: multi-threaded WU's turn => one WU on all cores.
Rotation: uses any WU available, on ALL cores.
Rotation: multi-threaded WU's turn => one WU on all cores.
.....
Rotation: uses any WU available, on ONE core.
Rotation: multi-threaded WU's turn => one WU on all cores.
Rotation: uses any WU available, on ONE core.
This gets worse when a WU arrives in high-priority mode. In that case, only the high-priority WU runs, even if other WUs have near or missed deadlines and other cores are available... Right now Yoyo/Evolution@Home has sent a WU with a deadline of 4 January 2010, in high priority!!! All other projects are squeezed out... Unacceptable... Especially on a multi-core machine with a GPU... I mean: Evolution@Home may have miscalibrated its deadline, and that is bad. But the scheduler, having three other cores to play with, should use them. If I restart the client (running as a service), the scheduler still picks the high-priority single-threaded WU first and doesn't start anything on the other cores. If I suspend that task, it is the multi-threaded AQUA WU that starts, even if there are other WUs closer to their deadlines. I have to suspend both the high-priority (Evolution) and multi-threaded (AQUA) WUs, then restart the client, to get back to correct multi-core behavior. Then I resume the suspended tasks.
Then, after some hours, it falls back into the rotation bug: all cores / one core. I read that some issues with high-priority mode are being fixed in the upcoming 6.10.2x releases. Does this case fall under those fixes? Regards
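To make the expected behavior concrete, here is a toy sketch of what "use the three other cores" would mean. It is only an illustration of the wish expressed above, not the BOINC scheduler's actual algorithm: the `Task` record, the `pick_tasks` function and all deadline values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """A work unit as seen by this toy scheduler (all fields hypothetical)."""
    name: str
    threads: int            # number of cores the task needs
    deadline_days: float    # days left until the deadline
    high_priority: bool = False


def pick_tasks(tasks, ncores):
    """Greedy fill: high-priority tasks first, then earliest deadline,
    skipping any task that no longer fits in the remaining cores."""
    order = sorted(tasks, key=lambda t: (not t.high_priority, t.deadline_days))
    running, free = [], ncores
    for task in order:
        if task.threads <= free:
            running.append(task)
            free -= task.threads
    return running


# Made-up workload on a quad-core host: one high-priority single-threaded WU,
# one 4-thread WU, and three ordinary single-threaded WUs.
tasks = [
    Task("Evolution", threads=1, deadline_days=25, high_priority=True),
    Task("AQUA", threads=4, deadline_days=3),
    Task("Simap", threads=1, deadline_days=2),
    Task("Malaria", threads=1, deadline_days=5),
    Task("Docking", threads=1, deadline_days=7),
]

for task in pick_tasks(tasks, ncores=4):
    print(task.name, task.threads)
```

With this workload the high-priority WU takes one core and the three remaining cores go to the single-threaded WUs with the nearest deadlines. The 4-thread WU waits for a rotation in which all four cores are free.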


Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
