Posts by Mippi

1) Message boards : Questions and problems : Please, help - too many downloaded jobs/too many jobs in progress (Message 98693)
Posted 20 May 2020 by Mippi
Post:
Hi Jord,
thanks very much for the excellent answer. It is greatly appreciated, very useful for me, and it works :D

If you do not mind, I would like to ask one more question, but I promise it is the last one :)
I have a computer with 8 cores but only 6 GB of RAM, and I cannot upgrade it. Unfortunately, that is not enough for Rosetta, so usually 4-5 cores work and the rest wait. I would like to run some less memory-hungry projects to use all the cores, but the second project sometimes takes more cores than it should. My questions are:
1. Is there any option to prioritise projects, so that the first project takes as many cores as it can and the second project runs its jobs only on the cores that are left over? I tried adjusting the share ratio, but it did not work as I wanted.
2. How can I assign a fixed number of cores to each project? For example, I would like 4 cores for Rosetta (even if Rosetta can sometimes run just 3 processes due to the RAM limitation), 3 cores for nanoHUB and 1 core for NumberFields. It does not matter whether a project has jobs or not; the cores should stay firmly assigned to it.

Thanks in advance for your help.
2) Message boards : Questions and problems : Please, help - too many downloaded jobs/too many jobs in progress (Message 98660)
Posted 19 May 2020 by Mippi
Post:
Thanks in advance; I look forward to your suggestions.
Just to be clear, I would like to use both cards: the weaker one for a less demanding project and the better one for a more demanding project. I would also like to be sure that jobs from each project are directed to the correct card.
3) Message boards : Questions and problems : Please, help - too many downloaded jobs/too many jobs in progress (Message 98626)
Posted 19 May 2020 by Mippi
Post:
Hi Jord,
I have one more question for you: is there any way to assign a project to a GPU? I have two cards in each station; one is very old and slow and the other is much faster. They are called Device 0 and Device 1 in my log. At the moment I can see that jobs are directed to the GPUs at random. Unfortunately, some projects do not need many resources and some need a lot, so I would like to assign the more demanding projects to the more powerful card.

Is there any method to do that?

Thanks in advance for your help!
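
One option often suggested for this is `<exclude_gpu>` in the client's cc_config.xml: each entry tells the client not to use a given device for a given project. The Python sketch below only renders such a file as an illustration. The two project URLs are hypothetical placeholders, and the choice of Device 0 as the fast card and Device 1 as the slow one is an assumption that has to be checked against the event log.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical placeholders: replace with the real project URLs.
# Assumption: Device 0 is the fast card and Device 1 is the slow one.
EXCLUSIONS = [
    ("https://demanding-project.example/", 1),  # keep the demanding project off the slow card
    ("https://light-project.example/", 0),      # keep the light project off the fast card
]

def render_cc_config(exclusions):
    """Return cc_config.xml text with one <exclude_gpu> entry per (url, device) pair."""
    lines = ["<cc_config>", "  <options>"]
    for url, device_num in exclusions:
        lines += [
            "    <exclude_gpu>",
            f"      <url>{escape(url)}</url>",
            f"      <device_num>{int(device_num)}</device_num>",
            "    </exclude_gpu>",
        ]
    lines += ["  </options>", "</cc_config>"]
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    text = render_cc_config(EXCLUSIONS)
    ET.fromstring(text)  # sanity check: the output is well-formed XML
    print(text)
```

The rendered text would go into cc_config.xml in the BOINC data directory. The client can be asked to reread it with `boinccmd --read_cc_config`, although a client restart may be needed before GPU exclusions take effect.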
4) Message boards : Questions and problems : Please, help - too many downloaded jobs/too many jobs in progress (Message 98625)
Posted 19 May 2020 by Mippi
Post:
Thanks again. I will install a newer kernel and a newer BOINC next week, which should solve a lot of problems.
5) Message boards : Questions and problems : Please, help - too many downloaded jobs/too many jobs in progress (Message 98624)
Posted 19 May 2020 by Mippi
Post:
Hi ProDigit,
If you read my e-mail carefully, you will see that I set the processing time to 29h just to reduce network use, so you need to divide your calculations by 3.5 to see how many jobs I can complete.

Anyway, I think I solved the problem, so thanks for everyone's help and input.
6) Message boards : Questions and problems : Please, help - too many downloaded jobs/too many jobs in progress (Message 98597)
Posted 17 May 2020 by Mippi
Post:
Thanks very much indeed, it is clear now :)
7) Message boards : Questions and problems : Please, help - too many downloaded jobs/too many jobs in progress (Message 98593)
Posted 17 May 2020 by Mippi
Post:
Hi Richard,
thanks for your reply. I will test it tomorrow; I did not know about the options you mentioned.

Thanks!
8) Message boards : Questions and problems : Please, help - too many downloaded jobs/too many jobs in progress (Message 98592)
Posted 17 May 2020 by Mippi
Post:
Hi MarkJ,
thanks very much for your reply.


If you want to stick with the CLI you can use BOINCtui to visually see what they are doing, one machine at a time.

With multiple machines it's best to use BOINCtasks on one machine that you trust (usually the one you use to remote into the others). It doesn't need to run BOINC at all. I'm using a Windows laptop to look after my cluster. If you don't have a Windows machine you can run it under Wine. You can download it from https://efmer.com/ and click on the BoincTasks option at the top of the screen. If you need help with configuring it just ask. It really makes managing a fleet of machines so much easier when you can see all of them on one screen.

I did learn about BOINCtui and it looks really nice, but I wrote some bash scripts to control all the machines, so it was not very useful for me, especially as you need to connect to each machine separately. I did not know about BOINCtasks and I will certainly test it, thanks for your advice. It looks like a piece of software I really need.


If you can update your boinc-client to a later version.

I would like to update BOINC, but it seems that for my Linux version it is the newest version available. However, I will update the OS on all my stations quite soon, and then I will install a newer version.

I do not understand what you stated: "Rosetta have stated they start looking at results 48 hours after they send a batch out so setting the default run time as high as yours is a waste of effort." Could you explain it in more detail, please? I have read many pages on setting the target CPU time, and all of them said that it does not matter because all models need to be tested anyway, so the longer the time you set, the more models you can calculate, which makes the process more efficient. However, that may be wrong, so why do you think it is a waste of time? I set 29h just because I want to reduce network use. My computers work 24/7 anyway, so I can use more of their time.


As for the XML files, BOINC doesn't use config_aux. It uses an app_config.xml file, which goes in the project-specific folder. Under Debian and Ubuntu it would be:
/var/lib/boinc-client/projects/boinc.bakerlab.org_rosetta/app_config.xml
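
As a hedged illustration of what such a file might contain, the Python sketch below renders a minimal app_config.xml using the `<project_max_concurrent>` option, which caps how many tasks of the project run at the same time. The limit of 4 is an arbitrary example value, not a recommendation, and the sketch assumes the installed client version supports this option.

```python
import xml.etree.ElementTree as ET

APP_CONFIG_TEMPLATE = """<app_config>
    <project_max_concurrent>{limit}</project_max_concurrent>
</app_config>
"""

def render_app_config(limit: int) -> str:
    """Return app_config.xml text that caps the project's running tasks at `limit`."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return APP_CONFIG_TEMPLATE.format(limit=int(limit))

if __name__ == "__main__":
    text = render_app_config(4)  # example value only
    ET.fromstring(text)          # sanity check: the output is well-formed XML
    print(text)
```

Note that this is a cap, not a reservation: it stops the project from running more than the given number of tasks, but it does not keep idle cores free for that project when it has no work. After the file is changed, the client has to reread its config files or be restarted.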

I am very surprised that BOINC ignores config_aux.xml, since it is clearly described here: https://boinc.berkeley.edu/trac/wiki/ProjectConfigFile So how can I know what is used and what is ignored by the software? Is there any documentation which provides a correct description of the configuration? Let me know, please.

At the moment I do not want to limit my cores, I want to use the full power of my computers. Maybe I will use it in the future, but not now.
While I have the opportunity, I would like to ask whether BOINC can be configured so that, for instance, 1 core is dedicated to one project, 2 cores are fixed to another project and the remaining cores go to a third project.


Once you've returned at least 11 and they've validated then you can adjust your cache setting. I run 0.1 days with no extra. That gives the project a fast turn around and you don't get overloaded with tasks. I'd suggest you have all of the machines with the same settings seeing as they are the same hardware config. The project don't recommend more than a 1 day cache due to the short deadlines (which are 3 days).

I have already changed the value to 0.1 as you suggested and I will see the effect over the next few days.
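
On a machine managed without the GUI, these two cache values are normally set through global_prefs_override.xml in the BOINC data directory. Below is a minimal Python sketch that renders such a file with the values suggested above (0.1 days, no extra). It is an illustration only and writes the numbers with a decimal point, not a comma.

```python
import xml.etree.ElementTree as ET

def render_prefs_override(min_days: float, extra_days: float) -> str:
    """Return global_prefs_override.xml text holding the two work-cache settings."""
    if min_days < 0 or extra_days < 0:
        raise ValueError("cache sizes cannot be negative")
    return (
        "<global_preferences>\n"
        f"  <work_buf_min_days>{float(min_days)}</work_buf_min_days>\n"
        f"  <work_buf_additional_days>{float(extra_days)}</work_buf_additional_days>\n"
        "</global_preferences>\n"
    )

if __name__ == "__main__":
    text = render_prefs_override(0.1, 0.0)  # "0.1 days with no extra"
    ET.fromstring(text)                     # sanity check: the output is well-formed XML
    print(text)
```

Once the file is in place, `boinccmd --read_global_prefs_override` asks the running client to pick it up without a restart.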

Thanks:)
9) Message boards : Questions and problems : Please, help - too many downloaded jobs/too many jobs in progress (Message 98566)
Posted 17 May 2020 by Mippi
Post:
Hi Les,
thanks for your extremely quick answer, I am impressed :)

At the moment my settings are 0.1 and 1, so it should still be fine, but of course I will try to change them tomorrow and check whether it works. I do not use the GUI, but I know which file I need to change the options in. However, all the options are exactly the same on all the computers, so why do 3 of them take many more tasks?

Could you tell me why the xml files did not work in that case, please?

Thanks again!
10) Message boards : Questions and problems : Please, help - too many downloaded jobs/too many jobs in progress (Message 98564)
Posted 16 May 2020 by Mippi
Post:
Hi BOINC experts,
I need your help. Before writing this message, I spent many hours trying to find a solution in the BOINC documentation, forums etc. Unfortunately, I have not been able to find anything which solves my problems, and therefore I would like to ask you for help.

I joined Rosetta with my computer cluster. There are 11 powerful Xeon computers which do not do anything at the moment due to the current situation, so I decided to use them for Rosetta and it has worked OK for 2 weeks, but now I can see a problem with some of them.

I use BOINC 7.9.3 on Linux without a GUI, just text mode (command line). I set the target CPU run time to 30h (1 day and 6 hours) to minimise network connections. I set 100% CPU use and time and 90% memory, but apart from those I use the default settings. 8 computers work absolutely fine, but 3 of them download many more tasks than they can calculate :( For instance, I have 16 cores in each machine and the deadline is 72h, so each machine should hold at most 32 work units to be able to complete them. Unfortunately I have more than 60 (sometimes even up to 100), so I miss deadlines or I return tasks which have already been calculated by somebody else. I would like to stress that all the computers are exactly the same (the same CPU, RAM, HDD, OS and its configuration, BOINC version etc.) and they have achieved very similar average credit within the last month; the differences are within 10%. The problem is on just 3 machines. The rest keep around 24 tasks, so 16 in progress and 8 waiting to be processed, which is fine.
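
The arithmetic behind the 32-task figure can be sketched as follows, under the simplifying assumptions that every task runs for the full target run time and that each core works on one task at a time:

```python
import math

def max_tasks_per_machine(cores: int, deadline_h: float, run_time_h: float) -> int:
    """Upper bound on the tasks one machine can finish before the deadline."""
    # Number of back-to-back tasks a single core can complete inside the deadline window.
    tasks_per_core = math.floor(deadline_h / run_time_h)
    return cores * tasks_per_core

if __name__ == "__main__":
    # 16 cores, 72h deadline, 30h target run time: floor(72 / 30) = 2 tasks per core.
    print(max_tasks_per_machine(16, 72, 30))  # 32
```

Anything the client holds beyond that bound cannot be finished in time under these assumptions, which is why 60 to 100 tasks on one machine leads to missed deadlines.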

I have read the BOINC documentation on how to configure a client and a project. I created config.xml in the rosetta project folder with max_wus_in_progress limited to 2 per core and I limited max_ncpus to 16, but nothing changed at all. Then I created config_aux.xml in the rosetta folder with a total job limit, and again nothing changed. I read the event log and I cannot find any errors. By the way, it would be great if the BOINC documentation provided some example config files, or at least used a different font style to show what is a command and what is a parameter; sometimes it is not obvious.

At the moment I control it by manually stopping and starting the allowance of new tasks, but I cannot do that during weekends, so after my return I always need to abort tens of jobs.

Could somebody help me, please? Thanks in advance.




Copyright © 2020 University of California. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.