Limit number of jobs in progress to be 2 at most.

Message boards : Server programs : Limit number of jobs in progress to be 2 at most.

Saad

Joined: 23 Oct 17
Posts: 17
Message 87560 - Posted: 10 Aug 2018, 0:29:20 UTC

Jobs in our project are highly compute-intensive and also require a lot of memory, so I want to limit the number of jobs in progress to two. By default the BOINC server creates as many slots as the host has cores, usually 8 slots for 8 jobs, since modern machines normally have 8 cores. I tried to follow https://boinc.berkeley.edu/trac/wiki/ProjectOptions and edited my config.xml. After the edit, the bin/start command does not read the config file properly and gives an error. So I need a way to limit the number of jobs in progress simultaneously to two.

Query 1: As described above, I need to know how to limit jobs in progress to a maximum of 2 or 3 at any given time. Below is my config file. Also, do you have to keep two files for that: config_aux.xml (to set job limits) as well as the plain config.xml?


 <daily_result_quota>1000</daily_result_quota>
        <one_result_per_user_per_wu>0</one_result_per_user_per_wu>
        <max_wus_to_send>50</max_wus_to_send>
        <project>
            <max_jobs_in_progress>
                <total_limit>
                    <jobs>3</jobs>
                </total_limit>
            </max_jobs_in_progress>
        </project>
    </config>
    <tasks>
        <task>
            <cmd>antique_file_deleter -d 2</cmd>
            <period>24 hours</period>
            <disabled>0</disabled>
            <output>antique_file_deleter.out</output>
        </task>
        <task>
            <cmd>db_dump -d 2 --dump_spec ../db_dump_spec.xml</cmd>
            <period>24 hours</period>
            <disabled>1</disabled>
            <output>db_dump.out</output>
        </task>
        <task>
            <cmd>run_in_ops ./update_uotd.php</cmd>
            <period>1 days</period>
            <disabled>0</disabled>
            <output>update_uotd.out</output>
        </task>
        <task>
            <cmd>run_in_ops ./update_forum_activities.php</cmd>
            <period>1 hour</period>
            <disabled>0</disabled>
            <output>update_forum_activities.out</output>
        </task>
        <task>
            <cmd>update_stats</cmd>
            <period>1 days</period>
            <disabled>0</disabled>
            <output>update_stats.out</output>
        </task>
        <task>
            <cmd>run_in_ops ./update_profile_pages.php</cmd>
            <period>24 hours</period>
            <disabled>0</disabled>
            <output>update_profile_pages.out</output>
        </task>
        <task>
            <cmd>run_in_ops ./team_import.php</cmd>
            <period>24 hours</period>
            <disabled>1</disabled>
            <output>team_import.out</output>
        </task>
        <task>
            <cmd>run_in_ops ./notify.php</cmd>
            <period>24 hours</period>
            <disabled>0</disabled>
            <output>notify.out</output>
        </task>
        <task>
            <cmd>run_in_ops ./badge_assign.php</cmd>
            <period>24 hours</period>
            <disabled>0</disabled>
            <output>badge_assign.out</output>
        </task>
    </tasks>
    <daemons>
        <daemon>
            <cmd>feeder -d 3 </cmd>
        </daemon>
        <daemon>
            <cmd>transitioner -d 3 </cmd>
        </daemon>
        <daemon>
            <cmd>file_deleter -d 3 </cmd>
        </daemon>
        <daemon>
            <cmd>sample_trivial_validator -d 3 --app sampleimage</cmd>
        </daemon>
        <daemon>
            <cmd>sample_assimilator -d 3 --app sampleimage</cmd>
        </daemon>
    </daemons>
</boinc>



Query 2: I want the input files to always remain on the server. I have applied the no_delete tag in my job_in file, but I think the files get deleted once the output is uploaded to the server.

Query 3: Where do you find your output results? In the sample_results folder, right? I am also confused about this.
ID: 87560
Seth

Joined: 19 Nov 16
Posts: 63
Australia
Message 87564 - Posted: 10 Aug 2018, 1:40:36 UTC - in response to Message 87560.  
Last modified: 10 Aug 2018, 1:42:57 UTC

Try:
<max_wus_in_progress> N </max_wus_in_progress>


https://boinc.berkeley.edu/trac/wiki/ProjectOptions
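
For placement, a minimal sketch: the option would sit inside the <config> element of config.xml, next to the existing quota settings. The value 2 is only illustrative, and the neighbouring lines are copied from the config posted above.

```xml
<boinc>
    <config>
        <!-- existing options, taken from the posted config -->
        <daily_result_quota>1000</daily_result_quota>
        <max_wus_to_send>50</max_wus_to_send>
        <!-- illustrative value: scheduler-side limit on jobs in progress per host -->
        <max_wus_in_progress>2</max_wus_in_progress>
    </config>
</boinc>
```

Note that this is a limit applied by the scheduler when it sends work, so it bounds what can run on a host only indirectly.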

remove this section:
        <project>
            <max_jobs_in_progress>
                <total_limit>
                    <jobs>3</jobs>
                </total_limit>
            </max_jobs_in_progress>
        </project>
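
For comparison, the Job Limits section of that wiki page appears to describe max_jobs_in_progress as belonging in a separate config_aux.xml file next to config.xml, with <project> nested inside <max_jobs_in_progress> rather than the other way round. A sketch under that assumption; the layout is not confirmed anywhere in this thread, so check it against the wiki page before relying on it:

```xml
<?xml version="1.0" ?>
<!-- config_aux.xml (assumed layout), kept alongside config.xml -->
<config>
    <max_jobs_in_progress>
        <project>
            <total_limit>
                <!-- illustrative value: at most 2 jobs in progress per host -->
                <jobs>2</jobs>
            </total_limit>
        </project>
    </max_jobs_in_progress>
</config>
```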


Output results go into the upload directory first, before they are assimilated into the single work unit result. Remember that the same work is sent to multiple (usually 2) hosts to process. Once processed, the result is sent back to the server. If both results for the same work unit agree, they are assimilated into the final output. The final output ends up in sample_results.

Cheers
Seth
ID: 87564
Saad

Joined: 23 Oct 17
Posts: 17
Message 87847 - Posted: 28 Aug 2018, 17:09:21 UTC - in response to Message 87564.  
Last modified: 28 Aug 2018, 17:17:50 UTC

Tried it and got no error, but the client still launches 8 jobs, which equals the number of cores in the client machine. Is there any other fix?
<?xml version="1.0" ?>
<boinc>
    <config>
        <upload_dir>/home/pitb/projects/automationTest/upload</upload_dir>
        <send_result_abort>1</send_result_abort>
        <long_name>automationTest</long_name>
        <sched_debug_level>3</sched_debug_level>
        <cache_md5_info>1</cache_md5_info>
        <upload_url>http://103.226.217.106/automationTest_cgi/file_upload_handler</upload_url>
        <disable_account_creation>0</disable_account_creation>
        <uldl_dir_fanout>1024</uldl_dir_fanout>
        <disable_web_account_creation>0</disable_web_account_creation>
        <download_url>http://103.226.217.106/automationTest/download</download_url>
        <db_user>pitb</db_user>
        <log_dir>/home/pitb/projects/automationTest/log_food-home</log_dir>
        <app_dir>/home/pitb/projects/automationTest/apps</app_dir>
        <download_dir>/home/pitb/projects/automationTest/download</download_dir>
        <fuh_debug_level>3</fuh_debug_level>
        <master_url>http://103.226.217.106/automationTest/</master_url>
        <host>food-home</host>
        <db_name>automationTest</db_name>
        <shmem_key>0x1111f565</shmem_key>
        <show_results>1</show_results>
        <key_dir>/home/pitb/projects/automationTest/keys/</key_dir>
        <dont_generate_upload_certificates>1</dont_generate_upload_certificates>
        <ignore_upload_certificates>1</ignore_upload_certificates>
        <db_passwd>
            
            
            
            
            
        </db_passwd>
        <min_sendwork_interval>6</min_sendwork_interval>
        <db_host>
            
            
            
            
            
        </db_host>
        <ignore_delay_bound/>
        <daily_result_quota>500</daily_result_quota>
        <one_result_per_user_per_wu>0</one_result_per_user_per_wu>
        <max_wus_to_send>50</max_wus_to_send>
        <max_wus_in_progress>3</max_wus_in_progress>     
    </config>
    <tasks>
        <task>
            <cmd>antique_file_deleter -d 2</cmd>
            <period>24 hours</period>
            <disabled>0</disabled>
            <output>antique_file_deleter.out</output>
        </task>
        <task>
            <cmd>db_dump -d 2 --dump_spec ../db_dump_spec.xml</cmd>
            <period>24 hours</period>
            <disabled>1</disabled>
            <output>db_dump.out</output>
        </task>
        <task>
            <cmd>run_in_ops ./update_uotd.php</cmd>
            <period>1 days</period>
            <disabled>0</disabled>
            <output>update_uotd.out</output>
        </task>
        <task>
            <cmd>run_in_ops ./update_forum_activities.php</cmd>
            <period>1 hour</period>
            <disabled>0</disabled>
            <output>update_forum_activities.out</output>
        </task>
        <task>
            <cmd>update_stats</cmd>
            <period>1 days</period>
            <disabled>0</disabled>
            <output>update_stats.out</output>
        </task>
        <task>
            <cmd>run_in_ops ./update_profile_pages.php</cmd>
            <period>24 hours</period>
            <disabled>0</disabled>
            <output>update_profile_pages.out</output>
        </task>
        <task>
            <cmd>run_in_ops ./team_import.php</cmd>
            <period>24 hours</period>
            <disabled>1</disabled>
            <output>team_import.out</output>
        </task>
        <task>
            <cmd>run_in_ops ./notify.php</cmd>
            <period>24 hours</period>
            <disabled>0</disabled>
            <output>notify.out</output>
        </task>
        <task>
            <cmd>run_in_ops ./badge_assign.php</cmd>
            <period>24 hours</period>
            <disabled>0</disabled>
            <output>badge_assign.out</output>
        </task>
    </tasks>
    <daemons>
        <daemon>
            <cmd>feeder -d 3 </cmd>
        </daemon>
        <daemon>
            <cmd>transitioner -d 3 </cmd>
        </daemon>
        <daemon>
            <cmd>file_deleter -d 3 </cmd>
        </daemon>
        <daemon>
            <cmd>sample_work_generator -d 3</cmd>
        </daemon>
        <daemon>
            <cmd>sample_trivial_validator -d 3 --app sampleimage</cmd>
        </daemon>
        <daemon>
            <cmd>sample_assimilator -d 3 --app sampleimage</cmd>
        </daemon>
    </daemons>
</boinc>

ID: 87847
robsmith
Volunteer tester
Help desk expert

Joined: 25 May 09
Posts: 1283
United Kingdom
Message 87848 - Posted: 28 Aug 2018, 17:34:37 UTC

It would appear that you are making some assumptions about your client computers, in that they are all of the same CPU core count and RAM size - this may not be a valid assumption.

That said, there may be a way around your problem: assign a number of CPU cores to each task. One of the multitude of (server-side) configuration files contains a parameter that defines the default number of CPUs (cores) and, where appropriate, the number of GPUs assigned to each task. By setting this value to two or three you will restrict the number of concurrent CPU tasks to core_count / number_cores_required. I think this parameter is called "n_cpus", but since I am a long way from the source code just now I can't be sure.
ID: 87848
Saad

Joined: 23 Oct 17
Posts: 17
Message 87851 - Posted: 28 Aug 2018, 18:29:24 UTC - in response to Message 87848.  

You are right, not all computers will be the same. Here are the details: each task takes 1.6 GB of memory, so a user with 8 cores but only 8 GB of memory will launch 8 jobs. That is why we want to restrict the number of tasks/workunits in progress to 2 or 1 for all users. We will certainly inform our users of the minimum requirements for volunteering their machines. Nothing is going wrong as such, but while testing our jobs on our own dedicated machines we have noticed that the machine running the jobs sometimes hangs for a couple of minutes, or even for half an hour. Before the public launch of our product, we do not want volunteers to find their computers hung when they need to pause/suspend computing and use their machine. For example, a user who comes back to resume his own work and finds the machine hung would panic. All we want is for tasks in progress to be restricted to our desired number. We will test our jobs on several types of machines and then set the number of tasks in progress that works best for all.
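
Since the underlying constraint here is memory (about 1.6 GB per task), a related setting may be the rsc_memory_bound field in the workunit input template, which the BOINC documentation describes as the job's expected maximum memory use in bytes. A hedged sketch: the 1.7e9 figure is an assumed bound just above the 1.6 GB quoted, and the rest of the template is omitted.

```xml
<workunit>
    <!-- file_ref entries for the job's input files go here -->
    <!-- assumed value: roughly 1.7 GB, expressed in bytes -->
    <rsc_memory_bound>1.7e9</rsc_memory_bound>
</workunit>
```

On its own this does not cap the number of concurrent tasks; it only tells the scheduler how much memory each task is expected to need.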
ID: 87851
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5077
United Kingdom
Message 87852 - Posted: 28 Aug 2018, 18:38:41 UTC - in response to Message 87851.  

You probably want to have a look at Job Limits. One that catches my eye is

<max_ncpus>N</max_ncpus>
An upper bound on NCPUS (default: 64)
If that isn't adequate, try a config_aux.xml file, described in the following section.
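
One possible reading of how these options fit together, assuming the ProjectOptions description that the in-progress limit for CPU jobs is max_wus_in_progress multiplied by the host's CPU count (NCPUS): a value of 3 on an 8-core host would allow up to 24 jobs, which would explain why 8 still ran. Capping NCPUS with max_ncpus might then bring the product down to the target. A sketch with illustrative values, not tested on a live server:

```xml
<config>
    <!-- treat every host as having at most 2 CPUs when computing limits -->
    <max_ncpus>2</max_ncpus>
    <!-- per-CPU multiplier for jobs in progress (illustrative value) -->
    <max_wus_in_progress>1</max_wus_in_progress>
</config>
```

Under that assumption the cap would work out to 1 x min(NCPUS, 2), so at most 2 jobs in progress per host.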
ID: 87852
Saad

Joined: 23 Oct 17
Posts: 17
Message 87853 - Posted: 28 Aug 2018, 18:49:52 UTC - in response to Message 87852.  

Perfect. If I edit my current config file, will the changes take effect for new tasks, or do I have to re-create the project?
ID: 87853
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5077
United Kingdom
Message 87859 - Posted: 28 Aug 2018, 20:57:18 UTC - in response to Message 87853.  

To be honest, I haven't a clue. I'm just a reader of documentation - I've never even seen a live BOINC server, let alone operated one.

I would be surprised if you have to re-create the project. I would expect you might have to stop and then re-start the server's daemons after editing the config file; you might also need to cancel any previously created workunits and create new ones. Trial and error, as always.
ID: 87859
robsmith
Volunteer tester
Help desk expert

Joined: 25 May 09
Posts: 1283
United Kingdom
Message 87861 - Posted: 28 Aug 2018, 21:13:55 UTC

From memory, it should just be a matter of stopping the BOINC server and restarting it, although I think there are cases where, if it is one of the .xml files, it can be read whenever a new task is generated. Give it a go and let us know; either Richard or I will then try to remember when someone asks a similar question.
ID: 87861


Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.