Ticket #925 (closed Defect: fixed)

Opened 5 months ago

Last modified 4 weeks ago

BOINC 6.6.36 non-CUDA checkpoint interval scaled by host.ncpus

Reported by: Thyme Lawn Assigned to: davea
Priority: Minor Milestone: Undetermined
Component: Client - Daemon Version: 6.6.36
Keywords: checkpoint Cc:

Description

In BOINC 6.6.36, the checkpoint interval for non-CUDA tasks on multi-core systems is multiplied by the number of CPUs in the host.

I have my checkpoint interval set to 10 minutes, but checkpoints for malariacontrol.net and WCG happen every 20 minutes on a dual-core system and every 40 minutes on a quad-core. Applications are kept in memory, and WCG usually checkpoints soon after being scheduled, so a WCG task can stay scheduled for more than 80 minutes on the quad-core instead of the 60 minutes set in preferences.

The problem is in ACTIVE_TASK::write_app_init_file() which contains the following pair of lines:

    int nprocs = (result->avp->ncudas)?coproc_cuda->count:gstate.ncpus;
    aid.checkpoint_period = nprocs*gstate.global_prefs.disk_interval;

This means the checkpoint interval for non-CUDA tasks is always scaled by the host's number of CPUs (gstate.ncpus) instead of the average number of CPUs used by the task (result->avp->avg_ncpus).
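The difference can be seen in a small sketch. AppVersionInfo and both functions below are hypothetical stand-ins, not BOINC code: the first mirrors the two quoted lines, and the second is an assumed correction along the lines the report suggests (scaling by avg_ncpus). The attached patch and the eventual fix may differ.

```cpp
// Hypothetical stand-in for the fields reached via result->avp in the
// report; the real BOINC app-version type has many more members.
struct AppVersionInfo {
    int ncudas;        // CUDA devices used by the app version (0 = CPU-only)
    double avg_ncpus;  // average number of CPUs the task uses
};

// Mirrors the 6.6.36 expression: CPU-only tasks are scaled by the host's
// CPU count, so the period grows with the number of cores.
double checkpoint_period_6_6_36(const AppVersionInfo& av, int host_ncpus,
                                int cuda_count, double disk_interval) {
    int nprocs = av.ncudas ? cuda_count : host_ncpus;
    return nprocs * disk_interval;
}

// Assumed correction (not the actual patch): scale CPU-only tasks by the
// task's own avg_ncpus instead of the host's CPU count.
double checkpoint_period_per_task(const AppVersionInfo& av, int cuda_count,
                                  double disk_interval) {
    double nprocs = av.ncudas ? static_cast<double>(cuda_count) : av.avg_ncpus;
    return nprocs * disk_interval;
}
```

With a 10-minute (600 s) preference and a single-CPU task, the first function returns 1200 s on a dual-core host and 2400 s on a quad-core host, while the second returns 600 s on both.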

Sure enough, app_init.xml on the dual-core system has

<checkpoint_period>1200.000000</checkpoint_period>

and on the quad-core it has

<checkpoint_period>2400.000000</checkpoint_period>
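Both values are exactly 600 s multiplied by the CPU count (2 and 4). The sketch below reproduces the two elements; the helper name is hypothetical, and the "%f" format is an assumption inferred from the six decimal places shown, not taken from the BOINC source.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: formats a checkpoint period (in seconds) as the
// element seen in app_init.xml, using printf-style "%f" (six decimals).
std::string checkpoint_period_element(double seconds) {
    char buf[64];
    std::snprintf(buf, sizeof buf,
                  "<checkpoint_period>%f</checkpoint_period>", seconds);
    return std::string(buf);
}
```

Calling it with 2 * 600.0 and 4 * 600.0 yields the dual-core and quad-core lines quoted above.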

Attachments

app_start_cpp.patch (0.6 kB) - added by Thyme Lawn on 06/23/09 07:11:12.
Patch

Change History

06/23/09 07:11:12 changed by Thyme Lawn

  • attachment app_start_cpp.patch added.

Patch

10/29/09 21:11:53 changed by romw

  • status changed from new to closed.
  • resolution set to fixed.

Now fixed in 6.10.

10/30/09 12:40:36 changed by Nicolas

Fixed in r19293, backported to 6.10 in r19321.

