Changes between Initial Version and Version 1 of MemoryManagement

Apr 26, 2007, 2:38:48 PM

Added page


= Memory usage =

This document describes proposed policies and mechanisms related to RAM and swap space. There are several issues:

    * '''Job dispatch''': how do we decide whether to send a job to a host, based on the job's memory requirements and the host's memory resources?
    * '''Client CPU scheduling''': how do memory factors affect when jobs should run?
    * '''Job abort policy''': when must a job be aborted because it is using too much memory?

== Issues and goals ==

BOINC applications run at the lowest CPU priority. However, they can still degrade user-visible performance because of their memory usage:

    * When the system is in use (i.e. when there's mouse/keyboard input), the memory usage of running BOINC apps can increase the level of paging, making the system sluggish.
    * If several user apps are open and the system is idle for a long period, the memory usage of BOINC apps may cause the user apps to be paged out. When the user eventually returns, it may take a while (10-20 seconds) for the user apps to be paged back in.

These effects can be minimized by limiting BOINC apps to a very small amount of memory. However, this reduces the CPU time available to BOINC, and on some systems BOINC would do no work at all. There is a tradeoff: the more work BOINC does, the greater its potential impact on user-visible performance. One goal of our design is to give users adjustable controls (i.e. [GlobalPrefs general preferences]) over this tradeoff.

A second goal is to maximize the CPU efficiency of BOINC apps, i.e. to ensure that they don't thrash. On a multiprocessor, it may sometimes be more efficient (in terms of total CPU time per wall time) to run fewer jobs than there are CPUs.

A third goal is to support applications that are '''memory-aware''', i.e. that can trade off memory usage for speed. Such applications should be informed of the current memory constraints so that they can adapt accordingly.

== Client data ==

When it starts up, BOINC measures:

    * The amount of RAM on the system.
    * The amount of swap space. On Win/Unix, this is the size of the page file or swap partition. On Mac, it's the amount of free disk space.
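
As an illustration of the start-up measurements, here is a minimal Linux-only sketch using the `sysinfo(2)` system call. It covers only the swap-partition/swap-file case; Windows page files and the Mac free-disk-space rule would need different calls, which are not shown. The `HostMemory` struct and `measure_host_memory()` function are hypothetical names, not part of the BOINC client.

```cpp
#include <sys/sysinfo.h>  // Linux-specific: sysinfo(2)

// Hypothetical container for the two start-up measurements, in bytes.
struct HostMemory {
    double ram_bytes;
    double swap_bytes;
};

// Fills `out` from sysinfo(2); returns false if the call fails.
bool measure_host_memory(HostMemory& out) {
    struct sysinfo info;
    if (sysinfo(&info) != 0) {
        return false;
    }
    // totalram and totalswap are counted in units of mem_unit bytes.
    out.ram_bytes = static_cast<double>(info.totalram) * info.mem_unit;
    out.swap_bytes = static_cast<double>(info.totalswap) * info.mem_unit;
    return true;
}
```

A host with no swap configured simply reports `swap_bytes == 0`.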

BOINC measures the following periodically (every 10 seconds or so):

    * For each executing BOINC app: the working set size (for compound apps, this includes all processes). The definition of 'working set' may vary between OSs, but we assume that it means the amount of RAM needed to run with high (say, > 90%) CPU utilization. This is not necessarily the amount of RAM the app is currently using.

      To accommodate spikes in memory usage, BOINC smooths the working set size: the value actually used is computed as

      WSS = .5*WSS + .5*WSS_OS

      where WSS_OS is the working set size as reported by the OS, and the WSS on the right-hand side is the previous smoothed value.

    * For each BOINC app: the amount of swap space used.
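
The smoothing rule above is an exponential moving average with weight 0.5. The sketch below assumes that the first OS sample seeds the estimate (this page doesn't say how WSS is initialized); the class name `SmoothedWss` is hypothetical.

```cpp
// Exponential smoothing of the working-set size:
//   WSS = .5*WSS + .5*WSS_OS
// Hypothetical class name; the 0.5 weights are from the text.
class SmoothedWss {
public:
    // Feed one OS-reported sample and return the updated smoothed value.
    double update(double wss_os) {
        if (!initialized_) {
            // Assumption: the first sample seeds the estimate.
            wss_ = wss_os;
            initialized_ = true;
        } else {
            wss_ = 0.5 * wss_ + 0.5 * wss_os;
        }
        return wss_;
    }
    double value() const { return wss_; }

private:
    double wss_ = 0.0;
    bool initialized_ = false;
};
```

With equal weights, each update moves the estimate halfway toward the latest sample, so a one-sample spike is damped to half its size, and the influence of each older sample halves with every update.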

Data we don't have:

    * Page-fault rates for each app. This doesn't seem to be available on Win (the reported page fault rate includes faults that don't read from disk).

== Server data ==

Each workunit record includes:

    * '''rsc_memory_bound''': an estimate of the app's largest working set size.

== Preferences ==

We propose the following:

    * '''ram_max_used_frac_busy''': Max fraction of RAM to use while system is busy
    * '''ram_max_used_frac_idle''': Max fraction of RAM to use while system is idle
    * '''vm_max_used_pct''': Max percentage of swap space to use (this already exists)

== Scheduler (server side) ==

A result is sent to a client only if

rsc_memory_bound < (RAM size)*min(ram_max_used_frac_busy, ram_max_used_frac_idle)
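
The dispatch test can be sketched as a single predicate. Only `rsc_memory_bound` and the two `ram_max_used_frac_*` preferences come from this page; `HostMemPrefs` and `job_fits_host()` are hypothetical names, and the client is assumed to have reported its RAM size to the server.

```cpp
#include <algorithm>

// Hypothetical bundle of what the server knows about a host.
struct HostMemPrefs {
    double ram_size;                 // bytes of RAM reported by the client
    double ram_max_used_frac_busy;
    double ram_max_used_frac_idle;
};

// True if a job with this memory bound may be sent to the host. The stricter
// (smaller) of the two fractions is used, so the job fits whether or not the
// host is in use.
bool job_fits_host(double rsc_memory_bound, const HostMemPrefs& h) {
    double limit = h.ram_size * std::min(h.ram_max_used_frac_busy,
                                         h.ram_max_used_frac_idle);
    return rsc_memory_bound < limit;
}
```

For a host with 1 GB of RAM and fractions 0.5 (busy) and 0.9 (idle), the limit is 500 MB, and the comparison is strict, so a job bounded at exactly 500 MB is not sent.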

== Client CPU scheduler ==

The scheduler is divided into two parts:

    * Make a list of tasks to run, ordered by 'importance' (deadline-critical ones first, then high-debt).
    * Enforcement: go through the run list, starting tasks in order and preempting other tasks as needed. Don't preempt a task that hasn't checkpointed in favor of a non-deadline-critical task.

This will be modified as follows:

    * In building the run list, compute the available RAM based on preferences, and keep track of the RAM used so far. Skip any task that would cause this to exceed the available RAM.
    * Enforcement: compute the available RAM based on preferences, and keep track of the RAM used so far while running tasks. Skip any task that would cause the limit to be exceeded. Preempt tasks that haven't checkpointed if they would cause the limit to be exceeded.
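
A minimal sketch of the modified run-list pass, assuming that tasks arrive already sorted by importance and that the run list holds at most one task per CPU. The rule that picks between the busy and idle fractions in `available_ram()` is an assumption, since this page only says the limit is based on preferences; all names here are hypothetical. The enforcement step (preempting tasks that haven't checkpointed) is not shown.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical task record. Tasks are assumed to be sorted by importance
// already (deadline-critical first, then by debt).
struct Task {
    std::string name;
    double wss;  // smoothed working-set size, in bytes
};

// Assumption: the busy limit applies while the user is active, the idle limit otherwise.
double available_ram(double ram_size, bool user_active,
                     double frac_busy, double frac_idle) {
    return ram_size * (user_active ? frac_busy : frac_idle);
}

// Walk the ordered tasks, keeping a running total of RAM used. A task that
// would push the total past ram_limit is skipped (not a stopping point),
// so smaller tasks further down the list can still use the remaining RAM.
std::vector<std::string> build_run_list(const std::vector<Task>& ordered,
                                        double ram_limit, std::size_t ncpus) {
    std::vector<std::string> run_list;
    double ram_used = 0.0;
    for (const Task& t : ordered) {
        if (run_list.size() >= ncpus) break;         // assumption: one task per CPU
        if (ram_used + t.wss > ram_limit) continue;  // would exceed the RAM budget
        ram_used += t.wss;
        run_list.push_back(t.name);
    }
    return run_list;
}
```

Using `continue` rather than `break` is what lets small jobs run while a large one waits, which is the behavior discussed in the example below.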

In addition, we will add a new 'memory usage check' that runs every 30 seconds or so. It will compute the working sets of all running tasks. If the total exceeds the available RAM, it will trigger CPU scheduler enforcement (see above). If an individual task's working set is too large for it to ever run, the task will be aborted (see below).
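
The enforcement trigger of this periodic check reduces to comparing the total working set of the running tasks with the RAM available to BOINC. A minimal sketch with a hypothetical function name; the per-task abort test is a separate rule.

```cpp
#include <vector>

// Enforcement trigger of the memory-usage check. The caller is expected to
// invoke it about every 30 seconds. Returns true if the summed working sets
// of the running tasks exceed the RAM currently available to BOINC.
// Hypothetical function name.
bool memory_check_needs_enforcement(const std::vector<double>& running_wss,
                                    double ram_limit) {
    double total = 0.0;
    for (double w : running_wss) {
        total += w;
    }
    return total > ram_limit;
}
```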

Note: the above policies may cause some tasks to not be run for long periods. For example, suppose that:

    * A 2-CPU machine has 1 GB RAM.
    * There's a small-RAM job X with a close deadline.
    * There's a 1 GB job Y.
    * There are several small-RAM jobs.

In this case, Y won't run until X has finished, even if it is more deserving (in terms of debt) than the other small jobs. However, Y won't starve indefinitely: eventually it will run into deadline trouble, and will then run ahead of everything else.

== Aborting tasks ==

A task is aborted if, at any point, its working set size is larger than

(RAM size)*max(ram_max_used_frac_busy, ram_max_used_frac_idle)

since such a task can't be scheduled even under the larger of the two RAM limits.
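
The abort rule can be written as a threshold plus a strict comparison. A sketch with hypothetical function names; this page doesn't specify whether the smoothed or the raw working set is compared, so the caller passes whichever it uses.

```cpp
#include <algorithm>

// Abort threshold from this page: (RAM size)*max(frac_busy, frac_idle).
// Hypothetical function name.
double abort_threshold(double ram_size, double frac_busy, double frac_idle) {
    return ram_size * std::max(frac_busy, frac_idle);
}

// "Larger than" is strict: a task exactly at the threshold is not aborted.
bool should_abort(double wss, double ram_size, double frac_busy, double frac_idle) {
    return wss > abort_threshold(ram_size, frac_busy, frac_idle);
}
```

Note the asymmetry with server-side dispatch, which uses `min`: a job is only sent if it fits under the stricter limit, but is only aborted if it exceeds the more generous one.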

== Memory-aware applications ==

The following items will be added to the BOINC_STATUS structure:

{{{
double working_set_size;        // app's current WS (non-smoothed)
double max_working_set_size;    // app will be aborted if WS exceeds this
}}}
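
One way a memory-aware application might use these two fields is to size an optional cache from the headroom between its current working set and the abort threshold. `MemStatus` below is a local stand-in holding only these two fields (a real app would read them from BOINC_STATUS), and the policy in `choose_cache_bytes()` is hypothetical.

```cpp
#include <algorithm>

// Local stand-in holding only the two proposed BOINC_STATUS fields.
struct MemStatus {
    double working_set_size;      // app's current WS (non-smoothed)
    double max_working_set_size;  // app will be aborted if WS exceeds this
};

// Hypothetical policy: given the current size of an optional cache (already
// counted in working_set_size), return a new cache size. With spare headroom,
// grow into a `safety` fraction of it; when over the limit, shrink by the full
// overshoot. Never returns a negative size.
double choose_cache_bytes(const MemStatus& s, double current_cache, double safety) {
    double headroom = s.max_working_set_size - s.working_set_size;
    double delta = headroom > 0.0 ? safety * headroom : headroom;
    return std::max(0.0, current_cache + delta);
}
```

Shrinking by the full overshoot, rather than a fraction of it, is meant to bring the working set back under the limit before the client's abort check fires.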

== Future work ==

    * Measure, and take into account, non-BOINC RAM usage. Maybe the best policy is: if non-BOINC RAM usage is X, BOINC uses total-X. If the computer is busy
    * Enforce bounds on swap space usage.
    * Make the round-robin simulator aware of memory issues. In the scenario described under Client CPU scheduler, the large-RAM task won't be classified as being in deadline trouble until it is somewhat too late.