Server-side file deletion

Files are deleted from the data server's upload and download directories by two programs:

  • file_deleter deletes input and output files as jobs are completed.
  • antique_file_deleter deletes files that file_deleter missed (those that "fell through the cracks").

The File Deleter

file_deleter is a daemon. Typically you don't need to customize it. The default file deletion policy is:

  • A workunit's input files are deleted when all results are 'over' (reported or timed out) and the workunit is assimilated.
  • A result's output files are deleted after the workunit is assimilated. The canonical result is handled differently, since its output files may be needed to validate results that are reported after assimilation; hence its files are deleted only when all results are over, and all successful results have been validated.
  • If <delete_delay_hours> is specified in config.xml, the deletion of files is delayed by that interval.
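
The policy above can be read as a small set of predicates. The following Python sketch is illustrative only and is not the daemon's actual code: the Workunit and Result classes, their field names, and the helper functions are hypothetical stand-ins for the corresponding database state.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Result:
    # Hypothetical fields standing in for the result's DB state.
    over: bool               # reported or timed out
    successful: bool         # finished with a success outcome
    validated: bool          # has been checked by the validator
    canonical: bool = False  # chosen as the canonical result


@dataclass
class Workunit:
    assimilated: bool
    results: List[Result]


def all_over(wu: Workunit) -> bool:
    """True when every result is 'over' (reported or timed out)."""
    return all(r.over for r in wu.results)


def can_delete_inputs(wu: Workunit) -> bool:
    """Input files: all results over and the workunit assimilated."""
    return wu.assimilated and all_over(wu)


def can_delete_outputs(wu: Workunit, result: Result) -> bool:
    """Output files: after assimilation, with a stricter rule for the canonical result."""
    if not wu.assimilated:
        return False
    if not result.canonical:
        return True
    # The canonical result's files may still be needed to validate late results.
    return all_over(wu) and all(r.validated for r in wu.results if r.successful)


def earliest_deletion(eligible_at: datetime, delete_delay_hours: float = 0.0) -> datetime:
    """Apply the <delete_delay_hours> interval to the time a file became eligible."""
    return eligible_at + timedelta(hours=delete_delay_hours)
```

In this reading, a non-canonical result's output files become deletable as soon as the workunit is assimilated, while the canonical result's files wait until no further validation can need them.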

Command-line options:

-d N
set debug output level (1/2/3/4)
--mod M R
handle only WUs with ID mod M == R
--one_pass
exit after one pass through DB
--dry_run
don't update DB (for debugging only)
--download_dir D
override download_dir from project config with D
--sleep_interval N
sleep for N seconds between scans (default 5)
--appid N
only process workunits with appid=N
--app S
only process workunits of app with name S
--dont_retry_errors
don't retry file deletions that failed previously
--preserve_wu_files
update the DB, but don't delete input files
--preserve_result_files
update the DB, but don't delete output files
--dont_delete_batches
don't delete anything with positive batch number
--input_files_only
don't delete output files
--output_files_only
don't delete input files
--xml_doc_like L
only process workunits where xml_doc LIKE 'L'

If you store input and output files on different servers, you can improve performance by running separate file deleters, each one on the machine where the corresponding files are stored.
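
One way to read the --mod M R option is that M file_deleter instances, each started with a different R in the range 0 to M-1, split the workunits between them by ID. The Python sketch below only illustrates that arithmetic; the function names are hypothetical and nothing here is taken from the daemon's source.

```python
from typing import Dict, Iterable, List


def handled_by(wu_id: int, m: int, r: int) -> bool:
    """True if an instance started with '--mod m r' would handle this workunit ID."""
    return wu_id % m == r


def partition(wu_ids: Iterable[int], m: int) -> Dict[int, List[int]]:
    """Group workunit IDs by the remainder R of the instance that would handle them."""
    shards: Dict[int, List[int]] = {r: [] for r in range(m)}
    for wu_id in wu_ids:
        shards[wu_id % m].append(wu_id)
    return shards
```

With M = 3, for example, workunit 7 falls to the instance started with R = 1, and every ID belongs to exactly one instance.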

In some cases you may not want files to be deleted. There are three ways to accomplish this:

  • Use the --preserve_wu_files and/or the --preserve_result_files command-line options.
  • Include <no_delete/> in the <file_info> element for a file in a workunit or result template. This lets you suppress deletion on a file-by-file basis.
  • Include nodelete in the workunit name.
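
The last two checks can be sketched in Python as follows. This is a rough illustration, not the deleter's implementation: the function names are hypothetical, the substring match on the workunit name is an assumption about how "nodelete" is detected, and the sample <file_info> contents in the usage note are invented for the example.

```python
import xml.etree.ElementTree as ET


def file_is_protected(file_info_xml: str) -> bool:
    """True if a <file_info> element contains a <no_delete/> child."""
    elem = ET.fromstring(file_info_xml)
    return elem.find("no_delete") is not None


def workunit_is_protected(name: str) -> bool:
    """True if the workunit name contains 'nodelete' (assumed substring match)."""
    return "nodelete" in name
```

For instance, file_is_protected("<file_info><name>out_0</name><no_delete/></file_info>") returns True, and workunit_is_protected("wu_nodelete_0042") returns True.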

The Antique File Deleter

antique_file_deleter should be run as a periodic task. It removes output files that are older than the oldest WU in the database (not including "no_delete" WUs). These files are created when BOINC clients return results after the corresponding WU has been deleted from the database.

The antique files are deleted by using a Unix 'find' command to locate files that are older than the oldest workunit. The find command will work on NFS mounted file systems, and will ignore .nfs stale file markers. The output of find is limited by a 'head' to 50000 files by default.
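
The selection step described above (files older than the oldest workunit, capped at a fixed count) can be approximated in Python as follows. This is a sketch under assumptions and not how antique_file_deleter is implemented, since it uses find and head: the function names are hypothetical, file age is taken to be the modification time, and skipping names that start with ".nfs" is one interpretation of ignoring the stale file markers.

```python
import os
from itertools import islice
from typing import Iterator, List


def older_than(root: str, cutoff: float) -> Iterator[str]:
    """Yield files under root whose modification time is before cutoff (epoch seconds)."""
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name.startswith(".nfs"):
                continue  # assumed handling of NFS stale file markers
            path = os.path.join(dirpath, name)
            try:
                if os.lstat(path).st_mtime < cutoff:
                    yield path
            except FileNotFoundError:
                continue  # file vanished between listing and stat


def antique_candidates(root: str, oldest_wu_time: float, limit: int = 50000) -> List[str]:
    """Return at most `limit` candidates, mirroring the cap applied by 'head'."""
    return list(islice(older_than(root, oldest_wu_time), limit))
```

Because the generator is consumed lazily, the directory walk stops once the limit is reached, which is similar in spirit to piping find through head.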

If the web-server account on your system is not 'apache', add a <httpd_user> element to your config.xml file. Otherwise antique deletion won't work.

Command-line options:

-d N
set debug output level (1/2/3/4)
--dry_run
don't delete any files, just log what would be deleted
--usleep N
sleep N microseconds after each examined file (throttles I/O when there are many files)
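
The effect of --dry_run and --usleep on a deletion loop might look like the following Python sketch. The function name, parameters, and log message wording are hypothetical and chosen only for illustration; they are not the tool's actual output.

```python
import os
import time
from typing import Iterable, List


def delete_files(paths: Iterable[str], dry_run: bool = False, usleep: int = 0) -> List[str]:
    """Delete each path, or only log it when dry_run is set; pause usleep microseconds per file."""
    log: List[str] = []
    for path in paths:
        if dry_run:
            log.append(f"would delete {path}")
        else:
            try:
                os.remove(path)
                log.append(f"deleted {path}")
            except OSError as exc:
                log.append(f"failed {path}: {exc}")
        if usleep > 0:
            time.sleep(usleep / 1_000_000)  # throttle I/O between files
    return log
```

Running once with dry_run=True and reviewing the log before a real pass is a cautious way to check what would be removed.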
Last modified on Feb 20, 2019, 6:22:24 PM