wiki:DbPurge

Version 2 (modified by davea, 17 years ago) (diff)

--

Database purging utility

As a BOINC project operates, the size of its workunit and result tables increases. Eventually they become so large that adding a field or building an index may take hours or days. To address this problem, BOINC provides a utility db_purge that writes result and WU records to XML-format archive files, then deletes them from the database. Workunits are purged only when their input files have been deleted. Because of BOINC's file-deletion policy, this implies that all results are completed. So when a workunit is purged, all its results are purged too.

db_purge creates an archive/ directory and store archive files there.

db_purge is normally run as a daemon, specified in the config.xml file. It has the following command-line options:

-min_age_days N
Purge only WUs with mod_time at least N days in the past. Recommended value: 7 or so. This lets users examine their recent results.
-max N
Purge at most N WUs, then exit
-max_wu_per_file N
Write at most N WUs to each archive file. Recommended value: 10,000 or so.
-zip
Compress archive files using zip
-gzip
Compress archive files using gzip
-d N
Set logging verbosity to N (1,2,3)

Recording credited jobs

You may want to keep a record of which user returned a valid result for each workunit (e.g. so that you can give them credit for specific discoveries). If you use db_purge, this information is removed from the database; it can be extracted from the XML archives, but it's slow.

Alternatively, you can store the user/workunit association in a separate table called credited_job. This records in this table are small - they store only a user ID and a 64-bit integer which is taken from the workunit's 'opaque' field (typically an index in a separate database). By default, the table has indices that allow you to efficiently enumerate workunits for a given user and users for a given workunit.

To use the credited_job table, pass the -update_credited_job flag to your validator.

Archive file format

The archive files have names of the form wu_archive_TIME and result_archive_TIME where TIME is the Unix time the file was created. In addition, db_purge generates index files 'wu_index' and 'result_index' associating each WU and result ID with the timestamp of its archive file.

The format of both type of index files is a number of rows each containing:

ID     TIME

The ID field of the WU or result, 5 spaces, and the timestamp part of the archive filename where the record with that ID can be found.

The format of a record in the result archive file is:

<result_archive>
  <id>%d</id>
  <create_time>%d</create_time>
  <workunitid>%d</workunitid>
  <server_state>%d</server_state>
  <outcome>%d</outcome>
  <client_state>%d</client_state>
  <hostid>%d</hostid>
  <userid>%d</userid>
  <report_deadline>%d</report_deadline>
  <sent_time>%d</sent_time>
  <received_time>%d</received_time>
  <name>%s</name>
  <cpu_time>%.15e</cpu_time>
  <xml_doc_in>%s</xml_doc_in>
  <xml_doc_out>%s</xml_doc_out>
  <stderr_out>%s</stderr_out>
  <batch>%d</batch>
  <file_delete_state>%d</file_delete_state>
  <validate_state>%d</validate_state>
  <claimed_credit>%.15e</claimed_credit>
  <granted_credit>%.15e</granted_credit>
  <opaque>%f</opaque>
  <random>%d</random>
  <app_version_num>%d</app_version_num>
  <appid>%d</appid>
  <exit_status>%d</exit_status>
  <teamid>%d</teamid>
  <priority>%d</priority>
  <mod_time>%s</mod_time>
</result_archive>

The format of a record in the WU archive file is:

<workunit_archive>
  <id>%d</id>
  <create_time>%d</create_time>
  <appid>%d</appid>
  <name>%s</name>
  <xml_doc>%s</xml_doc>
  <batch>%d</batch>
  <rsc_fpops_est>%.15e</rsc_fpops_est>
  <rsc_fpops_bound>%.15e</rsc_fpops_bound>
  <rsc_memory_bound>%.15e</rsc_memory_bound>
  <rsc_disk_bound>%.15e</rsc_disk_bound>
  <need_validate>%d</need_validate>
  <canonical_resultid>%d</canonical_resultid>
  <canonical_credit>%.15e</canonical_credit>
  <transition_time>%d</transition_time>
  <delay_bound>%d</delay_bound>
  <error_mask>%d</error_mask>
  <file_delete_state>%d</file_delete_state>
  <assimilate_state>%d</assimilate_state>
  <hr_class>%d</hr_class>
  <opaque>%f</opaque>
  <min_quorum>%d</min_quorum>
  <target_nresults>%d</target_nresults>
  <max_error_results>%d</max_error_results>
  <max_total_results>%d</max_total_results>
  <max_success_results>%d</max_success_results>
  <result_template_file>%s</result_template_file>
  <priority>%d</priority>
  <mod_time>%s</mod_time>
</workunit_archive>