Version 3 (modified by Nicolas, 17 years ago)

Hierarchical upload/download directories

The data server for a large project may store hundreds of thousands or millions of files at any given time. If these files are stored in 'flat' directories (project/download and project/upload), the data server may spend a lot of CPU time searching directories. If you see a high CPU load average with a lot of time spent in kernel mode, this is probably the cause. The solution is to use hierarchical upload/download directories. To do this, include the line

<uldl_dir_fanout>1024</uldl_dir_fanout>

in your config.xml file (this is the default for new projects). The upload and download directories will then each contain 1024 subdirectories, named 0 to 3ff in hexadecimal. Each file is placed in the subdirectory selected by a hash of its filename.
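To illustrate the idea of hashing a filename into one of the fanout subdirectories, here is a minimal sketch. It is not BOINC's implementation: the hash is a stand-in (32-bit FNV-1a), and the names sketch_hash, sketch_subdir and sketch_path are hypothetical. Paths it produces will therefore not match the layout of a real project; use dir_hier_path() for that.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Stand-in hash (32-bit FNV-1a). ASSUMPTION: this is NOT the hash BOINC uses;
// it only illustrates turning a filename into a number.
static std::uint32_t sketch_hash(const std::string& filename) {
    std::uint32_t h = 2166136261u;
    for (unsigned char c : filename) {
        h ^= c;
        h *= 16777619u;
    }
    return h;
}

// Map a filename to a subdirectory name: lowercase hex in [0, fanout).
// With fanout 1024 the possible names are "0" through "3ff".
std::string sketch_subdir(const std::string& filename, int fanout) {
    assert(fanout > 0);
    const std::uint32_t bucket =
        sketch_hash(filename) % static_cast<std::uint32_t>(fanout);
    char buf[16];
    std::snprintf(buf, sizeof buf, "%x", static_cast<unsigned>(bucket));
    return buf;
}

// Compose <root>/<subdir>/<filename>.
std::string sketch_path(const std::string& root,
                        const std::string& filename, int fanout) {
    return root + "/" + sketch_subdir(filename, fanout) + "/" + filename;
}
```

Because the subdirectory depends only on the filename and the fanout, any program that knows both can locate a file without searching. This is also why the fanout value should not be changed casually once files are in place.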

The hierarchy is used for input and output files only. Executables and other application version files are in the top level of the download directory.

This affects your project-specific code in a couple of places. First, your work generator must put input files in the right directory before calling create_work(). To do this, it can use the function

int dir_hier_path(
    const char* filename, const char* root, int fanout, char* result,
    bool make_directory_if_needed=false
);

This takes the name of the input file and the absolute path of the root of the download hierarchy (typically the download_dir element from config.xml), and stores the absolute path of the file in the hierarchy in result. Generally make_directory_if_needed should be set to true: this creates the fanout directory if it does not yet exist.

Second, your validator and assimilator should call

int get_output_file_path(RESULT const& result, string& path);
or
int get_output_file_paths(RESULT const& result, vector<string>& );

to get the paths of output files in the hierarchy.

A couple of utility programs are available (run these in the project root directory):

dir_hier_move src_dir dst_dir fanout
dir_hier_path filename

dir_hier_move moves all files from src_dir (flat) into dst_dir (hierarchical with the given fanout). dir_hier_path, given a filename, prints the full pathname of that file in the hierarchy.
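To make concrete what a flat-to-hierarchical move involves, here is a minimal sketch using C++17 std::filesystem. It is an illustration only, not the source of dir_hier_move: the bucket function is a stand-in hash (32-bit FNV-1a), and the names sketch_bucket and sketch_hier_move are hypothetical. Do not run it on a real project, since BOINC would not find files placed by a different hash; use dir_hier_move itself.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Stand-in bucket function (32-bit FNV-1a modulo fanout), returned as
// lowercase hex. ASSUMPTION: this is NOT BOINC's filename hash.
static std::string sketch_bucket(const std::string& name, int fanout) {
    assert(fanout > 0);
    std::uint32_t h = 2166136261u;
    for (unsigned char c : name) {
        h ^= c;
        h *= 16777619u;
    }
    char buf[16];
    std::snprintf(buf, sizeof buf, "%x",
                  static_cast<unsigned>(h % static_cast<std::uint32_t>(fanout)));
    return buf;
}

// Move every regular file in the flat directory src into dst/<bucket>/.
// Subdirectories of src are left alone. Returns the number of files moved.
int sketch_hier_move(const fs::path& src, const fs::path& dst, int fanout) {
    // Collect the files first so entries are not renamed during iteration.
    std::vector<fs::path> files;
    for (const auto& entry : fs::directory_iterator(src)) {
        if (entry.is_regular_file()) files.push_back(entry.path());
    }
    int moved = 0;
    for (const auto& p : files) {
        const std::string name = p.filename().string();
        const fs::path bucket_dir = dst / sketch_bucket(name, fanout);
        fs::create_directories(bucket_dir);  // no-op if it already exists
        fs::rename(p, bucket_dir / name);    // assumes src and dst share a filesystem
        ++moved;
    }
    return moved;
}
```

A call such as sketch_hier_move("upload_flat", "upload", 1024) would mirror the argument order of the dir_hier_move command line (source, destination, fanout); the directory names here are placeholders.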

Transitioning from flat to hierarchical directories

If you are operating a project with flat directories, you can transition to a hierarchy as follows:

  • Stop the project and add <uldl_dir_fanout> to config.xml. You may want to put the hierarchy root in a new location (e.g. download/fanout); in that case, update the <download_dir> element of config.xml and add the element
    <download_dir_alt>old download dir</download_dir_alt>
    This causes the file deleter to check both the old and new locations.
  • Use dir_hier_move to move existing upload files to a hierarchy.
  • Start the project, and monitor everything closely for a while.