Cross-project identification


Leaderboard sites may show statistics from several BOINC projects, and may want to show credit for users and/or hosts summed across all the projects in which they participate.

Cross-project identification of hosts

Each host generates an internal cross-project ID, which is the MD5 of the concatenation of its domain name, IP address, free disk space, and a timestamp. This is reported to the projects to which the host is attached. The projects convert it to an external cross-project ID by hashing it with the owner's email address (this is intended to prevent spoofing). The external ID is exported in statistics files.
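
The two-stage host ID described above can be sketched as follows. This is a minimal illustration, not BOINC's actual code: the function names, the field order, the string formatting of each field, and the use of MD5 for the external hash are all assumptions made for the example.

```python
import hashlib
import time


def internal_host_cpid(domain_name: str, ip_addr: str,
                       free_disk_bytes: float, timestamp: float) -> str:
    """MD5 over the concatenated host attributes.

    The field order and formatting are assumed; the text only says the
    four values are concatenated.
    """
    material = f"{domain_name}{ip_addr}{free_disk_bytes}{timestamp}"
    return hashlib.md5(material.encode("utf-8")).hexdigest()


def external_host_cpid(internal_cpid: str, owner_email: str) -> str:
    """Hash the internal ID together with the owner's email address.

    Using MD5 and this concatenation order is an assumption.
    """
    return hashlib.md5((internal_cpid + owner_email).encode("utf-8")).hexdigest()


# Client side: generate the internal ID once and report it to each project.
internal = internal_host_cpid("host.example.org", "192.0.2.10", 1.5e11, time.time())

# Project side: derive the ID that is exported in statistics files.
external = external_host_cpid(internal, "owner@example.org")
print(internal, external)
```

Because the owner's email address is mixed in on the server, a client that reports someone else's internal ID under a different account produces a different external ID.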

Whenever a scheduler reply indicates that the server has created a new host record (e.g. because a bad RPC sequence number suggests that this is a duplicate host), the client generates a new internal host CPID.

Cross-project identification of participants

Accounts on different projects are considered equivalent if they have the same email address (we have considered other concepts, but they all lead to extreme complexity).

Projects can't export email addresses in statistics files; email addresses are private. It's also not desirable to export hashed email addresses, because spammers could enumerate feasible email addresses and compare them with the hashed addresses.
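
The enumeration problem can be illustrated with a short sketch. The hash function (MD5) and the sample addresses are assumptions chosen for the example; the point is only that an unsalted hash of an email address can be matched against a list of guesses.

```python
import hashlib


def hashed(email: str) -> str:
    """Unsalted hash of an email address (MD5 is assumed for illustration)."""
    return hashlib.md5(email.encode("utf-8")).hexdigest()


# What a statistics file would contain if hashed addresses were exported.
exported = {hashed("alice@example.org")}

# A list of plausible addresses built from common names and domains.
candidates = ["bob@example.org", "alice@example.org", "carol@example.net"]

# Any guess whose hash appears in the export is a confirmed live address.
recovered = [email for email in candidates if hashed(email) in exported]
print(recovered)
```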

Instead, BOINC uses the following system:

  • Each account is assigned an 'internal cross-project identifier' (CPID) on the server when it's created; it's a long random string.
  • When a scheduling server replies to an RPC, it includes the account's CPID, its email address, and its creation time. These are stored in the client state file.
  • When the BOINC client makes an RPC request to a scheduling server, it scans the accounts with the same email address, finds the one with the oldest creation time, and sends the CPID stored with that account.
  • If the scheduling server receives a CPID different from the one in its database, it updates the database with the new CPID.
  • User elements in the XML statistics data (XmlStats) include a hash of (email address, CPID); this 'external' CPID serves as a unique identifier of all accounts with that email address. This last step, hashing with the email address, prevents people from impersonating one another.
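
The steps above can be sketched as follows. This is a simplified model under stated assumptions, not BOINC's implementation: the `Account` record, the function names, the MD5 hash, and the order in which email and CPID are concatenated are all hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Account:
    project_url: str
    email: str
    cpid: str           # internal CPID last reported by that project's scheduler
    create_time: float  # account creation time, seconds since the epoch


def cpid_to_send(accounts: list[Account], email: str) -> str:
    """Client side: use the CPID of the oldest account with this email address."""
    same_email = [a for a in accounts if a.email == email]
    oldest = min(same_email, key=lambda a: a.create_time)
    return oldest.cpid


def reconcile(db_cpid: str, received_cpid: str) -> str:
    """Server side: adopt the received CPID if it differs from the stored one."""
    if received_cpid != db_cpid:
        return received_cpid
    return db_cpid


def external_cpid(email: str, cpid: str) -> str:
    """ID exported in statistics files (hash function and order are assumed)."""
    return hashlib.md5((email + cpid).encode("utf-8")).hexdigest()


# Client state: the same person has accounts on two projects.
accounts = [
    Account("https://a.example/", "user@example.org", "cpid-A", 1000.0),
    Account("https://b.example/", "user@example.org", "cpid-B", 2000.0),
]

# The client sends project A's CPID to both projects, since A's account is older.
chosen = cpid_to_send(accounts, "user@example.org")

# Project B replaces its stored CPID; project A keeps its own.
cpid_on_a = reconcile("cpid-A", chosen)
cpid_on_b = reconcile("cpid-B", chosen)

# Both projects now export the same external CPID for this user.
print(external_cpid("user@example.org", cpid_on_a) ==
      external_cpid("user@example.org", cpid_on_b))
```

Once every project has adopted the oldest account's CPID, a leaderboard site can sum credit across projects by grouping on the external CPID alone.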

This system provides cross-project identification based on email address, without publicizing any information from which email addresses could be derived.