Changes between Initial Version and Version 1 of DesignKeywords


Timestamp:
Jul 13, 2017, 3:47:48 PM (7 years ago)
Author:
davea

     1= Keywords =
     2
     3This document describes a framework for assigning keywords,
     4such as science area and location, to jobs and projects.
     5This can be used for several purposes:
     6
     7 * Client GUIs can show volunteers what kinds of jobs they're running.
     8 * As part of an account manager that lets volunteers sign up
     9   for science areas rather than specific projects
     10   (I'm currently working on one of these).
     11 * To show project attributes in the project list on the BOINC web site
     12   (we currently show attributes in an ad-hoc way).
     13
     14There are lots of other potential uses.
     15
     16To make this work, the BOINC community needs to agree on
     17
     18 * A structure for the set of keywords.
     19 * An authoritative set of keywords. I propose that the BOINC PMC be in charge of this,
     20  possibly creating a committee for this purpose.
     21
     22== Goals ==
     23
     24 * Keep things as simple as possible.
     25   We don't need to create the ultimate taxonomy of science.
     26 * Make it possible to have a very simple UI for volunteer keyword preferences,
     27   e.g. a few high-level keywords with yes/no/maybe buttons.
     28 * Make it possible to have a higher-resolution UI,
     29   e.g. research for a particular type of cancer.
     30
     31== Structure ==
     32
     33I propose structuring keywords as follows:
     34
     35'''Category''': what property the keyword refers to; I suggest
     36 * '''Science Area''': what kind of research is being done.
     37 * '''Location''': where (continent, country, institution) the researcher is located.
     38 * Another orthogonal attribute is ownership and accessibility of results.
     39   Some volunteers don't want to support for-profit research.
     40   But this is tricky; there are gray areas such as academic research for which
     41   a corporation has right of first refusal for licensing the results.
     42
     43'''Level''': 0, 1, 2.
     44Level 0 is most general (e.g. 'Physics' or 'Europe').
     45
     46'''Hierarchy''': the relationship between level n and n+1 keywords.
     47I propose a strict hierarchy:
     48each level n+1 keyword is the child of a single level n keyword.
     49 * Advantage: this simplifies the conceptual model and the user interface.
     50 * Disadvantage: it can't represent, for example, that a level 1 keyword like "Gravitational waves"
     51   is associated with both "Physics" and "Astronomy".
     52   But I don't think this matters.
     53   If a volunteer wants to support GW research and doesn't find it in one place,
     54   they'll look in the other.
     55
     56Each keyword has
     57 * an integer ID, which never changes, and is used to identify the keyword
     58   in job, project, and preferences lists.
     59 * short and long textual descriptions; these can change over time.
     60   We'll figure out a way to make them translatable.
     61 * create time, mod time, and delete time.
     62
     63The list of keywords and all their properties will be exported
     64by the BOINC web site as an XML file.
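
As a rough illustration of the keyword properties listed above, here is a minimal sketch of one possible in-memory representation. The field names (`parent_id`, `short_desc`, `long_desc`, and so on), the use of Python, and the sample IDs are all assumptions made for illustration; the proposal does not fix a schema or an XML layout.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Keyword:
    id: int                    # stable integer ID; never changes
    category: str              # e.g. "Science Area" or "Location"
    level: int                 # 0 is the most general level
    parent_id: Optional[int]   # None at level 0; strict hierarchy means one parent
    short_desc: str            # may change over time
    long_desc: str             # may change over time
    create_time: float = 0.0
    mod_time: float = 0.0
    delete_time: Optional[float] = None  # None while the keyword is live


def ancestors(kw_id: int, keywords: Dict[int, Keyword]) -> List[int]:
    """Return the IDs of all ancestors of kw_id, nearest parent first."""
    out = []
    parent = keywords[kw_id].parent_id
    while parent is not None:
        out.append(parent)
        parent = keywords[parent].parent_id
    return out


# Hypothetical IDs; no authoritative ID assignment exists yet.
KEYWORDS = {
    1: Keyword(1, "Science Area", 0, None, "Biology and medicine", ""),
    2: Keyword(2, "Science Area", 1, 1, "Disease research", ""),
    3: Keyword(3, "Science Area", 2, 2, "Cancer", ""),
}
```

Because each keyword has exactly one parent, walking up the tree is a simple loop; a multi-parent scheme would need a graph traversal instead.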
     65
     66== Keyword example ==
     67
     68(not complete: just to show the idea; indentation shows level)
     69
     70{{{
     71Science Area
     72   Astronomy
     73      SETI
     74      Pulsars
     75      Gravitational waves
     76      Cosmology
     77   Physics
     78      Particle physics
     79      Nanoscience
     80   Biology and medicine
     81      Drug design
     82      Protein research
     83      Genetics and phylogeny
     84      Disease research
     85         Diabetes
     86         Cancer
     87            Prostate cancer
     88            Breast cancer
     89   Mathematics and Computer Science
     90   Artificial Intelligence and Cognitive Science
     91
     92Location
     93   Europe
     94      Germany
     95         AEI
     96   Asia
     97   Australia
     98   The Americas
     99      United States
     100         UC Berkeley
     101         Purdue
     102}}}
     103
     104== Project and job attributes ==
     105
     106Each project can have a set of keywords.
     107For each keyword there is an associated "work fraction":
     108an estimate of the fraction of the project's work that has that keyword.
     109
     110Each job can have an associated set of keywords.
     111Note: keywords need to be at the job level, not app, because VM-based projects
     112can use a single BOINC app for all their jobs.
     113
     114If a project has a keyword with work fraction 1,
     115that keyword is implicitly associated with all the project's jobs.
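
The implicit-association rule could be sketched as follows. This assumes project keywords are held as a map from keyword ID to work fraction; the function name and data shapes are hypothetical.

```python
from typing import Dict, Set


def effective_job_keywords(job_keywords: Set[int],
                           project_fractions: Dict[int, float]) -> Set[int]:
    """Return the job's own keywords plus every project keyword
    whose work fraction is 1 (i.e. it applies to all of the project's jobs)."""
    implicit = {kw for kw, frac in project_fractions.items() if frac >= 1.0}
    return set(job_keywords) | implicit
```

For example, a job tagged with keyword 5 in a project whose fractions are `{1: 1.0, 2: 0.3}` would be treated as having keywords 1 and 5.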
     116
     117== Volunteer preferences ==
     118
     119A volunteer can specify (e.g. via an account manager) a set of "preferences",
     120which is a map from keywords to [yes, no, maybe].
     121
     122"no" means don't send jobs with that keyword.
     123
     124"yes" means preferentially send jobs with that keyword.
     125
     126A "no" for a level n keyword overrides a "yes" for any of its descendant keywords.
     127
     128If a project has a keyword with work fraction 1,
     129and the volunteer has "no" for that keyword,
     130the volunteer should not be attached to that project.
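
One way these rules might be resolved in code is sketched below. It assumes preferences are stored as a map from keyword ID to a string, and that the hierarchy is available as a child-to-parent map; both are hypothetical representations. It also assumes that an inherited "no" counts when deciding whether to attach, and that "yes" is not inherited from ancestors; the text above does not settle either point.

```python
from typing import Dict, Optional

YES, NO, MAYBE = "yes", "no", "maybe"


def effective_pref(kw_id: int,
                   prefs: Dict[int, str],
                   parent: Dict[int, Optional[int]]) -> str:
    """Resolve a volunteer's preference for kw_id.

    A "no" on the keyword or on any ancestor wins; otherwise the
    keyword's own setting applies, defaulting to "maybe".
    """
    node: Optional[int] = kw_id
    while node is not None:
        if prefs.get(node) == NO:
            return NO
        node = parent.get(node)
    return prefs.get(kw_id, MAYBE)


def should_attach(project_fractions: Dict[int, float],
                  prefs: Dict[int, str],
                  parent: Dict[int, Optional[int]]) -> bool:
    """Return False if any keyword with work fraction 1 resolves to "no"."""
    return not any(
        frac >= 1.0 and effective_pref(kw, prefs, parent) == NO
        for kw, frac in project_fractions.items()
    )
```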
     131
     132Note: instead of ternary yes/no/maybe, we could have some sort of "research share" per keyword.
     133This would greatly complicate things; I don't think it's worth it.
     134
     135== Information flow ==
     136
     137 * An account manager reply can return a set of volunteer preferences,
     138   and sets of project keywords,
     139   both of which are stored by the client.
     140   They are deleted if the user detaches from the AM.
     141 * The client includes volunteer preferences in scheduler requests.
     142 * The job submission interfaces will be expanded to include job keywords;
     143   these will be stored in the DB result table.
     144 * Projects can export their keywords in get_project_config.php.
     145 * Project and job keywords will be included in GUI RPC replies,
     146   so that GUIs can show them.
     147
     148== Keywords and scheduling ==
     149
     150The BOINC scheduler's score-based algorithm will be augmented with a keyword component:
     151 * If a job has a keyword for which the volunteer has a "no" preference,
     152   the score is -1 (don't send).
     153 * For each job keyword for which the volunteer has a "yes" preference,
     154   increment the score.
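
The keyword component described above can be sketched as a small function. This is not the scheduler's actual code: it assumes preferences have already been resolved against the hierarchy, and the increment of 1 per "yes" keyword is an illustrative choice.

```python
from typing import Dict, Iterable


def keyword_score(job_keywords: Iterable[int], prefs: Dict[int, str]) -> int:
    """Keyword component of a job's score; -1 means "don't send"."""
    score = 0
    for kw in job_keywords:
        pref = prefs.get(kw, "maybe")  # keywords with no stated pref are "maybe"
        if pref == "no":
            return -1                  # a single "no" vetoes the job
        if pref == "yes":
            score += 1
    return score
```

A job with two "yes" keywords and one unlisted keyword would score 2; adding any "no" keyword makes it -1 regardless of the rest.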
     155
     156== Changes over time ==
     157
     158Keywords may be added, removed, or changed over time.
     159In terms of volunteer preferences, what should the semantics be?
     160E.g., suppose a new science area is added.
     161Should prefs default to "maybe" or "no"?
     162I propose:
     163 * Prefs default to "maybe";
     164 * Volunteers are informed ASAP that keywords have changed,
     165   and given a link to update their prefs accordingly.
     166
     167For example: AMs that support keyword prefs can keep a timestamp
     168of when each user updated their prefs.
     169If the mod time of the keyword set is later than this:
     170
     171 * When the user visits the AM web site, they're shown a message of the form
     172   "keywords have changed - please update your prefs".
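
The staleness check and the "maybe" default might look like this on the AM side. The function names and the use of numeric timestamps are assumptions; how an AM actually stores these values is not specified here.

```python
from typing import Dict, Iterable


def prefs_are_stale(keyword_set_mod_time: float,
                    prefs_update_time: float) -> bool:
    """True if the keyword set changed after the user last updated prefs."""
    return keyword_set_mod_time > prefs_update_time


def fill_defaults(prefs: Dict[int, str],
                  all_keyword_ids: Iterable[int]) -> Dict[int, str]:
    """Return a copy of prefs in which unseen keywords default to "maybe"."""
    filled = dict(prefs)
    for kw in all_keyword_ids:
        filled.setdefault(kw, "maybe")
    return filled
```

When `prefs_are_stale` returns true, the AM would show the "keywords have changed" message and send the corresponding notice to the client.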
     173 * A similar message is sent to the client as a notice.