No tasks being supplied

loris

Joined: 23 Mar 20
Posts: 4
Germany
Message 97404 - Posted: 9 Apr 2020, 12:00:59 UTC

I'm running 7.17.0 on a CentOS 7.8 cluster and submitting a job to a resource manager. The resource manager starts 'boinc' on a node and the connection to Rosetta@Home seems to be set up correctly. However, my boinc process is not provided with any tasks. The state information is shown below. Where else can I look to find out why no tasks are being provided?

$ boinccmd --get_state
======== Projects ========
1) -----------
   name: Rosetta@home
   master URL: http://boinc.bakerlab.org/rosetta/
   user_name: loris
   team_name: 
   resource share: 100.000000
   user_total_credit: 0.000000
   user_expavg_credit: 0.000000
   host_total_credit: 0.000000
   host_expavg_credit: 0.000000
   nrpc_failures: 0
   master_fetch_failures: 0
   master fetch pending: no
   scheduler RPC pending: no
   trickle upload pending: no
   attached via Account Manager: no
   ended: no
   suspended via GUI: no
   don't request more work: no
   disk usage: 0.000000
   last RPC: Thu Apr  9 09:28:43 2020

   project files downloaded: 0.000000
GUI URL:
   name: Message boards
   description: Correspond with other users on the Rosetta@home message boards
   URL: http://boinc.bakerlab.org/rosetta/forum_index.php
GUI URL:
   name: Your account
   description: View your account information
   URL: http://boinc.bakerlab.org/rosetta/home.php
GUI URL:
   name: Your tasks
   description: View the last week or so of computational work
   URL: http://boinc.bakerlab.org/rosetta/results.php?userid=<redacted>
   jobs succeeded: 0
   jobs failed: 0
   elapsed time: 0.000000
   cross-project ID: c052c0ca0d54136020e40742e96b0cbb

======== Applications ========

======== Application versions ========

======== Workunits ========

======== Tasks ========

======== Time stats ========
  now: 1586433457.156679
  on_frac: 1.000000
  connected_frac: -1.000000
  cpu_and_network_available_frac: 1.000000
  active_frac: 1.000000
  gpu_active_frac: 1.000000
  client_start_time: Thu Apr  9 07:57:46 2020

  previous_uptime: 0.000000
  session_active_duration: 21579.826771
  session_gpu_active_duration: 21579.826771
  total_start_time: Thu Apr  9 07:57:49 2020

  total_duration: 21579.826771
  total_active_duration: 21579.826771
  total_gpu_active_duration: 21579.826771
ID: 97404
Dave
Help desk expert

Joined: 28 Jun 10
Posts: 2516
United Kingdom
Message 97406 - Posted: 9 Apr 2020, 12:37:56 UTC - in response to Message 97404.  

Are you able to look at the event log to see what messages there are when work is requested?
ID: 97406
loris

Joined: 23 Mar 20
Posts: 4
Germany
Message 97408 - Posted: 9 Apr 2020, 12:49:44 UTC - in response to Message 97406.  
Last modified: 9 Apr 2020, 12:54:09 UTC

Where would I find the event log? I'm not running any manager, just 'boinc' from the command line.
ID: 97408
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15477
Netherlands
Message 97415 - Posted: 9 Apr 2020, 13:53:52 UTC - in response to Message 97408.  

You probably don't have any, unless you run the boinc client with the --redirectio flag. But on the off chance that changed lately (I haven't run Linux in ages), check for a stdoutdae.txt file. It's usually stored in the data directory, which may still be at /var/lib/boinc-client
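If no stdoutdae.txt turns up, the same event-log messages can usually be pulled from a running client with boinccmd --get_messages. The sketch below assumes boinccmd is on the PATH and is run from the directory the client was started in, so that it can pick up the RPC password file there. The function names and the grep pattern are made up for illustration, and the exact message wording may differ between client versions:

```shell
# Keep only event-log lines that look like scheduler requests or work fetch.
# The pattern is a guess at typical wording; widen it if nothing matches.
filter_work_messages() {
    grep -iE 'scheduler request|new tasks|no tasks|no work'
}

# Print the work-fetch related messages of the client running in directory $1.
# Assumes boinccmd is on PATH and the client was started in that directory.
show_work_messages() {
    ( cd "$1" && boinccmd --get_messages | filter_work_messages )
}
```

Called as show_work_messages "$JOB_DIR" from a second shell on the same node while the client is still running, this should show whether the client asked for work and what the project answered.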
ID: 97415
loris

Joined: 23 Mar 20
Posts: 4
Germany
Message 97417 - Posted: 9 Apr 2020, 14:05:46 UTC - in response to Message 97415.  

I'm not running with the --redirectio flag. Should I? I'm unclear about what exactly is meant by 'data directory'. The binaries are installed in a non-standard system path. In my home directory I have a subdir containing the auth token. The resource manager runs a script on a compute node which starts boinc in a dedicated, temporary directory and attaches to R@H using the auth token:

BOINC_DIR=~/programs/boinc

# Rosetta@home
URL='http://boinc.bakerlab.org/rosetta/'
source ${BOINC_DIR}/project_auth
AUTH=$BOINC_AUTH_ROSETTA

# create and switch to job directory
JOB_DIR=$(mktemp -p /scratch/${USER}/boinc -d)
cd ${JOB_DIR}

# start BOINC and ask to just run a single job and then exit
boinc --fetch_minimal_work --exit_after_finish --attach_project ${URL} ${AUTH}

rm -rf ${JOB_DIR}

This all seems to happen without error, but the boinc process receives no tasks.
ID: 97417
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15477
Netherlands
Message 97418 - Posted: 9 Apr 2020, 14:10:04 UTC - in response to Message 97417.  

Well, it's possible that Rosetta doesn't have an application for your Linux distro. You would probably do best to ask at their forums. BOINC merely manages the projects: it decides how much work to cache, when to run things, and so on. The actual calculations are done by project applications, so any trouble you have with them, including not getting any, is best raised at their forums: http://boinc.bakerlab.org/rosetta/forum_help_desk.php (I'm pointing you to their help desk forum, as you can post there without credit or RAC).
ID: 97418
Gary Roberts

Joined: 7 Sep 05
Posts: 130
Australia
Message 97510 - Posted: 11 Apr 2020, 22:41:04 UTC - in response to Message 97417.  

"This all seems to happen without error, but the boinc process receives no tasks."
You need to find the file Jord mentioned - stdoutdae.txt. According to your script, it should be in $JOB_DIR. During startup, the client should write all the startup messages and the response from Rosetta in that file. If the client exits for some reason, your final command would remove all the evidence so perhaps that's why you can't find it. Have you looked in the $JOB_DIR directory while the client is running?
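One way to keep that evidence is to copy the log files out of the job directory before the cleanup step. This is only a sketch, assuming the client writes stdoutdae.txt and stderrdae.txt into its working directory; the save_boinc_logs name and the destination directory are hypothetical:

```shell
# Copy any BOINC log files from a job directory into a directory that
# survives the cleanup. Files that do not exist are skipped.
# $1 = job directory, $2 = destination directory (hypothetical locations).
save_boinc_logs() {
    mkdir -p "$2" || return 1
    for f in stdoutdae.txt stderrdae.txt; do
        if [ -f "$1/$f" ]; then
            cp -p "$1/$f" "$2/" || return 1
        fi
    done
    return 0
}
```

In the script this would sit between the boinc line and the rm -rf, for example as save_boinc_logs "${JOB_DIR}" "${BOINC_DIR}/logs/$(basename "${JOB_DIR}")", where the logs subdirectory is an assumed location.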

The client is stored in $BOINC_DIR but you cd to $JOB_DIR and run it from there. You don't specify a PATH when launching the client and you don't copy the client into $JOB_DIR. I assume $BOINC_DIR must be in the path that your script has access to. The --redirectio option seems to be just for specifying a different filename (or path) than ./stdoutdae.txt and so shouldn't be needed. I'm not familiar with how you are running the client. I'm using Linux and I compile my own clients and run them from ~/BOINC. I have built and run 7.16.5 and there is nothing unusual about stdoutdae.txt. It continues to be used in the BOINC directory where the client is installed.

The query I would have about your script concerns the use of both --fetch_minimal_work and --exit_after_finish. According to the documentation, --exit_after_finish is some sort of debugging option, whereas the usual option to use with --fetch_minimal_work is --exit_when_idle.

That sounds like a better fit, so perhaps you should try that. If you still get no work, you need to examine the startup messages in stdoutdae.txt before $JOB_DIR gets deleted.
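The changed launch line could look roughly like this. It is a sketch only: run_boinc_once and the BOINC_BIN override are made-up names, and how --exit_when_idle behaves when the project has no work to send has not been verified here:

```shell
# Start the client for a one-off run: fetch a minimal amount of work and
# exit once the client is idle again.
# $1 = project URL, $2 = account key.
# BOINC_BIN is a hypothetical override for the client binary (default: boinc).
run_boinc_once() {
    "${BOINC_BIN:-boinc}" --fetch_minimal_work --exit_when_idle \
        --attach_project "$1" "$2"
}
```

In the script, the existing boinc line would then become run_boinc_once "${URL}" "${AUTH}".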
Cheers,
Gary.
ID: 97510
ProDigit

Joined: 8 Nov 19
Posts: 718
United States
Message 97541 - Posted: 12 Apr 2020, 23:00:08 UTC

On the BOINC stats page, Rosetta shows that it regularly runs low on tasks.
For this reason I always load 3 CPU projects and 3 GPU projects, just in case one project hasn't got any new WUs.
ID: 97541
loris

Joined: 23 Mar 20
Posts: 4
Germany
Message 97749 - Posted: 16 Apr 2020, 12:34:47 UTC

Maybe there were indeed just no tasks. I thought I had checked the server status, but maybe not. Anyhow, the server status page currently says that there are tasks ready to send and my BOINC client seems to have received one and is now working on it. Thanks for the pointers.
ID: 97749


Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.