Any way to manually change the deadline of a task?


Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 4478
United Kingdom
Message 101970 - Posted: 5 Dec 2020, 12:29:10 UTC - in response to Message 101969.  
Last modified: 5 Dec 2020, 12:39:38 UTC

Need to check one more thing. I've joined up, and my initial state is

So, all four are running at once. That's because the project has a preference

I'll switch it to 4 for the next fetch, and try again. Looks like I'll have time to go and fetch my newspaper before we get there...

Edit - this is the one I got.

Proth Prime Search LLR (PPS)
k·2^n+1 for k<1200

Supported platforms:
Windows: 32 bit, 64 bit
Linux: 32 bit, 64 bit
Mac: 64 bit
Multi-threading is supported but is NOT recommended. Click here to set the maximum number of threads.
Uses fast proof tasks so no double check tasks are needed. Everyone is "first"!
Deadline: 4 days (up to 30 days)

Recent average CPU time: 1:06:54
Which one were you running?
ID: 101970 · Report as offensive
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 4478
United Kingdom
Message 101972 - Posted: 5 Dec 2020, 12:40:56 UTC
Last modified: 5 Dec 2020, 12:46:27 UTC

Ah - you'd answered my question (edit) already. I'll switch to that.

But now I see

I'll try LLR (SGS) anyway.
ID: 101972 · Report as offensive
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 4478
United Kingdom
Message 101974 - Posted: 5 Dec 2020, 15:22:25 UTC - in response to Message 101973.  

No, I changed it manually. It was set to 1 when I first attached.

For reference, here are the first two work requests for a new computer on a new project:

05/12/2020 12:16:25 | | [work_fetch] target work buffer: 864.00 + 8640.00 sec
05/12/2020 12:16:25 | | [work_fetch] shortfall 38016.00 nidle 4.00 saturated 0.00 busy 0.00
05/12/2020 12:17:13 | PrimeGrid | [work_fetch] request: CPU (1.00 sec, 0.00 inst) Intel GPU (1.00 sec, 0.00 inst)
05/12/2020 12:17:15 | PrimeGrid | [sched_op] estimated total CPU task duration: 23626 seconds
05/12/2020 12:17:25 | PrimeGrid | [work_fetch] set_request() for CPU: ninst 4 nused_total 1.00 nidle_now 3.00 fetch share 1.00 req_inst 3.00 req_secs 28512.00
05/12/2020 12:17:26 | PrimeGrid | Requesting new tasks for CPU
05/12/2020 12:17:26 | PrimeGrid | [sched_op] CPU work request: 28512.00 seconds; 3.00 devices
05/12/2020 12:17:27 | PrimeGrid | Scheduler request completed: got 3 new tasks
05/12/2020 12:17:27 | PrimeGrid | [sched_op] estimated total CPU task duration: 70879 seconds
Nothing wrong with that. The estimated duration matches the speed and size of those first few tasks.
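
The numbers in that log are internally consistent. A rough sketch of the arithmetic follows, assuming (this is a reading of the log lines, not of the client source) that the seconds requested are simply the total work buffer multiplied by the number of idle CPU instances:

```python
# Values copied from the [work_fetch] log lines above.
min_buffer_secs = 864.00      # first term of "target work buffer"
extra_buffer_secs = 8640.00   # second term of "target work buffer"
idle_at_start = 4             # "nidle 4.00" in the shortfall line
idle_at_request = 3           # "nidle_now 3.00" / "req_inst 3.00"

# Assumption: request size = total buffer * idle instances.
total_buffer_secs = min_buffer_secs + extra_buffer_secs
shortfall_secs = total_buffer_secs * idle_at_start
req_secs = total_buffer_secs * idle_at_request

print(f"buffer    {total_buffer_secs:.2f} sec")
print(f"shortfall {shortfall_secs:.2f} sec")
print(f"req_secs  {req_secs:.2f} sec")
```

Under that assumption the sketch reproduces both the logged shortfall (38016.00) and the logged req_secs (28512.00).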

But after three hours, we're at

    <checkpoint_elapsed_time>10853.239532</checkpoint_elapsed_time>
    <fraction_done>0.050201</fraction_done>
I knew this machine was slow, but not that slow. At this rate, it'll finish at about midnight Monday!
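
For anyone wanting to repeat that extrapolation, here is a minimal sketch. It assumes the task progresses at a constant rate, which is only a guess this early in a run; the function name is made up for illustration.

```python
def estimate_total_secs(elapsed_secs, fraction_done):
    """Linear extrapolation of total runtime from progress so far.

    Assumes a constant rate of progress (an assumption, not a BOINC rule).
    """
    if fraction_done <= 0:
        raise ValueError("fraction_done must be positive")
    return elapsed_secs / fraction_done


# Values copied from the checkpoint fields quoted above.
elapsed = 10853.239532
fraction = 0.050201

total = estimate_total_secs(elapsed, fraction)
remaining = total - elapsed
print(f"total     {total / 3600:.1f} hours")
print(f"remaining {remaining / 3600:.1f} hours")
```

That comes out at roughly 60 hours in total, against the 23626 seconds (about 6.5 hours) the scheduler estimated for the first task.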
ID: 101974 · Report as offensive
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 4478
United Kingdom
Message 101979 - Posted: 5 Dec 2020, 16:58:19 UTC - in response to Message 101978.  

Yes, I've got all that now. I was a bit pre-occupied with getting my test started, fetching the paper, and writing my reply to David. And after all that, I think I need a bit of a lie-down...

We also have to factor in <duration_correction_factor>7.767842 - most projects have stopped using that. But at least they don't use APR on top of it. I'll do the maths after I've had a break - at least, it's all in the sched files, for which many thanks.
ID: 101979 · Report as offensive
robsmith
Volunteer tester
Help desk expert

Joined: 25 May 09
Posts: 962
United Kingdom
Message 101981 - Posted: 5 Dec 2020, 17:56:44 UTC

05/12/2020 17:35:17 |  | Fetching configuration file from http://www.primegrid.com/get_project_config.php
05/12/2020 17:35:49 | PrimeGrid | Fetching scheduler list
05/12/2020 17:35:50 | PrimeGrid | Master file download succeeded
05/12/2020 17:35:59 | PrimeGrid | Sending scheduler request: Project initialization.
05/12/2020 17:35:59 | PrimeGrid | Requesting new tasks for CPU and NVIDIA GPU
05/12/2020 17:36:00 | PrimeGrid | Scheduler request completed: got 1 new tasks
05/12/2020 17:36:00 | PrimeGrid | Project requested delay of 7 seconds
05/12/2020 17:36:02 | PrimeGrid | Started download of tpsieve_0.3.10d_windows64.exe
05/12/2020 17:36:02 | PrimeGrid | Started download of stat_primegrid.png
05/12/2020 17:36:02 | PrimeGrid | Started download of primegrid_slideshow_00.png
05/12/2020 17:36:03 | PrimeGrid | Finished download of tpsieve_0.3.10d_windows64.exe
05/12/2020 17:36:03 | PrimeGrid | Finished download of stat_primegrid.png
05/12/2020 17:36:03 | PrimeGrid | Finished download of primegrid_slideshow_00.png
05/12/2020 17:36:29 | PrimeGrid | Starting task pps_sr2sieve_137992346_0
05/12/2020 17:36:34 | PrimeGrid | Sending scheduler request: To fetch work.
05/12/2020 17:36:34 | PrimeGrid | Requesting new tasks for CPU
05/12/2020 17:36:35 | PrimeGrid | Scheduler request completed: got 11 new tasks
05/12/2020 17:36:35 | PrimeGrid | Project requested delay of 7 seconds
05/12/2020 17:36:38 | PrimeGrid | Starting task pps_sr2sieve_137989313_3
05/12/2020 17:36:38 | PrimeGrid | Starting task pps_sr2sieve_137992342_0
05/12/2020 17:36:38 | PrimeGrid | Starting task pps_sr2sieve_137992721_0

(sorry, I didn't have the right debug messages selected before connecting to Prime Grid and the work arrived)

OK, so this is a "new" machine for PrimeGrid. It has 4 cores available for use, my cache is currently set at 1+0.01 days, and I'm using one core per task.
As delivered, the estimated run times are of the order of 3 hrs 50 minutes.
Within seconds the estimated time jumps to several days, before rapidly dropping back to about 20 hours. This figure correlates pretty well with the time I get by looking at the elapsed time and percent complete (albeit that the elapsed time is only a few minutes and the percent complete is about 0.5%, so there could be a large margin for error there).
Now, as far as I'm aware, the server side of BOINC calculates the amount of work to send based on figures that will give you roughly the right number of tasks, but if the "estimated" performance figures are way off from reality then the number of tasks sent out will be wrong.

This sort of error might explain what Peter is seeing - the servers think his computer is about ten times faster than it really is, so send him work based on that figure; as soon as work gets underway the expected runtimes get adjusted upwards and he sees far too much work in hand.

I'm going to let these tasks run through then get another batch set to use two cores each. This may take some time....
ID: 101981 · Report as offensive
robsmith
Volunteer tester
Help desk expert

Joined: 25 May 09
Posts: 962
United Kingdom
Message 101982 - Posted: 5 Dec 2020, 18:46:32 UTC

Right, 34 minutes into a task, 2.9% complete, which gives a time left of 19.5hrs - and that agrees with the remaining time given by BOINC on my computer. Remember this task immediately before starting had an estimated runtime of 3.67hrs. I would posit that Prime Grid are using some rather strange figures in getting to their flops guess - which is what is used to give the "anticipated" runtime.....

(Given my cache settings, and what I am now seeing I would have expected no more than four tasks in the initial batch, not the twelve that actually arrived - I'm glad I set NNT very soon after crunching the first batch commenced. However, given that Prime Grid's figures gave an expected runtime of ~4hours the initial delivery of twelve tasks is correct according to their end of the process.)
ID: 101982 · Report as offensive
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 4478
United Kingdom
Message 101983 - Posted: 5 Dec 2020, 19:01:52 UTC - in response to Message 101982.  

I would posit that Prime Grid are using some rather strange figures in getting to their flops guess.
I don't think it's that. For my first group of four, the flops estimate in the <workunit> sent by the server exactly matched (down to the units place, and I think on into the micro-flop) the Benchmark figure calculated by the computer. It's a straight copy.

The dodgy one is probably the fpops_est for the <result>. It matters, but projects have a habit of paying very little attention to it, especially when it should be changing. PrimeGrid will be searching for ever-larger primes, and naively I'd suggest that the search will take ever longer. fpops_est should be growing to keep pace, but they may not have bothered.
ID: 101983 · Report as offensive
robsmith
Volunteer tester
Help desk expert

Joined: 25 May 09
Posts: 962
United Kingdom
Message 101984 - Posted: 5 Dec 2020, 19:12:10 UTC

Hmmmm - if one did the calculation of amount of work delivered in the first batch immediately on receipt then one would expect to see a 1:1 relationship between the two figures, BUT soon after a task starts the estimate for remaining time jumps to something much higher, and thus the hours of work in the cache goes up significantly, which is exactly what Peter is complaining about. I think you might have struck on something with suggesting that the ever increasing complexity involved in finding the next prime is increasing the real runtime (flops or hours) while the project has not adjusted for this (or hasn't adjusted correctly).
ID: 101984 · Report as offensive
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 4478
United Kingdom
Message 101985 - Posted: 5 Dec 2020, 19:37:42 UTC

Checking...

1) Benchmark (from client_state / host_info)
2) Task speed (from sched_reply / app_version)

    <p_fpops>2084811827.956989</p_fpops>
      <flops>2084811827.956989</flops>
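
A minimal sketch of the same comparison in code. The two strings stand in for the relevant elements of the files named above; locating and reading the real files is left out, and the variable names are only illustrative.

```python
import xml.etree.ElementTree as ET

# Fragments copied from the two values shown above.
host_info_fragment = "<p_fpops>2084811827.956989</p_fpops>"
app_version_fragment = "<flops>2084811827.956989</flops>"

p_fpops = float(ET.fromstring(host_info_fragment).text)
flops = float(ET.fromstring(app_version_fragment).text)

# If the server simply copies the benchmark, the two match exactly.
print("straight copy" if p_fpops == flops else "values differ")
print(f"{p_fpops / 1e9:.3f} GFLOPS")
```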
ID: 101985 · Report as offensive
robsmith
Volunteer tester
Help desk expert

Joined: 25 May 09
Posts: 962
United Kingdom
Message 101986 - Posted: 5 Dec 2020, 21:12:32 UTC

Interesting.....
I've got the value for p_fpops in the expected place, but there is no value for flops in the place you suggest. There is one in sched_request, though, and the values are the same.
Well, that shows that the value for the computer speed is being sent to the server correctly - but why am I not seeing it in sched_reply as you say (that would be the value returned, having possibly been used, by the server)? Can I plead confusion?
ID: 101986 · Report as offensive
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 4478
United Kingdom
Message 101987 - Posted: 5 Dec 2020, 21:35:41 UTC

Now, doing a similar calculation for Peter's 24-core MT tasks, using the figures from sched_request and sched_reply.

The basic calculation is "size / speed", or "flops / (flops / sec)", = sec

size is <rsc_fpops_est>6525083980042.000000
speed is <flops>70852542799.840393

Which is bloody fast for a single core (over 70 Giga-Flops), so we'll assume it's for the whole CPU, 24 cores in parallel.

So duration is about 92.09 cpu-secs
or about 2,210.25 core-secs

The client applies a DCF of 7.767842, so the task is estimated at about 715.37 cpu-secs, or 00:11:55. QED

The total 170 job work-fetch turns out to be

170 tasks * 92ish sec/task * 7ish DCF = about 121,613 cpu-sec

The log says 121823 seconds, a little more. That's to take account of
<on_frac>0.998370
<active_frac>0.999910

- the client takes account of those little breaks in service.
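
The whole chain can be sketched in a few lines. All inputs are the figures quoted in this post; dividing by on_frac and active_frac is an assumption about how the client applies them, made here because it reproduces the logged figure, not something taken from the client source.

```python
# Figures quoted above (from sched_request / sched_reply and the log).
rsc_fpops_est = 6525083980042.0   # task "size"
flops = 70852542799.840393        # app_version "speed"
ncpus = 24                        # threads per MT task
dcf = 7.767842                    # duration_correction_factor
on_frac = 0.998370
active_frac = 0.999910
n_tasks = 170

raw_secs = rsc_fpops_est / flops   # size / speed
core_secs = raw_secs * ncpus       # same work expressed per core
est_secs = raw_secs * dcf          # what the client displays
mins, secs = divmod(int(est_secs), 60)

batch_secs = n_tasks * est_secs
# Assumption: availability fractions are applied by division.
adjusted_secs = batch_secs / (on_frac * active_frac)

print(f"raw      {raw_secs:.2f} sec ({core_secs:.2f} core-sec)")
print(f"with DCF {est_secs:.2f} sec = 00:{mins:02d}:{secs:02d}")
print(f"batch    {batch_secs:.0f} sec, adjusted {adjusted_secs:.0f} sec")
```

The adjusted figure lands within a second of the 121823 seconds in the log.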
ID: 101987 · Report as offensive
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 4478
United Kingdom
Message 101988 - Posted: 5 Dec 2020, 21:41:33 UTC - in response to Message 101986.  

Interesting.....
I've got the value for p_fpops in the expected place, but there is no value for flops in the place you suggest. There is one in sched_request, though, and the values are the same.
Well, that shows that the value for the computer speed is being sent to the server correctly - but why am I not seeing it in sched_reply as you say (that would be the value returned, having possibly been used, by the server)? Can I plead confusion?
Peter got

<app_version>
    <app_name>llrTPS</app_name>
    <version_num>804</version_num>
    <api_version>7.11.0</api_version>
<file_ref>
    <file_name>cllr64.3.8.23.exe</file_name>
    <open_name>primegrid_cllr.exe</open_name>
    <copy_file/>
</file_ref>
<file_ref>
    <file_name>llr.ini.6.07</file_name>
    <open_name>llr.ini</open_name>
    <copy_file/>
</file_ref>
<file_ref>
    <file_name>llr_wrapper_8.04_windows_x86_64.exe</file_name>
    <main_program/>
</file_ref>
    <is_wrapper/>
    <platform>windows_x86_64</platform>
    <plan_class>mt</plan_class>
    <avg_ncpus>24.000000</avg_ncpus>
    <flops>70852542799.840393</flops>
    <cmdline> --nthreads 24</cmdline>
</app_version>
I don't think the server always plays it back if it thinks you have the app already, but Peter sent this in the request:

<app_version>
    <app_name>llrTPS</app_name>
    <version_num>804</version_num>
    <platform>windows_x86_64</platform>
    <avg_ncpus>24.000000</avg_ncpus>
    <flops>70852542799.840393</flops>
    <plan_class>mt</plan_class>
    <api_version>7.11.0</api_version>
    <cmdline>--nthreads 24</cmdline>
    <is_wrapper/>
</app_version>
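
To pull the speed figure out of a block like that, something along these lines should do. It is only a sketch: the XML is a trimmed copy of the request fragment above, and treating flops as a whole-CPU figure to be divided by avg_ncpus is an interpretation, not something the files state.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <app_version> block from the request above.
app_version_xml = """
<app_version>
    <app_name>llrTPS</app_name>
    <version_num>804</version_num>
    <avg_ncpus>24.000000</avg_ncpus>
    <flops>70852542799.840393</flops>
    <plan_class>mt</plan_class>
</app_version>
"""

root = ET.fromstring(app_version_xml.strip())
flops = float(root.findtext("flops"))
ncpus = float(root.findtext("avg_ncpus"))

# Interpretation: flops covers all threads, so divide for a per-core figure.
per_core = flops / ncpus
print(f"{root.findtext('app_name')}: {flops / 1e9:.2f} GFLOPS total, "
      f"{per_core / 1e9:.2f} GFLOPS per core over {ncpus:.0f} cores")
```

The per-core result is a little under 3 GFLOPS, a far more plausible number for one core than 70.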
ID: 101988 · Report as offensive
robsmith
Volunteer tester
Help desk expert

Joined: 25 May 09
Posts: 962
United Kingdom
Message 102003 - Posted: 6 Dec 2020, 17:26:35 UTC

Have a look at the figures I gave:
Initially the tasks were reporting a runtime of just under four hours. There were twelve tasks using one core each. So that 48 hours' worth of work was about right.
Moments after starting, the expected runtime leapt to about twenty hours, a five-fold increase, and so my cache was about five times too big. OK, not as big as yours, but certainly heading in the right direction for what you've been looking at.
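
Put as arithmetic, using the rounded figures from the two lines above (a sketch only; the variable names are illustrative):

```python
n_tasks = 12
est_hours_on_arrival = 4.0    # "just under four hours", rounded
est_hours_running = 20.0      # "about twenty hours"

cache_on_arrival = n_tasks * est_hours_on_arrival   # hours of work as delivered
cache_running = n_tasks * est_hours_running         # hours once estimates adjust
inflation = cache_running / cache_on_arrival

print(f"{cache_on_arrival:.0f} h on arrival, {cache_running:.0f} h once running, "
      f"{inflation:.0f}x too much")
```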
My hypothesis is that PrimeGrid are using the correct figure for the performance of my computer when working out how many tasks to send me, but is not using the right figure for the actual amount of work that the tasks will require.
Well, that theory covers what I've seen, and to an extent what you've seen, but I'm not happy that it covers what happens while the task is running, as that is something strange. During execution, is a calculation performed to produce a more accurate estimate of the amount of work required, and so adjust the runtime estimate?
Also I'm still working on one core per task, but you are using four cores - I should find out what happens in a couple of days once I've chewed through the single core tasks.
ID: 102003 · Report as offensive

Copyright © 2021 University of California. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.