Android, getting too many wu's to handle in time

Drago75 Joined: 29 Sep 20 Posts: 22	Message 107421 - Posted: 17 Mar 2022, 12:50:08 UTC
Shortly before WCG went on hold I increased the work cache on my phones to 7 days. Once there was no more new work I reduced the cache to 0.1 days again. All 16 phones with BOINC 7.18.1 installed showed the same behaviour: the BOINC app crashed after the change, I had to do it again, it crashed again, and the phones offered to close the app and restart it. After the restart the cache showed 0.1 days correctly, but when asking for work from other projects like Einstein and Universe I got hundreds of wu's on each phone, which would correspond more to the previous seven days of cache. On Einstein that leads to a lot of aborted wu's because they were not started in time. On one phone I reinstalled BOINC but the behaviour is the same. Does anyone know why that is?

Dave Help desk expert Joined: 28 Jun 10 Posts: 2538	Message 107423 - Posted: 17 Mar 2022, 13:38:12 UTC
I have Universe@Home running on my phone (BOINC 7.18.1). I just use one core, being somewhat risk averse and not wanting to fry the battery of the phone I use as a phone. I have the work buffer set to 0.1 days with 0.5 days as additional buffer, and I never have more than one task running and one in the queue. Haven't tried Einstein. Will try and see what happens. Just connected Einstein: one Universe task running, one Einstein in queue. You did cut down the additional work buffer as well?

Drago75 Joined: 29 Sep 20 Posts: 22	Message 107424 - Posted: 17 Mar 2022, 14:02:39 UTC - in response to Message 107423. Last modified: 17 Mar 2022, 14:04:44 UTC
Hey Dave. Yes, the additional work buffer is 0. Until February I never had problems with changes in the work buffer. What may have happened is that the last Android update provided by Samsung has somehow led to the BOINC buffer settings not being taken into account anymore. This is strange behaviour and it happens on ALL of my devices.

Drago75 Joined: 29 Sep 20 Posts: 22	Message 107425 - Posted: 17 Mar 2022, 14:20:53 UTC - in response to Message 107424.
Update: This also happens on my Odroid running Android and BOINC 7.18.1. This device hasn't received any OS update so far, yet BOINC now crashes if I try to change the work buffer setting. After a couple of tries it finally accepts the new settings, but they seem to be ignored.

Jord Volunteer tester Help desk expert Joined: 29 Aug 05 Posts: 15480	Message 107426 - Posted: 17 Mar 2022, 14:27:55 UTC
Sounds to me like a bug in the work buffer code, and since BOINC is crashing it's not one we can easily solve without the developer's help. So please post the problem on https://github.com/BOINC/boinc/issues

Dr Who Fan Joined: 10 May 07 Posts: 1350	Message 107429 - Posted: 17 Mar 2022, 15:22:45 UTC
Since the WCG scheduler is missing / offline... Just a wild guess: could BOINC be trying to retrieve the default "web based" scheduler settings from WCG, thus causing the crash? Does he have local BOINC override settings on his devices?

Richard Haselgrove Volunteer tester Help desk expert Joined: 5 Oct 06 Posts: 5081	Message 107431 - Posted: 17 Mar 2022, 16:40:25 UTC - in response to Message 107424.
Quote: Hey Dave. Yes, the additional work buffer is 0. Until February I never had problems with changes in the work buffer. What may have happened is that the last Android update provided by Samsung has somehow led to the BOINC buffer settings not being taken into account anymore. This is strange behaviour and it happens on ALL of my devices.
That sounds like a very unlikely explanation, but odd things have happened in computing ... I have a 4-core Samsung tablet which was running WCG and Einstein 50-50: since the WCG pause, it's been doing exclusively Einstein work. I haven't updated the operating system for a while (it's currently on Android 4.4.177-22626479 (Android 11)), but I'll keep an eye on it when I next update it. At the moment, it's keeping '4 running and 4 ready to start', as per instructions.
More interesting question: how does the excessive workload arrive? Is it a single work request, resulting in a huge spike of work, or a series of separate work requests, again and again and again, spaced 1 minute apart (the Einstein server backoff setting)? I've seen the latter behaviour (endless, repeated requests) on Windows machines, but have never been able to track it down: it stops if you change any of the Event Log settings.
[There's a known problem if you use the max_concurrent setting in an app_config file (similar but subtly different). But you won't have installed an app_config under Android, will you?]

Drago75 Joined: 29 Sep 20 Posts: 22	Message 107432 - Posted: 17 Mar 2022, 19:15:40 UTC - in response to Message 107431.
Hey Richard. No, I didn't write an app_config file under Android. :-) The behaviour you describe is pretty accurate. After the first batch of wu's has downloaded, the next one arrives about a minute later. That keeps going until my S8's are loaded with 700 wu's and my S9's with around 1000. It happens with Einstein and Universe. Have you tried changing the work cache on your tablet to see if it happens to you, too? Ok, I will have a look at the event log settings to see if that helps...

Drago75 Joined: 29 Sep 20 Posts: 22	Message 107433 - Posted: 17 Mar 2022, 19:27:41 UTC - in response to Message 107429. Last modified: 17 Mar 2022, 19:27:49 UTC
Quote: Since the WCG scheduler is missing / offline... Just a wild guess: could BOINC be trying to retrieve the default "web based" scheduler settings from WCG, thus causing the crash? Does he have local BOINC override settings on his devices?
Hey Dr Who Fan! I don't use local settings on my devices as I wouldn't know how to. You got me thinking: it started happening around the moment when WCG stopped sending work, so maybe it is related to them. I just changed some settings of the event log like Richard proposed. If that turns out to be a dead end I will detach from WCG on one device to see if that changes things. Thank you guys!

Richard Haselgrove Volunteer tester Help desk expert Joined: 5 Oct 06 Posts: 5081	Message 107434 - Posted: 17 Mar 2022, 19:44:29 UTC - in response to Message 107432.
Check the current event log first, if you haven't changed it yet. You should see 'Starting scheduler request' every minute-and-a-bit. If you could set <sched_op_debug> without breaking the behaviour, you should see exactly how many seconds of work you're requesting each time.
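For anyone wanting to try the <sched_op_debug> flag mentioned above: on desktop clients, log flags normally live in a cc_config.xml file in the BOINC data directory (or can be toggled from the Manager's Event Log options). Below is a minimal Python sketch that only generates the text of such a file. The helper name build_cc_config is made up for illustration, and whether the Android client lets you place this file by hand is an open question that this thread does not settle.

```python
import xml.etree.ElementTree as ET


def build_cc_config(flags):
    """Return cc_config.xml text that switches on the given log flags.

    `flags` is a list of log-flag names, e.g. ["sched_op_debug"].
    This helper is hypothetical; it only builds the XML text.
    """
    root = ET.Element("cc_config")
    log_flags = ET.SubElement(root, "log_flags")
    for name in flags:
        # Each flag is an element whose text is 1 (on) or 0 (off).
        ET.SubElement(log_flags, name).text = "1"
    return ET.tostring(root, encoding="unicode")


config_text = build_cc_config(["sched_op_debug"])
print(config_text)
```

The printed text would be saved as cc_config.xml and picked up after the client re-reads its config files or restarts.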

Dr Who Fan Joined: 10 May 07 Posts: 1350	Message 107435 - Posted: 17 Mar 2022, 20:11:56 UTC - in response to Message 107433. Last modified: 17 Mar 2022, 20:17:06 UTC
Quote: Hey Dr Who Fan! .... You got me thinking: it started happening around the moment when WCG stopped sending work, so maybe it is related to them. I just changed some settings of the event log like Richard proposed. If that turns out to be a dead end I will detach from WCG on one device to see if that changes things. Thank you guys!
No need to detach from WCG. Just go to Einstein or Universe, open your computing preferences for editing (no need to change anything), then save them. Example: https://universeathome.pl/universe/prefs.php?subset=global
Edit to add: this makes the project whose settings you saved the new default source for the global settings.

Drago75 Joined: 29 Sep 20 Posts: 22	Message 107436 - Posted: 17 Mar 2022, 21:02:51 UTC - in response to Message 107435.
Richard, the device I checked made a scheduler request every 70 seconds, but this may also have been because I had deactivated new work four days ago to get the work queue down. It was an Einstein device. I installed Universe and it downloaded 67 wu's at once, with no new requests in the last 10 minutes. That leaves the device sitting with 167 wu's in the queue in total, so maybe that is the new target now. Maybe I have to deplete the queue completely before asking for new work.
Who Fan, I did as you said: I edited and saved the online settings of those two projects and activated new work on several devices which still have a lot of work in the pipeline. No new work as of now. Maybe that's a good sign. The problem of the crashes when changing the work cache remains on all devices. Some are on 7.16.16, most on 7.18.1. My hope is that it goes away after WCG starts to send out work again.

Dr Who Fan Joined: 10 May 07 Posts: 1350	Message 107437 - Posted: 17 Mar 2022, 21:23:37 UTC - in response to Message 107436.
Drago75, you only need to edit/save settings on one active project. The device(s) will pick up the "new" settings on a project update. In your global web settings, what values do you have for these two settings?
Store at least --- days of work
Store up to an additional --- days of work
The smaller the numbers, the lower the chance of getting too much work that won't complete. For Android devices I myself keep the numbers at 0 to prevent running out of free space on my phones.

Jord Volunteer tester Help desk expert Joined: 29 Aug 05 Posts: 15480	Message 107438 - Posted: 17 Mar 2022, 21:26:34 UTC Last modified: 17 Mar 2022, 21:31:38 UTC
BOINC on Android uses its own scheduler and work request code. It will ignore any such settings from websites. You can only set work requests from within the BOINC Manager GUI on Android. So, when the client crashes when you go from 7 days' worth of work back to 0.1 days' worth of work, it is a problem with the BOINC client for Android, and so you will need to tell the developer about it. The developer talregev can, as far as I know, only be found on GitHub, which is why this needs to be reported there. Perhaps even AenBleidd (Vitalii Koshura) can take a quick look at it. If this is truly a problem with the scheduler and work fetch code, it can only be fixed with a new client.

Richard Haselgrove Volunteer tester Help desk expert Joined: 5 Oct 06 Posts: 5081	Message 107439 - Posted: 17 Mar 2022, 22:24:30 UTC - in response to Message 107436.
Quote: Richard, the device I checked made a scheduler request every 70 seconds, but this may also have been because I had deactivated new work four days ago to get the work queue down. It was an Einstein device. I installed Universe and it downloaded 67 wu's at once, with no new requests in the last 10 minutes. That leaves the device sitting with 167 wu's in the queue in total, so maybe that is the new target now. Maybe I have to deplete the queue completely before asking for new work.
70 seconds exactly fits in with my observations over the years. It's made up of:
61 seconds requested by Einstein (60 set by the admins, plus 1% added by BOINC in case of clock synchronisation errors)
4 seconds typical time for the Einstein server to process a request
5 seconds inserted by the BOINC client between scheduler requests
(all subject to miscellaneous +/- seconds depending on how busy everything is)
Seeing it at Einstein, but not at Universe, also matches my observations. I've only ever seen it in relation to Einstein tasks. I usually run Einstein on GPUs, and have other projects on CPUs. When this problem strikes (very, very rarely), Einstein fetches incessantly while the CPU projects don't. I'd vaguely assumed that the bug was in the GPU part of the client work-fetch algorithm, but that the triggering state was caused by something unusual in the Einstein scheduler reply. And there it rests.
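The 70-second breakdown above can be written out as a small Python sketch. The function and parameter names are made up for illustration, and rounding the 1% margin up to a whole second (60.6 becomes 61) is an assumption made here to match the 61 seconds quoted in the post; the client's actual rounding rule is not stated in this thread.

```python
import math


def expected_request_interval(server_delay_s=60.0,
                              clock_margin=0.01,
                              server_processing_s=4.0,
                              client_gap_s=5.0):
    """Estimate the time between repeated scheduler requests, in seconds.

    server_delay_s      delay set by the project admins (60 for Einstein)
    clock_margin        extra fraction added for clock synchronisation errors
    server_processing_s typical time for the server to process a request
    client_gap_s        pause the client inserts between scheduler requests
    """
    # Assumption: the padded delay is rounded up to a whole second.
    padded_delay = math.ceil(server_delay_s * (1 + clock_margin))
    return padded_delay + server_processing_s + client_gap_s


print(expected_request_interval())
```

With the default values this reproduces the 61 + 4 + 5 figure; the processing time and client gap are typical values from the post, so real intervals will wander by a few seconds either way.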

Drago75 Joined: 29 Sep 20 Posts: 22	Message 107447 - Posted: 18 Mar 2022, 12:02:19 UTC - in response to Message 107438.
Hey Jord. Ok, I will report that like you said. Does nobody else have that problem? Cheers to you all…

Jord Volunteer tester Help desk expert Joined: 29 Aug 05 Posts: 15480	Message 107449 - Posted: 18 Mar 2022, 12:12:59 UTC - in response to Message 107447.
You're the first to report it. I'm no longer running my Android farm, so I cannot test it. I'm also assuming it doesn't happen when you only switch to 7 days of work and then immediately back again, but that it needs several scheduler contacts before it crashes? Because that's even more difficult to test.

Drago75 Joined: 29 Sep 20 Posts: 22	Message 107464 - Posted: 18 Mar 2022, 15:37:17 UTC - in response to Message 107449.
BOINC crashes with every change I make to the work cache, even when I change it from 0.1 + 0 days to 0.2 + 0 days on the devices. The website settings are always set to 0.1 + 0 but, as I learned, they are ignored anyway on Android.

robsmith Volunteer tester Help desk expert Joined: 25 May 09 Posts: 1283	Message 107466 - Posted: 18 Mar 2022, 16:29:23 UTC - in response to Message 107464.
What happens if you try something like 0.1 + 0.001? Shooting in the dark, it might be something to do with Android not handling zero values properly.

Richard Haselgrove Volunteer tester Help desk expert Joined: 5 Oct 06 Posts: 5081	Message 107469 - Posted: 18 Mar 2022, 17:05:03 UTC
The absolute size of the work cache doesn't seem to matter. Here's an example of the type of log that illustrates what happens:
    Sat 19 Feb 2022 10:41:27 GMT | Einstein@Home | [sched_op] NVIDIA GPU work request: 21.84 seconds; 0.00 devices
    Sat 19 Feb 2022 10:41:28 GMT | Einstein@Home | Scheduler request completed: got 1 new tasks
    Sat 19 Feb 2022 10:42:28 GMT | Einstein@Home | [sched_op] NVIDIA GPU work request: 5184.00 seconds; 1.00 devices
    Sat 19 Feb 2022 10:42:29 GMT | Einstein@Home | Scheduler request completed: got 4 new tasks
    Sat 19 Feb 2022 10:42:29 GMT | Einstein@Home | [sched_op] estimated total NVIDIA GPU task duration: 5680 seconds
    Sat 19 Feb 2022 10:43:30 GMT | Einstein@Home | [sched_op] NVIDIA GPU work request: 5184.00 seconds; 1.00 devices
    Sat 19 Feb 2022 10:44:32 GMT | Einstein@Home | [sched_op] NVIDIA GPU work request: 5184.00 seconds; 1.00 devices
    Sat 19 Feb 2022 10:45:34 GMT | Einstein@Home | [sched_op] NVIDIA GPU work request: 5184.00 seconds; 1.00 devices
    Sat 19 Feb 2022 10:46:35 GMT | Einstein@Home | [sched_op] NVIDIA GPU work request: 5184.00 seconds; 1.00 devices
    Sat 19 Feb 2022 10:47:38 GMT | Einstein@Home | [sched_op] NVIDIA GPU work request: 5184.00 seconds; 1.00 devices
    Sat 19 Feb 2022 10:48:39 GMT | Einstein@Home | [sched_op] NVIDIA GPU work request: 5184.00 seconds; 1.00 devices
    Sat 19 Feb 2022 10:49:41 GMT | Einstein@Home | [sched_op] NVIDIA GPU work request: 5184.00 seconds; 1.00 devices
    Sat 19 Feb 2022 10:50:43 GMT | Einstein@Home | [sched_op] NVIDIA GPU work request: 5184.00 seconds; 1.00 devices
    Sat 19 Feb 2022 10:51:45 GMT | Einstein@Home | [sched_op] NVIDIA GPU work request: 5184.00 seconds; 1.00 devices
    Sat 19 Feb 2022 10:52:47 GMT | Einstein@Home | [sched_op] NVIDIA GPU work request: 5184.00 seconds; 1.00 devices
    Sat 19 Feb 2022 10:53:55 GMT | Einstein@Home | [sched_op] NVIDIA GPU work request: 5184.00 seconds; 1.00 devices
    Sat 19 Feb 2022 10:54:57 GMT | Einstein@Home | [sched_op] NVIDIA GPU work request: 5184.00 seconds; 1.00 devices
    Sat 19 Feb 2022 10:54:58 GMT | Einstein@Home | Scheduler request completed: got 4 new tasks
    Sat 19 Feb 2022 10:54:58 GMT | Einstein@Home | [sched_op] estimated total NVIDIA GPU task duration: 5229 seconds
At the time, my cache settings must have been 0.05 + 0.01 days - that adds up to 5184 seconds. Note that the first request is for something trivial, with no idle devices - and it got one task, as you'd expect. That should have been the end of the matter. But a minute later, it asked again - for the full cache setting, plus an idle device. Each request got the full four tasks - I only saved the first and the last, for clarity.
The problem seems to be that the incoming new work allocation, whilst complete - I watched them all download - doesn't get added to the client's knowledge of 'work ready to run'. So it asks again, and again, and again. This particular event was recorded on a Linux machine, so that's a third OS that can be affected.
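The two observations in this post (the 0.05 + 0.01 day cache equalling 5184 seconds, and the same request repeating about once a minute) can be checked with a short Python sketch. It assumes event log lines shaped like the ones quoted above, with plain `|` separators and English day and month names; the function names and the regular expression are illustrative and not part of BOINC.

```python
import re
from datetime import datetime

SECONDS_PER_DAY = 86400

# Matches "[sched_op] ... work request" lines as quoted in the post.
REQUEST_RE = re.compile(
    r"^\w{3} (?P<stamp>\d{1,2} \w{3} \d{4} \d{2}:\d{2}:\d{2}) GMT \| "
    r"(?P<project>[^|]+?) \| \[sched_op\] .*work request: "
    r"(?P<seconds>[\d.]+) seconds; (?P<devices>[\d.]+) devices$"
)


def cache_seconds(min_days, extra_days):
    """Convert the two work-buffer settings (in days) to seconds."""
    return round((min_days + extra_days) * SECONDS_PER_DAY, 2)


def parse_requests(lines):
    """Return (timestamp, seconds_requested) for every work-request line."""
    requests = []
    for line in lines:
        match = REQUEST_RE.match(line.strip())
        if match:  # other log lines are skipped
            when = datetime.strptime(match["stamp"], "%d %b %Y %H:%M:%S")
            requests.append((when, float(match["seconds"])))
    return requests


def longest_repeat_run(requests):
    """Return the longest run of consecutive requests for the same amount."""
    best, run = [], []
    for req in requests:
        if run and req[1] != run[-1][1]:
            run = []  # requested amount changed, start a new run
        run.append(req)
        if len(run) > len(best):
            best = list(run)
    return best


SAMPLE = """\
Sat 19 Feb 2022 10:41:27 GMT | Einstein@Home | [sched_op] NVIDIA GPU work request: 21.84 seconds; 0.00 devices
Sat 19 Feb 2022 10:41:28 GMT | Einstein@Home | Scheduler request completed: got 1 new tasks
Sat 19 Feb 2022 10:42:28 GMT | Einstein@Home | [sched_op] NVIDIA GPU work request: 5184.00 seconds; 1.00 devices
Sat 19 Feb 2022 10:42:29 GMT | Einstein@Home | Scheduler request completed: got 4 new tasks
Sat 19 Feb 2022 10:43:30 GMT | Einstein@Home | [sched_op] NVIDIA GPU work request: 5184.00 seconds; 1.00 devices
Sat 19 Feb 2022 10:44:32 GMT | Einstein@Home | [sched_op] NVIDIA GPU work request: 5184.00 seconds; 1.00 devices
""".splitlines()

run = longest_repeat_run(parse_requests(SAMPLE))
gaps = [(b[0] - a[0]).total_seconds() for a, b in zip(run, run[1:])]
print(cache_seconds(0.05, 0.01), len(run), gaps)
```

On the sample lines, the repeated 5184-second requests arrive 62 seconds apart. A long run of identical requests at roughly that spacing in a real log would be the signature of the fetch loop described above.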

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.