Boinc over using memory

Profile Richie

Joined: 2 Jul 14
Posts: 186
Finland
Message 68903 - Posted: 11 Apr 2016, 12:24:49 UTC

What are your settings for memory usage in BOINC? See http://boinc.berkeley.edu/wiki/Preferences#Memory

Which version of Windows do you have, and what are its virtual memory settings? You can limit the maximum page file size or even disable paging.

http://windows.microsoft.com/en-us/windows/change-virtual-memory-size#1TC=windows-7
ID: 68903 · Report as offensive
SekeRob2

Joined: 6 Jul 10
Posts: 585
Italy
Message 68923 - Posted: 11 Apr 2016, 19:32:06 UTC - in response to Message 68903.  
Last modified: 11 Apr 2016, 19:32:59 UTC

You can use app_config.xml at the project level to limit how many tasks can run concurrently, both per app and for the project in total.
Coelum Non Animum Mutant, Qui Trans Mare Currunt
ID: 68923 · Report as offensive
Juha
Volunteer developer
Volunteer tester
Help desk expert

Joined: 20 Nov 12
Posts: 801
Finland
Message 68955 - Posted: 12 Apr 2016, 21:52:46 UTC - in response to Message 68931.  

My computer specs are posted below my chat.


Sandman192's signature wrote:
Software: Win 10 Pro 64 bit-Video Drivers v364.73-CUDA v7.0-V-Box 5.0.16
Hardware: ASUS X-99 Deluxe-i7 5930K Extreme Edition 3.50Ghz, 12 core-16GB of Ram-1 EVGA GTX 980 GTX 4GBs-250 SSD for BOINC Program and 2TB HDD for data-BOINC v7.6.22


Next time please include the specs in your post. Some people have signatures hidden and once you update the sig it's no longer possible to tell what kind of system you had a problem with.

Ok then, questions.

Did all Lattice tasks start at the same time?

Lattice seems to have tasks with wildly varying memory requirements. Did you already have a smaller task started?

What does the memory allocation pattern for Lattice tasks look like? Does it allocate memory in small blocks over minutes or does it allocate all memory in just seconds?

Open the client_state.xml file in BOINC's data directory and find the workunit section for each of the Lattice tasks. What is the <rsc_memory_bound> value for each task? Is the value larger than the actual memory usage for each task?
(sample:
<workunit>
    <name>20ap10ab.11539.4162.5.32.172</name>
    <app_name>setiathome_v8</app_name>
    <version_num>800</version_num>
    <rsc_fpops_est>184242804412977.000000</rsc_fpops_est>
    <rsc_fpops_bound>3684856088259540.000000</rsc_fpops_bound>
    <rsc_memory_bound>33554432.000000</rsc_memory_bound>
    <rsc_disk_bound>33554432.000000</rsc_disk_bound>
    <file_ref>
        <file_name>20ap10ab.11539.4162.5.32.172</file_name>
        <open_name>work_unit.sah</open_name>
    </file_ref>
</workunit>
)
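
For anyone who would rather not search the file by hand, here is a minimal Python sketch that lists the rsc_memory_bound of every workunit. It assumes client_state.xml parses as well-formed XML; the function name memory_bounds and the trimmed sample string are made up for illustration.

```python
import xml.etree.ElementTree as ET


def memory_bounds(client_state_xml: str) -> dict[str, float]:
    """Map each workunit name to its rsc_memory_bound in bytes."""
    root = ET.fromstring(client_state_xml)
    bounds = {}
    # iter() walks the whole tree, so nested <workunit> elements are found too.
    for wu in root.iter("workunit"):
        name = wu.findtext("name")
        bound = wu.findtext("rsc_memory_bound")
        if name is not None and bound is not None:
            bounds[name] = float(bound)
    return bounds


# Trimmed copy of the sample workunit above, wrapped in a root element.
SAMPLE = """<client_state>
<workunit>
    <name>20ap10ab.11539.4162.5.32.172</name>
    <app_name>setiathome_v8</app_name>
    <rsc_memory_bound>33554432.000000</rsc_memory_bound>
</workunit>
</client_state>"""

for wu_name, wu_bound in memory_bounds(SAMPLE).items():
    print(f"{wu_name}: {wu_bound / 2**20:.1f} MiB")
```

To run it against a real file, read the file's contents into a string and pass that to memory_bounds instead of SAMPLE.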
ID: 68955 · Report as offensive
Profile David Anderson
Volunteer moderator
Project administrator
Project developer
Joined: 10 Sep 05
Posts: 717
Message 68971 - Posted: 13 Apr 2016, 23:30:59 UTC

Sandman, can you please

- In the BOINC Manager, go to Options/Event Log Options
and check mem_usage_debug.
Your Event Log will now have messages (every 10 seconds)
showing how much memory each running job is using
(more specifically, its "working set size").

- Compare these numbers to the sizes you see in the Win Task Manager.
Look at Private Working Set.
I'm not sure what Commit Size is; it tends to be larger.

- If the working set sizes reported by BOINC add up to
more than 16GB (or the fraction specified by your prefs)
then there's a bug in BOINC;
please post an event log segment here,
or email it to me (davea at ssl dot berkeley dot edu).

-- David
ID: 68971 · Report as offensive
Juha
Volunteer developer
Volunteer tester
Help desk expert

Joined: 20 Nov 12
Posts: 801
Finland
Message 69034 - Posted: 17 Apr 2016, 21:30:41 UTC

David,

When the client is considering starting a new task, it assumes the new task's working set to be the maximum of already running tasks for the new task's app version. If there are no running tasks for the app version then the working set size is taken to be zero.

Lattice's GARLI has tasks with wildly varying memory usage; working set sizes can range from tens of megabytes to several gigabytes. If a small task is already running, the client will assume the next task is just as small, even if in reality the next task is a hundred times larger and has no chance of fitting in available RAM.

The client should at some point realise it has overloaded the machine and start unloading tasks from memory. But since the machine is swapping like crazy it may take a good while before the client manages to react and the app reacts to the exit command.

I don't think we should rely on the client being able to react. The obvious reason is that the machine is pretty much unusable during swapping, and if the machine runs out of swap space any programs the volunteer runs may crash. Linux boxes are prone to running out of swap space because they are traditionally set up with a fixed-size swap partition. On Linux, running out of swap space tends to result in a swapping frenzy that's stoppable only by pushing the reset switch.

I think it would be better if the client assumed an unstarted task's working set size to be the maximum of the working set sizes of the running tasks and the task's rsc_memory_bound. rsc_memory_bound alone isn't enough; some projects have set it way too low.

This would most likely result in idle resources from time to time but I think that is better than overusing available RAM by 4x. The example Sandman192 gave with rsc_memory_bound=10GB would likely result in being able to run only one such task at a time on a 16GB machine even if the example task's actual working set size was less than 10GB.
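
The difference between the two estimates can be sketched in a few lines of Python. This is only an illustration of the idea described in this post, not the client's actual code: the function names are hypothetical, and it assumes the client compares the sum of the running tasks' working sets plus the estimate against a single RAM limit.

```python
GB = 2**30


def current_estimate(running_ws):
    """Behaviour described above: the largest working set among running
    tasks of the same app version, or zero if none are running."""
    return max(running_ws, default=0.0)


def proposed_estimate(running_ws, rsc_memory_bound):
    """Proposal: never assume less than the task's own rsc_memory_bound."""
    return max(current_estimate(running_ws), rsc_memory_bound)


def fits(estimate, running_ws, ram_limit):
    """Assumed admission check: running working sets plus the estimate
    for the new task must not exceed the RAM limit."""
    return sum(running_ws) + estimate <= ram_limit


ram = 16 * GB
running = [8 * GB]   # one task already running, actually using 8 GB
bound = 10 * GB      # the next task declares rsc_memory_bound = 10 GB

print(fits(current_estimate(running), running, ram))
print(fits(proposed_estimate(running, bound), running, ram))
```

With these numbers the current estimate lets a second task start (8 GB + 8 GB), while the proposed one holds it back (8 GB + 10 GB), which is the "idle resources from time to time" trade-off.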
ID: 69034 · Report as offensive
Profile Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15477
Netherlands
Message 69233 - Posted: 28 Apr 2016, 12:26:02 UTC - in response to Message 69206.  

It's doing it again

So please do as David asked you to do:
- In the BOINC Manager, go to Options/Event Log Options and check mem_usage_debug. Your Event Log will now have messages (every 10 seconds)
showing how much memory each running job is using (more specifically, its "working set size").

- Compare these numbers to the sizes you see in the Win Task Manager.
Look at Private Working Set.
I'm not sure what Commit Size is; it tends to be larger.

- If the working set sizes reported by BOINC add up to more than 16GB (or the fraction specified by your prefs) then there's a bug in BOINC;
please post an event log segment here, or email it to me (davea at ssl dot berkeley dot edu).

Without any of that information we cannot help out.
ID: 69233 · Report as offensive
Coleslaw
Joined: 23 Feb 12
Posts: 198
United States
Message 69254 - Posted: 28 Apr 2016, 20:12:51 UTC - in response to Message 69233.  
Last modified: 28 Apr 2016, 20:13:33 UTC

This also makes the wild assumption that, when the system becomes non-responsive, you can even get to the logs to view them. Are these saved locally so that they can be viewed after a hard reset? I've never tried that option and am curious. I've seen TLP do this many times and it is quite annoying.
ID: 69254 · Report as offensive
Profile Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15477
Netherlands
Message 69255 - Posted: 28 Apr 2016, 20:25:47 UTC - in response to Message 69254.  

The event log is saved into the stdoutdae.txt file in the data directory. By default only the last 2MB is saved, with a roll-over to a stdoutdae.old file for the last backup log. Through cc_config.xml's options one can set the size of this file to be much much larger.
ID: 69255 · Report as offensive
Coleslaw
Joined: 23 Feb 12
Posts: 198
United States
Message 69260 - Posted: 28 Apr 2016, 21:55:53 UTC - in response to Message 69255.  

The event log is saved into the stdoutdae.txt file in the data directory. By default only the last 2MB is saved, with a roll-over to a stdoutdae.old file for the last backup log. Through cc_config.xml's options one can set the size of this file to be much much larger.


Now that the PrimeGrid challenge is over, I may attempt to pull some more TLP work units and test. Thanks for the info. If I find anything, I will post.
ID: 69260 · Report as offensive
Profile Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15477
Netherlands
Message 69301 - Posted: 30 Apr 2016, 0:03:22 UTC - in response to Message 69297.  

Before my computer froze I went into v-box and it showed 9 of them running with 2 gigs shared for each totaling 20 gigs.

You will really have to try to give an excerpt of the log with mem_usage_debug on, captured during a situation like the one you described. An excerpt taken afterwards, when it is no longer a problem, isn't going to be helpful.

So continue to run with the mem_usage_debug flag on; if needed, increase the size of your stdoutdae.txt file through cc_config.xml: http://boinc.berkeley.edu/wiki/Client_configuration#Options.

For instance, <max_stdout_file_size>20119200</max_stdout_file_size> will set the maximum size of stdoutdae.txt to about 19.2 MB; the size is given in bytes.

<cc_config>
  <log_flags>
  </log_flags>
  <options>
      <max_stdout_file_size>20119200</max_stdout_file_size>
  </options>
</cc_config>

The above will do that.
If you already have a cc_config.xml file, check whether the line is already in there at its default value of '0'; if so, you can just change that value.
Afterwards it's best to exit BOINC & restart it.

Then the next time you find your system going slow or worse, exit BOINC and dig out the stdoutdae.txt file, then get us an excerpt of the last lines, something like the last 50 lines or so.
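
If opening a large log in an editor is awkward, a short Python sketch like the one below can pull out the last 50 lines. The function name last_lines is made up for illustration, and the demo writes a throwaway file rather than touching a real stdoutdae.txt; point it at the file in your own BOINC data directory instead.

```python
import os
import tempfile
from collections import deque


def last_lines(path, count=50):
    """Return the last `count` lines of a text file, newlines stripped."""
    with open(path, encoding="utf-8", errors="replace") as f:
        # A bounded deque keeps only the newest `count` lines in memory.
        return [line.rstrip("\n") for line in deque(f, maxlen=count)]


# Demo on a temporary stand-in for stdoutdae.txt.
with tempfile.TemporaryDirectory() as tmp:
    log = os.path.join(tmp, "stdoutdae.txt")
    with open(log, "w", encoding="utf-8") as f:
        f.writelines(f"line {i}\n" for i in range(120))
    excerpt = last_lines(log)
    print(len(excerpt), excerpt[0], excerpt[-1])
```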
ID: 69301 · Report as offensive
Profile Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15477
Netherlands
Message 69308 - Posted: 30 Apr 2016, 13:30:19 UTC - in response to Message 69303.  

Having the mem_usage_debug flag on won't help unless I have it get more ATLAS@home to work on. Which I don't want to happen.

Sorry to be blunt, but if you don't want to test the circumstances under which this happens, while you have said you can easily reproduce it, how do you want anyone to fix this if we don't know what there is to fix?

Is it BOINC or is it The Lattice Project or ATLAS I don't know but something went wrong.

Or your system, or your preferences, or or or. Without rigorous testing, no one will be able to answer any of that.

As I said before, posting logs of when things work without flaw isn't going to be of any help to anyone. The log snippet you showed before said you were using 3GB of RAM for BOINC and 5GB of RAM for non-BOINC.
ID: 69308 · Report as offensive
SekeRob2

Joined: 6 Jul 10
Posts: 585
Italy
Message 69363 - Posted: 2 May 2016, 7:01:42 UTC - in response to Message 69326.  

Of course app_config.xml can be used to set the concurrent restriction for the problem-causing tasks. Maybe if you set the Write to Disk time to a value greater than the longest runtime, the disk hitting will stop (very large checkpoint files?).
ID: 69363 · Report as offensive
Cruncher Pete

Joined: 16 Oct 10
Posts: 27
Australia
Message 69364 - Posted: 2 May 2016, 7:39:29 UTC - in response to Message 69308.  
Last modified: 2 May 2016, 7:44:58 UTC


As I said before, posting logs of when things work without flaw isn't going to be of any help to anyone. The log snippet you showed before said you were using 3GB of RAM for BOINC and 5GB of RAM for non-BOINC.


I reread this thread and I find no information to what you are saying...

I just can't stand this anymore, listening to an obnoxious moderator whose job is to keep any problems away from David Anderson. I followed this guy's problems and at no time did I find your help to be helpful, for when he did follow you, you contradicted yourself and even told him that he did not follow your advice because what he provided was not helpful. Admittedly, I did not reply, for I did not have the answer. Your typical help is to tell your users that their messages are not understood, too short or too long, as I have had previous experience with you in this regard.

Go ahead, ban me, I no longer care, but I ask you to look at your method of operation. Please do not judge users who have problems, for they would not post here unless they were seeking help. Your attitude in replying to them needs to be improved. When someone is in trouble, the last thing you want is to tell them that they have a problem and it is their fault for not knowing better or not following your advice to the letter. Please do not forget that this is an international forum and interpretations of meanings are difficult at best. If you do not like my reply, then PM me, but I will not guarantee that I will reply, as you said to your users, unless it suits me...
ID: 69364 · Report as offensive
Profile Agentb
Joined: 30 May 15
Posts: 265
United Kingdom
Message 69366 - Posted: 2 May 2016, 10:10:01 UTC - in response to Message 69364.  

I just can't stand this anymore

Neither can i, 30508 added here
ID: 69366 · Report as offensive
Profile Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15477
Netherlands
Message 69367 - Posted: 2 May 2016, 10:20:13 UTC - in response to Message 69364.  
Last modified: 2 May 2016, 10:22:01 UTC

I reread this thread and I find no information to what you are saying...

4/29/2016 5:47:06 PM | | [mem_usage] BOINC totals: WS 3094.27MB, smoothed 3094.27MB, swap 2941.44MB, 0.00 page faults/sec
4/29/2016 5:47:06 PM | | [mem_usage] All others: WS 5010.64MB, swap 6173.04MB, user 34819.875s, kernel 14237.250s
4/29/2016 5:47:06 PM | | [mem_usage] non-BOINC CPU usage: 5.26%

Source.

I just can't stand this anymore, listening to an obnoxious moderator whose job is to keep any problems away from David Anderson.

If that were so, I would not have forwarded this thread to David, and he would not have posted in this thread.
ID: 69367 · Report as offensive
SekeRob2

Joined: 6 Jul 10
Posts: 585
Italy
Message 69585 - Posted: 13 May 2016, 18:51:58 UTC - in response to Message 69583.  
Last modified: 13 May 2016, 18:52:47 UTC

All event log messages are stored in the stdoutdae.txt file. If your system hangs, that's where they are up to the crash moment.
ID: 69585 · Report as offensive
SekeRob2

Joined: 6 Jul 10
Posts: 585
Italy
Message 69657 - Posted: 19 May 2016, 19:48:02 UTC - in response to Message 69655.  
Last modified: 19 May 2016, 19:49:02 UTC

All event log messages are stored in the stdoutdae.txt file. If your system hangs, that's where they are up to the crash moment.


If you read my forum post before, I've been doing that.

You wrote "I've looked all over for any log file that shows exactly what it shows in the Event Log.". This implies you could not find the file, so I tried to be helpful, as others have tried in this thread!
ID: 69657 · Report as offensive
Profile Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15477
Netherlands
Message 69710 - Posted: 23 May 2016, 15:30:41 UTC - in response to Message 69674.  

18 Task running

No, <mem_usage_debug> shows the memory use of 18 tasks; it does not mean they are all running at this time. This debug flag shows the memory usage of all tasks that are running, or that ran at some point and have since been suspended.

16-May-2016 05:40:55 [---] [mem_usage] BOINC totals: WS 291.95MB, smoothed 19284.53MB, swap 1436.48MB, 0.00 page faults/sec

As far as I understand it - but I am trying to get that confirmed - the smoothed value is virtual memory usage. This may include shared memory, but it's certainly not the RAM value.

The value that the developers look at is the WS value. If that goes over the value you have for RAM, there is a problem. Thus far it hasn't ever gotten above it.
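
To pick the WS value out of a long log, something like the following Python sketch could be used. The regular expression is written against the "BOINC totals" lines quoted in this thread and is an assumption about the format, not a documented interface; the names TOTALS_RE and parse_totals are made up.

```python
import re

# Pattern based on the "[mem_usage] BOINC totals" lines quoted in this thread.
TOTALS_RE = re.compile(
    r"\[mem_usage\] BOINC totals: WS (?P<ws>[\d.]+)MB, "
    r"smoothed (?P<smoothed>[\d.]+)MB, swap (?P<swap>[\d.]+)MB"
)


def parse_totals(line):
    """Return WS, smoothed and swap sizes in MB, or None if the line
    is not a BOINC totals line."""
    match = TOTALS_RE.search(line)
    if match is None:
        return None
    return {key: float(value) for key, value in match.groupdict().items()}


line = ("16-May-2016 05:40:55 [---] [mem_usage] BOINC totals: WS 291.95MB, "
        "smoothed 19284.53MB, swap 1436.48MB, 0.00 page faults/sec")
print(parse_totals(line))
```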
ID: 69710 · Report as offensive
ChristianB
Volunteer developer
Volunteer tester

Joined: 4 Jul 12
Posts: 321
Germany
Message 69715 - Posted: 24 May 2016, 8:57:39 UTC

@Sandman192: Can you please, for the time being, uncheck the "Leave application in memory" box in your preferences for this computer? And also gradually reduce the number of tasks run via the "use at most % processors" preference?

This means there will be idle cores but it would verify that BOINC is overtasking the RAM. If the freezes still happen with only some tasks running we can check if a specific project is the source of the freezes.

Also, please be patient when looking for help. I understand that you don't want your PC to freeze all the time but everyone involved with BOINC is a volunteer (even the developers) and may not have time immediately to look into this.
ID: 69715 · Report as offensive
Profile Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15477
Netherlands
Message 69718 - Posted: 24 May 2016, 10:02:43 UTC - in response to Message 69710.  
Last modified: 24 May 2016, 10:39:57 UTC

As far as I understand it - but I am trying to get that confirmed - the smoothed value is virtual memory usage.

Well, I was close, but no cigar. The smoothed WS value shows the memory usage of apps inside a virtual machine. Memory use inside a virtual machine can be as much as there is actual memory in the computer (although it will be swapping to the page file then, which can cause severe slowdowns).

The scheduler inside BOINC will check if the WS value is less than physical RAM * (mem usage preference). As long as that's true, tasks will continue to run.
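
As a worked example of that check, here is a tiny Python sketch. The 50% memory preference is a hypothetical value chosen for illustration, and within_memory_limit is a made-up name, not a function from the BOINC source.

```python
def within_memory_limit(ws_mb, ram_mb, mem_fraction):
    """True while the working set stays below RAM * (mem usage preference)."""
    return ws_mb < ram_mb * mem_fraction


ram_mb = 16 * 1024      # 16 GB machine, as in this thread
mem_fraction = 0.5      # hypothetical "use at most 50% of memory" preference

print(within_memory_limit(3094.27, ram_mb, mem_fraction))   # WS from the quoted log
print(within_memory_limit(9000.0, ram_mb, mem_fraction))    # above the 8192 MB limit
```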
ID: 69718 · Report as offensive

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.