BOINC memory constraint - GPU projects

Message boards : Questions and problems : BOINC memory constraint - GPU projects
Professor Ray

Joined: 31 Mar 08
Posts: 59
United States
Message 33598 - Posted: 29 Jun 2010, 22:26:22 UTC

I only have one question:


  1. Do GPU projects utilize the Graphics Aperture virtual address space that is normally used by the graphics adapter for rendering textures that exceed the limit imposed by insufficient discrete VRAM?
  2. If so, does the memory constraint specified in preferences include shared system memory, i.e., the virtual address pool utilized by the video subsystem as an adjunct to discrete on-board VRAM?



ID: 33598
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 14948
Netherlands
Message 33599 - Posted: 29 Jun 2010, 23:42:09 UTC - in response to Message 33598.  

Do GPU projects utilize the Graphics Aperture virtual address space that is normally used by the graphics adapter for rendering textures that exceed the limit imposed by insufficient discrete VRAM?

Are you asking whether setting the Graphics Aperture Size to something high will make a difference when BOINC detects how much memory is on the videocard or -chip? If so, the answer is no; it makes no difference.

If you're asking if it matters for project applications, then you will have to ask at the projects. They decide how much videoRAM is used by their GPU applications.

If so, does the memory constraint specified in preferences include shared system memory, i.e., the virtual address pool utilized by the video subsystem as an adjunct to discrete on-board VRAM?

The answer to this one is no. Or at least, not when BOINC checks how much memory is on the card. It reads the library files provided by the videocard driver, which only count memory on the videocard or -chip, not what you add to it from normal RAM, if you have that option in the BIOS.

Whether it makes a difference or not for the project GPU applications is again something to ask the projects themselves.
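
For illustration only: a minimal query with the CUDA runtime API shows the kind of per-device figure a client can obtain from the driver, and the number it reports (totalGlobalMem) covers only on-board memory, not shared system RAM. This is my own sketch; BOINC's actual detection code reads the driver libraries directly and may differ.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        printf("No CUDA-capable device found.\n");
        return 1;
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        // totalGlobalMem reports discrete on-board memory only.
        printf("Device %d: %s, %zu MB global memory\n",
               dev, prop.name, prop.totalGlobalMem / (1024 * 1024));
    }
    return 0;
}
```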
ID: 33599
Professor Ray

Joined: 31 Mar 08
Posts: 59
United States
Message 33600 - Posted: 30 Jun 2010, 0:32:53 UTC

Thanx for the reply.

It's come to my attention that AGP aperture size is only relevant for systems that actually have an AGP port. Duh, huh? However, PCIe is a different animal, and how Vista handles memory is not even a four-legged, horse-shaped animal in comparison to its ancestors.

Am I understanding correctly that, setting aside any TurboCache (or its ATI equivalent) and/or shared system memory for the graphics subsystem, GPU projects essentially run outside of BOINC's purview with regard to memory constraints? That is to say: BOINC memory constraints apply exclusively to CPU-dedicated projects.

That could be an issue for projects with large memory footprints, especially on hosts with multi-core processors (or single-core processors with hyperthreading) that process work units for several projects in parallel, and given Vista's propensity for using all available memory.

ID: 33600
Professor Ray

Joined: 31 Mar 08
Posts: 59
United States
Message 33621 - Posted: 1 Jul 2010, 9:19:44 UTC
Last modified: 1 Jul 2010, 9:33:19 UTC

This thread seems to put the kibosh on the notion of BOINC utilizing shared system memory: http://setiathome.ssl.berkeley.edu/forum_thread.php?id=58305&nowrap=true#960571. While the display driver may use shared system memory to augment the texture buffer for textures that cannot fit in the local frame buffer, CUDA uses the graphics hardware for computation. Since the mission, goals and objectives of CUDA apps are entirely different from rendering 3D shapes onto a 2D surface, the procedures that CUDA relies upon are likewise entirely different.

I doubt that any other projects utilize shared system memory, because prior to CUDA 2.2, CUDA kernels could not access host system memory directly. For that reason, CUDA programmers used this design pattern:

  1. Move data to the GPU.
  2. Perform calculation on GPU.
  3. Move result(s) from the GPU to host.
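
The three-step pattern above can be sketched in CUDA C as follows. The kernel, array size, and launch configuration are illustrative, not from any particular project:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Toy kernel: double each element in place.
__global__ void doubleElements(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    const int n = 1024;
    float host[n];
    for (int i = 0; i < n; ++i) host[i] = (float)i;

    float *dev = nullptr;
    cudaMalloc(&dev, n * sizeof(float));
    // 1. Move data to the GPU.
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);
    // 2. Perform calculation on the GPU.
    doubleElements<<<(n + 255) / 256, 256>>>(dev, n);
    // 3. Move result(s) from the GPU back to the host.
    cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    printf("host[2] = %f\n", host[2]);
    return 0;
}
```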


With the release of CUDA 2.2, however, that paradigm has changed: new APIs allow the mapping of host memory into the CUDA address space via cudaHostAlloc (or cuMemHostAlloc in the CUDA driver API).

These functions provide a new memory type that supports:


  • "Portable" pinned buffers that are available to all GPUs.
  • "Mapped" pinned buffers that map host memory into the CUDA address space and provide asynchronous transparent access to the data without requiring an explicit programmer initiated copy.

    Setting aside integrated CUDA-capable graphics adapters that supplement their VRAM with system memory, for discrete GPUs mapped pinned buffers are only a performance win in certain cases, though. Since the memory is not cached by the GPU:

    • It should be read or written exactly once.
    • The global loads and stores that read or write the memory must be coalesced to avoid a 2x-7x PCIe performance penalty.
    • At best, it will only deliver PCIe bandwidth performance, but this can be 2x faster than cudaMemcpy because mapped memory is able to exploit the full-duplex capability of the PCIe bus by reading and writing at the same time. A call to cudaMemcpy can only move data in one direction at a time (i.e., half duplex).

      Moreover, a drawback of the current CUDA 2.2 release is that all pinned allocations are mapped into the GPU's 32-bit linear address space, regardless of whether the device pointer is needed or not. (NVIDIA indicates this will be changed to a per-allocation basis in a later release.)


  • Write-Combined (WC) memory that can provide higher performance:

    • Due to WC memory being neither cached, nor cache coherent, it isn't 'snooped' during transfers across the PCI Express bus. According to NVIDIA's notes - "CUDA 2.2 Pinned Memory APIs" - WC memory may perform as much as 40% faster on certain PCI Express 2.0 implementations.

    • It may increase the host processor's write performance to host memory, because individual writes are first combined (via an internal processor write buffer) so that only a single burst write containing many aggregated individual writes needs to be issued. (Intel claims to have observed actual performance increases of over 10x, but this is not typical.) For more information, please see the Intel publication: Write Combining Memory Implementation Guidelines




SOURCE: CUDA 2.2 Changes the Data Movement Paradigm
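
A minimal sketch of the mapped ("zero-copy") pinned-buffer usage described above, using the same toy kernel as the copy-compute-copy pattern. Note this uses the modern runtime call cudaDeviceSynchronize; the CUDA 2.2 era equivalent was cudaThreadSynchronize:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Toy kernel: double each element in place, reading and writing
// host memory directly across the PCIe bus.
__global__ void doubleElements(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    // Enable mapping of pinned host allocations before other CUDA calls.
    cudaSetDeviceFlags(cudaDeviceMapHost);

    const int n = 1024;
    float *host = nullptr;
    // Mapped pinned buffer: visible to the GPU, no explicit copy needed.
    cudaHostAlloc(&host, n * sizeof(float), cudaHostAllocMapped);
    for (int i = 0; i < n; ++i) host[i] = (float)i;

    float *dev = nullptr;
    cudaHostGetDevicePointer(&dev, host, 0);  // device-side view of the buffer

    doubleElements<<<(n + 255) / 256, 256>>>(dev, n);
    cudaDeviceSynchronize();  // kernel writes land directly in host memory

    printf("host[2] = %f\n", host[2]);
    cudaFreeHost(host);
    return 0;
}
```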

ID: 33621


Copyright © 2022 University of California. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.