Feature request: Make EDF mode Hyper-threading aware

Author	Message
Jesse Viviano Send message Joined: 14 Feb 11 Posts: 63	Message 36864 - Posted: 14 Feb 2011, 23:33:32 UTC I noticed that normally, hyper-threading allows more tasks to complete per unit of time, but it slows down each thread. I think that if BOINC could detect whether or not hyper-threading is active, it could then know not to allow more threads to run than the number of physical cores present unless the task causing EDF actually is requesting more threads than there are physical cores, like how some of AQUA@home's jobs can use all 12 logical cores of a 6-core Intel Core i7 980X Extreme Edition. This will make it easier for the task in danger of missing its deadline to actually meet the deadline. ID: 36864 ·

Jord Volunteer tester Help desk expert Send message Joined: 29 Aug 05 Posts: 15483	Message 36865 - Posted: 15 Feb 2011, 0:14:26 UTC - in response to Message 36864. AQUA uses multi-threading in their application, which means that it will use all available cores, be it real or virtual, to calculate the tasks. When your system is running tasks in Earliest Deadline First mode a lot, then you have lots of problems on your own. Changing how BOINC does things will not fix that, unless you figure out where it's going wrong on your system. Perhaps your cache is a little too large even for your system? Aside from that, I fear it's nearly impossible for BOINC to detect how many true cores there are in any given system. I mean, the OS already sees all these CPUs, so how is any software going to know specifically which cores are real and which aren't? And then you have that little nifty problem of cross-platform-compatibility, where the source code for BOINC needs to be able to do that for Windows, Linux and Mac OS X without relying on external OS specific APIs or mechanisms. ID: 36865 ·

Jesse Viviano Send message Joined: 14 Feb 11 Posts: 63	Message 36868 - Posted: 15 Feb 2011, 5:21:48 UTC - in response to Message 36865. I am having tasks run in EDF mode not due to AQUA@home, but due to Collatz. It sent me a non-SSE application to my Core i7 980X which has SSE (because AMD included it in its definition of AMD64 which Intel copied) even though an SSE-optimized application is available, has set the deadline to a way too short time for the non-SSE application, and has given BOINC a way-too-short estimated time to complete. I am sure that the deadline would be easily met if it had correctly sent me the SSE application. I mentioned AQUA@home as a corner case that if not accounted for could fail to run under my proposed change. As for determining whether or not hyper-threading exists, I noticed that BOINC detects the CPU, gets its CPUID flags, and logs this information into the messages tab when it opens probably by using the CPUID instruction, at least in the Windows versions. It should be easy to detect whether or not hyper threading exists from this information. First, one needs to detect the CPU vendor because Intel and AMD have used the CPU flag "htt" for very different purposes. Intel has used "htt" for hyper-threading, while AMD turns it on for its multicore chips. If anything but "GenuineIntel" is found as the vendor ID, hyper-threading technology is not present. If "GenuineIntel" is the vendor ID string, look for the "htt" flag. If it is present, hyper-threading technology is present and more than likely enabled because most people leave hyper-threading enabled. However, I do see the need for a preferences option to let the user override this behavior because some people, most notably Windows 2000 users, will disable hyper-threading for one reason or the other. Windows 2000 is hyper-threading unaware and will therefore schedule threads poorly when hyper-threading is active (e.g. 
two threads on the same physical core when the other physical core is idle). ID: 36868 ·
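The vendor-then-flag check proposed in message 36868 can be sketched as follows. This is a minimal illustration, not BOINC's code: the function name `may_have_hyperthreading` and its inputs (a vendor string and a list of flag names, as the client logs them) are assumptions made for this example. As the post notes, a positive result only means hyper-threading may be present, not that it is enabled.

```python
def may_have_hyperthreading(vendor_id, cpu_flags):
    """Return True if the CPU might be using Intel hyper-threading.

    Hypothetical helper: it follows the rule proposed in the thread, where
    the "htt" flag is trusted only when the vendor is "GenuineIntel",
    because AMD sets the same flag on its multicore chips.
    """
    if vendor_id != "GenuineIntel":
        return False
    # Compare flag names case-insensitively.
    return "htt" in {flag.lower() for flag in cpu_flags}
```

A user preference, as suggested in the post, would still be needed to override the result on machines where hyper-threading is disabled in the BIOS.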

Claggy Send message Joined: 23 Apr 07 Posts: 1112	Message 36870 - Posted: 15 Feb 2011, 11:21:59 UTC - in response to Message 36868. Last modified: 15 Feb 2011, 11:45:07 UTC I am having tasks run in EDF mode not due to AQUA@home, but due to Collatz. It sent me a non-SSE application to my Core i7 980X which has SSE (because AMD included it in its definition of AMD64 which Intel copied) even though an SSE-optimized application is available, has set the deadline to a way too short time for the non-SSE application, and has given BOINC a way-too-short estimated time to complete. I am sure that the deadline would be easily met if it had correctly sent me the SSE application. I mentioned AQUA@home as a corner case that if not accounted for could fail to run under my proposed change. Slicker already replied to you at Collatz in this message, the non SSE_x64 app and the SSE_x64 app are the same app: There is only one 64-bit app which is the stock app. Because it is x64, it automatically uses x86 intrinsics/assembler and 64-bit integers. The 32-bit SSE app is needed to differentiate between a non-x86 box (e.g. ppc64) which can't use the x64 intrinsics which speed up 64-bit emulation on the 32-bit platform. SSE isn't actually used at all in the app since it doesn't do any floating point math. If your computer is running the 64-bit app, that's as fast as it is going to get. The "SSE" version of the 64-bit apps is a COPY of the non-SSE version since ALL 64-bit versions support not only SSE but also SSE2. It isn't the SSE that speeds it up, but rather being able to use 64-bit integers natively or, on the 32-bit platform, being able to use x86 intrinsics in order to emulate 64-bit integers faster than using straight C code which is required on older 32-bit MACs running on PowerPC (and maybe PS3 boxes if I ever get around to trying to cross-compile for that as well). 
Slicker has recently made the collatz workunit 4 times its previous size, so it now takes 4 times as long (this is aimed at GPUs that took less than 10 minutes with the old collatz WU size), and has introduced a mini_collatz WU that takes half the time of the old collatz WU (this is aimed at CPUs and slow GPUs like my 8400M GS), but he has reduced the time allowed for both sizes of WU. There is no way of setting your preferences to restrict the CPU to only doing mini_collatz without an app_info. Also introduced at the same time is new BOINC server software: once each app has completed 10 validations, the server will scale the <rsc_fpops_est> for each new WU for that app on your host, meaning that over time the predicted runtimes of your ATI and CPU WUs will improve. You've got a few choices: restrict your PC to doing mini_collatz only, stop your CPU doing any Collatz tasks, do nothing and wait for the server to scale for your CPU apps, or make an app_info with only mini_collatz for the CPU and both collatz and mini_collatz for the GPU. Claggy ID: 36870 ·

Jord Volunteer tester Help desk expert Send message Joined: 29 Aug 05 Posts: 15483	Message 36882 - Posted: 16 Feb 2011, 15:52:50 UTC - in response to Message 36868. As for determining whether or not hyper-threading exists, I noticed that BOINC detects the CPU, gets its CPUID flags, and logs this information into the messages tab when it opens probably by using the CPUID instruction, at least in the Windows versions. It should be easy to detect whether or not hyper threading exists from this information. Oh, detecting whether the CPU can do hyper-threading isn't the problem. The problem is to disable the virtual CPUs. How will any program know which are the real and which are the virtual processors? The OS doesn't even differentiate between the two, it just shows how many CPUs you have. And then what should BOINC do? Disable hyper-threading, a function that is only done in the BIOS? Or at a guess run with a max of 50% of the CPUs? As when it does that, it could well be that in your case of an i7/4/8, there's still 2 real/2 virtual CPUs running, or 1 real/3 virtual, or 3 real/1 virtual, or 4 real/0 virtual or 0 real/4 virtual. Source: http://www.intel.com/Assets/PDF/appnote/241618.pdf, Bit 28. The physical processor package is capable of supporting more than one logical processor. This field does not indicate that Hyper-Threading Technology or Core Multi-Processing (CMP) has been enabled for this specific processor. To determine if Hyper-Threading Technology or CMP is supported, compare value returned in EBX[23:16] after executing CPUID with EAX=1. If the resulting value is > 1, then the processor supports Multi-Threading. ID: 36882 ·
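The test quoted from the Intel application note can be sketched as plain bit arithmetic. This is only an illustration under stated assumptions: Python cannot execute CPUID itself, so the `edx` and `ebx` arguments stand for register values obtained elsewhere after CPUID with EAX=1, and both function names are hypothetical. Bit 28 is read from EDX, which is where that feature flag is reported.

```python
def logical_processors_per_package(ebx):
    """Extract bits 23:16 of EBX as returned by CPUID with EAX=1."""
    return (ebx >> 16) & 0xFF


def supports_multithreading(edx, ebx):
    """Apply the quoted rule: bit 28 set and EBX[23:16] greater than 1.

    Per the quoted note, this says the package supports more than one
    logical processor; it does not say hyper-threading is enabled.
    """
    htt_bit = (edx >> 28) & 1
    return htt_bit == 1 and logical_processors_per_package(ebx) > 1
```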

Richard Haselgrove Volunteer tester Help desk expert Send message Joined: 5 Oct 06 Posts: 5082	Message 36884 - Posted: 16 Feb 2011, 16:15:26 UTC - in response to Message 36882. Oh, detecting whether the CPU can do hyper-threading isn't the problem. The problem is to disable the virtual CPUs. How will any program know which are the real and which are the virtual processors? The OS doesn't even differentiate between the two, it just shows how many CPUs you have. There's a problem all right, but this isn't it. The real problem is: how do you distinguish between a four-core CPU with hyperthreading disabled (four logical cores) and a dual-core CPU using hyperthreading (four logical cores)? Maybe Jord's EAX=1 test on bit 28 answers that one, making it possible to branch into different execution paths depending on whether HT is active or not. And then what should BOINC do? Disable hyper-threading, a function that is only done in the BIOS? Or at a guess run with a max of 50% of the CPUs? As when it does that, it could well be that in your case of an i7/4/8, there's still 2 real/2 virtual CPUs running, or 1 real/3 virtual, or 3 real/1 virtual, or 4 real/0 virtual or 0 real/4 virtual. Oh, come off it - this is no problem at all. BOINC doesn't employ CPU affinity, and nor - despite what Crunch3r and others keep saying - should it. BOINC just launches 'n' threads of work, and leaves it up to the underlying OS to manage the hardware. It would be perfectly possible, provided the HT detection works as advertised, for BOINC to launch (or preempt) tasks until the number of active threads was 'n/2', in the circumstances described by the OP. That may not get exactly the same benefit as disabling HT entirely in BIOS for the duration, but I bet you it would show some benefit with any OS sensibly paired with an i7. archae86 could probably help us with that one. ID: 36884 ·
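The 'n/2' idea, combined with the AQUA@home exception from the opening post, can be sketched as a small policy function. This is not how the BOINC client is written: the function name `edf_thread_limit`, its parameters, and the assumption of exactly two logical CPUs per physical core are all made up for illustration.

```python
def edf_thread_limit(logical_cpus, ht_active, edf_mode, edf_task_threads=1):
    """Return how many threads to keep running.

    Assumes two logical CPUs per physical core when hyper-threading is on.
    Outside EDF mode, or without hyper-threading, nothing changes.
    """
    if not (edf_mode and ht_active):
        return logical_cpus
    physical_cores = max(1, logical_cpus // 2)
    # Corner case from the opening post: a multi-threaded task that asks
    # for more threads than there are physical cores keeps all logical CPUs.
    if edf_task_threads > physical_cores:
        return logical_cpus
    return physical_cores
```

For the 6-core, 12-thread 980X mentioned earlier, `edf_thread_limit(12, True, True)` gives 6, while a 12-thread AQUA@home job would still be allowed all 12. Which physical cores the threads land on is left to the OS scheduler, as the post argues.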

Jesse Viviano Send message Joined: 14 Feb 11 Posts: 63	Message 36887 - Posted: 17 Feb 2011, 7:42:45 UTC - in response to Message 36882. As for determining whether or not hyper-threading exists, I noticed that BOINC detects the CPU, gets its CPUID flags, and logs this information into the messages tab when it opens probably by using the CPUID instruction, at least in the Windows versions. It should be easy to detect whether or not hyper threading exists from this information. Oh, detecting whether the CPU can do hyper-threading isn't the problem. The problem is to disable the virtual CPUs. How will any program know which are the real and which are the virtual processors? The OS doesn't even differentiate between the two, it just shows how many CPUs you have. And then what should BOINC do? Disable hyper-threading, a function that is only done in the BIOS? Or at a guess run with a max of 50% of the CPUs? As when it does that, it could well be that in your case of an i7/4/8, there's still 2 real/2 virtual CPUs running, or 1 real/3 virtual, or 3 real/1 virtual, or 4 real/0 virtual or 0 real/4 virtual. Source: http://www.intel.com/Assets/PDF/appnote/241618.pdf, Bit 28. The physical processor package is capable of supporting more than one logical processor. This field does not indicate that Hyper-Threading Technology or Core Multi-Processing (CMP) has been enabled for this specific processor. To determine if Hyper-Threading Technology or CMP is supported, compare value returned in EBX[23:16] after executing CPUID with EAX=1. If the resulting value is > 1, then the processor supports Multi-Threading. According to this report on CPUID by Microsoft, AMD sets bit 28 on all of its multicore processors. That is why a programmer must check the vendor string before examining the flag. 
On another note, I think that Windows XP, later versions of Windows, and somewhat recent versions of Linux are smart enough to avoid assigning two threads to one physical core unless there are no more completely free physical cores. Windows XP SP3 and newer has the GetLogicalProcessorInformation function that will help you determine if you have hyper-threading enabled. If you really want to manually set processor affinity with Windows 7 or later, see the GetLogicalProcessorInformationEx function to get all of the information you need to optimally schedule processor affinity manually if you do not trust Windows' scheduler. However, I think that Windows' scheduler should be able to handle this task well enough unless it is Windows Vista, which has the unfortunate quirk of bouncing threads from core to core which conflicted with AMD's Cool'n'Quiet in the original AMD Phenom and ruining performance with Cool'n'Quiet enabled. (This bouncing around reminds me about how mortgages and their notes have been bounced around until nobody knows who owns the mortgages and their associated notes, which is now messing everyone up who has any direct connection to delinquent mortgages.) The Phenom II shipped with a workaround that forced the cores to run at the same speed for stupid OSes that bounced threads around without care to avoid the slowdown but allowed OSes that either were aware of underclocked cores or did not bounce threads around to run the cores at individual frequencies and save power. Someone will need to bench Windows 7's scheduler to see if threads bounce around carelessly or if the threads are scheduled sensibly, and to write code for BOINC to set processor affinities itself if Windows 7's scheduler is messed up. For Linux, search for information on "/proc/cpuinfo". 
You could either process the result with grep to get the number of physical or logical cores, or you could process the whole output so you might have enough info to optimally set processor affinities if you do not trust Linux's scheduler. However, I think that Linux is likely to be able to schedule threads optimally on its own if its version is late enough to be aware of hyper-threading, so I doubt that anyone needs to distrust Linux's scheduler. ID: 36887 ·
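The "/proc/cpuinfo" approach can be sketched as below. It assumes the x86 Linux layout, where each logical CPU has its own block containing "processor", "physical id" and "core id" lines; some kernels, architectures and virtual machines omit the last two, in which case this hypothetical `count_cores` helper reports 0 physical cores and the caller would have to fall back to the logical count.

```python
def count_cores(cpuinfo_text):
    """Return (logical, physical) core counts from /proc/cpuinfo-style text."""
    logical = 0
    physical_cores = set()
    physical_id = core_id = None
    # The extra empty string makes sure the final block is flushed.
    for line in cpuinfo_text.splitlines() + [""]:
        if not line.strip():
            # A blank line ends one logical CPU's block.
            if physical_id is not None and core_id is not None:
                physical_cores.add((physical_id, core_id))
            physical_id = core_id = None
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "processor":
            logical += 1
        elif key == "physical id":
            physical_id = value
        elif key == "core id":
            core_id = value
    return logical, len(physical_cores)
```

On a real system the text would come from something like `open("/proc/cpuinfo").read()`. If the logical count is larger than the physical count, hyper-threading is presumably active.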

Jesse Viviano Send message Joined: 14 Feb 11 Posts: 63	Message 36889 - Posted: 17 Feb 2011, 9:31:22 UTC - in response to Message 36887. I just realized that the reason Windows Vista and AMD's Cool'n'Quiet for the Phenom conflicted with each other is probably because the Phenom came after Vista was introduced. Therefore, it is likely that Vista's scheduler could deal with hyper-threading but not Cool'n'Quiet for the Phenom, and 7's scheduler probably can deal with Cool'n'Quiet for the Phenom and hyper-threading. I think that the thing about 7's scheduler not being sane might be paranoia on my part. ID: 36889 ·

Jesse Viviano Send message Joined: 14 Feb 11 Posts: 63	Message 36891 - Posted: 17 Feb 2011, 11:26:56 UTC Here is one other question that I just thought of: should the EDF scheduler use the virtual cores that would otherwise be left alone if there are more tasks in danger of missing their deadlines than there are physical cores? For example, if BOINC, running on a Socket 478 Pentium 4 Prescott with hyper-threading (which therefore cannot run in 64-bit mode), notices two results in danger of missing their deadlines, should both virtual cores be used, or should the result with the closer deadline be scheduled with the other result held back so that the first result can virtually monopolize the core? As you might know, the Pentium 4 has very long pipelines and very few registers, practically guaranteeing plenty of pipeline bubbles due to the register pressure that can be exploited by hyper-threading. The answer in this situation probably would be to use the virtual cores. Now consider a similar situation on a Core i7 Gulftown that runs a 64-bit OS and mostly 64-bit jobs. The user has just been on vacation and forgot to empty the cache. He starts it up and there are now twelve or more results that might miss their deadlines. Should all twelve virtual cores be used? While hyper-threading will provide good benefits with 32-bit tasks due to register pressure creating plenty of pipeline bubbles, 64-bit tasks have access to more registers, so they tend to have less register pressure and generate fewer pipeline bubbles. The Core i7's shorter pipelines relieve register pressure as well. 
Hyper-threading is not as much help here, but it can still improve throughput if the task cannot fit in the registers but both tasks going to that physical core fit within the caches, or if the tasks stress different parts of the CPU, like a crypto task that uses the integer unit exclusively paired with a molecular dynamics task that mostly stays within the FPU. Hyper-threading could also hurt here if the combined working set of both tasks is too big to fit within the caches of the physical core that both tasks were assigned to, causing the core to go out to DRAM all the time. The situation is less clear here because BOINC has no way to determine which of these situations applies. In the case of the crypto task and the molecular dynamics task being paired together, using the second virtual core is a good idea. For two tasks that would each fit within the caches if they were run separately but do not fit together, leaving the second virtual core idle is the correct choice. ID: 36891 ·

SekeRob2 Send message Joined: 6 Jul 10 Posts: 585	Message 36894 - Posted: 17 Feb 2011, 15:58:46 UTC - in response to Message 36891. There is really no "virtual core". The CPU and OS split each core's resources as needed, and through nifty tricks the 'gaps' are employed to squeeze out that extra 15% or whatever per hard core. So, if a quad HT device has a BOINC client that is allowed to use only 4 cores, then for all intents and purposes all resources go towards 1 thread of the 2 on any core, but that 15% is then gone out the window. This is, at its simplest, how it was explained to me, and it sounded highly reflective of reality. ID: 36894 ·

Jesse Viviano Send message Joined: 14 Feb 11 Posts: 63	Message 36935 - Posted: 20 Feb 2011, 4:21:13 UTC - in response to Message 36894. You are missing the point of EDF mode. When two threads are running on the same physical core, they slow each other down when they compete for the same resource, like the units that issue the instructions (which always happens), the execution units (which often happens), the units that issue memory operations (which almost always happens because x86 has too few registers), and the instruction completion units (which always happens). Hyper-threading works because there are times when a thread cannot use a unit for one reason or another. However, this decreases the available execution resources for each thread. The point for EDF is to get results done before the deadline. To be fair, the scheduler should multiply the time executed under EDF mode by two to penalize programs that underestimate the time they will take to execute to be fair for scheduling purposes. ID: 36935 ·

Claggy Send message Joined: 23 Apr 07 Posts: 1112	Message 36938 - Posted: 20 Feb 2011, 15:46:37 UTC - in response to Message 36937. Last modified: 20 Feb 2011, 15:47:20 UTC You are missing the point of the point I made, but pointless to explain ;-) And penalize programs for underestimation... sorry, nonsense! Many sciences are non-deterministic in nature, some heavily... impossible to provide a ''close enough'' FPOPS for each task. What the client needs is a DCF per science application for umbrella projects, and users that run with sensible levels of caches. There are enough features in the client, combined with multi-project crunching, to auto-compensate if one overruns its share or temporarily has no work, and to later go out and give the other more time. --//-- Also, Collatz's BOINC server needed to send the stock 64-bit app to Jesse's host, and not the 32-bit non-SSE app, which is probably almost half the speed of the 64-bit app. Slicker has in the last day changed things so 64-bit hosts only get 64-bit apps. Since the collatz WU is now a 3,000-credit WU, the CPU apps were going to be hard pressed to finish in time anyway. Slicker has since deprecated the CPU apps for the collatz application, and CPUs can now only get WUs for the mini_collatz application. Claggy ID: 36938 ·

Jesse Viviano Send message Joined: 14 Feb 11 Posts: 63	Message 36946 - Posted: 21 Feb 2011, 0:32:52 UTC - in response to Message 36937. The main reason for the penalty is not punitive, but fairness to the applications that were suspended during EDF mode. I understand why you thought that penalty meant punishment. Another purpose is to help the scheduler ramp up its estimated time to completion for future tasks so that it won't request as much work during future scheduler updates. The chance of missing deadlines then shrinks because there will be fewer tasks competing for a logical core during normal scheduling, which also reduces the chance of EDF mode. As for nondeterministic workloads, I would rather have a wildly overestimated deadline than a reasonable one which results in BOINC halting the result with a maximum time exceeded error. This caused the failure of several work units a year or two ago because that project hit an area of numbers that would take a ridiculously long time to analyze, causing everyone to exceed the BOINC time limit. If the estimated run time is wildly overestimated, the BOINC client program will figure it out and adjust its estimate appropriately, but will be less likely to fail a huge result due to maximum time exceeded. ID: 36946 ·

Jesse Viviano Send message Joined: 14 Feb 11 Posts: 63	Message 36982 - Posted: 23 Feb 2011, 3:46:55 UTC - in response to Message 36946. I forgot to identify the project with the maximum time exceeded errors in my last post: ABC@home. ID: 36982 ·

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.