Cores vs Threads ? (hyperthread matter?)

darwincollins

Joined: 9 Jan 10
Posts: 18
Message 33387 - Posted: 16 Jun 2010, 5:41:45 UTC

For BOINC projects (since they are computationally intensive), does having hyperthreading matter?

For example,

a computer with 4 cores/4 threads,
vs
a computer with 2 cores/4 threads (due to HT)


I think the 4-core/4-thread system will get more work done, but by what percentage compared with the 2-core/4-thread system?

archae86

Joined: 18 Jan 08
Posts: 36
United States
Message 33412 - Posted: 16 Jun 2010, 20:51:00 UTC - in response to Message 33387.  

The answer is highly application dependent, and also will vary with the particular CPU implementation.

In direct, careful comparisons, I've commonly seen same-system net throughput improvements on the order of 10 to 20% when comparing running with HT against running with HT disabled. But there certainly have been cases well outside that range (including a pathological case in which running HT actually lowered net throughput on one short series of Einstein third-party apps).

Now if, on the other hand, you are comparing completely different architectures or generations, then the HT portion of the comparison is of minor importance compared to everything else.
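As a rough back-of-the-envelope sketch of the original question (4 cores/4 threads vs 2 cores/4 threads): this assumes identical cores and clock speed, linear scaling with core count, and takes the 10 to 20% HT gain above at face value. The function name is made up for illustration and the results are estimates, not measurements.

```python
def relative_throughput(cores: int, ht_gain: float = 0.0) -> float:
    """Throughput in 'single non-HT core' units.

    Assumes identical cores and linear scaling with core count;
    ht_gain is the fractional net gain from enabling HT (0.10 = 10%).
    """
    return cores * (1.0 + ht_gain)


four_core_no_ht = relative_throughput(4)

for gain in (0.10, 0.20):
    two_core_ht = relative_throughput(2, gain)
    advantage = four_core_no_ht / two_core_ht - 1.0
    print(f"HT gain {gain:.0%}: 4c/4t does about {advantage:.0%} more work than 2c/4t")
```

Under those assumptions the 4-core machine comes out roughly 67 to 82% ahead. Real results will differ with the application and the CPU generation.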
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5077
United Kingdom
Message 33413 - Posted: 16 Jun 2010, 21:10:27 UTC - in response to Message 33412.  

The answer is highly application dependent, and also will vary with the particular CPU implementation.

In direct, careful comparisons, I've commonly seen same-system net throughput improvements on the order of 10 to 20% when comparing running with HT against running with HT disabled. But there certainly have been cases well outside that range (including a pathological case in which running HT actually lowered net throughput on one short series of Einstein third-party apps).

Now if, on the other hand, you are comparing completely different architectures or generations, then the HT portion of the comparison is of minor importance compared to everything else.

Peter,

Would it be fair to say that those earlier comparisons were done on NetBurst-era HT processors? Have you had any chance to repeat them on the Core iN range, or do you know anyone else who has?
darwincollins

Joined: 9 Jan 10
Posts: 18
Message 33434 - Posted: 18 Jun 2010, 5:34:46 UTC

I looked around using terms like NetBurst, and found:

Hyper-Threading on vSphere
http://vpivot.com/2010/03/06/hyper-threading-on-vsphere/

(Summary: suggests enabling HT; a slight 10% to 24% increase.)
Though the comments point to some debate:
http://vpivot.com/2010/03/17/vsphere-4-0-hyper-threading-and-terminal-services/

Intel engineers stage CPU coup (2-year-old article)
http://www.techworld.com.au/article/257165/intel_engineers_stage_cpu_coup


So my guess is that HT is good. Further, should I think of a 4-core/8-thread PC as being up to 30% faster than a 4-core/4-thread PC?

archae86

Joined: 18 Jan 08
Posts: 36
United States
Message 33443 - Posted: 18 Jun 2010, 15:46:28 UTC - in response to Message 33413.  

Would it be fair to say that those earlier comparisons were done on NetBurst-era HT processors? Have you had any chance to repeat them on the Core iN range, or do you know anyone else who has?
I wrote a long answer yesterday in this thread. Not sure if it was moderated away, or whether I failed to click on the post button after previewing it.

I'll recast the text part of my answer: my own personal owned system comparisons were done on a Gallatin, which is the large-cache variant of Northwood, which in turn was the next-process implementation of Willamette (with some appreciable improvement). So, yes, marketing called them all NetBurst, and they all were from a diseased branch of the Intel microprocessor tree--now happily cut off in favor of the vastly better Conroe and Nehalem branches. I don't currently operate any hosts capable of HT.

But my most recent measurement of this kind was on msattler's Frozen Nehi. The first was before it got frozen, and sadly, it also only had one of the three channels of RAM populated at the time, rendering the results of rather limited application. Still, they showed a quite modest hyperthreading productivity benefit in two Angle Ranges which had quite a bit of work at the time, and a slight disadvantage in another Angle Range region.

On the chance that embedded images are forbidden here, but links permitted, I'll include a couple of links this time:

Single RAM channel Nehalem HT comparison by AR

same comparison--expanded view near 0.4 AR


Much later in the Frozen Nehi's life, Mark undertook another comparison--this time on Astropulse, and this time running with RAM channels fully populated with high-performance RAM, overclocked as Mark would. With RAM starvation not getting in the way to nearly the degree seen in those first comparisons, HT gave a modest but highly consistent productivity improvement--ballpark 10%.

Astropulse comparison on fully populated and overclocked system

Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5077
United Kingdom
Message 33447 - Posted: 18 Jun 2010, 21:45:48 UTC - in response to Message 33443.  

Thanks.

IIRC, some of the earlier NetBurst experiments showed much better results with dissimilar tasks - one SETI with one Einstein was a favourite pairing.

I know Mark is an avowedly one-project cruncher (SETI first, the rest nowhere) - and shortly to become a no-project cruncher, such is his disgust at the cack-handed way the latest BOINC server updates have been rolled out - so presumably mixed project pairing wasn't part of your tests with him. We've also lost Tony (mmciastro), who used to do similar testing with mainly AMD processors (if I may be permitted to use those letters in this company!)

So I wonder if the opening question has yet been answered with iN technology and diverse projects?

(posting from a Willamette, as it happens)
archae86

Joined: 18 Jan 08
Posts: 36
United States
Message 33450 - Posted: 18 Jun 2010, 23:55:02 UTC - in response to Message 33447.  

IIRC, some of the earlier NetBurst experiments showed much better results with dissimilar tasks - one SETI with one Einstein was a favourite pairing.

<snip>
We've also lost Tony (mmciastro), who used to do similar testing with mainly AMD processors (if I may be permitted to use those letters in this company!)

No objection from me--I as a former employee have more concrete reasons to dislike Intel than most people do, though AMD fans tend to have serious blind spots to flaws and less than competitive aspects of that product set.
So I wonder if the opening question has yet been answered with iN technology and diverse projects?

(posting from a Willamette, as it happens)

I've looked at app diversification benefit (specifically for Einstein and ordinary SETI) myself, as it happens, and even got the honor of having my results pointed to several times by one Joe Segur, and commented on by you. Those results were observed on a Q6600 (4-core Conroe). But I never looked at the question of application diversity benefit vs. hyperthreading (Conroe does not do HT). You are right in assuming there was nothing but SETI on Mark's system when I monitored it. I've certainly done nothing on i7 behavior in the face of app diversity, still less on the HT interaction.
darwincollins

Joined: 9 Jan 10
Posts: 18
Message 33469 - Posted: 20 Jun 2010, 23:34:45 UTC

The serious stats crunchers can give a much better answer, but my 'really rough' comparisons looking at the WCG project stats seem to indicate that HT may help by about 20% over non-HT. (Well, it's comparing HT-enabled i7s to non-HT Harpertowns, so my comparison is weak.)

P.S. I do appreciate reading your responses. The IT folks at work can't even fathom apps that normally run at 100% of the CPU, except to parrot that they must be badly written.

perryjay

Joined: 19 Apr 09
Posts: 23
United States
Message 33473 - Posted: 21 Jun 2010, 13:04:10 UTC - in response to Message 33469.  

Darwincollins, explain to them that BOINC doesn't actually run at 100%, it uses the extra CPU cycles that aren't being used by anything else. That brings your CPU usage up to 100%. As other work asks for more space, BOINC cuts back out of the way and takes less of the share of cycles.

To get back on topic though, some of the people with i7s have tried running both with and without HT on. With HT the work units were slower, but there were 8 of them at a time because they are running two WUs on each core. I believe they decided it was better with HT on but not twice as fast. If I remember right, it was about like having two extra cores, speed-wise.
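To make that comparison concrete: net throughput is the number of work units running at once divided by the time each one takes. The runtimes below are hypothetical placeholders, chosen only so the result matches the "about two extra cores" recollection; they are not measurements from any i7.

```python
def tasks_per_hour(concurrent_tasks: int, hours_per_task: float) -> float:
    """Net throughput: work units completed per hour across the whole CPU."""
    return concurrent_tasks / hours_per_task


# Hypothetical runtimes for a 4-core CPU (placeholders, not measurements).
ht_off = tasks_per_hour(concurrent_tasks=4, hours_per_task=3.0)
ht_on = tasks_per_hour(concurrent_tasks=8, hours_per_task=4.0)

gain = ht_on / ht_off - 1.0
equivalent_cores = 4 * ht_on / ht_off

print(f"HT off: {ht_off:.2f} WU/h, HT on: {ht_on:.2f} WU/h")
print(f"Net gain {gain:.0%}, roughly {equivalent_cores:.1f} non-HT cores")
```

With these placeholder numbers the gain is 50%, which is what "two extra cores" on a quad-core would imply. The 10 to 20% figures measured earlier in the thread would correspond to each work unit taking about 1.7 to 1.8 times as long with HT on.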
mo.v

Joined: 13 Aug 06
Posts: 778
United Kingdom
Message 33479 - Posted: 21 Jun 2010, 18:34:51 UTC - in response to Message 33412.  

The answer is highly application dependent, and also will vary with the particular CPU implementation.

In direct, careful comparisons, I've commonly seen same-system net throughput improvements on the order of 10 to 20% when comparing running with HT against running with HT disabled. But there certainly have been cases well outside that range (including a pathological case in which running HT actually lowered net throughput on one short series of Einstein third-party apps).

Now if, on the other hand, you are comparing completely different architectures or generations, then the HT portion of the comparison is of minor importance compared to everything else.


Last year I saw the tasks web page of a hyperthreaded computer running 8 CPDN HadAM3P climate models that were not designed for hyperthreading. IIRC it was a decent computer but the models were advancing I think 8 times more slowly than on my C2D 6600. I've never seen a slower speed on any other computer.

The lesson is not to use HT for CPDN models until a type is developed specially for it. This is planned but not soon. Or if you do try it, check what's happening.
archae86

Joined: 18 Jan 08
Posts: 36
United States
Message 33480 - Posted: 21 Jun 2010, 20:42:34 UTC - in response to Message 33479.  

Last year I saw the tasks web page of a hyperthreaded computer running 8 CPDN HadAM3P climate models that were not designed for hyperthreading. IIRC it was a decent computer but the models were advancing I think 8 times more slowly than on my C2D 6600. I've never seen a slower speed on any other computer.

The lesson is not to use HT for CPDN models until a type is developed specially for it. This is planned but not soon. Or if you do try it, check what's happening.
I don't know what coding for HT benefit would mean, other than trying to get a smaller working set and other measures of RAM footprint.

If you saw a really dramatic slowdown, and there was not something non-comparable going on, then the most obvious possibility would be that the HT variant, with double the memory demand, pushed the system into heavy enough disk-swapping to slow it severely.

That is not what was going on in the case I observed. My systems generally have substantial RAM relative to the demands of the BOINC projects I've used them on.

But performance degradation when an appreciable amount of memory activity spills down to the next speed tier, whether from cache to RAM, or from RAM to disk, is really severe. So, depending on configuration and code, it could be a source of HT performance loss well below break-even.

Not that HT is required to get this effect. A while back, someone was running on BOINC a monster server with a large number (at least eight, I think) of Intel's hex-core Dunnington chips, which were designed in India. The total throughput per core and the execution time per result were just awful. I think the problem was that the system configuration provided far less RAM bandwidth per core than the smaller systems of the same Penryn generation to which I compared it, so the processors spent most of their time waiting for RAM requests to complete.
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5077
United Kingdom
Message 33481 - Posted: 21 Jun 2010, 21:27:11 UTC

I think Mo meant to draw a distinction with 'multithreading'. The distinction is probably between eight separate applications, running eight different tasks with eight different datasets, and a single application spawning eight threads, all working on different aspects of the same task and accessing only one dataset. Hopefully the latter case would suffer from far less memory bus contention. Those CPDN HadAM3P jobs have a heavy memory demand at the best of times: I've got one at the moment which is holding almost 220 MB in RAM even while 'waiting to run'.
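The difference in memory demand can be sketched with simple arithmetic. This assumes the roughly 220 MB figure above applies to every task and that a multithreaded application would keep a single shared dataset; the per-thread overhead is a hypothetical placeholder, and the function names are made up for illustration.

```python
def separate_tasks_mb(tasks: int, mb_per_task: float) -> float:
    """Footprint when every task is its own process with its own dataset."""
    return tasks * mb_per_task


def multithreaded_mb(threads: int, shared_dataset_mb: float, mb_per_thread: float) -> float:
    """Footprint when one process shares a single dataset across its threads."""
    return shared_dataset_mb + threads * mb_per_thread


# 220 MB comes from the post above; 10 MB per thread is a made-up placeholder.
print(f"8 separate tasks:  {separate_tasks_mb(8, 220):.0f} MB")
print(f"1 task, 8 threads: {multithreaded_mb(8, 220, 10):.0f} MB")
```

Under these assumptions eight independent models need about 1760 MB against roughly 300 MB for a shared-dataset design, which is the kind of gap that decides whether the working set stays in RAM.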
Ace Rimmer

Joined: 4 Sep 10
Posts: 4
Australia
Message 34764 - Posted: 20 Sep 2010, 10:40:29 UTC - in response to Message 33481.  

Just wondering: if I turn HT off on my i7 930, will I get a boost in the performance of non-BOINC apps whilst crunching at 100%? As it stands now, I'm having trouble running things whilst BOINC is at full load. Would it be better to keep hyperthreading and reduce BOINC to use 7 cores?

Also, I believe you can overclock higher with HT off as it won't get as hot! Could I compensate for no HT with a good OC?
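Both options can be put into rough numbers. This is a minimal sketch, assuming BOINC's CPU limit is expressed as a percentage of logical processors, that throughput scales linearly with clock speed (optimistic, since memory-bound work scales less), and using the 10 to 20% HT gain quoted earlier in the thread. The 2.8 GHz figure is the i7 930's stock clock; the function names are made up for illustration.

```python
def cpu_percent_for(cores_to_use: int, logical_cpus: int) -> float:
    """Percentage of logical CPUs to give BOINC, leaving the rest for other programs."""
    return 100.0 * cores_to_use / logical_cpus


def break_even_clock_ghz(stock_ghz: float, ht_gain: float) -> float:
    """Clock needed with HT off to match HT-on throughput at stock clock.

    Assumes throughput is proportional to clock speed (an optimistic assumption).
    """
    return stock_ghz * (1.0 + ht_gain)


print(f"7 of 8 threads = {cpu_percent_for(7, 8):.1f}% of the processors")

for gain in (0.10, 0.20):
    target = break_even_clock_ghz(2.8, gain)
    print(f"HT gain {gain:.0%}: need about {target:.2f} GHz with HT off")
```

On those assumptions, an overclock of roughly 280 to 560 MHz would be needed to make up for switching HT off.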
Ace Rimmer

Joined: 4 Sep 10
Posts: 4
Australia
Message 34768 - Posted: 20 Sep 2010, 13:23:24 UTC - in response to Message 34764.  

Hey let me answer my own question... YES! I am seeing an improvement with other programs when HT is off and BOINC is on 100%.

Such programs as Media Player Classic, Firefox and Opera.
Audio in Media Player Classic and/or VLC would pop and crackle, and wasn't playable until BOINC was off.

Now that seems to be fixed with HT off. I've also been able to OC by 200 MHz with no real change in temps... waiting for my H50 to return so I can OC more.

Hope this helps some ppl,

Jim.
darwincollins

Joined: 9 Jan 10
Posts: 18
Message 34979 - Posted: 27 Sep 2010, 3:26:43 UTC - in response to Message 34768.  

I have heard similar ideas over on GPUGrid.

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.