Help with Project Points Proposal

Message boards : The Lounge : Help with Project Points Proposal


Ozylynx

Joined: 2 Jan 08
Posts: 31
Australia
Message 14747 - Posted: 8 Jan 2008, 17:14:34 UTC

Hi everyone.

I have an idea about points that I would like to throw open to general discussion and tweaking.

Firstly, this idea has been put to a number of BOINC projects, and they have all referred me here with it. I am not very computer savvy, so it has grown from observation and user experience rather than detailed technical knowledge.

It seems obvious to me that, with the ongoing breakthroughs in CPU and computer design in general, the benchmarking of computers has little relevance to actual performance.
One of the greatest of these breakthroughs has been increased L2 cache and, more recently, L3 cache on the CPU. This has a dramatic effect on the way a CPU can handle WUs and is increasingly being exploited by various projects within the BOINC community. The obvious answer is a Fixed Point System.

This, in turn, leads to the greatest hurdle that I believe BOINC faces today. What criteria does one base a Fixed Point System on, given the vast differences in computer capabilities? Many of the schemes currently in use for fixed-point allocation heavily penalize older systems and inadvertently penalize the new ones as well. Most are little more than a blind stab in the dark, trying to achieve a balance that cannot be reached while the technology keeps moving and changing.

My suggestion is a simple one. The original Cobblestone points system, while never perfect, is still excellent and useful. It was designed, however, to measure the potential of a very specific group of computers, with the inherent limitations of their era. That is to our advantage today. It was designed to work with computers having, at the very most, 512 MB of RAM, 128 KB (or, at a stretch, 256 KB) of L2 cache, and, to be generous, a 1 GHz CPU on a 100 MHz FSB. Throw in a PCI video card and one would have had a monster machine of the era. Any project could build or obtain such equipment today for minimal or no cost.

This, then, would be our Universal Benchmark Machine. By using it to 'crunch' a sample of WUs from a project and awarding the Cobblestone points produced by that machine as THE fixed points for that run of work, we would remove the ambiguity and guesswork from the process. Older machines would now receive reward points similar to what their claimed points have always been. Faster, more modern, or better-suited computers would receive extra points per day by doing more work, as the original design of the Cobblestone system intended. While I am aware that some projects have fairly constant run times within a project, others do not. At least with a firm base to build upon, the likelihood of an appropriate algorithm being developed to deal with this situation greatly increases.

I believe that this system is also 'future-proof', in that, as fewer PII- and PIII-era systems remain in use, the Universal Benchmark Machine could be moved on to the next level by consensus.

Thoughts, ideas, suggestions, questions? Feel free.

Ozylynx
Keith.

Nicolas

Joined: 19 Jan 07
Posts: 1179
Argentina
Message 14754 - Posted: 8 Jan 2008, 18:32:13 UTC - in response to Message 14747.  

This, then, would be our Universal Benchmark Machine. By using it to 'crunch' a sample of WUs from a project and awarding the Cobblestone points produced by that machine as THE fixed points for that run of work, we would remove the ambiguity and guesswork from the process.

How would you know how many credits to award to the sample of WUs on the benchmark machine?

Ozylynx

Joined: 2 Jan 08
Posts: 31
Australia
Message 14761 - Posted: 8 Jan 2008, 19:57:06 UTC - in response to Message 14754.  

How would you know how many credits to award to the sample of WUs on the benchmark machine?

Nicolas.

The points on the benchmark machine would be the points claimed by that machine using the existing Cobblestone formula. It then becomes a standard. The machine described would claim around 5.5+ Cobblestone points per hour; multiplying that by the number of hours it takes to complete the WU gives the value of the WU for all computers.
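As a minimal sketch of that rule (Python; the ~5.5 points/hour figure above is taken as the assumed reference rate, and all names are illustrative, not from any BOINC code):

```python
# Sketch of the proposed fixed-credit rule: the benchmark machine's
# Cobblestone claim becomes the fixed award for every host.
BENCHMARK_CLAIM_PER_HOUR = 5.5  # assumed claim rate of the reference PC

def fixed_credit(benchmark_hours):
    """Credit for a WU = what the reference machine would claim for
    the CPU hours *it* needs to finish that WU."""
    return BENCHMARK_CLAIM_PER_HOUR * benchmark_hours

# A WU the reference machine finishes in 10 hours is worth the same
# 55 credits on every host, however fast that host is.
print(fixed_credit(10))
```

Faster hosts would then earn more per day simply by completing more such WUs, which is the behaviour described above.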

The difficulty I see right now is that a Cobblestone point seems to have no real quantity to define it. A knot is a knot whether you are travelling by air or sea; at 6 knots or at Mach 3, a knot doesn't change. It remains a constant measure of distance travelled over time. This proposal is a crude physical method of giving a Cobblestone a similarly specific value.

There is a problem with this idea that even I can see. If a project places low demands on RAM and L2 cache, it will offer fast machines a relatively low points-per-hour return compared with a project that places higher demands. The reason is that the benchmark machine would become swamped and bogged down with cache overflow on the demanding project, while the more modern machine breezes right on through. This is the very same reason the Cobblestone is no longer viable, IMO.

The result of course would be that faster machines would gravitate to more demanding projects and slower machines to less demanding projects. I'm not sure that's a bad thing. In fact, that may be the way it should be.

Cheers.
Keith

Jord
Volunteer tester
Help desk expert

Joined: 29 Aug 05
Posts: 14950
Netherlands
Message 14806 - Posted: 9 Jan 2008, 22:51:56 UTC

Check out [trac]wiki:CreditProposal[/trac].

Ozylynx

Joined: 2 Jan 08
Posts: 31
Australia
Message 14816 - Posted: 10 Jan 2008, 13:01:36 UTC - in response to Message 14806.  

Check out [trac]wiki:CreditProposal[/trac].

First reaction: the way I read it, more powerful (more valuable) equipment would not seem to reap its deserved rewards. It requires creating databases and gathering information about CPU types, which, as was pointed out to me, is inaccurate given the reporting method in current use, with the inherent loss of privacy involved in snooping a system to get accurate data... just for starters. I wouldn't know where to begin. How would one cope with a Beowulf cluster? Overclockers, etcetera?

I absolutely agree with the idea of publishing anticipated credit for particular classes of system at the project web site. One of the many reasons I have been thinking about this issue is that I recently attached a Celeron 1300 to a project recommending a minimum 1 GHz processor. This is a notoriously generous project, yet after 10 days of crunching and claiming 1465 points, I was awarded 850 points under a 'fixed point system'. The reason: insufficient L2 cache at 256 KB. Ten days which could have been used efficiently elsewhere. This is one of many such instances, and it amounts, IMO, to greed and/or laziness on the part of the project admins. But that's another topic.

My previous statement about the Cobblestone having no fixed value is of course wrong. What I was trying to say is that it is now influenced by external factors to such an extent that it appears to have no fixed value. Cobblestones spent shuffling data, whether between cache and RAM or between RAM and HDD swap, are still counted, so projects that are demanding on those resources produce a 'warp' effect.

It is that effect which needs to be addressed. There is nothing inherently wrong with the Cobblestone! The computer which doesn't cope with a project's demands should be awarded what it claims for the work done. The computer which does cope with increased demand should be rewarded for its ability to do so. That has always been the philosophy behind awarding credit in BOINC, and I see no reason to change it. We do, however, need a physically present, lowest-common-denominator computer to establish a foundation upon which the variables present between projects can be quantified as a Cobblestone value.

Keith


SekeRob

Joined: 25 Aug 06
Posts: 1596
Message 14817 - Posted: 10 Jan 2008, 14:15:05 UTC

Fixed does not have my vote and never will, for it cannot be representative of the actual effort. Particularly with non-deterministic calculations, the number of flops packed into a job can vary widely. One job can take 30 minutes, the next 1 hour 30 minutes, and I'm sure there are projects with even wider variations.

What has my vote is making the benchmark consistent and representative of the computations a machine can do in a second, rather than a test that one time happens to run while the machine is idle and the next time runs while an extensive indexing job is under way. After all, credit claims are determined by the number of actual flops done; they do not take into consideration that a cruncher happened to be watching streaming video from the other side of the planet over 56k dial-up, the way the benchmark test does.
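One way to make an on-host benchmark less sensitive to background load, offered purely as an illustrative sketch (nothing here is from the actual BOINC client), is to repeat the measurement and keep the median run:

```python
import statistics
import time

def benchmark_once(n=200_000):
    """Stand-in for a real benchmark kernel: time a fixed amount of
    floating-point work and return operations per second."""
    start = time.perf_counter()
    x = 0.0
    for i in range(n):
        x += i * 1.000001
    return n / (time.perf_counter() - start)

def robust_benchmark(runs=5):
    """Median of several runs: one run polluted by an indexing job or
    a streaming video no longer dominates the reported score."""
    return statistics.median(benchmark_once() for _ in range(runs))
```

The median is preferred over the mean here because a single heavily disturbed run skews a mean but barely moves a median.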
Coelum Non Animum Mutant, Qui Trans Mare Currunt

Ozylynx

Joined: 2 Jan 08
Posts: 31
Australia
Message 14823 - Posted: 10 Jan 2008, 15:43:56 UTC - in response to Message 14817.  
Last modified: 10 Jan 2008, 15:57:05 UTC

Fixed does not have my vote and never will, for it cannot be representative of the actual effort. Particularly with non-deterministic calculations, the number of flops packed into a job can vary widely. One job can take 30 minutes, the next 1 hour 30 minutes, and I'm sure there are projects with even wider variations.

Projects with non-deterministic calculations are obviously outside the direct scope of the proposal and are, I think, in the minority. The proposal would improve the situation at most projects, and another system would need to be developed for the non-deterministic ones, perhaps in the form of a post-completion analysis of computation against a known mean.

What has my vote is making the benchmark consistent and representative of the computations a machine can do in a second, rather than a test that one time happens to run while the machine is idle and the next time runs while an extensive indexing job is under way. After all, credit claims are determined by the number of actual flops done; they do not take into consideration that a cruncher happened to be watching streaming video from the other side of the planet over 56k dial-up, the way the benchmark test does.

I agree totally, Sek. The question is how? I believe this idea is a good start down that path; it doesn't change any of the good aspects of the existing system. It simply adds a specific criterion to eliminate other unknown variables, specifically the variation in swap-file and indexing load you mentioned.

[edit:] On reading your post more carefully: because each project would have its own 'Benchmark Computer', the streaming-video problem and everything like it would be eliminated completely. The donor computer would no longer require benchmarking. Also, exit the bad 'optimized client'. True optimizations, like 64-bit processing workarounds, would still be viable under this system. I honestly cannot see a future without fixed credits in some form. [/edit]

Cheers.

Nicolas

Joined: 19 Jan 07
Posts: 1179
Argentina
Message 14824 - Posted: 10 Jan 2008, 16:14:14 UTC - in response to Message 14823.  

Fixed does not have my vote and never will, for it cannot be representative of the actual effort. Particularly with non-deterministic calculations, the number of flops packed into a job can vary widely. One job can take 30 minutes, the next 1 hour 30 minutes, and I'm sure there are projects with even wider variations.

Projects with non-deterministic calculations are obviously outside the direct scope of the proposal and are, I think, in the minority. The proposal would improve the situation at most projects, and another system would need to be developed for the non-deterministic ones, perhaps in the form of a post-completion analysis of computation against a known mean.

BURP. What can you do about it? Run time and memory usage can vary very widely depending on the input files, as on most 3D renderers. Seriously, a 3D image can take anywhere from 1 second to 1 week, and the only way to know how long it will take is to run it and time it. It cannot be "predicted". Runtime can also vary widely between frames of the same animation, so you can have huge differences from one workunit to the next.

Ozylynx

Joined: 2 Jan 08
Posts: 31
Australia
Message 14830 - Posted: 11 Jan 2008, 0:08:06 UTC - in response to Message 14824.  

BURP. What can you do about it? Run time and memory usage can vary very widely depending on the input files, as on most 3D renderers. Seriously, a 3D image can take anywhere from 1 second to 1 week, and the only way to know how long it will take is to run it and time it. It cannot be "predicted". Runtime can also vary widely between frames of the same animation, so you can have huge differences from one workunit to the next.

Sheesh. Here I was hoping for some positive suggestions, and all I'm getting are questions... I'll have a stab at it, though.

If a relatively small, already-known render is chosen to be the 'standard', ideally one that runs for an hour or thereabouts on the 'Benchmark Computer', it could then be sent to every new machine registering at the project, and the time taken to complete the 'standard render' used as a correction factor. This would need to be re-assessed whenever the overclocking changes or new equipment is installed.

If the 'Benchmark Computer' claims 6 credits per hour and takes 1 hour to complete the 'standard render', then the credit for that render would be 6. If another computer completes the standard render in 15 minutes, that computer would earn 6 × 4 = 24 credits per hour of rendering. It then becomes a simple task to multiply credits per hour by the time taken to complete any required task, just as we already do. I am of course talking CPU time, not wall-clock time.
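The arithmetic above can be written out directly (Python; the 6 credits/hour rate and the run times are the post's own illustrative figures, and the function names are mine):

```python
# Correction-factor sketch for the 'standard render' idea.
BENCH_CREDITS_PER_HOUR = 6.0  # claim of the hypothetical Benchmark Computer
BENCH_RENDER_HOURS = 1.0      # its CPU time on the standard render

def credits_per_hour(host_render_hours):
    """A host that finishes the standard render N times faster than
    the Benchmark Computer earns N times its credit rate."""
    return BENCH_CREDITS_PER_HOUR * (BENCH_RENDER_HOURS / host_render_hours)

def credit_for_task(host_render_hours, task_cpu_hours):
    """Credit for any task = corrected rate x CPU hours spent on it."""
    return credits_per_hour(host_render_hours) * task_cpu_hours

# The post's example: completing the render in 15 minutes gives
# 6 * 4 = 24 credits per CPU hour of rendering.
print(credits_per_hour(0.25))
```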

I know of one project which has already looked at something similar to this in a Beta test, but I never saw any results... did you see them Sekerob?

On a different but relevant topic: there is a real danger in becoming too complex and clever with credit systems. Until now I haven't named any projects, but these guys deserve it. Folding@home, not BOINC thank heavens, famously introduced GPU folding some time back, which produces around 3000% (not a typo) of the normal CPU productivity. In their wisdom, they then decided to 'align' the GPU credits with CPU production, plus a small margin, to be fair. The result? The 'super producers' overclock their CPUs and save their hard-earned cash by buying low-end GPUs which go unused, thus earning more credits on an overclocked CPU while producing about 1/25th of the actual work. That doesn't exactly benefit the project.

In some ways, I see BOINC going down the same path. A Cobblestone credit is most relevant to the older CPU architecture. Until now this has largely been ignored, and the overall credit being issued is diminishing as modern systems become more capable. There is a lesson to be learned from F@H. Like it or not, competitive 'super crunchers' contribute massive amounts of productivity. They need to be encouraged to strive ever harder, not discouraged because they crunch for a different reason than the one others might hold near and dear.

Cheers.

mo.v

Joined: 13 Aug 06
Posts: 778
United Kingdom
Message 14838 - Posted: 11 Jan 2008, 16:27:59 UTC
Last modified: 11 Jan 2008, 16:58:56 UTC

I can see that the credit system needs to be fair, and be seen to be fair, or at least not unfair. But the proposal looks to me so complicated that it would need a roomful of BOINC credit accountants and auditors, each with a dozen different types of computer, to implement and periodically recalibrate it. Because very few people will understand it, there will be a constant stream of queries and complaints on the forums.

If this proposal is implemented, will it be compulsory for all BOINC projects? Or will some be able to opt out?

When a new version of a CPDN model was found to run about 20% more slowly on most computers, it took about six weeks between the first forum posts pointing this out and the implementation of a different credit allocation for those models. Why such a delay? Because there are very few programmers keeping a very big project up and running. To be realistic, I just cannot see these same (two) programmers having time for such a fundamental reassessment of credit allocation.

I certainly won't be pointing the CPDN programmers to the Trac proposal as it would spoil their weekend. For them, being on track is more important than being on Trac.

Ozylynx

Joined: 2 Jan 08
Posts: 31
Australia
Message 14841 - Posted: 11 Jan 2008, 19:15:54 UTC - in response to Message 14838.  

Can we please get this thread back on track, and off Trac? To paraphrase.

Cheers.
ozylynx

mo.v

Joined: 13 Aug 06
Posts: 778
United Kingdom
Message 14846 - Posted: 11 Jan 2008, 21:05:18 UTC

Your suggestion certainly seems simpler than the Trac proposal.

Nicolas

Joined: 19 Jan 07
Posts: 1179
Argentina
Message 14849 - Posted: 12 Jan 2008, 0:22:39 UTC - in response to Message 14847.  

Yes we can. Please tell us how to choose a decent standard render, one that is sufficiently typical of all other renders to be crunched to allow decent credit calculations.

Yep, you have a point. Making a 'standard workunit' that can represent all other workunits is nearly impossible for many projects. Since you mentioned 'render', and I know about that topic, I will go on with that, which would apply to Renderfarm@Home, or to BURP if they ever support POV-Ray.

POV-Ray has a standard benchmark scene, which uses almost every single feature.

But not even that can be used for BOINC benchmarking needs. If I render a scene that uses photons with very high quality settings, needing around 1GB of RAM, memory bandwidth will become the bottleneck. The standard benchmark scene isn't made to test that specifically.

Or say I render the standard clouds.pov, which uses a specific feature *very* extensively (media). If somebody makes an optimized app that happens to make that feature faster but not the rest, the clouds scene will process much faster, but benchmark scene performance won't increase too much.

I'm sure many other projects would have problems coming up with a "benchmark workunit".


Ozylynx

Joined: 2 Jan 08
Posts: 31
Australia
Message 14850 - Posted: 12 Jan 2008, 0:41:05 UTC - in response to Message 14849.  

Yes we can. Please tell us how to choose a decent standard render, one that is sufficiently typical of all other renders to be crunched to allow decent credit calculations.

Yep, you have a point. Making a 'standard workunit' that can represent all other workunits is nearly impossible for many projects. Since you mentioned 'render', and I know about that topic, I will go on with that, which would apply to Renderfarm@Home, or to BURP if they ever support POV-Ray.

POV-Ray has a standard benchmark scene, which uses almost every single feature.

But not even that can be used for BOINC benchmarking needs. If I render a scene that uses photons with very high quality settings, needing around 1GB of RAM, memory bandwidth will become the bottleneck. The standard benchmark scene isn't made to test that specifically.

Or say I render the standard clouds.pov, which uses a specific feature *very* extensively (media). If somebody makes an optimized app that happens to make that feature faster but not the rest, the clouds scene will process much faster, but benchmark scene performance won't increase too much.

I'm sure many other projects would have problems coming up with a "benchmark workunit".

[...snip]I am not very computer savvy, so this has grown from observation and user experience, rather than detailed technical knowledge.

My thinking on this is that the 'standard work unit' would NOT be typical but would in fact be a worst-case-scenario WU, with maximum stress applied, so that less stressed situations would be handled more quickly and therefore receive fewer credits. The system should then take care of itself.

Remember, this is not a radical change to the existing system. It merely places a physical checkpoint, in the form of a computer which should demonstrate stressed performance, a low point if you will, from which the points system can be better managed and predicted.

Keith

Ozylynx

Joined: 2 Jan 08
Posts: 31
Australia
Message 14853 - Posted: 12 Jan 2008, 4:27:01 UTC

I've had a chance to crunch a small sample of WUs from BURP. I believe the stress-test idea should work, and I found some interesting stuff too.

First: the WUs I have seen so far use very little L2 cache, a much underrated slowdown factor. Also, the most memory usage I have seen is about 43 MB, on a WU which took 56 minutes on an E2180 dual core. From what I've read in the forum, this represents quite a large WU.

I'm also crunching WUs on a 1 GHz PIII and a Celeron 600. Both have only 384 MB of RAM; the former has 256 KB and the latter only 128 KB of L2 cache. There is no evidence of overload or swap-file usage to the HDD. In short, the project seems perfect for the credit system exactly as it stands right now, and I can see no benefit to this project in changing it. No harm either :) Very early figures on a very small sample...

btw. All computers are running WinXP O.S.

Cheers.
Keith

KSMarksPsych

Joined: 30 Oct 05
Posts: 1239
United States
Message 14862 - Posted: 13 Jan 2008, 0:31:32 UTC - in response to Message 14853.  

First: the WUs I have seen so far use very little L2 cache, a much underrated slowdown factor. Also, the most memory usage I have seen is about 43 MB, on a WU which took 56 minutes on an E2180 dual core. From what I've read in the forum, this represents quite a large WU.



I tend to stay out of credit discussions.

But if this is what you want to use as a 'worst case scenario', I'm very skeptical. I have an Intel T2300 laptop that ran XP, and I did some BURP on it over the summer. I remember work taking 24+ hours on it. I don't recall the memory usage.

There are projects out there where memory use varies dramatically within a single workunit. I'm talking from roughly 70 MB to 700+ MB, a factor of 10+.


Kathryn :o)

Ozylynx

Joined: 2 Jan 08
Posts: 31
Australia
Message 14863 - Posted: 13 Jan 2008, 1:01:17 UTC - in response to Message 14858.  
Last modified: 13 Jan 2008, 1:46:19 UTC

you've gone from a system in which a fairly accurate/reliable extrapolation is envisioned to one where we measure the worst case scenario, whatever that is, and then let the chance that the WU takes less time determine the final outcome. I don't see much chance of a system that works on chance ever getting off the ground.

Thank you, Dagorath.
At least one person sees the idea as giving an accurate and reliable extrapolation. The chance factor of which you speak applies to the few specific projects which are 'indeterminate' (that is, governed by chance) within their own structure. The system was never designed with that situation in mind; those projects are a completely different issue, to be handled under a completely different set of rules.
[edit] btw, points per hour would remain constant. Credits for a completed task would be the 'chance' variable, in line with the time taken to complete. [/edit]
And since it's all happening on the host it will get cheated anyway.

This is a very valid point. I actually confirmed it for myself after my previous post. I also discovered that these WUs can put irregular strain on L2 cache. Another idea is needed; the one I put forward for handling this type of project is flawed.
They are not as easily cheated, they're easier to define, easier to code, easier to tweak too.

The info behind this statement would doubtless solve all of the problems. Please enlighten us as to how they are being easily, and I surmise accurately, defined.

I'm encouraged that the basic proposal appears sound. Let's iron out the wrinkles. Any ideas? Let's think-tank this, people.

Kathryn, thanks for the input. That's all valuable info.

Cheers.
Keith

zombie67

Joined: 14 Feb 06
Posts: 136
United States
Message 14864 - Posted: 13 Jan 2008, 2:46:22 UTC

Perhaps cross-project credit parity really is impossible. I really like the idea behind Formula BOINC:

http://www.myboinc.com/FormulaBoinc/

Rather than comparing credits between projects, you compare positions.
Reno, NV
Team: SETI.USA

Ozylynx

Joined: 2 Jan 08
Posts: 31
Australia
Message 14865 - Posted: 13 Jan 2008, 4:41:08 UTC - in response to Message 14864.  

Rather than comparing credits between projects, you compare positions.


Yes, this is an excellent incentive, and an average overall position comparison of individual users, as well as teams, would be interesting.

It doesn't, however, overcome the fundamental problem of unfair credit allocation within the same project affecting the position of the individual within that project.

An example: one project that I was with for some time used an averaging technique for claimed credit. I'll use the WU mentioned earlier; I have database info on that one, and although it is from a different project, the principles apply equally. The Celeron 1300 (256 KB cache) took 211.5 CPU hours to complete and claimed 1465 credits. The E2180 (1 MB shared cache) took 47.9 CPU hours and claimed 683 credits. The average is 1074, so the E2180 receives 157% of its benchmarked claim while the Celeron gets only 73%. The size of the effect depends upon which computers happen to be grouped for an average, and IMO this represents no credit system whatsoever. It certainly makes rankings and positions, even within the same project, quite farcical.
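The distortion is easy to reproduce with the two claims quoted above (Python; only the figures from the post are used):

```python
# Averaged-credit arithmetic from the Celeron 1300 / E2180 example.
celeron_claim = 1465.0  # Celeron 1300, 211.5 CPU hours
e2180_claim = 683.0     # E2180, 47.9 CPU hours

awarded = (celeron_claim + e2180_claim) / 2  # each host gets the average

print(awarded)                                # 1074.0
print(round(100 * awarded / e2180_claim))     # percent of the E2180's claim
print(round(100 * awarded / celeron_claim))   # percent of the Celeron's claim
```

The slow host is paid well under its claim and the fast host well over it, which is the unfairness the post describes.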

Further to the other issue. To quote myself from the original post in this thread:
While I am aware that some projects have fairly constant run times within a project, others do not. At least with a firm base to build upon, the likelihood of an appropriate algorithm being developed to deal with this situation greatly increases.

There is an answer out there. This is getting into technical areas beyond my capability. Some see problems, others solutions. C'mon, BOINCers.

Cheers
Keith

zombie67

Joined: 14 Feb 06
Posts: 136
United States
Message 14867 - Posted: 13 Jan 2008, 9:32:52 UTC - in response to Message 14865.  
Last modified: 13 Jan 2008, 9:35:25 UTC

It doesn't, however, overcome the fundamental problem of unfair credit allocation within the same project affecting the position of the individual within that project.


Hmmm.

Credits within a project are a completely different problem from cross-project credits.

The former, IMO, is completely the responsibility of the project and is far more easily defined. It is a relatively easy problem to solve: fixed credits, transaction counting, benchmarks plus quorum, whatever. The advantage a project has is that all its hosts are running the same app, using the same resources.

The latter is the real problem (IMO), and I think the position comparison may be the solution.
Reno, NV
Team: SETI.USA


Copyright © 2022 University of California. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.