Calculating the number of cores provided by a project

Message boards : Questions and problems : Calculating the number of cores provided by a project

Hendrik B.

Joined: 26 Aug 14
Posts: 5
Switzerland
Message 55589 - Posted: 26 Aug 2014, 10:07:00 UTC

Hi there,
In cloud computing, the number of "logical cores" is often used to describe how many resources a system can deliver. I would like to use this number to compare some BOINC projects (for example SETI@home) with computing centers.
Is there a way to get the total number of logical cores currently running for a BOINC project (or a number that is close to it)?

As an example, the question and answer I have in mind for a BOINC project would be:
If two hosts are actively running a BOINC project, one with an i7 (8 logical cores) and one with an i5 (4 logical cores), the total number of actively running logical cores would be "12 logical cores".

Is there some way to actually do this calculation for BOINC projects?


Thank you in advance,
Hendrik
ID: 55589
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15483
Netherlands
Message 55607 - Posted: 26 Aug 2014, 17:17:31 UTC - in response to Message 55589.  

Do you mean to calculate your own cores per project, or total cores of all computers active at projects? If the latter, that information should be available through the Host statistics files that the projects dump out every day, and is therefore possibly available through the statistics websites.
ID: 55607
Hendrik B.

Joined: 26 Aug 14
Posts: 5
Switzerland
Message 55636 - Posted: 26 Aug 2014, 23:28:00 UTC - in response to Message 55607.  

Thank you for your fast reply.
I would like to calculate the total number of cores of all computers active at one project, either at this moment or averaged over a reasonable period. I am interested not just in the number of work units that were sent out, but in how many cores are available to the project.

The host statistics files you mentioned gave me a good hint.
For SETI@home, though, the only places where I could find anything about the number of cores in the project were the host stats and the host CPU breakdown on Boincstats.com. The problem with both is that I would have to sum up huge amounts of numbers by copying and pasting them into Excel. I would then get the total number of cores for all hosts ever registered, but I would not find out how many of those cores are actually considered active. Do you know how I could find that out?

Hendrik
ID: 55636
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15483
Netherlands
Message 55650 - Posted: 27 Aug 2014, 15:06:09 UTC - in response to Message 55636.  
Last modified: 27 Aug 2014, 15:10:36 UTC

The number in the Host statistics site may be a bit diluted, as it shows all CPU cores that ever reported any work and got credit.

In the case of Seti, you can go to http://setiathome.berkeley.edu/stats/, download the hosts.gz file, unpack it, open it (e.g. in Notepad++) and then filter on <ncpus> for computers whose <expavg_credit> is larger than 1.
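
A minimal sketch of that filtering step in Python 3, for anyone who would rather not do it by hand in an editor. It assumes the export is a flat list of <host> elements that each carry <expavg_credit> and <ncpus>, as described above. The function names and the default threshold are illustrative and not part of BOINC.

```python
import gzip
import xml.etree.ElementTree as ET


def count_active_cores(stream, threshold=1.0):
    """Return (active_hosts, active_cores) for hosts with expavg_credit > threshold.

    `stream` is a binary file object containing the hosts XML export.
    """
    active_hosts = 0
    active_cores = 0
    context = ET.iterparse(stream, events=("start", "end"))
    _event, root = next(context)  # first event is the start of the root element
    for event, elem in context:
        if event != "end" or elem.tag != "host":
            continue
        # Missing or empty fields are treated as 0 (an assumption of this sketch).
        credit = float(elem.findtext("expavg_credit") or "0")
        ncpus = int(elem.findtext("ncpus") or "0")
        if credit > threshold:
            active_hosts += 1
            active_cores += ncpus
        root.clear()  # drop hosts already processed so memory use stays bounded
    return active_hosts, active_cores


def count_active_cores_in_gz(path, threshold=1.0):
    """Same as count_active_cores, reading directly from a gzip-compressed file."""
    with gzip.open(path, "rb") as stream:
        return count_active_cores(stream, threshold)
```

Calling count_active_cores_in_gz("hosts.gz") would return the number of hosts above the threshold and the sum of their <ncpus> values. The file name here is only an example.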
ID: 55650
Hendrik B.

Joined: 26 Aug 14
Posts: 5
Switzerland
Message 55671 - Posted: 28 Aug 2014, 14:13:04 UTC - in response to Message 55650.  

Hi,
Your suggestion worked nicely, thank you :)
I wrote a little Python script to analyze the file from the SETI@home statistics for me.
The results were:
Inactive computers: 119137
Total CPUs inactive: 452946
Active computers: 181761
Total CPUs active: 816959

The script took 21 minutes in total to go through the 4 GB file that SETI@home produces, which is acceptable.
I used 1 as the minimum for <expavg_credit>. Is that also the value that BOINC uses to produce its statistics?




If anyone is interested, here is the code of the Python script, written for Python 2.6.
It is really dirty and not clean at all, but it does what it should and it does it relatively fast. (You will need to reinsert the tab indentation, as the board removed it...)

import time
import xml.sax as sax

__author__ = 'Hendrik'

'''
This script works with Python 2.6.x or higher, but not with 3.x.

It reads a host statistics file exported by a BOINC project and finds out how
many logical CPU cores are considered active (using a threshold you can define
for the "expavg_credit" value). Parameters are entered at the prompts.

Sorry, this is really dirty and not clean code at all...
'''


class hostStatsReader(sax.handler.ContentHandler):

    def __init__(self):
        # One (expavg_credit, ncpus) pair per host. A list keeps every host,
        # including hosts that share the same credit value.
        self.ergebnis = []     # results
        self.schluessel = ""   # text of <expavg_credit> for the current host
        self.wert = ""         # text of <ncpus> for the current host
        self.aktiv = None      # element whose text is currently being collected
        self.lines = 0         # closing tags seen so far
        # Scaled line count (module-level setting), used only for progress output.
        self.allLines = (NumberLines * 29) / 30
        self.linesToReport = ReportLines

    def startElement(self, name, attrs):
        if name == "host":
            self.schluessel = ""
            self.wert = ""
        elif name == "expavg_credit" or name == "ncpus":
            self.aktiv = name

    def endElement(self, name):
        self.lines += 1
        if (self.lines % self.linesToReport) == 0:
            percent = (self.lines * 1000) / self.allLines
            print "about " + str(self.lines) + " lines read. That is about " + str(percent) + " per mille"
        if name == "host":
            # Skip hosts that are missing either value.
            if self.schluessel.strip() and self.wert.strip():
                self.ergebnis.append((float(self.schluessel), int(self.wert)))
        elif name == "expavg_credit" or name == "ncpus":
            self.aktiv = None

    def characters(self, content):
        if self.aktiv == "expavg_credit":
            self.schluessel += content
        elif self.aktiv == "ncpus":
            self.wert += content


# Defaults; each one is overwritten by the prompts below.
Threshhold = 1
FILE = r"D:\Studium\CERN\BOINC\hostStatsReader\host_atlas"
NumberLines = 34637
#NumberLines = 104016628
ReportLines = 100000

# raw_input returns the typed text as-is, so the file name needs no quotes.
FILE = raw_input("Enter the name of the file: ")
NumberLines = int(raw_input("Enter the number of lines the file has: "))
ReportLines = int(raw_input("Enter after how many lines a status update should be printed (note: a lower number slows down the process significantly; choose at least 10000, and about 100000 or more for big projects): "))
Threshhold = float(raw_input("Enter the threshold for the expavg_credit value (normally 1): "))


handler = hostStatsReader()
parser = sax.make_parser()
parser.setContentHandler(handler)
start = time.clock()
parser.parse(FILE)
stop = time.clock()
elapsed = stop - start  # seconds spent parsing
print "Time for reading the file (in secs): " + str(elapsed)

# Sum hosts and cores on each side of the threshold.
activeComputers = 0
inactiveComputers = 0
totalCPUSactiv = 0
totalCPUSinactiv = 0
for credit, ncpus in handler.ergebnis:
    if credit <= Threshhold:
        inactiveComputers += 1
        totalCPUSinactiv += ncpus
    else:
        activeComputers += 1
        totalCPUSactiv += ncpus


print "Active computers: " + str(activeComputers)
print "Total CPUs active: " + str(totalCPUSactiv)
print "Inactive computers: " + str(inactiveComputers)
print "Total CPUs inactive: " + str(totalCPUSinactiv)

ID: 55671
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15483
Netherlands
Message 55672 - Posted: 28 Aug 2014, 15:25:11 UTC - in response to Message 55671.  
Last modified: 28 Aug 2014, 15:26:40 UTC

I used 1 as the minimum for <expavg_credit>. Is that also the value that BOINC uses to produce its statistics?

All the values in the statistics file exported by Seti are used by statistics sites, not by BOINC (Manager). BOINC itself uses <user_expavg_credit> and <host_expavg_credit> read from client_state.xml for the User and Host average statistics.

I used the value of 1 on Seti as that's what they use to allow you to post on their main forums. A value of at least 1 average credit means you're somewhat active. A value of 10 would probably be more realistic, but I am not sure what minimum RAC someone has when crunching Seti in a minimal way on CPUs only.
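
To see how much the choice of cutoff matters, the count can be repeated for several thresholds. The sketch below is Python 3 and works on (expavg_credit, ncpus) pairs. The function name is hypothetical and the sample values are made up for illustration; they do not come from any project.

```python
def cores_by_threshold(hosts, thresholds=(1.0, 10.0)):
    """Map each threshold to (host_count, core_count) for hosts above it.

    `hosts` is an iterable of (expavg_credit, ncpus) pairs.
    """
    hosts = list(hosts)  # allow several passes even if a generator was given
    summary = {}
    for threshold in thresholds:
        cores = [ncpus for credit, ncpus in hosts if credit > threshold]
        summary[threshold] = (len(cores), sum(cores))
    return summary


# Made-up sample data: (expavg_credit, ncpus)
sample_hosts = [(250.5, 8), (0.2, 4), (12.0, 4), (3.7, 2)]

for threshold, (n_hosts, n_cores) in cores_by_threshold(sample_hosts).items():
    print("threshold", threshold, "->", n_hosts, "hosts,", n_cores, "cores")
```

If the totals change a lot between 1 and 10, many hosts sit just above the lower cutoff, and the "active" figure should be read with that in mind.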

But I just remembered that even this setup is diluted, as it also counts computers that do not actively use their CPU for calculations at Seti, but instead use one or more GPUs. For instance, my 4-core CPU will be counted in your values, but the RAC I have at Seti comes purely from running work for half a day each weekend on my GPU. I don't crunch on any of the 4 CPU cores.

There is no distinction (yet) between (average) credit earned on CPUs and on GPUs at Seti, or at any other project. That is in the works, but as far as I read the proposal, only to differentiate between ASICs on one side and CPUs/GPUs on the other.

But still, congratulations. You now know how many computers are active at Seti, just not directly how many CPUs are active. Perhaps you should try this on a CPU-only project, instead of a CPU+GPU project? :-)
ID: 55672


Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.