Where would I find files for parsing stats?

Profile hiigaran
Joined: 11 Sep 13
Posts: 57
Message 52451 - Posted: 9 Feb 2014, 1:20:37 UTC

I'm about to start programming a customised stat display for individual users to be able to show their total points across all BOINC projects, but I've run into a problem before I could even begin...

Where do I find these stats?

Of course, I know where I could get these stats if I personally wanted to look at them. Just go to boincstats or some other stat site. However, I would soon become unpopular with any site owners if I made excessive requests to some of their pages when these stats are loaded en masse, so I need to look for some flat files. Something that is nothing but stats, like these.

Surely something for combined BOINC project scores exists somewhere, right?
Profile Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15477
Netherlands
Message 52452 - Posted: 9 Feb 2014, 1:43:45 UTC - in response to Message 52451.  

All projects have their own independent statistics, which are just XML files that can be downloaded from the project, normally by taking the project URL and adding /stats/ to the end of it. (Not a .php or .htm or .html page!)

So, for instance, https://einstein.phys.uwm.edu/stats/, or http://www.malariacontrol.net/stats/, or https://setiathome.berkeley.edu/stats/.

Also see http://boinc.berkeley.edu/trac/wiki/CreditStats for more information.
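A minimal sketch of what this could look like in practice, assuming the export contains a gzipped XML file of <user> records with id, name, total_credit and cpid elements (field names as I understand the CreditStats format; check the wiki page above against the project you target). The helper names stats_url and parse_users and the sample record are made up for illustration.

```python
import gzip
import io
import xml.etree.ElementTree as ET


def stats_url(project_url):
    """Return the stats directory URL for a project's base URL."""
    return project_url.rstrip("/") + "/stats/"


def parse_users(gz_bytes):
    """Parse a gzipped user export into a list of dicts.

    Assumes <user> elements with <id>, <name>, <total_credit>, <cpid> children.
    """
    users = []
    with gzip.open(io.BytesIO(gz_bytes)) as fh:
        # iterparse streams the file, so a large export need not fit in memory.
        for _event, elem in ET.iterparse(fh, events=("end",)):
            if elem.tag == "user":
                users.append({
                    "id": int(elem.findtext("id")),
                    "name": elem.findtext("name", default=""),
                    "total_credit": float(elem.findtext("total_credit", default="0")),
                    "cpid": elem.findtext("cpid", default=""),
                })
                elem.clear()  # free the parsed record
    return users


# Made-up sample standing in for a downloaded export file.
SAMPLE = gzip.compress(
    b"<users><user><id>4</id><name>alice</name>"
    b"<total_credit>1234.5</total_credit><cpid>abc123</cpid></user></users>"
)
```

Downloading the real file once a day and feeding its bytes to parse_users keeps the load on the project server low.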
Profile hiigaran
Joined: 11 Sep 13
Posts: 57
Message 52453 - Posted: 9 Feb 2014, 2:09:27 UTC - in response to Message 52452.  

Alright, just so I understand this (it's 3 in the morning here, and I'm pretty tired, so apologies), if I wanted to do what I mentioned, which is to have a user be able to show his or her own stats for combined BOINC points across all projects, I would need to program something that uses:

http://boinc.netsoft-online.com/get_user.php?cpid=xxxx

This is what I gathered from a link within your link. I assume the cpid is the unique ID each user will have, right? So that way, a user can enter their ID into whatever I will make, then my script will take that number and obtain the relevant information from the link above, automatically adding the cpid value.

If this is correct, where would one obtain this cpid value from?
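For illustration only, the "automatically adding the cpid value" step could be sketched like this. It assumes a CPID is a 32-character hexadecimal string (it looks like an MD5 hash on the sites I have seen, but treat that as an assumption); the function name user_lookup_url is hypothetical, and the format of the response is not covered here.

```python
from urllib.parse import urlencode

# Endpoint taken from the post above.
BASE_URL = "http://boinc.netsoft-online.com/get_user.php"


def user_lookup_url(cpid):
    """Build the lookup URL for one user, rejecting malformed input."""
    cpid = cpid.strip().lower()
    # Assumption: a CPID is 32 hex characters.
    if len(cpid) != 32 or any(c not in "0123456789abcdef" for c in cpid):
        raise ValueError("CPID should be 32 hexadecimal characters")
    return BASE_URL + "?" + urlencode({"cpid": cpid})
```

Validating before building the URL means user-supplied junk never reaches the remote site.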
Profile Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15477
Netherlands
Message 52454 - Posted: 9 Feb 2014, 2:22:06 UTC - in response to Message 52453.  
Last modified: 9 Feb 2014, 2:22:51 UTC

I assume the cpid is the unique ID each user will have, right?

Well, not so unique. It'll change whenever the user adds a project to crunch. The only true unique thing is the email address that the user uses.

If this is correct, where would one obtain this cpid value from?

Not sure if it's correct to do it that way, but both the userID and the CPID of that userID are in the user_id.gz file per project.

To see what the CPID entails, see http://boinc.berkeley.edu/wiki/CPID

(It's 3:22am for me as well, btw. I'm waiting for a couple of large files to be copied to my NAS. ;-))
Profile hiigaran
Joined: 11 Sep 13
Posts: 57
Message 52460 - Posted: 9 Feb 2014, 11:24:56 UTC

Yes, I can see what you mean about not being sure if it's the correct way. It would be annoying for users to have to keep changing their cpid just to have their stats working on my end.

Obviously there must be something that never changes, otherwise other stat sites wouldn't be able to gather the data. Any idea what that would be?
Profile Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15477
Netherlands
Message 52462 - Posted: 9 Feb 2014, 12:39:24 UTC - in response to Message 52460.  

As I said, userID and email address. No two userIDs can make use of the same email address. Most stats sites I know of use the userID, although that is a per-project value, so userID 4 at project A can have userID 77 at project B and userID 210438219823 at project C (if he got in a bit late).
Profile hiigaran
Joined: 11 Sep 13
Posts: 57
Message 52463 - Posted: 9 Feb 2014, 12:56:30 UTC

So I need to download the .gz files for every single project, if I want to combine the scores? Damn, that's going to present a lot of issues.

I'm still confused, though. How do other stat sites gather information on users, then? If the user IDs vary between projects, how does boincstats know that user x has stats from projects a, b, and c? Although even boincstats seems to be imperfect as well. I've got four duplicates of myself showing different projects I'm in.

Since you mentioned the email address, I assume that there must be something you can do with it, right?
Profile Gundolf Jahn

Joined: 20 Dec 07
Posts: 1069
Germany
Message 52464 - Posted: 9 Feb 2014, 13:09:41 UTC - in response to Message 52463.  

How do other stat sites gather information on users, then? If the user IDs vary between projects, how does boincstats know that user x has stats from projects a, b, and c?

That's where the CPID (Cross-Project IDentifier) comes into play. For this to function correctly, users have to connect all their projects to one computer (simply put). There are some explanations in the BOINCstats FAQs.

Although even boincstats seems to be imperfect as well. I've got four duplicates of myself showing different projects I'm in.

That's not BOINCstats's fault; the problem lies within the CPID system itself.

Regards,
Gundolf
Computers aren't everything in life. (Just kidding)
Profile hiigaran
Joined: 11 Sep 13
Posts: 57
Message 52467 - Posted: 9 Feb 2014, 16:10:03 UTC

So, bottom line...What is the recommended way to go, then? It seems like any potential method mentioned so far is in some way flawed.

Man, this was so much easier to do with Folding@Home.
ChristianB
Volunteer developer
Volunteer tester

Joined: 4 Jul 12
Posts: 321
Germany
Message 52482 - Posted: 10 Feb 2014, 14:56:57 UTC

Bottom line of any stats system is:
you need to get the XML stats from each project you want to support. (BOINC is a framework with many independent projects. Folding@home once was one of those and then decided to go its own way. You can't compare BOINC to Folding@home!)

You use the CPID from each project as the identifier. This CPID is derived from the user's email address and should synchronize over time once all projects have been attached to the same computer. If you have two computers with separate projects on each one, it's possible to end up with different CPIDs for the same email address. This can be cleared up by adding the other project to computer A for a short time. I don't think that's a widespread case. The bigger problem is when users use different email addresses for different projects. This can be "repaired" by the stats site, but you have to make sure that it really is the same user claiming those two accounts.
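As a rough sketch of the idea (not how any of the existing stats sites actually do it), combining per-project records by CPID might look like the following. The input shape, a dict mapping a project name to a list of user records with cpid and total_credit keys, and the function name combine_by_cpid are assumptions made for the example.

```python
from collections import defaultdict


def combine_by_cpid(projects):
    """Sum total credit across projects for every CPID.

    `projects` maps a project name to a list of dicts that each
    carry at least "cpid" and "total_credit" (hypothetical shape).
    """
    totals = defaultdict(lambda: {"total_credit": 0.0, "projects": []})
    for project_name, users in projects.items():
        for user in users:
            entry = totals[user["cpid"]]
            entry["total_credit"] += user["total_credit"]
            entry["projects"].append(project_name)
    return dict(totals)


# Made-up data: the same person appears under one CPID in projects A and B,
# but under a different CPID in project C (for example, another email address).
EXAMPLE = {
    "A": [{"cpid": "aaa", "total_credit": 100.0}],
    "B": [{"cpid": "aaa", "total_credit": 250.5}],
    "C": [{"cpid": "bbb", "total_credit": 10.0}],
}
```

With EXAMPLE, the person ends up as two separate entries ("aaa" and "bbb"), which is exactly the duplicate problem described above; merging them needs a manual claim step on the stats site.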

You can look at the sources for boinc.netsoft-online.com here: http://boinc.berkeley.edu/trac/browser/boinc-combined-stats and see what they do.

You should also take a look at the already available stats sites. They basically already do what you want to build: http://boincstats.com/, http://boinc.netsoft-online.com/, http://boinc.mundayweb.com/html/index.php, http://www.boincsynergy.com/stats/index.php, http://www.allprojectstats.com/, the full list: https://boinc.berkeley.edu/links.php (see Credit statistics and signature images)
Profile hiigaran
Joined: 11 Sep 13
Posts: 57
Message 53553 - Posted: 8 Apr 2014, 21:28:48 UTC

Back again!

I abandoned this idea due to its complexity, but I was looking through some of my account information on a few of my projects and had another idea.

Instead of going through the stat files (and downloading them, which I'm sure are large), I could parse a project page based on a user's ID number, which is passed to my parser as a GET variable in the image URL. So for instance, if I have a bunch of users doing this:

site.com/image.png?id=1
site.com/image.png?id=2
site.com/image.png?id=3
site.com/image.png?id=4

I could then code it so that when the image is requested, it searches the user account page for a user on a project common to all, which would contain all the information I would need, from the name of the user, to how many points they have in total, plus many more things. If I were to use WUProp, it would be easy to set up for users, and it's not intensive, so there's not really a downside to adding it for those who want some stats. Using myself as an example:

http://wuprop.boinc-af.org/show_user.php?userid=6076

My ID would be 6076, so I would have a stat banner looking like:

site.com/image.png?id=6076
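A small sketch of reading that id value safely, assuming the image script receives the full request URL as a string. The function name user_id_from_request is hypothetical; a real setup would more likely read the parameter from the web framework or server variables.

```python
from urllib.parse import urlparse, parse_qs


def user_id_from_request(url):
    """Extract the numeric `id` GET variable from an image URL."""
    values = parse_qs(urlparse(url).query).get("id", [])
    # Require exactly one id made of ASCII digits only, so nothing
    # unexpected gets pasted into the URL requested from the project.
    if len(values) != 1 or not (values[0].isascii() and values[0].isdigit()):
        raise ValueError("expected exactly one numeric id parameter")
    return int(values[0])
```

Rejecting anything that is not a plain number matters here because the value ends up inside a request to someone else's server.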

I can then get all the data required for this particular ID and save it to a file on my server, so that data per user is requested no more than once a day (assuming that image is requested in a page load on that day), since obviously I wouldn't want to flood WUProp with requests. Then I can keep an old file as well to perform calculations like how many points were obtained over the past day, and I've got myself all the important stats I need.
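The once-a-day caching described here could be sketched roughly as follows. Everything named in it is an assumption for illustration: the cache directory, the JSON file layout, and especially `fetch`, which stands in for whatever function actually downloads and parses the user page.

```python
import json
import time
from pathlib import Path

CACHE_DIR = Path("cache")          # hypothetical location on the server
MAX_AGE_SECONDS = 24 * 60 * 60     # refresh at most once per day


def get_user_stats(user_id, fetch, cache_dir=CACHE_DIR, now=None):
    """Return stats for a user, calling `fetch` at most once a day.

    `fetch(user_id)` is a placeholder for the real page parser and
    must return a JSON-serialisable dict.
    """
    now = time.time() if now is None else now
    cache_dir.mkdir(parents=True, exist_ok=True)
    current = cache_dir / f"{user_id}.json"
    previous = cache_dir / f"{user_id}.prev.json"

    if current.exists():
        cached = json.loads(current.read_text())
        if now - cached["fetched_at"] < MAX_AGE_SECONDS:
            return cached["stats"]  # still fresh, no remote request

    stats = fetch(user_id)          # fetch first, so a failure keeps the old file
    if current.exists():
        current.replace(previous)   # keep the old snapshot for daily differences
    current.write_text(json.dumps({"fetched_at": now, "stats": stats}))
    return stats
```

Fetching before rotating the files means a failed request leaves the last good snapshot in place.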

This should work, right? Any problems you folks see in this that I don't?
ChristianB
Volunteer developer
Volunteer tester

Joined: 4 Jul 12
Posts: 321
Germany
Message 53557 - Posted: 9 Apr 2014, 10:45:07 UTC

Sure that works. But that limits your stat-image to WUProp Users. You also need to check on any changes in the page source and modify your parser accordingly.

The data wouldn't be bleeding edge, and you don't know how accurate it is at all, but you would still get a good picture of the user's stats.

If you only retrieve and calculate on page load, then you somehow have to make sure that the calculation is done each day for all existing users. Otherwise you can't calculate daily differences.
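To make the point concrete, here is a minimal sketch of a daily job (run from cron, say) that refreshes every known user and then computes differences. The function names and the snapshot shape, a dict of user ID to total credit, are assumptions; `fetch_total_credit` is a placeholder for the real retrieval code.

```python
def refresh_all(known_user_ids, fetch_total_credit):
    """Take one snapshot covering every known user, not just those
    whose image happened to be requested today."""
    return {uid: fetch_total_credit(uid) for uid in known_user_ids}


def daily_gains(previous, current):
    """Credit gained between two snapshots.

    Users missing from the previous snapshot are skipped: without
    yesterday's value there is nothing to subtract from.
    """
    return {
        uid: credit - previous[uid]
        for uid, credit in current.items()
        if uid in previous
    }
```

For example, daily_gains({6076: 1000.0}, {6076: 1250.0, 2: 10.0}) gives a gain of 250.0 for user 6076 and nothing for user 2, who has no earlier snapshot.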
Profile hiigaran
Joined: 11 Sep 13
Posts: 57
Message 53558 - Posted: 9 Apr 2014, 11:10:21 UTC - in response to Message 53557.  

I'm not particularly worried about limiting it to WUProp users. Since it's a non-intensive project that can help others, I feel like it's a good incentive for people in my relatively small community to run it, if they want to display stats with an appearance that suits them.

You mention accuracy as being a potential issue. What's the specific issue with going down this road? I'm motivated partly due to its simplicity, given that even the methods that dedicated third-party stat sites use have flaws, and that the difference may not be worth the added effort and complexity needed to get my stats working the way their stats do.

As for the calculations, I could either use cron to solve that, or just go ahead and save every single user that has used the images at some point or another, since there's always going to be at least one person per day loading one of those images via a page load on my site.

I've already got a similar system set up for my team's stats across the distributed computing world, so I could just adapt that for usage with multiple users. I've also got stats for Folding@Home in exactly the same way I do for what I want to do with BOINC, so I could combine elements from both (ie, the multiple project stat gathering from the former, with the multiple user functionality of the latter).


Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.