Message boards : Web interfaces : Where would I find files for parsing stats?
Author | Message |
---|---|
Joined: 11 Sep 13 Posts: 57 |
I'm about to start programming a customised stat display that lets individual users show their total points across all BOINC projects, but I've run into a problem before I could even begin... Where do I find these stats? Of course, I know where I could get these stats if I personally wanted to look at them: just go to boincstats or some other stat site. However, I would soon become unpopular with the site owners if I made excessive requests to their pages when these stats are loaded en masse, so I need to look for some flat files. Something that is nothing but stats, like these. Surely something for combined BOINC project scores exists somewhere, right? |
Joined: 29 Aug 05 Posts: 15573 |
All projects have their own independent statistics, which are just XML files that can be downloaded from the project, normally by taking the project URL and adding /stats/ to the end of it (not a .php, .htm, or .html page!). So, for instance, https://einstein.phys.uwm.edu/stats/, http://www.malariacontrol.net/stats/, or https://setiathome.berkeley.edu/stats/. Also see http://boinc.berkeley.edu/trac/wiki/CreditStats for more information. |
Joined: 11 Sep 13 Posts: 57 |
Alright, just so I understand this (it's 3 in the morning here, and I'm pretty tired, so apologies), if I wanted to do what I mentioned, which is to have a user be able to show his or her own stats for combined BOINC points across all projects, I would need to program something that uses: http://boinc.netsoft-online.com/get_user.php?cpid=xxxx This is what I gathered from a link within your link. I assume the cpid is the unique ID each user will have, right? So that way, a user can enter their ID into whatever I will make, then my script will take that number and obtain the relevant information from the link above, automatically adding the cpid value. If this is correct, where would one obtain this cpid value from? |
Joined: 29 Aug 05 Posts: 15573 |
"I assume the cpid is the unique ID each user will have, right?" Well, not so unique. It'll change whenever the user adds a project to crunch. The only truly unique thing is the email address that the user uses. "If this is correct, where would one obtain this cpid value from?" Not sure if it's correct to do it that way, but both the userID and the CPID of that userID are in the user_id.gz file per project. To see what the CPID entails, see http://boinc.berkeley.edu/wiki/CPID (It's 3:22am for me as well, btw. I'm waiting for a couple of large files to be copied to my NAS. ;-)) |
Joined: 11 Sep 13 Posts: 57 |
Yes, I can see what you mean about not being sure if it's the correct way. It would be annoying for users to have to keep changing their cpid just to have their stats working on my end. Obviously there must be something that never changes, otherwise other stat sites wouldn't be able to gather the data. Any idea what that would be? |
Joined: 29 Aug 05 Posts: 15573 |
As I said, userID and email address. No two userIDs can make use of the same email address. Most stats sites I know of use the userID, although that is a per-project value, so userID 4 at project A can have userID 77 at project B and userID 210438219823 at project C (if he got in a bit late). |
Joined: 11 Sep 13 Posts: 57 |
So I need to download the .gz files for every single project if I want to combine the scores? Damn, that's going to present a lot of issues. I'm still confused, though. How do other stat sites gather information on users, then? If the user IDs vary between projects, how does boincstats know that user x has stats from projects a, b, and c? Although even boincstats seems to be imperfect as well. I've got four duplicates of myself showing different projects I'm in. Since you mentioned the email address, I assume that there must be something you can do with it, right? |
Joined: 20 Dec 07 Posts: 1069 |
"How do other stat sites gather information on users, then? If the user IDs vary between projects, how does boincstats know that user x has stats from projects a, b, and c?" That's where the CPID (Cross-Project IDentifier) comes into play. For this to function correctly, users have to connect all their projects to one computer (simply put). There are some explanations in the BOINCstats FAQs. "Although even boincstats seems to be imperfect as well. I've got four duplicates of myself showing different projects I'm in." That's not BOINCstats's fault; the problem lies within the CPID system itself. Regards, Gundolf. Computers aren't everything in life. (Just kidding) |
Joined: 11 Sep 13 Posts: 57 |
So, bottom line... What is the recommended way to go, then? It seems like every potential method mentioned so far is in some way flawed. Man, this was so much easier to do with Folding@Home. |
Joined: 4 Jul 12 Posts: 321 |
Bottom line of any stats system is: you need to get the XML stats from each project you want to support. (BOINC is a framework with many independent projects. Folding@home once was one of those and then decided to go its own way. You can't compare BOINC to Folding@home!) You use the CPID from each project as the identifier. This CPID is derived from the user's email and should synchronize over time if all projects are at some point added to one computer. If you have two computers with separate projects on each one, it's possible to have different CPIDs for the same email address. This can be cleared up by adding the other project to computer A for a short time. I don't think it's a widespread case. The bigger problem is when users use different email addresses for different projects. This can be "repaired" by the stats site, but you have to make sure that it's really the same user claiming those two accounts. You can look at the sources for boinc.netsoft-online.com here: http://boinc.berkeley.edu/trac/browser/boinc-combined-stats and see what they do. You should also take a look at the already available stats sites. They basically already do what you want to build: http://boincstats.com/, http://boinc.netsoft-online.com/, http://boinc.mundayweb.com/html/index.php, http://www.boincsynergy.com/stats/index.php, http://www.allprojectstats.com/. The full list: https://boinc.berkeley.edu/links.php (see Credit statistics and signature images) |
Joined: 11 Sep 13 Posts: 57 |
Back again! I abandoned this idea due to its complexity, but I was looking through some of my account information on a few of my projects and had another idea. Instead of going through the stat files (as well as downloading them, which I'm sure are large), I could parse a project page based on a user's id number which is used as an input for my parser through a GET variable in the image URL. So for instance, if I have a bunch of users doing this: site.com/image.png?id=1 site.com/image.png?id=2 site.com/image.png?id=3 site.com/image.png?id=4 I could then code it so that when the image is requested, it searches the user account page for a user on a project common to all, which would contain all the information I would need, from the name of the user, to how many points they have in total, plus many more things. If I were to use WUProp, it would be easy to set up for users, and it's not intensive, so there's not really a downside to adding it for those who want some stats. Using myself as an example: http://wuprop.boinc-af.org/show_user.php?userid=6076 My ID would be 6076, so I would have a stat banner looking like: site.com/image.png?id=6076 I can then get all the data required for this particular ID and save it to a file on my server, so that data per user is requested no more than once a day (assuming that image is requested in a page load on that day), since obviously I wouldn't want to flood WUProp with requests. Then I can keep an old file as well to perform calculations like how many points were obtained over the past day, and I've got myself all the important stats I need. This should work, right? Any problems you folks see in this that I don't? |
Joined: 4 Jul 12 Posts: 321 |
Sure, that works. But it limits your stat image to WUProp users. You also need to watch for changes in the page source and modify your parser accordingly. The data wouldn't be bleeding edge, and you don't know how accurate it is at all, but you would still get a good picture of the user's stats. If you only retrieve and calculate on page load, then you somehow have to make sure that the calculation is done each day for all existing users. Otherwise you can't calculate daily differences. |
Joined: 11 Sep 13 Posts: 57 |
I'm not particularly worried about limiting it to WUProp users. Since it's a non-intensive project that can help others, I feel like it's a good incentive for people in my relatively small community to run it, if they want to display stats with an appearance that suits them. You mention accuracy as being a potential issue. What's the specific issue with going down this road? I'm motivated partly by its simplicity, given that even the methods that dedicated third-party stat sites use have flaws, and that the difference may not be worth the added effort and complexity needed to get my stats working the way theirs do. As for the calculations, I could either use cron to solve that, or just go ahead and save every single user that has used the images at some point or another, since there's always going to be at least one person per day loading one of those images via a page load on my site. I've already got a similar system set up for my team's stats across the distributed computing world, so I could just adapt that for usage with multiple users. I've also got stats for Folding@Home in exactly the same way I want to do for BOINC, so I could combine elements from both (i.e., the multiple-project stat gathering from the former with the multiple-user functionality of the latter). This is a signature |
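
The once-a-day caching idea discussed in the last few posts could be sketched roughly as follows. This is only an illustration under stated assumptions, not anything the posters or WUProp provide: the function names (`is_stale`, `record_credit`, `daily_gain`), the one-JSON-file-per-user layout, and the choice to keep just two snapshots are all hypothetical. Fetching and parsing the user page is deliberately left out, since the page layout is not described in the thread; the sketch assumes the total credit has already been extracted as a number.

```python
import json
from pathlib import Path

ONE_DAY = 24 * 60 * 60  # minimum age in seconds before a user is re-fetched


def load_snapshots(path):
    """Return the stored snapshots for one user, oldest first (empty if none)."""
    if not path.exists():
        return []
    return json.loads(path.read_text())


def is_stale(path, now):
    """True when there is no snapshot yet or the newest one is at least a day old.

    Call this before contacting the project site, so each user causes
    at most one request per day.
    """
    snapshots = load_snapshots(path)
    return not snapshots or now - snapshots[-1]["time"] >= ONE_DAY


def record_credit(path, total_credit, now):
    """Append a fresh snapshot and keep only the two newest ones."""
    snapshots = load_snapshots(path)
    snapshots.append({"time": now, "credit": total_credit})
    # Two snapshots are enough to compute the gain since the previous fetch.
    path.write_text(json.dumps(snapshots[-2:]))


def daily_gain(path):
    """Credit gained between the two stored snapshots, or None if only one exists."""
    snapshots = load_snapshots(path)
    if len(snapshots) < 2:
        return None
    return snapshots[-1]["credit"] - snapshots[-2]["credit"]
```

On an image request for, say, `id=6076`, the script would check `is_stale(Path("6076.json"), time.time())`, fetch and parse the user page only if that returns true, and then call `record_credit` with the parsed total. As noted above, the gain is only a true daily difference if every known user is refreshed each day, for example from a cron job that loops over the stored files, rather than only when someone happens to load the image.
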
Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,
Version 1.2 or any later version published by the Free Software Foundation.