Please Help Me if You Can

MarcioCavalcanti

Joined: 3 Nov 21
Posts: 3
Message 105956 - Posted: 3 Nov 2021, 13:43:15 UTC

Greetings.
I need to set up a script to fetch credit information for some users in a team on R@H and then award points in a "game" in proportion to the credits each participant earned through their Rosetta@Home activity.

This is 100% non-profit work on my part, an attempt to bring more people in to help. In that spirit I started writing a web scraper, but honestly I'm terrible at it since I'm no more than an amateur dev. I don't even know if this code generates the payout (reward) data properly.
Does anyone know if there is a Rosetta@home API that could help or, even better, a web scraper for the Rosetta@home website? Something similar to this project would be great: https://github.com/stuckatsixpm/fah_scraper

Here is what I have managed to write as the web scraper for the Rosetta@home website; it probably won't work, or will work badly. If anyone has a few minutes to spare and can help me out I would be forever thankful:
"
import argparse
import datetime
import json
import logging
import sqlite3
from collections import namedtuple

import requests
from bs4 import BeautifulSoup

UserStats = namedtuple("UserStats", ["name", "credit", "recent_average_credit"])


def init_db(db_file="./folding_data.db"):
    """Open the local stats DB, creating the table on first use."""
    logging.info("Initializing database.")
    con = sqlite3.connect(db_file)
    con.execute(
        "CREATE TABLE IF NOT EXISTS folding_data (name TEXT PRIMARY KEY, credit REAL, "
        "recent_average_credit REAL, credit_delta REAL, date_delta INTEGER, date INTEGER)"
    )
    con.commit()
    return con


def fetch_stats(teamid=30157):
    """Download the team page and return its HTML."""
    url = "https://boinc.bakerlab.org/rosetta/team_display.php?teamid={}".format(teamid)
    r = requests.get(url, timeout=30)
    if r.ok:
        return r.text
    raise Exception("Failed to fetch: {}".format(url))


def parse_number(text):
    """Convert a table cell to a float, tolerating thousands separators such as '1,234.56'."""
    return float(text.strip().replace(",", ""))


def log_stats(db, user_stats):
    """Store the latest credit for one user, plus the change since the previous run."""
    logging.debug("logging stats for user: {}".format(user_stats.name))
    cur = db.cursor()
    # Parameterized queries (?) keep names containing quotes from breaking the SQL.
    prev = cur.execute(
        "SELECT credit, date FROM folding_data WHERE name = ?", (user_stats.name,)
    ).fetchone()
    prev_credit, prev_date = prev if prev else (0, 0)

    # Timezone-aware "now"; timestamp() would read a naive utcnow() as local time.
    date = int(datetime.datetime.now(datetime.timezone.utc).timestamp())
    credit_delta = user_stats.credit - prev_credit
    date_delta = date - prev_date

    # name is the primary key, so this overwrites the user's previous row.
    cur.execute(
        "INSERT OR REPLACE INTO folding_data VALUES (?, ?, ?, ?, ?, ?)",
        (
            user_stats.name,
            user_stats.credit,
            user_stats.recent_average_credit,
            credit_delta,
            date_delta,
            date,
        ),
    )


def create_snapshot(db, teamid=30157):
    """Scrape the team page and log every member row found."""
    logging.info("Creating snapshot.")
    stats = fetch_stats(teamid=teamid)  # was fetch_stats(team=team): undefined name
    soup = BeautifulSoup(stats, "html.parser")
    # Assumption carried over from fah_scraper: a <table class="members"> with five
    # cells per row. Check this against the real page markup before relying on it.
    members = soup.find_all("table", {"class": "members"})
    if not members:
        raise Exception("No members table found in the team page.")
    rows = members[0].find_all("tr")

    for row in rows[1:]:  # skip the header row
        try:
            _, _, name, credit, recent_average_credit = [
                item.text for item in row.find_all("td")
            ]
            user_stats = UserStats(
                name.strip(), parse_number(credit), parse_number(recent_average_credit)
            )
            log_stats(db, user_stats)
        except Exception as e:
            logging.error("Failed to log data for user: {}".format(str(e)))

    db.commit()


def save_snapshot(db, output="./folding_data.json"):
    """Write the credit and time deltas of every user who earned credit to JSON."""
    logging.info("Saving snapshot as JSON.")
    rows = db.execute(
        "SELECT name, credit_delta, date_delta FROM folding_data"
    ).fetchall()
    snapshot = {
        name: {"credit": credit, "time": time}
        for name, credit, time in rows
        if credit > 0
    }

    with open(output, "w") as outfile:
        json.dump(snapshot, outfile)


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("team", help="The team ID to scrape. Ex. 234980")
    parser.add_argument(
        "--db-file",
        default="./folding_data.db",
        help="The path to the local stats DB. This DB will be created if it doesn't exist.",
    )
    parser.add_argument(
        "--json-file",
        default="./folding_data.json",
        help="The path to the output JSON file containing the credit and time deltas.",
    )
    parser.add_argument("--verbose", action="store_true", help="Print debug logs.")
    args = parser.parse_args()

    if args.verbose:
        logging.basicConfig(level=logging.DEBUG)
    else:
        logging.basicConfig(level=logging.INFO)

    db = init_db(db_file=args.db_file)
    create_snapshot(db, teamid=args.team)  # keyword must match the function signature
    save_snapshot(db, output=args.json_file)


if __name__ == "__main__":
    main()
"
Profile Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15477
Netherlands
Message 105957 - Posted: 3 Nov 2021, 14:20:22 UTC - in response to Message 105956.  

a webscraper for the rosetta@home website
If everyone goes scraping the webpages of the projects, the webpages of the projects go down. Most projects will banish you if they find you scraping their pages without asking permission.

All BOINC projects export the statistics. Rosetta does that via https://boinc.bakerlab.org/rosetta/stats/ (notice, there's no .html or .php extension to this link!), where you can download their data daily and search through that, locally.
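For anyone wanting to try this, here is a minimal sketch of reading such an export locally. It assumes the user file from that directory has already been downloaded, that it is gzip-compressed XML made of <user> records, and that each record has <name>, <total_credit> and <teamid> children (the usual BOINC export layout, but check it against the real file). The function name iter_team_members and the sample data are made up for illustration; the team ID 30157 is the one from the script above.

```python
import gzip
import io
import xml.etree.ElementTree as ET


def iter_team_members(gz_source, teamid):
    """Yield (name, total_credit) for every <user> whose <teamid> matches.

    gz_source may be a path to the downloaded .gz file or an open binary
    file object. Element names are assumed, not confirmed by the project.
    """
    with gzip.open(gz_source, "rb") as xml_file:
        # iterparse streams the file, so the whole export never sits in memory.
        for _, elem in ET.iterparse(xml_file, events=("end",)):
            if elem.tag != "user":
                continue
            if elem.findtext("teamid") == str(teamid):
                yield elem.findtext("name"), float(elem.findtext("total_credit", "0"))
            elem.clear()  # drop the children of records already processed


# Tiny in-memory stand-in for the real export (made-up users).
SAMPLE = b"""<users>
<user><id>1</id><name>alice</name><total_credit>1500.5</total_credit><teamid>30157</teamid></user>
<user><id>2</id><name>bob</name><total_credit>10</total_credit><teamid>7</teamid></user>
</users>"""

sample_gz = io.BytesIO(gzip.compress(SAMPLE))
members = list(iter_team_members(sample_gz, 30157))
print(members)
```

With the real file, the same function can be given the local path of the downloaded export instead of the in-memory sample. Downloading once a day and filtering locally keeps the load on the project's web server low.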
MarcioCavalcanti

Joined: 3 Nov 21
Posts: 3
Message 105958 - Posted: 3 Nov 2021, 14:42:25 UTC - in response to Message 105957.  
Last modified: 3 Nov 2021, 14:43:31 UTC

I did not know that, so thanks for the info. As I said, I'm an amateur in this area, but I need to get the data, so I asked some devs and they unanimously said I would have to create a web scraper.

So how can I get the data from rosetta/stats/? Would I need to create a script that downloads the user.gz list daily and picks out the info one by one? Would you be able to help me?
Profile Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15477
Netherlands
Message 105960 - Posted: 3 Nov 2021, 16:20:57 UTC - in response to Message 105958.  

So how can I get the data from rosetta/stats/? Would I need to create a script that downloads the user.gz list daily and picks out the info one by one? Would you be able to help me?
I'm not in the script writing business, so cannot help you there. Perhaps someone else here can. Or you can find something on the interwebs.
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5077
United Kingdom
Message 105962 - Posted: 3 Nov 2021, 17:35:00 UTC

The exported stats are in a compressed XML format. Your favorite database probably has an XML import tool. Make sure you have plenty of disk space available.
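If the database at hand has no XML import tool, the parsed values can also be stored by a small script. Below is a rough sketch using sqlite3 (the same module the script above uses): it keeps one row per user per day and computes the credit earned between two days. The table name snapshot, its columns, and both function names are invented for this example, and the numbers are made up.

```python
import sqlite3


def record_snapshot(con, day, credit_by_user):
    """Store one day's total credit per user id (hypothetical schema)."""
    con.execute(
        "CREATE TABLE IF NOT EXISTS snapshot ("
        "day TEXT, user_id INTEGER, total_credit REAL, "
        "PRIMARY KEY (day, user_id))"
    )
    con.executemany(
        "INSERT OR REPLACE INTO snapshot VALUES (?, ?, ?)",
        [(day, uid, credit) for uid, credit in credit_by_user.items()],
    )
    con.commit()


def credit_gained(con, start_day, end_day):
    """Return {user_id: credit earned between the two snapshots}.

    A user missing from the start snapshot is treated as starting from 0,
    so all of their existing credit counts as gained; adjust if unwanted.
    """
    rows = con.execute(
        "SELECT e.user_id, e.total_credit - COALESCE(s.total_credit, 0) "
        "FROM snapshot AS e "
        "LEFT JOIN snapshot AS s ON s.user_id = e.user_id AND s.day = ? "
        "WHERE e.day = ?",
        (start_day, end_day),
    ).fetchall()
    return dict(rows)


# Made-up totals for two consecutive days.
con = sqlite3.connect(":memory:")
record_snapshot(con, "2021-11-03", {1: 100.0, 2: 50.0})
record_snapshot(con, "2021-11-04", {1: 130.0, 2: 50.0, 3: 20.0})
gained = credit_gained(con, "2021-11-03", "2021-11-04")
print(gained)
```

Keying on a numeric user id rather than the display name is a deliberate choice here, since display names can change or collide. Storing only the team's members keeps the local database far smaller than the full export.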
Profile Dave
Help desk expert

Joined: 28 Jun 10
Posts: 2516
United Kingdom
Message 105963 - Posted: 3 Nov 2021, 17:44:31 UTC - in response to Message 105962.  

Make sure you have plenty of disk space available.
And Richard does mean plenty!
Profile Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15477
Netherlands
Message 105964 - Posted: 3 Nov 2021, 18:28:04 UTC - in response to Message 105962.  

It's reasonably small: I downloaded and unpacked it, and it came to 412,003 KB.
MarcioCavalcanti

Joined: 3 Nov 21
Posts: 3
Message 105965 - Posted: 3 Nov 2021, 18:55:54 UTC - in response to Message 105960.  

I'm not in the script writing business, so cannot help you there. Perhaps someone else here can. Or you can find something on the interwebs.


Thanks. I asked here because it would definitely bring many new users from the project to Rosetta@Home and consequently to BOINC. I hope a good soul with some time to spare appears who can help me with the script to download the user list daily and fetch the credit information there for rewarding.
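For the rewarding step itself, one possible reading of "proportional" is sketched below: each participant gets a share of a fixed points pool equal to their share of the credit earned in the period. The function name split_rewards, the pool size, and the sample numbers are all made up for illustration; how the "game" actually scores points is not specified in this thread.

```python
def split_rewards(credit_gained, pool):
    """Split `pool` points in proportion to each user's positive credit gain."""
    # Negative or zero gains earn nothing and do not dilute other shares.
    total = sum(c for c in credit_gained.values() if c > 0)
    if total <= 0:
        return {name: 0.0 for name in credit_gained}
    return {name: pool * max(c, 0) / total for name, c in credit_gained.items()}


# Made-up example: 40 credits earned in total, 100 points to hand out.
rewards = split_rewards({"alice": 30.0, "bob": 10.0}, 100)
print(rewards)
```

Using the credit gained over the period, rather than lifetime totals, keeps long-time members from collecting most of the pool every round.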

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.