Ticket #57 (closed Defect: fixed)

Opened 1 year ago

Last modified 10 months ago

Code for boincstats pages

Reported by: mo.v
Assigned to: davea
Priority: Major
Milestone: 5.10
Component: Web - Project
Version:
Keywords: charset stats xml db_dump
Cc:

Description

On cpdn we have a team trying unsuccessfully to make its name, Universität der Bundeswehr München, display properly on its Boincstats page:

http://www.boincstats.com/stats/team_graph.php?pr=cpdn&id=5624

though on cpdn it displays correctly:

http://climateapps2.oucs.ox.ac.uk/cpdnboinc/team_display.php?teamid=5624

Cpdn member Richard Rodway says

It's definitely UTF-8 that's appearing on the boincstats pages and it looks like the correct (2 byte) UTF-8 sequences are being used. Unfortunately the page is being served as an ISO8859-1 page and as a result the 2 byte sequence is not being interpreted as one character, but as two. This apparently is being done by the server since it's specifically encoding these bytes to appear correctly as 8859-1 characters, so changing the page encoding in the browser will not work! (It's using html entities to render the characters)

I notice that the climateprediction page for that team is also a 8859-1 encoded page, but in this case the correct code values are being used. 'ä' is encoded as the single byte 0xE4 in 8859-1 and this is being used on the cpdn pages.

I don't know how the team name is getting propagated to the boincstats servers, but something along the way has translated it to UTF-8. The encoding for 'ä' in UTF-8 is the 2 byte sequence 0xC3 0xA4. However, if you read that as 8859-1, then instead of translating the sequence into the one character U+00E4 (ä) it gets viewed as the two 8859-1 characters 0xC3 and 0xA4. 0xC3 is a Ã, 0xA4 is a ¤. The server is reading the UTF-8 sequence, and probably then storing it unchanged in the database. Then whenever a page is generated, it's reading that data from the database and assuming that it is ISO8859-1.
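Editor's illustration (not part of the original report): the misreading described above can be reproduced in a few lines of Python. This is only a sketch of the byte-level effect; it assumes nothing about how boincstats or the BOINC export actually handle the data, and the variable names are made up for the example.

```python
# Illustration only: the same character in the two encodings discussed above.
char = "\u00e4"  # 'ä', U+00E4

utf8_bytes = char.encode("utf-8")          # two bytes: 0xC3 0xA4
latin1_bytes = char.encode("iso-8859-1")   # one byte: 0xE4

# What a reader gets when UTF-8 bytes are interpreted as ISO-8859-1:
# each byte becomes its own character.
misread = utf8_bytes.decode("iso-8859-1")  # 'Ã' followed by '¤'

# As long as the bytes were stored unchanged, the mistake is reversible.
repaired = misread.encode("iso-8859-1").decode("utf-8")

print(utf8_bytes.hex(), latin1_bytes.hex())
print(misread, repaired)
```

The last step only works while the original bytes are intact; once the two misread characters have been re-encoded or turned into HTML entities, the stored data itself has to be corrected.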

To fix the problem you need to make sure that whatever is sending the team names to boincstats is doing so in an encoding that boincstats understands. There's nothing at all wrong with UTF-8, and my preferred solution is for boincstats to use UTF-8 in its webpages and database (or at least some variant of Unicode in the database). Not only would this fix this problem, it'd also allow teams (and names) to use any character. Such as Japanese or Korean characters... Which is quite impossible in 8859-1, there's only 256 characters in that characterset, as opposed to about 1.1 million in Unicode... (although I think only about 150,000 are currently in use).

The cpdn discussion is here - one sees that none of the usual solutions work:

http://climateapps2.oucs.ox.ac.uk/cpdnboinc/forum_thread.php?id=5409

This problem must affect teams from all projects. Any hope of a solution?

Richard Rodway and Mo Vilar

Attachments

ticket-57 v1.diff (3.0 kB) - added by ChristianB on 05/18/07 04:58:13.
This patch will encode all umlauts and other special characters within the team name, team description and user name into proper HTML entities. Users and teams with such characters in their names should edit their names after this patch has been applied at their project.

Change History

04/18/07 14:52:37 changed by romw

  • owner set to romw.
  • priority changed from Undetermined to Major.
  • status changed from new to assigned.
  • component changed from Undetermined to Server - Web - Project.
  • milestone changed from Undetermined to 5.10.

04/18/07 14:53:44 changed by romw

  • owner changed from romw to Rytis.
  • status changed from assigned to new.

04/19/07 00:19:30 changed by Rytis

Not a project web problem, but actually stats export. Changing stats exporter to use utf-8 should fix the problem. I won't do that, it's not PHP code. Anyone?
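Editor's illustration (not from the ticket): a minimal Python sketch of what "export as UTF-8" means for an XML file, namely writing the bytes as UTF-8 and saying so in the XML declaration so that a consumer does not have to guess. The element names `team`, `id` and `name` are assumptions for the example, and this is not the code of the BOINC stats exporter, which is not written in Python.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical record; the element names are assumptions for this sketch.
team_name = "Universit\u00e4t der Bundeswehr M\u00fcnchen"

team = ET.Element("team")
ET.SubElement(team, "id").text = "5624"
ET.SubElement(team, "name").text = team_name

# Serialize as UTF-8 and include an XML declaration that names the encoding.
buffer = io.BytesIO()
ET.ElementTree(team).write(buffer, encoding="utf-8", xml_declaration=True)
xml_bytes = buffer.getvalue()

# A consumer that honours the declaration recovers the original name.
parsed_name = ET.fromstring(xml_bytes).findtext("name")

print(xml_bytes.decode("utf-8"))
print(parsed_name == team_name)
```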

04/19/07 00:19:45 changed by Rytis

  • owner deleted.

04/19/07 13:57:02 changed by romw

  • owner set to davea.
  • milestone changed from 5.10 to 5.12.

David?

04/24/07 04:14:58 changed by milo.thurston@oerc.ox.ac.uk

May I ask what part of the code the "stats exporter" is in?

05/09/07 12:32:15 changed by davea

  • status changed from new to closed.
  • resolution set to wontfix.
  • milestone changed from 6.0 to 5.10.

05/09/07 12:32:32 changed by davea

This is not relevant to BOINC

05/15/07 12:36:47 changed by maureenvilar@hotmail.com

  • status changed from closed to reopened.
  • resolution deleted.

Who should we take the problem to?

05/15/07 12:38:37 changed by davea

  • status changed from reopened to closed.
  • resolution set to fixed.

Boincstats is run by Willy de Zutter: willy1@bbgo.nl

05/17/07 14:00:09 changed by Willy

  • status changed from closed to reopened.
  • resolution deleted.

This is not a stats site problem, the XML is encoded incorrectly. Not going to type it all again, the story is here: http://www.boincstats.com/forum/forum_thread.php?id=2082

05/17/07 18:07:10 changed by Nicolas

  • keywords set to charset stats xml db_dump.

It's a deficiency in db_dump. Why would that be "not relevant to BOINC"?

05/18/07 01:41:40 changed by ChristianB

It's not entirely db_dump that's wrong here. There is a serious fault when entering usernames and teamnames into the DB. The special characters (umlauts: äöü) are not encoded into HTML entities. So we have to encode the ä to &auml; for example. Every browser understands these entities and we don't have to cope with UTF-8 or ISO-8859-1. I'm currently working on a function that can be used to do this. I'll post the diffs later.
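Editor's illustration (not the attached patch, which targets the project's web code): a rough Python sketch of the idea described above, replacing every non-ASCII character with an HTML entity so the stored text is plain ASCII. The function name `to_html_entities` is hypothetical. It uses named entities where Python's standard `html.entities` table has one and numeric references otherwise.

```python
from html.entities import codepoint2name


def to_html_entities(text: str) -> str:
    """Replace non-ASCII characters with HTML entities (hypothetical helper)."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            # Plain ASCII passes through unchanged. Note that this sketch
            # does not escape '&', '<' or '>'.
            out.append(ch)
        elif cp in codepoint2name:
            out.append("&" + codepoint2name[cp] + ";")  # e.g. &auml;
        else:
            out.append("&#" + str(cp) + ";")            # numeric fallback
    return "".join(out)


encoded = to_html_entities("Universit\u00e4t der Bundeswehr M\u00fcnchen")
print(encoded)
```

Because the result contains only ASCII, it displays the same whether the page is served as ISO-8859-1 or UTF-8. The trade-off is that the stored value is HTML markup rather than the plain name, so anything that consumes it outside a browser has to decode the entities again.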

05/18/07 04:58:13 changed by ChristianB

  • attachment ticket-57 v1.diff added.

This patch will encode all umlauts and other special characters within the team name, team description and user name into proper HTML entities. Users and teams with such characters in their names should edit their names after this patch has been applied at their project.

05/18/07 08:02:04 changed by Rytis

  • status changed from reopened to closed.
  • resolution set to fixed.

(In [12691]) Encode UTF characters into HTML entities (from ChristianB, fix #57).

NOTE: teams that have name display issues will have to edit their description once the projects update the code.

