Problems with umlauts

  • Avatar of Karlheinz01
    karlheinz01
    Member

    @karlheinz01

    Hello,

    I have installed BuddyPress and the bbPress forum. Both are the latest builds. If I write in bbPress directly, I can use öäüß and the forum shows them. They also look OK in BuddyPress.

    If I write öäüß into a BuddyPress group forum, I get something like this: ߠքܼ/p>. In bbPress it looks like this: /amp/auml;/amp/ouml;/amp/uuml;/amp/szlig; /amp/Ouml;/amp/Auml;/amp/Uuml;

    In both configs, WPMU and bbPress, utf8 is defined.

    Any hint how to solve the problem? It seems to be a problem with the filtering function in the BuddyPress forum plugin.

    Best,

    Karl-Heinz

Viewing 25 replies - 1 through 25 (of 30 total)
  • Avatar of Karlheinz01
    karlheinz01
    Member

    @karlheinz01

    No one can help?

    Avatar of Fishbowl81
    fishbowl81
    Participant

    @fishbowl81

    Can you confirm that they are making it back to the bbpress tables correctly? IE if you open the post in bbpress, are they already corrupt?

    Or is it only when they get displayed back to Buddypress?

    Also if you make a post in bbpress, does it show up in buddypress correctly or is it corrupt there also?

    Answering these questions may help to narrow down the bug, or may prove it to be a widespread issue.

    Sorry I couldn’t give you a solution, but first we must understand the problem at hand.

    Brad

    Avatar of Karlheinz01
    karlheinz01
    Member

    @karlheinz01

    If I write in the BuddyPress group forum in the BuddyPress frontend, all the umlauts are gone and look like the ones in the first post above.

    If I view this in the bbPress frontend, the umlauts are also broken.

    When I write into the bbPress frontend directly, the umlauts are OK. They are also OK when I list them in the BuddyPress group forum. So the transfer from bbPress to BuddyPress is OK; the other way around it does not work.

    Avatar of Burt Adsit
    Burt Adsit
    Participant

    @burtadsit

    Lemme go look at this. I was just in that area about a week ago.

    (later) Works for me. Lots of u-umlaut chars going to and from bp group forums, to bbpress and back again. Showing up as proper u-umlaut chars. Andy just installed some fixes lately in this area. Try upgrading to the latest bp trunk.

    The fixes include an upgrade to the buddypress-enable.php plugin that runs in bbpress. Don’t forget that.

    Avatar of Karlheinz01
    karlheinz01
    Member

    @karlheinz01

    Hello,

    I have installed the latest version. Now in BuddyPress the umlauts in the title of the forum message are OK, but the umlauts in the text field are filtered out completely. That is, if I put umlauts like öäüÖÄÜß there, then send it and list it, I just see an empty text field.

    Best,

    Karl

    Avatar of Burt Adsit
    Burt Adsit
    Participant

    @burtadsit

    I can’t reproduce the problem you are describing.

    In bp the title of a post going from bp group forums to bbpress goes through all the same filtering as the content except the following:

    wpautop

    make_clickable

    bp_forums_filter_encode / bp_forums_filter_decode filters. The content gets that treatment.

    Are your umlaut chars showing ok in bbpress if you enter them from bp forums?

    Just to narrow things down. Can you temporarily comment out line 4 in bp-forums-filters.php which reads: add_filter( 'bp_forums_new_post_text', 'bp_forums_filter_encode' );

    Then try adding a new post with umlaut chars. This filter needs to be there but let’s see if it is the problem.

    Avatar of Burt Adsit
    Burt Adsit
    Participant

    @burtadsit

    If you enter umlaut chars in the content of a forum post from bp and they are displayed correctly in bbpress then the problem isn’t the ‘encoding’ filter. It’s the ‘decoding’ filter which has an extra step in it.

    On the decoding side in bp when the content comes back from bbpress we also call another filter inside the decode filter. This one is called: wp_filter_kses() and it gets called from the bp filter bp_forums_filter_decode() in bp-forums-filters.php.

    If commenting out the line in my previous message doesn’t solve the problem then uncomment the line in the above message and try commenting out line 53 which reads $content = stripslashes( wp_filter_kses( $content ) );.

    Avatar of Karlheinz01
    karlheinz01
    Member

    @karlheinz01

    Hello,

    If I comment the filter out, everything is OK.

    Best,

    Karl

    Avatar of Burt Adsit
    Burt Adsit
    Participant

    @burtadsit

    In bp_forums_filter_encode() line 42 should read: $content = htmlentities( $content, ENT_COMPAT, "UTF-8" ); What does yours say?

    Avatar of Karlheinz01
    karlheinz01
    Member

    @karlheinz01

    In line 42 I have exactly the same:

    $content = htmlentities( $content, ENT_COMPAT, "UTF-8" );

    Strange.

    Best,

    Karl-Heinz

    Avatar of Burt Adsit
    Burt Adsit
    Participant

    @burtadsit

    I wish I could reproduce your problem. It’s tough to track down a problem if it isn’t reproducible somehow.

    Avatar of Karlheinz01
    karlheinz01
    Member

    @karlheinz01

    Hello fishbowl81,

    Is there a way to use another filter than the one above? What can happen if I leave the filter out?

    I also use the latest WPMU Beta. Can this be the reason?

    Best,

    Karl

    Avatar of Karlheinz01
    karlheinz01
    Member

    @karlheinz01

    Hello.

    Maybe a hint.

    After the update of the files, the umlauts in the title of the message are OK. So if I write öäüßÖÄÜ in the title, they appear.

    If I write the same umlauts öäüßÖÄÜ? in the text box, they now disappear completely. So if I write only this short string of umlauts, the text box is empty. This also happens with the latest fixes I got from the trunk.

    Best,

    Karl

    PS: By the way, if I edit an entry in this forum here and send it, I get the message “Topic not found.”, but the edited text appears.

    Avatar of Karlheinz01
    karlheinz01
    Member

    @karlheinz01

    Hello,

    no idea?

    Best,

    Karl

    Avatar of Karlheinz01
    karlheinz01
    Member

    @karlheinz01

    Hello,

    I still have not found the problem with the umlauts.

    The strange thing is that in the forum title (BuddyPress frontend) the umlauts are there, but in the text they are deleted.

    So if the title is öäüÖÄÜß, after sending it, it looks like öäüÖÄÜß.

    If I write the same umlauts in the text box, then after sending it the text is empty.

    Is the filter for the title and the text box not the same in BuddyPress?

    If I write it in the bbPress frontend, everything is OK.

    Best,

    Karl

    Avatar of Burt Adsit
    Burt Adsit
    Participant

    @burtadsit

    I put in a ticket in trac on this. It’s become an ‘official’ problem. :)

    I actually was stumbling across this last night. Maybe your encoding isn’t UTF-8 and it is being forced? How about changing the code to get the current site’s encoding instead of it being hard-coded in that query?

    For example:

    $content = html_entity_decode($content, ENT_COMPAT, get_option('blog_charset'));

    Trent
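
    Trent's hypothesis can be tried outside WordPress. The sketch below is only an illustration under stated assumptions: encode_with_charset() is a hypothetical helper, and its $charset parameter stands in for the value that get_option('blog_charset') would return inside WordPress. It shows that PHP's htmlentities() returns an empty string when the input bytes are not valid in the charset it is told to use, which is one way an "empty text box" could arise. It does not establish that this is the cause of Karl's problem.

```php
<?php
// Hypothetical helper (not part of BuddyPress): encode with an explicit
// charset instead of a hard-coded "UTF-8". In WordPress the charset would
// come from get_option('blog_charset'); here it is a plain parameter.
function encode_with_charset(string $content, string $charset): string
{
    return htmlentities($content, ENT_COMPAT, $charset);
}

$latin1 = "\xF6\xE4\xFC";             // "öäü" as ISO-8859-1 bytes (not valid UTF-8)
$utf8   = "\xC3\xB6\xC3\xA4\xC3\xBC"; // "öäü" as UTF-8 bytes

$forced  = encode_with_charset($latin1, 'UTF-8');      // charset does not match the bytes
$matched = encode_with_charset($latin1, 'ISO-8859-1'); // charset matches the bytes
$native  = encode_with_charset($utf8, 'UTF-8');        // charset matches the bytes

var_dump($forced);  // string(0) ""
var_dump($matched); // string(18) "&ouml;&auml;&uuml;"
var_dump($native);  // string(18) "&ouml;&auml;&uuml;"
```

    If the forced case matches what a site sees, checking which charset the submitted bytes really use would be a reasonable next step before changing any filter.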

    Avatar of Burt Adsit
    Burt Adsit
    Participant

    @burtadsit

    I’ve been looking at this problem and getting frustrated. I realized finally that the filter function actually does two things:

    $content = htmlentities( $content, ENT_COMPAT, "UTF-8" );
    $content = str_replace( '&', '/amp/', $content );

    From http://loadaveragezero.com/app/drx/Data_Formats/Character_Encoding

    [...] But there are a number of other issues to deal with. In particular, UTF-8 is a multibyte encoding, meaning one character can be represented by one or more bytes. This causes trouble for PHP, because the language parses and processes strings based on bytes, not characters, and makes mincemeat of multibyte strings – for example, by splitting characters ‘in half’, bodging up regular expressions, and rendering email unreadable.
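
    The byte-versus-character point in the quote can be seen with PHP's plain string functions. This is a small sketch, not code from BuddyPress; the variable names are made up for illustration, and the validity check relies on the fact that preg_match() with the /u modifier fails on invalid UTF-8 input.

```php
<?php
$word = "\xC3\xBCber"; // "über" encoded as UTF-8: 4 characters, 5 bytes

// strlen() counts bytes, not characters.
$byteLength = strlen($word);

// substr() also works on bytes, so this cuts the "ü" in half.
$firstByte = substr($word, 0, 1);

// preg_match() with /u returns 1 for valid UTF-8 and false for invalid input.
$wholeIsValid = preg_match('//u', $word) === 1;
$halfIsValid  = preg_match('//u', $firstByte) === 1;

var_dump($byteLength);   // int(5)
var_dump($wholeIsValid); // bool(true)
var_dump($halfIsValid);  // bool(false)
```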

    Karl can you just comment out the following lines please:

    Line 46 $content = str_replace( '&', '/amp/', $content ) in bp-forums-filters.php

    and line 52 $post_text = str_replace( '/amp/', '&', $post_text ); in buddypress-enable.php on the bbpress side.

    I’d like to narrow this down to the htmlentities fn.

    I’m gonna help solve this or just move to a planet where only ASCII is spoken. :)
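
    The two steps Burt lists for the encode filter, together with the matching replacement on the bbPress side, can be sketched as a standalone round trip. sketch_encode() and sketch_decode() are hypothetical names, and the html_entity_decode() call in the decode step is an assumption; the thread only confirms the str_replace() line in buddypress-enable.php.

```php
<?php
// Sketch of the two-step encode quoted in the post above.
function sketch_encode(string $content): string
{
    $content = htmlentities($content, ENT_COMPAT, 'UTF-8');
    return str_replace('&', '/amp/', $content);
}

// Assumed inverse: restore the ampersands, then decode the entities.
function sketch_decode(string $content): string
{
    $content = str_replace('/amp/', '&', $content);
    return html_entity_decode($content, ENT_COMPAT, 'UTF-8');
}

$original = "\xC3\xB6\xC3\xA4\xC3\xBC\xC3\x9F"; // "öäüß" as UTF-8 bytes
$encoded  = sketch_encode($original);
$decoded  = sketch_decode($encoded);

echo $encoded, "\n"; // /amp/ouml;/amp/auml;/amp/uuml;/amp/szlig;
var_dump($decoded === $original); // bool(true)
```

    The encoded form has the same shape as the /amp/auml;-style text Karl saw in bbPress in the first post, so comparing what is stored against this output may help show which step is being skipped or applied twice.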

    Avatar of Karlheinz01
    karlheinz01
    Member

    @karlheinz01

    Hello,

    I commented out these lines, but I get the same result.

    I did some more tests; maybe they help.

    If I put this in the title: öäüÖÄÜß

    and these 3 lines in the content:

    öäüÖÄÜß

    1. fdgfd gdfgd sfgsdgs dfggf dfgd

    2. fdgdsgd fdgdfgd gdfsadfsdafsd dfg

    I get this:

    Title: öäüÖÄÜß

    Content: all lines deleted

    If I put this in the title: öäüÖÄÜß

    and this in the content:

    1. fdgfd gdfgd sfgsdgs dfggf dfgd

    2. fdgdsgd fdgdfgd gdfsadfsdafsd dfg

    öäüÖÄÜß

    I get this:

    Title: öäüÖÄÜß

    Content:

    1. fdgfd gdfgd sfgsdgs dfggf dfgd

    2. fdgdsgd fdgdfgd gdfsadfsdafsd dfg

    Last test. If I put this line in the content:

    öäüÖÄÜß this is a test

    After sending it, the whole line is empty (it gets deleted).

    All this happens only in the BuddyPress frontend.

    So in the title I can use any umlauts, but in the content the filter routine deletes not only the umlauts but, depending on where the umlauts are, also the rest of the text.

    Best,

    Karl

    Avatar of Karlheinz01
    karlheinz01
    Member

    @karlheinz01

    Hi Trent,

    I tried your suggestion, but it doesn't change anything.

    Best,

    Karl

    Avatar of Burt Adsit
    Burt Adsit
    Participant

    @burtadsit

    Got anything to add to the trac discussion Karl? http://trac.buddypress.org/ticket/436

    Just disabling the filters is not a good idea. The filtering of content is bound up with stripping sensitive data. Maybe Andy can decouple those two and enable the content filters for those users that get bitten by the libxml2 problem.

    Avatar of Karlheinz01
    karlheinz01
    Member

    @karlheinz01

    Hi,

    Everything works on the bbPress side with umlauts. Would it be possible to use the same filters for the BuddyPress side of the forum?

    Best,

    Karl-Heinz

    Avatar of MartinNr5
    MartinNr5
    Participant

    @martinnr5

    I had issues with UTF-8 encoding, although not the ones Karl mentions, due to my host running my sites on a PHP 4 server. After a move to a PHP 5 server the problems went away.

    Avatar of Karlheinz01
    karlheinz01
    Member

    @karlheinz01

    Hi,

    I still have the same problem. Any good news? ;-)

    Best,

    Karl

    Avatar of MartinNr5
    MartinNr5
    Participant

    @martinnr5

    No offense but the “fix” for this bug is not acceptable. “a few international characters not working” /is/ a big deal to the rest of the world.

    The characters show up fine in Buddypress no matter where I post the text; BB or BP.

    The characters show up fine in BP and wrong in BB if I post the text in BP.

    There has to be another way around this.
