[ overboard / sfw / alt / cytube] [ leftypol / b / WRK / hobby / tech / edu / ga / ent / 777 / posad / i / a / R9K / dead ] [ meta ]

/meta/ - Ruthless criticism of all that exists (in leftychan.net)

Discussions, queries, feedback and complaints about the site and its administration.

File: 1621313198729.jpg ( 274.61 KB , 2048x1152 , 1621284985965.jpg )


There isn't a bug report thread so

>Incorrect string value: '\xEF\xBDuygh' for column `lainchan`.`posts_b`.`body_nomarkup` at row 1

I tried posting this picture on /b/ with full-width text as the body.


It would have been the full-width text ("body" refers to this main text part of a post). As you can tell, your image worked fine.
We've seen a few cases where unusual characters (including some foreign-language copypastas) have caused similar-looking errors.
I'll ask the devs but I think it's an issue that stalled due to priorities and devs doing IRL stuff.


> lainchan
You have to go back.


Where do you think you are?


I got a similar error message when making a post that contained Japanese characters.


Looking at the error, I think I know what it is and it should now be fixed.
<these next two points are technical, interesting but TL;DRable
>in the beginning there was ASCII character encoding, where the values 0 - 127 (that's 2^7 values, i.e. 7 bits; the top value is 2^7 minus 1 because programmers start counting at 0, not 1) each represent a character ('A' = 65, 'B' = 66, … )
>naturally 128 values isn't enough even for Europe, so everyone told America to fuck off and now we use UTF-8 encoding (Unicode), where a single character can be multiple bytes long. Reading the bytes as one number: 'A' = 65, '☭' = 14850221, '日' = 15112101, 'ａ' = 15711617, '💩' = 4036989609. To remain compatible with ASCII, the first bit of a byte being 1 is basically a signal for "this byte is part of a multi-byte character". An encoded value under 256 (2^8) fits in one byte, under 65536 (2^16) in two, under 16777216 (2^24) in three, and under 4294967296 (2^32) in four.
>however, PHP's regex functions (regex being the pattern language the word filter uses) don't treat messages as multi-byte by default, despite UTF-8 being used almost everywhere if you aren't anglo. So the filter would break 日 (3 bytes) into three weird bits and treat each of those weird bits as a letter when looking for a phrase.
>if the stars aligned and it 'found' a filtered phrase, it replaced it, even if the match was the 3rd byte of 日 plus the 3 bytes of 本. It would have just ripped out the last third of 日 and replaced it with 'uygh'.
>when that string of characters (your post) gets processed later, after running the filter, the broken character causes an error because by chance what's left is an impossible letter, so the post gets refused.
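
For anyone who wants to poke at this themselves, here is a rough sketch in Python (not the site's actual PHP code) of the failure described above. The filter pattern is made up for illustration, since the real filter rule isn't shown in this thread; `utf8_value`, `bytewise_filter` and `is_valid_utf8` are hypothetical helper names. It assumes the filter ran on raw bytes and happened to match the last byte of a full-width 'ａ'.

```python
import re


def utf8_value(ch: str) -> int:
    """Read the UTF-8 bytes of one character as a single big-endian number."""
    return int.from_bytes(ch.encode("utf-8"), "big")


def bytewise_filter(text: str, pattern: bytes, replacement: bytes) -> bytes:
    """Hypothetical word filter that matches raw bytes and ignores character boundaries."""
    return re.sub(pattern, replacement, text.encode("utf-8"))


def is_valid_utf8(data: bytes) -> bool:
    """True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The numbers quoted in the post are the UTF-8 bytes read as one integer.
values = {ch: utf8_value(ch) for ch in "A☭日ａ💩"}

# Full-width 'ａ' is the three bytes EF BD 81. A made-up byte pattern that
# matches only the last byte tears the character apart.
broken = bytewise_filter("ａ", rb"\x81", b"uygh")

# The same made-up pattern applied to characters instead of bytes finds
# nothing, because the text contains no character U+0081.
safe = re.sub("\x81", "uygh", "ａ")

print(values)
print(broken, is_valid_utf8(broken))
print(safe)
```

The leftover bytes `\xEF\xBD` followed by `uygh` are the same shape as the string in the error message at the top of the thread: a three-byte character missing its last byte is not valid UTF-8, so whatever handles the post next refuses it.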
