I have MT 3.15 running a Hebrew blog using UTF-8 encoding.
The blog is at http://www.xslf.com
On the main page I have a block that displays the latest comments, showing a trimmed excerpt of each comment's text.
All works fine, except for one thing: if the trimming falls in the middle of a word, an invalid Unicode character gets inserted into the HTML of the page.
It is displayed as a question mark or a box, depending on the browser, and it prevents the W3C validator from testing the page (which is the real pain).
The relevant code from the template:
CODE
<div class="side" dir="rtl">
<MTEntries recently_commented_on="10" sort_order="descend">
<strong>ב־<a href="<$MTEntryPermalink$>"><$MTEntryTitle$></a>
(<$MTEntryCommentCount$>)</strong><br />
<MTComments lastn="1"><$MTCommentAuthor$> כתב/ה:
<a href="<$MTEntryLink$>#<$MTCommentID$>"><$MTCommentBody trim_to="36" remove_html="1" convert_breaks="0"$>...</a>
<br />
</MTComments>
</MTEntries>
</div>
The W3C validator error (validation results):
QUOTE
Sorry, I am unable to validate this document because on lines 317, 329, 353 it contained one or more bytes that I cannot interpret as utf-8 (in other words, the bytes found are not valid values in the specified Character Encoding). Please check both the content of the file and the character encoding indication.
(The line numbers match the lines containing comments that were trimmed in the middle of a word. They change as the comments displayed on the page change.)
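
One possible explanation, not confirmed against the MT source, is that trim_to counts bytes rather than characters. Hebrew letters take two bytes each in UTF-8, so a byte-based cut can land inside a letter and leave a dangling half character. The short Python sketch below only reproduces that symptom; the function names and the sample string are made up for illustration and have nothing to do with MT's actual code.

```python
def trim_bytes(text: str, limit: int) -> bytes:
    """Cut the UTF-8 encoding after `limit` bytes, ignoring character boundaries."""
    return text.encode("utf-8")[:limit]


def trim_chars(text: str, limit: int) -> bytes:
    """Cut after `limit` characters, then encode, so no character is split."""
    return text[:limit].encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical sample: two Hebrew words (2 bytes per letter) and one space (1 byte).
comment = "שלום עולם"

for limit in (4, 5):
    print(limit, "bytes:", is_valid_utf8(trim_bytes(comment, limit)))
print("5 chars:", is_valid_utf8(trim_chars(comment, 5)))
```

If this is indeed the cause, whether a given comment breaks would depend on how many one-byte characters (spaces, digits, Latin letters) come before the cut, which would fit with only some of the trimmed comments being affected.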