vlbrown
Mar 15 2003, 08:02 PM
I am happy that the "Convert Line Breaks" function treats empty lines as virtual <P> tags and line breaks as virtual <BR> tags. However, this conversion should not be done within HTML tags (e.g. <A HREF=...). I prefer to break up HREFs, IMGs, etc., as otherwise they get too long and obscure the text. If I type
<A
HREF="http://longname.come/long/path/to/file.html"
>text</A>
I do not want this to show up literally because MT decided to take my "line breaks" as text breaks.
girlie
Mar 16 2003, 07:28 AM
Gee. That's an interesting way of writing HTML.
I'm curious as to how many other people write it that way, since this could be seen as a request to accommodate a particular preference rather than the commonly accepted method.
Anyone else do this?
vlbrown
Mar 16 2003, 01:57 PM
You're kidding... you've never seen HTML written this way? I have to ask... how much HTML have you written? I mean really, from scratch, not using Dreamweaver or something like it. The tags are _meant_ to be allowed to be broken to keep things from needing to be all on one humongous unmanageable line.
HTML tags for any web browser can be broken at known locations within the tag: the tag names themselves cannot be broken, but you can break between parameters, or between the tag name and the parameters.
How about if I phrase it this way:
It's a feature for MT to allow some magic that web browsers do not allow (such as blank lines == paragraphs)
It's a bug for MT to disallow functionality that all web browsers accept - such as formatting HTML tags to be more readable and maintainable by the author/programmer
Take a look at
http://www.w3.org/TR/html401/struct/links.html#h-12.1.4
excerpt:
You'll find a lot more in <A href="chapter2.html"
title="Go to chapter two">chapter two</A>.
Note that MT would imagine an undesired <br> in 4 places in this text, printing the hrefs and titles as literal strings in the resulting entry (try it yourself).
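To make the failure concrete, here is a minimal sketch in Python (not MT's Perl). It assumes the conversion amounts to splitting on blank lines, turning every remaining newline into <br />, and wrapping each chunk in <p>. The function name naive_convert_breaks is made up for illustration, and MT's real routine may differ in its details.

```python
import re

def naive_convert_breaks(text: str) -> str:
    """Simplified stand-in for "Convert Line Breaks" (an assumption, not MT's code)."""
    # Blank lines separate paragraphs.
    paragraphs = re.split(r"\r?\n\r?\n", text)
    converted = []
    for p in paragraphs:
        # Every remaining newline becomes a <br />, even one inside a tag.
        p = re.sub(r"\r?\n", "<br />\n", p)
        converted.append(f"<p>{p}</p>")
    return "\n\n".join(converted)

entry = (
    'You\'ll find a lot more in <A href="chapter2.html"\n'
    'title="Go to chapter two">chapter two</A>.'
)
print(naive_convert_breaks(entry))
```

The <br /> lands between the href and title attributes, inside the start tag, which is why the attributes end up mangled in the published entry.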
girlie
Mar 16 2003, 02:22 PM
First of all - I don't use Dreamweaver or anything like it to write my HTML code for me.
And the example you posted doesn't really indicate to me that it's necessarily an acceptable way to write it, just that the text may be wrapped in that particular document.
See here:
See also this <A href="../images/forest.gif"
title="GIF image of enchanted forest">map of
the enchanted forest.</A>
The way "the enchanted forest" wraps to the next line without indentation suggests to me something other than a preferred way of writing code. Why isn't it all on one line like the excerpt you cited? Why break the text between "map of" and "the enchanted"?
I'm not saying you couldn't be correct, I'm just looking for some concrete evidence that it's an acceptable method of writing HTML. I'm all for clarity of code, and seeing it that way actually looks more confusing to me than having it all on one line.
girlie
Mar 16 2003, 02:30 PM
Oh and just for shits and giggles, I did a View Source on the page you linked to. None of the code is written that way. So, if w3 thought that was a good way to do it, wouldn't you expect them to set the precedent?
vlbrown
Mar 16 2003, 02:33 PM
Did you try the example in a web browser? The result is _one line_.
Why break in the middle of a sentence? - because _the author wanted to break there_.
Note that the sentence is still within the <A> and the </A> and is therefore part of the tag/element/attribute set, which should not be subject to internal line-breakage.
My point again:
It's a feature for MT to allow some magic that web browsers do not allow (such as blank lines == paragraphs in user text)
It's a bug for MT to disallow/disable functionality that all web browsers accept, functionality that is part of HTML/SGML - functionality that permits formatting HTML code to be more readable and maintainable by the author/programmer
MT's current implementation of "convert line breaks" is breaking my code. Line breaks should be converted in text. They should not be converted in code.
And why are you arguing with me? Let Ben or Mina argue with me... if they feel it's necessary.
girlie
Mar 16 2003, 02:38 PM
I have just as much right to post my opinions to this forum as you do. And as a moderator, I was doing my job by moderating the discussion. You, in turn, took things up a notch by attempting to insult me.
Just because you don't like to be disagreed with doesn't mean I can't post an alternative view of your request.
vlbrown
Mar 16 2003, 03:17 PM
http://www.w3.org/MarkUp/html-spec/html-spec_3.html#SEC3.2...
Tags delimit elements such as headings, paragraphs, lists, character highlighting, and links. Most HTML elements are identified in a document as a start-tag, which gives the element name and attributes, followed by the content, followed by the end tag. Start-tags are delimited by `<' and `>'; end tags are delimited by `</' and `>'.
...
The content of an element is a sequence of data character strings and nested elements.
...
A name consists of a letter followed by letters, digits, periods, or hyphens.
...
In a start-tag, the element name must immediately follow the tag open delimiter `<'.
...
In a start-tag, white space and attributes are allowed between the element name and the closing delimiter. An attribute specification typically consists of an attribute name, an equal sign, and a value, though some attribute specifications may be just a name token. White space is allowed around the equal sign.
---------------
The spec allows white space within the start tag. My "way of writing" HTML may be "interesting" to you because you have never seen it before... we learn something new every day. My "interesting" coding style is supported by the SGML specification.
MT's "convert" option is breaking my code. Convert line breaks in my text... keep your conversions out of my code. Code is not data.
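The whitespace rule quoted above can be checked with any conforming parser. As a rough illustration, the sketch below uses Python's standard html.parser module to confirm that a start tag split across lines yields the same tag name and attributes as the one-line form. The example.com URL and the helper names are placeholders, not anything from MT.

```python
from html.parser import HTMLParser

class StartTagCollector(HTMLParser):
    """Records every start tag together with its attribute list."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append((tag, attrs))

def start_tags(markup: str):
    parser = StartTagCollector()
    parser.feed(markup)
    parser.close()
    return parser.start_tags

one_line = '<A HREF="http://example.com/long/path/to/file.html">text</A>'
multi_line = '<A\nHREF="http://example.com/long/path/to/file.html"\n>text</A>'

# Line breaks between the element name, the attribute, and the closing
# delimiter are plain white space to the parser.
print(start_tags(one_line) == start_tags(multi_line))
```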
girlie
Mar 16 2003, 04:00 PM
Okay. Let's assume for the sake of argument that I agree with you 100%.
Now, imagine what machinations MT would have to go through to determine whether or not to apply an automatic line break in each of your entries where you choose to use code over several line breaks. What would it have to look for in each line of text? Just the presence of < or >? Or would it need to compare each HTML tag you used to a list of valid HTML tags to know when NOT to apply a break? What is the bare minimum it would have to do in order to achieve the goal?
And how will this impact on rebuilds if MT has to so closely examine your entries in this fashion? Many MT users already have hosts who grumble at the resources they claim MT consumes.
So the question becomes a) is the impact of this behavior affecting a high number of users and therefore, b) is the impact of this behavior detrimental enough to warrant implementing another potentially detrimental behavior (increased rebuild time)?
Those questions are what prompted me to ask for feedback from other users as to their method of coding HTML in their entries.
If the solution is easily implemented without a negative impact in another area, then I'm sure it will be addressed in future versions of MT. But there is a bigger issue here which also has to be addressed: the needs of the many often outweigh those of the few.
stepan
Mar 16 2003, 04:18 PM
vlbrown is right that HTML treats whitespace (space, tab, line breaks) equivalently in most situations. Still, when I let MT break up my entry into lines, I'm no longer writing pure HTML and am willing to accommodate MT by not inserting line breaks inside a tag. It also makes it less "pretty" if I want to insert a list or some other structured tag, but c'est la vie. When I do serious HTML formatting, I don't let MT break lines for me and put in the P and BR tags myself.
It would be great if Ben changed the parser to support this, but I think there are other things that are more important to work on. Especially considering that the text formatting is pluggable, so if you don't like the way it's done, write a formatter yourself (or get someone to write one for you).
btrott
Mar 16 2003, 04:42 PM
vlbrown -- I both agree and disagree. I agree that it would be nice if MT were able to detect multi-line HTML. But I disagree that it's necessarily a *bug* because, as stepan wrote, when you use convert line breaks, you are asking MT to treat the text as (largely) unformatted text, and asking MT to mark it up. Now, we want MT to be as correct as possible in converting markup--but not when we would have to sacrifice a lot of speed, as we would here. I believe we'd have to implement a full HTML parser, which would definitely slow down the posting--let me know if you see a simpler alternative.
If we could improve this and keep it fast, I'd be all for it. But if it's going to slow MT down quite a bit, we'd rather keep the formatting fast, particularly as "convert line breaks" is just the default, and text formatting plugins can be written (fairly easily) to be smarter about formatting.
vlbrown
Aug 10 2003, 11:19 PM
Well, we won't know if something slows down till we try it, will we :-) Let's say, for example, we start by not breaking HTML within <A tags... that's pretty simple: everything between the <A and the </A> is not busted up. No <BR>s or <P>s inserted. After that, you could do <img...>
I'm not suggesting a full HTML parser by any means, but anchor links and images are VERY popular in weblogs; tables probably somewhat less so, most other HTML somewhere in the noise level. Yes, in a way, stepan is correct, but he's overstating the situation when he says "when I let MT break up my entry into lines, I'm no longer writing pure HTML". As soon as I ask to upload a file, I get loads of HTML.
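A rough Python sketch of that narrower idea, for anyone who wants to try it outside MT: protect whole <a>...</a> elements and <img ...> tags with one regular expression and convert newlines only in the text between them. The function and pattern names are invented for this example. It assumes links are not nested and that no attribute value contains a literal ">".

```python
import re

# One capture group, so re.split keeps the protected spans in the result.
PROTECTED = re.compile(r"(<a\b.*?</a\s*>|<img\b[^>]*>)", re.IGNORECASE | re.DOTALL)

def convert_breaks_outside_links(paragraph: str) -> str:
    pieces = PROTECTED.split(paragraph)
    # Even indexes hold ordinary text; odd indexes hold the protected spans.
    for i in range(0, len(pieces), 2):
        pieces[i] = re.sub(r"\r?\n", "<br />\n", pieces[i])
    return "".join(pieces)

sample = 'First line\nsee <A\nHREF="x.html"\n>the\nlink</A> and\n<img\nsrc="y.jpg"> done'
print(convert_breaks_outside_links(sample))
```

Only the newlines after "First line" and after "and" pick up a <br />. Everything from <A to </A>, link text included, and the whole <img> tag pass through untouched.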
Now, personally, I don't think anyone who is working in "Convert Line Breaks because-I-don't-want-to-deal-with-HTML mode" should be exposed to HTML at all, least of all something like this in the middle of their entry (yuch! ptooie!).
<a href="http://www.cfcl.com/~vlb/weblog/images/fish" onclick="window.open('http://www.cfcl.com/~vlb/weblog/images/fish.jpg','popup','width=188,height=447,scrollbars=yes,resizable=yes,toolbar=no,directories=no,
location=no,menubar=no,status=yes,left=0,top=0');return false">[img]http://www.cfcl.com/~vlb/weblog/images/fish-tm.jpg[/img]
But once that big wad of HTML is here...then we've moved waaaaay out of stepan's "no longer writing pure HTML". There's suddenly a much higher percentage of HTML in this file than text, probably. The least MT could do would be to accommodate it in such a way that allows the user to separate the HTML goo from their actual data/text/entry in some reasonable fashion...
Or I could file a more interesting feature request - make "Convert Line Breaks" a real "don't show me the HTML" mode and HIDE all that #$^#&! javascript/HREF/IMG tag gunk
:-)
vlbrown
Aug 10 2003, 11:22 PM
The above was SUPPOSED to look like this
...
Now, personally, I don't think anyone who is working in "Convert Line Breaks because-I-don't-want-to-deal-with-HTML mode" should be exposed to HTML at all, least of all something like this in the middle of their entry (yuch! ptooie!).
<a href="http://www.cfcl.com/~vlb/weblog/images/fish" onclick="window.open('http://www.cfcl.com/~vlb/weblog/images/fish.jpg','popup','width=188,height=447,scrollbars=yes,resizable=yes,toolbar=no,directories=no,
location=no,menubar=no,status=yes,left=0,top=0');return false"><img src="http://www.cfcl.com/~vlb/weblog/images/fish-tm.jpg" align="" border="0" hspace="10" vspace="10"></a>
...
stepan
Aug 12 2003, 07:17 AM
If you're willing to pay the performance penalty (and I don't really think it's all that expensive), you can go to the html_text_transform function in lib/MT/Util.pm and replace the line that says:
CODE
$p =~ s!\r?\n!<br />\n!g;
with
CODE
$p = join '', grep { s!\r?\n!<br />\n!g unless /^</; 1; } split /(<.*?>)/s, $p;
See the Multiline Tags with "Convert Line Breaks" hack for more info.
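For anyone who wants to see what the replacement line does without editing Util.pm, here is an approximate Python rendering of the same split-and-skip idea. It is a sketch rather than a port, and the function name is made up. The paragraph is split on anything that looks like a tag, and newlines are converted only in the pieces that do not start with "<".

```python
import re

# Non-greedy match across newlines, like the /(<.*?>)/s pattern in the Perl line.
TAG = re.compile(r"(<.*?>)", re.DOTALL)

def convert_breaks_outside_tags(paragraph: str) -> str:
    pieces = TAG.split(paragraph)
    return "".join(
        # Pieces starting with "<" are left alone; plain text gets its <br />.
        piece if piece.startswith("<") else re.sub(r"\r?\n", "<br />\n", piece)
        for piece in pieces
    )

sample = 'See <A href="chapter2.html"\ntitle="Go to chapter two">chapter\ntwo</A>.'
print(convert_breaks_outside_tags(sample))
```

The newline inside the start tag survives, but the one inside the link text still becomes a <br />, so this protects tags only, not everything between <A and </A>.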
shabbirjsafdar
Aug 14 2003, 01:38 PM
This would be way down on my "wants" list, but this job could be made easier with HTML::TagReader, a really useful Perl module for parsing HTML tags that may split across lines.
-Shabbir