
Re: Submission of draft-pam-html-fine-trans-00.txt



Lasse Hillerøe Petersen wrote:
> >   The proposal is to add an HTML tag with the following syntax:
> >   <TEXT SRC=(URI) {PLAIN} {RANGE=(start),(end)}>
> 
> Comment #1. Since what is included is in a sense a (dynamic) image (in a
> broader sense) of a different text, why can't the IMG element simply be
> extended to permit this? OK, this may cause problems. In that case, INCLUDE
> may be better than TEXT.

While the <IMG> and proposed <TEXT> tags both implement transclusions,
I don't think it would be intuitive to extend the existing <IMG> tag.
However, since several people have made the same suggestion and it appears
that the <INCLUDE> tag is now available, I would support this change.

> Comment #2. Why should the client modify the included text when a Plain
> attribute is present?

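Because it provides needed functionality: it strips the source's markup from the included text so that the including document can apply its own.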

> And why only a Plain attribute? Why not allow for the future, and have
> a Type attribute? (In principle, adding a Type attribute to the IMG
> element would be a more unified approach.) A possibility could be to
> allow for a Type=MIME-Type. An inclusion of an HTML document by <IMG
> Src="text.html" Type="text/plain"> would permit the browser to render
> the HTML text as is (useful for HTML guides, embedding the inclusions
> in <PRE></PRE>), whereas <IMG Src="text.html"> would include the text
> as HTML, and apply HTML-rendering to the text.

I'm not sure I understand the benefits of this suggestion.  The PLAIN
attribute is intended to remove all markup so that new markup can be
applied.  What would it mean to apply each MIME-type to a document?

> Comment #3. The IMG and APPLET elements both have an Alt attribute.
> Wouldn't it be desirable to have this as well?

This suggestion, also made by someone else, holds merit.

> Comment #4. The draft does not seem to specify how "active" elements should
> affect included HTML.
> #4a. Assume <A Href="url"><STRONG><IMG Src="text.html"></STRONG></A>.
> Should the text be rendered as a bold link?

Yes.  If that isn't clear, perhaps I should make it explicit.
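
A minimal sketch of why parse-time inclusion gives that result, assuming the <INCLUDE SRC="..."> spelling discussed above and a hypothetical in-memory SOURCES table in place of a real fetch: the included text is spliced in before a single parse, so each run of it ends up enclosed by the surrounding A and STRONG elements. The regex substitution and the simple element stack (which ignores void elements) are simplifications.

```python
import re
from html.parser import HTMLParser

# Hypothetical stand-in for fetching the included document.
SOURCES = {"text.html": "quoted <EM>text</EM>"}
# Simplified: matches only <INCLUDE SRC="..."> with a double-quoted URI.
INCLUDE = re.compile(r'<INCLUDE\s+SRC="([^"]+)">', re.IGNORECASE)


class ContextTracker(HTMLParser):
    """Records each run of text together with the elements enclosing it."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: list[str] = []
        self.runs: list[tuple[str, tuple[str, ...]]] = []

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)  # tag names arrive lower-cased

    def handle_endtag(self, tag):
        if tag in self.stack:  # tolerate sloppy markup: pop back to the match
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        self.runs.append((data, tuple(self.stack)))


page = '<A HREF="url"><STRONG><INCLUDE SRC="text.html"></STRONG></A>'
# Parse-time inclusion: splice the included text in first, then parse once.
expanded = INCLUDE.sub(lambda m: SOURCES[m.group(1)], page)
tracker = ContextTracker()
tracker.feed(expanded)
tracker.close()
print(tracker.runs)
# [('quoted ', ('a', 'strong')), ('text', ('a', 'strong', 'em'))]
```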

> #4b. What if the source is a part of the source document?

That's probably legal, unless recursive.  I should probably add something
to the draft about nesting and recursion.
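
Nesting with a recursion check might be sketched as follows. This assumes the same hypothetical <INCLUDE SRC="..."> syntax and an in-memory DOCS table standing in for the network; only a URI that is already on the current chain of inclusions is rejected.

```python
import re

# Hypothetical in-memory "server": URI -> document text.
DOCS = {
    "a.html": 'Start <INCLUDE SRC="b.html"> end.',
    "b.html": 'B says <INCLUDE SRC="c.html">.',
    "c.html": "hello",
    "loop.html": 'Loop <INCLUDE SRC="loop.html">',
}

# Simplified pattern; a real implementation would use an HTML parser.
INCLUDE = re.compile(r'<INCLUDE\s+SRC="([^"]+)">', re.IGNORECASE)


class RecursiveInclusion(Exception):
    """Raised when a document (directly or indirectly) includes itself."""


def expand(uri: str, chain: tuple[str, ...] = ()) -> str:
    """Expand inclusions depth-first, rejecting any URI already on the chain."""
    if uri in chain:
        raise RecursiveInclusion(" -> ".join(chain + (uri,)))
    text = DOCS[uri]
    return INCLUDE.sub(lambda m: expand(m.group(1), chain + (uri,)), text)


print(expand("a.html"))  # Start B says hello. end.
```

Because only the current chain is checked, a document included twice from different places is still allowed; only a true cycle, such as loop.html including itself, raises RecursiveInclusion.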

> What if the part is not balanced? Should the part be checked for
> correctness/balance?

It should be handled in whatever way browsers already handle bad markup.
The intention is to leave that to the author and authoring tools to check.
Again, I should probably make this explicit in the draft.

> Comment #5. It seems the intention of the Plain attribute is to prevent
> remote markup from interfering with the including document.

Exactly.

> Wouldn't it be desirable to have a more fine-grained control over this,
> so for example emphasis can be retained, while references are removed
> (keeping EM, STRONG, CITE, but omitting A)?

I can see how this would be useful, but how would it be specified?
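
One conceivable answer, offered only as an illustration: the including document names the elements it is willing to accept (EM, STRONG and CITE in the example above), and everything else is dropped while its text is kept. How such a list would be written in the tag is left open here; the filter below is a hypothetical Python sketch that also discards the attributes of the elements it keeps.

```python
from html import escape
from html.parser import HTMLParser


class AllowList(HTMLParser):
    """Keeps only the listed elements (without attributes); other tags vanish, text stays."""

    def __init__(self, keep: set[str]) -> None:
        super().__init__(convert_charrefs=True)
        self.keep = {name.lower() for name in keep}
        self.out: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.keep:  # HTMLParser reports tag names in lower case
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in self.keep:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))  # re-escape decoded text


def filter_markup(fragment: str, keep: set[str]) -> str:
    parser = AllowList(keep)
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)


src = '<P>A <EM>key</EM> point, see <A HREF="r.html"><CITE>Ref</CITE></A>.</P>'
print(filter_markup(src, {"EM", "STRONG", "CITE"}))
# A <em>key</em> point, see <cite>Ref</cite>.
```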

> Comment #6. This relates to Comment #4 and #5. What if the included HTML
> document part contains a reference to a local anchor. Is this anchor
> modified to be absolute, or can it refer to an anchor in the including
> document?

Excellent point.  This may also apply to relative URLs in general.
They should probably be modified to be absolute.

> What about embedded images?

The same applies.
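
A minimal sketch of resolving relative references against the URL the fragment came from, using Python's standard urllib.parse.urljoin. It assumes only double-quoted HREF and SRC attributes need rewriting and uses a hypothetical source URL; note that a local anchor such as #sec2 becomes a link back into the source document rather than into the including one.

```python
import re
from urllib.parse import urljoin

# Simplified: only double-quoted HREF/SRC values are handled.
URL_ATTR = re.compile(r'\b(HREF|SRC)="([^"]*)"', re.IGNORECASE)


def absolutize(fragment: str, source_url: str) -> str:
    """Resolve every HREF/SRC against the URL the fragment was transcluded from."""
    def fix(m: re.Match) -> str:
        return f'{m.group(1)}="{urljoin(source_url, m.group(2))}"'
    return URL_ATTR.sub(fix, fragment)


base = "http://example.org/reports/myreport.html"  # hypothetical source URL
frag = '<A HREF="#sec2">below</A> <IMG SRC="fig1.gif"> <A HREF="../index.html">up</A>'
print(absolutize(frag, base))
```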

> Comment #7. While pattern matching is nice, it could be expensive.

Yes indeed; that is the principal reason why it is not part of the main proposal.

> Should the matching be performed by the browser, or the server of
> the included text source?

There are arguments for both approaches.

> If the latter, how is this indicated to the server? A normal IMG Src
> inclusion simply results in a plain HTTP transfer of the image data, but if
> the TEXT inclusion was only a small part of a huge document, the transfer
> of the whole document would be highly inefficient. Would it be desirable to
> include location specification in the URL, rather than as attributes to the
> TEXT element? That is: would it be better to extend the URL definition to
> allow for document parts?

Absolutely.  That is another issue which we are actively following.
The inclusion of a facility for the client to select a portion of a
retrieved document is intended purely as a kludge until it becomes
possible to perform these kinds of partial retrieval.  The problem
is that there will still be servers incapable of supporting partial
retrieval for quite some time after the introduction of a standard.
Of course, compliant proxy servers might help speed the transition.
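
A sketch of that fallback, assuming the client has already asked for the range with an HTTP Range request and that RANGE=(start),(end) denotes a half-open interval [start, end) of bytes (the exact convention is an assumption here). A server or proxy that supports partial retrieval answers 206 Partial Content with only the requested bytes; one that does not answers 200 OK with the whole document, and the client selects the portion itself. The function name is hypothetical.

```python
def select_range(status: int, body: bytes, start: int, end: int) -> bytes:
    """Return bytes [start, end) of the resource, however the server answered."""
    if status == 206:  # Partial Content: the server already did the selection
        return body
    if status == 200:  # Range ignored: whole document sent, slice it locally
        return body[start:end]
    raise ValueError(f"unexpected status {status}")


whole = b"0123456789abcdef"
print(select_range(200, whole, 4, 8))    # b'4567'
print(select_range(206, b"4567", 4, 8))  # b'4567'
```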

> Comment #8. How is the version of HTML used in a source for inclusion
> passed on to an including document? How is compatibility maintained in
> general? The included text is not a complete HTML document. (And as
> mentioned, it is difficult even to assure that it is a valid HTML document
> by itself.)

As in my answer to comment #4 above, this is really up to the author and
the authoring tools used to perform the transclusion.  The traditional
Xanadu solution is to have all markup in a parallel addressing range, not
embedded within the document itself.  Much work has also been done, and
continues to be done, on separating structural and presentational markup.
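
To make the parallel-markup idea concrete, here is a hypothetical Python representation, not Xanadu's actual data structures: the text is stored with no embedded markup, and presentation is a separate list of (start, end, tag) ranges over character positions that is merged in only when rendering. It assumes non-empty, properly nested ranges.

```python
from html import escape

# Hypothetical representation: (start, end, tag) over character positions [start, end).
MarkupSpan = tuple[int, int, str]


def render(text: str, markup: list[MarkupSpan]) -> str:
    """Merge parallel markup into plain text. Assumes non-empty, properly nested spans."""
    opens: dict[int, list[MarkupSpan]] = {}
    closes: dict[int, list[MarkupSpan]] = {}
    for span in markup:
        opens.setdefault(span[0], []).append(span)
        closes.setdefault(span[1], []).append(span)
    out = []
    for pos in range(len(text) + 1):
        # At a shared position, close inner spans (later start) first...
        for _, _, tag in sorted(closes.get(pos, []), key=lambda sp: -sp[0]):
            out.append(f"</{tag}>")
        # ...then open outer spans (later end) before inner ones.
        for _, _, tag in sorted(opens.get(pos, []), key=lambda sp: -sp[1]):
            out.append(f"<{tag}>")
        if pos < len(text):
            out.append(escape(text[pos], quote=False))
    return "".join(out)


text = "Transclusion keeps the original."
print(render(text, [(0, 12, "EM"), (23, 31, "STRONG")]))
# <EM>Transclusion</EM> keeps the <STRONG>original</STRONG>.
```

Because the text itself never changes, its character addresses stay stable however many different markup layers are applied to it.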

> Comment #9. Should included text be regarded as atomic; that is, can there
> be references to a document part, which refers to a slice of the document
> that crosses a further included text? (part a of doc A is included in
> paragraph b of doc B, doc C included some text from paragraph b.) This is
> obviously related to how to refer into a document in the first place, as
> mentioned in #7. Whereas it probably makes little sense to talk about
> a section of an image, it would make a lot of sense to talk about text
> slices that cross inclusion boundaries.

Yes.  The Xanadu design considerably simplifies these issues because
each transclusion causes a new list of address ranges in the original
source documents to be constructed, thus allowing arbitrarily deep nesting
because each new document refers directly to the originals rather
than the intermediates.
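
The same idea can be sketched in Python, with hypothetical names and a simplified representation: a document is a list of spans, each naming an original source and a half-open range of positions in it. Taking a slice of a composite document yields spans into the originals, so a slice that crosses an inclusion boundary (the doc A / doc B / doc C case in Comment #9) simply becomes two spans.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """Characters [start, end) of one original source document."""
    doc: str
    start: int
    end: int


def slice_doc(doc: list[Span], a: int, b: int) -> list[Span]:
    """Spans covering positions [a, b) of a composite document.

    The result always points at the original sources, never at `doc` itself,
    so a slice of a slice is still one step away from the originals.
    """
    result = []
    pos = 0  # offset in `doc` where the current span begins
    for s in doc:
        length = s.end - s.start
        lo, hi = max(a, pos), min(b, pos + length)
        if lo < hi:  # this span overlaps the requested slice
            result.append(Span(s.doc, s.start + (lo - pos), s.start + (hi - pos)))
        pos += length
    return result


# Doc B: ten characters of its own, then 20 characters transcluded from doc A,
# then more of its own text (positions 10..30 of B's original).
doc_b = [Span("B", 0, 10), Span("A", 100, 120), Span("B", 10, 30)]
doc_c = slice_doc(doc_b, 5, 25)  # crosses the A/B inclusion boundary
print(doc_c)  # [Span(doc='B', start=5, end=10), Span(doc='A', start=100, end=115)]
print(slice_doc(doc_c, 3, 8))  # [Span(doc='B', start=8, end=10), Span(doc='A', start=100, end=103)]
```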

> There are two separate problems that have to be addressed - HOW TO TAKE
> SOMETHING OUT and HOW TO PUT SOMETHING IN:
> 
> Problem #1. How to refer to document parts, or how to construct documents
> that are just parts of other documents.
> Currently there is no well-defined way to request a part of a document from
> an HTTP server, and there is no URL notation to achieve this. Anchors
> provide marks in the text that can be used for this, but only allow
> reference to parts that have been anticipated when the document was
> written. An including document cannot arbitrarily pick a part of the text
> to quote. Using character positions (byte indexes) is not perfect, but
> there currently is no standard namespace that can be used to refer into
> HTML documents.

Absolutely correct.  The "byte range" HTTP extension provides a mechanism,
but unfortunately uses an HTTP header rather than the URL.
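
For comparison, a range request built with Python's standard urllib.request; the URL is hypothetical and nothing is sent. The range lives in a Range header, with an inclusive last byte, so the URL alone does not identify the part being transcluded.

```python
from urllib.request import Request


def range_request(url: str, first: int, last: int) -> Request:
    """Build (but do not send) a GET for bytes first..last, both inclusive."""
    return Request(url, headers={"Range": f"bytes={first}-{last}"})


# Hypothetical URL. If RANGE were read as half-open (an assumption), [100, 200)
# would map to bytes=100-199.
req = range_request("http://example.org/reports/myreport.html", 100, 199)
print(req.get_header("Range"))  # bytes=100-199
# urllib.request.urlopen(req) would send it: a server that supports ranges replies
# 206 with just those bytes, one that ignores Range replies 200 with everything.
```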

> Problem #2. Inclusion of any object in another document.
> Currently, a browser can only display images and applets inside documents.
> What is necessary is to allow (at least) textual inclusion. This raises the
> question of whether such an inclusion should be regarded as happening at
> parse time, so that included HTML will be parsed just as the containing
> document.

That is the intention of this proposal.

> This however means that parsing the HTML will become complicated
> and recursive.

Not if explicitly ruled out by the proposal.

> I don't know if SGML permits this.

Probably not!

> Alternatively, the browser process displaying the including document
> will have to embed another browser process (displayed as an image?) in
> the window. This would probably affect further cross-boundary inclusion.

I don't regard this as a feasible solution, for exactly that reason.

> I am tempted to guess that HTML can't be stretched enough to accommodate
> these two problems, without severe kludging. (Verbed form of kludge, sorry
> if that's not proper usage of the word.;-)

You're probably right, but our aim is to promote as much of the Xanadu
functionality as we can in any and all environments where it would
be useful.  In the longer term we will very probably have to move beyond HTML.

> An example of how this could be made to work:

[ Helpful example snipped ]

> Inclusion interests me, because I have implemented it partially (for a
> specific purpose) using CGI.

Can you elucidate further?

> Inclusion will make many things a LOT easier, but it also raises
> many questions.  In fact I think it highlights problems of HTML and the
> WWW in general,

I fully agree.

> namely that there is no enforced concept of containment - the apparent
> hierarchical structure of URLs does not necessarily convey any
> meaningful structure, but is only a convenience for the publisher of
> documents, because it mirrors the storage structure on the server. The
> only containment is between the host storing the document and the
> document itself. In the above example <http://foo.bar.org/reports>
> would probably be an index of reports, which can be meaningfully said
> to contain myreport (and probably other reports), but this would only
> be a local convention. An OOPL-like namespace would be much better, IMO.

Sounds reasonable.

> Well, sorry for the length and the digressions. I hope this input will be
> of some value to you, and that you don't feel I have wasted your time.

Not at all!  Thank you for your contribution.

Share and enjoy,
		*** Xanni ***
-- 
mailto:xanni@xxxxxxxxxx                         Andrew Pam
http://www.xanadu.net/xanadu/                   Technical VP, Xanadu
http://www.glasswings.com.au/                   Technical Editor, Glass Wings
http://www.sericyb.com.au/sc/                   Manager, Serious Cybernetics
P.O. Box 26, East Melbourne VIC 8002 Australia  Phone +61 3 96511511