
Third Party Suicide?



Abstract: I respond to John Walker & MarcS, examining a few options
and explaining how we got here.

   Date: Mon, 20 Nov 89 11:26:47 PST
   From: marcs (Marc Stiegler) (forwarding from John Walker)

   If we're trying to do things that blow up the only existing reference
   implementation of C++, to the extent that we're seriously considering
   shifting to a language environment that does not even target (at
   present) the two most widely distributed front-end architectures (the
   PC and Mac), it looks like we've got the "first real user of the
   language" blues, and bad.

Yup.  I think going to g++ would be an exceedingly bad idea for
precisely this reason.  Neither can we convince old C or C++ hackers
to give up their && and || operators.  Except for these two, all the
other problems are comparatively small pains (no ?:, no "for",
initialization weirdness that makes us have to break up "return"s).  I
don't *want* to ask our developers to live without these either, but I
am willing to try.  We do in fact have the above blues.

   I don't really know what the precipitating factors are for this
   apparent problem, particularly since, as usual, there appear to be several
   causes which must be weighed together, nor do I remotely know enough about
   what's going on to begin to offer suggestions.

It would take a long explanation to show why having a garbage
collector is so important for our code, but given this premise, the
rest follows rather simply.  Also, explaining it to you will give us
all an opportunity to reexamine our premises.

(Along one axis) there are two kinds of garbage collectors:
conservative and non-conservative.  Several people (especially Mark
Weiser) have implemented conservative garbage collectors for C and
C++.  The traditional problem with garbage collecting C is that you
don't know where the pointers are, so you don't know what to mark.
(In this discussion, I will use the term "mark" to mean "determine to
be non-garbage".  This is what the marker of a mark&sweeper does, but
I'm not necessarily talking of mark&sweep collection.)  

Weiser's insight (I think it's Weiser's) is that you don't have to know
where the pointers are--anything that looks like a pointer and points
at the beginning of a heap-allocated block of memory is assumed to be
a pointer.  Where this happens most radically is in scanning the stack
and global/static data space--the roots of the marking process.  Of
course, an occasional word-aligned chunk of memory may happen to
contain a bit pattern which looks like such a pointer, and so causes
garbage to be marked.  The result is that the garbage collector may
accidentally preserve garbage, but it should never eliminate
non-garbage.  Hence the name "conservative".

(This works given the assumption that an allocated chunk of memory is
never pointed to only in its middle.  C semantics do not demand that,
but it seems to be true in practice.  Those programs for which it
isn't true would break under Weiser's system.)
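
As a rough illustration of the conservative rule described above, here
is a minimal sketch in present-day C++.  ToyHeap and conservativeScan
are hypothetical names invented for this example, not part of any real
collector, and the sketch only marks blocks reachable directly from the
scanned words (it does not trace onward through the heap).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>

// Hypothetical bookkeeping for the example: the start address of every
// allocated heap block, plus the set of blocks judged non-garbage.
struct ToyHeap {
    std::set<std::uintptr_t> blockStarts;
    std::set<std::uintptr_t> marked;
};

// Scan a range of words (standing in for the stack or the global/static
// data space).  Any word whose value equals the start of a heap block is
// assumed to be a pointer, whether or not it really is one.
inline void conservativeScan(ToyHeap& heap,
                             const std::uintptr_t* words,
                             std::size_t count) {
    assert(words != nullptr || count == 0);
    for (std::size_t i = 0; i < count; ++i) {
        if (heap.blockStarts.count(words[i]) != 0) {
            heap.marked.insert(words[i]);   // may preserve garbage by accident
        }
    }
}
```

Note that a word pointing into the middle of a block is ignored here,
which is exactly the assumption the parenthetical above warns about.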

We considered a conservative collector, and rejected it because we are
building a server product which has to stay up for a *long* time.  If
we leak memory at some (even rather slow) leak rate, then eventually
we will die in our own waste.  Also, I don't know that it is possible
to build a *portable* *incremental* conservative collector, and we
need both.

Our breakthrough came when we realized that the PtrVar technique would
make it possible to keep track of where all the pointers are.  By
"PtrVar" I mean that we define a class (or set of classes) which act
like C pointers.  C++ theoretically makes this possible since it
allows overloading of unary *, ->, =, ==, etc.  We would then have two
kinds of declared pointers: Strong-pointers (SPTR), for use as
automatic stack pointer variables, and Checked-pointers (CHKPTR), for
use inside heap allocated objects.  Now our problem of knowing where
the C-style pointers on the stack are is converted into the problem of
knowing where the strong-pointers are.

The way that occurred to us to do this was inspired by Michael's bomb
package--have the strong pointer constructor link the strong-pointer
onto a global linked list, and have the destructor remove it from the
linked list.  (If a deterministic way to find all and only the
strong-pointers on the stack could be found without giving them
destructors, that would probably lead to an acceptable solution.)
Then we start marking by walking this linked list.
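
A minimal sketch of the registration scheme just described, written in
present-day C++ rather than the 1989 dialect.  The names StrongPtr,
Heaper, and markRoots are assumptions made for this illustration and
are not taken from our actual code; only the root-marking step is
shown, not tracing through heap objects.

```cpp
#include <cassert>
#include <cstddef>

struct Heaper { bool marked = false; };   // hypothetical heap object

class StrongPtr {
public:
    // Constructor links this pointer onto the front of the global list.
    explicit StrongPtr(Heaper* target = nullptr)
        : target_(target), prev_(nullptr), next_(head_) {
        if (head_ != nullptr) head_->prev_ = this;
        head_ = this;
    }
    // A copy is a new root, so it registers itself as well.
    StrongPtr(const StrongPtr& other) : StrongPtr(other.target_) {}
    StrongPtr& operator=(const StrongPtr& other) {
        target_ = other.target_;          // list position is unchanged
        return *this;
    }
    // Destructor unlinks this pointer from the global list.
    ~StrongPtr() {
        if (prev_ != nullptr) prev_->next_ = next_; else head_ = next_;
        if (next_ != nullptr) next_->prev_ = prev_;
    }
    Heaper& operator*() const { assert(target_ != nullptr); return *target_; }
    Heaper* operator->() const { assert(target_ != nullptr); return target_; }

    // Start of marking: walk the list and mark every directly rooted
    // object.  Returns how many objects were newly marked.
    static std::size_t markRoots() {
        std::size_t newlyMarked = 0;
        for (StrongPtr* p = head_; p != nullptr; p = p->next_) {
            if (p->target_ != nullptr && !p->target_->marked) {
                p->target_->marked = true;
                ++newlyMarked;
            }
        }
        return newlyMarked;
    }

private:
    Heaper* target_;
    StrongPtr* prev_;
    StrongPtr* next_;
    static StrongPtr* head_;              // global list of live strong-pointers
};

StrongPtr* StrongPtr::head_ = nullptr;
```

Because construction and destruction follow scope, the list always
holds exactly the strong-pointers currently alive on the stack, which
is what lets the collector find its roots deterministically.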

(Note: for efficiency reasons, we also have Wimpy-pointers, but that
is another story and besides the current point.)

After we met with Peter Deutsch (a world-class garbage collector ;-))
and talked over our plans with him, he asked about compiler generated
temporaries for holding onto intermediate results.  After much
investigation we determined that, by declaring the return types of
pointer-returning routines as strong-pointers, the compiler generated
temporaries for holding onto these values as intermediate results
would also be strong-pointers (in fact, the very same ones).  These
temporaries seemed to get constructed and destructed properly, and in
an order that makes the world safe for garbage collection.

However, we ran into problems when we proceeded from simple test cases
to conversion of our actual code.  There are many places where
expressions that generate temporaries with destructors are not
allowed.  The most worrisome is in && and || expressions.

Virtually all our expressions are message sends or function calls.
Most of these return pointers to heap allocated objects.  In order to
generate temporaries when we need them, the return types are
strong-pointers.  So even when we don't actually need any temporaries,
virtually every expression that we write generates temporaries with
destructors.
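
To make the && case concrete, here is one way such an expression might
be taken apart by hand.  This is only a sketch under assumptions: SPtr,
Node, lookup, and bothReady are hypothetical names, SPtr is a bare
stand-in for a strong-pointer, and a current compiler would accept the
one-line form shown in the comment.

```cpp
#include <cassert>

struct Node { bool ready; };              // hypothetical heap object

// Hypothetical pointer-like class with a destructor, standing in for a
// strong-pointer.  A real one would unregister itself in the destructor.
class SPtr {
public:
    explicit SPtr(Node* n) : n_(n) {}
    ~SPtr() {}
    Node* operator->() const { assert(n_ != nullptr); return n_; }
private:
    Node* n_;
};

// Hypothetical pointer-returning routine declared to return a
// strong-pointer, so each call yields a temporary with a destructor.
inline SPtr lookup(Node* n) { return SPtr(n); }

// The form we would like to write, with a destructor-bearing temporary
// on each side of the &&:
//
//     return lookup(a)->ready && lookup(b)->ready;
//
// The same logic taken apart so that every temporary becomes a named
// local and the short-circuit becomes an explicit early return.
inline bool bothReady(Node* a, Node* b) {
    SPtr first = lookup(a);
    if (!first->ready) return false;      // right operand is never evaluated
    SPtr second = lookup(b);
    return second->ready;
}
```

The rewritten version keeps the short-circuit behaviour, at the cost of
the extra locals and control flow that make the code harder to read.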

There are many links in the chain that gets us into these blues.
Perhaps some of these links are weak.  We should examine these
critically.  

   At this point I just want
   to remind folks that development projects which have bet on ragged edge
   features of leading edge languages have, historically, been prone
   to disaster and slippages, sometimes measured in years (in fact,
   had Multics not run afoul of PL/I, we might not be using Unix today).

I share your fear, but I still believe that C++ was our best option.
However, your comparison to PL/I is much closer than I would have
guessed when I first learned the language.  

   If we are, indeed, constructing a development environment that forces
   unfamiliar tools upon our front end developers, 

We will be making many unfamiliar tools *available* to them.  I don't
think we are forcing very many on them.  You pointed out (in the 88.1
days) the need for a front-end library so developers wouldn't just be
left with the raw protocol spec (although you did *amazingly* well
starting from just that spec).  The tools that we are foisting on our
developers are optional in the sense that such a front-end library is.
Indeed, many of the tools exist either to create the library
(Stubble), or (among other things) to support it (e.g. X++ and garbage
collection).  Any developer that wants to deal directly with the
socket-level febe interface needs none of our tools (beyond the
backend :-)).  As the febe protocol contains less than a hundred
messages and less than a hundred data types altogether, this is not a
facetious suggestion.  Nothing in the backend assumes or depends on
front-ends going through our library (this is taken as a design
constraint).

However, my honest advice to any potential developer is to make full
use of our tools because they're good!  What we haven't yet examined
is what subsets of our tools are sensible, so that it's not all or
nothing.  I think this would be a good idea.  I've already been
talking over with BobP how simple the user's view of Stubble can be
made. 

   requires them to code
   in an obtuse and hard to debug manner, 

With the exception of taking apart && and || expressions, I don't see
that anything we're talking about makes things hard to debug.  Overall
in the project, there has been much high-quality attention paid to
making things easier to debug.  On the other hand, we still haven't
found a good symbolic debugger for C++ itself.  This may be what you
are referring to, and it is of grave importance.

   and does not provide easy
   integration with existing code implemented with industry-standard compilers,

Not true.  Well, it depends on your definition of "easy", but I think
we're doing pretty well on this score.  Via the 'extern "C"'
declaration, there is easy intercallability in both directions between
C and C++.  Also, C++ is mostly upward compatible with ANSI C (though
not as close for K&R v1).  This means that those who want to write
actual C code, those wishing to code in the C subset of C++ and those
that have a large body of existing C code are all well served.  In
addition, we have taken care in both the garbage collector and in X++
to be intercallable and integratable with foreign (including
pre-existing) C++ code.  This includes the ability, in one piece of
code, to deal with our garbage-collectable structures through our
strong-pointers and with other structures manually.  Roger was
faced with exactly this problem in dealing with existing window
systems, and we constrained our design appropriately.  (It turned out
not to involve any design choices.  It did take some thinking to
realize this though.)

   perhaps we should take a couple of steps back from the abyss and think about
   what we are doing.  In such a situation, should

	   *  Scrapping the garbage collection architecture,

Admittedly we should consider this.  Let's first see if we can alter
it so that it avoids the problem.  Also note that it is possible to
consider this wrt our developer package and still use it in the
backend.  Having the computational assumptions between backend and
frontend differ so much would itself be costly (in terms of
reusability of tools).  I do consider it plausible to scrap the garbage
collector as a mandatory part of the front-end library.  It would mean
that the library code we provide will have to be leak free (Heh
pointed this out).  Our library code could still be GC compatible.

	   *  Using C as the reference front-end implementation
	      language, with C++ as an option,

Regardless of any of the above problems, I think it is quite important
to support C as a development language.  This would be a lot of work
and it would be good to postpone it to a later stage of development.
But of course we can only postpone it if we find another way of
addressing the above concerns.  If we are even going to consider
backing off this far for the initial development support, I would
advocate C++ or X++ without garbage collection being considered first.
We've invested a lot of effort that would have to be thrown away if we
backed off to C (granted this sounds like the "sunk-cost theory of
value", but...) 

   and/or  *  Redefining the level of interface between the front-end
	      support code and the application to be more at arm's length
	      and hence language-independent (as is ADS, for example)

I think that we should also define such a level, and support those who
wish to work there.  A place to start would be to clean up the textual
syntax of the messages transmitted through the Stubble code, and
document that.  We all strongly suspect that we won't be releasing
product with the textual syntax being the major protocol.  To that end,
we've kept knowledge of the communication syntax restricted (to class
RPCTransceiver), and architected our communication support so that it
is easy and local to have an efficient binary syntax as well.  I've
always thought of the textual syntax as just a temporary debugging
measure to be removed before we ship, but there's little reason not to
have the backend support both.  Doing so would certainly make
developers who want to avoid our choice of language and/or tools
happier (witness the textual NeWS protocol).

Also, taking the message protocol of Transceiver (the abstract
superclass) and reproducing its functionality as a set of C callable
routines is a direct minimalist approach.  From what I understand
about ADS, it would be analogous.  (My understanding of ADS may be
bogus.)
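
A minimal sketch of what such a C callable layer might look like, using
an opaque handle and 'extern "C"' linkage.  Everything here is assumed
for illustration: the two-method Transceiver below is a stand-in, not
the real message protocol, and the xcv_* routine names are made up.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the abstract Transceiver superclass.
class Transceiver {
public:
    virtual ~Transceiver() {}
    virtual void sendText(const std::string& text) = 0;
    virtual int sentCount() const = 0;
};

// Trivial concrete subclass so the example has something to call.
class CountingTransceiver : public Transceiver {
public:
    CountingTransceiver() : count_(0) {}
    void sendText(const std::string&) override { ++count_; }
    int sentCount() const override { return count_; }
private:
    int count_;
};

extern "C" {
    // C code sees only this incomplete type, never the C++ class.
    typedef struct xcv_handle xcv_handle;

    xcv_handle* xcv_open(void) {
        Transceiver* t = new CountingTransceiver;
        return reinterpret_cast<xcv_handle*>(t);
    }

    void xcv_send(xcv_handle* h, const char* text) {
        assert(h != nullptr && text != nullptr);
        reinterpret_cast<Transceiver*>(h)->sendText(text);
    }

    int xcv_sent_count(const xcv_handle* h) {
        assert(h != nullptr);
        return reinterpret_cast<const Transceiver*>(h)->sentCount();
    }

    void xcv_close(xcv_handle* h) {
        delete reinterpret_cast<Transceiver*>(h);
    }
}
```

With this shape the caller manages the handle's lifetime explicitly
(open, then close), so a C front-end would need no garbage collector.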

   be off the agenda?

Indeed.  The exercise of responding to your messages has already made
me realize there are more degrees of freedom here than I thought.
Thank you.

   [from marcS again:]
   -----------------
   How clearly have we thought through the implications of what we are
   doing for (or to) our third parties?

Not clearly enough.

   Bobp, this is of course of most concern to you. Has there been an intense
   discussion yet of what a C-based frontend to all this stuff would look
   like? If not, I would propose that you grab hugh and markm and discuss
   it (I propose a 3-person meeting, since 3-person meetings seem so much
   more effective). 

Heh: I'm meeting with BobP starting at Noon on Wednesday (originally
for other purposes).  Heh & BobP: Should we redirect the meeting to
these ends?  Heh: Would you be free then?

   In general, are Walker and I confused? 

No.

   Are people really talking about
   taking an approach which rejects CFRONT implementations of C++ for
   the foreseeable future?

It was talked about.  I brought it up out of desperation.  I think we
have general agreement that it probably doesn't make sense.
Nevertheless, we should talk to Tiemann.  Nat and AT&T are the priority
though.

I've also got a meeting with Roland tomorrow on the incredibly weird
idea of a C++ to C++ converter that is mostly just a copy program, but
does such things as turning "for"s into "while"s.  We'll let everyone
know how plausible it all looks.  If it works, it would be a
completely portable C++ implementation that implements more of the
specs than AT&T's.  Weird.

Ravi, how *did* you eventually reach Nat?