The Meaning of Nothing
Nils in Smalltalk
Overview
This is an old, frequently repeated topic, which I happened to participate in during 1997.
There were many participants, so you need to look at the thread for the full (long)
discussion.
The following is the main posting in the thread, and the one this short paper is built on:
Original Posting: Nils in Smalltalk
I think Allen Wirfs-Brock's statement
> ... Nil's primary role in Smalltalk is to provide a value
> for uninitialized variables. You can think of it as an error marker.
provides the most important concept to this debate and possibly the end
to it.
For simplicity, we only want one mechanism in a language to identify an
unitialized variable. This could have been a special keyword (say '?'
was reserved as asking the variable [not the object in the variable]
are-you-initialized?) and then we would have no debate: Programmers
don't add new keywords. So we would happily know that
x? ifTrue: ["x is initialized"].
was the 100% correct way to check a variable.
But Smalltalk is a very syntactically simple language, and it uses a few
core concepts (objects, messages, blocks, etc.) to provide the
expressiveness that requires many keywords in other languages. One of
these syntactic simplifications was to use an object ('nil') to indicate
an unitialized variable. This means that the above keyword '?' IS
replaced with testing for not-identical-to 'nil' and the above statement
would be (ignoring the possible order reversal):
(x ~~ nil) ifTrue: ["x is initialized"].
This must be an identity test: why would we care anything about the
behavior of the object in an unitialized variable. That variable's
object shouldn't even exist. We have already "distorted" the meaning of
unitialized by testing the value instead of the variable. We should not
go any further in treating the value as real.
Although Smalltalk is one of the best designed languages around because
of these types of simplifications/abstractions, they do seem to cause
problems in certain crucial points. This particular choice was
unfortunate because it allowed 'nil' to also be used for many other
functions that have nothing to do with unitialized variables. In these
other functions 'nil' is just one of many objects that could provide the
service and this is where the #isNil message springs into existence.
Some of these other cases (springing to mind) are:
(N1) As a special return value flag
(N2) As a special parameter flag (a non-existent parameter)
(N3) As a special state indicator
(N4) As a general proxy mechanism parent (send a message and it gets redirected)
ALL OF THESE could be done (and possibly should be done) with some
object other than 'nil'. For 'N1' and 'N2' flags we can create our own
distinguished objects. For 'N3' state indicators we can create our own
distinguished objects or have messages (#areYouSpecial :-) which allow
multiple objects (especially proxy objects) to be special. And we can
use Object instead of UndefinedObject as our proxy root class (and curse
the vendor or our coworkers for all the methods we have to override).
This would get rid of the overloading of 'nil' and prevent the
confusions of when other objects should act like 'nil'. Like
(O1) When the object is a proxy for a state that could be (N3)
Note that I am not pontificating (or at least I am part of the audience
too). I have used 'nil' for all of N1..N4 and can't recall the last
time I used '== nil' (or 'nil ==') instead of #isNil. I would say my
uses were reasonable approaches given the tools, standard libraries,
common idioms, coding speed, etc. etc. But most of them were still
wrong and I should have thought more carefully.
Now can I really change my (and my teams') habits and, if so, how should
I change? Certainly using the 'nil' identity test is the correct thing
to do for an unitialized variable. It also has to be used for at least
some 'N1' and 'N2' that involve the standard libraries, so it might as
well be used throughout a system. 'N4' seems to be required in VW
because of the heavy loading in Object.
I think 'N3' and 'O1' are the easiest ones to fix. I personally use a
created distinguished object (or a special message) instead of 'nil' for
special state distinguishing, but there are still other people who
don't. Probably just protecting against further propagation is
appropriate.
Note that 'N1' and 'N2' should also test for identity. There is no
reason to allow someone to sneak in their own object that pretends to be
your distinguishing value: you get to define it and if you choose it to
be 'nil' then someone should pass you a 'nil'.
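As a minimal sketch of the 'N2' case (the same shape works for 'N1'), here
is a workspace snippet that uses a distinguished object instead of 'nil' as
the "no parameter" flag and tests it by identity. The names 'notSupplied'
and 'describe' are made up for this illustration and are not part of any
standard library.

```smalltalk
| notSupplied describe |
"notSupplied is a hypothetical distinguished object owned by whoever defines the protocol."
notSupplied := Object new.

"Identity test only: no other object can pretend to be notSupplied."
describe := [:argument |
    argument == notSupplied
        ifTrue: ['no argument supplied']
        ifFalse: ['argument: ', argument printString]].

Transcript show: (describe value: notSupplied); cr.
Transcript show: (describe value: 42); cr.
Transcript show: (describe value: nil); cr.
```

Because the flag is its own object, 'nil' stays free to be passed as an
ordinary value: the last line reports 'argument: nil' instead of being
mistaken for a missing parameter.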
LOG TO SELF
So a standard approach for correcting your code might be: If you are
sending #isNil or #notNil, either (1) you should be able to change it to
'== nil' or '~~ nil', or (2) you shouldn't be using 'nil' for that purpose.
For (1) you should then change to the correct idiom and for (2) you
should create your own distinguished object or a new protocol and
supporting objects. Of these (1) should always work, but (2) might
require more effort than you are willing to do (especially if it is
system wide) in which case, leave it as it is, but know that it is a bit
of a "hack" and find a better solution next time.
END LOG
--Mark
mark.fussell@chimu.com
Subsequent Discussions
Ivory Towers? and the variations of 'nil'
Frank Zdybel Jr. wrote:
> Once upon a time in an ivory tower far, far away, nil meant
> 'uninitialized variable'. But in these modern times we know
> better: nil means 'absent', a more widely useful abstraction.
> It is time to stop lamenting the rapture of nil from the bosom
> of her family; she has gone on to greater glory.
I think no one in this discussion is concerned with history as much as
he/she is concerned with discovering the best approaches to developing
software. The history of the work of brilliant people can be very
useful, but today's knowledge and criteria always filter that
brilliance. Considering how much of ST-80 lives today, we either have
very poor filters or the radiation levels at PARC were extremely high
back then.
Having a single object 'nil' or a single method #isNil represent the
very general concept of "absent" is probably not a good idea. Absent
could mean:
(1) Uninitialized (we have not yet reached the point we can
initialize the variable, but the value will exist and must exist before
we "read" the variable)
(2) Inapplicable (the value does not apply to this
object/method/etc.)
(3) NotYetKnown (we should have some value, but we don't know it
yet, OK to read/use)
(4) NotEntered (UI variation on the above)
(5) FunctionallyUncomputable (transitively from a function
involving (3))
(6) OutOfDomainBounds
(7) NonFunctionalState
(8) NoneFound (distinguished return value from a lookup)
...and so on...
Worse, some of these meanings are applicable in the same context. You
could have a NotYetKnown that will transition to an Inapplicable, or a
return value that could either be a NoneFound or an actual "value" of
NotEntered. Overloading a single object for all of these purposes for
"absent" would require serious mind-reading among all the developers and
would not be very good engineering.
So, again, I believe that it would have been best to have constrained
the language mechanism to isolate (1) from all the others. Instead, a
very general 'nil' is used for (1), so it would be best to avoid using
it for the rest. This may not always be convenient or acceptable. You
certainly have to consider consistency with other parts of the system
(and Smalltalk has a big built-in system). Of the other meanings for
"absent", probably (2) and (8) are the most common and least damaging.
Taking this a bit further, it would also be best to have a distinct
standard, idiom, or pattern (meaning general to most languages) to
handle each of the different meanings. A good example is the
at:ifAbsent: methods that avoided the problem of a distinguishing return
value by having the caller provide the behavior for a NoneFound (8) in
an absent-block. Using different distinguishing values (preferably
within the domain of the type itself: such as a NotEnteredPoint) would
seem to be a minimal good-design approach.
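The at:ifAbsent: idiom mentioned above can be sketched in a workspace as
follows. The dictionary contents and the name 'report' are invented for the
example; only Dictionary and at:ifAbsent: come from the standard library.

```smalltalk
| phoneBook report |
phoneBook := Dictionary new.
phoneBook at: 'Ada' put: '555-0100'.

"The caller supplies the NoneFound behavior in the absent-block,
 so no distinguished return value has to be agreed on."
report := [:name |
    phoneBook
        at: name
        ifAbsent: ['(no entry for ', name, ')']].

Transcript show: (report value: 'Ada'); cr.
Transcript show: (report value: 'Grace'); cr.
```

The lookup for 'Grace' never produces a 'nil' that the caller has to
interpret; the caller decides on the spot what "none found" means.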
The topic of missing information is one of the hotbeds in information
modeling (and specifically relational modeling). For a very different
and generally more in-depth discussion, people may also want to read the
works of Date, McGoveran, Codd, etc.
--Mark
mark.fussell@chimu.com
PS: Somehow I managed to misspell "uninitialized" about 6 times in my
posting by consistently dropping off the 'ni', so here they are: ni ni
ni ni ni ni ;-)
The term 'Uninitialized'
David N. Smith wrote:
>...
> Smalltalk has no such concept as 'uninitialized variable'.
True, in the sense of a variable that points to random memory or points
to something that is not an 'object' in a pure OO language. Fortunately
this type of 'uninitialized' variable has pretty much vanished from modern
OO languages.
>...
> But, regardless, there are no uninitialized variables in Smalltalk, just
> variables initialized with objects one doesn't like.
So we could call them 'unwanted-initializations' but that is a bit
unwieldy, and at the level of proper execution of our program, they are
still 'uninitialized': their initial value is just as wrong for correct
program execution as if they referred to garbage memory. The program
just doesn't blow up as badly when it tries to use them (or worse,
continues without noticing the error). [In the case where 'nil' is a
valid value for a variable, then the variable would be
'auto-initialized', not uninitialized.]
Considering that the 'garbage use' of the term 'uninitialized' is gone, I
think it is natural to reappropriate it for this other meaning unless we
have a better term. I will think about it, but I haven't found or come
up with one yet ('incorrect', 'improper', 'nasty',
'not-yet-invariant-conformant', 'extra-variant' :-). Do you have a
suggestion?
--Mark
mark.fussell@chimu.com
Nils and SQL NULLs
Kevin Szabo wrote:
> Brian Gridley wrote:
> >The most damaging use of Nil that I have seen is in the Database
> >extensions to VW. There nil is what is returned whenever a NULL value
> >is found, as well as when the value has not even been queried.
>
> Interesting. A while back I had to roll my own INGRES-VW interface and
> I thought about NULL objects vs using nil to represent the NULL SQL
> entity. I think I started out with a NULL object, but then it had no
> behaviour other than what nil did, so I went back to mapping NULLs from
> INGRES back to smalltalk nils. ... [snip]
Depends on whether you really need to model a true SQL NULL. NULL
participates in a three-valued Boolean logic, so it has distinctly
different behavior from a 'nil'. For receiving values from the database
this does not matter much, but for anything that might touch the query
engine it can be pretty important to track the difference between
'no-value' and 'unknown-value'.
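To make the three-valued point concrete, here is a rough workspace sketch
of SQL-style AND. It assumes a marker object 'unknown' standing in for the
NULL/UNKNOWN truth value; 'unknown', 'and3' and 'show' are hypothetical
names for this illustration, not part of any database framework.

```smalltalk
| unknown and3 show |
"unknown is a hypothetical marker for SQL's UNKNOWN truth value."
unknown := Object new.

"Three-valued AND: false dominates, then unknown, otherwise true."
and3 := [:p :q |
    (p == false or: [q == false])
        ifTrue: [false]
        ifFalse: [
            (p == unknown or: [q == unknown])
                ifTrue: [unknown]
                ifFalse: [true]]].

"Print a truth value, naming the marker explicitly."
show := [:truth |
    Transcript
        show: (truth == unknown ifTrue: ['unknown'] ifFalse: [truth printString]);
        cr].

show value: (and3 value: true value: true).
show value: (and3 value: true value: unknown).
show value: (and3 value: false value: unknown).
```

The three lines print true, unknown and false respectively. A plain 'nil'
carries no such rules, which is why collapsing NULL into 'nil' loses
information as soon as a comparison has to be reasoned about on the
Smalltalk side.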
And, as Brian said, it can be important to separate a 'not-set' value
from the database 'NULL' value. Some database write operations do not
like 'NULL' values when you really want to 'not-set' that particular
attribute (column) value. If everything maps to 'nil' then it is
impossible to tell the difference.
Wide use of context-specific special objects would certainly help. My
database frameworks use a NotSet object for the same reason Brian
mentioned: The database took over 'nil' (or in my recent case 'null') as
indicating a relational NULL, so I needed a new object. On the other
hand, I probably would have done the same design even if NULL was
represented as its own special object, and I certainly would if it (a
NotSet object) was the "standard" approach. It makes the code much
easier to understand and maintain if you see 'Unset-Column-Value's and
'SQL-NULL's instead of just 'nil's everywhere.
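A minimal sketch of the idea, assuming two marker objects named 'notSet'
and 'sqlNull' (both hypothetical; a real framework would likely give them
their own classes and printing). The row and column names are invented.

```smalltalk
| notSet sqlNull row columnsToWrite |
"Two distinct markers, so 'leave alone' and 'write NULL' cannot be confused."
notSet := Object new.
sqlNull := Object new.

row := Dictionary new.
row at: #name put: 'Ada'.
row at: #nickname put: sqlNull.    "explicitly write NULL to this column"
row at: #email put: notSet.        "leave this column out of the write"

"Collect only the columns that should appear in the write statement."
columnsToWrite := OrderedCollection new.
#(#name #nickname #email) do: [:column |
    (row at: column) == notSet
        ifFalse: [columnsToWrite add: column]].

columnsToWrite do: [:column |
    Transcript show: column printString; cr].
```

Only name and nickname are listed. Had both markers been mapped to 'nil',
the loop could not tell the email column (skip it) from the nickname
column (write NULL).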
--Mark
mark.fussell@chimu.com
Terminology: Unassigned, Unattached
Tim Rentsch wrote:
> > Considering that the 'garbage use' of the term 'uninitialized' is gone, I
> > think it is natural to reappropriate it for this other meaning unless we
> > have a better term.
>
> May I suggest 'unassigned' (or 'not yet assigned')? For Smalltalk
> that works pretty well.
Quite a reasonable suggestion.
My first thought is assignment is more general than initialization (i.e.
initialization is the first assignment of a variable), so unassigned and
uninitialized are effectively synonymous. This could be viewed
differently in that initialization is not officially an assignment (a
kind of pre-assignment). So an initialized variable would still be
unassigned. But one of my main points in discussing 'initialization'
was to step up one level over garbage memory issues: Until an object or
variable is placed in a valid state for its model's rules it is not
initialized. I wasn't trying to say you had to formally assign a value
to it (the default value of 'nil' could be perfectly fine for some
variables) to make it initialized. 'Invalid' is good except it does not
imply 'before being valid'. Maybe 'prevalid'???
But after reviewing the discussion and thinking a bit more about this I
realized the term 'attached' from Eiffel is probably what I should have
been using for the particular discussion. "A reference is a value which
is either void or attached. If a reference is void, no further
information is available on it. If it is attached, it gives access to
an object; it is said to be attached to that object. The object will
also be said to be attached to the reference" [Meyer 92].
'Attached' is probably the correct word because I was discussing
Allen Wirfs-Brock's comment:
>> ... Nil's primary role in Smalltalk is to provide a value
>> for uninitialized variables. You can think of it as an error marker.
Replacing 'uninitialized' with one of 'preattached' or 'unattached' is
probably closer in intent to the original than any other word we have
come up with. In a sense Eiffel's approach is a slight variation on the
above. Then my comments can be seen as a restatement and example of
Meyer's definition of attachment.
Mark L. Fussell [edited replacing 'uninitialized' with 'unattached']
> For simplicity, we only want one mechanism in a language to identify an
> unattached variable. This could have been a special keyword (say '?'
> was reserved as asking the variable [not the object in the variable]
> are-you-attached?) and then we would have no debate: Programmers
> don't add new keywords. So we would happily know that
> x? ifTrue: ["x is attached"].
> was the 100% correct way to check a variable.
>
> ...One of these syntactic simplifications was to use an object ('nil') to
> indicate an unattached variable....This must be an identity test: why
> would we care anything about the behavior of the object in an unattached
> variable. That variable's object shouldn't even exist.
Note that Eiffel's 'Void' is an interesting variation on 'nil'. 'Void'
is an Object but is incapable of understanding any messages. It is what
'nil' would be if you took out all the methods in 'UndefinedObject',
removed the 'Object' inherited behavior, and disallowed extended
behavior. Clearly this object 'Void' could only have identity because
you can not interact with it at all (well, unless you like catching
exceptions). Effectively 'Void' is a purer (less existent or more
nauseating ;-) 'nil' and prevents the whole #isVoid discussion.
Comments on 'attached', 'unattached', 'preattached', 'invalid',
'prevalid', 'unassigned', 'uninitialized' ????
--Mark
mark.fussell@chimu.com
Eiffel 'Void' vs. Smalltalk 'nil'
Jsarkela wrote:
> [Mark L. Fussell wrote:]
> >Note that Eiffel's 'Void' is an interesting variation on 'nil'. 'Void'
> >is an Object but is incapable of understanding any messages. It is what
> >'nil' would be if you took out all the methods in 'UndefinedObject',
> >removed the 'Object' inherited behavior, and disallowed extended
> >behavior. Clearly this object 'Void' could only have identity because
> >you can not interact with it at all (well, unless you like catching
> >exceptions). Effectively 'Void' is a purer (less existent or more
> >nauseating ;-) 'nil' and prevents the whole #isVoid discussion.
>
> This is almost correct. Actually, the object Void supports all methods
> in a system since it inherits from every class. Void is conceptually an
> instance of the class None. Thus the classes in an Eiffel system form a
> lattice with Any as the top and None as the bottom. The object, Void,
> raises a fuss upon feature application because none of its inherited
> features are exported to clients but for some special methods
> such as, isVoid. This singular undefined or void object can be
> managed by the compiler (Eiffel's standard implementation)
> or by the runtime (Smalltalk's standard implementation).
I did not want to bring in all the specifics of Eiffel in a Smalltalk
oriented discussion, so I simplified 'support but does not export' to
'does not support/understand'. In the closest Smalltalk terms, None
implements all methods defined anywhere, but the implementation is '^self
doesNotUnderstand...'[1]. I feel this is pretty much equivalent to not
implementing any methods.
> (Aside: Much confusion arises because inheritance in statically
> typed languages such as Eiffel means something quite different
> than inheritance in dynamically typed languages like Smalltalk.)
I don't think that is true. There is little difference in what
inheritance means; it is the 'Typing' of variables that makes a
difference[2]. Statically typed languages are pessimistic and constrain
variables so they know a future message send will be successful.
Smalltalk is optimistic and waits until the message send to verify its
success. This allows greater flexibility and reusability but it hinders
detection of defects. I discuss this a bit in
http://www.chimu.com/publications/smallJava/
Although Eiffel has one of the nicest static-typing systems around, the
feature you brought up (non-export of an inherited method) is also
surprisingly 'optimistic' itself. You won't find out until you send a
message, do a Void check, or do a system wide validation that the actual
object really doesn't support the message/feature it claims it
supports. Having to check whether a parameter that claims to be a Point
really is a Point (because it might be Void or some other class
subclassed from Point that did not export #x) is surprising for a Design
by Contract language.
--Mark
mark.fussell@chimu.com
Identity operators and Eiffel
Bob Jarvis wrote:
> Mark L. Fussell <mark.fussell@chimu.com> wrote in article <33D346C1.35E8@chimu.com>...
> > Note that Eiffel's 'Void' is an interesting variation on 'nil'. 'Void'
> > is an Object but is incapable of understanding any messages. It is what
> > 'nil' would be if you took out all the methods in 'UndefinedObject',
> > removed the 'Object' inherited behavior, and disallowed extended
> > behavior. Clearly this object 'Void' could only have identity because
> > you can not interact with it at all (well, unless you like catching
> > exceptions). Effectively 'Void' is a purer (less existent or more
> > nauseating ;-) 'nil' and prevents the whole #isVoid discussion.
>
> Interesting. So, if you have an object which understands no messages
> what good is it? I mean, you can't *do* anything to/with it. For example,
> how do you find out what it is? You can't say
>
> aVar == Void
>
> because if aVar actually is set to Void the message send will bomb because
> your Void object doesn't understand any methods, including #==. (Of course,
> it also doesn't understand #doesNotUnderstand: so this might cause yet more
> difficulties :-). I'm curious about how Eiffel gets around this.
Eiffel's identity operators[1][2] do not have the semantics of a message
send. This actually makes sense because there is no way either 'a' or
'b' could answer the question "are you the same exact object as this
other object" any better than the "machine" itself. There are no
attributes that determine whether the objects are identical so they
can't chat with each other to find out[3]. You simply have to know
whether they are or not (and, in a sense, looking from the outside), so
why pretend it is a message send any more than assignment is? [4]
The only advantage a real method '==' provides is the ability to
override the default machine behavior, which is unlikely to be what
anyone wants [well, some of us do for a very short flaming period of
time]. Most Smalltalks do not send '==' as a message either and
overriding it does nothing.
But anyway, Eiffel specifically says:
"Equality expressions cover both equality and inequality tests, using
the symbols = and /=. Although they are syntactically similar to
operator expression, with = and /= being used in infix form, it is
preferable to treat them separately because here the semantics is not
that of a call"
"If both e and f are of reference types, the expression [e = f] denotes
reference equality. In other words, it returns true if and only if e
and f are either both void or attached to the same object."
[Bertrand Meyer: Eiffel, The Language]
There are also specific remarks about the variations with expanded types
and Void, but they lead to the same solution.
--Mark
mark.fussell@chimu.com
a: Hey, we have the same identity hash! We must be the same!
b: Yea, and so does that Smalltalk-80 book sitting on my desk. I don't
think I'm it. But
a: maybe this is
b: a horrible mista... [Scene ends when Mark closes book]
Identity operators-2
David N. Smith wrote:
> Mark L. Fussell writes:
> >The only advantage a real method '==' provides is the ability to
> >override the default machine behavior, which is unlikely to be what
> >anyone wants [well, some of us do for a very short flaming period of
> >time]. Most Smalltalks do not send '==' as a message either and
> >overriding it does nothing.
>
> While overriding it does nothing, #== can be #perform:ed in Smalltalk.
>
> a perform: #== with: b
>
> and there really is a #== method:
>
> == anObject
> ^ self == anObject
>
> There is thus a subtle difference between an optimized message and a
> language feature (like assignment).
But isn't this more confusing than saying '==' is a language feature?
We now have made
a == b
different from
a perform: #== with: b
The second is a real message send and the first one is not -- even
though the first one is the standard form of a message send. We have
now made message sending in general more complicated because it has
exceptions [1].
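For reference, the two forms can be tried side by side in a workspace.
This sketch assumes a dialect where #== can be performed, as David
describes; the variable names are arbitrary.

```smalltalk
| a b |
a := Object new.
b := a.

"Usually compiled inline: no method lookup takes place."
Transcript show: (a == b) printString; cr.

"A real message send that ends up in the #== method."
Transcript show: (a perform: #== with: b) printString; cr.
```

Both lines print true, so the difference is invisible in the result and
only shows up when someone tries to override #== and finds that the first
form ignores the override.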
Similarly complicated is that the method
== anObject
is only called for the 'perform:' variation (or a
'get-the-method-object' itself message send) which must be understood to
see why:
== anObject
^self == anObject
is not an infinite recursion.
If '==' was specifically stated to be a language feature we could also
define a real method:
isIdenticalTo: anObject
^self == anObject
and provide the same 'perform' capability as well as normal message
sending and overriding [2]. Whether to use '==' or 'isIdenticalTo:'
would be the same type of choice as between 'basicNew' and 'new'. Much
of the lower level code would still use '==' because that is what it
really cares about: "are the two objects the same to this VM?". The
higher level code would use isIdenticalTo: to allow for proxies and
other higher level concepts of identity.
Note that I don't really think pretending '==' is a message and having
it be half-of-a-message is a significant problem: It is only a bit
confusing to programmers when they see (or have to guess) what is under
the covers. This is in exchange for the conceptual simplicity of
everything being a message. The only reason I brought it up was that
Eiffel chooses to not make the pretense which allows it to have a
Void/nil object which can not respond to any message but can still
participate in an identity comparison.
--Mark
mark.fussell@chimu.com