
The Meaning of Nothing

Nils in Smalltalk

Overview

This is an old, frequently repeated topic, which I happened to participate in during 1997. There were many participants, so you need to look at the thread for the full (long) discussion.

The posting below is the main point in the thread that this short paper covers:

Original Posting: Nils in Smalltalk

I think Allen Wirfs-Brock's statement

> ... Nil's primary role in Smalltalk is to provide a value
> for uninitialized variables. You can think of it as an error marker.

provides the most important concept to this debate and possibly the end to it.

For simplicity, we only want one mechanism in a language to identify an unitialized variable. This could have been a special keyword (say '?' was reserved as asking the variable [not the object in the variable] are-you-initialized?) and then we would have no debate: Programmers don't add new keywords. So we would happily know that

    x? ifTrue: ["x is initialized"].

was the 100% correct way to check a variable.

But Smalltalk is a very syntactically simple language, and it uses a few core concepts (objects, messages, blocks, etc.) to provide the expressiveness that requires many keywords in other languages. One of these syntactic simplifications was to use an object ('nil') to indicate an unitialized variable. This means that the above keyword '?' IS replaced with testing for not-identical-to 'nil' and the above statement would be (ignoring the possible order reversal):

    (x ~~ nil) ifTrue: ["x is initialized"].

This must be an identity test: why would we care anything about the behavior of the object in an unitialized variable. That variable's object shouldn't even exist. We have already "distorted" the meaning of unitialized by testing the value instead of the variable. We should not go any further in treating the value as real.
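
A minimal workspace sketch of that identity test follows. The temporaries 'x' and 'status' are illustrative names only, and the snippet assumes the usual Smalltalk behavior that a declared but unassigned temporary refers to 'nil'.

```smalltalk
| x status |
"x is declared but never assigned, so it still refers to nil."
status := (x ~~ nil) ifTrue: ['initialized'] ifFalse: ['not initialized'].
"status is now 'not initialized'"

x := 42.
status := (x ~~ nil) ifTrue: ['initialized'] ifFalse: ['not initialized'].
"status is now 'initialized'"
```

The test asks nothing of the object itself; it only compares what the variable refers to against 'nil'.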

Although Smalltalk is one of the best designed languages around because of these types of simplifications/abstractions, they do seem to cause problems in certain crucial points. This particular choice was unfortunate because it allowed 'nil' to also be used for many other functions that have nothing to do with unitialized variables. In these other functions 'nil' is just one of many objects that could provide the service and this is where the #isNil message springs into existence.

Some of these other cases (springing to mind) are:

    (N1) As a special return value flag
    (N2) As a special parameter flag (a nonexistent parameter)
    (N3) As a special state indicator
    (N4) As a general proxy mechanism parent (send a message and it gets redirected)

ALL OF THESE could be done (and possibly should be done) with some object other than 'nil'. For 'N1' and 'N2' flags we can create our own distinguished objects. For 'N3' state indicators we can create our own distinguished objects or have messages (#areYouSpecial :-) which allow multiple objects (especially proxy objects) to be special. And for 'N4' we can use Object instead of UndefinedObject as our proxy root class (and curse the vendor or our coworkers for all the methods we have to override).
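
As a rough sketch of the 'N1' case, the workspace code below uses a freshly created object as the return-value flag. The name 'noneFound' and the sample collections are assumptions made for illustration; only #detect:ifNone: and the identity test come from standard Smalltalk.

```smalltalk
| noneFound odds mixed r1 r2 |
"noneFound is a hypothetical marker; any freshly created object is unique."
noneFound := Object new.

odds := Array with: 1 with: 3 with: 5.
r1 := odds detect: [:each | each even] ifNone: [noneFound].
"r1 == noneFound is true: nothing matched."

mixed := Array with: 1 with: nil with: 4.
r2 := mixed detect: [:each | each == nil] ifNone: [noneFound].
"r2 == nil is true and r2 == noneFound is false: a nil element really was found."
```

Had 'nil' been the flag, the second search could not tell "found a nil" apart from "found nothing".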

This would get rid of the overloading of 'nil' and prevent the confusions of when other objects should act like 'nil'. Like

    (O1) When the object is a proxy for a state that could be (N3)

Note that I am not pontificating (or at least I am part of the audience too). I have used 'nil' for all of N1..N4 and can't recall the last time I used '== nil' (or 'nil ==') instead of #isNil. I would say my uses were reasonable approaches given the tools, standard libraries, common idioms, coding speed, etc. etc. But most of them were still wrong and I should have thought more carefully.

Now can I really change my (and my teams') habits and, if so, how should I change? Certainly using the 'nil' identity test is the correct thing to do for an unitialized variable. It also has to be used for at least some 'N1' and 'N2' that involve the standard libraries, so it might as well be used throughout a system. 'N4' seems to be required in VW because of the heavy loading in Object.

I think 'N3' and 'O1' are the easiest ones to fix. I personally use a created distinguished object (or a special message) instead of 'nil' for special state distinguishing, but there are still other people who don't. Probably just protecting against further propagation is appropriate.

Note that 'N1' and 'N2' should also test for identity. There is no reason to allow someone to sneak in their own object that pretends to be your distinguishing value: you get to define it and if you choose it to be 'nil' then someone should pass you a 'nil'.


LOG TO SELF
So a standard approach for correcting your code might be: If you are sending #isNil or #notNil, either (1) you should be able to change it to '== nil' or '~~ nil', or (2) you shouldn't be using 'nil' for that purpose. For (1) you should then change to the correct idiom and for (2) you should create your own distinguished object or a new protocol and supporting objects. Of these, (1) should always work, but (2) might require more effort than you are willing to spend (especially if it is system wide), in which case leave it as it is, but know that it is a bit of a "hack" and find a better solution next time.
END LOG

--Mark
mark.fussell@chimu.com

Subsequent Discussions

Ivory Towers? and the variations of 'nil'

Frank Zdybel Jr. wrote:
> Once upon a time in an ivory tower far, far away, nil meant
> 'uninitialized variable'.  But in these modern times we know
> better:  nil means 'absent', a more widely useful abstraction.
> It is time to stop lamenting the rapture of nil from the bosom
> of her family; she has gone on to greater glory.

I think no one in this discussion is concerned with history as much as he/she is concerned with discovering the best approaches to developing software. The history of the work of brilliant people can be very useful, but today's knowledge and criteria always filter that brilliance. Considering how much of ST-80 lives today, we either have very poor filters or the radiation levels at PARC were extremely high back then.

Having a single object 'nil' or a single method #isNil represent the very general concept of "absent" is probably not a good idea. Absent could mean:

  1. Uninitialized (we have not yet reached the point we can initialize the variable, but the value will exist and must exist before we "read" the variable)
  2. Inapplicable (the value does not apply to this object/method/etc.)
  3. NotYetKnown (we should have some value, but we don't know it yet, OK to read/use)
  4. NotEntered (UI variation on the above)
  5. FunctionallyUncomputable ( transitively from a function involving (3) )
  6. OutOfDomainBounds
  7. NonFunctionalState
  8. NoneFound (distinguished return value from a lookup)
  9. ...and so on...

Worse, some of these meanings are applicable in the same context. You could have a NotYetKnown that will transition to an Inapplicable, or a return value that could either be a NoneFound or an actual "value" of NotEntered. Overloading a single object for all of these purposes for "absent" would require serious mind-reading among all the developers and would not be very good engineering.

So, again, I believe that it would have been best to have constrained the language mechanism to isolate (1) from all the others. Instead, a very general 'nil' is used for (1), so it would be best to avoid using it for the rest. This may not always be convenient or acceptable. You certainly have to consider consistency with other parts of the system (and Smalltalk has a big built-in system). Of the other meanings for "absent", probably (2) and (8) are the most common and least damaging.

Taking this a bit further, it would also be best to have a distinct standard, idiom, or pattern (meaning general to most languages) to handle each of the different meanings. A good example is the at:ifAbsent: methods that avoided the problem of a distinguishing return value by having the caller provide the behavior for a NoneFound (8) in an absent-block. Using different distinguishing values (preferably within the domain of the type itself: such as a NotEnteredPoint) would seem to be a minimal good-design approach.
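
The absent-block idiom can be sketched in a workspace as follows. The dictionary contents and variable names are made up for the example; #at:ifAbsent: is the standard Dictionary protocol mentioned above.

```smalltalk
| phoneBook known unknown |
phoneBook := Dictionary new.
phoneBook at: #alice put: '555-0100'.

known := phoneBook at: #alice ifAbsent: ['no entry'].
"known is '555-0100'"

unknown := phoneBook at: #bob ifAbsent: ['no entry'].
"unknown is 'no entry': the caller's block supplied the NoneFound behavior"
```

No distinguished return value is needed at all, because the caller decides what a failed lookup should produce.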

The topic of missing information is one of the hotbeds in information modeling (and specifically relational modeling). For a very different and generally more in depth discussion people may want to also read the works of Date, McGoveran, Codd, etc.

--Mark
mark.fussell@chimu.com

PS: Somehow I managed to misspell "uninitialized" about 6 times in my posting by consistently dropping off the 'ni', so here they are: ni ni ni ni ni ni ;-)

The term 'Uninitialized'

David N. Smith wrote:
>...
> Smalltalk has no such concept as 'uninitialized variable'.

True, in the sense of a variable that points to random memory or points to something that is not an 'object' in a pure OO language. Fortunately this type of 'uninitialized' variable has pretty much vanished from modern OO languages.

>...
> But, regardless, there are no uninitialized variables in Smalltalk, just
> variables initialized with objects one doesn't like.

So we could call them 'unwanted-initializations' but that is a bit unwieldy, and at the level of proper execution of our program, they are still 'uninitialized': their initial value is just as wrong for correct program execution as if they referred to garbage memory. The program just doesn't blow up as badly when it tries to use them (or worse, continues without noticing the error). [In the case where 'nil' is a valid value for a variable, then the variable would be 'auto-initialized', not uninitialized.]
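
Both outcomes can be seen in a short workspace sketch. It assumes a dialect with ANSI-style exception handling and a MessageNotUnderstood exception class, which may not match every Smalltalk of the period; the variable names are illustrative.

```smalltalk
| count label outcome |
"count is never assigned, so it refers to nil."

"Quiet failure: nil understands #printString, so nothing signals a problem."
label := count printString.
"label is 'nil'"

"Loud failure: nil does not understand #+, so an exception is raised."
outcome := [count + 1]
    on: MessageNotUnderstood
    do: [:ex | ex return: #notUnderstood].
"outcome is #notUnderstood"
```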

Considering that the 'garbage use' of the term 'uninitialized' is gone, I think it is natural to reappropriate it for this other meaning unless we have a better term. I will think about it, but I haven't found or come up with one yet ('incorrect', 'improper', 'nasty', 'not-yet-invariant-conformant', 'extra-variant' :-). Do you have a suggestion?

--Mark
mark.fussell@chimu.com

Nils and SQL NULLs

Kevin Szabo wrote:
> Brian Gridley  wrote:
> >The most damaging use of Nil that I have seen is in the Database
> >extensions to VW.  There nil is what is returned whenever a NULL value
> >is found, as well as when the value has not even been queried.
>
> Interesting.  A while back I had to roll my own INGRES-VW interface and
> I thought about NULL objects vs using nil to represent the NULL SQL
> entity.  I think I started out with a NULL object, but then it had no
> behaviour other than what nil did, so I went back to mapping NULLs from
> INGRES back to smalltalk nils. ... [snip]

Depends on whether you really need to model a true SQL NULL. NULL participates in a three valued boolean logic, so it has distinctly different behavior from a 'nil'. For receiving values from the database this does not matter much, but for anything that might touch the query engine it can be pretty important to track the difference between 'no-value' and 'unknown-value'.
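
To make the three valued logic concrete, here is a small sketch of a three valued AND written as a block. The helper name 'and3' and the use of the symbol #unknown to stand in for SQL NULL are assumptions for illustration, not part of any database framework.

```smalltalk
| and3 r1 r2 r3 |
"and3 is a hypothetical helper; the symbol #unknown stands in for SQL NULL."
and3 := [:a :b |
    (a == false or: [b == false])
        ifTrue: [false]
        ifFalse: [
            (a == #unknown or: [b == #unknown])
                ifTrue: [#unknown]
                ifFalse: [true]]].

r1 := and3 value: true value: true.        "true"
r2 := and3 value: false value: #unknown.   "false: one false operand decides it"
r3 := and3 value: true value: #unknown.    "#unknown: the answer depends on the missing value"
```

A plain 'nil' has no such behavior, which is why collapsing NULL into 'nil' loses information as soon as the value takes part in a condition.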

And, as Brian said, it can be important to separate a 'not-set' value from the database 'NULL' value. Some database write operations do not like 'NULL' values when you really want to 'not-set' that particular attribute (column) value. If everything maps to 'nil' then it is impossible to tell the difference.

Wide use of context-specific special objects would certainly help. My database frameworks use a NotSet object for the same reason Brian mentioned: The database took over 'nil' (or in my recent case 'null') as indicating a relational NULL, so I needed a new object. On the other hand, I probably would have done the same design even if NULL was represented as its own special object, and I certainly would if it (a NotSet object) was the "standard" approach. It makes the code much easier to understand and maintain if you see 'Unset-Column-Value's and 'SQL-NULL's instead of just 'nil's everywhere.
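
A rough sketch of the idea, using two plain marker objects rather than any real framework classes: 'notSet', 'sqlNull', and the sample row are all hypothetical names chosen for this example.

```smalltalk
| notSet sqlNull row toWrite |
"Two distinct hypothetical markers: one for 'leave this column alone', one for relational NULL."
notSet := Object new.
sqlNull := Object new.

row := OrderedCollection new.
row add: #name -> 'Ada'.
row add: #phone -> sqlNull.
row add: #email -> notSet.

"Only columns that were actually set take part in the write."
toWrite := row reject: [:assoc | assoc value == notSet].
"toWrite holds two associations: #name -> 'Ada' and #phone -> sqlNull"
```

With a single 'nil' for both markers, the write could not distinguish "store NULL in phone" from "do not touch email".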

--Mark
mark.fussell@chimu.com

Terminology: Unassigned, Unattached

Tim Rentsch wrote:
> > Considering that the 'garbage use' of the term 'uninitialized' is gone, I
> > think it is natural to reappropriate it for this other meaning unless we
> > have a better term.
>
> May I suggest 'unassigned' (or 'not yet assigned')?  For Smalltalk
> that works pretty well.

Quite a reasonable suggestion.

My first thought is assignment is more general than initialization (i.e. initialization is the first assignment of a variable), so unassigned and uninitialized are effectively synonymous. This could be viewed differently in that initialization is not officially an assignment (a kind of pre-assignment). So an initialized variable would still be unassigned. But one of my main points in discussing 'initialization' was to step up one level over garbage memory issues: Until an object or variable is placed in a valid state for its model's rules it is not initialized. I wasn't trying to say you had to formally assign a value to it (the default value of 'nil' could be perfectly fine for some variables) to make it initialized. 'Invalid' is good except it does not imply 'before being valid'. Maybe 'prevalid'???

But after reviewing the discussion and thinking a bit more about this I realized the term 'attached' from Eiffel is probably what I should have been using for the particular discussion. "A reference is a value which is either void or attached. If a reference is void, no further information is available on it. If it is attached, it gives access to an object; it is said to be attached to that object. The object will also be said to be attached to the reference" [Meyer 92].

'Attached' is probably the correct word because I was discussing Allen Wirfs-Brock's comment:

>> ... Nil's primary role in Smalltalk is to provide a value
>> for uninitialized variables. You can think of it as an error marker.

Replacing 'uninitialized' with one of 'preattached' or 'unattached' is probably closer in intent to the original than any other word we have come up with. In a sense Eiffel's approach is a slight variation on the above. Then my comments can be seen as a restatement and example of Meyer's definition of attachment.

Mark L. Fussell [edited replacing 'uninitialized' with 'unattached']
> For simplicity, we only want one mechanism in a language to identify an
> unattached variable.  This could have been a special keyword (say '?'
> was reserved as asking the variable [not the object in the variable]
> are-you-attached?) and then we would have no debate: Programmers
> don't add new keywords.  So we would happily know that
>     x? ifTrue: ["x is attached"].
> was the 100% correct way to check a variable.
>
> ...One of these syntactic simplifications was to use an object ('nil') to
> indicate an unattached variable....This must be an identity test: why
> would we care anything about the behavior of the object in an unattached
> variable.  That variable's object shouldn't even exist.

Note that Eiffel's 'Void' is an interesting variation on 'nil'. 'Void' is an Object but is incapable of understanding any messages. It is what 'nil' would be if you took out all the methods in 'UndefinedObject', removed the 'Object' inherited behavior, and disallowed extended behavior. Clearly this object 'Void' could only have identity because you can not interact with it at all (well, unless you like catching exceptions). Effectively 'Void' is a purer (less existent or more nauseating ;-) 'nil' and prevents the whole #isVoid discussion.

Comments on 'attached', 'unattached', 'preattached', 'invalid', 'prevalid', 'unassigned', 'uninitialized' ????

--Mark
mark.fussell@chimu.com

[Meyer 92] Bertrand Meyer. Eiffel, The Language. Prentice Hall...1992.

Eiffel 'Void' vs. Smalltalk 'nil'

Jsarkela wrote:
> [Mark L. Fussell wrote:]
> >Note that Eiffel's 'Void' is an interesting variation on 'nil'.  'Void'
> >is an Object but is incapable of understanding any messages.  It is what
> >'nil' would be if you took out all the methods in 'UndefinedObject',
> >removed the 'Object' inherited behavior, and disallowed extended
> >behavior.  Clearly this object 'Void' could only have identity because
> >you can not interact with it at all (well, unless you like catching
> >exceptions).  Effectively 'Void' is a purer (less existent or more
> >nauseating ;-) 'nil' and prevents the whole #isVoid discussion.
>
> This is almost correct. Actually, the object Void supports all methods
> in a system since it inherits from every class. Void is conceptually an
> instance of the class None. Thus the classes in an Eiffel system form a
> lattice with Any as the top and None as the bottom. The object, Void,
> raises a fuss upon feature application because none of its inherited
> features are exported to clients but for some special methods
> such as, isVoid. This singular undefined or void object can be
> managed by the compiler (Eiffel's standard implementation)
> or by the runtime (Smalltalk's standard implementation).

I did not want to bring in all the specifics of Eiffel in a Smalltalk oriented discussion, so I simplified 'support but does not export' to 'does not support/understand'. In the closest Smalltalk terms None implements all methods defined anywhere but the implementation is '^self doesNotUnderstand...'[1]. I feel this is pretty much equivalent to not implementing any methods.

> (Aside: Much confusion arises because inheritance in statically
> typed languages such as Eiffel means something quite different
> than inheritance in dynamically typed languages like Smalltalk.)

I don't think that is true. There is little difference in what inheritance means, it is the 'Typing' of variables that makes a difference[2]. Statically typed languages are pessimistic and constrain variables so they know a future message send will be successful. Smalltalk is optimistic and waits until the message send to verify its success. This allows greater flexibility and reusability but it hinders detection of defects. I discuss this a bit in
    http://www.chimu.com/publications/smallJava/

Although Eiffel has one of the nicest static-typing systems around, the feature you brought up (non-export of an inherited method) is also surprisingly 'optimistic' itself. You won't find out until you send a message, do a Void check, or do a system wide validation that the actual object really doesn't support the message/feature it claims it supports. Having to check whether a parameter that claims to be a Point really is a Point (because it might be Void or some other class subclassed from Point that did not export #x) is surprising for a Design by Contract language.

--Mark
mark.fussell@chimu.com

[1] Or more complex and dialect specific ~ '(thisContext.sender != self) ifTrue: [^self doesNotUnderstand...]'

[2] Although you are correct that languages which combine class inheritance and type conformity are different from Smalltalk where class inheritance and type conformity are separate.

[3] I personally feel Eiffel is one of the best OO languages to contrast with Smalltalk. They both are very pure in their concepts, so most of the differences stem from the Static/Pessimistic/Fully-specified vs. Dynamic/Optimistic/Flexible aspects of the languages.

Identity operators and Eiffel

Bob Jarvis wrote:
> Mark L. Fussell <mark.fussell@chimu.com> wrote in article <33D346C1.35E8@chimu.com>...
> > Note that Eiffel's 'Void' is an interesting variation on 'nil'.  'Void'
> > is an Object but is incapable of understanding any messages.  It is what
> > 'nil' would be if you took out all the methods in 'UndefinedObject',
> > removed the 'Object' inherited behavior, and disallowed extended
> > behavior.  Clearly this object 'Void' could only have identity because
> > you can not interact with it at all (well, unless you like catching
> > exceptions).  Effectively 'Void' is a purer (less existent or more
> > nauseating ;-) 'nil' and prevents the whole #isVoid discussion.
>
> Interesting.  So, if you have an object which understands no messages
> what good is it?  I mean, you can't *do* anything to/with it.  For example,
> how do you find out what it is?  You can't say
>
>         aVar == Void
>
> because if aVar actually is set to Void the message send will bomb because
> your Void object doesn't understand any methods, including #==.  (Of course,
> it also doesn't understand #doesNotUnderstand: so this might cause yet more
> difficulties :-).  I'm curious about how Eiffel gets around this.

Eiffel's identity operators[1][2] do not have the semantics of a message send. This actually makes sense because there is no way either 'a' or 'b' could answer the question "are you the same exact object as this other object" any better than the "machine" itself. There are no attributes that determine whether the objects are identical so they can't chat with each other to find out[3]. You simply have to know whether they are or not (and, in a sense, looking from the outside), so why pretend it is a message send any more than assignment is. [4]

The only advantage a real method '==' provides is the ability to override the default machine behavior, which is unlikely to be what anyone wants [well, some of us do for a very short flaming period of time]. Most Smalltalks do not send '==' as a message either and overriding it does nothing.

But anyway, Eiffel specifically says:

"Equality expressions cover both equality and inequality tests, using the symbols = and /=. Although they are syntactically similar to operator expression, with = and /= being used in infix form, it is preferable to treat them separately because here the semantics is not that of a call"
"If both e and f are of reference types, the expression [e = f] denotes reference equality. In other words, it returns true if and only if e and f are either both void or attached to the same object."

[Bertrand Meyer: Eiffel, The Language]

There are also specific remarks about the variations with expanded types and Void, but they lead to the same solution.

--Mark
mark.fussell@chimu.com

[1] In Eiffel they are called equality operators because they do either identity or equality tests depending on context.

[2] The one thing about Eiffel is that the name for EVERYTHING is different (OK, Class is the same), so it makes writing about it in a non-Eiffel forum very disturbing: do I use the Eiffel term 'call','feature',... or the Smalltalk term 'message','method',... Here I continue to use Smalltalk terms as much as possible, which may disturb any Eiffel readers. My apologies.

[3]

  a: Hey, we have the same identity hash!  We must be the same!
  b: Yea, and so does that Smalltalk-80 book sitting on my desk. I don't think I'm it.  But
  a: maybe this is
  b: a horrible mista... [Scene ends when Mark closes book]

[4] Actually, if variables were first-class objects and assignment was a message send... think of how this could kill the accessor method debate... Inter-thread warfare by changing the language.

Identity operators-2

David N. Smith wrote:
> Mark L. Fussell writes:
> >The only advantage a real method '==' provides is the ability to
> >override the default machine behavior, which is unlikely to be what
> >anyone wants [well, some of us do for a very short flaming period of
> >time].  Most Smalltalks do not send '==' as a message either and
> >overriding it does nothing.
>
> While overriding it does nothing, #== can be #perform:ed in Smalltalk.
>
>    a perform: #== with: b
>
> and there really is a #== method:
>
>    == anObject
>       ^ self == anObject
>
> There is thus a subtle difference between an optimized message and a
> language feature (like assignment).

But isn't this more confusing than saying '==' is a language feature? We now have made

   a == b

different from

   a perform: #== with: b

The second is a real message send and the first one is not -- even though the first one is the standard form of a message send. We have now made message sending in general more complicated because it has exceptions [1].

Similarly complicated is that the method

   == anObject

is only called for the 'perform:' variation (or a 'get-the-method-object' itself message send) which must be understood to see why:

   == anObject
      ^self == anObject

is not an infinite recursion.

If '==' was specifically stated to be a language feature we could also define a real method:

   isIdenticalTo: anObject
       ^self == anObject

and provide the same 'perform' capability as well as normal message sending and overriding [2]. Whether to use '==' or 'isIdenticalTo:' would be the same type of choice as between 'basicNew' and 'new'. Much of the lower level code would still use '==' because that is what they really care about "are the two objects the same to this VM?". The higher level code would use isIdenticalTo: to allow for proxies and other higher level concepts of identity.

Note that I don't really think pretending '==' is a message and having it be half-of-a-message is a significant problem: It is only a bit confusing to programmers when they see (or have to guess) what is under the covers. This is in exchange for the conceptual simplicity of everything being a message. The only reason I brought it up was that Eiffel chooses to not make the pretense which allows it to have a Void/nil object which can not respond to any message but can still participate in an identity comparison.

--Mark
mark.fussell@chimu.com

[1] Although you could argue we already had to make message sending a bit more complicated to support optimization of 'if' conditionals and other constructs.

[2] Potentially for Proxy objects, although this usually requires a double-dispatch, so both methods have to be replaced.

 