ChiMu  
 

Heterogeneous collections and typechecks
in Java and Smalltalk

Overview

This is a discussion of the different options for dealing with heterogeneous collections where you care about certain kinds of objects that are in the collection (but are not type-guaranteed to be in the collection). It describes some of the differences between Smalltalk's type system (optimistic, highly augmentable) and Java's type system (pessimistic, type-annotated, non-augmentable).

This short paper responds to the following discussion thread:

Original Posting: Heterogeneous collections in Java and Smalltalk

I think people may be talking past each other on this topic, so I will restate what I think the context is and try to give examples for the two languages. Note that the following tends to use Java-like notation (e.g. '.' and '(' instead of ':') simply because Java inherently supports type annotations and is a little more widely known.

Context

We have a heterogeneous collection "guaranteed" to have People in it, but only some will be Programmers; the rest might be any subtype of Person. We want to see how fast each of the Programmers in this collection is, so we will ask each of them to 'programSpecification(...)' and time how long it takes to produce a result [yes, unfortunately, it is a waterfall process :-].

So say we have an array of people:

   Person[] people = //...some value

We can loop through an array of Persons in Java with:

   for (int i=0; i<people.length; i++) {
      Person eachPerson = people[i];
      //<<DoSomething>>
   }

Inside the <<DoSomething>> we need to separate Persons who are Programmers from those who are not. If we do a typecheck-cast we could use the following approach:

      if (eachPerson instanceof Programmer) {
          Programmer eachPersonAsProgrammer = (Programmer) eachPerson;
          startTime = getCurrentTime();
          program = eachPersonAsProgrammer.programSpecification(aSpec);
          endTime   = getCurrentTime();
          if (!aSpec.isSatisfiedBy(program)) {
               //...maybe give an 'infinite' time...
          }
      } else {
          //Ignore the person
      }

I hope this represents what Adolph Mendicant was thinking of. The above could skip the 'instanceof' check and instead catch the ClassCastException thrown when the typecheck-cast '(Programmer) eachPerson' fails, ignoring the person in the handler. The results would be identical, but the code is a little more obscure that way.
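A self-contained sketch of idiom (1) follows, together with the cast-and-catch variant just mentioned. It rests on several assumptions. Person, Programmer, Spec and Program are hypothetical stand-ins, and their bodies are not from the original. System.nanoTime() replaces the article's getCurrentTime(). Long.MAX_VALUE stands in for an 'infinite' time. One nuance applies to the "identical results" claim: it holds for non-null elements only. Casting null never throws, so the cast/catch variant lets a null through, while instanceof rejects it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TypecheckIdiom {
    // Hypothetical stand-ins for the article's domain types.
    interface Program {}
    interface Spec { boolean isSatisfiedBy(Program program); }

    static class Person {
        final String name;
        Person(String name) { this.name = name; }
    }

    static class Programmer extends Person {
        Programmer(String name) { super(name); }
        Program programSpecification(Spec spec) { return new Program() {}; }
    }

    // Idiom (1): typecheck, cast, then time the programmer.
    // An unsatisfied spec is recorded as an 'infinite' time (Long.MAX_VALUE).
    static Map<String, Long> timeProgrammers(Person[] people, Spec aSpec) {
        Map<String, Long> times = new LinkedHashMap<>();
        for (Person eachPerson : people) {
            if (eachPerson instanceof Programmer) {
                Programmer eachPersonAsProgrammer = (Programmer) eachPerson;
                long startTime = System.nanoTime();
                Program program = eachPersonAsProgrammer.programSpecification(aSpec);
                long endTime = System.nanoTime();
                times.put(eachPerson.name,
                          aSpec.isSatisfiedBy(program) ? endTime - startTime : Long.MAX_VALUE);
            }
            // else: ignore the person
        }
        return times;
    }

    // Variant: skip instanceof and treat a failed cast as "ignore the person".
    static List<Programmer> selectByCastCatch(Person[] people) {
        List<Programmer> result = new ArrayList<>();
        for (Person eachPerson : people) {
            try {
                result.add((Programmer) eachPerson);
            } catch (ClassCastException e) {
                // Ignore the person
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Person[] people = { new Person("Ann"), new Programmer("Bob"), new Person("Cy") };
        System.out.println(timeProgrammers(people, program -> true).keySet()); // prints [Bob]
        System.out.println(selectByCastCatch(people).size());                  // prints 1

        // Casting null never throws, so the cast/catch variant lets a null through.
        Person[] withNull = { new Programmer("Bob"), null };
        System.out.println(selectByCastCatch(withNull).size());                // prints 2
    }
}
```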

Removal of typecheck

First, I think some people have been remarking that this might not be a good pattern to use in any language. Instead of the typecheck, we could change the first line to:

      if (eachPerson.isProgrammer()) {

This would require changing (augmenting) the 'Person' class to understand a message that it probably does not currently understand (assuming Programmer is a new concept), and it gives a supertype knowledge of its subtypes. The advantages are that the distinction between kinds of Persons is more formalized and that we are working solely with objects that respond to messages [e.g. it would be nice if a type-annotated program worked correctly without its type annotations].
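The following is a minimal Java sketch of idiom (2), assuming we are free to edit Person. The class bodies and the String-based programSpecification are hypothetical simplifications. The sketch also shows a design point. Even after the isProgrammer() check, the static type is still Person. Java therefore still needs a cast to reach Programmer-only operations, so the typecheck does not entirely disappear from the client.

```java
public class KindQuery {
    static class Person {
        // Added to Person: the supertype now knows about one of its subtypes.
        boolean isProgrammer() { return false; }
    }

    static class Programmer extends Person {
        @Override
        boolean isProgrammer() { return true; }

        // Hypothetical, simplified operation.
        String programSpecification(String spec) { return "program for " + spec; }
    }

    static int countProgrammers(Person[] people) {
        int count = 0;
        for (Person eachPerson : people) {
            if (eachPerson.isProgrammer()) {
                // The static type is still Person, so a cast is needed
                // to call Programmer-only operations.
                Programmer programmer = (Programmer) eachPerson;
                programmer.programSpecification("payroll");
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Person[] people = { new Person(), new Programmer(), new Person(), new Programmer() };
        System.out.println(countProgrammers(people)); // prints 2
    }
}
```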

Although in Java adding 'isProgrammer' would require changing the Person type/class, major Smalltalk implementations let you add methods to existing classes without modifying the original source. So the operation 'Person::isProgrammer' could be included in the project that defines the Programmer type, which makes project/dependency management cleaner.

Removal of kind-verifying message

Another possible approach is simply not to separate the different types of people and to treat all people as programmers. So our loop becomes:

          startTime = getCurrentTime();
          program = eachPerson.programSpecification(aSpec);
          endTime   = getCurrentTime();
          if (!aSpec.isSatisfiedBy(program)) {
               //...maybe give an 'infinite' time...
          }

In Java this would again require modifying Person to support the 'programSpecification' operation, but that is just as feasible. We then have the average Person produce a NullProgram object that never satisfies any spec. This has problems similar to the 'isProgrammer' example, because a Person knows information about its subtypes, but the tradeoff in client usage might make it well worth it. Consider especially that the whole Smalltalk code can be as simple as:

   aCollection collect: [:each | | program time |
       time := self time: [program := each programSpecification: aSpec].
       (aSpec isSatisfiedBy: program) ifFalse: [
              "...maybe give an 'infinite' time..."
       ].
       time
   ]

This might be very nice.
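In Java, idiom (3) can be sketched with the Null Object pattern. Person.programSpecification returns a NullProgram that no spec accepts, so the client loop needs no typecheck at all. All class names and bodies here are hypothetical. Long.MAX_VALUE again stands in for the 'infinite' time, and the spec in main is an illustrative assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class NullProgramDemo {
    interface Program {}

    // Null Object: a program that never satisfies any spec.
    static final class NullProgram implements Program {}

    interface Spec { boolean isSatisfiedBy(Program program); }

    static class Person {
        // Idiom (3): every Person answers programSpecification;
        // the default answer is a NullProgram.
        Program programSpecification(Spec spec) { return new NullProgram(); }
    }

    static class Programmer extends Person {
        @Override
        Program programSpecification(Spec spec) { return new Program() {}; }
    }

    // No typecheck in the client: everyone is asked, and an
    // unsatisfied spec is recorded as an 'infinite' time.
    static List<Long> times(Person[] people, Spec aSpec) {
        List<Long> result = new ArrayList<>();
        for (Person each : people) {
            long start = System.nanoTime();
            Program program = each.programSpecification(aSpec);
            long time = System.nanoTime() - start;
            result.add(aSpec.isSatisfiedBy(program) ? time : Long.MAX_VALUE);
        }
        return result;
    }

    public static void main(String[] args) {
        // Illustrative spec: any real program satisfies it, a NullProgram never does.
        Spec aSpec = program -> !(program instanceof NullProgram);
        System.out.println(times(new Person[] { new Person(), new Programmer() }, aSpec));
    }
}
```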

Again, in some Smalltalks we could add the method 'Person::programSpecification' in a project separate from the one that defined Person. We could even add 'Object::programSpecification' in that project if we wanted to support super-heterogeneity in the collection.

Summary of Typecheck Options

So summarizing, we could either:

  1. Use a typecheck on the Person object
  2. Enable a Person object to know whether it is a programmer
  3. Enable a Person object to respond to 'programSpecification' whether it is a programmer or not

Of these, (1) and (2) are really the same, except that Java has inherent support for (1). In effect, the 'instanceof' keyword acts as an operation 'conformsTo(Type)' available on every Object (its reflective form is 'Class.isInstance(Object)'). Smalltalk can also easily have general support for this kind of 'conformsTo(Interface/Type)', and coded frameworks for it have been mentioned on this list multiple times (I forget the URLs at the moment though).
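Java also exposes this typecheck as ordinary methods. Class.isInstance(Object) is the dynamic equivalent of instanceof and answers false for null. Class.cast(Object) performs the checked cast. Below is a minimal generic filter along those lines. The name select is hypothetical and borrowed from Smalltalk's select:.

```java
import java.util.ArrayList;
import java.util.List;

public class ConformsTo {
    // Keep only the items that conform to 'type', already cast to it.
    static <T> List<T> select(Object[] items, Class<T> type) {
        List<T> result = new ArrayList<>();
        for (Object item : items) {
            if (type.isInstance(item)) {   // same test as 'item instanceof T'
                result.add(type.cast(item));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Object[] mixed = { "Snoopy", 42, null, "Linus" };
        System.out.println(select(mixed, String.class));  // prints [Snoopy, Linus]
        System.out.println(select(mixed, Integer.class)); // prints [42]
    }
}
```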

Variation (2) is a more domain-specific protocol (nicer for clients but more limited in use). In Java, (2) and (3) require modifying the original source of Person, which may be neither possible nor good design. Depending on the Smalltalk dialect, (2) and (3) may or may not require modifying the original source of Person. Of all the solutions, (3) produces the most compact client code.

Effect of typecheck options on errors

Now since the main topic is errors, say we return to the collection:

   Person[] people = //...some value

and make it less type-aware:

   Object[] people = //...some value

Which of the above design idioms would behave best? The first would accept a collection that included a 'Dog' (say someone thought 'Snoopy' was a Person) and just ignore it because it did not support the right type. The second and third idioms would cause a type failure unless Person and Dog had a common supertype with our 'isProgrammer' or 'programSpecification' operations. In Java you would almost certainly have an irreconcilable type failure if you did not control the Dog class. You would then have to return to idiom (1) to do the initial subtype branch off of Object. In Smalltalk you could augment Object to support either (2) or (3): make any Object know that it is not a Programmer, or make it a very bad programmer.
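Here is a Java sketch of the Object[] case. It assumes a hypothetical Dog class that we do not control. Object has no isProgrammer(), so the client must first use idiom (1) to branch off Object to Person, and only then can idiom (2) apply. The class bodies are illustrative assumptions.

```java
public class SuperHeterogeneous {
    static class Person {
        boolean isProgrammer() { return false; }
    }

    static class Programmer extends Person {
        @Override
        boolean isProgrammer() { return true; }
    }

    // Assumed to be outside our control: it has no isProgrammer().
    static class Dog {
        final String name;
        Dog(String name) { this.name = name; }
    }

    static int countProgrammers(Object[] people) {
        int count = 0;
        for (Object each : people) {
            // each.isProgrammer() would not compile: Object has no such method.
            // Idiom (1) branches off Object first; idiom (2) applies afterwards.
            if (each instanceof Person && ((Person) each).isProgrammer()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Object[] people = { new Person(), new Programmer(), new Dog("Snoopy") };
        System.out.println(countProgrammers(people)); // prints 1
    }
}
```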

But if in Smalltalk only Person had an 'isProgrammer' method, Snoopy would cause a 'doesNotUnderstand' exception. So Smalltalk would have a runtime-visible type error, and it would have to be found and corrected if the production system could possibly encounter it.

Nulls

Unfortunately, though, the Java code is not completely type-safe whichever option you choose. A 'null' can hit the main tests in options (2) and (3), and it would fall into the alternate path for option (1), since 'null instanceof Programmer' is false. If a null comes down these paths, the Java program will behave as badly as (and likely worse than) the Smalltalk version. In Smalltalk, options (1), (2), and (3) are all also available to handle the 'nil' object: (1) all objects including 'nil' could understand 'conformsTo', (2) all objects including 'nil' could understand 'isProgrammer', and so on. This power and flexibility can be abused, but having to write:

   boolean is_equalTo(Object a, Object b) {
      if ((a==null) && (b==null)) return true;
      if ((a==null) || (b==null)) return false;
      return a.equals(b);
   }

in Java gets a bit silly when the Smalltalk:

    (a = b)

works correctly.
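For what it is worth, since Java 7 the standard library provides exactly this helper as java.util.Objects.equals(Object, Object). The sketch below checks that, and it also shows the null behavior described above. instanceof simply answers false for null, which is the alternate path of option (1). A message send such as isProgrammer() on null throws NullPointerException. Person and Programmer are hypothetical stand-ins.

```java
import java.util.Objects;

public class NullSafety {
    static class Person {
        boolean isProgrammer() { return false; }
    }

    static class Programmer extends Person {
        @Override
        boolean isProgrammer() { return true; }
    }

    // The article's helper; same behavior as Objects.equals(a, b).
    static boolean is_equalTo(Object a, Object b) {
        if ((a == null) && (b == null)) return true;
        if ((a == null) || (b == null)) return false;
        return a.equals(b);
    }

    // Option (1): null fails the typecheck and takes the "ignore" path.
    static boolean viaTypecheck(Person p) { return p instanceof Programmer; }

    // Option (2): null reaches the message send and throws NullPointerException.
    static boolean viaMessage(Person p) { return p.isProgrammer(); }

    public static void main(String[] args) {
        System.out.println(Objects.equals(null, null)); // prints true
        System.out.println(Objects.equals("a", null));  // prints false
        System.out.println(viaTypecheck(null));         // prints false
        try {
            viaMessage(null);
        } catch (NullPointerException e) {
            System.out.println("NullPointerException"); // prints NullPointerException
        }
    }
}
```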

Summary

So the behavior is different. Smalltalk could have a type error that the Java code tries to avoid (and may succeed in avoiding) in exchange for a lot of explicit type annotation. Is it worth it? [depends...depends...] But I hope producing a fuller example helps in getting a feel for the options [2].

--Mark
mark.fussell@chimu.com

[1] You might want to look at http://www.chimu.com/publications/smallJava/index.html for a comparison (especially of type effects) between Java and Smalltalk.

[2] Other idioms are certainly possible, but I didn't want to give too many in the main discussion. For example:

  1. Smalltalk could reject an object that caused a 'doesNotUnderstand' exception. This is similar to doing a 'Cast' and 'catch' in Java, but with less clear timing.
  2. We could "guarantee" that all objects in the list always understand the Person protocol. Various approaches are possible, including wrapping any non-Person with a NonPerson object.
  3. We could provide class-specific Adapters that know how to provide or adapt to an interface the source object doesn't necessarily have.
  4. We could use reflection in Java to get operation-level type checks.
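Item 4 can be sketched with java.lang.reflect. Class.getMethod(name, parameterTypes) looks up a public method, including inherited ones, and throws NoSuchMethodException when none exists. That gives a per-operation check rather than a per-type one, roughly like Smalltalk's respondsTo:. The understands helper, the Baker class, and the String-based programSpecification are hypothetical.

```java
import java.lang.reflect.Method;

public class OperationCheck {
    static class Baker { }

    public static class Programmer {
        public String programSpecification(String spec) { return "program for " + spec; }
    }

    // Hypothetical helper: does obj's class expose a public method with this
    // name and these exact parameter types? Here null understands nothing.
    static boolean understands(Object obj, String name, Class<?>... parameterTypes) {
        if (obj == null) return false;
        try {
            obj.getClass().getMethod(name, parameterTypes);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        Object[] room = { new Baker(), new Programmer(), null };
        for (Object each : room) {
            if (understands(each, "programSpecification", String.class)) {
                Method op = each.getClass().getMethod("programSpecification", String.class);
                System.out.println(op.invoke(each, "payroll")); // prints program for payroll
            }
        }
    }
}
```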

As context, Adolph Mendicant wrote:
[SNIP]
> There's no maybe about it, and it isn't bad form to ignore messages if
> that's the design.  I could have a room full of people and want to find out
> which programmers knew Smalltalk.  I wouldn't bother asking the
> non-programmers in the room.  So I wouldn't ask them.  This isn't dropping
> the message, it's dealing with a heterogenous collection.
>
> Terry Raymond <traymond@craftedsmalltalk.com> wrote
> >  The best thing you can do is notify the
> > developer
> > of the problem and this is precisely what smalltalk does.
>
> But what if it isn't a problem?  What if it is the design?  And Smalltalk
> isn't guaranteed to notify the developer.  It might notify the user.
>
> >  If it turns out
> > that the object should be there but simply does not want to respond to
> > the message then you put a dummy method in the object's class.
>
> Wait, I have to modify the class?  What if it isn't my class?  Just because
> a baker can be in the same room as a programmer doesn't mean that the baker
> class should get programmer only methods added to it.  This approach sounds
> like it penalizes the use of heterogenous collections.
 