Heterogeneous collections and typechecks in Java and Smalltalk
Overview
This is a discussion of the different options for dealing with heterogeneous
collections where you care only about certain kinds of objects in the
collection (objects that are not type-guaranteed to be there). It describes
some of the differences between Smalltalk's type system (optimistic, highly augmentable)
and Java's type system (pessimistic, type-annotated, non-augmentable).
This short paper responds to the following thread:
Original Posting: Heterogeneous collections in Java and Smalltalk
I think people may be talking past each other on this topic... so I will
restate what I think the context is and try to give the examples for the
two languages. Note that the following tends to use Java-ish notation
(i.e. '.' and '(' instead of Smalltalk's keyword ':'), simply because it
inherently supports type annotations and is a little more widely known.
Context
We have a heterogeneous collection "guaranteed" to have People in it,
but only some will be Programmers, the rest might be any subtype of
Person. We want to see how fast each of the Programmers in this
collection is, so we will ask them to 'programSpecification(...)' and
time how long it takes them to produce a result [yes, unfortunately, it
is a waterfall process :-].
So say we have an array of people:
Person[] people = //...some value
We can loop through an array of Persons in Java with:
for (int i = 0; i < people.length; i++) {
    Person eachPerson = people[i];
    //<<DoSomething>>
}
Inside the <<DoSomething>> we need to separate Persons who are
Programmers from those who are not. If we do a typecheck-cast we
could use the following approach:
if (eachPerson instanceof Programmer) {
    Programmer eachPersonAsProgrammer = (Programmer) eachPerson;
    startTime = getCurrentTime();
    program = eachPersonAsProgrammer.programSpecification(aSpec);
    endTime = getCurrentTime();
    if (!aSpec.isSatisfiedBy(program)) {
        // ...maybe give an 'infinite' time...
    }
} else {
    //Ignore the person
}
I hope this represents what Adolph Mendicant was thinking of. The above
could skip the 'instanceof' check, perform the typecheck-cast
'(Programmer) eachPerson' directly, and '//Ignore the person' in a
handler for the resulting ClassCastException. For non-null elements the
results would be identical, but the code is a little more obscure that way.
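To make the comparison concrete, here is a minimal sketch of the typecheck idiom next to the cast-and-catch variant. The classes Person, Programmer and Baker, and the use of plain Strings for specs and programs, are hypothetical simplifications; timing is left out so the two loops differ only in how they skip non-Programmers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins: Person, Programmer, Baker, and String specs/programs.
public class CastDemo {
    static class Person { }

    static class Baker extends Person { }

    static class Programmer extends Person {
        String programSpecification(String spec) {
            return "program for " + spec;
        }
    }

    // Idiom (1): test with instanceof, then cast.
    static List<String> withInstanceof(Person[] people, String spec) {
        List<String> programs = new ArrayList<>();
        for (Person eachPerson : people) {
            if (eachPerson instanceof Programmer) {
                Programmer eachPersonAsProgrammer = (Programmer) eachPerson;
                programs.add(eachPersonAsProgrammer.programSpecification(spec));
            }
            // otherwise: ignore the person
        }
        return programs;
    }

    // Variant: cast unconditionally and ignore the person when the cast
    // throws ClassCastException. Only the cast is inside the try block.
    static List<String> withCatch(Person[] people, String spec) {
        List<String> programs = new ArrayList<>();
        for (Person eachPerson : people) {
            Programmer eachPersonAsProgrammer;
            try {
                eachPersonAsProgrammer = (Programmer) eachPerson;
            } catch (ClassCastException e) {
                continue; // ignore the person
            }
            programs.add(eachPersonAsProgrammer.programSpecification(spec));
        }
        return programs;
    }

    public static void main(String[] args) {
        Person[] people = { new Programmer(), new Baker(), new Programmer() };
        System.out.println(withInstanceof(people, "payroll"));
        System.out.println(withCatch(people, "payroll"));
    }
}
```

One difference the sketch exposes: for a null element the two loops are not identical. 'null instanceof Programmer' is false, so the first loop skips it, while the cast '(Programmer) null' succeeds and the second loop then fails with a NullPointerException at the call.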
Removal of typecheck
First off, I think some people have been remarking that this might not be
such a good pattern to use, whatever the language. Instead of the
typecheck we could change the first line to use:
if (eachPerson.isProgrammer()) {
This would require changing/augmenting the 'Person' class to understand
a message that it probably does not currently understand (assuming a
Programmer is a new concept), and it gives a supertype knowledge of its
subtypes. The advantages are that the concept of
different kinds of Persons is more formalized and we are working solely
with objects that respond to messages [e.g. consider that it would be
nice if a Type-annotated program worked correctly without the
Type-annotations].
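A minimal Java sketch of the 'isProgrammer' idea, assuming hypothetical Person, Baker and Programmer classes. Note that in Java the domain message replaces only the test: the variable's static type is still Person, so calling 'programSpecification' still needs a cast.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical classes: Person gains a default isProgrammer() answer
// that its subtype Programmer overrides.
public class IsProgrammerDemo {
    static class Person {
        // The supertype now carries knowledge about one of its subtypes.
        boolean isProgrammer() {
            return false;
        }
    }

    static class Baker extends Person { }

    static class Programmer extends Person {
        @Override
        boolean isProgrammer() {
            return true;
        }

        String programSpecification(String spec) {
            return "program for " + spec;
        }
    }

    // Returns the programs produced by the programmers in 'people'.
    static List<String> collectPrograms(Person[] people, String spec) {
        List<String> programs = new ArrayList<>();
        for (Person eachPerson : people) {
            if (eachPerson.isProgrammer()) {
                // The static type is still Person, so Java still needs a
                // cast to reach programSpecification.
                programs.add(((Programmer) eachPerson).programSpecification(spec));
            }
        }
        return programs;
    }

    public static void main(String[] args) {
        Person[] people = { new Baker(), new Programmer() };
        System.out.println(collectPrograms(people, "payroll"));
    }
}
```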
Although in Java, adding 'isProgrammer' would require changing the
Person type/class, in major Smalltalk implementations you can
add methods to existing classes without modifying the original source.
So the operation 'Person::isProgrammer' could be included in the project
that defines the Programmer type, which makes project/dependency
management cleaner.
Removal of kind-verifying message
Another possible approach is to simply not separate the different types
of people and treat all people as programmers. So our loop body becomes:
startTime = getCurrentTime();
program = eachPerson.programSpecification(aSpec);
endTime = getCurrentTime();
if (!aSpec.isSatisfiedBy(program)) {
    // ...maybe give an 'infinite' time...
}
In Java this would again require modifying Person to support the
'programSpecification' operation, but that is just as possible. We
then have the average Person produce a NullProgram object that never
satisfies any spec. This has the same problem as the 'isProgrammer'
example, because a Person knows information about its subtypes, but the
tradeoff in client usage might make it well worth it. Especially
consider that the whole Smalltalk code can be as simple as:
aCollection collect: [:each | | program time |
    time := Time millisecondsToRun: [program := each programSpecification: aSpec].
    (aSpec isSatisfiedBy: program)
        ifTrue: [time]
        ifFalse: ["...maybe give an 'infinite' time..." nil]]
This might be very nice.
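For comparison, a rough Java version of option (3), assuming hypothetical Program, NullProgram and WorkingProgram types and modelling a spec as a String (so the check reads 'program.satisfies(spec)' rather than the text's 'aSpec.isSatisfiedBy(program)'). The plain Person answers a NullProgram, so the loop needs neither a typecheck nor a cast.

```java
// Hypothetical types: Program, NullProgram, WorkingProgram, Person, Programmer.
// A spec is modelled as a plain String to keep the sketch small.
public class NullProgramDemo {
    interface Program {
        boolean satisfies(String spec);
    }

    // Null Object: the program an ordinary Person "produces"; it never
    // satisfies any spec.
    static final class NullProgram implements Program {
        @Override
        public boolean satisfies(String spec) {
            return false;
        }
    }

    static final class WorkingProgram implements Program {
        private final String spec;

        WorkingProgram(String spec) {
            this.spec = spec;
        }

        @Override
        public boolean satisfies(String other) {
            return spec.equals(other);
        }
    }

    static class Person {
        Program programSpecification(String spec) {
            return new NullProgram();
        }
    }

    static class Programmer extends Person {
        @Override
        Program programSpecification(String spec) {
            return new WorkingProgram(spec);
        }
    }

    // Everyone is asked to program; no typecheck or cast is needed.
    // Long.MAX_VALUE stands in for the "infinite" time of a failed attempt.
    static long[] timeEveryone(Person[] people, String spec) {
        long[] nanos = new long[people.length];
        for (int i = 0; i < people.length; i++) {
            long startTime = System.nanoTime();
            Program program = people[i].programSpecification(spec);
            long endTime = System.nanoTime();
            nanos[i] = program.satisfies(spec) ? endTime - startTime : Long.MAX_VALUE;
        }
        return nanos;
    }

    public static void main(String[] args) {
        Person[] people = { new Programmer(), new Person() };
        long[] nanos = timeEveryone(people, "payroll");
        System.out.println(nanos[0] + " ns, " + nanos[1] + " ns");
    }
}
```

Long.MAX_VALUE is only one assumed way to encode an 'infinite' time; a nullable or optional result would work as well.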
Again in some Smalltalks, we could add the method
'Person::programSpecification' in a project separate from the project
that defined Person. We could even add
'Object::programSpecification' in that project if we wanted to support
super-heterogeneity in the collection.
Summary of Typecheck Options
So, summarizing, we could either:
(1) Use a typecheck on the Person object
(2) Enable a Person object to know whether it is a programmer
(3) Enable a Person object to respond to 'programSpecification' whether
    it is a programmer or not
Of these, (1) and (2) are really the same, except that Java has inherent
support for (1): the 'instanceof' operator effectively gives every object
a 'conformsTo(Type)' operation (its reflective counterpart is
'Class.isInstance(Object)'). Smalltalk
can also easily have general support for this type of
'conformsTo(Interface/Type)', and coded frameworks have been mentioned
on this list multiple times (I forget the URLs at the moment though).
The variation (2) is a more domain-specific protocol (nicer for
clients but more limited in usage). In Java, (2) and (3) require modifying
the original source of Person, which may be neither possible nor
good design. Depending on the version of Smalltalk, (2) and
(3) might or might not require modifying the original source of Person.
Of all the solutions, (3) produces the most compressed code for the
clients.
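Java does expose a general 'conformsTo' reflectively: 'Class.isInstance(Object)' behaves like 'instanceof' with the type supplied at run time, and 'Class.cast(Object)' performs the matching checked cast. The sketch below, with hypothetical helper names conformsTo and selectConforming, uses them to implement option (1) over any type.

```java
import java.util.ArrayList;
import java.util.List;

public class ConformsToDemo {
    // A general "conformsTo" in the spirit of option (1): Class.isInstance
    // is the reflective counterpart of the instanceof operator, and like
    // instanceof it answers false for null.
    static boolean conformsTo(Object obj, Class<?> type) {
        return type.isInstance(obj);
    }

    // Keep only the elements that conform to 'type', cast via Class.cast.
    static <T> List<T> selectConforming(Object[] objects, Class<T> type) {
        List<T> result = new ArrayList<>();
        for (Object each : objects) {
            if (type.isInstance(each)) {
                result.add(type.cast(each));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Object[] things = { "Snoopy", 42, "Charlie", null };
        System.out.println(selectConforming(things, String.class)); // [Snoopy, Charlie]
        System.out.println(conformsTo(null, Object.class)); // false
    }
}
```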
Effect of typecheck options on errors
Now since the main topic is errors, say we return to the collection:
Person[] people = //...some value
and make it less type-aware:
Object[] people = //...some value
Which of the above design idioms would have the best behavior? The
first would accept a collection that included 'Dog' (say someone thought
'Snoopy' was a Person) and just ignore them because they didn't support
the right Type. The second and third idioms would cause a type failure
unless we have a supertype of Person and Dog that had our 'isProgrammer'
or 'programSpecification' operations. In Java you would almost
certainly have a type failure that was irreconcilable if you did not
control the Dog class, so you would have to return to idiom (1) to do
the initial subtype-branch off of Object. In Smalltalk you could
augment Object to do either (2) or (3): make any Object know that it is
not a Programmer or make it a very bad programmer.
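The difference can be sketched in Java with hypothetical Person, Programmer and Dog classes, where Dog is assumed to be a class we cannot change. Over an Object[], idiom (1) simply skips Snoopy, while idiom (2) has to cast to Person before it can even send 'isProgrammer', and that cast fails at run time.

```java
// Hypothetical classes; Dog is assumed to be outside our control.
public class ObjectArrayDemo {
    static class Person {
        boolean isProgrammer() {
            return false;
        }
    }

    static class Programmer extends Person {
        @Override
        boolean isProgrammer() {
            return true;
        }
    }

    static class Dog { }

    // Idiom (1): branch off Object with instanceof; Snoopy is ignored.
    static int countWithInstanceof(Object[] things) {
        int count = 0;
        for (Object each : things) {
            if (each instanceof Programmer) {
                count++;
            }
        }
        return count;
    }

    // Idiom (2): isProgrammer() is declared only on Person, so each Object
    // must first be cast to Person; the cast throws for a Dog.
    static int countWithIsProgrammer(Object[] things) {
        int count = 0;
        for (Object each : things) {
            if (((Person) each).isProgrammer()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Object[] people = { new Programmer(), new Person(), new Dog() };
        System.out.println(countWithInstanceof(people)); // 1
        try {
            countWithIsProgrammer(people);
        } catch (ClassCastException e) {
            System.out.println("Snoopy is not a Person");
        }
    }
}
```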
But if in Smalltalk only Person had a method 'isProgrammer' then
Snoopy would cause a 'doesNotUnderstand' exception. So Smalltalk would
have a runtime-visible type error, which would have to be found and
corrected if the production system could possibly encounter it.
Nulls
Unfortunately, though, the Java code is not completely type-safe no
matter which option you choose: a 'null' reaching the main test in
option (2) or (3) throws a NullPointerException, and in option (1) it
silently takes the alternate path (because 'null instanceof Programmer'
is false). If a null comes down these paths then the Java program will
behave as badly as (and likely worse than) the Smalltalk version. In Smalltalk the
options (1), (2), and (3) are all also available to handle the 'nil'
Object: (1) All objects including 'nil' could understand 'conformsTo',
(2) All objects including 'nil' could understand 'isProgrammer', etc.
This power/flexibility can be abused, but having to write:
boolean is_equalTo(Object a, Object b) {
    if ((a == null) && (b == null)) return true;
    if ((a == null) || (b == null)) return false;
    return a.equals(b);
}
in Java gets a bit silly when the Smalltalk:
(a = b)
works correctly.
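For what it's worth, since Java 7 the standard library ships this helper: 'java.util.Objects.equals(a, b)' returns true when both arguments are null and otherwise delegates to 'a.equals(b)'. A small check that it agrees with the hand-written is_equalTo on every null/non-null combination:

```java
import java.util.Objects;

public class NullSafeEquals {
    // The hand-written helper from the text, unchanged in behavior.
    static boolean is_equalTo(Object a, Object b) {
        if ((a == null) && (b == null)) return true;
        if ((a == null) || (b == null)) return false;
        return a.equals(b);
    }

    public static void main(String[] args) {
        Object[][] pairs = { {null, null}, {null, "x"}, {"x", null}, {"x", "x"}, {"x", "y"} };
        for (Object[] pair : pairs) {
            // java.util.Objects.equals (Java 7+) gives the same answer.
            System.out.println(is_equalTo(pair[0], pair[1]) + " " + Objects.equals(pair[0], pair[1]));
        }
    }
}
```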
Summary
So the behavior is different: Smalltalk could have a type error that
the Java code tries to avoid (and may succeed in avoiding), in exchange
for a lot of explicit type annotation. Is it worth it?
[depends...depends...] But I hope producing a fuller example is helpful
in getting a feel for the options[2].
--Mark
mark.fussell@chimu.com
[2] Other options:
- Smalltalk could reject an object that caused a
'doesNotUnderstand' exception. This is similar to doing a cast and a
'catch' in Java, but with less clear timing.
- We could "guarantee" that all objects in the list always
understand the Person protocol. Various approaches are possible,
including wrapping any non-Person with a NonPerson object.
- We could provide class-specific Adapters that know how to provide
or adapt to an interface the source object doesn't necessarily have.
- We could use reflection in Java to get operation-level type
checks.
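The reflection option can be sketched as follows: instead of asking whether an object is a Programmer, ask whether its class has a public 'programSpecification(String)' method, using 'Class.getMethod' and 'Method.invoke'. The classes, including the unrelated Consultant, and the choice of a String parameter are hypothetical.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Hypothetical classes; only the reflection calls are real JDK API.
public class ReflectiveCheckDemo {
    static class Person { }

    static class Programmer extends Person {
        public String programSpecification(String spec) {
            return "program for " + spec;
        }
    }

    // Unrelated to Person, but it happens to support the same operation.
    static class Consultant {
        public String programSpecification(String spec) {
            return "draft for " + spec;
        }
    }

    // Operation-level check: does this object's class have a public
    // programSpecification(String), whatever its declared type?
    static Method findOperation(Object obj) {
        if (obj == null) {
            return null;
        }
        try {
            return obj.getClass().getMethod("programSpecification", String.class);
        } catch (NoSuchMethodException e) {
            return null;
        }
    }

    // Returns the program, or null when the object lacks the operation.
    static Object askToProgram(Object obj, String spec) {
        Method op = findOperation(obj);
        if (op == null) {
            return null; // ignore objects without the operation
        }
        try {
            return op.invoke(obj, spec);
        } catch (IllegalAccessException | InvocationTargetException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Object[] things = { new Programmer(), new Person(), new Consultant(), "Snoopy", null };
        for (Object each : things) {
            System.out.println(askToProgram(each, "payroll"));
        }
    }
}
```

Because the check is by operation rather than by declared type, this is the closest of the Java options to Smalltalk's message-based view, at the cost of losing compile-time checking of the call.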
As context, Adolph Mendicant wrote:
[SNIP]
> There's no maybe about it, and it isn't bad form to ignore messages if
> that's the design. I could have a room full of people and want to find out
> which programmers knew Smalltalk. I wouldn't bother asking the
> non-programmers in the room. So I wouldn't ask them. This isn't dropping
> the message, it's dealing with a heterogenous collection.
>
> Terry Raymond <traymond@craftedsmalltalk.com> wrote
> > The best thing you can do is notify the developer
> > of the problem and this is precisely what smalltalk does.
>
> But what if it isn't a problem? What if it is the design? And Smalltalk
> isn't guaranteed to notify the developer. It might notify the user.
>
> > If it turns out
> > that the object should be there but simply does not want to respond to
> > the message then you put a dummy method in the object's class.
>
> Wait, I have to modify the class? What if it isn't my class? Just because
> a baker can be in the same room as a programmer doesn't mean that the baker
> class should get programmer only methods added to it. This approach sounds
> like it penalizes the use of heterogenous collections.