ChiMu  
 
Menu Edge About   Products   Services   Projects   Publications  
  Publications > TupleMarks      Previous Page Previous TOC Next Next Page

Missing Information

Fully Determined Relations Predicates that specify Complexity Summary

Now we can deal with representing the lack of knowledge (or missing information) in a relational database. We have already seen what a database does not know: everything it has not been told it does know. So we don’t have a problem of telling the database: "You don’t know this", we have a problem of telling a database "You only know this".

Fully Determined Relations

This is the technique of using only fully determined relations [ See [McGoveran 94c] and [Date 94b].] : We can add any relations needed to express what we know, but each should only include exactly what we know about those tuples. If this approach can suitably handle all our needs for representing missing information then we should not add another mechanism to the relational model. That would be adding complexity without any new expressiveness.To try out this technique, what if we don’t know where a particular supplier is located? So far our only base relation is S which has a predicate that statesS.2: There exists a supplier with identifier S#, who has the name SName, who is located only in city City, and …

Since we don’t know where the supplier is located we can’t use this relation. No problem, we can add a new relation to our database.

S_NoCity : S_NoCity.1

S#

SName

S_NoCity.1: There exists a supplier with identifier S#, who has the name SName, and …

And if we add two entries to S_NoCity we get:

S_NoCity : S_NoCity.1

S#

SName

S3

DuPont

S5

Grid

We now have two relation variables that each express exactly what we know about their tuples. Everything in fully determined. One of the relations has more information than the other and that is the only sense in which we are "missing" information. The S_NoCity relation is "half-full" so it could also be considered "half-empty".

So how does this new relation and the new tuples affect our last two "tough" questions?

Q2.2: What suppliers can be proven to be in London?

Q3.2: What suppliers can be proven not to be located in London?

The new tuples don’t affect the queries at all. Since the S_NoCity predicate doesn’t mention cities in any way, it can not affect the outcome of any query that mentions a city. It has nothing to do with it.

On the other hand, the easy question:

Q1.2: What names can be proven to be the name of a supplier?

Should now return:

SName

Jones

Smith

DuPont

Eiffel

Grid

The problem is that asking this question of the database is now a bit cumbersome for the user. We need to take the two base relations, project SName, and union the results together. Not terribly difficult, but this requires a user to always remember the two tables when querying. We could also define the following derived relation value (i.e. View) to make this particular query easier for the user.

S_All = f(S,S_NoCity)

S#

SName

S1

Jones

S2

Smith

S3

DuPont


S4

Eiffel


S5

Grid


S_All.1: There exists a supplier with identifier S#, who has the name SName, and …

The user can now query from S_All when asking about names, use S when asking about cities, and use S_NoCity when asking about … hmm … that isn’t completely clear from our predicates. Why would we use S_NoCity instead of S_AllNames? What distinguishing trait places a tuple in S_NoCity?

Predicates that specify known "unknowns"

So far we have three predicates (renaming them slightly):

S_City.2: {S#, SName, City, …} – There exists a supplier with identifier S#, who has the name SName, who is located only in city City, and …

S_NoCity.1: {S#, SName, …} – There exists a supplier with identifier S#, who has the name SName, and …

S_AllNames.1: {S#, SName, …} – There exists a supplier with identifier S#, who has the name SName

We appear to have been a bit vague about the meaning of putting a tuple into S_NoCity: why is the set of cities in S_NoCity different from the set in S_AllNames? We can now decide what "unknown" information we want to represent in our database by specifying what the known information means. We could decide that putting a city into S_NoCity means that we don’t know the city for that supplier, or we could mean that the supplier is a multi-national corporation that is located in multiple cities, or any number of other meanings. For a fuller discussion of the different possible meanings of missing information see [McGoveran 94b]. If we needed more than one meaning we would create multiple relations and variables. For example we could have:

S_UnknownCity

S#

SName

S3

DuPont

S_MultiNational

S#

SName

S5

Grid

We would then need to have S_AllNames be a function of all three relations to union them together.

At the moment we choose to simply correct the predicate of S_NoCity:

S_NoCity.2: There exists a supplier with identifier S#, who has the name SName, and … and for which we do not know the city it is located in.

This new version of the predicate allows us to ask the question:

Q4.1: What suppliers can be proven to be among those that we don’t know their location?

It has no impact on the results of our previous questions:

Q1.2: What names can be proven to be the name of a supplier?

Q2.2: What suppliers can be proven to be in London?

Q3.2: What suppliers can be proven not to be located in London?

Complexity

All these tables and views add complexity to the database and make life more difficult for the user and also for the manager of these base and derived relations. This is with just one "optional" attribute. We will get combinatorial explosions of relations if for some suppliers we knew the City but not the Name, and some we knew the previous months purchase-quota, and so on. We get an even greater explosion if we need to have multiple meanings for the missing information.

Summary

The approach in this section correctly handled lack of information by adding new predicates that contain only the information we know. We didn’t need to add any new mechanisms to relational algebra. We did need to specify our predicates precisely enough to identify why tuples are placed into one predicate over another. This precision is also what enables a user to know what the results of a query mean.

Unfortunately, managing all the new predicates proved more difficult than we might desire and the resulting scheme may be difficult for a user to understand. It would be good if there was a mechanism that can ease this management and simplify the scheme, but it should have as little impact as possible on relational algebra.

 
Publications > TupleMarks Previous Page Previous TOC Next Next Page