ChiMu  
 
Menu Edge About   Products   Services   Projects   Publications  
  Publications > TupleMarks      Previous Page Previous TOC Next Next Page

Tuple-Marks in the NULL Debate

Looking at previous Codd-1 McGoveran-1 Date-1 Summary

Although Tuple-Marks could be proposed as an addition to the basic relational model (without NULLs) that is not the current historical context. There has been much discussion of the value of NULLs and their associated problems. Although tuple-marks are a completely different mechanism from NULLs, they are similar enough in use that it would be reasonable to see how tuple-marks would hold up in the debates. If tuple-marks have as many issues as NULLs than they won’t be a better alternative to them.

Looking at previous NULL criticisms

I will select a few example criticisms from the authors who have written on the NULL topic. Some of these authors believe NULLs are a useful mechanism and some believe NULLs and 3VL cause major problems. In either case there are arguments about correct relational modeling and the correct behavior of relational operators (e.g. queries) in the face of missing information. I will place tuple-marks in the middle of the fray and say they can handle the arguments from both sides.

Codd-1

The example running through the first part of this paper was identical to the one in [Codd 90 § 9.2] where he discusses criticisms towards his approach for missing information. In that section he criticizes the results of a Special Value approach (at the time named as and mixed with the default value concept)

S_All : (S_City, S_NoCity)

S#

SName

City

S1

Jones

London

S2

Smith

Bristol

S3

DuPont

--

S4

Eiffel

Paris

S5

Grid

--

Which I will define as having the same relations (predicates) as earlier in this paper:

S_City: There exists a supplier with identifier S#, who has the name SName, who is located only in city City, and …

S_NoCity: There exists a supplier with identifier S#, who has the name SName, and … and for which we do not know the city it is located in.

S_All: There exists a supplier with identifier S#, who has the name SName, and …

The questions posed of this table are:

Q1: Find the suppliers in London

Q2: Find the supplier NOT in London

To rephrase these in terms the database can understand gives:

Q1a: Find the suppliers that can proved to be in London.

Q1b: Find the suppliers that can proved to be possibly in London.

Q2a: Find the suppliers that can proved to be NOT in London.

Q2b: Find the suppliers that can proved to be possibly NOT in London.

Q1a is easy: " SELECT S# FROM S WHERE City = ‘London’ ". By mentioning City we automatically restrict the considered tuples to those with a relation that includes City (so we drop S3 and S5).

Q1b requires a union between Q1a and "SELECT S# FROM S [!City]".

Q2a is easy again: " SELECT S# FROM S WHEREH City <> London "

Q2b again requires a union between Q2a and "SELECT S# FROM S [!City]".

Specifying each query correctly requires understanding the relation definitions above, but the tuple-marks give correct and predictable results.

McGoveran-1

David McGoveran [McGoveren 94a,b,c] covers properly normalizing a database to avoid needing NULLs. This approach is the best first approach possible and is what I superficially described in Chapter 5. But it can lead to a significant increase in tables and database complexity. Using a slightly modified version of one of McGoveran’s examples:

Vehicles : (PV, UPV, UPMV)

VIN

Make

Model

Color

1

Ford

Escort

Green

2

Pontiac

Grand Prix

Red

3

Porsche

Carrera

--

4

Chrysler

LeBaron

--

5

DeLorean

--

--

PV: PaintedVehicles: There exists a care with vin VIN, made by Make, of model Model, with color Color

UPV: UnpaintedVehicles: There exists…

UPMV: UnpaintedModellessVehicles: There exists…

If we were to divide this table using the different relations involved we would get a total of three tables and would have to create two views to get "Makes" and "Makes and Models" for a user to easily query on. By keeping it all in one multi-relation variable with tuple-marks to distinguish the different relations we have much less complexity of the database scheme.

Date-1

C.J. Date discussed why three-valued logic is a mistake in [Date 95c] and in which he shows what answers supplied by 3VL are considered wrong answers. Based on the discussion of database knowledge and the use of tuple-marks I will show what answers tuple-marks would provide and respond to those that are considered wrong answers.

Wrong answers of the first kind

Date’s first example of a wrong answer is the result of the query

SELECT E#

FROM EMP

WHERE Job = ‘Clerk’

OR NOT Job = ‘Clerk’;

Assuming a relation and variable of:

EMP

E#

Name

Job

Date asserts that this query should return "all employee numbers" even when some tuples have JOB = NULL.

Right answers of the first kind

Although Date’s assertion seems intuitive and correct, with our multi-relation variables we can see that Date is incorrect. If there were two relation variables:

EMP_OnJob

E#

Name

Job

EMP_NoJob

E#

Name

We would not expect that

SELECT E#

FROM EMP_OnJob

WHERE Job = ‘Clerk’

OR NOT Job = ‘Clerk’;

would return any of the entries in EMP_NoJob.

It is also nonsensical to form the query:

SELECT E#

FROM EMP_NoJob

WHERE Job = ‘Clerk’

OR NOT Job = ‘Clerk’;

but if the query were formed it would certainly return no tuples: Job is unmentioned and has no value, not either ‘Clerk’ or something other than ‘Clerk’. We can see this by translating the query into a more formal question:

Q1: Find the employees that can proved to have a Job ‘Clerk’ or can be proved to be have a Job that is not ‘Clerk’.

This does not mean we can’t find employees who are not on the Job ‘Clerk’ simply because they aren’t on any job, but we have to express it in a sensible manner given our relation predicates. The predicate of EMP_NoJob is likely to be:

EMP_NoJob: There exists an employee with id E# and name Name who is not currently on a job.

So our second query would be:

SELECT E#

FROM EMP_NoJob;

We simply select all the tuples from EMP_NoJob to determine who is not on any particular Job.

To get the answer to the question:

Q1: Find the employees that can proved to not be on the Job ‘Clerk’

We can form the query:

(SELECT E#

FROM EMP_OnJob

WHERE Job = ‘Clerk’

OR NOT Job = ‘Clerk’)

UNION

(SELECT E#

FROM EMP_NoJob) ;

Requiring the separate expressions joined together makes sense: we are dealing with two different relations that have different attributes and predicates.

Muli-relation variables

For multi-relation variables we have the same problem and the same results but it is slightly more obscured by them sharing the same variable (table). Our relations and variables are:

EMP : (Emp_OnJob, Emp_NoJob)

E#

Name

Job

To get the answer to the question:

Q1: Find the employees that can proved to not be on the Job ‘Clerk’

We must form the query:

(SELECT E#

FROM EMP

WHERE Job = ‘Clerk’

OR NOT Job = ‘Clerk’)

UNION

(SELECT E#

FROM EMP [!Job]) ;

Where the projection of EMP using a choose-marked allows us to get to the tuples that are members of the relation Emp_NoJob.

Unbound variables

The argument Date proposes for why the original query should return all tuples would be correct if a "NULL" represented an unbound attribute-value variable which during a Prolog-like unification (see [Clocksin+M 81]) actually took on all possible values of that attribute-value. In that case the condition would apply to all tuples (because all had a relation that mentions Job) and the WHERE would be a tautology. A relation distinguishing tuple-mark is not an unbound attribute-value variable (nor anything to do with an attribute-value) so this reasoning does not apply to it.

Summary

Tuple-marks do not cause wrong answers. The answers to queries over a multi-relation variable will be correct for the applicable relations for that variable. Users will have to remember that certain variables have multiple relations and must form queries appropriately to use the relations desired.

Date’s arguments over the problems with 3VL causing wrong answers are cause by NULLs being considered an attribute value in a single relation. Tuple-marks are not attribute values.

Summary

I believe relation-distinguishing tuple-marks have correct behavior in the relational model and are superior to both attribute NULLs (3VL) and special values for handling missing information.

 
Publications > TupleMarks Previous Page Previous TOC Next Next Page