Relation Distinguishing Tuple-Marks
Our problem with multi-relation variables is that it is as complex to distinguish among the different relations in a single multi-relation variable as if there were multiple separate variables. What if we used a simpler marker? Instead of having a marker outside the normal attributes of the relations, we can use a marker within the tuples attributes to identify which relation the tuple belongs to. I will call this a Relation Distinguishing Tuple-Mark (tuple-mark for short) and use a dash "-- " to indicate it in a table. Using this approach for our example gives us:
S_All : (S_City.2, S_NoCity.2)
|
Relation
|
S#
|
SName
|
City
|
|
|
S_City.2
|
S1
|
Jones
|
London
|
|
|
S_City.2
|
S2
|
Smith
|
Bristol
|
|
|
S_City.2
|
S4
|
Eiffel
|
Paris
|
|
|
S_NoCity.2
|
S3
|
DuPont
|
--
|
|
|
S_NoCity.2
|
S5
|
Grid
|
--
|
| The "-- " tuple-mark is not a type of City or any comment on the value of a tuples city: that tuple has no attribute City. The tuple-mark identifies that the tuple belongs to a different relation than a tuple that does not have the mark. Supplier-S3 is part of the relation S_NoCity that does not have a city and the mark simply specifies that to be true.
This approach is only possible if each base relation has a distinguishing set of attributes. For our example this is true: S_City has attributes of {S#, SName, City,
} and S_NoCity has attributes of {S#, SName,
}. We could also have a relation S_NoName with attributes of {S#,City,
} and S_NoNameOrCity with attributes of {S#,
}. We could not handle our two relations S_UnknownCity and S_MuliNational, and we will return to this problem later.
The full relation S_All is functionally derived as the union of S_City and S_NoCity so it does not need to be distinguished (all tuples in the variable are also part of S_All). It is not completely clear how we query on S_City or S_NoCity instead of S_All, but that will be dealt with in the next section.
Interacting
We can now return to the question "How do we interact with this multi-relation variable?" How do we insert relations into it and how do we query a multi-relation variable? Specifically how do we answer the questions:
Q1.2: What names can be proven to be the name of a supplier?
Q2.2: What suppliers can be proven to be in London?
Q3.2: What suppliers can be proven not to be located in London?
Q4.1: What suppliers can be proven to be among those that we dont know their location?
Querying
A query over a multi-relation variable will only consider (be restricted to) the tuples for which the query is applicable. A query is applicable to a relation if all the attributes the query mentions exist for that relation. For example, the attribute SName exists in the relation S_City and S_NoCity, so a query "SELECT SName FROM S_All" would consider all the tuples:
|
S#
|
SName
|
|
|
S1
|
Jones
|
|
|
S2
|
Smith
|
|
|
S4
|
Eiffel
|
|
|
S3
|
DuPont
|
|
|
S5
|
Grid
|
| If a query mentions City (e.g. "SELECT City FROM S_ALL") it would only consider the tuples that have a relation with an attribute City:
|
S#
|
SName
|
City
|
|
|
S1
|
Jones
|
London
|
|
|
S2
|
Smith
|
Bristol
|
|
|
S4
|
Eiffel
|
Paris
|
| This allows us to answer Q1-Q3 and gives the same answers as our original S_City, S_NoCity, and S_All solution. So far so good, but to make any more progress requires extending the relational algebra.
Extending Projection
To answer Q4 we need to be able to retrieve the tuples that have distinguishing marks that identify them to be part of the S_NoCity relation. This sounds like a restriction but it cant be; we have the rule that tuples without an attribute will be immediately discarded so they will never make it to the restriction stage. We need to choose the tuples with a mark in a attribute and at the same time throw away that attribute (to prevent the tuple from being later discarded). We can do this by extending the "innermost" projection operation: the direct projection of our multi-relation variable "before" it is involved with the rest of the query.
Although not part of SQL it will be easiest to use and extend C.J. Dates projection notation:
A [X, Y,
, Z]
Which produces a relation value subset of A "obtained by eliminating all attributes not specified in the attribute commalist and then eliminating duplicate (sub)tuples from what is left)." [Date 95, Page 151]. In this notation Q1 can be answer by "S_ALL [SName]" and finding all the cities of suppliers can be answered by "S_ALL [City]".
To the projection notation I will add three new pieces: the include-all ("*") indicator, the eliminate ("-") prefix and the choose-marked ("!") prefix. The first two are primarily convenience additions to the projection syntax and they have nothing to do with multi-relation variables. They are added to the projection notation because they conceptually prepare it for the third addition, which will allow us to discriminate among different marked tuples.
Include-All
The include-all indicator is similar in meaning to the SQL version: it specifies that all the attributes of the relation should be used included in the projection and (by itself) is equivalent to the identity projection.
A = A [*]
Eliminate
The eliminate prefix allows you to specify that an attribute should be eliminated from the projection instead of included in it. This allows you to say "S_ALL [*,-City]" and get
|
S#
|
SName
|
|
|
S1
|
Jones
|
|
|
S2
|
Smith
|
|
|
S4
|
Eiffel
|
|
|
S3
|
DuPont
|
|
|
S5
|
Grid
|
| Instead of specifically stating the attributes to include "S_ALL [S#, SName,
]". If you have a projection that has only eliminated attributes you can leave off the include-all indicator:
S_ALL [-X, -Y, -Z ] = S_ALL [*, -X, -Y, -Z]
Choose-marked
Finally, the choose-marked prefix is similar to the eliminate prefix in that it will eliminate the attribute from the projection, but in addition it will also eliminate tuples (restrict the result to not include tuples) that have anything other than a tuple-mark (Relation Distinguishing Mark) for that attributes value. For our example, we can select the tuples that are part of S_NoCity by the projection "S_ALL [*, !City]" which would give:
|
S#
|
SName
|
|
|
S3
|
DuPont
|
|
|
S5
|
Grid
|
| Now we can answer Q4:
Q4.1: What suppliers can be proven to be among those that we dont know their location?
with "S_ALL [S#, !City]"
It may seem strange to have a projection operation perform a restriction but notice that this pseudo-restriction is complementary with the pseudo-restriction performed by including the attribute in the projection. "S_ALL [S#, SName, City]" gives:
|
S#
|
SName
|
City |
|
S1
|
Jones
|
London |
|
S2
|
Smith
|
Bristol |
|
S4
|
Eiffel
|
Paris | "S_ALL [S#, SName, !City]" gives:
|
S#
|
SName |
|
S3
|
DuPont |
|
S5
|
Grid | In both cases the restriction occurs in the process of getting rid of tuple-marks in the particular attribute, in the first case it eliminates the marked tuples and in the second it eliminates the non-marked tuples.
Summary
To handle multi-relation variables in queries we proposed two additions to relational algebra.
- A tuple with a marked attribute is excluded from all queries that mention that attribute
- A choose-marked attribute projection will eliminate that attribute from the projection and will restrict the result to only include tuples that have a tuple-mark for that attribute
These will be the only additions to relational algebra. There is no need to add three-valued logic or change any domain operations behavior. This is a much smaller change to relational algebra compared to supporting "normal" NULLs.
Adding tuples
To add a relation-distinguishing tuple-marked tuple to the database we can use the same approach as is used for NULLs. We can "pretend" to set the value of the attribute to "NULL" which will instead result in the tuple being marked as belonging to a different relation which does not include that attribute. As mentioned earlier in chapter, to do this operation requires that each relation in a multi-relation variable be distinguishable by its set of attributes. For our example, we can use
INSERT INTO S_ALL (S#, SName, City) VALUES ("S6", "Java", NULL)
To add an S_NoCity tuple.
|
Relation
|
S#
|
SName
|
City
|
|
|
S_NoCity
|
S6
|
Java
|
--
|
| Because NULLs are being used to identify the tuples relation this may cause confusion with other insertion features that use NULLs as a flag. For example, if a particular table uses NULLs for a default value it will effectively change the inserted tuples relation before completing the insertion. Actually adding a tuple with a particular relation (that does not include the defaulted attribute) may be impossible because of this.
Summary
The approach of using multi-relation variables allows us to simplify some databases and especially databases with missing information. If two or more relations are related to each other by them having similar but slightly different attributes than these relations can "share" the same multi-relation variable. Although adding this concept of multi-relation variables with marked-tuples adds complexity to the relational model, database schemes can be made much less complex. What used to require two, three, or more base relation variables and multiple derived relation values (views) can now be merged into a single variable. Doing so has no impact on what can be express with the database: the defined and derivable relations are the same.
Using Relation Distinguishing Tuple-Marks made it possible to easily interact with multi-relation variables and only required a couple additions to the relational model:
- A tuple with a marked attribute is excluded from all queries that mention that attribute
- A choose-marked attribute projection will eliminate that attribute from the projection and will restrict the result to only include tuples that have a tuple-mark for that attribute
This appears to be a good approach because of its simplicity, its limited impact on the relational model, and possibly because of its similarity to how NULLs are used now. Tuple-marks have many advantages over NULLs as currently implemented or considered because:
- Tuple-marks do not use three valued logic
- Tuple-marks have no impact on domains and domain operations
We will return to further detail tuple-marks in a later chapter, but first we should consider how the approach handles the issues brought up in the NULL debate.
|