Knowledge and Meaning
To discuss how a database can handle lack of information, we must first define how we put knowledge into a relational database [ See [Date+M 94] for a fuller discussion of this topic.] . To start, I will bring up the familiar Suppliers relation variable (S).
S
The first thing me must do is to define what putting a tuple (i.e. row) into S means: We must give S a predicate. Without stating what S means we can not translate between human knowledge and the database model. Everything we tell the database in the databases language will be "true" (i.e. provable) to the database, and the database will even be able to prove other "new" truths from its model and our axioms. All of these proofs are meaningless unless a person can interpret them in human terms, which must be the identical interpretation as the person who designed the database, and the person who entered the data, and so on. The only way to be sure all these interpretations are identical is to document and disseminate the meaning of each relation (and attribute, domain, etc.) in the database.
What does S mean? Well it could mean, if there exists a tuple in S then:
S.0: {S#, SName, City,
} There is a supplier with identity S# who hates the name SName but likes the city City, and
But that is unlikely. This does show the importance of precise predicates and human interpretation. More likely the predicate [ Usually I will not include the relation’s attributes in the predicate specification for space purposes. ] for S is:S.1: There exists a supplier with identifier S#, who has the name SName, who is located in city City, and
So adding the tuple
|
S#
|
SName
|
City
|
|
|
S1
|
Jones
|
London
|
| States that there exists a supplier with identifier S1, who has the name Jones, and is located in London.
Adding a couple more tuples gives us:
S : S.1
|
S#
|
SName
|
City
|
|
|
S1
|
Jones
|
London
|
|
|
S2
|
Smith
|
Bristol
|
|
|
S4
|
Eiffel
|
Paris
|
| We can now ask the database:
Q1: What are the names of suppliers
Q2: What suppliers are in London?
Q3: What suppliers are not in London?
Actually, we cant. Although the above questions use natural wording, the database can not possibly answer the questions as posed. They are in terms of our world. We can only ask the database about what it can prove about its "world" not what is true in our world. We need to do the translation to and from the database so our questions should reflect that. The corrected questions are:
Q1.2: What names can be proven to be the name of a supplier?
Q2.2: What suppliers can be proven to be in London?
Q3.2: What suppliers can be proven not to be located in London?
These questions the database can answer. For the first question the database returns:
And for the second question it returns:
And for the third question it
cant answer the question. We have no predicate that states:
There exists a supplier with identifier S#, who is not located in city City
Given how the query (Q3.2) is phrased we would need such a relation (in which we might put S2 and S4 if they dont have a location in London). But with what we have at the moment we can only answer the question:
Q3.3: What suppliers can be proven not to be provably located in London?
Which would then return
But say we really wanted to be able to answer questions like Q3.2, how can we? We could modify the S predicate:
S.2: There exists a supplier with identifier S#, who has the name SName, who is located only in city City, and
We can now answer Q3.2 because we can produce a derived relation value [ This derived value is unrealistic because it would have to include the cartesian product of the extent of S# and the possible values (minus one) of the Domain City. ] of:
NotIn = f(S) : NotIn.1
|
S#
|
City |
|
S1
|
Bristol |
|
S1
|
Paris |
|
S2
|
London |
|
S2
|
Paris |
|
S4
|
London |
|
S4
|
Bristol |
|
|
| Which has the predicate we wanted:
NotIn.1: There exists a supplier with identifier S#, who is not located in city City
It is impossible to produce the NotIn.1 relation value without modifying the predicate for S or adding a new base relation variable to record new information. The information simply was not recorded in the database.
Summary
First, for a database to have any meaning we must be able to uniquely translate between human terms and database statements. For a relational database this requires precisely specifying what each relation, domain, attribute, and tuple means. It is especially important to remember specifying what a relation means because without a formal specification a user will assume its meaning and misinterpret the answers to questions.
Second, a database can only answer questions about its world. It is up to the user to make the databases world contain the model and information necessary to provide answers that are useful for understanding the "real world".
|