         MAPPER DataBase Structure - A Case For Relativity
         -------------------------------------------------
                         by Rob Haeuser

----------------------------------------------------------------  
  The issue of whether or not MAPPER meets the criteria for a
"relational database" has been hotly debated since the phrase was
invented.  I'll bet you've heard this tired old argument before:
"MAPPER's flat-file structure can't possibly be relational."  It
would help to know what qualifies a database structure as
relational, and just what a flat file is.
     Hard pressed for an explanation of terms, the mumbling begins.
"Well, a relational database relates data."  Tsk, tsk. You just
broke a cardinal rule: never use a word to define itself.  "Ok, it
relates data elements to other data elements."  That isn't helping
much. 
     "All right, all right.  It maintains associations among
various groups of information..." (sounds like something you could
do with a bunch of 3-by-5 cards and a pencil)  "...on a computer."
Oh.  Well, at least we've qualified the requirements somewhat.  But
now that "maintains" part is bothering me.
     Herein lies the root of the problem.  Without a clear
definition, the term "relational database" can be terribly vague. 
I'm sure the experts have defined it a dozen times over, each
inventing a completely new set of terms and adding to the general
confusion.  Tables, domains, tuples, areas, pages, records, lines,
fields, columns - all different ways of saying basically the same
thing.
     It is imperative to agree on a definition before one can
proceed to shred the opponent's shabby and ill-conceived arguments. 
Therefore, I am including two definitions from different points of
view: one from a college textbook and one from a dictionary of
business terms.
     This excerpt is from "Database System Concepts" by Henry F.
Korth and Abraham Silberschatz, McGraw-Hill, copyright 1986, page
45: "A relational database consists of a collection of tables, each
of which is assigned a unique name. ... A row in a table represents
a relationship among a set of values. Since a table is a collection
of such relationships, there is a close correspondence between the
concept of table and the mathematical concept of relation, from
which the relational data model takes its name."
     We don't even want to begin to talk about tuples, do we?  But
you know, table sounds a lot like a MAPPER rid (report), and we can
sure give a report a unique name.  If row could possibly be a line,
well, bingo!  Sounds relational to me!
     In "Barron's Business Guide: Dictionary of Computer Terms"
(2nd Edition) by Michael Covington, Ph.D. and Douglas Downing,
Ph.D. copyright 1989, a relational data base is defined as follows:
"A relational data base is a data base in which some data items in
one type of record refer to records of a different type.  Consider,
for example, a data base of mailing addresses.  Within each record,
only the zip code is given, not the city and state.  There is also
a set of records containing zip codes corresponds.  To print out a
complete address, the computer examines all the data in the address
record and then looks up the appropriate city-and-state record to
obtain additional information."  Data items?  Record types?  Sounds
like MAPPER-speak, to me.  Zip code "corresponds" though?  Once
again, a question of semantics.  
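     For the curious, the zip code example in that definition can
be sketched in a few lines of Python.  This is only an illustration
of the idea, not anything taken from either book: the record
layouts, the names address_records, zip_records and
complete_address, and the sample data are all made up for the
occasion.

```python
# Hypothetical sample data; the record layouts are assumptions for illustration.
# Record type 1: mailing addresses that carry only a zip code.
address_records = [
    {"name": "A. Customer", "street": "100 Main St", "zip": "78701"},
    {"name": "B. Customer", "street": "200 Elm St", "zip": "75201"},
]

# Record type 2: city-and-state records, keyed by zip code.
zip_records = {
    "78701": {"city": "Austin", "state": "TX"},
    "75201": {"city": "Dallas", "state": "TX"},
}


def complete_address(address):
    """Look up the city-and-state record that the address's zip code refers to."""
    place = zip_records[address["zip"]]
    return (
        f'{address["name"]}, {address["street"]}, '
        f'{place["city"]}, {place["state"]} {address["zip"]}'
    )


for record in address_records:
    print(complete_address(record))
```

The only thing tying the two record types together is the zip code
value they share, which is all the definition really asks for.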
     Two different sources; two different definitions.  But there
seems to be a common theme in both of them: that groups of data
physically separated on the storage medium can be logically
connected when necessary.  Physical separation occurs because
records are grouped together by like kind, possibly in different
"files" that might reside on different disk drives.  To me a more
important question is: "Is it database or data base (one word or
two)?"  If you can't agree on that, you might as well give it up. 
     I humbly submit the following short list of terms, known as
"Rob's Data Processing Dictionary" or RDPD (pronounced ridpid -
sorry, pun intended), shown in Figure 1.

----------------------------------------------------------------  
                            FIGURE 1
                            --------

Term             Definition
--------------   --------------------------------------------
Bit              A binary 0 or 1.  A buncha bits is a byte.  What
                 you get when you forget to back up what you're
                 doing and then somehow lose it.
Byte             A buncha bits.  A few bytes make a character.
                 What you'd like to do to the inventor of these
                 goofy terms.
Character        For all practical purposes the smallest piece of
                 data that we care about.  One or more characters
                 is a data field.  Or would one character be a
                 datum field?
Data field       One or more characters.  One or more data fields
                 is a record.  Where all the characters play
                 baseball, and make.. uh.. records.. ya know?..
Record           One or more data fields.  A MAPPER line.  One or
                 more records (lines) constitutes a file.
File             One or more records: can be just about anything.
                 A MAPPER rid.  One or more files (rids)
                 constitutes a database.
Flat file        A file without any lumps, somehow inferior to
                 files that are "non-flat."  Probably really
                 means "single file", sort of like how
                 certain domesticated animals walk.
Database         Data.  The "base" part seems to be a throwback
                 to the days when there was base (important)
                 data, and then a bunch of extra stuff floating
                 around that was not as important as the rest
                 of it (my data versus your data).  Consists of
                 one or more files (rids).
Pid              Pseudo-identifier.  The number associated with a
                 terminal session (required for the pun).
Relational       A guy named Al, probably your brother-in-law; a
                 vague connection that is somehow implied by
                 its very existence, as in Al's case.
Rid              A MAPPER Report-identifier.  Can be just about
                 anything, including files, tables, areas,
                 tuples, indexes, domains, etc., etc., etc.
---------------------------------------------------------------

     Most people seem to believe that a relational database
somehow magically connects all kinds of data items.  According to
the two definitions quoted above, this is not far from the truth. 
We aren't really concerned with how those connections are made: we
just want to use them to get some work done.  "Computer, gimme a
list of all the shoes of style X sold in Austin, Tx. between the
beginning of time and next week and compare that to every other
style shoe sold everywhere else" is a request that could evoke
relationships among various data groups.
     Notice that the above request started with the word
"computer," as if hardware and software were somehow one entity. 
This may also be part of the problem.  In the real world, hardware
and software are two distinct entities that just happen to be
located in approximately the same physical space, the "computer". 
A relational database is simply a bunch of data on a storage medium
(hardware).  It's the relational database management system (RDMS)
that actually runs the show (software).
     Then, to complicate things further, there is a relational
database machine.  To me, this phrase refers to a computer that is
dedicated to running a particular RDMS, possibly to the exclusion
of anything else.  The hardware may or may not have been engineered
to maximize software throughput.  Because there are RDMS packages
available to run on just about anything, "relational database
machine" seems a bit moot.  They're all relational at some time or
another.
     For the sake of argument, let's try to establish some
meaningful definitions to work with, and cloud the issue even more. 
Let's say that a relational database consists of multiple reports
(files, areas), that each report consists of one or more lines
(records, rows) of data, and that each line consists of one or more
data items (fields, columns).  Let's say that a relational database
management system allows the association of data items which may
occur on different line types in different reports (see Figure 2).
As you can see, there isn't really any difference between the two
models.  Again, it is simply a matter of semantics.  You call it
tomato...

----------------------------------------------------------------  
                            FIGURE 2
                            --------

    Generic Relational Model                 MAPPER Model
    ------------------------                 ------------

           Table 1                             Report 1
           -------                             --------
     row1  ccccccc                      line1  cccccccc
     row2  ooooooo                      line2  oooooooo
     row3  lllllll                      line3  llllllll
     row4  uuuuuuu                      line4  uuuuuuuu
     row5  mmmmmmm                      line5  mmmmmmmm
     row6  nnnnnnn                      line6  nnnnnnnn
     row7  1234567                      line7  12345678
              .                                   .
              .                                   .
              .                                   .
           Table n                             Report n

----------------------------------------------------------------
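     The working definitions pictured in Figure 2 can also be
written down as a rough Python sketch.  Everything named here
(Report, Table, add_report, the sample report names and columns) is
hypothetical and invented for illustration; it says nothing about
how MAPPER or any RDMS actually stores data.

```python
from dataclasses import dataclass, field


@dataclass
class Report:
    """A uniquely named report (file, table): an ordered collection of lines."""
    name: str
    columns: list                               # names of the data items (fields)
    lines: list = field(default_factory=list)   # each line (record, row) is a dict

    def add_line(self, *values):
        """Append one line; it must supply one value per column."""
        if len(values) != len(self.columns):
            raise ValueError("one value is required per column")
        self.lines.append(dict(zip(self.columns, values)))


# The generic model's vocabulary is just another name for the same structure.
Table = Report

# A database is one or more uniquely named reports.
database = {}


def add_report(report):
    """Register a report, refusing duplicate names."""
    if report.name in database:
        raise ValueError(f"report name {report.name!r} is already in use")
    database[report.name] = report


styles = Report("STYLES", ["style", "description"])
styles.add_line("X", "Loafer")
add_report(styles)

sales = Table("SALES", ["style", "city", "pairs"])
sales.add_line("X", "Austin", 12)
add_report(sales)

print(sorted(database))
```

Whether you build it as a Report or a Table, you get the same
object, which is the whole point of the figure.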
     Most RDMS packages allow only one way to connect data, via
tables or indexes.  This cross-reference mechanism is built in and
is not subject to modification - you just use it.  This can be part
of the appeal of an RDMS.  A lot of the "code" is built in.  You
simply tell it what data needs to be connected when, a task not
always as easy as it sounds.  Why do you think that they have a
database administrator?  And don't believe that once the database
is defined, all the work is over.
     Back to the old "flat file" argument.  I could not find a
definition for flat file, so the phrase is probably meaningless.
Again, I believe it is used to mean one file, all records organized
sequentially.  That sounds similar to a MAPPER rid, but certainly not
MAPPER's entire file structure.  MAPPER can support hundreds of
"super files", with each super file supporting thousands of report
files (rids).  Just because something may happen to be a version of
an element of a program file doesn't make it any less a file in
its own right.
     At TDHS we currently run four MAPPERs.  One of those MAPPERs
supports about 262 cabinet pairs.  Taking it to the theoretical
limits, if each cabinet were filled up (16,000 reports - 8 drawers
per cabinet X 2,000 reports per drawer), we would have 4,192,000
reports.  If each report contained the maximum of 131,071 lines, we
would have 549,449,632,000 (almost 550 BILLION) lines of data.  If
every report were 256 characters wide, we would have 1.4065911e14
characters of data!  String that number across the galaxy a few
times.  And I could always add more cabinets.  Now, if you can't
see the need to be able to relate some data at some time, sheesh!
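     Anyone who wants to check the multiplication can do so with a
short Python sketch.  It follows the calculation as given, which
treats 262 as the number of cabinets; the variable names are my own
and the limits are the ones quoted above, not values read from any
MAPPER system.

```python
# Theoretical limits as quoted in the text (assumed, not measured).
cabinets = 262
drawers_per_cabinet = 8
reports_per_drawer = 2_000
max_lines_per_report = 131_071
characters_per_line = 256

reports_per_cabinet = drawers_per_cabinet * reports_per_drawer
total_reports = cabinets * reports_per_cabinet
total_lines = total_reports * max_lines_per_report
total_characters = total_lines * characters_per_line

print(f"reports per cabinet: {reports_per_cabinet:,}")
print(f"total reports:       {total_reports:,}")
print(f"total lines:         {total_lines:,}")
print(f"total characters:    {total_characters:,} ({total_characters:.7e})")
```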
     The particular application drives the database structure in
MAPPER.  If you only need a simple list, that's all you get.  No
fancy relationships required.  If, however, you need a truly
relational database, with tons of tuples to table, MAPPER can do
that, too.
     So, you might not be arguing about whether or not a MAPPER
database is relational, but really whether or not MAPPER is a
relational database management system.  How can you have one
without the other?  The Match function alone would qualify MAPPER
for RDMS status.  After all, everything I've heard about RDMS seems
to hover around matching and merging disparate data, something done
every day in MAPPER, worldwide.
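     To show what "matching and merging" amounts to, here is a
minimal Python sketch of the general idea.  It is emphatically not
MAPPER's Match function or its syntax: the function name match, its
parameters, and the sample reports are assumptions made up for
illustration.

```python
def match(source, target, key, copy_fields):
    """Return the target lines, each extended with copy_fields taken from
    the source line that shares the same value in the key field."""
    index = {}
    for line in source:
        index.setdefault(line[key], line)   # first source line wins per key value

    merged = []
    for line in target:
        found = index.get(line[key])
        # Lines with no matching source line get blank values.
        extra = {name: (found[name] if found else "") for name in copy_fields}
        merged.append({**line, **extra})
    return merged


# Hypothetical reports: shoe styles, and sales lines that mention a style.
styles = [
    {"style": "X", "description": "Loafer"},
    {"style": "Y", "description": "Sandal"},
]
sales = [
    {"style": "X", "city": "Austin", "pairs": 12},
    {"style": "Z", "city": "Dallas", "pairs": 3},
]

for line in match(styles, sales, "style", ["description"]):
    print(line)
```

Two separate reports, one shared data item, and out comes a merged
result without touching either original.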