MAPPER DataBase Structure - A Case For Relativity
-------------------------------------------------
by Rob Haeuser
----------------------------------------------------------------

The issue of whether or not MAPPER meets the criteria for a "relational database" has been hotly debated since the phrase was invented. I'll bet you've heard this tired old argument before: "MAPPER's flat-file structure can't possibly be relational."

It would help to know what qualifies a database structure as relational, and just what a flat file is. Hard pressed for an explanation of terms, the opposition begins to mumble.

"Well, a relational database relates data."

Tsk, tsk. You just broke a cardinal rule: never use a word to define itself.

"OK, it relates data elements to other data elements."

That isn't helping much.

"All right, all right. It maintains associations among various groups of information..." (sounds like something you could do with a bunch of 3-by-5 cards and a pencil) "...on a computer."

Oh. Well, at least we've qualified the requirements somewhat. But now that "maintains" part is bothering me.

Herein lies the root of the problem. Without a clear definition, the term "relational database" can be terribly vague. I'm sure the experts have defined it a dozen times over, each inventing a completely new set of terms and adding to the general confusion. Tables, domains, tuples, areas, pages, records, lines, fields, columns: all different ways of saying basically the same thing.

It is imperative to agree on a definition before one can proceed to shred the opponent's shabby and ill-conceived arguments. Therefore, I am including two definitions from different points of view: one from a college textbook and one from a dictionary of business terms.

This excerpt is from "Database System Concepts" by Henry F. Korth and Abraham Silberschatz, McGraw-Hill, copyright 1986, page 45:

"A relational database consists of a collection of tables, each of which is assigned a unique name. ...
A row in a table represents a relationship among a set of values. Since a table is a collection of such relationships, there is a close correspondence between the concept of table and the mathematical concept of relation, from which the relational data model takes its name."

We don't even want to begin to talk about tuples, do we? But you know, table sounds a lot like a MAPPER rid (report), and we can sure give a report a unique name. If row could possibly be a line, well, bingo! Sounds relational to me!

In "Barron's Business Guide: Dictionary of Computer Terms" (2nd Edition) by Michael Covington, Ph.D., and Douglas Downing, Ph.D., copyright 1989, a relational data base is defined as follows:

"A relational data base is a data base in which some data items in one type of record refer to records of a different type. Consider, for example, a data base of mailing addresses. Within each record, only the zip code is given, not the city and state. There is also a set of records containing zip codes [and the city and state to which each] corresponds. To print out a complete address, the computer examines all the data in the address record and then looks up the appropriate city-and-state record to obtain additional information."

Data items? Record types? Sounds like MAPPER-speak to me. Zip code "corresponds," though? Once again, a question of semantics.

Two different sources; two different definitions. But there seems to be a common theme in both: groups of data that are physically separated on the storage medium can be logically connected when necessary. Physical separation occurs because records are grouped together by like kind, possibly in different "files" that might reside on different disk drives.

To me, a more important question is: "Is it database or data base (one word or two)?" If you can't agree on that, you might as well give it up.

I humbly submit the following short list of terms, known as "Rob's Data Processing Dictionary" or RDPD (pronounced ridpid; sorry, pun intended), shown in Figure 1.
----------------------------------------------------------------
FIGURE 1
--------

Term            Definition
--------------  --------------------------------------------
Bit             A binary 0 or 1. A buncha bits is a byte.
                What you get when you forget to back up
                what you're doing and then somehow lose it.

Byte            A buncha bits. A few bytes make a character.
                What you'd like to do to the inventor of
                these goofy terms.

Character       For all practical purposes, the smallest
                piece of data that we care about. One or
                more characters is a data field. Or would
                one character be a datum field?

Data field      One or more characters. One or more data
                fields is a record. Where all the characters
                play baseball, and make... uh... records...
                ya know?

Record          One or more data fields. A MAPPER line. One
                or more records (lines) constitute a file.

File            One or more records; can be just about
                anything. A MAPPER rid. One or more files
                (rids) constitute a database.

Flat file       A file without any lumps, somehow inferior
                to files that are "non-flat." Probably
                really means "single file," sort of like
                how certain domesticated animals walk.

Database        Data. The "base" part seems to be a
                throwback to the days when there was base
                (important) data, and then a bunch of extra
                stuff floating around that was not as
                important as the rest of it (my data versus
                your data). Consists of one or more files
                (rids).

Pid             Pseudo-identifier. The number associated
                with a terminal session (required for the
                pun).

Relational      A guy named Al, probably your
                brother-in-law; a vague connection that is
                somehow implied by its very existence, as in
                Al's case.

Rid             A MAPPER report identifier. Can be just
                about anything, including files, tables,
                areas, tuples, indexes, domains, etc., etc.,
                etc.
----------------------------------------------------------------

Most people seem to believe that a relational database somehow magically connects all kinds of data items. According to the two definitions quoted above, this is not far from the truth.
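Barron's zip-code example can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not MAPPER code: the address "report" and the zip-code "report" are plain Python lists and dictionaries, and the names and sample records are hypothetical. It shows the theme both definitions share: two groups of data stored separately and connected only when a complete address is needed.

```python
# Hypothetical sketch of the Barron's zip-code example. Address records carry
# only a zip code; a separate set of zip-code records supplies city and state.
# All names and sample data are illustrative assumptions.

# Address records: (name, street, zip code)
addresses = [
    ("A. Customer", "100 Main St.", "78701"),
    ("B. Customer", "42 Elm St.", "75201"),
]

# Zip-code records: zip code -> (city, state)
zip_codes = {
    "78701": ("Austin", "TX"),
    "75201": ("Dallas", "TX"),
}


def full_address(record):
    """Build a complete address by looking up city and state via the zip code."""
    name, street, zip_code = record
    city, state = zip_codes[zip_code]  # the "relation" between the two record types
    return f"{name}, {street}, {city}, {state} {zip_code}"


labels = [full_address(r) for r in addresses]
for label in labels:
    print(label)
```

Because each city and state is stored once, in the zip-code records, correcting one of those records corrects every address label that refers to it.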
We aren't really concerned with how those connections are made; we just want to use them to get some work done. "Computer, gimme a list of all the shoes of style X sold in Austin, TX, between the beginning of time and next week, and compare that to every other style of shoe sold everywhere else" is a request that could invoke relationships among various data groups.

Notice that the above request started with the word "computer," as if hardware and software were somehow one entity. This may also be part of the problem. In the real world, hardware and software are two distinct entities that just happen to be located in approximately the same physical space, the "computer." A relational database is simply a bunch of data on a storage medium (hardware). It's the relational database management system (RDMS) that actually runs the show (software).

Then, to complicate things further, there is the relational database machine. To me, this phrase refers to a computer that is dedicated to running a particular RDMS, possibly to the exclusion of anything else. The hardware may or may not have been engineered to maximize software throughput. Because there are RDMS packages available to run on just about anything, "relational database machine" seems a bit moot. They're all relational at some time or another.

For the sake of argument, let's try to establish some meaningful definitions to work with, and cloud the issue even more. Let's say that a relational database consists of multiple reports (files, areas), that each report consists of one or more lines (records, rows) of data, and that each line consists of one or more data items (fields, columns). Let's say that a relational database management system allows the association of data items which may occur on different line types in different reports (see Figure 2). As you can see, there isn't really any difference between the two models. Again, it's simply a matter of semantics. You call it tomato...
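Those working definitions can be made concrete with the shoe request from earlier. The Python sketch below is hypothetical throughout: it assumes each report is a list of fixed-width lines whose data items sit at known column positions, and the report names (SALES, STORES), column layouts, and data are invented for illustration. It is not MAPPER's actual report format or command set. The association between the two reports exists only at query time, by comparing the store data item in each.

```python
# Hypothetical sketch: a "database" of named reports, each report a list of
# fixed-width lines, each line sliced into data items by column position.
# Report names, layouts, and data are illustrative assumptions.

database = {
    "SALES": [
        # style store qty
        "X100  S01   12",
        "X100  S02    5",
        "Y200  S01    7",
    ],
    "STORES": [
        # store city
        "S01   Austin",
        "S02   Dallas",
    ],
}

# Column layout per report: data item name -> (start, end) character positions.
layouts = {
    "SALES": {"style": (0, 6), "store": (6, 12), "qty": (12, 14)},
    "STORES": {"store": (0, 6), "city": (6, 20)},
}


def items(report, line):
    """Split one line of a report into its named data items."""
    return {name: line[a:b].strip() for name, (a, b) in layouts[report].items()}


def lines_where(report, field, value):
    """Return the parsed lines of a report whose data item `field` equals `value`."""
    parsed = (items(report, ln) for ln in database[report])
    return [p for p in parsed if p[field] == value]


# Associate data items across reports: style X100 sold in Austin stores.
austin_stores = {s["store"] for s in lines_where("STORES", "city", "Austin")}
austin_x100 = [s for s in lines_where("SALES", "style", "X100")
               if s["store"] in austin_stores]
total = sum(int(s["qty"]) for s in austin_x100)
print(total)  # prints 12
```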
----------------------------------------------------------------
FIGURE 2
--------

Generic Relational Model        MAPPER Model
------------------------        ------------

Table 1                         Report 1
-------                         --------
row1  ccccccc                   line1  cccccccc
row2  ooooooo                   line2  oooooooo
row3  lllllll                   line3  llllllll
row4  uuuuuuu                   line4  uuuuuuuu
row5  mmmmmmm                   line5  mmmmmmmm
row6  nnnnnnn                   line6  nnnnnnnn
row7  1234567                   line7  12345678
 .                               .
 .                               .
 .                               .
Table n                         Report n
----------------------------------------------------------------

Most RDMS packages allow only one way to connect data, via tables or indexes. This cross-reference mechanism is built in and is not subject to modification; you just use it. This can be part of the appeal of an RDMS: a lot of the "code" is built in. You simply tell it what data needs to be connected when, a task not always as easy as it sounds. Why do you think they have a database administrator? And don't believe that all the work is over once the database is defined.

Back to the old "flat file" argument. I could not find a definition for flat file, so the phrase is probably meaningless. Again, I believe it is used to mean one file, with all records organized sequentially. That sounds similar to a MAPPER rid, but certainly not MAPPER's entire file structure. MAPPER can support hundreds of "super files," with each super file supporting thousands of report files (rids). Just because something may happen to be a version of an element of a program file doesn't make it any less a file in its own right.

At TDHS we currently run four MAPPERs. One of those MAPPERs supports about 262 cabinet pairs. Taking it to the theoretical limits, if each cabinet were filled up (16,000 reports: 8 drawers per cabinet x 2,000 reports per drawer), we would have 4,192,000 reports. If each report contained the maximum of 131,071 lines, we would have 549,449,632,000 (almost 550 BILLION) lines of data. If every line were 256 characters wide, we would have about 1.4066e14 (roughly 140 trillion) characters of data!
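The capacity figures are easy to check. This short Python calculation uses only the numbers quoted in the text; it treats each cabinet pair as one 16,000-report unit, as the article's own total does, and assumes every line is a full 256 characters wide.

```python
# Back-of-the-envelope capacity for one MAPPER, using the article's figures.
# Assumptions: each cabinet pair counts as one 16,000-report unit (matching the
# article's total), and every line is a full 256 characters.
cabinet_pairs = 262
drawers_per_cabinet = 8
reports_per_drawer = 2_000
max_lines_per_report = 131_071
chars_per_line = 256

reports_per_cabinet = drawers_per_cabinet * reports_per_drawer  # 16,000
total_reports = cabinet_pairs * reports_per_cabinet             # 4,192,000
total_lines = total_reports * max_lines_per_report              # 549,449,632,000
total_chars = total_lines * chars_per_line                      # 140,659,105,792,000

print(f"{total_reports:,} reports")
print(f"{total_lines:,} lines")
print(f"{total_chars:.4e} characters")  # prints 1.4066e+14 characters
```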
String all those characters across the galaxy a few times. And I could always add more cabinets. Now, if you can't see the need to be able to relate some data at some time, sheesh!

The particular application drives the database structure in MAPPER. If you only need a simple list, that's all you get; no fancy relationships required. If, however, you need a truly relational database, with tons of tuples to table, MAPPER can do that, too.

So you might not really be arguing about whether or not a MAPPER database is relational, but about whether or not MAPPER is a relational database management system. How can you have one without the other? The Match function alone would qualify MAPPER for RDMS status. After all, everything I've heard about RDMS seems to hover around matching and merging disparate data, something done every day in MAPPER, worldwide.
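Matching and merging disparate data comes down to pairing records from two reports on a shared key. The following Python sketch is a generic sort-merge match, not MAPPER's Match function or its syntax; the record layouts, the `part` key, and the sample data are hypothetical, and keys are assumed unique within each input to keep it short. It returns the merged matches plus the unmatched records from each side, the three groups one usually wants from such a comparison.

```python
# Hypothetical sketch: match two "reports" (lists of dict records) on a key
# field with a sort-merge pass. A generic merge join, not MAPPER's Match.


def match(left, right, key):
    """Return (matched, left_only, right_only).

    Keys are assumed unique within each input to keep the sketch short.
    """
    left = sorted(left, key=lambda r: r[key])
    right = sorted(right, key=lambda r: r[key])
    matched, left_only, right_only = [], [], []
    i = j = 0
    while i < len(left) and j < len(right):
        a, b = left[i][key], right[j][key]
        if a == b:
            matched.append({**left[i], **right[j]})  # merge the two records
            i += 1
            j += 1
        elif a < b:
            left_only.append(left[i])  # no partner on the right
            i += 1
        else:
            right_only.append(right[j])  # no partner on the left
            j += 1
    left_only.extend(left[i:])  # leftovers after one side runs out
    right_only.extend(right[j:])
    return matched, left_only, right_only


orders = [{"part": "P2", "qty": 3}, {"part": "P1", "qty": 10}, {"part": "P9", "qty": 1}]
parts = [{"part": "P1", "desc": "Widget"}, {"part": "P2", "desc": "Gadget"},
         {"part": "P3", "desc": "Gizmo"}]

matched, orders_only, parts_only = match(orders, parts, "part")
print(matched)
print(orders_only)
print(parts_only)
```

Sorting both inputs first lets a single pass pair them up, which is the same idea as matching two reports that are already sorted on the same field.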