Make Mine MAPPER #26 -------------------- by Rob Haeuser MAPPER Database Structure - A CASE For Relativity --------------------------------------------------------------- The issue of whether or not MAPPER meets the criteria for a "relational database" has been hotly debated since the phrase was invented. I'll bet you've heard this tired old argument before: "MAPPER's flat-file structure can't possibly be relational." It would help to know what qualifies a database structure as relational. And just what is a flat file, anyway? Hard pressed for an explanation of terms, the mumbling begins. "Well, a relational database relates data." Tsk, tsk. You just broke a cardinal rule: never use a word to define itself. "Ok, it relates data elements to other data elements." That isn't helping much. "All right, all right. It maintains associations among various groups of information..." (sounds like something you could do with a bunch of 3-by-5 cards and a pencil (*see below)) "...on a computer." Oh. Well, at least we've qualified things somewhat. But now that "maintains" part is bothering me. And I still haven't heard what a flat-file is... Herein lies the root of the problem. Without a clear definition, the term "relational database" is terribly vague. The experts have defined it a dozen times over, each one inventing a whole new set of terms for the same old stuff, lending to the general confusion. Tables, domains, tuples, areas, pages, records, lines, fields, columns - all different ways of saying the same things. It is imperative to agree on a definition before one can proceed to shred the opponent's shabby and ill-conceived arguments. Therefore, I am including two definitions from different points of view: one from a college textbook and one from a dictionary of business terms. This excerpt is from "Database System Concepts" by Henry F. Korth and Abraham Silberschatz, McGraw-Hill, copyright 1986, page 45: "A relational database consists of a collection of tables, each of which is assigned a unique name. ... A row in a table represents a relationship among a set of values. Since a table is a collection of such relationships, there is a close correspondence between the concept of table and the mathematical concept of relation, from which the relational data model takes its name." We don't even want to begin to talk about tuples, do we? But you know, table sounds a lot like a MAPPER rid (report), and we can sure give a report a unique name. If row could possibly be a line, well, bingo! Sounds relational to me! In "Barron's Business Guide: Dictionary of Computer Terms" (2nd Edition) by Michael Covington, Ph.D. and Douglas Downing, Ph.D. copyright 1989, a relational data base is defined as follows: "A relational data base is a data base in which some data items in one type of record refer to records of a different type. Consider, for example, a data base of mailing addresses. Within each record, only the zip code is given, not the city and state. There is also a set of records containing zip codes corresponds. To print out a complete address, the computer examines all the data in the address record and then looks up the appropriate city- and -state record to obtain additional information." Data items? Record types? Sounds like MAPPER-speak, to me. Zip code "corresponds" though? Once again, a question of semantics. Two different sources; two different definitions. But there seems to be a common theme in all of them: that groups of data physically separated on the storage medium can be logically connected when necessary. Physical separation occurs because records are grouped together by like kind, possibly in different "files" that might reside on different disk drives. To me a more important question is: "Is it database or data base (one word or two)?" If you can't agree on that, you might as well give it up. I humbly submit the following short list of terms, known as "Rob's Data Processing Dictionary" or RDPD (pronounced ridpid - sorry, pun intended), shown in Figure 1, for no other purpose than to distract you from the fact that we still don't have a clue as to how to define any of this stuff. ---------------------------------------------------------------- FIGURE 1 -------- Term Definition ------------ ------------------------------------------------- Bit A binary 0 or 1. A bunch'a bits is a byte. What you get when you forget to backup what you're doing and then somehow lose it. Byte A bunch'a bits. A byte gives you character. What you'd like to do to the inventer of these goofy terms. Character For all practical purposes the smallest piece of data that we care about. One or more characters is a data field. Or would one character be a datum field? Data field One or more characters. One or more data fields is a record. The question asked when a really intuitive person sees some characters playing baseball making, uh, records in a series.. ya know?.. Hey! It is October, isn't it? Record One or more data fields. A MAPPER line. One or more records (lines) constitutes a file. What the author should be given for making you read this. File One or more records: can be just about anything. A MAPPER rid. One or more files (rids) constitutes a database. What you need to get out of the place that your record got you into. Flat file A file without any lumps, somehow inferior to files that are "non-flat." Probably really means "single file", sort of like how certain domesticated animals walk. Pid Pseudo-identifier. The number associated with a terminal session (required for the pun). Relational A guy named Al, probably your brother-in-law; a vague connection that is somehow implied by it's very existence, as in Al's case. Database Data. The "base" part seems to be a throwback to the days when there was base (important) data, and then a bunch of extra stuff floating around that was not as important as the rest of it (my data versus your data). Consists of one or more files (rids). Rid A MAPPER Report-identifier. Can be just about anything, including files, tables, areas, tuples, indexes, domains, etc., etc., etc. An abbreviation for "Rob, the IDiot." ---------------------------------------------------------------- Most people seem to believe that a relational database somehow magically connects all kinds of data items. According to the two definitions quoted above, this is not far from the truth. We don't care how those connections are made: we just want to use them to get some work done. "Computer, gimme a list of all the shoes of style X sold in Austin, Tx. between the beginning of time and next week and compare that to every other shoe style sold everywhere else in the known universe" is a request that could evoke relationships among various data groups. Notice that the above request started with the word "computer," as if hardware and software were somehow one entity. This may also be part of the problem. I call it the "Scotty Syndrome". In the real world, hardware and software are two distinct entities that just happen to be located in approximately the same physical space, the "computer". A relational database is simply a bunch of data on a storage medium (hardware). It's the relational database management system (RDMS) that runs the show (software). To complicate things further there is a relational database machine. To me, that refers to a computer that is dedicated to running a particular RDMS, possibly to the exclusion of anything else. The hardware and software may or may not have been engineered to maximize throughput, so you've got good ones and bad ones. Because there are RDMS packages available to run on just about anything, "relational database machine" seems a bit moot. They all get relational at some time or another. For the sake of argument, let's try to establish some meaningful definitions to work with (forget Figure 1), and cloud the issue even more. Let's say that a relational database consists of multiple reports (files, areas), that each report consists of one or more lines (records, rows) of data, and that each line consists of one or more data items (fields, columns). Let's say that a relational database management system allows the association of data items which may occur on different line types in different reports (see Figure 2). ---------------------------------------------------------------- FIGURE 2 -------- Generic Relational Model MAPPER Model ------------------------ ------------------------ Table 1...n Report 1...n ------- -------- row 1 ccccccc line 1 cccccccc row 2 ooooooo line 2 oooooooo row 3 lllllll line 3 llllllll row 4 uuuuuuu line 4 uuuuuuuu row 5 mmmmmmm line 5 mmmmmmmm row 6 nnnnnnn line 6 nnnnnnnn row 7 1234567 line 7 12345678 . . . . row n line n ---------------------------------------------------------------- There isn't really any difference in the two models. Again, simply one of semantics. You call it tomato, I spell it potato... You see the problems with a 3-by-5 card system yet, Mr. Vice- president (*see above)? Most RDMS packages allow only one way to connect data, via tables or indexes. This cross-reference mechanism is built in and is not subject to modification - you just use it. This can be part of the appeal of an RDMS. A lot of the "code" is built in. You simply tell it what data needs to be connected when, a task not always as easy as it sounds. Why do you think that they have a database administrator? And don't believe that once the database is defined that all the work is over. It has only just begun. Back to the old "flat file" argument. I could not find a definition, so the phrase is probably obsolete, anyway. Again, I believe it is used to mean one file, all records in sequential order. That sounds similar to a MAPPER rid, but certainly not MAPPER's entire file structure. MAPPER can support hundreds of "super files", with each super file supporting thousands of report files (rids). Just because something may happen to be a version of an element of a program file doesn't make it any less a file in it's own right. At TDHS we currently run three MAPPERs. One of those MAPPERs supports about 262 cabinet pairs. Taking it to the theoretical limits, if each cabinet were filled up (16,000 reports - 8 drawers per cabinet X 2,000 reports per drawer), we would have 4,192,000 reports. If each report contained the maximum of 131,071 lines, we would have 549,449,632,000 (almost 550 BILLION) lines of data. If every report were 256 characters, we would have 1.4058291e14 characters of data! String that number across the galaxy a few times. And I could always add more cabinets. Now, if you can't see the need to be able to relate some data sometime somewhere, sheesh! The particular application drives the database structure in MAPPER. If you only need a simple list, that's all you get. No fancy relationships required. If, however, you need a truly relational database, with tons of tuples to table, MAPPER can do that, too. So, you might not be arguing about whether or not a MAPPER database is relational, but really whether or not MAPPER is a relational database management system. How can you have one without the other? The Match function alone would qualify MAPPER for RDMS status. After all, everything I've heard about RDMS seems to hover around matching and merging disparate data, something done every day in MAPPER, worldwide. ----------------------------------------------------------------- Rob Haeuser has over 17 years of Data Processing experience. He has been the MAPPER Coordinator for the Texas Department of Human Services since 1983. Covering MAPPER topics ranging from technical to tacky, his never-ending quest is for truth, justice, and the MAPPER way. This is Article number 26.