Web Projects #1: Background and Motivation

In the summer of 2007 I accepted a one-year PA-ship as editorial assistant at an academic journal; it was my second such job, and my previous experience certainly aided in getting the job. Where this position differed from the previous was, among other things, in a non-editorial task I’d been assigned and for which I was hired: improving our office database. This as well as a previous professional project as well as personal projects, along with their complications, are described below.

“Database” in this first instance refers to a FileMaker Pro setup my predecessor designed, and for what it was and compared to what it replaced—manilla folders—it was successful. As it was a journal database, it had to keep track of two main fields of interest: article submissions and books for review. Article submissions had to be tracked through the submission process (from submission, being looked at by the editor, sent out for peer review, and so on) and keep track of the article title, its areas of focus (region and subject), it author(s), author contact information, etc. When it came to books we had to keep track of books sent to us for review, books we ordered for review, where books were (in our office, with a book review editor, under review and at the reviewer, `killed’ and donated to a library, etc.), review status (if a book were under review), and so on.

There were several immediately obvious limitations to the implementation I was handed. Most glaring was the proliferation of “author” or “editor” type fields for books. There was a single, giant “books” table that had four author fields; if a book actually had more than four authors and/or editors, one would have to fall back on “et. al” or input multiple authors per field. There was no real convention for distinguishing authors and editors (or illustrators, translators). To query an author or editor meant a rather blunt querying of all author/editor fields rather than a simple search of the first field. Obviously—obviously, I say with a hint of arrogance—there should have been a separate “People” table or the like that could be linked to books in a many-to-many fashion. Editors vs. authors should be marked. And a simple people table would allow us to link authors of article submissions to authors/editors of books. Likewise a table of article reviewers could be linked to these “people,” and so on.

A similar problem set was encountered years earlier in my first editorial assistant position, though the differences are manifold. First I was not tasked there with implementing a new or improved setup. Secondly the solution we had was considerably more fragile and ill-suited to the task than the FileMaker Pro database. Furthermore we actually had several databases there, one of which was a reasonably designed Access setup to which I had access for querying but which I was not to modify. And on top of that I had my own copy of a related but not synced database that I did have to manage; it, too, was in Access. Our article submis were logged with the computer; Word Perfect templates were populated from an addressbook and physical folders were prepared, but no comprehensive electronic database of the article submissions or the review process was maintained. The latter was handled via paper and email by the main editor even though I prepared the folders and manuscripts for him. Also handled through the Word Perfect addressbook was our book review process and said addressbook was our database. In it we kept not just contact information for (potential) reviewers but also, in a modified sheet/layer with custom fields, the books themselves. Review status was maintained there and exported to Word Perfect templates for our perusal. The first Access database was not highly complex but did have several large tables with an excess of fields; it was a rather denormalized enterprise. The second Access database, which I updated extensively each summer, was accessed through a number of relatively easy-to-use forms. They connected data related to (a) universities (in the U.S. and Canada), (b) relevant departments (modern or foreign languages, etc.), and (c) faculty employed in those departments. It was not properly normalized, but application-level safeguards were in place to main integrity as best as possible; we still had to proof again submitted print records.

None of these databases at the first journal were online. Worse, the book review(s) database was fragile and tied to a particular version of Word Perfect; attempting to convert it always ended in tears. The Access databases did not like to “play nice” on the network; as a result, all the databases were tied to the machines they were on. Frequently the book review editor would ask me to run queries in the main Access database (against research, teaching, and publishing interests) to find suitable potential new book reviewers; still, our actual book information (and our publisher information) was held in an addressbook.

It made certain sense to me that—at each project—all the related databases should (a) be in one database, (b) should be held in an at least somewhat normalized RDBMS, (c) and that by doing these two things we could gain functionality and the ability to access said databases from multiple locations. An editor would not have to come to me to ask who did such and such sort of work, for example; he or she could do it alone. We could more easily track all the publications from a given publisher, or, if we linked our Access and addressbook data, all publications in a given time frame from people in a certain department. In fact it seems obvious to anyone who has worked with SQLite, MySQL, PostgreSQL, and the like in almost any sort of capacity that such matters of data manipulation are trivial, but for us they were a chore.

In my free time during the first editorial job I thought about creating a webapp for our databases. I queried friends in IT. I ran the local Linux group at the time but was only just then becoming acquainted with RDBMSs. Before the second editorial stint I was interested in such apps for my own data, for content management, and the like. I had already admined Scoop boxen and was generally familiar with LAMP. During the second editorial run designing such a solution was part of my job. But it had also opened up new questions that related to other interests of mine.

A simple one goes as follows: how do with deal with people and names in a database? At the first journal it was not much of an issue. Most of the people we tracked were from an Anglo-American or at least Western European cultural background. They fell into two categories: acknowledged faculty in the U.S. and Canada as well as scholars in North American and Europe (mainly Germany). I’d already lived in Hungary, my first exposure to a culture in which the family name is usually presented before the given name, but while I was familiar with that variation, it did not cause us much pause. Enter the second journal, focused on Asia and Asian studies … suddenly the order of family name and given name was relevant. Furthermore to speakers not versed with a given language/culture, it can be difficult at times to determine which is the family name, which the given. For a journal this is important for at least two reasons. First we needed to send correspondence, and so had to be able to address “Mr. So-and-so,” “Ms. So-and-so,” “Prof. So-and-so” and so on. Beyond that we needed “proper” alphabetizing for our bibliographic records. If only only refers to someone by his or her family name (e.g. “Mr. Smith”) then dealing with the order of given and family names is irrelevant. But it is important to distinguish “Hello, this is John Smith” from “Hello, this is Kovács László.” One could have a field in the database to mark whether, in “normal” situations, the family name came first or last, and in the web interface, as is frequently the case, one can translate the family name to all capital letters to mark it. But this does not provide a general solution: what about those (e.g. from Iceland) who may have only a “given” name and no surname? Do we distinguish family names from patronymics or matronymics? “Middle names” are culturally specific; do we keep a “middle name” (or initial) field in the database and just cope with a large number of BLANK (if not NULL) fields? And so on.

This is not a solution: the above is just what got me started thinking about the problem. And if we were keeping track of “people” (who were in our database) in general, should we not implement a couple other tables as a generic addressbook (since we already had most of the data)?

And at home I was interested in related but not identical problems, mainly bibliographic in nature. Evenings were spent considering bibliographic software for my dissertation; existing solutions—free or otherwise—tended to be poorly modeled in my view. In addition to my academic references I had my own personal library to think about, and, by extension, my collection of comic books. Or magazines. And so on. This led me to FRBR, about which I’ll write at a later date.

To summarize: what got me into the web projects that currently consume were bibliographic and and person databases for academic journals and my own projects. Proper data modeling and data integrity rather than query speed or time to market were my main concerns, even though, with the journal project being “on time” and “good enough” was important.

Next: Language Choice

About Steve

47 and counting.
This entry was posted in Code and tagged , . Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *