The History of PostgreSQL Development
Copyright © 1999 Bruce Momjian
PostgreSQL is the most advanced open-source database server. It is an Object-Relational database (ORDBMS), and is supported by a team of Internet developers. PostgreSQL began as Ingres, developed at the University of California at Berkeley (1977-1985). The Ingres code was taken and enhanced by Relational Technologies/Ingres Corporation, which produced one of the first commercially successful relational database servers. (Ingres Corp. was later purchased by Computer Associates.) Also at Berkeley, Michael Stonebraker led a team to develop an object-relational database server called Postgres (1986-1994). The Postgres code was taken by Illustra and developed into a commercial product. (Illustra was later purchased by Informix and integrated into Informix's Universal Server.) Two Berkeley graduate students, Jolly Chen and Andrew Yu, added SQL capabilities to Postgres, and called it Postgres95 (1994-1995). They left Berkeley, but Jolly continued maintaining Postgres95, which had an active mailing list.
In the summer of 1996, it became clear that the demand for an open-source SQL database server was great, and that a team should be formed to continue development. Marc G. Fournier, in Toronto, Canada, offered to host the mailing list and provide a server to host the source tree. The 1,000 mailing list subscribers were moved to the new list, and a server was configured, with a few people given login accounts to apply patches to the source tree using CVS.
Jolly Chen had stated, "This project needs a few people with lots of time, not many people with a little time." With 250,000 lines of C code, we understood what he meant. In the early days, there were four major people involved: Marc; Thomas Lockhart in Pasadena, California; Vadim Mikheev in Krasnoyarsk, Russia; and myself. We all had full-time jobs, so we were doing this in our spare time. It certainly was a challenge.
Our first goal was to scour the old mailing list, evaluating patches that had been posted to fix various problems. The system was quite fragile then, and not easily understood. During the first six months of development, there was fear that a patch would break the system, and we would be unable to correct the problem. Many bug reports had us scratching our heads, trying to figure out not only what was wrong, but how the system even performed many functions.
We inherited a huge installed base. A typical bug report was, "When I do this, it crashes the database back-end." We had a whole list of them. It became clear that some organization was needed. Most bug reports required significant research to fix, and many were duplicates, so our TODO list reported every buggy SQL query. It helped us identify our bugs, and made users aware of them too, cutting down on duplicate bug reports. We had many eager developers, but the learning curve in understanding how the back-end worked was significant. Many developers got involved in the edges of the source code, like language interfaces or database tools, where things were easier to understand. Other developers focused on specific problem queries, trying to locate the source of the bug. It was amazing to see that many bugs were fixed with just one line of C code. Postgres had evolved in an academic environment, and had not been exposed to the full spectrum of real-world queries. During that time, there was talk of adding features, but the instability of the system made bug fixing our major focus.
We changed our name from Postgres95 to PostgreSQL. It's a mouthful, but it touts our SQL capabilities. We started distributing our source tree using sup, which allowed people to keep up-to-date copies of the development tree without downloading a whole tarball. We later switched to remote CVS.
Releases came every 3-5 months. Each cycle consisted of 2-3 months of development, one month of beta testing, a major release, and a few weeks to issue subreleases to correct serious bugs. We were never tempted to adopt a more aggressive schedule with more releases. A database server is not like a word processor or a game, where you can easily restart it if there is a problem. Databases are multi-user, and lock user data inside our servers, so we have to be very careful that released software is as reliable as possible.
Development of source code of this scale and complexity is not for the novice. We had trouble getting developers interested in a project with such a steep learning curve. However, our civilized atmosphere, and our improved reliability and performance, finally helped attract the experienced talent we needed.
Getting our developers the knowledge they needed to assist with PostgreSQL was clearly a priority. We had a TODO list that outlined what needed to be done, but with 250,000 lines of code, taking on any TODO item was a major project. We realized developer education would pay major benefits in helping people get started. We wrote a flowchart of the back-end modules, outlining the purpose of each. We wrote a developers' FAQ to answer some of the common questions of PostgreSQL developers. With this, developers became productive much more quickly.
The source code we inherited from Berkeley was very modular, but suffered from bit rot, and some Berkeley coders hadn't understood the proper way to handle certain tasks. Their coding styles were also quite varied. We wrote a tool to format/indent the entire source tree in a consistent manner. We wrote a script to find functions that could be marked as static, and never-called functions that could be removed completely. Both are run just before each release. A release checklist reminds us of the things that have to be changed for each release.
As we gained knowledge of the code, we became able to perform more complicated fixes and feature additions. We started to redesign poorly structured code. We moved into a mode where each release had major features, instead of just fixes for previous bugs. We improved SQL conformance, added sub-selects, improved locking, and added major missing SQL functionality. We added commercial-style phone support for those who needed it.
The Usenet discussion group archives started touting us. In the previous year, we had searched for PostgreSQL, and found that many people were recommending other databases, even though we were addressing user concerns as rapidly as possible. One year later, many people were recommending us to users who needed transaction support, complex queries, commercial-grade SQL support, complex data types, and reliability. Other databases were recommended when speed was the overriding concern. This more clearly portrayed our strengths. Red Hat's shipment of PostgreSQL as part of its Linux distribution quickly multiplied our user base.
Every release is now a major improvement over the last. Our upcoming 6.5 release marks the development team's final mastery of the source code we inherited from Berkeley. Finally, every code module is understood by at least one development team member. We are now easily adding major features, thanks to the increasing size and experience of our world-wide development team. Like most open-source projects, we don't know how many people are using our software, but our increased functionality, visibility and mailing list traffic clearly point to continued growth for PostgreSQL.