PostgreSQL: Taking E-Business Up a Notch
By Ned Lilly
Original article : http://www.webtechniques.com/archives/2001/01/lilly/
Database management systems are central to Web-based businesses, constituting an .1 billion market in 1999, according to the International Data Corporation (IDC). Large proprietary vendors currently rule the marketplace, with Oracle, IBM, and Microsoft capturing more than 70 percent of sales last year. But open-source database systems are steadily growing more powerful, exhibiting performance and functionality that will soon equal that of the proprietary giants. Each of the three leading open-source database programs—PostgreSQL, InterBase, and MySQL—has an established user base and is well suited to specific business computing tasks. But PostgreSQL is arguably the most advanced relational database of the open-source options, and the best positioned to challenge the supremacy of the commercial leaders.
The Major Players
As Web-based companies begin to need more full-featured, powerful, and flexible technology systems, many are looking more closely at open-source options. From their operating systems and Web servers to their database management systems, a growing number of businesses find that open-source applications are not only ideally suited to e-commerce, but also provide welcome alternatives to the exorbitant licensing fees and slow development cycles that typify proprietary applications.
Of the open-source database options, PostgreSQL is the most robust, object-relational database management system. PostgreSQL, also known as Postgres, has been open source since its inception and evolves quickly due to the committed, active community of developers behind it. From one release to the next, Postgres adds major features within a matter of months, equivalent to upgrades that commercial software companies take years to introduce. The unprecedented rate of revision stems from hundreds of loyal and enthusiastic developers and users who brainstorm fixes and test the latest tweaks, quickly reporting results from all kinds of applications, configurations, and platforms. Postgres' latest version, 7.0, is close to clearing the final hurdles to widespread adoption by enterprise-level businesses. Its next iteration, 7.1, will likely propel it into a position that seriously threatens the reigning commercial giants. The license under which the software is distributed, which hails from its roots at the University of California at Berkeley, is among the simplest and least constricting of open source licenses—furthering its implicit appeal to business users.
Another contender in the field is InterBase, a mature relational database that started out as a closed-source application. In a bid for skilled development talent and a broader user base, Inprise Corporation released the InterBase 6.0 database program under a variation of the open-source Mozilla Public License for Linux, Solaris, and Windows versions. The new release added more advanced ANSI SQL compliance and improved administration and customization tools. With few concurrent users, InterBase is fast on simple reads and complex joins, but its performance drops sharply under the stress of multiple queries and numerous concurrent users. Because InterBase uses a nonshared architecture, as user numbers increase, it must parcel data into ever-smaller partitions, diminishing its performance levels. This major flaw will likely be addressed if InterBase attracts skilled open-source developers. However, the program is new to the open-source world and still lacks the support of a strong developer community.
MySQL is a fundamentally different product from Postgres or InterBase, with different strengths and primary uses. It's a basic, stripped-down database that quickly serves up data to limited numbers of users. Its fast read performance, along with easy integration with Web scripting languages such as Perl and PHP, make it a favorite among Webmasters. MySQL is well suited to processing simple data on Web sites and a popular choice for building fairly low-traffic sites. The project recently moved from a rather constrained licensing scheme to the full GNU Public License (GPL) favored by many open-source projects, and has started to add elements of other existing software to address the feature gap it faces with PostgreSQL and InterBase.
Back to Berkeley: A History of PostgreSQL
PostgreSQL's earliest ancestor was Ingres, developed at the University of California at Berkeley (1977 to 1985). The Ingres code was later enhanced by Relational Technologies, which subsequently became Ingres Corporation, producer of one of the first commercially successful relational database servers. Michael Stonebraker, also at UC Berkeley, led the development of the object-relational database server called Postgres (1986 to 1994). Two Berkeley graduate students, Jolly Chen and Andrew Yu, later added SQL capabilities to Postgres. The resulting project was called Postgres95 (1994 to 1995). Users on the free program's expanding mailing list continued to send in bug fixes and enhancements.
By the summer of 1996, in response to the demand for an open-source SQL database, a team formed to continue Postgres' development. Mark Fournier of Toronto agreed to host the mailing list and provide a server for the source tree, and three other developers also became heavily involved in their spare time. With a thousand subscribers on the mailing list, a server was configured that gave a few developers login accounts to apply patches to the source code. Later that year, the system's name was changed from Postgres95 to PostgreSQL to honor its Berkeley origins and the program's SQL capabilities. Initially, the project had trouble attracting skilled developers, as the code is well structured but highly complex. Eventually, the project's challenging nature, and the core team's collegiality and well-run organization began to attract experienced developers. A passionate, dedicated community of Postgres users, problem-solvers and developers was born and continues to thrive. (The project's online headquarters is located at PostgreSQL.org).
A Closer Look at PostgreSQL
As of its latest release, PostgreSQL is fully ANSI-SQL-92 entry level compliant, supporting most SQL constructs, including transactions, sub-selects, and user-defined types and functions. Standards compliance is critical for any open-source database's acceptance—and a way in which open-source technologies can further differentiate themselves from their closed, proprietary competitors. Corporate infrastructures, often managed by several generations of IT staff, must be able to interoperate with current code and data sources. PostgreSQL supports a range of open-source and proprietary operating systems, including Linux, the various Berkeley Software Development (BSD) systems, Solaris, HP-UX, AIX, Irix, and Windows NT/2000. Already, PostgreSQL provides a powerful technical infrastructure for a wide range of companies that need Web sites, intranets, extranets, and e-commerce applications to conduct their business.
Postgres improved significantly from v6.5 to v7.0. Most of its enhancements greatly reduced or eliminated limitations in the previous version. The new release has more than 70 bug fixes, 100 enhancements and over 30 performance improvements, many of which are transparent to the end user, but enormously significant to developers and database administrators—and illustrative of the growing quality gap between PostgreSQL and its slower-moving closed-source competitors. Some of those enhancements include:
Foreign keys: These are now implemented, with the exception of PARTIAL MATCH foreign keys.
Optimizer: The optimizer offers improved query execution and performance with less memory usage.
Updated psql: The interactive terminal monitor, psql, has new features.
Date/time data types cleanup: The date/time data types now comply with the SQL standard, replacing partially implemented SQL date/time types with full-featured implementations. The default display format for date/time data has also changed to ISO (International Organization for Standardization) style.
Query length limits: These have been removed. Fixed length limits of query strings have been eliminated. (The block-size limit on the length of a stored row still exists, but that problem has already been eliminated in current development sources for version 7.1.)
Eight-argument limit: This has been removed on index keys and functions. The maximum number of keys in an index or arguments to a function is now configurable, with a default limit of 16, rather than the former hard-coded limit of 8.
Sorts and hashes: These now work for more than 2GB of data. Temporary files can now be split in the same way as oversize relations, so that data volume is limited only by available disk space and not by the size of an individual file.
With 7.1, Postgres becomes a serious contender, eliminating major functional barriers to the enterprise business arena. Until now, its larger-scale use has been impeded by concerns about its flexibility, data integrity, and storage capacity for Web site data. Postgres is the first open-source database to support more exotic, custom data types and analysis modes—three-dimensional geometric data displays, for example, for geospatial information used in cartography applications. To ensure data can't be lost, Postgres' next version will include write-ahead or redo logging, a safety net for lost data, and a major requirement for "mission critical" data management. It will also remove the block-size limit on rows referenced above, and include support for OUTER JOIN statements.
PostgreSQL at Work in the Real World
The majority of open-source database users are Web and software developers, small to mid-size e-businesses, systems integrators, and value-added resellers (VARs). These businesses integrate the databases with other open-source applications, yet many open-source databases can also run on proprietary platforms.
Wireless Developer Network (wirelessdevnet.com), an online portal for software developers that focuses on wireless communication technologies, runs 12 servers with PostgreSQL, Red Hat Linux, Apache Web servers, and PHP scripting. The network uses PostgreSQL 7.0 for its content management and to drive e-commerce applications such as book sales, message boards, mailing lists, and software sharing. The network and the GeoCommunity (geocomm.com), its sister site that runs on the same platform, services 150,000 visitors each month. The sites haven't experienced any crashes or data loss, according to the network's managing editor Bryan Morgan.
The network and GeoCommunity originally ran their systems on Oracle and Microsoft platforms. Morgan said he encountered performance and security problems with Microsoft IIS, and he also found that he didn't need all the bells and whistles—or the high licensing fees—that came with Oracle. When the network and GeoCommunity decided to use a single platform, they chose to create a fully open-source system. Morgan reports that Red Hat Linux 6.2 and PHP integrate seamlessly with Postgres.
Ready Set Net (readysetnet.com), a Web development company that creates technology systems and offers Web-hosting services to Fortune 5000 customers, provides good examples of how well open-source products serve a great variety of clients. The company's customers range from restaurant chain California Pizza Kitchen, and industrial leader Nordisk Aviation to national retail giant Just Sports. Reportedly, 98 percent of Ready Set Net's customers who depend on database-enabled Web sites run on systems built on a PostgreSQL 7.0 database platform, with Slackware Linux 7.0 operating systems. Company cofounder and director of technology Gavin Roy says that Postgres handles millions of requests on its network of sites daily.
Time for a Change
Web businesses demand increasingly complex, mission-critical applications of their databases, from supply-chain and customer relationship management (CRM) to data mining and analysis, along with a wide range of e-commerce functions. As these businesses grow and their needs expand, many find closed-source, proprietary development is neither flexible nor dynamic enough to keep pace with their needs. Open-source databases offer businesses greater control over their own software, and let them adapt its features to suit their business models' fluid demands.
So many businesses are waking up to open source's potential that Forrester Research recently predicted open-source products will radically reshape the software industry by 2004. The Internet grew up in an open-source environment, so it's no great surprise that the open-source development model has spawned some of the best applications and tools for Web-based businesses. Forrester Research's analysts suggested that within a few short years, proprietary "captive" developers would no longer be able to compete against the "hordes of Internet-armed revolutionaries" that make up open-source development communities, suggesting the eventual collapse of the closed-source software development model. One of that model's leading proponents, Oracle president Larry Ellison, recently astounded an audience by saying, "choice is not necessarily a good thing" when it comes to configuring business solutions. The world's second-richest man concluded, "We would like every one of our customers to run the identical software configuration." The market for open-source databases on the Web is indeed ripe.