Open Source for the Enterprise
Posted by: 정재익 (advance)
Posted: 2002-08-12 11:25


 

Original source: http://www.extremetech.com/print_article/0,3998,a=9639,00.asp

 

July 26, 2001

By: Jim Hurd

 

Apache has captured 62% of the market share for Web servers per Netcraft. Linux has captured 24% of the server market. GCC has been ported to virtually every operating system in existence. So when are you going to write your first program on an Open Source platform?

 

If you already know how to write enterprise class software for proprietary operating systems like Windows or Solaris, it's easy to make the jump to Open Source platforms. And everything you need is free--including this very valuable article :-).

 

Having been primarily a Windows developer for most of my career, my most recent project required software that could run comfortably on Windows as well as Linux, BSD, and AIX. Our software uses a Web interface, running as a daemon on Unix and a service on Windows 2000. Read on to see how we manage this process and how you can take your next project to the world of Open Source.

 

Getting Started

 

First, you need to decide on your open source operating system of choice. The primary contenders are:

 

 

Open Source OS of Choice

Linux: Linux is the most widely supported, and is the most feature rich of any of the platforms. IBM has thrown its marketing muscle (and checkbook) behind Linux. Dell, Compaq, and Intel have all jumped on board.

The most widely supported distribution is Red Hat 6.2. Red Hat 7.1 contains many interesting advances but don't use it for development. The GCC 2.96 that comes with the 7.x distribution has serious flaws. Also SAP DB and other developer programs have trouble running correctly on 7.x. Don't say we didn't warn you.

 

There are many, many, other distributions. Debian, Suse, Mandrake, Corel, and Caldera come to mind. But it doesn't stop there. There are dozens more "full" distributions, and even more specialized distributions like the Linux Router Project (Linux firewall on a floppy). Then there are platform specific distributions like Compaq's Linux for IPAQ.

 

FreeBSD (http://www.freebsd.org): FreeBSD is the most widely used of the open source BSD operating systems. FreeBSD enjoys a well-deserved reputation for a fast, reliable TCP stack and no-downtime reliability. FreeBSD provides a Linux compatibility mode, which allows it to run a large number of Linux applications.

FreeBSD has become the open source operating system of choice at over 1000 ISP's. Yahoo runs on FreeBSD. Apple used FreeBSD technology in Darwin, the core of the new Mac OS X.

 

OpenBSD (http://www.openbsd.org): OpenBSD developers share a singular focus on security. OpenBSD is notable for installing with IPSEC, and for configuring in "lockdown" mode. This makes it a natural for firewall-like applications. OpenBSD developers are pushing hard to add hardware-supported cryptography as well. Like FreeBSD, Linux compatibility is provided.

 

NetBSD (http://www.netbsd.org): NetBSD developers take pride in running on everything. While it runs just fine on your vanilla Intel, Alpha, and PowerPC boxes, you can also find NetBSD running on Vax, Amiga, and DreamCast! Portability is big with the NetBSD folks. If your application requires a computing platform that's off the beaten track, NetBSD might be just the ticket. Linux compatibility is provided.

 

Debian/GNU Hurd (http://www.debian.org): Hurd is the GNU microkernel project. Starting with the Mach kernel designed at Carnegie Mellon, Debian developers are stitching together a full distribution. Microkernels are designed to get away from a basic problem in Unix-like architectures: To do anything fun you need the root password. Mach is designed to let normal users do things that only root could do on Linux; that makes it a natural on shared ISP boxes. Right now Hurd is a bit rough, so it's probably not the best place to start.

 

Darwin (http://www.opensource.apple.com): Apple's Darwin draws heavily from FreeBSD, but replaces the FreeBSD kernel with its own Mach-based kernel. Darwin runs on Macintosh and x86 hardware, and supports the Macintosh file system as well as the standard BSD file systems. Programs written for x86 Darwin can be recompiled to run on Mac OS X.

 

Baseline Development Hardware

How often have you heard something like "Oh, I just run Linux on my old 16 MHz 386"? Don't believe it. If you wouldn't run Windows on it, I don't recommend you try to run Linux or BSD as a development workstation either. X Windows has a voracious appetite for memory and processor cycles, as do the GNU compilers. That goes for Ghostscript, Jade, Tex, and a host of other useful applications. Throw in a few VMWare virtual machines to do compatibility testing on other platforms, and you can see why I recommend the beefiest box you can lay your hands on for development purposes.

 

Software Development Environment Tools

VMWare is a worthwhile investment for any Windows developer diving into the world of Open Source. VMWare creates virtual machines on top of either Windows 2000 or Linux that allow the guest operating system to behave as if it were running on a standalone machine. While this may sound slow, the performance is very respectable. It is not an x86 interpreter; most guest code is executed at full machine speeds. As long as you have a few gigabytes free on your Windows 2000 development machine, you can easily run most of the Open Source operating systems without rebooting and without disrupting your normal development environment. Using VMWare, we can run Windows 2000, Linux, FreeBSD, OpenBSD, NetBSD, and Solaris on a single machine all at the same time without rebooting. Even if you only use one operating system, it's great for keeping multiple versions available. It's also great for testing network applications and firewall configurations since each virtual machine operates as an independent network node.

 


 

Next we recommend that you install Cygwin(http://sources.redhat.com/cygwin/) to pump up your Windows development platform. The Cygwin UNIX compatibility library provides a variety of Unix utilities recompiled to run on Windows. Cygwin remains the preferred way (or only way) to run many useful open source programs on Windows. In their latest triumph, the Cygwin group has managed to port XFree86 to Windows using Cygwin.

 

Even when considering that other utilities are available in Windows native form, Cygwin offers the advantage of grabbing the whole Unix banana in one install. For other tools you use frequently, you might find their native Windows versions to be more to your liking. We use the Windows native versions of CVS and Perl, for example.

 

What about using Cygwin for your own cross platform development? Be sure to factor Cygwin's GNU General Public License (GPL) into your decision. Under the GPL, all programs linked to Cygwin must themselves be available as open source under the GPL. Alternatively, you can purchase a commercial license from Red Hat that removes this restriction. We generally avoid using Cygwin for new development, since not all our clients want the source we write for them to be released under the GPL. Mozilla's NSPR (Netscape Portable Runtime) meets our cross platform needs and carries much less restrictive licensing terms.

 

The source control program used by the vast majority of open source developers is CVS (Concurrent Versions System). If your company does not already require a different source management system, you should consider using CVS. Just the fact that CVS is available everywhere, typically pre-installed on most distributions, is a big win. Also, the CVS client-server protocol works well even on slow connections. Finally, you will need to use CVS to coordinate with other Open Source libraries, so using CVS means only having to remember one source control system.

 

Practically speaking, you need a Unix-based OS to run your own CVS server. The Windows client is solid, but the server is not easy to configure on Windows. If you don't want to dedicate a Linux server, VMWare works great to run CVS on Linux under Windows 2000.

 

C/C++ Rules

 

At Datagrove, we think C++ is the best available general purpose programming language for mission critical, enterprise class programming. Evidently we are not alone.

 

Considered separately, C and C++ are the #1 and #2 languages of Open Source. Taken together, they dwarf the next closest language (Java). For all practical purposes C and C++ are the same language with different programming conventions. Similarly, for all practical purposes C is a subset of C++ (see the "C vs C++" section). Virtually every C programmer knows C++ (but chooses not to use its features), while every C++ programmer by language definition knows C. The same compilers are used to compile both languages. We will typically use C++ to refer to both C and C++. While this is not quite right, it looks better than writing C/C++ all the time.

 

Taken together C and C++ account for 44% of all SourceForge hosted Open Source projects. Among the top 25 projects on SourceForge, C and C++ account for 88% of the projects with Java as the only other language cracking the top 25.

 

Language Share on Top 25 Projects: C++ Dominates

    Language      Share
    C++           88%
    Java          12%

 

Not coincidentally, C++ is an international standard controlled by no single corporation. The GNU C++ compiler itself remains a centerpiece of the open source community. GNU C++ incorporates the collective

 


 

Language Breakdown

Languages with 1% or greater share of projects on SourceForge

    Language        Projects    Share
    C               4646        24%
    C++             3837        20%
    Java            2557        13%
    Perl            2026        10%
    PHP             1885        10%
    Python           876         5%
    Other            498         3%
    Unix Shell       421         2%
    Assembly         374         2%
    Visual Basic     358         2%
    Tcl              308         2%
    JavaScript       241         1%
    Delphi/Kylix     228         1%
    PL/SQL           222         1%
    Lisp             116         1%
    Pascal           107         1%
    ASP              105         1%

 

Java: #2 Has to Try Harder

 

The Open Source community has yet to fully embrace Java. Open Source luminaries Bob Young and Linus Torvalds have repeatedly chided Sun for keeping Java proprietary. Sun only recently recognized Linux, and still does not officially support any open BSD variant. Sun supports only x86 architectures on Linux, leaving the rest of the Linux world to their own devices. As a result, open source Java programmers are split among four incompatible implementations: Sun/Blackdown, IBM, Kaffe, and GCC. No wonder Linux and especially BSD are widely viewed as disadvantaged Java platforms.

 

In spite of these disadvantages, Java is still second only to C++ in popularity. It is the only language other than C and C++ to crack the top 25; Freenet(http://freenet.sourceforge.net/), at #3, is the top Java project by number of downloads. Java also has slightly more SourceForge projects than Perl, although that comparison is probably somewhat deceptive: Perl's CPAN pre-dates SourceForge, and many of the most popular Open Source Perl modules on CPAN are not hosted on SourceForge. Perl's popularity is a bit harder to measure because CPAN is widely mirrored. But looking at paying Linux jobs on dice.com also reinforces Java's Open Source popularity. Java occupies the second slot behind C++ and just ahead of Perl.

 


 

Open Source programmers have created two of the most interesting implementations of Java: GCJ and Kaffe. GCJ is distributed by the Free Software Foundation as part of the GNU Compiler Collection (GCC). Kaffe was a project begun by Tim Wilkinson to create a clean room implementation of the Java VM from the specifications.

 

Unlike Sun's Java implementation, GCJ compiles directly to native code. While this has obvious performance benefits, it has serious compatibility problems. Sun does not openly license source for its core libraries. The GCJ implementers are left with two unpalatable choices: reinvent the libraries, or compile the VM code. They have tried both, but neither has been completely effective. The sheer mass of Java libraries and the rate at which they change makes it impractical to reinvent them in lockstep. Compiling the VM code has grown increasingly more difficult as Sun has added features to the VM that are not easy to implement in a timely manner. From the GCJ FAQ:

 

 

To make things worse, you can't simply run Sun's JDK classes on any old JVM--they assume that a bunch of native methods are also defined. Since this native method requirement isn't defined by the JDK specs, you're effectively constrained to using Sun's JVMs if you want to use Sun's JDK libraries. Oh yes--you could also reimplement all of those native methods yourself, and make sure they behave exactly as Sun's do. Note that they're undocumented!

A hidden value in the GCJ project is proving the value of the Boehm conservative garbage collector. The conservative garbage collection concept was invented to add garbage collection to C and C++, but it has had little success in winning converts in that community. However, Bjarne Stroustrup, C++ inventor, has made positive comments about the Boehm collector for C++. Still C++ programmers have stuck to their destructors. Since Java requires a garbage collector, the GCJ team chose the Boehm collector. Hopefully the GCC team will see fit to integrate and test that capability in their C++ compilers in the near future.

 

The Kaffe project has created the most popular Open Source version of the Java VM. It implements the VM in the traditional way: interpreter + JIT compiler. The JIT compiler is modular--most Kaffe ports do not try to implement the JIT. Even with the JIT, Kaffe is not generally competitive in performance to the Sun-derived VM's. And it suffers from the same compatibility issues as GCJ--many of Sun's libraries won't run on it. Kaffe is GPL'd, which is among the most restrictive of Open Source licenses. The Japhar project was started to provide an LGPL alternative to Kaffe. The Japhar VM has no JIT, and is generally behind Kaffe in compatibility and providing ports.

 

Perl: The Scripting King

 

Perl(http://www.perl.com/) was the original Open Source scripting language. Perl burst on the Unix scene at a time when System Administrators struggled with multiple incompatible shell scripting languages. Perl fit that ecological niche so well, that people began trying it in other environments. Whenever it couldn't quite do the necessary job, someone would write a Perl module and contribute it to the Perl community. So while Larry Wall has always been the sole creative force behind the Perl core language, most Perl development is a community effort. Perl programmers became a model for how an Open Source community should function. When the Web happened along, it was not initially populated by Web designers; it was engineers and system administrators communicating with each other. Naturally they used Perl for this task too.

 

Perl programs are easy to move among platforms, and Perl has a long history of integration with C++ projects. Perl's open source cross platform library support is second to none. At Datagrove we use Perl as a general-purpose problem solver. The best applications for Perl tend to be small programs, typically less than 1,000 lines, which are not especially performance sensitive. That gives it great synergy with C++ whose strengths tend to emerge in large scale, performance sensitive programs.

 

PHP: Open Source ASP

 

According to its home page, PHP(http://www.php.net/) is a server-side, cross-platform, HTML embedded scripting language. PHP is the one trick pony of Open Source--It only does Web servers. But this single focus, and its association with the Apache project, have allowed it to achieve an amazing market share and mindshare in just a short period of time.

 

PHP has two major assets: It has a very low barrier to entry, and it has an absolutely killer library. The nature of PHP is such that you typically add a few lines of code to a static site to accomplish some specific thing that you can't do with just HTML or client side Javascript. Then you add a few more, and a few more, and voila! You have one of the two million PHP sites in the world.

 

The library continues to grow at an impressive clip. PDF creation, native database drivers for a dozen different database formats, compression functions, text searching, and on and on. The huge standard library keeps most PHP programs short, and allows tiny PHP fragments to be swapped among Web designers like Pokemon cards.

 

PHP for non-programmers

 

No doubt one reason for PHP's popularity is its low barrier to entry. Web designers can learn just a few useful tricks that can take them a long way.

 

One continual bane of Web designers is the subtle differences between browsers. It's increasingly difficult to create a top tier Web design without tweaking the page for some browser idiosyncrasy. PHP gives the Web designer a quick and easy way to tweak parts of the page for everyone's favorite browser:

 


<?php if (strpos($_SERVER['HTTP_USER_AGENT'], 'MSIE') !== false) { ?>
You are using Internet Explorer
<?php } else { ?>
You are not using Internet Explorer
<?php } ?>

 

Python: The Language of Zope

 

Of course it is unfair to characterize Python(http://www.python.org/) solely for its role in Zope(http://www.zope.org/). Python, like Perl, pre-dates the Internet Age. It began as Perl did--as a general purpose scripting language for System Administrators. But Zope has propelled Python to a wider Web developer audience than it would have enjoyed otherwise.

 

Like PHP, the focus of Zope is embedding code within otherwise static HTML. Unlike PHP, Zope provides the whole enchilada including the Web server, full text search, and a scripting language. You don't use Zope with Apache; Zope replaces Apache.

 

And you don't really need Zope to use Python. Python works well using either CGI or FastCGI. And Python is about more than just Web serving. Python makes it easy to create graphical user interfaces thanks to its link to TK (originally created for TCL). Combining TK and its link to Expect (another TCL staple), Python makes a great way to put a GUI face on character-based applications. With XFree86 finally making its way to Windows (as part of Cygwin), we may even see a resurgence of interest in non-Web GUI's.

 

Most Python programmers pick it just because they like Python. They like the aesthetics of the language--the way it looks. "Python fits in your brain", declares Bruce Eckel, an author noted for his C++ tutorials. It's not the fastest, and it doesn't offer the biggest library. To understand the appeal of Python you'll just have to try it.

 

C vs C++

 

Dedicated C compilers have pretty much gone the way of the Dodo bird. All widely used C compilers also compile C++. For the record, C and C++ are two different languages with two different ISO standardization committees. But for all practical purposes, C is a subset of C++.

 

Some developers like to stick to classic C. Most open source operating systems are written without using any of the extended features of C++. These groups will treat the use of any C++ feature as a bug to be eliminated.

 

Most developers tend to adopt a subset of C++ features that they feel comfortable with. A fundamental tenet of C++ design philosophy is "you don't pay for what you don't use". If you don't want to use templates, iostreams, dynamic type information, garbage collection, etc., you can take comfort in knowing that you aren't paying a run time cost for those features you aren't using. The Mozilla group typifies this class of C++ developer. Mozilla developers follow a strict set of guidelines that lets them avoid coding around the idiosyncrasies of C++ implementations on the platforms they support. The top five rules are:

 

Don't use C++ templates : http://www.mozilla.org/hacking/portable-cpp.html#dont_use_templates

Don't use static constructors : http://www.mozilla.org/hacking/portable-cpp.html#dont_use_static_constructors

Don't use exceptions : http://www.mozilla.org/hacking/portable-cpp.html#dont_use_exceptions

Don't use Run-time Type Information : http://www.mozilla.org/hacking/portable-cpp.html#dont_use_rtti

Don't use namespace facility : http://www.mozilla.org/hacking/portable-cpp.html#dont_use_namespace

 

Others like to use all the ISO C++ features. Boost is an informal community of programmers dedicated to advancing and supplementing the C++ standard library. Boost takes ISO compliance as a base requirement. While we have found Boost libraries can be a bit of trouble to incorporate into existing code and down level compilers, their "ISO or bust" mentality has helped them produce the most elegant library extensions around.

 

Still others will reach even further and take advantage of special extensions available only in one C++ implementation. C++ Builder applications, for example, require Borland specific extensions. GCC provides the "long long" type as a 64-bit integer, where Microsoft Visual C++ offers __int64 for the same purpose. With a bit of careful macro coding, you can often take advantage of proprietary features and still remain portable.
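
The macro coding described above might look like the following sketch. The names int64_type and seconds_to_micros are hypothetical and chosen only for illustration; _MSC_VER is the macro that Visual C++ predefines, and every other compiler is assumed here to accept "long long".

```cpp
#include <cassert>

// Pick a 64-bit integer type per compiler. int64_type is a hypothetical
// name for this sketch; neither compiler defines it.
#if defined(_MSC_VER)
typedef __int64 int64_type;      // Microsoft Visual C++ extension
#else
typedef long long int64_type;    // GCC extension (assumed for all other compilers)
#endif

// Hypothetical helper: the product would overflow a 32-bit int for
// inputs above roughly 2147 seconds, so 64 bits are needed.
inline int64_type seconds_to_micros(int64_type seconds)
{
    assert(seconds >= 0);        // this sketch only handles non-negative durations
    return seconds * 1000000;
}
```

The rest of the code then uses int64_type everywhere and never mentions either vendor's spelling directly, so the compiler-specific part stays in one header.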

 

Which Compiler?

 

The C++ implementation of choice in the open source world is GCC (GNU compiler collection). In fact there is very little to compete with it. While other compilers exist, the open source libraries available are rarely tested against anything but GCC. While we are not entirely enthusiastic about GCC, we also don't want to be in the role of bug testing the Boost and Mozilla libraries we use on other compilers. If you don't need third party libraries, you could certainly consider one of the commercial alternatives.

 

You can use GCC on Windows, but libraries are rarely tested against anything but Visual C++. So if you are committed to a strategy of open source code reuse, as we are, you quickly find that you have little choice other than to use Visual C++ on Windows, and GCC on everything else.

 

GCC (not counting the totally broken 2.96 version unique to Red Hat 7.0) makes a decent C++ compiler. While clearly more standards-compliant than Visual C++, GCC compilation times can range from long to torturous. GCC debuggers are adequately easy to use, and a godsend for remote support, but not nearly as nifty as Microsoft Developer Studio. The GNU tools are more powerful than their Microsoft counterparts in their ability to be custom programmed for task specialization--the Microsoft development environment is notoriously weak in this area.

 

Visual C++ has a number of annoying quirks, but it remains our favorite C++ compiler. Thanks primarily to its precompiled header option, Visual C++ compiles faster than GCC, actually several times faster on code that makes frequent use of templates. For us, the integration of the compiler, debugger, and editor in Microsoft Developer Studio trumps the combination of emacs, gdb, and gcc for ease of use and productivity.

 

The biggest problem with VC++ is its age. Released in 1998, it lacks many of the language changes made by the ISO standards committees. No doubt the next version (.NET) will make up some ground in this area. Like GCC, VC++ has a voracious appetite for memory and processor cycles.

 

For compilation speed and ease of debugging, I usually develop everything first using Microsoft Visual C++ and then port it to GCC on Unix. Typically the only thing I need to write specifically for Unix is the make file (Visual C++-generated make files are not usable on Unix). Other developers do the opposite; develop everything on Linux first, and then recompile it on Windows. We have not seen any problems with using both approaches on the same project.

 

Compatibility Quirks

 

Don't expect your C++ code to work across multiple platforms on your first try. Several minor nuances can trip you. Not only do you have differences in compilers to watch out for, you also have minor differences in the "standard" library as well. None of these are showstoppers, and you will quickly learn to write your code to be cross platform compatible on the first cut.

 

Now I don't want to sound negative about the C++ ISO committee. I'm grateful for their diligent work in pressing forward an international standard for C++. But I believe I am stating the obvious when I say that the majority of cross platform issues came about from changes the committee made to the language.

 

The one that seems to nail everyone is the for loop scoping.

 

Consider the loop:

 

        for( int i=0; i<10; i++ )

 

C++ versions prior to ISO treated i as though it were declared just before the loop. The ISO rules limit the scope of i to the loop.

 

One solution is to always declare your loops in the most banal way:

 

        int i=0;
        for( ; i<10; i++ )

 

I use this method because it is easy and it works. It just looks weird.

 

You can convince an old compiler to use the new scoping rules with a simple dirty trick:

 

#define for if (1) for

 

This captures the lifetime of the variable within the for loop, per ISO standard scoping rules. You must be careful in your placement of this macro within your code, however. It will break any code that relies on the old rules. And it won't always break them at compile time. For example, this code compiles under both sets of rules, but returns a different answer:

 

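#include <iostream>

int i = 0;   // global i

void foo()
{
   for (int i = 0; i < 10; i++);   // empty body; this i hides the global inside the loop
   // What value of i? ISO says 0 (the global), pre-ISO says 10 (the loop's i).
   std::cout << i << std::endl;
}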

 

Pity the compiler vendor faced with this subtle change. There is no way out. If you change to the new rules, you break existing code. But you have to change to the new rules, or you won't be standard. So, you have to add a switch to let the programmer pick between old rules and new rules. But now the programmer is in a pickle because he may be using one library that wants the old rules and one library that wants the new rules. The nature of templates and inline functions is such that both libraries will get compiled under the same compiler parameters. You could let it get set with a pragma, but by definition pragmas are not portable. Sheesh! Even Bjarne has said that the original for scoping was probably a mistake, but the cure in this case was worse than the disease. Whether you use the old rules or the new rules, you can expect, at the least, to be pummeled with well-meaning warning messages from the compiler. As long as you can look at for loops that lack an initialization statement and not retch, that seems to be the most pragmatic approach.

 

Microsoft defaults to pre-ISO scoping, while providing a switch to allow ISO scoping. The switch is rather useless, however, because the Microsoft library headers require pre-ISO scoping. GNU defaults to pre-ISO scoping with a warning message for code that would break with an ISO interpretation. One choice you have is turning off this warning message and continuing on in your pre-ISO ways.

 

There were numerous other ISO changes that cause cross platform disharmony: bool type, typename, and overload rule changes to name a few. The more compilers you use, the more of these issues you will bump up against.
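
As one illustration of these changes, here is a minimal sketch of the typename rule. The function count_elements is a hypothetical example, not taken from any library. ISO C++ requires the typename keyword in front of a type name that depends on a template parameter, and older compilers may not accept the keyword at all.

```cpp
#include <cstddef>
#include <vector>

// Counts the elements of any standard-style container by walking its
// iterators. Container::const_iterator depends on the template parameter,
// so ISO C++ requires "typename" in front of it.
template <class Container>
std::size_t count_elements(const Container& c)
{
    std::size_t n = 0;
    for (typename Container::const_iterator it = c.begin(); it != c.end(); ++it)
        ++n;
    return n;
}
```

Code like this is a typical place where a second compiler reports an error that the first one let slide.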

 

Special GCC Quirks

 

GCC 3.0 (released June 18, 2001) offers a very high level of ISO C++ standards compliance and advanced code optimization. Unfortunately, fixes for some of the quirks of previous compilers, combined with a few new-release bugs of its own, make GCC 3.0 a bit difficult to use with large libraries (e.g. Mozilla, STLPort, ssh) at the time of this writing. No doubt the GCC and library communities will work through these issues in due course, but for the next few months GCC 3.0 remains a tool for the adventurous. Most Linux distributions still use GCC 2.95.2. The GCC 2.95.2 library needs a bit of tweaking to bring it up to a reasonable level of standards conformance. First you need to find and install the sstream and limits header files. The easiest way is to just install the Boost library (http://www.boost.org). Boost goes out of its way to "fix" implementations when it can. And GCC occupies a special place in the hearts of Boosters, so it gets a very high level of support.

 

2.95.2 allows quite a few things that should not be allowed. See http://gcc.gnu.org for details. This is an area where it is useful to be using more than one compiler; it will find more errors in your code than any one compiler can.

 

We have tried every bug fix and level of the 2.96 compiler that ships with Red Hat 7.x and can only conclude that it is simply broken. While it fixes a number of problems with 2.95.2, it bombs with segv (segmentation violation) errors in random places on our code.

 

Special Visual C++ Quirks

 

Visual C++ has its share of problems. One problem that causes trouble with the Boost library is that Visual C++ does not declare its C function library within the std namespace as required by ISO. Boost "fixes" this through the use of using declarations:

 

namespace std
{
  using ::size_t;
  // etc.
}

 

But now if you have other code that attempts to use the common statement "using namespace std;" Visual C++ gets confused because it sees the same function come from two different namespaces. The best solution here is to avoid using "using namespace std" in your own code. This will make your code look a little funny with "std::" appearing everywhere. Or you could avoid using Boost libraries.
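
A short sketch of what code looks like without a blanket using directive. The functions label_length and shout are hypothetical examples. The first qualifies every name; the second shows a using declaration limited to one function, which is a narrower alternative when the prefix gets noisy.

```cpp
#include <string>

// Fully qualified names: with no "using namespace std;" in the file, a
// name never has two namespaces competing for it.
inline std::string::size_type label_length(const std::string& label)
{
    return label.size();
}

// A using declaration scoped to a single function brings in one name
// only, and only inside this function body.
inline std::string shout(const std::string& word)
{
    using std::string;
    string result = word;
    result += "!";
    return result;
}
```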

 

Summary

 

Using multiple compilers can be annoying at times, but with a little perseverance you will get the hang of it. Just don't be surprised when your first build with a new compiler fails with thousands of errors! As often as I find an annoying problem with the compiler, I find an error in my code that one compiler was going to let slide. Your code will be better for the trial.

 

Web Servers

 

Like C++ compilers, there's not much choice when it comes to Web servers. IIS dominates Internet Web server market share on Windows. Apache dominates every other platform. While it is tempting to develop only for Apache, the Apache(http://www.apache.org/) group offers the following caveat:

 

"Warning: Apache on NT has not yet been optimized for performance. Apache still performs best, and is most reliable on Unix platforms. Over time NT performance has improved, and great progress is being made in the upcoming version 2.0 of Apache for the Windows platforms. Folks doing comparative reviews of Webserver performance are still asked to compare against Apache on a Unix platform such as Solaris, FreeBSD, or Linux."

 

Using IIS alone is not an option, since IIS supports only Windows. Consequently most cross platform developers will have to develop with both IIS and Apache in mind. Fortunately, that's easier than it sounds. While each product carries its own baggage, the ISAPI API native to IIS and the FastCGI(http://www.fastcgi.com/) interface we use for Apache have both descended from the familiar CGI interface.
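
Because both interfaces share a CGI ancestry, one possible arrangement is to keep the page logic in a function that knows nothing about the transport and to write a thin adapter per server. The sketch below shows only a plain CGI adapter; Request, Response, handle_request, request_from_cgi_environment and to_cgi_output are hypothetical names, and no ISAPI or FastCGI calls are shown. REQUEST_METHOD and QUERY_STRING are the standard CGI environment variables.

```cpp
#include <cstdlib>
#include <string>

struct Request  { std::string method; std::string query; };
struct Response { std::string content_type; std::string body; };

// Transport-independent page logic: the only part an ISAPI or FastCGI
// adapter would need to call.
inline Response handle_request(const Request& req)
{
    Response res;
    res.content_type = "text/plain";
    res.body = req.method + " request with query: " + req.query;
    return res;
}

// CGI adapter, input side: CGI passes request data in environment variables.
inline Request request_from_cgi_environment()
{
    Request req;
    const char* method = std::getenv("REQUEST_METHOD");
    const char* query  = std::getenv("QUERY_STRING");
    req.method = method ? method : "GET";   // fall back when run outside a server
    req.query  = query ? query : "";
    return req;
}

// CGI adapter, output side: header lines, a blank line, then the body.
inline std::string to_cgi_output(const Response& res)
{
    return "Content-Type: " + res.content_type + "\r\n\r\n" + res.body;
}
```

A CGI build would print to_cgi_output(handle_request(request_from_cgi_environment())) to standard output and exit; the other adapters would fill the same Request structure from their own APIs.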

 

ODBC Performance Issues

 

The CGI interface is standard, portable, and easy to program. Unfortunately CGI performs unacceptably, especially for high concurrency SQL access using ODBC. CGI works like a command line application; it executes when the Web page is accessed, and it exits after the page is created. In an SQL/ODBC application, that means we have to do three expensive operations on every Web page retrieval: authenticate with the database, parse and optimize the SQL, and execute the query. To achieve acceptable performance, we need to reuse database connections and parsed SQL queries in future transactions. To do this, we have to keep the process making the connection alive between Web requests.
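
The saving from a long-lived process can be sketched without a real database. In the code below, FakeConnection and StatementCache are hypothetical stand-ins that only count the expensive steps; in a real ODBC program the three counters would correspond roughly to calls to SQLConnect, SQLPrepare and SQLExecute.

```cpp
#include <map>
#include <string>

// Stand-in for a real database handle: it counts the expensive steps
// instead of performing them.
struct FakeConnection {
    int logins;
    int prepares;
    int executes;
    FakeConnection() : logins(1), prepares(0), executes(0) {}  // one login at startup
};

// Caches "prepared statements" by SQL text for the life of the process.
class StatementCache {
public:
    explicit StatementCache(FakeConnection& conn) : conn_(conn), next_id_(1) {}

    // Returns a handle for the SQL text, preparing it only on first use.
    int prepare(const std::string& sql)
    {
        std::map<std::string, int>::iterator it = handles_.find(sql);
        if (it != handles_.end())
            return it->second;      // already parsed and optimized
        ++conn_.prepares;           // the expensive step in a real driver
        int id = next_id_++;
        handles_[sql] = id;
        return id;
    }

    // Executing a prepared handle is the only per-request cost left.
    void execute(int /*handle*/) { ++conn_.executes; }

private:
    FakeConnection& conn_;
    std::map<std::string, int> handles_;
    int next_id_;
};
```

Serving the same query three times through one StatementCache costs one login, one prepare and three executes, where a plain CGI program would pay for all three steps on every request.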

 

An unfortunate quirk of the ODBC API is that it lacks the ability to efficiently wait for multiple queries. You can wait until the query finishes (preventing you from doing anything else), or you can periodically poll to see if the query is finished. The best solution we have found is to run multiple threads, and just let the thread block until it is finished. It's a waste of threads, but it's cheaper than continually polling. Blocking achieves better response time as well.
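
A rough sketch of the "one blocked thread per query" approach follows, using standard C++ threads rather than any particular platform API. blocking_query is a hypothetical placeholder for a driver call that does not return until the result is ready; here it simply sleeps.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical blocking query: stands in for a driver call that does not
// return until the result is ready.
inline std::string blocking_query(const std::string& sql) {
    assert(!sql.empty());
    std::this_thread::sleep_for(std::chrono::milliseconds(20));
    return "result of " + sql;
}

// Runs every query on its own thread and waits for all of them to finish.
inline std::vector<std::string> run_all(const std::vector<std::string>& queries) {
    std::vector<std::string> results(queries.size());
    std::vector<std::thread> workers;
    workers.reserve(queries.size());
    for (std::size_t i = 0; i < queries.size(); ++i) {
        // Each thread writes only to its own slot, so no lock is needed.
        workers.emplace_back([&results, &queries, i] {
            results[i] = blocking_query(queries[i]);
        });
    }
    for (std::thread& t : workers) t.join();
    return results;
}
```

The queries overlap in time, so the total wait is close to the slowest query rather than the sum of all of them, and no thread spends cycles polling.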

 

Sometimes even a cheap access to the database may be too expensive. Every database request has to be formed by your program, munged by the ODBC library, wander down through the TCP stack, go across the wire to the database server… well, you get the idea. Even the cheapest database access can be expensive. You might not want to hit the database every time your home page is hit, for example. To eliminate hits to the database, you may need an in-memory cache--the equivalent of ASP's session variables. If you have multiple processes, you have to manage this cache in shared memory, which is by no means a pleasant task.
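
For the single-process, multithreaded case, an in-memory cache can be as small as a map behind a mutex. The sketch below is an assumption-laden illustration: PageCache is a hypothetical name, and the load callback stands in for whatever database query would produce the value. It does not address the harder shared-memory case between processes.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <mutex>
#include <string>

// A small in-process cache shared by all threads of one server process.
class PageCache {
public:
    // Returns the cached value for key, calling load() only on a miss.
    std::string get(const std::string& key,
                    const std::function<std::string()>& load) {
        assert(load);   // an empty callback would be a programming error
        std::lock_guard<std::mutex> guard(mutex_);
        std::map<std::string, std::string>::iterator it = entries_.find(key);
        if (it != entries_.end()) return it->second;
        ++misses_;
        std::string value = load();   // the database hit; kept under the lock for simplicity
        entries_[key] = value;
        return value;
    }
    // Drops an entry, for example after the underlying row changes.
    void invalidate(const std::string& key) {
        std::lock_guard<std::mutex> guard(mutex_);
        entries_.erase(key);
    }
    int misses() const {
        std::lock_guard<std::mutex> guard(mutex_);
        return misses_;
    }
private:
    mutable std::mutex mutex_;
    std::map<std::string, std::string> entries_;
    int misses_ = 0;
};
```

Holding the lock while loading keeps the sketch short but serializes all misses; a real cache would probably load outside the lock and tolerate an occasional duplicate query.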

 

Efficient Multithreading with ISAPI and FastCGI

 

For all the reasons mentioned above, it is desirable, probably necessary, to use a multithreaded engine for high performance Web sites that need database access. But that's not saying it's going to be easy. Multithreaded programming, and especially debugging, is not for the faint of heart. The subtle and not so subtle cross platform nuances weigh heavily here, especially for anyone trying to support the major BSD variants. We always write our code in such a way that it can be conditionally compiled to run (albeit more slowly) in a single threaded, CGI build. That lets us debug the majority of the code in an easy fashion, and gives us a way to support platforms where threads are poorly supported or not available.
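
One way to arrange such a conditional build is to hide the lock type behind a typedef. The macro name SINGLE_THREADED_CGI_BUILD and the classes below are hypothetical, chosen only for this sketch; the shared code is written once and compiles either way.

```cpp
#include <cassert>

// Define SINGLE_THREADED_CGI_BUILD (a hypothetical macro name) on the
// compiler command line to get the single-threaded CGI build.
#ifdef SINGLE_THREADED_CGI_BUILD
// No-op lock: the single-threaded build never contends.
struct Mutex {
    void lock() {}
    void unlock() {}
};
#else
#include <mutex>
typedef std::mutex Mutex;
#endif

// RAII guard that works with either Mutex definition.
class ScopedLock {
public:
    explicit ScopedLock(Mutex& m) : m_(m) { m_.lock(); }
    ~ScopedLock() { m_.unlock(); }
    ScopedLock(const ScopedLock&) = delete;
    ScopedLock& operator=(const ScopedLock&) = delete;
private:
    Mutex& m_;
};

// Shared code is written once against Mutex and ScopedLock.
class HitCounter {
public:
    int increment() {
        ScopedLock guard(mutex_);
        return ++hits_;
    }
private:
    Mutex mutex_;
    int hits_ = 0;
};

// Usage: behaves the same in both builds.
inline void hit_counter_demo() {
    HitCounter counter;
    counter.increment();
    assert(counter.increment() == 2);
}
```

In the CGI build the lock calls compile to nothing, so the same source can be stepped through in a debugger without any threads in play.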

 

Multithreaded architecture is native to IIS--you are swimming against the tide a bit to use anything else. Apache is another story. Unlike IIS, Apache currently forks multiple processes to handle incoming Web requests. An Apache extension (typically called a "mod") ends up running in all these different processes. Coordinating a shared cache among these processes is difficult to do in a cross platform way. We use the FastCGI interface instead of the Apache internal interface so that we can easily take advantage of multithreading efficiencies.

 

FastCGI applications run in their own process, like CGI, but unlike CGI they don't quit when the Web request is finished. Instead they get ready to take the next request. If the FastCGI application runs multithreaded, it can service more than one request at a time. Using ISAPI on IIS, and FastCGI on Apache, allows us to maintain the same multithreaded architecture on both platforms.

 

The FastCGI developers took great pains to achieve a look that would be familiar to CGI application developers. FastCGI was also intended to work cross platform, not just on Windows and Linux, but any platform supporting TCP/IP.

 

The ISAPI developers took greater liberties in recasting CGI as an in-process API. ISAPI was designed to achieve very high performance when the ISAPI application (always built as a dynamically linked library) is loaded directly into the Web server core process. The ISAPI developers also took greater liberties in tying the design directly to the Windows API.

 

Early versions of IIS ran all ISAPI extensions in the same process as the Web server. This has the obvious downside that a single ISAPI hiccup could bring down the entire Web server. Beginning with version 5.0, IIS allows ISAPI applications to run in their own process, while transparently marshalling requests and responses to and from that process.

 

Cross Platform Threading: Safety First!

Writing thread-safe code is never easy in imperative languages like C++ and Java. The chief enemy of thread-safe code is the dreaded "race condition". Race conditions result when two threads operate on a shared piece of information without enforcing a stable order of operations. Race conditions are notoriously difficult to find using black box testing. Multithreaded code must be reviewed carefully by knowledgeable programmers looking for threading problems.

 

The biggest problem is trying to figure out what problems your libraries are going to cause you. Most people seem surprised that the standard C++ libraries are generally not as thread-safe as they would hope. For example, don't try this at home:

 

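#include <deque>
#include <iostream>
#include <windows.h>   // Sleep(); on Unix use usleep() from <unistd.h> instead

using namespace std;

deque<int> x;   // shared by every thread and deliberately left unprotected

// Run in several threads: races with the writers (and the other readers) on x.
void multiple_readers()
{
   for (;;) if (x.size()) { cout << x.back(); x.pop_back(); Sleep(1); }
}

// Run in several threads: push_back can rearrange x while a reader is inside it.
void multiple_writers()
{
  for (;;) { x.push_back(1); Sleep(1); }
}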

 

Sooner or later a core dump will ensue. Your code must synchronize access to data structures, even if they are from the standard library. This is not a deficiency in the standard library, as some have suggested. It's merely the reality of the colossal performance hit that would be caused by trying to enforce thread safety in the standard library. People would simply stop using it.
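
A synchronized counterpart to the example above might look like the following sketch. It uses std::mutex and std::thread from later C++ standards for brevity; SafeQueue and run_demo are hypothetical names. The key point is that the emptiness test and the pop happen under a single lock.

```cpp
#include <atomic>
#include <cassert>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// A deque wrapper in which every access takes the same mutex.
class SafeQueue {
public:
    void push(int value) {
        std::lock_guard<std::mutex> guard(mutex_);
        items_.push_back(value);
    }
    // Pops the newest element into out; returns false if the queue is empty.
    // Test and pop share one lock, so no other thread can empty the deque
    // between them.
    bool try_pop(int& out) {
        std::lock_guard<std::mutex> guard(mutex_);
        if (items_.empty()) return false;
        out = items_.back();
        items_.pop_back();
        return true;
    }
private:
    std::mutex mutex_;
    std::deque<int> items_;
};

// Starts writer threads that push per_writer items each and reader threads
// that pop until everything has been consumed; returns the number popped.
inline int run_demo(int writers, int readers, int per_writer) {
    assert(writers >= 0 && readers > 0 && per_writer >= 0);
    const int total = writers * per_writer;
    SafeQueue queue;
    std::atomic<int> popped(0);
    std::vector<std::thread> threads;
    for (int w = 0; w < writers; ++w)
        threads.emplace_back([&queue, per_writer] {
            for (int i = 0; i < per_writer; ++i) queue.push(i);
        });
    for (int r = 0; r < readers; ++r)
        threads.emplace_back([&queue, &popped, total] {
            int value;
            while (popped.load() < total) {
                if (queue.try_pop(value)) popped.fetch_add(1);
                else std::this_thread::yield();
            }
        });
    for (std::thread& t : threads) t.join();
    return popped.load();
}
```

With four writers and four readers every pushed item is popped exactly once, which is the property the unsynchronized version cannot guarantee.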

 

If that were the only issue, it would not cause much grief. Unfortunately, more subtle issues are at work. GCC's implementation of the string class may or may not be thread-safe depending on how it is built (you need a platform-dependent file named "atomicity.h"--see http://gcc.gnu.org for details). You must also ensure that STL is configured to use multithreaded allocators. The biggest problem with this is that a small mistake in the makefile can silently let such an error into the executable, where it can be very hard to detect.

 

Quoting the GCC documentation:

Extremely big caution: if you compile some of your application code against the STL with one set of threading flags and macros, and another portion of the code with different flags and macros that influence the selection of the mutex lock, you may well end up with multiple locking mechanisms in use which don't impact each other in the manner that they should. Everything might link and all code might have been built with a perfectly reasonable thread mode, but you may have two internal ABIs in play within the application. This might produce races, memory leaks, and fatal crashes. In any multithreaded application using STL, this is the first thing to study well before blaming the allocator.

 

Visual C++ programmers have their own problems with thread safety. And there's no "platform dependent" rug to hide under either. Consider this description from Dinkumware (VC++ library author) of a problem with the ubiquitous string class:

 

Please note that this implementation is still not [after applying available patches] as thread safe as it should be, because of the reference-counted implementation. A write to a ``copy'' of a string in one thread can confuse a read of that string in another thread, because the two still secretly share the same representation. One way to ensure that the non-const string str has a private representation is to call str.begin(). (By creating a mutable iterator into the string, you rule out sharing of representations.) Another way is to disable reference-counting altogether, making string operations thread safe. Simply change the value of _FROZEN to zero: enum _Mref {_FROZEN = 255}; // set to zero to disable sharing

 

One sledgehammer approach to fixing a number of limitations (including this string sharing issue) in the Visual C++ STL is to toss it in favor of STLPort(http://www.stlport.org/). This has been our approach. Since STLPort also works with GCC, so much the better. Of course, depending on your code base and company policies, you may find it difficult to make such a radical change. The STLPort programmers have done a fine job integrating their product with a wide selection of compilers, but replacing just the STL in Visual C++ is a bit like a heart transplant--it's tough to do without complications. The change wasn't without agony, but it was for the better.

 

Conclusion

These are just a few of the issues you'll face writing thread-safe code. I'm sure you will encounter ones of which I'm not yet aware. Sometimes writing multithreaded code feels like trying to do good security code--just when you think you've got it licked a new bug surfaces in some library you depend on.

 

At this point you might be glad to know that you can create multiprocessing programs using FastCGI and Apache that can achieve many of the benefits of multi-threading, without all the problems. You may find that this kind of solution is "fast enough", but keep in mind your competition may decide otherwise. If you don't plan for a multithreaded implementation, it may not be that easy to migrate to a multithreaded version in the future.

 

Functional languages like Haskell(http://www.haskell.org/) hold out some hope, as they are designed to avoid explicit threading in favor of implicit threading. To a Haskell programmer, poor cognizance of threading issues should result in slow code, but not incorrect code. Unfortunately Haskell remains a difficult choice for anyone trying to make a living writing code. While it mitigates some software engineering risks, it creates its own measure of business risk in adopting what is currently a fringe language with only academic implementations.

 

Database: Plenty of Options

 

The lack of database options held back the potential for enterprise class applications on Open Source platforms for many years. Sybase, Informix, IBM, and Oracle have all released Linux versions, but these never attracted much activity. None were truly free, let alone open source.

 

The past year has seen no fewer than four legitimate open source contenders step up to the task. SAP opened up its enterprise class ADABAS derivative dubbed SAP DB(http://www.sap.com/solutions/technology/sapdb/). Inprise/Borland opened up its well known Interbase(http://www.borland.com/interbase) product. PostgreSQL(http://www.postgresql.org/) matured from a slow, buggy curiosity to a solid database contender. MySQL (see both http://www.mysql.com and http://www.mysql.org), long revered for its simplicity, has started to incorporate enterprise class features through the work of organizations like Nusphere(http://www.nusphere.com/) and Innobase Oy.

 


 

SAP DB

 

Our current open source database of choice is SAP DB. SAP DB burst on the open source scene in April 2001 complete with 24/7 enterprise class features and an impressive list of Fortune 500 clients. SAP DB descends from ADABAS, long a respected relational database contender.

 

SAP provides an amazing amount of support for SAP DB. SAP DB comes with voluminous documentation (especially by Open Source standards). SAP employees monitor the support mailing list continually. When we found a problem with the ODBC driver, SAP tracked it down and provided a solution in less than two hours. That level of support puts many paid support programs to shame.

 

The performance is excellent and the reliability is rock solid. The feature set is comparable to Oracle and SQL Server: Online backup, BLOB's, stored procedures, subqueries, replication, and Unicode. SAP DB provides JDBC and ODBC support on all platforms (yes, ODBC on Unix out of the box!). SAP provides database administration programs using command line, Web, and Windows GUI interfaces. Binary distributions are available for Linux, NT, IBM-AIX, SUN-Solaris, Tru64-Unix, and HP-UX.

 

Is there a downside to this? Just don't expect to dig in and start hacking. Consider the size of this puppy:

 

Size Matters

Kernel               95 MB   in 2905 files
Tools                16 MB   in  578 files
Progr. Interfaces    43 MB   in 1087 files
OS dependent         13 MB   in  901 files
Makefiles          < 500K    in  814 files
Unspecified          94 MB   in 1309 files

 

The source distribution comes as several million lines of code in C, C++, Pascal (this gets converted to C in the build process), Perl, and Python. Most of the file names are just six letters long, and many of the comments are in German. There are several quirks to the build process, which can make installing from source an arduous task. There is no official BSD port yet (but BSD's Linux compatibility has been made to work). There is no defined process yet for community contributions, so it will stay SAP's baby for the foreseeable future. This will make it a challenge for people with platforms not on SAP's supported platforms list.

 

SAP has a compelling business model for pursuing an open source strategy. While the database is free, SAP R/3 users that choose SAP DB must purchase a support license. Most users of SAP R/3 don't use SAP DB, however. They use Oracle, IBM DB2, or Microsoft SQL Server. And those three companies charge very hefty license fees upfront, as well as support costs. By opening the source of SAP DB, SAP has created a very compelling pitch for R/3 users that would have naturally gravitated to one of the database triumvirate. That pitch includes:

 

. No upfront cost

. Increased mind share without Oracle's advertising budget

. Increased pool of trained programmers to compete with Oracle's army of third party consultants

. Better security through completely open source code

. Increased number of third party support tools

 

All SAP has to do is convince a fraction of their customer base to pick SAP DB, and their lost software revenue will be outpaced by their increased maintenance revenue. The fact that Oracle is SAP's closest competitor in the enterprise accounting market likely influences their strategy. SAP executives are probably not upset about any revenue Oracle might lose from their open source strategy.

 

Borland Interbase

 

Borland also made waves by releasing their flagship database Interbase as an open source project. While it immediately attracted a group of interested developers, its transition to open source has not been a smooth one.

 

Interbase offers a relatively small footprint, advanced features, and special integration to Delphi/C++ Builder developers. The Interbase programming community is active and talented. Several significant contributions have already come from outside Borland.

 

We decided against Interbase based on the strength of the SAP DB product and a few specific concerns about Interbase. We noted Interbase developers often struggle with scaling issues due to lock granularity. The ODBC driver for Interbase is not part of the open source release. Borland's support of the product has been repeatedly questioned as serious bugs go unfixed. Borland is trying to walk a fine line by providing source to the open source community while continuing to offer Interbase as a commercial product. This divided focus has alienated many open source developers. And Borland has made many confusing statements about their support for the open source version. This contrasts strongly with SAP's apparent commitment to their open source product.

 

PostgreSQL

 

PostgreSQL began life as an open source project to continue the legacy of Postgres, which was of course a famous Berkeley computer science project led by the relational database pioneer Michael Stonebraker (also a founder of Ingres and of Illustra, which Informix later acquired). PostgreSQL inherited some very advanced database technology from Postgres, but early versions were plagued with performance and reliability problems.

 

PostgreSQL turned a corner with version 7.0 and subsequent releases. PostgreSQL developers made it faster and more reliable, and the Open Source world took notice. PostgreSQL, already widely used in open source projects, is beginning to gain ground rapidly on the more popular MySQL. Red Hat recently announced that they would be supporting PostgreSQL with commercial support contracts and code contributions.

 

We passed on PostgreSQL primarily because of its weak Windows support, and again because of the strength of the SAP DB product. PostgreSQL provides no native Windows port of the server. While it is apparently possible to compile PostgreSQL under Cygwin, the performance obviously suffers when used in this manner.

 

However, PostgreSQL is significantly easier to extend than SAP DB. PostgreSQL is written entirely in C and C++. Beginning from the original University project, PostgreSQL was designed to be a modular, extensible database. If you need to hack in your own database extensions and don't mind a Windows handicap, PostgreSQL could easily be your database of choice.

 

MySQL

 

The (still beta) release of Nusphere's Gemini table type moves MySQL into the enterprise space in a big way. MySQL is a database that began life as a simple SQL-based alternative to dbm and other Unix file managers. MySQL built its reputation by solidly integrating popular scripting languages like Perl across multiple platforms. MySQL has always offered high-speed query performance in read-mostly applications. Now the Gemini project promises to imbue MySQL with all the transaction processing and 24/7 operation of the big dogs. That makes it worth watching. Unfortunately, Nusphere and MySQL AB recently went through a very public "divorce". It will be interesting to see if Nusphere and MySQL AB can settle their differences short of a full fledged "code fork". A fork would leave MySQL AB without Gemini, and would leave Nusphere to maintain its own version of MySQL.

 

ODBC: The Devil You Know

 

Database programmers have long struggled without the emergence of a true wire standard. Wire standards, like HTTP, SMTP, and any other Internet standard you can name, provide a precise description of how bytes should be exchanged between platforms. API standards like ODBC are an ordeal, because you need compatible software on both sides of the communication link that use the same proprietary format. Although a few efforts have attempted to forge the much needed wire standard, all have failed to muster the requisite support from the major database vendors. Perhaps an alliance of open source vendors might succeed where the major vendors have failed.

 

In the absence of a wire standard, ODBC has thrived. ODBC defines a C style API for accessing an SQL database. Database vendors usually provide software for both the client and the server. JDBC, essentially ODBC reimplemented in Java, has become the dominant database access method for Java as well.

 

Open Source ODBC offers a tremendous advantage over proprietary ODBC drivers--you can compile the ODBC driver on the platform of your choice. Want to access your database from your TiVo? No problem. Want to access your database through a CDPD link on your electric scooter? No problem.

 

ODBC offers a moderate amount of portability among databases. Typically 90% of the code will port without much effort, and 10% of the code will have to be customized to the database. Of course the more you want to take advantage of the non-standard database features like triggers, stored procedures, replication, etc., the more code you will have to customize.

 

ODBC drivers vary widely in their capabilities. Some are thread-safe, some are not. Some can encrypt data, some cannot. Some allow you to efficiently "prepare" SQL prior to using it, some do not. Good ODBC drivers offer equivalent performance to the "native" database connection. Bad ODBC drivers can be ten times slower.

 

 

Table 1 lists some important ODBC features for PostgreSQL, MySQL, and SAP DB. The Interbase ODBC driver is not open source.

 

Table 1. ODBC features for PostgreSQL, MySQL, and SAP DB

Feature | PostgreSQL | MySQL | SAP DB
ODBC Version | 2.5 | 2.50 | 3.51
API Conformance | Level 1 | Level 1 | Level 2
SQL Conformance | Core | Core | Extended
SAG Compliant | No | Yes | Yes
Active Connections | 128 | No Limit | 8
Active Statements | No Limit | No Limit | 32767
Integrity Enhancement Facility | No | No | No
Multiple result sets | Yes | Yes | No
Accessible Procedures | No | No | No
Accessible Tables | No | Yes | No
Unsupported API Functions | SQLBrowseConnect, SQLColumnPrivileges, SQLDescribeParam, SQLParamOptions, SQLProcedureColumns, SQLProcedures, SQLTablePrivileges | SQLBrowseConnect, SQLColumnPrivileges, SQLParamOptions, SQLProcedureColumns, SQLProcedures, SQLSetCursorName, SQLTablePrivileges | None
Unsupported SQL Functions | Abs, Acos, Ascii, Asin, Atan, Atan2, Ceiling, Char, Convert, Cos, Cot, CurDate, CurTime, DayName, DayOfMonth, DayOfWeek, DayOfYear, DBName, Degrees, Difference, Exp, Floor, Hour, IfNull, Insert, Left, Locate2, Log, Log10, Minute, Mod, Month, MonthName, Pi, Power, Quarter, Radians, Rand, Repeat, Replace, Right, Round, Second, Sign, Sin, Soundex, Space, Sqrt, Tan, TimeStampAdd, TimeStampDiff, Truncate, UserName, Week, Year | Convert, Difference, TimeStampAdd, TimeStampDiff | Ascii, Char, Concat, Convert, Difference, Insert, Locate, Locate2, Log10, Power, Quarter, Rand, Repeat, Space, TimeStampAdd, TimeStampDiff, Truncate
Alter Table support | Add | Add, Drop | Add, Drop
Column Aliases | No | Yes | Yes
Correlation Names | Yes | Yes, table names must be different | Yes
Order By Expressions | No | Yes | No
LIKE escape character | No | Yes | Yes
Order by fields not in select | Yes | Yes | Yes
Outer Join | Yes | Yes | Yes
Positioned statements | None | None | Delete, Update, Select for Update
Subquery support | Comparison, EXISTS, IN, Quantified | None | Correlated, Comparison, EXISTS, IN
Union Support | UNION, UNION ALL | None | UNION, UNION ALL
Data Types | int8, char, data, numeric, float8, int4, lo, text, numeric, float4, int2, time, datetime, int2, bytea, varchar | bit, tinyint, tinyint unsigned, bigint, bigint unsigned, long varbinary, blob, longblob, tinyblob, mediumblob, long varchar, text, mediumtext, char, numeric, decimal, integer, integer unsigned, mediumint, mediumint unsigned, smallint, smallint unsigned, year, double, float, date, time, datetime, timestamp, varchar, enum, set | BOOLEAN, LONG BYTE, VARCHAR BYTE, CHAR BYTE, LONG, CHAR, DECIMAL, FIXED, INTEGER, SMALLINT, FLOAT, REAL, DOUBLE PRECISION, DATE, TIME, TIMESTAMP, VARCHAR
Maximum VARCHAR | 254 | 255 | 4,000
Maximum LONG VARCHAR | Unlimited | 2 GB | 2 GB
Maximum DECIMAL | 1,000 digits | 19 digits | 38 digits
Searchable LONG VARCHAR | All except LIKE | Yes | No
Maximum Column Name Length | 32 | 64 | 64
Columns in Group By | Unlimited | Unlimited | 16
Columns in Index | Unlimited | 16 | 16
Columns in ORDER BY | Unlimited | Unlimited | 16
Columns in SELECT | Unlimited | Unlimited | 254
Columns in Table | Unlimited | Unlimited | 255
Maximum Row Size | 4096 | 1,179,648 | Unlimited
Maximum Row Size Includes Long | Yes | Yes | No
Maximum Index Size | Unlimited | 120 | 255
Maximum Statement Length | Unlimited | 8,192 | Unlimited
Maximum Tables in SELECT | Unlimited | 32 | 16
Transaction Capabilities | DDL and DML | None | DDL and DML
Cursor Commit Behavior | Close Cursors | N/A | Preserve Cursors
Cursor Rollback Behavior | Close Cursors | N/A | Close Cursors
Isolation Levels | Read Committed | N/A | Read Uncommitted, Repeatable Read, Read Committed, Serializable
Multiple Active Transactions | Yes | N/A | Yes
Bookmark Persistence | Scroll | None | Delete, Scroll, Update
SQLSetPos Lock Types | SQL_LOCK_NO_CHANGE | None | SQL_LOCK_NO_CHANGE
Static Sensitivity | No | No | Updates
Non-nullable columns | Yes | Yes | Yes
Procedures | Yes | No | Yes
Detect Row Updates | No | No | Yes
Fetch Direction | Next, First, Last, Prior, Absolute, Relative, Bookmark | Next, First, Last, Prior, Absolute, Relative | Next, First, Last, Prior, Absolute, Relative, Bookmark
GetData Extensions | Any Column, Any Order, Block, Bound | Any Column, Any Order | Any Column, Any Order, Block, Bound
Positioned Operations | Position, Refresh | Position | Position, Refresh, Update, Delete, Add
Scroll Concurrency | Read Only | Read Only | Read Only, Lock, Opt. RowVer, Opt./Values
Scroll Options | Forward Only, Static | Forward Only, Static | Forward Only, Keyset, Static, Dynamic
JDBC Driver | Yes | Yes | Yes

[/pre]

 

Security #1: OpenSSH and GPG

 

Writing security code is hard, thankless work. If you do it right, no one notices. Do it wrong and you get splashed all over bugtraq and slashdot. So why bother if you can let someone else do all the work? OpenSSH and GPG make a great security toolkit for most applications.

 

OpenSSH descends from SSH, which stands for Secure Shell. OpenSSH comes packaged with secure replacements for telnet, rcp (remote copy), ftp, and rsh (remote shell). You can also use OpenSSH to secure your own applications by taking advantage of its tunneling mode. Tunneling provides an encrypted connection between two computers. For example you can tunnel between an X Windows client and an X Windows server application in this way. You can easily tunnel communications among your own applications in the same manner. Since it emanates from the OpenBSD project, its license is as free as it gets.

 

OpenSSH and ODBC

 

Most ODBC drivers do not support encryption, and many do not even encrypt the password. OpenSSH port-forwarding provides a way to easily use many existing client/server applications securely over the Internet.

 

You must install OpenSSH on both the client and the server (for Windows, use the Cygwin port). Determine that your ODBC driver works over a simple TCP connection (most do). Find out which port the database server listens on--you can typically configure this parameter. For example MySQL defaults to port 3306, but allows any port to be configured. The server must be running sshd (on Windows you can do this using Microsoft's srvany.exe).

 

To set up a tunnel, launch a copy of OpenSSH on the client using a command line parameter like:

 

ssh -L 3306:mysql_server:3306 mysql_server

 

Next, you can configure your ODBC driver to connect to localhost instead of the database server. If everything is configured correctly, OpenSSH will proxy the connection through an encrypted tunnel to the server. Cheap and easy!

 

If your ODBC driver is more complicated than the above solution supports, you can set up a VPN using OpenSSH and ppp. See the VPN howto on www.linuxdoc.org for details.

 

OpenSSH doesn't provide a way to secure files for distribution. The best open tool for this job is GNU Privacy Guard, or just GnuPG. GnuPG provides a full GPL implementation of OpenPGP. Its classic Unix style makes it great to use in background processes, shell scripts, and interactively. The documentation exceeds reasonable expectations--a beautifully written and produced piece available in five languages. The documentation also makes a very nice introduction to modern security techniques.

 

EDI Delivery with GNU Privacy Guard

 

A common way for EDI partners to deliver EDI messages over the Internet is to drop PGP encrypted EDI messages in an FTP directory. While ftp leaves a lot to be desired as a secure protocol, GnuPG makes up for this deficiency by using its strong encryption capabilities.

 

Scripting this transfer using GnuPG and Perl is a piece of cake:

 

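use strict;
use warnings;
use Net::FTP;

my ($server, $file) = @ARGV;
die "usage: $0 server file\n" unless defined $file;

# Encrypt and sign for the partner's key. As in the original one-liner, the
# recipient key is assumed to be named after the server.
system('gpg', '--batch', '--output', "$file.gpg", '-r', $server,
       '--encrypt', '--sign', $file) == 0 or die "gpg failed: $?\n";

my $ftp = Net::FTP->new($server) or die "cannot connect to $server: $@\n";
$ftp->login('user', 'password') or die "login failed: " . $ftp->message;
$ftp->binary();
$ftp->put("$file.gpg") or die "put failed: " . $ftp->message;
$ftp->quit;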

 

Security #2: Mozilla

 

Netscape unleashed Mozilla(http://www.mozilla.org/) on the open source world over three years ago. It was big then, it's even bigger today. Most developers pondered the 17,000,000 lines of source back in 1998 and quickly decided to leave the work to Netscape and AOL. It might be time to look again.

 

The Mozilla team broke that vast cosmos into manageable projects that are often useful in their own right. The most valuable of these are the Network Security Services (NSS) project and the Netscape Portable Runtime (NSPR) project. NSS provides a comprehensive set of security services including robust implementations of PKCS#11, SSL, and S/MIME. NSPR provides an impressive array of cross platform services including file services, threading, and interprocess communication.

 

At first we didn't think we needed (or wanted) NSPR, but using NSS required NSPR. We have since come to appreciate the value that it offers in its own right. NSPR is the best documented of any of the Mozilla libraries--think of the advantage of that! Documentation often takes as much time as programming, and to incorporate a well-documented library by referencing it, is much easier than writing it yourself.

 

What we really wanted was NSS. It's the same library used by the well-known Netscape/iPlanet application servers, so it inherits the bloodline of the oldest and most respected SSL implementation. But what really appealed to us about NSS is how it has been partitioned for dedicated security hardware. Even better, the API to the security library in NSS is the well documented and widely supported PKCS#11 standard created and maintained by RSA.

 

We like OpenSSL very much, but still chose to use NSS over OpenSSL for most of our work. OpenSSL doesn't require the use of NSPR, which can be an advantage at times. It has a far more vocal and diverse community, which is certainly an advantage. We just liked the NSS architecture that much more. We planned from the start to offer hardware security options, and this is easier to do with NSS. OpenSSL has only recently begun the work to create a hardware API.

 

SoftToken: Hard Core Security

 

Good security is tough to maintain. As security guru Bruce Schneier has been known to say, it's tough to maintain security when users continue to work in an unsafe manner. The best way to help users to work securely is to make it easy for them.

 

Smart Cards and other cryptographic tokens allow a simple lock and key metaphor that is easy for users to understand. Just insert the key when you need access. Most cryptographic tokens are harder to compromise than completely software driven ones, due to the tamper resistant, tamper evident casing, and the specialized processors. (There are exceptions. BugTraq has pointed out that some cryptographic tokens can be duplicated with a simple ROM burner.) Hardware cryptographic tokens should also provide a hardware supported random number generator (RNG). A hardware RNG is a very big asset--it's hard to make a cryptographically secure random number generator in software alone.

 

There are a great variety of cryptographic tokens: smart cards, Java rings, and USB dongles to name a few. Clearly there is a need for a standard cross-token compatible API that allows us to write our code in such a way that it will work with any token. The most widely supported and robust standard is PKCS #11 from RSA.

 

The most beautiful thing about Mozilla's security library is that it implements PKCS #11 exactly, in a library appropriately called SoftToken. For applications where a hardware token is overkill, you can use the Mozilla library by itself. When security needs dictate, you can seamlessly install hardware tokens without changing your code.

 

Boost'ing C++

 

Javasoft is constantly banging out new Java API's, but how does the C++ library grow? Boost(http://www.boost.org/) was begun by members of the C++ standard committee Library Working Group(http://anubis.dkuug.dk/jtc1/sc22/wg21/) to provide free, peer-reviewed portable libraries to the C++ community. Boost serves as a proving ground for potential additions to future versions of the C++ standard library.

 

The Boost library we most rely on is regex, a regular expression library written by Dr. John Maddock. Regex has allowed us to use C++ for many tasks that we may have preferred Perl for previously.
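
As a taste of the kind of Perl-style text chopping this enables, here is a small sketch. It is written against std::regex, the standard library interface that later grew out of the Boost regex design, so that it builds without Boost installed; the same calls exist under the boost:: namespace. parse_query is a hypothetical helper written for this illustration only.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Extracts "key=value" pairs from a query string such as "a=1&b=two".
inline std::vector<std::pair<std::string, std::string> >
parse_query(const std::string& query) {
    static const std::regex pair_re("([^&=]+)=([^&]*)");
    std::vector<std::pair<std::string, std::string> > out;
    for (std::sregex_iterator it(query.begin(), query.end(), pair_re), end;
         it != end; ++it) {
        // Group 1 is the key, group 2 the (possibly empty) value.
        assert(!(*it)[1].str().empty());   // the pattern guarantees a non-empty key
        out.emplace_back((*it)[1].str(), (*it)[2].str());
    }
    return out;
}
```

The sketch does no URL decoding; it only shows how little code the tokenizing itself takes.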

 

Boost also provides a variety of obvious and not so obvious extensions to the existing standard library. The Boost Pool library provides an easy to implement way to accelerate time critical memory management functions. The smart pointer library supplements the standard library's auto_ptr with four related, but task-specialized replacements. Type traits extends the concept behind the standard library numeric_limits template.
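
To illustrate what a task-specialized smart pointer buys you, the sketch below uses a reference-counted shared pointer. It is written with std::shared_ptr, which entered the standard library from Boost's smart pointer library, so it builds without Boost. Session and live_sessions_with are hypothetical names for this example.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical resource that records how many instances are alive.
class Session {
public:
    explicit Session(int& live) : live_(live) { ++live_; }
    ~Session() { --live_; }
    Session(const Session&) = delete;
    Session& operator=(const Session&) = delete;
private:
    int& live_;
};

// Shares one Session among `owners` handles, drops all but `keep` of them,
// and reports how many Sessions are still alive at that point.
inline int live_sessions_with(int owners, int keep) {
    assert(owners >= 0 && keep >= 0 && keep <= owners);
    int live = 0;
    std::vector<std::shared_ptr<Session> > handles(
        owners, std::make_shared<Session>(live));
    handles.resize(keep);   // release owners - keep references
    return live;            // 1 while any owner remains, 0 once the last is gone
}
```

Unlike auto_ptr, copies of a shared pointer can sit safely in a container, and the Session is destroyed exactly when the last copy goes away.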

 

The best thing about the Boost group is their dedication to getting their code "right". Through their Yahoo group forum, they pound on each other's code until it is efficient, elegant, and correct.

 

More Cool Stuff We Like

 

Most open source projects have adopted either Expat (Mozilla) or Xerces C++ (Apache) for XML parsing. Expat seriously outperforms Xerces, but Xerces allows validation and supports W3C standards. We like them both, but we typically use Xerces for its validation.

 

Want to create cool looking GIF's for your Web site on the fly? Then have your lawyer call Unisys's lawyer and they can do lunch (in case you forgot, reading or writing GIF images requires a license for Unisys' patented Lempel Ziv Welch (LZW) compression and decompression technology). Otherwise get libpng (www.libpng.org) and create PNG graphics instead. PNG has better compression and more features than GIF anyway. Browser support is widespread.

 

Unfortunately animated PNG (better known as MNG) is not widely supported. But Flash is everywhere you want to be, and therefore you need Ming(http://www.opaque.net/ming/). Ming (not related to MNG) makes it easy to generate custom Flash animations.

 

Need compression? You need zlib http://www.info-zip.org/pub/infozip/zlib. In fact, you probably need zlib even if you don't need compression, since libraries like libpng require zlib to build.

 

Contrary to popular hype, you can use garbage collection with C++. You just don't have to if you don't want to. C++ lets you choose. Many projects can gain a productivity boost and maybe even a performance boost by implementing Hans Boehm's conservative garbage collector for C++(http://www.hpl.hp.com/personal/Hans_Boehm/gc/).

 

There are two popular PDF libraries: ClibPDF(http://www.fastio.com/) and PDFLib(http://www.pdflib.com/). You must purchase a license to use either one commercially. If this dismays you, download the PDF reference and write the PDF out directly. It's really not that hard, with a little help from zlib. Fundamentally, PDF is a text-based document format that's only a little more difficult to write than XML.

 

Free CORBA implementations seem to have popped up all over the place. The best ones are omniORB(http://www.uk.research.att.com/omniORB/) and TAO(http://www.cs.wustl.edu/~schmidt/TAO.html). omniORB consistently beats TAO in performance tests and has less memory overhead. TAO is bigger (and probably a bit slower) because it is built on ACE, a useful library in its own right for distributed computing. In its favor, TAO has tracked CORBA standards more aggressively and carries less restrictive licensing terms. TAO has a more active user community and commercial support is more readily available. We use TAO.

 

 

PDF: Open Source Printing

 

The dominant printable format on the Web is PDF. HTML is nearly useless as a print format because every browser renders it differently. CSS helps a little, but even that gets interpreted differently. Adobe's PDF format allows reports generated on Linux or BSD servers to get downloaded and printed on any platform that supports PDF.

 

There are many ways to produce PDF. One simple tool chain starts by writing PostScript code and then running it through Ghostscript. For example we can pipe

 

 

%!PS-Adobe-2.0
100 600 moveto
/Times-Roman findfont
48 scalefont
setfont
(ExtremeTech Rocks!) show
showpage

 

through ghostscript like this:

gswin32 -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=file.pdf file.ps

 

and get this: extremetechpdf.

 

PostScript is a bit gnarly for typesetting much text, however. The grand-daddy of all open source typesetting programs, TEX (http://www.tug.org), provides another great way to produce PDF files. If you go this route I highly recommend the combination of CONTEXT (http://www.pragma-ade.com) and pdfetex. Here you create something like this:

 

\starttext

No, it's not XML, but \CONTEXT\ with \TEX\ is the best way

to achieve high quality reports. \TEX\ has always had impressive math capabilities:

 

\placeformula[formula:aformula]
\startformula
y=x^2
\stopformula
\placeformula
\startformula
\int_0^1 x^2 dx
\stopformula

And tables:

\placetable[here][tab:os]{Operating System Mascots}
\starttable[|c|c|]
\HL
\NC \bf OS \NC \bf Mascot \NC\SR
\HL
\NC Linux \NC Penguin \NC\FR
\NC BSD \NC Daemon \NC\MR
\NC Windows \NC Clippy \NC\MR
\HL
\stoptable

\stoptext

 

Then run it through texexec to produce this: extremetech2.pdf

 

And if using SGML is more to your liking, you can use the OpenJade-to-TEX approach to take SGML coded documents like:

 



]>

OpenJade uses DSSSL, a forerunner to XSL.

 

DSSSL uses a LISP based syntax rather than XSL's XML based syntax.

 

then process them with DSSSL style sheets like this:

 



(root (make simple-page-sequence
	    (make scroll)))
		
(element p (make paragraph))

 

and turn them into this: extremetech3.pdf . OpenJade offers the additional advantage of producing RTF. This is great for producing boilerplate documents that need to be hand-edited. You can also write the RTF directly without a great deal of difficulty.

 

The fastest and most flexible way to produce PDF's is to make them yourself. For example this C++ program

// Intended as a simplified example of writing to a PDF file
// This is not intended as good design for a general PDF writing library

#include <fstream>
#include <iomanip>
#include <list>
#include <string>

struct pdf_object;
struct pdf_marker
{
	pdf_object& o;
	unsigned value;
	
	pdf_marker(pdf_object& _o,unsigned _value)
		:o(_o),value(_value)
	{}
};

struct pdf_object
{
	// Byte offset of each object written so far, in object-number order
	std::list<std::streamoff> offset;
	
	
	// We use this method to force documenting
	// the object's reference id inline
	// a better method would be to allow symbolic
	// object names: left to the reader as an exercise!
	pdf_marker operator()(int x)
	{
		return pdf_marker(*this,x);
	}
	
	void write_trailer(std::ostream& os);
	
};

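// Record where this object starts, then emit its "N 0 obj" header.
// Taking the marker by const reference lets us stream the temporary
// returned by obj(n).
std::ostream& operator<<(std::ostream& os,const pdf_marker& x)
{
	x.o.offset.push_back(os.tellp());
	return os << x.value << " 0 obj\r\n";
}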


void pdf_object::write_trailer(std::ostream& os)
{
	// Byte offset of the cross-reference table itself
	std::streamoff xref = os.tellp();
	os << "xref\r\n"
	   << "0 " << (offset.size()+1) << "\r\n"
	   << "0000000000 65535 f\r\n"
		 ;

	// One fixed-width, 20-byte entry per object
	for (std::list<std::streamoff>::iterator p=offset.begin();
			p!=offset.end();
			++p)
				os 
					<< std::setfill('0') 
					<< std::setw(10) 
					<< *p 
					<< " 00000 n\r\n"
					;
	os <<
		"trailer\r\n"
		"<<\r\n"
		"/Size " << (offset.size()+1) << "\r\n"
		"/Root 1 0 R\r\n"
		">>\r\n"
		"startxref\r\n"
		<< xref 
		<<"\r\n%%EOF"
		;
}



struct pdf_stream
{
	const std::string& v;
	
	pdf_stream(const std::string& x)
		:v(x)
	{}
};

inline std::ostream& operator<<(std::ostream& os,const pdf_stream& x)
{
	return os << "<< /Length " << x.v.size() << ">>\r\n" 
		<< "stream\r\n"
		<< x.v
		<< "endstream\r\n"
		;
}


int main(int argc,char* argv[])
{
	// The output file name is the only argument
	if (argc < 2)
		return 1;

	pdf_object obj;
	std::ofstream os(argv[1],std::ios::binary);
	
	os
		<<
		"%PDF-1.0\r\n"
		<< obj(1) <<
		"<<\r\n" 
		"/Type /Catalog\r\n"
		"/Pages 3 0 R\r\n"
		"/Outlines 2 0 R\r\n"
		">>\r\n"
		"endobj\r\n"
		<< obj(2) <<
		"<<" 
		"/Type /Outlines\r\n"
		"/Count 0\r\n"
		">>\r\n"
		"endobj\r\n"
		<< obj(3) << 
		"<<\r\n"
		"/Type /Pages\r\n"
		"/Count 1\r\n"
		"/Kids [4 0 R]\r\n"
		">>\r\n"
		"endobj\r\n"
		<< obj(4) <<
		"<<\r\n" 
		"/Type /Page\r\n"
		"/Parent 3 0 R\r\n"
		"/Resources << /Font << /F1 7 0 R >>/ProcSet 6 0 R\r\n"
		">>\r\n"
		"/MediaBox [0 0 612 792]\r\n"
		"/Contents 5 0 R\r\n"
		">>\r\n"
		"endobj\r\n"
		<< obj(5) << 
		pdf_stream(
		"BT\r\n"
		"/F1 24 Tf\r\n"
		"100 100 Td (ExtremeTech Rocks!) Tj\r\n"
		"ET\r\n") <<
		"endobj\r\n"
		<< obj(6) <<
		"[/PDF /Text]\r\n"
		"endobj\r\n"
		<< obj(7) <<
		"<<\r\n"  
		"/Type /Font\r\n"
		"/Subtype /Type1\r\n"
		"/Name /F1\r\n"
		"/BaseFont /Helvetica\r\n"
		"/Encoding /MacRomanEncoding\r\n"
		">>\r\n"
		"endobj\r\n"
		;
		
	obj.write_trailer(os);
}

 

will produce this PDF file: extremetech6.pdf(ftp://ftp.extremetech.com/pub/extremetech/open_source/extremetech6.pdf). Utilizing all of the capabilities of the PDF format requires that you learn to write your own files. None of the API's or PDF processors available will allow you to utilize the full range of Acrobat Reader's features.
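
The part of the program above that is easiest to get wrong is the cross-reference table: every entry must be exactly 20 bytes long, so that a reader can seek straight to entry N. As a small sketch of that one detail (the function name xref_entry is hypothetical, and the generation number is fixed at zero as in the example program), the formatting can be isolated like this:

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Format one in-use cross-reference entry:
//   10-digit byte offset, space, 5-digit generation number, space,
//   the letter 'n', then CR LF. That is always 20 bytes in total.
std::string xref_entry(long offset)
{
    std::ostringstream os;
    os << std::setfill('0') << std::setw(10) << offset << " 00000 n\r\n";
    return os.str();
}
```

An object that starts at byte 9 of the file therefore gets the entry "0000000009 00000 n" followed by CR LF. Because the offsets are byte positions, the output file must be opened in binary mode, as the example program does.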

 

There are also two open source projects for creating PDF from XSL Flow Objects (XSL:FO). The Java based Apache project FOP(http://xml.apache.org/fop/) takes XSL:FO objects and produces PDF directly. PassiveTEX(http://users.ox.ac.uk/~rahtz/passivetex/) uses macros written in TEX to both parse and format the Flow Objects. PassiveTEX produces higher quality output than FOP and supports a comparable subset of XSL. If you must use an XSL path, PassiveTEX is probably your best open source option. But be warned: If you don't have at least a Masters in TEXnology, it won't be easy to get PassiveTEX up and running.

 

Each of the different approaches has a place, although I doubt any software system would ever use all of them. For performance reasons we often write directly to PDF's. Nothing can beat this for pure speed; your response time will rival or exceed that of piping back HTML. When you don't mind taking several minutes to produce a report, TEX is the way to go. The output is awesome, and more importantly it allows you to divide the design work from the programming work in a useful way. We sometimes combine approaches by outputting PDF graphics directly and then including them in a TEX file for further processing.

 

The XML and SGML approaches show promise, but the open source implementations are not really production quality in my estimation. OpenJade + TEX produces good output, but it's not easy to see what the extra OpenJade step really buys you. FOP is really preliminary. PassiveTEX is very cool, but even its author Sebastian Rahtz seems to prefer going to TEX directly.

 

See http://slashdot.org/article.pl?sid=01/07/23/015254&mode=thread for more discussion of open source PDF tools.

 

Conclusion

 

It is relatively painless to create enterprise-class Web applications in a way that lets them run effectively on 100% open source platforms, without sacrificing the ability to run on Windows. It requires no portability compromises, no performance compromises, no software engineering compromises, and no significant capital outlay.

 

Is it news that you don't need to write programs in Java to get portability? Open source C++ has consistently outpaced Java in portability and performance. Last I checked, Java Web Server was never supported on more than two platforms before being abandoned as hopelessly overmatched by its C++ implemented competition. C coded Apache runs on everything from IBM mainframes to high-end bicycles and it's rocking along at 63% market share. Java may have a place in the world (right now I have one Java computer on my key chain and one in my cell phone), but it does not have a monopoly on portability.

 

Cross platform programming does not require big performance compromises. By using C++ we achieve state-of-the-art performance for our core applications. Open source databases hold their own against commercial counterparts. Open source security libraries offer speed and robustness fully equal to their commercial counterparts.

 

An open source strategy offers you an enormous amount of debugged and documented code that you don't have to write. We have our choice of threading libraries. We have a choice in security libraries. We have a choice in database management systems. Much of the code needed for your next project has already been written, debugged, and documented.

 

Cross platform programming does not require big budgets. The necessary Windows tools are inexpensive and the open source tools are free. All of the documentation and training materials are also freely available on the Internet. Of course, your time to learn how to use these new tools is not free. But in our experience, the investment will make you a better, more productive programmer, even if you never release an open source hosted application.

 

It no longer makes sense to lock yourself to a single proprietary platform. Fire up your FTP client and get downloading!

 

And please join our discussion forum(http://discuss.extremetech.com/n/main.asp?Webtag=extremetech&nav=messages&topfolder=3) to give us your input.

 

Links

 

Cross Platform Libraries

http://www.boost.org

http://www.mozilla.org

http://www.opaque.net/ming/

http://www.openssl.org

http://www.fastcgi.com/

http://xml.apache.org/

http://www.hpl.hp.com/personal/Hans_Boehm/gc/

http://www.uk.research.att.com/omniORB/

http://www.cs.wustl.edu/~schmidt/TAO.html

http://www.stlport.com

 

 

Cross Platform Tools

http://gcc.gnu.org

http://www.gnupg.org

http://www.openssh.com

http://www.modssl.org/

http://www.apache.org/

http://www.zope.org/

http://www.python.org/

http://www.php.net/

http://www.haskell.org

 

 

Open Source Databases

http://www.postgresql.org/

http://www.mysql.com/

http://www.nusphere.com

http://www.ibphoenix.com/

http://www.borland.com/interbase/

 

 

Commercial Linux Compilers

http://www.comeaucomputing.com

http://www.pgroup.com

http://www.kai.com

 

 

Open Source Operating Systems

http://www.redhat.com

http://www.freebsd.org

http://www.openbsd.org

http://www.netbsd.org

http://www.debian.org

http://www.opensource.apple.com/

http://gnu-darwin.sourceforge.net/

 

 

PDF Tools

http://www.tug.org/

http://openjade.sourceforge.net/

http://www.miktex.org

http://users.ox.ac.uk/~rahtz/passivetex/

http://xml.apache.org/fop

http://www.pragma-ade.com

http://www.docbook.org/

http://www.fastio.com/ (ClibPDF)

http://www.pdflib.com/
