데이터베이스를 사랑하는 사람들의 모임 데이터베이스 사랑넷

Database

Community

자유게시판

도움주신분들

Admin

운영게시판

최근게시물

PostgreSQL Tutorials 3573 게시물 읽기

News | Q&A | Columns | Tutorials | Devel | Files | Links

No. 3573

Optimizing PostgreSQL
작성자	정재익(advance)	작성일	2001-10-22 22:39	조회수	8,515

PostgreSQL 의 최적화

원본 : http://www.phpbuilder.com/columns/smith20010821.php3

저자 : Ericson Smith

번역 : 정재익

Tim Perdue 의 MySQL 과 PostgreSQL 을 비교한 아주 좋은 글(http://www.phpbuilder.com/columns/tim20000705.php3)을 읽고 나는 이 데이터베이스를 설치해서 사용하기로 결정했다.나는 대부분의 작업을 사용하기 편하고, 비교할 수 없는 select query 의 속도 때문에 MySQL 을 사용했었으며, 지금도 그리하고 있다.

그러나 새로운 프로젝트에서는 많은 update 를 실행해야 하는데 MySQL 의 table locking feature 때문에 많은 고통을 받앗다.

우리가 설정한 PostgreSQL 서버 환경은 다음과 같다 :

. Redhat 7.1

. Dual PIII 650 MHz System

. 512 MB RAM

. 18 Gb SCSI for the postgresql data partition

다운로드와 설치하기

7.1.2 RPM 을 http://postgresql.org 로 부터 다운로드 받았다. 서버 설치를 위해서 단순히 postgresql-server 그리고 postgresql-7.1.2 (base) 만을 설치했다.

다음과 같이 서버를 기동했다 :

/etc/init.d/postgresql start

적은 크기의 데이터베이스 하나를 MySQL 로 부터 이동시켜 왔다 (5000 개 정도의 레코드를 포함한 세개의 테이블). PostgreSQL 최적화기가 사용할 충분한 인덱스를 생성했다. 그리고 이 테이블로 질의를 보내는 C로 작성된 작은 CGI 프로그램을 postgreSQL C client 인터페이스를 이용하여 수정했다. 이 작은 CGI 프로그램은 분당 수천개의 질의를 받도록 되어 있다.

최적화

CGI 프로그램에서 튜닝을 한후에 가장 처음으로 알게 된 것 중 하나는, 대부분의 질의들은 예전의 MySQL 기반의 시스템 처럼 빠르게 그 결과가 돌아오는데, 서버의 로딩이 훨씬 높다는 것이었다 -- 거의 90%에 육박하고 있었다. 그래서 나는 이것을 낮추기 위해 작업을 시작했다. 나는 이전에 MySQL 에서는 문제가 될만한 메모리 사이즈 보다도 더 크게 cache 와 buffer size 를 늘여 줌으로서 최적화를 시켰다.

PostgreSQL 을 실행하기 전에 가장 중요한 것은 충분한 shared buffer space 를 늘여 주는 것이다. 다시 반복해보자: 충분한 buffer space 를 제공해 줘라. 이 데이터베이스 서버에는 512 Mb 의 램이 있다. 여러분들은 이들 중 75% 정도를 shared buffer 로 전환시켜 줘야 한다. PostgreSQL 은 그 테이블의 대부분을 이 shared buffer 로 올릴수 있을때 가장 잘 또는 좀더 잘 동작할 수 있다. 우리의 경우 우리들 데이터베이스가 생각보다는 작았기 때문에 128 Mb 의 메모리만을 shared buffer space 로 결정했다.

/var/lib/pgsql/data/postgresql.conf 파일 내에 데이터베이스 서버에 대한 설정을 포함하고 있다. PostgreSQL 은 shared memory 를 buffer 로 사용한다. Linux system 에서는, 여러분들은 다음 명령어를 이용해서 시스템에 얼마나 많은 shared memory 가 할당되어 있는지 확인할 수 있다:

cat /proc/sys/kernel/shmmax

그리고 시스템이 사용하는 공유메모리는 다음 명령어로 볼수 있다:

ipcs

결과는 byte 단위로 보여 준다. Redhat 7.1 에서는 기본적으로 32 MB 의 공유메모리만을 할당하고 있다. 이것은 PostgreSQL 에서는 아주 모자라는 메모리이다. 나는 다음과 같은 명령어로서 이 한계를 증가시켰다.

echo 128000000 > /proc/sys/kernel/shmmax

만약 시스템을 리붓하게 되면 이 설정은 사라진다는 것을 명심하기 바란다. 여러분들은 이 명령어를 postgresql startup 파일 내부에 적어 놓는것이 좋다. /etc/sysctl.conf 파일을 수정하면 영구적으로 이 변화가 반영될 것이다.

그리고 나서 나는 postgresql.conf 파일에 shared_buffers 를 15200 으로 설정했다. PostgreSQL 은 8K segment 를 사용하기 때문에 128000/8192 + 512K overhead 해서 계산을 했다. 그리고 sort_mem 을 32168 (sort memory area 를 32M 로 설정)로 설정해 주었다. connection pooling 이 효율적으로 하기 위해서 max_connection 을 64로 설정했다. 그리고 fsync 는 false 로 설정해 주었다.

shared_buffers = 15200
sort_mem = 32168
max_connections=64
fsync=false

이들 설정을 조절하기 위해서 메뉴얼을 읽어 보기 바란다. 그러나 나는 그렇게 할 필요가 없다고 생각한다. 만약 여러분들이 shared_buffers 를 할당된 shared memory 한계보다 더 크게 설정했다고 생각해 보자. postgresql 은 기동시에 시작을 거부할 것이다. 이러한 사실은 처음에 아무런 로그 파일을 남기지 않았기 때문에 우리를 혼란 스럽게 만들었다. 여러분들은 /etc/init,d 내의 startup 파일에 로그 파일로 출력을 기록하도록 할수 있다. 다음과 같이 적혀 있는 부분을 찾아서...

/postmaster start > /dev/null 2>

다음으로 변경하도록 하라.

/postmaster start > /var/lib/pgsql.log 2>

(또는 여러분들이 로그를 적기 원하는 장소로 기록하면 될것이다)

이 로그파일을 쫓아 가 보면 문제가 무엇인지 알수 있을것이다.

모든 종류의 디버깅 정보들이 이 파일내에 기록될 것이다. 이것은 SQL 구문에러와 EXPLAIN state, emts, connection problem, authentication attempts 등등의 디버깅 정보들을 포함한다.

나는 postgresql 을 재기동하고 우리들의 CGI 를 온라인 시켰다. 우리들의 기우는 없어지고, postgresql 은 곧 buffer 를 사용하기 시작했다. 그리고 서버의 시스템 로딩은 10% 이하로 떨어 졌다.

초창기 시스템의 문제점은 postgresql connection 요구가 있을때마다 한번씩 이런 것들이 끊긴다는 것이었다. 그래서 나는 C library 의 connection pooling 방법을 사용하기 시작했다. 이 옵션으로 서버의 로딩이 순간적으로 증가하는 현상마저도 줄어 들었다. PHP 에서 여러분들은 이 효과를 얻기 위해서는 persistent connection (pg_connect 대신에 pg_pconnect 를 사용하라) 을 이용하면 될 것이다.

Indexes

PostgreSQL 에서 적절한 인덱싱의 필요성은 아무리 강조해도 지나치지 않다. 나의 초창기의 실수는 BIGINT column 에 대해서 인덱싱을 시도한 것이었다. 이 컬럼은 인덱싱은 잘되었다. 그러나 postgresql 은 그들을 이용하지 않는 것이었다. 이틀 후 나는 나의 머리를 쥐어 뜯어야 했다. 시스템 아키텍쳐가 32 bit 였던 것이다. 이것이 postgreSQL 은 64 bit (BIGINT) 를 인덱스로 사용할 수 없는 것일까? 이 자료형을 INTEGER 로 변경함으로써 이 문제를 해결할 수 있었다.내가 만약 64bit 의 Itanium processor 를 이용했다면 이런 문제는 없었을 것으로 생각된다.

결론

질의의 반응속도를 증가시키기 위한 SQL 구문을 변경시킬수 있다. 그러ㅏ 이들은 postgresql 문서에 적절하게 설명하고 있다.

=================

원문 내용

Optimizing Postgresql

Ericson Smith

Following Tim Perdue's excellent article on the comparison between MySQL and Postgresql, I decided to take a shot at installing and using this database. For most of our work I use MySQL and will continue to do so, because of its ease of use and unrivaled select query speed, and also because there is no point in trying to mess around with production systems that already work fine.

But some new projects suffered greatly from MySQL's table locking feature when I needed to update data (which I do a lot). Here are my adventures in setting up a Postgresql database server.

Our configuration for a dedicated Postgresql server was:

. Redhat 7.1

. Dual PIII 650Mhz System

. 512MB RAM

. 18Gig SCSI drive for the postgresql data partition

Downloading and Installing

I downloaded and installed the 7.1.2 RPM's from http://postgres.org without any trouble. For a server installation, I only installed: postgresql-server and postgresql-7.1.2 (base).

I then started the server up and running by executing:

/etc/init.d/postgresql start

A small sized database was ported from MySQL (three tables totaling about 5000 records). I created sufficient indexes for postgresql's optimizer to use, and modified our C application to use the postgresql C client interface for a small CGI program that would brutally query this table. This small CGI program receives thousands of queries per minute.

Optimizing

One of the first things I noticed after turning on the CGI program, was that although queries were returned almost as fast as from the previous MySQL based system, the load on the server was much higher -- in fact almost 90-percent! Then I started to go down into the nitty-gritty of things. I had optimized MySQL before by greatly increasing cache and buffer sizes and by throwing more ram towards the problem.

The single biggest thing that you have to do before running Postgresql, is to provide enough shared buffer space. Let me repeat: provide enough buffer space! Let's say you have about 512MB of ram on a dedicated database server, then you need to turn over about 75-percent of it to this shared buffer. Postgresql does best when it can load most or -- even better -- all of a table into its shared memory space. In our case, since our database was fairly small, I decided to allocate 128MB of RAM towards the shared buffer space.

The file /var/lib/pgsql/data/postgresql.conf contains settings for the database server. Postgresql uses system shared memory as a buffer. On a Linux system, you can see how much shared memory was allocated by your system by running the command:

cat /proc/sys/kernel/shmmax

And to view shared memory use on the system:

ipcs

The result will be in bytes. By default RedHat 7.1 allocates 32MB of shared memory, hardly enough for postgresql. I increased this limit to 128MB by doing the command:

echo 128000000 > /proc/sys/kernel/shmmax

Be aware that once you reboot the server, this setting will disappear. You need to place this line in your postgresql startup file, or by editing the /etc/sysctl.conf file for a more permanent setting.

Then in our postgresql.conf I set shared_buffers to 15200. Because Postgresql uses 8K segments, I made a calculation of 128000/8192 plus a 512K overhead. I also set our sort_mem to 32168 (32Megs for a sort memory area). Since connection pooling was in effect, I set max_connections to 64. And fsync was also set to false.

shared_buffers = 15200

sort_mem = 32168

max_connections=64

fsync=false

You can read the manual to tweak other settings, but I never had the need to do so. Note that if you set shared_buffers to more than what your shared memory limit is, postgresql will refuse to start. This confused us for a while, since no logging was taking place. You can tweak the startup file in /etc/init.d for the postmaster to write its output to a log file. Change the fragment from

/postmaster start > /dev/null 2>

to

/postmaster start > /var/lib/pgsql.log 2>

(or wherever you want to store the log.)

Tailing the log file clearly explained what the problem was.

All sorts of sexy debugging info will show up in this file, which includes SQL syntax errors, the output of EXPLAIN state, emts, connection problems, authentication attempts, and so forth.

I restarted postgresql and brought our CGI online. Our jaws collectively dropped to the floor as postgresql literally flew as soon as it started to use the buffer. Server load by postgresql dropped to just under 10-percent.

One hitch I found with an early version of the system was that it had to build up and tear down a postgresql connection with each request. This was intolerable, so I started to use the connection pooling features of the C library. Server load dropped another few notches with this option. With PHP you will want to use persistent connections (pg_pconnect instead of pg_connect) to fully take advantage of this effect.

Indexes

I cannot emphasize enough the need to have proper indexing in postgresql. One early mistake that I made was to index BIGINT columns. The columns were indexed ok, but postgresql refused to make use them. After two days of tearing out my hair, it came to me that the architecture of the system was 32 bits. Could it be that postgresql refuses to make use of a 64 bit (BIGINT) index? Changing the type to INTEGER quickly solved that problem. Maybe if I had one of those new-fangled 64 bit Itanium processors.

Conclusion

There are many things that you can do with your SQL statements to also improve query response, but these are adequately covered in the interactive postgresql documentation.

이 글에 대한 댓글이 총 1건 있습니다.

이 글에는 자칫 PostgreSQL 놈을 처음 접하는 사람들에게 오해를 불러 일으킬 여지가 충분히 존재하는 요소들이 많이 있습니다.

가장 중요한 것은 공유 메모리 버퍼는 어디까지나 각 세션에서 사용할 공유 메모리일 뿐입니다. 이놈이 너무 작아서도 안되지만, 너무 커서도 안되지요. - 너무 크면 결국 물리적인 메모리를 사용하는 범위를 벗어나 디스크로 구현된 가상 메모리까지 공유메모리로 할당해서 오히려 역효과를 가져오게 됩니다.

개인적인 견해입니다만, 먼저 최대 동시 접속 수를 구하고 PostgreSQL 문서에 있는대로 그 수의 두배 정도의 버퍼수를 지정하는 것이 지금까지 경험에 의한 결론입니다.

PostgreSQL 서버 최적화에 대한 이야기는 공유메모리에 촛점을 맞추는 것이 아니라,

얼마나 잦은 데이터 조작(insert, update, delete)이 일어나느냐에 따라서 vacuum 명령의 빈도수,

전원장치의 안정성을 판단에서 fsync 옵션을 끌 것인가? 켤 것인가?

sort 자료가 얼마나 많은가에 따른 sort 메모리.

....

등 꽤 복합적인 문제들을 함께 고려해야할 것같네요.

김상기(ioseph)님이 2003-09-04 01:46에 작성한 댓글입니다.
이 댓글은 2003-09-04 01:47에 마지막으로 수정되었습니다.

No.	제목	작성자	작성일	조회
3773	PostgreSQL의 JDBC 활용하기	정재익	2001-12-25	6677
3623	AGGREGATE funciton 함수 이용시 간단한 tip	정재익	2001-11-07	5860
3577	PostgreSQL 설치하기	정재익	2001-10-24	12858
3573	Optimizing PostgreSQL [1]	정재익	2001-10-22	8515
3561	Windows 용 PostgreSQL 의 설치 방법 [2]	정재익	2001-10-21	8110
3547	PostgreSQL 을 사용하면서 모은 몇가지 Tip 들...	정재익	2001-10-19	18148
3489	SSH 터널（tunnel)를사용한 안전한 TCP/IP 접속 [1]	송정훈	2001-10-12	5613

All about the DATABASE... Copyleft 1999-2024 DSN, All rights reserved.

작업시간: 0.019초, 이곳 서비스는
PostgreSQL v16.2로 자료를 관리합니다