데이터베이스를 사랑하는 사람들의 모임 데이터베이스 사랑넷

우선은 postgresql-9.1 소스가 있는 곳에서 ./configure 를 해줘야 합니다. 이미 패키지 버전으로 깔려있다면 pg_config 명령을 통해 기존의 configure 명령어와 같은 옵션을 줘야 합니다.
그리고 make를 한 번 해줘야 contrib 아래의 소스들이 정상적으로 컴파일이 됩니다.

contrib/pg_trgm/trgm.h 파일을 열어서 15라인 쯤에 있는 #define KEEPONLYALNUM 라인을 주석처리 해줍니다.

그리고 make
그리고 make install

$ pg_config --libdir 해보시면 pgsql의 라이브러리들이 모여있는 경로가 나옵니다. 저같은 경우는 /opt/local/lib/postgresql91 인데 이 경로에 pg_trgm.so 이 잘 설치되었습니다.

$ psql test

psql (9.1.0)
Type "help" for help.

test=# CREATE EXTENSION pg_trgm;
CREATE EXTENSION
test=# \df

                                                            List of functions
 Schema |          Name          | Result data type |                            Argument data types                            |  Type  
--------+------------------------+------------------+---------------------------------------------------------------------------+--------
 public | gin_extract_query_trgm | internal         | text, internal, smallint, internal, internal, internal, internal          | normal
 public | gin_extract_value_trgm | internal         | text, internal                                                            | normal
 public | gin_trgm_consistent    | boolean          | internal, smallint, text, integer, internal, internal, internal, internal | normal
 public | gtrgm_compress         | internal         | internal                                                                  | normal
 public | gtrgm_consistent       | boolean          | internal, text, integer, oid, internal                                    | normal
 public | gtrgm_decompress       | internal         | internal                                                                  | normal
 public | gtrgm_distance         | double precision | internal, text, integer, oid                                              | normal
 public | gtrgm_in               | gtrgm            | cstring                                                                   | normal
 public | gtrgm_out              | cstring          | gtrgm                                                                     | normal
 public | gtrgm_penalty          | internal         | internal, internal, internal                                              | normal
 public | gtrgm_picksplit        | internal         | internal, internal                                                        | normal
 public | gtrgm_same             | internal         | gtrgm, gtrgm, internal                                                    | normal
 public | gtrgm_union            | integer[]        | bytea, internal                                                           | normal
 public | set_limit              | real             | real                                                                      | normal
 public | show_limit             | real             |                                                                           | normal
 public | show_trgm              | text[]           | text                                                                      | normal
 public | similarity             | real             | text, text                                                                | normal
 public | similarity_dist        | real             | text, text                                                                | normal
 public | similarity_op          | boolean          | text, text                                                                | normal

(19 rows)

잘 설치되었습니다.

한글이 잘 처리되는지 확인합니다.

test=# SELECT show_trgm('테스트');
show_trgm
---------------------------------------
{0xf5d371,0xf74fa4,0x016735,0x7349c7}

결과가 이상하지만 이렇게 나오면 정상입니다.
빈 데이터가 나오면 contrib/pg_trgm 설치 과정이 잘못된 것입니다.

실제로 어떻게 동작하는지 보겠습니다.
임의의 테이블 testtable에 대해 적용해 보겠습니다. 테스트 데이터는 13만 건입니다.

test=# CREATE INDEX testtable_trgm ON testtable (productname gist_trgm_ops);

이렇게 인덱스를 생성합니다.

쿼리 테스트 #1 - ILIKE 검색

test=# EXPLAIN ANALYZE SELECT *, similarity(productname, '컴퓨터') as sm FROM testtable WHERE productname ilike '%컴퓨터%' ORDER BY sm DESC;

                                                               QUERY PLAN                                                               
----------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=31.84..31.85 rows=7 width=91) (actual time=19.156..19.162 rows=95 loops=1)
   Sort Key: (similarity((productname)::text, '컴퓨터'::text))
   Sort Method: quicksort  Memory: 45kB
   ->  Bitmap Heap Scan on testtable  (cost=4.41..31.74 rows=7 width=91) (actual time=18.720..19.065 rows=95 loops=1)
         Recheck Cond: ((productname)::text ~~* '%컴퓨터%'::text)
         ->  Bitmap Index Scan on testtable_trgm  (cost=0.00..4.40 rows=7 width=0) (actual time=18.665..18.665 rows=95 loops=1)
               Index Cond: ((productname)::text ~~* '%컴퓨터%'::text)  Total runtime: 19.222 ms
(8 rows)

잘 나옵니다.

쿼리 테스트 #2 - % 연산자

test=# EXPLAIN ANALYZE SELECT *, similarity(productname, '컴퓨터') as sm FROM testtable WHERE productname % '컴퓨터' ORDER BY sm DESC;

                                                                QUERY PLAN                                                                 
-------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=504.87..505.22 rows=139 width=91) (actual time=23.517..23.517 rows=32 loops=1)
   Sort Key: (similarity((productname)::text, '컴퓨터'::text))
   Sort Method: quicksort  Memory: 32kB
   ->  Bitmap Heap Scan on testtable  (cost=13.43..499.92 rows=139 width=91) (actual time=23.385..23.482 rows=32 loops=1)
         Recheck Cond: ((productname)::text % '컴퓨터'::text)
         ->  Bitmap Index Scan on testtable_trgm  (cost=0.00..13.39 rows=139 width=0) (actual time=23.341..23.341 rows=32 loops=1)
               Index Cond: ((productname)::text % '컴퓨터'::text)  Total runtime: 23.572 ms
(8 rows)

빠르죠?
근데 보시면 ILIKE '%...%' 와 % 연산자의 결과가 다릅니다.
% 연산자는 similarity가 정해진 수치보다 낮으면 결과에서 제외합니다.

test=# SELECT show_limit();
show_limit
------------
0.3
(1 row)

기본적으로 similarity의 기본값은 0.3입니다.

test=# SELECT set_limit(0.1);
set_limit
-----------
0.1
(1 row)

set_limit을 통해 이 값을 변경할 수 있고 수치가 낮을수록 결과값이 더 많아진다고 생각하시면 됩니다.
수치를 낮추고 쿼리를 다시 돌려보겠습니다.

test=# EXPLAIN ANALYZE SELECT *, similarity(productname, '컴퓨터') as sm FROM testtable WHERE productname % '컴퓨터' ORDER BY sm DESC;

                                                                 QUERY PLAN                                                                 
--------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=504.87..505.22 rows=139 width=91) (actual time=35.567..35.573 rows=138 loops=1)
   Sort Key: (similarity((productname)::text, '컴퓨터'::text))
   Sort Method: quicksort  Memory: 51kB
   ->  Bitmap Heap Scan on testtable  (cost=13.43..499.92 rows=139 width=91) (actual time=34.529..35.444 rows=138 loops=1)
         Recheck Cond: ((productname)::text % '컴퓨터'::text)
         ->  Bitmap Index Scan on testtable_trgm  (cost=0.00..13.39 rows=139 width=0) (actual time=34.477..34.477 rows=138 loops=1)
               Index Cond: ((productname)::text % '컴퓨터'::text)  Total runtime: 35.637 ms
(8 rows)

결과값이 32개 -> 128개로 늘어났습니다.

test=# SELECT productname, similarity(productname, '컴퓨터') as sm FROM testtable WHERE productname % '컴퓨터' ORDER BY sm DESC LIMIT 10;

   productname   |    sm    
-----------------+----------
 좋은 컴퓨터     | 0.571429
 인텔 컴퓨터     | 0.571429
 사람과 컴퓨터   |      0.5
 컴퓨터 아저씨   |      0.5
 컴퓨터 선생님   |      0.5
 야무진 컴퓨터   |      0.5
 생쥐와 컴퓨터   |      0.5
 향기나는 컴퓨터 | 0.444444
 컴퓨터 방문교실 | 0.444444
 컴짱컴퓨터      | 0.428571
(10 rows)

기본적으로 ILIKE 검색이 매우 빨라진 것이 마음에 듭니다. 이 외에도 저는 soundex를 조합하여 이래저래 해보는 중인데 재밌네요.
활용은 알아서들 잘 하세요~ ^^

[Top]

No.	제목	작성자	작성일	조회
10414	EXCLUDE 제약 조건	김상기	2023-04-15	1591
9299	plpgsql debuger 설치해서 사용하기	김상기	2013-06-19	8665
8971	타임라인을 이용한 시간여행	김상기	2012-03-16	9032
8849	9.1 trgm 활용하기	신기배	2011-10-06	11427
8577	배열 자료형 정렬 함수 [1]	김상기	2009-12-15	12145
7496	엑셀 FORECAST 함수 구현하기	김상기	2008-08-14	18707
7068	corr() 집계 함수	김상기	2007-03-01	14094

작업시간: 0.010초, 이곳 서비스는
PostgreSQL v17.5로 자료를 관리합니다