PostgreSQL 8.3beta1 Documentation
Chapter 12. Full Text Search
The previous section described how to perform full text searches using constant strings. This section shows how to search table data, optionally using indexes.
It is possible to do a full text search of a table with no index. A simple query to print the title of each row that contains the word friend in its body field is:
SELECT title FROM pgweb WHERE to_tsvector('english', body) @@ to_tsquery('friend');
The query above names the english configuration explicitly in to_tsvector, while to_tsquery('friend') uses the configuration set by default_text_search_config. A more complex query is to select the ten most recent documents that contain create and table in the title or body:
SELECT title FROM pgweb WHERE to_tsvector('english', title || ' ' || body) @@ to_tsquery('create & table') ORDER BY dlm DESC LIMIT 10;
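dlm is the last-modified date, so we used ORDER BY dlm DESC LIMIT 10 to get the ten most recent matches. For clarity we omitted the coalesce function, which prevents the unwanted effect of NULL concatenation: if either title or body is NULL, the whole concatenated string becomes NULL and the row cannot match.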
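As a sketch of what the NULL-safe form of the query might look like, the example below wraps each column in coalesce so that a NULL title or body is treated as an empty string. A single space is also inserted between the two fields so that the last word of the title and the first word of the body are not run together. It assumes the same pgweb table and columns used in the queries above.

```sql
-- NULL-safe variant: a NULL title or body no longer turns the whole
-- concatenated document into NULL.
SELECT title
FROM pgweb
WHERE to_tsvector('english', coalesce(title, '') || ' ' || coalesce(body, ''))
      @@ to_tsquery('create & table')
ORDER BY dlm DESC
LIMIT 10;
```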
We can create a GIN (Section 12.5) index to speed up the search:
CREATE INDEX pgweb_idx ON pgweb USING gin(to_tsvector('english', body));
Notice that the two-argument version of to_tsvector is used. Only text search functions that specify a configuration name can be used in expression indexes (Section 11.7). This is because the index contents must be unaffected by default_text_search_config. If they were affected, the index contents might be inconsistent because different entries could contain tsvectors that were created with different text search configurations, and there would be no way to guess which was which. It would be impossible to dump and restore such an index correctly.
Because the two-argument version of to_tsvector was used in the index above, only a query that uses the two-argument version of to_tsvector with the same configuration name will use that index. That is, WHERE 'a & b' @@ to_tsvector('english', body) will use the index, but WHERE 'a & b' @@ to_tsvector(body) and WHERE 'a & b' @@ body::tsvector will not. This guarantees that an index will be used only with the same configuration used to create the index rows.
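One way to check this rule in practice is to compare query plans with EXPLAIN. The sketch below assumes the GIN index on to_tsvector('english', body) created above. The planner may still choose a sequential scan on a small table, so the plans are only indicative.

```sql
-- Matches the index expression exactly (two-argument form, 'english'),
-- so the planner is able to consider the GIN index.
EXPLAIN
SELECT title FROM pgweb
WHERE to_tsvector('english', body) @@ to_tsquery('create & table');

-- One-argument form: the expression depends on default_text_search_config
-- and does not match the index expression, so the index cannot be used.
EXPLAIN
SELECT title FROM pgweb
WHERE to_tsvector(body) @@ to_tsquery('create & table');
```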
It is possible to set up more complex expression indexes in which the configuration name is specified by another column, e.g.:
CREATE INDEX pgweb_idx ON pgweb USING gin(to_tsvector(config_name, body));
where config_name is a column in the pgweb table. This allows mixed configurations in the same index while recording which configuration was used for each index row.
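A query that is meant to use such an index has to repeat the same indexed expression. The following is a minimal sketch, assuming config_name is declared with type regconfig so that it can be passed as the first argument of to_tsvector:

```sql
-- The WHERE clause repeats the indexed expression to_tsvector(config_name, body),
-- so each row is matched using the configuration recorded for it.
SELECT title
FROM pgweb
WHERE to_tsvector(config_name, body) @@ to_tsquery('create & table');
```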
Indexes can even concatenate columns:
CREATE INDEX pgweb_idx ON pgweb USING gin(to_tsvector('english', title || ' ' || body));
A more complex case is to create a separate tsvector column to hold the output of to_tsvector(). This example is a concatenation of title and body, with ranking information. We assign a different weight label to each field to encode information about the origin of each word:
ALTER TABLE pgweb ADD COLUMN textsearch_index tsvector;
UPDATE pgweb SET textsearch_index = setweight(to_tsvector('english', coalesce(title,'')), 'A') || setweight(to_tsvector('english', coalesce(body,'')), 'D');
Then we create a GIN index to speed up the search:
CREATE INDEX textsearch_idx ON pgweb USING gin(textsearch_index);
After vacuuming, we are ready to perform a fast full text search:
SELECT ts_rank_cd(textsearch_index, q) AS rank, title FROM pgweb, to_tsquery('create & table') q WHERE q @@ textsearch_index ORDER BY rank DESC LIMIT 10;
It is necessary to create a trigger to keep the new tsvector column current any time title or body changes. Keep in mind that, just as with expression indexes, it is important to specify the configuration name when creating tsvector values inside triggers, so that the column's contents are not affected by changes to default_text_search_config.
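A minimal sketch of such a trigger is shown below. It recomputes textsearch_index with the same weights used in the UPDATE above. The function name pgweb_textsearch_update and the trigger name pgweb_textsearch_trg are hypothetical, and the example assumes the PL/pgSQL language is installed in the database.

```sql
-- Hypothetical trigger function: rebuilds the tsvector column from title and
-- body, naming the 'english' configuration explicitly so that the result does
-- not depend on default_text_search_config.
CREATE FUNCTION pgweb_textsearch_update() RETURNS trigger AS $$
BEGIN
    NEW.textsearch_index :=
        setweight(to_tsvector('english', coalesce(NEW.title, '')), 'A') ||
        setweight(to_tsvector('english', coalesce(NEW.body, '')), 'D');
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Fire before each row is inserted or updated so the stored column
-- always reflects the current title and body.
CREATE TRIGGER pgweb_textsearch_trg
    BEFORE INSERT OR UPDATE ON pgweb
    FOR EACH ROW EXECUTE PROCEDURE pgweb_textsearch_update();
```

Because the trigger fires BEFORE INSERT OR UPDATE, the assigned value is stored together with the rest of the row and no second UPDATE is needed.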