PostgreSQL 8.3beta1 Documentation

Chapter 8. Data Types
tsvector is a data type that represents a document in a form optimized for full text search. In the simplest case, a tsvector is a sorted list of distinct lexemes (duplicates are removed on input), so even without indexes, full text searches perform better than the standard ~ and LIKE operators:
```
SELECT 'a fat cat sat on a mat and ate a fat rat'::tsvector;
                      tsvector
----------------------------------------------------
 'a' 'on' 'and' 'ate' 'cat' 'fat' 'mat' 'rat' 'sat'
```
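To illustrate why a sorted, duplicate-free list helps, here is a small Python model of the simplest tsvector input. It is a sketch, not PostgreSQL's implementation: `simple_tsvector`, `render`, and `contains` are hypothetical names, and the sort order (shorter lexemes first, ties broken by character order) is inferred from the output shown above. Because the list is sorted, a lookup can use binary search instead of scanning the whole text.

```python
from __future__ import annotations


def sort_key(lexeme: str) -> tuple[int, str]:
    # Order inferred from the 8.3 output: shorter lexemes first, then by characters.
    return (len(lexeme), lexeme)


def simple_tsvector(text: str) -> list[str]:
    # Split on whitespace, drop duplicates, and sort.
    return sorted(set(text.split()), key=sort_key)


def render(lexemes: list[str]) -> str:
    # Quote each lexeme the way psql displays a tsvector.
    return " ".join(f"'{lexeme}'" for lexeme in lexemes)


def contains(lexemes: list[str], word: str) -> bool:
    # Binary search over the sorted list: O(log n) comparisons, no text scan.
    target = sort_key(word)
    lo, hi = 0, len(lexemes)
    while lo < hi:
        mid = (lo + hi) // 2
        if sort_key(lexemes[mid]) < target:
            lo = mid + 1
        else:
            hi = mid
    return lo < len(lexemes) and lexemes[lo] == word


if __name__ == "__main__":
    doc = simple_tsvector("a fat cat sat on a mat and ate a fat rat")
    print(render(doc))  # 'a' 'on' 'and' 'ate' 'cat' 'fat' 'mat' 'rat' 'sat'
    print(contains(doc, "mat"), contains(doc, "dog"))  # True False
```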
Note that a space can also be a lexeme if it is quoted (within the SQL string literal, each '' stands for one single-quote character):
```
SELECT 'space '' '' is a lexeme'::tsvector;
             tsvector
----------------------------------
 'a' 'is' ' ' 'space' 'lexeme'
```
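The quoting rule can be approximated by a small tokenizer. This Python sketch assumes that a lexeme wrapped in single quotes may contain whitespace and that a doubled quote inside it stands for one literal quote; `tokenize_tsvector_input` is a hypothetical name. Its argument is the text the tsvector parser receives after SQL has already turned each '' in the string literal into a single quote, and it returns tokens in input order (sorting and duplicate removal are left out).

```python
from __future__ import annotations


def tokenize_tsvector_input(text: str) -> list[str]:
    tokens: list[str] = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch.isspace():
            i += 1  # whitespace separates lexemes
        elif ch == "'":
            i += 1  # skip the opening quote
            chars: list[str] = []
            while i < n:
                if text[i] == "'":
                    if i + 1 < n and text[i + 1] == "'":
                        chars.append("'")  # doubled quote -> one literal quote
                        i += 2
                        continue
                    i += 1  # closing quote ends the lexeme
                    break
                chars.append(text[i])
                i += 1
            else:
                raise ValueError("unterminated quoted lexeme")
            tokens.append("".join(chars))
        else:
            start = i  # unquoted lexeme runs until the next whitespace
            while i < n and not text[i].isspace():
                i += 1
            tokens.append(text[start:i])
    return tokens


if __name__ == "__main__":
    print(tokenize_tsvector_input("space ' ' is a lexeme"))
    # ['space', ' ', 'is', 'a', 'lexeme']
```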
Optionally, each lexeme can carry positional information, which is used for proximity ranking:
```
SELECT 'a:1 fat:2 cat:3 sat:4 on:5 a:6 mat:7 and:8 ate:9 a:10 fat:11 rat:12'::tsvector;
                                   tsvector
-------------------------------------------------------------------------------
 'a':1,6,10 'on':5 'and':8 'ate':9 'cat':3 'fat':2,11 'mat':7 'rat':12 'sat':4
```
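A Python model of how positional input is grouped: each lexeme collects all of its positions in ascending order, and lexemes are ordered as in the output above. The function names are hypothetical, and the model assumes every token has the form `word` or `word:position` (no weight labels).

```python
from __future__ import annotations

from collections import defaultdict


def sort_key(lexeme: str) -> tuple[int, str]:
    # Same ordering as the output above: length first, then characters.
    return (len(lexeme), lexeme)


def positional_tsvector(text: str) -> dict[str, list[int]]:
    positions = defaultdict(list)
    for token in text.split():
        word, _, pos = token.partition(":")
        entry = positions[word]  # creates the lexeme even if it has no position
        if pos:
            entry.append(int(pos))
    # Sorted lexemes, each with its positions in ascending order.
    return {w: sorted(positions[w]) for w in sorted(positions, key=sort_key)}


def render(vector: dict[str, list[int]]) -> str:
    parts = []
    for word, pos_list in vector.items():
        suffix = ":" + ",".join(map(str, pos_list)) if pos_list else ""
        parts.append(f"'{word}'{suffix}")
    return " ".join(parts)


if __name__ == "__main__":
    text = "a:1 fat:2 cat:3 sat:4 on:5 a:6 mat:7 and:8 ate:9 a:10 fat:11 rat:12"
    print(render(positional_tsvector(text)))
    # 'a':1,6,10 'on':5 'and':8 'ate':9 'cat':3 'fat':2,11 'mat':7 'rat':12 'sat':4
```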
Each lexeme position can also be labeled with a weight, A, B, C, or D, where D is the default. These labels group lexemes by importance, for example to reflect document structure (such as title versus body). The numeric value of each label is assigned at search time and used when calculating the document's rank, which makes it possible to control how search results are ranked.
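A rough Python illustration of weight labels, under loudly stated assumptions: the input syntax `word:positionLABEL` (for example `fat:2A` or `fat:2B,4C`), the label values {D: 0.1, C: 0.2, B: 0.4, A: 1.0}, and the scoring rule (summing label values over matching positions) are all chosen for illustration; PostgreSQL's actual ranking functions use more elaborate formulas. The point is that the same stored labels can be scored differently by passing other values at search time.

```python
from __future__ import annotations

# Assumed label values, ordered D, C, B, A; illustrative only.
WEIGHT_VALUES = {"D": 0.1, "C": 0.2, "B": 0.4, "A": 1.0}


def parse_weighted(text: str) -> dict[str, list[tuple[int, str]]]:
    # Hypothetical parser: each position may end in a label; D is the default.
    entries: dict[str, list[tuple[int, str]]] = {}
    for token in text.split():
        word, _, spec = token.partition(":")
        entry = entries.setdefault(word, [])
        for item in filter(None, spec.split(",")):
            label = item[-1].upper() if item[-1] in "ABCDabcd" else "D"
            entry.append((int(item.rstrip("ABCDabcd")), label))
    return entries


def weighted_score(entries, query_words, weights=WEIGHT_VALUES) -> float:
    # Toy score: sum the value of the label at every matching position.
    return sum(weights[label] for word in query_words
               for _, label in entries.get(word, []))


if __name__ == "__main__":
    doc = parse_weighted("fat:1A cat:2A fat:7 rat:8B")
    print(doc["fat"])                                # [(1, 'A'), (7, 'D')]
    print(round(weighted_score(doc, ["fat"]), 2))    # 1.1
    flat = {label: 1.0 for label in "ABCD"}          # other values at search time
    print(weighted_score(doc, ["fat"], flat))        # 2.0
```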
The concatenation operator, tsvector || tsvector, can construct a document from several parts, which may even come from different tables. The order of the operands matters if the tsvectors contain positional information, because the positions in the right-hand operand are shifted by the largest position in the left-hand operand:
```
SELECT 'fat:1 cat:2'::tsvector || 'fat:1 rat:2'::tsvector;
         ?column?
---------------------------
 'cat':2 'fat':1,3 'rat':4

SELECT 'fat:1 rat:2'::tsvector || 'fat:1 cat:2'::tsvector;
         ?column?
---------------------------
 'cat':4 'fat':1,3 'rat':2
```
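The position shift accounts for both results above. This Python sketch (hypothetical names, positions only, no weights) offsets every position in the right-hand operand by the largest position in the left-hand operand before merging, then re-sorts lexemes the way the outputs show.

```python
from __future__ import annotations


def sort_key(lexeme: str) -> tuple[int, str]:
    return (len(lexeme), lexeme)


def concat(left: dict[str, list[int]], right: dict[str, list[int]]) -> dict[str, list[int]]:
    # Shift the right operand past the last position used by the left operand.
    offset = max((p for ps in left.values() for p in ps), default=0)
    merged = {w: list(ps) for w, ps in left.items()}
    for w, ps in right.items():
        merged.setdefault(w, []).extend(p + offset for p in ps)
    return {w: sorted(merged[w]) for w in sorted(merged, key=sort_key)}


def render(vector: dict[str, list[int]]) -> str:
    return " ".join(f"'{w}':" + ",".join(map(str, ps)) for w, ps in vector.items())


if __name__ == "__main__":
    print(render(concat({"fat": [1], "cat": [2]}, {"fat": [1], "rat": [2]})))
    # 'cat':2 'fat':1,3 'rat':4
    print(render(concat({"fat": [1], "rat": [2]}, {"fat": [1], "cat": [2]})))
    # 'cat':4 'fat':1,3 'rat':2
```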
tsquery is a data type for textual queries. It supports the Boolean operators & (AND), | (OR), and ! (NOT), as well as parentheses for grouping. A tsquery consists of lexemes, each optionally labeled with weight letters, combined by Boolean operators:
```
SELECT 'fat & cat'::tsquery;
    tsquery
---------------
 'fat' & 'cat'

SELECT 'fat:ab & cat'::tsquery;
     tsquery
------------------
 'fat':AB & 'cat'
```
Weight labels restrict a lexeme's match to positions carrying one of the given weights. This allows different kinds of search (for example, a title-only search) to be built on the same full text index.
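A sketch of how a weight label narrows a match, written in Python with queries represented as nested tuples rather than parsed from text. The tuple layout, `lex`, and `matches` are hypothetical, and only & and | are modeled. A document maps each lexeme to the weight labels of its positions; a lexeme with labels matches only if at least one of its positions carries one of them.

```python
from __future__ import annotations


def lex(word: str, labels: str = "") -> tuple:
    # Leaf node; labels such as "ab" mirror the 'fat:ab' syntax (case-insensitive).
    return ("lex", word, labels.upper())


def matches(query: tuple, doc: dict[str, list[str]]) -> bool:
    op = query[0]
    if op == "lex":
        _, word, labels = query
        # Without labels any occurrence matches; with labels, at least one
        # position of the lexeme must carry one of the requested weights.
        return any(not labels or label in labels for label in doc.get(word, []))
    if op == "&":
        return matches(query[1], doc) and matches(query[2], doc)
    if op == "|":
        return matches(query[1], doc) or matches(query[2], doc)
    raise ValueError(f"unknown operator {op!r}")


if __name__ == "__main__":
    query = ("&", lex("fat", "ab"), lex("cat"))  # models 'fat:ab & cat'
    title_hit = {"fat": ["A"], "cat": ["D"]}      # 'fat' occurs with weight A
    body_only = {"fat": ["D"], "cat": ["D"]}      # every position at default D
    print(matches(query, title_hit))  # True
    print(matches(query, body_only))  # False
```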
tsquery values can be combined using the && (AND) and || (OR) operators:
```
SELECT 'a & b'::tsquery && 'c | d'::tsquery;
         ?column?
---------------------------
 'a' & 'b' & ( 'c' | 'd' )

SELECT 'a & b'::tsquery || 'c|d'::tsquery;
         ?column?
---------------------------
 'a' & 'b' | ( 'c' | 'd' )
```
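Combining two queries places both under a new root operator. The Python sketch below models && and || that way and prints the result using a parenthesization rule inferred from the two outputs above (every | node except the outermost is parenthesized). It is not PostgreSQL's own printing code, and the names are hypothetical.

```python
from __future__ import annotations


def lex(word: str) -> tuple:
    return ("lex", word)


def and_(left: tuple, right: tuple) -> tuple:
    return ("&", left, right)  # models tsquery && tsquery


def or_(left: tuple, right: tuple) -> tuple:
    return ("|", left, right)  # models tsquery || tsquery


def render(query: tuple, top: bool = True) -> str:
    if query[0] == "lex":
        return f"'{query[1]}'"
    text = f"{render(query[1], False)} {query[0]} {render(query[2], False)}"
    # Inferred rule: parenthesize OR nodes below the root.
    return f"( {text} )" if query[0] == "|" and not top else text


if __name__ == "__main__":
    ab = and_(lex("a"), lex("b"))
    cd = or_(lex("c"), lex("d"))
    print(render(and_(ab, cd)))  # 'a' & 'b' & ( 'c' | 'd' )
    print(render(or_(ab, cd)))   # 'a' & 'b' | ( 'c' | 'd' )
```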