By Gerard Salton

Presents a theory of indexing that is able to rank index terms, or subject identifiers, in decreasing order of importance. This leads to the choice of good document representations, and also accounts for the role of phrases and of thesaurus classes in the indexing process.

This study is typical of theoretical work in automatic information organization and retrieval, in that concepts are drawn from mathematics, computer science, and linguistics. A complete theory of information retrieval may well emerge from a suitable combination of these three disciplines.

C. Left-to-right thesaurus transformation. The left-to-right transformation takes low-frequency terms and transforms them into units of higher frequency by grouping a number of the low-frequency entities into classes. The term classes are then characterized by frequency properties equivalent to the sum of the frequencies of the individual components. The classical way of combining individual terms into classes is by means of a thesaurus. Such a thesaurus specifies a grouping of the vocabulary, where items included in the same class are normally considered to be related in some sense, for example by being synonymous or by exhibiting closely similar content characteristics.
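The left-to-right transformation can be sketched as a mapping from terms to thesaurus classes, where each class frequency is the sum of its members' frequencies. This is a minimal Python sketch under stated assumptions: the thesaurus is a plain dictionary from term to class label, it is assumed to list only the low-frequency terms to be grouped, and the function name `thesaurus_transform` and the class label `C1` are hypothetical. Terms not listed in the thesaurus are kept unchanged.

```python
from collections import Counter


def thesaurus_transform(term_freqs, thesaurus):
    """Group terms into thesaurus classes (hypothetical helper).

    term_freqs: dict mapping term -> frequency (assumed occurrence counts).
    thesaurus:  dict mapping term -> class label; assumed to contain only
                the low-frequency terms that should be grouped.
    Returns a dict mapping each class label (or ungrouped term) to the sum
    of the frequencies of its components.
    """
    class_freqs = Counter()
    for term, freq in term_freqs.items():
        # A term listed in the thesaurus is replaced by its class label;
        # any other term stands for itself.
        unit = thesaurus.get(term, term)
        class_freqs[unit] += freq
    return dict(class_freqs)


freqs = {"aircraft": 1, "airplane": 2, "plane": 1, "wing": 5}
classes = {"aircraft": "C1", "airplane": "C1", "plane": "C1"}
print(thesaurus_transform(freqs, classes))  # → {'C1': 4, 'wing': 5}
```

Three terms that each occur only once or twice become a single class of frequency 4. Summing is exact when the frequencies are total occurrence counts; if document frequencies were summed instead, documents containing more than one member of a class would be counted more than once.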

Recall-precision tables are included for the three experimental collections in Table 9, averaged over the 24 user queries used with each collection.

[Table 9: Comparison of binary and term frequency weighting with and without inverse document frequency normalization. Columns: binary weights; term frequency weights; binary weights with IDF; term frequency weights with IDF. Collections: CRAN, MED, Time.]

Four weighting procedures are used to produce the output of Table 9: binary term weights $b_{ik}$, term frequency weights $f_{ik}$, and binary as well as term frequency weights multiplied by an inverse document frequency factor, designated $(\mathrm{IDF})_k$ in Table 9.
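The four weighting procedures compared in Table 9 can be sketched on a toy collection. This is a minimal Python sketch, not the book's implementation: documents are assumed to be lists of already-extracted index terms, `term_weights` is a hypothetical name, and the inverse document frequency is assumed to be $\log_2(N/d_k)$, where $N$ is the number of documents and $d_k$ the number of documents containing term $k$. That is one common form of IDF; the exact factor used for Table 9 may differ.

```python
import math
from collections import Counter


def term_weights(docs):
    """Return the four weighting schemes as lists of per-document dicts.

    docs: list of documents, each a list of index terms (assumed input).
    IDF is assumed to be log2(N / d_k); other variants exist.
    """
    n = len(docs)
    tf = [Counter(doc) for doc in docs]  # f_ik: occurrences of term k in doc i
    df = Counter()
    for counts in tf:
        df.update(counts.keys())  # d_k: number of documents containing term k
    idf = {k: math.log2(n / d_k) for k, d_k in df.items()}

    schemes = {"binary": [], "tf": [], "binary_idf": [], "tf_idf": []}
    for counts in tf:
        schemes["binary"].append({k: 1 for k in counts})  # b_ik
        schemes["tf"].append(dict(counts))  # f_ik
        schemes["binary_idf"].append({k: idf[k] for k in counts})  # b_ik * IDF_k
        schemes["tf_idf"].append(
            {k: f * idf[k] for k, f in counts.items()}  # f_ik * IDF_k
        )
    return schemes


docs = [
    ["index", "index", "retrieval"],
    ["index", "thesaurus"],
    ["index", "query"],
    ["retrieval", "query"],
]
weights = term_weights(docs)
print(weights["binary"][0])  # → {'index': 1, 'retrieval': 1}
print(weights["tf_idf"][1]["thesaurus"])  # → 2.0
```

In this toy collection "thesaurus" occurs in only one of four documents and receives the largest IDF factor, while "index", which occurs in three documents, is strongly down-weighted. Under this IDF variant a term present in every document would receive a weight of zero.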

Single terms retained; triples added. Pairs added; corresponding single terms deleted.

… are also superior to the $f_{ik} \cdot (\mathrm{IDF})_k$ combined term weighting system.