PART 1: What is a thesaurus? Concept and samples


Christine Laaboudi-Spoiden
Publications Office of the European Communities

EUR-Lex Unit, Documentary section

Cape Town, June 2006

EUR-Lex: Searching information

EUR-Lex

Direct, free access to European Union law:

- the treaties, legislation, case-law and legislative proposals
- Official Journal of the European Union
  - Official Journal L: Legislation
  - Official Journal C: Information and notices
  - Official Journal: Special editions
- European Court Reports
- Documents of the institutions
- Consolidated texts


EUR-Lex: Searching information

COMPUTER CRIME

- Title and text: computer crime (40 hits)

COMPUTER RELATED CRIME

- Title and text: computer related crime (58 hits)

CYBERCRIME

- Title and text: cybercrime (55 hits)

CYBER CRIME

- Title and text: cyber crime (48 hits)

COMPUTER CRIME, CYBERCRIME, CYBER CRIME (Boolean OR)

- Title and text: computer crime, cybercrime (129 hits)

USE OF SYNONYMS OR EQUIVALENT TERMS
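The OR search can be pictured as a set union: a document is retrieved if it matches at least one of the variants, and a document that matches several variants is counted only once, so the combined total can be no larger than the sum of the individual hit counts. A minimal Python sketch, assuming small invented result sets (the document IDs and the `or_search` helper are hypothetical, not real EUR-Lex results or functions):

```python
# Hypothetical result sets: each term maps to the IDs of the documents it matches.
# These IDs are invented for illustration; they are not real EUR-Lex documents.
results = {
    "computer crime": {"doc1", "doc2", "doc3"},
    "cybercrime": {"doc2", "doc4"},
    "cyber crime": {"doc3", "doc4", "doc5"},
}


def or_search(term_results, terms):
    """Boolean OR: a document is retrieved if it matches at least one term."""
    hits = set()
    for term in terms:
        # Set union: documents matching several terms are kept only once.
        hits |= term_results.get(term, set())
    return hits


combined = or_search(results, ["computer crime", "cybercrime", "cyber crime"])
print(len(combined))                          # 5 distinct documents
print(sum(len(r) for r in results.values()))  # 8 hits if overlaps were double-counted
```

The gap between the two printed numbers is the overlap: documents that use more than one wording for the same concept.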


EUR-Lex sample: bibliographic notice

[Figure: sample bibliographic notice, annotated with: indexing terms, preferred terms, classification scheme, subject headings]


Indexing process

Indexing = identifying the concepts represented in a document

EUROVOC descriptors: information society, computer crime, personal data, electronic mail, confidentiality

For information retrieval (information request)

Title and text: computer crime, cybercrime (129 hits)

Content indexing = a one-time process. Searching = an iterative process: start again if the results are not relevant to the question.
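The contrast between one-time indexing and repeatable searching can be sketched as an inverted index: descriptors are assigned to each document once, and every later search is just a lookup in that index. A minimal Python sketch; the document IDs and their descriptor assignments are invented for illustration (only the descriptor wording reuses the EUROVOC terms listed above), and `build_index` / `search` are hypothetical helpers:

```python
from collections import defaultdict

# Hypothetical documents with the descriptors an indexer assigned to them.
indexed_docs = {
    "doc1": ["information society", "computer crime", "personal data"],
    "doc2": ["electronic mail", "confidentiality"],
    "doc3": ["computer crime", "confidentiality"],
}


def build_index(docs):
    """Indexing, done once: map each descriptor to the documents it describes."""
    index = defaultdict(set)
    for doc_id, descriptors in docs.items():
        for descriptor in descriptors:
            index[descriptor].add(doc_id)
    return index


def search(index, descriptor):
    """Searching, repeatable: look the descriptor up in the prebuilt index."""
    # .get() does not add missing keys to the defaultdict.
    return sorted(index.get(descriptor, set()))


index = build_index(indexed_docs)
print(search(index, "computer crime"))   # ['doc1', 'doc3']
print(search(index, "confidentiality"))  # ['doc2', 'doc3']
```

A user who is unhappy with the results simply issues another lookup; the index itself is not rebuilt.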


Search Results

Relevance = the relationship between a document and a request.

- The document is relevant to the topic
- It answers the user's request

Pertinence = the relationship between a document and an information need.

- Relevant and useful to the user
- Relevant, but the user does not find it useful (language, level of comprehensibility, type of document)

Irrelevant documents retrieved = NOISE
Relevant documents not retrieved = SILENCE
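Noise and silence can be computed by comparing the set of retrieved documents with the set of documents judged relevant. The sketch below assumes those relevance judgments are already known, which in practice they rarely are. It also derives precision and recall, the standard retrieval measures built on the same counts; they are not part of the slide. Document IDs and the `noise_and_silence` helper are hypothetical:

```python
def noise_and_silence(retrieved, relevant):
    """Return (noise, silence) for one search, given a set of relevant documents."""
    noise = retrieved - relevant    # retrieved but irrelevant
    silence = relevant - retrieved  # relevant but not retrieved
    return noise, silence


# Hypothetical outcome of one search and the (assumed) relevance judgments.
retrieved = {"doc1", "doc2", "doc3", "doc4"}
relevant = {"doc2", "doc3", "doc5"}

noise, silence = noise_and_silence(retrieved, relevant)
hits = retrieved & relevant

# Precision falls as noise grows; recall falls as silence grows.
precision = len(hits) / len(retrieved)  # 2 / 4
recall = len(hits) / len(relevant)      # 2 / 3

print(sorted(noise))    # ['doc1', 'doc4']
print(sorted(silence))  # ['doc5']
print(precision)        # 0.5
```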


Causes of search failures

- No two words mean exactly the same thing
- Enormous range of choices of words and expressions
- No true synonyms, although words are often close in meaning
- Words are not clearly understood
- Inconsistent use of words
- Users are unlikely to choose all the relevant terms
- The user might choose the terms used by the indexer, but with a different understanding of their meaning


Need for a controlled vocabulary

A controlled vocabulary = a consistent set of words and expressions, with rules of usage, to be followed when indexing and searching

Nature of an indexing language

- A list of terms acceptable to users
- Mechanisms for structuring and using those terms
- Minimizing the ambiguity of isolated terms that may be out of context
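A thesaurus enforces a controlled vocabulary by pointing each non-preferred term to its preferred term (the USE / UF, "used for", convention common in thesauri), so that indexer and searcher end up with the same term. A minimal Python sketch; the mapping is an illustrative assumption built from the search variants at the start of this part, not an actual EUROVOC entry, and `USE`, `preferred_term` and `normalize_terms` are hypothetical names:

```python
# Hypothetical equivalence relations: non-preferred term -> preferred term.
# Illustrative only; not copied from EUROVOC.
USE = {
    "cybercrime": "computer crime",
    "cyber crime": "computer crime",
    "computer related crime": "computer crime",
}


def preferred_term(term):
    """Map a user's wording to the preferred term; other terms pass through."""
    key = term.strip().lower()
    return USE.get(key, key)


def normalize_terms(terms):
    """Replace each term by its preferred form, dropping duplicates, keeping order."""
    normalized = []
    for term in terms:
        preferred = preferred_term(term)
        if preferred not in normalized:
            normalized.append(preferred)
    return normalized


print(normalize_terms(["Cybercrime", "cyber crime", "computer crime"]))  # ['computer crime']
print(preferred_term("personal data"))  # personal data
```

With this mapping, the four wordings that needed a long OR query collapse into the single term used at indexing time.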
