ClustEval: clustering evaluation framework

Data Set 'coli_state' - General


Text corpora contain occurrences of the same word in different contexts or senses. Such texts can be analyzed automatically, and word sense disambiguation can be used to infer the context of each occurrence solely from the words around it. To this end, pairwise similarities between individual occurrences of a word can be calculated by masking and comparing the word neighborhoods of those occurrences. Based on these similarities, the occurrences can be clustered into potential contexts. This approach has been applied to a text corpus containing occurrences of the word "state".
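The page does not say exactly how the similarities in this dataset were computed. As a rough illustration only, the sketch below assumes that "masking" means excluding the target word itself, takes the words within a fixed window (`radius`) on each side as the neighborhood, and compares neighborhoods with Jaccard similarity. The function names, the window size, the naive whitespace tokenization and the Jaccard measure are all hypothetical choices, not the method behind `coli_state`.

```python
from itertools import combinations


def context_window(tokens, index, radius=3):
    """Return the set of words around tokens[index], masking (excluding) the target word itself."""
    left = tokens[max(0, index - radius):index]
    right = tokens[index + 1:index + 1 + radius]
    return {w.lower() for w in left + right}


def jaccard(a, b):
    """Jaccard similarity of two word sets; 0.0 if both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def occurrence_similarity_matrix(sentences, target, radius=3):
    """Collect every occurrence of `target` (naive whitespace tokenization)
    and return (contexts, symmetric pairwise similarity matrix)."""
    contexts = []
    for sentence in sentences:
        tokens = sentence.split()
        for i, tok in enumerate(tokens):
            if tok.lower() == target:
                contexts.append(context_window(tokens, i, radius))
    n = len(contexts)
    # Each occurrence is treated as fully similar to itself.
    sim = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in combinations(range(n), 2):
        sim[i][j] = sim[j][i] = jaccard(contexts[i], contexts[j])
    return contexts, sim


sentences = [
    "the state passed a new law",
    "the state passed a budget",
    "water in a liquid state at room temperature",
]
contexts, sim = occurrence_similarity_matrix(sentences, "state", radius=2)
```

With `radius=2`, the two political uses of "state" share the neighborhood {the, passed, a} and get similarity 1.0, while each of them shares only "a" with the physical-state occurrence, giving 1/6. The resulting matrix is the kind of input a clustering step would then partition.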


Information     Value
Alias           coli_state
Name            coli/state_N
Dataset format  Similarity matrix
Download        By downloading the dataset, you accept the license presented below.
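Because the dataset is distributed as a similarity matrix, any clustering method that accepts pairwise similarities can be applied to it. One of the simplest, shown here purely as an illustration, is single-linkage clustering at a fixed cutoff: two occurrences end up in the same cluster if they are connected by a chain of pairs whose similarity is at least `threshold`. The function name `threshold_clusters`, the in-memory list-of-lists representation and the example values are hypothetical; this is not necessarily how ClustEval or any of its clustering tools process this dataset, and parsing the actual file format is not covered.

```python
def threshold_clusters(sim, threshold):
    """Single-linkage clustering at a fixed cutoff: the connected components
    of the graph with an edge i--j whenever sim[i][j] >= threshold."""
    n = len(sim)
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving to keep trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge every pair of objects whose similarity reaches the cutoff.
    for i in range(n):
        for j in range(i + 1, n):
            if sim[i][j] >= threshold:
                parent[find(i)] = find(j)

    # Relabel component roots as consecutive ids 0, 1, 2, ... in order of first appearance.
    labels, ids = [], {}
    for i in range(n):
        labels.append(ids.setdefault(find(i), len(ids)))
    return labels


# Hypothetical 4x4 similarity matrix (not taken from coli_state).
sim = [
    [1.0, 0.8, 0.1, 0.0],
    [0.8, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.7],
    [0.0, 0.1, 0.7, 1.0],
]
print(threshold_clusters(sim, 0.5))  # prints [0, 0, 1, 1]
```

The cutoff controls granularity: a very low threshold merges everything into one cluster, while a threshold above every off-diagonal value leaves each occurrence in its own cluster.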
License