Methods for quantifying and categorizing semantic similarities between linguistic items based on their distributional properties in large samples of language data.

Basic idea: the **distributional hypothesis**, which states that linguistic items with similar distributions have similar meanings.

Basic approach: collect distributional information in high-dimensional vectors, and define semantic similarity in terms of vector similarity (for example, the cosine of the angle between two vectors).

Models: latent semantic analysis (LSA), Hyperspace Analogue to Language (HAL), syntax- or dependency-based models, random indexing, semantic folding, and various variants of the topic model.
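A minimal sketch of the basic approach, assuming a toy corpus, whitespace tokenization, a symmetric context window of two words, and raw co-occurrence counts as vector components. The corpus and the helper names `cooccurrence_vectors` and `cosine` are hypothetical illustrations, not taken from any of the models listed above; similarity is measured as cosine similarity between sparse count vectors.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the context words seen within `window` positions."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()  # assumption: naive whitespace tokenization
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # a word is not its own context
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    # Counter returns 0 for missing keys, so absent contexts contribute nothing.
    dot = sum(count * v[ctx] for ctx, count in u.items())
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus, for illustration only.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the cat",
    "stocks fell on the market",
]

vectors = cooccurrence_vectors(corpus, window=2)
print(round(cosine(vectors["cat"], vectors["dog"]), 3))  # → 0.923
print(cosine(vectors["cat"], vectors["stocks"]))          # → 0.0
```

Words that share contexts ("cat" and "dog" both occur near "the", "drinks", and "chases") score high, while "stocks" shares no context words with "cat" and scores 0.0. Raw counts are dominated by frequent words such as "the"; practical models typically reweight them (for example with tf-idf or pointwise mutual information) and, as in LSA, reduce dimensionality with a truncated singular value decomposition.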
