Cloudspotting: Visual Analytics for Distributional Semantics

Corpora, i.e. large collections of machine-readable texts, are one of the sources for empirical research in linguistics. They contain authentic usage by many people, which lets us sample how language is used within a community: which expressions occur, what meanings they take, and so on. However, the different meanings of a word are not normally coded in the text, and identifying them requires manual annotation, which takes a lot of time, energy and resources. This PhD investigates whether a computational technique, distributional modelling, can be applied to descriptive semantic research. The technique represents words, or individual occurrences of words, as numerical vectors derived from their behaviour in corpora, following the Distributional Hypothesis: words that occur in similar contexts have similar meanings. The full procedure results in 2D scatterplots in which each point is an occurrence of a word; points that are close to each other occur in similar contexts and thus, according to the Distributional Hypothesis, have similar meanings. The different shapes that emerge in these scatterplots are called clouds.
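To make the idea of representing occurrences by their contexts concrete, here is a minimal sketch in Python. It is not the pipeline used in the thesis: it assumes raw bag-of-words counts in a fixed window and cosine similarity, and the toy sentences, the function names `context_vector` and `cosine`, and the `window` parameter are all invented for illustration.

```python
from collections import Counter
from math import sqrt


def context_vector(tokens, position, window=3):
    """Count the words within `window` positions of the target occurrence."""
    left = tokens[max(0, position - window):position]
    right = tokens[position + 1:position + 1 + window]
    return Counter(left + right)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy corpus (invented): three occurrences of "bank".
sentences = [
    "she deposited money at the bank on monday".split(),
    "he withdrew money from the bank on friday".split(),
    "they walked along the river bank at sunset".split(),
]

# One vector per occurrence of the target word.
vectors = [context_vector(s, s.index("bank")) for s in sentences]

sim_money = cosine(vectors[0], vectors[1])  # two "financial" contexts
sim_river = cosine(vectors[0], vectors[2])  # "financial" vs "river" context
print(sim_money > sim_river)
```

In this toy example the two financial occurrences share more context words than the financial and the river occurrence do, so they would end up closer together in a plot. A realistic model would also weight the context words and reduce the resulting distances to two dimensions before plotting.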

As part of the investigation, we developed an interactive visualization tool to explore two questions in depth: (1) how different settings affect the results and (2) to what extent meaning can be modelled with this technique. The first conclusion is that each setting has a different impact on each word, and no configuration gives the best result across the board. The second conclusion is that points that are close together occur in similar contexts but may not have the same meaning, while points that are far apart because they occur in different contexts may still have the same meaning. Nevertheless, the visualization tool offers a way of exploring the results of different case studies and extracting semantic information that goes beyond the identification of different meanings.
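The sensitivity to settings can be illustrated with a small sketch, again under simplifying assumptions that are not taken from the thesis: raw context counts, cosine distance, and a single hypothetical setting, the window size. The sentences and the helper names `token_vector`, `cosine_distance` and `distance_matrix` are invented for this example.

```python
from collections import Counter
from math import sqrt


def token_vector(tokens, position, window):
    """Bag of context words within `window` positions of the target."""
    start = max(0, position - window)
    context = tokens[start:position] + tokens[position + 1:position + 1 + window]
    return Counter(context)


def cosine_distance(a, b):
    """1 - cosine similarity for two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norms = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / norms if norms else 1.0


def distance_matrix(sentences, target, window):
    """Pairwise distances between all occurrences of `target`."""
    vecs = [token_vector(s, s.index(target), window) for s in sentences]
    return [[round(cosine_distance(a, b), 3) for b in vecs] for a in vecs]


# Toy corpus (invented): three occurrences of "bank".
sentences = [
    "she deposited money at the bank on monday".split(),
    "he withdrew money from the bank on friday".split(),
    "they walked along the river bank at sunset".split(),
]

# Same occurrences, two values of one setting: the distances change.
by_window = {w: distance_matrix(sentences, "bank", w) for w in (1, 3)}
for window, matrix in by_window.items():
    print(window, matrix)
```

With a window of one word, the first two occurrences have identical contexts and the third shares nothing with them; with a window of three, all three occurrences overlap to some degree. Neither picture is obviously the better one, which is why the tool compares many configurations side by side rather than selecting one in advance.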

Watch my defense here; find the slides here.

_bookdown_files/phdThesis_files/figure-latex
assets
docs
extras
src
tokenclouds @ 1e3027b
.RDataTmp
.gitignore
.gitmodules
00-gracias.Rmd
01-introduction.Rmd
02-workflow.Rmd
03-nephovis.Rmd
04-case_studies.Rmd
05-nephological_shapes.Rmd
06-semantic_interpretation.Rmd
07-no_optimal_solution.Rmd
08-conclusion.Rmd
09-afterword.Rmd
README.md
_bookdown.yml
_output.yml
_scientific-summary_Dutch.Rmd
afterbody.tex
app_1.Rmd
index.Rmd
phdThesis.Rproj
phdThesis.aux
phdThesis.bbl
phdThesis.bcf
phdThesis.blg
phdThesis.lof
phdThesis.lot
phdThesis.out
phdThesis.run.xml
phdThesis.toc
preamble.tex
references.Rmd
titlepage.log
titlepage.tex

montesmariana/phdThesis