NGI Engineroom: Online privacy case study

Topic models help automate the exploration of a large document collection by clustering documents based on the words that occur in them. The underlying assumption is that documents on similar subjects tend to use a similar vocabulary. Topic modeling is one of the most powerful text-mining techniques for discovering the latent topics in a collection of text documents and examining the relationships among them (Jelodar et al. 2017).
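The vocabulary assumption can be illustrated with a minimal sketch. This is not a topic model and not the method used in the case study: it only compares bag-of-words vectors with cosine similarity, a simpler stand-in technique. The toy documents and the names `bag_of_words` and `cosine_similarity` are hypothetical, chosen for illustration.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercase the text, split on whitespace, and count each word."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    shared = a.keys() & b.keys()
    dot = sum(a[w] * b[w] for w in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy documents (not taken from the case study corpus).
docs = {
    "privacy_1": "online privacy data protection users consent data",
    "privacy_2": "users data privacy regulation consent online",
    "iot_1": "sensors devices network iot firmware devices",
}

vectors = {name: bag_of_words(text) for name, text in docs.items()}

# Documents on the same subject share vocabulary, so they score higher.
sim_same = cosine_similarity(vectors["privacy_1"], vectors["privacy_2"])
sim_diff = cosine_similarity(vectors["privacy_1"], vectors["iot_1"])

print(f"privacy_1 vs privacy_2: {sim_same:.3f}")
print(f"privacy_1 vs iot_1:     {sim_diff:.3f}")
```

The two privacy documents overlap heavily in vocabulary, while the IoT document shares no words with them. A topic model builds on the same signal but learns word groupings (topics) instead of comparing documents pairwise.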

Visualization explanation: the bubbles represent topics identified by LDA (latent Dirichlet allocation). The number of topics is arbitrarily set to 15. The positions of the bubbles are determined by computing the distances between topics and then projecting these inter-topic distances onto two dimensions (see: Chuang et al., 2012). The size of each bubble shows the topic's prevalence. In the right panel, the bars represent the individual terms that help interpret the topic currently selected on the left. Each pair of overlaid bars shows both the corpus-wide frequency of a given term and its topic-specific frequency (see: Chuang et al., 2012). For example, the currently selected topic (highlighted in red) represents the cluster of working papers related to online privacy.
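The distance step can be sketched as follows. The text does not say which distance measure is used, so this example assumes the Jensen-Shannon distance between topic-word distributions, a common choice for comparing topics. The topic names and probabilities are hypothetical, and the two-dimensional projection itself is left out.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i = 0 are skipped."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_distance(p, q):
    """Jensen-Shannon distance: square root of the JS divergence (range 0 to 1)."""
    # m is the midpoint distribution; m_i > 0 whenever p_i > 0 or q_i > 0.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    jsd = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return math.sqrt(max(jsd, 0.0))  # guard against tiny negative rounding errors


# Hypothetical topic-word distributions over a 4-term vocabulary
# (assumed values; each row sums to 1).
topics = {
    "privacy": [0.50, 0.30, 0.10, 0.10],
    "security": [0.40, 0.30, 0.20, 0.10],
    "iot": [0.05, 0.05, 0.30, 0.60],
}

# Pairwise inter-topic distance matrix, keyed by (topic_a, topic_b).
distances = {
    (a, b): js_distance(topics[a], topics[b])
    for a in topics
    for b in topics
}

for (a, b), d in distances.items():
    if a < b:
        print(f"{a:>8} vs {b:<8}: {d:.3f}")
```

Topics with similar word distributions end up close together, so their bubbles would sit near each other once the distance matrix is projected onto two dimensions, for instance with multidimensional scaling.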
