Description:
This interactive map represents the landscape of the literature analyzed during the OID's first research cycle. Each point corresponds to a paragraph extracted from the analyzed papers, positioned in a semantic space based on its content's statistical embedding. The color coding highlights clusters of paragraphs that discuss overarching macro-topics, further subdivided into specific, detailed topics.
Use:
The map can be used to efficiently explore the 1,664 sources cited in the Observatory's report, offering insights into the thematic distribution of topics across the analyzed literature. Hover over any point to see the title of the paper the paragraph comes from, and click to look that paper up in a Google search. Use the search bar at the top left to look for words in the source titles, and use the regional filters and the histogram to filter by region and publication date.
Method:
To create the map, we first chunked our sources into meaningful paragraphs using LangChain. We then embedded the paragraphs using the all-MiniLM-L6-v2 model from the SentenceTransformers library and used UMAP to reduce the embeddings to a 2D semantic space. Finally, we used HDBSCAN to identify topic clusters and conducted content analysis to derive detailed and macro-topic labels. The visualization itself was created with the datamapplot library.
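The steps above can be sketched roughly as follows. This is a minimal illustration, not the OID's actual code: `chunk_paragraphs` is a hypothetical plain-Python stand-in for the LangChain chunking step, and the parameter values (`min_words`, `min_cluster_size`, the cosine metric, the random seed) are assumptions that the description does not specify. Whether clustering was run on the 2D projection or on the original embeddings is also an assumption here.

```python
import re


def chunk_paragraphs(text, min_words=5):
    """Split raw text on blank lines and keep paragraphs with at least min_words words."""
    parts = re.split(r"\n\s*\n", text)
    cleaned = (" ".join(part.split()) for part in parts)
    return [p for p in cleaned if len(p.split()) >= min_words]


def embed_and_cluster(paragraphs, min_cluster_size=15, seed=42):
    """Embed paragraphs, project them to 2D and cluster them. Returns (coords, labels)."""
    # Imported lazily so chunk_paragraphs works without the heavy dependencies.
    from sentence_transformers import SentenceTransformer
    import umap
    import hdbscan

    model = SentenceTransformer("all-MiniLM-L6-v2")
    embeddings = model.encode(paragraphs)  # one 384-dimensional vector per paragraph

    # Assumption: cosine metric and a fixed seed; the source does not state these.
    reducer = umap.UMAP(n_components=2, metric="cosine", random_state=seed)
    coords = reducer.fit_transform(embeddings)

    # Assumption: clustering runs on the 2D coordinates; label -1 marks noise points.
    labels = hdbscan.HDBSCAN(min_cluster_size=min_cluster_size).fit_predict(coords)
    return coords, labels
```

Calling `embed_and_cluster(chunk_paragraphs(text))` on a sufficiently large corpus would return one 2D coordinate and one cluster label per paragraph, which could then be passed to a plotting library such as datamapplot once the clusters have been labeled by hand.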
This map represents a statistical summary of the thematic content of the report. The network graph shows relations between the words in the report, placing them closer to each other the more strongly they are related. The bigger the node, the more frequently the word appears, signalling its role in defining what the report is about. Each color groups words that are closely related to each other and can be interpreted as a topic.
The map is generated by the OID from the report's text using GarganText, a tool developed by the CNRS Institute of Complex Systems. Starting from a co-occurrence matrix built from the report's text, GarganText forms a network in which words are connected if they are likely to occur together. Clusters are identified with the Louvain community detection method, and the visualization is laid out using the ForceAtlas2 algorithm.
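The co-occurrence step can be illustrated with a small standard-library sketch. It is not GarganText's implementation: the function names are hypothetical, co-occurrence is counted per sentence, and a raw count threshold (`min_count`) stands in for whatever likelihood measure GarganText applies. Community detection and layout are left out.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences, stopwords=frozenset()):
    """Count how many sentences each unordered pair of words appears in together."""
    counts = Counter()
    for sentence in sentences:
        words = set(re.findall(r"[a-z]+", sentence.lower())) - set(stopwords)
        # Sorting makes each pair canonical, so (a, b) and (b, a) are the same key.
        for pair in combinations(sorted(words), 2):
            counts[pair] += 1
    return counts


def build_edges(counts, min_count=2):
    """Keep only pairs that co-occur in at least min_count sentences."""
    return {pair: n for pair, n in counts.items() if n >= min_count}


sentences = [
    "Disinformation spreads on social media.",
    "Social media platforms moderate disinformation.",
    "Platforms publish transparency reports.",
]
edges = build_edges(cooccurrence_counts(sentences, stopwords={"on"}))
```

In this toy input, only "disinformation", "media" and "social" co-occur in two sentences, so they form the three edges of the network. A graph library could then apply Louvain clustering and a force-directed layout to an edge list of this kind.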