A common vocabulary is vital to smooth business operation, yet codifying and maintaining an enterprise vocabulary is an arduous, manual task. We describe a process to automatically extract a domain-specific vocabulary (terms and types) from unstructured data in the enterprise, guided by term definitions in Linked Open Data (LOD). We validate our techniques by applying them to the IT (Information Technology) domain, taking 58 Gartner analyst reports and using two specific LOD sources: DBpedia and Freebase. We show initial findings that address the generalizability of these techniques for vocabulary extraction in new domains, such as the energy industry.
IBM Watson Research Center