Canonicalizing Open Knowledge Bases

Abstract : Open information extraction approaches have led to the creation of large knowledge bases from the Web. The problem with such methods is that their entities and relations are not canonicalized, leading to redundant and ambiguous facts. For example, they may store ⟨Barack Obama, was born in, Honolulu⟩ and ⟨Obama, place of birth, Honolulu⟩. In this paper, we present an approach based on machine learning methods that can canonicalize such Open IE triples by clustering synonymous names and phrases. We also provide a detailed discussion of the different signals, features and design choices that influence the quality of synonym resolution for noun phrases in Open IE KBs, thus shedding light on the middle ground between "open" and "closed" information extraction systems.
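
To make the idea of clustering synonymous names concrete, here is a minimal Python sketch. It is not the method of the paper: it swaps in a plain token-overlap (Jaccard) similarity and single-link clustering in place of the learned model, and it canonicalizes only the subject noun phrases, leaving relation phrases untouched. The function names, the 0.5 threshold and the rule for picking a representative are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def token_jaccard(a, b):
    """Jaccard overlap between the lowercase token sets of two phrases."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def cluster_phrases(phrases, threshold=0.5):
    """Single-link clustering with union-find.

    Two phrases end up in the same cluster if they are connected by a chain
    of pairs whose similarity is at least `threshold` (assumed value).
    Returns a dict mapping each phrase to its cluster root.
    """
    parent = {p: p for p in phrases}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for a, b in combinations(sorted(parent), 2):
        if token_jaccard(a, b) >= threshold:
            parent[find(a)] = find(b)
    return {p: find(p) for p in parent}


def canonicalize_subjects(triples, threshold=0.5):
    """Rewrite the subject of each (subject, relation, object) triple to a
    cluster representative: the most frequent mention, with ties broken by
    length and then alphabetically (an illustrative choice)."""
    counts = Counter(s for s, _, _ in triples)
    root_of = cluster_phrases(counts, threshold)
    best = {}
    for phrase, root in root_of.items():
        key = (counts[phrase], len(phrase), phrase)
        if root not in best or key > best[root][0]:
            best[root] = (key, phrase)
    return [(best[root_of[s]][1], r, o) for s, r, o in triples]


if __name__ == "__main__":
    demo = [
        ("Barack Obama", "was born in", "Honolulu"),
        ("Obama", "place of birth", "Honolulu"),
    ]
    for triple in canonicalize_subjects(demo):
        print(triple)
```

With the two example triples from the abstract, both subjects are mapped to the same representative, while the relation phrases "was born in" and "place of birth" remain distinct. Token overlap alone would also merge unrelated names that share a token, which is one reason additional signals matter for synonym resolution.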
Document type :
Conference papers

https://hal-imt.archives-ouvertes.fr/hal-01699884
Contributor : Fabian Suchanek
Submitted on : Friday, February 2, 2018 - 6:20:18 PM
Last modification on : Thursday, October 17, 2019 - 12:36:55 PM
Long-term archiving on : Thursday, May 3, 2018 - 12:28:11 PM

File

cikm2014.pdf
Files produced by the author(s)

Citation

Luis Galárraga, Geremy Heitz, Kevin Murphy, Fabian M. Suchanek. Canonicalizing Open Knowledge Bases. CIKM, Nov 2014, Shanghai, China. ⟨10.1145/2661829.2662073⟩. ⟨hal-01699884⟩
