Canonicalizing Open Knowledge Bases

Luis Galárraga; Geremy Heitz; Kevin Murphy; Fabian M. Suchanek

doi:10.1145/2661829.2662073

Conference paper. Year: 2014


Luis Galárraga

Role: Author

Laboratoire Traitement et Communication de l'Information

Geremy Heitz

Role: Author
PersonId: 1027551

Kevin Murphy

Role: Author

Google Inc.

Fabian M. Suchanek

Role: Author
PersonId: 12540
IdHAL: fabian-suchanek
ORCID: 0000-0001-7189-2796
IdRef: 203477707

Laboratoire Traitement et Communication de l'Information

Abstract

Open information extraction approaches have led to the creation of large knowledge bases from the Web. The problem with such methods is that their entities and relations are not canonicalized, leading to redundant and ambiguous facts. For example, they may store ⟨Barack Obama, was born in, Honolulu⟩ and ⟨Obama, place of birth, Honolulu⟩. In this paper, we present an approach based on machine learning methods that can canonicalize such Open IE triples by clustering synonymous names and phrases. We also provide a detailed discussion about the different signals, features and design choices that influence the quality of synonym resolution for noun phrases in Open IE KBs, thus shedding light on the middle ground between "open" and "closed" information extraction systems.
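
As a rough illustration of what clustering synonymous names can look like, the sketch below groups the subject phrases of Open IE triples and rewrites each subject to one representative per cluster. It is a toy under stated assumptions, not the method of the paper. The word-level Jaccard similarity, the 0.5 threshold, the transitive (single-link) merging, the "longest phrase wins" rule, and the names `jaccard`, `cluster_phrases` and `canonicalize_subjects` are all hypothetical choices made for this example.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity between two phrases (assumed signal)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def cluster_phrases(phrases, threshold=0.5):
    """Map each phrase to a canonical form via single-link clustering.

    Two phrases are linked when their similarity reaches the threshold;
    links are merged transitively with a small union-find structure.
    """
    phrases = list(dict.fromkeys(phrases))  # deduplicate, keep order
    parent = {p: p for p in phrases}

    def find(p):
        # Follow parent pointers to the cluster root, compressing the path.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in combinations(phrases, 2):
        if jaccard(a, b) >= threshold:
            parent[find(b)] = find(a)

    clusters = {}
    for p in phrases:
        clusters.setdefault(find(p), []).append(p)

    # Hypothetical rule: the longest phrase represents its cluster.
    mapping = {}
    for members in clusters.values():
        canonical = max(members, key=len)
        for member in members:
            mapping[member] = canonical
    return mapping


def canonicalize_subjects(triples, threshold=0.5):
    """Rewrite the subject of each (subject, relation, object) triple."""
    mapping = cluster_phrases([s for s, _, _ in triples], threshold)
    return [(mapping[s], r, o) for s, r, o in triples]


# The two example triples from the abstract.
EXAMPLE_TRIPLES = [
    ("Barack Obama", "was born in", "Honolulu"),
    ("Obama", "place of birth", "Honolulu"),
]
```

Calling `canonicalize_subjects(EXAMPLE_TRIPLES)` merges "Obama" into "Barack Obama", because the two names share one of two distinct words. The relation phrases "was born in" and "place of birth" share no words, so a token-overlap signal alone would not merge them.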

Domains

Web; Databases [cs.DB]

Main file

cikm2014.pdf (363.23 KB)

Origin: Files produced by the author(s)

Contributor: Fabian Suchanek

https://imt.hal.science/hal-01699884

Submitted on: Friday, February 2, 2018, 18:20:18

Last modified on: Tuesday, February 28, 2023, 15:36:24

Long-term archiving on: Thursday, May 3, 2018, 12:28:11

Dates and versions

hal-01699884, version 1 (02-02-2018)

Identifiers

HAL Id: hal-01699884, version 1
DOI: 10.1145/2661829.2662073

Cite

Luis Galárraga, Geremy Heitz, Kevin Murphy, Fabian M. Suchanek. Canonicalizing Open Knowledge Bases. CIKM, Nov 2014, Shanghai, China. ⟨10.1145/2661829.2662073⟩. ⟨hal-01699884⟩

Collections

INSTITUT-TELECOM PARISTECH LTCI INFRES DIG

89 views

348 downloads
