Learning How to Correct a Knowledge Base from the Edit History

The curation of a knowledge base is a crucial but costly task. In this work, we propose to take advantage of the edit history of the knowledge base in order to learn how to correct constraint violations. Our method is based on rule mining, and uses the edits that solved some violations in the past to infer how to solve similar violations in the present. The experimental evaluation of our method on Wikidata shows significant improvements over baselines.

birth date". If the rule is taken as an ontological rule, then it would just infer that Spinoza has some birth date. If the rule is taken as a constraint, in contrast, the KB would be considered incorrect. Constraints are thus similar in spirit to database integrity constraints. In practice, constraints often have exceptions. Therefore, it is useful to allow data that does not respect them (in Wikidata, e.g., constraint violations are simply flagged). Nonetheless, by design, most of the constraint violations are not exceptions but actual errors and proposing to repair them is a good starting point when it comes to improving KB quality.
In this paper, we aim at learning how to repair constraint violations. Our goal is to help a KB editor by suggesting how to clean the data locally (providing a solution to a particular constraint violation) or globally (providing rules that can be automatically applied to all constraint violations of a given form once validated by the editor). To do that, we take advantage of the edit history of the KB. We use it to mine correction rules that express how different kinds of constraint violations are usually solved. To the best of our knowledge, this is the first work that builds on past user corrections in order to infer possible new ones. We validate our framework experimentally on Wikidata, for which the whole edit history of more than 700 million edits is available. Our experiments show substantial improvements over baselines. More concretely, our contributions are as follows:
• a formal definition of the problem of correction rule mining,
• a dataset of more than 67M past corrections for ten different kinds of Wikidata constraints (13k constraints in total), 1
• a correction rule mining algorithm, together with an implementation for Wikidata, CorHist, 2
• a suggestion tool for users to correct data based on our mined correction rules, 3
• an experimental evaluation based both on the prediction of the corrections in the history and on user validation of the suggested local corrections.
facts are not necessarily false. They thus classically have only "correctness" constraints, such as disjointness or functionality axioms (corresponding to special cases of denial constraints and equality generating dependencies in databases).
To express completeness constraints as well, several works propose to use description logics, with varying semantics [28,37]. Another possibility is to use queries that should or should not hold as constraints (see, e.g., [24] for methods for writing constraint queries in SPARQL). Other approaches define constraint languages to specify validation conditions for RDF graphs [9], such as SHACL [22] or ShEx [8]. It has been argued in [30] that description logics under the closed-world assumption are also suitable for constraint checking in RDF, which can then be implemented with SPARQL queries. In our work, we follow a similar path, using description logic axioms as constraints for RDFS KBs, because this corresponds best to what we observe in current real-world KBs.
Contrary to the above works, we do not aim at expressing constraints, but at repairing their violations. The correction rules we learn for this purpose are similar in spirit to active integrity constraints [11], which specify for each constraint a set of possible repair actions. This type of constraint has recently been applied to description logic KBs as well [32]. Conditioned active integrity constraints add conditions for choosing among the possible actions, and we propose, in a similar spirit, to take into account the context of the constraint violation to correct it. Unlike the existing work [11,32], our goal is to mine correction rules automatically from the edit history of the KB.
Knowledge base cleaning. Several recent approaches have dealt with the interactive cleaning of KBs. The proposed methods detect when a constraint is violated, compute the responsible facts, and then interact with the user to find out how to update the KB. The goal is then to minimize the number of questions the user has to answer. This is done in various ways, which include taking into account the dependencies among the facts to check or the interaction between several constraint violations to define heuristics to choose the best question to ask the user [2,3,5,6].
Other approaches to improve the quality of a KB rely on statistics, clustering, or structural aspects of the KBs. The work of [31] uses statistics to add missing types to the KB, and to detect wrong statements. The work of [25] exploits the observation that cycles in the KB often contain wrong "IsA" relations. Yet other approaches [1] use crowdsourcing to detect Linked Data quality issues. We refer the reader to Section 7.2 of [1] for a recent overview of approaches for data quality assessment.
Our method also exploits KB constraints. However, it differs from the above in that it learns the corrections automatically from the edit history. It thus taps a source of knowledge that has so far not been exploited.
Rule learning. Mining logical rules by finding correlations in a dataset is a well-established research topic. In particular, learning patterns in the data can be used for completing KBs [14,36]. An algorithm for learning conjunctive patterns from a KB enriched with a set of rules is described in [20]. Methods similar to association rule mining have also been used for induction of new ontological rules from a KB [33]. A more recent trend is to use embedding-based models for KB completion. A comparison between these models and usual rule learning approaches is reported in [27] and significant recent works in this area include [19,39,40].
In this paper, we use a vanilla rule mining algorithm inspired by [14]. Our contribution is not the rule mining per se, but the application of rule mining to the edit history of a KB in order to mine correction rules. This avenue has, to the best of our knowledge, never been investigated.

PRELIMINARIES
In this work, we use description logics (DL) [4] as KB language and as constraint language, because they are the foundation of the Semantic Web standard OWL [16].
Syntax. We assume a set N_C of concept names (unary predicates, also called classes), a set N_R of role names (binary predicates, also called properties), and a set N_I of individuals (also called constants). An ABox (dataset) is a set of concept or role assertions of the form A(a) or R(a, b), with A ∈ N_C, R ∈ N_R, and a, b ∈ N_I. A TBox (ontology) is a set of axioms whose form depends on the DL L in question, and expresses relationships between concepts and roles (e.g., concept or role hierarchies, role domains and ranges). A knowledge base (KB) K = T ∪ A is the union of an ABox A and a TBox T.
In this work, we assume that T is a flat QL TBox [23], i.e., that L differs from the standard RDF Schema (RDFS) [17] only by allowing inverse roles in role inclusions. More precisely, T can contain concept inclusions of the form A_1 ⊑ A_2 (subclass), ∃P ⊑ A (domain or range), and role inclusions of the form P_1 ⊑ P_2, where each P_i is a role name R or an inverse role R⁻. A KB can also be written as a set of RDF triples ⟨s, p, o⟩, where s is the subject, p the property, and o the object, using special properties to translate concept membership and relationships between concepts and roles [29]. A concept assertion A(a) is written as ⟨a, rdf:type, A⟩, and a role assertion R(a, b) as ⟨a, R, b⟩. Flat QL TBox axioms can also be represented by single triples. For example, A_1 ⊑ A_2 is written as ⟨A_1, rdfs:subClassOf, A_2⟩, and ∃R⁻ ⊑ A is written as ⟨R, rdfs:range, A⟩.
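The triple encoding described above is straightforward to realize in code. A minimal sketch (our own helper names, not from the paper's implementation):

```python
# Sketch: encoding DL assertions and flat QL TBox axioms as RDF-style
# triples, following the conventions described in the text.
def concept_assertion(a, A):
    """A(a) becomes <a, rdf:type, A>."""
    return (a, "rdf:type", A)

def role_assertion(a, R, b):
    """R(a, b) becomes <a, R, b>."""
    return (a, R, b)

def subclass_axiom(A1, A2):
    """A1 ⊑ A2 becomes <A1, rdfs:subClassOf, A2>."""
    return (A1, "rdfs:subClassOf", A2)

def range_axiom(R, A):
    """∃R⁻ ⊑ A becomes <R, rdfs:range, A>."""
    return (R, "rdfs:range", A)

# A tiny KB in the spirit of the running example.
kb = {
    concept_assertion("Spinoza", "Human"),
    role_assertion("Spinoza", "hasMother", "Marques"),
    subclass_axiom("Human", "Person"),
    range_axiom("hasMother", "Person"),
}
```

This representation treats a KB simply as a set of 3-tuples, which is the view the later sketches build on.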
Semantics. We recall the standard semantics of DL KBs. An interpretation has the form I = (Δ^I, ·^I), where Δ^I is a non-empty set and ·^I is a function that injectively maps each a ∈ N_I to a^I ∈ Δ^I (unique name assumption), ⊤ to Δ^I, each A ∈ N_C to A^I ⊆ Δ^I, and each R ∈ N_R to R^I ⊆ Δ^I × Δ^I. The function ·^I is straightforwardly extended to general concepts and roles, e.g., (R⁻)^I = {(b, a) | (a, b) ∈ R^I}.
An interpretation I satisfies an inclusion G ⊑ H , if G I ⊆ H I ; it satisfies an axiom (func P ) if P I is functional; it satisfies an axiom (trans P ) if P I is transitive; and it satisfies A(a) (resp. R(a, b)), if a I ∈ A I (resp. (a I , b I ) ∈ R I ). We write I |= α if I satisfies the DL axiom α.
An interpretation I is a model of K = T ∪ A if I satisfies all axioms in K . A KB is consistent if it has a model. A KB K entails a DL axiom α if I |= α for every model I of K .
Queries. A conjunctive query (CQ) q(⃗x) takes the form ∃⃗y ψ(⃗x, ⃗y), where ψ is a conjunction of atoms of the form A(t) or R(t, t′) or of equalities t = t′, where t, t′ are individual names or variables from ⃗x ∪ ⃗y. If ⃗x is empty, q is a Boolean CQ (BCQ). A BCQ q is satisfied by an interpretation I, written I |= q, if there is a homomorphism π mapping the variables and individual names of q into Δ^I such that: π(a) = a^I for every a ∈ N_I, π(t) ∈ A^I for every concept atom A(t) in ψ, (π(t), π(t′)) ∈ R^I for every role atom R(t, t′) in ψ, and π(t) = π(t′) for every t = t′ in ψ. We also consider as BCQs the queries true and false, which are respectively always and never satisfied by an interpretation. A BCQ q is entailed from K, written K |= q, iff q is satisfied by every model of K. A tuple of constants ⃗a is a (certain) answer to a CQ q(⃗x) if K |= q(⃗a), where q(⃗a) is the BCQ obtained by replacing the variables from ⃗x by the constants ⃗a.
We denote by answers(q(⃗ x ), K ) the set of answers of q(⃗ x ) over K .
A union of CQs (UCQ) is a disjunction of CQs and has as answers the union of the answers of the CQs it contains.
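Over the triple encoding, BCQ satisfaction by homomorphism can be implemented naively by backtracking over triple patterns. A sketch under our own conventions (triples and atoms are plain 3-tuples, variables carry a '?' prefix); it illustrates the definition, not the paper's query engine:

```python
def is_var(t):
    """Query variables are marked with a leading '?'."""
    return isinstance(t, str) and t.startswith("?")

def satisfies_bcq(triples, atoms):
    """Naive BCQ satisfaction test: search for a homomorphism mapping the
    ?-variables of the triple patterns `atoms` into the data `triples`."""
    def match(remaining, binding):
        if not remaining:
            return True  # all atoms mapped: homomorphism found
        pattern, rest = remaining[0], remaining[1:]
        for triple in triples:
            b = dict(binding)
            ok = True
            for pat, val in zip(pattern, triple):
                if is_var(pat):
                    if b.get(pat, val) != val:  # conflicting binding
                        ok = False
                        break
                    b[pat] = val
                elif pat != val:  # constant mismatch
                    ok = False
                    break
            if ok and match(rest, b):
                return True
        return False
    return match(list(atoms), {})
```

For example, the BCQ ∃x,y. hasMother(x, y) ∧ Human(y) is encoded as `[("?x", "hasMother", "?y"), ("?y", "rdf:type", "Human")]`.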
Canonical model. It is well-known that a flat QL KB K has a canonical model I_K such that for every BCQ q, K |= q iff I_K |= q. The domain of I_K is the set of individual names that occur in K, and I_K makes true exactly the entailed assertions, i.e., A^{I_K} = {a | K |= A(a)} and R^{I_K} = {(a, b) | K |= R(a, b)}.
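Since flat QL has no existentials on the right-hand side of inclusions, the canonical model can be computed by forward-chaining the ABox with the TBox axioms until a fixpoint is reached. A sketch over the triple encoding, handling rdfs:subClassOf, rdfs:domain and rdfs:range only (an assumption for brevity; role inclusions would be handled analogously):

```python
def saturate(abox, tbox):
    """Forward-chain a set of assertion triples with flat QL axioms until
    fixpoint. A sketch of canonical-model construction, not the paper's
    implementation."""
    facts = set(abox)
    changed = True
    while changed:
        changed = False
        new = set()
        for (s, p, o) in facts:
            for (x, ax, y) in tbox:
                if ax == "rdfs:subClassOf" and p == "rdf:type" and o == x:
                    new.add((s, "rdf:type", y))   # A1 ⊑ A2
                elif ax == "rdfs:domain" and p == x:
                    new.add((s, "rdf:type", y))   # ∃R ⊑ A
                elif ax == "rdfs:range" and p == x:
                    new.add((o, "rdf:type", y))   # ∃R⁻ ⊑ A
        if not new <= facts:
            facts |= new
            changed = True
    return facts
```

BCQ entailment over the KB then reduces to BCQ satisfaction over the saturated set of facts.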

CONSTRAINTS
This section defines the constraints that can be imposed on a KB, and relates the problem of checking that a KB complies with these constraints to CQ answering over this KB.
Defining constraints. In this work, we consider two types of constraints: consistency constraints (which express that some statements are contradictory) and completeness constraints (which impose that certain statements should hold in the KB as soon as some others do). While violations of consistency constraints can only be solved by removing statements, those of completeness constraints can also be solved by adding statements.

Definition 1 (Constraint): Constraints are built from complex concepts and roles defined by the following grammar rules:

B ::= ⊤ | A | B_1 ⊔ B_2 | {a_1, . . . , a_n} | ∃P · B        P ::= R | R⁻

where R ∈ N_R, A ∈ N_C, a_1, . . . , a_n ∈ N_I. A consistency constraint is a concept inclusion of the form B_1 ⊑ ¬B_2 or of the form B ⊑ {a_1, . . . , a_n}, a role inclusion of the form P_1 ⊑ ¬P_2, or a functionality axiom of the form (func P).
A completeness constraint is a concept inclusion of the form B 1 ⊑ B 2 , a role inclusion of the form P 1 ⊑ P 2 , or a transitivity axiom of the form (trans P ).
This definition of constraints covers the "constraining" versions of the majority of the most popular DL axioms used on the Web of Data according to the ranking done by [15].
We assume without loss of generality that concepts of the form B_1 ⊔ B_2 or {a_1, . . . , a_n} with n > 1 appear only on the right side of inclusions, and not at all in negative inclusions of the form B_1 ⊑ ¬B_2. For example, we assume that ∃P · (B_1 ⊔ B_2) ⊑ C is rewritten as ∃P · B_1 ⊑ C and ∃P · B_2 ⊑ C, and that B ⊑ ¬∃P · {a_1, . . . , a_n} is rewritten as B ⊑ ¬∃P · {a_1}, . . . , B ⊑ ¬∃P · {a_n}. As usual, we abbreviate ∃P · ⊤ as ∃P.

Example 1: As a running example, we consider the following KB K = T ∪ A and set of constraints C inspired by Wikidata. Our TBox T expresses that human beings and deities are persons. Our ABox A provides information on several individuals. Our constraints C state that there are three possible genders (consistency constraint), that those who have a mother or are a mother must be persons or animals, that a mother must have gender female, and that if a has mother b, then b must have child a (completeness constraints).
We say that a KB K satisfies a constraint Γ ∈ C if I K |= Γ, where I K is the canonical model of K . Otherwise, K violates Γ.
Example 2: In our running example, the KB K satisfies Γ_1, since ∃hasMother^{I_K} = {Zeus, Spinoza}, I_K |= Person(Zeus), and I_K |= Person(Spinoza). However, it violates Γ_2 because I_K ̸|= Person(Marques) ∨ Animal(Marques) while Marques ∈ ∃hasMother⁻^{I_K}. It violates Γ_3 and Γ_4 for similar reasons. Finally, {hasGender(Zeus, masculine), Γ_0} has no model because of the unique name assumption (which enforces that the interpretation of masculine differs from those of male, female and nonbinary). Hence, I_K cannot be a model of Γ_0. Thus, K violates Γ_0. ◁

Note the semantic difference between the constraints and the axioms of the TBox: The axiom (Human ⊑ Person) in the TBox makes every human an answer to the query asking for persons. In contrast, if we had put the axiom in the set of constraints, it would have required all human beings in the KB to be explicitly marked as persons. As another example, consider the axiom (func hasBirthdate) (which says that everyone can have at most one birth date). If this axiom appears in the TBox, it renders the KB inconsistent whenever a person is given two distinct dates of birth. This has severe consequences on the reasoning capabilities, since everything is entailed from an inconsistent KB. If this axiom is in the set of constraints, in contrast, then distinct dates of birth lead only to the violation of the constraint. This gives us relevant information without having any impact on the usability of the KB.
Checking constraints. We show that our setting allows us to check constraint satisfaction via CQ answering. For this purpose, we use a function π, which maps each constraint Γ ∈ C to a rule of the form ∃⃗y φ(⃗x, ⃗y) → ∃⃗z φ′(⃗x, ⃗z). This function is defined recursively as shown in Table 1. The left side of the rule is called the body and its right side the head.

Example 3: In our running example, we obtain the following rules: The following proposition shows that this transformation is sound and that the rule body and head can be rewritten as a CQ and a UCQ.
Proposition 1: For every constraint Γ ∈ C, π(Γ) can be rewritten as a rule b(⃗x) → h(⃗x) whose body b(⃗x) is a CQ and whose head h(⃗x) is a UCQ, such that K satisfies Γ iff answers(b(⃗x), K) ⊆ answers(h(⃗x), K).

Proof. By our assumptions on the form of the concepts that occur in the left side of the inclusions or in the right side of a negative inclusion, it is easy to show by structural induction that for every concept B (resp. role P), π(B, x) (resp. π(P, x, y)) can be written as a UCQ q(x) (resp. q(x, y)) such that answers(q(x), K) = B^{I_K} (resp. answers(q(x, y), K) = P^{I_K}). If Γ is a completeness constraint of the form B_1 ⊑ B_2 (resp. P_1 ⊑ P_2), or a consistency constraint of the form B ⊑ {a_1, . . . , a_n}, the result follows immediately since K satisfies Γ iff I_K |= Γ. If Γ is a consistency constraint of the form B_1 ⊑ ¬B_2 (resp. P_1 ⊑ ¬P_2), the set answers(b(⃗x), K) is empty iff Γ is satisfied; since in this case h(⃗x) := false and answers(h(⃗x), K) = ∅, the desired relation holds. If Γ is of the form (func P), the answers of h(x, y, z) := y = z over K are all possible tuples of the form (a, b, b). For (trans P), h(x, y, z) := π(P, x, z), and it is easy to see that P^{I_K} is transitive iff the desired inclusion of answers holds.

This notion allows us to define constraint violations. In this definition, the requirement that K violates Γ(⃗a) may seem superfluous. Yet, if Γ is a completeness constraint, it may be the case that some V ⊆ K violates Γ(⃗a), while K satisfies it.

Example 4: In our running example, it is easy to see that the subset V_0 = {hasGender(Zeus, masculine)} is a violation of Γ_0. Consider now V = {hasMother(Spinoza, Marques)}. V is a violation of Γ_2, Γ_3 and Γ_4. Indeed, it violates Γ_2(Marques), Γ_3(Marques), and Γ_4(Spinoza, Marques), and K does not satisfy any of these constraint instances.

The next proposition relates constraint violations and justifications. A justification (also known as an explanation, axiom pinpointing, or MinA) for the entailment of a BCQ is a minimal subset of the KB that entails the BCQ [21,34].
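Proposition 1 reduces constraint checking to query answering: the violated instances of a constraint are the answers to its body that are not answers to its head. A minimal sketch for the one-of gender constraint of the running example (hard-coded for one constraint shape; not a general checker):

```python
def violated_instances(triples, allowed):
    """Check a one-of constraint Γ0: ∃y hasGender(y, x) → x ∈ allowed.
    The violated instances Γ0(a) are the answers to the body query
    minus the answers to the head query. A sketch for this single
    constraint, not a general implementation of the paper's π function."""
    body_answers = {o for (s, p, o) in triples if p == "hasGender"}
    return body_answers - set(allowed)

# The running example: Zeus' gender uses a non-allowed value.
kb = {("Zeus", "hasGender", "masculine"),
      ("Marques", "hasGender", "female")}
```

Here `violated_instances(kb, ["male", "female", "nonbinary"])` returns the instantiations of the constraint that are violated, i.e., {"masculine"}.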

CORRECTIONS
We now turn to correcting constraint violations.
Solutions. We will make use of atomic modifications of the KB to define solutions to constraint violations.

Definition 3 (Atomic modification):
An atomic modification of a KB K is a pair m = (M+, M−) of two sets of assertions or L-axioms that takes one of the following forms: an addition ({⟨s, p, o⟩}, ∅), a deletion (∅, {⟨s′, p′, o′⟩}), or a replacement ({⟨s, p, o⟩}, {⟨s′, p′, o′⟩}), where ⟨s′, p′, o′⟩ ∈ K, ⟨s, p, o⟩ ∉ K, and, in the case of a replacement, ⟨s, p, o⟩ differs from ⟨s′, p′, o′⟩ in exactly one component.
Thus, an atomic modification consists of two sets M+ and M−, each of which is either the empty set or a singleton set. M+ will be added to the KB, and M− will be removed from the KB. Since the sets contain at most one triple, we slightly abuse the notation and identify the singletons with their elements (e.g., we will denote the addition of ⟨s, p, o⟩ simply by (⟨s, p, o⟩, ∅)). A replacement is equivalent to a sequence of a deletion and an addition. We chose to keep it as an atomic modification because it corresponds to common knowledge base curation tasks, such as correcting an erroneous object for a given subject and predicate, or fixing a predicate misuse. Atomic modifications can be used to solve a constraint violation. Note that every constraint violation has at least one solution, which consists of the deletion of any of its elements. Solutions may also be additions or replacements, as in the following example:

Example 5: In our running example, the deletion (∅, hasGender(Zeus, masculine)) and the replacement (hasGender(Zeus, male), hasGender(Zeus, masculine)) are two possible solutions to V_0 for Γ_0(masculine).
The deletion (∅, hasMother(Spinoza, Marques)) is a solution to V for the three constraint instances Γ_2(Marques), Γ_3(Marques) and Γ_4(Spinoza, Marques). The additions (Human(Marques), ∅), (hasGender(Marques, female), ∅) and (hasChild(Marques, Spinoza), ∅) are solutions to V for Γ_2(Marques), Γ_3(Marques) and Γ_4(Spinoza, Marques), respectively. ◁

Good solutions. Our goal is to find "good" solutions to constraint violations, i.e., solutions that make the KB as close to the real world as possible. The basic requirement for a "good" solution is that it deletes only erroneous facts, and that it adds only true facts. We also prefer replacements over deletions as long as they fulfill this condition. For instance, in our running example, the replacement (hasGender(Zeus, male), hasGender(Zeus, masculine)) is better than the deletion (∅, hasGender(Zeus, masculine)), because it corrects erroneous information instead of simply erasing it.
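The atomic modifications of Definition 3 are easy to represent concretely. A sketch using our own encoding (not the paper's implementation), applied to the replacement from the running example:

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

Triple = Tuple[str, str, str]

@dataclass(frozen=True)
class AtomicModification:
    """An addition (M+ nonempty, M- empty), a deletion (M+ empty,
    M- nonempty), or a replacement (both nonempty); each set contains
    at most one triple. A sketch of Definition 3."""
    added: FrozenSet[Triple]    # M+
    removed: FrozenSet[Triple]  # M-

    def apply(self, kb):
        """Return the KB after removing M- and adding M+."""
        return (set(kb) - self.removed) | self.added

# Replacement solving V0 for Γ0(masculine): masculine -> male.
fix = AtomicModification(
    added=frozenset({("Zeus", "hasGender", "male")}),
    removed=frozenset({("Zeus", "hasGender", "masculine")}),
)
```

A deletion is simply the case where `added` is empty, and an addition the case where `removed` is empty.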
In some cases, there may be no "good" solution that consists of a single atomic modification. Consider for example a completeness constraint of the form A ⊑ ∃P · B violated by {A(a)}. If A(a) is true, we should actually add both P (a, b) and B(b) for some b. We choose to define solutions as atomic modifications nevertheless to simplify the problem by reducing the size of possible solutions. This is a limitation of our approach since we will not be able to learn solutions that are not atomic. However, we will still be able to learn to add B(b) to solve the aforementioned constraint violation in the case where P (a, b) is already present.
The main difficulty in finding good solutions to constraint violations is that we do not have access to an oracle that knows the validity of all facts. This is the problem that all KB cleaning approaches face (cf. Section 2). Our idea is to exploit the history of the KB modifications to learn how to correct constraint violations.

Definition 5 (Edit history): The edit history of a KB is a sequence of KB versions (K_i)_{0≤i≤p} such that, for every 0 ≤ i < p, the change from K_i to K_{i+1} is an atomic modification.

The edit history allows us to pinpoint how constraint violations have been corrected in the past. In order to avoid learning from vandalism or mistakes, we consider only those corrections that have not been reversed. Intuitively, (B, D) corresponds to the sequence of additions and deletions that leads from K_i to the current state of the KB K_p, that contains the solution, and that does not "undo" it.
Relevant past corrections. During the history of a KB, users can change not just the assertions of the KB, but also the TBox. However, the TBox is typically much smaller and more stable than the ABox. Therefore, the edit history of the TBox is not a rich ground for correction rule mining. Moreover, we are interested in learning solutions that correct constraint violations in the current KB K p . We thus consider only those past corrections that would have been corrections also under the current TBox. For example, assume that the TBox contained C ⊑ B. Assume that C (a) was added to correct a violation of the constraint A ⊑ B. If, in the meantime, the inclusion C ⊑ B has been removed, we do not want to learn from this past correction. The following definition formalizes these requirements.
We will now see how we can use the relevant past corrections to mine correction rules.

FROM HISTORY TO CORRECTION RULES
In this section, we propose an approach based on rule mining to learn correction rules for building solutions to constraint violations.

Extraction of the Relevant Past Corrections
Algorithm 1 constructs the set of relevant past corrections from the KB history. It consists of three main steps. First, it constructs patterns to spot KB modifications that could be part of a relevant past correction. Then it uses these patterns to extract atomic modifications that solved some violation in the past. Finally, the relevant past corrections are obtained by pruning those that have been reversed.

Algorithm 1 Construction of PCDataset
Input: set of constraints C, current TBox T_p, history (K_i)_{0≤i≤p}
Output: set of relevant past corrections PCDataset
// Construct correction seed patterns (for all Γ ∈ C)

Let us explain our algorithm with our running example. Consider the constraint Γ_0(x): ∃y hasGender(y, x) → x = male ∨ x = female ∨ x = nonbinary. Assume that ⟨Zeus, hasGender, masculine⟩ was added between K_1 and K_2, but then replaced by ⟨Zeus, hasGender, male⟩ between K_100 and K_101. The first goal of the algorithm is to find out that the removal of ⟨Zeus, hasGender, masculine⟩ between K_100 and K_101 (as part of the replacement) may be part of a relevant past correction. We call such an atomic modification (M+, M−) a correction seed if there is a set of assertions D such that T_p ∪ D contains a violation V of some constraint instance and (∅, M−) (resp. (M+, ∅)) is a solution to V. Looking for correction seeds instead of computing the constraint violations for all constraints on all KB versions has the advantage of significantly reducing the search space.
To find such correction seeds efficiently, the first step of the algorithm precomputes for each constraint a set of atomic modification patterns that the possible correction seeds would match. In the example there would be only one pattern: the deletion pattern (_, ⟨?, hasGender, ?⟩), where _ can be anything so that it matches both the deletion of ⟨?, hasGender, ?⟩ and its replacements. Since we only consider past corrections that involve assertions, and want them to be relevant for the current TBox, computing the correction seed patterns can be done via query rewriting of the CQs in the body b (⃗ x ) and the head h(⃗ x ) of the constraint w.r.t. T p . Indeed, if T is a flat QL TBox, any CQ q(⃗ x ) can be rewritten w.r.t. T into a UCQ q ′ (⃗ x ) such that for every ABox A, answering q(⃗ x ) over T ∪ A amounts to answering q ′ (⃗ x ) over A [23]. Each atom that occurs in the rewriting of the body of a constraint corresponds to a deletion pattern, and each atom that occurs in the rewriting of the head of a completeness constraint corresponds to an addition pattern. We collect the patterns for the constraint Γ in the set Patterns(Γ). Note that it is not possible to solve a consistency constraint with an addition, which is why such constraints have only deletion patterns.
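The pattern-generation step can be sketched as follows, assuming the rewritten body and head are given as lists of triple patterns with '?'-prefixed variables (our own encoding, standing in for the query rewriting of the paper):

```python
def correction_seed_patterns(body_atoms, head_atoms, is_consistency):
    """Derive modification patterns from a (rewritten) constraint: every
    body atom yields a deletion pattern; every head atom of a completeness
    constraint yields an addition pattern (consistency constraints cannot
    be solved by additions). Variables become wildcards '?'."""
    def generalize(atom):
        return tuple("?" if t.startswith("?") else t for t in atom)
    patterns = [("del", generalize(a)) for a in body_atoms]
    if not is_consistency:
        patterns += [("add", generalize(a)) for a in head_atoms]
    return patterns
```

For Γ_0 this yields the single deletion pattern ("del", ("?", "hasGender", "?")) from the example.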
The second step of the algorithm verifies, for each correction seed, whether it solved some constraint violation in the past -i.e., whether K i contains some violations of some constraint instances that are not in K i+1 . If so, the modification between K i and K i+1 is a solution that solved these violations in K i . In the example we would have found the violation {⟨Zeus, hasGender, masculine⟩} of Γ 0 (masculine) in K 100 , which is not in K 101 . So we would have extracted that (⟨Zeus, hasGender, male⟩, ⟨Zeus, hasGender, masculine⟩) is a solution that solved the violation {⟨Zeus, hasGender, masculine⟩} of Γ 0 (masculine) in K 100 . We store this information as a tuple in the relevant past corrections dataset (the PCDataset), as shown in Table 2. Finding the constraint instances violated in K i or K i+1 is done via CQ answering (Proposition 1), and computing their violations amounts to computing BCQ justifications (Proposition 2).
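The seed-scanning step can be sketched as a single pass over consecutive KB versions. Here `find_violations` is a hypothetical helper standing in for the CQ-based computation of violated constraint instances:

```python
def matches(pattern, triple):
    """A pattern component '?' matches anything."""
    return all(p == "?" or p == t for p, t in zip(pattern, triple))

def extract_past_corrections(history, patterns, find_violations):
    """Scan consecutive KB versions; whenever the modification between
    K_i and K_{i+1} matches a correction seed pattern and removes a
    constraint-instance violation present in K_i but not in K_{i+1},
    record a PCDataset tuple. A sketch of the second step of Algorithm 1;
    `find_violations(kb)` is assumed to return the violated instances."""
    dataset = []
    for i in range(len(history) - 1):
        before, after = history[i], history[i + 1]
        added, removed = after - before, before - after
        seed = any(matches(p, t) for _, p in patterns
                   for t in (added | removed))
        if not seed:
            continue
        solved = find_violations(before) - find_violations(after)
        for instance in solved:
            dataset.append((instance, frozenset(added), frozenset(removed), i))
    return dataset
```

On the running example, only the replacement step between K_100 and K_101 produces an entry, because the earlier addition of the bad value created a violation rather than solving one.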
The final step of the algorithm removes corrections that have been reversed. The result is thus the set of relevant past corrections.

Correction Rule Mining
Correction rules. The previous algorithm has given us a list of relevant past corrections (the PCDataset, exemplified in Table 2).
We now present our approach to mine correction rules from this dataset and the KB history.

Definition 8 (Correction rule): A correction rule is of the form r := [Γ(⃗x)] : E(⃗x, ⃗y, ⃗z) → (M+, M−), where Γ(⃗x) is a constraint that can be partially instantiated, i.e., some of its variables have been replaced by constants, E(⃗x, ⃗y, ⃗z) is a conjunction of atoms called the context, and (M+, M−) is a pair of sets of at most one triple each, built from constants and the variables ⃗x ∪ ⃗y ∪ ⃗z. A correction rule can be applied to a KB K when there exist tuples of constants ⃗a, ⃗b such that K violates Γ(⃗a) (recall that this can be decided via CQ answering by Proposition 1) and K |= ∃⃗z E(⃗a, ⃗b, ⃗z).

The result of the rule application is then the atomic modification obtained by instantiating (M+, M−) with the constants ⃗a and ⃗b.
Note that while the variables from E(⃗x, ⃗y, ⃗z) that do not appear in Γ(⃗x) or in the head of r can be existentially quantified, those that occur in the head of r have to be free: they have to be mapped to individuals occurring in the KB in order to construct the result.

Example 6: In our running example, we would like to learn the following correction rules: The context of the second rule says that if x is the mother of a human, then x must also be a human. The rule obtained by replacing Human by Animal would express how to solve a violation of Γ_2 in the context where y is an animal. ◁

Mining correction rules. We mine correction rules with Algorithm 2. This algorithm is an adaptation of the algorithm in [13,14] to our context, where we learn rules not from a KB but from the PCDataset and the KB history. We first adapt the definitions of the confidence and support from [13,14] to our case. The support of the body of a correction rule r for a constraint Γ is the number of violations of Γ stored in the PCDataset that could have been corrected by applying r. Such violations are associated with an instance Γ(⃗a) of the partially instantiated Γ(⃗x) that appears in r and with an index i such that K_i |= ∃⃗z E(⃗a, ⃗b, ⃗z) for some ⃗b. These two conditions imply that r could be applied to the KB K_i. Moreover, we need to check that the result of applying r to K_i actually gives a solution to V.
Formally, sup_body(r) = |BSup|, where BSup = {(V, Γ(⃗a), i) ∈ PCDataset | K_i |= ∃⃗z E(⃗a, ⃗b, ⃗z) for some ⃗b, and the result of the application of r to K_i is a solution to V}.

The support of the rule r measures how often the past correction is exactly the result of the application of the rule in the cases where it could be applied. Formally, sup_rule(r) = |RSup|, where RSup = {(V, Γ(⃗a), i) ∈ BSup | the past correction of V recorded in the PCDataset is the result of the application of r to K_i}. Finally, the confidence of a correction rule r is conf(r) = sup_rule(r) / sup_body(r).

Algorithm 2 Correction rule mining
Input: PCDataset, (K_i)_{0≤i≤p}, minsup, minconf, θ
Output: correction rules
// Generate basic rules

The algorithm first generates a trivial rule r_0 for each entry of the PCDataset. This rule has as context simply the deletion part of the past correction. This trivial rule is then transformed into several more general rules, which we call basic rules, each of which is obtained from r_0 by replacing some of the constants by variables. Formally, the algorithm uses all partial substitutions σ from constants to distinct fresh variables. It retains only those basic rules that meet the minimum support and confidence thresholds.
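The basic-rule step generalizes a ground correction by replacing subsets of its constants with distinct fresh variables. A sketch on a single triple (the real algorithm generalizes whole rules, so this only illustrates the substitution enumeration):

```python
from itertools import combinations

def generalizations(atom):
    """Enumerate all ways of replacing some components of a ground atom by
    distinct fresh variables, as in the basic-rule step of Algorithm 2."""
    out = []
    positions = range(len(atom))
    for k in range(len(atom) + 1):
        for subset in combinations(positions, k):
            out.append(tuple(f"?v{i}" if i in subset else t
                             for i, t in enumerate(atom)))
    return out
```

A triple thus yields 2^3 = 8 candidates, from the fully ground atom to the fully variabilized one; each candidate rule is then kept only if it meets the support and confidence thresholds.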
In the second step, the algorithm incrementally refines each rule by building up its context part E(⃗x, ⃗y, ⃗z). This works similarly to the mining algorithm of [14]: each refinement step adds one atom built from the KB concept and role names and the variables and constants that appear in the rule, plus at most one fresh variable. For this purpose, the algorithm uses the operators defined in [14]. If the resulting rule meets the minimum support threshold and improves the confidence by at least θ, it is retained. Note that Algorithm 2 outputs only rules that would have been returned by the algorithm of [13,14] if evaluated with our confidence function. It prunes more rules because the θ and minconf thresholds are also used for early pruning during the context construction. Algorithm 2 can easily be parallelized by running it independently on each constraint and/or by having multiple workers work on the same queue.
Applying correction rules. When all rules have been mined, they are sorted by decreasing confidence, breaking ties by support (as done in [26] to build classifiers from rules). This set of rules then forms a program that can be used to fix constraint violations as follows. Given a violation V of a constraint Γ in K, choose the first rule r in the program that is relevant for Γ (i.e., whose constraint part is a partial instantiation of Γ). Then check whether r can be applied to V. The correction is the result of the rule application.

Example 7: Assume we mined the rules r_1 and r_2 of the preceding example with confidence 0.9 and 0.8 respectively, and another rule r_3 := [Γ_0(x)] : {hasGender(x, y)} → (∅, hasGender(x, y)) with confidence 0.5. The correction program is (r_1, r_2, r_3). To correct a violation of Γ_0, i.e., a wrong value for the hasGender property, the program first checks whether r_1 is applicable. If so, it replaces masculine by male. Otherwise, it falls back to r_3 and removes the wrong value. To correct a violation of Γ_2, it ignores r_1, which is not related to Γ_2, and either applies r_2 if the context matches or does nothing. ◁
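The application of a confidence-sorted correction program can be sketched as follows (rules are encoded as dicts with hypothetical fields; the fallback behavior mirrors Example 7):

```python
def correct(violation, constraint, program):
    """Apply the first relevant and applicable rule of a correction
    program sorted by decreasing confidence. Each rule carries its
    (partially instantiated) constraint, an `applies` test, and an
    `apply` action; these field names are our own."""
    for rule in program:                      # sorted by decreasing confidence
        if rule["constraint"] != constraint:  # rule not relevant for Γ
            continue
        if rule["applies"](violation):
            return rule["apply"](violation)
        # otherwise fall back to the next relevant rule
    return None  # no rule applies: leave the violation to the editor
```

On the running example, a high-confidence replacement rule fires for the masculine/male case, while other bad gender values fall through to the generic deletion rule.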

EXPERIMENTS ON WIKIDATA
This section describes CorHist, which implements, for Wikidata, the framework introduced in the previous sections, and presents its experimental evaluation.

Wikidata
Wikidata is a generalist collaborative knowledge base. The project started in 2012, and as of July 2018, it has collected more than 500M statements about 50M entities. The data about each entity is stored in a versioned JSON blob, and there are more than 700M revisions. Wikidata encodes facts not in plain RDF triples but in a reified representation, in which each main ⟨s, p, o⟩ triple can be annotated with qualifiers and provenance information [38]. Wikidata knows the property instanceOf, which is similar to rdf:type. It does not have a formally defined TBox, but knows properties such as subClassOf, subPropertyOf, and inverseOf. However, only the property subClassOf is used to flag the constraint violations. Therefore, we use only this property in our TBox, which thus contains simple concept inclusions.
We consider the set C of constraints built from ten types of Wikidata property constraints (see Table 3). They are the top Wikidata property constraints that can be expressed in DL, covering the majority of the most used constraints, as well as 71% of Wikidata property constraints. The remaining constraints are mainly about string format validation with regular expressions (52% of the remaining constraints) and about qualifiers (31% of them).

Dataset Construction
We stored the RDF version [10, 18] of the Wikidata edit history in an RDF quad store. We used named graphs for the global state of Wikidata after each revision, and for the triple additions and deletions. Our dataset stores 390M annotated triples about 49M items extracted from the July 1st, 2018 full database dump.

Table 3: Wikidata property constraints. R is the property for which the constraint is given. A constraint has several lines when it uses a property whose set of values may be specified or not. ♯constr. is the total number of constraints of the given type in Wikidata. ♯triples is the sum, over all these constraints, of the numbers of triples with the property R on which they apply. ♯violations is the number of violations of the constraint in Wikidata on July 1st, 2018. ♯past cor. is the number of past corrections we extracted from the Wikidata history. t.o. indicates that we were not able to extract all past corrections because of timeouts, so that we sampled them (we then indicate the number of corrections we extracted).

[Table 3: columns Name in Wikidata · DL form · Rule form · ♯constr. · ♯triples · ♯violations · ♯past cor.; row contents not recoverable from this extraction]

We extracted the relevant past corrections as explained in Section 6.1. Wikidata revisions do not correspond exactly to atomic modifications in our sense. For example, Wikidata bots are able to change multiple unrelated facts about the same entity at the same time. Wikidata users also sometimes prefer to delete a statement and then add another one with the same property, instead of directly modifying the value, in order to clear the existing qualifiers and references. Therefore, we artificially created a replacement modification for every deletion with a neighboring addition by the same user that shares at least two components of the triple (and analogously for additions). For example, if the correction seed is the deletion of ⟨Zeus, hasGender, masculine⟩, and if this revision or a neighboring one adds ⟨Zeus, hasGender, male⟩, then we consider this a replacement. However, if the same revision added the triple ⟨Zeus, hasMother, Rhea⟩, then we would not consider this a replacement, because it does not share two components with the deleted triple.
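The pairing heuristic for replacement modifications can be sketched as follows. The function names and the greedy one-to-one matching are assumptions of this illustration; the real extraction additionally restricts candidates to additions by the same user in the same or a neighboring revision.

```python
def shares_two_components(t1, t2):
    """True if two ⟨s, p, o⟩ triples agree on at least two positions."""
    return sum(a == b for a, b in zip(t1, t2)) >= 2

def pair_replacements(deletions, additions):
    """Greedily pair each deleted triple with a candidate added triple
    sharing at least two components; unmatched deletions stay plain
    deletions (paired with None)."""
    replacements, remaining = [], list(additions)
    for deleted in deletions:
        match = next((a for a in remaining if shares_two_components(deleted, a)), None)
        if match is not None:
            remaining.remove(match)
        replacements.append((deleted, match))  # match is None for plain deletions
    return replacements
```

On the Zeus example, the deletion of ⟨Zeus, hasGender, masculine⟩ pairs with the addition of ⟨Zeus, hasGender, male⟩ but not with ⟨Zeus, hasMother, Rhea⟩, which shares only the subject.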
Since the TBox consists of simple concept inclusions and the constraint bodies contain only roles, the deletion patterns for correction seeds correspond directly to the atoms of the constraint body. In the same vein, only atoms in the head of the Type or Value type constraints need to be rewritten. To find the constraint violations solved by a correction seed, we use the fact that the correction seed gives us the constraint instance Γ(⃗a), and we look for matches of the body of this constraint instance.
To speed up the execution for the four constraint types with the highest numbers of past corrections (Type, Value type, Item requires statement, and Value requires statement), we did not extract all the past corrections but sampled them as follows: we compute only the relevant past corrections that were applied between K_i and K_{i+1}, where i is a multiple of s := max(1, N/10^6) and N is the number of triples with the property R of the constraint at hand. This sampling provides a sufficient basis for rule mining for each constraint. In practice, it affects only the most frequent 0.9% of Type, 2% of Value type, 0.5% of Item requires statement, and 3% of Value requires statement constraints.

Footnote 4: The Wikidata constraint Type can be qualified to modify its meaning. We ignore these cases, which are marginal: they concern less than 6% of the Type constraints. The same goes analogously for Value type.
Footnote 5: Inverse and Symmetric are two distinct kinds of constraints in Wikidata, but we treat them together since Symmetric is actually a special case of Inverse.
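As a minimal sketch, the sampling rule reads as follows (the function name is hypothetical, and integer division for s is an assumption of this sketch):

```python
def sampled_revision_indices(n_triples, n_revisions):
    """Indices i of the revision steps K_i -> K_{i+1} kept for a constraint
    whose property R appears in n_triples triples: only multiples of
    s := max(1, N/10^6) are considered.
    (Integer division for s is an assumption of this sketch.)"""
    s = max(1, n_triples // 10**6)
    return [i for i in range(n_revisions) if i % s == 0]
```

For properties with fewer than a million triples, s = 1 and no sampling happens; the step size grows linearly with the number of triples.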

Mining Rules
The output of our method is a set of correction rules that form a program (Section 6.2). To evaluate such a program, we apply it to each of the constraint violations stored in the PCDataset, using the associated state of the KB to evaluate the part of the context which is not the deletion part of the correction. Then we check whether the correction we compute is exactly the same as the one associated to the constraint violation in the PCDataset. The precision p of the program is the fraction of the corrections computed by the program that are the same as those that were actually applied. The recall r of the program is the fraction of the constraint violations stored in the PCDataset for which the program gives some correction. The F1 score is F1 = 2pr / (p + r).

CorHist mines rules as explained in Section 6.2. In order to decrease the computation time, we allow only one atom p(s, o) in the context in addition to Body(⃗x, ⃗y), where Body(⃗x, ⃗y) corresponds to the part of the context that matches part of the constraint body, such that s is a variable of ⃗x ∪ ⃗y and o is a fresh variable or a constant.
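A minimal sketch of this evaluation, assuming a dict-based encoding of violations and corrections (the function name and the use of None for abstention are illustrative assumptions):

```python
def evaluate_program(predictions, gold):
    """Score a correction program against the past corrections.

    predictions: violation -> proposed correction, or None when the
                 program abstains;
    gold:        violation -> correction applied in the history.
    (The dict encoding is an assumption of this sketch.)"""
    proposed = {v: c for v, c in predictions.items() if c is not None}
    hits = sum(1 for v, c in proposed.items() if gold.get(v) == c)
    p = hits / len(proposed) if proposed else 0.0      # precision
    r = len(proposed) / len(gold) if gold else 0.0     # recall = coverage
    f1 = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return p, r, f1
```

Note that, following the definition in the text, the recall counts every violation for which the program proposes some correction, correct or not; only the precision checks the correction against the history.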
Rules were mined per constraint. For each constraint, we split the set of extracted past corrections into a 70% training set, a 10% cross-validation set, and a 20% test set. The training set is used to mine the rules, the cross-validation set is used to determine the confidence threshold that maximizes the F1 score of the obtained program, and the test set is used to evaluate the final program.

Table 4 gives examples of rules mined by CorHist. Several of these rules show the crucial importance of the instantiation of the constraint and/or of the context for choosing the correction. For instance, the rule for the Single value constraint uses the fact that an entity involved in the property "member of sport team" is probably a human being, and thus that if it has several values for the functional property "sex or gender" and one of them is a value reserved for non-human organisms in Wikidata, this value is probably wrong. In the same vein, the rule for the Item requires statement constraint recognizes that an entity has a heritage designation that is specific to Sweden ("monument in Fornminnesregistret") to conclude that its country is Sweden. The rules also propose fixes for misused predicates: in Wikidata, the property "manner of death" is intended for the general circumstances of a person's death (such as "accident"), while the property "cause of death" is intended to give more precise causes (such as "traffic accident").

Table 5 presents the results of the evaluation of the mined programs against the test set. We computed both the micro and the macro average of the precision, recall, and F1 score per kind of constraint. The micro average aggregates over the whole set of relevant past corrections for the given kind of constraint, whereas the macro average computes the scores for each constraint of the given kind and then averages them. Both numbers are important: the micro average gives more weight to correction rules that fix many violations. It thus measures the overall impact of the correction rules on the dataset. However, if only a few rules had a large impact, then it would be easier to formulate these rules by hand. Our method, in contrast, can also find rules that individually solve fewer violations, but together contribute a large mass of corrections. To illustrate this, we also report the macro average: it measures the average performance across different constraints.
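The confidence-threshold selection on the cross-validation set can be sketched as follows, assuming a hypothetical `Rule` record and an `evaluate` function returning (p, r, f1):

```python
from collections import namedtuple

# Minimal sketch of the cross-validation step; `Rule` and the shape of
# `evaluate` are assumptions for illustration.
Rule = namedtuple("Rule", "confidence")

def best_confidence_threshold(rules, cv_violations, evaluate):
    """Keep only the rules whose confidence reaches a cutoff, and return
    the cutoff that maximizes the F1 score of the resulting program on
    the cross-validation set. evaluate(program, violations) -> (p, r, f1)."""
    best_t, best_f1 = 0.0, -1.0
    for t in sorted({r.confidence for r in rules}):
        program = [r for r in rules if r.confidence >= t]
        _, _, f1 = evaluate(program, cv_violations)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t
```

Only the confidences that actually occur among the mined rules need to be tried as cutoffs, since the program changes only at those values.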

Evaluation against the Test Set
We compare our approach with two baselines. The first one, called "delete", is the most basic one: it uses the fact that all Wikidata constraint bodies contain an atom of the form R(x, y) and that the TBox contains only concept inclusions, so that all constraint violations contain an assertion that matches R(x, y). The "delete" baseline simply deletes this assertion. For the completeness constraints we define an additional baseline, "add": for a Type constraint of the form ∃yR(x, y) → A(x), it tries to add the missing assertion, i.e., it applies (A(x), ∅). Value type constraints are handled in the same way. However, this baseline is not able to figure out the relevant addition correction for a constraint of the form ∃yR(x, y) → A1(x) ∨ A2(x), because there is no way to know a priori whether A1 or A2 should be added. For Item requires statement constraints of the form ∃yR(x, y) → R′(x, a), the "add" baseline applies (R′(x, a), ∅) (and similarly for Value requires statement constraints). However, it cannot find a correction if there are multiple possible values a_i.

As shown in Table 5, the precision of our approach significantly outperforms the two baselines, often by a very high margin. Regarding the recall, we manage to keep a reasonable, and sometimes even good, recall (see the best F1 scores in Table 5), except for Single and Distinct value constraints. The very low recall obtained for these two kinds of constraints is easily explained: they are mostly used on predicates that link Wikidata to other databases (91% of the Single and 95% of the Distinct value constraints), and we cannot get meaningful information about the target database to mine corrections.
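A toy sketch of the two baselines, assuming a dict-based encoding of constraints, violations, and corrections (all names here are illustrative):

```python
# Sketch of the "delete" and "add" baselines; the dict encodings are
# assumptions for illustration, not the paper's data structures.
def delete_baseline(violation):
    """Delete the assertion matching the body atom R(x, y)."""
    return {"del": violation["triple"], "add": None}

def add_baseline(constraint, violation):
    """For completeness constraints, add the missing head assertion,
    but only when the head is unambiguous (a single disjunct)."""
    heads = constraint["heads"]           # e.g. [("instanceOf", "human")]
    if len(heads) != 1:
        return None                       # A1(x) ∨ A2(x): no way to choose
    prop, value = heads[0]
    x = violation["triple"][0]            # the x of the violated R(x, y)
    return {"del": None, "add": (x, prop, value)}
```

The "delete" baseline always proposes something, which gives it full coverage but poor precision; the "add" baseline abstains whenever the head is a disjunction.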

User Evaluation
To see whether our corrections are accepted by the community, we designed a user study. We created a tool that suggests our corrections to Wikidata users for validation (available at https://tools.wmflabs.org/wikidata-game/distributed/#game=43). The user can choose a constraint type, and the tool then suggests corrections for random violations of constraints of this type (Figure 1). The violations for which corrections are suggested are provided by query.wikidata.org, which limits their number for performance reasons. For each proposed correction, the user has to choose between three options: apply the proposed correction to Wikidata, tag it as wrong, or get another correction to review. We ran the experiment for 3 months, and 47 Wikidata users participated. Table 6 presents the results. The number of reviewed corrections is highly unbalanced between the kinds of constraints, mainly because a few users evaluated a lot of suggestions and had a predilection for certain kinds of constraints. It is thus difficult to draw conclusions for the kinds of constraints for which very few corrections have been evaluated. However, we can still make some interesting observations. In particular, the proposed corrections marked as wrong give us insights into possible weaknesses of our approach.
For the constraints that got a significant number of evaluations, our approach seems to perform well for the Inverse and Symmetric, Conflict with, and Value requires statement constraints, with approval rates above 80%. The other approval rates are lower. This is partly due to biases in the data. For example, when a gender is missing, our approach proposes the value "male" by default, because of the overrepresentation of men in Wikidata. Another issue is the quality of the constraints themselves, which in Wikidata are sometimes questionable or difficult to understand (e.g., an incomplete set of possible types or values for completeness or One-of constraints).
However, even lower approval scores do not mean that our approach is useless: psychological research [12] shows that people find it much easier to choose from given options than to come up with an answer by themselves. The actual time needed to come up with an answer may vary, but if coming up with an answer takes just 3 times longer than accepting or rejecting our proposed correction, then a precision of 40% is already useful: if a free-form answer takes time t, then reviewing a suggestion takes t/3, a rejected suggestion additionally requires the free-form answer (i.e., (4/3)t in total), and the expected answer time with our tool is 40% × (1/3)t + 60% × (4/3)t < t.
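This back-of-the-envelope computation can be checked as follows; the 1/3 review-to-answer time ratio is the working hypothesis from the text, and the function name is illustrative:

```python
def expected_answer_time(precision, manual_time=1.0, review_ratio=1/3):
    """Expected time per violation with the suggestion tool, relative to
    the free-form answer time t: an accepted suggestion costs only the
    review time, a rejected one costs the review time plus the manual
    answer. (The 1/3 review ratio is the text's working hypothesis.)"""
    review = review_ratio * manual_time
    return precision * review + (1 - precision) * (review + manual_time)
```

At 40% precision the expected time is (14/15)t, already below t; at 0% precision the tool only adds review overhead.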

CONCLUSION AND FUTURE WORK
We have introduced the problem of learning how to fix constraint violations from a KB edit history, and we have presented a rule-mining method to this end. Our experimental evaluation on Wikidata shows significant improvements over the baselines. Our tool is live on Wikidata and has already allowed users to correct more than 23k constraint violations. While our evaluation focused on Wikidata, for which the whole edit history is available, we believe that our method can be applied in other settings, for example using the edits made during the partial cleaning of an automatically extracted KB. For future work, it would be interesting to evaluate the impact of parameters such as the size of the context part of the correction rules on rule quality. We also plan to extend the learning dataset with external knowledge (such as other KBs) or with information extracted from other sources (for instance, from Wikipedia). We believe that this will allow finding even more precise correction rules, thus making KBs ever more correct and more useful.