Semantic Annotation of Textual Entailment (SemAnTE)

SemAnTE 1.0

An annotation project carried out on the Recognizing Textual Entailment (RTE) datasets. The annotation scheme addresses three types of modification that license entailment patterns: restrictive, appositive and intersective. These inferential constructions were found to occur in 81.21% of the entailments in the RTE 1-4 corpora and were annotated with cross-annotator agreement of 68% on average. The corpus contains 2,805 pairs of positive entailments, 2,278 of which were annotated for inferential semantic constructions.
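The coverage figure can be checked against the pair counts given above. This is a minimal sketch; the variable names are illustrative and are not part of the corpus distribution.

```python
# Counts quoted in the SemAnTE 1.0 description.
positive_pairs = 2805      # positive entailment pairs in the corpus
annotated_pairs = 2278     # pairs annotated for inferential constructions

# Share of positive pairs that contain an annotated construction.
coverage_percent = round(annotated_pairs / positive_pairs * 100, 2)
print(coverage_percent)
```

The result agrees with the 81.21% reported for the RTE 1-4 corpora.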


SemAnTE 2.0

A theory-based corpus of textual entailment in which all pairs are annotated according to a semantic model of entailment, and every inference, or the lack of one, is explained formally.

This corpus was created using an annotation platform that integrates a typed lexicon, a stochastic parser, a theorem prover and a user interface. The platform is a sound proof system; hence, when the annotations succeed in deducing the hypothesis from the text, the underlying semantic theory accounts for the entailment. The platform is used within a methodology of Annotating-By-Proving: the text and hypothesis of a positive pair are considered well-annotated only if the annotations support an inferential chain from the text to the hypothesis. An extension of this methodology also covers negative pairs.
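The acceptance criterion of Annotating-By-Proving can be illustrated with a toy example. The sketch below is not the platform's theorem prover or lexicon. It assumes a hypothetical two-label annotation ("MOD" for a droppable modifier, "CORE" for everything else) and treats a pair as well-annotated only if the hypothesis can be reached from the text by deleting annotated modifiers.

```python
from typing import List, Tuple

# Hypothetical annotation format: (word, label), where "MOD" marks a modifier
# that the toy rule allows us to drop and "CORE" marks any other word.
Token = Tuple[str, str]

def derives(text: List[Token], hypothesis: List[str]) -> bool:
    """True if the hypothesis results from deleting only MOD tokens from the text."""
    if not text:
        return not hypothesis
    (word, label), rest = text[0], text[1:]
    # Option 1: keep this word, which must then match the next hypothesis word.
    keep = bool(hypothesis) and word == hypothesis[0] and derives(rest, hypothesis[1:])
    # Option 2: drop this word, which is allowed only for annotated modifiers.
    drop = label == "MOD" and derives(rest, hypothesis)
    return keep or drop

hypothesis = ["a", "man", "sang"]
annotated = [("a", "CORE"), ("Dutch", "MOD"), ("man", "CORE"), ("sang", "CORE")]
unannotated = [("a", "CORE"), ("Dutch", "CORE"), ("man", "CORE"), ("sang", "CORE")]

well_annotated = derives(annotated, hypothesis)      # modifier is marked: derivation found
not_yet_annotated = derives(unannotated, hypothesis) # modifier is unmarked: no derivation
```

In this toy setting, the second annotation is rejected even though the pair is a genuine entailment, which mirrors the idea that a positive pair counts as well-annotated only when the annotations themselves support the derivation.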

In the first part of the corpus, we used the RTE datasets as a source for creating 200 positive examples. In the second part, we composed 200 couplets, each consisting of a positive and a negative pair that show a minimal contrast the semantic theory can account for. In total, the corpus contains 600 annotated pairs in a positive-negative ratio of 2:1.
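The totals follow directly from the two parts. The short sketch below spells out the arithmetic; the variable names are illustrative only.

```python
# Part 1: positive pairs created from RTE material.
rte_positive = 200
# Part 2: each couplet contributes one positive and one negative pair.
couplets = 200

positive = rte_positive + couplets   # positives from both parts
negative = couplets                  # negatives come only from the couplets
total = positive + negative

print(total, positive, negative)
```

This gives 400 positive and 200 negative pairs, hence 600 pairs at a 2:1 ratio.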