Semantic Annotation of Textual Entailment (SemAnTE)
An annotation project carried out on the Recognizing Textual Entailment (RTE) datasets. The annotation scheme addresses three types of modification that license entailment patterns: restrictive, appositive and intersective. These inferential constructions were found to occur in 81.21% of the entailments in the RTE 1-4 corpora and were annotated with an average cross-annotator agreement of 68%. The corpus contains 2,805 positive entailment pairs, 2,278 of which were annotated for inferential semantic constructions.
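As a quick sanity check, the reported 81.21% appears to be the share of annotated pairs among all positive pairs. The short sketch below assumes that reading; the variable names are illustrative only.

```python
# Counts as reported in the description above.
positive_pairs = 2805   # positive entailment pairs in RTE 1-4
annotated_pairs = 2278  # pairs annotated for inferential constructions

# Assumption: the reported percentage is annotated / positive.
coverage = round(100 * annotated_pairs / positive_pairs, 2)
print(f"{coverage}% of positive pairs were annotated")
```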
A theory-based corpus of textual entailment in which all pairs are annotated according to a semantic model of entailment, and in which every inference, or the absence of one, is explained formally.
This corpus was created using an annotation platform that integrates a typed lexicon, a stochastic parser, a theorem prover and a user interface. The platform is a sound proof system; hence, when the annotations are used successfully to deduce the hypothesis from the text, this indicates that the underlying semantic theory accounts for the entailment. The platform is used within a methodology of Annotating-By-Proving: the text and hypothesis of a positive pair are considered well-annotated only if the annotations support an inferential chain. An extension of this methodology also covers negative pairs.
In the first part of the corpus we used the RTE datasets as a source for creating 200 positive examples. In the second part we composed 200 couplets of positive and negative pairs, each showing a minimal contrast that the semantic theory can account for. In total, the corpus contains 600 annotated pairs (400 positive, 200 negative), a positive-negative ratio of 2:1.
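The totals can be reproduced with a few lines of arithmetic. This is a minimal sketch that assumes each couplet consists of exactly one positive and one negative pair, which is the reading consistent with the reported total and ratio; the variable names are illustrative only.

```python
from fractions import Fraction

# Counts as reported in the description above.
rte_positive_pairs = 200  # part 1: positive pairs derived from RTE
couplets = 200            # part 2: minimal-contrast couplets

# Assumption: one positive and one negative pair per couplet.
positive = rte_positive_pairs + couplets
negative = couplets
total = positive + negative

# Fraction reduces the positive:negative ratio to lowest terms.
ratio = Fraction(positive, negative)
print(f"{total} pairs, positive:negative = {ratio.numerator}:{ratio.denominator}")
```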