Toward Automated Factchecking: Developing an Annotation Schema and Benchmark for Consistent Automated Claim Detection

Lev Konstantinovskiy, Oliver Price, Mevan Babakar, Arkaitz Zubiaga

ACM DTRAP. 2021.

In an effort to assist factcheckers, we tackle the claim detection task, one of the necessary stages prior to determining the veracity of a claim. The task consists of identifying, within a long text, the set of sentences deemed capable of being factchecked. This paper is a collaborative work between Full Fact, an independent factchecking charity, and academic partners. Leveraging the expertise of professional factcheckers, we develop an annotation schema and a benchmark for automated claim detection that are more consistent across time, topics and annotators than previous approaches. Our annotation schema has been used to crowdsource the annotation of a dataset of sentences from UK political TV shows. We introduce an approach based on universal sentence representations to perform the classification, achieving an F1 score of 0.83, a relative improvement of over 5% over the state-of-the-art methods ClaimBuster and ClaimRank. The system was deployed in production and received positive user feedback.
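The classification stage described above can be sketched as a simple pipeline: map each sentence to a fixed-size vector, then apply a binary classifier (claim vs. non-claim). This is a minimal illustration, not the paper's implementation: the paper uses pretrained universal sentence representations, whereas here a plain bag-of-words vectorizer stands in so the example is self-contained, and the sentences and labels are invented for demonstration.

```python
# Hedged sketch of a claim-detection classifier: sentence -> vector -> label.
# NOTE: the paper embeds sentences with a pretrained universal sentence
# encoder; CountVectorizer below is only a self-contained stand-in.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy training data (invented for illustration); 1 = checkable claim, 0 = not.
train_sentences = [
    "Unemployment has fallen by 20% since 2015.",
    "Crime rates doubled over the last decade.",
    "Good evening and welcome to the programme.",
    "Thank you very much for having me.",
]
labels = [1, 1, 0, 0]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(train_sentences)
clf = LogisticRegression().fit(X, labels)

def detect_claims(sentences):
    """Return the subset of sentences classified as checkable claims."""
    X_new = vectorizer.transform(sentences)
    preds = clf.predict(X_new)
    return [s for s, p in zip(sentences, preds) if p == 1]
```

In practice a richer sentence encoder and a much larger annotated dataset (such as the one crowdsourced in this paper) replace the toy components, but the overall shape of the pipeline is the same.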
BibTeX:

@article{Konstantinovskiy2021,
    author = {Konstantinovskiy, Lev and Price, Oliver and Babakar, Mevan and Zubiaga, Arkaitz},
    title = {Toward Automated Factchecking: Developing an Annotation Schema and Benchmark for Consistent Automated Claim Detection},
    year = {2021},
    issue_date = {April 2021},
    publisher = {Association for Computing Machinery},
    address = {New York, NY, USA},
    volume = {2},
    number = {2},
    issn = {2692-1626},
    url = {https://doi.org/10.1145/3412869},
    doi = {10.1145/3412869},
    journal = {Digital Threats: Research and Practice},
    month = apr,
    articleno = {14},
    numpages = {16},
    keywords = {claim detection, factchecking, classification, debates}
}