Automated Fact Checking – Full Fact

Full Fact is building scalable, robust, automated fact checking tools to be used in newsrooms and by fact checkers all over the world.
If you want to use or test our automated fact checking software, please get in touch.
Bad information ruins lives. It harms our communities, by spreading hate through misleading claims. It hurts our democracy, by damaging trust in politicians and political processes. It leads to bad decisions, by disrupting public debate on the issues that most affect us, including climate change and public spending.
Since 2015, we have been developing technology to help increase the speed, scale and impact of our and others' fact checking. Our goal is to create a global collaborative effort to help media outlets, civil society, platforms and public policy makers better understand the landscape, and to bring the benefits of those tools to everyone by working in partnership.
We launched our roadmap, The State of Automated Fact Checking, in August 2016, setting out a plan for making fact checking dramatically more effective using existing technology. In autumn of that year we were one of the first UK organisations to use the “Fact Check” label in Google News.
In November 2016, we announced support from Google’s Digital News Initiative for the first stages of our automated fact checking work, and we’re grateful for vital support from hosting experts Bytemark and open source search specialists Flax too. This funding helped build our first prototypes. In May 2019 we – along with Africa Check, Chequeado and the Open Data Institute – won the Google AI Impact Challenge, as one of just 20 international winners chosen from more than 2,600 entrants. Over the next three years, with Google’s support, we will use machine learning to dramatically improve and scale fact checking, working with international experts to define how artificial intelligence could transform this work, to develop new tools, and to deploy and evaluate them.
We have made a set of tools designed to alleviate the pain points we experience in the fact checking process. As fact checkers with ten years’ experience, we understand the operational advantages these tools can bring, making us uniquely placed to build them.
We are not attempting to replace fact checkers with technology, but to empower fact checkers with the best tools. We expect most fact checks to be completed by a highly trained human, but we want to use technology to help.
Across a suite of products, our technology carries out the following tasks:
We start by collecting data that may contain claims we want to fact check: speech on live TV, articles from leading news sites, and posts on social media platforms. We can add new monitoring inputs for fact checkers in other countries, and have already done so for a number of countries in Africa.
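As an illustration of what one monitoring input might look like, the sketch below polls an RSS feed and extracts article text. Full Fact has not published its ingestion code; the feed URL and the feedparser/trafilatura libraries are assumptions made for the example.

```python
# A minimal sketch of one monitoring input: polling an RSS feed for new
# articles. The feed URL is hypothetical and the libraries are illustrative.
import feedparser          # pip install feedparser
import trafilatura         # pip install trafilatura

FEED_URL = "https://www.example-news-site.com/politics/rss"  # hypothetical feed

def fetch_new_articles(feed_url: str) -> list[dict]:
    """Return the text of each article currently listed in the feed."""
    feed = feedparser.parse(feed_url)
    articles = []
    for entry in feed.entries:
        html = trafilatura.fetch_url(entry.link)
        text = trafilatura.extract(html) if html else None
        if text:
            articles.append({"url": entry.link, "title": entry.title, "text": text})
    return articles
```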
Once we have all the input information available as text we split everything down to individual sentences, which are our atomic unit for fact checks. The sentences are then passed through a number of steps to enrich them and make them more and more useful in the process of fact checking.
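A minimal sketch of the sentence-splitting step, assuming spaCy as the NLP library (the post does not say which sentence splitter is used in production):

```python
# Split monitored text into the individual sentences that act as the
# atomic unit for fact checking. spaCy is an illustrative choice here.
import spacy

nlp = spacy.load("en_core_web_sm")  # small English pipeline

def split_into_sentences(text: str) -> list[str]:
    """Break a block of text into individual sentences."""
    doc = nlp(text)
    return [sent.text.strip() for sent in doc.sents]

sentences = split_into_sentences(
    "GDP has risen by 2%. The Prime Minister said spending will increase."
)
# ['GDP has risen by 2%.', 'The Prime Minister said spending will increase.']
```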
We define a claim as the checkable part of any sentence, whether it is made by a politician, by a journalist, or online.
There are many different types of claim: claims about quantities (“GDP has risen by x%”), claims about cause and effect (“this policy leads to y”), predictive claims about the future (“the economy will grow by z”), and more.
We have developed a claim-type classifier to guide fact checkers towards claims that might be worth investigating. It helps us to identify and label every new sentence according to what type of claim it contains (whether it is about cause and effect, quantities, etc.).
We started building this with BERT, a model published by Google Research, and fine-tuned it using our own annotated data. BERT has been pre-trained on hundreds of millions of sentences in over 100 languages, which makes it a broad statistical model of language as it is actually used.
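To make the approach concrete, here is a sketch of fine-tuning a BERT model for claim-type classification with the Hugging Face Transformers library. The label set, the multilingual checkpoint and the toy training sentences are illustrative assumptions, not Full Fact's actual data or configuration.

```python
# A sketch of fine-tuning BERT for claim-type classification.
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)
import torch

LABELS = ["not_a_claim", "quantity", "cause_and_effect", "prediction"]  # example labels

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=len(LABELS)
)

class ClaimDataset(torch.utils.data.Dataset):
    """Wraps annotated sentences as (tokenised input, label) pairs."""
    def __init__(self, sentences, labels):
        self.encodings = tokenizer(sentences, truncation=True, padding=True)
        self.labels = labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

train_data = ClaimDataset(
    ["GDP has risen by 2%.", "What a lovely day."],   # toy examples only
    [LABELS.index("quantity"), LABELS.index("not_a_claim")],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="claim-classifier", num_train_epochs=3),
    train_dataset=train_data,
)
trainer.train()
```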
Labelling claims in this way filters the volume of data we could fact check from hundreds of thousands to tens of thousands. It is a vital first step in ensuring that the users of our tools have a chance to make sense of all the information.
Once claims are labelled, each sentence is checked to see whether it matches something we have previously fact checked. Some claims are easier to match than others, depending on how specific or ambiguous the language used to describe them is.
The plan is to train a BERT-style model to predict match/no-match for sentence pairs and then add in entity analysis (e.g. checking whether both sentences contain the same numbers, people, organisations and so on). In combination, we hope these two stages will find repeats of a claim even if different words are used to describe it.
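The sketch below illustrates that two-stage idea under stated assumptions: an off-the-shelf cross-encoder stands in for the planned BERT-style match/no-match model, and a simple named-entity overlap check stands in for the entity analysis.

```python
# A sketch of two-stage claim matching: semantic similarity plus entity overlap.
# The cross-encoder checkpoint is a public example, not Full Fact's own model.
from sentence_transformers import CrossEncoder   # pip install sentence-transformers
import spacy

matcher = CrossEncoder("cross-encoder/stsb-roberta-base")  # semantic similarity scorer
nlp = spacy.load("en_core_web_sm")                         # named-entity extraction

def entities(sentence: str) -> set[str]:
    """Numbers, people, organisations etc. mentioned in the sentence."""
    return {ent.text.lower() for ent in nlp(sentence).ents}

def is_repeat(new_sentence: str, checked_claim: str,
              score_threshold: float = 0.7) -> bool:
    """Treat the new sentence as a repeat of a previously checked claim if
    the semantic score is high and the key entities overlap."""
    score = matcher.predict([(new_sentence, checked_claim)])[0]
    shared = entities(new_sentence) & entities(checked_claim)
    return score >= score_threshold and len(shared) > 0
```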
Additionally, we semantically enrich the content to help our model detect semantically similar words and phrases. The first step is to identify people, places and other entities of interest, and match them to external URIs. We then deduplicate the information across multiple sentences to identify and group together semantically similar references (e.g. ‘the prime minister’ and ‘Boris Johnson’). This allows us to extract greater value from the data we process and means we can build sophisticated interfaces showing all the statements made by an individual. We currently use Wikidata via Google BigQuery to power this service.
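As a self-contained illustration of entity recognition and linking, the sketch below uses spaCy plus the public Wikidata wbsearchentities API rather than BigQuery; the choice of tools here is an assumption for the example.

```python
# A sketch of recognising named entities and linking them to Wikidata URIs.
import requests
import spacy

nlp = spacy.load("en_core_web_sm")

def link_entities(sentence: str) -> dict[str, str]:
    """Map each named entity in the sentence to a candidate Wikidata URI."""
    links = {}
    for ent in nlp(sentence).ents:
        resp = requests.get(
            "https://www.wikidata.org/w/api.php",
            params={"action": "wbsearchentities", "search": ent.text,
                    "language": "en", "format": "json"},
            timeout=10,
        ).json()
        if resp.get("search"):
            links[ent.text] = "https://www.wikidata.org/wiki/" + resp["search"][0]["id"]
    return links

link_entities("Boris Johnson spoke about GDP in Parliament.")
# e.g. {'Boris Johnson': 'https://www.wikidata.org/wiki/Q...', ...}
```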
Finally, we use external processes to help spot more claims and further identify patterns of language that can be automatically checked.
Given a sentence, our tool attempts to identify the topic, trend, values, dates and location. If that succeeds, it compares the extracted information with the corresponding data from the UK Office for National Statistics API. It knows about 15 topics and around 60 verbs that define trends (e.g. rising, falling). This means our technology can automatically check claims against significantly more data to identify whether they are correct.
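Here is a sketch of what that final comparison could look like for a quantity claim whose topic and value have already been extracted. The ONS endpoint path, series identifiers and response structure below are illustrative assumptions, not Full Fact's production code.

```python
# A sketch of checking an extracted quantity claim against official statistics.
import requests

def fetch_ons_series(series_id: str, dataset_id: str) -> dict:
    """Fetch a published ONS time series as JSON (endpoint path assumed)."""
    url = f"https://api.ons.gov.uk/timeseries/{series_id}/dataset/{dataset_id}/data"
    return requests.get(url, timeout=10).json()

def check_quantity_claim(claimed_value: float, series_id: str, dataset_id: str,
                         tolerance: float = 0.1) -> bool:
    """Return True if the most recent official figure is within
    `tolerance` of the claimed value."""
    data = fetch_ons_series(series_id, dataset_id)
    latest = float(data["months"][-1]["value"])   # response structure assumed
    return abs(latest - claimed_value) <= tolerance

# e.g. a claim that "unemployment is at 4%" against a hypothetical series:
# check_quantity_claim(4.0, series_id="MGSX", dataset_id="LMS")
```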
The fact checking process itself is often undertaken offline. We then publish the results on our website, and describe each fact check with some very specific markup called ClaimReview. This is part of the wider schema.org project, which describes content on a range of topics in domain-specific terms. This matters because describing our content so specifically helps ensure that our fact checks can travel further than our own platforms. Fact checks can form a vital part of the web: just over 60,000 fact checks exist in the Google Fact Check Explorer, and these were seen over 4 billion times in 2019 in Google Search alone.
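For reference, this is roughly what ClaimReview markup looks like when built as schema.org JSON-LD; the URLs, dates and rating values below are placeholders.

```python
# A minimal sketch of ClaimReview markup as schema.org JSON-LD.
import json

claim_review = {
    "@context": "https://schema.org",
    "@type": "ClaimReview",
    "url": "https://fullfact.org/example-fact-check/",          # placeholder URL
    "author": {"@type": "Organization", "name": "Full Fact"},
    "datePublished": "2019-11-01",
    "claimReviewed": "GDP has risen by 2%.",
    "itemReviewed": {
        "@type": "Claim",
        "author": {"@type": "Person", "name": "Example Politician"},
        "datePublished": "2019-10-30",
    },
    "reviewRating": {
        "@type": "Rating",
        "ratingValue": 3,
        "bestRating": 5,
        "worstRating": 1,
        "alternateName": "Mostly correct",
    },
}

# Embedded in the article as <script type="application/ld+json">…</script>
print(json.dumps(claim_review, indent=2))
```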
We are careful not to overstate our results. A lot of people say that artificial intelligence and machine learning are a panacea, but we have been on the front lines of fact checking since 2010 and we know first hand how difficult it is. Humans aren’t going anywhere anytime soon, and nor would we want them to be.
Our automated fact checking team is made up of:
We need support and funding to develop this work further. Please get in touch if you can help.