Methodology

CIR’s methodology has been guided by best practices in the open source field and has been reviewed by leading practitioners.

Six-Step Process

Data Collection

Researchers collect data, commonly referred to as user-generated content (UGC), from open-source social media. Collecting this content is the main focus of the collectors and investigators.

The main sources of UGC are Twitter, Telegram, Facebook, TikTok and YouTube. Data is identified through a combination of keyword, date-based and hashtag searches, and by monitoring data sources that generate high levels of relevant content.
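As a rough illustration of how keyword, date-based and hashtag criteria can be combined, here is a minimal Python sketch. It is not CIR's tooling: the `Post` record, the `matches_search` function and the sample values are all hypothetical, and a real collection workflow would query each platform's own search interface rather than filter text locally.

```python
import re
from dataclasses import dataclass
from datetime import date


@dataclass
class Post:
    """A hypothetical, simplified record of one collected post."""
    platform: str
    text: str
    posted_on: date


def matches_search(post, keywords, hashtags, start, end):
    """Return True if the post is inside the date window and contains
    at least one of the keywords or hashtags (case-insensitive)."""
    if not (start <= post.posted_on <= end):
        return False
    text = post.text.lower()
    # Hashtags are compared as whole tokens, so "#kabul" does not match "#kabulnews".
    tags_in_post = set(re.findall(r"#\w+", text))
    # Keywords are matched as plain substrings of the post text.
    has_keyword = any(k.lower() in text for k in keywords)
    has_hashtag = any(h.lower() in tags_in_post for h in hashtags)
    return has_keyword or has_hashtag


if __name__ == "__main__":
    post = Post("Telegram", "Explosion reported in #Kabul today.", date(2022, 10, 5))
    print(matches_search(post, ["explosion"], ["#kabul"],
                         date(2022, 10, 1), date(2022, 10, 31)))
```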

Our researchers come across a huge volume of content daily, often accompanied by claims of human rights abuses. Before an image or video can enter our database or a claim can be investigated further, analysts must check whether the content is old or new. A quick reverse image search flags up old images or videos that have been shared out of context or framed inaccurately. This is an initial check on veracity; content that appears to be new – or has not been recorded previously – will be analysed and investigated further at a later stage in the process.

Data Preservation

CIR archives all of the collected data, a process that is as important as verifying it. Archiving ensures data is securely stored – and preserved in its original state – should it ever be used to hold perpetrators to account. Preserving data also matters because vital information or evidence may be removed if social media platforms believe it violates their terms of use. The BBC recently found that evidence of potential human rights abuses may be lost after being deleted by tech companies. Platforms often use artificial intelligence to remove graphic videos, but footage that could support prosecutions may also be taken down in this process.

Any data that enters the CIR database is archived upon entry by an auto-archiver (a version of https://github.com/bellingcat/auto-archiver). The auto-archiver collects the related media (photos, videos, audio), source code, and a screenshot of the original source, and stores them on a secure server. It also generates a SHA3-512 hash value for each piece of data, which is then publicly timestamped on Twitter. Should the data be tampered with, its hash would no longer match the value originally recorded. For example, if an archived video is edited, its hash value would differ from the one assigned to the footage when it entered the database.
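The tamper check described above can be sketched with Python's standard `hashlib` module, which provides SHA3-512. This is an illustrative sketch only, not the auto-archiver's own code; the function names are hypothetical.

```python
import hashlib


def sha3_512_of_bytes(data: bytes) -> str:
    """Return the SHA3-512 digest of in-memory data as a hex string."""
    return hashlib.sha3_512(data).hexdigest()


def sha3_512_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large videos need not fit in memory."""
    digest = hashlib.sha3_512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unmodified(data: bytes, recorded_hash: str) -> bool:
    """Compare the current hash of the data with the hash recorded at archive time."""
    return sha3_512_of_bytes(data) == recorded_hash


if __name__ == "__main__":
    original = b"bytes of an archived video"
    recorded = sha3_512_of_bytes(original)             # stored when the item is archived
    print(is_unmodified(original, recorded))           # unchanged data matches
    print(is_unmodified(original + b"\x00", recorded)) # any edit changes the hash
```

Because the recorded hash is published with a public timestamp, anyone holding a copy of the file can recompute the value and compare it with the one recorded at the time of archiving.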

Analysis and verification

Once initial steps to collect and preserve the data have been taken, analysts will examine the content for additional clues that can shed light on what is happening, why, and who is involved. Analysts use open source techniques to verify as many details as possible. When CIR describes a piece of content as “verified”, it means that investigators have been able to confirm, with a high degree of confidence, the location and date of a piece of footage or a photograph. Occasionally, analysts are able to verify other details, such as perpetrators or victims, but this isn’t always possible.

Geolocation

If an image or video is taken outdoors, and there are buildings, landmarks, or geographical clues visible in the frame, these can be matched with satellite imagery, Google Street View, or other related media. This process is known as geolocation and allows analysts to pinpoint the coordinates of where a photograph or video was captured.

Chronolocation

In some cases, chronolocation can be used to determine when a photograph or video was taken. This process isn’t always feasible, as it requires footage captured during daylight hours in which a cast shadow is visible.
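The text does not spell out how shadows are used, so the following is only a sketch of the underlying geometry, not CIR's procedure. It assumes a vertical object of known or estimated height casting a shadow on level ground, in which case the sun's elevation angle is the arctangent of height over shadow length. An analyst could then compare that angle, together with the shadow's direction, against the sun's position at the geolocated coordinates to narrow down the time of capture. The function name and the sample measurements are hypothetical.

```python
import math


def sun_elevation_degrees(object_height_m: float, shadow_length_m: float) -> float:
    """Estimate the sun's elevation angle from a vertical object and its
    shadow on level ground: tan(elevation) = height / shadow length."""
    if object_height_m <= 0 or shadow_length_m <= 0:
        raise ValueError("height and shadow length must be positive")
    return math.degrees(math.atan2(object_height_m, shadow_length_m))


if __name__ == "__main__":
    # Hypothetical measurements: a 2 m post casting a 2 m shadow.
    print(round(sun_elevation_degrees(2.0, 2.0), 1))
```

A shadow as long as the object corresponds to an elevation of about 45 degrees; longer shadows indicate a lower sun, as in the early morning or late afternoon.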

Cross-examination

Other methods, such as conducting a reverse image search, analysing the content’s metadata, or identifying clues from buildings or other features in the frame, may also hint at when the photo or video was taken. Small details that seem insignificant at first glance can prove crucial upon closer analysis: for example, clothing or insignia, dialogue, accents or dialect, and even facial expressions and tone of voice. During the verification process, analysts will triangulate their findings against other sources such as news reports or information from sources on the ground.

CIR treats all content the same way, regardless of whether it was shared by an established news outlet or a social media account with only a handful of followers. What is important is that the information can be verified, or – equally critical – debunked.

However, verification of a claim also relies on the availability of photographic or visual evidence. In some cases, it is possible to thread together several pieces of footage, filmed from various angles, to reconstruct an incident. This is precisely what CIR did in an Afghan Witness investigation into evidence of summary executions in the Panjshir Valley in October 2022, when we were able to conclusively link one group of Taliban fighters to the execution of ten men in the Dara District area. In other cases, visual evidence may be limited, meaning analysts can determine some details, but not others.

Review

After content has been analysed, it is reviewed by a senior investigator to ensure information is as accurate and reliable as possible. Data is also reviewed for privacy and safety to mitigate the risk of sharing footage that reveals identifying information about individuals, such as their personal details or location. A privacy tagging system is used to identify any footage that might compromise the privacy, security or safety of individuals, and footage flagged with a privacy tag will be redacted in the online map and any subsequent publications.

While viewing graphic footage or images is a necessary part of the verification process, CIR aims to ensure that several measures are in place to manage the risks associated with viewing traumatic content. When content is entered into the database, it is assigned a graphic content level from one to five. This allows reviewers, investigators and those viewing the footage to prepare themselves for graphic imagery. Similar features can also be found on CIR maps – a graphic content level is given to each piece of data, and any graphic imagery is removed from the preview box. The original source of the footage or image is still available to click on, but viewers get a chance to prepare themselves before viewing the content or can choose not to click the link.
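The privacy tag and graphic content level described in this section can be pictured as two fields on each database entry that control what the map shows. The Python sketch below is purely illustrative and does not reflect CIR's actual database schema: the `Entry` fields, the `map_preview` function, and in particular the threshold at which an image counts as graphic are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical cut-off: levels at or above this value hide the preview image.
GRAPHIC_PREVIEW_THRESHOLD = 3


@dataclass
class Entry:
    """A hypothetical, simplified database entry."""
    source_url: str
    preview_image: Optional[str]
    graphic_level: int           # one (least graphic) to five (most graphic)
    privacy_tagged: bool = False

    def __post_init__(self):
        if not 1 <= self.graphic_level <= 5:
            raise ValueError("graphic_level must be between 1 and 5")


def map_preview(entry: Entry) -> dict:
    """Build the data shown in a map preview box for one entry."""
    if entry.privacy_tagged:
        # Privacy-tagged footage is redacted: no image and no source link.
        return {"graphic_level": entry.graphic_level,
                "preview_image": None, "source_url": None}
    show_image = entry.graphic_level < GRAPHIC_PREVIEW_THRESHOLD
    return {
        "graphic_level": entry.graphic_level,
        "preview_image": entry.preview_image if show_image else None,
        # The source link stays available so viewers can choose to open it.
        "source_url": entry.source_url,
    }
```

In this sketch the graphic level is always shown, so a viewer sees the warning before deciding whether to follow the source link.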

Uploading to Map

Once data has been verified and reviewed, it can be uploaded to the respective project map. The map is regularly updated with new data. However, cases that raise privacy concerns will be uploaded without the footage or, in some cases, the upload will be delayed by several months.

It is worth noting that while data displayed in the map has been verified to confirm that the report or claim is consistent with the associated image or video, this does not mean all elements of the content are verifiable. For example, Afghan Witness (AW) may verify a video showing evidence of an explosion in Kabul but may be unable to confirm how many people were killed, or how the incident unfolded.

Reports, investigations, and media coverage 

Our hope is that these maps will become a launchpad for further investigations and analysis across our regions of focus, while allowing the public to visualise the extent of abuses, security incidents and protest movements.

Communicating our findings to audiences around the world is important – journalists can corroborate our data with vital on-the-ground testimony and can use our open source investigations to tell stories. Responsible, collaborative reporting can help illustrate the impact of the issues and incidents we monitor daily, while also promoting open source verification techniques as a crucial journalistic toolkit in the digital age.