Coding the “Uncategorized”

This continues our series of student reflections and analysis authored by our research team.

Bridget Dickens

Through coding, I’ve learned that some variables take 10 seconds to figure out and others take 10 minutes. When I’m coding for tactic, I never know how long it will take because occasionally it is difficult to summarize a person’s crime into one value. There are many different ways to carry out a crime, and it is impossible for us to list them all. That is why we have the value “uncategorized”.

“Uncategorized” is the catch-all value for cases that do not fit under any other tactic. It includes cases of financial fraud, theft, and even scaling the Statue of Liberty. Given its broad nature, it’s easy to code more complex cases as “uncategorized”, even though in reality they would fit better in another category. I consider the crime’s nature and how it would fit into all other values before deciding it should be uncategorized. For this reason, it can be easier to associate an uncategorized case with another value at first glance. I am often inclined to code these cases as “various methods”, “criminal violation not linked or motivated politically”, or “unknown/unspecified/undeveloped”. However, there is a clear difference.

Emily Putney is a clear example of why we have this value. In 2009, Putney was grocery shopping with her boyfriend, Michael Watkins, when they saw a Hasidic Jew. Watkins harassed the man, shouting many racial slurs. As police arrived, Putney drove their car away and led them on a brief chase. I easily coded Watkins’ tactic as threat/harassment, but Putney was more complex. While she did not stop Watkins from shouting expletives, she did not join in either. Her only role was fleeing the police. I debated the merit of potential values. It was unclear if she was motivated by antisemitic beliefs or the desire to help her boyfriend. Because of this, Putney could not be coded as “criminal violation not linked or motivated politically”.

“Unknown/unspecified/undeveloped” does not encapsulate her tactic either. In our codebook, this value is defined as when a “specific tactic is anticipated, but, at the time of arrest/indictment, unclear.” Technically, Putney committed a crime. There was a specific tactic involved, though it is not defined under any value in the codebook. That is why the value “uncategorized” works. It is the only one that fits because it includes everything that does not fit elsewhere.

“Uncategorized” is a more recent addition to the codebook. Nevertheless, it is an important one. Without it, there would not be a value for more complicated cases that do not fit into our more traditional categories. Our codebook is constantly evolving so that we can be more inclusive and effective in coding these types of cases.

Source Scraping and Coding Confirmation

This continues our series of student reflections and analysis authored by our research team.

Emma Lovejoy

When we look for new cases to include in the tPP database, it’s always helpful to find an existing compilation we can pull names and data from. While cases in some existing databases automatically meet the criteria for inclusion (lists specifically dedicated to terrorist acts, political extremism, etc.), for other sources each case must be individually investigated, and a determination made as to whether it should be included in tPP or not. For sources like this, it’s important that we take the time to ensure we’re not adding extraneous cases, and that the information being added is up to date.

When a case is located that we think may need to be included, the first step is to check the defendant’s name against each of our active spreadsheets, to ensure the case really is new to tPP. In cases where we’ve already documented the case, this is an opportunity to see if there are updates to be made: whether variables we’d had trouble with in the past are clarified by new source material, whether the trial has progressed, etc. If the case does not appear in our data yet, we move on to source collection.
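The name check described above can be sketched in a few lines of Python. This is only an illustration, not tPP’s actual tooling: the sheet names, the sample defendant names, and the helper functions `normalize_name` and `find_existing` are all hypothetical, and the matching is a simple exact match after normalization (it would not catch aliases or misspellings).

```python
import unicodedata


def normalize_name(name: str) -> str:
    """Lowercase a name, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in no_accents)
    return " ".join(cleaned.lower().split())


def find_existing(candidate: str, sheets: dict) -> list:
    """Return the names of the sheets that already list this defendant."""
    key = normalize_name(candidate)
    return [
        sheet
        for sheet, names in sheets.items()
        if key in {normalize_name(n) for n in names}
    ]


# Placeholder data standing in for the active spreadsheets (hypothetical).
sheets = {
    "coded": ["Jane Q. Example", "John Sample"],
    "in_progress": ["José Placeholder"],
}

print(find_existing("jose placeholder", sheets))  # found in "in_progress"
print(find_existing("New Person", sheets))        # empty list: new to the data
```

An empty result means the name is new and the coder can move on to source collection; a non-empty result points to the sheet where updates might be made.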

An easy place to start when a case’s inclusion is still questionable is looking at news stories. They’re usually easier to find than court documents, and can give a general picture of the crime and the defendant. Based on what we see in the news, we can usually make a judgement at least on whether or not the case meets the criteria for inclusion, even if the details of the defendant’s ideology remain unclear. If it is a case for tPP, especially useful articles will be saved as source files, and known information (name, dates, location, etc.) as well as the dataset it was originally pulled from will be added to the working spreadsheet as a case-starter, to be coded.

If the case could have been included were it not for an exclusionary factor (charges not reaching felony-level, death prior to charging, etc.) then the basic case-starter information is filed separately as an excluded case, with sources and an explanation of why. We do this to save ourselves time down the line, if the case should come up again in the future. Given the overlapping content of various datasets, it’s not uncommon for an excluded case to raise flags on more than one occasion, so having this index improves our ability to work through external data efficiently.
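To illustrate why the excluded-case index saves time, here is a minimal Python sketch. The class names, fields, and sample entry are assumptions made for illustration; the post does not describe how the index is actually stored.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ExcludedCase:
    """Basic case-starter information for a case that was reviewed and excluded."""
    name: str
    reason: str           # e.g. "charges below felony level"
    origin_dataset: str   # the external list the name was pulled from
    sources: list = field(default_factory=list)


class ExcludedIndex:
    """Hypothetical lookup table keyed by a case-insensitive defendant name."""

    def __init__(self) -> None:
        self._by_name = {}

    def add(self, case: ExcludedCase) -> None:
        self._by_name[case.name.casefold()] = case

    def lookup(self, name: str) -> Optional[ExcludedCase]:
        return self._by_name.get(name.casefold())


index = ExcludedIndex()
index.add(
    ExcludedCase(
        name="Placeholder Person",
        reason="charges below felony level",
        origin_dataset="hypothetical external list",
        sources=["news-article.pdf"],
    )
)

# When the same name surfaces in a different dataset, the earlier decision
# and its explanation are available without repeating the investigation.
hit = index.lookup("PLACEHOLDER PERSON")
print(hit.reason if hit else "not previously excluded")
```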

Most recently, we have been working on developing a collective procedure for the scraping process.  That is, a system in which each stage of examination is assigned to an individual or group, to expedite the scraping of each document. So far, this assembly-line approach has yielded hundreds of new cases to be incorporated.

New Spring 2020 tPP Syllabus Available!

As the Prosecution Project has grown and changed, it has benefitted from utilizing the classroom. We have used the shared space as a laboratory, workshop, assembly line, and debate stage for developing our processes and concepts. Nearly every semester we have been able to host a twice-a-week class to get students in the same space working on tPP.

We are happy to share the Spring 2020 syllabus focused around advanced qualitative coding.

To view Dr. Loadenthal’s past tPP syllabi focusing on advanced secondary research, project management and design, and analysis see:


The Case of Keith Luke (Part 1 of 3: general overview & background)

This continues our series of student reflections and analysis authored by our research team.

Caitlin Marsengill

This journal is the first in a three-part series on the case of Keith Luke. The first journal is a general overview and background information on the case. The second journal will be on the prosecution and legal proceedings surrounding the case, and the final journal will be an analysis of the case. As a disclaimer, this case is particularly egregious, as the crimes he committed involved sexual assault and were violent and racially motivated.

Keith Luke is a neo-Nazi and white supremacist from Brockton, Massachusetts. He went on a rampage in which he killed two people and shot and raped a third. All three victims of this crime were of Cape Verdean descent and he sought them out because of their race. He premeditated the crimes for months. Other than the racial motivations, he also committed the rape because he said that he had been turned down “100,000 (expletive) times” and he did not want to die a virgin. While he was sexually assaulting the woman, another woman came home and walked in on the act, so Luke decided to shoot her. Then he shot the woman he had been raping, got in his car, cranked the music up, and left. As he was driving down the street he spotted his next victim, who was a 72-year-old homeless man pushing a carriage. Luke had bought a gun and over 200 rounds of ammunition. He had planned to end his rampage by shooting and killing bingo players at a synagogue. He was attempting to reload his gun while driving but was having difficulty. When the police caught up with him, he attempted to shoot at them and then crashed his vehicle. Luke later said that he regretted shooting at the police officers because they were white.

Many details describing the unusual behaviors Luke exhibited throughout his life became evident in his trial, both from testimony and from actions Luke himself took. These will be discussed in more detail in the next journal entry in the series. Luke was ultimately convicted of killing two people and raping and shooting a third person.

Something that was interesting about coding this case is that he wanted to commit the rape so that he did not die a virgin, which initially raised some question as to whether or not the case should be included in the project. However, after reviewing more sources, it quickly became evident that this was motivated by his socio-political beliefs and that he had picked the victims due to their race and his hatred towards minorities. It was also interesting how much the amount of detail varied between articles. It felt like no single article gave the entire story, so we had to piece it together from multiple articles that each gave different details. Due to this, it constantly felt like we were finding new bits of information.

These problems we ran into while coding the case speak to the difficulty of using documentary data sources and how we have to be cautious about the sources used within our research and their credibility, reliability, representativeness, and ethics (Caulfield and Hill, 2014). There is clear bias within the way these articles present the facts; however, that worked to our benefit in some of the articles, as they went out of their way to include details about his motive, socio-political motive, and his ideological target. As Caulfield and Hill also discuss in their chapter, we can come to trust our sources when the underlying facts of the case are common to all of the articles and agree with other types of documents we found, such as court records.



Caulfield, Laura, and Jane Hill. 2014. Criminological Research for Beginners: A Student’s Guide. 1st edition. Abingdon, Oxon: Routledge. [Excerpt: Chapter 10, “Using documentary and secondary data sources”]

It’s the Government’s Say (Part 2): On the Topic of ‘Other Status’

This continues our series of student reflections and analysis authored by our research team.

Emily Ashner

When coding the case of Abdirizak Haji Raghe Wehelie, a federal contractor for the FBI who worked as a linguist translating communications captured by court-authorized surveillance of a suspect in a terrorism investigation, I noticed that the DOJ released his middle name as “Jaji”, likely by accident; for those who know Arabic, this has a very different meaning. While this is not a major issue, there are ways the government utilizes Muslim and Arab names and references that influence workers within the government and citizens.

The Institute for Social Policy and Understanding released statistics that showed the differences in severity of punishment between Muslim and non-Muslim perpetrators who committed the same crime.1 These statistics show that Muslims receive a severe punishment 83% of the time while non-Muslims do so only 17% of the time. Further, average prison sentences are four times higher if the perpetrator is perceived to be Muslim. This is why in the Prosecution Project we code for “other status.” The codebook assigns this status to a defendant who meets any of the following: Does the defendant have a name not readily understood as European?; Is the defendant Muslim or a Muslim convert?; Is the defendant an immigrant from a non-Western/European country?

Clearly, the way government officials view perpetrators has an extreme effect on how they are sentenced. However, the way this information is presented also has incredible implicit bias effects on citizens, who are potential jury members. According to a study on implicit attitudes towards Arab-Muslims, participants showed an implicit negative attitude towards Arab-Muslims relative to both whites and blacks.2 Interestingly, prejudice could be moderated if the participants were exposed to positive values of Arab-Muslims first.

There is wide understanding of the negative role the media can play, but government documents that immediately label a person’s race or religion also have clear effects on attitudes. Understanding the availability heuristic, the tendency to apply the group first thought of when a statement is seen or heard, can be important for consciously remaining unbiased. Once again, our process for the project is quite objective so coding is not affected, but it is important to consider when thinking about how we can apply the results of our coding and other similar projects to mediate the discriminatory nature of justice system sentencing.


  1. Rao, Kumar, Carey Shenkman, Khwaja Ahmed, Hasher Nisar, Dalia Mogahed, Sarrah Bugageila, Katherine Coplen, and Katie Grimes. “Equal Treatment? Measuring the Legal and Media Responses to Ideologically Motivated Violence in the United States.” Washington, DC: Institute for Social Policy & Understanding, April 2018.
  2. Park, Jaihyun, Karla Felix, and Grace Lee. 2007. “Implicit Attitudes towards Arab-Muslims and the Moderating Effects of Social Information.” Basic and Applied Social Psychology 29 (1): 35–45. doi:10.1080/01973530701330942.

Preparing a report for the US Army War College

In December 2019, tPP was contacted by an individual with the United States Army War College seeking assistance with data. This individual asked if we could provide a number of measures of criminal defendants in order to compare defendants with and without military histories.

After a few emails and a phone call to ensure we could deliver the desired analysis product, our team went to work and in a few days completed two reports: “Active duty vs. discharged veterans and international vs. domestic affiliation in the Prosecution Project (tPP) dataset” and “Veteran versus civilian comparisons in the Prosecution Project (tPP) dataset”. We are pleased to share these reports here for others to review.

The Prosecution Project team is happy to assist individuals and institutions when our data can be useful. If you have a query which could benefit from tPP data, please let us know!

Inter-Coder Reliability Between Projects

This continues our series of student reflections and analysis authored by our research team.

Stephanie Sorich

During the week of November 18, many members of the Project were given an assignment focused on “scraping” documents. Essentially with scraping, coders search the names of defendants for potential cases in our project database to determine if we’ve already coded a case, or if we’ve found a new case to code. Coders set up what we called an “assembly line,” with several picking out names to be searched, several checking our spreadsheets to see if those names were already coded, and the last few beginning the cases that weren’t yet coded. It was a very efficient way of getting new potential cases on the board, even if they can’t be finished immediately.

However, rather than taking potential cases from police reports or news articles, as is common in the Project, this assembly line focused on pulling potential cases from other research projects. The Threat Within provided us with a spreadsheet of cases compiled on foreign terror, as well as several other lists from Homeland Security and other organizations. Being able to compare cases with other projects and organizations working within the same field provided a tremendous opportunity to clarify the reliability of our work.

To a certain extent, we can check for the reliability of our results within our own project. The practice of checking for inter-coder reliability, or making sure that separate coders receive the same result when looking at a case, provides insight into whether coders are using the same standards and coding by the manual in the same way. More reliable coding makes it more likely that the values being coded are valid, as multiple coders are finding the same end results.
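One common way to put a number on inter-coder reliability is to compute simple percent agreement alongside Cohen’s kappa, which corrects for the agreement two coders would reach by chance. The post does not say which statistic tPP uses, so the following Python is only a sketch under that assumption; the two coders’ lists are invented sample data that reuse tactic values mentioned elsewhere in these reflections.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Share of cases on which both coders assigned the same value."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must code the same non-empty set of cases")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(coder_a)
    observed = percent_agreement(coder_a, coder_b)
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: probability both coders pick the same value at random,
    # given how often each coder used each value.
    expected = sum(counts_a[v] * counts_b[v] for v in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # both coders used a single identical value throughout
    return (observed - expected) / (1 - expected)


# Invented sample: two coders' "tactic" values for the same four cases.
coder_a = ["threat/harassment", "uncategorized", "uncategorized", "various methods"]
coder_b = ["threat/harassment", "uncategorized", "various methods", "various methods"]

print(f"agreement: {percent_agreement(coder_a, coder_b):.2f}")
print(f"kappa:     {cohens_kappa(coder_a, coder_b):.2f}")
```

A kappa well below the raw agreement rate suggests that part of the apparent consistency could be coincidence, which is a cue to revisit how the codebook defines the values in question.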

However, coding correctly by the manual does not necessarily mean that the cases are being represented accurately. An issue of checking reliability and validity of the Project within the Project itself is the potential for groupthink. The inability to consult outside minds or consider other perspectives on coding outside those in the room with us each day could cause coders to accept variables and values as true rather than as things to be changed to better fit the project as it develops. Therefore, we took this scraping activity as an opportunity to check our results based upon cases from the lists provided that the Project had already coded for. In many cases, we found that our coding matched that of other projects or organizations, giving us sufficient evidence to believe that our methods of coding and the variables and values we are using are adequate.

In a sense, checking results between projects becomes its own method of analysis. While mostly used as a means to complete the research already being done, performing this comparison could be used as its own analysis between the results of tPP and other projects in order to get a grasp of the general ideas behind terrorism research. While not necessarily for publishing, it could prove to be a useful tool when further evaluating our coding process.

At the moment, it looks as if we are on the right track based upon the comparisons done between our data and others’. However, continuing to scrape new cases from other organizations and compare those done mutually between projects should and likely will be a priority of the Project. While we are working with a group of capable individuals who work carefully to produce the best results, breaking away from the group to get an outside perspective is the best way to awaken the parts of us that take our decisions as absolute with no room for suggestion.

For more information on groupthink as it pertains to team projects:

My Year in Review

This continues our series of student reflections and analysis authored by our research team.

Emily Ashner

As 2019 wraps up, I approach my one-year anniversary with tPP. Looking back to when I asked to join the project at this point last year, I was at such a different knowledge level in terms of terrorism and extremism. I truly did not understand the meaning of these terms, the extent of domestic terrorism, or the variety of crimes that fall within our requirements on the project. After coding cases for two semesters, I have developed such a strong comprehension of court terminology, how the prosecution process works, and the details of crimes that are so often discussed in the media. I feel so much more connected to current events and like a more active member in this realm.

Working on this project has not only provided me with a knowledge base specific to the project but has also given me so many applicable skills. Big data is the new norm; efficient processing is being used across many domains. Understanding how this type of processing works, and having the opportunity to add to and adapt the system, have given me long-lasting skills. The allowance of creativity and input within this process has built a new approach to finding best practice and expanding data in different ways. I appreciate the underlying ways in which this project has strengthened skills that are applicable far beyond the prosecution field.

As a psychology major, I was unsure of how knowledge in this field would pave my future interests. In an unexpected manner, tPP created such a strong interest in the effects of bias. Working within judgment and decision making research I have understood how strong the influence of bias is. However, I did not understand its application until I saw an intersection between this and the results of prosecution in the United States. Not only are judges, juries, and other participants in criminal proceedings driven by personal prejudice, whether conscious or unconscious, but these outcomes in turn influence the bias of society. There are endless cycles found in systemic discrepancy and the only way to break them is to first be conscious of their existence and then act in a manner that opposes this automatic process. Cycles are dangerous in creating continuous disadvantage. Whether the aim of the project was meant to discover this or not, the numbers bring light to the situation. There is such strong potential for application that can arise from this project and I am excited to see where others may take this and where I can utilize a similar type of information moving forward in my career.

Discussing acts of terrorism brings such a strong emotional component. The large acts most quickly associated with terrorism are events that affect so many people. These are events that can connect us, but it is important to realize these are also events that are isolating those not responsible. I thank this project for this realization. I thank this project for giving me an accessible outlet to gain knowledge on the current state of this type of event and, more so, the ability to analyze them objectively. tPP has provided me with so many skills I will carry well beyond this project and I hope students of all majors will take advantage of such a unique project. Dr. Loadenthal and all members of the project are so dedicated and are creating something truly incredible. Thank you tPP, I will miss working on the project but have no doubt of the future success to come!

tPP’s Research Showing Up in the News

We’re very pleased to see our past analysis mentioned in this recent article by the Philadelphia Inquirer, “Republican death threats are undermining our democracy.”

We’re always happy to work with journalists, academics, investigators, policy makers and practitioners who can make use of our data. Our goal remains the creation of a free, open source platform for knowledge construction, and until we’re able to make the entirety of the data set public, get in touch if we can be a resource.