New Spring 2020 tPP Syllabus Available!

As the Prosecution Project has grown and changed, it has benefited from utilizing the classroom. We have used the shared space as a laboratory, workshop, assembly line, and debate stage for developing our processes and concepts. In nearly every semester, we have been able to host a twice-weekly class that brings students together in the same space to work on tPP.

We are happy to share the Spring 2020 syllabus, focused on advanced qualitative coding.

To view Dr. Loadenthal’s past tPP syllabi focusing on advanced secondary research, project management and design, and analysis, see:


Preparing a report for the US Army War College

In December 2019, tPP was contacted by an individual with the United States Army War College seeking assistance with data. This individual asked if we could provide a number of measures of criminal defendants in order to compare defendants with and without military histories.

After a few emails and a phone call to ensure we could deliver the desired analysis product, our team went to work and within a few days completed two reports: “Active duty vs. discharged veterans and international vs. domestic affiliation in the Prosecution Project (tPP) dataset” and “Veteran versus civilian comparisons in the Prosecution Project (tPP) dataset.” We are pleased to share these reports here for others to review.

The Prosecution Project team is happy to assist individuals and institutions when our data can be useful. If you have a query which could benefit from tPP data, please let us know!

tPP’s Research Showing Up in the News

We’re very pleased to see our past analysis mentioned in this recent article by the Philadelphia Inquirer, “Republican death threats are undermining our democracy.”

We’re always happy to work with journalists, academics, investigators, policymakers, and practitioners who can make use of our data. Our goal remains the creation of a free, open-source platform for knowledge construction. Until we are able to make the entire data set public, please get in touch if we can be a resource.

Scraping the Violence Project’s Mass Shooter Database (part 2 of 2)

Reflecting back on the quick scraping exercise employing the Violence Project’s Mass Shooter Database (detailed in a previous blog post), I was puzzled by one thing.

In thinking through my own mental list of mass shootings that meet the criteria for inclusion in tPP, I could not determine why the 2009 shooting at Fort Hood, a US Army post in Texas, was missing. Why was the perpetrator, then-Major Nidal Hasan, not among my initial 13 results?

I decided to investigate. When I located Hasan’s record in the Mass Shooter Database, I noticed that my scraping approach had excluded him because his grievance/motive was coded as ‘other’, a category I had chosen to exclude. Seeing what was, to me, such a clearly socio-political crime coded this way made me wonder: are there crimes in this set that are coded as ‘other’ but in fact meet tPP’s definition of a socio-political motive?

To answer this, I re-ran the same procedure: excluding cases prior to 1990, excluding cases where the perpetrator died, and then excluding all remaining cases whose motive was coded as anything other than ‘other.’ This left only the 10 cases listed below:

  1. Michael Vernon
  2. Andrew Golden
  3. Mitchell Johnson
  4. Emanuel Burl Patterson
  5. Nidal Hasan
  6. Riccardo McCray
  7. William Hudson
  8. Robert Thomas
  9. Jarrod Ramos
  10. Kip Kinkel

From this list, I performed a cursory investigation into the unknown cases using a news aggregator (Nexis Uni) and a simple Boolean search string (“First Name Last Name” AND shooting AND year), into which I inserted each shooter’s name and the year of the incident. This string returned results for every case. After reading about the incidents, I found that none besides Hasan’s attack met the criteria for inclusion.
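Below is a minimal Python sketch of how a search string of this form could be assembled for each case on the list. The function name `build_query` is illustrative and is not part of any aggregator's tooling. A given database may also expect slightly different Boolean syntax.

```python
def build_query(full_name: str, year: int) -> str:
    """Return a search string of the form "First Last" AND shooting AND year."""
    # Normalize spacing so the quoted phrase matches the name as written.
    name = " ".join(full_name.split())
    return f'"{name}" AND shooting AND {year}'


print(build_query("Nidal Hasan", 2009))  # "Nidal Hasan" AND shooting AND 2009
```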

In total, the Mass Shooter Database provided:

  • 23 cases for initial review for possible inclusion
  • 14 of these cases met the criteria for secondary investigation and likely inclusion
  • 9 of these cases were new to tPP while 5 are used for triangulation and ensuring the reliability of our coding procedures

Although this secondary review did not yield any new cases (beyond triangulation information for the Hasan case), it did confirm the thoroughness of the original process and the integrity of the Mass Shooter Database’s conceptual coding. When I compared the Mass Shooter Database’s coding of Hasan with tPP’s coding, I found no disagreements, which lends further confidence to both projects’ data.

Scraping the Violence Project’s Mass Shooter Database (part 1 of 2)

The thoughtful Greg Reese from Miami’s Research Computing Support sent our team a link to a news story today. The email was titled, “Might have relevance to your work” and linked to a story by Vice News, “Nearly All Mass Shooters Since 1966 Have Had 4 Things in Common.”

This article presents a recent data set published by The Violence Project known as the Mass Shooter Database.

Greg was right to send this our way, as tPP’s case criteria likely overlap with those of the Mass Shooter Database (MSD). In its coverage of the data set, Vice noted:

“[Mass shootings are] also increasingly motivated by racial, religious, or misogynist hatred, particularly the ones that occurred in the past five years.”

As soon as I saw this, I requested access to the data set and promptly received a link. The meticulous, easy-to-navigate data set provided event data on 171 shootings. From there, I sorted columns to prioritize certain variable values and trimmed the 171 events down to those that would likely meet the inclusion criteria for tPP.

  • I began by eliminating any shootings prior to 1990, as these fall outside tPP’s date range.
  • I then used the “on scene outcome” variable to remove all cases where the shooter died on scene, keeping only those in which the individual was apprehended. Since tPP requires that a crime be charged, only individuals who survived their attack could be included.
  • I then sorted by motive. The data set codes for 13 “grievances” and “motivations.” Using these codes, I color-coded all cases that displayed any of the following values:
    • Racial element
    • Interest in white supremacy/Notable racism/Xenophobia
    • Religious hate
    • Homophobia

I also included two cases coded as “Notable misogyny,” as this is a recurring theme in some cases we have added to our project. I then eliminated all cases that displayed other grievances, as these would likely not meet our definition of a socio-political motive.
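The three screening steps above can be sketched in Python. This is a minimal illustration that rests on two assumptions. First, the `ShootingRecord` fields are hypothetical stand-ins for the spreadsheet's columns. Second, a case is kept if it carries at least one highlighted motive code; this is one reading of how the "other grievances" step was applied.

```python
from dataclasses import dataclass, field

# Motive codes highlighted in the screening above, plus "Notable misogyny".
SOCIO_POLITICAL_MOTIVES = frozenset({
    "Racial element",
    "Interest in white supremacy/Notable racism/Xenophobia",
    "Religious hate",
    "Homophobia",
    "Notable misogyny",
})


@dataclass
class ShootingRecord:
    # Hypothetical fields standing in for the spreadsheet's columns.
    shooter: str
    year: int
    died_on_scene: bool
    motives: set = field(default_factory=set)


def screen(records, start_year=1990, motives=SOCIO_POLITICAL_MOTIVES):
    """Return shooters who pass all three screening steps, in input order."""
    kept = []
    for r in records:
        if r.year < start_year:          # step 1: before tPP's date range
            continue
        if r.died_on_scene:              # step 2: nobody could be charged
            continue
        if not (r.motives & motives):    # step 3: no qualifying motive code
            continue
        kept.append(r.shooter)
    return kept


# Placeholder records for illustration only.
records = [
    ShootingRecord("Case A", 1987, False, {"Religious hate"}),
    ShootingRecord("Case B", 1999, True, {"Homophobia"}),
    ShootingRecord("Case C", 2015, False, {"Racial element"}),
    ShootingRecord("Case D", 2009, False, {"Other"}),
]
print(screen(records))  # ['Case C']
```

Calling `screen(records, motives={"Other"})` mirrors the second pass over ‘other’-coded cases described in part 2 above.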

This process produced a final set of 13 cases which, according to my interpretations of the coding criteria as provided by The Violence Project, likely meet the criteria for inclusion in tPP. These cases will subsequently be assigned to coders to investigate, and eventually coded for inclusion or exclusion. The cases identified (prior to individual investigation) are:

  1. Kenneth French
  2. Colin Ferguson
  3. Hastings Arthur Wise
  4. Richard Baumhammers
  5. Steven Stagner
  6. Chai Vang
  7. Dylann Roof
  8. Arcan Cetin
  9. Nikolas Cruz
  10. Dimitrios Pagourtzis
  11. Jarrod Ramos
  12. Robert Bowers
  13. Patrick Crusius

Our scraping procedure for data sets requires that we first check whether an incident is already included in the project. This involves searching the final data set as well as a series of ‘in progress’ sheets managed by coding teams. If the case is already included, as it is for 4 of the defendants above, we use the new data to triangulate our coding choices and, where warranted, modify them. The data provided by The Violence Project is more detailed in certain respects. Exploring those researchers’ coding decisions may therefore help us represent the record within tPP more accurately.

This search yielded confirmatory information on 4 cases and 9 likely new case starters. Coding teams will investigate these 9 cases and work them through the inclusion/exclusion decision tree. Cases that pass will be entered into the team workflow.
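Below is a minimal sketch of the duplicate check. It assumes that the final data set and each ‘in progress’ sheet can be read as lists of defendant names. Matching on normalized names alone is a simplification made for illustration. In practice, two defendants can share a name, so dates and locations would also need to be compared.

```python
def normalize(name: str) -> str:
    """Case- and whitespace-insensitive form of a defendant's name."""
    return " ".join(name.split()).casefold()


def triage(candidates, final_dataset, in_progress_sheets):
    """Split candidates into (already in tPP, likely new case starters)."""
    known = {normalize(n) for n in final_dataset}
    for sheet in in_progress_sheets:
        known.update(normalize(n) for n in sheet)
    existing = [c for c in candidates if normalize(c) in known]
    new = [c for c in candidates if normalize(c) not in known]
    return existing, new


# Placeholder names for illustration only.
existing, new = triage(
    ["Name A", "Name B", "name  c"],
    final_dataset=["Name A"],
    in_progress_sheets=[["Name C"], []],
)
print(existing, new)  # ['Name A', 'name  c'] ['Name B']
```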

tPP’s first journal publication!

We at tPP could not be more pleased to announce that in April 2019, two members of our Steering Team published the first peer-reviewed journal article based on the Prosecution Project data set!

The stellar team of Athena Chapekis and Sarah M. Moore published “The prosecution of ‘others’: presidential rhetoric and the interrelation of framing, legal prosecutions, and the Global War on Terror” as part of a four-article Special Section in the journal Critical Studies on Terrorism. The Special Section, titled “Emergent Voices in Critical Studies on Terrorism,” featured five undergraduate student researchers, each a first-time author. The papers passed a standard double-blind peer review and met strict academic and professional standards.

The abstract for the article is included below, and we invite you to read the complete article, available here!


In examining the Global War on Terror, the effects of presidential rhetoric on the framing of terrorism has been well documented. However, little previous work links terrorism and its status as an “othered” phenomenon to differential legal prosecution in a post-9/11 era. Using the Prosecution Project data set, we compared “othered” individuals, as defined by a Muslim, Arab/Middle Eastern, and/or foreign-born status, to “non-othered” individuals charged with terroristic felonies. Furthermore, we subdivided the dataset into three analytical time blocks: the George W. Bush administration immediately post-9/11, the latter half of the Bush administration, and the Obama administration. For the first and third time blocks, we found that “othered” individuals were prosecuted significantly more frequently than “non-othered” individuals. These findings call into question the effect of presidential rhetoric and the national framing of terrorism on the legal prosecution of “othered” individuals.

USA Today cites tPP

We’re very proud to see that the amazing work of our student researchers was quoted today by USA Today in their article, “AOC says she gets death threats after organizations air ‘hateful messages’ about her”.

We hope to be a resource to media, policy makers, researchers and advocates in the years to come as our data set grows and improves!

Have a question we can answer? Let us know!

tPP Preliminary Statistical Report #5 of 5

The following report was completed by statistics students using a version of the tPP dataset as of March 13, 2019. These analyses focus on developing models for future use. Their interpretations and conclusions reflect a dataset still in development and only a superficial engagement with the wider literature on political violence. We continue to expand, improve, and refine the data, so these analyses should be seen as preliminary and subject to change. The views expressed in these reports belong solely to the authors, do not necessarily reflect the findings of the tPP team, and are subject to further inquiry and revision.

Below you will find the non-technical and technical analytical summaries and selected visualizations for Team #5 (Classification & Characteristic Tree Analysis) of 5.

This report was authored by Brent Crist, Elena McDonald, Yuan Liu, and Xinru Yu. To download the complete report, including the statistical source code, click here.

Non-Technical Summary


Classification of terrorist attacks is a central problem for the Prosecution Project. Terrorism is one of the most prominent topics in the news today, due to its increasing prevalence. On a case-by-case basis, it is interesting to see how the government classifies each act of terrorism or political violence. Of particular interest are three groups of cases:
  • cases whose only reason for inclusion is the “State Speech Act”
  • cases that combine a State Speech Act with other reasons
  • cases that involve no State Speech at all
Identifying why and how the government labels these cases provides an opportunity for analysis. The data come from the Prosecution Project (tPP), based in the Sociology department at Miami University. For each case, they record the Reason for Inclusion, Tactic, Number Killed, Number Injured, and Othered Status. The tPP dataset develops a taxonomy of felony criminal cases involving illegal political violence that occurred in the United States since 1990. Using the tPP dataset allows for an explanation of how the government classifies these cases, what effects these variables have on that decision, and how the decision changes over time.


The Lethality variable is split by the Reason for Inclusion categories:
  • State Speech: the motivation for the terrorist act is explicitly political.
  • No State Speech: the motivation for the terrorist act does not involve political purposes.
  • Combination: a mixture of the two.
To better examine the distribution of lethality, the table below gives the mean and standard deviation of lethality for each Reason, along with the number of cases in each. State Speech has the lowest mean and standard deviation, while No State Speech and Combination show much larger variability. State Speech also has the fewest cases.

The table below lists the top three Methods attackers use for each Reason for Inclusion. Providing Support to a terrorist organization is the top Method for both No State Speech and Combination. Non-Political is the most common Method for State Speech, representing over half of all State Speech cases. Terrorist attacks covered in the news in recent years have generally involved explosives, firearms, and/or vehicle ramming. Even so, the Explosives Method appears less frequently than that coverage might lead one to expect.

The third variable of interest is Othered Status. The table below again breaks down Othered Status by Reason for Inclusion. For both State Speech and Combination, Othered individuals heavily outnumber Non-Othered individuals. In No State Speech cases, the two groups are split almost evenly.


In the lethality breakdown, No State Speech is the most common Reason for Inclusion, while State Speech accounts for far fewer cases. Interestingly, providing support to terrorists or terrorist organizations is the most frequent Method for both No State Speech and Combination. Given the size of both categories, researchers should find the prevalence of Provide Support in each of them noteworthy. Across all cases, an individual’s Othered Status might help researchers better understand how the state labels people as terrorists. The State Speech and Combination categories imply an attack directed against the state. Comparing Othered Status across these categories may therefore inform researchers studying the othered status of terrorists.

Technical Summary


The Prosecution Project provides a chance to determine when, and due to which factors, the state labels a criminal act as terrorism. This analysis uses several techniques to determine how these acts come to be included:
  • Data manipulation and cleaning create convenient (and statistically viable) groupings.
  • Summary statistics and data visualization further clarify how these variables change over time and how they relate to one another.
  • A characteristic tree is a strong method for analyzing which factors lead the government to label criminal acts as terrorism.
  • A random forest allows the pruned tree to be validated.


Data cleaning and manipulation are the first two crucial steps of a proper analysis. For the tPP data, the research question revolves around the following variables: Reason for Inclusion, Tactic, Lethality, Othered Status, and Date.

Lethality is not a variable in the data set. We construct it by adding the number killed and the number injured as a result of the offense in each case.

To address the time element of the research question, presidential terms serve as meaningful intervals for comparison. Comparing the date of each event with the inauguration date of each president within the scope of the data assigns every case to an administration. The earliest case in the data occurs during Bill Clinton’s presidency and the latest during Donald Trump’s, with George W. Bush and Barack Obama in between. Adding each president’s political party provides another layer of analysis and comparison.
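Below is a minimal Python sketch of the two constructed variables; the report itself did this in R. The inauguration dates are the January 20 start of each term. This sketch makes two choices of its own that the report does not document: inauguration day counts toward the incoming president, and events before Clinton's first term return `None`.

```python
import bisect
from datetime import date

# (inauguration date, president, party) for the four administrations covered.
ADMINISTRATIONS = [
    (date(1993, 1, 20), "Clinton", "Democratic"),
    (date(2001, 1, 20), "Bush", "Republican"),
    (date(2009, 1, 20), "Obama", "Democratic"),
    (date(2017, 1, 20), "Trump", "Republican"),
]
_STARTS = [start for start, _, _ in ADMINISTRATIONS]


def lethality(killed: int, injured: int) -> int:
    """Lethality as defined in the report: number killed plus number injured."""
    return killed + injured


def administration(event_date: date):
    """Return (president, party) in office on event_date.

    Inauguration day counts toward the incoming president. Dates before
    1993-01-20 return None. No end date is applied to the last term, which
    is adequate for data collected through March 2019.
    """
    i = bisect.bisect_right(_STARTS, event_date) - 1
    if i < 0:
        return None
    _, president, party = ADMINISTRATIONS[i]
    return president, party


print(lethality(2, 5), administration(date(2001, 1, 20)))  # 7 ('Bush', 'Republican')
```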

For the characteristic tree analysis, the Tactic variable, which has twenty unique levels, must be reduced. Fewer levels give the characteristic trees later in the analysis more splitting power. The percentage of cases involving each tactic hints at how much information that tactic contributes to the overall analysis. Collapsing the original twenty levels into eight (seven plus Other) strengthens the resulting analysis.

Reason for Inclusion must also be recoded. To focus on the prevalence of the State Speech Act, Reason for Inclusion is split into three groups:
  • State Speech: cases whose only reason is the State Speech Act.
  • Not State Speech: cases that do not involve the State Speech Act.
  • Combination: cases that combine the State Speech Act with other reasons.
With this new variable, along with the others, the data are ready for investigation. Summary statistics for Reason, Method, Lethality, and Othered Status show how the data behave. Bar graphs of the same variables, separated by President, show how each changes over time. The bar graphs for Reason, Method, and Othered Status show proportions, while the bar graph for Lethality shows a count.
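The three-way recoding can be sketched as a small function. The sketch assumes, hypothetically, that each case's Reason for Inclusion is available as a collection of labels and that the State Speech Act label is spelled exactly `"State Speech Act"`. The other label used below is a placeholder.

```python
STATE_SPEECH_ACT = "State Speech Act"


def recode_reason(reasons) -> str:
    """Collapse a case's reasons for inclusion into three groups."""
    reasons = set(reasons)
    if reasons == {STATE_SPEECH_ACT}:
        return "State Speech"        # State Speech Act is the only reason
    if STATE_SPEECH_ACT in reasons:
        return "Combination"         # State Speech Act plus other reasons
    return "No State Speech"         # State Speech Act absent


# "Other reason" is a placeholder label for illustration.
for case in (["State Speech Act"], ["State Speech Act", "Other reason"], ["Other reason"]):
    print(recode_reason(case))
```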

A characteristic tree (Buntine, 1992) can help identify which factors lead the government to include each case, and the reason for its inclusion. Building a characteristic tree alone is not enough; cross-validation and a random forest both provide insight into how well the tree fits the data. In R, this is done by partitioning the data into a training set and a testing set. Fitting a tree with a cost-complexity penalty on each partition produces the optimal tree, which then undergoes cross-validation (Zhong, 2016).
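Below is a sketch of a reproducible random training/testing split in Python. The 70/30 proportion and the seed are assumptions for illustration, because the report does not state the proportions it used. A split stratified by Reason for Inclusion is a common alternative.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=2019):
    """Shuffle a copy of rows with a fixed seed and return (train, test)."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # same seed -> same split every run
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


train, test = train_test_split(range(10))
print(len(train), len(test))  # 7 3
```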

Comparing the predicted values with the real data gives the accuracy of these models. The Random Forest, which creates a large sample of random trees, provides a further test of accuracy (Zhong, 2016). Each of these trees splits on a random selection of the variables, and generating many of them yields more evidence of model accuracy. Because the random forest generalizes the process, comparing its predictions on the testing data set gives a stronger accuracy measure.
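A random forest has two sources of randomness, sketched below. Each tree is grown on a bootstrap sample of the rows, and at each node it considers only a random subset of the variables. Drawing floor(sqrt(p)) candidates per split mirrors the classification default of the randomForest package. That package's default number of trees (500) also matches the forest described below. The function names are illustrative, and the tree-growing itself is omitted.

```python
import math
import random

VARIABLES = ("Method", "President", "Lethality", "Othered Status")


def bootstrap_rows(n_rows, rng):
    """Rows for one tree (drawn with replacement) and its out-of-bag rows."""
    in_bag = [rng.randrange(n_rows) for _ in range(n_rows)]
    out_of_bag = sorted(set(range(n_rows)) - set(in_bag))
    return in_bag, out_of_bag


def split_candidates(variables, rng):
    """Variables one node may split on: floor(sqrt(p)) drawn without replacement."""
    m = max(1, math.isqrt(len(variables)))
    return rng.sample(list(variables), m)


rng = random.Random(1)
in_bag, out_of_bag = bootstrap_rows(10, rng)
print(len(in_bag), out_of_bag)
print(split_candidates(VARIABLES, rng))  # two of the four variables
```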

Several R packages are essential to the methods of this analysis: lubridate (Grolemund and Wickham, 2011), caret (Kuhn and Others, 2019), rpart (Therneau and Atkinson, 2018), rpart.plot (Milborrow, 2018), and randomForest (Liaw and Wiener, 2002).


Execution methods play a vital role in understanding the motives of terrorist attacks and in their inclusion in this dataset. The Prosecution Project includes an exhaustive list of methods detailing how the acts are committed. Grouping methods with similar tactics, however, allows for proper analysis. The grouping follows three rules:
  • Acts and threats to commit those acts are placed in the same group (e.g., “Explosives” and “Bomb Threats” become “Explosives”).
  • “Unspecified” tactics do not contribute to a deeper understanding and are therefore excluded.
  • All tactics that make up less than 1% of the total and do not fit neatly into the groups above (Animal Release, Blockading, Unarmed Assault, Vandalism) are also excluded (see the Prevalence of Tactic table in the Appendix for details).
Combined with the Reason for Inclusion, these categories offer insight into how an attack is carried out given its motivation. The table below shows the prevalence of each Method in relation to each Reason for Inclusion.
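The grouping rules above can be sketched in Python. Only the Explosives/“Bomb Threats” grouping and the exclusion rules come from the report; the other mapping entries are placeholders. Keeping frequent unmapped tactics under their own label is an assumption of this sketch.

```python
from collections import Counter

# Partial, illustrative mapping from raw tactics to analysis Methods. Only the
# Explosives / "Bomb Threats" grouping is from the report; the rest is a placeholder.
METHOD_MAP = {
    "Explosives": "Explosives",
    "Bomb Threats": "Explosives",
    "Provide Support": "Provide Support",
}
EXCLUDED = {"Unspecified"}


def collapse_tactics(tactics, mapping=METHOD_MAP, min_share=0.01):
    """Map raw tactics to Methods, dropping "Unspecified" and rare unmapped tactics."""
    counts = Counter(tactics)
    total = len(tactics)
    methods = []
    for t in tactics:
        if t in EXCLUDED:
            continue                      # uninformative, always dropped
        if t in mapping:
            methods.append(mapping[t])    # fits one of the Method groups
        elif counts[t] / total >= min_share:
            methods.append(t)             # frequent enough to keep as its own level
        # else: under 1% and fits no group, so it is dropped
    return methods


# Toy counts for illustration only; Vandalism is 1 of 200 (0.5%) and is dropped.
raw = (
    ["Explosives"] * 60 + ["Bomb Threats"] * 20 + ["Provide Support"] * 100
    + ["Vandalism"] + ["Unspecified"] * 19
)
print(Counter(collapse_tactics(raw)))
```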

Interestingly, more than half of State Speech cases are Non-Political (e.g., James Tyler Williams, who killed a gay couple because they were gay). The majority of State Speech cases are thus Non-Political: non-violent crimes relating to assisting terrorism or impeding the state’s ability to pursue these crimes. No State Speech’s top three Methods together account for 62.2% of the cases in this category. This indicates a wider spread of crime types than in State Speech or Combination, which are concentrated in Non-Political and Provide Support, respectively.

The summary statistics of lethality per Method show how lethality varies across these crimes. For instance, the mean lethality of Firearms should differ from that of Provide Support. The standard deviation shows the spread of lethality within each Method.
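Below is a minimal sketch of the per-Method summary. It uses the sample standard deviation, which is the same definition R's `sd()` uses. The input format, pairs of Method and Lethality, is an assumption for illustration.

```python
from collections import defaultdict
from statistics import mean, stdev


def lethality_summary(rows):
    """rows: iterable of (method, lethality). Returns {method: (n, mean, sd)}.

    sd is the sample standard deviation (n - 1 denominator) and is None when a
    method has only one case, since it is undefined there.
    """
    groups = defaultdict(list)
    for method, value in rows:
        groups[method].append(value)
    return {
        method: (len(vals), mean(vals), stdev(vals) if len(vals) > 1 else None)
        for method, vals in groups.items()
    }


# Toy values for illustration only; these are not tPP figures.
summary = lethality_summary([("Firearms", 2), ("Firearms", 4), ("Firearms", 6),
                             ("Provide Support", 0), ("Provide Support", 0)])
print(summary)
```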

Most cases yield results that fit the narrative of terrorism. Note the higher means in the Firearms and Hostage/Standoff categories and the lower means in the Non-Political and Provide Support categories. Higher standard deviations in the Explosives, Firearms, and Hostage/Standoff categories indicate greater uncertainty about how many people are likely to be killed or injured in one of these attacks.

An individual’s Othered Status also yields notable statistics for the Reason for Inclusion. It is critical to note that Othered Status is quite subjective and is not a uniform label: there are no exact criteria for a terrorist to be given Othered Status. Mapping a person’s Othered Status to the reason their crime was included in the database offers insight into how the State might perceive an othered person’s crime.

State Speech has the largest discrepancy between Othered and Non-Othered Status. That is, the vast majority of terroristic acts involving State Speech are committed by Othered people. Whether this relates to when these acts occurred is examined later in this report. No State Speech is split almost evenly between Othered and Non-Othered people. The guidelines for No State Speech are less specific than those for the other Reasons for Inclusion. These cases may therefore reflect less motivated attacks and more senseless violence, whether committed by Othered or Non-Othered people. In the Combination category, just over twice as many Othered people as Non-Othered people commit terrorist attacks.

As technology and geopolitical climates change with time, so too do the methods of terroristic acts. Grouping methods by the President in office when each act occurred allows these changes to be represented visually.

Drastic changes occur over the years as the geopolitical climate shifts. Note the massive increase in the Provide Support (yellow) Method in the Bush, Obama, and Trump Administrations compared with the Clinton Administration. This may be largely because the Global War on Terror took place during these Presidencies but not during Clinton’s. It is not unreasonable to believe that the United States, as a strategy to deter violent terrorism, is labeling more non-violent crimes as terrorism than in years past. Since the United States is an economic superpower, its dollar has more buying power around the world. As a result, terrorist sympathizers can raise cash with much more buying power than in their home countries (assuming they support foreign terrorist organizations). This, in turn, enables foreign terrorist organizations to acquire far more supplies for violent terrorist attacks.

The change in methods over time offers insight into how the United States labels a crime as a terroristic act, and the Othered Status of defendants also changes over time. Differing conditions in the United States during the four Presidencies in this dataset might offer clues to how the status changes.

Note, again, how the Othered Status of terrorists changes drastically after the Clinton era. During this period, the United States may have experienced heightened sensitivity to terrorism, due largely to the loss of life in the September 11 attacks. As the Global War on Terror continues, the share of Othered terrorists declines. It requires further study to determine the cause. One possibility is the more liberal Obama Administration combined with the smaller sample for the Trump Administration. Another is that the United States and its citizens are becoming less suspicious of the people committing these crimes.

Examining how the lethality of these acts changes over time gives clues to how violent the crimes committed in each period are. Given the rise in non-violent methods over the past three Presidencies, comparing lethality counts across these terms will shed light on how many people were killed or injured in violent terrorist attacks in each period.

The lethality of these attacks again increases substantially in the years following the September 11 attacks. It is important to note that President Donald Trump had been in office for just over two years at the time of this analysis. The markedly lower lethality for his administration may be due mostly to its much smaller sample size.

Having seen how Method, Othered Status, and Lethality have changed over time, we turn to how the Reasons for Inclusion of all terroristic acts in the dataset have changed. Political moods and outside factors might affect how these crimes are included, and this can be visualized by plotting the Reasons for Inclusion across the four Presidencies in tPP.

A large change in No State Speech occurs from the Clinton to the Bush Administration. It is difficult to determine whether this is due to a shift in the United States’ sentiment toward terrorism after the September 11 attacks or to some other unknown variable. The Combination and State Speech groups show the largest changes from the Clinton to the Bush Administration. From the Bush to the Obama Administration, these two categories change again, with State Speech becoming less prevalent and Combination more prevalent. The increase in the Combination group might be a result of Barack Obama being the first African American President in the history of the United States. A terroristic act motivated by this fact, along with other racially charged motivations, would qualify for the Combination group. However, this hypothesis requires further analysis and is beyond the scope of this study.

Machine learning methods can classify each case by its Reason for Inclusion using the other variables in the dataset. The algorithm splits nodes on Method, President, and Lethality, and it uses these factors to decide where a case is likely to fit. It then builds a Classification Tree from these splits.
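The splitting step can be illustrated with Gini impurity, which is rpart's default criterion for classification trees. This sketch is a simplification. It only tries splits of the form "level vs. all other levels" for one categorical variable. rpart, by contrast, searches over groupings of levels and over every candidate variable.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_level_split(values, labels):
    """Best "value == level" vs. "value != level" split of one categorical
    variable, scored by size-weighted Gini impurity. Returns (level, impurity)."""
    n = len(labels)
    best = None
    for level in sorted(set(values)):
        left = [y for x, y in zip(values, labels) if x == level]
        right = [y for x, y in zip(values, labels) if x != level]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if best is None or score < best[1]:
            best = (level, score)
    return best


# Toy data for illustration only.
methods = ["Non-Political", "Non-Political", "Firearms", "Provide Support"]
reasons = ["State Speech", "State Speech", "No State Speech", "Combination"]
print(best_level_split(methods, reasons))  # ('Non-Political', 0.25)
```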

The most important factor in this tree is Method. At the first partition, one branch contains every Method except Non-Political and Perjury/Obstruction of Justice. Those two lead to the State Speech node on the right, and they are the only Methods for which a case is likely to be State Speech. Of note, President is the second partition under both the No State Speech and the State Speech nodes, and Obama appears in both positive splits on President. Only 10% of the entries fall under the Non-Political Method. Additionally, no partition in this tree uses Othered Status or Lethality. The tree thus traces a path showing how the government categorizes each type of case. The accuracy of this optimized tree is about 65%, obtained by comparing the predicted values with those in the testing data set. A confusion matrix breaks this performance down by class: the model is most accurate in predicting State Speech cases and least accurate for Combination cases.
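The overall and per-class accuracy figures can be computed from a confusion matrix, as sketched below. The function names are illustrative. Rows are actual classes and columns are predicted classes.

```python
from collections import Counter

LABELS = ("State Speech", "No State Speech", "Combination")


def confusion_matrix(actual, predicted, labels=LABELS):
    """Rows are actual classes, columns are predicted classes, both in `labels` order."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def accuracy(actual, predicted):
    """Share of test cases whose predicted class equals the actual class."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def per_class_accuracy(actual, predicted, labels=LABELS):
    """For each actual class, the share of its cases predicted correctly (recall)."""
    matrix = confusion_matrix(actual, predicted, labels)
    return {
        label: (row[i] / sum(row) if sum(row) else None)
        for i, (label, row) in enumerate(zip(labels, matrix))
    }


# Toy test-set labels for illustration only.
actual = ["State Speech", "State Speech", "No State Speech", "Combination", "Combination"]
predicted = ["State Speech", "State Speech", "No State Speech", "No State Speech", "Combination"]
print(accuracy(actual, predicted))  # 0.8
print(per_class_accuracy(actual, predicted))
```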

The procedure above for obtaining a pruned tree uses training and testing data sets. Training a model on one part of the data and testing it on the other part is a form of cross-validation. A random forest offers another way to check the model’s accuracy, since it allows individual trees to be validated. The random forest importance plots summarize five hundred random trees grown from the data.

The Mean Decrease Accuracy and Mean Decrease Gini plots show how important each variable is to the partitioning process in building a characteristic tree. In both plots, the farther right a variable lies on the x-axis, the greater its role in partitioning across the randomly generated trees in the forest. As in the earlier single characteristic tree, Method is again the most important variable for determining whether an act is State Speech, No State Speech, or Combination. There are large gaps between the variables, meaning the partitioning process becomes less accurate. Even so, the ordering of the variables supports the validity of the single tree. According to the random forest, Lethality and Othered Status are the two least important predictors. In sum, the order of importance for determining the Reason for Inclusion is Method, President, Lethality, and Othered Status. The accuracy rate of the random forest is 70.2%, meaning the model predicts cases correctly 70.2% of the time.
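Mean Decrease Accuracy is a permutation importance. You shuffle one variable's values, re-predict, and measure how much accuracy falls. The sketch below applies that idea to a single fitted classifier on a held-out set. The randomForest package instead computes its version per tree on out-of-bag cases and averages across trees, so this sketch approximates the idea rather than reproducing that package's exact procedure. The `predict` argument and the row format are assumptions.

```python
import random


def permutation_importance(predict, rows, labels, variables, n_repeats=10, seed=0):
    """Mean drop in accuracy when one variable's values are shuffled across rows.

    `predict` maps a row (a dict of variable -> value) to a class label.
    """
    rng = random.Random(seed)

    def acc(data):
        return sum(predict(r) == y for r, y in zip(data, labels)) / len(labels)

    baseline = acc(rows)
    importance = {}
    for var in variables:
        drops = []
        for _ in range(n_repeats):
            values = [r[var] for r in rows]
            rng.shuffle(values)                       # break the link to the label
            permuted = [{**r, var: v} for r, v in zip(rows, values)]
            drops.append(baseline - acc(permuted))
        importance[var] = sum(drops) / n_repeats
    return importance


# Hypothetical classifier that, like the pruned tree's first split, looks only at Method.
def toy_predict(row):
    if row["Method"] in ("Non-Political", "Perjury/Obstruction of Justice"):
        return "State Speech"
    return "Combination" if row["Method"] == "Provide Support" else "No State Speech"


# Toy rows for illustration only.
rows = [
    {"Method": "Non-Political", "President": "Obama"},
    {"Method": "Firearms", "President": "Bush"},
    {"Method": "Provide Support", "President": "Obama"},
    {"Method": "Provide Support", "President": "Bush"},
]
labels = ["State Speech", "No State Speech", "Combination", "Combination"]
print(permutation_importance(toy_predict, rows, labels, ["Method", "President"]))
```

Because `toy_predict` ignores President, shuffling that variable leaves accuracy unchanged, so its importance is zero.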


Many outside political factors (e.g., the September 11 attacks, the Global War on Terror, the Presidential Administration) can affect whether the government classifies crimes as terroristic acts. These classifications change over time. The methods involved play a significant role in determining whether cases are state speech acts of terrorism, not affiliated with state speech, or a combination of the two. In predicting how the government will classify a case, two variables have the most impact, in order of importance: the method by which a crime is committed and the President in office at the time. The remaining variables provide less information; in order of relevance, they are the lethality of the crime and the Othered Status of the terrorist committing it. The splitting power of President in the characteristic trees reinforces the finding that the Reason for Inclusion changes over time. The random forest, through the Mean Decrease Accuracy and the Mean Decrease Gini, confirms the importance of the variables in the pruned tree. Not surprisingly, the pruned tree ended with five terminal nodes: two No State Speech, two Combination, and only one State Speech. These results are consistent with the raw counts of each reason for inclusion. The pruned tree has a 65% accuracy rate and the random forest a 70% accuracy rate. There is therefore reason to believe that these variables are important factors in whether a terroristic act will be classified as a state speech act, a non-state speech act, or a combination of the two.