An Exploratory Dive into the Dark Network Links of Far-Right tPP Cases


The posts below are brief summaries of 14-week research projects designed and carried out by our student team. tPP plans to release the full studies as peer-reviewed publications in the future.



Meg Drown

Far-right extremists have moved en masse to dark web social media platforms and have adopted cryptocurrencies as a means of crowdsourced fundraising. This move has largely been driven by the efforts of big tech companies to stem the flow of extremist content on their websites by removing users who express extremist views or are otherwise connected to extremist organizations. Many on the far-right have publicly renounced Facebook, Twitter, and other tech companies, claiming that their actions to remove extremist content, especially content originating from the far-right, infringe on Americans’ right to free speech [1].

Although detailed user agreements place constraints on the content that users broadcast, prohibiting the kind of insulting and hateful speech that is often expressed by those on the far-right, leaders and organizers on the far-right have gained momentum by politicizing these removals. Meanwhile, new sites have arisen to give far-right extremists a “safe haven” to express their views. The creator of the social media platform Gab has told media outlets that the purpose of Gab was to create an online platform specifically for conservatives and the far-right, whom he believes have been treated unfairly by big tech. The site’s lax regulation of what would normally be considered hate speech and its targeted advertising towards conservatives have combined to create the perfect storm, or what has been described as a “hate-filled echo chamber full of racism and conspiracy theories” [2].

Likewise, 8chan, an imageboard and offshoot of 4chan, is another well-known site that harbors extremist content. The manifesto released by 28-year-old Brenton Tarrant, the man who murdered 50 Muslims at two mosques in Christchurch, New Zealand, was purportedly circulated via 8chan and fell into the hands of another impressionable extremist. John Earnest, a 19-year-old who has been indicted on 109 hate crime charges, carried out an attack on a synagogue in late April 2019, leaving one dead and three injured. According to a Vox article detailing the apparent perils of 8chan, Earnest was inspired to carry out the attack, in part, by the radical ideology outlined in Tarrant’s manifesto [3].

Fintech companies have pursued similar action against extremist users. Mainstream digital payment and fundraising services such as PayPal and Amazon have been proactively identifying and denying access to users who fundraise on their sites for nefarious purposes. Richard Spencer and other prominent voices on the far-right have celebrated Bitcoin as a way to fundraise for their online platforms. Bitcoin and other cryptocurrencies are distinctive because of their peer-to-peer (P2P) transactional features, which make it easy to hide under the guise of anonymity while soliciting or extorting money for various purposes [4]. Though law enforcement and those using Bitcoin to fundraise have cited its apparent anonymity as the defining feature of the platform, scholars have asserted that Bitcoin is one notch below anonymous. Rather, Bitcoin and many of its crypto counterparts are pseudonymous, because the endpoints of straightforward transactions can be identified.

An ambition of many open-source intelligence analysts is to identify and track the financial networks of far-right actors. Such analysts have been highly successful at identifying traditional transactional networks and, recently, crypto transactional networks. John Bambenek, an open-source intelligence researcher and professor of cybersecurity at the iSchool in Illinois, does just that [5]. Specifically, Bambenek tracks the donations received by white nationalist BTC wallets, the amount spent, and their balance, which he records in a daily wallet summary report via his Twitter account, Neonazi BTC Tracker (@NeonaziWallets) [6]. Bambenek also posts a separate tweet whenever one of the white nationalist BTC wallets records a withdrawal or receives a substantial donation. For all of the apparent anonymity benefits of using BTC, highly skilled computer scientists are able to identify and track specific BTC wallets by applying mathematical algorithms to the BTC transaction log, which is public by design.
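To make the idea of a daily wallet summary concrete, here is a minimal sketch in Python of how received, spent, and balance totals could be derived from a public transaction log. It is not Bambenek's method: the ledger format, the addresses, and the amounts are all invented for illustration, and real Bitcoin transactions have multiple inputs and outputs that this simplified one-sender, one-receiver layout ignores.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical ledger entries: (txid, sender, receiver, amount in BTC).
# Every address and amount here is invented for illustration.
LEDGER = [
    ("tx1", "addr_donor_1", "addr_watched", Decimal("0.50")),
    ("tx2", "addr_donor_2", "addr_watched", Decimal("0.25")),
    ("tx3", "addr_watched", "addr_exchange", Decimal("0.60")),
]


def wallet_summary(ledger, watched):
    """Return total received, total spent, and balance per watched address."""
    summary = defaultdict(lambda: {"received": Decimal("0"), "spent": Decimal("0")})
    for _txid, sender, receiver, amount in ledger:
        if receiver in watched:
            summary[receiver]["received"] += amount
        if sender in watched:
            summary[sender]["spent"] += amount
    # Balance is simply what came in minus what went out.
    return {
        addr: {**totals, "balance": totals["received"] - totals["spent"]}
        for addr, totals in summary.items()
    }


report = wallet_summary(LEDGER, {"addr_watched"})
```

Because the log is public, anyone who knows a wallet address can recompute such a summary; the hard part, which this sketch does not attempt, is attributing an address to a person.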

Keeping tPP in mind while researching the shift of far-right actors to cryptocurrencies and dark web platforms, I hoped to identify individuals in tPP who belong to a crypto transactional network with members of the far-right who have risen to prominence in recent years and have gained traction in the Bitcoin and dark web realms. However, given my limited ability to identify the users who send Bitcoin donations to these actors, and the sheer volume of transactions between their accounts, I found the task impracticable in the limited time available.

However, I did find that individuals in tPP who are coded as Rightist: Identity-focused under the variable Ideological Affiliation, especially those occurring after the Charlottesville “Unite the Right” rally in 2017, had maintained a presence on dark web forums and were, perhaps, inspired by extremist media circulated on these forums. Wanting to delve deeper into the dark web links of individuals in tPP, I took an exploratory sample of those coded as Rightist: Identity-focused occurring after 12 August 2017. I then created a link analysis that identified how the various actors in the exploratory sample were connected with one another.

To do so, I collected open-source data on the individuals via court documents, newspaper articles, and examination of dark web content that had been released online. Though the results were rather underwhelming (most individuals who were linked to one another were linked through organizational ties), I did find that several members of my exploratory sample had maintained ties with prominent far-right organizers, such as Richard Spencer and Eli Mosley, or with others in tPP who had carried out high-profile attacks, such as Dylann Roof and Robert Bowers. In fact, Bowers purportedly decried the prosecutions of various members of the Rise Above Movement (RAM), described as “a Southern California-based racist fight club” [7], who appeared in the exploratory sample, and had allegedly corresponded with the leader of RAM, Robert Rundo, via Gab.
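A link analysis of this kind can be sketched as a small graph-building exercise. The Python below is only an illustration of the general technique, not the procedure used in the study: the individuals, groups, and forums are placeholder names, and it assumes each open-source finding has already been reduced to an (individual, affiliation) pair.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical affiliation records: (individual, organization or forum).
# All names are placeholders, not cases from tPP.
AFFILIATIONS = [
    ("Person A", "Group X"),
    ("Person B", "Group X"),
    ("Person B", "Forum Y"),
    ("Person C", "Forum Y"),
    ("Person D", "Group Z"),
]


def build_links(affiliations):
    """Link every pair of individuals who share at least one affiliation."""
    members = defaultdict(set)
    for person, org in affiliations:
        members[org].add(person)
    links = defaultdict(set)
    for org, people in members.items():
        # Sorting gives each pair a single canonical (a, b) ordering.
        for a, b in combinations(sorted(people), 2):
            links[(a, b)].add(org)
    return dict(links)


links = build_links(AFFILIATIONS)
```

Each key in `links` is a pair of individuals and each value is the set of shared ties, so a pair connected only through an organization is easy to tell apart from one connected through a forum.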

Though the subject sample was small and the findings only marginally supportive of a dark web network among tPP individuals, my paper revealed that there are demonstrable links between actors on the right through dark web social media platforms such as Gab, Discord, and 8chan. Further studies can and should be carried out so that we can better understand how individuals occurring in tPP interact and position themselves in the far-right movement through dark web participation.

Notes

[1] Kirkland, “Relegated To Fringe Platforms, White Nationalists Stuck In Own Echo Chamber”; “Big Tech, the Alt-Right and the Unknown Future of the Internet”; “Inside the Hate-Filled Echo Chamber of Racism and Conspiracy Theories | Media | The Guardian.”

[2] “Inside the Hate-Filled Echo Chamber of Racism and Conspiracy Theories | Media | The Guardian.”

[3] Stewart, “8chan, Explained.”

[4] Mabunda, “Cryptocurrency.”

[5] Matsakis, Koebler, and Pearson, “This Twitter Bot Tracks Neo-Nazi Bitcoin Transactions.”

[6] Tracker, “New Payment to Henrik Palmgren (Http://RedIce.Tv ): 0.00519921 BTC ($20.16) Https://Blockchain.Info/Tx/127b726aa6ad4c43d41b1b6783d1a71e05c27deeae7a393b44ced91a032948a7 … Total of Henrik Palmgren (Http://RedIce.Tv ) BTC Wallets: 0 BTC ($0).”

[7] “Rise Above Movement.”

Friend or Foe?: An Analysis of Factors Influencing Sentence Length in the Prosecution of Terrorism



Megan Burtis & Liz Butler

Our research project utilized a grounded theory case study analysis to determine which factors influence the extent to which the Federal Sentencing Guidelines are adhered to in the prosecution of terrorism cases.

The cases we analyzed were United States v. Burgert et al., United States v. Boyd et al., and United States v. Dibee et al. All findings within our paper were the result of the analysis of these three case studies. Using a grounded theory approach, the analysis yielded specific categories which provide a theory as to what factors have the greatest impact on sentencing. Our paper theorizes that government manipulation of the Federal Sentencing Guidelines plays the biggest role in determining the final sentence length of defendants prosecuted for terroristic crimes. Thus, the way in which the government views a defendant ultimately determines their sentence.

Four key factors were found to influence the government’s view of defendants: the plea entered by the defendant, the level of regret the defendant shows for the crime committed, the degree to which the defendant continues to support the ideology that motivated the crime, and the extent to which the defendant cooperated with the government during both the investigation and adjudication. Evaluating these factors allowed defendants to be placed in specific categories, as shown in the table, which reflect whether they will receive sentences at the lower or higher end of what was recommended.

Our research tentatively supported our initial hypothesis that race/ethnicity, citizenship status, and “othered” status would be influential factors, but we would require more evidence to make this claim with any degree of certainty. Finally, these findings have significant implications for future research, specifically pertaining to the use of terrorism enhancements and plea bargains. Further research is recommended to determine whether either, both, or neither of these strategies is suitable as a counterterrorism measure. Further research will also be required to test the generalizability of our theory.

Deportation Station: How the United States Decides Who Stays and Who Goes



Zoe Belford

My paper assessed what conditions lead to an increased likelihood of deportation following a guilty verdict in a United States terrorism prosecution, as well as if and how this relates to post-9/11 national security policy. My sample included all cases in the Prosecution Project’s database that involved a defendant with foreign citizenship and ended in a guilty verdict. This resulted in a sample size of 306, which I divided into two subsamples: cases that ended in deportation and cases that did not. Using these two subsamples, I conducted a descriptive statistical analysis to determine whether any notable differences existed between the two groups.

My findings were as follows. Compared to non-deported defendants, deported defendants were:

    • Less likely to have a case involving a co-defendant
    • Less likely to have been charged with a previous similar crime
    • More likely to have completed the crime they were charged with
    • Less likely to have their case involve an informant
    • Less likely to be affiliated with a foreign terrorist organization
    • Sentenced, on average, to significantly shorter terms
    • More likely to have an unclear ideological affiliation
    • Less likely to have an affiliation with a Salafi/Jihadist ideology
    • More likely to be Middle Eastern/North African

All of these findings hold the potential for further research, but I focused on the variable of foreign terrorist organization (FTO) affiliation. I found that deported defendants are known to be FTO-affiliated in only 35% of cases, whereas non-deported defendants are known to be FTO-affiliated in 72% of cases.
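The comparison behind figures like these is a simple split-and-proportion calculation. As a rough sketch in Python, assuming each case is reduced to two yes/no fields (the field names and the seven records below are invented, and do not reproduce the 306-case sample):

```python
# Hypothetical case records; the field names and values are invented
# and do not reproduce the actual tPP sample.
CASES = [
    {"deported": True, "fto_affiliated": False},
    {"deported": True, "fto_affiliated": True},
    {"deported": True, "fto_affiliated": False},
    {"deported": False, "fto_affiliated": True},
    {"deported": False, "fto_affiliated": True},
    {"deported": False, "fto_affiliated": False},
    {"deported": False, "fto_affiliated": True},
]


def share(cases, deported, field):
    """Fraction of one subsample (deported or not) for which `field` is true."""
    subsample = [c for c in cases if c["deported"] == deported]
    if not subsample:
        return None  # no cases in this subsample, so no proportion to report
    return sum(c[field] for c in subsample) / len(subsample)


deported_rate = share(CASES, True, "fto_affiliated")
not_deported_rate = share(CASES, False, "fto_affiliated")
```

The same helper can be reused for any other yes/no variable, such as informant involvement or the presence of a co-defendant.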

During my research for this project, I came across a theory that seemed particularly applicable to my observed findings. Based in economic and national security studies, mosaic theory posits that bits of intelligence can be pieced together by hostile parties (e.g. foreign intelligence agencies, foreign terrorist organizations) to form a picture of US intelligence practices and knowledge [1]. Since 9/11, this theory has played a significant role in the United States court system. Specifically, it was used to justify the classification of documents regarding the detention of over seven hundred people in connection with September 11th [2]. Based on my findings, I hypothesize that the government is choosing to keep defendants who are more intertwined with known terrorist organizations within the country to avoid the potential intelligence risks of a deportation hearing. Deportation hearings can only be closed in a select number of circumstances [3], whereas the precedent to use mosaic theory to justify the classification of criminal proceedings has already been set.

Notes

[1] Neuman, Gerald L. 2005. “Discretionary Deportation.” Georgetown Immigration Law Journal 20: 611–56.

[2] Pozen, David E., James E. Baker, Jessica Bulman-Pozen, Fadi Hanna, Kenneth Levit, John Sims, and David Vladeck. 2005. “The Mosaic Theory, National Security, and the Freedom of Information Act.” Yale Law Journal.

[3] “Fact Sheet: Observing Immigration Court Hearings.” 2015. Department of Justice. February 10, 2015. https://www.justice.gov/eoir/observing-immigration-court-hearings.

USA Today cites tPP

We’re very proud to see that the amazing work of our student researchers was quoted today by USA Today in their article, “AOC says she gets death threats after organizations air ‘hateful messages’ about her”.

We hope to be a resource to media, policy makers, researchers and advocates in the years to come as our data set grows and improves!

Have a question we can answer? Let us know!

How and Why Socio-Politically Motivated Crimes are Completed



Tia Turner and Brenda Uriona

Brian Jackson, a senior physical scientist at the RAND Corporation, and David Frelinger, a senior policy analyst at RAND, authored a report identifying three main factors that determine whether terrorist attacks succeed or fail: terrorist group capabilities and resources, the requirements of the operation the group attempted or is planning to attempt, and the relevance and reliability of security countermeasures. Utilizing the entirety of the tPP dataset, we tested their theoretical framework on terrorist attacks using qualitative comparative analysis (QCA) complemented by frequency distributions and chi-squared analysis. We also expanded upon their framework by drawing on the dataset’s inclusion of all socio-politically motivated crimes. We measured attacker group capabilities as a binary of the perpetrator’s group affiliation or lack thereof, and measured operational complexity through the variable “Tactic,” redefined in terms of violence as a binary of “Yes” (violent) or “No” (nonviolent). Crimes coded as violent are operationally defined to have greater complexity than nonviolent ones (see Figure 1 below).

We hypothesize that completion of a crime depends significantly on the type of instigator and on whether the tactic is violent or nonviolent. Additionally, we add to the framework a more specific take on instigator identity using the tPP variable “‘Other’ Status.” If tests run on othering indicate a significant relationship with whether or not a crime was completed, we plan to examine which trait characteristics, if any, are possibly targeted by security countermeasures.

Altogether, these tests will reveal how and why socio-politically motivated crimes are completed and what can be done about them going forward. Work like this is essential because of its ability to show judicial bias. We believe that if “Other” status is a significant indicator of crime completion, it may be caused by Othering in security countermeasures and by law enforcement’s implicit bias. Understanding the indicators of why a crime is completed to at least some measure of success is critical for developing effective security measures. Because we test all socio-politically motivated crimes, the indicators should have greater generalizability and can help create more exhaustive and efficient efforts against crime.

In the end, both the QCA and the exhaustive CHAID classification tree analysis (Figure 2) showed “Tactic” and “Group affiliation” to be significant indicators for “Completion of crime,” indicating that completion is statistically associated with both variables.

Overall, “Group Affiliation” was the strongest indicator, with a reported p-value of 0.00 (significant at the 95% confidence level). Crimes committed by perpetrators with group affiliation are significantly more likely to succeed (61%) compared to perpetrators without group affiliation (39%). Analysis of “Tactic” at the 95% confidence level, p = 0.013, shows crimes utilizing a violent tactic, meaning one of greater operational complexity, are significantly less likely to be completed (55%) than crimes using a nonviolent tactic (45%). Contrary to our hypothesis, “‘Other’ status” is not indicative of crime success or failure. Future research could focus on more specific trait characteristics of the variables found significant through this study. Using what is commonly found in group capability and operational complexity within large datasets like tPP (e.g. pre-incident indicators) can ensure reliability and continue to aid in the establishment of more effective security countermeasures.
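For readers unfamiliar with the chi-squared test of independence used here, the following Python sketch shows the calculation for a 2x2 table, such as group affiliation against crime completion. The counts are invented, not taken from tPP, and the sketch uses the plain Pearson statistic without a continuity correction, which may differ from the settings of the statistical package used in the study.

```python
import math


def chi_squared_2x2(table):
    """Pearson chi-squared test of independence for a 2x2 table of counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    # With one degree of freedom the upper tail reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: rows = group affiliation (yes, no),
# columns = crime completed (yes, no).
TABLE = [[30, 20], [20, 30]]
statistic, p_value = chi_squared_2x2(TABLE)
```

A p-value below 0.05 corresponds to significance at the 95% confidence level referred to above.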

 

Bibliography

Brian A. Jackson, and David A. Frelinger. Understanding Why Terrorist Operations Succeed or Fail. Santa Monica, CA: RAND Corporation, 2009. https://www.rand.org/pubs/occasional_papers/OP257.html.

Loadenthal, Michael, et al. 2019. “The Prosecution Project (tPP)” (Version March 2019) [Dataset]. Miami University Sociology Department. https://tpp.lib.miamioh.edu.

tPP Preliminary Statistical Report #5 of 5

The following report was completed by statistics students utilizing a version of the tPP dataset as of March 13, 2019. These analyses are focused on developing models for future use, and the interpretations and conclusions they contain reflect a dataset still in development, and only a superficial engagement with the wider literature on political violence. We continue to expand, improve and refine the data, and as such, these analyses should be seen as preliminary and subject to change. The views expressed in these reports belong solely to the authors, do not necessarily reflect the findings of the tPP team, and are subject to further inquiry and revision.



Below you will find the non-technical and technical analytical summaries and selected visualizations for Team #5 (Classification & Characteristic Tree Analysis) of 5.

This report was authored by Brent Crist, Elena McDonald, Yuan Liu, and Xinru Yu. To download the complete report, including the statistical source code, click here.



Non-Technical Summary

Introduction

Classifying terrorist attacks is the central problem of the Prosecution Project. Terrorism is one of the most prominent topics in the news today, due to its increasing prevalence. Looking at acts of terrorism or political violence on a case-by-case basis, it is interesting to see how the government classifies each of them. Of particular interest is how cases whose only Reason for Inclusion is “State Speech Act” compare with cases that combine the State Speech Act with other reasons and with cases that involve no State Speech. Determining factors for why and how the government labels these cases provides an opportunity for analysis. The data come from The Prosecution Project (tPP) in the sociology department at Miami University and provide the Reason for Inclusion, Tactic, Number Killed, Number Injured, and Othered Status for each case. The tPP dataset catalogs felony criminal cases involving illegal political violence occurring in the United States since 1990. Utilizing the tPP dataset allows for an explanation of the government’s classifications, the effects these variables have on the decision, and how it changes through time.

Results

The Lethality variable is split by Reason for Inclusion category: State Speech (the motivation for the terrorist act is explicitly political), No State Speech (the motivation for the terrorist act does not involve political purposes), and Combination (a mixture of the two). To better examine the distribution of lethality, the mean and standard deviation for each reason are given below, along with the number of cases belonging to each reason. The mean and standard deviation for State Speech are the lowest, whereas No State Speech and Combination show much larger variability. The number of cases follows the same pattern.

Looking at the Methods attackers are using, the top three Methods per Reason for Inclusion are below. Providing Support to a terrorist organization is the top method for No State Speech and a Combination. Non-political Method is the most common for State Speech and represents over half of all State Speech Cases. Generally, terrorist attacks in the news, in recent years, involve explosives, firearms, and/or vehicle ramming. The Explosives Method appears less frequently than one might expect, given the frequency of news articles.

The third variable of interest is Othered Status. The table below, once again, breaks down Othered Status by Reason for Inclusion. For both State Speech and Combination, Othered individuals heavily outnumber Non-Othered individuals. In cases that are No State Speech, the two groups are split almost perfectly fifty-fifty.

Conclusion

For Lethality, No State Speech is the most common reason, whereas State Speech is much less common. Interestingly, providing support to terrorists or terrorist organizations is the most frequently encountered category for both No State Speech and Combination. Given the size of both of these categories, the frequency of Provide Support is of interest to researchers for its implications in each category. In all cases, the Othered Status of an individual might help researchers better understand how the state labels these people as terrorists. Because the State Speech and Combination categories carry implications of a directed attack against the state, setting them against Othered Status yields data for researchers who might be studying the Othered Status of terrorists.

Technical Summary

Introduction

The Prosecution Project provides a chance to determine when and what factors cause the state to label a criminal act as terrorism. In this analysis, several techniques help determine how these acts make the list. Data manipulation and cleaning assist the analysis by creating convenient (and statistically viable) groupings. Summary statistics and data visualization further enhance the ability to understand how these variables change over time and how they relate to one another. Creating a characteristic tree is a strong method for analyzing what factors cause the government to label criminal acts as terrorism. The random forest method allows for validation of pruned trees and aids the analysis in this paper.

Methods

Data cleaning and manipulation are the first two crucial steps to proper analysis. For the tPP data, the research question revolves around the following variables: Reason for Inclusion, Tactic, Lethality, Other Status, and Date. Lethality is not a variable present in the data set; it is constructed by adding the total kills and injuries resulting from an offense, per case. To address the time element of the research question, presidential terms are used to create meaningful time intervals for comparison. Associating the Day, Month, and Year of an event with the Day, Month, and Year of the inauguration of each president (in the scope of the data frame) allows this timeline to form. The earliest case in the data frame occurs during Bill Clinton’s service, while the latest case occurs during Donald Trump’s service, with George W. Bush and Barack Obama in between. Adding the political affiliation of each president brings another layer of analysis and comparison into play.
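The two derived variables described above can be sketched in a few lines. The report's own code was written in R with lubridate; the Python below is a stand-in that follows the same logic, and it assumes that a case dated on an inauguration day is assigned to the incoming president.

```python
from bisect import bisect_right
from datetime import date

# Inauguration dates of the four presidents named in the text.
INAUGURATIONS = [
    (date(1993, 1, 20), "Clinton", "Democratic"),
    (date(2001, 1, 20), "Bush", "Republican"),
    (date(2009, 1, 20), "Obama", "Democratic"),
    (date(2017, 1, 20), "Trump", "Republican"),
]
START_DATES = [start for start, _name, _party in INAUGURATIONS]


def lethality(killed, injured):
    """Lethality as defined in the report: people killed plus people injured."""
    return killed + injured


def president_on(event_date):
    """Return (president, party) in office on event_date, or None if earlier."""
    # Find the most recent inauguration on or before the event date.
    index = bisect_right(START_DATES, event_date) - 1
    if index < 0:
        return None
    _start, name, party = INAUGURATIONS[index]
    return name, party
```

For example, `president_on(date(2001, 9, 11))` returns `("Bush", "Republican")`, and `lethality(2, 5)` returns `7`.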

For purposes of the characteristic tree analysis, the Tactic variable, which has twenty unique levels, must be reduced. Reducing the number of levels gives more splitting power in the characteristic trees later in the analysis. The percentage of cases involving each tactic hints at how much information each unique tactic provides to the overall analysis. Having eight levels (seven without Other) rather than the original twenty strengthens the resulting analysis.

Reason for Inclusion must also undergo manipulation. To look specifically at the prevalence of the State Speech Act, Reason for Inclusion is split to reflect this act. The three groups become cases that are State Speech, No State Speech, and a Combination of the State Speech Act and other reasons. With this new variable, along with the others, the data are ready for investigation. Summary statistics for Reason, Method, Lethality, and Other Status show how the data behave and what they look like. Additionally, separating bar graphs for the same set of variables by President shows how each of these changes over time. The bar graphs for Reason, Method, and Other Status show proportions, while the bar graph for Lethality shows a count.
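The three-way split of Reason for Inclusion amounts to a small classification rule. The sketch below is in Python rather than the R used for the report, and it assumes each case carries a list of reason labels; the label strings are hypothetical and may not match the values coded in tPP.

```python
# Hypothetical label for the state speech act; the value actually coded in
# tPP may be worded differently.
STATE_SPEECH = "State speech act"


def reason_group(reasons):
    """Collapse a case's list of reasons for inclusion into three groups."""
    has_state_speech = STATE_SPEECH in reasons
    has_other = any(reason != STATE_SPEECH for reason in reasons)
    if has_state_speech and has_other:
        return "Combination"
    if has_state_speech:
        return "State Speech"
    # Cases with no state speech act (including an empty list) land here.
    return "No State Speech"
```

For example, `reason_group(["State speech act", "Ideological target"])` returns `"Combination"`, where "Ideological target" is likewise a made-up label.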

Creation of a characteristic tree (Buntine, 1992) can help analyze what factors cause the government to include each case, and the reason for the inclusion. Building a characteristic tree is not enough on its own; both cross-validation and a random forest provide insight into how well the tree fits the data. This technique is executed in R by partitioning the data into a training set and a testing set. Fitting a tree with a cost element for each partition yields the optimal tree, which then undergoes cross-validation (Zhong, 2016).

Comparing the predictions with the real data gives the accuracy of these models. Further testing of the accuracy comes from the random forest, which creates a large sample of random trees (Zhong, 2016). Creating a large number of random trees, each of which uses a random selection of the variables to split on, provides more evidence of model accuracy. Because the random forest generalizes the process, comparing its predictions against the testing data set gives a stronger accuracy measure.
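To illustrate the train/test workflow in miniature, the following Python sketch grows a small classification tree on categorical predictors and scores it on held-out rows. It is a from-scratch stand-in, not the rpart or randomForest code used in the report: there is no cost-complexity pruning, cross-validation, or forest, and the data are synthetic, with the label tied to the tactic by an invented rule so that the result is easy to check.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def build_tree(rows, labels, features, max_depth=3):
    """Grow a tree that splits on one categorical feature per node."""
    if max_depth == 0 or len(set(labels)) == 1 or not features:
        return majority(labels)

    def weighted_gini(feature):
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[feature], []).append(label)
        return sum(len(g) * gini(g) for g in groups.values()) / len(labels)

    best = min(features, key=weighted_gini)
    if weighted_gini(best) >= gini(labels):
        return majority(labels)  # no split improves purity
    remaining = [f for f in features if f != best]
    branches = {}
    for value in {row[best] for row in rows}:
        subset = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        sub_rows, sub_labels = zip(*subset)
        branches[value] = build_tree(sub_rows, sub_labels, remaining, max_depth - 1)
    return {"feature": best, "branches": branches, "default": majority(labels)}


def predict(tree, row):
    # Unseen feature values fall back to the node's majority class.
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["feature"]], tree["default"])
    return tree


def accuracy(tree, rows, labels):
    hits = sum(predict(tree, r) == l for r, l in zip(rows, labels))
    return hits / len(labels)


# Synthetic data: the label depends only on the tactic (an invented rule).
ROWS = [{"tactic": t, "othered": o}
        for t in ("Provide Support", "Firearms")
        for o in ("Yes", "No")
        for _ in range(10)]
LABELS = ["Combination" if r["tactic"] == "Provide Support" else "No State Speech"
          for r in ROWS]

order = list(range(len(ROWS)))
random.Random(0).shuffle(order)
cut = int(0.75 * len(order))  # 75% training, 25% testing
train, test = order[:cut], order[cut:]
tree = build_tree([ROWS[i] for i in train], [LABELS[i] for i in train],
                  ["tactic", "othered"])
test_accuracy = accuracy(tree, [ROWS[i] for i in test], [LABELS[i] for i in test])
```

A random forest would repeat this procedure on many bootstrap samples with random subsets of the features and take a majority vote across the resulting trees.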

Many R packages are essential for the methods of this analysis. These procedures require the lubridate (Grolemund and Wickham, 2011), caret (Kuhn and Others, 2019), rpart (Therneau and Atkinson, 2018), rpart.plot (Milborrow, 2018), and randomForest (Liaw and Wiener, 2002) packages in R.

Results

To properly understand the motives behind terrorist attacks, it helps to consider the execution methods, which play a vital role in a case’s inclusion in this dataset. The Prosecution Project includes an exhaustive list of methods detailing how the acts are committed; however, grouping methods with similar tactics allows for proper analysis. That is, all acts, including acts that effectively serve as the threat of committing another act, are in the same group for analysis (e.g. “Explosives” and “Bomb Threats” become “Explosives”). Additionally, tactics that are “Unspecified” do not contribute to a deeper understanding and hence do not appear in this analysis. Lastly, all tactics that comprise less than 1% of the total tactics and do not fit neatly into the aforementioned methods (Animal Release, Blockading, Unarmed Assault, Vandalism) do not appear in this analysis (see Prevalence of Tactic table in Appendix for more details). These categories, together with the terrorists’ reasoning, offer more insight into how a terrorist attack is carried out given its motivation. The table below shows the prevalence of each Method in the data in relation to each Reason for Inclusion.
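The grouping rules in this paragraph can be expressed as a short recoding step. The Python sketch below is an illustration under assumptions: only the one merge named in the text is included, the tactic counts are invented, and the set of grouped methods to retain is passed in by the caller rather than taken from the report.

```python
from collections import Counter

# The one merge named in the text; the report's full mapping is not given here.
MERGE = {"Bomb Threats": "Explosives"}
DROP = {"Unspecified"}


def recode_tactics(tactics, keep, min_share=0.01):
    """Merge related tactics, drop 'Unspecified', and drop rare leftovers.

    `keep` is the set of grouped methods retained regardless of frequency.
    """
    merged = [MERGE.get(t, t) for t in tactics if t not in DROP]
    counts = Counter(merged)
    total = len(tactics)  # shares are measured against all original cases
    return [t for t in merged if t in keep or counts[t] / total >= min_share]


# Invented sample of 200 cases for illustration.
SAMPLE = (["Explosives"] * 50 + ["Bomb Threats"] * 30 + ["Firearms"] * 118
          + ["Unspecified"] + ["Animal Release"])
recoded = recode_tactics(SAMPLE, keep={"Explosives", "Firearms"})
```

In this sample, the single "Animal Release" case is 0.5% of the total and falls outside the retained methods, so it is dropped along with the "Unspecified" case.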

Interestingly, more than half the cases that are State Speech are Non-Political (e.g. James Tyler Williams, who killed a gay couple because of their sexual orientation). The majority of State Speech cases are therefore Non-Political, a category of non-violent crimes relating to assisting terrorism or denying the ability of the state to pursue these crimes. No State Speech’s top three methods together account for 62.2% of the cases in this category. This means that No State Speech has a wider spread of crime types than State Speech or Combination, which are heavily skewed towards Non-Political and Provide Support, respectively.

The summary statistics of lethality per method provide useful insight into how lethality varies across these crimes. For instance, the mean lethality of Firearms should be different from that of the Provide Support method. The standard deviation also shows the spread of each of these methods.
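Per-method summary statistics of this kind take only a few lines to compute. Here is a minimal Python sketch using the standard library, with invented (method, lethality) pairs in place of the tPP data; it reports the sample standard deviation and falls back to 0.0 when a method has a single case.

```python
from collections import defaultdict
from statistics import mean, stdev

# Invented (method, lethality) pairs for illustration only.
CASES = [
    ("Firearms", 0),
    ("Firearms", 4),
    ("Firearms", 8),
    ("Provide Support", 0),
    ("Provide Support", 0),
]


def lethality_by_method(cases):
    """Count, mean, and sample standard deviation of lethality per method."""
    grouped = defaultdict(list)
    for method, value in cases:
        grouped[method].append(value)
    return {
        method: {
            "n": len(values),
            "mean": mean(values),
            # stdev needs at least two observations.
            "sd": stdev(values) if len(values) > 1 else 0.0,
        }
        for method, values in grouped.items()
    }


summary = lethality_by_method(CASES)
```

The same grouping works for Reason for Inclusion or presidential term by swapping the first element of each pair.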

Most cases yield results that fit the narrative of terrorism. Notice the higher means in the Firearms and Hostage/Standoff categories and the lower means in Non-Political and Provide Support categories. Higher standard deviations in the Explosives, Firearms, and Hostage/Standoff categories create a level of uncertainty in how many people are likely to be killed or injured from one of these attacks.

The Othered Status of an individual provides notable statistics for the Reason for Inclusion as well. It is critical to note that the Othered Status itself is quite subjective and is not a uniform label. That is, in no way are there exact criteria for a terrorist to be given an Othered Status. Mapping the Othered Status of a person to the reason their crime was included in the database allows for insight on how an othered person’s crime might be perceived by the State.

State Speech has the largest discrepancy between Othered and Non-Othered Status. This is to say that the vast majority of terroristic acts, when involving State Speech, are by Othered people. Whether or not this has any bearing on the period of time in which these acts happen is addressed later in this paper. No State Speech sees almost even percentages of Othered and Non-Othered people. As the guidelines for No State Speech are less specific than those for the other Reasons for Inclusion, people of Othered and Non-Othered status alike might have less cause to commit motivated terrorist attacks and more to commit senseless violence. The Combination Reason for Inclusion sees just over twice as many Othered people committing terrorist attacks for this reason as Non-Othered people.

As technology and geopolitical climates change with time, so too does the methodology of a terroristic act. Grouping these methods by their place in time relative to the President in office at the time of their happening gives way to visual representation of these statistics.

Drastic changes come over the years as the geopolitical climate changes. Notice the massive increase in the Provide Support (yellow) Method in the Bush, Obama, and Trump Administrations versus the Clinton Administration. This could be due largely to the fact that the Global War on Terror takes place during these Presidencies but not during Clinton’s. It is not unreasonable to believe that the United States, as a strategy to deter violent terrorism, is labeling more non-violent crimes as terrorism than in years past. Since the United States is an economic superpower, its dollar has more buying power around the world. Because of this, terrorist sympathizers are able to accrue cash with much more buying power than in their home countries (assuming they support foreign terrorist organizations). This allows foreign terrorist organizations to acquire far more supplies for violent terrorist attacks.

As the changing of methods through time offers insight into how the United States labels a crime as a terroristic act, the Othered Status of a person, too, changes in time. Different conditions in the United States during the four Presidencies included in this dataset might offer clues into how the status changes.

Notice, again, how the Othered status of terrorists changes drastically after the Clinton era. The United States, during this time period, could be experiencing higher sensitivity to terrorism due greatly to the loss of life from the September 11 attacks. As the Global War on Terror continues through the years, the share of terrorists with Othered status declines. Whether this is due to a liberal Obama Administration and a smaller sample size for the Trump Administration, or to the United States and its citizens becoming less skeptical of the people committing these crimes, requires further study.

Seeing how the lethality of each of these acts changes in time can give clues as to how violent the crimes committed in these separate time periods are. Given the rise in non-violent methods in the past three Presidencies, studying the counts of lethality in their terms will shed light on how many people were killed in violent terrorist attacks in these time periods.

The lethality of these attacks again increases substantially during the years following the September 11 attacks. It is important to note that President Donald Trump has only been in office for just over two years at the time of conducting this analysis. The significantly lower lethality could be due mostly to the fact that the sample size is much smaller.

Seeing how the Method, Othered Status, and Lethality have changed through time then lends itself to studying how the Reason for Inclusion of the terroristic acts in the dataset has changed. Political moods and outside factors might play into how these crimes are included, and this can be visualized by plotting them by the four Presidencies included in tPP.

A large change in No State Speech occurs from the Clinton to Bush Administrations. It is difficult to determine whether this is due to the United States’ sentiment towards terrorism changing after the September 11 attacks or some other unknown variable. The Combination and State Speech groups constitute the largest change from the Clinton to Bush Administration. From the Bush to Obama Administration, a change in these two categories again occurs with State Speech becoming less prevalent and Combination becoming more prevalent. The increase in the Combination group might be a result of President Barack Obama being the first African American President in the history of the United States. A terroristic act due to this fact along with other racially charged motivations constitutes inclusion in the Combination group; however, this hypothesis requires further analysis and is not part of this study.

Machine learning processes can help to classify each case, with respect to its Reason for Inclusion, by the separate variables in the dataset. Splitting the nodes by Method, President, and Lethality allows the computer to decide where a case might fit based on the given factors and to create a Classification Tree from these splits.

The most important factor in this tree is the Method. At the first partition, every method except Non-Political and Perjury/Obstruction of Justice goes to one side, while those two methods lead to the State Speech node on the right. The only Methods for which a case is likely to be State Speech are therefore Perjury/Obstruction of Justice and Non-Political. Of note, President is the second partition on both the No State Speech and the State Speech nodes, and Obama appears in both of the positive splits for President. Only 10% of the entries fall under the criteria of the Non-Political Method. Additionally, no partition in this tree requires Othered Status or Lethality. The tree shows a path to follow to see how the government categorizes each type of case. The accuracy rate of this optimized tree is about 65%, which comes from comparing the predicted values with those in the testing data set. A confusion matrix allows the performance of a Classification Tree to be analyzed. The model is most accurate in predicting cases of State Speech and least accurate for cases of Combination.
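The accuracy rate and the per-class comparison described above both come from tabulating predicted labels against actual labels. The following Python sketch shows that arithmetic only; it is not the report's code, and the ten cases are invented for illustration rather than drawn from tPP.

```python
# Reasons for Inclusion used as class labels in the report.
LABELS = ["State Speech", "No State Speech", "Combination"]


def confusion_matrix(actual, predicted, labels=LABELS):
    """Count cases as matrix[actual_label][predicted_label]."""
    matrix = {a: {p: 0 for p in labels} for a in labels}
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix


def accuracy(matrix):
    """Share of all cases on the diagonal (predicted == actual)."""
    correct = sum(matrix[label][label] for label in matrix)
    total = sum(sum(row.values()) for row in matrix.values())
    return correct / total


def per_class_recall(matrix):
    """For each actual class, the share of its cases predicted correctly."""
    return {
        label: row[label] / sum(row.values())
        for label, row in matrix.items()
        if sum(row.values()) > 0
    }


# Invented test-set labels (hypothetical, for illustration only).
actual = ["State Speech"] * 3 + ["No State Speech"] * 4 + ["Combination"] * 3
predicted = (
    ["State Speech"] * 3
    + ["No State Speech"] * 3 + ["Combination"]
    + ["Combination", "No State Speech", "State Speech"]
)

matrix = confusion_matrix(actual, predicted)
print(accuracy(matrix))
print(per_class_recall(matrix))
```

Reading the matrix row by row shows which Reason for Inclusion the model confuses with which, something the overall accuracy figure alone hides.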

The above procedure of obtaining a pruned tree involves using a training and testing data set. Splitting the data and training a model on part of it and then testing the model on the other part is a form of cross-validation. Another way to check the accuracy of the model is through a random forest. A random forest allows for validation of singular trees. Random forest importance plots show the validity of five hundred random trees from the data.

The Mean Decrease Accuracy and Mean Decrease Gini plots show how important a variable is to the partitioning process in the creation of a classification tree. The further along the x-axis a variable sits (in both plots), the greater its presence in the partitioning process of the randomly generated trees in the forest. As in the earlier single classification tree, Method is again the most important variable for determining whether an act is State Speech, No State Speech, or Combination. Despite the large gaps between the variables (meaning the partitioning process becomes less accurate), it is worth noting that the variables appear in the same order as in the single tree, which helps to support its validity. Lethality and Othered Status are the two least important predictors according to the random forest. In summary, the order of importance for determining the Reason for Inclusion is Method, President, Lethality, and Othered Status. The accuracy rate for the random forest is 70.2%, meaning the model predicts cases correctly 70.2% of the time.
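To make the Gini-based measure concrete: each split in a tree reduces the Gini impurity of the cases it partitions, and a random forest's Mean Decrease Gini aggregates those reductions for each variable across its trees. The sketch below computes the reduction for a single split in Python. The function names and the toy labels are assumptions made for illustration and are not taken from the report's code.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity of the parent node minus the size-weighted impurity of its children."""
    n = len(parent)
    weighted_children = (
        len(left) / n * gini(left) + len(right) / n * gini(right)
    )
    return gini(parent) - weighted_children


# Hypothetical node with two classes, split perfectly by some variable.
parent = ["State Speech", "State Speech", "Combination", "Combination"]
left = ["State Speech", "State Speech"]
right = ["Combination", "Combination"]
print(gini_decrease(parent, left, right))
```

A variable that often produces large decreases, as Method does in the report, ends up far along the x-axis of the Mean Decrease Gini plot.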

Conclusion

Many outside political factors (e.g. the September 11 attacks, the Global War on Terror, Presidential Administration) can affect how the government classifies crimes as terroristic acts or not. These classifications do change over time, and the methods involved play a significant role in determining whether they are state speech acts of terrorism, not affiliated with state speech, or a combination of the two. In predicting how a government will classify a case, the method by which a crime is committed and the President in office at the time it is committed have, in that order of importance, the most impact. The other variables do not provide as much information; in order of relevance, they are the lethality of the crime and the Othered Status of the terrorist committing the crime. The splitting power of President in the classification trees drives home the finding that the Reason for Inclusion changes over time. The random forest solidifies the importance of the variables presented in the pruned tree through the Mean Decrease Accuracy and the Mean Decrease Gini. Not surprisingly, the pruned tree ended with five terminal nodes, two of which were No State Speech, two of which were Combination, and only one of which was State Speech. These results are consistent with the raw counts of each of the individual Reasons for Inclusion. With a 65% accuracy rate in the pruned tree and a 70% accuracy rate in the random forest, there is reason to believe that the variables in these trees are important determining factors in whether a terroristic act will be classified as a state speech act, a non-state speech act, or a combination of the two.

tPP Preliminary Statistical Report #4 of 5

The following report was completed by statistics students utilizing a version of tPP dataset as of March 13, 2019. These analyses are focused on developing models for future use, and the interpretations and conclusions they contain reflect a dataset still in development, and only a superficial engagement with the wider literature on political violence. We continue to expand, improve and refine the data, and as such, these analyses should be seen as preliminary and subject to change. The views expressed in these reports belong solely to the authors, do not necessarily reflect the findings of the tPP team, and are subject to further inquiry and revision.



Below you will find the non-technical analytical summary and selected visualizations for Team #3 (Change over time) of 5.

This report was authored by Daniel Cirkovic, Yi Jing, Samantha Thompson, & Xuemeng Wang. To download the complete report, including the statistical source code, click here.



0.1 Non-Technical Report

0.1.1 Introduction

The Prosecution Project (tPP) is a collection of data that specifically investigates patterns in political violence and terrorism occurring in the United States from 1990 to the present. Data is continuously being added, so updates to the following analysis may need to occur when more recent data becomes available. Our analysis focuses on characteristics of the terrorists and their acts, including demographics, religion, prosecution types, ideology, tactic, targeting, and group affiliation. Our goal is to show visually, and to analyze statistically, how these variables change over time.

0.1.2 Methodology

In order to detect changes in the variables more clearly, we split the data into time periods separated by major terrorist events. We took this approach rather than splitting the entire time span evenly (the events are not evenly spaced, but the amount of data in each period is fairly similar) so that we could also see whether these major events induced any specific patterns within the variables. We try to explain the reasoning behind these changes, but these explanations are subjective; correlation is not necessarily causation. The only conclusions we can draw with confidence come from the statistical tests performed, which concern the overall change in each variable’s categories over time.

Some of the variables included many categories. To fit them all into one graph with enough data available within each category per period, we kept only the categories with the highest frequencies and combined some categories together. This was done on a case-by-case basis, and more information on how it was done is in the Appendix. Cases with a missing value (NA) for a variable were excluded only from the analysis of that variable; they remained in the complete data set so that their non-missing values could still be used for the other variables.

In order to find differences in each variable over time, we counted the cases in each category within each time period and divided that count by the total for the time period. This gives us the relative frequency of each category per period, so that we can test whether it differs over time.

The tests we used for this are the Pearson Chi-Square Test, the Fisher Exact Test, and the Cramer’s V statistic. Because some categories contain very little data in some time periods, the Fisher Exact Test is included: it tests a similar hypothesis to the Pearson Chi-Square Test but has more relaxed requirements on data size. Cramer’s V is a little different in that it measures how strongly the period is associated with each category’s count.

These tests do not tell us whether a variable’s categories are increasing or decreasing over time, so we created bar charts in which every bar totals 100% and, within each period, is split into the percentage belonging to each category.

We additionally wanted to see whether any of the variables affected the counts of another variable over time. To do this, we compared racial/ethnic group, across time, against (1) prison sentence length, (2) plea, and (3) tactic. The Cochran-Mantel-Haenszel test was used to test for differences over time with two variables plus time, whereas all previous tests involved only one variable plus time.

0.1.3 Conclusion

We saw that the characteristics of terrorists and their acts of terrorism have changed significantly over the period covered by the data collected so far. By using both visualizations and statistical tests, these changes can be investigated more closely in terms of importance and size, as each variable changes differently. Overall, based on the statistical tests, the key variables to assign the most importance to are Othered Status, Citizenship, Tactic, and Group Affiliation. For this reason, we chose the visualizations included in this report around these variables and researched possible reasons for their changes, along with the directions of those changes.

0.2 Technical Report

0.2.1 Introduction

Terrorism in the United States peaked in the late 1960s and early 1970s, followed by a precipitous decline (Ross et al, 1989). Despite this decline, terrorism seems ever more present. Large scale media coverage and the development of social media have often been cited as contributors to the perceived prevalence of terrorism (Weimann et al, 2014). Further, media coverage of events such as 9/11 has framed many attacks as “Muslims/Arabs/Islam working together in organized terrorist cells against a Christian America”. On the other hand, domestic terrorists often receive the label of “troubled individuals” (Powell, 2011). Thus, there is strong evidence of media coverage affecting the perception of terrorist attacks in the United States. Using the Prosecution Project (tPP) dataset, trends in terrorist activity are analyzed by grouping events into periods delineated by large scale media events and detecting any changes between said periods. This organization of events may allow for the detection of changes in terrorism, perhaps due to perpetrators attempting to imitate previous attacks covered in the media.

0.2.2 Methodology

In order to recognize the patterns in demographics, prosecution types, ideology, tactic, targeting, sentence length, informant, and group affiliation over time, each event was organized into different time periods separated by major terrorist attacks in the United States. The events of interest are listed below:

The purpose of this delineation is to determine whether these events, largely covered in the media, trigger “copycat” terrorist attacks (known as contagion) or somehow impact a variable’s distribution in time periods near said events (Nacos, 2010).
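Grouping events into periods amounts to comparing each event date against an ordered list of boundary dates. A minimal Python sketch follows. The four boundaries are only the attacks named elsewhere in this report, not the report's full list of period boundaries, so the period numbers produced here will not match the report's numbering beyond the first two periods. The function name is hypothetical.

```python
from bisect import bisect_right
from datetime import date

# Illustrative subset of boundary events (assumption: the report's complete
# list of boundaries is longer and is not reproduced here).
BOUNDARIES = [
    date(1995, 4, 19),  # Oklahoma City Bombing
    date(2001, 9, 11),  # September 11 attacks
    date(2012, 7, 20),  # Aurora Theater Shooting
    date(2015, 6, 17),  # Charleston Church Shooting
]


def assign_period(event_date, boundaries=BOUNDARIES):
    """Return a 1-based period index for an event date.

    Period 1 covers everything before the first boundary. A date that falls
    exactly on a boundary is placed in the period that the boundary opens.
    """
    return bisect_right(boundaries, event_date) + 1


print(assign_period(date(1993, 2, 26)))
print(assign_period(date(2019, 3, 13)))
```

Keeping the boundaries in one sorted list makes it easy to rerun the analysis with a different set of delineating events.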

Once each event was grouped, the frequencies of each variable category were computed within each time period and compared using 2-way contingency tables. That is, each variable had its own contingency table with the rows representing the categories given in the variable of interest, and the columns representing the time periods described earlier. Often, multiple categories were either condensed or removed due to sparseness of information (see Appendix for the exact breakdown of tables). The difference in distribution of the categories across time will be tested using both a Pearson Chi-Square Test and Fisher Exact Test.
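The two-way tables described above can be sketched in Python as follows. The field names `tactic` and `period` and the sample records are hypothetical. Records with a missing value are dropped only for the variable being tabulated, mirroring the treatment of unknown observations described in this report.

```python
from collections import Counter


def contingency_table(records, row_key, col_key):
    """Cross-tabulate two fields, skipping records where either is missing."""
    counts = Counter(
        (rec[row_key], rec[col_key])
        for rec in records
        if rec.get(row_key) is not None and rec.get(col_key) is not None
    )
    rows = sorted({r for r, _ in counts})
    cols = sorted({c for _, c in counts})
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table


def column_proportions(table):
    """Divide each cell by its column (time period) total."""
    col_totals = [sum(col) for col in zip(*table)]
    return [
        [cell / total if total else 0.0 for cell, total in zip(row, col_totals)]
        for row in table
    ]


# Hypothetical records; None marks an unknown value.
records = [
    {"tactic": "Explosives", "period": 1},
    {"tactic": "Firearms", "period": 1},
    {"tactic": "Explosives", "period": 2},
    {"tactic": "Explosives", "period": 2},
    {"tactic": None, "period": 2},
    {"tactic": "Firearms", "period": 2},
    {"tactic": "Explosives", "period": 2},
]

rows, cols, table = contingency_table(records, "tactic", "period")
print(rows, cols, table)
print(column_proportions(table))
```

The column proportions are the quantities a proportional, stacked bar chart displays, with one bar per time period.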

The Pearson Chi-Square Contingency Table Test tests homogeneity of the time periods. More specifically, it decides whether or not there is a difference between the proportions of the categories of a certain variable across the time periods. For example, if the gender variable were to be considered, it would test whether the proportion of events committed by males and females has changed over time. However, it does not indicate the direction of these changes (Lachin, 2011).

Most of the variables, however, violate the expected count assumption of the Pearson Test. The test assumes that the expected count in each cell is greater than five, but many of the tables contain zero values in multiple categories. Despite this violation, the Pearson Chi-Square Test is quite robust to small expected cell frequencies (Camilli et al, 1978). To ensure this violation does not affect the results, an additional Fisher Exact Test is performed.
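The Pearson statistic and the expected-count check can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the report's code: the tables are invented, the function names are hypothetical, and the p-value step (comparing the statistic to a chi-square distribution with the returned degrees of freedom) is left to a statistics library.

```python
def expected_counts(table):
    """Expected cell counts under homogeneity: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(table):
    """Return the Pearson chi-square statistic and its degrees of freedom."""
    expected = expected_counts(table)
    stat = 0.0
    for obs_row, exp_row in zip(table, expected):
        for observed, exp in zip(obs_row, exp_row):
            if exp > 0:  # skip cells in an all-zero row or column
                stat += (observed - exp) ** 2 / exp
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def small_expected_cells(table, threshold=5):
    """Count the cells whose expected count falls below the threshold."""
    return sum(1 for row in expected_counts(table) for e in row if e < threshold)


# Invented gender-by-period table: rows are male and female, columns are periods.
gender_by_period = [[40, 45, 50], [3, 2, 4]]
print(chi_square(gender_by_period))
print(small_expected_cells(gender_by_period))
```

In this invented table every expected count in the second row falls below five, which is the kind of sparseness that motivates running an exact test alongside the Pearson test.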

Fisher’s Exact Test again tests a difference between time periods in each of the variable category proportions. Specifically, it counts the number of possible tables that could be constructed with the given marginal totals. Then, it computes the proportion of those tables that are more extreme than the observed table, giving a p-value (Raymond et al, 1995). Since this could amount to a large number of tables, a bootstrap simulation with 2000 replicates is considered. This test relaxes the assumptions given by the Pearson Chi-Square Test.
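For intuition, the enumeration Fisher's test performs can be written out exactly for the smallest case, a 2×2 table, where every table with the same margins is determined by a single cell. The Python sketch below is that special case only. The report's tables are larger and rely on simulation, so this is an illustration of the idea rather than the procedure the authors ran, and the example table is invented.

```python
from math import comb


def fisher_exact_2x2(table):
    """Two-sided Fisher exact p-value for a 2x2 table of counts.

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed table.
    """
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # A small relative tolerance guards against floating-point ties.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-7)
    )


# Invented 2x2 table for illustration.
print(fisher_exact_2x2([[3, 1], [1, 3]]))
```

For tables with more rows and columns the number of possible tables grows quickly, which is why a simulated p-value with a fixed number of replicates is a practical substitute for full enumeration.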

Trends will be visually analyzed using proportional, stacked bar charts. Along with the Pearson Chi-Square tests, Cramer’s V statistics were computed. Cramer’s V is a measure of association between two categorical variables, ranging from 0 to 1. The higher Cramer’s V is, the stronger the relationship between period and the given variable (Acock et al, 1979).
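Cramer's V rescales the Pearson chi-square statistic by the sample size and the smaller table dimension: V = sqrt(chi-square / (n × min(rows − 1, columns − 1))). A short Python sketch of that formula, using invented tables and a hypothetical function name:

```python
from math import sqrt


def cramers_v(table):
    """Cramer's V for an r x c table of counts (0 = no association, 1 = perfect)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Pearson chi-square statistic from observed and expected counts.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected

    smaller_dim = min(len(table) - 1, len(table[0]) - 1)
    return sqrt(chi2 / (n * smaller_dim))


# Invented tables: moderate association, then none.
print(cramers_v([[10, 20], [20, 10]]))
print(cramers_v([[10, 10], [10, 10]]))
```

Because V is scaled by sample size, it can be compared across variables whose tables contain different numbers of cases, which a raw chi-square statistic does not allow.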

Finally, the interaction between racial/ethnic group, prison sentence length, and time is considered. Perhaps, over time, certain races will have differing sentence lengths, whether that be a result of discrimination, ethnic tendencies, or other factors. A three-dimensional table will be considered with a Cochran-Mantel-Haenszel Test applied. This test is an extension of the Chi-Square Test and, in general, tests for differences in the joint and marginal distributions of three variables (Lachin, 2011).
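In its simplest form, the Cochran-Mantel-Haenszel statistic pools evidence across a set of 2×2 tables, one per stratum (here, one per time period), to ask whether two variables are associated after stratifying by the third. The Python sketch below covers only that 2×2×K case. The report's tables have more categories, which calls for the generalized form of the test, so this is an illustration under simplifying assumptions with invented counts.

```python
def cmh_statistic(strata, continuity=False):
    """Cochran-Mantel-Haenszel statistic for a list of 2x2 tables.

    Each stratum is [[a, b], [c, d]]. Under the null hypothesis of no
    association within strata, the statistic is approximately chi-square
    distributed with 1 degree of freedom.
    """
    diff = 0.0
    variance = 0.0
    for (a, b), (c, d) in strata:
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        n = row1 + row2
        if n < 2:
            continue  # a stratum this small carries no information
        diff += a - row1 * col1 / n  # observed minus expected top-left cell
        variance += row1 * row2 * col1 * col2 / (n ** 2 * (n - 1))
    if variance == 0:
        raise ValueError("no stratum has variation in both margins")
    numerator = abs(diff)
    if continuity:
        numerator = max(0.0, numerator - 0.5)
    return numerator ** 2 / variance


# Invented strata: one 2x2 table per time period.
strata = [[[10, 5], [5, 10]], [[8, 4], [4, 8]]]
print(cmh_statistic(strata))
```

Summing the differences before squaring is what lets consistent but individually weak associations in each period add up to a detectable overall effect.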

In each table, unknown observations were not considered, since they add no information to the story other than increasing the sample size and shifting inference in a direction that may not be honest.

0.2.3 Results

From the collection of two-way tables, the distributions of most variables have changed over time. Only the distributions of death sentencing and gender seemed homogeneous over time, as both the Fisher and Chi-Square tests failed to detect a difference in their distributions. The uniformity of gender and death sentencing throughout the periods is not surprising, as the vast majority of events in the dataset were perpetrated by men and did not result in a death sentence for the perpetrator. More interesting insights can be gathered visually.

The following proportional, stacked bar charts show how, and in which direction, the variables we felt were key to this analysis changed.

We see in Figure 1 that the number of terrorist acts committed by non-US citizens has consistently decreased over time, reaching very minimal counts in the period from 2015 to the present day. In 2011, the Department of Homeland Security defined a new term, “specially designated countries,” to mean countries “that have shown a tendency to promote, produce, or protect terrorist organizations or their members.” In 2003, the Department of Homeland Security provided US border crossings with a list of 52 countries that fell under this term in order to increase border security against possible terrorists. The list has been continually updated and changed up to the present day. From 2007 to 2017, the US Border Patrol apprehended 45,006 immigrants from countries that have ever been on the list. There have been zero attacks committed by people who crossed the border illegally from any of the listed specially designated countries. However, foreigners who entered legally from these countries are responsible for 99.5% of all murders and 94.7% of all injuries committed by terrorists in the US from 1975 through the end of 2017 (Bier). 9/11 may have reinforced the pattern that a successful strategy for foreign terrorism is to first enter legally, or to have a US citizen commit the act. All of the terrorists involved in 9/11 were non-US citizens, and the number of non-US citizens committing acts of terrorism peaks after 9/11 and then declines. This decrease in non-US citizens being able to commit acts of terrorism is likely the result of increased security. However, as our graph shows, terrorism is evolving, so non-US citizens may no longer be the group the US should expect to commit these acts.

Figure 2 shows how group affiliation changes over time. Looking at each group over time rather than at specific terrorist events, we see that Al Qaeda has decreased consistently while the Islamic State has increased, by especially large amounts in more recent years. Many factors play into these changes, and we summarize what we believe to be the causes as best we can. Period 4 follows 9/11, an event for which Al Qaeda wished to take credit, and Al Qaeda is therefore strong and on the rise here. In period 5, which follows the formation of Al Shabab in 2006, we see a heavier Al Shabab presence. Al Shabab was known to be tied to Al Qaeda and declared official allegiance to it in 2012. Period 6 begins after 2009 and is the period in which we first see Al Qaeda decrease. Bin Laden, the leader of Al Qaeda, was killed in 2011, and his death may explain part of this decline, though conflict between groups could also play a role. We see both Al Qaeda and Al Shabab decrease after period 8 (2012), which is what we would expect: as Al Qaeda was weakened, so was Al Shabab because of its affiliation. We then see the rise of ISIS, which took advantage of the weakened Al Qaeda and Al Shabab to make its presence more known. Although these groups have similar views, they do not support one another, and they differ in the tactics they use to be heard. The graph of tactic over time below reflects these differences between groups.

Returning to the discussion in the previous paragraph, we can see in Figure 3 that when Al Qaeda held greater power, the most prevalent tactics were crimes such as Arson, Chemical or biological weapon deployment, and Explosives. These are all tactics that support Al Qaeda’s goal of plotting terrorism spectaculars to electrify the Muslim world. ISIS, by contrast, aims to control territory and expand its ideology. This may explain why, once ISIS held more power, the popular tactics became Providing material/financial support to terrorist organizations, Firearms, and Armed intimidation/standoff, all of which are ways to seize control and build the organization.

Additionally, from Figure 3, we see rises in certain tactics that could be the result of the major acts of terrorism by which we split the periods. Explosives seem to increase from period 1 to period 2, which is after the Oklahoma City Bombing. Also, after the Aurora Theater Shooting, there seems to be a drastic decrease in civilian firearms, while there is an increase in armed intimidation/standoff. On another note, we see perjury/obstruction of justice slowly appear and begin to increase from past to present. This could be the result of laws changing over time: as stricter laws are implemented, more people may be convicted.

Other notable changes for which graphics are not included are listed here. The terrorists’ religion changes over time; for example, after the Charleston Church Shooting, no Christians in the data committed acts of terrorism. This could be due to the shooting happening in a Christian church, making other Christians less likely to commit any crimes or act out. The plot of Veteran Status over time shows that after 9/11 the number of veterans who committed acts of terrorism decreased drastically, then fluctuated without ever again reaching its pre-9/11 level. Another change around 9/11 occurs in ideological affiliation: there is a massive increase in No Affiliation ideologies. This could be because groups were trying to draw attention away from themselves after all the security measures put into place after 9/11. We also see a huge increase in Rightist ideologies after the Charleston Church Shooting. This is interesting to note because the man who committed this act of terrorism was a 21-year-old white supremacist, who most likely held a rightist ideology. After the death of Trayvon Martin, State jurisdiction for acts of terrorism increased considerably, possibly due to the pressure on local police following this event. The increase over time in the verdict of charged but not tried may reflect cases that are still ongoing as we get closer to the present day. After the first major act of terrorism, we see more informants coming forward to prevent terrorist events.

The three-way tables invite some interesting insights. When comparing ethnicity, sentence length (grouped into 100-month intervals), and time period, no significant difference was found between the distributions of the categories within each of the groups. The same result was reached when comparing ethnicity, plea, and time period. However, the Cochran-Mantel-Haenszel Test found a significant difference between the distribution of ethnicity and tactic over the time periods.

To further inspect these differences, a stacked bar plot was developed. Ethnicity was limited to only the white and Middle Eastern groups, as they provided interesting insight. Over time, among crimes in the data set committed by people of Middle Eastern ethnicity, the proportion that involved providing financial support to terrorist organizations has increased drastically in each time period. This pattern first appeared right before the 9/11 attacks. Crimes perpetrated by white individuals in Period 2, after the Oklahoma City Bombing, started to consist mainly of explosives, perhaps supporting the idea of “copycat” crimes being committed after large media coverage of terrorist attacks. Similarly, after the Aurora shooting, white criminals seemed to gravitate heavily towards armed intimidation to commit their crimes. Other ethnicity plots can be seen in the Appendix.

0.2.4 Conclusion

The analysis provides some evidence that “copycat” terrorism, or contagion, affects the distribution of multiple characteristics of terrorist attacks over time. These changes are especially prevalent in the distribution of tactics across ethnicity and othered status after key events such as the Oklahoma City Bombing, 9/11, and the Aurora Shooting. Further, Ideological Affiliation trended towards Rightist leanings after the Charleston Church Shooting, while Group Affiliation has seen a recent increase in attacks perpetrated by the Islamic State despite the decrease in attacks perpetrated by Al-Qaeda. The claim that characteristics of these terrorist attacks are associated with the selected time periods is bolstered by both the Chi-Square Test results and the Cramer’s V quantities. Of course, the Chi-Square Tests only indicate that period and the characteristics of terrorist attacks are associated; they do not imply a mechanism. However, the bar charts provide context for our hypothesis. The analysis is limited by the sparseness of events in some categories, which we took measures to mitigate.

0.3 References

Acock, Alan C., and Gordon R. Stavig. “A measure of association for nonparametric statistics.” Social Forces 57, no. 4 (1979): 1381-1386.

Bier, David, and Alex Nowrasteh. “45,000 ‘Special Interest Aliens’ Caught Since 2007, But No U.S. Terrorist Attacks from Illegal Border Crossers.” Cato Institute, 17 Dec. 2018, www.cato.org/blog/45000-special-interest-aliens-caught-2007-no-us-terrorist-attacks-illegal-border-crossers.

Camilli, Gregory, and Kenneth D. Hopkins. “Applicability of chi-square to 2× 2 contingency tables with small expected cell frequencies.” Psychological Bulletin 85, no. 1 (1978): 163.

Garrett Grolemund, Hadley Wickham (2011). Dates and Times Made Easy with lubridate. Journal of Statistical Software, 40(3), 1-25. URL http://www.jstatsoft.org/v40/i03/.

H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016.

Hadley Wickham (2017). tidyverse: Easily Install and Load the ‘Tidyverse’. R package version 1.2.1. https://CRAN.R-project.org/package=tidyverse

Jeffrey B. Arnold (2019). ggthemes: Extra Themes, Scales and Geoms for ‘ggplot2’. R package version 4.1.1. https://CRAN.R-project.org/package=ggthemes

Lachin, John M. Biostatistical Methods: The Assessment of Relative Risks. 3rd ed. Hoboken: Wiley, 2011.

Nacos, Brigitte L. “Revisiting the contagion hypothesis: Terrorism, news coverage, and copycat attacks.” Perspectives on Terrorism 3, no. 3 (2010).

Powell, Kimberly A. “Framing Islam: An analysis of US media coverage of terrorism since 9/11.” Communication Studies 62, no. 1 (2011): 90-112.

Raymond, Michel, and François Rousset. “An exact test for population differentiation.” Evolution 49, no. 6 (1995): 1280-1283.

Ross, Jeffrey Ian, and Ted Robert Gurr. “Why terrorism subsides: A comparative study of Canada and the United States.” Comparative Politics 21, no. 4 (1989): 405-426.

Weimann, Gabriel. New terrorism and new media. Vol. 2. Washington, DC: Commons Lab of the Woodrow Wilson International Center for Scholars, 2014.

See the full report for complete contingency tables, stacked bar plots, and R code for age, gender, othered status, ethnicity, religion, veteran status, citizenship, jurisdiction, plea, verdict, length of sentence, death sentence, ideology, tactic, physical target, ideological target, informant, group affiliation, and FTO affiliation.

tPP Preliminary Statistical Report #3 of 5

The following report was completed by statistics students utilizing a version of tPP dataset as of March 13, 2019. These analyses are focused on developing models for future use, and the interpretations and conclusions they contain reflect a dataset still in development, and only a superficial engagement with the wider literature on political violence. We continue to expand, improve and refine the data, and as such, these analyses should be seen as preliminary and subject to change. The views expressed in these reports belong solely to the authors, do not necessarily reflect the findings of the tPP team, and are subject to further inquiry and revision.



Below you will find the analysis report provided by Team #3 (Identity and Criminal Action Analysis) of 5.

This report was authored by Athena Chapekis, Jing Lin, Ruoqi Tan & James Wieneck. To download the report, click here.



Non-technical Summary

Introduction

We are presented with a data set involving individuals who were indicted and prosecuted for crimes which have socio-political motivations and/or crimes that have rendered them as designated terrorists in the United States. These cases involve various identity variables of the defendants (age, race/ethnicity, gender, “othered” status, religion, citizenship status, and veteran status), as well as various criminal activity variables (people vs. property, number injured, number killed, physical target, ideological target, and tactic). The question we seek to answer is: how do aspects of a defendant’s identity play a role in their criminal activity?

Results

The descriptive statistics show unbalanced levels among the identity variables. For gender, the vast majority of offenders are male. For race and religion, ‘Muslim’ appears more frequently than other categories. Most defendants have civilian status. The most common tactic is ‘Providing material/financial support to terrorist organization’, ‘Unspecified’ appears most frequently as the ideological target, and ‘Online’ appears most frequently as the physical target.

All identity variables have significant relationships with the activity variables; however, the actual size of the effect varies across variables. Gender and othered status significantly affect the number of persons killed or injured, with men and othered defendants having a higher injury count. Age was consistently influential when examining how trends in criminal activity are shaped by identity, though it almost always interacted with citizenship status, veteran status, and/or othered status. Othered status was also a highly influential variable in predicting different trends in criminal activity.

Conclusions

This report finds that the identity variables which have the greatest predictive effect on criminal activity are Othered Status, Religion, Ethnicity/race, Citizenship Status, and Veteran Status. Gender is a significant predictor of the number of people killed and injured in a crime but is not a significant predictor of other criminal activity variables.

The models we built to predict trends in criminal activity based on the identities of the defendants had poor predictive power, in part because of unused levels and unspecified cases for multiple variables. The data set used for the analysis likely needs more complete information to give a fuller picture of how criminal activity is linked to a defendant’s identity.

Technical report

Introduction

The definition of what constitutes “terrorism” is not a unanimous one. Different sources report different standards for what an act of terror entails. Because of this, there has not been a thorough body of research built on terrorism in all its forms. Issue-specific groups like the Department of Justice (DOJ)/Federal Bureau of Investigation (FBI), the Center for Biomedical Research (CBR), and the National Abortion Federation (NAF) have collected their own databases of terrorism and terrorists over time, but they generally focus on one specific ideological group – whichever is of the greatest concern to them.

The Prosecution Project (tPP) is a large-scale project out of Miami University that seeks to construct a database of all acts of terrorism and socio-politically motivated crimes ending in felony prosecutions in the United States 1990-present. Each case in tPP’s database is coded across 44 variables, including demographic information on the defendant, details of their affiliations, details of the crime they committed, and details of the legal proceedings.

This report seeks to investigate the connection between a defendant’s identity (i.e. their demographic information) and their criminal activity and provide an answer to the question of how who someone is relates to what they do.

Methodology

The first step in approaching this analysis is to clean the data. Categorical variables which have many levels are reduced to allow for better comparison and analysis. Much of this reduction was done using the classification provided by the Prosecution Project codebook.

For example, in the variable Physical Target, the levels of ‘Federal site: non-military non-judicial’, ‘Federal site: military’, ‘Federal site: judicial’, and ‘Federal site: non-U.S. embassy or consulate’ are combined and recoded simply as ‘Federal site’. Furthermore, the levels for ‘State site’ and the levels for ‘Municipal site’ are combined with ‘Federal site’ to make one unified level of ‘Governmental site’. This is done for the variables of Physical Target and Ideological Target. Due to the low representation in many of the levels for the variable ‘Tactic’, many levels were combined into an ‘Other’ level. Other categorical variables that were not recoded but included in this report in their original state are People vs. Property, Gender, Ethnicity, Religion, ‘Other’ Status, Citizenship Status, and Veteran Status. For each categorical variable, a bar chart is generated to compare frequencies of levels.
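The recoding described above can be sketched in a few lines. This is an illustration in Python rather than the team's own R code; the function names and the `min_count` threshold are hypothetical, and only the level names come from the paragraph above.

```python
from collections import Counter

# Level names are quoted from the report; the mapping logic is an assumption.
FEDERAL_LEVELS = {
    "Federal site: non-military non-judicial",
    "Federal site: military",
    "Federal site: judicial",
    "Federal site: non-U.S. embassy or consulate",
}


def recode_physical_target(level):
    """Collapse federal, state, and municipal levels into 'Governmental site'."""
    if level in FEDERAL_LEVELS or level.startswith(("State site", "Municipal site")):
        return "Governmental site"
    return level  # every other level is kept as coded


def collapse_rare(values, min_count):
    """Replace levels seen fewer than min_count times with 'Other'.

    min_count is a hypothetical cutoff; the report does not state one.
    """
    counts = Counter(values)
    return [v if counts[v] >= min_count else "Other" for v in values]
```

For instance, `collapse_rare(tactics, 20)` would keep any tactic that appears at least 20 times and fold the rest into ‘Other’, where `tactics` is a list of tactic labels and 20 is an arbitrary cutoff chosen for the example.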

To conduct the analysis, this report begins with T-tests to determine the influence that the binary predictor variables Gender (male v. female), Othered Status (othered v. non-othered), and Veteran Status (civilian v. non-civilian) may have on the number of people killed and the number of people injured in socio-politically motivated crimes. A significance level of 0.05 is used. Furthermore, Analysis of Variance (ANOVA) tests are used to test for significant differences in the number of people killed and the number of people injured between demographic groups for the identity variables of Race/ethnicity, Religion, and Citizenship Status. As well, ANOVA tests are used to see if a defendant’s age differs significantly between the types of things that are targeted in socio-political crimes (both physically and ideologically) and if age differs significantly between types of tactics. On top of the ANOVA tests, Eta Squared values are calculated to measure effect size in the relationships (Brown). To investigate relationships between categorical identity variables (e.g. Religion, Citizenship Status, etc.) and categorical activity variables (e.g. Tactic, Physical target, etc.), Chi-Squared Tests of Independence are used. As well, Cramer’s V is used to calculate effect size for the respective relationships between these categorical variables.

Initially, this report sought to use linear regression to create a predictive model of trends. However, we found that, due to the categorical nature of many of the variables (often with many levels) and given that there are different trends among differing variables related to the crime, it is not advisable to build regression models based on a single response variable. Instead, we use classification tree modeling for the categorical variables whose trends we want to analyze and ANOVA tree modeling for the numerical variables whose trends we want to analyze.

We use classification trees for the following variables: People vs. Property, Physical Target, Ideological Target, and Tactic; we use ANOVA/regression trees for the following variables: Number Injured and Number Killed. These are considered our criminal activity variables for this portion of the analysis. The identity variables we use in this portion of the analysis are age, gender, race/ethnicity, religion, othered status, veteran status, and citizenship status. The purpose of this portion of the analysis is to see which aspects of a criminal’s identity are most often associated with various aspects of criminal activity, and also how these aspects interact or intersect. To validate the results from our classification and regression trees, we also use random forests for each model to see which variables are most significantly linked to each criminal activity variable, and to see which variables were the most significant contributors to differences in criminal activity trends (Liaw). For each random forest, 1,000 classification trees are generated.
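To make the idea of a classification tree split concrete, the sketch below scores a candidate split by its reduction in Gini impurity and skips any split that would leave a branch with too few cases. It is a simplified Python illustration, not the team's code: the report does not state which splitting criterion was used, Gini impurity is assumed here because it is a common default, and the function and field names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_binary_split(rows, feature, target, min_cases):
    """Find the 'one level vs. the rest' split with the largest Gini gain.

    rows is a list of dicts; min_cases is the minimum size of each branch.
    Returns (level, gain), or None if no split satisfies min_cases.
    """
    parent = gini([r[target] for r in rows])
    best = None
    for level in sorted({r[feature] for r in rows}):
        left = [r[target] for r in rows if r[feature] == level]
        right = [r[target] for r in rows if r[feature] != level]
        if len(left) < min_cases or len(right) < min_cases:
            continue  # branch too small to be trusted
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
        gain = parent - weighted
        if best is None or gain > best[1]:
            best = (level, gain)
    return best
```

A full tree repeats this search recursively on each branch and across all predictors; a random forest then averages many such trees grown on resampled data.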

Results

For most of the categorical variables, there are a number of levels which appear in the data very infrequently.

Identity variables

Looking at the demographics of the data, we see fairly uneven representation among levels for almost all of the variables. As far as gender, the data is overwhelmingly male, and the levels of ‘Non-binary/gender non-conforming’ and ‘Unknown/unclear’ are used virtually never.

Ages range from 16 to 88 with a median age of 33 and a mean age of 35.9. The ethnicities of ‘Biracial’ and ‘American Indian/Alaskan Native’ hardly occur, and for Religion, ‘Jewish’ and ‘Other’ appear very infrequently. As well, ‘Christian’ and ‘Christian Identity’, while occurring somewhat more often, do not occur in the data nearly as often as ‘Muslim’ and ‘Unknown’.

Regarding Citizenship Status, all levels are relatively infrequent compared to ‘Civilian’ and ‘Foreign national’. There are more cases marked as ‘Othered’ than ‘Non-othered’, but both are well-represented in the data. Lastly, when looking at Veteran Status, almost all cases are coded ‘Civilian’. All other statuses are fairly uncommon and combined make up only about 16% of the data.

Criminal activity variables

The most commonly occurring tactic by far is ‘Providing material/financial support to terrorist organization’. After that, ‘Explosives’, ‘Criminal violation not linked or motivated politically’, ‘Various methods’, ‘Arson’, and ‘Firearms’ occur most frequently.

All levels in the People vs. Property variable are fairly well represented. Regarding targets, for Ideological Target, ‘Unspecified’ is the most frequently occurring level in the data followed by ‘Government’, but all levels aside from those do appear to occur at similar rates. For Physical Target, the levels of ‘Online’, ‘Educational institution’, and ‘Municipal site’ do not occur frequently.

Analysis of Variance (ANOVA)

In the ANOVA results, the F tests show that race, religion, and citizenship have a significant influence on the number of people killed and injured. The identity variable age has a significant relationship with the activity variables people vs. property, physical target, ideological target, and tactic. The eta squared values show that citizenship has a larger effect on the number of people killed and injured than race and religion, and that ideological target has the largest effect on age.
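Eta squared is the share of the total variation in the outcome that lies between groups, that is, the between-group sum of squares divided by the total sum of squares. A minimal Python sketch follows; the function name and input format are illustrative assumptions, not taken from the report.

```python
def eta_squared(groups):
    """Return SS_between / SS_total for a one-way layout.

    groups maps each level (e.g. a citizenship status) to a list of
    numeric outcomes (e.g. number injured per case).
    """
    all_values = [x for values in groups.values() for x in values]
    grand_mean = sum(all_values) / len(all_values)
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    ss_between = sum(
        len(values) * (sum(values) / len(values) - grand_mean) ** 2
        for values in groups.values()
    )
    return ss_between / ss_total  # assumes the outcome is not constant
```

A value near 0 means group membership explains almost none of the variation, even when the F test is significant; a value near 1 means it explains nearly all of it.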

Student’s T-test

Regarding the number of people killed by a crime, we can be 95% confident that, on average, crimes committed by men kill between 0.08 and 8.76 more people than crimes committed by women. For the differences in the number of people injured, we can say with 95% confidence that, on average, men injure between 16.11 and 52.71 more people than women in the course of a socio-politically motivated crime. There is no statistically significant difference in fatalities between crimes committed by othered and non-othered defendants; however, we can be 95% confident that othered defendants injure between 20.15 and 76.3 more people in the course of their crime than non-othered defendants. As well, there is no statistically significant difference in the number of people killed or the number of people injured between those who are civilians and those who are not.
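Intervals like those above describe a difference between two group means. The sketch below shows the general shape of such a calculation in Python. Note the swap: it uses a large-sample normal approximation instead of the Student's t critical value used in the report, because the standard library has no t distribution, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def mean_difference_ci(a, b, confidence=0.95):
    """Approximate confidence interval for mean(a) - mean(b).

    Uses unpooled sample variances and a normal critical value, so it is
    only a reasonable stand-in for a t interval when both samples are large.
    """
    diff = mean(a) - mean(b)
    std_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return diff - z * std_error, diff + z * std_error
```

An interval that excludes zero corresponds to a difference that is significant at the matching level, which is why an interval such as 0.08 to 8.76 is read as a significant, if imprecisely estimated, difference.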

Chi-Squared and Cramer’s V

The Chi-Squared Tests of Independence showed statistically significant relationships between all identity variables and all criminal activity variables. When Cramer’s V is calculated for effect size, however, many identity variables appear to have only a weak effect on criminal activity. Specifically, gender seems to have the least effect on criminal activity. Othered Status has a particularly strong effect on criminal activity, so much so that Cramer’s V indicates Othered Status may be measuring the exact same trends as the criminal activity variables.
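The gap between statistical significance and effect size can be seen by computing both quantities from a contingency table. The following Python sketch computes the chi-squared statistic and Cramer’s V; the function name and the example counts are made up for illustration and are not taken from the tPP data.

```python
from math import sqrt


def cramers_v(table):
    """Cramer's V for an r x c table of observed counts (list of rows).

    Assumes every row and column total is nonzero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0])) - 1  # smaller dimension minus one
    return sqrt(chi2 / (n * k))
```

With the invented table `[[550, 450], [450, 550]]`, the chi-squared statistic is 20 on one degree of freedom, which is highly significant, yet Cramer’s V is only 0.1. Large samples make even weak associations significant, which is why the effect size is reported alongside the test.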

Classification/Regression Trees and Random Forests

Figure 1. The classification tree for the people vs. property variable. At least 50 cases were required for each split, and each final outcome required at least 50 cases.

What we have been able to see is that for predicting the trends in whether a target is human or property, othered status appears to interact with veteran status and age. Othered defendants are more likely to either have targeted people or have no direct target (Figure 1). Of othered defendants who were of civilian status, released on hardship discharge, or whose veteran status was unknown, no direct target was identified; otherwise, people were more likely to be targeted. Among those of non-othered status, those whose veteran status was active duty, dishonorably discharged, belonging to a non-U.S. military, or unknown were more likely to target people. Among those who were not of those veteran statuses, age was an additional factor; those who were 52 and under were more likely to target property, and those 53 and over were more likely to target people (Figure 1). We can see that the most significant variables which made a difference in the trends in which type of target was involved were othered status, veteran status, and age, in this order.

Figure 2. The variable importance plot for the people vs. property random forest model.

After conducting a random forest on the data used to build the classification model and plotting the importance of each variable, we find that veteran status, othered status, and age are the largest contributors to the differences in which types of targets defendants tend to target. Age and othered status were particularly strong determinants in these patterns (Figure 2).

Figure 3. The ANOVA/regression tree for the number killed variable. At least 20 cases were required for each split, and each final outcome of the tree required at least 15 cases.

The tree for the number of fatalities in each crime first splits at veteran status. Defendants whose veteran status was active duty, civilian, dishonorably discharged, honorably discharged, or unknown had an average of 1.8 fatalities (Figure 3). Within that group, defendants whose citizenship status was refugee, residing on a visa, citizen, permanent resident, or unknown had a fairly low average of 0.77 fatalities (Figure 3). Defendants who were not of these citizenship statuses had an average of 5.7 fatalities, with another split at religion (Figure 3). Those whose religion was identified as Christian or unknown had a fairly low average of 0.43 fatalities, compared with an average of 10 for those whose religions fell outside of these 2 categories (Figure 3). From there, age was a major determinant in the number of fatalities. Those who were under 25 had, on average, the second-most fatalities at 30, and those who were 25 or older had only 7.6 fatalities on average (Figure 3).

For defendants who were former or current non-U.S. military members or who were discharged on the basis of hardship, the average number of fatalities was 18, 10 times higher than for defendants not of these veteran status categories (Figure 3). From here, there is a split at age; those who were 35 or younger had an average fatality count of 6.2, whereas those who were 36 or older had an average fatality count of 32 (Figure 3). We can see that the most significant variables in predicting differences in the number of people killed were veteran status, citizenship status, religion, and age.

Figure 4. The variable importance plot for the people killed random forest model.

After conducting a random forest on the data used to build the regression/ANOVA model and plotting the importance of each variable, we find that age is a very significant predictor in determining the differences in fatalities among each case of crime (Figure 4). However, we cannot ignore the influence of veteran status or citizenship status, as they were significant variables on which the regression trees were split, and the variable importance plot also reflects this (Figure 4).

Figure 5. The ANOVA/regression tree for the people injured variable. At least 50 cases were required for each split, and each final outcome of the tree required at least 25 cases.

Looking at our results in Figure 5, we find that among defendants who were U.S. citizens, refugees, residents on a visa, permanent residents, or of unknown citizenship status, the average number of people injured was 4.1. For defendants who were not, there was a split at religion; those whose religion was identified as Christian, Christian Identity, or unknown had an average of 1.6 injuries (Figure 5). Among those whose religions were not in those categories, there was a split at age. For those who were 27 or older, the average number was 141, and for those who were 26 or under, the average number was 429 (Figure 5). We can conclude from this tree that citizenship status, religion, and age were important factors in predicting the differences in the number of people injured.

Figure 6. The variable importance plot for the people injured random forest model.

After conducting a random forest on the data used to build the regression/ANOVA model and plotting the importance of each variable, we find that citizenship status and age are particularly important in determining trends and predicting differences in the number of people injured as a result of a crime (Figure 6).

Figure 7. The classification tree for the physical target variable. At least 75 cases were required for each split, and each final outcome required at least 75 cases.

What we can see in this classification tree is that there is an initial split for othered status (Figure 7). Among those of othered status, we can see a split for veteran status. Among defendants who were civilians, former veterans released on hardship discharge, or former veterans who were honorably discharged, the physical target was more likely to be unspecified; among defendants whose veteran status did not fall in these 3 categories, no direct physical target was found (Figure 7). For those of non-othered status, private sites were more likely to be attacked, and there was a split for religion. Defendants whose religion was identified as Christian, Jewish, or Muslim were more likely to have an unspecified target, and those whose religion was not one of those 3 were more likely to attack private property (Figure 7). There is a further split in age; defendants who were 40 or older often had an unspecified physical target, whereas those under 40 tended to attack private sites (Figure 7).

Figure 8. The variable importance plot for the physical target random forest model.

After conducting a random forest on the data used to build the classification model and plotting the importance of each variable, we find that age, religion, othered status, and veteran status are important in predicting differences in physical targets (Figure 8). Age and veteran status appear to be particularly important in determining the differences between physical targets (Figure 8).

Figure 9. The classification tree for the ideological target variable. At least 50 cases were required for each split, and each final outcome required at least 25 cases.

We notice that the first split of this classification tree is at othered status (Figure 9). Of defendants who are of othered status, there is a split at veteran status. For defendants who are civilians, were honorably discharged, were discharged on the basis of hardship, or whose veteran status is unknown, there was an unspecified ideological target; for defendants whose veteran status is not one of those 4 categories, government was the most likely ideological target (Figure 9). For those of non-othered status, there is a split on age; those who were 35 or over were more likely to attack government targets on the basis of ideology (Figure 9).

For non-othered defendants who were under 35, there was a split on religion; those whose religions were identified as Christian, Christian Identity, Jewish, or Muslim tended to attack on the basis of identity (Figure 9). Among those whose religions were not one of those 4 categories, veteran status was a significant predictor; civilians were more likely to attack left-leaning industries, while non-civilians were more likely to attack government on an ideological basis (Figure 9). In general, we have found that othered status, veteran status, age, and religion were significant variables in predicting ideological target.

Figure 10. The variable importance plot for the ideological target random forest model.

After conducting a random forest on the data used to build the classification model and plotting the importance of each variable, we find that age, othered status, and religion are important in predicting differences in ideological targets (Figure 10). Age and othered status appear to be particularly important in determining the differences between ideological targets (Figure 10).

Figure 11. The classification tree for the tactic variable. At least 100 cases were required for each split, and each final outcome required at least 100 cases.

Othered status appears to be very significant in predicting the tactic that a defendant used in committing a crime (Figure 11). Among those who are of othered status, the most common tactic, by far, was providing material support to a terrorist organization (Figure 11). Among those of non-othered status, religion is a significant predictor of tactic; defendants whose religion was identified as Christian, Muslim, or “Other” were more likely to employ multiple (or various) methods (Figure 11). Among defendants whose religion was not Christian, Muslim, or “other”, age is a significant predictor of tactic; those who were 30 or over were more likely to use explosives when committing a terrorist act, and those who were under 30 were more likely to use arson (Figure 11).

Conclusions

This report finds that while all interactions between variables that define a defendant’s identity and variables that define a defendant’s criminal activity are significant, the variables which have the greatest prediction effect in terms of criminal activity are whether a defendant is othered or non-othered and the factors which contribute to that differentiation (religion, ethnicity/race, citizenship status), and a defendant’s veteran status. A defendant’s gender, while a significant factor in terms of the number of victims that result from a socio-politically-motivated crime, is generally not a significant predictor in other factors of criminal activity (tactic, target, etc.). The results from our classification/regression trees and random forests appear to show that the most significant identity variables associated with different trends in criminal activity were related to age, citizenship status, veteran status, religion, and othered status. For the classification trees and their associated random forests, the variables that were particularly of importance were age and othered status, and for the regression trees and their associated random forests, the variables that were particularly of importance were age and citizenship status. Overall, age proved to be a very significant predictor in explaining differences in trends in criminal activity.

Some limitations of these random forests and classification/regression trees were the large number of unspecified or unknown cases, as well as a sizable number of unused levels for tactic, physical target, ideological target, and people vs. property. We noticed that for the classification tree models, the error rate generally ranged from 46-55%, and for the regression/ANOVA tree models, the percentage of variability explained by the model was negative. Thus, because of the poor predictive power of these models, we must exercise caution in assuming that the identity variables we found to be significant have any causal effect.
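A negative percentage of variability explained can look like a mistake, but it follows directly from the usual definition, one minus the ratio of the model's squared error to the squared error of always predicting the mean. The Python sketch below uses invented numbers and a hypothetical function name to show how the value drops below zero; it assumes this standard definition, which the report does not spell out.

```python
def variance_explained(actual, predicted):
    """Return 1 - SS_residual / SS_total for paired actual/predicted values."""
    mean_actual = sum(actual) / len(actual)
    ss_total = sum((y - mean_actual) ** 2 for y in actual)
    ss_residual = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    return 1.0 - ss_residual / ss_total


# Invented outcome: three cases with no injuries and one with four.
actual = [0, 0, 0, 4]
baseline = [1, 1, 1, 1]   # always predicting the mean gives exactly 0.0
misplaced = [4, 0, 0, 0]  # predicting the spike in the wrong case is worse
```

Here `variance_explained(actual, baseline)` is 0.0 and `variance_explained(actual, misplaced)` is negative. For heavily skewed outcomes such as injury counts, where a handful of cases dominate, a model that misplaces those cases on held-out data can easily do worse than the mean.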

References

Brown, James D. 2008. “Effect size and eta squared.” JALT Testing & Evaluation SIG News.

conjugateprior. 2013. “Formulae in R: ANOVA and other models, mixed and fixed.” Blog. Accessed February 27, 2019. Retrieved from http://conjugateprior.org/2013/01/formulae-in-r-anova/.

Liaw, A., and M. Wiener 2002. Classification and Regression by randomForest. R News 2(3), 18-22.

Loadenthal, Michael, et al. 2019. “The Prosecution Project (tPP)” (Version March 2019) [Dataset]. Miami University Sociology Department. https://tpp.lib.miamioh.edu.

Loadenthal, Michael, Athena Chapekis, Lauren Donahoe, Alexandria Doty, and Sarah Moore. 2019. “The Prosecution Project (tPP) Codebook” (Version 2) [Code book]. Miami University Sociology Department. https://tpp.lib.miamioh.edu.

Loadenthal, Michael, Athena Chapekis, Lauren Donahoe, Alexandria Doty, and Sarah Moore. 2019. “The Prosecution Project (tPP) New Member Guidebook” (Version 1) [Instructional Manual]. Miami University Sociology Department. https://tpp.lib.miamioh.edu.

Milborrow, Stephen. 2018. rpart.plot: Plot ‘rpart’ Models: An Enhanced Version of ‘plot.rpart’. R package version 3.0.6. https://CRAN.R-project.org/package=rpart.plot

Navarro, D. J. 2015. Learning statistics with R: A tutorial for psychology students and other beginners. R package version 0.5. University of Adelaide. Adelaide, Australia.

Mangiafico, Salvatore S. 2015. “Student’s t–test for Two Samples”. http://rcompanion.org/rcompanion/d_02.html

Therneau, Terry, and Beth Atkinson. 2018. rpart: Recursive Partitioning and Regression Trees. R package version 4.1-13. https://CRAN.R-project.org/package=rpart

Wickham, Hadley. 2017. tidyverse: Easily Install and Load the ‘Tidyverse’. R package version 1.2.1. https://CRAN.R-project.org/package=tidyverse

tPP Preliminary Statistical Report #2 of 5

The following report was completed by statistics students utilizing a version of the tPP dataset as of March 13, 2019. These analyses are focused on developing models for future use, and the interpretations and conclusions they contain reflect a dataset still in development, and only a superficial engagement with the wider literature on political violence. We continue to expand, improve and refine the data, and as such, these analyses should be seen as preliminary and subject to change. The views expressed in these reports belong solely to the authors, do not necessarily reflect the findings of the tPP team, and are subject to further inquiry and revision.



Below you will find the non-technical analytical summary and selected visualizations for Team #2 (Ideological Analysis) of 5.

This report was authored by Lesi Wei, Lexi Gelinas, Siqi Zhang & Yiduo Yang. To download the complete report, click here.



Introduction

The Prosecution Project (tPP) has collected data on cases in which individuals or groups engage in political violence that results in a felony or has been described through State speech as having a connection to a terrorist or extremist group with a political agenda. Specifically, this analysis is looking at several key variables in the relationship between ideology and the political violence itself.

Results

Ideology and Lethality

Most instances of political violence do not result in a death, but of the ones that do, Rightist groups commit more than any other group.

Ideology and People vs Property

Salafi, Jihadist, or Islamic groups commit more attacks against no direct target than any other group. Rightist groups have more cases in which they attack property than people.

Tactic and Physical Target

Threat/support of an organization is the most used tactic and has the most cases in the online community and against unknown targets.

Ideology and Ideological Target

Salafi, Jihadist, or Islamic groups have more cases in which they attack unspecified ideological targets than any other group.

Ideology and State Speech

Defendants with no group affiliation and Leftist groups have more cases involving state speech than the other groups.

Tactic and Group Affiliation & FTO Affiliation

Salafi, Jihadist, or Islamist individuals tend strongly toward the tactic of threat/support of an organization, and Rightist individuals tend toward an external device as their tactic. Individuals affiliated with an FTO tend to provide material/financial support to the terrorist organization, while those with no FTO affiliation make more use of an external device.

Ideology and Location

Salafi, Jihadist, or Islamist individuals commit more attacks on the East Coast, on the West Coast, and in the Midwest of the United States. Rightist groups commit more attacks in the central United States. There are only two states in which Leftist groups commit the most political violence.

Conclusions

Not all categorical variables show obvious trends; based on the plots, only a few categories under each variable show notable trends. The technical report examines these relationships in more depth.

References

McHugh, M. (2013). The Chi-square test of independence. Biochemia Medica 23 (2) 143-149.

R Core Team (2017). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/ .

tPP Preliminary Statistical Report #1 of 5

The following report was completed by statistics students utilizing a version of the tPP dataset as of March 13, 2019. These analyses are focused on developing models for future use, and the interpretations and conclusions they contain reflect a dataset still in development, and only a superficial engagement with the wider literature on political violence. We continue to expand, improve and refine the data, and as such, these analyses should be seen as preliminary and subject to change. The views expressed in these reports belong solely to the authors, do not necessarily reflect the findings of the tPP team, and are subject to further inquiry and revision.



Below you will find an analytical summary and selected visualizations for Team #1 (Descriptive Analysis) of 5.

This report was authored by Emma Ellis, Sikai Huang, Haiduan Tao & Haosen Yang. To download the complete report, click here.



The main question answered in this report is: How does the US legal system prosecute acts of political violence (descriptive) and how has this changed over time and space?

First, the data was mined and edited using RStudio. The final format had 1280 observations. The only observations that were removed from the data set were cases that had ‘pending’ as values because these had no information and would negatively impact the descriptive statistics that were created. Each of the variables chosen had a table created. These tables looked at Category, Number of Observations, Average Prison Sentence Length, Percentage of Life Sentences, and Percentage of Death Sentences. Multiple tables had a lot of zeros under the death sentence column.
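The per-variable tables described above amount to a grouped summary. A minimal Python sketch of that summary is given below; the team worked in R, and the field names used here (`sentence_months`, `life`, `death`) are hypothetical stand-ins for however the dataset actually codes sentences.

```python
from collections import defaultdict


def summarize(cases, key):
    """Build one row per category of `key` with counts and sentence summaries.

    cases is a list of dicts; 'life' and 'death' are assumed to be booleans
    and 'sentence_months' a number (all three names are assumptions).
    """
    groups = defaultdict(list)
    for case in cases:
        groups[case[key]].append(case)
    table = []
    for category, rows in sorted(groups.items()):
        n = len(rows)
        table.append({
            "category": category,
            "n": n,
            "avg_sentence_months": sum(r["sentence_months"] for r in rows) / n,
            "pct_life": 100 * sum(r["life"] for r in rows) / n,
            "pct_death": 100 * sum(r["death"] for r in rows) / n,
        })
    return table
```

Calling `summarize(cases, "gender")`, for example, would return one dictionary per gender category with the five columns listed above.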

After the tables were initially created, some categories were combined, depending on the variable. The only variable for which no table was created was location, because a geomap was found to be a more useful visualization. The geomap showed that states with higher populations also had a higher number of life and death sentences.

The states shown in white (Wyoming, Nebraska, Rhode Island, and Hawaii) have no cases in the data provided for the project. New York has the largest prosecution count, far more than any other state. Overall, in about 87% of states the average prison sentence is shorter than 200 months. Oklahoma and New Hampshire have longer prison sentences than other states, but they have few prosecutions. Texas, California, and New York also have relatively long prison sentences along with higher prosecution counts. Oklahoma has the largest percentage of life sentences and death sentences. Nearly half of the states have life sentences, and 23% of states have death sentences.

Since this analysis is wholly descriptive, no definite conclusions can be drawn for predicting the length of a prison sentence. From the tables that were created and the geomap, some trends were found with regard to life and death sentences.

One major finding is that no death sentence was given in any case where the defendant was not a U.S. citizen.

Another notable finding was that no death sentence was given when no deaths were involved; the most interesting part of this is that there were over 1,000 observations with zero killed.

The last notable finding was that no case in which an informant was present resulted in the death penalty. One possible explanation is that a crime can be stopped if the police are informed beforehand.

References

Hadley Wickham (2017). tidyverse: Easily Install and Load the ‘Tidyverse’. R package version 1.2.1. https://CRAN.R-project.org/package=tidyverse

Garrett Grolemund, Hadley Wickham (2011). Dates and Times Made Easy with lubridate. Journal of Statistical Software, 40(3), 1-25. URL http://www.jstatsoft.org/v40/i03/ .

David M Diez, Christopher D Barr and Mine Cetinkaya-Rundel (2017). openintro: Data Sets and Supplemental Functions from ‘OpenIntro’ Textbooks. R package version 1.7.1. https://CRAN.R-project.org/package=openintro

Paolo Di Lorenzo (2018). usmap: US Maps Including Alaska and Hawaii. R package version 0.4.0. https://CRAN.R-project.org/package=usmap

Carson Sievert (2018) plotly for R. https://plotly-book.cpsievert.me