tPP Preliminary Statistical Report #5 of 5

The following report was completed by statistics students utilizing a version of the tPP dataset as of March 13, 2019. These analyses are focused on developing models for future use, and the interpretations and conclusions they contain reflect a dataset still in development and only a superficial engagement with the wider literature on political violence. We continue to expand, improve and refine the data, and as such, these analyses should be seen as preliminary and subject to change. The views expressed in these reports belong solely to the authors, do not necessarily reflect the findings of the tPP team, and are subject to further inquiry and revision.



Below you will find the non-technical and technical analytical summaries and selected visualizations for Team #5 (Classification & Characteristic Tree Analysis) of 5.

This report was authored by Brent Crist, Elena McDonald, Yuan Liu & Xinru Yu. To download the complete report, including the statistical source code, click here.



Non-Technical Summary

Introduction

Classification of terrorist attacks is the main problem of the Prosecution Project. Terrorism is one of the hottest topics in the news today, due to its increasing prevalence. Looking at acts of terrorism or political violence on a case-by-case basis, it is interesting to see how the government classifies each of them. Of particular interest is how cases whose only reason for inclusion is “State Speech Act” compare with cases that combine the State Speech Act with other reasons, and with cases that involve no State Speech at all. Determining why and how the government labels these cases provides an opportunity for analysis. The data come from The Prosecution Project (tPP), based in the sociology department at Miami University, and yield the Reason for Inclusion, Tactic, Number Killed, Number Injured, and Othered Status for each case. The tPP dataset examines the taxonomy of felony criminal cases involving illegal political violence occurring in the United States since 1990. Utilizing the tPP dataset allows for an explanation of the government classifications, the effects these variables have on the decision, and how it changes through time.

Results

The Lethality variable is split by Reason for Inclusion categories: State Speech (the motivation for the terrorist act is explicitly political), No State Speech (the motivation for the terrorist act does not involve political purposes), and Combination (a mixture of the two). To better examine the distribution of lethality, below are the mean and the standard deviation for each Reason, along with the number of cases belonging to each Reason. The mean and standard deviation of State Speech are clearly the lowest, well below those of No State Speech and Combination. The same pattern appears in the number of cases.

Looking at the Methods attackers are using, the top three Methods per Reason for Inclusion are below. Providing Support to a terrorist organization is the top method for No State Speech and a Combination. Non-political Method is the most common for State Speech and represents over half of all State Speech Cases. Generally, terrorist attacks in the news, in recent years, involve explosives, firearms, and/or vehicle ramming. The Explosives Method appears less frequently than one might expect, given the frequency of news articles.

The third variable of interest is Othered Status. The table below, once again, breaks down Othered Status by Reason for Inclusion. For both State Speech and Combination, Othered individuals heavily outnumber Non-Othered individuals. In cases that are No State Speech, the two groups are split almost exactly fifty-fifty.

Conclusion

For Lethality, no state speech is the most common reason, where state speech is much lower. Interestingly, providing support to terrorists or terrorist organizations is the most frequently encountered category for both no state speech and combination. Given the size of both of these categories, the frequency of this providing support is of interest to researchers for its implications in both separate categories. In all cases, the othered status of an individual might help researchers better understand how the state labels these people as terrorists. Because the categories state speech and combination carry implications of a directed attack against the state, the juxtaposition of the othered status reveals data to researchers who might be studying the othered status of terrorists.

Technical Summary

Introduction

The Prosecution Project provides a chance to determine when and what factors cause the state to label a criminal act as terrorism. In this analysis, several techniques help determine how these acts make the list. Data manipulation and cleaning assist the analysis by creating convenient (and statistically viable) groupings. Summary statistics and data visualization further enhance the ability to understand how these variables change over time and how they relate to one another. Creating a characteristic tree is a strong method for analyzing what factors cause the government to label criminal acts as terrorism. The random forest method allows for validation of pruned trees and aids the analysis in this paper.

Methods

Data cleaning and manipulation are the first two crucial steps to proper analysis. For the tPP data, the research question revolves around the following variables: Reason for Inclusion, Tactic, Lethality, Othered Status, and Date. Lethality is not a variable present in the data set; it is constructed by adding the total kills and injuries resulting from the offense in each case. To address the time element of the research question, presidential terms serve as meaningful time intervals for comparison. Associating the Day, Month, and Year of an event with the Day, Month, and Year of the inauguration of each president (in the scope of the data frame) allows for this timeline to form. The earliest case in the data frame occurs during Bill Clinton’s service, while the latest case occurs during Donald Trump’s service, with George W. Bush and Barack Obama in between. By adding the political affiliation of each president, another layer of analysis and comparison comes into play.
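The two derived variables described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the R code used for the report: the field names `killed`, `injured`, and `date` are hypothetical, the example case is made up, and the term boundaries are simply the January 20 inauguration dates of the four presidents named in the text.

```python
from bisect import bisect_right
from datetime import date

# Inauguration dates of the four presidents covered by the data frame.
INAUGURATIONS = [
    (date(1993, 1, 20), "Clinton", "Democratic"),
    (date(2001, 1, 20), "Bush", "Republican"),
    (date(2009, 1, 20), "Obama", "Democratic"),
    (date(2017, 1, 20), "Trump", "Republican"),
]
_STARTS = [start for start, _, _ in INAUGURATIONS]


def lethality(killed, injured):
    """Lethality of a case: number killed plus number injured."""
    return killed + injured


def president_for(event_date):
    """Return (president, party) for the term containing event_date.

    Returns None for dates before the first inauguration listed.
    An event on an inauguration day is assigned to the incoming president.
    """
    idx = bisect_right(_STARTS, event_date) - 1
    if idx < 0:
        return None
    _, name, party = INAUGURATIONS[idx]
    return name, party


# Hypothetical case record, for illustration only.
case = {"killed": 2, "injured": 5, "date": date(2001, 9, 11)}
print(lethality(case["killed"], case["injured"]), president_for(case["date"]))
```

Keeping the party next to each term start means the political-affiliation layer of the comparison comes from the same lookup as the president.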

For purposes of the characteristic tree analysis, reduction of the Tactic variable with twenty unique levels is necessary. Reducing the number of levels gives more splitting power in the characteristic trees, further in the analysis. The percentage of cases involving each tactic hints at how much information each unique tactic provides to the overall analysis. Having eight levels, seven without Other, rather than the original twenty levels strengthens the resulting analysis.

Reason for Inclusion also must undergo manipulation. To look specifically at the prevalence of the State Speech Act, Reason for Inclusion is split to reflect this act. The three groups become cases that are State Speech, Not State Speech, and a Combination of the State Speech Act and other reasons. With this new variable, along with the others, the data are ready for investigation. Working with the data, summary statistics for Reason, Method, Lethality, and Othered Status show how the data behave and what they look like. Additionally, separating bar graphs for the same set of variables by President shows how each of these changes over time. The bar graphs for Reason, Method, and Othered Status are proportions, while the bar graph for Lethality represents a count.
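The three-way split of Reason for Inclusion can be illustrated with a short sketch. It assumes, hypothetically, that each case carries a list of reason labels and that the relevant label is spelled exactly `"State Speech Act"`; the other label in the example is a placeholder, not a value taken from the tPP codebook.

```python
STATE_SPEECH = "State Speech Act"  # assumed spelling of the label


def reason_group(reasons):
    """Collapse a case's reasons for inclusion into one of three groups."""
    cleaned = {r.strip() for r in reasons if r.strip()}
    if STATE_SPEECH not in cleaned:
        return "No State Speech"
    if cleaned == {STATE_SPEECH}:
        return "State Speech"
    return "Combination"


# Placeholder reason labels, for illustration only.
print(reason_group(["State Speech Act"]))
print(reason_group(["State Speech Act", "Some other reason"]))
print(reason_group(["Some other reason"]))
```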

Creation of a characteristic tree (Buntine, 1992) can help analyze what factors cause the government to include each case, and the reason for the inclusion. Building a characteristic tree is not enough; both cross-validation and building a random forest provide insight as to how well the tree fits the data. Execution of this technique in R, by partitioning the data into a training and testing set, produces this information. Fitting a tree, using a cost element for each partition, creates the optimal tree, which will undergo methods of cross-validation (Zhong, 2016).

Comparing the predictions with the real data yields the accuracy of these models. Further testing of the accuracy comes from the Random Forest, which creates a large sample of random trees (Zhong, 2016). Creating a large number of random trees, each of which uses a random selection of the variables to split on, provides more evidence of model accuracy. Because the random forest generalizes the process, comparing its predictions against the testing data set gives a stronger accuracy measure.
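The accuracy comparison described here amounts to counting how often a predicted label matches the label in the held-out testing set. The sketch below is a generic Python illustration with made-up labels; it is not the caret or rpart output from the report.

```python
from collections import Counter


def accuracy(actual, predicted):
    """Share of test cases whose predicted label equals the actual label."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def confusion_matrix(actual, predicted):
    """Count each (actual, predicted) pair; diagonal pairs are correct."""
    return Counter(zip(actual, predicted))


# Made-up labels for four held-out cases.
actual = ["State Speech", "No State Speech", "Combination", "Combination"]
predicted = ["State Speech", "No State Speech", "No State Speech", "Combination"]
print(accuracy(actual, predicted))
for (a, p), n in sorted(confusion_matrix(actual, predicted).items()):
    print(f"actual={a!r} predicted={p!r} count={n}")
```

The off-diagonal pairs in the confusion matrix show which Reason is being mistaken for which, something a single accuracy figure hides.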

Many R packages are essential for the methods of this analysis. These procedures require the lubridate (Grolemund and Wickham, 2011), caret (Kuhn and Others, 2019), rpart (Therneau and Atkinson, 2018), rpart.plot (Milborrow, 2018), and randomForest (Liaw and Wiener, 2002) packages in R.

Results

In order to properly understand the motive of terrorist attacks, the execution methods play a vital role in their inclusion in this dataset. The Prosecution Project includes an exhaustive list of methods detailing how the acts are committed; however, grouping methods with similar tactics allows for proper analysis. That is, all acts, including acts that effectively serve as the threat of committing another act, are in the same group for analysis (e.g. “Explosives” and “Bomb Threats” become “Explosives”). Additionally, tactics that are “Unspecified” are not useful to a deeper understanding and hence do not appear in this analysis. Lastly, all tactics that comprise less than 1% of the total tactics and do not fit neatly into the aforementioned methods (Animal Release, Blockading, Unarmed Assault, Vandalism) do not appear in this analysis (see Prevalence of Tactic table in Appendix for more details). These categories, with the terrorists’ reasoning, offer more insight into how a terrorist attack is carried out given the motivation behind it. The table below shows the prevalence of each Method in the data in relation to each Reason for Inclusion.
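The grouping rules above can be sketched as a small cleaning function. The merge table contains only the one example given in the text; the full mapping used in the report is in its Appendix and is not reproduced here. The counts in the example are invented, and the 1% share is computed against all original cases, which is an assumption.

```python
from collections import Counter

MERGE = {"Bomb Threats": "Explosives"}  # only the merge named in the text
DROP = {"Unspecified"}


def collapse_tactics(tactics, min_share=0.01, merge=MERGE, drop=DROP):
    """Merge threat tactics into their act, drop unspecified and rare tactics."""
    total = len(tactics)
    merged = [merge.get(t, t) for t in tactics if t not in drop]
    counts = Counter(merged)
    # Keep a tactic only if it accounts for at least min_share of all cases.
    keep = {t for t, c in counts.items() if c / total >= min_share}
    return [t for t in merged if t in keep]


# Invented counts, for illustration only.
raw = (["Explosives"] * 60 + ["Bomb Threats"] * 20 + ["Firearms"] * 118
       + ["Unspecified"] + ["Vandalism"])
print(Counter(collapse_tactics(raw)))
```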

Interestingly, more than half the cases that are State Speech are Non-Political (e.g. James Tyler Williams, who killed a homosexual couple because they were gay). The majority of State Speech cases are Non-Political, which are non-violent crimes relating to assisting terrorism or denying the ability of the state to pursue these crimes. No State Speech’s top three methods together account for 62.2% of the cases in this category. This means that No State Speech cases are spread across more types of crime, whereas State Speech and Combination are heavily concentrated in Non-Political and Provide Support, respectively.

The summary statistics of lethality per method provide useful insight into how lethality differs across these crimes. For instance, the mean lethality of the Firearms method should be different from that of the Provide Support method. The standard deviation also shows the spread within each of these methods.

Most cases yield results that fit the narrative of terrorism. Notice the higher means in the Firearms and Hostage/Standoff categories and the lower means in Non-Political and Provide Support categories. Higher standard deviations in the Explosives, Firearms, and Hostage/Standoff categories create a level of uncertainty in how many people are likely to be killed or injured from one of these attacks.

The Othered Status of an individual provides notable statistics for the Reason for Inclusion as well. It is critical to note that the Othered Status itself is quite subjective and is not a uniform label. That is, in no way are there exact criteria for a terrorist to be given an Othered Status. Mapping the Othered Status of a person to the reason their crime was included in the database allows for insight on how an othered person’s crime might be perceived by the State.

State Speech has the largest discrepancy between Othered and Non-Othered Status. This is to say that the vast majority of terroristic acts, when involving State Speech, are by Othered people. Whether or not this has any bearing on the period of time in which these acts happen appears later in this paper. No State Speech sees almost even percentages of Othered and Non-Othered people. As the guidelines for No State Speech are less specific than the other Reasons for Inclusion, there might be less cause for people of Othered and Non-Othered status to commit motivated terrorist attacks and more for the sake of senseless violence. The Combination Reason for Inclusion sees just over twice as many Othered people committing terrorist attacks for this reason as Non-Othered people.

As technology and geopolitical climates change with time, so too does the methodology of a terroristic act. Grouping these methods by their place in time relative to the President in office at the time of their happening gives way to visual representation of these statistics.

Drastic changes come over the years as the geopolitical climate changes. Notice there is a massive increase in the Provide Support (yellow) Method in the Bush, Obama, and Trump Administrations compared with the Clinton Administration. This could be due largely to the fact that the Global War on Terror takes place during these Presidencies but not during Clinton’s. It is not unreasonable to believe that the United States, as a strategy to deter violent terrorism, is labeling more non-violent crimes as terrorism than in years past. Since the United States is an economic superpower, its dollar has more buying power around the world. Because of this, terrorist sympathizers are able to accrue cash with much more buying power than in their home countries (assuming they support foreign terrorist organizations). This results in the ability of foreign terrorist organizations to acquire much higher numbers of supplies for violent terrorist attacks.

As the changing of methods through time offers insight into how the United States labels a crime as a terroristic act, the Othered Status of a person, too, changes in time. Different conditions in the United States during the four Presidencies included in this dataset might offer clues into how the status changes.

Notice, again, how the Othered status of terrorists changes drastically after the Clinton era. The United States, during this time period, could be experiencing higher sensitivity to terrorism due greatly to the loss of life from the September 11 attacks. As the Global War on Terror continues through the years, the Othered status of terrorists lowers. Whether this is due to a Liberal Obama Administration and a smaller sample size for the Trump Administration or that the United States and its citizens are becoming less skeptical of the people committing these crimes requires further study.

Seeing how the lethality of each of these acts changes in time can give clues as to how violent the crimes committed in these separate time periods are. Given the rise in non-violent methods in the past three Presidencies, studying the counts of lethality in their terms will shed light on how many people were killed in violent terrorist attacks in these time periods.

The lethality of these attacks again increases substantially during the years following the September 11 attacks. It is important to note that President Donald Trump has only been in office for just over two years at the time of conducting this analysis. The significantly lower lethality could be due mostly to the fact that the sample size is much smaller.

Seeing how the Method, Othered Status, and Lethality have changed through time then lends itself to studying how all terroristic acts included in the dataset have changed. Political moods and outside factors might play into how these crimes are included, and can be visualized by plotting them by the four Presidencies included in tPP.

A large change in No State Speech occurs from the Clinton to Bush Administrations. It is difficult to determine whether this is due to the United States’ sentiment towards terrorism changing after the September 11 attacks or some other unknown variable. The Combination and State Speech groups constitute the largest change from the Clinton to Bush Administration. From the Bush to Obama Administration, a change in these two categories again occurs with State Speech becoming less prevalent and Combination becoming more prevalent. The increase in the Combination group might be a result of President Barack Obama being the first African American President in the history of the United States. A terroristic act due to this fact along with other racially charged motivations constitutes inclusion in the Combination group; however, this hypothesis requires further analysis and is not part of this study.

Machine learning processes can help to classify each case, with respect to their Reason for Inclusion, by the separate variables in the dataset. Splitting each of the nodes into various methods, Presidents, and lethality allows for the computer to decide where a case might fit based on the given factors and to create a Classification Tree from these splits.

The most important factor in this tree is the Method. From the first partition, all of the methods are present except for Non-Political and Perjury/Obstruction of Justice, which lead to the State Speech node on the right. The only Methods for which a case is likely to be State Speech are Perjury/Obstruction of Justice or Non-Political. Of note, President is the second partition on both the No State Speech and the State Speech nodes, and Obama appears in both of the positive splits for President. Only 10% of the entries fall under the criteria of Non-Political Method. Additionally, there is no partition that requires the Othered Status or Lethality in this tree. This tree shows a path to follow to see how the government categorizes each type of case. The model accuracy rate of this optimized tree is about 65%; this comes from comparing the predicted values with those in the testing data set. A confusion matrix allows for the analysis of the performance of a Classification Tree. The model is the most accurate in predicting cases of State Speech and the least accurate for cases of a Combination.

The above procedure of obtaining a pruned tree involves using a training and testing data set. Splitting the data and training a model on part of it and then testing the model on the other part is a form of cross-validation. Another way to check the accuracy of the model is through a random forest. A random forest allows for validation of singular trees. Random forest importance plots show the validity of five hundred random trees from the data.

The Mean Decrease Accuracy and Mean Decrease Gini plots show how important a variable is to the partitioning process in the creation of a characteristic tree. The further along the x-axis a variable lies (in both plots), the greater its presence in the partitioning process in randomly generated trees in the forest. As in the earlier singular characteristic tree, Method again is the most important variable for determining whether an act is State Speech, No State Speech, or Combination. Despite the large gaps between the variables (meaning the partitioning process becomes less accurate), it is worth noting that the variables in this order help to increase the validity of the singular tree. Lethality and Othered Status are the two least important predictors, according to the random forest data. In summary, this means that the order of importance for determining the Reason for Inclusion is Method, President, Lethality, and Othered Status. The accuracy rate for the random forest is 70.2%, meaning the model for this data is predicting cases correctly 70.2% of the time.
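For readers unfamiliar with the Gini measure, the quantity behind a Mean Decrease Gini plot is the drop in node impurity produced by a split; the randomForest package totals these drops for each variable and averages them over the trees. The sketch below computes the drop for a single split on made-up labels and is only an illustration of the definition, not a reproduction of the report's forest.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity of the parent node minus the size-weighted child impurity."""
    n = len(parent)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted


# Made-up node with two classes, split two different ways.
parent = ["S"] * 4 + ["N"] * 4
print(gini_decrease(parent, ["S"] * 4, ["N"] * 4))                # clean split
print(gini_decrease(parent, ["S", "S", "S", "N"], ["S", "N", "N", "N"]))
```

A variable that repeatedly produces large decreases, as Method does here, ends up furthest along the x-axis of the importance plot.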

Conclusion

Many outside political factors (e.g. the September 11 attacks, the Global War on Terror, Presidential Administration) can affect how the government classifies crimes as terroristic acts or not. These classifications do change over time, and the methods involved play a significant role in determining whether they are state speech acts of terrorism, not affiliated with state speech, or a combination of the two. In predicting how a government will classify a case, the method by which a crime is committed and the President in office at the time it is committed, in order of importance, have the most impact. The others do not provide as much information, but they are, in order of relevance, lethality of the crime and the Othered Status of the terrorist committing the crime. The splitting power of President in the characteristic trees drives home the finding of how the Reason for Inclusion changes in time. The random forest solidifies the importance of the variables presented in the pruned tree through the Mean Decrease Accuracy and the Mean Decrease Gini. Not surprisingly, the pruned tree ended with five terminal nodes, two of which were No State Speech, two of which were Combination, and only one of which was State Speech. These results are consistent with the raw counts of each of the individual reasons for inclusion. With a 65% accuracy rate in the pruned tree and a 70% accuracy rate in the random forest, there is reason to believe that the variables in these trees make for important determining factors in whether a terroristic act will be classified as a state speech act, a non-state speech act, or a combination of the two.

tPP Preliminary Statistical Report #4 of 5

The following report was completed by statistics students utilizing a version of the tPP dataset as of March 13, 2019. These analyses are focused on developing models for future use, and the interpretations and conclusions they contain reflect a dataset still in development and only a superficial engagement with the wider literature on political violence. We continue to expand, improve and refine the data, and as such, these analyses should be seen as preliminary and subject to change. The views expressed in these reports belong solely to the authors, do not necessarily reflect the findings of the tPP team, and are subject to further inquiry and revision.



Below you will find the non-technical analytical summary and selected visualizations for Team #3 (Change over time) of 5.

This report was authored by Daniel Cirkovic, Yi Jing, Samantha Thompson, & Xuemeng Wang. To download the complete report, including the statistical source code, click here.



0.1 Non-Technical Report

0.1.1 Introduction

The Prosecution Project (tPP) is a collection of data that specifically investigates patterns in political violence and terrorism occurring in the United States from 1990 to the present. Data is continuously being added, so updates to the following analysis may need to occur when more recent data becomes available. Our analysis focuses on characteristics of the terrorists and their acts, including demographics, religion, prosecution types, ideology, tactic, targeting, and group affiliation. Our goal is to show visually, and to analyze statistically, how these variables change over time.

0.1.2 Methodology

In order to more clearly detect variable changes, we split the data into time periods separated by major terrorist events. We took this approach rather than splitting the entire time span evenly (the events are not evenly spaced, but the amount of data included in each period is fairly similar) so that we could also see whether these major events induced any specific patterns within the variables. We try to depict the reasoning behind these changes, but all of this is subjective; correlation is not necessarily causation. The only conclusions we can draw for certain come from the statistical tests performed, relating to the overall change of each variable’s categories over time.

Some of the variables included many categories. In order to fit them all into one graph, with enough data available within each category per period, we kept only the categories with the highest frequencies and combined some categories together. This was done on a case-by-case basis, and more information on how this was completed is in the Appendix. A case with an NA for a given variable was excluded only from the analysis of that variable; it remained in the complete data set in case it had non-NA values for the other variables.

In order to find differences in each variable over time, we summed each category within the variable and time period, and divided it over the total amount per time period. This gives us the frequency of each category per period, so that we can test if it has differences over time.
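As a small illustration of the calculation just described, the sketch below turns (period, category) pairs into per-period proportions and skips missing values for the variable being tabulated. The records are made up and the representation of a missing value as `None` is an assumption.

```python
from collections import Counter, defaultdict


def proportions_by_period(records):
    """Map each period to {category: share of that period's known cases}."""
    counts = defaultdict(Counter)
    for period, category in records:
        if category is None:  # NA for this variable only; skip the pair
            continue
        counts[period][category] += 1
    return {
        period: {cat: n / sum(ctr.values()) for cat, n in ctr.items()}
        for period, ctr in counts.items()
    }


# Made-up records: (period number, category label).
records = [(1, "A"), (1, "B"), (1, "A"), (1, None), (2, "B")]
print(proportions_by_period(records))
```

Because each period's shares sum to one, the same numbers feed directly into bar charts in which every bar totals 100%.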

The tests we used for this are the Pearson Chi-Square Test, the Fisher Exact Test, and Cramer’s V statistic. Because of the minimal amount of data in some categories per time period, the Fisher Exact Test is included: it tests something similar to the Pearson Chi-Square Test but has more relaxed rules on data size. Cramer’s V is a little different in that it measures how important the period is in determining each category’s count.

These tests do not tell us if the variables’ categories are increasing or decreasing over time, so we created bar charts where all bars are equal to 100%, and within each period the categories are split into percentages.

We additionally wanted to see if any of the variables impacted the counts of another variable over time. To do this, we selected racial/ethnic group to compare with time against (1) prison sentence length, (2) plea and (3) tactic. The Cochran-Mantel-Haenszel test was used to test the differences over time with two variables plus time, whereas all previous tests involved only one variable plus time.

0.1.3 Conclusion

We saw that characteristics of terrorists and their acts of terrorism have changed significantly over the period covered by the data collected so far. By using both visualizations and statistical tests, these changes can be investigated more closely by importance and size, as each variable has its differences. Overall, the key variables to assign the most importance to, based on the statistical tests, are Othered Status, Citizenship, Tactic, and Group Affiliation. This is why these variables were chosen for the visualizations in this report, and why we researched possible reasons for their changes along with the directions of those changes.

0.2 Technical Report

0.2.1 Introduction

Terrorism in the United States peaked in the late 1960’s and early 1970’s, followed by a precipitous decline (Ross et al, 1989). Despite this decline, terrorism seems ever more present. Large scale media coverage and the development of social media have often been cited as contributors to the perceived prevalence of terrorism (Weimann et al, 2014). Further, media coverage of events such as 9/11 has framed many attacks as “Muslims/Arabs/Islam working together in organized terrorist cells against a Christian America”. On the other hand, domestic terrorists often receive the label of “troubled individuals” (Powell, 2011). Thus, there is strong evidence of media coverage affecting the perception of terrorist attacks in the United States. Given the Prosecution Project (tPP) dataset, trends in terrorist activity are analyzed by grouping events into periods delineated by large scale media events and detecting any changes between said periods. This organization of events may allow for the detection of changes in terrorism, perhaps due to perpetrators attempting to imitate previous attacks covered in the media.

0.2.2 Methodology

In order to recognize the patterns in demographics, prosecution types, ideology, tactic, targeting, sentence length, informant, and group affiliation over time, each event was organized into different time periods separated by major terrorist attacks in the United States. The events of interest are listed below:

The purpose of this delineation is to determine whether these events, largely covered in the media, trigger “copycat” terrorist attacks (known as contagion) or somehow impact a variable’s distribution in time periods near said events (Nacos, 2010).

Once each event was grouped, the frequencies of each variable category were computed within each time period and compared using 2-way contingency tables. That is, each variable had its own contingency table with the rows representing the categories given in the variable of interest, and the columns representing the time periods described earlier. Often, multiple categories were either condensed or removed due to sparseness of information (see Appendix for the exact breakdown of tables). The difference in distribution of the categories across time will be tested using both a Pearson Chi-Square Test and Fisher Exact Test.

The Pearson Chi-Square Contingency Table Test tests homogeneity of the time periods. More specifically, it decides whether or not there is a difference between the proportions of the categories of a certain variable across the time periods. For example, if the gender variable were to be considered, it would test whether the proportion of events committed by males and females has changed over time. However, it does not indicate the direction of these changes (Lachin, 2011).
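To make the homogeneity test concrete, the sketch below computes the Pearson statistic for a category-by-period table using only the Python standard library. The counts are invented, and the p-value lookup against a chi-square distribution is left out; this is an illustration of the formula rather than the R routine used in the report.

```python
def pearson_chi_square(table):
    """Return (statistic, degrees of freedom, expected counts) for a 2-way table.

    table is a list of rows; each row holds one category's counts across periods.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under homogeneity: row total * column total / grand total.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
        if exp > 0
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected


# Invented counts: rows are two categories, columns are two time periods.
stat, df, expected = pearson_chi_square([[10, 20], [20, 10]])
print(round(stat, 4), df, expected)
```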

Most of the variables, however, violate the expected count assumption of the Pearson Test. The test assumes that the expected counts in each of the cells are greater than five, but many of the tables contain zero values in multiple categories. Despite this violation, the Pearson Chi-Square Test is quite robust to these small expected cell frequencies (Camili, 163). To ensure this infraction does not impact results, an additional Fisher Exact Test is performed.

Fisher’s Exact Test again tests a difference between time periods in each of the variable category proportions. Specifically, it counts the number of possible tables that could be constructed with the given marginal totals. Then, it computes the proportion of those tables that are more extreme than the observed table, giving a p-value (Raymond et al, 1995). Since this could amount to a large number of tables, a bootstrap simulation with 2000 replicates is considered. This test relaxes the assumptions given by the Pearson Chi-Square Test.
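The simulated version of the test can be sketched as follows. Shuffling one variable's labels against the other's produces random tables with the same marginal totals, and the p-value is the share of those tables that are no more probable than the observed one. The 2000 replicates match the text; the seed, the helper names, and the data are assumptions made for illustration, and this is a sketch of the idea rather than the R routine itself.

```python
import math
import random
from collections import Counter


def _table_stat(cells):
    """-sum(log(count!)); with fixed margins, higher means a more probable table."""
    return -sum(math.lgamma(c + 1) for c in cells)


def simulated_fisher_p(rows, cols, replicates=2000, seed=0):
    """Monte Carlo p-value for independence of two categorical variables.

    rows and cols are parallel lists holding one label pair per observation.
    """
    rng = random.Random(seed)
    observed = _table_stat(Counter(zip(rows, cols)).values())
    shuffled = list(cols)
    as_extreme = 0
    for _ in range(replicates):
        rng.shuffle(shuffled)  # keeps both sets of marginal totals fixed
        stat = _table_stat(Counter(zip(rows, shuffled)).values())
        if stat <= observed + 1e-7:  # small tolerance for floating-point ties
            as_extreme += 1
    return (1 + as_extreme) / (replicates + 1)


# Invented data: category label and period label for 40 observations.
category = ["a"] * 20 + ["b"] * 20
period = ["early"] * 20 + ["late"] * 20
print(simulated_fisher_p(category, period))
```

Adding one to both the numerator and the denominator keeps the estimated p-value from ever being exactly zero.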

Trends will be visually analyzed using proportional, stacked bar charts. Along with the Pearson Chi-Square tests, Cramer’s V statistics were computed. Cramer’s V is a measure of association between two categorical values ranging from 0 to 1. The higher Cramer’s V, the stronger the relationship between period and the given variable is (Acock et al, 1979).
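Cramer’s V follows directly from the Pearson statistic: it is the square root of the statistic divided by the sample size times the smaller of (rows - 1) and (columns - 1). A minimal sketch with invented counts, using only the Python standard library:

```python
import math


def cramers_v(table):
    """Cramer's V for a 2-way table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n
            if exp > 0:
                chi2 += (obs - exp) ** 2 / exp
    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k))


# Invented tables: moderate association, perfect association, no association.
print(cramers_v([[10, 20], [20, 10]]))
print(cramers_v([[10, 0], [0, 10]]))
print(cramers_v([[10, 10], [10, 10]]))
```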

Finally, the interaction between racial/ethnic group, prison sentence length, and time is considered. Perhaps, over time, certain races will have differing sentence lengths, whether that be a result of discrimination, ethnic tendencies, or other factors. A three dimensional table will be considered with a Cochran-Mantel-Haenszel Test applied. This test is an extension of the Chi-Square Test, and, in general, tests for differences in the joint and marginal distributions of three variables (Lachin, 2011).
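The idea behind the test is easiest to see in its simplest form, where each time period contributes one 2x2 table. The sketch below computes that 2x2xK statistic without a continuity correction, using invented counts. The tables in this report have more than two categories per variable and so require the generalized form of the test (available, for example, through `mantelhaen.test` in R), which this sketch does not implement.

```python
def cmh_statistic(strata):
    """Cochran-Mantel-Haenszel statistic for a list of 2x2 tables.

    Each stratum is [[a, b], [c, d]]. The result is compared against a
    chi-square distribution with 1 degree of freedom.
    """
    diff = 0.0
    variance = 0.0
    for (a, b), (c, d) in strata:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum this small carries no information
        diff += a - (a + b) * (a + c) / n
        variance += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
    if variance == 0:
        raise ValueError("no stratum has variation in both margins")
    return diff * diff / variance


# Invented counts: two time periods, each a 2x2 table of group by outcome.
strata = [[[10, 5], [5, 10]], [[8, 4], [4, 8]]]
print(round(cmh_statistic(strata), 4))
```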

In each table, any unknown observations were not considered, since they add no information to the story other than adding sample size and changing inference in a direction that may not necessarily be honest.

0.2.3 Results

From the collection of two-way tables, the distributions of most variables have changed over time. Only the distributions of death sentencing and gender seemed homogeneous over time, as both the Fisher and Chi-Square tests failed to detect a difference in their distributions. The uniformity of gender and death sentencing throughout the periods is not surprising, as the vast majority of events in the dataset were perpetrated by men and did not result in a death sentence for the perpetrator. More interesting insights can be gathered visually.

The three-way tables invite some interesting insights. When comparing ethnicity, sentence length (categorized by every 100 months), and time period, there was no significant difference found between the distributions of the categories within each of the groups. The same result was reached when comparing ethnicity, plea, and time period. However, the Cochran-Mantel-Haenszel Test found a significant difference between the distribution of ethnicity and tactic over the time periods.

The following proportional, stacked bar charts show how, and in which direction, the variables we felt were key to this analysis changed over time.

We see in Figure 1 that the number of terrorism acts by non-U.S. citizens has consistently decreased over time, reaching very minimal counts from 2015 to the present day. In 2011, the Department of Homeland Security defined a new term of “specially designated countries” to be countries “that have shown a tendency to promote, produce, or protect terrorist organizations or their members.” In 2003, the Department of Homeland Security provided US border crossings with a list of 52 countries that fell under this term in order to increase border security against possible terrorists. The list has been continually updated up to the present day. From 2007 to 2017, the US Border Patrol apprehended 45,006 immigrants from countries that have ever been on the list. There have been zero attacks committed by illegal border crossers from any of the listed specially designated countries. However, foreigners who entered legally from these countries are responsible for 99.5% of all murders and 94.7% of all injuries committed by terrorists in the US from 1975 through the end of 2017 (Bier and Nowrasteh, 2018). 9/11 may have reinforced the idea that a successful strategy for foreign terrorism is to first enter legally, or to have a US citizen commit the act. After 9/11, the number of non-US citizens committing acts of terrorism peaks and then declines. All terrorists involved in 9/11 were non-US citizens. This decrease in non-US citizens being able to commit acts of terrorism is likely the result of increased security. However, terrorism is evolving, so the US may no longer be looking to non-US citizens as the ones committing these acts, as our graph shows.

Figure 2 shows how group affiliation changes over time. Looking not at specific terrorist events but at each group over time, we see that Al Qaeda has decreased consistently while the Islamic State has increased, by especially large amounts in more recent years. Many factors play into these directional changes, and we will summarize what we think the causes are as best we can. Bin Laden, the previous leader of Al Qaeda, was killed in 2011. Period 6 begins after 2009 and is the period in which we first start to see the decrease of Al Qaeda. This may be due to their leader dying, but conflict between groups could also play a role. Let’s start at the beginning. Period 4 follows 9/11, an event Al Qaeda wished to take credit for, and therefore Al Qaeda is strong and on the rise here. In period 5, which begins after Al Shabab was formed in 2006, we see a heavier Al Shabab presence. Al Shabab was known to be tied to Al Qaeda and declared official allegiance to it in 2012. We see both Al Qaeda and Al Shabab decrease after period 8 (2012), which is what we would expect: as Al Qaeda was weakened, so was Al Shabab because of their affiliation. We now start to see the rise of ISIS, which has taken advantage of the weakened Al Qaeda and Al Shabab to make its presence more known. Although these groups have similar views, they are not supportive of one another and have different tactics for making themselves heard. The graph of changes in tactic over time, below, reflects these different groups through the tactics they chose to use.

Returning to what we discussed in the previous paragraph, we can see in Figure 3 that when Al Qaeda held greater power, the most prevalent categories of tactic were crimes like Arson, Chemical or biological weapon deployment, and Explosives. These are all tactics that support Al Qaeda’s goal of plotting terrorism spectaculars to electrify the Muslim world. ISIS, by contrast, aims to control territory and expand its ideology. This may explain why, once ISIS holds more power, the popular tactics are Providing material/financial support to terrorist organizations, Firearms, and Armed intimidation/standoff, all of which are ways to overtake, build the organization, and control.

Additionally, Figure 3 shows rises in particular tactics that could be the result of the major acts of terrorism by which we split the periods. Explosives seem to increase from period 1 to period 2, which follows the Oklahoma City Bombing. Also, after the Aurora Theater Shooting, there seems to be a drastic decrease in civilian firearms, while there is an increase in armed intimidation/standoff. On another note, we see perjury/obstruction of justice slowly appear and begin to increase from past to present. This could be the result of laws changing over time: as stricter laws are implemented, more people may be convicted.

Other notable changes, for which graphics are not included, are listed here. The terrorists’ religion shows changes over time; for example, after the Charleston Church Shooting, no Christians committed acts of terrorism. This could be due to the shooting happening in a Christian church, making other Christians less likely to commit any crimes or act out. The plot of Veteran Status over time shows that after 9/11, the number of veterans who committed acts of terrorism decreased drastically, then fluctuated but never again reached its pre-9/11 level. Another change around 9/11 occurs in ideological affiliation: after 9/11 there is a massive increase in No Affiliation ideologies. This could be because groups were trying to draw attention away from themselves after all the security measures put into place after 9/11. We also see a huge increase in Rightist ideologies after the Charleston Church Shooting. This is interesting to note because the man who committed this act of terrorism was a 21-year-old white supremacist, who most likely believed in a rightist ideology. After the death of Trayvon Martin, State jurisdiction for acts of terrorism increased substantially, possibly due to the pressure on local police following this event. The increase over time in the verdict of charged but not tried may be due to ongoing cases as we get closer to the present day. After the first major act of terrorism, we see more informants coming forward to prevent terrorist events.

The three-way tables invite some interesting insights. When comparing ethnicity, sentence length (categorized by every 100 months), and time period, no significant difference was found between the distributions of the categories within each of the groups. The same result was reached when comparing ethnicity, plea, and time period. However, the Cochran-Mantel-Haenszel Test found a significant difference between the distribution of ethnicity and tactic over the time periods.

To further inspect these differences, a stacked bar plot was developed. Ethnicity was limited to only the white and Middle Eastern groups, as they provided interesting insight. Over time, among crimes in the data set committed by people of Middle Eastern ethnicity, the proportion that included providing financial support to terrorist organizations has increased drastically in each time period. This trend began right before the 9/11 attacks. Crimes perpetrated by white individuals in Period 2, after the Oklahoma City Bombing, started to consist mainly of explosives, perhaps supporting the idea of similar “copycat” crimes being committed after large media coverage of terrorist attacks. Similarly, after the Aurora shooting, white criminals seemed to gravitate heavily towards armed intimidation to commit their crimes. Other ethnicity plots can be seen in the Appendix.

0.2.4 Conclusion

The analysis provides some evidence that “copycat” terrorism, or contagion, impacts the distribution of multiple characteristics of terrorist attacks over time. These changes are especially prevalent in the distribution of tactics across ethnicity and othered status after key events such as the Oklahoma City Bombing, 9/11, and the Aurora Shooting. Further, Ideological Affiliation trended towards Rightist leanings after the Charleston Church Shooting, while Group Affiliation has seen a recent increase in attacks perpetrated by the Islamic State, despite the decrease in attacks perpetrated by Al-Qaeda. The claim that characteristics of these terrorist attacks are associated with the selected time periods is bolstered by both the Chi-Square Tests and the Cramer’s V statistics. Of course, the Chi-Square Tests only say that period and attack characteristics are associated and do not imply a mechanism. However, the bar charts provide context for our hypothesis. The analysis is limited by the sparseness of events in some categories, which we took measures to combat.

0.3 References

Acock, Alan C., and Gordon R. Stavig. “A measure of association for nonparametric statistics.” Social Forces 57, no. 4 (1979): 1381-1386.

Bier, David, and Alex Nowrasteh. “45,000 ‘Special Interest Aliens’ Caught Since 2007, But No U.S. Terrorist Attacks from Illegal Border Crossers.” Cato Institute, 17 Dec. 2018, www.cato.org/blog/45000-special-interest-aliens-caught-2007-no-us-terrorist-attacks-illegal-border-crossers.

Camilli, Gregory, and Kenneth D. Hopkins. “Applicability of chi-square to 2×2 contingency tables with small expected cell frequencies.” Psychological Bulletin 85, no. 1 (1978): 163.

Garrett Grolemund, Hadley Wickham (2011). Dates and Times Made Easy with lubridate. Journal of Statistical Software, 40(3), 1-25. URL http://www.jstatsoft.org/v40/i03/.

H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016.

Hadley Wickham (2017). tidyverse: Easily Install and Load the ‘Tidyverse’. R package version 1.2.1. https://CRAN.R-project.org/package=tidyverse

Jeffrey B. Arnold (2019). ggthemes: Extra Themes, Scales and Geoms for ‘ggplot2’. R package version 4.1.1. https://CRAN.R-project.org/package=ggthemes

Lachin, John M. Biostatistical Methods: The Assessment of Relative Risks. 3rd ed. Hoboken: Wiley, 2011.

Nacos, Brigitte L. “Revisiting the contagion hypothesis: Terrorism, news coverage, and copycat attacks.” Perspectives on Terrorism 3, no. 3 (2010).

Powell, Kimberly A. “Framing Islam: An analysis of US media coverage of terrorism since 9/11.” Communication Studies 62, no. 1 (2011): 90-112.

Raymond, Michel, and François Rousset. “An exact test for population differentiation.” Evolution 49, no. 6 (1995): 1280-1283.

Ross, Jeffrey Ian, and Ted Robert Gurr. “Why terrorism subsides: A comparative study of Canada and the United States.” Comparative Politics 21, no. 4 (1989): 405-426.

Weimann, Gabriel. New terrorism and new media. Vol. 2. Washington, DC: Commons Lab of the Woodrow Wilson International Center for Scholars, 2014.

See the full report for complete contingency tables, stacked bar plots, and R code for age, gender, othered status, ethnicity, religion, veteran status, citizenship, jurisdiction, plea, verdict, length of sentence, death sentence, ideology, tactic, physical target, ideological target, informant, group affiliation, and FTO affiliation.

tPP Preliminary Statistical Report #3 of 5

The following report was completed by statistics students utilizing a version of the tPP dataset as of March 13, 2019. These analyses are focused on developing models for future use, and the interpretations and conclusions they contain reflect a dataset still in development and only a superficial engagement with the wider literature on political violence. We continue to expand, improve and refine the data, and as such, these analyses should be seen as preliminary and subject to change. The views expressed in these reports belong solely to the authors, do not necessarily reflect the findings of the tPP team, and are subject to further inquiry and revision.



Below you will find the analysis report provided by Team #3 (Identity and Criminal Action Analysis) of 5.

This report was authored by Athena Chapekis, Jing Lin, Ruoqi Tan & James Wieneck.



Non-technical Summary

Introduction

We are presented with a data set involving individuals who were indicted and prosecuted for crimes which have socio-political motivations and/or crimes that have rendered them as designated terrorists in the United States. These cases involve various identity variables of the defendants (age, race/ethnicity, gender, “othered” status, religion, citizenship status, and veteran status), as well as various criminal activity variables (people vs. property, number injured, number killed, physical target, ideological target, and tactic). The question we seek to answer is: how do aspects of a defendant’s identity play a role in their criminal activity?

Results

The statistical results show unbalanced levels among the identity variables. For gender, the vast majority of offenders are male. For race and religion, ‘Muslim’ appears more frequently. Most cases have civilian status. The most common tactic is ‘Providing material/financial support to terrorist organization’; ‘Unspecified’ appears most frequently as ideological target, and ‘Online’ appears most frequently as physical target.

All identity variables have significant relationships with activity variables; however, the actual size of the effect varies across variables. Gender and othered status affect the number of persons killed or injured significantly, with men and othered defendants having a higher injury count. Age was a consistently influential variable when examining how trends in criminal activity are influenced by one’s identity, though it almost always had some interaction with citizenship status, veteran status, and/or othered status. Othered status was also a highly influential variable in predicting different trends in criminal activity.

Conclusions

This report finds that the identity variables with the greatest predictive effect on criminal activity are Othered Status, Religion, Ethnicity/race, Citizenship Status, and Veteran Status. Gender is a significant predictor of the number of people killed and injured by a crime but is not a significant predictor of other criminal activity variables.

The models we built to predict trends in criminal activity from the identities of the defendants had poor predictive power, in part because of unused scenarios and unspecified cases for multiple variables. The data set used for the analysis likely needs more information to give a more complete picture of how criminal activity is linked to a defendant’s identity.

Technical report

Introduction

The definition of what constitutes “terrorism” is not a unanimous one. Different sources report different standards for what an act of terror entails. Because of this, there has not been a thorough body of research built on terrorism in all its forms. Issue-specific groups like the Department of Justice (DOJ)/Federal Bureau of Investigation (FBI), the Center for Biomedical Research (CBR), and the National Abortion Federation (NAF) have collected their own databases of terrorism and terrorists over time, but they generally focus on one specific ideological group – whichever is of the greatest concern to them.

The Prosecution Project (tPP) is a large-scale project out of Miami University that seeks to construct a database of all acts of terrorism and socio-politically motivated crimes ending in felony prosecutions in the United States from 1990 to the present. Each case in tPP’s database is coded across 44 variables, including demographic information on the defendant, details of their affiliations, details of the crime they committed, and details of the legal proceedings.

This report seeks to investigate the connection between a defendant’s identity (i.e. their demographic information) and their criminal activity and provide an answer to the question of how who someone is relates to what they do.

Methodology

The first step in approaching this analysis is to clean the data. Categorical variables which have many levels are reduced to allow for better comparison and analysis. Much of this reduction was done using the classification provided by the Prosecution Project codebook.

For example, in the variable Physical Target, the levels of ‘Federal site: non-military non-judicial’, ‘Federal site: military’, ‘Federal site: judicial’, and ‘Federal site: non-U.S. embassy or consulate’ are combined and recoded simply as ‘Federal site’. Furthermore, the levels for ‘State site’ and the levels for ‘Municipal site’ are combined with ‘Federal site’ to make one unified level of ‘Governmental site’. This is done for the variables of Physical Target and Ideological Target. Due to the low representation in many of the levels for the variable ‘Tactic’, many levels were combined into an ‘Other’ level. Other categorical variables that were not recoded but are included in this report in their original state are People vs. Property, Gender, Ethnicity, Religion, ‘Othered’ Status, Citizenship Status, and Veteran Status. For each categorical variable, a bar chart is generated to compare frequencies of levels.

To conduct the analysis, this report begins with T-tests to determine the influence that the binary predictor variables Gender (male v. female), Othered Status (othered v. non-othered), and Veteran Status (civilian v. non-civilian) may have on the number of people killed and the number of people injured in socio-politically motivated crimes. A significance level of 0.05 is used. Furthermore, Analysis of Variance (ANOVA) tests are used to test for significant differences in the number of people killed and the number of people injured between demographic groups for the identity variables of Race/ethnicity, Religion, and Citizenship Status. ANOVA tests are also used to see if a defendant’s age differs significantly between the types of things that are targeted in socio-political crimes (both physically and ideologically) and between types of tactics. On top of the ANOVA tests, Eta Squared values are calculated to measure effect size in the relationships (Brown). To investigate relationships between categorical identity variables (e.g. Religion, Citizenship Status, etc.) and categorical activity variables (e.g. Tactic, Physical target, etc.), Chi-Squared Tests of Independence are used, and Cramer’s V is used to calculate effect size for the respective relationships between these categorical variables.

Initially, this report sought to use linear regression to create a predictive model of trends. However, we found that, due to the categorical nature of many of the variables (often with many levels) and given that there are different trends among differing variables related to the crime, it is not advisable to build regression models based on a singular response variable. Instead, we use classification tree modeling for the categorical variables whose trends we want to analyze and ANOVA tree modeling for the numerical variables whose trends we want to analyze.
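As a rough illustration of the Eta Squared effect size used alongside the ANOVA tests in the methodology above, the Python sketch below computes the one-way ANOVA F statistic and Eta Squared (between-group sum of squares divided by total sum of squares). The function name and the toy values are our own and are not taken from the tPP data; the sketch assumes every group is non-empty and the values are not all identical.

```python
def one_way_anova(groups):
    """Return (f_statistic, eta_squared) for a list of groups of numbers."""
    all_values = [x for group in groups for x in group]
    n_total = len(all_values)
    k = len(groups)
    grand_mean = sum(all_values) / n_total

    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2
        for group in groups
    )
    # Total variation of all observations around the grand mean.
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    ss_within = ss_total - ss_between

    f_statistic = (ss_between / (k - 1)) / (ss_within / (n_total - k))
    eta_squared = ss_between / ss_total
    return f_statistic, eta_squared

# Hypothetical response values for three levels of an identity variable.
toy_groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
f_stat, eta_sq = one_way_anova(toy_groups)
```

Eta Squared answers a different question from the F test: a relationship can be statistically significant while explaining only a small share of the variance.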

We will be using classification trees for the following variables: People vs. Property, Physical Target, Ideological Target, and Tactic; we will be using ANOVA/regression trees for the following variables: Number Injured and Number Killed. These will be considered our criminal activity variables for this portion of the analysis. The identity variables we are using in this portion of the analysis are age, gender, race/ethnicity, religion, othered status, veteran status, and citizenship status. The purpose of this portion of the analysis is to see which aspects of a criminal’s identity are most often associated with various aspects of criminal activity, and also how these aspects interact or intersect. To validate the results from our classification and regression trees, we will also be using random forests for each model to see which variables are most significantly linked to each criminal activity variable, and which variables were the most significant contributors to differences in criminal activity trends (Liaw). For each random forest, 1,000 classification trees will be generated.
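To show what a single classification-tree split involves, here is a minimal Python sketch that searches for the best threshold on a numeric predictor such as age, using Gini impurity and two minimum-case constraints like those quoted in the figure captions of this report. It is a simplified stand-in, not the authors’ tree software: the names min_split and min_bucket are our own labels, the data are invented, and splits on categorical predictors are not covered.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0 means the node is pure)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_age_split(ages, labels, min_split=50, min_bucket=50):
    """Best split of the form 'age < threshold', as (threshold, weighted_gini).

    Returns None when the node has fewer than min_split cases or when no
    split leaves at least min_bucket cases on each side.
    """
    n = len(ages)
    if n < min_split:
        return None
    pairs = sorted(zip(ages, labels))
    best = None
    for i in range(1, n):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot split between two identical ages
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        if len(left) < min_bucket or len(right) < min_bucket:
            continue
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        if best is None or score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical cases: younger defendants target property, older target people.
toy_ages = list(range(20, 30)) + list(range(50, 60))
toy_labels = ["property"] * 10 + ["people"] * 10

split_found = best_age_split(toy_ages, toy_labels, min_split=4, min_bucket=2)
split_blocked = best_age_split(toy_ages, toy_labels, min_split=4, min_bucket=11)
```

Raising the minimum-case requirements, as in the second call, prevents the tree from splitting on small groups; this guards against overfitting but can hide patterns in sparsely populated categories.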

Results

For most of the categorical variables, there are a number of levels which appear in the data very infrequently.

Identity variables

Looking at the demographics of the data, we see fairly uneven representation among levels for almost all of the variables. As for gender, the data is overwhelmingly male, and the levels of ‘Non-binary/gender non-conforming’ and ‘Unknown/unclear’ are almost never used.

Ages range from 16 to 88 with a median age of 33 and a mean age of 35.9. The ethnicities of ‘Biracial’ and ‘American Indian/Alaskan Native’ hardly occur, and for Religion, ‘Jewish’ and ‘Other’ appear very infrequently. As well, ‘Christian’ and ‘Christian Identity’, while occurring somewhat more often, do not occur in the data nearly as often as ‘Muslim’ and ‘Unknown’.

Regarding Citizenship Status, all levels are relatively infrequent compared to ‘Civilian’ and ‘Foreign national’. There are more cases marked as ‘Othered’ than ‘Non-othered’, but both are well-represented in the data. Lastly, when looking at Veteran Status, almost all cases are coded ‘Civilian’. All other statuses are fairly uncommon and combined make up only about 16% of the data.

Criminal activity variables

The most commonly occurring tactic by far is ‘Providing material/financial support to terrorist organization’. After that, ‘Explosives’, ‘Criminal violation not linked or motivated politically’, ‘Various methods’, ‘Arson’, and ‘Firearms’ occur most frequently.

All levels in the People vs. Property variable are fairly well represented. Regarding targets, for Ideological Target, ‘Unspecified’ is the most frequently occurring level in the data followed by ‘Government’, but all levels aside from those do appear to occur at similar rates. For Physical Target, the levels of ‘Online’, ‘Educational institution’, and ‘Municipal site’ do not occur frequently.

Analysis of Variance (ANOVA)

From the results of the ANOVA tests, the F tests show that race, religion, and citizenship have a significant influence on the number of people killed and injured. The identity variable age has a significant relationship with the activity variables people vs. property, physical target, ideological target, and tactic. The eta squared values show that citizenship has a larger effect on the number of people killed and injured than race and religion, and that, of the activity variables, ideological target has the largest effect size in relation to age.

Student’s T-test

Regarding the number of people killed by a crime, we can be 95% confident that, on average, men’s crimes kill between 0.08 and 8.76 more people than women’s crimes. For the number of people injured, we can say with 95% confidence that, on average, men injure between 16.11 and 52.71 more people than women in the course of a socio-politically motivated crime. There is no statistically significant difference in fatalities between crimes committed by othered and non-othered defendants; however, we can be 95% confident that othered defendants injure between 20.15 and 76.3 more people in the course of their crime than non-othered defendants. There is also no statistically significant difference in the number of people killed or the number of people injured between those who are civilians and those who are not.
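Intervals like those above can be sketched as a confidence interval for a difference in two group means. The Python example below uses unpooled (Welch-style) standard errors and, as a simplifying assumption, a normal quantile in place of the Student’s t quantile, which is only reasonable for fairly large groups. The function name and the data are hypothetical and do not reproduce the report’s figures.

```python
import math
from statistics import NormalDist, mean, variance

def mean_difference_interval(sample_a, sample_b, confidence=0.95):
    """Approximate confidence interval for mean(sample_a) - mean(sample_b).

    Uses unpooled variances and a normal quantile (large-sample
    approximation to the Student's t interval).
    """
    diff = mean(sample_a) - mean(sample_b)
    standard_error = math.sqrt(
        variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b)
    )
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return diff - z * standard_error, diff + z * standard_error

# Hypothetical injury counts for two groups of 100 cases each.
group_a = [0, 1, 2, 3] * 25   # mean 1.5
group_b = [0, 0, 1, 1] * 25   # mean 0.5

low, high = mean_difference_interval(group_a, group_b)
```

An interval that excludes zero, as here, corresponds to a statistically significant difference at the matching level; an interval that contains zero corresponds to the "no statistically significant difference" findings.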

Chi-Squared and Cramer’s V

The results of the Chi-Squared Test of Independence showed widespread statistical significance between all identity variables and all criminal activity variables. When Cramer’s V is calculated for effect size, however, it appears that many identity variables have a weak effect on criminal activity. Specifically, gender seems to have the least effect on criminal activity. Othered Status has a particularly strong effect on criminal activity, so much so that Cramer’s V indicates Othered Status may be measuring the exact same trends as the criminal activity variables.

Classification/Regression Trees and Random Forests

Figure 1. The classification tree for the people vs. property variable. At least 50 cases were required for each split, and each final outcome required at least 50 cases.

What we have been able to see is that, for predicting whether a target is human or property, othered status appears to interact with veteran status and age. Othered defendants are more likely to either have targeted people or have no direct target (Figure 1). Of othered defendants who were of civilian status, released on hardship discharge, or whose veteran status was unknown, no direct target was identified; otherwise, people were more likely to be targeted. Among those of non-othered status, those whose veteran status was active duty, dishonorably discharged, belonging to a non-U.S. military, or unknown were more likely to target people. Among those who were not of those veteran statuses, age was an additional factor; those who were 52 and under were more likely to target property, and those 53 and over were more likely to target people (Figure 1). We can see that the most significant variables in determining which type of target was involved were othered status, veteran status, and age, in this order.

Figure 2. The variable importance plot for the people vs. property random forest model.

After conducting a random forest on the data used to build the classification model and plotting the importance of each variable, we find that veteran status, othered status, and age are the largest contributors to the differences in which types of targets defendants tend to target. Age and othered status were particularly strong determinants in these patterns (Figure 2).

Figure 3. The ANOVA/regression tree for the number killed variable. At least 20 cases were required for each split, and each final outcome of the tree required at least 15 cases.

What we can see for the number of fatalities in each crime is that the first split is at veteran status. Defendants whose veteran status was active duty, civilian, dishonorably discharged, honorably discharged, or unknown had an average of 1.8 fatalities (Figure 3). Within that group, defendants whose citizenship status was refugee, residing on a visa, citizen, permanent resident, or unknown had a fairly low average of 0.77 fatalities (Figure 3). Among defendants who were not of these citizenship statuses, the average was 5.7, with another split at religion (Figure 3). Those whose religion was identified as Christian or unknown had fairly low average fatalities at 0.43, compared with 10 for those whose religions fell outside of these 2 categories (Figure 3). From there, age was a major determinant in the number of fatalities. Those who were under 25 had, on average, the second-most fatalities at 30, and those who were 25 or older had only 7.6 fatalities on average (Figure 3).

For defendants who were former or current non-U.S. military members or who were discharged on the basis of hardship, the average number of fatalities was 18, 10 times higher than for defendants not in these veteran status categories (Figure 3). From here, there is a split at age; those who were 35 or younger had an average fatality count of 6.2, whereas those who were 36 or older had an average fatality count of 32 (Figure 3). We can see that the most significant variables in predicting differences in the number of people killed were veteran status, citizenship status, religion, and age.

Figure 4. The variable importance plot for the people killed random forest model.

After conducting a random forest on the data used to build the regression/ANOVA model and plotting the importance of each variable, we find that age is a very significant predictor in determining the differences in fatalities among each case of crime (Figure 4). However, we cannot ignore the influence of veteran status or citizenship status, as they were significant variables on which the regression trees were split, and the variable importance plot also reflects this (Figure 4).

Figure 5. The ANOVA/regression tree for the people injured variable. At least 50 cases were required for each split, and each final outcome of the tree required at least 25 cases.

Looking at our results in Figure 5, we find that among defendants who were U.S. citizens, refugees, residents on a visa, permanent residents, or of unknown citizenship status, the average number of people injured was 4.1. For defendants who were not, there was a split at religion; those whose religion was identified as Christian, Christian Identity, or unknown had an average of 1.6 injuries (Figure 5). Among those whose religions were not in those categories, there was a split at age. For those who were 27 or older, the average number was 141, and for those who were 26 or under, the average number was 429 (Figure 5). We can conclude from this tree that citizenship status, religion, and age were important factors in predicting the differences in the number of people injured.

Figure 6. The variable importance plot for the people injured random forest model.

After conducting a random forest on the data used to build the regression/ANOVA model and plotting the importance of each variable, we find that citizenship status and age are particularly important in determining trends and predicting differences in the number of people injured as a result of a crime (Figure 6).

Figure 7. The classification tree for the physical target variable. At least 75 cases were required for each split, and each final outcome required at least 75 cases.

What we can see in this classification tree is that there is an initial split for othered status (Figure 7). Among those of othered status, we can see a split for veteran status. Among defendants who were civilians, former veterans released on hardship discharge, or former veterans who were honorably discharged, the physical target was more likely to be unspecified; among defendants whose veteran status did not fall in these 3 categories, no direct physical target was found (Figure 7). For those of non-othered status, private sites were more likely to be attacked, and there was a split for religion. Defendants whose religion was identified as Christian, Jewish, or Muslim were more likely to have an unspecified target, and those whose religion was not one of those 3 were more likely to attack private property (Figure 7). There is a further split in age; defendants who were 40 or older often had an unspecified physical target, whereas those under 40 tended to attack private sites (Figure 7).

Figure 8. The variable importance plot for the physical target random forest model.

After conducting a random forest on the data used to build the classification model and plotting the importance of each variable, we find that age, religion, othered status, and veteran status are important in predicting differences in physical targets (Figure 8). Age and veteran status appear to be particularly important in determining the differences between physical targets (Figure 8).

Figure 9. The classification tree for the ideological target variable. At least 50 cases were required for each split, and each final outcome required at least 25 cases.

We notice that the first split of this classification tree is at othered status (Figure 9). Of defendants who are of othered status, there is a split at veteran status. For defendants who are civilians, were honorably discharged, were discharged on the basis of hardship, or whose veteran status is unknown, there was an unspecified ideological target; for defendants whose veteran status is not one of those 4 categories, government was the most likely ideological target (Figure 9). For those of non-othered status, there is a split on age; those who were 35 or over were more likely to attack government targets on the basis of ideology (Figure 9).

For non-othered defendants who were under 35, there was a split on religion; those whose religions were identified as Christian, Christian Identity, Jewish, or Muslim tended to attack on the basis of identity (Figure 9). Among those whose religions were not one of those 4 categories, veteran status was a significant predictor; civilians were more likely to attack left-leaning industries, while non-civilians were more likely to attack government on an ideological basis (Figure 9). In general, we have found that othered status, veteran status, age, and religion were significant variables in predicting ideological target.

Figure 10. The variable importance plot for the ideological target random forest model.

After conducting a random forest on the data used to build the classification model and plotting the importance of each variable, we find that age, othered status, and religion are important in predicting differences in ideological targets (Figure 10). Age and othered status appear to be particularly important in determining the differences between ideological targets (Figure 10).

Figure 11. The classification tree for the tactic variable. At least 100 cases were required for each split, and each final outcome required at least 100 cases.

Othered status appears to be very significant in predicting the tactic that a defendant used in committing a crime (Figure 11). Among those who are of othered status, the most common tactic, by far, was providing material support to a terrorist organization (Figure 11). Among those of non-othered status, religion is a significant predictor of tactic; defendants whose religion was identified as Christian, Muslim, or “Other” were more likely to employ multiple (or various) methods (Figure 11). Among defendants whose religion was not Christian, Muslim, or “Other”, age is a significant predictor of tactic; those who were 30 or over were more likely to use explosives when committing a terrorist act, and those who were under 30 were more likely to use arson (Figure 11).
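The minimum-case requirements quoted in the figure captions can be read as two checks applied to every candidate split. The sketch below is a minimal illustration in Python, not the authors' R code. It assumes the caption thresholds correspond to the rpart controls minsplit (cases needed in a node before a split is attempted) and minbucket (cases needed in each final outcome). The function name and the node sizes are hypothetical.

```python
def split_allowed(n_left, n_right, min_split, min_bucket):
    """Return True if a node holding n_left + n_right cases may be split this way.

    min_split and min_bucket are assumed to play the role of the
    "cases required for each split" and "cases required for each final
    outcome" thresholds quoted in the figure captions.
    """
    n_node = n_left + n_right
    # The node itself must hold enough cases before a split is attempted.
    if n_node < min_split:
        return False
    # Each child produced by the split must also be large enough.
    return n_left >= min_bucket and n_right >= min_bucket


# Hypothetical node sizes checked against the Figure 9 thresholds (50 and 25).
print(split_allowed(30, 40, min_split=50, min_bucket=25))    # True
print(split_allowed(60, 10, min_split=50, min_bucket=25))    # False

# With the Figure 11 thresholds (100 and 100), a node needs at least
# 200 cases before any split can satisfy both checks.
print(split_allowed(120, 90, min_split=100, min_bucket=100))   # False
print(split_allowed(150, 100, min_split=100, min_bucket=100))  # True
```

Larger thresholds such as those used for the tactic tree lead to shallower trees, which is consistent with Figure 11 showing fewer splits than the target trees.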

Conclusions

This report finds that while all interactions between variables that define a defendant’s identity and variables that define a defendant’s criminal activity are significant, the variables with the greatest predictive effect on criminal activity are whether a defendant is othered or non-othered, the factors that contribute to that differentiation (religion, ethnicity/race, citizenship status), and a defendant’s veteran status. A defendant’s gender, while a significant factor in the number of victims that result from a socio-politically-motivated crime, is generally not a significant predictor of other aspects of criminal activity (tactic, target, etc.). The results from our classification/regression trees and random forests appear to show that the identity variables most strongly associated with different trends in criminal activity were age, citizenship status, veteran status, religion, and othered status. For the classification trees and their associated random forests, the most important variables were age and othered status; for the regression trees and their associated random forests, the most important variables were age and citizenship status. Overall, age proved to be a very significant predictor in explaining differences in trends in criminal activity.

Limitations of these random forests and classification/regression trees included the large number of unspecified or unknown cases, as well as a sizable number of unused levels for tactic, physical target, ideological target, and people vs. property. For the classification tree models, the error rate generally ranged from 46% to 55%, and for the regression/ANOVA tree models, the percentage of variability explained by the model was negative. Because of the poor predictive power of these models, we must exercise caution in assuming that the identity variables we found to be significant have any causal effect.
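To make the two diagnostics above concrete, the following Python sketch shows how a misclassification rate and a percentage of variability explained can be computed, and why the latter can be negative. It assumes the reported percentage is the usual 1 - SSE/SST quantity; the report does not state the exact formula used. All data values and function names are hypothetical.

```python
def misclassification_rate(actual, predicted):
    """Share of cases whose predicted class differs from the actual class."""
    wrong = sum(1 for a, p in zip(actual, predicted) if a != p)
    return wrong / len(actual)


def pct_variance_explained(actual, predicted):
    """100 * (1 - SSE / SST); negative when the model does worse than the mean."""
    mean_y = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # model error
    sst = sum((a - mean_y) ** 2 for a in actual)                # error of the mean
    return 100 * (1 - sse / sst)


# Hypothetical tactic labels: two of four predictions are wrong.
actual_tactic = ["arson", "explosives", "arson", "threat"]
predicted_tactic = ["arson", "arson", "threat", "threat"]
print(misclassification_rate(actual_tactic, predicted_tactic))  # 0.5

# Hypothetical victim counts: the predictions miss by more than the mean would.
actual_killed = [0, 0, 1, 5]
predicted_killed = [3, 2, 0, 1]
print(pct_variance_explained(actual_killed, predicted_killed))  # negative

# Predicting the overall mean for every case explains exactly 0%.
print(pct_variance_explained(actual_killed, [1.5] * 4))  # 0.0
```

Under this reading, a negative percentage means the tree's predictions had a larger squared error than simply predicting the overall mean for every case.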

References

Brown, James D. 2008. “Effect size and eta squared.” JALT Testing & Evaluation SIG News.

conjugateprior. 2013. “Formulae in R: ANOVA and other models, mixed and fixed.” Blog. Accessed February 27, 2019. Retrieved from http://conjugateprior.org/2013/01/formulae-in-r-anova/.

Liaw, A., and M. Wiener. 2002. Classification and Regression by randomForest. R News 2(3), 18-22.

Loadenthal, Michael, et al. 2019. “The Prosecution Project (tPP)” (Version March 2019) [Dataset]. Miami University Sociology Department. https://tpp.lib.miamioh.edu.

Loadenthal, Michael, Athena Chapekis, Lauren Donahoe, Alexandria Doty, and Sarah Moore. 2019. “The Prosecution Project (tPP) Codebook” (Version 2) [Code book]. Miami University Sociology Department. https://tpp.lib.miamioh.edu.

Loadenthal, Michael, Athena Chapekis, Lauren Donahoe, Alexandria Doty, and Sarah Moore. 2019. “The Prosecution Project (tPP) New Member Guidebook” (Version 1) [Instructional Manual]. Miami University Sociology Department. https://tpp.lib.miamioh.edu.

Milborrow, Stephen. 2018. rpart.plot: Plot ‘rpart’ Models: An Enhanced Version of ‘plot.rpart’. R package version 3.0.6. https://CRAN.R-project.org/package=rpart.plot

Navarro, D. J. 2015. Learning statistics with R: A tutorial for psychology students and other beginners. R package version 0.5. University of Adelaide. Adelaide, Australia.

Mangiafico, Salvatore S. 2015. “Student’s t–test for Two Samples.” http://rcompanion.org/rcompanion/d_02.html

Therneau, Terry, and Beth Atkinson. 2018. rpart: Recursive Partitioning and Regression Trees. R package version 4.1-13. https://CRAN.R-project.org/package=rpart

Wickham, Hadley. 2017. tidyverse: Easily Install and Load the ‘Tidyverse’. R package version 1.2.1. https://CRAN.R-project.org/package=tidyverse

tPP Preliminary Statistical Report #2 of 5

The following report was completed by statistics students utilizing a version of the tPP dataset as of March 13, 2019. These analyses are focused on developing models for future use, and the interpretations and conclusions they contain reflect a dataset still in development, and only a superficial engagement with the wider literature on political violence. We continue to expand, improve and refine the data, and as such, these analyses should be seen as preliminary and subject to change. The views expressed in these reports belong solely to the authors, do not necessarily reflect the findings of the tPP team, and are subject to further inquiry and revision.



Below you will find the non-technical analytical summary and selected visualizations for Team #2 (Ideological Analysis) of 5.

This report was authored by Lesi Wei, Lexi Gelinas, Siqi Zhang & Yiduo Yang. To download the complete report, click here.



Introduction

The Prosecution Project (tPP) has collected data on cases in which individuals or groups engage in political violence that results in a felony or has been described through State speech as having a connection to a terrorist or extremist group with a political agenda. Specifically, this analysis is looking at several key variables in the relationship between ideology and the political violence itself.

Results

Ideology and Lethality

Most instances of political violence do not result in a death, but of the ones that do, Rightist groups commit more than any other group.

Ideology and People vs Property

Salafi, Jihadist, or Islamic groups have more cases with no direct target than any other group. Rightist groups have more cases in which they attack property than people.

Tactic and Physical Target

Threat/support of an organization is the most used tactic and has the most cases in the online community and against unknown targets.

Ideology and Ideological Target

Salafi, Jihadist, or Islamic groups have more cases in which they attack unspecified ideological targets than any other group.

Ideology and State Speech

Defendants with no group affiliation and Leftist groups have more cases involving state speech than the other groups.

Tactic and Group Affiliation & FTO Affiliation

 

Salafi, Jihadist, or Islamist individuals tend strongly toward threat/support of an organization as their tactic, while Rightists tend to use an external device. Individuals affiliated with an FTO tend to provide material/financial support to the terrorist organization, whereas those with no FTO affiliation more often use an external device.

Ideology and Location

Salafi, Jihadist, or Islamist individuals commit more attacks on the East Coast, on the West Coast, and in the Midwest of the United States. Rightist groups commit more attacks in the Central area of the United States. There are only two states in which Leftists commit the most political violence.

Conclusions

Not all groups of categorical variables show obvious trends; based on the plots, only a few categories under each variable show significant trends. The technical part of the report examines this in more depth.

References

McHugh, M. (2013). The Chi-square test of independence. Biochemia Medica 23 (2) 143-149.

R Core Team (2017). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/ .

tPP Preliminary Statistical Report #1 of 5

The following report was completed by statistics students utilizing a version of the tPP dataset as of March 13, 2019. These analyses are focused on developing models for future use, and the interpretations and conclusions they contain reflect a dataset still in development, and only a superficial engagement with the wider literature on political violence. We continue to expand, improve and refine the data, and as such, these analyses should be seen as preliminary and subject to change. The views expressed in these reports belong solely to the authors, do not necessarily reflect the findings of the tPP team, and are subject to further inquiry and revision.



Below you will find an analytical summary and selected visualizations for Team #1 (Descriptive Analysis) of 5.

This report was authored by Emma Ellis, Sikai Huang, Haiduan Tao & Haosen Yang. To download the complete report, click here.



The main question answered in this report is: How does the US legal system prosecute acts of political violence (descriptive) and how has this changed over time and space?

First, the data was mined and edited using RStudio. The final format had 1280 observations. The only observations that were removed from the data set were cases that had ‘pending’ as values because these had no information and would negatively impact the descriptive statistics that were created. Each of the variables chosen had a table created. These tables looked at Category, Number of Observations, Average Prison Sentence Length, Percentage of Life Sentences, and Percentage of Death Sentences. Multiple tables had a lot of zeros under the death sentence column.
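The per-variable tables described above can be sketched as a simple grouped summary. The Python below is only an illustration with hypothetical records, not the authors' RStudio workflow. In particular, excluding life and death sentences from the average sentence length is an assumption, since the report does not say how those cases were handled.

```python
from collections import defaultdict

# Hypothetical records: (category, sentence length in months, sentence type).
# Life and death sentences carry no length here (an assumption for this sketch).
cases = [
    ("A", 120, "term"),
    ("A", 60, "term"),
    ("A", None, "life"),
    ("B", 240, "term"),
    ("B", None, "death"),
]


def summarize(cases):
    """Build one table row per category with the columns named in the report."""
    groups = defaultdict(list)
    for category, months, kind in cases:
        groups[category].append((months, kind))

    table = {}
    for category, rows in groups.items():
        n = len(rows)
        lengths = [m for m, _ in rows if m is not None]
        table[category] = {
            "n": n,
            "avg_months": sum(lengths) / len(lengths) if lengths else None,
            "pct_life": 100 * sum(k == "life" for _, k in rows) / n,
            "pct_death": 100 * sum(k == "death" for _, k in rows) / n,
        }
    return table


for category, row in summarize(cases).items():
    print(category, row)
```

A category with no death sentences yields a pct_death of 0, which matches the many zeros reported in the death sentence column.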

After the tables were initially created, some categories were combined, depending on the variable. The only variable that did not have a table created was location, because a geomap was found to be more useful as a visualization. The geomap showed that states with higher populations also had a higher number of life and death sentences.

The states shown in white (Wyoming, Nebraska, Rhode Island, and Hawaii) have no information in the data provided in the project. New York has the largest prosecution count, far more than other states. Overall, in about 87% of states the length of prison sentences is under 200 months. Oklahoma and New Hampshire have longer prison sentences than other states, but they have few prosecution counts. Texas, California and New York also have relatively long prison sentences, with higher prosecution counts. Oklahoma has the largest percentage of life sentences and death sentences. Nearly half of the states have life sentences and 23% of states have death sentences.

Since this analysis is wholly descriptive, no definite conclusions can be drawn for predicting the length of a prison sentence. From the tables that were created and the geomap, some trends were found with regard to life and death sentences.

One major finding is that no death sentences were given in any case where the defendant was not a U.S. citizen.

Another notable finding was that no death sentence was given when no deaths were involved; the most interesting part of this is that there were over 1,000 observations of zero killed.

The last notable finding was that no case in which an informant was present resulted in the death penalty. One possible explanation is that a crime can be stopped if the police are informed beforehand.

References

Hadley Wickham (2017). tidyverse: Easily Install and Load the ‘Tidyverse’. R package version 1.2.1. https://CRAN.R-project.org/package=tidyverse

Garrett Grolemund, Hadley Wickham (2011). Dates and Times Made Easy with lubridate. Journal of Statistical Software, 40(3), 1-25. URL http://www.jstatsoft.org/v40/i03/ .

David M Diez, Christopher D Barr and Mine Cetinkaya-Rundel (2017). openintro: Data Sets and Supplemental Functions from ‘OpenIntro’ Textbooks. R package version 1.7.1. https://CRAN.R-project.org/package=openintro

Paolo Di Lorenzo (2018). usmap: US Maps Including Alaska and Hawaii. R package version 0.4.0. https://CRAN.R-project.org/package=usmap

Carson Sievert (2018) plotly for R. https://plotly-book.cpsievert.me

tPP’s brand new tri-fold pamphlet

Here at tPP, we believe in communicating. We want to be in communication with scholars, with journalists, with policy makers and anyone who would like to engage with complex questions. To that end, we have recently completed designing a tri-fold pamphlet in conjunction with Nando Zegarra, Kendall Erickson, and the folks at Miami University’s SLANT Marketing & Design.

We’ve already put these into the hands of a few noted scholars, a few students, and a reporter or two. We plan to use them to better communicate to students about the opportunities the project offers, and to make direct appeals to incoming students, and other prospective coders, analysts and team members.

To check it out, click here: tPP tri-fold pamphlet

tPP crunches the numbers for news report

In what we hope will be a recurring pattern, tPP was contacted by a reporter investigating threats against elected officials. Since we have a rather unique data set, we were able to provide the investigator with a quantitative breakdown of our relevant cases, as well as speak to him on the phone to provide context, background and help frame the data.

You can see the great reporting here: https://qz.com/1578862/arrests-for-death-threats-against-us-politicians-rose-in-2018/

You can also see the great findings and analysis report provided by tPP Steering Committee members Athena Chapekis and Lauren Donahoe here: tPP report on threatening public officials

tPP in the news again! This time a short video interview with project Director

Following coverage of tPP by our university news, tPP Director Dr. Michael Loadenthal sat down with Sinclair Broadcast Group for a 30-minute interview about the project, the state of political violence in the US, and the challenges of researching these matters. From this interview we are happy to share a short segment produced by Sinclair below.

We were also happy to be mentioned in Miami University’s College of Arts and Science Alumni Update for November 2018 which you can see below:

tPP in the (Miami) news!

As hate crimes rise across the U.S., a Miami team researches political motivations and prosecution

by Shavon Anderson, university news and communications

Two weeks after a mass shooting in Pittsburgh, what’s being called the largest anti-Semitic attack in U.S. history, the Federal Bureau of Investigation confirms bias-motivated attacks are on the rise.

The FBI recently released its 2017 Hate Crime Statistics report, revealing 7,175 criminal incidents were submitted by law enforcement agencies, a 17 percent increase from 2016 and a 21 percent increase since the 2013 report. A further breakdown of victim data shows motivations behind the attacks:

  • 59.6 percent of victims were targeted because of the offenders’ race/ethnicity/ancestry bias.
  • 20.6 percent were targeted because of the offenders’ religious bias.
  • 15.8 percent were victimized because of the offenders’ sexual-orientation bias.
  • 1.9 percent were victimized because of the offenders’ disability bias.
  • 2.2 percent were targeted because of the offenders’ gender identity and gender bias.

“In just the last two weeks, we have seen the mailing of bombs to Democrats, a racially motivated shooting at a supermarket a state to the west, and the murder of 11 Jews attending morning services in the state to the east,” said Miami University’s Michael Loadenthal.

Loadenthal, visiting assistant professor of sociology and social justice, researches political violence and attributes the increase to shifts in U.S. political discourse, which he said is moving toward authoritarianism, nativism and nationalism. Such rhetoric brings racist tropes into issues like immigration and crime, and further fuels anti-Jewish conspiracies. As a result of the political tone, there’s been a 37 percent rise in crimes targeting Jews.

Michael Loadenthal, visiting assistant professor of sociology and social justice, heads the research project The Prosecution Project (courtesy Loadenthal).

Hate and terrorism: what defines it?

Hate is evolving to become more lethal, more visible and more frequent, Loadenthal said. While recent attacks nationwide have linked suspects to white supremacist groups, he noted the Alt-Right movement has filled a vacuum left by the KKK and Aryan Nations.

“Those of us who have been studying political violence in this country are far less surprised with the sudden rise of white nationalist, neo-Nazi, and fascist violence,” Loadenthal said.

But, breaking down hate crimes in the justice system is the foundation for his ongoing research, The Prosecution Project. Started in March 2017, the project involves around 40 Miami students working to explore the relationship between what was attacked, by whom, and through what methods, and how a defendant is charged, prosecuted and sentenced in the U.S.

The Prosecution Project also aims to answer a broader question: What is the relationship between a defendant’s ethnicity, religion, age or ideological motivation and the likelihood that they would be labeled a ‘terrorist’ or receive an atypically high or low prison sentence?

Eventually, the group will create and publish a public database breaking down incidents of political violence, extremism and terrorism from factions including jihadists, nationalist/separatists, right/left-wing and issue-focused groups. Their research already generated one student-authored journal article to be published in a forthcoming issue of Critical Studies on Terrorism, with plans to partner with other leading terrorism studies journals this spring.

No-Hate initiative

The latest FBI report also revealed an increase in hate crimes reported at colleges and universities nationwide from 2016 to 2017.

Miami University works to provide a safe environment through the No-Hate initiative. The campus and surrounding community are encouraged to combat hate-fueled incidents by denouncing biased ideas and actions.

At Miami, a bias-related incident directed at an individual or group is viewed as an attack on the entire community.

If you’re a member of the Miami community and feel you’ve been the victim of an incident of bias due to your race, religion, sexual orientation, ethnicity, national origin, gender, gender identity or disability, you’re encouraged to submit a Bias Incident Report. Miami University provides an annual report of hate crimes, reported to campus security authorities in accordance with the Jeanne Clery Disclosure of Campus Security Policy and Campus Crime Statistics Act.

You can find more information on the initiative at the university’s webpage.

Want to join tPP for the Spring semester?

Now entering its 5th semester, the Prosecution Project (tPP) is currently recruiting a limited number of student researchers and analysts for the Spring semester.

tPP is a large data collection and analysis project that seeks to understand trends in how political violence, terrorism, and extremism are prosecuted in the US court system. The project involves a combination of locating cases, coding them together with team members and helping to generate and interpret quantitative and qualitative forms of analysis. Interested students can enroll as coders (via an independent study) or as analysts/writers (via SOC462).

There are two ways for Miami students to join tPP:

1.) We are interested in recruiting approximately 10 new student coders to help locate and code cases for the data set. This involves pairing up with another student coder, locating records and court documents, discussing the case, and finally entering information into an already established database. Student coders must register for 1-3 credits of independent study with Professor Loadenthal (credits will be in Sociology/SOC or Social Justice Studies/SJS). If you would like to join as a team coder, please complete the application here and our team will be in touch. Space is limited so please apply as soon as you’re able.

2.) We are also seeking up to 15 new students to join as analysts focused on the current data set to generate research suitable for publication. Students may enroll in SOC462, which will be an applied sociological research class, focused on terrorism studies, and based entirely around the tPP data set. To join this class, complete the application and email Professor Loadenthal to be added. Our goal is to publish a collection of scholarly research dealing with the tPP dataset in 2019, and the project director has already spoken with several journals about this.

No previous experience is required for coders or analysts, and the opportunity is open to students of all majors. Student coders will be required to check in with the team twice a month, and student analysts (those enrolled in SOC462) must attend that class and complete writing assignments. We are especially seeking Freshman, Sophomore and Junior students who can engage with the project for multiple terms. Most team members have enjoyed their work and have sustained it throughout their time at Miami.

We are also interested in finding a few students with specialized skill sets including machine learning/artificial intelligence, grant writing, digital content management systems and marketing/outreach. Students interested in working in these areas should complete the online application. To see our growing team of student researchers, visit us here!

Why join tPP?

  • Get real world experience dealing with court records, criminal indictments and data processing relevant for careers in law, public policy, intelligence analysis, security/law enforcement and government.
  • Learn and practice research skills including project design, data coding, qualitative analysis, quantitative analysis, data verification, sampling and using software suites such as SPSS, R, Tableau and a variety of cloud computing platforms.
  • Have the opportunity to publish in high-ranking academic journals, present at conferences, and generate connections which are helpful for graduate school and other post-college challenges.
  • Meet with professionals working on issues of security, crime, terrorism and extremism including local leaders in the FBI, US Attorney’s Office and Cincinnati Fusion Center, and leading academics at Georgetown University, George Mason University, University of Cincinnati, University of Maryland and elsewhere.
  • Help to create the largest, ideologically-mixed data set for public use by researchers, academics and other practitioners.