tPP Preliminary Statistical Report #4 of 5

The following report was completed by statistics students utilizing a version of the tPP dataset as of March 13, 2019. These analyses are focused on developing models for future use, and the interpretations and conclusions they contain reflect a dataset still in development and only a superficial engagement with the wider literature on political violence. We continue to expand, improve and refine the data, and as such, these analyses should be seen as preliminary and subject to change. The views expressed in these reports belong solely to the authors, do not necessarily reflect the findings of the tPP team, and are subject to further inquiry and revision.



Below you will find the non-technical analytical summary and selected visualizations for Team #3 (Change over time) of 5.

This report was authored by Daniel Cirkovic, Yi Jing, Samantha Thompson, & Xuemeng Wang. The complete report, including the statistical source code, is available for download.



0.1 Non-Technical Report

0.1.1 Introduction

The Prosecution Project (tPP) is a collection of data that specifically investigates patterns in political violence and terrorism occurring in the United States from 1990 to the present. Data is continuously being added, so the following analysis may need to be updated as more recent data becomes available. Our analysis focuses on characteristics of the terrorists and their acts, including demographics, religion, prosecution types, ideology, tactic, targeting, and group affiliation. Our goal is to show visually, and analyze statistically, how these variables change over time.

0.1.2 Methodology

In order to detect changes in the variables more clearly, we split the data into time periods separated by major terrorist events. We took this approach rather than splitting the entire time span evenly (the events are not evenly spaced, but the amount of data in each period is fairly similar) so that we could also see whether these major events induced any specific patterns within the variables. We try to suggest reasons behind these changes, but all of this is subjective: correlation is not necessarily causation. The only conclusions we can draw with confidence come from the statistical tests performed, which relate to the overall change of each variable’s categories over time.

Some of the variables included many categories. In order to fit them all into one graph, with enough data available within each category per period, we kept only the categories with the highest frequencies and combined some categories together. This was done on a case-by-case basis, and more information on how it was completed is in the Appendix. Observations with an NA for a given variable were dropped only from the analysis of that variable; they were left in the complete data set in case they had non-NA values for other variables.

In order to find differences in each variable over time, we counted the observations in each category within each time period and divided by the total count for that period. This gives us the relative frequency of each category per period, so that we can test whether it differs over time.
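The per-period proportions described above can be sketched in a few lines of Python. This is only an illustration: the function name `category_proportions` and the toy records are hypothetical and are not taken from the report's source code or from the tPP dataset.

```python
from collections import Counter, defaultdict


def category_proportions(records):
    """Return {period: {category: share}} from (period, category) pairs.

    Records whose category is None (an NA) are skipped for this variable only.
    """
    counts = defaultdict(Counter)
    for period, category in records:
        if category is None:
            continue
        counts[period][category] += 1

    proportions = {}
    for period, counter in counts.items():
        total = sum(counter.values())
        proportions[period] = {cat: n / total for cat, n in counter.items()}
    return proportions


# Hypothetical toy records, not values from the tPP dataset.
toy_records = [
    (1, "Explosives"), (1, "Explosives"), (1, "Arson"), (1, None),
    (2, "Firearms"), (2, "Arson"), (2, "Arson"), (2, "Arson"),
]
shares = category_proportions(toy_records)
```

Within each period the shares sum to 1, which is what allows every bar in a proportional stacked bar chart to reach 100%.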

The tests we used for this are the Pearson Chi-Square Test, the Fisher Exact Test, and the Cramer’s V statistic. Because some categories contain very little data in some time periods, the Fisher Exact Test is included: it tests a similar hypothesis to the Pearson Chi-Square Test but has more relaxed requirements on sample size. Cramer’s V is a little different in that it measures how strongly the period is associated with each category’s count.

These tests do not tell us if the variables’ categories are increasing or decreasing over time, so we created bar charts where all bars are equal to 100%, and within each period the categories are split into percentages.

We additionally wanted to see if any of the variables impacted the counts of another variable over time. To do this, we compared racial/ethnic group, together with time, against (1) prison sentence length, (2) plea and (3) tactic. The Cochran-Mantel-Haenszel Test was used to test for differences over time with two variables plus time, whereas all previous tests involved only one variable plus time.

0.1.3 Conclusion

We saw that the characteristics of terrorists and their acts of terrorism have changed significantly over the time period covered by the data collected so far. Using both visualizations and statistical tests, the importance and size of these changes can be investigated more closely, as each variable changes differently. Overall, based on the statistical tests, the key variables to assign the most importance to are Othered Status, Citizenship, Tactic, and Group Affiliation. This is why we chose the visualizations included in this report and researched possible reasons for these changes, along with the directions of their differences.

0.2 Technical Report

0.2.1 Introduction

Terrorism in the United States peaked in the late 1960s and early 1970s, followed by a precipitous decline (Ross and Gurr, 1989). Despite this decline, terrorism seems ever more present. Large scale media coverage and the development of social media have often been cited as contributors to the perceived prevalence of terrorism (Weimann, 2014). Further, media coverage of events such as 9/11 has framed many attacks as “Muslims/Arabs/Islam working together in organized terrorist cells against a Christian America”. On the other hand, domestic terrorists often receive the label of “troubled individuals” (Powell, 2011). Thus, there is strong evidence of media coverage affecting the perception of terrorist attacks in the United States. Using the Prosecution Project (tPP) dataset, trends in terrorist activity are analyzed by grouping events into periods delineated by large scale media events and detecting any changes between said periods. This organization of events may allow for the detection of changes in terrorism, perhaps due to perpetrators attempting to imitate previous attacks covered in the media.

0.2.2 Methodology

In order to recognize the patterns in demographics, prosecution types, ideology, tactic, targeting, sentence length, informant, and group affiliation over time, each event was organized into different time periods separated by major terrorist attacks in the United States. The events of interest are listed below:

The purpose of this delineation is to determine whether these events, largely covered in the media, trigger “copycat” terrorist attacks (known as contagion) or somehow impact a variable’s distribution in time periods near said events (Nacos, 2010).

Once each event was grouped, the frequencies of each variable category were computed within each time period and compared using 2-way contingency tables. That is, each variable had its own contingency table with the rows representing the categories given in the variable of interest, and the columns representing the time periods described earlier. Often, multiple categories were either condensed or removed due to sparseness of information (see Appendix for the exact breakdown of tables). The difference in distribution of the categories across time will be tested using both a Pearson Chi-Square Test and Fisher Exact Test.

The Pearson Chi-Square Contingency Table Test tests homogeneity of the time periods. More specifically, it decides whether or not there is a difference between the proportions of the categories of a certain variable across the time periods. For example, if the gender variable were to be considered, it would test whether the proportion of events committed by males and females has changed over time. However, it does not indicate the direction of these changes (Lachin, 2011).
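As a rough illustration of the statistic behind this test, the sketch below computes the Pearson Chi-Square statistic and its degrees of freedom for a table of counts. The function name `pearson_chi_square` and the toy table are hypothetical; the authors' own analysis was carried out in R, and this is not their code.

```python
def pearson_chi_square(table):
    """Return (statistic, degrees_of_freedom) for an r x c table of counts.

    Rows are categories of the variable and columns are time periods.
    Assumes every row and column total is positive; an all-zero row or
    column would give an expected count of zero and must be removed first.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the category shares were identical in every period.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts: two categories (rows) observed in two periods (columns).
toy_table = [[30, 10], [20, 40]]
chi2, dof = pearson_chi_square(toy_table)
```

A large statistic relative to the chi-square distribution with `dof` degrees of freedom indicates that the category proportions differ between periods, but, as noted above, not in which direction.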

Most of the variables, however, violate the expected count assumption of the Pearson Test. The test assumes that the expected counts in each of the cells are greater than five, but many of the tables contain zero values in multiple categories. Despite this violation, the Pearson Chi-Square Test is quite robust to small expected cell frequencies (Camilli and Hopkins, 1978). To ensure this violation does not impact results, an additional Fisher Exact Test is performed.

Fisher’s Exact Test again tests for a difference between time periods in each variable’s category proportions. Specifically, it considers the possible tables that could be constructed with the given marginal totals. Then, it sums the probabilities of those tables that are at least as extreme as the observed table, giving a p-value (Raymond and Rousset, 1995). Since this could amount to a large number of tables, a bootstrap simulation with 2000 replicates is used instead. This test relaxes the assumptions required by the Pearson Chi-Square Test.
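One way to approximate such a p-value by simulation is sketched below. It is a Monte Carlo sketch under stated assumptions, not the authors' procedure: random tables with the observed margins are generated by shuffling column labels, and a table counts as "at least as extreme" when its probability under fixed margins is no larger than that of the observed table. The function names are hypothetical.

```python
import math
import random


def log_table_weight(table):
    """Log of 1 / prod(n_ij!), which is proportional to the probability
    of the table when both sets of marginal totals are held fixed."""
    return -sum(math.lgamma(n + 1) for row in table for n in row)


def simulated_exact_p_value(table, replicates=2000, seed=0):
    """Monte Carlo approximation to an exact-test p-value for an r x c table."""
    rng = random.Random(seed)
    n_rows, n_cols = len(table), len(table[0])

    # Expand the table into one (row label, column label) pair per observation.
    row_labels, col_labels = [], []
    for i, row in enumerate(table):
        for j, n in enumerate(row):
            row_labels.extend([i] * n)
            col_labels.extend([j] * n)

    observed = log_table_weight(table)
    as_extreme = 0
    for _ in range(replicates):
        # Shuffling one set of labels keeps both margins fixed.
        rng.shuffle(col_labels)
        simulated = [[0] * n_cols for _ in range(n_rows)]
        for i, j in zip(row_labels, col_labels):
            simulated[i][j] += 1
        # Small tolerance so ties with the observed table are counted.
        if log_table_weight(simulated) <= observed + 1e-9:
            as_extreme += 1

    # Adding one to numerator and denominator keeps the estimate above zero.
    return (as_extreme + 1) / (replicates + 1)


# Hypothetical toy tables, not values from the tPP dataset.
p_separated = simulated_exact_p_value([[10, 0], [0, 10]])
p_balanced = simulated_exact_p_value([[5, 5], [5, 5]])
```

The default of 2000 replicates mirrors the number quoted above; the seed is fixed only so that the sketch is reproducible.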

Trends will be visually analyzed using proportional, stacked bar charts. Along with the Pearson Chi-Square tests, Cramer’s V statistics were computed. Cramer’s V is a measure of association between two categorical variables, ranging from 0 to 1. The higher Cramer’s V is, the stronger the relationship between period and the given variable (Acock and Stavig, 1979).
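For reference, Cramer’s V is commonly computed from the Chi-Square statistic as the square root of chi-square divided by N times (min(rows, columns) minus 1). The sketch below applies that standard formula; the function name `cramers_v` and the toy tables are hypothetical, and no bias correction is applied.

```python
import math


def cramers_v(table):
    """Cramer's V for an r x c table of counts (no bias correction).

    Assumes every row and column total is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    smaller_dim = min(len(table), len(table[0]))
    return math.sqrt(chi2 / (n * (smaller_dim - 1)))


# Hypothetical toy tables, not values from the tPP dataset.
v_perfect = cramers_v([[10, 0], [0, 10]])   # period fully determines category
v_none = cramers_v([[5, 5], [5, 5]])        # identical shares in both periods
v_partial = cramers_v([[30, 10], [20, 40]])
```

Because V is scaled by the sample size and table dimensions, it can be compared across variables with different numbers of categories, which a raw Chi-Square statistic cannot.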

Finally, the interaction between racial/ethnic group, prison sentence length, and time is considered. Perhaps, over time, certain races will have differing sentence lengths, whether that be a result of discrimination, ethnic tendencies, or other factors. A three-dimensional table will be considered with a Cochran-Mantel-Haenszel Test applied. This test is an extension of the Chi-Square Test and, in general, tests for differences in the joint and marginal distributions of three variables (Lachin, 2011).

In each table, any unknown observations were not considered, since they add no information to the story other than inflating the sample size and shifting inference in a direction that may not be honest.

0.2.3 Results

From the collection of two-way tables, the distributions of most variables have changed over time. Only the distributions of death sentencing and gender seemed homogeneous over time, as both the Fisher and Chi-Square tests failed to detect a difference in their distributions. The uniformity of gender and death sentencing throughout the periods is not surprising, as the vast majority of events in the dataset were perpetrated by men and did not result in a death sentence for the perpetrator. More interesting insights can be gathered visually.

The three-way tables invite some interesting insights. When comparing ethnicity, sentence length (categorized by every 100 months), and time period, there was no significant difference found between the distributions of the categories within each of the groups. The same result was reached when comparing ethnicity, plea, and time period. However, the Cochran-Mantel-Haenszel Test found a significant difference between the distribution of ethnicity and tactic over the time periods.

The following proportional, stacked bar charts show how, and in which direction, the variables we felt were key to this analysis changed.

We see in Figure 1 that the number of terrorist acts by non-U.S. citizens has consistently decreased over time, reaching very minimal counts from 2015 to the present day. In 2011, the Department of Homeland Security defined a new term, “specially designated countries”, to mean countries “that have shown a tendency to promote, produce, or protect terrorist organizations or their members.” In 2003, the Department of Homeland Security provided US border crossings with a list of 52 countries that fell under this term in order to increase border security against possible terrorists. The list has been continually updated up to the present day. From 2007 to 2017, the US Border Patrol apprehended 45,006 immigrants from countries that have ever been on the list. There have been zero attacks committed by illegal border crossers from any of the listed specially designated countries. However, foreigners who entered legally from these countries are responsible for 99.5% of all murders and 94.7% of all injuries committed by terrorists in the US from 1975 through the end of 2017 (Bier and Nowrasteh, 2018). We see that 9/11 may have reinforced the idea that a successful strategy for foreign terrorism is to first enter legally, or to have a US citizen commit the act. Just after 9/11, the number of non-US citizens committing acts of terrorism is at its peak, and it then declines. All terrorists involved in 9/11 were non-US citizens. This decrease in non-US citizens being able to commit acts of terrorism is likely the result of increased security. However, terrorism is evolving, so non-US citizens may no longer be the main group the US should expect to commit these acts, as our graph shows.

Figure 2 shows how group affiliation changes over time. Looking not at specific terrorist events but at each group over time, we see that Al Qaeda has decreased consistently, while the Islamic State has increased, by large amounts especially in more recent years. Many factors play into this variable’s directional changes, and we will try to summarize what we think the causes are as best we can. Bin Laden, the previous leader of Al Qaeda, was killed in 2011. Period 6 begins after 2009 and is the period in which we first start to see the decrease of Al Qaeda. This may be due to their leader dying, but conflict between groups could also play a role. Let’s start at the beginning. Period 4 is after 9/11, an event Al Qaeda wished to take credit for, and therefore Al Qaeda is strong and on the rise here. In period 5, which is after 2006 when Al Shabab was formed, we see a heavier Al Shabab presence. Al Shabab was known to be tied to Al Qaeda, and declared official allegiance to them in 2012. We see both Al Qaeda and Al Shabab decrease after period 8 (2012), which is what we would expect: as Al Qaeda was weakened, so was Al Shabab because of their affiliation. We now start to see the rise of ISIS, which has taken advantage of the weakened Al Qaeda and Al Shabab to make its presence more known. Although these groups have similar views, they are not supportive of one another and have different tactics for making themselves heard. The graph of changes in tactic over time, below, reflects these different groups through the tactics they chose to use.

Returning to the discussion in the previous paragraph, we can see in Figure 3 that when Al Qaeda was more powerful, the most prevalent categories of tactic were Arson, Chemical or biological weapon deployment, and Explosives. These are all tactics that support Al Qaeda’s goal of plotting terrorism spectaculars to electrify the Muslim world. ISIS, by contrast, aims to control territory and expand its ideology. This may explain why, once ISIS has more power, the popular tactics are Providing material/financial support to terrorist organizations, Firearms, and Armed intimidation/standoff, all of which are ways to take over, build the organization, and exert control.

Additionally, from Figure 3, we see rises in tactic that could be the result of the major acts of terrorism we split the periods by. Explosives seem to increase from period 1 to period 2, which is after the Oklahoma City Bombing. Also, after the Aurora Theater Shooting, there seems to be a drastic decrease in civilian firearms, while there is an increase in armed intimidation/standoff. On another note, we see perjury/obstruction of justice slowly appear and begin to increase from past to present. This could be the result of laws changing over time: as stricter laws are implemented, more people may be convicted.

Other notable changes, for which graphics are not included, are listed here. The terrorists’ religion shows changes over time; for example, after the Charleston Church Shooting, no Christians committed acts of terrorism. This could be due to the shooting happening in a Christian church, making other Christians less likely to commit any crimes or act out. The plot of Veteran Status over time shows that after 9/11 the number of veterans who committed acts of terrorism decreased drastically, then fluctuated without ever again reaching its pre-9/11 level. Another change we see around 9/11 occurs in the ideological affiliation. We see that after 9/11 there is a massive increase in No Affiliation ideologies. This could be because groups were trying to draw attention away from themselves after all the security measures put into place after 9/11. We also see a huge increase in Rightist ideologies after the Charleston Church Shooting. This is interesting to note because the man who committed this act of terrorism was a 21-year-old white supremacist, who most likely believed in a rightist ideology. After the death of Trayvon Martin, State jurisdiction for acts of terrorism increased greatly, possibly due to the pressure on local police following this event. The increase over time in the verdict of charged but not tried may be due to cases that are still ongoing as we get closer to the present day. After the first major act of terrorism, we see more informants coming forward to prevent terrorist events.

To further inspect the differences between ethnicity and tactic over the time periods found by the Cochran-Mantel-Haenszel Test, a stacked bar plot was developed. Ethnicity was limited to only the white and Middle Eastern groups, as they provided interesting insight. Over time, it seems that, of the crimes in the data set committed by people of Middle Eastern ethnicity, the proportion that included providing financial support to terrorist organizations has increased drastically in each time period. This trend began right before the 9/11 attacks. Crimes perpetrated by white individuals in Period 2, post Oklahoma City Bombing, started to consist mainly of explosives, perhaps furthering the idea of similar “copycat” crimes being committed after large media coverage of terrorist attacks. Similarly, after the Aurora shooting, white criminals seemed to gravitate heavily towards armed intimidation to commit their crimes. Other ethnicity plots can be seen in the Appendix.

0.2.4 Conclusion

The analysis provides some evidence that “copycat” terrorism, or contagion, impacts the distribution of multiple characteristics of terrorist attacks over time. These changes are especially prevalent in the distribution of tactics across ethnicity and othered status after key events such as the Oklahoma City Bombing, 9/11, and the Aurora Shooting. Further, Ideological Affiliation trended towards Rightist leanings after the Charleston Church Shooting, while Group Affiliation has seen a recent increase in attacks perpetrated by the Islamic State, despite the decrease in attacks perpetrated by Al Qaeda. The claim that characteristics of these terrorist attacks are associated with the selected time periods is bolstered by both the Chi-Square Test results and the Cramer’s V values. Of course, the Chi-Square Tests only say that period and the characteristics of terrorist attacks are associated; they do not imply a mechanism. However, the bar charts provide context for our hypothesis. The analysis is limited by the sparseness of events in some categories, which measures were taken to combat.

0.3 References

Acock, Alan C., and Gordon R. Stavig. “A measure of association for nonparametric statistics.” Social Forces 57, no. 4 (1979): 1381-1386.

Arnold, Jeffrey B. ggthemes: Extra Themes, Scales and Geoms for ‘ggplot2’. R package version 4.1.1, 2019. https://CRAN.R-project.org/package=ggthemes

Bier, David, and Alex Nowrasteh. “45,000 ‘Special Interest Aliens’ Caught Since 2007, But No U.S. Terrorist Attacks from Illegal Border Crossers.” Cato Institute, 17 Dec. 2018, www.cato.org/blog/45000-special-interest-aliens-caught-2007-no-us-terrorist-attacks-illegal-border-crossers.

Camilli, Gregory, and Kenneth D. Hopkins. “Applicability of chi-square to 2 × 2 contingency tables with small expected cell frequencies.” Psychological Bulletin 85, no. 1 (1978): 163.

Grolemund, Garrett, and Hadley Wickham. “Dates and Times Made Easy with lubridate.” Journal of Statistical Software 40, no. 3 (2011): 1-25. http://www.jstatsoft.org/v40/i03/.

Lachin, John M. Biostatistical Methods: The Assessment of Relative Risks. 3rd ed. Hoboken: Wiley, 2011.

Nacos, Brigitte L. “Revisiting the contagion hypothesis: Terrorism, news coverage, and copycat attacks.” Perspectives on Terrorism 3, no. 3 (2010).

Powell, Kimberly A. “Framing Islam: An analysis of US media coverage of terrorism since 9/11.” Communication Studies 62, no. 1 (2011): 90-112.

Raymond, Michel, and François Rousset. “An exact test for population differentiation.” Evolution 49, no. 6 (1995): 1280-1283.

Ross, Jeffrey Ian, and Ted Robert Gurr. “Why terrorism subsides: A comparative study of Canada and the United States.” Comparative Politics 21, no. 4 (1989): 405-426.

Weimann, Gabriel. New terrorism and new media. Vol. 2. Washington, DC: Commons Lab of the Woodrow Wilson International Center for Scholars, 2014.

Wickham, Hadley. ggplot2: Elegant Graphics for Data Analysis. New York: Springer-Verlag, 2016.

Wickham, Hadley. tidyverse: Easily Install and Load the ‘Tidyverse’. R package version 1.2.1, 2017. https://CRAN.R-project.org/package=tidyverse

See the full report for complete contingency tables, stacked bar plots, and R code for age, gender, othered status, ethnicity, religion, veteran status, citizenship, jurisdiction, plea, verdict, length of sentence, death sentence, ideology, tactic, physical target, ideological target, informant, group affiliation, and FTO affiliation.

tPP Preliminary Statistical Report #3 of 5




Below you will find the analysis report provided by Team #3 (Identity and Criminal Action Analysis) of 5.

This report was authored by Athena Chapekis, Jing Lin, Ruoqi Tan & James Wieneck. The report is available for download.



Non-technical Summary

Introduction

We are presented with a data set involving individuals who were indicted and prosecuted for crimes which have socio-political motivations and/or crimes that have rendered them as designated terrorists in the United States. These cases involve various identity variables of the defendants (age, race/ethnicity, gender, “othered” status, religion, citizenship status, and veteran status), as well as various criminal activity variables (people vs. property, number injured, number killed, physical target, ideological target, and tactic). The question we seek to answer is: how do aspects of a defendant’s identity play a role in their criminal activity?

Results

The statistical results show unbalanced levels among the identity variables. For gender, the vast majority of offenders are male. For race and religion, ‘Muslim’ appears more frequently. Most cases have civilian status. The most common tactic is ‘Providing material/financial support to terrorist organization’; ‘Unspecified’ appears most frequently as ideological target, and ‘Online’ appears most frequently as physical target.

All identity variables have significant relationships with activity variables; however, the actual size of the effect varies across variables. Gender and othered status significantly affect the number of persons killed or injured, with men and othered defendants having a higher injury count. Age was consistently influential when examining how trends in criminal activity are influenced by one’s identity, though it almost always had some interaction with citizenship status, veteran status, and/or othered status. Othered status was also highly influential in predicting different trends in criminal activity.

Conclusions

This report finds that the identity variables with the greatest predictive effect on criminal activity are Othered Status, Religion, Ethnicity/race, Citizenship Status, and Veteran Status. Gender is a significant predictor of the number of people killed and injured by a crime but is not a significant predictor of other criminal activity variables.

The models we built to predict trends in criminal activity based on the identities of the defendants had poor predictive power, in part because of unused scenarios and unspecified cases for multiple variables. The data set used for the analysis likely needs more information to give a more complete picture of how criminal activity is linked to a defendant’s identity.

Technical report

Introduction

The definition of what constitutes “terrorism” is not a unanimous one. Different sources report different standards for what an act of terror entails. Because of this, there has not been a thorough body of research built on terrorism in all its forms. Issue-specific groups like the Department of Justice (DOJ)/Federal Bureau of Investigation (FBI), the Center for Biomedical Research (CBR), and the National Abortion Federation (NAF) have collected their own databases of terrorism and terrorists over time, but they generally focus on one specific ideological group – whichever is of the greatest concern to them.

The Prosecution Project (tPP) is a large-scale project out of Miami University that seeks to construct a database of all acts of terrorism and socio-politically motivated crimes ending in felony prosecutions in the United States from 1990 to the present. Each case in tPP’s database is coded across 44 variables, including demographic information on the defendant, details of their affiliations, details of the crime they committed, and details of the legal proceedings.

This report seeks to investigate the connection between a defendant’s identity (i.e. their demographic information) and their criminal activity and provide an answer to the question of how who someone is relates to what they do.

Methodology

The first step in approaching this analysis is to clean the data. Categorical variables which have many levels are reduced to allow for better comparison and analysis. Much of this reduction was done using the classification provided by the Prosecution Project codebook.

For example, in the variable Physical Target, the levels of ‘Federal site: non-military non-judicial’, ‘Federal site: military’, ‘Federal site: judicial’, and ‘Federal site: non-U.S. embassy or consulate’ are combined and recoded simply as ‘Federal site’. Furthermore, the levels for ‘State site’ and the levels for ‘Municipal site’ are combined with ‘Federal site’ to make one unified level of ‘Governmental site’. This is done for the variables of Physical Target and Ideological Target. Due to the low representation in many of the levels of the variable ‘Tactic’, many levels were combined into an ‘Other’ level. Other categorical variables that were not recoded but are included in this report in their original state are People vs. Property, Gender, Ethnicity, Religion, Othered Status, Citizenship Status, and Veteran Status. For each categorical variable, a bar chart is generated to compare frequencies of levels.
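The recoding described above could look roughly like the following sketch. The prefix-matching rule, the function names, and the `min_count` threshold are assumptions made for illustration; the report does not state how the recoding was implemented or what cutoff was used for ‘Other’.

```python
from collections import Counter

# Assumption: every governmental level starts with one of these prefixes.
GOVERNMENT_PREFIXES = ("Federal site", "State site", "Municipal site")


def recode_physical_target(level):
    """Collapse federal, state and municipal levels into 'Governmental site'."""
    if level.startswith(GOVERNMENT_PREFIXES):
        return "Governmental site"
    return level


def lump_rare_levels(values, min_count):
    """Replace levels seen fewer than min_count times with 'Other'."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else "Other" for v in values]


# Hypothetical toy values, not taken from the tPP dataset.
targets = ["Federal site: military", "State site", "Online"]
recoded_targets = [recode_physical_target(t) for t in targets]

tactics = ["Explosives", "Explosives", "Arson", "Explosives", "Hoax"]
lumped_tactics = lump_rare_levels(tactics, min_count=2)
```

Lumping sparse levels trades detail for cell counts large enough to support the Chi-Squared tests used later.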

To conduct the analysis, this report begins with T-tests to determine the influence that the binary predictor variables Gender (male v. female), Othered Status (othered v. non-othered), and Veteran Status (civilian v. non-civilian) may have on the number of people killed and the number of people injured in socio-politically motivated crimes. A significance level of 0.05 is used. Furthermore, Analysis of Variance (ANOVA) tests are used to test for significant differences in the number of people killed and the number of people injured between demographic groups for the identity variables of Race/ethnicity, Religion, and Citizenship Status. ANOVA tests are also used to see if a defendant’s age differs significantly between the types of things that are targeted in socio-political crimes (both physically and ideologically) and between types of tactics. On top of the ANOVA tests, Eta Squared values are calculated to measure effect size in the relationships (Brown). To investigate relationships between categorical identity variables (e.g. Religion, Citizenship Status, etc.) and categorical activity variables (e.g. Tactic, Physical Target, etc.), Chi-Squared Tests of Independence are used, and Cramer’s V is used to calculate effect size for the respective relationships between these categorical variables. Initially, this report sought to use linear regression to create a predictive model of trends. However, we found that, due to the categorical nature of many of the variables (often with many levels) and given that there are different trends among differing variables related to the crime, it is not advisable to build regression models based on a single response variable. Instead, we use classification tree modeling for the categorical variables whose trends we want to analyze and ANOVA tree modeling for the numerical variables whose trends we want to analyze.
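To make the ANOVA and Eta Squared step concrete, the sketch below computes the one-way ANOVA F statistic and eta squared (between-group sum of squares divided by total sum of squares) from grouped values. The function name `one_way_anova` and the toy numbers are hypothetical and do not come from the report or the tPP data.

```python
def one_way_anova(groups):
    """Return (F statistic, eta squared) for a list of numeric groups.

    Assumes at least two groups, more observations than groups, and some
    variation within groups (otherwise F is undefined).
    """
    all_values = [x for group in groups for x in group]
    n_total = len(all_values)
    grand_mean = sum(all_values) / n_total

    # Variation of group means around the grand mean.
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2
        for group in groups
    )
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    ss_within = ss_total - ss_between

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_statistic = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / ss_total
    return f_statistic, eta_squared


# Hypothetical toy data: a numeric outcome for two demographic groups.
f_stat, eta_sq = one_way_anova([[1, 2, 3], [4, 5, 6]])
```

Eta squared can be read as the share of the outcome's total variation that is accounted for by group membership, which is why it complements the F test's significance result with a measure of size.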

We will be using classification trees for the following variables: People vs. Property, Physical Target, Ideological Target, and Tactic; we will be using ANOVA/regression trees for the following variables: Number Injured and Number Killed. These will be considered as our criminal activity variables for this portion of the analysis. The identity variables we are using in this portion of the analysis are age, gender, race/ethnicity, religion, othered status, veteran status, and citizenship status. The purpose of this portion of the analysis is to see which aspects of a criminal’s identity are most often associated with various aspects of criminal activity, and also how these aspects interact or intersect. To validate the results from our classification and regression trees, we will also be using random forests for each model to see which variables are most strongly linked to each criminal activity variable and which contribute most to differences in criminal activity trends (Liaw). For each random forest, 1,000 classification trees will be generated.

Results

For most of the categorical variables, there are a number of levels which appear in the data very infrequently.

Identity variables

Looking at the demographics of the data, we see fairly uneven representation among levels for almost all of the variables. As for gender, the data is overwhelmingly male, and the levels of ‘Non-binary/gender non-conforming’ and ‘Unknown/unclear’ are almost never used.

Ages range from 16 to 88 with a median age of 33 and a mean age of 35.9. The ethnicities of ‘Biracial’ and ‘American Indian/Alaskan Native’ hardly occur, and for Religion, ‘Jewish’ and ‘Other’ appear very infrequently. As well, ‘Christian’ and ‘Christian Identity’, while occurring somewhat more often, do not occur in the data nearly as often as ‘Muslim’ and ‘Unknown’.

Regarding Citizenship Status, all levels are relatively infrequent compared to ‘Civilian’ and ‘Foreign national’. There are more cases marked as ‘Othered’ than ‘Non-othered’, but both are well-represented in the data. Lastly, when looking at Veteran Status, almost all cases are coded ‘Civilian’. All other statuses are fairly uncommon and combined make up only about 16% of the data.

Criminal activity variables

The most commonly occurring tactic by far is ‘Providing material/financial support to terrorist organization’. After that, ‘Explosives’, ‘Criminal violation not linked or motivated politically’, ‘Various methods’, ‘Arson’, and ‘Firearms’ occur most frequently.

All levels in the People vs. Property variable are fairly well represented. Regarding targets, for Ideological Target, ‘Unspecified’ is the most frequently occurring level in the data followed by ‘Government’, but all levels aside from those do appear to occur at similar rates. For Physical Target, the levels of ‘Online’, ‘Educational institution’, and ‘Municipal site’ do not occur frequently.

Analysis of Variance (ANOVA)

The ANOVA F tests show that race, religion, and citizenship have a significant influence on the number killed and the number injured. The identity variable age has a significant relationship with the activity variables People vs. Property, Physical Target, Ideological Target, and Tactic. Eta squared, a measure of effect size, shows that citizenship has a larger effect on the number killed and injured than race or religion, and that ideological target has the largest effect on age.
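For readers unfamiliar with eta squared, the sketch below shows the usual one-way definition: the between-group sum of squares divided by the total sum of squares. It is an illustration only; the function name and the toy ages are hypothetical and are not taken from the tPP data or from the authors' R code.

```python
def eta_squared(groups):
    """Eta squared = SS_between / SS_total for a one-way layout.

    `groups` maps a group label to a list of numeric observations.
    """
    values = [x for obs in groups.values() for x in obs]
    grand_mean = sum(values) / len(values)
    ss_total = sum((x - grand_mean) ** 2 for x in values)
    ss_between = sum(
        len(obs) * (sum(obs) / len(obs) - grand_mean) ** 2
        for obs in groups.values()
    )
    return ss_between / ss_total

# Hypothetical ages grouped by ideological target.
ages_by_target = {
    "government": [45, 50, 55],
    "identity": [20, 25, 30],
}
print(round(eta_squared(ages_by_target), 3))
```

Values near 0 mean group membership explains little of the variation in the outcome; values near 1 mean it explains almost all of it.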

Student’s T-test

Regarding the number of people killed by a crime, we can be 95% confident that, on average, crimes committed by men kill between 0.08 and 8.76 more people than crimes committed by women. For the differences in the number of people injured, we can say with 95% confidence that, on average, men injure between 16.11 and 52.71 more people than women in the course of a socio-politically motivated crime. There is no statistically significant difference in fatalities between crimes committed by othered and non-othered defendants; however, we can be 95% confident that othered defendants injure between 20.15 and 76.3 more people in the course of their crime than non-othered defendants. As well, there is no statistically significant difference in the number of people killed or the number of people injured between those who were civilians and those who were not.

Chi-Squared and Cramer’s V

The results of the Chi-Squared Test of Independence showed widespread statistical significance between all identity variables and all criminal activity variables. When Cramer’s V is calculated for effect size, however, it appears that many identity variables have a weak effect on criminal activity. Specifically, gender seems to have the least effect on criminal activity. Othered Status has a particularly significant effect on criminal activity, so much so that Cramer’s V indicates Othered Status may be measuring the exact same trends as the criminal activity variables.
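The relationship between the chi-squared statistic and Cramér's V can be sketched as follows. This is a minimal illustration, assuming the standard formula V = sqrt(chi² / (n · (min(rows, cols) − 1))) with no continuity correction; the function name and the example tables are hypothetical.

```python
import math

def cramers_v(table):
    """Cramér's V for a contingency table given as a list of rows of counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2 / (n * k))

# Hypothetical tables: rows are levels of one variable, columns of another.
perfect_overlap = [[10, 0], [0, 10]]   # the two variables carry the same information
no_association = [[5, 5], [5, 5]]      # the two variables are unrelated
print(cramers_v(perfect_overlap), cramers_v(no_association))
```

Because chi-squared grows with sample size, a test can be significant while V stays small, which is the pattern described above for most identity variables. A V close to 1 indicates that the two variables are nearly redundant.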

Classification/Regression Trees and Random Forests

Figure 1. The classification tree for the people vs. property variable. At least 50 cases were required for each split, and each final outcome required at least 50 cases.

What we have been able to see is that for predicting the trends in whether a target is human or property, othered status appears to interact with veteran status and age. Othered defendants are more likely to either have targeted people or have no direct target (Figure 1). Of othered defendants who were of civilian status, released on hardship discharge, or whose veteran status was unknown, no direct target was identified; otherwise, people were more likely to be targeted. Among those of non-othered status, those whose veteran status was active duty, dishonorably discharged, belonging to a non-U.S. military, or unknown were more likely to target people. Among those who were not of those veteran statuses, age was an additional factor; those who were 52 and under were more likely to target property, and those 53 and over were more likely to target people (Figure 1). We can see that the most significant variables which made a difference in the trends in which type of target was involved were othered status, veteran status, and age, in this order.

Figure 2. The variable importance plot for the people vs. property random forest model.

After conducting a random forest on the data used to build the classification model and plotting the importance of each variable, we find that veteran status, othered status, and age are the largest contributors to the differences in which types of targets defendants tend to target. Age and othered status were particularly strong determinants in these patterns (Figure 2).

Figure 3. The ANOVA/regression tree for the number killed variable. At least 20 cases were required for each split, and each final outcome of the tree required at least 15 cases.

What we can see for the number of fatalities in each crime is that there is a split at veteran status. Those whose veteran status was either active duty, civilian, dishonorably discharged, honorably discharged, or unknown had an average of 1.8 fatalities (Figure 3). Within that group, defendants whose citizenship status was refugee, residing on a visa, citizen, permanent resident, or unknown had a fairly low average of 0.77 fatalities (Figure 3). Among defendants who were not of these citizenship statuses, there was an average of 5.7, and another split at religion (Figure 3). Those whose religion was identified as Christian or unknown had fairly low average fatalities at 0.43, which was lower than the average of 10 for those whose religions fell outside of these 2 categories (Figure 3). From there, age was a major determinant in the number of fatalities. Those who were under 25 had, on average, the second-most fatalities at 30, and those who were 25 or older had only 7.6 fatalities on average (Figure 3).

For defendants who were a former or current non-U.S. military member or who were discharged on the basis of hardship, the average number of fatalities was 10 times higher than defendants not of these veteran status categories at 18 fatalities (Figure 3). We notice that, from here, there is a split at age; those who were 35 or younger had an average fatality count of 6.2, whereas those who were 36 or older had an average fatality count of 32 (Figure 3). We can see that the most significant variables in predicting differences in the number of people killed were veteran status, citizenship status, religion, and age.

Figure 4. The variable importance plot for the people killed random forest model.

After conducting a random forest on the data used to build the regression/ANOVA model and plotting the importance of each variable, we find that age is a very significant predictor in determining the differences in fatalities among each case of crime (Figure 4). However, we cannot ignore the influence of veteran status or citizenship status, as they were significant variables on which the regression trees were split, and the variable importance plot also reflects this (Figure 4).

Figure 5. The ANOVA/regression tree for the people injured variable. At least 50 cases were required for each split, and each final outcome of the tree required at least 25 cases.

Looking at our results in Figure 5, we find that among defendants who were U.S. citizens, refugees, residents on a visa, permanent residents, or of unknown citizenship status, the average number of people injured was 4.1. For defendants who were not, there was a split at religion; those whose religion was identified as Christian, Christian Identity, or unknown had an average of 1.6 injuries (Figure 5). Among those whose religions were not in those categories, there was a split at age. For those who were 27 or older, the average number was 141, and for those who were 26 or under, the average number was 429 (Figure 5). We can conclude from this tree that citizenship status, religion, and age were important factors in predicting the differences in the number of people injured.

Figure 6. The variable importance plot for the people injured random forest model.

After conducting a random forest on the data used to build the regression/ANOVA model and plotting the importance of each variable, we find that citizenship status and age are particularly important in determining trends and predicting differences in the number of people injured as a result of a crime (Figure 6).

Figure 7. The classification tree for the physical target variable. At least 75 cases were required for each split, and each final outcome required at least 75 cases.

What we can see in this classification tree is that there is an initial split for othered status (Figure 7). Among those of othered status, we can see a split for veteran status. Among defendants who were civilians, former veterans released on hardship discharge, or former veterans who were honorably discharged, the physical target was more likely to be unspecified; among defendants whose veteran status did not fall in these 3 categories, no direct physical target was found (Figure 7). For those of non-othered status, private sites were more likely to be attacked, and there was a split for religion. Defendants whose religion was identified as Christian, Jewish, or Muslim were more likely to have an unspecified target, and those whose religion was not one of those 3 were more likely to attack private property (Figure 7). There is a further split in age; defendants who were 40 or older often had an unspecified physical target, whereas those under 40 tended to attack private sites (Figure 7).

Figure 8. The variable importance plot for the physical target random forest model.

After conducting a random forest on the data used to build the classification model and plotting the importance of each variable, we find that age, religion, othered status, and veteran status are important in predicting differences in physical targets (Figure 8). Age and veteran status appear to be particularly important in determining the differences between physical targets (Figure 8).

Figure 9. The classification tree for the ideological target variable. At least 50 cases were required for each split, and each final outcome required at least 25 cases.

We notice that the first split of this classification tree is at othered status (Figure 9). Of defendants who are of othered status, there is a split at veteran status. For defendants who are civilians, were honorably discharged, were discharged on the basis of hardship, or whose veteran status is unknown, there was an unspecified ideological target; for defendants whose veteran status is not one of those 4 categories, government was the most likely ideological target (Figure 9). For those of non-othered status, there is a split on age; those who were 35 or over were more likely to attack government targets on the basis of ideology (Figure 9).

For non-othered defendants who were under 35, there was a split on religion; those whose religions were identified as Christian, Christian Identity, Jewish, or Muslim tended to attack on the basis of identity (Figure 9). Among those whose religions were not one of those 4 categories, veteran status was a significant predictor; civilians were more likely to attack left-leaning industries, while non-civilians were more likely to attack government on an ideological basis (Figure 9). In general, we have found that othered status, veteran status, age, and religion were significant variables in predicting ideological target.

Figure 10. The variable importance plot for the ideological target random forest model.

After conducting a random forest on the data used to build the classification model and plotting the importance of each variable, we find that age, othered status, and religion are important in predicting differences in ideological targets (Figure 10). Age and othered status appear to be particularly important in determining the differences between ideological targets (Figure 10).

Figure 11. The classification tree for the tactic variable. At least 100 cases were required for each split, and each final outcome required at least 100 cases.

Othered status appears to be very significant in predicting the tactic that a defendant used in committing a crime (Figure 11). Among those who are of othered status, the most common tactic, by far, was providing material support to a terrorist organization (Figure 11). Among those of non-othered status, religion is a significant predictor of tactic; defendants whose religion was identified as Christian, Muslim, or “Other” were more likely to employ multiple (or various) methods (Figure 11). Among defendants whose religion was not Christian, Muslim, or “other”, age is a significant predictor of tactic; those who were 30 or over were more likely to use explosives when committing a terrorist act, and those who were under 30 were more likely to use arson (Figure 11).

Conclusions

This report finds that while all interactions between variables that define a defendant’s identity and variables that define a defendant’s criminal activity are significant, the variables which have the greatest prediction effect in terms of criminal activity are whether a defendant is othered or non-othered and the factors which contribute to that differentiation (religion, ethnicity/race, citizenship status), and a defendant’s veteran status. A defendant’s gender, while a significant factor in terms of the number of victims that result from a socio-politically-motivated crime, is generally not a significant predictor in other factors of criminal activity (tactic, target, etc.). The results from our classification/regression trees and random forests appear to show that the most significant identity variables associated with different trends in criminal activity were related to age, citizenship status, veteran status, religion, and othered status. For the classification trees and their associated random forests, the variables that were particularly of importance were age and othered status, and for the regression trees and their associated random forests, the variables that were particularly of importance were age and citizenship status. Overall, age proved to be a very significant predictor in explaining differences in trends in criminal activity.

Some limitations of these random forests and classification/regression trees were the large number of unspecified or unknown cases, as well as a sizable number of unused levels for tactic, physical target, ideological target, and people vs. property. We noticed that for the classification tree models, the error rate generally ranged from 46-55%, and for the regression/ANOVA tree models, the percentage of variability explained by the model was negative. Thus, because of the poor predictive power of these models, we must exercise caution in assuming that the identity variables we found to be significant have any causal effect.
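A negative "percentage of variability explained" can look like an error, so a short sketch may help. Assuming the figure has the usual form 1 − SSE/SST evaluated on predictions for cases the model did not see (for example, out-of-bag predictions in a random forest), it goes negative whenever the model predicts worse than simply guessing the overall mean. The function name and numbers below are hypothetical.

```python
def variance_explained(actual, predicted):
    """Return 1 - SSE/SST, the share of variability explained by the predictions."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Hypothetical fatality counts: mostly zero, with one large value.
actual = [0, 0, 0, 12]
mean_only = [3, 3, 3, 3]       # always predict the overall mean
poor_model = [10, 0, 0, 0]     # a model that puts the large value on the wrong case

print(variance_explained(actual, mean_only))
print(variance_explained(actual, poor_model))
```

Heavily skewed outcomes such as Number Killed and Number Injured, where a few cases dominate, make this situation more likely.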

References

Brown, James D. 2008. “Effect size and eta squared.” JALT Testing & Evaluation SIG News.

conjugateprior. 2013. “Formulae in R: ANOVA and other models, mixed and fixed.” Blog. Accessed February 27, 2019. Retrieved from http://conjugateprior.org/2013/01/formulae-in-r-anova/.

Liaw, A., and M. Wiener 2002. Classification and Regression by randomForest. R News 2(3), 18-22.

Loadenthal, Michael, et al. 2019. “The Prosecution Project (tPP)” (Version March 2019) [Dataset]. Miami University Sociology Department. https://tpp.lib.miamioh.edu.

Loadenthal, Michael, Athena Chapekis, Lauren Donahoe, Alexandria Doty, and Sarah Moore. 2019. “The Prosecution Project (tPP) Codebook” (Version 2) [Code book]. Miami University Sociology Department. https://tpp.lib.miamioh.edu.

Loadenthal, Michael, Athena Chapekis, Lauren Donahoe, Alexandria Doty, and Sarah Moore. 2019. “The Prosecution Project (tPP) New Member Guidebook” (Version 1) [Instructional Manual]. Miami University Sociology Department. https://tpp.lib.miamioh.edu.

Milborrow, Stephen. 2018. rpart.plot: Plot ‘rpart’ Models: An Enhanced Version of ‘plot.rpart’. R package version 3.0.6. https://CRAN.R-project.org/package=rpart.plot

Navarro, D. J. 2015. Learning statistics with R: A tutorial for psychology students and other beginners. R package version 0.5. University of Adelaide. Adelaide, Australia.

Mangiafico, Salvatore S. 2015. “Student’s t–test for Two Samples”. http://rcompanion.org/rcompanion/d_02.html

Therneau, Terry, and Beth Atkinson. 2018. rpart: Recursive Partitioning and Regression Trees. R package version 4.1-13. https://CRAN.R-project.org/package=rpart

Wickham, Hadley. 2017. tidyverse: Easily Install and Load the ‘Tidyverse’. R package version 1.2.1. https://CRAN.R-project.org/package=tidyverse

tPP Preliminary Statistical Report #2 of 5

The following report was completed by statistics students utilizing a version of the tPP dataset as of March 13, 2019. These analyses are focused on developing models for future use, and the interpretations and conclusions they contain reflect a dataset still in development, and only a superficial engagement with the wider literature on political violence. We continue to expand, improve and refine the data, and as such, these analyses should be seen as preliminary and subject to change. The views expressed in these reports belong solely to the authors, do not necessarily reflect the findings of the tPP team, and are subject to further inquiry and revision.



Below you will find the non-technical analytical summary and selected visualizations for Team #2 (Ideological Analysis) of 5.

This report was authored by Lesi Wei, Lexi Gelinas, Siqi Zhang & Yiduo Yang. To download the complete report, click here.



Introduction

The Prosecution Project (tPP) has collected data on cases in which individuals or groups engage in political violence that results in a felony or has been described through State speech as having a connection to a terrorist or extremist group with a political agenda. Specifically, this analysis is looking at several key variables in the relationship between ideology and the political violence itself.

Results

Ideology and Lethality

There are more instances of political violence that do not result in a death, but of the ones that do, Rightist groups commit more of these attacks than other groups.

Ideology and People vs Property

Salafi, Jihadist, or Islamic groups commit more attacks against no direct target than any other group. Rightist groups have more cases in which they attack property than people.

Tactic and Physical Target

Threat/support of an organization is the most used tactic and has the most cases in the online community and against unknown targets.

Ideology and Ideological Target

Salafi, Jihadist, or Islamic groups have more cases in which they attack unspecified ideological targets than any other group.

Ideology and State Speech

Defendants with no group affiliation and Leftist groups have more cases in which they use state speech than the other groups.

Tactic and Group Affiliation & FTO Affiliation

Salafi, Jihadist, or Islamist individuals tend strongly toward threat/support of an organization as their tactic, while Rightists tend to use an external device. Individuals affiliated with an FTO tend to provide material/financial support to the terrorist organization, whereas those with no FTO affiliation more often use an external device.

Ideology and Location

Salafi, Jihadist, or Islamist individuals commit more attacks on the East Coast, on the West Coast, and in the Midwest of the United States. Rightist groups commit more attacks in the Central area of the United States. There are only two states in which Leftists commit the most political violence.

Conclusions

Not all groups of categorical variables show obvious trends; based on the plots, only a few categories under each variable show notable trends. The technical part of the report examines this in more depth.

References

McHugh, M. (2013). The Chi-square test of independence. Biochemia Medica 23 (2) 143-149.

R Core Team (2017). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/ .

tPP Preliminary Statistical Report #1 of 5

The following report was completed by statistics students utilizing a version of the tPP dataset as of March 13, 2019. These analyses are focused on developing models for future use, and the interpretations and conclusions they contain reflect a dataset still in development, and only a superficial engagement with the wider literature on political violence. We continue to expand, improve and refine the data, and as such, these analyses should be seen as preliminary and subject to change. The views expressed in these reports belong solely to the authors, do not necessarily reflect the findings of the tPP team, and are subject to further inquiry and revision.



Below you will find an analytical summary and selected visualizations for Team #1 (Descriptive Analysis) of 5.

This report was authored by Emma Ellis, Sikai Huang, Haiduan Tao & Haosen Yang. To download the complete report, click here.



The main question answered in this report is: How does the US legal system prosecute acts of political violence (descriptive) and how has this changed over time and space?

First, the data was mined and edited using RStudio. The final format had 1280 observations. The only observations that were removed from the data set were cases that had ‘pending’ as values because these had no information and would negatively impact the descriptive statistics that were created. Each of the variables chosen had a table created. These tables looked at Category, Number of Observations, Average Prison Sentence Length, Percentage of Life Sentences, and Percentage of Death Sentences. Multiple tables had a lot of zeros under the death sentence column.
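The per-variable tables described above can be sketched as a simple group-and-summarize step. The original work was done in R; this pure-Python version is only an illustration, and the field names (`citizenship`, `sentence_months`, `life`, `death`) and the example records are hypothetical, not the tPP codebook's actual variable names.

```python
from collections import defaultdict

def summarize(cases, variable):
    """Build one summary row per category of `variable`."""
    groups = defaultdict(list)
    for case in cases:
        groups[case[variable]].append(case)

    table = {}
    for category, rows in groups.items():
        # Life and death sentences have no length in months, so skip them here.
        lengths = [r["sentence_months"] for r in rows
                   if r["sentence_months"] is not None]
        table[category] = {
            "n": len(rows),
            "avg_sentence_months": sum(lengths) / len(lengths) if lengths else None,
            "pct_life": 100 * sum(r["life"] for r in rows) / len(rows),
            "pct_death": 100 * sum(r["death"] for r in rows) / len(rows),
        }
    return table

# Hypothetical records for illustration only.
cases = [
    {"citizenship": "U.S. citizen", "sentence_months": 120, "life": False, "death": False},
    {"citizenship": "U.S. citizen", "sentence_months": None, "life": True, "death": False},
    {"citizenship": "Foreign national", "sentence_months": 60, "life": False, "death": False},
    {"citizenship": "U.S. citizen", "sentence_months": 240, "life": False, "death": False},
]
table = summarize(cases, "citizenship")
for category, row in table.items():
    print(category, row)
```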

After the tables were initially created, we decided to combine some categories, depending on the variable. The only variable that did not have a table created was location, because a geomap was found to be a more useful visualization. The geomap showed that states with higher populations also had a higher number of life and death sentences.

The states shown in white (Wyoming, Nebraska, Rhode Island, and Hawaii) have no information in the data provided in the project. New York has the largest prosecution count, far more than other states. Overall, about 87% of states have prison sentence lengths under 200 months. Oklahoma and New Hampshire have longer prison sentences than other states, but they have few prosecutions. Texas, California and New York also have relatively long prison sentences, with more prosecutions. Oklahoma has the largest percentage of life sentences and death sentences. Nearly half of the states have life sentences and 23% of states have death sentences.

Since this analysis is wholly descriptive there can be no definite conclusions drawn for predicting the length of a prison sentence. From the tables that were created and the geomap, there are some trends that were found in regards to life and death sentences.

One major finding is that there were no death sentences given to any case where the criminal was not of U.S. Citizenship.

Another notable finding was that no death sentence was given in cases where no deaths were involved. The most interesting part of this is that there were over 1,000 observations with zero killed.

The last notable find was that if an informant was present there were no cases that resulted in the death penalty. This can be explained by a crime being able to be stopped if the police were informed beforehand.

References

Hadley Wickham (2017). tidyverse: Easily Install and Load the ‘Tidyverse’. R package version 1.2.1. https://CRAN.R-project.org/package=tidyverse

Garrett Grolemund, Hadley Wickham (2011). Dates and Times Made Easy with lubridate. Journal of Statistical Software, 40(3), 1-25. URL http://www.jstatsoft.org/v40/i03/ .

David M Diez, Christopher D Barr and Mine Cetinkaya-Rundel (2017). openintro: Data Sets and Supplemental Functions from ‘OpenIntro’ Textbooks. R package version 1.7.1. https://CRAN.R-project.org/package=openintro

Paolo Di Lorenzo (2018). usmap: US Maps Including Alaska and Hawaii. R package version 0.4.0. https://CRAN.R-project.org/package=usmap

Carson Sievert (2018) plotly for R. https://plotly-book.cpsievert.me

So what do tPP team members do?

As we begin recruiting for the next class of tPP students, I have been receiving a lot of emails asking what exactly being part of the team entails. Well, in the fall, tPP will be run through SJS497 where we will learn data science and methodology skillsets in the classroom each Tuesday, and then practice them in the classroom each Thursday. For example, on a Tuesday we may learn how to verify a newspaper story via locating and interpreting a criminal indictment, and on Thursday, use that approach to verify and complete various cases under analysis.

Throughout the semester we plan to cover a wide range of tasks, including but not limited to:

  • Coding cases: This is one of the main tasks of tPP. This involves studying a particular criminal case, collecting the necessary source documents (e.g. Case Docket, Indictment, Criminal Complaint, Plea Agreement, Sentencing Memorandum, newspaper article) and then translating these texts into codes from our code book. For a bizarre cartoon explaining Qualitative Coding, check this out. Like all tPP skills, this will be taught in class and then practiced in a workshop style
  • Checking, improving and verifying cases already in our system. This is especially important as cases change–defendants are sentenced, fugitives are captured and tried, and arrests continue to occur
  • Helping to identify new cases for inclusion through reviewing and monitoring services of the Department of Justice, US Attorney’s Office, FBI and others.
  • ‘Scraping’ and ‘mining’ texts from large documents to help locate new cases for inclusion and to ensure all appropriate cases are counted
  • Evaluating cases marked for exclusion through investigating the facts of the cases and working them through a decision tree
  • Evaluating documents for accuracy, authenticity and reliability; replacing poorly scoring sources with better sources
  • Reviewing the work of your fellow coders, providing peer-review and intercoder reliability and helping to refine the code book
  • Refining the data for analysis which involves ‘cleaning’ the data, shifting its format, exporting/importing and learning how to work with the materials in SPSS, R, Tableau, GIS and a variety of other tool suites.

So if this sounds like you, get in touch with us. Check out this post for information on SJS497 and the application process.

tPP’s brand new tri-fold pamphlet

Here at tPP, we believe in communicating. We want to be in communication with scholars, with journalists, with policy makers and anyone who would like to engage with complex questions. To that end, we have recently completed designing a tri-fold pamphlet in conjunction with Nando Zegarra, Kendall Erickson, and the folks at Miami University’s SLANT Marketing & Design.

We’ve already put these into the hands of a few noted scholars, a few students, and a reporter or two. We plan to use them to better communicate to students about the opportunities the project offers, and to make direct appeals to incoming students, and other prospective coders, analysts and team members.

To check it out, click here: tPP tri-fold pamphlet

tPP crunches the numbers for news report

In what we hope will be a recurring pattern, tPP was contacted by a reporter investigating threats against elected officials. Since we have a rather unique data set, we were able to provide the investigator with a quantitative breakdown of our relevant cases, as well as speak to him on the phone to provide context, background and help frame the data.

You can see the great reporting here: https://qz.com/1578862/arrests-for-death-threats-against-us-politicians-rose-in-2018/

You can also see the great findings and analysis report provided by tPP Steering Committee members Athena Chapekis and Lauren Donahoe here: tPP report on threatening public officials

tPP forms its Fall 2019 team!

Hello current & future tPP team members!
We are excited to announce that we will continue to build, refine and analyze the tPP data set this fall through a new course, SJS497, which Miami University students are welcome and encouraged to enroll in to serve on the project for the Fall 2019 semester.
This is a very exciting time to join the project as our completed case count nears 1,700, our first publications are about to come out, our Advisory Board forms, and our social media presence is getting more and more attention.
SJS497 (CRN:75594), the class through which we’ll be running the Prosecution Project for the Fall, will be held Tuesdays and Thursdays, 10:05-11:25 in Upham Hall. You will need to register for the course to participate as part of the central coding, research and analysis team. If you plan to register for the class, you MUST get in contact with tPP’s Director, Dr. Loadenthal, and let him know. A few points of clarification:
  1. The class will be limited to 25 students, and with 20 students (as of 5 April) already asking to join, we are very encouraged. Soon we will be reaching out to invite applicants from Sociology/Criminology, pre-law, Political Science, International Studies, Global and Inter-Cultural Studies, and other programs. We expect these efforts to fill the remaining seats in the class. So if you are interested in the class, please let us know ASAP.
  2. If you have not been a part of the team in the past, you will need to complete the application online so we can see where best to place you in the project. The form should take less than 10 minutes and is available here: https://tpp.lib.miamioh.edu/want-to-join-the-team/. After completing the form, you’ll need to email your resume/CV to Dr. Loadenthal.
  3. We are also looking to recruit a small number of students for specific project roles. These students would not be expected to enroll in SOC497 but would instead work alongside the project Director via an Independent Study. If you have experience in any of the following areas and would like to take part in the project, contact Dr. Loadenthal:
    • machine learning/Python
    • grant writing
    • mapping/GIS
    • database design (e.g. File Maker, SQL)
  4. If you use Twitter, please follow us (https://twitter.com/ProsecutionThe) so you can begin to see what types of cases make up the project. Casually following these updates between now and August will suit you well for engaging with tPP in the fall.
(our Spring 2019 team)

Faults in Statistical Analysis and tPP’s Solutions


This continues our series of student reflections and analysis authored by our research team.


Continuing along the theme of the correlation we might find between attack lethality (i.e. the number of fatalities recorded from a terrorist incident) and affiliation with an FTO, there are several problems we may encounter as a team when running linear regressions on the variable “Number Killed” with any other variable. The tPP dataset, for one, is a dataset that codes terrorist incidents on an individual basis rather than an event basis. Because of this, when running a linear regression on “Number Killed” and “Affiliation with FTO”, for example, the scatterplot will include individual data points for each individual. This is problematic because when we consider cases that include multiple perpetrators, fatalities will be repetitively counted based on the number of perpetrators carrying out the attack. Take, for example, the case in my previous blog post which included six individuals who called themselves “The Family” and carried out attacks that were affiliated with the Animal Liberation Front (ALF) and the Earth Liberation Front (ELF). In the arson attack on BLM Wild Horse Corrals in Litchfield, California, all six individuals carried out the attack, be it through organizing or perpetrating the actual attack. Although there are no deaths which resulted from the incident, each of the six individuals who appear in our dataset is assigned a 0 value for the “Number Killed” variable. When any statistical software plots this attack on a graph, the zero deaths that resulted from this attack will be counted six different times, essentially as six different incidents. In other words, any regression which runs the variable “Number Killed” against another variable will be skewed and inaccurate.

In addition to the repetitive counting issue in regressions that include the “Number Killed” variable, we also see a significant number of extreme outliers among the yes-valued entries under “Affiliation with FTO” when running a regression between the two variables (I ran this regression in my preliminary analysis of the correlation between attack lethality and FTO affiliation). Although we also see a substantial number of outliers among the no-valued entries under “Affiliation with FTO”, most of these outliers have much lower values than the yes-valued outliers. Take, for example, the figure below. In this figure, we would not expect the relationship between “Number Killed” and “Affiliation with FTO” to be linear; rather, we would expect it to be exponential. Because of this skew, a linear regression, again, would not accurately capture the relationship between the two variables: the linear equation we would obtain from the scatterplot below would not give meaningful results, and our analysis would be distorted. In the preliminary analysis we ran on these two variables, we concluded that there is a significant positive relationship between “Number Killed” and “Affiliation with FTO”, but because of the two major issues, 1) the nature of the dataset and 2) the nature of the “Affiliation with FTO” variable, that conclusion cannot be relied on.

To resolve the issue of repetitive counting in our dataset, we are in the process of compiling the entries in tPP into a secondary dataset, which will account for all of the perpetrators but will record each entry on a per-incident basis. In other words, this new dataset will eliminate the counting errors experienced in our original dataset. It will allow members of tPP to run regressions of the “Number Killed” and “Number Injured” variables on other variables in our dataset and obtain accurate results. As tPP is approaching its fifth semester in existence, a separate “analysis” course has been created for team members to draw constructive and meaningful results from our data. The new dataset will be crucial in furthering students’ quantitative analysis of our data.
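One way such a per-incident table could be built is sketched below. This is only an illustration under stated assumptions: the field names are hypothetical, the casualty counts are assumed to be identical on every row of the same incident, and treating an incident as FTO-affiliated when any of its defendants is affiliated is a choice made for this sketch, not a tPP coding rule.

```python
from collections import defaultdict


def collapse_to_incidents(defendant_rows):
    """Collapse defendant-level rows into one record per incident."""
    grouped = defaultdict(list)
    for row in defendant_rows:
        grouped[row["incident_id"]].append(row)

    incidents = []
    for incident_id, group in sorted(grouped.items()):
        incidents.append({
            "incident_id": incident_id,
            # Keep track of how many perpetrators the incident involved.
            "perpetrators": len(group),
            # Casualties are assumed to repeat on every row, so take them once.
            "number_killed": max(r["number_killed"] for r in group),
            "number_injured": max(r["number_injured"] for r in group),
            # Assumption for this sketch: affiliated if any defendant is.
            "fto_affiliated": any(r["fto_affiliated"] for r in group),
        })
    return incidents


# Hypothetical defendant-level input (values invented for illustration).
defendant_rows = [
    {"incident_id": 10, "number_killed": 0, "number_injured": 0, "fto_affiliated": False},
    {"incident_id": 10, "number_killed": 0, "number_injured": 0, "fto_affiliated": False},
    {"incident_id": 10, "number_killed": 0, "number_injured": 0, "fto_affiliated": False},
    {"incident_id": 11, "number_killed": 4, "number_injured": 9, "fto_affiliated": True},
    {"incident_id": 11, "number_killed": 4, "number_injured": 9, "fto_affiliated": False},
]

incident_rows = collapse_to_incidents(defendant_rows)
for incident in incident_rows:
    print(incident)
```

Keeping a `perpetrators` count on each incident record means the number of individuals involved is still available for analysis after the rows are merged.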

Then, to resolve the issue of skewness in the above scatterplot and in my existing regression, the variable “Number Killed” will need to be log-transformed before it is regressed on “Affiliation with FTO”.

The resulting log-linear equation will come closer to producing a realistic relationship between “Number Killed” and “Affiliation with FTO”.
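A rough sketch of what a log-transformed fit might look like follows. It is not the regression from the preliminary analysis: the incident-level data are invented, and using log(1 + killed) rather than a plain logarithm is an assumption made here because many incidents have zero fatalities and the logarithm of zero is undefined. With a single yes/no predictor, the least-squares intercept equals the mean of the “no” group and the slope equals the difference between the two group means, so no statistics package is needed.

```python
import math
from statistics import mean

# Hypothetical incident-level data: (fto_affiliated, number_killed).
# Values are invented for illustration, including one extreme outlier.
incidents = [
    (False, 0), (False, 0), (False, 1), (False, 3),
    (True, 0), (True, 2), (True, 7), (True, 150),
]


def fit_log_model(data):
    """Least-squares fit of log(1 + killed) on a yes/no FTO flag.

    With one binary predictor, the intercept is the mean of the "no" group
    and the slope is the difference between the two group means.
    """
    log_no = [math.log1p(killed) for fto, killed in data if not fto]
    log_yes = [math.log1p(killed) for fto, killed in data if fto]
    intercept = mean(log_no)
    slope = mean(log_yes) - intercept
    return intercept, slope


intercept, slope = fit_log_model(incidents)

# Back-transform the fitted values to the original scale. These are
# geometric-mean-style typical values, not arithmetic averages.
typical_no = math.expm1(intercept)
typical_yes = math.expm1(intercept + slope)

# Raw arithmetic mean of the "yes" group, for comparison with typical_yes.
raw_mean_yes = mean(killed for fto, killed in incidents if fto)

print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
print(f"typical killed, no FTO affiliation: {typical_no:.2f}")
print(f"typical killed, FTO affiliation:    {typical_yes:.2f}")
print(f"raw mean killed, FTO affiliation:   {raw_mean_yes:.2f}")
```

On the log scale the single extreme value no longer dominates the fitted value for the affiliated group, which is the motivation for transforming “Number Killed” in the first place.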

As tPP moves forward, it is our goal to always analyze our data in an accurate and ethical manner. The problems that we have encountered thus far are in the process of being resolved. We will continue to resolve any issues we notice along the way.

– Meg

Who are informants and what role do they play in prosecutions?


This continues our series of student reflections and analysis authored by our research team.


Within the tPP dataset, there is a variable that codes for informants. It is a binary variable that indicates whether or not an informant was used in the defendant’s case. Most of the informants within our dataset are from the Federal Bureau of Investigation (FBI). According to this variable, about 72% of our defendants had informants in their cases (817 of 1130 cases). It is interesting to examine the rules and regulations behind these informants. There have been five sets of guidelines over the past 30 or more years. These guidelines were set by the attorney general at the time in conjunction with Congress and some of the committees that deal with criminals and terrorism. Each set builds on the previous one and adds provisions that reflect its time and the threats then emerging.

The five sets of guidelines are the Levi Guidelines (1976), the Civiletti Guidelines (1980-1981), the Smith Guidelines (1983), the Thornburgh Guidelines (1989), and the Reno Guidelines (2001). The Levi Guidelines established the basics of FBI investigations that are opened when specific facts lead the Bureau to believe a violent crime is inevitable. These guidelines sought to minimize the use of informants and maximize the rights of citizens (Office of the Inspector General). They caused many concerns for the FBI, especially regarding its collection and use of intelligence to help prevent large-scale attacks.

At the beginning of the 1980s, a new set of guidelines, the Civiletti Guidelines, was put into place. These guidelines allowed FBI informants to participate in criminal activities with authorization, created new directions for FBI informants to follow, and clarified and revised undercover operations practices (Office of the Inspector General). Many of these changes were in response to the actions of the FBI at the time and some of its undercover operations.

A few years later, in 1983, the Smith Guidelines were introduced. These guidelines relaxed the restraints on domestic security investigations so that the FBI could protect citizens from more complex groups while still allowing for peaceful and lawful dissent (Office of the Inspector General). They also allowed for a closer look at terrorism and developed rules and regulations for domestic security/terrorism investigations (Office of the Inspector General). The use of terrorism by groups within the United States was increasing at this time and becoming more of a problem for law enforcement. These new guidelines helped to establish a lawful way for the FBI to deal with these emerging threats.

At the end of the 1980s, the Thornburgh Guidelines were developed. They arose from an international case the FBI was working that involved a group in El Salvador. These guidelines added direction for field offices when dealing with cases involving international terrorism (Office of the Inspector General). Questions had arisen about the FBI’s handling of some international terrorism cases and whether the Bureau was overstepping its authority and infringing on individual rights. These new guidelines addressed what the FBI can and cannot do concerning international terrorism.

The last set of guidelines put into place was the Reno Guidelines in 2001. Concerns had arisen about the capability of the FBI to handle large cases of terrorism, such as McVeigh’s bombing of the federal building in Oklahoma and the 1993 World Trade Center bombing. On the other hand, there were also concerns about the handling of informants by the FBI. The Reno Guidelines clarified and revised the Civiletti Guidelines concerning the use and handling of informants, and changed the interpretation of some of the previous guidelines to give the FBI greater confidence when working to prevent terrorist attacks (Office of the Inspector General).

All of these guidelines build upon each other and help reconcile some problems the FBI has had in the past. They ensure that the FBI stays within the rights of citizens while helping it combat domestic terrorism. Understanding the role of the informant within these cases allows for a deeper analysis of the defendants who had informants within our dataset.

– Lizzy Springer


References

Office of the Inspector General. 2005. “Historical Background of the Attorney General’s Investigative Guidelines.” Federal Bureau of Investigation’s Compliance with the Attorney General’s Investigative Guidelines.