Out of Sample Regression Prediction of Mass Shooting Fatalities

Does a Trump dummy “work”? Reader sam writes:

i think you’re putting too much weight into too few observations.

Some things to make your analysis more convincing: 1) show if predictive accuracy increased with a trump dummy OUT OF SAMPLE or 2) try placing the ‘trump dummy’ variable a few months before or a few months after and see if that changes the coefs. i doubt you’ll see much of an effect.
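Suggestion (2), switching the Trump dummy on a few periods earlier or later and checking whether its coefficient changes, is a small placebo loop. The base-R sketch below is illustrative only and is not the code behind this post. The data frame columns `f` and `pop`, the onset index and the shift grid are hypothetical, and shifts are counted in quarters because the regression is quarterly. If the coefficient stays about the same wherever the dummy starts, that would support sam's doubt. If it is clearly largest near the actual onset, it would not.

```r
# Re-estimate f ~ pop + dummy with the dummy switched on `s` periods away from
# the actual onset (negative s = earlier, positive s = later). Returns the dummy
# coefficient for each shift (NA if the shifted dummy is all zeros).
placebo_coefs <- function(df, onset, shifts) {
  n <- nrow(df)
  out <- sapply(shifts, function(s) {
    df$dummy <- as.numeric(seq_len(n) >= onset + s)  # local copy of df only
    coef(lm(f ~ pop + dummy, data = df))[["dummy"]]
  })
  names(out) <- shifts
  out
}

# Hypothetical usage: synthetic quarterly series with a true jump of 5 at period 138.
set.seed(3)
df <- data.frame(pop = seq(234, 328, length.out = 145))
df$f <- -14.4 + 0.0835 * df$pop + 5 * (seq_len(145) >= 138) + rnorm(145, sd = 7.8)
placebo_coefs(df, onset = 138, shifts = c(-8, -4, 0, 4))
```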

I did (2). sam is wrong. I address (1) by estimating a regression over 1982Q4-2016Q4:

f = -14.438 + 0.0835pop (corrected 11/10, 1:30 Pacific; originally shown as f = -14.838 + 0.083pop)

Adj. R² = 0.08, N = 137, DW = 1.84; bold denotes significance at the 10% marginal significance level (msl), using HAC-robust standard errors.

Here f denotes mass shooting fatalities per quarter, and pop is the quarterly average of monthly population, in millions.
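Here is a minimal base-R sketch of this setup. It assumes a hypothetical monthly data frame with columns `year`, `month`, `fatalities` and `pop` (millions). Fatalities are summed within each quarter, population is averaged, and f is regressed on pop with Newey-West (Bartlett-kernel) HAC standard errors. The post does not state its HAC settings. The lag of 3 is borrowed from the bandwidth a commenter reports further down. Packaged implementations (gretl, R's sandwich) may add small-sample adjustments, so standard errors need not match exactly.

```r
# Aggregate a monthly data frame (hypothetical columns: year, month,
# fatalities, pop in millions) to quarters: sum fatalities, average pop.
to_quarterly <- function(monthly) {
  q <- paste0(monthly$year, "Q", (monthly$month - 1) %/% 3 + 1)
  f <- tapply(monthly$fatalities, q, sum)
  pop <- tapply(monthly$pop, q, mean)
  data.frame(quarter = names(f), f = as.vector(f), pop = as.vector(pop[names(f)]))
}

# Newey-West HAC standard errors for an lm fit, Bartlett weights 1 - l/(lag + 1).
# No small-sample degrees-of-freedom correction is applied.
newey_west_se <- function(fit, lag = 3) {
  X <- model.matrix(fit)
  u <- residuals(fit)
  n <- nrow(X)
  lag <- min(lag, n - 1)
  Xu <- X * u                                  # row t is u_t * x_t'
  S <- crossprod(Xu)                           # lag-0 term
  for (l in seq_len(lag)) {
    G <- crossprod(Xu[(l + 1):n, , drop = FALSE], Xu[1:(n - l), , drop = FALSE])
    S <- S + (1 - l / (lag + 1)) * (G + t(G))  # weighted autocovariance terms
  }
  bread <- solve(crossprod(X))
  sqrt(diag(bread %*% S %*% bread))
}

# Hypothetical usage on synthetic data (not the Mother Jones series).
set.seed(1)
monthly <- data.frame(year = rep(1983:2016, each = 12), month = rep(1:12, times = 34))
monthly$pop <- seq(234, 323, length.out = nrow(monthly))
monthly$fatalities <- rpois(nrow(monthly), lambda = 1.5)
qdata <- to_quarterly(monthly)
fit <- lm(f ~ pop, data = qdata)
cbind(estimate = coef(fit), hac_se = newey_west_se(fit, lag = 3))
```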

Here is a graph of actual cumulative fatalities, estimated cumulative fatalities, and the upper bound of the 90% prediction interval. Actual exceeds predicted.

Figure 1: Cumulative mass shooting fatalities by quarter (black line), from 1982M08 through November 8, 2018. OLS fitted line (dark red), upper 90% prediction interval (pink). Orange denotes 2017Q1-2018Q4, light orange 2016Q4-2017Q1. Source: Mother Jones, accessed 11/8/2018, and author’s calculations.

In other words, out-of-sample prediction suggests an important and statistically significant Trump effect (taking into account the fact that population is an included trend variable, and that including a deterministic trend usually leaves the Trump coefficient unchanged, as discussed in this post).
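One way a cumulative forecast and its upper 90% bound could be built from the estimated regression is sketched below. This is not necessarily how the figure was produced. The sketch assumes independent, normally distributed errors (the residual diagnostics later in the thread reject normality). It reads "upper 90% prediction interval" as the upper edge of a two-sided 90% interval. The function name and the synthetic data are hypothetical. The variance of a sum of h forecast errors combines h new shocks, hσ², with parameter uncertainty, s'V(b)s, where s is the sum of the first h regressor rows. To line the result up with the actual cumulative series, add the actual cumulative total through the end of the estimation sample to both columns.

```r
# Cumulative out-of-sample forecast from an lm fit, plus the upper edge of a
# two-sided prediction interval for the cumulated total (i.i.d. normal errors assumed).
cumulative_forecast <- function(fit, newdata, level = 0.90) {
  X_new <- model.matrix(delete.response(terms(fit)), newdata)
  yhat <- drop(X_new %*% coef(fit))
  h <- seq_len(nrow(X_new))
  S <- matrix(apply(X_new, 2, cumsum), nrow = nrow(X_new))  # row h: sum of first h rows
  param_var <- rowSums((S %*% vcov(fit)) * S)               # s' V(b) s for each h
  se_cum <- sqrt(h * summary(fit)$sigma^2 + param_var)      # new shocks + parameter error
  z <- qnorm(1 - (1 - level) / 2)
  cum <- cumsum(yhat)
  data.frame(h = h, cum_forecast = cum, upper = cum + z * se_cum)
}

# Hypothetical usage: train on 137 synthetic quarters, forecast 8 more.
set.seed(2)
train <- data.frame(pop = seq(234, 323, length.out = 137))
train$f <- -14.4 + 0.0835 * train$pop + rnorm(137, sd = 7.8)
fit <- lm(f ~ pop, data = train)
future <- data.frame(pop = seq(324, 328, length.out = 8))
cumulative_forecast(fit, future)
```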

54 thoughts on “Out of Sample Regression Prediction of Mass Shooting Fatalities”

  1. AS

    Professor Chinn,
    A couple questions.
    1. Is there a typo on the constant in your equation?
    2. Did you begin the 90% upper bound at 2017Q1?


        1. pgl

          “gun control laws increase crime”. Care to provide us with a shred of credible evidence here? You did not even cite the writings of John Lott who FYI is not a credible source.

          Are you in a contest with Sarah Sanders for most lies per day?

      1. PeakTrader

        And, do guns kill people or do people kill people?:

        We know per 100,000 people the homicide rate of blacks is 19.5, Hispanics 5.3, and whites 2.5. However, what is white?

        “…a majority of Hispanics actually self-identify as white. According to the 2010 U.S. Census, 53 percent of Hispanics chose “white” as their race.”

        1. pgl

          “We know per 100,000 people the homicide rate of blacks is 19.5, Hispanics 5.3, and whites 2.5. ”

          Wow both racist and not supported by any source. PeakRacist or PeakDishonesty?!

  2. Moses Herzog

    The video below is related to the great inheritor donald trump and the “MAGA” vacancy space located between some people’s ears.

    I’m gonna warn Menzie and the readers that after the 5:12 mark the language gets vulgar, a little crude, and could be said to be “in poor taste”. If Menzie decides not to put this up I understand. But I’m going to tell you, all you have to do is stop the video, change videos, or close out Youtube at the 5:12 mark. But the part between where my link starts the video and the 5:12 mark is comedy gold. So we are mostly adults, yeah?? With the possible exceptions of CoRev, Ed Hanson, and PeakIgnorance, we can all handle clicking the pause button and exiting a commonly used social media site at the 5:12 mark, yeah??

  3. Moses Herzog

    Off-topic Menzie, I obviously don’t know you personally. But, you read enough of someone’s writings, say a newspaper columnist, or you’re reading the 15th novel of some famous writer or their 3rd biography, and you start to feel in your mind (however false that impression is) that you “know” that person. So maybe that is “kinda” my feeling towards you. So, my impression is, standing from afar, you’re a pretty good judge of character. Which brings me to the following question: When exactly (a specific year and month would be cool) did your perceptive abilities conclude, that eventually, Peter Navarro would flip out and completely and utterly lose his mind??

  4. Bruce Hall

    Most of the sudden increase is one incident of an insane person in Las Vegas.
    • attributable to Trump?
    • preparing for how long?
    • political in nature?
    • why didn’t Obama ban “bump stocks”?


    Correlation (Trump being president) is not necessarily causation (a deranged man shooting into a crowd)… except for purposes of this economics blog. Now let’s chat about how Trump is responsible for the hurricane that hit Panama City, FL because he cut corporate taxes.

    1. 2slugbaits

      Bruce Hall Most of the sudden increase is one incident of an insane person in Las Vegas.

      But see Menzie’s previous post regarding the COUNT model; i.e., the negative binomial regression. The frequency of the attacks is increasing and the TRUMP dummy variable is statistically significant at p = 1%.

    2. Menzie Chinn Post author

      Bruce Hall: Since you’re the second to suggest omitting Las Vegas, I have the results already. I’ll just say the point estimate for the Trump dummy is still statistically significant after making that adjustment. See addendum to this post (Figure 4).

      1. Bruce Hall

        Menzie, causation or coincidence?

        Are the shooters wild-assed Trump supporters shooting up Progressives’ gatherings? The CBS links provided earlier would indicate not. But unless there is a definitive analysis of the motivations and political affiliations of the shooters through the years, I’d say the question goes unanswered regardless of statistics.

        There is something to be said about the idea that notoriety increases frequency. Whoever heard of terrorist attacks using vehicles to run down pedestrians… and then it became commonplace. Was the shooter of Republican congressman Steve Scalise a closet Trump fanatic or just another nut job seeking attention? Was the Las Vegas shooter a closet Trump fanatic or just another nut job seeking attention?

        Statistically significant connections are often without logical connections. Are you arguing that Trump is driving these shooters crazy?

      2. Bruce Hall

        Menzie, you like to use Mother Jones as a source, so I’ll go along with that. https://www.motherjones.com/politics/2015/10/columbine-effect-mass-shootings-copycat-data/

        So… I guess the increase in shootings is Bill Clinton’s fault. Or maybe, just maybe, presidents don’t have control over insane people. They certainly don’t have control over the opposition party. I’m thinking that they rarely have control over their wives (I know, that’s sexist).

  5. Moses Herzog

    No better way for Cadet Bone Spurs Trump (the one with 5 draft deferments) to show his “love” for the U.S. military veterans than choosing an “acting” Attorney General who made money scamming and f*cking over…… wait for it……..wait for it……. wait for it……….F*cking over U.S. Military Veterans of their life savings:

    All of the rest in bold below I lift verbatim from a portion of Jon Swaine’s great journalism in “The Guardian”. Swaine’s reporting is based out of New York:
    Another WPM client, Ryan Masti, who served in the navy and suffers from dyslexia and attention deficit hyperactivity disorder (ADHD), said a WPM representative boasted of the company’s connections to Whitaker and Mast in a promotional telephone call that persuaded him to hand over money.
    Masti told the court he lost more than $75,000 after paying WPM to register, develop and promote his idea for “Socially Accepted”, a social network aimed at people with disabilities. He said that in return he received only a press release, a logo and a shoddy website template.
    “I spent the money on a dream to help people,” Masti said in an interview on Friday. “And I lost everything.”
    Masti, a 26-year-old farmer from upstate New York, borrowed $50,000 from his father’s retirement account, took out a commercial loan for about $20,000 and used another $7,000 he had inherited from his late grandfather, a veteran of the second world war. A WPM executive told him he “could make a million in sales” as a minimum, he said.
    Having voted for Trump enthusiastically in 2016, Masti said on Friday he would soon be changing his party affiliation to Democratic, following the president’s elevation of Whitaker.
    “It’s totally ridiculous,” said Masti. “It makes the whole Republican party look so bad. How could a president appoint someone like this? And then not have a problem about it when it comes out? He should be taking care of the victims.”

    There are more examples of the U.S. Military Veterans “acting AG” Matthew Whitaker F*cked over given in Jon Swaine’s article:

    1. pgl

      ‘Whitaker, a former US attorney in Iowa, was paid to work as an advisory board member for World Patent Marketing (WPM), a Florida-based company accused by the US government of tricking aspiring inventors out of millions of dollars. Earlier this year, it was ordered to pay authorities $26m. Several veterans, two of them with disabilities, said they lost tens of thousands of dollars in the WPM scam, having been enticed into paying for patenting and licensing services by the impressive credentials of Whitaker and his fellow advisers. None said they dealt with Whitaker directly. “World Patent Marketing has devastated me emotionally, mentally and financially,” Melvin Kiaaina, of Hawaii, told a federal court last year, adding that he trusted the firm with his life savings in part because it “had respected people on the board of directors”. The 60-year-old said he was a disabled veteran US army paratrooper and paid the company in 2015 and 2016 to patent and promote his ideas for fishing equipment. “I received nothing for the $14,085 I paid to the company, other than a bad quality drawing and logo that my grandson could have made,” he said.’

      Whitaker should be in jail and not serving as Attorney General. How disgraceful.

      1. baffling

        you don’t think this info is known and the appointment intentional? whitaker will serve as a distraction for other trump maneuvers-this was the plan all along. another pawn (whitaker) will be left out to hang by trump (ie sessions WAS a senator, tillerson WAS a ceo, campaign manager is in jail, etc) while furthering trumps agenda. it just goes to show we have a lot of rubes still in this world.

  6. Moses Herzog

    More interesting stuff on the scammer and fraudster who likes to steal U.S. Military Veterans’ savings that Trump has his new man-crush on:

    You’d think “acting” AG Matthew Whitaker would be busy enough doing half-hour long infomercials late at night trying to sell copper frying pans to senile old ladies. So many people to screw out of their life savings, so little time…..

    1. pgl

      “More gun control laws haven’t slowed mass shootings.”

      A stupid statement even for you. We have not been passing more gun laws. Rather the Brady Bill and the Assault Weapon ban have both been neutered.

      “mass shooting deaths is a tiny percentage of the population and a very small percentage of murders.”

      Go to Thousand Oaks and tell that to the parents of the kids who died at the Borderline Bar.

    2. pgl

      “Population and drugs may be causes of mass shootings.”

      PeakJunkScience! Drugs had nothing to do with the Borderline Bar massacre. Thousand Oaks is in Ventura County not Los Angeles County so your “population” hypothesis is debunked. Note a lot of these mass shootings are in suburbs or the country.

      Now if dense populations were the cause of high murder rates, why does your own link show NYC being safer than the US average? Or did you not know how many people live here? And our mayor is a LIBERAL.

    3. baffling

      interestingly, all mass shootings appear to have one thing in common: guns. now even you, peakloser, must agree that if we have no guns, we have no mass shootings.

      1. PeakTrader

        Baffling, mass shootings also have another thing in common: People. No people, no mass shootings.

        You don’t believe in deterrence and protection. Criminals don’t care about laws.

        Don’t be a loser and an idiot. Don’t punish law abiding citizens and reward criminals, which you like to do all the time.

        1. baffling

          i offer up a world WITHOUT GUNS. peakloser offers up a world WITHOUT PEOPLE! therein is your problem peak. antisocial tendencies.

        2. pgl

          “You don’t believe in deterrence and protection.”
          Another lie. Baffling does believe in both but you have no clue what either word means. Why do you lie about what other people have said? These attacks in no way make your incessant stupidity smart.

      2. pgl

        Art Laffer has his cocktail napkin that claims a zero tax rate will maximize tax revenues. I’m sure PeakDishonesty has something similar!

    1. pgl

      “The U.S. murder rate is 4.9 per 100,000. Globally, it’s 6.2 per 100,000.”

      This again? No source or link again?

      Oh yea – the USA is safer than Somalia. Your point is ?????????????????

  7. DW

    I must say, it is quite satisfying seeing critics get smacked down here. Kudos to the host for quality analysis and excellent rebuttals.

  8. sam

    Here is some R code computing out-of-sample MSE at various cutoffs between training and test samples. I trained two models, one with and one without the trump variable. If the trump variable were predictive, then the MSE should be lower for that model. However, the predictive error of the model with trump is higher.

    This suggests your model overfits the data even if the trump dummy is statistically significant.

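    # loc: location of the monthly data file (not given in the comment)
    data = read.csv(loc)
    data$time_numeric = 1:nrow(data)

    # Out-of-sample MSE of the model without TRUMP minus that of the model
    # with TRUMP, training on rows 1..j and testing on the remaining rows;
    # a positive value means the TRUMP model predicts better. If TRUMP is 0
    # in every training row, its coefficient is NA and reg_2 predicts the
    # same values as reg_1, so the difference is (essentially) zero.
    return_diff_mse = function(j) {
      train = data[1:j, ]
      test = data[-(1:j), ]
      reg_1 = lm(MASSKILL ~ POPTHM + time_numeric, data = train)
      reg_2 = lm(MASSKILL ~ POPTHM + TRUMP + time_numeric, data = train)
      pred_reg_1 = predict(reg_1, newdata = test)
      pred_reg_2 = predict(reg_2, newdata = test)
      mse_model_1 = mean((test$MASSKILL - pred_reg_1)^2)
      mse_model_2 = mean((test$MASSKILL - pred_reg_2)^2)
      mse_model_1 - mse_model_2
    }

    # sapply(cutoffs, return_diff_mse) evaluates several cutoffs (values not given here)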


    1. pgl

      I hope you were not using Excel to run your statistics. Well known for not being reliable. Next time provide a link to your data and what stat package you are using.

    2. 2slugbaits

      sam A couple of comments. First, you included a deterministic time trend in both of your models. I believe Menzie’s principal comments interpreted the population variable as a proxy for a deterministic time trend but the regression models explicitly shown in the two posts did not include a deterministic time trend. So you should probably redo your analysis without the deterministic time trends. Also, while you could calculate the MSE of out of sample using the base “stats” package, that strikes me as doing things the hard way. There is a package with dedicated functions and arguments for all kinds of out-of-sample accuracy metrics. Finally, MSE is a popular but by no means the only way to measure out-of-sample accuracy. It all depends on your assumed loss function, which might be quadratic or it could be linear.
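The point about loss functions can be made concrete with a small sketch in the same base R that sam uses. MSE and RMSE correspond to quadratic loss, and MAE to linear (absolute) loss. The two can rank forecasts differently. The function name and the toy vectors are hypothetical.

```r
# Out-of-sample accuracy of a forecast under quadratic and linear loss.
oos_accuracy <- function(actual, forecast) {
  e <- actual - forecast
  c(MSE = mean(e^2), RMSE = sqrt(mean(e^2)), MAE = mean(abs(e)))
}

# Toy example: forecast A makes one large miss, B makes three moderate ones.
actual <- c(0, 0, 0)
oos_accuracy(actual, c(0, 0, 3))        # MSE 3, MAE 1
oos_accuracy(actual, c(1.5, 1.5, 1.5))  # MSE 2.25, MAE 1.5
```

Under squared error B looks better, and under absolute error A does. The choice of loss function can therefore change which model "wins" out of sample.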

      1. sam

        1) yea population and time are very correlated but they are different. i think anytime you have time series data you should include a time covariate. To be sure I did rerun the analysis without it and received oos predictions that were worse than with time. So I think including time is the way to go.
        2) sure there are packages. but it’s a simple enough loss that i coded it up rather than find a package. i don’t think that matters for this analysis. now if i was reporting AUC that’d be a different story.
        3) yes valid point. mse is not always the best metric. but since regression minimizes mse it’s kind of the default. open to other suggestions but i think you’d have to argue for a different metric.

        1. 2slugbaits

          sam When I sum the monthly MASSKILL into quarterly buckets and take the quarterly average of the population I get the following regression using a training period from 1982Q4 thru 2016Q4:

                     coefficient    std. error      z        p-value
          const      −18.4375       8.04226        −2.293    0.0219   **
          popavg     8.35486e-05    3.02740e-05     2.760    0.0058   ***

          Mean dependent var    4.875912    S.D. dependent var    8.109041
          Sum squared resid     8189.668    S.E. of regression    7.788723
          R-squared             0.084226    Adjusted R-squared    0.077442
          F(1, 135)             7.616211    P-value(F)            0.006588
          Log-likelihood       −474.6039    Akaike criterion      953.2079
          Schwarz criterion     959.0479    Hannan-Quinn          955.5811
          rho                   0.075595    Durbin-Watson         1.840868

          Test for normality of residual –
          Null hypothesis: error is normally distributed
          Test statistic: Chi-square(2) = 162.323
          with p-value = 5.65083e-036

          The most obvious issue is (not unexpectedly) that the residuals are not normally distributed. The RMSE of the out-of-sample (static) forecast for 2017Q1 through 2018Q3 is 33.614.
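The normality test output above does not name the test it reports. As a hedged illustration of the same idea, here is the Jarque-Bera statistic. It is a different but common test that is also asymptotically chi-square with 2 degrees of freedom under normality, so its value will not match the 162.323 above. Quarterly counts with occasional very large quarters (such as the Las Vegas shooting discussed above) tend to be skewed and heavy-tailed, which is what such tests pick up. The function name and the usage data are hypothetical.

```r
# Jarque-Bera normality test on a residual vector (asymptotic chi-square, 2 df).
jarque_bera <- function(u) {
  n <- length(u)
  d <- u - mean(u)
  m2 <- mean(d^2)
  skew <- mean(d^3) / m2^1.5
  kurt <- mean(d^4) / m2^2
  stat <- n / 6 * (skew^2 + (kurt - 3)^2 / 4)
  c(statistic = stat, p.value = pchisq(stat, df = 2, lower.tail = FALSE))
}

# Hypothetical usage: normal draws versus a skewed series with one extreme value.
set.seed(4)
jarque_bera(rnorm(137))
jarque_bera(c(rpois(136, lambda = 4), 58))
```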

          If you add a deterministic linear time trend I get the following regression statistics:

                     coefficient    std. error      z        p-value
          const      205.340        137.901         1.489    0.1365
          popavg     −0.000889309   0.000601804    −1.478    0.1395
          time       0.691137       0.439034        1.574    0.1154

          Warning: data matrix close to singularity!

          Mean dependent var    4.875912    S.D. dependent var    8.109041
          Sum squared resid     7967.972    S.E. of regression    7.711192
          R-squared             0.109016    Adjusted R-squared    0.095718
          F(2, 134)             4.156033    P-value(F)            0.017735
          Log-likelihood       −472.7241    Akaike criterion      951.4482
          Schwarz criterion     960.2081    Hannan-Quinn          955.0080
          rho                   0.046108    Durbin-Watson         1.892879

          There are some obvious problems with this second regression model. As with the first regression, the residuals are not normally distributed. More importantly, notice that none of the three coefficients is statistically significant at any of the usual thresholds. In both models I corrected the standard errors using HAC with a bandwidth of 3. The statistical insignificance is probably because the time trend and the population variable are strongly correlated, so you should expect multicollinearity to push down the t-stats; that also likely explains the matrix singularity warning.

          Also, the Schwarz criterion for the second regression is higher than for the first, indicating that including the time trend overparameterizes the regression even though the “fit” is better. So overall I would be hard pressed to conclude that including a time trend improves the model, even though it lowers the out-of-sample RMSE to 32.231: the model is overparameterized, and the better RMSE is most likely a lucky accident.
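The information-criterion comparison can be checked by hand from the reported log-likelihoods. Use k = 2 and k = 3 estimated coefficients, n = 137 quarters, and the usual definitions: AIC = −2 log L + 2k, SC = −2 log L + k ln n, and HQ = −2 log L + 2k ln ln n. The sketch reproduces the reported criteria to within rounding of the log-likelihood. It also shows that AIC and Hannan-Quinn are slightly lower for the trend model, and that adjusted R² is higher. Only the Schwarz criterion, which penalizes extra parameters most heavily at this sample size, is higher. The function names are hypothetical.

```r
# Information criteria from a log-likelihood, k coefficients, n observations.
info_criteria <- function(loglik, k, n) {
  c(AIC = -2 * loglik + 2 * k,
    SC  = -2 * loglik + k * log(n),
    HQ  = -2 * loglik + 2 * k * log(log(n)))
}
# Adjusted R-squared with k estimated coefficients (including the constant).
adj_r2 <- function(r2, n, k) 1 - (1 - r2) * (n - 1) / (n - k)

ic_pop   <- info_criteria(-474.6039, k = 2, n = 137)  # population only
ic_trend <- info_criteria(-472.7241, k = 3, n = 137)  # population + time trend
round(rbind(ic_pop, ic_trend), 3)
c(adj_r2(0.084226, 137, 2), adj_r2(0.109016, 137, 3))  # about 0.0774 and 0.0957
```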

          1. sam

            my original point was that statistical significance is kind of useless in terms of predictability. more than that i’d argue that in a lot of cases statistical significance gives you the false appearance of science. you can have statistically sig vars that are not predictive (like chinn’s original model) and you can have non stat sig vars that are predictive.

            seeing how a model generalizes to data it has not seen before is the only way to ground it to reality.

            i’m not saying that two variable model is the ‘best’ model. a random walk or some regularized regression might be better. but i stand by the original criteria i proposed. given the choice between two models i’d go with the one with a lower oos error.
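Whether one model's lower out-of-sample error is more than luck, the question 2slugbaits raises above, is what a Diebold-Mariano-type test addresses. Neither commenter uses it, so treat it as a swapped-in technique. The base-R sketch compares squared-error losses from two forecasts of the same outcomes and uses a Bartlett-weighted long-run variance. The names are hypothetical. With only seven out-of-sample quarters (2017Q1-2018Q3), the normal approximation would be very rough.

```r
# Diebold-Mariano-type test of equal squared-error loss for two forecast-error
# series e1, e2. A positive statistic means model 2 has the smaller average loss.
dm_test <- function(e1, e2, lag = 0) {
  d <- e1^2 - e2^2                   # loss differential
  n <- length(d)
  lag <- min(lag, n - 1)
  dc <- d - mean(d)
  lrv <- mean(dc^2)                  # long-run variance, lag-0 term
  for (l in seq_len(lag)) {
    lrv <- lrv + 2 * (1 - l / (lag + 1)) * sum(dc[(l + 1):n] * dc[1:(n - l)]) / n
  }
  stat <- mean(d) / sqrt(lrv / n)
  c(statistic = stat, p.value = 2 * pnorm(-abs(stat)))
}

# Hypothetical usage with made-up error series.
set.seed(5)
dm_test(rnorm(40, sd = 1.2), rnorm(40, sd = 1.0), lag = 2)
```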

  9. sam

    I’m asking you to amend your posts to address this criticism or retract your posts.

    I think your original rebuttal (this post) was wrong for a couple reasons.

    1) you built a model and tested that out of sample. since that model did not accurately explain the data oos you concluded it was wrong and said the alternative model was better. what i think you should have done (and what i just posted) was to compare both models oos and see which does better.

    2) you built a model on rates but then did a significance test on cumulative data. at the very least this seems a bit muddled.

    I’m not arguing against the premise (that trump emboldens crazy people – i agree with that).

    i’m arguing against naive null hypothesis testing for a rare dependent outcome with small numbers of observations in time series data.

    1. PeakTrader

      “I’m not arguing against the premise (that trump emboldens crazy people – i agree with that).”

      How do you conclude that?

      It could be the chaos from so much fake news, because of the enormous hatred and ignorance of Trump by leftists.

      So, mental cases respond to all the hatred.

      1. baffling

        “It could be the chaos from so much fake news,”
        except the only fake news is in the mind of trump and his acolytes. rational people understand trump is simply conning folks with his “fake news” tweets. you, peakloser, are simply falling for the con. big time.

        1. PeakTrader

          I wouldn’t say Trump keeping his campaign promises is a fault.

          You’ve been conned by professional politicians you voted for.

          Talk about a loser.

          1. baffling

            no, peakloser, his assault on the free press is not a campaign promise. and fools like yourself, who support his assaults, are guilty of attacking the very essence of our constitution. let’s be very clear here, peak loser, you are guilty of attacking the very essence of the constitution and a free press. why not own up to such behavior? it is rather pathetic and indicative of the many weak minded fools we still have in this country. you are being conned and you don’t even know it.

  10. PeakTrader

    Trump wants fairness.

    But he gets 95% negative coverage from the mainstream media, except for Fox News, which is roughly 50-50.

    All the negativity has an impact.

    1. pgl

      “But he gets 95% negative coverage from the mainstream media”.

      You would define the truth as “negative coverage”.

      But huh – you are claiming Faux News gets it half right. Good to know.

  11. js

    I have no horse in this race, I am apolitical, and welcome comments from both sides. The data in both this and the other thread are interesting but the question I’m left with is why. I don’t follow policy as closely as others but I don’t recall any change in national gun laws or enforcement that would lead to the observed jump.

    1. macroduck

      Incitement, not enforcement. The hypothesis is that when a national leader encourages violence, there will be more violence.
