Clinton vs Trump, Social Desirability Bias

Election 2016 Social Desirability Bias / Trump Shaming: Clinton vs Trump head-to-head

In 2016, Trump shaming is real and Clinton has superficially benefited. Her position appears stronger than it actually is because many Trump supporters are closeted and many demographic groups feel social pressure to support Clinton. This trend should reverse by the election and should produce some ‘unexpected’ swings between polls and actual results.

Social Desirability Bias is a phenomenon that explains why people tend to change answers in surveys and polls from their preferred answer to one that is deemed more socially acceptable. It has been observed in a variety of areas including personal health, personal finance, sexual practices, and politics to name a few. For instance, in a personal health survey someone might, due to perceived social pressure, claim to smoke one cigarette a week when in reality it is one pack per week.

In politics, Social Desirability Bias influences people to change their answers in polls as they do not want to be seen as supporting a socially unacceptable candidate. Often they will respond that they are ‘undecided’, that they support a lesser known candidate (like a third party candidate), or that they support the main rival of their preferred candidate who is assumed to be more socially acceptable. Such support is superficial, however, as claimed support, for social reasons, does not appear to change voting behavior.

In 2016, Trump has been successfully branded by the Democrats and the media in such negative terms that he has become the socially undesirable candidate. People prefer not to be publicly associated with a candidate who is often likened to Hitler and accused of sympathizing with the KKK. This brand does not need to be accurate or real; it just needs to be sufficiently ‘truthy’ for people to repeat it as if it were the truth. The social pressure, real or imagined, that supporters feel not to support their preferred candidate results in a bias against that candidate in the polls, especially polls that use a live interviewer as opposed to an anonymous data collection method.

In contrast to Trump, Clinton has significant support from the media, which, as explained elsewhere, has leaned Democratic for decades. Clinton has many liabilities, but for the most part she is painted by the media in generally positive terms as the first female presidential candidate of a major US party. The juxtaposition by most of the main media outlets of Clinton as breaker of the last great glass ceiling and Trump as the racist sympathizer has created much of the Social Desirability Bias.

Although fully present in the 2016 election, Social Desirability Bias did not find its US origins with Trump and Clinton. In fact, the overall environment in the US has been acting as a catalyst more so than just the candidates. Extremely strong biases have been detected around approval of Congress during the mid-terms in 2014, well before Trump entered national politics. Likewise, Obama’s approval rating also shows strong bias. It is not simply a 2016 US presidential phenomenon that we are discussing but an ongoing issue with which the US must grapple. In this sense, biases surrounding Trump and Clinton are symptoms of larger issues.

As with other cases of measuring Social Desirability Bias, we take the difference between the live interviewer based polls (‘live polls’) and polls that are based on robocalls, IVR, and internet collection (‘anonymous polls’). The basic idea is that if there is a significant difference between these two types of poll results then a bias is likely. If people do not feel any social pressure to respond in a certain way when there is a live interviewer then the polls should be more or less in-line with each other.
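The calculation described above can be sketched in a few lines of Python. This is only an illustration of the live-minus-anonymous comparison: the poll numbers are invented for the example and are not the RealClearPolitics data, and the function names are hypothetical.

```python
from statistics import mean

# Hypothetical poll records: (collection mode, Clinton %, Trump %).
# These figures are made up for illustration only.
polls = [
    ("live", 46.0, 41.0),
    ("live", 45.0, 42.0),
    ("anonymous", 43.0, 42.0),
    ("anonymous", 42.0, 41.0),
]


def average_spread(records, mode):
    """Mean Clinton-minus-Trump spread for polls collected in the given mode."""
    return mean(c - t for m, c, t in records if m == mode)


def desirability_bias(records):
    """Live spread minus anonymous spread.

    A positive value means live polls favor Clinton more than anonymous polls do.
    """
    return average_spread(records, "live") - average_spread(records, "anonymous")


bias = desirability_bias(polls)
print(f"Estimated pro-Clinton bias: {bias:.1f} points")
```

A result near zero would suggest that the two collection methods agree and that little social pressure is at work; a large positive or negative value is the signal of interest.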

Chart 1: Social Desirability Bias of Clinton versus Trump, Live Poll Spread of Clinton Support minus Trump Support – Anonymous Poll Spread of Clinton Support minus Trump Support

Source: RealClearPolitics

This data shows that there is a clear bias in favor of Clinton. In other words, relative to Trump, people consistently favor Clinton more in live polls than they do in anonymous polls. Taking an average of the last two months shows the pro-Clinton bias to be around 3.5 percentage points. So, on average, a live poll showing Clinton up by 3.5 percentage points actually implies a tied race. In a close race, this would amount to an enormous swing. Recall that Obama beat Romney in 2012 by approximately 3.8 percentage points, so the aforementioned bias swing is extremely relevant.

It is also of interest to highlight that most of the primary newsworthy polls are ‘live’ polls such as NBC, CBS, ABC, Fox, Wall Street Journal, Quinnipiac University, CNN, Gallup, PEW Research, Monmouth University, Bloomberg, IBD, McClatchy/Marist, and USA Today. So chances are if you hear a report or see a headline about Clinton leading in the polls, they are likely referring to a biased poll as most polls are in fact still done using live interviewers to collect data.

Another interesting point is to correct for unusual changes in methodology in the polls. Reuters, in rather dramatic fashion, announced a change in their methodology in the last few months. There is a strong argument for taking this data out of the analysis as we really do not fully understand how they changed things. The same analysis without Reuters data shows a slightly higher bias of just over 4 percentage points when taking the average from August and September.

Social Desirability Bias also impacts the undecided vote. In all national elections, there will be some people who want to decide as late as possible. But when there is a consistent and large difference between the percent of undecideds in live and anonymous polls there is a good chance that bias is influencing the responses. In other words, an excess undecided percent likely is explained by many people ‘hiding’ in undecided simply because they want to avoid supporting the socially undesirable candidate.

Chart 2: Difference in Percent Undecided between Live Polls and Anonymous Polls

Source: RealClearPolitics

In the 2016 election, using head-to-head data, the percent of undecideds is consistently much higher in anonymous than live polls. The implication is that when there is no social pressure, people are much more inclined to say they are undecided.

Note the significant size of the difference in undecideds between live and anonymous polls. Over the last two months, the difference averages 4.7 percentage points. In other words, it is about a percentage point higher than the pro-Clinton bias observed previously.

Given the size, the increase in undecideds in anonymous polls could dramatically impact the outcome of the race – if the undecideds break disproportionately for one candidate. A pro-rata division based on polling levels would not impact the race. These undecideds need to move more towards one candidate to have a real impact.
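To make the difference between a pro-rata split and a lopsided break concrete, here is a small Python sketch. The starting percentages and the 65% share are hypothetical values chosen for illustration, not figures from the polls, and `allocate_undecided` is a made-up helper name.

```python
def allocate_undecided(clinton, trump, undecided, trump_share=None):
    """Return (clinton, trump) after distributing the undecided percentage.

    trump_share is the fraction of undecideds going to Trump. When it is
    None, undecideds are split pro rata, in proportion to current support.
    """
    if trump_share is None:
        trump_share = trump / (clinton + trump)
    return (
        clinton + undecided * (1 - trump_share),
        trump + undecided * trump_share,
    )


# Hypothetical starting point: Clinton 45, Trump 43, 12 undecided.
pro_rata = allocate_undecided(45.0, 43.0, 12.0)
lopsided = allocate_undecided(45.0, 43.0, 12.0, trump_share=0.65)

print("Pro rata: Clinton %.1f, Trump %.1f" % pro_rata)
print("65%% to Trump: Clinton %.1f, Trump %.1f" % lopsided)
```

Under the pro-rata split the leader stays the leader and the ratio between the candidates is unchanged. With these assumed inputs, the lead only flips when the undecideds break disproportionately toward the trailing candidate.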

There is little hard evidence that implies it will be anything but a pro-rata division. However, by using an understanding of Social Desirability Bias and historical precedent, we can infer that Trump should receive a higher number of votes from these undecideds.

A similar race occurred in the US in 1980 which included Carter (Democratic incumbent), Reagan (Republican), and Anderson (Independent). Carter as the incumbent was very well known and had a significant advantage in the race. Reagan was also extremely well known, but mostly as a ‘B-movie’ actor and for his national cigarette commercials and not for being Governor of California. Anderson, who had actually run for the Republican nomination but lost to Reagan, was running as an independent.

Polls showed a fairly high level of undecideds. In an election with an incumbent, such a high level is not usual. People will normally know if they like the sitting president and if that president should get reelected. The election of 1980 however produced one of the highest undecided levels on record.

Chart 3: Average Percent Undecided from August / September

Source: Wikipedia, Gallup

Note: This data is of the average undecideds from August / September. Some of the historical data was presented this way, so for comparison purposes all data was changed to this standard. For 2016, the data comes from anonymous polls as in this election they seem to provide more accurate data. Live polls show a lower level of undecideds, as explained in another post.

We can assume that many of the undecideds were supporting another candidate but for some reason either had not fully made up their mind or were constrained somehow from admitting it. We can more or less disregard the idea that many of these undecideds would break for Carter. The nation had just had a four-year test run with him as president and, since he was extremely well known nationally as a politician, voters should have known a few months before election-day whether they would support him or not.

Reagan, on the other hand, had become the butt of many jokes. Having starred in “Bedtime for Bonzo”, in which he shared the lead with a chimpanzee, Reagan was not taken seriously as a politician by many. In the 1980 election he was clearly the socially undesirable candidate and many hid their preference for Reagan by declaring undecided in polls.

As you likely noticed in the previous chart, the 2016 undecided level is extremely high, in fact the highest on record. Such an extraordinary undecided level implies a strong bias in the election.

Using 1980 as an analogy, Clinton appears to be Carter in this race. She is extremely well known nationally as a politician and has been for decades. One of her top selling points is that many within her party declare that she is the ‘most well qualified candidate to have ever run for president’. For this metric, that is a strike against her: it confirms that she is very experienced as a politician and as a government servant, so an unusually high undecided level at this stage of the race does not bode well for her. Put another way, so many voters should not be undecided right before voting is about to begin when such an experienced individual is in the race.

Continuing, Trump is Reagan. Reagan’s ‘B-movie’ and Trump’s ‘Reality-TV star’ credentials are very similar in that they are used not only against the candidates but against any supporter of them. These credentials are used to argue that the candidates are not serious politicians, which in turn is used to question the judgment and level-headedness of anyone supporting them. This goes beyond a policy disagreement over which people can argue. This is an attack on supporters’ perception of reality and decision-making capabilities. Both candidates were attacked on other soft topics as well (soft in the sense that they are difficult to actually prove, disprove or debate, unlike a policy choice). Reagan was a ‘war-monger’ who would certainly start ‘WW III’ and Trump is a ‘conman’ who will ‘destroy the country’. Reagan and Trump in their elections clearly had some major Social Desirability Bias going against them.

Continuing, all third party candidates are Anderson. Anderson ran for the Republican nomination against Reagan and dropped out fairly early. From the start, he was not that strong of a candidate but during the summer received upwards of 24% in Gallup polls versus 33% for Reagan. Judging from these poll numbers, Anderson looked extremely strong. In reality, however, Reagan had been so successfully branded as a socially undesirable candidate that many supporters were either hiding in undecided or with Anderson in the polls.

The analogy seems to be playing out in 2016 with some eerie similarities. Clinton has been shown to be the more socially acceptable main candidate, as she is receiving significantly higher live poll results than anonymous poll results due to an assumed positive bias. Also, the level of undecideds is very high for an election whose front-runner is such a well-known and experienced candidate. Third party candidates also appear to have very strong positive biases (as analyzed in another post), which help them to poll at levels far in excess of their performances in previous elections and in excess of what would normally be expected.

The results of the 1980 election show that these biases fell apart on election-day.

Table 1: October / November 1980 US Presidential Polls versus Actual Results

	Ronald Reagan (R) %	Jimmy Carter (D) %	John B. Anderson (I) %
October / November Poll Averages	42%	44%	9%
Actual Results	51%	41%	7%
Poll Error (Actual minus Poll, percentage points)	+9	-3	-2

Source: Wikipedia, Gallup

In a normal election, polls should be fairly accurate. But in an environment where one of the main candidates has been successfully branded as socially undesirable, polls can be significantly off. The previous table shows that the discrepancy between late-polls and actual results was impressive. Reagan as the socially undesirable candidate gained the most from undecideds and actually pulled from the other candidates on election-day. In fact, it looks like almost all of the undecideds went to Reagan, a truly amazing outcome.
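The arithmetic behind Table 1 can be reproduced with a short Python snippet. The figures are copied from the table; the only added quantity is `unallocated`, a name introduced here for the share of poll respondents not assigned to any of the three candidates (undecided or other).

```python
# Figures copied from Table 1 (percent of vote).
poll_average = {"Reagan": 42, "Carter": 44, "Anderson": 9}
actual = {"Reagan": 51, "Carter": 41, "Anderson": 7}

# Poll error is the actual result minus the late poll average.
poll_error = {name: actual[name] - poll_average[name] for name in poll_average}

# Share of late-poll respondents not assigned to any of the three candidates.
unallocated = 100 - sum(poll_average.values())

for name, err in poll_error.items():
    print(f"{name}: {err:+d} points")
print(f"Unallocated in late polls: {unallocated} points")
```

Reagan’s gain is larger than the unallocated share in the late polls, which is consistent with the reading that he picked up the undecideds and also drew support away from Carter and Anderson.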

Though not fully covered in this post, it seems rather clear that 2016 could produce similar results. In addition to live polls showing an unusually high bias for Clinton and third party candidates and to an unusually high undecided level, we have social media and on-line activity measures pointing to Trump as being the preferred candidate. Assuming that people act in a less filtered way when on-line than when answering polls and surveys, we can infer even greater negative social bias against Trump in the polls. If this is the case, then the actual results of the 2016 election will shock most.

Another important point to highlight is the demographics of bias. Earlier in the election cycle, the bias against Trump and in favor of Clinton was exceptionally high, breaking 10 percentage points in numerous months. At that time, bias was more universal. Most recently, however, the pro-Clinton bias appears to have mostly settled in two main demographic groups – women and African-Americans.

There is a very strong element of social pressure for women to support Clinton. On numerous occasions very high profile women, many of whom have been recognized as leaders for female empowerment, have attempted to shame women into supporting Clinton. There is an argument that, if all things were equal, it would be best to help the country break prior stereotypes and limitations of specific groups. However, some women might prefer to vote in a gender-neutral fashion, and shaming might simply push supporters of Trump into superficial public support for Clinton. By comparing live and anonymous polls for the female demographic, we have found there to be a 6 percentage point pro-Clinton bias in live polls. This means that in anonymous polls support for Clinton is significantly lower. If you believe that the anonymous polls more accurately reflect how people will vote in anonymous polling booths, then this pro-Clinton bias should evaporate.

The African-American community has been the most loyal demographic to Democrats by far. In fact, nothing comes close in the history of the US, at least for as long as there are records. In the most recent elections, over 90% of this community voted Democrat. It smashes all other demographic-based figures. There must be, within this community that votes so regularly pro-Democrat, a very high level of social pressure for this to continue. Also, more recently, Obama came out and said to the African-American community that he would see it as a personal insult to not vote for Clinton. This is a very direct way of shaming this community for support. Again, the social pressure must be intense to publicly support Clinton. According to the live versus anonymous polls metric, African-Americans have a 10 percentage point positive bias for Clinton. In other words, the bias is stronger within this community than within the female demographic.

Although we do not have access to the complete datasets for every poll that we use to determine these averages, it seems that most of the pro-Clinton bias can be explained by the biases in the female and African-American demographics. Furthermore, it seems that an anti-Trump bias in the white male demographic has shrunk to a significant degree.

Now, let’s take a look at how the live polls and anonymous polls have trended regarding the 2016 election, using head-to-head data.

Chart 4: Comparing Average Monthly Clinton minus Trump Spread from Live Polls and Anonymous Polls

Source: RealClearPolitics

Live polls have shown a significant lead for Clinton over time; for much of the period the spread has averaged over 4 percentage points. In other words, it looked like Clinton would beat Trump by a margin equal to or greater than Obama’s over Romney. This likely led many to talk of a landslide victory.

However, looking at the anonymous poll spread, this election looks extremely tight. In fact, it has been within the margin of error and within the Obama – Romney spread for most of the last six months. Journalists and analysts calling for a Clinton landslide simply appear to have not understood the data.

By further incorporating bias analysis done in other areas and by correcting for some apparent mistakes in polls, it seems like Trump is already up by a considerable degree. Some examples include:

Pro Third Party Bias – as explained in other posts, third parties in this election are receiving a positive Social Desirability Bias. Much of this superficial support will likely go to the most socially undesirable candidate, Trump, on election-day,
Unusually High Undecideds – as covered in this post and others, a very high undecided level late in the election cycle is a symptom of Social Desirability Bias. When these votes eventually get allocated on election-day, a disproportionately low percent will go to the most well-known politician, Clinton, and a disproportionately high percent of them should go to the most socially undesirable candidate, Trump,
Social Media and On-Line Activity – these show how individuals spend their time in unfiltered environments, which during an emotional election likely indicates voting intention better than skewed polls. Indicators based on such activity show that Trump significantly outperforms his poll levels, making him the only candidate to do so and implying that the level of bias could be significantly higher than is measured by taking the difference between live and anonymous polls,
Poll Re-Weighting – as discussed in many other posts, current polls, both live and anonymous, use non-transparent methods to re-weight their raw data in order to calculate final results. This re-weighting process normally includes assumptions based on the previous or last few elections. In 2016 however things like turnout will likely drastically change in favor of Trump which makes almost all polls inherently skewed in favor of Clinton.

Looking exclusively at anonymous poll data from September 2016 shows that this election appears too close to call. Once you start to incorporate other factors such as those listed above, Trump looks to have a considerable lead. Although many might view Social Desirability Bias against a candidate as a disadvantage, it might turn out to be an advantage for a number of reasons – not least of which is that Democrats convinced of a landslide (covered in a different post) due to pro-Clinton biased polls will be much less likely to actually turn out and vote.

Putting together the entire picture of Social Desirability Bias goes beyond this post. To fully understand what is going on you should look at how bias impacted the approval rating of Congress after the 2014 mid-terms, how bias around the third party candidates should impact the 2016 election, how social media and on-line activity should be used to help confirm and measure bias, and many other related topics. As a summary, this post serves to outline the main points and highlight the fact that Trump shaming has worked but in an undesired way, namely that superficial support for Clinton has increased but that support translating into votes is highly dubious at best.