Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are by Seth Stephens-Davidowitz

Blending the informed analysis of The Signal and the Noise with the instructive iconoclasm of Think Like a Freak, a fascinating, illuminating, and witty look at what the vast amounts of information now instantly available to us reveal about ourselves and our world, provided we ask the right questions. By the end of an average day in the early twenty-first century, human bei...

Title: Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are
Author:
Rating:
Edition Language: English

Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are Reviews

  • Will Byrnes

    There’s lies, damned lies and then there are statistics. One must wonder. Do the lies get bigger as the datasets grow? Seth Stephens-Davidowitz posits that the availability of vast sums of new data not only allows researchers to make better predictions, but offers them never-before-available tools that can offer insight that direct questioning never could.

    We have seen steps up of this type before. Malcolm Gladwell has made a career of such with three books, one of which I would expect most folks would know. Nate Silver put his data expertise into a book of his own. All these looks at data and how we interpret it rely on the analyst, regardless, pretty much, of the data. While the same might be true of Stephens-Davidowitz’s approach, he focuses on the availability of materials that have not been there in the past. The smarts that must be applied to get the most interesting results can now be applied to new oceans of data. It is more possible than it has ever been to draw inferences and actually test them out.

    The author looks at Google and FB data for evidence of underlying realities. Surveys can sometimes offer inaccurate outcomes when the people being queried do not provide honest answers. But one can look at what people enter into Google to get a sense of possible racism by geographic area.

    Looking for queries on jokes involving the N-Word, for example, turns out to yield a telling portrait of anti-black sentiment, which also correlates with lower black life expectancy (and with pro-Trump vote totals).

    We are treated to looks into a variety of research subjects, from picking the ponies, to seeing what interests/concerns people sexually, looking for patterns of child abuse, selecting the best wine, using the texts of a vast number of books and movie scripts to come up with six simple plot structures.

    I thought the most interesting piece was on the use of associations, and provoking curiosity, rather than relying on overt statements to influence how people feel about a different group of people. Another was on using a data comparison of one’s (anonymous) medical information to others who share many characteristics to improve medical diagnoses.

    There are some areas in which it was not entirely persuasive that the methodology in question was tracking what was claimed. SS-D sees in searches of Pornhub, for example, evidence of what people want in real life. Really? I expect that what people check out online does not necessarily track with what might be of interest in real life. It would be like someone with an interest in mysteries being thought to have homicidal tendencies after searching for a variety of homicide-related titles. Should a writer doing research into a dark subject like child pornography, human trafficking or cannibalism expect the heavy knock of the police on his/her door? Where is the line between an academic or titillation search and one made for planning?

    SS-D makes a point about there being a significant difference between searches that offer projections for groups or areas, and their inapplicability for predicting individual behavior, although that will not necessarily remain the case. In baseball, for example, the explosion of available information may very well be applied to specific players to diagnose and even correct flaws in technique, or recognize patterns that might expose underlying medical issues, or predict their arrival. The Big Data related here is much more macro, looking at group proclivities. It is useful for spotting trends and measuring public sentiment, in more detail than has heretofore been possible.

    And of course there is the impact of dark players. Those with the resources and motivation could manipulate the Big Data produced by Google and Facebook. Such players would not necessarily be limited to Russian cyber-spies and pranksters, but corporate and ideological players as well. There could have been a bit more in here on those concerns.

    The book offers plenty of anecdotal bits that could have been lifted from any of the other data books noted at the top of this review. What one needs, ultimately, is smart, insightful analysis. Having all the data in the world (that means you, NSA) is merely a burden unless there is someone insightful enough to figure out the right questions to ask, and how to ask them.

    SS-D notes several Google services (Trends, Ngrams, Correlate) that might be familiar to folks doing actual research, but which were news to me. It might be useful to check out some of these, maybe even come up with meaningful queries to shed light on pressing, or even completely frivolous, questions.

    Not all problems can be solved, or even examined, by the addition of ever more data. Sometimes, many times, the information that is available is perfectly sufficient to the task, but other factors prevent the joining together of its various pieces to create a meaningful whole. The now classic example is from 9/11, when an absence of coordination between the CIA and FBI resulted in hijackers who could have been foiled succeeding in their mission. Politics and the culture of nations and organizations figure into how data is used.

    So if everybody lies, is Seth Stephens-Davidowitz telling us the truth? I am sure there is a query one could construct that would look at diverse data sources, pull them all together and give us a fuller picture, but for now, we will have to make do with reading his book and articles, checking out his videos, applying the analytical tools already incorporated into our brains, and seeing if there is enough information there with which to come to a well-grounded conclusion. And that’s no lie.

    Review posted – May 5, 2017

    Publication date – May 9, 2017

    =============================

    VIDEOS – SS-D speaking

    - Arts & Ideas at the JCCSF

    - The Julis-Rabinowitz Center for Public Policy and Finance

    A June 2017 cover story by Yudhijit Bhattacharjee has particular relevance to the treatment of actual truth in today's political environment. It is illuminating, if not exactly uplifting.

    July 12, 2017 - Washington Post - one of the very serious applications of big data - by Philip Bump

    July 15, 2017 - One of the ways big data gets compromised is via automated dishonesty - by Tim Wu - Thanks to Henry B for letting us know about the article

  • Jessica

    This book tries too hard to be Freakonomics. The first two parts are full of random examples of interesting but mostly pointless things that can be learned via Google search trends. However, a whole lot of assumptions are made off these bits of data that don't seem to have much basis in factual scientific methods of research. Unprofessional jokes are thrown in randomly. If you need a footnote to explain why a joke was not homophobic, maybe you should have just skipped the joke. And any book of less than 300 pages of text should not need to use the same example three times, especially when it's about how the author can't believe women are concerned about the smell of their vagina.

    The last section of the book explains the limitations big data holds and is really the most grounded section, the rest being almost hagiography. It would have done a lot to work the third section into the examples of the first two sections. It would have balanced out the praise and also would have done much to explain the flaws present in some of the examples included.

    Some cool facts buried in a lot of murky oddness.

    Disclaimer: I was given this book in a Goodreads giveaway.

  • linhtalinhtinh

    A pretty short book with some interesting remarks, but not yet charming enough for me. The author definitely has his quirky and funny moments, when he presents himself, his family, and especially his views more. Yet the book's ideas and findings aren't exactly groundbreaking. Questions like these have been posed before, and the usefulness of big data has been discussed by others (the discussion of sex and gender actually resembles Dataclysm a lot).

    I was looking for something more nuanced: long and rigorous thematic research on human tendencies, with data as an extremely useful tool but not the main focus. Instead, it's more like a collection of observations. Each time Stephens-Davidowitz has an idea, he looks for an answer in the available data, then moves on. The questions are somewhat related to private human behaviors that traditionally we can't observe. The tool seems to be a bit more at the center here, but he doesn't discuss the downsides and ethical implications of big data deeply enough, except for a short section at the end of the book.

    Now, that's totally ok for a casual and light, yet still useful, read. More importantly, we have to consider that this type of research and the topic of big data are still relatively new. It will take decades more to build a literature large enough to draw really meaningful and profound conclusions. The time simply hasn't arrived yet for the book of my taste, but this one, as the author states, will hopefully raise interest among young people and young social scientists, steering them towards potentially fruitful topics and research methodologies. That's why it's a 3 star.

  • Caroline

    I wish I could give this book more than five stars. Anyone who has a sneaking feeling that Americans aren't who they SAY they are will find confirmation here. It's also easy to read, no academic language here.

    I was already riveted by the introduction. His premise is that we all lie to each other, pollsters, and ourselves, but not to that white box where you type internet searches. Both before and after the election everyone went nuts trying to figure out why Trump was doing so much better than polls would indicate, looking for factors that would explain it. There was only one. "[Nate] Silver found that the single factor that best correlated with Donald Trump's support in the Republican primaries was that measure I had discovered four years earlier. Areas that supported Trump in the largest numbers were those that made the most Google searches for 'n-----'." (He uses the real word, which deepens the revulsion you feel at what he's discovered.)

    Despite Obama's two easy election victories and the narrative that we were post-racism, the Google search data tells another story about reactions to those victories.

    Immediately after the San Bernardino shootings, what happened online? A ton of people searched for "kill Muslims."

    And there is a lot more, about sex and child abuse and sexism. Did you know that the most common term used to complete the sentence "Is my son..." is 'gifted' or some variant thereof, and the most common term used to complete the sentence "Is my daughter..." is 'overweight'?

    America is not post-anything except maybe post-good intentions.

    What use does he think this can be? Well, he did have some good suggestions, and none of them are based on finding out who any individual is who's done a search. For example, if searches for "kill Muslims" spike in a certain city, a few extra police could be deployed to watch over the local mosque until the spike subsides. He spends a moment talking about how big data is not meant to be, and should not be, used to try to figure out who specifically is going to commit crimes.

    By the time I was done with this book I was a bit discouraged at who Americans seem to be, but it's better to know. I hope that this kind of study continues, so we can attempt to realistically work with our society instead of pretend it's something it's not.

  • Atila Iamarino

    I hit the bullseye with this read! Seth Stephens-Davidowitz presents an analysis of how people behave, in the same vein as Signal and the Noise and Dataclysm. But while Signal and the Noise talks about trends in data and Dataclysm talks about how people behave on OkCupid, Everybody Lies talks about how people behave in general.

    The author uses a range of data in quite innovative ways, such as Google search trends (where he works), searches on PornHub, Facebook, and other big data sources, to do what he calls "real sociology," or evidence-based sociology. The data he shows on prejudice (searches for prejudiced topics), self-image insecurity, insecurities about one's children, and the like paint a much rawer and uglier picture of society than the one we paint with posts on Facebook and Instagram.

    Other data reveal information that is interesting at the very least: the difference that graduating from Harvard can make (none; the point seems to be who graduates), where to raise your children, how to increase the chances of success on a date... The book is very reminiscent of a newer and, in my opinion, more curious version of the innovative approach of Freakonomics.

    If you are not interested in the revolution that the recording and availability of data is causing in the world, or in the damage that companies and governments can do with the control they have over information, you will at the very least enjoy the book for the curious and morbid facts it draws from the data. Knowing, for example, that the number of men who search for how to perform oral sex well on women is the same as the number who search for how to perform oral sex on themselves says a lot about how people think. A book for all tastes.

  • Emma Deplores Goodreads Censorship

    3.5 stars

    This is an engaging and informative book about the huge amount of data available online and what it tells us about society. I read it alongside another book and found Everybody Lies to be by far the better of the two, presenting a wealth of information in a cohesive fashion and making fewer unfounded assumptions. The author was a data scientist at Google, and draws in large part on the searches people make on the site, along with information from sites including Facebook and Pornhub.

    There’s a lot of interesting stuff in the data, from the rate of racist searches in the rust belt predicting the rise of Donald Trump, to common body anxieties and whether they actually matter to the opposite sex, to an estimate of how many men are gay and whether that varies by geography (it appears not), to rates of self-induced abortions. This is a great book to read if you love unusual factoids, whether on sexual proclivities or how sports fans are made.

    The author also writes in a compelling way about the uses of Big Data itself, and while he waxes evangelical about it (evidently preferring to spend all his time immersed in statistically significant data, he finds novels and biographies too “small and unrepresentative” and therefore uninteresting), there are certainly a lot of possibilities there. In health, for instance, compiling early searches about symptoms with later searches for how to handle a diagnosis can help doctors detect pancreatic cancer at an earlier stage, while epidemics can be tracked through symptom searches. The author is also interested in how applying data can revolutionize a field, discussing at length the data that predicted the success of the racehorse American Pharoah. (By "at length" I mean 9 pages; this is a book that moves through a broad range of topics quickly.)

    Overall, the writing is engaging and the book hangs together well, being informative while mostly resisting the urge to speculate. But the author does make a couple of assumptions worth pointing out. One is that people’s Google searches are made in earnest and for personal reasons. Certainly, you might search for “depression symptoms” out of concern that you or someone you know is depressed. But you also might want to be prepared in advance to identify warning signs, or might have encountered something in the media that sparked your interest, or you might be a student writing a paper on the topic. On the other hand, if you’re intimately familiar with depression already, you’re unlikely to google the symptoms. None of this means the author’s finding a 40% difference in rates of depression symptom searches between Chicago and Hawaii isn’t relevant, but data that’s both over- and under-inclusive serves better as a starting point for research than a definitive conclusion. It's certainly not proof that better geography is twice as effective as antidepressants, as the author suggests.

    The other assumption is that everybody lies: the book insists on it, based largely on the fact that typically rosy social media posts fail to reflect all those unhappy or hateful searches. Selectively sharing information doesn’t necessarily seem to me to be lying, but the author appears invested in proving the book’s title. For instance, he discusses a particular type of tax fraud: in areas where few tax professionals or people eligible for the scheme live, 2% of people who could benefit from this lie tell it, while in areas with high concentrations of both, the rate of cheating is around 30%. The author concludes that “the key isn’t determining who is honest and who is dishonest. It is determining who knows how to cheat and who doesn’t.” This bleak view of the world fails to account for the 70% who don’t cheat even in areas with high levels of knowledge; finding that significant numbers of people cheat if they know how is a far cry from finding that everyone does.

    So, like the author of Dataclysm, Stephens-Davidowitz is probably a better statistician than sociologist. But if you’re interested in Big Data, or in getting a peek at the thoughts and anxieties people ask Google about because they’re not comfortable sharing with others, this is the book I recommend. You’ll certainly get a lot of interesting tidbits from it, along with perhaps new inhibitions about typing things into Google!

  • Lubinka Dimitrova

    I sought out the book after reading an interview with the author, and it was totally worth it. The book is quite enlightening, and to be honest, deeply frightening. Internet data can work miracles for the benefit of humanity, but it can bring to life many unimaginable, Big-Brother-type nightmares (current US presidents not excluded, just sayin...). Still, it's good to know.

  • Lori

    When sociologists ask people if they waste food, people give the only correct answer. It's wrong to waste food.

    When sociologists survey the contents of the same people's garbage, they get a more accurate answer.

    Just imagine how much more information is available trawling through internet searches.

  • Trish

    Maybe everyone does lie. But they don’t lie all the time. Stephens-Davidowitz makes the good point that asking people directly doesn’t always, in fact may not often, yield true answers. People have their own reasons for answering pollsters untruthfully, but it is clear that this is a documented fact. People sometimes lie to pollsters.

    Stephens-Davidowitz was told by mentors and advisors not to consider Google searches worthwhile data, but the more he looked at it, the more he was convinced that Google searches contained the best data for determining what people are concerned about. He has uncovered some interesting trends that are not apparent through direct questioning because people are sometimes ashamed of their fears, feelings, prejudices, and predilections.

    I didn’t really like this book. Part of the reason is that I listened to it, and Stephens-Davidowitz gives charts, graphs, and data points that obviously cannot be represented in the audio version. These usually help me to grasp things easily and maybe bypass pages of material that is not as interesting to me. It wasn’t that his material was hard; it was that I oftentimes did not like what he was talking about. He had a tendency to focus on deviant behavior, e.g., sexual predators, abuse, porn, etc. One might make the argument that these behaviors are important to understand and therefore worth looking at. Possibly. However, if ‘everybody lies,’ one might make the argument that we do not have to look at deviance to find untruthfulness.

    What we discover is that to test Stephens-Davidowitz’s thesis that ‘everybody lies,’ we have to spend quite a lot of time with statistics and creating studies, or as he is wont to do, studying big data. Big data probably irons out discrepancies in the reasons for our Google searches, e.g., that it is not me that is interested in the herpes virus, it is my brother, because in the end it doesn’t matter why we did the search; what matters is that we did the search. Besides, maybe I’m lying about my brother having the virus, but my interest in the topic is not a lie.

    Stephens-Davidowitz has made a career so far out of the study of big data, showing us ways to slice and dice it so that it is useful to our view of the world. Only thing is, I am not as interested in what big data tells us as he is. He’d trained as an economist, and towards the end of the book he hit a couple of areas I did find more interesting, like the notion of regression discontinuity, a term used to describe a statistical tool created to measure the outcomes of people very close to some arbitrary cut-off.** S-D talks about using this tool on federal inmates, discovering criminals treated more harshly committed more crimes upon their release. But S-D also studied students on either side of the admissions cut-off for the prestigious Stuyvesant High School: those who attended Stuyvesant did not have a significant performance difference in later life than students who did not.

    Apparently Stephens-Davidowitz went into data science because of Freakonomics, the bestselling book by Steven D. Levitt. He believes that many of the next generation of scientists in every field will be data scientists. I did finish the audiobook, which bears on another study he took note of in the last pages: apparently few readers finish ‘treatises’ by economists. He believes this is his big contribution to our knowledge base, and there is no doubt his contrariness did highlight ways big data can be used effectively.

    If I may be so bold, I might be able to suggest a reason why many female readers may not be as interested in the material presented, or in Stephens-Davidowitz himself (he was/is apparently looking for a girlfriend). Stay away from the deviant sex stuff, Seth. It may interest you but I can guarantee that fewer women are going to find that appealing or reassuring conversation or reading material.

    An interesting corollary to this economists’ data view is the question of whether the truth matters, which is how I came to pick up this book. Recently on PBS’ The Third Rail with Ozy, Carlos Watson asked whether the truth matters. At first blush the answer seems obvious, and two sides debated this question. One side said of course truth matters…but most of us know one man’s truth to be another man’s lie. The other side said ‘everybody lies.’ It got me to thinking…I do think the two ways of coming to the notion of lying dovetail at some point, and one has to conclude that truth may not matter as much as we think. What matters is what we believe to be true.

    Finally, it appears Stephens-Davidowitz agrees to some degree with Cathy O'Neil, in that he agrees you best not let algorithms run without human tweaking and interference. The best outcomes are delivered when humans apply their particular observations and knowledge and expertise to big data.

    ** S-D describes it this way:

  • Greg

    UPDATE: In summary, the author bounces back and forth between real data/numbers and pure speculation. It's fascinating, really, as that's got to be the entire point: to show us how to tell what's real and what's fiction as we are bombarded by information.

    ORIGINAL REVIEW:

    Yes, "Everybody Lies" including, obviously, the author, because if Seth Stephens-Davidowitz never lied, I'm sure the subtitle would have been "Except Me Within This Book". So, from our data thus far, we know the author lies, and maybe even within this book. The author's first major error comes from a hilarious statement about how gay men like Judy Garland. (One can only suppose the author's sample is his gay uncles and their friends, in which case he should have used Edith Piaf instead, or maybe Bette Midler, as both Bette and Piaf were indeed placed into stardom by gay communities, while Garland was a star for everyone who went to the movies in 1939 and saw her in "Wizard of Oz".) Oh, I'm digressing, sorry.

    I liked some of this book, specifically the parts where real numbers/data are used. For example, in 1950 a survey revealed 20% of a certain sample said they had a library card, but the official count indicated only 13% actually had one. (Why, oh why, would any adult in the USA NOT have a library card? This boggles my mind.) But get this ridiculous utilization of words: "...the overwhelming majority of black Americans think they suffer from prejudice....On the other hand, very few white Americans will admit to being racist." Good grief. EVERYONE has prejudices. That's how we get through this chaotic world, as we are prejudiced in favor of, say, driving instead of flying because we like road trips and like to stop at places we have never visited and meet people we would otherwise never have a chance to speak with (and I'm talking about me, as long airport lines are no fun). But racism is a different issue entirely, as racism has nothing to do with my prejudicial decision to drive.

    A section devoted to "omitted-variable bias" certainly belongs in another science book this year entitled "We Have No Idea". Hilariously, the author concludes with a lot of questions, including this howler: "Where do sexual preferences come from?" The answer is simply one of genetics (in combination with epigenetics, which may turn "on" or "off" these genes), but that's old news. Hence, can we conclude that all economists and statisticians don't read current books and journals regarding genetics? Hardly, but one would assume an editor somewhere would catch this issue.

    In summary: the book works when the author uses real, solid numbers, such as the number of likes on Facebook vs that same person's internet searches. Let's keep in mind Google doesn't release names, but does release the number of loving wives who praise their husbands on Facebook and then the number of wives who google "Is My Husband Gay". Now that I think about it, why such a fixation on gay matters? A better title for this book would have been "Everybody Lies About Their Sex Life", because we know absolutely that is true.
