Data: It's Everywhere!

Data is all around us, and it affects our everyday lives in ways we often take for granted. Google and Facebook analyze the content of our e-mails, searches, and posts, and then they use that data to target ads relevant to our interests. Amazon and Netflix track our online behavior, compare it to the behavior of other users, and recommend products and movies that suit our tastes—often with uncanny accuracy.

In an increasingly data-driven world, data influences all aspects of society—from our careers to our roles as citizens to our private lives. To thrive in this environment, you must be able to work with data, draw well-reasoned inferences from it, and effectively communicate your discoveries to broader audiences. 

Law

Litigation and legal studies increasingly rely on data and statistics. Decisions about discrimination claims, products liability, trademark dilution, forensic identification, antitrust litigation, economic damages, and even jury selection are often determined by data.

Business

  • Morgan Stanley and other companies use big data to inform investments and make economic forecasts.
  • Most equity trading employs data algorithms that interpret signals from a variety of sources to gauge risk. 
  • Businesses and entrepreneurs use government Census Data to identify new markets.
  • Marketing firms use customer surveys, analyze correlations between advertising outlays and revenue increases to guide decisions, and use random sampling to estimate market sizes.
  • Union Pacific Railroad uses thermometers, microphones, and ultrasound to collect performance data on engines and identify equipment at risk of failure before repair costs become prohibitive.
  • Ford's hybrid cars generate and store about 25 GB of data per hour, which enables Ford to better understand driving behavior and wear and tear, reduce accidents, and lower maintenance costs.

Academia

  • Literature scholars borrow techniques from natural language processing, sentiment analysis, signal processing, and machine learning to extract and compare the plot structures of novels and track how archetypes evolve from the 19th to the 20th century.
  • Historians are combining Geographic Information Systems (GIS) data with traditional historic sources to examine the growth of railroads and their impact on the American West.
  • Musicians, linguists, and cognitive scientists use computational modeling to understand how infants learn to distinguish words from all of the other sounds in their environment. 
  • Stanford’s City Nature project uses spatial analysis and text mining of planning documents to examine why natural areas are unevenly distributed in urban environments.
  • Economists are mapping variations in medical diagnoses and treatments for people in different parts of the country using data from programs such as Medicare and Medicaid.
  • With a host of emerging areas of study like connectomics, genomics, regulomics, and metabolomics, neuroscience is generating huge data sets that require knowing not only how to collect the data but also how to sort through and analyze it.
  • Psychologists are learning to harness data from smartphones and wearable sensors to collect information on users, such as physical activity, social interactions, and travel patterns. Because this information is collected invisibly and automatically (unlike traditional surveys, which are susceptible to self-reporting errors), the resulting data is more accurate.
  • Computational sociology uses computer simulations, artificial intelligence, statistical methods, and social network analysis to model and analyze human social behavior in organizations, cities, and social networks and to understand how this behavior affects society at large.

Health

  • Doctors rely on statistics to gauge the effectiveness of drugs and calculate life expectancy and chances of recovery.
  • Epidemiologists conduct statistical analyses on the spread and risk of diseases.
  • The Centers for Disease Control partnered with Google in 2008 after researchers found that spikes in Google searches for flu symptoms coincided with actual outbreaks. This partnership led to the launch of Google Flu Trends, a site that allows people to compare volumes of flu-related search activity against reported incidence rates on a map of their area.
  • Hospitals analyze patient records to predict which patients are likely to seek re-admission within a few months of discharge. Identifying these patients allows doctors to provide better long-term care, decreasing both hospital and patient costs due to re-admission.
  • Medical records are also used to identify side effects of prescription drugs and to calculate life expectancy or the probability of recovery after diagnosis of terminal diseases or severe accidents.

Politics

Campaigns collect data on each voter—they know your party affiliation, how frequently you vote, whether you’ve made political contributions, and how often you volunteer, among other factors. They also know what TV channels you watch, what magazines you read, and what activities you engage in, so they know which voters to target, what issues are important to them, and where to place advertisements.

Government

  • The Department of Education’s National Center for Education Statistics collects data on enrollment rates, test scores, graduation rates, student financial aid, and students and teachers to identify areas in need of more support, funding, and attention.
  • NASA’s Center for Climate Simulation is home to 32 petabytes of climate data. This data is used to track climate change, improve weather predictions, and increase awareness of severe weather.
  • Law enforcement agencies collect and analyze data on past crime (locations, frequency, level of violence, etc.), weather, big events, and gang influence to predict where criminal activity is more likely to occur and where to send more patrols to prevent it.