2012-2013 Events

Annual Theme: Big Data

Conferences & Symposia

Mini-Conference: What Can We Do With Words?
Friday Dec 4, 2015
This mini-conference sought perspectives on big data analysis of text corpora from Emory faculty and internationally recognized scholars from the other side of the 'pond.' The event was sponsored by The Hightower Fund, The Department of Sociology, The Department of Mathematics and Computer Science, Digital Scholarship Commons, The Institute for Quantitative Theory and Methods, The Emory College Language Center, The Graduate School of Liberal Arts, and The Fox Center for Humanistic Inquiry. Event Program.
Elections and Political Order - Quantitative and Formal Approaches

Nov 9-10, 2012
This conference was designed to examine the relationship between elections and force from the perspectives of both Comparative Politics and International Relations. It brought together participants working on interrelated questions such as: When political actors can resort to force in pressing their claims, when and why do they participate in elections? When do elections constrain behavior, such that they lead to political order (in the form of peace or self-enforcing democracy)? Do the answers to these questions vary for different political actors: incumbents, the military, political parties, insurgencies, citizens? The event was made possible through the generous support of the Halle Institute, the Institute for Quantitative Theory and Methods, and the Institute for Developing Nations.

Day 1

Paper: Elections under the Shadow of Force

Speaker: Dr. Adam Przeworski (New York University)

Discussant: Dr. James Fearon (Stanford University)

Abstract ▸

Paper: Circumstances and Reputational Incentives in International Crisis Bargaining

Speaker: Dr. Alexandre Debs (Yale University)

Discussant: Dr. Dan Reiter (Emory University)

Abstract ▸
Speaker: Dr. David Carroll (Director, Democracy Program, Carter Center) No Abstract

Paper: Information and Self-Enforcing Democracy: The Role of International Election Observation

Speaker: Dr. Susan Hyde (Yale University)

Discussant: Dr. John Reuter (University of Rochester)

Abstract ▸

Paper: Third-Party Institutions and the Success of Democracy

Speaker: Dr. Milan Svolik (University of Illinois Urbana-Champaign)

Discussant: Dr. Jeff Staton (Emory University)

Abstract ▸

Paper: Defective Democratization: Prior Regimes and Civil Conflict

Speaker: Dr. Burcu Savun (University of Pittsburgh)

Discussant: Dr. Kyle Beardsley (Emory University)

Abstract ▸

Paper: Autonomous Decisions: Why Do Militant Groups Conduct Simultaneous Electoral and Armed Campaigns, and Why Does the Government Allow It?

Speaker: Dr. Aila Matanock (University of California-Berkeley)

Discussant: Dr. Laia Balcells (Duke University)

Abstract ▸
Day 2

Paper: Pocketbook Protests: Explaining the Worldwide Emergence of Pro-democracy Protests

Speaker: Dr. Dawn Brancati (Washington University)

Discussant: Dr. Tom Remington (Emory University)

Abstract ▸
The Interdisciplinary Forum on Complex Networks

Fall 2012
The Interdisciplinary Forum on Complex Networks (IFCN) had the mission to generate a platform where scientists from different disciplines and units at Emory University and beyond could join together to discuss and share methodological, theoretical, and practical ideas that concern complex networks. Biological, physical, and social networks represent a point of interdisciplinary convergence because 1) their architectures tend to have similar properties, 2) they face similar challenges, such as questions about diffusion and robustness, and 3) they require the same methodological tools. Convened by: Monica Capra and Edmund Waller. Forum Details.


Speaker Series

Annual Theme Series: Big Data

Dan Edelstein, The Department of French and Italian, Stanford University

How to Read a Million Letters
Wednesday Apr 24, 2013
Talk Abstract. The Humanities have entered the age of Big Data. Does this also mean that quantification will make a comeback? Already some scholars are announcing the return of “cliometrics.” In this talk, I examine the place that quantification can play in humanistic projects, particularly those that rely on messy datasets. While I readily grant that quantification is a useful and even necessary tool in digital humanities, I argue that it must still be supplemented by the qualitative, hermeneutic skills that humanists have honed over time. This event was co-sponsored by QTM and the Department of History.

Recording Available ▸
Marcel Salathé, Department of Biology, Penn State

The Dynamics of Vaccination Sentiments on a Large Online Social Network
Wednesday Apr 10, 2013
Talk Abstract. Modifiable health behaviors, a leading cause of illness and death in many countries, are often driven by individual beliefs and sentiments about health and disease. Individual behaviors affecting health outcomes are increasingly modulated by social networks, for example through the associations of like-minded individuals - homophily - or through peer influence effects. Using a statistical approach to measure the individual temporal effects of a large number of variables pertaining to social network statistics, we investigate the spread of a health sentiment towards a new vaccine on Twitter, a large online social network. We find that the effects of neighborhood size and exposure intensity are qualitatively very different depending on the type of sentiment. Generally, we find that larger numbers of opinionated neighbors inhibit the expression of sentiments. We also find that exposure to negative sentiment is contagious - by which we merely mean predictive of future negative sentiment expression - while exposure to positive sentiments is generally not. In fact, exposure to positive sentiments can even predict increased negative sentiment expression. Our results suggest that the effects of peer influence and social contagion on the dynamics of behavioral spread on social networks are strongly content-dependent. This event was co-sponsored by QTM and the Department of Biology.

Recording Available ▸
Eric Green, Director of the National Human Genome Research Institute

Entering the Era of Genomic Medicine: Opportunities and Challenges
Tuesday Mar 19, 2013
Talk Abstract. The Human Genome Project's generation of a reference human genome sequence was a landmark scientific achievement of historic significance. It also signified a critical transition for the field of genomics, as the new foundation of genomic knowledge started to be used in powerful ways by researchers and clinicians to tackle increasingly complex problems in biomedicine. To exploit the opportunities provided by the human genome sequence and to ensure the productive growth of genomics as one of the most vital biomedical disciplines of the 21st century, the National Human Genome Research Institute (NHGRI) is pursuing a broad vision for genomics research beyond the Human Genome Project. This vision includes using genomic data, technologies, and insights to acquire a deeper understanding of genome function and biology as well as to uncover the genetic basis of human disease. Some of the most profound advances are being catalyzed by revolutionary new DNA sequencing technologies; these methods are producing prodigious amounts of DNA sequence data as part of studies aiming to elucidate the complexities of genome function and to unravel the genetic basis of rare and complex diseases. Together, these developments are ushering in the era of genomic medicine. This event was co-sponsored by QTM and the Cherry L. Emerson Center for Scientific Computation.

Recording Available ▸
Josh Angrist, Department of Economics, Massachusetts Institute of Technology

Wanna Get Away? RD Identification Away From the Cutoff
Friday Mar 8, 2013
Talk Abstract. In the canonical regression discontinuity (RD) design for applicants who face an award or admissions cutoff, causal effects are nonparametrically identified for those near the cutoff. The impact of treatment on inframarginal applicants is also of interest, but identification of such effects requires stronger assumptions than are required for identification at the cutoff. This talk explores RD identification away from the cutoff. Our identification strategy exploits the availability of dependent variable predictors other than the running variable. Conditional on these predictors, the running variable is assumed to be ignorable. This identification strategy is illustrated with data on applicants to Boston exam schools. Functional-form-based extrapolation generates unsatisfying results in this context, either noisy or not very robust. By contrast, identification based on RD-specific conditional independence assumptions produces reasonably precise and surprisingly robust estimates of the effects of exam school attendance on inframarginal applicants. These estimates suggest that the causal effects of exam school attendance for 9th grade applicants with running variable values well away from admissions cutoffs differ little from those for applicants with values that put them on the margin of acceptance. An extension to fuzzy designs is shown to identify causal effects for compliers away from the cutoff. This event was co-sponsored by QTM and the Department of Economics.

No Recording
Steve Cole, The David Geffen School of Medicine, UCLA

Finding Meaning in Big Genomic Data
Wednesday Feb 20, 2013
Talk Abstract. This presentation considered how adding constraints from other levels of analysis can help identify islands of biological meaning within ultra-high-dimensional spaces created by genomic big data. Focusing on “DNA in action”—how genes respond to and influence their molecular ecology—provides a set of common fate and mass action models that focus the analytic search space. Additional constraints imposed by social and ecological systems provide a meta-genomic framework for understanding genomes as a joint product of both their own internal regulatory logic and the constraints and affordances of their environment. Hybridizing abstract models of inter-level constraint with supervised machine learning provides a new approach to “informed search” through big data. This event was co-sponsored by QTM and the Department of Biology.

Recording Available ▸
Gary King, Department of Government, Harvard University

How Censorship in China Allows Government Criticism but Silences Collective Expression
Friday Feb 1, 2013
Talk Abstract. We offer the first large scale, multiple source analysis of the outcome of what may be the most extensive effort to selectively censor human expression ever implemented. To do this, we have devised a system to locate, download, and analyze the content of millions of social media posts originating from nearly 1,400 different social media services all over China before the Chinese government is able to find, evaluate, and censor (i.e., remove from the Internet) the large subset they deem objectionable. Using modern computer-assisted text analytic methods that we adapt to and validate in the Chinese language, we compare the substantive content of posts censored to those not censored over time in each of 85 topic areas. Contrary to previous understandings, posts with negative, even vitriolic, criticism of the state, its leaders, and its policies are not more likely to be censored. Instead, we show that the censorship program is aimed at curtailing collective action by silencing comments that represent, reinforce, or spur social mobilization, regardless of content. Censorship is oriented toward attempting to forestall collective activities that are occurring now or may occur in the future - and, as such, seem to clearly expose government intent. This event was co-sponsored by QTM and the Department of Political Science.

No Recording
Mark Dredze, The Department of Computer Science, Johns Hopkins University

Public Health in Twitter: What's in there?
Wednesday Nov 14, 2012
Talk Abstract. Twitter and other social media websites contain a wealth of information about populations, and have been used to track sentiment towards products, measure political attitudes, and study social linguistics. In this talk, we investigated the potential for Twitter and social media to impact public health research. Broadly, we explored a range of applications for which social media may hold relevant data, including disease surveillance, public safety, and drug usage patterns. To uncover these trends, we developed new statistical models that can process vast quantities of data and reveal trends and patterns of interest to public health. Our results suggest that social media has broad applicability for public health research. This event was co-sponsored by QTM and the Department of Mathematics & Computer Science.

Recording Available ▸
David Figlio, Institute for Policy Research, Northwestern University

The Effect of Poor Neonatal Health on Cognitive Development: Evidence from a Large New Population of Twins
Thursday Oct 24, 2012
Talk Abstract. Several recent studies show that poor neonatal health (proxied by low birth weight) has persistent effects into adulthood by reducing both an individual's level of educational attainment as well as adult earnings, but little is known about effects before age 18. This paper makes use of a large new population of twins from Florida to study this question. We find that the effects of poor neonatal health on student outcomes are remarkably invariant. The estimates are virtually identical from third grade through tenth grade. They are the same regardless of whether a student attended a "better" school versus a "worse" school, across racial and ethnic groups, and across maternal education levels. However, the effects grow in magnitude between the start of kindergarten and the end of third grade. These results suggest an important potential role for early childhood and early elementary investments in remediating this persistent condition. This event was co-sponsored by QTM and the Department of Economics.

Recording Available ▸
Lillian Lee, Department of Computer Science, Cornell University

Language as Influence(d): Power and Memorability
Wednesday Sept 26, 2012
Talk Abstract. We discussed two projects exploring the interplay between language and influence as revealed by computational analysis of large data sets. (1) We show that power differentials among group participants are subtly revealed by how much one individual immediately echoes the linguistic style of the person they are responding to. We considered multiple types of power: status differences (which are relatively static), and dependence (a more "situational" relationship). Using a precise probabilistic formulation of the notion of linguistic coordination, we look at two very different settings: discussions among Wikipedians and arguments before the U.S. Supreme Court. (2) What information achieved widespread public awareness? We consider whether, and how, the way in which the information is phrased - the choice of words and sentence structure - can affect information's memorability. We introduce an experimental paradigm that seeks to separate contextual from language effects, using movie quotes as our test case. We find that there are significant differences between memorable and non-memorable quotes in several key dimensions, even after controlling for situational and contextual factors. This event was co-sponsored by QTM, the Department of Mathematics & Computer Science, and the Department of Political Science.

Recording Available ▸


Visiting Fellow Speaker Series

Coen P.H. Elemans, Institute of Biology, University of Southern Denmark

Singing in the Fast Lane: the Neuromechanics of Sound Production in Vocal Vertebrates
Friday Mar 22, 2013
Talk Abstract. Sound is the fastest, most accurate, and information-rich modality for communication in all vertebrates, with human language at the pinnacle of complexity. Just like human infants, songbirds learn their song through imitation learning, mimicking their parents. Songbirds have therefore become an important model system to understand the neural processes and pathologies underlying human speech production and language acquisition. My research aims at unraveling the question of how neural signals are translated into sound, operating at the border of neuroscience and biomechanics. As such, neuromechanics integrates both experimental and computational approaches from physics, molecular biology, physiology and neuroscience. We find that sound production systems are pushed to the extremes: tissues violently collide at 100,000 times/sec and extreme performing superfast muscles contract up to 250 times/sec. While focused on songbirds, I use a comparative approach to find unifying principles of motor control and discover new model systems across the vocal vertebrates, from birds to fish, from mice to whales. This event was co-hosted by the Department of Biology.

Recording Available ▸



Data Visualization

Apr 18 & 23, 2013
Data visualization is an essential tool for discovery and communication of quantitative information, especially as datasets have grown in size and complexity.  This workshop, offered twice in April 2013, provided a set of ideas, techniques, and best practices for creating effective graphical data presentations using R software (www.r-project.org). Led by Robi Ragan.

No Recording
A Primer on Recent Advances in Nonparametric Estimation and Inference

Mar 1, 4, & 6, 2013
This was the second series of nonparametric primer workshops offered by QTM in Spring 2013. In this workshop, we studied a unified framework for nonparametric and semiparametric kernel-based analysis with an emphasis on applied modeling. We focused on kernel-based methods capable of handling the mix of categorical (nominal and ordinal) and continuous datatypes one typically encounters in the course of applied data analysis. Applications were emphasized throughout, and we used R for data analysis (www.r-project.org). Led by Jeffrey Racine.

No Recording
A Primer on Recent Advances in Nonparametric Estimation and Inference

Feb 25 & 27, 2013
In these workshops, we studied a unified framework for nonparametric and semiparametric kernel-based analysis with an emphasis on applied modeling. We focused on kernel-based methods capable of handling the mix of categorical (nominal and ordinal) and continuous datatypes one typically encounters in the course of applied data analysis. Applications were emphasized throughout, and we used R for data analysis (www.r-project.org). Led by Jeffrey Racine.

No Recording
Scraping Data from the Web

Friday Nov 16, 2012
In this workshop, we talked about how to crawl web pages and PDF files from the web, as well as how to extract target information from the crawled raw data using R, Python, and Java. Several examples were given, e.g. data scraping from Twitter and Wikipedia. The workshop began by introducing two crawling methods: crawling based on HTTP requests and crawling via web APIs. After the pages were crawled, we showed several methods of data extraction from the web pages, including table data extraction, regular-expression-based data extraction, and XPath-based data extraction. Several data scraping examples were demonstrated, including the entire pipeline of scraping data - from crawling to analysis - using Twitter and Wikipedia. How to automatically crawl a batch of PDF files from the web and extract the text was also demonstrated. The workshop was sponsored by QTM. Led by Qiaoling Liu and Yu Wang.

Part I ▸

Part II ▸
