Speaker Series

QuanTM hosts a number of themed speaker series. Unless otherwise specified, these events take place on Wednesdays from 12:00 to 1:30 pm in Room 201 of the Modern Languages Building. Interested in attending a lecture? Please register for the event using the form included at the bottom of each series page.

Research Design is Not Enough (2016-17)

2016-17 Annual Theme: Research Design is Not Enough
The Role of Theory in Causal Inference

Social science seeks to understand the causes of important social phenomena. Understanding requires both theoretical explanation and empirical evidence bearing on causal claims. Statistical research on the problem of causal inference, beginning in the last quarter of the 20th century, has provided a powerful foundation for both experimental research and strong observational designs meant to uncover causation. These advances have permeated all of the social sciences in multiple ways. In addition, contemporary developments in computing have permitted scholars to carry out empirical research on a massive scale, opening up possibilities for causal research in contexts, on subjects, and at scales that were impossible until a very short time ago. As exciting and useful as these empirical developments have been, it is important not to overlook the crucial role that social science theories play in the interpretation of results and assessment of policy implications. Our annual theme considers the role of theory development and application in empirical research in the social sciences intended to support causal claims. The theme will be explored through interdisciplinary workshops, conferences, and a variety of lectures.

Erik Snowberg - February 1, 2017

from Caltech, Division of Humanities and Social Sciences

Time/Location: 12:00pm - 1:30 pm/Modern Languages Building, RM201

Maggie Penn - March 29, 2017

from the University of Chicago, Department of Political Science

Time/Location: 12:00pm - 1:30 pm/Modern Languages Building, RM201

Charles Manski - April 12, 2017

from Northwestern University, Department of Economics

Time/Location: 12:00pm - 1:30 pm/Modern Languages Building, RM201

Visiting Fellows Speaker Series

Alberto Purpura

Visiting Faculty Fellow, Fall 2016
University of Padua, Italy

Thursday, December 8th 12:00 - 1:30pm in Modern Languages Building, RM 201

Before Computer Scientists Make Us Obsolete… Let’s Take Advantage of Them

The talk illustrates some of the cutting-edge tools that data miners and computational linguists have been perfecting over the last decade or so. We will start by showing how basic principles of machine learning built in PC-ACE (Program for Computer-Assisted Coding of Events) make manual approaches to text such as Content Analysis or Quantitative Narrative Analysis (QNA) more efficient and more reliable than in CAQDAS programs (Computer-Assisted Qualitative Data Analysis Software, such as Atlas.ti, NVivo, MaxQda). We will show how narrative data in PC-ACE can be visualized in dynamic network graphs and dynamic GIS maps. For illustrative purposes we will rely on a corpus of a thousand newspaper articles on lynching events that occurred in Georgia between 1875 and 1935. The main focus of the talk will be on Natural Language Processing (NLP) tools and what they can do for us: Part-of-Speech (POS) tagging, Named Entity Recognition (NER), Dependency Parsing, and Sentiment analysis in Stanford CoreNLP, topic modelling in Mallet, sentence complexity in Computerized Linguistic Analysis System (CLAS), Key-Word in Context searches (KWIC). We will use these computational tools to compare the short story Dry September, a fictional story of lynching by Nobel laureate William Faulkner, to the thousand newspaper articles on real Georgia lynchings.
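As a rough illustration of one of the tools named above, a Key-Word in Context (KWIC) search can be sketched in a few lines of Python. This is a minimal sketch only: the function name `kwic`, the `window` parameter, and the simple regular-expression tokenizer are assumptions made for the example, and it does not reflect how PC-ACE or any other tool mentioned in the abstract implements the search.

```python
import re


def kwic(text, keyword, window=3):
    """Return (left context, match, right context) for each keyword hit.

    Hypothetical helper: tokenizes on word characters and matches the
    keyword case-insensitively.
    """
    tokens = re.findall(r"\w+", text)
    hits = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            # Up to `window` tokens on each side of the match.
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, token, right))
    return hits


if __name__ == "__main__":
    sentence = "The quick brown fox jumps over the lazy dog"
    for left, match, right in kwic(sentence, "the", window=2):
        print(f"{left:>12} [{match}] {right}")
```

Each hit keeps the matched word together with a fixed number of neighboring words, which is what lets a reader scan how a term is used across a corpus.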

Weihua An

Visiting Faculty Fellow, Summer 2016
Departments of Sociology and Statistics, Indiana University

Network Dynamics of Network Interventions

Researchers have long recognized the power of social networks and are fascinated by the idea of utilizing network information to design more effective interventions. In previous social network-based interventions, networks are kept (or assumed to be) constant. In this study, I examine the effects of a smoking prevention intervention on friendship networks. The results show that, compared to random intervention (which targeted random students), smokers’ popularity decreased significantly more in social network-based interventions (i.e., those that targeted central students or students together with their friends). The results are surprising in that even weak interventions that may not be able to change attitudes or behaviors can have notable, unintended effects on the structures of the underlying networks. The results are profound in that the structural changes can lead to long-term impact on future attitudes and behaviors. Overall, the findings highlight the importance of treating networks (not just attitudes and behaviors) as outcomes in evaluating health and social interventions. They also indicate that social networks cannot be treated as fixed transmitters that simply passively spread diffusion. Instead, social networks fluidly and actively respond to external interventions, even if the interventions are not meant to alter the networks. The reason is that human beings constantly utilize social selection as a tool to exert, resist, and manage social influence.

Related paper: Multilevel meta network analysis with application to studying network dynamics of network interventions

Michael Rubin

Visiting Faculty Fellow, Spring 2016
Department of Political Science, Columbia University

Rebel Territorial Control and Civilian Agency in Civil War

Where do rebel organizations successfully control territory during insurgency? Under what conditions does community-level collective action influence rebel territorial control? Existing theories of rebel control have emphasized geography, natural resources and identity- or ideology-based affinity within the population, with mixed empirical support. This paper emphasizes non-combatants' political role in conflict processes: it argues that community collective action capacity, the ease with which communities facilitate collective action to pursue common interests, influences the distribution of territorial control during civil war. Communities with high collective action capacity deter rebels by raising the costs of controlling territory. Under certain conditions, collective action capacity also increases the expected benefits to rebels associated with controlling territory, in particular where rebels seek population-dependent resources such as intelligence regarding counterinsurgent strategy, food/supplies, population concealment, or political legitimacy. I test the theory within a single case: the communist insurgency in the Philippines. I fit a linear multilevel model, regressing Armed Forces of the Philippines measures of village-level rebel influence on collective action capacity, measured by summarizing village family network structure using data from a 2008-2010 Poverty Census. The results suggest that the social structure in conflict-affected communities predicts the level of rebel influence, consistent with the theory.

Elliott Sober

Visiting Faculty Fellow, Spring 2016

Department of Philosophy, University of Wisconsin

Ockham's Razor - When Is the Simpler Theory Better?

Tuesday March 15, 2016 @ 4:00 pm  in PAIS Room 290

Many scientists believe that the search for simple theories is not optional; rather, it is a requirement of the scientific enterprise. When theories get too complex, scientists reach for Ockham’s razor, the principle of parsimony, to do the trimming. This principle says that a theory that postulates fewer entities, processes, or causes is better than a theory that postulates more, so long as the simpler theory is compatible with what we observe. Ockham’s razor presents a puzzle. It is obvious that simple theories may be beautiful and easy to remember and understand. The hard problem is to explain why the fact that one theory is simpler than another tells you anything about the way the world is. In my lecture, I’ll describe two solutions.

Does Ockham's Razor Solve the Mind/Body Problem?

Thursday March 17, 2016 @ 4:15 pm in White Hall 101

Philosophy Colloquium at Emory - for more details contact the Philosophy Department at 404-727-6577

Darwin’s Phylogenetic Reasoning

Tuesday March 22, 2016 @ 9:00 am in Rollins Research Center Room 1052

Refreshments will be served at 8:45 am

Parsimony and Chimpanzee Mind-reading

Friday March 25, 2016 @ 4:00 pm in the Psychology Building Room 280

Are chimpanzees mind-readers? That is, besides forming beliefs about the physical objects in their environment, do they also form beliefs about the mental states of other chimpanzees? Psychologists have tried to answer this question by using Ockham's razor. In fact, two sorts of parsimony have been invoked -- phylogenetic parsimony and blackbox parsimony. Although it is generally conceded that phylogenetic parsimony is on the side of the mind-reading hypothesis, it is controversial what conclusion can be drawn from that fact. With respect to blackbox parsimony, some psychologists have argued that this consideration counts in favor of the mind-reading hypothesis while others contend that parsimony counts against the hypothesis. In my talk, I'll try to clarify both sorts of parsimony arguments.

Ermal Shpuza

Visiting Faculty Fellow, Fall 2014

Quantifying the Social Logic of Buildings and Cities

Department of Architecture, Southern Polytechnic State University

The built environment is the largest and most complex physical human artifact. Relational patterns of connections and separations in the built space underlie important aspects of human behavior and illuminate the spatial logic of society. Buildings and cities are described according to topological properties in contrast to geometrical representations that have traditionally informed architectural and urban theories. This research is aimed at unfolding important aspects of the social logic of built space by means of quantitative analysis of buildings and cities based on graph-theoretic methods. Spatial complexes are studied as networks of connections among elementary units of rooms, circulation spaces and streets, which support structured conditions of encounter, co-presence, co-awareness, and movement. Several themes of inquiry are discussed, supported by the morphological analysis of various scales of built space including: the evolution of Adriatic and Ionian coastal cities, complex dynamics of arterial roads during city growth, shape description based on human perception of space, interaction between boundary shape and circulation structure in buildings and cities, and Balkan vernacular houses. The research spans cross-disciplinary links from architecture and urbanism to complexity science, graph theory, morphology, morphometry, physiography, historical cartography, urban history, organizational management, and environmental studies.

Jason Fletcher

Visiting Faculty Fellow, Summer 2013

Understanding Heterogeneous Effects of Health Policies Using a Gene-Environment Interaction Framework

Yale School of Public Health (Currently at University of Wisconsin-Madison)

This talk outlines a research agenda that combines genetic and social science concepts, methodologies, and data to pursue new insights in understanding the impacts of environments and policies on human behavior.  The primary example of this agenda will focus on new results suggesting a gene-environment interaction in responses to tobacco control policies.  Based on variation in a nicotinic receptor gene, I find that some individuals have no behavioral response to tobacco taxation, suggesting a novel gene-policy interaction.  This finding has implications for how we understand the mechanisms of specific health policies and how we might consider introducing new policies to further reduce tobacco consumption.  I also show preliminary evidence of additional gene-environment (policy) interactions with alcohol policies and experiences of economic downturns. 

Cohosted by the Department of Economics

Missed the seminar? Watch Dr. Fletcher's presentation HERE

Coen P.H. Elemans 

Visiting Faculty Fellow, Spring 2013

Singing in the Fast Lane: the Neuromechanics of Sound Production in Vocal Vertebrates

Institute of Biology, University of Southern Denmark

Sound is the fastest, most accurate, and information-rich modality for communication in all vertebrates, with human language at the pinnacle of complexity. Just like human infants, songbirds learn their song through imitation learning, mimicking their parents. Songbirds have therefore become an important model system to understand the neural processes and pathologies underlying human speech production and language acquisition.

My research aims at unraveling the question “How are neural signals translated into sound?”, operating at the border of neuroscience and biomechanics. As such, neuromechanics integrates both experimental and computational approaches from physics, molecular biology, physiology and neuroscience.

We find that sound production systems are pushed to the extremes: tissues violently collide at 100,000 times/sec and extreme performing superfast muscles contract up to 250 times/sec. While focused on songbirds, I use a comparative approach to find unifying principles of motor control and discover new model systems across the vocal vertebrates, from birds to fish, from mice to whales.

Cohosted by the Department of Biology

Quantitative Biology & Theoretical Biophysics (2015-16)

Alessandro Treves

Cognitive Neuroscience, SISSA, Italy

March 23, 2016: The Hippocampus: from Memory into Space and Back

In this seminar, the speaker will contrast the spatial and memory narratives that have dominated these last few decades of hippocampal research, leading to the somersault caused by the discovery of grid cells. Besides yielding a Nobel Prize for Edvard and May-Britt Moser, grid cells have paradoxically refocused attention on the dentate gyrus, as one of two key innovations introduced in the mammalian nervous system some 250 million years ago.

Daniel Fisher

Department of Applied Physics, Stanford University

April 6, 2016: Evolutionary Population Dynamics

Ken Miller

from Department of Neuroscience, Columbia University

The stabilized supralinear network: A simple "balanced network" mechanism explaining nonlinear cortical integration

Talk held March 2, 2016: Across multiple sensory cortical areas, strong nonlinearities are seen in the summation of responses to multiple stimuli. Responses to two stimuli in a neuron's receptive field (the sensory region in which appropriate stimuli can drive spike responses) typically sum sublinearly, with the response to the two stimuli presented simultaneously typically closer to the average than the sum of the responses to the two individual stimuli. However, when stimuli are weak, responses sum more linearly. Similarly, contextual stimuli, outside the receptive field, can suppress responses to strong stimuli in the receptive field, but more weakly suppress or facilitate responses to weaker receptive field stimuli. I'll present a simple circuit mechanism that explains these and many other results. Individual neurons have supralinear input/output functions, leading the gain of neuronal responses to increase with response level. This drives a transition from (i) a weak-input regime in which neurons are weakly coupled, responses sum linearly or supralinearly, and contextual stimuli can facilitate, to (ii) a stronger-input regime in which neurons are strongly coupled and stabilized by inhibition against excitatory instability, responses sum sublinearly, and contextual stimuli suppress. I'll describe this mechanism and show how it can explain a variety of cortical behaviors, including those described above as well as suppression of correlated neural variability by stimuli and other behaviors as time permits.

Eleni Katifori

from the Department of Physics & Astronomy, University of Pennsylvania

Emerging hierarchies in biological distribution networks

Talk held February 3, 2016.  Biological transport webs, such as the blood circulatory system in the brain and other animal organs, or the slime mold Physarum polycephalum, are frequently dominated by dense sets of nested cycles. The architecture of these networks, as defined by the topology and edge weights, determines how efficiently the networks perform their function. In this talk we present some general models regarding the emergence and extraction of hierarchical nestedness in biological transport networks. In particular, we discuss how a hierarchically organized vascular system is optimal under conditions of variable, time-dependent flow, but also how it emerges naturally from a set of simple local feedback rules. To characterize the topology of these weighted cycle-rich network architectures, we develop an algorithmic framework that analyzes how the cycles are nested. Finally, using this algorithmic framework and an extensive dataset of more than 180 leaves and leaflets, we show how the hierarchical organization of the nested architecture is in fact a distinct phenotypic trait, akin to a fingerprint, that characterizes the vascular systems of plants and can be used to assist species identification from leaf fragments.

Stephanie Palmer

from the Department of Organismal Biology and Anatomy, University of Chicago

The learnability of critical distributions

Talk held January 27th 2016.  Many biological systems, including some neural population codes, have been shown empirically to sit near a critical point. While many detailed discussions about the origins of these phenomena have been had in recent years, less is known about the utility of such behavior for the biological system. Here we demonstrate a potentially useful feature of such codes. We construct networks of interacting binary neurons with random, sparse interactions (i.e. an Erdos-Renyi graph) of uniform strength. We then characterize the discriminability of those interactions from samples by performing a direct coupling analysis and thresholding the direct information between each pair of neurons to predict the presence or absence of an interaction. By sweeping through threshold values, we compute the area under the ROC curve as a measure of discriminability of the interactions. We show that this resulting discriminability is maximized when the original distribution is at its critical point. This behavior may be useful for efficient communication between brain areas.
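The threshold sweep described in the abstract can be illustrated with a short Python sketch. It is not the speaker's code: the function name `roc_auc_by_threshold_sweep` and its inputs are assumptions for the example, where `scores` stands in for a per-pair quantity such as direct information and `labels` marks whether an interaction is truly present.

```python
def roc_auc_by_threshold_sweep(scores, labels):
    """Area under the ROC curve, computed by sweeping a decision threshold.

    Hypothetical helper: a pair is predicted to interact when its score is
    at or above the current threshold.
    """
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")

    # Start above every score (nothing predicted present), then lower the
    # threshold one distinct score value at a time.
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
        points.append((fp / n_neg, tp / n_pos))

    # Trapezoidal rule over consecutive (false positive rate, true positive rate) points.
    auc = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        auc += (x1 - x0) * (y0 + y1) / 2.0
    return auc


if __name__ == "__main__":
    # Toy values only, not data from the talk.
    print(roc_auc_by_threshold_sweep([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

An area of 1.0 means some threshold separates true interactions from non-interactions perfectly, while 0.5 corresponds to chance-level discrimination.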

Aleksandra Walczak

from the Physics Department at Ecole Normale Supérieure, Paris

Diversity of Immune Receptor Repertoires

Talk held December 2, 2015: Recognition of pathogens relies on the diversity of immune receptor proteins. Recent experiments that sequence the entire immune cell repertoires provide a new opportunity for quantitative insight into naturally occurring diversity and how it is generated. I will describe how we can use statistical inference to quantify the origins of diversity in these sequences and characterize selection in the somatic evolutionary process that leads to the observed receptor diversity. A well-adapted repertoire should be tuned to the pathogenic environment to reduce the cost of infections. I will finish by discussing the form of the optimal repertoire that minimizes the cost of infections contracted from a given distribution of pathogens.

Dmitri Chklovskii

Group Leader for Neuroscience, Simons Foundation

Similarity Matching: A New Principle of Neural Computation

Talk held October 21, 2015. Inspired by experimental neuroscience results, we developed a family of online algorithms that reduce dimensionality, cluster, and discover features in streaming data. The novelty of our approach is in starting with similarity matching objective functions used offline in Multidimensional Scaling and Symmetric Nonnegative Matrix Factorization. During this seminar, I will discuss how we derived online distributed algorithms that can be implemented by biological neural networks resembling brain circuits. I will also cover how such algorithms may be used for Big Data applications.

Elena Koslover

from the Biochemistry Department at Stanford University 

Emergent Physical Phenomena: from Biomolecules to Living Cells 

Talk held September 16, 2015: The internal microenvironment of a cell comprises an intricate choreography of molecules that must be transported from one location to another, elastic forces that must be overcome or harnessed into useful work, and molecular interactions whose rates must be carefully controlled. Using multi-scale models grounded in statistical physics and continuum mechanics, we study how collective physical phenomena arise from biomolecular constituents and how they impact biological function.

The mechanical properties of DNA as a semiflexible "worm-like" chain play a critical role in the packaging and accessibility of the genome. The elasticity of this chain contributes to interactions between DNA-binding proteins and impacts the formation of chromatin fibers that serve as the lowest level of genome organization. The wide range of length scales relevant to the behavior of biopolymers such as DNA necessitates the development of efficient and accurate coarse-graining methods. A novel technique for systematically mapping polymer models to effective semielastic chains that are both analytically tractable and suitable for large-scale simulation will be discussed.

At the cellular level, the complex mechanics of the cytoplasm emerges from a combination of active forces and heterogeneous material conditions. The cytoplasm of motile cells provides a uniquely dynamic intracellular environment for studying the interplay of these effects. Using newly developed techniques for analyzing microrheological data in moving systems, we demonstrate that the cytoplasm of neutrophils behaves as a viscous fluid whose flows dominate intracellular particle motion. The shape dynamics of these cells further motivates the study of mixing and reactions inside fluctuating confined fluid domains.

From individual DNA-protein interactions to dynamic whole-cell deformations, this talk will highlight the importance of large-scale physical phenomena in the structure and function of living cells.

Quantitative Approaches in Climate Change Series (2015-16)

Ian Howat

School of Earth Sciences, Ohio State University

April 15, 2016 - Are the Ice Sheets Collapsing?

Human civilization has developed during a period of relatively stable sea level, preceded by rapid pulses of rising oceans during deglaciation. Will our warming atmosphere and oceans bring a return to the catastrophic conditions of the early Holocene, or worse? The concept that ice sheets can change substantially on timescales of centuries or less is new, and the past decade has brought radical changes in our understanding of their dynamics and how they react to changes at their air and ocean boundaries. Yet, our understanding is far from complete and predictions are still highly uncertain: current estimates of sea level rise for this century range from decimeters to over a meter. Starting from a basic global energy and mass balance perspective, we will review the mechanisms driving ice sheet response to climate and assess the potential for rapid, near-future sea level rise.

Christian Schoof

Department of Earth, Ocean, and Atmospheric Sciences, University of British Columbia

Talk took place March 3, 2016 titled: Models for melt drainage in ice sheet dynamics

This lecture began with a brief introduction to basic ice sheet dynamics, focusing on a handful of processes that play a dominant role in determining steady-state configurations of a simplified ice sheet and their stability. Recent developments in modelling subglacial drainage were then discussed, including how our current understanding of water drainage along glacier beds is based on ideas that were mostly developed in the 1970s and 1980s, and how we have only recently succeeded in drawing these ideas together in spatially extended two-dimensional models. The talk reviewed the basic physics involved and showed how this dictates the rich dynamical structure of the model, where we have a channelizing instability that differs from that for hillslope stream formation, and the possibility of oscillatory behaviour in the form of subglacial outburst floods. Much of the talk was motivated by the application of drainage models to seasonal variations in ice flow in Greenland, where time-varying water supply causes summer-time speedups and slow-downs in ice flow by changing the lubrication of the glacier bed.

Susan Solomon

Ellen Swallow Richards Professor of Atmospheric Chemistry & Climate Science, MIT

Talk took place February 24, 2016 titled: A Tale for Our Times: Something For Everyone About Climate Change & Getting Past Climate Gridlock

This talk will include key aspects of (i) the science of climate change, (ii) why international agreement on climate change policy has proven particularly difficult, and (iii) what the Paris agreement on climate change is achieving and could achieve in the future. Manmade greenhouse gases are slowly forcing the climate system to change. Carbon dioxide emissions from fossil fuel burning are the dominant cause of climate change. Some of today's carbon emissions will still affect the atmosphere in a thousand years and beyond, leading to a very long 'commitment' to future climate change. Increases in carbon dioxide arise from a mix of different countries, both developed and developing, with different current emissions, infrastructure capabilities, and past commitments, and these human factors shape global policy discussions. Comparisons will be briefly drawn between the success of policy on ozone depletion (Montreal Protocol) and the prospects for success of the Paris agreement, adopted by nearly 200 countries in December, 2015.

Daniel Rothman

Department of Earth, Atmospheric & Planetary Sciences, MIT

Talk took place February 11, 2016 titled: Earth System Stability Through Geologic Time

The five great mass extinctions of the last 500 million years are each associated with significant perturbations of Earth's carbon cycle. But there are also many such environmental events not associated with mass extinction. What makes them different? We show that natural perturbations of the carbon cycle exhibit a critical rate of change resulting from a transient balance between the photosynthetic uptake and respiratory return of CO2. The critical rate is also the fastest rate at which the resulting excess CO2 can be produced in a sustained steady state. We identify the critical rate with marginal stability, and find that four of the five great mass extinctions occur on the fast, unstable side of the stability boundary. Moreover, many severe yet relatively benign events occur close to the boundary. These results suggest that major environmental change is characterized by common mechanisms of Earth-system instability. The most rapid instabilities result in mass extinction.

David Archer

Department of Geophysical Sciences, University of Chicago

Talk took place January 28, 2016 titled: Near Miss: The importance of the preanthropogenic atmospheric CO2 concentration on human historical evolution

When fossil fuel energy was discovered, the timing and intensity of the resulting climate impacts depended on what the natural CO2 concentration in the atmosphere was at that time. The natural CO2 concentration is thought to be controlled by complex, slow-acting natural feedback mechanisms, and could easily have been different than it turned out to be. If the natural concentration had been a factor of two or more lower, the climate impacts of fossil fuel CO2 release would have occurred about 50 or more years sooner, making it much more challenging for the developing human society to scientifically understand the phenomenon of anthropogenic climate change in time to prevent it.


Data Visualization (2015-16)

2015-16 Annual Theme: Data Visualization

Data visualization is the graphic presentation of information which supports the exploration, examination, and communication of complex data (Few 2009). In short, it is a method by which scientists and researchers can transform data and evidence into explanations (Tufte 2006). Beyond serving as a means of effective and efficient communication, data visualization affords researchers and consumers alike the opportunity to process large quantities of data and develop a deeper understanding of the world in which we live. The unprecedented quantity and quality of data now available has created a renewed interest in and demand for data visualization techniques (Yau 2011). Recent advances in data visualization have made possible the analysis of information that previously may have been too complex to reveal substantively important patterns and relationships. The applications and development of data visualization techniques, which span nearly every field from the humanities to the hard sciences, constitute an important and vibrant area of research highly relevant across academic disciplines and professions.

Katy Börner

From the Department of Information and Library Science, Indiana University

Data Visualizations: Drawing Actionable Insights from Data

Talk held February 4, 2016.  In an age of information overload, the ability to make sense of vast amounts of data and to render insightful visualizations is as important as the ability to read and write. This talk explains and exemplifies the power of visualizations not only to help locate us in physical space but also to help us understand the extent and structure of our collective knowledge, to identify bursts of activity, pathways of ideas, and borders that beg to be crossed. It introduces a theoretical visualization framework meant to empower anyone to systematically render data into insights together with tools that support temporal, geospatial, topical, and network analyses and visualizations. Materials from the Information Visualization MOOC and maps from the Places & Spaces: Mapping Science exhibit will be used to illustrate key concepts and to inspire participants to visualize their very own data.

Ben Schmidt

from the History Department at Northeastern University

Historical data visualization and presenting rich data archives

Talk held November 11, 2015. In the contemporary humanities, datasets are not just evidence but archives, demanding reinterpretation; visualization provides one of the richest and most widespread ways of facilitating this. This talk will describe the reception and remarkable misrepresentations of the most influential single data visualization in the historical profession, the US Census's maps of the frontier line from the late 19th century; and then describe an agenda of web-based data visualizations using D3 geared towards exploratory analysis that can allow freer exploration of data archives as evidence. These platforms--for exploring census data, historical shipping routes, and text collections with metadata--embody an approach towards humanities data visualization not simply as presenting single views, but as creating weak domain-specific languages for sharing data archives with scholars and a wider public.

John Stasko

from the School of Interactive Computing at Georgia Institute of Technology

The Value of Visualization for Exploring and Understanding Data

Talk held October 1, 2015.  Investigators have an ever-growing suite of tools available for analyzing and understanding data. While techniques such as statistical analysis, machine learning, and data mining all have value, visualization provides an additional unique set of beneficial capabilities. In this talk I identify the particular advantages that visualization brings to data analysis beyond other techniques, and I describe the situations in which it can be most beneficial. Additionally, I identify three key tenets for success in data visualization: understanding purpose, embracing interaction, and identifying value. To help support these arguments, I will draw upon and illustrate a number of current research projects from my lab. One particular system demonstrates how visualization can facilitate exploration and knowledge acquisition from a collection of thousands of narrative text documents.

Polo Chau

from the College of Computing at Georgia Institute of Technology

Catching Bad Guys with Visualization and Data Mining

Talk held October 14, 2015.  Big data has redefined crime. We now see new breeds of crime where technologically savvy criminals cover their tracks with the large amount of data generated, and obfuscate law enforcement with multiple fake virtual identities. I will describe major data mining and visualization projects from my group that combat malicious behaviors by untangling sophisticated schemes crafted by criminals.

  1. The Polonium malware detection technology, which unearths malware from 37 billion machine-file relationships. Deployed by Symantec, Polonium protects 120 million machines worldwide. Our next generation Aesop technology pushes the detection rate to over 99%.
  2. The NetProbe system detects auction fraud on eBay and fingers bad guys by identifying their networks of suspicious transactions.
  3. Mixed-initiative graph sensemaking, such as the Apolo system and the MAGE system, which combine machine inference and visualization to guide the user in interactively exploring large graphs.

Speaker Bio: Duen Horng (Polo) Chau is an Assistant Professor at Georgia Tech’s School of Computational Science and Engineering, and an Associate Director of the MS Analytics program. Polo holds a PhD in Machine Learning and a Masters in human-computer interaction (HCI). His PhD thesis won Carnegie Mellon’s Computer Science Dissertation Award, Honorable Mention. 

Polo’s research lab bridges data mining and HCI to solve large-scale, real-world problems. They develop scalable, interactive, and interpretable tools for big data analytics.  Their patented Polonium malware detection technology protects 120 million people worldwide. Their auction fraud detection research was widely covered by media. Their fake review detection research received the Best Student Paper Award at the 2014 SIAM Data Mining Conference.

Polo has received faculty awards from Google, Yahoo, and LexisNexis, as well as the Raytheon Faculty Fellowship, the Edenfield Faculty Fellowship, and the Outstanding Junior Faculty Award. He is the only two-time Symantec fellow and an award-winning designer.

Collective Computation in Biological Communication, Neural Dynamics, and Behavior (2014-15)

Jessica Flack

November 19th, 2014:  From the Center for Complexity and Collective Computation in the Wisconsin Institute for Discovery, UW Madison, and from the Santa Fe Institute

Life’s Information Hierarchy

We have proposed that biological systems are information hierarchies organized into multiple functional space and time scales. This multi-scale structure results from the collective effects of components estimating regularities in their environments by coarse-graining or compressing time series data and using these perceived regularities to tune strategies.  As coarse-grained (slow) variables become better predictors for components than microscopic behavior (which fluctuates), and component estimates of these variables converge, new levels of organization consolidate, giving the appearance of downward causation. This intrinsic subjectivity suggests that the fundamental macroscopic properties in biology will be informational in character. If this view is correct, a natural approach is to treat the micro to macro mapping as a collective computation performed by system components in their search for configurations that reduce environmental uncertainty.  I will discuss how we can move towards a thermodynamics of biology by studying this process inductively. This includes strategy extraction from data, construction of stochastic circuits that map micro to macro, dimension reduction techniques to simplify the circuits and move towards an algorithmic theory for the macroscopic output, and macroscopic tuning and control.

Sara A. Solla

December 2nd, 2014: From the Department of Physiology and Department of Physics and Astronomy, Northwestern University

Statistical Inference on Networks of Spiking Neurons

Coupling large numbers of relatively simple elements often results in networks with complex  computational abilities. Examples abound in biological systems - from genetic to neural networks, from metabolic networks to immune systems, from networks of proteins to networks of economic and social agents.  Recent and continuing increases in the experimental ability to simultaneously track the dynamics of many constituent elements within these networks present a challenge to theorists: to provide conceptual frameworks and develop mathematical and numerical tools for the analysis of such vast data. The subject poses great challenges, as the systems of interest are noisy and the available information is incomplete. 

For the specific case of neural activity, Generalized Linear Models provide a useful framework for a systematic description. The formulation of these models is based on the exponential family of probability distributions; the Bernoulli and Poisson distributions are relevant to the case of stochastic spiking. In this approach, the time-dependent activity of each individual neuron is modeled in terms of  experimentally accessible correlates: preceding patterns of activity of this neuron and other monitored neurons in the network, inputs provided through various sensory modalities or by other brain areas, and outputs such as muscle activity or motor responses. Model parameters are fit to maximize the likelihood of the  observed firing statistics; smoothness and sparseness constraints can be incorporated via regularization techniques. When applied to neural data, this modeling approach provides a powerful tool for mapping the spatiotemporal receptive fields of individual neurons, characterizing network connectivity through pairwise interactions, and monitoring synaptic plasticity.

Tatyana Sharpee

January 14th, 2015: From the Computational Neurobiology Laboratory, Salk Institute for Biological Studies

Maximally Informative Behaviors Implemented by Simple Neural Circuits

In this talk I will show that the foraging patterns of a small nematode, C. elegans, can be accurately described by theories of maximally informative search strategies. Further, it is possible to design environmental conditions for C. elegans where worm foraging patterns follow maximally informative search strategies that are in direct contrast to chemotaxis predictions. In order to perform a maximally informative search, animals technically need to maintain a full mental map for the likelihood distribution of food throughout the environment. However, my colleagues and I find that this search can be approximated well (under the conditions of our experiments) with a simple drift-diffusion model. The corresponding neural implementation within the C. elegans neural circuits will be discussed.

Gonzalo de Polavieja

February 18th, 2015: From the Champalimaud Neuroscience Programme, Champalimaud Foundation

Decision Making in Groups

Missed the talk? Watch it here.

I will talk about a theoretical approach to collective decisions that works well across species. I will also take the opportunity to present idTracker (www.idtracker.es), software that analyses video to identify each individual in a group. This identification is used for tracking without propagation of mistakes, thus obtaining large amounts of high quality data. I will end my talk with applications of our models to understand aggregation in adverse conditions, how humans make estimations in groups, and how we can improve them.

Mala Murthy

April 1st, 2015: From the Princeton Neuroscience Institute, Department of Molecular Biology, Princeton University

Neural Computations Underlying Acoustic Communication in Drosophila

This talk addresses the goal of our research: to discover fundamental principles about sensory perception, sensorimotor integration, and the generation of behavior. To make these discoveries, we focus primarily on the acoustic communication system of Drosophila. Similar to other animals, flies produce and process patterned sounds during their mating ritual: males generate songs via wing vibration, while females arbitrate mating decisions. I will discuss how our studies, using quantitative behavior, in vivo electrophysiology, computational modeling, and genetic tools, address the neural mechanisms underlying both the production and perception of dynamic courtship songs in Drosophila.

Learning Analytics Series (2014-15)

Charles Dziuban

October 15th, 2014:  From the Center for Distributed Learning, University of Central Florida

Teaching and Learning in an Evolving Educational Environment

Missed the talk? Watch it here. View the talk slides here.

Chuck will discuss twenty years of research on the impact of online and blended learning at the University of Central Florida. He will show the degree to which students succeed in various course modalities as well as their preference for instructional formats. In addition, he will document 15 years of research on understanding the student voice in contemporary education and how it impacts models of excellent teaching. Finally, he will argue that the scholarship of teaching and learning plays a critical role in understanding engagement, connection and transformation in new learning cultures.

Charles Dziuban is Director of the Research Initiative for Teaching Effectiveness at the University of Central Florida (UCF), where he has been a faculty member since 1970 teaching research design and statistics. He received his Ph.D. from the University of Wisconsin. Since 1996, he has directed the impact evaluation of UCF’s distributed learning initiative examining student and faculty outcomes as well as gauging the impact of online, blended and lecture capture courses on the university. His methods for determining psychometric adequacy have been featured in both the SPSS and the SAS packages. He has received funding from several government and industrial agencies including the Ford Foundation, Centers for Disease Control, National Science Foundation and the Alfred P. Sloan Foundation. Chuck has co-authored, co-edited, or contributed to numerous books and chapters on blended and online learning including Handbook of Blended Learning Environments, Educating the Net Generation, and Blended Learning: Research Perspectives. He has given invited presentations on how modern technologies impact learning at more than 80 colleges and universities worldwide. His new book Blended Learning Research Perspectives II, co-edited with Anthony Picciano and Charles Graham, was released in the fall of 2013.

Alyssa Wise

November 17th, 2014: From the Department of Education, Simon Fraser University 

Advancing University Teaching and Learning Analytics: Linking Pedagogical Intent and Student Activity through Data-Based Reflection

Missed the talk? Watch it here. View the talk slides here.

Learning analytics are data traces of student activity that can be used to better understand and support learning processes and outcomes. Over the last few years there have been remarkable advances in our ability to calculate and display useful information about what students are doing. Now, we face the important challenge of how to mobilize this intelligence to have a meaningful impact on university teaching and learning. To do so, we need to consider and design for the ways in which learning analytics can become a part of (and change) the activity patterns of instructors and students. Working within the scope of the university course, I describe ways to integrate learning analytics into teaching and learning processes by using data-informed reflection to probe the connections (and disconnects) between instructors’ and designers’ pedagogical intents and students’ actual activity patterns. Particular attention will be paid to roles for students in the process, and the use of different reference frames for data interpretation. To ground the discussion, work from the E-Listening Project at Simon Fraser University will be presented as an initial example of a learning analytics application developed and implemented in a university course using such an integrated approach.

Ryan Baker

February 9th, 2015:  From Teachers College, Columbia University

Towards Long-Term and Actionable Prediction of Student Outcomes Using Automated Detectors of Engagement and Affect

Missed the talk? Watch it here.

In recent years, researchers have been able to model an increasing range of aspects of student interaction with online learning environments, including affect, meta-cognition, robust learning, and engagement. In this talk, I discuss how automated detectors of engagement and learning can be used in prediction of long-term student outcomes, illustrating this with examples of how affect, engagement, and learning during middle school use of educational software can support prediction of student long-term success, including end-of-year learning, decisions about whether to attend college, and even what major a student chooses. These predictive models can in turn support inference about what factors make a specific student at-risk for poorer learning or lower long-term engagement in learning.

Ryan Baker is Associate Professor of Cognitive Studies at Teachers College, Columbia University. He earned his Ph.D. in Human-Computer Interaction from Carnegie Mellon University. Dr. Baker was previously Assistant Professor of Psychology and the Learning Sciences at Worcester Polytechnic Institute, and served as the first technical director of the Pittsburgh Science of Learning Center DataShop, the largest public repository for data on the interaction between learners and educational software. He is currently serving as the founding president of the International Educational Data Mining Society, and as associate editor of the Journal of Educational Data Mining. His research combines educational data mining and quantitative field observation methods to better understand how students respond to educational software, and how these responses impact their learning. He studies these issues within intelligent tutors, simulations, multi-user virtual environments, and educational games.

Timothy McKay

April 8th, 2015:  Arthur F. Thurnau Professor of Physics, University of Michigan

Hail to the Data: What We're Learning from Learning Analytics

Missed the talk?  Watch it here.

At the University of Michigan today, many interactions among teachers and students are mediated by technology. Students use clickers in class, do homework online, write and revise papers and projects in the cloud, and produce video of presentations. This 'digital exhaust' gives us unprecedented opportunities to understand teaching and learning and improve student success. A new field of Learning Analytics is emerging to take advantage of this opportunity. Professor Timothy McKay’s presentation will introduce this topic using examples from a variety of local projects.

Dragan Gašević

April 17th, 2015:  From the School of Education at the University of Edinburgh

Do counts of digital traces count for learning?

Missed the talk?  Watch it here.

The analysis of data collected from user interactions with educational and information technology has attracted much attention as a promising approach for advancing our understanding of the learning process.  This promise motivated the emergence of the new field of learning analytics and mobilized the education sector to embrace the use of data for decision-making. This talk will first introduce the field of learning analytics and touch on lessons learned from some well-known case studies. The talk will then identify critical challenges that require immediate attention in order for learning analytics to make a sustainable impact on research and practice of learning and teaching. The talk will conclude by discussing a set of milestones selected as critical for the maturation of the field of learning analytics. The most important takeaway from the talk will be that learning analytics are about learning and that computational aspects of learning analytics need to be integrated deeply with educational research and practice.

Dragan Gašević is Professor and Chair of Learning Analytics and Informatics in the Schools of Education and Informatics at the University of Edinburgh. As the incoming President (2015-2017) and a co-founder of the Society for Learning Analytics Research (SoLAR), he has had the pleasure to serve as a founding program co-chair of the International Conference on Learning Analytics & Knowledge (LAK) in 2011 and 2012, founding program co-chair of the Learning Analytics Summer Institute (LASI) in 2013 and 2014, and a founding editor of the Journal of Learning Analytics since 2013. A computer scientist by formal education, Dragan considers himself a learning scientist whose research centers on learning analytics, self-regulated and social learning, higher education policy, and data mining. The award-winning work of his team on the LOCO-Analytics software is considered one of the pioneering contributions in the growing area of learning analytics. Recently, he founded ProSolo Technologies Inc., which developed a software solution for tracking, evaluating, and recognizing competencies gained through self-directed learning and social interactions. He is a frequent keynote speaker and a (co-)author of numerous research papers and books.

Quantitative Humanities Series (2014-15)

Walter Scheidel

From the Department of Classics, Stanford University 

October 7th, 2014

Quantitative Models for Ancient Historians

Realistic simulation of historical processes is a final frontier for the study of the past. The ultimate purpose of simulation is to test causal hypotheses regarding the nature of the determinants of observed outcomes. This approach rests on the ability to assess the impact of different variables in an interactive model, an ability that requires concurrent consideration of factors such as geography, ecology (climate, land cover), natural endowments (such as mineral resources), the distribution of population, and the real cost of connectivity in terms of time and price, which is itself a function of geographical, infrastructural and technological conditions as well as institutional constraints. Recent years have witnessed considerable progress in discrete areas of simulation. The most notable examples include increasingly sophisticated raster-based simulation of state formation (PNAS 110, 2013, 16384-9) and geospatial modeling of the patterning of connectivity created by real transfer costs (Orbis 2.0, June 2014). We now also have access to spatial models for population and land use (e.g. HYDE), as well as to a growing number of geo-referenced datasets for various features such as settlements and certain types of deposits. What is still missing is proper integration of all these diverse elements, which is a vital precondition for meaningful multivariate simulation and hypothesis testing. Cooperation among different project teams was established in 2013/14 in order to pursue this goal, both specifically for Roman history and more globally. This paper, which draws on an international collective effort, seeks to illustrate the potential and challenge inherent in this ongoing endeavor by means of case studies (currently in progress) that focus on the properties of economic and political connectivity in the Roman world.

Lauren Klein

From the School of Literature, Media and Communication, Georgia Institute of Technology

April 15th, 2015

Exploratory Thematic Analysis for Historical Newspaper Archives

How do humanities scholars make sense of new or otherwise unfamiliar archives? Is there a role for computational text analysis in the process of sensemaking? In this talk, I propose that topic modeling, when conceived as a process of thematic exploration, can provide a new entry-point into the sensemaking process. I will present new research from the Georgia Tech Digital Humanities Lab on a software tool called TOME: Interactive TOpic Model and MEtadata Visualization, designed to support the exploratory thematic analysis of digitized archival collections. TOME is centered around a set of visualizations intended to facilitate the interpretation of the topic model and its incorporation into extant humanities research practices. In contrast to other topic model browsers, which present the model on its own terms, ours is informed by the process of conducting early-stage humanities research. This talk will thus also demonstrate the conceptual conversions--in terms of both design and process--that interdisciplinary collaboration necessarily entails. In making these conversions explicit, and exploring the implications of their successes and failures, my collaborators and I take up the call, as voiced by Johanna Drucker (2011), to resist the “intellectual Trojan horse” of visualization. We seek to model a new mode of interdisciplinary inquiry, one that brings the methodological emphasis of the digital humanities to bear on the practices of humanities research and computer science alike.

Complex Network Series (2013-14)

Aaron Batista

From the department of Bioengineering, University of Pittsburgh

Neural Constraints on Learning 

Why are some behaviors easier to learn than others? New behaviors must require new patterns of neural activity. Some new neural activity patterns must be easier to generate than others, but what makes the difference? We use the paradigm of a closed-loop brain-computer interface to encourage animals to exhibit new patterns of neural activity. We find that the extent to which a novel brain-to-behavior mapping can be learned is a function of the current state of the network of neurons controlling behavior. This means that the ease or difficulty with which we can learn new behaviors might be determined by interactions among networks of neurons. 

Cohosted by the Graduate Neuroscience Program in the Graduate Division of Biological and Biomedical Sciences

Caroline Buckee

From the department of Epidemiology, Harvard University

Challenges in Modeling Malaria Parasite Infection Dynamics and Evolution for Elimination Planning

Cohosted by the department of Biology

Rachel Kranton

From the department of Economics, Duke University

Strategic Interaction and Networks

This talk presents a paper that brings a general network analysis to a wide class of games, including strategic innovation, public goods, investment, and social interactions. The major interest, and challenge, is seeing how network structure shapes outcomes. We have a striking result. Equilibrium conditions depend on a single number: the lowest eigenvalue of a network matrix. When the graph is sufficiently tight (as measured by this eigenvalue), there is a unique equilibrium. When it is loose, stable equilibria always involve extreme play where some agents take no actions at all. We combine tools from potential games, optimization, and spectral graph theory to solve for all Nash and stable equilibria. This paper is the first to uncover the importance of the lowest eigenvalue to social and economic outcomes, and we relate this measure to different network link patterns.

Cohosted by the department of Economics

Missed the talk? Watch Dr. Kranton's presentation HERE

Melanie Mitchell

From the department of Computer Science, Portland State University

Using Analogy to Discover the Meaning of Images

Enabling computers to understand images remains one of the hardest open problems in artificial intelligence. No machine vision system comes close to matching human ability at identifying the contents of images or visual scenes or at recognizing similarity between different scenes, even though such abilities pervade human cognition. In this talk I will describe research--currently in early stages--on bridging the gap between low-level perception and higher-level image understanding by integrating a cognitive model of pattern recognition and analogy-making with a neural model of the visual cortex.

Cohosted by the Center for Mind, Brain, and Culture (CMBC) and the department of Biology

Peter Mucha

From the department of Mathematics, UNC Chapel Hill

Communities in Networks

Network science is an interdisciplinary endeavor with methods and applications drawn from across the natural, social, and information sciences. A prominent problem in network science is the algorithmic detection of tightly connected groups of nodes known as communities. Community detection has been used successfully in a number of applications, some of which we highlight in this talk. We also discuss the extension of community detection to multilayer networks, a general framework that allows studies of community structures in networks that change over time and/or have multiple types of links. No prior knowledge about community detection in networks will be assumed for this presentation.

Cohosted by the department of Mathematics and Computer Science

Missed the talk? Watch Dr. Mucha's presentation HERE

Amanda Murdie

From the department of Political Science, University of Missouri

Help or Hindrance? The Role of Humanitarian Military Interventions in Human Security NGO Operations

How do humanitarian military interventions influence the work of NGOs?  Previous work has found that the joint presence of military and NGO actors is essential for the fulfillment of the most complex human security tasks after humanitarian disasters, like improvements in government human rights performance and  economic development.  NGOs were better able to fulfill their human security objectives when humanitarian military interventions were present, arguably because military interveners provide logistical support that aids in collaboration between various humanitarian actors, including NGOs, and because military interveners provide security. In this piece, we use both network analysis methods to examine the process through which military interventions improve the ability of NGOs to connect to each other and econometric methods to examine the ways in which interventions influence the violence NGOs face from domestic actors.  Using a dataset of over 2,500 human security organizations involved in states with a history of humanitarian disasters, we find that human security NGOs involved in countries where there is a humanitarian military intervention benefit in terms of their network ties to other NGOs. 

Note: This work/research was funded by the Ewing Marion Kauffman Foundation. The contents of this publication are solely the responsibility of the Grantee.

Cohosted by the department of Political Science

Missed the talk? Watch Dr. Murdie's presentation HERE

Kanaka Rajan

Generation of Sequences Through Reconfiguration of Ongoing Activity in Neural Networks: A Model of Choice-Specific Cortical Dynamics in Virtual Navigation

Complex timing tasks are the basis for experiments identifying the neural correlates of behaviors like memory-based decision making in brain areas like the posterior parietal cortex (PPC). Recently, cellular-resolution imaging of neural activity in PPC during a virtual memory-guided 2-alternative forced choice task [Harvey, Coen & Tank, 2012] showed that individual neurons had transient activation staggered relative to one another in time, forming a sequence spanning the entire duration of the task. Motivated by these results, our goal here is to develop a computational framework that reconciles the emergence of biologically realistic assemblies or trajectories of activity states, with the ability of the same neural population to translate sensory information into long time-scale behaviors.

We build an echo state network to test our hypothesis that during memory-based decision making, sensory cues set up an initial network state that follows the intrinsic dynamics of the brain area to generate activity underlying a behavioral response. We start with a firing rate network which exhibits rich ongoing dynamics correlated over numerous time scales. This network acts as a dynamic reservoir whose modes can be tapped to perform the task through minimal reconfiguration or partial in-network (PIN) training, and not complete rewiring. 1) Only weights carrying the inputs to a subset of neurons are subject to change. 2) There is no external unit that feeds back network output, nor learning of readout weights. 3) The learning rule targets as many units as required for the task, the fraction varying with task demands.

We show that the PINned network performs a timing task involving a sensory cue, its storage in working memory during a delay period, and response to its retrieved trace. We change the fraction of trained units until the duration and shape of the sequential activation pattern in the network is comparable with PPC neurons imaged during the task. Further, we show that like the PPC, the network's activity is specific to cue/outcome pairing, temporally confined to a task epoch, and contains similar levels of extra-sequential noise. Notably, since no sequence-specific wiring diagram is embedded a priori, units remain spatially intermixed, like the PPC where there is no topological organization of active neurons. Finally, we study the properties and functional consequences of the synaptic connectivity matrix which is initially random but acquires non-normal features because of PINning. We are currently exploring whether such partially trained networks can be extended to simulate our ability to generalize across different task conditions.

Cohosted by the Emory Georgia Tech Training Program in Computational Neuroscience (CNTG) and the department of Physics

Alessandro Vespignani

From the department of Physics, Northeastern University 

Modeling and Forecast of Socio-Technical Systems in the Data-Science Age

In recent years the increasing availability of computer power and informatics tools has enabled the gathering of reliable data quantifying the complexity of socio-technical systems. Data-driven computational models have emerged as appropriate tools to tackle the study of contagion and diffusion processes as diverse as epidemic outbreaks, information spreading and Internet packet routing. These models aim at providing a rationale for understanding the emerging tipping points and nonlinear properties that often underpin the most interesting characteristics of socio-technical systems. Here I review some of the recent progress in modeling contagion and epidemic processes that integrates the complex features of heterogeneities of real-world systems.

Cohosted by the department of Biostatistics and Bioinformatics

Quantitative Humanities Series (2013-14)

Peter Bol

From the department of East Asian Languages and Civilizations, Harvard University

Vice Provost for Advances in Learning

Geography, Networks, and Prosopography in China's History

The China Historical GIS (covering 221 BCE-1911 CE) and the China Biographical Database (300,000 figures mainly from the 7th-early 20th century) provide data for new approaches to China's history. Yet the methodologies for compiling, organizing, and analyzing this data require historians to make conceptual leaps from the time-worn and familiar to the profoundly different.

Cosponsored by the Emory Center for Digital Scholarship and the Department of History

Matthew Jockers

From the department of English, University of Nebraska-Lincoln 

Computing the Shape of Stories: A Macroanalysis

Jockers will open his lecture with an argument about the applicability of quantitative methods to literary studies. He'll offer his answer to the "so what" question that is frequently asked by humanists who are unaccustomed to thinking about literature as data on the one hand and quantitative evidence on the other. After sketching the broad outlines of how quantitative data might and should be employed in literary studies, Jockers will move to a "proof of concept" derived from his own recent work charting plot structure in 40,000 narratives. In this section Jockers will discuss how he employed tools and techniques from natural language processing, sentiment analysis, signal processing, and machine learning in order to extract and compare the plot structures of novels in a corpus of texts spanning the two-hundred-year period from 1800 to 2011. He'll explore the six core plot archetypes revealed by the technique and how these shapes change from the 19th to the 20th century. He'll then compare the plot structures of 1,800 contemporary best sellers to the larger corpus in order to suggest that at least one element of market success is related to plot shape.


Cohosted by the Emory Center for Digital Scholarship and the department of History

Ted Underwood 

From the department of English, University of Illinois at Urbana-Champaign

Ted Underwood is Associate Professor of English at the University of Illinois, Urbana-Champaign, and the author of two books on eighteenth- and nineteenth-century literary history, including Why Literary Periods Mattered (Stanford, 2013). He is currently developing models of genre in eighteenth- and nineteenth-century books, supported by a Digital Humanities Start-Up Grant from the NEH and an ACLS Digital Innovation Fellowship. A collaborative essay with Andrew Goldstone, topic-modeling the history of literary scholarship, is forthcoming in New Literary History.

Beyond Tools: The Shared Questions about Interpretation that Link Computer Science to the Humanities 

The phrase "digital humanities" suggests an encounter with digital technology itself -- which might involve departments of computer science only indirectly, as creators of tools. But as collaborations between humanists and computer scientists grow more common, it's becoming clear that these disciplines are working in parallel on shared, surprisingly fundamental questions. For instance, computer scientists want to understand how we learn to generalize about latent categories from limited evidence, which is a good part of what humanists do when we "interpret an archive" or "develop a theory." Instead of treating CS as a source of tools, some humanists are starting to approach the discipline as a theoretical interlocutor, analogous to linguistics or anthropology. What might that conversation look like concretely? I'll flesh out some possibilities, briefly describing collaborative research on literary character with David Bamman (CS, Carnegie Mellon), and reflecting more generally on the humanistic value of model-building. I'll also acknowledge some of the social divisions that make this conversation risky.


Cohosted by the Emory Center for Digital Scholarship and the department of History.

Big Data Series (2012-13)

Steve Cole

From the David Geffen School of Medicine, UCLA

Finding Meaning in Big Genomic Data

This presentation considers how adding constraints from other levels of analysis can help identify islands of biological meaning within ultra-high-dimensional spaces created by genomic big data. Focusing on DNA in action (how genes respond to and influence their molecular ecology) provides a set of common fate and mass action models that focus the analytic search space. Additional constraints imposed by social and ecological systems provide a meta-genomic framework for understanding genomes as a joint product of both their own internal regulatory logic and the constraints and affordances of their environment. Hybridizing abstract models of inter-level constraint with supervised machine learning provides a new approach to informed search through big data.

Cosponsored by the department of Biology

Missed the talk? Watch Dr. Cole's presentation HERE

Mark Dredze

From the Department of Computer Science, Johns Hopkins University

Public Health in Twitter: What's in there?

Twitter and other social media websites contain a wealth of information about populations, and have been used to track sentiment towards products, measure political attitudes, and study social linguistics. In this talk, we investigate the potential for Twitter and social media to impact public health research. Broadly, we explore a range of applications for which social media may hold relevant data, including disease surveillance, public safety, and drug usage patterns. To uncover these trends, we develop new statistical models that can process vast quantities of data and reveal trends and patterns of interest to public health. Our results suggest that social media has broad applicability for public health research.

Cosponsored by the department of Mathematics & Computer Science

Dan Edelstein

From the Department of French and Italian, Stanford University

How to Read a Million Letters

The Humanities have entered the age of Big Data. Does this also mean that quantification will make a comeback? Already some scholars are announcing the return of "cliometrics." In this talk, I examine the place that quantification can play in humanistic projects, particularly those that rely on messy datasets. While I readily grant that quantification is a useful and even necessary tool in digital humanities, I argue that it must still be supplemented by the qualitative, hermeneutic skills that humanists have honed over time.

Cosponsored by the department of History

Missed the talk? Watch Dr. Edelstein's presentation HERE

David Figlio

Institute for Policy Research, Northwestern University

The Effect of Poor Neonatal Health on Cognitive Development: Evidence from a Large New Population of Twins

Several recent studies show that poor neonatal health (proxied by low birth weight) has persistent effects into adulthood by reducing both an individual's level of educational attainment and adult earnings, but little is known about effects before age 18. This paper makes use of a large new population of twins from Florida to study this question. We find that the effects of poor neonatal health on student outcomes are remarkably invariant. The estimates are virtually identical from third grade through tenth grade. They are the same regardless of whether a student attended a "better" school versus a "worse" school, across racial and ethnic groups, and across maternal education levels. However, the effects grow in magnitude between the start of kindergarten and the end of third grade. These results suggest an important potential role for early childhood and early elementary investments in remediating this persistent condition.

Cosponsored by the department of Economics

Eric Green

Director of the National Human Genome Research Institute

Entering the Era of Genomic Medicine: Opportunities and Challenges

The Human Genome Project's generation of a reference human genome sequence was a landmark scientific achievement of historic significance. It also signified a critical transition for the field of genomics, as the new foundation of genomic knowledge started to be used in powerful ways by researchers and clinicians to tackle increasingly complex problems in biomedicine. To exploit the opportunities provided by the human genome sequence and to ensure the productive growth of genomics as one of the most vital biomedical disciplines of the 21st century, the National Human Genome Research Institute (NHGRI) is pursuing a broad vision for genomics research beyond the Human Genome Project. This vision includes using genomic data, technologies, and insights to acquire a deeper understanding of genome function and biology as well as to uncover the genetic basis of human disease. Some of the most profound advances are being catalyzed by revolutionary new DNA sequencing technologies; these methods are producing prodigious amounts of DNA sequence data as part of studies aiming to elucidate the complexities of genome function and to unravel the genetic basis of rare and complex diseases. Together, these developments are ushering in the era of genomic medicine.

Cosponsored by the Cherry L. Emerson Center for Scientific Computation

Missed the talk? Watch Dr. Green's presentation HERE

Gary King

From the Department of Government, Harvard University

How Censorship in China Allows Government Criticism but Silences Collective Expression 

We offer the first large-scale, multiple-source analysis of the outcome of what may be the most extensive effort to selectively censor human expression ever implemented. To do this, we have devised a system to locate, download, and analyze the content of millions of social media posts originating from nearly 1,400 different social media services all over China before the Chinese government is able to find, evaluate, and censor (i.e., remove from the Internet) the large subset they deem objectionable. Using modern computer-assisted text analytic methods that we adapt to and validate in the Chinese language, we compare the substantive content of posts censored to those not censored over time in each of 85 topic areas. Contrary to previous understandings, posts with negative, even vitriolic, criticism of the state, its leaders, and its policies are not more likely to be censored. Instead, we show that the censorship program is aimed at curtailing collective action by silencing comments that represent, reinforce, or spur social mobilization, regardless of content. Censorship is oriented toward attempting to forestall collective activities that are occurring now or may occur in the future -- and, as such, seem to clearly expose government intent.

Cosponsored by the department of Political Science

Learn more about Dr. King - Listen to Dr. King's NPR interview here

Lillian Lee

From the Department of Computer Science, Cornell University


We will discuss two projects exploring the interplay between language and influence as revealed by computational analysis of large data sets. (1) We show that power differentials among group participants are subtly revealed by how much one individual immediately echoes the linguistic style of the person they are responding to. We consider multiple types of power: status differences (which are relatively static), and dependence (a more "situational" relationship). Using a precise probabilistic formulation of the notion of linguistic coordination, we look at two very different settings: discussions among Wikipedians and arguments before the U.S. Supreme Court. (2) What information achieves widespread public awareness? We consider whether, and how, the way in which the information is phrased -- the choice of words and sentence structure -- can affect information's memorability. We introduce an experimental paradigm that seeks to separate contextual from language effects, using movie quotes as our test case. We find that there are significant differences between memorable and non-memorable quotes in several key dimensions, even after controlling for situational and contextual factors.

Cosponsored by the departments of Political Science and Mathematics & Computer Science

Missed the talk? Watch Lillian Lee's presentation HERE

Marcel Salathé

From the Department of Biology, Penn State

The Dynamics of Vaccination Sentiments on a Large Online Social Network

Modifiable health behaviors, a leading cause of illness and death in many countries, are often driven by individual beliefs and sentiments about health and disease. Individual behaviors affecting health outcomes are increasingly modulated by social networks, for example through the associations of like-minded individuals - homophily - or through peer influence effects. Using a statistical approach to measure the individual temporal effects of a large number of variables pertaining to social network statistics, we investigate the spread of a health sentiment towards a new vaccine on Twitter, a large online social network. We find that the effects of neighborhood size and exposure intensity are qualitatively very different depending on the type of sentiment. Generally, we find that larger numbers of opinionated neighbors inhibit the expression of sentiments. We also find that exposure to negative sentiment is contagious - by which we merely mean predictive of future negative sentiment expression - while exposure to positive sentiments is generally not. In fact, exposure to positive sentiments can even predict increased negative sentiment expression. Our results suggest that the effects of peer influence and social contagion on the dynamics of behavioral spread on social networks are strongly content-dependent.

Cosponsored by the department of Biology

Missed the talk? Watch Dr. Salathé's presentation HERE