Git is a free, open-source version control system available for download online. It is used to manage a code base and is especially helpful when multiple people are working on the same project. Git lets each contributor work on new features in an isolated branch and then merge those changes together. Learning to use this tool can be beneficial for anyone who needs to organize code.
Using R to Connect to an SQL Server/MySQL Database
When working with data stored in a SQL Server or MySQL database, you can export the data to a file and import that file into R to run statistical analyses. However, you can also use R to connect directly to the database and skip the export step. This can be beneficial for researchers who have an ongoing project for which data is being updated frequently, because each analysis reads the current contents of the database.
Creating Graphics in R
If you are interested in learning how to create visually appealing figures and other graphics in R for papers, presentations, or just for fun, then learning how to use the packages ggplot2, lattice (which provides the xyplot function), and rCharts can be extremely useful. All three R packages allow the user to easily customize graphics.
Creating Packages in R
Creating a package in R allows a researcher or regular R user to manage and share functions and datasets. A package also saves memory, because its functions and datasets are loaded only when they are needed. Packages can be created and shared privately among colleagues who are working on the same research project and using the same functions for analyses and data management. Packages can also be used on a larger scale as a way to share statistical methodology in R.
Reproducible Research in R
Being able to replicate and reproduce research is central to validating research findings. R facilitates reproducible research through packages designed for literate programming, which create documents that combine data analysis code, the logic behind the methods, and instructions for reproduction. These documents can be submitted with research publications.
Creating S3 and S4 Class Objects in R
As in other object-oriented programming languages, classes serve as templates for creating objects in R. Depending on your needs, you may want to create S3 or S4 class objects. The two systems allow for different levels of customization: S3 classes are simpler and easier to implement, whereas S4 classes are more structured and complex. Learn which class system best suits your project.
Managing Big Data in R
If you are working with a very large dataset, learning how to use big data management tools in R can be extremely beneficial and time-saving. The data.table package in R allows a user to manage a large dataset quickly. The plyr package can be used to split large datasets into subsets, apply functions to those subsets, and then combine the results. It can also be used to quickly compute summary statistics for different groups within a large dataset.
Caret Package for Data Mining
The caret (Classification And REgression Training) package streamlines the process of creating predictive models in R, making it easier to explore large datasets for patterns.
Python / Pandas
Data analysis can be done using the Python programming language and the Pandas package. The Pandas package can be very useful when working with tabular data, time series data, matrix data, and other forms of observational or statistical data sets.
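As a rough illustration of the kinds of data mentioned above, the following minimal sketch builds a small table, summarizes it by group, and reshapes it into a date-indexed time series. The column names (`date`, `site`, `value`) and the numbers are made up for this example and do not come from any real dataset.

```python
import pandas as pd

# Hypothetical measurements: two sites observed on three dates.
records = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2024-01-01", "2024-01-01", "2024-01-02",
             "2024-01-02", "2024-01-03", "2024-01-03"]
        ),
        "site": ["A", "B", "A", "B", "A", "B"],
        "value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    }
)

# Tabular summary: mean value for each site.
site_means = records.groupby("site")["value"].mean()

# Time series view: one row per date, one column per site.
wide = records.pivot(index="date", columns="site", values="value")

# Total across sites for each date.
daily_total = wide.sum(axis=1)

print(site_means)
print(wide)
print(daily_total)
```

The same `groupby` and `pivot` calls apply to data loaded from a file, for example with `pd.read_csv`, so the sketch should carry over to a real project with only the loading step changed.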
- Gephi (Open source software for network visualization and analysis)
- Network Workbench (Toolkit for network analysis, modeling, and visualization)
- Pajek (A free program for large network analysis with access to network datasets)
- Qualtrics (Easy-to-use online survey software that is powerful enough for sophisticated research, process-improvement insights, program feedback, and more)
- Complex Network Resources (Software, online resources, and datasets)
- Network Data (Lists of accessible complex network datasets)
- Using Metadata to find Paul Revere (An introduction to social network analysis using Paul Revere as an example)
- Complexity Digest (Networking resource for researchers studying complex networks; listings of complexity-related publications; and forums for discussion)
- Economics of Networks (Website of Nicholas Economides, professor of economics at the Stern School of Business, NYU)
R is the primary statistical software used in our introductory statistics course QTM100, the Quantitative Sciences major, and the joint Applied Mathematics and Statistics major. R is free software for statistical computing and graphics that runs on Windows, Mac, and a variety of UNIX platforms.
Emory IT provides a downloadable R model that can get you started today, or you can use one of the on-campus computers equipped with R.
- The RSPH Dept of Biostats/Bioinformatics Wiki
- The Brown Bag website
- The Biostats YouTube channel
- Power Points for Short R Course
- Power Point for R linear regression Brown Bag (Automatic download of the Power Point)
- R Twotorials
- R tips
- Collection of R tutorials
- Python Introduction, Resources and FAQs. An index of tutorials for performing various tasks in Python.
- Python for Ecologists. An ecology example of using Pandas to load, summarize, merge, and visualize data. The lessons are broadly applicable to social and physical sciences, although the title suggests a specific discipline.
- Automate the Boring Stuff. A free online book that guides the reader step by step through programming in Python. No experience required!
Complex Network Tutorials
- Network Formation (A list of introductory readings, research sites, software tools, books, and journals maintained by the Department of Economics at Iowa State University)
- New England Complex Systems Institute (Online access to the full, graduate-level textbook Dynamics of Complex Systems)
- Visual Complexity (A resource for data visualization in complex networks)