Introduction to Data Science: Data Analysis and Prediction Algorithms with R

This textbook introduces concepts and skills that can help you tackle real-world data analysis challenges. It covers concepts from probability, statistical inference, linear regression, and machine learning. It also helps you develop skills such as R programming, data wrangling with dplyr, data visualization with ggplot2, algorithm building with caret, file organization with the UNIX/Linux shell, version control with Git and GitHub, and reproducible document preparation with knitr and R Markdown. The book is divided into six parts: R, Data Visualization, Data Wrangling, Statistics with R, Machine Learning, and Productivity Tools. Each part has several chapters, each meant to be presented as one lecture, and the book includes dozens of exercises distributed across the chapters.

This book started out as the class notes used in the HarvardX Data Science Series.

You can buy a hard copy of the book here.

A free PDF copy is available from Leanpub.

A free online version of the book is available here.

This book was written with bookdown and can be regenerated from scratch. All the R Markdown files needed to do this are available on GitHub.

Data Analysis for the Life Sciences

This book covers several of the statistical concepts and data analysis skills needed to succeed in data-driven life science research. We go from relatively basic concepts, such as computing p-values, to advanced topics related to analyzing high-throughput data. While most statistics textbooks focus on mathematics, this book focuses on using a computer to perform data analysis. Instead of explaining the mathematics and theory and then showing examples, we start by stating a practical data-related challenge. The book also includes the computer code that provides a solution to the problem and helps illustrate the concepts behind the solution. By running the code yourself and seeing data generation and analysis happen live, you will develop a better intuition for the concepts, the mathematics, and the theory.

This book started out as the class notes used in the Data Analysis for the Life Sciences HarvardX Series.

A free PDF copy is available from Leanpub.

The book was created using R Markdown, and all of its code is available on GitHub.