layout: true

.absolute.top-0.right-1.tr.w-10[
![](https://raw.githubusercontent.com/jmbuhr/dataIntro20/master/images/hex.png)<!-- -->
]

---
name: title
class: left bottom hide-count
background-color: #FBFCFF
Introduction to Data Analysis with R
Lecture 5: The Nature of Randomness
Jannik Buhr
Heidelberg University, WS20/21
2020-11-29
.absolute.bottom-0.right-1.mid-gray[
With Artwork by @allison_horst
]

---
class: inverse center middle

> To understand statistics means understanding the nature of randomness first.

---
class: inverse center middle

<img src="img/statistically-significant.jpg" width="50%" />

---
class: center middle

<img src="slides5_files/figure-html/chess-board-1.png" width="504" />

---
class: center middle animated fadeInUp

<div class="figure">
<img src="img/paste-5E31A2FF.png" alt="Artwork by @allison_horst" width="213" class=.external />
<p class="caption">Artwork by @allison_horst</p>
</div>

---

## Definitions

- **alternative hypothesis** ( `\(H_1\)` ): "I am the better player."
- **null hypothesis** ( `\(H_0\)` ): "This is just luck."

## `\(\rightarrow\)` to R!

---

## Making Decisions

- How likely is a certain event under the assumption of the null hypothesis (chance alone)?
- Decide on a threshold `\(\alpha\)` at which we reject the null hypothesis.
- This threshold is called the **significance threshold**.
- If `\(P[X \ge x] < \alpha\)`, the result is **statistically significant**.
- This probability is called the **p-value**.

---
class: middle

> »A p value is not a measure of how right you are, or how significant the difference is; it’s a measure of how surprised you should be if there is no actual difference between the groups, but you got data suggesting there is. A bigger difference, or one backed up by more data, suggests more surprise and a smaller p value.«
>
> — Alex Reinhart [@reinhartStatisticsDoneWrong2015]

---
class: middle

[I Fooled Millions Into Thinking Chocolate Helps Weight Loss. Here's How.](https://io9.gizmodo.com/i-fooled-millions-into-thinking-chocolate-helps-weight-1707251800) by John Bohannon

---

## Example: Medical Testing

- Sensitivity = power = true positive rate = `\(1-\beta\)`
- Specificity = true negative rate = `\(1-\alpha\)`

Let us assume a test with a sensitivity of 90% and a specificity of 92%.
--

- 1000 people, of whom 10 are truly positive
- 9 test true positive
- 1 is a false negative
- of the 990 truly negative, about 79 test false positive

--

<img src="slides5_files/figure-html/unnamed-chunk-3-1.png" width="90%" />

The probability of truly being positive after a positive test:

`$$\frac{true~positives}{true~positives + false~positives} = \frac{9}{9 + 79} \approx 10\%$$`

Formally, this is described by Bayes's formula:

`$$P(A|B)=\frac{P(B|A) \cdot P(A)}{P(B)}$$`
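
---

## The Same Calculation in R

A minimal sketch of the arithmetic above, assuming the slide's numbers (1000 people, 1% prevalence, sensitivity 90%, specificity 92%):

```r
# Assumed numbers from the example above
n           <- 1000
prevalence  <- 0.01  # 10 of 1000 people are truly positive
sensitivity <- 0.90  # true positive rate
specificity <- 0.92  # true negative rate

true_positives  <- n * prevalence * sensitivity              # 9
false_negatives <- n * prevalence * (1 - sensitivity)        # 1
false_positives <- n * (1 - prevalence) * (1 - specificity)  # ~79

# Bayes's formula, written with counts:
# P(positive | positive test)
posterior <- true_positives / (true_positives + false_positives)
round(posterior, 2)  # ~0.10
```

Note that the posterior is dominated by the false positives: because only 1% of people are truly positive, even a fairly specific test produces far more false alarms than true hits.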