Sharpening the Axe

Reproducible Data Analysis at the Speed of Thought

Jannik Buhr

Heidelberg Institute for Theoretical Studies

2023-12-19

Welcome

Otters have a favorite Rock 🪨

“The mean number of blows necessary to open a mussel was 35.5 […].
The same stone was frequently retained for several successive food items, […]”

Hall and Schaller (1964)

 

Otters have a favorite Rock 🪨

“Give me six hours to cut down a tree and I will spend the first four sharpening the axe.”

— Abraham Lincoln

Otters have a favorite Rock 🪨

“Give me six minutes to open a mussel and I will spend the first four finding the perfect rock.”

Abraham Otterham Lincoln

Humans are toolmakers 🪓

Humans don’t just use tools.

We shape and perfect them.

Who am I? 🦦

Hi, I’m Jannik!

Computational Biochemist at
HITS and Heidelberg University

I use Quantum Mechanical Simulations to study how Collagen breaks under force

I love building and teaching tools 🧰

  • @jmbuhr

The Tools Today 🧰

Quarto

https://quarto.org/

The next iteration of R Markdown.

Targets

https://docs.ropensci.org/targets/

R package for reproducible workflows.

Neovim

https://neovim.io/

hyperextensible Vim-based text editor

Warmup!

Remember these letters

00:07

xfwiadcnanvrqybceawgjdczlyhlwoovxxce

Remember these letters

How many of these letters do you remember?

Remember these letters

00:07

The quick brown fox jumps over the lazy dog

Remember these letters

Do you remember the sentence on the previous slide?

Of course you do, it’s just one sentence!

But it has almost exactly as many letters as the random string (35 versus 36).

The Magic 7 ± 2

One piece of information is a chunk.

We can keep 7 (± 2) chunks in our working memory (Miller 1956).

We gain space by combining thoughts and concepts

Letters → Words → Sentences → Concepts
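As a rough illustration of chunking, the sketch below counts how many units each warmup slide asks you to hold, depending on what you treat as a chunk. The variable names are made up for this example.

```r
random_letters <- "xfwiadcnanvrqybceawgjdczlyhlwoovxxce"
sentence <- "The quick brown fox jumps over the lazy dog"

# Every random letter is its own chunk.
chunks_random <- nchar(random_letters)

# The sentence read letter by letter (spaces dropped) ...
chunks_as_letters <- nchar(gsub(" ", "", sentence, fixed = TRUE))
# ... or read word by word.
chunks_as_words <- length(strsplit(sentence, " ", fixed = TRUE)[[1]])

c(random = chunks_random, letters = chunks_as_letters, words = chunks_as_words)
```

The random string is far beyond 7 ± 2, while the sentence shrinks to nine words and, once recognized as a familiar phrase, to a single chunk.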

Optimize Your Workflow

Quarto: Structure Thoughts

https://quarto.org/

An open-source scientific and technical publishing system.

  • Keep thoughts, code and results close
  • Cater to different output formats
    • web, pdf, docx, ppt …
  • Fully reproducible





qmd files are just plain text!

---
title: "quarto demo"
format: 
  html:
    code-fold: true
---

## Air Quality

@fig-airquality further explores the impact of temperature on ozone level.

```{r}
#| label: fig-airquality
#| fig-cap: "Temperature and ozone level."

library(ggplot2)

ggplot(airquality, aes(Temp, Ozone)) + 
  geom_point() + 
  geom_smooth(method = "loess")
```

Quarto: Structure Thoughts

🖥️📱📰

  • Why did I perform this operation?
    • Read a detailed explanation from past-me right next to the code!
  • Different and changing output formats?
    • Just let quarto generate them for you!
  • Could you increase the font size in all your plots?
    • Sure, just change one number and re-render!
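The font size case might look like the sketch below, assuming the plots are made with ggplot2 as in the demo document. `base_font_size` is a hypothetical variable name; `theme_set()` makes the chosen theme the default for every later plot in the session.

```r
library(ggplot2)

# The one number to change before re-rendering (hypothetical name).
base_font_size <- 16

# Set the default theme once; all later ggplot2 plots inherit it.
theme_set(theme_minimal(base_size = base_font_size))

p <- ggplot(airquality, aes(Temp, Ozone)) +
  geom_point(na.rm = TRUE)
```

Placed in a setup chunk near the top of the qmd file, this keeps the styling decision in one spot instead of scattering it across every plot.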

“I have a machine learning model that takes 6 hours to calculate. […] how do I put this in a Quarto notebook?”

Well, don’t!

Targets: Organize Workflows

https://docs.ropensci.org/targets/

Function-oriented Make-like declarative workflows for R.

Aim for pure functions that take inputs and produce outputs without side effects.
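A minimal sketch of such a workflow, assuming a pipeline file named `_targets.R` in the project root. The helper functions and target names are hypothetical, and `loess()` merely stands in for the six-hour model.

```r
# _targets.R (hypothetical pipeline definition)
library(targets)

# Pure helper functions: inputs in, outputs out, no side effects.
clean_data <- function(raw) {
  raw[!is.na(raw$Ozone), ]
}

fit_model <- function(data) {
  # Stand-in for the expensive model.
  loess(Ozone ~ Temp, data = data)
}

# Each target declares what it is built from; targets works out the order.
list(
  tar_target(raw, airquality),
  tar_target(clean, clean_data(raw)),
  tar_target(model, fit_model(clean))
)
```

Running `targets::tar_make()` builds only the targets that are out of date, and a Quarto document can then load the stored result with `targets::tar_read(model)` instead of recomputing it on every render.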

Neovim: Not just for Speed!


Modal editing allows us to communicate with the editor.

Complicated operations become one chunk.

Free up resources for the important questions.


  • modes
    • normal, insert, visual
  • verbs
    • change, delete, paste etc.
  • nouns (text objects)
    • word, sentence, paragraph, block, parenthesis, function etc.
  • movements
    • “to end of line”, “to beginning of the document”, “down 3 lines” etc.
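To make the grammar concrete, here is a small sketch: one line of R to edit, with a few standard Vim normal-mode commands listed as comments. The R line itself is only a made-up editing target.

```r
# Cursor assumed somewhere inside the parentheses of the next line.
result <- round(mean(c(1, 2, 3.5)), digits = 1)

# verb + noun (text object):
#   ciw  change inner word
#   ci(  change inside the innermost enclosing parentheses
#   dap  delete a paragraph
# verb + movement:
#   d$   delete to end of line
#   dgg  delete up to the beginning of the document
# movement on its own:
#   3j   down 3 lines
```

Each command is read as one short phrase, which is what lets a multi-step edit count as a single chunk.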

Demo

Demo Time!

Take Home Messages


Keyboard shortcuts are for thoughts, not just speed ⌨️

Reproducibility and Interactivity can go hand in hand ▶️

Find your favorite rock 🪨

And sharpen the axe 🪓

Slides: https://jmbuhr.de/2023-workflow

  • @jmbuhr

References

Hall, K. R. L., and George B. Schaller. 1964. “Tool-Using Behavior of the California Sea Otter.” Journal of Mammalogy 45 (2): 287–98. https://doi.org/10.2307/1376994.
Miller, George A. 1956. “The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information.” Psychological Review 63 (2): 81–97. https://doi.org/10.1037/h0043158.

Backup Slides

Aside: Pure Functions

[Diagram: f maps A to B, g maps B to C, and h maps A to C directly.]

\[ g \circ f = h \]
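In R, the composition can be sketched with two toy functions. The concrete choices of `f` and `g` are hypothetical and only serve to give A, B and C concrete types.

```r
# Hypothetical example functions: f maps A (strings) to B (integers),
# g maps B (integers) to C (logicals).
f <- function(a) nchar(a)
g <- function(b) b %% 2 == 0

# h is the composition: apply f first, then g.
h <- function(a) g(f(a))

h("otter")
```

Because `f` and `g` are pure, `h` is fully determined by them and can be reasoned about or cached as a single step.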

pure

library(dplyr)

# Pure: the result depends only on the arguments `data` and `cutoff`.
f = function(data, cutoff) {
  data |>
    filter(x < cutoff) |>
    mutate(x = x * pi)
}

vs. Side Effects

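# Side effect: reads the global `data` and overwrites it via `<<-`,
# so every call changes what the next call sees.
f_prime = function(cutoff) {
  data <<- data |>
    filter(x < cutoff) |>
    mutate(x = x * pi)
}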
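The difference shows up as soon as each version is called twice. The sketch below uses base-R stand-ins (`f_base` and `f_prime_base`, hypothetical names) so that it runs without dplyr; the toy data frame is made up.

```r
# Base-R stand-in for the pure version.
f_base <- function(data, cutoff) {
  kept <- data[data$x < cutoff, , drop = FALSE]
  kept$x <- kept$x * pi
  kept
}

# Base-R stand-in for the version with a side effect.
f_prime_base <- function(cutoff) {
  # Overwrites `data` outside the function.
  data <<- f_base(data, cutoff)
  invisible(NULL)
}

data <- data.frame(x = 1:5)

# Pure: same inputs, same output, and `data` is untouched.
pure_first <- f_base(data, cutoff = 3)
pure_second <- f_base(data, cutoff = 3)
rows_before <- nrow(data)

# Impure: each call changes what the next call sees.
f_prime_base(cutoff = 3)
rows_after_first <- nrow(data)
f_prime_base(cutoff = 3)
rows_after_second <- nrow(data)
```

The pure calls return identical results, whereas the second impure call filters data that the first call already multiplied by pi, so nothing below the cutoff is left.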