Counting Characters

Using python and R to improve my typing experience.

Jannik Buhr
11-08-2020

The Why

In this post I am doing something slightly unusual. I am using python (in an Rmarkdown document of course) to count the key utilization of R and Rmarkdown files. The goal is twofold:

Firstly, I want to get better at python, beyond theoretical studies and watching videos, so I have to practice. Secondly, I recently acquired a new keyboard (the moonlander) and while I tried to stick to a layout that makes transitioning between my laptop and the keyboard easy, I still have some space left on on of my symbols layer that I want to populate with the most used keys. And of course we will be doing this based on data! So, without further ado, let’s jump in.

The How

I wasn’t sure how many R files I had in my projects folder, so I resorted to python’s generator expressions to make sure to not load everything into memory at once.

from glob import iglob
from collections import Counter
from functools import reduce

path = "/home/jannik/Documents/projects/rstats"

def get_count(p):
  try:
    with open(p, 'r') as f:
      cs = [c for x in f.readlines()
              for c in x]
      return {char : cs.count(char) for char in set(cs)}
  except:
    return {'a':0}

def add_counts(d1, d2):
  result = Counter(d1) + Counter(d2)
  return result

Using functional programming techniques, this process becomes rather compact. A map of character counts per file is reduced to a single dictionary of counts.

files = iglob(path + "/*/**/*.R", recursive = True)
counts = map(get_count, files)
result = dict(reduce(add_counts, counts))

The Result

And now back to R for visualization. The reticulate package translates python dictionaries to named lists in R, so we are good to go.

library(tidyverse)

res <- enframe(py$result) %>% 
  mutate(value = unlist(value)) %>% 
  arrange(desc(value))

res %>% 
  mutate(name = case_when(
    name == " " ~ "space",
    name == "\n" ~ "enter",
    TRUE ~ name
  )) %>%
  filter(!(tolower(name) %in% c(letters, "space", "enter", 0:9))) %>% 
  slice_max(value, n = 20) %>% 
  mutate(name = fct_reorder(name, value)) %>% 
  ggplot(aes(value, name, fill = value)) +
  geom_col(color = "black", width = 0.8) +
  geom_text(aes(label = name),
            hjust    = -1,
            fontface = "bold") +
  scale_fill_viridis_c() +
  scale_x_continuous(expand = expand_scale(c(0.01, 0.05))) +
  scale_y_discrete(labels = NULL) +
  guides(fill = "none") +
  theme_minimal() +
  theme(axis.ticks.y = element_blank()) +
  labs(y       = "",
       x       = "count",
       caption = "excluding letters and numbers")

The overwhelming result: I need a better place for the minus sign! Not because I am subtracting numbers all the time, but because it is also in the assignment operator <-, which can be inserted in RStudio using Alt+Minus.

This is what my layout currently looks like.

Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/jmbuhr/jmbuhr.github.io, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Buhr (2020, Nov. 8). jmbuhr.de: Counting Characters. Retrieved from https://jmbuhr.de/posts/2020-11-08-counting-characters/

BibTeX citation

@misc{buhr2020counting,
  author = {Buhr, Jannik},
  title = {jmbuhr.de: Counting Characters},
  url = {https://jmbuhr.de/posts/2020-11-08-counting-characters/},
  year = {2020}
}