Knowledge Base

Table of Contents

Tips & tricks

My ultimate gotcha personal collection

Stuff I keep forgetting and I need to think about, decide on, or search for too often

1. Motivation

This is the way (The Mandalorian)

A power user might not have extensive technical knowledge of the systems they use but is rather characterized by competence or desire to make the most intensive use of computer programs or systems (Wikipedia)

2. Linux

2.1. Arch Linux

2.1.1. How to update pacman mirrors?

# https://github.com/westandskif/rate-mirrors#usage
export TMPFILE="$(mktemp)"; \
sudo true; \
rate-mirrors --save=$TMPFILE arch --max-delay=21600 \
    && sudo mv /etc/pacman.d/mirrorlist /etc/pacman.d/mirrorlist-backup \
    && sudo mv $TMPFILE /etc/pacman.d/mirrorlist

2.1.2. How to fix "Pacman is currently in use, please wait"?

If you're sure that pacman is not running in another shell, you can remove the lock by hand.

sudo rm /var/lib/pacman/db.lck

2.2. Bash

2.2.1. How to execute a loop conditionally (dry runs)

#!/bin/bash
doit=0

for i in {1..3}; do
echo -n ${i}
done

echo "---"

for i in {1..3}; do
((doit))&&(echo -n ${i})
done

echo "---"

for i in {1..3}; do
((doit))&&echo -n ${i}
done

echo "---"

((doit))&&for i in {1..3}; do
echo -n ${i}
done

echo "---"

((doit))&&
for i in {1..3}; do
    echo -n ${i}
done

echo "---"

Also, ((dontdoit)) || echo "something" runs the command only when dontdoit is zero or unset.
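
The same flag can be centralized in a small wrapper, so each command does not need its own ((doit)) test. This is only a sketch: the names run and dry_run are made up for illustration and are not part of the notes above.

```shell
#!/bin/bash
# dry_run=1 prints each command; dry_run=0 executes it.
dry_run=1

# run: hypothetical wrapper that either prints or executes its arguments.
run() {
    if ((dry_run)); then
        printf 'DRY RUN: %s\n' "$*"
    else
        "$@"
    fi
}

for i in {1..3}; do
    run echo "step ${i}"
done
```

Set dry_run=0 once the printed commands look right.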

2.2.2. How to open the N most recently modified files?

# -t sorts newest first, so take the head; -d keeps directories from being expanded
emacs $(/bin/ls -dt */* | head -n 5)

2.2.3. How to cast a string to integer?

Cast a string to an integer in bash by adding 0. It also works if the string is empty, in which case the result is 0.

NUM="99"
NUM=$(($NUM+0))

NUM=""
NUM=$(($NUM+0))
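
One caveat: bash arithmetic reads a leading zero as octal, so a string such as "08" makes the plain +0 cast fail. A possible workaround is to force base 10. The helper name to_int is hypothetical, and the sketch assumes the input holds only digits (no sign).

```shell
#!/bin/bash
# to_int: hypothetical helper that forces base 10 and maps an empty string to 0.
to_int() {
    echo $((10#${1:-0}))
}

to_int "99"   # prints 99
to_int "08"   # prints 8; a plain +0 cast would reject 08 as invalid octal
to_int ""     # prints 0
```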

2.2.4. How to make a conditional assignment in bash for a string?

## Assign "string-true" to VAR if VARTOCHECK is equal to "string-cond"
[ "$VARTOCHECK" = "string-cond" ] && VAR="string-true" || VAR="string-false"

2.2.5. How to make a conditional assignment in bash for a numeric?

Use the following expression if the assignment is numerical.

## Assign 20 if true, or 10 if false
variable=$(( 1 > 0 ? 20 : 10 ))
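
The condition is usually a comparison between variables rather than constants. A minimal sketch, with a and b as placeholder values:

```shell
#!/bin/bash
a=7
b=12
# Pick the larger of two values; variable names need no $ inside $(( )).
max=$(( a > b ? a : b ))
echo "$max"   # prints 12
```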

2.2.6. How to explain shell code?

Use explainshell to match command-line arguments to their help text.

2.2.7. How to validate shell code?

Use shellcheck to get warnings and suggestions for shell scripts. It can also be used online at ShellCheck.net.

2.3. Linux

2.3.1. How to restart the bluetooth service?

Use bluetoothctl to turn the bluetooth controller off and on

echo -e 'show\npower off\npower on\nquit' | bluetoothctl

Source: https://unix.stackexchange.com/a/533782/88701

2.3.2. How to read a systemd service log for today?

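Don't forget to add --user if it's a user service. Also, -b is short for -b 0, the current boot (Ghosthree3 at libera.chat); it restricts the log to the current boot rather than to today, so add --since today to see only today's entries.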

journalctl --user -u servicename -b

2.3.3. How to clean up space in /boot?

Consider removing the fallback initramfs to free some space temporarily. A permanent solution is to enlarge the /boot partition. Edit linux.preset in /etc/mkinitcpio.d to stop generating a new fallback image. Helpful when pacman throws "Partition /boot too full".

ls -sh /boot/initramfs-linux-fallback.img

2.3.4. How to determine what shell I am currently working on?

2.3.5. How to set Firefox to default browser in xdg settings?

Check that there is a desktop file in /usr/share/applications/firefox.desktop. If there isn't, create one:

Then, update the corresponding xdg entry with xdg-settings set default-web-browser firefox.desktop. Test by running xdg-open "https://www.mozilla.org".

2.3.6. How to find and copy all files that match a regex?

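Use cp --parents (GNU cp) or rsync --relative to keep the folder structure. Note that -iname takes a shell glob, not a regex.

## flatten: copy every match into the current folder
find . -iname '*.rds' -exec cp {} . \;
## keep the folder structure below dest/ (placeholder for the target folder)
find . -iname '*.rds' -exec rsync --relative {} dest/ \;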
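
To see the difference between flattening and keeping the folder structure, here is a self-contained sketch that works on throwaway directories. It assumes GNU cp (for --parents); the file names are illustrative only.

```shell
#!/bin/bash
# Illustrative source tree and an empty destination.
src=$(mktemp -d)
dest=$(mktemp -d)
mkdir -p "$src/a/b"
touch "$src/a/b/model.rds" "$src/top.rds"

# Run find from inside the source tree so the relative paths
# (./a/b/model.rds) are recreated below the destination.
(cd "$src" && find . -iname '*.rds' -exec cp --parents {} "$dest" \;)

# List what was copied.
(cd "$dest" && find . -type f | sort)
```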

2.3.7. How to list all files whose content matches a regex?

find . -type f -exec grep -l optimHess {} +
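
If no other find filters are needed, a recursive grep gives a list of file names directly. A small sketch on throwaway files (the file contents are made up):

```shell
#!/bin/bash
tmp=$(mktemp -d)
printf 'fit <- optimHess(par, fn)\n' > "$tmp/a.R"
printf 'x <- 1\n' > "$tmp/b.R"

# -r recurses into the folder, -l prints only the names of matching files.
matching_files=$(grep -rl optimHess "$tmp")
echo "$matching_files"
```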

2.3.8. How to list all the unique file names?

## any file extension
find . -iname '*.*' -exec basename {} \; | sort | uniq
## a specific file extension, e.g., rds files
find . -iname '*.rds' -exec basename {} \; | sort | uniq
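
With GNU find, the basename call can be dropped: -printf '%f\n' prints the file name without its folder, which avoids starting one process per file. A sketch on a throwaway folder, assuming GNU findutils:

```shell
#!/bin/bash
# Throwaway directory with a duplicated file name (illustrative data only).
tmp=$(mktemp -d)
mkdir -p "$tmp/a" "$tmp/b"
touch "$tmp/a/x.rds" "$tmp/b/x.rds" "$tmp/b/y.rds"

# %f prints the base name; sort -u sorts and removes duplicates in one step.
unique_names=$(find "$tmp" -iname '*.rds' -printf '%f\n' | sort -u)
echo "$unique_names"
rm -r "$tmp"
```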

2.3.9. How to list memory used by processes in remote cluster?

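#!/bin/sh
# List the nodes of my jobs (-h drops the header line)
squeue --me -h -o "%R" |
while read -r host; do
    # -n keeps ssh from consuming the rest of the node list on stdin
    ssh -n "$host" "ps -o pid,user,%mem,rss,size,%cpu,command ax | grep \"r-4\""
done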

2.3.10. What is the keycode of this key?

Use xev to look for the keycode and keysym in the KeyPress or KeyRelease event

2.3.11. How to find the window dimension and position?

Run xwininfo on a console and click on the target window.

2.3.12. How to convert a pdf to png?

1920 (1080p) is often a good size

# TODO

Source: Convert every pdf in the current directory to png

2.3.13. How to switch keyboard layout to type Greek letters?

Switch between the US and Greek (polytonic) keyboard layouts with Alt + Caps Lock. Add the command to ~/.xinitrc, or possibly ~/.config/i3/config, to make it persistent.

setxkbmap 'us,gr' -variant ',polytonic' -option grp:'alt_caps_toggle'

2.3.14. What are the keyboard layout switching keys?

2.3.16. How to set a compose key (Multi_key)?

2.3.17. What characters have superscripts and subscripts?

2.3.18. Which folder does this file belong in?

2.3.19. How to display and sort by modified date time with find command?

find . -name "*.log" -print0 | xargs -0 -r ls -lt
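
An alternative that stays within find is to print the modification time next to each path and sort on it. This is a sketch assuming GNU find (for -printf) and GNU touch (for -d); the two log files are created only for the demonstration.

```shell
#!/bin/bash
tmp=$(mktemp -d)
touch -d '2020-01-01 00:00' "$tmp/old.log"
touch -d '2021-01-01 00:00' "$tmp/new.log"

# %T@ is the modification time in seconds since the epoch, %p the path.
# Sort numerically, newest first, then drop the timestamp column.
newest_first=$(find "$tmp" -name '*.log' -printf '%T@ %p\n' | sort -rn | cut -d' ' -f2-)
echo "$newest_first"
```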

2.3.20. How to connect OBS to Zoom or Webex?

A PulseAudio source works as an input device (microphone, monitor); a sink works as an output device (speaker).

To redirect video from OBS to a new virtual camera input

  • Install the linux headers if you don't have them, e.g., sudo pacman -S linux-headers
  • Install v4l2loopback-dkms, e.g., sudo pacman -S v4l2loopback-dkms
  • Load the kernel module sudo modprobe v4l2loopback
  • Select Dummy video device as video source in the meeting app, e.g., in Zoom Settings > Video > Camera
  • Source: Open Broadcaster Software - ArchWiki

To redirect audio from OBS to a new virtual sound input

  • Create a null output device

    pulsemodule=$(pactl load-module module-null-sink sink_name=obs_audio \
          sink_properties=device.description=obs_audio_sink_for_mic)
    
  • In pavucontrol, Playback tab, change the output of OBS-monitor to Null output
  • In pavucontrol, Recording tab, change the input of Zoom to Null output

2.3.21. Where are the X.Org Server config files?

  • /etc/X11/xorg.conf.d/ (preferred place for host-specific configurations)
  • /usr/share/X11/xorg.conf.d/
  • /etc/X11/xorg.conf (deprecated)
  • /etc/xorg.conf (deprecated)

2.3.22. How to migrate from nouveau to proprietary NVIDIA drivers for an NVS 510 graphics card?

  1. Remove nouveau: sudo pacman -R xf86-video-nouveau
  2. Remove all xorg.conf files that might still refer to nouveau drivers (check in /etc/X11/xorg.conf.d/ and /usr/share/X11/xorg.conf.d/)
  3. Install the NVIDIA drivers from the AUR package nvidia-470xx-utils

2.4. Math typing

2.4.1. Math keyboard layout with UTF-8 support

2.4.2. Compose key for mathematics

  • Type an equation in UTF8?

3. Utilities

3.1. Git

3.1.1. What on earth is git?

[…] at the core of Git is a simple key-value data store […] you can insert any kind of content into a Git repository, for which Git will hand you back a unique key you can use later to retrieve that content.

Source: Git - Git Objects

3.1.2. How to rename a branch?

git branch -m newname

3.1.3. What to do with a branch after merging?

After the merge, it's safe to delete the branch

git branch -d branch1

Source: what to do with branch after merge?

3.1.4. How to list unmodified files?

diff <(git diff --name-only)  <(git ls-files -- sub/dir) | grep "^>" | cut -b3-

Source: is there any way to list the unmodified files in Git?

3.1.5. How to clean a git repo?

3.1.6. How to change a git repo remote from https to ssh?

git remote -v
git remote set-url origin git@server:repo.git
git remote -v

3.1.7. How to clean a .git folder that is too large?

Call the garbage collector

git gc

For binary files that change often

git repack -a -d --depth=250 --window=250

Source: how to shrink the .git folder?

3.1.8. How to stage files that were deleted from disk?

git commit -a

Source: using git commit -a

git ls-files --deleted -z | xargs -r0 git rm

Source: staging deleted files

3.1.9. How to revert file removal from git repo?

To revert all changes since last commit

git reset --hard HEAD

To revert some changes

git reset
git checkout <file-name>
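
On Git 2.23 or newer, git restore can undo a git rm for a single file in one step. The sketch below runs in a throwaway repository; the file name and the user.name and user.email values are placeholders.

```shell
#!/bin/bash
# Throwaway repository; user.name and user.email are placeholder values.
repo=$(mktemp -d)
git -C "$repo" init -q
echo "hello" > "$repo/notes.txt"
git -C "$repo" add notes.txt
git -C "$repo" -c user.name=demo -c user.email=demo@example.com commit -q -m "add notes"

# Remove the file from the index and the working tree ...
git -C "$repo" rm -q notes.txt
# ... then bring it back from HEAD in both places.
git -C "$repo" restore --staged --worktree notes.txt
cat "$repo/notes.txt"
```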

Source: how to revert git rm -r

3.2. SSH

3.2.1. How to generate a new SSH key?

The path to the key has to be unique. Copying the key to the clipboard requires xclip.

# Generate a new key
ssh-keygen -t ed25519 -C "your_email@example.com"
# Add key to the ssh-agent in your local machine
eval "$(ssh-agent -s)"
ssh-add ~/.ssh/id_ed25519
# Copy key to clipboard
xclip -selection clipboard < ~/.ssh/id_ed25519.pub

3.2.2. How to avoid logging in remotely many times?

Use ControlMaster in .ssh/config

ControlMaster auto
ControlPath ~/.ssh/%r@%h:%p

3.2.3. How to avoid typing remote addresses and ports?

Configure the host in .ssh/config once and do ssh shortname

Host shortname
User username
HostName hostname
Port 22
ServerAliveInterval 60

3.2.4. Quality of life improvements

  • .ssh/config preconfigurations for shortening the ssh and sshfs invocations (see ssh_config)
  • autofs
  • lazy mount for sshfs via systemd

3.3. Regular expressions

Gotta love them!

3.3.1. How to match a keyword except when used as a tag?

To match foo in any context except when enclosed by braces, {foo}, use negative lookarounds, but note that the simple pattern below also rejects {foo and foo}. Alternatively, use PCRE with (*SKIP)(*F) (OnlineCop at libera.chat). Other use cases for this pattern: exclude foo in comment lines (#foo, //foo, /*foo*/), quotes ("foo", 'foo').

grep -P '(?<!\{)foo(?!\})'
grep -P '\{foo\}(*SKIP)(*F)|foo'
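
The difference between the two patterns shows up on a half-open tag. The sketch below feeds three made-up lines to both; it assumes a grep built with PCRE support (-P).

```shell
#!/bin/bash
# sample: hypothetical test input, one case per line.
sample() {
    printf '%s\n' 'plain foo' 'tagged {foo}' 'half {foo open'
}

# Lookarounds reject foo whenever a brace touches it on either side.
lookaround=$(sample | grep -P '(?<!\{)foo(?!\})')

# (*SKIP)(*F) discards only the complete {foo} and keeps every other foo.
skipfail=$(sample | grep -P '\{foo\}(*SKIP)(*F)|foo')

printf 'lookaround:\n%s\n' "$lookaround"
printf 'skip/fail:\n%s\n' "$skipfail"
```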

3.3.2. How to match all R dependencies?

Note the need to use single quotes around the regex and to escape the parentheses of the function call to match, for example, library(capturethis).

grep -Poh -r '(\w+)(?=::)|(?:library|require(?:Namespace)?)\("?(\w+)"?\)' --include='*.R' | sort | uniq

3.3.3. How to match URLs?

grep -Poh '(http|ftp|https):\/\/([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,@?^=%&:\/~+#-]*[\w@?^=%&\/~+#-])' myfile.txt | sort | uniq

3.3.4. How to test all the URLs in a text file?

grep -Poh '(http|ftp|https):\/\/([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,@?^=%&:\/~+#-]*[\w@?^=%&\/~+#-])' myfile.txt | sort | uniq | xargs wget --spider

3.3.5. How to match a custom file separator?

I typically separate my files into labeled sections using this pattern: comment characters (one or more), space, label (can include spaces), space, and at least four dashes, typically enough to fill the column width. This can be matched with \S+\s(.+?)\s-{4,}

;; Speed bar -------------------------------------------------------------------

%% Introduction ----------------------------------------------------------------

## Read data -------------------------------------------------------------------
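
To pull out only the labels on the command line, a variation of that pattern can be used with grep -oP. The sketch swaps the capture group for \K and a lookahead, since grep prints whole matches rather than groups; this is an adaptation, not the pattern from the notes.

```shell
#!/bin/bash
# \K discards what was matched so far, so only the label is printed;
# the lookahead requires a space and at least four dashes after it.
labels=$(printf '%s\n' \
    ';; Speed bar -------------------------' \
    '%% Introduction ----------------------' \
    '## Read data -------------------------' \
    'not a separator' |
    grep -oP '^\S+\s\K.+?(?=\s-{4,})')
echo "$labels"
```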

3.3.6. How to match any number with decimals up to 10?

^(?:\d|\d?\.\d+|10(?:\.0+)?)$
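
A quick way to check which inputs the pattern accepts is to run it over a handful of candidates. The candidate values below are arbitrary and grep -P is assumed to be available.

```shell
#!/bin/bash
pattern='^(?:\d|\d?\.\d+|10(?:\.0+)?)$'

# Keep only the candidates that match; 10.5, 11 and -1 are rejected.
accepted=$(printf '%s\n' 0 7 .5 9.99 10 10.0 10.5 11 -1 | grep -P "$pattern")
echo "$accepted"
```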

Source: https://regex101.com/r/kk8SIF/1/

3.3.7. How to generate a string that matches a regular expression?

A regex to string kind of situation.

3.3.8. General resources

3.4. GNU Make

3.4.1. How to call Make on all files in a directory?

Populate the all target programmatically. For each file with extension ext1, build a file with the same name but extension ext2.

all : $(patsubst %.ext1, %.ext2, $(wildcard *.ext1))
all : $(patsubst %.tex, %.pdf, $(wildcard *.tex))

3.5. PDF processing

3.5.1. How to convert a PDF file to a text file?

Use pdftotext to extract the content of a PDF file and write it into a text file.

pdftotext document.pdf document.txt

3.5.2. How to list all the fonts used in a PDF file?

Use pdffonts to list all the fonts used in a PDF

pdffonts document.pdf

3.5.3. How to extract images from a PDF file?

Use pdfimages to extract images from a PDF file

pdfimages -list document.pdf
pdfimages -png -f 1 -l 10 document.pdf

3.5.4. How to merge PDF files?

Use pdfunite to merge input1.pdf and input2.pdf into merged_document.pdf.

pdfunite input1.pdf input2.pdf merged_document.pdf

Use Ghostscript to merge input1.pdf and input2.pdf into merged_document.pdf.

gs -sDEVICE=pdfwrite -dNOPAUSE -dBATCH -dSAFER -dQUIET  \
   -dAutoRotatePages=/None \
   -sOutputFile="merged_document.pdf" input1.pdf input2.pdf

Source: How to merge several PDF files?

3.5.5. How to extract some pages from a PDF file?

Use pdfseparate to extract page by page as sample-1.pdf, sample-2.pdf, sample-3.pdf.

pdfseparate sample.pdf sample-%d.pdf

Use Ghostscript to extract pages 12-15 from input.pdf into extracted_document.pdf.

gs -sDEVICE=pdfwrite -dNOPAUSE -dBATCH -dSAFER -dQUIET  \
     -dFirstPage=12 \
     -dLastPage=15 \
     -sOutputFile="extracted_document.pdf" \
     input.pdf

Source: How can I extract a page range / a part of a PDF?

3.5.6. How to compress a PDF file?

Use Ghostscript to compress input.pdf and write into compressed_document.pdf using a distiller preset

  • -dPDFSETTINGS=/screen lower quality, smaller size (72 dpi)
  • -dPDFSETTINGS=/ebook better quality, slightly larger size (150 dpi)
  • -dPDFSETTINGS=/prepress prepress optimized (300 dpi)
  • -dPDFSETTINGS=/printer print optimized (300 dpi)
  • -dPDFSETTINGS=/default wide variety of uses, larger size

gs -sDEVICE=pdfwrite -dNOPAUSE -dBATCH -dSAFER -dQUIET  \
   -dCompatibilityLevel=1.4 \
   -dPDFSETTINGS=/ebook \
   -sOutputFile="compressed_document.pdf" input.pdf

Source: How can I reduce the file size of a scanned PDF file?

3.5.7. How to automatically crop margins from a PDF file?

Use pdfcrop to automatically crop margins from a PDF file

pdfcrop document.pdf

3.5.8. How to extract and delete all the metadata in a PDF file?

Use exiftool for reading and manipulating PDF metadata.

exiftool -all:all  file.pdf ## read all the tags
exiftool -all:all= file.pdf ## remove all the tags

3.5.9. How to set a password for a PDF file?

Use qpdf, or the older and sometimes deprecated pdftk. Use single quotes when including special characters. The user password is mandatory. The user and owner passwords cannot be the same.

## 128 and 256 bit encryption available in qpdf
qpdf --encrypt 'pass1' 'pass2' 256 -- input.pdf output.pdf
## 128 bit or lower encryption available in pdftk
pdftk input.pdf output output.pdf owner_pw 'pass1' user_pw 'pass2' encrypt_128bit

3.6. Hugo

3.6.1. How to set target=_blank for all external links?

mkdir -p layouts/_default/_markup
echo '<a href="{{ .Destination | safeURL }}"{{ with .Title}} title="{{ . }}"{{ end }}{{ if strings.HasPrefix .Destination "http" }} target="_blank"{{ end }}>{{ .Text }}</a>' > layouts/_default/_markup/render-link.html

Source: How to Open Link in New Tab with Hugo's new Goldmark Markdown Renderer

3.6.2. How to set the home page to other content?

Show contentname as landing or home page

mkdir -p layouts
echo '{{ with .GetPage "/contentname" }}{{.Render}}{{end}}' > layouts/index.html

Source: I want to redirect my Homepage to a content page

4. Typesetting

4.1. (La)TeX

4.1.1. How to write better LaTeX?

  • TeX FAQ: The TeX Frequently Asked Question List
  • nag: Package that warns the user about the use of obsolete things
  • l2tabu: Practical usage guide with mistakes, obsolete packages and commands

4.1.2. How to organize LaTeX files?

Elements are in the order recommended by The Chicago Manual of Style.

Short document

  • myclass.cls
  • wrapper.tex
  • frontmatter.tex
    • title page, table of contents, list of illustrations, list of tables, list of abbreviations
  • mainmatter.tex
    • introduction
    • rest of the content
  • backmatter.tex
    • appendices, bibliography, index

Full book

  • myclass.cls
  • wrapper.tex
  • frontmatter.tex
    • half title (main title sans subtitle), series title or frontispiece, title page, copyright page, dedication, epigraph, table of contents, list of illustrations, list of tables, foreword, preface, acknowledgements, list of abbreviations, second half title
  • mainmatter.tex
    • introduction
    • rest of the content
  • backmatter.tex
    • epilogue, appendices, glossary, bibliography, index, and colophon

4.1.3. How to compile LaTeX document on save?

Run latexmk -pdf -pvc --interaction=nonstopmode file.tex to compile the TeX document every time the file changes on disk.

Source: What You See is What You Get (WYSIWYG) for PGF/TikZ?

4.1.4. How to avoid orphan words?

Sometimes, we have one or two words hanging on the last line of a paragraph.

  • Globally, set a large penalty for creating a club line at bottom of page adding \clubpenalty=10000 (source)
  • Add \looseness=-1 immediately before the paragraph (source)
  • Adjust badness limits
  • \parfillskip controls the length at the end of a paragraph

4.1.5. How to prepend letter to figure name?

%% Add prefix S to label names
\renewcommand{\thepage}{S\arabic{page}}
\renewcommand{\thesection}{S\arabic{section}}
\renewcommand{\thetable}{S\arabic{table}}
\renewcommand{\thefigure}{S\arabic{figure}}
%% Modify the text used at the start of the caption
\renewcommand{\figurename}{Supplemental Material, Figure}

4.1.7. How to write an unnumbered footnote?

4.1.8. How to type a differential operator dx?

Use upright d followed by math x, and possibly add a space \, before the differential operator

\begin{equation}
  \int_0^1 x^2 \, \mathrm{d}x
\end{equation}

4.1.9. How to define a new operator like log?

Use \mathop, or \DeclareMathOperator if amsopn or amsmath are loaded to create operators such as trace, diag, argmax, and argmin.

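% Without amsmath: define the operator by hand
\newcommand{\diag}{\mathop{\mathrm{diag}}}
% With amsmath or amsopn: use this instead of the line above (defining \diag twice is an error)
\DeclareMathOperator{\diag}{diag}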
\DeclareMathOperator{\trsym}{tr}
\DeclareMathOperator*{\argmax}{arg\,max}
\DeclareMathOperator*{\argmin}{arg\,min}
\DeclareMathOperator{\evsym}{E}
\DeclareMathOperator{\vsym}{V}
\DeclareMathOperator{\corsym}{Cor}
\DeclareMathOperator{\covsym}{Cov}
\DeclareMathOperator{\sesym}{SE}

4.1.10. How should I type …?

  • Punctuation
    • add to an inline and displayed equation the punctuation required by conventional rules, e.g., question mark, comma, semicolon, period
    • add punctuation inside the math mode scope to avoid orphans
  • Fractions
    • prefer horizontal bar or forward slash over precomposed fractions for accessibility reasons.
    • metric units are given in decimal fractions, non-metric units can be either type of fraction
  • Upright (Roman) versus italic
    • prefer upright over italic for operators, e.g., \(\mathrm{d}x\) over \(dx\)
    • but prefer italic over upright for standard universal constants, e.g., \(e\) over \(\mathrm{e}\) and \(\pi\) over \(\mathrm{\pi}\)
    • prefer italic for variable and function names,
    • but prefer upright for multi-letter names, e.g., \(\mathrm{log}(x)\) over \(log(x)\) and \(\mathrm{sin}(x)\) over \(sin(x)\)
    • prefer upper case italics for sets
  • Blackboard bold: standard number system and some certain mathematical objects
  • Linear algebra:

Source: Manual of Style/Mathematics - Wikipedia

4.1.11. What symbol should I use for … ?

4.1.12. Symbol lookup

  • Detexify: draw symbol by hand or use the symbol table

4.1.13. How to type the math symbol for definition :=?

Use \coloneqq and \vcoloneqq from the mathtools package.

4.1.14. How to debug the LaTeX document layout, sizes, and spacing?

4.1.15. What is the document body ratio?

Body width:  \the\textwidth
Body height: \the\textheight

4.1.16. How to reduce spacing around figures, tables, captions?

  • \textfloatsep — distance between floats on the top or the bottom and the text;
  • \floatsep — distance between two floats;
  • \intextsep — distance between floats inserted inside the page text (using h) and the text proper.
  • \dbltextfloatsep — distance between a float spanning both columns and the text;
  • \dblfloatsep — distance between two floats spanning both columns.
  • \captionsetup{font=footnotesize}

    % Reduced margins
    \usepackage{geometry}
    \geometry{left=0.75in,right=0.75in,bottom=1in,top=1in}
    
    % Minimal spacing around captions and floats
    \setlength{\abovecaptionskip}{2pt plus 1pt minus 2pt}
    \setlength{\belowcaptionskip}{2pt plus 1pt minus 2pt}
    \setlength{\textfloatsep}{2pt plus 1pt minus 2pt}
    \setlength{\intextsep}{2pt plus 1pt minus 2pt}
    \setlength{\floatsep}{2pt plus 1pt minus 2pt}
    
    % Minimal spacing around equations
    \setlength{\abovedisplayskip}{0pt}
    \setlength{\belowdisplayskip}{0pt}
    \setlength{\abovedisplayshortskip}{0pt}
    \setlength{\belowdisplayshortskip}{0pt}
    

4.1.17. How to clip an image with percentages?

Use the adjustbox package to trim a fraction of the image from each side (left, bottom, right, top) and clip the result.

\adjustbox{trim={.05\width} {.2\height} {0.1\width} {.15\height}, clip}%
{\includegraphics[width=0.5cm]{cupdot.png}}

4.1.18. How to write a good proof in LaTeX?

  • Use the proof environment from the amsthm package (source)
    • Add some text, if needed
    • Begin an align environment
      • Use the aligned subenvironment within align, possibly with a [t] option, or similar subenvironments to break long lines (source)
      • Use the \shortintertext command from the mathtools package to write short text between lines, or \intertext if you prefer larger space above and below the math line (source)
      • Use the \tag command if you want to refer to a line by a name instead of a number

4.1.19. How to smash all in-line math?

There is a strong argument for increasing the linespread to accommodate expressions such as $Y^{(0)}$ if you need to use smash too often. If you insist, though, use \setlength{\lineskiplimit}{-100pt}

4.1.20. How to influence the position of a figure/table?

4.1.21. How to fiddle with math font height and width?

4.1.22. How to set a global path for input and graphic files?

\makeatletter
\def\input@path{{/path/to/folder/}}
% or: \def\input@path{{/path/to/folder/}{/path/to/another/folder/}}
\makeatother

Source: \input and absolute paths

4.1.23. How to align table columns without counting them?

Say you want to align the first column to the left and the rest to the right without needing to figure out how many columns the table has. You can specify more columns than used, but not vice versa, as in l*9r or l*{99}r. Use braces for counts with more than one digit.

\begin{tabular}{l*{99}r}
  col1 & col2 \\
\end{tabular}

4.1.24. How to gray out leading zeroes in a table?

This macro was shared by [exa] at libera.chat.

4.1.25. How to add the (sub)section name in the page header?

% Add section name in page header
\usepackage{fancyhdr}
\fancypagestyle{main}{
  \fancyhf{}
  \renewcommand{\sectionmark}[1]{\markright{\thesection\ ##1}}
  \renewcommand{\subsectionmark}[1]{\markright{\thesubsection\ ##1}}
  \renewcommand{\subsubsectionmark}[1]{\markright{\thesubsubsection\ ##1}}
  \fancyhead[L]{\textsl{\footnotesize{\rightmark}}}
}
\pagestyle{main}

4.1.26. How to hide section headings?

\usepackage{titlesec}

\makeatletter
\titleformat{\section}[runin]{}{}{0pt}{\@gobble}
\titleformat{\subsection}[runin]{}{}{0pt}{\@gobble}

\makeatother
\titlespacing{\section}{\parindent}{0pt}{0pt}
\titlespacing{\subsection}{\parindent}{0pt}{0pt}

4.1.27. How to change the caption font size?

\usepackage{caption}
\captionsetup{font=footnotesize}

4.1.28. How to include a tex file with multicolumn commands?

Long story short, you can't call \input{filename} before \multicolumn.

\csname @@input\endcsname filename

4.1.29. How to properly compile a LaTeX project?

Use latexmk -pdf file.tex and that's it. No need to run multiple commands or figure out the right order. Put it in a Makefile if that's your workflow.

Source: how to properly 'make' a latex project?

4.1.30. How to make GitHub compile a document after a push?

4.1.31. How to build a custom style?

Mix and match (suggestions for a research paper style):

  • Search for accessibility guidelines
  • A base template for inspiration, e.g., ICML2021 Template
  • Layout, e.g., two columns
  • Portrait pages need to be easy to set up
  • Line numbering
  • Author affiliations need to be easy to set up
  • Short title in header/footer
  • Page number in header/footer
  • Headings need to be easy to spot
  • Text font, e.g., libertinus, palladios, see also The LaTeX Font Catalogue
  • Math font
  • Load AMS packages
  • Reasonably minimal page margins
  • Reasonably minimal fig/table caption margins
  • Allow for fig/table extending over both columns
  • Appendices need to be easy to set up
  • Appendix fig/table numbering should start with A, e.g., Table A.1
  • Option to hide names and acknowledgments
  • Option for a short table of contents on the front page
  • Have some basic editing commands
  • Recall that not everyone types in English

4.1.32. What type of LaTeX commands exist?

  • Author commands
    • typically short, lower case names
    • e.g. \section, \emph, \times
  • Class and package writer commands
    • typically long, CamelCase names
    • e.g., \InputIfFileExists, \RequirePackage, \PassOptionsToClass
  • Internal commands
    • typically contain an @ in their name
    • e.g., \@tempcnta, \@ifnextchar and \@eha
  • Exceptions:
    • \hbox is internal
    • \m@ne, the constant -1, is for class and package writers

4.1.33. How to make a class color safe?

  • Issue: when using {\color{green} text}, color is restored after the final }
    • e.g., \setbox0=\hbox{\color{green} ⟨text⟩}
  • Use LaTeX box commands rather than TeX primitives
    • \sbox rather than \setbox
    • \mbox rather than \hbox
    • \parbox or a minipage environment rather than \vbox
  • Use \normalcolor to set regions to main document color rather like \normalfont

4.1.34. How to write a custom LaTeX class?

  • Use the doc software which comes with LaTeX (see The LaTeX companion)

4.1.35. Where do I put local LaTeX style files and packages?

The path of the local folder, and many others, are defined in /etc/texmf/web2c/texmf.cnf. The default path for local files is ~/texmf. Packages, .tex and .sty files go in ~/texmf/tex/latex, bibtex style files .bst go in ~/texmf/bibtex/bst/. Finally, run texhash ~/texmf/ to update the database.

4.1.36. How to make a poster in LaTeX?

Use beamerposter (possibly with the Gemini template) or tikzposter.

4.1.37. Resources

4.2. BibLaTeX

4.2.1. How to format citations and the bibliography with BibLaTeX?

%% references
\usepackage[
backend=bibtex,            % Use legacy bibtex backend (biber is better)
natbib=true,               % Load aliases for citation commands
citestyle=authoryear-comp, % Inline: Author year, compressed
maxcitenames=1,            % Inline: max 1 author name
bibstyle=authoryear,       % Bibliography: Author year
giveninits=true,           % Bibliography: first and middle name initials
dashed=false,              % Bibliography: no dash for recurrent authors
abbreviate=true,           % Bibliography: abbreviate
maxbibnames=100,           % Bibliography: all author names
sorting=nyt,               % Bibliography: sort by name, year, title
isbn=true,                 % Bibliography: print ISBN
url=false,                 % Bibliography: don't print URL
doi=true,                  % Bibliography: print DOI
eprint=false               % Bibliography: don't print eprint information
]{biblatex}

% Bibliography: print authors as "last name, first name"
\DeclareNameAlias{sortname}{family-given}

4.2.2. How to bold my name using BibLaTeX?

This will bold YOURFAMILYNAME in the bibliography but not in the citations.

% Bibliography: bold last name
\usepackage{ifthen}
\AtBeginBibliography{%
\renewcommand*{\mkbibnamefamily}[1]{%
  \ifthenelse{\equal{#1}{YOURFAMILYNAME}}{\textbf{#1}}{#1}}
}

4.3. Beamer

4.3.1. How to automatically add a section slide in Beamer?

Use AtBeginSection and AtBeginSubsection respectively to add a frame when a section or a subsection is created. Beamer has the built-in templates sectionpage and subsectionpage for section slides. You may modify them or simply create a new template.

\AtBeginSection{\frame{\sectionpage}}
\AtBeginSubsection{\frame{\subsectionpage}}

4.3.2. How to hide a frame in Beamer?

Use <beamer:0> to hide it in the presentation, use <handout:0> to hide it in the handout, use <handout:0|beamer:0> or simply <all:0> to hide it in both beamer and handout mode.

\begin{frame}<beamer:0>
  \frametitle{This frame won't be visible}
  Lorem ipsum
\end{frame}

4.3.3. How not to count appendix slides in Beamer?

Use \usepackage{appendixnumberbeamer} to automatically reset the frame counter when the \appendix command is called. No need to do this by hand :)

4.3.4. How to add a hyperlink to a Beamer frame?

Load \usepackage{hyperref}. Use \label{somename} or \hypertarget{somename} to set a target. Use \hyperlink{somename}{text to show} or \hyperlink{somename}{\beamerbutton{text to show}} to print a text or button linking to the target.

4.3.5. How to build a Beamer template?

Mix and match:

4.4. Graph diagrams

4.4.1. Graphviz

4.4.2. PGF/TikZ

Typically requires a bit more work, but it's more flexible and allows for neater graphs.

4.5. Reference management

5. Text editors

5.1. Emacs

5.1.1. How to input TeX style?

5.1.2. How to open all files by pattern?

emacs *.tex
emacs *.tex */*.tex
find . -iname '*.tex' -exec emacsclient {} +

5.1.3. How to highlight by word, line, or regex?

5.1.4. How to profile init / settings?

5.1.5. How to navigate back to part of a buffer?

  • Save point in register r C-x r SPC r (point-to-register), jump to position in register r C-x r j r (jump-to-register) (rpav at Libera.Chat)
  • Set a mark C-SPC C-SPC, then jump back C-u C-SPC (Desertcoffee at Libera.Chat)

5.1.7. How to have custom outlining and folding in Emacs?

bpalmer at Libera.chat

5.1.8. What are some good .emacs config files?

5.1.9. Emacs packages worth checking

5.1.10. Useful modes

5.2. Org-mode

5.2.1. What is the programming language identifier for a source block?

See Tables 1 and 2 in Babel: Languages.

5.2.2. How to export/tangle automatically on commit?

NB: --batch implies -q, hence initialization files are not loaded. Add -l ~/.emacs, -l ~/.config/emacs/init.el or similar if the export depends on them.

echo 'type emacs > /dev/null 2>&1 &&
emacs README.org --batch -f org-md-export-to-markdown --kill &&
git add README.md' >> .git/hooks/pre-commit && chmod +x .git/hooks/pre-commit

5.2.3. How to add a custom format to display a horizontal line in the buffer?

Set up a custom font lock format by calling the function font-lock-add-keywords.

(add-hook 'org-mode-hook
     (lambda ()
       (font-lock-add-keywords
        nil
        '(("^-\\{5,\\}"  0 '(:foreground "green" :weight bold))))))

5.2.4. How to improve the look of an org-mode buffer?

Use org-modern for a modern style via font locking and text properties. Consider also org-superstar, which has a reduced scope for headings and plain lists only.

5.3. Elisp

5.3.1. Where to start with elisp?

(info "(eintr) Top")

6. Programming

6.1. Programming fonts

  • Recommended fonts: Consolas, Andale Mono, Fira Code (ligatures), Source Code Pro, DejaVu Sans Mono
  • Use Illegal1 = O0 as a test
  • Play font tournament: Find Your True Love of Coding Fonts

Source: recommended fonts for programming?

6.2. Project structure

  • Prefer folder-by-feature over folder-by-type (source)
  • Organize tests by mirroring the source tree (source)
  • Typical structure (source)
    • /bin Files to be executed
    • /src Source files
    • /lib External dependencies
    • /doc Documentation, and more generally any kind of writing
    • /test All tests
    • /sandbox The fact that there's one is probably a bad sign, but there you go
  • Executable files
    • Go in the /bin folder
    • Start with the appropriate shebang, e.g., #!/usr/bin/Rscript
    • Are truly executable, e.g., chmod +x script.R
    • Have all paths set so that they can be called from the project root, e.g., ./bin/process-data.R

6.3. Functional programming

6.3.2. Pure functions

  • Requirements:
    • Referential transparency: always returns the same result if given the same arguments
      • Rely on their own arguments and immutable values only, e.g., a string
    • No dependence on random number generation, data files
    • Must have no side effects
      • No need to worry if used in multiple places, or down a deep call chain
    • No uncertainty about what an object name refers to
    • Easier to trace and debug
  • Immutable data structures
    • Recursion instead of for/while loops
    • Function composition instead of attribute mutation
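
As a minimal sketch of the requirements above, the following R snippet contrasts an impure function with a pure one. The names `counter`, `increment_impure`, and `increment_pure` are hypothetical and only serve the illustration.

```r
## Impure: reads and modifies a variable outside its own scope (side effect)
counter <- 0
increment_impure <- function() {
  counter <<- counter + 1
  counter
}

## Pure: the result depends on the arguments only and nothing else is modified
increment_pure <- function(counter) {
  counter + 1
}

first  <- increment_impure()  # 1
second <- increment_impure()  # 2, same call but a different result

pure_first  <- increment_pure(0)  # 1
pure_second <- increment_pure(0)  # 1, same arguments, same result
```

The pure version can be called anywhere in a call chain without tracking what happened to `counter` before.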

6.3.3. Functional design patterns

A program is a chain of monoids (which is a chain of continuations (which is a chain of partial applications (which is a chain of compositions (which is a chain of one-argument functions))))

  • Functions all the way down!
    • Functions as inputs
    • Functions as outputs
    • Functions as arguments
      • Hard-coded data, e.g., for i in 1:10
      • Hard-coded behavior, e.g., print
      • Decouple behavior from data, i.e., any behavior times any data
      • Collection functions, e.g., fold, map, reduce, collect
  • Composition pattern: chain low-level operations
    • e.g., apple -> cherry = apple -> banana + banana -> cherry
    • Low-level operation: e.g., string -> string
    • Service: chained low-level operations, e.g., Address -> Validation
    • Use-case: chained services, e.g., ChangeProfileRequest -> ChangeProfileResult
    • Application: chained use-cases, e.g., Request -> Response
    • Composition is fractal
    • Works for functions with one parameter only (see next)
  • Partial application

    • Write a two-parameter function as a one-parameter function that returns a one-parameter function, e.g., a + b becomes function(a) { function(b) { a + b } }.
    • Uses:
      • When working with vectorized applications, e.g., lapply, Map, Filter.
      • To inject dependencies, e.g., on a database or file connection.

  • Continuations: chain partial applications

    • Bad: nested checks (pyramid of doom)
    • Good:
      • Start with an abstract if_ function -> returnType
      • Chain them:
    if_(doA) else doNestedActionLevel1
    if_(doB) else doNestedActionLevel2
    if_(doC) else doNestedActionLevel3
    
  • Monoids: chain continuations
    • Closure: combining two things always returns another one thing.
      • Readily vectorized!
    • Associativity: when combining more than two things, pairwise combination order does not matter
      • Readily parallelized!
      • Easy for incremental accumulation
    • Identity: a special thing called "zero" that returns the same thing that was combined with it
      • Use as initial value for empty or missing data
  • Two-track model for error handling:
    • Use a switch function for error handling, e.g., validate(input) calls successFun(input) or failure(error)

From Functional Programming Patterns (NDC London 2014) by Scott Wlaschin.
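
The composition, partial application, and monoid patterns above can be sketched in base R as follows. This is an illustration only: `compose`, `add`, and `times` are hypothetical helpers, not functions from the talk or from any package.

```r
## Composition: chain one-parameter functions, applied left to right
compose <- function(...) {
  fs <- list(...)
  function(x) Reduce(function(acc, f) f(acc), fs, x)
}

## Partial application: a two-parameter function written as a
## one-parameter function that returns a one-parameter function
add   <- function(a) function(b) a + b
times <- function(a) function(b) a * b

add_one_then_double <- compose(add(1), times(2))
add_one_then_double(3)  # (3 + 1) * 2 = 8

## Vectorized use of a partially applied function
sapply(1:3, add(10))  # 11 12 13

## Monoid: `+` is closed and associative with identity 0, so Reduce can
## accumulate incrementally and still handle empty input
total       <- Reduce(`+`, c(1, 2, 3))    # 6
empty_total <- Reduce(`+`, numeric(0), 0) # 0, the identity
```

Because every piece is a one-parameter function, the same `compose` helper works at any level of the chain, which is the sense in which composition is fractal.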

6.3.4. General

7. R programming

7.1. Write better R

7.1.1. Non-standard evaluation

  • NSEs are like super duper macros

7.1.2. R is functional

  • Functions as arguments
  • Functions as ?
  • Functions as output

7.1.3. R has dynamic binding

Or something like that for functions, check

7.2. R programming

7.2.1. Programming with functions

  1. How to recover the name of the variable passed as argument?
      z <- 1:10
      f <- function(x) {
        varName <- as.character(as.list(match.call())$x) ## "z"
      }
    

    Source: lazy evaluation - Passing a variable name to a function in R

  2. How to get all function call parameters as a list?

    Call as.list(match.call())[-1] within your function

      mydummyfun <- function(x, y, main = NULL, ...) {
        l <- as.list(match.call())[-1]
        do.call(plot, l)
      }
    
      mydummyfun(1:10, 1:10, ylab = "Ex")
    

    Source: Get all Parameters as List

    If you want to pass all the arguments to a new function: use match.call to get the complete call, and inject the name of the new function, and evaluate the modified call.

      f1 <- function(a, b, c) { sum(a, b, c^2) }

      f2 <- function(a, b, c) { prod(a, b, c) }

      f  <- function(a, b, c) {
        fun <- f1
        if (a < 0)
          fun <- f2

        ## get the call (fun and args)
        thiscall      <- match.call(expand.dots = TRUE)
        ## change function name
        thiscall[[1]] <- fun
        ## evaluate new call same as fun(a, b, c)
        eval.parent(thiscall)
      }

      f(+1, 0, 5)
      f(-1, 0, 5)
    

    Source: Passing all arguments to another function

  3. How to add a hook to a function call?

    Use trace to execute an expression before and after a function call.

    f_before <- function() { print("Before call") }
    f_after  <- function() { print("After  call") }
    
    trace(base::cumsum, f_before, f_after, print = FALSE)
    cumsum(1:10)
    
    untrace(base::cumsum)
    cumsum(1:10)
    
  4. How to mimic function overloading?

    Unfortunately, I haven't found anything better than optional arguments. See for example ?xy.coords. In some situations, there might be better approaches.

    Source: R - Function overloading

7.2.2. Vectorization

  1. Why are for loops slow in R?

    for, :, [, [<- are all function calls and function calls can be time-consuming. The following loop is, in fact, a sequence of many function calls. Vectorized functions are written in C and are typically faster.

      N      <- 20
      fib    <- rep(NA, N)
      fib[1] <- 0
      fib[2] <- 1

      for (i in 3:N)
        fib[i] <- fib[i - 1] + fib[i - 2]
    

    Source: The Art of R Programming: A Tour of Statistical Software Design section 14.1.1

  2. Why is apply not fast?

    While lapply and others are implemented in C, apply is actually implemented in R and might not provide a high speedup.

    Source: The Art of R Programming: A Tour of Statistical Software Design section 14.1.1

  3. Which base R functions are vectorized?

    Not a comprehensive list

    • Math operators
      • +, -, etc
    • Logic
      • ==, !=, etc
    • Vectors
      • ifelse
      • which
      • where
      • any
      • all
      • cumsum
      • cumprod
      • abs
      • pmin
      • pmax
    • Matrix:
      • rowSums
      • colSums
      • lower.tri
      • upper.tri
    • All pairs
      • outer
    • All combinations
      • combn
      • expand.grid
  4. How to vectorize the functions and the arguments?

    Use mapply. Pass the vector of functions as an argument to an anonymous function that applies each function to the remaining arguments (credits: Fendur at libera.chat)

    funs <- c(function(x, y) x + y, function(x, y) x^2 * y)
    grd  <- expand.grid(fun = funs, x = 1:3, y = 1:4)
    with(grd, mapply(function(f, x, y) f(x, y), fun, x, y))
    

    Use ellipsis for functions taking different arguments

    funs <- c(function(x, y, ...) x + y, function(x, y, z) x^2 * y/z)
    grd  <- expand.grid(fun = funs, x = 1:3, y = 1:4, z = 1:2)
    with(grd, mapply(function(f, x, y, z) f(x, y, z), fun, x, y, z))
    

    In the case of two argument functions, the explicit expansion can be avoided using outer

    funs <- c(function(x, y) x + y, function(x, y) x^2 * y)
    sapply(funs, function(f) outer(1:3, 1:4, f))
    ## Using R >= 4.1
    sapply(funs, \(f) outer(1:3, 1:4, f))
    

7.2.3. How to create a sequence within groups?

x <- unlist(replicate(10, rep(sample(LETTERS, 1), rpois(1, 4))))
sequence(rle(x)$lengths)                  # if equal values are contiguous
unlist(sapply(unname(table(x)), seq.int)) # counts per value; aligned with sort(x), not x

Source: generate sequence within group in R
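
When the groups are interleaved and the result must stay aligned with the original order of `x`, `ave` is one base R option. This is a small sketch on a made-up vector, not part of the cited answer.

```r
x <- c("a", "b", "a", "a", "b", "c")

## Running index of each element within its own group, in the order of x
idx <- ave(seq_along(x), x, FUN = seq_along)
idx  # 1 1 2 3 2 1
```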

7.2.4. How to identify value changes in a sequence?

x <- unlist(replicate(10, rep(sample(LETTERS, 1), rpois(1, 4))))
head(cumsum(rle(x)$lengths)+1, -1)

Source: Identifying where value changes in R data.frame column

7.2.5. How to renumber a group?

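Note that as.numeric(as.factor(x)) numbers the groups by sorted factor level rather than by order of appearance, so it does not work if x contains numbers that are not already sorted. Also, the match-unique combo scales better with large vectors.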

x <- c(4, 4, 4, 6, 6, 6, 6, 8, 8, 8, 8, 1, 1, 1, 5, 5, 5, 5)
match(x, unique(x))
## [1] 1 1 1 2 2 2 2 3 3 3 3 4 4 4 5 5 5 5

Source: how to create a consecutive group number

7.2.6. How to make a cross table with a custom function (e.g., mean)?

addmargins(xtabs(value ~ ., aggregate(value ~ factor1 + factor2, DF, mean)),
           FUN = mean)

7.2.7. How to make lapply return a data.frame?

If x is a data.frame, which is a list with attributes, use x[] to preserve all attributes.

x[] <- lapply(x, type.convert, as.is = TRUE)

Source: Jiří Moravec

7.2.8. How to split a matrix row-wise as a list?

Use base::asplit for efficiency, where asplit(X, 1) and asplit(X, 2) return a list of rows and columns respectively.

X <- matrix(rnorm(100), nrow = 20)

## I just learned about asplit
asplit(X, 1)

## Original note
l <- as.list(as.data.frame(t(X)))

7.2.9. How to look up a value among possibilities?

The workhorse of any labeling function

l <- list(key1 = "value1", key2 = NA, key3 = 022)
lookup <- function(x, l) { unlist(l[x]) }

Source: benchmark unlist versus do.call(c, list) for list lookup in R

7.2.10. How to remove columns with all NA fast?

General approach with Base R only

Filter(function(x)!all(is.na(x)), df)

Via data.table for general time and memory efficiency (40% faster in example)

DT[, which(unlist(lapply(DT, function(x)!all(is.na(x))))), with = FALSE]

Source: remove columns from dataframe where ALL values are NA

7.2.11. How to fast apply a function over a ragged array?

Use unlist(lapply(split(x, f), FUN)) for speed, but consider tapply(x, f, FUN) for readability.

## Unit: microseconds
##     expr      min        lq      mean    median        uq       max neval
## f0(x, f)  409.578  415.5165  426.9041  418.9500  424.3400  4237.466 10000
## f1(x, f)  411.208  418.3095  430.4231  421.8645  427.4155  5550.120 10000
## f2(x, f)  474.681  487.1265  498.8509  492.6960  497.6360  2552.075 10000
## f3(x, f) 1395.582 1442.3785 1494.0197 1459.3515 1472.5205 28121.379 10000
set.seed(1)

x <- rnorm(10000)
f <- factor(rpois(10000, 5))

tapply_ <- function(x, f, FUN) { unlist(lapply(split(x, f), FUN)) }

f0 <- function(x, f) { unlist(lapply(split(x, f), mean)) }
f1 <- function(x, f) { tapply_(x, f, mean) }
f2 <- function(x, f) { tapply(x, f, mean) }
f3 <- function(x, f) { by(x, f, mean) }

bench <- microbenchmark::microbenchmark(
           f0(x, f), f1(x, f), f2(x, f), f3(x, f),
           times = 1E4)

print(bench)

7.2.12. How to fast subset rows corresponding to max value by group?

## Row with maximum `g` for each group `id` in the `bdt` data.table
bdt[bdt[, .I[g == max(g)], by = id]$V1]

Source: subset rows corresponding to max value by group using data.table

7.2.13. How to create a named vector programmatically in one statement?

out <- setNames(c("value1", "value2"), c("name1", "name2"))

Source: create a numeric vector with names in one statement?

7.2.14. How to get all function call arguments as a list?

Including ellipsis also!

f <- function(a, b = 2, ...) { c(as.list(environment()), list(...)) }

Source: get all Parameters as List

7.2.15. How to debug an error thrown in a package?

options(error = recover, show.error.locations = TRUE, warn = 2)

Source: debugging unexpected errors in R – how can I find where the error occurred?

7.2.16. How to compute a simple moving average with base R?

SMA <- function(x, K) {
  if(!(K %% 2)) stop("K must be odd")
  rowMeans(embed(c(rep(NA, K %/% 2), x, rep(NA, K %/% 2)),  K), na.rm = TRUE)
}

Source:

7.2.17. How to set the seed locally for a function?

Note: this needs validation.

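myfun <- function(seed) {
  ## .Random.seed does not exist until the RNG is first used in the session
  if (!exists(".Random.seed", envir = .GlobalEnv)) set.seed(NULL)
  old <- get(".Random.seed", envir = .GlobalEnv)
  on.exit(assign(".Random.seed", old, envir = .GlobalEnv))
  set.seed(seed)
  ## ... code that uses the random number generator goes here ...
}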
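
One way to validate the idea is to check that the global random stream continues undisturbed after the seeded call. The sketch below wraps the save/restore logic in a hypothetical helper, `with_local_seed`, which is an assumption for illustration and not an existing base R function.

```r
## Hypothetical helper: evaluate `expr` under a fixed seed and then restore
## the state of the global random number generator
with_local_seed <- function(seed, expr) {
  had_seed <- exists(".Random.seed", envir = globalenv(), inherits = FALSE)
  if (had_seed) old <- get(".Random.seed", envir = globalenv())
  on.exit({
    if (had_seed) {
      assign(".Random.seed", old, envir = globalenv())
    } else {
      rm(".Random.seed", envir = globalenv())
    }
  })
  set.seed(seed)
  expr  # lazily evaluated here, i.e., after set.seed
}

## Reference: two consecutive draws from the global stream
set.seed(1)
expected <- runif(2)

## Same stream, interrupted by a locally seeded draw
set.seed(1)
u1 <- runif(1)
local_draw <- with_local_seed(123, rnorm(1))
u2 <- runif(1)  # continues the global stream as if nothing happened
```

If the restore works, `c(u1, u2)` equals `expected` and repeated calls with the same seed return the same `local_draw`.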

7.2.18. Visualization

  1. How to draw a plot with minimal margins?
    ## oma: Outer  = device margin lines (bltr)
    ## mar: Margin = figure margin lines (bltr)
    ## mgp:      ? = axis margin lines (title, label, line)
    
    ## No title
    opar <- par(
      oma = c(0, 0, 0, 0) + .1,
      mar = c(3, 3, 0, 0),
      mgp = c(2, 1, 0)
    )
    
    plot(x = 1:10, y = 1:10)
    
    ## No title nor axis labels
    opar <- par(
      oma = c(0, 0, 0, 0) + .1,
      mar = c(2, 2, 0, 0),
      mgp = c(2, 1, 0)
    )
    
    plot(x = 1:10, y = 1:10)
    
  2. How to check if a value is in an interval?
    ## x <- 1
    ## confint <- c(-0.5, 0.5)
    (prod(sign(confint - x)) < 0)
    

    Source: rickyrick at libera.chat

  3. How to cache a read function?

    Use memoise to store the result of the function call in memory, or an ad-hoc implementation like here (Colombo at Libera.chat).
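
An ad-hoc cache can be sketched with a closure and an environment in base R. The helper `cached_reader` and the counter `n_reads` are hypothetical names for this illustration; this is not the implementation referred to above, and it does not notice when the file changes on disk.

```r
## Hypothetical ad-hoc cache: wrap a read function in a closure that stores
## results in an environment keyed by the file path
cached_reader <- function(read_fun) {
  cache <- new.env(parent = emptyenv())
  function(path) {
    if (!exists(path, envir = cache, inherits = FALSE)) {
      assign(path, read_fun(path), envir = cache)
    }
    get(path, envir = cache, inherits = FALSE)
  }
}

tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(a = 1:3), tmp, row.names = FALSE)

n_reads <- 0
slow_read <- function(path) {
  n_reads <<- n_reads + 1  # count the actual disk reads
  read.csv(path)
}

read_cached <- cached_reader(slow_read)
d1 <- read_cached(tmp)  # reads from disk
d2 <- read_cached(tmp)  # served from the cache
```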

  4. How to use geom_tile with irregular data?

    Note that ?geom_tile recommends akima::interp.

    x <- mtcars$hp ## x-axis
    y <- mtcars$qsec ## y-axis
    z <- mtcars$mpg ## surface color
    
    ak <- akima::interp(x, y, z)
    DF <- data.frame(expand.grid(ak[1:2]), z = c(ak[[3]]))

    library(ggplot2)
    ggplot(DF, aes(x, y, fill = z)) +
      geom_tile()
    

    If the data size is large, constructing the data.frame as follows might be more efficient.

    DF   <- expand.grid(ak[1:2])
    DF$z <- c(ak[[3]])
    
  5. How to remove white lines from geom_tile with ggplot2?

    The horizontal and vertical variable values should be equally spaced for geom_tile to work automatically. If there are white lines, there might be small inconsistencies in the gap between a few values (e.g., the first value is 1E-4 instead of an actual zero). Try round(x) or factor(x) for a quick fix.

    Source:

  6. How to plot in reverse log scale with ggplot2?
    #' Reverse log transformation
    #'
    #' @param base a positive or complex number: logarithm base.
    #' Defaults to `e=exp(1)`.
    #' @return A transformation object created by `scales::trans_new`.
    #' @references https://gist.github.com/JoFrhwld/2266961
    .revlog_trans <- function(base = exp(1)){
      scales::trans_new(
                name      = paste("revlog-", base, sep = ""),
                transform = function(x){ -log(x, base) },
                inverse   = function(x){ base^(-x) },
                breaks    = scales::log_breaks(base = base),
                domain    = c(1e-100, Inf)
              )
    }
    
    scale_x_revlog10 <- function(...) {
      scale_x_continuous(trans = .revlog_trans(base = 10), ...)
    }
    
    scale_y_revlog10 <- function(...) {
      scale_y_continuous(trans = .revlog_trans(base = 10), ...)
    }
    
  7. How to add labels near the plot boundaries with ggplot2?

    Use -Inf and Inf to signal the left/bottom and right/top end respectively, e.g., use x=Inf and y=Inf to place a geom_label in the north-east corner.

    ggplot() +
      geom_point(aes(x = 1:10, y = rnorm(10))) +
      geom_label(aes(x = Inf, y = Inf, label = "Some text"),
                 vjust = 1, hjust = 1)
    

    Source:

  8. How to move the legend closer to the axis label with ggplot2?

    Give yourself the gift of the manual fine tuning using theme(legend.margin = margin(-10, 0, 0, 0)), where -10 needs to be defined on a case-by-case basis.

    Source:

  9. How to make ggplot2 match LaTeX graphics and font size?

    Reasonable sizes are 11pt (3.87mm) for manuscripts, 24pt (8.44mm) or 25pt (8.79mm) for posters read at 1m of distance.

    • Use \show\f@size in LaTeX (between \makeatletter and \makeatother) to show the font size
    • Use theme(text = element_text(size = 11)) in R to set the target font size in points
    • Use \the\linewidth in LaTeX to show the line width
      • Note that \textwidth ignores borders that could be defined in the document class
    • Use ggsave(..., width = 10, units = "in") to match LaTeX's line width
    • Include the figure with width=\linewidth

    Source:
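
The unit conversions behind these numbers can be sketched as follows, assuming TeX points (72.27 pt per inch). The line width of 345 pt is an illustrative value only; use whatever \the\linewidth reports for your document.

```r
## TeX points per inch and millimeters per inch
pt_per_in <- 72.27
mm_per_in <- 25.4

## Font sizes in points converted to millimeters
pt_to_mm <- function(pt) pt * mm_per_in / pt_per_in
font_mm  <- round(pt_to_mm(c(11, 24, 25)), 2)
font_mm  # 3.87 8.44 8.79

## Line width reported by \the\linewidth (in pt) converted to inches,
## i.e., the value to pass as ggsave(..., width = width_in, units = "in")
linewidth_pt <- 345  # illustrative value, not from any particular document
width_in     <- linewidth_pt / pt_per_in
```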

  10. How to plot in log scale with base R?
    plot(exp(1:10), 1:10, log = "x")
    plot(1:10, exp(1:10), log = "y")
    plot(exp(1:10), exp(1:10), log = "xy")
    
  11. How to fine tune R plot margins?
  12. How to make beautiful plots with base R?

7.2.19. How to improve the look of my rmarkdown HTML document?

Thanks to fendur on #R at libera.chat

  • Client-side, precomputed dashboard-like document:
    • Idea: one chapter contains an image plus some comments
    • Start with html_document
    • Use .tabset-pills to organize chapters
    • Use .tabset-fade to make switching smoother
  • Pimp my RMD: a few tips for R Markdown

Below is a quick template I prepared.

---
title: "Barebone dashboard"
date: "`r Sys.Date()`"
output: html_document
---

<style type="text/css">
.main-container {
   max-width: 100% !important;
}
.title, .author, .date {
  display:inline!important;
}
.nav-pills {
  line-height: 0px !important;
}
.h1 {
 font-size: 16px !important;
}
</style>

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  echo       = FALSE,
  fig.align  = "center",
  fig.width  = 16,
  fig.height = 8.1,
  out.height = "80%"
)
```

# {.tabset .tabset-fade .tabset-pills}

## River flow per year

```{r}
plot(Nile, col = "darkgreen", lwd = 2, log = "y")
title(main = "Annual flow of the river", adj = 1, line = .5)
```

1. This is a full-page wide picture
2. Line 2
3. Line 3
4. Line 4
5. Line 5

## MPG explained

```{r}
par(mfrow = c(1, 2))
plot(mtcars$mpg, mtcars$drat)
plot(mtcars$mpg, mtcars$wt)
```

1. This is a full-page side-by-side picture
2. Line 2
3. Line 3
4. Line 4
5. Line 5

## Species and width

```{r fig.width = 8.1, fig.height = 8.1}
pairs(iris[, 1:3], col = iris$Species, bg = iris$Species, pch = 21)
```

1. This is a full-page square picture
2. Line 2
3. Line 3
4. Line 4
5. Line 5

7.2.20. Where can I find useful addins for RStudio?

7.3. RStudio

7.3.1. Should .Rproj files be added to .gitignore?

RStudio's general recommendation is to include the .Rproj file in the repository, i.e., do not ignore it.

Source: should *.Rproj files be added to .gitignore?

7.3.2. How to order lines alphabetically?

?

7.3.3. How to align assignment or equal symbols across lines?

7.4. Emacs Speaks Statistics

7.4.1. How to navigate code and jump around?

Use ess-build-tags-for-directory (C-c C-e t and C-c C-e C-t) to build the tags, then xref-find-definitions (M-.) to look up a definition via Xref.

Source: How to jump to R source code with ESS?

8. Stan programming

8.1. Stan

8.1.1. How do I choose between transformed parameters vs model blocks?

  • Variables in the transformed parameters block

    1. Can have constraints for error checking
    2. Are global
      1. Can be accessed from the generated quantities block
      2. Are part of the output by default

9. Data science

9.1. Datasets

These are hosted by other people. Use the wayback machine or web archive for dead links.

9.1.3. Spatial data

9.2. Data backup

9.2.1. What backup software should I use?

  • Borg: Deduplicating archiver with compression and encryption

9.2.2. How to mark a folder not to be backed up?

9.3. Data processing

9.3.1. GNU Make

9.3.2. Pipeline automation

9.3.3. How to view tabular data?

9.3.4. How to work with collaborators who use spreadsheets?

9.3.5. ETL cycle

  1. Cycle initiation
  2. Build reference data
  3. Extract (from sources)
  4. Validate
  5. Transform (clean, apply business rules, check for data integrity, create aggregates or disaggregates)
  6. Stage (load into staging tables, if used)
  7. Audit reports (for example, on compliance with business rules. Also, in case of failure, helps to diagnose/repair)
  8. Publish (to target tables)
  9. Archive

Source: Real-life ETL cycle

  1. ETL Template
    • Type of tasks
      • Data loading
        • Keep many small units as principle
      • Data munging
        • Munge many small units in parallel
      • Data processing – computationally intensive PEA
        • Run tasks that need many data units (prepare)
        • Run tasks that need one data unit only in parallel (execute)
          • One script with command argument line?
          • Makefile call the script?
          • One shell script to call makefile?
          • TODO This needs to be figured out
        • Run tasks that need many data units (aggregate)
      • Data storing
        • Write many small units to disk
        • Write aggregates for each small data unit or together?

9.4. A typical workflow

Scientific data generation, collection, curation and processing.

9.5. Data visualization

9.5.1. What colors are recommended for data visualization?

  • 1 color:
    • for points: black #000000
    • for lines: either black #000000 or honolulu blue #2271B2
    • for filled areas: summer sky #3DB7E9
    • for surfaces: TBD
  • 2 colors:
    • for points: TBD
    • for lines: honolulu blue #2271B2 and gamboge #E69F00
    • for filled area: honolulu blue #2271B2 and gamboge #E69F00
  • 3 colors:
    • for points: TBD
    • for lines: TBD
    • for filled areas: TBD

Source:

9.5.2. Color palette

9.5.3. Resources

9.6. High performance computing

9.6.1. How to nicely list all my jobs in Slurm?

watch -d squeue --me --format=\"%.18i %.18P %.32j %.8u %.2t \| %.10M \| %.10l \| %.19S \| %.6D %20Y %R\" --sort=i

9.6.2. How to list previously submitted jobs in Slurm?

Use the sacct command, possibly with -S starttime and -E endtime arguments. It also accepts the --format argument.

sacct -S now-1days

Source:

9.6.3. How to find out the CPU time and memory usage of a Slurm job?

Use seff jobid after the job has finished. Use sacct -s r jobid or sstat jobid for jobs in progress.

Source:

9.6.4. How to specify Slurm job resources conditionally or programmatically?

Short answer: you can't if you're using #SBATCH tokens. Consider creating a command called via sbatch instead, where you can pass modifiers and arguments to the sbatch as needed (e.g., sbatch --mem=$mymem).

Source:

Alternatively, you can pass the script through stdin, replacing the positional arguments.

Source:

9.6.5. What is a good guide for Slurm?

This is an excellent guide: Introducing Slurm | Princeton Research Computing. It covers how to submit serial jobs, multithreaded jobs, multinode or parallel MPI jobs, multinode, multithreaded jobs, job arrays, running multiple jobs in parallel as a single job, and running a sequence of jobs (job dependencies).

9.7. Naming convention

9.7.1. Common guidelines

  • Name things mainly for their role, sometimes for their type
  • Variable names follow mathematical notation only in low-level functions, use meaningful names for user-facing variables
  • Use unabbreviated verbs for function names
  • Keep abbreviations between 3 and 4 chars

9.7.2. Keywords

Modeling

ou Observational unit
xu Experimental unit
sim Simulated/simulations
rsts Restarts
init Initialize/initialization
pars A vector of parameters
fix Fixed values
kwn Known values
val Validation
opt Optima (obtained with optimization)
fit Fit
prd Predicted

Cross-validation

oos Out of sample
ins In sample
ios In and out of sample
idx Index
ind Indicator (boolean, true or false)
fold Fold (as in K-fold CV)

Object types

Str String
Ls List
Li List item
Mat Matrix
Vec Vector
Fun Function
DF Data frame / data.table
LDF Long data.frame / data.table
WDF Wide data.frame / data.table (might have more than one)
xi A scalar (must be a scalar)
xs A vector (must be a vector)
x A scalar and/or a vector (both should work)
X A vector or matrix (both should work)
Bit A scalar, vector, or matrix bitmask

Transformations

sc Scaled
uc Unscaled? maybe `nt` for natural? `og` for original?
svd guess what :)
pca idem

Other

id Identification code
ET Elapsed time
seed Seed number
hash For a hash (if it's an id, use id instead)

Statistics

med Median
mean Mean
var Variance
cov Covariance
cor Correlation
wt Weight (or ws and wi)
lsc Length-scale
qi Quantile
qs Vector of quantiles
sigma2 Use 2 instead of Sq o.o
ll Log-likelihood
hess Hessian matrix
mse Mean square error
rmse Root mean square error

Measurement units

Bm Biomass
KgHa Kilogram per hectare
MgHa Megagram per hectare
BuAc Bushels per acre
Temp Temperature
El Elevation

10. Statistics

10.1. Statistics

10.1.1. What is the Karhunen–Loève theorem?

Let \(X_t\), \(t\in[a,b]\), be a centered stochastic process, i.e., \(\mathrm{E}[X_t] = 0\) for \(t\in[a,b]\). Assume the process satisfies a technical continuity condition. Then, we have

\begin{equation} X_t = \sum_{k=1}^{\infty} Z_k e_k(t) \end{equation}

where \(Z_k\) are pairwise uncorrelated random variables and \(e_k\) are continuous real-valued functions on \([a,b]\) that are pairwise orthogonal in \(L^2([a,b])\). If the process is Gaussian, then \(Z_k\) are Gaussian and stochastically independent.

Given any orthonormal basis \(e_k(t)\) of \(L^2([a,b])\), we can approximate the stochastic process with

\begin{equation} \hat{X}_t = \sum_{k=1}^{K} A_k\,e_k(t),\ A_k = \int_a^b X_t\,e_k(t)\,\mathrm{d}t ,\, K\in\mathbb{N} \end{equation}

The Karhunen–Loève expansion minimizes the total mean square error resulting from its truncation.

Source: Kosambi–Karhunen–Loève theorem - Wikipedia

10.1.2. What is the Karhunen–Loève decomposition of a Wiener process?

Let \(W_t\) be a Wiener process, i.e., a centered Gaussian process with covariance function \(K_{W}(t,s)=\operatorname {cov} (W_{t},W_{s})=\min(s,t)\). Then, the expansion consists of sinusoidal functions

\begin{align} e_{k}(t) &={\sqrt{2}}\sin\left(\left(k-{\tfrac{1}{2}}\right)\pi t\right) &\text{eigenfunctions}\\ \lambda_{k} &=\frac{1}{(k-{\frac{1}{2}})^{2}\pi^{2}} &\text{eigenvalues} \end{align}

Source: Kosambi–Karhunen–Loève theorem - Wikipedia
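
As a numerical sanity check, the truncated sum \(\sum_k \lambda_k e_k(s) e_k(t)\) should approach \(\min(s,t)\). The sketch below assumes the interval \([0,1]\) and uses hypothetical helper names (`e_k`, `lambda_k`, `kl_cov`); it also draws one approximate sample path from the truncated expansion.

```r
## Eigenfunctions and eigenvalues of the Wiener process on [0, 1]
e_k      <- function(k, t) sqrt(2) * sin((k - 0.5) * pi * t)
lambda_k <- function(k) 1 / ((k - 0.5)^2 * pi^2)

## Truncated sum over k of lambda_k * e_k(s) * e_k(t), approximating min(s, t)
kl_cov <- function(s, t, K = 5000) {
  k <- seq_len(K)
  sum(lambda_k(k) * e_k(k, s) * e_k(k, t))
}

kl_cov(0.3, 0.7)  # close to min(0.3, 0.7) = 0.3

## One approximate sample path: W_t is about sum_k sqrt(lambda_k) Z_k e_k(t)
set.seed(1)
K  <- 200
Z  <- rnorm(K)
tt <- seq(0, 1, length.out = 101)
W  <- sapply(tt, function(t) sum(sqrt(lambda_k(1:K)) * Z * e_k(1:K, t)))
```

The eigenvalues decay like \(1/k^2\), so a few hundred terms already give a visually reasonable path.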

10.1.3. Principal Component Regression

Let \(\mathbf{T} = \mathbf{X} \mathbf{W}\) for principal component score matrix \(\mathbf{T}\) and loading matrix \(\mathbf{W}\). Set the models \(Y = \mathbf{X} \mathbf{\beta} + \mathbf{\varepsilon}\) and \(Y = \mathbf{T} \mathbf{\beta}_T + \mathbf{\varepsilon}\). Then, \(\mathbf{X} \mathbf{\beta} = \mathbf{T} \mathbf{\beta}_T \iff \mathbf{\beta} = \mathbf{W} \mathbf{\beta}_T\).
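
The identity can be checked numerically on simulated data. This sketch assumes that all principal components are retained and that \(\mathbf{X}\) is centered (as `prcomp` does by default), so only the slopes are compared; the data and variable names are made up for the illustration.

```r
set.seed(1)
n <- 50
X <- matrix(rnorm(n * 3), nrow = n)
y <- drop(X %*% c(1, -2, 0.5)) + rnorm(n)

pc     <- prcomp(X, center = TRUE, scale. = FALSE)
W      <- pc$rotation  # loading matrix
scores <- pc$x         # score matrix T, computed from the centered X

beta   <- coef(lm(y ~ X))[-1]       # slopes on the original variables
beta_T <- coef(lm(y ~ scores))[-1]  # slopes on the principal components

## With all components retained, beta equals W %*% beta_T
beta_from_T <- drop(W %*% beta_T)
```

Dropping trailing components before the back-transformation is what turns this identity into an actual principal component regression.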

10.1.5. How to find which variables are collinear?

Look at the tail of the QR decomposition pivot vector (rickyrick at libera.chat)
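
A minimal sketch with base R's default `qr` (which moves columns it judges linearly dependent to the end of the pivot vector) on made-up data:

```r
set.seed(1)
a <- rnorm(20)
b <- rnorm(20)
## Column c is an exact linear combination of a and b
X <- cbind(a = a, b = b, c = a + b, d = rnorm(20))

qrX       <- qr(X)
n_dropped <- ncol(X) - qrX$rank
collinear <- colnames(X)[tail(qrX$pivot, n_dropped)]
collinear  # "c"
```

Note that the pivot only flags the columns that were pushed to the end; which column of a collinear set gets flagged depends on the column order.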

10.1.6. What classification model to use for observations in a map?

  • Conditional random field: considers the context (neighborhood) of an observation when predicting a label
  • Markov random field: finite or infinite undirected graphical model that can do cyclic dependencies but can't do induced dependencies

10.1.7. How to compute the Cressie and Hawkins (1980) robust estimator?

Cressie and Hawkins (1980) found that the fourth root of a \(\chi^2_1\) random variable has a skewness of 0.08 and a kurtosis of 2.48 (compared with 0 and 3 for the Gaussian distribution). Estimates of location, such as the mean and the median, can then be applied to sqrt(X), where X is the absolute difference between two observations. Finally, these estimates can be raised to the 4th power and adjusted for bias. Consider the square-root-differences cloud for visualization.
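
The steps above can be sketched for a single lag as follows. The function names are hypothetical, and the bias correction 0.457 + 0.494/N is the commonly cited form of the estimator, stated here as an assumption to be checked against the original paper.

```r
## Hypothetical helper: `d` holds the differences Z(s_i) - Z(s_j) for all
## pairs of locations separated by (approximately) the same lag h
ch_semivariance <- function(d) {
  n <- length(d)
  ## mean of square-root absolute differences, raised to the 4th power,
  ## bias-adjusted, and halved to go from variogram to semivariogram
  mean(sqrt(abs(d)))^4 / (0.457 + 0.494 / n) / 2
}

## Classical (method of moments) estimator for comparison
classical_semivariance <- function(d) mean(d^2) / 2

## A single outlying difference inflates the classical estimate much more
set.seed(1)
d <- c(rnorm(100), 50)
robust    <- ch_semivariance(d)
classical <- classical_semivariance(d)
```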

10.1.8. When should I apply log transformation to positive data?

Some heuristics that I have heard here and there.

  • Recall that linearity, additivity, and constant variance are key assumptions for a linear model
  • Are you analyzing multiple responses separately? Do some responses benefit from the log transformation? Do you want to apply the same transformation to all the responses to avoid "p-hacking"?
  • Are the changes on the responses happening on a relative scale?
  • "Log transform, kids. And don’t listen to people who tell you otherwise."

Source:

10.1.9. When should I visualize positive data in log scale?

  • If the range(x) = max(x) - min(x) is equal to or larger than 10 (Jarad)
  • Similarly, if the ratio(x) = max(x) / min(x) is equal to or larger than 10
  • If the plot is more loaded to the left of 1 than to the right (Jarad), especially for ratios.

10.1.10. How to transform a covariate taking non-negative values with zeroes?

Consider a covariate (AKA independent variable, regressor, predictor) with true zeroes observed.

10.1.11. What is the difference between cross-validation and the WAIC?

For hierarchical models, WAIC estimates predictive performance for a new observation from an existing group whereas cross-validation estimates the predictive performance for a new observation from a new group.

10.1.12. How to do positivity-preserving interpolation?

10.1.13. What are some good, general rules on reporting?

Here is some general guidance, which of course may not be the best fit for some specific situations

  • Clarify the research question
  • Focus on estimates, confidence intervals, and clinical relevance
  • Carefully account for missing data
  • Do not dichotomise continuous variables
  • Consider non-linear relationships
  • Quantify differences in subgroup results
  • Consider accounting for clustering
  • Interpret \(I^2\) and meta-regression appropriately
  • Assess calibration of model predictions
  • Carefully consider the variable selection approach
  • Assess the impact of any assumptions
  • Use reporting guidelines and avoid overinterpretation

Source

10.1.15. MCMC

  • One long run in MCMC: If you can't get a good answer with one long run, then you can't get a good answer with many short runs either.

10.1.16. Neural networks

10.2. Gaussian processes

10.2.1. Is the mean function of a Gaussian process uninteresting?

  • Universal kernels should be able to approximate any continuous functions on a compact subset, making a mean function nonessential
  • The mean function may not live in the space of functions being modeled
  • A mean function can help with extrapolation when using a nonperiodic kernel

Source: Why is the mean function in Gaussian Process uninteresting?

10.2.2. Is the derivative of a Gaussian process another Gaussian process?

Since differentiation is a linear operator, the derivative of a Gaussian process is another Gaussian process.

Source:

10.2.3. Is the integral of a Gaussian process another Gaussian process?

If all finite linear combinations \(\sum_i a_i X_{t_i}\) are Gaussian and the process is continuous, \(Y_t = \int_0^t X_s \, \mathrm{d}s\) is a Gaussian process.

10.2.4. Is the convolution of a Gaussian process another Gaussian process?

The convolution of a Gaussian process, as a linear combination of Gaussian random variables, remains a Gaussian process.

10.2.5. What are the main limitations of a Gaussian process?

  • Computationally impractical for large data sets: inference requires the inversion of an \(N\times{}N\) covariance matrix, which is \(O(N^3)\)
  • Covariance function is commonly assumed to be stationary, limiting modeling flexibility (e.g., noise variance is different in different parts of the input space, the function has a discontinuity)

11. Mathematics

11.1. Math

11.1.1. Calculus

  1. Resources for hard calculus
  2. Multivariable calculus
  3. How to choose \(u\) and \(dv\) in integration by parts?

    We pick \(u\) and \(dv\) from an expression of the form \(\int f(x) g(x) dx\). We need to differentiate \(u\) and integrate \(dv\).

    • Consider \(dv = f\) for \(f\) the most complex expression in the integrand among all those with known integral
    • Consider \(u = f\) if \(\int f dx\) is not known
    • Consider \(u = f\) when \(df\) is nicer to work with than \(\int f\)
    • Typical choices are \(u = f\) when \(f\) is, in decreasing order of preference, a logarithmic, inverse trigonometric, algebraic, trigonometric, or exponential function (the LIATE rule).

    Source:
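
As a worked example of the choices above (a standard textbook integral, not taken from the source), consider \(\int x e^{x} dx\). The algebraic factor \(x\) simplifies under differentiation and \(e^{x}\) has a known integral, so set \(u = x\) and \(dv = e^{x} dx\), which gives \(du = dx\) and \(v = e^{x}\).

\begin{equation} \int x e^{x}\,dx = u v - \int v\,du = x e^{x} - \int e^{x}\,dx = (x - 1)e^{x} + C \end{equation}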

11.1.3. Linear algebra

11.1.4. Optimization

  • GENO: optimization solver code generator

11.1.5. Fourier basis functions

An orthonormal basis for \(L^2([0,1], \mathbb{R})\) is \(1, \sqrt{2}\cos(2\pi nx), \sqrt{2}\sin(2\pi nx)\) for \(x\in[0, 1]\) and \(n = 1,2,\dots\)

Source:
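
Orthonormality can be checked numerically for a few basis functions. The sketch below approximates the \(L^2([0,1])\) inner product with a midpoint rule; `inner` and the function names are hypothetical and only for illustration.

```r
## Inner product on L^2([0, 1]) approximated with the midpoint rule
inner <- function(f, g, M = 1000) {
  x <- (seq_len(M) - 0.5) / M
  mean(f(x) * g(x))
}

one  <- function(x) rep(1, length(x))
cos1 <- function(x) sqrt(2) * cos(2 * pi * 1 * x)
sin1 <- function(x) sqrt(2) * sin(2 * pi * 1 * x)
cos2 <- function(x) sqrt(2) * cos(2 * pi * 2 * x)

inner(cos1, cos1)  # approximately 1 (unit norm)
inner(cos1, sin1)  # approximately 0 (orthogonal)
inner(cos1, cos2)  # approximately 0 (orthogonal)
inner(one, cos1)   # approximately 0 (orthogonal)
```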

11.1.6. Computer Assisted Algebra (CAS)

As with any software, for Computer Assisted Algebra I try to use open source software as much as possible. I keep the use of Wolfram Alpha and Wolfram Mathematica to a minimum.

12. Science

12.1. Scientific communication

English is weird but it can be understood through thorough thought though. Communication is as hard as it gets. There's a trade-off between message length, precision, ease of reading, and understandability. I strive for short and precise sentences that are easy to digest, modular, and yet feel complete. Here are a couple of notes for better writing.

12.1.1. General resources

  • The Elements of Style by Strunk, W., Jr. and White, E.B. An opinionated style guide.
  • The Chicago Manual of Style (15th ed.), Chicago: University of Chicago Press.
  • Garner's Modern American Usage by Bryan A. Garner.
  • English for Writing Research Papers by Adrian Wallwork
  • proselint: whisper suggestions on how to improve your prose, see what it looks for.

12.1.2. Academic phrases

12.1.3. No adjectives allowed

  • Good, best, better: in what sense?
  • Difficult: what are you trying to say more precisely? What's the complexity?

12.1.4. Use technical terms

  • List of technical names used frequently and short sentences
  • Emulator, surrogate, meta-model, statistical model versus computer experiment, computer model, mechanistic model, physical model, climate model, what have you
  • Significant(ly) has a technical connotation; use considerable or considerably if one wants to avoid such a connotation
  • A vector/scalar(-valued) function of a vector/scalar variable (greenbagels at Libera.Chat). A function of a vector variable, associated with multivariate calculus, depends on the domain of the function. A vector-valued or scalar-valued function, associated with vector calculus, depends on the range of the function.

12.1.5. Repeat technical terms, no synonyms

  • Prefer repeating a technical term in the same sentence, or a sequence of sentences, rather than using synonyms that are not defined.

12.1.6. Repeat the technical terms when in doubt

Yeah, it sounds weird to be repetitive and I sometimes feel that I keep repeating myself too much. As a reader, though, I sometimes wish the writer would be more precise even at the price of repetition. Even though a section might be about a specific density function (e.g., predictive density), make sure that in every sentence it is obvious that you are referring to that density and not to any of the dozen densities that a model has and that are highly connected to the density you are talking about.

12.1.7. Attributive nouns

They are tough to beat. In most situations, I prefer an extensive use of attributive nouns. Beware that they grow too big too often; I sometimes break a really long chain of nouns into two smaller attributive noun phrases.

Compare the benefit of the parametrization that we propose for this model versus the proposed model parametrization benefit.

12.1.8. Prefer short precise terms over phrases

  • multiple instead of more than one

12.1.9. Getting determiners right

12.1.10. Double negation is not an affirmation

Litotes help moderate an adjective, e.g., not unlike vs like, not uncommon vs common, non-trivial vs complex. "Not unlike" is slightly different than saying "like" much like saying "I love apples" is not the same thing as saying "I don't hate apples."

Source:

12.1.11. Verb tenses

I am sometimes unsure about the right verb tense for a sentence. When I read, I often see authors changing tenses, perhaps with good reason. Here are a few thoughts.

  • Active voice
    • I don't have a reference to use, but I've seen a pushback against the traditional use of passive voice. It's probably a good idea to stick to either active or passive voice throughout the manuscript.
  • Present tense
    • Facts that are always true, e.g., a Gaussian process is a probabilistic model
  • Present perfect
    • Things that have been done recently, e.g., Author (year) has recently introduced a novel framework
  • Past tense
    • Things that were done in the past, e.g., Author (year) built a parametric function

12.1.12. My own common tics to grep for

  • Hyphens are used when two or more adjectives or an adjective and a noun together modify another noun; for example, goodness-of-fit test is the equivalent of test for goodness of fit.
  • Most words with prefixes such as sub, non, pre, post are not hyphenated, for example: subtable, subspace (not sub-space), nonnormal, nonlinear, premultiply, postgraduate.
  • No hyphen when using the -wise suffix, e.g., elementwise instead of element-wise
  • Closed compound nouns, e.g., metadata instead of meta data
  • Prefer "Considering/As/Since subject verb" over "Considering that subject verb" (mdogg at libera.chat)
  • squared exponential is grammatical, square exponential is not
  • y is quadratic in x or y is a quadratic function of x, but never y is quadratic on x
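
Since these are tics to grep for, here is a minimal sketch of a checker that scans text line by line. The regular expressions are rough approximations of the rules above and will produce false positives (e.g., legitimately hyphenated words); the pattern table and the function name `find_tics` are assumptions for this example.

```python
import re

# Pattern -> advice. Rough heuristics, not a complete style checker.
TICS = {
    r"\b\w+-wise\b": "drop the hyphen before the -wise suffix",
    r"\bmeta data\b": "write the closed compound: metadata",
    r"\bsquare exponential\b": "use 'squared exponential'",
    r"\bquadratic on\b": "use 'quadratic in'",
    r"\b[Cc]onsidering that\b": "drop 'that'",
    r"\band thus is\b": "prefer 'and is thus'",
    r"\b(?:sub|non|pre|post)-\w+": "most prefixed words are not hyphenated",
}


def find_tics(text):
    """Return (line number, matched text, advice) for every tic found."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for pattern, advice in TICS.items():
            for match in re.finditer(pattern, line):
                hits.append((line_number, match.group(0), advice))
    return hits


sample = "The element-wise product uses meta data.\nHere y is quadratic on x."
for line_number, matched, advice in find_tics(sample):
    print(f"line {line_number}: '{matched}': {advice}")
```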

12.1.13. Rules of thumb

  • Use "respectively" at the end of the sentence, preferably, or in the middle. Set it off with commas.
  • Prefer "and is thus" over "and thus is" (187M vs 24M matches). Consider "therefore" or "thence" instead of "thus", which is becoming archaic, for causation.
  • Don't be unnecessarily indirect, don't be fluffy. E.g., "Consider"
  • Can/could/may: can expresses certainty, could expresses uncertainty or a conditional statement, may expresses a possibility or a permission, may not expresses a denial of permission. (The Chicago manual of style, 2017, 5.250).

12.1.14. Contextual formatting

  • Use bold to (i) make the core of the message as salient as possible, (ii) mark in-line headings at the start of the line
  • Use emph, or italic if emph is not available, to mark format names or definition concepts
  • Use underline for editing comments that won't make it to the final version
  • Use verbatim for programming-like keywords, e.g., gemv
  • Use code for in-line code snippets

12.1.15. Oral presentations

  • A sentence is better than a paragraph. A phrase is better than a sentence. A word is better than a phrase. An image is better than a word.
  • Have you noticed how hard it is to convey a technical detail while giving an oral presentation? Now imagine how hard it is for the audience to understand anything too technical. Be a nice speaker instead :)

12.1.17. Research proposal

12.3. Research project

Four git repositories are needed: an overarching repository for the project (project), and one repository for each of the three elements of the research project (data, analytics, and literature, respectively). Possibly use git submodules for the latter three.

  • Data: data acquisition and munging, i.e. from raw data to the format required downstream
  • Analytics: analyses of the above data
  • Literature: writing based on the above analyses, e.g., abstracts, reports, manuscripts, presentations

12.4. Scientific writing

12.4.1. Tags

You probably want a LaTeX command for these.

  • Citation needed: when you are inclined to accept the statement, but still need to pin-point the best reference
  • Verification needed: when you want to double-check that something is as stated
  • Revision needed: when you want to review the math or, more generally, the logic of the statement
  • Further research needed: when you make a broad statement, typically outside the scope of the current writing, that seems worth exploring by you or anyone else
  • Discussion needed: when you want to have others' opinions
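
One possible way to set up such commands, as a sketch: every command name below is an assumption made for this example, as is the choice of a red bracketed superscript via the xcolor package.

```latex
\documentclass{article}
\usepackage{xcolor}

% Hypothetical helper: a colored, bracketed superscript tag.
\newcommand{\edittag}[1]{\textsuperscript{\textcolor{red}{[#1]}}}

% One command per tag in the list above (names are illustrative).
\newcommand{\citationneeded}{\edittag{citation needed}}
\newcommand{\verificationneeded}{\edittag{verification needed}}
\newcommand{\revisionneeded}{\edittag{revision needed}}
\newcommand{\researchneeded}{\edittag{further research needed}}
\newcommand{\discussionneeded}{\edittag{discussion needed}}

\begin{document}
A Gaussian process is a probabilistic model.\citationneeded
\end{document}
```

Redefining `\edittag` to expand to nothing hides every tag at once for the final version.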

12.5. Numerical computing

12.5.1. When to use analytical versus automatic differentiation?

Automatic differentiation can substantially decrease runtime relative to numerical differentiation with little effort. In some (typically surprising) cases, automatic differentiation can yield better runtimes than analytical differentiation. A key aspect of analytical differentiation is that, besides the math work needed to obtain the analytical form, its implementation typically requires a non-trivial amount of work to beat automatic differentiation. Numerical differentiation is almost always the least performant.

Source: Analytic Derivatives — Ceres Solver
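
To make the three approaches concrete, here is a minimal sketch that differentiates one toy function analytically, automatically (forward mode with dual numbers), and numerically (central differences). It illustrates the mechanics only and makes no runtime claim; the `Dual` class, the toy function, and the step size are assumptions for this example, not anything from Ceres Solver.

```python
import math


class Dual:
    """Number of the form value + deriv * e with e^2 = 0 (forward-mode AD)."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule carried along with the value.
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def sin(x):
    """Sine that accepts floats and dual numbers (chain rule for the latter)."""
    if isinstance(x, Dual):
        return Dual(math.sin(x.value), math.cos(x.value) * x.deriv)
    return math.sin(x)


def f(x):
    return x * x * sin(x) + 3.0 * x


def df_analytical(x):
    # Derived by hand: d/dx [x^2 sin(x) + 3x].
    return 2.0 * x * math.sin(x) + x * x * math.cos(x) + 3.0


def df_automatic(x):
    # Seed the derivative with 1 and read it off the result.
    return f(Dual(x, 1.0)).deriv


def df_numerical(x, h=6e-6):
    # Central difference: two extra evaluations of f per input dimension.
    return (f(x + h) - f(x - h)) / (2.0 * h)


x0 = 1.3
print(df_analytical(x0), df_automatic(x0), df_numerical(x0))
```

The automatic derivative reuses `f` unchanged, which is the "little effort" part; the analytical one needs hand-derived code that must be kept in sync with `f`.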

12.5.2. How many evaluations does L-BFGS-B require?

To optimize a function of \(m\) parameters without analytical derivatives, L-BFGS-B spends per step roughly \(2m\) calls on approximating a tangent plane (derivatives by finite differences) plus a further call at each newly chosen location, which is typically at a short distance. L-BFGS-B can be more robust in higher-dimensional settings than Nelder-Mead.

Source: Surrogates - Robert Gramacy
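
The \(2m\) figure matches a central-difference gradient, which evaluates the function twice per parameter. A minimal sketch that counts the calls, assuming central differences (the wrapper class, the toy objective, and the step size are illustrative names and values, not part of any optimizer's API):

```python
class CountingFunction:
    """Wrap a function and count how many times it is called."""

    def __init__(self, func):
        self.func = func
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return self.func(x)


def central_gradient(func, x, h=1e-5):
    """Central-difference gradient: two evaluations per coordinate."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        up[i] += h
        down = list(x)
        down[i] -= h
        grad.append((func(up) - func(down)) / (2.0 * h))
    return grad


def shifted_quadratic(x):
    """Toy objective with minimum at (1, ..., 1)."""
    return sum((xi - 1.0) ** 2 for xi in x)


m = 5
counted = CountingFunction(shifted_quadratic)
gradient = central_gradient(counted, [0.0] * m)
print(counted.calls)  # 2 * m evaluations for one gradient
```

Supplying an analytical or automatic gradient removes this per-step cost, which matters when each evaluation is an expensive computer experiment.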

12.5.3. How to select algorithm tolerance for numerical differentiation?

In the absence of problem-specific information, use the square root of machine epsilon as the step size for forward differences, and the cube root of machine epsilon for centered differences. As a rule of thumb for tolerances, the location of a minimum can be resolved only to about the square root of machine epsilon, while a root can be resolved to about machine epsilon.

Source: Choosing epsilon
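
A small sketch of these step sizes at work, differentiating the sine function at \(x = 1\) in double precision. The step sizes are applied as absolute steps, which assumes \(x\) is of order one; the function names are chosen for this example.

```python
import math
import sys

eps = sys.float_info.epsilon  # machine epsilon for double precision


def forward_diff(f, x, h):
    return (f(x + h) - f(x)) / h


def central_diff(f, x, h):
    return (f(x + h) - f(x - h)) / (2.0 * h)


x0 = 1.0
exact = math.cos(x0)  # derivative of sin at x0

h_forward = math.sqrt(eps)      # square root of machine epsilon
h_central = eps ** (1.0 / 3.0)  # cube root of machine epsilon

err_forward = abs(forward_diff(math.sin, x0, h_forward) - exact)
err_central = abs(central_diff(math.sin, x0, h_central) - exact)
# A step that is too small lets rounding error dominate.
err_tiny = abs(forward_diff(math.sin, x0, 1e-15) - exact)

print(h_forward, err_forward)
print(h_central, err_central)
print(1e-15, err_tiny)
```

The too-small step gives a much larger error than either recommended step, which is the reason not to simply pick the smallest step that is representable.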

12.5.4. How to compute the Euclidean distance matrix fast?

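Use the dot product: the squared distances follow from \(\|x_i - x_j\|^2 = \|x_i\|^2 + \|x_j\|^2 - 2 x_i^\top x_j\), so a single matrix product replaces the double loop.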

X  <- iris3[,,1]

M  <- tcrossprod(X)
m  <- diag(M)
o  <- rep(1, nrow(M)) # column vector
h2 <- m %*% t(o)      # outer(m, o) = m %*% t(o)

D2 <- -2 * M + h2 + t(h2)

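all.equal(D2, unname(as.matrix(dist(X))^2))

The same identity in Python with NumPy, as a method of a class that stores the training inputs in self.X_train:

import numpy as np

def compute_distances_no_loops(self, X):
  # Squared Euclidean distances between rows of X and rows of self.X_train
  dists = (-2 * np.dot(X, self.X_train.T)
           + np.sum(self.X_train**2, axis=1)
           + np.sum(X**2, axis=1)[:, np.newaxis])
  return dists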

Source: Dot Product and Distance Matrix

Author: Luis Damiano

Created: 2023-03-13 Mon 18:03
