Knowledge Base
Table of Contents
- 1. Motivation
- 2. Linux
- 2.1. Arch Linux
- 2.2. Bash
- 2.2.1. How to execute a loop conditionally (dry runs)
- 2.2.2. How to open the N most recently modified files?
- 2.2.3. How to cast a string to integer?
- 2.2.4. How to make a conditional assignment in bash for a string?
- 2.2.5. How to make a conditional assignment in bash for a numeric?
- 2.2.6. How to explain shell code?
- 2.2.7. How to validate shell code?
- 2.3. Linux
- 2.3.1. How to restart the bluetooth service?
- 2.3.2. How to read a systemd service log for today?
- 2.3.3. How to clean up space in /boot?
- 2.3.4. How to determine what shell I am currently working on?
- 2.3.5. How to set Firefox to default browser in xdg settings?
- 2.3.6. How to find and copy all files that match a regex?
- 2.3.7. How to list all files whose content match a regex?
- 2.3.8. How to list all the unique file names?
- 2.3.9. How to list memory used by processes in remote cluster?
- 2.3.10. What is the keycode of this key?
- 2.3.11. How to find the window dimension and position?
- 2.3.12. How to convert a pdf to png?
- 2.3.13. How to switch keyboard layout to type Greek letters?
- 2.3.14. What are the keyboard layout switching keys?
- 2.3.15. How to type math symbols, ascii emoji easily?
- 2.3.16. How to set a compose key (Multi_key)?
- 2.3.17. What characters have superscripts and subscripts?
- 2.3.18. Which folder does this file belong in?
- 2.3.19. How to display and sort by modified date time with find command?
- 2.3.20. How to connect OBS to Zoom or Webex?
- 2.3.21. Where are the X.Org Server config files?
- 2.3.22. How to migrate from nouveau to proprietary NVIDIA drivers for an NVS 510 graphics card?
- 2.4. Math typing
- 3. Utilities
- 3.1. Git
- 3.1.1. What on earth is git?
- 3.1.2. How to rename a branch?
- 3.1.3. What to do with a branch after merging?
- 3.1.4. How to list unmodified files?
- 3.1.5. How to clean a git repo?
- 3.1.6. How to change a git repo remote from https to ssh?
- 3.1.7. How to clean a .git folder that is too large?
- 3.1.8. How to stage files that were deleted from disk?
- 3.1.9. How to revert file removal from git repo?
- 3.2. SSH
- 3.3. Regular expressions
- 3.3.1. How to match a keyword except when used as a tag?
- 3.3.2. How to match all R dependencies?
- 3.3.3. How to match URLs?
- 3.3.4. How to test all the URLs in a text file?
- 3.3.5. How to match a custom file separator?
- 3.3.6. How to match any number with decimals up to 10?
- 3.3.7. How to generate a string that matches a regular expression?
- 3.3.8. General resources
- 3.4. GNU Make
- 3.5. PDF processing
- 3.5.1. How to convert a PDF file to a text file?
- 3.5.2. How to list all the fonts used in a PDF file?
- 3.5.3. How to extract images from a PDF file?
- 3.5.4. How to merge PDF files?
- 3.5.5. How to extract some pages from a PDF file?
- 3.5.6. How to compress a PDF file?
- 3.5.7. How to automatically crop margins from a PDF file?
- 3.5.8. How to extract and delete all the metadata in a PDF file?
- 3.5.9. How to set a password for a PDF file?
- 3.6. Hugo
- 4. Typesetting
- 4.1. (La)TeX
- 4.1.1. How to write better LaTeX? guidelines
- 4.1.2. How to organize LaTeX files?
- 4.1.3. How to compile LaTeX document on save?
- 4.1.4. How to avoid orphan words?
- 4.1.5. How to prepend letter to figure name?
- 4.1.6. When should I use non-breaking space (tilde)?
- 4.1.7. How to write an unnumbered footnote?
- 4.1.8. How to type a differential operator dx?
- 4.1.9. How to define a new operator like log?
- 4.1.10. How should I type …?
- 4.1.11. What symbol should I use for … ?
- 4.1.12. Symbol lookup
- 4.1.13. How to type the math symbol for definition :=?
- 4.1.14. How to debug the LaTeX document layout, sizes, and spacing?
- 4.1.15. What is the document body ratio?
- 4.1.16. How to reduce spacing around figures, tables, captions?
- 4.1.17. How to clip an image with percentages?
- 4.1.18. How to write a good proof in LaTeX?
- 4.1.19. How to smash all in-line math?
- 4.1.20. How to influence the position of a figure/table?
- 4.1.21. How to fiddle with math font height and width?
- 4.1.22. How to set a global path for input and graphic files?
- 4.1.23. How to align table columns without counting them?
- 4.1.24. How to gray out leading zeroes in a table?
- 4.1.25. How to add the (sub)section name in the page header?
- 4.1.26. How to hide section headings?
- 4.1.27. How to change the caption font size?
- 4.1.28. How to include a tex file with multicolumn commands?
- 4.1.29. How to properly compile a LaTeX project?
- 4.1.30. How to make GitHub compile a document after a push?
- 4.1.31. How to build a custom style?
- 4.1.32. What type of LaTeX commands exist?
- 4.1.33. How to make a class color safe?
- 4.1.34. How to write a custom LaTeX class?
- 4.1.35. Where do I put local LaTeX style files and packages?
- 4.1.36. How to make a poster in LaTeX?
- 4.1.37. Resources
- 4.2. BibLaTeX
- 4.3. Beamer
- 4.4. Graph diagrams
- 4.5. Reference management
- 5. Text editors
- 5.1. Emacs
- 5.1.1. How to input TeX style?
- 5.1.2. How to open all files by pattern?
- 5.1.3. How to highlight by word, line, or regex?
- 5.1.4. How to profile init / settings?
- 5.1.5. How to navigate back to part of a buffer?
- 5.1.6. How to do collaborative editing with Emacs?
- 5.1.7. How to have custom outlining and folding in Emacs?
- 5.1.8. What are some good .emacs config files?
- 5.1.9. Emacs packages worth checking
- 5.1.10. Useful modes
- 5.2. Org-mode
- 5.3. Elisp
- 6. Programming
- 7. R programming
- 7.1. Write better R
- 7.2. R programming
- 7.2.1. Programming with functions
- 7.2.2. Vectorization
- 7.2.3. How to create a sequence within groups?
- 7.2.4. How to identify value changes in a sequence?
- 7.2.5. How to renumber a group?
- 7.2.6. How to make a cross table with a custom function (e.g., mean)?
- 7.2.7. How to make lapply return a data.frame?
- 7.2.8. How to split a matrix row-wise as a list?
- 7.2.9. How to look up a value among possibilities?
- 7.2.10. How to remove columns with all NA fast?
- 7.2.11. How to fast apply a function over a ragged array?
- 7.2.12. How to fast subset rows corresponding to max value by group?
- 7.2.13. How to create named vector programmatically in one statement?
- 7.2.14. How to get all function call arguments as a list?
- 7.2.15. How to debug an error thrown in a package?
- 7.2.16. How to compute a simple moving average with base R?
- 7.2.17. How to set the seed locally for a function?
- 7.2.18. Visualization
- 7.2.19. How to improve the look of my rmarkdown HTML document?
- 7.2.20. Where can I find useful addins for RStudio?
- 7.2.21. Resources
- 7.3. RStudio
- 7.4. Emacs Speaks Statistics
- 8. Stan programming
- 9. Data science
- 10. Statistics
- 10.1. Statistics
- 10.1.1. What is the Karhunen–Loève theorem?
- 10.1.2. What is the Karhunen–Loève decomposition of a Wiener process?
- 10.1.3. Principal Component Regression
- 10.1.4. How to revert SVD?
- 10.1.5. How to find which variables are collinear?
- 10.1.6. What classification model to use for observations in a map?
- 10.1.7. How to compute the Cressie and Hawkins (1980) robust estimator?
- 10.1.8. When should I apply log transformation to positive data?
- 10.1.9. When should I visualize positive data in log scale?
- 10.1.10. How to transform a covariate taking non-negative values with zeroes?
- 10.1.11. What is the difference between cross-validation and the WAIC?
- 10.1.12. How to do positivity-preserving interpolation?
- 10.1.13. What are some good, general rules on reporting?
- 10.1.14. Priors
- 10.1.15. MCMC
- 10.1.16. Neural networks
- 10.2. Gaussian processes
- 10.2.1. Is the mean function a Gaussian process uninteresting?
- 10.2.2. Is the derivative of a Gaussian process another Gaussian process?
- 10.2.3. Is the integral of a Gaussian process another Gaussian process?
- 10.2.4. Is the convolution of a Gaussian process another Gaussian process?
- 10.2.5. What are the main limitations of a Gaussian process?
- 10.2.6. Software
- 10.2.7. Resources
- 11. Mathematics
- 12. Science
- 12.1. Scientific communication
- 12.1.1. General resources
- 12.1.2. Academic phrases
- 12.1.3. No adjectives allowed
- 12.1.4. Use technical terms
- 12.1.5. Repeat technical terms, no synonyms
- 12.1.6. Repeat the technical terms when in doubt
- 12.1.7. Attributive nouns
- 12.1.8. Prefer short precise terms over phrases
- 12.1.9. Getting determiners right
- 12.1.10. Double negation is not an affirmation
- 12.1.11. Verb tenses
- 12.1.12. My own common tics to grep for
- 12.1.13. Rules of thumb
- 12.1.14. Contextual formatting
- 12.1.15. Oral presentations
- 12.1.16. Research statement
- 12.1.17. Research proposal
- 12.2. Scientific programming
- 12.3. Research project
- 12.4. Scientific writing
- 12.5. Numerical computing
Tips & tricks
My ultimate gotcha personal collection
Stuff I keep forgetting and I need to think about, decide on, or search for too often
1. Motivation
This is the way (The Mandalorian)
A power user might not have extensive technical knowledge of the systems they use but is rather characterized by competence or desire to make the most intensive use of computer programs or systems (Wikipedia)
2. Linux
2.1. Arch Linux
2.1.1. How to update pacman mirrors?
# https://github.com/westandskif/rate-mirrors#usage
export TMPFILE="$(mktemp)"; \
sudo true; \
rate-mirrors --save="$TMPFILE" arch --max-delay=21600 \
  && sudo mv /etc/pacman.d/mirrorlist /etc/pacman.d/mirrorlist-backup \
  && sudo mv "$TMPFILE" /etc/pacman.d/mirrorlist
- Pacman Mirrorlist Generator
- Reflector: retrieve, filter, sort by speed, and overwrite mirror list
2.1.2. How to fix "Pacman is currently in use, please wait"?
If you're sure that pacman is not running in another shell, you can remove the lock by hand.
sudo rm /var/lib/pacman/db.lck
Source:
2.2. Bash
- Bash scripting cheatsheet
- Advanced Bash-Scripting Guide: an in-depth exploration of the art of shell scripting
- pure bash bible: a collection of pure bash alternatives to external processes
- BashFAQ: do not sleep on this, it's a gold mine
2.2.1. How to execute a loop conditionally (dry runs)
#!/bin/bash
doit=0
for i in {1..3}; do
  echo -n ${i}
done
echo "---"
for i in {1..3}; do
  ((doit))&&(echo -n ${i})
done
echo "---"
for i in {1..3}; do
  ((doit))&&echo -n ${i}
done
echo "---"
((doit))&&for i in {1..3}; do
  echo -n ${i}
done
echo "---"
((doit))&&
for i in {1..3}; do
  echo -n ${i}
done
echo "---"
Also, to run a command only when the flag is off: ((dontdoit))||echo "something"
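A variant of the same idea that scales past a single loop: send every side-effecting command through a small wrapper that prints it instead of running it while dry-running. A sketch; the DRY_RUN variable and the run function are my own names, not a standard tool.

```bash
#!/bin/bash
# Hypothetical helper: DRY_RUN=1 (the default here) prints commands,
# DRY_RUN=0 executes them.
DRY_RUN=${DRY_RUN:-1}

run() {
  if ((DRY_RUN)); then
    # Print the command with each argument shell-quoted
    printf '[dry-run]'
    printf ' %q' "$@"
    printf '\n'
  else
    "$@"
  fi
}

for i in {1..3}; do
  run touch "file ${i}.txt"
done
```

Run it as is to see the commands; DRY_RUN=0 ./script.sh actually creates the files. printf %q quotes each argument, so a printed line can be pasted back into a shell.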
2.2.2. How to open the N most recently modified files?
ls --sort=time lists the newest files first, so keep the head of the list:
emacs $(/bin/ls */* --sort=time | head -n 5)
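Parsing ls output breaks on file names with spaces. A sketch of a sturdier variant, assuming GNU find (for -printf) and file names without newlines; recent_files is a hypothetical helper name. It looks exactly two levels down, roughly like */* (but it also picks up hidden files).

```bash
#!/bin/bash
# List the N most recently modified files two levels deep, newest first.
# Assumes GNU find and no newlines in file names.
recent_files() {
  local n=${1:-5}
  find . -mindepth 2 -maxdepth 2 -type f -printf '%T@ %p\n' \
    | LC_ALL=C sort -rn \
    | head -n "$n" \
    | cut -d' ' -f2-
}

# Usage: open them in Emacs, keeping names with spaces intact
# mapfile -t files < <(recent_files 5) && emacs "${files[@]}"
```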
2.2.3. How to cast a string to integer?
Cast a string to an integer in bash by adding 0. It also works if the string is empty, which evaluates to 0.
NUM="99"
NUM=$(($NUM+0))
NUM=""
NUM=$(($NUM+0))
Source:
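Adding 0 has two catches: a string with a leading zero such as 08 is read as octal and fails with value too great for base, and a non-numeric string is silently treated as a variable name. A sketch of a stricter helper (to_int is my own name) that validates the input and forces base 10 with the 10# prefix:

```bash
#!/bin/bash
# Convert a decimal string to an integer; empty input becomes 0.
# Returns 1 (and prints nothing) for anything that is not an integer.
to_int() {
  local s=${1:-0} sign="" n
  if [[ $s == -* ]]; then
    sign="-"
    s=${s#-}
  fi
  [[ $s =~ ^[0-9]+$ ]] || return 1
  # 10# forces base 10, so leading zeros are not read as octal
  n=$((10#$s))
  if [[ $sign == "-" ]]; then
    n=$((-n))
  fi
  echo "$n"
}
```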
2.2.4. How to make a conditional assignment in bash for a string?
## Assign "string-true" to VAR if VARTOCHECK is equal to "string-cond", else "string-false"
## Quote the variable so the test does not break when it is empty
[ "$VARTOCHECK" = "string-cond" ] && VAR="string-true" || VAR="string-false"
Source:
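When the condition is just whether a variable is unset or empty, parameter expansion does the conditional assignment without a test. A minimal sketch with placeholder variable names:

```bash
#!/bin/bash
unset NAME OUTDIR
# ${VAR:-default} expands to "default" when VAR is unset or empty; VAR is unchanged
greeting="Hello, ${NAME:-stranger}"
# ${VAR:=default} also assigns "default" to VAR; ':' is a no-op that forces the expansion
: "${OUTDIR:=/tmp/out}"
echo "$greeting"
echo "$OUTDIR"
```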
2.2.5. How to make a conditional assignment in bash for a numeric?
Use the following expression if the assignment is numerical.
## Assign 20 if true, or 10 if false
variable=$(( 1 > 0 ? 20 : 10 ))
Source:
2.2.6. How to explain shell code?
Use explainshell to match command-line arguments to their help text.
2.2.7. How to validate shell code?
Use shellcheck for warnings and suggestions for shell scripts. It can also be used online at ShellCheck.net.
2.3. Linux
2.3.1. How to restart the bluetooth service?
Use bluetoothctl to turn the bluetooth controller off and on:
echo -e 'show\npower off\npower on\nquit' | bluetoothctl
2.3.2. How to read a systemd service log for today?
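Don't forget to add --user if it's a user service. Also, -b is short for -b -0, i.e., the current boot (Ghosthree3 at libera.chat). To limit the log to today's entries rather than the current boot, use --since today.
journalctl --user -u servicename -b
journalctl --user -u servicename --since today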
2.3.3. How to clean up space in /boot?
Consider removing the fallback initramfs to make temporary space; a definitive solution is to enlarge the boot partition. To stop mkinitcpio from generating a new fallback image, edit linux.preset in /etc/mkinitcpio.d and remove 'fallback' from PRESETS. This helps when pacman throws Partition /boot too full.
ls -sh /boot/initramfs-linux-fallback.img
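Disabling the fallback image boils down to dropping 'fallback' from the PRESETS array. A sketch, assuming the preset file has the stock line PRESETS=('default' 'fallback'), so check yours first; disable_fallback is a hypothetical helper that takes the file as an argument so it can be tried on a copy:

```bash
#!/bin/bash
# Remove the fallback preset from an mkinitcpio preset file, keeping a .bak copy.
# Assumes the stock line PRESETS=('default' 'fallback').
disable_fallback() {
  local preset=${1:-/etc/mkinitcpio.d/linux.preset}
  sed -i.bak "s/^PRESETS=('default' 'fallback')/PRESETS=('default')/" "$preset"
}

# Usage (as root): disable_fallback && rm /boot/initramfs-linux-fallback.img
```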
2.3.4. How to determine what shell I am currently working on?
ps -p $$
2.3.5. How to set Firefox to default browser in xdg settings?
Check that there is a desktop file in /usr/share/applications/firefox.desktop. If there isn't, create one. Then, update the corresponding xdg entry with xdg-settings set default-web-browser firefox.desktop. Test by running xdg-open "https://www.mozilla.org".
2.3.6. How to find and copy all files that match a regex?
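Plain cp flattens the folder structure. Use cp --parents (GNU coreutils) or rsync -R (--relative) to keep it. Copy into a destination outside the searched tree (/path/to/dest below), otherwise find may pick up the copies again.
find . -iname '*.rds' -exec cp {} /path/to/dest \;
find . -iname '*.rds' -exec cp --parents {} /path/to/dest \;
find . -iname '*.rds' -exec rsync -R {} /path/to/dest \;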
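A sketch that mirrors the matching files into a separate destination directory, keeping their relative paths. It assumes GNU cp (for --parents); copy_matching, SRC, and DEST are placeholders of mine. The destination is made absolute because find runs from inside SRC.

```bash
#!/bin/bash
# Copy every *.rds file (case-insensitive) under SRC into DEST,
# recreating the folder structure. Assumes GNU cp.
copy_matching() {
  local src=$1 dest
  mkdir -p "$2"
  dest=$(cd "$2" && pwd)   # absolute path, still valid after cd into SRC
  (cd "$src" && find . -type f -iname '*.rds' -exec cp --parents {} "$dest" \;)
}

# Usage: copy_matching ./project /tmp/rds-backup
```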
2.3.7. How to list all files whose content match a regex?
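Use grep -l to print only the names of matching files; -type f keeps directories away from grep.
find . -type f -exec grep -l optimHess {} +
grep -rl optimHess .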
2.3.8. How to list all the unique file names?
## any file extension
find . -iname '*.*' -exec basename {} \; | sort | uniq
## a specific file extension, e.g., rds files
find . -iname '*.rds' -exec basename {} \; | sort | uniq
Source:
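Calling basename once per file spawns one process per match. With GNU find, -printf '%f\n' prints the base name directly, and sort -u replaces sort | uniq. Note that '*.*' only matches names containing a dot, while -type f lists every regular file. A sketch with two hypothetical helper functions:

```bash
#!/bin/bash
# Unique base names of all regular files below DIR (GNU find)
unique_names() {
  find "${1:-.}" -type f -printf '%f\n' | sort -u
}

# Same, restricted to one extension (case-insensitive), e.g., rds
unique_names_ext() {
  find "${1:-.}" -type f -iname "*.$2" -printf '%f\n' | sort -u
}

# Usage: unique_names_ext . rds
```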
2.3.9. How to list memory used by processes in remote cluster?
#!/bin/sh
# -h drops the NODELIST(REASON) header, which is not a host name
squeue --me -h -o "%R" | while read -r host; do
  ssh -n "$host" "ps -o pid,user,%mem,rss,size,%cpu,command ax | grep \"r-4\""
done
2.3.10. What is the keycode of this key?
Use xev to look for the keycode and keysym in the KeyPress or KeyRelease event.
2.3.11. How to find the window dimension and position?
Run xwininfo on a console and click on the target window.
2.3.12. How to convert a pdf to png?
2.3.13. How to switch keyboard layout to type Greek letters?
To switch between US and Greek (polytonic) keyboard layouts with Alt + Caps Lock, run the command below. Add it to ~/.xinitrc, or possibly ~/.config/i3/config (prefixed with exec), to make it persistent.
setxkbmap 'us,gr' -variant ',polytonic' -option grp:'alt_caps_toggle'
2.3.14. What are the keyboard layout switching keys?
Other than Alt + Shift, Xorg supports many other keys to switch keyboard layout.
- To list them all
man xkeyboard-config
- To list the toggle keys only
grep "grp:.*toggle" /usr/share/X11/xkb/rules/base.lst
Xorg/Keyboard configuration
2.3.15. How to type math symbols, ascii emoji easily?
Use a Compose key. Here's my .XCompose, below there are many more
- Xorg/Keyboard configuration - ArchWiki
- xcompose/.XCompose at master · exaexa/xcompose · GitHub
- XCompose/.XCompose at master · svinota/XCompose · GitHub
- xcompose/math.conf at master · nshepperd/xcompose · GitHub
- xcompose/superscripts.conf at master · nshepperd/xcompose · GitHub
- dotfiles/.XCompose at master · Wafelack/dotfiles · GitHub
2.3.16. How to set a compose key (Multi_key)?
- To list all compose key options
grep "compose:" /usr/share/X11/xkb/rules/base.lst
- To set a compose key
setxkbmap -option compose:rctrl
- Xorg/Keyboard configuration - ArchWiki
2.3.17. What characters have superscripts and subscripts?
2.3.18. Which folder does this file belong in?
2.3.19. How to display and sort by modified date time with find command?
2.3.20. How to connect OBS to Zoom or Webex?
A PulseAudio source works as an input device (microphone, monitor); a sink works as an output device (speaker).
To redirect video from OBS to a new virtual camera input
- Install the Linux headers if you don't have them, e.g., sudo pacman -S linux-headers
- Install v4l2loopback-dkms, e.g., sudo pacman -S v4l2loopback-dkms
- Load the kernel module: sudo modprobe v4l2loopback
- In OBS, click Start Virtual Camera
- Select Dummy video device as video source in the meeting app, e.g., in Zoom Settings > Video > Camera
- Source: Open Broadcaster Software - ArchWiki
To redirect audio from OBS to a new virtual sound input
Create a null output device. pactl prints the module index, stored in pulsemodule, so the device can be removed later with pactl unload-module "$pulsemodule".
pulsemodule=$(pactl load-module module-null-sink sink_name=obs_audio sink_properties=device.description=obs_audio_sink_for_mic)
- In pavucontrol, Playback tab, change the output of OBS-monitor to Null output
- In pavucontrol, Recording tab, change the input of Zoom to Null output
- Source:
- MacGyver at libera.chat,
- PulseAudio/Examples - ArchWiki
- Redirecting Pulseaudio sink to a virtual source
2.3.21. Where are the X.Org Server config files?
- /etc/X11/xorg.conf.d/ (preferred place for host-specific configurations)
- /usr/share/X11/xorg.conf.d/
- /etc/X11/xorg.conf (deprecated)
- /etc/xorg.conf (deprecated)
2.3.22. How to migrate from nouveau to proprietary NVIDIA drivers for an NVS 510 graphics card?
- Remove nouveau: sudo pacman -R xf86-video-nouveau
- Remove all xorg.conf files that might still refer to nouveau drivers (check in /etc/X11/xorg.conf.d/ and /usr/share/X11/xorg.conf.d/)
- Install the NVIDIA drivers: AUR (en) - nvidia-470xx-utils
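Hunting for leftover nouveau references can be scripted with grep. A sketch; find_nouveau_confs is a hypothetical helper, and its default paths are the X.Org config locations from 2.3.21:

```bash
#!/bin/bash
# Print config files that mention nouveau (case-insensitive).
# With no arguments, search the usual X.Org config locations.
find_nouveau_confs() {
  local paths=("$@")
  if ((${#paths[@]} == 0)); then
    paths=(/etc/X11/xorg.conf.d /usr/share/X11/xorg.conf.d /etc/X11/xorg.conf)
  fi
  # Missing paths are fine; silence their errors
  grep -ril nouveau "${paths[@]}" 2>/dev/null
}
```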
2.4. Math typing
2.4.1. Math keyboard layout with UTF-8 support
- Fork and change https://github.com/SV-97/Math-Layout/blob/master/gm
2.4.2. Compose key for mathematics
- Type an equation in UTF8?
3. Utilities
3.1. Git
3.1.1. What on earth is git?
[…] at the core of Git is a simple key-value data store […] you can insert any kind of content into a Git repository, for which Git will hand you back a unique key you can use later to retrieve that content.
Source: Git - Git Objects
3.1.2. How to rename a branch?
git branch -m newname
3.1.3. What to do with a branch after merging?
After the merge, it's safe to delete the branch
git branch -d branch1
3.1.4. How to list unmodified files?
diff <(git diff --name-only) <(git ls-files -- sub/dir) | grep "^>" | cut -b3-
Source: is there any way to list the unmodified files in Git?
3.1.5. How to clean a git repo?
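Remove untracked files and directories (-d), including ignored ones (-x). Run it with -n instead of -f first to preview what would be deleted.
git clean -d -x -f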
3.1.6. How to change a git repo remote from https to ssh?
git remote -v
git remote set-url origin git@server:repo.git
git remote -v
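Rewriting the URL by hand is easy to get wrong. A sketch that derives the SSH URL from the HTTPS one, assuming the usual https://host/user/repo.git layout without credentials in the URL; https_to_ssh is a hypothetical helper:

```bash
#!/bin/bash
# Turn https://host/user/repo.git into git@host:user/repo.git
https_to_ssh() {
  sed -E 's#^https://([^/]+)/#git@\1:#' <<< "$1"
}

# Usage inside a repository:
# git remote set-url origin "$(https_to_ssh "$(git remote get-url origin)")"
```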
3.1.7. How to clean a .git folder that is too large?
Call the garbage collector
git gc
For binary files that change often
git repack -a -d --depth=250 --window=250
Source: how to shrink the .git folder?
3.1.8. How to stage files that were deleted from disk?
git commit -a
Source: using git commit -a
git ls-files --deleted -z | xargs -r0 git rm
Source: staging deleted files
3.1.9. How to revert file removal from git repo?
To revert all changes since last commit
git reset --hard HEAD
To revert some changes
git reset
git checkout <file-name>
Source: how to revert git rm -r
3.2. SSH
3.2.1. How to generate a new SSH key?
The path to the key has to be unique. Copying the key to the clipboard requires xclip.
# Generate a new key
ssh-keygen -t ed25519 -C "your_email@example.com"
# Add key to the ssh-agent in your local machine
eval "$(ssh-agent -s)"
ssh-add ~/.ssh/id_ed25519
# Copy key to clipboard
xclip -selection clipboard < ~/.ssh/id_ed25519.pub
3.2.2. How to avoid logging in remotely many times?
Use ControlMaster in .ssh/config
ControlMaster auto
ControlPath ~/.ssh/%r@%h:%p
3.2.3. How to avoid typing remote addresses and ports?
Configure the host in .ssh/config once and do ssh shortname
Host shortname
    User username
    HostName hostname
    Port 22
    ServerAliveInterval 60
3.2.4. Quality of life improvements
- ssh/config preconfigurations for shortening the ssh and sshfs invocations (see ssh_config)
- autosf
- lazy mount for sshfs via systemd
3.3. Regular expressions
Gotta love them!
3.3.1. How to match a keyword except when used as a tag?
To match foo in any context except when enclosed by braces, {foo}, use negative lookarounds, but be careful: they also reject {foo and foo}. Alternatively, use PCRE with (*SKIP)(*F)| (OnlineCop at libera.chat). Other use cases for this pattern: excluding foo in comment lines (#foo, //foo, /*foo*/) or in quotes ("foo", 'foo').
grep -P '(?<!\{)foo(?!\})'
grep -P '\{foo\}(*SKIP)(*F)|foo'
Source:
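A quick way to see the difference between the two patterns on the four interesting cases. A sketch, assuming GNU grep built with PCRE support (-P):

```bash
#!/bin/bash
# Sample input: a bare keyword, a full tag, and two half-braced cases
sample=$'foo\n{foo}\n{foo\nfoo}'

# Lookarounds: reject foo when EITHER side has a brace
lookaround=$(grep -P '(?<!\{)foo(?!\})' <<< "$sample")

# (*SKIP)(*F): consume and discard the complete tag {foo}, match foo elsewhere
skipfail=$(grep -P '\{foo\}(*SKIP)(*F)|foo' <<< "$sample")

printf 'lookaround:\n%s\n\nskip-fail:\n%s\n' "$lookaround" "$skipfail"
```

The lookaround version keeps only the bare foo, while the (*SKIP)(*F) version keeps foo, {foo, and foo}, rejecting only the full tag.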
3.3.2. How to match all R dependencies?
Note the need to use single quotes around the regex, so the shell passes the backslashes through unchanged, and to escape the function call parentheses to match, for example, library(capturethis).
grep -Poh -r '(\w+)(?=::)|(?:library|require(?:Namespace)?)\("?(\w+)"?\)' --include="*.R" | sort | uniq
Source:
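Since -o prints the whole match, the command above outputs library(pkg) for library calls. A sketch that prints only package names, assuming GNU grep with -P: \K discards the function name from the reported match, and [\w.] admits dotted names like data.table. r_deps is a hypothetical helper; commented-out calls are matched too.

```bash
#!/bin/bash
# Print the unique package names referenced in *.R files below DIR:
# pkg::fun, library(pkg), require(pkg), requireNamespace("pkg")
r_deps() {
  grep -Pohr --include='*.R' \
    '[\w.]+(?=::)|(?:library|require(?:Namespace)?)\("?\K[\w.]+(?="?\))' \
    "${1:-.}" | sort -u
}

# Usage: r_deps path/to/project
```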
3.3.3. How to match URLs?
grep -Poh '(http|ftp|https):\/\/([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,@?^=%&:\/~+#-]*[\w@?^=%&\/~+#-])' myfile.txt | sort | uniq
Source:
3.3.4. How to test all the URLs in a text file?
grep -Poh '(http|ftp|https):\/\/([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,@?^=%&:\/~+#-]*[\w@?^=%&\/~+#-])' myfile.txt | sort | uniq | xargs wget --spider
3.3.5. How to match a custom file separator?
I typically separate my files into labeled sections using this pattern: comment marker (one or more characters), space, label (can include spaces), space, and at least four dashes, but typically enough to fill the column width. This can be matched with \S+\s(.+?)\s-{4,}
;; Speed bar -------------------------------------------------------------------
%% Introduction ----------------------------------------------------------------
## Read data -------------------------------------------------------------------
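Since the label is the interesting part, \K and a lookahead can print just the labels, e.g., to get a quick outline of a file. A sketch assuming GNU grep with -P; section_labels is a hypothetical helper:

```bash
#!/bin/bash
# Print the label of every separator line in the given files (or stdin):
# comment marker, space, label, space, four or more dashes.
section_labels() {
  grep -oP '^\S+\s\K.+?(?=\s-{4,})' "$@"
}

# Usage: section_labels init.el
```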
3.3.6. How to match any number with decimals up to 10?
^(?:\d|\d?\.\d+|10(?:\.0+)?)$
Source: https://regex101.com/r/kk8SIF/1/
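A quick check of which candidates the pattern accepts, assuming GNU grep with -P. It accepts .25 without a leading zero, but rejects a trailing dot such as 5. and anything above 10:

```bash
#!/bin/bash
# Keep only the candidates that are numbers between 0 and 10
in_range=$(printf '%s\n' 0 7 9.5 .25 10 10.00 10.5 11 5. \
  | grep -P '^(?:\d|\d?\.\d+|10(?:\.0+)?)$')
echo "$in_range"
```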
3.3.7. How to generate a string that matches a regular expression?
A regex to string kind of situation.
- onlinestringtools: string from regex generator
3.3.8. General resources
3.4. GNU Make
3.4.1. How to call Make on all files in a directory?
Populate the all target programmatically: for each file with extension ext1, build a file with the same name but extension ext2.
all : $(patsubst %.ext1, %.ext2, $(wildcard *.ext1))
all : $(patsubst %.tex, %.pdf, $(wildcard *.tex))
Source:
3.5. PDF processing
- Tesseract OCR: neural net (LSTM) based OCR engine
- Use pdftotext instead of copying directly from a PDF
- Use OCRdesktop to grab content from a screen as text
- What's so hard about PDF text extraction?
- Deleting whitespaces
3.5.1. How to convert a PDF file to a text file?
Use pdftotext to extract the content of a PDF file and write it into a text file.
pdftotext document.pdf document.txt
3.5.2. How to list all the fonts used in a PDF file?
Use pdffonts to list all the fonts used in a PDF file.
pdffonts document.pdf
3.5.3. How to extract images from a PDF file?
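Use pdfimages to extract images from a PDF file. The first command lists the images; the second writes the images of pages 1 to 10 as PNG files whose names start with the prefix image.
pdfimages -list document.pdf
pdfimages -png -f 1 -l 10 document.pdf image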
3.5.4. How to merge PDF files?
Use pdfunite to merge input1.pdf and input2.pdf into merged_document.pdf.
pdfunite input1.pdf input2.pdf merged_document.pdf
Use Ghostscript to merge input1.pdf and input2.pdf into merged_document.pdf.
gs -sDEVICE=pdfwrite -dNOPAUSE -dBATCH -dSAFER -dQUIET \
   -dAutoRotatePages=/None \
   -sOutputFile="merged_document.pdf" input1.pdf input2.pdf
Source: How to merge several PDF files?
3.5.5. How to extract some pages from a PDF file?
Use pdfseparate to extract page by page as sample-1.pdf, sample-2.pdf, sample-3.pdf, and so on.
pdfseparate sample.pdf sample-%d.pdf
Use Ghostscript to extract pages 12-15 from input.pdf into extracted_document.pdf.
gs -sDEVICE=pdfwrite -dNOPAUSE -dBATCH -dSAFER -dQUIET \
   -dFirstPage=12 \
   -dLastPage=15 \
   -sOutputFile="extracted_document.pdf" \
   input.pdf
3.5.6. How to compress a PDF file?
Use Ghostscript to compress input.pdf and write it into compressed_document.pdf using a distiller preset:
- -dPDFSETTINGS=/screen: lower quality, smaller size (72 dpi)
- -dPDFSETTINGS=/ebook: better quality, slightly larger size (150 dpi)
- -dPDFSETTINGS=/prepress: prepress optimized (300 dpi)
- -dPDFSETTINGS=/printer: print optimized (300 dpi)
- -dPDFSETTINGS=/default: wide variety of uses, larger size
gs -sDEVICE=pdfwrite -dNOPAUSE -dBATCH -dSAFER -dQUIET \
   -dCompatibilityLevel=1.4 \
   -dPDFSETTINGS=/ebook \
   -sOutputFile="compressed_document.pdf" input.pdf
Source: How can I reduce the file size of a scanned PDF file?
3.5.7. How to automatically crop margins from a PDF file?
Use pdfcrop to automatically crop margins from a PDF file. By default, it writes the result to document-crop.pdf.
pdfcrop document.pdf
3.5.8. How to extract and delete all the metadata in a PDF file?
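Use exiftool for reading and manipulating PDF metadata. Note that exiftool writes PDF changes as an incremental update, so the removed metadata can still be recovered; rewriting the file afterward, e.g., with qpdf --linearize file.pdf clean.pdf, drops it for good.
exiftool -all:all file.pdf    ## read all the tags
exiftool -all:all= file.pdf   ## remove all the tags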
Source:
3.5.9. How to set a password for a PDF file?
Use qpdf, or the older and sometimes deprecated pdftk. Use single quotes when passwords include special characters. The user password is mandatory. The user and owner passwords cannot be the same.
## 128 and 256 bit encryption available in qpdf (user password first, then owner password)
qpdf --encrypt 'pass1' 'pass2' 256 -- input.pdf output.pdf
## 128 bit or lower encryption available in pdftk
pdftk input.pdf output output.pdf user_pw 'pass1' owner_pw 'pass2' encrypt_128bit
Source:
3.6. Hugo
3.6.1. How to set target=_blank for all external links?
mkdir -p layouts/_default/_markup
echo '<a href="{{ .Destination | safeURL }}"{{ with .Title}} title="{{ . }}"{{ end }}{{ if strings.HasPrefix .Destination "http" }} target="_blank"{{ end }}>{{ .Text }}</a>' > layouts/_default/_markup/render-link.html
Source: How to Open Link in New Tab with Hugo's new Goldmark Markdown Renderer
3.6.2. How to set the home page to other content?
Show contentname as landing or home page
mkdir -p layouts
echo '{{ with .GetPage "/contentname" }}{{.Render}}{{end}}' > layouts/index.html
4. Typesetting
4.1. (La)TeX
4.1.1. How to write better LaTeX? guidelines
4.1.2. How to organize LaTeX files?
Elements are listed in the order recommended by The Chicago Manual of Style.
Short document
- myclass.cls
- wrapper.tex
- frontmatter.tex
- title page, table of contents, list of illustrations, list of tables, list of abbreviations
- mainmatter.tex
- introduction
- rest of the content
- backmatter.tex
- appendices, bibliography, index
Full book
- myclass.cls
- wrapper.tex
- frontmatter.tex
- half title (main title sans subtitle), series title or frontispiece, title page, copyright page, dedication, epigraph, table of contents, list of illustrations, list of tables, foreword, preface, acknowledgements, list of abbreviations, second half title
- mainmatter.tex
- introduction
- rest of the content
- backmatter.tex
- epilogue, appendices, glossary, bibliography, index, and colophon
4.1.3. How to compile LaTeX document on save?
Run latexmk -pdf -pvc --interaction=nonstopmode file.tex to compile the TeX document every time the file changes on disk.
Source: What You See is What You Get (WYSIWYG) for PGF/TikZ?
4.1.4. How to avoid orphan words?
Sometimes, we have one or two words hanging on the last line of a paragraph.
4.1.5. How to prepend letter to figure name?
%% Add prefix S to label names
\renewcommand{\thepage}{S\arabic{page}}
\renewcommand{\thesection}{S\arabic{section}}
\renewcommand{\thetable}{S\arabic{table}}
\renewcommand{\thefigure}{S\arabic{figure}}
%% Modify the text used at the start of the caption
\renewcommand{\figurename}{Supplemental Material, Figure}
4.1.6. When should I use non-breaking space (tilde)?
4.1.7. How to write an unnumbered footnote?
4.1.8. How to type a differential operator dx?
Use an upright d followed by a math x, and possibly add a thin space \, before the differential operator.
\begin{equation}
  \int_0^1 x^2 \, \mathrm{d}x
\end{equation}
4.1.9. How to define a new operator like log?
Use \mathop, or \DeclareMathOperator if amsopn or amsmath is loaded, to create operators such as trace, diag, argmax, and argmin. The starred \DeclareMathOperator* places subscripts underneath in display math, as for argmax.
% Without amsmath: build the operator with \mathop
\newcommand{\diag}{\mathop{\mathrm{diag}}}
% With amsmath or amsopn (use this instead of the \newcommand above, not both)
\DeclareMathOperator{\diag}{diag}
\DeclareMathOperator{\trsym}{tr}
\DeclareMathOperator*{\argmax}{arg\,max}
\DeclareMathOperator*{\argmin}{arg\,min}
\DeclareMathOperator{\evsym}{E}
\DeclareMathOperator{\vsym}{V}
\DeclareMathOperator{\corsym}{Cor}
\DeclareMathOperator{\covsym}{Cov}
\DeclareMathOperator{\sesym}{SE}
Source:
4.1.10. How should I type …?
- Punctuation
- add to an inline and displayed equation the punctuation required by conventional rules, e.g., question mark, comma, semicolon, period
- add punctuation inside the math mode scope to avoid orphans
- Fractions
- prefer horizontal bar or forward slash over precomposed fractions for accessibility reasons.
- metric units are given in decimal fractions, non-metric units can be either type of fraction
- Upright (Roman) versus italic
- prefer upright over italic for operators, e.g., \(\mathrm{d}x\) over \(dx\)
- but prefer italic over upright for standard universal constants, e.g., \(e\) over \(\mathrm{e}\) and \(\pi\) over \(\mathrm{\pi}\)
- prefer italic for variable and function names,
- but prefer upright for multi-letter names, e.g., \(\mathrm{log}(x)\) over \(log(x)\) and \(\mathrm{sin}(x)\) over \(sin(x)\)
- prefer upper case italics for sets
- Blackboard bold: standard number systems and certain other mathematical objects
- Linear algebra:
- prefer lowercase bold for vectors, uppercase bold for matrices
- transpose: prefer \top for a superscript non-italic capital letter T
- scalar or dot product: prefer center dot \cdot over juxtaposition
- inner product: use angle brackets \langle and \rangle, e.g., \(\langle \mathbf{a}, \mathbf{b} \rangle\) (How to define an inner product argument in LaTeX)
- matrix product
- this is some text followed by a link (This is the helpful and descriptive page title)
4.1.11. What symbol should I use for … ?
- The Comprehensive LaTeX Symbol List
- List of LaTeX mathematical symbols: all predefined mathematical symbols
4.1.12. Symbol lookup
- Detexify: draw symbol by hand or use the symbol table
4.1.13. How to type the math symbol for definition :=?
Use \coloneqq and \vcoloneqq from the mathtools package.
- Source: Symbol for definition :=
4.1.14. How to debug the LaTeX document layout, sizes, and spacing?
- lua-visual-debug: it looks gorgeous
4.1.15. What is the document body ratio?
Body width: \the\textwidth Body height: \the\textheight
4.1.17. How to clip an image with percentages?
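Use the adjustbox package to trim fractions of the image width and height (in the order left, bottom, right, top) and clip the rest.
\adjustbox{trim={.05\width} {.2\height} {0.1\width} {.15\height}, clip}%
  {\includegraphics[width=0.5cm]{cupdot.png}}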
Source:
4.1.18. How to write a good proof in LaTeX?
- Use the proof environment from the amsthm package (source)
- Add some text, if needed
- Begin an align environment
- Use the aligned subenvironment within align, possibly with a [t] option, or similar subenvironments to break long lines (source)
- Use the \shortintertext command from the mathtools package to write short text between lines, or \intertext if you prefer larger space above and below the math line (source)
- Use the \tag command if you want to refer to a line by a name instead of a number
4.1.19. How to smash all in-line math?
There is a strong argument for increasing the linespread to accommodate $Y^{(0)}$ if you need to use \smash too often. If you insist, though, use \setlength{\lineskiplimit}{-100pt}
4.1.20. How to influence the position of a figure/table?
- Low-effort approach: load \usepackage[section]{placeins} and set h! for placement specifiers. It adds a \FloatBarrier (more) at the end of a section to avoid mixing floats and sections.
- How to influence the position of float environments like figure and table in LaTeX?
4.1.21. How to fiddle with math font height and width?
4.1.22. How to set a global path for input and graphic files?
\makeatletter
\def\input@path{{/path/to/folder/}}
% or: \def\input@path{{/path/to/folder/}{/path/to/another/folder/}}
\makeatother
% For graphic files, graphicx provides a separate search path
\graphicspath{{/path/to/figures/}}
Source: \input and absolute paths
4.1.23. How to align table columns without counting them?
Say you want to align the first column to the left and the rest to the right without needing to figure out how many columns the table has. You can specify more columns than used, but not vice versa, as in l*9r or l*{99}r. Use braces for more than one digit.
\begin{tabular}{l*{99}r}
  col1 & col2 \\
\end{tabular}
4.1.24. How to gray out leading zeroes in a table?
This macro was shared by [exa] at libera.chat.
4.1.25. How to add the (sub)section name in the page header?
% Add section name in page header
\usepackage{fancyhdr}
\fancypagestyle{main}{
  \fancyhf{}
  \renewcommand{\sectionmark}[1]{\markright{\thesection\ ##1}}
  \renewcommand{\subsectionmark}[1]{\markright{\thesubsection\ ##1}}
  \renewcommand{\subsubsectionmark}[1]{\markright{\thesubsubsection\ ##1}}
  \fancyhead[L]{\textsl{\footnotesize{\rightmark}}}
}
\pagestyle{main}
4.1.26. How to hide section headings?
\usepackage{titlesec}
\makeatletter
\titleformat{\section}[runin]{}{}{0pt}{\@gobble}
\titleformat{\subsection}[runin]{}{}{0pt}{\@gobble}
\makeatother
\titlespacing{\section}{\parindent}{0pt}{0pt}
\titlespacing{\subsection}{\parindent}{0pt}{0pt}
4.1.28. How to include a tex file with multicolumn commands?
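Long story short, you can't call \input{filename} before \multicolumn: LaTeX's \input does some non-expandable work before reading the file, so the \multicolumn inside the file is no longer the first item in the cell. Use the TeX primitive, which LaTeX keeps as \@@input, instead:
\csname @@input\endcsname filename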
Source:
4.1.29. How to properly compile a LaTeX project?
Use latexmk -pdf file.tex
and that's it. No need to run multiple
commands or figure out the right order. Put it in a Makefile if that's
your workflow.
4.1.30. How to make GitHub compile a document after a push?
4.1.31. How to build a custom style?
- Consider building a class
Mix and match (suggestions for a research paper style):
- Search for accessibility guidelines
- A base template for inspiration, e.g., ICML2021 Template
- Layout, e.g., two columns
- Portrait pages need to be easy to set up
- Line numbering
- Author affiliations need to be easy to set up
- Short title in header/footer
- Page number in header/footer
- Headings need to be easy to spot
- Text font, e.g., Libertinus, Palladio, see also The LaTeX Font Catalogue
- Math font
- Load AMS packages
- Reasonably minimal page margins
- Reasonably minimal fig/table caption margins
- Allow for fig/table extending over both columns
- The appendix needs to be easy to set up
- Appendix fig/table numbering should start with A, e.g., Table A.1
- Option to hide names and acknowledgments
- Option for a short table of contents on the front page
- Have some basic editing commands
- Logical markup, see CTAN: Package soul (Manual, Sec 5.1)
- Note box, see How to highlight an entire paragraph?
- Recall that not everyone types in English
4.1.32. What type of LaTeX commands exist?
- Author commands
  - typically short, lower case names
  - e.g., \section, \emph, \times
- Class and package writer commands
  - typically long, CamelCase names
  - e.g., \InputIfFileExists, \RequirePackage, \PassOptionsToClass
- Internal commands
  - typically contain an @ in their name
  - e.g., \@tempcnta, \@ifnextchar, and \@eha
- Exceptions: \hbox is internal, while \m@ne, the constant -1, is for class and package writers
4.1.33. How to make a class color safe?
- Issue: in {\color{green} text}, the color is restored by a command inserted after the final }
  - e.g., with \setbox0=\hbox{\color{green} ⟨text⟩}, that restore ends up outside the box
- Use LaTeX box commands rather than TeX primitives
  - \sbox rather than \setbox
  - \mbox rather than \hbox
  - \parbox or a minipage environment rather than \vbox
- Use \normalcolor to set regions to the main document color, much like \normalfont
4.1.34. How to write a custom LaTeX class?
- Use the doc package, which comes with LaTeX (see The LaTeX Companion)
4.1.35. Where do I put local LaTeX style files and packages?
The path of the local folder, among many others, is defined in
/etc/texmf/web2c/texmf.cnf. The default path for local files is ~/texmf.
Packages (.tex and .sty files) go in ~/texmf/tex/latex/, and BibTeX
style files (.bst) go in ~/texmf/bibtex/bst/. Finally, run
texhash ~/texmf/ to update the database.
4.1.36. How to make a poster in LaTeX?
Use beamerposter (possibly with the Gemini template) or tikzposter.
4.1.37. Resources
- The Not So Short Introduction to LaTeX2e: LaTeX2e in 139 minutes
- The TeXbook: an example of the business of writing a book in TeX
- TeX by Topic: covers the way TeX works in as much detail as most ordinary programmers will ever need to know
- AMS Author Handbook: includes package recommendations, structure, and a checklist
4.2. BibLaTeX
- Use biblatex (LaTeX package that formats citations) with a biber backend (external program that processes bibliography information)
- bibtex vs. biber and biblatex vs. natbib
4.2.1. How to format citations and the bibliography with BibLaTeX?
%% references
\usepackage[
  backend=bibtex,            % Use legacy bibtex backend (biber is better)
  natbib=true,               % Load aliases for citation commands
  citestyle=authoryear-comp, % Inline: Author year, compressed
  maxcitenames=1,            % Inline: max 1 author name
  bibstyle=authoryear,       % Bibliography: Author year
  giveninits=true,           % Bibliography: first and middle name initials
  dashed=false,              % Bibliography: no dash for recurrent authors
  abbreviate=true,           % Bibliography: abbreviate
  maxbibnames=100,           % Bibliography: all author names
  sorting=nyt,               % Bibliography: sort by name, year, title
  isbn=true,                 % Bibliography: print ISBN
  url=false,                 % Bibliography: don't print URL
  doi=true,                  % Bibliography: print DOI
  eprint=false               % Bibliography: don't print eprint information
]{biblatex}
% Bibliography: print authors as "last name, first name"
\DeclareNameAlias{sortname}{family-given}
Source:
4.2.2. How to bold my name using BibLaTeX?
This will bold YOURFAMILYNAME in the bibliography but not in the citations.
% Bibliography: bold last name
\usepackage{ifthen}
\AtBeginBibliography{%
  \renewcommand*{\mkbibnamefamily}[1]{%
    \ifthenelse{\equal{#1}{YOURFAMILYNAME}}{\textbf{#1}}{#1}}
}
Source:
4.3. Beamer
4.3.1. How to automatically add a section slide in Beamer?
Use \AtBeginSection and \AtBeginSubsection to add a frame whenever
a section or a subsection starts. Beamer has the built-in templates
sectionpage and subsectionpage for section slides. You may modify them or
simply create a new template.
\AtBeginSection{\frame{\sectionpage}}
\AtBeginSubsection{\frame{\subsectionpage}}
Source:
4.3.2. How to hide a frame in Beamer?
Use <beamer:0> to hide a frame in the presentation, <handout:0> to hide it
in the handout, and <handout:0|beamer:0> or simply <all:0> to hide it in
both beamer and handout mode.
\begin{frame}<beamer:0>
  \frametitle{This frame won't be visible}
  Lorem ipsum
\end{frame}
Source:
4.3.3. How not to count appendix slides in Beamer?
Use \usepackage{appendixnumberbeamer}
to automatically reset the frame
counter when the \appendix
command is called. No need to do this by hand :)
4.3.4. How to add a hyperlink to a Beamer frame?
Beamer already loads hyperref, so there is no need for \usepackage{hyperref}.
Use \label{somename} or \hypertarget{somename} to set a target. Use
\hyperlink{somename}{text to show} or
\hyperlink{somename}{\beamerbutton{text to show}} to print a text
or button linking to the target.
4.3.5. How to build a Beamer template?
Mix and match:
- A base template
- Text font, e.g., https://tug.org/FontCatalogue/mathfonts.html
- Latin Modern, Stix2, XITS, Asana, Fira Math, TeX Gyre {Pagella,Bonum,Schola,Termes,Heros}, Libertinus, Euler, Lucida Bright, and Minion Pro
- Math font, e.g., Libertinus
- Color palette, e.g., https://lvjr.bitbucket.io/contrast.html
- Accessibility guidelines
- Good presentation guidelines
- List of slide layouts
4.4. Graph diagrams
4.4.2. PGF/TikZ
Typically requires a bit more work, but it's more flexible and allows for neater graphs
- Use the graphdrawing TikZ library for automated graph layout (it requires LuaLaTeX)
- TikZ & PGF manual
4.5. Reference management
- Use Zotero with the Zotero Connector and Better BibTeX.
5. Text editors
5.1. Emacs
5.1.1. How to input TeX style?
- C-\ RET TeX, then C-u C-\ to change it back (parsnip at Libera.chat)
- latex-input: Enter Unicode characters using LaTeX notation (grym at Libera.chat)
5.1.2. How to open all files by pattern?
emacs *.tex
emacs *.tex */*.tex
find . -iname \*tex -exec emacsclient {} +
5.1.3. How to highlight by word, line, or regex?
5.1.4. How to profile init / settings?
5.1.6. How to do collaborative editing with Emacs?
5.1.7. How to have custom outlining and folding in Emacs?
- Use tree-sitter if your expression is a programming language identifier
- Use origami for general purpose folding, i.e., GitHub - gregsexton/origami.el: A folding minor mode for Emacs
- Change imenu's regular expressions, e.g., see imenu-generic-expression in Imenu (GNU Emacs Lisp Reference Manual)
- See more information about category folding in EmacsWiki: Category Outline
bpalmer at Libera.chat
5.1.8. What are some good .emacs config files?
5.1.9. Emacs packages worth checking
- GitHub - emacs-tw/awesome-emacs: A community driven list of useful Emacs pack…
- ripgrep
- eshell
- dired
- mu4e
- org-publish: for a website
- org-roam
5.1.10. Useful modes
5.2. Org-mode
5.2.1. What is the programming language identifier for a source block?
See Tables 1 and 2 in Babel: Languages.
5.2.2. How to export/tangle automatically on commit?
NB: --batch
implies -q
, hence initialization files are not
loaded. Add -l ~/.emacs
, -l ~/.config/emacs/init.el
or similar.
echo 'type emacs > /dev/null 2>&1 && emacs README.org --batch -f org-md-export-to-markdown --kill && git add README.md' >> .git/hooks/pre-commit && chmod +x .git/hooks/pre-commit
- ~mplscorwin/org-git-hooks: automatically render code and documentation from org when you git commit
- bug#12791: 24.2; An option to load user init file with -batch
5.2.3. How to add a custom format to display a horizontal line in the buffer?
Set up a custom font lock format by calling the function
font-lock-add-keywords
.
(add-hook 'org-mode-hook (lambda () (font-lock-add-keywords nil '(("^-\\{5,\\}" 0 '(:foreground "green" :weight bold))))))
Source:
5.2.4. How to improve the look of an org-mode buffer?
Use org-modern
for a modern style via font locking and text properties.
Consider also org-superstar, whose scope is limited to headings and
plain lists.
Source:
5.2.5. General resources
5.3. Elisp
5.3.1. Where to start with elisp?
(info "(eintr) Top")
6. Programming
6.1. Programming fonts
- Recommended fonts: Consolas, Andale Mono, Fira Code (ligatures), Source Code Pro, DejaVu Sans Mono
- Use Illegal1 = O0 as a test
- Play font tournament: Find Your True Love of Coding Fonts
6.2. Project structure
- Prefer folder-by-feature over folder-by-type (source)
- Organize tests by mirroring the source tree (source)
- Typical structure (source)
  - /bin: files to be executed
  - /src: source files
  - /lib: external dependencies
  - /doc: documentation, and more generally any kind of writing
  - /test: all tests
  - /sandbox: the fact that there's one is probably a bad sign, but there you go
- Executable files
  - Go in the /bin folder
  - Start with the appropriate shebang, e.g., #!/usr/bin/Rscript
  - Are truly executable, e.g., chmod +x script.R
  - Have all paths set so that they can be called from the project root, e.g., ./bin/process-data.R
6.3. Functional programming
6.3.1. Concepts
6.3.2. Pure functions
Requirements:
- Referential transparency: always returns the same result if
given the same arguments
- Rely on their own arguments and immutable values only, e.g.,
a string
- No dependence on random number generation or data files
- Must have no side effects
- No need to worry if used in multiple places, or down a deep
call chain
- No uncertainty about what an object name refers to
- Easier to trace and debug
- Immutable data structures
- Recursion instead of for/while loops
- Function composition instead of attribute mutation
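A minimal base R sketch of the contrast described above: the impure function reads and mutates state outside itself, while the pure one depends only on its arguments. The names `counter`, `impure_next`, and `pure_next` are hypothetical, chosen only for illustration.

```r
## Impure: reads and mutates state that lives outside the function
counter <- 0
impure_next <- function(x) {
  counter <<- counter + 1  # side effect: modifies the global `counter`
  x + counter              # result depends on how many times it was called
}

## Pure: the result depends only on the arguments, and nothing else changes
pure_next <- function(x, step) {
  x + step
}

impure_next(1)   # 2
impure_next(1)   # 3: same argument, different result
pure_next(1, 1)  # 2, every time
```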
6.3.3. Functional design patterns
A program is a chain of monoids (which is a chain of continuations (which is a chain of partial applications (which is a chain of compositions (which is a chain of one-argument functions))))
- Functions all the way down!
- Functions as inputs
- Functions as outputs
- Functions as arguments
  - Avoid hard-coded data, e.g., for i in 1:10
  - Avoid hard-coded behavior, e.g., print
  - Decouple behavior from data, i.e., any behavior times any data
  - Collection functions, e.g., fold, map, reduce, collect
- Composition pattern: chain low-level operations
  - e.g., apple -> cherry = apple -> banana + banana -> cherry
  - Low-level operation: e.g., string -> string
  - Service: chained low-level operations, e.g., Address -> Validation
  - Use-case: chained services, e.g., ChangeProfileRequest -> ChangeProfileResult
  - Application: chained use-cases, e.g., Request -> Response
  - Composition is fractal
  - Works for functions with one parameter only (see partial application)
- Partial application
  - Write a two-parameter function as a one-parameter function that returns a one-parameter function, e.g., add <- function(a) function(b) a + b
  - Uses:
    - When working with vectorized applications, e.g., lapply, Map, Filter.
    - To inject dependencies, e.g., on a database or file connection.
- Continuations: chain partial applications
  - Bad: nested checks (pyramid of doom)
  - Good:
    - Start with an abstract if_ function -> returnType
    - Chain them: if_(doA) else doNestedActionLevel1 if_(doB) else doNestedActionLevel2 if_(doC) else doNestedActionLevel3
- Monoids: chain continuations
  - Closure: combining two things always returns another thing of the same type.
    - Readily vectorized!
  - Associativity: when combining more than two things, pairwise combination order does not matter
    - Readily parallelized!
    - Easy for incremental accumulation
  - Identity: a special thing called "zero" that returns the same thing that was combined with it
    - Use as initial value for empty or missing data
- Two-track model for error handling:
  - Use a switch function for error handling, e.g., validate(input) calls successFun(input) or failure(error)
From Functional Programming Patterns (NDC London 2014) by Scott Wlaschin.
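A rough base R sketch of some of the patterns above: composition, partial application, a monoid fold, and a two-track switch. `compose`, `add`, `validate`, and `on_failure` are hypothetical helpers written for this example, not functions from the talk or from a package; `Reduce()` is base R.

```r
## Composition: chain two one-argument functions, left to right
compose <- function(f, g) function(x) g(f(x))  # hypothetical helper
trim_lower <- compose(trimws, tolower)
trim_lower("  Apple ")  # "apple"

## Partial application: fix the first argument, get back a one-argument function
add <- function(a) function(b) a + b  # hypothetical helper
add_10 <- add(10)
unlist(lapply(1:3, add_10))  # 11 12 13

## Monoid: `+` is closed and associative on numbers, and 0 is its identity,
## so Reduce() can fold any number of values, including none
Reduce(`+`, 1:4)            # 10
Reduce(`+`, integer(0), 0)  # 0: the identity is the result for empty input

## Two-track error handling: a switch function sends the input
## either down the success track or down the failure track
validate <- function(input, success, failure) {  # hypothetical helper
  if (is.numeric(input) && all(input >= 0)) {
    success(input)
  } else {
    failure("input must be non-negative numbers")
  }
}
on_failure <- function(msg) paste("Error:", msg)
validate(c(1, 4), sqrt, on_failure)  # 1 2
validate(-1, sqrt, on_failure)       # "Error: input must be non-negative numbers"
```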
6.3.4. General
- Functional Programming Design Patterns: design pattern overview and some demonstrations
- Where are all the functional programming design patterns?
- Origami programming: natural patterns for computation over recursive datatypes
- Functional Programming, Simplified: large free preview available
- The proper way to handle exceptions and null values
- Many lessons on for-expressions, which lead naturally into monads
- To know good vs bad Scala style
- Functional Programming | Clojure for the Brave and True
7. R programming
7.1. Write better R
7.1.1. Non-standard evaluation
- NSEs are like super duper macros
7.1.2. R is functional
- Functions as arguments
- Functions as ?
- Functions as output
7.1.3. R has dynamic binding
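Or something like that: R uses lexical scoping, but the free variables of a function are looked up when the function runs, not when it is defined (dynamic lookup).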
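A minimal base R illustration of this behavior. `f` and `g` are throwaway names: `f()` picks up the current value of the global `y` at call time, but it never sees the local `y` of its caller `g()`.

```r
y <- 1
f <- function() y + 1  # `y` is a free variable, looked up each time f() runs

a <- f()  # 2
y <- 10
b <- f()  # 11: the current value of the global `y` is used

g <- function() {
  y <- 100  # local to g()
  f()       # f() searches where it was defined (global env), not inside g()
}
c_val <- g()  # 11, not 101
```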
7.2. R programming
7.2.1. Programming with functions
- How to recover the name of the variable passed as argument?
z <- 1:10
f <- function(x) {
  varName <- as.character(as.list(match.call())$x)
  varName  ## deparse(substitute(x)) gives the same result
}
f(z)  ## "z"
Source: lazy evaluation - Passing a variable name to a function in R
- How to get all function call parameters as a list?
Call as.list(match.call())[-1] within your function:
mydummyfun <- function(x, y, main = NULL, ...) {
  l <- as.list(match.call())[-1]
  do.call(plot, l)
}
mydummyfun(1:10, 1:10, ylab = "Ex")
Source: Get all Parameters as List
If you want to pass all the arguments to a new function, use match.call to get the complete call, replace the function being called, and evaluate the modified call.
f1 <- function(a, b, c) { sum(a, b, c^2) }
f2 <- function(a, b, c) { prod(a, b, c) }
f <- function(a, b, c) {
  fun <- f1
  if (a < 0) fun <- f2
  ## get the call (fun and args)
  thiscall <- match.call(expand.dots = TRUE)
  ## change function name
  thiscall[[1]] <- fun
  ## evaluate new call same as fun(a, b, c)
  eval.parent(thiscall)
}
f(+1, 0, 5)
f(-1, 0, 5)
- How to add a hook to a function call?
Use trace to execute an expression before and after a function call.
f_before <- function() { print("Before call") }
f_after <- function() { print("After call") }
trace(base::cumsum, f_before, f_after, print = FALSE)
cumsum(1:10)
untrace(base::cumsum)
cumsum(1:10)
- How to mimic function overloading?
Unfortunately, I haven't found anything better than optional arguments; see for example ?xy.coords. When the behavior depends on the class of the first argument, S3 dispatch (UseMethod) is a better fit.
Source: R - Function overloading
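S3 dispatch via `UseMethod()` is one such alternative when the behavior should depend on the class of the first argument. A minimal sketch, where the generic `describe()` and its methods are hypothetical:

```r
## Generic: dispatches on the class of `x`
describe <- function(x, ...) UseMethod("describe")

describe.numeric <- function(x, ...) {
  sprintf("numeric vector of length %d, mean %.2f", length(x), mean(x))
}

describe.character <- function(x, ...) {
  sprintf("character vector with %d unique values", length(unique(x)))
}

## Fallback for any class without its own method
describe.default <- function(x, ...) {
  sprintf("object of class %s", class(x)[1])
}

describe(c(1, 2, 3))        # "numeric vector of length 3, mean 2.00"
describe(c("a", "b", "a"))  # "character vector with 2 unique values"
describe(TRUE)              # "object of class logical"
```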
7.2.2. Vectorization
- Why are for loops slow in R?
for, :, [, and [<- are all function calls, and function calls can be time-consuming. The following loop is, in fact, a sequence of many function calls. Vectorized functions are written in C and are typically faster.
N <- 20
fib <- rep(NA, N)
fib[1] <- 0
fib[2] <- 1
for (i in 3:N) fib[i] <- fib[i - 1] + fib[i - 2]
Source: The Art of R Programming: A Tour of Statistical Software Design section 14.1.1
- Why is apply not fast?
While lapply and others are implemented in C, apply is actually implemented in R and might not provide a high speedup.
Source: The Art of R Programming: A Tour of Statistical Software Design section 14.1.1
- Which base R functions are vectorized?
Not a comprehensive list
- Math operators: +, -, etc.
- Logic: ==, !=, etc.
- Vectors
  - ifelse
  - which
  - where
  - any
  - all
  - cumsum
  - cumprod
  - abs
  - pmin
  - pmax
- Matrix:
  - rowSums
  - colSums
  - lower.tri
  - upper.tri
- All pairs
  - outer
- All combinations
  - combn
  - expand.grid
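A small sketch comparing an explicit loop with two of the vectorized functions listed above (`pmax`, `ifelse`). The `clip_*` helper names are hypothetical. All three give the same result, but the vectorized versions avoid one R-level function call per element.

```r
x <- c(-2, 5, -1, 3)

## Loop: calls `[`, `<`, and `[<-` once per element at the R level
clip_loop <- function(x) {
  out <- numeric(length(x))
  for (i in seq_along(x)) {
    out[i] <- if (x[i] < 0) 0 else x[i]
  }
  out
}

## Vectorized: a single R-level call handles the whole vector
clip_pmax <- function(x) pmax(x, 0)
clip_ifelse <- function(x) ifelse(x < 0, 0, x)

clip_loop(x)    # 0 5 0 3
clip_pmax(x)    # 0 5 0 3
clip_ifelse(x)  # 0 5 0 3
```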
- How to vectorize the functions and the arguments?
Use mapply. Pass the list of functions as one of the vectorized arguments, and use an anonymous function that applies each function to the remaining arguments (credits: Fendur at libera.chat)
funs <- c(function(x, y) x + y, function(x, y) x^2 * y)
grd <- expand.grid(fun = funs, x = 1:3, y = 1:4)
with(grd, mapply(function(f, x, y) f(x, y), fun, x, y))
Use ellipsis for functions taking different arguments
funs <- c(function(x, y, ...) x + y, function(x, y, z) x^2 * y/z)
grd <- expand.grid(fun = funs, x = 1:3, y = 1:4, z = 1:2)
with(grd, mapply(function(f, x, y, z) f(x, y, z), fun, x, y, z))
In the case of two-argument functions, the explicit expansion can be avoided using outer
funs <- c(function(x, y) x + y, function(x, y) x^2 * y)
sapply(funs, function(f) outer(1:3, 1:4, f))
## Using R >= 4.1
sapply(funs, \(f) outer(1:3, 1:4, f))
7.2.3. How to create a sequence within groups?
x <- unlist(replicate(10, rep(sample(LETTERS, 1), rpois(1, 4))))
sequence(rle(x)$lengths)                   # if equal values are contiguous
unlist(sapply(unname(table(x)), seq.int))  # matches x only if x is sorted
ave(seq_along(x), x, FUN = seq_along)      # works for any order
7.2.4. How to identify value changes in a sequence?
x <- unlist(replicate(10, rep(sample(LETTERS, 1), rpois(1, 4))))
head(cumsum(rle(x)$lengths)+1, -1)
Source: Identifying where value changes in R data.frame column
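A quick check of the `rle` idiom against a direct comparison of each element with its predecessor. The toy vector is made up so that the expected positions can be read off by hand.

```r
x <- c("A", "A", "B", "B", "B", "A", "C")

## rle idiom from above: start of every run except the first one
changes_rle <- head(cumsum(rle(x)$lengths) + 1, -1)

## Same positions: indices where an element differs from its predecessor
changes_diff <- which(x[-1] != x[-length(x)]) + 1

changes_rle   # 3 6 7
changes_diff  # 3 6 7
```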
7.2.5. How to renumber a group?
Note that as.numeric(as.factor(x)) numbers the groups in the sorted
order of their values, not in order of appearance, so it does not give
the result below. Also, the match-unique combo scales better with large
vectors.
x <- c(4, 4, 4, 6, 6, 6, 6, 8, 8, 8, 8, 1, 1, 1, 5, 5, 5, 5)
match(x, unique(x))
## [1] 1 1 1 2 2 2 2 3 3 3 3 4 4 4 5 5 5 5
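A minimal check of the `as.factor` pitfall on the same vector. Passing `levels = unique(x)` to `factor()` is one way to get numbering by appearance while still using factors.

```r
x <- c(4, 4, 4, 6, 6, 6, 6, 8, 8, 8, 8, 1, 1, 1, 5, 5, 5, 5)

by_appearance <- match(x, unique(x))         # 1 1 1 2 2 2 2 3 ...
by_sorted_value <- as.numeric(as.factor(x))  # levels are sorted: 1 4 5 6 8
by_appearance_factor <- as.numeric(factor(x, levels = unique(x)))

by_sorted_value[1:4]  # 2 2 2 4: the group of 4s gets code 2, not 1
```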
7.2.6. How to make a cross table with a custom function (e.g., mean)?
addmargins(xtabs(value ~ ., aggregate(value ~ factor1 + factor2, data = DF, FUN = mean)), FUN = mean)
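A runnable sketch of the one-liner on a made-up data frame. It assumes the formula form `value ~ factor1 + factor2` and the column names `factor1`, `factor2`, and `value`. The margins are unweighted means of the cell means, not means over the raw rows.

```r
DF <- data.frame(
  factor1 = c("a", "a", "a", "b", "b", "b"),
  factor2 = c("x", "x", "y", "x", "y", "y"),
  value   = c(1, 3, 4, 6, 8, 10)
)

## 1. Cell means: one row per factor1 x factor2 combination
cell_means <- aggregate(value ~ factor1 + factor2, data = DF, FUN = mean)

## 2. Reshape the cell means into a two-way table
tab <- xtabs(value ~ ., data = cell_means)

## 3. Add a "mean" row (column means) and a "mean" column (row means);
##    quiet = TRUE suppresses the message about the margin order
res <- addmargins(tab, FUN = mean, quiet = TRUE)
res
```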
7.2.7. How to make lapply return a data.frame?
If x is a data.frame, which is a list with attributes, assign the
result to x[] to preserve all attributes.
x[] <- lapply(x, type.convert, as.is = TRUE)
Source: Jiří Moravec
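A minimal sketch of why the `x[]` assignment matters, on a made-up data frame. Plain `lapply()` returns a list, while assigning into `x[]` keeps the data.frame class and attributes. `as.is = TRUE` is passed because recent R versions warn when `type.convert()` is called without it.

```r
x <- data.frame(id = c("1", "2", "3"), flag = c("TRUE", "FALSE", "TRUE"),
                stringsAsFactors = FALSE)

y <- lapply(x, type.convert, as.is = TRUE)    # plain list: the data.frame class is gone
x[] <- lapply(x, type.convert, as.is = TRUE)  # assigning into x[] keeps x a data.frame

class(y)          # "list"
class(x)          # "data.frame"
sapply(x, class)  # id: "integer", flag: "logical"
```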
7.2.8. How to split a matrix row-wise as a list?
Use base::asplit for efficiency, where asplit(X, 1) and asplit(X, 2)
return a list of rows and columns, respectively.
X <- matrix(rnorm(100), nrow = 20)
## I just learned about asplit
asplit(X, 1)
## Original note
l <- as.list(as.data.frame(t(X)))
7.2.9. How to look up a value among possibilities?
The workhorse of any labeling function
l <- list(key1 = "value1", key2 = NA, key3 = 022)
lookup <- function(x, l) {
  unlist(l[x])
}
Source: benchmark unlist versus do.call(c, list) for list lookup in R
7.2.10. How to remove columns with all NA fast?
General approach with Base R only
Filter(function(x)!all(is.na(x)), df)
Via data.table for general time and memory efficiency (40% faster in example)
DT[, which(unlist(lapply(DT, function(x)!all(is.na(x))))), with = FALSE]
Source: remove columns from dataframe where ALL values are NA
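A small base R check of the `Filter()` approach on a made-up data frame. The `colSums()` version is included only for comparison; it is an alternative and does not come from the source.

```r
df <- data.frame(a = c(1, NA, 3), b = c(NA, NA, NA), c = c("x", NA, "z"))

## Keep the columns that have at least one non-missing value
kept <- Filter(function(x) !all(is.na(x)), df)
names(kept)  # "a" "c"

## Equivalent with logical column indexing
kept2 <- df[, colSums(!is.na(df)) > 0, drop = FALSE]
```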
7.2.11. How to fast apply a function over a ragged array?
Use unlist(lapply(split(x, f), FUN)) for speed, but maybe consider
tapply(x, f, FUN) for readability.
## Unit: microseconds
##      expr      min        lq      mean    median        uq       max neval
##  f0(x, f)  409.578  415.5165  426.9041  418.9500  424.3400  4237.466 10000
##  f1(x, f)  411.208  418.3095  430.4231  421.8645  427.4155  5550.120 10000
##  f2(x, f)  474.681  487.1265  498.8509  492.6960  497.6360  2552.075 10000
##  f3(x, f) 1395.582 1442.3785 1494.0197 1459.3515 1472.5205 28121.379 10000
set.seed(1)
x <- rnorm(10000)
f <- factor(rpois(10000, 5))
tapply_ <- function(x, f, FUN) {
  unlist(lapply(split(x, f), FUN))
}
f0 <- function(x, f) { unlist(lapply(split(x, f), mean)) }
f1 <- function(x, f) { tapply_(x, f, mean) }
f2 <- function(x, f) { tapply(x, f, mean) }
f3 <- function(x, f) { by(x, f, mean) }
bench <- microbenchmark::microbenchmark(
  f0(x, f), f1(x, f), f2(x, f), f3(x, f),
  times = 1E4)
print(bench)
7.2.12. How to fast subset rows corresponding to max value by group?
## Row with maximum `g` for each group `id` in the `bdt` data.table
bdt[bdt[, .I[g == max(g)], by = id]$V1]
Source: subset rows corresponding to max value by group using data.table
7.2.13. How to create a named vector programmatically in one statement?
out <- setNames(c("value1", "value2"), c("name1", "name2"))
Source: create a numeric vector with names in one statement?
7.2.14. How to get all function call arguments as a list?
Including ellipsis also!
f <- function(a, b = 2, ...) { c(as.list(environment()), list(...)) }
Source: get all Parameters as List
7.2.15. How to debug an error thrown in a package?
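options(
  error = recover,              # browse the call frames when an error occurs
  show.error.locations = TRUE,  # print the source location of the error, if known
  warn = 2                      # turn warnings into errors so they also trigger recover
)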
Source: debugging unexpected errors in R – how can I find where the error occurred?
7.2.16. How to compute a simple moving average with base R?
SMA <- function(x, K) {
  if (K %% 2 == 0) stop("K must be odd")
  pad <- rep(NA, (K - 1) / 2)
  rowMeans(embed(c(pad, x, pad), K), na.rm = TRUE)
}
Source:
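A self-contained check of the moving average. The function is repeated so that the block runs on its own, and the window check is written as `K %% 2 == 0`. For comparison, `stats::filter()` with equal weights gives the same centered averages in the interior. At the edges it returns `NA`, whereas `SMA()` averages whatever values are available there.

```r
## Centered simple moving average; K must be odd so the window is symmetric
SMA <- function(x, K) {
  if (K %% 2 == 0) stop("K must be odd")
  pad <- rep(NA, (K - 1) / 2)
  ## embed() builds one row per window; the NA padding keeps length(x) rows
  rowMeans(embed(c(pad, x, pad), K), na.rm = TRUE)
}

x <- c(1, 2, 3, 4, 5)
SMA(x, 3)  # 1.5 2.0 3.0 4.0 4.5 (edges average the available values)

## stats::filter() agrees in the interior but leaves NA at the edges
as.numeric(stats::filter(x, rep(1 / 3, 3), sides = 2))  # NA 2 3 4 NA
```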
7.2.17. How to set the seed locally for a function?
Note: this needs validation.
myfun <- function(seed) {
  old <- .Random.seed
  on.exit({assign(".Random.seed", old, envir = .GlobalEnv)})
  set.seed(seed)
}
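Since the note says the snippet needs validation, here is one way to test the idea. The helper `with_local_seed()` and its `expr_fun` argument are hypothetical. Unlike the snippet, the helper also handles a fresh session where `.Random.seed` does not exist yet; in that case the snippet fails at `old <- .Random.seed`.

```r
## Sketch: run code with a fixed seed, then restore the caller's RNG state.
## If no seed existed before, remove the one created by set.seed() on exit.
with_local_seed <- function(seed, expr_fun) {
  had_seed <- exists(".Random.seed", envir = .GlobalEnv, inherits = FALSE)
  if (had_seed) {
    old <- get(".Random.seed", envir = .GlobalEnv, inherits = FALSE)
    on.exit(assign(".Random.seed", old, envir = .GlobalEnv), add = TRUE)
  } else {
    on.exit(rm(".Random.seed", envir = .GlobalEnv), add = TRUE)
  }
  set.seed(seed)
  expr_fun()
}

## Check: the caller's random stream is unaffected by the local seed
set.seed(1)
expected <- runif(1)

set.seed(1)
local_draw <- with_local_seed(42, function() runif(1))
after <- runif(1)

identical(after, expected)  # TRUE
identical(local_draw, with_local_seed(42, function() runif(1)))  # TRUE: reproducible
```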
7.2.18. Visualization
- How to draw a plot with minimal margins?
## oma: outer margin lines of the device (bottom, left, top, right)
## mar: margin lines of the figure (bottom, left, top, right)
## mgp: margin line for the axis title, axis labels, and axis line
## No title
opar <- par(
  oma = c(0, 0, 0, 0) + .1,
  mar = c(3, 3, 0, 0),
  mgp = c(2, 1, 0)
)
plot(x = 1:10, y = 1:10)
par(opar)  # restore the previous settings
## No title nor axis labels
opar <- par(
  oma = c(0, 0, 0, 0) + .1,
  mar = c(2, 2, 0, 0),
  mgp = c(2, 1, 0)
)
plot(x = 1:10, y = 1:10)
par(opar)
- How to check if a value is in an interval?
## x <- 1
## confint <- c(-0.5, 0.5)
(prod(sign(confint - x)) < 0)
Source: rickyrick at libera.chat
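A short sketch of why the sign trick works. `x` lies strictly inside the interval exactly when its distances to the two endpoints have opposite signs, which makes the product of the signs negative. `in_interval()` is a hypothetical wrapper. It excludes the endpoints, and the order of the bounds does not matter. For a vector of `x` values, plain comparisons are simpler because `prod()` collapses everything into one value.

```r
## x is strictly inside ci exactly when (ci[1] - x) and (ci[2] - x) differ in sign
in_interval <- function(x, ci) prod(sign(ci - x)) < 0

in_interval(1, c(-0.5, 0.5))    # FALSE
in_interval(0.2, c(-0.5, 0.5))  # TRUE
in_interval(0.5, c(-0.5, 0.5))  # FALSE: an endpoint gives sign 0

## For several x values at once, compare explicitly
x <- c(-1, 0, 0.2, 0.5)
x > -0.5 & x < 0.5  # FALSE TRUE TRUE FALSE
```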
- How to cache a read function?
- How to use
geom_tile
with irregular data?
Note that ?geom_tile recommends akima::interp.
library(ggplot2)
x <- mtcars$hp    ## x-axis
y <- mtcars$qsec  ## y-axis
z <- mtcars$mpg   ## surface color
ak <- akima::interp(x, y, z)
DF <- data.frame(expand.grid(ak[1:2]), z = c(ak[[3]]))
ggplot(DF, aes(x, y, fill = z)) + geom_tile()
If the data size is large, constructing the data.frame as follows might be more efficient.
DF <- expand.grid(ak[1:2])
DF$z <- c(ak[[3]])
- How to remove white lines from
geom_tile
with ggplot2?
The horizontal and vertical variable values should be equally spaced for
geom_tile to work automatically. If there are white lines, there might be small inconsistencies in the gap between a few values (e.g., the first value is 1E-4 instead of an actual zero). Try round(x) or factor(x) for a quick fix.
Source:
- How to plot in reverse log scale with ggplot2?
#' Reverse log transformation
#'
#' @param base a positive or complex number: logarithm base.
#'   Defaults to `e = exp(1)`.
#' @return A scales transformation object.
#' @references https://gist.github.com/JoFrhwld/2266961
.revlog_trans <- function(base = exp(1)) {
  scales::trans_new(
    name = paste("revlog-", base, sep = ""),
    transform = function(x) { -log(x, base) },
    inverse = function(x) { base^(-x) },
    breaks = scales::log_breaks(base = base),
    domain = c(1e-100, Inf)
  )
}
scale_x_revlog10 <- function(...) {
  scale_x_continuous(trans = .revlog_trans(base = 10), ...)
}
scale_y_revlog10 <- function(...) {
  scale_y_continuous(trans = .revlog_trans(base = 10), ...)
}
- How to add labels near the plot boundaries with ggplot2?
Use -Inf and Inf to signal the left/bottom and right/top ends, respectively, e.g., use x = Inf and y = Inf to place a geom_label in the north-east corner.
ggplot() +
  geom_point(aes(x = 1:10, y = rnorm(10))) +
  geom_label(aes(x = Inf, y = Inf, label = "Some text"), vjust = 1, hjust = 1)
Source:
- How to move the legend closer to the axis label with ggplot2?
Give yourself the gift of manual fine-tuning using
theme(legend.margin = margin(-10, 0, 0, 0)), where the -10 (the top
margin, since margin() takes top, right, bottom, left) needs to be
defined on a case-by-case basis.
Source:
- How to make ggplot2 match LaTeX graphics and font size?
Reasonable sizes are 11pt (3.87mm) for manuscripts, and 24pt (8.44mm) or 25pt (8.79mm) for posters read from 1m away.
- Use \show\f@size (between \makeatletter and \makeatother) in LaTeX to show the font size
- Use theme(text = element_text(size = 11)) in R to set the target font size in points
- Use \the\linewidth in LaTeX to show the line width
  - Note that \textwidth is the width of the whole text block, while \linewidth is the width available at the current position (e.g., one column in a two-column layout)
- Use ggsave(..., width = 10, units = "in") to match LaTeX's line width, replacing 10 with \linewidth in inches (72.27pt = 1in)
- Include the figure with width=\linewidth
Source:
- [exa] at libera.chat
- Get current font size as length - TeX - LaTeX Stack Exchange
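A small sketch of the unit conversions behind these numbers, using TeX's 72.27 pt per inch. The helper names are hypothetical. The 345pt line width is only an example value, so replace it with whatever `\the\linewidth` shows in your document.

```r
## TeX points: 72.27 pt = 1 in = 25.4 mm
pt_to_in <- function(pt) pt / 72.27
pt_to_mm <- function(pt) pt * 25.4 / 72.27

round(pt_to_mm(c(11, 24, 25)), 2)  # 3.87 8.44 8.79

## e.g., if \the\linewidth shows 345.0pt
fig_width_in <- pt_to_in(345)      # about 4.77 in
## ggsave("figure.pdf", width = fig_width_in, units = "in")
```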
- How to plot in log scale with base R?
plot(exp(1:10), 1:10, log = "x")
plot(1:10, exp(1:10), log = "y")
plot(exp(1:10), exp(1:10), log = "xy")
- How to fine tune R plot margins?
- How to make beautiful plots with base R?
7.2.19. How to improve the look of my rmarkdown HTML document?
Thanks to fendur on #R at libera.chat
- Client-side, precomputed dashboard like document:
- Idea: one chapter contains an image plus some comments
- Start with html_document
- Use .tabset-pills to organize chapters
- Use .tabset-fade to make switching smoother
- Pimp my RMD: a few tips for R Markdown
Below is a quick template I prepared.
---
title: "Barebone dashboard"
date: "`r Sys.Date()`"
output: html_document
---

<style type="text/css">
.main-container {
  max-width: 100% !important;
}
.title, .author, .date {
  display: inline !important;
}
.nav-pills {
  line-height: 0px !important;
}
.h1 {
  font-size: 16px !important;
}
</style>

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  echo = FALSE,
  fig.align = "center",
  fig.width = 16,
  fig.height = 8.1,
  out.height = "80%"
)
```

# {.tabset .tabset-fade .tabset-pills}

## Precipitation per year

```{r}
plot(Nile, col = "darkgreen", lwd = 2, log = "y")
title(main = "Annual flow of the river", adj = 1, line = .5)
```

1. This is a full-page wide picture
2. Line 2
3. Line 3
4. Line 4
5. Line 5

## MPG explained

```{r}
par(mfrow = c(1, 2))
plot(mtcars$mpg, mtcars$drat)
plot(mtcars$mpg, mtcars$wt)
```

1. This is a full-page side-by-side picture
2. Line 2
3. Line 3
4. Line 4
5. Line 5

## Species and width

```{r fig.width = 8.1, fig.height = 8.1}
pairs(iris[, 1:3], col = iris$Species, bg = iris$Species, pch = 21)
```

1. This is a full-page square picture
2. Line 2
3. Line 3
4. Line 4
5. Line 5
7.2.20. Where can I find useful addins for RStudio?
Check GitHub - daattali/addinslist to discover and install useful RStudio
addins. You can install the addinslist
package to browse the list from
RStudio.
Here are some suggestions:
7.3. RStudio
7.3.1. Should .Rproj files be added to .gitignore?
RStudio's general recommendation is to include the .Rproj file in the repository, i.e., do not ignore it
7.3.2. How to order lines alphabetically?
?
7.3.3. How to align assignment or equal symbols across lines?
7.4. Emacs Speaks Statistics
7.4.2. Resources
8. Stan programming
8.1. Stan
8.1.1. How do I choose between transformed parameters vs model blocks?
Variables in the transformed parameters block
- Can have constraints for error checking
- Are global
  - can be accessed from the generated quantities block
  - are part of the output by default
By contrast, variables declared in the model block are local: they cannot have constraints and are not part of the output.
9. Data science
9.1. Datasets
These are hosted by other people. Use the wayback machine or web archive for dead links.
9.1.1. General collections
9.1.2. Functional data
9.1.3. Spatial data
- IBEX: Interstellar Boundary Explorer
- Main paper: 10.3847/1538-4365/aa66d8
9.2. Data backup
9.2.1. What software backup should I use?
- Borg: Deduplicating archiver with compression and encryption
9.2.2. How to mark a folder not to be backed up?
- Cache Directory Tagging Specification: avoid backing up, archiving, or otherwise unnecessarily copying directories
9.3. Data processing
9.3.1. GNU Make
- Minimal make file
- Passing arguments to make
- Rules with grouped targets: when one script produces more than one target
- Automatic variables: handy to write recipes
- Automation and make: for analysis, visualization and papers
- Self-documenting makefiles
9.3.2. Pipeline automation
- Automating data-analysis pipelines
- Awesome pipeline: curated list of awesome pipeline toolkits
9.3.3. How to view tabular data?
- Tabular data pager: works fine with CSV files too!
- VisiData: lightning fast tabular data exploring, arranging, plotting
9.3.4. How to work with collaborators who use spreadsheets?
9.3.5. ETL cycle
- Cycle initiation
- Build reference data
- Extract (from sources)
- Validate
- Transform (clean, apply business rules, check for data integrity, create aggregates or disaggregates)
- Stage (load into staging tables, if used)
- Audit reports (for example, on compliance with business rules. Also, in case of failure, helps to diagnose/repair)
- Publish (to target tables)
- Archive
Source: Real-life ETL cycle
- ETL Template
  - Type of tasks
    - Data loading
      - Keep many small units as a principle
    - Data munging
      - Munge many small units in parallel
    - Data processing: computationally intensive PEA
      - Run tasks that need many data units (prepare)
      - Run tasks that need one data unit only in parallel (execute)
        - One script with command line arguments?
        - A Makefile that calls the script?
        - One shell script to call the Makefile?
        * TODO This needs to be figured out
      - Run tasks that need many data units (aggregate)
    - Data storing
      - Write many small units to disk
      - Write aggregates for each small data unit or together?
9.3.6. Workflow for statistical analysis and report writing
9.4. A typical workflow
Scientific data generation, collection, curation and processing.
9.5. Data visualization
9.5.1. What colors are recommended for data visualization?
- 1 color:
- for points: black #000000
- for lines: either black #000000 or honolulu blue #2271B2
- for filled areas: summer sky #3DB7E9
- for surfaces: TBD
- 2 colors:
- for points: TBD
- for lines: honolulu blue #2271B2 and gamboge #E69F00
- for filled area: honolulu blue #2271B2 and gamboge #E69F00
- 3 colors:
- for points: TBD
- for lines: TBD
- for filled areas: TBD
Source:
- Color-palette for color blindness: maintain perceptual luminance uniformity in color blind space
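A minimal sketch that stores the recommended colors as a named R vector so that plots can refer to them by name. The vector name `viz_colors` is hypothetical, and the gamboge code is written as the six-digit `#E69F00`.

```r
## Recommended colors from the list above, as hex codes
viz_colors <- c(
  black         = "#000000",
  honolulu_blue = "#2271B2",
  summer_sky    = "#3DB7E9",
  gamboge       = "#E69F00"
)

## Two-color line plot, e.g.:
## plot(x, y1, type = "l", col = viz_colors[["honolulu_blue"]])
## lines(x, y2, col = viz_colors[["gamboge"]])

col2rgb(viz_colors)  # errors if any code is not a valid R color
```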
9.5.2. Color palette
- Color Palette for All types of Color Blindness
- Color-palette for color blindness: maintain perceptual luminance uniformity in color blind space
- Color Universal Design: set of colors that is unambiguous both to colorblinds and non-colorblinds
9.5.3. Resources
- What to consider when choosing colors for data visualization
- Data-to-viz: visualization decision tree guide
9.6. High-performance computing
9.6.1. How to nicely list all my jobs in Slurm?
watch -d squeue --me --format=\"%.18i %.18P %.32j %.8u %.2t \| %.10M \| %.10l \| %.19S \| %.6D %20Y %R\" --sort=i
9.6.2. How to list previously submitted jobs in Slurm?
Use the sacct
command, possibly with -S starttime
and -E endtime
arguments. It also accepts the --format
argument.
sacct -S now-1days
Source:
9.6.3. How to find out the CPU time and memory usage of a Slurm job?
Use seff jobid after the job has finished. Use sacct -s r -j jobid
or sstat -j jobid for jobs in progress.
Source:
9.6.4. How to specify Slurm job resources conditionally or programmatically?
Short answer: you can't if you're using #SBATCH directives. Instead,
call sbatch from a wrapper script or command, where you can pass
options to sbatch as needed (e.g., sbatch --mem=$mymem).
Source:
Alternatively, you can pass the job script to sbatch through stdin
instead of as a positional argument.
Source:
9.6.5. What is a good guide for Slurm?
This is an excellent guide: Introducing Slurm | Princeton Research Computing. It covers how to submit serial jobs, multithreaded jobs, multinode or parallel MPI jobs, multinode, multithreaded jobs, job arrays, running multiple jobs in parallel as a single job, and running a sequence of jobs (job dependencies).
9.7. Naming convention
9.7.1. Common guidelines
- Name things mainly for their role, sometimes for their type
- Variable names follow mathematical notation only in low-level functions; use meaningful names for user-facing variables
- Use unabbreviated verbs for function names
- Keep abbreviations between 3 and 4 chars
9.7.2. Keywords
Modeling
ou | Observational unit |
xu | Experimental unit |
sim | Simulated/simulations |
rsts | Restarts |
init | Initialize/initialization |
pars | A vector of parameters |
fix | Fixed values |
kwn | Known values |
val | Validation |
opt | Optima (obtained with optimization) |
fit | Fit |
prd | Predicted |
Cross-validation
oos | Out of sample |
ins | In sample |
ios | In and out of sample |
idx | Index |
ind | Indicator (boolean, true or false) |
fold | Fold (as in K-fold CV) |
Object types
Str | String |
Ls | List |
Li | List item |
Mat | Matrix |
Vec | Vector |
Fun | Function |
DF | Data frame / data.table |
LDF | Long data.frame / data.table |
WDF | Wide data.frame / data.table (might have more than one) |
xi | A scalar (must be a scalar) |
xs | A vector (must be a vector) |
x | A scalar and/or a vector (both should work) |
X | A vector or matrix (both should work) |
Bit | A scalar, vector, or matrix bitmask |
Transformations
sc | Scaled |
uc | Unscaled? maybe `nt` for natural? `og` for original? |
svd | guess what :) |
pca | idem |
Other
id | Identification code |
ET | Elapsed time |
seed | Seed number |
hash | For a hash (if it's an id, use id instead) |
Statistics
med | Median |
mean | Mean |
var | Variance |
cov | Covariance |
cor | Correlation |
wt | Weight (or ws and wi) |
lsc | Length-scale |
qi | Quantile |
qs | Vector of quantiles |
sigma2 | Use 2 instead of Sq o.o |
ll | Log-likelihood |
hess | Hessian matrix |
mse | Mean square error |
rmse | Root mean square error |
Measurement units
Bm | Biomass |
KgHa | Kilogram per hectare |
MgHa | Megagram per hectare |
BuAc | Bushels per acre |
Temp | Temperature |
El | Elevation |
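To illustrate how the keywords combine, here is a hypothetical Python snippet where each name is built as role plus qualifiers, e.g., rmse_oos for the out-of-sample root mean square error and ind_ins for the in-sample indicator. The data and the linear fit are made up for the example; this is one reading of the conventions, not a fixed template.

```python
import numpy as np

# Hypothetical example: variable names assembled from the keywords above.
xs = np.linspace(0.0, 10.0, 21)            # xs: a vector of inputs
ys = 2.0 * xs + 1.0                        # response (noise-free for this sketch)

ind_ins = np.arange(xs.size) % 4 != 0      # ind + ins: boolean indicator of in-sample points
pars_fit = np.polyfit(xs[ind_ins], ys[ind_ins], deg=1)  # pars + fit: fitted parameter vector

ys_prd_ins = np.polyval(pars_fit, xs[ind_ins])    # prd + ins: in-sample predictions
ys_prd_oos = np.polyval(pars_fit, xs[~ind_ins])   # prd + oos: out-of-sample predictions

rmse_ins = np.sqrt(np.mean((ys[ind_ins] - ys_prd_ins) ** 2))
rmse_oos = np.sqrt(np.mean((ys[~ind_ins] - ys_prd_oos) ** 2))
print(rmse_ins, rmse_oos)
```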
10. Statistics
10.1. Statistics
10.1.1. What is the Karhunen–Loève theorem?
Let \(X_t\), \(t\in[a,b]\), be a centered stochastic process, i.e., \(\mathrm{E}[X_t] = 0\) for \(t\in[a,b]\). Assume the process satisfies a technical continuity condition. Then, we have
\begin{equation} X_t = \sum_{k=1}^{\infty} Z_k e_k(t) \end{equation}
where \(Z_k\) are pairwise uncorrelated random variables and \(e_k\) are continuous real-valued functions on \([a,b]\) that are pairwise orthogonal in \(L^2([a,b])\). If the process is Gaussian, then \(Z_k\) are Gaussian and stochastically independent.
Given any orthonormal basis \(e_k(t)\) of \(L^2([a,b])\), we can approximate the stochastic process with
\begin{equation} \hat{X}_t = \sum_{k=1}^{K} A_k\,e_k(t),\ A_k = \int_a^b X_t\,e_k(t)\,\mathrm{d}t ,\, K\in\mathbb{N} \end{equation}
Among all such orthonormal bases, the Karhunen–Loève basis minimizes the total mean squared error resulting from truncating the expansion at \(K\) terms.
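A minimal numpy sketch of the expansion in practice (my own illustration, not from the source): discretize the covariance function on a grid, eigendecompose it, and use the eigenpairs as \(\lambda_k\) and \(e_k\). The Wiener covariance \(\min(s,t)\) on \([0,1]\) serves as a test case because its eigenvalues are known in closed form, \(1/((k-\tfrac{1}{2})^2\pi^2)\); the grid size and the rectangle-rule quadrature are arbitrary choices.

```python
import numpy as np

n = 1000
t = np.arange(1, n + 1) / n                  # grid on (0, 1]
C = np.minimum.outer(t, t)                   # Wiener covariance min(s, t)

# Discretized eigenproblem: the integral operator is approximated by C / n (rectangle rule).
evals, evecs = np.linalg.eigh(C / n)
order = np.argsort(evals)[::-1]              # eigh returns eigenvalues in ascending order
lam_num = evals[order][:5]
e_num = evecs[:, order][:, :5] * np.sqrt(n)  # rescale so that sum(e_k^2) / n is about 1

k = np.arange(1, 6)
lam_exact = 1.0 / ((k - 0.5) ** 2 * np.pi ** 2)
print(np.round(lam_num, 4))
print(np.round(lam_exact, 4))

# Truncated expansion of a sample path: X_t ~ sum_k sqrt(lam_k) xi_k e_k(t), xi_k iid N(0, 1).
rng = np.random.default_rng(1)
K = 5
xi = rng.standard_normal(K)
path_K = e_num[:, :K] @ (np.sqrt(lam_num[:K]) * xi)
```

The sum of all eigenvalues approximates the total variance \(\int_0^1 t\,\mathrm{d}t = 1/2\), which is a handy sanity check on the discretization.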
10.1.2. What is the Karhunen–Loève decomposition of a Wiener process?
Let \(W_t\), \(t\in[0,1]\), be a Wiener process, i.e., a centered standard Gaussian process with covariance function \(K_{W}(t,s)=\operatorname {cov} (W_{t},W_{s})=\min(s,t)\). Then, the expansion consists of sinusoidal functions
\begin{align} e_{k}(t) &={\sqrt{2}}\sin\left(\left(k-{\tfrac{1}{2}}\right)\pi t\right) &\text{eigenfunctions}\\ \lambda_{k} &=\frac{1}{(k-{\frac{1}{2}})^{2}\pi^{2}} &\text{eigenvalues} \end{align}
10.1.3. Principal Component Regression
Let \(\mathbf{T} = \mathbf{X} \mathbf{W}\) for principal component score matrix \(\mathbf{T}\) and loading matrix \(\mathbf{W}\). Set the models \(Y = \mathbf{X} \mathbf{\beta} + \mathbf{\varepsilon}\) and \(Y = \mathbf{T} \mathbf{\beta}_T + \mathbf{\varepsilon}\). Then, if \(\mathbf{X}\) has full column rank, \(\mathbf{X} \mathbf{\beta} = \mathbf{T} \mathbf{\beta}_T \iff \mathbf{\beta} = \mathbf{W} \mathbf{\beta}_T\).
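A minimal numpy sketch on simulated data, taking the loadings \(\mathbf{W}\) from the SVD of the centered \(\mathbf{X}\) (one common way to obtain them). Regressing on the first \(K\) scores and mapping back with \(\mathbf{\beta} = \mathbf{W}_K \mathbf{\beta}_T\) gives the PCR estimate; with all \(p\) components it reproduces ordinary least squares.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 4
X = rng.standard_normal((n, p))
X -= X.mean(axis=0)                          # center columns, as PCA assumes
beta_true = np.array([1.0, -2.0, 0.5, 0.0])
Y = X @ beta_true + 0.1 * rng.standard_normal(n)

# Loadings W from the SVD X = U S V^T: W = V, scores T = X W.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
W = Vt.T
T = X @ W


def pcr(K):
    """Regress Y on the first K scores and map back with beta = W_K beta_T."""
    beta_T, *_ = np.linalg.lstsq(T[:, :K], Y, rcond=None)
    return W[:, :K] @ beta_T


beta_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(np.allclose(pcr(p), beta_ols))  # keeping all components recovers OLS
print(pcr(2))                          # estimate using only two components
```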
10.1.4. How to revert SVD?
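Reverting the SVD amounts to multiplying the factors back together, \(\mathbf{X} = \mathbf{U}\,\mathrm{diag}(\mathbf{s})\,\mathbf{V}^\top\). A short numpy sketch on a random matrix; truncating to the first \(k\) singular triplets gives the best rank-\(k\) approximation in the Frobenius norm (Eckart–Young).

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4))

U, s, Vt = np.linalg.svd(X, full_matrices=False)   # X = U diag(s) V^T
X_back = U @ np.diag(s) @ Vt                       # undo the decomposition
print(np.allclose(X, X_back))                      # equal up to floating point

# Keeping only the first k singular triplets gives a rank-k approximation.
k = 2
X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
```

Writing (U * s) @ Vt avoids building the diagonal matrix and gives the same result.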
10.1.5. How to find which variables are collinear?
Look at the tail of the QR decomposition pivot vector (rickyrick at libera.chat)
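A sketch of the tip with SciPy's column-pivoted QR (scipy.linalg.qr with pivoting=True), on made-up data where one column is the sum of two others. The numerical rank is the number of non-negligible diagonal entries of R; the pivot entries after the rank index the columns that are linear combinations of the earlier ones. The rank tolerance below is a common choice, not the only one. In R, the same idea is qr(X)$pivot[-seq_len(qr(X)$rank)].

```python
import numpy as np
from scipy.linalg import qr

rng = np.random.default_rng(0)
a, b, c = rng.standard_normal((3, 50))
X = np.column_stack([a, b, a + b, c])       # column 2 equals column 0 + column 1

# Column-pivoted QR: X[:, piv] = Q @ R, with |diag(R)| non-increasing.
Q, R, piv = qr(X, mode="economic", pivoting=True)
d = np.abs(np.diag(R))
tol = d[0] * max(X.shape) * np.finfo(float).eps   # rank tolerance
rank = int(np.sum(d > tol))

tail = piv[rank:]   # columns expressible through the columns in piv[:rank]
print(rank, tail)
```

Which of the three dependent columns lands in the tail depends on the pivoting order; any of them is a valid answer.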
10.1.6. What classification model to use for observations in a map?
- Conditional random field: considers the context (neighborhood) of an observation when predicting a label
- Markov random field: finite or infinite undirected graphical model that can do cyclic dependencies but can't do induced dependencies
10.1.7. How to compute the Cressie and Hawkins (1980) robust estimator?
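Cressie and Hawkins (1980) found that the fourth root of a \(\chi^2_1\) random variable has a skewness of 0.08 and a kurtosis of 2.48 (compared with 0 and 3 for the Gaussian distribution). For a Gaussian, intrinsically stationary process, the squared difference \((Z(s_i) - Z(s_j))^2\) is a scaled \(\chi^2_1\), so estimates of location, such as the mean and the median, can be applied to its fourth root \(|Z(s_i) - Z(s_j)|^{1/2}\), which is close to Gaussian. Finally, these estimates are raised to the 4th power and adjusted for bias. Consider the square-root-differences cloud for visualization.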
- Source: 10.1002/9781119115151 eq. (2.2.8)
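A minimal sketch of the estimator in the form usually quoted from that equation, \(2\bar{\gamma}(h) = \{|N(h)|^{-1}\sum_{N(h)} |Z(s_i)-Z(s_j)|^{1/2}\}^4 / (0.457 + 0.494/|N(h)|)\), restricted (as an assumption for brevity) to regularly spaced 1-D data and integer lags; real spatial data need distance bins. The comparison against the classical method-of-moments estimator uses a made-up random walk with one gross outlier.

```python
import numpy as np


def cressie_hawkins(z, h):
    """Robust semivariogram estimate at integer lag h >= 1 for regularly spaced 1-D data z.

    2 * gamma(h) = (mean |z[i+h] - z[i]|^(1/2))^4 / (0.457 + 0.494 / N(h)),
    where N(h) is the number of pairs at lag h.
    """
    if h < 1:
        raise ValueError("lag h must be >= 1")
    z = np.asarray(z, dtype=float)
    diffs = np.abs(z[h:] - z[:-h])          # all pairs separated by lag h
    n_pairs = diffs.size
    two_gamma = np.mean(np.sqrt(diffs)) ** 4 / (0.457 + 0.494 / n_pairs)
    return two_gamma / 2.0                  # semivariogram


def matheron(z, h):
    """Classical method-of-moments semivariogram estimate, for comparison."""
    z = np.asarray(z, dtype=float)
    return np.mean((z[h:] - z[:-h]) ** 2) / 2.0


rng = np.random.default_rng(0)
z = rng.standard_normal(200).cumsum()       # a random walk
z_out = z.copy()
z_out[100] += 50.0                          # one gross outlier

ratio_classical = matheron(z_out, 1) / matheron(z, 1)
ratio_robust = cressie_hawkins(z_out, 1) / cressie_hawkins(z, 1)
print(ratio_classical, ratio_robust)        # the robust estimate should move far less
```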
10.1.8. When should I apply log transformation to positive data?
Some heuristics that I have heard here and there.
- Recall that linearity, additivity, and constant variance are key assumptions for a linear model
- Are you analyzing multiple responses separately? Do some responses benefit from the log transformation? Do you want to apply the same transformation to all the responses to avoid "p-hacking"?
- Are the changes on the responses happening on a relative scale?
- "Log transform, kids. And don’t listen to people who tell you otherwise."
Source:
- You should (usually) log transform your positive data
- Gelman, A., & Hill, J. (2006). Data analysis using regression and multilevel/hierarchical models. Cambridge University Press.
10.1.9. When should I visualize positive data in log scale?
- If the range(x) = max(x) - min(x) is equal to or larger than 10 (Jarad)
- Similarly, if the ratio(x) = max(x) / min(x) is equal to or larger than 10
- If the plot is more loaded to the left of 1 than to the right (Jarad), especially for ratios.
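The first two rules of thumb are easy to automate; the third one is visual. A small hypothetical helper (suggest_log_scale is my own name) that reports each rule separately. Note that the ratio rule is unit-free, whereas the range rule changes with the measurement units.

```python
import numpy as np


def suggest_log_scale(x, threshold=10.0):
    """Apply the range and ratio rules of thumb to strictly positive data x."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("a log scale needs strictly positive values")
    spread = x.max() - x.min()      # range(x)
    ratio = x.max() / x.min()       # ratio(x)
    return {"range": bool(spread >= threshold), "ratio": bool(ratio >= threshold)}


print(suggest_log_scale([1, 2, 50]))
print(suggest_log_scale([0.01, 0.5]))   # small range, but a large ratio
```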
10.1.10. How to transform a covariate taking non-negative values with zeroes?
Consider a covariate (AKA independent variable, regressor, predictor) with true zeroes observed.
- \(log(x+1)\) which maps 0 to 0
- \(log(x+c)\) where \(c\) is either estimated or set to be some small positive value
- inverse hyperbolic sine, which behaves like a log for large values
- replace with two variables: a binary indicator for zero, and a variable equal to \(log(x)\) if \(x\) is nonzero and zero otherwise (see Hosmer & Lemeshow's book)
- probability plots of the positive part of the original variable are useful for identifying an appropriate re-expression
- Box-Cox
- Square root, or cube root, for something quick and dirty to get you going
- Source
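A numpy sketch of the first few options on a made-up covariate; the offset c = 0.1 is an arbitrary illustrative value, not a recommendation. The two-variable coding uses an inner np.where so that np.log never sees a zero.

```python
import numpy as np

x = np.array([0.0, 0.5, 1.0, 10.0, 100.0])   # non-negative covariate with true zeros

x_log1p = np.log1p(x)                        # log(x + 1): maps 0 to 0
c = 0.1                                      # hypothetical small offset
x_logc = np.log(x + c)                       # log(x + c)
x_asinh = np.arcsinh(x)                      # about log(2x) for large x, 0 at 0

# Two-variable coding: zero indicator, plus log(x) where x > 0 and 0 otherwise.
x_is_zero = (x == 0).astype(float)
x_log_pos = np.where(x > 0, np.log(np.where(x > 0, x, 1.0)), 0.0)
design = np.column_stack([x_is_zero, x_log_pos])   # both columns enter the model
```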
10.1.11. What is the difference between cross-validation and the WAIC?
For hierarchical models, WAIC estimates the predictive performance for a new observation from an existing group, whereas cross-validation that leaves out whole groups estimates the predictive performance for a new observation from a new group.
10.1.12. How to do positivity-preserving interpolation?
- Splines with Nonnegative B-Spline Coefficients
- How can I find a non-negative interpolation function? (positivity-preserving interpolation is hard)
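Piecewise-linear interpolation already preserves positivity; the trouble starts with smooth interpolants. One blunt workaround for strictly positive data, not taken from the links above: fit the smooth interpolant to \(\log(y)\) and map back with \(\exp\), which is positive by construction. A sketch with SciPy's CubicSpline on made-up data; the price is that the shape is controlled in log space, overshoots become multiplicative, and zeros are not allowed.

```python
import numpy as np
from scipy.interpolate import CubicSpline


def spline_positive(x, y):
    """Cubic spline fitted to log(y); exp() maps it back, so the result is > 0 everywhere."""
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("y must be strictly positive")
    s = CubicSpline(x, np.log(y))
    return lambda x_new: np.exp(s(x_new))


x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([5.0, 0.01, 4.0, 0.5, 3.0])
f_pos = spline_positive(x, y)
f_raw = CubicSpline(x, y)          # ordinary spline on y, for comparison

x_new = np.linspace(0.0, 4.0, 401)
print(f_pos(x_new).min())          # positive by construction
print(f_raw(x_new).min())          # an ordinary spline may dip below zero with data like this
```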
10.1.13. What are some good, general rules on reporting?
Here is some general guidance, which of course may not be the best fit for some specific situations:
- Clarify the research question
- Focus on estimates, confidence intervals, and clinical relevance
- Carefully account for missing data
- Do not dichotomise continuous variables
- Consider non-linear relationships
- Quantify differences in subgroup results
- Consider accounting for clustering
- Interpret I² and meta-regression appropriately
- Assess calibration of model predictions
- Carefully consider the variable selection approach
- Assess the impact of any assumptions
- Use reporting guidelines and avoid overinterpretation
Source
10.1.14. Priors
10.1.15. MCMC
- One long run in MCMC: If you can't get a good answer with one long run, then you can't get a good answer with many short runs either.
10.1.16. Neural networks
- A mostly complete chart of Neural Networks by Fjodor van Veen at the Asimov Institute
10.2. Gaussian processes
10.2.1. Is the mean function of a Gaussian process uninteresting?
- Universal kernels should be able to approximate any continuous function on a compact subset, making a mean function nonessential
- The mean function may not live in the space of functions being modeled
- A mean function can help with extrapolation when using a nonperiodic kernel
Source: Why is the mean function in Gaussian Process uninteresting?
10.2.2. Is the derivative of a Gaussian process another Gaussian process?
Since differentiation is a linear operator, the derivative of a Gaussian process, when it exists (e.g., in the mean-square sense), is another Gaussian process.
Source:
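Concretely, the derivative process has covariance \(\partial^2 k(x,x')/\partial x\,\partial x'\). A sketch for the squared exponential kernel, where this mixed derivative is \(\frac{\sigma^2}{\ell^2}\left(1 - \frac{(x-x')^2}{\ell^2}\right)\exp\left(-\frac{(x-x')^2}{2\ell^2}\right)\), checked against a finite-difference approximation; the values of sigma2 and lsc are arbitrary.

```python
import numpy as np


def k_se(x1, x2, sigma2=1.0, lsc=0.5):
    """Squared exponential covariance k(x1, x2)."""
    return sigma2 * np.exp(-((x1 - x2) ** 2) / (2.0 * lsc ** 2))


def k_se_dd(x1, x2, sigma2=1.0, lsc=0.5):
    """cov(f'(x1), f'(x2)) = d^2 k / (dx1 dx2) for the squared exponential kernel."""
    r = x1 - x2
    return sigma2 / lsc ** 2 * (1.0 - r ** 2 / lsc ** 2) * np.exp(-(r ** 2) / (2.0 * lsc ** 2))


# Central finite-difference check of the mixed second derivative at one pair of inputs.
x1, x2, h = 0.3, 0.7, 1e-4
fd = (k_se(x1 + h, x2 + h) - k_se(x1 + h, x2 - h)
      - k_se(x1 - h, x2 + h) + k_se(x1 - h, x2 - h)) / (4 * h ** 2)
print(fd, k_se_dd(x1, x2))
```

At \(x = x'\) the formula gives the variance of the derivative, \(\sigma^2/\ell^2\), so shorter length-scales mean steeper sample paths.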
10.2.3. Is the integral of a Gaussian process another Gaussian process?
If all finite linear combinations \(\sum_i a_i X_{t_i}\) are Gaussian and the process is continuous, \(Y_t = \int_0^t X_s \, \mathrm{d}s\) is a Gaussian process.
- Integral of a Gaussian process (this result is more general)
- Integral of Brownian motion is Gaussian?
10.2.4. Is the convolution of a Gaussian process another Gaussian process?
The convolution of a Gaussian process with a deterministic kernel, being a limit of linear combinations of Gaussian random variables, remains a Gaussian process.
- 10.1109/TSP.2011.2119315
- Sparse Convolved Gaussian Processes for Multi-output Regression
10.2.5. What are the main limitations of a Gaussian process?
- Computationally impractical for large data sets: inference requires the inversion of an \(N\times{}N\) covariance matrix, which is \(O(N^3)\)
- Use a large data approximation: When Gaussian Process Meets Big Data: A Review of Scalable GPs
- The covariance function is commonly assumed to be stationary, limiting modeling flexibility (e.g., when the noise variance differs across parts of the input space, or when the function has a discontinuity)
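To see where the \(O(N^3)\) comes from, here is a minimal sketch of the GP posterior mean \(k(x_*, X)(K + \sigma_n^2 I)^{-1} y\) with a squared exponential kernel and fixed hyperparameters (my choices, not estimated). The Cholesky factorization of the \(N\times{}N\) matrix is the cubic step; the two triangular solves after it are quadratic.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve


def se_kernel(a, b, lsc=1.0, sigma2=1.0):
    """Squared exponential covariance between 1-D input vectors a and b."""
    return sigma2 * np.exp(-np.subtract.outer(a, b) ** 2 / (2.0 * lsc ** 2))


def gp_posterior_mean(x_train, y_train, x_new, noise=1e-6):
    """Posterior mean k(x_new, X) (K + noise I)^{-1} y without forming the inverse."""
    K = se_kernel(x_train, x_train) + noise * np.eye(x_train.size)  # N x N
    factor = cho_factor(K, lower=True)     # O(N^3): the bottleneck for large N
    alpha = cho_solve(factor, y_train)     # (K + noise I)^{-1} y via triangular solves
    return se_kernel(x_new, x_train) @ alpha


x_train = np.arange(5.0)
y_train = np.sin(x_train)
x_new = np.linspace(0.0, 4.0, 41)
mu_new = gp_posterior_mean(x_train, y_train, x_new)
```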
10.2.6. Software
10.2.7. Resources
11. Mathematics
11.1. Math
11.1.1. Calculus
- Resources for hard calculus
- Integrals
- General resources
- Lists of integrals
- List of integrals of Gaussian functions sweet Normals o' mine
- Hairy integrals
- Symbolic Integration I (Vol. 1). (2005). Springer-Verlag. https://doi.org/10.1007/b138171
- General resources
- List of limits
- List of logarithmic identities
- List of mathematical series
- Integral transform
- Integrals
- Multivariable calculus
- How to choose \(u\) and \(dv\) in integration by parts?
We pick \(u\) and \(dv\) from an expression of the form \(\int f(x) g(x) dx\). We need to differentiate \(u\) and integrate \(dv\).
- Consider \(dv = f\) for \(f\) the most complex expression in the integrand among all those with known integral
- Consider \(u = f\) if \(\int f dx\) is not known
- Consider \(u = f\) when \(df\) is nicer to work with than \(\int f\)
- Typical choices for \(u = f\), roughly in this order of preference, are logarithmic, inverse trigonometric, algebraic, trigonometric, and exponential functions (the LIATE rule).
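A worked example of these heuristics: in \(\int x e^x \,\mathrm{d}x\), \(e^x\) has a known integral and \(x\) gets simpler when differentiated, so take \(u = x\) and \(\mathrm{d}v = e^x\,\mathrm{d}x\), i.e., \(\mathrm{d}u = \mathrm{d}x\) and \(v = e^x\):
\begin{align} \int x e^x \,\mathrm{d}x &= u v - \int v \,\mathrm{d}u = x e^x - \int e^x \,\mathrm{d}x = (x - 1)\,e^x + C \end{align}
Differentiating \((x-1)e^x\) gives back \(e^x + (x-1)e^x = x e^x\).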
Source:
11.1.2. Functional analysis
11.1.3. Linear algebra
- The Matrix Cookbook: collection of linear algebra facts
- Matrix calculus: symbolic vector and matrix derivatives
- The Matrix Reference Manual: reference information about linear algebra and the properties of real and complex matrices.
- Old and New Matrix Algebra Useful for Statistics: large number of matrix identities.
- Useful Matrix and Gaussian formulae: reference sheets of useful matrix and Gaussian formulae.
11.1.4. Optimization
- GENO: optimization solver code generator
11.1.5. Fourier basis functions
An orthonormal basis for \(L^2([0,1], \mathbb{R})\) is \(1, \sqrt{2}\cos(2\pi nx), \sqrt{2}\sin(2\pi nx)\) for \(x\in[0, 1]\) and \(n = 1,2,\dots\)
Source:
- Fourier basis functions - Mathematics Stack Exchange
- For interval \([-\pi,\pi]\) Are Fourier series a basis for \(L^2((-\pi,\pi)^m)\)?
- For arbitrary \([a, b]\) Orthonormal set of basis functions in \(L^2({a,b})\)
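A quick numerical check of orthonormality in \(L^2([0,1])\), assuming a midpoint rule on an equally spaced grid (exact up to round-off for trigonometric products of frequency below the number of grid points); fourier_basis is my own helper name.

```python
import numpy as np


def fourier_basis(x, n_max):
    """Columns: 1, sqrt(2) cos(2 pi n x), sqrt(2) sin(2 pi n x) for n = 1..n_max."""
    cols = [np.ones_like(x)]
    for n in range(1, n_max + 1):
        cols.append(np.sqrt(2.0) * np.cos(2.0 * np.pi * n * x))
        cols.append(np.sqrt(2.0) * np.sin(2.0 * np.pi * n * x))
    return np.column_stack(cols)


# Midpoint rule on m points: (B^T B) / m approximates the integrals of e_i(x) e_j(x) on [0, 1].
m = 1000
x = (np.arange(m) + 0.5) / m
B = fourier_basis(x, n_max=3)
gram = B.T @ B / m
print(np.allclose(gram, np.eye(B.shape[1])))
```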
11.1.6. Computer algebra systems (CAS)
As with any software, for computer algebra I try to use open source software as much as possible. I keep the use of Wolfram Alpha and Wolfram Mathematica to a minimum.
- List of computer algebra systems - Wikipedia
- SymPy (documentation): Python library for symbolic mathematics
- Yacas (documentation): can be accessed from R using the Ryacas package
- Wolfram Mathematica (proprietary, paid) and Wolfram Alpha (proprietary, free in some cases)
12. Science
12.1. Scientific communication
English is weird but it can be understood through thorough thought though. Communication is as hard as it gets. There's a trade-off between message length, precision, ease of reading, and understandability. I strive for short and precise sentences that are easy to digest, modular, and yet feel complete. Here are a couple of notes for better writing.
12.1.1. General resources
- The Elements of Style by Strunk, W., Jr. and White, E.B. An opinionated style guide.
- The Chicago Manual of Style (15th ed.), Chicago: University of Chicago Press.
- Garner's Modern American Usage by Bryan A. Garner.
- English for Writing Research Papers by Adrian Wallwork
- proselint: whisper suggestions on how to improve your prose, see what it looks for.
12.1.2. Academic phrases
- academic-phrases Emacs package
- 600 academic phrases from English for Writing Research Papers
12.1.3. No adjectives allowed
- Good, best, better: in what sense?
- Difficult: what are you trying to say more precisely? What's the complexity?
12.1.4. Use technical terms
- List of technical names used frequently and short sentences
- Emulator, surrogate, meta-model, statistical model versus computer experiment, computer model, mechanistic model, physical model, climate model, what have you
- Significant(ly) has a technical connotation; use considerable or considerably if one wants to avoid such connotation
- A vector/scalar(-valued) function of a vector/scalar variable (greenbagels at Libera.Chat). "Of a vector variable", associated with multivariate calculus, describes the domain of the function. "Vector-valued" or "scalar-valued", associated with vector calculus, describes the range of the function.
12.1.5. Repeat technical terms, no synonyms
- Prefer repeating a technical term in the same sentence, or a sequence of sentences, rather than using synonyms that are not defined.
12.1.6. Repeat the technical terms when in doubt
Yeah, it sounds weird to be repetitive and I sometimes feel that I keep repeating myself too much. As a reader, though, I sometimes wish the writer would be more precise even at the price of repetition. Even though a section might be about a specific density function (e.g., predictive density), make sure that in every sentence it is obvious that you are referring to that density and not to any of the dozen densities that a model has and that are highly connected to the density you are talking about.
12.1.7. Attributive nouns
They are tough to beat. In most situations, I prefer an extensive use of attributive nouns. Beware: they grow too big too often; I sometimes break a really long chain of nouns into two smaller attributive noun phrases.
Compare the benefit of the parametrization that we propose for this model versus the proposed model parametrization benefit.
12.1.8. Prefer short precise terms over phrases
- multiple instead of more than one
12.1.9. Getting determiners right
- List of English determiners: Positive paucal determiners
- The Cambridge Grammar of the English Language. DOI: 10.1017/9781316423530
12.1.10. Double negation is not an affirmation
Litotes help moderate an adjective, e.g., not unlike vs like, not uncommon vs common, non-trivial vs complex. "Not unlike" is slightly different from saying "like", much like saying "I love apples" is not the same thing as saying "I don't hate apples."
Source:
12.1.11. Verb tenses
I sometimes have doubts about the right verb tense for a sentence. When I read, I see authors changing tenses often, perhaps with good reason. Here are a few thoughts.
- Active voice
- I don't have a reference at hand, but I've seen pushback against the traditional use of passive voice. It's probably a good idea to stick to either active or passive voice throughout the manuscript.
- Present tense
- Facts that are always true, e.g., a Gaussian process is a probabilistic model
- Present perfect
- Things that have been done recently, e.g., Author (year) has recently introduced a novel framework
- Past tense
- Things that were done in the past, e.g., Author (year) built a parametric function
12.1.12. My own common tics to grep for
- Hyphens are used when two or more adjectives or an adjective and a noun together modify another noun; for example, goodness-of-fit test is the equivalent of test for goodness of fit.
- Most words with prefixes such as sub, non, pre, post are not hyphenated, for example: subtable, subspace, nonnormal, nonlinear, premultiply, postgraduate.
- No hyphen with the wise suffix, e.g., elementwise instead of element-wise
- Closed compound nouns, e.g., metadata instead of meta data
- Prefer "Considering/As/Since subject verb" over "Considering that subject verb" (mdogg at libera.chat)
- squared exponential is grammatical, square exponential is not
- y is quadratic in x or y is a quadratic function of x, but never y is quadratic on x
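A rough Python version of such a grep. The pattern list is my own partial transcription of the tics above (wrong form mapped to a suggestion), and plain regex matching will miss some variants.

```python
import re

# Hypothetical patterns for a few of the tics above: wrong form -> suggestion.
TICS = {
    r"\belement-wise\b": "elementwise",
    r"\bmeta data\b": "metadata",
    r"\bsub-space\b": "subspace",
    r"\bsquare exponential\b": "squared exponential",
    r"\bquadratic on\b": "quadratic in",
    r"\bconsidering that\b": "considering / as / since",
}


def find_tics(text):
    """Return (line number, matched text, suggestion) for every tic found in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern, suggestion in TICS.items():
            for match in re.finditer(pattern, line, flags=re.IGNORECASE):
                hits.append((lineno, match.group(0), suggestion))
    return hits


sample = "We use a square exponential kernel.\nThe loss is quadratic on x, computed element-wise."
for hit in find_tics(sample):
    print(hit)
```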
12.1.13. Rules of thumb
- Place "respectively" preferably at the end of the sentence, otherwise in the middle, and set it off with a comma.
- Prefer "and is thus" over "and thus is" (187M vs 24M matches). For causation, consider "therefore" or "thence" instead of "thus", which is becoming archaic.
- Don't be unnecessarily indirect, don't be fluffy. E.g., "Consider"
- Can/could/may: can expresses certainty, could expresses uncertainty or a conditional statement, may expresses a possibility or a permission, may not expresses a denial of permission. (The Chicago Manual of Style, 2017, 5.250).
12.1.14. Contextual formatting
- Use bold to (i) make the core of the message as salient as possible and (ii) mark in-line headings at the start of the line
- Use emph, or italic if emph is not available, to mark format names or concepts being defined
- Use underline for editing comments that won't make it to the final version
- Use verbatim for programming-like keywords, e.g., gemv
- Use code for in-line code snippets
12.1.15. Oral presentations
- A sentence is better than a paragraph. A phrase is better than a sentence. A word is better than a phrase. An image is better than a word.
- Have you noticed how hard it is to convey a technical detail while giving an oral presentation? Now imagine how hard it is for the audience to understand anything too technical. Be a nice speaker instead :)
12.1.16. Research statement
12.1.17. Research proposal
- Research proposal template (published by the SML group for an opening)
12.2. Scientific programming
12.3. Research project
Four git repositories are needed: an overarching repository for the project (project) and one for each of the three elements of the research project (data, analytics, and literature, respectively). Possibly use git submodules for the latter.
- Data: data acquisition and munging, i.e., from raw data to the format required downstream
- Analytics: analyses of the above data
- Literature: writing based on the above analyses, e.g., abstracts, reports, manuscripts, presentations
12.4. Scientific writing
12.4.1. Tags
You probably want a LaTeX command for these.
- Citation needed: when you are inclined to accept the statement, but still need to pinpoint the best reference
- Verification needed: when you want to double-check that something is as stated
- Revision needed: when you want to review the math or, more generally, the logic of the statement
- Further research needed: when you make a broad statement, typically outside the scope of the current writing, that seems worth exploring by you or anyone else
- Discussion needed: when you want others' opinions
12.5. Numerical computing
12.5.1. When to use analytical versus automatic differentiation?
Automatic differentiation can considerably decrease runtime compared with numerical differentiation, with little effort. In some (typically surprising) cases, automatic differentiation can yield better runtimes than analytical differentiation. A key aspect of analytical differentiation is that, besides the math work needed to obtain the analytical form, its implementation typically requires a non-trivial amount of work to beat automatic differentiation. Numerical differentiation is almost always the least performant.
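To make the three options concrete, a toy forward-mode automatic differentiation with dual numbers (my own minimal class supporting only +, *, and sin, not a library). It says nothing about runtime, but it shows that automatic differentiation matches the analytical derivative to floating-point accuracy without a hand-derived formula, whereas a forward difference carries truncation and round-off error.

```python
import math


class Dual:
    """Forward-mode AD: carries a value and its derivative through each operation."""

    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule.
        return Dual(self.val * other.val, self.der * other.val + self.val * other.der)

    __rmul__ = __mul__


def sin(x):
    return Dual(math.sin(x.val), math.cos(x.val) * x.der)  # chain rule


def f(x):
    return x * x * x + 2 * sin(x)


def df_analytical(x):
    return 3 * x ** 2 + 2 * math.cos(x)


x0, h = 1.3, 1e-6
d_ad = f(Dual(x0, 1.0)).der                               # automatic
d_an = df_analytical(x0)                                  # analytical
d_fd = (f(Dual(x0 + h)).val - f(Dual(x0)).val) / h        # forward difference
print(d_ad - d_an, d_fd - d_an)
```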
12.5.2. How many evaluations does L-BFGS-B require?
To optimize a function of \(m\) parameters with derivatives approximated by finite differences, L-BFGS-B spends per step roughly \(2m\) calls on calculating a tangent plane (derivatives) plus a further call at each newly chosen location, which is typically at a short distance. L-BFGS-B can be more robust than Nelder-Mead in higher-dimensional settings.
Source: Surrogates - Robert Gramacy
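Where the \(2m\) comes from, assuming the gradient is approximated with centered finite differences: each coordinate needs one evaluation on each side. A sketch with a call-counting toy objective; this is only the gradient part, not an L-BFGS-B implementation, and the step size h is arbitrary.

```python
import numpy as np

calls = 0


def f(x):
    """Toy objective that counts how many times it is evaluated."""
    global calls
    calls += 1
    return float(np.sum((x - 1.0) ** 2))


def central_diff_grad(fun, x, h=1e-5):
    """Centered finite-difference gradient: two evaluations per coordinate."""
    g = np.empty_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (fun(x + e) - fun(x - e)) / (2.0 * h)
    return g


m = 5
x = np.zeros(m)
calls = 0
g = central_diff_grad(f, x)
print(calls)   # 2 * m evaluations for one gradient
```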
12.5.3. How to select algorithm tolerance for numerical differentiation?
In the absence of problem-specific information, use the square root of machine epsilon as the step size for forward differences, and the cube root of machine epsilon for centered differences. Similarly, a minimum can typically be located only to within about the square root of machine epsilon, whereas a root can be located to within about machine epsilon.
Source: Choosing epsilon
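A sketch of both rules on \(f = \exp\) at \(x = 1\). Scaling the step by max(1, |x|) is a common convention that I'm assuming here, not something the rule itself prescribes.

```python
import numpy as np

EPS = np.finfo(float).eps


def forward_diff(f, x):
    h = np.sqrt(EPS) * max(1.0, abs(x))   # about 1.5e-8 for |x| <= 1
    return (f(x + h) - f(x)) / h


def central_diff(f, x):
    h = np.cbrt(EPS) * max(1.0, abs(x))   # about 6.1e-6 for |x| <= 1
    return (f(x + h) - f(x - h)) / (2.0 * h)


x0 = 1.0
exact = np.exp(x0)
err_fwd = abs(forward_diff(np.exp, x0) - exact)   # roughly of order sqrt(eps)
err_ctr = abs(central_diff(np.exp, x0) - exact)   # roughly of order eps^(2/3)
print(err_fwd, err_ctr)
```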
12.5.4. How to compute the Euclidean distance matrix fast?
Use the dot product, since \(\|x - y\|^2 = \|x\|^2 + \|y\|^2 - 2\,x^\top y\).
X <- iris3[, , 1]
M <- tcrossprod(X)          # X %*% t(X)
m <- diag(M)                # squared row norms
o <- rep(1, nrow(M))        # column vector
h2 <- m %*% t(o)            # outer(m, o) = m %*% t(o)
D2 <- -2 * M + h2 + t(h2)   # squared distances
all.equal(D2, unname(as.matrix(dist(X))^2))
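import numpy as np
def compute_distances_no_loops(self, X):
    # Squared distances between rows of X and rows of self.X_train (np.sqrt for distances)
    dists = (-2 * np.dot(X, self.X_train.T)
             + np.sum(self.X_train**2, axis=1)
             + np.sum(X**2, axis=1)[:, np.newaxis])
    return dists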
Source: Dot Product and Distance Matrix
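A standalone numpy version of the same trick (pairwise_distances is my own name), returning distances rather than squared distances. Round-off can make a few squared distances slightly negative, so they are clipped at zero before the square root; the result is checked against an explicit double loop.

```python
import numpy as np


def pairwise_distances(A, B):
    """Euclidean distances between rows of A (n x d) and rows of B (m x d)."""
    sq = (np.sum(A ** 2, axis=1)[:, np.newaxis]
          + np.sum(B ** 2, axis=1)[np.newaxis, :]
          - 2.0 * A @ B.T)
    np.maximum(sq, 0.0, out=sq)   # clip tiny negatives caused by round-off
    return np.sqrt(sq)


rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
B = rng.standard_normal((4, 3))
D = pairwise_distances(A, B)

# Check against the explicit double loop.
D_loop = np.array([[np.linalg.norm(a - b) for b in B] for a in A])
print(np.allclose(D, D_loop))
```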