Sample Code
A quick exploration of food data (Python)
Let me guide you through a quick exploration of the openfoodfacts dataset, which contains 1.8 million foods as of today. What are the five most common ingredients in foods? Are additives related to poor nutrition scores? Are foods from the US richer than foods from other countries? I use a subset of the data to explore these questions, with a focus on EDA, hypothesis testing (Chi-Squared test, t-test), manipulating text data, and dealing with a large number of outliers and missing values.
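To give a flavour of the hypothesis-testing part, here is a minimal sketch of a chi-squared test of independence on a 2x2 table, written with the standard library only. It is not the code from the project: the helper `chi_squared_2x2` and the counts are hypothetical, invented purely for illustration, and the closed-form p-value only holds for one degree of freedom (a 2x2 table, no continuity correction).

```python
import math


def chi_squared_2x2(table):
    """Pearson chi-squared test of independence for a 2x2 table.

    Hypothetical helper for illustration; no continuity correction is applied.
    Returns the test statistic and the p-value (1 degree of freedom).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected

    # For 1 degree of freedom, the chi-squared survival function
    # reduces to erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Made-up counts, NOT taken from the dataset:
# rows = with additives / without additives
# columns = poor nutrition score / good nutrition score
counts = [[30, 10], [20, 40]]

stat, p_value = chi_squared_2x2(counts)
print(f"chi2 = {stat:.2f}, p = {p_value:.5f}")
```

A small p-value here would suggest that additives and nutrition score are not independent in the sample; it says nothing about the direction or cause of the relationship.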

Predicting house prices in Ames, Iowa (Python)
Let's use this dataset by Dean De Cock to predict house prices in Ames, Iowa. We build models with differing levels of complexity (2, 20, and 77 predictors) using linear regression and ridge regression, with a focus on feature engineering, hyperparameter tuning, and grid search. Validation MAEs range from approx. 32,000 USD (simple model) to 12,300 USD (complex tuned model), compared to a baseline of 49,300 USD, with an R-squared of roughly 0.9.
Fork the code here, or read the PDF.
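The idea of tuning a ridge penalty by grid search and comparing the validation MAE against a baseline can be sketched as follows. This is a deliberately tiny, standard-library illustration and not the project's code: the data are synthetic, there is a single predictor, and the names `fit_ridge_1d`, `mae`, `area`, and `price` are hypothetical. The baseline (predicting the training median) is also an assumption.

```python
import random


def fit_ridge_1d(xs, ys, alpha):
    """Ridge regression with one predictor and an unpenalised intercept."""
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / (sxx + alpha)  # alpha shrinks the slope towards zero
    intercept = y_mean - slope * x_mean
    return slope, intercept


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Synthetic data, NOT the Ames dataset: price grows with living area plus noise
rng = random.Random(0)
area = [rng.uniform(50, 250) for _ in range(200)]
price = [1000 * a + rng.gauss(0, 20000) for a in area]

# Simple train/validation split
train_x, valid_x = area[:150], area[150:]
train_y, valid_y = price[:150], price[150:]

# Baseline (assumed here): always predict the median training price
median_price = sorted(train_y)[len(train_y) // 2]
baseline_mae = mae(valid_y, [median_price] * len(valid_y))

# Grid search: fit one model per alpha, score each on the validation set
results = {}
for alpha in [0.0, 1.0, 10.0, 100.0, 1000.0]:
    slope, intercept = fit_ridge_1d(train_x, train_y, alpha)
    preds = [intercept + slope * x for x in valid_x]
    results[alpha] = mae(valid_y, preds)

best_alpha = min(results, key=results.get)
print(f"baseline MAE: {baseline_mae:,.0f}")
print(f"best alpha: {best_alpha}, validation MAE: {results[best_alpha]:,.0f}")
```

With many predictors, a library implementation with cross-validation would be the more practical choice; the sketch only shows the shape of the comparison.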

Favourite libraries (Python)
A list of my favourite/most used libraries in Python.

International Migrant Stock (R)
Very quick visualisations of the United Nations International Migrant Stock, defined as the number of people born in a country other than that in which they live. Analyses show breakdowns by country, both in absolute numbers and as a percentage of the population.
Fork the code here, or read the PDF.

Bar plots! (R)
A function I wrote to automatically generate bar plots for all categorical variables in your dataset. The function will (a) define all relevant variables as factors, (b) generate aesthetically pleasing bar plots with the number of observations and percentages as annotations, (c) add a graph title with the name of the variable and N, and (d) export PNG images with meaningful file names.
Fork the code here, or read the PDF.

You can find more of my code here.