Sample Code

A quick exploration of food data (Python)

Let me guide you through a quick exploration of the Open Food Facts dataset, which contains 1.8 million foods as of today. What are the five most common ingredients in foods? Are additives related to poor nutrition scores? Are foods from the US richer than foods from other countries? I use a subset of the data to explore these questions, with a focus on EDA, hypothesis testing (chi-squared test, t-test), manipulating text data, and dealing with a large number of outliers and missing values.

Download code here. Download data here. Or, read the pdf:

food.pdf
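
To make two of the questions above concrete, here is a minimal, dependency-free sketch (not the code from the linked notebook). It counts how many products list each ingredient in free-text fields and computes a chi-squared test on a 2x2 table of additives against nutrition score. The sample rows, the counts in the table, and the function names `count_ingredients` and `chi_squared_2x2` are all made up for illustration; the real analysis presumably relies on pandas and SciPy.

```python
import math
from collections import Counter

# Hypothetical sample rows; the real dataset stores ingredients as free text.
ingredient_texts = [
    "Sugar, water, salt",
    "water, SUGAR, palm oil",
    "wheat flour, water, salt, yeast",
    None,  # missing value
    "sugar, cocoa butter, milk",
]

def count_ingredients(texts):
    """Count how many products list each ingredient (case-insensitive)."""
    counts = Counter()
    for text in texts:
        if not text:  # skip missing or empty entries
            continue
        # A set ensures an ingredient repeated within one product counts once.
        names = {part.strip().lower() for part in text.split(",") if part.strip()}
        counts.update(names)
    return counts

def chi_squared_2x2(table):
    """Pearson chi-squared statistic and p-value for a 2x2 table
    (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            stat += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom the upper tail equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

top_two = count_ingredients(ingredient_texts).most_common(2)
print(top_two)

# Made-up counts. Rows: with additives / without additives.
# Columns: poor nutrition score / good nutrition score.
made_up_table = [[30, 10], [10, 30]]
stat, p_value = chi_squared_2x2(made_up_table)
print(f"chi2 = {stat:.2f}, p = {p_value:.2g}")
```

Splitting on commas is a deliberate simplification: real ingredient lists contain nested parentheses, percentages, and several languages, so the actual text cleaning needs more care than this.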

Predicting house prices in Ames, Iowa (Python)

Let's use this dataset by Dean De Cock to predict house prices in Ames, Iowa. We build models with differing levels of complexity (2, 20, and 77 predictors) using linear regression and ridge regression, with a focus on feature engineering, hyperparameter tuning, and grid search. We obtain validation MAEs that range from approx. 32,000 USD (Simple model) to 12,300 USD (Complex tuned model), compared to a baseline of 49,300 USD, with an R-squared of roughly 0.9.

Fork code here. Or, read the pdf:

house_prices.pdf
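
The comparison between a baseline and a tuned ridge model can be sketched without any libraries, as below. This is only an illustration of the idea, not the notebook's code: it uses a single predictor, a closed-form ridge fit, and a plain loop over candidate penalties in place of a full grid search. The living areas, prices, the `alphas` grid, and the helper names `mae` and `fit_ridge_1d` are all hypothetical.

```python
from statistics import mean

def mae(y_true, y_pred):
    """Mean absolute error."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))

def fit_ridge_1d(x, y, alpha):
    """Ridge regression with one predictor and an unpenalised intercept.

    Minimises sum((y - b0 - b1 * x)^2) + alpha * b1^2 in closed form.
    """
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / (sxx + alpha)
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Made-up data: living area in square feet and sale price in USD.
x_train = [900, 1200, 1500, 1800, 2400]
y_train = [110000, 135000, 170000, 190000, 260000]
x_val = [1000, 2000]
y_val = [120000, 215000]

# Baseline: always predict the mean training price.
baseline_mae = mae(y_val, [mean(y_train)] * len(y_val))

# Try each penalty and keep the one with the lowest validation MAE.
alphas = [0.0, 1e3, 1e6]  # hypothetical grid
best_alpha, best_mae = None, float("inf")
for alpha in alphas:
    slope, intercept = fit_ridge_1d(x_train, y_train, alpha)
    predictions = [intercept + slope * xi for xi in x_val]
    score = mae(y_val, predictions)
    if score < best_mae:
        best_alpha, best_mae = alpha, score

print(f"baseline MAE: {baseline_mae:.0f}")
print(f"best alpha: {best_alpha}, validation MAE: {best_mae:.0f}")
```

With many predictors the same loop is usually run with cross-validation rather than a single validation split, so that the chosen penalty does not depend on one particular subset of houses.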

Favourite libraries (Python)

A list of my favourite/most used libraries in Python.

favourite_libraries.pdf

International Migrant Stock (R)

Very quick visualisations of the United Nations International Migrant Stock, defined as the number of people born in a country other than that in which they live. The analyses show a breakdown by country, both in absolute numbers and as a percentage of the population.

Fork code here. Or, read the pdf:

merged.pdf

Bar plots! (R)

A function I wrote to automatically generate bar plots for all categorical variables in your dataset. The function will (a) define all relevant variables as factors, (b) generate aesthetically pleasing bar plots with number of observations and percentages as annotations, (c) add a graph title with the name of the variable and N, and (d) export a png image with meaningful file names.

Fork code here. Or, read the pdf:

bar-plots_script.R at main · jean-luc-jucker_bar-plots.pdf
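
The original function is written in R. As a rough Python analogue of steps (b) to (d), the sketch below builds the annotation text, the title, and a file name for one categorical variable, and leaves the actual drawing to a plotting library. The name `bar_plot_spec`, the label format, the `_barplot.png` suffix, and the choice to drop missing values before counting are assumptions made for illustration, not a description of the R script.

```python
import re
from collections import Counter

def bar_plot_spec(name, values):
    """Return the title, bar annotations, and file name for one variable."""
    # Assumption: missing values are dropped before counting.
    observed = [v for v in values if v is not None]
    n = len(observed)
    counts = Counter(observed)
    # One annotation per level: the count followed by its share of N.
    labels = {
        level: f"{count} ({100 * count / n:.1f}%)"
        for level, count in counts.most_common()
    }
    title = f"{name} (N = {n})"
    # Lower-case the variable name and replace other characters with "_".
    slug = re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")
    filename = f"{slug}_barplot.png"
    return title, labels, filename

title, labels, filename = bar_plot_spec(
    "Marital status", ["single", "married", "married", None]
)
print(title)
print(labels)
print(filename)
```

Looping this helper over every categorical column gives one title, one set of labels, and one file name per variable, which mirrors the "all categorical variables" behaviour described above.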

You can find more of my code here.