Software

py-puzzles

I really love puzzles, and whenever I have some time I like to think about how to generate and solve them programmatically. My Python package py-puzzles collects my code for this purpose. At the time of writing, it has implementations for classic Sudoku, some Sudoku variants (diagonal, king, knight, non-consecutive) and Shikaku.

The GitHub repo is here.
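To give a flavour of what solving a Sudoku variant programmatically can look like, here is a minimal backtracking sketch. It is not taken from py-puzzles: the names `is_allowed`, `solve`, `KNIGHT_OFFSETS` and `KING_OFFSETS` are made up for illustration, and the package's actual API and algorithms may be quite different. The sketch assumes a 9x9 grid stored as a list of lists with 0 for empty cells, and treats the knight and king variants as "no equal digits a chess move apart".

```python
# Illustrative only: hypothetical names, not the py-puzzles API.
# Knight variant: cells a knight's move apart may not share a digit.
KNIGHT_OFFSETS = [(1, 2), (2, 1), (-1, 2), (-2, 1),
                  (1, -2), (2, -1), (-1, -2), (-2, -1)]
# King variant: only the diagonal neighbours add anything new, since
# orthogonal neighbours are already covered by the row/column rules.
KING_OFFSETS = [(1, 1), (1, -1), (-1, 1), (-1, -1)]


def is_allowed(grid, row, col, digit, offsets=()):
    """Return True if `digit` can go in the empty cell (row, col)."""
    if any(grid[row][j] == digit for j in range(9)):
        return False
    if any(grid[i][col] == digit for i in range(9)):
        return False
    top, left = 3 * (row // 3), 3 * (col // 3)
    for i in range(top, top + 3):
        for j in range(left, left + 3):
            if grid[i][j] == digit:
                return False
    # Extra variant rule: no equal digit at any of the given offsets.
    for dr, dc in offsets:
        i, j = row + dr, col + dc
        if 0 <= i < 9 and 0 <= j < 9 and grid[i][j] == digit:
            return False
    return True


def solve(grid, offsets=()):
    """Fill `grid` in place by backtracking; return True if solved."""
    for row in range(9):
        for col in range(9):
            if grid[row][col] == 0:
                for digit in range(1, 10):
                    if is_allowed(grid, row, col, digit, offsets):
                        grid[row][col] = digit
                        if solve(grid, offsets):
                            return True
                        grid[row][col] = 0  # undo and try the next digit
                return False  # no digit fits this cell
    return True  # no empty cells left
```

Calling `solve(grid)` handles classic Sudoku, while `solve(grid, KNIGHT_OFFSETS)` adds the knight rule. Plain backtracking like this is fine for easy puzzles but can be slow on hard ones; a serious solver would add constraint propagation or a smarter cell ordering.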

DuaLip

I am one of the contributors to DuaLip, LinkedIn's extreme-scale Linear Program (LP) solver based on Apache Spark. It is a distributed solver that solves a perturbed LP at scale via gradient-based algorithms on the smooth dual. DuaLip can easily scale to problems with trillions of variables.

The GitHub repo is here, and the package website is here.
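The phrase "gradient-based algorithms on the smooth dual" can be made concrete with a toy, single-machine sketch. Everything in it is an assumption made for illustration rather than DuaLip's actual formulation or code: the LP is perturbed by adding (gamma/2)·||x||² to the objective, the simple constraint set is the box [0, 1]^n, and the dual is maximized with plain projected gradient ascent. The function name `solve_perturbed_lp` and its parameters are hypothetical.

```python
def solve_perturbed_lp(c, A, b, gamma=0.1, step=0.05, iters=200):
    """Toy dual ascent for: min c.x + (gamma/2)||x||^2
    subject to A x <= b and 0 <= x <= 1.

    Illustrative sketch only; not DuaLip's API or algorithm.
    """
    m, n = len(A), len(c)
    lam = [0.0] * m  # one dual variable per row of A, kept >= 0
    x = [0.0] * n
    for _ in range(iters):
        # Minimizer of the Lagrangian over the box: the unconstrained
        # minimizer -(c + A^T lam) / gamma, clipped to [0, 1].
        x = [
            min(1.0, max(0.0,
                -(c[j] + sum(A[i][j] * lam[i] for i in range(m))) / gamma))
            for j in range(n)
        ]
        # Gradient of the (smooth, concave) dual at lam is A x - b.
        grad = [sum(A[i][j] * x[j] for j in range(n)) - b[i]
                for i in range(m)]
        # Projected gradient ascent step, keeping lam non-negative.
        lam = [max(0.0, lam[i] + step * grad[i]) for i in range(m)]
    return x, lam


# Maximize x1 + x2 subject to x1 + x2 <= 1 on the unit box.
x, lam = solve_perturbed_lp(c=[-1.0, -1.0], A=[[1.0, 1.0]], b=[1.0])
```

The perturbation is what makes the dual differentiable, so ordinary gradient methods apply. In this sketch the step size is chosen as gamma divided by the squared spectral norm of A (here 0.1 / 2 = 0.05), a standard safe choice for gradient ascent on a smooth concave function.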

cvwrapr

cvwrapr is a small R package I wrote to help with cross-validation for hyperparameter tuning in supervised learning algorithms. The code base is largely extracted from the glmnet package.

The R package cvwrapr is available here. The package website contains vignettes that explain how to use the package. A short blog post on the package can be found here.
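The general idea of wrapping cross-validation around an arbitrary fitting routine can be sketched as follows. This is a Python illustration of the concept, not a port of cvwrapr: the function `kfold_cv`, its arguments, and the toy one-dimensional ridge model are all hypothetical, and the fold assignment is a simple deterministic round-robin chosen for reproducibility.

```python
def kfold_cv(x, y, train_fun, predict_fun, lambdas, nfolds=5):
    """Return (best_lambda, cv_errors) using mean squared error.

    `train_fun(x, y, lam)` returns a fitted model and
    `predict_fun(model, x)` returns predictions. Illustrative only.
    """
    n = len(x)
    fold_of = [i % nfolds for i in range(n)]  # round-robin fold ids
    cv_errors = []
    for lam in lambdas:
        sq_err = 0.0
        for fold in range(nfolds):
            train_idx = [i for i in range(n) if fold_of[i] != fold]
            test_idx = [i for i in range(n) if fold_of[i] == fold]
            model = train_fun([x[i] for i in train_idx],
                              [y[i] for i in train_idx], lam)
            preds = predict_fun(model, [x[i] for i in test_idx])
            sq_err += sum((p - y[i]) ** 2
                          for p, i in zip(preds, test_idx))
        cv_errors.append(sq_err / n)
    best = min(range(len(lambdas)), key=cv_errors.__getitem__)
    return lambdas[best], cv_errors


def ridge_train(x, y, lam):
    """One-dimensional ridge regression without intercept."""
    return sum(a * b for a, b in zip(x, y)) / (sum(a * a for a in x) + lam)


def ridge_predict(beta, x):
    return [beta * a for a in x]


x = [float(i) for i in range(1, 11)]
y = [2.0 * a for a in x]
best_lam, cv_errors = kfold_cv(x, y, ridge_train, ridge_predict,
                               lambdas=[0.0, 1.0, 10.0])
```

Passing the training and prediction routines in as arguments is what lets the same cross-validation loop serve any supervised learning method with a tuning hyperparameter.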

Lasso and elastic net generalized linear models ("glmnet")

I had the privilege of working on versions 4.0 and 4.1 of glmnet, a popular R package with efficient procedures for fitting the entire lasso or elastic-net regularization path for generalized linear models (GLMs). With v4.0, we can fit ANY GLM by letting the family argument accept a class "family" object (previously we could only fit special GLM families). With v4.1, we can fit regularized Cox models for start-stop data as well as stratified Cox models. This opens the way to fitting a wide array of regularized Cox models (e.g. time-dependent covariates, left truncation, multiple events per subject).

The R package glmnet is available here; more details on the family argument can be found here, and more details on regularized Cox models can be found here.

Feature-weighted elastic net ("fwelnet")

Feature-weighted elastic net ("fwelnet") is a variant of the elastic net which has feature-specific penalties. These penalties are based on additional information that the user has on the features (e.g. grouping information, expert views on the importance of the feature). This allows the model-fitting algorithm to leverage such external information to learn better predictive models.

The R package fwelnet is available here; more details on the method can be found here.
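As a rough illustration of how external information could be turned into feature-specific penalties, the sketch below maps a matrix of feature information Z (one row per feature) and a parameter vector theta to one penalty factor per feature. The particular weighting used here, the average of exp(z_l · theta) over all features divided by exp(z_j · theta), is an assumption made for illustration and may not match the fwelnet package exactly. The function name `penalty_factors` is hypothetical.

```python
import math


def penalty_factors(Z, theta):
    """Return one penalty factor per feature (row of Z).

    Assumed form: w_j = mean_l exp(z_l . theta) / exp(z_j . theta).
    Features with a higher score z_j . theta are penalized less.
    Illustrative sketch only; not the fwelnet package's code.
    """
    scores = [sum(z_k * t_k for z_k, t_k in zip(z, theta)) for z in Z]
    exps = [math.exp(s) for s in scores]
    mean_exp = sum(exps) / len(exps)
    return [mean_exp / e for e in exps]


# Three features; only the first belongs to a group flagged as relevant.
Z = [[1.0], [0.0], [0.0]]
uniform = penalty_factors(Z, theta=[0.0])   # every factor equals 1
weighted = penalty_factors(Z, theta=[1.0])  # first feature penalized less
```

With theta equal to zero every factor is 1, so the penalty reduces to the ordinary elastic net; a non-zero theta shifts penalty away from features the external information favours.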

Reluctant generalized additive models ("RGAM")

Reluctant generalized additive models ("RGAM") are an extension of sparse generalized linear models (GLMs) that allows a model's prediction to vary non-linearly with an input variable. The method is guided by the principle that, all else being equal, one should prefer a linear feature over a non-linear feature. The fitting procedure is a multi-stage algorithm which scales well computationally, and it works for quantitative, binary, count and survival data.

The R package relgam is available here; more details on the method can be found here.

Principal components lasso ("pcLasso")

Principal components lasso ("pcLasso") is a new method for supervised learning which I worked on with Rob Tibshirani and Jerry Friedman. The method shrinks predictions toward the leading principal components of the feature matrix. This method is especially useful when the features come in groups, and it works for both overlapping and non-overlapping groups.

The R package pcLasso is available here; more details on the method can be found here.

Other

Statistical Odds & Ends

I often find myself googling the same keywords over and over again, piecing together information across various sites to (re-)learn the derivation of or intuition behind statistical results. This blog is a way for me to pen down what I've learned as a form of knowledge retention, as well as to post short statistical tidbits that randomly pop into my head.

Mathematical Odds & Ends

This blog is just like Statistical Odds & Ends, except that the content is mathematical but non-statistical.

Beyond Solutions

I post math olympiad problems and brainteasers along with their solutions here. The goal is to bring students beyond reading complete proofs to understanding how the author thought of the solutions (and how they could have come up with them themselves). The blog is largely inactive but I still post there once in a while.

FIFA World Cup 2018: Data Compilation and Analysis

I love to watch sports, especially international tournaments like the Olympics and the FIFA World Cup. For FIFA World Cup 2018, I compiled some basic statistics for the matches and countries, and did some basic data analyses. The data and analysis files are available on GitHub.