List of papers I read in 2020

2021-01-17 by xiaoguang

Looking back on 2020, I found that a lot of my time was “wasted” with my family. As always, I made some resolutions at the beginning of 2020, but few of them got done (I blame Covid-19 for this, not myself :). Reading papers was not one of those resolutions, but I found it took a lot of my time too. So I decided to put together a list of the papers I read over the past year.

Most of the papers are database related.

Consensus algorithms

  1. In Search of an Understandable Consensus Algorithm (Extended Version), by Diego Ongaro and John Ousterhout, Stanford University
  2. Three modifications for the Raft consensus algorithm, by Henrik Ingo, MongoDB

Query optimization

  1. Apache Calcite: A Foundational Framework for Optimized Query Processing Over Heterogeneous Data Sources
  2. The Cascades Framework for Query Optimization, by Goetz Graefe, the framework used by the Microsoft SQL Server query optimizer (TBH, I still don’t fully understand this paper, even after reading it more than 3 times)
  3. Analysis of Two Existing and One New Dynamic Programming Algorithm for the Generation of Optimal Bushy Join Trees without Cross Products, by Guido Moerkotte and Thomas Neumann (if I recall correctly, this came out of the hyperdb project; this is the DPccp algorithm paper)
  4. Optimizing Join Enumeration in Transformation-based Query Optimizers, by Anil Shanbhag and S. Sudarshan (this is the more traditional approach: it uses transformation rules to solve the join ordering problem, in contrast to the DPccp algorithm)
  5. Building Query Compilers (under construction; this is not a paper but an unfinished book, written in a mix of English and German), by Guido Moerkotte
  6. Efficiency in the Columbia Database Query Optimizer, by Yongwen Xu, Portland State University (I read this paper more than 5 times, because it’s supposed to be a detailed version of the Cascades Framework paper by Goetz Graefe)

Others

  1. Top Down Operator Precedence, by Vaughan R. Pratt, MIT (this is a really effective way to parse expressions; I used it in one of my projects, and it’s simple, elegant, and powerful)
  2. Bigtable: A Distributed Storage System for Structured Data (I don’t need to cite the authors here, it’s famous. I read this paper again while thinking about the online feature store implementation at our company, noticed things I had overlooked before, and I have to say it’s a treasure.)

At Last

It turns out almost all the papers are database related. I actually tried to read some ML and data science textbooks too, but failed: I haven’t finished any of them yet, so this may become a resolution for 2021.

Anyway, wish me and my family good luck for 2021!