Some Interesting ML-related Papers in VLDB 2018
TL;DR: VLDB 2019 just happened and now I am going to write a post about papers in VLDB 2018 :).
In this post, I am going to write my reading report for ML-related research papers in VLDB 2018. I will go through the papers based on my own research interests, summarize their basic ideas and methods, and give my personal evaluations. Please let me know if you find anything inappropriate.

MLBench: Benchmarking Machine Learning Services Against Human Experts
 Summary
 This paper presents MLBench, a benchmark providing a best-effort baseline of both feature engineering and machine learning models for each dataset, proposes a performance metric measuring the gap between an ML system and top-ranked Kaggle performers, and extracts some interesting insights.
 The performance metric, namely “quality tolerance”, is \(\pi\) if the user is satisfied with being ranked among the top \(\pi\%\) in a Kaggle competition.
 They manually collected the winning code from 41 Kaggle competitions as the best-effort baseline and compared the Azure ML and Amazon ML services on these datasets (with or without hyperparameter tuning). They found that model diversity helps, model selection is necessary, and hyperparameter tuning also makes a difference.
 Further, nonlinear models outperform linear models on big datasets (but they are also more likely to suffer from overfitting on small datasets), and linear models have similar performance in general, so linear models follow a hit-or-miss pattern. Another interesting insight is that nonlinear models within the same model family (e.g., SVM or decision tree) exhibit similar performance, and longer training time doesn’t help within either the nonlinear or the linear model space.
 Not surprisingly, feature engineering helps a lot.
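To make the “quality tolerance” metric concrete, here is a minimal sketch of how it could be computed from a leaderboard. The function name and details are my own illustration under an assumed higher-is-better scoring, not the paper’s implementation:

```python
import math

def quality_tolerance(leaderboard_scores, system_score):
    """Smallest integer pi in [1, 100] such that system_score ranks
    within the top pi% of leaderboard_scores (higher is better)."""
    n = len(leaderboard_scores)
    # rank = number of competitors strictly better than the system, plus one
    rank = sum(1 for s in leaderboard_scores if s > system_score) + 1
    # the smallest percentile that covers this rank
    return math.ceil(100 * rank / n)

# Example: 200 competitors; the service beats all but five of them
scores = [0.9 - 0.001 * i for i in range(200)]
print(quality_tolerance(scores, 0.8955))  # rank 6 of 200 -> top 3%
```

A smaller \(\pi\) means a pickier user, so a system that satisfies quality tolerance 1 matches the very top of the leaderboard.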
 Comments
 This paper is well written and easy to understand, and it also discusses its limitations and potential alternative methods to justify its designs.
 This is definitely a good paper with lots of insightful findings. Based on my experience of building AutoML systems, some models are just mostly better, but this is not easy to explain theoretically (cf. the No Free Lunch theorem).
 I think it would be more helpful to make a large-scale comparison across hundreds of datasets and different libraries.

On Optimizing Operator Fusion Plans for Large-Scale Machine Learning in SystemML
 Summary
 This paper presents an exact, cost-based framework for optimizing operator fusion plans over DAGs of linear algebra operations, which guarantees finding the optimal plan with regard to the considered decisions.
 For candidate exploration, they enumerate partial fusion plans per operator. They propose a memo table to store all the candidate plans and design a DFS algorithm to populate the memo table bottom-up. For candidate selection, they first split the plans into independent partitions, then linearize the search space per partition and enumerate plans while skipping those that can be safely pruned. They also cache plans for repeated optimization problems.
 They compare against Julia and TensorFlow on some synthetic datasets and real datasets (e.g., Airline78, MNIST, Netflix, Amazon product review).
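The candidate-exploration step can be sketched as a DFS that fills a memo table with partial fusion plans per operator, bottom-up. The toy DAG, operator names, and plan representation below are invented for illustration; SystemML’s actual plan representation is far richer:

```python
def explore(op, dag, memo):
    """Populate memo[op] with candidate partial fusion plans (DFS, bottom-up)."""
    if op in memo:
        return
    # recurse into children first so their plans are available (bottom-up)
    for child in dag.get(op, []):
        explore(child, dag, memo)
    # baseline plan: execute op on its own
    plans = [(op,)]
    # candidate plans: fuse op with each child's candidate partial plans
    for child in dag.get(op, []):
        for child_plan in memo[child]:
            plans.append((op,) + child_plan)
    memo[op] = plans

# toy DAG for exp(X) + s * X, edges point from consumer to producer
dag = {"+": ["*", "exp"], "*": ["X"], "exp": ["X"], "X": []}
memo = {}
explore("+", dag, memo)
print(len(memo["+"]))  # candidate partial plans rooted at "+"
```

The memo table is what makes repeated subexpressions cheap: each operator’s candidates are computed once and reused by every consumer; the separate cost-based selection phase then chooses among these candidates.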
 Comments
 This paper is well written and clearly structured.
 I do think there are better baselines for comparison in the experiments (e.g., TensorFlow is not built for such scenarios).

Snorkel: Rapid Training Data Creation with Weak Supervision
 Summary
 This paper presents Snorkel, a system that enables users to write labeling functions that express heuristics, learns a generative model over the labeling functions, and then trains a discriminative model.
 Snorkel constructs the generative model as a factor graph (with three factor types: labeling propensity, accuracy, and pairwise correlations of labeling functions).
 They train a discriminative model on the probabilistic labels by minimizing a noise-aware variant of the loss, i.e., the expected loss.
 They present a theoretical analysis of when a simple majority vote will work just as well as modeling the accuracies of labeling functions, and introduce an optimizer for deciding when to model the accuracies of labeling functions and which correlations among them to model.
 Snorkel provides 132% average improvement in predictive performance over prior heuristics and comes within an average 3.60% of the predictive performance of large hand-curated training sets.
 Comments
 This paper is well written and clearly structured.
 Although the overall architecture is conventional, the analysis of the trade-offs is really interesting and insightful.

Ease.ml: Towards Multi-tenant Resource Sharing for Machine Learning Workloads
 Summary
 This paper presents ease.ml, a declarative machine learning service platform for multi-tenant resource sharing.
 It proposes a novel multi-tenant, cost-aware model selection algorithm. In the first step, it determines the best model for each user by estimating the “potential for accuracy improvement” of each model, and then picks the user with the highest potential. They develop a cost-aware variant of the standard GP-UCB algorithm for selecting each user’s model.
 Second, they use a greedy algorithm that selects the user with a confidence bound above the average; in practice, they pick the user with the maximum gap between the largest upper confidence bound and the best accuracy so far.
 Comments
 This paper is well written and clearly structured. Its introduction is written in a way that provides some real-world failure experiences as the motivation.
 It is always helpful to provide some examples for better illustration and some alternative designs to justify the proposed designs.
 It is good to discuss the limitations to better scope the paper and illustrate why the limitations are currently not resolved.
 They mention Bayesian optimization in the abstract, but it is not discussed in the paper; I guess it is probably used in hyperparameter tuning?
Written on September 1, 2019