## Some Interesting ML-related Papers in VLDB 2018

TL;DR: VLDB 2019 just happened and now I am going to write a post about papers in VLDB 2018 :).

In this post, I am going to write up my reading report for ML-related research papers in VLDB 2018. I will go through the papers that match my own research interests and describe their basic ideas, their methods, and my personal evaluations. Please let me know if you find anything inappropriate.

• ### MLBench: Benchmarking Machine Learning Services Against Human Experts

• Summary
1. This paper presents MLBench, a benchmark providing a best-effort baseline of both feature engineering and machine learning models for each dataset. It proposes a performance metric measuring the gap between an ML system and top-ranked Kaggle performers, and extracts some interesting insights.
2. The performance metric, namely “quality tolerance”, is $\pi$ if the user would be satisfied with being ranked among the top $\pi\%$ in a Kaggle competition.
3. They manually collected the winning code from 41 Kaggle competitions as the best-effort baseline and compared the Azure ML and Amazon ML services on these datasets (with or without hyper-parameter tuning). They found that model diversity helps, model selection is necessary, and hyper-parameter tuning also makes a difference.
4. Further, nonlinear models outperform linear models on big datasets (but they are also more likely to overfit on small datasets). Linear models perform similarly to one another in general, so they follow a hit-or-miss pattern. Another interesting insight is that nonlinear models perform similarly within each model family (e.g., SVM or decision tree), and longer training time doesn’t help within either the nonlinear or the linear model space.
5. Not surprisingly, feature engineering helps a lot.
• Evaluation
1. This paper is well written and easy to understand. It also discusses its limitations and potential alternative methods to justify its design.
2. This is definitely a good paper with lots of insightful findings. In my experience building AutoML systems, some models are simply better most of the time, but this is hard to explain in theory (especially given the No Free Lunch theorem).
3. I think it would be more helpful to make a large-scale comparison across hundreds of datasets and different libraries.
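
To make the quality tolerance metric from the summary above a bit more concrete, here is a minimal Python sketch. It assumes a leaderboard where higher scores are better; the function name `quality_tolerance` and the sample scores are made up for illustration and do not come from the paper.

```python
def quality_tolerance(system_score, leaderboard_scores):
    """Smallest pi such that the system ranks within the top pi% of the leaderboard.

    Assumption: higher scores are better, and the system's rank is the
    position it would take if it were inserted into the leaderboard.
    """
    if not leaderboard_scores:
        raise ValueError("leaderboard must not be empty")
    # Count the participants that strictly beat the system.
    better = sum(1 for s in leaderboard_scores if s > system_score)
    rank = better + 1  # 1-based rank of the system
    # Cap at 100% for a system that is worse than every participant.
    return min(100.0, 100.0 * rank / len(leaderboard_scores))


# Hypothetical leaderboard with 10 entries (illustrative numbers only).
leaderboard = [0.95, 0.94, 0.92, 0.90, 0.88, 0.85, 0.80, 0.75, 0.70, 0.60]
pi = quality_tolerance(0.91, leaderboard)
print(f"quality tolerance: top {pi:.0f}%")
```

A user with quality tolerance $\pi$ would accept this system if the value returned here is at most $\pi$.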
• ### On Optimizing Operator Fusion Plans for Large-Scale Machine Learning in SystemML

• Summary
1. This paper presents an exact, cost-based framework for optimizing operator fusion plans over DAGs of linear algebra operations, which guarantees finding the optimal plan regarding the considered decisions.
2. For candidate exploration, they enumerate partial fusion plans per operator. They propose a memo table to store all the candidate plans and design a DFS algorithm to populate the memo table bottom-up.
3. For candidate selection, they first split the plans into independent partitions, then linearize the search space per partition and enumerate plans while skipping plans that can be safely pruned. They also cache plans for repeated optimization problems.
4. They compare against Julia and TensorFlow on some synthetic datasets and real datasets (e.g., Airline78, MNIST, Netflix, Amazon product review).
• Evaluation
1. This paper is well written and clearly structured.
2. I do think there are better baselines for comparison in the experiments (e.g., TensorFlow is not built for such scenarios).
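
The candidate selection step in the summary above (enumerate plans per partition, skip the ones that can be safely pruned) can be sketched as a small branch-and-bound search. This is a toy under loud assumptions, not SystemML's optimizer: every intermediate in a partition is either materialized or fused, and the cost model (`mat_cost`, `fuse_cost`, and a `penalty` for two adjacent fused intermediates) is invented for illustration.

```python
from itertools import product


def plan_cost(plan, mat_cost, fuse_cost, penalty):
    """Cost of one plan; plan[i] is True if intermediate i is materialized."""
    total = 0.0
    for i, materialized in enumerate(plan):
        total += mat_cost[i] if materialized else fuse_cost[i]
        # Hypothetical penalty when two adjacent intermediates are both fused.
        if i > 0 and not materialized and not plan[i - 1]:
            total += penalty
    return total


def best_plan(mat_cost, fuse_cost, penalty):
    """Branch-and-bound search over materialize/fuse decisions."""
    n = len(mat_cost)
    # suffix_lb[i]: optimistic cost of decisions i..n-1, ignoring penalties.
    suffix_lb = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        suffix_lb[i] = suffix_lb[i + 1] + min(mat_cost[i], fuse_cost[i])

    best = {"cost": float("inf"), "plan": None}

    def search(i, plan, cost_so_far):
        # Prune: even the optimistic completion cannot beat the best plan so far.
        if cost_so_far + suffix_lb[i] >= best["cost"]:
            return
        if i == n:
            best["cost"], best["plan"] = cost_so_far, tuple(plan)
            return
        for materialized in (True, False):
            step = mat_cost[i] if materialized else fuse_cost[i]
            if i > 0 and not materialized and not plan[-1]:
                step += penalty
            plan.append(materialized)
            search(i + 1, plan, cost_so_far + step)
            plan.pop()

    search(0, [], 0.0)
    return best["plan"], best["cost"]


# Illustrative costs for a partition with four intermediates.
mat = [4.0, 3.0, 5.0, 2.0]
fuse = [1.0, 2.0, 1.0, 3.0]
plan, cost = best_plan(mat, fuse, penalty=2.5)
# Exhaustive enumeration as a sanity check on the pruned search.
brute = min(plan_cost(p, mat, fuse, 2.5) for p in product([True, False], repeat=len(mat)))
print(plan, cost, brute)
```

Because the bound never overestimates the remaining cost, pruning only skips plans that cannot be optimal, which is the sense in which such a search stays exact.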
• ### Snorkel: Rapid Training Data Creation with Weak Supervision

• Summary
1. This paper presents Snorkel, a system that enables users to write labeling functions that express heuristics, learns a generative model over the labeling functions, and then trains a discriminative model.
2. Snorkel constructs the generative model as a factor graph with three types of factors: the labeling propensity, the accuracy, and the pairwise correlations of labeling functions.
3. They train a discriminative model on the probabilistic labels by minimizing a noise-aware variant of the loss, i.e., the expected loss.
4. They present a theoretical analysis of when a simple majority vote works just as well as modeling the accuracies of labeling functions, and introduce an optimizer for deciding when to model those accuracies and which correlations among labeling functions to model.
5. Snorkel provides 132% average improvements to predictive performance over prior heuristics and comes within an average 3.60% of the predictive performance of large hand-curated training sets.
• Evaluation
1. This paper is well written and clearly structured.
2. Although the overall architecture is conventional, the analysis of tradeoffs is really interesting and insightful.
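
Two ideas from the Snorkel summary above, the majority-vote baseline and the noise-aware expected loss, can be sketched in a few lines of Python. This does not use Snorkel's API: the function names are hypothetical, the task is assumed to be binary with votes in {-1, 0, +1} (0 meaning abstain), and turning the vote share into a probability is my own simplification.

```python
import math


def majority_vote(votes):
    """Soft label P(y=+1) from labeling-function votes in {-1, 0, +1}.

    Assumption: an unweighted vote share, i.e. every labeling function is
    treated as equally accurate. 0 means the function abstained.
    """
    pos = sum(1 for v in votes if v == 1)
    neg = sum(1 for v in votes if v == -1)
    if pos + neg == 0:
        return 0.5  # every function abstained: no information
    return pos / (pos + neg)


def expected_log_loss(p_pos, score):
    """Noise-aware loss: expectation of the logistic loss under the soft label.

    p_pos is P(y=+1) for one example; score is the classifier's raw output.
    """
    loss_pos = math.log1p(math.exp(-score))  # loss if the true label is +1
    loss_neg = math.log1p(math.exp(score))   # loss if the true label is -1
    return p_pos * loss_pos + (1.0 - p_pos) * loss_neg


# Three labeling functions vote, one abstains (illustrative votes).
p = majority_vote([1, 1, -1, 0])
print(p, expected_log_loss(p, score=0.8))
```

With a hard label (`p_pos` equal to 0 or 1) the expected loss reduces to the ordinary logistic loss, so the noise-aware variant only changes training where the labeling functions disagree.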
• ### Ease.ml: Towards Multi-tenant Resource Sharing for Machine Learning Workloads

• Summary
1. This paper presents ease.ml, a declarative machine learning service platform for multi-tenant resource sharing.
2. It proposes a novel multi-tenant, cost-aware model selection algorithm. First, it determines the best model for each user by estimating the “potential for accuracy improvement” of each model, and then it picks the user with the highest potential. For selecting the model of each user, they develop a cost-aware variant of the standard GP-UCB algorithm.
3. Second, to pick the user, they use a greedy algorithm that selects a user whose confidence bound is above the average; in practice, they pick the user with the maximum gap between the largest upper confidence bound and the best accuracy so far.
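
The two-level selection described above can be sketched as follows. This is a simplified illustration, not the ease.ml implementation: the posterior mean and standard deviation of each model's accuracy are assumed to be given (in the paper they would come from a Gaussian process), the way the exploration bonus is shrunk by cost is an assumption on my part, and all names and numbers are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class ModelEstimate:
    name: str
    mean: float  # posterior mean of the accuracy
    std: float   # posterior standard deviation of the accuracy
    cost: float  # relative training cost, assumed > 0


def ucb(m, beta):
    """Upper confidence bound; the bonus is shrunk for costly models (assumption)."""
    return m.mean + math.sqrt(beta / m.cost) * m.std


def pick_model(models, beta):
    """Model with the largest cost-aware upper confidence bound."""
    return max(models, key=lambda m: ucb(m, beta))


def pick_user(users, beta):
    """users maps a user name to (best_accuracy_so_far, [ModelEstimate, ...]).

    Greedy rule from the summary: choose the user with the largest gap between
    its largest upper confidence bound and its best accuracy so far.
    """
    def gap(item):
        _, (best_so_far, models) = item
        return max(ucb(m, beta) for m in models) - best_so_far

    user, (_, models) = max(users.items(), key=gap)
    return user, pick_model(models, beta)


# Hypothetical tenants and model estimates.
users = {
    "alice": (0.90, [ModelEstimate("resnet", 0.91, 0.01, 4.0),
                     ModelEstimate("logreg", 0.85, 0.02, 1.0)]),
    "bob": (0.60, [ModelEstimate("resnet", 0.70, 0.05, 4.0),
                   ModelEstimate("logreg", 0.65, 0.10, 1.0)]),
}
user, model = pick_user(users, beta=4.0)
print(user, model.name)
```

In this toy setup the next training slot goes to the tenant whose models still have the most room to improve over what that tenant has already achieved.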