Zeyuan Shang

Alpine Meadow 2.0: New Horizons in AutoML

TL;DR: This bost is about something I really would like to do but unfortunately haven’t got time yet :).

Written on January 30, 2020

Some Interesting ML-related Papers in VLDB 2018

TL;DR: VLDB 2019 just happened and now I am going to write a post about papers in VLDB 2018 :).

Written on September 1, 2019

AutoML: Methods, Challenges and Opportunities

Recently, we got an paper about interactive AutoML accepted by SIGMOD 2019 (Shang et al., 2019). In that paper, we discussed how we tackled the problem of automated machine learning while providing interactive responses. We architected our system into several components and discussed implementations and techniques for each of them. Although looking back, the overall design seems pretty straightforward, it takes us almost one year and a half to get into the kingdom of AutoML and figure out a reasonable architecture for a practical system. But I are really glad that eventually we figured this out and have this paper accepted to SIGMOD.

Written on April 5, 2019

A Practical Overview of Feature Engineering in Machine Learning and How to Automate it

In this blog post, I am going to talk about feature engineering in machine learning, which is fundamental but difficult. I will first examine what techniques we have for applying feature engineering and then discuss some works on automating the process.

Written on March 5, 2019

Some Good Papers in SIGMOD 2018 (Industry Sessions)

In this post, I am going to write my reading report for industry papers in SIGMOD 2018. I will go through all the papers I feel interesting and write their basic ideas, methods and my personal evaluations. Please let me know if you find anything inappropriate.

Written on February 16, 2019

Some Good Papers in SIGMOD 2018 (Research Sessions 9-15)

In this post, I write my reading report for research papers in SIGMOD 2018. I will go through all the papers I feel interesting and write their basic ideas, methods and my personal evaluations. Please let me know if you find anything inappropriate.

Written on October 15, 2018

Some Good Papers in SIGMOD 2018 (Research Sessions 1-8)

In this post, I write my reading report for research papers in SIGMOD 2018. I will go through all the papers I feel interesting and write their basic ideas, methods and my personal evaluations. Please let me know if you find anything inappropriate.

Written on June 15, 2018

How to Integrate Your Stuff into Spark SQL

Spark SQL is a module in Apache Spark that enables relational processing (e.g., declarative queries) using Spark’s functional programming API. Spark SQL also provides a declarative DataFrame API to bridge between relational and procedural processing. It supports both external data sources (e.g., JSON, Parquet and Avro) and internal data collections (i.e., RDDs). Besides, it uses a highly extensible optimizer Catalyst, making it easy to add complex rules, control code generation, and define extension points.

Written on November 7, 2017

System Overview: LevelDB

LevelDB is a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values. Many of its ideas and techniques are widely used in Big Data stacks, e.g., BigTable and HBase. The code in LevelDB is well-written with good documents, which is a ideal project to learn from.

Written on March 17, 2017