Some Interesting Papers in VLDB 2018

This blog post is under development; please stay tuned.

TL;DR: VLDB 2019 just happened and now I am going to write a post about papers in VLDB 2018 :).

In this post, I am going to write up my reading report on research papers from VLDB 2018. I will go through the papers I selected based on my own research interests and summarize their basic ideas, their methods, and my personal evaluations. Please let me know if you find anything inaccurate or inappropriate.

Session 1: Database Techniques for Machine Learning

  • MLBench: Benchmarking Machine Learning Services Against Human Experts

    • Summary
      1. TODO
    • Comments
      1. TODO
  • On Optimizing Operator Fusion Plans for Large-Scale Machine Learning in SystemML

  • Snorkel: Rapid Training Data Creation with Weak Supervision

Session 2: Optimization using Hardware

  • PolarFS: An Ultra-low Latency and Failure Resilient Distributed File System for Shared Storage Cloud Database

Session 3: Knowledge Graphs and Semantic Web


Session 4: New Architectures and Challenges

  • FlexPS: Flexible Parallelism Control in Parameter Server Architecture

  • CloudKit: Structured Storage for Mobile Applications

Session 5: Emerging Topics: IoT, Blockchain, Security, and Semantic Queries

  • ForkBase: An Efficient Storage Engine for Blockchain and Forkable Applications

  • Data Synthesis based on Generative Adversarial Networks

Session 6: Joins

  • Interleaving with Coroutines: A Practical Approach for Robust Index Joins

  • Estimating Join Selectivities using Bandwidth-Optimized Kernel Density Models

Session 7: Data Statistics


Session 8: Stream Processing

  • Challenges and Experiences in Building an Efficient Apache Beam Runner For IBM Streams

  • Chi: A Scalable and Programmable Control Plane for Distributed Stream Processing Systems

  • Providing Streaming Joins as a Service at Facebook

  • Model-Free Control for Distributed Stream Data Processing using Deep Reinforcement Learning

Session 9: Spatial Data

  • How Good Are Modern Spatial Analytics Systems?

  • Theoretically Optimal and Empirically Efficient R-trees with Strong Parallelizability

Session 10: Ranking Queries

  • Adaptive Sampling for Rapidly Matching Histograms

  • HD-Index: Pushing the Scalability-Accuracy Boundary for Approximate kNN Search in High-Dimensional Spaces

Session 11: Estimation and Approximation

  • Efficient Construction of Approximate Ad-Hoc ML models Through Materialization and Reuse

  • Improved Selectivity Estimation by Combining Knowledge from Sampling and Synopses

  • Cardinality Estimation: An Experimental Survey

Session 12: Time and Event Data

  • ModelarDB: Modular Model-Based Time Series Management with Spark and Cassandra

Session 13: Trajectories

  • UlTraMan: A Unified Platform for Big Trajectory Data Management and Analytics

Session 14: Key-Value Stores and Caching

  • Sundial: Harmonizing Concurrency Control and Caching in a Distributed OLTP Database Management System

  • ReCache: Reactive Caching for Fast Analytics over Heterogeneous Data

Session 15: Dependency Discovery and Data Models


Session 16: High Performance Transaction Processing

  • Exploiting Coroutines to Attack the “Killer Nanoseconds”

Session 17: Influence and Statistics in Networks

  • Noticeable Network Delay Minimization via Node Upgrades

Session 18: Data Cleaning and Extraction


Session 19: Subgraphs and Communities


Session 20: Data Center Scale Query Techniques

  • F1 Query: Declarative Querying at Scale

  • Selecting Subexpressions to Materialize at Datacenter Scale

  • Smoke: Fine-grained Lineage at Interactive Speed

Session 21: Advanced Queries


Session 22: In-Memory and In-Database Analytics

  • Relaxed Operator Fusion for In-Memory Databases: Making Compilation, Vectorization, and Prefetching Work Together At Last

  • AIDA - Abstraction for Advanced In-Database Analytics

Session 23: Data Science Algorithms


Session 24: Clusters and Cycles in Graphs


Session 25: High Performance Platforms

  • Filter Before You Parse: Faster Analytics on Raw Data with Sparser

  • Froid: Optimization of Imperative Programs in a Relational Database

  • Evaluating End-to-End Optimization for Data Analytics Applications in Weld

Session 26: Query Execution and Optimization


Session 27: Data Mining and Data Science


Session 28: Graph Systems


Session 29: Cloud

  • RHEEM: Enabling Cross-Platform Data Processing – May The Big Data Be With You! –

  • Bubble Execution: Resource-aware Reliable Analytics at Cloud Scale

  • FusionInsight LibrA: Huawei’s Enterprise Cloud Data Analytics Platform

Session 30: Privacy and Security


Written on July 15, 2019