wgcDB: A Rust prototype of an adaptive multi-representation database inspired by a research paper

⚓ Rust    📅 2026-03-16    👤 surdeus    👁️ 1

surdeus

I've been experimenting with an idea from a database research paper (linked in the repo):
what if a database could observe your queries and automatically build the right index
for each column, instead of forcing you to create indexes upfront?

So I built a minimal prototype in Rust to test the concept.

:brain: Core idea

  • Data is split into micro-shards (time-based directories with CSV data)
  • Each shard maintains a pool of lightweight representations (indexes)
  • The system monitors query patterns and automatically builds:
    • Mini B-trees for high-cardinality columns (e.g., user_id)
    • Bloom filters for low-cardinality columns (e.g., category)
  • Queries pick the best available representation per shard at runtime
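The cardinality-driven choice above can be sketched roughly like this. This is a minimal, hypothetical illustration (the type and function names are mine, not the repo's): a column with mostly distinct values gets a mini B-tree mapping values to row ids, while a low-cardinality column gets a small Bloom filter that can only answer "might this shard contain the value?".

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::{BTreeMap, HashSet};
use std::hash::{Hash, Hasher};

// Hypothetical sketch, not the repo's actual types.
#[allow(dead_code)]
enum Repr {
    BTree(BTreeMap<String, Vec<usize>>), // value -> row ids, supports point/range lookups
    Bloom { bits: Vec<bool>, k: u64 },   // membership-only filter, cheap to build
}

fn hash_with_seed(v: &str, seed: u64) -> u64 {
    let mut h = DefaultHasher::new();
    seed.hash(&mut h);
    v.hash(&mut h);
    h.finish()
}

fn build_repr(column: &[String]) -> Repr {
    let distinct: HashSet<&String> = column.iter().collect();
    // Heuristic: many distinct values -> point lookups benefit from a B-tree;
    // few distinct values -> a Bloom filter is enough to skip whole shards.
    if distinct.len() * 2 > column.len() {
        let mut tree: BTreeMap<String, Vec<usize>> = BTreeMap::new();
        for (row, v) in column.iter().enumerate() {
            tree.entry(v.clone()).or_default().push(row);
        }
        Repr::BTree(tree)
    } else {
        let m = 1024;
        let mut bits = vec![false; m];
        for v in column {
            for seed in 0..3 {
                bits[(hash_with_seed(v, seed) as usize) % m] = true;
            }
        }
        Repr::Bloom { bits, k: 3 }
    }
}

fn main() {
    let user_ids: Vec<String> = (0..1000).map(|i| format!("u{i}")).collect();
    let categories: Vec<String> = (0..1000).map(|i| format!("c{}", i % 5)).collect();
    assert!(matches!(build_repr(&user_ids), Repr::BTree(_)));
    assert!(matches!(build_repr(&categories), Repr::Bloom { .. }));
    println!("high-cardinality -> B-tree, low-cardinality -> Bloom filter");
}
```

The actual prototype presumably tracks query shapes too (point vs. range), but even this distinct-ratio heuristic captures the user_id vs. category split from the example above.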

:sparkles: Current MVP (800 lines of Rust)

git clone https://github.com/guangdawang/wgcDB
cd wgcDB
cargo run

You'll see:

  1. Test data generation (3 shards, 1000 rows each)
  2. Queries that automatically trigger index builds after the 3rd access
  3. Each shard choosing the optimal representation
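The "build after the 3rd access" behavior is what the post later calls counter-based scheduling. A minimal sketch of that trigger, assuming illustrative names (`Scheduler`, `on_access` are mine, not the repo's API):

```rust
use std::collections::HashMap;

// Hypothetical sketch of a counter-based scheduler; names are illustrative.
const BUILD_THRESHOLD: u32 = 3;

#[derive(Default)]
struct Scheduler {
    counts: HashMap<(String, String), u32>, // (shard, column) -> access count
    built: Vec<(String, String)>,           // indexes already scheduled
}

impl Scheduler {
    /// Record a query touching `column` in `shard`; returns true exactly
    /// when this access crosses the threshold and a build is triggered.
    fn on_access(&mut self, shard: &str, column: &str) -> bool {
        let key = (shard.to_string(), column.to_string());
        let n = self.counts.entry(key.clone()).or_insert(0);
        *n += 1;
        if *n == BUILD_THRESHOLD {
            self.built.push(key); // in the prototype: kick off the index build here
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut s = Scheduler::default();
    assert!(!s.on_access("shard-0", "user_id"));
    assert!(!s.on_access("shard-0", "user_id"));
    assert!(s.on_access("shard-0", "user_id")); // 3rd access triggers the build
    assert!(!s.on_access("shard-0", "user_id")); // already built, no re-trigger
    println!("built: {:?}", s.built);
}
```

Comparing the count with `==` rather than `>=` is what keeps the build from firing again on every later access.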

:books: Background

This is a minimal implementation of ideas from this paper –
just a proof-of-concept to validate the architecture. The full vision includes RL-based
scheduling, more representation types (columnar, vector indexes), and persistent storage.

:light_bulb: Why I'm posting

I'm looking for:

  • Feedback on the architecture design
  • Ideas for better scheduling logic (currently just counter-based)
  • Potential contributors who find this direction interesting

The code is intentionally simple – great for learning both Rust and database internals!

Repo: https://github.com/guangdawang/wgcDB
