A Fault-Tolerant Distributed MapReduce System Based on Rust

⚓ Rust    📅 2025-09-14    👤 surdeus    👁️ 1


Designed and implemented the core coordinator for a distributed MapReduce framework, covering end-to-end functionality:
(1) Implemented worker node registration and heartbeat detection via Register/Heartbeat RPCs, with a timeout-based failure detection mechanism;
(2) Implemented a task queue based on VecDeque and HashMap, with FIFO scheduling so that tasks are handed out across jobs in submission order;
(3) Implemented SubmitJob/PollJob RPC interfaces for job submission and status queries, validating the submitted application and passing its parameters as bytes;
(4) Implemented GetTask for dynamic task distribution and FinishTask for status updates, which together drive the Map/Reduce phase transitions;
(5) Designed and implemented a three-tier fault tolerance strategy: Map tasks are redistributed when a Worker fails (Reduce outputs are persisted and not retried), individual tasks are retried automatically via the FailTask RPC (retry=true), and a job is marked as failed immediately on an I/O or function error (failed=true).
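
Points (1), (2) and the first tier of (5) can be sketched roughly as follows. This is a minimal, in-memory illustration and not the project's actual code: the names `Coordinator`, `Task`, `Phase`, `WorkerId` and `reap` are hypothetical, the RPC layer is left out, and the sketch assumes that only Map tasks are put back on the queue when a worker times out.

```rust
use std::collections::{HashMap, VecDeque};
use std::time::{Duration, Instant};

// Hypothetical names throughout; the post does not show its own types.
type WorkerId = u64;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Phase {
    Map,
    Reduce,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Task {
    job: u64,
    phase: Phase,
    index: usize,
}

struct Coordinator {
    timeout: Duration,
    last_seen: HashMap<WorkerId, Instant>,  // refreshed on Register/Heartbeat
    queue: VecDeque<Task>,                  // pending tasks, FIFO
    assigned: HashMap<WorkerId, Vec<Task>>, // tasks handed to each worker
}

impl Coordinator {
    fn new(timeout: Duration) -> Self {
        Coordinator {
            timeout,
            last_seen: HashMap::new(),
            queue: VecDeque::new(),
            assigned: HashMap::new(),
        }
    }

    // Register and Heartbeat both reduce to recording when the worker was last seen.
    fn heartbeat(&mut self, worker: WorkerId, now: Instant) {
        self.last_seen.insert(worker, now);
    }

    // New tasks go to the back of the queue.
    fn submit(&mut self, task: Task) {
        self.queue.push_back(task);
    }

    // GetTask: pop the oldest pending task and remember who received it.
    fn get_task(&mut self, worker: WorkerId) -> Option<Task> {
        let task = self.queue.pop_front()?;
        self.assigned.entry(worker).or_default().push(task);
        Some(task)
    }

    // Removes workers whose last heartbeat is older than `timeout` and
    // requeues their Map tasks. Assumption: Reduce tasks are not requeued.
    fn reap(&mut self, now: Instant) -> Vec<WorkerId> {
        let timeout = self.timeout;
        let mut dead: Vec<WorkerId> = self
            .last_seen
            .iter()
            .filter(|(_, seen)| now.saturating_duration_since(**seen) > timeout)
            .map(|(id, _)| *id)
            .collect();
        dead.sort_unstable();
        for id in &dead {
            self.last_seen.remove(id);
            for task in self.assigned.remove(id).unwrap_or_default() {
                if task.phase == Phase::Map {
                    self.queue.push_back(task);
                }
            }
        }
        dead
    }
}
```

In this sketch the caller passes `now` explicitly, which keeps timeout behaviour deterministic in tests. A real coordinator would presumably call something like `reap` from a periodic background task and guard the state with a lock.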


๐Ÿท๏ธ Rust_feed