A Fault-Tolerant Distributed MapReduce System Based on Rust

⚓ Rust 📅 2025-09-14 👤 surdeus 👁️ 11

Info

This post is auto-generated from RSS feed The Rust Programming Language Forum - Latest topics. Source: A Fault-Tolerant Distributed MapReduce System Based on Rust

Designed and implemented the core coordinator of a distributed MapReduce framework, with end-to-end functionality:
(1) Implemented worker node registration and heartbeat tracking via Register/Heartbeat RPCs, with timeout-based failure detection;
(2) Implemented a task queue based on VecDeque and HashMap with FIFO scheduling, so that tasks are allocated in order across jobs;
(3) Implemented SubmitJob/PollJob RPC interfaces for job submission and status queries, with strict validation of the application logic and parameters passed as raw bytes;
(4) Implemented GetTask for dynamic task distribution and FinishTask for status updates, which together drive the Map/Reduce phase transitions;
(5) Designed and implemented a three-tier fault tolerance strategy: redistribute Map tasks when a worker fails (Reduce outputs are persisted and not retried), retry individual tasks automatically via the FailTask RPC (retry=true), and mark the job as failed immediately on I/O or function errors (failed=true).
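
The post does not include code, so the following is only a minimal in-memory sketch of how such a coordinator might be organized. It assumes heartbeats carry a timestamp, pending tasks sit in a single VecDeque, and in-flight tasks are tracked per worker in a HashMap. All names here (`Coordinator`, `Task`, `reap_dead_workers`, `failed_jobs`) are hypothetical, the RPC layer is omitted, and treating `retry` as a flag on the failure call is an assumption. The post also does not say what happens to Reduce tasks that are in flight on a dead worker; this sketch simply does not re-queue them.

```rust
use std::collections::{HashMap, HashSet, VecDeque};
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TaskKind { Map, Reduce }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Task { job_id: u64, task_id: u32, kind: TaskKind }

// Hypothetical coordinator state; the real system sits behind RPC handlers.
struct Coordinator {
    timeout: Duration,
    last_seen: HashMap<u64, Instant>, // worker id -> time of last heartbeat
    pending: VecDeque<Task>,          // FIFO queue of unassigned tasks
    running: HashMap<u64, Vec<Task>>, // worker id -> tasks currently assigned
    failed_jobs: HashSet<u64>,        // jobs marked as failed
}

impl Coordinator {
    fn new(timeout: Duration) -> Self {
        Coordinator {
            timeout,
            last_seen: HashMap::new(),
            pending: VecDeque::new(),
            running: HashMap::new(),
            failed_jobs: HashSet::new(),
        }
    }

    // Register and Heartbeat both just record when the worker was last seen.
    fn heartbeat(&mut self, worker: u64, now: Instant) {
        self.last_seen.insert(worker, now);
    }

    fn submit(&mut self, task: Task) {
        self.pending.push_back(task);
    }

    // GetTask: hand out the oldest pending task (FIFO) and remember who has it.
    fn get_task(&mut self, worker: u64) -> Option<Task> {
        let task = self.pending.pop_front()?;
        self.running.entry(worker).or_default().push(task);
        Some(task)
    }

    // FinishTask: the task is no longer in flight on this worker.
    fn finish_task(&mut self, worker: u64, task: Task) {
        if let Some(tasks) = self.running.get_mut(&worker) {
            tasks.retain(|t| *t != task);
        }
    }

    // FailTask: re-queue the task if retry is set, otherwise fail the whole job.
    fn fail_task(&mut self, worker: u64, task: Task, retry: bool) {
        self.finish_task(worker, task);
        if retry {
            self.pending.push_back(task);
        } else {
            self.failed_jobs.insert(task.job_id);
            self.pending.retain(|t| t.job_id != task.job_id);
        }
    }

    // Drop workers whose last heartbeat is older than the timeout and
    // re-queue their in-flight Map tasks. Returns the ids of dead workers.
    fn reap_dead_workers(&mut self, now: Instant) -> Vec<u64> {
        let timeout = self.timeout;
        let dead: Vec<u64> = self
            .last_seen
            .iter()
            .filter(|(_, seen)| now.duration_since(**seen) > timeout)
            .map(|(id, _)| *id)
            .collect();
        for id in &dead {
            self.last_seen.remove(id);
            for task in self.running.remove(id).unwrap_or_default() {
                let job_alive = !self.failed_jobs.contains(&task.job_id);
                if task.kind == TaskKind::Map && job_alive {
                    self.pending.push_back(task);
                }
            }
        }
        dead
    }
}
```

Passing `now` in explicitly, rather than calling `Instant::now()` inside the methods, keeps the timeout logic deterministic and easy to test. A real coordinator would probably call `reap_dead_workers` from a periodic background task.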
