Text editing differencing algorithm

⚓ Rust    📅 2026-01-14    👤 surdeus    👁️ 2      

surdeus

Hi, I have met an algorithm issue about how to calculate the difference between two text content, I am trying to implement a text editor.

Please let me explain it in detail.

I am using the ropey crate as the data structure to store the text content of a text file in my project. In short, ropey provide a set of APIs let you directly updates the text content by lines and every chars on a line, with a great performance. The line index and char index on each line start from 0. Here we have:

pub struct Rope { ... }

And then I am trying to implement the undo, redo features. I want to create a structure Edit, it saves each editing operation from the user: inserting, deleting, changing.

pub struct Insert {
 pub line: usize, // Insert at this line
 pub char: usize, // Insert at this char of the line
 pub payload: String, // Inserted payload
}
pub struct Delete {
  pub line: usize, // Delete at this line
 pub char: usize, // Delete at this char of the line
 pub count: usize, // Deleted chars from the position
}
pub struct Change {
 pub line: usize, // Change at this line. Note: A change only changes 1 line at a time.
 pub start_char: usize, // Change start from this char of the line.
 pub end_char: usize, // Change end at this char of the line.
 pub payload: String // New payload
}
pub struct MultiLineChange {
 pub lines: BTreeMap<usize, Change>, // Multiple line change, maps from line to the change.
}
pub enum Edit {
 Insert(Insert),
 Delete(Delete),
 Change(Change),
 MultiLineChange(MultiLineChange),
}
pub struct EditHistory {
 pub edits: Vec<Edit>
}

So in my thinking, the EditHistory is a list of Edit (i.e. Vec<Edit>), the Rope is always the latest text content. So we can recover the text content before last N edits by applying the last N Edit operation reversely. This operation is a O(N) time complexity operation, we need to do N reverse editing on the current Rope.

Finally, here is my questions:

  1. Is there any algorithm, that can merge 2 or more continuously edit operations (&[Edit]) into 1 edit operation (Edit)? This can help me reduce some duplicated/repeated operations such as first add a char, then delete it.
  2. Now we have two rope snapshots Rope-A and Rope-B, the B is updated from A after some editing operations. Is there any algorithm that can calculate the difference between them? The difference result should be represented by a single Edit structure above.

    It seems this question can vary based on different requirements:

    • Line-iteration compare: Go through all the lines on A and B, for the same line, just string-compare the two lines in A and B (I found some algorithms here: wu-diff-rs, lcs-diff-rs, diff.rs).
    • Some more advanced algorithms to minimize the edit payload just like git diff?

1 post - 1 participant

Read full topic

🏷️ Rust_feed