Drafty code for an `mdbook` preprocessor
⚓ Rust 📅 2025-12-17 👤 surdeus 👁️ 1
I guess this code isn't quite ready for a review, but it's not an explicit help question either. Rather, I'd appreciate any general or specific suggestions or fixes that come to mind.
Background
mdbook has a template for creating preprocessors over here.
The overall idea is a program that fetches remote, raw markdown files and places them in a book. It's just an exercise (there is already a GenAI package for this).
For now it won't work with images or styled HTML, only markdown text.
Snippet
Preprocessors simply receive the mdbook's chapters one by one, each as a String, and can transform them. My current attempt is:
/// Replaces the URL to markdown-content by the content itself.
/// Could be used for other formats eventually.
fn urls_to_content(content: &str) -> String {
    // `\.` matches a literal dot, so the URL must really end in `.md`.
    regex_replace_all!(
        r"\s(\{\{\s*#remote\s+([^\s}{]{5,200}\.md)\s*\}\})",
        content,
        |_, _whole, url| {
            let body = reqwest::blocking::get(url).unwrap().text().unwrap();
            body
        }
    )
    .to_string()
}
- So the user adds `{{ #remote <BASE_URL>/path/to/file.md}}` inside a markdown book (see snippet above).
  - Example of raw markdown URL.
- The remote markdown content is placed in the book where the placeholder above was.
- Likely, should be cached as well (to be done) so it does not download every time. Maybe in a folder called `remote` and using a hash or the commit-hash number if present.
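The caching idea from the list above could be sketched roughly as follows. This is only an illustration under stated assumptions: `cache_path` and `fetch_cached` are hypothetical names, the download is passed in as a closure so the sketch runs with the standard library alone, and the file name comes from `DefaultHasher`, whose algorithm is not guaranteed to stay the same between Rust releases (acceptable for a throwaway cache, not for anything permanent).

```rust
use std::collections::hash_map::DefaultHasher;
use std::fs;
use std::hash::{Hash, Hasher};
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: maps a URL to a file inside the cache folder.
/// The name is a hash of the URL, so distinct URLs get distinct files.
fn cache_path(cache_dir: &Path, url: &str) -> PathBuf {
    let mut hasher = DefaultHasher::new();
    url.hash(&mut hasher);
    cache_dir.join(format!("{:016x}.md", hasher.finish()))
}

/// Hypothetical helper: returns the cached body if the file exists,
/// otherwise calls `fetch` once and stores the result for next time.
fn fetch_cached<F>(cache_dir: &Path, url: &str, fetch: F) -> io::Result<String>
where
    F: FnOnce(&str) -> io::Result<String>,
{
    let path = cache_path(cache_dir, url);
    if path.exists() {
        return fs::read_to_string(&path);
    }
    let body = fetch(url)?;
    fs::create_dir_all(cache_dir)?;
    fs::write(&path, &body)?;
    Ok(body)
}
```

In the preprocessor, the closure would presumably wrap the existing `reqwest` call and `cache_dir` would be the `remote` folder. A URL that already contains a commit hash never changes, so its cached copy can be kept indefinitely; other URLs would need some expiry rule.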
I should probably create a single Client for all GET requests, but for now I've only sketched the idea.
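One way to share a single client across all requests, and to surface failures instead of calling `unwrap()`, is to hide the HTTP call behind a small trait. The sketch below is only an illustration: `Fetcher`, `InMemoryFetcher`, and `fetch_all` are hypothetical names, and the in-memory stand-in replaces the real network call so the example runs with the standard library alone. A real implementation would presumably wrap one `reqwest::blocking::Client` that is built once and passed by reference.

```rust
use std::collections::HashMap;

/// Hypothetical abstraction over "something that can GET a URL as text".
trait Fetcher {
    fn get_text(&self, url: &str) -> Result<String, String>;
}

/// Stand-in used here instead of a real HTTP client: serves pages from memory.
struct InMemoryFetcher {
    pages: HashMap<String, String>,
}

impl Fetcher for InMemoryFetcher {
    fn get_text(&self, url: &str) -> Result<String, String> {
        self.pages
            .get(url)
            .cloned()
            .ok_or_else(|| format!("could not fetch {url}"))
    }
}

/// Fetches every URL with the same fetcher. Collecting into a `Result`
/// stops at the first error instead of panicking.
fn fetch_all(fetcher: &dyn Fetcher, urls: &[&str]) -> Result<Vec<String>, String> {
    urls.iter().map(|url| fetcher.get_text(url)).collect()
}
```

A side benefit of this shape is that tests can use the in-memory fetcher and no longer depend on network access.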
Something I couldn't figure out is how to reuse the regex from the snippet above elsewhere, for example in the tests.
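One possible way around the duplication is to keep the matching logic in a single function that both the preprocessor and the tests call, so the pattern is written down only once. The sketch below illustrates that shape with a hand-rolled scanner swapped in for the regex, purely so it runs with the standard library alone. `find_remote_urls` and `parse_placeholder` are hypothetical names, and the rules are simplified compared to the regex above: there is no 5 to 200 character limit and no whitespace is required before `{{`.

```rust
/// Hypothetical helper: returns every URL found in a `{{#remote URL}}`
/// placeholder. Production code and tests can both call this function.
fn find_remote_urls(content: &str) -> Vec<&str> {
    let mut urls = Vec::new();
    let mut rest = content;
    while let Some(start) = rest.find("{{") {
        // Continue scanning right after this `{{`, whether or not it matches.
        rest = &rest[start + 2..];
        if let Some(url) = parse_placeholder(rest) {
            urls.push(url);
        }
    }
    urls
}

/// Parses the text following `{{` and returns the URL if the placeholder
/// is well formed and the URL ends in `.md`.
fn parse_placeholder(after_open: &str) -> Option<&str> {
    let after_tag = after_open.trim_start().strip_prefix("#remote")?;
    let after_space = after_tag.trim_start();
    // At least one whitespace character must separate `#remote` from the URL.
    if after_space.len() == after_tag.len() {
        return None;
    }
    // The URL runs until the first whitespace or brace.
    let end = after_space.find(|c: char| c.is_whitespace() || c == '{' || c == '}')?;
    let url = &after_space[..end];
    let closes = after_space[end..].trim_start().starts_with("}}");
    (closes && url.ends_with(".md")).then_some(url)
}
```

If the regex is kept, the same structure should still apply: one function owns the pattern, and a test such as `test_regex` asserts on that function's output rather than on a second copy of the pattern.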
The longer snippet is:
use lazy_regex::regex_replace_all;
use mdbook_preprocessor::{
    Preprocessor, PreprocessorContext,
    book::{Book, Chapter},
    errors::Result,
};

/// Preprocessor that fetches remote markdown files
pub struct Fetch;

impl Fetch {
    pub fn new() -> Fetch {
        Fetch
    }
}

impl Preprocessor for Fetch {
    fn name(&self) -> &str {
        "fetch"
    }

    /// Modify chapters replacing `{{#remote URLs}}` by the .md content.
    fn run(
        &self,
        ctx: &PreprocessorContext,
        mut book: Book,
    ) -> Result<Book> {
        // book.toml option for this preprocessor.
        let option = "preprocessor.fetch.disable";
        match ctx.config.get::<bool>(option) {
            // Ok(None) is field unset.
            Ok(None) | Ok(Some(false)) => {
                book.for_each_chapter_mut(include_markdown);
                Ok(book)
            }
            Ok(Some(true)) => Ok(book),
            Err(err) => Err(err.into()),
        }
    }

    /// Run when rendering to HTML,
    /// But operate on markdown files.
    fn supports_renderer(&self, renderer: &str) -> Result<bool> {
        Ok(renderer == "html")
    }
}
/// Write markdown to book.
/// This function is separated so we can test the replacement.
fn include_markdown(chapter: &mut Chapter) {
    chapter.content = urls_to_content(&chapter.content)
}
/// Replaces the URL to markdown-content by the content itself.
/// Could be used for other formats eventually.
fn urls_to_content(content: &str) -> String {
    // `\.` matches a literal dot, so the URL must really end in `.md`.
    regex_replace_all!(
        r"\s(\{\{\s*#remote\s+([^\s}{]{5,200}\.md)\s*\}\})",
        content,
        |_, _whole, url| {
            let body = reqwest::blocking::get(url).unwrap().text().unwrap();
            body
        }
    )
    .to_string()
}
#[cfg(test)]
mod test {
    use lazy_regex::{regex, regex::Match};

    use super::*;

    #[test]
    fn test_regex() {
        let input_str: &str = r#"some text and even more but now
// Should fail: blank in `// a.`
{{ #remote https:// abc.def.g/mypath/to.md }}
// Should pass
{{ #remote https://abc.def.g/mypath/to.md }}
// Should pass
{{#remote https://abc.def.ga.b.c/mypath/to.md}}
// Should pass: `http` is accepted
{{ #remote http://this.is.insecure/fails/to.md }}
// Should pass:
{{#remote https://github.com/rvben/rumdl/blob/main/docs/markdownlint-comparison.md}}
//"#;
        fn find_markdown_urls(str_file: &str) -> Vec<&str> {
            // I did not find out a way to use the same regex
            // since `regex!` and `regex_replace_all!` need a
            // literal. And using `static reg=..` was too hard.
            let found: Vec<&str> =
                regex!(r"\s(\{\{\s*#remote\s+([^\s}{]{5,200})\s*\}\})")
                    .find_iter(str_file)
                    .map(|m: Match| m.as_str())
                    .collect();
            found
        }
        let result = find_markdown_urls(input_str);
        assert_eq!(result.len(), 4)
    }

    #[test]
    fn test_url_replacement() {
        let content = r"safgdsafgdsaf
hello world
{{#remote https://raw.githubusercontent.com/rust-lang/mdBook/7b29f8a7174fa4b7b31536b84ee62e50a786658b/README.md}}
";
        let new_doc = urls_to_content(&content);
        assert!(new_doc.starts_with("safgd"));
        assert!(
            new_doc
                .contains("mdBook is a utility to create modern online books from Markdown files.")
        )
    }
}