Parsing bad JSON with serde_json

⚓ Rust    📅 2025-11-01    👤 surdeus    👁️ 7      

surdeus

I am using serde_json to parse large JSON files from the healthcare world. The healthcare industry has some of the worst data quality issues I have ever encountered, and there are many instances of malformed JSON files. Some issues I have been able to work around with attributes and custom deserializers, and others are a lost cause.

The latest issue I'm trying to work around is files that contain, essentially:

{"field": 0000000000}

which I realize is not valid JSON. I would like to treat the repeated zeros as just 0, but since it's not a valid Number, it fails to parse before I can do anything in a custom deserializer. Does anyone have any ideas for a way to approach this?

Some details: the files are monstrously large, so I must use from_reader(), which means a simple preprocessing step to search-and-replace runs of zeros with 0 is not trivial. jq does accept numbers with leading zeros, so I could insert it into the pipeline to clean up the input, but that would probably introduce some overhead and I'd prefer to solve it in Rust. I would be willing to entertain using a different crate, but the rest of the code is heavily serde-dependent, so I would prefer to stay in the serde world.

This may be a "lost cause" situation, but I'm interested to see if anyone has any good ideas. Thanks.
