A diff algorithm does not understand meaning. It finds which parts of two sequences are shared and which parts changed.
The Basic Idea
- Split text into lines, words, or characters.
- Find common parts that can be aligned.
- Mark gaps between common parts as additions or deletions.
- Optionally merge an adjacent deletion and addition into a single modification for display.
- Render the result as a highlighted view.
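
The steps above can be sketched with Python's standard `difflib` module. This is only an illustration, not the method of any particular diff tool: the helper name `diff_lines` and the sample strings are made up for the example, and `SequenceMatcher` uses its own matching heuristic rather than a strict longest common subsequence.

```python
from difflib import SequenceMatcher


def diff_lines(old_text, new_text):
    """Return (tag, old_lines, new_lines) triples describing how old_text becomes new_text.

    diff_lines is a hypothetical helper written for this example.
    """
    # Step 1: split the text into units (lines here).
    old = old_text.splitlines()
    new = new_text.splitlines()

    # Step 2: find common parts that can be aligned.
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)

    # Steps 3 and 4: the gaps between common parts come back tagged as
    # "delete", "insert", or "replace" (an adjacent deletion and addition
    # reported together as one modification).
    result = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        result.append((tag, old[i1:i2], new[j1:j2]))
    return result


ops = diff_lines("a\nb\nc", "a\nx\nc\nd")

# Step 5: render the result; a real viewer would highlight instead of print.
for tag, old_part, new_part in ops:
    print(tag, old_part, new_part)
```

For this input the tags come out as `equal`, `replace`, `equal`, `insert`: line `b` is shown as modified to `x`, and line `d` is shown as added.
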
Line, Word, and Character Diff
| Granularity | Best for | Trait |
|---|---|---|
| Line-level | Logs, configs, lists | Fast and easy to scan |
| Word-level | Copy, prose, sentences | Shows changed words inside a line |
| Character-level | Short strings and identifiers | Precise but noisy for long text |
| Structured | JSON and object data | Reports changes by field path instead of by text position |
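
To make the first three rows of the table concrete, here is a minimal sketch that runs the same comparison at line, word, and character granularity. The helpers `tokenize` and `changed_spans` are hypothetical names introduced for this example, and the word tokenizer (whitespace runs kept as their own tokens) is just one reasonable choice among several.

```python
import re
from difflib import SequenceMatcher


def tokenize(text, granularity):
    """Split text into the units that will be compared."""
    if granularity == "line":
        return text.splitlines()
    if granularity == "word":
        # Keep whitespace runs as separate tokens so the text can be rebuilt.
        return re.findall(r"\s+|\S+", text)
    if granularity == "char":
        return list(text)
    raise ValueError(f"unknown granularity: {granularity}")


def changed_spans(old, new, granularity):
    """Return only the non-equal operations at the chosen granularity."""
    a = tokenize(old, granularity)
    b = tokenize(new, granularity)
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


old = "the quick brown fox"
new = "the quick red fox"

for granularity in ("line", "word", "char"):
    print(granularity, changed_spans(old, new, granularity))
```

At line level the whole line is reported as replaced. At word level only `brown` to `red` is reported. At character level the change is broken into smaller pieces, which is the noise the table warns about for longer text.
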
Many diff implementations are based on the longest common subsequence (LCS) problem or a related strategy: keep the shared content aligned and describe everything else as insertions and deletions.
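
As a rough illustration of the longest common subsequence idea, the sketch below uses the textbook dynamic-programming table. It needs time and memory proportional to the product of the two sequence lengths, so production tools generally rely on more efficient algorithms. The function names `lcs_table` and `lcs_diff` are hypothetical and exist only for this example.

```python
def lcs_table(a, b):
    """table[i][j] holds the LCS length of a[i:] and b[j:]."""
    rows, cols = len(a), len(b)
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows - 1, -1, -1):
        for j in range(cols - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    return table


def lcs_diff(a, b):
    """Return an edit script of (marker, item) pairs.

    " " marks shared content, "-" a deletion from a, "+" an insertion from b.
    """
    table = lcs_table(a, b)
    i = j = 0
    script = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            # Shared content is preserved.
            script.append((" ", a[i]))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            # Dropping a[i] keeps the longest remaining match.
            script.append(("-", a[i]))
            i += 1
        else:
            script.append(("+", b[j]))
            j += 1
    # Whatever is left on either side is pure deletion or insertion.
    script.extend(("-", item) for item in a[i:])
    script.extend(("+", item) for item in b[j:])
    return script


script = lcs_diff(["a", "b", "c"], ["a", "x", "c", "d"])
for marker, item in script:
    print(marker, item)
```

Reading only the `" "` and `"-"` entries reproduces the old sequence, and reading only the `" "` and `"+"` entries reproduces the new one, which is a handy sanity check for any edit script.
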