[Date Prev][Date Next][Thread Prev][Thread Next][Author Index][Date Index][Thread Index]

Version Compare/Merge: The Problem



Character movement is surprisingly difficult to dissect 
analytically. Let us look at the simplest possible 
documents that demonstrate movement: let us take the 
document "A B" as the original, and the document "B A" as 
the final.  There are 4 ways to describe the 
transformation: we could say

- A and B were transposed; 
- A was moved behind B; 
- B was moved in front of A; 
- A was moved behind B while at the same time B was 
moved in front of A.

Of these only one is intuitively a poor 
description--the last description saying that both were 
moved, which is clearly redundant. Each of the other three 
descriptions will be considered "correct" in different 
circumstances, depending on the nature of A and B. If A and 
B are both single characters, or if they are of 
approximately the same size (both sentences, both 
paragraphs, both chapters), they would be described as 
"transposed". If A is substantially larger than  B, then we 
would say that B was moved around A, and vice versa.

Alas, a naive computer program would choose the fourth 
description: after all, both A and B moved, and so the 
program would list them both as having moved. So we need to 
develop a more sophisticated algorithm that will construct 
"simple", "intuitive" descriptions of the transformation.

First we need to define what it means to have a 
"simple" or "intuitive" description. After much 
contemplation, I offer the following heuristic. A simple 
description is one which:

- minimizes the number of movements
- moves small objects large distances rather than 
moving large objects small distances
- Minimizes the number of objects moved large 
distances: another way of saying this is, prefers local 
movement to global movement.

I would like to say something about minimizing nesting 
of movement as well, but in fact nested motion is in some 
circumstances quite intuitive. For example, if you moved a 
sentence in which a mispelled word had its letters 
transposed, you have a nested motion: the characters are 
moved inside the sentence that is moved, as in, "I the big 
red package have recieved" being rewritten as "I have 
received the big red package". You would say "I flipped the 
'i' and the 'e' in 'received', then moved 'have received'". 
You would NOT say, "I moved 'have rec', then I moved 'e'". 
Both descriptions fullfill the requirement of minimizing 
motion--both take 2 moves. But the second description 
violates the rule "minimize the number of objects moved 
large distances".

Anyway, within the context of these 3 rules for a 
simple description, I would say "minimize the level of 
nesting", though the first 3 rules leave rather little room 
for nesting control.

The above is only a heuristic specification; the result 
is NOT necessarily "intuitively optimal". However, I hereby 
define the term "heuristically optimal" to mean that an 
algorithm generates a transformation that meets the 
heuristic specification.  

The algorithm described next generates heuristically 
optimal descriptions for most common editing situations; it 
generates valid descriptions under all circumstances.

--marcs