Thanks, Jarlve.
I like the idea about not changing all the occurrences of symbols involved with the word placements.
Have you ever looked at multiobjective simulated annealing? https://pdfs.semanticscholar.org/3129/6 … 4617f9.pdf
Multiobjective optimization appeals to me because you don’t have to come up with a "magic" formula that properly combines multiple objectives. I’ve done work with multiobjective approaches as applied to evolutionary algorithms, but not yet with simulated annealing.
No, I have not, and thank you for the link. I do not understand how multiobjective optimization could properly combine multiple objectives in a single result without making a choice somewhere.
Right. It doesn’t do that.
In the multi-objective setting, the notion of "optimality" is relaxed to the concept of "Pareto optimality", which is a weaker criterion that admits families of solutions, not "the" optimal solution.
This family is characterized by the condition that improving any particular objective would weaken one or more of the others.
It occurs to me that even if the generated family of solutions were too "large" for convenient inspection, you could use a more expensive objective function in a second pass, like word recognition.
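To make that concrete, here is a minimal sketch (in Python, with made-up candidate scores) of how a pool of candidates could be filtered down to its Pareto-optimal set: a candidate survives unless some other candidate is at least as good on every objective and strictly better on at least one.

```python
# Minimal sketch of extracting the Pareto-optimal set from a pool of
# candidates, each scored on several objectives (higher is better).
# The scores below are made up purely for illustration.

def dominates(a, b):
    """True if scores a are at least as good as b on every objective
    and strictly better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(candidates):
    """Return the candidates not dominated by any other candidate."""
    return [c for c in candidates
            if not any(dominates(other["scores"], c["scores"])
                       for other in candidates if other is not c)]

# Example: each candidate scored on (5-gram score, word count).
pool = [
    {"key": "A", "scores": (23000, 4)},
    {"key": "B", "scores": (21500, 9)},
    {"key": "C", "scores": (22000, 6)},
    {"key": "D", "scores": (20000, 3)},  # dominated by B and C
]
print([c["key"] for c in pareto_front(pool)])  # -> ['A', 'B', 'C']
```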
Your objective function uses 5-grams only? Any particular reasoning for that? Did you experiment with other n-grams, and find that empirically best?
AZdecrypt supports ngram sizes 3, 4, 5, 6, 7 and 8, and comes with 3, 4, 5 and 6-gram files in the download.
5-grams offer a very good trade-off between memory use, speed and solving strength for a wide range of use cases.
Excellent information, thank you!
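For anyone curious what a letter 5-gram objective looks like in general, here is a generic sketch (not AZdecrypt's actual code): sum the log probability of every overlapping 5-letter window, with a flat floor value for 5-grams never seen in the corpus.

```python
import math

# Generic sketch of a letter 5-gram objective (not AZdecrypt's actual
# code). `counts` is assumed to be a dict of 5-gram -> corpus count,
# e.g. loaded from an ngram file; `floor` penalizes unseen 5-grams.

def make_scorer(counts, floor=-24.0):
    total = sum(counts.values())
    logp = {g: math.log10(c / total) for g, c in counts.items()}
    def score(text):
        letters = "".join(ch for ch in text.upper() if ch.isalpha())
        return sum(logp.get(letters[i:i + 5], floor)
                   for i in range(len(letters) - 4))
    return score

# Toy usage with a fake counts table:
scorer = make_scorer({"HELLO": 120, "ELLOW": 45, "LLOWO": 10})
print(scorer("hello world"))
```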
Right. It doesn’t do that.
In the multi-objective setting, the notion of "optimality" is relaxed to the concept of "Pareto optimality", which is a weaker criterion that admits families of solutions, not "the" optimal solution.
This family is characterized by the condition that improving any particular objective would weaken one or more of the others.
I see, thank you for this wonderful explanation. Now I do not have to read the paper anymore.
It occurs to me that even if the generated family of solutions were too "large" for convenient inspection, you could use a more expensive objective function in a second pass, like word recognition.
This "more expensive second pass" is a very good idea!
What I like about the Pareto optimality approach is that it helps get around the problem of knowing how important the tradeoffs are between different measurements. I usually think of the example of designing a factory that makes cars. You want cars to have few defects, but also be inexpensive to build. A perfect car would be very expensive to build. But a cheap car would have too many defects to be safe. So what is the best tradeoff? It’s hard to know in advance, so an algorithm that yields a surface of solutions (instead of a single point) gives you more options to play with.
In the case of Z340, maybe it has a low 5-gram score (due to unusual bias in the plaintext), but a high number of context-specific words that make sense together.
Agree completely.
As a relative newcomer to ciphers as an application of optimization techniques, I am somewhat surprised that ngram statistics work as well as they do as an objective function.
Intuitively it seems like you could find a lot of gibberish that has "good" ngram statistics, and conversely you could find valid language with "bad" ngram stats. Maybe my intuition isn’t very good on this.
Is it a true statement that if it was a cheap calculation, we’d always prefer to do some kind of word detection rather than counting ngrams as an objective function?
If so, I could see a benefit to using ngrams as a sort of coarse-grained sieve for a subspace of candidate solutions, and then applying the word detection objective function on this reduced problem. Using Pareto optimality across a number of ngrams could make for a very nice criterion for such a sieve.
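A rough sketch of that two-pass sieve, with placeholder scoring functions (this is not any existing solver's code): pass 1 keeps only the candidates that are not dominated across several cheap letter-ngram scores, and pass 2 ranks just those survivors with the expensive word-detection objective.

```python
# Rough sketch of the two-pass sieve; all scoring functions are
# placeholders, not any existing solver's code.

def nondominated(items, objectives):
    """Keep items for which no other item is at least as good on every
    objective and strictly better on at least one (higher is better)."""
    scored = [(tuple(f(x) for f in objectives), x) for x in items]
    return [x for s, x in scored
            if not any(all(t[i] >= s[i] for i in range(len(s))) and t != s
                       for t, _ in scored)]

def sieve_then_rank(candidates, ngram_scores, word_score):
    survivors = nondominated(candidates, ngram_scores)       # cheap pass
    return sorted(survivors, key=word_score, reverse=True)   # expensive pass

# Usage sketch: sieve_then_rank(texts, [score3, score4, score5], word_coverage)
# where score3/4/5 are 3-, 4- and 5-gram scorers and word_coverage is a
# word-detection objective (all hypothetical names).
```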
Intuitively it seems like you could find a lot of gibberish that has "good" ngram statistics, and conversely you could find valid language with "bad" ngram stats. Maybe my intuition isn’t very good on this.
It’s very easy for greedy algorithms to take advantage of objective functions that rely solely on ngram statistics, which is why most people end up including IOC and other measurements to "punish" the candidates that show this behavior. Zkdecrypto, for example, includes IOC, entropy, and chi^2 stats to keep the candidate plaintexts from deviating too much from normal-looking language.
I think it would be very interesting to evolve efficient single-objective functions. In other words, you have a pile of measurements such as IOC, ngram stats, etc. You combine them in some random mathematical way and then measure how well the search converges to the correct key. Then evolve the function to try to discover the mathematical combination that provides the best search efficiency and accuracy. You could even use simulated annealing itself to meta-search for an efficient objective function.
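For concreteness, here is a generic sketch of what one such combined single-objective function might look like (not Zkdecrypto's actual formula): an ngram score penalized by how far the candidate's index of coincidence and letter entropy drift from rough English values. The weights and target constants are hypothetical knobs, exactly the kind of thing the meta-search above could try to tune.

```python
import math
from collections import Counter

# Generic sketch of a combined single-objective score (not Zkdecrypto's
# actual formula). Weights and targets are hypothetical knobs.

ENGLISH_IOC = 0.0667      # approximate index of coincidence of English
ENGLISH_ENTROPY = 4.18    # approximate per-letter entropy of English (bits)

def ioc(text):
    counts = Counter(text)
    n = len(text)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))

def entropy(text):
    counts = Counter(text)
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def combined_score(text, ngram_score, w_ioc=5000.0, w_ent=500.0):
    """Ngram score minus penalties for deviating from English statistics."""
    return (ngram_score(text)
            - w_ioc * abs(ioc(text) - ENGLISH_IOC)
            - w_ent * abs(entropy(text) - ENGLISH_ENTROPY))
```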
Is it a true statement that if it was a cheap calculation, we’d always prefer to do some kind of word detection rather than counting ngrams as an objective function?
I think it’s tricky because you have to make a lot of choices about how to measure the words without running into the same problems with greedy algorithms exploiting the search. For example, a simple word count isn’t enough. You could include word stats, but then greedy algorithms would be biased towards very common words. You also have to deal with overlapping words and words that contain subwords. Do you count CATERPILLAR as one word, or as CAT and PILLAR? It can get complicated.
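One way to sidestep the overlap and subword problem is to score a candidate by the maximum number of letters that can be covered by non-overlapping dictionary words, which a small dynamic program can compute. A sketch with a toy stand-in dictionary:

```python
# Dynamic-programming sketch: find the segmentation that covers the most
# letters with non-overlapping dictionary words, so CATERPILLAR counts
# once rather than as CAT + PILLAR plus leftovers. Toy dictionary.

def word_coverage(text, dictionary, max_len=15):
    n = len(text)
    best = [0] * (n + 1)          # best[i] = max letters covered in text[:i]
    for i in range(1, n + 1):
        best[i] = best[i - 1]      # option: leave the letter at i-1 uncovered
        for L in range(2, min(max_len, i) + 1):
            if text[i - L:i] in dictionary:
                best[i] = max(best[i], best[i - L] + L)
    return best[n]

words = {"CAT", "PILLAR", "CATERPILLAR", "THE"}
print(word_coverage("THECATERPILLAR", words))   # 14: THE + CATERPILLAR
```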
If so, I could see a benefit to using ngrams as a sort of coarse-grained sieve for a subspace of candidate solutions, and then applying the word detection objective function on this reduced problem. Using Pareto optimality across a number of ngrams could make for a very nice criterion for such a sieve.
I like this approach, because it starts with simple features detected in the plaintexts (letter-level ngrams), then works its way upwards (word-level ngrams). I am extremely curious to know if a neural network could be applied here, because they are notoriously great at combining simple features into complex high-level combinations of features, without needing any human to say "this is what a feature is". Could a neural network be trained to accurately tell the difference between a 50% correct plaintext candidate and a 75% correct plaintext? If it could, the network could be a very powerful objective function.
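Purely as a speculative sketch of how the training data for such a network could be generated: take a ciphertext with a known key, corrupt a controlled fraction of the key's mappings, and label the decoded candidate with the fraction left intact. The key format and corruption scheme below are assumptions for illustration only.

```python
import random
import string

# Speculative sketch of generating labeled training examples: damage a
# known cipher->plain key by a controlled amount, decode with it, and
# label the result with the fraction of the key left intact.

def corrupt_key(key, fraction, rng):
    """Randomly reassign roughly `fraction` of the key's mappings."""
    key = dict(key)
    for sym in rng.sample(list(key), int(len(key) * fraction)):
        key[sym] = rng.choice(string.ascii_uppercase)
    return key

def make_example(ciphertext, true_key, correct_fraction, rng=random):
    bad_key = corrupt_key(true_key, 1.0 - correct_fraction, rng)
    candidate = "".join(bad_key.get(sym, "?") for sym in ciphertext)
    return candidate, correct_fraction   # (network input, training label)
```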
Is it a true statement that if it was a cheap calculation, we’d always prefer to do some kind of word detection rather than counting ngrams as an objective function?
I conjecture that simple word detection is less powerful than 5-grams; you would have to use word ngrams. And word ngrams do not work on ciphers made up of a collection of random words. I also think that letter ngrams are better for hill-climbing transposition ciphers.
The solver https://quipqiup.com/ combines the two in a good way. It first tries to fit words in the cipher given the substitution constraints without really looking at which words it is placing and where they are being placed. It makes a large list of these possible word collections and then scores each of them with letter 3-grams. The explanation given was that letter ngrams are useful since they cross word boundaries.
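To illustrate what "fitting words given the substitution constraints" could mean in the simplest case (a toy sketch, not quipqiup's actual code): a word fits a cipher segment if assigning its letters to the segment's symbols never forces one symbol to stand for two different letters or contradicts the partial key built so far; as in a homophonic cipher, two different symbols are still allowed to map to the same letter.

```python
# Toy sketch (not quipqiup's actual code) of checking whether a word can
# be placed over a cipher segment without breaking substitution
# constraints: one symbol may never stand for two different letters.

def word_fits(segment, word, key):
    """Return the extended key if `word` fits the cipher `segment`
    under the partial key, otherwise None."""
    if len(segment) != len(word):
        return None
    trial = dict(key)
    for sym, letter in zip(segment, word):
        if trial.get(sym, letter) != letter:
            return None           # symbol already committed to another letter
        trial[sym] = letter
    return trial

# Toy example: cipher symbols are digits here.
print(word_fits("1231", "THAT", {}))   # {'1': 'T', '2': 'H', '3': 'A'}
print(word_fits("1231", "WORD", {}))   # None: '1' cannot be both W and D
```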