Cipher challenge

doranchak · 2015-10-06T07:29:21Z

My experiments to produce Z340-like ciphers are starting to bear fruit. Here's an example cipher - I challenge anyone to try to break it. It is a standard homophonic substitution cipher, read and enciphered in the normal reading direction. 9%(1UyKbjJ.#5BR+3 +28@TSp1l-^NBtHER +B|JLY8OzFR(4>bl* VLk+FU2)^RJ/c5.DO zB(WH8MNR+|c+.cO6 |5FU+<+RJ|*b.cVOL |5FBc)T(ZU+7XzR+k >+lpyV)D|(#kcNz): 68Vp%CK-*<WqC2#pc -Ff2B9+>;ZlCP^BU- 7tLRd|D5.p9O)*ZM6 Bctz:&yVOp%<K+>^C FqNLPp*-WfzZ2d7;k l<S^+/|dT9f4YK+WG j4EyM+WAlH#+VB+L< z|4&+OkNpB1V2Ff/) z+Mp_*(;KSp2(TGO+ FBcMSEG3dWKc.4_G5 pDCE4GyTY+_BAdP2p |+tFMPHYGK+F6pX^2 What makes these generated ciphers unique is that they share very many qualities with the actual Z340: Very similar symbol distribution Same number of repeated bigrams/trigrams Appearance of pivots Top 20 L=2 homophone cycles have the same "strength" as the top 20 from Z340 Same "prime phobia" feature of + and B Simulated transcription errors (Up to 20 polyphones where the alternate symbol assignments come from set of visually similar symbols) Same number of repeated ngrams when even (or odd) positions removed. Same absence of repeated symbols on lines 1-3, 11-13 Matches Jarlve's non-repeat measurement of the Z340 (4462) Same appearance of word-like symbol sequences ("zodiak", "her", "god") Same phenomenon of a trigram repeating in the same column, and that intersects with another repeating trigram Similar appearance of "box corner" patterns Similar appearance of "fold marks" on line 10 Misspellings and missing words in the plaintext There is a section of filler [/list:u:336k7txn] Here's a visual of some of the patterns in the generated cipher that are similar to Z340: The hint for the plaintext is that it comes from a Gilbert and Sullivan play. I fed the cipher into zkdecrypto and azdecrypt. They each were able to recover around 60% to 65% of the plaintext. A 10-symbol section of the cipher is filler. And there are 20 intentional transcription errors. So, that contributes to 9% of the cipher text being messed up. On top of that there are several intentional misspellings. These factors are probably making it difficult to automatically recover more of the plaintext. Another hint: Because of the poetic nature of the plain text, some phrases appear more often than normal. Perhaps that is also increasing the difficulty.

glurk

(@glurk)

Posts: 756

Prominent Member

Damned bean counters.

-glurk

——————————–
I don’t believe in monsters.

Posted : October 7, 2015 3:43 pm

Jarlve

(@jarlve)

Posts: 2547

Famed Member

Thanks doranchak,

So not accept if none of the ngram scores are better, and accept if at least one of the ngram scores is better?

AZdecrypt

Posted : October 8, 2015 12:29 am

doranchak

(@doranchak)

Posts: 2614

Member Admin

Topic starter

Yes, with an additional step:

Once accepting a new solution, check the old ones that you are keeping track of. If any of them are completely dominated by the new solution, remove them. A "completely dominated" solution means none of its ngram scores is better than the corresponding ngram score for the new solution.

Also, depending on how the search proceeds, the list of non-dominated solutions may grow too large, so you have to come up with a strategy to trim them to keep the search speedy.

http://zodiackillerciphers.com

Posted : October 8, 2015 12:34 am

daikon

(@daikon)

Posts: 179

Estimable Member

It’s an interesting idea, but what exactly does using 4/5/6-grams in parallel accomplish? I must’ve missed an important key point in the discussion. 🙂

In my experience higher-order N-grams tend to be much more reliable in scoring the correct plaintext, and the lower-order N-grams tend to be more tolerant of misspellings. Just like Jarvle (although I’m not sure about his exact reasons), I’m actually maintaining an option to run my solver with either 6- or 7-grams. If I don’t get a stable solve with one, I try the other. Higher-multiplicity ciphers tend to be solved by 7-grams more cleanly, where 6-grams produce a lot of nearly equally scored plaintext, only one of which is the correct solution. On the other hand, some of the ciphers get solved by 6-grams almost right away, and the 7-grams require a lot of restarts to get to the correct solution. Seems to be due to misspellings or liberal use of rare words in the plaintext. Makes sense: shorter N-grams "fit" around the misspellings easier, compared to longer ones.

Posted : October 8, 2015 1:17 am

Jarlve

(@jarlve)

Posts: 2547

Famed Member

Thus,

Every cipher has a fixed amount of different scores.

Not accept if none of the scores are better than all of the ciphers in the list. Accept if at least one of the scores is better than one of the ciphers in the list.

If accepted then remove the ciphers in the list for which all scores are lower. If none of the ciphers in the list can be removed and the list size limit is met then do not keep the accepted cipher.

If iterations limit is met then output cipher list.

Okay?

AZdecrypt

Posted : October 8, 2015 1:21 am

doranchak

(@doranchak)

Posts: 2614

Member Admin

Topic starter

It’s an interesting idea, but what exactly does using 4/5/6-grams in parallel accomplish? I must’ve missed an important key point in the discussion.

The idea is that a single objective search might get stuck in local optima, whereas multiple objectives might allow for a better sampling of broader criteria, to increase the chances of approaching the global optimum.

I think it really depends on the problem. Sometimes a good solution has a tradeoff among measurements. For example, if we want our solvers to handle ciphers that are just lists of names or places, we have to use different ngram stats. So you could have one ngram score based on normal language stats, and another based on expected stats for lists of names/places. Without knowing which kind of cipher we have, you could track both objectives, and maintain a set of non-dominated solutions that explore the tradeoffs between those objectives.

http://zodiackillerciphers.com

Posted : October 8, 2015 1:35 am

doranchak

(@doranchak)

Posts: 2614

Member Admin

Topic starter

Thus,

Every cipher has a fixed amount of different scores.

Not accept if none of the scores are better than all of the ciphers in the list. Accept if at least one of the scores is better than one of the ciphers in the list.

Yes, but to be clear: Accept if at least one of the scores is better than one of the ciphers in the list, even if the existing cipher has another score that is worse.

So, new cipher has score [1,4]. Cipher in list has score [3,2]. Keep both, because neither is "totally better" than the other.

If accepted then remove the ciphers in the list for which all scores are lower. If none of the ciphers in the list can be removed and the list size limit is met then do not keep the accepted cipher.

That works but you might get too many samples in one "spot" on the search landscape. The algorithm I’ve used (SPEA2) uses a density function to distribute solutions equally along the objectives, so it can remove the extras fairly (so they don’t cluster).

A simpler approach is to just accept the new cipher and then remove one at random.

If iterations limit is met then output cipher list.
Okay?

Yes.

http://zodiackillerciphers.com

Posted : October 8, 2015 1:41 am

daikon

(@daikon)

Posts: 179

Estimable Member

It’s an interesting idea, but what exactly does using 4/5/6-grams in parallel accomplish? I must’ve missed an important key point in the discussion. 🙂

The idea is that a single objective search might get stuck in local optima, whereas multiple objectives might allow for a better sampling of broader criteria, to increase the chances of approaching the global optimum.

Ah, I see. A different approach to help with the "sparsity" of the higher N-gram samples. There is actually a number of methods that have been developed for the word-level N-grams, but they can be equally applied to letter-level N-grams. Except that they require quite a bit of math background to even begin to understand the ideas: http://www.ee.columbia.edu/~stanchen/pa … report.pdf
I’ve given up trying to get everything they are talking about and used a simple interpolation method that uses lower-order N-gram data for higher N-grams that are missing (i.e. not present in the corpus). I’ve seen people use weighted average of different N-grams, something like this:
combined_score = 0.2 * 3_grams_score + 0.3 * 4_grams_score + 0.5 * 5_grams_score
The weights (0.2, 0.3, 0.5) can be different, but they have to add up to 1. The problem I found with this approach is that if you already have a reliable 5-grams score, the lower-order scores will "contaminate" it. So for my solver I computed the scores "recursively" instead. For each N-gram level, if the score is over a certain small threshold, I use that value, if it’s less (the N-gram is not present in the corpus, or found just once or twice), I average the scores of overlapping lower-order N-grams and subtract a small "penalty" value. Starting with 3-grams, as all 2-grams are pretty much present. Then compute 4-grams data, using 3-grams, and so forth "recursively" one level up at a time. For example, if you have a "ZFACT" 5-gram that’s not present in the corpus (probably; it’s just for illustration purposes). Instead of assigning 0 score to it, I look at overlapping "ZFAC" and "FACT" lower-order 4-grams, one is probably 0 as well, but the other one is pretty well represented, so let’s say it has a score of 100. So I assign the score of (0 + 100) / 2 – 20 = 30. Where 20 is the "penalty". Obviously, if you end up with less than 0, you clamp it to 0. So you end up with "interpolated" 5-grams data, that combine data from 3-grams and 4-grams, which means you only need to do 1 lookup, instead of 3.

Posted : October 10, 2015 10:08 pm

Zodiac Discussion Forum