Strongest known algorithm for Z408

anderson110

(@anderson110)

Posts: 55

Trusted Member

Topic starter

I’m curious what the strongest known algorithm is for decoding Z408. This looks pretty strong:

http://www.aclweb.org/anthology/D14-1184

Beam search, < 10s on a single processor. I don’t see any obvious way this is "tuned" to Z408, either.

Can anyone beat these guys?

If you want to test hypotheses for Z340, on the assumption that it’s like Z408, but with some "twist" before or after the encoding, seems like the most efficient solver for Z408 is an essential starting point.

Posted : May 7, 2018 8:47 am

doranchak

(@doranchak)

Posts: 2614

Member Admin

I think Jarlve’s AZDecrypt is the fastest/strongest one I know of. Maybe Jarlve can offer some stats on how fast it decodes Z408 on average. It seems almost instantaneous. I think zkdecrypto is the next fastest – it can solve Z408 in a few seconds.

Jarlve and others here have indeed been using AZDecrypt as a starting point for exploring hypotheses of Z340’s construction. This thread is full of such experiments.

http://zodiackillerciphers.com

Posted : May 7, 2018 1:16 pm

Jarlve

(@jarlve)

Posts: 2547

Famed Member

I’m curious what the strongest known algorithm is for decoding Z408. This looks pretty strong:

http://www.aclweb.org/anthology/D14-1184

Beam search, < 10s on a single processor. I don’t see any obvious way this is "tuned" to Z408, either.

Can anyone beat these guys?

I suspect that simulated annealing is at least one of the stronger algorithms for decoding homophonic substitution ciphers. My solver, AZdecrypt, uses something similar. And from my experience, beam search is not really well suited for this problem and throws allot of its time away.

AZdecrypt should take about 0.2 seconds on average for a 408 decode on a single CPU thread. So on a quad core CPU you can divide that number by 4 since AZdecrypt supports up to 128 CPU threads and can be expanded to support more.

If you want to test hypotheses for Z340, on the assumption that it’s like Z408, but with some "twist" before or after the encoding, seems like the most efficient solver for Z408 is an essential starting point.

I had exactly the same idea about 4 years ago and started working on my own solver that does exactly that.

Glurk’s zkdecrypto debunked that the 340 is similarly constructed as the 408 as it solved all test ciphers that were thrown at it. AZdecrypt further compounded on that and has recently solved a 5 row 408. It is quite clear that the 340 is not like the 408 and that at least some extra difficulty has to be considered. Last year I have tested something like 50 languages on the 340 with no success so count that one out also.

AZdecrypt

Posted : May 7, 2018 6:25 pm

anderson110

(@anderson110)

Posts: 55

Trusted Member

Topic starter

Thanks. I looked at both of these codes.

I did not see any source code for AZdecrypt, nor an algorithm description.

zkdecrypto has source, but it seems to be tangled up in a windows specific gui, and the algorithmic code is a bit difficult to interpret. I did not find an algorithm description for it either.

For experiments like this I’m a linux guy who wants to work on the command line with text files. A C/C++ solver would be ideal for me.

Largo’s project sounds very close to what I am looking for. I hope he can find time to open source it.

Posted : May 7, 2018 6:52 pm

doranchak

(@doranchak)

Posts: 2614

Member Admin

zkdecrypto has source, but it seems to be tangled up in a windows specific gui, and the algorithmic code is a bit difficult to interpret. I did not find an algorithm description for it either.

I think zkdecrypto uses a tabu search algorithm. It basically keeps a short term memory of recently visited keys that it avoids revisiting, to help from getting stuck during the search. I think it also allows worse keys to occasionally be retained, similar to simulated annealing.

There is a command-line version of zkdecrypto called zkdecrypto-lite: http://www.zodiackillerciphers.com/?p=197 You should be able to compile and run that via command line on linux.

http://zodiackillerciphers.com

Posted : May 7, 2018 6:58 pm

anderson110

(@anderson110)

Posts: 55

Trusted Member

Topic starter

zkdecrypto-lite looks great. Thanks for the tip.

Posted : May 7, 2018 7:34 pm

Jarlve

(@jarlve)

Posts: 2547

Famed Member

I did not see any source code for AZdecrypt, nor an algorithm description.

Here is a basic run down.

Load some 5-gram letter log frequencies into a 5 dimensional array(25,25,25,25,25) where the elements represent the letters, from 0 to 25 equals from A to Z. You can use the 5-grams_english_practicalcryptography_wortschatz.txt that come with AZdecrypt found under the Ngrams directory.

The 5-gram file looks like this:

OFTHE1582ATION1577INTHE1570THERE1513INGTH1512TOTHE1508NGTHE1497ATTHE1495OTHER1495TIONS1491ANDTH1490
HODRC 110DUTDB 110BWSNO 110BEUAP 110MODRC 110IUCRC 110TIPJI 110

The format is a 5-gram followed by a 4 digit number which is the logarithm of its frequency in a corpus times 10. You can store it in a byte array if you divide the number by 10. The 5-gram file has multiple lines.

1. Make a random key such that each symbol represents a letter and put this in a solution array.
2. Set up some number of iterations and make only one random change per iteration such that only a single symbol changes its letter. Update the solution array with the change. Take note of the underlining, it needs to be so for simulated annealing to work well: make only the smallest change possible!
3. Score the change by adding up the scores all 5-grams in the solution array. Normalize the score by dividing it by the number of total 5-grams in the cipher and also the index of coincidence. A higher index of coincidence should lower the score!
4. Use some acceptance function such as with simulated annealing. Go back to step 2 until all the iterations are done.

To clarify. If the solution is "ILIKEKILLING" then the 5-grams are "ILIKE", "LIKEK", "IKEKI", "KEKIL", "EKILL", "KILLI", "ILLIN" and "LLING". The individual score of each 5-gram is simply looked up in the 5 dimensional array. AZdecrypt is not much more than these 4 steps but heavily optimized. Feel free to ask any question.

AZdecrypt

Posted : May 7, 2018 7:53 pm

doranchak

(@doranchak)

Posts: 2614

Member Admin

Jarlve, do you know why simulated annealing works better with smaller changes? I also noticed that trying to do too much at each step seems to lead to more false solutions. For example, when exploring keys, it’s tempting to plug in entire words in addition to just individual letters. But this doesn’t seem to work well in my experiments.

http://zodiackillerciphers.com

Posted : May 7, 2018 7:58 pm

anderson110

(@anderson110)

Posts: 55

Trusted Member

Topic starter

Thanks so much for that rundown!

Posted : May 7, 2018 8:29 pm

Jarlve

(@jarlve)

Posts: 2547

Famed Member

Jarlve, do you know why simulated annealing works better with smaller changes? I also noticed that trying to do too much at each step seems to lead to more false solutions. For example, when exploring keys, it’s tempting to plug in entire words in addition to just individual letters. But this doesn’t seem to work well in my experiments.

Simulated annealing is best suited for problems with a "smooth search space" because of its gradual cool down mechanic I suppose. The nature of the search space is of course partially inherent to the problem, but small changes help to smooth it out.

If you want to plug in entire words I would make it so that plugging in the words do not change all the letters of the symbols associated with it. Thus, only update the plaintext with the word. And measure how well the plaintext substitutes to the ciphertext as added normalization to the score to make up for that. Say that the plaintext is a 80% correct substitution to the ciphertext then multiply the score by 0.8. And add a weight to control the level of polyalphabetism: score * 0.8 ^ weight.

Thanks so much for that rundown!

No problem.

AZdecrypt

Posted : May 7, 2018 10:04 pm

doranchak

(@doranchak)

Posts: 2614

Member Admin

Thanks, Jarlve.
I like the idea about not changing all the occurrences of symbols involved with the word placements.
Have you ever looked at multiobjective simulated annealing? https://pdfs.semanticscholar.org/3129/6 … 4617f9.pdf
Multiobjective optimization appeals to me because you don’t have to come up with a "magic" formula that properly combines multiple objectives. I’ve done work with multiobjective approaches as applied to evolutionary algorithms, but not yet with simulated annealing.

http://zodiackillerciphers.com

Posted : May 7, 2018 10:14 pm

anderson110

(@anderson110)

Posts: 55

Trusted Member

Topic starter

Jarlve, do you know why simulated annealing works better with smaller changes? I also noticed that trying to do too much at each step seems to lead to more false solutions. For example, when exploring keys, it’s tempting to plug in entire words in addition to just individual letters. But this doesn’t seem to work well in my experiments.

In a continuous setting (most of my experience with optimization algorithms), the "step size" is often made to be a function of T, starting large, and getting smaller as the solution converges. If you don’t shrink the step size, even if you find a nice smooth well, you’re likely to keep stepping over it.

In the discrete case this mental model is a little less clear (to me), but did you try starting with bigger changes and converging to minimal changes as you "cool"?

Posted : May 8, 2018 4:15 am

anderson110

(@anderson110)

Posts: 55

Trusted Member

Topic starter

I did not see any source code for AZdecrypt, nor an algorithm description.

Here is a basic run down.

Load some 5-gram letter log frequencies into a 5 dimensional array(25,25,25,25,25) where the elements represent the letters, from 0 to 25 equals from A to Z. You can use the 5-grams_english_practicalcryptography_wortschatz.txt that come with AZdecrypt found under the Ngrams directory.

Your objective function uses 5-grams only? Any particular reasoning for that? Did you experiment with other n-grams, and find that empirically best?

Posted : May 8, 2018 8:36 am

Jarlve

(@jarlve)

Posts: 2547

Famed Member

Thanks, Jarlve.
I like the idea about not changing all the occurrences of symbols involved with the word placements.
Have you ever looked at multiobjective simulated annealing? https://pdfs.semanticscholar.org/3129/6 … 4617f9.pdf
Multiobjective optimization appeals to me because you don’t have to come up with a "magic" formula that properly combines multiple objectives. I’ve done work with multiobjective approaches as applied to evolutionary algorithms, but not yet with simulated annealing.

No I have not and thank you for the link. I do not understand how multiobjective optimization could properly combine multiple objectives in a single result without making a choice somewhere.

AZdecrypt

Posted : May 8, 2018 7:11 pm

Jarlve

(@jarlve)

Posts: 2547

Famed Member

Your objective function uses 5-grams only? Any particular reasoning for that? Did you experiment with other n-grams, and find that empirically best?

AZdecrypt supports the ngram sizes 3, 4, 5, 6, 7 and 8. And comes with 3, 4, 5 and 6-gram files in the download.

5-grams offer a very good trade off between memory use, speed and solving strength for a wide number of use cases. Generally higher order ngrams are better in such that they can solve more difficult ciphers. We use the term multiplicity to measure the difficulty of a substitution cipher, multiplicity = unique symbols / cipher length. The 340 has a multiplicity of 0.185.

5-grams can handle multiplicities of up to 0.3 and sometimes even higher. But around 0.3 5-grams may fail and then 6 or higher order ngrams may still succeed to get a solution. Though for higher order ngrams to be really effective you also need to compile from increasingly large corpora. I compiled the reddit ngrams that come with the AZdecrypt download from a 1 terabyte reddit corpus. Though I downsized it to about 250 gigabyte of quality sentences.

With the exception of the substitution + row bound solver, a concept by smokie treats, AZdecrypt uses only one ngram size at a time. Both me and daikon found that lower order ngram scores will typically contaminate the results of higher order ngram scores if not used very specifically. doranchak suggested that a multiobjective optimization may be able to overcome that. Though using many different ngram sizes at once will also slow down the algorithm considerably. I am quite sure it is not worth it for most use cases.

AZdecrypt

Posted : May 8, 2018 7:41 pm

Zodiac Discussion Forum

Strongest known algorithm for Z408