Cryptanalysis of Homophonic Substitution Cipher Using Hidden Markov
Models
by Guannan Zhong
http://scholarworks.sjsu.edu/cgi/viewco … d_projects
We investigate the effectiveness of a Hidden Markov Model (HMM) with random
restarts as a mean of breaking a homophonic substitution cipher. Based on extensive
experiments, we find that such an HMM-based attack outperforms a previously developed
nested hill climb approach, particularly when the ciphertext message is short.
We then consider a combination cipher, consisting of a homophonic substitution and
a column transposition. We develop and analyze an attack on such a cipher. This
attack employs an HMM (with random restarts), together with a hill climb to recover
the column permutation. We show that this attack can succeed on relatively short ciphertext
messages. Finally, we test this combined attack on the unsolved Zodiac 340
cipher.
They couldn’t crack Z340 but I’m glad to see they are attacking the combination of columnar transposition and homophonic substitution.
Thank you for sharing this paper! It is amazing how many people are working on historical ciphers and solving methods.
They couldn’t crack Z340 but I’m glad to see they are attacking the combination of columnar transposition and homophonic substitution.
They did not attack columnar transposition but columnar rearrangement, or what Largo calls chunksize based transposition.
I find it interesting that they tried to solve the 340 as if it was a keyed columnar transposition instead of route transposition. There is not any evidence that the 340 is a keyed columnar transposition cryptogram.
I don’t think that very many people are aware of the period 15 / 19 bigram repeats.
I wonder if anyone knows of significant statistical information that we do not know.
They are swapping columns systematically in a 17 by 20 grid and then scoring each varation horizontally with 2-grams.
My advice,
– Using 2-grams is okay for very low multiplicity ciphers but not for the 340. Move up to (or also use) 4-grams at least.
– It looks like there is no effective system that controls the letter frequencies (such as the index of coincidence), solves have ioc around 0.045.
– Don’t pare hill climbing with systematically argumented operations, use the best possible randomness (creativity).
– Start easy with few columns (2, 3, 4, 5, 6, 7…) and fine tune the algorithm as you go along.
– Keep it simple, focus on the algorithm, not the paper.
jqkcdwilrbbvnvemo dtfangivexklsykda ncofstyhexnangeca ddgwiwxylldittqkm ustheryobyorthebe xqidowilyorbokmen ytkgbvemyouellenb emandygithakethnu sstorthqmofongpek ewylltiatidgsofnl tqeyyrjbekfdrvski sowmqabyyveslthfo puwengtabyofonexq eoawidymerlolertt fuvnyceurjklyhylt ollbefknidildsbem ovtqnvprstowxfban taxtrexoriwcvenad obobxsgrfhkifylro seckadbwehixgbket
I don’t think that very many people are aware of the period 15 / 19 bigram repeats.
I wonder if anyone knows of significant statistical information that we do not know.
I don’t think so, I’m pretty sure we have the lead here. When people originally looked at the 340 and transposition they stuck to the 17 by 20 grid. Kevin Knight made the same mistake. Zodiac has been underestimated.
(He also talked about the unsolved Zodiac 340, and how it has no obvious reading order bias based on analyzing bigrams in several directions:)
There you can see the 408 has a preference for left/right order, since it results in a spike of repeating bigrams. So, the 340 “could be nonsense, or maybe bigrams are smoothed out via more careful substitutions.”
Source: http://www.zodiackillerciphers.com/?p=448
I’m rather unconvinced. Unless their Markov Model solver can be shown to solve as many ciphers – and as accurately – as modern hill-climbers, I am doubtful. It seem like blowing smoke for a research paper to me. Just saying.
-glurk
——————————–
I don’t believe in monsters.
Hey glurk,
I have just coded something up. Right now it is easily solving such ciphers (column rearrangement) at multiplicities below 0.1 with 17 columns. Just using a column swap operation is probably not the best way to go. At some point the hill climber may figure out a certain amount of columns in the right order, but has to discard this information because it is not in the right place. So I think it would be beneficial to add an operation that allows to swap a certain amount of columns at once and another operation to control the offset.
The main problem will become the multiplicity because that will increase the amount of information that has to be in the right order for the hill climber to catch on.
Thanks for posting David.