Yesterday I watched (again) David Oranchak's great presentation about the Zodiac ciphers:
https://www.youtube.com/watch?v=BV5R3TBMWJg
I like this video because it is very informative and points out a lot of the odd things in z340. (By the way, Klaus Schmeh can also be seen in this video. I met him at the "Historical Ciphers Colloquium" in Germany a couple of months ago. He has written several great books and he also has a very interesting blog.)
Doranchak, in your presentation you talk about the rows that do not contain repetitions (e.g. rows 1, 2, 3 and 11, 12, 13). In the past I have experimented a lot with this information and Dan Olson's assumption. But yesterday I asked myself why a repetition measurement should be tied to the length of a row. If you take a text and write it into a grid, the number of lines without repetitions depends on the width of the rows. It says nothing about the "chunks" of consecutive symbols that have no repetitions. If you have e.g. a "+" at the end of row 1 and a "+" at the beginning of row 2, then no repetition is recognized. I don’t see what repetitions within a row are supposed to tell us. This makes me wonder whether even the symmetry between the upper and lower halves of z340 is something special.
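To make the grid-width dependence concrete, here is a minimal sketch that counts rows without repeated symbols for a given width. The function name and the toy string are made up for illustration; it is not the code used for the presentation.

```python
def count_nonrepeating_rows(symbols, width):
    """Count grid rows (of the given width) in which no symbol occurs twice."""
    rows = [symbols[i:i + width] for i in range(0, len(symbols), width)]
    return sum(1 for row in rows if len(set(row)) == len(row))

# Toy text (not a real cipher): the two "+" are adjacent in the stream,
# but at width 4 they fall into different rows and go unnoticed.
toy = "ABC++DEFGAHA"
print(count_nonrepeating_rows(toy, 4))  # rows ABC+ / +DEF / GAHA -> 2
print(count_nonrepeating_rows(toy, 6))  # rows ABC++D / EFGAHA    -> 0
```

The same symbol stream gives a different count at each width, which is the arbitrariness described above.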
I have implemented a small test which extracts chunks of symbols without a repetition. I am sure someone else has done that before. But when I compare z408 to z340 I do not see many interesting differences. The only curious thing is that z408 contains longer chunks without repetition than z340 does. At first glance it seems that z340 is more repetitive than z408. But I think that is because of the large number of plus signs. What do you think, is this something worth a closer look?
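The test itself is not shown in this thread, so the following is only a sketch of one way to do it. It assumes a greedy split: a new chunk starts whenever the next symbol already occurs in the current chunk. The function name is hypothetical. Fed the first symbols of the z408 transcription from this post, the rule gives the same leading chunks as the listing.

```python
def nonrepeating_chunks(symbols):
    """Greedily split a symbol stream into chunks with no repeated symbol."""
    chunks, current, seen = [], [], set()
    for s in symbols:
        if s in seen:                 # s would repeat: close the current chunk
            chunks.append("".join(current))
            current, seen = [], set()
        current.append(s)
        seen.add(s)
    if current:                       # flush the last (possibly short) chunk
        chunks.append("".join(current))
    return chunks

# First 16 symbols of the z408 transcription used in this post.
print(nonrepeating_chunks("abPcZcUBbdORefXe"))
# ['abPcZ', 'cUBbdORefX', 'e']
```

Unlike the row count, this split does not depend on the grid width, only on the reading order.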
ATM I am a bit tired. I will test it again without the plus signs and bring up some statistics. At the moment it is just "visual".
Comparison of chunks without repetition between z408 and z340 with my transcription:
z408: abPcZ cUBbdORefX eBWV+gGYFhaHPiKjk YgMJ YlUIdmkTnNQ YDopS1carBPORAUbs RtkEdlLMZJvyzfFHVWgwYi+ kGDaKIph kXwoxS1RNnjYEtO wkGBTQSr BLvcPr BiXkEHMUlR RdCZKkfIpW kjwoLMyarBPDR+uehzN1gEZHdF ZCfOVWIo+nLptlRhH IaDRqTYyzvgciXJQAPoMw RUnbLpNVEKHeGyIjJdoaw LMtNApZ1PxUfd AarBVWz+ VTnOPleSytsUghmDxGb bIMNdpSCEca b bZsAPrBVfgXkW kqFrwC+iaA aBbOToRUC+qvYk qlSkWVZgGYKE qTYAabrLn qHjFBXax XADvzmLjekqg vr rhgoPORXQFbGCZiJTnkqw JI+yBPQWhVEX yaWIhkEHMpeu
z340: HERabcdVPeIfLTGghN b+BjkOlDWYmnoKpq BrstM+UZGWjqLkuHJSb bvdcwoVx bO+ +RKgyzM +u12hI7FP +34e5bwRdFcO-ohC eFagDjk7+KQl8 gUtXGVmuLIj GgJp2kO+yNYu +9LzhnM +0 +ZRgFBtrA#4K-ucUV +dJ +ObvnFBr-U +R571EIDYBb0TMKOgntc RJIo7T4Mm+3BFu#zSrk +NI7FBtj8wRcG FNdp7g40mtV 41+ +rBXfos4zCEaVUZ7- +ItmxuBKjObd mpMQGgRtT+Lf#Cn +FcWBIqL + +qWCu WtPOSHT5jqbIFeh Wnv1ByYO
Same comparison with the original symbols. Sorry for the misaligned rows. There is something wrong with my fonts, so the line spacing is off (my fonts are not even monospaced at the moment; I will correct that later and contribute them to this forum if you like):
I think the presence of clusters of "non repeat" rows suggests that the cipher author may have begun the encipherment scheme on or near those lines. It is presumably easier to avoid symbol repetitions at the beginning of homophonic encipherment than later (this could be tested in homophonic encipherment simulations). It also seems to confirm the direction of encipherment, since the non-repeat phenomenon doesn’t occur when reading by columns. But in z408, the non-repeat rows do not appear at the beginnings of the three sections so they do not coincide with the start of the sections of cipher.
I did some shuffle tests for repeated symbols by rows (for z408, and for z340). The results seem to confirm the presence of a deliberate scheme to avoid repetition of symbols. I suspect it may simply be a natural result of homophonic substitution (which would be some evidence that z340’s author made a real attempt to encode a real message). Also, the fact that z340 has 9 rows with no repeats (compared to z408’s 6 rows) seems to indicate even more deliberate encipherment of some kind, because random placement of symbols would tend to have a lot more repeats in each row. You are right that it depends on the grid width, so this measurement does seem arbitrary. But even at width 17, the presence of 9 non-repeating rows is still a very strong statistical anomaly when compared to random shuffles. In fact, none of my 1,000,000 shuffles produced as many as 9 non-repeating rows.
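For readers who want to try this kind of shuffle test themselves, a minimal sketch follows. It is not the code behind the numbers above: the function names, the toy text, and the way the comparison is summarised (the fraction of shuffles with at least as many non-repeating rows as the original) are all assumptions made for illustration.

```python
import random

def count_nonrepeating_rows(symbols, width):
    """Count grid rows (of the given width) in which no symbol occurs twice."""
    rows = [symbols[i:i + width] for i in range(0, len(symbols), width)]
    return sum(1 for row in rows if len(set(row)) == len(row))

def shuffle_test(text, width, trials=10_000, seed=0):
    """Return (observed count, fraction of shuffles reaching at least that count)."""
    rng = random.Random(seed)         # fixed seed so runs are repeatable
    observed = count_nonrepeating_rows(text, width)
    symbols = list(text)
    hits = 0
    for _ in range(trials):
        rng.shuffle(symbols)          # same symbol counts, random positions
        if count_nonrepeating_rows(symbols, width) >= observed:
            hits += 1
    return observed, hits / trials

# Toy text (not a real cipher): every row of width 4 is repeat-free by design.
observed, fraction = shuffle_test("ABCD" * 4, width=4, trials=2_000)
print(observed, fraction)
```

A fraction of 0 only means that no shuffle in this run reached the observed count, so the number of trials bounds how small a probability can be resolved.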
Your observation about the arbitrariness of grid width makes me think that we need to run the same tests for a wider variety of grid widths. Then we can compute the statistical significance (sigma) of each width, and see if it peaks at width 17 or at some other width. I’ve added this to my already long TO-DO list.
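A rough sketch of such a width scan is given below. It takes "sigma" to mean a z-score, (observed minus shuffle mean) divided by shuffle standard deviation, which is an assumption, as are the function names and the toy text. Only complete rows are counted here, so that widths which do not divide the text length are not helped by a short final row.

```python
import random
import statistics

def count_full_nonrepeating_rows(symbols, width):
    """Count complete rows of the given width that contain no repeated symbol."""
    full = len(symbols) // width * width          # ignore a short trailing row
    rows = [symbols[i:i + width] for i in range(0, full, width)]
    return sum(1 for row in rows if len(set(row)) == len(row))

def sigma_by_width(text, widths, trials=1_000, seed=0):
    """Map each width to the z-score of the observed count against shuffles."""
    rng = random.Random(seed)
    result = {}
    for width in widths:
        observed = count_full_nonrepeating_rows(text, width)
        symbols = list(text)
        samples = []
        for _ in range(trials):
            rng.shuffle(symbols)
            samples.append(count_full_nonrepeating_rows(symbols, width))
        mean = statistics.fmean(samples)
        sd = statistics.pstdev(samples)
        # A zero spread means the shuffles carry no information at this width.
        result[width] = (observed - mean) / sd if sd > 0 else float("nan")
    return result

# Toy text (not a real cipher); for a 340-symbol text one would scan e.g. range(2, 41).
for width, sigma in sigma_by_width("ABCD" * 4, widths=[2, 4, 8]).items():
    print(width, sigma)
```

The count of non-repeating rows is a small integer and far from normally distributed, so these sigma values are better read as a ranking across widths than as exact probabilities.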
(By the way, Klaus Schmeh can also be seen in this video. I met him at the "Historical Ciphers Colloquium" in Germany a couple of months ago. He has written several great books and he also has a very interesting blog.)
Klaus is a great guy; he’s really enthusiastic about collecting stories about codes and ciphers. I can be seen in the video for his talk as well. He recently added some English language posts to his blog.
I have implemented a small test which extracts chunks of symbols without a repetition. I am sure someone else has done that before. But when I compare z408 to z340 I do not see many interesting differences. The only curious thing is that z408 contains longer chunks without repetition than z340 does. At first glance it seems that z340 is more repetitive than z408. But I think that is because of the large number of plus signs. What do you think, is this something worth a closer look?
There is a feature in my old "cryptoscope" tool that looks for such sequences. Scroll down to where it says "Largest non-repeating sequences", then click "Show all and chart".
I think it may be useful to compute the mean length of these chunks for z340 and z408, then compare that to random shuffles. I suspect the result will confirm that z408 and z340 have mean chunk lengths that are very significant compared to shuffles. I think all we can really conclude from it is that it is the effect of homophonic substitution (in a horizontal direction), so z408 and z340 are similar in that regard. Also, it might be interesting to answer this question: Is z340’s mean chunk length more or less statistically significant than z408’s?
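As a starting point for that comparison, here is a minimal sketch. It assumes the greedy chunk definition (a new chunk starts whenever a symbol would repeat within the current one) and again summarises the comparison as a z-score; neither choice is confirmed by the cryptoscope tool, and the function names and toy text are made up.

```python
import random
import statistics

def chunk_lengths(symbols):
    """Lengths of greedy chunks that contain no repeated symbol."""
    lengths, seen = [], set()
    for s in symbols:
        if s in seen:                 # s would repeat: close the chunk
            lengths.append(len(seen))
            seen = set()
        seen.add(s)
    if seen:
        lengths.append(len(seen))
    return lengths

def mean_chunk_sigma(text, trials=1_000, seed=0):
    """Return (observed mean chunk length, z-score against random shuffles)."""
    rng = random.Random(seed)
    observed = statistics.fmean(chunk_lengths(text))
    symbols = list(text)
    samples = []
    for _ in range(trials):
        rng.shuffle(symbols)
        samples.append(statistics.fmean(chunk_lengths(symbols)))
    sd = statistics.pstdev(samples)
    z = (observed - statistics.fmean(samples)) / sd if sd > 0 else float("nan")
    return observed, z

# Toy text (not a real cipher): perfectly cycling symbols give maximal chunks.
print(mean_chunk_sigma("ABCDEFGH" * 4, trials=500))
```

Running it once per cipher gives two z-scores that can be compared directly, which would answer the question of whether z340's mean chunk length is more or less significant than z408's.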
Klaus is a great guy; he’s really enthusiastic about collecting stories about codes and ciphers. I can be seen in the video for his talk as well. He recently added some English language posts to his blog.
Brilliant.
Thank you for the explanation, doranchak! Now I understand why it absolutely makes sense to check how many rows have no repeated symbols. For me this is good evidence that if any kind of transposition is involved, it was applied before the homophonic substitution. Otherwise the cipher would behave much more randomly.
I have not spent much time on a more detailed analysis because I have so many other promising ideas at the moment. My to-do list is also very long.