Zodiac Discussion Forum

Code to solve a rotating chiffre disk

(@f-reichmann)
Posts: 30
Eminent Member
Topic starter
 

I want to share the Java code I wrote to attempt breaking z340 under the assumption that it was made with a rotating chiffre disk. Parts of it may be re-usable for trying other cipher methods.

Starting from trying to understand the zkdecrypto implementation, I re-implemented it in Java, avoiding any line whose origin I did not understand. Another goal is to make the strong parts of zkdecrypto, which I regard as the scoring function and the hill-climbing (it simply works impressively well!), easy to use for trying out other encryption methods: I think it is safe to say that plain homophonic substitution is not the z340 encryption method. Scoring and hill-climbing remain valid approaches for any "linear" encryption method (I mean: a small change to the key makes a small change to the deciphered result, so that it can be "optimized" little by little).
Vigenère would be another candidate, but I believe that, with pen and paper and a homophonic substitution afterwards, it is more complicated to carry out than a chiffre disk.
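The "little by little" optimisation can be illustrated with a minimal hill-climbing sketch in Python. This is not the jDecryptor or zkdecrypto code: the names `hill_climb`, `swap_two` and `matches`, and the toy objective, are assumptions made up for the illustration.

```python
import random

def hill_climb(key, score, mutate, steps=10_000, seed=0):
    """Greedy hill climbing: keep a mutated key only if it does not score worse."""
    rng = random.Random(seed)              # fixed seed so runs are repeatable
    best, best_score = key, score(key)
    for _ in range(steps):
        candidate = mutate(best, rng)      # small change to the key
        candidate_score = score(candidate)
        if candidate_score >= best_score:  # accept equal scores to drift on plateaus
            best, best_score = candidate, candidate_score
    return best, best_score

def swap_two(key, rng):
    """Mutation step: swap two positions of a permutation."""
    i, j = rng.sample(range(len(key)), 2)
    out = list(key)
    out[i], out[j] = out[j], out[i]
    return out

# Toy objective (hypothetical): number of positions that match a hidden target.
TARGET = list("zodiackler")

def matches(key):
    return sum(a == b for a, b in zip(key, TARGET))

start = sorted(TARGET)
solved, final_score = hill_climb(start, matches, swap_two, steps=2000)
print("".join(solved), final_score)
```

For a real cipher, the score would be a language-statistics function of the deciphered text and the mutation a small change to the disk arrangement.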

I like to believe that Zodiac used a rotating chiffre disk, with clear text letters inside and symbols outside, turning the disk by one position with each letter. The variables are the arrangement of the alphabet inside, the arrangement of the symbols outside, and the size of the disk. I believe so because it is easy to create with pen, paper and scissors, because it is the next most intuitive simple idea after Caesar and homophonic substitution, and because, when I grew up in Europe around a decade after the murders, it existed as a children's toy (with an alphabet inside and outside).

I like the idea that by putting symbols that resemble letters at specific places on the cipher symbol disc, you can create symbol sequences in the enciphered text that look like clear text, making it possible to write something that looks like "her body" or a "zodiac" signature.
The ultimate goal in this direction might be to encipher a text so that the enciphered text is again readable text, which you could achieve with 26 alphabet-like cipher symbols on each of the 26 positions (but then you have 26×26 cipher symbols, which is pointless, so it is only cool when you manage with few symbols).
In addition, by preferring some symbols, or by distributing the symbols unevenly on the symbol disc, it is possible to create relatively arbitrary uneven distributions and introduce features into the cipher text, without unveiling the clear text letter distribution in a short text.

I do not much like the idea of transpositions. They have also been tried already, without convincing results.

I found a rotating chiffre disk with non-alphabetic order of clear letters on the "clear disk" hard to break on a 340 letters short length with 63 symbols, and I perceived it under-represented in the analysis I found so far on the web.

Because 26=2*13, a 26-letter chiffre disk is harder to construct physically than one of size 24 or 32 (which contain only 2 and 3 as prime factors, so you do not need to calculate angles but can easily draw by hand, dividing into halves). So, in case my speculation is true at all, I do not take for granted that the size must be 26. Unfortunately, this further increases the degrees of freedom. I believe, though, that as long as the tried chiffre disk size and the real one have some prime factors in common, there should be at least some statistical signal in the form of slightly higher scores.

My code is available on github: https://github.com/freichmann/jDecryptor

It features:
– Creates the language statistics from world literature that I downloaded from Project Gutenberg
– Breaks the z408, and sample 340-length homophonic ciphers with similar symbol distributions (usually testing with the 408 clear text, with Luther King, or with Daniel 5)
– Breaks a rotating chiffre disk enciphered 1344-letter sequence with ~63 symbols from the Daniel 5 clear text
– Has few "magic numbers"
– The scoring function uses no absolute values and no functions with discontinuities in value or derivative, but is based on Gauss and standard deviations; it does not arbitrarily weight chi2 or ioc or parameter count, but just multiplies the values, which I consider an improvement in transparency
– Better, though not yet mature, separation of the different classes
– Multi-threaded. This is of more than insignificant but less than linear use, because all threads are to converge to the same solution: multiple threads then only give the fastest of the race, not the sum of the speeds of each
– Possibility to compute ioc and entropy on bi- and longer letter sequences. This has shown to be of little use for a text only 340 long

To be done:
– Calculate the remaining "magic numbers" (mainly how many sigmas of deviation are accepted for the statistical features) from the text samples, removing "magic numbers" completely
– Improve the separation of the code into classes of completely distinct functionality, to ease the implementation of other cipher methods
– Enhance the scoring function until it successfully drives the hill-climber to solve a 340 symbol rotating chiffre disc text (the shortest I have managed so far was around 600), and to converge at cipher length/unicity distance ratios as poor as possible

I do not care much about:
– GUIs. I want to run it from the command line on Linux on my Raspberry Pi 3B, to save on the electricity bill
– Speed. After almost 50 years, a few days do not seriously matter. I prefer the code to be understandable, portable, and re-usable. And hopefully, at the very end, simply to be successful.

So mostly, you get a re-implemented zkdecrypto with a different cipher method, in Java, with what I personally perceive as more transparent code, and back on the command line.

I share the code under GPLv3, because I hope it might inspire someone the way the open source zkdecrypto inspired me.

Enjoy!
Fritz

 
Posted : February 27, 2018 12:18 am
doranchak
(@doranchak)
Posts: 2614
Member Admin
 

Fritz,

Nice work! Thanks for sharing this. The rotating chiffre disk idea is interesting. If I understand your idea, it seems similar to Vigenere, but with these changes:

– At the left side of each row of the Vigenere table, instead of writing A-Z in order, they are written in a random or arbitrary order
– Along the top of the table, the letters A-Z are written in order
– In each square of the table, one or more cipher symbols are written
– Each row shifts the symbols one position to the left (or right)
– When enciphering, instead of using a keyword to find a letter on the left side of the table, the rows are considered in sequence.

In homophonic, the quantity of symbols per plaintext letter is determined by the plaintext letter frequencies. But I’m wondering: In this rotating disk scheme, how would Zodiac determine the quantity of symbols to put along each letter on the outside ring of the disk? Since he’d rotate the disk after writing each letter assignment, the size of the next set of cipher symbols would not necessarily be proportional to the expected frequency of the next letter in his plaintext message.

Also, what do you mean by "magic numbers"?

http://zodiackillerciphers.com

 
Posted : February 27, 2018 2:41 pm
(@largo)
Posts: 454
Honorable Member
 

Hi Fritz,

I have spent a lot of time working with z340 and developed a quite large tool chain (first in Python, then in C#). What I always wanted to have was a command-line solver. If I got it right, that’s what you developed. I think I have time tonight to check out the repository. Unfortunately I don’t have much time for z340 at the moment.

BTW: Are you from Germany?

@doranchak:
https://en.wikipedia.org/wiki/Magic_number_(programming)

 
Posted : February 27, 2018 6:32 pm
(@largo)
Posts: 454
Honorable Member
 

I’ve checked out the repository and installed IntelliJ Idea CE. The project compiles but I think I found a little error in the documentation (/doc/StartExample.txt). It seems that the read mode is missing:

Current: "[a-z]+" "[^a-z]+" RUN 4 10000 5 NOROTATE NOSWAP NOSHUFFLE 4 OUTER ...
Fixed?: "[a-z]+" "[^a-z]+" RUN 4 10000 5 NOROTATE NOSWAP NOSHUFFLE TXT 4 OUTER ...

I really like your coding style! Especially the use of exceptions. Nice work!

 
Posted : February 27, 2018 9:24 pm
(@capricorn)
Posts: 567
Honorable Member
 

This sparks more fragmented memories for me of long-ago conversations with my poi. Particularly, ones having to do with combination locks such as used on school lockers, bicycle lockers, safes, etc.

It also reminds me of posts I’ve recently read on other sites about "overlay" theory, and of a very recent one by someone who claims she was able to decipher everything very quickly but is taking her time to reveal it, and has put out her first step. (This is on the zodiac killer site.) She refers to numbers from the cards he sent, IIRC, so I am wondering whether these numbers could perhaps be the combination needed to open a locker somewhere, or whether the idea could somehow apply to this disk.

 
Posted : February 27, 2018 9:41 pm
(@f-reichmann)
Posts: 30
Eminent Member
Topic starter
 

Many thanks for the responses, I am very happy about the thoughts and feedback.

@doranchak
Very interesting and attractive thought. First, what I had in mind when writing was literally something like this disk: https://de.m.wikipedia.org/wiki/Vigenère-Chiffre#/media/Datei%3ACipherDisk2000.jpg. It features a reversed alphabet on the inside (which is my swapping; reversing is one possible way to rearrange the inner disk), and on the outer disk in the picture I thought of placing the 63 symbols, again in some arbitrary way. After each encrypted letter I would turn the disk by one, and randomly or systematically choose a symbol on the outer disk opposite my clear text letter on the inner disk. The result would be the encrypted text. To decrypt, I would need to reverse the process. To crack it, I need to find the symbol and letter arrangements. This is easy to construct with pen, paper and scissors, and within easy reach of late-1960s technology, so to speak.
Your thought of mapping this to Vigenère is attractive, because if I re-code, my idea may become a special case of a generic Vigenère, only adding a shift of one line on the rows of the Vigenère table with each letter, and I could validate both with the same code. The obvious difference is that my idea needs no key phrase, while Vigenère does. I now understand that the key phrase gets lost implicitly the moment I swap the order of the left column and choose to shift by one with each letter instead of using a key word. The top of the table is then not written in order, but breaks down to one single column. Or, put differently, my suggested cipher method uses a Vigenère key that consists of only one single letter. That is easy to extend, and then the code is fit to challenge both chiffre disk and Vigenère, which may help to rule out both.

To answer how to choose how many symbols to place where on the symbol disk: I call that a feature of the idea rather than a bug. Taking sophisticated care to hide frequencies is no longer needed, because everything distributes evenly by itself. I could even use the liberty to play with it, and let “+” stand alone, to give it a high frequency in the encryption. And, with a matching order of clear letters on the inner disk, I could even encrypt the text with only one and the same symbol, each time representing a different letter. The key then becomes a part of the message. That is my preferred way to interpret the “My name is …” cipher. It may become transparent with the key, if the chiffre disk method was really used.

Magic numbers: Those numbers in zkdecrypto that work, and are not self-explanatory. Most significant in the zkdecrypto calcscore function, from https://github.com/glurk/zkdecrypto/blo … o/z340.cpp:
if(info->dioc_weight) //DIC, EDIC
{
score_mult*=1.05-((info->dioc_weight>>1)*ABS(FastDIoC(solv,score_len,2)-info->lang_dioc));
}

if(info->chi_weight) score_mult*=1.05-(info->chi_weight*ABS(FastChiSquare()-info->lang_chi))/60.0;
if(info->ent_weight) score_mult*=1.05-(info->ent_weight*ABS(FastEntropy()-info->lang_ent))/150.0;

Why the heck does this work so well? Why is it OK to risk negative numbers when a possibly very large number is subtracted from 1.05? Why is the non-differentiable abs function used? Where do 60 and 150 come from, and 1.05, and the weights? How were the language files created, with logarithms of likelihood? (OK, the logarithm clearly is for numeric stability, avoiding trouble when adding large numbers to many, many small ones.) There is a lot of magic in the above lines. They work, maybe only under the conditions they were prepared for, and I do not grasp what those conditions were.
I tried to go to standard statistics, working with Gaussian distributions, and with squares as metrics instead of abs. I think it is possible to remove all magic.
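As an illustration of the Gauss-and-squares idea, here is a minimal Python sketch. It is an assumption about what such a scoring factor could look like, not the code in jDecryptor: `gauss_factor`, `score_multiplier` and the sample numbers are hypothetical. Each statistical feature contributes a smooth factor between 0 and 1 that depends on how many standard deviations it lies from the language mean, and the factors are simply multiplied.

```python
import math

def gauss_factor(value, mean, sigma):
    """Factor in (0, 1]: 1.0 at the mean, falling off smoothly with distance."""
    z = (value - mean) / sigma          # deviation in units of sigma
    return math.exp(-0.5 * z * z)       # squared deviation instead of abs()

def score_multiplier(features):
    """Multiply one Gaussian factor per (value, mean, sigma) triple."""
    result = 1.0
    for value, mean, sigma in features:
        result *= gauss_factor(value, mean, sigma)
    return result

# Hypothetical numbers: a measured feature one sigma away from the language
# mean, and a second feature exactly on its mean.
multiplier = score_multiplier([(0.060, 0.066, 0.006), (4.1, 4.1, 0.1)])
print(multiplier)
```

Unlike the `1.05 - weight * ABS(...)` form, this factor can never become negative and is differentiable everywhere; the mean and sigma would have to be measured on text samples.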

@Largo
You are right, it is a command line solver. I will correct the example. The text or binary mode is needed because ciphers may be encoded either as Unicode text (copy and paste from web pages) or as numbers written directly to disk (when using numbers as symbols and converting them to symbols by just writing these values as characters to disk).
I live near Munich. We seem to be close, you are from Frankfurt? What is your tool chain?

 
Posted : March 1, 2018 2:26 am
(@capricorn)
Posts: 567
Honorable Member
 

Would this be similar to something that rotates on a pen or pencil?

 
Posted : March 1, 2018 2:37 am
(@f-reichmann)
Posts: 30
Eminent Member
Topic starter
 

I think I now fully understand the analogy to Vigenère: the rotating chiffre disk is the special case of Vigenère in which the table top is a-z in sequence and the Vigenère key is that exact same sequence. Making the inner clear letter disk smaller or bigger in my model is equivalent to choosing shorter or longer keys in Vigenère. Swapping the inner disk means re-arranging the left of the table, which is equivalent to swapping the symbols inside the table, so that it can be removed as a degree of freedom.

I will change the code to be Vigenère with symbols inside the table, and allow arbitrary keys. That will allow validating the rotating chiffre disk as a subset of the solutions of Vigenère.

That is good news, because there is a lot of analysis available on Vigenère, including its unicity distances and methods of attack. It then also makes sense to check the cipher for the signatures of Vigenère, like a statistical IoC signal at the correct key length.
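One common way to look for such an IoC signal is to split the cipher into columns for each candidate key length and average the index of coincidence over the columns. The Python sketch below is a generic illustration, not taken from jDecryptor or AZdecrypt; the function names are assumptions.

```python
from collections import Counter

def ioc(text):
    """Index of coincidence: chance that two randomly picked symbols are equal."""
    n = len(text)
    if n < 2:
        return 0.0
    return sum(c * (c - 1) for c in Counter(text).values()) / (n * (n - 1))

def column_ioc(text, period):
    """Average IoC over the `period` columns text[0::period], text[1::period], ..."""
    columns = [text[k::period] for k in range(period)]
    return sum(ioc(column) for column in columns) / period

def period_scan(text, max_period):
    """Map each candidate key length to its average column IoC."""
    return {p: column_ioc(text, p) for p in range(1, max_period + 1)}

# Toy text with an obvious period of 3: every third symbol is identical.
for period, value in period_scan("abcabcabcabc", 4).items():
    print(period, round(value, 3))
```

At the true key length, each column is enciphered with a single alphabet, so its IoC should rise above that of the other periods. With homophones and only 340 symbols the columns become very short, so the signal may be weak.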

The not so good news is: I consider this idea very likely to have been tested already many times. If the attempts had provided a result, then we would already know about it.

 
Posted : March 1, 2018 11:18 am
doranchak
(@doranchak)
Posts: 2614
Member Admin
 

I could even use the liberty to play with it, and let “+” stand alone, to give it a high frequency in the encryption.

I thought about that, and wondered how often "+" would show up if you advanced the disk one position for each letter of the plaintext during encipherment. It means that at each step, "+" would be associated with a different plaintext letter (one of 26 possible letters). I think this would mean that for each step of encipherment, there’s a 1 in 26 chance the "+" would turn up for your letter assignment. For 340 cipher symbols then, the expected number of +’s would be 340 * (1 / 26) = 13. But the actual count in Z340 is 24. So, what if two different spots in the outer disk have a "+"? Then the expected number of +’s would be 340 * (2 / 26) = 26, which more closely matches the actual count.

But I’m not certain the chance is truly 1 in 26, though, since it may be influenced by plaintext letter frequencies and the order of plaintext letters on the inner disk.
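The estimate can be written down as a small Python sketch, which also gives the spread to expect. It assumes, as a simplification, that every step is an independent draw with the same probability (a binomial model) and that "+" is the only symbol on its field; `expected_count` is a hypothetical helper name.

```python
import math

def expected_count(text_length, fields_with_symbol, disk_size=26):
    """Mean and standard deviation of a symbol's count under a binomial model."""
    p = fields_with_symbol / disk_size              # chance per enciphered letter
    mean = text_length * p
    sigma = math.sqrt(text_length * p * (1.0 - p))  # binomial standard deviation
    return mean, sigma

for fields in (1, 2):
    mean, sigma = expected_count(340, fields)
    print(fields, round(mean, 2), round(sigma, 2))
```

Under these assumptions, one field gives about 13.1 with a sigma of about 3.5, and two fields give about 26.2 with a sigma of about 4.9. The observed 24 is therefore roughly three sigma above the one-field expectation and well within one sigma of the two-field expectation.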

Magic numbers: Those numbers in zkdecrypto that work, and are not self-explanatory. Most significant in the zkdecrypto calcscore function

Ah, I see what you mean now. Yes, a lot of those numbers and computational steps are derived from experimentation and trial-and-error.

I think a better approach would be to make the algorithm to tune itself for the best parameters and evaluation function. There are some interesting ways to do it. For example, you can generate a big set of test ciphers with known solutions and run the hillclimber to find the solutions. Then measure how well the hillclimber performed. Tweak the evaluation function and/or parameters, then run the hillclimber again. Then keep the new function and parameters if they improve the hillclimber’s performance. Basically, you are hillclimbing the hillclimber itself! This kind of approach is sometimes called a "metaheuristic" algorithm. It is useful for this problem of codebreaking, since as you say it is a mystery to why the "magic" numbers work so well. A metaheuristic approach forces the computer to discover the best numbers and formulas (in zkdecrypto’s case, this was done by hand).

http://zodiackillerciphers.com

 
Posted : March 1, 2018 5:08 pm
(@f-reichmann)
Posts: 30
Eminent Member
Topic starter
 

I thought about that, and wondered how often "+" would show up if you advanced the disk one position for each letter of the plaintext during encipherment. It means that at each step, "+" would be associated with a different plaintext letter (one of 26 possible letters). I think this would mean that for each step of encipherment, there’s a 1 in 26 chance the "+" would turn up for your letter assignment. For 340 cipher symbols then, the expected number of +’s would be 340 * (1 / 26) = 13. But the actual count in Z340 is 24. So, what if two different spots in the outer disk have a "+"? Then the expected number of +’s would be 340 * (2 / 26) = 26, which more closely matches the actual count.

But I’m not certain the chance is truly 1 in 26, though, since it may be influenced by plaintext letter frequencies and the order of plaintext letters on the inner disk.

The above neglects the high sigma of the random distribution for short texts. Take a simple example, carried out below: encrypting the first 340 letters of the solved z408 with a rotating chiffre disk that maps a-z to A-Z 1:1. The first 340 characters of the z408 clear text,

ilikekillingpeoplebecauseitissomuchfunitismorefunthankillingwildgameintheforestbecausemanisthemostdangerousanimalofalltokillsomethinggivesmethemostthrillingexperienceitisevenbetterthangettingyourrocksoffwithagirlthebestpartofitisthatwhenidieiwillbereborninparadiceandalltheihavekilledwillbecomemyslavesiwillnotgiveyoumynamebecauseyouwilltry

when encrypted with the 1:1 mapping rotating chiffre disk

{[a:A][b:B][c:C][d:D][e:E][f:F][g:G][h:H][i:I][j:J][k:K][l:L][m:M][n:N][o:O][p:P][q:Q][r:R][s:S][t:T][u:U][v:V][w:W][x:X][y:Y][z:Z]}

give the encrypted text

IKGHAFCEDZDVDRAAVNJLIFYVGJTHQPKHOVZWKCWGUDWXZLLZRWJBNJGIHDHZOZBSUNYPSWBOKKSUGTTACZWPMXERDXGGTPWXAAJFRJGSOTQXJDGTDFVPZYFZURTSYTQHVIIMEDEQYLEVJWSZADDCPYOQPLPHEWNBNDYGUVYIWFQGOWJLZYIUVIAMEBPOCGYPEJFEANUBWMLBMWJBGHPIPCYUWJJEOEFZPRBPYYLDVXHDLFZDYBOZBAPRDPLXZUOSTDTBDHABWIXTDCJWSVTLFNSPRQIGYJLKZBYJGXEPIAOIQDSFQSRSSWIJVDWLQHSGSDUQSPMFCNGVABMONURX

which has a symbol distribution of:

A : 3.82E-02 : 13
B : 4.41E-02 : 15
C : 2.35E-02 : 8
D : 6.47E-02 : 22
E : 3.53E-02 : 12
F : 3.53E-02 : 12
G : 5.00E-02 : 17
H : 3.24E-02 : 11
I : 4.12E-02 : 14
J : 5.00E-02 : 17
K : 1.76E-02 : 6
L : 3.82E-02 : 13
M : 2.06E-02 : 7
N : 2.65E-02 : 9
O : 3.53E-02 : 12
P : 5.59E-02 : 19
Q : 3.24E-02 : 11
R : 2.94E-02 : 10
S : 4.71E-02 : 16
T : 3.53E-02 : 12
U : 3.24E-02 : 11
V : 3.82E-02 : 13
W : 5.29E-02 : 18
X : 2.94E-02 : 10
Y : 4.71E-02 : 16
Z : 4.71E-02 : 16

The maximum is at the field "D", with a count of 22, not far off from the 24 in z340. It is quite easy to lower the other values, by distributing symbols over them.

So let’s try out exactly that. With "+" alone on its field, 63-26=37 symbols remain to be distributed over the other 25 fields once every field holds one symbol. Spread as evenly as possible, 37-25=12 fields end up with three symbols and the remaining 13 fields with two. That means that for about 13 of the outer fields other than "+", we expect each symbol to show a count distributed like the statistics of the sample cipher above, at around half the size of the "+" value. Looking at http://zodiackillerciphers.com/wiki/index.php?title=Comparison_of_cipher_alphabets, I count 8 symbols with a count in the range of 9-12, with one at 12, very nicely half of the observed maximum count of "+".

I feel that the statistical match is quite good, and that it is premature to exclude the possibility of a chiffre disk based on the "+" count.
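For reference, the 1:1 sample earlier in this post can be reproduced with a few lines of Python. This is a sketch, not the jDecryptor code: the function name is an assumption, and the rotation direction (cipher position = letter position minus step count, modulo the disk size) is inferred from the sample ciphertext, whose first letters "ilikekilli" come out as "IKGHAFCEDZ".

```python
import string
from collections import Counter

def rotate_encrypt(plaintext,
                   inner=string.ascii_lowercase,
                   outer=string.ascii_uppercase):
    """Rotating disk with one symbol per field; the disk turns one step per letter."""
    size = len(inner)
    out = []
    for step, letter in enumerate(plaintext):
        field = (inner.index(letter) - step) % size  # field opposite the letter now
        out.append(outer[field])
    return "".join(out)

sample = rotate_encrypt("ilikekillingpeopleb")
print(sample)
print(Counter(sample).most_common(3))  # symbol distribution of the short sample
```

Feeding in the full 340-letter clear text and counting with `Counter` gives the symbol distribution for any arrangement of `inner` and `outer`.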

 
Posted : March 2, 2018 9:45 pm
Jarlve
(@jarlve)
Posts: 2547
Famed Member
 

I found a rotating chiffre disk with non-alphabetic order of clear letters on the "clear disk" hard to break on a 340 letters short length with 63 symbols, and I perceived it under-represented in the analysis I found so far on the web.

The 340 has no vigenère keyword length spike of 13 and/or 26 which should be there if it is a perfect example (no nulls etc) of a "rotating chiffre disk with non-alphabetic order" even after homophonic substitution.

AZdecrypt

 
Posted : March 3, 2018 1:16 am
(@f-reichmann)
Posts: 30
Eminent Member
Topic starter
 

The 340 has no vigenère keyword length spike of 13 and/or 26 which should be there if it is a perfect example (no nulls etc) of a "rotating chiffre disk with non-alphabetic order" even after homophonic substitution.

Let’s test that assumption. The result is not at all as clear as one might intuitively anticipate.

When the original z340 is encoded as

HER>pl^VPk|1LTG2dNp+B(#O%DWY.<*Kf)By:cM+UZGW()L#zHJSpp7^l8*V3pO++RK2_9M+ztjd|5FP+&4k/p8R^FlO-*dCkF>2D(#5+Kq%;2UcXGV.zL|(G2Jfj#O+_NYz+@L9d<M+b+ZR2FBcyA64K-zlUV+^J+Op7<FBy-U+R/5tE|DYBpbTMKO2<clRJ|*5T4M.+&BFz69Sy#+N|5FBc(;8RlGFN^f524b.cV4t++yBX1*:49CE>VUZ5-+|c.3zBK(Op^.fMqG2RcT+L16C<+FlWB|)L++)WCzWcPOSHT/()p|FkdW<7tB_YOB*-Cc>MDHNpkSzZO8A|K;+

then its Kasiski test is

Kasiski test for 2-grams : ++:[2, 113][173]; ():[7, 37]; Bc:[3, 23]; Op:[101]; Np:[2, 5, 31]; )L:[2, 11, 11]; UZ:[2, 3, 5, 7]; Fl:[193]; p7:[2, 5, 11]; +R:[107]; G2:[2, 2, 2, 2, 2, 2, 2, 2][2, 53]; #O:[103]; 5F:[2, 2, 2, 17]; By:[7, 19]; |5:[2, 2, 2, 17]; 8R:[7, 19]; (#:[2, 2, 2, 2, 5]; +&:[2, 2, 2, 3, 5]; O+:[2, 2, 2, 2, 2, 2]; FB:[3, 7][3, 23]; M+:[2, 2, 2, 2, 2][2, 2, 5, 5]
Kasiski test for 3-grams : FBc:[3, 23]; |5F:[2, 2, 2, 17]
Kasiski test for 4-grams : No repeats found.
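A Kasiski listing of this shape can be produced with the Python sketch below. It assumes that each bracket holds the prime factors of the distance between two consecutive occurrences of the same n-gram; the tool that produced the output above may compute it differently, and the function names are made up for the illustration.

```python
from collections import defaultdict

def prime_factors(n):
    """Prime factors of n in ascending order, with repetition."""
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

def kasiski(text, n):
    """For every repeated n-gram, factorise the gaps between consecutive hits."""
    positions = defaultdict(list)
    for i in range(len(text) - n + 1):
        positions[text[i:i + n]].append(i)
    return {
        gram: [prime_factors(b - a) for a, b in zip(where, where[1:])]
        for gram, where in positions.items()
        if len(where) > 1
    }

# Toy text: "ab" occurs at positions 0, 6 and 13, so the gaps are 6 and 7.
print(kasiski("ab1234ab56789ab", 2))
```

A key length K should show up as a factor shared by many of the listed distances.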

For a heuristic test with the rotating chiffre disk at 63 symbols and a length of 340 characters, I again stick to the first 340 letters of z408, which are

ilikekillingpeoplebecauseitissomuchfunitismorefunthankillingwildgameintheforestbecausemanisthemostdangerousanimalofalltokillsomethinggivesmethemostthrillingexperienceitisevenbetterthangettingyourrocksoffwithagirlthebestpartofitisthatwhenidieiwillbereborninparadiceandalltheihavekilledwillbecomemyslavesiwillnotgiveyoumynamebecauseyouwilltry

Encrypting these with the 63-symbol rotating chiffre disk

{[a:Pp+][b:0Qq][c:1Rr][d:2Ss][e:3Tt][f:4Uu][g:5Vv][h:6Ww][i:7Xx][j:8Yy][k:9Zz][l:Aa][m:Bb][n:Cc][o:Dd][p:Ee][q:Ff][r:Gg][s:Hh][t:Ii][u:Jj][v:Kk][w:Ll][x:Mm][y:Nn][z:Oo]}

gave the following ciphertext when I ran it today, with the computer choosing at random, without bias, among the symbols of each set

Xzvwp4r3SOskSgP+Kc8aXunK5Yi6FEzWDkoLzRlvjSlMOaaogLYQcy5XW26odoqhjcNehL0dzZhjViI+ROleBMTg2mv5iElmPPyugy5HDIfmy2vI2UKeONuoJGIHniFWK77BtStFnaTK8LhopS2RENdfEaeWTLCQCSnVJKn7l4FvDlyAOnxJk7pb30edrvNetYuTpcj0LBAqBlyqvWe7eRNJlY8Td3UOEGQENna2kM6SAUoSnQdO0PegsEamoJdhiSiQSwpqLXmisr8lHKiA4cheGfxvNyaZoqn85mTex+d7f2hUFhghhlX8ksLafwH5HsJFHebURcVKPqBDcJGm

and has only 61 symbols instead of 63 by chance, which makes the hiding of factors weaker instead of stronger. The Kasiski test result is

Kasiski test for 2-grams : oq:[227]; hj:[11]; EN:[79]; ly:[2, 2, 2, 2, 2]; qB:[2, 5, 13]; 5H:[7, 31]; na:[3, 31]; Ne:[2, 2, 31]; vN:[2, 47]; wp:[2, 3, 43]; oJ:[7, 19]; Sn:[2, 3, 13]; y5:[2, 2, 2, 2, 3]; Ea:[97]; JG:[7, 31]
Kasiski test for 3-grams : No repeats found.
Kasiski test for 4-grams : No repeats found.

The factor 13 appears twice, and is hidden behind 31.
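The 63-symbol disk used for this test can be sketched in Python as follows. It is an illustration, not the jDecryptor code: the function names are assumptions, and the rotation direction is inferred from the sample ciphertext, whose first ten symbols "Xzvwp4r3SO" decode back to "ilikekilli".

```python
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz"
# Outer fields of the disk listed above, for the start position a..z.
FIELDS = ("Pp+ 0Qq 1Rr 2Ss 3Tt 4Uu 5Vv 6Ww 7Xx 8Yy 9Zz "
          "Aa Bb Cc Dd Ee Ff Gg Hh Ii Jj Kk Ll Mm Nn Oo").split()

def disk_encrypt(plaintext, fields=FIELDS, rng=None):
    """Turn the disk one step per letter; pick any symbol of the facing field."""
    rng = rng or random.Random()
    size = len(fields)
    return "".join(
        rng.choice(fields[(ALPHABET.index(letter) - step) % size])
        for step, letter in enumerate(plaintext)
    )

def disk_decrypt(ciphertext, fields=FIELDS):
    """Reverse the process: find the symbol's field, then undo the rotation."""
    field_of = {symbol: k for k, group in enumerate(fields) for symbol in group}
    size = len(fields)
    return "".join(
        ALPHABET[(field_of[symbol] + step) % size]
        for step, symbol in enumerate(ciphertext)
    )

print(disk_decrypt("Xzvwp4r3SO"))
print(disk_encrypt("ilikekilli", rng=random.Random(1)))
```

Because the symbol is picked at random within a field, every run gives a different ciphertext, which is why the visibility of the factor 13 varies from run to run.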

Running this example several times gives sometimes more, sometimes less visibility of the factor 13. Zodiac cannot have had access to modern computer technology that allows verifying these signatures easily, and hence cannot have optimised by trial and error, replacing symbols until the 13 is completely hidden, as would now be possible. On top of that, it seems doubtful to me that he was even aware of the methodology. That clearly puts a question mark behind the idea, despite the ease with which it could be carried out with pen and paper.

Mathematically, the number of repeated n-grams to expect (let’s call that number N) is obtained from all n-grams that have a likelihood of 2/cipher length (which is 2/340 for z340) or greater in standard text, by multiplying each likelihood with the cipher length and summing up. I am tempted to say you can neglect the deviations here, because they will balance out with the tail that you cut off.
If the Vigenère key length is K, then in only 1/K of the N repeats will the n-gram fall on the matching key position. If each position is represented by R symbols, the number is again reduced by a factor of 1/R. That makes the number of n-grams that come from true repeats N/(R*K). The next question is how big the noise would be.

I’ll compute that number on occasion, to document with real numbers for the English language the limits within which Kasiski is useful in the generic case. For our purpose here, the example cipher is enough to show that hiding the factor 13 by chance is absolutely within statistical reach, though there is a risk of it not being hidden. In my example, it is only second to 31 as a noise factor, and does not look much different from the 17 in z340.

I consider it a risk, but not necessarily an immediately convincing show-stopper as clear as might have been anticipated. There is no such thing as a spike at 13, but at least it does appear in the test cipher.

As written at the beginning, I do not take the 26 chiffre disk size for granted. Easier to construct would be a size of 24: you can simply fold in half 3 times to divide the full circle by 8, and then divide each part by 3 by eye, to make 24. That is a lot easier than dividing the 360 degree circle by 13.

Addendum
I did about a handful of more test runs. Here is an example of a random generated cipher like the above, that has the factor 13 not appearing in Kasiski, and has 63 symbols

X9vWp41t2OsksG++Kcyax4NK5YIWFeZwdKOLzRl5j2lMoaaogLyQcyVx6SwoDoQhJcneHlQd9zhJVIiprolebMtg2mvVIElm+P8ugy5hDIfmYsVi24KEonuojgihNifWKx7b32TfNatKYLho+ss1EndFEaE63LcqC2nvJKN7lUf5DLYaONXJkX+Btqed1VNe3yuTpCjQlbaQBly056exe1nJL883dTUoeG0ENNASKmw2AuOsN0dO0PegSeAmOjdHiSIQS6+QLxmiS18lhkiA4cHeGFXvnyazo0nyvm3e7+DxFSH4fHgHHL7YkSlAFwhVHsJFHeB4rCvK+qbDCJGm

so that it is valid evidence of a significant possibility that 26 is not represented when using a 26 sized chiffre disk at 340 text length. The Kasiski result, with a count of the prime factors found, is

Kasiski test for 2-grams : hJ:[11]; eG:[5, 11]; VI:[3, 5]; Os:[229]; cy:[5, 7]; ya:[3, 89]; iS:[11]; ny:[2, 3]; He:[2, 23]
{2=2, 3=3, 5=3, 7=1, 11=3, 23=1, 89=1, 229=1}
Kasiski test for 3-grams : No repeats found.
Kasiski test for 4-grams : No repeats found.

As an observation, I tend to get only few 3-grams, while the original z340 has two.

I also tried 24 clear text letters, to test my idea of smaller chiffre disks against the Kasiski visibility of the factors 2 and 3, dropping z and q, which do not appear in the first 340 letters of the z408 clear text. The factors 2 and 3 show up clearly in the Kasiski test; for example, the disk

{[a:+lN][b:0mO][c:P1n][d:Q2o][e:pR3][f:qS4][g:rT5][h:sU6][i:tV7][j:uW8][k:vX9][l:AwY][m:BxZ][n:aCy][o:bDz][p:cE][r:dF][s:eG][t:fH][u:gI][v:hJ][w:iK][x:jL][y:kM]}

gives enciphered

t9TUlq1RQMQIoE+NIwUurQif36dSaB7SwGifsNH2Flfgh66JaG4KuSnRok2g8IiBDUHWZdhV33xy+xBHJeb74cjuiEkMAUbcGGmiuOL8T9UymIL8gKwScDJE96WVCuTiXjLnfGHqBlf8jvTyzddc2A1R2M2gF8OCNEXH5UvgtdPe+7IKXWHSrfXjDwlMBdtNDHDa9iRXrJg9hSGWDcLFjXT1pddZJYxUKktiqSGWn3C8GYSW3UgqU5gi7gE2p+fjM6jq6A3S+ZmKU5Z+hMiytEhGgHv7mYCZm2NA7M5duPD8dRfrdHeHHiVWhQKYEserdofDFyXpOASIklXZYezh

with only 62 symbols, and its Kasiski test is

Kasiski test for 2-grams : dd:[2, 2, 2, 3, 3]; GW:[2, 2, 2, 3]; U5:[2, 2, 2, 3]; 1R:[2, 2, 2, 2, 3, 3]; Xj:[2, 3, 3, 3]; rd:[2, 2, 2, 2]; Zm:[2, 11]; SG:[2, 2, 2, 3]; 2g:[2, 2, 2, 2, 2, 3]; lf:[2, 2, 2, 2, 2, 3]; L8:[2, 2, 2]; if:[2, 2, 3]
{2=37, 3=13, 11=1}
Kasiski test for 3-grams : SGW:[2, 2, 2, 3]
{2=3, 3=1}
Kasiski test for 4-grams : No repeats found.

And a 23 sized chiffre disk, to have only one large prime factor, obtained by dropping j as well,

{[a:j+M][b:0kN][c:1lO][d:P2m][e:Q3n][f:R4o][g:pS5][h:qT6][i:rU7][k:sV8][l:tW9][m:AuX][n:BvY][o:CwZ][p:aDx][r:bEy][s:cFz][t:dG][u:eH][v:fI][w:gJ][x:hK][y:iL]}

results in

78ppMQl2lLOeOaLid9pU41fc25aQXWSmVyIc4Kz0xKEzd4RGtDmH5l+kjJ+cSzGsXQa5VBz4Nk7VgU7azwtn0udRcXee6PV9BvgFQgd3k4O7IDFnZES+s7Dsnlml6m+bnzzIBXYi5dXmyQ+STVsrJnIJfzIvVOdSFrmtLj2XkUesFjuwOktJK91Zpmbxn6jyosonNvHNgAt0Af7joSBSYNgcepS1WNlJuwhuHeTLyG2L5keidgrHfHr9grnyzv69AfWGIjzdBMw8eGMYUusNfPTopRKfviMLvaXhe9FPIY0f2aRF2Qm3nTHI5ZqgkF2y1BQjOifw9JxQU8IgIOL4

which has only 62 symbols, and with Kasiski

Kasiski test for 2-grams : uw:[2, 5, 5]; Af:[2, 2, 13]; Ng:[2, 7]; zd:[3, 73]; id:[2, 2, 2, 2, 2, 7]; gr:[7]; F2:[2, 7]; zI:[23]
{2=10, 3=1, 5=2, 7=4, 13=1, 23=1, 73=1}
Kasiski test for 3-grams : No repeats found.
Kasiski test for 4-grams : No repeats found.

as an example. I never got an example without 23 showing up, most times more often than in the example above, and I observed that if there is a 3-gram or 4-gram at all, it always contained the correct prime factors. It surprises me that I got a result without the factor 13 for size 26, but no result without 23 for size 23.

 
Posted : March 3, 2018 3:44 pm
Jarlve
(@jarlve)
Posts: 2547
Famed Member
 

As written at the beginning, I do not take the 26 chiffre disk size for granted.

I understand and just try to add information that may help with the problem.

and has only 61 symbols instead of 63 by chance, which makes the hiding of factors weaker instead of stronger.

Your example ciphers have an ioc of about 0.0162 and the 340 has an ioc of 0.0193. A higher ioc makes it harder to hide information and so your ciphers are actually hiding the factors more thoroughly and not by a trivial degree. Your first cipher randomized 10000 times gives 14.59 average bigrams (as a way to measure repetitiveness in the text) and the 340 gives 19.68 bigrams. It could then be reasoned that the 340 is roughly 35% more repetitive than your cipher despite it having 2 more symbols.
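Both measures can be sketched in Python as below. This is one reading of the description, not AZdecrypt's code: it assumes that "bigrams" means the number of bigram repeats (each bigram's count minus one, summed) and that "randomized" means shuffling the symbol order before counting. The function names are made up for the illustration.

```python
import random
from collections import Counter

def ioc(text):
    """Index of coincidence over single symbols."""
    n = len(text)
    if n < 2:
        return 0.0
    return sum(c * (c - 1) for c in Counter(text).values()) / (n * (n - 1))

def repeated_bigrams(text):
    """Number of bigram repeats: every occurrence after the first counts once."""
    counts = Counter(text[i:i + 2] for i in range(len(text) - 1))
    return sum(c - 1 for c in counts.values() if c > 1)

def average_shuffled_repeats(text, trials=10_000, seed=0):
    """Average bigram repeats over random shuffles of the same symbols."""
    rng = random.Random(seed)
    symbols = list(text)
    total = 0
    for _ in range(trials):
        rng.shuffle(symbols)
        total += repeated_bigrams("".join(symbols))
    return total / trials

toy = "abab" * 10
print(round(ioc(toy), 4), repeated_bigrams(toy), average_shuffled_repeats(toy, trials=100))
```

Shuffling removes any structure from the plaintext, so the average depends only on the symbol counts, which is what makes it usable as a baseline for repetitiveness.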

As an observation, I tend to get only few 3-grams, while the original z340 has two.

Match the ioc to that of the 340 and the amount of 3-grams in your ciphers will increase.

I did about a handful of more test runs. Here is an example of a random generated cipher like the above, that has the factor 13 not appearing in Kasiski, and has 63 symbols

It is good to have some alternatives.

Here is one test of my own showing spikes at 13, 26 and 52 for that cipher.

AZdecrypt keyword length stats for: freichmann2.txt
--------------------------------------------------------
Length 2: 2751
Length 3: 2988
Length 4: 3136
Length 5: 2500
Length 6: 2585
Length 7: 2707
Length 8: 2662
Length 9: 2651
Length 10: 1985
Length 11: 2659
Length 12: 2431
Length 13: 3259 <---
Length 14: 2025
Length 15: 2104
Length 16: 1861
Length 17: 1989
Length 18: 1508
Length 19: 1763
Length 20: 1558
Length 21: 1687
Length 22: 1808
Length 23: 1549
Length 24: 1440
Length 25: 1215
Length 26: 2192
Length 27: 1586
Length 28: 1027
Length 29: 1304
Length 30: 1002
Length 31: 1072
Length 32: 776
Length 33: 1349
Length 34: 1188
Length 35: 1108
Length 36: 836
Length 37: 1432
Length 38: 982
Length 39: 1245
Length 40: 779
Length 41: 875
Length 42: 862
Length 43: 1038
Length 44: 1099
Length 45: 640
Length 46: 1079
Length 47: 954
Length 48: 842
Length 49: 786
Length 50: 470
Length 51: 1001
Length 52: 1384
Length 53: 1178
Length 54: 665
Length 55: 581
Length 56: 641
Length 57: 418
Length 58: 692
Length 59: 699
Length 60: 759
Length 61: 812
Length 62: 485
Length 63: 808
Length 64: 382
Length 65: 976
Length 66: 662
Length 67: 647
Length 68: 546

AZdecrypt

 
Posted : March 3, 2018 11:18 pm
smokie treats
(@smokie-treats)
Posts: 1626
Noble Member
 

f.reichmann:

I am glad that you are interested in the Zodiac 340. It is a very difficult problem. Take a look at what has been observed and found out so far. Here is a short list of what people so far have been confounded over:

1. Homophonic cycling and / or patterns in homophonic symbol selection
2. The period 19 and 39 bigram repeats, reading the message left right top bottom
3. The period 15 and 29 bigram repeats, reading the message right left top bottom
4. The pivots, which are on period 39 reading left right top bottom, and period 29 reading the message right left top bottom
5. The period 78 unigram repeats, and the fact that some of the period 26 and period 39 bigram repeats actually occur on a period of 78, all multiples of 13
6. The regional bias of certain symbols, where some symbols appear exclusively in the top and bottom few rows, but not in the middle rows
7. The + symbol, and how it relates to 1, 2, 3, and maybe 5 above
EDIT:
8. What a possible homophonic substitution key would look like, the symbol count distribution caused by the key, and how they relate to 1, 2, 3, 6 and 7 above.

It may be that not all of the above need to be reconciled with each other. Some of these very interesting patterns may not have been caused by the cipher. But finding a model, a cipher, that would cause at least a few of the above in a way that reconciles them with each other is what I personally think is important. There are also other interesting observations. Check them out!

 
Posted : March 4, 2018 3:56 am
glurk
(@glurk)
Posts: 756
Prominent Member
 

Despite the doubts of Smokie and Jarlve as per the 340 having Vigenere elements, I think it is important to recognize the work being done by f.reichmann in re-programming a solver in very clean, clear Java code using pure statistics.

We don’t want to lose sight of the forest for the trees. His program looks very solid, very well written. And it is open-source, unlike Jarlve’s work.

I really appreciate it being open source. THAT is how progress is going to be made!

People who release their programs advance the work. People who hide their work die in obscurity.

-glurk

——————————–
I don’t believe in monsters.

 
Posted : March 4, 2018 4:05 pm