Once again I looked at the encyclopedia of observations ( http://zodiackillerciphers.com/wiki/ind … servations) and found a "TODO" that I wanted to finish.
TODO: how often does a random shuffle show a period that has repeated quadgrams/5grams?
Here is the result. I hope it is correct, because this is the first time I have calculated a standard deviation. My calculations for 2- and 3-grams, for comparison, seem to be OK.
Number of shuffles: 5000000
4-Grams:
--------
4960430 contained no repeated 4-grams
38599 contained 1 repeated 4-gram
943 contained 2 repeated 4-grams
27 contained 3 repeated 4-grams
1 contained 4 repeated 4-grams
No cipher contained 5 or more repeated 4-grams.

Average: 0.008114 repeated 4-grams per cipher
Variance: 0.00839484872095609
Standard Deviation: 0.091623407058219

1 repeated 4-gram per cipher = 10.83 Sigma away from mean
2 repeated 4-grams per cipher = 21.74 Sigma away from mean
3 repeated 4-grams per cipher = 32.65 Sigma away from mean
4 repeated 4-grams per cipher = 43.57 Sigma away from mean

5-Grams:
--------
4999163 contained no repeated 5-grams
816 contained 1 repeated 5-gram
21 contained 2 repeated 5-grams
No cipher contained 3 or more repeated 5-grams

Average: 0.0001716 repeated 5-grams per cipher
Variance: 0.000179941147797584
Standard Deviation: 0.0134142143936044

1 repeated 5-gram per cipher = 74.54 Sigma away from mean
2 repeated 5-grams per cipher = 149.08 Sigma away from mean
3 repeated 5-grams per cipher = 223.64 Sigma away from mean
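For reference, the summary statistics can be recomputed directly from the 4-gram histogram above. This is a minimal Python sketch, not the program that produced the figures; the helper name `histogram_stats` and its `skip_zero_bin` switch are only illustrative. Summing the squared deviations over every bin, including the bin for zero repeats, gives a sample variance of about 0.00846, while leaving the zero bin out reproduces the printed 0.00839. The difference is small and barely moves the sigma values.

```python
import math

def histogram_stats(hist, skip_zero_bin=False):
    """Return (mean, sample variance, standard deviation) for a histogram
    mapping 'repeats per cipher' -> 'number of ciphers'."""
    n = sum(hist.values())
    mean = sum(k * c for k, c in hist.items()) / n
    # Sum of squared deviations, optionally leaving out the bin for 0 repeats.
    ss = sum(c * (k - mean) ** 2 for k, c in hist.items()
             if not (skip_zero_bin and k == 0))
    variance = ss / (n - 1)
    return mean, variance, math.sqrt(variance)

# 4-gram counts copied from the results above (5,000,000 shuffles in total).
four_grams = {0: 4960430, 1: 38599, 2: 943, 3: 27, 4: 1}

mean, var_all, sd_all = histogram_stats(four_grams)
_, var_skip, _ = histogram_stats(four_grams, skip_zero_bin=True)

print("mean:", mean)
print("variance, all bins:", var_all)
print("variance, zero bin skipped:", var_skip)
print("sigma of 1 repeat:", (1 - mean) / sd_all)
```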
While doing this, I noticed one point that I haven’t paid much attention to so far.
Repeated quadgrams appear only at periods 101 (illustration) and 116 (illustration). They do not appear when considering the mirrored ciphertext.
With P101, however, you get not only a 4-gram, but even a 5-gram. I found some interesting threads in the forum, but they only deal with this topic roughly. For example this one here:
viewtopic.php?f=81&t=2617&p=43877&hilit=period+101#p43877
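To make the period terminology concrete, here is a small Python sketch of one way to read a sequence at period n and count repeated n-grams. Both function names are hypothetical. The reading direction (taking every n-th symbol, then starting again one position later) is only one possible convention and may be the inverse of what is called "untranspose" in this thread. Counting each occurrence beyond the first as one repeat is also an assumption about how repeats are tallied.

```python
from collections import Counter

def read_at_period(seq, period):
    """Read every `period`-th symbol, starting new passes at offsets 0, 1, 2, ..."""
    return [seq[i]
            for start in range(period)
            for i in range(start, len(seq), period)]

def count_repeated_ngrams(seq, n):
    """Count n-gram occurrences beyond the first (an n-gram seen 3 times adds 2)."""
    grams = Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))
    return sum(c - 1 for c in grams.values())

# Toy data only; the real test would use the 340 symbols of the cipher.
toy = list("ABCDEFGHIJ")
print("".join(read_at_period(toy, 3)))
print(count_repeated_ngrams(list("ABCDXABCD"), 4))
```

With the real cipher, one would loop `period` over the range of interest and call `count_repeated_ngrams` on each reordered sequence for n = 4 and n = 5.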
What is interesting about this observation is the fact that these 5grams consist mostly of symbols that do not occur too often in the cipher (4-5 times on average, except for the large O, which occurs 10 times). In my shuffle tests, most of the 5grams found contained a + symbol or a B. Maybe I’m too hasty here, but doesn’t that make the find even more unusual?
In general, I also wonder how my shuffle results shown above fit the P101 find. If a single repeated 5gram is already 74.54 sigma away from the mean, the 5gram found is more than amazing, isn’t it? But I’m not really sure if it’s that easy to relate the two. However, I find it remarkable that a simple operation like P101 can immediately produce a 5gram.
I can’t get any further at this point, because my math knowledge is not sufficient here. I only have one idea in mind that is not quite concrete: Could it be that P101 has a kind of "correlation" of ngrams of another period?  Let’s say there are some 2- or 3-grams in the correct solution of z340 that have the same distance. With P101, they stand directly next to each other and form the 5grams. Is this conceivable? Sorry, I can’t formulate it any other way.
To get back to more easily recognizable things: If you mark the P1 pivots and perform untranspose 101, then the 5grams are conspicuously in line with parts of the pivots. In addition, the left pivot is evenly spaced:
If you shift z340 by two columns to the right and perform a p96 untransposition, you also get two 5-grams. These are close to the P101 5-grams and share the reversed F. Actually, this could be dismissed as a nice coincidence. But the sigma values and the probability of 5-grams appearing won’t leave my mind.
Sorry to throw this into the room so half-baked. I have just run out of ideas here, but I didn’t want to leave it unmentioned.
And immediately the TODO has opened again. If I had read correctly, I would have seen that it is about the number of ngrams at different periods. I think I’ll take a longer break.
Really interesting stuff Largo. I never noticed the evenly spaced "period 3" pivot at period 101! The 340 is just funny. It is truly unbelievable how this cipher works. Could you also mark the "O" symbol please?
I ran my own test that just returns the longest n-gram repeat found. About 1 in 5000 randomizations have a 5-gram or better, closely matching your own results. Though your sigmas are wrong somehow.
Combinations processed: 1000000/1000000
Measurements:
- Mean: 2.341767
- Variance: 0.2416183177138649
- Standard deviation: 0.4915468621747726
- Count equal/over 5: 185
- Sigma of 5: 5.407893335416814
- Sigma of 2: -0.6952887431483237
- Sigma of 6: 7.442287361605193
- Lowest: 2 (Randomize(1))
- Highest: 6 (Randomize(264843))
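A test of this kind (longest repeated n-gram per shuffle) could be sketched in Python roughly as follows. This is not the tool that produced the measurements above; `longest_repeat` and `shuffle_test` are hypothetical names, and the `placeholder` sequence is a made-up stand-in with 63 symbols and 340 positions, not the real symbol counts of the 340.

```python
import random

def longest_repeat(seq):
    """Length of the longest n-gram that occurs at least twice in seq (0 if none)."""
    n = 0
    while True:
        seen = set()
        found = False
        # Look for any repeated gram of length n + 1.
        for i in range(len(seq) - n):
            gram = tuple(seq[i:i + n + 1])
            if gram in seen:
                found = True
                break
            seen.add(gram)
        if not found:
            return n
        n += 1

def shuffle_test(symbols, trials, threshold, seed=0):
    """Count shuffles whose longest repeated n-gram is at least `threshold` long."""
    rng = random.Random(seed)
    work = list(symbols)
    hits = 0
    for _ in range(trials):
        rng.shuffle(work)
        if longest_repeat(work) >= threshold:
            hits += 1
    return hits

# Placeholder data: an assumption for illustration, not the real cipher.
placeholder = [i % 63 for i in range(340)]
hits = shuffle_test(placeholder, trials=200, threshold=5)
print(hits, "of 200 shuffles had a repeat of length 5 or more")
```

Because it stops at the first length with no repeat, the function relies on the fact that a repeated n-gram always contains a repeated (n-1)-gram.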
Could it be that P101 has a kind of "correlation" of ngrams of another period?
Probably period 39 since we have at least one period 3 pivot. Back then I suspected that cycles would make such repeats more likely. I am running some tests now to see if that holds up.
@doranchak, have you ever scanned the 340 for such "period n" pivots?
Back then I suspected that cycles would make such repeats more likely. I am running some tests now to see if that holds up.
I ran a test with randomized plaintexts, target ioc 2236 and 0, 25, 50 and 75% cycle randomization for 5-gram period 1 repeats.
Results that had a 5-gram or better:
0%: 826 out of 1,000,000
25%: 456 out of 1,000,000
50%: 252 out of 1,000,000
75%: 194 out of 1,000,000
So at least there is a significant correlation between cycles and longer n-gram repeats at period 1. And I now need to follow up with a test for periods 2 to 170.
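For readers unfamiliar with the term, cycle randomization in a homophonic encoder could look roughly like the Python sketch below. It is an illustration under stated assumptions, not the generator used for the test above: the function name, the shape of the `homophones` table, and the choice to leave the cycle position unchanged after a random pick are all hypothetical.

```python
import random

def encode_with_cycles(plaintext, homophones, randomization, rng):
    """Homophonic substitution: with probability `randomization` pick a random
    homophone, otherwise take the next one in that letter's cycle."""
    position = {letter: 0 for letter in homophones}
    out = []
    for letter in plaintext:
        options = homophones[letter]
        if rng.random() < randomization:
            # Random pick; the cycle position is left untouched (an assumption).
            out.append(rng.choice(options))
        else:
            out.append(options[position[letter] % len(options)])
            position[letter] += 1
    return out

rng = random.Random(42)
table = {"E": [1, 2, 3], "T": [4, 5]}  # made-up homophone table
print(encode_with_cycles("ETETEE", table, 0.0, rng))  # perfect cycles
print(encode_with_cycles("ETETEE", table, 0.5, rng))  # 50% randomized
```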
0%: 396 out of 10,000 have a 5-gram or better repeat within periods 2 to 170
50%: 287 out of 10,000 have a 5-gram or better repeat within periods 2 to 170
Yep. Cycles increase the odds of having longer n-grams occur through period 1 to 170. Though the phenomenon is still unlikely.
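One way to check whether the gap between 396 and 287 out of 10,000 is larger than sampling noise is a pooled two-proportion z-test. The Python sketch below assumes the two batches are independent random samples; the function name is hypothetical and the test itself is a standard technique swapped in here, not something taken from the runs above. For these counts the statistic comes out a little above 4.

```python
import math

def two_proportion_z(hits_a, n_a, hits_b, n_b):
    """z statistic for the difference between two proportions (pooled variance)."""
    p_a = hits_a / n_a
    p_b = hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# 0% versus 50% cycle randomization, periods 2 to 170, counts from above.
z = two_proportion_z(396, 10_000, 287, 10_000)
print("z =", z)
```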
I ran my own test that just returns the longest n-gram repeat found. About 1 in 5000 randomizations have a 5-gram or better, closely matching your own results. Though your sigmas are wrong somehow.
When I calculate the sigmas for 2grams and 3grams, my values seem to be correct. Here is an example for 2grams at 5000000 iterations (on the left the number of repeated bigrams, on the right the number of ciphers containing this number). Below are the sigmas for some selected bigram repeats. I’ll do the same with the 5grams.
2 : 0
3 : 3
4 : 8
5 : 49
6 : 195
7 : 651
8 : 2169
9 : 5875
10 : 14530
11 : 31507
12 : 61147
13 : 106910
14 : 171547
15 : 250306
16 : 337300
17 : 420168
18 : 483462
19 : 517156
20 : 518222
21 : 484005
22 : 423397
23 : 349131
24 : 270680
25 : 198374
26 : 136787
27 : 89874
28 : 56222
29 : 32830
30 : 18444
31 : 9616
32 : 5002
33 : 2461
34 : 1170
35 : 501
36 : 179
37 : 79
38 : 28
39 : 10
40 : 3
41 : 2
42 : 0

Average: 19.7830468
Variance: 14.5869958264089
Standard Deviation: 3.81929258193306

10: -2.56
15: -1.25
20: 0.057
25: 1.37
30: 2.68
37: 4.51
41: 5.56
48: 7.39
This is my source code:
int numNGrams = 0;
SortedDictionary<int, int> nGramMap = new SortedDictionary<int, int>();
int numTests = 5000000;
double average = 0.0;
double variance = 0.0;
double standardDeviation = 0.0;
// Create a histogram (key: number of repeated n-grams in a cipher, value: number of ciphers with that count)
for (int i=0; i<100; i++)
{
	nGramMap[i] = 0;
}
for (int i=0; i<numTests; i++)
{
	cipher340.Shuffle();
	cipher340.GetRepeatedNgrams(2, ref numNGrams);
	average += numNGrams;
	
	nGramMap[numNGrams]++;
}
average /= numTests;
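foreach (var entry in nGramMap)
{
	// Every bin must contribute to the variance, including the bin for 0 repeats.
	// That bin is empty for 2-grams, but it holds almost all ciphers for 4- and 5-grams.
	variance += Math.Pow(entry.Key - average, 2.0) * entry.Value;
	Console.WriteLine(entry.Key + "\t: " + entry.Value);
}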
variance /= (numTests - 1);
standardDeviation = Math.Sqrt(variance);
// Print some sigmas
Console.WriteLine("25: " + ((25 - average) / standardDeviation));
Console.WriteLine("37: " + ((37 - average) / standardDeviation));
Do you have any idea what might be wrong? In my opinion the calculation is correct. But as I said, I can’t rule out a mistake.
Could you also mark the "O" symbol please?
Sure! Here you are:
Yep. Cycles increase the odds of having longer n-grams occur through period 1 to 170. Though the phenomenon is still unlikely.
That’s really interesting! If I remember correctly, I did a test to see how the occurrence of pivots behaves with cycle randomization. I noticed then that pivots occur more often when the cycles are more random. With 100% cycles there were fewer pivots. In this sense, the simultaneous occurrence of pivots at P1 and 5grams at P1-170 is even more astonishing.
The data are not normally distributed. Your mean is 0.0001716 while the minimum is 0. That creates an extremely right-skewed distribution curve, with almost everything piled up at 0 and a long tail to the right.
To be honest I think most of our tests have this problem and it would probably be better to calculate the SD from a table using the odds we have:

So your results versus randomizations lie between 4.5 and 4.89 sigma.
So at least there is a significant correlation between cycles and longer n-gram repeats at period 1. And I now need to follow up with a test for periods 2 to 170.
0%: 396 out of 10,000 have a 5-gram or better repeat within periods 2 to 170
50%: 287 out of 10,000 have a 5-gram or better repeat within periods 2 to 170
Yep. Cycles increase the odds of having longer n-grams occur through period 1 to 170. Though the phenomenon is still unlikely.
And this would fall between 2 and 2.57 sigma.
Did you control the ioc in your pivot versus cycles test?
I found a semi-interesting pattern last night before dinner: P19 and P101 bigrams that share the same symbols and at least one position. Do you see the patterns? I showed symbol count on the right. Some of the patterns are caused by high count symbols, but some of them are caused by low count symbols.
It is weird stuff like this that makes me think more and more about a hoax, but one made in some mechanical or mathematical way that intentionally creates a lot of interesting patterns. Again, I woke up in the middle of the night thinking about how this could be done.
The data are not normally distributed. Your mean is 0.0001716 while the minimum is 0. That creates an extremely right-skewed distribution curve, with almost everything piled up at 0 and a long tail to the right.
To be honest I think most of our tests have this problem and it would probably be better to calculate the SD from a table using the odds we have:
Thank you for the table and the advice! I will take a closer look, but it will take a few days.
Did you control the ioc in your pivot versus cycles test?
I just ran the test again. Criteria were "63 symbols", "Raw IOC >= 2000", ">= 1 pivot". Here are the results:
100% cyclic: 26 Ciphers with >= 1 pivot
25% random: 63 Ciphers with >= 1 pivot
50% random: 215 Ciphers with >= 1 pivot
75% random: 378 Ciphers with >= 1 pivot
100% random: 462 Ciphers with >= 1 pivot
The basis was 276163 ciphers generated from http://wortschatz.uni-leipzig.de/de/download (News 2005, 1M).
One can clearly see that the more randomised the cycles, the more frequent the pivots.
I found a semi-interesting pattern last night before dinner: P19 and P101 bigrams that share the same symbols and at least one position. Do you see the patterns?
I showed symbol count on the right. Some of the patterns are caused by high count symbols, but some of them are caused by low count symbols.
That looks very interesting. We would have to test how often this happens by chance. Maybe P19 and P101 are just mathematically related in a way we can’t see right now, and that explains it. As already mentioned, I often thought about a hoax, but a real plaintext (possibly with a transposition error) is more likely to me (at least I hope so).
What do you think, could a plaintext with many repetitions or some kind of crossword puzzle be the reason for such observations? If the structure of the plaintext is extremely strange, then the resulting cipher probably is too. So it doesn’t have to be a hoax. But that’s just a gut feeling.
I think that a plaintext could easily do this. Look at the highlighted symbols again: there are small sections that are in alignment with each other at P19. It looks like words.
I have been trying to imagine a hoax method, though. P19 actually is 20 symbols if you count the first one. P39 is actually 40 symbols if you count the first one. P101 is ( 20 x 5 ) +1.
I try to imagine a grid of 20 rows and 63 or more columns. Each row has all of the symbols, but in each row there is only one area that Zodiac would randomly choose symbols from. Each row has a different area, spanning an unknown number of columns, and sometimes the areas overlap each other. Then, move down from the top row to the bottom row, one row at a time, and randomly choose from the area in that row. Could that cause ABAB cycling, P19 and the pivots? Could the arrangement of the areas cause P19 instead of P20?
I am totally open minded to either message or hoax.
Like this, very roughly. The symbols are not shown, but the areas from which the symbols could be randomly selected are colored blue. I don’t know right now how big the areas really need to be or how much they have to overlap to create ABAB, P19 and pivots though, especially to make the system difficult to detect. P101 with the shared symbols and positions pattern shown above and other weird observations make me think it may be something like this.
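The grid idea could be tried out with a small simulation along the following lines. This Python sketch is only a rough illustration: the function name, the window width of 8, the random placement of the windows, and the use of integers 0 to 62 as stand-ins for the 63 symbols are all assumptions, since the real sizes and overlaps are unknown.

```python
import random

def generate_from_windows(length, windows, rng):
    """Walk down the rows again and again; in each row pick a random symbol
    from that row's window, given as (start, width) over symbol indices."""
    out = []
    for i in range(length):
        start, width = windows[i % len(windows)]
        out.append(start + rng.randrange(width))
    return out

rng = random.Random(1)
rows, alphabet = 20, 63
width = 8  # hypothetical window size
# One window per row, placed at random so that some windows overlap.
windows = [(rng.randrange(alphabet - width + 1), width) for _ in range(rows)]
cipher = generate_from_windows(340, windows, rng)
print(cipher[:40])
```

The output could then be fed into the same period, pivot and cycle tests used on the 340 to see whether any choice of windows produces comparable patterns.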
I just ran the test again. Criteria were "63 symbols", "Raw IOC >= 2000", ">= 1 pivot". Here are the results:
More cycle randomization increases the ioc, and that will affect everything. What is the average raw ioc of your test ciphers at 0, 25, 50, 75 and 100% cycle randomization?
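For anyone who wants to report these averages, here is a minimal Python sketch of the index of coincidence. It assumes that "raw ioc" means the unnormalized sum of n*(n-1) over all symbol counts; under that reading, a raw value of 2236 for 340 symbols corresponds to a normalized value of about 0.0194. The function names are hypothetical.

```python
from collections import Counter

def raw_ioc(seq):
    """Unnormalized index of coincidence: sum of n * (n - 1) over symbol counts."""
    return sum(c * (c - 1) for c in Counter(seq).values())

def normalized_ioc(seq):
    """Raw IoC divided by the number of ordered symbol pairs, N * (N - 1)."""
    n = len(seq)
    return raw_ioc(seq) / (n * (n - 1))

sample = list("AABBB")  # toy data, not a cipher
print(raw_ioc(sample), normalized_ioc(sample))
```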
My own test confirms your results but the difference is not as large:
C4 pivots >=1 per 100,000 ciphers with randomized plaintexts:
0%: 108
25%: 145
50%: 152
75%: 159
100%: 163








