Zodiac Discussion Forum


Period 101 and its 5gram

(@largo)
Posts: 454
Honorable Member
Topic starter
 

Once again I looked at the encyclopedia of observations ( http://zodiackillerciphers.com/wiki/ind … servations) and found a "TODO" that I wanted to finish.

TODO: how often does a random shuffle show a period that has repeated quadgrams/5grams?
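A shuffle test of this kind can be sketched roughly as follows. This is a minimal Python sketch, not the code used for the numbers below. It assumes that "repeated n-grams" means each n-gram seen k times contributes k - 1 repeats, and it runs on a hypothetical stand-in sequence (`stand_in`) rather than the real Z340 transcription.

```python
import random
from collections import Counter

def count_repeated_ngrams(seq, n):
    """Count n-gram repeats: an n-gram seen k times contributes k - 1 (assumed convention)."""
    counts = Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))
    return sum(k - 1 for k in counts.values() if k > 1)

def shuffle_histogram(seq, n, trials, seed=0):
    """Shuffle the sequence `trials` times; map repeat count -> number of shuffles."""
    rng = random.Random(seed)
    work = list(seq)
    hist = Counter()
    for _ in range(trials):
        rng.shuffle(work)
        hist[count_repeated_ngrams(work, n)] += 1
    return hist

# Hypothetical stand-in: 340 symbols drawn from 63 symbol ids (NOT the real Z340).
_rng = random.Random(1)
stand_in = [_rng.randrange(63) for _ in range(340)]

hist = shuffle_histogram(stand_in, 4, trials=200)
for repeats, ciphers in sorted(hist.items()):
    print(repeats, ciphers)
```

A small trial count keeps the sketch fast; a run with millions of shuffles, as below, would need a faster implementation.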

Here is the result. I hope it is correct, because this is the first time I have calculated a standard deviation. My calculations for 2- and 3-grams, done for comparison, seem to be OK.

Number of shuffles: 5000000

4-Grams:
--------

4960430 contained no repeated 4-grams
38599 contained 1 repeated 4-gram
943 contained 2 repeated 4-grams
27 contained 3 repeated 4-grams
1 contained 4 repeated 4-grams
No cipher contained 5 or more repeated 4-grams.

Average: 0.008114 repeated 4-grams per cipher
Variance: 0.00839484872095609
Standard Deviation: 0.091623407058219

1 repeated 4-gram per cipher = 10.83 Sigma away from mean
2 repeated 4-grams per cipher = 21.74 Sigma away from mean
3 repeated 4-grams per cipher = 32.65 Sigma away from mean
4 repeated 4-grams per cipher = 43.57 Sigma away from mean


5-Grams:
--------

4999163 contained no repeated 5-grams
816 contained 1 repeated 5-gram
21 contained 2 repeated 5-grams
No cipher contained 3 or more repeated 5-grams

Average: 0.0001716 repeated 5-grams per cipher
Variance: 0.000179941147797584
Standard Deviation: 0.0134142143936044

1 repeated 5-gram per cipher = 74.54 Sigma away from mean
2 repeated 5-grams per cipher = 149.08 Sigma away from mean
3 repeated 5-grams per cipher = 223.64 Sigma away from mean
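The summary statistics can be re-derived from the two tables above. The following Python sketch assumes the sample variance (division by N - 1) and sums over every bin, including the ciphers with zero repeats; the function name `histogram_stats` is made up for illustration.

```python
import math

def histogram_stats(hist):
    """Mean, sample variance and standard deviation from {repeat count: number of ciphers}."""
    n = sum(hist.values())
    mean = sum(k * c for k, c in hist.items()) / n
    var = sum(c * (k - mean) ** 2 for k, c in hist.items()) / (n - 1)
    return mean, var, math.sqrt(var)

# Counts copied from the tables above.
four_grams = {0: 4960430, 1: 38599, 2: 943, 3: 27, 4: 1}
five_grams = {0: 4999163, 1: 816, 2: 21}

for name, hist in (("4-grams", four_grams), ("5-grams", five_grams)):
    mean, var, sd = histogram_stats(hist)
    print(name, "mean:", mean, "variance:", var, "sd:", sd)
    print("  sigma distance of 1 repeat:", (1 - mean) / sd)
```

Summed over all bins, the 4-gram variance comes out at roughly 0.00846, a little above the posted 0.00839, and the sigma distances barely move (about 10.8 for one repeated 4-gram). Either way, the counts are piled up at zero, so a sigma distance here does not translate into a normal-distribution probability.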

While doing this, I noticed a point that I hadn’t paid much attention to so far.

Repeated quadgrams appear only at periods 101 (illustration) and 116 (illustration). They do not appear when considering the mirrored ciphertext.
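For readers who want to try a period scan, here is a rough Python sketch. The thread does not spell out how the tools untranspose at a given period, so the convention below is an assumption: read every p-th symbol, and on running off the end restart at the next offset. The names `untranspose` and `count_repeated_ngrams` are made up for this example, and the input is a hypothetical stand-in, not the real Z340.

```python
import random
from collections import Counter

def untranspose(seq, period):
    """Read every `period`-th symbol, restarting at offsets 1, 2, ... (assumed convention)."""
    return [seq[i] for start in range(period) for i in range(start, len(seq), period)]

def count_repeated_ngrams(seq, n):
    """An n-gram seen k times contributes k - 1 repeats (assumed convention)."""
    counts = Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))
    return sum(k - 1 for k in counts.values() if k > 1)

# Hypothetical stand-in cipher: 340 symbols from 63 symbol ids.
rng = random.Random(1)
stand_in = [rng.randrange(63) for _ in range(340)]

# Scan periods 1..170 and report those that show a repeated 4-gram.
for period in range(1, 171):
    repeats = count_repeated_ngrams(untranspose(stand_in, period), 4)
    if repeats:
        print("period", period, "repeated 4-grams:", repeats)
```

With the real transcription in place of `stand_in`, the same loop would list the periods that show repeated quadgrams under this particular convention; a different untransposition rule may give different periods.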

With P101, however, you get not only a 4-gram but even a 5-gram. I found some interesting threads in the forum, but they only touch on this topic. For example this one here:

viewtopic.php?f=81&t=2617&p=43877&hilit=period+101#p43877

What is interesting about this observation is the fact that these 5grams consist mostly of symbols that do not occur too often in the cipher (4-5 times on average, except for the large O, which occurs 10 times). In my shuffle tests, most of the 5grams found contained a + symbol or a B. Maybe I’m too hasty here, but doesn’t that make the find even more unusual?

In general, I also wonder how my shuffle results shown above fit the P101 find. If the expected value for a 5gram is 74.54 Sigma away from mean, the 5gram found is more than amazing, isn’t it? But I’m not really sure if it’s that easy to relate. However, I find it remarkable that a simple operation like P101 can immediately produce a 5gram.
I can’t get any further at this point, because my math knowledge is not sufficient here. I only have one idea in mind that is not quite concrete: Could it be that P101 has a kind of "correlation" of ngrams of another period? Let’s say there are some 2- or 3-grams in the correct solution of z340 that have the same distance. With P101, they stand directly next to each other and form the 5grams. Is this conceivable? Sorry, I can’t formulate it any other way.

To get back to more easily recognizable things: If you mark the P1 pivots and perform untranspose 101, then the 5grams are conspicuously in line with parts of the pivots. In addition, the left pivot is evenly spaced:

If you shift z340 by two columns to the right and perform a p96 untransposition, you also get two 5-grams. These are close to the P101 5-grams and share the reversed F. Actually, this could be dismissed as a nice coincidence. But I can’t get the sigma values and the probability of 5-grams appearing out of my mind.

Sorry to throw this out there so half-baked. I have simply run out of ideas here, but I didn’t want to leave it unmentioned.

Translated with http://www.DeepL.com/Translator

 
Posted : December 2, 2018 9:00 pm
(@largo)
Posts: 454
Honorable Member
Topic starter
 

And the TODO is immediately open again. Had I read it properly, I would have seen that it is about the number of ngrams at different periods. I think I’ll take a longer break.

 
Posted : December 2, 2018 9:19 pm
Jarlve
(@jarlve)
Posts: 2547
Famed Member
 

Really interesting stuff Largo. I never noticed the evenly spaced "period 3" pivot at period 101! The 340 is just funny. It is truly unbelievable how this cipher works. Could you also mark the "O" symbol please?

I ran my own test that just returns the longest n-gram repeat found. About 1 in 5000 randomizations have a 5-gram or better, closely matching your own results. Though your sigmas are wrong somehow.

Combinations processed: 1000000/1000000
Measurements:
- Mean: 2.341767
- Variance: 0.2416183177138649
- Standard deviation: 0.4915468621747726
- Count equal/over 5: 185
- Sigma of 5: 5.407893335416814
- Sigma of 2: -0.6952887431483237
- Sigma of 6: 7.442287361605193
- Lowest: 2 (Randomize(1))
- Highest: 6 (Randomize(264843))
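A "longest repeat" measurement like this one can be sketched in Python as below. This is not AZdecrypt's code. It assumes that overlapping occurrences count as repeats, and it uses a hypothetical stand-in sequence instead of the real cipher.

```python
import random
import statistics

def longest_repeat(seq):
    """Largest n such that some n-gram occurs at least twice (0 if there is none)."""
    n = 0
    while True:
        seen = set()
        found = False
        # Look for any repeated (n + 1)-gram.
        for i in range(len(seq) - n):
            gram = tuple(seq[i:i + n + 1])
            if gram in seen:
                found = True
                break
            seen.add(gram)
        if not found:
            return n
        n += 1

# Hypothetical stand-in cipher: 340 symbols from 63 symbol ids.
rng = random.Random(1)
work = [rng.randrange(63) for _ in range(340)]

results = []
for _ in range(200):
    rng.shuffle(work)
    results.append(longest_repeat(work))

print("mean:", statistics.mean(results))
print("lowest:", min(results), "highest:", max(results))
```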

Could it be that P101 has a kind of "correlation" of ngrams of another period?

Probably period 39 since we have at least one period 3 pivot. Back then I suspected that cycles would make such repeats more likely. I am running some tests now to see if that holds up.

@doranchak, have you ever scanned the 340 for such "period n" pivots?

AZdecrypt

 
Posted : December 3, 2018 12:36 am
Jarlve
(@jarlve)
Posts: 2547
Famed Member
 

Back then I suspected that cycles would make such repeats more likely. I am running some tests now to see if that holds up.

I ran a test with randomized plaintexts, target ioc 2236 and 0, 25, 50 and 75% cycle randomization for 5-gram period 1 repeats.

Results that had a 5-gram or better:

0%: 826 out of 1,000,000
25%: 456 out of 1,000,000
50%: 252 out of 1,000,000
75%: 194 out of 1,000,000

So at least there is a significant correlation between cycles and longer n-gram repeats at period 1. And I now need to follow up with a test for periods 2 to 170.
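For readers unfamiliar with the term, "cycle randomization" can be pictured with the following Python sketch. It rests on an assumption about the meaning: each plaintext letter has a fixed list of homophones that is normally used in strict rotation, and with a given probability a uniformly random homophone is chosen instead. The function name, the homophone table and the choice not to advance the cycle on a random pick are all illustrative, not taken from AZdecrypt.

```python
import random

def encode_homophonic(plaintext, homophones, randomization, seed=0):
    """Encode with per-letter homophone cycles.

    randomization = 0.0 gives perfect cycles, 1.0 gives fully random homophone choice.
    """
    rng = random.Random(seed)
    position = {letter: 0 for letter in homophones}
    out = []
    for letter in plaintext:
        options = homophones[letter]
        if rng.random() < randomization:
            # Break the cycle: pick any homophone of this letter at random.
            symbol = rng.choice(options)
        else:
            # Follow the cycle: take the next homophone in rotation.
            symbol = options[position[letter] % len(options)]
            position[letter] += 1
        out.append(symbol)
    return out

# Tiny hypothetical homophone table (symbol ids are arbitrary).
table = {"E": [1, 2, 3], "T": [4, 5], "A": [6]}
print(encode_homophonic("EATTEETAE", table, randomization=0.0))
print(encode_homophonic("EATTEETAE", table, randomization=0.5))
```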

AZdecrypt

 
Posted : December 3, 2018 1:53 am
Jarlve
(@jarlve)
Posts: 2547
Famed Member
 

So at least there is a significant correlation between cycles and longer n-gram repeats at period 1. And I now need to follow up with a test for periods 2 to 170.

0%: 396 out of 10,000 have a 5-gram or better repeat within periods 2 to 170
50%: 287 out of 10,000 have a 5-gram or better repeat within periods 2 to 170

Yep. Cycles increase the odds of having longer n-grams occur through period 1 to 170. Though the phenomenon is still unlikely.

AZdecrypt

 
Posted : December 3, 2018 11:44 am
(@largo)
Posts: 454
Honorable Member
Topic starter
 

I ran my own test that just returns the longest n-gram repeat found. About 1 in 5000 randomizations have a 5-gram or better, closely matching your own results. Though your sigmas are wrong somehow.

When I calculate the sigmas for 2grams and 3grams, my values seem to be correct. Here is an example for 2grams with 5000000 iterations (on the left the number of repeated bigrams, on the right the number of ciphers containing this number). Below are the sigmas for some selected bigram repeats. I’ll do the same with the 5grams.

2       : 0
3       : 3
4       : 8
5       : 49
6       : 195
7       : 651
8       : 2169
9       : 5875
10      : 14530
11      : 31507
12      : 61147
13      : 106910
14      : 171547
15      : 250306
16      : 337300
17      : 420168
18      : 483462
19      : 517156
20      : 518222
21      : 484005
22      : 423397
23      : 349131
24      : 270680
25      : 198374
26      : 136787
27      : 89874
28      : 56222
29      : 32830
30      : 18444
31      : 9616
32      : 5002
33      : 2461
34      : 1170
35      : 501
36      : 179
37      : 79
38      : 28
39      : 10
40      : 3
41      : 2
42      : 0

Average: 19.7830468
Variance: 14.5869958264089
Standard Deviation: 3.81929258193306
10: -2.56
15: -1.25
20: 0.057
25: 1.37
30: 2.68
37: 4.51
41: 5.56
48: 7.39

This is my source code:

int numNGrams = 0;

SortedDictionary<int, int> nGramMap = new SortedDictionary<int, int>();
int numTests = 5000000;
double average = 0.0;
double variance = 0.0;
double standardDeviation = 0.0;

// Create a dictionary (key: number of repeated ngrams, value: number of ciphers with that many repeats)
for (int i=0; i<100; i++)
{
	nGramMap[i] = 0;
}

for (int i=0; i<numTests; i++)
{
	cipher340.Shuffle();
	cipher340.GetRepeatedNgrams(2, ref numNGrams);

	average += numNGrams;
	
	nGramMap[numNGrams]++;
}

average /= numTests;

foreach (var entry in nGramMap)
{
	// Include every bin, also key 0: ciphers without any repeat
	// still contribute (0 - average)^2 to the variance.
	variance += Math.Pow(entry.Key - average, 2.0) * entry.Value;

	Console.WriteLine(entry.Key + "\t: " + entry.Value);
}

variance /= (numTests - 1);

standardDeviation = Math.Sqrt(variance);

// Print some sigmas
Console.WriteLine("25: " + ((25 - average) / standardDeviation));
Console.WriteLine("37: " + ((37 - average) / standardDeviation));

Do you have any idea what might be wrong? In my opinion the calculation is correct. As I said, of course I don’t want to exclude an error.

Could you also mark the "O" symbol please?

Sure! Here you are:

 
Posted : December 3, 2018 8:37 pm
(@largo)
Posts: 454
Honorable Member
Topic starter
 

Yep. Cycles increase the odds of having longer n-grams occur through period 1 to 170. Though the phenomenon is still unlikely.

That’s really interesting! If I remember correctly, I did a test to see how the occurrence of pivots behaves with cycle randomization. I noticed then that pivots occur more often when the cycles are more random. In 100% cycles there were fewer pivots. In this sense, the simultaneous occurrence of pivots at P1 and 5 grams at P 1-170 is even more astonishing.

 
Posted : December 3, 2018 9:05 pm
Jarlve
(@jarlve)
Posts: 2547
Famed Member
 

Abnormal distribution of data. Your mean is 0.0001716 while the minimum is 0. That creates an extremely skewed distribution curve, with nearly all of the mass piled up at the left edge and a long tail to the right.

To be honest I think most of our tests have this problem and it would probably be better to calculate the SD from a table using the odds we have:

So your results versus randomizations lie between 4.5 and 4.89 sigma.

AZdecrypt

 
Posted : December 3, 2018 9:50 pm
Jarlve
(@jarlve)
Posts: 2547
Famed Member
 

So at least there is a significant correlation between cycles and longer n-gram repeats at period 1. And I now need to follow up with a test for periods 2 to 170.

0%: 396 out of 10,000 have a 5-gram or better repeat within periods 2 to 170
50%: 287 out of 10,000 have a 5-gram or better repeat within periods 2 to 170

Yep. Cycles increase the odds of having longer n-grams occur through period 1 to 170. Though the phenomenon is still unlikely.

And this would fall between 2 and 2.57 sigma.
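The conversion from odds to sigma can be reproduced with Python's standard library. The lookup table itself is not reproduced in the thread, so the two-sided reading of the normal distribution used here is an assumption; under it, 396 in 10,000 does land between 2 and 2.57 sigma. The function name `two_sided_sigma` is made up for this sketch.

```python
from statistics import NormalDist

def two_sided_sigma(hits, trials):
    """Sigma level whose two-sided normal tail probability equals hits / trials."""
    p = hits / trials
    return NormalDist().inv_cdf(1.0 - p / 2.0)

# Odds quoted above for a 5-gram or better within periods 2 to 170.
print("0% randomization: ", two_sided_sigma(396, 10_000))
print("50% randomization:", two_sided_sigma(287, 10_000))
```

This treats the observed frequency as a tail probability and asks which sigma level a normal distribution would need to match it, which sidesteps the skew problem of computing a standard deviation from counts that are almost all zero.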

AZdecrypt

 
Posted : December 3, 2018 9:51 pm
Jarlve
(@jarlve)
Posts: 2547
Famed Member
 

Yep. Cycles increase the odds of having longer n-grams occur through period 1 to 170. Though the phenomenon is still unlikely.

That’s really interesting! If I remember correctly, I did a test to see how the occurrence of pivots behaves with cycle randomization. I noticed then that pivots occur more often when the cycles are more random. In 100% cycles there were fewer pivots. In this sense, the simultaneous occurrence of pivots at P1 and 5 grams at P 1-170 is even more astonishing.

Did you control the ioc in your pivot versus cycles test?

AZdecrypt

 
Posted : December 3, 2018 10:21 pm
smokie treats
(@smokie-treats)
Posts: 1626
Noble Member
 

I found a semi-interesting pattern last night before dinner. P19 and P101 bigrams that share the same symbols, and at least one position. Do you see the patterns? I showed symbol count on the right. Some of the patterns are caused by high count symbols, but some of them are caused by low count symbols.

It is weird stuff like this that makes me think more and more about a hoax, but some mechanical or mathematical way of doing it to intentionally create a lot of interesting patterns. Again, woke up in the middle of the night thinking about how this could be done.

 
Posted : December 4, 2018 5:56 pm
(@largo)
Posts: 454
Honorable Member
Topic starter
 

Abnormal distribution of data. Your mean is 0.0001716 while the minimum is 0. That creates an extremely skewed distribution curve, with nearly all of the mass piled up at the left edge and a long tail to the right.

To be honest I think most of our tests have this problem and it would probably be better to calculate the SD from a table using the odds we have:

Thank you for the table and the advice! I will take a closer look, but it will take a few days.

Did you control the ioc in your pivot versus cycles test?

I just ran the test again. Criteria were "63 symbols", "Raw IOC >= 2000", ">= 1 pivot". Here are the results:

100% cyclic:  26 Ciphers with >= 1 pivot

 25% random:  63 Ciphers with >= 1 pivot
 50% random: 215 Ciphers with >= 1 pivot
 75% random: 378 Ciphers with >= 1 pivot
100% random: 462 Ciphers with >= 1 pivot

The basis was 276163 ciphers generated from http://wortschatz.uni-leipzig.de/de/download (News 2005, 1M).

One can clearly see that the more randomised the cycles, the more frequent the pivots.

I found a semi interesting pattern last night before dinner. P19 and P101 bigrams that share the same symbols, and at least one position. Do you see the patterns?
…Image…
I showed symbol count on the right. Some of the patterns are caused by high count symbols, but some of them are caused by low count symbols.

That looks very interesting. We would have to test how often this happens by chance. Maybe P19 and P101 are just mathematically related in a way we can’t see right now, and that explains it. As already mentioned, I often thought about a hoax, but a real plaintext (possibly with a transposition error) is more likely to me (at least I hope so).
What do you think, could a plaintext with many repetitions or some kind of a crossword puzzle be the reason for such observations? If the structure of the plaintext is extremely strange, then it is probably also the resulting cipher. So it doesn’t have to be a hoax. But that’s just a gut feeling.

 
Posted : December 4, 2018 10:14 pm
smokie treats
(@smokie-treats)
Posts: 1626
Noble Member
 

I think that a plaintext could easily do this. Look at the highlighted symbols again: there are small sections that are in alignment with each other at P19. It looks like words.

I have been trying to imagine a hoax method, though. P19 actually is 20 symbols if you count the first one. P39 is actually 40 symbols if you count the first one. P101 is ( 20 x 5 ) +1.

I try to imagine a grid of 20 rows and 63 or more columns. Each row has all of the symbols, but in each row there is only one area from which Zodiac would randomly choose symbols. Each row has a different area, with an unknown number of columns, and sometimes the areas overlap each other. Then, move down from top row to bottom row, one row at a time. Randomly choose from the area in the row. Could that cause ABAB cycling, P19 and the pivots? Could the arrangement of the areas cause P19 instead of P20?
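To make the idea testable, here is a very rough Python sketch of such a generator. Everything concrete in it is a guess for illustration: the window width of 12, the random placement of each row's area, and letting an area wrap around the end of the symbol row. The post leaves all of these open.

```python
import random

def windowed_hoax(length, num_symbols=63, num_rows=20, window=12, seed=0):
    """Generate symbols by cycling through rows, each with its own allowed area.

    Row r may only emit symbols from a window of `window` consecutive symbol ids
    (wrapping around), starting at a randomly chosen position.
    """
    rng = random.Random(seed)
    starts = [rng.randrange(num_symbols) for _ in range(num_rows)]
    out = []
    for i in range(length):
        start = starts[i % num_rows]  # move down one row per symbol, then wrap to the top
        out.append((start + rng.randrange(window)) % num_symbols)
    return out

cipher = windowed_hoax(340)
print(cipher[:40])
```

By construction this produces structure at period 20; whether some arrangement of the areas could shift that to P19, or produce pivots, is exactly the open question and is not answered by the sketch.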

I am totally open minded to either message or hoax.

 
Posted : December 4, 2018 10:32 pm
smokie treats
(@smokie-treats)
Posts: 1626
Noble Member
 

Like this, very roughly. The symbols are not shown, but the areas from which the symbols could be randomly selected are colored blue. I don’t know right now how big the areas really need to be or how much they have to overlap to create ABAB, P19 and pivots though, especially to make the system difficult to detect. P101 with the shared symbols and positions pattern shown above and other weird observations make me think it may be something like this.

 
Posted : December 4, 2018 10:51 pm
Jarlve
(@jarlve)
Posts: 2547
Famed Member
 

I just ran the test again. Criteria were "63 symbols", "Raw IOC >= 2000", ">= 1 pivot". Here are the results:

More cycle randomization increases the ioc and that will affect everything. What is the average raw ioc for your test ciphers at 0, 25, 50, 75 and 100% cycle randomization?
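A raw ioc check could look like the Python sketch below. It assumes that "raw ioc" means the unnormalized index of coincidence, that is, the sum of c * (c - 1) over all symbol counts c. The thread does not define the term, so treat the formula as an assumption.

```python
from collections import Counter

def raw_ioc(seq):
    """Unnormalized index of coincidence: sum of c * (c - 1) over symbol counts (assumed definition)."""
    return sum(c * (c - 1) for c in Counter(seq).values())

def normalized_ioc(seq):
    """The same quantity divided by N * (N - 1), i.e. the usual index of coincidence."""
    n = len(seq)
    return raw_ioc(seq) / (n * (n - 1)) if n > 1 else 0.0

print(raw_ioc("AABBB"))         # 2*1 + 3*2 = 8
print(normalized_ioc("AABBB"))  # 8 / 20 = 0.4
```

Averaging `raw_ioc` over the test ciphers at each randomization level would give the numbers asked for here.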

My own test confirms your results but the difference is not as large:

C4 pivots >=1 per 100,000 ciphers with randomized plaintexts:

0%: 108
25%: 145
50%: 152
75%: 159
100%: 163

AZdecrypt

 
Posted : December 5, 2018 12:41 am
Page 1 / 2