Recipes for primephobia

_pi · 2015-12-09T12:55:49Z

As demonstrated before, the z340 exhibits a peculiar characteristic: the 2 most frequent symbols ('+' and 'B') fall almost exclusively on non-prime positions. Of all 36 instances, only 2 fall on a prime position. This is statistically odd, as measured by Doranchak: shuffle experiments show that + falls on only 0 or 1 prime positions in 3% of shuffles. In 0.7% of shuffles, + and B each fall on 0 or 1 prime positions. So I can't easily dismiss the phenomenon as coincidence. More info here: http://www.zodiackillerciphers.com/?p=319

While this primephobia of the most frequent symbols could be a coincidence, it could also be a symptom of the cipher's construction methodology. The purpose of this post is to present 2 encryption methods that substantially increase the odds of inducing such primephobia in the resulting ciphers.

Primes in columns

When listing a series of numbers in a table format, an interesting phenomenon can be observed. For example, let's list all numbers from 1 to 340 in a table of 6 columns. In this table, I have highlighted all the prime numbers in orange:

You'll notice that, if we exclude the very first line of numbers, all the prime numbers are positioned in columns 1 and 5. Columns 2, 3, 4 and 6, highlighted in green, are prime-safe (again, excluding the first line), meaning that no prime number can be found in these columns. This is because all numbers in columns 2, 4 and 6 are divisible by 2, and all numbers in column 3 are divisible by 3.

The appearance of these "prime-safe" columns depends entirely on the number of columns chosen to display the list of numbers. As a second example, here is the same list of numbers organised in a 7-column table:

You'll notice that, excluding the first line, only the 7th column is prime-safe, since all the numbers in that column are divisible by 7. Every other column can potentially host a prime number.
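The column pattern described above can be checked with a short sketch. It assumes the grid is filled left to right, top to bottom, starting at 1, and that the first row is excluded as in the text; the function names are illustrative only and not part of the original post. A column c in a grid of n columns holds only numbers of the form c + k*n, all of which are divisible by gcd(c, n), so the column is prime-safe whenever that gcd exceeds 1.

```python
from math import gcd


def is_prime(n):
    """Trial-division primality test, adequate for numbers up to a few hundred."""
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))


def prime_safe_columns(width):
    """Columns (1-based) that share a factor with the grid width."""
    return [c for c in range(1, width + 1) if gcd(c, width) > 1]


def prime_free_columns(width, length=340):
    """Brute-force check: columns holding no prime below the first row."""
    return [
        c for c in range(1, width + 1)
        if not any(is_prime(n) for n in range(c + width, length + 1, width))
    ]


if __name__ == "__main__":
    for width in (5, 6, 7, 8, 9, 10, 17):
        print(width, prime_safe_columns(width), prime_free_columns(width))
```

For the widths tried here and a length of 340, the gcd rule and the brute-force scan should list the same columns. On a much shorter text, a column that is not prime-safe by the gcd rule could still happen to contain no primes.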
Here are a few examples of prime-safe columns according to the number of columns used to display the list of numbers:

# Columns    Prime-safe columns
---------------------------------------
5            5
6            2, 3, 4, 6
7            7
8            2, 4, 6, 8
9            3, 6, 9
10           2, 4, 5, 6, 8, 10
...
17           17
...

If a cipher construction method were to intrinsically exploit this prime-safe column phenomenon, it would increase the probability of yielding primephobic ciphers. In other words, if the construction method somehow funneled high-frequency symbols into prime-safe columns, it would greatly increase the yield of primephobic ciphers.

Recipe #1: Vigenère

The Vigenère cipher is a method of encrypting alphabetic text by using a series of different Caesar ciphers based on the letters of a keyword. It is a simple form of polyalphabetic substitution. [...] In a Caesar cipher, each letter of the alphabet is shifted along some number of places; for example, in a Caesar cipher of shift 3, A would become D, B would become E, Y would become B and so on. The Vigenère cipher consists of several Caesar ciphers in sequence with different shift values. [...] The alphabet used at each point depends on a repeating keyword.
- Wikipedia

This repeating keyword makes the Vigenère encoding very cyclical. If the keyword is 5 characters long, there will be 5 different encoding alphabets, repeated in order: 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, etc. Another way to look at this: for a 5-letter keyword, if the plaintext is displayed in a grid of 5 columns, every letter of a given column is encoded with the same alphabet.

This cyclical quality of Vigenère is therefore very compatible with the prime-safe notion explained above. For example, in a random English plaintext of 340 characters, we would find on average about 43 occurrences of the letter E and 30 of the letter T. If this plaintext is formatted in a grid of 6 columns, these letters will be spread randomly across all columns.
Now, let's say we encode this plaintext using Vigenère with the keyword "QDEEZE". When an E in the plaintext is encoded with an E in the keyword, an "I" is obtained. When a T in the plaintext is encoded with a D in the keyword, a "W" is obtained. Since the keyword is 6 characters long, and the keyword letters D and E are in positions 2, 3, 4 and 6 (all prime-safe columns), this encoding process will funnel a large number of the resulting I and W symbols into prime-safe columns.

By generating random English plaintexts of 340 characters, Vigenère encoding them with the "QDEEZE" keyword, and selecting only the resulting ciphers where the symbols I and W total 36 (to mimic the frequency of + and B in the z340), we get a staggering 54% of ciphers that exhibit a primephobia on these symbols equal to or greater than that of the + and B in the z340. This compares with 0.7% of random shuffles of the z340 exhibiting equal or greater primephobia than the original z340.

The size of the keyword, its letters and their positions in the keyword will have a dramatic impact on the likelihood of produ...
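The funnelling effect of the "QDEEZE" keyword can be illustrated with a minimal sketch. It assumes uppercase A-Z plaintext with no spaces or punctuation and 1-based positions; the helper names are hypothetical, and this is not the code used to obtain the 54% figure, which also needs a source of random English plaintexts that is not reproduced here.

```python
def is_prime(n):
    """Trial-division primality test, adequate for positions 1 to 340."""
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))


def vigenere_encrypt(plaintext, keyword):
    """Encrypt uppercase A-Z plaintext with a repeating uppercase keyword."""
    out = []
    for i, ch in enumerate(plaintext):
        # Each keyword letter defines a Caesar shift (A = 0, B = 1, ...).
        shift = ord(keyword[i % len(keyword)]) - ord("A")
        out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
    return "".join(out)


def prime_hits(cipher, symbols):
    """Count how many of the given symbols sit on prime positions (1-based)."""
    return sum(
        1 for pos, ch in enumerate(cipher, start=1)
        if ch in symbols and is_prime(pos)
    )


if __name__ == "__main__":
    # Extreme case: a plaintext made only of E. Every I comes from a key letter E.
    cipher = vigenere_encrypt("E" * 340, "QDEEZE")
    print(cipher[:12], cipher.count("I"), prime_hits(cipher, "I"))
```

In this artificial all-E case the only I that can land on a prime is the one in the first row, which is the exception noted in the column discussion. With real English text, I and W can also arise from other plaintext and key letter pairs, so the effect is weaker.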

smokie treats

(@smokie-treats)

Posts: 1626

Noble Member

Having both the + and B be prime phobic seems important. Those two symbols are also unusual because they have high counts and do not cycle well with other symbols. They could be 1:1 substitutes, nulls, or members of multiple cycles, which would make them look like 1:1 substitutes when they are really polyalphabetic.

Random shuffles are one way to examine observed statistical phenomena, but sometimes a relatively simple cipher can surprisingly re-create a statistical phenomenon without any intention to do so.

But on the other hand, perhaps there were two keys: one for primes and one for non-primes.

Seems fairly labor intensive to do something like that, and it still doesn’t explain the one + and one B that land on prime positions.

At some point I am going to have to play with prime phobia. We found that there is an odd-even phenomenon, and that three of the symbols that land only on odd positions cycle with each other. Prime numbers, with the exception of 2, are all odd. So I wonder about making two lists: symbols that land only on primes and symbols that land only on non-primes. Then I could check those mutually exclusive symbols to find out whether some of them cycle with each other.

EDIT: After searching through the massive homophonic substitution thread, I found the citation for the three symbols that land only on odd-numbered positions: viewtopic.php?f=81&t=2617&hilit=odd+daikon&start=210. Perhaps the odd-even phenomenon and the prime phobia phenomenon are related.

Posted : December 23, 2015 4:44 am

smokie treats

(@smokie-treats)

I am a little bit intrigued by this. At the outset, I see that a lot of the primes are positioned in "period 18" diagonal rows.

Back to that later.

I was wondering if there may have been two keys, one for primes and one for non-primes. No dice, though. I separated the primes and moved them down to the last four rows. Then I checked the cycle scores for rows 1-16 and for rows 17-20 before and after the separation.

340 rows 1-16 cycle score: 44016
Non-primes rows 1-16 score: 29956

340 rows 17-20 cycle score: 2722
Primes rows 17-20 score: 2748

The 340 is more cyclic without separating the primes from the non-primes. So no amazing discovery and probably no two keys.

There is only one symbol that is exclusive to primes, symbol 59. But there are only two of those. But there are, not surprisingly, many symbols exclusive to non-primes. I made that list and then checked the 34 top cycles. No correlation. There aren’t any two symbols that are both exclusive to non-primes and which have high cycle scores.

But I am pondering whether a period 19 transposition scheme could cause high count symbols to avoid prime locations, given that the primes are often situated in "period 18" diagonal rows…

Are the period 19 and prime phobia phenomena related? How difficult is it to make a one-key cipher that unintentionally causes high count symbols to avoid prime positions? And we need to take a closer look at the other high count symbols and the statistics.

Posted : December 24, 2015 5:08 am

smokie treats

(@smokie-treats)


So here are the symbol counts, prime and non-prime counts, and expected prime and non-prime counts based on the fact that 20% of the symbol positions in the 340 are prime numbers.

It looks like symbol 19, boxed in red (the +), avoids prime locations, and that may be statistically significant. There are 24 of those, only one lands on a prime position, and there should be about 4.8 landing on prime positions.

Symbol 20, boxed in blue (the B), lands on a prime position only once, but there should be about 2.4 landing on prime positions. Not that big of a difference.

Other high count symbols, 16 and 36 boxed in blue, have a slightly lower count on prime positions as compared to expected.

High count symbols 5, 11, and 51 boxed in green land on prime positions more often than expected.
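The expected counts quoted above follow from the share of prime positions among 1 to 340. The sketch below recomputes them and, as an addition that is not part of the thread, also gives the exact hypergeometric probability of seeing so few prime hits if a symbol's positions were drawn uniformly at random. That uniform-placement model is an assumption, and the names are illustrative.

```python
from math import comb

LENGTH = 340


def is_prime(n):
    """Trial-division primality test, adequate for positions 1 to 340."""
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))


# Number of prime positions in the cipher, and their share of all positions.
PRIMES = sum(is_prime(p) for p in range(1, LENGTH + 1))
PRIME_SHARE = PRIMES / LENGTH


def expected_on_primes(count):
    """Expected number of prime positions for a symbol occurring `count` times."""
    return count * PRIME_SHARE


def prob_at_most(count, max_on_primes):
    """P(at most `max_on_primes` of `count` random positions are prime)."""
    favourable = sum(
        comb(PRIMES, k) * comb(LENGTH - PRIMES, count - k)
        for k in range(max_on_primes + 1)
    )
    return favourable / comb(LENGTH, count)


if __name__ == "__main__":
    for name, count in (("+", 24), ("B", 12)):
        print(name, expected_on_primes(count), prob_at_most(count, 1))
```

The probability printed for the + can be compared with the 3% shuffle figure quoted in the first post, since a random shuffle is the same uniform-placement model.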

Posted : December 24, 2015 6:05 am

smokie treats

(@smokie-treats)


I have an idea for another way to test the statistical significance of the prime phobia phenomenon, and to double check the random shuffle statistics.

Let us say for the sake of argument that the + symbol is a 1:1 substitute.

Take a massive text, like a novel or whatever it is that you guys use to create your n-gram tables. Then take a random 340-character plaintext sample from the text. Find every plaintext letter that has a count of about 24 within the sample. Then find out how many occurrences of each such letter land on prime positions within the sample. Do that a few thousand times and make a bell curve chart. Compare with the + symbol stats and the random shuffle stats.
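The sampling test proposed above could be sketched roughly as follows. The corpus is whatever large text is at hand (not supplied here), "about 24" is interpreted as 24 plus or minus a tolerance, and all names and default values are assumptions made for illustration.

```python
import random
from collections import Counter


def is_prime(n):
    """Trial-division primality test, adequate for positions 1 to 340."""
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))


def sample_prime_hits(corpus, trials=1000, length=340, target=24,
                      tolerance=2, seed=0):
    """Histogram of prime-position hits for letters occurring about `target` times.

    Returns a Counter mapping "number of occurrences on prime positions"
    to how often that number was observed across all sampled windows.
    """
    letters = [c for c in corpus.upper() if c.isalpha()]
    if len(letters) < length:
        raise ValueError("corpus is shorter than one sample window")
    prime_positions = {p for p in range(1, length + 1) if is_prime(p)}
    rng = random.Random(seed)
    histogram = Counter()
    for _ in range(trials):
        start = rng.randrange(len(letters) - length + 1)
        window = letters[start:start + length]
        for letter, count in Counter(window).items():
            if abs(count - target) > tolerance:
                continue
            on_primes = sum(
                1 for pos, ch in enumerate(window, start=1)
                if ch == letter and pos in prime_positions
            )
            histogram[on_primes] += 1
    return histogram
```

Calling `sample_prime_hits(open("novel.txt").read())` with a real text file (the file name is a placeholder) would give the data for the bell curve; the entries for 0 and 1 are the ones to compare with the + symbol.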

If the + maps to two, three, or more plaintext letters, then does that matter for random sample statistic purposes?

Posted : December 25, 2015 6:45 am

smokie treats

(@smokie-treats)


I made a spreadsheet with all of Jarlve’s 100 plaintext messages, which can be found here:

viewtopic.php?f=81&t=2435

Then I made it so that I can adjust two variables: the minimum count of a plaintext letter in a message and the maximum count of that letter landing on prime positions.

For example, if I make the minimum count of plaintext 24 and the maximum count that land on primes 1, then here are the results:

There are 542 plaintext that have a count of 24 or more. They are all high frequency plaintext, as can be expected. A, E, H, I, N, O, R, S and T. And there are a few messages that have equal to or more than 24 of D and L.

Of the 542 occurrences where there are minimum 24 plaintext in a message, there are only 3 where 1 or fewer of those plaintext land on primes.

Upper left is message #27, where there are 24 of the letter O, and only 1 lands on a prime position.
Upper right is message #84, where there are 32 of the letter A, and only 1 lands on a prime.
Lower left is message #93, where there are 25 of the letter H, and none land on a prime position.

So that is 3% of the messages, or 3 / 542 = 0.6% of the occurrences where there is a minimum of 24 of the same plaintext. So it does happen, but not frequently. Why Zodiac would diffuse with 63 symbols and then make one of them a high count 1:1 substitute like the + symbol, I don’t know. But even if he did, the chances of having only one land on a prime position seem statistically significant. And since he did diffuse with 63 symbols, it seems even more statistically significant.
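The two-variable filter described above can be expressed as a small function. This is a sketch of the idea rather than the spreadsheet itself: it assumes each message is a string of letters only (no spaces), uses 1-based positions, and the names are made up for illustration. Jarlve's messages are not included.

```python
from collections import Counter


def is_prime(n):
    """Trial-division primality test, adequate for positions 1 to 340."""
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))


def count_occurrences(messages, min_count=24, max_on_primes=1):
    """Return (total, matching) over all (message, letter) pairs.

    total:    pairs where the letter occurs at least `min_count` times
    matching: those pairs where at most `max_on_primes` occurrences
              fall on prime positions
    """
    total = matching = 0
    for message in messages:
        for letter, count in Counter(message).items():
            if count < min_count:
                continue
            total += 1
            on_primes = sum(
                1 for pos, ch in enumerate(message, start=1)
                if ch == letter and is_prime(pos)
            )
            if on_primes <= max_on_primes:
                matching += 1
    return total, matching
```

With the 100 messages loaded into a list, `count_occurrences(messages, 24, 1)` would correspond to the 542 and 3 reported above, and `matching / total` to the 0.6% figure.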

I can change the variables, and you might find it interesting. Will show more thorough statistics soon.

EDIT: So I may be comparing apples to oranges instead of apples to apples. I don’t know, but this may give some more perspective.

I made a table and changed the variables.

Blue box upper left: There are 705 occurrences in Jarlve’s 100 message plaintext library where there are 21 or more of the same plaintext in the same message. Of those, there are 2 where those plaintext land on 0 prime positions.

Blue box lower left: Same 705 occurrences with 21 or more of the same plaintext in the same message. Of those, there are 162 where the plaintext land on 3 or fewer prime positions. So you can see, if I change the variable for prime positions from 0 to 3, the count of occurrences changes dramatically from 2 to 162.

Red box: Again, in Jarlve’s library, there are 542 occurrences where the same plaintext count is 24 or more in the same message. And of those, there are 3 where there are 1 or fewer of those plaintext that land on prime positions.

In 3% of random shuffles of the 340, the + lands on at most one prime position. And 3% of Jarlve’s messages show similar statistics. I am not an expert in interpreting statistics. But because Zodiac used 63 symbols to diffuse and there is only one symbol with a count of 24, this may be statistically important. I am just trying to think of a cipher model that would explain prime phobia, the period 19 bigram repeats, three symbols cycling together on exclusively odd positions, and the cycle statistics in general.

One final thought for the night. I made the 340 into a message that is 18 columns by 19 rows, which is conducive to a period 19 transposition scheme. The prime positions line up with each other in columns.

The 19 is the + symbol. Can anyone think of a simple relationship between transposition and prime phobia?
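The column alignment noted above can be reproduced with a short sketch, assuming the 340 positions are written row by row into 18 columns (the function name is illustrative). It does not answer the transposition question; it only shows which columns can hold primes, which follows the same rule as the prime-safe columns in the first post: below the first row, primes can only appear in columns that share no factor with 18.

```python
def is_prime(n):
    """Trial-division primality test, adequate for positions 1 to 340."""
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))


def columns_with_primes(width, length=340, skip_first_row=False):
    """1-based columns of a `width`-column grid that contain a prime position."""
    start = width + 1 if skip_first_row else 1
    return sorted(
        {(p - 1) % width + 1 for p in range(start, length + 1) if is_prime(p)}
    )


if __name__ == "__main__":
    print(columns_with_primes(18))
    print(columns_with_primes(18, skip_first_row=True))
```

Below the first row, 12 of the 18 columns contain no prime positions at all, so any scheme that tends to place a symbol in those columns would also tend to keep it off primes.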

Posted : December 26, 2015 5:28 am

Zodiac Discussion Forum