Having both the + and B be prime phobic seems important. Those two symbols are also unusual because they are high count and do not cycle well with other symbols. They could be 1:1 substitutes, nulls, or included in multiple cycles, making them look like 1:1 substitutes when they are really polyalphabetic.
Random shuffles are one way to examine observed statistical phenomena, but sometimes a relatively simple cipher can surprisingly re-create a statistical phenomenon without any intention to do so.
But on the other hand, perhaps there are two keys: one for primes and one for non-primes.
Seems fairly labor intensive to do something like that, and it still doesn’t explain the one + and one B that land on prime positions.
At some point I am going to have to play with prime phobia. We found that there is an odd – even phenomenon, and that three of the symbols that land on only odd positions cycle with each other. Prime numbers, with the exception of 2, are all odd. So I wonder about making two lists, symbols that only land on primes and symbols that only land on non-primes. Then check those mutually exclusive symbols to find out if some of them cycle with each other.
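The two lists described above could be built with something like the following. This is only a sketch: `exclusive_symbols` is a made-up helper name, positions are assumed to be numbered from 1, and the `toy` sequence is invented for illustration and is not the real 340.

```python
def is_prime(n):
    """Trial-division primality test; fine for positions 1..340."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def exclusive_symbols(cipher):
    """cipher: sequence of symbol ids, first symbol at position 1.

    Returns (symbols seen only on prime positions,
             symbols seen only on non-prime positions).
    """
    on_prime, on_nonprime = set(), set()
    for pos, sym in enumerate(cipher, start=1):
        if is_prime(pos):
            on_prime.add(sym)
        else:
            on_nonprime.add(sym)
    return sorted(on_prime - on_nonprime), sorted(on_nonprime - on_prime)


# Toy sequence for illustration only (NOT the 340 transcription).
toy = [1, 2, 2, 3, 2, 3, 4, 3, 1, 3]
prime_only, nonprime_only = exclusive_symbols(toy)
print("prime only:", prime_only)
print("non-prime only:", nonprime_only)
```

The two returned lists could then be checked against whatever cycle list is being used to see if any of the mutually exclusive symbols cycle with each other.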
EDIT: After searching through the massive homophonic substitution thread, I found the citation for the three symbols that land on only odd numbered positions: viewtopic.php?f=81&t=2617&hilit=odd+daikon&start=210. Perhaps the odd even phenomenon and prime phobia phenomenon are related.
I am a little bit intrigued by this. So at the outset I see that a lot of the primes are positioned in "period 18" diagonal rows.
Back to that later.
I was wondering if there may have been two keys, one for primes and one for non-primes. No dice, though. I separated the primes and moved them down to the last four rows. Then I checked the cycle scores for rows 1-16 and for rows 17-20 before and after the separation.
340 rows 1-16 cycle score: 44016
Non-primes rows 1-16 score: 29956
340 rows 17-20 cycle score: 2722
Primes rows 17-20 score: 2748
The 340 is more cyclic without separating the primes from the non-primes. So no amazing discovery and probably no two keys.
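For anyone who wants to repeat the separation step, here is a rough sketch of just the rearrangement. The cycle scoring itself is not shown, since that depends on the scoring tool. `split_primes_last` is a made-up name, and positions are assumed to be numbered from 1.

```python
def is_prime(n):
    """Trial-division primality test; fine for positions 1..340."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def split_primes_last(cipher):
    """Keep reading order, but move symbols on prime positions to the end."""
    nonprime = [s for pos, s in enumerate(cipher, start=1) if not is_prime(pos)]
    prime = [s for pos, s in enumerate(cipher, start=1) if is_prime(pos)]
    return nonprime + prime


# Stand-in for the 340: use the position numbers themselves as "symbols".
positions = list(range(1, 341))
rearranged = split_primes_last(positions)

# With 17 columns, 272 non-prime positions fill rows 1-16 exactly
# and the 68 prime positions fill rows 17-20.
rows = [rearranged[i:i + 17] for i in range(0, 340, 17)]
print(len(rows), len(rows[0]), rows[16][:5])
```

Feeding the real symbol sequence in place of `positions` gives the grid whose top 16 rows and bottom 4 rows can be scored separately.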
There is only one symbol that is exclusive to primes, symbol 59, and there are only two of those. There are, not surprisingly, many symbols exclusive to non-primes. I made that list and then checked the 34 top cycles. No correlation. There aren’t any two symbols that are both exclusive to non-primes and which have high cycle scores.
But I am pondering whether a period 19 transposition scheme could cause high count symbols to avoid prime locations, given that the primes are often situated in "period 18" diagonal rows…
Are the period 19 and prime phobia phenomena related? How difficult is it to make a one key cipher that unintentionally causes high count symbols to avoid prime positions? And we need to take a closer look at the other high count symbols and the statistics.
So here are the symbol counts, prime and non-prime counts, and expected prime and non-prime counts based on the fact that 20% of the symbol positions in the 340 are prime numbers.
It looks like symbol 19 boxed in red ( the + ) avoids prime locations and that may be statistically significant. There are 24 of those, only one lands on a prime position, and there should be about 4.8 landing on prime positions.
Symbol 20 boxed in blue ( the B ) lands on a prime position only once, but there should be about 2.4 landing on prime positions. Not that big of a difference.
Other high count symbols, 16 and 36 boxed in blue, have a slightly lower count on prime positions as compared to expected.
High count symbols 5, 11, and 51 boxed in green land on prime positions more often than expected.
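As a rough cross-check on the expected values above, the chance of a symbol landing on so few primes can be worked out directly. This assumes the symbol's positions are a uniformly random choice among the 340 positions (a hypergeometric model, basically what a random shuffle tests), which may not be the right model for the real cipher. The function names are made up for this sketch.

```python
from math import comb


def is_prime(n):
    """Trial-division primality test; fine for positions 1..340."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


TOTAL = 340
# Number of prime positions in 1..340 (68, which is the 20% used above).
PRIMES = sum(1 for p in range(1, TOTAL + 1) if is_prime(p))


def expected_on_primes(count):
    """Expected number of occurrences on prime positions."""
    return count * PRIMES / TOTAL


def prob_at_most(count, k_max):
    """P(at most k_max of `count` occurrences land on primes), assuming the
    occurrences occupy a uniformly random set of positions."""
    ways = sum(
        comb(PRIMES, k) * comb(TOTAL - PRIMES, count - k)
        for k in range(k_max + 1)
    )
    return ways / comb(TOTAL, count)


# Counts taken from the post: the + has 24; the B has 12 (implied by 2.4 expected).
for name, count in (("+", 24), ("B", 12)):
    print(name, expected_on_primes(count), prob_at_most(count, 1))
```

This only looks at one symbol at a time. It does not correct for the fact that there are 63 symbols, any of which could have stood out.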
I have an idea for another way to test the statistical significance of the prime phobia phenomenon, to double check the random shuffle statistics.
Let us say for the sake of argument that the + symbol is a 1:1 substitute.
Take a massive text, like a novel or whatever it is that you guys use to create your n-gram tables. Then take a random 340 plaintext sample from the text. Find all of the plaintext that has count of about 24 within the sample. Then find out how many of those plaintext land on prime positions within the sample. Do that a few thousand times and make a bell curve chart. Compare with the + symbol stats and the random shuffle stats.
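The sampling procedure above might look roughly like this. It is a sketch under assumptions: `corpus` is supposed to be a long string of plaintext letters with spaces and punctuation already stripped, "about 24" is taken to mean within 1 of 24, and the function names are made up. The stand-in corpus at the bottom is random letters just so the code runs. A real test would load a novel instead.

```python
import random
from collections import Counter


def is_prime(n):
    """Trial-division primality test; fine for positions 1..340."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


PRIME_POS = {p for p in range(1, 341) if is_prime(p)}


def prime_hits_in_sample(sample, target=24, tol=1):
    """For every letter whose count is within `tol` of `target`, return how
    many of its occurrences sit on prime positions (positions start at 1)."""
    counts = Counter(sample)
    hits = []
    for letter, n in counts.items():
        if abs(n - target) <= tol:
            hits.append(sum(
                1 for pos, ch in enumerate(sample, start=1)
                if ch == letter and pos in PRIME_POS
            ))
    return hits


def run_trials(corpus, trials=2000, length=340, seed=0):
    """Histogram of prime-position hits over many random 340-letter samples."""
    rng = random.Random(seed)
    histogram = Counter()
    for _ in range(trials):
        start = rng.randrange(len(corpus) - length + 1)
        for h in prime_hits_in_sample(corpus[start:start + length]):
            histogram[h] += 1
    return histogram


# Stand-in corpus (random letters, NOT real English) so the sketch runs.
corpus = "".join(random.Random(1).choices("ETAOINSHRDLU", k=5000))
histogram = run_trials(corpus, trials=200)
for hits in sorted(histogram):
    print(hits, histogram[hits])
```

The printed histogram is the raw material for the bell curve chart. The share of entries at 0 or 1 can then be compared with the + symbol and with the random shuffle results.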
If the + maps to two, three, or more plaintext, then does that matter for random sample statistic purposes?
I made a spreadsheet with all of Jarlve’s 100 plaintext messages, which can be found here:
Then I made it so that I adjust two variables, the minimum count of plaintext in a message and the maximum count of plaintext that land on prime positions.
For example, if I make the minimum count of plaintext 24 and the maximum count that land on primes 1, then here are the results:
There are 542 plaintext that have a count of 24 or more. They are all high frequency plaintext, as can be expected. A, E, H, I, N, O, R, S and T. And there are a few messages that have equal to or more than 24 of D and L.
Of the 542 occurrences where there are minimum 24 plaintext in a message, there are only 3 where 1 or fewer of those plaintext land on primes.
Upper left is message #27, where there are 24 of the letter O, and only 1 lands on a prime position.
Upper right is message #84, where there are 32 of the letter A, and only 1 lands on a prime.
Lower left is message #93, where there are 25 of the letter H, and 0 land on a prime.
So that is 3% of the messages, or 3 / 542 = 0.6% of the occurrences where there is a minimum of 24 of the same plaintext. So it does happen, but not frequently. Why Zodiac would diffuse with 63 symbols and then make one of them a high count 1:1 substitute like the + symbol I don’t know. But even if he did, having only one land on a prime position seems statistically significant. And since he did diffuse with 63 symbols, it seems even more statistically significant.
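The two-variable filter from the spreadsheet could be reproduced outside of it along these lines. This is a sketch only: `count_occurrences` is a made-up name, `messages` is assumed to be a list of plaintext strings with positions numbered from 1, and the toy messages below are invented so the code runs. They are not from Jarlve's library.

```python
from collections import Counter


def is_prime(n):
    """Trial-division primality test; fine for positions 1..340."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def count_occurrences(messages, min_count, max_primes):
    """Return (qualifying, low_prime).

    qualifying: number of (message, letter) pairs where the letter appears
                at least `min_count` times in that message.
    low_prime:  how many of those have at most `max_primes` occurrences
                on prime positions.
    """
    qualifying = low_prime = 0
    for msg in messages:
        counts = Counter(msg)
        for letter, n in counts.items():
            if n < min_count:
                continue
            qualifying += 1
            on_primes = sum(
                1 for pos, ch in enumerate(msg, start=1)
                if ch == letter and is_prime(pos)
            )
            if on_primes <= max_primes:
                low_prime += 1
    return qualifying, low_prime


# Toy messages for illustration only.
toy_messages = ["AAAAB", "BABAA"]
print(count_occurrences(toy_messages, min_count=3, max_primes=2))
print(count_occurrences(toy_messages, min_count=4, max_primes=1))
```

With the real 100 messages, `min_count=24` and `max_primes=1` would be the setting described above.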
I can change the variables, and you might find it interesting. Will show more thorough statistics soon.
EDIT: So I may be comparing apples to oranges instead of apples to apples. I don’t know, but this may give some more perspective.
I made a table and changed the variables.
Blue box upper left: There are 705 occurrences in Jarlve’s 100 message plaintext library where there are 21 or more of the same plaintext in the same message. Of those, there are 2 where those plaintext land on 0 prime positions.
Blue box lower left: Same 705 occurrences with 21 or more of the same plaintext in the same message. Of those, there are 162 where the plaintext land on 3 or fewer prime positions. So you can see, if I change the variable for prime positions from 0 to 3, the count of occurrences changes dramatically from 2 to 162.
Red box: Again, in Jarlve’s library, there are 542 occurrences where the same plaintext count is 24 or more in the same message. And of those, there are 3 where there are 1 or fewer of those plaintext that land on prime positions.
About 3% of random shuffles of the 340 show the + landing on only one prime position. And 3% of Jarlve’s messages show similar statistics. I am not an expert in interpreting statistics. But because Zodiac used 63 symbols to diffuse and there is only one symbol with a count of 24, this may be statistically important. I am just trying to think of a cipher model that would explain prime phobia, the period 19 bigram repeats, three symbols cycling together on exclusively odd positions, and the cycle statistics in general.
One final thought for the night. I made the 340 into a message that is 18 columns by 19 rows, which is conducive to a period 19 transposition scheme. The prime positions line up with each other in columns.
The 19 is the + symbol. Can anyone think of a simple relationship between transposition and prime phobia?
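One thing that can be said for sure about the column alignment: every prime larger than 3 leaves a remainder of 1 or 5 when divided by 6, and 18 is a multiple of 6, so in an 18 column layout the primes can only fall in a handful of columns. The same arithmetic gives the "period 18" diagonals in the normal 17 column layout, since one row down and one column right is a step of 18 positions. This does not answer the transposition question, it only shows why the primes line up. A quick sketch to check it, assuming positions are numbered from 1 and written left to right:

```python
def is_prime(n):
    """Trial-division primality test; fine for positions 1..340."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


WIDTH = 18  # columns in the 18 x 19 layout

# Map each column (1..18) to the prime positions that fall in it.
columns = {}
for pos in range(1, 341):
    if is_prime(pos):
        col = (pos - 1) % WIDTH + 1
        columns.setdefault(col, []).append(pos)

prime_columns = sorted(columns)
print(prime_columns)  # [1, 2, 3, 5, 7, 11, 13, 17]
for col in prime_columns:
    print(col, len(columns[col]))
```

Columns 2 and 3 each hold only the single prime in the first row (positions 2 and 3), so six columns carry all the rest.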