I’ve been trying to collect the many test ciphers people have made here on these forums and from other places, because I wanted to explore this question:
How much do other ciphers resemble Zodiac’s 340-character cipher (Z340)?
To try to answer the question, I made measurements of many aspects of the cipher texts to compare them to Z340.
The set of measurements for a cipher text is considered as a point in a multidimensional space. You can think of the Z340 sitting at a point in this space, and all the other ciphers each sitting at their own points somewhere else in the same space. A cipher "resembles" Z340 if it is close to the Z340’s point in space. I am interested to find many such ciphers that are similar to Z340.
Here is a spreadsheet that summarizes the measurements for about 200 test ciphers:
https://docs.google.com/spreadsheets/d/ … sp=sharing
Hover over each column to see a description of the measurement. I will also include descriptions below. Columns with green backgrounds hold part of the distance calculation for a specific measurement. Each is the difference squared of the cipher’s measurement and Z340’s measurement. All the green columns are normalized and combined to calculate the "Total Distance" score. Ciphers with small Total Distance values have many measurements that resemble those of Z340. Those with larger Total Distance values have fewer measurements that resemble those of Z340.
Please let me know of other test ciphers I can include in the list. Also, let me know of other measurements you think are important to include as ways to look for similarity to Z340. Next, I am going to try to generate some more Z340-like ciphers under different schemes – perhaps a few more as simple homophonic substitution, then will move to columnar transposition.
Description of columns and measurements:
TOTAL DISTANCE FROM Z340: Euclidean distance from Z340, built from each measurement’s squared difference from the corresponding measurement for the original Z340 (green columns). The squared differences are normalized by the max squared difference across all test ciphers, in an effort to equalize the "importance" of each measurement. Smaller values suggest the cipher text’s qualities are more similar to those of Z340. Larger values suggest the cipher text’s qualities are less similar to those of Z340.
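As a rough sketch of how a Total Distance like this could be computed: the function name `total_distances`, the measurement names, and the numbers are all made up for illustration, and taking the square root of the summed normalized terms is an assumption about how the green columns are combined, not a copy of the spreadsheet's formula.

```python
import math

def total_distances(reference, ciphers):
    """Return {cipher name: distance from the reference measurements}.

    reference: {measurement name: value} for the reference cipher (e.g. Z340)
    ciphers:   {cipher name: {measurement name: value}}
    """
    # Squared difference from the reference, per cipher and per measurement.
    squared = {
        name: {m: (values[m] - reference[m]) ** 2 for m in reference}
        for name, values in ciphers.items()
    }
    # Largest squared difference seen for each measurement across all ciphers.
    largest = {m: max(sq[m] for sq in squared.values()) for m in reference}

    distances = {}
    for name, sq in squared.items():
        # Normalize each term to the range 0..1 so no measurement dominates.
        # A measurement on which every cipher matches the reference adds 0.
        total = sum(sq[m] / largest[m] for m in reference if largest[m] > 0)
        distances[name] = math.sqrt(total)  # assumption: root of the sum
    return distances

# Made-up numbers, purely for illustration (not real measurements).
reference = {"m1": 1.0, "m2": 10.0}
ciphers = {
    "cipher_x": {"m1": 1.0, "m2": 12.0},
    "cipher_y": {"m1": 3.0, "m2": 11.0},
}
distances = total_distances(reference, ciphers)
```

Here `cipher_x` ends up closer to the reference than `cipher_y`, because it matches exactly on `m1` even though it is the farthest cipher on `m2`.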
symbolic: Symbolic representation of cipher
numeric: Numeric representation of cipher
length: Length of cipher
multiplicity: Multiplicity is the ratio of the number of distinct symbols to the total cipher length. The difficulty in solving a cipher increases as multiplicity increases.
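A one-line sketch of the multiplicity calculation; the function name is just for illustration.

```python
def multiplicity(cipher):
    """Ratio of distinct symbols to total cipher length."""
    return len(set(cipher)) / len(cipher)

example = multiplicity("AABB")  # 2 distinct symbols / 4 positions
```

For Z340, with 63 distinct symbols in 340 positions, this works out to about 0.185.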
ioc: Index of coincidence, a comparison of the cipher’s symbol distribution to a uniform distribution.
ioc flat: This is the index of coincidence for the cipher if it had the same quantity of each symbol.
diff: The percentage difference between the actual ioc and the flat ioc. Higher percentages imply a "rougher" distribution of symbols.
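A minimal sketch of the three ioc columns, under some assumptions: the ioc is taken in its usual form (the chance that two symbols picked from the cipher without replacement are the same), the flat version lets each symbol occur a fractional length/symbols times, and the percentage is measured relative to the flat value. The function names are hypothetical and the spreadsheet may compute these differently.

```python
from collections import Counter

def ioc(cipher):
    """Chance that two symbols drawn without replacement are equal."""
    n = len(cipher)
    counts = Counter(cipher).values()
    return sum(c * (c - 1) for c in counts) / (n * (n - 1))

def ioc_flat(cipher):
    """ioc if every distinct symbol occurred equally often."""
    n = len(cipher)
    k = len(set(cipher))
    per_symbol = n / k  # assumption: fractional counts are allowed
    return k * per_symbol * (per_symbol - 1) / (n * (n - 1))

def ioc_diff_percent(cipher):
    """Percentage by which the actual ioc exceeds the flat ioc."""
    flat = ioc_flat(cipher)  # zero if every symbol is distinct
    return 100.0 * (ioc(cipher) - flat) / flat

flat_example = ioc_diff_percent("AABB")   # already flat
rough_example = ioc_diff_percent("AAAB")  # one symbol dominates
```

A perfectly even cipher such as `AABB` gives a difference of zero, while `AAAB` comes out around 50 percent "rougher" than its flat counterpart.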
chi2flat: Chi^2 statistic, compared to a uniform distribution. Ciphers with flatter distributions of symbol counts have smaller values. Rougher distributions have larger values. A value of zero means every symbol occurs the same number of times.
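A small sketch of a chi^2 statistic against a uniform distribution, assuming the expected count for every symbol is simply length divided by number of distinct symbols. The name `chi2_flat` is only illustrative.

```python
from collections import Counter

def chi2_flat(cipher):
    """Chi^2 of the symbol counts against a uniform expectation."""
    counts = Counter(cipher)
    expected = len(cipher) / len(counts)  # same expected count for all symbols
    return sum((c - expected) ** 2 / expected for c in counts.values())

even = chi2_flat("AABB")    # every symbol occurs twice
uneven = chi2_flat("AAAB")  # counts of 3 and 1 against an expected 2
```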
entropy: A measurement of the "unexpectedness" of the ciphertext. The higher the entropy value, the higher the unexpectedness or randomness.
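One common way to put a number on this is Shannon entropy of the symbol frequencies, in bits per symbol. That choice (and the base-2 logarithm) is an assumption here; the spreadsheet's entropy column may use a different definition.

```python
import math
from collections import Counter

def entropy_bits(cipher):
    """Shannon entropy of the symbol distribution, in bits per symbol."""
    n = len(cipher)
    return -sum((c / n) * math.log2(c / n) for c in Counter(cipher).values())

two_symbols = entropy_bits("AABB")   # two equally likely symbols
four_symbols = entropy_bits("ABCD")  # four equally likely symbols
```

With equally likely symbols the value is log2 of the number of symbols, so it grows as the alphabet gets larger and flatter.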
1-grams: Distribution of unigram counts, from highest to lowest. Distance measurement compares these distributions to Z340’s unigram distribution.
Example: 24 12 11 <== most common symbol occurs 24 times, 2nd most common occurs 12 times, 3rd most common occurs 11 times
Repeated values are shown in parentheses. For example, 10(4) means the count of 10 occurs 4 times in a row.
2-grams: Counts of 2-grams, in descending order.
Example: 3(4) 2(17) 1(293) [25 repeats]
Translated: There are 4 2-grams that each occur 3 times. There are 17 2-grams that each occur 2 times. The remaining 293 2-grams each occur only once. There are a total of 25 repeated 2-grams (for example, AB AB AB counts as 2 repeated 2-grams).
3-grams: Counts of 3-grams, in descending order.
4-grams: Counts of 4-grams, in descending order.
5-grams: Counts of 5-grams, in descending order.
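The n-gram columns above can be reproduced with a short sketch like the following. It counts overlapping n-grams, writes the counts in descending order using the count(times) shorthand, and totals the repeats as (count - 1) per n-gram. The function names are hypothetical.

```python
from collections import Counter
from itertools import groupby

def ngram_counts(cipher, n):
    """Count every overlapping n-gram in the cipher."""
    return Counter(tuple(cipher[i:i + n]) for i in range(len(cipher) - n + 1))

def format_distribution(counts):
    """Descending counts, with runs of equal values written as value(times)."""
    parts = []
    for value, run in groupby(sorted(counts, reverse=True)):
        size = len(list(run))
        parts.append(str(value) if size == 1 else f"{value}({size})")
    return " ".join(parts)

def ngram_summary(cipher, n):
    """Distribution string plus the total number of repeats."""
    counts = ngram_counts(cipher, n).values()
    repeats = sum(c - 1 for c in counts)  # AB AB AB counts as 2 repeats
    return f"{format_distribution(counts)} [{repeats} repeats]"

unigrams = format_distribution(Counter("AABBCCD").values())
bigrams = ngram_summary("ABXABYAB", 2)
```

In the toy string `ABXABYAB`, the 2-gram AB occurs three times and four other 2-grams occur once each, so the summary reads `3 1(4) [2 repeats]`.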
even/odd repeated 2-gram spread: Two ciphers are made from the cipher text: The first has all even positions removed. The second has all odd positions removed. Then repeated 2-grams are counted for each. This measurement is the difference between them. Z340 has an unusually large difference.
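A sketch of the even/odd split, assuming positions are numbered from 1 and that the spread is reported as an absolute difference. Both of those are guesses about the details, and the function names are made up.

```python
from collections import Counter

def repeated_bigrams(seq):
    """Total repeats among adjacent pairs: (count - 1) for each 2-gram."""
    counts = Counter(zip(seq, seq[1:]))
    return sum(c - 1 for c in counts.values())

def even_odd_spread(cipher):
    """Difference in repeated 2-grams between the two half-ciphers."""
    odd_only = cipher[0::2]   # positions 1, 3, 5, ... (even positions removed)
    even_only = cipher[1::2]  # positions 2, 4, 6, ... (odd positions removed)
    return abs(repeated_bigrams(odd_only) - repeated_bigrams(even_only))

spread = even_odd_spread("AXBYAZBW")
```

In this toy string the odd positions spell `ABAB` (one repeated 2-gram) and the even positions spell `XYZW` (none), so the spread is 1.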
period with the most repeating 2-grams: Period at which the highest number of repeated 2-grams is observed.
Z340 has significantly more repeated 2-grams at some other periods than at period 1.
peak period 2-grams distribution: Distribution of repeated 2-gram counts at the period in which the highest 2-gram count is observed.
Example: 4 3(5) 2(24) 1(272) [37 repeats] <== This means there was a 2-gram with a count of 4, 5 2-grams with a count of 3, 24 with a count of 2, and 272 that only occurred once each, for a total of 37 repeats.
repeating 2-gram distribution for all periods: Repeating 2-grams are counted at each period from 1 to (length/2). The results are sorted in descending order.
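For the period measurements above, here is a sketch that assumes a "2-gram at period p" means the pair formed by a symbol and the symbol p positions after it (so period 1 is the ordinary adjacent 2-gram). If the spreadsheet instead rewrites the cipher at each period before counting, the numbers near the ends of the text could differ slightly. The function names are hypothetical.

```python
from collections import Counter

def repeats_at_period(cipher, period):
    """Repeated 2-grams when pairing each symbol with the one `period` later."""
    pairs = Counter(
        (cipher[i], cipher[i + period]) for i in range(len(cipher) - period)
    )
    return sum(c - 1 for c in pairs.values())

def repeats_by_period(cipher):
    """Repeat counts for every period from 1 to length/2."""
    return {p: repeats_at_period(cipher, p) for p in range(1, len(cipher) // 2 + 1)}

def peak_period(cipher):
    """Period with the most repeats; ties go to the smallest period."""
    by_period = repeats_by_period(cipher)
    return max(by_period, key=lambda p: (by_period[p], -p))

toy = "A1B2A3B4A5B6"
by_period = repeats_by_period(toy)
best = peak_period(toy)
distribution = sorted(by_period.values(), reverse=True)
```

The toy string has no repeated 2-grams at period 1 but three at period 2, which is the same kind of effect the peak period column is looking for.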
overall repeating fragment improvement: An estimate of the distribution of repeating fragments (ngrams, and patterns with wildcards, such as A?B??D). Positive numbers suggest there are more repeating fragments compared to Z340. Negative numbers suggest there are fewer repeating fragments compared to Z340.
period with best fragment improvement: The repeating fragment measurement is performed for all periods from 1 to (length/2). The period with the highest measurement is recorded.
best fragment improvement among all periods: The highest repeating fragment measurement observed for all periods.
repeating fragment improvement distribution for all periods: The distribution of repeating fragment measurements for every period from 1 to (length/2). Results are sorted in descending order.
unique per-line symbol counts for Olson lines: FBI’s Dan Olson observed that Lines 1-3 and 11-13 are much more random than the other lines. Each line has zero repeated symbols. This measurement counts the number of unique symbols in each line. Since Z340 has zero repeats in each line, its result is: 17 symbols * 6 lines = 102.
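A quick sketch of the Olson line count, assuming the cipher is laid out in rows of 17 symbols and lines are numbered from 1. The function name and the synthetic test cipher are only for illustration.

```python
def olson_unique_count(cipher, width=17, lines=(1, 2, 3, 11, 12, 13)):
    """Sum of the number of distinct symbols in each of the given lines."""
    rows = [cipher[i:i + width] for i in range(0, len(cipher), width)]
    # Lines that do not exist in a shorter cipher are simply skipped.
    return sum(len(set(rows[n - 1])) for n in lines if n - 1 < len(rows))

# Synthetic 13-line cipher in which no line repeats a symbol.
no_repeats = list(range(17)) * 13
perfect = olson_unique_count(no_repeats)

# Same cipher with one repeated symbol introduced into line 1.
one_repeat = list(no_repeats)
one_repeat[1] = one_repeat[0]
imperfect = olson_unique_count(one_repeat)
```

The repeat-free version scores the maximum of 17 * 6 = 102, and each repeated symbol inside one of the six lines lowers the score by one.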
jarlve nonrepeat score 1: Jarlve’s "nonrepeat" measurement. For each position of the ciphertext, it adds the length of the longest string starting there that contains no repeated symbols. Larger values suggest greater use of cycles during homophonic substitution, since the cipher author is trying to limit symbol reuse.
jarlve nonrepeat score 2: Jarlve’s alternate "nonrepeat" measurement. Smaller values suggest greater use of cycling symbols during homophonic encipherment.
More info: viewtopic.php?p=42120#p42120
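For nonrepeat score 1, a straightforward reading of the description is sketched below: start at each position, walk forward until a symbol repeats, and add up the lengths. This is only an interpretation of the wording; Jarlve's own implementation (see the linked post) may differ in its details.

```python
def nonrepeat_score(cipher):
    """Sum, over all start positions, of the longest repeat-free run."""
    total = 0
    for start in range(len(cipher)):
        seen = set()
        for symbol in cipher[start:]:
            if symbol in seen:
                break  # the run ends just before the first repeated symbol
            seen.add(symbol)
        total += len(seen)
    return total

cycled = nonrepeat_score("ABCA")    # runs of 3, 3, 2 and 1
repeated = nonrepeat_score("AAAA")  # every run has length 1
```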
perfect cycle score (L=2): A score that reflects how "perfectly" homophone cycles of length 2 are appearing in the ciphertext.
Higher scores indicate greater use of regular cycles. Lower scores suggest homophone symbol assignments are more random.
perfect cycle score (L=3): A score that reflects how "perfectly" homophone cycles of length 3 are appearing in the ciphertext.
Higher scores indicate greater use of regular cycles. Lower scores suggest homophone symbol assignments are more random.
jarlve m_2s_cycles: Jarlve’s length 2 homophone cycle measurement. Higher values suggest more regular assignment of symbols. Lower values suggest symbol assignments are more random.
More info: viewtopic.php?p=43770#p43770
pivot score: "Pivots" are defined here: http://zodiackillerciphers.com/wiki/ind … tle=Pivots
If the cipher has a pair of pivots, the pivot score gets the frequency of each symbol involved in the pivots, and multiplies them all together.
A lower score suggests a lower probability of the pivots occurring by random chance. A higher score suggests a greater probability of the pivots occurring by random chance.
prime phobia: This measurement has two counts.
The first is the number of times the most frequent symbol lands on a position that is not prime.
The second is the number of times the second most frequent symbol lands on a position that is not prime.
Z340 has unexpectedly high counts for both (23 out of 24, and 11 out of 12).
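A sketch of the prime phobia counts, assuming positions are numbered from 1 (so position 1 counts as not prime). The function names are made up, and ties for the most frequent symbol are broken by first appearance, which may not match the spreadsheet.

```python
from collections import Counter

def is_prime(n):
    """Simple trial-division primality test."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def prime_phobia(cipher):
    """For the two most frequent symbols: (non-prime landings, occurrences)."""
    results = []
    for symbol, total in Counter(cipher).most_common(2):
        positions = [i for i, s in enumerate(cipher, start=1) if s == symbol]
        non_prime = sum(1 for p in positions if not is_prime(p))
        results.append((non_prime, total))
    return results

counts = prime_phobia("ABCAXAYB")
```

In the toy string, A sits at positions 1, 4 and 6 (none prime) and B at positions 2 and 8 (one prime), giving 3 out of 3 and 1 out of 2.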
solution: The plaintext for the cipher (if known) (note: this part is not yet fully filled out)
url: Source or info for the cipher
I like what you’ve done here doranchak.
Some observations that I find interesting: only doranchak’s evolution ciphers also have pivots, smokie’s wildcard ciphers correlate well with the 340.
Thanks, Jarlve. It’s nice to have all the test ciphers in one place – very useful for running tests on specific kinds of ciphers.
With so many measurements, it’s hard to optimize a generated cipher for every single one. I’m going to have to think about ways to combine or prioritize individual measurements. Or find some optimization algorithm that is designed specifically for a large number of objectives. It is the https://en.wikipedia.org/wiki/Curse_of_dimensionality !