Oh I see. Well, in general the operation does not improve things on my end.
That’s not the goal either. A demonstration, nothing more. But I have some ideas based on "interlocked" ciphers. Let’s see how much time I can invest over the next few days.
Sorry to be so persistent on the matter.
That’s exactly what we need…different points of view! If we all had the same view, that would not be an efficient way to look for a solution. Perhaps I am completely barking up the wrong tree, but I may also find a new clue tomorrow. Who knows?
Thinking up possibilities and discussing them is what takes us forward. And that’s exactly why I appreciate this forum.
Indeed. I really hope that you find something new!
I took a look at the effects of the IoC and encoding randomization on the unique sequences. Each graph/mountain is the average of 10,000 sequential homophonic substitution ciphers. None of the options listed is a good match for the 340. Either something funny is going on in the 340 or it is a significant outlier.
1) A higher IoC compresses the mountain horizontally:
2) More encoding randomization also compresses the mountain horizontally:
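As a point of reference, here is a minimal sketch of how such test ciphers might be generated. Treating "encoding randomization" as the probability of picking a random homophone instead of the next one in the cycle is my assumption, not necessarily how the plots above were produced.

```python
import random

def encode(plaintext, homophones, randomization=0.0, rng=random):
    """Sequential homophonic substitution with optional randomization.

    homophones: dict mapping each plaintext letter to its list of symbols.
    randomization: probability of picking a random homophone instead of
    the next symbol in the cycle (assumed meaning of the parameter).
    """
    pointer = {letter: 0 for letter in homophones}  # per-letter cycle position
    ciphertext = []
    for letter in plaintext:
        group = homophones[letter]
        if rng.random() < randomization:
            ciphertext.append(rng.choice(group))        # break the cycle
        else:
            ciphertext.append(group[pointer[letter]])   # next in the cycle
            pointer[letter] = (pointer[letter] + 1) % len(group)
    return ciphertext
```

With randomization=0 the cycles are perfect: encoding "ABCABC" with A→{1,2}, B→{3}, C→{4,5,6} yields 1 3 4 2 3 5. Raising the parameter toward 1 approaches fully random homophone choice.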
Yeah. These images are from December 2014, but I have not tried to interpret the patterns:
Nice – have you done a similar plot of Z408?
The mountain most offset to the right is generally the true direction of the sequential homophonic substitution.
And this can be generalized to this aspect of the substitution: When encoding, symbols undergo a "spreading out" effect (in the normal reading direction) due to cycling of symbols within groups of homophones. Is that correct?
Does the effect still occur for homophonic substitution that is completely randomized (i.e., no cycles)? You probably already answered that long ago.
EDIT: Just studied your "average mountain" plots and I would interpret your results like this:
1) A cipher with more repetition of symbols (higher IoC) will have a harder time maintaining longer non-repeating sequences.
2) A cipher with more randomization in homophone groups will also have a harder time maintaining longer non-repeating sequences, since members of the groups are allowed to repeat sooner than if they were fully sequential.
So this makes some intuitive sense. Does that match your interpretations?
Here are some new plots.
340:
18 17 16 15 14 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 22 21 20 19 18 17 18 17 16 15 14 13 12 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 8 10 9 8 7 6 5 4 3 2 1 7 15 14 13 12 11 10 9 17 16 15 14 13 12 11 17 16 15 14 13 13 12 11 10 9 8 19 18 17 16 15 14 13 12 11 10 19 18 18 17 16 15 14 13 12 11 10 9 8 7 17 16 15 15 14 13 12 11 10 9 8 7 6 5 11 10 9 8 7 8 7 6 5 4 3 2 18 17 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 12 11 10 18 17 16 15 14 13 23 22 21 20 19 20 19 18 17 16 18 17 16 15 14 13 14 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 13 12 11 19 18 17 16 15 14 13 12 11 10 9 17 16 17 16 15 14 13 12 11 10 9 8 7 6 5 7 6 5 4 3 2 1 17 22 21 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 12 12 11 10 9 8 8 7 6 5 4 3 2 1 5 4 3 17 16 15 21 20 19 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 18 17 16 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
408:
5 4 3 2 11 10 9 8 7 6 5 4 3 19 18 17 16 15 14 13 12 11 10 13 12 11 10 9 8 7 6 5 4 14 13 12 11 24 23 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 27 26 25 24 23 22 21 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 17 16 15 14 13 17 16 15 14 13 18 17 16 17 16 15 14 13 12 11 10 9 8 7 6 5 9 8 7 6 6 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 10 9 8 7 6 5 32 34 33 32 33 32 31 30 30 29 28 27 26 25 24 23 22 21 27 26 25 25 24 23 22 21 20 19 19 18 17 16 15 14 13 12 11 13 12 11 10 9 8 7 6 23 22 21 20 19 18 30 29 28 27 29 28 27 26 25 24 23 24 23 22 21 22 21 21 20 19 18 17 19 18 26 25 24 23 22 21 20 19 18 17 16 16 15 14 13 12 11 10 9 16 15 14 13 12 11 10 9 8 7 6 5 4 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 11 11 10 9 8 7 6 5 4 3 2 1 13 12 11 10 9 8 7 6 5 4 3 2 11 10 9 8 7 6 5 4 3 2 15 14 13 12 11 10 9 8 7 6 5 4 6 5 4 12 13 12 11 10 9 8 7 6 5 11 10 9 14 13 12 11 12 11 10 9 8 7 6 5 4 3 14 13 12 11 10 9 10 9 8 7 6 5 4 3 2 1 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 17 16 15 14 13 12 11 10 9 10 9 8 7 8 9 8 13 12 11 10 9 10 9 8 7 6 5 4 3 2 1
The mountain most offset to the right is generally the true direction of the sequential homophonic substitution.
And this can be generalized to this aspect of the substitution: When encoding, symbols undergo a "spreading out" effect (in the normal reading direction) due to cycling of symbols within groups of homophones. Is that correct?
Yes. Symbols spread out evenly throughout and around the middle point of the cipher. It is very easy to see and determine the properties of sequential homophonic substitution by scaling down the problem, for instance "ABCABCABC" versus "CAABACBCB". Of course, all these properties are emergent from one single phenomenon. But there seems to be no single measurement/property that perfectly captures the phenomenon in practice, because of the randomness of the underlying plaintext.
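The scaled-down example can be quantified with a few lines of code. The average gap between repeats of the same symbol is just one simple stand-in for the "spreading out" effect, not a measure used in the thread.

```python
def mean_gap(text):
    """Average distance between consecutive occurrences of each symbol."""
    last, gaps = {}, []
    for i, symbol in enumerate(text):
        if symbol in last:
            gaps.append(i - last[symbol])
        last[symbol] = i
    return sum(gaps) / len(gaps)

print(mean_gap("ABCABCABC"))  # 3.0 -- perfectly even spacing from cycling
print(mean_gap("CAABACBCB"))  # 2.5 -- repeats are allowed to bunch together
```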
Does the effect still occur for homophonic substitution that is completely randomized (i.e., no cycles)? You probably already answered that long ago.
No. These are quite random.
EDIT: Just studied your "average mountain" plots and I would interpret your results like this:
1) A cipher with more repetition of symbols (higher IoC) will have a harder time maintaining longer non-repeating sequences.
2) A cipher with more randomization in homophone groups will also have a harder time maintaining longer non-repeating sequences, since members of the groups are allowed to repeat sooner than if they were fully sequential.
So this makes some intuitive sense. Does that match your interpretations?
Yes. That is a very good interpretation and it is indeed as intuitive and simple as that.
Unique sequences, 340 versus 408:
408 without the last 8 rows; this graph/mountain looks a lot more natural:
408 with and without the last 8 rows superimposed on top of each other; the area colored red is the difference between the two. This area is more left-shifted and less significant: the last 8 rows do not add very long sequences to the result. Just a simple example of the system:
It is odd that the 340 has such a high spike that is so much shifted to the right.
Under the hypothesis of a sequential homophonic substitution with 25% encoding randomization, the length-17 spike on its own is a ~2.5 sigma observation. Not strong, but I still attach some value to it.
But there seems to be no single measurement/property that perfectly captures the phenomenon in practice, because of the randomness of the underlying plaintext.
What about unigram distance? http://zodiackillersite.com/viewtopic.p … 902#p53902
It seems to capture two phenomena: Anomalous gaps between symbols, and the overall spreading out of symbols.
My own implementation of the measurement in randomization tests shows 2.6 sigma for the Z408 and 4.4 sigma for the Z340.
Cycles have many properties. Unigram distance does not capture the goodness of the cycles, nor the increase in bigrams. If unigram distance is considered as a single sum, then there is the problem that the ratio of 1) anomalous gaps and 2) overall symbol spread is unknown. We know that in the case of the 340 the high unigram distance is caused by the anomalously large gaps of some symbols plus sequential homophonic substitution. But a high unigram distance is possible without anomalously large gaps.
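For illustration, here is one plausible reading of the measurement together with a randomization test of the kind described. The exact definition of unigram distance is in the linked post; the version below (total distance between consecutive occurrences of each symbol) is an assumption.

```python
import random
import statistics

def unigram_distance(text):
    """Assumed form: total gap between consecutive occurrences of each symbol."""
    last, total = {}, 0
    for i, symbol in enumerate(text):
        if symbol in last:
            total += i - last[symbol]
        last[symbol] = i
    return total

def sigma_vs_shuffles(text, trials=1000, seed=0):
    """How many standard deviations the observed value sits above the mean
    of the same statistic over random shuffles of the text."""
    observed = unigram_distance(text)
    symbols = list(text)
    rng = random.Random(seed)
    samples = []
    for _ in range(trials):
        rng.shuffle(symbols)
        samples.append(unigram_distance(symbols))
    return (observed - statistics.mean(samples)) / statistics.stdev(samples)
```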
Just for my general understanding:
The following pseudo-code is used to generate these statistics:
for i = 0 to 339:
    length = getLongestUniqueSequence(i)
    table[length] += 1
We determine the longest sequence for each position in the cipher and put the result in a table. The further the position progresses, i.e. the larger the variable "i" becomes, the shorter the maximum achievable sequence lengths. With 63 different symbols there can theoretically be a maximum unique sequence length of 63. However, this length can only be reached at positions 0 to 277. From position 278 only a maximum unique sequence length of 62 is possible, from position 279 only 61, and so on.
This in turn means that towards the end of the cipher the short sequences also increase. Theoretically, therefore, most of the "left half of the mountain" would represent the end of the cipher.
Shouldn’t this be taken into account in the calculation by means of a weighting?
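The pseudo-code above might be fleshed out as follows; this is a sketch based on my reading of the description, with the helper name kept from the pseudo-code:

```python
def getLongestUniqueSequence(cipher, i):
    """Length of the longest run of distinct symbols starting at position i."""
    seen = set()
    for symbol in cipher[i:]:
        if symbol in seen:
            break
        seen.add(symbol)
    return len(seen)

def mountain(cipher):
    """Histogram: how often each longest-unique-sequence length occurs."""
    table = {}
    for i in range(len(cipher)):
        length = getLongestUniqueSequence(cipher, i)
        table[length] = table.get(length, 0) + 1
    return table
```

On a toy cipher such as "ABCA" this gives {3: 2, 2: 1, 1: 1}: positions 0 and 1 each start a run of 3 distinct symbols, position 2 a run of 2, and position 3 a run of 1.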
You could exclude sequences that terminate at the end of the cipher but I do not think it is an issue.
I am considering excluding sequences that terminate at the end of the cipher because their length is uncertain. Good point Largo. Here is the difference:
What about normalizing the ones at the end by the fraction of the max possible length?
For example, Z340 has a max possible unique length of 63 due to its alphabet.
Towards the beginning, a unique string of length 10, out of a possible 63, would score: 63/63 = 1. Add 1 to the "10" column in the histogram.
But let’s say towards the end, a unique string of length 10 is found, but a max length of only 40 is possible. So it would score: 40/63 ≈ 0.63. Add 0.63 to the "10" column in the histogram.
Not sure how much it would matter.
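A sketch of that weighting, assuming each position's contribution is scaled by max_possible/alphabet_size exactly as in the example above:

```python
def weighted_mountain(cipher, alphabet_size):
    """Mountain histogram with end-of-cipher positions down-weighted by the
    fraction of the maximum unique length still reachable from there."""
    table = {}
    n = len(cipher)
    for i in range(n):
        seen = set()                      # longest run of distinct symbols at i
        for symbol in cipher[i:]:
            if symbol in seen:
                break
            seen.add(symbol)
        length = len(seen)
        max_possible = min(alphabet_size, n - i)
        weight = max_possible / alphabet_size
        table[length] = table.get(length, 0) + weight
    return table
```

For the Z340 this would be called as weighted_mountain(cipher, 63).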