Evolving 2 symbol cycles.

Largo

(@largo)

Posts: 454

Honorable Member

Topic starter

I just did a test to check my assumption for the strange cycle behavior of z340 ("interlocked cipher"). If there are interspersed symbols in z340 that do not belong to the LTRB order, then it may be possible to identify them using a genetic algorithm. The first test was an ideal happy sunshine case:

0ABCDEFABCDEFABCD
E1FABCDEFABCDEFAB
CD2EFABCDEFABCDEF
ABC3DEFABCDEFABCD
EFAB4CDEFABCDEFAB
CDEFA5BCDEFABCDEF
ABCDEF6ABCDEFABCD
EFABCDE7FABCDEFAB
CDEFABCD8EFABCDEF
ABCDEFABC9DEFABCD
EFABCDEFAB0CDEFAB
CDEFABCDEFA1BCDEF
ABCDEFABCDEF2ABCD
EFABCDEFABCDE3FAB
CDEFABCDEFABCD4EF
ABCDEFABCDEFABC5D
EFABCDEFABCDEFAB6
7CDEFABCDEFABCDEF
A8BCDEFABCDEFABCD
EF9ABCDEFABCDEFAB

In this cipher there is a perfect order of ABCDEF, which was interrupted once in each line by a different symbol (0-9). My genetic algorithm has identified the pattern in about 50 generations. As fitness function the 2-Cycle-Score from Jarlve was used.
Next I took the first 320 characters of z408 (16×20) and substituted them 100% cyclically. Then I inserted an additional symbol in each line to interrupt the cycle.

First 320 letters from z408, 16×20

h:cjajCuU5AEYxXY
:39ybdro+hLc1K=;
veFISqCw5M;Xn0Ir
7s2DRjhuUcAJlC:k
Ei;P5qNpgI=GVaoL
4xbZv13;d7hKwFy;
XMsBDRJ+t=SoiAc;
ZuXIdU:N=jCuU1X;
0L25qEJhQPK;gwpa
;=MsNFnc:uC7Ex-Y
3GyRe+5Lho0QPA9g
wsaVN2DqJxLwc7ET
Xrtn=bj1XIIlCspi
J5GUNF34yKLYZVw=
IhscMN2dLlp+RCH5
0hlc:u9Ptg4XnACq
YDGik5eaZ7BdU:wF
xh2DQ3jcuUyHlC:u
9+b=;0;ToUiQP15l
h:uRXsEcQgT=v;TA

Added 20 Nulls (one per row)

Ah:cjajCuU5AEYxXY
:B39ybdro+hLc1K=;
veCFISqCw5M;Xn0Ir
7s2DDRjhuUcAJlC:k
Ei;PE5qNpgI=GVaoL
4xbZvF13;d7hKwFy;
XMsBDRGJ+t=SoiAc;
ZuXIdU:HN=jCuU1X;
0L25qEJhIQPK;gwpa
;=MsNFnc:JuC7Ex-Y
3GyRe+5LhoK0QPA9g
wsaVN2DqJxLLwc7ET
Xrtn=bj1XIIlMCspi
J5GUNF34yKLYZNVw=
IhscMN2dLlp+RCOH5
0hlc:u9Ptg4XnACPq
YDGik5eaZ7BdU:wFQ
Rxh2DQ3jcuUyHlC:u
9S+b=;0;ToUiQP15l
h:TuRXsEcQgT=v;TA

On this picture you can see more clearly what is meant:

The 16×20 cipher has a 2-symbols cycling score of 3706. After inserting the nulls the score is 3426. Now I ran my genetic algorithm again. After only 47 generations a score of 4045 was reached. This is significantly higher than the score of the perfect cyclic cipher.

Conclusion:
It does not seem possible to identify interspersed symbols within a homophonic substitution in this way. A perfect cyclic cipher does not generate the maximum reachable score of the 2 symbol cycle scoring. So a different approach would be needed here. Can anyone think of a better solution? Or does someone see an obvious error in my analysis?

Translated with http://www.DeepL.com/Translator

Posted : September 8, 2018 12:30 pm

Jarlve

(@jarlve)

Posts: 2547

Famed Member

Hey Largo,

Your first text is perfect except for the nulls, so it is possible to find and be sure of the nulls since changes to all the other symbols cannot improve things. "ABCABCABC" is perfect. With sequential homophonic substitution ciphers, even when they are "perfectly cyclic", they are not perfect as in your first text and all sort of improvements are possible. This then becomes a big deal with a very large search space. Either decrease the search space or use a stronger statistic. Imagine trying to find a transposition while allowing all 340 symbols to be swapped or uniquely rearranged, the statistic would always fall short.

AZdecrypt

Posted : September 8, 2018 10:17 pm

Quicktrader

(@quicktrader)

Posts: 2598

Famed Member

As a strict homophone substitution believer I’d say:

"Of course does the cipher behave differently if you add additional symbols/homophones".

It then automatically should get a higher or lower score. ‘Higher’ could be accidential or, by adding symbols in each line, due to strengthening the overall sequential character of the cipher.

QT

*ZODIACHRONOLOGY*

Posted : September 9, 2018 1:19 pm

Largo

(@largo)

Posts: 454

Honorable Member

Topic starter

With sequential homophonic substitution ciphers, even when they are "perfectly cyclic", they are not perfect as in your first text and all sort of improvements are possible.

Thank you for the explanation. This confirms my assumption that in this way it is impossible to find the positions of the nulls. If there is a null per line, then there are 17 possible positions per line. With 20 lines this makes 17^20 = 4.064.231.406.647.572.522.401.601 possibilities. In addition, it is even quite easy to sort the symbols "by hand" so that a better score is achieved.

As a strict homophone substitution believer I’d say:

"Of course does the cipher behave differently if you add additional symbols/homophones".

Even if z340 is a pure homophonic substitution, there must be a reason for the interrupted cycles, no? If we could find the locations where the cycles are interrupted, we would be one step further in any case (whether pure homophonic substitution, underlying transposition or interlocked cipher). Pity it doesn’t work with my approach.

Translated with http://www.DeepL.com/Translator

Posted : September 9, 2018 1:55 pm

Quicktrader

(@quicktrader)

Posts: 2598

Famed Member

Even if z340 is a pure homophonic substitution, there must be a reason for the interrupted cycles, no? If we could find the locations where the cycles are interrupted, we would be one step further in any case.

That is correct.

The following two symbols, for example, could both represent the same (frequent) letter (‘w’ and some type of filled out ‘circle’). They only occur in the first/last vs. the middle part of the cipher and may or may not belong together. The ‘w’ symbol (here represented by ‘A’) even occurs four times in a string of moreless 27 symbols (14.8%; line 17 -19). Except occurring once in line 2 and line 3 it does not occur anywhere else.

Each letter has either one or a group of homophone(s). Which homophone is actually selected depends on both, the encryption process as well as the actual cleartext (if there is only two K in the cleartext, the homophone won’t be used until the letter occurs). The encryptor can either choose a sequence, shuffle the homophones or use a group of homophones for certain parts of the cipher.

The previous can also vary for each letter (and most likely does). It is questionable if Z had used any solid ‘system’ in this cipher. Even in the 408, although using sequences, he got mixed up with those sequences towards the end of the cipher.

I always asked myself why that happened but I guess he either wanted to complicate the cipher more or simply got confused during the set-up of the cipher. When ‘creating’ the 340, he already had known about it being complicated to keep up the sequences for ~20 letters simultaneously..so he discarded the sequences and rather used ‘areas’ where he used the homophones. This, however, is not always the case…some homophones occur at almost any part of the cipher (e.g. the + symbol).

QT

*ZODIACHRONOLOGY*

Posted : September 9, 2018 2:17 pm

Jarlve

(@jarlve)

Posts: 2547

Famed Member

Even if z340 is a pure homophonic substitution, there must be a reason for the interrupted cycles, no? If we could find the locations where the cycles are interrupted, we would be one step further in any case (whether pure homophonic substitution, underlying transposition or interlocked cipher). Pity it doesn’t work with my approach.

The problem is that we cannot even say that the cycles are interrupted though that is one possibility. We need to formulate some hypotheses for the 340 being less cyclic than the 408 and then see if we can distinguish between them. Do that before assuming nulls imo.

AZdecrypt

Posted : September 9, 2018 4:25 pm

Largo

(@largo)

Posts: 454

Honorable Member

Topic starter

The problem is that we cannot even say that the cycles are interrupted though that is one possibility. We need to formulate some hypotheses for the 340 being less cyclic than the 408 and then see if we can distinguish between them. Do that before assuming nulls imo.

I expressed myself somewhat misleadingly. I don’t necessarily mean "nulls" in the sense of blenders. These "nulls" can just as well be interlocked ciphers. Like the example linked by doranchak:
viewtopic.php?f=81&t=3982&start=60#p64819

My test was not based on any particular assumption. Rather, it was about finding the positions where a cycle is interrupted (if that applies at all to z340, as you just mentioned).

Posted : September 9, 2018 4:35 pm

Quicktrader

(@quicktrader)

Posts: 2598

Famed Member

To compare, the 408 (all n-gram symbols marked blue):

click to see full picture

QT

*ZODIACHRONOLOGY*

Posted : September 9, 2018 6:13 pm

Largo

(@largo)

Posts: 454

Honorable Member

Topic starter

The encryptor can either choose a sequence, shuffle the homophones or use a group of homophones for certain parts of the cipher.
…
When ‘creating’ the 340, he already had known about it being complicated to keep up the sequences for ~20 letters simultaneously..so he discarded the sequences and rather used ‘areas’ where he used the homophones. This, however, is not always the case…some homophones occur at almost any part of the cipher (e.g. the + symbol).

In my opinion, there are many possible explanations for the distribution of homophones. If the plaintext uses unusual words, this is also reflected in the substitution. Such phenomena can also occur if the key does not really match the actual letter frequency of the plaintext. Another possibility would be that several keys were used or the key was rotated after n letters. A transposition can be the reason too.

For example, if you take a book from Projekt Gutenberg and create thousands of ciphers (always in 340 sizes, start position plus 1), you will always get strange examples with strange statistics. If you then look at the plain text, you can usually see the reason for the strange behavior.
Deciphering a homophonic substitution is only made more difficult by variable cycles if one tries to solve the cipher manually. Current solvers have no problem with this and even completely random cycles are no challenge. I don’t want to categorically rule out that 340 is an LRTB substitution like z408, but I’m extremely skeptical about that. In fact, if there is no transposition, different keys or other additional steps involved, z340 should have been solved long ago.

Do you have sample ciphers that are classically substituted but cannot be solved by AZDecrypt? I would almost certainly rule out a "straight homophonic substitution" variant like the one in z408.

Translated with http://www.DeepL.com/Translator

Posted : September 10, 2018 1:20 pm

Quicktrader

(@quicktrader)

Posts: 2598

Famed Member

IMO, Z had definitely used a reference table for his letter frequencies. For certain letters he had used more homophones although the letter acutually occurs less often in his cleartext.

B vs. D (B occurs more often but has only one instead of two homophones)
F vs. G (G occurs more often but has only one instead of two homophones)
H vs. M (H and M occur equally often but a different amount of homophones was chosen)
L vs. A (A has four homophones instead of three, although the letter L actually occurs more often in his cleartext)

Thus, he must have had some type of frequency table looking like

Homophones / Letters
7: E
4: TAOINS
3: LR
2: DHF
1: BCGKMPUVWXY

while according to his cleartext a frequency table would actually look like this:

Homophones / Letters
7: E
5: TI
4: OL
3: ANSR
2: GHM
1: BCKPUVWXY

The differences are obvious. Also, according to the first table, the original letter frequency table Z had used must have had a frequency order like this:

E.TAOINS.LR.DHF.CUMPGWYBVKX

while a standard frequency table would usually look like this:

E.T.AO.INS.R.H.LD.CUM.FPGWYB.VKXJQZ
(e.g. ‘Scott Bryce’)

Something interesting: By using more homophones for the letter ‘L’, Z clearly had ‘preferred’ the letter ‘L’ over the letter ‘H’.

He also did so with the letter ‘F’ over the letter ‘C’. Nowadays, the opposite would be expected. To rank the letter ‘L’ more frequent than the letter ‘H’ is unusual. This might comply with his ceartext, but as we’ve seen previously, his frequency table did not derived from that one. It is possible, however, that Z had actually counted his own letters from different texts he had written. Those indeed might have had a more frequent ‘L’ than ‘H’.

In any case, Z had sort of a ‘weak’ reference table from either himself or some crappy cryptography book / magazine.

Well, besides that I still disagree (slightly) regarding the 340 not being a regular substitution cipher. However, I’ve been proven to be wrong regarding the possibility to solve such a complex cipher. In fact, we don’t know. What is interesting, however, is that in the 340 the bigrams/trigrams indeed occur mostly in the first part of the cipher, while they seem to be distributed quite equally in the 408.

Maybe we should start to think about what could cause such an irregularity? Why could bigrams occur less often towards the end of such a cipher?

QT

*ZODIACHRONOLOGY*

Posted : September 10, 2018 3:58 pm

Largo

(@largo)

Posts: 454

Honorable Member

Topic starter

Thank you Quicktrader for the detailed description of the key frequencies! I will take a closer look at this, it looks really interesting.

Well, besides that I still disagree (slightly) regarding the 340 not being a regular substitution cipher. However, I’ve been proven to be wrong regarding the possibility to solve such a complex cipher. In fact, we don’t know. What is interesting, however, is that in the 340 the bigrams/trigrams indeed occur mostly in the first part of the cipher, while they seem to be distributed quite equally in the 408.

Maybe we should start to think about what could cause such an irregularity? Why could bigrams occur less often towards the end of such a cipher?

A few months ago I published a tool that allows you to create ciphers from ebooks. These can be filtered according to certain criteria. For example, Bigram discrepancies between the upper and lower half.
viewtopic.php?f=81&t=3709&p=58464

For testing I just took "Pride and Prejudice" from Project Gutenberg. Here one of the results as an example:

Ciphertext:

8q8rSsc0!OTFQUP7A
jtkWxmhbKyc7tzmVO
8K8d!g51B2nGH!P6J
CXYID""!Du!O43eFo
C0DL1P!O6Sxpbl2Pv
OPLTqu4ZCMwr7sU!O
uutVPuJcSTAdvoBN3
xCW8e!!O58BHIMt7m
Uwn2by6XfcNjd3dGH
eKkCbJY5VoClOLP0F
LO88zPDZ3OJcAM0dA
N!P68xpL!O4Sue1zq
UAWjrblBXxeIGMP2g
sbfGt3OFmrQ!P5cNj
YOC8dUK!P6Dxe7skP
Lo8B0y!O4uVOJ1zDX
7pS!lb!D2PIj!O5AM
3xqHBgTcU!!P6FLO8
0yd1CWzP4IkMr7sTX
QteF2O!P5bvcDNdKl

Plaintext:

WEWEREATYORKPROVI
DEDSHECANHAVEHERO
WNWAYBUTITELLYOUM
ISSLIZZYIFYOUTAKE
ITINTOYOURHEADTOG
OONREFUSINGEVERYO
FFEROFMARRIAGEINT
HISWAYYOUWILLNEVE
RGETAHUSBANDATALL
ANDIAMSUREIDONOTK
NOWWHOISTOMAINTAI
NYOUWHENYOURFATHE
RISDEADISHALLNOTB
EABLETOKEEPYOUAND
SOIWARNYOUIHAVEDO
NEWITHYOUFROMTHIS
VERYDAYITOLDYOUIN
THELIBRARYYOUKNOW
THATISHOULDNEVERS
PEAKTOYOUAGAINAND

Bigrams upper half: 6
Trigrams lower: 0

Bigrams upper half: 20
Trigrams lower half: 5

Another example:

Ciphertext:

jxb7ncSodWOKefGpb
LX8rT0PvA7scKk1zb
2BYxP4HlftZOTmcZP
LdfIndW3ObjJC0D3g
50A8PMkoUyO8GPNw!
O48O5Ilzc7pvPKqOL
Bu!P6xdkfrrNGqu12
O!P6VXrHuD8ONlsU8
ytK!P58O6Ijzd7pYQ
PFqMAuBxckMO3cZFr
l!P4J!VsWPG50CONP
u1zdKFDKw!O6uPS!O
4TFALjMqXZ3PH!kBe
zblhmU0cANI!vVnd1
ouupi23OPJ5hxCeJc
uSeBjuPV8yb1gqiPJ
tWPu2xmJPScICuO6T
hPJuOU3XQVDKvWuSP
JdfSneixOuQTPJAXp
uOUBP4vx0LO12Oyb7

Plaintext:

DHAVEAREASONABLEA
NSWERTOGIVEANDTHA
TISHOULDBESOREASO
NABLEASTOADMITITB
UTIWONDERHOWLONGY
OUWOULDHAVEGONEON
IFYOUHADBEENLEFTT
OYOURSELFIWONDERW
HENYOUWOULDHAVESP
OKENIFIHADNOTASKE
DYOUMYRESOLUTIONO
FTHANKINGYOUFORYO
URKINDNESSTOLYDIA
HADCERTAINLYGREAT
EFFECTTOOMUCHIAMA
FRAIDFORWHATBECOM
ESOFTHEMORALIFOUR
COMFORTSPRINGSFRO
MABREACHOFPROMISE
FORIOUGHTNOTTOHAV

Bigrams upper half: 8
Trigrams lower: 1

Bigrams upper half: 17
Trigrams lower half: 1

In my opinion, the simplest explanation for such nGram discrepancies is the underlying plaintext. This can also be seen in the examples just shown. We prefer to search for symmetries and are amazed when they are broken. NGrams are not by nature evenly distributed over a cipher. Poetry, poems and song-texts have a certain tendency to generate such statistics. Zodiac also liked to repeat himself and his texts often had repetitive sections.
Of course, this does not automatically exclude another explanation. Nevertheless, here again I do not believe that z340 is behaving strangely in this respect. Rather, I will first stick to the simplest explanation.

Translated with http://www.DeepL.com/Translator

Posted : September 10, 2018 5:33 pm

Quicktrader

(@quicktrader)

Posts: 2598

Famed Member

NGrams are not by nature evenly distributed over a cipher. Poetry, poems and song-texts have a certain tendency to generate such statistics.

Correct..not to forget the accidential use of homophones (and bigrams finding together, like drawing balls of different color from a bowl and accidentially getting three yellow ones in a row..).

This is actually why I still stick on the ‘standard’ homophone substitution. To be true, it seems to not even be easy to crack the Dorabella cipher, which most likely is actually no more than a regular substitution..promising results but, so far, nothing else. The 340 won’t be easier..this is why I don’t see need to do transposition etc., but there are different approaches, which is good. The winner takes it all. I hereby offer a total of $100 for the correct solution . Hey Zodiac, wanna earn some bucks?

QT

*ZODIACHRONOLOGY*

Posted : September 10, 2018 11:06 pm

Zodiac Discussion Forum

Evolving 2 symbol cycles.