Zodiac Discussion Forum

Unigram distance cu…
 
Notifications
Clear all

Unigram distance curiosity

130 Posts
10 Users
0 Reactions
45.5 K Views
Jarlve
(@jarlve)
Posts: 2547
Famed Member
Topic starter
 

Z340 has 10 symbols that occur only in the first or last six lines

The 340 has 10 unique symbols occupying 39 positions that occur only in the first and last six lines.

The 10 exclusive symbols occupy 39 positions of Z340. This is a 1.66 observation compared to randomizations. About 1 in 16 shuffles had at least that many positions occupied by symbols exclusive to those lines.

I posted this table earlier (see codebox), it shows a sigma of 2.37 for the 39 positions while you have it at 1.66. 100000 randomizations: mean=20.27, variance 62.45, standard deviation=7.90, sigma of 39: 2.37.

Row division: top-bottom unique unigram count, sigma
--------------------------
1: 0, -0.20
2: 0, -0.45
3: 4, 1.02
4: 15, 2.93 <---
5: 20, 1.89
6: 39, 2.37 <---
7: 43, 0.30
8: 94, 1.24
9: 147, -0.22

AZdecrypt

 
Posted : November 20, 2017 8:47 pm
Jarlve
(@jarlve)
Posts: 2547
Famed Member
Topic starter
 

Moonrock, out of all his cycle types considered regional cycling for the 340.

My hypothesis was that the 340 cipher used a combination of regional cycles and semi-regional cycles…

Hey moonrock,

Thanks for your explanation. I have some questions for you if you do not mind.

1. What statistics have you used to compare your ciphers versus the 340?

2. You once made a nifty list of different cycle types. Did you source these from somewhere? If not, what prompted you to come up with it?

3. Your definition of regional cycling seems very interesting in relation to some of the observations in this thread. Namely, a fair amount of symbols in the 340 do not appear in the middle 8 rows. Do you think this could be caused by regional cycling? Could you give some examples of a regional cycle.

4. Could you give another example of a concurrent cycle. I suppose the order between the sub cycles does not really mater as long as each individual sub cycle cycles? In your example (see quote below) I can see the 1 incrementing its position and the 5 is alternating between the 4th and 5th position.

5. I think some of us would like to work on different cycle types. Do you mind if I create a thread for this which lists your original cycle types and some others?

1. The perfect cycle, which has substitutions arranged in an unchanging pattern throughout the entire cipher: 12345 – 12345 – 12345 – 12345.
2. The increasingly random cycle, which has substitutions start off in an organized cycle and gradually become random: 12345 – 12345 – 12435 – 24153.
3. The decreasingly random cycle, which is the opposite of the increasingly random cycle: 24153 – 12435 – 12345 – 12345.
4. The random cycle, which has the substitutions arranged in random order: 32415 – 12543 – 41352 – 53124.
5. The concurrent cycle, which has two separate cycles existing at the same time for a single substitution; 1 and 5 cycling, and 2, 3, and 4 cycling in the following example: 12345 – 21354 – 23154 – 23415.
6. The palindromic cycle, which has the substitutions arranged in an order that reads the same forward and backward: 11211 – 11211 – 11211, 12321 – 12321 – 12321, and 123454321.
7. The inverted cycle, which has a uniform cycle inverted to be the opposite of what it was beyond a certain point: 12345 – 12345 – 54321 – 54321, and 11211 – 11211 – 22122 – 22122. The former example is an example of a perfect cycle being inverted, which creates a palindrome, and the latter example is of a palindromic cycle being inverted.
8. The shortened cycle, which has a cycle decrease in length as the ciphertext progresses: 12345 – 12345 – 1234 – 1234 – 123 – 123.
9. The lengthened cycle, which is the opposite of the shortened cycle: 123 – 123 – 1234 – 1234 – 12345 – 12345.
10. The regional cycle, which restricts substitutions to or from specific regions, or areas, of the ciphertext; this restriction typically manifests as either a restriction to specific rows or to specific columns, and, if used exclusively, is the equivalent of a series of simple substitutions.
11. The semi-regional cycle, which has the frequency of substitutions fluctuate between different regions, or areas, of the ciphertext in a similar way to regional cycling. When regional and semi-regional cycles are combined, it increases their level of security. Both regional and semi-regional cycles are capable of being accompanied by non-regional assignment of substitutions.
12. The sequential cycle, which is when one type of cycle is followed by another type of cycle: 12345 – 12345 – 123454321 – 1234321 – 12321; in this example, a perfect cycle changes to a palindromic cycle, which is then shortened.

AZdecrypt

 
Posted : November 20, 2017 9:17 pm
(@moonrock)
Posts: 9
Active Member
 

Hey moonrock,

Thanks for your explanation. I have some questions for you if you do not mind.

1. What statistics have you used to compare your ciphers versus the 340?

2. You once made a nifty list of different cycle types. Did you source these from somewhere? If not, what prompted you to come up with it?

3. Your definition of regional cycling seems very interesting in relation to some of the observations in this thread. Namely, a fair amount of symbols in the 340 do not appear in the middle 8 rows. Do you think this could be caused by regional cycling? Could you give some examples of a regional cycle.

4. Could you give another example of a concurrent cycle. I suppose the order between the sub cycles does not really mater as long as each individual sub cycle cycles? In your example (see quote below) I can see the 1 incrementing its position and the 5 is alternating between the 4th and 5th position.

5. I think some of us would like to work on different cycle types. Do you mind if I create a thread for this which lists your original cycle types and some others?

1. I am referring to this table:

https://docs.google.com/spreadsheets/d/ … itMiA/edit

One of my test ciphers, called "moonrock1", is in this table and is near the top. moonrock1 was created using the methods that I thought the 340 cipher might be using. moonrock1, however, contains numerous encryption errors because I made it hastily. For example, the number of times each symbol occurs does not match the number of times each symbol in the 340 cipher occurs. I made "improvements" in an additional cipher, but this cipher was not added to the foregoing document, so I do not know how it compares to moonrock1. If you are unable to see moonrock1 in that document, then here it is:

I79,L8#;T<WJ2.4QG
1/ZNAD{-0W$U4OCYJ
VXK9;TTS:;27L_"R
3,4)2<6AD;Z?SF/8{
=X&0_[F-G&N74_*,0
"+.NJO44D:]WTTKN7
R0<V_T}&UOG,4<F#
TTO0’P4D4270QH"+V
G=X_8MZ44}7TC;Z’]
/63K0#1%AJ#GSH2:D
IB;*^4CI27JZ2DUVR
,%8H(/]08WI2"B;);
9,T}U.>)6-4EYR}:.
D45T=0*P1F#Y4]G&#
Y[.1EJ*K&:6;/Z8=K
T]()/6#DC[B_&;+}"
A+2]G)B6^Y*$E4G#-
?L[)ZE4#:A>9PDYL}
934B":(1Z#(TNG)V6
^79)B;-=ESJ5-CH;R

And here is moonrock2, the "improved" version of moonrock1:

079,R8#;JEWT2N%QG
1T6A.YL-0$WI4OCD/
VGKX&J/3:;.Y>_B4
S,4BA<Z.7;Z3SFT(4
K[FUV*F-XF2D4_9,0
"UA#:?447T]$TT=NY
{0E_LVJ<F0?*,4EF2
TJ)0QP4Y4AD+QH?IV
9KG_(MZ44}YJ];Z8C
T63K+N14#JA9SH#:7
I?&9^4]02YJZAD+_
,4’H8J]U(W02?O;"&
[,:}02>O6-{<74ET#
D45:=09P1FNY4CGFN
7GN1</G=F/6;:Z’=K
:]Q"TZ2YCG)VF&0EO
A02]GO)6PDX$}4[.-
S4G"Z<R./A44^Y74E
X3%B)/Q162’TAG)VZ
PD[?B;-=<UA952(^4

2. I didn’t source the list from anywhere. It’s something that I came up with on my own.

3. So when I consider regional and semi-regional cycles to be used in the 340 cipher, the distinct regions in the 340 cipher roughly correspond to the Olson line boundaries, i.e. region 1 is the first few lines, region two is lines ~4 to ~10, and region three is lines ~11 to ~13. It’s unclear where the boundaries between regions are after line 13, so there may be 5 or 6 regions in total or some ciphertext letters or plaintext letters may have their own set of regions while others have their own regions. Between each of these regions, numerous symbols either exclusively appear in one or two of the regions, or they experience very noticeable changes in frequency from region to region. The "plus sign" ciphertext symbol is a noticeable example of fluctuating frequency between the regions. The symbols that have been highlighted as only appearing at the beginning and end are among the ones that may be assigned regionally, yes. My two test ciphers above also contain many such substitutions.

4. 1 and 5 behaving like that in my example was incidental since it was probably a convenient way to make the cycle work. Here are three concurrent cycles, each with two (1 and 2; 3 and 4, and 5 and 6): 123456 – 315624 – 534612 – 531462 – 152634. These example have been constructed in a way that requires each substitution to be used in each regular interval, but this isn’t necessary, e.g. 123456 – 3124 – 5346 – 531462 – 152634. This is the same as the previous example but with some deletions; the concurrent cycles remain. Also, to clarify the concurrent cycle, these are not separate cycles existing for different plaintext letters; they are separate cycles being used for a single plaintext letter that, when combined, may appear to be unrelated to each other.

5. No, I don’t mind. If you feel like doing so would be of benefit to our efforts, then feel free to do so.

 
Posted : November 21, 2017 8:46 am
smokie treats
(@smokie-treats)
Posts: 1626
Noble Member
 

It looks to me like the W, C and Theta are the main contributors to the regional bias, because they cluster together more, and the other symbols are more spread out.

That accounts for 14 of the 39 positions. But even with 25 that is high compared to my 768 test messages. There were only 24 of the 768 messages that had as many as 25 positions for a 6-8-6.

Thanks for sharing more of your ideas moonrock. I don’t think that anybody is going to solve this message alone, so thanks a lot.

Seems like the easiest way to pencil and paper encode a message is to encode all of the letter A, all of the letter B, all of the letter C and so on. That would make it easier to use different cycle types for the different letters. Encoding by position and using different cycle types for the letters would make it difficult to keep track of cycle type.

Maybe if you had A = 1 2 3 4 5 you could encode 1 1 1 2 3 4 2 3 4 5 5 5, which would create the regional biases.

But then next encode B = 6 7 as 6 7 6 7 6 7, which would boost the L=2 cycle scores but make it impossible to differentiate from false L=2 cycles.

Maybe he just encoded high frequency plaintext, like A, in a regional way to hide the cycles EDIT: homophone groups. That to me seems like the simplest explanation for a pencil and paper cipher.

 
Posted : November 21, 2017 4:55 pm
Jarlve
(@jarlve)
Posts: 2547
Famed Member
Topic starter
 

3. So when I consider regional and semi-regional cycles to be used in the 340 cipher, the distinct regions in the 340 cipher roughly correspond to the Olson line boundaries, i.e. region 1 is the first few lines, region two is lines ~4 to ~10, and region three is lines ~11 to ~13. It’s unclear where the boundaries between regions are after line 13, so there may be 5 or 6 regions in total or some ciphertext letters or plaintext letters may have their own set of regions while others have their own regions…

Thank moonrock.

What happens to a regional cycle when it leaves its region? Does it become a 1:1 substitute?

AZdecrypt

 
Posted : November 21, 2017 7:37 pm
Jarlve
(@jarlve)
Posts: 2547
Famed Member
Topic starter
 

If anyone feels like doing some work on the cycles and cycle types, I have created a thread for it, feel free to use it: viewtopic.php?f=81&t=3616

AZdecrypt

 
Posted : November 21, 2017 8:50 pm
smokie treats
(@smokie-treats)
Posts: 1626
Noble Member
 

I scanned through the 340 in chunks of 8 rows and chunks of 10 rows, sliding down through the message one row at a time. And calculated the cycle scores.

Left column is top row scanned, middle column is lowest row scanned, and right column is score. Only scanning in chunks of 8 rows I didn’t find much of a difference between chunks, except that the highest scores were rows 5 through 12 at 10566 and rows 9 through 16 at 10524.

1 8 10062
2 9 9856
3 10 9746
4 11 9864
5 12 10566
6 13 10490
7 14 10046
8 15 9800
9 16 10524
10 17 10018
11 18 8938
12 19 8716
13 20 9068

Here is a scan of 10 rows chunks. The highest scoring chunk was rows 7-16, which is exactly where the symbols unique to the top and bottom 6 rows do not exist. There is a strong second place though at rows 3-12.

1 10 14888
2 11 15638
3 12 16012
4 13 15118
5 14 15218
6 15 15120
7 16 16052
8 17 14972
9 18 14120
10 19 13908
11 20 13324

 
Posted : November 21, 2017 11:13 pm
doranchak
(@doranchak)
Posts: 2614
Member Admin
 

Z340 has 10 symbols that occur only in the first or last six lines

The 340 has 10 unique symbols occupying 39 positions that occur only in the first and last six lines.

The 10 exclusive symbols occupy 39 positions of Z340. This is a 1.66 observation compared to randomizations. About 1 in 16 shuffles had at least that many positions occupied by symbols exclusive to those lines.

I posted this table earlier (see codebox), it shows a sigma of 2.37 for the 39 positions while you have it at 1.66. 100000 randomizations: mean=20.27, variance 62.45, standard deviation=7.90, sigma of 39: 2.37.

Thanks for pointing that out, Jarlve. My test did indeed lack the requirement that the symbol had to appear in both regions. I did another test that generalized to this search: Find symbols exclusive to pairs of equal-sized rectangular regions, where the exclusive symbols appear at least once in each region. This ended up being a rougher version of an even more general search: Find all symbols exclusive to pairs of small-ish regions, where the exclusive symbols appear at least once in each region. This search notices some of the "tighter" clusters such as these:

Are such clusters true features of the encipherment method? I think to answer this will require careful comparisons of the overall distribution of these observations, instead of focusing on individual observations, because there are so many observation samples taking place.

http://zodiackillerciphers.com

 
Posted : November 23, 2017 4:13 pm
(@largo)
Posts: 454
Honorable Member
 

I made a similar observation some time ago (not distance curiosity at all, but region-based symbols). Don’t know if it is worth a deeper look:

The reversed B only occurs three times. Two times in the pivot on the right and one time indirectly connected to the pivot on the left if you assume they form a square. I made some tests by replacing the symbols from the pivot on the right with the symbols from the pivot on the left. No obvious observations. I also replaced all symbols of the pivots by one single symbol. Also no interesting results. Maybe I’ll give my "merge the pivots" approach another try.

 
Posted : November 23, 2017 4:31 pm
Jarlve
(@jarlve)
Posts: 2547
Famed Member
Topic starter
 

Are such clusters true features of the encipherment method? I think to answer this will require careful comparisons of the overall distribution of these observations, instead of focusing on individual observations, because there are so many observation samples taking place.

Some meta reasoning,

From looking at a perfect cycle many of the measureable properties versus randomizations of sequential homophonic substitution can be inferred/upscaled. One of these measurable properties is that the symbols tend to spread out over the cipher with roughly similar distances. To put it in other words, look at any symbol in the cycle (see codebox) and notice that the distance between them is always 5. This property tends to happen in sequential homophonic substitution ciphers. Ofcourse the distance similarity will be more roughly but certainly measurable versus randomizations and for this reason the observation in the 340 is quite significant. It does not make a whole lot of sense that so many of these symbols would not appear in the middle 8 rows as they are more or less expected to appear at regular intervals.

The observation correlates better with randomizations than with sequential homophonic substitution.

ABCDEABCDEABCDEABCDEABCDE

AZdecrypt

 
Posted : November 23, 2017 10:55 pm
smokie treats
(@smokie-treats)
Posts: 1626
Noble Member
 

I made a similar observation some time ago (not distance curiosity at all, but region-based symbols). Don’t know if it is worth a deeper look:

The reversed B only occurs three times. Two times in the pivot on the right and one time indirectly connected to the pivot on the left if you assume they form a square. I made some tests by replacing the symbols from the pivot on the right with the symbols from the pivot on the left. No obvious observations. I also replaced all symbols of the pivots by one single symbol. Also no interesting results. Maybe I’ll give my "merge the pivots" approach another try.

Very interesting observation, Largo!

 
Posted : November 24, 2017 4:11 am
doranchak
(@doranchak)
Posts: 2614
Member Admin
 

The observation correlates better with randomizations than with sequential homophonic substitution.

Do you think the symbols missing from the middle 8 rows could reflect that a transposition exists that, when reversed, will restore the expected sequential cycles?

http://zodiackillerciphers.com

 
Posted : November 24, 2017 5:18 am
Jarlve
(@jarlve)
Posts: 2547
Famed Member
Topic starter
 

The observation correlates better with randomizations than with sequential homophonic substitution.

Do you think the symbols missing from the middle 8 rows could reflect that a transposition exists that, when reversed, will restore the expected sequential cycles?

No. I consider it nearly impossible.

AZdecrypt

 
Posted : November 24, 2017 8:11 pm
Jarlve
(@jarlve)
Posts: 2547
Famed Member
Topic starter
 

Generated using borkky’s widget that uses Largo’s font:

viewtopic.php?f=20&t=3641
https://martinlindhe.github.io/zodiac-widget/

AZdecrypt

 
Posted : December 8, 2017 9:12 pm
(@borkky)
Posts: 28
Eminent Member
 

Hi smokie treats & all!

I was just to post and mention I made a interactive widget to visualize your observation about the missing symbols of the central area in viewtopic.php?p=56330#p56330
only to see Jarlve already brought it up here, so thanks!

I find this observation highly interesting. It is plausible the center piece is using one encoding, and the top+bottom uses another.
All we really know about Zodiac’s skills is from what he did in Z408. When it was solved so quickly, he probably combined what he knew with yet another step in order to adress the weakness he’d just learned about in the papers.

So certainly he did not reuse his old cipher key.

Assume he creates two new cipher keys. One of them has some symbols not in the other for some reason.
(maybe he was lazy and reused one older unused key and produced one new)

He splits the clear text in two parts, and encodes them separately.
First half with one cipher, the other half with the other cipher, adding filler symbols at the end to produce a neat rectangle.
Then he makes two horizontal cuts, and rearranges the pieces.

This would make one cipher symbol mean different things in different part of the text and by a minor tweak of his existing knowledge, he produced a much harder crypto.

My math is not so great, but I’ve seen here that you guys could quickly test this with your existing algorithms. Also there is much info on this forum, I am sorry if this has already been tested before.

Anyway – I also wanted to mention you all please feel free to use the software as a skeleton for a quick prototype of other z340 theories, it is a very small javascript program which I believe should be easy to modify to try out stuff. I put the source on https://github.com/martinlindhe/zodiac-widget
(In order to edit: Just download the zip, edit script.js and open index.html in your browser)

The widget itself is on https://martinlindhe.github.io/zodiac-widget/

I hope to do another widget to look more at the symbol distribution of odd/even places in the sequence if time permits.

My Z340 widget for the two-ciphers hypothesis: https://martinlindhe.github.io/zodiac-widget/

 
Posted : December 10, 2017 3:36 am
Page 5 / 9
Share: