Zodiac Discussion Forum


Unigram distance curiosity

Jarlve
(@jarlve)
Posts: 2547
Famed Member
Topic starter
 

We have seen that there seem to be unusually large distance gaps between some of the symbols in the 340.

I came up with a simple new statistic and called it unigram distance (it is available in AZdecrypt 1.05 as such). For each symbol it sums the distances between consecutive occurrences, and these sums are then added up over all symbols, which comes out at 15034 for the 340. Interestingly this turns out to be a rather large number which does not correlate with homophonic substitution. The observation is significant (but curious), as with randomizations the rarity seems to be somewhere between 1 in 100,000 and 1 in 1,000,000. A 340 character part of the 408 scores only 13469.

Combinations processed: 1000000/1000000
Measurements:
- Summed: 13302965150
- Average: 13302.96515
- Lowest: 11249 (Randomize(415042))
- Highest: 15053 (Randomize(414948))
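
For anyone who wants to try the randomization test outside AZdecrypt, here is a minimal Python sketch (not the FreeBASIC used by AZdecrypt). The names `unigram_distance` and `shuffle_test` and the fixed seed are my own assumptions, and the shuffling is a plain random permutation, which may differ from what AZdecrypt's Randomize does.

```python
import random

def unigram_distance(cipher):
    """Sum of the gaps between consecutive occurrences of each symbol."""
    last_seen = {}
    total = 0
    for pos, sym in enumerate(cipher):
        if sym in last_seen:
            total += pos - last_seen[sym]
        last_seen[sym] = pos
    return total

def shuffle_test(cipher, trials, seed=0):
    """Shuffle the cipher `trials` times and summarize the scores.

    Assumption: a uniform random permutation stands in for AZdecrypt's
    randomization step.
    """
    rng = random.Random(seed)
    work = list(cipher)
    scores = []
    for _ in range(trials):
        rng.shuffle(work)
        scores.append(unigram_distance(work))
    summed = sum(scores)
    return {
        "summed": summed,
        "average": summed / trials,
        "lowest": min(scores),
        "highest": max(scores),
    }
```

Calling `shuffle_test(cipher, 1000000)` on the 340 as a list of symbol numbers should give figures of the same kind as the output above, though not the same values since the random sequence differs.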

One way to replicate the observation is to create a cipher with 2 different keys, one key for the first 10 rows and another for the 10 last rows (encoding restart hypothesis). Here follows such a cipher with a unigram distance of 14897 which closely matches that of the 340.

1  2  3  4  5  6  7  2  2  8  9  10 11 12 13 14 2
15 16 5  17 18 19 20 12 21 22 1  23 20 13 24 25 26
27 28 19 29 3  30 7  23 31 13 32 15 33 25 34 35 36
37 38 39 8  2  2  21 40 10 41 1  2  42 43 44 31 5
3  45 46 27 12 47 13 48 49 15 50 51 52 5  53 54 19
20 12 24 18 34 7  23 55 36 15 31 13 50 22 56 37 38
57 5  32 13 25 20 44 40 8  24 54 2  13 58 18 2  2
30 13 59 21 2  2  23 13 31 12 35 27 1  45 10 43 3
60 15 50 24 5  46 36 12 31 13 20 51 55 27 48 7  2
2  8  9  57 15 61 62 5  49 12 29 17 15 21 22 1  23
63 2  1  39 23 20 57 44 1  3  14 6  13 38 9  20 47
40 42 39 52 10 17 53 51 12 46 28 35 54 25 58 58 32
26 57 62 19 9  42 43 41 44 24 1  59 20 36 14 30 33
3  47 63 58 26 40 42 27 57 6  13 44 22 62 1  39 26
8  42 20 26 34 42 29 41 59 1  51 20 15 17 12 38 26
39 56 19 43 33 48 42 16 1  13 38 8  19 29 41 57 24
20 26 6  33 2  1  31 42 29 41 20 60 32 26 29 41 23
1  55 46 61 20 7  37 11 29 13 2  1  54 42 5  26 41
29 38 25 44 52 42 2  20 4  63 49 21 10 39 19 61 1
15 20 28 33 18 36 1  37 17 50 22 26 41 29 14 3  4
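
A rough Python sketch of how such a two-key cipher could be produced. This is not how the cipher above was actually made: `make_key`, `encode_cyclic` and `encode_with_restart` are hypothetical helpers, the key builder deals symbols to letters at random instead of by letter frequency, and the homophones are cycled strictly in order.

```python
import random
from itertools import cycle

def make_key(alphabet, n_symbols, rng):
    """Hypothetical key builder: one symbol per letter, the rest dealt at random."""
    if n_symbols < len(alphabet):
        raise ValueError("need at least one symbol per letter")
    symbols = list(range(1, n_symbols + 1))
    rng.shuffle(symbols)
    key = {ch: [symbols.pop()] for ch in alphabet}
    while symbols:
        key[rng.choice(alphabet)].append(symbols.pop())
    return key

def encode_cyclic(plaintext, key):
    """Sequential homophonic substitution: each letter cycles through its symbols."""
    cyclers = {ch: cycle(homophones) for ch, homophones in key.items()}
    return [next(cyclers[ch]) for ch in plaintext]

def encode_with_restart(plaintext, key_a, key_b, split):
    """Encode the first `split` characters with key_a and the rest with key_b."""
    return encode_cyclic(plaintext[:split], key_a) + encode_cyclic(plaintext[split:], key_b)
```

For a 17 by 20 grid, `split=170` switches keys after the first 10 rows. Both keys draw from the same symbol numbers here, so a symbol can stand for different letters in the two halves.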

Another way to replicate the observation is to change the row order; for instance, you may try to place top over bottom. Placing a 340 character part of the 408 top over bottom increases the unigram distance from 13469 to 14026, while with the 340 it decreases the unigram distance from 15034 to 14302.
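
A small Python sketch of row reordering on a 17-wide grid. The helper names are made up, and `swap_halves` is only one possible reading of "top over bottom" (exchanging the first 10 and last 10 rows); the post does not spell out the exact operation.

```python
def to_rows(cipher, width=17):
    """Cut a flat cipher into rows of `width` symbols."""
    return [cipher[i:i + width] for i in range(0, len(cipher), width)]

def reorder_rows(cipher, order, width=17):
    """Rebuild the flat cipher with its rows taken in the given order (0-based)."""
    rows = to_rows(cipher, width)
    return [sym for r in order for sym in rows[r]]

def swap_halves(cipher, width=17):
    """Assumed reading of 'top over bottom': exchange the two halves of rows."""
    n_rows = len(to_rows(cipher, width))
    half = n_rows // 2
    order = list(range(half, n_rows)) + list(range(half))
    return reorder_rows(cipher, order, width)
```

The reordered list can then be scored with any unigram distance function and compared against the original value.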

Here follows a period rollout for the 340, period 1 is the highest.

Periodic: (transposition, untransposition)
--------------------------------------------------
Period 1: 15034, 15034 <---
Period 2: 14237, 13495
Period 3: 13907, 13568
Period 4: 13723, 13768
Period 5: 13364, 13221
Period 6: 12973, 12568
Period 7: 13606, 13208
Period 8: 13223, 12832
Period 9: 13331, 13243
Period 10: 13674, 12105
Period 11: 13690, 13032
Period 12: 13549, 13099
Period 13: 13185, 13723
Period 14: 13424, 13176
Period 15: 13454, 13320
Period 16: 13443, 13068
Period 17: 13190, 13381
Period 18: 12468, 13204
Period 19: 13198, 12372
Period 20: 13381, 13190
Period 21: 12949, 12704
Period 22: 12957, 13640
Period 23: 13450, 13241
Period 24: 12805, 13117
Period 25: 13061, 13626
Period 26: 13262, 13260
Period 27: 13181, 12933
Period 28: 13542, 13973
Period 29: 12886, 13297
Period 30: 13642, 13893
Period 31: 13032, 13690
Period 32: 12888, 13691
Period 33: 13434, 13156
Period 34: 12105, 13674
Period 35: 12222, 13832
Period 36: 13360, 13244
Period 37: 13568, 12962
Period 38: 13382, 13346
Period 39: 13541, 13193
Period 40: 13400, 13424
Period 41: 12772, 13035
Period 42: 13166, 13078
Period 43: 12765, 13240
Period 44: 12634, 13362
Period 45: 13105, 13577
Period 46: 13188, 13374
Period 47: 13264, 13563
Period 48: 13755, 13771
Period 49: 12922, 13649
Period 50: 13224, 13734
Period 51: 13197, 13535
Period 52: 12979, 13173
Period 53: 12851, 13578
Period 54: 12889, 13663
Period 55: 12786, 13801
Period 56: 12822, 13557
Period 57: 12878, 12971
Period 58: 13105, 13265
Period 59: 13267, 13567
Period 60: 13465, 13719
Period 61: 13636, 13335
Period 62: 13683, 13665
Period 63: 13609, 13630
Period 64: 13828, 14204
Period 65: 13531, 13541
Period 66: 13543, 13369
Period 67: 13441, 12940
Period 68: 13221, 13364
Period 69: 12979, 13541
Period 70: 12940, 13476
Period 71: 13129, 13294
Period 72: 12761, 13411
Period 73: 13325, 13279
Period 74: 13379, 13279
Period 75: 13338, 13250
Period 76: 13257, 13593
Period 77: 13181, 13468
Period 78: 13172, 13470
Period 79: 13416, 13377
Period 80: 13432, 13512
Period 81: 13147, 13775
Period 82: 13145, 13904
Period 83: 12973, 13849
Period 84: 13590, 13626
Period 85: 13768, 13723
Period 86: 13684, 13607
Period 87: 13607, 13150
Period 88: 13593, 13097
Period 89: 13790, 13096
Period 90: 13850, 13464
Period 91: 13951, 13633
Period 92: 13985, 13765
Period 93: 13537, 13799
Period 94: 13776, 13977
Period 95: 13875, 14022
Period 96: 13841, 14196
Period 97: 14044, 14148
Period 98: 14081, 13968
Period 99: 13683, 14025
Period 100: 13648, 14117
Period 101: 13687, 13826
Period 102: 13722, 13877
Period 103: 13750, 13883
Period 104: 13636, 13652
Period 105: 13544, 13514
Period 106: 13712, 13595
Period 107: 13405, 13306
Period 108: 13277, 13395
Period 109: 13155, 13605
Period 110: 13059, 13674
Period 111: 12940, 13852
Period 112: 12980, 13860
Period 113: 12886, 13739
Period 114: 13578, 13846
Period 115: 13691, 14037
Period 116: 13575, 14092
Period 117: 13573, 14005
Period 118: 13518, 14052
Period 119: 13587, 14056
Period 120: 13559, 13990
Period 121: 13357, 13832
Period 122: 13395, 13885
Period 123: 13291, 13861
Period 124: 13239, 13964
Period 125: 13308, 13889
Period 126: 13279, 13825
Period 127: 13305, 13723
Period 128: 13273, 13700
Period 129: 13254, 13619
Period 130: 13455, 13625
Period 131: 13535, 13693
Period 132: 13775, 13528
Period 133: 13755, 13642
Period 134: 13625, 13477
Period 135: 13594, 13352
Period 136: 13548, 13151
Period 137: 13733, 13237
Period 138: 13638, 13489
Period 139: 13705, 13364
Period 140: 13797, 13476
Period 141: 13803, 13365
Period 142: 13738, 13495
Period 143: 13603, 13642
Period 144: 13693, 13849
Period 145: 13616, 13798
Period 146: 13732, 13681
Period 147: 13664, 13579
Period 148: 13539, 13590
Period 149: 13795, 13695
Period 150: 13947, 13923
Period 151: 14054, 14067
Period 152: 13878, 14161
Period 153: 13952, 14178
Period 154: 14012, 14257
Period 155: 14092, 14319
Period 156: 13994, 14337
Period 157: 13955, 14681
Period 158: 13693, 14662
Period 159: 13700, 14707
Period 160: 13767, 14828
Period 161: 13755, 14671
Period 162: 13430, 14715
Period 163: 13380, 14761
Period 164: 13181, 14873
Period 165: 13047, 14792
Period 166: 13246, 14679
Period 167: 13292, 14560
Period 168: 13204, 14345
Period 169: 13505, 14344
Period 170: 13495, 14237
Period 171: 13519, 14179
Period 172: 13531, 14140
Period 173: 13567, 14185
Period 174: 13609, 14094
Period 175: 13623, 14181
Period 176: 13549, 14121
Period 177: 13540, 14314
Period 178: 13439, 14313
Period 179: 13329, 14306
Period 180: 13365, 14314
Period 181: 13302, 14281
Period 182: 13325, 14283
Period 183: 13193, 14239
Period 184: 13267, 14310
Period 185: 13245, 14338
Period 186: 13179, 14309
Period 187: 13188, 14285
Period 188: 13177, 14306
Period 189: 13184, 14270
Period 190: 13175, 14259
Period 191: 13210, 14281
Period 192: 13151, 14291
Period 193: 13075, 14216
Period 194: 13118, 14203
Period 195: 13179, 14224
Period 196: 13225, 14202
Period 197: 13158, 14182
Period 198: 13185, 14157
Period 199: 13204, 14089
Period 200: 13251, 14137
Period 201: 13209, 14139
Period 202: 13309, 14146
Period 203: 13340, 14133
Period 204: 13348, 14141
Period 205: 13356, 14087
Period 206: 13201, 13986
Period 207: 13179, 13968
Period 208: 13209, 13950
Period 209: 13173, 13923
Period 210: 13185, 13961
Period 211: 13171, 13925
Period 212: 13209, 13974
Period 213: 13229, 13966
Period 214: 13238, 13950
Period 215: 13184, 13938
Period 216: 13228, 13905
Period 217: 13157, 13893
Period 218: 13153, 13945
Period 219: 12986, 13823
Period 220: 12958, 13817
Period 221: 12998, 13840
Period 222: 13017, 13857
Period 223: 13117, 13925
Period 224: 13131, 13809
Period 225: 13167, 13780
Period 226: 13204, 13825
Period 227: 13250, 13856
Period 228: 13270, 13813
Period 229: 13312, 13815
Period 230: 13331, 13827
Period 231: 13302, 13619
Period 232: 13305, 13596
Period 233: 13332, 13561
Period 234: 13374, 13619
Period 235: 13397, 13599
Period 236: 13406, 13508
Period 237: 13471, 13527
Period 238: 13357, 13489
Period 239: 13378, 13443
Period 240: 13547, 13419
Period 241: 13555, 13329
Period 242: 13582, 13571
Period 243: 13585, 13555
Period 244: 13626, 13745
Period 245: 13652, 13582
Period 246: 13708, 13473
Period 247: 13724, 13550
Period 248: 13733, 13612
Period 249: 13710, 13749
Period 250: 13750, 13743
Period 251: 13723, 13733
Period 252: 13680, 13749
Period 253: 13737, 13608
Period 254: 13660, 13545
Period 255: 13702, 13511
Period 256: 13741, 13520
Period 257: 13683, 13498
Period 258: 13739, 13497
Period 259: 13747, 13589
Period 260: 13784, 13570
Period 261: 13779, 13549
Period 262: 13777, 13560
Period 263: 13749, 13559
Period 264: 13700, 13585
Period 265: 13576, 13654
Period 266: 13729, 13676
Period 267: 13798, 13624
Period 268: 13819, 13596
Period 269: 13835, 13584
Period 270: 13877, 13563
Period 271: 13913, 13575
Period 272: 13929, 13580
Period 273: 14009, 13618
Period 274: 14051, 13578
Period 275: 14019, 13618
Period 276: 14086, 13615
Period 277: 14099, 13723
Period 278: 14101, 13729
Period 279: 14009, 13584
Period 280: 14068, 13583
Period 281: 14101, 13611
Period 282: 14033, 13594
Period 283: 13944, 13598
Period 284: 13859, 13646
Period 285: 13898, 13878
Period 286: 13945, 13856
Period 287: 13698, 13859
Period 288: 13699, 14092
Period 289: 13795, 14071
Period 290: 13793, 14071
Period 291: 13824, 14031
Period 292: 13856, 14005
Period 293: 13976, 14004
Period 294: 13944, 13960
Period 295: 13944, 13921
Period 296: 13901, 13895
Period 297: 13924, 13868
Period 298: 13945, 14072
Period 299: 13953, 14070
Period 300: 13925, 14102
Period 301: 13874, 14358
Period 302: 13804, 14365
Period 303: 13766, 14378
Period 304: 13834, 14381
Period 305: 13859, 14331
Period 306: 13825, 14366
Period 307: 13859, 14367
Period 308: 13786, 14283
Period 309: 13801, 14478
Period 310: 13822, 14618
Period 311: 13875, 14579
Period 312: 13894, 14550
Period 313: 13815, 14614
Period 314: 13799, 14589
Period 315: 13942, 14607
Period 316: 13925, 14698
Period 317: 13791, 14781
Period 318: 14047, 14764
Period 319: 14015, 14724
Period 320: 14107, 14746
Period 321: 14139, 14700
Period 322: 14194, 14615
Period 323: 14204, 14586
Period 324: 14180, 14653
Period 325: 14212, 14654
Period 326: 14291, 14761
Period 327: 14370, 14788
Period 328: 14430, 14864
Period 329: 14462, 14881
Period 330: 14448, 14885
Period 331: 14467, 14855
Period 332: 14567, 14835
Period 333: 14551, 14867
Period 334: 14683, 14853
Period 335: 14851, 14905
Period 336: 14915, 14937
Period 337: 14877, 14953
Period 338: 14872, 14995
Period 339: 14951, 15005
Period 340: 15034, 15034 <---
--------------------------------------------------
Transposition average: 13556.38
Untransposition average: 13853.61
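
A Python sketch of what a period rollout like the one above could look like. It assumes the usual meaning of periodic untransposition (read every p-th character, wrapping to the next start offset) with transposition as its inverse; I have not checked which column AZdecrypt computes with which direction, so the names are an assumption. The symmetry in the table (period 2 against period 170, period 5 against period 68) is at least consistent with this definition.

```python
def read_order(n, period):
    """Positions visited when reading every `period`-th character."""
    return [j for start in range(period) for j in range(start, n, period)]

def untranspose(cipher, period):
    """Read the cipher off at every `period`-th position."""
    return [cipher[j] for j in read_order(len(cipher), period)]

def transpose(cipher, period):
    """Inverse of untranspose: write the characters back to those positions."""
    out = [None] * len(cipher)
    for k, j in enumerate(read_order(len(cipher), period)):
        out[j] = cipher[k]
    return out

def period_rollout(cipher, score):
    """Return (period, score after transposition, score after untransposition)."""
    n = len(cipher)
    return [(p, score(transpose(cipher, p)), score(untranspose(cipher, p)))
            for p in range(1, n + 1)]
```

Here `score` is any function that takes a list of symbols, for example a unigram distance function. Periods 1 and 340 leave the cipher unchanged, which matches the two marked lines.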

Here follows a unigram distance frequency rollout for the 340.

Unigram distance frequencies:
--------------------------------------------------
Distance 1: 4
Distance 2: 1
Distance 3: 2
Distance 4: 2
Distance 5: 2
Distance 6: 1
Distance 7: 3
Distance 8: 4
Distance 9: 3
Distance 10: 4
Distance 11: 2
Distance 12: 3
Distance 13: 6
Distance 14: 6
Distance 15: 2
Distance 16: 5
Distance 17: 5
Distance 18: 3
Distance 19: 4
Distance 20: 3
Distance 21: 6
Distance 22: 2
Distance 23: 8
Distance 24: 7
Distance 25: 4
Distance 26: 5
Distance 27: 2
Distance 28: 4
Distance 29: 6
Distance 30: 2
Distance 31: 4
Distance 32: 5
Distance 33: 3
Distance 34: 3
Distance 35: 6
Distance 36: 3
Distance 37: 2
Distance 38: 1
Distance 39: 3
Distance 40: 2
Distance 41: 4
Distance 42: 4
Distance 43: 4
Distance 44: 1
Distance 45: 1
Distance 46: 3
Distance 47: 1
Distance 48: 4
Distance 49: 4
Distance 50: 2
Distance 51: 1
Distance 52: 3
Distance 55: 4
Distance 56: 2
Distance 57: 1
Distance 59: 3
Distance 60: 2
Distance 61: 1
Distance 62: 1
Distance 63: 1
Distance 65: 1
Distance 66: 3
Distance 67: 1
Distance 68: 1
Distance 69: 1
Distance 70: 2
Distance 71: 7
Distance 72: 1
Distance 73: 1
Distance 74: 2
Distance 75: 2
Distance 76: 3
Distance 77: 2
Distance 78: 4
Distance 80: 1
Distance 81: 1
Distance 82: 1
Distance 83: 2
Distance 84: 3
Distance 87: 1
Distance 89: 1
Distance 91: 1
Distance 92: 2
Distance 95: 1
Distance 98: 1
Distance 101: 2
Distance 102: 2
Distance 103: 3
Distance 108: 1
Distance 109: 1
Distance 110: 2
Distance 112: 2
Distance 113: 1
Distance 115: 1
Distance 120: 2
Distance 128: 1
Distance 129: 1
Distance 133: 1
Distance 137: 1
Distance 142: 1
Distance 147: 1
Distance 148: 1
Distance 150: 1
Distance 151: 1
Distance 156: 1
Distance 163: 1
Distance 170: 1
Distance 173: 1
Distance 175: 1
Distance 186: 1
Distance 187: 1
Distance 198: 1
Distance 207: 1
Distance 212: 1
Distance 218: 1
Distance 230: 1
Distance 241: 1
Distance 242: 1
Distance 251: 1
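
A frequency rollout of this kind can be sketched in a few lines of Python; `gap_frequencies` is a made-up name and the input is assumed to be the cipher as a list of symbol numbers.

```python
from collections import Counter

def gap_frequencies(cipher):
    """Count how often each distance between consecutive occurrences of a symbol appears."""
    last_seen = {}
    gaps = Counter()
    for pos, sym in enumerate(cipher):
        if sym in last_seen:
            gaps[pos - last_seen[sym]] += 1
        last_seen[sym] = pos
    return gaps
```

Two sanity checks follow from the definition: the number of gaps equals the cipher length minus the number of distinct symbols (340 - 63 = 277 for a 340 character cipher with 63 symbols), and the sum of distance times count equals the unigram distance.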

And the FreeBASIC code for the unigram distance statistic.

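' Unigram distance: for every symbol, sum the gaps between its consecutive occurrences.
' array() holds the cipher as symbol numbers 1..s, l is the cipher length, s is the symbol count.
function m_unigramdistance(array()as integer,byval l as integer,byval s as integer)as double
	dim as integer i,j,score
	dim as short aud(s,l) ' aud(sym,0) = occurrence count, aud(sym,k) = position of the k-th occurrence
	for i=1 to l
		aud(array(i),0)+=1
		aud(array(i),aud(array(i),0))=i
	next i
	for i=1 to s
		for j=1 to aud(i,0)-1
			score+=aud(i,j+1)-aud(i,j) ' gap between occurrence j and j+1
		next j
	next i
	return score
end function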
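
For readers without FreeBASIC, here is a Python port as a sketch, together with a shortcut. Because the gaps of one symbol add up to the distance between its first and last occurrence, the statistic can also be computed from those two positions alone. Both function names are my own.

```python
def unigram_distance_port(cipher):
    """Direct port: collect 1-based positions per symbol, then sum consecutive gaps."""
    positions = {}
    for i, sym in enumerate(cipher, start=1):
        positions.setdefault(sym, []).append(i)
    score = 0
    for occurrences in positions.values():
        for j in range(len(occurrences) - 1):
            score += occurrences[j + 1] - occurrences[j]
    return score

def unigram_distance_span(cipher):
    """Shortcut: the gaps telescope to last position minus first position."""
    first, last = {}, {}
    for pos, sym in enumerate(cipher):
        first.setdefault(sym, pos)
        last[sym] = pos
    return sum(last[sym] - first[sym] for sym in first)
```

Seen this way, a high unigram distance means that symbols on average span a larger stretch of the cipher between their first and last appearance.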

AZdecrypt

 
Posted : June 15, 2017 9:48 pm
(@largo)
Posts: 454
Honorable Member
 

Hi Jarlve,

I am not sure if the unigram distance is really curious. I think this depends only on the length of the used key and the underlying plaintext. For example take the first 20 lines from z408 and encode it with a longer key (e.g. the default one from peek-a-boo). Here is the result:

Plaintext:

ILIKEKILLINGPEOPL
EBECAUSEITISSOMUC
HFUNITIAMOREFUNTH
ANKILLINGWILDGAME
INTHEFORRESTBECAU
SEMANISTHEMOATDAN
GERTUEANAMALOFALL
TOKILLSOMETHINGGI
VESMETHEMOATTHRIL
LINGEXPERENCEITIS
EVENBETTERTHANGET
TINGYOURROCKSOFFW
ITHAGIRLTHEBESTPA
RTOFITIATHAEWHENI
DIEIWILLBEREBORNI
NPARADICEANDALLTH
EIHAVEKILLEDWILLB
ECOMEMYSLAVESIWIL
LNOTGIVEYOUMYNAME
BECAUSEYOUWILLTRY

Encoded (cyclic):

h:cjajCuU5AEYxXY:
39ybdro+hLc1z=;ve
FISqCw5D;Xn0Ir782
iRjhuUcAJlC:kEZ;P
5qspfI=GVgKN4abdv
Mx;D7hoLF3;XiwBZR
Jyt8S+dAD;iu=IZU:
sXjcuU1=;0N2CqEJ5
QPz;fLpg;Xdw8Fnh:
uc7Ea-YxG3ReyCs5K
+Q0A9PNLfVw2DqJg8
sh7ET=rtnXbjM=IIl
cNpiJCGULFa4xowYZ
V8XI5shdN2D3lpyRc
HC+5lh:u90tP4=nAc
qYiGZkCefd7BDU:LF
g52iQajhuUxHlc:u9
3bX;y;T1UZQ+zCl5:
uR=wEhQ0TXv;TAd;P
4feDSKgT=rlcU:8VT

The encoded version shows a very high unigram distance score of 15267.
What I find curious is the connection between unigram distance and transposition. If z340 is a transposed homophonic cipher then untransposing p15/19 or columns odd/even + p18 (which is basically a diagonal transposition) should increase the score rather than decrease it.

 
Posted : June 16, 2017 1:18 pm
Jarlve
(@jarlve)
Posts: 2547
Famed Member
Topic starter
 

I am not sure if the unigram distance is really curious. I think this depends only on the length of the used key and the underlying plaintext. For example take the first 20 lines from z408 and encode it with a longer key (e.g. the default one from peek-a-boo). Here is the result:

Good reasoning. Also, I was wrong that it did not correlate with sequential homophonic substitution. Though, the 340 does not have such a long key. And while trying to match the raw ioc and encoding randomness of the 340, the observation’s rarity may still be in between 1 in 1,000 and 1 in 10,000. I would say it is at least still a bit curious.

In short, a lower ioc and more sequential homophonic substitution increase unigram distance. Randomizing the character order of your cipher many times gives an average unigram distance of about 13800 while for the 340 it is about 13302. You may normalize the ioc by dividing by the average unigram distance but that does not normalize the quality of the sequential homophonic substitution.

What I find curious is the connection between unigram distance and transposition. If z340 is a transposed homophonic cipher then untransposing p15/19 or columns odd/even + p18 (which is basically a diagonal transposition) should increase the score rather than decrease it.

Well, since we generally assume transposition before encoding and if the unigram distance is some property of encoding then untransposing p19 should not increase it since that would disturb the encoding.

AZdecrypt

 
Posted : June 16, 2017 2:00 pm
(@largo)
Posts: 454
Honorable Member
 

Well, since we generally assume transposition before encoding and if the unigram distance is some property of encoding then untransposing p19 should not increase it since that would disturb the encoding.

Gosh! I’ve completely forgotten that! Next time I’ll reconsider my text before I post :D

 
Posted : June 16, 2017 2:10 pm
Jarlve
(@jarlve)
Posts: 2547
Famed Member
Topic starter
 

Another way to guarantee a high unigram distance is to start with what Largo calls a long key, or optimal suppression of frequencies and more than 63 symbols, say 70+, and merge symbols afterwards. I kinda like this one but it does not randomize symbol cycles by default, and depending on the number of symbols merged the cipher could still be solved.

Here is the base cipher with 75 symbols having a unigram distance of 16779.

37 32 48 3  24 3  49 44 46 65 5  55 35 67 33 7  2
69 9  70 58 40 74 27 4  1  42 37 29 62 66 15 12 10
36 17 51 31 48 73 49 13 25 75 30 24 52 74 43 14 39
50 59 3  65 28 32 1  5  6  19 37 44 20 55 56 53 67
48 31 42 47 69 17 8  63 72 70 27 73 26 4  34 11 12
29 24 15 40 43 49 62 14 57 67 25 33 13 42 45 50 59
6  69 18 66 51 27 56 5  65 53 11 46 75 52 40 2  28
73 8  3  1  32 44 29 33 15 70 14 16 37 31 55 6  48
21 4  62 25 24 42 36 67 53 66 13 73 14 39 30 49 46
2  65 43 55 69 22 35 70 63 4  59 58 24 1  42 37 27
67 38 69 5  60 70 73 14 4  72 42 47 50 31 6  24 73
14 48 43 55 61 75 74 18 30 8  10 3  29 33 17 52 19
49 42 57 56 6  65 63 28 73 16 67 71 69 62 14 7  11
72 42 66 17 1  73 37 13 14 36 40 42 19 39 70 59 48
64 49 4  65 19 1  32 44 9  24 18 67 26 75 30 5  37
31 35 50 63 56 20 48 34 69 11 43 45 40 46 2  73 47
70 49 57 50 54 4  3  65 28 32 24 64 19 1  44 46 60
67 58 8  15 69 25 68 27 2  56 21 70 29 37 19 48 28
32 59 33 14 55 49 38 4  23 66 12 53 41 5  11 15 24
71 67 10 40 51 62 69 61 75 74 19 65 44 46 42 72 68

And here is the base cipher after merging to 63 symbols with a unigram distance of 15228, a tad higher than the 340.

64 32 48 3  68 3  47 44 46 65 5  55 35 23 25 7  2
54 9  70 58 40 74 29 4  1  25 64 29 62 66 15 12 10
36 17 51 31 48 73 47 13 25 75 30 68 52 74 43 14 39
50 25 3  65 28 32 1  5  6  19 64 44 20 55 56 53 23
48 31 25 47 54 17 8  63 72 70 29 73 26 4  34 11 12
29 68 15 40 43 47 62 14 57 23 25 25 13 25 45 50 25
6  54 18 66 51 29 56 5  65 53 11 46 75 52 40 2  28
73 8  3  1  32 44 29 25 15 70 14 16 64 31 55 6  48
21 4  62 25 68 25 36 23 53 66 13 73 14 39 30 47 46
2  65 43 55 54 22 35 70 63 4  25 58 68 1  25 64 29
23 38 54 5  38 70 73 14 4  72 25 47 50 31 6  68 73
14 48 43 55 71 75 74 18 30 8  10 3  29 25 17 52 19
47 25 57 56 6  65 63 28 73 16 23 71 54 62 14 7  11
72 25 66 17 1  73 64 13 14 36 40 25 19 39 70 25 48
64 47 4  65 19 1  32 44 9  68 18 23 26 75 30 5  64
31 35 50 63 56 20 48 34 54 11 43 45 40 46 2  73 47
70 47 57 50 54 4  3  65 28 32 68 64 19 1  44 46 38
23 58 8  15 54 25 68 29 2  56 21 70 29 64 19 48 28
32 25 25 14 55 47 38 4  23 66 12 53 68 5  11 15 68
71 23 10 40 51 62 54 71 75 74 19 65 44 46 25 72 68
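
The merging step could be sketched in Python like this. The post does not say how the symbols to merge were picked, so choosing random pairs is purely an assumption, and `merge_symbols` is a made-up name.

```python
import random

def merge_symbols(cipher, target, seed=0):
    """Merge randomly chosen symbol pairs until only `target` distinct symbols remain.

    Assumption: pairs are picked uniformly at random; the thread does not
    say how the merged symbols were selected.
    """
    rng = random.Random(seed)
    out = list(cipher)
    symbols = sorted(set(out))
    while len(symbols) > target:
        keep, drop = rng.sample(symbols, 2)
        # Every occurrence of `drop` is rewritten as `keep`.
        out = [keep if sym == drop else sym for sym in out]
        symbols.remove(drop)
    return out
```

Merging can only join symbols, never split them, so positions that carried the same symbol before merging still carry the same symbol afterwards.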

AZdecrypt

 
Posted : June 30, 2017 10:21 am
marie
(@marie)
Posts: 189
Estimable Member
 

An additional observation on unigram distances and locations, from some work I have been doing: the "middle" of the 340 seems to have anomalies in it besides the pivots, and it is the only place you see an A-B-A type pattern (+ b + on line 9).

I find it interesting that if you attempt to divide the 340 into thirds by lines, say first 7, second 7, last 6, the middle 7 lines only use 50!?!?! of the 63 possible symbol choices, with two unique to those lines: the backwards B (b) appears 3 times and the square with the dot in it appears once. They do not appear in the first 7 or last 6 lines. I don’t know if limiting the number of symbols increases the likelihood of getting the pivot patterns?

Also, there are 11 symbols that appear only in the top 7 and/or bottom 6 lines.
W- 6 times
C- 5 times
)- 5 times (circle with horizontal line)
k- 5 times
>- 4 times
P- 3 times
H- 3 times
1- 3 times (bottom shaded circle)
:- 2 times (weird I with dot on right)
X- 2 times
%- 2 times* (right shaded square) *only appears in top
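
Counts like these are easy to recheck with a short script. Below is a Python sketch; the function names and the default 7/7/6 split are taken from the description above, not from any existing tool.

```python
def split_bands(cipher, width=17, bands=(7, 7, 6)):
    """Return the set of symbols used in each band of rows."""
    rows = [cipher[i:i + width] for i in range(0, len(cipher), width)]
    out, start = [], 0
    for height in bands:
        out.append({sym for row in rows[start:start + height] for sym in row})
        start += height
    return out

def middle_report(cipher, width=17, bands=(7, 7, 6)):
    """Compare the middle band with the top and bottom bands."""
    top, middle, bottom = split_bands(cipher, width, bands)
    outer = top | bottom
    return {
        "middle_count": len(middle),      # distinct symbols in the middle band
        "only_middle": middle - outer,    # symbols unique to the middle band
        "never_middle": outer - middle,   # symbols that skip the middle band
    }
```

Running `middle_report` on a transcription of the 340 (as a string or a list of symbols) gives the three figures discussed here, and changing `bands` shows how sensitive they are to where the cut is made.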

I would expect with a truly flat "homophonic" distribution, each symbol should appear ~5 times. The unigrams clearly are not distributed evenly dependent on the underlying text. Perhaps this could provide insight into the enciphering method or transposition methods, including which came first- enciphering or transposing. My guess is this suggests enciphering though I am not yet sold on homophonicism… (can I make up a word?).

Final note, 11 of the 24 + symbols are in the middle 7 lines. It may not be statistically relevant, but just another observation.

-marie

The problem when solved will be simple– Kettering

 
Posted : October 2, 2017 3:11 pm
Quicktrader
(@quicktrader)
Posts: 2598
Famed Member
 

IMO it still might be a homophonic substitution but without repeating sequences (instead an at least partially sectional use of homophones). More symbols (not only letters) than alphabetical letters somehow points to that conclusion, as you couldn’t read any cleartext consisting of symbols instead of letters.

The ) might be part of the E due to its high frequency in some parts of the cipher.

QT

*ZODIACHRONOLOGY*

 
Posted : October 2, 2017 4:52 pm
marie
(@marie)
Posts: 189
Estimable Member
 

Might, what a great word. Could be. E is a confusion, and w/o knowing plaintext. I just see too much fumbling to make it true.

JMHO.
-m

The problem when solved will be simple– Kettering

 
Posted : October 2, 2017 5:42 pm
Jarlve
(@jarlve)
Posts: 2547
Famed Member
Topic starter
 

An additional observation on unigram distances and locations, from some work I have been doing: the "middle" of the 340 seems to have anomalies in it besides the pivots, and it is the only place you see an A-B-A type pattern (+ b + on line 9).

Related to the top-down symmetry of these symbols:

AZdecrypt

 
Posted : October 2, 2017 10:13 pm
doranchak
(@doranchak)
Posts: 2614
Member Admin
 

This is an interesting topic and I did some shuffle studies. I ran each cipher (z408, z340, Largo’s test cipher, and Jarlve’s two test ciphers) through ten million shuffles and generated stats for unigram distance:

Headings:
– Min: Smallest unigram distance observed over all shuffles
– Max: Largest unigram distance observed over all shuffles
– Mean: Average unigram distance observed over all shuffles
– Median: Median of all unigram distances observed over all shuffles
– Std dev: Standard deviation of unigram distance observed over all shuffles
– Actual: The actual unigram distance observed for the given cipher
– Sigma: Number of standard deviations the observed unigram distance is from the mean unigram distance over all shuffles
– Hits: Number of times during shuffles that a unigram distance was observed having an equal or better score than the measurement for the unshuffled cipher
– Shuffles per hits: Average number of shuffles needed to achieve the same or better measurement as the actual measurement
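
The columns above can be derived from a list of shuffle scores roughly as in this Python sketch. `summarize_shuffles` is a hypothetical helper, "equal or better" is assumed to mean greater than or equal, and the population standard deviation is used; the original study may differ on both points.

```python
import statistics

def summarize_shuffles(actual, scores):
    """Summarize shuffle scores against the score of the unshuffled cipher.

    Assumptions: higher counts as better, and the standard deviation is the
    population standard deviation of the shuffle scores.
    """
    mean = statistics.fmean(scores)
    std = statistics.pstdev(scores)
    hits = sum(s >= actual for s in scores)
    return {
        "min": min(scores),
        "max": max(scores),
        "mean": mean,
        "median": statistics.median(scores),
        "std_dev": std,
        "actual": actual,
        "sigma": (actual - mean) / std if std else 0.0,
        "hits": hits,
        # None when the actual score was never reached during the shuffles
        "shuffles_per_hit": len(scores) / hits if hits else None,
    }
```

A `shuffles_per_hit` of `None` corresponds to the case where a cipher's measurement was never reached during the shuffles.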

So, you can see that z340’s unigram distance is much more significant than z408’s unigram distance. Largo’s is better than z408’s but not as significant as z340. And Jarlve’s two ciphers both have very improbable unigram distances since their measurements were never reached during shuffles.

I wonder: If some step has diffused/muted the cycling effect (but not eliminated it altogether), would correctly unravelling the step result in higher cycle scores AND higher (or more statistically significant) unigram distances? It seems like it is easy to find rearrangements of Z340 that significantly affect the cycling scores. An example I plan to give in my talk next week is this simple swap of rows:

HER>pl^VPk|1LTG2d
Np+B(#O%DWY.<*Kf)
2<clRJ|*5T4M.+&BF
z69Sy#+N|5FBc(;8R
lGFN^f524b.cV4t++
|FkdW<7tB_YOB*-Cc
>MDHNpkSzZO8A|K;+
(G2Jfj#O+_NYz+@L9
d<M+b+ZR2FBcyA64K
-zlUV+^J+Op7<FBy-
U+R/5tE|DYBpbTMKO
By:cM+UZGW()L#zHJ
Spp7^l8*V3pO++RK2
_9M+ztjd|5FP+&4k/
yBX1*:49CE>VUZ5-+
|c.3zBK(Op^.fMqG2
RcT+L16C<+FlWB|)L
++)WCzWcPOSHT/()p
p8R^FlO-*dCkF>2D(
#5+Kq%;2UcXGV.zL|

One of my measurements (a computation of the average statistical significance of individual cycles) gives this rearrangement a score that is double that of the unmodified Z340. But unigram distance is 14396 which is a little smaller than that for the unmodified Z340. Maybe because this rearrangement is a false positive. Could unigram distance significance give us a way to filter out false positives?

http://zodiackillerciphers.com

 
Posted : October 11, 2017 4:32 pm
smokie treats
(@smokie-treats)
Posts: 1626
Noble Member
 

Are you sure about the above rearrangement of rows? Because I tried it and got a very slightly lower cycle score overall.

 
Posted : October 12, 2017 4:10 am
doranchak
(@doranchak)
Posts: 2614
Member Admin
 

Thanks for checking. I think the reason the modified cipher scores highly for me is because its cycles are more statistically significant, rather than simply being more numerous.

Here’s my methodology:
1) Generate stats for a bunch of shuffles, for all cycles of length 2.
2) Stats are collected for the "runs" of cycles. Example: "AAA AB AB AB BAB" has a max run length of 3 because of [AB] [AB] [AB].
3) In the shuffles, average and standard deviation of max runs are computed for every pair of symbols.
4) Then, actual measurements of a target cipher’s cycles are compared to the shuffle stats.
5) The target cipher will have observed runs for each pair of symbols. The observed runs are compared to the mean during shuffles. Then I count how many standard deviations (sigma) the observed measurement is from the average runs observed during shuffles.
6) For each symbol pair, I compute sigma. So, this gives a relative sense of the significance of each observed cycle.
7) Then I compute the average of all the observed sigma values (over all symbol pairs).
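
The steps above can be sketched in Python as follows. This is my own reading, not the code actually used: a "run" is taken to be the longest alternating stretch of the two symbols counted in pairs (which gives 3 for the example in step 2), symbol pairs whose shuffle results never vary are skipped, and all names are made up.

```python
import random
import statistics
from itertools import combinations

def max_run(cipher, a, b):
    """Longest alternating stretch of a and b, counted in pairs."""
    best = length = 0
    prev = None
    for sym in cipher:
        if sym != a and sym != b:
            continue
        # Extend the stretch when the symbol alternates, otherwise restart it.
        length = length + 1 if prev is not None and sym != prev else 1
        prev = sym
        best = max(best, length)
    return best // 2

def average_sigma(cipher, trials=100, seed=0):
    """Average, over all symbol pairs, of how many sigmas the observed run is above the shuffle mean."""
    rng = random.Random(seed)
    pairs = list(combinations(sorted(set(cipher)), 2))
    observed = {p: max_run(cipher, *p) for p in pairs}
    samples = {p: [] for p in pairs}
    work = list(cipher)
    for _ in range(trials):
        rng.shuffle(work)
        for p in pairs:
            samples[p].append(max_run(work, *p))
    sigmas = []
    for p in pairs:
        std = statistics.pstdev(samples[p])
        if std > 0:  # assumption: pairs with no spread are left out
            sigmas.append((observed[p] - statistics.fmean(samples[p])) / std)
    return statistics.fmean(sigmas) if sigmas else 0.0
```

With 63 symbols there are 1953 pairs, so a pure Python run over many shuffles is slow; a small number of trials is enough to see whether a rearrangement moves the average.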

I believe using standard deviation from the shuffle stats has a normalization effect on the cycle measurements. This helps filter out some of the noise of false cycles.

Computed this way, the row-swapped Z340 has about double the average sigma as the unmodified Z340. You can see this looking at individual cycles, where the overall relative probabilities are, on average, less than those of the Z340. Here is the raw output of cycles of both ciphers for comparison:

https://docs.google.com/spreadsheets/d/ … sp=sharing

It shows all L2 cycles detected for both ciphers, side by side, in decreasing order by estimated probability. When you scroll down, you’ll notice an emerging trend for cycles to become more improbable in the row-swapped Z340. Look at the "How much more improbable" column. Positive values indicate an increase in improbability. There are also 172 extra cycles in the row-swapped Z340 compared to the original.

So, I guess this boils down to a philosophical question: Can a cipher be considered "more" homophonic if it doesn’t really show that many more long cycles BUT has more improbable cycles overall?

All I know is that the "average sigma" measurement scores very high for Z408 compared to Z340 (0.68 vs 0.21).

http://zodiackillerciphers.com

 
Posted : October 12, 2017 6:01 am
Jarlve
(@jarlve)
Posts: 2547
Famed Member
Topic starter
 

Good luck with your talk next week doranchak!

Thank you for the test, the results are very clear. Rearrangement of rows is a possible cause for the high unigram distance in the 340 as it could either decrease or increase the value. Your test suggests, however, that the value of the 340 is quite rare and therefore less likely to have been caused by rearrangement of rows, right? Your example cipher decreases the unique sequence length 17 repeats from 26 to 18, and since we know that is a very significant observation, perhaps you could use it to filter out false positives.

Here are my 2 most likely hypotheses for the high unigram distance in the 340:

1. The high unigram distance of the 340 is related to the group of symbols that do not appear in the middle 7 rows (as marie said).
2. A long key (as Largo said) is used in the homophonic substitution process and some of the most frequently occurring symbols are for whatever reason not part of it or are wildcards. Likely hypotheses for the symbols that were not taken up in the homophonic substitution or are wildcards are that these are 1:1 substitutes, plaintext nulls or wildcards (as smokie said). I would like to go as wide as possible with the interpretation of wildcards.

In your test my cipher jarlve2 very closely matches the 340, which is hypothesis 2. Some things to look into may be exotic cycling types (again), such as palindromic cycling, which also increases unigram distance, and not trying to repeat symbols in a certain view window as opposed to actively cycling homophones.

AZdecrypt

 
Posted : October 12, 2017 3:01 pm
smokie treats
(@smokie-treats)
Posts: 1626
Noble Member
 

Thanks for the explanation about how you score the cycles.

I re-drafted it into 5 columns, and you can really see it. Tonight I will re-draft into different column counts to look for a pattern. Then I will explore in other ways.

 
Posted : October 13, 2017 3:21 am
(@mr-lowe)
Posts: 1197
Noble Member
 

hi smokie nice work. maybe mark up the pivots in a separate spreadsheet for any visual patterns as well

 
Posted : October 13, 2017 4:43 am