Mr. Lowe, I didn’t know that you were out there working on this. This message has multiple cycled keys, all different, but I am not telling how many yet. For instance, the first symbol in the message I encoded with Key 1, the second symbol with Key 2, and so on. Almost all of the symbols in the message represent more than one letter, so it will not solve in an auto solver as presented. Because the keys are different, the message has to be modified by separating it into some number of "parts," as Jarlve calls them. For example, with a message that has two keys, there are symbols encoded with Key 1 (the "odds") and symbols encoded with Key 2 (the "evens"); Jarlve calls those Part 1 and Part 2. Determine how many parts there are with cycle scores, then compare the symbols in each part. Find the part that has the largest number of mutually exclusive symbols and keep that one the way it is. Then expand or change the symbols in the other part(s) so that each has a count of 1. You would end up with a lot of unique symbols, which would increase the multiplicity. Then feed it to an auto solver, probably Jarlve’s, and see if it will solve.
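The part-splitting described above is easy to sketch in code. This is a minimal illustration of the idea (my own toy example with made-up symbol values, not anyone's actual tool):

```python
def split_into_parts(cipher, n_parts):
    """Split a cipher (a list of symbols) into n interleaved parts.

    Part k collects the symbols at positions k, k + n, k + 2n, ...
    so for n_parts = 2 you get the "odds" and the "evens".
    """
    return [cipher[k::n_parts] for k in range(n_parts)]

# Toy 2-key message: Key 1 on the odd positions, Key 2 on the even ones.
cipher = [5, 12, 7, 12, 5, 9, 7, 12]
part1, part2 = split_into_parts(cipher, 2)
print(part1)  # symbols encoded with Key 1: [5, 7, 5, 7]
print(part2)  # symbols encoded with Key 2: [12, 12, 9, 12]
```

Each part can then be analyzed on its own, since within a part every symbol was produced by a single key.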
Jarlve, take your time. I wouldn’t be surprised if you or someone else blows this one out very fast though. I am not sure.
By the way, I found the exercise very instructive. I did it by hand. If you make a key and encode by hand, putting a check mark by the symbols as you go, you get a real feel for how Zodiac must have encoded the symbols. I don’t know if he used multiple keys, and would like to dispense with that idea if possible.
But with the 408, and I suspect with the 340, he started out with perfect cycles that became more random as he went. My theory is that there is a very simple reason for that. He wanted to make sure that he used all of the symbols in a cycle at least once. Once all were checked off a few times, he started to randomize more and more. With high frequency letters, he would have had a lot of check marks on the symbols. There would have been so many that it would have been difficult to keep track of the exact order of symbol selection.
If his key was written on a small piece of paper, or a bit condensed, that would explain the randomization because the check marks would start to be very difficult to count and keep track of with high frequency letter symbols. They would have been very clustered together. The only way to cycle symbols perfectly is to make a big key, with the symbols spread out away from each other so that there is room for a nice, neat line of check marks that are easy to count. That would have taken a lot more effort. Thus the increased randomization of the cycles in the second half.
I am going to have to make a table or graph maybe of cycle score totals row by row. Something to see if there is a trend or a big change somewhere to try to show more conclusive evidence of this.
Okay, I’ll start with some analysis first, because trying to solve a cipher that needs expansion may take hours or more, so I can’t blindly start expanding stuff. I hope to find out what is going on. You already gave some hints, so that helps.
Unique string frequencies (non-repeats):
Possible discrepancy between normal and horizontally mirrored. At first glance smokie7 seems quite a bit less cyclic than the 340. The IoC is lower than the 340’s, which means a flatter distribution of symbols.
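For reference, the IoC mentioned here can be computed with the standard index of coincidence formula (a generic sketch, not necessarily how it is computed in the posts above):

```python
from collections import Counter

def index_of_coincidence(symbols):
    """IoC = sum(f_i * (f_i - 1)) / (N * (N - 1)).

    Lower values mean a flatter symbol distribution.
    """
    n = len(symbols)
    counts = Counter(symbols)
    return sum(f * (f - 1) for f in counts.values()) / (n * (n - 1))
```

For example, a perfectly flat text like "aabb" gives 1/3, while "aaaa" gives the maximum of 1.0.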
Bigram distribution over different orientations/directions:
Horizontals: 25.75%
Verticals: 27.34%
Diagonals 1: 21.09%
Diagonals 2: 25.78%
Bigrams are equally spread over the directions, typically there should be a 30-40%+ bump for the horizontals. Horizontal bigrams are not hiding at higher periods.
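A directional bigram count like the one above could be sketched as follows (my reconstruction of the general idea, not Jarlve's actual code): walk the grid along each direction, tally the bigrams, and count how many are repeats.

```python
from collections import Counter

def bigram_repeats(grid, dr, dc):
    """Count repeated bigrams along direction (dr, dc) in a 2D grid."""
    rows, cols = len(grid), len(grid[0])
    bigrams = Counter()
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                bigrams[(grid[r][c], grid[r2][c2])] += 1
    # A bigram seen f times contributes f - 1 repeats.
    return sum(f - 1 for f in bigrams.values() if f > 1)

# Directions: horizontal, vertical, and the two diagonals.
directions = {"horizontal": (0, 1), "vertical": (1, 0),
              "diagonal 1": (1, 1), "diagonal 2": (1, -1)}

# Tiny made-up grid; horizontal has the most repeats here.
grid = [[1, 2, 1],
        [1, 2, 1]]
for name, (dr, dc) in directions.items():
    print(name, bigram_repeats(grid, dr, dc))
```

Normalizing each direction's repeat count by the total over all four directions gives percentages like those listed above.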
Row-repeats:
Some lines have quite a bit more repeats than others; not sure if that means anything. Only 5 rows have no repeats, which adds to smokie7 appearing less cyclic than the 340.
Next up will be parts analysis with my new cycle measurement.
Cycle analysis for smokie7 (summed unique string frequencies versus cycle measurement):
(summed unique string frequencies a.k.a. non-repeats are normalized over the character count with 340 as base)
Splits, and other stuff:
Full: 3675 / 140
Mirrored: 4107 / 161
Uneven rows flipped: 3525 / 151
Even rows flipped: 3627 / 147
Uneven rows only: 3394 / 119
Even rows only: 3966 / 150
1st half: 3486 / 105
2nd half: 3860 / 129
Character 1-113: 3198 / 98
Character 114-226: 3758 / 125
Character 227-340: 3835 / 108
Rows 1-5: 3000 / 93
Rows 6-10: 3552 / 105
Rows 11-15: 3520 / 105
Rows 16-20: 3868 / 107
Parts
(a 2-part split would be one part sitting on the uneven positions and another on the even positions, and so on)
2 parts:
Part 1: 3310 / 118
Part 2: 3734 / 113
3 parts:
Part 1: 4169 / 170
Part 2: 6526 / 155
Part 3: 5304 / 163
4 parts:
Part 1: 3644 / 101
Part 2: 2932 / 115
Part 3: 3472 / 120
Part 4: 3508 / 131
Wow this is fun!
Strangely, mirrored does better than normal; I can’t explain this yet.
It’s looking quite clear that it’s a 3-part message since it yields the strongest returns; if not, then something really fluky is going on. Sadly this is all I can do for today. Tomorrow I’ll start with some cipher score return tests from the solver to see if something can be learned from that. I don’t want to break the cipher immediately; I’m aiming to improve my information systems so that it may carry over to the 340.
Jarlve and Mr. Lowe, don’t get too excited about expanding symbols quite yet. I am afraid that the best we may be able to do on this project is identify whether Zodiac used multiple keys. If you have a three-part message, you will have to expand the non-mutually-exclusive symbols in two of those parts. Multiplicity will be too high. One important factor is the total count of symbols in each part that are not mutually exclusive.
EDIT: You expanded to 148 symbols with m5p1, but that was a bit different. I think that you only used one key, and randomized every fourth symbol starting with the second symbol.
With the 340 odds and evens:
Shared number of symbols in Part 1 and Part 2 = 49.
Part 1 count of shared symbols (the number to expand) = 139.
Part 1 number of mutually exclusive symbols = 9.
Part 2 count of shared symbols (the number to expand) = 158.
Part 2 number of mutually exclusive symbols = 5.
So it seems to me that to minimize multiplicity, I would have to add:
139 shared and expanded symbols from Part 1
9 mutually exclusive symbols from Part 1
49 shared but not expanded symbols from Part 2
5 mutually exclusive symbols from Part 2
———————————————————-
= 202 unique symbols / 340 = 0.594 multiplicity
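The sum above can be checked directly (using the counts from this post; note that the EDIT below says the underlying logic was later retracted, but the arithmetic itself is straightforward):

```python
# Counts taken from the post above (340 odds and evens).
part1_shared_expanded = 139   # shared symbols in Part 1, all to be expanded
part1_exclusive = 9           # mutually exclusive symbols in Part 1
part2_shared = 49             # shared symbols kept as-is in Part 2
part2_exclusive = 5           # mutually exclusive symbols in Part 2

unique_symbols = (part1_shared_expanded + part1_exclusive
                  + part2_shared + part2_exclusive)
multiplicity = unique_symbols / 340
print(unique_symbols, round(multiplicity, 3))  # 202 0.594
```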
I think that this logic is correct, and I may have to just limit my work to trying to make a message with multiple keys that mimics 340 stats. If you see things differently, let me know. EDIT: The logic was incorrect; see Jarlve’s post below for an explanation about expanding and multiplicity.
There is a difference between parts with different keys and wildcards or randomized symbols.
We need to create an entirely new set of symbols for wildcards or randomized symbols because the original mapping has been destroyed. But for parts with different keys we can keep the symbols, because they map correctly within each part, so I suggest we just add a numeric value to set them apart. There are 58 symbols in the uneven part and 54 symbols in the even part. That should give us no more than 112 symbols.
Here is the 340 numbered by appearance; just add 100 to the uneven or even part. 112 symbols, multiplicity 0.32.
101 2 103 4 105 6 107 8 109 10 111 12 113 14 115 16 117 18 105 19 120 21 122 23 124 25 126 27 128 29 130 31 132 33 120 34 135 36 137 19 138 39 115 26 121 33 113 22 140 1 141 42 105 5 143 7 106 44 130 8 145 5 123 19 119 3 131 16 146 47 137 19 140 48 149 17 111 50 151 9 119 52 153 10 154 5 144 3 107 51 106 23 155 30 117 56 110 51 104 16 125 21 122 50 119 31 157 24 158 16 138 36 159 15 108 28 140 13 111 21 115 16 141 32 149 22 123 19 146 18 127 40 119 60 113 47 117 29 137 19 161 19 139 3 116 51 120 36 134 62 163 53 131 55 140 6 138 8 119 7 141 19 123 5 143 29 151 20 134 55 138 19 103 54 150 48 102 11 125 27 120 5 161 14 137 31 123 16 129 36 106 3 141 11 130 50 114 53 137 28 119 52 120 51 140 63 147 42 134 22 119 18 111 50 151 20 136 21 158 44 103 6 115 51 118 7 132 50 116 53 161 28 136 8 153 48 119 19 134 20 159 12 130 35 153 47 156 2 104 8 138 39 150 55 119 11 136 28 145 40 120 31 121 23 105 7 128 32 137 57 115 16 103 36 114 19 113 12 163 56 129 19 151 6 126 20 111 33 113 19 119 33 126 56 140 26 136 9 123 42 101 14 154 21 133 5 111 51 110 17 126 29 143 48 120 46 127 23 120 30 155 56 136 4 137 25 101 18 105 10 142 40 139 23 144 62 111 31 158 19
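This renumbering scheme can be sketched like so (my reconstruction of the idea: number symbols by order of first appearance, then add 100 to the symbols on uneven positions so the two interleaved parts use disjoint symbol sets; the toy message is made up):

```python
def renumber_with_offset(cipher, offset=100):
    """Number symbols by order of first appearance, then add `offset`
    to the symbols sitting on the uneven (1st, 3rd, ...) positions so
    the two interleaved parts share no symbol numbers."""
    first_seen = {}
    numbered = []
    for s in cipher:
        if s not in first_seen:
            first_seen[s] = len(first_seen) + 1
        numbered.append(first_seen[s])
    # Index 0 corresponds to position 1, i.e. the uneven part.
    return [n + offset if i % 2 == 0 else n
            for i, n in enumerate(numbered)]

toy = ["A", "B", "A", "C", "B", "B"]
print(renumber_with_offset(toy))  # [101, 2, 101, 3, 102, 2]
```

With 58 symbols in the uneven part and 54 in the even part, the combined alphabet is at most 58 + 54 = 112 symbols, as stated above.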
O.k. you are right. I was being a Zodiac but that helps me feel better about smokie7 and I will edit that post soon. I hope that it can be solved. I have been thinking also about a new technique for making messages.
I’m not going to try to solve the smokie7 because it is too difficult (will make attempts when my solver improves). Don’t worry about it, I think it’s far more important to be able to figure out what is going on with a cipher than solving it right away (ask questions first, shoot later). Do we have strong leads that the Zodiac 340 cipher is a 2/3/4 keyed (even/uneven/every n’th) part cipher?
By the way, is my analysis for smokie7 correct that it’s a 3-part keyed cipher? If we can figure out a couple of these, then perhaps we could also figure out whether anything like this is actually going on in the 340. If not, then we can move on to other polyalphabetic schemes.
I’m in the process of creating a cipher for you, but I need to make adaptations to my software, so it might take a few days. By the way, if you make another cipher for me, don’t tell me anything about it, but let me make a cipher for you first.
Yes, smokie7 has three keys. With both smokie6 and smokie7, where I cycled the symbols, your stats showed that as soon as you broke the message down into the right number of parts. Your 340 numbers did not show that. Actually, I should probably do a table/line-up of messages soon to compare with the 340, with my smokie7 numbers included. Go ahead and make another one, and I will make another one that I have plans for. Take your time. I will start on a simple table, and we can fill in stats for the next two messages. Then we can decide if Zodiac may have used more than one key or done something similar to one of the recent messages. Thanks.
Have fun: m6p28.txt
Excellent.
O.k., here is smokie8, where I attempted to emulate 340 cycle stats using multiple keys:
50 45 43 50 1 57 36 33 41 61 44 10 31 51 32 12 46
55 23 7 25 56 11 55 51 18 12 58 24 48 16 52 45 34
29 13 20 60 16 39 44 15 50 59 28 62 63 54 8 22 22
36 13 46 37 48 17 36 26 56 15 32 6 26 38 30 23 32
52 44 1 15 11 57 39 29 9 5 12 19 13 20 10 30 17
50 8 59 49 43 47 27 16 8 33 23 24 34 25 46 21 51
9 47 36 36 8 6 18 40 7 44 42 24 48 15 15 16 46
52 9 32 40 14 20 22 29 47 3 12 10 21 4 58 53 41
57 60 49 63 57 54 50 46 1 38 45 7 10 36 55 60 22
32 54 18 12 14 19 41 19 2 15 55 31 36 34 56 17 43
3 15 41 52 8 45 4 29 9 48 56 36 46 24 16 55 26
61 31 36 35 56 15 22 11 63 19 34 49 62 33 55 44 9
5 38 1 18 58 46 22 58 44 49 46 10 21 60 62 48 1
34 29 28 33 26 34 50 4 32 24 50 18 51 44 16 40 8
34 56 13 55 30 38 27 60 44 62 22 34 10 47 4 38 35
18 10 47 29 63 15 46 49 48 51 1 44 45 4 7 26 32
17 13 36 9 6 18 13 56 16 60 23 13 11 4 10 33 22
32 12 55 53 44 13 18 15 50 44 1 49 18 55 43 11 16
12 55 31 24 34 56 45 9 49 45 15 56 51 45 44 27 55
47 20 13 21 54 6 10 57 50 19 32 15 34 38 37 60 61
I had to get through this so that I could get to the table analysis, which may take some time.
Thought of the day, unless you already thunked it: whilst we are discussing two-part or three-part cipher possibilities, the end of each part may have filler. So in effect two or three lines could have filler in them, so as to start the code off again on the next line in the standard reading format and not give away that it’s a new cipher. This would add to its complexity.
Hope ya get what I’m thinking.
@Mr lowe, that is a possibility.
@smokie, thanks for another cipher.
Cycle analysis for smokie8:
Unique string frequencies:
Flat top?
Encoding direction assessment:
Full: 4037 / 196
Mirrored: 3900 / 172
Uneven rows flipped: 3644 / 176
Even rows flipped: 3948 / 195
Seems to be encoded normally (left-to-right) but perhaps some discrepancies.
Order of encoding, polyalphabetic and further analysis:
Rows and columns assessment:
Uneven rows: 3790 / 120
Even rows: 3868 / 123
Rows +3 starting from 1: 2988 / 120
Rows +3 starting from 2: 3545 / 111
Rows +3 starting from 3: 3413 / 109
Rows +4 starting from 1: 3456 / 109
Rows +4 starting from 2: 3416 / 134
Rows +4 starting from 3: 2896 / 110
Rows +4 starting from 4: 4892 / 137
Rows, uneven pairs: 3750 / 164
Rows, even pairs: 3636 / 116
Uneven columns: 3847 / 188
Even columns: 3302 / 126
Columns 1-9: 3717 / 121
Columns 10-17: 3393 / 135
Uneven columns and uneven paired rows are quite high, possible outliers.
Division parts assessment:
1st half: 3938 / 140
2nd half: 3952 / 163
Character 1-113: 4022 / 126
Character 114-226: 4173 / 123
Character 227-340: 3784 / 147
Rows 1-5: 3844 / 114
Rows 6-10: 3384 / 104
Rows 11-15: 3998 / 127
Rows 16-20: 3652 / 135
Looks fairly normal though rows 6-10 don’t seem to cycle very well.
Interval (interlaced) parts assessment:
2 parts:
Part 1: 3882 / 141
Part 2: 3560 / 145
3 parts:
Part 1: 3203 / 111
Part 2: 3204 / 128
Part 3: 3724 / 108
4 parts:
Part 1: 3088 / 130
Part 2: 3616 / 127
Part 3: 3292 / 106
Part 4: 3224 / 113
The numbers are weaker than in the division table, so splitting into interval parts is making it worse.
So far I can’t say much yet, although one of my other measurements shows that some of the symbols seem to prefer sitting on either even or uneven positions. It may be a lead worth looking into. Don’t tell me anything; I’m going to dig deeper but will need some time.
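The kind of even/uneven position preference mentioned here could be checked with something like the following (a hypothetical sketch, not Jarlve's actual measurement):

```python
from collections import defaultdict

def position_bias(cipher):
    """For each symbol, count occurrences on uneven (1st, 3rd, ...)
    versus even positions; a strong skew hints at interleaved keys."""
    counts = defaultdict(lambda: [0, 0])
    for i, s in enumerate(cipher):
        counts[s][i % 2] += 1   # index 0 = uneven position
    return dict(counts)

# Toy message where symbol 7 only appears on uneven positions.
print(position_bias([7, 3, 7, 3, 7, 5]))
# {7: [3, 0], 3: [0, 2], 5: [0, 1]}
```

Symbols whose counts pile up entirely on one side would be candidates for belonging to one key of a 2-part interleaved cipher.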
I like how you break down your assessments into direction, division parts and interval parts. It is high-quality work. I have learned a lot from this recent project and am thinking about how I may conclude. I will begin to assemble the table/data summary tonight.
Thanks,
I added another section to my previous post (rows and columns) because I didn’t want to miss anything obvious. You don’t have to include everything in your table just yet but some general categories in which to classify things under would be nice. I’m going the extra mile because when we get back to the 340 I don’t want to leave a stone unturned.
Tomorrow I’ll make a start on a symbol analysis for the smokie8.
Jarlve, I am having a lot of fun with this project. So far I have a nice simple format set up for the tables and data filled in for smokie8, which I made by hand and caused me to have some new thoughts. You are thorough and I am very interested in what you come up with for symbol analysis. I want to do more tonight, but am short on sleep and feeling very tired.