Homophonic substitution

Jarlve · 2015-08-02T16:42:57Z

This thread is a continuation of viewtopic.php?f=81&t=267 in which several aspects of the Zodiac 340 cipher are discussed and researched. I'd like to continue the work from there in this thread because then I can use the main post to reference and update all the cipher material being discussed.

Some of the questions which the contributors are trying to answer:

- Is the 340 a straightforward homophonic substitution cipher or is there something else going on?
- The 340 does not seem to cycle as well as the 408, what is going on? (doranchak:... _sequences)
- To what extent is the 340 cyclic or random? Can we find areas - as for instance with the last part of the 408 - that are more random?
- Is it possible to attribute the 340 not cycling as well as the 408 (despite its higher symbol count) due to some transposition after encoding?
- Some of the medium-high count symbols do not seem to cycle well, are these possibly wildcards/polyalphabetic or 1:1 substitutes? (smokie treats)
- Can we make a system that can adequately group homophones that belong to the same letter without having to solve the cipher? (smokie treats, glurk)
- Is there a discrepancy between symbols/cycles/etc on odd and even positions for the 340? If so, what could be causing this? (daikon, doranchak, smokie treats)
- There is a significant bigram repeat peak at period 19, is this a lead to the encryption scheme of the 340? (daikon)

Related:

- 2 symbol cycle analysis for the 340 evens only. (doranchak)
- 2 symbol cycle analysis for the 340 odds only. (doranchak)
- Symbol position factors for the 340, 408 and smokie ciphers.
(doranchak)

340 cipher numeric and symbolic version:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
18 5 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33
20 34 35 36 37 19 38 39 15 26 21 33 13 22 40 1 41
42 5 5 43 7 6 44 30 8 45 5 23 19 19 3 31 16
46 47 37 19 40 48 49 17 11 50 51 9 19 52 53 10 54
5 44 3 7 51 6 23 55 30 17 56 10 51 4 16 25 21
22 50 19 31 57 24 58 16 38 36 59 15 8 28 40 13 11
21 15 16 41 32 49 22 23 19 46 18 27 40 19 60 13 47
17 29 37 19 61 19 39 3 16 51 20 36 34 62 63 53 31
55 40 6 38 8 19 7 41 19 23 5 43 29 51 20 34 55
38 19 3 54 50 48 2 11 25 27 20 5 61 14 37 31 23
16 29 36 6 3 41 11 30 50 14 53 37 28 19 52 20 51
40 63 47 42 34 22 19 18 11 50 51 20 36 21 58 44 3
6 15 51 18 7 32 50 16 53 61 28 36 8 53 48 19 19
34 20 59 12 30 35 53 47 56 2 4 8 38 39 50 55 19
11 36 28 45 40 20 31 21 23 5 7 28 32 37 57 15 16
3 36 14 19 13 12 63 56 29 19 51 6 26 20 11 33 13
19 19 33 26 56 40 26 36 9 23 42 1 14 54 21 33 5
11 51 10 17 26 29 43 48 20 46 27 23 20 30 55 56 36
4 37 25 1 18 5 10 42 40 39 23 44 62 11 31 58 19

HER>pl^VPk|1LTG2d
Np+B(#O%DWY.<*Kf)
By:cM+UZGW()L#zHJ
Spp7^l8*V3pO++RK2
_9M+ztjd|5FP+&4k/
p8R^FlO-*dCkF>2D(
#5+Kq%;2UcXGV.zL|
(G2Jfj#O+_NYz+@L9
d<M+b+ZR2FBcyA64K
-zlUV+^J+Op7<FBy-
U+R/5tE|DYBpbTMKO
2<clRJ|*5T4M.+&BF
z69Sy#+N|5FBc(;8R
lGFN^f524b.cV4t++
yBX1*:49CE>VUZ5-+
|c.3zBK(Op^.fMqG2
RcT+L16C<+FlWB|)L
++)WCzWcPOSHT/()p
|FkdW<7tB_YOB*-Cc
>MDHNpkSzZO8A|K;+

Alterations of the 340:

- In relation to the bigram peak at period 19:
Scheme: move 1 row down, 2 columns right and repeat (wrap around cipher): 340_1rd-2cr-w.txt (doranchak)
Grid 19 by 18, direction North-East (vertical) and 2 "?" symbols added: 340_19by18_n-e.txt
Grid 20 by 17, direction SW-SE (diagonal): 340_20by17_sw-se.txt
Grid 17 by 19, 17 symbols filler at end, vertically untransposed: 340_323_17.txt (smokie treats)
Grid 17 by 20, 16 symbols filler at end, vertically untransposed: 340_324_16.txt (smokie treats)
Grid 17 by 20, 15 symbols filler at end, vertically untransposed: 340_325_15.txt (smokie treats)
Grid 17 by 20, 14 symbols filler at end, vertically untransposed: 340_326_14.txt (smokie treats)
Grid 17 by 20, 13 symbols filler at end, vertically untransposed: 340_327_13.txt (smokie treats)
- In relation to the odd/even encoding scheme:
Evens only: 340evens.txt
Odds only: 340odds.txt
Randomized, shuffled: 340shuffled.txt (doranchak)

Tools/links/solvers:

- David Oranchak Zodiac Killer Ciphers:Zodiac Ciphers wiki:... =Main_Page CryptoScope:340 Webtoy:Zodiac Pattern Drawer:| (info) Word Search Gadget:
- glurk ZKDecrypto:and viewtopic.php?f=81&t=2268
- Michael Cole The Zodiac Revisited:
- Jarlve AZdecrypt:

Visualizations:

- In relation to the bigram peak at period 19 and 15 (mirrored 340): Doranchak's ngram viewer. Doranchak's period calculator. Doranchak's fragment explorer.

Test ciphers:

I'd like to introduce a whole new range of ciphers to test on, mainly being homophonic substitution but with different schemes. More will be added and particular schemes can be requested. All of these ciphers can have low count 1:1 substitutes. Please use the proper names of the ciphers when referencing them. There should be no errors in these ciphers but the number of homophones per letter were handpicked each time to introduce a human element.

Perfect cycles: c_p1.txt c_p2.txt c_p3.txt
Randomization of cycles: (the numb...

doranchak

(@doranchak)

Posts: 2614

Member Admin

I am wondering if you can combine that sort of attack with another one:

For each transposition under consideration, apply it in reverse to Z340, then measure the resulting normal (period 1) ngrams, as well as the even/odd and top/bottom biases. Also measure the IOC of columns vs rows. Then, you can rank the transpositions based on how well they:

1) Increase the number of normal (period 1) repeating ngrams (or cause their quantities to match what we’d expect for a normal homophonic cipher)
2) Remove the even/odd bias and top/bottom bias
3) Match the expected column and row IOCs of normally-enciphered homophonic ciphers.
4) Match other expectations of normally-enciphered homophonic ciphers (not sure what would be in this category yet)

Doing this might help narrow down the possibilities even more, although it seems the large amount of choices to draw from for transposition steps would be prohibitive to a brute force search. It’d be interesting to see how the numbers compare for each type of transposition.
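
The ranking idea above can be sketched roughly as follows. This is only an illustration, not anyone's actual tool: the function names and the `inverse_order` representation of a candidate transposition are made up for the example, and the even/odd and top/bottom bias measures are left out because their exact definitions are not given here.

```python
from collections import Counter

def repeated_ngrams(symbols, n=2, period=1):
    """Count n-gram repeats: an n-gram seen c times contributes c - 1."""
    grams = Counter(
        tuple(symbols[i + k * period] for k in range(n))
        for i in range(len(symbols) - (n - 1) * period)
    )
    return sum(c - 1 for c in grams.values() if c > 1)

def ioc(symbols):
    """Index of coincidence: chance that two randomly drawn symbols match."""
    n = len(symbols)
    if n < 2:
        return 0.0
    counts = Counter(symbols)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))

def mean_line_ioc(symbols, width, by="rows"):
    """Average IOC over the rows (or columns) of a grid `width` symbols wide."""
    if by == "rows":
        lines = [symbols[i:i + width] for i in range(0, len(symbols), width)]
    else:
        lines = [symbols[c::width] for c in range(width)]
    return sum(ioc(line) for line in lines) / len(lines)

def score_candidate(cipher, inverse_order, width=17):
    """Undo one candidate transposition and collect statistics to rank on.

    inverse_order[i] is the cipher index that supplies position i of the
    untransposed text (a hypothetical representation for this sketch).
    """
    undone = [cipher[j] for j in inverse_order]
    return {
        "bigram_repeats": repeated_ngrams(undone, n=2, period=1),
        "row_ioc": mean_line_ioc(undone, width, by="rows"),
        "column_ioc": mean_line_ioc(undone, width, by="columns"),
    }
```

A ranking would then compare each candidate's numbers against the same statistics measured on normally enciphered homophonic test ciphers, which is the reference the list above asks for.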

Perhaps you have already done something similar to this in your experiments.

http://zodiackillerciphers.com

Posted : November 17, 2016 10:05 pm

smokie treats

(@smokie-treats)

Posts: 1626

Noble Member

EDIT: The other thing is that there are only 230 possible period 110 repeats to begin with, making the spike more significant.

I haven’t done anything quite that sophisticated. My basic idea is to categorize different types of transpositions, make a list. Then try variations of each on the list and compare, or overlay, with the 340. See if period 1 bigram positions match up. But not just the typical period 1 bigrams that create an obvious pattern, but the non-obvious ones, such as created at the edge of a rectangle, or transition locations between multiple inscription rectangles which would occur at certain intervals. Maybe give special weight to those. And keep trying until finding something that looks more like the 340 than anything else.

Your post about the mirrored period 110 repeats has been in the back of my mind for over a year now. They are similar to the period 29 repeats because they do not appear unless the message is mirrored ( left, mirrored, right regular ). I will have to take a closer look at those to see what they are. 110 doesn’t have to be close to a multiple of 15 or 19 if there are three rectangles.

Posted : November 18, 2016 4:42 pm

doranchak

smokie treats wrote: "EDIT: The other thing is that there are only 230 possible period 110 repeats to begin with, making the spike more significant."

Good point – I hadn’t considered that before. Usually, when I look for periodic bigrams, I allow positions that "fall off" the cipher to wrap around back to the beginning at the first unvisited position. That way I guarantee that all 340 symbols are visited.
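
To make the two counting conventions concrete, here is a small sketch. The function names are invented for the example and the wrap rule is my reading of the description above (restart at the first position not yet visited), so treat it as an assumption rather than the exact tool behaviour.

```python
def periodic_order(length, period):
    """Visit positions 0, p, 2p, ...; on falling off the end, restart at the
    first position not yet visited, so every position is visited once."""
    visited = [False] * length
    order = []
    start = 0
    while len(order) < length:
        while visited[start]:
            start += 1
        pos = start
        while pos < length and not visited[pos]:
            visited[pos] = True
            order.append(pos)
            pos += period
    return order

def periodic_bigrams(symbols, period, wrap=False):
    """Bigrams at the given period, with or without the wrap-around rule."""
    if wrap:
        order = periodic_order(len(symbols), period)
        return [(symbols[a], symbols[b]) for a, b in zip(order, order[1:])]
    return [(symbols[i], symbols[i + period])
            for i in range(len(symbols) - period)]
```

Without wrapping, a 340 symbol message has 340 - 110 = 230 bigram positions at period 110. With the wrap rule every symbol is visited, so there are 339 adjacent pairs in the reading order, some of which span a restart.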

smokie treats wrote: "My basic idea is to categorize different types of transpositions, make a list."

Have you posted that list somewhere? I want to eventually think about how to generalize that to a transposition function that can explore the space of all such transpositions.

Posted : November 18, 2016 5:00 pm

smokie treats

I have sort of an outline that I started a long time ago. I should probably work on it some more to include different types of diagonal transposition and odd even column transposition.

viewtopic.php?f=81&t=2916

Posted : November 18, 2016 5:58 pm

doranchak

Nice – thanks for reminding me about the outline. I will plan to circle back to that at some point. I’m taking a neural networks / machine learning course at the moment. For the final project, I hope to build a cipher identification network that will attempt to distinguish ciphers of various types. I’m working from this list at the moment: http://www.cryptogram.org/resources/cipher-types

If the network is successful with the ACA types then I can start adding other types to the list. I’m curious how such a network would classify Z340 (and other unsolved ciphers), especially when all the test ciphers we’ve been collecting here on the forums are included in the training set for the network. Other classification techniques are out there, such as bion’s cipher ID test ( http://bionsgadgets.appspot.com/gadget_ … ended.html) and bion’s neural network ( http://bionsgadgets.appspot.com/gadget_ … ction.html), so there is some really useful work to build from. And the back issues of the ACA periodical are filled with many test ciphers to include in these experiments.

Even if the network fails to classify Z340 (it’s very likely that it won’t classify it), it will serve two purposes: 1) It will be a foundation for adding new cipher types to try to auto-identify them (for instance, the novel transposition schemes you are exploring), and 2) I will have generated a lot of statistics about all the cipher types that could be meaningful to compare to the stats of Z340 and other unknown ciphers.

Posted : November 18, 2016 6:30 pm

smokie treats

That sounds like a fun project. I am going to keep working on transposition, take a closer look at the period 110 repeats, and develop more detection tools.

Posted : November 19, 2016 4:18 am

doranchak

Awesome, I look forward to more reports of your progress!

Posted : November 19, 2016 6:05 am

smokie treats

I took another look at the coincidence count column chart. There is a spike at period 110.

Posted : November 20, 2016 6:32 pm

doranchak

smokie treats wrote: "I took another look at the coincidence count column chart. There is a spike at period 110."

Yes, it shows up on BartW’s plot as well: viewtopic.php?p=48147#p48147 (the 2nd plot on that post)

Also, I have a "fragment IOC" measurement that attempts to count the number of repeating fragments (patterns such as A?BC, AB???C, etc), and it gives period 110 the 2nd highest measurement. For some odd reason, it gives period 62 the 1st highest measurement. Has period 62 come up for you before in your analysis?

I need to revisit my fragment IOC calculation to make sure I’m doing it properly and not overcounting anything or making some other mistake like that. It doesn’t give period 1 the highest measurement in Z408 (it shows up 11th on the list). My hunch is that it’s easy to assign repeating fragments too much significance.
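
For anyone who wants to experiment, one possible way to count repeating fragments is sketched below. It is not the actual "fragment IOC" calculation, whose details are not given here: the masks, the span limit and the function names are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations

def fragment_masks(span, fixed):
    """Offset tuples that start at 0, end at span - 1 and keep `fixed`
    known symbols; (0, 2, 3) is A?BC and (0, 1, 5) is AB???C."""
    inner = range(1, span - 1)
    return [(0, *mid, span - 1) for mid in combinations(inner, fixed - 2)]

def repeating_fragments(symbols, max_span=6, fixed=3):
    """Total repeats over all masks: a fragment seen c times adds c - 1."""
    total = 0
    for span in range(fixed, max_span + 1):
        for mask in fragment_masks(span, fixed):
            seen = Counter(
                tuple(symbols[i + off] for off in mask)
                for i in range(len(symbols) - span + 1)
            )
            total += sum(c - 1 for c in seen.values() if c > 1)
    return total
```

Note that a single longer repeat is picked up by several masks at once in this scheme, which is one way the overcounting mentioned above could creep in.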

Posted : November 21, 2016 2:39 pm

smokie treats

I have not noticed period 62 for anything. I would guess that the + symbol ( my 19 ) is probably showing up a lot in your repeated fragments. I didn’t work on the 340 much this weekend. I have to take frequent breaks and also time out to think before I decide on a direction to invest my time in.

But here is the mirrored 340 redrafted into 55 columns, showing the period 110 repeats. The repeats seem to be clustered into groups of 14-15 columns, separated by three or four columns. May be of no significance, but interesting to me.

Posted : November 21, 2016 4:22 pm

smokie treats

I set up a very simple experiment to find out if I could detect the contours of multiple inscription rectangles. Here is how it works, with an example message.

Step 1. This is message # 82 from Jarlve’s plaintext library. Follow the colored bigram P-A through the example. P appears at positions 136, 219 and 280 and A appears at positions 137, 220 and 281.

Step 2. The simple transposition includes two 10 x 17 inscription rectangles. These are the positions. Inscribe left right top bottom.

Step 3. Lift plaintext to transcribe into a 17 x 20 rectangle vertically top bottom left right. These are the positions.

Posted : November 27, 2016 5:42 pm

smokie treats

The idea is that most of the bigram repeats are period 17, but some of them are period 152 with the plaintext in reverse order. The bigram repeats P-A that appear at positions 136-137 and 219-220 become period 17. But the bigram repeat P-A at positions 280-281 becomes A-P at period 152 because it is cut by the edge of an inscription rectangle.

Step 4. There are only 32 possible period 152 reverse bigrams, but with the plaintext and diffusion, there will actually be far fewer. These are the ones that the experiment is designed to detect. Scroll right to see them.
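
The transposition in Steps 2 to 4 can be sketched in a few lines. The layout is an assumption read off the description: each inscription rectangle is taken to be 10 columns wide and 17 rows tall, the two rectangles sit side by side, and the columns are lifted top to bottom, left to right. The names are invented for the example.

```python
ROWS, RECT_COLS, RECTS = 17, 10, 2   # assumed: two rectangles, 10 wide, 17 tall
SIZE = ROWS * RECT_COLS              # 170 letters per rectangle

def cipher_index(p):
    """Map a 1-based plaintext position to its 0-based position after the lift."""
    rect, offset = divmod(p - 1, SIZE)
    row, col = divmod(offset, RECT_COLS)
    global_col = rect * RECT_COLS + col   # rectangles assumed side by side
    return global_col * ROWS + row        # lift columns top-bottom, left-right

def lifted_gap(p, k=1):
    """Signed distance after the lift between plaintext positions p and p + k."""
    return cipher_index(p + k) - cipher_index(p)

def edge_cut_count():
    """Adjacent plaintext pairs that end up reversed at period 152."""
    return sum(1 for p in range(1, RECTS * SIZE) if lifted_gap(p) == -152)
```

With this layout, positions 136-137 and 219-220 come out 17 apart, 280-281 come out 152 apart in reverse order, and exactly 32 adjacent pairs are cut this way, which agrees with the numbers quoted in the steps.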

Posted : November 27, 2016 5:51 pm

smokie treats

Step 5. Here is the plaintext, transposed.

Step 6. Encoded with the following key. Note that symbol 1 maps to A and symbol 40 maps to P, with matching colors.

Step 7. Here is the message.

47 58 13 36 1 32 34 14 48 33 2 51 35 15 34 21 9
25 3 7 55 52 16 17 4 42 13 49 22 13 10 20 26 59
35 47 37 53 27 11 12 47 56 54 40 14 23 11 30 48 24
8 31 55 21 38 10 13 41 15 28 39 6 3 16 60 18 25
17 26 30 27 34 13 36 28 2 7 49 3 35 47 4 37 31
13 52 11 47 22 53 48 8 34 13 49 7 12 40 8 43 14
23 54 9 50 15 16 53 13 10 41 25 29 44 1 45 17 24
13 30 38 26 47 45 43 56 11 39 6 36 14 27 37 21 28
44 15 2 52 16 33 25 45 46 47 31 18 49 43 50 17 47
13 22 5 55 17 26 29 15 3 16 17 38 53 28 48 13 39
54 13 28 1 13 44 14 15 23 28 15 1 30 36 17 2 31
24 23 35 34 51 53 57 43 26 7 12 47 10 53 18 54 53
13 46 3 53 22 36 27 14 50 23 4 28 59 3 38 15 52
43 39 19 1 16 18 47 19 32 59 34 18 13 34 43 2 24
3 59 30 53 39 21 25 37 38 4 9 52 45 11 22 35 13
45 35 4 13 35 26 6 46 56 48 41 23 14 8 27 12 34
32 59 33 28 30 50 30 15 51 41 30 16 52 36 47 24 17
58 25 5 13 60 19 13 59 21 56 55 58 36 1 40 14 17
1 9 39 55 40 13 54 3 58 44 32 38 39 45 13 18 12
47 14 60 11 1 7 22 48 23 49 40 46 21 50 30 17 52

Posted : November 27, 2016 6:00 pm

smokie treats

Step 8. OK, here are ALL of the period 17 bigrams that have the same symbols as period 152 bigrams, but with symbols reversed. The other symbols are not shown for clarity.

Step 9. Here are ALL of the period 152 bigrams where the symbols are reversed, and match period 17 bigrams.

Step 10. There are only 53 period 17 bigram repeats, not nearly as many as Z340 ( the cycles are approximately 90% perfect rows 1-5, 80% perfect rows 6-10, 70% perfect rows 11-15, and 60% perfect rows 16-20 ).
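
The matching done in Steps 8 and 9 can be sketched like this. It is a guess at the procedure from the description, with made-up names, working on the message as a flat list of symbol numbers.

```python
from collections import defaultdict

def reversed_matches(symbols, forward_period=17, reverse_period=152):
    """Find bigrams at `reverse_period` whose symbols, swapped, also occur
    as a bigram at `forward_period`.

    Returns (forward_hits, reverse_hits): 0-based start positions of the
    matching bigrams at each period.
    """
    forward = defaultdict(list)
    for i in range(len(symbols) - forward_period):
        forward[(symbols[i], symbols[i + forward_period])].append(i)

    forward_hits, reverse_hits = set(), []
    for i in range(len(symbols) - reverse_period):
        swapped = (symbols[i + reverse_period], symbols[i])
        if swapped in forward:
            reverse_hits.append(i)
            forward_hits.update(forward[swapped])
    return sorted(forward_hits), reverse_hits
```

A start position `i` in either list marks two cells, `i` and `i` plus the period, so both would need highlighting when drawing the grid.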

Posted : November 27, 2016 6:14 pm

smokie treats

Step 11. You would think that there would be a small spike at period 152 where the symbols, reversed, have matches at period 17. But unfortunately with these types of messages there are a lot of reverse matches at a lot of periods. There is no noticeable spike at 152, as compared to the other reverse period matches. The red lines are at multiples of 17, minus 1, showing where you would expect a spike but usually there is none. Scroll right to see the nonexistent spike.

Step 12. However, I found another way to detect the boundaries of the inscription rectangles, without yet going to the probability scores of the repeats. I just summed the count of highlighted cells on each row. Then I summed each combination of side by side rows. Here the sum of the rows 10 and 11 is larger than the sum of any other two rows, showing the boundaries of the two inscription rectangles.

I made 100 similar messages, and the method worked for 64 of them. With the other 36 messages, there were two other rows with more cells highlighted, or there was a tie between rows 10 and 11 and two other rows. So it worked almost two out of three times.
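
The row-sum test from Step 12 is easy to sketch, assuming the highlighted cells are the cipher positions flagged in Steps 8 and 9 and the message is laid out 17 symbols per row. The function name and the tie handling are choices made for the example.

```python
def boundary_rows(highlighted, length=340, width=17):
    """Pick the pair of adjacent rows holding the most highlighted cells.

    `highlighted` holds 0-based cipher positions. Returns the 1-based row
    numbers of the winning pair, or None when two pairs tie for first place.
    """
    n_rows = -(-length // width)          # ceiling division
    per_row = [0] * n_rows
    for pos in set(highlighted):
        per_row[pos // width] += 1
    pair_sums = [per_row[r] + per_row[r + 1] for r in range(n_rows - 1)]
    best = max(pair_sums)
    winners = [r for r, total in enumerate(pair_sums) if total == best]
    if len(winners) != 1:
        return None
    return winners[0] + 1, winners[0] + 2
```

Returning None on a tie mirrors the way ties were counted as misses in the 100 message trial above.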

Step 13. I think that the system could be improved by looking for other reverse bigrams that are created by the edge of an inscription rectangle. Period 2 becomes period 34 and reverse period 135, period 3 becomes period 51 and reverse period 118. And so on.

Basically, the idea is that if there is more than one inscription or transcription shape in the 340, it may be possible to detect their boundaries by finding a path through the message that ties together broken repeating n-grams at that boundary.

That is all for this morning.

Posted : November 27, 2016 6:36 pm

Zodiac Discussion Forum