Homophonic substitution

Jarlve · 2015-08-02T16:42:57Z

This thread is a continuation of viewtopic.php?f=81&t=267 in which several aspects of the Zodiac 340 cipher are discussed and researched. I'd like to continue the work from there in this thread because then I can use the main post to reference and update all the cipher material being discussed.

Some of the questions which the contributors are trying to answer:

- Is the 340 a straightforward homophonic substitution cipher or is there something else going on?
- The 340 does not seem to cycle as well as the 408, what is going on? (doranchak:... _sequences)
- To what extent is the 340 cyclic or random? Can we find areas - as for instance with the last part of the 408 - that are more random?
- Is it possible to attribute the 340 not cycling as well as the 408 (despite its higher symbol count) due to some transposition after encoding?
- Some of the medium-high count symbols do not seem to cycle well, are these possibly wildcards/polyalphabetic or 1:1 substitutes? (smokie treats)
- Can we make a system that can adequately group homophones that belong to the same letter without having to solve the cipher? (smokie treats, glurk)
- Is there a discrepancy between symbols/cycles/etc on odd and even positions for the 340? If so, what could be causing this? (daikon, doranchak, smokie treats)
- There is a significant bigram repeat peak at period 19, is this a lead to the encryption scheme of the 340? (daikon)

Related:

- 2 symbol cycle analysis for the 340 evens only. (doranchak)
- 2 symbol cycle analysis for the 340 odds only. (doranchak)
- Symbol position factors for the 340, 408 and smokie ciphers. (doranchak)

340 cipher numeric and symbolic version:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 5 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 20 34 35 36 37 19 38 39 15 26 21 33 13 22 40 1 41 42 5 5 43 7 6 44 30 8 45 5 23 19 19 3 31 16 46 47 37 19 40 48 49 17 11 50 51 9 19 52 53 10 54 5 44 3 7 51 6 23 55 30 17 56 10 51 4 16 25 21 22 50 19 31 57 24 58 16 38 36 59 15 8 28 40 13 11 21 15 16 41 32 49 22 23 19 46 18 27 40 19 60 13 47 17 29 37 19 61 19 39 3 16 51 20 36 34 62 63 53 31 55 40 6 38 8 19 7 41 19 23 5 43 29 51 20 34 55 38 19 3 54 50 48 2 11 25 27 20 5 61 14 37 31 23 16 29 36 6 3 41 11 30 50 14 53 37 28 19 52 20 51 40 63 47 42 34 22 19 18 11 50 51 20 36 21 58 44 3 6 15 51 18 7 32 50 16 53 61 28 36 8 53 48 19 19 34 20 59 12 30 35 53 47 56 2 4 8 38 39 50 55 19 11 36 28 45 40 20 31 21 23 5 7 28 32 37 57 15 16 3 36 14 19 13 12 63 56 29 19 51 6 26 20 11 33 13 19 19 33 26 56 40 26 36 9 23 42 1 14 54 21 33 5 11 51 10 17 26 29 43 48 20 46 27 23 20 30 55 56 36 4 37 25 1 18 5 10 42 40 39 23 44 62 11 31 58 19

HER>pl^VPk|1LTG2d
Np+B(#O%DWY.<*Kf)
By:cM+UZGW()L#zHJ
Spp7^l8*V3pO++RK2
_9M+ztjd|5FP+&4k/
p8R^FlO-*dCkF>2D(
#5+Kq%;2UcXGV.zL|
(G2Jfj#O+_NYz+@L9
d<M+b+ZR2FBcyA64K
-zlUV+^J+Op7<FBy-
U+R/5tE|DYBpbTMKO
2<clRJ|*5T4M.+&BF
z69Sy#+N|5FBc(;8R
lGFN^f524b.cV4t++
yBX1*:49CE>VUZ5-+
|c.3zBK(Op^.fMqG2
RcT+L16C<+FlWB|)L
++)WCzWcPOSHT/()p
|FkdW<7tB_YOB*-Cc
>MDHNpkSzZO8A|K;+

Alterations of the 340:

- In relation to the bigram peak at period 19:
  Scheme: move 1 row down, 2 columns right and repeat (wrap around cipher): 340_1rd-2cr-w.txt (doranchak)
  Grid 19 by 18, direction North-East (vertical) and 2 "?" symbols added: 340_19by18_n-e.txt
  Grid 20 by 17, direction SW-SE (diagonal): 340_20by17_sw-se.txt
  Grid 17 by 19, 17 symbols filler at end, vertically untransposed: 340_323_17.txt (smokie treats)
  Grid 17 by 20, 16 symbols filler at end, vertically untransposed: 340_324_16.txt (smokie treats)
  Grid 17 by 20, 15 symbols filler at end, vertically untransposed: 340_325_15.txt (smokie treats)
  Grid 17 by 20, 14 symbols filler at end, vertically untransposed: 340_326_14.txt (smokie treats)
  Grid 17 by 20, 13 symbols filler at end, vertically untransposed: 340_327_13.txt (smokie treats)
- In relation to the odd/even encoding scheme:
  Evens only: 340evens.txt
  Odds only: 340odds.txt
  Randomized, shuffled: 340shuffled.txt (doranchak)

Tools/links/solvers:

- David Oranchak Zodiac Killer Ciphers:Zodiac Ciphers wiki:... =Main_Page CryptoScope:340 Webtoy:Zodiac Pattern Drawer:| (info) Word Search Gadget:
- glurk ZKDecrypto:and viewtopic.php?f=81&t=2268
- Michael Cole The Zodiac Revisited:
- Jarlve AZdecrypt:

Visualizations:

- In relation to the bigram peak at period 19 and 15 (mirrored 340): Doranchak's ngram viewer. Doranchak's period calculator. Doranchak's fragment explorer.

Test ciphers:

I'd like to introduce a whole new range of ciphers to test on, mainly being homophonic substitution but with different schemes. More will be added and particular schemes can be requested. All of these ciphers can have low count 1:1 substitutes. Please use the proper names of the ciphers when referencing them. There should be no errors in these ciphers but the number of homophones per letter were handpicked each time to introduce a human element.

Perfect cycles: c_p1.txt c_p2.txt c_p3.txt
Randomization of cycles: (the numb...

smokie treats

(@smokie-treats)

Posts: 1626

Noble Member

I worked on my cycle spreadsheet yesterday morning, and it is still a work in progress. I want to continue my work with the cycles. There is a reason why we can detect cipher created L=2 cycles, but not very many L=3-5 cycles. Maybe it is just random symbol selection, maybe it is intentionally random symbol selection. Or maybe it is something else. I am not convinced about any hypothesis yet.

I somewhat understand about the 17 position no-repeat windows. This may be related, but I wanted to point out that with L=2, the symbols are distributed farther away from each other, and therefore often appear on different rows. When looking at encoding left right top bottom instead of right left top bottom, there are still a lot of L=2 cycles, and some of them are different. Some start at the beginning of the message when reading left right top bottom, and some start at the beginning of the message when reading right left top bottom.

I really want to work on the cycles some more, and update my spreadsheet for L=3-5 and make more charts. I am a lot better at making spreadsheets than I was a year ago. Sometimes I want to make huge charts and stare at them on my tiny laptop screen. Sometimes I want to make huge charts and put them on the walls of my house. Sometimes I just want to mentally run away from the 340. Maybe a nice compromise would be to just get a bigger computer monitor.

I started an outline of my own because it is time for me to get more organized with my thoughts:
viewtopic.php?f=81&t=2916

Posted : May 22, 2016 6:48 pm

doranchak

(@doranchak)

Posts: 2614

Member Admin

Keep up the good work, smokie. Maybe you could get a projector to put charts on your walls. Better yet, get one of those projectors that projects the image AND turns the wall into a giant touch screen, so you can manipulate the image directly on the wall!

http://zodiackillerciphers.com

Posted : May 23, 2016 2:23 pm

smokie treats

Quote:
"The hypothesis is about how he applied homophonic substitution. He created a key as normal, as he did with the 408. Then, instead of cycling and thus keeping track of which symbol in the cycle should be used next, he picked a random symbol from that cycle that was not repeated in his window. By window I mean a particular range, for instance one row, a couple of rows, or the last x symbols. And he did just that for the whole cipher.

"Something in this manner. So, if he did that, any symbol that is repeated at close range ("+" and "p") should be a 1:1 substitute. This encoding style seems to fit observations very well."
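
The encoding style described in the quote can be sketched as follows. This is only an illustration under assumptions: the toy key, the window of 3 and the fallback rule (allow a repeat when every homophone of a letter is already inside the window) are hypothetical and are not taken from the thread.

```python
import random

def encode_windowed(plaintext, key, window, rng):
    """Homophonic encoding that avoids repeating a symbol inside a window.

    key maps each plaintext letter to the list of its cipher symbols.
    window is the number of preceding cipher symbols to check.
    """
    out = []
    for letter in plaintext:
        homophones = key[letter]
        recent = set(out[-window:]) if window > 0 else set()
        fresh = [s for s in homophones if s not in recent]
        # Assumed fallback: if every homophone was used recently
        # (always the case for a 1:1 substitute), a repeat is unavoidable.
        out.append(rng.choice(fresh or homophones))
    return out

# Hypothetical toy key: E has three homophones, T has two, L is 1:1.
key = {"E": [1, 2, 3], "T": [4, 5], "L": [6]}
cipher = encode_windowed("TELLTELETTE", key, window=3, rng=random.Random(0))
print(cipher)
```

With such a scheme only letters with too few homophones can repeat at close range, which is the reasoning behind treating closely repeated symbols as candidate 1:1 substitutes.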

Jarlve, can you point me to the page in this thread where you show a chart or a graph so that I can see the spike at 17? I remember reading about it before, but didn’t quite understand. I understand the above. But did you move sliding windows of increasing lengths through the message to find this measurement? Wouldn’t there be fewer and fewer repetitions with smaller and smaller windows? How could there be more repetitions with windows of 16 instead of 17? Or did you not slide the windows, but instead examine alternating window sections?

What do you think the motive was for not repeating symbols in rows? Was he trying to hide the cycles because he thought that maybe the 408 cycles made the 408 easy to solve?

Here is what I am currently wondering about. At top are the first 34 positions, encoding left to right. The vertical axis is the symbol, and the horizontal axis is the position. The green shaded cells mark the beginning of a cycle, whether it is that symbol at that position, or the other symbol at that position ( green shaded cells always appear in pairs ). Note positions 13 to 19, several positions where a cycle doesn’t even start. Why? We have L=2 cycles. But why are there several symbols all together in the first row of the message that do not mark the beginning of a cycle? Superimposed in the upper right are the same positions.

Below is the mirrored version, but I also had to re-number the symbols so that each first appearance gets the lowest consecutive number, so that you could see the diagonal row. Positions 13-17 become positions 1-5. Looking at the cycles this way, as if he encoded from (EDIT) right to left, all five of these positions mark the beginning of a cycle. The minimum number of consecutive alternations is 6 (not a high threshold) for both pictures.
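
A minimal sketch of how L=2 cycles with a minimum number of consecutive alternations could be found, in both the normal and the mirrored reading direction. The function names are hypothetical and this is not the spreadsheet method itself, just one plausible way to compute the same kind of result.

```python
from itertools import combinations

def longest_alternation(cipher, a, b):
    """Longest run of strictly alternating a/b occurrences (a b a b = 4)."""
    best = run = 0
    prev = None
    for s in cipher:
        if s != a and s != b:
            continue  # ignore all other symbols
        run = run + 1 if prev is not None and s != prev else 1
        best = max(best, run)
        prev = s
    return best

def l2_cycles(cipher, min_alternations=6):
    """All symbol pairs whose longest alternation reaches the threshold."""
    pairs = {}
    for a, b in combinations(sorted(set(cipher)), 2):
        n = longest_alternation(cipher, a, b)
        if n >= min_alternations:
            pairs[(a, b)] = n
    return pairs

def mirror_rows(cipher, width=17):
    """Reverse every row, i.e. read right to left, top to bottom."""
    rows = [cipher[i:i + width] for i in range(0, len(cipher), width)]
    return [s for row in rows for s in reversed(row)]
```

Running `l2_cycles` on the cipher and on `mirror_rows(cipher)` gives the two sets of pairs to compare; recording where each run starts would be a small extension.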

That is what is currently bothering me.

Posted : May 23, 2016 4:01 pm

Jarlve

(@jarlve)

Posts: 2547

Famed Member

Topic starter

Quote:
"Thanks for clarifying – I think I understand now. My hunch is that even if the cycling is weakened this way, it shouldn’t really affect azdecrypt’s ability to find solutions. However, with 1:1 substitutes thrown in, you have already ruled that out, up to some multiplicity limit. Is this fair to say?"

Yes of course, it’s no problem to solve this. But as I said, I’m talking purely about the homophonic substitution layer, and I assume there is another reason why the 340 is not solving.

This is my more general hypothesis for the 340:

Layer 1: plaintext into transposed plaintext.
Layer 2: transposed plaintext into homophonic substitution.
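
The two layers can be illustrated with a short sketch. The thread does not say which transposition is meant, so the columnar scheme, the strict cycling in layer 2 and the toy key below are all assumptions made only for the example.

```python
from itertools import cycle

def transpose_columnar(text, width):
    """Layer 1 (assumed scheme): write in rows of `width`, read by column."""
    return "".join(text[c::width] for c in range(width))

def substitute_cycling(text, key):
    """Layer 2: homophonic substitution, each letter's symbols in rotation."""
    cycles = {letter: cycle(symbols) for letter, symbols in key.items()}
    return [next(cycles[letter]) for letter in text]

key = {"A": [1, 2], "B": [3], "C": [4, 5], "D": [6]}  # hypothetical toy key
plaintext = "ABCDABCD"
layer1 = transpose_columnar(plaintext, width=4)  # "AABBCCDD"
layer2 = substitute_cycling(layer1, key)         # [1, 2, 3, 3, 4, 5, 6, 6]
print(layer1, layer2)
```

In this order the symbols are assigned while walking the already transposed text, so any cycling follows the ciphertext reading order and only the plaintext order is disturbed.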

AZdecrypt

Posted : May 23, 2016 5:10 pm

Jarlve

Quote (smokie treats):
"Jarlve, can you point me to the page in this thread where you show a chart or a graph so that I can see the spike at 17? I remember reading about it before, but didn’t quite understand. I understand the above. But did you move sliding windows of increasing lengths through the message to find this measurement? Wouldn’t there be fewer and fewer repetitions with smaller and smaller windows? How could there be more repetitions with windows of 16 instead of 17? Or did you not slide the windows, but instead examine alternating window sections?"

Take the string "ABCABCABC". Now for every position sum the largest string that has no repeating symbols in your given direction.

ABCABCABC
333333321 (sum=24)

Let’s randomize the string a bit.

ABCCBAABC
321321321 (sum=18)
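
The measurement can be written down in a few lines. This is a sketch of the description above, not AZdecrypt's actual code; the function names are hypothetical. For other directions, pass the reversed, mirrored or column-wise reading of the cipher.

```python
def nonrepeat_lengths(text):
    """For each start position, the length of the longest run
    starting there that contains no repeated symbol."""
    lengths = []
    for start in range(len(text)):
        seen = set()
        for symbol in text[start:]:
            if symbol in seen:
                break
            seen.add(symbol)
        lengths.append(len(seen))
    return lengths

def nonrepeat_score(text):
    """Sum of the per-position lengths."""
    return sum(nonrepeat_lengths(text))

print(nonrepeat_lengths("ABCABCABC"))  # [3, 3, 3, 3, 3, 3, 3, 2, 1]
print(nonrepeat_lengths("ABCCBAABC"))  # [3, 2, 1, 3, 2, 1, 3, 2, 1]
```

The per-position values match the digit rows of the two examples.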

For the following graphs, the red line is the normal reading direction or reversed. The green line is mirrored or flipped. The blue lines are vertical and the pink lines are diagonal.

[Graph for the 340]

[Graph for the 408]

Posted : May 23, 2016 5:53 pm

Jarlve

Quote (smokie treats):
"What do you think the motive was for not repeating symbols in rows? Was he trying to hide the cycles because he thought that maybe the 408 cycles made the 408 easy to solve?"

Perhaps he didn’t like to keep track of the cycles.

Quote (smokie treats):
"Here is what I am currently wondering about. At top are the first 34 positions, encoding left to right. The vertical axis is the symbol, and the horizontal axis is the position. The green shaded cells mark the beginning of a cycle, whether it is that symbol at that position, or the other symbol at that position ( green shaded cells always appear in pairs ). Note positions 13 to 19, several positions where a cycle doesn’t even start. Why? We have L=2 cycles. But why are there several symbols all together in the first row of the message that do not mark the beginning of a cycle? Superimposed in the upper right are the same positions."

I’m not sure if I understand correctly, but I checked the first line of p1 through p10 and found that on average there are 10.8 unique letters out of 17. So we can assume that not all positions on the first line have to start a cycle. But I suppose you are wondering about the gap. I don’t know, but note that the "p" and "+" symbols occupy positions 18 and 19; these may be 1:1 substitutes.

Posted : May 23, 2016 6:10 pm

smokie treats

Quote (Jarlve):
"Now for every position sum the largest string that has no repeating symbols in your given direction."

O.k., I get it now.

If you are at position x in the message, and moving left right, top bottom one position at a time, then you will more frequently move 17 positions before landing on the same symbol that is at position x. It looks like a lot of the time you move 14, 15 or 16 positions before landing on the same symbol. But there aren’t very many positions in the message where you can go more than 17 positions before landing on the same symbol that is at position x.
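
The distances described in this paragraph can be tallied directly. Note that this sketch follows the wording here, which tracks only the symbol at position x, whereas the measure quoted above stops at the first repeat of any symbol. Function names are hypothetical.

```python
from collections import Counter

def repeat_distances(cipher):
    """Distance from each occurrence of a symbol to its next occurrence."""
    last_seen = {}
    distances = []
    for pos, symbol in enumerate(cipher):
        if symbol in last_seen:
            distances.append(pos - last_seen[symbol])
        last_seen[symbol] = pos
    return distances

def distance_histogram(cipher):
    """How often each repeat distance occurs."""
    return Counter(repeat_distances(cipher))

print(distance_histogram("ABCABCABC"))  # Counter({3: 6})
```

A histogram of these distances for the 340 would show how often a symbol returns after 14, 15, 16 or 17 positions, and how rarely after more.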

It seems to me that this value would be a function of two characteristics of the key, count of symbols and diffusion efficiency, and the plaintext.

Let’s say we had all 1:1 substitutes, or 23 symbols. Your value would be low. But if we make them all 2:1 substitutes, or 46 symbols, your value would increase. If we arranged 46 symbols on the key to most efficiently diffuse the plaintext, then your value would be at a maximum for 46 symbols. If you made the key have 63 symbols and inefficient ( which makes it easier to create period 19 repeats by the way ), then the value would not be as high as if you made the 63 symbols key efficient. For example mapping 8 or so symbols to E and maybe one symbol to W. All depending on the plaintext of course.

But the really tall spike and sharp decline? That is very interesting. It would seem to me that there would be a very rough curve of some sort, but not such a sharp drop. So I see what you are saying.

I have gotten the hard part of my L=3 spreadsheet done, and hope to use my chart to examine where L=3 cycles begin and end in the message. I want to see if there is a pattern, if a lot of L=3 cycles end in a certain region of the message.

Posted : May 23, 2016 9:44 pm

Jarlve

I tried to answer the question: "Is it possible that the bigram peak at 19 in the 340 is just a random peak not related to any part of the encoding at all?"

To do so, I came up with a recursive algorithm that can look much more deeply at bigrams (or other measurements) to weed out the noise. At the depth it ran, it took 30 minutes per cipher, and millions of sub-measurements were performed for each.

The 340 scores 0.875 (higher is better) and the 408 scores 0.966. Cipher 12 of doranchak’s batch scored 0.803, which is reasonably close. So, to answer the initial question: yes, it is possible, but none of doranchak’s 24 ciphers with the bigram 19 peak matched the strength of the 340, though some came close. Perhaps the algorithm, run at a higher depth, could put more distance between doranchak’s ciphers and the 340, but it would then take days per cipher.

Period 19 strength:
340: 0.875

Period 1 strength:
408: 0.966


Period 19 strength:
d1: 0.443
d2: 0.514
d3: 0.509
d4: 0.545
d5: 0.469
d6: 0.529
d7: 0.596
d8: 0.541
d9: 0.752
d10: 0.506
d11: 0.597
d12: 0.803
d13: 0.472
d14: 0.512
d15: 0.606
d16: 0.541
d17: 0.463
d18: 0.456
d19: 0.597
d20: 0.536
d21: 0.579
d22: 0.696
d23: 0.514
d24: 0.588
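
The recursive algorithm behind these scores is not described in this post, so it cannot be reproduced here. For reference, the plain period-p bigram repeat count that the "peak at 19" refers to can be sketched like this. The counting convention (each extra occurrence of a bigram counts as one repeat) is an assumption, and other conventions exist.

```python
from collections import Counter

def bigram_repeats(cipher, period):
    """Count repeated bigrams formed from symbols `period` positions apart."""
    bigrams = Counter(
        (cipher[i], cipher[i + period]) for i in range(len(cipher) - period)
    )
    return sum(count - 1 for count in bigrams.values())

def period_profile(cipher, max_period):
    """Bigram repeat count for every period from 1 to max_period."""
    return {p: bigram_repeats(cipher, p) for p in range(1, max_period + 1)}
```

Calling `period_profile` on the numeric 340 with a `max_period` of, say, 30 would give the curve in which period 19 can be compared against its neighbours.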

Posted : May 23, 2016 10:58 pm

doranchak

Interesting – can you describe how the algorithm works?

Posted : May 23, 2016 11:05 pm

Mr lowe

(@mr-lowe)

Posts: 1197

Noble Member

1 19 4 50 7 11 10 20 13 14 16 23
18 29 20 3 23 30 26 50 29 19 32 51
20 63 36 34 38 18 26 51 13 21 1 3
42 15 43 7 44 16 45 28 19 50 31 19
46 20 19 30 49 47 50 4 19 39 10 19
5 36 7 40 23 21 17 7 51 37 25 16
22 36 31 13 58 56 36 51 8 20 13 13
21 19 41 56 22 36 46 42 40 54 13 5
17 51 19 26 39 48 51 27 34 30 53 36
55 37 38 18 7 42 23 23 29 11 34 19
38 3 54 6 2 9 27 12 61 15 31
16 19 6 22 11 25 14 28 28 31 20
40 35 42 19 19 15 50 33 36 40 44
6 5 18 6 50 8 61 23 8 3 19
34 37 12 48 53 11 2 9 38 53 55
11 3 45 6 31 30 5 10 32 16 15
3 19 19 24 16 38 19 15 26 40 33
19 16 26 49 26 19 23 27 14 60 33
11 37 17 19 43 16 46 36 20 63 56
4 6 1 19 10 19 39 43 62 20 58
2 3 5 48 8 25 11 5 14 37 17
5 36 21 41 24 50 27 37 30 52 33
34 47 37 33 39 11 21 20 22 58 41
5 51 7 32 30 50 5 36 19 48 16
47 59 40 35 17 56 51 8 52 50 54
44 28 51 20 55 23 56 28 4 57 21
50 14 57 50 16 29 59 6 28 11 11
15 33 32 40 23 9 18 1 19 21 47
29 10 61 29 3 20 20 23 62 55 31
40 25 8 5 41 40 5 44 51 31 55

Posted : June 9, 2016 10:31 am

Mr lowe

I tried to run the above through CryptoScope. Do I need to convert it to symbols first? Help.

Posted : June 9, 2016 10:34 am

doranchak

Quote (Mr lowe):
"I tried to run the above through CryptoScope. Do I need to convert it to symbols first? Help."

Yeah, sorry, it doesn’t support numerical representations yet. Here’s a symbolic version:

ABCDEFGHIJKLMNHOL
PQDNBRSHTUVWMQSIX
AOYZaEbKcdBDeBfHB
PghDCBiGBjUEkLXlE
SmnKoUeIpqUSrHIIX
BsqoUfYktIjlSBQiu
SvVPwUxmWMEYLLNFV
BWOtyz0v12ZeKByoF
nJddeHk3YBBZD4Ukb
yjMyDr2LrOBVm1uwF
z0WwxFOcyePjGRKZO
BB5KWBZQk4BKQgQBL
vJ64FmlBaKfUHTqCy
ABGBia7HpzOjurnFj
JmljUXs5DvmP84Vhm
4iFXHopsjSERPDjUB
uKh9k3lqSr8DtbdSH
xLqdC!XDJ!DKN9ydF
FZ4RkL0MABXhNG2NO
HHL7xeknrjskjbSex
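
The symbolic version above looks like a relabelling by first appearance: the first new number becomes A, the next new number B, and so on. A sketch of that conversion follows; the label alphabet (A-Z, a-z, 0-9, then "!") is inferred from the symbols that occur in the grid and is an assumption, not a documented CryptoScope format.

```python
import string

# Assumed label alphabet: A-Z, a-z, 0-9, then "!" (63 labels in total).
LABELS = string.ascii_uppercase + string.ascii_lowercase + string.digits + "!"

def relabel_by_first_appearance(numbers, labels=LABELS):
    """Replace each number by a label assigned in order of first appearance."""
    mapping = {}
    out = []
    for n in numbers:
        if n not in mapping:
            mapping[n] = labels[len(mapping)]
        out.append(mapping[n])
    return "".join(out)

# First 17 numbers of the numeric listing posted above.
first_17 = [1, 19, 4, 50, 7, 11, 10, 20, 13, 14, 16, 23, 18, 29, 20, 3, 23]
print(relabel_by_first_appearance(first_17))  # ABCDEFGHIJKLMNHOL
```

A cipher with more than 63 distinct symbols would need a longer label string.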

Posted : June 9, 2016 2:17 pm

Mr lowe

Thanks for that, doranchak. Interesting, the first column down.

Posted : June 9, 2016 4:15 pm

Mr lowe

This is a columnar scytale of the 340 with a 90 degree rotation to make it easier to read. The top line, starting at p13, reads B,S,B,S,P,A,P,A. I find it interesting, though I am not sure why.

H F x u 4 J A v B z y n B S B S P A P A
H Z L K i m B J B 0 j J W v s m g O Q B
L 4 q h F l G 6 5 W M d O V q n h Y D C
7 R d 9 X j B 4 K w y d t P o K D Z N D
x k C k H U i F W x D e y w U o C a B E
e L ! 3 o X a m B F r H z U f U B E R F
k 0 X l p s 7 l Z O 2 k 0 x Y e i b S G
n M D q s 5 H B Q c L 3 v m k I G K H H
r A J S j D p a k y r Y 1 W t p B c T I
j B ! r S v z K 4 e O B 2 M I q j d U J
s X D 8 E m O f B P B B Z E j U U B V K
k h K D R P j U K j V Z e Y l S E D W L
j N N t P 8 u H Q G m D K L S r k e M M
b G 9 b D 4 r T g R 1 4 B L B H L B Q N
S 2 y d j V n q Q K u U y N Q I X f S H
e N d S U h F C B Z w k o F i I l H I O
x O F H B m j y L O F b F V u X E B X L
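
Compared with the 20-row symbolic version posted earlier, each row of this layout reads as one column of that grid taken from bottom to top, which is a 90 degree clockwise rotation. A small sketch of that operation (the function name is hypothetical):

```python
def rotate_clockwise(rows):
    """Rotate a rectangular grid of strings 90 degrees clockwise.

    Each output row is one input column read from bottom to top.
    """
    return ["".join(column) for column in zip(*reversed(rows))]

print(rotate_clockwise(["ABC",
                        "DEF"]))  # ['DA', 'EB', 'FC']
```

Applied to the 20 by 17 symbolic grid this yields 17 rows of 20 symbols, the first being HFxu4JAvBzynBSBSPAPA.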

Posted : June 10, 2016 5:32 am

smokie treats

I have been thinking about this idea for the last few weeks, and wanted to write it down. Disrupted transposition is route transposition with a lot of transcription skips. At the very least, a very simple disrupted transposition cipher for someone like a spy would be to use a newspaper crossword puzzle pattern to transpose plaintext. The sender and receiver would have to both know what newspaper and what day, but it would be very easy to write a message into a crossword left right top bottom, and transcribe top bottom left right. Or whatever other directions. Combined with homophonic substitution, a message would be pretty difficult to solve, much less detect.
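
The crossword idea can be sketched in a few lines. The pattern below is a made-up 3 by 4 grid, with "." for an open cell and "#" for a blocked one; the write direction (left right, top bottom) and read direction (top bottom, left right) follow the description above, and everything else is an assumption for illustration.

```python
def crossword_transpose(message, pattern):
    """Write `message` into the open cells ('.') of `pattern` row by row,
    then read the filled cells column by column. '#' is a blocked cell."""
    open_cells = [(r, c) for r, row in enumerate(pattern)
                  for c, cell in enumerate(row) if cell == "."]
    if len(message) > len(open_cells):
        raise ValueError("message does not fit the pattern")
    placed = dict(zip(open_cells, message))
    # Column-major reading order: sort by column first, then by row.
    read_order = sorted(placed, key=lambda rc: (rc[1], rc[0]))
    return "".join(placed[rc] for rc in read_order)

pattern = ["..#.",
           ".#..",
           "...."]
print(crossword_transpose("ABCDEFGHIJ", pattern))  # ADGBHEICFJ
```

The blocked cells make the column lengths irregular, which is what disrupts the otherwise regular route; a homophonic substitution would then be applied to the transposed text.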

It may be a long shot, but at some point I would like to look at Bay Area newspapers published shortly before Zodiac sent the 340. Just to see if any of the puzzle patterns could create a lot of period 19 bigram repeats. Somewhere on my to-do list, maybe higher just for fun.

Posted : June 22, 2016 4:39 am

Zodiac Discussion Forum