I worked on my cycle spreadsheet yesterday morning, and it is still a work in progress. I want to continue my work with the cycles. There is a reason why we can detect cipher created L=2 cycles, but not very many L=3-5 cycles. Maybe it is just random symbol selection, maybe it is intentionally random symbol selection. Or maybe it is something else. I am not convinced about any hypothesis yet.
I somewhat understand about the 17 position no-repeat windows. This may be related, but I wanted to point out that with L=2, the symbols are distributed farther away from each other, and therefore often appear on different rows. When looking at encoding left right top bottom instead of right left top bottom, there are still a lot of L=2 cycles, and some of them are different. Some start at the beginning of the message when reading left right top bottom, and some start at the beginning of the message when reading right left top bottom.
I really want to work on the cycles some more, and update my spreadsheet for L=3-5 and make more charts. I am a lot better at making spreadsheets than I was a year ago. Sometimes I want to make huge charts and stare at them on my tiny laptop screen. Sometimes I want to make huge charts and put them on the walls of my house. Sometimes I just want to mentally run away from the 340. Maybe a nice compromise would be to just get a bigger computer monitor.
I started an outline of my own because it is time for me to get more organized with my thoughts:
viewtopic.php?f=81&t=2916
Keep up the good work, smokie. Maybe you could get a projector to put charts on your walls. Better yet, get one of those projectors that projects the image AND turns the wall into a giant touch screen, so you can manipulate the image directly on the wall!
The hypothesis is about how he applied homophonic substitution. He created a key as normal, as he did with the 408. Then, instead of cycling, and thus having to keep track of which symbol in the cycle should be used next, he picked a random symbol from that cycle that had not already been used within his window. By window I mean a particular range: for instance one row, a couple of rows, or the last x symbols. And he did just that for the whole cipher.
Something along those lines. If he did that, any symbols that repeat at close range ("+" and "p") should be 1:1 substitutes. This encoding style seems to fit the observations very well.
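To make the hypothesis concrete, here is a minimal sketch of the "last x symbols" variant of the window. The function name, the toy key and the fallback rule for letters that run out of unused homophones are my own assumptions for illustration, not something taken from the 340 or from any existing tool.

```python
import random

def encode_no_repeat_window(plaintext, key, window=17, seed=0):
    """Replace each letter with a randomly chosen homophone, avoiding any
    symbol that already appears within the last `window` output positions."""
    rng = random.Random(seed)
    out = []
    for ch in plaintext:
        recent = set(out[-window:]) if window > 0 else set()
        candidates = [s for s in key[ch] if s not in recent]
        # Assumption: a letter with no unused homophone left (for example a
        # 1:1 substitute) is simply allowed to repeat at close range.
        if not candidates:
            candidates = key[ch]
        out.append(rng.choice(candidates))
    return out

# Hypothetical toy key: three homophones for E, a 1:1 substitute for T.
toy_key = {"E": ["1", "2", "3"], "T": ["+"]}
print("".join(encode_no_repeat_window("ETETETE", toy_key, window=3)))
```

With a key like this, the only symbol that can repeat inside the window is the 1:1 substitute, which is the behaviour the hypothesis predicts for "+" and "p".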
Jarlve, can you point me to the page in this thread where you show a chart or a graph so that I can see the spike at 17? I remember reading about it before, but didn’t quite understand it. I understand the above. But did you move sliding windows of increasing lengths through the message to find this measurement? Wouldn’t there be fewer and fewer repetitions with smaller and smaller windows? How could there be more repetitions with windows of 16 instead of 17? Or did you not slide the windows, but instead examine alternating window sections?
What do you think the motive was for not repeating symbols in rows? Was he trying to hide the cycles because he thought that maybe the 408 cycles made the 408 easy to solve?
Here is what I am currently wondering about. At top are the first 34 positions, encoding left to right. The vertical axis is the symbol, and the horizontal axis is the position. The green shaded cells mark the beginning of a cycle, whether it is that symbol at that position, or the other symbol at that position ( green shaded cells always appear in pairs ). Note positions 13 to 19, several positions where a cycle doesn’t even start. Why? We have L=2 cycles. But why are there several symbols all together in the first row of the message that do not mark the beginning of a cycle? Superimposed in the upper right are the same positions.
Below is the mirrored version, but I also had to renumber the symbols so that each one gets the lowest unused number at its first appearance, so that you could see the diagonal row. Positions 13-17 become positions 1-5. Looking at the cycles this way, as if he encoded from EDIT right to left, all five of these positions mark the beginning of a cycle. Minimum number of consecutive alternations = 6 (not a high threshold) for both pictures.
That is what is currently bothering me.
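For anyone who wants to check where L=2 cycles start without a spreadsheet, here is a rough sketch. It treats an L=2 cycle as a pair of symbols that strictly alternate (abab...) for at least `min_run` occurrences and reports the position where the longest such run begins. The function names and this exact definition are assumptions of mine and may not match the spreadsheet.

```python
from itertools import combinations

def longest_alternation(seq, a, b):
    """Return (run_length, start_index) of the longest stretch in which
    symbols a and b strictly alternate, ignoring all other symbols."""
    best_len, best_start = 0, -1
    run_len, run_start, prev = 0, -1, None
    for i, s in enumerate(seq):
        if s != a and s != b:
            continue
        if prev is not None and s != prev:
            run_len += 1                   # alternation continues
        else:
            run_len, run_start = 1, i      # first hit, or same symbol twice
        prev = s
        if run_len > best_len:
            best_len, best_start = run_len, run_start
    return best_len, best_start

def find_l2_cycles(seq, min_run=6):
    """List (a, b, run_length, start_index) for every symbol pair whose
    longest alternating run reaches min_run."""
    found = []
    for a, b in combinations(sorted(set(seq)), 2):
        length, start = longest_alternation(seq, a, b)
        if length >= min_run:
            found.append((a, b, length, start))
    return found

demo = list("AqBrAsBtAuB")
print(find_l2_cycles(demo))        # normal reading direction
print(find_l2_cycles(demo[::-1]))  # same symbols read in reverse
```

To compare reading directions, pass the same cipher in a different order, for example with each row mirrored, and compare the start positions that come back.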
Thanks for clarifying – I think I understand now. My hunch is that even if the cycling is weakened this way, it shouldn’t really affect azdecrypt’s ability to find solutions. However, with 1:1 substitutes thrown in, you have already ruled that out, up to some multiplicity limit. Is this fair to say?
Yes, of course, it’s no problem to solve this. But as I said, I’m talking purely about the homophonic substitution layer, and I assume there is another reason why the 340 is not being solved.
This is my more general hypothesis for the 340:
Layer 1: plaintext into transposed plaintext.
Layer 2: transposed plaintext into homophonic substitution.
Jarlve, can you point me to the page in this thread where you show a chart or a graph so that I can see the spike at 17? I remember reading about it before, but didn’t quite understand it. I understand the above. But did you move sliding windows of increasing lengths through the message to find this measurement? Wouldn’t there be fewer and fewer repetitions with smaller and smaller windows? How could there be more repetitions with windows of 16 instead of 17? Or did you not slide the windows, but instead examine alternating window sections?
Take the string "ABCABCABC". Now, for every position, take the length of the longest string starting there that has no repeating symbols in your given direction, and sum those lengths.
ABCABCABC 333333321 (sum=24)
Let’s randomize the string a bit.
ABCCBAABC 321321321 (sum=18)
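The measurement in these two examples can be sketched in a few lines. The function name is hypothetical, and it only covers one reading direction; for mirrored, vertical or diagonal readings you would pass the symbols in that order instead.

```python
def no_repeat_lengths(seq):
    """For each position, the length of the longest run starting there
    (reading forward) that contains no repeated symbol."""
    lengths = []
    for i in range(len(seq)):
        seen = set()
        j = i
        while j < len(seq) and seq[j] not in seen:
            seen.add(seq[j])
            j += 1
        lengths.append(j - i)
    return lengths

for text in ("ABCABCABC", "ABCCBAABC"):
    lengths = no_repeat_lengths(text)
    # Prints the string, the per-position lengths and their sum.
    print(text, "".join(map(str, lengths)), sum(lengths))
```

A string with many long repeat-free runs gets a high sum, and close-range repeats pull it down.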
For the following graphs, the red line is the normal reading direction or reversed. The green line is mirrored or flipped. The blue lines are vertical and the pink lines are diagonal.
340:
408:
What do you think the motive was for not repeating symbols in rows? Was he trying to hide the cycles because he thought that maybe the 408 cycles made the 408 easy to solve?
Perhaps he didn’t like to keep track of the cycles.
Here is what I am currently wondering about. At top are the first 34 positions, encoding left to right. The vertical axis is the symbol, and the horizontal axis is the position. The green shaded cells mark the beginning of a cycle, whether it is that symbol at that position, or the other symbol at that position ( green shaded cells always appear in pairs ). Note positions 13 to 19, several positions where a cycle doesn’t even start. Why? We have L=2 cycles. But why are there several symbols all together in the first row of the message that do not mark the beginning of a cycle? Superimposed in the upper right are the same positions.
I’m not sure if I understand correctly, but I checked the first line of p1 through p10 and found that on average there are 10.8 unique letters out of 17. So we can assume that not all positions on the first line have to start a cycle. But I suppose you are wondering about the gap. I don’t know, but note that the "p" and "+" symbols occupy positions 18 and 19; these may be 1:1 substitutes.
Now, for every position, take the length of the longest string starting there that has no repeating symbols in your given direction, and sum those lengths.
O.k., I get it now.
If you are at position x in the message, and moving left right, top bottom one position at a time, then you will more frequently move 17 positions before landing on the same symbol that is at position x. It looks like a lot of the time you move 14, 15 or 16 positions before landing on the same symbol. But there aren’t very many positions in the message where you can go more than 17 positions before landing on the same symbol that is at position x.
It seems to me that this value would be a function of two characteristics of the key, count of symbols and diffusion efficiency, and the plaintext.
Let’s say we had all 1:1 substitutes, or 23 symbols. Your value would be low. But if we make them all 2:1 substitutes, or 46 symbols, your value would increase. If we arranged 46 symbols on the key to most efficiently diffuse the plaintext, then your value would be at a maximum for 46 symbols. If you made the key have 63 symbols and inefficient ( which makes it easier to create period 19 repeats by the way ), then the value would not be as high as if you made the 63 symbols key efficient. For example mapping 8 or so symbols to E and maybe one symbol to W. All depending on the plaintext of course.
But the really tall spike and sharp decline? That is very interesting. It would seem to me that there would be a very rough curve of some sort, but not such a sharp drop. So I see what you are saying.
I have gotten the hard part of my L=3 spreadsheet done, and hope to use my chart to examine where L=3 cycles begin and end in the message. I want to see if there is a pattern, if a lot of L=3 cycles end in a certain region of the message.
I tried to answer the question: "Is it possible that the bigram peak at 19 in the 340 is just a random peak not related to any part of the encoding at all?"
I came up with a recursive algorithm that can look much more deeply at bigrams (or other measurements) to weed out the noise. At the depth I ran it, it took 30 minutes per cipher, and millions of sub-measurements were performed for each.
The 340 scores 0.875 (higher is better) and the 408 scores 0.966. Cipher 12 of doranchak’s batch scored 0.803, which is reasonably close. So, to answer the initial question: yes, it is possible, but none of doranchak’s 24 ciphers with the bigram 19 peak matched the strength of the 340, though some came close. Perhaps the algorithm, run at a higher depth, could put more distance between doranchak’s ciphers and the 340, but it would then take days per cipher.
Period 19 strength:
340: 0.875
Period 1 strength:
408: 0.966
Period 19 strength:
d1: 0.443
d2: 0.514
d3: 0.509
d4: 0.545
d5: 0.469
d6: 0.529
d7: 0.596
d8: 0.541
d9: 0.752
d10: 0.506
d11: 0.597
d12: 0.803
d13: 0.472
d14: 0.512
d15: 0.606
d16: 0.541
d17: 0.463
d18: 0.456
d19: 0.597
d20: 0.536
d21: 0.579
d22: 0.696
d23: 0.514
d24: 0.588
Interesting – can you describe how the algorithm works?
1 19 4 50 7 11 10 20 13 14 16 23
18 29 20 3 23 30 26 50 29 19 32 51
20 63 36 34 38 18 26 51 13 21 1 3
42 15 43 7 44 16 45 28 19 50 31 19
46 20 19 30 49 47 50 4 19 39 10 19
5 36 7 40 23 21 17 7 51 37 25 16
22 36 31 13 58 56 36 51 8 20 13 13
21 19 41 56 22 36 46 42 40 54 13 5
17 51 19 26 39 48 51 27 34 30 53 36
55 37 38 18 7 42 23 23 29 11 34 19
38 3 54 6 2 9 27 12 61 15 31
16 19 6 22 11 25 14 28 28 31 20
40 35 42 19 19 15 50 33 36 40 44
6 5 18 6 50 8 61 23 8 3 19
34 37 12 48 53 11 2 9 38 53 55
11 3 45 6 31 30 5 10 32 16 15
3 19 19 24 16 38 19 15 26 40 33
19 16 26 49 26 19 23 27 14 60 33
11 37 17 19 43 16 46 36 20 63 56
4 6 1 19 10 19 39 43 62 20 58
2 3 5 48 8 25 11 5 14 37 17
5 36 21 41 24 50 27 37 30 52 33
34 47 37 33 39 11 21 20 22 58 41
5 51 7 32 30 50 5 36 19 48 16
47 59 40 35 17 56 51 8 52 50 54
44 28 51 20 55 23 56 28 4 57 21
50 14 57 50 16 29 59 6 28 11 11
15 33 32 40 23 9 18 1 19 21 47
29 10 61 29 3 20 20 23 62 55 31
40 25 8 5 41 40 5 44 51 31 55
I tried to run the above through crypto scope.. do i need to convert it to symbols first? .. Help
Yeah, sorry, it doesn’t support numerical representations yet. Here’s a symbolic version:
ABCDEFGHIJKLMNHOL
PQDNBRSHTUVWMQSIX
AOYZaEbKcdBDeBfHB
PghDCBiGBjUEkLXlE
SmnKoUeIpqUSrHIIX
BsqoUfYktIjlSBQiu
SvVPwUxmWMEYLLNFV
BWOtyz0v12ZeKByoF
nJddeHk3YBBZD4Ukb
yjMyDr2LrOBVm1uwF
z0WwxFOcyePjGRKZO
BB5KWBZQk4BKQgQBL
vJ64FmlBaKfUHTqCy
ABGBia7HpzOjurnFj
JmljUXs5DvmP84Vhm
4iFXHopsjSERPDjUB
uKh9k3lqSr8DtbdSH
xLqdC!XDJ!DKN9ydF
FZ4RkL0MABXhNG2NO
HHL7xeknrjskjbSex
Thanx for that doranchak..Interesting the first column down..
This is a columnar scytale of the 340 with a 90 degree rotation to make it easier to read. On the top line, starting at p13: B,S,B,S,P,A,P,A. I find it interesting, not sure why.
H F x u 4 J A v B z y n B S B S P A P A
H Z L K i m B J B 0 j J W v s m g O Q B
L 4 q h F l G 6 5 W M d O V q n h Y D C
7 R d 9 X j B 4 K w y d t P o K D Z N D
x k C k H U i F W x D e y w U o C a B E
e L ! 3 o X a m B F r H z U f U B E R F
k 0 X l p s 7 l Z O 2 k 0 x Y e i b S G
n M D q s 5 H B Q c L 3 v m k I G K H H
r A J S j D p a k y r Y 1 W t p B c T I
j B ! r S v z K 4 e O B 2 M I q j d U J
s X D 8 E m O f B P B B Z E j U U B V K
k h K D R P j U K j V Z e Y l S E D W L
j N N t P 8 u H Q G m D K L S r k e M M
b G 9 b D 4 r T g R 1 4 B L B H L B Q N
S 2 y d j V n q Q K u U y N Q I X f S H
e N d S U h F C B Z w k o F i I l H I O
x O F H B m j y L O F b F V u X E B X L
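The rotated layout above matches reading each column of the 17-wide symbolic version from bottom to top, which is a 90 degree clockwise rotation. A small sketch of that step, with a function name of my own choosing:

```python
def rotate_clockwise(rows):
    """Rotate a grid of equal-length strings 90 degrees clockwise:
    each output row is an input column read from bottom to top."""
    return ["".join(col) for col in zip(*reversed(rows))]

# The 17-column symbolic version posted earlier in the thread.
grid = [
    "ABCDEFGHIJKLMNHOL", "PQDNBRSHTUVWMQSIX", "AOYZaEbKcdBDeBfHB",
    "PghDCBiGBjUEkLXlE", "SmnKoUeIpqUSrHIIX", "BsqoUfYktIjlSBQiu",
    "SvVPwUxmWMEYLLNFV", "BWOtyz0v12ZeKByoF", "nJddeHk3YBBZD4Ukb",
    "yjMyDr2LrOBVm1uwF", "z0WwxFOcyePjGRKZO", "BB5KWBZQk4BKQgQBL",
    "vJ64FmlBaKfUHTqCy", "ABGBia7HpzOjurnFj", "JmljUXs5DvmP84Vhm",
    "4iFXHopsjSERPDjUB", "uKh9k3lqSr8DtbdSH", "xLqdC!XDJ!DKN9ydF",
    "FZ4RkL0MABXhNG2NO", "HHL7xeknrjskjbSex",
]

for row in rotate_clockwise(grid):
    print(" ".join(row))
```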
I have been thinking about this idea for the last few weeks, and wanted to write it down. Disrupted transposition is route transposition with a lot of transcription skips. At the very least, a very simple disrupted transposition cipher for someone like a spy would be to use a newspaper crossword puzzle pattern to transpose plaintext. The sender and receiver would have to both know what newspaper and what day, but it would be very easy to write a message into a crossword left right top bottom, and transcribe top bottom left right. Or whatever other directions. Combined with homophonic substitution, a message would be pretty difficult to solve, much less detect.
It may be a long shot, but at some point I would like to look at Bay Area newspapers published shortly before Zodiac sent the 340. Just to see if any of the puzzle patterns could create a lot of period 19 bigram repeats. Somewhere on my to-do list, maybe higher just for fun.
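Here is a minimal sketch of the crossword idea, writing left right top bottom into the open squares and transcribing top bottom left right. The pattern below is a made-up 3x3 example, not a real newspaper puzzle, and the function name and the '.' / '#' notation are assumptions for illustration.

```python
def crossword_transpose(plaintext, pattern):
    """Write plaintext into the open cells ('.') of a grid row by row,
    then read the filled cells column by column. '#' is a black square."""
    rows, cols = len(pattern), len(pattern[0])
    open_cells = [(r, c) for r in range(rows) for c in range(cols)
                  if pattern[r][c] == "."]
    if len(plaintext) > len(open_cells):
        raise ValueError("plaintext longer than the number of open cells")
    # Fill in row order; leftover open cells stay empty.
    grid = {cell: ch for cell, ch in zip(open_cells, plaintext)}
    # Transcribe in column order, skipping black and empty squares.
    return "".join(grid[(r, c)] for c in range(cols) for r in range(rows)
                   if (r, c) in grid)

# Hypothetical pattern with two black squares.
pattern = ["..#",
           "#..",
           "..."]
print(crossword_transpose("ABCDEFG", pattern))
```

The transposed text could then be fed into a homophonic substitution step, and the result checked for periodic bigram repeats to see whether a given puzzle pattern produces a peak.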