Zodiac Discussion Forum


Homophonic substitution

doranchak
(@doranchak)
Posts: 2614
Member Admin
 

Oh! Gotcha. Thanks.

Here’s my list of them: http://zodiackillerciphers.com/wiki/ind … ength:_2_2

http://zodiackillerciphers.com

 
Posted : August 4, 2015 10:40 pm
Jarlve
(@jarlve)
Posts: 2547
Famed Member
Topic starter
 

@doranchak,

Wildcard length, that’s another way to describe it. I remember glancing over it a long while back. :)

I agree that it’s unclear how relevant higher wildcard lengths may be. But taking into consideration that the 340 has been unsolved for so long, I want to note a high peak at a wildcard length of 18 for the normal 340 and at 14 for the mirrored version of the 340. I’ve discussed it a bit with daikon; maybe the "+" symbol is involved because of its count and because it only lands on a prime number once (which you know very well).

AZdecrypt

 
Posted : August 4, 2015 11:03 pm
daikon
(@daikon)
Posts: 179
Estimable Member
 

By the way I found something a bit interesting perhaps. Note that the last 10 rows of the 340 have only 1 bigram repeat. Now increase the period to 3 and be slightly dazzled.

That is very interesting indeed! It didn’t occur to me to test the 1st and 2nd halves of Z340 separately. So it was surprising to learn that the second half only has a single bigram repeat. And it’s ‘++’! Looking at the symbol frequencies for the 2nd half, it’s also interesting to find that it is much more plausible for a normal homophonic substitution, since ‘+’ is no longer twice as frequent as the next most frequent symbol. And you are correct, bigram repeats go way up at period 3 (and at 9 to a smaller degree, but that could be just an artifact, since 3 is a factor of 9). That could be a sign of a bifid encoding with a period of 6, which is an uncommon period for bifid, but still possible. Or that a columnar transposition of width 3 was used. I tried feeding the un-transposed result to my solver, but no stable solution was found. That’s probably due to the short length (170 symbols), which doubles the multiplicity, so it could simply be out of reach of auto-solvers.
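To make the multiplicity remark concrete, here is a minimal Python sketch. It assumes the usual definition (distinct symbols divided by cipher length) and uses toy strings rather than the real Z340; the function name is just for illustration.

```python
def multiplicity(cipher: str) -> float:
    """Distinct symbols divided by cipher length (assumed definition)."""
    return len(set(cipher)) / len(cipher)

# Toy data, not the real Z340: 4 distinct symbols.
full = "ABCD" * 4   # 16 symbols
half = full[:8]     # same 4 symbols, half the length

print(multiplicity(full))  # 0.25
print(multiplicity(half))  # 0.5
```

Halving the length only doubles the multiplicity exactly when the half still uses every symbol; if a few symbols drop out, the increase is slightly smaller.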

I’m still not sure what to think of these extremely low bigram repeats in the 2nd half only. By itself, a very low bigram repeat count isn’t anything unusual. For example, columnar transpositions, or letter transpositions within each row (we need to come up with better names to distinguish these two types of transpositions), both destroy bigrams enough to cause this effect. But such a huge difference in bigram repeats between the 1st half and the 2nd half? It might be a sign that two different "extra steps" were indeed done to the 1st and 2nd halves to further thwart decryption attempts.

I’m thinking about starting a new thread that just lists such "interesting observations". I have found a couple of "oddities" about Z340 myself, which didn’t lead anywhere. But perhaps if we put them all together, a new conclusion could be made?

At the start of previous thread I discussed the relatively low bigram repeat counts of the 340 in relation to smokie’s wildcard idea.

Yes, I noticed that overall Z340 bigram repeats are on the low side. Not a totally rare occurrence, as I’ve found several straight homophonic substitution ciphers (so they are solvable by ZDK/AZD) that have even fewer bigram repeats. But the majority of straight homophonic substitution ciphers of the same length usually have more bigram repeats. Which I think is a weak sign of that elusive "extra step" that was done to Z340, which changed the plaintext before it was encoded with homophones. Separating the 1st half from the 2nd makes 1st half much more in line with other straight homophonic ciphers as far as bigram repeats are concerned. Have you tried feeding just the first half into AZD to see if it solves it? You are probably much better at figuring out how to handle such high multiplicity ciphers and how to spot possible good solves.

That the Zodiac possibly revised the cipher to remove/replace repeats, most likely visually, and therefore would not easily spot bigrams at a distance. It’s just an idea to go with this strange observation.

I’m not sure Z was even aware of bigrams, or bigram repeats. It’s not easy to spot them by just looking at the cipher. It didn’t even occur to me that the second half was practically devoid of bigram repeats until now, and I’ve stared at Z340 for hours by now :). I don’t think anyone else spotted it until now. So I highly doubt Z did anything to Z340 to specifically address bigram repeats. It is likely just a side effect of whatever else he did to Z340.

 
Posted : August 4, 2015 11:28 pm
doranchak
(@doranchak)
Posts: 2614
Member Admin
 

I’m thinking about starting a new thread that just lists such "interesting observations". I have found a couple of "oddities" about Z340 myself, which didn’t lead anywhere. But perhaps if we put them all together, a new conclusion could be made?

I have such a thing here but it is challenging to keep it up to date: http://zodiackillerciphers.com/wiki/ind … servations

I’m going to focus on the interesting observations during my talk in October. Let me know what you’d want to see in the list! There will be a room full of cryptologists there so I’m hoping the interesting observations will provoke their curiosity to attack the 340 from new angles.

http://zodiackillerciphers.com

 
Posted : August 4, 2015 11:36 pm
daikon
(@daikon)
Posts: 179
Estimable Member
 

What do you mean by this? How do you increase the period to 3?

It’s a standard test for bifid. Here’s a pretty good explanation (in the first part about the period): Cryptanalysis of the Bifid cipher.

Besides bifid, it’s also a test for columnar transpositions. That’s when you write the plaintext in N columns and then read it by columns, top-to-bottom, left-to-right. To decode the cipher back to plaintext, you need to reverse that operation. Undoing the transposition is not the same as applying it, unless the text has the same number of columns and rows (i.e. it’s a square matrix).
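The description above can be sketched in a few lines of Python. This is an illustration under simple assumptions (no key, columns read in natural left-to-right order, no padding); the function names are made up for the example.

```python
def columnar_encode(text: str, width: int) -> str:
    """Write text row by row into `width` columns, then read column by column."""
    return "".join(text[col::width] for col in range(width))

def columnar_decode(cipher: str, width: int) -> str:
    """Reverse columnar_encode: rebuild the columns, then read row by row."""
    rows, extra = divmod(len(cipher), width)
    columns = []
    pos = 0
    for col in range(width):
        # The first `extra` columns hold one more symbol than the rest.
        length = rows + (1 if col < extra else 0)
        columns.append(cipher[pos:pos + length])
        pos += length
    out = []
    for row in range(rows + (1 if extra else 0)):
        for column in columns:
            if row < len(column):
                out.append(column[row])
    return "".join(out)

plain = "ABCDEFGHIJ"
cipher = columnar_encode(plain, 2)
print(cipher)                      # ACEGIBDFHJ
print(columnar_decode(cipher, 2))  # ABCDEFGHIJ
print(columnar_encode(cipher, 2))  # AEIDHCGBFJ
```

The last line shows the point about non-square layouts: applying the transposition a second time to this 2-column, 5-row text does not give back the plaintext, while the explicit decode does.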

Here’s just the 2nd half of Z340 by itself. Notice just a single bigram repeat (‘++’).

Here’s the same 2nd half with 3-column transposition undone. Notice how there are now 22 bigram repeats! It has an even higher number of repeats than the 1st half.

 
Posted : August 4, 2015 11:44 pm
daikon
(@daikon)
Posts: 179
Estimable Member
 

I used to call it bigrams at a distance but daikon has been using the term period. It’s the distance between the first and the last symbol of the bigram; a normal bigram has a distance of 1. A distance of 3 would be A..B.

Well, I call them "periods" similar to bifid periods, since I learned about this metric from the bifid test. "Bigrams at a distance" might be a bit ambiguous as you could be talking about bigrams separated by some distance (i.e. "AB—AB—AB"). Practical Cryptography calls it "Bigrams with a step of 0/1/2/…", which is probably a better way of describing it.
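A minimal Python sketch of counting bigram repeats at a given period, using the convention from the quote above (period 1 is a normal adjacent bigram, period 3 is A..B). Counting each bigram seen k times as k - 1 repeats is an assumption; the tools mentioned in this thread may count differently.

```python
from collections import Counter

def period_bigram_repeats(cipher: str, period: int = 1) -> int:
    """Count repeated bigrams formed by symbols `period` positions apart.

    A bigram that occurs k times contributes k - 1 repeats (assumed convention).
    """
    pairs = Counter(
        (cipher[i], cipher[i + period]) for i in range(len(cipher) - period)
    )
    return sum(k - 1 for k in pairs.values())

toy = "A12BA34B"  # toy string, not Z340
print(period_bigram_repeats(toy, 1))  # 0
print(period_bigram_repeats(toy, 3))  # 1
```

In the toy string the pair A..B only shows up once the two symbols are taken 3 positions apart, which is the kind of effect the period test looks for.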

 
Posted : August 4, 2015 11:51 pm
daikon
(@daikon)
Posts: 179
Estimable Member
 

I’m going to focus on the interesting observations during my talk in October. Let me know what you’d want to see in the list! There will be a room full of cryptologists there so I’m hoping the interesting observations will provoke their curiosity to attack the 340 from new angles.

I keep a log of things I’ve observed about Z340, but it’s very long and full of very minor "oddities" that are very likely just things that happened by pure chance. For example, I noticed that you get a stable solve (i.e. the auto-solver frequently converges on the exact same solution) if you use bifid with the key "ZODIAC", but that stable solve is gibberish and its overall score is far from a normal English text. I should go over my log again and try to cull the most significant "finds".

 
Posted : August 5, 2015 12:01 am
doranchak
(@doranchak)
Posts: 2614
Member Admin
 

Here’s just the 2nd half of Z340 by itself. Notice just a single bigram repeat (‘++’).

Here’s the same 2nd half with 3-column transposition undone. Notice how there are now 22 bigram repeats! It has an even higher number of repeats than the 1st half.

That’s very interesting. I will add that to my list of things to explore!

http://zodiackillerciphers.com

 
Posted : August 5, 2015 5:16 am
smokie treats
(@smokie-treats)
Posts: 1626
Noble Member
 

Well I tried to make a post, but it didn’t post, so I have to write it again.

I am still working on my spreadsheet as I analyze M1_P2 and see that there is more cycling in the first half than the second half. It looks like there may be more switching around of cycle symbols in the second half. And some of the stronger cycles are missing symbols toward the end. Not done yet.

I have an idea, and forgive me if someone has recently tried this and I didn’t catch on. Check for cycling in the top and bottom halves of the 340 and compare that to the count of +’s. See if there is a relationship. But I was also thinking about regression analysis. What about checking portions of the 340, say five rows (or more) at a time, starting at the top? Find the total cycle score with whatever formula, and scroll down, one row at a time. Compare the cycle score to the total count of +’s, q’s, B’s and F’s, individually or in combinations. I wonder if regression analysis could help to determine if there is a relationship between cycle scores and high count non-cyclic symbols, or which high count non-cyclic symbols are more determinative of the score as compared to others. EDIT: We could also scroll from the bottom up for a few more scores.
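A rough Python sketch of the scrolling-window comparison described above. The cycle score is left as a function you pass in, since no particular formula is fixed here; `row_windows`, `pearson` and `score_vs_symbol_count` are hypothetical names, and a plain Pearson correlation stands in for a fuller regression analysis.

```python
from typing import Callable, List, Sequence

def row_windows(rows: Sequence[str], height: int) -> List[str]:
    """Join every run of `height` consecutive rows, scrolling down one row at a time."""
    return ["".join(rows[i:i + height]) for i in range(len(rows) - height + 1)]

def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5

def score_vs_symbol_count(
    rows: Sequence[str],
    height: int,
    symbol: str,
    cycle_score: Callable[[str], float],
) -> float:
    """Correlate a per-window cycle score with the per-window count of `symbol`."""
    windows = row_windows(rows, height)
    scores = [cycle_score(w) for w in windows]
    counts = [w.count(symbol) for w in windows]
    return pearson(scores, counts)

# 20 rows with a window of 5 rows gives 16 windows; a window of 10 rows gives 11.
print(len(row_windows(["." * 17] * 20, 5)))   # 16
print(len(row_windows(["." * 17] * 20, 10)))  # 11
```

Neighbouring windows share most of their rows, so the scores are far from independent; any correlation found this way is a hint to follow up rather than a significance test.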

I have also been thinking about my grid hillclimber idea, and know how I would do it. It wouldn’t be that difficult, and I may move forward with it at some point in the future.

 
Posted : August 5, 2015 5:42 am
daikon
(@daikon)
Posts: 179
Estimable Member
 

That’s very interesting. I will add that to my list of things to explore!

In case you need a simple explanation of how to transpose the 2nd half of Z340 into a version with higher bigram repeats: you read it skipping every 2 symbols. I.e. you take the 1st, 4th, 7th, 10th, etc. symbol. Then you go back to the start and take the 2nd, 5th, 8th, 11th, etc. symbol. And finally you take the 3rd, 6th, 9th, 12th, etc. symbol.
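In Python the reading order just described comes down to slicing with a step. A minimal sketch on a toy string (the function name is only for illustration):

```python
def read_with_step(cipher: str, step: int) -> str:
    """Take symbols 1, 1+step, 1+2*step, ..., then start again from the 2nd, and so on."""
    return "".join(cipher[start::step] for start in range(step))

toy = "ABCDEFGHIJ"  # toy string, not Z340
print(read_with_step(toy, 3))  # ADGJBEHCFI
```

Symbols that end up next to each other in the output were 3 positions apart in the input, except at the two joins (J-B and H-C in the toy output). So a rise in ordinary bigram repeats after this re-reading and a peak at period 3 in the untouched text are two views of the same thing.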

It could be just pure chance though, as many other transpositions also increase the bigram repeats, such as skipping 1 symbol (step/period 2), or 3 (step/period 4). It’s not hard to significantly increase bigram repeats, if you only have 1 (compared to 8 in the 1st half). It’s just that the step/period 3 has the maximum increase. Here’s the raw graph:

Bifid period test, bigram IoC
  1 =  0.25  |————————
  2 =  1.75  |—————————————————————————————————————————————————————
  3 =  2.76  |———————————————————————————————————————————————————————————————————————————————————
  4 =  1.75  |—————————————————————————————————————————————————————
  5 =  1.25  |——————————————————————————————————————
  6 =  0.50  |————————————————
  7 =  0.75  |———————————————————————
  8 =  0.25  |————————
  9 =  2.00  |—————————————————————————————————————————————————————————————
 10 =  1.00  |———————————————————————————————
 11 =  0.25  |————————
 12 =  1.00  |———————————————————————————————
 13 =  0.25  |————————
 14 =  0.00  |
 15 =  0.25  |————————
 16 =  1.00  |———————————————————————————————
 17 =  1.50  |——————————————————————————————————————————————
 18 =  0.75  |———————————————————————
 19 =  2.00  |—————————————————————————————————————————————————————————————
 20 =  1.25  |——————————————————————————————————————
 21 =  2.51  |————————————————————————————————————————————————————————————————————————————
 22 =  0.75  |———————————————————————
 23 =  1.25  |——————————————————————————————————————
 24 =  1.00  |———————————————————————————————
 25 =  0.25  |————————
 26 =  0.00  |
 27 =  1.00  |———————————————————————————————
 28 =  2.51  |————————————————————————————————————————————————————————————————————————————
 29 =  0.50  |————————————————
 30 =  1.25  |——————————————————————————————————————
 31 =  1.25  |——————————————————————————————————————
 32 =  2.26  |————————————————————————————————————————————————————————————————————
 33 =  0.75  |———————————————————————
 34 =  1.25  |——————————————————————————————————————
 35 =  1.00  |———————————————————————————————
 36 =  1.50  |——————————————————————————————————————————————
 37 =  0.75  |———————————————————————
 38 =  1.00  |———————————————————————————————
 39 =  2.51  |————————————————————————————————————————————————————————————————————————————
 40 =  1.75  |—————————————————————————————————————————————————————
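For anyone who wants to produce a similar graph, here is a minimal Python sketch of a bigram IoC sweep over periods. It assumes overlapping pairs and the plain, unscaled index of coincidence, so the numbers will not match the scale of the graph above; the exact formula and scaling used for that graph are not given in this thread.

```python
from collections import Counter
from typing import Dict

def bigram_ioc(cipher: str, period: int) -> float:
    """Index of coincidence over pairs of symbols `period` positions apart."""
    pairs = [(cipher[i], cipher[i + period]) for i in range(len(cipher) - period)]
    n = len(pairs)
    if n < 2:
        return 0.0
    counts = Counter(pairs)
    return sum(k * (k - 1) for k in counts.values()) / (n * (n - 1))

def period_sweep(cipher: str, max_period: int) -> Dict[int, float]:
    """Bigram IoC for every period from 1 to max_period."""
    return {p: bigram_ioc(cipher, p) for p in range(1, max_period + 1)}

toy = "AxAyAzAw"  # toy string, not Z340
sweep = period_sweep(toy, 3)
print(max(sweep, key=sweep.get))  # 2
```

In the toy string the only repeated pair is A.A at period 2, so that is where the sweep peaks.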
 
Posted : August 5, 2015 6:14 am
daikon
(@daikon)
Posts: 179
Estimable Member
 

What about checking portions of the 340, say five rows (or more) at a time, starting at the top? Find the total cycle score with whatever formula, and scroll down, one row at a time.

I’ve tried that, see one of my previous posts:
By the way, I now tested variance of different parts of Z340 (i.e. beginning, middle, end), and I even ran a "sliding window" across Z340, where I selected 200 sequential symbols, starting with 1st symbol in the cipher, 2nd, 3rd and so forth, and I’m seeing a fairly even level of variance throughout. Unfortunately you can’t compute variance of a smaller section, such as individual rows, as you need to have a long enough string to get a good representation of distances between the same symbols. But I think it is fairly safe to say that there are no sudden changes in the way different parts of Z340 were encoded. Unlike Z408, which has a distinct increase of variance (almost twice!) towards the end, when by the 3rd section Z abandoned his initial attempt at maintaining perfect cycles, either by design, or because he got tired and sloppy.

Basically, 200 symbols is about the shortest sequence you can get a reliable variance number for. Anything shorter, and the variance doesn’t represent the randomness level too well.
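A minimal Python sketch of the sliding-window idea, under one plausible reading of "variance of distances between the same symbols": collect the gaps between consecutive occurrences of each symbol and take the population variance of all the gaps together. The actual formula behind the numbers discussed here is not spelled out in this thread, so treat the helper names and the pooling of gaps as assumptions.

```python
from statistics import pvariance
from typing import List

def same_symbol_gaps(text: str) -> List[int]:
    """Distances between consecutive occurrences of the same symbol."""
    last_seen = {}
    gaps = []
    for i, symbol in enumerate(text):
        if symbol in last_seen:
            gaps.append(i - last_seen[symbol])
        last_seen[symbol] = i
    return gaps

def gap_variance(text: str) -> float:
    """Population variance of all same-symbol gaps (0.0 if fewer than 2 gaps)."""
    gaps = same_symbol_gaps(text)
    return float(pvariance(gaps)) if len(gaps) >= 2 else 0.0

def sliding_variance(cipher: str, window: int = 200) -> List[float]:
    """Gap variance for every window of `window` sequential symbols."""
    return [
        gap_variance(cipher[i:i + window])
        for i in range(len(cipher) - window + 1)
    ]

print(same_symbol_gaps("ABCABC"))  # [3, 3, 3]
print(gap_variance("ABCABCABC"))   # 0.0
```

A perfectly cycled string gives a variance of 0, and more random placement pushes it up. With 340 symbols and a 200-symbol window this yields 141 window values.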

I’m still not quite sure how the fairly even variance of same symbol distances across Z340 can be reconciled with the fact that there are practically no bigram repeats in the second half. If the bigrams were destroyed in the second half with some sort of transposition (i.e. by effectively randomizing them), then it should’ve been reflected in the variance being pushed up towards "random string" level in the second half. Unless the transposition was done on the per row level (i.e. letters were swapped around in each row, without crossing the row boundaries), in which case the variance wouldn’t have changed much. I’ll see if I can explore this further.

Another thought that I had. If Z did something to the 2nd half that destroyed/randomized bigrams, why the *2nd* half? I would’ve done it either to the entire message, or started with the 1st half. Was it an afterthought? I.e. did he encrypt the first half, take a break, come up with something he could do to make decryption harder, and then decide not to redo the 1st half and only do it to the 2nd half? Or does this mean that the whole cipher should be read backwards, starting with the 2nd half? Or does it mean that the 1st and 2nd halves should be further combined somehow?

The funny thing is, I just realized that I should’ve noticed the near absence of bigram repeats in the 2nd half myself. When I was doing the "sliding window" test on Z340, I noticed that the bigram IoC (which is a slightly different way of measuring bigram repeats) is dropping quite a bit towards the end of the cipher. I actually have a note in my "log of curiosities" of Z340 to investigate this further. But Jarlve beat me to it. 🙂

Here’s the graph I was looking at, but I was too preoccupied with analyzing variance at the time (red line) to pay enough attention to bigram IoC (blue line). The "sliding window" was 200 characters long, so it didn’t quite fit the 2nd half (170 characters), otherwise bigram IoC would’ve dropped to nearly 0.

 
Posted : August 5, 2015 6:45 am
Jarlve
(@jarlve)
Posts: 2547
Famed Member
Topic starter
 

I quoted daikon, doranchak and smokie treats in this reply.

I have found a couple of "oddities" about Z340 myself, which didn’t lead anywhere. But perhaps if we put them all together, a new conclusion could be made?

Good idea, like solving a sudoku, the more we fill in the more becomes apparent.

Yes, I noticed that overall Z340 bigram repeats are on the low side. Not a totally rare occurrence, as I’ve found several straight homophonic substitution ciphers (so they are solvable by ZDK/AZD) that have even fewer bigram repeats.

I’ve noticed this as well, but then there are usually more bigrams than expected at low periods. Kinda like with the 2nd part of the 340 actually.

Separating the 1st half from the 2nd makes 1st half much more in line with other straight homophonic ciphers as far as bigram repeats are concerned. Have you tried feeding just the first half into AZD to see if it solves it? You are probably much better at figuring out how to handle such high multiplicity ciphers and how to spot possible good solves.

The first and last 10 rows of the 340 are in my benchmark suite and I always check the top results. It’s probably out of range, but not by so much. I’ll share some solves.

First half:

(15734,667,170,63)

hknightayalsoundo
ngsharewedinfromt
hepreseandatorthi
nggotherangessnod
ucestopolitysalar
gentthefromatidea
risonwiderinantol
andimpresunitsfor
ofestsandthreallo
ftheastisegofthef

Second half:

(15517,630,170,60)

nlaphandersarehot
ersfandthechildsi
ngcoupledhissucha
frienthecrisicall
usanticcontinghol
dsignsoutanithere
aselingorlifesdbi
llbeonesstorepuba
dimmerwasortstoos
thereamongthedocl

I’m going to focus on the interesting observations during my talk in October. Let me know what you’d want to see in the list! There will be a room full of cryptologists there so I’m hoping the interesting observations will provoke their curiosity to attack the 340 from new angles.

I thought about it, I think it’s intimidating to share anything at all with a room full of experienced cryptologists! When I told my father about the cipher the first thing he asked was "how do you know it’s genuine?". So I believe that’s one of the most important things to talk about. That for instance the encoding appears to be genuine, and so for the entire cipher.

Here’s the same 2nd half with 3-column transposition undone. Notice how there are now 22 bigram repeats! It has an even higher number of repeats than the 1st half.

Very interesting, I’ll take a look at it.

Well I tried to make a post, but it didn’t post, so I have to write it again.

I know that feeling, it sucks!

I have an idea, and forgive me if someone has recently tried this and I didn’t catch on. Check for cycling in the top and bottom halves of the 340 and compare that to the count of +’s. See if there is a relationship. But I was also thinking about regression analysis. What about checking portions of the 340, say five rows (or more) at a time, starting at the top? Find the total cycle score with whatever formula, and scroll down, one row at a time. Compare the cycle score to the total count of +’s, q’s, B’s and F’s, individually or in combinations. I wonder if regression analysis could help to determine if there is a relationship between cycle scores and high count non-cyclic symbols, or which high count non-cyclic symbols are more determinative of the score as compared to others. EDIT: We could also scroll from the bottom up for a few more scores.

I have dabbled with such a thing with my measurement of non-repeats. I’ll make another post after this one with some visuals.

I have also been thinking about my grid hillclimber idea, and know how I would do it. It wouldn’t be that difficult, and I may move forward with it at some point in the future.

Yes you should give it a go.

AZdecrypt

 
Posted : August 5, 2015 7:39 pm
doranchak
(@doranchak)
Posts: 2614
Member Admin
 

I thought about it, I think it’s intimidating to share anything at all with a room full of experienced cryptologists! When I told my father about the cipher the first thing he asked was "how do you know it’s genuine?". So I believe that’s one of the most important things to talk about. That for instance the encoding appears to be genuine, and so for the entire cipher.

In your opinion, what is the best evidence for the genuineness of the encoding?

I guess for me it would be the symbol frequencies, appearance of bigrams/trigrams, other repeated patterns, apparent cycling of symbols suggestive of homophonic encoding, etc. And also the fact that it’s fairly easy to generate test ciphers (with real plaintexts) that share these same qualities. Oh, and the fact that we know Zodiac was capable of encoding a valid message, because of Z408 and its known solution.

It’s certainly difficult to test the "it’s not genuine" hypothesis. We can rule out thousands of encoding schemes but there are numerous others that remain untested.

http://zodiackillerciphers.com

 
Posted : August 5, 2015 8:09 pm
Jarlve
(@jarlve)
Posts: 2547
Famed Member
Topic starter
 

@doranchak,

Yes, there are certainly more repeating grams/patterns in the horizontal direction. Various measurements of cycles being much higher in the horizontal direction. 26 distinct strings of 17 symbols with no repeats, several rows having no repeats, all very typical of cyclic homophonic encoding. Symbol frequencies are a bit strange because there are many low counts, and does seem to hint towards some of the higher count symbols being 1:1 symbol substitutes/whatever, an important relation I think.
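A small Python sketch of one way to look for strings of 17 symbols with no repeats: slide a 17-symbol window along the cipher and keep the start positions where every symbol in the window is different. Whether the count of 26 above was obtained by checking every starting offset like this, or in some other way, is not stated here, so this is only an assumed reading.

```python
from typing import List

def no_repeat_windows(cipher: str, length: int = 17) -> List[int]:
    """Start positions of windows of `length` symbols that contain no repeated symbol."""
    return [
        i
        for i in range(len(cipher) - length + 1)
        if len(set(cipher[i:i + length])) == length
    ]

print(no_repeat_windows("AABC", 3))   # [1]
print(no_repeat_windows("ABCAB", 3))  # [0, 1, 2]
```

Running it on the mirrored or column-wise readings of the cipher would give comparable counts for the other directions.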

AZdecrypt

 
Posted : August 5, 2015 9:08 pm
Jarlve
(@jarlve)
Posts: 2547
Famed Member
Topic starter
 

Find the total cycle score with whatever formula, and scroll down, one row at a time. Compare the cycle score to the total count of +’s, q’s, B’s and F’s, individually or in combinations. I wonder if regression analysis could help to determine if there is a relationship between cycle scores and high count non-cyclic symbols, or which high count non-cyclic symbols are more determinative of the score as compared to others. EDIT: We could also scroll from the bottom up for a few more scores.

smokie, I went through the 340 sliding from top to bottom with a window of 10 rows, using my measurement of non-repeats. It’s a measurement that relates very highly to how much the cipher cycles, but it can be thrown off by high count non-cyclic symbols. I’ve not compared these scores to your suggested symbols, but the scores are all very much in line, indicating that the encoding throughout the 340 is quite uniform. I believe daikon also suggested this.

This download contains all the 11 slides in graphs. Images numbered from 1 to 11. Image 1 is row 1 to 10, 2 is row 2 to 11, 3 is row 3 to 12, etc.

The red line is the left-to-right, top-to-bottom direction (our suspected direction of encoding for the 340) and the green line is the right-to-left, top-to-bottom direction (mirrored version of the 340). The other, fainter colors are vertical and diagonal directions. The further right a line is shifted, the more evidence there is of cyclic homophonic encoding in that direction. There is also a table with numbers to the right, from 1 to 16; the first 4 are the horizontal directions. Higher is better.

I noticed something unusual, the green line (340 mirrored) often extends further than the red line, this may suggest that there are longer cycles to be found in that direction! Notice also at times a sharp drop for the red line, the wildcards are actually causing this.

Example:

AZdecrypt

 
Posted : August 5, 2015 10:57 pm