FBI analysis (by Dan Olson) of Z340 might be wrong

daikon · 2015-07-07T04:38:39Z

Now that I've got your attention, here's my reasoning. More specifically, Dan Olson of the FBI says: "Lines 1-3 and 11-13 contain a distinct higher level of randomness than lines 4-6 and 14-16. This appears to be intentional and indicates that lines 1-3 and 11-13 contain valid ciphertext whereas lines 4-6 and 14-16 may be fake."... _Dan_Olson

The first part is certainly true: lines 1-3 and 11-13 do not contain any repeats, so they are "more random". But I believe the conclusion he draws is without merit. I've tried constructing multiple ciphertexts similar to Z340 that contain a valid plaintext taken from other Zodiac letters, using a simple homophonic substitution like the one used in Z408. It turns out to be trivial to get the same characteristics of no repeats on some lines and several repeats on others. In fact, you can do it to pretty much any line without having to alter your plaintext in any way.

The key is, first, to have multiple homophones for each plaintext letter, which is certainly the case for Z340 with a whopping 63 cipher symbols for the 26 letters of a presumably English alphabet, and second, to switch between sequential use of homophones and random order from line to line. Here's why: if you pick homophones strictly sequentially, and you are lucky enough that your plaintext doesn't have too many repeating rare letters, you won't have any repeats for a long time. If you pick homophones for the same letters randomly, you are bound to get more repeats. So if Zodiac used sequential homophone selection for lines 1-3, then switched to random for lines 4-10, went back to sequential for lines 11-13, and then back to random for the rest of the cipher, you'd have the exact same "unevenness of randomness" exhibited by Z340 for pretty much any plaintext. Why would he do that? To make deciphering harder, to break up the homophone cycles. But in fact, Zodiac didn't even do that.
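The difference between sequential and random homophone selection can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy key (three made-up homophones per letter, not Zodiac's key), the function names, and the definition of "repeats" as symbols that duplicate an earlier symbol in the same row.

```python
import random
from itertools import cycle

# Hypothetical toy key: three homophones per plaintext letter (not Zodiac's key).
TOY_KEY = {
    "E": ["e1", "e2", "e3"],
    "A": ["a1", "a2", "a3"],
    "T": ["t1", "t2", "t3"],
}

def encrypt(plaintext, key, sequential=True, seed=0):
    """Homophonic substitution with sequential (cycled) or random homophone choice."""
    if sequential:
        # Each letter walks through its own homophone list in order.
        cycles = {letter: cycle(symbols) for letter, symbols in key.items()}
        return [next(cycles[ch]) for ch in plaintext]
    # Random mode: any homophone of the letter may be picked at any time.
    rng = random.Random(seed)
    return [rng.choice(key[ch]) for ch in plaintext]

def count_repeats(row):
    """Number of symbols in the row that duplicate an earlier symbol."""
    return len(row) - len(set(row))

row = "EATTEATEA"  # each letter occurs exactly three times
print(count_repeats(encrypt(row, TOY_KEY, sequential=True)))   # 0: no homophone is reused
print(count_repeats(encrypt(row, TOY_KEY, sequential=False)))  # depends on the seed, typically > 0
```

With sequential selection a row stays repeat-free as long as no letter occurs more often than it has homophones; random selection gives no such guarantee.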
If you observe more carefully, only lines 1-3 are "special"; lines 11-13 just happen to look "more random" because the ciphertext has 17 columns. Don't believe me? Go to the excellent WebToy tool and change the layout of Z340 from the default "20x17" grid to double the number of columns (click on "10x34"). It combines each pair of original rows into 1 long row. You'll now see that only row 1 (original rows 1-2) has just a few repeats, which is expected for a sequential homophonic substitution cipher. Row 6 (starts with U+R), which is a combination of original rows 11-12, now has a huge number of repeats, and so does row 7 (original rows 13-14). Row 8 has a surprisingly low number of repeats, but original rows 15-16 also had few repeats, and it could just be due to the plaintext being non-repeating in that section.

So you see, rows 11-13 in the original ciphertext are *not* that special, and the whole idea of the cipher being split in two and then the left half placed on top of the right half, which is what Dan Olson suggested on a few occasions, doesn't seem likely. I know that I'm contradicting what a well-known and respected FBI cryptanalyst has said, and I have a total of a couple of months of experience and knowledge trying to break Z340, compared to decades of experience on Dan's part, so I could be totally off in my conclusions. But the logic above suggests otherwise. What do you think, am I missing something?

Now, the fact that rows 1-2, even when combined into one, have a much lower number of repeats than the rest of the ciphertext is a very good sign! It does suggest that we are looking at a homophonic substitution, and that Z likely started with sequential assignment of homophones. It could also be because the plaintext has few repeats in the first two lines, but that is less likely.
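The effect of doubling the row width can be sketched in a few lines of Python. This is only an illustration on a made-up toy string, not Z340 itself, and the repeat count used here (row length minus number of distinct symbols) is an assumption that may differ from the exact statistic WebToy reports.

```python
def rows_of(text, width):
    """Split a ciphertext string into rows of the given width."""
    return [text[i:i + width] for i in range(0, len(text), width)]

def repeats_per_row(text, width):
    """For each row, count symbols that duplicate an earlier symbol in that row."""
    return [len(row) - len(set(row)) for row in rows_of(text, width)]

# Made-up toy ciphertext: every 4-symbol row is repeat-free on its own.
toy = "ABCDEFGHABCDABCD"
print(repeats_per_row(toy, 4))  # [0, 0, 0, 0]
print(repeats_per_row(toy, 8))  # [0, 4]
```

In the toy string the last two narrow rows look just as "clean" as the first two, but once rows are merged in pairs only the first merged row stays free of repeats.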

Norse

(@norse)

Posts: 1764

Noble Member

If you mean it’s in a completely different language, other than English…

Yes, that’s what I have in mind: Hebrew, for instance – or Russian, where a completely different alphabet is in play.

One of my first theories about the Zodiac case was that the 340 plain text is Old Norse. I’ve moved pretty far away from that by and by, though, for various reasons – but that language would fit the bill in two possible ways:

a) If the plain text is – basically – runes, then you’d deal not only with a non-Latin alphabet but also potentially (depending on whether an older or a newer rune alphabet is used) one which consists of significantly fewer letters/signs than English.

b) If the text is non-runic Norse there would be several unusual letters/symbols in play in addition to the familiar Latin letters you get in modern European languages.

It’s an intriguing possibility, I guess – but to be honest I do think it’s a bit far-fetched.

If this idea has merit, I’d say the likeliest candidate for a non-Latin alphabet plain text is something a fairly regular American guy might be able to produce because he learned the language as a child: possibly Hebrew. Or – that would be my first choice – some Slavic language or other which uses the Cyrillic alphabet. I doubt Z was someone who had mastered a dead language like Old Norse, or for that matter any language he had studied, to the required degree.

The problem is indeed, as you say, that you’d need to be pretty fluent in any test language – on top of knowing what you’re doing crypto-wise.

Posted : July 9, 2015 4:11 am

daikon

(@daikon)

Posts: 179

Estimable Member

Topic starter

The more I think about the possibility of Z340 being in a different language, the less likely it seems. Let’s assume for a second that Zodiac was a smart guy. Judging by his clever ways of avoiding capture, that seems quite certain (such as applying plastic model glue to his fingertips to avoid leaving fingerprints, which is much better than wearing gloves in a number of ways). He also clearly didn’t want to be caught, so he’d want to hide any personal details about himself. Which is why, I think, he made so many spelling mistakes in his letters: to make them significantly different from his normal writing style. That, by the way, also makes me think he was either a published writer or a journalist of some sort, afraid of being identified by his writing style. So even if he was fluent in a foreign language, he’d want to keep that information to himself as much as possible, so as not to give any more clues to the police. Otherwise, if it were discovered that Z340’s plaintext was in Hebrew or Russian, the police could start looking closely at people of that descent, or at attendees of the corresponding language courses. I don’t think you can learn fluent Hebrew or Russian on your own, if I’m not mistaken? So if it is in another language at all, I think it’s either one that’s common in California alongside English, such as Spanish, or a dead or constructed language that someone can learn from a book on their own, such as Latin or Esperanto. But that’s just a guess at this point.

Posted : July 9, 2015 6:06 am

daikon

(@daikon)

Posts: 179

Estimable Member

Topic starter

After some more experiments with constructing ciphers similar to Z340, I’ve realized that there is one more common assumption that can be thrown out of the window because it is incorrect: the one about reading Z340 by rows, and not columns. It is based on the fact that there are far fewer repeats in the rows than in the columns, as WebToy clearly shows (see the "Repeated symbols by row" stat). The mistake here is that, yes, the low repeat counts by row tell us that the *cipher* was constructed by rows, and likely left to right as well. However, and this is the key, they tell us nothing about how the *plaintext* (i.e. the original message) was written into the rows/columns before it was encrypted.

Here’s an example.

I submit to you the following cipher:

K+bGHTm8qIC9DR4Q0
15jEOAS2pZ6Wa3iBo
UjCfN9JT4L+esK7HP
rRnhVQAIFJSWGMKfD
Hldce+U0b8iBI+TV1
2CEg563mOa40fJr1d
s2NbFsXh9R3WUj4V+
++cmSGmK+0HY1m2qm
3D4TPIiEd0QFWjeJa
piOK1gRUHLSmfIo2J
skKh3VAPMe7gNHG+4
01Q8qI2W++BJnUDbV
KW5chUfO3EdCFr9jH
67GT48+plZrVLWcAU
V0I+dJ12eB3Wf4506
78aCMKH19A2UV+ec5
WqNI++JLB6CU3M7uK
4Hsi89XPQVspADoNO
I0L56j+7+Y+85fRkM
P+NJl6K1BUH0r5+0n

Looking at WebToy stats, you’ll see that it is very much like Z340. It has very few repeats in the rows, and plenty of repeats column-wise. In fact, I managed to get the first 5 rows without repeats, and the first 2 rows combined have 0 repeats (evident if you put the cipher in 34-symbol rows). It has almost the same number of repeated bigrams as Z340, and it even has one repeated 3-gram (vs. 2 in Z340). I’ve even mimicked the ‘+’ symbol being twice as frequent as the next most frequent symbol, which plays no role in making my point; I just wanted to make this cipher very much like Z340 in all respects.

And here’s the kicker – I can even tell you that it’s the exact beginning of Z408, which we all know so well, truncated at 340 characters and encoded using a straight homophonic substitution, and yet trying to crack this cipher using ZKD/AZD will yield absolutely no result, for one simple reason.

Because before I encrypted it, I transposed the plaintext like this:

INCOIHWTBIAAOHEREETGORAHEBDTLCEGAY
LGAMSAIHESNMKITIRVHYFLRAIOIHLOSIMO
IPUUMNLECTGAINHLEEAOFTTEWRCEEMIVEU
KESCOKDFAHELLGELNNNUWHOWINEIDEWEBW
EOEHRIGOUEROLGMICBGRIEFHLISHWMIYEI
KPIFELARSMTFSIONEEERTBIELNNAIYLOCL
ILTUFLMREOUAOVAGITTOHETNBPDVLSLUAL
LEINUIEEMAELMETETTTCASIIEAAELLNMUT
LBSINNISATALESTXIEIKGTADRRLKBAOYSR
IESTTGNTNDNTTMHPSRNSIPTIEALIEVTNEY

If you don’t see it, start reading the first letters of each row, then the second letters, and so on.

There you go. We have another cipher that has the same stats as Z340, and yet it is clearly written vertically, top to bottom. I could’ve written it diagonally, if I wanted to. Or using any other number of "routes" or columnar transpositions. You just have to *encrypt* it horizontally, left to right, after you are done "transposing" the plaintext.
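The vertical write-in described above can be sketched as follows. This is a minimal illustration, not the exact procedure used for the post's cipher: the function names are made up, and only the first 40 letters of the Z408 opening (as recovered by reading the grid above column by column) are used, so the rows are 4 letters wide instead of 34.

```python
def write_by_columns(plaintext, n_rows):
    """Write the plaintext top to bottom, column by column, and return the grid rows.

    Character k lands in row k % n_rows, so row r is every n_rows-th character.
    """
    return [plaintext[r::n_rows] for r in range(n_rows)]

def read_by_columns(rows):
    """Undo the write-in: first letters of each row, then second letters, and so on.

    Assumes all rows have equal length (text length divisible by the row count).
    """
    return "".join("".join(column) for column in zip(*rows))

opening = "ILIKEKILLINGPEOPLEBECAUSEITISSOMUCHFUNIT"  # first 40 letters
rows = write_by_columns(opening, 10)
print(rows[:3])               # ['INCO', 'LGAM', 'IPUU']
print(read_by_columns(rows))  # the original opening again
```

The three rows printed match the first four letters of the first three rows of the transposed plaintext in the post. Encrypting `rows` left to right with a homophonic key would then give low repeat counts by row even though the message itself runs vertically.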

Which simply means that we cannot rule out that Z340 was written "vertically", or that columnar transpositions were used, etc. I.e. this part of the FBI’s analysis can be crossed out as well: "This indicates that the cipher is written horizontally and rules out any transposition patterns that are not strictly horizontal." Maybe that’s why Z340 hasn’t been cracked yet – nobody tried applying "transposition patterns that are not strictly horizontal"?

I might be embarrassing myself here, of course, since I’m not a professional cryptographer, so please do point out any flaws in my reasoning above. 🙂

Posted : July 16, 2015 11:06 am

glurk

(@glurk)

Posts: 756

Prominent Member

Maybe that’s why Z340 hasn’t been cracked yet – nobody tried applying "transposition patterns that are not strictly horizontal"?

I think that the problem here is that there are a nearly infinite number of these transpositions. It would take a little while to try them all.

It’s not so much that "nobody tried" – many have – but the problem of "it would take billions of years." Just my opinion.

-glurk

——————————–
I don’t believe in monsters.

Posted : July 16, 2015 12:05 pm

Jarlve

(@jarlve)

Posts: 2547

Famed Member

That’s what I said in my last post at: viewtopic.php?f=81&t=267&start=140

There is however a way to look through the encoding, kinda. Let me share some data, note that there is some bias towards the horizontal direction just because of the encoding.

Bigram counts in percentages (undoing directional transposition):

408:
Horizontals: 44.33%
Verticals: 15.76%
Diagonals 1: 23.15%
Diagonals 2: 16.74%

340:
Horizontals: 28.42%
Verticals: 24.21%
Diagonals 1: 27.36%
Diagonals 2: 20%

daikon:
Horizontals: 36.19%
Verticals: 22.01%
Diagonals 1: 20.52%
Diagonals 2: 21.26%

daikon2:
Horizontals: 34.50%
Verticals: 24.79%
Diagonals 1: 19.94%
Diagonals 2: 20.75%

daikon3:
Horizontals: 39.76%
Verticals: 20.47%
Diagonals 1: 20.47%
Diagonals 2: 19.29%

daikon4: 17 by 20 (transposition)
Horizontals: 27.81%
Verticals: 21.30%
Diagonals 1: 27.81%
Diagonals 2: 23.07%

daikon4: 34 by 10 (transposition)
Horizontals: 21.30%
Verticals: 37.39%
Diagonals 1: 22.60%
Diagonals 2: 18.69%

daikon4: 17 by 20 (transposition undone)
Horizontals: 41.48%
Verticals: 18.08%
Diagonals 1: 23.40%
Diagonals 2: 17.02%

daikon4: (transposition undone)

KUHs3sKVWI+jl2DkW
0q0bCdN4K5INLGfcb
Thc+I5HNeFP3hd+6T
9+sIVUJ+jmJUXiAf1
J+8T0hEPO2L7q4b9d
M3eB+IL8R0eEB6YC+
i3Q7d3C+9eBWFgCWU
8DsIUWNFf35RK+jjH
r4Mf47T4eG957RQHV
VJ+j0uk0P1+a4H6KM
1r2+p0674P5RC+i17
8H+jnEcOQGasNEhgm
K8TCiJOV5S1q4M8lA
Q6GgI8K96SA3mR2+H
XK2ImKUWp1P1pFO+H
+l9QBZJa0L+ZAVU6S
4HSBr2sHWW0YmJVUp
0aGf1fnLVAr3MJmIU
W+D5iKr2oDceo+Bf1
q2bAcN0oDdmJVU5On

AZdecrypt

Posted : July 16, 2015 1:46 pm

daikon

(@daikon)

Posts: 179

Estimable Member

Topic starter

I think that the problem here is that there are a nearly infinite number of these transpositions. It would take a little while to try them all.

Overall, counting all possible transpositions, yes, probably in the billions. The ones that can be done with a pen and a piece of paper? Probably far fewer. Take the classical columnar transposition, where you arrange your text in N columns and then read it out by columns: for a text with 340 characters there are only 340 ways of doing that. And roughly half of those will be degenerate cases, because most columns will hold only one character, so reading column-wise will be mostly the same as reading normally, by rows. Here’s an example of what I’m talking about. For the plaintext "ABCDEFGHIKLMNO" (14 characters), using more than 7 columns would be silly, as you’ll end up with, for example:
12 column transposition: "ANBOCDEFGHIKLM" (most of the original text intact towards the end)

ABCDEFGHIKLM
NO----------

11 column transposition: "AMBNCODEFGHIKL" (most of the original text intact towards the end)

ABCDEFGHIKL
MNO--------

Etc…
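The two examples above can be reproduced with a short Python sketch of the plain (unkeyed) columnar read-out. The function name is made up for illustration.

```python
def columnar_read(text, n_cols):
    """Lay the text out in rows of n_cols characters, then read it column by column."""
    # Column c holds characters c, c + n_cols, c + 2 * n_cols, ...
    return "".join(text[c::n_cols] for c in range(n_cols))

plaintext = "ABCDEFGHIKLMNO"  # 14 characters, as in the example above
print(columnar_read(plaintext, 12))  # ANBOCDEFGHIKLM
print(columnar_read(plaintext, 11))  # AMBNCODEFGHIKL
```

Once the column count exceeds half the text length, every column past the short second row holds a single character, which is why the tail of the output stays in its original order.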

So out of 340 possible transpositions, we only need to test half, or 170. That’s not that many, definitely less than billions. 🙂 It’s actually a bit more than that, because most of these transpositions will depend on the length of the text for decoding (i.e. reversing the transposition to get the original plaintext). And we don’t really know if Z used the full 340 characters, or he added filler at the end after doing the transposition. So we’ll need to test for all lengths of text as well, for each transposition. How much filler is at the end? Judging by Z408, there can be up to one full line of filler and then some in the second to last line. Let’s make it full 2 lines/rows to be safe, for 17*2=34 possible variations in the plaintext length. So we end up with 170 * 34 = 5,780 possible transpositions with different lengths of plaintext. Let’s even throw in the possibility that Z reversed the whole plaintext as well (wrote it backwards, as was the case with Feynman cipher #1), which will make it 11,560 ciphers to test with ZKD/AZD. Should be doable, I think, right? With, say, 60 seconds for each test, running 4 in parallel, it can be done in almost exactly 48 hours.
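The back-of-the-envelope estimate above can be checked with a few lines of Python. The variable names are made up; the numbers (170 column counts, 34 possible plaintext lengths, forward and reversed text, 60 seconds per test, 4 tests in parallel) are the ones assumed in the post.

```python
column_counts = 340 // 2        # non-degenerate column counts: 170
plaintext_lengths = 17 * 2      # up to two full rows of filler: 34
directions = 2                  # plaintext written forwards or backwards

candidates = column_counts * plaintext_lengths * directions
seconds_per_test = 60
parallel_tests = 4

total_hours = candidates * seconds_per_test / parallel_tests / 3600
print(candidates)             # 11560
print(round(total_hours, 1))  # 48.2
```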

I’m nominating Jarlve for the task, since he has access to the newest and greatest version of AZD. 🙂

Posted : July 16, 2015 10:22 pm

daikon

(@daikon)

Posts: 179

Estimable Member

Topic starter

That’s what I said in my last post at: viewtopic.php?f=81&t=267&start=140

Yeah, I didn’t get your question at the time. Now I understand it perfectly. So you’ve already arrived at the same conclusion before me. 🙂

There is however a way to look through the encoding, kinda. Let me share some data, note that there is some bias towards the horizontal direction just because of the encoding.

Bigram counts in percentages (undoing directional transposition):

Interesting data indeed! It does show that my latest cipher has much less horizontal bias in its bigrams (just like Z340!). Can you elaborate a bit on what exactly you are counting? I’m not quite following. Are you counting bigram *repeats*? Or the total number of unique bigrams? And the percentages are of what, i.e. what is the denominator? I’m also not sure what Diagonals 1 and 2 are. Sorry if all this is already common knowledge, but I can’t be the only one confused when reading this. 🙂

Posted : July 16, 2015 10:32 pm

Jarlve

(@jarlve)

Posts: 2547

Famed Member

For columnar transposition, which is basically the permutations of 17 columns/elements, there are 17! (http://www.wolframalpha.com/input/?i=17%21), i.e. about 3.6 × 10^14, different combinations. And the solution is spiky: you need to hit the nail on the head.

Bigram repeats yes, for instance M+, ++, FB in the 340.

Look for the table of directions in this document of mine. I consider a primary direction and a secondary direction. For instance the writing system we use is: primary: east, secondary: south. So in total you can come up with 16 basic directions for a cipher in a 2D grid. Note that when considering bigram counts, the reverse of each direction will have equal counts so only 8 directions have to be explored.

About percentages: if there is a total of 100 bigram repeats across all directions and 40 of them are in the horizontal directions, then horizontal will be 40%.

Btw, version 0.95 of my solver is out. You can download it by clicking "My work on the ciphers" in my signature. I just started a new test for transposition (row flips). 3,145,728 ciphers will be processed over the next 2 weeks. Better not have a power outage again!

AZdecrypt

Posted : July 17, 2015 12:31 am

doranchak

(@doranchak)

Posts: 2614

Member Admin

Very cool – Thanks, Jarlve!

http://zodiackillerciphers.com

Posted : July 17, 2015 12:41 am

daikon

(@daikon)

Posts: 179

Estimable Member

Topic starter

For columnar transposition, which is basically the permutations of 17 columns/elements, there are 17! (http://www.wolframalpha.com/input/?i=17%21), i.e. about 3.6 × 10^14, different combinations. And the solution is spiky: you need to hit the nail on the head.

I’m guessing you are talking about a different kind of transposition, where you change the order of the columns, like this, right? I kind of doubt Z would go as far as transposing all 17 columns. The reason being that he wanted to create a cipher that was hard to crack, but not completely impossible to crack. Otherwise, if he wanted to create a truly crack-proof cipher, he might as well have used a one-time pad known only to him and been done. But what fun is that? So I think he stayed away from really complicated encryption systems. I can see him using a 6-letter key for columnar transposition (ZODIAC), or maybe up to 9 letters, so that it is repeated about twice across 17 columns, which gives us "only" 9! = 362,880 possibilities.
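A keyed columnar transposition of the kind discussed here can be sketched as below. The convention used (columns read out in alphabetical order of the key letters, ties broken left to right) is the common textbook one and is an assumption, since the post does not spell out which variant is meant; the function name and the 12-letter sample text are made up. The factorials give the key-space sizes mentioned in the thread.

```python
import math

def keyed_columnar(text, key):
    """Write text in rows of len(key) columns, then read the columns in key order."""
    n_cols = len(key)
    # Column order: alphabetical by key letter, ties broken by position in the key.
    order = sorted(range(n_cols), key=lambda i: (key[i], i))
    return "".join(text[c::n_cols] for c in order)

print(keyed_columnar("ABCDEFGHIJKL", "ZODIAC"))  # EKFLCIDJBHAG

# Number of possible column orderings for 6, 9 and 17 columns.
print(math.factorial(6))   # 720
print(math.factorial(9))   # 362880
print(math.factorial(17))  # 355687428096000
```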

About percentages: if there is a total of 100 bigram repeats across all directions and 40 of them are in the horizontal directions, then horizontal will be 40%.

Sorry, I still don’t quite follow your methodology. When you say there are a total of 100 bigram repeats, do you mean you throw away all bigrams that appear just once, and count bigrams appearing more than once (ignoring their individual repeat counts), or do you add up all repeat counts for all bigrams that appear more than once? For example, for the text "ABCBCBC", you get the following counts: AB=1, BC=3, CB=2. So what is your "bigram repeat count" for this? 2 (there are 2 repeating bigrams: BC and CB)? Or 5 (2 repeating bigrams appearing 5 times total, 3 for BC and 2 for CB)? I’m pedantic that way, to make sure I understand everything perfectly before drawing any conclusions about the results. 🙂

Btw, version 0.95 of my solver is out. You can download it by clicking "My work on the ciphers" in my signature.

You forgot to update the link! It still links to "azdecrypt094.zip". 🙂

Posted : July 17, 2015 2:26 am

Jarlve

(@jarlve)

Posts: 2547

Famed Member

Ah yes, you are talking about a different kind of transposition. I’ll look into it.

ABCBCBC. BC repeats 2 times and CB repeats 1 time for a total of 3 bigrams for this string. For every direction, a new string is checked. That’s how I do it.
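The counting rule described in this post, together with the percentage rule given earlier, can be sketched in Python. This is a reading of the description rather than the solver's actual code: function names are made up, bigrams are taken as overlapping pairs, and apart from the 40-out-of-100 example the per-direction numbers are hypothetical.

```python
from collections import Counter

def bigram_repeats(s):
    """Count repeats: every occurrence of a bigram beyond its first one."""
    counts = Counter(s[i:i + 2] for i in range(len(s) - 1))
    return sum(n - 1 for n in counts.values())

def direction_shares(repeats_by_direction):
    """Turn per-direction repeat counts into percentages of the overall total."""
    total = sum(repeats_by_direction.values())
    return {name: 100 * n / total for name, n in repeats_by_direction.items()}

print(bigram_repeats("ABCBCBC"))  # 3 (BC repeats twice, CB once)

# Hypothetical counts adding up to 100 repeats, 40 of them horizontal.
shares = direction_shares({"horizontals": 40, "verticals": 25,
                           "diagonals 1": 20, "diagonals 2": 15})
print(shares["horizontals"])  # 40.0
```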

I updated the link, it’s working now. Thanks for mentioning it.

AZdecrypt

Posted : July 17, 2015 3:16 pm

daikon

(@daikon)

Posts: 179

Estimable Member

Topic starter

ABCBCBC. BC repeats 2 times and CB repeats 1 time for a total of 3 bigrams for this string.

I see! It is similar to a bigram index of coincidence then, sometimes abbreviated as DIC for "digram index of coincidence". For the general IoC you add up n*(n-1), where n is the number of occurrences of each "item". So all items that are not present, or that occur only once, cancel out (either n or n-1 is 0), and the rest are almost "squared", which exaggerates letters/bigrams with lots of repeats.
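For comparison with the simple repeat count, here is a minimal Python sketch of the n*(n-1) sum and the usual normalised index of coincidence built on it. The function names are made up, and using overlapping bigrams is an assumption; some definitions of the digram IoC use non-overlapping pairs instead.

```python
from collections import Counter

def coincidence_sum(items):
    """Sum of n * (n - 1) over all distinct items, n being the occurrence count."""
    return sum(n * (n - 1) for n in Counter(items).values())

def index_of_coincidence(items):
    """Coincidence sum divided by N * (N - 1), where N is the number of items."""
    items = list(items)
    total = len(items)
    if total < 2:
        return 0.0
    return coincidence_sum(items) / (total * (total - 1))

def bigrams(s):
    """Overlapping bigrams of a string."""
    return [s[i:i + 2] for i in range(len(s) - 1)]

pairs = bigrams("ABCBCBC")          # AB=1, BC=3, CB=2
print(coincidence_sum(pairs))       # 8  (0 + 3*2 + 2*1)
print(index_of_coincidence(pairs))  # 8 / 30
```

Compared with the plain repeat count of 3 for the same string, the bigram that occurs three times contributes 6 of the 8 here, which is the "almost squared" exaggeration mentioned above.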

Posted : July 17, 2015 9:23 pm

marie

(@marie)

Posts: 189

Estimable Member

While I do understand quite a bit of programming, I just have a few questions on yours that I may have missed the answer to.

I do see there are parameters that can be changed. I have manually done some data analysis on the 408 to (hopefully) give me hints on the 340. For example, letter usage, once deciphered, is different from the standard "English" usage. Z also used a variety of symbols to represent letters he knew were more common, or thought would be in his cipher (8 for E, a letter that appeared 47 times, but 6 for O appearing 16 times, and only 4 for I that appeared 39 times). So I wondered what is being used for the alphabet, and how many substitutions are allowed per letter. And should it also be a compilation of his other "confirmed" letters?

O may go up in usage with all his bomb talk, or even x for taxi.

I also think while Z was no ciphering genius, he probably realized these errors since 408 was solved so easily, and attempted to correct them.

The problem when solved will be simple– Kettering

Posted : July 19, 2015 11:07 am

glurk

(@glurk)

Posts: 756

Prominent Member

While I do understand quite a bit of programming, I just have a few questions on yours that I may have missed the answer to.

I do see there are parameters that can be changed. I have manually done some data analysis on the 408 to (hopefully) give me hints on the 340. For example, letter usage, once deciphered, is different from the standard "English" usage. Z also used a variety of symbols to represent letters he knew were more common, or thought would be in his cipher (8 for E, a letter that appeared 47 times, but 6 for O appearing 16 times, and only 4 for I that appeared 39 times). So I wondered what is being used for the alphabet, and how many substitutions are allowed per letter. And should it also be a compilation of his other "confirmed" letters?

O may go up in usage with all his bomb talk, or even x for taxi.

I also think while Z was no ciphering genius, he probably realized these errors since 408 was solved so easily, and attempted to correct them.

Your counts seem really incorrect. So far as I know, 7 subs. were used for plaintext E, which appears in the 408 in 54 instances. 4 subs. were used for plaintext O, which occurs 27 times. And yes, 4 subs. for I which occurs in the PT 44 times.

This can all be a bit subjective, of course, depending on how one reads it and translates the misspellings.

-glurk

EDIT: this image is fairly correct:

——————————–
I don’t believe in monsters.

Posted : July 19, 2015 11:49 am

marie

(@marie)

Posts: 189

Estimable Member

I will grant you that my counts may be off. I did account for corrections, but I did it a while ago manually; I should have had the computer count. In any case, the counts vary in percentage from the "norm." And while there might seemingly be little difference between 2 and 4 percent, as some letters show, that could be huge in terms of cracking the cipher.
And while I am accomplished in statistics, eyeballing can sometimes be best (I can find stats to prove or disprove the same point). It’s 5 am where I am; I’ll recount in the morning, uh, at this point afternoon, and see what I come up with.

The problem when solved will be simple– Kettering

Posted : July 19, 2015 1:15 pm

Zodiac Discussion Forum