I have looked at 340 a few times over the last couple years. I have a few things to note, none of which seem entirely new after browsing this site a little, but I thought I’d get some feedback on them.
1. The symbol ‘+’ seems to be fundamentally different from the other cipher characters, for a number of reasons: frequency, distribution, etc. I have a suspicion that the "+" symbol was inserted "randomly" (a human’s poor intuition for random, anyway) in a second pass over the cipher as a wildcard, to increase the difficulty, after the 408 was solved "too easily" in the mind of Zodiac.
To add to this a little (maybe), the symbol "+", either consciously or subconsciously, evokes the idea of "more" as in "this symbol represents more letters than one." It also, consciously or subconsciously, has a bit of a "cross out" implication, the + crossing out the correct cipher.
2. The "Halloween Card" connection seems compelling, and I am continually drawn to it for the following reasons. First is the apparent quadrant structure implied by the central "+" and the corresponding mid-row "-" at the ends:
To add to the (possibly spurious) attraction is the appearance of "by" in some direction in each quadrant. Of course you have to consider both variants of "y" as interchangeable for this to be valid.
I have been thinking about how to test for the expectation of this "by" coincidence. I see there has been a lot of good work here testing ideas like this against expectation. I am not entirely sure how to design such a test yet, though.
If the "by" construct is not a complete red herring, it appears to suggest something about the cipher reading right-to-left in the top half, and left-to-right in the bottom half. Or perhaps reading in independent quadrants. You can find various permutations, like alternating line directions, that are consistent with the directions of the "by" directional hints.
3. The "Zodaik" (sic) implication in the lower right jumps right out at you first thing, of course. Has anyone ever noticed this? Consistent with "Signed", reading right-to-left, if some of the characters are to be taken literally, as the "Zodaik" seems to imply. Probably noise, but I thought I’d point it out.
Anyway, just some recent thoughts. None of these have led to any particularly compelling progress towards a decipherment yet.
Let me add a little more substance about why I think ‘+’ is likely to not be a typical encipherment character, but rather functions as an obfuscating wildcard which was added in an ad hoc manner after the fact to make the cipher more difficult to solve.
First, I think he was probably annoyed that the 408 was solved within days by amateurs. And so he decided he needed to take further steps to make the cipher more difficult to solve (and clearly, it has been). Understanding what those steps might have been is probably the key to cracking 340. I am assuming that they probably consisted of some type of preprocessing of the plaintext, or postprocessing of the cipher, or both, combined with a homophonic cipher like the 408. Of course, it could be anything, but if we don’t assume it was something that was reasonably systematic, we might as well give up. So it seems like a fair assumption if we want to continue working on it.
From the various mistakes evident in 408, I think it’s safe to say that Zodiac was more or less a hobbyist at ciphers. Maybe someone who does the Sunday puzzles, and/or has read a popular book or two. So we should expect him to follow fairly amateurish methods in their construction. When you sit down with pencil and paper to do a homophonic cipher as an amateur, one of the first things you’ll look for to "hide" by using multiple encodings is double letters, since we all know that double letters are one of the first keys that everybody uses to make initial guesses at a simple substitution cipher. But "++" appears not once, not twice, but three times in the cipher. It seems unlikely that he would allow that to stay in a puzzle he wanted to make more difficult than the 408, where he only left one double letter to be encoded as such. But, it’s not a problem at all if "+" is a wildcard which can mean anything. Furthermore, there are 3 "++" sequences if the text reads vertically, two of which are connected to "++" sequences which are horizontal. If he’s looking over the puzzle and trying to add "attractive distractors" that lead puzzle solvers down blind alleys (perhaps like "Zodaik" near the end), this might be one way he could do it, since he has total freedom in where to overlay the wildcard.
Another compelling hint is the frequency of "+", which is twice *any* other symbol. If he’s trying to make the puzzle hard, then why let a correct guess for "+" crack so much of it? There was nothing even close to this big jump in distribution in the 408. I think it’s because there is no correct guess for "+", i.e., it’s a wildcard.
Then there’s the placement of the "+" at the "center" (minus the "signature line", I guess) of the puzzle. Which seems to suggest it being "central" to the puzzle somehow. Maybe.
The final thought is a bit more tenuous but worth mentioning, and again refers to the "Halloween Card". Perhaps the "Paradice Slaves" arranged in a cross formation is a hint that the cross is a special symbol in the puzzle.
Does anyone know of a solver for ciphers of this kind which can allow for wildcard positions? If it doesn’t exist, I might try to modify zkdecrypto to handle this.
Let me add a little more substance about why I think ‘+’ is likely to not be a typical encipherment character, but rather functions as an obfuscating wildcard which was added in an ad hoc manner after the fact to make the cipher more difficult to solve.
I believe that the "+" symbol may be a 1:1 substitute. In the 408 the most frequently occurring symbol is the "q" and is a 1:1 substitute. The Hardens even commented on that, thinking the symbol was probably the letter "e" at first, and concluded that this was possibly a clever trick employed by Zodiac to throw off frequency analysis even further. Not to say that the wildcard hypothesis is uninteresting or not to be considered.
When you sit down with pencil and paper to do a homophonic cipher as an amateur, one of the first things you’ll look for to "hide" by using multiple encodings is double letters, since we all know that double letters are one of the first keys that everybody uses to make initial guesses at a simple substitution cipher. But "++" appears not once, not twice, but three times in the cipher. It seems unlikely that he would allow that to stay in a puzzle he wanted to make more difficult than the 408, where he only left one double letter to be encoded as such. But, it’s not a problem at all if "+" is a wildcard which can mean anything.
We have explored this hypothesis to some extent.
We have explored this hypothesis to some extent.
Cool. Doubted it was that easy.
Do AZDecrypt or zkdecrypto have the ability to score by both rows and columns (and maybe diagonals)? On the hypothesis that the cryptogram is laid out more like a crossword or word search, than just reading left to right and down.
I highly doubt this is a novel thought either.
Some thoughts..
1. Yes, the + symbol might be a 1:1 substitution. In the 408, the most frequent symbol represents the letter M. It most likely is not a frequent letter (as those use too many homophones). However, a 1:1 substitution homophone most likely would not be present >7% either. Bernoulli analysis has shown that the + symbol most likely represents either an above-average ‘L’ or the letter ‘S’ with a slightly higher-than-expected frequency. IMO it’s an ‘S’.
2. The ‘BY’ or ‘YB’ symbols of the cipher is only one out of a few bigrams present in the 340. This one seems to be reversible such as ‘NE’ or ‘EN’ in the words ‘NEXT’ and ‘ENGLISH’. Bigrams appearing twice most likely represent frequent bigrams. Some might occur up to 8 times or more, hidden amongst various different homophones. In fact such bigrams help to narrow down potential letters when computing. Computing won’t work, however, if e.g. QZ occurs twice in such a cipher..
3. Personally I don’t believe in any other encryption method than homophone substitution. Shorter text, more homophones plus throwing aboard any strict sequences when using the homophones already makes the 340 way harder than the 408. As an example, there is no repeating 4-gram present in the 340 (compared to the 408). Even if you calculate the overall variations of both ciphers, you’ll end up in some 90-digit or more figures with huge, huge difference of difficulty levels between those two ciphers (e.g. one has a 50 digit longer amount of variations than the other). This is, imo, why the 340 cannot be cracked by any standard homophone shuffling software (no matter how good it is..).
4. Let’s face it: The 340 is extremely hard to crack. Either someone lands a ‘lucky strike’ (by guessing and using Oranchak’s webtoy or any other tool) OR one calculates the cipher completely (shuffling the homophones to it’s optimum). For latter it won’t be sufficient to use Tianghe-II or Big Blue or any other (even military) supercomputer. It’d – definitely – take some kind of quantum computer, otherwise: No chance. Prove me wrong.
5. What actually IS possible is to solve parts of the cipher. Let’s assume the following: You say, for example, that all repeating bigrams indeed are frequent bigrams. You could also say that all of them are non-frequent or that some of them are frequent and others are not (however you prefer to). Let’s further exclude e.g. rare letters (or frequent letters or both) for the + symbol. THEN you have a certain approach to ‘crack’. Because: If you find a string in the cipher that consists of e.g. many + symbols and many bigram letters, then you may complete such string with some remaining homophones that could represent any letter from A to Z. And such string could contain certain words (e.g. a dictionary with 4000 words) of a certain length (e.g. minimum 5 letters). Now if you’ve found such a string containing at least one word, you could cross-check this string with a second, completely different string from the cipher (if the same homophones match to each other). This is how you could find e.g. two different words at two different positions in the cipher. This, imo, is the only way to crack the cipher by computation.
I currently do run such computation, precisely now focussing on line 13 of the 340 cipher (starting at the reverse y symbol, ending at the R symbol – thus a string of 13 symbols/letters). Of course I had to predefine some bigrams, the + symbol as well as the IoFBc part of the string. E.g. the letter S as the plus symbol and the 5-gram ‘ENTHE’ representing the IoFBc part. Combining this with a specific amount of frequent bigrams, the string is almost complete.
Only one of such ‘runs’ takes approximately 45 minutes to compute (~10 million text variations). Many word results occur multiple times in different strings, so I get approximately 50,000 to 150,000 strings with words present in it. As they repeat, a next step is to delete duplicates accordingly.
Here, as an example, is the result for the latest run (with ‘S’ as plus symbol and ‘ENTHE’ for the IoFBc part of the string; searching for words of length >5).
Thus, IF Z had used the configuration mentioned above, his (partial) cleartext (inside this specific string) is shown below! Please be aware that the results are not non-overlapping, therefore e.g. ‘assent’ + ‘thereto’ in this configuration is no valid result while e.g. ‘listen’ + ‘theater’ could very well be..
QT
‘theater’
‘theatre’
‘heated’
‘hearth’
‘health’
‘header’
‘headed’
‘heaped’
‘heaven’
‘theirs’
‘therein”herein’
‘thereon’
‘thereto’
‘heroes’
‘hermit’
‘thence’
‘lessen’
‘lessen”theater’
‘lessen”theatre’
‘lessen”heated’
‘lessen”hearth’
‘lessen”health’
‘lessen”header’
‘lessen”headed’
‘lessen”heaped’
‘lessen”heaven’
‘lessen”theirs’
‘lessen”therein”herein’
‘lessen”thereon’
‘lessen”thereto’
‘lessen”heroes’
‘lessen”hermit’
‘lessen”thence’
‘assent’
‘assent”theater’
‘assent”theatre’
‘assent”heated’
‘assent”hearth’
‘assent”health’
‘assent”header’
‘assent”headed’
‘assent”heaped’
‘assent”heaven’
‘assent”theirs’
‘assent”therein”herein’
‘assent”thereon’
‘assent”thereto’
‘assent”heroes’
‘assent”hermit’
‘assent”thence’
‘listen’
‘listen”theater’
‘listen”theatre’
‘listen”heated’
‘listen”hearth’
‘listen”health’
‘listen”header’
‘listen”headed’
‘listen”heaped’
‘listen”heaven’
‘listen”theirs’
‘listen”therein”herein’
‘listen”thereon’
‘listen”thereto’
‘listen”heroes’
‘listen”hermit’
‘listen”thence’
‘hasten’
‘hasten”theater’
‘hasten”theatre’
‘hasten”heated’
‘hasten”hearth’
‘hasten”health’
‘hasten”header’
‘hasten”headed’
‘hasten”heaped’
‘hasten”heaven’
‘hasten”theirs’
‘hasten”therein”herein’
‘hasten”thereon’
‘hasten”thereto’
‘hasten”heroes’
‘hasten”hermit’
‘hasten”thence’
‘unseen’
‘unseen”theater’
‘unseen”theatre’
‘unseen”heated’
‘unseen”hearth’
‘unseen”health’
‘unseen”header’
‘unseen”headed’
‘unseen”heaped’
‘unseen”heaven’
‘unseen”theirs’
‘unseen”therein”herein’
‘unseen”thereon’
‘unseen”thereto’
‘unseen”heroes’
‘unseen”hermit’
‘unseen”thence’
*ZODIACHRONOLOGY*
4. Let’s face it: The 340 is extremely hard to crack. Either someone lands a ‘lucky strike’ (by guessing and using Oranchak’s webtoy or any other tool) OR one calculates the cipher completely (shuffling the homophones to it’s optimum). For latter it won’t be sufficient to use Tianghe-II or Big Blue or any other (even military) supercomputer. It’d – definitely – take some kind of quantum computer, otherwise: No chance. Prove me wrong.
Certainly exhaustive computation is right out. But I think things are not quite as hopeless as that. The key thing is the complexity of the fitness landscape. Part of the fitness landscape complexity is inherent in the English language and in the construction of the cipher, but there might still be room for improvement. No navigation method, no matter how sophisticated, can succeed with a broken compass.
5. What actually IS possible is to solve parts of the cipher.
I’ll have to think about this because I don’t quite follow it on first read. It seems hard to understand how using less of the cipher could make things easier. I think maybe you are proposing some kind of multi-stage algorithm that optimizes to a subspace of solutions that fit one line, then constrains another phase of search to only that subspace when considering the rest of the cipher. Something like that? Does a technique like this work on Z408?
Not sure if the method works at all, however it’s the only way to reduce the cipher’s complexity. Parts with too many non-frequent homophones (such as the first two lines of the 340) can be left out while stronger parts of the cipher get into focus. This is clearly an advantage. Often the selected strings contain too many non-identifiable homophones..with non-identifiable I mean such homophones that can’t be tied to any n-gram. Those might be frequent (implying they’d represent a frequent letter, too), but in most cases you have to replace them with any letter from A-Z. Then again, the 408 has two repeating 4-grams – however both appear to be rather outliers than some of the most frequent ones (‘ILLI’ and ‘METH’). Too many different bigrams, too, meaning it’d take a broader range of bigrams to pre-define. It should work on the long run, but I doubt that it’d be easy.
With the 340 it’s slightly different: We there have two overlapping trigrams (IoF and FBc). This, imo, is an indication that the IoFBc part is in fact a potentially reapeating 5-gram. It doesn’t matter if it actually occurs twice or more often in the cipher. What does matter, however, is that the two trigrams, IoF and FBc, actually do show up at least twice. Therefore one may assume that we do have two frequent trigrams that – if combined – represent a 5-gram which wouldn’t be rare either. An example…’QZU’ and ‘ZTW’ are non-frequent trigrams. You can’t combine them in a way IoFBc does. ‘QZU’ and ‘UTW’, however, can be combined to ‘QZUTW’.
However both of those trigrams aren’t frequent, thus rather wouldn’t show up twice (or more often) in the cipher. Therefore it is required to use two frequent trigrams, such as ‘THE’ and ‘AND’. Those, again, can’t be combined..now finally we choose ‘THE’ and ‘ENT’: Those frequent (!) trigrams can indeed be combined to the 5-gram ‘THENT’. Now if you take a list of the most frequent 5-grams, you soon will find out that it’s partial trigrams might either be frequent or non-frequent. Most are non-frequent, thus only some 5-grams which contain two frequent trigrams remain…with those we can start. This advantage, although easier to solve in general, does not exist in the 408.
To find words in such selected strings one should at least choose twice the length of the word someone is looking for. Therefore, if you search for a word of length >5 letters, the string should at least be of length >=12 letters. The 5-gram only covers parts of it..with the distinctive + symbol we still have only half of it. The rest of the string must be set-up with partial or complete frequent bigrams, trigrams or double letters. The sequence I currently look at actually has the following structure:
S-S-P-F-A-A-A-A-A-F-Z-F-S
with
S: Second letter of a repeating bigram
P: Plus symbol
F: First letter of a repeating bigram
A: Part of the 5-gram (IoFBc)
Z: Unknown letter, A-Z
Therefore, except one homophone (Z), the complete 13-letter string can be set-up either by the 5-gram or other repeating bigrams. By choosing
a.) a plus symbol
b.) a 5-gram consisting of two frequent trigrams
c.) a range of frequent (repeating) bigrams
d.) A-Z for one letter
the computation process can start…searching for words leads to results as shown above.
QT
*ZODIACHRONOLOGY*
To add to the (possibly spurious) attraction is the appearance of "by" in some direction in each quadrant. Of course you have to consider both variants of "y" as interchangeable for this to be valid. I have been thinking about how to test for the expectation of this "by" coincidence. I see there has been a lot of good work here testing ideas like this against expectation. I am not entirely sure how to design such a test yet, though.
I’ve always been curious about that, so I ran a quick test by shuffling Z340 1,000,000 times and counting how many times the variations of "BY" and "YB" appear in the horizontal reading direction. For Z340, they appear 4 times. Here is the breakdown of counts in random shuffles:
0=439902
1=374363
2=145370
3=34276
4=5380
5=651
6=54
7=3
8=1
The average is 0.793, and standard deviation is 0.86, which puts Z340’s measurement at 3.7 standard deviations from the mean.
Based on that, there is only about a 1 in 164 chance of at least 4 "BY" variations appearing horizontally completely by chance. Seems a little significant, but keep in mind that we haven’t tested for all the other short words that could have also appeared with similar odds.
Putting it another way: The chance of "BY" appearing 4 times is low. The chance of "OF" appearing 4 times is probably also low. But, I suspect the chance of *any* short word appearing 4 times is probably rather high. This goes back to the golf ball analogy: The chance that a specific blade of grass is hit by the ball is extremely low. But the chance of *any* blade of grass getting hit is very high.
Just for fun, here’s an example shuffle that produced 8 occurrences of "BY":
Putting it another way: The chance of "BY" appearing 4 times is low. The chance of "OF" appearing 4 times is probably also low. But, I suspect the chance of *any* short word appearing 4 times is probably rather high
Right. But the hypothesis that "by" is special and not interchangeable with any other two character sequence comes from the Halloween card "by knife, by gun, by rope, by fire".
Why does it make any sense to interpret the ciphertext literally? Maybe it doesn’t, but the appearance of the "signature" may suggest that there are other literals within the ciphertext as well.
Do AZDecrypt or zkdecrypto have the ability to score by both rows and columns (and maybe diagonals)? On the hypothesis that the cryptogram is laid out more like a crossword or word search, than just reading left to right and down.
I highly doubt this is a novel thought either.
A long time ago I tried scoring the cipher in multiple directions at the same time in an attempt to solve a dense word search cipher but it did not work. AZdecrypt has a transposition solver that can tackle a variety of unkeyed transposition schemes without knowing what it is. If a cipher is for example transposed diagonally in a 14 by 25 grid then the transposition solver will find it.
3. Personally I don’t believe in any other encryption method than homophone substitution. Shorter text, more homophones plus throwing aboard any strict sequences when using the homophones already makes the 340 way harder than the 408. As an example, there is no repeating 4-gram present in the 340 (compared to the 408). Even if you calculate the overall variations of both ciphers, you’ll end up in some 90-digit or more figures with huge, huge difference of difficulty levels between those two ciphers (e.g. one has a 50 digit longer amount of variations than the other). This is, imo, why the 340 cannot be cracked by any standard homophone shuffling software (no matter how good it is..).
If the 340 is like the 408 then it should have been solved a long time ago. The higher difficulty (multiplicity) of the 340 versus the 408 is trivial nowadays.
Based on that, there is only about a 1 in 164 chance of at least 4 "BY" variations appearing horizontally completely by chance. Seems a little significant, but keep in mind that we haven’t tested for all the other short words that could have also appeared with similar odds.
I was also wondering about this and arrived at the same conclusion with a thought experiment. Thanks for doing the math!
Right. But the hypothesis that "by" is special and not interchangeable with any other two character sequence comes from the Halloween card "by knife, by gun, by rope, by fire".
True. But we have to be careful about "back fitting" the observation to Zodiac materials. For instance, if it is easy to generate short words from random cipher text, then it may also be possible to argue a link between those words and other aspects of the Zodiac’s correspondences or crimes. "RH", for example, being a signature of sorts on the Riverside desktop poem, could be found and we could argue a link. It’s hard to measure how significant these kinds of observations really are.
Why does it make any sense to interpret the ciphertext literally? Maybe it doesn’t, but the appearance of the "signature" may suggest that there are other literals within the ciphertext as well.
Yeah, it’s a big unknown. He makes it especially tantalizing with how imperfect the signature is. If it was spelled exactly right, there’d be much less ambiguity about his intent to put it there. Maybe it’s all just filler to keep everyone guessing.
True. But we have to be careful about "back fitting" the observation to Zodiac materials.
I agree. At this point it is a pretty thin thread. There would need to be some additional evidence that it’s meaningful to give it more attention.
Nice work on the shuffling statistics!