Zodiac Discussion Forum

Canonical Set of English Alphabet Transformations?

(@vexr408)
Posts: 2
New Member
Topic starter
 

Long time listener, first time caller :lol: I’m asking a question which I’ve not seen discussed elsewhere. (If this has already been touched upon, my apologies, and please redirect me as appropriate.) I will also probably over-explain things, erring on the side of saying “too much” rather than leaving something out.

If someone wanted to create a cipher, they could invent an unlimited number of their own unique symbols; but we already have the handy (well-known and fairly distinct) 26 symbols of the English alphabet to start with, which we can easily modify to get more by mirroring and/or flipping the standard characters. Some of those letters (AHMOTU… -SZ) can’t be mirrored because they are either left-right symmetrical or they transform into other existing letters. More letters (BCDEHIK… – MW) can’t be flipped because they are likewise top-bottom symmetrical, or again turn into other letters. A minor addition is that some letters (for instance Q and J) don’t seem well suited to mirroring and flipping, because what they produce isn’t robustly distinct and could easily be confused or cause errors while enciphering or deciphering.

Does there exist some canonical set or sets of symbols composed of just normal and mirrored (and/or flipped) English capital letters as part of a published and distributed cipher scheme or schemes? I understand we could generate one brute-force style by just doing mirrors and flips on all 26 letters and then crossing off the ones that are duplicates, transformations into other letters (SZ, MW), or too indistinct or otherwise faulty. But has this exercise already been done by some reputable organization (ideally one whose work was available during Z’s timeframe, but I’d be interested even if it wasn’t)?
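To make the brute-force idea concrete, here is a minimal sketch of the crossing-off exercise. The symmetry tables are my own assumptions for plain sans-serif capitals (a different typeface or handwriting would change them), and the `mirrored-`/`flipped-` labels are just placeholder names, not symbols from any published scheme.

```python
import string

# ASSUMPTION: symmetry sets for plain sans-serif capitals, filled in by eye.
MIRROR_SAME = set("AHIMOTUVWXY")        # unchanged by a left-right mirror
FLIP_SAME = set("BCDEHIKOX")            # unchanged by a top-bottom flip
MIRROR_TO_OTHER = {"S": "Z", "Z": "S"}  # mirror looks like another letter
FLIP_TO_OTHER = {"M": "W", "W": "M"}    # flip looks like another letter
ROT180_SAME = set("HINOSXZ")            # flip gives the same shape as mirror
INDISTINCT = set("QJ")                  # judged too easy to confuse

def candidate_symbols():
    """Return the 26 capitals plus every mirror/flip that survives crossing-off."""
    symbols = list(string.ascii_uppercase)
    for ch in string.ascii_uppercase:
        if ch in INDISTINCT:
            continue
        if ch not in MIRROR_SAME and ch not in MIRROR_TO_OTHER:
            symbols.append(f"mirrored-{ch}")
        # Skip flips that are unchanged, become another letter, or merely
        # duplicate the mirrored form (letters with 180-degree symmetry).
        if (ch not in FLIP_SAME and ch not in FLIP_TO_OTHER
                and ch not in ROT180_SAME):
            symbols.append(f"flipped-{ch}")
    return symbols

if __name__ == "__main__":
    result = candidate_symbols()
    print(len(result), "candidate symbols")
    print(result[26:])
```

This does not consider mirror-plus-flip combinations (180-degree rotations), which would add a few more candidates for letters such as F, G, L, P and R.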

The reason I ask is: it’s been noted that there’s no C in the 408 or Q in the 340. A common rationale for that is that they may have been assigned to low-frequency letters (QJX) which weren’t in the cleartext and as such were never transcribed into the ciphertext. My question is: what if there was a backwards B in the 408 and/or a backwards R in the 340 as part of the predetermined key, but they were also assigned to low-frequency letters not present in the cleartext? And can that supposition tell us anything at all useful about possible substitutions or the like?

Sure, create a few more unique symbols for the 340 to increase the homophonic ranges for the high-frequency letters (and hopefully make cracking it less easy); but why ignore symbols you already know and had used (like backwards R, E, or /) without an obvious reason? Please let me know your thoughts on my speculations; and thank you in advance for any help or guidance you can point my way.

P.S. How many of the analysis, transformation, and deciphering attempts on the 408/340 have “corrected” the minor misspellings and/or encoding/transcription errors? I’ve found almost none, and wonder if they might provide any additional insights or patterns to leverage.

P.P.S. My personal feeling is that the over-abundant + symbols DO work as some sort of wildcard scheme. It’s also one of the few ‘roll-your-own’ additions we can hope to test/solve. [Whereas if the solution requires moving column 4 to the front and 7 to the end, or some other odd one-off, there are too many possibilities, too thinly spread, to solve by any sort of “natural intuition”, and far too many combinations and schemes to ever adequately cover through analysis or brute force.] Thanks again; great site and people.

 
Posted : July 4, 2020 3:31 am
Jarlve
(@jarlve)
Posts: 2547
Famed Member
 

Welcome aboard VEXR408,

What is a canonical set of symbols?

The reason I ask is: it’s been noted that there’s no C in the 408 or Q in the 340. A common rationale for that is that they may have been assigned to low-frequency letters (QJX) which weren’t in the cleartext and as such were never transcribed into the ciphertext. My question is: what if there was a backwards B in the 408 and/or a backwards R in the 340 as part of the predetermined key, but they were also assigned to low-frequency letters not present in the cleartext? And can that supposition tell us anything at all useful about possible substitutions or the like?

I am glad you brought this up. I wonder if we can prove this "common rationale" you speak of. If so we may be one step closer to "proving" that the Z340 at least seems to have a real key.

Sure, create a few more unique symbols for the 340 to increase the homophonic ranges for the high-frequency letters (and hopefully make cracking it less easy); but why ignore symbols you already know and had used (like backwards R, E, or /) without an obvious reason? Please let me know your thoughts on my speculations; and thank you in advance for any help or guidance you can point my way.

Yep, good thinking. I like this angle.

P.S. How many of the analysis, transformation, and deciphering attempts on the 408/340 have “corrected” the minor misspellings and/or encoding/transcription errors? I’ve found almost none, and wonder if they might provide any additional insights or patterns to leverage.

It seems a trivial exercise for the Z408 and an unknown for the Z340. Please elaborate.

P.P.S. My personal feeling is that the over-abundant + symbols DO work as some sort of wildcard scheme. It’s also one of the few ‘roll-your-own’ additions we can hope to test/solve.

A lot of work has been done in this direction and it didn’t yield anything. Since the highest-frequency symbol in the Z408 is a 1:1 substitute, it is very plausible that the same is true for the Z340 (which makes sense with a predetermined key). One of the principles of branch prediction in CPU code is to simply remember which way it went the last time; supposedly that works 85% of the time. In the same way, the most likely and simple explanation for the "+" symbol is that it is a 1:1 substitute.

AZdecrypt

 
Posted : July 4, 2020 10:42 am
(@vexr408)
Posts: 2
New Member
Topic starter
 

Canonical set of symbols

Does there exist a published or widely used enciphering guide that shows a set of symbols with any strong correlation to the ones Z chose to use? Learning where/what Z learned enciphering from could hold lots of clues for both his identity and deciphering the codes. It would be great if we found that the Army field guide of 1953 had a scheme and symbol set identical to Z’s; but it would probably have to be older or more obscure than that (as most ciphering had moved to electronic communication methods by then). I wasn’t sure to what level anyone had researched that area, or whether there were any strong candidates for where the symbol set came from and/or where he learned his coding methods.

Spelling/Coding Errors

What I was trying to ask is: to what extent have the various possible logical/textual errors been accounted for in the deciphering attempts? I can see errors creeping in from 3 main sources.

1) Supposedly, early published copies of the 340 had some unintended differences from the actual Z text, due to copying mistakes and/or unclear photos of the original document.
– I feel those have probably been caught and corrected at this late date, but you never know.
2) Z may have made errors while encoding the 340 (this letter should encode to R, but I accidentally wrote P).
– We already have at least 1 example of an encoding error (the cross-out replaced by K). There certainly could be others, for multiple reasons.
3) Z seems to have made an abundance of spelling errors (either intentionally or unintentionally), which could make for some additional/unexpected challenges in deciphering without the key.
– Simple spelling mistakes would cause errors much like #2 above, but there are others that wouldn’t. If I wanted to say “HAPPY” but wrote “HAPY” as my cleartext, that would make the encoded text all the more challenging to reverse-engineer (missing character, broken double-letter pattern, etc.).
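To illustrate the “broken double-letter pattern” point, here is a small sketch that reduces a word to its letter-repeat pattern. The helper name `word_pattern` is just something made up for this post. Note that this describes the cleartext only: with homophonic substitution the two P’s could be written with different symbols anyway.

```python
def word_pattern(word):
    """Number each distinct letter in order of first appearance."""
    first_seen = {}
    pattern = []
    for ch in word:
        if ch not in first_seen:
            first_seen[ch] = len(first_seen)
        pattern.append(first_seen[ch])
    return tuple(pattern)

if __name__ == "__main__":
    print(word_pattern("HAPPY"))  # the repeated index marks the double letter
    print(word_pattern("HAPY"))   # misspelling: no repeat and one character short
```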

Have any of these (or any other aberrations that would make the published cipher less than “perfect”) been taken into account in the various decryption methods and attempts?

‘+’ Wildcard / Roll-your-own

It’s my supposition that Z intended to shed additional light on the code (and/or send in more, larger coded documents), but didn’t get a chance to. Even he could not have believed the 2 small coded messages could ever be solved without hints or the key, since something SO short can be made to say almost anything.

If Z added some half-thought-out or completely unique step to his encoding process (swap these 3 columns and then these 5 rows), then there exist so many “odd” things he could have done, and so many combinations of each one, that it would become effectively impossible to reverse-engineer. What kinds of minor additions were known/popular in Z’s time, and how far can we go down those (and others) before they become too computationally taxing?

 
Posted : July 10, 2020 6:17 am
Jarlve
(@jarlve)
Posts: 2547
Famed Member
 

Does there exist a published or widely used enciphering guide that shows a set of symbols with any strong correlation to the ones Z chose to use? Learning where/what Z learned enciphering from could hold lots of clues for both his identity and deciphering the codes. It would be great if we found that the Army field guide of 1953 had a scheme and symbol set identical to Z’s; but it would probably have to be older or more obscure than that (as most ciphering had moved to electronic communication methods by then). I wasn’t sure to what level anyone had researched that area, or whether there were any strong candidates for where the symbol set came from and/or where he learned his coding methods.

Graysmith found a coding book which had some matches, but it is not convincing imo. In the history of homophonic substitution, variations of letters have been used: mirrored letters, upside-down letters, small/big letters.

As to his coding methods, he must at least have invested some time in the matter. Using homophonic substitution was a very good choice, both in presentation and cryptographic properties. A high-multiplicity cipher can still easily be unbreakable today, as it essentially graduates towards becoming a one-time pad. He cycled his homophones, a technique for which there exist few references in the literature.
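For readers new to the terms, here is a minimal sketch of homophonic substitution with cycled homophones, plus multiplicity taken as unique symbols divided by cipher length. The key `TOY_KEY` is a made-up toy example and not the Z408 key; the function names are likewise only for illustration.

```python
from itertools import cycle

# HYPOTHETICAL toy key: each plaintext letter has one or more homophones.
TOY_KEY = {
    "E": ["1", "2", "3"],
    "L": ["4", "5"],
    "K": ["6"],
    "I": ["7", "8"],
}

def encipher(plaintext, key):
    """Encipher by stepping through each letter's homophones in fixed order."""
    cyclers = {letter: cycle(symbols) for letter, symbols in key.items()}
    return "".join(next(cyclers[ch]) for ch in plaintext)

def multiplicity(ciphertext):
    """Unique symbols divided by length; higher means less repetition to exploit."""
    return len(set(ciphertext)) / len(ciphertext)

if __name__ == "__main__":
    ct = encipher("ILIKEKILLEE", TOY_KEY)
    print(ct, round(multiplicity(ct), 4))
```

As multiplicity approaches 1, every symbol is used only once and the cipher gives a solver nothing to count, which is the sense in which it drifts towards a one-time pad.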

Have any of these (or any other aberrations that would make the published cipher less than “perfect”) been taken into account in the various decryption methods and attempts?

We have looked into this. The Z408 contains its share of errors and is very easily solved. As an experiment I’ve added 41 (10%) more random errors on top of that:

9%#/Z/UB%eOR=pX=B
WV+eGYF69HP@K!/Ye
MJY^UIB7qTtNQYD5)
S(/9#B/OPAU%fRlqE
k^LMRJdrpFHVWe8Y
@+MGD9KI)6qX85zS(
RNtIYElO8qGBTQS#B
Ld/P#B@MIEJeU^9Rk
cWKqpI)Wq!85LMr9#
VPDR+j=6N(eEUHk+
ZcpOVWI5+tL)l^R6H
I)DR_TUrRe/@XJQK
P5M8RUt%L)NVEKH=G
rI!Jk598LMlNA)Z(P
zUpkA9#B)W+VTtOP
(=SrlfUe%7DzG)%IP
NB)ScE/9%%ZfAP#BV
peXqWL_F#Ec+@F69B
%OT5RUc+_dYq_^SqW
VqeGYKE_TYA9%BLt_
H!FBX9zXADd7L==8
_P%##6e5PORXQF%GA
Z@JTtq_8JI+BBqQW6
VEXr9WI6DEHM)=ULk

Which solves into:

Score: 18633.80 IOC: 0.0662 Multiplicity: 0.1323 Seconds: 0.81
Repeats: THEMOAT BECAUSE CAUSE EKILL EYOU WILL (2) ILLK KILL
PT-to-CT cycles: 4568

ILLKEKILLCNGPEOPL
EBECAUSEITISSOKUC
HFUNITLEMOREFUNTH
ANKILLKNIWILDGAME
INTHGFORRESTBECAU
SEHANISTHEMOATDAN
GERTUEANAMALOFALL
TOKILLSHTEFCINIGI
VESMETHEMOATTHRIL
BINGEXPERENCEITIE
EVENBETTERTHANGET
THNGYOIRRGCKSOFFS
ITHAGIRLTHEBESTPA
RTOFITIATHAEWHENI
DIEIWILLHEREBORNI
NPARADICLENDAHLTI
ELHAVEKILLEDWILLB
ECOMETYSLEVESSEIL
LNOTGIVEYOUMYNAME
BMCAUSEYOUWILLTRY
TOSLOIDOWNORETPPA
YILLLECTINGOFSLAW
ESFORMYAFTELLMFEE
BEORIETENETHHPITI

If you go over 10% errors then the cipher can become unsolvable with our current tools, though I do think that in that case the "errors" were intentional. In the same way, nulls can also complicate decryption from around the 15% mark. Both "many intentional errors" and "many nulls" are good hypotheses.
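An experiment like the one above can be sketched as follows. This is not AZdecrypt code; `add_random_errors` is a hypothetical helper that assumes the cipher is held as one flat string (line breaks removed) and simply overwrites randomly chosen positions with a different symbol from the cipher’s own alphabet.

```python
import random

def add_random_errors(ciphertext, fraction, seed=None):
    """Replace round(len * fraction) random positions with a different symbol."""
    rng = random.Random(seed)
    symbols = sorted(set(ciphertext))
    if len(symbols) < 2:
        raise ValueError("need at least two distinct symbols to create errors")
    chars = list(ciphertext)
    n_errors = round(len(chars) * fraction)
    # sample() picks distinct positions, so every error lands on a new spot.
    for pos in rng.sample(range(len(chars)), n_errors):
        chars[pos] = rng.choice([s for s in symbols if s != chars[pos]])
    return "".join(chars)

if __name__ == "__main__":
    clean = "ABCD" * 102  # stand-in text of 408 characters
    noisy = add_random_errors(clean, 0.10, seed=1)
    print(sum(a != b for a, b in zip(clean, noisy)), "positions changed")
```

Feeding the noisy text to a solver at increasing error fractions is one way to find the point where a given tool stops recovering the plaintext.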

If Z added some half-thought-out or completely unique step to his encoding process (swap these 3 columns and then these 5 rows), then there exist so many “odd” things he could have done, and so many combinations of each one, that it would become effectively impossible to reverse-engineer. What kinds of minor additions were known/popular in Z’s time, and how far can we go down those (and others) before they become too computationally taxing?

Almost every "scheme" can be taken to the point where it becomes too computationally expensive. AZdecrypt can do millions of decryption attempts (for transposition etc.) per day, though that is not billions.
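Some rough arithmetic shows how quickly this happens. The sketch below assumes a 17-column grid and uses one million attempts per day purely as a stand-in for "millions"; it is not a measured throughput figure.

```python
import math

COLUMNS = 17                  # width of the cipher grid
ATTEMPTS_PER_DAY = 1_000_000  # ASSUMPTION: placeholder for "millions per day"

def days_to_exhaust(candidates, attempts_per_day=ATTEMPTS_PER_DAY):
    """Days needed to try every candidate once at the given rate."""
    return candidates / attempts_per_day

pair_swaps = math.comb(COLUMNS, 2)     # swap exactly one pair of columns
column_orders = math.factorial(COLUMNS)  # every possible column ordering

if __name__ == "__main__":
    print(f"single column swaps: {pair_swaps} -> "
          f"{days_to_exhaust(pair_swaps):.6f} days")
    print(f"all column orders:   {column_orders} -> "
          f"{days_to_exhaust(column_orders) / 365:.0f} years")
```

A single pair swap is trivially searchable, while trying every column order is out of reach at this rate, so the practical question is how far between those two extremes a given scheme sits.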

AZdecrypt

 
Posted : July 10, 2020 11:29 am