Zodiac Discussion Forum

Notifications

Clear all

autocomplete "button press" approach

Zodiac Cipher Mailings & Discussion

Last Post by anderson110 6 years ago

3 Posts

3 Users

0 Reactions

1,872 Views

RSS

ellispierce

(@ellispierce)

Posts: 1

New Member

Topic starter

Hi all.
I had a thought on homophonic deciphering, and joined the forum in part to get people’s thoughts on it. Thanks in advance for reading and considering.

The ability to automatically solve a homophonic cipher seems like a large gap which needs to be filled in efforts with z340. As far as I’m aware, the hill-climbing algorithm described in http://www.cs.sjsu.edu/faculty/stamp/RUA/homophonic.pdf is currently the best algorithm in this direction, but this appears to be generally unsuccessful with shorter ciphers. As such, the analysis I have seen (such as period 19 bigrams) is based around statistical data with no way of guaranteeing if the resulting cipher actually gives rise to a plausible message or not.

I wondered if, for shorter ciphers, one could reasonably pursue an approach based on autocomplete? The idea is to evaluate a solution based on the number of button presses one would need to type the purported plaintext – one per letter, one to confirm the word guessed by autocomplete. So "I" takes 1 button press for the start of a message (confirm), "like" then takes 3 (L-I-confirm), "killing" then takes 4 (K-I-L–confirm) and "people" then takes 2 (P-confirm). This is 10 button presses for the first 4 words of the 408, and this is from what I can tell already the fewest button presses one can get to the respective character with while matching the pattern. Greedy approaches quickly run into trouble ("I am a man jailed did just" takes 12 – confirm confirm confirm M confirm J confirm D I confirm J confirm. "Enjoy our new year error" takes 11 – E confirm O confirm N confirm Y confirm E R confirm). We thus have a way of evaluating a heavily truncated portion of the ciphertext, with a correct answer being possibly indicated by the lowest value for this test just 4 words in. It is possible I’ve missed some lower-scoring alternatives as I’ve not coded this up, but it seems certain to me that by the time we reach "so much" the correct answer will be the unique best solution under this metric.

The question remains of how to find this solution, and there are a few things which might help here. Firstly, the brief testing I did on this was just on my phone, with knowledge of my way of writing. We have access to some amount of zodiac’s writing which we can use instead, including a number of misspellings. By brute force, to find the furthest character which can be reached in 10 button presses would require 27^10 checks which is inaccessible. We could prune the search space at each confirm, though. While reaching the 12th character in 8 button presses is not the best, it is not far off. Finally, we only have to check button presses which authentically lead to words. The Anatree or DAWG data structures would allow for a very quick calculation of which button presses are even possible, and it is unlikely to be all 27 apart from at the start of a word. For the first 25 characters this seems to bring the search space down to less than 1billion which is perfectly manageable.

Is it worth me pursuing this or have I missed a problem which makes this unfeasible? Any comments welcome.

Posted : September 8, 2019 12:37 am

Jarlve

(@jarlve)

Posts: 2547

Famed Member

The ability to automatically solve a homophonic cipher seems like a large gap which needs to be filled in efforts with z340. As far as I’m aware, the hill-climbing algorithm described in http://www.cs.sjsu.edu/faculty/stamp/RUA/homophonic.pdf is currently the best algorithm in this direction, but this appears to be generally unsuccessful with shorter ciphers. As such, the analysis I have seen (such as period 19 bigrams) is based around statistical data with no way of guaranteeing if the resulting cipher actually gives rise to a plausible message or not.

The reason why the algorithm is unsuccessful with shorter ciphers is because it uses 2-gram frequencies as intelligence. There are far better algorithms, glurk’s zkdecrypto was able to solve 340 like ciphers as early as 2007-2010 and it uses up to 5-grams. From 2014 I started developing AZdecrypt and it is steadily getting recognition as being the best solver out there with support for up to 8-grams. The higher the n-grams the shorter ciphers it can solve and something like a 6 line 408 can be solved. There is also Largo’s solver (The Raccoon) which exists within his program Peek-a-boo. And project Zenith from beldenge. There is also CrypTool and others (sorry if I forgot someone’s project). All capable solvers that should be able to solve the 340 if it was a cipher like the 408. doranchak maintains a list at: http://zodiackillerciphers.com/wiki/ind … ware_Tools

I wondered if, for shorter ciphers, one could reasonably pursue an approach based on autocomplete?

Your autocomplete idea is very original. I wonder if it is possible to tap into Google’s autocomplete to achieve this (backed up by a hill-climber). Filling in the cipher from left-to-right may be a problem with the 340 as it does not have many repeating symbols at the beginning, meaning that there would be allot of fittable sentences to start with. You would rate the solution based on how well the autocomplete is doing. Instead of going from left-to-right you could assign a random key to the whole cipher and let the autocomplete score it (somehow, perhaps in sections of some length) and then hill-climb the key. A problem may be the lack of spaces, suppose the sentence "Do I or do I not" is recognized by Google autocomplete but strung together "doiordoinot" is not.

For the first 25 characters this seems to bring the search space down to less than 1billion which is perfectly manageable.

The first 25 characters of the 340 have only 1 repeating symbol so you could fit in almost any sentence you want. Unless you can go back in time and spy on Zodiac (or simulate) there are certain limitations that we need to take into account.

Is it worth me pursuing this or have I missed a problem which makes this unfeasible? Any comments welcome.

I think it could be a very interesting project and if it works it may be a very powerful solver that scales with the database you tap it into. For example if you could tap into Google’s autocomplete somehow that may be insane as they have massive data sets. Anyway, just go for it. Developing a solver is great fun!

AZdecrypt

Posted : September 8, 2019 10:14 am

anderson110

(@anderson110)

Posts: 55

Trusted Member

An interesting idea. The metric would seem to have value when the plaintext is close to legible words. But does it provide any gradient in the right direction when the plaintext isn’t very close to a solution yet?

I can imagine this augmenting an n-gram approach, but probably not replacing it.

Posted : September 11, 2019 5:52 am

92 Forums
5,255 Topics
87.7 K Posts
11 Online
5,626 Members

Forum Icons: Forum contains no unread posts Forum contains unread posts

Topic Icons: Not Replied Replied Active Hot Sticky Unapproved Solved Private Closed