Zodiac Discussion Forum

My work

doranchak
(@doranchak)
Posts: 2614
Member Admin
 

I’m working on 6-gram support for AZdecrypt. I have used the Usenet corpus (thanks to daikon for making me aware of it). After cleaning up the corpus it was still 23.2 GB, and the total size of the 6-grams is almost 500 MB! Does anyone happen to know an even bigger corpus?

The Google Ngrams data sets are quite large: http://storage.googleapis.com/books/ngr … etsv2.html

Just for fun I added up the file sizes for all the 5-grams for American English (Version 20120701): 187.2 GB

I look forward to your 6-grams update!

http://zodiackillerciphers.com

 
Posted : August 17, 2015 1:14 pm
Quicktrader
(@quicktrader)
Posts: 2598
Famed Member
 

The possibility to define certain symbols as e.g. L, T or E would be very, very, very cool…

Excellent work, btw.

QT

*ZODIACHRONOLOGY*

 
Posted : August 17, 2015 1:36 pm
Jarlve
(@jarlve)
Posts: 2547
Famed Member
Topic starter
 

Thanks all,

Do you think that using 6-grams will be of benefit?

Early testing indicates that they are of benefit, but there are indeed diminishing returns, which was anticipated. The way I test is by removing rows from ciphers: if 6-grams mean I can remove one more row from a cipher and still get a solve where 5-grams previously failed, I’ll gladly take it! It’s all about pushing small improvements. So I believe that n-grams that fit in memory are worth taking. I guess you are suggesting that I start looking at word-level n-grams, and that is certainly something I should start to consider.

Just for fun I added up the file sizes for all the 5-grams for American English (Version 20120701): 187.2 GB

I saw that too. Do you or glurk have any ideas on how to implement word-level n-grams in a solver?

The possibility to define certain symbols as e.g. L, T or E would be very, very, very cool…

You mean locking certain symbols to letters like in ZKDecrypto? I agree. I could push an updated version of Examine which has this functionality but it is not very user friendly (at first) and I can’t afford/offer (time-wise) much support.

AZdecrypt

 
Posted : August 17, 2015 1:57 pm
doranchak
(@doranchak)
Posts: 2614
Member Admin
 

Just for fun I added up the file sizes for all the 5-grams for American English (Version 20120701): 187.2 GB

I saw that too. Do you or glurk have any ideas on how to implement word-level n-grams in a solver?

Funny – I was just chatting with glurk about this. :)

I think there is some benefit to dictionary attacks with long n-gram sequences if the constraints imposed by the relevant sequence of cipher text are large enough to weed out all the spurious candidate texts. I.e., a snippet of cipher text that has enough repeating symbols will exclude many possibilities.

In an unlimited memory/speed fantasy world, I would index all those ngrams such that they could be queried by those constraints. Then it becomes possible to consider sets of constraints (groups of cipher text snippets that each have strong constraints). I would try to search for a set of small-ish cipher text snippets that maximizes the number of symbols shared between snippets. This significantly prunes the search space since plaintext under consideration for one snippet would inform the choices for other snippets.

I’m thinking of Edwin Olson’s dictionary attacks for short cryptograms, but expanding the idea to handle longer n-gram sequences, and to handle substring searches: https://april.eecs.umich.edu/pdfs/olson2007crypt.pdf
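
The constraint idea above can be sketched in a few lines. This is only an illustration, not Olson's algorithm or anyone's solver code: the function names and the toy n-gram list are made up, and it assumes a homophonic cipher, where a repeated cipher symbol must always decode to the same letter but two different symbols may share a letter.

```python
def consistent(cipher, plain):
    """True if every cipher symbol decodes to one letter throughout."""
    if len(cipher) != len(plain):
        return False
    mapping = {}
    for c, p in zip(cipher, plain):
        # setdefault records the first letter seen for this symbol.
        if mapping.setdefault(c, p) != p:
            return False
    return True


def repeat_count(cipher):
    """Number of positions that reuse an earlier symbol (constraint strength)."""
    return len(cipher) - len(set(cipher))


def candidates_for(cipher, ngrams):
    """Keep only the plaintext candidates the snippet's repeats allow."""
    return [g for g in ngrams if consistent(cipher, g)]


# Toy data (hypothetical): symbol "A" repeats at positions 0 and 3.
toy_ngrams = ["THATIS", "LITTLE", "PEOPLE", "KILLED"]
print(candidates_for("ABCADE", toy_ngrams))
```

A snippet with more repeats (a higher `repeat_count`) rejects more candidates, which is the pruning effect described above. For a real corpus the linear scan would have to be replaced by some kind of index.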

http://zodiackillerciphers.com

 
Posted : August 17, 2015 2:11 pm
glurk
(@glurk)
Posts: 756
Prominent Member
 

Jarlve-

I may be getting off-topic here, I should probably start a new thread about n-grams, but an idea I once had was negative scoring for longer n-grams that NEVER appear in any corpora. Especially with longer n-grams like 6-grams, things like "QSTVPK," which would likely never appear could be given a negative score.

This never got past the idea stage with me, since something like "ZZZZZZ" could appear, meaning sleeping, snoring, etc. And I have no idea how to determine negative scores for things that never actually occur in English.

It DID give me an idea, which I never fully developed but considered. The idea is "longest string of consonants." Basically, in English you will almost never find a string of letters longer than 7 that does not contain a vowel. There are very few words or phrases that will not have an A,E,I,O,U in a 7 letter section.

I never used it, but I still like the idea, and I think it might be a good addition to your solver.

-glurk

EDIT: In fact, in this post the only n-grams of length 7 or more that have no vowels are "QSTVPKwh" and "ZZZZZZc"

EDIT #2: I don’t think this could be used directly in a hillclimber / solver as part of the solving process. But it might be a good metric to weed out bad prospective solutions from good ones. The 408 plaintext fits this metric, as well as all of Zodiac’s known writings, as far as I can tell.
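
A minimal sketch of the "longest string of consonants" metric, assuming the vowels are exactly A, E, I, O, U (so Y counts as a consonant) and that spaces and punctuation are ignored. The function name is made up for illustration.

```python
VOWELS = set("AEIOU")


def longest_consonant_run(text):
    """Length of the longest run of letters containing no A, E, I, O or U."""
    longest = current = 0
    for ch in text.upper():
        if not ch.isalpha():
            continue  # spaces and punctuation do not break a run
        if ch in VOWELS:
            current = 0
        else:
            current += 1
            longest = max(longest, current)
    return longest


print(longest_consonant_run("ILIKEKILLINGPEOPLE"))
print(longest_consonant_run('"QSTVPK," which'))
```

A candidate solution could then be rejected, or just ranked lower, when the result is 7 or more.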

——————————–
I don’t believe in monsters.

 
Posted : August 17, 2015 2:44 pm
Jarlve
(@jarlve)
Posts: 2547
Famed Member
Topic starter
 

@Quicktrader,

I have pushed a new version of Examine that will let you lock letters like ZKDecrypto, see the last post in: viewtopic.php?f=81&t=2432 If you or anyone has any questions about it just shoot.

@doranchak,

I personally don’t give much value to how many repeats a piece of a cipher has, though I do find it important when considering forced, manual solutions that degrade quickly. The solver looks at the entire cipher. For example, in the following image I selected the first 8 symbols of the 340 and you can see how they "connect". I believe this is the real structure: roughly, the number of symbols involved with a change. Maybe I misunderstood your point; I’ll think about it for a while.

@glurk,

The negative score is interesting; I wonder if it might help with ciphers whose solution scores lower than the local maxima. I once tried to penalize less flat n-gram scores but it didn’t help. What do you think about an IoC for consonant/vowel frequencies? For instance, a double consonant on average appears 50 times in a 340 character cipher, a triple 11 times, etc. (made up values).
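
To get real numbers in place of the made-up ones, the consonant runs of any reference text can be tallied by length. A rough sketch, again treating only A, E, I, O, U as vowels; the function name is hypothetical.

```python
from collections import Counter
from itertools import groupby

VOWELS = set("AEIOU")


def consonant_run_histogram(text):
    """Count maximal consonant runs by length, e.g. {1: 3, 2: 2, 3: 1}."""
    letters = [ch for ch in text.upper() if ch.isalpha()]
    runs = Counter()
    # groupby splits the letters into alternating vowel/consonant stretches.
    for is_consonant, group in groupby(letters, key=lambda ch: ch not in VOWELS):
        if is_consonant:
            runs[len(list(group))] += 1
    return runs


print(consonant_run_histogram("ILIKEKILLINGPEOPLE"))
```

Run over a large corpus cut into 340-character pieces, the averages per length would give the expected counts that a candidate plaintext could be compared against.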

AZdecrypt

 
Posted : August 17, 2015 4:28 pm
glurk
(@glurk)
Posts: 756
Prominent Member
 

Jarlve-

I kind of gave up on the negative scoring idea, since I don’t know any way to determine how any n-gram is "less likely" than any other. I would have to suppose that all of them that never appear in English text are equally unlikely.

But I do think that a final scoring, by using vowel distribution, or "longest string of consonants" could be a valuable metric for sorting out prospective solutions. As I said – dealing with English text with all punctuation / spaces removed – one will rarely find a string of consonants longer than 7.

As far as I can tell, this applies to almost everything written in English. Even with misspellings.

-glurk

(In any case, I have NEVER seen a correct solution to ANY cipher in the English language that contains 7 or more consonants in a row. It isn’t impossible, but it seems very unlikely. Even the "final 18" of the 408 are vowel-rich.)

——————————–
I don’t believe in monsters.

 
Posted : August 17, 2015 4:52 pm
(@pinkphantom)
Posts: 556
Honorable Member
 

Jarlve – your Shiba avatar is almost as awesome as all of the hard work you have done. Again, I could never comprehend how you guys come up with all of these theorems, but they are really remarkable. :)

 
Posted : August 17, 2015 6:46 pm
Jarlve
(@jarlve)
Posts: 2547
Famed Member
Topic starter
 

But I do think that a final scoring, by using vowel distribution, or "longest string of consonants" could be a valuable metric for sorting out prospective solutions. As I said – dealing with English text with all punctuation / spaces removed – one will rarely find a string of consonants longer than 7.

Yes, but I think that metric is probably already taken care of by the n-grams. Because in my experience, the solver tries to come up with a distribution that promotes the best results for the entire cipher. Anyway, worth trying out.

your Shiba avatar is almost as awesome as all of your hard work done.

Hehe, I’m not so much aware of the whole doge thing but it’s much cool. 8-)

AZdecrypt

 
Posted : August 17, 2015 7:18 pm
daikon
(@daikon)
Posts: 179
Estimable Member
 

Do you think that using 6-grams will be of benefit? I’m not saying they won’t but there is a point of diminishing returns somewhere. I have not done the testing myself, but I’ve read up on it a lot, and many researchers believe (maybe a consensus) that 3 and 4-grams are optimal.

I can confirm from my own experience that using 6-grams definitely gives you an improvement for higher-multiplicity ciphers. Most research papers I’ve seen don’t concern themselves with higher-multiplicity ciphers, and for 1000+ character / 50 unique symbol ciphers 4-grams are perfectly adequate.

One thing to watch out for when moving to higher-order N-grams is what to do with "missing" entries. If I remember correctly, for 6-grams, over 90% of them will not be present in the corpus (depending on its size, of course). Usually a score of 0 is assigned to all such N-grams. That is generally not a problem for scoring the final solution, as a "missing" N-gram shouldn’t be present in the correct solution. However, it creates a problem for the hill-climber algorithm (or simulated annealing, etc.) while it’s converging on the solution, because the majority of the solution field will be very flat, with occasional "islands" with practically vertical cliffs around them here and there. The algorithm tends to get stuck on those islands, unable to venture far enough away from one island to get to the next, so you’ll need a lot of random restarts to end up on the correct "island".

I’ve bumped into this problem myself, and tried to research a good solution. If you’d like to have nightmares for a few nights :), I welcome you to try to read this brief overview, or the full paper, if you are brave enough. I couldn’t get through it, so I ended up coming up with my own simplified interpolation method. If anyone is interested, I can describe it. Basically, I used the lower-order N-grams to "fill in the gaps". But I encourage you to experiment with your own implementations, as you might come up with something better.
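
One possible way to "fill in the gaps" is sketched below. It is not the method described in this post (which is not spelled out here); it is a simple Markov-style estimate, P(abcdef) ≈ P(abcde) · P(bcdef) / P(bcde), applied recursively whenever an n-gram is missing. The function names, the toy corpus and the `floor` value are all assumptions for illustration.

```python
import math
from collections import Counter


def count_ngrams(corpus, max_n):
    """Log-probability tables for every n-gram order from 1 to max_n."""
    tables = {}
    for n in range(1, max_n + 1):
        counts = Counter(corpus[i:i + n] for i in range(len(corpus) - n + 1))
        total = sum(counts.values())
        tables[n] = {g: math.log(c / total) for g, c in counts.items()}
    return tables


def log_prob(gram, tables, floor=-15.0):
    """Table value if present, otherwise an estimate from shorter n-grams."""
    n = len(gram)
    if gram in tables[n]:
        return tables[n][gram]
    if n == 1:
        return floor  # letter never seen at all (arbitrary floor)
    head, tail, overlap = gram[:-1], gram[1:], gram[1:-1]
    estimate = log_prob(head, tables, floor) + log_prob(tail, tables, floor)
    if overlap:
        estimate -= log_prob(overlap, tables, floor)
    return estimate


# Toy corpus, far too small for real use.
toy_tables = count_ngrams("THEQUICKBROWNFOXJUMPSOVERTHELAZYDOG", 6)
print(log_prob("THELAZ", toy_tables))  # present in the table
print(log_prob("THEQUO", toy_tables))  # missing, estimated from shorter n-grams
```

Because missing 6-grams get graded estimates instead of one flat value, the score surface between the "islands" is no longer completely flat. The recursion is left unmemoized here for brevity; a solver would precompute the estimates into the table.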

 
Posted : August 28, 2015 2:07 am
Jarlve
(@jarlve)
Posts: 2547
Famed Member
Topic starter
 

Hey daikon,

Have you noted an improvement with your interpolation method? I have not experienced a problem with going from 5 to 6-grams, but perhaps it could work even better?

AZdecrypt

 
Posted : August 28, 2015 9:55 am
daikon
(@daikon)
Posts: 179
Estimable Member
 

Have you noted an improvement with your interpolation method? I have not experienced a problem with going from 5 to 6-grams, but perhaps it could work even better?

Not a huge one, but there was definitely an improvement (vs. assigning a flat 0 to all non-present 6-grams). Specifically, fewer restarts were required to get to the correct solution for harder ciphers. It’s practically required for 7-grams though, as otherwise they perform worse than 5-grams. However, 7-grams are much slower than 6-grams, as you can’t use an array lookup any more and have to do the indexing/binary search trick. That means the little improvement you get with 7-grams is wasted, as 6-grams can do a few restarts by the time 7-grams are finished with a single restart.
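
A back-of-the-envelope sketch of why the array lookup stops being practical at 7-grams, assuming a flat table over the 26 letters A-Z with the n-gram read as a base-26 number. The function names are made up, and the one-byte-per-entry figure is only an assumption to make the sizes concrete.

```python
ALPHABET_SIZE = 26


def ngram_index(gram):
    """Position of an uppercase A-Z n-gram in a flat table (base-26 number)."""
    index = 0
    for ch in gram:
        index = index * ALPHABET_SIZE + (ord(ch) - ord("A"))
    return index


def table_entries(n):
    """Number of slots a flat table needs for all n-grams of length n."""
    return ALPHABET_SIZE ** n


for n in (5, 6, 7):
    # Assuming one byte per entry, entries equal bytes.
    print(n, table_entries(n), round(table_entries(n) / 1e9, 2), "GB")
```

At one byte per entry, the 6-gram table needs about 0.31 GB, while the 7-gram table needs about 8 GB, which is where a sorted list with binary search comes in.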

 
Posted : August 28, 2015 10:47 pm
Jarlve
(@jarlve)
Posts: 2547
Famed Member
Topic starter
 

Download AZdecrypt 0.98.

It’s a great update!

– Powerful 6-gram solver module included. I’m not sure about the memory requirements but I believe 2 to 4GB of RAM may be in order to use the 6-gram module. (click on the module box to change between 5-grams and 6-grams)
– Now accepts ciphers in numeric or symbolic form. (check readme.txt under the Ciphers sub-directory for format examples)
– Both 32-bit and 64-bit executables included; the 64-bit version is quite a bit faster on my i7 but it requires a 64-bit operating system.
– Up to 10.000 characters, 400 distinct symbols and 400 characters per distinct symbol.
– Up to 20% faster and greatly increased loading speed of n-grams.
– New output style that is more informative (see jroberson example below).

Happy to report that the 6-gram module is able to solve the jroberson. I used 10.000.000 keys per iteration and was lucky enough to get this solve within 10 iterations. Is this the first automatic decipherment of the jroberson?

8-)

jrob.txt

Score:20342 Ioc:719 M:244 C:405 S:99

iliveeatingicec
reambecauseitis
sodelicusitismo
redeliciousthan
eatingshsrbenot
thefrozenfoodai
slebecauseicecr
eatisthemostdel
iciousgelatoofa
lltonatsomethin
gsweetgivestenv
emostpleasiasec
redenceitiseven
morepleasingtha
tslurringdownaw
eathsshavethebe
stpartoiitistha
nwheniurinateal
ltvicecreatihav
eeatenwillbedeb
ormaslemonaueiw
illnotgiveyouth
eaameofthgrocer
becauseyouwills
lowdownorstormh
eatingoficecrea
mhaanczrtestdia

au=é0vUF.W!:Iwd
E^zG*(d-«Xx<R#Z
4A)7u/9k1ay=’{H
l}YfJ.I:t~&DQ¨
@r8<V3X]1S*m`ic
Oe0B6p%v5NM|Ph#
Z>w*^do«4(/9xI+
7bga1[è}Gq’KLfT
=d.Ak&!@uUFHtBz
J>Ri5-yXp{m¬:_
3Zs0v8!<jw4g^`j
(GM1OCTxQ’#-17n
,};fW9@/[a&2jm¨
{|E0CuvrX=V!KDh
cZJ~l,.53)|?_os
w-y$41]bé^Fe(*x
’RCUSyq/:<&8èz
`?¬7W#«6/¨-O}Q>
T[jaIfd+@rg=Dhj
2moK0Vs.uJ*v;w*
AEGbX>^GH5Uk(:?
<Tu_tF!#jx»i~R]
7-z{}pNg$3lM9fS
*@I-«Z2»|ks/J>4
Tq?PAsWH61yt,G$
mQa¨!iB=d09+vr
{ezUWd%6[w&L#h
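
The header line of the jrob.txt output above lists C:405, S:99 and M:244. Assuming (this is not stated in the post) that C is the cipher length, S the number of distinct symbols and M the multiplicity S/C scaled by 1000, the figures can be checked with a short sketch. The function names are hypothetical.

```python
def multiplicity(distinct_symbols, length):
    """Ratio of distinct symbols to cipher length (assumed definition)."""
    return distinct_symbols / length


def multiplicity_of(cipher_text):
    """Same ratio computed directly from a cipher string."""
    return multiplicity(len(set(cipher_text)), len(cipher_text))


# Values read from the output header: S:99 and C:405 (27 rows of 15).
m = multiplicity(99, 405)
print(int(m * 1000))
```

Under that reading, 99 / 405 is about 0.244, which matches the M:244 in the header.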

The 6-gram solver is not just better for difficult ciphers; it also produces more accurate plaintext transcriptions for easy ciphers. See the difference between the following solves.

5-grams:

inordertofacilita
tetheanalysisofho
mophonicciphersol
versivewrittenasc
riptthattakestert
asinputandgenerat
esacisherofagiven
lengthusingagiven
numberofsymbolsth
escriptgencisherp
lyupportsseveralo
ptionsincludingen
codingusingstraig
htintegeryorthepo
pularprintableash
liformattheoutsut
sgeneratedbythesc
ristincludeplaint
entversionofthein
puthishertentvers

6-grams:

inordertofacilita
tetheanalysisofho
mophonicciphersol
versivewrittenasc
riptthattakestest
asinputandgenerat
esacipherofagiven
lengthusingagiven
numberofsymbolsth
escriptgencipherp
lsupportsseveralo
ptionsincludingen
codingusingstraig
htintegersorthepo
pularprintableasc
iiformattheoutput
sgeneratedbythesc
riptincludeplaint
extversionofthein
putciphertextvers

AZdecrypt

 
Posted : August 30, 2015 10:22 pm
doranchak
(@doranchak)
Posts: 2614
Member Admin
 

http://zodiackillerciphers.com

 
Posted : August 30, 2015 10:28 pm
Quicktrader
(@quicktrader)
Posts: 2598
Famed Member
 

This rocks…64bit version with key-locking function and quick…greatgreatgreat.

QT

*ZODIACHRONOLOGY*

 
Posted : September 1, 2015 12:30 pm