Zodiac Discussion Forum

Jarlve
(@jarlve)
Posts: 2547
Famed Member
Topic starter
 

That quadgram repeat is also very interesting. Very nice finds! :)

AZdecrypt

 
Posted : October 16, 2015 8:34 pm
daikon
(@daikon)
Posts: 179
Estimable Member
 

Just a brief comment, but I think if you are using a GB+ sized corpus, a few "hahaha"s etc. probably won’t matter that much.

Must be my solver then. Before I filtered out all "hahaha" and "zzzzz" from my 7-grams, it was sometimes getting stuck on nonsensical solutions like "hahahahahaha…" all the way through the cipher. I figured it would be *very* unlikely someone would waste time encrypting "hahaha" in their cipher too.
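Here is a minimal sketch of that kind of filter. It assumes the 7-gram table is keyed by lowercase strings and drops any n-gram that is just one short unit repeated: "zzzzzzz" repeats a 1-letter unit and "hahahah" repeats a 2-letter one. The names `has_period` and `is_repetitive` and the 2-letter cutoff are hypothetical. This is not daikon's actual filter.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns true if every character of `gram` equals the character `period`
// positions earlier, i.e. the n-gram repeats its first `period` letters.
bool has_period(const std::string& gram, std::size_t period) {
    assert(period > 0);
    for (std::size_t i = period; i < gram.size(); ++i) {
        if (gram[i] != gram[i - period]) return false;
    }
    return true;
}

// Hypothetical filter: reject n-grams built from a repeated unit of at most
// `max_period` letters ("zzzzzzz" has period 1, "hahahah" has period 2).
bool is_repetitive(const std::string& gram, std::size_t max_period = 2) {
    for (std::size_t p = 1; p <= max_period && p < gram.size(); ++p) {
        if (has_period(gram, p)) return true;
    }
    return false;
}
```

When building the table, you would skip any entry for which `is_repetitive` returns true.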

I personally got much better results in ZKD, though, when I log-weighted the scores, very simply, like:

VALUE=(int)(10*log((double)VALUE));

You kind of have to, so that you are working with (log-)probabilities. I’ve actually converted my solver to work directly with probabilities, scoring each candidate by its negative log-probability, so that the closer the combined score is to 0, the more likely it is to be the correct solution. Ideally (although this is only achievable in theory), the probability would be 1, and log(1)=0. It’s a trivial change in the code: you just need to flip all comparisons on the score/fitness, since we are now trying to minimize it. But I think it helps the Simulated Annealing algorithm, since it uses exp(old_fitness-new_fitness) to decide whether to accept a new change. When a higher score/fitness means a better solution, the closer you get to the correct solve, the bigger the difference between old_fitness and new_fitness tends to be (since the values themselves get bigger), so you get bigger "jumps" every time you try a new change. Working with probabilities, a lower fitness means a better solution, so the closer you get to the correct solve, the smaller the difference between old_fitness and new_fitness becomes, and the transitions are "smoother".
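The scheme described above can be sketched as follows. It assumes an n-gram table of raw counts. Each n-gram window contributes -log p, so a (theoretical) perfect fit sums to 0 and lower is better. A worse candidate is accepted with probability exp((old - new)/T). The names `neg_log_prob`, `fitness` and `accept_move` are illustrative assumptions, as are the temperature divisor T (the usual annealing schedule parameter) and the floor count for unseen n-grams. This is not ZKD's or daikon's code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <string>
#include <unordered_map>

// Negative log-probability of one n-gram; unseen n-grams get a small floor
// count (an assumption) so the score stays finite.
double neg_log_prob(const std::unordered_map<std::string, double>& counts,
                    double total, const std::string& gram) {
    assert(total > 0.0);
    auto it = counts.find(gram);
    double c = (it != counts.end()) ? it->second : 0.01;
    return -std::log(c / total);
}

// Fitness of a candidate plaintext: sum of -log p over all n-gram windows.
// Lower is better; a probability of 1 for every window would give 0.
double fitness(const std::unordered_map<std::string, double>& counts,
               double total, const std::string& text, std::size_t n) {
    double sum = 0.0;
    for (std::size_t i = 0; i + n <= text.size(); ++i) {
        sum += neg_log_prob(counts, total, text.substr(i, n));
    }
    return sum;
}

// Simulated-annealing acceptance for a minimised fitness: always keep an
// improvement, otherwise keep the change with probability exp((old-new)/T).
bool accept_move(double old_fitness, double new_fitness, double temperature,
                 std::mt19937& rng) {
    if (new_fitness <= old_fitness) return true;
    std::uniform_real_distribution<double> u(0.0, 1.0);
    return u(rng) < std::exp((old_fitness - new_fitness) / temperature);
}
```

Note the sign in `accept_move`. Because fitness is minimised here, the exponent is old minus new. With a "higher is better" score, it would be new minus old.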

 
Posted : October 16, 2015 9:23 pm
daikon
(@daikon)
Posts: 179
Estimable Member
 

Though there appear to be few numerals, and ciphers that contain them have problems.

Yeah, I noticed the same thing. I’m actually considering parsing the corpus and "spelling out" all numerals into words. So every 42 and 812 will be replaced with "forty two" and "eight hundred twelve", etc. Same with all fractions: 3/16th will be replaced with "three sixteenths", etc. Currency amounts: $3.40 = "three dollars forty cents". I think there must be something that already does these conversions (like a Perl module?), since screen readers, etc. need them.
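For plain integers, a small converter along these lines would produce the forms described: "42" becomes "forty two" and "812" becomes "eight hundred twelve", with no "and". It is a sketch covering 0 to 999,999 only. Fractions, ordinals and currency would need extra rules on top. The function name `spell_number` is hypothetical.

```cpp
#include <cassert>
#include <string>

namespace {
const char* const kOnes[] = {"zero", "one", "two", "three", "four", "five",
    "six", "seven", "eight", "nine", "ten", "eleven", "twelve", "thirteen",
    "fourteen", "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"};
const char* const kTens[] = {"", "", "twenty", "thirty", "forty", "fifty",
    "sixty", "seventy", "eighty", "ninety"};

// Spells 1..999 without "and", e.g. "eight hundred twelve".
std::string below_thousand(int n) {
    std::string out;
    if (n >= 100) {
        out = std::string(kOnes[n / 100]) + " hundred";
        n %= 100;
        if (n == 0) return out;
        out += " ";
    }
    if (n < 20) return out + kOnes[n];
    out += kTens[n / 10];
    if (n % 10 != 0) out += std::string(" ") + kOnes[n % 10];
    return out;
}
}  // namespace

// Spells an integer in 0..999,999 as space-separated English words.
std::string spell_number(int n) {
    assert(n >= 0 && n < 1000000);
    if (n == 0) return "zero";
    std::string out;
    if (n >= 1000) {
        out = below_thousand(n / 1000) + " thousand";
        n %= 1000;
        if (n == 0) return out;
        out += " ";
    }
    return out + below_thousand(n);
}
```

A currency amount like $3.40 could then be assembled by hand as `spell_number(3) + " dollars " + spell_number(40) + " cents"`.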

 
Posted : October 16, 2015 9:32 pm
Jarlve
(@jarlve)
Posts: 2547
Famed Member
Topic starter
 

Yeah, I noticed the same thing. I’m actually considering parsing the corpus and "spelling out" all numerals into words.

I’ve experimented with that, and it does improve things a lot for such ciphers. I considered doing it with the Reddit corpus, but thought it would be hard to distinguish sentences whose numerical information belongs with the message from those where it does not. It may also be hard to get it just right, because there are a number of formats: #1, 7th, 2015, 16/10/15, 3s, 1.000.000, 100,8, etc. Anyway, you seem to be on it and I’m sure you’ll do fine.

As glurk stated, it probably doesn’t have to be perfect. But since I may have additional plans for the sentences, I wanted to make them as clean as I could. So I decided to discard any sentence that contained any of the digits "0123456789".
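Discarding every sentence that contains any of "0123456789" amounts to a one-line check per sentence. Here is a sketch, assuming the corpus has already been split into sentences. The splitting itself is not shown, and `drop_numeric_sentences` is a hypothetical name.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Keeps only sentences with no ASCII digit, i.e. drops anything that
// contains one of the characters "0123456789".
std::vector<std::string> drop_numeric_sentences(
        const std::vector<std::string>& sentences) {
    std::vector<std::string> kept;
    for (const auto& s : sentences) {
        if (s.find_first_of("0123456789") == std::string::npos) {
            kept.push_back(s);
        }
    }
    return kept;
}
```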

Posted : October 16, 2015 10:14 pm
doranchak
(@doranchak)
Posts: 2614
Member Admin
 

It took about 10 minutes on 6 cores to get the 6 row 408 solve.

368,640 ciphers, 10 minutes each. Works out to about 7 years of processing time. Yikes. :)
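Spelled out, assuming the 10 minutes is wall-clock time on the whole 6-core machine for each cipher:

```latex
368{,}640 \times 10\,\text{min} = 3{,}686{,}400\,\text{min} = 61{,}440\,\text{h} = 2{,}560\,\text{days} \approx 7.0\,\text{years}
```

Counting each of the six cores separately, that is roughly 42 core-years.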

http://zodiackillerciphers.com

 
Posted : October 16, 2015 10:16 pm
Jarlve
(@jarlve)
Posts: 2547
Famed Member
Topic starter
 

@doranchak, I haven’t started processing the smaller list yet for the Dan Olson theory but I hope to start it this week. I’ve decided to put at least a couple of hours into each cipher just to make sure.

I’m glad to share possibly the first automatic decipherment of the second W.B.Tyler cipher. :D

In July 2000, the contest came to an end when Gil Bronza submitted the correct answer. It turned out that the cipher was a poly-alphabetic substitution cipher and contained “over two dozen mistakes.” And what was the answer to this masterful, albeit flawed puzzle?

“It was early spring, warm and sultry glowed the afternoon. The very breezes seemed to share the delicious langour of universal nature, are laden the various and mingled perfumes of the rose and the –essaerne (?), the woodbine and its wildflower. They slowly wafted their fragrant offering to the open window where sat the lovers. The ardent sun shoot fell upon her blushing face and its gentle beauty was more like the creation of romance or the fair inspiration of a dream than the actual reality on earth. Tenderly her lover gazed upon her as the clusterous ringlets were edged (?) by amorous and sportive zephyrs and when he perceived (?) the rude intrusion of the sunlight he sprang to draw the curtain but softly she stayed him. ‘No, no, dear Charles,’ she softly said, ‘much rather you’ld I have a little sun than no air at all.’”

Score: 16250 Ioc: 696 M: 206 C: 668 S: 138

itwasearlyspringwarb
andsultrrglowintheaf
ternoonthsverybrpere
sseemedtosharethedil
icoooslangomrosuoive
rsalnatureaeladenthe
variouslndsingledpeh
fumesostheroseandthe
ressaernethewoodlike
andtiswildflowerthey
hlowtywaatedshiiafrt
grantssneringtotheov
enwindowwheoesatthea
overstleardenceinsho
otsuolupinherblishin
gfaceanditsgentlebea
utyhasmorelieitherre
atiaonsideyisdromonc
loathesairyrnspirati
onofddreaschanthaact
ualrealithonearthttn
derlyhehalvirgaredup
onherasherclusteroug
ringletswareedvedlya
morousandsportivenev
errsandwhethepersist
edtherideintrusionof
thesunlighthesprangt
odrawthecurtshubutst
agentlystayudhistono
diarcharlesshasontlr
sardmuchhatheryouldn
havealittlerunthenno
ainatral

55 19 109 37 51 71 81 2 9 34 27 115 78 25 5 32 64 79 47 107
57 13 12 33 45 73 44 26 40 20 36 46 75 30 133 15 82 24 17 31
23 4 70 14 58 95 83 69 80 27 6 10 39 52 107 54 110 8 104 21
67 94 18 53 120 11 29 74 72 63 68 3 40 4 43 48 71 16 30 106
121 108 91 98 72 50 49 41 22 32 99 105 77 46 28 45 42 38 6 24
2 27 1 9 5 81 65 76 47 56 79 10 73 57 12 10 35 135 92 85
6 17 26 55 95 113 67 36 22 7 84 25 14 20 36 8 29 110 21 82
31 103 105 18 94 72 51 100 89 53 39 42 63 11 93 83 16 44 59 8
130 21 33 50 1 8 2 22 24 23 96 4 64 91 102 12 106 62 128 71
37 5 7 69 55 33 75 38 66 29 31 49 98 109 56 54 15 101 85 52
80 9 99 64 15 34 75 81 1 19 18 16 28 82 30 60 41 31 77 23
32 78 57 35 43 84 28 86 53 2 55 13 20 65 58 69 80 56 42 125
11 14 134 25 83 12 72 109 64 68 71 46 90 51 17 87 100 89 53 3
42 6 24 26 27 15 9 24 3 70 7 4 83 117 4 111 22 63 59 97
42 43 28 61 58 66 76 119 62 5 48 10 39 122 9 111 50 96 121 35
32 31 41 112 85 1 13 29 62 44 51 20 90 14 15 73 8 107 18 37
61 23 34 88 79 27 123 46 54 21 36 38 90 30 19 92 114 47 40 11
17 74 60 1 97 86 33 25 137 10 52 55 28 16 2 99 120 46 13 116
131 95 1 43 59 11 28 3 25 78 34 2 83 67 118 62 47 41 65 38
91 35 58 31 7 12 26 71 37 84 117 96 1 5 87 101 17 81 108 100
45 79 49 70 24 57 9 60 44 88 97 86 4 17 39 74 89 15 43 13
7 56 54 73 52 82 85 88 3 49 6 30 78 32 3 104 8 29 113 110
102 14 80 18 77 41 94 48 53 2 112 36 103 63 23 11 47 98 45 20
26 55 22 32 66 24 19 50 75 93 70 4 4 7 6 56 16 106 34 1
105 46 39 95 76 51 57 13 12 33 119 72 40 44 25 6 114 136 10 125
129 40 54 27 3 22 7 64 82 8 124 48 21 118 18 77 138 30 84 87
53 29 69 68 10 78 111 16 11 62 5 15 2 61 67 38 98 35 58 31
23 96 4 94 45 13 49 60 20 92 19 48 21 63 115 47 37 14 32 74
99 12 26 81 75 44 59 8 116 113 70 65 28 68 45 122 103 87 50 19
93 20 11 22 100 9 52 51 15 41 34 61 7 80 25 127 124 126 5 91
29 30 1 39 108 68 79 54 73 18 27 50 89 93 33 102 86 23 36 40
67 57 2 16 123 76 112 48 88 17 69 101 56 77 52 97 76 66 12 14
92 3 6 10 132 49 38 19 74 9 21 104 61 35 43 59 90 13 14 42
1 60 5 37 65 26 3 66 

Posted : October 19, 2015 2:55 pm
doranchak
(@doranchak)
Posts: 2614
Member Admin
 

Very nice work!

You should consider writing up your work as a paper and submitting it to Cryptologia or some other journal. I’ve seen too many researchers claim a "first automatic decipherment" without realizing that solvers "in the wild" (such as yours) have already beaten them to it.

Posted : October 19, 2015 5:55 pm
daikon
(@daikon)
Posts: 179
Estimable Member
 

I’m glad to share possibly the first automatic decipherment of the second W.B.Tyler cipher. 😀

Nice! Did you have to change anything in your solver, or was it just using higher-order N-grams (7?) to help with the multiplicity?
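Multiplicity, as the term is commonly used for homophonic ciphers like these, is the number of distinct symbols divided by the cipher length. The Z340, for example, has 63 symbols in 340 characters, giving about 0.185. Higher values mean more homophones per letter and less repetition for a solver to exploit. Here is a minimal sketch for a numeric ciphertext like the Tyler transcription above. It assumes symbols are whitespace-separated numbers, and `multiplicity` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <sstream>
#include <string>

// Multiplicity = distinct symbols / total symbols, reading a ciphertext
// written as whitespace-separated symbol numbers.
double multiplicity(const std::string& ciphertext) {
    std::istringstream in(ciphertext);
    std::set<int> distinct;
    int symbol = 0;
    std::size_t length = 0;
    while (in >> symbol) {
        distinct.insert(symbol);
        ++length;
    }
    assert(length > 0);
    return static_cast<double>(distinct.size()) / static_cast<double>(length);
}
```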

 
Posted : October 19, 2015 9:14 pm
Jarlve
(@jarlve)
Posts: 2547
Famed Member
Topic starter
 

Thanks guys,

These journals don’t seem to be my cup of tea. I’m okay just sharing it with the people here. :)

@daikon, 7-grams.

Posted : October 20, 2015 12:41 am
Jarlve
(@jarlve)
Posts: 2547
Famed Member
Topic starter
 

AZdecrypt 0.991 for Windows.

.zip: https://drive.google.com/open?id=0B5r0r … 3pQUWhYdTQ (153MB download)

– Minor optimizations to the algorithm.
– 7-gram solver, which requires 12GB of RAM.
– Included a bunch of new ciphers found on the internet (some historical).
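For a sense of scale, a dense table with one entry per possible n-gram over a 26-letter alphabet has 26^n entries. Assume one byte per entry; the release notes do not say how AZdecrypt stores its n-grams. On that assumption, 7-grams alone take about 8.0 GB, which would be consistent with a 12GB RAM requirement. Each extra letter multiplies the size by 26. `dense_table_bytes` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed for a dense n-gram table over a 26-letter alphabet.
// The entry count 26^n overflows 64 bits only past n = 13.
std::uint64_t dense_table_bytes(unsigned n, std::uint64_t bytes_per_entry = 1) {
    assert(n <= 13);
    std::uint64_t entries = 1;
    for (unsigned i = 0; i < n; ++i) entries *= 26;
    return entries * bytes_per_entry;
}
```

Under the same one-byte assumption, `dense_table_bytes(8)` comes to about 209 GB.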

The main download comes with Practical Cryptography 5-grams and Reddit corpus 6-grams. The other n-gram files can be downloaded below and need to be unzipped into the Corpora directory. The program will automatically detect these files and will allow the user to change the Solver module accordingly. The scores of all n-grams have been normalized to that of the solved 408, which is around 23300. Reddit corpus n-grams are great but may perform worse when the cipher contains numerals.
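The notes above do not say how the normalization is done. One simple way to get that property is a linear rescale, sketched here as an assumption rather than AZdecrypt's actual code. First score the known 408 plaintext with a given n-gram table, then multiply every entry by 23300 divided by that score. If the fitness is a sum over n-gram windows, the 408 then scores the target under every table. `normalize_table` and `score_text` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

using NgramTable = std::unordered_map<std::string, double>;

// Sum of table values over every n-gram window of `text`
// (missing n-grams contribute 0 in this sketch).
double score_text(const NgramTable& table, const std::string& text, std::size_t n) {
    double sum = 0.0;
    for (std::size_t i = 0; i + n <= text.size(); ++i) {
        auto it = table.find(text.substr(i, n));
        if (it != table.end()) sum += it->second;
    }
    return sum;
}

// Rescales all entries so that `reference_plaintext` scores `target`.
void normalize_table(NgramTable& table, const std::string& reference_plaintext,
                     std::size_t n, double target) {
    double raw = score_text(table, reference_plaintext, n);
    assert(raw > 0.0);
    double factor = target / raw;
    for (auto& entry : table) entry.second *= factor;
}
```

A linear rescale preserves the ranking of candidate solutions within a table while putting scores from different tables on a comparable scale.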

Additional n-gram files:

5-grams_reddit, .7z: https://drive.google.com/open?id=0B5r0r … DFLRWp2S1k (19MB download)
6-grams_usenet, .7z: https://drive.google.com/open?id=0B5r0r … C03Zm9oYk0 (83MB download)
7-grams_usenet, .7z: https://drive.google.com/open?id=0B5r0r … VBjLUFSLVk (400MB download)
7-grams_reddit, .7z: https://drive.google.com/open?id=0B5r0r … WFuX1haeG8 (576MB download)

Posted : October 24, 2015 2:19 pm
doranchak
(@doranchak)
Posts: 2614
Member Admin
 

Awesome work, Jarlve! As always!!

Posted : October 24, 2015 2:39 pm
Jarlve
(@jarlve)
Posts: 2547
Famed Member
Topic starter
 

Thanks :) just wanted to make a 7-gram solver available to everyone.

Posted : October 24, 2015 2:44 pm
doranchak
(@doranchak)
Posts: 2614
Member Admin
 

I’m curious how much you can scale this up. For example, if you read the ngrams from a fast SSD instead of RAM, how much slower would it be? I think SSD is still about 2 orders of magnitude slower than RAM so you’d suffer a slowdown. But you’d have a lot more space to work with, to store all the data for larger ngrams.
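Reading n-grams from an SSD instead of RAM could look roughly like this:

- keep a dense layout (one byte per n-gram, an assumption) in a file on disk,
- compute each n-gram's base-26 index,
- seek to that offset.

Every lookup then costs a random read, which is where the slowdown would come from. `ngram_index` and `DiskNgramTable` are hypothetical names, and a real solver would likely batch or cache reads.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>

// Base-26 index of a lowercase n-gram: "aa..a" -> 0, "zz..z" -> 26^n - 1.
std::uint64_t ngram_index(const std::string& gram) {
    std::uint64_t index = 0;
    for (char c : gram) {
        assert(c >= 'a' && c <= 'z');
        index = index * 26 + static_cast<std::uint64_t>(c - 'a');
    }
    return index;
}

// Dense n-gram table stored on disk, one unsigned byte per n-gram.
class DiskNgramTable {
public:
    explicit DiskNgramTable(const std::string& path)
        : file_(path, std::ios::binary) {}

    bool is_open() const { return file_.is_open(); }

    // One random read per lookup, no caching; returns 0 if the read fails.
    unsigned char value(const std::string& gram) {
        file_.clear();  // reset error flags left by any earlier failed read
        file_.seekg(static_cast<std::streamoff>(ngram_index(gram)));
        char byte = 0;
        file_.read(&byte, 1);
        return static_cast<unsigned char>(byte);
    }

private:
    std::ifstream file_;
};
```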

Posted : October 24, 2015 2:50 pm
Jarlve
(@jarlve)
Posts: 2547
Famed Member
Topic starter
 

I’m also curious how well it will scale. I’ve wanted to do this but my current computer has a very poor SSD controller and I don’t want to buy a new one until Skylake-E. I think it will come down to IOPS performance, high end PCI-E Intel SSD solutions have around 1.000.000 IOPS. It’s going to be considerably slower than RAM.
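A rough upper bound rests on three loud assumptions:

- every n-gram lookup goes to the SSD with no caching,
- each lookup costs exactly one I/O,
- every candidate key is rescored in full with 7-grams.

Under those assumptions, the 340-symbol cipher has 340 - 7 + 1 = 334 7-gram windows, so at 1,000,000 IOPS:

```latex
\frac{1{,}000{,}000\ \text{lookups/s}}{334\ \text{lookups per evaluation}} \approx 2{,}994\ \text{evaluations/s}
```

A solver that rescores only the windows affected by a change, or keeps frequent n-grams cached in RAM, would need fewer SSD reads per evaluation than this.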

Posted : October 24, 2015 5:29 pm
daikon
(@daikon)
Posts: 179
Estimable Member
 

Personally, I think 7-grams is about as high as you want to go. Yeah, I know that infamous quote: "640 KB of memory ought to be enough for anyone". 🙂 But here’s my reasoning. The average length of a word in English is 5.1 characters. Well, actually, it’s between 4.5 and 5.3, depending on who you ask, so 5 letters would be a good "average of the averages". With 7-grams you cover the whole "average word" entirely, plus 1 letter each from the preceding and following words. For anything higher than that, I think it would make much more sense to move to word-level N-grams. I’ve actually been thinking about that. With higher-multiplicity ciphers the solver often produces nonsensical results that sort of look like English and score very high according to 7-grams, but don’t read like real English sentences. It might improve the solve rate quite a bit, at least for higher-multiplicity ciphers, if we employed additional scoring based on word-level N-grams. The only thing stopping me from implementing that is that I’m not sure it’ll help solve Z340 in any way.
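Solver output has no spaces, so word-level scoring would need a segmentation step first. One standard option is a dynamic-programming (Viterbi-style) word segmentation that maximizes the sum of word log-probabilities. The best total can then serve as an extra score. The tiny word list and the per-letter penalty for unknown words below are made up for illustration. A real version would use word frequencies from a large corpus. It would also use word N-grams, as suggested above, rather than the single-word (unigram) probabilities shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <string>
#include <unordered_map>
#include <vector>

struct Segmentation {
    double log_prob;                 // total log-probability of the best split
    std::vector<std::string> words;  // the words of that split
};

// best[i] is the highest total log-probability for text[0..i).
// Unknown substrings are charged `unknown_log_prob` per letter (an assumption).
Segmentation segment(const std::string& text,
                     const std::unordered_map<std::string, double>& word_log_prob,
                     double unknown_log_prob = -10.0,
                     std::size_t max_word_len = 12) {
    assert(max_word_len > 0);
    const std::size_t n = text.size();
    std::vector<double> best(n + 1, -std::numeric_limits<double>::infinity());
    std::vector<std::size_t> split(n + 1, 0);
    best[0] = 0.0;
    for (std::size_t end = 1; end <= n; ++end) {
        std::size_t first = end > max_word_len ? end - max_word_len : 0;
        for (std::size_t start = first; start < end; ++start) {
            std::string word = text.substr(start, end - start);
            auto it = word_log_prob.find(word);
            double lp = (it != word_log_prob.end())
                            ? it->second
                            : unknown_log_prob * static_cast<double>(word.size());
            if (best[start] + lp > best[end]) {
                best[end] = best[start] + lp;
                split[end] = start;
            }
        }
    }
    // Walk the split points back from the end to recover the words.
    Segmentation result{best[n], {}};
    for (std::size_t end = n; end > 0; end = split[end]) {
        result.words.insert(result.words.begin(),
                            text.substr(split[end], end - split[end]));
    }
    return result;
}
```

Nonsense that only "looks like English" at the letter level tends to fall into unknown-word chunks here and picks up the penalty. That is the kind of signal 7-grams alone can miss.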

 
Posted : October 24, 2015 9:39 pm