Thanks guys!
loaded and ran the 6 grams ok but wont let me load and run the 8 grams.. i will give it another go on the weekend. cheers
Damn. I tested it on a 4 GB RAM system and it worked. Can you describe what is exactly happening?
EDIT: did you perhaps not unzip the download file with 7-zip?
dont worry jarlve it wont be you. i have 7 zip downloaded.
i will get back to you on the weekend. cheers
dont worry jarlve it wont be you. i have 7 zip downloaded.
i will get back to you on the weekend. cheers
Is it working now Mr lowe?
—
Someone with the handle of Sensei Rat blogged about using AZdecrypt to solve an item of a cryptology challenge @ https://senseirat.com/?p=255
This cipher stupefied me for a bit, it is oddly interesting. It has only 2 trigram repeats and because of that is solvable as both a bigram substitution cipher and a regular substitution cipher! Props to Carl Mehner and Sensei Rat.
Cryptographic Challenge 8 – Bigram
AXQBSUT PFN QPUP AXU SVTUKA HUHWTN JKKUPP AW SVPJCEU AXU PUKQTVAN OUJAQTUP CQVEA VBAW AXVP ANFU WO KWHFQAUT KWBBUKAVWB FWTAChallenge 8 designed by Carl Mehner
This was, by far, the hardest challenge for me. It was relatively easy for me to identify that it was a diagraph/bigram encryption, but I wasted so much time trying to brute force the encryption using scrabble skills instead of doing research on the problem. The analyzer that dCode provides did help some with the scrabble brute forcing, but after getting a little disheartened with it, I did some more Googling on this and found this article: http://scienceblogs.de/klausis-krypto-k … ecord-set/ and this article: http://scienceblogs.de/klausis-krypto-k … -climbing/ about some guys had solved a complex one in record time using something called Up The Hill method.One of them developed a tool called AZdecrypt and there is actually a demo of it hosted that I was able to use. This got me the question that I needed to answer for the solve, and I have to say, other than successfully completing all of the challenges for this, solving this specific challenge was the most gratifying because to me it was the hardest.
This cipher stupefied me for a bit, it is oddly interesting. It has only 2 trigram repeats and because of that is solvable as both a bigram substitution cipher and a regular substitution cipher!
I understand more clearly now. A bigram can start at a even or uneven position, or 2 rails, and with this cipher every bigram (repeat) is unique to its own rail. To get a message like this (that solves as both bigram and masc) of this length (104 characters) one must cherry pick it somehow. It is more likely for it to happen with a message that has few naturally occurring bigram repeats and this message is such a case. The cipher is a clever masc.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 1 A X Q B S U T P F N Q P U P A X U S V T U K A H U H W T N J K K U P P A W S V P J C E U A X U P U K Q T 2 V A N O U J A Q T U P C Q V E A V B A W A X V P A N F U W O K W H F Q A U T K W B B U K A V W B F W T A
This cipher stupefied me for a bit, it is oddly interesting. It has only 2 trigram repeats and because of that is solvable as both a bigram substitution cipher and a regular substitution cipher! Props to Carl Mehner and Sensei Rat.
I am still trying to understand this.
So, is each bigram a separate cipher unit? For example, for the ciphertext AXQBSUT PFN, is this the cipher alphabet: AX, QB, SU, TP, FN?
Does each cipher unit stand for a plaintext unigram, or a bigram?
Does each cipher unit stand for a plaintext unigram, or a bigram?
Both, in the plain text, every unique bigram occurs only at even or uneven positions (naturally). For example, the positions of bigram AX: 1, 15, 45, 73.
If you haven’t yet, then pick up programming, it is actually a very relaxing and rewarding thing to start with.
I did. When the bookstores opened back up, I got a couple of books about Javascript. I only put a few hours a week into the new part of my hobby, but so far I have a program that does corpus training and makes and solves simple substitution ciphers ( most of the time ) in Google Chrome. I have a lot of work to do, but it is fun and rewarding. So far I just get my data from the console.log, so it is a very rudimentary program. Rather than go to homophonic, I aspire to fine tune and perfect my simple substitution solver, then work on transposition. We will see how it goes, and I might ask some questions later.
Nice to hear from you smokie!
Seems you are doing great with the programming. That’s not bad at all. Have fun with it and take it easy. Of course feel free to ask anything.
O.k., so my first program wasn’t very well planned, and I am doing it over with more perspective. The objective is to learn how to program and to eventually work up to being able to explore different algorithms for un-transposing messed up transpositions, similar to your rows bound solver. I write a few functions a week.
I have two questions.
First, keeping in mind that this is only a simple substitution solver so that I can focus my future time and computer power to transposition, what do you suggest for hill-climbing methods? If I batch solve 1000 cryptograms, and only use bigrams, I get +/- 87% solve rate. The program finds false hills and gets stuck, so I allow only 5,000 iterations and if that doesn’t work, then re-initialize the key. Allow for 10 attempts before moving on to the next cryptogram. Of course, if I use tri-grams, the solve rate goes up to 97%. The original program was too slow with quadrams because I was using slow methods, but the new version should be exponentially faster. Anyway, instead of re-starts after 5,000 iterations ( most solves take much less than that ), I also tried to let the hill climber climb up and also down hills, with an acceptance rate and most of the time up, but some of the time down. That didn’t seem to work. Any suggestions? Keeping in mind that the ultimate use for the program will be to pour cleartext into rows of various lengths before scoring.
Second, how do you score your messages? I just add up the logarithms of the probabilities of the ngrams so far. My score function is set up to have multiple functions inside it, but for now it only has the one add up the logarithms function. What about rows bound method? Do you normalize somehow the bigram scores and trigram scores to match the quadram scores? Or do you have three different scores? Or do you just add them up without normalizing them? What about scoring individual rows? How do you do that? If the ( future ) algorithm that searches for a way to fragment the un-transposed message so that a group of pieces, or rows, can be used to find a partial solve, and if I just add up the logarithms, then I think that bigrams would be favored over trigrams, and trigrams over quadrams, pushing the program into smaller and smaller fragments that are not the solution. Does your row bound solver deal with this somehow?
Asking now so that I don’t program myself into spagetti. I probably will anyway.
We will see how it goes, and it may take a while. I took about 8 months off. I am not quite as aggressive about my work anymore, it has to stay a hobby.
Thanks.
Hey smokie,
First, keeping in mind that this is only a simple substitution solver so that I can focus my future time and computer power to transposition, what do you suggest for hill-climbing methods?
Whatever works best. Simulated annealing has worked well for substitution ciphers and the like for many people. But some problems require a depth search such that it looks ahead multiple steps. AZdecrypt’s "simple transposition" solver has a depth search component added in. Only use depth search if really needed because it can be terribly inefficient as well.
I also tried to let the hill climber climb up and also down hills, with an acceptance rate and most of the time up, but some of the time down. That didn’t seem to work. Any suggestions?
Was it proper simulated annealing or something that you came up with? Try the following, take a temperature value (magic number that you need to tune) and after every iteration reduce the temperature by (temperature/iterations). Also after every iteration, if the solution is not accepted reduce the previous best score by the temperature. Remember to tune the temperature value to get it just right. I call this simulated annealing without acceptance function.
Second, how do you score your messages?
Yes, sum the logs into a score value. I also divide the score by the number of logs to normalize scores between ciphers of varying length. Then multiplicate the score by the entropy of the plain text 1-gram frequencies.
Asking now so that I don’t program myself into spagetti. I probably will anyway.
Yeah, you will need to figure things out, to me, that is one of the joys. I remember you as a original thinker, so it should not be a problem for you to work things out eventually. For every good idea there may be many bad ideas, it’s a matter of putting in the work and staying interested.
Will answer the questions about the row bound solver at a later date to keep things light for now.
That helps out a lot. Thanks.
I’m no crypto expert, so I could be wrong on the details, but here’s my thought. The 1st Zodiac crypto was solved my a husband-wife team who had no expertise in cryptology. Subsequent Zodiac crypts remain unsolved even after analysis by numerous knowledgeable individuals. Could it be that that 1st crypto was Zodiac’s, while the unsolvable ones were produced by a genuine cryptographer?
What about rows bound method? Do you normalize somehow the bigram scores and trigram scores to match the quadram scores? Or do you have three different scores? Or do you just add them up without normalizing them? What about scoring individual rows? How do you do that? If the ( future ) algorithm that searches for a way to fragment the un-transposed message so that a group of pieces, or rows, can be used to find a partial solve, and if I just add up the logarithms, then I think that bigrams would be favored over trigrams, and trigrams over quadrams, pushing the program into smaller and smaller fragments that are not the solution. Does your row bound solver deal with this somehow?
Say that the solver is in 5-gram mode, then there are 4 different scores for each n-gram size. They are added but the smaller n-gram sizes get there score reduced by some factor as they have to remain complementary. For example, you could divide the 4-gram score by 2, the 3-gram score by 4 and the 2-gram score by 8 or whatever works best for your case. Also, the smaller n-gram sizes are only used at the beginning and end of each row.
Each row is scored individually only in function of not crossing row boundaries with the n-grams.
With the fragments solver (that works on top of the row bound solver) you can limit it to say 20 fragments, or 30 fragments, and that way it cannot push itself into increasingly smaller fragments. Alternatively you can set up a fragment weight such that a higher number of fragments is punished.
What about rows bound method? Do you normalize somehow the bigram scores and trigram scores to match the quadram scores? Or do you have three different scores? Or do you just add them up without normalizing them? What about scoring individual rows? How do you do that? If the ( future ) algorithm that searches for a way to fragment the un-transposed message so that a group of pieces, or rows, can be used to find a partial solve, and if I just add up the logarithms, then I think that bigrams would be favored over trigrams, and trigrams over quadrams, pushing the program into smaller and smaller fragments that are not the solution. Does your row bound solver deal with this somehow?
Say that the solver is in 5-gram mode, then there are 4 different scores for each n-gram size. They are added but the smaller n-gram sizes get there score reduced by some factor as they have to remain complementary. For example, you could divide the 4-gram score by 2, the 3-gram score by 4 and the 2-gram score by 8 or whatever works best for your case. Also, the smaller n-gram sizes are only used at the beginning and end of each row.
Each row is scored individually only in function of not crossing row boundaries with the n-grams.
With the fragments solver (that works on top of the row bound solver) you can limit it to say 20 fragments, or 30 fragments, and that way it cannot push itself into increasingly smaller fragments. Alternatively you can set up a fragment weight such that a higher number of fragments is punished.
Thanks. It didn’t recently occur to me to include the smaller ngrams at the beginning of a row, just at the end. The program is over 1,000 lines now, but I have a way to go, including a lot of testing with the solver before I move on to fragments. I have some ideas for algorithms that I have been thinking about, and want to try it myself. As far as normalizing the shorter ngram scores, I thought that I might sort the quadrams by corpus text count high to low, then for example, sort the trigrams the same way, but spread them out evenly on the x-axis of the quadram sort, then use the y-value ( count ) of the corresponding quadram that is directly below it. Since the distributions are curved somewhat. But I am a ways away from that. Thanks again and if I find a fragmentation algorithm I will share it with you. Hey, it’s a great distraction from the news, the election, which in my country is extremely stressful on everyone, and gives me something to do since I am basically house bound. Have a good day.
smokie-
In re: to your solver that you are working on, I just wanted to chime in with a few thoughts. In the scoring system, making a computer recognize natural language text, (English, Spanish, etc….) is kind of an under-explored area. There _may_ even be a better way than ngram scoring, with perhaps Markov models, some new yet-uninvented heuristic, etc. It’s fun to experiment.
So I echo Jarlve on that. It’s been said that most new inventions don’t come from an AHA! moment, but more from an observation or noticing "hmm, that’s strange…"
What we did with ZKD is to "weight" all of the various scores by varying amounts, and then change those weights around to see what worked best. You can always remove them later. We did, in hindsight, end up with some hard-coded "magic numbers" which may not have been the best idea, but the weight factors could always be read in from a separate file. Even the testing of those could be automated against a known solution and timed, compared, etc.
So all I’m saying really is to not take anything as gospel, because you might come up with a great new approach if you experiment and stick to it.
Best of luck and have fun!
-glurk
——————————–
I don’t believe in monsters.
smokie-
In re: to your solver that you are working on, I just wanted to chime in with a few thoughts. In the scoring system, making a computer recognize natural language text, (English, Spanish, etc….) is kind of an under-explored area. There _may_ even be a better way than ngram scoring, with perhaps Markov models, some new yet-uninvented heuristic, etc. It’s fun to experiment.
So I echo Jarlve on that. It’s been said that most new inventions don’t come from an AHA! moment, but more from an observation or noticing "hmm, that’s strange…"
What we did with ZKD is to "weight" all of the various scores by varying amounts, and then change those weights around to see what worked best. You can always remove them later. We did, in hindsight, end up with some hard-coded "magic numbers" which may not have been the best idea, but the weight factors could always be read in from a separate file. Even the testing of those could be automated against a known solution and timed, compared, etc.
So all I’m saying really is to not take anything as gospel, because you might come up with a great new approach if you experiment and stick to it.
Best of luck and have fun!
-glurk
Thanks. The above gives me some ideas!
I am adding a word spacer for the next release of AZdecrypt that can be enabled/disabled and here’s a preview:
Score: 23159.31 IOC: 0.0654 Multiplicity: 0.1323 Seconds: 0.10 Repeats: EBECAUSE KILLING THEMOAT BECAUSE WILLBE SLAVES PC-cycles: 12443 I LIKE KILLING PEOPLE BECAUSE IT IS SO MUCH FUN IT I A MORE FUN THAN KILLING WILD GAME IN THE FORREST BECAU SE MAN IS THE MOAT DANGERTUE ANAMAL OF ALL TOKILL SOME THING GIVESME THEMOAT THRILLING EXPERENCE IT I S EVEN BETTER THAN GETTING YOUR ROCKS OFF WITH A GIRL THE BEST PART OFITIATHAE WHEN I DIE I WILL BERE BORN IN PAR ADICE AND ALL THE I HAVE KILLED WILL BECOME MY SLAVES I WILL NOT GIVE YOU MY NA ME BECAUSE YOU W ILL TRY T OSLO I DOW NORATOP MY COLLECTING OF SLAVES FOR MY AFTER LIFEE BEORIETE METHHPITI