Not all homophonic substitutions can be auto-solved

daikon · 2015-07-07T10:42:16Z

... we have software that can solve the 408 in a matter of seconds. People have hammered the 340 with all this great software for years and never cracked it. This would suggest it is not homophonic substitution. I wanted to address this common misconception. It's seems there is a general assumption that any homophonic substitution can be automatically solved by currently available programs that implement hill-climb algorithms like ZKDecrypto and AZDecrypt. And I'm not talking about carefully crafted plaintext that doesn't have certain common letters (like passages from the novel "Gadsby"), or that has an abundance of rare letters, or a very high, or very low IoC. That's actually not the problem. Or plaintexts in languages other than English, as that's solvable too, obviously. But there is still a way to fool hill-climb algorithm quite easily. Here is an example cipher I've constructed to illustrate this problem. It is a straight homophonic substitution, without any tricks of any kind. Each ciphertext symbol translates to only one plaintext letter, and there are no additional transformations to the plaintext that need to be done to read it: 9V[J%:;*S#K&_/$T'_A( ]#4$B)$M$+USP<$`# W,Q5L1a#Y-$&E2$$F#G3 9$.X:*N/#$_+VW#6J#=; ^C,T0ZD$^#HI`#R7K#>Y ?-P8L^A.^*QEJ4#a#@`# U1<9#K^B+^,RFL5#a#=` #S2:;#J$C#>#3<$D#[#9 $PG#6$Q'H7RO#8M#:^A- ^B?/$#X.@(P$*Z=)]C;% IE[#+]F#^D,A&a#VG'`` _B(4K<)5L0>?1$%$26YH &QZC9'IEND3FJ@#7^A-( =#R[/8K)$#:%4L0T]G.> #_*P^1$B2U[H&$$'IE]# +[F#^C,(?#;D)b#35YA% 6J<9B&b#C'7K:G/b#$D( $)8L%HI It has a fairly normal plaintext too. In fact, it's a passage from one of the Zodiac letters I picked at random. The cipher has the exact number of unique symbols as Z340, and it's slightly longer at 347 letters, so it should be actually easier to solve. Yet, neither ZKDecrypto nor AZDecrypt can crack it at default settings. I've used the highest setting of "keys per cipher" for AZDecrypt and I let ZKDecrypto run for nearly an hour. I can see that it gets close at times, and I can even find a few correct words in the jumble of letters, but I can only see them because I know the plaintext. It is still generally unreadable otherwise (without knowing the plaintext). It is by no means a clever way of constructing a cipher, and it should actually be fairly clear how I did it after manual analysis, but it does appear to fool automated cracking attempts. I'll post a hint tomorrow, if nobody solves it before then. EDIT: It appears I misunderstood some of the options of ZKDecrypto, and how to use it in general. So I stand corrected, the cipher above *can* be auto-solved. But definitely not in seconds. EDIT#2: See this cipher which proved to be truly unsolvable without manual analysis and careful merging of cycles.

Largo

(@largo)

Posts: 454

Honorable Member

Would that mean the command line executable has to be restarted for each new variation your main program summons? And thus ngrams have to be reloaded time upon time or can you keep it in memory somehow. Thanks for your ciphers.

For the first version, unfortunately yes. Later I will implement shared memory or named pipes so that the ngrams and the executable needs to be loaded only once. There are also some ways to call c++ directly from python but I have never tried that before. But the first version will support batch solving. I plan to pass most of the gui options via command line parameters. But at the moment I do not have not as much time as I want (had my last week of vacation now).

Posted : December 3, 2016 9:00 pm

Jarlve

(@jarlve)

Posts: 2547

Famed Member

Would that mean the command line executable has to be restarted for each new variation your main program summons? And thus ngrams have to be reloaded time upon time or can you keep it in memory somehow. Thanks for your ciphers.

For the first version, unfortunately yes. Later I will implement shared memory or named pipes so that the ngrams and the executable needs to be loaded only once. There are also some ways to call c++ directly from python but I have never tried that before. But the first version will support batch solving. I plan to pass most of the gui options via command line parameters. But at the moment I do not have not as much time as I want (had my last week of vacation now).

It looks like you know allot more about programming than I do. How is it going?

AZdecrypt

Posted : December 11, 2016 2:34 pm

Largo

(@largo)

Posts: 454

Honorable Member

At the moment I am working on a "proof of concept" for the solver. First I have tried to port the AZDecrypt code to C++. Unfortunately I have some problems at some points since the FreeBasic source I have is not complete. If you don’t mind I would ask you for your help at some time.
For the moment I have decided to develop an own solver. AZDecrypt is very powerful and incredibly fast (great work!). But it might be a good idea to have another solver with some different approaches too.

Current status of the project:
In the past few days I wrote some code to generate scored ngram corpora from arbitrary text books (e.g. "War And Peace" from Leo Tolstoy). Also I have tried different solving methods. To keep things simple my test case is just a monoalphabetic substitution cipher instead of a homophonic one. Genetic algorithms work well but they are a bit too slow. So I have tried a basic hillclimber which was ok. Next I will give simulated annealing a try. As soon as I get some stable and fast results I will keep my focus on the basic concept of the solver. That is:

What’s next?
I will implement the solver as a server. This means that you can send a cipher via TCP/IP to the solver and get back the score and plaintext. The server can run on local host or on any PC in the Network. This gives you the ability to distribute the workload to all your available hardware. If the solver listens on local host the data transfer should be faster than reading and writing files (I hope so!). It should also be possible to send a batch of ciphers at once.
This means that a user of the solver is not bound to a specific programming language. E.g. you can create test ciphers in FreeBasic, Python, C++ or whatever language you prefer. All you have to do is to send your ciphers via TCP/IP and listen for the results.

I am pretty sure that I can implement all that server stuff. But I also think that other people are much more experienced and skilled in the solver related things. So I think that I will share my project via github as soon as it is in alpha/beta state so that others can contribute.

The only problem is that I am very busy at the moment. So it will take some time…

Posted : December 12, 2016 11:20 pm

Jarlve

(@jarlve)

Posts: 2547

Famed Member

At the moment I am working on a "proof of concept" for the solver. First I have tried to port the AZDecrypt code to C++. Unfortunately I have some problems at some points since the FreeBasic source I have is not complete. If you don’t mind I would ask you for your help at some time.
For the moment I have decided to develop an own solver. AZDecrypt is very powerful and incredibly fast (great work!). But it might be a good idea to have another solver with some different approaches too.

I’m glad to hear you decided to develop your own solver.

Something I like to think of with solvers is that there exists an intelligence spectrum. Adding more intelligence means that the solver will find the solution in less amount of iterations, but the iterations themself will take longer. I think that AZdecrypt is clearly not-intelligent and I focused heavily on simplicity and cost-effective intelligence mostly in the form of normalization. Is intelligence really intelligent? Or is it just the speed at which it can simulate?

I will implement the solver as a server. This means that you can send a cipher via TCP/IP to the solver and get back the score and plaintext. The server can run on local host or on any PC in the Network. This gives you the ability to distribute the workload to all your available hardware. If the solver listens on local host the data transfer should be faster than reading and writing files (I hope so!). It should also be possible to send a batch of ciphers at once.
This means that a user of the solver is not bound to a specific programming language. E.g. you can create test ciphers in FreeBasic, Python, C++ or whatever language you prefer. All you have to do is to send your ciphers via TCP/IP and listen for the results.

Sounds great!

AZdecrypt

Posted : December 13, 2016 1:41 pm

Largo

(@largo)

Posts: 454

Honorable Member

Just a short project update:

I have implemented the basic functionality of my solver system. The main part is a C++ command line tool which listens on a specific network port for incoming text messages. If a message is received it tries to solve it by simulated annealing. As mentioned before I have implemented only a solver for monoalphabetic substitutions for now. When the text is decrypted the solver sends back the plaintext and the calculated score.
For the ngrams I use a corpus based on "war and peace" from project Gutenberg. Any corpus can be used as long as it contains only letters from A to Z. Numbers are not supported at the moment. The ngrams are generated on the fly when the tool is started. The ngram generation is quite fast (creating 6grams from a 2.5 MB text corpus takes less than 0.5 seconds). Of course this has only be done once.
You can run as many solvers (even on different computers) as you want. All you have to do is to setup some threading mechanism in your client which puts the cipher texts in a queue.
My solver uses the following library: http://zeromq.org
This means that it does not matter which programming language you prefer. Just use the bindings provided from zeromq, send your cipher and receive the result. I have implemented clients in python and c++. If I find the time I will create a FreeBasic sample as well if you are interested Jarlve.
The advantage of such a solver is that you can use it programmatically instead of using a gui. This gives you the ability to use it as a part of a genetic algorithm or whatever algorithm you prefer. If you need some computing power for brute force just start as much solvers on as many machines as possible

Here are some screenshots from my "Agapeyeff"-Project – (which is another yet unsolved cipher):

Solver (Server-Part, C++) running on my PC:

Solver (Client-Part, Python) running on my Mac:

Posted : December 25, 2016 6:12 pm

doranchak

(@doranchak)

Posts: 2614

Member Admin

Really nice work, Largo – thanks so much for sharing this! I really like the idea of having all these different solvers available to us. It’d be nice to run them in parallel to attack ciphers, especially if one of them can find a solution that the others can’t.

http://zodiackillerciphers.com

Posted : December 25, 2016 6:18 pm

Jarlve

(@jarlve)

Posts: 2547

Famed Member

Looks very nice Largo. You are very well organized.

AZdecrypt

Posted : December 25, 2016 8:10 pm

anderson110

(@anderson110)

Posts: 55

Trusted Member

My solver uses the following library: [
The advantage of such a solver is that you can use it programmatically instead of using a gui. This gives you the ability to use it as a part of a genetic algorithm or whatever algorithm you prefer.

This sounds great and just like what I am looking for. Is the initial code up on github?

Posted : May 7, 2018 6:32 pm

Largo

(@largo)

Posts: 454

Honorable Member

This sounds great and just like what I am looking for. Is the initial code up on github?

Unfortunately not. A couple of weeks ago I stopped working on z340 and at the moment I don’t have time to refactor my source code. If I find some time I can pick the most important (server/client) part and post it. But atm I can’t promise.

Posted : May 7, 2018 7:56 pm

Largo

(@largo)

Posts: 454

Honorable Member

Here is a link to a github repository that I’ve used as a starting point:

https://github.com/leejacobson/SimulatedAnnealingTSP

The corresponding article (very informative and well written):
http://www.theprojectspot.com/tutorial- … eginners/6

You should be able to adapt the source code so that it can solve a monoalphabetic substitution. That’s how I started.

Here is some code to create NGrams from a plaintext file (plaintext should consinst only of letters A-Z in upper case, otherwise use a regex when reading the file)

// --------------------------------------------------------------------------------------
struct NGramInfo
{
	int count;
	double logValue;

	NGramInfo(int count, float logValue)
	{
		this->count = count;
		this->logValue = logValue;
	}
};

typedef unordered_map<string, NGramInfo> NGramMapDescription;
typedef NGramMapDescription::iterator NGramMapIterator;
NGramMapDescription m_NGramMap;


// --------------------------------------------------------------------------------------
void Solver::createNGramMapFromFile(string filename, int nGramLength)
{
	cout << "Loading ngrams from: "" << filename << """ << endl;
	cout << "NGram length: " << nGramLength << endl;
	m_nGramLength = nGramLength;
	m_totalNGramCount = 0;
	m_uniqueNGramCount = 0;

	FILE *fileptr;
	char *buffer;
	long filelen;

	// Funky Funky Oldschool =)
	fileptr = fopen(filename.c_str(), "rb"); // We don't use fopen_s because the code must be platform independent
	fseek(fileptr, 0, SEEK_END);
	filelen = ftell(fileptr);
	rewind(fileptr);

	// Read the entire file into a buffer
	buffer = (char *)malloc((filelen + 1) * sizeof(char));
	fread(buffer, filelen, 1, fileptr);
	fclose(fileptr);

	int nGramCount = filelen / nGramLength;	
	int nGramEntryLength = sizeof(int) + nGramLength;

	char *pTemp = new char[nGramLength];
	NGramMapIterator it;

	// Create a map which contains all unique ngrams and their counts
	for (int i = 0; i < filelen - nGramLength; i++)
	{
		m_totalNGramCount++; // TODO: this is +1 in original python code. Check!

		memcpy(pTemp, buffer + i, nGramLength);
		
		it = m_NGramMap.find(pTemp);

		if (it != m_NGramMap.end())
		{
			it->second.count++;
		}
		else
		{
			m_NGramMap.insert(make_pair(pTemp, NGramInfo(1, 0)));
			m_uniqueNGramCount++;
		}
	}

	// Calculate the logarithm of each ngram frequency in the corpus.
	it = m_NGramMap.begin();

	while (it != m_NGramMap.end())
	{
		it->second.logValue = log((double)it->second.count / m_totalNGramCount);
		it++;
	}

	// Calculate a default value for any ngram that does not appear in the corpus
	m_minLogFrequency = log(0.01 / m_totalNGramCount);

	cout << "NGrams loaded!" << endl;

	cout << "Total number of ngrams: " << m_totalNGramCount << endl;
	cout << "Number of unique ngrams: " << m_uniqueNGramCount << endl;

}

I hope this helps a bit

Posted : May 7, 2018 8:11 pm

anderson110

(@anderson110)

Posts: 55

Trusted Member

Thank you!

Posted : May 7, 2018 9:04 pm

Zodiac Discussion Forum