Zodiac Discussion Forum

 

Homophonic substitution

1,434 Posts
21 Users
0 Reactions
326 K Views
doranchak
(@doranchak)
Posts: 2614
Member Admin
 

@Mr lowe: I think that’s a good idea. We would need a good way to automatically identify high-quality sentence fragments within streams of non-delimited plaintexts, even if there are a few decoding errors/misspellings. Those sorts of tasks might be better suited to algorithms that aren’t strictly based on ngram counts and that capture more abstract features of the English language.


http://zodiackillerciphers.com

 
Posted : December 10, 2015 3:14 pm
Jarlve
(@jarlve)
Posts: 2547
Famed Member
Topic starter
 

Eventually I would like to understand how you compute the measurement so I can implement it too.

I wrote a speed-oriented function (FreeBASIC) for you which outputs the score. I then also calculate the percentage difference between the cipher’s score and 1000 randomizations of it, which gives a number close to 180 for the 340. I haven’t taken a second look at the calculation logic yet and it will probably have its own weaknesses. But it is something different that you may want to try.
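A minimal Python sketch of the randomization comparison, assuming that "percentage difference" means 100 × (cipher score - mean randomized score) / mean randomized score and that a randomization is a shuffle of the cipher's symbols. The post does not give the exact formula, and the function name `percent_above_random` is hypothetical.

```python
import random
from typing import Callable, Sequence


def percent_above_random(cipher: Sequence, score_fn: Callable[[list], float],
                         trials: int = 1000, seed: int = 0) -> float:
    """Percent by which score_fn(cipher) exceeds the mean score of shuffled copies.

    Assumed definition: 100 * (observed - mean) / mean.
    """
    rng = random.Random(seed)
    observed = score_fn(list(cipher))
    shuffled = list(cipher)
    total = 0.0
    for _ in range(trials):
        rng.shuffle(shuffled)  # keeps symbol frequencies, destroys positional structure
        total += score_fn(shuffled)
    mean = total / trials
    if mean == 0:
        raise ValueError("mean randomized score is zero")
    return 100.0 * (observed - mean) / mean
```

Because a shuffle preserves symbol counts, the comparison isolates the effect of symbol order, which is what a cycle measurement is meant to detect.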

Raw scores with weight 5:
340: 2152
408 (capped 340): 2861

Example call for the 340 cipher: m_2s_cycles(cipher(),340,63,5). A higher weight puts more emphasis on cycle quality in the score; I usually go with 5 here.

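'Cycle score: for each pair of distinct symbols, take the order in which the two
'symbols occur (e.g. ABAB or AABB) and reward pairs that alternate perfectly.
'cipher() must be indexed from 1 to total_symbols.
function m_2s_cycles(cipher() as short,byval total_symbols as short,byval unique_symbols as short,byval weight as single)as double
	dim as integer i,j,e,u,cs1,cs2
	dim as short cycle(unique_symbols,unique_symbols,1 to 100) 'per pair: interleaved sequence of its two symbols (at most 100 entries)
	dim as short cycle_length(unique_symbols,unique_symbols) 'per pair: length of that sequence
	dim as short ident(1000),ident_count(1000) 'ident: symbol -> number by first appearance; ident_count is unused
	dim as short alternations
	dim as double score,alt_per_cycle_length 'alt_per_cycle_length is unused
	'pass 1: renumber each symbol by first appearance and append it to every pair it belongs to
	for i=1 to total_symbols
		if ident(cipher(i))=0 then
			u+=1
			ident(cipher(i))=u
			e=u
		else
			e=ident(cipher(i)) 
		end if
		ident_count(cipher(i))+=1
		for j=1 to unique_symbols
			cycle_length(e,j)+=1
			cycle_length(j,e)+=1
			cycle(e,j,cycle_length(e,j))=e
			cycle(j,e,cycle_length(j,e))=e
		next j	
	next i
	'pass 2: for each unordered pair, count alternations (changes between consecutive entries);
	'a sequence of length n alternates perfectly when it has n-1 alternations
	for cs1=1 to unique_symbols
		for cs2=cs1+1 to unique_symbols
			for i=1 to cycle_length(cs1,cs2)-1 
				if cycle(cs1,cs2,i)<>cycle(cs1,cs2,i+1) then alternations+=1
			next i 
			'add (n-1)*(alternations/(n-1))^weight; a higher weight penalizes imperfect cycles more
			if alternations>0 then score+=(cycle_length(cs1,cs2)-1)*((alternations/(cycle_length(cs1,cs2)-1))^weight)
			alternations=0
		next cs2
	next cs1
	return score
end function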
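For readers not using FreeBASIC, here is a hedged Python sketch of the same pairwise cycle score as I read the function above: for each pair of distinct symbols, take their interleaved sequence, count the alternations, and add (n-1)·(alternations/(n-1))^weight, where n is the length of that sequence. It works on 0-based Python sequences, so the 1-indexing requirement of the FreeBASIC version does not apply. It has not been checked against the raw scores quoted in this thread.

```python
from itertools import combinations
from typing import Hashable, Sequence


def m_2s_cycles(cipher: Sequence[Hashable], weight: float = 5.0) -> float:
    """Sketch port of the pairwise cycle (alternation) score."""
    symbols = list(dict.fromkeys(cipher))  # distinct symbols in order of first appearance
    score = 0.0
    for a, b in combinations(symbols, 2):
        # interleaved sequence of the two symbols, e.g. A B A A B
        cyc = [s for s in cipher if s == a or s == b]
        n = len(cyc) - 1  # number of adjacent transitions
        alternations = sum(1 for x, y in zip(cyc, cyc[1:]) if x != y)
        if alternations > 0:
            score += n * (alternations / n) ** weight
    return score
```

With weight=1 the score reduces to the total number of alternations; larger weights make a pair that alternates perfectly (ABAB...) count for much more than one that alternates only partly (AABB...).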

AZdecrypt

 
Posted : December 10, 2015 10:35 pm
Jarlve
(@jarlve)
 

On the one hand, it makes me think 20900 is significant. On the other hand, if the 1000+ transpositions I tested are all behaving as random text, 20900 is expected to appear as an outlier.

Given your results I don’t think 20900 is significant. Some days ago I did a test comparing the 340 with a cipher that smokie had transposed along 55 different "period 19" lines, and the 340 didn’t show promising results.
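A small Python simulation of the point about 20900 appearing as an outlier, assuming (for illustration only) that the scores of transpositions behaving like random text are independent and normally distributed. The largest of 1000 such scores typically lands roughly 3 standard deviations above the mean, so one high value among 1000+ candidates is not, by itself, evidence of significance. The function name is hypothetical.

```python
import random
import statistics


def max_z_of_random_scores(n: int, seed: int = 1) -> float:
    """z-score of the largest of n random scores (normal stand-in for random text)."""
    rng = random.Random(seed)
    scores = [rng.gauss(0.0, 1.0) for _ in range(n)]
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    return (max(scores) - mean) / sd
```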

I have been thinking it over and have narrowed it down to these (in no particular order). The bigram period 19 thing:

1. is a fluke.
2. relates to transposition and the cipher is not in English.
3. relates to a special encoding process (something like what _pi mentioned).
4. relates to transposition but with some misalignments, or an unexpected distribution.
5. relates to transposition and the cipher is also somewhat polyalphabetic (wildcards?).

Or even combinations of the above. Personally, I think option 1 is the least likely. Beyond that, I don’t know. But I think we should try to find out whether misalignment could be the culprit for the low-scoring results.
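For reference, a minimal Python sketch of one common way to count period-p bigrams: pair each symbol with the symbol p positions later and count how many of those pairs repeat. The counting convention (each extra occurrence of a pair adds 1) and the function names are assumptions; other tools may count differently.

```python
from collections import Counter
from typing import Dict, Sequence


def repeated_bigrams_at_period(cipher: Sequence, period: int) -> int:
    """Count repeats among the pairs (cipher[i], cipher[i + period])."""
    pairs = Counter(zip(cipher, cipher[period:]))
    return sum(count - 1 for count in pairs.values() if count > 1)


def period_profile(cipher: Sequence, max_period: int) -> Dict[int, int]:
    """Repeated-bigram count for every period from 1 to max_period."""
    return {p: repeated_bigrams_at_period(cipher, p) for p in range(1, max_period + 1)}
```

Period 1 gives ordinary bigram repeats; a profile over, say, periods 1 to 30 shows whether one period stands out from the rest.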



 
Posted : December 10, 2015 10:57 pm
smokie treats
(@smokie-treats)
Posts: 1626
Noble Member
 


I will provide some basic discussion about whether the 340 period 19 statistics are a fluke soon.


 
Posted : December 11, 2015 4:43 am
doranchak
(@doranchak)
 

I wrote a speed-oriented function (FreeBASIC) for you which outputs the score.

Excellent; thanks for sharing it! I will work on an implementation of it.



 
Posted : December 11, 2015 3:15 pm
doranchak
(@doranchak)
 

Given your results I don’t think 20900 is significant.

I’m inclined to agree.


I will keep working on the transposition explorer for a while. My hope is to locate candidates that have multiple measurements that seem to peak together, or to happen upon a candidate that scores highly in azdecrypt. But I fear I may be forced to return to the "cipher generator" approach, which systematically excludes specific encoding methods. An extremely tedious and time-consuming approach.

One more thing I’m wondering is if it is possible to fully automate azdecrypt. For example, from my transposition explorer, I can produce a list of 100 candidates and feed them into azdecrypt’s input directory, and use the resulting scores to automatically direct my search to more promising candidates. But I would need to figure out how to tell azdecrypt to start its tasks automatically and exit when it’s done. Is this feasible?



 
Posted : December 11, 2015 3:26 pm
doranchak
(@doranchak)
 

Raw scores with weight 5:
340: 2152
408 (capped 340): 2861

My implementation’s results are:

340: 2144
408: 2873
first 340 of 408: 2861

Looks pretty close. I wonder why my 340 is off from yours a little. Maybe rounding errors? *shrug*



 
Posted : December 11, 2015 5:56 pm
Jarlve
(@jarlve)
 

I will provide some basic discussion about whether the 340 period 19 statistics are a fluke soon.

Looking forward to it.

One more thing I’m wondering is if it is possible to fully automate azdecrypt. For example, from my transposition explorer, I can produce a list of 100 candidates and feed them into azdecrypt’s input directory, and use the resulting scores to automatically direct my search to more promising candidates. But I would need to figure out how to tell azdecrypt to start its tasks automatically and exit when it’s done. Is this feasible?

I have thought along similar lines. How about this: I add a new mode where AZdecrypt periodically scans the Ciphers directory for files (wait mode). When a file is found, it exits wait mode, processes it into the Results directory along with the input file, and then returns to wait mode.
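A rough Python sketch of the proposed wait mode, for illustration only: the `.txt` extension, the `_result` file naming and the `solve` callback are hypothetical stand-ins, not AZdecrypt's actual behavior.

```python
import time
from pathlib import Path
from typing import Callable


def process_pending(ciphers_dir: Path, results_dir: Path,
                    solve: Callable[[str], str]) -> int:
    """One scan of a hypothetical wait mode: solve every *.txt file in ciphers_dir,
    write its result to results_dir and move the input file there as well.
    Returns the number of files handled."""
    results_dir.mkdir(parents=True, exist_ok=True)
    handled = 0
    for path in sorted(ciphers_dir.glob("*.txt")):
        result = solve(path.read_text())
        (results_dir / f"{path.stem}_result.txt").write_text(result)
        path.replace(results_dir / path.name)  # input file travels with its result
        handled += 1
    return handled


def wait_mode(ciphers_dir: Path, results_dir: Path,
              solve: Callable[[str], str], poll_seconds: float = 2.0) -> None:
    """Poll forever: process whatever has arrived, then go back to waiting."""
    while True:
        process_pending(ciphers_dir, results_dir, solve)
        time.sleep(poll_seconds)
```

Because a scan can pick up a file that is still being written, the producer would ideally write each cipher under a temporary name and rename it into the Ciphers directory once it is complete.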

Looks pretty close. I wonder why my 340 is off from yours a little. Maybe rounding errors? *shrug*

Very strange. I ran my program without any optimizations and special stuff and it still gave the same number. I guess the Sherlock Holmes approach would be to consider your 340 different from mine. But it may be a FreeBASIC problem; if so, I would like to report it.



 
Posted : December 11, 2015 10:14 pm
Jarlve
(@jarlve)
 

A curious little thing: I checked for the possibility of column or row filler with my m_2s_cycles measurement. I found rows 12 and especially 14 to be the best candidates (all columns gave more or less weak results). Rows 12 and 14 happen to be pivot rows.
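A hedged Python sketch of this kind of filler check. It assumes the check means deleting one row or one column at a time (17 columns for the 340), re-reading the remaining symbols in row order and rescoring them. `score_fn` stands in for the m_2s_cycles measurement, and the function name is hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def rank_line_removals(cipher: Sequence, width: int,
                       score_fn: Callable[[list], float]) -> List[Tuple[str, int, float]]:
    """Score the cipher with each single row or column removed, best first.
    Rows and columns are numbered from 1."""
    n_rows = (len(cipher) + width - 1) // width
    results = []
    for r in range(n_rows):
        kept = [s for i, s in enumerate(cipher) if i // width != r]
        results.append(("row", r + 1, score_fn(kept)))
    for c in range(width):
        kept = [s for i, s in enumerate(cipher) if i % width != c]
        results.append(("col", c + 1, score_fn(kept)))
    results.sort(key=lambda t: t[2], reverse=True)
    return results
```

If a row were pure filler, removing it should raise a structure-sensitive score more than removing any other row, which is the pattern reported for rows 12 and 14.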



 
Posted : December 11, 2015 10:32 pm
doranchak
(@doranchak)
 

I have thought along similar lines. How about this: I add a new mode where AZdecrypt periodically scans the Ciphers directory for files (wait mode). When a file is found, it exits wait mode, processes it into the Results directory along with the input file, and then returns to wait mode.

That would be great! For my search to resume, it will need to detect when azdecrypt has returned to wait mode, perhaps by detecting a message in one of the output files, or the presence of a file made specifically when azdecrypt is completely done with the current batch of inputs.
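A minimal Python sketch of the completion check suggested here, assuming AZdecrypt would write a marker file once a batch is finished. The marker name `batch_done.flag` and the function name are hypothetical.

```python
import time
from pathlib import Path


def wait_for_marker(marker: Path, timeout: float, poll: float = 0.5) -> bool:
    """Poll until `marker` exists or `timeout` seconds pass; True if it appeared."""
    deadline = time.monotonic() + timeout
    while True:
        if marker.exists():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

A search loop could write its candidates, call `wait_for_marker(results_dir / "batch_done.flag", timeout=3600)`, read the scores, delete the marker and then pick the next batch. A timeout keeps the search from hanging if the solver stops.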

Looks pretty close. I wonder why my 340 is off from yours a little. Maybe rounding errors? *shrug*

Very strange. I ran my program without any optimizations and special stuff and it still gave the same number. I guess the Sherlock Holmes approach would be to consider your 340 different from mine. But it may be a FreeBASIC problem; if so, I would like to report it.

There’s a good chance of a bug in my implementation, since I haven’t studied your algorithm to fully understand it (I focused only on blindly porting it). But here’s the output of the variables when my implementation finishes its run on the 340:

http://www.zodiackillerciphers.com/jarl … ements.txt

Maybe you can notice some problem in there. Note: To keep the output manageable, the two big arrays’ contents are only shown when their values are greater than 0.



 
Posted : December 11, 2015 10:38 pm
doranchak
(@doranchak)
 

A curious little thing: I checked for the possibility of column or row filler with my m_2s_cycles measurement. I found rows 12 and especially 14 to be the best candidates (all columns gave more or less weak results). Rows 12 and 14 happen to be pivot rows.

That’s very interesting. I still need to add some "remove filler" and "create misalignments" operators to my transposition explorer, so it can try to evaluate a wide variety of combinations.



 
Posted : December 11, 2015 10:41 pm
Jarlve
(@jarlve)
 

Maybe you can notice some problem in there.

First of all, you can remove anything related to the ident_count array because I just noticed it is unused (oops). My algorithm also enumerates the symbols; see the remark in the next piece of code for what I think is going wrong.

if ident(cipher(i))=0 then
	u+=1 ' <--- I think you moved this line below the next line
	ident(cipher(i))=u
	e=u
else
	e=ident(cipher(i)) 
end if

Because your ident starts with 40:

ident: [40, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 
27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 
53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63,

cipher: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 4, 18, 19, 20, 21, 22, 23, 24, 25, 
26, 27, 28, 29, 30, 31, 32, 19, 33, 34, 35, 36, 18, 37, 38, 14, 25, 20, 32, 12, 21, 39, 0, 40, 41,


 
Posted : December 12, 2015 1:13 pm
Jarlve
(@jarlve)
 

I will keep working on the transposition explorer for a while. My hope is to locate candidates that have multiple measurements that seem to peak together, or to happen upon a candidate that scores highly in azdecrypt. But I fear I may be forced to return to the "cipher generator" approach, which systematically excludes specific encoding methods. An extremely tedious and time-consuming approach.

Both your projects are worth their while and universal. Great work. When your transposition explorer is somewhat worked out I’d like to feed various transposition ciphers to it with increasing difficulty. Nerdgasm!



 
Posted : December 12, 2015 1:30 pm
Jarlve
(@jarlve)
 

@smokie,

The main thread has been updated with your latest ciphers, but I seem to have lost smokie15 (I searched the thread), and I don’t think you’ve published smokie16b, c and d yet because you’re still doing analysis on them. Thanks.



 
Posted : December 12, 2015 2:29 pm
doranchak
(@doranchak)
 

Because your ident starts with 40:

Oh! Thanks for pointing that out. Just discovered the reason for that: My cipher array was indexed starting at 0, and all the other arrays start at 1. After fixing that, my score for the 340 becomes 2151 which only differs from yours by 1.
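A small Python illustration of this kind of indexing mismatch, using a hypothetical helper `first_appearance_ids`. If the enumeration loop starts at index 1 of an array whose data starts at index 0, the first symbol is skipped and only gets a number at its next occurrence. This is consistent with the dump above, where symbol 0 was assigned id 40.

```python
from typing import Dict, Hashable, Sequence


def first_appearance_ids(cipher: Sequence[Hashable], start: int = 0) -> Dict[Hashable, int]:
    """Number symbols 1, 2, 3, ... in order of first appearance, reading from index `start`."""
    ident: Dict[Hashable, int] = {}
    for s in cipher[start:]:
        if s not in ident:
            ident[s] = len(ident) + 1
    return ident
```

For example, `first_appearance_ids([0, 1, 2, 0])` numbers symbol 0 first, while `first_appearance_ids([0, 1, 2, 0], start=1)` (a 1-based loop over 0-based data) numbers it last.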

Both your projects are worth their while and universal. Great work. When your transposition explorer is somewhat worked out I’d like to feed various transposition ciphers to it with increasing difficulty. Nerdgasm!

Thanks. And some test ciphers would be great. Not sure how good the algorithm will be on them but it will be a good exercise to improve its ability to find the correct transpositions. I worry about the vastness of the search space.

Recently I restored a Merge operation which randomly picks two symbols A and B and merges them together (replaces all occurrences of B with A). In this way it tries to guess which symbols stand for the same letters. My hope is that a correct selection leads to improved ngrams/fragments/cycle scores. But in practice it might just lead to too many false positives.
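A hedged Python sketch of a Merge operation as described: pick two distinct symbols at random and replace every occurrence of B with A. The uniform random choice, the function name and the returned pair are assumptions for illustration, not the transposition explorer's actual code.

```python
import random
from typing import Hashable, List, Optional, Sequence, Tuple


def merge_random_symbols(cipher: Sequence[Hashable], rng: random.Random
                         ) -> Tuple[List[Hashable], Optional[Tuple[Hashable, Hashable]]]:
    """Pick symbols A != B uniformly at random and replace every B with A.

    Returns the merged cipher and the (A, B) pair, or None if fewer than two symbols exist.
    """
    symbols = sorted(set(cipher))
    if len(symbols) < 2:
        return list(cipher), None
    a, b = rng.sample(symbols, 2)
    return [a if s == b else s for s in cipher], (a, b)
```

Each merge reduces the symbol count by one, so a search would typically keep a merge only if the ngram, fragment or cycle scores improve. Given the false-positive concern above, scores would also need to be compared against merges on randomized text.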



 
Posted : December 12, 2015 2:58 pm
Page 45 / 96