Homophonic substitution

Jarlve · 2015-08-02T16:42:57Z

This thread is a continuation of viewtopic.php?f=81&t=267 in which several aspects of the Zodiac 340 cipher are discussed and researched. I'd like to continue the work from there in this thread because then I can use the main post to reference and update all the cipher material being discussed.

Some of the questions which the contributors are trying to answer:

- Is the 340 a straightforward homophonic substitution cipher or is there something else going on?
- The 340 does not seem to cycle as well as the 408, what is going on? (doranchak:... _sequences)
- To what extent is the 340 cyclic or random? Can we find areas - as for instance with the last part of the 408 - that are more random?
- Is it possible to attribute the 340 not cycling as well as the 408 (despite its higher symbol count) to some transposition after encoding?
- Some of the medium-high count symbols do not seem to cycle well, are these possibly wildcards/polyalphabetic or 1:1 substitutes? (smokie treats)
- Can we make a system that can adequately group homophones that belong to the same letter without having to solve the cipher? (smokie treats, glurk)
- Is there a discrepancy between symbols/cycles/etc on odd and even positions for the 340? If so, what could be causing this? (daikon, doranchak, smokie treats)
- There is a significant bigram repeat peak at period 19, is this a lead to the encryption scheme of the 340? (daikon)

Related:

- 2 symbol cycle analysis for the 340 evens only. (doranchak)
- 2 symbol cycle analysis for the 340 odds only. (doranchak)
- Symbol position factors for the 340, 408 and smokie ciphers. (doranchak)

340 cipher numeric and symbolic version:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
18 5 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33
20 34 35 36 37 19 38 39 15 26 21 33 13 22 40 1 41
42 5 5 43 7 6 44 30 8 45 5 23 19 19 3 31 16
46 47 37 19 40 48 49 17 11 50 51 9 19 52 53 10 54
5 44 3 7 51 6 23 55 30 17 56 10 51 4 16 25 21
22 50 19 31 57 24 58 16 38 36 59 15 8 28 40 13 11
21 15 16 41 32 49 22 23 19 46 18 27 40 19 60 13 47
17 29 37 19 61 19 39 3 16 51 20 36 34 62 63 53 31
55 40 6 38 8 19 7 41 19 23 5 43 29 51 20 34 55
38 19 3 54 50 48 2 11 25 27 20 5 61 14 37 31 23
16 29 36 6 3 41 11 30 50 14 53 37 28 19 52 20 51
40 63 47 42 34 22 19 18 11 50 51 20 36 21 58 44 3
6 15 51 18 7 32 50 16 53 61 28 36 8 53 48 19 19
34 20 59 12 30 35 53 47 56 2 4 8 38 39 50 55 19
11 36 28 45 40 20 31 21 23 5 7 28 32 37 57 15 16
3 36 14 19 13 12 63 56 29 19 51 6 26 20 11 33 13
19 19 33 26 56 40 26 36 9 23 42 1 14 54 21 33 5
11 51 10 17 26 29 43 48 20 46 27 23 20 30 55 56 36
4 37 25 1 18 5 10 42 40 39 23 44 62 11 31 58 19

HER>pl^VPk|1LTG2d
Np+B(#O%DWY.<*Kf)
By:cM+UZGW()L#zHJ
Spp7^l8*V3pO++RK2
_9M+ztjd|5FP+&4k/
p8R^FlO-*dCkF>2D(
#5+Kq%;2UcXGV.zL|
(G2Jfj#O+_NYz+@L9
d<M+b+ZR2FBcyA64K
-zlUV+^J+Op7<FBy-
U+R/5tE|DYBpbTMKO
2<clRJ|*5T4M.+&BF
z69Sy#+N|5FBc(;8R
lGFN^f524b.cV4t++
yBX1*:49CE>VUZ5-+
|c.3zBK(Op^.fMqG2
RcT+L16C<+FlWB|)L
++)WCzWcPOSHT/()p
|FkdW<7tB_YOB*-Cc
>MDHNpkSzZO8A|K;+

Alterations of the 340:

- In relation to the bigram peak at period 19:
  - Scheme: move 1 row down, 2 columns right and repeat (wrap around cipher): 340_1rd-2cr-w.txt (doranchak)
  - Grid 19 by 18, direction North-East (vertical) and 2 "?" symbols added: 340_19by18_n-e.txt
  - Grid 20 by 17, direction SW-SE (diagonal): 340_20by17_sw-se.txt
  - Grid 17 by 19, 17 symbols filler at end, vertically untransposed: 340_323_17.txt (smokie treats)
  - Grid 17 by 20, 16 symbols filler at end, vertically untransposed: 340_324_16.txt (smokie treats)
  - Grid 17 by 20, 15 symbols filler at end, vertically untransposed: 340_325_15.txt (smokie treats)
  - Grid 17 by 20, 14 symbols filler at end, vertically untransposed: 340_326_14.txt (smokie treats)
  - Grid 17 by 20, 13 symbols filler at end, vertically untransposed: 340_327_13.txt (smokie treats)
- In relation to the odd/even encoding scheme:
  - Evens only: 340evens.txt
  - Odds only: 340odds.txt
  - Randomized, shuffled: 340shuffled.txt (doranchak)

Tools/links/solvers:

- David Oranchak Zodiac Killer Ciphers:Zodiac Ciphers wiki:... =Main_Page CryptoScope:340 Webtoy:Zodiac Pattern Drawer:| (info) Word Search Gadget:
- glurk ZKDecrypto:and viewtopic.php?f=81&t=2268
- Michael Cole The Zodiac Revisited:
- Jarlve AZdecrypt:

Visualizations:

- In relation to the bigram peak at period 19 and 15 (mirrored 340):
  - Doranchak's ngram viewer.
  - Doranchak's period calculator.
  - Doranchak's fragment explorer.

Test ciphers:

I'd like to introduce a whole new range of ciphers to test on, mainly being homophonic substitution but with different schemes. More will be added and particular schemes can be requested. All of these ciphers can have low count 1:1 substitutes. Please use the proper names of the ciphers when referencing them. There should be no errors in these ciphers but the number of homophones per letter were handpicked each time to introduce a human element.

Perfect cycles: c_p1.txt c_p2.txt c_p3.txt

Randomization of cycles: (the numb...

doranchak

(@doranchak)

Posts: 2614

Member Admin

@Mr lowe: I think that’s a good idea. We would need a good way to automatically identify high-quality sentence fragments within streams of non-delimited plaintexts, even if there are a few decoding errors / misspellings. Those sorts of tasks might be better suited to algorithms that aren’t strictly based on ngram counts, and capture more abstract features of the English language.

http://zodiackillerciphers.com

Posted : December 10, 2015 3:14 pm

Jarlve

(@jarlve)

Posts: 2547

Famed Member

Topic starter

Eventually I would like to understand how you compute the measurement so I can implement it too.

I wrote a speed orientated function (FreeBASIC) for you which outputs the score. I then also calculate the percentage difference between the cipher’s score and 1000 randomizations of it, which gives a number close to 180 for the 340. I haven’t taken a second look at the calculation logic yet and it will probably have its own weaknesses. But it is something different you may want to give a try.
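The comparison against randomizations could be sketched in Python as below. The post does not say how the percentage is formed, so this assumes it is 100 × (cipher score) / (mean score of the shuffles); `percent_of_shuffled_mean` and its parameters are hypothetical names, and `score_fn` stands in for whatever measurement is being tested.

```python
import random
from statistics import mean


def percent_of_shuffled_mean(cipher, score_fn, trials=1000, seed=0):
    """Return the cipher's score as a percentage of the mean shuffled score.

    Assumption: "percentage difference" is read here as a simple ratio
    times 100; the thread does not spell out the exact formula.
    """
    rng = random.Random(seed)          # fixed seed keeps runs repeatable
    symbols = list(cipher)
    actual = score_fn(symbols)

    shuffled_scores = []
    for _ in range(trials):
        shuffled = symbols[:]          # shuffle a copy, keep the original order
        rng.shuffle(shuffled)
        shuffled_scores.append(score_fn(shuffled))

    baseline = mean(shuffled_scores)
    if baseline == 0:
        raise ValueError("mean shuffled score is zero; ratio is undefined")
    return 100.0 * actual / baseline
```

Under this reading, a value near 100 would mean the cipher scores like its own shuffles, and a value near 180 would mean it scores well above them.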

Raw scores with weight 5:
340: 2152
408 (capped 340): 2861

Example call for the 340 cipher: m_2s_cycles(cipher(),340,63,5). A higher weight increases the score’s emphasis on cycle quality; I usually go with 5 here.

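function m_2s_cycles(cipher() as short,byval total_symbols as short,byval unique_symbols as short,byval weight as single)as double
	' cycle(a,b,k): k-th symbol (a or b) in the merged sequence of symbols a and b
	' cycle_length(a,b): number of entries stored so far in cycle(a,b,...)
	' note: the third dimension caps each merged sequence at 100 entries
	dim as integer i,j,e,u,cs1,cs2
	dim as short cycle(unique_symbols,unique_symbols,1 to 100)
	dim as short cycle_length(unique_symbols,unique_symbols)
	dim as short ident(1000),ident_count(1000)
	dim as short alternations
	dim as double score,alt_per_cycle_length
	for i=1 to total_symbols
		' renumber symbols 1..unique_symbols in order of first appearance
		if ident(cipher(i))=0 then
			u+=1
			ident(cipher(i))=u
			e=u
		else
			e=ident(cipher(i))
		end if
		ident_count(cipher(i))+=1
		' append the current symbol to the sequence of every pair it belongs to
		for j=1 to unique_symbols
			cycle_length(e,j)+=1
			cycle_length(j,e)+=1
			cycle(e,j,cycle_length(e,j))=e
			cycle(j,e,cycle_length(j,e))=e
		next j
	next i
	' score each unordered pair by how often its sequence alternates
	for cs1=1 to unique_symbols
		for cs2=cs1+1 to unique_symbols
			for i=1 to cycle_length(cs1,cs2)-1
				if cycle(cs1,cs2,i)<>cycle(cs1,cs2,i+1) then alternations+=1
			next i
			' (steps) * (alternation ratio ^ weight), where steps = sequence length - 1
			if alternations>0 then score+=(cycle_length(cs1,cs2)-1)*((alternations/(cycle_length(cs1,cs2)-1))^weight)
			alternations=0
		next cs2
	next cs1
	return score
end function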
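
For readers who do not use FreeBASIC, the same measurement can be sketched in Python. This is a rough port, not the author's code: `two_symbol_cycle_score` is a hypothetical name, symbols are taken from the input directly instead of being renumbered, and there is no 100-entry cap on a pair's sequence.

```python
from itertools import combinations


def two_symbol_cycle_score(cipher, weight=5.0):
    """Score how regularly every pair of symbols alternates in `cipher`."""
    # Record where each symbol occurs.
    positions = {}
    for index, symbol in enumerate(cipher):
        positions.setdefault(symbol, []).append(index)

    score = 0.0
    for a, b in combinations(positions, 2):
        # Merge both position lists to get the order in which a and b occur.
        merged = sorted([(p, a) for p in positions[a]] +
                        [(p, b) for p in positions[b]])
        sequence = [symbol for _, symbol in merged]
        steps = len(sequence) - 1
        # Count neighbouring entries that differ (a then b, or b then a).
        alternations = sum(1 for x, y in zip(sequence, sequence[1:]) if x != y)
        if alternations > 0:
            # Same formula as the FreeBASIC version: steps * ratio ^ weight.
            score += steps * (alternations / steps) ** weight
    return score
```

A perfectly alternating pair such as "ABABAB" contributes its full step count (5), while a clumped pair such as "AABB" contributes almost nothing at weight 5. Floating-point summation order differs from the original, so totals may differ in the last digits.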

AZdecrypt

Posted : December 10, 2015 10:35 pm

Jarlve

On the one hand, it makes me think 20900 is significant. On the other hand, if the 1000+ transpositions I tested are all behaving as random text, 20900 is expected to appear as an outlier.

Given your results I don’t think 20900 is significant. I did a test some days ago comparing the 340 versus a cipher where smokie had transposed a cipher on 55 different "period 19" lines and the 340 didn’t show promising results.

I have been thinking and come down to these (no specific order). The bigram period 19 thing:

1. is a fluke.
2. relates to transposition and the cipher is not in English.
3. relates to a special encoding process (something like _pi mentioned).
4. relates to transposition but with some misalignments, or unexpected distribution.
5. relates to transposition and the cipher is also somewhat polyalphabetic (wildcards?).

Or combinations of the above even. I personally think option 1 is the most unlikely. From there I don’t know. But I think we should try to find out if misalignment can be the culprit for low scoring returns.
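
For reference, the period statistic under discussion can be sketched as follows. This assumes a "bigram at period p" is the pair of symbols at positions i and i+p, that repeats are counted as every occurrence beyond the first, and that the cipher is not wrapped around; other definitions (for example untransposing first and counting adjacent pairs) differ slightly at the edges. `bigram_repeats` is a hypothetical name.

```python
from collections import Counter


def bigram_repeats(cipher, period=1):
    """Count repeated symbol pairs taken `period` positions apart."""
    pairs = Counter(
        (cipher[i], cipher[i + period])
        for i in range(len(cipher) - period)   # no wrap-around at the end
    )
    # Each pair seen n times contributes n - 1 repeats.
    return sum(count - 1 for count in pairs.values())
```

Scanning `period` over a range and comparing each count with the counts for shuffled copies of the cipher is one way to judge whether a peak such as the one at 19 stands out.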

Posted : December 10, 2015 10:57 pm

smokie treats

(@smokie-treats)

Posts: 1626

Noble Member

On the one hand, it makes me think 20900 is significant. On the other hand, if the 1000+ transpositions I tested are all behaving as random text, 20900 is expected to appear as an outlier.

Given your results I don’t think 20900 is significant. I did a test some days ago comparing the 340 versus a cipher where smokie had transposed a cipher on 55 different "period 19" lines and the 340 didn’t show promising results.

I have been thinking and come down to these (no specific order). The bigram period 19 thing:

1. is a fluke.
2. relates to transposition and the cipher is not in English.
3. relates to a special encoding process (something like _pi mentioned).
4. relates to transposition but with some misalignments, or unexpected distribution.
5. relates to transposition and the cipher is also somewhat polyalphabetic (wildcards?).

Or combinations of the above even. I personally think option 1 is the most unlikely. From there I don’t know. But I think we should try to find out if misalignment can be the culprit for low scoring returns.

I will provide some basic discussion about whether the 340 period 19 statistics are a fluke soon.

Posted : December 11, 2015 4:43 am

doranchak

I wrote a speed orientated function (FreeBASIC) for you which outputs the score.

Excellent; thanks for sharing it! I will work on an implementation of it.

Posted : December 11, 2015 3:15 pm

doranchak

Given your results I don’t think 20900 is significant.

I’m inclined to agree.

I did a test some days ago comparing the 340 versus a cipher where smokie had transposed a cipher on 55 different "period 19" lines and the 340 didn’t show promising results.

I have been thinking and come down to these (no specific order). The bigram period 19 thing:

1. is a fluke.
2. relates to transposition and the cipher is not in English.
3. relates to a special encoding process (something like _pi mentioned).
4. relates to transposition but with some misalignments, or unexpected distribution.
5. relates to transposition and the cipher is also somewhat polyalphabetic (wildcards?).

Or combinations of the above even. I personally think option 1 is the most unlikely. From there I don’t know. But I think we should try to find out if misalignment can be the culprit for low scoring returns.

I will keep working on the transposition explorer for a while. My hope is to locate candidates that have multiple measurements that seem to peak together, or to happen upon a candidate that scores highly in azdecrypt. But I fear I may be forced to return to the "cipher generator" approach which systematically excludes specific encoding methods. An extremely tedious and time consuming approach.

One more thing I’m wondering is if it is possible to fully automate azdecrypt. For example, from my transposition explorer, I can produce a list of 100 candidates and feed them into azdecrypt’s input directory, and use the resulting scores to automatically direct my search to more promising candidates. But I would need to figure out how to tell azdecrypt to start its tasks automatically and exit when it’s done. Is this feasible?

Posted : December 11, 2015 3:26 pm

doranchak

Raw scores with weight 5:
340: 2152
408 (capped 340): 2861

My implementation’s results are:

340: 2144
408: 2873
first 340 of 408: 2861

Looks pretty close. I wonder why my 340 is off from yours a little. Maybe rounding errors? *shrug*

Posted : December 11, 2015 5:56 pm

Jarlve

I will provide some basic discussion about whether the 340 period 19 statistics are a fluke soon.

Looking forward to it.

One more thing I’m wondering is if it is possible to fully automate azdecrypt. For example, from my transposition explorer, I can produce a list of 100 candidates and feed them into azdecrypt’s input directory, and use the resulting scores to automatically direct my search to more promising candidates. But I would need to figure out how to tell azdecrypt to start its tasks automatically and exit when it’s done. Is this feasible?

I have thought of something similar. How about this: I add a new mode where AZdecrypt periodically scans the Ciphers directory for files (wait mode). When a file is found, it exits wait mode, processes the file into the Results directory along with the input file, and then returns to wait mode.
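
The proposed wait mode amounts to a directory polling loop. The Python sketch below only illustrates the idea and is not how AZdecrypt implements it: the `*.txt` filter, the `_result.txt` naming, and the `solve` callback are all assumptions made for the example.

```python
import shutil
import time
from pathlib import Path


def process_pending(ciphers_dir, results_dir, solve):
    """Process every cipher file currently waiting; return the names handled."""
    ciphers_dir, results_dir = Path(ciphers_dir), Path(results_dir)
    results_dir.mkdir(parents=True, exist_ok=True)
    handled = []
    for path in sorted(ciphers_dir.glob("*.txt")):
        result_text = solve(path.read_text())
        # Write the result, then move the input next to it so it is not rerun.
        (results_dir / f"{path.stem}_result.txt").write_text(result_text)
        shutil.move(str(path), str(results_dir / path.name))
        handled.append(path.name)
    return handled


def wait_mode(ciphers_dir, results_dir, solve, poll_seconds=5.0):
    """Loop forever: scan, process whatever is found, sleep, repeat."""
    while True:
        process_pending(ciphers_dir, results_dir, solve)
        time.sleep(poll_seconds)
```

Moving the input file out of the watched directory is what lets the loop return to waiting without picking the same cipher up again.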

Looks pretty close. I wonder why my 340 is off from yours a little. Maybe rounding errors? *shrug*

Very strange. I ran my program without any optimizations and special stuff and it still gave the same number. I guess the Sherlock Holmes approach would be to consider your 340 different from mine. But it may be a FreeBASIC problem, if so I would like to report it.

Posted : December 11, 2015 10:14 pm

Jarlve

A little curious thing. I checked for the possibility of column or row filler with my m_2s_cycles measurement. I found rows 12 and especially 14 to be the best candidates (all columns gave weak results more or less). Rows 12 and 14 happen to be pivot rows.
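
The post does not describe how the filler check was run. One plausible reading, sketched below in Python, is to delete each row in turn and rescore what remains; `scores_without_each_row` is a hypothetical helper and `score_fn` stands in for the cycle measurement.

```python
def scores_without_each_row(cipher, width, score_fn):
    """Return {row number: score of the cipher with that row removed}.

    Assumption: "row filler" is tested by dropping one full row of
    `width` symbols at a time; rows are numbered from 1.
    """
    rows = [cipher[i:i + width] for i in range(0, len(cipher), width)]
    results = {}
    for skip in range(len(rows)):
        # Rebuild the symbol stream from every row except the skipped one.
        remaining = [symbol
                     for r, row in enumerate(rows) if r != skip
                     for symbol in row]
        results[skip + 1] = score_fn(remaining)
    return results
```

For the 340 the width would be 17, and the rows whose removal raises the score most would be the filler candidates under this reading.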

Posted : December 11, 2015 10:32 pm

doranchak

I have thought of something similar. How about this: I add a new mode where AZdecrypt periodically scans the Ciphers directory for files (wait mode). When a file is found, it exits wait mode, processes the file into the Results directory along with the input file, and then returns to wait mode.

That would be great! For my search to resume, it will need to detect when azdecrypt has returned to wait mode, perhaps by detecting a message in one of the output files, or the presence of a file made specifically when azdecrypt is completely done with the current batch of inputs.
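
On the search side, waiting for a completion file could look like the Python sketch below. The marker file is an assumption taken from the suggestion above (no such file is confirmed to exist in AZdecrypt), and `wait_for_marker` is a hypothetical name.

```python
import time
from pathlib import Path


def wait_for_marker(marker_path, timeout_seconds=60.0, poll_seconds=0.5):
    """Block until the marker file appears; return False on timeout."""
    marker = Path(marker_path)
    deadline = time.monotonic() + timeout_seconds
    while True:
        if marker.exists():
            marker.unlink()        # consume it so the next batch starts clean
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)
```

Deleting the marker after reading it avoids mistaking an old "done" signal for the end of the next batch.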

Looks pretty close. I wonder why my 340 is off from yours a little. Maybe rounding errors? *shrug*

Very strange. I ran my program without any optimizations and special stuff and it still gave the same number. I guess the Sherlock Holmes approach would be to consider your 340 different from mine. But it may be a FreeBASIC problem, if so I would like to report it.

There’s a good chance of a bug in my implementation, since I haven’t studied your algorithm to fully understand it (I focused only on blindly porting it). But here’s the output of the variables when my implementation finishes its run on the 340:

http://www.zodiackillerciphers.com/jarl … ements.txt

Maybe you can notice some problem in there. Note: To keep the output manageable, the two big arrays’ contents are only shown when their values are greater than 0.

Posted : December 11, 2015 10:38 pm

doranchak

A little curious thing. I checked for the possibility of column or row filler with my m_2s_cycles measurement. I found rows 12 and especially 14 to be the best candidates (all columns gave weak results more or less). Rows 12 and 14 happen to be pivot rows.

That’s very interesting. I still need to add some "remove filler" and "create misalignments" operators to my transposition explorer, so it can try to evaluate a wide variety of combinations.

Posted : December 11, 2015 10:41 pm

Jarlve

Maybe you can notice some problem in there.

First of all you can remove anything related to the ident_count array because I just noticed it is unused (oops). My algorithm also enumerates, look for the remark in the next piece of code of what I think is going wrong.

if ident(cipher(i))=0 then
	u+=1 <--- I think you moved this line under the next line
	ident(cipher(i))=u
	e=u
else
	e=ident(cipher(i)) 
end if

Because your ident starts with 40:

ident: [40, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 
27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 
53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63,

cipher: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 4, 18, 19, 20, 21, 22, 23, 24, 25, 
26, 27, 28, 29, 30, 31, 32, 19, 33, 34, 35, 36, 18, 37, 38, 14, 25, 20, 32, 12, 21, 39, 0, 40, 41,

Posted : December 12, 2015 1:13 pm

Jarlve

I will keep working on the transposition explorer for a while. My hope is to locate candidates that have multiple measurements that seem to peak together, or to happen upon a candidate that scores highly in azdecrypt. But I fear I may be forced to return to the "cipher generator" approach which systematically excludes specific encoding methods. An extremely tedious and time consuming approach.

Both your projects are worth their while and universal. Great work. When your transposition explorer is somewhat worked out I’d like to feed various transposition ciphers to it with increasing difficulty. Nerdgasm!

Posted : December 12, 2015 1:30 pm

Jarlve

@smokie,

The main thread is updated with your latest ciphers but I seem to have lost smokie15 (searched the thread) and I don’t think you’ve published smokie16b,c,d yet because you’re still doing analysis on these. Thanks.

Posted : December 12, 2015 2:29 pm

doranchak

Because your ident starts with 40:

Oh! Thanks for pointing that out. Just discovered the reason for that: My cipher array was indexed starting at 0, and all the other arrays start at 1. After fixing that, my score for the 340 becomes 2151 which only differs from yours by 1.
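
The effect of that index mismatch can be reproduced with a small Python sketch. It assumes the ported loop started reading at index 1 of a 0-based array, which skips the first symbol; `first_appearance_ids` is a hypothetical helper, and the data is the opening of the cipher dump quoted earlier in the thread.

```python
def first_appearance_ids(cipher, start=0):
    """Number symbols 1, 2, 3, ... in the order they first appear."""
    ident = {}
    for symbol in cipher[start:]:
        if symbol not in ident:
            ident[symbol] = len(ident) + 1
    return ident


# Opening of the 0-based cipher dump quoted in the thread.
CIPHER_START = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
                4, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32,
                19, 33, 34, 35, 36, 18, 37, 38, 14, 25, 20, 32, 12, 21, 39, 0,
                40, 41]

correct = first_appearance_ids(CIPHER_START)           # reads from index 0
skipped = first_appearance_ids(CIPHER_START, start=1)  # misses the first symbol
```

With the first symbol skipped, symbol 0 is not seen until its second occurrence, after 39 other symbols have been numbered, so it receives 40. That matches the `ident` dump beginning with 40.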

Both your projects are worth their while and universal. Great work. When your transposition explorer is somewhat worked out I’d like to feed various transposition ciphers to it with increasing difficulty. Nerdgasm!

Thanks. And some test ciphers would be great. Not sure how good the algorithm will be on them but it will be a good exercise to improve its ability to find the correct transpositions. I worry about the vastness of the search space.

Recently I restored a Merge operation which randomly picks two symbols A and B and merges them together (replaces all occurrences of B with A). In this way it tries to guess which symbols stand for the same letters. My hope is that a correct selection leads to improved ngrams/fragments/cycle scores. But in practice it might just lead to too many false positives.
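
A minimal Python sketch of such a Merge operation follows, assuming the cipher is a list of symbol numbers. `merge_symbols` and `random_merge` are hypothetical names; how the transposition explorer actually chooses or scores merges is not shown in the thread.

```python
import random


def merge_symbols(cipher, keep, drop):
    """Replace every occurrence of `drop` with `keep`."""
    return [keep if symbol == drop else symbol for symbol in cipher]


def random_merge(cipher, rng=None):
    """Pick two distinct symbols at random and merge the second into the first."""
    rng = rng or random.Random()
    symbols = sorted(set(cipher))
    if len(symbols) < 2:
        raise ValueError("need at least two distinct symbols to merge")
    keep, drop = rng.sample(symbols, 2)
    return merge_symbols(cipher, keep, drop), (keep, drop)
```

Each merge lowers the distinct symbol count by one, so any score that depends on symbol count needs to be compared against merges of random pairs to separate real gains from false positives.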

Posted : December 12, 2015 2:58 pm

Zodiac Discussion Forum