Bigram and Cycle Counts

beldenge

(@beldenge)

Posts: 48

Trusted Member

Topic starter

Hey all — my apologies if I’m rehashing things that have already been talked about. I just wanted to share some quick findings and also ask for a little assistance.

I’ve been experimenting a little bit with transposition keys that produce high numbers of bigram repeats on z340. I’m running a hill climber with transposition keys of varying lengths working from length 2 all the way up to 30 or so. I’m getting some pretty high numbers of bigram repeats at key length 23 and greater. Here are a few examples:

If interested I can provide the resulting ciphertext for these, but I’m not finding any good solutions for any of them (using https://github.com/beldenge/zenith).

[0, 19, 15, 8, 1, 5, 14, 21, 18, 10, 3, 12, 16, 2, 7, 9, 4, 11, 13, 17, 20, 22, 6], Bigram repeats: 50, KeyLength: 23
[6, 15, 0, 11, 12, 16, 20, 5, 17, 2, 7, 9, 22, 1, 14, 10, 18, 21, 13, 8, 19, 4, 3], Bigram repeats: 51, KeyLength: 23
[5, 13, 3, 12, 7, 11, 16, 20, 4, 15, 14, 10, 1, 21, 18, 23, 17, 22, 2, 6, 0, 9, 19, 8], Bigram repeats: 50, KeyLength: 24
[13, 4, 10, 8, 6, 5, 20, 3, 16, 1, 15, 19, 11, 0, 24, 22, 9, 17, 23, 14, 7, 18, 12, 21, 2], Bigram repeats: 50, KeyLength: 25
[9, 13, 19, 0, 24, 15, 16, 11, 6, 22, 18, 21, 4, 5, 3, 2, 14, 7, 12, 10, 1, 20, 17, 8, 23], Bigram repeats: 50, KeyLength: 25
[3, 5, 4, 14, 0, 8, 11, 22, 15, 19, 24, 1, 10, 6, 7, 12, 16, 21, 23, 9, 18, 13, 20, 2, 17], Bigram repeats: 52, KeyLength: 25
[2, 1, 21, 20, 13, 11, 8, 15, 23, 6, 18, 16, 10, 4, 0, 14, 9, 5, 3, 24, 19, 22, 7, 12, 17], Bigram repeats: 51, KeyLength: 25

Note that I’m removing the last row of the cipher (assuming it’s partially filler), so any bigram repeats in the last row aren’t counted — thus the "real" counts might be even higher than the above.

Unfortunately what this is telling me is just that it’s really easy to produce high numbers of bigram repeats. I ran a similar experiment on the z408 and was able to get lots of high bigram repeats there as well — even higher than the original cipher. So what I’ve concluded thus far is that my approach isn’t going to be helpful in finding the correct transposition key.

I thought of also factoring in trigram repeats and cycle counts but haven’t tried that yet. Which brings me to where I would be grateful for any assistance. Does anyone know of an efficient way of calculating cycle counts? I wasn’t finding anything useful after doing some brief googling, but I also may not be searching for the right terms.

Thanks!

http://projectzenith.net
https://github.com/beldenge/zenith

Posted : July 26, 2019 6:08 am

Jarlve

(@jarlve)

Posts: 2547

Famed Member

Does anyone know of an efficient way of calculating cycle counts?

There are many ways to do them. Depends on what you want. smokie treats came up with the idea of scoring all 2-symbol cycle counts in the cipher and this turned out to be a very stable measurement with some care. It is a go to for deciding whether a cipher text uses sequential homophonic substitution or not. Here follows my own optimized implementation and score calculation, it could be optimized further but have not gotten around to it:

function m_2cycles(array()as long,byval l as integer,byval s as integer,byval weight as double)as double

	'array = cipher numbered by appearance
	'l = cipher length
	's = cipher symbols
	'weight = cycle weight (I use 5)

	dim as short i,j,k,p1,p2,a,b,c,d,e,al,cl,fm
	dim as short frq(s)
	dim as double score
	for i=1 to l 'get symbol frequencies
		frq(array(i))+=1
	next i
	for i=1 to s 'determine max frequency
	if frq(i)>fm then fm=frq(i)
	next i
	dim as short map(l,fm+1)
	for i=1 to l 'symbol position look up table
		map(array(i),0)+=1
		map(array(i),map(array(i),0))=i
	next i
	for i=1 to s
		map(i,map(i,0)+1)=l+1
	next i
	for i=1 to s
		for j=i+1 to s
			if frq(i)>1 andalso frq(j)>1 then 'ignore symbols that only appear once
				e=0
				p1=1
				p2=1
				al=0
				cl=0
				do
					a=map(i,p1)
					b=map(j,p2)
					if a<b then
						c=a
						p1+=1
						d=1
					else
						c=b
						p2+=1
						d=2
					end if
					if c=l+1 then exit do
					cl+=1
					if e>0 andalso e<>d then al+=1
					e=d
				loop
				if al>1 then score+=(cl-1)*((al/(cl-1))^weight)
				'if al>1 then score+=scs_table(al,cl-1) 'score look up table
			end if
		next j
	next i
	return score

end function

The 340 scores about 2137 with this measurement for validation.

AZdecrypt

Posted : July 26, 2019 9:24 am

Jarlve

(@jarlve)

Posts: 2547

Famed Member

Unfortunately what this is telling me is just that it’s really easy to produce high numbers of bigram repeats. I ran a similar experiment on the z408 and was able to get lots of high bigram repeats there as well — even higher than the original cipher. So what I’ve concluded thus far is that my approach isn’t going to be helpful in finding the correct transposition key.

Yep, especially with hill climbing and a large search space. As the search space grows larger you need to move up to using a solver with increasingly higher n-gram sizes.

AZdecrypt

Posted : July 26, 2019 9:35 am

smokie treats

(@smokie-treats)

Posts: 1626

Noble Member

Hey all — my apologies if I’m rehashing things that have already been talked about. I just wanted to share some quick findings and also ask for a little assistance.

I’ve been experimenting a little bit with transposition keys that produce high numbers of bigram repeats on z340. I’m running a hill climber with transposition keys of varying lengths working from length 2 all the way up to 30 or so. I’m getting some pretty high numbers of bigram repeats at key length 23 and greater. Here are a few examples:

If interested I can provide the resulting ciphertext for these, but I’m not finding any good solutions for any of them (using https://github.com/beldenge/zenith).

[0, 19, 15, 8, 1, 5, 14, 21, 18, 10, 3, 12, 16, 2, 7, 9, 4, 11, 13, 17, 20, 22, 6], Bigram repeats: 50, KeyLength: 23
[6, 15, 0, 11, 12, 16, 20, 5, 17, 2, 7, 9, 22, 1, 14, 10, 18, 21, 13, 8, 19, 4, 3], Bigram repeats: 51, KeyLength: 23
[5, 13, 3, 12, 7, 11, 16, 20, 4, 15, 14, 10, 1, 21, 18, 23, 17, 22, 2, 6, 0, 9, 19, 8], Bigram repeats: 50, KeyLength: 24
[13, 4, 10, 8, 6, 5, 20, 3, 16, 1, 15, 19, 11, 0, 24, 22, 9, 17, 23, 14, 7, 18, 12, 21, 2], Bigram repeats: 50, KeyLength: 25
[9, 13, 19, 0, 24, 15, 16, 11, 6, 22, 18, 21, 4, 5, 3, 2, 14, 7, 12, 10, 1, 20, 17, 8, 23], Bigram repeats: 50, KeyLength: 25
[3, 5, 4, 14, 0, 8, 11, 22, 15, 19, 24, 1, 10, 6, 7, 12, 16, 21, 23, 9, 18, 13, 20, 2, 17], Bigram repeats: 52, KeyLength: 25
[2, 1, 21, 20, 13, 11, 8, 15, 23, 6, 18, 16, 10, 4, 0, 14, 9, 5, 3, 24, 19, 22, 7, 12, 17], Bigram repeats: 51, KeyLength: 25

Note that I’m removing the last row of the cipher (assuming it’s partially filler), so any bigram repeats in the last row aren’t counted — thus the "real" counts might be even higher than the above.

Unfortunately what this is telling me is just that it’s really easy to produce high numbers of bigram repeats. I ran a similar experiment on the z408 and was able to get lots of high bigram repeats there as well — even higher than the original cipher. So what I’ve concluded thus far is that my approach isn’t going to be helpful in finding the correct transposition key.

I thought of also factoring in trigram repeats and cycle counts but haven’t tried that yet. Which brings me to where I would be grateful for any assistance. Does anyone know of an efficient way of calculating cycle counts? I wasn’t finding anything useful after doing some brief googling, but I also may not be searching for the right terms.

Thanks!

You are reading the 340 left right top bottom, and inscribing into a keyed columnar transposition rectangle? What about reading left right top bottom? Have you tried that?

My theory is that if part of the last row is filler, then it is the last 9 symbols. Look at the positional relationship between all of the < and S symbols in the entire message, which doesn’t seem very probable with a a random bunch of symbols, then see the S symbol in the last row which is the 8th symbol in that row. The circle cross hairs and Zodiac signature immediately follow.

Are you getting more period 1 bigram repeats than there are period 19 bigram repeats already?

I score my cycles just by looking at the 1,953 possible two symbol combinations, and counting the number of consecutive alternations in each of these combinations. Then, I calculate 2 exponent the number of consecutive alternations, and sum all of those up. AB would be one alternation, so it would score 2. ABA would be two alternations, and score 4. And so on. You get different results if you stop counting alternations altogether if there is an interruption, such as ABABAABABABABABA. This could score 2^8 = 16 if you stop at the interruption, 2^10 = 1024 if you only count the longest run of alternations, or 2^8 + 2^10 = 1040 if you add them up. Lately I have been using the first method.

If you try to use three symbol combinations, there are just too many and it is not practical because all homophone groups that map to the same letter have two symbol combinations in them anyway. ABC has AB, AC, and BC in it anyway.

If you make a message with all perfect cycles, then you will get very high scores, much higher than the 340. Testing shows that if you make messages and randomly select a symbol about 25% to 30% of the time, you will get scores about the same as the 340. In other words, if E = ABCD, go ABCD over and over, except that 25% to 30% of the time, randomly choose either A B C or D out of order.

Posted : July 26, 2019 3:20 pm

Largo

(@largo)

Posts: 454

Honorable Member

Unfortunately what this is telling me is just that it’s really easy to produce high numbers of bigram repeats. I ran a similar experiment on the z408 and was able to get lots of high bigram repeats there as well — even higher than the original cipher. So what I’ve concluded thus far is that my approach isn’t going to be helpful in finding the correct transposition key.

I made similar experiments some time ago. For example, I also examined a columnar transposition using a genetic algorithm (fixed length of 17). For the fitness-function (scoring) I used nGrams (2, 3 and 4) as well as the 2 cycle score described above. Just like you, I also got some hits with very high values. However, I came to the same conclusion as you and don’t think that one can get ahead in this way. Even in randomly shuffled ciphers you can always find "good" results. A high Bigram number is no guarantee that we are on the right track.

Later I will start a new thread, in which I come back again to the topic.

I implemented the calculation of the 2 symbole cycle score used by Jarlve and smokie treats as follows (Swift 5.0. I also have a C# variant if you are interested):

func getTwoSymbolCycleScore() -> Double {
        var score: Double = 0.0
        let unigrams = getUnigrams()
        
        var symbols: String = ""
        
        for unigram in unigrams {
            if unigram.value > 1 {
                symbols += String(unigram.key)
            }
        }
        
        for i in 0..<symbols.count-1 {
            for k in (i+1)..<symbols.count {
                var alterations: String = ""
                
                var lastSymbol: Character = Character(UnicodeScalar(255))
                var numAlterations: Double = -1
                
                for symbol in cipher {
                    if symbols[i] == symbol || symbols[k] == symbol {
                        alterations += String(symbol)
                        
                        if lastSymbol != symbol {
                            numAlterations += 1
                        }
                        
                        lastSymbol = symbol
                    }
                }
                
                score += Double((alterations.count - 1)) * pow(numAlterations / (Double(alterations.count - 1)), 5)
            }
        }
        
        return score
    }

Translated with http://www.DeepL.com/Translator

Posted : July 26, 2019 4:00 pm

beldenge

(@beldenge)

Posts: 48

Trusted Member

Topic starter

My sincere thanks to you all. I knew this would be the right place to ask. I like the 2-symbol cycle score idea and will give that a try.

You are reading the 340 left right top bottom, and inscribing into a keyed columnar transposition rectangle? What about reading left right top bottom? Have you tried that?

smokie treats, Yes I am reading the 340 left-to-right, top-to-bottom when untransposing using the keyed columnar transposition.

Are you getting more period 1 bigram repeats than there are period 19 bigram repeats already?

I have not tried experimenting with different periods yet, so I suppose you could say I’m using a period of 1.

http://projectzenith.net
https://github.com/beldenge/zenith

Posted : July 27, 2019 5:01 am

Mr lowe

(@mr-lowe)

Posts: 1197

Noble Member

Hi beldenge try Doranchacs excellent crypto scope analyser.. takes a bit of playing around but i think you will like it if you haven’t already used it http://www.oranchak.com/zodiac/webtoy/stats.html

Posted : July 27, 2019 12:35 pm

Mr lowe

(@mr-lowe)

Posts: 1197

Noble Member

If you go to the start of the below thread it has lots of great analysing tools.
viewtopic.php?f=81&t=2617&hilit=homophonic+substitution

Posted : July 27, 2019 12:38 pm

Zodiac Discussion Forum

Bigram and Cycle Counts