Zodiac Discussion Forum


Detecting fake periods

15 Posts
3 Users
0 Reactions
2,998 Views
(@largo)
Posts: 454
Honorable Member
Topic starter
 

Hi,

This morning I had an idea for a new measurement that might help determine whether a cipher was transposed. It builds on the observation that certain column/row offsets in z340 lead to significantly higher n-gram counts (viewtopic.php?p=56059).
My idea is to apply every possible offset, determine for each one the period with the highest bigram count, and tally how often each period comes out on top. Here is the source code that illustrates this:

var map2Grams = new SortedDictionary<int, int>();

for (int y=0; y<cipherOriginal.Height; y++)            
{
	for (int x=0; x<cipherOriginal.Width; x++)
	{
		// Get a fresh copy of unmodified cipher
		Snippet cipherShifted = new Snippet(cipherOriginal);

		// Shift the cipher
		if (x != 0)
			cipherShifted.ShiftHorizontal(x);

		if (y != 0)
			cipherShifted.ShiftVertical(y);
		
		// Track best periods
		var best2GramPeriod = cipherShifted.GetBestPeriod(2, 170);

		if (map2Grams.ContainsKey(best2GramPeriod.bestPeriod))
			map2Grams[best2GramPeriod.bestPeriod]++;
		else
			map2Grams[best2GramPeriod.bestPeriod] = 1;		
	}
}
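
The Snippet class used above belongs to Peek-a-boo and is not shown in this post, so here is a rough standalone sketch of the same loop in Python. Everything about the helpers is an assumption: the shifts are taken to be cyclic rotations of the columns and rows, "repeated bigrams at period p" is taken to mean the number of symbol pairs p positions apart minus the number of distinct pairs, and ties go to the lowest period. Peek-a-boo may define any of these differently.

```python
from collections import Counter

def repeated_bigrams(text, period):
    """Pairs of symbols `period` apart, minus the distinct ones (assumed definition)."""
    pairs = [(text[i], text[i + period]) for i in range(len(text) - period)]
    return len(pairs) - len(set(pairs))

def best_period(text, max_period):
    """Period in 1..max_period with the most repeated bigrams; lowest period wins ties."""
    return max(range(1, max_period + 1), key=lambda p: repeated_bigrams(text, p))

def shift(rows, dx, dy):
    """Rotate every row by dx columns, then rotate the row order by dy rows (assumed cyclic)."""
    rows = [row[dx:] + row[:dx] for row in rows]
    return rows[dy:] + rows[:dy]

def best_period_histogram(rows, max_period):
    """For every column/row offset, record which period is the best one."""
    histogram = Counter()
    for dy in range(len(rows)):
        for dx in range(len(rows[0])):
            flat = "".join(shift(rows, dx, dy))
            histogram[best_period(flat, max_period)] += 1
    return dict(sorted(histogram.items()))
```

For a 17×20 grid this would be called as best_period_histogram(rows, 170), where rows is a list of 20 strings of 17 symbols each. The counts always add up to width × height, which is why the tables below sum to 340.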

The z340 shows the following diagram:

The x-axis shows the periods 1-170; the y-axis shows how often each period was the "best" one. P19 is clearly dominant, but there are also smaller disturbances: periods 3, 4, 5, 16, 38 and 117 stand out the most. P16 is particularly interesting because a 17×20 cipher also yields P16 under a diagonal transposition. Here are the results again as a table:

 3: 5
 4: 8
 5: 8
16: 38
19: 217
21: 2
38: 11
39: 1
54: 8
64: 1
76: 1
81: 1
84: 2
85: 3
86: 1
90: 1
91: 1
93: 1
97: 1
110: 1
115: 8
117: 10
123: 1
128: 1
129: 2
130: 1
138: 1
139: 1
142: 1
150: 1
157: 1

The third-largest peak is at P38. This is not surprising, as 38 is a multiple of 19.
Let’s now take a cipher that we know has a real P19 peak but that doesn’t come from a straight P19 transposition. For testing I used Jarlve’s cipher based on a magic square (viewtopic.php?f=81&t=3591). Here is the result:

 19: 326
 32: 2
 38: 6
 41: 1
 74: 1
 97: 2
121: 1
131: 1

I did not make a diagram this time, because the table already shows how strongly P19 stands out. P38 again has a small peak.

What happens if you have a cipher that has a bigram peak at P19, but that doesn’t result from a transposition? As we know, it happens occasionally that untransposed ciphers have significant peaks other than P1. Here is an example:

Plaintext (taken from the news archive at https://wortschatz.uni-leipzig.de/de ):

RNEDMAINLYWITHSTU
DENTSWHOCONTINUET
OGAMBLEBECAUSEITC
ANBECOMEANADDICTI
ONIDONTKNOWTHATIW
OULDHAVEHADTHEDRI
VETOSTARTMYOWNBUS
INESSWITHOUTTHISP
ROGRAMSAIDJAYSONE
DWARDSFOUNDEROFTH
EPOPULARPROVOHOTD
OGSTANDJDAWGSANDS
TUDENTOFTHEMARRIO
TTSCHOOLBEIRUTLEB
ANONMDASHTHEANTIS
YRIANCOALITIONFRE
SHOFFITSELECTIONV
ICTORYBLAMEDLEBAN
ONSPRESIDENTTHURS
DAYFORTHEASSASSIN

Cipher (25% random cycles, raw IOC 2104)

SNmjJbCKI!8D0xW14
knL0X8yOiPM3DN6q3
OvcJfIrgshd4XtA0i
eKfrhPJsbLcljBi1C
OMCkPN3FKP80xd1D8
P5Glye7txbj2xmkTB
7n3PY0cU1J!O8Lg5Z
CMoWW8D2yO633zAXQ
TPwSdJYeAlEb!ZONp
j8cTkWuP5KlqUOu0x
rQPQ6GdVQSO7PxO0j
PvX1eLkElb8wZbMjW
24ksN3Pu0ytJcTTBO
12ZhzPOHfmCU53Ing
cKPLJleWx0yobM1DX
!VCcNiOdGD1APKuSn
XzOuuB3YrHni0CPL7
Ch1OT!gIeJojGpgbM
PNZQVqWDkrK23z6SX
lc!uOT0xmdYZcWXAL

If you let AZDecrypt analyze this cipher (Statistics -> Find plaintext direction), everything points to P19. The solver has no problem with the cipher and solves it immediately, as usual. But let’s assume it could not. We would then suspect a transposition and start working with P19. So what does the test described earlier show? This:

Apparently, this method is quite good for discovering "fake" periods. However, this test does not tell us what the "real" transposition is (in our case P1).
I wonder if more is possible with it. For example, the mini-peaks mentioned above could actually give hints on the structure of the cipher. How does the measurement behave if there are misalignments in a transposition? What is the effect of cycle randomization? What does it look like if the upper and lower half were transposed differently? What happens if about 90% of the cipher is transposed and the remaining 10% consists of a filler? What are the effects of frequent repetitions in plaintext?
I think there is still a lot to try out here. As always, the lack of spare time is the biggest opponent. I hope I can post something new soon, even if it is only the insight that the test is useless ;)

Translated with http://www.DeepL.com/Translator

 
Posted : February 16, 2019 1:44 pm
smokie treats
(@smokie-treats)
Posts: 1626
Noble Member
 

You might want to consider making plaintext transpositions, without substitution, with different sizes and shapes of inscription rectangles, to see if smaller versus larger, narrower versus wider, and shorter versus taller, have a tendency to create false spikes more than the other sizes. I made messages with multiple smaller inscription shapes quite a while ago, and was surprised to find false spikes more frequent if I recall correctly.

 
Posted : February 16, 2019 4:07 pm
Jarlve
(@jarlve)
Posts: 2547
Famed Member
 

Looks cool Largo. Can you try it on a couple of ciphers that have 37 bigrams at p19?

AZdecrypt

 
Posted : February 18, 2019 1:06 pm
(@largo)
Posts: 454
Honorable Member
Topic starter
 

Looks cool Largo. Can you try it on a couple of ciphers that have 37 bigrams at p19?

I ran some more tests. Among them was this one:

xmbGZO2Sb7nHoj3P2
ypQPS0OuJqkcKdzP4
TheUVAlrIOPFBLvuP
SsVMC1bDNxOWQA0cG
XdKjieJQY1ymSnBu!
O4TokP8L2O8M!P5Jb
!ZpqcuHOhFPuWrw8d
!UClsVXvIDjAKwkO8
L3ztYBlo8eGFl8Sxb
ZJcMCQ6Hd0pjJPPWq
QOQ4He1DPMXf!AYZ5
ANvur8sTz6K2BKwQt
UJC3WuOViP8YbLkJO
SmuPTg4HIZc0GcW1h
P5M3d0P1eHOuJDGA0
bU!QnVXPNKmH8pSqc
YZBvLrl1O2xs8CGHP
8wTO7tMe7bIcBUZ1d
0CPNEOCN1TmWnU7of
eXpbihPVkDMv3O0yq

Raw IOC: 2114
Unigrams: 60
Repeated Bigrams at P1: 25
Repeated Bigrams at P19: 37
Repeated Trigrams at P1: 1
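
For anyone who wants to compute statistics like these themselves, here is a small Python sketch. The definitions are my assumptions: raw IOC as the sum of n·(n−1) over the symbol counts, "unigrams" as the number of distinct symbols, and repeated n-grams as the number of n-grams minus the number of distinct ones, with the symbols of each n-gram taken `period` positions apart. Whether this reproduces the numbers above exactly depends on how Peek-a-boo and AZdecrypt count.

```python
from collections import Counter

def raw_ioc(text):
    """Unnormalized index of coincidence: sum of n*(n-1) over all symbol counts."""
    return sum(n * (n - 1) for n in Counter(text).values())

def unigram_count(text):
    """Number of distinct symbols in the cipher."""
    return len(set(text))

def repeated_ngrams(text, n, period):
    """n-grams built from symbols `period` apart, minus the distinct ones."""
    span = (n - 1) * period
    grams = [tuple(text[i + j * period] for j in range(n))
             for i in range(len(text) - span)]
    return len(grams) - len(set(grams))
```

The cipher has to be passed as one string without line breaks, e.g. repeated_ngrams(cipher, 2, 19) for the bigrams at P19.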

It’s not that easy to find a cipher whose bigram count is highest at P19 and is exactly 37 there. Maybe it would be better to use "(P19 – P1) > n" as the criterion (in this case 37 – 25 = 12).
Here again, however, the test shows that the period is a false positive. Diagram:

 
Posted : February 20, 2019 11:38 pm
(@largo)
Posts: 454
Honorable Member
Topic starter
 

You might want to consider making plaintext transpositions, without substitution, with different sizes and shapes of inscription rectangles, to see if smaller versus larger, narrower versus wider, and shorter versus taller, have a tendency to create false spikes more than the other sizes. I made messages with multiple smaller inscription shapes quite a while ago, and was surprised to find false spikes more frequent if I recall correctly.

That’s a good idea! I just added a chart dialog to Peek-a-boo so that you don’t have to create a chart manually for every cipher, which makes it very easy to try out. Maybe I will be able to release a new version in the next few days.
It would be great to have a score for the measurement. Simply counting the spikes at periods other than the best one is probably not enough. Does anyone have an idea?

 
Posted : February 21, 2019 9:21 pm
smokie treats
(@smokie-treats)
Posts: 1626
Noble Member
 

Example:

The series of 11 inscription rectangles are 6 cells wide x 5 cells high, inscribe LRTB, read TBLR and transcribe LRTB into the 17 x 20 grid.

Since we are using inscription rectangles with 30 cells, there are 29 P1 bigrams in each inscription rectangle. There are 329 P1 bigrams in the initial, untransposed message.

25 of the 29 P1 bigrams in each inscription rectangle will become P5, and 4 of the P1 bigrams in each inscription rectangle will become -P24.

Calculate how many P1 repeats there are, and the expected number of P5 repeats, approximately (25/29)*P1. If there is a spike at anything equal to or higher than this value, and not at P5, then have the computer remember the period ( "SP" ) and the number of repeats.
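
The 25 / 4 split and the (25/29)*P1 estimate can be checked with a few lines of Python. This is only a sketch of the bookkeeping for a single rectangle, assuming it is inscribed LRTB and read TBLR as described. The function names are made up for illustration, and suspicious_spikes is just one way to "remember" the SP values.

```python
from collections import Counter

def period_shifts(width, height):
    """Offsets at which the P1 bigrams of one rectangle end up after reading TBLR."""
    def read_position(k):
        row, col = divmod(k, width)   # cell where symbol k was written (LRTB)
        return col * height + row     # position of that cell when read TBLR
    return Counter(read_position(k + 1) - read_position(k)
                   for k in range(width * height - 1))

def expected_repeats(p1_repeats, width, height):
    """Expected repeats at the period the rectangle creates (period == height)."""
    shifts = period_shifts(width, height)
    return p1_repeats * shifts[height] / sum(shifts.values())

def suspicious_spikes(repeats_by_period, threshold, true_period):
    """Periods other than true_period whose repeat count reaches the threshold ("SP")."""
    return {p: n for p, n in repeats_by_period.items()
            if p != true_period and n >= threshold}

print(period_shifts(6, 5))  # Counter({5: 25, -24: 4})
```

Calling period_shifts with other widths and heights shows directly how the shape of the rectangle changes which periods receive the former P1 bigrams.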

Is there any relationship between SP and its value and the width, height, or total number of cells in the inscription rectangle?

I said that I wanted to show the anatomy of a false spike once, and you liked the idea. It may have been a couple of years ago. I am sorry that I never did it, but it is still on my mind. I am still here and thinking about the 340 though.

If we know how false spikes are created, then perhaps we will know how to detect them. Perhaps make some variant of the idea above.

 
Posted : February 21, 2019 10:54 pm
Jarlve
(@jarlve)
Posts: 2547
Famed Member
 

What would be great would be a scoring for the measurement. Simply only the number of spikes != best period is probably not enough. Does anyone have an idea?

I would keep it simple and just divide the amount of spikes by the maximum possible spikes such that your final number is between 0 and 1.
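
One possible reading of this suggestion, sketched in Python: take the number of offsets at which the dominant period was the best one and divide it by the total number of offsets tested. The function name and this interpretation are assumptions. The example data is the magic-square table from the first post.

```python
def dominance_score(histogram):
    """Share of offsets won by the most frequent best period (between 0 and 1)."""
    total = sum(histogram.values())
    return max(histogram.values()) / total

# Best-period counts for the magic-square cipher, copied from the first post
magic_square = {19: 326, 32: 2, 38: 6, 41: 1, 74: 1, 97: 2, 121: 1, 131: 1}
print(round(dominance_score(magic_square), 3))  # 0.959
```

For the z340 table from the first post the same calculation is 217 / 340, or about 0.638.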


 
Posted : February 22, 2019 2:37 pm
(@largo)
Posts: 454
Honorable Member
Topic starter
 

Hi,

If we know how false spikes are created, then perhaps we will know how to detect them. Perhaps make some variant of the idea above.

Many thanks for the idea! I don’t want to promise anything, because I don’t have much time. But I would like to test and implement it. Sounds promising.

I would keep it simple and just divide the amount of spikes by the maximum possible spikes such that your final number is between 0 and 1.

Thank you. With the next Peek-a-boo release I will include this. But I’m still looking for a way to evaluate results with two or more quite high spikes. In some tests I had e.g. a high peak at P1 and another peak at P18 (see below). I would also like to filter this somehow automatically. Maybe just count how many spikes are above average. I will give it a try.
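
A minimal Python sketch of the "count how many spikes are above average" idea, using the z340 table from the first post as input. Taking the average over only those periods that were the best one at least once is an assumption. Averaging over all 170 candidate periods would lower the threshold and flag more periods.

```python
def spikes_above_average(histogram):
    """Periods whose count is above the mean count of all periods in the histogram."""
    mean = sum(histogram.values()) / len(histogram)
    return sorted(p for p, n in histogram.items() if n > mean)

# Best-period counts for z340, copied from the table in the first post
z340 = {
    3: 5, 4: 8, 5: 8, 16: 38, 19: 217, 21: 2, 38: 11, 39: 1, 54: 8, 64: 1,
    76: 1, 81: 1, 84: 2, 85: 3, 86: 1, 90: 1, 91: 1, 93: 1, 97: 1, 110: 1,
    115: 8, 117: 10, 123: 1, 128: 1, 129: 2, 130: 1, 138: 1, 139: 1, 142: 1,
    150: 1, 157: 1,
}
print(spikes_above_average(z340))  # [16, 19, 38]
```

The mean here is 340 / 31, roughly 10.97, so P38 with 11 hits only just passes while P117 with 10 does not. A plain average is therefore a fairly fragile cut-off.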

Here are some more finds that might be interesting. However, I’m still not sure if my "Deceptive Periods" test is of any use at all.

If you only look at the lower half of z340, you will see an above-average peak at period 3 and NO hit for P19:

If you omit the upper and lower row and the first column, the following results:

Interesting that there are two distinct spikes at P1 and P18. Probably just an interesting coincidence, but let’s see where this leads…maybe I can discover regions that don’t belong to the cipher, but are responsible for P19.


 
Posted : March 3, 2019 4:23 pm
smokie treats
(@smokie-treats)
Posts: 1626
Noble Member
 

If you only look at the lower half of z340, you will see an above-average peak at period 3 and NO hit for P19

I find that interesting. If you read the message RLTB in groups of three symbols, ABC, then the P15 bigram symbols are both on either A, B or C. And the period 18, 36 and 54 unigram repeats, which make up the pivots, are also on either A, B or C. And the period 18 bigram symbols, reading the message LRBT, that have symbols that match P15 bigram symbols, are also either A, B or C.

Yesterday I checked again for a coincidence count spike at multiples of 3 reading RLTB, but there isn’t a pattern.

 
Posted : March 3, 2019 5:29 pm
Jarlve
(@jarlve)
Posts: 2547
Famed Member
 

Here are some more finds that might be interesting. However, I’m still not sure if my "Deceptive Periods" test is of any use at all.

I would say that at least it may have specific use cases. Bigrams and alternatives may not always work for determining the plain text direction. But it is good to have them.

Here is smokie18e. It is a p1 cipher but smokie changed symbols or letters around to create a fake period at p19. Your test scores p19 at 158. What do you think?

>^DWZI:6>+[OgY`MM
4)Q<$GaW-12c9f80]
;NWF.O8gY,-ZeISUT
<M:.B:G[+>)K4W+1b
Z6`)KWRZ$BI_.A$5
.U>ZZO"0Z`8,9YBb;
N.:.QW&`<IW.[gF/$
+SLV<.,ZV`<`^DID,
1<R[M`#^.aO59Nf*]
)W-Z4cI<`F*Q08Bh>
;5h6[,:Q2GZ9]W`(4
fKSM)1J<..FRZA0N
PIOW-]8Z[Z9V..5S.
7WL>f$C6Z>,M[`:Oe
N<W.[2.BJZ,R6I[G-
acZ#.Dg)%:g:-Z8TV
IA)>W"<`P]fg)Z9M,
R[WI-IG..O:/.WU4<
ZBY4Z)*+><^I&hD`L
B)K(aD/W.]7>)KWR;


 
Posted : March 3, 2019 10:26 pm
(@largo)
Posts: 454
Honorable Member
Topic starter
 

Here is smokie18e. It is a p1 cipher but smokie changed symbols or letters around to create a fake period at p19. Your test scores p19 at 158. What do you think?

Indeed, smokie18e shows a very distinct peak at P19, but also some noise beyond P19. It becomes clearer if you perform untranspose P19 and then generate the chart again:

The noise may be an indication that it is a fake P19. Unfortunately, this behavior does not show up with all fake P19 ciphers. Here again the example I posted above:

xmbGZO2Sb7nHoj3P2
ypQPS0OuJqkcKdzP4
TheUVAlrIOPFBLvuP
SsVMC1bDNxOWQA0cG
XdKjieJQY1ymSnBu!
O4TokP8L2O8M!P5Jb
!ZpqcuHOhFPuWrw8d
!UClsVXvIDjAKwkO8
L3ztYBlo8eGFl8Sxb
ZJcMCQ6Hd0pjJPPWq
QOQ4He1DPMXf!AYZ5
ANvur8sTz6K2BKwQt
UJC3WuOViP8YbLkJO
SmuPTg4HIZc0GcW1h
P5M3d0P1eHOuJDGA0
bU!QnVXPNKmH8pSqc
YZBvLrl1O2xs8CGHP
8wTO7tMe7bIcBUZ1d
0CPNEOCN1TmWnU7of
eXpbihPVkDMv3O0yq

This cipher also shows a lot of noise apart from P19. But if you perform untranspose P19, the noise is gone and you only have the peak on P1:

xQVbY8WkbOuuHHHC1
emPAD1MrOZQrOIO8G
dXbSlNy!w8J48VZup
H0pG0rxmP8LcHsicJ
SPCbZOIOS5d3MeTP0
Dq8PiOuOWnJ!zC1z8
GGcwNh2JPQBbUtQD6
YcAYTEPSqFAu!CY6P
KbW0ZOOVbkB0!ZlBH
M2L1bB7Ck7cLcOpsl
dXBkhUvtNDnKvG4qV
o0fKJP!LM1MHduXTc
X8p!wO5QreTvozPdo
uvejAQSMnl7m3jPSK
kHIGJYtm3V1bWO34s
jPODFPZUudXOIn0PT
Vi8hjlP5JP0P2cUy2
hMeLFA8WACTPNxB7q
yeCJ2PKSqN3g1KsUo
pU1QOuwxQvW4em8Zf
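
For reference, a Python sketch of what "untranspose P19" appears to do here: read every 19th symbol starting at position 0, then every 19th symbol starting at position 1, and so on. This is inferred from the two grids above, where the first symbols of the result (x, Q, V, b, Y, 8) sit at positions 0, 19, 38, 57, 76 and 95 of the original. The function names are my own, and AZdecrypt’s implementation may differ in details.

```python
def untranspose(text, period):
    """Read every `period`-th symbol, starting at offsets 0, 1, ..., period - 1."""
    return "".join(text[start::period] for start in range(period))

def transpose(text, period):
    """Inverse operation: put each symbol back at the position it was read from."""
    order = [i for start in range(period) for i in range(start, len(text), period)]
    out = [""] * len(text)
    for symbol, i in zip(text, order):
        out[i] = symbol
    return "".join(out)

print(untranspose("abcdefgh", 3))  # adgbehcf
```

Both functions expect the cipher as a single string without line breaks. Applying transpose with the same period to the untransposed text gives back the original.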

Now it gets interesting: take the cipher and copy it into AZDecrypt. The transposition solver takes a very long time to solve it, although all it needs is a simple transpose P19. Could something similar have happened with z340? I don’t know exactly how to describe it, but does AZDecrypt perhaps "concentrate" too much on high n-gram counts at certain periods? There is probably an option in the settings that I don’t know about yet ;)

I have changed my measurement method for testing purposes. Instead of counting how often each period is the best one, I now add up the n-gram counts of each best period.

Before:

	// Track best periods
	var best2GramPeriod = cipherShifted.GetBestPeriod(2, 170);

	if (map2Grams.ContainsKey(best2GramPeriod.bestPeriod))
		map2Grams[best2GramPeriod.bestPeriod]++;
	else
		map2Grams[best2GramPeriod.bestPeriod] = 1;

After:

	// Track best periods
	var best2GramPeriod = cipherShifted.GetBestPeriod(2, 170);

	if (results.ContainsKey(best2GramPeriod.bestPeriod))
		results[best2GramPeriod.bestPeriod] += best2GramPeriod.nGramCount;
	else
		results[best2GramPeriod.bestPeriod] = best2GramPeriod.nGramCount;

This helps to visualize the noise more clearly. However, this does not make the test significantly more meaningful.


 
Posted : March 10, 2019 2:54 pm
smokie treats
(@smokie-treats)
Posts: 1626
Noble Member
 

Try to solve it as a regular homophonic cipher, not a transposition. That might be the one where I manipulated the key until I got a false spike at P19.

 
Posted : March 10, 2019 3:39 pm
(@largo)
Posts: 454
Honorable Member
Topic starter
 

Try to solve it as a regular homophonic cipher, not a transposition. That might be the one where I manipulated the key until I got a false spike at P19.

Sorry for the misunderstanding. The cipher that takes so long to solve is my example cipher from above. It has a fake P19 peak but is actually P1. If you apply "Untranspose P19" to it, it shows a peak at P1. This seems to be a challenge for AZDecrypt.

 
Posted : March 10, 2019 3:50 pm
Jarlve
(@jarlve)
Posts: 2547
Famed Member
 

The noise may be an indication that it is a fake P19. Unfortunately, this behavior does not show up with all fake P19 ciphers.

I have played around with your test in Peek-a-boo and I really like it.

I have changed my measurement method for testing purposes. Instead of collecting the best periods in a list, I added the number of nGrams per best period.

Do you have some graphs?

Now it gets interesting: Take the cipher and copy it into AZDecrypt. The transposition solver takes a very long time to solve it, although it would only need a simple transpose P19. Could something similar have happened with z340? I don’t know exactly how to describe it, but does AZDecrypt perhaps "concentrate" too much on high nGramm numbers at certain periods? Probably there is an option in the settings that I don’t know yet

The transposition solver uses bigrams to help its search, though it can be disabled by changing "(Substitution + transposition) Search states" to 1. Then AZdecrypt will only use its n-gram score. You can also reduce the search space by changing "(Substitution + transposition) Operation stack size" to 1. I tried with both and it quickly found a reasonable solution to your cipher. Look to the image to see what I’ve added for the next release. It will make things much easier and understandable. The idea is to add all transpositions that are "simple" to this list, in other words, the transpositions that have a reasonably small search space. And the transpositions that have a potentially large search space can be called "keyed" and these need a specialized individual solver.


 
Posted : March 11, 2019 12:43 am
(@largo)
Posts: 454
Honorable Member
Topic starter
 

I have changed my measurement method for testing purposes. Instead of collecting the best periods in a list, I added the number of nGrams per best period.

Do you have some graphs?

Sorry, but my changed measurement method had a bug that corrupted the results. After fixing it, the new test gives almost exactly the same picture as the old one. The differences are too marginal to be meaningful.

The transposition solver uses bigrams to help its search, though it can be disabled by changing "(Substitution + transposition) Search states" to 1. Then AZdecrypt will only use its n-gram score. You can also reduce the search space by changing "(Substitution + transposition) Operation stack size" to 1. I tried with both and it quickly found a reasonable solution to your cipher.

Thanks, that worked!

Look to the image to see what I’ve added for the next release. It will make things much easier and understandable. The idea is to add all transpositions that are "simple" to this list, in other words, the transpositions that have a reasonably small search space. And the transpositions that have a potentially large search space can be called "keyed" and these need a specialized individual solver.

This is a very good idea! It will make it easy to test specific transpositions. I’m already looking forward to the release. Thank you for continuously improving AZDecrypt!


 
Posted : March 16, 2019 7:14 pm