Zodiac Discussion Forum

New tool: Peek-a-bo…
 
Notifications
Clear all

New tool: Peek-a-boo

72 Posts
8 Users
0 Reactions
30.1 K Views
Jarlve
(@jarlve)
Posts: 2547
Famed Member
 

I did a small sample test with my plaintext library:

Plaintext 1 to 100 into homophonic substitution (target raw ioc: 1857) into period 17 transposition (as randomization): average p1 bigrams: 15.97.
Plaintext 1 to 100 into homophonic substitution (target raw ioc: 2237) into period 17 transposition (as randomization): average p1 bigrams: 20.69.

Almost a jump of 5 bigrams. As stated before, I regard the ioc as a value that approximates the repeat potential of a text where a higher ioc will generally have more repeats. Be that unigrams, bigrams etc, it affects all measurements and needs to be taken into account primarily when comparing stuff.

Then the same test with no period transposition as randomization:

Plaintext 1 to 100 into homophonic substitution (target raw ioc: 1857): average p1 bigrams: 30.92.
Plaintext 1 to 100 into homophonic substitution (target raw ioc: 2237): average p1 bigrams: 37.82.

A jump of almost 7 bigrams now, thus, a higher ioc stabilizes the plaintext p1 bigram repeats versus randomizations (for instance random periods).


AZdecrypt

 
Posted : November 2, 2017 3:18 pm
doranchak
(@doranchak)
Posts: 2614
Member Admin
 

Thanks for that test, Jarlve. It seems to suggest that a normally enciphered message of length 340, with the same IoC as Z340, will tend to contain more bigrams on average than what we observe in Z340. Did you capture standard deviation? I am curious how often the test ciphers could be expected to exceed the p1 bigram count of Z340.


http://zodiackillerciphers.com

 
Posted : November 2, 2017 7:00 pm
smokie treats
(@smokie-treats)
Posts: 1626
Noble Member
 

Here is a book, Jarlve. Brave New World which is about 860 x 340.

https://drive.google.com/file/d/0B5md-0 … sp=sharing

I think you have to download it instead of just viewing it. I deleted everything but plaintext.

Largo: Thank you.


 
Posted : November 3, 2017 4:36 am
(@largo)
Posts: 454
Honorable Member
Topic starter
 

Nice update Largo. Could you add one book to the download?

Sure. I’ll do that with the next update.

Here is a book, Jarlve. Brave New World which is about 860 x 340.

https://drive.google.com/file/d/0B5md-0 … sp=sharing

I think you have to download it instead of just viewing it. I deleted everything but plaintext.

In case of using my statistics tool it is not necessary to delete non plaintext content. Simply put the raw plaintext files into the folder "Plaintext source", thats it. The tool then creates upper case text blocks like the one you have linked above.

All UTF-8-Textfiles from Project Gutenberg should work. Just download, copy into the folder and click "Start". After the tool has finished you can find all ciphers that match the statistics criteria in the "Logs" folder.

https://www.gutenberg.org/ebooks/search … Ddownloads


 
Posted : November 3, 2017 11:48 am
Jarlve
(@jarlve)
Posts: 2547
Famed Member
 

I am curious how often the test ciphers could be expected to exceed the p1 bigram count of Z340.

Plaintext message 1 from 100 each encoded a 100 times (sample test size 10000) with 63 symbols and perfect cycles.

Raw ioc target 1850: samples over 25 p1 bigrams: 7988
Raw ioc target 1900: samples over 25 p1 bigrams: 8470
Raw ioc target 1950: samples over 25 p1 bigrams: 8851
Raw ioc target 2000: samples over 25 p1 bigrams: 9242
Raw ioc target 2050: samples over 25 p1 bigrams: 9430
Raw ioc target 2100: samples over 25 p1 bigrams: 9597
Raw ioc target 2150: samples over 25 p1 bigrams: 9753
Raw ioc target 2200: samples over 25 p1 bigrams: 9819
Raw ioc target 2250: samples over 25 p1 bigrams: 9888 <— target ioc equal to 340

In other words, with a raw ioc of 1850 you could expect about 1 in 5 of the ciphers to have as much or less bigrams than the 340. And with a raw ioc of 2250 this goes up to about 1 in 100. These numbers are rough and will vary between encoders/tests etc. I am not worried about the accuracy here.

The consensus that I want to make is that we at least need to take in account the ioc of the 340 when doing comparisons as a rough approximation of its expected repeat potential.


AZdecrypt

 
Posted : November 3, 2017 12:25 pm
(@largo)
Posts: 454
Honorable Member
Topic starter
 

Thank you for explanation and samples Jarlve. I will implement ioc calculations in the next update


 
Posted : November 3, 2017 12:30 pm
smokie treats
(@smokie-treats)
Posts: 1626
Noble Member
 

The consensus that I want to make is that we at least need to take in account the ioc of the 340 when doing comparisons as a rough approximation of its expected repeat potential.

I agree.

I think that the concept of an IOC for a homophonic cryptogram is confusing to me because while the English language has an IOC, there is no large corpus of homophonic cryptogram "language" in existence to compare the 340 with.

But IOC is a number used to summarize symbol count distribution, and the Zodiac 340 has a symbol count distribution that we can try to match with test messages. I do by trying to make the distribution chart similar to the 340 chart by eye, but sometimes I forget to look.

Symbol count distribution is a function of the count of each letter in the alphabet in the message, key efficiency, and possible influence by random symbol selection. An inefficient key with about 63 symbols and about 23 different letters, and more symbols mapping to low frequency letters will produce a symbol count distribution that has more symbols with lower count. The inefficient key will also have fewer symbols mapping to high frequency letters, causing the count of P1 repeats to go up.

In other words, let’s say I have only three of the letter W in my message, and want to increase the count of P1 repeats in my cryptogram. I make a key that is inefficient by mapping only a few symbols to high frequency letters. So I have to put the other symbols somewhere on the key to get around 63 symbols total. Instead of only mapping one symbol to W, I map three symbols to W. If I do that, then those symbols will only appear once in the cryptogram. Mapping more symbols to several low frequency letters will shift the symbol count distribution so that there are more symbols with lower count. The IOC will be lower, but a high count symbol such as the + will offset that and increase the IOC.

IOC, letter count distribution, symbol count distribution, key efficiency, P1 repeats, and random symbol selection are all related to each other. I will try to look at IOC more often with my test messages. A test message should have an IOC that is comparable to the 340 IOC.

I have been taking a break for work reasons. I want to work on Alberti ciphers more, to try different Alberti scenarios and see what happens with regional bias and also how two different Alberti shifts could create regional bias depending on the shifts. In other words, let’s say I want to encode the top and bottom third of a message with one Caesar shift and the middle third with a different Caesar shift before I encode homophonic. There are I think 26 * 25 combinations of Caesar shifts. Would some combinations cause more regional bias than others? Anyway, that discussion belongs in the other thread. I will mess with it or something else when the weather gets crappy and I have a legitimate excuse for not working on my yard. And my immediate career issue is settled one way or the other.

Thanks!


 
Posted : November 3, 2017 3:19 pm
doranchak
(@doranchak)
Posts: 2614
Member Admin
 

Smokie, did you see this post from freichmann about his chiffre disk tool?

https://www.tapatalk.com/groups/zodiack … t7810.html


http://zodiackillerciphers.com

 
Posted : November 3, 2017 11:44 pm
smokie treats
(@smokie-treats)
Posts: 1626
Noble Member
 

No, I didn’t see it. I googled it and it looks like an Alberti disk or a Caesar disk. The Alberti disk and homophonic ciphers were developed at about the same time, and are found next to each other in some cryptography books. I would like to explore further regional biases caused by transposition + Alberti + homophonic ciphers because of the symbols that appear at the top and bottom of the message. And, I wonder if it is possible to rotate an Alberti disk at certain intervals in a Alberti + homophonic cipher and create a lot of P15/19 repeats. I wonder about different rotations, and how alignment of frequent letters or less frequent letters will change the count distribution of Alberti ciphertext. Could certain rotations / alignments and intervals + homophonic cause some of the statistical phenomenon observed in the 340? It’s a pretty broad subject and I am just wondering about it at this point. Thanks.

I am still going to delete everything but the letters in some books as I move forward because I will use them in my spreadsheets.


 
Posted : November 4, 2017 3:40 am
(@largo)
Posts: 454
Honorable Member
Topic starter
 

Hi,

Jarlve, sorry for asking a maybe dumb question, but I am a litte bit confused.

What I know about the IOC is that it can be used to determine whether a text is derived from a natural language. In addition, it can be used to determine the language itself. (English about 6.6%, German 7.8% etc.). However, the statistics shown by you seem to refer to something else. What exactly is the RAW ioc and how do you calculate it? What is the difference to the "common" ioc?


 
Posted : November 4, 2017 2:47 pm
doranchak
(@doranchak)
Posts: 2614
Member Admin
 

Smokie, the Alberti approach is interesting. I’m glad you are exploring that.

I’ve been looking at unigrams again, testing how many rows/columns each symbol covers. I found that Z340 seems to have a lot of low-frequency symbols that repeat in columns, when random shuffles indicate that they should not repeat in columns so much. For example, look at the columnar repeats for hollow square, ^, left-filled square, L, and P:

Has anyone noticed this before? I am still trying to determine if this is significant or just an expected phenomenon (i.e., there are enough low-frequency symbols for SOME of them to be expected to have this behavior). I’ll try to post my findings on the unigrams thread.


http://zodiackillerciphers.com

 
Posted : November 4, 2017 4:20 pm
Jarlve
(@jarlve)
Posts: 2547
Famed Member
 

What I know about the IOC is that it can be used to determine whether a text is derived from a natural language. In addition, it can be used to determine the language itself. (English about 6.6%, German 7.8% etc.). However, the statistics shown by you seem to refer to something else. What exactly is the RAW ioc and how do you calculate it? What is the difference to the "common" ioc?

The ioc is typically divided over the length of the cipher times the length of the cipher minus 1 (zero to one range normalization). Omit this normalization and you have what I call the raw ioc. These 6.6% and 7.8% values are the normalized ioc multiplied by a 100. The ioc is just a way of adding more weight to more frequently occuring symbols (exponential instead of logarithmic).

It is excellent to gauge the repeat potential of a text. If you say that german has a 7.8% and english 6.6% then I automatically know that the average german plaintext will have more bigrams than the average english plaintext. It is simple, the ioc is higher so it is more probable for repeats to form.

The index of coincidence provides a measure of how likely it is to draw two matching letters by randomly selecting two letters from a given text.

To get the raw ioc of a cipher just sum the product of all individual symbol frequencies multiplied by itself minus 1. You do not need to use the raw ioc per sé, I just prefer to use it in certain conditions as it is an integer. The 340 has a raw ioc of 2236.

24 * 23: 552
12 * 11: 132
11 * 10: 110
etc...

AZdecrypt

 
Posted : November 4, 2017 6:28 pm
Jarlve
(@jarlve)
Posts: 2547
Famed Member
 

I want to work on Alberti ciphers more.

Could you give a simple explanation of how this type of encoding works? Thanks.


AZdecrypt

 
Posted : November 4, 2017 6:32 pm
Jarlve
(@jarlve)
Posts: 2547
Famed Member
 

I found that Z340 seems to have a lot of low-frequency symbols that repeat in columns.

I am thinking sequential homophonic substitution.


AZdecrypt

 
Posted : November 4, 2017 6:35 pm
smokie treats
(@smokie-treats)
Posts: 1626
Noble Member
 

Smokie, the Alberti approach is interesting. I’m glad you are exploring that.

I’ve been looking at unigrams again, testing how many rows/columns each symbol covers. I found that Z340 seems to have a lot of low-frequency symbols that repeat in columns, when random shuffles indicate that they should not repeat in columns so much. For example, look at the columnar repeats for hollow square, ^, left-filled square, L, and P:

Has anyone noticed this before? I am still trying to determine if this is significant or just an expected phenomenon (i.e., there are enough low-frequency symbols for SOME of them to be expected to have this behavior). I’ll try to post my findings on the unigrams thread.

I have never noticed that before, and don’t know if it is significant.


 
Posted : November 5, 2017 12:39 am
Page 3 / 5
Share: