Homophonic substitution

Jarlve · 2015-08-02T16:42:57Z

This thread is a continuation of viewtopic.php?f=81&t=267 in which several aspects of the Zodiac 340 cipher are discussed and researched. I'd like to continue the work from there in this thread because then I can use the main post to reference and update all the cipher material being discussed.

Some of the questions which the contributors are trying to answer:

- Is the 340 a straightforward homophonic substitution cipher or is there something else going on?
- The 340 does not seem to cycle as well as the 408, what is going on? (doranchak:... _sequences)
- To what extent is the 340 cyclic or random? Can we find areas - as for instance with the last part of the 408 - that are more random?
- Is it possible to attribute the 340 not cycling as well as the 408 (despite its higher symbol count) due to some transposition after encoding?
- Some of the medium-high count symbols do not seem to cycle well, are these possibly wildcards/polyalphabetic or 1:1 substitutes? (smokie treats)
- Can we make a system that can adequately group homophones that belong to the same letter without having to solve the cipher? (smokie treats, glurk)
- Is there a discrepancy between symbols/cycles/etc on odd and even positions for the 340? If so, what could be causing this? (daikon, doranchak, smokie treats)
- There is a significant bigram repeat peak at period 19, is this a lead to the encryption scheme of the 340? (daikon)

Related:

2 symbol cycle analysis for the 340 evens only. (doranchak)
2 symbol cycle analysis for the 340 odds only. (doranchak)
Symbol position factors for the 340, 408 and smokie ciphers. (doranchak)

340 cipher numeric and symbolic version:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 5 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 20 34 35 36 37 19 38 39 15 26 21 33 13 22 40 1 41 42 5 5 43 7 6 44 30 8 45 5 23 19 19 3 31 16 46 47 37 19 40 48 49 17 11 50 51 9 19 52 53 10 54 5 44 3 7 51 6 23 55 30 17 56 10 51 4 16 25 21 22 50 19 31 57 24 58 16 38 36 59 15 8 28 40 13 11 21 15 16 41 32 49 22 23 19 46 18 27 40 19 60 13 47 17 29 37 19 61 19 39 3 16 51 20 36 34 62 63 53 31 55 40 6 38 8 19 7 41 19 23 5 43 29 51 20 34 55 38 19 3 54 50 48 2 11 25 27 20 5 61 14 37 31 23 16 29 36 6 3 41 11 30 50 14 53 37 28 19 52 20 51 40 63 47 42 34 22 19 18 11 50 51 20 36 21 58 44 3 6 15 51 18 7 32 50 16 53 61 28 36 8 53 48 19 19 34 20 59 12 30 35 53 47 56 2 4 8 38 39 50 55 19 11 36 28 45 40 20 31 21 23 5 7 28 32 37 57 15 16 3 36 14 19 13 12 63 56 29 19 51 6 26 20 11 33 13 19 19 33 26 56 40 26 36 9 23 42 1 14 54 21 33 5 11 51 10 17 26 29 43 48 20 46 27 23 20 30 55 56 36 4 37 25 1 18 5 10 42 40 39 23 44 62 11 31 58 19

HER>pl^VPk|1LTG2d
Np+B(#O%DWY.<*Kf)
By:cM+UZGW()L#zHJ
Spp7^l8*V3pO++RK2
_9M+ztjd|5FP+&4k/
p8R^FlO-*dCkF>2D(
#5+Kq%;2UcXGV.zL|
(G2Jfj#O+_NYz+@L9
d<M+b+ZR2FBcyA64K
-zlUV+^J+Op7<FBy-
U+R/5tE|DYBpbTMKO
2<clRJ|*5T4M.+&BF
z69Sy#+N|5FBc(;8R
lGFN^f524b.cV4t++
yBX1*:49CE>VUZ5-+
|c.3zBK(Op^.fMqG2
RcT+L16C<+FlWB|)L
++)WCzWcPOSHT/()p
|FkdW<7tB_YOB*-Cc
>MDHNpkSzZO8A|K;+

Alterations of the 340:

- In relation to the bigram peak at period 19:
Scheme: move 1 row down, 2 columns right and repeat (wrap around cipher): 340_1rd-2cr-w.txt (doranchak)
Grid 19 by 18, direction North-East (vertical) and 2 "?" symbols added: 340_19by18_n-e.txt
Grid 20 by 17, direction SW-SE (diagonal): 340_20by17_sw-se.txt
Grid 17 by 19, 17 symbols filler at end, vertically untransposed: 340_323_17.txt (smokie treats)
Grid 17 by 20, 16 symbols filler at end, vertically untransposed: 340_324_16.txt (smokie treats)
Grid 17 by 20, 15 symbols filler at end, vertically untransposed: 340_325_15.txt (smokie treats)
Grid 17 by 20, 14 symbols filler at end, vertically untransposed: 340_326_14.txt (smokie treats)
Grid 17 by 20, 13 symbols filler at end, vertically untransposed: 340_327_13.txt (smokie treats)
- In relation to the odd/even encoding scheme:
Evens only: 340evens.txt
Odds only: 340odds.txt
Randomized, shuffled: 340shuffled.txt (doranchak)

Tools/links/solvers:

- David Oranchak
Zodiac Killer Ciphers:
Zodiac Ciphers wiki:... =Main_Page
CryptoScope:
340 Webtoy:
Zodiac Pattern Drawer: | (info)
Word Search Gadget:
- glurk
ZKDecrypto: and viewtopic.php?f=81&t=2268
- Michael Cole
The Zodiac Revisited:
- Jarlve
AZdecrypt:

Visualizations:

- In relation to the bigram peak at period 19 and 15 (mirrored 340):
Doranchak's ngram viewer.
Doranchak's period calculator.
Doranchak's fragment explorer.

Test ciphers:

I'd like to introduce a whole new range of ciphers to test on, mainly being homophonic substitution but with different schemes. More will be added and particular schemes can be requested. All of these ciphers can have low count 1:1 substitutes. Please use the proper names of the ciphers when referencing them. There should be no errors in these ciphers but the number of homophones per letter were handpicked each time to introduce a human element.

Perfect cycles: c_p1.txt c_p2.txt c_p3.txt
Randomization of cycles: (the numb...

doranchak

(@doranchak)

Posts: 2614

Member Admin

Thanks for linking to your previous study. It is useful to see that random cycles have very similar stats to full shuffles. It seems that the individual cycle sequences in Z340 have a relatively high probability of occurring strictly by chance, but for so many cycles to do so seems very improbable.

If we are lucky, the disruption in cyclic symbol assignment is directly connected somehow to the additional steps that may have been applied to the plaintext.

I’m interested to see what you come up with about the regional bias question for cycles.

http://zodiackillerciphers.com

Posted : May 2, 2016 2:24 pm

smokie treats

(@smokie-treats)

Posts: 1626

Noble Member

Wouldn’t you have to have a cycle in the plaintext to get a false homophonic cycle in the ciphertext? Am I thinking right?

They would probably be mostly random and with not very many consecutive alternations. But don’t some letters naturally occur before other letters in common words or word orders (T comes before H and E)? I am just wondering if a study of naturally occurring plaintext cycles, including particular letters, count of consecutive alternations, frequency, and actual distance (consider T, H and E) between high frequency letters, would help us to guess if a ciphertext cycle is true or false.

For example, a two-symbol ciphertext cycle with a few consecutive alternations, where the two positions are close to each other, may be caused by a repeated phrase (not taking transposition into consideration).

I am not really sure if a transposition, performed before encoding, would affect cycles, except that it would destroy the plaintext cycles that existed before transposition and create new plaintext cycles.

Posted : May 2, 2016 3:44 pm

doranchak

(@doranchak)

Posts: 2614

Member Admin

Yes, it makes more sense that the disruption would be because of a post-encipherment step.

Also, a falsely detected homophone cycle does not necessarily reflect the conditions of the plaintext, since the symbols in the false cycle don’t necessarily stand for the same letter. They could be anything. I don’t think the cycle would need to exist in the plaintext. Maybe it would help to dig up some sample false cycles from Z408 and other test ciphers.

http://zodiackillerciphers.com

Posted : May 2, 2016 4:03 pm

smokie treats

(@smokie-treats)

Posts: 1626

Noble Member

I think that a good exercise would be to find randomly occurring cycles in a plaintext sample. Then cycle encode with 63 symbols. Find the false cycles in the message. Then compare to the originally found randomly occurring false cycles in the plaintext sample.
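The cycle-encoding step of that exercise could look roughly like the sketch below. It is only an illustration: the function name and the tiny three-letter key are made up for the example, and a real test would use a full 63-symbol key.

```python
from itertools import cycle


def cycle_encode(plaintext, homophones):
    """Encode each letter by stepping through its homophones in a fixed order."""
    # One endless iterator per letter, so every letter cycles independently.
    iterators = {letter: cycle(symbols) for letter, symbols in homophones.items()}
    return [next(iterators[ch]) for ch in plaintext]


# Hypothetical mini key (illustration only, not a 63-symbol key).
key = {"A": [1, 2, 3], "N": [31, 32], "T": [48, 49, 50]}
print(cycle_encode("TATTANT", key))  # [48, 1, 49, 50, 2, 31, 48]
```

With perfect cycling like this, every letter's symbols alternate in strict order, so any other cycle the detector reports between symbols of different letters is a false one by construction.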

Posted : May 2, 2016 4:59 pm

Jarlve

(@jarlve)

Posts: 2547

Famed Member

Topic starter

Thanks for your massive analysis doranchak, I like it a lot!

Here is another interesting observation. Look at the values for “perfectCycle” for L=2, 3, and 4 for each of the ciphers. For smokie33, sigma is 6.7 for L=2, then goes up to 8.6 for L=3 and stays at 8.5 for L=4. For Z408, sigma is 30.6 for L=2, then more than doubles to 77.7 for L=3, and leaps to 187.4 for L=4. However, for Z340, sigma is 5.7 for L=2 and then drops significantly to 3.8 for L=3 and 2.7 for L=4. Perhaps whatever is screwing with ngram repeats has this suppressing effect on cycling as L increases.

That’s extremely interesting. Smokie and I have found the 340 to be "very cyclic" but both of us have mainly looked at 2 symbol cycles. It is also interesting to note that smokie’s wildcard scheme has the effect of suppressing the non-repeat score. That seems not to be happening for the 340. As a reminder, there is a strong peak for the non-repeating string frequencies at 17. And empirically, I found that it’s hard to mix that observation with randomization of cycles or transposition after encoding.

I have just run a 3 symbol cycle test and found the 340 to be further away from the randomized average than with 2 symbol cycles, but my test may be flawed. That aside, let’s assume that the higher symbol cycles are indeed more suppressed, because that is what is shown relative to the smokie33.
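For anyone who wants to repeat this kind of comparison, here is a minimal sketch of a "distance from the randomized average" test. It is an assumption about the general approach, not doranchak's or my actual code: the function names are made up, and a simple repeated-bigram count stands in for the cycle score.

```python
import random
import statistics


def count_repeated_bigrams(symbols):
    """Stand-in statistic: bigram occurrences beyond the first of each kind."""
    bigrams = list(zip(symbols, symbols[1:]))
    return len(bigrams) - len(set(bigrams))


def sigma_vs_shuffles(symbols, score, trials=1000, seed=0):
    """How many standard deviations the observed score lies from shuffled copies."""
    rng = random.Random(seed)
    observed = score(symbols)
    pool = list(symbols)
    samples = []
    for _ in range(trials):
        rng.shuffle(pool)  # a full shuffle keeps symbol counts, destroys order
        samples.append(score(pool))
    mean = statistics.fmean(samples)
    spread = statistics.pstdev(samples)
    if spread == 0:
        return 0.0  # all shuffles scored the same, no meaningful sigma
    return (observed - mean) / spread


# Toy sequence with strong structure; a real run would pass the 340 symbols.
print(sigma_vs_shuffles(list("ABCD" * 10), count_repeated_bigrams, trials=200))
```

Any other statistic (2, 3 or 4 symbol cycle scores) can be passed in as the score function in the same way.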

Wildly brainstorming, just want to throw some rough ideas around.

– Prior to the homophonic substitution an encryption was used that redistributed the letters over a large number of symbols. Let’s say 40 or so, meaning that not many 3 to 4 symbol cycles had to be used.
– Something intentionally was applied to the higher symbol cycles to mask them.
– Some kind of polyalphabetism thrown into the mix.

AZdecrypt

Posted : May 2, 2016 5:44 pm

Jarlve

(@jarlve)

Posts: 2547

Famed Member

Topic starter

340 versus AZdecrypt:

This is a particular test I wanted to run, and also a test of the solving routine of the new and upcoming AZdecrypt. It now automatically normalizes the scores of the n-gram files so that, on average, ciphers score the same. The 340’s score of 20351 with the Practical Cryptography 5-grams has been kept. So for this test, the normal 340 is scored (maxed out) with 6 different sets of n-grams and 2 different plaintext normalizations, namely the index of coincidence and entropy.

Notice that the same plaintext solution for the 340 is a dominant theme in almost all results; I still wonder if that is to be expected. I think that ZKDecrypto also converges on this theme. Also notice that the scores go down as the n-gram size goes up, even though the n-gram scores are normalized as stated above. This is because the solver cannot find a good fit, and thus lower-scoring, less frequent n-grams have to be used. And when the n-gram size goes up from 5 to 6 and 7, the fit becomes increasingly worse if the cipher somehow is not as expected.
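For reference, the statistics printed with each result can be approximated with the textbook definitions below. This is a sketch under the assumption that the usual formulas apply; AZdecrypt's exact implementation is not shown in this post, and the function names are illustrative.

```python
import math
from collections import Counter


def index_of_coincidence(text):
    """Chance that two positions picked without replacement hold the same letter."""
    n = len(text)
    counts = Counter(text)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def entropy(text):
    """Shannon entropy of the letter distribution, in bits per letter."""
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in Counter(text).values())


def multiplicity(cipher):
    """Unique symbols divided by cipher length."""
    return len(set(cipher)) / len(cipher)


# 63 unique symbols over 340 positions gives the multiplicity reported below.
print(round(63 / 340, 7))  # 0.1852941
```

The first two functions take a candidate plaintext, the third takes the cipher as a string or a list of symbol numbers.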

Index of coincidence:

AZdecrypt 0.993 (Practical Cryptography 5-grams)

Score: 20351.27 Ioc: 0.07520389 Entropy: 3.919787 Characters: 340 Letters: 20

gshiscentopernatm
estelsilarldeiavo
edvantocarlorsegi
nsspectingsitthat
sunterimporttheof
sthercisimporital
sotablytoacanderp
lativisitseletoru
mentatchtreadysea
seconteitispereds
othforspalesannai
teachipionendther
esundsteporealyth
careevoteadanertt
deceiveupsinocost
padgeealisedvnbat
hantrespetrcrepor
ttorperatingnflos
promrepreslieispa
inagesonecitypayt

Multiplicity: 0.1852941 Symbols: 63 Bigrams: 25 Flatness: 0.7331541 Sequential: 0.2292084

HER>pl^VPk|1LTG2d
Np+B(#O%DWY.<*Kf)
By:cM+UZGW()L#zHJ
Spp7^l8*V3pO++RK2
_9M+ztjd|5FP+&4k/
p8R^FlO-*dCkF>2D(
#5+Kq%;2UcXGV.zL|
(G2Jfj#O+_NYz+@L9
d<M+b+ZR2FBcyA64K
-zlUV+^J+Op7<FBy-
U+R/5tE|DYBpbTMKO
2<clRJ|*5T4M.+&BF
z69Sy#+N|5FBc(;8R
lGFN^f524b.cV4t++
yBX1*:49CE>VUZ5-+
|c.3zBK(Op^.fMqG2
RcT+L16C<+FlWB|)L
++)WCzWcPOSHT/()p
|FkdW<7tB_YOB*-Cc
>MDHNpkSzZO8A|K;+


AZdecrypt 0.993 (Reddit corpus 5-grams)

Score: 20650.56 Ioc: 0.06810689 Entropy: 4.020609 Characters: 340 Letters: 21

glhystundeconsedp
ostersiformnoiate
edcasticerrensogi
nssbuttingsitthad
mastolipcardtheep
sthurtiliplerydor
satawfudiavennonc
reditisitmomotina
postitchdreadywea
lotintuitisboredl
ithpallcomesissai
doathiciasesnther
owandstocarearuth
teroutadeinaneltt
devoiceallynicalt
cangoearisuntswed
hastnowlotrtrecen
tterloradingspres
creproblemmieilla
ysogosenocitycaut

Multiplicity: 0.1852941 Symbols: 63 Bigrams: 25 Flatness: 0.7331541 Sequential: 0.2292084



AZdecrypt 0.993 (Usenet corpus 6-grams)

Score: 17804.07 Ioc: 0.07766788 Entropy: 3.867273 Characters: 340 Letters: 19

gahsscentoplanotm
ustersilorsdeiane
edmantocorreasegi
nsspectingsitthat
sinterimporttheoe
sthercisimworstor
sotablytoacondeap
rotinisitsusethai
mentatchtreadylea
seconteitispereds
otheoraposesannai
teachipionendther
elindstuporearyth
coruenoteadanertt
declimeiwasnocost
padgeearisednnbot
hantallwetrcrepea
tterweratingneres
promrepressieiswa
snogusonecitypayt

Multiplicity: 0.1852941 Symbols: 63 Bigrams: 25 Flatness: 0.7331541 Sequential: 0.2292084



AZdecrypt 0.993 (Reddit corpus 6-grams)

Score: 17836.36 Ioc: 0.07860489 Entropy: 3.848518 Characters: 340 Letters: 20

gdhiscantopernotm
estersilarlsoiamo
edvastocorrorsegu
nssmactingsitthat
wastenimparttheoe
stharcinimnoritar
satabletoaconserp
rotumisitweletora
mostitchtreadydea
necontautismoredn
otheandpalesinsai
toachupianessther
edandstepareareth
coreamateisanentt
deceiveandinocant
pasgeearisasmsbot
hantrednotrcrepor
ttorneratingneros
promromnewlieinna
isagesonecitypaet

Multiplicity: 0.1852941 Symbols: 63 Bigrams: 25 Flatness: 0.7331541 Sequential: 0.2292084



AZdecrypt 0.993 (Usenet corpus 7-grams)

Score: 14542.63 Ioc: 0.08578865 Entropy: 3.731695 Characters: 340 Letters: 18

gdhsscentopornotd
estersilarlseaamo
edhastocorrorsegi
nsspectantsitthat
wastenedparttheoe
sthercinaddorstar
satabletoaconserp
rotimesitweletgra
destitchtreadylea
neconteitisperedn
otheandpalesinsai
teachipaanessther
elandstepareareth
coreemateisanentt
decoaheaddsnocant
pasteearisesmsbot
hantroldetrcrepor
ttorderatingneros
prodrepnewlieanda
ssagesonecitypaet

Multiplicity: 0.1852941 Symbols: 63 Bigrams: 25 Flatness: 0.7331541 Sequential: 0.2292084



AZdecrypt 0.993 (Reddit corpus 7-grams)

Score: 14631.92 Ioc: 0.09003991 Entropy: 3.637693 Characters: 340 Letters: 16

iiestsinmeforlutn
ettendonarmseasto
eshastoturnordois
ntthishandtottest
hastoconfirmtheel
theirsonanderstan
ditsandtoagunsorf
nutstodothemotora
nestattetreasures
nosontistotheresn
otelicifametalsso
teasesfailessther
oransdtefireandhe
sureititeasanectt
segoaheadisnotint
fasdoesnotistsaut
ealtrordetrsrefor
ttordoramonillnot
frenrehcehmoeanda
ssaietenotohufsdt

Multiplicity: 0.1852941 Symbols: 63 Bigrams: 25 Flatness: 0.7331541 Sequential: 0.2292084


Entropy:

AZdecrypt 0.993 (Practical Cryptography 5-grams)

Score: 20365.06 Ioc: 0.07515183 Entropy: 3.980048 Characters: 340 Letters: 22

gohiscentofurnatm
estelsikarrdeiave
edlantocarlersegi
nsspectingsitthat
wanterimforttheob
sthercisimporital
sotackstoabanderf
lativisitweretura
mentatchtreadymea
seconteitispereds
othborofaresannai
teachifionendther
emandsteforealsth
careevoteadanertt
debuileapoinocost
fadgeealisedvncat
hantrumpetrcrefer
tterperatingnbles
fromreprewrieispa
inagesonecityfast

Multiplicity: 0.1852941 Symbols: 63 Bigrams: 25 Flatness: 0.7331541 Sequential: 0.2292084



AZdecrypt 0.993 (Reddit corpus 5-grams)

Score: 20589.14 Ioc: 0.07188964 Entropy: 4.008175 Characters: 340 Letters: 22

gohiscandoflenotm
estendilarvsoiape
edkastocorneedegu
nssmactingsitthat
wastenimfordtheoi
stharcisimporitan
dotablytoalonseef
notupiditwevethea
mostitchtreadylea
secontautismoreds
othionofavesinsai
toachufionessther
elanddteforeanyth
coreapoteisanentt
dellikeapoinocost
fasgeeanisaspsbot
hantellpotrcrefee
tterperadingnines
fromromnewvieispa
isagesonecityfayt

Multiplicity: 0.1852941 Symbols: 63 Bigrams: 25 Flatness: 0.7331541 Sequential: 0.2292084



AZdecrypt 0.993 (Usenet corpus 6-grams)

Score: 17814.52 Ioc: 0.08351553 Entropy: 3.838646 Characters: 340 Letters: 21

gahsscentoplanots
ustersilorsdeiame
edfantocorreasegi
nsspectingsitthat
sinteresparttheoe
sthercitisworstor
satabletoacondeap
rotimesitsusethai
sentatchtreadykea
teconteitisperedt
othearaposesannai
teachipianendther
ekindstupareareth
coruemateadanertt
declifeiwasnocatt
padgeearisedmnbot
hantalkwetrcrepea
tterweratingneres
prosrepressieitwa
snogusonecitypaet

Multiplicity: 0.1852941 Symbols: 63 Bigrams: 25 Flatness: 0.7331541 Sequential: 0.2292084



AZdecrypt 0.993 (Reddit corpus 6-grams)

Score: 17835.91 Ioc: 0.08158945 Entropy: 3.868229 Characters: 340 Letters: 22

gohuscartofrelatm
estelsieandtooane
edvastecanleesigi
nsstactordsitthat
wastingmforttheoe
stharcinomsorutal
sotakeiteapartief
latingsitweditbea
mostatchtreadymea
nicertaitistoredn
etheonofadesalsai
toachifoolestther
imandsteforealith
careanoteatarentt
deproveasourecont
fatdiealisatnskat
haltermsotrcnefee
ttensinatingleles
fromnotnewdieonsa
usagesonicityfait

Multiplicity: 0.1852941 Symbols: 63 Bigrams: 25 Flatness: 0.7331541 Sequential: 0.2292084



AZdecrypt 0.993 (Usenet corpus 7-grams)

Score: 14620.16 Ioc: 0.08440048 Entropy: 3.848159 Characters: 340 Letters: 21

gphsscentoflenotm
estersilarrseiane
ednastocorreesegi
nsspectingsitthat
wastexamforttheou
sthercisimporstar
sotabletoaconseef
rotinasitwerethea
mestitchtreadylea
seconteitispereds
othuoxpfaresinsai
teachifionessther
elandsteforeareth
coreenoteisanextt
declineappsnocost
fasgeearisesnsbot
hantellpetrcrefee
tterperatingnures
fromrepxewrieispa
ssagesonecityfaet

Multiplicity: 0.1852941 Symbols: 63 Bigrams: 25 Flatness: 0.7331541 Sequential: 0.2292084



AZdecrypt 0.993 (Reddit corpus 7-grams)

Score: 14546.84 Ioc: 0.09694604 Entropy: 3.640735 Characters: 340 Letters: 20

tirlotsownstabhee
ioaitthecaterstth
ingsearthathatste
aooustusonohaarte
nmeaspieshewasony
oursetheseanelect
thatdefersshoesas
theetithanitsagam
ereanatreeisnttot
estroaseahoureine
raryhpisctionbeth
erstresshboeeasie
stmantaisheistfur
theistheonesoopaa
nistsgomailorthea
ssensitthosetedhe
rsbaattaraetaisha
aahaasaswhatbytho
senearupinthiseas
lectionasthutstfa

Multiplicity: 0.1852941 Symbols: 63 Bigrams: 25 Flatness: 0.7331541 Sequential: 0.2292084


AZdecrypt

Posted : May 2, 2016 8:01 pm

Jarlve

(@jarlve)

Posts: 2547

Famed Member

Topic starter

I wanted to do another post on my flatness measurement and it seems to chime in with some of the current work.

Flatness is just a number between 0 and 1 indicating how flat a distribution is. In this case the frequency of the symbols. It’s something that goes up when a cipher is more sequential. If you have a cipher with 100 symbols and 10 uniques each with a frequency of 10 then the cipher would be considered totally flat and would score 1.

When the cipher is not sequential, there seems to exist a correlation between the flatness of the cipher and that of the solved plaintext. This is what I have noticed empirically, and if true it is very useful. The flatness of most plaintexts around the length of 340 seems to be about 0.666.

Let’s look at the flatness of some ciphers, solved and unsolved to see how interesting this measurement appears to be.

408: 0.8383579
340: 0.7331541 <—

beale1: 0.4013346 <—
beale2: 0.574114
beale3: 0.4767701 <—

glurk4 (beale 3 emulation): 0.5943292
alanbenji: 0.6551447

First of all, the unsolved beale1 and 3 have a very low flatness that could indicate the plaintext is not as expected. This chimes in with the hoax theory surrounding those ciphers. Now for the 340, it is lower than the 408. But we already knew that the 340 appears more randomized. However, the flatness observation may suggest that this is probably not caused by transposition (after encoding); otherwise the flatness would have been around that of the 408. I think it’s an important distinction to make.

AZdecrypt

Posted : May 2, 2016 8:29 pm

smokie treats

(@smokie-treats)

Posts: 1626

Noble Member

I am going to find all of the plaintext cycles in the Jarlve 100 library, and experiment with false cycles caused by them. Just for some perspective.

Then I am still thinking about regional bias, maybe repeats compared to cycles, to see if some symbols are more cyclic and also members of repeats in certain parts of the message.

Posted : May 3, 2016 2:53 am

doranchak

(@doranchak)

Posts: 2614

Member Admin

Here are some quick and dirty results of a run that looks for cycles in z408’s plaintext. It then computes the cipher symbol sequence that represents the plaintext cycles, and outputs the cycles found in the cipher symbol sequences.

http://www.zodiackillerciphers.com/z408-plaintext-cycles-l2.html
http://www.zodiackillerciphers.com/z408-plaintext-cycles-l3.html
http://www.zodiackillerciphers.com/z408-plaintext-cycles-l4.html

Many of the plaintext cycles really do end up generating cycles in the resulting cipher text, even when the symbols don’t stand for the same plaintext letters.

http://zodiackillerciphers.com

Posted : May 3, 2016 5:48 am

doranchak

(@doranchak)

Posts: 2614

Member Admin

Regarding smokie31:

I don’t know what the plaintext is because I forgot to save it. But it is one of the Jarlve 100 library. However, we have the key, the period, and the polybius square. So it can be solved with those. We could just solve the first 38 positions and figure out what message it is.

I applied the key but I could not get a decrypt on the result with polybius square "abcde fghik lmnop qrstu vwxyz" and various period values. Can you say what square and period you used?

http://zodiackillerciphers.com

Posted : May 4, 2016 2:08 pm

smokie treats

(@smokie-treats)

Posts: 1626

Noble Member

From my notes:

smokie31 is a bifid message, with a plaintext period of 38, and a ciphertext 1 period of 19

I also wrote down that I inscribed from right to left top down, but the period seems to be from left to right top down. So I may have made a typo in my notes. Try the first 38 positions going from left to right. If that doesn’t work, then try from right to left.

Here is the key, and your polybius square is correct:

A 1 2 3 48
B 4 5 6
C 7 8 9
D 10 11 12
E 13 14
F 15 16
G 17 18
H 19 20
I 21 22 23 39
J
K 24 25
L 26 27
M 28 29 30
N 31 32 33
O 34 35 36 39
P 37 38
Q 39 40 41
R 42 43 44
S 45 46 47 39
T 48 49 50
U 51 52
V 53 54
W 55 56
X 57 58 59
Y 60 61
Z 62 63

EDIT: Inscription is left right top bottom. But note that the Polybius square coordinates are column-row, not row-column.
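To show what the column-row note changes, here is a minimal sketch of a textbook periodic bifid using the square quoted above. It is an assumption-laden illustration: the function names are made up, and it does not model the second stage (the ciphertext 1 period of 19) or the homophonic key, only the fractionation with column-first coordinates.

```python
SQUARE = "abcdefghiklmnopqrstuvwxyz"  # 5x5 square, no j


def coords(letter, column_first=True):
    """Return the two Polybius coordinates of a letter (0-based)."""
    row, col = divmod(SQUARE.index(letter), 5)
    return (col, row) if column_first else (row, col)


def letter_at(first, second, column_first=True):
    """Inverse of coords: turn a coordinate pair back into a letter."""
    row, col = (second, first) if column_first else (first, second)
    return SQUARE[row * 5 + col]


def bifid_encrypt(text, period, column_first=True):
    out = []
    for start in range(0, len(text), period):
        pairs = [coords(ch, column_first) for ch in text[start:start + period]]
        # Write all first coordinates, then all second ones, and re-pair them.
        stream = [p[0] for p in pairs] + [p[1] for p in pairs]
        out.extend(letter_at(stream[i], stream[i + 1], column_first)
                   for i in range(0, len(stream), 2))
    return "".join(out)


def bifid_decrypt(text, period, column_first=True):
    out = []
    for start in range(0, len(text), period):
        block = text[start:start + period]
        stream = [c for ch in block for c in coords(ch, column_first)]
        half = len(block)
        out.extend(letter_at(stream[i], stream[half + i], column_first)
                   for i in range(half))
    return "".join(out)


message = "thediscoveryofthecoinperplexedgiles"
assert bifid_decrypt(bifid_encrypt(message, 38), 38) == message
```

Decrypting with column_first=False a message that was encrypted with column_first=True gives gibberish, which would explain a failed decrypt with an otherwise correct square and period.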

Posted : May 4, 2016 3:21 pm

smokie treats

(@smokie-treats)

Posts: 1626

Noble Member

I found all of the plaintext cycles in the Jarlve 100 library, and here are the stats. The number next to the pair of plaintext is the average number of consecutive alternations across all 100 messages. The bar chart is very condensed, and doesn’t show all of the plaintext pairs. But it does have a distinct shape, with a peak and high area near the top, and a sudden drop where certain plaintext like X are included.

Some plaintext pairs cycle with each other more than others. The plaintext pairs don’t necessarily show the first order. In other words, plaintext pair AN could have a cycle that starts with A or N.

The top three are AN, IN and NO. Note that all include N.

The next group of six are HT, IT, AT, OT, ET, and NT. Note that all include T.

The next group of six are OR, RT, ST, IS, AR and IR. There are a mix of R, S and T.

It occurs to me that plaintext included in high frequency bigrams are close to the top, such as AN, HT ( TH is the same thing ), and ET ( TE is the same thing ). And plaintext that often appear in pairs, such as L, aren’t close to the top because a pair destroys a cycle.
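A rough sketch of the pair statistic, for anyone who wants to try it on other texts. It assumes one plausible reading of "consecutive alternations" (the longest run in which the two letters strictly take turns); the function names are illustrative and this is not the exact spreadsheet method used for the chart.

```python
from itertools import combinations


def longest_alternation(text, a, b):
    """Longest run in which letters a and b strictly take turns in text."""
    best = run = 0
    previous = None
    for ch in text:
        if ch != a and ch != b:
            continue  # other letters are ignored, only a and b matter
        if previous is None or ch == previous:
            run = 1  # a doubled letter (e.g. LL) restarts the run
        else:
            run += 1
        previous = ch
        best = max(best, run)
    return best


def rank_pairs(text):
    """Score every unordered letter pair and sort from most to least cyclic."""
    letters = sorted(set(text))
    scores = {a + b: longest_alternation(text, a, b)
              for a, b in combinations(letters, 2)}
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


print(longest_alternation("ANNA", "A", "N"))  # 2, the doubled N breaks the run
```

Averaging the score of each pair over all 100 messages would give one number per pair, comparable to the figures next to the pairs in the chart.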

Now I am wondering. Is a false cycle a plaintext cycle that shows up as two ciphertext that are members of two different homophonic cycle groups, and which just happen to be cyclically encoded so that the plaintext cycle can be detected?

Probably not. The cyclic encoding of the homophonic cycle groups doesn’t have to correlate perfectly with the plaintext cycle. The cyclic encoding of the homophonic cycle groups can skip over parts of the plaintext cycle, if there even is one. I think that when we are looking at false cycles, we are probably looking at two plaintext that don’t cycle together, but look like they cycle together because of the homophonic cycles.

I will do a little more experimenting soon. I am just wondering if false cycles and true cycles have slightly different characteristics with regards to the spacing between ciphertext. Is there a way to tell the difference?

Posted : May 5, 2016 4:19 am

Jarlve

(@jarlve)

Posts: 2547

Famed Member

Topic starter

It occurs to me that plaintext included in high frequency bigrams are close to the top, such as AN, HT ( TH is the same thing ), and ET ( TE is the same thing ). And plaintext that often appear in pairs, such as L, aren’t close to the top because a pair destroys a cycle.

Nice observation!

I will do a little more experimenting soon. I am just wondering if false cycles and true cycles have slightly different characteristics with regards to the spacing between ciphertext. Is there a way to tell the difference?

I think that true cycles on average would have a more equal spacing.

Let’s suppose this plaintext in the code box. A and B represent a common plaintext bigram repeat, and C, D and E occur more randomly. We encode 2 symbols per letter; 1 and 2 denote the cycle symbols. Let’s also suppose that frequent bigrams are more likely to survive the encoding process, so that after encoding we have at least 1 bigram repeat of A and B.

   ABDECABCDEABECDABDCE
A: 1    2    1    2
B:  1    2    1    2
D:   1     2     1  2 
E:    1     2  1      2
C:     1  2     1    2

In this case all symbols are cycling perfectly.

Can we recognize if a bigram repeat in the cipher is a more frequent bigram repeat in the plaintext by measuring how well it cycles? Could we possibly even sum these values and divide them by the number of unique bigrams to come up with some number that tells us if one transposition is more likely than another?

And what symbols are truly cycling with each other?

Let’s take a look at some of the distances.

A1, A2: 5, 5, 5 <— more likely to be a true cycle (higher flatness)
B1, B2: 5, 5, 5 <— more likely to be a true cycle (higher flatness)
A1, B1: 1, 9, 1
A1, B2: 6, 4, 6
A2, B2: 1, 9, 1
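The distances above can be reproduced with a few lines of code. This is just a sketch of the toy example (two perfectly cycling symbols per letter); the function names are made up for the illustration.

```python
from collections import defaultdict

PLAINTEXT = "ABDECABCDEABECDABDCE"


def encode_two_homophones(plaintext):
    """Give each letter two symbols (e.g. A1, A2) and cycle them perfectly."""
    seen = defaultdict(int)
    cipher = []
    for ch in plaintext:
        cipher.append(f"{ch}{seen[ch] % 2 + 1}")
        seen[ch] += 1
    return cipher


def gaps(cipher, first, second):
    """Distances between successive occurrences of either of the two symbols."""
    positions = [i for i, symbol in enumerate(cipher) if symbol in (first, second)]
    return [b - a for a, b in zip(positions, positions[1:])]


cipher = encode_two_homophones(PLAINTEXT)
for pair in [("A1", "A2"), ("B1", "B2"), ("A1", "B1"), ("A1", "B2"), ("A2", "B2")]:
    print(pair, gaps(cipher, *pair))
```

The true pairs give evenly spaced gaps (5, 5, 5), while the pairs that only cycle because of the AB bigram give uneven gaps such as 1, 9, 1.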

So I believe that a bigram repeat in the cipher (one that consists of 2 unique symbols?) that is a more frequent "true" bigram repeat in the plaintext will have the following properties: above average cycling and low flatness (less evenly spaced throughout the cipher). But I think that in general most bigram repeats in the cipher may exhibit this property, although we may be able to rank them.

AZdecrypt

Posted : May 5, 2016 5:49 pm

doranchak

(@doranchak)

Posts: 2614

Member Admin

But note that the Polybius square coordinates are column-row, not row-column.

That was the clue that unlocked the plaintext for me.

The discovery of the coin perplexed Giles. It was certainly the trinket attached to the bangle which he had given Anne. And here he found it in the grounds of the Priory. This would argue that she was in the neighborhood, in the house it might be. She had never been to the Priory when living at The Elms, certainly not after the New Year, when she first became possessed of the coin.

Here’s the actual plaintext I found which includes some gibberish at the end. Is it intentional?

thediscoveryofthe
coinperplexedgile
ntsyassurtainlyth
etrhoketattacheot
othebanglewhichhe
hadgivenannuqndhe
rehefopsditintheg
roundsohthepriory
thhtwouldamruethc
tshewlshooseneigh
borhoodinthehouse
itmhihtbeshehqtne
verbeentotheprior
ywhenlixingatosee
lmscertaholynotah
tprthenewyelrwhen
shefirstbecamepos
spssedofthecoince
iestdkddcugeqdgpt
cqintntrpodoeuemd

(note: plaintext shows some of the errors resulting from polyalphabetism)

http://zodiackillerciphers.com

Posted : May 6, 2016 1:06 am

smokie treats

(@smokie-treats)

Posts: 1626

Noble Member

It was not intentional. The last section of the message is only 36 plaintext long instead of the period of 38. That causes a blank spot in the bifid encoding at position 323, where I inserted the symbol "1". I checked the coordinates for the first few ciphertext 1 and plaintext positions of the last section, and it should solve.

Posted : May 6, 2016 3:27 am
