Unicity Distance History

The Original Approach: Linear & Arbitrary

We began by using the same formula as YD’s paper to compute the unicity distance (the relevant section is 2.4.2, “Unicity Distance”).

YD assumed the following arbitrary increases in equivocation due to conflating:

Aspirated and unaspirated phonemes: 10%
Retroflex and dental phonemes: 10%
All sibilants: 150%

We instead used more conservative estimates:

Aspirated and unaspirated phonemes: 20%
Retroflex and dental phonemes: 20%
All sibilants: 200%

Our assumptions are summarized in this table:

Latin Symbol	Sanskrit Phoneme Class	Assumed Increase in Equivocation
E	अ, आ	20%
T	श, ष, स, ह	200%
A	त, थ, ट, ठ	20%
O	इ, ई, य्	20%
I, J	न	0%
N	र, ऋ, ॠ	20%
S, Z	व	0%
H, X	म	0%
R	य	0%
D	द, ध, ड, ढ	20%
L	ए, ऐ, अय्	20%
U	उ, ऊ, व्	20%
C	प, फ	20%
M	क, ख	20%
W	ओ, औ, अव्	20%
F	ब, भ	20%
G, Q	च, छ	20%
Y	अस्, अः	20%
P	अन्, अं, ङ्, ञ्	20%
B	ग, घ	20%
V	ज, झ	20%
K	ल, ऌ	20%

Using these, the unicity distance \(U\) was computed as:

\[U = \frac{26 + 18 \times 0.2 + 1 \times 2}{0.7} \approx 45.143\]

Hence, if our “decipherment” of the U.S. Constitution produced coherent output beyond 46 characters, it would’ve met the same standard that Yajnadevam set for his own work.
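The arithmetic above can be checked mechanically. The Python sketch below is our own illustration, not Yajnadevam’s code. It assumes the numerator is the constant 26 from the formula plus each Latin letter’s assumed increase from the table, written as a fraction (20% as 0.2, 200% as 2.0), and that 0.7 is the redundancy. The names `INCREASE` and `linear_unicity_distance` are hypothetical.

```python
import math

# Assumed increase in equivocation per Latin ciphertext letter, copied from the
# table above (0.2 = 20%, 2.0 = 200%, 0.0 = no increase).
INCREASE = {
    "E": 0.2, "T": 2.0, "A": 0.2, "O": 0.2, "I": 0.0, "J": 0.0,
    "N": 0.2, "S": 0.0, "Z": 0.0, "H": 0.0, "X": 0.0, "R": 0.0,
    "D": 0.2, "L": 0.2, "U": 0.2, "C": 0.2, "M": 0.2, "W": 0.2,
    "F": 0.2, "G": 0.2, "Q": 0.2, "Y": 0.2, "P": 0.2, "B": 0.2,
    "V": 0.2, "K": 0.2,
}

BASE = 26         # constant term taken directly from the formula (one per Latin letter)
REDUNDANCY = 0.7  # redundancy figure used in all of the formulas on this page


def linear_unicity_distance(increase: dict[str, float]) -> float:
    """Linear estimate: base term plus the summed fractional increases, over redundancy."""
    return (BASE + sum(increase.values())) / REDUNDANCY


if __name__ == "__main__":
    u = linear_unicity_distance(INCREASE)
    print(f"U ~ {u:.3f}; threshold ~ {math.ceil(u)} symbols")
```

Eighteen letters carry the 20% increase and one (T) carries 200%, which is where the \(18 \times 0.2\) and \(1 \times 2\) terms come from. Note that every class contributes a fixed amount regardless of how many phonemes it conflates.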

The First Revision: Actually Considering the Keyspace

Immediately after we crossed 46 symbols, one of his followers pointed out that a “better formula” was available in his Discord server:

the discord is already public dude, i am not trying to hide anything, anyways here goes pic.twitter.com/1RgZ7zNndV
— The Butter Thief (@TheButterThief) July 9, 2025

The primary distinguishing feature of this approach is that it actually considers all possible phoneme combinations that can act as the key, unlike the previous formula, which weighted phoneme classes arbitrarily and linearly and thus missed the exponential growth in keyspace caused by many-to-many mappings. Even here, one of the formulae used his subjectively reduced list of ciphertext symbols.

So, using both our complete ciphertext symbol set and the one we subjectively reduced via the allograph table, and again following Yajnadevam’s methods, we end up with two possible unicity distances:

\[\frac{3+\log_2{(2^{26})^{52}}}{0.7\log_2{52}} \approx 339.57\]

and

\[\frac{3+\log_2{(2^{26})^{81}}}{0.7\log_2{81}} \approx 475.225\]

And sure enough, we crossed these distances as well.
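Both numbers above can be reproduced with a few lines of Python. This is a sketch under stated assumptions: it treats the leading 3 as a fixed extra term taken from the formula as given, and it takes \(n\) to be the symbol count (52 or 81) appearing in both the exponent and the denominator. The function name `keyspace_unicity_distance` is hypothetical.

```python
import math


def keyspace_unicity_distance(n_symbols: int, classes: int = 26,
                              extra_bits: float = 3.0, redundancy: float = 0.7) -> float:
    """(extra_bits + log2((2**classes)**n)) / (redundancy * log2(n)), as in the formulas above."""
    # The keyspace (2**classes)**n grows exponentially with n, but its log2 is
    # simply classes * n bits, so the huge integer never has to be built.
    key_bits = extra_bits + classes * n_symbols
    return key_bits / (redundancy * math.log2(n_symbols))


if __name__ == "__main__":
    for n in (52, 81):
        print(n, round(keyspace_unicity_distance(n), 3))
```

Because the key entropy is \(\log_2\) of an exponentially large keyspace, it grows by 26 bits per added symbol, which is the growth the linear class-weighting formula ignored.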

The Second Revision

Based on the latest exchange, we expect a second revision of the unicity distance:

However, I anticipate some flaws in your yet-to-come formula already:
#1 will be some extra term bce I've "included vowels /e/ /ai/ and non-vowel /ay/ in the same class". However, /ay/ is grammatically the SAME as dipthong /e/, as is explicitly mentioned in sUtra eco'yavAyAvaH.
— उ॒ग्रश्र॑वस् (@Ugrashravas) July 14, 2025

~~It is yet to be seen by how much this will increase the unicity distance.~~

Final Note

Apart from the fact that unicity distance is not a measure of correctness, all formulations Yajnadevam has used to find the unicity distance are faulty, because they rely on constructs developed for ciphers with reversible keys. In Shannon’s own words (emphasis added):

A secrecy system is defined abstractly as a set of transformations of one space (the set of possible messages) into a second space (the set of possible cryptograms). Each particular transformation of the set corresponds to enciphering with a particular key. The transformations are supposed reversible (non-singular) so that unique deciphering is possible when the key is known.

— Communication Theory of Secrecy Systems, Pg 657

As we’ve seen, Yajnadevam’s key is not reversible. Knowing the key does not allow us to create a unique ciphertext or recover a unique plaintext. The same message (say रव) can be encoded as many different cryptograms (, , , and \(267\) others) and the same cryptogram (say ) can be deciphered as many different messages (सट, षत, स्थ, हट्ठ, and \(92\) others).
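The same kind of ambiguity can be illustrated with a toy version of the Latin-symbol table from earlier. The Python sketch below is hypothetical: it uses only a few rows of that table (T, A, N, S, Z), reads each letter as exactly one listed phoneme, and ignores conjuncts, gemination, and sandhi, so its counts are far smaller than the real ones quoted above. Even so, it shows ambiguity in both directions and fails a simple reversibility check. The names `KEY`, `decipherments`, `encipherments`, and `is_reversible` are ours.

```python
from itertools import product

# Hypothetical toy key: a few rows of the Latin-symbol table above, mapping each
# ciphertext letter to the Sanskrit phonemes it may be read as. Simplification:
# one phoneme per letter; conjuncts, gemination and sandhi are ignored.
KEY: dict[str, list[str]] = {
    "T": ["श", "ष", "स", "ह"],
    "A": ["त", "थ", "ट", "ठ"],
    "N": ["र", "ऋ", "ॠ"],
    "S": ["व"],
    "Z": ["व"],
}


def decipherments(ciphertext: str, key: dict[str, list[str]]) -> list[str]:
    """All plaintexts obtained by choosing one reading per ciphertext letter."""
    return ["".join(choice) for choice in product(*(key[c] for c in ciphertext))]


def encipherments(phonemes: list[str], key: dict[str, list[str]]) -> list[str]:
    """All ciphertexts whose letters can be read as the given phoneme sequence."""
    options = [[sym for sym, readings in key.items() if p in readings] for p in phonemes]
    return ["".join(choice) for choice in product(*options)]


def is_reversible(key: dict[str, list[str]]) -> bool:
    """Unique deciphering and unique enciphering: one reading per letter, none shared."""
    readings = [r for rs in key.values() for r in rs]
    return all(len(rs) == 1 for rs in key.values()) and len(readings) == len(set(readings))


if __name__ == "__main__":
    print(len(decipherments("TA", KEY)), "readings of TA")
    print(encipherments(["र", "व"], KEY), "encodings of रव")
    print("reversible:", is_reversible(KEY))
```

With just two letters, "TA" already has 16 readings (including सट and षत), and रव has two encodings (NS and NZ). A one-to-one key such as `{"H": ["म"], "I": ["न"]}` would pass `is_reversible`; this one cannot.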

Finally, the redundancy of the “language” generated by abusing Pāṇini’s very liberal rules is likely much lower than the 0.7 typical of most real languages. We pointed this out at the very beginning, but it was dismissed. Now, however, Yajnadevam seems to have entered a fascinating state in which he simultaneously believes and denies it:

Panini redundancy may need to be accounted for. Depending on the number of kridantas used etc. I have speculated that Panini redundancy is much lower than the 0.7 for most languages.

— Yajnadevam, Jul 17 4:13 AM

“If you torture Panini, you can read anything” is a Bollywood rendition of “Panini grammar has a very low redundancy”. Of course you can’t read anything. You can’t read एएएएएए. The redundancy is not close to zero. It is closer to 0.7 than it is to 0.

— Also Yajnadevam, Jul 17 10:23 AM
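Taking the formulas at face value for a moment, redundancy sits in the denominator, so a lower redundancy gives a proportionally larger unicity distance. The Python sketch below holds the 81-symbol key entropy from the First Revision fixed (\(3 + 26 \times 81\) bits) and varies only the redundancy. The swept values are arbitrary illustrations, not estimates of Pāṇinian redundancy, and the names `unicity_distance` and `sweep` are hypothetical.

```python
import math

# Key entropy from the 81-symbol formula: 3 + log2((2**26)**81) = 3 + 26 * 81 bits.
KEY_BITS = 3 + 26 * 81
N_SYMBOLS = 81


def unicity_distance(key_bits: float, n_symbols: int, redundancy: float) -> float:
    """Key entropy divided by the per-symbol redundancy redundancy * log2(n)."""
    return key_bits / (redundancy * math.log2(n_symbols))


def sweep(redundancies: list[float]) -> dict[float, float]:
    """Unicity distance for each assumed redundancy, holding the key entropy fixed."""
    return {r: unicity_distance(KEY_BITS, N_SYMBOLS, r) for r in redundancies}


if __name__ == "__main__":
    for r, u in sweep([0.7, 0.5, 0.35, 0.1]).items():
        print(f"redundancy {r:.2f}: unicity distance ~ {u:.1f} symbols")
```

Halving the redundancy doubles the distance, and as redundancy approaches zero the unicity distance grows without bound.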

Summary

We had anticipated that Yajnadevam would adjust and negotiate the unicity distance upward a couple of times before finally devolving into word salad and goalpost shifting. In that spirit, we visualized the history of UD values: earlier ones are shown in pink, while the most recent value provided by Yajnadevam appears in red.

The red marker has moved twice to date. Yajnadevam now seems to have given up on hard math and shifted to fuzzy word salad.

As of this writing, the unicity distance is 475, and 508+ symbols have been translated (green bar). This may be outdated; see the translation page for live numbers.

Progress in first 1000 symbols

Current progress: 50.80%. Last updated: 2025-Jul-14, 21:49 (GMT +5:30)
