Ethereum: How much entropy is lost alphabetising your mnemonics?

As a thought exercise, let’s look at entropy and what it means for mnemonic phrases. In this article, we’ll explore how alphabetizing the words of a mnemonic discards the information carried by their order, and estimate how many bits of entropy that costs.

What is Entropy?

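Entropy is a measure of uncertainty or randomness in a system. In information theory, it is the number of bits needed, on average, to identify one outcome of a random choice. A uniformly random choice among N equally likely outcomes carries log2(N) bits, so entropy grows with the number of possible outcomes. Any transformation that maps several inputs to the same output, such as sorting, discards some of that information.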
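To make the definition concrete, here is a minimal sketch of the uniform case, where every outcome is assumed to be equally likely. The function name uniform_entropy_bits is purely illustrative.

```python
import math

def uniform_entropy_bits(num_outcomes: int) -> float:
    """Entropy, in bits, of one uniformly random pick among num_outcomes options."""
    return math.log2(num_outcomes)

# One word drawn uniformly from a 2048-word list carries 11 bits.
bits_per_word = uniform_entropy_bits(2048)
print(bits_per_word)  # 11.0
```

Doubling the number of equally likely outcomes adds exactly one bit, which is why entropy is usually quoted on a logarithmic scale.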

BIP39 Compliance: A Crucial Concern

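Bitcoin Improvement Proposal 39 (BIP39) is the standard for encoding wallet entropy as a sequence of mnemonic words drawn from a fixed list of 2048 words, so each word encodes 11 bits. A 12-word mnemonic encodes 132 bits: 128 bits of entropy plus a 4-bit checksum. The seed derived from the mnemonic typically feeds a BIP32 hierarchical deterministic wallet, which many Ethereum wallets use to derive their keys. Word order is part of the encoding, so an alphabetized word list will usually fail the checksum and is not, in general, a valid BIP39 mnemonic.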
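The bit arithmetic behind BIP39 mnemonic lengths can be sketched as follows. This assumes the layout in the BIP39 specification (one checksum bit per 32 bits of entropy, 11 bits per word). The helper name bip39_layout is hypothetical and not part of any wallet library.

```python
def bip39_layout(entropy_bits: int) -> dict:
    """Bit layout of a BIP39 mnemonic for a given initial entropy length."""
    if entropy_bits % 32 != 0 or not 128 <= entropy_bits <= 256:
        raise ValueError("BIP39 entropy must be 128-256 bits in steps of 32")
    checksum_bits = entropy_bits // 32        # one checksum bit per 32 entropy bits
    total_bits = entropy_bits + checksum_bits
    return {
        "entropy_bits": entropy_bits,
        "checksum_bits": checksum_bits,
        "total_bits": total_bits,
        "words": total_bits // 11,            # each word encodes 11 bits
    }

print(bip39_layout(128))
```

For 128 bits of entropy this gives a 4-bit checksum, 132 bits in total, and 12 words.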

The Alphabetic Mnemonic Shuffle

Suppose we have a mnemonic consisting of 12 words (a common mnemonic length), and we alphabetize its words using a standard ordering scheme. Mathematically speaking, this is a permutation problem, because every ordering of the same words collapses to one sorted list:

P(12) = 12!, the number of ways to order n = 12 distinct words.

In our example, P(12) = 479,001,600 ≈ 4.79 × 10^8, so the word order carries log2(12!) ≈ 28.8 bits.
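As a quick check, the number of ways to order 12 distinct words is 12!, and its base-2 logarithm is the number of bits that the ordering can carry. A short sketch using only the standard library:

```python
import math

orderings = math.factorial(12)       # ways to order 12 distinct words
ordering_bits = math.log2(orderings)

print(orderings)                     # 479001600
print(round(ordering_bits, 1))       # 28.8
```

The figure assumes all 12 words are different. A mnemonic with a repeated word has fewer distinct orderings.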

The Entropy Conundrum

Now, let’s consider how many distinct word lists exist before and after sorting:

Ordered sequences: 2048^12 = 2^132 ≈ 5.4 × 10^39. Sorted lists (multisets of 12 words, repeats allowed): C(2048 + 12 - 1, 12) = C(2059, 12).

The second count is still enormous, approximately 2^103.2, but it is smaller than the first by a factor of roughly 12!.
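A rough way to check the count of alphabetized word lists is to treat each one as a multiset of 12 words drawn from a 2048-word list with repeats allowed. The sketch below ignores the BIP39 checksum, which is an assumption made to keep the arithmetic simple. The constant names are illustrative.

```python
import math

WORDLIST_SIZE = 2048
WORDS = 12

# Ordered 12-word sequences: 2048**12 == 2**132.
sequences = WORDLIST_SIZE ** WORDS

# Sorted lists are multisets of 12 words: C(n + k - 1, k).
sorted_lists = math.comb(WORDLIST_SIZE + WORDS - 1, WORDS)

print(math.log2(sequences))               # 132.0
print(round(math.log2(sorted_lists), 1))  # 103.2
```

The gap between the two logarithms is close to log2(12!), since almost every 12-word list consists of distinct words.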

When we alphabetize a mnemonic, the entropy carried by its word order is lost, for the following reasons:

Redundancy: Many different mnemonics map to the same sorted list. Up to 12! distinct orderings collapse into one, and the sorted list alone cannot say which of them was the original.

Symmetry: Sorting treats distinct orderings as identical. If a word appears more than once, swapping the copies does not change the mnemonic, so such a phrase has fewer than 12! distinct orderings and loses slightly less.

One-time cost: Sorting is idempotent, so sorting an already sorted list changes nothing. The loss is fixed at about log2(12!) ≈ 28.8 bits for 12 distinct words and does not grow with repeated sorting.
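The many-to-one effect of sorting is easy to see on a small scale. The sketch below uses four arbitrary example words rather than a real 12-word mnemonic, and the helper name distinct_orderings is hypothetical.

```python
from itertools import permutations

def distinct_orderings(words):
    """All distinct orderings of a word tuple (duplicates collapse in the set)."""
    return set(permutations(words))

unique_words = ("zoo", "apple", "mango", "cable")
orderings = distinct_orderings(unique_words)
sorted_forms = {tuple(sorted(o)) for o in orderings}
print(len(orderings), len(sorted_forms))   # 24 1

# With a repeated word, swapping the copies gives the same phrase.
repeated = ("zoo", "apple", "zoo", "cable")
print(len(distinct_orderings(repeated)))   # 12
```

All 24 orderings of four distinct words sort to a single list, so sorting hides log2(24) ≈ 4.6 bits in this toy case. The same reasoning with 12 words gives the 12! figure.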

Consequences and Implications

The entropy lost by alphabetizing a mnemonic has practical implications for:

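Cryptocurrency storage: A wallet can only be restored from the words in their original order. Someone who keeps only the sorted list must search up to 12! ≈ 479 million orderings to find the right one, although the 4-bit checksum rules out roughly 15 in 16 candidates.

Secure key generation: If the sorted list itself is treated as the secret, it offers roughly 103 bits of entropy (the 132 encoded bits minus about 28.8 bits of ordering) rather than the 128 bits the mnemonic was generated with.

Data preservation: Sorting is irreversible. Once the original order is discarded, it cannot be reconstructed from the sorted list without extra information, such as a known address to test candidate orderings against.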

In conclusion, alphabetizing a mnemonic is not a harmless practice: for 12 words it discards up to log2(12!) ≈ 28.8 bits of ordering information. By understanding these mathematical properties, we can better appreciate why word order matters for wallet recovery, secure key generation, and data preservation on Ethereum and beyond.