Edgecase | An investigation into recovering from transcription errors in Bitcoin private keys

Brief Summary

I investigated the use of an arithmetic checksum tree in Bitcoin private key storage. I found that this greatly helped to locate transcription errors and increase key recovery speed.

Contents

- Brief Summary
- Contents
- Summary
- Worked Example
- Generating The Checksum Tree
- Miscellaneous Details
- Writing A Bitcoin Private Key On Paper
- Writing A Bitcoin Address On Paper
- More Thoughts About Storing A Bitcoin Address
- Notes On Offline Storage Of Bitcoin
- Project Log

Summary

When writing a Bitcoin private key on paper, the goal is 100% accuracy. There are 2^256 possible private keys, and one of the security aspects of the Bitcoin cryptosystem is the enormous infeasibility of trying to find an unknown valid private key by brute-force mechanical address generation.

If you write a Bitcoin private key on paper and make 1 random transcription error (i.e. 1 character is incorrect), the key can be recovered in 3 minutes using commodity hardware.

If you make 2 random transcription errors, the key can be recovered in 2 days.

If there are 3 random errors, recovery will take 5 years. A faster codebase and machine, or cluster of machines, could bring this time down to something more tolerable.

Recovering the key requires testing variants of the private key, to see if they produce the correct Bitcoin address. Each additional random transcription error multiplies the number of required tests by a factor of approximately 1000.

Backup copies can help reduce risk, but you might make a mistake in each copy, or be worried that you had done so (and it is time-consuming to get a key out of storage and re-enter it into a computer to generate the address and thereby check the key's accuracy). Backup copies can also be lost or destroyed. It's good to plan for the scenario where you have a single remaining copy of the private key that is almost, but not quite, correct.

Storing an arithmetic checksum tree with a private key is cheap and dramatically improves key recovery time. The generation of the checksum tree can be automated (and performed on an offline machine), or it can be performed manually on paper with the help of a hexadecimal addition table.

If there is a random error in the private key, the checksum tree can be used to locate the group of 4 characters where the error exists. A program can be used to mechanically generate all variants of this group, and test all resulting variants of the private key. A group of 4 random hex characters can have 16^4 = 65536 possible values. On commodity hardware, this would take 3 hours.

If there are errors in multiple groups, we can use the checksum tree to divide up the problem. Each group could be varied to find the group values that generate the correct checksum character, and then the permutations of these group values could be tested to find the correct key. I haven't tested this approach in this project.

Transcription errors in the checksum tree can be identified by re-generation of the tree. If almost all the tree is unchanged, then a transcription error is the likely cause of the difference. In contrast, errors in the private key will produce further errors in the higher levels of the re-generated tree.

Final scenario: There are errors in both the private key and the checksum tree. In this case, a more inventive approach would be required. The checksum tree could be treated as "partial truth", and a more complex program could vary the groups until it found a partial match with the checksum tree. These partial matches could then be tested to see if they produce the correct address. I haven't tested this approach in this project.

A further consideration: These key-recovery approaches depend on having the correct Bitcoin address. The Bitcoin address is necessarily public, and can be shared with anyone without risk to the bitcoin held in it. It also has a hash-based checksum embedded within it, which permits mechanical error detection. Since the address is public, you can store multiple copies of it on network-connected computers, which means that it's not really as much of a resilience gain to design and use a checksum tree for it.

Bonus (assuming that there are no errors in the stored private key): When entering the private key into an offline computer, generating the checksum tree is a convenient way to discover and locate any transcription errors made during this process.

Worked Example

Here is a Bitcoin private key:

7972 b641 101c 0ad6
7b0e 401b 800a 9b6f
3225 c97f c6b8 1150
42cf 6696 8c2f b2e5

The corresponding Bitcoin address is:

16ASCUS3s7D4UQh6J9oHGuT19agPvD3PFj

We can generate a checksum tree and place it above the key:

9
ab31
96ed 0029 cb57 1b50

7972 b641 101c 0ad6
7b0e 401b 800a 9b6f
3225 c97f c6b8 1150
42cf 6696 8c2f b2e5

Let's suppose that in the private key, in Line 3, Group 1, the first character "3" was accidentally written as "8".

When we enter the key into an offline computer, and re-generate a checksum tree, we would see this:

e
ab81
96ed 0029 1b57 1b50

7972 b641 101c 0ad6
7b0e 401b 800a 9b6f
8225 c97f c6b8 1150
42cf 6696 8c2f b2e5

The checksum character at the top of the tree is different, which alerts us that there is a problem.

The third character in the checksum group on the second level of the tree is different, which tells us to look at the third group in the next level down.

In the third group of the checksum line, the first character is different. So: We know that the error is in Group 1 of Line 3 in the private key. We can then run a program that will vary this group until the correct key is found.

Generating The Checksum Tree

Standard human numbers are base 10 and the digits are:

0, 1, 2, 3, 4, 5, 6, 7, 8, 9

Hex numbers are base 16 and the digits are:

0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f

Description of hexadecimal addition table: Two axes, X and Y. X runs horizontally along the top of the table. Y runs vertically down the left side of the table. Both X and Y contain the hex values 0-f. Each entry in the table is the hex sum of the relevant X and Y values. Examples: 0 + 0 = 0, 2 + 4 = 6, 5 + 5 = a, b + 7 = 12, f + f = 1e.

The minimum sum of a group of 4 hex characters is:
0 + 0 + 0 + 0 = 0

The maximum sum of a group of 4 hex characters is:
f + f + f + f = 1e + f + f = 2d + f = 3c

Algorithm:
- Sum the values in a group of 4 characters. Take the last character of this sum.
- Do the previous step for each group in a line of 4 groups. This will produce 4 checksum characters. Write this new group on a new line.
- Do the previous step for each line in the 4 lines that form a private key, writing each new group on the same new line. Result: A single new "checksum line" of 4 groups.
- Sum the values of each group in the checksum line. Take the last character of each sum. This will produce a single "checksum group" of 4 characters. Write this on a new line.
- Sum the values of the checksum group. Take the last character of the sum. This is a single "checksum character". Write this on a new line.

Miscellaneous Details

My work machine:

System details:
- Name: Shovel
- Specifications: HP 6005 Pro SFF. 3 GHz x86_64 processor (AMD II x4 B95 Quad Core), 4 GB RAM, 1 TB hard drive. Running CentOS 7.6.1810 (Core).
- More information: New computer: Shovel
- Installed items: GCC 4.8.5, Make 3.82, Vim 7.4, Python 2.7.5, Python 3.3.2, Gnome 3.28.2, gedit 3.28.1, GPG 1.4.10.

Writing A Bitcoin Private Key On Paper

A Bitcoin private key is 32 bytes long. A byte can be written as 2 hex characters. The private key below is written in hex and is therefore 64 characters long.

7972b641101c0ad67b0e401b800a9b6f3225c97fc6b8115042cf66968c2fb2e5

I have found it easiest to divide the key into groups of 4 characters, and lines of 4 groups.

This division process can be automated:

[spiano@shovel ~]$ echo -n "7972b641101c0ad67b0e401b800a9b6f3225c97fc6b8115042cf66968c2fb2e5" | fold --width 16 | sed 's/.\{4\}/& /g'; echo "";

7972 b641 101c 0ad6
7b0e 401b 800a 9b6f
3225 c97f c6b8 1150
42cf 6696 8c2f b2e5

Writing A Bitcoin Address On Paper

Here is the address from earlier:

16ASCUS3s7D4UQh6J9oHGuT19agPvD3PFj

The possible characters in a Bitcoin address are:
- 0-9 with no 0
- A-Z with no I or O
- a-z with no l

The 0, I, O, and l characters were excluded due to possible confusion with i and o.

When writing an Bitcoin address on paper, I have found it best to take these steps in order to reduce ambiguity:
- Underline every capital letter, taking care not to allow the underline to touch the character or any other nearby character.
- For a "1" character, include the extra bottom and diagonal lines (i.e. "1"), rather than writing only a single vertical line (i.e. "|").
- Add a tail to u/U to distinguish it from v/V.
- Cross the middle of z/Z to distinguish it from 2.
- Cross the top of a capital J, because you might as well do this.

Also, like a private key, it's easiest to write the characters in groups of four with a space between each group. Some letters will be left over that do not form a complete group of four.

The division-into-groups process can be automated:

[spiano@shovel ~]$ echo -n "16ASCUS3s7D4UQh6J9oHGuT19agPvD3PFj" | fold --width 16 | sed 's/.\{4\}/& /g'; echo "";

16AS CUS3 s7D4 UQh6
J9oH GuT1 9agP vD3P
Fj

Further notes:
- Be careful when underlining F, as if the underline is too close, it could look like E.
- Be careful when underlining 7. If you cross it with a horizontal line, it could look like Z when underlined. If you don't cross it, it could look like 2 when underlined.
- Make sure to close the loop properly on a 6, so that it doesn't look like G. Similarly, be careful not to close the loop on G, so that it doesn't look like 6.
- Underline 6 to distinguish it from b. This makes the overall underlining rule into "Underline capital letters and 6, and be careful of F and 7".
- Extend the tail in g so that it forms into a loop, preventing g from looking like q or 9.
- Add an upwards-diagonal tail to q, so that it doesn't look like 9.

More Thoughts About Storing A Bitcoin Address

Steps that can be taken to preserve an accurate copy of a Bitcoin address on an online computer:
- In a text file, don't just keep a single copy of the address. Keep several copies on a few consecutive lines. Some text editors, when you select a string and highlight it, highlight other identical strings in the document. This can be used as a quick test to confirm that all copies of the address are identical (i.e. that no copy has a typo).
- Keep a backup copy of the address-storage text file in the same directory.
- Backup the address storage data onto multiple external storage units.

Storing the address in offline storage:
- Write out the address next to the private key. This has several advantages:
-- The balance held by the key can be looked up via the address without re-calculating the address from the key.
-- A written-out address, kept offline, is a more resilient copy of the address. Address copies kept on network-connected computers might conceivably be altered by mistake or by third parties.
-- When the key is entered into an offline computer, the address can be used as an additional confirmation of the newly entered key's accuracy, by comparing it to the address generated from the entered key. If a transcription error has been made in the private key during the original storage or during entry into the offline computer, the generated address will be different.
- Design the offline storage in such a way that the private key is not exposed when you use it to view the address. Private keys can be stolen via cameras.
- Create a name for your own use for each private key and address combination, so that you can more easily differentiate between combinations. Store the name on the offline storage together with the combination. On a networked-connected computer, where you might look up the balance on this address, keep the name with the address.

Notes On Offline Storage Of Bitcoin

- Make at least two copies of the private key.
- Use a waterproof container to protect against water damage.
- Use backups in two separate locations to protect against fire damage.
- Think about the risk of theft: How can the storage unit be made inconspicuous and/or difficult to steal?
- Do not open the offline storage unit near a camera or potential camera.

Project Log

Let's take this Bitcoin private key from an earlier article as an example:

7972b641101c0ad67b0e401b800a9b6f3225c97fc6b8115042cf66968c2fb2e5

A Bitcoin private key is 32 bytes long. A byte can be written as 2 hex characters. The key above is written in hex and is therefore 64 characters long.

[spiano@shovel ~]$ echo -n "7972b641101c0ad67b0e401b800a9b6f3225c97fc6b8115042cf66968c2fb2e5" | wc -c

When writing a Bitcoin private key on paper, the goal is 100% accuracy: A single mistake could lose all the bitcoin. Backup copies can help, but you might make a mistake in each copy, or be worried that you had done so (and it is laborious to re-copy the key into a computer to generate the address and thereby check the key's accuracy).

I have found it easiest to divide the key into groups of 4 characters, and lines of 4 groups.

This division process can be automated:

[spiano@shovel ~]$ echo -n "7972b641101c0ad67b0e401b800a9b6f3225c97fc6b8115042cf66968c2fb2e5" | fold --width 16 | sed 's/.\{4\}/& /g'; echo "";

7972 b641 101c 0ad6
7b0e 401b 800a 9b6f
3225 c97f c6b8 1150
42cf 6696 8c2f b2e5

So: The key is now a 4x4x4 grid of characters. This format makes it much easier to:
a) read on a computer screen and write onto paper
b) read from paper and write into a computer file
c) check the new copy against the old one

Notes:
- Although hex characters are clearly distinct on a computer screen, the characters "6" and "b" can sometimes be difficult to distinguish when written on paper. Therefore, underline the "6" character, taking care not to allow the underline to touch the character or any other nearby character.
- When writing a "1" character, include the extra bottom and diagonal lines (i.e. "1"), rather than writing only a single vertical line (i.e. "|"). It is best to take every opportunity to remove possible ambiguity.
- It can be convenient to write the key on squared or lined paper. However, if you do, then be careful that the written characters do not run over onto thick line markings, as this could make a character ambiguous.

Unfortunately, despite all this effort to avoid inaccuracy and ambiguity, a mistake will almost certainly be made eventually while writing a copy of a private key. Let's suppose that the other backup copies have been lost or destroyed in accidents, and that there is only one remaining copy. A single inaccurate character in this private key could cause the loss of all bitcoin stored in the corresponding address. Reason: There are 2^256 possible private keys, and one of the security aspects of the Bitcoin cryptosystem is the enormous infeasibility of trying to find a valid private key by brute-force mechanical address generation.

However, if the location of the error can be found, then the characters in that location can be mechanically varied, and candidate addresses generated from these variant private keys until one produces the desired address. As long as the location comprises only a limited number of characters, a small-scale brute-force approach can work.

Hm. Now that I think about it, if you started with a private key that you are quite sure is 95%-99% accurate but has a mistake (or several) somewhere in it, whose location is unknown, you would not have to brute-force the entire key possibility space. You could proceed in this manner:
- Assume that 1 character is wrong. Vary each character in the private key through the possible 16 hex values. See if the desired address is generated.
- Assume that 2 characters are wrong. Vary all possible pairs of characters in the private key.
- Assume that 3 characters are wrong, etc.

This approach is time-consuming, even if programs are written to automate a lot of the work. However, it is still feasible, and the reward would likely be worth the effort.

Anyway: Let's plan for a mistake to be made, and try to store the key in such a way that this mistake can be easily located.

After some thought, I've decided to use an arithmetic checksum tree.

Approach:
- Sum the values in a group of 4 characters. Take the last character of this sum.
- Do the previous step for each group in a line of 4 groups. This will produce 4 checksum characters. Write this new group on a new line.
- Do the previous step for each line in the 4 lines that form a private key, writing each new group on the same new line. Result: A single new "checksum line" of 4 groups.
- Sum the values of each group in the checksum line. Take the last character of each sum. This will produce a single "checksum group" of 4 characters. Write this on a new line.
- Sum the values of the checksum group. Take the last character of the sum. This is a single "checksum character".

The advantage of this approach is that, if necessary, the checksum tree can calculated manually using pencil and paper. This would be much less feasible if a hash tree was used instead of an arithmetic sum tree.

So: Store the private key with a checksum tree. If, when a copy is made, there is a mistake in the private key, then the checksum tree can be used to locate the group of 4 characters where the mistake exists. A program can be written to mechanically generate all variants of this group, and test all resulting variants of the private key.

Let's work through this approach and thereby test it.

Standard human numbers are base 10 and the digits are:

0, 1, 2, 3, 4, 5, 6, 7, 8, 9

Hex numbers are base 16 and the digits are:

0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f

We'll be doing arithmetic in base 16.

I'm using a useful hexadecimal addition table that I found here:
www.tutorialspoint.com/computer_logical_organization/hexadecimal_arithmetic.htm

Description of hexadecimal addition table: Two axes, X and Y. X runs horizontally along the top of the table. Y runs vertically down the left side of the table. Both X and Y contain the hex values 0-f. Each entry in the table is the hex sum of the relevant X and Y values. Examples: 0 + 0 = 0, f + f = 1e.

The first group in the private key is:
7972

7 + 9 = 10
10 + 7 = 17
17 + 2 = 19

The sum of the first group is 19. The last character of this sum is 9.

The second group in the private key is:
b641

b + 6 = 11
11 + 4 = 15
15 + 1 = 16

The sum of the second group is 16. The last character of this sum is 6.

Group 3: 101c
1 + 0 + 1 + c = e
Last character: e

Group 4: 0ad6
0 + a + d + 6 = 17 + 6 = 1d
Last character: d

So: The first group on the checksum line is: 96ed

Let's proceed through line 2 of the private key:
7b0e 401b 800a 9b6f

7 + b + 0 + e = 12 + e = 20
4 + 0 + 1 + b = 5 + b = 10
8 + 0 + 0 + a = 12
9 + b + 6 + f = 14 + 6 + f = 1a + f = 29
Last characters: 0029

Line 3 of the private key:
3225 c97f c6b8 1150

3 + 2 + 2 + 5 = 7 + 5 = c
c + 9 + 7 + f = 15 + 7 + f = 1c + f = 2b
c + 6 + b + 8 = 12 + b + 8 = 1d + 8 = 25
1 + 1 + 5 + 0 = 7
Last characters: cb57

Line 4 of the private key:
42cf 6696 8c2f b2e5
4 + 2 + c + f = 6 + c + f = 12 + f = 21
6 + 6 + 9 + 6 = c + 9 + 6 = 15 + 6 = 1b
8 + c + 2 + f = 14 + 2 + f = 16 + f = 25
b + 2 + e + 5 = d + e + 5 = 1b + 5 = 20
Last characters: 1b50

So: The checksum line for this private key is:
96ed 0029 cb57 1b50

Detour:

The minimum sum of a group of 4 hex characters is:
0 + 0 + 0 + 0 = 0

The maximum sum of a group of 4 hex characters is:
f + f + f + f = 1e + f + f = 2d + f = 3c

Next: Calculate the checksum group from the checksum line:
96ed 0029 cb57 1b50

9 + 6 + e + d = f + e + e = 1d + d = 2a
0 + 0 + 2 + 9 = b
c + b + 5 + 7 = 17 + 5 + 7 = 1c + 7 = 23
1 + b + 5 + 0 = c + 5 = 11
Last characters: ab31

So: The checksum group for this private key is ab31.

Next: Calculate the checksum character from the the checksum group:
a + b + 3 + 1 = 15 + 3 + 1 = 18 + 1 = 19
Last character: 9

Here is the original private key:

7972 b641 101c 0ad6
7b0e 401b 800a 9b6f
3225 c97f c6b8 1150
42cf 6696 8c2f b2e5

Using the checksum line, group, and character, the key can be written like this:

9
ab31
96ed 0029 cb57 1b50

7972 b641 101c 0ad6
7b0e 401b 800a 9b6f
3225 c97f c6b8 1150
42cf 6696 8c2f b2e5

Let's automate the generation of the checksum tree.

generatePrivateKeyChecksumTree.py

	#!/usr/bin/python


	# CONTROLS
	privateKey = "7972b641101c0ad67b0e401b800a9b6f3225c97fc6b8115042cf66968c2fb2e5"
	# END CONTROLS


	def main():

	checkValidityOfPrivateKey(privateKey)

	# cs = "checksum"
	csLine = getPrivateKeyChecksumLine(privateKey)
	csGroup = getLineChecksum(csLine)
	csCharacter = getGroupChecksum(csGroup)

	# print output
	print ""
	print csCharacter
	print csGroup
	print getSpacedLine(csLine)
	print ""
	print getSpacedPrivateKey(privateKey)
	print ""




	def getSpacedPrivateKey(privateKey):
	spacedLines = []
	for i in xrange(0, 64, 16):
	line = privateKey[i:i+16]
	spacedLines.append(getSpacedLine(line))
	return '\n'.join(spacedLines)


	def getSpacedLine(line):
	groups = [line[i:i+4] for i in xrange(0, 16, 4)]
	return ' '.join(groups)


	def getPrivateKeyChecksumLine(privateKey):
	csLine = ""
	# iterate over key in sets of 16 characters (i.e. one line).
	for i in xrange(0, 64, 16):
	line = privateKey[i:i+16]
	csLine += getLineChecksum(line)
	return csLine


	def getLineChecksum(line):
	checksum = ""
	# iterate over line in sets of 4 characters (i.e. one group).
	for i in xrange(0, 16, 4):
	group = line[i:i+4]
	checksum += getGroupChecksum(group)
	return checksum


	def getGroupChecksum(group):
	checksum = getHexSum(group)
	# return last character.
	return checksum[-1]


	def getHexSum(hexString):
	# convert into base 10 to calculate sum, then convert back to base 16.
	decimalSum = sum([int(c, 16) for c in hexString])
	return '{:x}'.format(decimalSum)


	def checkValidityOfPrivateKey(privateKey):
	if not isinstance(privateKey, str):
	raise TypeError("Private key must be a string.")
	if len(privateKey) != 64:
	raise ValueError("Private key length must be 64 characters.")
	hexDigits = "0123456789abcdef"
	if not all(c in hexDigits for c in privateKey):
	raise ValueError("Private key characters must be hex.")


	if __name__ == "__main__":
	main()

And now run this program.

[spiano@shovel test]$ python generatePrivateKeyChecksumTree.py

9
ab31
96ed 0029 cb57 1b50

7972 b641 101c 0ad6
7b0e 401b 800a 9b6f
3225 c97f c6b8 1150
42cf 6696 8c2f b2e5

This matches the earlier manually-generated result.

Let's sketch out a scenario where we make use of the checksum tree.

Let's suppose that:
- We have written out the key and the checksum (in the form shown above) onto a piece of paper.
- Without noticing it, we have made an error. Location: Line 3, group 1, character 1. We have accidentally written "8" instead of "3".

So: We discover the error when we manually copy the key from paper storage onto an offline computer and run generatePrivateKeyChecksumTree.py on it. We look at the checksum character on the first line of the tree and see that it is different. This alerts us to the problem.

Let's simulate this by generating the checksum tree for the altered private key.

Here's the altered private key:

7972 b641 101c 0ad6
7b0e 401b 800a 9b6f
8225 c97f c6b8 1150
42cf 6696 8c2f b2e5

Remove the whitespace:

[spiano@shovel test]$ echo -n "7972 b641 101c 0ad6
7b0e 401b 800a 9b6f
8225 c97f c6b8 1150
42cf 6696 8c2f b2e5" | tr '\n' ' ' | sed 's/ //g' ; echo ""

7972b641101c0ad67b0e401b800a9b6f8225c97fc6b8115042cf66968c2fb2e5

Use this altered key as input for the python program:

[spiano@shovel test]$ python generatePrivateKeyChecksumTree.py

e
ab81
96ed 0029 1b57 1b50

7972 b641 101c 0ad6
7b0e 401b 800a 9b6f
8225 c97f c6b8 1150
42cf 6696 8c2f b2e5

Next: Compare checksum trees:

Original key and checksum:

9
ab31
96ed 0029 cb57 1b50

7972 b641 101c 0ad6
7b0e 401b 800a 9b6f
3225 c97f c6b8 1150
42cf 6696 8c2f b2e5

Altered key and checksum:

e
ab81
96ed 0029 1b57 1b50

7972 b641 101c 0ad6
7b0e 401b 800a 9b6f
8225 c97f c6b8 1150
42cf 6696 8c2f b2e5

So:
- In our scenario, the original checksum tree is written on our paper storage system. The altered checksum is displayed on an offline computer where we have generated it from the altered key.
- We look at the top of the checksum tree. Instead of "9" as expected, we see "e". Therefore, we investigate further.
- We look at the checksum group. Instead of "ab31", we see "ab81". The third character is different. This indicates that the error is in Line 3 of the private key.
- We look at the third group in the checksum line, which is "1b57" instead of "cb57". The first character is different, which indicates that the error is in Line 3, Group 1, of the private key.
- We've narrowed down the location to 4 of the characters, out of 64 in total.

Let's get the address that corresponds to the original key.

We got the key from this earlier article:
Bitcoin transaction test set

Searching for this private key in the article leads to a record of the corresponding address, which is:
16ASCUS3s7D4UQh6J9oHGuT19agPvD3PFj

Let's double-check this.

Follow this recipe to generate the address from the private key:
Recipe for generating a Bitcoin address

[spiano@shovel test]$ ls -1

bitcoin_functions.py
bjorn_edstrom_ripemd160.py
ecdsa
ecdsa-0.10
ecdsa-0.10.tar.gz
generate_bitcoin_address_3.py
generatePrivateKeyChecksumTree.py
pypy_sha256.py

[spiano@shovel test]$ python generate_bitcoin_address_3.py

### START GENERATION OF BITCOIN ADDRESS

Private key (hex bytes): 7972b641101c0ad67b0e401b800a9b6f3225c97fc6b8115042cf66968c2fb2e5
Private key (32 hex bytes): 7972b641101c0ad67b0e401b800a9b6f3225c97fc6b8115042cf66968c2fb2e5
Private key (WIF): 5JjmowHQkS37VWboLEazyNqAK6xQTCyHzj4mLf2YdxhJaaQm2gu
Bitcoin address: 16ASCUS3s7D4UQh6J9oHGuT19agPvD3PFj

### END GENERATION OF BITCOIN ADDRESS

Compare equality of the two addresses.

Original address from earlier article:
16ASCUS3s7D4UQh6J9oHGuT19agPvD3PFj

Generated address:
16ASCUS3s7D4UQh6J9oHGuT19agPvD3PFj

[spiano@shovel test]$ a1="16ASCUS3s7D4UQh6J9oHGuT19agPvD3PFj"

[spiano@shovel test]$ a2="16ASCUS3s7D4UQh6J9oHGuT19agPvD3PFj"

[spiano@shovel test]$ [[ "$a1" = "$a2" ]] && echo equal || echo not-equal

equal

Cool. They're identical. This is the correct address.

Next: Vary the hex values in the error location until the desired address is generated, showing that we have found the correct private key.

We'll only vary one character at a time.

There are four characters in a group.

Number of variants: 16 (values per character) * 4 (characters) = 64 variants.

We could try each variant manually, but instead we'll automate it from the beginning.

findErrorInPrivateKey.py

	#!/usr/bin/python


	# CONTROLS
	privateKeyWithError = "7972b641101c0ad67b0e401b800a9b6f8225c97fc6b8115042cf66968c2fb2e5"
	correctAddress = "16ASCUS3s7D4UQh6J9oHGuT19agPvD3PFj"
	errorLine = 3
	errorGroup = 1
	# END CONTROLS


	# DEPENDENCIES
	import itertools
	import bitcoin_functions as bf
	# END DEPENDENCIES


	def main():

	checkValidityOfPrivateKey(privateKeyWithError)

	correctPrivateKey = varyErrorGroup(privateKeyWithError, correctAddress, errorLine, errorGroup)

	print ""
	print getSpacedPrivateKey(correctPrivateKey)
	print ""




	def varyErrorGroup(privateKey, correctAddress, errorLine, errorGroup):
	# note: be careful to convert the line and group from 1-index to 0-index values.
	privateKeyLines = getPrivateKeyLines(privateKey)
	hexDigits = "0123456789abcdef"
	group = privateKeyLines[errorLine-1][errorGroup-1]
	# iterate over the four hex characters.
	for i in xrange(4):
	groupList = [c for c in group]
	# iterate over possible values for one hex character in the group.
	for j in xrange(16):
	groupList[i] = hexDigits[j]
	groupString = ''.join(groupList)
	privateKeyLines[errorLine-1][errorGroup-1] = groupString
	privateKeyVariant = getPrivateKeyString(privateKeyLines)
	addressVariant = bf.private_key_hex_to_bitcoin_address(privateKeyVariant)
	if addressVariant == correctAddress:
	return privateKeyVariant
	raise Exception("Correct address not found while varying this group (Group {g}, Line {l}) in the private key.".format(g=errorGroup, l=errorLine))


	def getPrivateKeyString(privateKeyLines):
	# join lines and groups into string.
	privateKey = ""
	for line in privateKeyLines:
	privateKey += "".join(line)
	return privateKey


	def getPrivateKeyLines(privateKey):
	# divide key into lines and groups.
	privateKeyLines = []
	for i in xrange(0, 64, 16):
	line = []
	for j in xrange(0, 16, 4):
	p = i + j # p = position
	group = privateKey[p:p+4]
	line.append(group)
	privateKeyLines.append(line)
	return privateKeyLines


	def getSpacedPrivateKey(privateKey):
	spacedLines = []
	for i in xrange(0, 64, 16):
	line = privateKey[i:i+16]
	spacedLines.append(getSpacedLine(line))
	return '\n'.join(spacedLines)


	def getSpacedLine(line):
	groups = [line[i:i+4] for i in xrange(0, 16, 4)]
	return ' '.join(groups)


	def checkValidityOfPrivateKey(privateKey):
	if not isinstance(privateKey, str):
	raise TypeError("Private key must be a string.")
	if len(privateKey) != 64:
	raise ValueError("Private key length must be 64 characters.")
	hexDigits = "0123456789abcdef"
	if not all(c in hexDigits for c in privateKey):
	raise ValueError("Private key characters must be hex.")


	if __name__ == "__main__":
	main()

And run the program.

[spiano@shovel test]$ python findErrorInPrivateKey.py

7972 b641 101c 0ad6
7b0e 401b 800a 9b6f
3225 c97f c6b8 1150
42cf 6696 8c2f b2e5

We see that the program has replaced the "8" in Line 3 Group 1 with "3", thereby producing the correct private key.

Let's consider another scenario. We've located the error in a group, but we don't know how many mistakes are in that group. All four of the characters could be wrong.

A group of 4 random hex characters can have 16^4 = 65536 possible values.

My work machine:

System details:
- Name: Shovel
- Specifications: HP 6005 Pro SFF. 3 GHz x86_64 processor (AMD II x4 B95 Quad Core), 4 GB RAM, 1 TB hard drive. Running CentOS 7.6.1810 (Core).
- More information: New computer: Shovel
- Installed items: GCC 4.8.5, Make 3.82, Vim 7.4, Python 2.7.5, Python 3.3.2, Gnome 3.28.2, gedit 3.28.1, GPG 1.4.10.

Copy findErrorInPrivateKey.py to create findErrorInPrivateKey2.py.

Change the function

varyErrorGroup

to be:

def varyErrorGroup(privateKey, correctAddress, errorLine, errorGroup):

	# note: be careful to convert the line and group from 1-index to 0-index values.

	privateKeyLines = getPrivateKeyLines(privateKey)

	hexDigits = "0123456789abcdef"

	hexDigitsList = [c for c in hexDigits]

	group = privateKeyLines[errorLine-1][errorGroup-1]

	# iterate through all the possible values in one group.

	countIncrement = 256

	count = 0

	total = 65536

	for groupTuple in itertools.product(hexDigitsList, repeat=4):

		groupString = ''.join(groupTuple)

		privateKeyLines[errorLine-1][errorGroup-1] = groupString

		privateKeyVariant = getPrivateKeyString(privateKeyLines)

		addressVariant = bf.private_key_hex_to_bitcoin_address(privateKeyVariant)

		if addressVariant == correctAddress:

			return privateKeyVariant

		if count % countIncrement == 0:

			percentage = float(count) / total * 100

			print "{c}/{t} = {p:.2f}%".format(c=count, t=total, p=percentage)

		count += 1

	raise Exception("Correct address not found while varying this group (Group {g}, Line {l}) in the private key.".format(g=errorGroup, l=errorLine))

Run the program.

[spiano@shovel test]$ date +"%T"; python findErrorInPrivateKey2.py; date +"%T"

09:33:54
0/65536 = 0.00%
256/65536 = 0.39%
512/65536 = 0.78%
768/65536 = 1.17%
1024/65536 = 1.56%
1280/65536 = 1.95%
1536/65536 = 2.34%
1792/65536 = 2.73%
2048/65536 = 3.12%
2304/65536 = 3.52%
2560/65536 = 3.91%
2816/65536 = 4.30%
3072/65536 = 4.69%
3328/65536 = 5.08%
3584/65536 = 5.47%
3840/65536 = 5.86%
4096/65536 = 6.25%
4352/65536 = 6.64%
4608/65536 = 7.03%
4864/65536 = 7.42%
5120/65536 = 7.81%
5376/65536 = 8.20%
5632/65536 = 8.59%
5888/65536 = 8.98%
6144/65536 = 9.38%
6400/65536 = 9.77%
6656/65536 = 10.16%
6912/65536 = 10.55%
7168/65536 = 10.94%
7424/65536 = 11.33%
7680/65536 = 11.72%
7936/65536 = 12.11%
8192/65536 = 12.50%
8448/65536 = 12.89%
8704/65536 = 13.28%
8960/65536 = 13.67%
9216/65536 = 14.06%
9472/65536 = 14.45%
9728/65536 = 14.84%
9984/65536 = 15.23%
10240/65536 = 15.62%
10496/65536 = 16.02%
10752/65536 = 16.41%
11008/65536 = 16.80%
11264/65536 = 17.19%
11520/65536 = 17.58%
11776/65536 = 17.97%
12032/65536 = 18.36%
12288/65536 = 18.75%
12544/65536 = 19.14%
12800/65536 = 19.53%

7972 b641 101c 0ad6
7b0e 401b 800a 9b6f
3225 c97f c6b8 1150
42cf 6696 8c2f b2e5

10:12:35

Time taken: 10:12:35 - 09:33:54 ~= 38.5 minutes

We found the correct key after searching 20% of the keyspace.

So to search the entire keyspace would take approximately:
38.5 minutes * 5 = 192.5 minutes ~= 3.2 hours

Ok. So, although it will take a little time and effort, a private key is retrievable in the event of an error that can be narrowed down to a particular group. The entire group can be wrong.

Note: Using a faster language than Python for the Bitcoin address generation subroutine would speed this program up quite a lot.

Even if multiple groups are wrong, we can use the checksum tree to divide up the problem. We could iterate over each group until its last checksum digit matches the checksum value on the next level, then check the entire key.

Another scenario: What if there is a mistake in the checksum tree?

Original key and checksum:

9
ab31
96ed 0029 cb57 1b50

7972 b641 101c 0ad6
7b0e 401b 800a 9b6f
3225 c97f c6b8 1150
42cf 6696 8c2f b2e5

Let's suppose that the checksum group is incorrectly written down on paper as "a631". The "6" should be a "b".

Then: We then manually enter the private key on an offline computer. We re-calculate the checksum tree. We notice that the checksum group is incorrect. If the generated checksum character and the checksum line are identical to the written copy, then we know that the discrepancy is due to incorrect copying.

The same applies to the checksum character having an incorrect value.

If the error were instead on the checksum line, this would narrow down the location of the error to:
a) a copy error in the group on the checksum line
b) an error in the corresponding group in the private key

(a) is easily checkable - calculate the checksum of the group in the private key.
(b) can be approached with the method we used earlier for varying an entire group.

Let's suppose that there were 2 or more errors, in both the private key and the checksum tree.

Well, in this case, a more inventive approach would be required. A more complicated program could be written that would vary the values of several characters, in both the key and the checksum tree, to try to find the correct key variant. We would expect the correct key to produce a checksum tree variant that is nearly identical to the recorded checksum tree. Essentially, we would treat the checksum tree as a partial / probabilistic truth, rather than an absolute one.

Final scenario: Let's explore how long it would take to search a private key for a) 1 error b) 2 errors c) 3 errors, without relying on a checksum tree. The error locations are random.

For 1 error, we're going to vary each of the 64 key characters through the 16 possible hex values. This is 64*16=1024 operations.

For 2 errors: For each of the operations in the 1-error scenario, we would hold our current character constant, and then we would have to vary the other 63 characters through the 16 hex values. 63*16=1008 operations. So in total, we would have to perform 1008 * 1024 = 1032192 operations.

For 3 errors: Similar approach. For each of the 1032192 operations in the 2-error scenario, we would do 62*16=992 operations. In total, 992 * 1032192 = 1023934464 operations.

Well, essentially, we can see that each extra error that we search for increases the search cost by a factor of approximately 1000.

We'll start from the same altered key as before, and write a program that implements these approaches.

findErrorInPrivateKey3.py

	#!/usr/bin/python


	# CONTROLS
	privateKeyWithError = "7972b641101c0ad67b0e401b800a9b6f8225c97fc6b8115042cf66968c2fb2e5"
	correctAddress = "16ASCUS3s7D4UQh6J9oHGuT19agPvD3PFj"
	numberOfErrors = 1
	# END CONTROLS


	# LOGGING
	count = {}
	count['1'] = 0
	count['2'] = 0
	count['totalOperations'] = 1
	n = 0
	while n < numberOfErrors:
	operations = (64-n)*16
	count['totalOperations'] *= operations
	n += 1
	# END LOGGING


	# DEPENDENCIES
	import itertools
	import bitcoin_functions as bf
	# END DEPENDENCIES


	def main():

	checkValidityOfPrivateKey(privateKeyWithError)

	correctPrivateKey = varyPrivateKeyForOneError(privateKeyWithError, correctAddress)
	#privateKeyWithErrorList = [c for c in privateKeyWithError]
	#correctPrivateKey = varyPrivateKeyForMultipleErrors(privateKeyWithErrorList, correctAddress, numberOfErrors)

	if correctPrivateKey == None:
	raise Error("Correct address not found while varying this key")

	print ""
	print getSpacedPrivateKey(correctPrivateKey)
	print ""

	print "count['1'] = " + str(count['1']) + " operations."
	#print "count['2'] = " + str(count['2']) + " operations."



	def varyPrivateKeyForMultipleErrors(privateKeyList, correctAddress, numberOfErrors, holdConstant={}):
	privateKeyListOriginal = privateKeyList[:]
	hexDigits = "0123456789abcdef"
	# iterate over character locations in the private key.
	# - but don't vary a character whose index appears in the holdConstant dictionary.
	for i in xrange(64):
	if i in holdConstant.values():
	continue
	privateKeyList = privateKeyListOriginal[:]
	# iterate over hex digits
	for j in xrange(16):
	privateKeyList[i] = hexDigits[j]
	if numberOfErrors == 1:
	# we have arrived at the bottom of the recursion chain.
	# change one character
	privateKeyVariant = ''.join(privateKeyList)
	addressVariant = bf.private_key_hex_to_bitcoin_address(privateKeyVariant)
	if addressVariant == correctAddress:
	return privateKeyVariant
	count['2'] += 1
	if count['2'] % 256 == 0:
	percentage = float(count['2']) / count['totalOperations'] * 100
	print "{c}/{t} = {p:.2f}%".format(c=count['2'], t=count['totalOperations'], p=percentage)
	else:
	# we have to vary more characters before testing the private key variant.
	# store the index of the character that we're varying in this context.
	holdConstant[numberOfErrors] = i
	# recurse, carefully subtracting 1 from numberOfErrors.
	result = varyPrivateKeyForMultipleErrors(privateKeyList, correctAddress, numberOfErrors-1, holdConstant)
	if result != None:
	return result
	return None



	def varyPrivateKeyForOneError(privateKey, correctAddress):
	hexDigits = "0123456789abcdef"
	# iterate over character locations in the private key.
	for i in xrange(64):
	privateKeyList = [c for c in privateKey]
	# iterate over hex digits
	for j in xrange(16):
	# change one character
	privateKeyList[i] = hexDigits[j]
	privateKeyVariant = ''.join(privateKeyList)
	addressVariant = bf.private_key_hex_to_bitcoin_address(privateKeyVariant)
	if addressVariant == correctAddress:
	return privateKeyVariant
	count['1'] += 1
	return None


	def getSpacedPrivateKey(privateKey):
	spacedLines = []
	for i in xrange(0, 64, 16):
	line = privateKey[i:i+16]
	spacedLines.append(getSpacedLine(line))
	return '\n'.join(spacedLines)


	def getSpacedLine(line):
	groups = [line[i:i+4] for i in xrange(0, 16, 4)]
	return ' '.join(groups)


	def checkValidityOfPrivateKey(privateKey):
	if not isinstance(privateKey, str):
	raise TypeError("Private key must be a string.")
	if len(privateKey) != 64:
	raise ValueError("Private key length must be 64 characters.")
	hexDigits = "0123456789abcdef"
	if not all(c in hexDigits for c in privateKey):
	raise ValueError("Private key characters must be hex.")


	if __name__ == "__main__":
	main()

Run the program.

[spiano@shovel test]$ date +"%T"; python findErrorInPrivateKey3.py; date +"%T"

11:40:58

7972 b641 101c 0ad6
7b0e 401b 800a 9b6f
3225 c97f c6b8 1150
42cf 6696 8c2f b2e5

count['1'] = 515 operations.
11:42:32

So: Approximately 1.5 mins to find the correct key.

The keyspace was 1024 variants in total. We've processed about 50% of the keyspace.

So: To iterate across the whole keyspace would take ~1.5 * 2 = ~3 mins.

Let's test with
varyPrivateKeyForMultipleErrors()
instead of
varyPrivateKeyForOneError()

Uncomment lines 35, 36, and 46. Comment out lines 34 and 45.

[spiano@shovel test]$ date +"%T"; python findErrorInPrivateKey3.py; date +"%T"

11:45:35
256/1024 = 25.00%
512/1024 = 50.00%

7972 b641 101c 0ad6
7b0e 401b 800a 9b6f
3225 c97f c6b8 1150
42cf 6696 8c2f b2e5

count['2'] = 515 operations.
11:47:08

Same time taken. This indicates that the recursive function works, at least for numberOfErrors = 1.

Let's try with numberOfErrors=2.

Change the last character of the key from "5" to "6", so that there are now 2 errors. In the settings section, set numberOfErrors=2.

While the program is running, let's do a little maths:
- 1 error: 3 mins to get through the keyspace on this machine, using a Python codebase.
- We assume that each extra error that we search for increases the search cost by a factor of approximately 1000.
- 2 errors: 3 mins * 1000 = 3000 mins = 50 hours ~= 2.1 days
- 3 errors: 3 mins * 1000 * 1000 = 3000000 mins = 50000 hours ~= 2083 days ~= 5.70 years
- 4 errors: 3 mins * 1000**3 = 3 * 10**9 mins = 50000000 hours ~= 2083333 days ~= 5704 years

So... 2 errors is the tolerable margin of error in copying a private key. It's feasible to contemplate searching the keyspace for 3 errors, if you invested effort into a faster codebase and machine.

[spiano@shovel test]$ date +"%T"; python findErrorInPrivateKey3.py; date +"%T"

11:51:14
256/1032192 = 0.02%
512/1032192 = 0.05%
768/1032192 = 0.07%
1024/1032192 = 0.10%
1280/1032192 = 0.12%
1536/1032192 = 0.15%
1792/1032192 = 0.17%
2048/1032192 = 0.20%
2304/1032192 = 0.22%
2560/1032192 = 0.25%
2816/1032192 = 0.27%
3072/1032192 = 0.30%
3328/1032192 = 0.32%
3584/1032192 = 0.35%
3840/1032192 = 0.37%
4096/1032192 = 0.40%
4352/1032192 = 0.42%
4608/1032192 = 0.45%
4864/1032192 = 0.47%
5120/1032192 = 0.50%
5376/1032192 = 0.52%
5632/1032192 = 0.55%
5888/1032192 = 0.57%
6144/1032192 = 0.60%
6400/1032192 = 0.62%
6656/1032192 = 0.64%
6912/1032192 = 0.67%
7168/1032192 = 0.69%
7424/1032192 = 0.72%
7680/1032192 = 0.74%
7936/1032192 = 0.77%
8192/1032192 = 0.79%
8448/1032192 = 0.82%
8704/1032192 = 0.84%
8960/1032192 = 0.87%
9216/1032192 = 0.89%
9472/1032192 = 0.92%
9728/1032192 = 0.94%
9984/1032192 = 0.97%
10240/1032192 = 0.99%
10496/1032192 = 1.02%
10752/1032192 = 1.04%
11008/1032192 = 1.07%
11264/1032192 = 1.09%
11520/1032192 = 1.12%
11776/1032192 = 1.14%
12032/1032192 = 1.17%
12288/1032192 = 1.19%
12544/1032192 = 1.22%
12800/1032192 = 1.24%
13056/1032192 = 1.26%
13312/1032192 = 1.29%
13568/1032192 = 1.31%
13824/1032192 = 1.34%
14080/1032192 = 1.36%
14336/1032192 = 1.39%
14592/1032192 = 1.41%
14848/1032192 = 1.44%
15104/1032192 = 1.46%
15360/1032192 = 1.49%
15616/1032192 = 1.51%
15872/1032192 = 1.54%
16128/1032192 = 1.56%
16384/1032192 = 1.59%
16640/1032192 = 1.61%
16896/1032192 = 1.64%
17152/1032192 = 1.66%
17408/1032192 = 1.69%
17664/1032192 = 1.71%
17920/1032192 = 1.74%
18176/1032192 = 1.76%
18432/1032192 = 1.79%
18688/1032192 = 1.81%
18944/1032192 = 1.84%
19200/1032192 = 1.86%
19456/1032192 = 1.88%
19712/1032192 = 1.91%
19968/1032192 = 1.93%
20224/1032192 = 1.96%
20480/1032192 = 1.98%
20736/1032192 = 2.01%

[spiano@shovel test]$ date +"%T"

12:53:47

I've stopped the program and run the

data

command again immediately after stopping. We've gone far enough (2%) to make an estimate.

Time taken:
12:53:47 - 11:51:14 ~= 62.5 minutes for 2% of keyspace.

To process the entire keyspace would take approximately:
62.5 minutes * 50 = 3125 minutes ~= 52 hours ~= 2.17 days

This matches our earlier estimate of 2.1 days.

Another consideration: These key-recovery approaches depend on having the correct Bitcoin address.

The Bitcoin address is necessarily public, and can be shared with anyone without risk to the bitcoin held in it. It has a hash-based checksum embedded within it.

Since the address is public, copies of it do not have to be kept offline.

Because we can store multiple copies of an address on network-connected computers, it's not really as much of a resilience gain to use a checksum tree for it, although obviously you could do so if you wish.

Steps that can be taken to preserve an accurate copy of a Bitcoin address:
- In a text file, don't just keep a single copy of the address. Keep several copies on a few consecutive lines. Some text editors, when you select a string and highlight it, highlight other identical strings in the document. This is a quick test to confirm that all copies of the address are identical (i.e. that no copy has a typo).
- Keep a backup copy of the address-storage text file in the same directory.
- Backup the address storage data onto multiple external storage units.
- Write out the address next to the private key. This has several advantages:
-- The balance held by the key can be looked up via the address without re-calculating the address from the key.
-- A written-out address, kept offline, is a more resilient copy of the address. Address copies kept on network-connected computers might conceivably be altered by mistake or by third parties.
-- When the key is entered into an offline computer, the address can be used as an additional confirmation of the newly entered key's accuracy, by comparing it to the address generated from the entered key. If a transcription error has been made in the private key during the original storage or during entry into the offline computer, the generated address will be different.
- Create a name for your own use for each private key and address combination, so that you can more easily differentiate between combinations. Write the name on the offline storage paper. On a networked-connected computer, where you might look up the balance on this address, keep the name with the address.
- Design the offline storage in such a way that the private key is not exposed to view when you use it to view the address. Private keys can be stolen with cameras.

Let's go over possible problems with storing an address.

Here is the address from earlier:
16ASCUS3s7D4UQh6J9oHGuT19agPvD3PFj

The possible characters in a Bitcoin address are:
- 0-9 with no 0
- A-Z with no I or O
- a-z with no l

The 0, I, O, and l characters were excluded due to possible confusion with i and o.

When writing an Bitcoin address on paper, I have found it best to take these steps in order to reduce ambiguity:
- Underline every capital letter, taking care not to allow the underline to touch the character or any other nearby character.
- For a "1" character, include the extra bottom and diagonal lines (i.e. "1"), rather than writing only a single vertical line (i.e. "|").
- Add a tail to u/U to distinguish it from v/V.
- Cross the middle of z/Z to distinguish it from 2.
- Cross the top of a capital J, because you might as well do this.

Also, like a private key, it's easiest to write the characters in groups of four with a space between each group. Some letters will be left over that do not form a complete group of four.

The division-into-groups process can be automated:

[spiano@shovel ~]$ echo -n "16ASCUS3s7D4UQh6J9oHGuT19agPvD3PFj" | fold --width 16 | sed 's/.\{4\}/& /g'; echo "";

16AS CUS3 s7D4 UQh6
J9oH GuT1 9agP vD3P
Fj

StJohn Piano | 2020-07-26 | Comment 0

The following two statements deal with underlining the character "7" when writing a Bitcoin address on paper.

- Be careful when underlining 7. If you cross it with a horizontal line, it could look like Z when underlined. If you don't cross it, it could look like 2 when underlined.

[...]

- Underline 6 to distinguish it from b. This makes the overall underlining rule into "Underline capital letters and 6, and be careful of F and 7".

They were accidentally left in from an earlier approach that involved underlining all numerals as well as all capital letters.

Summary: The character "7" does not need to be underlined. Ignore indications to the contrary.