Knox Custody Risk Management
This document is intended to help regulators and newcomers understand some basics behind the safety of Bitcoin private key management systems. This is far from an exhaustive list: our full set of controls could never fit in one neat place, and we are always adding new ones. We may add other important controls to this list in the future.
In many cases, to avoid introducing unnecessary complexity, we maintain simplicity for the sake of clarity, even if a point could be better explained to someone more experienced with Bitcoin. In some cases, we use terminology we believe is more descriptive or understandable for newcomers. If you are experienced with Bitcoin and believe that some explanation could bring the reader closer to the reality of Bitcoin without introducing extra complexity, we would love to hear from you. To get in touch, please email us at firstname.lastname@example.org or visit knoxcustody.com/contact.
If you are a newcomer with little understanding of Bitcoin, we strongly recommend first reading our primer on some Bitcoin basics, which is assumed knowledge in this document.
- Use multisignature schemes
- Enforce dual or triple control
- Use HD wallets
- Do not commingle customer accounts
- Segregate accounts
- Keep 95+% of funds cold, and do not abuse the term cold
- Track the entire private key life-cycle before declaring it cold
- Generate keys in N distinct locations
- Use different personnel for each of the N generation locations
- Log every action taken during a key generation event
- Generate keys in a strict sequence
- Mix two or more sources of entropy to generate keys
- Use physical entropy to generate keys
- Vault offline devices used to generate keys
- Destroy devices used to generate keys at the end of their life
- Never transport or store raw private keys
- Secret-share private keys for secure backups
- Encrypt shards for secure private key backups
- Consider a secure transport provider to move any private material between locations
- Mark an account as compromised if secure transport is compromised
- Make regular use of tamper-evident packaging on all private key material
- Never store private key backup shards associated with one account in the same place
- Geographically distribute private key backup shards
- Ensure extra redundancy for parent public keys of a client account
- Never allow access to raw private keys
- Ensure human policy is not solely responsible for preventing thefts
- Use HSMs (Hardware Security Modules)
- Use more than one, or M HSMs
- Maintain strict physical security controls for signing locations
- Do not rely on physical security alone
- Use dual control personnel to sign
- Maintain at least M distinct signing locations
- Ensure that no master keys exist that can unlock signers
Document Organization: Stages
In order to help the reader understand these risk management strategies, we frame them in terms of the life-cycle of private keys. To begin, we will lay out some strategies that should be applied at all times. Briefly, the staged sections are:
(1) General: These should be used throughout the life-cycle, or do not have a natural home in one particular section.
(2) Generation & De-archiving: This stage occurs when keys are generated, or any time private keys must be created or made present. This occurs extremely infrequently, typically once for the full private key life-cycle.
(3) Storage & Transport: Storage refers to vaulting and safekeeping of private keys. A majority of a private key's life is spent in this stage, concurrently with the signing stage (4). Transportation occurs when generated keys are moved to some location.
(4) Signing: This refers to any period when private keys are used for signing transactions. A majority of a private key's life is spent in this stage, concurrently with the storage stage (3). Private material held for signing is accessed frequently, whereas private material held in storage is rarely used.
Document Organization: Structure
Each control will be presented in the same way to keep it consumable:
Simple description of the control
What this means: An explanation of the risk being controlled, or the particular control.
Why it matters / what it protects: A rationale for the imposition of this control.
In later sections: In some cases, we will want to define a term that will be useful in later sections, and will do so here.
General
(1) Use multisignature schemes
What this means: An address may be associated with a private key, or it may be associated with several private keys. It is possible to construct addresses that require signatures from multiple private keys before funds are allowed to move; this is called multisignature (multisig). We recommend always using multisignature schemes to derive addresses for holding a customer’s bitcoin.
Why it matters / what it protects: We will see through the next sections that segregation of duties and segregation of personnel are quite important. Multisignature schemes present several advantages over single-signature schemes in the institutional context.
In later sections: When a multisignature scheme is used, some number of distinct private keys are created, from which a quorum needs to be reached. In our primer on Bitcoin basics, we show a 3 of 4 scheme. This means 4 keys are generated, and at least 3 are needed to reach quorum. More generally, we refer to these as M of N schemes. In the 3 of 4 case, M=3 and N=4. In other sections, we refer to things like “each of the N private keys”. This simply means for each of the 4 distinct keys.
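The M of N terminology can be illustrated with a minimal sketch. The class and method names here (`MultisigPolicy`, `quorum_reached`) are hypothetical and only model the counting rule; real signature verification is performed by the Bitcoin network, not by code like this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MultisigPolicy:
    """Hypothetical model of an M of N scheme (names are illustrative)."""
    m: int  # threshold quorum: signatures required
    n: int  # total number of distinct private keys

    def __post_init__(self):
        if not 1 <= self.m <= self.n:
            raise ValueError("require 1 <= M <= N")

    def quorum_reached(self, signer_ids):
        """True when at least M distinct keys (numbered 0..N-1) have signed."""
        distinct = {s for s in signer_ids if 0 <= s < self.n}
        return len(distinct) >= self.m


policy = MultisigPolicy(m=3, n=4)          # the 3 of 4 case from the primer
print(policy.quorum_reached([0, 1]))        # two signatures: below quorum
print(policy.quorum_reached([0, 1, 1, 3]))  # a repeated signer counts once
```

In the 3 of 4 case, any three distinct keys reach quorum, while the same key signing twice does not count twice.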
(2) Enforce dual or triple control
What this means: Always use more than one person for any operation that is even remotely sensitive. The only situation in which an operation should be allowed to execute with a single person is if you can make an assessment that a nefarious agent in that situation could not cause damage.
Why it matters / what it protects: Even in an otherwise well-controlled system with trusted, background-checked personnel, a single person operating unobserved is capable of more harm. For this reason, two or more personnel should always be used for sensitive operations.
In later sections: We will point out certain times when this control is used, although this will not be exhaustive. When we do so, we will refer to a dual control, or dual control personnel, which implies that the described operation is being engaged by two or more personnel.
(3) Use HD wallets
What this means: Participants interact with the bitcoin protocol via private keys, public keys, and addresses. In Bitcoin, it is common and desired for a different address to be used for each deposit. However, since every address must have at least one associated private key, and several in the case of multi-signature schemes, this would lead to an explosion in the number of keys kept. Fortunately, a technique exists (BIP32) which allows one piece of private information to be created, from which all of the private keys that will be used for an account can be derived. This does not mean that there is a master key that can gain signing authority over an account. Using this technique in concert with a 3 of 4 multisignature scheme, for example, will mean that 4 unique such keys will be created.
Why it matters / what it protects: Since customers expect accounts with many addresses, this technique means that a minimum of information will be created and maintained for each customer. A smaller footprint of information is beneficial, however further action is needed in order to safely archive this parent private key.
In later sections: We will begin referring to both private keys and parent private keys. A parent private key is simply one from which the other keys are derived. Where possible, we will still speak in terms of individual private keys.
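To make the idea of deriving many keys from one parent concrete, here is a simplified sketch of BIP32 hardened child derivation using only the Python standard library. It is an illustration, not a wallet implementation: it covers only private, hardened derivation, omits public key derivation and serialization, and the function names are our own.

```python
import hashlib
import hmac

# Order of the secp256k1 group used by Bitcoin.
SECP256K1_N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
HARDENED = 0x80000000


def master_from_seed(seed: bytes):
    """Return (parent private key, chain code) derived from a seed."""
    digest = hmac.new(b"Bitcoin seed", seed, hashlib.sha512).digest()
    key = int.from_bytes(digest[:32], "big")
    if key == 0 or key >= SECP256K1_N:
        raise ValueError("unusable seed; choose another")
    return key, digest[32:]


def derive_hardened_child(parent_key: int, chain_code: bytes, index: int):
    """Derive hardened child number `index` from a parent private key."""
    if not 0 <= index < HARDENED:
        raise ValueError("index out of range")
    data = (b"\x00" + parent_key.to_bytes(32, "big")
            + (index + HARDENED).to_bytes(4, "big"))
    digest = hmac.new(chain_code, data, hashlib.sha512).digest()
    tweak = int.from_bytes(digest[:32], "big")
    child = (tweak + parent_key) % SECP256K1_N
    if tweak >= SECP256K1_N or child == 0:
        raise ValueError("invalid child; use the next index")
    return child, digest[32:]


# Illustrative seed only; a real seed must come from high-quality entropy.
parent, chain = master_from_seed(bytes(range(16)))
child0, _ = derive_hardened_child(parent, chain, 0)
child1, _ = derive_hardened_child(parent, chain, 1)
print(child0 != child1)  # each index yields a different private key
```

The same parent always reproduces the same children, which is why only the parent private key needs to be archived.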
(4) Do not commingle customer accounts
What this means: It is possible to use a set of addresses, and store many different customers’ funds on the same set of addresses in a wallet, being careful to track the balances of each. We strongly discourage this practice.
Why it matters / what it protects: Several downsides appear with this practice. The most obvious is that a mistake can be made in accounting for the funds of different customers, and reconciling this mistake may require data that is warehoused in a single location. This practice also takes away from the ability of a customer or an auditor to use canonical Bitcoin data, available to every participant in the network, to independently verify address activity. This capacity for independent verification is incredibly powerful, and should not be thrown away. In a moment we will get into a further separation that should be applied, and the same safety rationale for employing such a control applies for not commingling customer accounts as well.
(5) Segregate accounts
What this means: As shown previously, a parent private key can be used to derive many different child private keys. However, this technique can be abused to do something inappropriate. Instead of using the parent key to create many addresses for a single customer, the parent key could be used to generate many addresses, where sets of these addresses are then given to individual customers. To truly segregate accounts means not sharing any private information between them. This means that each segregated account must be constructed with completely independent information from any other.
Why it matters / what it protects: From the perspective of the customer, an account created by abusing a parent private key appears to be segregated, but in fact, even in the case that a multisignature scheme is employed, each customer is linked by some identical information to the others. This means that the compromise of one piece of information links many different customers. We not only frown on this practice, we also frown on the conflation of the terms segregated and non-commingled. Merely not commingling addresses is not true segregation.
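One way to audit for true segregation is to check that no parent key appears in more than one account. The sketch below assumes a hypothetical inventory that maps each account to the fingerprints of its parent keys; the data layout and function name are illustrative.

```python
from collections import defaultdict


def shared_parent_keys(accounts):
    """Map each parent key fingerprint used by 2+ accounts to those accounts.

    `accounts` is a hypothetical inventory: {account name: [fingerprints]}.
    An empty result means no private information is shared between accounts.
    """
    owners = defaultdict(set)
    for account, fingerprints in accounts.items():
        for fingerprint in fingerprints:
            owners[fingerprint].add(account)
    return {fp: sorted(names) for fp, names in owners.items() if len(names) > 1}


inventory = {
    "customer-a": ["fp-01", "fp-02", "fp-03", "fp-04"],
    "customer-b": ["fp-05", "fp-06", "fp-07", "fp-04"],  # reuses fp-04
}
print(shared_parent_keys(inventory))
```

Here the reuse of one fingerprint is enough to flag both accounts as not truly segregated.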
(6) Keep 95+% of funds cold, and do not abuse the term cold
What this means: Cold in the context of Bitcoin security means ‘offline’ and that private keys never appear on devices that will ever receive a network connection, or are connected to devices that will ever receive a network connection. For example, a Bitcoin wallet created on a computer that is later attached to the internet is strictly not cold. An industrial HSM connected to a computer that will ever see a network connection is not cold. A hardware wallet that is communicating over USB with a network-connected computer is not cold. You will come across different terminology to describe Bitcoin storage systems. These all amount to keeping private keys away from a network connection. You may see analogous terms elsewhere such as “air-gapped”, “eternal quarantine” or “offline”.
Why it matters / what it protects: The Internet is wonderful because it provides a path for any node in the world to connect to any other. Of course, for security reasons, this is not a good property. For this reason, it is important that the large majority of funds in a system (95% or more) be kept cold, which is to say they will never be exposed to the possibility of leaking by way of a network intrusion. It is however important to understand that a truly cold custody service, simply by virtue of being cold, is not necessarily safe. In particular, it may be exposed to internal collusion (agents stealing together), as personnel interacting with the material or devices carry their own risks, even though they are not network intruders.
(7) Track the entire private key life-cycle before declaring it cold
What this means: It could be argued that this follows naturally from the practice of keeping funds 95+% cold. However, we think it is worth mentioning explicitly. A private key can never be called cold should it ever be found, if even just for an instant, on a device that is connected to the Internet. For instance, consider a private key that was generated on an online computer, and then kept cold for the entirety of its life following. Even though it is kept away from a network for every second of its life after generation, it can never be considered cold, as it was not cold at birth. Consider also an offline computer that was used to generate a private key. Following its generation, the private key, like in the previous example, is kept cold for the rest of its life. If the generating computer is ever connected to the internet, any keys it ever generated can never be called cold. Think of this as an extreme form of contagion. A key can never again qualify as cold once it is not cold for any period of time.
Why it matters / what it protects: The same rationale applies as previously explained in keeping private keys cold. That this needs to be maintained for the entire private key life-cycle however is critical.
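The contagion rule can be expressed as a small model. This is a sketch with hypothetical record types (`Device`, `KeyHistory`): a key is cold only if its full device history is known and no device in that history has ever been networked.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    name: str
    ever_networked: bool = False  # True once the device has seen any network


@dataclass
class KeyHistory:
    label: str
    devices: list = field(default_factory=list)  # every device the key touched

    def is_cold(self) -> bool:
        # An untracked life-cycle can never be declared cold.
        if not self.devices:
            return False
        return not any(d.ever_networked for d in self.devices)


generator = Device("offline-generator")
key = KeyHistory("key-1", devices=[generator])
print(key.is_cold())             # cold while the generator stays offline
generator.ever_networked = True  # the generator is later connected
print(key.is_cold())             # the key it generated is no longer cold
```

Because the key record points at the device rather than copying its state, connecting the generating machine later retroactively disqualifies every key it produced.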
Generation & De-Archiving (GD)
(1) Generate keys in N distinct locations
What this means: When private keys are generated for a multisignature account, the generation can be done in completely different places. For example, in the case of a 3 of 4 multisignature scheme, this means 4 completely different locations.
Why it matters / what it protects: Since the keys can be independently generated, it is worth taking the extra care to make sure that their independence is guarded to the point of not being generated in the same place. Ultimately, anything you can do to decrease the likelihood that the information from these keys should ever cross is worth it.
(2) Use different personnel for each of the N generation locations
What this means: Not only should the N distinct key sets be generated in different locations, different dual control personnel must be used for each and every site. What this means in concert with different locations, and truly segregated accounts, is that every single customer account will be created by no fewer than 2N people (for example, 8 people in the case of a 3 of 4 scheme).
Why it matters / what it protects: Since ultimately the safety afforded to us by the use of multisignature schemes is derived from their being created completely independently, the same person should never observe or come into contact with any of the other private key sets. This is yet another case of guarding their independence to the maximum extent possible.
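The staffing rule (N locations, dual control at each, nobody present at more than one) can be checked mechanically. The following is a minimal sketch with a hypothetical plan format; names and locations are placeholders.

```python
def staffing_problems(staff_by_location, n_keys):
    """Return a list of violations; an empty list means the plan passes.

    `staff_by_location` is a hypothetical plan: {location: [people]}.
    """
    problems = []
    if len(staff_by_location) != n_keys:
        problems.append(f"expected {n_keys} locations, got {len(staff_by_location)}")
    seen = {}  # person -> first location where they appear
    for location, staff in staff_by_location.items():
        people = sorted(set(staff))
        if len(people) < 2:
            problems.append(f"{location}: dual control needs 2 or more people")
        for person in people:
            if person in seen:
                problems.append(f"{person} appears at {seen[person]} and {location}")
            else:
                seen[person] = location
    return problems


plan = {
    "site-1": ["ana", "ben"],
    "site-2": ["cho", "dev"],
    "site-3": ["eli", "fay"],
    "site-4": ["gus", "ana"],  # ana already attended site-1
}
print(staffing_problems(plan, n_keys=4))
```

A passing plan for a 3 of 4 scheme therefore involves at least 2 × 4 = 8 distinct people.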
(3) Log every action taken during a key generation event
What this means: A generation event will come to produce private keys for one part of a customer’s multisignature account. Given the importance of this event for the account, every action that can reasonably be logged should be, so that the details of the operation can later be reviewed or audited. There are a large number of controls that can and should be imposed on this area in particular, which we may detail in future control documents. Once a generation event completes, the entity that will come to maintain signing authority should independently verify that the dual control personnel followed all of the imposed controls.
Why it matters / what it protects: Due to the criticality of generation, it is important for others to independently verify and audit that it was undertaken with appropriate care, and that the relevant controls were followed. This also allows for completely re-running generation events that missed even a single control.
In later sections: We will start using the term ceremony to define an event where personnel are involved, and where the event has controls imposed.
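One common way to make a ceremony log auditable is to chain entries by hash, so that any later alteration is detectable. This document does not specify a log format; the sketch below is an assumption for illustration, and a real log would also carry timestamps and be stored away from the people being audited.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body):
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_entry(log, actor, action):
    """Append an entry that commits to the hash of the previous entry."""
    prev = log[-1]["hash"] if log else GENESIS
    body = {"seq": len(log), "actor": actor, "action": action, "prev": prev}
    log.append({**body, "hash": _digest(body)})


def verify_log(log):
    """Recompute every hash; False if any entry was altered or reordered."""
    prev = GENESIS
    for seq, entry in enumerate(log):
        body = {k: entry[k] for k in ("seq", "actor", "action", "prev")}
        if entry["seq"] != seq or entry["prev"] != prev or entry["hash"] != _digest(body):
            return False
        prev = entry["hash"]
    return True


ceremony_log = []
append_entry(ceremony_log, "operator-1", "opened tamper-evident bag")
append_entry(ceremony_log, "operator-2", "booted offline machine")
print(verify_log(ceremony_log))              # untouched log verifies
ceremony_log[0]["action"] = "nothing to see"
print(verify_log(ceremony_log))              # edited log fails verification
```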
(4) Generate keys in a strict sequence
What this means: For each of the N locations being used to generate private keys, engage each one in sequence. This implies engaging them in a particular sequential order, such that they do not occur at the same time as one another. This implies that private keys are never observable by more than one set of dual control personnel at a time, that the private keys from one location are fully vaulted before moving on to the next location, and that the ceremony is audited by personnel other than those responsible for generating. This should all occur prior to commencing the generation ceremony at the next location.
Why it matters / what it protects: With distinct locations and personnel being used N times, and with each sortie producing auditable logs of its activity, independent verification of the generation ceremony can be conducted before any other is allowed to proceed.
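The sequencing rule can be sketched as a small state tracker: a location may not begin until every earlier location has been generated, vaulted, and audited. The class and step names are hypothetical.

```python
class GenerationSchedule:
    """Hypothetical tracker enforcing one ceremony at a time, in order."""
    STEPS = ("generated", "vaulted", "audited")

    def __init__(self, locations):
        self.locations = list(locations)
        self.done = {loc: 0 for loc in self.locations}  # steps completed

    def record(self, location, step):
        position = self.locations.index(location)
        for earlier in self.locations[:position]:
            if self.done[earlier] < len(self.STEPS):
                raise RuntimeError(f"{earlier} must be audited before {location} starts")
        if self.done[location] == len(self.STEPS):
            raise RuntimeError(f"{location} is already complete")
        expected = self.STEPS[self.done[location]]
        if step != expected:
            raise RuntimeError(f"{location}: expected '{expected}', got '{step}'")
        self.done[location] += 1

    def complete(self):
        return all(count == len(self.STEPS) for count in self.done.values())


schedule = GenerationSchedule(["site-1", "site-2"])
schedule.record("site-1", "generated")
try:
    schedule.record("site-2", "generated")  # site-1 not yet vaulted and audited
except RuntimeError as error:
    print(error)
```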
(5) Mix two or more sources of entropy to generate keys
What this means: A source of entropy is the input used to create a high quality random number. When creating random numbers, two or more distinct sources of entropy should be used and mixed.
Why it matters / what it protects: High quality random numbers are critical to producing a safe private key. The use of multiple sources of entropy reduces the risk that a faulty source of entropy could compromise the quality of a private key.
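A simple way to mix sources is to hash them together, so the result is at least as unpredictable as the best single source. This is a minimal sketch, not a vetted key generation procedure: the choice of SHA-256 and the function name `mix_entropy` are our assumptions, and a real system would also confirm the result is a valid private key.

```python
import hashlib
import os


def mix_entropy(*sources: bytes) -> bytes:
    """Hash two or more entropy sources into one 32-byte value."""
    if len(sources) < 2:
        raise ValueError("mix at least two independent sources")
    hasher = hashlib.sha256()
    for source in sources:
        if not source:
            raise ValueError("empty entropy source")
        # Length-prefix each source so boundaries between them are unambiguous.
        hasher.update(len(source).to_bytes(4, "big"))
        hasher.update(source)
    return hasher.digest()


operating_system_bytes = os.urandom(32)
dice_rolls = b"3516242365142615"  # hypothetical, far too few rolls for real use
mixed = mix_entropy(operating_system_bytes, dice_rolls)
print(len(mixed))  # 32 bytes of mixed output
```

If one source turns out to be faulty or predictable, an attacker still has to guess the other.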
(6) Use physical entropy to generate keys
What this means: Make sure that one of the mixed sources of entropy is “real-world entropy”, meaning that its randomness comes from a physical process. Such entropy can be generated, for example, by fair dice or a True Random Number Generator (TRNG). It is important to understand that most computers have a Pseudorandom Number Generator (PRNG), which is not truly random. It should not be relied on alone, in particular for machines that are cold, as the PRNG will attempt to pick up as much entropy as it can from the world, and a cold computer will necessarily have more trouble doing so. If using a TRNG, we recommend further mixing real-world entropy whose randomness you can verify for yourself, such as rolling dice.
Why it matters / what it protects: This follows from the rationale for mixing multiple sources of entropy. The quality of the random numbers used to generate private keys is absolutely crucial in producing a high-quality private key.
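As a worked example of how much physical entropy dice provide: each roll of a fair six-sided die contributes log2(6), or about 2.585 bits. The sketch below computes how many rolls are needed to reach a target, assuming fair dice and taking 256 bits (the size of a Bitcoin private key) as the target.

```python
import math


def dice_entropy_bits(num_rolls: int, sides: int = 6) -> float:
    """Entropy in bits contributed by `num_rolls` rolls of a fair die."""
    return num_rolls * math.log2(sides)


def rolls_needed(target_bits: int = 256, sides: int = 6) -> int:
    """Smallest number of fair rolls that reaches `target_bits` of entropy."""
    return math.ceil(target_bits / math.log2(sides))


print(rolls_needed())                    # 100 rolls of a six-sided die
print(round(dice_entropy_bits(100), 1))  # 258.5 bits
```

Ninety-nine rolls fall just short of 256 bits, so 100 is the minimum for a six-sided die.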
(7) Vault offline devices used to generate keys
What this means: An offline machine that was used to generate private keys should be carefully vaulted in a safe, with any access to it logged.
Why it matters / what it protects: Consider what we said earlier about a key no longer being cold if the machine on which it was generated is ever connected to a network. For this reason alone, such a machine must be closely monitored, and never allowed to leave the entity that used it for key generation. There are other reasons you might not want such a machine to ever be accessed by others, so treating it as sensitive is important. Further, such a machine should only ever be accessed and initialized by dual control personnel.
(8) Destroy devices used to generate keys at the end of their life
What this means: Not only should a machine used to generate keys be vaulted, should it ever stop functioning, or be retired for any reason, it must be destroyed.
Why it matters / what it protects: For similar reasons to vaulting such a machine, carefully destroying it before discarding it is critical.
Storage & Transport (ST)
(1) Never transport or store raw private keys
What this means: When we describe moving key material, we are not talking about a package that contains information that would allow the transporter or anyone else to gain knowledge of a private key, even if they successfully opened every package that they are carrying. Access to any material being transported or stored should not leave any attacker with the ability to discover private keys.
Why it matters / what it protects: While it is important for several reasons that the safekeeping entity spread the sensitive material that can be used to reconstruct keys across the world, the most safety-conscious assumption to make is that this material will later be accessed by the wrong entity. If a system is designed such that even such an extreme intrusion does not lead to the attacker gaining signing authority, the system will be considerably safer than if this were not the case.
In later sections: We will regularly refer to private material alongside private keys going forward, in order to be able to speak about things like private keys that are not in raw form, while still stressing the private nature of the material. What we mean when we say private material is anything that is associated with some particular private key. Concretely: private keys and parent private keys qualify as private material. We also include in this category encrypted or otherwise partial information associated with some private key, or devices that can be used to produce digital signatures using an internally held private key, even if by design such devices can not directly reveal a private key. It would be inaccurate to classify such material as a private key, so we use this term to stress that it is still sensitive due to its association with some private key or set of private keys. In the above description, we mentioned an “attacker”. An attacker is an entity that is trying to cause a theft or loss, or otherwise attempting something malicious.
(2) Secret-share private keys for secure backups
What this means: There exist techniques that allow one to take a single private key, and turn it into several distinct pieces of information—also called shards. A subset of the divided pieces (shards) is necessary to reconstruct the original private key. Following their generation, private keys should not be kept in raw form, and this is one of the techniques that should be employed.
Why it matters / what it protects: This method allows one to neither transport, nor store, exfiltratable private keys that would grant an attacker signing authority.
In later sections: We will use the term shard later in the document when referring to a single piece of information that was obtained from a secret-sharing transformation of something like a private key.
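One well-known technique of this kind is Shamir's secret sharing. This document does not say which technique should be used, so the following is an illustrative sketch only: the field prime, function names, and parameters are our assumptions, and production systems should rely on audited implementations.

```python
import secrets

PRIME = 2**521 - 1  # a Mersenne prime, larger than any 256-bit secret


def split_secret(secret: int, threshold: int, shares: int):
    """Split `secret` into `shares` shards; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range")
    if not 2 <= threshold <= shares:
        raise ValueError("require 2 <= threshold <= shares")
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coefficients = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x):
        result = 0
        for coefficient in reversed(coefficients):
            result = (result * x + coefficient) % PRIME
        return result

    return [(x, evaluate(x)) for x in range(1, shares + 1)]


def recover_secret(shards):
    """Lagrange-interpolate the polynomial at x = 0 to recover the secret."""
    secret = 0
    for i, (xi, yi) in enumerate(shards):
        numerator, denominator = 1, 1
        for j, (xj, _) in enumerate(shards):
            if i != j:
                numerator = (numerator * -xj) % PRIME
                denominator = (denominator * (xi - xj)) % PRIME
        secret = (secret + yi * numerator * pow(denominator, -1, PRIME)) % PRIME
    return secret


example_key = secrets.randbits(256)  # stand-in for a private key
shards = split_secret(example_key, threshold=3, shares=5)
print(recover_secret(shards[:3]) == example_key)   # any 3 shards suffice
print(recover_secret(shards[2:]) == example_key)   # a different 3 also work
```

With fewer shards than the threshold, interpolation yields an unrelated value, so a single stolen shard reveals nothing about the key.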
(3) Encrypt shards for secure private key backups
What this means: Encryption refers to encoding information such that the observer of the encoded information can not retrieve the original information. All shards should be encrypted following their generation, and transported and stored in encrypted form.
Why it matters / what it protects: This is a second control that can be imposed to respect never transporting or storing private keys in raw form. With the secret information secret-shared, and then encrypted, the risks of attackers gaining signing authority over an account are greatly reduced, even if they gain access to every single geographically distributed vault.
(4) Consider a secure transport provider to move any private material between locations
What this means: When moving private material, in any form, a secure transport method should be used. There are a large number of controls that can be imposed for such a service. A third party may be elected to assist, and is likely the right choice for most Bitcoin companies. A responsible provider may be an entity regularly tasked with carrying large numbers of banknotes or other valuables, with a successful history of doing so.
Why it matters / what it protects: Moving sensitive material in the world is not a new endeavor. Bitcoin companies are unlikely to have internal expertise in secure transportation. Engaging a specialized entity with a proven track record is best.
(5) Mark an account as compromised if secure transport is compromised
What this means: A sound secure transport service will inform you if there was a serious breach in any of its controls, for example an attempt by their personnel to open some package. Even if you can take receipt of some item and believe it was not compromised, it should be assumed to be compromised. Move to discard the associated customer account entirely. If any funds are kept on such an account, a new one should be provided to the customer so that they may sweep the contents from their now-compromised account.
Why it matters / what it protects: A multisignature account with the level of security achieved by following these and other controls is formed by the union of several independent processes. While signing authority may not be directly compromised by the failure of a single process, which is part of what lends the combined set of processes their extreme safety, any weakening should be conservatively considered fatal.
(6) Make regular use of tamper-evident packaging on all private key material
What this means: Any sensitive material that is moved between locations should be stored in tamper-resistant and tamper-evident packaging. It is important that the unlogged access of any material is independently observable.
Why it matters / what it protects: Knowledge of exactly when certain material was accessed is important, and even if it is kept locked away, you should not rely on that fact alone to assume that it was never accessed.
(7) Never store private key backup shards associated with one account in the same place
What this means: Given some number of private key backup shards produced per the controls mentioned earlier, they will need to be stored. It is important that no two shards that are related to the same account be stored in the same safety deposit or other locked box. Shard storage requires physical isolation.
Why it matters / what it protects: Even though the earlier controls have already gone to great lengths to ensure that physical intrusion by an attacker will not allow them to gain signing authority, limiting the likelihood that such access is ever gained in the first place is a necessary additional measure.
(8) Geographically distribute private key backup shards
What this means: Given a set of private key backup shards for a multisignature account, ensure that the locations of the stored shards are distributed globally. In the case of Knox, private material is distributed over 4 cities, 3 countries, and 2 continents.
Why it matters / what it protects: This not only greatly reduces the likelihood that any one attacker can gain access to large numbers of shards, but also provides a lot of redundancy for the private material. Consider that, besides theft, loss of signing authority can render funds in an account unmovable. By distributing material over several cities, countries, and continents, the risk of natural catastrophe or other force majeure events leading to a loss is greatly reduced.
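The two storage rules above (no two shards of one account in the same box, and geographic spread) lend themselves to an automated check over a storage inventory. The record layout and the minimum thresholds below are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Placement(NamedTuple):
    account: str
    shard: str
    box: str      # identifier of the safe deposit or other locked box
    city: str
    country: str


def placement_problems(placements, min_cities=2, min_countries=2):
    """Return violations of the shard storage rules; empty means none found."""
    by_account = defaultdict(list)
    for placement in placements:
        by_account[placement.account].append(placement)
    problems = []
    for account, items in by_account.items():
        boxes = [p.box for p in items]
        if len(boxes) != len(set(boxes)):
            problems.append(f"{account}: two shards share a box")
        if len({p.city for p in items}) < min_cities:
            problems.append(f"{account}: shards span too few cities")
        if len({p.country for p in items}) < min_countries:
            problems.append(f"{account}: shards span too few countries")
    return problems


inventory = [
    Placement("acct-1", "shard-a", "box-17", "City A", "Country X"),
    Placement("acct-1", "shard-b", "box-17", "City A", "Country X"),
]
print(placement_problems(inventory))
```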
(9) Ensure extra redundancy for parent public keys of a client account
What this means: In an M of N multisignature account, M is the threshold quorum of signatures needed to sign off on a fund movement (M ≤ N). However, even though only M private keys are necessary to sign, due to the way the system functions, all N public keys must be known at all times. These keys are not as sensitive as their corresponding private keys, but the loss of them can lead to an inability to spend the funds just the same. For this reason, they should be stored in other places besides their default locations alongside every private key.
Why it matters / what it protects: Since all N public keys will need to be known, increasing the redundancy of this information is critical. This also serves to highlight another reason some people may be unfortunately dissuaded from storing their own Bitcoin: By the description of multisignature, and the threshold quorum, it is a common mistake to think that the total loss of one of the N pieces of information will be inconsequential. Hearing stories about losses due to such a mistake is bound to cause more fear.
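To see why all N public keys are needed, consider one common multisignature construction, pay-to-witness-script-hash (P2WSH). This document does not state which script type is used, so treat that as an assumption. The output commits to the SHA-256 hash of a script listing every public key, and spending requires revealing that exact script. The public keys below are placeholder bytes, not real keys.

```python
import hashlib

OP_CHECKMULTISIG = 0xAE


def small_int_opcode(value: int) -> int:
    """Opcode for the numbers 1..16 (OP_1 is 0x51, OP_16 is 0x60)."""
    if not 1 <= value <= 16:
        raise ValueError("value must be between 1 and 16")
    return 0x50 + value


def multisig_witness_script(m: int, pubkeys) -> bytes:
    """Build: OP_M <pubkey 1> ... <pubkey N> OP_N OP_CHECKMULTISIG."""
    n = len(pubkeys)
    if not 1 <= m <= n:
        raise ValueError("require 1 <= M <= N")
    script = bytes([small_int_opcode(m)])
    for pubkey in pubkeys:
        if len(pubkey) != 33:
            raise ValueError("expected 33-byte compressed public keys")
        script += bytes([len(pubkey)]) + pubkey
    return script + bytes([small_int_opcode(n), OP_CHECKMULTISIG])


def p2wsh_script_pubkey(witness_script: bytes) -> bytes:
    """Version 0 witness program: OP_0 followed by a 32-byte SHA-256 push."""
    return b"\x00\x20" + hashlib.sha256(witness_script).digest()


# Placeholder 33-byte values standing in for 4 compressed public keys.
placeholder_pubkeys = [b"\x02" + bytes([i]) * 32 for i in range(1, 5)]
script = multisig_witness_script(3, placeholder_pubkeys)
print(p2wsh_script_pubkey(script).hex())
```

If even one of the N public keys is lost, the script cannot be rebuilt byte for byte, its hash cannot be matched, and the funds cannot be spent even with M private keys in hand. The order of the keys matters for the same reason.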
Signing
(1) Never allow access to raw private keys
What this means: Personnel responsible for signing should never have direct access to private keys. As an example, even if you are maintaining very strict security controls, it may be possible for an agent responsible for managing the system to copy a private key during signing. The frequency of access for a key used to sign is too high to ever be directly observable.
Why it matters / what it protects: Imposing this control means that, even though you should only employ trusted agents, a nefarious agent internal to the system will not be able to copy a key. This control is not enough by itself however, as later controls will detail.
(2) Ensure human policy is not solely responsible for preventing thefts
What this means: Ensure that you have a system in place that renders a nefarious agent or agents incapable of attaining a signature that would not have been requested by the customer. At Knox we employ such technology to reduce to an absolute minimum any situation in which collusion between agents could lead to a loss.
Why it matters / what it protects: A private key spends most of its life in a signing state. It is dangerous to trust some internal agent to do the right thing. No matter the level of physical security you have introduced, the attempt to rely on physical security and policy controls alone to prevent an inappropriate spend is not enough.
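The principle of machine-enforced authorization can be sketched as a signer that refuses to act unless the request carries proof of the customer's approval. This is not a description of Knox's technology: the class is hypothetical, an HMAC tag stands in for what would realistically be a customer's public-key digital signature, and the returned "signature" is a placeholder.

```python
import hashlib
import hmac


class PolicySigner:
    """Hypothetical signer that checks customer authorization before signing."""

    def __init__(self, customer_auth_key: bytes):
        # Stand-in for the customer's verification key (an assumption).
        self._customer_auth_key = customer_auth_key

    def sign_if_authorized(self, transaction: bytes, customer_tag: bytes) -> bytes:
        expected = hmac.new(self._customer_auth_key, transaction, hashlib.sha256).digest()
        if not hmac.compare_digest(expected, customer_tag):
            raise PermissionError("transaction was not authorized by the customer")
        return self._sign(transaction)

    def _sign(self, transaction: bytes) -> bytes:
        # Placeholder only: a real device would sign with its internal private key.
        return hashlib.sha256(b"placeholder-signature" + transaction).digest()


auth_key = b"hypothetical-customer-key"
signer = PolicySigner(auth_key)
approved = b"pay 1 BTC to customer withdrawal address"
tag = hmac.new(auth_key, approved, hashlib.sha256).digest()
signer.sign_if_authorized(approved, tag)  # succeeds
try:
    signer.sign_if_authorized(b"pay 1 BTC to insider address", tag)
except PermissionError as error:
    print(error)
```

Because the check runs inside the signer, colluding operators cannot obtain a signature for a transaction the customer never approved, whatever the written policy says.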
(3) Use HSMs (Hardware Security Modules)
What this means: HSMs are designed to make it impossible for an attacker, even with direct physical access, to extract private keys from the device. All HSMs should be offline, as mentioned.
Why it matters / what it protects: The use of HSMs is useful to assure yourself that you have implemented some of the higher-level goals such as not allowing agents to copy private keys. How they are used however is important, and there are many unsafe methods of using them.
(4) Use more than one, or M HSMs
What this means: As much security as an HSM may grant you, you should use as many distinct physical HSMs as the threshold quorum of your multisignature scheme. If you were to use only one, you may as well not rely on a multisignature scheme for protection during signing.
Why it matters / what it protects: Single points of failure are fatal, and should be rooted out of any secure system. In the same way that we would strongly discourage the use of single-signature schemes, we believe that a single HSM will never afford one the appropriate level of safety. We will point out other possible sources of fatal single points of failure in subsequent controls.
(5) Maintain strict physical security controls for signing locations
What this means: There are many independent controls that we can get into here, but broadly speaking: Physically securing premises is something people have been doing for ages. At minimum, you should have every signing location under constant video surveillance, and it should be alarmed such that any intrusion attempt will bring a swift armed response.
Why it matters / what it protects: While the many other controls will still prevent a theft, it is best that these are never exercised, and an attacker is caught before tripping on any further controls. This is also a good deterrent. There are a large number of much easier targets in the world if you erect the kinds of controls we are now discussing.
(6) Do not rely on physical security alone
What this means: Physical security is important, and care should be taken to ensure that reaching something like an HSM is extremely difficult, and will trigger appropriate alarms and an armed response. However, we recommend designing under the most extreme assumption: that even with the barn doors wide open, an attempted theft must not succeed.
Why it matters / what it protects: While physical security will prevent outsiders from entering, and heavy use of security cameras could in theory spot nefarious behavior, it is safest to assume that you will fail to catch the behavior. It is best that your system is resistant to such theft attempts.
(7) Use dual control personnel to sign
What this means: Always make sure at least two people are present for signing.
Why it matters / what it protects: While the extreme controls already introduced mean that a nefarious agent is not likely to execute a successful theft, we recommend using at least two personnel for some of the reasons stated previously.
(8) Maintain at least M distinct signing locations
What this means: Maintain M distinct premises, each with their own independent physical security. For example, in a 3 of 4 multisignature scheme, this implies 3 distinct premises with physical vaulting for HSMs.
Why it matters / what it protects: This is yet another example of eliminating any single points of failure. If a multisignature scheme is being engaged, which it should be if Bitcoin is being stored for someone else, what are otherwise completely independent keys should never be co-located.
(9) Ensure that no master keys exist that can unlock signers
What this means: This is one of the biggest mistakes that are committed by users of an HSM, typically for convenience. There should be absolutely no way for an HSM, once armed, to have its private keys extracted or observed. There should also be no way for such an HSM to be asked to sign something without the customer’s explicit permission, enforceable of course by a machine, and not a human policy.
Why it matters / what it protects: This is yet another example of the introduction of a single point of failure. Assume for example that you maintain multiple HSMs, spread across distinct facilities, obey every other control, but there exist master keys that could unlock them. This is equivalent to not having produced the segregation of risk in the first place. This is a concentrated risk, and should be avoided at all costs.