I'm writing a small script that will dump the UTXO database to a text file. As far as I'm aware, these are the most common script patterns indicated by the type field inside each value:

e.g. value:    b98276a2ec7700cbc2986ff9aed6825920aece14aa6f5382ca5580
               <----><----><><-------------------------------------->
                /      /    \                   \
 height/coinbase  value      type                script data


0x00 = P2PKH (hash160 public key)
0x01 = P2SH  (hash160 script)
0x02 = P2PK (compressed)
0x03 = P2PK (compressed)
0x04 = P2PK (uncompressed)
0x05 = P2PK (uncompressed)

It seems as though the script type is there so that you only need to store the minimal amount of script data inside the database (e.g. the unique public keys and script hashes inside P2PK, P2PKH, and P2SH).
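
To make the layout in the diagram concrete, here is a minimal Python sketch that splits a value into its height/coinbase, amount, type, and script data fields. It assumes the value has already been de-obfuscated and that the first three fields use Bitcoin Core's VarInt and amount-compression encodings; the names `read_varint`, `decompress_amount`, and `parse_value` are hypothetical helpers for illustration, not part of any library.

```python
def read_varint(data: bytes, pos: int = 0):
    """Decode one Bitcoin Core style VarInt (base-128, most significant group first).

    Returns (value, position of the next unread byte).
    """
    n = 0
    while True:
        byte = data[pos]
        pos += 1
        n = (n << 7) | (byte & 0x7F)
        if byte & 0x80:
            n += 1  # continuation bit set: another byte follows
        else:
            return n, pos


def decompress_amount(x: int) -> int:
    """Invert the amount compression applied to stored output values (satoshis)."""
    if x == 0:
        return 0
    x -= 1
    e = x % 10          # number of trailing decimal zeros (9 means "special case")
    x //= 10
    if e < 9:
        d = (x % 9) + 1  # last non-zero digit
        x //= 9
        n = x * 10 + d
    else:
        n = x + 1
    return n * 10 ** e


def parse_value(value: bytes) -> dict:
    """Split a de-obfuscated chainstate value into its four fields."""
    code, pos = read_varint(value)
    amount, pos = read_varint(value, pos)
    script_type, pos = read_varint(value, pos)
    return {
        "height": code >> 1,          # upper bits: block height
        "coinbase": bool(code & 1),   # lowest bit: coinbase flag
        "amount": decompress_amount(amount),
        "type": script_type,
        "script_data": value[pos:],
    }


fields = parse_value(
    bytes.fromhex("b98276a2ec7700cbc2986ff9aed6825920aece14aa6f5382ca5580")
)
print(fields)
```

For the example value above this gives height 475387, a non-coinbase output of 65279 satoshis, type 0, and 20 bytes of script data.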

Anyway, would I be correct in assuming that you could only get an address from script types 0 and 1 (by base58 encoding the script data)?

In other words, the chainstate leveldb does not include any witness data to allow you to get addresses for utxos using P2WPKH and P2WSH scripts?

EDIT: Here's the finished tool: https://github.com/in3rsha/bitcoin-utxo-dump

inersha

1 Answer

Yes, Bitcoin Core compresses standard output scripts so that it stores only the minimal amount of data needed.

> Anyway, would I be correct in assuming that you could only get an address from script types 0 and 1 (by base58 encoding the script data)?

Yes

> In other words, the chainstate leveldb does not include any witness data to allow you to get addresses for utxos using P2WPKH and P2WSH scripts?

If by witness data you mean segwit outputs, then no, that is not the case. All outputs' scriptPubKeys are stored in the database; otherwise the node would be unable to verify transactions that spend arbitrary scripts and segwit scripts. These scripts are stored without special compression (i.e. without the type handling going on here) and are simply serialized as-is. Segwit outputs are already in a minimal form, so there is no need for a dedicated type for them.

Since all scriptPubKeys are stored in the database, you can compute the address for every UTXO if it has one, including segwit UTXOs.
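
As a rough illustration of that last point, the sketch below maps a full scriptPubKey to an address. It is a minimal example under stated assumptions: mainnet prefixes (`0x00`, `0x05`, `bc`), only P2PKH, P2SH, and witness version 0 programs (later witness versions use a different checksum and are not handled), and hypothetical helper names (`base58check`, `segwit_v0_address`, `script_to_address`) that are not taken from any library.

```python
import hashlib

B58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
B32 = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"


def base58check(version: int, payload: bytes) -> str:
    """Base58Check-encode a version byte plus payload (e.g. a hash160)."""
    raw = bytes([version]) + payload
    raw += hashlib.sha256(hashlib.sha256(raw).digest()).digest()[:4]
    n = int.from_bytes(raw, "big")
    out = ""
    while n:
        n, r = divmod(n, 58)
        out = B58[r] + out
    pad = len(raw) - len(raw.lstrip(b"\x00"))  # each leading zero byte becomes "1"
    return "1" * pad + out


def bech32_polymod(values) -> int:
    """Checksum polynomial from BIP173."""
    gen = [0x3B6A57B2, 0x26508E6D, 0x1EA119FA, 0x3D4233DD, 0x2A1462B3]
    chk = 1
    for v in values:
        top = chk >> 25
        chk = ((chk & 0x1FFFFFF) << 5) ^ v
        for i in range(5):
            if (top >> i) & 1:
                chk ^= gen[i]
    return chk


def segwit_v0_address(program: bytes, hrp: str = "bc") -> str:
    """Bech32-encode a version 0 witness program (20 or 32 bytes)."""
    acc = bits = 0
    data = [0]  # witness version 0
    for b in program:  # regroup 8-bit bytes into 5-bit values
        acc = (acc << 8) | b
        bits += 8
        while bits >= 5:
            bits -= 5
            data.append((acc >> bits) & 31)
    if bits:
        data.append((acc << (5 - bits)) & 31)  # zero-pad the final group
    hrp_exp = [ord(c) >> 5 for c in hrp] + [0] + [ord(c) & 31 for c in hrp]
    polymod = bech32_polymod(hrp_exp + data + [0] * 6) ^ 1
    checksum = [(polymod >> 5 * (5 - i)) & 31 for i in range(6)]
    return hrp + "1" + "".join(B32[d] for d in data + checksum)


def script_to_address(script: bytes):
    """Return an address for common scriptPubKey patterns, or None if there is none."""
    if len(script) == 25 and script[:3] == b"\x76\xa9\x14" and script[23:] == b"\x88\xac":
        return base58check(0x00, script[3:23])      # P2PKH
    if len(script) == 23 and script[:2] == b"\xa9\x14" and script[22:] == b"\x87":
        return base58check(0x05, script[2:22])      # P2SH
    if len(script) in (22, 34) and script[0] == 0x00 and script[1] == len(script) - 2:
        return segwit_v0_address(script[2:])        # P2WPKH / P2WSH
    return None                                     # e.g. P2PK or non-standard


print(script_to_address(bytes.fromhex("0014751e76e8199196d454941c45d1b3a323f1433bd6")))
```

Scripts that match none of these patterns, such as bare P2PK outputs, simply have no address, which is why the function returns `None` for them.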

Also, the term "witness data" refers to the signatures and other input data of a transaction that spends a segwit output. It is not a general term for anything segwit-related, so using it in your question is confusing.

Andrew Chow
  • Ha, I had completely the wrong idea with where I was going there. Thanks for clearing that up. – inersha Mar 29 '19 at 18:52
  • Seeing as P2WPKH and P2WSH scripts are stored without special compression, what does the script `type` field indicate for these entries? From a quick look it appears that P2WPKH are type `28` and P2WSH are type `40`. I'm not sure what the number indicates when it's greater than `5`. – inersha Mar 30 '19 at 11:35
  • It seems as though if the `type` is greater than `5`, then it indicates the **size** of the upcoming script. Although the size given is actually 6 bytes greater, so you should subtract 6 to get the actual size of the upcoming script. – inersha Mar 30 '19 at 12:09
  • A brief description of the compression can be found here: https://github.com/bitcoin/bitcoin/blob/master/src/compressor.h#L25. There are only 6 types, `0x00` is P2PKH, `0x01` is P2SH, `0x02` and `0x03` are compressed pubkeys. Those bytes are actually part of the pubkey itself. `0x04` and `0x05` are uncompressed pubkeys. After that, the first byte is the `script size + 6`, so the real size is `byte - 6`. – Andrew Chow Mar 30 '19 at 16:11
  • Thank you. Why is it that `0x05` is used for uncompressed public keys, seeing as they always start with `0x04`? Why not just use `0x04` for all uncompressed public keys? – inersha Mar 30 '19 at 16:17
  • The uncompressed pubkeys are compressed when they are added to the db. 0x04 and 0x05 are used to indicate that the key is supposed to be uncompressed and those indicate whether the y value is even or odd so that the full uncompressed key can be retrieved. It's basically the same thing as for normal pubkey compression, just with different values to indicate that the script uses an uncompressed key and not a compressed one. – Andrew Chow Mar 30 '19 at 19:14
  • Gotcha, thank you very much, you have explained everything! – inersha Mar 30 '19 at 19:28
  • My question is: why is the type required when all outputs' scriptPubKeys are stored in the database? Isn't it redundant data? – dark knight Jun 25 '20 at 07:27
  • @darkknight Type is required because some scriptPubKeys are compressed in a form that is ambiguous without a distinguishing type. – Andrew Chow Jun 25 '20 at 15:27
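
Putting the comments above together, the full scriptPubKey can be rebuilt from the leading byte and the script data roughly as follows. This is a minimal sketch, not Bitcoin Core's code: `read_varint` and `decompress_script` are hypothetical names, the leading byte is assumed to be read as a VarInt (so scripts longer than 121 bytes still work), and the uncompressed-key case recovers `y` from `x` using the secp256k1 curve equation `y^2 = x^3 + 7`.

```python
P = 2**256 - 2**32 - 977  # secp256k1 field prime


def read_varint(data: bytes, pos: int = 0):
    """Decode one Bitcoin Core style VarInt; returns (value, next position)."""
    n = 0
    while True:
        byte = data[pos]
        pos += 1
        n = (n << 7) | (byte & 0x7F)
        if byte & 0x80:
            n += 1
        else:
            return n, pos


def decompress_script(data: bytes, pos: int = 0) -> bytes:
    """Rebuild a scriptPubKey from its type (or size + 6) prefix and script data."""
    n, pos = read_varint(data, pos)
    if n == 0x00:  # P2PKH: OP_DUP OP_HASH160 <20 bytes> OP_EQUALVERIFY OP_CHECKSIG
        return b"\x76\xa9\x14" + data[pos:pos + 20] + b"\x88\xac"
    if n == 0x01:  # P2SH: OP_HASH160 <20 bytes> OP_EQUAL
        return b"\xa9\x14" + data[pos:pos + 20] + b"\x87"
    if n in (0x02, 0x03):  # P2PK, compressed key: the type byte is the key prefix
        return b"\x21" + bytes([n]) + data[pos:pos + 32] + b"\xac"
    if n in (0x04, 0x05):  # P2PK, uncompressed key stored as x only
        x = int.from_bytes(data[pos:pos + 32], "big")
        y = pow((pow(x, 3, P) + 7) % P, (P + 1) // 4, P)  # square root mod P
        if y % 2 != n % 2:  # 0x04 means even y, 0x05 means odd y
            y = P - y
        key = b"\x04" + x.to_bytes(32, "big") + y.to_bytes(32, "big")
        return b"\x41" + key + b"\xac"
    size = n - 6  # any other value: raw script of (n - 6) bytes follows
    return data[pos:pos + size]


example = bytes.fromhex("00cbc2986ff9aed6825920aece14aa6f5382ca5580")
print(decompress_script(example).hex())
```

With this scheme a 22-byte P2WPKH script shows up with a leading `28` and a 34-byte P2WSH script with `40`, matching the values observed earlier in the comments.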