Encoding Base58: Understanding and implementing in Scala

post-thumb

I recently needed to convert to and from Base58. An existing crypto library was not an option, because I just needed one function exact function. However, I could not find a suitable Scala implementation, and the algorithm in the specification was unclear.

Understanding the algorithm

An encoding algorithm is just a base transformation, it is the same as transforming between base 2 (binary), 16 (hexadecimal), 10 (decimal) or any other base. The trick here is that the target base has a mapping we can nicely represent to ASCII symbols and as a string of characters. For Base58, the alphabet is:

val alphabet = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

The source base can be any base, but in practice we usually want to transform an array of bytes, since bytes are value between 0 and 255, our source base is 256. Hence, we need to define a function like this:

def base58Convert(a: Array[Byte]): Array[Int]

We return Array[Int] for convenience, Array[Byte] would be also OK. Once we have this conversion, we just need to map the result to a string using the alphabet and we are done. Assuming we have a function that converts an Int to the corresponding Char in the alphabet:

def idxToChar(idx: Int): Char

We can define the base encoding:

def base58Convert(a: Array[Byte]) =
  baseConvert(byteArrayToBigInt(a), 58).map(idxToChar).mkString

We also need a byteArrayToBigInt function to transform the array of bytes to an integer that we can encode in Bas58. Hence, we have a skeleton on which we can build our implementation.

Implementing base58Convert

The base conversion algorithm is straightforward. Given a number n: Int, we can convert it to base b: Int by performing the following operations:

def baseConvert(n: BigInt, b: Int): Array[Int] = {
  if (n == 0) Array.emptyIntArray
  else {
    val m = n % b
    val q = n / b
    baseConvert(q, b).appended(m.toInt)
  }
}

The algorithm just exploits the fact that any number $n$ in base $b$ we can express as the sum of its digits $d_i$ times the base power the digit’s position: $$ \begin{equation} n = d_m d_{m - 1} \cdots d_0 = \sum_{i = 0}^{m} d_i \cdot b^i \end{equation} $$ For example, for the number 123 in base 10 we have: $$ \begin{equation} 123 = 1 \cdot 10^2 + 2 \cdot 10^1 + 3 \cdot 10^0 \end{equation} $$ The algorithm works because the division of a number by its base always yields the last digit of number in that base as remainder. In our example with 123, dividing 123 by 10 yields a quotient of 12 and a remainder of 3. By the equation (1), if we subtract the remainder and divide by b, the quotient are the rest of the digits, so we can apply the algorithm recursively.

Given the right idxToChar function, we can convert between bases. Let us look at a base conversion from decimal to binary:

def encodeBinary(n: Int) = baseConvert(n, 2).map(_.toString).mkString

Here, since we are converting to binary, the idxToChar function is just the toString() function. Indeed, the number 1 will map to the character "1".

So, to encode an integer to Base58, we just need the right idxToChar function. Luckily, given an alphabet, this function is easy to implement in Scala.

val idxToChar = Map(alphabet.zipWithIndex.map(_.swap):_ *)

Finally, we need to convert the array of bytes to an integer. This can be done, once again, using the same equation (1), only this time the digits are the bytes in the array of bytes and the base is 256. Mathematically, for an array $a$ of size $m + 1$ we have: $$ \begin{equation} n = \sum_{i = 0}^{m} a_{m - i} \cdot 256^{m-i} \end{equation} $$ Encoding this formula in Scala gives us:

def byteArrayToBigInt(a: Array[Byte]) = // do not use this function, it has a problem!
  a.reverse.zipWithIndex.foldLeft(BigInt(0)) { (acc, b) =>
    val (digit, idx) = b
    acc + BigInt(256).pow(idx) * BigInt(digit)
  }

The implementation above seems correct and gives the right result for some test vectors, but it has an issue. Bytes in Scala are signed, and the algorithm (as shown in the mathematical formulas) considers digits which are always unsigned!

You can see the problem by running the following code:

base58Convert(Array.fill(1)(-59.toByte))

Indeed, the baseConvert algorithm will return a negative remainder, that does not appear in the idxToChar map. A simple solution to this is to use Java’s java.lang.Byte.toUnsignedInt function. The new function is now:

def byteArrayToBigInt(a: Array[Byte]) =
  a.reverse.zipWithIndex.foldLeft(BigInt(0)) { (acc, b) =>
    val (digit, idx) = b
    acc + BigInt(256).pow(idx) * BigInt(java.lang.Byte.toUnsignedInt(digit))
  }

Implementing Base58 encoding and decoding

The algorithm we have defined is still not Base58 encoding. Indeed, it returns an empty string for an array of bytes containing only zeroes. It also does not consider any leading zeroes in the input array. The Base58 encoding deals with this by just encoding those bytes directly into the Base58 alphabet. Thus, we just need to take all bytes that are equal to zero and encode them. The final version of the encodeBase58 function is:

def encodeBase58(a: Array[Byte]) =
  a.takeWhile(_ == 0).map(x => idxToChar(x)).mkString ++ base58Convert(a)

Decoding the string is the opposite process:

  • We decode the zeroes from the beginning.
  • We decode Base58 encoded number to base 256 and concatenate it with the zeroes (if any)

So, the first thing that we need is a base58ToBigInt function. By equation (1) this is the same as byteArrayToBigInt but with Base58.

def base58ToBigInt(a: Array[Byte]) =
  a.reverse.zipWithIndex.foldLeft(BigInt(0)) { (acc, b) =>
    val (digit, idx) = b
    acc + BigInt(58).pow(idx) * BigInt(java.lang.Byte.toUnsignedInt(digit))
  }

Then, we need a base256Convert function, which is the same as before but with the base changed:

def base256Convert(a: Array[Byte]) =
  baseConvert(base58ToBigInt(a), 256).map(_.toByte)

Finally, the decodeBase58 function needs something to go from the alphabet to indexes. We do this by:

val charToIdx = Map(alphabetBase58.zipWithIndex: _*)

We can then write the decodeBase58 as:

def decodeBase58(s: String) =
  s.map(charToIdx).takeWhile(_ == 0).map(_.toByte).toArray ++ // recover leading zeroes
    base256Convert(s.map(charToIdx).dropWhile(_ == 0).map(_.toByte).toArray)

The first part just recovers the leading zeroes and the second part decodes back to base 256.

Optimizing the Algorithms

The algorithms presented work, but there is room for improvement. Indeed, the conversion to BigInt can be done automatically since Scala’s BigInt implementation already provides a constructor to build a BigInt from a Array[Byte]. The only problem with this is that BigInt assumes a signed representation, and the Base58 assumes an unsigned representation.

However, this is easy to solve. The BigInt representation is in two’s complement. In practical terms, this means that if the highest order bit is one, then the number is understood to be negative, otherwise, the number is understood to be positive (what we need). We can then rewrite the byteArrayToBigInt as follows:

def byteArrayToBigIntOptimized(a: Array[Byte]) =
  BigInt(0.toByte +: a)

Adding a 0 byte at the beginning forces the conversion to yield a positive number (because of two’s complement encoding). In the same way, we can redefine base256Convert:

def base256ConvertOptimized(a: Array[Byte]) =
  base58ToBigInt(a).toByteArray.dropWhile(_ == 0)

The toByteArray from BigInt will do the heavy lifting for us. The only gotcha is that we know that our encoding can never have leading zeros (because we encode them directly), so we need to do the dropWhile operation at the end.

Finally, the baseConvert operation is not stack-safe. We can use a nice unfold to create a stack-safe version of baseConvert:

def baseConvertOptimized(n: BigInt, b: Int): Array[Int] =
  LazyList
    .unfold(n)(n => if (n == 0) None else Some((n /% b).swap))
    .map(_.toInt)
    .reverse
    .toArray

Final Version of the Algorithm

Putting all optimizations together, we can define the Base58 encoding in one Scala statement:

def encodeToBase58(array: Array[Byte]): String =
  (LazyList.fill(array.takeWhile(_ == 0).length)(1.toByte) ++ LazyList
    .unfold(
      BigInt(0.toByte +: array)
    )(n => if (n == 0) None else Some((n /% 58).swap))
    .map(_.toInt)
    .reverse
    .map(x => idxToChar(x))).mkString

We can do the same in two statements (removing decoding error handling for clarity) with the decoding:

def decodeFromBase58(b58: String): Array[Byte] = {
  val zeroCount = b58.takeWhile(_ == '1').length
  Array.fill(zeroCount)(0.toByte) ++
    b58
      .drop(zeroCount)
      .map(charToIdx)
      .toList
      .foldLeft(BigInt(0))((acc, x) => acc * 58 + x)
      .toByteArray
      .dropWhile(_ == 0.toByte)
}

The Bitcoin Version

The encoding we just saw is the one that is used to encode Bitcoin addresses. However, Bitcoin’s addresses have an extra step. They perform a double SHA-256 hash on the payload and concatenate the first 4 bytes of it to the string to be encoded as a checksum. The Scala code (assuming we have a dependency that provides a SHA-256 implementation) for this would be.

def encodeToBase58Check(payload: Array[Byte]): String = {
  val checksum =
    sha256.hash(sha256.hash(payload).value).value.take(4)
  encodeToBase58(payload.concat(checksum))
}

If you have read until here and liked this content please consider subscribing to my mailing list. I write regularly about Web3 and Scala.

comments powered by Disqus