Converting a 64-bit hex string to INT64 with BigQuery

I recently had the need to convert a 64-bit hex string to an INT64 value in a BigQuery user defined function (UDF). I’ll post more about why later but for now lets just say you have a sixteen character hex string such as `7FFFFFFFFFFFFFFF` and need it as an INT64 value. Easy you say, just prepend “0x” and do `CAST("0x7FFFFFFFFFFFFFFF" AS INT64)`.
But what if you have something just a bit bigger, like `8000000000000000`? Let’s try that:

`SELECT CAST("0x8000000000000000" AS INT64);` and we get the error:

`Could not cast literal "0x8000000000000000" to type INT64`

The problem is that INT64 is a signed value in BigQuery (and other Google cloud databases like Spanner) with values ranging from `-9,223,372,036,854,775,808` to `9,223,372,036,854,775,807`.

So we have to convert from the hex using two’s compliment. I recently wrote some code in Python to do this:

``````def uint64_to_signedInt64(v):
MAX_INT64 = 2**63-1

if v > MAX_INT64:
return (v & MAX_INT64) - (MAX_INT64 + 1)
else:
return v
``````

But since we can’t create an INT64 from the hex string we break it into two 32-bit values. The most significant is the first 8 characters of the hex string (remember each character in hex is 4-bits or a “nibble”). The least significant is the last 8 characters. `v & MAX_INT64` is emulated by taking the most significant 32-bits and `AND`-ing with `0x7fffffff` and shifting to the left 32 times then `OR`-ing with the least significant 32-bits. This gives us a pure-SQL UDF like so:

``````CREATE TEMP FUNCTION
hex64ToInt64(hex STRING)
RETURNS INT64 AS (
IF(hex < "8000000000000000",
cast(concat("0x", hex) AS INT64),
(SELECT (((ms32 & 0x7fffffff) << 32) | ls32) - 0x7fffffffffffffff - 1
FROM (SELECT cast(concat("0x", substr(hex, 1, 8)) AS INT64) AS ms32,
cast(concat("0x", substr(hex, 9, 8)) AS INT64) AS ls32))));
``````

Now we can calculate `select hex64ToInt64("8000000000000000")` which is the most negative signed INT64 number `-9223372036854775808` and `select hex64ToInt64("ffffffffffffffff")` is the least negative number `-1`.

I’ll soon show you how I’ve used this function as part of a BigQuery emulation of Postgres’ HLL extension from Citus

Written on March 27, 2019