Snowflake IDs
distributed-systems
Snowflake IDs are an ID format encoded as a 64-bit signed integer.
The basic algorithm is:
- At startup, generate or retrieve the 10-bit instance ID
- When generating the ID, get the current unix timestamp and take the first 41 bits
- If another ID has been generated in the same millisecond, increment the sequence to differentiate the IDs
- If the sequence overflows, wait 1ms and try again
- Take the timestamp, instance ID, and sequence parts, and smush them together as shown in the code below.
```go
// the timestamp fills the high 41 bits
timestampPart := uint64(currentTimestamp) << (instanceIDBits + sequenceBits)
// the instance ID sits in the middle 10 bits
workerIDPart := uint64(instanceID) << sequenceBits
// the sequence takes the low 12 bits
sequencePart := uint64(nextSequence)
id := int64(timestampPart | workerIDPart | sequencePart)
```
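Going the other way, an ID can be unpacked with shifts and masks. A quick sketch, assuming the same bit widths as above:

```go
timestamp := id >> (instanceIDBits + sequenceBits)           // high 41 bits
instanceID := (id >> sequenceBits) & (1<<instanceIDBits - 1) // middle 10 bits
sequence := id & (1<<sequenceBits - 1)                       // low 12 bits
```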
My understanding of the history behind the development of Snowflake is that Twitter needed an ID format that would work with a sharded database, i.e. not auto-increment, but was backwards compatible with auto-increment, i.e. a 64-bit integer. Without those requirements, there isn’t really a great reason to choose it over something like UUIDs which don’t have any of the operational complexity I’ll get into later.
Contention
When generating IDs on the same instance, in the same millisecond, we increment a sequence counter to prevent generating the same ID. This sequence is 12 bits, so we can generate 4096 (2¹²) IDs per millisecond, or one ID every 244ns. That’s roughly 4,000,000 IDs per second. Based on my benchmarking, my implementation doesn’t achieve even close to that, so real-world contention will be exceedingly rare. From my tests and benchmarking, it looks like the main source of slowdown, ignoring inefficiencies in the code, is the clock moving backwards, which forces us to artificially slow the ID generation.
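For illustration, the overflow case boils down to something like this hypothetical helper (the name is mine, not the library’s):

```go
import "time"

// waitForNextMillisecond blocks until the wall clock moves past last,
// which is what the generator does once all 4096 sequence values in a
// millisecond are used up.
func waitForNextMillisecond(last int64) int64 {
	now := time.Now().UnixMilli()
	for now <= last {
		time.Sleep(time.Millisecond)
		now = time.Now().UnixMilli()
	}
	return now
}
```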
Machine ID generation
A more real-world problem is collisions between instance IDs. As there is no communication between generating instances, the only thing stopping two of them generating the same IDs is the instance ID itself. The instance ID is 10 bits, and we can calculate how many instances we could have in a cluster before collisions become likely using the Birthday Problem¹:
| Servers | Collision probability |
|---------|-----------------------|
| 5       | 0.9%                  |
| 10      | 4%                    |
| 20      | 17%                   |
| 30      | 35%                   |
| 40      | 54%                   |
| 50      | 70%                   |
So, even with a small cluster of 40 instances, it becomes likely that we’ll have two instances with the same ID. A better solution would be to assign instances IDs from a central source on startup so that we can prevent these clashes. I suspect that this is what Twitter does. In this case, you could have up to 1024 instances, which is much more tenable.
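The table falls out of the standard birthday-problem approximation, p ≈ 1 − e^(−n(n−1)/2048). A small sketch that reproduces it to within a percent:

```go
package main

import (
	"fmt"
	"math"
)

func main() {
	const buckets = 1 << 10 // 1024 possible 10-bit instance IDs
	for _, n := range []float64{5, 10, 20, 30, 40, 50} {
		p := 1 - math.Exp(-n*(n-1)/(2*buckets))
		fmt.Printf("%2.0f servers: %4.1f%% chance of a collision\n", n, p*100)
	}
}
```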
Basically, none of this is a problem for the project I implemented this for, so it’s technically viable to use Snowflake IDs.
Implementation
I’m using Snowflake IDs in a personal project because (1) I think they look nicer than UUIDs in the URL, and (2) they were interesting to implement. Also, I’m probably only ever going to run this on one instance. This was my first proper usage of Go’s benchmarking capabilities. Go’s recent release of testing/synctest also looks like it’ll be really interesting to use here.
To implement the algorithm, I mostly used Twitter’s old open-source implementation² as a guide. It’s written in Scala, which I’m not at all familiar with, but it seems not to require considering concurrent ID generation, which greatly simplifies it compared to my implementation.
As part of building this, I learnt about the CAS operation: Compare and Swap. This is where you atomically assign a value if, and only if, the old value is what you expect it to be:
```go
g.lastTimestampAndSequence.CompareAndSwap(lastTimestampAndSequence, currentTimestampAndSequence)
```
This will update `g.lastTimestampAndSequence` to `currentTimestampAndSequence`, and return true, if its current value is still equal to `lastTimestampAndSequence`. The reason that I’ve got both the timestamp and sequence in here is that we need to update them both atomically, so it’s important to pack them into the same value that can be atomically swapped. My initial implementation did:
```go
g.lastTimestamp.CompareAndSwap(lastTimestamp, currentTimestamp) &&
	g.lastSequence.CompareAndSwap(lastSequence, currentSequence)
```
It is possible that two goroutines generating an ID could race, with one setting the timestamp and the other setting the sequence. This would leave the ID generator’s state corrupted.
What I like about my implementation is that it just lets all the ID generators race: whoever wins returns the ID they generated, and the others have to try again. I’m not sure exactly how performant this would be under high load, because this approach could lead to high tail latency due to generators continually racing.
Another interesting implementation detail is that, because we’re using unix timestamps, we have to account for the clock moving backwards. We can’t use monotonic time because we need real time to give sufficient entropy (is this the right term?) to IDs, as monotonic time is just the time since the server was booted. Detecting this is easy because we already keep track of the timestamp from when we last generated an ID: if our current timestamp is less than that, then the clock has moved backwards.
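Putting these pieces together, here’s a minimal sketch of a generation loop covering the CAS retry, the sequence overflow, and the backwards-moving clock. The constants and names are illustrative rather than lifted from my actual code:

```go
import (
	"sync/atomic"
	"time"
)

const (
	instanceIDBits = 10
	sequenceBits   = 12
	maxSequence    = 1<<sequenceBits - 1
)

type Generator struct {
	instanceID uint64
	// timestamp in the high bits, sequence in the low bits, so both
	// can be read and swapped in a single atomic operation
	lastTimestampAndSequence atomic.Uint64
}

func (g *Generator) ID() int64 {
	for {
		last := g.lastTimestampAndSequence.Load()
		lastTimestamp := last >> sequenceBits
		sequence := last & maxSequence

		now := uint64(time.Now().UnixMilli())
		switch {
		case now < lastTimestamp:
			// the clock has moved backwards: wait it out
			time.Sleep(time.Duration(lastTimestamp-now) * time.Millisecond)
			continue
		case now == lastTimestamp:
			if sequence == maxSequence {
				// sequence exhausted for this millisecond
				time.Sleep(time.Millisecond)
				continue
			}
			sequence++
		default:
			sequence = 0
		}

		// whoever wins the swap returns their ID; losers loop and retry
		if g.lastTimestampAndSequence.CompareAndSwap(last, now<<sequenceBits|sequence) {
			return int64(now<<(instanceIDBits+sequenceBits) | g.instanceID<<sequenceBits | sequence)
		}
	}
}
```

The sleeps inside the loop are where the artificial slowdown mentioned earlier comes from.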
Benchmarking
Twitter’s repository states two performance requirements:
- minimum 10k IDs per second per process
- response rate 2ms (plus network latency)
As part of building this library, I added some benchmarks to make sure I fulfilled those. Writing benchmarks in Go is super simple; for this, all I had to do was:
```go
func BenchmarkSnowflakeID(b *testing.B) {
	generator, err := snowflake.NewGenerator()
	if err != nil {
		b.Fatal(err)
	}
	for i := 0; i < b.N; i++ {
		var id int64
		for j := 0; j < 10_000; j++ {
			id = generator.ID()
		}
		if id == 0 {
			b.Fatal("Never generated an ID")
		}
	}
}
```
Running it with:

```
go test -v -bench=BenchmarkSnowflakeID -run=BenchmarkSnowflakeID ./snowflake/
```

gives,

```
BenchmarkSnowflakeID-8    91    12953329 ns/op
```
This means it took, on average, about 13ms to generate 10,000 IDs, comfortably clearing the requirement of 10,000 IDs per second. Extrapolating, we should be able to generate roughly 770,000 IDs per second.
A more interesting test is how we perform when generating IDs concurrently, as this will force out any contention issues. This benchmark is pretty simple as well:
```go
func BenchmarkSnowflakeIDConcurrent(b *testing.B) {
	generator, err := snowflake.NewGenerator()
	if err != nil {
		b.Fatal(err)
	}
	for i := 0; i < b.N; i++ {
		var id atomic.Int64 // stored atomically so the racing goroutines don't data race
		var wg sync.WaitGroup
		for j := 0; j < 10_000; j++ {
			wg.Add(1)
			go func() {
				id.Store(generator.ID())
				wg.Done()
			}()
		}
		wg.Wait()
		if id.Load() == 0 {
			b.Fatal("Never generated an ID")
		}
	}
}
```
This gives:

```
BenchmarkSnowflakeIDConcurrent-8    44    24364501 ns/op
```
So, on average, it takes about 24ms to generate 10,000 IDs. It seems like contention definitely plays a role. However, roughly 400,000 IDs per second is still comfortably within our requirements.
The other performance requirement, a 2ms response rate, is also satisfied: ~24ms to generate 10,000 IDs works out to about 0.0024ms (2.4µs) per ID. Not bad if you ask me. Your database queries will be taking orders of magnitude longer than that, so ID generation won’t be the bottleneck.