
Registry you can actually query

[Screenshot of reg with logs overlay]

Docker registries like docker.io and quay.io are public endpoints for browsing and fetching OCI images. You can also run a private registry yourself! The registry API is standardized here, and the most ubiquitous open-source implementation is available here. Registries can work with myriad data stores, but in practice most deployments run on a comfy and reliable S3-compatible object store. It's cheap, it's persistent, and it scales to infinity (if you're willing to renegotiate the "it's cheap" claim), but there's one major flaw. Querying arbitrary information about the stored images (e.g. "list all image repositories in this registry, please") is either:

  • painfully slow; or
  • painfully unreliable; or
  • downright disabled.

Wouldn't the world be a tad more beautiful if there were a registry you could actually query? Let's implement one!

Core API

The OCI distribution specification is based on HTTP. It lists all endpoints that a conformant implementation must support, along with their legal methods and status codes for success and failure:

№        Method    Path                                                           Success  Error
end-1    GET       /v2/                                                           200      404/401
end-2    GET/HEAD  /v2/<name>/blobs/<digest>                                      200      404
end-3    GET/HEAD  /v2/<name>/manifests/<reference>                               200      404
end-4a   POST      /v2/<name>/blobs/uploads/                                      202      404
end-4b   POST      /v2/<name>/blobs/uploads/?digest=<digest>                      201/202  404/400
end-5    PATCH     /v2/<name>/blobs/uploads/<reference>                           202      404/416
end-6    PUT       /v2/<name>/blobs/uploads/<reference>?digest=<digest>           201      404/400
end-7    PUT       /v2/<name>/manifests/<reference>                               201      404/413
end-8a   GET       /v2/<name>/tags/list                                           200      404
end-8b   GET       /v2/<name>/tags/list?n=<integer>&last=<tagname>                200      404
end-9    DELETE    /v2/<name>/manifests/<reference>                               202      404/400/405
end-10   DELETE    /v2/<name>/blobs/<digest>                                      202      404/400/405
end-11   POST      /v2/<name>/blobs/uploads/?mount=<digest>&from=<other_name>    201/202  404
end-12a  GET       /v2/<name>/referrers/<digest>                                  200      404/400
end-12b  GET       /v2/<name>/referrers/<digest>?artifactType=<artifactType>     200      404/400
end-13   GET       /v2/<name>/blobs/uploads/<reference>                           204      404

The REST paths are rather self-descriptive, modulo the cryptic "referrers" with an even more cryptic definition: "a list of manifests with a subject relationship to a specified digest." Let's forget about these for now. The most important part is storing layers (a.k.a. blobs) and all kinds of metadata about images, their layers, versions, tags, etc. I described the OCI image format in more detail in this post.
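
Every endpoint speaks plain JSON over HTTP, so no special tooling is needed to poke at a registry. A minimal Go sketch of calling end-8a, assuming an anonymous registry; the repository name and port are placeholders matching the demo later in this post:

// a sketch: list a repository's tags via end-8a
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// tagList mirrors the response body defined by the distribution spec
type tagList struct {
	Name string   `json:"name"`
	Tags []string `json:"tags"`
}

func main() {
	resp, err := http.Get("http://localhost:2137/v2/alpine/tags/list")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var tl tagList
	if err := json.NewDecoder(resp.Body).Decode(&tl); err != nil {
		panic(err)
	}
	fmt.Println(tl.Name, tl.Tags) // prints the repository name and its tags
}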

How does the official implementation handle all the data management? It can be summed up as "let's squeeze absolutely everything into S3." That applies both to data (OCI layers) and metadata (OCI manifests, tags, uploads, etc.). It also makes the registry service extremely lightweight, as it doesn't keep any data itself; it really is barely more than syntactic sugar over the S3 API. The standard also allows returning an HTTP redirect response, which can carry a presigned S3 URL so that users fetch data from S3 directly. The storage layout is straightforward – data is kept in one prefix, metadata in another.

$ aws s3 ls my-fancy-bucket/docker/registry/v2/
                           PRE blobs/          ← data goes here
                           PRE repositories/   ← metadata goes here

Simple and brilliant, because everything is stored in a single place with high reliability guarantees. It's also what makes querying so bothersome: most interesting inquiries need to scan a huge S3 bucket. That's inefficient in two important ways: it's slow and it's expensive.
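
The redirect allowance mentioned above pairs neatly with presigned URLs. A minimal sketch of end-2 taking that route, assuming aws-sdk-go-v2; the handler struct, its s3/bucket fields, and the pathParams helper are hypothetical:

// a sketch: answer a blob GET with a redirect to a presigned S3 URL,
// so the client downloads layer data from S3 directly
// imports: fmt, net/http, strings, time,
//          github.com/aws/aws-sdk-go-v2/aws, .../service/s3
func (h *handler) getBlob(w http.ResponseWriter, r *http.Request) {
	_, digest := pathParams(r) // hypothetical: extracts <name> and <digest>
	hex := strings.TrimPrefix(digest, "sha256:")
	key := fmt.Sprintf("docker/registry/v2/blobs/sha256/%s/%s/data", hex[:2], hex)

	presigner := s3.NewPresignClient(h.s3)
	req, err := presigner.PresignGetObject(r.Context(), &s3.GetObjectInput{
		Bucket: aws.String(h.bucket),
		Key:    aws.String(key),
	}, s3.WithPresignExpires(5*time.Minute))
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	http.Redirect(w, r, req.URL, http.StatusTemporaryRedirect)
}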

What if we keep metadata somewhere else? Enter SQLite.

Metadata store

The OCI distribution format translates to a simple SQL schema, which can look roughly like this:

$ sqlite3 registry.db '.schema'
CREATE TABLE tags (
    repository TEXT NOT NULL,
    name TEXT NOT NULL,
    PRIMARY KEY(repository, name)
);
CREATE TABLE manifests (
    tag_rowid INTEGER NOT NULL,
    manifest_json TEXT NOT NULL,
    PRIMARY KEY(tag_rowid)
);
CREATE TABLE manifest_layers (
    manifest_rowid INTEGER NOT NULL,
    layer_digest TEXT NOT NULL,
    layer_index INTEGER NOT NULL,
    PRIMARY KEY(manifest_rowid, layer_digest, layer_index)
);
CREATE TABLE layers (
    digest TEXT PRIMARY KEY,
    media_type TEXT NOT NULL,
    size INTEGER NOT NULL
);
CREATE TABLE upload_sessions (
    upload_id TEXT PRIMARY KEY,
    repository TEXT NOT NULL,
    digest TEXT,
    s3_upload_id TEXT,
    s3_key TEXT NOT NULL,
    total_size INTEGER,
    uploaded_size INTEGER DEFAULT 0
);
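
The upload_sessions table hints at how chunked uploads (end-4a, end-5, end-6) can be mapped onto S3 multipart uploads. A sketch of end-4a opening a session, assuming aws-sdk-go-v2 and github.com/google/uuid; the handler fields are hypothetical and the key layout loosely mirrors the official one:

// a sketch: create an upload session backed by an S3 multipart upload
func (h *handler) startUpload(ctx context.Context, repo string) (string, error) {
	uploadID := uuid.NewString()
	key := fmt.Sprintf("docker/registry/v2/repositories/%s/_uploads/%s/data", repo, uploadID)

	mpu, err := h.s3.CreateMultipartUpload(ctx, &s3.CreateMultipartUploadInput{
		Bucket: aws.String(h.bucket),
		Key:    aws.String(key),
	})
	if err != nil {
		return "", err
	}
	// remember both our session ID and the S3 upload ID for later PATCHes
	_, err = h.db.ExecContext(ctx,
		`INSERT INTO upload_sessions (upload_id, repository, s3_upload_id, s3_key)
		 VALUES (?, ?, ?, ?)`,
		uploadID, repo, *mpu.UploadId, key)
	return uploadID, err
}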

Publishing an image boils down to uploading the layers to an object store and updating the metadata:

  • which manifest do the layers belong to;
  • which repository and image tag does the uploaded manifest describe;
  • etc.

Keeping information in SQLite is easy, because it's just a file. An OCI registry is a distributed system, though, so how can you store everything in a file and pretend it's scalable? A key observation is that we can store metadata in a "write-through" manner: save everything in SQLite, sure, but also store it in S3, just like the original implementation does. You can then treat SQLite as a fast-yet-powerfully-queryable cache.
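
A sketch of that write-through path for tagging a manifest, under the schema above and with hypothetical handler fields. S3 goes first, so a crash can't leave SQLite claiming metadata the durable store never received:

// a sketch: persist the tag link in S3, then mirror it into the SQLite cache;
// inserting the manifest_layers/layers rows is elided for brevity
func (h *handler) putTag(ctx context.Context, repo, tag, digest string, manifest []byte) error {
	linkKey := fmt.Sprintf("docker/registry/v2/repositories/%s/_manifests/tags/%s/current/link", repo, tag)
	if _, err := h.s3.PutObject(ctx, &s3.PutObjectInput{
		Bucket: aws.String(h.bucket),
		Key:    aws.String(linkKey),
		Body:   strings.NewReader(digest),
	}); err != nil {
		return err
	}

	tx, err := h.db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	defer tx.Rollback()
	res, err := tx.ExecContext(ctx,
		`INSERT OR REPLACE INTO tags (repository, name) VALUES (?, ?)`, repo, tag)
	if err != nil {
		return err
	}
	tagRowid, err := res.LastInsertId()
	if err != nil {
		return err
	}
	if _, err := tx.ExecContext(ctx,
		`INSERT OR REPLACE INTO manifests (tag_rowid, manifest_json) VALUES (?, ?)`,
		tagRowid, string(manifest)); err != nil {
		return err
	}
	return tx.Commit()
}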

What if you lose the file or it becomes corrupt?

Bootstrap

First of all, since all the metadata is preserved in S3, the registry can scan the whole bucket and recreate the metadata structure in SQLite. That may take a few centuries for a large registry, but it's only done sparingly – ideally once. The ability to bootstrap from an existing object store bucket also makes the registry a "drop-in replacement™," as tech companies love to describe their products.
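
A sketch of that scan, assuming aws-sdk-go-v2. Tag links are recognizable purely from their key shape, so rebuilding the tags table is string surgery (nested repository names are glossed over here):

// a sketch: rebuild the tags table by walking the metadata prefix in S3;
// tag links look like .../repositories/<name>/_manifests/tags/<tag>/current/link
// imports: context, database/sql, strings,
//          github.com/aws/aws-sdk-go-v2/aws, .../service/s3
func bootstrap(ctx context.Context, client *s3.Client, db *sql.DB, bucket string) error {
	const prefix = "docker/registry/v2/repositories/"
	p := s3.NewListObjectsV2Paginator(client, &s3.ListObjectsV2Input{
		Bucket: aws.String(bucket),
		Prefix: aws.String(prefix),
	})
	for p.HasMorePages() {
		page, err := p.NextPage(ctx)
		if err != nil {
			return err
		}
		for _, obj := range page.Contents {
			rest := strings.TrimPrefix(*obj.Key, prefix)
			name, tagPart, ok := strings.Cut(rest, "/_manifests/tags/")
			if !ok || !strings.HasSuffix(tagPart, "/current/link") {
				continue
			}
			tag := strings.TrimSuffix(tagPart, "/current/link")
			if _, err := db.ExecContext(ctx,
				`INSERT OR IGNORE INTO tags (repository, name) VALUES (?, ?)`,
				name, tag); err != nil {
				return err
			}
		}
	}
	return nil
}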

Turso

Now that an SQLite-as-a-Service company (Turso) has entered the market, we can offload the complicated backup and replication bits to them, while still having a simple local file to manage. The embedded replicas feature looks relevant.

Backup backup ideas

An SQLite database is a file, so if you're somehow not enticed by Turso (shame on you), other backup strategies exist: Litestream, or plain rsync with cron – ideally pointed at a consistent snapshot taken with sqlite3's .backup command or VACUUM INTO, since copying a live database file mid-write can hand you a corrupt replica.

Metadata insights

With all metadata kept locally, we can unleash SQLite's full potential and get all kinds of information from our new store.

All repositories in the registry, in JSON (one entry per tag, hence the repeats – add a distinct if they bother you):

> select json_group_array(repository) from tags;
["abc","alpine","hello-world","podman-hello","etcd","node-ex
porter","distroless-static","distroless-base","pause","cored
ns","containerbase","nginx","redis","alpine","alpine"] 

Repositories with the most tags:

> select repository, count(name) tags from tags group by repository
    order by tags desc limit 3;
repository     tags
-------------  ----
alpine         3   
abc            1   
containerbase  1 

Largest layers with matching image tags:

> select repository, tags.name tag, (layers.size/1024/1024) || ' MiB' size from layers
    join manifest_layers on digest = layer_digest
    join manifests on manifest_rowid = manifests.rowid
    join tags on tag_rowid = tags.rowid
    order by layers.size desc limit 8;
repository     tag       size
-------------  --------  --------
abc            latest    218 MiB                                    
abc            latest    33 MiB                    
containerbase  latest    28 MiB                               
abc            latest    22 MiB                    
nginx          alpine    16 MiB                    
coredns        v1.10.1   15 MiB                    
redis          7-alpine  11 MiB                    
node-exporter  v1.6.1    9 MiB                     

Layers most frequently reused across manifests:

> select substr(layer_digest, 8, 8) digest, count(*) reused from manifest_layers
    group by layer_digest order by reused desc limit 3;
digest    reused
--------  ------
e33bce57  2     
b6824ed7  2     
9ef7d74b  2
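
Once the metadata is relational, new questions cost one query rather than a bucket scan. For instance, approximate storage attributed to each repository (shared layers are counted once per tag, so read it as an upper bound):

> select repository, sum(layers.size)/1024/1024 || ' MiB' total from tags
    join manifests on tag_rowid = tags.rowid
    join manifest_layers on manifest_rowid = manifests.rowid
    join layers on digest = layer_digest
    group by repository order by sum(layers.size) desc;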

reg

 $$$$$$\   $$$$$$\   $$$$$$\  
$$  __$$\ $$  __$$\ $$  __$$\ 
$$ |  \__|$$$$$$$$ |$$ /  $$ |
$$ |      $$   ____|$$ |  $$ |
$$ |      \$$$$$$$\ \$$$$$$$ |
\__|       \_______| \____$$ |
                    $$\   $$ |
                    \$$$$$$  |
                     \______/

I'm a huge fan of concise naming, so this project is codenamed reg and comes with the obligatory low-effort 3D ASCII art. Contributions are welcome!

It is still very much a work-in-progress, but it can already:

  • Bootstrap itself from S3.
  • Properly process uploads (tested locally with MinIO).
  • Serve image pulls.

In other words, it's usable enough for local development.

Demo

Pushing an image to a local registry

$ skopeo copy --dest-no-creds --dest-tls-verify=false docker://alpine:3.23 docker://localhost:2137/alpine:3.23
Getting image source signatures
Copying blob 014e56e61396 skipped: already exists  
Copying config 7acffee03f done   | 
Writing manifest to image destination

Logs from a successful push

$ go build ./cmd/reg && AWS_REGION=us-east-1 AWS_PROFILE=minio ./reg serve -b buck
 $$$$$$\   $$$$$$\   $$$$$$\  
$$  __$$\ $$  __$$\ $$  __$$\ 
$$ |  \__|$$$$$$$$ |$$ /  $$ |
$$ |      $$   ____|$$ |  $$ |
$$ |      \$$$$$$$\ \$$$$$$$ |
\__|       \_______| \____$$ |
                    $$\   $$ |
                    \$$$$$$  |
                     \______/ 
Server starting on :2137 with bucket 'buck'...
time=2025-12-17T12:57:50.270+01:00 level=DEBUG msg=getBlob name=alpine blobKey=docker/registry/v2/blobs/sha256/01/014e56e613968f73cce0858124ca5fbc601d7888099969a4eea69f31dcd71a53/data method=HEAD
time=2025-12-17T12:57:50.516+01:00 level=DEBUG msg=getBlob name=alpine blobKey=docker/registry/v2/blobs/sha256/7a/7acffee03fe864cd6b88219a1028855d6c912e7cf6fac633aa4307529fd0cc08/data method=HEAD
time=2025-12-17T12:57:50.540+01:00 level=DEBUG msg=putManifest name=alpine reference=3.23
time=2025-12-17T12:57:50.540+01:00 level=DEBUG msg="putting manifest blob" blobKey=docker/registry/v2/blobs/sha256/a1/a107a3c031732299dd9dd607bb13787834db2de38cfa13f1993b7105e4814c60/data
time=2025-12-17T12:57:50.546+01:00 level=DEBUG msg="putting manifest meta" metaKey=docker/registry/v2/repositories/alpine/_manifests/tags/3.23/current/link
time=2025-12-17T12:57:50.550+01:00 level=DEBUG msg="putting manifest index meta" metaIndexKey=docker/registry/v2/repositories/alpine/_manifests/tags/3.23/index/sha256/a107a3c031732299dd9dd607bb13787834db2de38cfa13f1993b7105e4814c60/link
time=2025-12-17T12:57:50.553+01:00 level=DEBUG msg="putting manifest revisions meta" revisionsKey=docker/registry/v2/repositories/alpine/_manifests/revisions/sha256/a107a3c031732299dd9dd607bb13787834db2de38cfa13f1993b7105e4814c60/link
Put manifest for alpine with reference 3.23

Pulling and running a container

$ podman run --tls-verify=0 docker://localhost:2137/alpine:3.19 which busybox
Trying to pull localhost:2137/alpine:3.19...
Getting image source signatures
Copying blob 17a39c0ba978 done   | 
Copying config 83b2b6703a done   | 
Writing manifest to image destination
/bin/busybox

Logs from a successful pull

time=2025-12-17T13:04:33.652+01:00 level=DEBUG msg="Retrieved manifest" repo=alpine tag=3.19
time=2025-12-17T13:04:33.653+01:00 level=DEBUG msg=getBlob name=alpine blobKey=docker/registry/v2/blobs/sha256/83/83b2b6703a620bf2e001ab57f7adc414d891787b3c59859b1b62909e48dd2242/data method=GET
time=2025-12-17T13:04:33.661+01:00 level=DEBUG msg=getBlob name=alpine blobKey=docker/registry/v2/blobs/sha256/17/17a39c0ba978cc27001e9c56a480f98106e1ab74bd56eb302f9fd4cf758ea43f/data method=GET
time=2025-12-17T13:05:39.019+01:00 level=DEBUG msg="Retrieved manifest" repo=alpine tag=3.19
time=2025-12-17T13:05:39.021+01:00 level=DEBUG msg=getBlob name=alpine blobKey=docker/registry/v2/blobs/sha256/83/83b2b6703a620bf2e001ab57f7adc414d891787b3c59859b1b62909e48dd2242/data method=GET
time=2025-12-17T13:05:39.027+01:00 level=DEBUG msg=getBlob name=alpine blobKey=docker/registry/v2/blobs/sha256/17/17a39c0ba978cc27001e9c56a480f98106e1ab74bd56eb302f9fd4cf758ea43f/data method=GET

👋

Give it a try! Especially if you also feel like reducing the number of missing features. You might have noticed the glaring --tls-verify flags passed to skopeo and podman above, owed to reg's inability to speak HTTPS.
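
If that itch needs scratching, Go's standard library already does the heavy lifting; a hypothetical sketch of what a pair of --tls-cert/--tls-key flags could wire up:

// hypothetical: terminate TLS in reg itself when cert and key paths are given
if certFile != "" && keyFile != "" {
	log.Fatal(http.ListenAndServeTLS(":2137", certFile, keyFile, mux))
}
log.Fatal(http.ListenAndServe(":2137", mux))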