redesigning the pkg tag #4180

## Background 

Every package built with `linuxkit pkg build` has an automatically generated digest as its tag, which can also be shown via `lkt pkg show-tag`. This digest is used to determine whether a specific state of a package is available in the local cache, is available in the registry, or needs a new build.

The tag can be overridden on user request.

The generated tag has limits, does not handle all use cases, and can be misleading.

What should the tag be replaced with?

## Current state

When linuxkit builds a package with `linuxkit pkg build`, it automatically generates a tag from a git hash. By default, that tag is the tree hash that `git ls-tree` reports for the package directory, which is a 40-character SHA-1 in the example below.

For example, for `pkg/init` at the most recent commit to `master` as of this writing:

```sh
$ lkt pkg show-tag pkg/init 
linuxkit/init:680da6e6f79bb8236a095147d532cd2160e23c9f

$ git ls-tree --full-tree HEAD -- ./pkg/init
040000 tree 680da6e6f79bb8236a095147d532cd2160e23c9f    pkg/init
```

In addition, if there are uncommitted changes, i.e. the git tree for that directory is dirty, it appends `-dirty-` and a short digest of the file contents:

```sh
$ touch pkg/init/foo
$ lkt pkg show-tag pkg/init
linuxkit/init:680da6e6f79bb8236a095147d532cd2160e23c9f-dirty-35f1311
```

The appended digest is computed by listing all of the files in the tree and taking a sha256 digest of them; the tag carries only a short prefix of it, as in the example above.
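As a rough illustration of that description, the suffix could be approximated as follows. This is a sketch, not linuxkit's actual code: the helper name `dirty_suffix`, the use of `sha256sum`, and the 7-character truncation (inferred from the example output above) are all assumptions.

```sh
# Hypothetical helper: short digest over every file under a directory.
# Not linuxkit's implementation; file names containing newlines are not handled.
dirty_suffix() {
  dir=$1
  (
    cd "$dir" || exit 1
    # Sort the listing so the digest does not depend on traversal order,
    # hash each file, then hash the combined list of (hash, path) lines.
    find . -type f -print | LC_ALL=C sort | while IFS= read -r f; do
      sha256sum "$f"
    done | sha256sum | cut -c1-7
  )
}
```

Adding, removing, renaming, or editing any file under the directory changes the result, whether or not git knows about the file.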

## What is missing

The generated tag does not account for the following:

* build-arg-files that are outside the directory, e.g. `lkt pkg build --build-arg-file /tmp/foo ./pkg/init`.
* files in the directory that are not committed to git. Arguably this case can be ignored, as someone is choosing explicitly to avoid git.
* contents that are not determinable, e.g. `ADD https://example.com/foo`. There is no way to know that the contents of that URL have changed.
* dynamically generated build args, e.g. those related to the platform, or the special `REL_*` linuxkit ones.

## Purpose of the digest tag

The digest tag has two purposes. 

First, and primarily, it serves as a way to check whether anything has changed such that a rebuild is needed. All of the things listed as "missing" above fall into that category: something has changed, yet `lkt pkg build` cannot detect it.

Second, it has some element of provenance: given an artifact (an OCI image), can I get to the source? This assumes that, given an `ls-tree` output digest, you can find that exact state again, which is somewhat questionable. However, this part is secondary, because every `lkt pkg build` also adds a label to the image with the git commit and repository that generated the image. This should be enough for provenance; if not, it should be fixed here. This leaves just the first issue to be resolved.

@justincormack uses the terms "input hash" and "output hash" for these. 

In terms of the first, a key goal is to be able to determine the tag of the input tree without rebuilding or even calling buildkit to rebuild it. Solely based on the input, we should be able to:

1. Determine the value of the tag/hash/identifier
2. Use that identifier to determine if it exists in the cache or registry, and therefore if it needs to be rebuilt
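The second step could look roughly like the sketch below. The helper name `image_source` is hypothetical, and `docker image inspect` / `docker manifest inspect` are used only as stand-ins for linuxkit's own cache and registry lookups, which this issue does not describe in detail.

```sh
# Hypothetical helper: given an image reference whose tag was computed purely
# from the inputs, report where the image can come from.
# docker is a stand-in here; linuxkit has its own cache and registry checks.
image_source() {
  ref=$1   # e.g. the output of `lkt pkg show-tag pkg/init`
  if docker image inspect "$ref" >/dev/null 2>&1; then
    echo "local"      # already available locally, nothing to do
  elif docker manifest inspect "$ref" >/dev/null 2>&1; then
    echo "registry"   # available remotely, pull instead of building
  else
    echo "build"      # not found anywhere, a build is required
  fi
}
```

For example, `image_source "$(lkt pkg show-tag pkg/init)"` would answer the rebuild question without invoking a build, which is the property the list above asks for.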

## Some possible future avenues

### Directory contents plus build args 

One possibility is to include all of the file contents as well as build args, generated and static, files or CLI flags, into a single digest. This has nothing to do with git commands, or even with whether files are checked in; it is just the content. Change a build arg = rebuild an image; change a file = rebuild an image.

The whole `dirty` suffix just goes away (although we could keep it if it is helpful).

Like most digests, this would be a one-way hash. There would be no way to go from hash to source, but the git commit label is for that; you can always go from source to hash.

It would benefit us to do a better job capturing the source info in labels, like the build args or the CLI flags used.
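A minimal sketch of the single-digest idea in this section, assuming `sha256sum` is available. The helper name `input_digest`, the `KEY=VALUE` argument form, and the separator line are illustrative choices, not an agreed design.

```sh
# Hypothetical helper: one digest over directory contents plus build args.
# Usage: input_digest DIR [KEY=VALUE ...]
input_digest() {
  dir=$1
  shift
  {
    # 1. Every file's path and content hash, in a stable order.
    #    Git state is ignored entirely; only the content matters.
    ( cd "$dir" && find . -type f -print | LC_ALL=C sort |
        while IFS= read -r f; do sha256sum "$f"; done )
    # 2. Every build arg, sorted so that flag order does not matter.
    echo "--build-args--"
    for arg in "$@"; do printf '%s\n' "$arg"; done | LC_ALL=C sort
  } | sha256sum | cut -d' ' -f1
}
```

In this sketch, the contents of a build-arg-file and any generated args (platform, `REL_*`) would be expanded into `KEY=VALUE` pairs before the call, so that a change to any of them yields a different digest. Undeterminable inputs such as `ADD https://example.com/foo` remain invisible to it.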

### buildkit

Another approach might be to figure out what buildkit does to determine if something needs to be rebuilt and adopt it, maybe as a library.

I would be hesitant to actually use buildkit, as we prefer to be able to determine tags and such purely via CLI.

Other approaches?
