Since the beginning of its existence, Azure Artifacts has taken a firm stance on the immutability of the packages we store. We get a lot of questions about that stance, especially from engineers newer to package and artifact management.
In this post, I’ll provide some history and stories from within Microsoft that led us to take such a firm stance, cover how you can avoid the issues we ran into, and talk about some of the options we’ve considered to enable new workflows that are today blocked by the immutability guarantee.
Caches, caches, and more caches
Most package managers, including NuGet, keep package caches on your local machine to make package restores faster. As NuGet’s docs say, “By using the cache and global-packages folders, NuGet generally avoids downloading packages that already exist on the computer, improving the performance of install, update, and restore operations.” When you’re using a single package source (like nuget.org), caching packages is a sensible idea and a great performance improver. However, these caches assume that package IDs and versions are globally unique and immutable, so that every time I download, say, Newtonsoft.Json 11.0.2, I get the exact same content. That’s immutability.
What happens if I break that immutability promise? It’s pretty easy to do – I can add a private feed to my nuget.config and then publish to it a modified version of, say, Newtonsoft.Json 11.0.2. It seems silly, but I’ve seen it happen at Microsoft multiple times, especially when we maintained large, centralized NuGet servers that were used by teams across different divisions. A team would find a package that didn’t meet their requirements (wasn’t built for their version of .NET, had DLLs for too many versions of .NET and was thus too large, wasn’t strong-name signed, to name a few), download it, repack it to their specifications, and republish it at the same version. Then, NuGet.org would have one copy of the package and our internal server would have a different copy that claimed the same package ID and version.
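For illustration, breaking the promise takes nothing more than a nuget.config entry like the following (the internal feed name and URL here are hypothetical placeholders, not real endpoints):

```xml
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <packageSources>
    <!-- Hypothetical internal feed that could serve a modified copy of a
         package ID/version that also exists on nuget.org -->
    <add key="InternalFeed"
         value="https://pkgs.dev.azure.com/contoso/_packaging/internal/nuget/v3/index.json" />
    <add key="nuget.org" value="https://api.nuget.org/v3/index.json" />
  </packageSources>
</configuration>
```

With both sources configured, NuGet may resolve Newtonsoft.Json 11.0.2 from either one, and whichever copy is downloaded first is what lands in the cache.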
Even if your team doesn’t create modified packages, you can be subject to the whims of other teams. If you use build agents that are shared by tens or hundreds of builds, as many of our powerful internal build agents are, you can be affected if another build on the same agent relies on a package feed containing a modified package. By default, build agents also rely on a single global package cache, which will become polluted as soon as the agent runs its first build that pulls in a modified package. If another team pollutes the cache, your build can start failing in hard-to-debug ways. And it can become even harder to debug if you have multiple agents in a shared agent pool. In this case, a build from one agent fails because it’s using the modified package from the polluted cache, but a build (of the same code!) from another agent in the same pool succeeds because it’s using the original package in a clean cache.
Deterministic restores and error-free builds
Thankfully, it’s easy to avoid all these hard-to-debug issues. For the most part, immutability issues don’t appear until you either start depending on multiple private package feeds or you start reusing build agents across multiple products (which may each reference a different private package feed). So, you’re all set if you:
- Develop and build a single product (e.g. a single .sln or single package.json) at a time and only use the main public package registry (e.g. nuget.org or npmjs.com); this is what most new projects do when they only need packages from the main public package registry
- Develop and build a single product (e.g. a single .sln or single package.json) at a time and use a private package feed that’s configured according to our best practices and has upstream sources enabled; this enables you to use both private packages created by your team and public packages from the main public package registries
By default, all newly-created feeds in Azure Artifacts have upstream sources enabled.
Now, if you’re developing multiple products in parallel or sharing a build agent pool across multiple products, you need to think about your package cache. To ensure that the package cache isn’t corrupted with modified packages, you can either
- ensure that all products you’re building (on your developer machine or on the shared agent pool) are configured with the public feed or a single private feed with upstream sources, as described above, or
- clear the package cache (with nuget locals all -clear, dotnet nuget locals all --clear, or npm cache clean --force) when switching between products on your developer machine and before every build using the shared agent pool
The latter option will significantly increase build times and I generally don’t recommend it.
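If you do take the cache-clearing route on a shared pool, it can be sketched as a step at the top of your pipeline. This is an Azure Pipelines YAML sketch, not a drop-in recipe; which of the three clear commands you actually need depends on which package managers your builds use, and continueOnError is set because not every tool may be installed on every agent:

```yaml
steps:
# Clear the local package caches before restore so a modified package
# left behind by another build on this agent can't leak into this one.
- script: |
    nuget locals all -clear
    dotnet nuget locals all --clear
    npm cache clean --force
  displayName: 'Clear package caches (deterministic, but slower builds)'
  continueOnError: true
```

Again, this trades build time for determinism on every run, which is why the single-feed-with-upstreams configuration is the better default.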
Despite all these concerns around immutability, there are still valid and interesting justifications for wanting to re-use a package ID and version. For example, several internal teams use file shares or other NuGet servers that allow them to republish a set of packages several times, test and fix bugs, and then release the set of packages with a well-known, sensibly incremented version number.
To support those types of scenarios, we’ve considered building “sandbox feeds”. These feeds would bypass the immutability protections found on regular feeds and allow you to re-use a package ID and version as many times as you want. We would prevent sandbox feeds from being added as upstream sources to regular feeds, but it would be up to you to ensure that sandbox feeds weren’t used in shared build agent pools.
Would you use sandbox feeds? Have other ideas for how we might maintain the immutability guarantee but enable these types of testing and validation scenarios? I’d love to hear from you in the comments.