Show HN: Morgan – PyPI Mirror for Restricted/Offline Environments
1. Depend on pip to download and cache package distributions. This means those downloads will probably only work in a similar environment (same Python interpreter, same libc), because of the nature of binary package distributions and the fact that packages have optional dependencies for different environments.
2. Depend on other PyPI packages, meaning installing the mirror in a restricted environment in itself is too difficult.
3. Cannot resolve dependencies of dependencies, meaning mirroring PyPI partially is extremely difficult, and PyPI is huge.
Morgan works differently. It creates a mirror based on a configuration file that defines target environments (using Python's standard Environment Markers specification from PEP 345) and a list of package requirement strings (e.g. "requests>=2.24.0"). It downloads all files relevant to the target environments from PyPI (both source and binary distributions), and recursively resolves and downloads their dependencies, again based on the target environments. It then extracts a single-file server to the mirror directory that works with Python 3.7+, has no outside dependencies, and implements the standard Simple API. This directory can be copied to the restricted network, through whatever security policies are in place, and deployed easily with a simple `python server.py` command.
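To make the "recursively resolves dependencies based on the target environments" idea concrete, here is a minimal sketch. It is not Morgan's actual code: the in-memory `INDEX`, the package names, and the dict-based markers are hypothetical stand-ins for real PyPI metadata and real environment markers.

```python
from collections import deque

# Hypothetical index: package name -> list of (dependency, marker) pairs.
# A marker here is a dict of environment keys that must all match;
# an empty dict means the dependency applies everywhere.
INDEX = {
    "web-client": [("tls-helper", {}), ("win-compat", {"sys_platform": "win32"})],
    "tls-helper": [],
    "win-compat": [],
}


def marker_matches(marker, env):
    """True if every key in the marker equals the environment's value."""
    return all(env.get(key) == value for key, value in marker.items())


def resolve(requirements, environments, index):
    """Collect the requirements plus all transitive dependencies that
    apply to at least one of the target environments."""
    seen = set()
    queue = deque(requirements)
    while queue:
        name = queue.popleft()
        if name in seen:
            continue
        seen.add(name)
        for dep, marker in index.get(name, []):
            # Keep a dependency if any target environment needs it.
            if any(marker_matches(marker, env) for env in environments):
                queue.append(dep)
    return sorted(seen)


linux_only = [{"sys_platform": "linux"}]
linux_and_windows = [{"sys_platform": "linux"}, {"sys_platform": "win32"}]
print(resolve(["web-client"], linux_only, INDEX))
print(resolve(["web-client"], linux_and_windows, INDEX))
```

A dependency is kept if any one of the target environments needs it, so a single mirror can serve several environments at once.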
I should note that Morgan can find dependencies from various metadata sources inside package distributions, including standard METADATA/PKG-INFO/pyproject.toml files, and non-standard files such as setuptools' requires.txt.
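As a rough illustration of reading dependencies from a METADATA/PKG-INFO file: these files use an email-header style format, so Python's standard `email` parser can read the `Requires-Dist` fields. The sample metadata and the `requires_dist` helper below are made up for the example, and this is only a sketch of the general technique, not how Morgan itself does it.

```python
from email.parser import Parser

# Made-up metadata for illustration only.
SAMPLE_METADATA = """\
Metadata-Version: 2.1
Name: example-pkg
Version: 1.0.0
Requires-Dist: requests>=2.24.0
Requires-Dist: colorama; sys_platform == "win32"
"""


def requires_dist(metadata_text):
    """Return (requirement, marker) pairs from Requires-Dist headers.

    The marker is None when the requirement is unconditional.
    """
    message = Parser().parsestr(metadata_text)
    pairs = []
    for line in message.get_all("Requires-Dist") or []:
        # Everything after the first ';' is the environment marker.
        requirement, _, marker = line.partition(";")
        pairs.append((requirement.strip(), marker.strip() or None))
    return pairs


for requirement, marker in requires_dist(SAMPLE_METADATA):
    print(requirement, "|", marker)
```

Splitting on the first `;` is a simplification: it is enough for name-and-version requirements like the ones above, but a real tool would want a full requirement-string parser.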
There's more information in the Git repository. If this is interesting to you, I'll be happy to receive your feedback.
I have been looking for a similar solution; the whitelist approach used to fail with other tools because they weren't resolving the dependencies.
We were running into the same problem (a supercomputer with clusters of different architectures and no outgoing connections permitted), so we created "pypickup" [1,2]. Nice to see that we came up with similar solutions! I have some questions:
1. is the directory of packages you create compatible with PEP 503? (so I can use the `--index-url file://PATH_TO_LOCAL_CACHE` flag with pip and it should work)
2. is there some filtering mechanism? e.g. we are not interested in non-release versions ("dev" versions, "rc" versions, "post" versions, ...)
3. I guess that the way morgan resolves dependencies is by manually parsing files like "pyproject.toml" or "requirements.txt", and that it does not ask the build system for the dependencies. if so:
- does "morgan" detect build dependencies?
- which build systems are compatible?
- is "morgan" capable of detecting more complex dependency specifications? e.g. "oldest-supported-numpy", which is used by "scipy", has dependency strings like the following: numpy==1.19.2; python_version=='3.8' and platform_machine=='aarch64' and platform_python_implementation != 'PyPy'
kudos for the good work
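For readers unfamiliar with marker strings like the numpy one quoted above, here is a small sketch of how such a marker can be checked against a target environment. It is a deliberately limited, standard-library-only evaluator: it supports only `==`, `!=`, `and`, and `or` with plain string comparison, so version ordering such as `>=` is out of scope. The `evaluate_marker` function and the environment dicts are hypothetical and not taken from morgan or pypickup.

```python
import ast

MARKER = (
    "python_version=='3.8' and platform_machine=='aarch64' "
    "and platform_python_implementation != 'PyPy'"
)


def evaluate_marker(marker, env):
    """Evaluate a restricted marker expression against an environment dict."""

    def value(node):
        # Marker variables look like Python names; literals are strings.
        if isinstance(node, ast.Name):
            return env[node.id]
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            return node.value
        raise ValueError("unsupported marker operand")

    def check(node):
        if isinstance(node, ast.BoolOp):
            results = [check(child) for child in node.values]
            return all(results) if isinstance(node.op, ast.And) else any(results)
        if isinstance(node, ast.Compare) and len(node.ops) == 1:
            left, right = value(node.left), value(node.comparators[0])
            if isinstance(node.ops[0], ast.Eq):
                return left == right
            if isinstance(node.ops[0], ast.NotEq):
                return left != right
        raise ValueError("unsupported marker syntax")

    # The supported subset of marker syntax is also valid Python syntax,
    # so the parser can be reused without executing anything.
    return check(ast.parse(marker, mode="eval").body)


arm_env = {
    "python_version": "3.8",
    "platform_machine": "aarch64",
    "platform_python_implementation": "CPython",
}
x86_env = dict(arm_env, platform_machine="x86_64")
print(evaluate_marker(MARKER, arm_env))
print(evaluate_marker(MARKER, x86_env))
```

Walking the syntax tree instead of calling `eval` means nothing in the marker string is ever executed, which matters when the strings come from downloaded packages.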
As for doing partial mirroring of pypi with only what you are using, is that really a good idea anyway? it will break whenever you add or change any dependency.
I'd hesitantly accepted the risk of serving a devpi server over vsock and into my (personal) restricted VLAN. I did so because using a shared folder meant I'd need to have cached the module and any dependencies from my internet-connected VLAN first.
Combined with debmirror, vscodeoffline, and some nightly snatcher shell scripts, I think I have most of my needs covered.
But Poetry and PDM don't add build dependencies to lock files, which I need, so I'm thinking of building a custom resolver.
Did you consider using resolvelib, which is what underlies both pip and PDM?