Show HN: Morgan – PyPI Mirror for Restricted/Offline Environments

Mirroring PyPI packages for environments/networks that do not have access to the Internet is hard. It's actually hard even in environments that do have access to the Internet. Most solutions out there either:

1. Depend on pip to download and cache package distributions. This means those downloads will probably only work in a similar environment (same Python interpreter, same libc), because of the nature of binary package distributions and the fact that packages have optional dependencies for different environments.

2. Depend on other PyPI packages, meaning installing the mirror in a restricted environment in itself is too difficult.

3. Cannot resolve dependencies of dependencies, meaning mirroring PyPI partially is extremely difficult, and PyPI is huge.

Morgan works differently. It creates a mirror based on a configuration file that defines target environments (using Python's standard Environment Markers specification from PEP 345) and a list of package requirement strings (e.g. "requests>=2.24.0"). It downloads all files relevant to the target environments from PyPI (both source and binary distributions), and recursively resolves and downloads their dependencies, again based on the target environments. It then extracts a single-file server to the mirror directory that works with Python 3.7+, has no outside dependencies, and implements the standard Simple API. This directory can be copied to the restricted network, through whatever security policies are in place, and deployed easily with a simple `python server.py` command.
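To make the configuration idea concrete, here is a minimal sketch of reading such a file with Python's standard `configparser`. It is not Morgan's actual code, and the layout is an assumption made for illustration: `env.<name>` sections holding environment marker values, and a `requirements` section mapping package names to version specifiers. The repository documents the real format.

```python
import configparser

# Hypothetical configuration text; section and key names are assumptions.
EXAMPLE_CONFIG = """
[env.legacy]
os_name = posix
sys_platform = linux
platform_machine = x86_64
python_version = 3.7

[requirements]
requests = >=2.24.0
"""


def load_config(text):
    """Return (environments, requirements) parsed from INI-style text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)

    # Each "env.<name>" section maps environment marker names to values.
    environments = {
        section[len("env."):]: dict(parser[section])
        for section in parser.sections()
        if section.startswith("env.")
    }

    # Each key in [requirements] is a package name and its value is a
    # version specifier; joined, they form a requirement string.
    requirements = [
        f"{name}{spec}" for name, spec in parser["requirements"].items()
    ]
    return environments, requirements


environments, requirements = load_config(EXAMPLE_CONFIG)
print(environments["legacy"]["python_version"])
print(requirements)
```

Each environment dictionary can then be used to decide which distribution files and which conditional dependencies are relevant for that target.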

I should note that Morgan can find dependencies from various metadata sources inside package distributions, including standard METADATA/PKG-INFO/pyproject.toml files, and non-standard files such as setuptools' requires.txt.
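As a rough illustration of the standard case (again, a sketch rather than Morgan's implementation), METADATA and PKG-INFO files use an email-style header format, so the `Requires-Dist` entries can be read with the standard library alone. The sample metadata and the `read_requirements` helper are made up for this example, and splitting on the first `;` is a simplification that ignores URL-based requirements.

```python
from email.parser import Parser

# Hypothetical metadata, shaped like a wheel's METADATA file.
SAMPLE_METADATA = """\
Metadata-Version: 2.1
Name: example-package
Version: 1.0.0
Requires-Dist: requests (>=2.24.0)
Requires-Dist: colorama ; sys_platform == "win32"

An example description body.
"""


def read_requirements(metadata_text):
    """Return a list of (requirement, marker) pairs from metadata text.

    The marker is None when the dependency is unconditional.
    """
    message = Parser().parsestr(metadata_text)
    requirements = []
    for value in message.get_all("Requires-Dist", []):
        # Simplification: the environment marker follows the first ";".
        requirement, _, marker = value.partition(";")
        requirements.append((requirement.strip(), marker.strip() or None))
    return requirements


for requirement, marker in read_requirements(SAMPLE_METADATA):
    print(requirement, "|", marker)
```

A dependency whose marker does not match any target environment can be skipped, which is what keeps a partial mirror small.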

There's more information in the Git repository. If this is interesting to you, I'll be happy to receive your feedback.

Thanks!

idop 6d (github.com)

Comments

skbly7 6d
Thanks for creating it. Looking forward to trying it out.

I have been looking for a similar solution. Whitelisting used to fail with other tools because they weren't resolving dependencies.

mofeing 6d
hey,

We were running into the same problem (supercomputer with clusters of different architectures and no outgoing connections permitted), so we created "pypickup" [1,2]. Nice to see that we came up with similar solutions! I have some questions:

1. is the directory of packages you create compatible with PEP 503? (so I can use the `--index-url file://PATH_TO_LOCAL_CACHE` flag with pip and it should work)

2. is there some filtering mechanism? e.g. we are not interested in non-release versions ("dev" versions, "rc" versions, "post" versions, ...)

3. I guess that the way morgan resolves dependencies is by manually parsing files like "pyproject.toml" or "requirements.txt" and it does not ask the build-system for the dependencies. if so...

   - does "morgan" detect build-dependencies?

   - which build-systems are compatible?

   - is "morgan" capable of detecting more complex dependency specifications? e.g. "oldest-supported-numpy", which is used by "scipy", has dependency strings like the following: numpy==1.19.2; python_version=='3.8' and platform_machine=='aarch64' and platform_python_implementation != 'PyPy'

kudos for the good work

[1] https://pypi.org/project/pypickup/ [2] https://github.com/UB-Quantic/pypickup
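For context on the PEP 503 question above: the Simple API expects each project to live under a URL built from its normalized name, and PEP 503 defines that normalization as collapsing runs of `-`, `_` and `.` into a single `-` and lowercasing the result. A minimal sketch follows; the `project_path` helper and the `/simple` root are hypothetical and say nothing about how Morgan or pypickup lay out their directories.

```python
import re


def normalize(name):
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def project_path(root, name):
    """Build the per-project path under a hypothetical index root."""
    return f"{root.rstrip('/')}/{normalize(name)}/"


print(project_path("/simple", "Typing_Extensions"))
```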

Galanwe 6d
Maybe I'm confused about what this offers, but I have been running private pypi repositories for a decade now, and it never required more than running an HTTP server with directory listing.

As for partially mirroring PyPI with only what you are using, is that really a good idea anyway? It will break whenever you add or change any dependency.

hackish 6d
Thanks for posting this. I'm going to give setting up Morgan a shot when I've got some free cycles.

I'd hesitantly accepted the risk of serving a devpi server over vsock and into my (personal) restricted VLAN. I did so because using a shared folder meant I'd need to have cached the module and any dependencies from my internet-connected VLAN first.

Combined with debmirror[0], vscodeoffline[1], and some nightly snatcher shell scripts, I think I have most of my needs covered.

[0] https://help.ubuntu.com/community/Debmirror

[1] https://github.com/LOLINTERNETZ/vscodeoffline

jvolkman 6d
This looks similar to some Bazel rules I'm working on. I'm also using the approach of defining target environments up front [1], but the main difference is that I'm currently offloading the actual resolution process to Poetry or PDM, which both generate cross-platform lock files.

But Poetry and PDM don't add build dependencies to lock files, which I need, so I'm thinking of building a custom resolver.

Did you consider using resolvelib [2], which is what underlies both pip and PDM?

[1] https://github.com/jvolkman/rules_pycross/blob/main/examples...

[2] https://github.com/sarugaku/resolvelib

indrora 6d
Oh neat. Not only do I share a name with a project, it's a project I was seriously thinking of starting.