I've written a number of Python packages over time, really for my own use, and I'm so snowed under just getting things done that every time I stopped to consider sharing one so it could be pip installed, I searched online and immediately got lost among the tens, maybe hundreds, of guides, tutorials and options, and what looked like enough material for a doctoral dissertation to wade through. And all of it looked scary: talking about starting from scratch (not from an existing package), wanting me to add and write a dozen files and understand this and that, and worse, there are old ways, new ways, alternate ways ... Aaargh.
But hey, we're in a snap 3 day lockdown here (thanks COVID!) and I was tidying up some projects and tried again ... this time, experience in hand, (and less hair) I figured I would put my blinkers on and use just the official tutorial:
https://packaging.python.org/tutorials/packaging-projects/
And I was mildly pleased. I guess my expectations had dropped but hey, it proved manageable and I did succeed in publishing a few packages. But even this tutorial left me wasting far too much time trying to work out how stuff works and I wanted to write it down ASAP, for my own sake, and well, here we are ... share it.
So here's a better (IMHO) guide to packaging your Python project and publishing it (better because it says what I wanted to know, and I'm making different mistakes and it has its own shortcomings that I don't notice ;-).
1. Start with a package
Yep, I already have packages ... a good few of them. I want to publish them. Anyhow a package of course is just a folder with an __init__.py file in it, and if it's a small package (as many of mine are) that's all they are. Nothing more, nothing less, than a folder with an __init__.py in it that provides some classes and/or functions.
Sometimes there might be a few more .py files in the folder beside the __init__.py. Minor detail, just what happens when it's a little too big to fit conveniently into the one file.
Key thing here is we're not starting from scratch, but have a package.
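To make the "it's just a folder with an __init__.py" point concrete, here is a minimal sketch. The helper make_tiny_package and the function greet are made-up names for illustration only; the sketch builds a throwaway folder in a temp directory and imports it.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_tiny_package(parent: Path, name: str = "my_package") -> Path:
    """Create a folder holding nothing but an __init__.py with one function."""
    package_dir = parent / name
    package_dir.mkdir()
    (package_dir / "__init__.py").write_text(
        "def greet(who):\n"
        "    return f'Hello, {who}!'\n"
    )
    return package_dir


with tempfile.TemporaryDirectory() as tmp:
    make_tiny_package(Path(tmp))
    sys.path.insert(0, tmp)  # let Python look in the temp folder
    try:
        # The folder name is the package name used by import
        my_package = importlib.import_module("my_package")
        greeting = my_package.greet("world")
    finally:
        sys.path.remove(tmp)

print(greeting)
```

That folder is the whole package; everything in the later steps is wrapping around it.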
2. Get the tools
You only need two tools as it happens and they are:
- build, which creates from your package, by default, a .tar.gz file and a .whl file, which is what pip wants/needs for its installing career.
- twine, which publishes to pypi.org
and so for prep:
pip install build twine
Not too bad. Step 2 was easy.
3. Prep the few extra files a package wants
Not many, don't fret.
- README.md - a simple markdown file with a welcome message and whatever you want to add. What is this package, how do you use it? I use Typora but you can write it in any text editor and it can be as brief or in depth as you like. It is what's shown on the pypi.org page for your package so it's also your ad, if you like, for your package.
- LICENSE.md - Not sure you need it but worth doing and easy as Py. I am beastly careless in this space and just love the Hippocratic License. Download the markdown version and save it as LICENSE.md
That's it! And to think it seemed so scary in the past.
4. But no! There's more - just a little more, not much.
The tutorial recommends that you lay out your folder like this (yes, I've simplified it a bit):
the-folder-I-keep-it-in/
├── LICENSE
├── README.md
├── pyproject.toml
├── setup.cfg
└── src/
    └── my_package/
        └── __init__.py
The things to note are:
- You don't need to put it in a src folder, but why not? If it ain't broke don't fix it. The src strategy means you can just drag and drop your package from where it was into src ... done.
- The stuff above it is the publishing kit ... pyproject.toml and setup.cfg just tell build and twine what to do. We'll come back to these shortly. The first is just a standard file to tell build that we're going to use setuptools (which gets us to twine in a roundabout way ;-) and the second one describes your package so twine can publish it (purists may argue with this neat division, but let them).
- The folder the-folder-I-keep-it-in can have any name you like. It won't change a thing with the build or publish. I actually call it my-package (in this example). As to why, keep reading. It's just convenient, that's all.
- The folder my_package should use underscores between words, yes, do it. There's a bizarre confusion in the Python world between my_package and my-package.
my_package and my-package? What, why, when?
This is described nowhere, and I had to work this out with a lot of trial and error and hair pulling, alas. But here's what I got for you.
- my_package: just stick to this, don't waver, never waver, use only this ;-). I kid ye not. Using my-package in either the folder under src or in setup.cfg will cause you grief during the build, publish, install and test.
- Once it's published it will appear on pypi.org as my-package and people will install it with pip install my-package, but use it with import my_package. That's just the way it is, that's the convention, don't rock the boat. All you need to know is you don't have to lift a finger to make that happen, just stick with my_package in the src folder and in setup.cfg.
- But of course, the-folder-I-keep-it-in is irrelevant here and I call it my-package just because that's what the package is called. The only other exception is the GitHub repo if you're using one (and I do); that too can be my-package, and is in my case. In fact, later you'll see I can exploit that for a nice two line install script.
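For the curious, the two spellings can be sketched in a few lines of Python. The regular expression is the name normalisation described in PEP 503 for package indexes, and str.isidentifier is the standard check for what may follow the import keyword. The function names pypi_name and is_importable_name are my own inventions for this illustration, and this is only a sketch of the convention, not what build or twine run internally.

```python
import re


def pypi_name(name: str) -> str:
    """Normalise a project name the way PEP 503 describes for index lookups."""
    return re.sub(r"[-_.]+", "-", name).lower()


def is_importable_name(name: str) -> bool:
    """True if the name is a valid Python identifier, so usable after import."""
    return name.isidentifier()


for candidate in ("my_package", "my-package"):
    print(
        candidate,
        "-> index name:", pypi_name(candidate),
        "| importable:", is_importable_name(candidate),
    )
```

Both spellings land on the same index name, but only the underscore one can be imported, which fits the advice to stick with my_package everywhere you have a choice.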
5. pyproject.toml and setup.cfg
pyproject.toml is easy. Just copy the standard. Put this in it:
[build-system]
requires = [
    "setuptools>=42",
    "wheel"
]
build-backend = "setuptools.build_meta"
and be done with it. Ask no more. It's build internals and unless you're super keen on digging deeper, let it rest. This just means that when you run python3 -m build in your package folder, it knows what to do (if you don't have this file it will ask for one). What it does is create a dist folder and drop two files in it. These are what twine needs.
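If you want to check what landed in dist before handing it to twine, a small sketch like this will do. The helper list_artifacts is a hypothetical name of mine, and the demonstration runs against a stand-in dist folder with empty placeholder files rather than a real build.

```python
import tempfile
from pathlib import Path


def list_artifacts(dist_dir: Path) -> dict:
    """Group the files in a dist folder into source archives and wheels."""
    return {
        "sdist": sorted(p.name for p in dist_dir.glob("*.tar.gz")),
        "wheel": sorted(p.name for p in dist_dir.glob("*.whl")),
    }


# Stand-in dist folder with empty placeholder files (not a real build)
with tempfile.TemporaryDirectory() as tmp:
    dist = Path(tmp) / "dist"
    dist.mkdir()
    for filename in ("my_package-0.1.tar.gz", "my_package-0.1-py3-none-any.whl"):
        (dist / filename).touch()
    artifacts = list_artifacts(dist)

print(artifacts)
```

In a real project you would call list_artifacts(Path("dist")) after running python3 -m build and expect one entry of each kind.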
setup.cfg is not hard either and here's my minimalist take and the clarifications that I felt were missing elsewhere:
[metadata]
name = my_package
version = 0.1
author = my name
author_email = my email address
description = My little package
long_description = file: README.md
long_description_content_type = text/markdown
url = https://github.com/me/my-package
project_urls =
    Bug Tracker = https://github.com/me/my-package/issues
classifiers =
    Programming Language :: Python :: 3
    License :: Freely Distributable
    Operating System :: OS Independent
    Development Status :: 4 - Beta
    Framework :: Django :: 3.2
    Intended Audience :: System Administrators
    Topic :: Software Development :: Libraries :: Python Modules
[options]
install_requires =
    other_package1 >= 0.1.1
    other_package2 >= 2.0.1
package_dir =
    = src
packages = find:
python_requires = >=3.6
[options.packages.find]
where = src
And here's what I felt I should have known:
- name should use my_package not my-package. Just believe me. Things go weird if it says my-package. Experiment if you like, I wish I didn't need to and the tutorial was clear here.
- install_requires wants one indented line per requirement with relatively familiar syntax (similar to pip freeze - another one of those mysteriously named python commands that actually means pip show-me-whats-installed). This is completely missed in the tutorial.
- package_dir is weird, yes, but forget it. Like install_requires it has a list of one liners beneath it, in this case just one. The one liners map package names to folders somehow in the internal complexities of setuptools - details most of us don't care about or want to know about when publishing our simple one file package. The tutorial tells us that this line maps the "noname" package to the src folder, and that the "noname" package (that nothingness before the = sign) is a code name for the overarching root package, so the src folder becomes the mystical "root package". Do most of us actually care about this? What is a "root package" anyhow? Nah, let's leave it for the boffins, and just accept this is the odd way of telling build and/or twine that our package is in the src folder.
- There's nothing missing after find:. No. That's just the syntax, live with it. Refer back to the intro, re: my sentiments on the unnecessary befuddling cryptic nature of Python package publication ... Ditto the where = src, just accept it.
- The classifiers are a bit fiddly; they have to come from the list of allowed classifiers. And they bothersomely lack a clear way of saying you're using the Hippocratic License (which I just happen to love).
6. The Importance and Catches with Testing
Publishing is as simple as:
python3 -m twine upload dist/*
BUT, it's committal. Once you've published there appears to be no way of undoing it, and it consumes the filenames you used (which also means the version you have in setup.cfg, as these get built into the filenames in dist).
And so, testing first is critical. And pypi.org provides testpypi at https://test.pypi.org/ that you can publish to freely, as often as you need to get it right.
The main things that demand a retry are in my experience:
- You look at it on pypi and README.md has issues. Either typos, or code lines that are too long and render badly etc. Either way, you get to see how it's going to be presented on pypi and can adjust your README to look nice.
- Your test installing it with pip doesn't work. Which actually doesn't happen now that I have a workflow, but happened a lot while I was trying to work out all that setup.cfg syntax that the tutorial deigns to gloss over.
To publish to the test site it's just a small variant:
python3 -m twine upload --repository testpypi dist/*
The catches
So testing is great. A lifesaver. But it caused me some modest grief too (the flip side of the same coin).
Firstly you need to create an account on the site, and I did that, but I always use Bitwarden, which generates large random passwords for me - a habit (that we should all have). twine, when used as above, prompts for username and password. Alas these long random passwords of mine are not easy to type, so I usually do a copy/paste, but alas pasting the password does not work - I tried and tried.
Fortunately they can be provided on the command line as in:
python3 -m twine upload --repository testpypi -u $username -p $password dist/*
and I saved this in a file called test-publish that reads:
#!/bin/bash
source ~/.auth/pypi.auth
python3 -m twine upload --repository testpypi --verbose -u "$username" -p "$password" dist/*
Secondly, you can't republish. At all. You need to increment the version in setup.cfg and rebuild before you can republish. Slows things down some. Not least because of the time and energy spent searching online for ways and means to republish. Some online sources suggest --skip-existing does the trick, but it doesn't - not for me, and it's not clear what it does or what it's for, and maybe I just misread that. C'est la vie.
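Since every retry needs a fresh version, the bump itself can be scripted. This is only a sketch under my own assumptions: versions are plain dotted numbers like 0.1, and setup.cfg has a single version = line. Both bump_version and bump_in_setup_cfg are hypothetical helper names, not part of build or twine.

```python
import re


def bump_version(version: str) -> str:
    """Increment the last numeric component: '0.1' -> '0.2'."""
    parts = version.strip().split(".")
    parts[-1] = str(int(parts[-1]) + 1)
    return ".".join(parts)


def bump_in_setup_cfg(text: str) -> str:
    """Return setup.cfg text with the first 'version = ...' line bumped."""

    def replace(match):
        return match.group(1) + bump_version(match.group(2))

    pattern = r"^(version\s*=\s*)(\S+)[ \t]*$"
    return re.sub(pattern, replace, text, count=1, flags=re.MULTILINE)


before = "[metadata]\nname = my_package\nversion = 0.1\n"
after = bump_in_setup_cfg(before)
print(after)
```

To use it for real you would read setup.cfg, pass the text through bump_in_setup_cfg, write it back, and then rebuild.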
Thirdly, the dependencies listed under install_requires in setup.cfg don't work, presumably because, when testing, the required packages aren't on https://test.pypi.org/. But it took a bit of head scratching and try and try again to convince myself of that, as I was trying, believe it or not, to validate the syntax for just that setting, as it's not described in the tutorial, which sent me straight back to that warren of other sources. I do wish that testpypi would look at pypi for requirements as a fallback so this test cycle could be complete.
7. A Standard Workflow
OK, so having gone through that all now, like most folk eventually do, I have a standard template (the last package I published). I now routinely use five tiny little two line shell scripts to make life easy for myself.
Basically a build script, two publish scripts and two install scripts.
In order:
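A script to build, called build:
#!/bin/bash
# Clear out old artifacts so only the fresh build gets uploaded (-f: no error if dist is empty)
rm -f dist/*
python3 -m build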
A script to test publishing, called test-publish:
#!/bin/bash
source ~/.auth/pypi.auth
python3 -m twine upload --repository testpypi --verbose -u "$username" -p "$password" dist/*
A script to install the test publish (test installing), called test-install, noting that errors here about requirements that cannot be met are expected:
#!/bin/bash
# The package name is taken from the name of the folder this script lives in
package=$(basename "$(dirname "$(readlink -f "$0")")")
python -m pip install --index-url https://test.pypi.org/simple/ "$package"
A script to publish properly, called publish:
#!/bin/bash
source ~/.auth/pypi.auth
python3 -m twine upload --verbose -u "$username" -p "$password" dist/*
A script to install the package properly, called install:
#!/bin/bash
# The package name is taken from the name of the folder this script lives in
package=$(basename "$(dirname "$(readlink -f "$0")")")
python -m pip install "$package"
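The one slightly cryptic line in the install scripts is the package= one. It resolves the script's real path and takes the name of the folder it sits in, which is why calling the outer folder my-package pays off. Here is the same idea as a Python sketch; package_name_from_script is a hypothetical name and the demonstration uses a throwaway folder.

```python
import tempfile
from pathlib import Path


def package_name_from_script(script_path: str) -> str:
    """Name of the folder containing the script, after resolving symlinks."""
    return Path(script_path).resolve().parent.name


# Stand-in project folder called my-package holding an empty install script
with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "my-package" / "install"
    script.parent.mkdir()
    script.touch()
    derived = package_name_from_script(str(script))

print(derived)
```

So the scripts never need editing from one project to the next, as long as the folder carries the published name.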
A basic example of all that together can be visited at:
https://github.com/bernd-wechner/django-model-admin-fields
and see here:
https://pypi.org/project/django-model-admin-fields/
I hope that helps someone save all the learning hassle, and publish something easily, by just adding 4 files to a folder (a README.md to write, a LICENSE.md to download, a pyproject.toml to copy, and a setup.cfg to tune) and maybe 5 tiny little helper bash scripts, and in no time a test and then publish cycle is underway.
Top comments (6)
The reason you can use an underscore but not a hyphen in the name is that it must be a valid identifier for the import statement, and hyphens are not allowed in Python identifiers. The directory name inside src must match so must not use hyphens or any other characters that are disallowed in identifiers.
As for why hyphens are otherwise preferred over underscores for the name used to install from pypi, I'm not as certain but I have a guess.... It is likely due to the general preference of hyphens instead of underscores in URLs. See for example these guidelines from Google: developers.google.com/search/docs/.... They don't explain why that is the preference but it could be related to the fact that underscores are not allowed in domains, so even though they are allowed in the rest of a URL you have greater visual consistency if you also avoid them in the rest of a URL.
This isn't unique to Python. I use Java more than Python. Java package names and module names can use underscores but not hyphens. Generally, when you publish Java artifacts to Maven Central, the artifact name is often the same as either the Java module (if modules are in use) or a Java package contained in the artifact, except using hyphens rather than underscores if you have a reason to use either. I'm not sure if underscores are actually disallowed in artifact names or if it is a strong convention to use hyphens. The file name of a jar on Maven Central includes the artifact name, the version, and an identifier all separated by hyphens, so using hyphens in an artifact name when separation is needed looks nicer since it is consistent with the rest of the filename.
It also has the benefit that if you have a site dedicated to it, you can use the artifact name in the domain if you use hyphens, which you can't do if there were underscores. Here is an example.... I have a Java library named rho-mu (with a hyphen) so the artifact name and corresponding jar file uses a hyphen. But the jar contains a Java module named rho_mu with an underscore. The website for the project uses the artifact name in the domain: https://rho-mu.cicirello.org. An underscore would not have been allowed there even though it could otherwise be used elsewhere in the URL.
Thanks for the considered appraisal. It is indeed likely that the - norms arise out of a need for URLs, so the package name for example is needed in a URL like: github.com/bernd-wechner/my-package
That said, you misread me a little in that it is not the specifics of the wherefores and whys that are my central observation or complaint, so much as the enormous unnecessary complexity that would-be contributors are exposed to to this day, not least in a language that is currently at the arguable peak of popularity.
But as you've given a moment to specifics I will add some of my specific test results that led me to pull my hair out and write down these notes (results which I did not include as the article is long enough as is). Consider the two configurables, the name of the folder under src and the name declared in setup.cfg. There are 4 variations to explore of - vs _ use and there are two outputs from build, a .tar.gz and a .whl file. Put this in the context of the official tutorial in which:
That is, they use - not _. Bear that in mind as you examine these four build outputs:
Using: src/my_package and name = my_package
Produces: my_package-0.1.tar.gz and my_package-0.1-py3-none-any.whl
Using: src/my-package and name = my_package
Produces: my_package-0.1.tar.gz and my_package-0.1-py3-none-any.whl
Using: src/my_package and name = my-package
Produces: my-package-0.1.tar.gz and my_package-0.1-py3-none-any.whl
Using: src/my-package and name = my-package
Produces: my-package-0.1.tar.gz and my_package-0.1-py3-none-any.whl
Key observations:
- The .whl file always uses _.
- Using - never works, and yes, it is the recommended name in the official tutorial.
- Of the two variants that use _ in the name, the second publishes fine but cannot be installed and used. Go figure.
In conclusion, the official tutorial, one of the last havens we have in a world that is replete (as noted in my intro) with an already befuddling number of tutes and a cacophony of research material (to which I've only added, I admit), is both a) wrong (it suggests using a name that does not work) and b) completely ignores the issue (let alone others, like how to define requirements).
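The steady underscore in the .whl names is at least consistent with the wheel filename rule in PEP 427, which escapes each filename component by replacing runs of anything other than letters, digits, _ and . with a single underscore. A minimal sketch of that rule follows; the function wheel_filename and its default tag are my own illustration, not the code build actually runs.

```python
import re


def wheel_filename(name: str, version: str, tag: str = "py3-none-any") -> str:
    """Build a wheel filename with the name escaped as PEP 427 describes."""
    escaped = re.sub(r"[^\w\d.]+", "_", name)
    return f"{escaped}-{version}-{tag}.whl"


for name in ("my_package", "my-package"):
    print(name, "->", wheel_filename(name, "0.1"))
```

Either spelling of the name gives the same wheel filename, which matches the four results listed above.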
On top of which, befuddling to me is how that tutorial provides no ready clues on how to contribute to improving it. In so many other contexts today, such material is anything from an open wiki to sporting feedback buttons or notes on how to help improve the documentation. This one is wrong and the best it offers is a tiny "Found a bug?" link in the footer that jumps to an Issues list at:
github.com/pypa/packaging.python.o...
Given we've come this far (and are still in lockdown here ;-), I may just look at filing an issue or PRing a fix over there for the doc.
But the bemusement goes further. Setuptools for example has (finally) evolved to the point where you can use just a setup.cfg file, with a basic setup.py assumed if it's missing. Next step will be for build to simply assume that basic pyproject.toml if it's missing. For one of the most popular languages, and a community based one at that, it would be nice if sharing packages became much much easier.
Wow. That's weird that options that won't work actually produce something and in some cases even publish to pypi. If you try to do the equivalent in Java, either directory name or package name or both, you'll get syntax errors when you compile.
Totally agree. It's rather frustrating how complex it is and moreover that the official tutorial suggests something that plain doesn't work.
Not being allowed to replace or remove a version is also not just a pypi thing. Maven Central also doesn't allow this. Once it is public, other packages might depend on it. Removing or even replacing it can then break other people's projects.
That's all good and well, and easy enough to understand but still falls short of awesome ;-). There's public and there's public. In the extreme, there's public and got lots of people using it, and there's public just published now and ooops, made a mistake, let's fix it.
To help with the latter case testpypi was born and that rocks! And yet it falls short of awesome too, as we cannot test the install_requires there (that could be fixed by having pip more smartly try pypi if testpypi doesn't have a package - easily generalised to: if the repository is testX and an install_requires package cannot be found, try the repository X).
But pypi could also be smarter. Allowing for two steps like many publishing media do. Push to pypi (visible publicly perhaps, maybe installable only with your account credentials) and then Releasing, making fully public. OR alternately keeping track of all installs (downloads and from where the request came) and if there are no downloads from source IPs different to the one that uploaded, then allow an overwrite (an oops style fix).
All just thoughts in the stunning and still very surprising complexity of publishing Python packages.