Let's build the Python interpreter, and learn about $ORIGIN
An interesting title could have also been: Let's build A Python interpreter (à la https://github.com/RustPython/RustPython). That's a bit more involved, though, so instead I will write about building THE One and Only CPython interpreter from source, and why one would want to do it.
Premise
I have to preface this post by saying that probably not many industries need to go this route.
I've worked in VFX/CG for almost 10 years now, and I'm fully convinced that software development in Computer Graphics pipelines has very particular challenges to address that you won't find in many other industries (maybe games come close). One of the many challenges is trying your best to stick to https://vfxplatform.com to avoid headaches.
VFX Reference Platform
- At its best.. the VFX Reference Platform is a community effort involving DCC vendors (like SideFX/Foundry/etc.) and software developers to minimize incompatibilities between the libraries adopted, so that shared libraries can effectively be.. shared.
- At its worst.. it's a wall of shame for libraries that can't keep good hygiene of their internal symbols, and a testimony that packaging C++ libs is still a mess in 2025.
Folks will have very different opinions on what the VFX reference platform really is at its core, but at the end of the day, on a practical level, I take it for what it does: trying to design a common software environment that suits the needs of the whole CG industry.
Python
One big component of the VFX ref platform is the glue of all VFX/CG pipelines in the world: Python.
That's because most DCCs ship a Python interpreter and allow customization via Python scripts placed in specific directories. And since the world evolves every year (because we evolve and don't devolve, right..? right?), the version of the Python interpreter that runs within your pipe will evolve every year too (for example, the upcoming CY2025 suggests 3.11, while CY2023 was 3.10).
Of course, not every studio adopts the VFX ref platform of that year. Most lag behind a few years. But in the end adoption is inevitable, if anything because artists will scream for feature X or bugfix Y in the new version of a DCC.
By now it should be a bit more apparent why Python is such a big deal. So I'll cover a bit of my experience in keeping up with the VFX Reference Platform requirements for Python.
Seeking independence
The main way that I've seen studios keep up to date with the Python version requirements is very simple: using whatever the OS packages/bundles for you. If that means using python3.8 in 2025 because Rocky Linux only packaged that.. so be it.
Sadly, if you go this route you will need to delay adoption of a specific VFX ref year until its specs match the ones your distro's packagers ship (if you're lucky). Or, alternatively, you can embrace newer VFX ref years without delay, but you'll have to keep tools used outside DCCs as their own special case, since they might run on much older versions of Python than what the DCCs ship with.
All of this means giving distro packagers waaay more responsibility than they should have. Which is not to say distro packagers are bad folks - quite the opposite, I think they're unsung heroes of the OSS community. BUT I feel that being able to move at your own pace is key to truly having control of your stack and avoiding tech debt. Especially for such a core package as Python.
Which leads me to the core idea of this post.
BAKE YOUR OWN PYTHON
(photo credits: https://www.pexels.com/photo/woman-in-white-sweater-baking-cake-3992206/)
Trust me, it's easier than it looks! And fun! And empowering!
The best thing about CPython is that it's written in C, and it's almost fully self-contained. The one ugly dependency you'll have to worry about is OpenSSL, but other than that, it's mostly batteries included.
I tinkered with the best recipe to build Python across my last 2 gigs, and, just as with the traditional recipe for Ciambelline al Vino (Wine Cookies), I think I finally nailed it.
One essential requirement I give myself is that building and packaging should be 2 different steps. This seems like an obvious division, but you'd be surprised how many systems treat build&package as a single, non-divisible step.
In this post I'll focus on just the building, so that the instructions will work for any packaging tool you're using (Rez, SPK, or in-house).
Ingredients
(photo credits: https://www.pexels.com/photo/pastries-on-wooden-tray-357628/)
I am a weirdo and I like using Docker as a build tool, leveraging multi-stage images. I know some folks don't like it, but I have my reasons.
I mostly like Docker because:
- Removes the need to have a dedicated physical host for building stuff
- Ideally anybody can build in the exact same way from any OS, as long as they have Docker (for example, I routinely build x86 Python versions from an arm64 macOS machine)
- I can use Dockerfile cache to my own advantage while doing tests (which saves A LOT of time)
- It has many of the benefits of containerization: it's self-contained and doesn't spam the host filesystem
Since the ASWF authors Docker images, it's also very easy to target a specific production environment. For example, aswf/ci-base:2024.1 uses Rocky 8.8.
Proving this claim is easy enough, since running this:
sudo docker run -ti aswf/ci-base:2024.1 cat /etc/redhat-release
will spit out
[...redacted..]
Rocky Linux release 8.8 (Green Obsidian)
This image also hosts GNU's libc 2.28:
$ docker run -ti aswf/ci-base:2024.1 ldd --version
[..redacted..]
ldd (GNU libc) 2.28
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.
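To give an idea of the workflow, the whole recipe below can run inside that image. Here's a minimal sketch, where build.sh is a hypothetical wrapper script around the steps we'll cover:
# Mount the working dir (tarballs + scripts) and build inside the ASWF image:
docker run --rm -ti \
    -v "$PWD":/build -w /build \
    aswf/ci-base:2024.1 \
    ./build.sh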
You can avoid using Docker if you prefer.
Other than that, we'll need a few tarballs. Say we're building 3.9, for example; then we'll need:
- Python itself: https://www.python.org/ftp/python/3.9.19/Python-3.9.19.tgz
- OpenSSL: https://www.openssl.org/source/openssl-3.0.0.tar.gz
Additionally, depending on what you choose to rely on from your system install, you might also need:
- Zlib: https://github.com/madler/zlib/releases/download/v1.2.13/zlib-1.2.13.tar.gz
- Readline: https://ftp.gnu.org/gnu/readline/readline-8.2.tar.gz
Note that the URL for the Python tarball was obtained from https://www.python.org/downloads/release/python-3919.
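Fetching the ingredients can be as simple as this (a sketch; do verify the downloads against the checksums/signatures published on the respective sites):
# Download the tarballs listed above into the current directory:
curl -LO https://www.python.org/ftp/python/3.9.19/Python-3.9.19.tgz
curl -LO https://www.openssl.org/source/openssl-3.0.0.tar.gz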
Recipe
Your mileage might vary here, depending on how much wine you like in your Ciambelline.
But the gist is that, like with all old school GNU stuff, you'll first want to prep all the ingredients via a ./configure script (or ./Configure, in OpenSSL's case).
OpenSSL
For example, we prep OpenSSL like this:
tar -xvf openssl-3.0.0.tar.gz
rm -f openssl-3.0.0.tar.gz
cd openssl-3.0.0
./Configure \
linux-x86_64 shared \
--openssldir=$WHERE_YOULL_HAVE_OPENSSL \
--prefix=$WHERE_YOULL_INSTALL_OPENSSL
make depend && make -j
Keep in mind that --openssldir refers to the directory where OpenSSL will live on the target filesystem where you install Python, not on the machine you are building on.
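One thing the snippet above leaves out is the install step. Assuming you want to populate $WHERE_YOULL_INSTALL_OPENSSL before pointing Python's build at it, something like this should do (install_sw installs just the software bits, skipping the man pages we don't need):
# Install only libraries, headers and binaries, skipping the docs:
make install_sw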
Once OpenSSL has been built, we can build Python itself.
Python
A similar idea applies regarding the prep steps. Extraction:
tar -xvf Python-3.9.19.tgz
rm -f Python-3.9.19.tgz
cd Python-3.9.19
..and compilation/install:
export LD_RUN_PATH="$WHERE_YOULL_INSTALL_OPENSSL/lib64"
export LDFLAGS="-L $WHERE_YOULL_INSTALL_OPENSSL/lib64"
export CPPFLAGS="-I $WHERE_YOULL_INSTALL_OPENSSL/include"
./configure \
--with-system-ffi \
--prefix=${PYTHON_INSTALL_PREFIX} \
--enable-optimizations \
--with-openssl=${WHERE_YOULL_HAVE_OPENSSL} \
--enable-shared \
--enable-loadable-sqlite-extensions \
-C
make -j && make test && make install
All of the configure flags are documented at https://docs.python.org/dev/using/configure.html so I won't go over them.
Note how we temporarily set the RUNPATH to point to the custom build of OpenSSL that we made earlier.
This lets the linker link against this custom version without relying on other, older OpenSSL versions that might or might not be installed on the system (if you're using Docker that's less of a problem, but it still gives you more control).
This is one of the tricks that cost me a loooot of googling and trials to get the invocation right!
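As a quick sanity check (a sketch; it assumes you're still in the CPython build tree, before make install), you can ask the freshly built interpreter which OpenSSL it linked against:
# With --enable-shared, the in-tree binary needs help finding libpython itself:
LD_LIBRARY_PATH=. ./python -c "import ssl; print(ssl.OPENSSL_VERSION)"
# If the trick worked, this prints our custom OpenSSL 3.0.0, not the system one.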
If you want to know more about LD_RUN_PATH, we can look at man ld:
- On an ELF system, for native linkers, if the -rpath and -rpath-link options were not used, search the contents of the environment variable "LD_RUN_PATH".
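You can see the result of this mechanism by dumping the dynamic section of the installed binary, for example:
# The RUNPATH entry baked in at link time should show our custom OpenSSL dir:
readelf -d "${PYTHON_INSTALL_PREFIX}/bin/python3.9" | grep -iE 'runpath|rpath'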
This moves us a few steps in the direction of a fully "relocatable" Python3 build, but it doesn't get us all the way there. The reason why is beautifully explained in this CPython issue: https://github.com/python/cpython/issues/111514. More on that soon.
Regarding the --prefix, it's worth noting that it's where make will install Python on the build host, but be aware that (unless you tweak the shared libraries in the way I'll show you) it will also be where you're expected to install Python on the target hosts running it.
I'm also building it as a shared library so that other libraries (like OpenUSD) can link against it. But otherwise you're free to skip that and build it statically (which simplifies some of the work we'll have to do later!).
Make the build relocatable
What does relocatable really mean in this context?
The idea is that the whole build can be installed anywhere on the filesystem, since all the required bits (like shared libraries) are found at runtime via paths relative to the $PREFIX itself (e.g. libraries would be found via ../lib/mylibrary.so, not /opt/Python3/lib/mylibrary.so). To achieve that, the build should also be as self-contained as possible, aka: everything we need should be installed in that $PYTHON_INSTALL_PREFIX.
This means that, ideally, you could plop the final Python3 dir tree onto a USB drive and still have a fully working interpreter that you can invoke from anywhere. More practically speaking, it allows you to freely change your mind on where to install the whole Python $PREFIX tree without having to constantly rebuild it.
So far, the build is MOSTLY relocatable.
That's because, in my experience, setting LD_RUN_PATH only affects the interpreter binary, but not all of the other libraries (see the CPython issue 111514 mentioned before for a longer read).
Achieving an actually relocatable build without help from the build system requires us to tap into the power of patchelf.
patchelf and I are best friends now. That's because this tiny binary allows us to tweak the RUNPATH of an ELF.. post-build.
So what we're gonna attempt to do with patchelf now is to tweak the RUNPATH of all the shared libraries required by our dear Python interpreter.
This is one of the biggest hacks that I have ever pulled, but it's also the one that paid off the most in terms of practical convenience.
I am not saying that this is The Right Way™ to do it, but it's One (Working) Way to do it.
For example, if we hadn't used the LD_RUN_PATH approach, we could have just run this command to tweak the RUNPATH of the interpreter, post-build:
patchelf --set-rpath "\$ORIGIN/../lib" "${WHEREVER_YOU_HAVE_INSTALLED_PYTHON}/bin/python3"
What this^ effectively does is tell the dynamic loader to look in $ORIGIN/../lib first when searching for the shared libraries python needs.
Note that we had to escape the $ to avoid it being expanded as the var $ORIGIN by the shell itself.
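To double check that the patch landed, patchelf can also read the value back:
# Print the current RUNPATH of the patched interpreter:
patchelf --print-rpath "${WHEREVER_YOU_HAVE_INSTALLED_PYTHON}/bin/python3"
# Expected output: $ORIGIN/../lib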
But what does $ORIGIN resolve to?
Let's ask man ld:
[..redacted..] The tokens $ORIGIN and $LIB can appear in these search directories. They will be replaced by the full path to the directory containing the program or shared object in the case of $ORIGIN and either lib - for 32-bit binaries - or lib64 - for 64-bit binaries - in the case of $LIB. [..redacted..]
Basically: if you're running python3 from /my/funky/path/bin then $ORIGIN will resolve at runtime to /my/funky/path/bin.
Now we just need to rinse and repeat a similar command to target all the shared libraries, since they also link against other shared libraries that need to be found at runtime.
Also: do you remember that we built OpenSSL before? This is the moment where we need to tell libpython3.9.so how to find that OpenSSL at runtime!
In my case, I like copying the previously built OpenSSL into $PYTHON_INSTALL_PREFIX/deps/openssl, so we'll make the RUNPATH point there too, relative to $ORIGIN.
This is how a second pass of patching could look like:
cd "${PYTHON_INSTALL_PREFIX}/lib" || exit 1
for f in *.so; do
patchelf "$f" --set-rpath "\$ORIGIN/../deps/openssl/3.0.0/lib64"
done
Basically, it targets all the libs installed in the lib directory, like libpython3.9.so itself.
Finally.. we'll need to target those extra pesky libs living in lib-dynload so they can find OpenSSL and the other libs too.
This directory is where things like _ssl.cpython-39-x86_64-linux-gnu.so live, which link against OpenSSL directly when you do an import ssl. Don't ask me how I know it 🙂 .
This third pass might look like this:
cd "${PYTHON_INSTALL_PREFIX}/lib/python3.9/lib-dynload" || exit 1
for f in *.so; do
patchelf "$f" --set-rpath "\$ORIGIN/../../../lib:\$ORIGIN/../../../deps/openssl/3.0.0/lib64"
done
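After the three passes, a brute-force scan with ldd is a decent way to catch anything still unresolved (a sketch, assuming GNU find; ldd expands $ORIGIN for us):
# Flag any shared object whose dependencies the loader still can't resolve:
find "${PYTHON_INSTALL_PREFIX}" -name '*.so*' -print0 | while IFS= read -r -d '' lib; do
    ldd "$lib" 2>/dev/null | grep -q 'not found' && echo "unresolved deps in: $lib"
done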
And TA-DA!
If we did all the steps correctly, we now have a Python3 build that is as independent and self-contained as possible. Or at least, as self-contained as the brain of this hacker here can make it.
Which means that we're free to package and install it however we want, and move on with our lives. More importantly, we have a recipe to build it any time we want, and move to any newer version that we want to move to without having to wait for somebody else (distro/OS) to build it and package it for us.
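If you want proof of that freedom, here's a minimal smoke test (the destination path is hypothetical) that copies the tree elsewhere and imports the modules that depend on our patched libs:
# cp -a preserves the symlinks (like bin/python3 -> python3.9) inside the tree:
cp -a "${PYTHON_INSTALL_PREFIX}" /tmp/relocated-python
/tmp/relocated-python/bin/python3 -c "import ssl, sqlite3, zlib; print('relocatable!')"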
Footnotes
If you're working on macOS, there is a binary similar to patchelf, called install_name_tool. They work in different ways, especially since macOS has more options than $ORIGIN and allows you to target @loader_path, @executable_path, et cetera. Overall, the gist is the same.
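A rough macOS sketch (untested here, and the paths are hypothetical) could look like:
# @executable_path plays a role similar to $ORIGIN on Linux:
install_name_tool -add_rpath "@executable_path/../lib" ./bin/python3
# LC_RPATH load commands are the macOS counterpart of RUNPATH:
otool -l ./bin/python3 | grep -A2 LC_RPATH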