Porting a Linux-based binary PyPI Package to Mac OS X

Hext is my little library that has (simple) language bindings for Python, i.e. you are able to use Hext within a Python project.

I have previously documented the process of bringing Hext to PyPI (i.e. pip install hext), but that was only for Linux-based systems.

These are my notes on porting my Python builds to Mac OS X.

Note: This is actually my first time screwing around with any Apple product! The primary target audience is ‘Me in 6 Months’, so YMMV.

My Setup

I bought this quaint little machine on eBay:

  • MacBook Pro from Mid-2009
  • 2.26GHz Intel Core 2 Duo, 4GB DDR3, 128GB SSD
  • Highest supported version of OS X for this machine: 10.11.6 — El Capitan

Assuming that later versions of Mac OS X are backwards binary compatible, i.e. that they can run binaries built for older versions, El Capitan is the ideal build target, because it is the oldest version of Mac OS X that is still supported.

Side story: I erased the hard drive, assuming that I could reinstall the OS with the built-in recovery system. But it turns out that if your Apple ID is brand new, the App Store refuses the request to download the OS and you’re stuck. So I contacted Apple Support and got an appointment at a “Genius Bar”. They happily installed the OS, even though I had never been a customer and the device was 10 years old. I initially assumed they would tell me to go pound sand, but nah! All free of charge. One funny little thing though: I had just finished configuring the OS and was packing up my stuff when loud music started playing in the store and the employees hastily formed two lines at the exit, like a guard of honor, chanting and clapping their hands. Either a colleague was leaving for good, or maybe this is a daily occurrence, you know, Apple being Apple :)

Getting comfortable with Mac OS X

  • When downloading gzipped packages, Safari seems to automatically gunzip the archive. That makes it impossible to verify checksums or signatures. This can be disabled in Safari’s preferences dialog.
  • Bash shortcuts like [alt]+[.] or [alt]+[d] are accessible as [esc],[.] or [esc],[d] in the default terminal emulator.
  • The default terminal emulator starts bash as a login shell, so it reads .bash_profile instead of .bashrc.
  • There’s no ldconfig; the closest counterpart is update_dyld_shared_cache.
  • There’s no ldd but otool -L <binary>.
  • To know the minimum Mac OS X version a binary was built for, run otool -l <binary> and look for LC_VERSION_MIN_MACOSX.
  • There’s no readlink -f: the BSD readlink that ships with Mac OS X lacks GNU’s -f (canonicalize) option.
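
A common stand-in for canonicalizing a path (what readlink -f does on Linux) is the cd && pwd idiom, which the module build script further down also relies on. A minimal sketch; abs_path is a hypothetical helper name, and it deliberately does not follow a symlink in the last path component:

```shell
# abs_path: print the absolute path of a file, with symlinked directories
# resolved. Hypothetical helper; the final path component is left untouched.
abs_path() {
  # `pwd -P` prints the physical directory, without symlinks.
  dir=$(cd "$(dirname "$1")" && pwd -P) || return 1
  # ${dir%/} avoids a double slash when the file lives directly in "/".
  printf '%s/%s\n' "${dir%/}" "$(basename "$1")"
}
```

For example, abs_path ../some/file prints the same location regardless of the directory the call was made from.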

Installing Xcode

As I understand it, installing Xcode is the equivalent of apt install build-essential on Debian-based systems.

Xcode can be installed through the App Store. Unless of course your system isn’t officially supported by Xcode anymore, like OS X 10.11 in my case.

Luckily Apple provides downloads for older versions of Xcode.

For Mac OS X 10.11 the required packages are named:

  • Xcode 8.2.1
  • Command Line Tools (macOS 10.11) for Xcode 8.2

Installing Xcode is as simple as dragging the unpacked folder into “Applications” and launching Xcode. The “Command Line Tools” ship with an installer.
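
To confirm that the toolchain is actually visible from the shell after installing, something like the following could be run. This is only a sketch: have is a hypothetical helper, and the list of tools is my assumption about what the later build steps need.

```shell
# have: succeed if a command is found on PATH.
have() { command -v "$1" >/dev/null 2>&1; }

# xcode-select only exists on Mac OS X; -p prints the active developer directory.
if have xcode-select; then
  xcode-select -p || echo "no developer directory selected"
fi

# Report which of the basic build tools the shell can find.
for tool in cc c++ make; do
  if have "$tool"; then
    echo "found:   $tool ($(command -v "$tool"))"
  else
    echo "missing: $tool"
  fi
done
```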

Building Hext’s Dependencies

Building on Mac OS X is pretty much straightforward. I only had to adjust some stuff from my previous notes on building Hext statically on Linux-based systems.

  • Build & install GCC. No changes.
  • Build & install ICU4C. See below.
  • Build & install Boost. No changes.
  • Install rapidjson. No changes.
  • Build & install Gumbo. No changes.
  • Build & install Google Test. No changes.
  • Install CMake. See below.
  • Build & install SWIG with PCRE. See below.

Side Question: Why not use Homebrew? Homebrew would be perfect, but:

  • Homebrew does not support Mac OS X 10.11 anymore
  • I need static libraries, not dynamic ones, because they will get statically linked into a dynamic library (the resulting Hext Python module).

Building ICU4C

Boost.Regex depends on ICU4C for Unicode support.

# --with-library-bits=64
#   Configure fails if the argument '--with-library-bits=64' is omitted.
#   I think it tries to build both 32bit and 64bit versions, and it
#   cannot find a suitable compiler for 32bit if gcc was configured
#   with '--disable-multilib'.
# CXXFLAGS="-fPIC":
#   The statically built libicu will end up in the Python module,
#   which is a shared library, and therefore needs position independent code.
$ CXXFLAGS="-fPIC" CFLAGS="-fPIC" ./configure --enable-static \
    --disable-shared --prefix=/usr/local --with-library-bits=64
$ make && make install
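
Because everything is supposed to be linked statically, it may be worth checking that the install prefix really contains only static archives for a library. A rough sketch; check_static_only is a hypothetical helper, and the library names in the usage note are the usual ICU ones (an assumption, adjust as needed):

```shell
# check_static_only: for each library name, require PREFIX/lib/lib<name>.a
# and complain about any shared variant of the same library.
check_static_only() {
  prefix=$1; shift
  rc=0
  for name in "$@"; do
    if [ ! -f "$prefix/lib/lib$name.a" ]; then
      echo "missing static archive: lib$name.a"
      rc=1
    fi
    # .dylib is the shared library suffix on Mac OS X, .so the one on Linux.
    for shared in "$prefix/lib/lib$name".*dylib "$prefix/lib/lib$name".so*; do
      if [ -e "$shared" ]; then
        echo "unexpected shared library: $shared"
        rc=1
      fi
    done
  done
  return $rc
}
```

Usage would be along the lines of check_static_only /usr/local icuuc icui18n icudata, which prints nothing and returns 0 if all is well.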

Getting CMake

The CMake project provides official binaries for CMake on Mac OS X 10.7 or later. After installing, the cmake utility is found in /Applications/CMake.app/Contents/bin/cmake.
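
If typing the full path gets tedious, one option is to put that directory on PATH, e.g. in .bash_profile. A small sketch (CMAKE_BIN_DIR is just a variable name I made up):

```shell
# Directory that holds the cmake binary inside the app bundle.
CMAKE_BIN_DIR="/Applications/CMake.app/Contents/bin"

# Prepend it only once, so re-sourcing the profile does not keep growing PATH.
case ":$PATH:" in
  *":$CMAKE_BIN_DIR:"*) ;;
  *) PATH="$CMAKE_BIN_DIR:$PATH" ;;
esac
export PATH
```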

SWIG and PCRE

Place the latest PCRE source tarball in SWIG’s source directory and run ./Tools/pcre-build.sh. Then configure and build SWIG the usual way.

OpenSSL

A Python dependency. Only needed for testing.

$ ./Configure --prefix=/usr/local --openssldir=/usr/local/etc/openssl \
    shared darwin64-x86_64-cc
$ make && make install
# copy system certificate chain
$ security find-certificate -a -p \
    /System/Library/Keychains/SystemRootCertificates.keychain \
    > /usr/local/etc/openssl/cert.pem
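
If the export goes wrong and cert.pem ends up empty, certificate verification will presumably fail later with confusing errors. A quick sanity check is to count the certificates in the bundle; count_certs is a hypothetical helper:

```shell
# count_certs: print how many PEM-encoded certificates a bundle file contains.
count_certs() {
  # -c counts matching lines; "--" keeps grep from parsing the dashes as options.
  grep -c -- '-----BEGIN CERTIFICATE-----' "$1"
}
```

Running count_certs /usr/local/etc/openssl/cert.pem should then print a number well above zero.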

GNU Readline

A Python dependency. Not necessary, but convenient.

$ ./configure --prefix=/usr/local && make && make install

Building Python 2.7, 3.4, 3.5, 3.6 and 3.7

Side Question: Why not use pyenv? I haven’t used pyenv before, but if I understand correctly, pyenv doesn’t let me install two instances of the same Python version. Specifically, I need two instances of the latest Python 2.7:

  • One with narrow Unicode support (--enable-unicode=ucs2), such as the version that is shipped with Mac OS X 10.11.
  • And one with wide Unicode support (--enable-unicode=ucs4)

These two Pythons are ABI incompatible, therefore I need to compile a module for each version.
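
To tell which kind of build a given interpreter is, sys.maxunicode can be inspected: it is 65535 on narrow builds and 1114111 on wide builds. A small sketch; both function names are made up for illustration:

```shell
# unicode_build: map a sys.maxunicode value to the matching configure option.
#   65535   -> narrow build (--enable-unicode=ucs2)
#   1114111 -> wide build   (--enable-unicode=ucs4)
unicode_build() {
  case $1 in
    65535)   echo ucs2 ;;
    1114111) echo ucs4 ;;
    *)       echo unknown; return 1 ;;
  esac
}

# python_unicode_build: ask an interpreter (first argument, defaulting to
# whatever `python` is on PATH) and classify its answer.
python_unicode_build() {
  unicode_build "$("${1:-python}" -c 'import sys; print(sys.maxunicode)')"
}
```

For instance, python_unicode_build /usr/bin/python would classify the Python that ships with the system.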

My build script:

#!/usr/bin/env bash

set -e

perror_exit() { echo "$@" >&2 ; exit 1 ; }

# Print usage to stderr and exit with a non-zero status.
[[ $# -lt 1 ]] && perror_exit "Usage: $0 <python-tarball...>"

# Digits are spelled [0-9] because \d is not part of POSIX extended regexes.
normalize_version() {
  echo "$1" | grep -Eo '[0-9]+\.[0-9]+\.[0-9]+' | head -n1
}

major_version() {
  echo "$1" | grep -Eo '[0-9]+' | head -n1
}

abi_tag() {
  echo "$1" | awk -F. '{printf "cp%d%d-cp%d%dm", $1, $2, $1, $2}'
}

build_python() {
  local package="$1"
  local build_dir="$2"
  local install_dir="$3"
  local compile_flags="$4"

  pushd . >/dev/null
  [[ -d "$build_dir" ]] || mkdir "$build_dir"
  tar -x --strip-components=1 -f "$package" -C "$build_dir"
  cd "$build_dir"
  pwd
  ./configure --disable-shared --prefix="$install_dir" $compile_flags
  make -j2
  make install
  [[ -e "$install_dir/bin/python3" ]] && [[ ! -e "$install_dir/bin/python" ]] && {
    ln -s "$install_dir/bin/python3" "$install_dir/bin/python"
  }
  "$install_dir/bin/python" "$GET_PIP_PY" --force-reinstall
  [[ -e "$install_dir/bin/pip3" ]] && [[ ! -e "$install_dir/bin/pip" ]] && {
    ln -s "$install_dir/bin/pip3" "$install_dir/bin/pip"
  }
  popd >/dev/null
}

BUILD_DIR="$(pwd)"
INSTALL_DIR="$HOME/python-build"
GET_PIP_PY="$(pwd)/get-pip.py"

[[ -f "$GET_PIP_PY" ]] || curl https://bootstrap.pypa.io/get-pip.py -o "$GET_PIP_PY"

while [[ $# -gt 0 ]] ; do
  package="$1"
  version=$(normalize_version "$package")
  abi_tag=$(abi_tag "$version")
  build_dir="$BUILD_DIR/$version-m"
  install_dir="$INSTALL_DIR/$abi_tag"

  if [[ $(major_version "$version") -eq 2 ]] ; then
    build_python "$package" "$build_dir" "$install_dir" "--enable-unicode=ucs2"
    build_python "$package" "${build_dir}u" "${install_dir}u" "--enable-unicode=ucs4"
  else
    build_python "$package" "$build_dir" "$install_dir" ""
  fi

  shift
done
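
To illustrate how the install directory names come about, here are the two helper functions in isolation, with the digits written as [0-9] so the snippet also runs under GNU grep. The tarball names are only example inputs:

```shell
# Extract the first x.y.z version number from a file name.
normalize_version() {
  echo "$1" | grep -Eo '[0-9]+\.[0-9]+\.[0-9]+' | head -n1
}

# Turn x.y.z into a tag of the form cpXY-cpXYm.
abi_tag() {
  echo "$1" | awk -F. '{printf "cp%d%d-cp%d%dm", $1, $2, $1, $2}'
}

for tarball in Python-2.7.16.tgz Python-3.7.3.tar.xz; do
  version=$(normalize_version "$tarball")
  echo "$tarball -> $version -> $(abi_tag "$version")"
done
# Python-2.7.16.tgz -> 2.7.16 -> cp27-cp27m
# Python-3.7.3.tar.xz -> 3.7.3 -> cp37-cp37m
```

For Python 2 the build script additionally appends a "u" for the ucs4 variant, giving cp27-cp27mu.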

Building A Python Module

The linker complains about unresolved symbols (the ones from libpython), because it cannot know that this shared object will only ever be loaded by the Python interpreter at runtime, when those symbols are guaranteed to be available. This is fixed by passing -Wl,-undefined,dynamic_lookup.

My script for building the Python modules for all installed Python versions:
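
If the same build script should stay usable on both Linux and Mac OS X, the flag could be selected by platform. A sketch under that assumption; module_link_flags and EXTRA_LDFLAGS are hypothetical names:

```shell
# module_link_flags: print extra linker flags for a Python extension module,
# given a kernel name as reported by `uname -s`.
module_link_flags() {
  case $1 in
    # Tell the Mac OS X linker to resolve undefined symbols at load time.
    Darwin) echo "-Wl,-undefined,dynamic_lookup" ;;
    # Elsewhere no extra flag is added.
    *)      echo "" ;;
  esac
}

EXTRA_LDFLAGS=$(module_link_flags "$(uname -s)")
echo "extra linker flags: '$EXTRA_LDFLAGS'"
```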

#!/usr/bin/env bash

set -e

CMAKE_APP="/Applications/CMake.app/Contents/bin/cmake"
CMAKE_MAKE_FLAGS="-j2"

perror_exit() { echo "$1" >&2 ; exit 1 ; }

PYTHON_BUILDS_DIR="$HOME/python-build"
[[ -d "$PYTHON_BUILDS_DIR" ]] || perror_exit "cannot access python build directory (expected '$PYTHON_BUILDS_DIR')"
ASSETD="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null && pwd )/assets"
[[ -d "$ASSETD" ]] || perror_exit "cannot access asset directory (expected '$ASSETD')"
OUTD="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null && pwd )/output"
[[ -d "$OUTD" ]] || perror_exit "cannot access output directory (expected '$OUTD')"

HEXTD=$(mktemp -d)
git clone "https://github.com/thomastrapp/hext.git" "$HEXTD"

LIBHEXTD="$HEXTD/libhext"
cd "$LIBHEXTD/build"
$CMAKE_APP -DBUILD_SHARED_LIBS=Off -DCMAKE_POSITION_INDEPENDENT_CODE=On -DCMAKE_EXE_LINKER_FLAGS=" -static-libgcc -static-libstdc++ " ..
make $CMAKE_MAKE_FLAGS
sudo make install

cd "$LIBHEXTD/test/build"
$CMAKE_APP -DBUILD_SHARED_LIBS=Off ..
make $CMAKE_MAKE_FLAGS
./libhext-test

cd "$HEXTD/build"
$CMAKE_APP -DUSE_SYSTEM_LIBHEXT=On -DCMAKE_EXE_LINKER_FLAGS=" -static-libgcc -static-libstdc++ " ..
make $CMAKE_MAKE_FLAGS
sudo make install

HTMLEXT="/usr/local/bin/htmlext" "$HEXTD/test/blackbox.sh" "$HEXTD/test/case/"*hext

PYTHOND="$LIBHEXTD/bindings/python"
cd "$PYTHOND"
for i in "$PYTHON_BUILDS_DIR/"cp* ; do
  V=$(basename "$i")
  mkdir "$V"
  cd "$V"
  mkdir -p wheel/hext
  cp "$ASSETD/setup.py" "$ASSETD/README.md" "$ASSETD/MANIFEST.in" "$ASSETD/gumbo.license" "$ASSETD/rapidjson.license" wheel/

  PYTHON_PATH=$(cd "$i/"include/*/ && pwd)
  $CMAKE_APP -DCMAKE_CXX_FLAGS=" -static-libgcc -static-libstdc++ -Wl,-undefined,dynamic_lookup " -DPYTHON_INCLUDE_DIR="$PYTHON_PATH" -DBUILD_SHARED_LIBS=On ..
  make $CMAKE_MAKE_FLAGS
  cat hext.py\
    | sed '/^# This file was automatically generated by SWIG/,/^del _swig_python_version_info$/d'\
    | cat <(echo "from . import _hext") -\
    > wheel/hext/__init__.py
  cp _hext.so wheel/hext

  mkdir wheel/bin
  cp /usr/local/bin/htmlext wheel/bin
  strip wheel/bin/htmlext

  cd wheel
  "$i/bin/python" setup.py bdist_wheel

  WHEEL=$(find . -iname "*.whl")
  [[ -f "$WHEEL" ]] || perror_exit "cannot find wheel (*.whl)"
  cp "$WHEEL" "$OUTD"
  cd ../..
done

Minimal Dependencies

Use otool -L to list the shared libraries a binary depends on:

$ otool -L _hext.so 
  _hext.so:
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1226.10.1)
$ otool -L htmlext
  htmlext:
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1226.10.1)

libSystem.B.dylib is the equivalent of libc, if I understand correctly. This library is present on all versions of Mac OS X and is backwards compatible, and therefore a safe dependency.
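
This check could be automated so that a stray dynamic dependency does not slip into a wheel unnoticed. A minimal sketch; unexpected_deps is a hypothetical helper that reads otool -L output on stdin:

```shell
# unexpected_deps: print every dependency other than libSystem.
# Prints nothing if the binary is clean.
unexpected_deps() {
  # Lines ending in ":" name the inspected file; on all other non-empty
  # lines the first field is the path of a dependency.
  awk '!/:$/ && NF && $1 != "/usr/lib/libSystem.B.dylib" { print $1 }'
}
```

Used as otool -L _hext.so | unexpected_deps, any output at all would mean that something besides libSystem got linked dynamically.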
