Use Custom Software in Jobs Using Apptainer
Linux containers are a way to build a self-contained environment that includes software, libraries, and other tools. This guide shows how to submit jobs that use Apptainer containers.
Introduction
HTCondor supports the use of Apptainer (formerly known as Singularity) container environments for jobs on the High Throughput Computing system.
Container jobs are able to take advantage of more of CHTC’s High Throughput resources because the operating system where the job is running does not need to match the operating system where the container was built.
Table of Contents
Start Here
To run a job using an Apptainer container, you will need access to an Apptainer
image file, usually with the suffix .sif
.
If you have an existing .sif
file, go straight to Use an Apptainer Container in HTC Jobs.
If you do not have an existing .sif
container, you can create one in two ways:
- If you have no container to start from: Build your own Apptainer container
- If you have an existing Docker container image: Convert Docker Images to Apptainer Images
Once you have created the .sif
file by using one of the above methods
ready, circle back to Use an Apptainer Container in HTC Jobs.
For more details about using Apptainer, see Suggestions for testing and More details about HTCondor and Apptainer.
Use an Apptainer Container in HTC Jobs
If you or a group member have already created the Apptainer .sif
file, or are using a container from reputable sources such as the OSG, add one of these options to your HTCondor submit file
to add it to your HTC job:
-
Option 1: If the
.sif
file is in a/home
directory:container_image = path/to/my-container.sif
-
Option 2: If the
.sif
file is in a/staging
directory:container_image = file:///staging/path/to/my-container.sif
-
Option 3: If the
.sif
file is in a/staging
directory AND you are usingwant_campus_pools
orwant_ospool
:container_image = osdf:///chtc/staging/path/to/my-container.sif
The full submit file otherwise looks like normal, for example:
# apptainer.sub
# Provide HTCondor with the name of your .sif file
container_image = file:///staging/path/to/my-container.sif
executable = myExecutable.sh
# Include other files that need to be transferred here.
# transfer_input_files = other_job_files
log = job.log
error = job.err
output = job.out
requirements = (HasCHTCStaging == true)
# Make sure you request enough disk for the container image in addition to your other input files
request_cpus = 1
request_memory = 4GB
request_disk = 10GB
queue
More details about how HTCondor integrates with Apptainer are in More details about HTCondor and Apptainer.
Build your own Apptainer container
If you need to build your own Apptainer container (.sif
file), the
process looks like this:
TBD: graphic
- Create a definition file. The definition file describes a starting software environment in the first two lines and then what to add to it.
- Start an interactive job for building. We require that you build containers while in an interactive build job.
- Build the container. To build a container, Apptainer uses the instructions in the
.def
file to create a.sif
file. The.sif
file is the compressed collection of all the files that comprise the container. - (Optional): Test the container. Once the image (
.sif
file) is created, it is important to test it to make sure you have all software, packages, and libraries installed correctly. - Move the container to a persistent location. We recommend placing the image file into your
/staging
folder
Create a definition file
To create your own container using Apptainer, you will need to create a definition (.def
) file. CHTC provides example definition files in the software
folder of our Recipes GitHub repository.
We strongly recommend that you use one of the existing examples as the starting point for creating your own container.
For the purposes of this guide, we will call
the definition file image.def
.
📖 Learn More About Definition Files
For more details about the definition file see: The Apptainer Definition File
Start an interactive build job
Building a container can be a computationally intense process, so we require that you build containers while in an interactive build job.
On the High Throughput system, you can run the following commands to start an interactive job that includes your definition file:
chtc-submit-apptainer-build -build image.def
condor_submit -i apptainer-build.sub
Note that this submit file assumes you have a definition file named image.def
in the same directory as the submit file.
Build your container
Once the interactive build job starts, confirm that your image.def
was transferred to the current directory, by running the ls
command.
To build your container, run this command:
apptainer build my-container.sif image.def
Feel free to rename the .sif
file as you desire; for the purposes of this guide we are using my-container.sif
.
As the command runs, a variety of information will be printed to the terminal regarding the container build process.
Unless something goes wrong, this information can be safely ignored.
Once the command has finished running, you should see INFO: Build complete: my-container.sif
.
Using the ls
command, you should now see the container file my-container.sif
.
ls
Troubleshooting Tip: Killed Jobs
Apptainer
.sif
files can be fairly large, especially if you have a complex software stack. If your interactive job abruptly fails during the build step, you may need to increase the value ofrequest_disk
in the submit file generated bychtc-submit-apptainer-build
In this case, the.log
file should have a message about the reason the interactive job was interrupted.
Troubleshooting Tip: Error Messages
If the build command fails, examine the output for error messages that may explain why the build was unsuccessful. Typically there is an issue with a package installation, such as a typo or a missing but required dependency. Sometimes there will be an error during an earlier package installation that doesn’t immediately cause the container build to fail. But, when you test the container, you may notice an issue with the package.
If you are having trouble finding the error message, edit the definition file and remove (or comment out) the installation commands that come after the package in question. Then rebuild the image, and now the relevant error messages should be near the end of the build output.
For more information on building Apptainer containers, see our Building an Apptainer Container guide.
Test your container
Once your container builds successfully, we highly encourage you to immediately test the container while still in the interactive build session.
To test your container, use the command
apptainer shell -e my-container.sif
You should see your command prompt change to Apptainer>
.
When you are finished running commands inside the container, run the command exit
to exit the container.
exit
Your prompt should change back to something like [username@build4000 ~]$
.
For more details about testing, see Suggestions for testing.
Move the container .sif file to staging
Since Apptainer .sif
files are routinely more than 1GB in size, we recommend that you transfer my-container.sif
to your /staging
directory.
It is usually easiest to move the container file directly to staging while still in the interactive build job:
mv my-container.sif /staging/$USER
If you do not have a /staging
directory, you can skip this step and the .sif
file will be automatically transferred back to the login server when you exit the interactive job.
We encourage you to request a /staging
directory, especially if you plan on running many jobs using this container.
See our Managing Large Data in Jobs guide for more information on using staging.
At this point, you can use the container in jobs, as described above.
Suggestions for testing
As always with the High Throughput system, we recommend submitting a single test job and confirming that your job behaves as expected. If there are issues with the job, you may need to modify your executable, or even (re)build your own container.
In an interactive job, run:
apptainer shell -e my-container.sif
The shell
command logs you into a terminal “inside” the container, with access to the libraries, packages, and programs that were installed in the container following the instructions in your image.def
file.
(The -e
option is used to prevent this terminal from trying to use the host system’s programs.)
While “inside” the container, try to run your program(s) that you installed in the container.
Typically it is easiest to try to print your program’s “help” text, e.g., my-program --help
.
If using a programming language such as python3
or R
, try to start an interactive code session and load the packages that you installed.
If you installed your program in a custom location, consider using ls
to verify the files are in the right location.
You may need to manually set the PATH
environment variable to point to the location of your program’s executable binaries.
For example,
export PATH=/opt/my-program/bin:$PATH
Consult the “Special Considerations” section of our Building an Apptainer Container guide for additional information on setting up and testing your container.
Also see the section below for how to fully emulate the behavior of an HTCondor job interactively.
More details about HTCondor and Apptainer
From the user’s perspective, a container job is practically identical to a regular job. The main difference is that instead of running on the execute point’s default operation system, the job is run inside the container.
When you submit a job to HTCondor using a submit file with container_image
set, HTCondor automatically handles the process of obtaining and running the container. You do not need to include any apptainer
commands in your executable
file. The process looks roughly like
- Claim machine that satisifies submit file requirements
- Pull (or transfer) the container image
- Transfer input files, executable to working directory
- Run the executable script inside the container, as the submit user, with key directories mounted inside (such as the working directory, /staging directories, etc.)
- Transfer output files back to the submit server
For testing purposes, you can replicate the behavior of a container job with the following command.
First, start an interactive job.
Then run this command but change my-container.sif
and myExecutable.sh
to the names of the .sif
and .sh
files that you are using:
apptainer exec \
--scratch /tmp \
--scratch /var/tmp \
--workdir $(pwd) \
--pwd $(pwd) \
--bind $(pwd) \
--no-home \
--containall \
my-container.sif \
/bin/bash myExecutable.sh 1> job.out 2> job.err
The container image can be placed in multiple locations in CHTC and still used in the job.
If the .sif
file is located on the login server, you can use
container_image = my-container.sif
although we generally don’t recommend this, since .sif
files are large and should instead be located in staging.
Therefore, we recommend using
container_image = file:///staging/path/to/my-container.sif
If you are using want_campus_pools
or want_ospool
as described in our Scale Beyond Local HTC Capacity guide, then you should instead use:
container_image = osdf:///chtc/staging/path/to/my-container.sif
to enable transferring of the .sif
file via the OSDF to compute capacity beyond CHTC.