Unverified commit ce3c6709 authored by Jana Schor, committed by GitHub

Merge pull request #11 from bernt-matthias/gitlab-docker-ci

Update github sync workflow, switch to docker, add gitlab CI for building docker container and push to registry
parents 25451264 279e54ee
Pipeline #141711 passed with stages in 19 minutes and 46 seconds
name: Build DFPL Singularity Container
on: [ release ]

jobs:
  build:
    name: Install Base System
    runs-on: ubuntu-20.04
    steps:
      - name: Set up Go 1.13
        uses: actions/setup-go@v1
        with:
          go-version: 1.13
        id: go
      - name: Install Dependencies
        run: |
          sudo apt-get update && sudo apt-get install -y build-essential \
            libssl-dev \
            uuid-dev \
            libgpgme11-dev \
            squashfs-tools \
            libseccomp-dev \
            wget \
            pkg-config \
            git \
            cryptsetup
          sudo apt-get clean
      - name: Install Singularity
        env:
          SINGULARITY_VERSION: 3.9.5
          GOPATH: /tmp/go
        run: |
          mkdir -p $GOPATH
          sudo mkdir -p /usr/local/var/singularity/mnt && \
            mkdir -p $GOPATH/src/github.com/sylabs && \
            cd $GOPATH/src/github.com/sylabs && \
            wget -qO- https://github.com/sylabs/singularity/releases/download/v${SINGULARITY_VERSION}/singularity-${SINGULARITY_VERSION}.tar.gz | \
            tar xz && \
            cd singularity && \
            ./mconfig -p /usr/local && \
            make -C builddir && \
            sudo make -C builddir install
          cd ..
          rm -rf singularity
      - name: Fetch DFPL Code from GitHub
        uses: actions/checkout@v1
      - name: Build DFPL Singularity Container
        env:
          SINGULARITY_RECIPE: singularity_container/dfpl.def
          OUTPUT_CONTAINER: dfpl.sif
        run: |
          docker system prune --all --force --volumes
          if [ -f "${SINGULARITY_RECIPE}" ]; then
            cd $(dirname ${SINGULARITY_RECIPE})
            sudo singularity build ${OUTPUT_CONTAINER} $(basename ${SINGULARITY_RECIPE})
            echo "${SINGULARITY_RECIPE} built successfully"
          else
            echo "${SINGULARITY_RECIPE} is not found."
            echo "Present working directory: $PWD"
          fi
      - name: Deploy DFPL Container to Sylabs
        env:
          SINGULARITY_TOKEN: ${{ secrets.SINGULARITY_TOKEN }}
          SINGULARITY_CONTAINER: singularity_container/dfpl.sif
        run: |
          echo ${SINGULARITY_TOKEN} | singularity remote login
          singularity remote status
          singularity push -U ${SINGULARITY_CONTAINER} library://mai00fti/default/dfpl.sif:latest
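
The container build step of this workflow can also be reproduced locally once Singularity is installed (a sketch, using the same recipe as the workflow above):

```shell
cd singularity_container
sudo singularity build dfpl.sif dfpl.def
```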
name: Push changes to Gitlab
on:
  release:
    types: [released]

jobs:
  sync_to_hzdr:
    runs-on: ubuntu-latest
    steps:
      - name: Install Dependencies
        run: |
          sudo apt-get update && sudo apt-get install -y build-essential \
            git
          sudo apt-get clean
      - uses: actions/checkout@v3
        with:
          ref: '${{ github.event.repository.default_branch }}'
          fetch-depth: 0
      - name: Push to GitLab
        run: |
          git remote add hzdr https://SYNC_FROM_GITHUB:${{ secrets.HZDR_ACCESS_TOKEN }}@gitlab.hzdr.de/department-computational-biology/deepfplearn.git
          git remote -v
          git push --tags hzdr "${{ github.event.repository.default_branch }}"
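 orig="git push --tags hzdr"
The sync itself is an ordinary mirror push; done by hand from a local clone it would look roughly like this (a sketch; `master` stands in for the repository's default branch and the pushing user needs write access on the GitLab side):

```shell
git clone https://github.com/yigbt/deepFPlearn.git
cd deepFPlearn
git remote add hzdr https://gitlab.hzdr.de/department-computational-biology/deepfplearn.git
git push --tags hzdr master
```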
image: docker:20.10.9-alpine3.14

stages:
  - build
  - deploy

services:
  - docker:20.10.9-dind-alpine3.14

before_script:
  # docker login asks for the password to be passed through stdin for security
  # we use $CI_JOB_TOKEN here which is a special token provided by GitLab
  - echo -n $CI_JOB_TOKEN | docker login -u gitlab-ci-token --password-stdin $CI_REGISTRY
  #- echo $DockerHub_Token | docker login -u $DockerHub_User --password-stdin $DockerHub_Registry

build:
  stage: build
  only:
    - tags
  timeout: 2 hours
  ## use tags to select a runner that has docker installed
  tags:
    - docker
    - dind
  script:
    # builds the project, passing proxy variables, and vcs vars for LABEL
    # the built image is tagged locally with the tag, and then pushed to
    # the GitLab registry
    - >
      docker build
      --build-arg VCS_REF=$CI_COMMIT_REF_NAME
      --build-arg VCS_URL=$CI_PROJECT_URL
      --no-cache
      --tag $CI_REGISTRY_IMAGE/deepfplearn:$CI_COMMIT_REF_NAME
      -f container/Dockerfile
      .
    # run tests
    - docker run $CI_REGISTRY_IMAGE/deepfplearn:$CI_COMMIT_REF_NAME pytest /deepFPlearn/tests/
    - docker run $CI_REGISTRY_IMAGE/deepfplearn:$CI_COMMIT_REF_NAME dfpl --help
    # push to the GitLab registry
    - docker push $CI_REGISTRY_IMAGE/deepfplearn:$CI_COMMIT_REF_NAME

deploy latest:
  stage: deploy
  only:
    - tags
  variables:
    # We are just playing with Docker here.
    # We do not need GitLab to clone the source code.
    GIT_STRATEGY: none
  tags:
    - docker
    - dind
  script:
    # Because we have no guarantee that this job will be picked up by the same runner
    # that built the image in the previous step, we pull it again locally
    - docker pull $CI_REGISTRY_IMAGE/deepfplearn:$CI_COMMIT_REF_NAME
    # Then we tag it "latest"
    - docker tag $CI_REGISTRY_IMAGE/deepfplearn:$CI_COMMIT_REF_NAME $CI_REGISTRY_IMAGE/deepfplearn:latest
    # Push it.
    - docker push $CI_REGISTRY_IMAGE/deepfplearn:latest
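
Outside of CI, roughly the same build-and-test sequence can be run locally (a sketch; `dfpl:local` is an arbitrary tag standing in for `$CI_REGISTRY_IMAGE/deepfplearn:$CI_COMMIT_REF_NAME`):

```shell
docker build --no-cache --tag dfpl:local -f container/Dockerfile .
docker run dfpl:local pytest /deepFPlearn/tests/
docker run dfpl:local dfpl --help
```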
@@ -6,35 +6,66 @@ Link molecular structures of chemicals (in form of topological fingerprints) with ...
The DFPL package requires a particular Python environment to work properly.
It consists of a recent Python interpreter and packages for data-science and neural networks.
The exact dependencies can be found in the
[`requirements.txt`](requirements.txt) (which is used when installing the package with pip)
and in the [`environment.yml`](environment.yml) (for installation with conda).
You have several ways to provide the correct environment to run code from the DFPL package.
1. Use the automatically built docker/Singularity containers
2. Build your own container [following the steps here](container/README.md)
3. Set up a python virtual environment
4. Set up a conda environment, install the requirements via conda, and install the DFPL package via pip

In the following, you find details for options 1, 3, and 4.
### Docker container
You need docker installed on your machine.
In order to run DFPL, use the following command line:
```shell
docker run --gpus GPU_REQUEST registry.hzdr.de/department-computational-biology/deepfplearn/deepfplearn:TAG dfpl DFPL_ARGS
```
where you replace
- `TAG` by the version you want to use, or `latest` if you want to use the latest available version.
  You can see the available tags here: https://gitlab.hzdr.de/department-computational-biology/deepfplearn/container_registry/5827.
  In general, a container should be available for each released version of DFPL.
- `GPU_REQUEST` by the GPUs you want to use, or `all` if all GPUs should be used (remove `--gpus GPU_REQUEST` if only the CPU should be used)
- `DFPL_ARGS` by the arguments that should be passed to DFPL (use `--help` to see the available options)
In order to get an interactive bash shell in the container, use:
```shell
docker run -it --gpus GPU_REQUEST registry.hzdr.de/department-computational-biology/deepfplearn/deepfplearn:TAG bash
```
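
For example, the following calls use the `latest` image to print DFPL's help text, once on the CPU only and once with all GPUs passed through (a sketch of typical invocations, not tied to any particular analysis):

```shell
# CPU only: print the available dfpl subcommands and options
docker run registry.hzdr.de/department-computational-biology/deepfplearn/deepfplearn:latest dfpl --help

# same call, but with all GPUs made available inside the container
docker run --gpus all registry.hzdr.de/department-computational-biology/deepfplearn/deepfplearn:latest dfpl --help
```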
### Singularity container
You need Singularity installed on your machine. You can download a container with
```shell
singularity pull dfpl.TAG.sif docker://registry.hzdr.de/department-computational-biology/deepfplearn/deepfplearn:TAG
```
- replace `TAG` by the version you want to use, or `latest` if you want to use the latest available version.
  You can see the available tags here: https://gitlab.hzdr.de/department-computational-biology/deepfplearn/container_registry/5827.
  In general, a container should be available for each released version of DFPL.

This stores the container as a file `dfpl.TAG.sif`, which can be run as follows:
```shell script
singularity run --nv dfpl.TAG.sif dfpl DFPL_ARGS
```
- replace `DFPL_ARGS` by the arguments that should be passed to DFPL (use `--help` to see the available options)
- omit the `--nv` flag if you don't want to use GPUs
or you can start a shell script (look at [run-all-cases.sh](scripts/run-all-cases.sh) for an
example)
```shell script
singularity run --nv dfpl.sif ". ./example/run-multiple-cases.sh"
```
It's also possible to get an interactive shell into the container
```shell script
singularity shell --nv dfpl.TAG.sif
```
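
As a concrete end-to-end example, pulling the latest image and printing the help text could look like this (nothing is assumed beyond the registry path given above):

```shell script
singularity pull dfpl.latest.sif docker://registry.hzdr.de/department-computational-biology/deepfplearn/deepfplearn:latest
singularity run dfpl.latest.sif dfpl --help
```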
**Note:** The Singularity container is intended to be used on HPC clusters where your ability to install software might
be limited.
For local testing or development, setting up the conda environment is preferable.
### Set up DFPL in a python virtual environment
From within the `deepFPlearn` directory call
```
virtualenv -p python3 ENV_PATH
. ENV_PATH/bin/activate
pip install ./
```
Replace `ENV_PATH` by the directory where the python virtual environment should be created.
If your system has only python3 installed, `-p python3` may be omitted.
In order to use the environment, it needs to be activated with `. ENV_PATH/bin/activate`.
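
For illustration, a complete sequence might look like this (the `~/venvs/dfpl` path is only an example for `ENV_PATH`):

```shell
virtualenv -p python3 ~/venvs/dfpl
. ~/venvs/dfpl/bin/activate
pip install ./
dfpl --help
```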
### Set up DFPL in a conda environment
To use this tool in a conda environment:
1. Create the conda env from scratch
The [`environment.yml`](environment.yml) file contains all information and necessary packages:
```shell
conda env create -f environment.yml
```
2. Activate the `dfpl_env` environment with
@@ -73,7 +119,7 @@
3. Install the local `dfpl` package by calling
```shell
pip install --no-deps ./
```
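
Putting these steps together, a typical setup session could look like this (a sketch based on the commands above):

```shell
conda env create -f environment.yml
conda activate dfpl_env
pip install --no-deps ./
dfpl --help
```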
## Prepare data
You have several options to work with the DFPL package. The package can be started from the command line, and
you can provide all necessary information as command-line parameters. Check
```shell script
dfpl --help
dfpl train --help
dfpl predict --help
```
However, using JSON files that contain all train/predict options is an easy way to preserve what was run, and you can use
them instead of providing multiple command-line arguments.
```shell script
dfpl train -f path/to/file.json
```
See, e.g., the JSON files under `validation/case_XX` for examples. Also, you can use the following to create template ...
# This file is based on
# https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/dockerfiles/dockerfiles/gpu.Dockerfile#L104
# https://github.com/tensorflow/tensorflow/commit/02956a52930bea96f57401d39a834e13047bad9a
# instead of installing tensorflow-gpu, dfpl is installed, which includes tensorflow-gpu (2.6)
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
#
# THIS IS A GENERATED DOCKERFILE.
#
# This file was assembled from multiple pieces, whose use is documented
# throughout. Please refer to the TensorFlow dockerfiles documentation
# for more information.
ARG UBUNTU_VERSION=20.04
ARG ARCH=
ARG CUDA=11.2
FROM nvidia/cuda${ARCH:+-$ARCH}:${CUDA}.1-base-ubuntu${UBUNTU_VERSION} as base
# ARCH and CUDA are specified again because the FROM directive resets ARGs
# (but their default value is retained if set previously)
ARG ARCH=x86_64
ARG CUDA=11.2
ARG CUDNN=8.1.0.77-1
ARG CUDNN_MAJOR_VERSION=8
ARG LIB_DIR_PREFIX=x86_64
ARG LIBNVINFER=7.2.2-1
ARG LIBNVINFER_MAJOR_VERSION=7
# Let us install tzdata painlessly
ENV DEBIAN_FRONTEND=noninteractive
# Needed for string substitution
SHELL ["/bin/bash", "-c"]
RUN sh -c 'echo "APT { Get { AllowUnauthenticated \"1\"; }; };" > /etc/apt/apt.conf.d/99allow_unauth'
RUN apt -o Acquire::AllowInsecureRepositories=true -o Acquire::AllowDowngradeToInsecureRepositories=true update
RUN apt-get install -y curl wget
RUN apt-key del 7fa2af80
RUN wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.0-1_all.deb
RUN dpkg -i cuda-keyring_1.0-1_all.deb
RUN rm -f /etc/apt/sources.list.d/cuda.list /etc/apt/apt.conf.d/99allow_unauth cuda-keyring_1.0-1_all.deb
RUN apt-key adv --keyserver keyserver.ubuntu.com --recv-keys A4B469963BF863CC F60F4B3D7FA2AF80
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
cuda-command-line-tools-${CUDA/./-} \
libcublas-${CUDA/./-} \
cuda-nvrtc-${CUDA/./-} \
libcufft-${CUDA/./-} \
libcurand-${CUDA/./-} \
libcusolver-${CUDA/./-} \
libcusparse-${CUDA/./-} \
curl \
libcudnn8=${CUDNN}+cuda${CUDA} \
libfreetype6-dev \
libhdf5-serial-dev \
libzmq3-dev \
pkg-config \
software-properties-common \
unzip
# Install TensorRT if not building for PowerPC
# NOTE: libnvinfer uses cuda11.1 versions
RUN [[ "${ARCH}" = "ppc64le" ]] || { apt-get update && \
apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/7fa2af80.pub && \
echo "deb https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 /" > /etc/apt/sources.list.d/tensorRT.list && \
apt-get update && \
apt-get install -y --no-install-recommends libnvinfer${LIBNVINFER_MAJOR_VERSION}=${LIBNVINFER}+cuda11.0 \
libnvinfer-plugin${LIBNVINFER_MAJOR_VERSION}=${LIBNVINFER}+cuda11.0 \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*; }
# For CUDA profiling, TensorFlow requires CUPTI.
ENV LD_LIBRARY_PATH /usr/local/cuda-11.0/targets/x86_64-linux/lib:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/lib64:$LD_LIBRARY_PATH
# Link the libcuda stub to the location where tensorflow is searching for it and reconfigure
# dynamic linker run-time bindings
RUN ln -s /usr/local/cuda/lib64/stubs/libcuda.so /usr/local/cuda/lib64/stubs/libcuda.so.1 \
&& echo "/usr/local/cuda/lib64/stubs" > /etc/ld.so.conf.d/z-cuda-stubs.conf \
&& ldconfig
# See http://bugs.python.org/issue19846
ENV LANG C.UTF-8
RUN apt-get update && apt-get install -y \
python3 \
python3-pip
RUN python3 -m pip --no-cache-dir install --upgrade \
"pip<20.3" \
setuptools
# Some TF tools expect a "python" binary
RUN ln -s $(which python3) /usr/local/bin/python
# copy dfpl sources for installation
# unfortunately, COPY setup.py README.md dfpl/ /deepFPlearn/
# does not work since it copies only the files inside dfpl/, not the directory itself
COPY ./ /deepFPlearn/
# install dfpl
RUN python -m pip install --no-cache-dir /deepFPlearn && pip install --no-cache-dir pytest
# The code to run when container is started.
CMD ["bash"]
# docker run dfpl:latest dfpl DFPL_ARGUMENTS # to run dfpl
# docker run -it dfpl:latest # and an interactive shell
#
# if using the container like an executable, i.e.
# `docker run dfpl:latest DFPL_ARGUMENTS`
# then change the last line in entrypoint.sh to just `dfpl $@`
# note that this does not easily allow obtaining an interactive shell
# (it might be possible with --entrypoint)
LABEL author="Jana Schor, Patrick Scheibe, Matthias Bernt"
LABEL version="1.0"
\ No newline at end of file
# Building the docker container for DFPL
The docker container is built automatically for the latest version of DFPL, and
information on how to download and use it can be found in the main `README.md` of this repository.
This guide is only for those who want to build the container locally.
To build the container, you need docker installed on your machine.
```shell
docker build -t TAG -f container/Dockerfile ./
```
Replace `TAG` by a tag of your choice and run the container with
```shell
docker run TAG dfpl DFPL_ARGS
```
replacing `DFPL_ARGS` with the arguments for dfpl.
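
For example, with an arbitrary local tag (the name `dfpl:local` is only an illustration):

```shell
docker build -t dfpl:local -f container/Dockerfile ./
docker run dfpl:local dfpl --help
```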
# Building the Singularity container for DFPL
If you want to be able to run the container as a normal user, you need to configure it appropriately when
[building Singularity from source](https://sylabs.io/guides/3.4/user-guide/installation.html).
In the configuration step, you should provide
```shell script
$ ./mconfig --without-suid --prefix=/home/patrick/build
```
where the last `prefix` option defines where Singularity will be installed.
All other steps are as pointed out in the documentation linked above.
In order to obtain a Singularity container, the docker container needs to be built first.
Then run:
```
singularity build FILENAME.sif docker-daemon://TAG
```
\ No newline at end of file
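
For instance, if the docker image was built locally with the tag `dfpl:local` (as in the example above), the conversion would be:

```shell
singularity build dfpl.sif docker-daemon://dfpl:local
```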
name: dfpl_env
channels:
- conda-forge
- defaults
dependencies:
# dev requirements
- conda-build=3.21.8
- conda=4.12.0
- pip=22.0.4
- pytest=7.1.1
# application requirements
- jsonpickle=2.1
- matplotlib=3.5.1
- numpy=1.19.5
- pandas=1.4.2
- rdkit=2022.03.1
- scikit-learn=1.0.2
- tensorflow-gpu=2.6.0
- wandb=0.12
\ No newline at end of file
numpy~=1.19.1
pandas~=1.0.5
matplotlib~=3.2.2
wandb~=0.12.11
keras~=2.4.3
scikit-learn~=0.23.1
jsonpickle~=1.4.1
dataclasses
rdkit~=2020.03.4
pytest~=5.4.3
setuptools~=49.6.0
pip==22.0.4
pytest==7.1.1
-e .
\ No newline at end of file
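
Assuming this listing is used as the project's `requirements.txt`, the pinned dependencies (and the package itself via `-e .`) would typically be installed with:

```shell
pip install -r requirements.txt
```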
@@ -5,10 +5,10 @@ with open("README.md", "r") as fh:
setup(
name="dfpl",
version="0.1",
author="Jana Schor, Patrick Scheibe",
version="1.2",
author="Jana Schor, Patrick Scheibe, Matthias Bernt",
author_email="jana.schor@ufz.de",
packages=find_packages(),
packages=find_packages(include=['dfpl', 'dfpl.*']),
long_description=readme_text,
long_description_content_type="text/markdown",
url="https://github.com/yigbt/deepFPlearn",
@@ -16,5 +16,19 @@ setup(
"Programming Language :: Python :: 3",
"Operating System :: OS Independent"
],
python_requires='>=3.6',
install_requires=[
"jsonpickle~=2.1",
"matplotlib==3.5.1",
"numpy==1.19.5",
"pandas==1.4.2",
"rdkit-pypi==2022.03.1",
"scikit-learn==1.0.2",
"keras==2.6.0",
"tensorflow-gpu==2.6.0",
"wandb~=0.12",
],
entry_points={
'console_scripts': ['dfpl=dfpl.__main__:main']
}
)
\ No newline at end of file
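
Because the console-script entry point maps `dfpl` to `dfpl.__main__:main`, the installed command and the module invocation are equivalent; a quick check after `pip install ./` could be:

```shell
dfpl --help
python -m dfpl --help
```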
# Building the Singularity container for DFPL
The Singularity container is built automatically for the latest version of DFPL, and
information on how to download and use it can be found in the main `README.md` of this repository.
This guide is only for those who want to build the container locally.
To build the container, you need [Singularity](https://sylabs.io/guides/3.4/user-guide/installation.html) installed
on your machine.
If you want to be able to run the container as a normal user, you need to configure it appropriately when building
Singularity from source.
In the configuration step, you should provide
```shell script
$ ./mconfig --without-suid --prefix=/home/patrick/build
```
where the last `prefix` option defines where Singularity will be installed.
All other steps are as pointed out in the documentation linked above.
Building the container using the provided `dfpl.def` definition file requires only a few steps:
1. (Optionally) Make some adjustments to the `.def` file
2. Run the container build command
## 1. Make some adjustments to the `.def` file
You can adjust this file to your liking, e.g. adjust the conda environment that is built inside the container,
which is defined in the file `environment.yml`.
## 2. Run the container build command
Building the container needs to be done with sudo rights.
From within the `singularity_container` directory, run the following commands:
```shell script
SING=$(command -v singularity)
sudo $SING build dfpl.sif dfpl.def
```
**Note:** The container will have a copy of the whole `deepFPlearn` directory in it.
Therefore, ensure that you remove old `.sif` files or any other large data-files that you
don't want to include in the container itself. During the automatic building, the `deepFPlearn`
repository is cloned using
```shell
git clone --depth 1 https://github.com/yigbt/deepFPlearn.git
```
to keep the container size at a minimum.