Linking a GitLab repo to the GEUS Dataverse
This post is a companion to the one here about linking a GitHub repo to the GEUS Dataverse. I recommend reading that post first for a general overview of the steps and set-up components needed.
The steps for linking a GitLab repo to the GEUS Dataverse are much the same, but there are some key differences.
Using GitLab CI/CD to construct a pipeline
GitLab CI/CD is similar to GitHub Actions in that you can construct jobs that run automatically, either at set times or in response to events on the repo. The pipeline is defined in a yml file named .gitlab-ci.yml, located in the top directory of your repo. The contents of this file are fairly similar to the GitHub Actions yml workflow.
# GitLab-to-Dataverse uploader
image: python:3.9

dataverse_uploader:
  variables:
    # DATAVERSE_API_TOKEN is expected to be defined as a CI/CD variable in the project settings
    DATAVERSE_TOKEN: "$DATAVERSE_API_TOKEN"
    DATAVERSE_SERVER: "https://dataverse.dataverse.dk"
    DATAVERSE_DATASET_DOI: "doi:10.22008/FK2/IPOHT5"
    GITHUB_DIR: "./"
    DELETE: "true"
    PUBLISH: "false"
  script:
    - apt-get update && apt-get install -y git python3-pip
    - git clone https://github.com/IQSS/dataverse-uploader.git dataverse-uploader
    - cd dataverse-uploader
    - pip install -r requirements.txt
    # Log the settings for debugging; the token is left out so it does not end up in the job log
    - echo "$DATAVERSE_SERVER" "$DATAVERSE_DATASET_DOI" "$CI_PROJECT_URL" "$DELETE" "$PUBLISH"
    # DATAVERSE_DATASET_DOI already includes the "doi:" prefix, so it is not added again here
    - python dataverse.py "$DATAVERSE_TOKEN" "$DATAVERSE_SERVER" "$DATAVERSE_DATASET_DOI" "$CI_PROJECT_URL" -r "$DELETE" -p "$PUBLISH"
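One thing that is easy to get wrong when passing these variables along is the doi: prefix, which should only appear once in the DOI argument. The sketch below shows one way the argument list could be assembled and checked in Python before the uploader is called. The helper names normalise_doi and build_uploader_command, the dummy token and the gitlab.example.com URL are made up for illustration; only the argument order is taken from the job above.

```python
def normalise_doi(doi: str) -> str:
    """Return the DOI with exactly one leading 'doi:' prefix."""
    doi = doi.strip()
    # Strip any number of existing prefixes, e.g. 'doi:doi:10.1234/ABC'
    while doi.lower().startswith("doi:"):
        doi = doi[len("doi:"):]
    return "doi:" + doi


def build_uploader_command(env: dict) -> list:
    """Assemble the dataverse.py argument list in the order used by the job."""
    return [
        "python", "dataverse.py",
        env["DATAVERSE_TOKEN"],
        env["DATAVERSE_SERVER"],
        normalise_doi(env["DATAVERSE_DATASET_DOI"]),
        env["CI_PROJECT_URL"],
        "-r", env["DELETE"],
        "-p", env["PUBLISH"],
    ]


# Placeholder values for illustration only; in a real job they would come from os.environ
example_env = {
    "DATAVERSE_TOKEN": "dummy-token",
    "DATAVERSE_SERVER": "https://dataverse.dataverse.dk",
    "DATAVERSE_DATASET_DOI": "doi:10.22008/FK2/IPOHT5",
    "CI_PROJECT_URL": "https://gitlab.example.com/group/project",  # hypothetical URL
    "DELETE": "true",
    "PUBLISH": "false",
}

command = build_uploader_command(example_env)
# Print everything except the token so the secret stays out of the log
print(" ".join(command[:2] + ["***"] + command[3:]))
```

Building the command as a list rather than one long string also makes it straightforward to pass to subprocess.run without worrying about shell quoting.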
Scheduling a pipeline
Once a pipeline has been created, how often it runs can be set using the Schedules settings under the CI/CD menu in GitLab. Schedules use cron syntax, so a job that runs every day at 12:00 would look something like this:
0 12 * * *
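To make the five fields a bit more concrete, here is a minimal Python sketch that labels each field of a cron expression and checks whether a given time matches it. It is a simplification for illustration only: it understands `*` and single integers, not ranges, lists or step values, and the function names describe_cron and matches are made up here rather than taken from GitLab.

```python
from datetime import datetime

FIELD_NAMES = ("minute", "hour", "day of month", "month", "day of week")


def describe_cron(expression: str) -> dict:
    """Split a five-field cron expression into named fields."""
    fields = expression.split()
    if len(fields) != 5:
        raise ValueError("expected five space-separated fields")
    return dict(zip(FIELD_NAMES, fields))


def matches(expression: str, moment: datetime) -> bool:
    """Return True if `moment` matches the (simplified) cron expression."""
    actual = {
        "minute": moment.minute,
        "hour": moment.hour,
        "day of month": moment.day,
        "month": moment.month,
        # cron counts Sunday as 0, while isoweekday() counts Sunday as 7
        "day of week": moment.isoweekday() % 7,
    }
    return all(
        value == "*" or int(value) == actual[name]
        for name, value in describe_cron(expression).items()
    )


print(describe_cron("0 12 * * *"))
print(matches("0 12 * * *", datetime(2024, 1, 1, 12, 0)))
print(matches("0 12 * * *", datetime(2024, 1, 1, 12, 1)))
```

Reading the fields left to right, the expression says minute 0 of hour 12, on any day of the month, in any month, on any day of the week.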
Another option is to have the pipeline run after a trigger event has happened on the repo. This can either be defined in Settings >> CI/CD >> Pipeline triggers or be added to the top of the .gitlab-ci.yml file with the workflow and rules parameters:
workflow:
  rules:
    - if: $CI_PIPELINE_SOURCE == "push"
The example above runs the pipeline whenever something is pushed to the repo.
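As far as the GitLab documentation describes it, a pipeline is only created when at least one workflow rule matches, so a push-only rule would also stop scheduled pipelines from being created. The Python sketch below is a toy model of that behaviour, not GitLab's implementation: the function workflow_allows is made up for illustration, and each rule is reduced to a simple "variable equals value" comparison.

```python
def workflow_allows(rules, variables):
    """Toy model of workflow rules.

    Each rule is a (variable_name, expected_value) pair standing in for
    `if: $NAME == "value"`. The pipeline is created if any rule matches.
    """
    return any(variables.get(name) == expected for name, expected in rules)


# The rule from the example above
push_only = [("CI_PIPELINE_SOURCE", "push")]
# The same rule plus a second one that also lets scheduled pipelines through
push_or_schedule = push_only + [("CI_PIPELINE_SOURCE", "schedule")]

for source in ("push", "schedule", "web"):
    variables = {"CI_PIPELINE_SOURCE": source}
    print(
        source,
        workflow_allows(push_only, variables),
        workflow_allows(push_or_schedule, variables),
    )
```

If the uploader should run both on pushes and on a schedule, the equivalent in .gitlab-ci.yml would be a second `- if:` entry under rules that compares $CI_PIPELINE_SOURCE with "schedule".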
Further reading
Terminology for CI/CD yml workflow
Gitlab CI/CD workflow rules documentation