Commit 5e1b97b3 authored by Erxleben, Fredo (FWCC) - 136987, committed by Hueser, Christian (FWCC) - 138593

Review blog post, refine it and add extra plot

* Did a review of the blog post about the technology perspective
on the survey results.
* Added a new plot that depicts how much researchers use version
control systems depending on how widely they share their code.
parent e913378b
---
title: "HIFIS Survey 2020: A Technology Perspective"
date: 2020-11-27
authors:
- huste
- hueser
layout: blogpost
title_image: default
categories:
- report
tags:
- survey
- technology
excerpt: >
  The HIFIS Software survey gathered information from Helmholtz
  research groups about their development practice. This post shows some
  insights from a technology perspective and tries to draw some conclusions
  for the future direction of HIFIS Software technology services.
---
At the beginning of 2020, the HIFIS Software team initiated a software survey
targeting employees of the whole Helmholtz Association.
The responses of 467 participants could be considered for the analysis.
The figure below depicts how strongly the different Helmholtz research fields
are represented in this survey.
{:.treat-as-figure}
![Participants per research field]({% link assets/img/posts/2020-10-15-survey-technology/participants_per_research_field.svg %})
With the results of the survey we want to understand how we as HIFIS Software
Services can best support your everyday life as a research software developer.
In this blog post we examine the results from a technology perspective.
On the one hand, we give an overview of the status quo of the participants'
software engineering process, and on the other hand, we try to identify
specific measures.
## Version Control
One of the basic requirements for developing sustainable and high-quality
research software is the usage of a version control system (VCS).
Several systems are available: distributed version control systems like Git
or Mercurial and centralized version control systems like SVN.
In accordance with the trends shown in an analysis by Stack Overflow, we
expected Git to be the most popular tool within Helmholtz.
{:.treat-as-figure}
![Stackoverflow version control systems trend]({% link assets/img/posts/2020-10-15-survey-technology/stackoverflow_vcs.svg %})
Trend of Stack Overflow questions per month. Created via [Stack Overflow Trends](https://insights.stackoverflow.com/trends)
on 2020-10-15.
The figure below shows how the participants answered the multiple-choice
question about which VCSs they use.
{:.treat-as-figure}
![Version control system usage]({% link assets/img/posts/2020-10-15-survey-technology/vcs_percentage.svg %})
A similar diagram has already been discussed in a related
[blog post on results from the survey analysis]({% post_url 2020/11/2020-11-07-survey-results-language-vcs %}).
Here, building on that discussion, we only draw conclusions
from a technological point of view.
Only roughly 10% of the participants state that they do not use a VCS
while developing their research software.
These results indicate that the participants are well aware
that the usage of version control systems is an important aspect of
sustainable software development.
Looking more closely, the figure below shows a trend:
the use of VCSs increases the more widely research software developers share
their source code, from within their research group to their
research organization, their research field or even the general public.
Hence, there might be a relationship between how broadly code is
shared and the usage of VCSs.
If this trend holds true, it illustrates that version control
systems are indeed essential tools for collaborating with other
developers.
{:.treat-as-figure}
![Version control system usage]({% link assets/img/posts/2020-10-15-survey-technology/vcs_usage_per_code_share_category_percentages.svg %})
Next, the responses to the survey are grouped by the six Helmholtz research
fields:
* Aeronautics, Space and Transport
* Energy
* Earth and Environment
* Health
* Matter
* Key Technologies
{:.treat-as-figure}
![Version control system per research field]({% link assets/img/posts/2020-10-15-survey-technology/vcs_usage_per_field.svg %})
In the research field _Aeronautics, Space and Transport_, SVN seems to be
more widespread than in other research fields, while the share
of developers who do not use version control is the lowest among all
research fields.
On the one hand, given the Stack Overflow trend shown earlier, this most
probably indicates that a significant number of comparatively old repositories
still use SVN and that this research field might have a longer tradition of
using VCSs.
On the other hand, it shows that the use of VCSs in this research
field is more prevalent today than in other Helmholtz research fields.
From the data it is also possible to compare the usage of version control
systems with the team size participants usually develop software in.
The result is shown in the figure below:
{:.treat-as-figure}
![Version control system by team size]({% link assets/img/posts/2020-10-15-survey-technology/vcs_usage_per_team_size.svg %})
The number of participants who state that they do not use any
kind of version control clearly decreases with increasing team size.
This suggests a relationship between team size and the use of VCSs.
One reason for the increasing use of VCSs with growing team size might be that
VCSs make collaboration more comfortable and that researchers are aware of
this fact.
Whether the use of VCSs has already become a de facto standard in
research software will be investigated further (e.g. in our next survey).
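The comparison by team size can be sketched in a few lines of Python.
This is only a minimal illustration under stated assumptions: the function name
`no_vcs_share_by_group`, the team size labels and the sample records are
hypothetical and do not reproduce the survey data or the actual analysis code.

```python
from collections import defaultdict


def no_vcs_share_by_group(records):
    """Return, per group, the share of participants who use no VCS.

    `records` is an iterable of (group, uses_vcs) pairs, one per participant.
    """
    totals = defaultdict(int)
    without_vcs = defaultdict(int)
    for group, uses_vcs in records:
        totals[group] += 1
        if not uses_vcs:
            without_vcs[group] += 1
    return {group: without_vcs[group] / totals[group] for group in totals}


# Made-up records; the group labels are assumptions, not the survey's categories.
records = (
    [("alone", False)] * 1 + [("alone", True)] * 4
    + [("small team", False)] * 1 + [("small team", True)] * 9
    + [("large team", True)] * 4
)
shares = no_vcs_share_by_group(records)
```

With this sample, the share of participants without a VCS shrinks as the group
grows, which is the pattern described for the figure.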
On the other hand, 20% of the participants who state that they develop
software mostly on their own do not use version control at all.
This is something we as HIFIS Software Services would like to see change in
the future.
For us, it is important to make people aware that using version control is a
mandatory requirement for software development projects of any scale.
This requires us to make the entry hurdle to using version control systems as
low as possible.
This means that every software developer in Helmholtz must have
access to a suitable and easy-to-use infrastructure to enable this basic
requirement.
Therefore, HIFIS Software Services will offer a GitLab instance that is
usable by every employee of the Helmholtz Association free of charge.
## Software Development Platforms
Using version control systems can be considered the entry point to a world of
platforms that build on this basic requirement.
Even though you can typically use a version control system completely locally,
it really starts paying off when you combine version control with online
platforms such as GitLab, GitHub or Bitbucket.
This not only opens up your project for collaboration but also gives
you access to a whole ecosystem of other extremely useful tools like issue
tracking, merge requests, CI/CD or code reviews.
This is why we were also eager to know which software development platforms
the participants use in their everyday life.
{:.treat-as-figure}
![Software Development Platform Distribution]({% link assets/img/posts/2020-10-15-survey-technology/sw_dev_platforms_percentages.svg %})
The results show that the most widely used platforms among the participants
are GitHub.com and self-hosted GitLab instances, followed by GitLab.com.
About 54% of the participants state that they use GitHub.com, 49% use
self-hosted GitLab instances and about 25% use GitLab.com.
About 13% state that they do not use any of the platforms.
This value is in a similar range to the share of participants who do not use
version control systems.
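The shares above add up to more than 100%, which suggests that participants
could select several platforms.
The following minimal Python sketch shows how such per-option shares can be
computed under that assumption.
The function name `option_shares` and the sample answers are hypothetical and
are not taken from the survey data or its analysis code.

```python
from collections import Counter


def option_shares(responses):
    """Return the share of participants who selected each option.

    `responses` holds one set of selected options per participant.
    The denominator is the number of participants, not the number of
    selections, so the shares may add up to more than 1.0.
    """
    if not responses:
        return {}
    counts = Counter(option for selected in responses for option in selected)
    total = len(responses)
    return {option: count / total for option, count in counts.items()}


# Made-up answers from four hypothetical participants.
answers = [
    {"GitHub.com", "Self-hosted GitLab"},
    {"GitHub.com"},
    {"GitLab.com", "Self-hosted GitLab"},
    {"None"},
]
shares = option_shares(answers)
```

Dividing by the number of participants rather than by the number of selections
keeps each share readable as "fraction of participants who use this platform".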
## Continuous Integration
Continuous Integration (CI) is the practice of merging code
changes into a shared mainline several times a day.
A typical workflow incorporates the automatic building of the software,
the automatic execution of unit tests and finally the automatic deployment of
artifacts, e.g. the documentation or compiled binaries.
The last step is also referred to as Continuous Deployment (CD).
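The build, test and deploy sequence can be illustrated with a toy model in
Python.
This is only a sketch of the idea that stages run in order and that later
stages are skipped once one fails.
Real CI services are configured through their own configuration files (for
example `.gitlab-ci.yml` for GitLab CI), and the function `run_pipeline` and
the stage functions below are hypothetical.

```python
def run_pipeline(stages):
    """Run (name, callable) stages in order and stop at the first failure.

    Returns a list of (name, status) pairs, where status is one of
    "passed", "failed" or "skipped".
    """
    results = []
    failed = False
    for name, stage in stages:
        if failed:
            # A failed stage prevents all later stages from running.
            results.append((name, "skipped"))
            continue
        try:
            stage()
            results.append((name, "passed"))
        except Exception:
            results.append((name, "failed"))
            failed = True
    return results


# Hypothetical stand-ins for real build, test and deployment steps.
def build():
    pass


def unit_tests():
    assert 1 + 1 == 2


def failing_unit_tests():
    raise AssertionError("a unit test failed")


def deploy_docs():
    pass


ok = run_pipeline([("build", build), ("test", unit_tests), ("deploy", deploy_docs)])
broken = run_pipeline(
    [("build", build), ("test", failing_unit_tests), ("deploy", deploy_docs)]
)
```

In the second run the deployment is skipped because the test stage failed.
This is the kind of automated quality gate that CI provides.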
Multiple tools support this kind of software development process.
Some of the tools available at the time of this survey were GitLab CI, Jenkins,
Travis CI and CircleCI.
The results of the survey show a rather diverse picture of the usage of CI
services by the participants.
{:.treat-as-figure}
![Continuous Integration Distribution]({% link assets/img/posts/2020-10-15-survey-technology/ci_service_used.svg %})
About 53% of the participants state that they do not use CI
services at all.
Among the participants who do use CI services, the most commonly used
technologies are GitLab CI (29%), Jenkins (16%) and Travis CI (13%).
Because many Helmholtz centers host their own GitLab instances,
which also allow the use of GitLab CI, we expected GitLab CI to be the most
popular tool among the participants of the survey.
Jenkins can also be self-hosted and is therefore popular and
available in different centers.
Given the popularity of GitHub, especially for open source projects,
it is not surprising that Travis CI is also widely chosen according
to the survey responses.
At the time the survey was created, GitHub Actions was not yet widely
available.
This explains why this service does not show up in the list of chosen tools.
We as HIFIS Software Services would like to see a rise in the overall usage
of CI/CD in the daily software development process.
It offers the chance to automate repetitive tasks and introduces automated
quality checks for code changes before they get merged into the mainline.
Therefore, we want to ensure that every Helmholtz researcher regardless of
their affiliation has seamless access to general purpose resources for CI/CD.
This is why the provided GitLab instance will be equipped with scalable
resources for CI/CD.
With this offer, in combination with proper education, training and
consultation, we hope to see a rise in the general usage of automation
technologies in research software engineering.
---
layout: blogpost
title: "HIFIS Survey 2020: Programming, CI and VCS"
date: 2020-11-27
authors:
- erxleben
title_image: default
categories:
- report
---
## Introduction
At the beginning of 2020, the HIFIS team conducted a survey among Helmholtz
scientists with the goals of learning more about the current practices
concerning research software development and identifying future challenges.
This blog post will present a glimpse into the survey's results and our take
on the gathered data.
Specifically, we will take a look at the distribution of programming languages
across the different research fields as well as the utilization of
_Version Control Systems_ (VCS) in the same context.
Last, a short insight into the prevalence of various
_Continuous Integration_ (CI) systems will be given to round out this blog
post.
## Programming Languages
We asked the survey participants which programming languages they regularly
used for writing research software.
The following heatmap displays the relative usage of the most predominant
programming languages for each research field:
{:.treat-as-figure}
![Plot: Languages by Research field]({{ site.directory.images | relative_url }}/posts/2020-11-07-survey-results-language-vcs/plot_language_by_field_normalized.svg)
All presented numbers are the relative usage of a given language in a given
field.
They might not always add up to exactly 1.00 per field or per language due to
multiple factors:
* Some participants did not answer both questions.
These answers are not represented in the plot.
* Languages that did not reach at least a _5%_ share in at least one field were
omitted to focus on the most prominent ones and make the graphic easier to
read.
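The effect of the second point can be sketched in Python.
This is a minimal illustration, assuming that usage is normalized over all
language selections within a field.
The exact normalization behind the heatmap is not stated here, and the function
name `language_shares` and the sample counts are hypothetical.

```python
from collections import Counter, defaultdict


def language_shares(selections, threshold=0.05):
    """Return per-field language shares, omitting rarely used languages.

    `selections` is an iterable of (field, language) pairs, one per selection.
    A language is kept if its share reaches `threshold` in at least one field.
    """
    counts = defaultdict(Counter)
    for field, language in selections:
        counts[field][language] += 1

    # Assumption: shares are relative to all selections within a field.
    shares = {}
    for field, per_field in counts.items():
        total = sum(per_field.values())
        shares[field] = {lang: n / total for lang, n in per_field.items()}

    kept = {
        lang
        for per_field in shares.values()
        for lang, share in per_field.items()
        if share >= threshold
    }
    return {
        field: {lang: s for lang, s in per_field.items() if lang in kept}
        for field, per_field in shares.items()
    }


# Made-up counts: Fortran stays below 5% in both fields and is omitted.
selections = (
    [("Health", "Python")] * 30 + [("Health", "R")] * 19
    + [("Health", "Fortran")] * 1
    + [("Matter", "Python")] * 60 + [("Matter", "C++")] * 38
    + [("Matter", "Fortran")] * 2
)
result = language_shares(selections)
```

In this example the remaining shares per field add up to 0.98 instead of 1.00,
because the omitted language still counts towards the denominator.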
### What Can We Learn?
The first thing that catches the eye is that Python seems to be very dominant
in every research field.
We have to take this with a grain of salt since the survey did
not distinguish between the outdated, but generally popular, Python 2 and
the current Python 3.
The popularity of the language amongst researchers is not very surprising:
It is well suited for quickly creating small-scale scripts and offers
an extensive choice of libraries for many use cases.
Consequently, our education and training efforts will continue to provide
offers regarding programming in Python and create appropriate courses and
materials to further the knowledge and best practices in this language amongst
scientists and research software developers.
Regarding consultations, we expect the team to receive requests regarding the
porting of older Python 2 applications to Python 3, as well as support
requests for dealing with the variety of virtual environments and package
management options for this language.
The second most frequently selected language was C++, which is a popular
choice in high performance computing and larger applications.
This indicates a potential demand for supporting this language in the future as
well, especially in the context of training and consulting.
A further notable finding is the strong presence of the statistics
language R in the _Health_ and _Earth and Environment_ research fields,
which implies an opportunity to tailor and advertise education and consulting
more towards these areas.
## Version Control Systems
Similarly, we analyzed a second question concerning the
usage of _Version Control Systems_ (VCS) amongst the participants in relation
to their fields of research.
{:.treat-as-figure}
![Plot: VCS Usage by Research field]({{ site.directory.images | relative_url }}/posts/2020-11-07-survey-results-language-vcs/plot_vcs_per_field.svg)
The strong prevalence of Git is apparent at first glance.
SVN is the runner-up: some projects still rely on it for
version control, which, together with a few mentions of CVS, might be an
indicator of older, longer-lived projects.
The number of projects not using any version control at all is comparatively
low, which points toward the usage of VCS being an established step in setting
up projects across all research fields.
From an education perspective it appears to be the right way to continue to
focus on basic and advanced Git courses and to promote version control as one
of the standard practices in every scientist's toolbox.
It can be expected that the consulting team might face requests for help with
migrating projects from SVN or CVS to Git in the future.
## Continuous Integration
As a third question we wanted to know which _Continuous Integration_ (CI)
services the participants use to automate tasks surrounding their projects.
This, again, was a multiple-choice question and the following plot shows the
relative distribution of the given answers:
{:.treat-as-figure}
![Plot: Overall CI Usage]({{ site.directory.images | relative_url }}/posts/2020-11-07-survey-results-language-vcs/plot_ci_service_usage.svg)
One very prominent outcome is that over half of the participants stated that
they do not use any CI at all.
Several possible reasons for this finding come to mind:
* The question was not clear enough and participants who actually use CI were
not aware of that fact.
* Participants are not aware that CI exists.
* Participants do not see any potential benefit of CI for their projects.
* Participants do not know how to set up and use CI.
Practically any project can benefit from employing
_Continuous Integration_ services by automating at least the mundane management
tasks like license checking, documentation generation, style checks, etc.
Hence, all four reasons can be attributed to a lack of awareness and education.
Further, the plot reveals that the currently used CI solutions are (in
descending order of percentage) _GitLab CI_, which was selected by over a
quarter of the participants, _Jenkins_ and _Travis CI_, with all other services
being barely represented.
Building on the insights from this analysis, three actions clearly stand out to
improve CI usage across all projects:
* The education team will have to expand their portfolio and offer more
courses centered around CI usage.
* The popularity of _GitLab CI_ will likely increase the demand to migrate
other projects to this system. It will fall to the consulting branch to be
prepared to deal with such requests.
* The technology team has already begun to offer pre-made recipes for CI
pipelines and has an incentive to grow the collection of ready-to-use solutions
for popular scenarios.
## Conclusion
Thanks to the participants of the HIFIS survey in 2020 it was possible to gain
a first glimpse into the status quo of research software engineering within the
Helmholtz centers. With this data, the needs of the scientists could be assessed
from a bird's-eye perspective and it is possible to determine concrete steps to
offer better support for the scientists at Helmholtz.