WEBVTT

00:00.000 --> 00:11.000
Okay. Thanks. So, my name is Ullrich Hafner and I'm going to give you a talk about

00:11.000 --> 00:19.000
unified quality feedback across CI and CD pipelines. What I would like to show is how

00:19.000 --> 00:26.360
important it is to have quality monitoring that is somehow unified across the build systems

00:26.360 --> 00:36.360
or the CI systems. So, first a couple of words about me. I started as a Jenkins developer.

00:36.360 --> 00:45.360
I've been active there for almost 20 years. I built the quality monitoring tools in Jenkins,

00:45.360 --> 00:53.360
the warnings plugin and the coverage plugin. In the beginning, I used Jenkins primarily for

00:53.360 --> 01:00.360
continuous integration in industrial pipelines. So, there I have seen how

01:00.360 --> 01:08.360
quality monitoring of industrial projects works. Now I have switched topics: I moved to education,

01:08.360 --> 01:14.360
and I'm now teaching software engineering and programming at the university.

01:14.360 --> 01:22.360
And there I realized that we have the same topics. We want to see how students

01:22.360 --> 01:31.360
perform while they are programming. So, we need quality monitoring there as well. So, we have

01:31.360 --> 01:37.360
the same topics and it's just a little bit different interpretation of the results.

01:37.360 --> 01:45.360
So, my talk will show you a little bit about modern CI pipelines and how they build quality

01:45.360 --> 01:52.360
reports and quality feedback. We've already seen that you are using different CI systems

01:52.360 --> 02:00.360
to show results. Quality feedback is quite fragmented these days. So, we have

02:00.360 --> 02:07.360
a lot of tools that produce different data. We had a talk about Fortran a couple of minutes ago;

02:07.360 --> 02:14.360
they produce test results from Fortran. And when you are using Maven, you produce test results

02:14.360 --> 02:20.360
in Maven, or you produce coverage results, or you produce static analysis results from SpotBugs

02:20.360 --> 02:28.360
or CheckStyle or things like that. So, we have a lot of reports, a lot of data.

02:28.360 --> 02:35.360
And yeah, the problem is the data that is shown is totally different if you are using

02:35.360 --> 02:42.360
GitHub or if you are using GitLab or if you are using Jenkins. So, there is no consistency

02:42.360 --> 02:49.360
at all. So, I think what we need is something that is unified for all these build

02:49.360 --> 02:58.360
tools. As I mentioned, I started with Jenkins, and yeah, Jenkins exposed this,

02:59.360 --> 03:07.360
yeah, the need for a shared quality model, because in Jenkins we have a lot

03:07.360 --> 03:13.360
of plugins that show some quality information. We have a plugin for unit testing.

03:13.360 --> 03:19.360
We have several plugins that show coverage information. And we have a warnings plugin that shows

03:19.360 --> 03:27.360
warnings. But everything is showing some kind of quality data. And some plugins even show

03:27.360 --> 03:33.360
it differently from build to build. So, for me, it was clear that there is a missing piece.

03:33.360 --> 03:40.360
We need a quality model that is shared across all those plugins in Jenkins. And we need

03:40.360 --> 03:46.360
one that is shared across different CI systems, that is the same in GitHub, the same in GitLab,

03:46.360 --> 03:53.360
and the same in Jenkins. And this quality model should gather all this quality information.

03:54.360 --> 04:01.360
And finally, it should enable aggregation. It should provide trends. It should provide feedback

04:01.360 --> 04:08.360
for industry people or for students. And it should provide some quality gates when we say

04:08.360 --> 04:19.360
when we fail the build because of some quality problems. So, I took the code that I programmed

04:20.360 --> 04:27.360
in Jenkins in the warnings plugin and the coverage plugin, and extracted the common code base.

04:27.360 --> 04:37.360
I currently use Jenkins as a reference implementation. And the interesting thing I noticed is that we can use the same

04:37.360 --> 04:45.360
quality model in the different systems. So, it does not make sense to have different libraries for each system.

04:45.360 --> 04:51.360
So, we use one quality model that can be used in all of these continuous integration services.

04:51.360 --> 04:59.360
So, finally, we should have GitHub and GitLab produce the same output on the same build.

04:59.360 --> 05:09.360
It should not matter if we use continuous integration on Jenkins or on GitLab. It should produce the same output with the same data.

05:10.360 --> 05:20.360
So, we centralize the semantics: what is a quality gate, what build tools do you have, what quality tools do you have?

05:20.360 --> 05:26.360
And we use a unified model that can be used across all these systems.

05:26.360 --> 05:34.360
Maybe this will be a little bit clearer if I show you the architecture of the whole structure.

05:34.360 --> 05:45.360
So, I tried to provide a small architecture image where you can see how the plugins, or the model, behave.

05:45.360 --> 05:53.360
So, basically, we have different quality dimensions, I'd say. We have static analysis results.

05:53.360 --> 05:59.360
These are cumulative. So, we have 10 warnings, 100 warnings, something like that.

05:59.360 --> 06:05.360
Then we have code and mutation coverage results. These results are more relative.

06:05.360 --> 06:12.360
And then we have test results, and test results are more binary. So, failed or not failed.

06:12.360 --> 06:18.360
And then we have software metrics like complexity and things like that; they behave differently again.

06:18.360 --> 06:30.360
So, how are we dealing with these dimensions? We put the dimensions here, you see the four arrows here, into separate quality models.

06:30.360 --> 06:35.360
That means we have one specialized quality model here per dimension.

06:35.360 --> 06:45.360
And we have specialized models, sorry, specialized parsers, that transform the output of the build into these quality models.

06:45.360 --> 06:54.360
So, we have parsers, for instance, for CheckStyle or for SpotBugs or other tools, that feed into the static analysis data.

06:54.360 --> 07:06.360
Or we have parsers for JaCoCo or for Cobertura that take the coverage results and feed them into our quality model.

07:06.360 --> 07:18.360
We have test results, JUnit XML files, or we have software metrics from PMD, and they feed into these specialized quality models.

07:18.360 --> 07:27.360
The parsers are the only thing that needs to be provided when we want to support different, new tools.

07:27.360 --> 07:40.360
For instance, I'm not sure if we have Fortran. If someone wants to report Fortran unit tests, we need a parser that takes the output of Fortran and feeds it here into our testing model.

07:40.360 --> 07:45.360
And then you can see Fortran results in a unified perspective.
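The parser-and-model split described here can be sketched roughly like this. This is a hypothetical illustration in Java; the type names (TestResult, TestReportParser, LineBasedParser) and the report format are invented for the example and are not the actual API of the speaker's libraries.

```java
import java.util.List;

public class ParserSketch {

    // The unified model: every tool's output ends up in this shape.
    public record TestResult(String name, boolean passed) {}

    // One parser per tool format; the model stays the same.
    public interface TestReportParser {
        List<TestResult> parse(String reportContent);
    }

    // A new tool (say, a Fortran test runner) only needs a new parser.
    public static class LineBasedParser implements TestReportParser {
        @Override
        public List<TestResult> parse(String reportContent) {
            // Assume one "name:PASS|FAIL" entry per report line.
            return reportContent.lines()
                    .map(line -> line.split(":"))
                    .map(parts -> new TestResult(parts[0], "PASS".equals(parts[1])))
                    .toList();
        }
    }
}
```

Everything downstream of the parser (evaluation, gates, feedback) only ever sees the unified model, which is why a new tool costs one parser and nothing else.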

07:46.360 --> 07:55.360
And this is where contributors are required, or come in. So, most of the parsers are written by contributors.

07:55.360 --> 08:12.360
So, I started with five parsers that I used in my projects, but now we have 150 parsers in the static analysis model, so everybody who wants to show warnings from whatever tool can show them in my models.

08:12.360 --> 08:28.360
And when we have the unification, the specialized parsers and the quality model, then we can start with a unified approach. We have a unified quality evaluation at the end that processes the results.

08:28.360 --> 08:39.360
And another key point, I need to mention this: the quality evaluation is totally decoupled from the build.

08:39.360 --> 08:45.360
So, the project is still built with the tools that you are used to.

08:45.360 --> 09:00.360
So, when you are using, for instance, Java, you are using Maven and the Java compiler; when you are using C, you can use GCC or something like that, and you are using Make. It doesn't really matter.

09:00.360 --> 09:13.360
We decoupled the build, and what is important for us is only the results that we process. So, each build produces a lot of reports.

09:13.360 --> 09:33.360
So, for instance, when you are using Java, you have a console log where you see the warnings of the Java compiler, you see XML files from CheckStyle, you see XML files from SpotBugs, from Cobertura; all tools produce some kind of reports here.

09:33.360 --> 09:43.360
And these reports come from your build system, and if a format is not supported here, well, actually we need some contributions, then it will be supported as well.

09:43.360 --> 10:02.360
And here we start with our unified quality feedback. That means we take the reports, parse them, transform them into our model, and then we are starting our unified quality evaluation.

10:02.360 --> 10:08.360
That means we don't evaluate the builds, we evaluate the outputs of your builds.

10:08.360 --> 10:24.360
Let's see how that works for multiple CI systems. So, we have this unified quality evaluation, which is of course independent of the CI system.

10:24.360 --> 10:42.360
And then, what do we produce? We produce a lot of different things. So, we produce, for instance, scores: how good is your change, how many failures do you have in the tests, what is the coverage, what are the static analysis results?

10:42.360 --> 10:54.360
So, we show you trends, how it works out for the last 10 releases, for instance, or we show you quality gates.

10:54.360 --> 11:06.360
And this is one of the important things. We have quality gates that enforce that you have zero warnings, for instance, or that your code coverage is 50% or something like that.

11:06.360 --> 11:14.360
That is something important. This is also generalized. This is available for all CI systems.

11:14.360 --> 11:29.360
Then, we have these models, and what we need is some feedback for the users. So, we also have consistent feedback with the same content, the same semantics, for all CI systems.

11:30.360 --> 11:39.360
That means we produce feedback and say, okay, the code coverage is not good enough, or there are too many warnings, et cetera.

11:39.360 --> 12:04.360
And this feedback we need to present in the CI system. And here we need some customization, because Jenkins has a user interface, where you have HTML and JavaScript, where you can produce a result, but in GitHub and GitLab you only have pull requests and Markdown. So, that needs to be customized here.

12:04.360 --> 12:24.360
But when we have this common core, we can have a short look at the quality feedback. The quality feedback is implemented via quality gates, and these quality gates are for us the main control mechanism for your project.

12:24.360 --> 12:42.360
We have different kinds of quality gates, and they are the same in Jenkins and GitHub or GitLab. So, we have quality gates that are absolute, which makes sense maybe for a greenfield project. So, you require 70% global code coverage in your code.

12:42.360 --> 12:56.360
But yeah, sometimes it does not make sense to measure the global coverage, because you have a legacy project, like in Jenkins; we have a lot of code that is not tested, but the new code should be well tested.

12:56.360 --> 13:11.360
So, we also have relative thresholds for our quality gates. That means we can check what the new code is, what the code coverage of only the changed code is, and how many warnings have been introduced in the new code.

13:11.360 --> 13:20.360
So, this is also very important for users of our tooling, that we have relative quality gates.

13:21.360 --> 13:36.360
And finally, this is something we are using very often in Jenkins: we also have thresholds based on a reference build. That means when you have your main branch, you can see how it behaves from release to release.

13:36.360 --> 13:57.360
And the idea behind these quality gates is that you say: maybe, yeah, we want to improve even further, but we do not want to say it needs to be 90%; we just say it should be better than the previous release. So, these are the three types of quality gates you can choose from.

13:57.360 --> 14:05.360
So, the goal for the quality gates is not to punish, it's just to improve the results.
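A rough sketch of these three gate types in Java. The method names and the way thresholds are passed are invented for illustration; they are not the actual plugin API.

```java
public class QualityGates {

    // Absolute: the whole project must reach a fixed threshold,
    // e.g. 70% global code coverage for a greenfield project.
    public static boolean absoluteGate(double totalCoverage, double threshold) {
        return totalCoverage >= threshold;
    }

    // Relative: only the changed (new) code must reach the threshold,
    // which works for legacy projects with untested old code.
    public static boolean newCodeGate(double newCodeCoverage, double threshold) {
        return newCodeCoverage >= threshold;
    }

    // Reference-based: the current build must not be worse than the
    // last build on the main branch.
    public static boolean referenceGate(double current, double reference) {
        return current >= reference;
    }
}
```

The point of keeping all three behind one evaluation step is that Jenkins, GitHub, and GitLab all report the same pass/fail decision for the same build.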

14:05.360 --> 14:19.360
So, once we have defined the quality gates, the next question is how we represent them. In Jenkins we have the user interface, but in GitHub and GitLab, we don't have a user interface.

14:19.360 --> 14:31.360
So, what we do is we provide structured pull request feedback. That means the scores, quality gates, and trends are shown in the pull request.

14:31.360 --> 14:47.360
We use Markdown for this, and maybe the best thing is if I show you an example. This is, for instance, quality feedback in GitHub, where you see the results of your build; you see you have some tests here,

14:47.360 --> 15:02.360
and code coverage of new code; you see code coverage of existing code; you configure what you want to see here. So, you can see everything that you have configured: the tests, static analysis, or code coverage.

15:02.360 --> 15:19.360
What you also can see is the quality gate status. That means for each quality gate that you define, you see the status: okay, the overall test success rate is okay, it must be 100%; the line coverage of new code is 90%, et cetera.

15:19.360 --> 15:47.360
So, you have different views, and all are shown in the pull request. And if you are a techie and want to dig a little bit deeper, you can even have a look at the diff, and each diff will provide some different visualizations, where we have annotations in GitHub, where you see: okay, here a mutation has survived, or we have a warning or something like that.
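Rendering such pull request feedback is essentially turning gate results into Markdown. A minimal sketch; the table layout and the GateResult type are invented for the example, and the real plugins produce much richer output.

```java
public class MarkdownFeedback {

    public record GateResult(String name, String value, boolean passed) {}

    // Build a Markdown table that a CI job can post as a PR comment.
    public static String render(java.util.List<GateResult> gates) {
        StringBuilder md = new StringBuilder("## Quality Gates\n\n");
        md.append("| Gate | Value | Status |\n|---|---|---|\n");
        for (GateResult gate : gates) {
            md.append("| ").append(gate.name())
              .append(" | ").append(gate.value())
              .append(" | ").append(gate.passed() ? "✅" : "❌")
              .append(" |\n");
        }
        return md.toString();
    }
}
```

Because the output is plain Markdown, the same string works as a GitHub pull request comment and as a GitLab merge request note.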

15:47.360 --> 16:11.360
So, this is something we use to show the quality results. And another point: I already mentioned that I'm working at a university and I'm using quality monitoring for students as well. And so we noticed that when we are doing this for students, we can use the same thing.

16:11.360 --> 16:25.360
We need the same quality model, we have the same quality evaluation; the only thing that changes is the interpretation. For students, we want a different interpretation.

16:25.360 --> 16:36.360
I try to summarize this in a table. So, in industry pipelines, we have quality gates, and these quality gates decide if something goes into a release or not.

16:36.360 --> 16:56.360
For student projects, for assignments that students have in my courses, they get points for code coverage and for warnings, and finally they get a grade, an automatic grade, where they can see how they are doing in their learning, or where they get some feedback.

16:56.360 --> 17:18.360
And I have hundreds of students, and it's very simple to give them a first feedback before I do a manual review. And in projects, in real projects, you need pull request reviews, and a quality gate is something that helps to simplify your reviews.
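The student interpretation is just a different mapping over the same results. A sketch, where the point weights and the grade scale are invented for the example, not the actual grading scheme:

```java
public class AutoGrading {

    // Award points proportional to coverage, minus a penalty per warning.
    public static int score(double coveragePercent, int warnings) {
        int points = (int) Math.round(coveragePercent); // up to 100 points
        points -= warnings * 2;                          // 2 points per warning
        return Math.max(points, 0);
    }

    // Map points to a grade; industry pipelines would instead map the
    // same numbers to a pass/fail quality gate.
    public static String grade(int points) {
        if (points >= 90) return "A";
        if (points >= 75) return "B";
        if (points >= 60) return "C";
        return "needs work";
    }
}
```

The key design point from the talk: the evaluation stays shared, and only this last mapping differs between a release gate and a student grade.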

17:18.360 --> 17:37.360
So, the lessons learned: yeah, CI systems differ less than we expected. The quality models are the interesting part, but the CI systems come and go, and it doesn't really matter for my tools.

17:37.360 --> 18:06.360
If a new star comes along, then we will port the user interface, but the model will stay the same. The scoring we are using is a little bit more complex, not just simply failing the build, but putting the scoring into a library helps to reduce the maintenance costs. So, we have a single module that we can show in different CI systems.

18:06.360 --> 18:23.360
And finally, yeah, the key lesson I have with this system, or with the implementation of the system, is that it is totally independent of the CI system; we can replace it anytime.

18:23.360 --> 18:38.360
It is independent of the user interface; we use Markdown, which is visible in GitLab and GitHub. And yeah, finally, it's open source; that's also important to get contributions for different parsers.

18:38.360 --> 18:51.360
And yeah, finally, I think the CI systems build software, this is one part, and the other part is we should have quality feedback, and this is something I am providing with my tools.

18:51.360 --> 19:01.360
So, yeah, try them in your projects; I hope it will help to improve the quality of your projects as well.

19:01.360 --> 19:09.360
So, thank you for your attention, and now I'm ready for some questions.

19:09.360 --> 19:28.360
Can you use this to test software that has machine learning models and neural networks?

19:28.360 --> 19:34.360
Okay, the question is if I can use this for software that uses a machine learning model in it.

19:34.360 --> 19:42.360
Yeah, I'm trying to create software that uses machine learning neural networks, but I don't have quality monitoring.

19:42.360 --> 19:55.360
Yeah, the question is if it works with software that uses a machine learning model, and yeah, it can, if your tools produce some output that I can parse.

19:55.360 --> 20:05.360
So, if your tooling produces a report of the tests, a report of the coverage, a report of the warnings, then it is possible to use it.

20:05.360 --> 20:16.360
If there are no parsers yet, we need to provide new parsers, but that's typically a one-day thing; I get a lot of parsers from different areas.

20:16.360 --> 20:24.360
So, if it's not supported, have a look how complex your output is, and then we can integrate it.

20:24.360 --> 20:29.360
Yeah?

20:29.360 --> 20:33.360
The question is where does the auto-grading run?

20:33.360 --> 20:36.360
It's part of the CI system.

20:36.360 --> 20:40.360
So, in GitHub, it's a GitHub Action that you are binding in.

20:40.360 --> 20:49.360
In GitLab, I didn't find a similar thing, so it's part of a Docker container, and you call the action in the Docker container.

20:49.360 --> 21:05.360
And it takes the output of your build, so it must have access to your build files, and then it produces a result.

21:05.360 --> 21:09.360
Yeah, there's currently a student working on this part of the project.

21:09.360 --> 21:32.360
We are recording the data, the reports, for each build and storing it in the build, and in a pull request or merge request build, we search for the last build on the main branch, and then we take both result files, and both result files are compared against each other.

21:32.360 --> 21:45.360
In Jenkins, I think it's a little bit simpler, because we have references in the model, but in GitHub, we need to point to the last build of the main branch, which is not always the right one.

21:45.360 --> 21:50.360
Yeah, but this is something we're currently improving.
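The reference-build comparison described here boils down to diffing two stored reports. A simplified, hypothetical sketch; the Report shape is invented for illustration, and the real reports carry far more data than two numbers.

```java
public class ReferenceComparison {

    public record Report(int warnings, double coverage) {}

    // A positive warning delta or a coverage drop means the change got worse
    // relative to the last main-branch build.
    public static String compare(Report current, Report reference) {
        int newWarnings = current.warnings() - reference.warnings();
        double coverageDelta = current.coverage() - reference.coverage();
        return "warnings: " + (newWarnings >= 0 ? "+" : "") + newWarnings
             + ", coverage: " + (coverageDelta >= 0 ? "+" : "") + coverageDelta + "%";
    }
}
```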

21:51.360 --> 22:19.360
Okay, the question is how we report trends. In GitHub and GitLab we don't have trends yet, because I don't see where I should show them. In Jenkins, we have trends, because we have it in a database, but in GitHub, I don't have this database. So, this is something we need to add, or I'm not sure how we can do that in GitHub or GitLab.

22:19.360 --> 22:24.360
Okay, another question, yeah?

22:24.360 --> 22:28.360
Yeah, I think now we're going to get those AI agents that are creating the code and everything.

22:28.360 --> 22:35.360
Have you thought about how this type of system fits into that?

22:35.360 --> 22:39.360
Have you thought about the code generated by AI?

22:39.360 --> 22:46.360
I don't know, but then I think this type of system is much more important, to guarantee what gets onto the main branch.

22:47.360 --> 22:58.360
Yeah, I'm not sure how this relates to the project. So, the question is about AI generated code,

22:58.360 --> 23:01.360
and how the system fits with that.

23:01.360 --> 23:09.360
How does AI generated code fit with my code base? Well, I'm currently totally independent of that.

23:09.360 --> 23:20.360
I'm taking the results of a build, and if the build is produced from code that has been generated by AI, it still just takes what the build produces.

23:20.360 --> 23:27.360
So, I have no knowledge about the content, and I don't provide the tools; the tools are provided by you.

23:27.360 --> 23:32.360
I just take the results of the tools.

23:32.360 --> 23:33.360
Yeah?

23:33.360 --> 23:34.360
Yeah.

23:34.360 --> 23:38.360
You mentioned everything is decoupled from the CI/CD system.

23:38.360 --> 23:39.360
Yeah.

23:39.360 --> 23:41.360
You basically provide a framework.

23:41.360 --> 23:42.360
Yeah.

23:42.360 --> 23:45.360
For the trends, you integrate with Jenkins then.

23:45.360 --> 23:50.360
Doesn't it make sense to create a separate database,

23:50.360 --> 23:51.360
where everything that's moving,

23:51.360 --> 23:56.360
all the information from out there, is integrated, and Jenkins is then looking there for the relevant information?

23:56.360 --> 24:08.360
Yeah, the question is if we shouldn't create a real application that has a database where we can put all the data into. Yeah, this would be a good idea.

24:08.360 --> 24:14.360
Maybe this is a good master thesis for one of my students, but we don't have it yet.

24:14.360 --> 24:16.360
So yeah, it would be possible.

24:16.360 --> 24:17.360
Yeah.

24:17.360 --> 24:19.360
The one at the back, yeah.

24:20.360 --> 24:27.360
So, you can have a look at these links.

24:27.360 --> 24:34.360
We have 150 formats for the static analysis and 10 or 15 formats for code coverage.

24:34.360 --> 24:38.360
So you can have a look if your format is supported.

24:38.360 --> 24:44.360
And as already mentioned, most formats take only a couple of hours to implement a parser for.

24:44.360 --> 24:48.360
So, you just need to transform the text input into my Java model.

24:48.360 --> 24:55.360
And that's really easy to provide; nowadays maybe even an AI can write such a parser.

24:55.360 --> 24:59.360
Okay.

24:59.360 --> 25:00.360
Okay.

25:00.360 --> 25:01.360
Thank you.

