WEBVTT

00:00.000 --> 00:10.000
Thanks everyone for attending. My name is Andreas.

00:10.000 --> 00:18.000
I'm a member of the GNU Octave project. I'm a developer for core Octave and I'm also a

00:18.000 --> 00:24.000
maintainer and the primary developer of a few packages, including the statistics and the datatypes packages.

00:25.000 --> 00:33.000
I'm an electronic engineer by training, but I've also done a PhD in biological anthropology,

00:33.000 --> 00:44.000
so I ended up on a different career path, dealing with anthropology and basically doing statistics

00:44.000 --> 00:50.000
and data analysis in a different context than engineering and computer science.

00:51.000 --> 01:00.000
So today I would like to give you my insight about GNU Octave and the GNU Octave ecosystem

01:00.000 --> 01:02.000
in education.

01:02.000 --> 01:12.000
As I've experienced during my research time but also as a lecturer in various universities

01:12.000 --> 01:19.000
I've worked at over the past ten years, but also from the maintainer's and developer's point of view,

01:20.000 --> 01:25.000
because I've been heavily involved with GNU Octave during the past four years

01:25.000 --> 01:31.000
and I've been using it for more than eleven years now for my work and my teaching.

01:31.000 --> 01:43.000
So here we are then: GNU Octave in education, an insight beyond engineering, into statistics and data analysis basically.

01:44.000 --> 01:55.000
So what is GNU Octave? GNU Octave is a scientific programming language which is mostly focused on numerical

01:55.000 --> 01:58.000
computations.

01:58.000 --> 02:10.000
Most people know it as the open source alternative to MATLAB, or the freebie clone of MATLAB, which is kind of true.

02:10.000 --> 02:20.000
I mean, we share the same syntax, but it's also a project in its own right, a free open source project.

02:20.000 --> 02:27.000
So to give you a bit of historical background on this project: it's not new, actually.

02:27.000 --> 02:36.000
It was conceived back in the late 80s, and it was conceived actually as a tool, as an educational tool.

02:37.000 --> 02:42.000
They needed to make a scripting language, so to speak,

02:42.000 --> 02:48.000
In order to help students actually understand the computations in chemical reactions.

02:48.000 --> 02:57.000
So back then the idea was, well, we can do that in Fortran, which was what scientists used at the time.

02:57.000 --> 03:13.000
And then again, the whole rationale behind that was: yes, but we don't want the students paying attention to everything that is going on with the Fortran language, trying to debug, compile and so on.

03:13.000 --> 03:17.000
We want to make a language that is straightforward.

03:17.000 --> 03:29.000
It has a very easy learning curve, so students can actually dedicate their focus and their effort to understanding the underlying mathematics.

03:29.000 --> 03:36.000
So this initial idea still stands today, 36 years later or so.

03:36.000 --> 03:44.000
The development of Octave was started back in 1992 by a guy named John W. Eaton.

03:44.000 --> 03:48.000
Thankfully he's still around with us.

03:48.000 --> 04:05.000
And so the initial release was in 1993, and now we're almost at Octave 11. We've just made a release candidate.

04:05.000 --> 04:15.000
That was about 10 days ago, and hopefully there will be the major release within this month if everything goes well.

04:15.000 --> 04:34.000
So during these 34 years, of course, there have been more than 450 contributors to this project, because 34 or 35 years is quite a long time, as you understand. But nevertheless it's always been kind of

04:35.000 --> 04:42.000
A small project in terms of the actual people that were engaged at any one time.

04:42.000 --> 04:53.000
During these 34 years there's always been just a handful of people engaged for certain periods and then leaving, and so on. So we're not a big project in this respect.

04:53.000 --> 04:59.000
We don't have a large community. We do have a large user base, though.

04:59.000 --> 05:09.000
But we're kind of thin in terms of developers and contributors.

05:09.000 --> 05:12.000
We don't have a foundation supporting us.

05:12.000 --> 05:15.000
I mean, our concept is basically this.

05:15.000 --> 05:21.000
We're just a repository, a code base, and a group of people who are actually working on that.

05:21.000 --> 05:39.000
And of course the license is the GNU General Public License version 3, because the Octave project is actually part of the GNU Project. It was released that way back then and we continue that.

05:39.000 --> 05:45.000
So a few things about Octave: it's written in C++ mostly.

05:46.000 --> 05:55.000
Well, it's an interpreted language of course, and the scripting language of Octave, actually,

05:55.000 --> 05:57.000
Well, it's almost identical.

05:57.000 --> 06:05.000
Well, it is identical to MATLAB, but it has a few extensions that work only in Octave, that are Octave-specific.

06:05.000 --> 06:28.000
And it can be extended by using dynamic libraries written in C++, and this is an integral part of the Octave interpreter, which makes it quite easy to actually implement or link any kind of library into Octave without any intermediate,

06:28.000 --> 06:33.000
Without the need for intermediate libraries or other code.

06:34.000 --> 06:47.000
We use an OpenGL backend for plotting, which is quite useful in most statistical data analysis tasks.

06:47.000 --> 07:02.000
And it comes with a graphical user interface, which I will also refer to later on in my talk, and of course it has a traditional command line interface that you can run.

07:02.000 --> 07:11.000
Last but not least, Octave in itself is also a library.

07:11.000 --> 07:24.000
So whatever you can do with the interpreter, with the language, you can also do by using Octave as a library in your own projects.

07:24.000 --> 07:34.000
So of course when you link to Octave you also have to be a free project, because you have to abide by the GPL version 3 license, of course.

07:34.000 --> 07:38.000
But the potential is there nevertheless.

07:38.000 --> 07:54.000
So the key feature of Octave that sets it apart from other interpreted languages is that it has built-in support for multidimensional arrays and sparse matrices.

07:54.000 --> 08:09.000
And this is basically an integral part of the Octave library, because it was built upon this concept that you need to work with multidimensional arrays.

08:09.000 --> 08:30.000
So basically, when you have a color picture that you need to process somehow, in Octave you don't really need to do some sort of workaround or load some external library, as you do for example in R or Python.

08:30.000 --> 08:42.000
It's already there: you load the data and it's natively multidimensional, and this helps a lot because you can actually get a substantial speedup.

08:42.000 --> 08:54.000
Because of course Octave also supports full broadcasting in all math, comparison and Boolean operations, for both dense and sparse matrices.

08:54.000 --> 09:10.000
So this makes it quite easy to write highly vectorized code, which makes processing in the interpreted language

09:10.000 --> 09:16.000
Quite fast.
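
NOTE
A minimal sketch of the broadcasting described above, using made-up data. The variable names are illustrative and not taken from the talk.
```octave
% Center each column of a matrix without writing a loop.
A = [1 2 3; 4 5 6; 7 8 9];   % illustrative 3x3 data
col_means = mean (A);        % 1x3 row vector of column means
centered = A - col_means;    % the 1x3 vector is broadcast across all rows
disp (centered)
```
The row vector is subtracted from every row of the matrix directly, with no explicit loop or replication.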

09:16.000 --> 09:34.000
Another key feature is that we support nested indexing, which plays very well with multidimensional arrays and with writing vectorized code, especially with complex structures.

09:34.000 --> 09:47.000
As I said, we have an OpenGL backend, so there are extensive plotting capabilities that are built in. We don't have to rely on external packages like ggplot or whatever else.

09:47.000 --> 10:00.000
And of course there is an extensive set of core functions that deal with geometry and linear algebra. We have ordinary differential equation solvers,

10:00.000 --> 10:05.000
A whole set of them. You get statistics, set operations, etc.

10:05.000 --> 10:29.000
And of course, and I consider this one of the very important aspects of Octave, there is an integrated testing suite, which is used for Octave internally but is also used for all the packages that can be used as add-ons to the Octave programming language.
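
NOTE
A small sketch of how tests can sit next to the code they check, assuming a hypothetical file named double_it.m. The function is invented for illustration and is not part of Octave or any package.
```octave
function y = double_it (x)
  % Return twice the input, element by element.
  y = 2 * x;
endfunction
% Lines starting with %! are test blocks picked up by Octave's test function.
%!assert (double_it (2), 4)
%!assert (double_it ([1 2 3]), [2 4 6])
```
Typing test double_it at the Octave prompt would run the two assertions stored in that file.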

10:30.000 --> 10:42.000
So if you've used Octave before, most likely you've heard of Octave Forge, for sure, and Octave Packages basically.

10:42.000 --> 10:53.000
Octave Forge is the legacy system. It started back in 2000, and basically it was kind of

10:53.000 --> 11:06.000
A sibling project to the GNU Octave project that was dealing with the packages that would extend Octave's functionality for specific

11:06.000 --> 11:18.000
Needs and for specific scientific fields. So you would get a statistics package, you would get a package for optimization, you would get geographic packages dealing with...

11:18.000 --> 11:33.000
So, as I said, the Octave community, the Octave developers team, has always been

11:33.000 --> 11:43.000
Slim in terms of the number of people, so the original Octave Forge

11:43.000 --> 11:50.000
Package system came to a stall, basically, somewhere around

11:50.000 --> 11:58.000
2014, so to speak, and there were problems actually maintaining it.

11:58.000 --> 12:04.000
It also had to do with the fact that when it was built back in 2000, it was built with

12:04.000 --> 12:21.000
An old system and old concepts in mind in terms of packaging. So back in 2020 there was a shift: we changed the whole packaging and we moved it to GitHub. The previous

12:21.000 --> 12:25.000
Octave Forge was hosted on SourceForge.

12:25.000 --> 12:42.000
And on GitHub we made good use of the continuous integration capabilities, which are free for open source projects, until now at least.

12:42.000 --> 12:57.000
And the concept was to make an automated system for publishing packages, so that other maintainers, other users, can publish their own code without

12:57.000 --> 13:16.000
The necessity of any of the core developers being actively engaged or involved in this. And the idea behind this was to help people expand the use of Octave.

13:16.000 --> 13:32.000
Because what Octave has mostly been used for so far is linear algebra and matrix computations, and this is why, when you hear about doing statistics and data analysis with Octave,

13:32.000 --> 13:41.000
It sounds a bit weird, right? Because nowadays most people use either R or Python, for example.

13:41.000 --> 14:04.000
But nevertheless the transition that we made did work. Octave Forge used to have something like 55 packages, and during the last years the majority of those were not maintained, whereas with the new system

14:04.000 --> 14:20.000
The Octave Packages index actually has more than 130 packages, with more than 100 of them being actively developed and maintained. So from our perspective

14:20.000 --> 14:24.000
The effort of making this transition did work.

14:24.000 --> 14:40.000
The Octave Packages index is basically a single JSON file that is automatically generated with all the packages listed on the Octave Packages index, and basically what we do is that

14:40.000 --> 14:57.000
We have set up continuous integration to monitor and, most importantly, test all the packages. So any of you can write your own particular package that you want to use in your research project, in the class, wherever, and you can actually

14:57.000 --> 15:17.000
Publish it there and have it readily available for your students and your colleagues to install in Octave. And what we do is make sure that it goes through the continuous integration testing, so that it doesn't break Octave when you install it.

15:17.000 --> 15:31.000
And the other thing that we also have nowadays is that Octave also checks the integrity of what is being downloaded,

15:31.000 --> 15:58.000
Which was a long-standing issue and request from many users, and we did that. Of course, you still have to know what you're downloading, because we are just testing the integrity of the package that you download. But of course you're downloading code in a programming language.

15:58.000 --> 16:20.000
So when you download a package you have to keep this in mind. But anyway, all of this you can just install in Octave with a simple command, pkg install, and the package that you want to install, and this actually makes it quite

16:20.000 --> 16:34.000
Helpful, not only for colleagues but especially in classrooms, at least in my experience, because there have been quite a few times that I just

16:34.000 --> 16:50.000
Had to make something for the classroom, make it available, and then ask the students to download it, either in the lab or on their personal computers, and work with that.
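
NOTE
A sketch of the install step mentioned above, assuming a working internet connection. The -forge option is the form the statistics package documents for its own installation. Other packages or Octave versions may need a different form, so treat this as an assumption rather than the only way.
```octave
pkg install -forge statistics   % download and install the package
pkg load statistics             % make its functions available in this session
pkg list                        % show the installed packages
```
In a classroom, students would type the same lines on the lab machines or on their own computers.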

16:50.000 --> 17:06.000
Which is quite handy in certain situations. So during my research I've mostly been dealing with population statistics in biological anthropology and doing classification on biological parameters.

17:06.000 --> 17:21.000
So using Octave, for the most part, meant dealing with the statistics package, and this is how I started contributing back in

17:21.000 --> 17:30.000
2016-17, I think. And then after a certain point, because there were no maintainers, I said, okay,

17:30.000 --> 17:40.000
I took over the maintenance and the main development, back in 2022.

17:40.000 --> 17:51.000
And because I've been using this for classes as well, I did put substantial effort into expanding it and making it,

17:51.000 --> 18:11.000
Well, not complete, but as complete as I could. So nowadays the latest statistics release is 1.8 and there are more than 450 functions and class objects.

18:11.000 --> 18:30.000
There is support for more than 30 different distributions, and as far as I know it's the only package that fully supports so many distributions in a single package.

18:30.000 --> 18:39.000
And these distributions include random generators, distribution fitting, log-likelihood, etc.

18:39.000 --> 18:57.000
Of course there's a very large set of functions for hypothesis testing and, most importantly, there are 11,000 unit tests that are integrated in the statistics package.

18:57.000 --> 19:22.000
One of the important aspects of Octave, both core Octave and a lot of packages, is that we pay a lot of attention to regression and testing. It's not about writing some code and making it somehow work. We really have to

19:22.000 --> 19:43.000
Make sure that what we produce as a result is correct, at least to the best of our effort, because after all we're making a library for numerical computation, and if the output is not numerically correct it's basically useless, no matter how fast it is.

19:43.000 --> 19:52.000
And of course in the statistics package I've made a lot of

19:52.000 --> 20:08.000
Integrated algorithms for classification and regression. I wrote these because I needed them for my own reasons, but after some point, when I also started teaching, I found it quite handy

20:08.000 --> 20:26.000
That I can use Octave to actually teach statistics, and I'll get to that in a few minutes. So the other thing about the statistics package in particular, well, basically all the packages I maintain, but also Octave, is that

20:26.000 --> 20:35.000
Apart from testing, we also have a focus on good documentation.

20:35.000 --> 20:41.000
And because it is important to be able to.

20:41.000 --> 20:49.000
I mean, when you write the function yourself you know how it works and you know how to call it and so on, but

20:49.000 --> 20:55.000
When you want your students to use it, well, the only way to get there is to actually

20:56.000 --> 21:00.000
Write proper documentation, which is a

21:00.000 --> 21:09.000
Work on its own, though. I mean, writing the documentation for a function usually takes the same amount of time as actually writing the code for the function.

21:09.000 --> 21:14.000
And this is just an example from the online documentation.

21:14.000 --> 21:29.000
From the statistics package. This is the function for the geometric mean, and this documentation is also automatically generated with another package I've written, pkg-octave-doc.

21:29.000 --> 21:42.000
So basically what it does is take all the help docstrings that are embedded in the function files, and also the demos that are available, and produce this online documentation for users

21:42.000 --> 21:50.000
To be able to read how to use the function and also find certain examples.

21:50.000 --> 22:05.000
So working with statistics, well, inevitably you end up needing certain data types, and especially tables.

22:05.000 --> 22:11.000
Which is something that core Octave lacks. And, well, they are available in

22:11.000 --> 22:20.000
MATLAB, and other languages have their own implementations of tables, like R has this data frames concept, etc.

22:20.000 --> 22:30.000
So over the past three years now I've been implementing the datatypes package from scratch.

22:30.000 --> 22:40.000
And the idea was to make table and categorical arrays available, because those are basically the two things that I need for the statistics package.

22:40.000 --> 22:46.000
But doing that, you know, you go down the rabbit hole and you start,

22:46.000 --> 22:53.000
I mean, you realize that then you need to make datetime arrays and duration arrays and string arrays, etc.

22:53.000 --> 22:59.000
So I ended up with the datatypes package, which is,

22:59.000 --> 23:10.000
I mean, it's not production ready. There is still functionality missing, but what is already there is working, because as I said,

23:10.000 --> 23:23.000
Whatever we build, we also build tests for, so we know that if it's working, at least it's working correctly, so it won't mess up your data.

23:23.000 --> 23:28.000
So if you call a function and it's not there, it will not work.

23:28.000 --> 23:38.000
And this is what you get, for example, which is similar to how MATLAB uses tables.

23:38.000 --> 23:53.000
And also, when you do it for yourself during your research, it doesn't really matter. You can have your numbers, your data, in CSV files.

23:53.000 --> 24:11.000
But when you get to the class and you have to teach students how to actually use the data, load them from CSV files and use them to do an ANOVA, etc.,

24:11.000 --> 24:20.000
Being able to show the data in tabular format actually helps a lot.

24:20.000 --> 24:24.000
So I mean we are in the education.

24:24.000 --> 24:38.000
Room of course, and most likely a lot of you will be wondering: okay, nice work that you guys are doing with MATLAB, thank you very much, so what's the education in it?

24:38.000 --> 24:56.000
So, basically, from my own personal experience, which is not huge, I mean, I've been teaching classes for the past five or six years now, mostly to undergraduate students.

24:56.000 --> 25:04.000
And the other thing is that I haven't been teaching in CS.

25:04.000 --> 25:08.000
The students I teach are from the humanities. They're biologists.

25:08.000 --> 25:15.000
They're people doing social sciences, and they need to learn statistics.

25:15.000 --> 25:23.000
So basically you don't have tech students or STEM students.

25:23.000 --> 25:44.000
And this is quite important, because what we are dealing with today in the class is that we have this AI thing, and the question is: to AI or not to AI, basically.

25:44.000 --> 25:54.000
So a lot of people are saying, okay, you will just have the AI code anything you want.

25:54.000 --> 26:03.000
And it kind of works: if you want to plot something, the AI will do it for you today, and somehow it would compile.

26:03.000 --> 26:10.000
But when it comes to teaching students to actually understand the statistical concepts and apply them,

26:10.000 --> 26:19.000
They still have to understand them in order to be able to apply them even if they use an AI assistant to do it for them.

26:19.000 --> 26:30.000
Because at the end of the day, if you go to the grocery and buy something that costs two euros and you pay with five, if you don't know math, well, basic math,

26:30.000 --> 26:38.000
The best calculator in the world will not help you, because you don't know what to do with two and five. In the end, the same applies to statistics.

26:38.000 --> 26:59.000
So from my perspective, what are the key aspects that make a great programming language for teaching students statistics and data types?

26:59.000 --> 27:06.000
Well, to begin with, indexing starts at one instead of zero.

27:06.000 --> 27:19.000
I mean, most of us are developers, programmers and so on, and at the end of the day each one of us has a favorite language, but it's the language that we use the most and that we know.

27:19.000 --> 27:25.000
I mean, you would guess that my favorite languages are Octave and C++. This is what I write, this is what I know.

27:25.000 --> 27:31.000
But first-year students, well, they don't know anything.

27:31.000 --> 27:42.000
So basically every little detail matters in how easy it is to attract their attention and get them involved.

27:42.000 --> 27:50.000
So the other benefit is that the syntax naturally resembles mathematical notation: x equals five, that's it.

27:50.000 --> 27:56.000
You see it on the blackboard, you just type it, and it's the same.
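
NOTE
A short sketch of the two points above, with made-up values. Nothing here is taken from the speaker's slides.
```octave
x = 5;               % reads the same as on the blackboard
v = [10 20 30 40];   % a row vector
first = v(1);        % indexing starts at 1, so this is 10
last = v(end);       % end refers to the last position, so this is 40
squares = v .^ 2;    % element-wise power: [100 400 900 1600]
```
The element-wise operator in the last line applies to the whole vector at once, which is the vectorized style mentioned in the talk.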

27:56.000 --> 28:10.000
And also the intuitive vectorization syntax, which has to do with the way Octave's syntax does array indexing and also broadcasting.

28:10.000 --> 28:20.000
And last but not least, there's the modern yet simple graphical user interface that Octave has, which also has an integrated debugger,

28:20.000 --> 28:36.000
The workspace view and the variable editor, and at least in my experience in the class it has been very, very helpful in actually getting students to understand basic concepts of programming.

28:36.000 --> 28:49.000
And I think this is highly important, and this is why I wanted to talk to you about GNU Octave in education, at least from my own experience over the past few years.

28:49.000 --> 28:54.000
So thank you very much for your attention, and I'm open to any questions.

28:54.000 --> 29:04.000
Thank you very much.

29:04.000 --> 29:11.000
Yeah, any questions? Yes, would you like to?

29:11.000 --> 29:25.000
So, R and Octave: I understand there is a lot of overlap, but what would be the main distinction? And I also want to follow up with another question.

29:25.000 --> 29:29.000
Last year there was a talk about robotics and education.

29:29.000 --> 29:40.000
So with R I know it's not recommended for embedded because of memory reliance on OS and other reasons.

29:40.000 --> 29:52.000
Do you know what the situation is with Octave and its use in embedded systems?

29:52.000 --> 29:56.000
Thank you for the question.

29:56.000 --> 30:02.000
Well, I don't know much about embedded systems, to be honest.

30:02.000 --> 30:12.000
I mean, I know you can write code in MATLAB and have it translated for embedded systems.

30:12.000 --> 30:16.000
I don't think you can do that with Octave.

30:16.000 --> 30:31.000
There are packages in Octave with which you can interface with Arduino, for example, or you can interface with low-level I/O systems through serial ports, USB, etc.

30:31.000 --> 30:36.000
So that's regarding the second question you made.

30:36.000 --> 30:48.000
On the first question, about R: well, R is mostly used for statistical computations.

30:48.000 --> 31:01.000
And you can do a lot of stuff, but as I said on the previous slide, well, it has a very bad syntax.

31:01.000 --> 31:19.000
So, I mean, it's great software, but in terms of education,

31:19.000 --> 31:39.000
When you're in high school, or when you have first-year students who know nothing about programming languages,

31:39.000 --> 31:52.000
Well, you have to explain to them why x, less-than, dash, 5 is equivalent to x equals 5, so to speak.

31:52.000 --> 32:02.000
If you know what I mean. So, for example, if you use Python and you want to do some vectorization,

32:02.000 --> 32:15.000
And you have all this colon, colon, comma, colon, a number, a negative number indexing sort of thing. I mean, if you know Python, it's like, yeah, okay.

32:15.000 --> 32:27.000
But in education we're dealing with students, well, especially in my field, where it's students from the social sciences.

32:27.000 --> 32:41.000
I mean, I put a lot of effort into convincing them that you need some sort of knowledge in data analysis, that you need to be able to actually write five lines of code

32:41.000 --> 32:53.000
And merge two CSV files to do your data analysis, instead of copy-pasting stuff in Excel, you know.

32:53.000 --> 33:06.000
I hope I've answered your question. Anybody else? Is time up? Okay, is there time for a last question, or...

33:06.000 --> 33:24.000
I think there is an ongoing effort by.

33:25.000 --> 33:41.000
There's a company, a French company I think, that has done an integration with Jupyter in WebAssembly, but I don't know much about it, to be honest, to be more informative.

33:41.000 --> 33:56.000
But I know that there is some effort to integrate it. I mean, they do have an opportunity. I don't know much about it.

33:56.000 --> 33:57.000
Okay, thank you very much.

