WEBVTT

00:00.000 --> 00:11.760
Yes, hello, hello, everybody. Welcome to Nikolai's talk. Yes, unfortunately, he wasn't

00:11.760 --> 00:18.600
able to make it. So, yeah, my imposter syndrome was never as bad as today because, yeah,

00:18.600 --> 00:26.880
that's not my talk. Okay, I will try hard to, you know, to deliver what he wanted to deliver.

00:26.880 --> 00:32.400
Nikolai is a friend of mine; he's an amateur musician and a music lover. So, he put an epigraph to

00:32.400 --> 00:40.080
his talk, a saying from Franz Liszt. And he asked me to tell you that this is, like, a free open source

00:40.080 --> 00:49.040
conference, and Liszt was probably one of the most free-spirited people in history. And

00:49.040 --> 00:58.160
mind this epigraph; we will get to it later. Well, why don't we just print it out? Like, we saw examples of,

00:58.160 --> 01:04.960
you know, document-producing technologies. So, probably we have somewhere this Print

01:04.960 --> 01:14.080
menu item, or export to PDF, or export to, like, ODT. Why doesn't it just work? Like, you know,

01:14.160 --> 01:20.560
from practice, it doesn't. So, what do you do when, for example, you have a code listing and it has

01:20.560 --> 01:27.600
a very wide line? You may scale it, just make it smaller, but

01:27.600 --> 01:35.600
if you are doing semantic markup, scaling is not part of the semantics. Or you may

01:35.600 --> 01:42.800
just switch to landscape orientation for this page only. And again, it's not a part of

01:42.800 --> 01:49.520
the semantics. Or should we insert some hints there? It's not clear. Or probably we

01:49.520 --> 01:56.160
would like to do both. Or probably we would like to throw an exception: like, you know, now your

01:56.160 --> 02:03.280
build has failed, and, author, please rewrite your code example so that your snippet contains

02:03.280 --> 02:10.080
short lines. Or probably you would like to wrap them, and again there are options. You can use

02:10.160 --> 02:16.720
line feeds and spaces, but if you care about the validity of your code: somebody copies it from the PDF

02:16.720 --> 02:23.360
and pastes it, and it may break. Or you may use indents, which will work with word processors,

02:23.360 --> 02:30.880
but will not work with PDF. So, choices, choices, choices. Meanwhile on the web, in HTML,

02:30.880 --> 02:36.400
we can just add a horizontal scroll bar and that's it. And this "copy to clipboard" button.
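
NOTE
A minimal sketch of the "fail the build" option for over-wide listings mentioned above. It is not taken from the talk or from Nikolai's tool: check_listing, require_printable and the 80-column limit are hypothetical, and Python is used only for brevity.
```python
MAX_WIDTH = 80  # assumed printable width in characters, not a value from the talk
def check_listing(code: str, max_width: int = MAX_WIDTH) -> list[int]:
    """Return the 1-based numbers of the lines that are wider than max_width."""
    return [n for n, line in enumerate(code.splitlines(), start=1) if len(line) > max_width]
def require_printable(code: str, max_width: int = MAX_WIDTH) -> None:
    """Raise an error so that a documentation build stops on over-wide lines."""
    too_wide = check_listing(code, max_width)
    if too_wide:
        raise ValueError(f"lines too wide for print: {too_wide}")
```
A build step could call require_printable on every listing and let the error stop the build, which leaves the choice between scaling, rotating and wrapping to the author.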

02:36.480 --> 02:42.240
And that's just one example. So, there is a core mismatch: semantic markup meets the rigid

02:42.240 --> 02:49.520
world of print. And, yeah, what's interesting is that this talk is doing the opposite, going

02:49.520 --> 02:55.360
in the opposite direction from the Docling talk that we heard before, because there we were going back

02:55.360 --> 03:02.240
with Docling, from printed media to semantics, and here we are going from semantics to printed

03:02.320 --> 03:10.000
media. And this is also a very hard task. So, the main focus, what Nikolai wanted to present

03:10.000 --> 03:18.880
in this talk, is this Unidoc Publisher repository, a GitHub repository. This is actually the result of an evolution,

03:18.880 --> 03:25.600
because, over the years of trying to solve this problem, he tried three approaches.

03:26.560 --> 03:32.320
The first one was the XSL-FO approach from Asciidoctor. The second one is the Open

03:32.320 --> 03:40.480
Document converter. So, the idea was simple: you have an AsciiDoc document, just produce a nice-looking

03:40.480 --> 03:46.560
OpenDocument file, like the LibreOffice Writer format, from it. And finally, Unidoc Publisher,

03:46.560 --> 03:54.880
which claims to be an any-markup-to-any-printing rendering engine. Quite a serious, quite a

03:54.880 --> 04:04.000
bold claim. Let's see what it does and how. First of all, what formats do we have for printing?

04:04.000 --> 04:08.720
The first that comes to mind is, of course, PDF. Of course, if we have a PDF, we can print it.

04:10.480 --> 04:15.920
Second is word processor formats, right? So, if you are using Word or LibreOffice,

04:15.920 --> 04:23.120
you can print it because there are, you know, page breaks. HTML: who thinks that HTML is a

04:23.200 --> 04:32.000
format for printing? Wow, some people do. Well, actually we have CSS Paged Media,

04:32.000 --> 04:39.680
which is a CSS extension that defines styling specific to printing. But the harsh truth of life

04:39.680 --> 04:49.040
is that although we have had CSS Paged Media for 15 years now, browser support is still limited.

04:49.040 --> 04:55.200
And in practice, the way it is used is via some open source tools

04:55.200 --> 05:04.400
which convert HTML plus paged CSS into PDF. So, it's kind of an intermediary format

05:04.400 --> 05:10.000
rather than, you know, a widely used format. But still, technically, yeah, this is possible.

05:10.720 --> 05:17.840
Okay, how do we render to these formats? Well, if you want to go low level, really low level,

05:17.920 --> 05:24.720
good luck: you can use native PDF generation. There is XSL-FO. There is TeX, of course;

05:24.720 --> 05:32.160
if we are speaking about PDF and printing, we cannot miss it. And, as mentioned before,

05:32.160 --> 05:40.400
HTML plus Paged Media CSS. Also, if we have, you know, word processor files,

05:40.400 --> 05:46.640
we can also render them to printed media, either by direct printing or by converting them to PDF.

05:48.720 --> 05:54.480
The truth is that these technologies are not aligned in a great number of various ways.

05:55.440 --> 06:02.640
So, yeah, we can compare: every way has its own limitations and problems. Like, Apache

06:02.640 --> 06:08.720
FOP has long-standing problems with dots in a table of contents. LibreOffice Writer doesn't support

06:08.720 --> 06:16.240
typography features like keep-with-next within tables. Microsoft doesn't recommend, like,

06:16.240 --> 06:23.520
automating Word, actually, to produce PDF. So, there are lots of problems, and it looks

06:23.520 --> 06:30.800
like, you know, the old joke about those speleologists who meet in a narrow cave, and one says,

06:30.800 --> 06:35.440
I'm coming from the dead end, and the other says, like, I'm coming from the dead end, too.

06:35.440 --> 06:41.680
So, like, dead ends everywhere, and the world of printing is the world of constraints.

06:42.560 --> 06:51.120
And those constraints, they differ for each technology. So, often, you end up supporting several

06:51.120 --> 06:59.040
toolchains. For example, your main output is an exquisite-looking TeX book, and in order to,

06:59.040 --> 07:07.840
you know, navigate reviews or approvals, you are sending LibreOffice files just for this,

07:08.160 --> 07:17.280
you know, workflow. So, there are no universal solutions. And Nikolai recommends using

07:17.280 --> 07:23.120
Unidoc Publisher if at least one of these holds. So, if you don't prepare the

07:23.120 --> 07:29.360
documentation specifically for printing purposes. So, if you are doing something like, you know,

07:29.360 --> 07:35.280
an Antora documentation site, like a huge tutorials website, and occasionally

07:35.280 --> 07:42.320
you just need to present a PDF to somebody. If you are automating the documentation generation,

07:42.320 --> 07:48.640
I'm going to speak about it, like, one hour later. And, like, much of your documentation is not

07:48.640 --> 07:53.360
manually produced, but automatically. And you hope it will look good, no matter what is generated.

07:54.240 --> 08:00.640
And if your output format is one of the word processor formats, this is also an option.

08:02.000 --> 08:11.360
So, where do we start? Like, where did Nikolai start? First, he tried to create an

08:11.360 --> 08:20.320
Asciidoctor OpenDocument converter. So, the idea was the following. So, Asciidoctor

08:20.320 --> 08:27.520
parses its own markup into an abstract syntax tree. You may extend it, you may transform this

08:27.520 --> 08:34.320
AST with the Asciidoctor tree processor and, you know, run each template. So, you are actually writing

08:34.320 --> 08:41.680
a custom Asciidoctor processor. And the writing can be done either with pure Ruby

08:41.680 --> 08:48.080
or with special templates. So, this is how it looked in practice. So, on the left side of this slide,

08:48.160 --> 08:57.760
this is a simplified AST of an Asciidoctor document after parsing. To the right, there is a Slim template.

08:57.760 --> 09:03.440
And, you see, we have intermixed Ruby code: everything which starts with a dash is Ruby.

09:03.440 --> 09:08.240
Everything which doesn't start with a dash is a part of the template. It's just XML output.

09:08.880 --> 09:19.520
So, yeah, you can use the Asciidoctor parser, this extension point that it provides, and build

09:19.520 --> 09:29.520
whatever you like, including the OpenDocument format. It was great, but, yeah, you know, this template

09:29.520 --> 09:36.640
cannot be universal. Everybody wanted their own, you know, particular features. And

09:37.600 --> 09:44.000
it was hard to change just a part of the template. People had to copy this template, paste it,

09:44.000 --> 09:50.240
and do their own work in the copy. That's not how it can be maintained. And then you have to invent

09:50.240 --> 09:56.880
something for styling. By styling, we mean the styling for OpenDocument, for the word processor format.

09:56.880 --> 10:02.320
You know, when you select text and you apply a style from the list of styles. And you say,

10:02.400 --> 10:07.760
okay, you have semantic styles, and the word processors, they do support styling.

10:08.480 --> 10:16.320
Well, actually, these are kind of different things because, like here, both bold and green

10:16.320 --> 10:23.200
are applied to some part of the text, but in LibreOffice or in Microsoft

10:23.200 --> 10:29.280
Word, you can only apply one style. So, you actually need three styles: one for bold, one for green,

10:29.280 --> 10:38.080
and one for bold green, and all the combinations. So, what he had to do was

10:40.000 --> 10:47.840
invent a slightly extended intermediary OpenDocument format, which contains not only standard

10:47.840 --> 10:56.880
attributes, but extended semantic attributes, and then post-process it in order to add missing styles,

10:56.960 --> 11:05.760
like missing combinations of styles. Well, it kind of worked. And one interesting

11:05.760 --> 11:12.160
finding was that users of this approach started to transform this extended OpenDocument

11:12.160 --> 11:19.680
format. This wasn't an intended, you know, feature or intended capability of this product.

11:19.760 --> 11:27.040
Yet Nikolai noticed that people wanted to transform the AST before converting it into something

11:27.040 --> 11:36.480
printable. Also, this idea of post-processing and adding styles separately proved to be useful.
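
NOTE
A rough sketch of the "add the missing combined styles" post-processing step described above, assuming each text run carries a set of semantic attributes. The names style_name and missing_styles are hypothetical, and this is not the converter's actual code (the talk's tooling is Ruby and Kotlin; Python is used here for brevity).
```python
def style_name(attrs: frozenset[str]) -> str:
    """Build a deterministic style name, so that {'green', 'bold'} becomes 'bold_green'."""
    return "_".join(sorted(attrs))
def missing_styles(runs: list[set[str]], defined: set[str]) -> list[str]:
    """Return the names of the combined styles that the runs need but the template lacks."""
    needed = {style_name(frozenset(run)) for run in runs if run}
    return sorted(needed - defined)
```
With runs carrying bold, green, and bold plus green, and a template that defines only bold and green, the single missing style is bold_green. Only the combinations that actually occur get a style, which avoids generating every possible combination up front.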

11:37.680 --> 11:44.320
And also, Nikolai liked very much the idea of using Gradle. Gradle is a build tool from the JVM world,

11:44.400 --> 11:52.400
quite powerful, with a statically typed DSL. So, it has static type checks before you run

11:52.400 --> 12:02.480
this Gradle script. It's magnificent at gluing all the parts together. So, before we go to the second step,

12:02.480 --> 12:12.480
the Unidoc Publisher, some final thoughts: if creating a universal converter is impossible,

12:12.560 --> 12:18.160
we will create a meta-converter, a platform for building converters. So that everybody who

12:18.960 --> 12:25.200
faces the problem of transforming their existing body of documentation into PDF

12:25.840 --> 12:31.840
can simply do it by custom scripting, but without too much effort.

12:32.960 --> 12:40.880
And what are the requirements? Native converter as a reader. We will explain on the next slide

12:41.200 --> 12:48.400
what is meant. So, a rewritable AST. So, yeah, this should be more or less easy to do, right?

12:49.840 --> 12:57.360
And styling as a separate focus. Well, this is just, you know, a lesson, or a finding, from the

12:59.520 --> 13:07.280
previous version. And, yeah, ideally, it should have good integration with CI/CD, with a focus on

13:07.360 --> 13:15.440
homogeneity. What is meant here is that everything must be from the same ecosystem, like either JavaScript,

13:15.440 --> 13:26.480
or Java, or Python, but not a mix, because a mix is often problematic. What do we mean by native

13:26.480 --> 13:34.080
converter as a reader? The idea is that each converter outputs HTML, and HTML is quite semantic.

13:34.800 --> 13:43.040
So, if we take an HTML output as the input, we can actually build a universal thing.

13:43.920 --> 13:52.080
So, Markdown can produce HTML, Asciidoctor can produce HTML, Microsoft Word, whatever, everything,

13:52.160 --> 14:07.520
you know, can produce HTML. So, you can do it, just read it, and there are good, you know, HTML

14:07.520 --> 14:15.680
reading, HTML parsing libraries in open source. If it's in Java, it's jsoup, which is quite, quite good.

14:16.320 --> 14:22.240
So, the experiment, let's convert this presentation, the presentation that you see

14:23.200 --> 14:34.720
into printable form. So, this presentation, by the way, is written in Asciidoctor. So, this is the source

14:34.800 --> 14:45.200
code of Nikolai's presentation. What we want to get is a document like this. So, it's LibreOffice

14:45.200 --> 14:53.280
Writer, and as you can see, it's converted into this printable form. The fact that it is converted

14:53.280 --> 14:58.480
into something else you can see before your own eyes: this presentation is an Asciidoctor

14:58.560 --> 15:07.680
reveal.js pipeline, which just converts it into this clickable, you know, HTML. So, how do we

15:07.680 --> 15:22.400
do this? Using Nikolai's tool. So, yeah, he's using Kotlin internally because, well, it's a JVM approach,

15:22.400 --> 15:30.320
and he's using Gradle with a Gradle script. And the following code listings are actually snippets

15:30.320 --> 15:36.960
taken from the scripts. So, every code snippet that you can see in this presentation

15:36.960 --> 15:45.600
is actually included from the presentation source code. We will share a GitHub repository. So,

15:45.600 --> 15:50.800
you will see that this is actually not something copied and pasted, but that it's included. So, this is

15:50.800 --> 16:00.400
workable code, which actually works. I will speak a bit more about this in my next presentation.

16:00.400 --> 16:11.920
So, the boilerplate: this is a snippet of a Gradle build script. And what does it do?

16:11.920 --> 16:18.160
Like, first we need the HTML. So, we are using AsciidoctorJ, a library which allows us

16:18.160 --> 16:25.040
to build HTML from Asciidoctor, it's a Java library, to build this HTML. Then we are

16:25.040 --> 16:30.480
adding a template. The template here is the starting point. It's just, you know, an empty

16:30.480 --> 16:37.920
Word or LibreOffice file, which we are populating with content. Then some other technical

16:38.000 --> 16:50.320
details, right, that we need. But if we omit this part, which is commented out, and just

16:50.320 --> 16:59.840
run it without any further customizations, we will get something like this, let me show. So,

16:59.840 --> 17:07.760
this is the default transformation. So, already not bad, probably. So, as you can see,

17:07.760 --> 17:14.960
there is some, you know, output. You can see page headers, but on the first page, you have

17:14.960 --> 17:23.600
a page header as well. The first slide looks ugly because of, you know, the sizes of images.

17:25.600 --> 17:36.080
The epigraph is lost. So, we need to fix it somehow, right? So, the idea is that, by default,

17:36.160 --> 17:41.600
you are getting a pretty good result, but if you want an ideal result, you should, like,

17:41.600 --> 17:47.520
get your hands dirty and do some abstract syntax tree transformations. That's it. So, how do you do this?

17:47.520 --> 17:53.520
After parsing, you get access to the abstract syntax tree. For example, you need to

17:53.520 --> 18:01.280
extract the first section to build this beautiful header. So, you just filter this out by

18:01.280 --> 18:08.240
source tag name, like, take the first section, and make a title out of it. So, what does this first

18:08.240 --> 18:15.200
section look like in Asciidoctor? So, this is the source code, and actually, yeah, this is quite

18:15.200 --> 18:21.920
interesting because, technically, this is an Asciidoctor page which includes a part of itself

18:23.680 --> 18:29.760
within the Asciidoctor page. So, yeah, this is like a listing,

18:29.840 --> 18:36.560
a part of the listing of this Asciidoctor source. So, we are going to extract semantic variables from

18:36.560 --> 18:43.200
there. Like, you can see roles here: full name, title, photo, bio and stuff. So, this is how we're

18:43.200 --> 18:50.560
doing this. We have access to the AST, so we just, you know, filter out these roles and assign

18:50.560 --> 18:57.760
them to a bunch of Kotlin variables. That's it. And then, having these Kotlin variables,

18:58.720 --> 19:08.720
we can just rearrange, rebuild, reconstruct the header, the beginning of this document.
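
NOTE
A small sketch of the idea above: pulling role-tagged fragments out of converter HTML into named variables. It assumes that roles arrive as class attributes on well-formed HTML. RoleCollector, extract_roles and the role names are hypothetical, and the tool itself does this in Kotlin on its own AST, so this only illustrates the technique in Python.
```python
from html.parser import HTMLParser
# Elements that never get a closing tag, so they must not be tracked on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input", "link", "meta", "source", "track", "wbr"}
class RoleCollector(HTMLParser):
    """Collect the text inside elements whose class attribute names a wanted role."""
    def __init__(self, roles):
        super().__init__()
        self.roles = set(roles)
        self.found = {}
        self._stack = []  # one entry per open element: the role it carries, or None
    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        classes = (dict(attrs).get("class") or "").split()
        self._stack.append(next((c for c in classes if c in self.roles), None))
    def handle_endtag(self, tag):
        if tag not in VOID and self._stack:
            self._stack.pop()
    def handle_data(self, data):
        # Text belongs to every enclosing element that carries a wanted role.
        for role in self._stack:
            if role is not None:
                self.found[role] = self.found.get(role, "") + data
def extract_roles(html, roles):
    """Return a dict that maps each role found in the HTML to its stripped text."""
    collector = RoleCollector(roles)
    collector.feed(html)
    collector.close()
    return {role: text.strip() for role, text in collector.found.items()}
```
The resulting dict plays the part of the "bunch of variables" from the talk: the values can then be fed to whatever builder reconstructs the header.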

19:08.720 --> 19:16.000
So, what we have here, this table, table row group, is the builder for the OpenDocument format.

19:16.000 --> 19:21.280
It can be some other format, but in this case, we are building it. And we are using these, you know,

19:21.280 --> 19:32.240
photo, bio, contact variables that we extracted from our abstract syntax tree. So, what do we have

19:32.240 --> 19:40.800
in the end? Let me show it once again. So, we have a beautiful-looking, you know,

19:40.800 --> 19:50.480
document, which contains all the slide contents. It contains all the speaker notes. And also,

19:50.560 --> 19:56.480
we have the epigraph. We didn't lose it. It used some, you know, unsupported

19:56.480 --> 20:02.320
role or whatever feature of, you know, Asciidoctor, which kept it out of the, you know, default

20:02.320 --> 20:09.360
export. Now, it's being shown, and it's being shown in its correct place in the result.

20:13.680 --> 20:14.480
Sorry?

20:14.480 --> 20:18.720
Could you enable the unprintable character? Unprintable characters?

20:23.520 --> 20:28.480
Yes, if it's possible via a builder like LibreOffice Builder. So, it's, like,

20:32.000 --> 20:34.240
I see, you mean this one, okay?

20:35.920 --> 20:43.120
Yeah, yeah, yeah, yeah, you wanted this, yeah, sure, sure, of course. Yeah, yeah, yeah. So, it's,

20:43.440 --> 20:52.320
you wanted to see if it's not all, like, spaces, right? Okay, okay, okay. No, no, it's, it's building, like,

20:52.320 --> 20:57.200
it's building, it's structurally. Yes, it's building it structurally. I'm going to skip this one.

20:57.200 --> 21:03.680
It's, like, highly technical, and Nikolai knows better. By the way, Nikolai is available in

21:03.680 --> 21:11.680
the Element chat for this group. So, he will be happy to answer everything. A bit about testing.

21:11.760 --> 21:19.120
So, it's kind of a lot of code already, this transformation code. And we need to make sure that,

21:19.120 --> 21:23.600
if we are maintaining it, if we are keeping it for some, you know, documentation project,

21:23.600 --> 21:29.440
we put some regression testing on it. So, the idea is quite simple. It's

21:29.440 --> 21:38.800
snapshot testing. So, what we do is we prepare small snippets, like a small table, or

21:38.960 --> 21:48.720
some, you know, combination of styles. And we ask this pipeline to output a picture,

21:48.720 --> 21:57.840
like, of the printed result. So, we have an approved picture just in Git. And if, due to some

21:57.840 --> 22:04.720
version changes or something, the picture changes, something is shifting, right,

22:05.360 --> 22:12.240
we see it immediately. Our CI build fails. And if we are okay with this shift,

22:12.240 --> 22:18.000
we may just re-approve it, that is, say, okay, now this version is kind of fine. So,

22:18.000 --> 22:23.760
this is how it's gonna be. But if something breaks, like, something like this, a table disappears

22:23.760 --> 22:29.120
completely, right? In this case, it will be obvious that we need to fix the code.
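
NOTE
A minimal sketch of the approve-and-compare loop described above, assuming the pipeline hands back the rendered picture as bytes. check_snapshot and approve are hypothetical names, and the byte-for-byte comparison is a simplification: it is stricter than comparing pictures visually and is not how the project's own tests are confirmed to work.
```python
import hashlib
from pathlib import Path
def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of the rendered output."""
    return hashlib.sha256(data).hexdigest()
def check_snapshot(rendered: bytes, approved: Path) -> bool:
    """True when an approved snapshot exists and matches the rendering byte for byte."""
    return approved.exists() and digest(approved.read_bytes()) == digest(rendered)
def approve(rendered: bytes, approved: Path) -> None:
    """Accept the current rendering as the new approved snapshot, to be committed to Git."""
    approved.write_bytes(rendered)
```
A CI job would fail whenever check_snapshot returns False; if the change is acceptable, someone runs approve and commits the updated file.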

22:30.000 --> 22:39.520
Okay. So, what did we get as the result? So, we got everything CI-friendly. What can

22:39.520 --> 22:46.000
be more CI-friendly than, you know, a build script, a Gradle build script, because Gradle is a build tool.

22:46.000 --> 22:54.080
And everything is built with a DSL within Gradle. Like, there are no declarations.

22:54.080 --> 23:05.040
What is meant here is that this product doesn't offer some default

23:06.240 --> 23:13.360
behavior that you can override with some, you know, knobs and switches. And it was definitely,

23:13.360 --> 23:18.960
like, in principle, done the other way around. So, if you want to change something,

23:18.960 --> 23:28.000
you need to get your hands dirty and do some AST transformations. And it has a typed AST,

23:28.000 --> 23:35.920
and clean and testable code. Well, let's get to the conclusion. So, three things. Printing is engineering:

23:35.920 --> 23:42.480
designed, coded, tested, automated. Printing is a lossy transformation. Not only, you know,

23:42.480 --> 23:48.240
from printed media to semantic markup, but also vice versa, it is a lossy transformation. Some

23:48.240 --> 23:53.840
semantics cannot survive it. Keep rendering logic programmable, under your control.

23:53.840 --> 24:00.320
And have a look, check out this Unidoc Publisher and this repository, which contains

24:00.320 --> 24:07.680
these exact slides and the way how, like, printed documents can be built from them. Thank you very much.

24:07.680 --> 24:08.640
Questions?

24:09.280 --> 24:23.300
Thanks.

24:23.300 --> 24:25.120
Um, JavaScript probably makes.

