WEBVTT

00:00.000 --> 00:12.920
Okay, so I'll be presenting the F8 architecture.

00:12.920 --> 00:18.960
For those who have been to last year's talk, the first few slides with the motivation

00:18.960 --> 00:25.400
and introduction will be familiar, and the later part will get into some more details.

00:25.480 --> 00:30.840
Now, the F8 is an 8/16-bit architecture.

00:30.840 --> 00:34.760
Now, why do we still bother with 8/16-bit architectures?

00:34.760 --> 00:42.920
Well, one point is that they are still used in huge numbers, in a large number of devices.

00:42.920 --> 00:47.160
Even the laptops most of us are carrying, let's say,

00:47.160 --> 00:51.240
tend to have an 8051, which is an 8-bit core,

00:51.320 --> 00:55.240
as a keyboard controller, or somewhere deeply within the Wi-Fi chip.

00:55.240 --> 01:00.920
For example, the Realtek Wi-Fi chips contain 8051 cores.

01:00.920 --> 01:07.880
Now, in general, there are even 4-bit microcontrollers in use sometimes.

01:07.880 --> 01:14.280
The cycle computer for my bike: I took it apart, and it turns out there's a 4-bit microcontroller in it.

01:14.360 --> 01:20.440
And of course, there's the familiar 32/64-bit area, dominated by Arm and probably soon RISC-V.

01:21.640 --> 01:24.920
But there's still this area of 8/16-bit.

01:24.920 --> 01:30.360
Here, 32-bit devices are too big, not just in terms of their core,

01:30.360 --> 01:34.200
but also in terms of memory requirements, both for program and data storage.

01:35.640 --> 01:40.040
Now, these low-end 8/16-bit architectures are typically programmed in C.

01:40.040 --> 01:44.120
They're not that expensive; the typical range is about a cent to a euro.

01:44.200 --> 01:47.240
So, some of them are a bit cheaper or more expensive.

01:48.280 --> 01:56.280
The data memory tends to be in the range of about 60-70 bytes to about the same number of kilobytes,

01:57.160 --> 02:05.960
and the program memory tends to be a few kilobytes, typically something like 4 to 128 kilobytes of program memory.

02:06.120 --> 02:17.160
The market is dominated by, for example, proprietary architectures, like the STM8 by STMicroelectronics,

02:17.160 --> 02:23.000
and then these ancient things like 8051s, or some of the Z80 derivatives,

02:23.000 --> 02:25.720
and such, where all patents have long expired.

02:28.440 --> 02:34.040
We don't have good modern free architectures.

02:34.120 --> 02:38.200
In the 32/64-bit market, we have RISC-V, which is doing very well,

02:39.000 --> 02:46.600
but we don't really have a free architecture that has enough momentum in the 8/16-bit area,

02:46.600 --> 02:49.640
and that's what I'm trying to change with the F8.

02:50.360 --> 02:56.040
Now, a little bit of background: the Small Device C Compiler (SDCC) is a C compiler targeting 8-bit systems,

02:56.840 --> 03:00.920
in particular, architectures that are hard to target with Clang and GCC

03:00.920 --> 03:02.840
due to their irregularities.

03:03.960 --> 03:09.640
It's a compiler with the other tools you need, like an assembler, linker, simulator,

03:09.640 --> 03:13.400
around it, and it works on a variety of host systems.

03:13.400 --> 03:16.600
It targets quite a lot of different eight-bit architectures.

03:16.600 --> 03:22.440
It has optimizations that make sense for these 8-bit architectures, such as a register

03:22.440 --> 03:28.920
allocator that can deal with irregularities but gets slow for a large number of registers,

03:29.000 --> 03:34.120
which would be a problem if you used it in GCC or Clang, but it's fine for 8-bit architectures

03:34.120 --> 03:42.120
with a small number of registers. The user group is embedded systems microcontroller programmers

03:42.120 --> 03:47.160
and retrocomputing enthusiasts, because of all the 8-bit architectures

03:47.160 --> 03:52.840
that are supported, not just the current ones. Now, I'm a developer of the Small Device C Compiler,

03:53.960 --> 03:58.120
and what I learned about the eight-bit architectures we are targeting,

03:58.200 --> 04:02.360
what makes sense, what makes them a good target for a C compiler,

04:02.360 --> 04:06.360
a lot of that experience, of course, went into the design of the F8.

04:07.320 --> 04:13.000
And a really important point is: you want an efficient stack-pointer-relative addressing mode.

04:13.800 --> 04:18.360
Your local variables, you want to put them on the stack, because that's how you do

04:18.360 --> 04:23.080
reentrant functions in C. If you don't have it, you have to give up reentrancy.

04:23.080 --> 04:26.760
By default, that's what you typically do when targeting the 8051:

04:26.760 --> 04:32.440
you only make those functions reentrant where you really need it,

04:32.440 --> 04:37.000
where you need recursion, indirect or direct, or need reentrancy

04:37.000 --> 04:39.960
because you want to call them from an interrupt handler.

04:39.960 --> 04:45.400
But if you have an efficient stack-pointer-relative addressing mode, then addressing becomes much easier,

04:45.400 --> 04:48.360
and you can do the right thing, and it's the efficient thing to do.
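
NOTE
Editor's sketch, not from the talk: a minimal C illustration of why reentrancy needs stack-allocated locals.
The function names are made up for this example. The second variant keeps its local in static storage,
roughly what a compiler may do when stack access is expensive, and so it breaks under recursion.
```c
#include <assert.h>
#include <stdint.h>
/* Reentrant: 'digit' lives on the stack, so every call gets its own copy. */
uint8_t digit_sum(uint16_t n)
{
    uint8_t digit = (uint8_t)(n % 10u);
    if (n < 10u)
        return digit;
    return (uint8_t)(digit + digit_sum(n / 10u));
}
/* Not reentrant: 'digit' is one static object shared by all calls.
   The recursive call overwrites it before the caller uses it again. */
uint8_t digit_sum_static(uint16_t n)
{
    static uint8_t digit;
    digit = (uint8_t)(n % 10u);
    if (n < 10u)
        return digit;
    uint8_t rest = digit_sum_static(n / 10u);
    return (uint8_t)(digit + rest);
}
/* Usage: only the stack-based version gives the correct digit sum 1+2+3. */
void demo_reentrancy(void)
{
    assert(digit_sum(123) == 6);
    assert(digit_sum_static(123) != 6);
}
```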

04:50.520 --> 04:54.840
Then you want a unified address space. You want to say: okay,

04:55.000 --> 04:58.920
this is a pointer, and you have your instruction for pointer access, and that's it.

04:58.920 --> 05:02.520
You don't want to say: okay, this is a pointer, now let's look at the top byte,

05:02.520 --> 05:06.600
and now we do a switch statement with five different cases; depending on which

05:06.600 --> 05:11.000
address space it is, I need to use a different instruction. That's what you have on, let's say, the

05:11.000 --> 05:17.400
8051, or quite a few of the other, less elegant 8-bit architectures.
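
NOTE
Editor's sketch, not from the talk: what a tagged "generic pointer" costs compared to a unified address space.
The memory spaces, names and three-way tag are hypothetical (the talk mentions five cases); they are modelled
here as plain arrays so the example runs on any host.
```c
#include <assert.h>
#include <stdint.h>
/* Hypothetical separate memory spaces of a non-unified 8-bit machine. */
enum space { SPACE_DATA, SPACE_XDATA, SPACE_CODE };
static uint8_t data_mem[16], xdata_mem[16];
static const uint8_t code_mem[16] = { 0xAA, 0xBB };
/* Generic pointer: a tag byte plus a 16-bit address, 24 bits in total. */
struct generic_ptr { uint8_t tag; uint16_t addr; };
/* Every dereference has to dispatch on the tag at run time. */
uint8_t generic_read(struct generic_ptr p)
{
    switch (p.tag) {
    case SPACE_DATA:  return data_mem[p.addr];
    case SPACE_XDATA: return xdata_mem[p.addr];
    case SPACE_CODE:  return code_mem[p.addr];
    default:          return 0;
    }
}
/* With a unified address space a pointer is just an address: one load. */
uint8_t unified_read(const uint8_t *p)
{
    return *p;
}
/* Usage: both reads work, but only one of them needs the switch. */
void demo_pointers(void)
{
    struct generic_ptr g = { SPACE_CODE, 1 };
    xdata_mem[3] = 42;
    assert(generic_read(g) == 0xBB);
    assert(unified_read(&xdata_mem[3]) == 42);
}
```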

05:17.400 --> 05:19.880
Next point is, it helps to have registers.

05:19.880 --> 05:26.600
Experience shows that the STM8, for example, with its stack-pointer-relative

05:26.600 --> 05:31.880
addressing mode and unified address space, is a bit lacking in registers, so there are a few

05:31.880 --> 05:39.080
corner cases where even an old Z80 can be more efficient. So you don't need a lot of registers,

05:39.080 --> 05:41.480
but having some really helps with code generation.

05:42.360 --> 05:50.840
Then the experience from the Padauk devices is that, yes, you can replace peripheral hardware

05:50.840 --> 05:56.280
by having an extra core or an extra hardware thread where you emulate your peripheral device,

05:57.320 --> 06:02.920
and you can do that in real time, because it's an extra core, separate from the rest of the system.

06:04.520 --> 06:09.400
But you need to think it through fully, and you want it supported at the C level,

06:09.400 --> 06:16.200
as the Padauk devices are typically programmed in assembler, and you notice in the instructions

06:16.200 --> 06:21.400
that, for example, C11 atomics do not map nicely to that instruction set.

06:22.200 --> 06:27.000
So you want support for multithreading, atomics, and thread-local storage

06:27.000 --> 06:31.400
in your instruction set, so you can both program these things in C and be able to

06:32.440 --> 06:35.240
not use a lot of your die space for peripherals.

06:36.200 --> 06:36.840
Yeah.

06:37.400 --> 06:43.240
The point about irregular architectures is: we have a register allocator that can deal with them well,

06:43.240 --> 06:48.600
and then they are fine for code generation. Now GCC, or Clang,

06:48.600 --> 06:52.200
with their typical Chaitin-style graph-coloring register allocators,

06:52.760 --> 06:56.600
can't deal with irregular architectures well. You know, they want

06:58.120 --> 07:03.720
a kind of RISC architecture: every instruction should have its register operands,

07:03.720 --> 07:07.480
and it shouldn't matter which register it is. If you have an instruction,

07:07.480 --> 07:10.520
it should be available for all registers, and should have the same cost.

07:11.560 --> 07:17.560
But if the number of registers is low, and we have small targets, it's fine to deal with irregularities.

07:17.560 --> 07:21.000
You want a good mixture of 8 and 16-bit operations, because you want to do 8-bit,

07:22.280 --> 07:25.400
where you don't need more than 8-bit, because you have little memory,

07:25.960 --> 07:29.240
but an 8-bit address space would be totally insufficient.

07:29.320 --> 07:33.000
You need a 16-bit address space, and therefore pointers are 16 bits,

07:33.000 --> 07:37.480
and then ints are 16 bits, and in C a lot of stuff gets promoted to int.

07:37.480 --> 07:43.720
Often, you can optimize it out, but not always, so you need to be able to deal with 16-bit stuff.
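
NOTE
Editor's sketch, not from the talk: C integer promotion on 8-bit operands. In the first function the
intermediate sum really needs more than 8 bits; in the second the result is truncated immediately,
so a compiler is free to use a plain 8-bit addition. The function names are made up for illustration.
```c
#include <assert.h>
#include <stdint.h>
/* a + b is computed in int (16 bits on such targets, often 32 on a PC),
   so the carry out of bit 7 is not lost before the shift. */
uint8_t average(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b) >> 1);
}
/* The sum is truncated to 8 bits right away, so the upper byte of the
   promoted value never matters and can be optimized out. */
uint8_t wrapping_sum(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}
/* Usage: 200 + 100 = 300 needs 9 bits; truncated it is 300 - 256 = 44. */
void demo_promotion(void)
{
    assert(average(200, 100) == 150);
    assert(wrapping_sum(200, 100) == 44);
}
```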

07:43.720 --> 07:48.920
And yes, as I said, pointers should be 16 bits, and you want a unified address space,

07:48.920 --> 07:54.040
so a pointer is a 16-bit value in memory, and we can use it directly, and we don't have

07:54.840 --> 07:59.400
an upper byte that is a tag for which address space it points into, and that makes it 24 bits,

08:00.680 --> 08:02.680
or things like that.

08:05.320 --> 08:07.400
Now, getting to a few more details:

08:08.760 --> 08:14.680
in our experience, the zero-page addressing that many 8-bit architectures offer is actually not that useful.

08:16.120 --> 08:21.320
So you can use this as a kind of scratchpad memory, but the savings are relatively small.

08:21.800 --> 08:26.760
You save an address byte in your instruction, so it makes the instruction a little bit shorter,

08:26.760 --> 08:31.480
but the user has to decide what goes into that zero-page or scratchpad memory.

08:32.760 --> 08:37.080
It's much better to just have the stack-pointer-relative addressing mode be efficient,

08:37.080 --> 08:41.320
and then the local variables on the stack can be accessed efficiently.

08:43.560 --> 08:48.920
Pointer-relative indexed read instructions, for both 8- and 16-bit offsets,

08:49.880 --> 08:55.560
are important, particularly for things like accessing members of structs or unions.

08:56.200 --> 09:02.680
We want to be able to say: here is the pointer to the struct, now get this member.

09:04.600 --> 09:12.760
And prefix bytes, which some 8-bit architectures have, can actually be a relatively good way to allow additional

09:12.840 --> 09:17.640
operands. Now we want our instructions to be very compact, we can't afford like a

09:19.160 --> 09:25.960
classic RISC format with opcode, target operand, left operand, right operand, because that results in

09:25.960 --> 09:32.360
big instructions. We want short instructions, and if we want to use different operands, then

09:32.360 --> 09:36.600
we handle the rare cases via prefix bytes, so only instructions that actually

09:37.320 --> 09:41.720
do unusual stuff get longer. That means we don't have the RISC principle of every instruction

09:41.720 --> 09:45.160
being the same length, but we have different instruction lengths.

09:46.440 --> 09:51.480
In terms of multiplication: yes, hardware multipliers are expensive, but the cheapest one, the

09:51.480 --> 09:57.320
8 times 8 to 16, is actually already quite useful, because first, you can build bigger multiplications

09:57.320 --> 10:04.440
out of it, and second, the very common case of array indexing is such a multiplication. Yeah,

10:04.440 --> 10:10.200
you have an array of structs; the struct has an odd size, because we don't want to pad it to

10:10.200 --> 10:15.640
a multiple of 64 bits or anything, because again we don't have enough memory. So for a five-byte

10:15.640 --> 10:20.440
struct, we need to do a multiplication by five on the array access, and that is substantially

10:20.440 --> 10:26.120
sped up by this multiplication. And yeah, the other multiplicative operators such as division

10:26.120 --> 10:31.480
and remainder are relatively rare, so it tends not to be worth it to actually have them in hardware.
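
NOTE
Editor's sketch, not from the talk: the two uses of an 8x8 to 16 multiplier mentioned above, written in
portable C. The helper mul8x8 stands in for the hardware multiplier; the struct and function names are
assumptions made for this example, not F8 or SDCC identifiers.
```c
#include <assert.h>
#include <stdint.h>
/* Only byte-sized members: 5 bytes per element, no padding. */
struct sample { uint8_t id; uint8_t value[4]; };
/* Stand-in for a hardware 8x8 to 16 multiplier. */
static uint16_t mul8x8(uint8_t a, uint8_t b)
{
    return (uint16_t)((uint16_t)a * b);
}
/* Byte offset of element i: i * sizeof(struct sample), a single 8x8
   multiply as long as the index and the element size fit into a byte. */
uint16_t element_offset(uint8_t i)
{
    return mul8x8(i, (uint8_t)sizeof(struct sample));
}
/* 16x16 to 32 multiplication built from four 8x8 partial products. */
uint32_t mul16x16(uint16_t a, uint16_t b)
{
    uint8_t al = (uint8_t)a, ah = (uint8_t)(a >> 8);
    uint8_t bl = (uint8_t)b, bh = (uint8_t)(b >> 8);
    uint32_t result = mul8x8(al, bl);
    result += (uint32_t)mul8x8(al, bh) << 8;
    result += (uint32_t)mul8x8(ah, bl) << 8;
    result += (uint32_t)mul8x8(ah, bh) << 16;
    return result;
}
/* Usage: the computed offset matches the compiler's own array indexing. */
void demo_multiply(void)
{
    static struct sample table[8];
    assert(sizeof(struct sample) == 5);
    assert((uint8_t *)&table[3] - (uint8_t *)table == element_offset(3));
    assert(mul16x16(0x1234, 0xABCD) == (uint32_t)0x1234 * 0xABCD);
}
```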

10:32.280 --> 10:38.440
On the other hand, having a multiply-and-add instruction can substantially speed up

10:38.440 --> 10:45.240
wide multiplications, because you have the multiplier, which is already costly, and you want to

10:45.240 --> 10:49.880
use it efficiently, because you are paying that hardware cost, and then ideally you want to

10:49.880 --> 10:56.520
do a big multiplication, 64 bits, you want to use the multiplier as often as possible, and

10:56.520 --> 11:01.160
not spend all your time shuffling data around to make it available to the multiplier and then

11:01.160 --> 11:07.320
store it away, so having a dedicated multiply-and-add instruction helps a lot there. And after all,

11:07.400 --> 11:12.920
multiplications are important not just as the small ones for array indexing, but also for

11:12.920 --> 11:20.440
things like cryptography, classic things like RSA, do multiply big numbers, and if we go into the

11:20.440 --> 11:25.400
post-quantum area, let's say the number-theoretic transform of the Kyber key exchange,

11:26.280 --> 11:36.280
that is also a lot of multiplications. Now, binary-coded decimal: at first, one thinks it's just

11:36.360 --> 11:45.080
a dinosaur from the past, and as a data format it actually is. However, we said division is

11:45.080 --> 11:51.160
rarely used, but still, people often want to print out numbers in decimal, and then being able

11:51.160 --> 11:56.440
to convert a number to BCD and then to ASCII is a cheaper alternative to dividing by 10.
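
NOTE
Editor's sketch, not from the talk: decimal output without division, using the classic "double dabble"
binary-to-BCD algorithm in software. This is a swapped-in illustration of the idea; it does not show
the F8's actual BCD instructions, and the function names are made up.
```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
/* Double dabble: convert an 8-bit binary value to packed BCD using only
   shifts, compares and additions. Result: 0x0HTU (hundreds, tens, units). */
uint16_t to_bcd(uint8_t value)
{
    uint16_t bcd = 0;
    for (uint8_t i = 0; i < 8; i++) {
        /* Add 3 to every BCD digit that is 5 or more before shifting. */
        if ((bcd & 0x000Fu) >= 0x0005u) bcd += 0x0003u;
        if ((bcd & 0x00F0u) >= 0x0050u) bcd += 0x0030u;
        if ((bcd & 0x0F00u) >= 0x0500u) bcd += 0x0300u;
        bcd = (uint16_t)((bcd << 1) | (value >> 7));
        value = (uint8_t)(value << 1);
    }
    return bcd;
}
/* Each BCD digit becomes an ASCII character by adding '0'. */
void to_decimal_string(uint8_t value, char out[4])
{
    uint16_t bcd = to_bcd(value);
    out[0] = (char)('0' + ((bcd >> 8) & 0xF));
    out[1] = (char)('0' + ((bcd >> 4) & 0xF));
    out[2] = (char)('0' + (bcd & 0xF));
    out[3] = '\0';
}
/* Usage: no division or remainder operator appears anywhere above. */
void demo_bcd(void)
{
    char text[4];
    assert(to_bcd(255) == 0x0255);
    to_decimal_string(42, text);
    assert(strcmp(text, "042") == 0);
}
```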

11:57.400 --> 12:03.000
So having a little bit of BCD support in the hardware is actually useful for this use case of

12:03.720 --> 12:09.640
outputting numbers in decimal. Yes, and we want good support for shifts and rotations,

12:09.640 --> 12:15.160
because again, these are common when you're shuffling bits close to the hardware, if you

12:15.160 --> 12:20.120
maybe need floating point somewhere, implemented in software, or also in cryptography.

12:23.800 --> 12:29.800
So, where does that get us? Big picture: an 8/16-bit architecture. This will be an irregular

12:30.120 --> 12:36.040
architecture, because otherwise we don't get the code density we want. That means that the

12:36.040 --> 12:41.640
core gets somewhat bigger, because we have some more complex instructions and instruction decode,

12:42.280 --> 12:49.880
but we are saving a lot on the code memory, because our code is as compact as possible. And then,

12:51.960 --> 12:57.480
because we've seen there are very low-end devices as well, I also wanted to define an instruction

12:57.480 --> 13:03.080
subset where the core becomes even smaller. This is the F8L variant, where the core is only

13:03.080 --> 13:06.920
about half the size of the other, but, for example, the instruction set does not have a

13:06.920 --> 13:17.000
multiplication at all. Now, looking at the details, this is basically the register set

13:17.000 --> 13:22.520
I came up with. Sure, we need a program counter, we need a stack pointer,

13:22.680 --> 13:31.160
both 16 bits. Our architecture uses a flag register with flag bits, meaning we don't

13:31.160 --> 13:36.520
have a direct compare-and-jump instruction, but rather we do a comparison first,

13:36.520 --> 13:42.200
which sets flags in the flag register, and then we jump depending on those, and then we have

13:42.200 --> 13:48.520
our general purpose registers, and in this case there's basically three 16-bit general purpose

13:48.520 --> 13:56.680
registers, which each consists of two 8-bit registers. Basically, this is built up from the

13:56.680 --> 14:03.080
experience from the 8-bit targets supported, such as the STM8, the Z80, and all the

14:03.080 --> 14:09.560
Z80 derivatives, and so on. What might not be perfectly visible is that XL and Y are

14:09.560 --> 14:17.400
actually in bold, because those are kind of accumulators. Such architectures often have so-called

14:17.400 --> 14:23.000
accumulator registers, where many instructions implicitly operate on that one, or only support that one,

14:23.000 --> 14:28.760
as source and destination operand, and then you have these prefix bytes that I mentioned before,

14:28.760 --> 14:34.280
if you want to use a different register instead. So typically, an addition will be something like:

14:35.240 --> 14:40.440
take the value in the accumulator, add something else to it, store the result in the accumulator.

14:40.440 --> 14:47.240
And by default, for the 8-bit instructions, we use XL as the accumulator, and Y for the 16-bit instructions.

14:47.640 --> 14:58.440
If there's any questions on this, I think I'll try to leave time, because it's,

14:59.480 --> 15:04.120
it's really the number of registers, and having them as these two halves is really based on this experience

15:04.120 --> 15:08.440
from the STM8, where we do have an 8-bit accumulator, for example, and two 16-bit registers,

15:08.440 --> 15:13.320
pointer registers, but we can't really use anything else as 8-bit registers, whereas for example,

15:13.320 --> 15:19.800
the Z80, where we actually do have a similar style of register pairs, where often a 16-bit register

15:19.800 --> 15:24.760
can be used as two 8-bit registers, and that also affects code generation.

15:27.960 --> 15:33.960
And since I've been talking about instructions, and if you're not familiar with this style of instructions,

15:34.920 --> 15:42.760
I thought we'd just do a typical example here. There are different instruction classes in

15:42.760 --> 15:48.440
the F8, as usual for such architectures; one of them would be 8-bit instructions that have two operands.

15:50.280 --> 16:00.440
Here, the very first one, ADC XL with another operand, would be a typical case. It's an addition

16:00.440 --> 16:08.840
with carry: we take the value in XL, add the second operand, add the carry, store the result in XL,

16:10.120 --> 16:14.600
and also update the flags, even though that's not mentioned on the slide, such as the carry flag,

16:14.600 --> 16:21.000
to be able to build wider additions, or the zero flag, to use this for equality

16:21.000 --> 16:27.960
comparisons and so on. And then the second operand, as is typical in a CISC architecture,

16:27.960 --> 16:34.200
can just be a register, but it could also be an immediate value, or a value in memory,

16:34.200 --> 16:39.800
either directly addressed with a 16-bit address, or relative to the stack pointer, using the

16:39.800 --> 16:45.240
stack-pointer-relative addressing mode, or an indexed addressing mode relative to another register.
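
NOTE
Editor's sketch, not from the talk: a simplified C model of an accumulator-style add-with-carry and of
chaining it into a 16-bit addition. The flag struct and function names are illustrative assumptions;
the real F8 flag set and its exact update rules are defined in the architecture manual, not here.
```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
/* Simplified model of two flags; the names are illustrative only. */
struct flags { bool carry; bool zero; };
/* Add with carry on an 8-bit accumulator: acc = acc + operand + carry. */
uint8_t adc8(uint8_t acc, uint8_t operand, struct flags *f)
{
    uint16_t sum = (uint16_t)(acc + operand + (f->carry ? 1 : 0));
    f->carry = sum > 0xFFu;
    f->zero = (uint8_t)sum == 0;
    return (uint8_t)sum;
}
/* A 16-bit addition as two 8-bit steps chained through the carry flag. */
uint16_t add16(uint16_t a, uint16_t b)
{
    struct flags f = { false, false };
    uint8_t lo = adc8((uint8_t)a, (uint8_t)b, &f);
    uint8_t hi = adc8((uint8_t)(a >> 8), (uint8_t)(b >> 8), &f);
    return (uint16_t)((uint16_t)hi << 8 | lo);
}
/* Usage: 0xFF + 1 wraps to 0 and sets both carry and zero. */
void demo_adc(void)
{
    struct flags f = { false, false };
    assert(adc8(0xFF, 0x01, &f) == 0x00 && f.carry && f.zero);
    assert(add16(0x12FF, 0x0001) == 0x1300);
}
```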

16:47.480 --> 16:55.800
And the second and third lines here show the variants we get if we use a prefix byte.

16:55.880 --> 17:01.800
Now, the first prefix is a prefix byte for swapping the two operands. So basically, instead of

17:03.240 --> 17:07.480
taking XL, adding something, and storing it there, we use the other location,

17:09.320 --> 17:23.160
add something else to it, and store the result there. Sorry, that's not the second, that's the

17:23.160 --> 17:27.880
third line, this one here, that I was talking about. We basically swap the two operands:

17:27.880 --> 17:32.040
instead of having XL as source and destination, we have the other thing as source and

17:32.040 --> 17:38.680
destination, and we add XL to it. That basically allows us to use one of the other registers,

17:38.680 --> 17:45.320
or a memory location, like we used the XL accumulator before. As it has a prefix byte,

17:45.320 --> 17:50.520
this instruction costs more, but it allows us to make better use of the other registers, and

17:51.480 --> 17:56.760
allowing even a memory operand means we can use this as an atomic instruction.

17:58.280 --> 18:03.400
Not that important for an 8-bit addition with carry, but in general, if you want an atomic

18:03.400 --> 18:11.080
addition, having it operate directly on the memory operand is how you can do it efficiently.
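
NOTE
Editor's sketch, not from the talk: what such an atomic addition looks like from the C side, using the
standard C11 stdatomic.h interface. The counter and function names are made up; whether a given compiler
turns this into a single memory-operand instruction is an assumption, not something shown here.
```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
/* Counter shared between the main program and an interrupt handler or
   another hardware thread. Static storage starts out as zero. */
static _Atomic uint8_t event_count;
/* One read-modify-write on memory. On a core with memory-operand
   instructions this can become one instruction, with no lock needed. */
void record_event(void)
{
    atomic_fetch_add(&event_count, 1);
}
uint8_t events_so_far(void)
{
    return atomic_load(&event_count);
}
/* Usage: two recorded events are both visible afterwards. */
void demo_atomic(void)
{
    record_event();
    record_event();
    assert(events_so_far() == 2);
}
```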

18:11.080 --> 18:19.000
Okay, now on to the second line: we just replace XL by a different register, so instead of swapping

18:19.000 --> 18:23.400
the two operands and having XL as the right operand, we still have the right operand like before,

18:23.400 --> 18:29.400
but instead of XL as the accumulator, we use a different register. So for the second line,

18:29.400 --> 18:34.840
it's always a register that is one of the sources and the destination; only for the swapped

18:34.840 --> 18:38.680
variant in the third line can we actually have memory as the destination.

18:43.880 --> 18:47.960
It works similarly for the 16-bit instructions with two operands,

18:47.960 --> 18:52.120
or if you have an 8-bit instruction with just one operand, like a rotation or shift

18:52.120 --> 18:56.760
instruction. There, of course, it doesn't make sense to swap operands, but you still have

18:56.760 --> 19:02.200
a prefix byte where you replace the XL accumulator by one of the other registers, or even a memory operand.

19:05.080 --> 19:09.240
Typically, in the compiler, you keep commonly accessed values in registers,

19:09.240 --> 19:12.680
but if you're running out of registers, for example, it makes sense to have a decrement

19:12.680 --> 19:17.080
that works directly on memory for your loop counter, if your loop counter is not accessed much

19:17.160 --> 19:24.840
inside the loop otherwise. So far I've only mentioned that there are other instructions,

19:24.840 --> 19:28.440
so here's the rough instruction set overview: we have these 8-bit two-operand instructions,

19:28.440 --> 19:32.520
where we saw the example; we have the one-operand instructions, like a decrement or shift;

19:33.320 --> 19:41.880
similar instructions exist 16 bits wide, not as many, but still enough that we can work well with

19:41.880 --> 19:46.520
integers and with pointers, and then of course there's the load instructions,

19:48.680 --> 19:54.120
where we're moving data around between registers and memory, or even in some corner cases

19:54.120 --> 20:00.840
between memory and memory; and then there are the special-case ones, the other 8-bit instructions,

20:01.960 --> 20:11.080
other 16-bit instructions. Typically, you always have like a few instructions that you need for

20:11.160 --> 20:16.680
extra stuff that doesn't really fit into the other categories, like sign-extension instructions:

20:16.680 --> 20:22.360
especially in the 8/16-bit area, you often work with both 8- and 16-bit data, and if you want to convert a

20:22.360 --> 20:28.200
signed 8-bit value to 16 bits, you need to sign-extend it properly, so having a dedicated instruction

20:28.200 --> 20:35.240
for that is nice. Or something like this multiply-and-add instruction, which gets more complicated;

20:35.240 --> 20:38.600
that would be an example of one of the other 16-bit instructions. And then there are the jumps:

20:39.080 --> 20:42.280
unconditional jumps, conditional jumps, depending on the flag register,

20:43.880 --> 20:50.280
instructions to call subroutines and to return from them, which put the program counter on the stack,

20:50.280 --> 20:58.360
and so on. Yes, to make the hardware implementation a bit easier, all instructions

20:58.360 --> 21:04.440
write at most one 16-bit register and one 16-bit memory location; they can read multiple registers and

21:04.440 --> 21:10.600
read one memory location. So the idea is that you can, in hardware, implement this,

21:10.600 --> 21:21.880
with dual-port RAM and single-port program memory. And to be able to use memory efficiently

21:21.880 --> 21:28.040
and not need any padding bytes, there are no alignment requirements, meaning the architecture allows

21:28.040 --> 21:34.680
16-bit loads and stores to any address, not just to even ones. That, again, makes the hardware a bit

21:34.680 --> 21:44.280
more complicated, but it saves a lot in terms of memory. And there are actually a few

21:45.080 --> 21:51.800
safety and security features, which, from the perspective of a bigger microcontroller, are of course

21:52.360 --> 21:58.920
nothing particularly fancy. I mean, on a RISC-V or on an Arm you have write-xor-execute and so on;

21:58.920 --> 22:03.640
we don't have that. But I've put in some basic stuff: the classic watchdog you want

22:03.640 --> 22:08.200
on a microcontroller, meaning if the system hangs, you have a mechanism to restart it.

22:08.200 --> 22:13.240
Basically, it's a counter that counts down or up, and when it reaches a set value, it just

22:13.240 --> 22:21.800
resets the machine. In your software, you tell the watchdog to be quiet for a while,

22:21.800 --> 22:26.040
all the time, and if you don't do it for too long, the system assumes that it was hanging,

22:26.040 --> 22:32.200
and then it just restarts. The second one is actually wrong on the slide; what it should say is:

22:32.200 --> 22:38.680
writes to address zero do trap, not reads. So if you get, for some reason, a null pointer,

22:38.680 --> 22:43.320
and end up somewhere where you write through it, then you don't get random undefined behavior,

22:43.320 --> 22:50.120
but you get a proper reset, a trap reset. And similarly, for instructions, the zero instruction,

22:50.120 --> 22:55.720
the opcode that's all zero bits, is also a trap. Classically, this is often a no-op,

22:56.760 --> 23:03.640
and in many exploits, this is used, because memory is often zero; things are often

23:03.640 --> 23:10.120
initialized to zero before they're used, and if you manage to start executing some data,

23:10.120 --> 23:15.240
it's all no-ops until you get to a point where you have your gadgets that you can use to build

23:15.240 --> 23:19.640
your exploit. But if the all-zero instruction is actually a trap instruction that

23:19.640 --> 23:27.240
resets the system, this relatively common attack style doesn't work anymore. And of course, after

23:27.240 --> 23:33.240
reset, we have some bits in the reset controller set, so after reset you can check what caused the

23:33.240 --> 23:37.080
reset: was it a power-on reset, was it a watchdog reset, was it a trap reset.
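
NOTE
Editor's sketch, not from the talk: how startup code might decode such reset-cause bits. The bit
assignments and names below are hypothetical placeholders; the real register layout is in the
architecture manual. The status byte is passed in as a parameter so the example runs on any host.
```c
#include <assert.h>
#include <stdint.h>
/* Hypothetical bit assignments, chosen only for this example. */
#define RESET_POWER_ON  0x01u
#define RESET_WATCHDOG  0x02u
#define RESET_TRAP      0x04u
enum reset_cause { CAUSE_POWER_ON, CAUSE_WATCHDOG, CAUSE_TRAP, CAUSE_UNKNOWN };
/* Decode a status byte as it would be read from the reset controller. */
enum reset_cause decode_reset(uint8_t status)
{
    if (status & RESET_TRAP)     return CAUSE_TRAP;
    if (status & RESET_WATCHDOG) return CAUSE_WATCHDOG;
    if (status & RESET_POWER_ON) return CAUSE_POWER_ON;
    return CAUSE_UNKNOWN;
}
/* Usage: firmware could log or count trap and watchdog resets at boot. */
void demo_reset(void)
{
    assert(decode_reset(RESET_POWER_ON) == CAUSE_POWER_ON);
    assert(decode_reset(RESET_TRAP) == CAUSE_TRAP);
    assert(decode_reset(0) == CAUSE_UNKNOWN);
}
```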

23:42.600 --> 23:48.280
So basically that's what I wanted to say about the details of the architecture so far,

23:50.200 --> 23:53.320
but if I have extra time, I'll use it to quickly show the opcode map,

23:54.280 --> 24:01.080
but let's just get to the current state. So there's an F8 port, and also an F8L port,

24:01.080 --> 24:05.880
in SDCC, the Small Device C Compiler, so we have a working C toolchain for the architecture.

24:06.840 --> 24:12.200
I've put in Verilog implementations of the F8: two of the F8, one of the F8L,

24:14.120 --> 24:19.800
mostly proof of concept. I'm not really a CPU designer, even though I have a little bit of computer

24:19.800 --> 24:26.520
architecture background. One is a bit more optimized for speed, the other one more for size.

24:27.640 --> 24:34.200
There's a Git repository, the architecture manual, and some tutorials on how to get started

24:34.200 --> 24:39.640
on some common FPGA boards, for example, the Lattice iCE40 ones.

24:41.640 --> 24:46.600
I think also Gowin; I have had it running on Gowin and on the

24:46.600 --> 24:54.360
Cologne Chip GateMate, so the ones that tend to have free toolchains. I didn't bother trying others.

24:54.360 --> 24:59.160
There's of course the website of the Small Device C Compiler. And by the way, the instruction set is fixed by

24:59.160 --> 25:06.200
now; I'm still shuffling the opcode map around a bit, mostly automated, to see what the

25:06.200 --> 25:12.360
synthesis tools can use very well to make the core a little bit smaller, so

25:12.920 --> 25:18.760
we'll probably see in the next few months the opcode map being fixed, and then the assembler

25:18.760 --> 25:27.480
and simulator being updated accordingly. Okay, so that's it for my talk slides. Now, if you want to

25:27.480 --> 25:38.360
see the opcode map, I hope it's readable, even from the back. Now, we have,

25:38.360 --> 25:46.120
well, maybe I can zoom in a bit.

25:53.000 --> 25:55.720
Ah, yeah, I see it. Thanks.

25:55.720 --> 26:02.360
Better? Okay, so on the upper left, of course, we have the trap instruction; as I said,

26:02.360 --> 26:07.720
zero is a trap. Then we have the typical two-operand 8-bit instructions; we used the addition

26:07.720 --> 26:11.720
with carry as the example, and there you see subtraction, subtraction with carry, addition, addition

26:11.720 --> 26:16.840
with carry, the comparison, which is essentially the same as subtraction, but it doesn't

26:16.840 --> 26:22.440
write its destination; it's just for checking things in an if condition or in a switch statement.

26:22.760 --> 26:32.680
The usual bitwise operations AND, OR, XOR. Next come the one-operand 8-bit instructions,

26:32.680 --> 26:39.480
such as the shifts, right shift, left shift, rotations, increment, decrement, clearing

26:39.480 --> 26:44.280
or setting to zero, and just testing, setting the flag bits depending on the value.

26:44.280 --> 26:48.840
Now you see the first yellow ones here; the yellow ones are F8 instructions that are not in the

26:48.840 --> 26:56.520
simplified F8L subset, and when we now get to the 16-bit instructions, you see that a lot

26:56.520 --> 27:04.840
of them are F8 only, then we have a set of load instructions, next comes the exotic things,

27:04.840 --> 27:12.280
well, not really exotic, but the ones that don't fit into another category, like the exchange

27:12.280 --> 27:19.800
instructions, which swap the two operands, atomically again. The grey ones, by the way, are these

27:19.800 --> 27:33.400
instruction prefixes that I mentioned before. And then the 16-bit loads, LDW; anything

27:33.400 --> 27:40.120
ending in W is a 16-bit instruction, essentially. And we also have a 16-bit XOR and OR,

27:40.840 --> 27:46.600
because XOR is relatively common in cryptography, and the OR also in bit shuffling; actually,

27:46.600 --> 27:52.120
both are quite a bit more common than the need for a 16-bit AND instruction, because in a lot of real

27:52.120 --> 27:59.480
world uses of AND operators in C, we do something like: take this with an AND mask,

27:59.480 --> 28:03.080
take that with an AND mask, do stuff with it, and then OR it together. At the time we

28:03.080 --> 28:08.040
OR it together we really need an OR instruction, but at the time we did the AND, one operand was still

28:08.120 --> 28:12.120
a literal operand where typically one byte was fully ones or fully zeros, so it

28:12.120 --> 28:19.320
got optimized into an 8-bit AND anyway. Then there are the conditional jumps, the sign

28:19.320 --> 28:27.080
extension stuff, multiply-and-add, and so on. Now, scrolling down: this was basically the first page

28:27.080 --> 28:32.840
of the opcode map, and if you use one of the prefix bytes, you get into something like this.

28:32.840 --> 28:39.320
This is the one for swapping operands; you can see that now, for subtraction, XL is on the right side,

28:39.320 --> 28:46.840
and the other operand is one of the other types, and then instead of the swap operand prefix

28:46.840 --> 28:53.240
we then of course also have the other pages for the other prefixes. In this case, the

28:53.240 --> 29:00.360
XL 8-bit accumulator gets replaced by the XH register for this prefix byte, or, down here,

29:00.360 --> 29:07.080
by YL, and so on. So these other pages are just the prefix byte applied to an instruction,

29:07.080 --> 29:15.480
which affects the operands; otherwise everything is the same, with maybe the small exception of the

29:15.800 --> 29:22.520
conditional jumps, where it affects the condition that we are jumping on.

29:28.360 --> 29:33.640
Sorry, the conditional jumps I wanted to highlight are actually easier to see over there.

29:33.640 --> 29:40.280
So this is, for example, a jump-if-carry-flag-set instruction, a jump-if-zero-flag-set

29:40.280 --> 29:46.920
instruction, and a few less common conditions are then moved to the next page.

