WEBVTT

00:00.000 --> 00:12.960
Hi everyone. For the next talk, we have Diomidis Spinellis here, who will speak to us about

00:12.960 --> 00:20.080
how they found the V4 tape and uploaded the data to the Unix history repo.

00:21.040 --> 00:31.920
Hello, good morning. As you know, UNIX has defined the way modern computing works.

00:32.720 --> 00:38.960
It was developed at the world-famous AT&T Bell Laboratories, a place associated with

00:38.960 --> 00:46.000
no less than eight Nobel Prizes and three Turing Awards. And yet it started its life out of

00:46.000 --> 00:53.360
a failure. AT&T, General Electric and MIT were building a system called Multics in the 1960s

00:53.360 --> 00:59.920
that was a bit too ambitious for the capabilities of the time. So AT&T pulled out of the project,

00:59.920 --> 01:05.920
it wasn't going too well. And Ken Thompson and other scientists found themselves without a subject of

01:05.920 --> 01:12.640
study. So Ken Thompson, later joined by Doug McIlroy, Joe Ossanna and Dennis Ritchie, went on

01:12.640 --> 01:19.600
and developed the first unnamed system on a PDP-7. It had some very interesting ideas in it.

01:19.600 --> 01:24.880
And based on the success of that system, they created a funding proposal to Bell Laboratories,

01:24.880 --> 01:31.760
they needed to do that even in that well-funded place for a more powerful computer. Initially they

01:31.760 --> 01:39.680
asked for a PDP-10, which cost about half a million dollars at the time. Bell Labs balked at the cost,

01:39.680 --> 01:46.000
and in the end they got a PDP-11, which cost a tenth of that price. And maybe the lesser

01:46.000 --> 01:52.880
capabilities of that system influenced the direction of Unix. Over the 1970s, a large group of

01:52.880 --> 01:58.160
people, you see them here at a later age, they were very young and powerful at the time,

01:58.720 --> 02:06.000
they developed and created a series of Unix editions. These were numbered following the

02:06.000 --> 02:12.480
editions of the manual that came out each time. In various states, you see here the seven

02:12.480 --> 02:19.040
so-called Bell Labs Research Editions, from the first one to the seventh one. So the source code was

02:19.040 --> 02:25.840
not versioned; even at the last edition there was no version control system. But the manuals were

02:25.840 --> 02:32.400
versioned through their printing, by adding an edition word on them. Not a lot has survived

02:32.400 --> 02:37.600
from the first editions. The first unnamed one has survived in printed form, and people

02:37.600 --> 02:43.040
have also already made it run. For others, as you see, we have some parts of the commands,

02:43.040 --> 02:48.560
some parts of the kernel. For the fourth edition, we had just the documentation.

02:50.000 --> 02:56.720
Which brings me to 1974, and this letter from Ken Thompson to Martin Newell at the University

02:56.720 --> 03:01.120
of Utah, which says that we are delaying the delivery because we are out of copies

03:01.200 --> 03:07.760
of the documentation, but we will soon send you a system. Newell, you might know him,

03:07.760 --> 03:12.880
he is a very well-known computer scientist specializing in computer graphics. He is the creator of

03:12.880 --> 03:20.560
the famous Utah teapot, the benchmark used for 3D rendering. So it seems at some point there was a

03:21.840 --> 03:29.760
version that could be shipped from Bell Labs to Utah, and something like this (this is a

03:29.840 --> 03:36.400
specimen, it's not the original tape, but I want to show it to you for those who are not familiar

03:36.400 --> 03:46.880
with computer tapes) got shipped from Bell Labs to Utah. Fast forward to the 28th of July 2025,

03:46.880 --> 03:53.360
and the storage closet of Robert Ricci's Flux Research Group at the Merrill Engineering Building

03:53.440 --> 04:00.160
you see here, where Aleks Maricq, a research associate in the group, found the tape

04:00.160 --> 04:06.560
among the documents of Jay Lepreau. The finding was widely reported, even on broadcast TV.

04:07.760 --> 04:15.680
So was there anything on the tape? It was important to read it, but it was also a tape

04:15.680 --> 04:21.680
we had never had access to before, so it was considered very valuable. So to avoid high-altitude

04:21.680 --> 04:28.000
cosmic radiation on flights and airport scanner damage, lab members Jon Duerig, you see at the

04:28.000 --> 04:34.560
center, and Thalia Archibald, you see on the left there, took an 11-hour drive to the Computer

04:34.560 --> 04:43.920
History Museum in California. There it was read by archivist Al Kossow of Bitsavers, the famous archiving

04:43.920 --> 04:50.320
site, you see him here at work. To minimize tape wear, he used a system developed by computer

04:50.320 --> 04:57.760
historian Len Shustek, so he first read the tape not as digital bits, but as 1.6 gigabytes

04:57.760 --> 05:05.520
of analog data using a multi-track analog-to-digital converter. This means that a lot of the hardware

05:05.520 --> 05:12.080
that you see here on the left, which is normally implemented inside tape readers,

05:12.080 --> 05:18.080
can be implemented in software and thereby the tape can be virtually read again and again with various

05:18.320 --> 05:26.960
settings so that the data can be restored. You see here the waveforms from various tape tracks and

05:26.960 --> 05:32.000
here zoomed in, to see how they look and how various elements can be adjusted.

05:32.640 --> 05:40.960
This was successful, and now it is on archive.org. Using Shustek's readtape program,

05:40.960 --> 05:47.520
the 1.6 gigabytes of analog data were converted to 2.5 megabytes of digital data in

05:47.680 --> 05:54.480
a format which is compatible with the SIMH emulator, and both are now available on archive.org.

05:54.480 --> 06:00.720
Think of it: the tape is 2.5 megabytes, about the same as a single picture we take with our mobile phones.

06:01.840 --> 06:08.480
What's the legal status of this tape? Thankfully, in 2002 Caldera, which owned the IP rights

06:08.480 --> 06:16.800
to Unix at the time, wrote this letter, which, if you squint well enough, looks like an open source

06:16.800 --> 06:23.840
license. So effectively it can be uploaded on GitHub and shared and built upon and run and so on.

06:23.840 --> 06:29.200
And then people started using it. For example, Brian Rodriguez developed this erudite

06:29.200 --> 06:35.200
ebook, the Unix Fourth Edition Source Code Commentary, where he analyzes the entire kernel and how

06:35.200 --> 06:41.680
it is built, how you can run it on an emulator, following a very influential book from 1976

06:41.680 --> 06:48.080
by John Lions from the University of New South Wales, which taught many generations of computer

06:48.080 --> 06:55.200
science students the design of a real operating system kernel. Also, Angelo Papenhoff, who

06:55.200 --> 07:01.680
gave a very interesting talk at 39C3 a few weeks ago, made it run on a PDP-11 emulator

07:01.760 --> 07:09.520
and also published a tar file with the tape's contents. I employed that file to add the tape's

07:09.520 --> 07:16.000
contents to the Unix history repository. What is that? It's a repository on GitHub, at the URL

07:16.000 --> 07:22.560
you see there, that contains a history of Unix from its inception, so the first unnamed version,

07:22.560 --> 07:28.640
until today. For a large part of it, it traces the seven Research Editions, the work that happened at

07:28.720 --> 07:36.400
Berkeley, the BSD editions, 386BSD, and ends up with the modern FreeBSD editions. From the very complex

07:36.400 --> 07:43.360
and long history of Unix, it traces one line, the one you see here in orange. It contains

07:43.360 --> 07:50.640
snapshots of the Research Editions and various Berkeley editions, then commits imported from

07:50.640 --> 07:56.880
SCCS, this was the Source Code Control System used for many years at Berkeley, and then finally

07:56.960 --> 08:07.040
normal FreeBSD commits as they moved from CVS to Subversion and now Git. The way it is constructed

08:07.040 --> 08:16.320
allows us to run git blame on it. Here I do it on timezone.c, and amazingly we see in the

08:16.320 --> 08:24.000
modern FreeBSD timezone.c lines that were apparently written by Dennis Ritchie in 1979. Actually, some

08:24.000 --> 08:29.200
of you may realize that those lines shouldn't be there because what was released by Berkeley

08:29.200 --> 08:35.920
was supposed to be a completely clean open source implementation of Unix, so it's a mystery to

08:35.920 --> 08:46.480
me. To add the contents of the tape to the repository, what I did was first

08:46.480 --> 08:52.160
of all associate files with actual authors, based on documentation and the information that was available

08:52.160 --> 08:58.800
at the time, even oral histories and things that people who were there told me now. And I got

08:58.800 --> 09:04.160
the timestamps from the actual timestamps of the tape, and each timestamp is a separate

09:04.160 --> 09:11.360
commit. Let me demo how the system works. You can go to unixv4.dev and see

09:11.360 --> 09:16.720
it running. This is not my work, but I find it pretty cool, and it shows a Unix system that's

09:17.280 --> 09:23.760
what we expect. So if I type on the command line, I can change the directory to /tmp,

09:23.760 --> 09:29.680
I run the line editor, it's what happens when you press Q in Vim, I append lines and I write

09:29.680 --> 09:33.920
a small main function with the hello world that you would expect.

09:37.200 --> 09:44.560
I press dot to finish my input, I write the file to hello.c, and then I can compile it.

09:45.440 --> 09:51.840
If I list the files in the directory, I see an a.out, and if I run it, presto, I get hello world

09:51.840 --> 10:03.040
from code that was written at that time. So this is really important because now we have

10:03.040 --> 10:07.760
the first complete Unix distribution: we have the kernel, the commands, we already had the

10:07.760 --> 10:14.560
man pages, and the C compiler. This version introduced many interesting things: structured programming,

10:14.560 --> 10:19.760
so the kernel was implemented in a language called New B at the time, you can guess which language

10:19.760 --> 10:28.160
this was: 7800 lines of New B and just 1000 lines of PDP-11 assembly. It provided, I think for the

10:28.160 --> 10:33.360
first time, a language-independent API, so you see here the pipe system call documented both in the

10:33.360 --> 10:40.160
way you call it from assembly language (r0 and r1 are PDP-11 registers) and from C, to get an array

10:40.160 --> 10:46.160
of two file descriptors. It used data structure definition and reuse through header files.

10:46.880 --> 10:52.080
It provided the device driver abstraction through special files. Many of the files are

10:52.080 --> 10:59.760
completely nonsensical today, the speech synthesizer and the Spider interface, but the driver interface,

10:59.760 --> 11:06.480
the way this abstraction was implemented, is still available today in FreeBSD, but also

11:06.480 --> 11:11.360
in a similar design on Linux and even on Windows, if you try to write a Windows device driver.

11:12.000 --> 11:17.520
Also, it had a properly typeset manual. You see the third edition is typewritten,

11:17.520 --> 11:24.400
the other one is typeset. It had a SNOBOL implementation, a dynamically typed string and

11:24.400 --> 11:32.160
pattern processing language that has influenced AWK, Perl, and thereby Python: 1600 lines of

11:32.160 --> 11:38.080
C. I tried to find out why it was written; it was just a quick entertainment for

11:38.080 --> 11:44.640
Ken, and the application that survived was a program that solves the Instant Insanity puzzle, and

11:44.640 --> 11:51.040
here you see it written in SNOBOL. What are the tape's contents regarding dates? What I did was

11:51.120 --> 11:58.080
create a set of diagrams from the files of all editions that have survived,

11:58.080 --> 12:04.800
and as you can see, for the fourth edition many of the files match the manual date; some

12:04.800 --> 12:10.320
are associated with a later date, but they seem to be relatively close to the manual date.

12:11.440 --> 12:18.160
I also wrote a script using git blame to see what code was inherited from previous

12:18.320 --> 12:26.080
editions, and as you can see, the fourth edition had some parts of the third edition, and also

12:26.080 --> 12:32.800
the fifth edition inherited a large part of the fourth edition. I also looked at the evolution of the languages

12:32.800 --> 12:38.880
used, because it was a time when C was being used more and more, and you can indeed see here that the

12:38.880 --> 12:45.280
fourth edition contains a significant part of C code, something that continues and gets expanded

12:45.360 --> 12:50.800
until the end of the decade, and I think we can, fortunately or unfortunately, say that it

12:50.800 --> 12:53.840
survives until today. Thank you very much.

12:53.840 --> 13:16.960
So we have a couple of minutes for some questions. Thank you very much. I volunteer for a

13:16.960 --> 13:22.080
computing history group in Newcastle, UK, and about 10 years ago the retiring head of school

13:22.080 --> 13:28.480
handed me a tape and he said, this has got Unix on it, I think. I wouldn't get too excited, because

13:28.480 --> 13:33.440
I'm fairly sure it's well known, it's V7 or something, and obviously the next thing I need to

13:33.440 --> 13:37.760
do is go back to him and ask him, because I can still kind of contact him, but

13:38.320 --> 13:43.760
assuming there is something of historical interest on it, what would you advise our next step to be?

13:44.640 --> 13:50.640
Contact the Computer History Museum to get further advice. We may be witnessing history here.

13:51.600 --> 13:54.400
Thank you. Thank you very much for bringing this up.

13:58.480 --> 14:05.360
Reading old tape is tricky, it can get destroyed, so please don't touch it. For example, the tape

14:05.360 --> 14:10.320
drive they used doesn't have any plastic parts, capstans and so on, in order to minimize

14:10.320 --> 14:29.520
the wear there. When converting the binary data, when converting the

14:29.840 --> 14:35.360
data to binary, how did you make sure that there were no errors in the transformation?

14:36.160 --> 14:42.560
So, I didn't convert the tape, I just uploaded it on GitHub. But when the binary data were

14:42.560 --> 14:49.920
converted, how to ensure there were no errors? As I understand, there are CRC checks in the tape

14:49.920 --> 14:55.520
reading mechanism, and then people actually looked at the tape contents to see that it was okay, so

14:55.600 --> 15:01.920
2.5 megabytes is fine, and indeed some people found some blocks that were misread

15:01.920 --> 15:06.080
and manually patched them to correct everything. Amazing, cool.

15:11.760 --> 15:17.280
We are on time but let's take one more question and then we move to the next talk.

15:26.320 --> 15:34.880
Hi, thank you for the amazing research. The POSIX standards came out in 1998. I was wondering

15:34.880 --> 15:40.880
to what extent does this very early incarnation comply with the later POSIX standards? What is

15:42.480 --> 15:48.080
to what extent does this feel like a real modern Unix? To what extent does it feel like a

15:48.080 --> 15:55.600
real modern Unix? There are differences; for example, I typed chdir rather than cd. Many of the tools

15:55.600 --> 16:02.080
are already there, command options are missing and some tools are not there yet.

16:02.080 --> 16:06.480
And the kernel, like the system calls that were later defined in the POSIX standards,

16:06.480 --> 16:10.960
they're still being developed at this point, I guess? Thank you for that. So, what happened with

16:10.960 --> 16:18.480
the interface and the system calls: there is this paper you can find that documents it, and

16:18.480 --> 16:23.760
a repository on GitHub that has a timeline of all Unix facilities, commands, system calls,

16:23.760 --> 16:29.760
when each one appeared and when some of them also went away. And there you can find the exact

16:29.760 --> 16:34.960
timeline and see what was there and how it compares to the 7th edition and later ones.

16:34.960 --> 16:38.160
Very impressive. I'm going to look that up. Thank you. Thank you.

16:40.960 --> 16:44.160
Thank you again.

