From 0662e3574ef5227cd033d8d7f7eae97f33c3702c Mon Sep 17 00:00:00 2001
From: Minijackson
Date: Mon, 21 Sep 2020 17:40:22 +0200
Subject: add file-system article + small fixes

---
 .gitignore                               |   1 +
 README.md                                |   2 +-
 file-system.md                           | 327 +++++++++++++++++++++++++++++++
 html/res/GParted_1.0_screenshot.png      | Bin 0 -> 106008 bytes
 html/res/GUID-Partition-Table-Scheme.svg |  86 ++++++++
 html/res/fat32.jpg                       | Bin 0 -> 25837 bytes
 index.md                                 |  50 ++---
 kernel.md                                |  49 +++--
 8 files changed, 470 insertions(+), 45 deletions(-)
 create mode 100644 file-system.md
 create mode 100644 html/res/GParted_1.0_screenshot.png
 create mode 100644 html/res/GUID-Partition-Table-Scheme.svg
 create mode 100644 html/res/fat32.jpg

diff --git a/.gitignore b/.gitignore
index cbb539f..f613fa6 100644
--- a/.gitignore
+++ b/.gitignore
@@ -2,3 +2,4 @@
 html/*
 !html/style.css
 !html/fonts
+!html/res
diff --git a/README.md b/README.md
index 4da6a61..8b26efd 100644
--- a/README.md
+++ b/README.md
@@ -44,4 +44,4 @@ These resources helped me create and design this website:
 
 [Nord theme](https://www.nordtheme.com/)
 
-: The palette used in this website
+: The color palette used in this website
diff --git a/file-system.md b/file-system.md
new file mode 100644
index 0000000..acd7ab5
--- /dev/null
+++ b/file-system.md
@@ -0,0 +1,327 @@
+% The File System
+
+---
+subtitle: WTF is Linux
+---
+
+[back](index.md)
+
+
+## Purpose
+
+If we look at storage devices as a way to store a bunch of ones and zeroes, we
+can see that this is not the most human-friendly way of storing data. At the hardware
+level, there is no concept of files or directories, so we need a way of
+organizing these ones and zeroes such that we can tell "such and such
+series of bytes corresponds to the content of `/home/user/file.txt`".
+
+This is exactly what a file system is. At its core, a file system is a way of
+organizing data (we can call this a "format", since it's a way of "formatting"
+data).
+
+Once we have some storage formatted for a known file system, we rely on the
+kernel to be able to read this file system.
+
+Then, once we know the kernel is able to read the file system, we want to tell
+the kernel where we would like to see it in our directory tree. This is called "[mounting](#mounts)" a file system.
+
+
+## Storage devices
+
+Before we talk about the file system properly, let's talk about different kinds
+of storage devices.
+
+We said in the introduction that "storage devices are a way to store a bunch of
+ones and zeroes", but different kinds of devices have different characteristics.
+
+For example, with a Hard Disk Drive (HDD), you will want to read data that is
+stored close together in quick succession; otherwise, you have to wait for the
+disk to complete another rotation.
+
+In Flash Memory, used in USB Flash drives and often used in embedded systems,
+you cannot turn a single 0 back into a 1: instead, you have to erase a whole
+block (an "erase block") to set every bit in that block back to 1.
+
+While these characteristics matter in high-end computing or bare-metal
+embedded systems (i.e. without a kernel), in our case we thankfully don't
+have to care about any of that.
+
+All these device-specific details are abstracted away by "device drivers"
+implemented in the kernel, leaving us with a much simpler interface, such as
+the `read` and `write` system calls.
+
+
+## Some known file systems
+
+Just as there are multiple ways of storing an image in a file (JPEG, PNG, BMP,
+etc.), there are multiple file system formats.
+
+### FAT32
+
+FAT32 is probably the most "compatible" file system around.
+
+USB Flash Drives usually come pre-formatted as FAT32, and when you plug a USB
+Flash Drive into something (computer, router, gaming console, etc.), you can be
+pretty sure it'll work.
+
+It is not without drawbacks though: it is prone to fragmentation, can't handle
+files bigger than 4GB, and lacks many features that more modern file systems
+have (e.g. journaling, which helps recover from a crash or a power loss).
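The 4GB limit comes from the way FAT32 records file sizes: each directory entry stores the size of a file in a 32-bit unsigned field, so the largest file it can describe is 2^32 - 1 bytes. The following is a minimal sketch of a check you could run on a file before copying it to a FAT32 USB drive. It assumes GNU coreutils' `stat`. The `fits_on_fat32` function and the `FAT32_MAX_FILE_SIZE` variable are hypothetical names made up for this example.

```bash
#!/usr/bin/env bash

# FAT32 stores the size of each file in a 32-bit unsigned field of its
# directory entry, so the largest file it can hold is 2^32 - 1 bytes.
FAT32_MAX_FILE_SIZE=$(( (1 << 32) - 1 ))   # 4294967295 bytes (4GiB minus 1)

# Hypothetical helper: succeed if the given file is small enough to be copied
# onto a FAT32 file system. Uses GNU coreutils' `stat -c %s` (size in bytes).
# Returns 2 if the file cannot be stat'ed at all.
fits_on_fat32() {
    local size
    size=$(stat -c %s -- "$1") || return 2
    [ "$size" -le "$FAT32_MAX_FILE_SIZE" ]
}
```

For example, `fits_on_fat32 video.mkv || echo "too big for FAT32"` warns you before the copy fails halfway through. In that situation, a format such as NTFS or ext4 is the usual way out.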
+
+### ext4
+
+ext4 is the de facto standard file system on GNU/Linux: when you install a
+GNU/Linux distribution, you will most likely end up with partitions formatted
+using ext4.
+
+### NTFS
+
+NTFS is the standard file system used by Windows computers. To my knowledge, it
+is not much used elsewhere, except for Windows-compatible external drives that
+need to store files bigger than 4GB (which rules out FAT32).
+
+
+## Enabling file systems in the kernel
+
+As an end user of a regular GNU/Linux distribution, there is usually no need to
+enable support for any file system, as the most common ones are enabled by
+default.
+
+As an embedded system developer, however, it is useful to know how to enable
+and disable file systems: you already know which file system(s) you're going to
+use, so you can enable only those, and disable the rest.
+
+Disabling unused file systems has two main benefits:
+
+- Less code in the kernel means a smaller kernel, and "size matters" is
+  actually true in the embedded world
+- Less code in the kernel also means a smaller attack surface, so fewer
+  potential vulnerabilities in the kernel
+
+To enable or disable a file system in the kernel, simply go to your
+[configuration](kernel.md#configuring-the-kernel), look up your file system in
+the "File systems" menu, and enable or disable it.
+
+## Usual format of a file system
+
+Because a file system stores more than just the content of files, it needs
+space for this "metadata". Different file systems do this differently, and
+have different needs, so let's quickly look at an example:
+
+TODO: reference source of image
+
+![FAT32 format](./res/fat32.jpg)
+
+The FAT32 way of doing things is the simplest: it has a table near the beginning
+(the "File Allocation Table", which records how the disk's clusters are chained
+together to form each file), and space afterwards for the content of files and
+directories.
+
+## Partitioning schemes
+
+We've seen how a file system works, but we're missing a crucial component of
+the storage story: partitions.
+
+While having an ext4 file system take up a full hard drive is possible, this is
+usually not done in practice.
+
+In a lot of cases, we want *several* file systems on a single hard drive. A
+common embedded example is having two system partitions: if booting from one
+partition fails, the system falls back to booting from the other (in case a
+system upgrade broke something, for example).
+
+Like everything else in computer science, there are several ways of doing it:
+in a similar way to file systems, partitioning schemes are a way of formatting
+ones and zeroes, but instead of storing files, they store file systems.
+
+The two most common partitioning schemes are GPT (the more modern one) and MBR
+(which you'll often find on older systems).
+
+TODO: reference source of image
+
+![GPT format](./res/GUID-Partition-Table-Scheme.svg)
+
+
+As with file systems, you'll find sections dedicated to the metadata of the
+partitions (e.g. name, type, and unique ID of each partition), and a large
+space (partition data) to put your file systems in.
+
+## Block devices
+
+
+Now that we know how partitions and file systems work, let's see how that
+integrates with the GNU/Linux ecosystem.
+
+With the "everything is a file" philosophy of Linux, we can safely expect to
+see our disks and partitions represented as files. These special kinds of files
+are called "block devices", and can be found in the `/dev` directory.
+
+Depending on how your disk is connected to your computer, the block device is
+going to be named differently in `/dev`.
+
+On most systems, hard disk drives and SSDs (whether connected through SATA or
+USB) are handled by the kernel's [SCSI](https://en.wikipedia.org/wiki/SCSI)
+subsystem, so you will find your drives under `/dev/sda`, `/dev/sda1`,
+`/dev/sdb`, etc. NVMe SSDs are an exception: they appear as `/dev/nvme0n1`,
+`/dev/nvme0n1p1`, etc.
+
+The naming scheme goes like this:
+
+```
+       ,----- Represents the whole first SCSI disk the kernel found
+       |
+       v
+/dev/sda
+/dev/sda1 <-- Represents the first partition of the first disk
+/dev/sda2 <-- Represents the second partition of the first disk
+
+       ,----- Represents the whole second SCSI disk the kernel found
+       |
+       v
+/dev/sdb
+/dev/sdb1 <-- Represents the first partition of the second disk
+/dev/sdb2 <-- Represents the second partition of the second disk
+```
+
+So, if you were to read directly from the file `/dev/sda`, you would get
+exactly the bytes that are stored on your first storage device (and you could
+parse the GPT or MBR format yourself).
+
+But this is not usually what we want: we want to access the partitions inside
+the disk. For this, the kernel provides us with the files `/dev/sda1`,
+`/dev/sda2`, etc.
+
+In a similar fashion, if you were to read directly from the file `/dev/sda2`,
+you would get the bytes that are stored in the second partition of the first
+disk (and you could parse the ext4, FAT32, etc. format yourself).
+
+Again, we don't usually want the raw bytes of the partition: we want to access
+the files stored inside the file system of the partition! To that end, we
+[mount](#mounts) the file system.
+
+Accessing the raw bytes of a disk or partition is often useful, however. It
+can be used to list all the partitions on a disk, resize partitions, or get
+some information about a given file system.
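To get a feel for what "parsing the format yourself" means, here is a minimal sketch that reads a few raw bytes from a disk (or from a disk image file) to guess its partitioning scheme. It relies on two well-known signatures. First, the first sector of an MBR disk ends with the bytes `0x55 0xAA` at offset 510. Second, a GPT header, stored right after a "protective" MBR, starts with the ASCII string `EFI PART`. The sketch assumes 512-byte logical sectors; on disks with 4096-byte sectors, the GPT header sits at offset 4096 instead. The `partition_scheme` function is a hypothetical name written for this example.

```bash
#!/usr/bin/env bash

# Hypothetical helper: guess the partitioning scheme of a disk or disk image
# by looking at its raw bytes. Assumes 512-byte logical sectors.
partition_scheme() {
    local dev="$1" gpt_signature mbr_signature

    # LBA 1 (byte offset 512) holds the GPT header, which starts with the
    # 8 ASCII bytes "EFI PART". NUL bytes are dropped so bash doesn't warn.
    gpt_signature=$(dd if="$dev" bs=1 skip=512 count=8 2>/dev/null | tr -d '\000')

    # The last 2 bytes of sector 0 (offsets 510 and 511) are 0x55 0xAA for
    # an MBR, including the "protective" MBR that comes before a GPT header.
    mbr_signature=$(dd if="$dev" bs=1 skip=510 count=2 2>/dev/null \
        | od -An -tx1 | tr -d ' \n')

    if [ "$gpt_signature" = "EFI PART" ]; then
        echo gpt
    elif [ "$mbr_signature" = "55aa" ]; then
        echo mbr
    else
        echo unknown
    fi
}
```

Reading `/dev/sda` directly requires root privileges, so you would run `partition_scheme /dev/sda` from a root shell. Real tools like `parted` check much more than two signatures (checksums, backup headers, etc.). This sketch only illustrates that a partition table is just bytes at well-known offsets.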
+
+To read or modify partitions, you can use the `parted` command-line tool, or
+the `gparted` graphical program:
+
+```bash
+user@host:~$ sudo parted -l
+
+Model: Thumb Drive (scsi)
+Disk /dev/sdb: 4041MB
+Sector size (logical/physical): 512B/512B
+Partition Table: gpt
+
+Number  Start   End     Size    File system  Name    Flags
+ 1      17.4kB  1000MB  1000MB               first
+ 2      1000MB  4040MB  3040MB               second
+```
+
+![GParted screenshot](./res/GParted_1.0_screenshot.png)
+
+To read or modify file systems, the tools to use depend on the file system
+format you want to access.
+
+To access file systems in the ext4 format, you can use programs from the
+[e2fsprogs project](https://en.wikipedia.org/wiki/E2fsprogs).
+
+To access file systems in the FAT32 format, you can use programs from the
+[dosfstools project](https://directory.fsf.org/wiki/Dosfstools).
+
+TODO: create a file system
+
+## Mounts
+
+"Mounting" a file system is the action of making the files of a partition or
+disk available somewhere in the directory hierarchy.
+
+For example, if the `/dev/sda1` file system is mounted, files of the first
+partition of the first disk will appear, and reading, writing, or changing the
+permissions of these files will instruct the kernel to read or modify the bytes
+stored in that first partition of the first disk (in the file data when
+reading or writing the file, and in the file metadata when modifying
+permissions).
+
+To mount a file system, we can use the `mount` command (this requires root
+privileges):
+
+```bash
+#                  ,----------- What file system to mount
+#                  |            (disk or partition)
+#                  |
+#                  |         ,---- Where to mount it (a directory)
+#                  |         |
+#                  vvvvvvvvv vvvv
+root@host:~$ mount /dev/sda1 /mnt
+```
+
+With this command, `mount` will try to auto-detect the type of file system
+(ext4, FAT, etc.), and the kernel will then provide you with the files inside
+that file system under the given directory (in this example: `/mnt`).
+
+The directory where a file system is mounted is called a "mount point".
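The kernel keeps track of every mount point and exposes the list as a file. Each line of `/proc/self/mounts` gives the mounted device, the mount point, the file system type, and the mount options. The following small sketch prints the type of the file system mounted on a given directory. The `fstype_at` name is hypothetical. Directories whose names contain spaces are not handled, since that file escapes them as `\040`.

```bash
#!/usr/bin/env bash

# Each line of /proc/self/mounts has 6 space-separated fields:
#   <device> <mount point> <fs type> <options> <dump> <pass>
# Hypothetical helper: print the type of the file system mounted exactly at
# the given directory, or fail if nothing is mounted there. If several file
# systems are stacked on the same directory, the last matching line is used,
# since a file system mounted on top of another is normally listed after it.
fstype_at() {
    awk -v dir="$1" '
        $2 == dir { type = $3 }
        END { if (type == "") exit 1; print type }
    ' /proc/self/mounts
}
```

On a typical system, `fstype_at /proc` should print `proc`. The `mount` command without arguments, or util-linux's `findmnt`, shows the same information for every mount point in a more readable form.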
+
+You can also specify mount options with the `-o` (or `--options`) command-line
+argument. Common options include:
+
+- `ro`: read-only
+- `rw`: read-write
+- `defaults`: the default set of options (`rw`, `suid`, `dev`, `exec`, `auto`,
+  `nouser`, and `async`)
+- `sync`: all I/O to the file system is done synchronously, i.e. a write only
+  completes once the data has reached the hardware
+- `async`: all I/O to the file system is done asynchronously, i.e. the kernel
+  may keep written data in memory and send it to the hardware later
+
+For more information, see the `mount(8)` man page (`man 8 mount`).
+
+
+## Special file systems
+
+One consequence of the "everything is a file" philosophy of Linux is that not
+every file has to be tied to a storage device.
+
+For example, our files `/dev/sda`, ... that represent our hard drives and
+partitions don't have to be stored in a physical location. It's much better to
+create these files at boot according to the hardware present, and forget them
+when shutting down the machine.
+
+A frequently used file system like that is the `tmpfs` special file system. As
+its name suggests, it is a temporary file system: every file and directory in
+it is kept in memory rather than on a storage device, and disappears on shutdown.
+
+Another commonly used temporary file system is the `devtmpfs` special file
+system. It is much like `tmpfs`, but instead of being created empty, it is
+automatically filled by the kernel with device files, such as the block devices
+`sda`, `sda1`, ...
+
+In that sense, the `/dev` directory really is just a standard mount point for
+the `devtmpfs` file system.
+
+Finally, two really useful file systems are the `sysfs` and `proc` special file
+systems.
+
+The `sysfs` file system exports information directly from the kernel, related
+to hardware devices, drivers, etc. It is normally mounted under the `/sys`
+directory.
+
+The `proc` file system is mainly there to provide information about currently
+running processes. For example, doing `cat /proc/<pid>/cmdline` will get you
+the command-line arguments of the running process with the given `<pid>`. It
+is normally mounted under the `/proc` directory.
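One detail the `cat` example hides: the arguments in a process's `cmdline` file are each terminated by a NUL byte rather than separated by spaces, so `cat` prints them glued together. The following minimal sketch makes them readable. The `cmdline_of` name is hypothetical, and its argument is any process ID you are allowed to inspect.

```bash
#!/usr/bin/env bash

# Hypothetical helper: print the command-line arguments of a running process,
# separated by spaces. In /proc/<pid>/cmdline, each argument is followed by a
# NUL byte, so NULs are turned into spaces and the trailing space is dropped.
# Note: arguments that themselves contain spaces become ambiguous this way.
cmdline_of() {
    tr '\000' ' ' < "/proc/$1/cmdline" | sed 's/ $//'
}
```

For instance, `cmdline_of $$` prints the command line of your current shell, since `$$` expands to the shell's own PID.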
+
+There are also a lot of kernel settings that can be read and changed at runtime
+under the `/proc/sys` directory.
+
+## Further reading
+
+- "Design of the FAT file system" on Wikipedia
+:
+- ext4 on Wikipedia
+:
+- GPT on Wikipedia
+:
+- MBR on Wikipedia
+:
+- Using the `parted` command-line tool on LinuxJourney
+:
+- Manual of the `mount` command
+:
+- Manual of the `sysfs` file system
+:
+- Manual of the `proc` file system
+:
diff --git a/html/res/GParted_1.0_screenshot.png b/html/res/GParted_1.0_screenshot.png
new file mode 100644
index 0000000..461701f
Binary files /dev/null and b/html/res/GParted_1.0_screenshot.png differ
diff --git a/html/res/GUID-Partition-Table-Scheme.svg b/html/res/GUID-Partition-Table-Scheme.svg
new file mode 100644
index 0000000..8d85bfb
--- /dev/null
+++ b/html/res/GUID-Partition-Table-Scheme.svg
@@ -0,0 +1,86 @@
+[SVG markup (image/svg+xml) of the GUID Partition Table scheme diagram]
\ No newline at end of file
diff --git a/html/res/fat32.jpg b/html/res/fat32.jpg
new file mode 100644
index 0000000..0fc3277
Binary files /dev/null and b/html/res/fat32.jpg differ
diff --git a/index.md b/index.md
index c2a7141..0cb15c5 100644
--- a/index.md
+++ b/index.md
@@ -7,6 +7,8 @@
 
 TODO: reference a mono font in the CSS
 
+TODO: do figures properly in CSS
+
 TODO: fix TOC CSS
 
 TODO: have a POSIX / UNIX / Linux section
@@ -30,37 +32,37 @@ The objective is to deeply understand the Linux ecosystem: how it boots, what
 are the different pieces, how do they integrate with one-another, what are the
 different choices available to us when building such a system, etc.
 
-Even though it is very rare to build these kinds of things "manually", I do
-think it is better to do it this way for educative purposes.
+Even though it's very rare to build these kinds of things "manually," I do
+think it's better to do it this way for educational purposes.
 
 However, in the practical work part, we will build a Linux system in an
-automated fashion, using [Buildroot](https://buildroot.org/), an well-known
+automated fashion, using [Buildroot](https://buildroot.org/), a well-known
 software in the industry for building embedded Linux systems.
 
 This document is intended to serve several functions:
 
 - As notes, if I talk too fast, or if you don't like to take
-  notes[^take-notes], or miss classes
+  notes[^take-notes], or miss classes.
 - It's likely that this website is going to be more detailed than the course,
   so it's useful you want to go further. You can also follow the links to go
-  even further
-- Should I miss some things while talking, this website should fix it
+  even further.
+- Should I miss some things while talking, this website should fix it.
 
 [^take-notes]:
-    {-} You really should be taking notes, it does help remembering
+    {-} You really should be taking notes, it does help you remember.
 
 The Linux command-line
 ----------------------
 
 Main article: [CLI](cli.md)
 
-When we say "command-line interface", we usually mean the text-based program
+When we say "command-line interface," we usually mean the text-based program
 whose main purpose is to execute other commands. You input a program name,
 enter its arguments, press enter, and it will execute the said program with
 said arguments.
 
 In reality, the UNIX command-line is more complicated than that, but that also
-makes it much more powerful. See the main article if you want to unlocking this
+makes it much more powerful. See the main article if you want to unlock this
 power.