path: root/file-system.md
author Minijackson <minijackson@riseup.net> 2020-09-21 17:40:22 +0200
committer Minijackson <minijackson@riseup.net> 2020-09-21 17:40:22 +0200
commit 0662e3574ef5227cd033d8d7f7eae97f33c3702c
tree cfd1658c05213a00c9c6cfe8b62671307f1df446 /file-system.md
parent 4e6a880d4e4357e627b5ca2fe7f02b768830eb2b
add file-system article + small fixes
Diffstat (limited to 'file-system.md')
-rw-r--r-- file-system.md 327
1 files changed, 327 insertions, 0 deletions
diff --git a/file-system.md b/file-system.md
new file mode 100644
index 0000000..acd7ab5
--- /dev/null
+++ b/file-system.md
@@ -0,0 +1,327 @@
1% The File System
2
3---
4subtitle: WTF is Linux
5---
6
7[back](index.md)
8
9
10## Purpose
11
12If we look at storage devices as a way to store a bunch of ones and zeroes, we
13can see that this is not the most human-friendly way of storing data. At the
14hardware level, there is no concept of files or directories, so we need a way
15of organizing these ones and zeroes such that we can tell "such and such
16series of bytes corresponds to the content of `/home/user/file.txt`".
17
18This is exactly what a file system is. At its core, a file system is a way of
19organizing data (we can call this a "format", since it's a way of "formatting"
20data).
21
22Once we have some storage formatted for a known file system, we rely on the
23kernel to be able to read this file system.
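
On a running system, the kernel lists the file systems it can currently handle
in `/proc/filesystems`. As a minimal sketch (the helper name `fs_supported` is
made up for this example), checking for one of them could look like this. Note
that support built as a module only shows up once the module is loaded.

```bash
# fs_supported NAME [LIST_FILE]
# Succeeds if NAME appears in the kernel's list of registered file systems.
# LIST_FILE defaults to /proc/filesystems; the parameter only exists so the
# function can be tried on a copy of that file.
fs_supported() {
    awk -v fs="$1" '$NF == fs { found = 1 } END { exit !found }' \
        "${2:-/proc/filesystems}"
}

# Example: report whether the running kernel can read ext4 right now.
if fs_supported ext4; then
    echo "ext4: supported"
else
    echo "ext4: not registered (not built in, or module not loaded)"
fi
```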
24
25Then, once we know the kernel is able to read the file system, we want to tell
26the kernel where we would like to see it in our directory tree. This is called "[mounting](#mounts)" a file system.
27
28
29## Storage devices
30
31Before we talk about the file system properly, let's talk about different kinds
32of storage devices.
33
34We said in the introduction that "storage devices are a way to store a bunch of
35ones and zeroes", but different kinds of devices have different characteristics.
36
37For example, with a Hard Disk Drive (HDD), you will want to read data that is
38physically close together in quick succession, otherwise you have to wait for
39the next rotation of the disk.
40
41In Flash Memory, used in USB Flash drives and often used in embedded systems,
42you cannot set a single 0 to a 1, but instead have to erase a whole "sector" to
43set every bit in that sector back to a 1.
44
45While these characteristics might have an impact in high-end computing, or on
46bare-metal embedded systems (i.e. without a kernel), in our case, we thankfully
47don't have to care about any of that.
48
49Every particularity of these devices is abstracted away by "device drivers"
50implemented in the kernel, leaving us with a much simpler interface, like
51`read` and `write` system calls.
52
53
54## Some known file systems
55
56Just as there are multiple ways of storing an image in a file (jpeg, png, bmp,
57etc.), there are multiple file system formats.
58
59### FAT32
60
61FAT32 is probably the most "compatible" file system around.
62
63USB Flash Drives usually come pre-formatted as FAT32, and when you plug a USB
64Flash Drive into something (computer, router, gaming console, etc.), you can be
65pretty sure it'll work.
66
67It is not without drawbacks though: it is prone to fragmentation, can't handle
68files bigger than 4GB, and doesn't have many features that more modern file
69systems have (e.g. error recovery).
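
The 4GB limit comes from the on-disk format: FAT32 records the size of each
file in a 32-bit field, so the largest size that can be written down is
2^32 - 1 bytes. A quick check of the arithmetic (assuming a shell with 64-bit
arithmetic, such as bash):

```bash
# Largest value that fits in an unsigned 32-bit size field, in bytes.
max_bytes=$(( (1 << 32) - 1 ))
echo "$max_bytes"                          # 4294967295

# The same limit expressed in whole MiB (integer division rounds down).
echo "$(( max_bytes / 1024 / 1024 )) MiB"  # 4095 MiB
```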
70
71### ext4
72
73ext4 is the de facto standard file system on GNU/Linux: when you install a
74GNU/Linux distribution, you will most likely end up with partitions formatted
75using ext4.
76
77### NTFS
78
79NTFS is the standard file system used by Windows computers. To my knowledge, it
80is not much used elsewhere, except on external drives that must be compatible
81with Windows and hold files bigger than 4GB (which rules out FAT32).
82
83
84## Enabling file systems in the kernel
85
86As an end user of a normal GNU/Linux distribution, there is usually no need to
87enable the support for any file system, as the most common ones are enabled by
88default.
89
90As an embedded system developer however, it is useful to know how to enable
91and disable file systems: you already know which file system(s) you're going to
92use, so you can enable only those, and disable the rest.
93
94Disabling unused file systems has two main benefits:
95
96- Less code in the kernel means a smaller kernel, and "size matters" is
97 actually true in the embedded world
98- Less code in the kernel also means less attack surface, so fewer potential
99 vulnerabilities in the kernel
100
101To enable or disable a file system in the kernel, simply go to your
102[configuration](kernel.md#configuring-the-kernel), look up your file system in
103the "File Systems" menu, and enable or disable it.
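
In the resulting `.config` file, each file system ends up as a `CONFIG_...`
line: `=y` when built in, `=m` when built as a module, and a commented
`# CONFIG_... is not set` line when disabled. Below is a minimal sketch for
checking one of them. The helper name `fs_config_state` is hypothetical, and
the symbol names used in the usage note (`EXT4_FS`, `VFAT_FS`) are the ones I
would expect for ext4 and FAT, so double-check them against your kernel
version.

```bash
# fs_config_state SYMBOL [CONFIG_FILE]
# Prints "y" (built in), "m" (module) or "n" (disabled or absent) for
# CONFIG_<SYMBOL>. CONFIG_FILE defaults to ./.config.
fs_config_state() {
    state=$(sed -n "s/^CONFIG_$1=\([ym]\)\$/\1/p" "${2:-.config}" 2>/dev/null)
    echo "${state:-n}"
}
```

From the kernel source directory, `fs_config_state EXT4_FS` or
`fs_config_state VFAT_FS` then tells you how the current configuration treats
that file system.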
104
105## Usual format of a file system
106
107Because a file system stores more than just the content of files, it needs
108space for this "metadata". Different file systems do this differently, and
109have different needs, so let's quickly look at a few examples:
110
111TODO: reference source of image
112
113![FAT32 format](./res/fat32.jpg)
114
115The FAT32 way of doing things is one of the simplest: near the beginning, it
116keeps a table (the "File Allocation Table" that gives FAT its name) recording
117which blocks belong to which file, and space afterwards for the content of files.
118
119## Partitioning schemes
120
121We've seen how a file system works, but we're missing a crucial component of
122the storage story: partitions.
123
124While having an ext4 file system take up a full hard drive is possible, this is
125usually not done in practice.
126
127In a lot of cases, we want *several* file systems on a single hard drive. An
128embedded example is having two system partitions: if booting from one
129partition fails, we fall back on booting from the other (in case a system
130upgrade broke something, for example).
131
132Like everything else in computer science, you have several ways of doing it.
133Similarly to file systems, partitioning schemes are a way of formatting ones
134and zeroes, but instead of storing files, they store file systems.
135
136The two most common partitioning schemes are GPT (the most modern one) and MBR
137(which you'll often find on older systems).
138
139TODO: reference source of image
140
141![GPT format](./res/GUID-Partition-Table-Scheme.svg)
142
143
144Like in a file system format, you'll find sections dedicated to the metadata of
145the partitions (i.e. name of the partition, id of the partition, etc.), and
146a whole space (partition data) to put your file system.
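
Both schemes can be recognized from a few well-known bytes. An MBR ends its
first 512-byte sector with the two bytes `55 aa`, and a GPT header starts with
the ASCII text `EFI PART` at the beginning of the second sector. The sketch
below assumes 512-byte sectors, and the helper names `hex_at` and
`partition_scheme` are made up for this example.

```bash
# hex_at FILE OFFSET COUNT: print COUNT bytes starting at OFFSET as hex.
hex_at() {
    dd if="$1" bs=1 skip="$2" count="$3" 2>/dev/null |
        od -An -v -tx1 | tr -d ' \n'
}

# partition_scheme FILE: guess the partitioning scheme of a disk or disk image.
partition_scheme() {
    # GPT is checked first, because GPT disks also carry a "protective" MBR.
    if [ "$(hex_at "$1" 512 8)" = "4546492050415254" ]; then   # "EFI PART"
        echo gpt
    elif [ "$(hex_at "$1" 510 2)" = "55aa" ]; then             # MBR signature
        echo mbr
    else
        echo unknown
    fi
}
```

This works on a disk image stored in a regular file. Running it against a real
disk requires read access to that disk, which usually means being root.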
147
148## Block devices
149
150
151Now that we know how partitions and file systems work, let's see how that
152integrates with the GNU/Linux ecosystem.
153
154With the "everything is a file" philosophy of Linux, we can safely expect to
155see our disks and partitions represented as files. This special kind of file
156is called a "block device", and can be found in the `/dev` directory.
157
158Depending on how your disk is connected to your computer, the block device is
159going to be named differently in `/dev`.
160
161On most systems, hard disk drives and SSDs (SATA and USB drives included) are
162handled by the kernel's [SCSI](https://en.wikipedia.org/wiki/SCSI) layer, so you will find your
163drives under `/dev/sda`, `/dev/sda1`, `/dev/sdb`, etc.
164
165The naming scheme goes like this:
166
167```
168       ,----- Represents the whole first SCSI disk the kernel found
169       |
170       v
171/dev/sda
172/dev/sda1 <-- Represents the first partition of the first disk
173/dev/sda2 <-- Represents the second partition of the first disk
174
175       ,----- Represents the whole second SCSI disk the kernel found
176       |
177       v
178/dev/sdb
179/dev/sdb1 <-- Represents the first partition of the second disk
180/dev/sdb2 <-- Represents the second partition of the second disk
181```
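
The naming scheme above can be taken apart with plain shell pattern matching.
This is only a sketch: the function name `describe_sd` is made up, and it
handles nothing but `sd` followed by a single letter and an optional partition
number. Other drivers use other names.

```bash
# describe_sd PATH: explain what a /dev/sdXN name refers to.
describe_sd() {
    name=${1#/dev/}                  # strip the leading "/dev/"
    case $name in
        sd[a-z])
            echo "whole disk $name" ;;
        sd[a-z][0-9]*)
            part=${name#sd?}         # what is left after "sd" + one letter
            disk=${name%"$part"}     # the name without the partition number
            echo "partition $part of disk $disk" ;;
        *)
            echo "not an sdXN name: $1" >&2
            return 1 ;;
    esac
}

describe_sd /dev/sda     # whole disk sda
describe_sd /dev/sdb2    # partition 2 of disk sdb
```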
182
183So, if you were to read directly from the file `/dev/sda`, you would get
184exactly the bytes that are stored in your first storage device (and you could
185parse the GPT or MBR format yourself).
186
187But this is not what we usually want: we want to access the partitions inside
188the disk. For this, the kernel provides us with the files `/dev/sda1`,
189`/dev/sda2`, etc.
190
191In a similar fashion, if you were to read directly from the file `/dev/sda2`,
192you would get the bytes that are stored in the second partition of the first
193disk (and you could parse the ext4, FAT32, etc. format yourself).
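
As a small taste of what "parsing the format yourself" means, many file systems
can be recognized from a "magic" value at a fixed position. The sketch below
relies on two such values: the ext2/3/4 family stores the 16-bit number
`0xEF53` (low byte first) at byte offset 1080, and FAT32 boot sectors normally
carry the text `FAT32` at byte offset 82. The helper names `hex_at` and
`guess_fs` are made up for this example, and real detection tools check far
more than this.

```bash
# hex_at FILE OFFSET COUNT: print COUNT bytes starting at OFFSET as hex.
hex_at() {
    dd if="$1" bs=1 skip="$2" count="$3" 2>/dev/null |
        od -An -v -tx1 | tr -d ' \n'
}

# guess_fs FILE: guess the file system stored in a partition or image file.
guess_fs() {
    if [ "$(hex_at "$1" 1080 2)" = "53ef" ]; then          # 0xEF53, little-endian
        echo "ext2/3/4"     # the magic alone cannot tell the three apart
    elif [ "$(hex_at "$1" 82 5)" = "4641543332" ]; then    # "FAT32"
        echo fat32
    else
        echo unknown
    fi
}
```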
194
195Again, we do not usually want to access the bytes inside of the partition, but
196we want to access the files stored inside of the file system of the partition!
197To that end, what we do is [mount](#mounts) the file system.
198
199Accessing the raw bytes of the disk or partition is often useful, however. It
200can be used to list all the partitions in a disk, resize partitions, or get
201some information about a given file system.
202
203To read or modify partitions, you can use the `parted` command-line tool, or
204the `gparted` graphical program:
205
206```bash
207user@host:~$ sudo parted -l
208
209Model: Thumb Drive (scsi)
210Disk /dev/sdb: 4041MB
211Sector size (logical/physical): 512B/512B
212Partition Table: gpt
213
214Number  Start   End     Size    File system  Name    Flags
215 1      17.4kB  1000MB  1000MB               first
216 2      1000MB  4040MB  3040MB               second
217```
218
219![GParted screenshot](./res/GParted_1.0_screenshot.png)
220
221The tools for reading or modifying file systems depend on the file system
222format you want to access.
223
224To access file systems in the ext4 format, you can use programs from the
225[e2fsprogs project](https://en.wikipedia.org/wiki/E2fsprogs).
226
227To access file systems in the FAT32 format, you can use programs from the
228[dosfstools project](https://directory.fsf.org/wiki/Dosfstools).
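
As a sketch of what using e2fsprogs can look like, the following creates an
ext4 file system inside a regular file instead of a real partition, so nothing
on your disks is touched. The helper name `make_ext4_image` is made up for this
example, and it assumes `truncate` (from coreutils) and `mkfs.ext4` (from
e2fsprogs) are installed.

```bash
# make_ext4_image FILE SIZE
# Create FILE with the given SIZE (e.g. 16M) and format it as an empty ext4
# file system.
make_ext4_image() {
    truncate -s "$2" "$1" &&    # a file of that size plays the role of the disk
        mkfs.ext4 -q -F "$1"    # -q: quiet, -F: accept a non-block-device target
}
```

After `make_ext4_image disk.img 16M`, running `dumpe2fs -h disk.img` (also part
of e2fsprogs) prints the metadata of the freshly created file system.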
229
230TODO: create a file system
231
232## Mounts
233
234"Mounting" a file system is the action of making the files of a partition
235or disk appear in the directory hierarchy.
236
237For example, if the file system on `/dev/sda1` is mounted, the files of the
238first partition of the first disk will appear under the chosen directory, and
239reading, writing, or changing the permissions of these files will instruct the
240kernel to read or modify the bytes stored in that partition (in the file data
241when reading or writing a file, and in the file metadata when modifying
242permissions).
243
244To mount a file system, we can use the `mount` command (being root is needed):
245
246```bash
247#                   ,----------- What file system to mount
248#                   |            (disk or partition)
249#                   |
250#                   |         ,---- Where to mount it (a directory)
251#                   |         |
252#                   vvvvvvvvv vvvv
253root@host:~$ mount /dev/sda1 /mnt
254```
255
256With this command, `mount` will try to auto-detect the type of file system
257(ext4, fat, etc.), and the kernel will then provide you with the files inside
258that file system under the given directory (in this example: `/mnt`).
259
260The directory where a file system is mounted is called a "mount point".
261
262You can also specify mount options, with the `-o` or `--options` command-line
263argument. Common options include:
264
265- `ro`: read-only
266- `rw`: read-write
267- `defaults`: several common options
268- `sync`: writes are sent to the hardware right away, before the program continues
269- `async`: writes may be buffered in memory and sent to the hardware later
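
To see which options a mounted file system actually ended up with, you can look
at the fourth column of `/proc/mounts`. A minimal sketch follows. The helper
name `has_mount_option` is made up for this example, and the commented `mount`
invocations need root and a real device.

```bash
# Mounting read-only, then switching to read-write without unmounting:
#   mount -o ro /dev/sda1 /mnt
#   mount -o remount,rw /mnt

# has_mount_option MOUNT_POINT OPTION [MOUNTS_FILE]
# Succeeds if the file system mounted at MOUNT_POINT lists OPTION.
# MOUNTS_FILE defaults to /proc/mounts, whose columns are:
#   device  mount-point  type  options  dump  pass
has_mount_option() {
    awk -v mp="$1" -v opt="$2" '
        $2 == mp { opts = $4 }       # keep the most recent mount on mp
        END {
            n = split(opts, list, ",")
            for (i = 1; i <= n; i++) if (list[i] == opt) exit 0
            exit 1
        }' "${3:-/proc/mounts}"
}

if has_mount_option / ro; then echo "/ is read-only"; fi
```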
270
271For more information, see the man page `man 8 mount`.
272
273
274## Special file systems
275
276One consequence of the "everything is a file" philosophy of Linux is that not
277every file has to be tied to a storage device.
278
279For example, our files `/dev/sda`, ... that represent our hard drives and
280partitions don't have to be stored in a physical location. It's way better to
281create these files on boot according to the present hardware, and forget them
282when shutting down the machine.
283
284A file system often used for that is the `tmpfs` special file system. As
285its name suggests, it is a temporary file system. Every file and directory in
286it is stored in memory only, and disappears on shutdown.
287
288Another commonly used temporary file system is the `devtmpfs` special file
289system. It is much like `tmpfs`, but instead of being created empty, all the
290block devices (among other things) like `sda`, `sda1`, ... are automatically
291created inside it.
292
293In that sense, the `/dev` directory really is just a standard mount point for
294the `devtmpfs` file system.
295
296Finally, two really useful file systems are the `sysfs` and `proc` special file
297systems.
298
299The `sysfs` file system is there to export information directly from the
300kernel. It can be related to hardware devices, drivers, etc. It is normally
301mounted under the `/sys` directory.
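
For instance, the kernel exposes the size of every block device as a small text
file under `/sys/class/block/<name>/size`, counted in units of 512 bytes. A
minimal sketch that turns it into bytes (the helper name `disk_size_bytes` is
made up for this example):

```bash
# disk_size_bytes NAME [SYSFS_ROOT]
# Print the size in bytes of block device NAME (e.g. sda or sda1).
# SYSFS_ROOT defaults to /sys.
disk_size_bytes() {
    sectors=$(cat "${2:-/sys}/class/block/$1/size") || return 1
    echo $(( sectors * 512 ))    # the "size" file counts 512-byte units
}
```

Calling `disk_size_bytes sda` needs no special privileges, since it only reads
a file that the kernel makes readable to everyone.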
302
303The `proc` file system is mainly there to provide information about currently
304running processes. For example, doing `cat /proc/<PID>/cmdline` will get you
305the command-line arguments of the running process with the given `<PID>`. It is
306normally mounted under the `/proc` directory.
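
One detail worth knowing: in `cmdline`, the arguments are separated by NUL
bytes rather than spaces, so a plain `cat` prints them glued together. A
minimal sketch that prints one argument per line instead (the helper name
`cmdline_of` is made up for this example):

```bash
# cmdline_of PID [PROC_ROOT]
# Print the command-line arguments of process PID, one per line.
# PROC_ROOT defaults to /proc.
cmdline_of() {
    tr '\0' '\n' < "${2:-/proc}/$1/cmdline"
}

cmdline_of "$$"    # "$$" is the PID of the current shell
```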
307
308There are also a lot of configuration options under the `/proc/sys` directory.
309
310## Further reading
311
312- "Design of the FAT file system" on Wikipedia
313: <https://en.wikipedia.org/wiki/Design_of_the_FAT_file_system>
314- ext4 on Wikipedia
315: <https://en.wikipedia.org/wiki/Ext4>
316- GPT on Wikipedia
317: <https://en.wikipedia.org/wiki/GUID_Partition_Table>
318- MBR on Wikipedia
319: <https://en.wikipedia.org/wiki/Master_boot_record>
320- Using the `parted` command line tool on LinuxJourney
321: <https://linuxjourney.com/lesson/disk-partitioning>
322- Manual of the `mount` command
323: <https://linux.die.net/man/8/mount>
324- Manual of `sysfs` file system
325: <https://man7.org/linux/man-pages/man5/sysfs.5.html>
326- Manual of `proc` file system
327: <https://man7.org/linux/man-pages/man5/proc.5.html>