Please Criticize This…
OK, I am working on the explanation of Unix-like file systems intended for programmers that don’t know anything about the Unix-like file system structure. This is a rough draft, and I know the appendix is sketchy, it needs to be - well - written. I think I did a half decent job, but it occurs to me as I write the technical documentation that there is no explanation of the Unix-like file system concept. So please please please help me by demanding clarification at points!
Cheers!
AngryPhysicist
Revision 1 Added more information about how the directory really is-a file.
An Introduction to Unix-like File Systems
What the hell’s a file system?
A file system is used nowadays to manage the hard disk, floppy disk, and well any block device. It uses files to centralize data, and directories to organize files and other directories.
The hard disk has a geometry (see the appendix) so the file system has to “respect” this. There are various approaches to making a file system, we shall inspect the Unix approach specifically since this is a manual for a Unix-like operating system.
The Unix Approach: Inodes
It should be emphasized EVERYTHING “IS-A” FILE! Unix is (sorta) object oriented like that, that’s one of the appeals to it. Any operation that can be done on a file (read, write, open, close, etc.) can be performed on directories, devices, etc.
The file system is key to the Unix philosophy…everything “is-a” file after all! The file system is like the heart of Unix…so it is worthy to spend time to discuss the idea behind the Unix-like file systems. We shall use pseudo-code that is similar to C# or D.
The basic idea is this: everything is a file. A file is represented by a data structure called an inode (index node). The inode (also written as “i-node”, “index-node”, etc.) has fields that store information about the file or the addresses of the blocks. If you are unfamiliar with disk geometry, there is an appendix on it. A toy inode could be:
struct inode{
uint mode;
uint size;
uint *blocks[MAX_ADDRESSES];
}There are only three simple fields that really define an inode. One tells us what sort of file we are dealing with. In unix, we have ordinary files and directories, as well as file representation of the various devices, pipes, and sockets. (The last two, pipes and sockets, are a sort of data structure used in interprocess communication.)
The next thing that gives us information about the file is the size of the file, which is given to us by the field
uint size.Last, and by no stretch of the imagination least,
uint *blocks[MAX_ADDRESSES]. This field is the pointers to the various blocks that the inode stores the data in. Typically, the last several blocks are so-called indirect blocks. They hold addresses to blocks which are then used to hold addresses to more blocks. Singly indirect blocks use 1 indirect block (that is, the address it refers to is a block full of addresses; the exact number of addresses is equal to (the size of the block in bytes) / (the size of the variable one uses for addressing in bytes)). TO figure out how much data storage this gives us, we have the number of addresses in the indirect block given, we shall call it N. Now, each address refers to a block of data, so there is: N * (size of a block in bytes) bytes per singly indirect block.A doubly indirect block uses a block that is full of addresses to another indirect block. That is to say, the address referred to us by the inode is a block full of addresses. These addresses refer us to more blocks which are chalk full of addresses. These addresses then refer us to the data. To figure out how many addresses it refers to, we simply figure: (1 block of addresses) * (X bytes per block) * (1 address per Y bytes) = number of addresses referring us in the first indirect block. We previously called this N. Now, each address refers us to what is pragmatically a singly indirect block. So we have: N * ( N * (size of a block in bytes) ) = bytes supported in a doubly indirect block.
There is also the triply indirect block. If you haven’t spotted the pattern so far, here’s the trick: for an Z indirect block, it supports N**Z * (size of a block in bytes) bytes. So for our triply indirect block, it is: N * (N * (N * (size of a block in bytes) ) ) bytes.
To figure out how many bytes an inode supports total, simply sum up the direct blocks with the indirect blocks. It’s a simple sum.
The Organization of the Disk
The basic structure of the inode approach more or less organizes the hard disk (or any block device really) into several components. We shall briefly cover this organization here.
The first important block is the zeroeth block: the boot block. It is the block that is loaded into memory after bios. It must be preserved at all costs.
Then we have two bit map blocks. The first is the inode bitmap. The idea is that the disk has a finite space (despite our wishes!). There are only so many inodes that are needed. So we allocate a section of space for these inodes. We need to keep track of whether these inodes are being used or are free. We use a bitmap to do this. A bitmap is really just a collection of bits (ones and zeroes). The first bit refers to the first inode, the second bit refers to the second inode, etc. We indicate whether an inode is used by a 1 and it is free by representing it with a 0 in the bitmap.
The second bitmap that follows this is the block bitmap. Just as the inode kept track of the free and used inodes, the block bitmap tracks the blocks and whether they are free or taken.
Now we have the space allocated to the inodes. This is the inode table. It is the contiguous space of inodes that keeps track of, and organizes, the disk.
The rest of the disk is used for the data directly (or as indirect blocks full of addresses). They do not hold inodes, etc.
Directories in Unix Like File Systems
Well, we see what a file is (it’s just a collection of blocks basically!). What about a directory? It seems a little more complicated. The directory is a file. However, unlike an ordinary file that stores data, a directory has special entries within it. The directory entry is a glorified pointer to a file. It stores the file name, the inode number of the file, and sometimes other fields that are useful.
We treat the directory as a file. A file is an inode. An inode stores addresses to blocks of memory for data. A directory, however, has no data other than these glorified links called directory entries. Well, since we cannot really change the structure of the inode, we need to simply change the data that are stored on the blocks. We use things called directory entries, a data structure often shortened to dir_entry. Let us look at a simplistic one:
struct dir_entry {
char name[NAME_LENGTH];
int inodeNumber;
}This is the simplest directory entry. It has two fields that are really important: the name of the entry within the directory, and the inode number.
The inode number refers to the inode being the Nth inode in the inode table. This is useful if we wish to know where the inode is, and if we would just so happen like to actually access the data.
Supposing that the maximum length of the name is 256 (i.e.
#define NAME_LENGTH 256), then the size of adir_entryis: (256 char) * (1 byte per char) + (1 int) * (4 bytes per int) = 256+4 = 260 bytes. That is roughly half of a sector (512/2 = 206, I know I know!). And an inode can refer to (supposing that there are 16 addresses all together) 16 blocks directly (I’m lazy, so I won’t try to calculate out “What if we use 1 singly indirect block? What if we use 1 doubly indirect block? What if…?”). Thus, in a handwavy lazy fashion, there are at most 32 entries in a directory this way…yes yes this also assumes that 1 block is 1 sector big (and 1 sector is 512 bytes). <!–I lied, I will calculate out how manydir_entrythere are for: 1 singly indirect block, 1 singly and 1 doubly indirect block, and 1 singly 1 doubly and 1 triply indirect blocks. Well, for a singly indirect block using 4 bytes for addressing, we have (512 bytes per block) * (1 address per 4 bytes) = (512/4) addresses = 128 addresses. Thus if each refers to a block, we have (128 blocks) * (512 bytes per block) = 65536 bytes. Thus for an inode with 15 direct addresses and 1 singly indirect block, we have (15 addresses + 128 addresses) * (512 bytes) = 73,216 bytes. There are 281.6 directory entries, round it down and we have 281 directory entries.Suppose now we have 14 direct blocks, 1 singly indirect block, and 1 doubly indirect block. The doubly indirect block has 128 addresses that refers to 128 addresses, or (128 addresses) * (128 addresses) = 16,384 addresses. Now, to add together all the addresses: (14 direct addresses) + (128 singly indirect addresses) + (16384 doubly indirect addresses) = 16,526 addresses. Now this supports: (16526 blocks) * (512 bytes per block) = 8,461,312 bytes. This in turn supports (8461312 bytes) * (1 dir_entry per 260 bytes) = 32543.507692308 dir_entries. We round down to 32,543 directory entries.
For 13 direct blocks, 1 singly indirect block, 1 doubly indirect block, and 1 triply indirect block, let us calculate this one out. We know how many addresses there are for direct, singly indirect, and doubly indirect blocks, but we are ignorant of the number of addresses supported by the triply indirect block. Let us figure it out now. There is a block that has addresses to other blocks full of addresses to other blocks. That is (128 addresses) * (128 addresses) * (128 addresses) = 2,097,152 addresses. Now this inode supports: (13 direct addresses) + (128 singly indirect addresses) + (16384 doubly indirect addresses) + (2097152 triply indirect addresses) = 2,113,677 addresses. This in turn supports a total of: (2113677 addresses) * (1 block per address) * (512 bytes per block) = 1,082,202,624 bytes. Now, each directory entry is 260 bytes, so that means that this inode approach supports (1082202624 bytes) * (1 dir_entry per 260 bytes) = 4162317.784615385 directory entries. We round this down to become 4,162,317 directory entries.
–>Appendix: Disk Geometry
Disk geometry is an interesting concept in and of itself. The hard disk is divided into atoms of 512 byte sectors. Typically file systems use a “block” concept, where there are 1, 2, 4, 8, or sometimes 16 sectors in a block. Note how it’s always a multiple of 2 (2**0, 2**1, 2**2, etc.).
22 July 2007 at 6:42 pm
Unless it is a Unix specific term, I’d expect the term ‘format’ to be used instead of ‘geometry’.
‘everything “is-a” file after all!’ is unclear (for me anyway) until it is further explained 3 paragraphs down.
Perhaps I diagram showing the blocks, sectors and indirections would be good.
The rest is good and gave me a interesting intro into the Unix way of doing things (I’m a Windows dev).
22 July 2007 at 7:02 pm
I do need to elaborate hard disk geometry a little more. Well, there’s practically no explanation given! So perhaps I should start there
It is, however, used in Windows.
A Valid point, I’ll edit the piece shortly to reflect this.
I was tempted to do this too, but what stopped me was two things: 1) I’m really bad when it comes to diagrams, and 2) I have no clue how to use diagrams on wordpress.
But I whole heartedly agree, diagrams would make life easier…far far far easier.
This is actually only an introduction to a further more elaborate introduction to Unix file systems.
I was hoping to do case by case studies starting from a toy operating system (Skelix), then moving on to FreeBSD and ending with Linux. I’m done with the Skelix portion, but I need to work on the FreeBSD and Linux sections.
Then I was hoping to move on to discuss the virtual file system, starting with SunOS’s original implementation of it, then ending with the Linux virtual file system. Perhaps walk through writing one from scratch?
This was intended to be part of the Brainix manual, as a sort of introduction and historical background to the file system concepts. That’s why there’s so much stuff.
22 July 2007 at 7:25 pm
IANAComputer Engineer, but…
“Geometry” is the industry standard term for the device-level structure of the drive, which is usually pretty closely related to its physical parameters — the number of platters, how clusters of magnetized regions are arranged on each, and so on. “Format” refers to a higher-level structure, consisting of a method for indexing the information in terms of the device-level structure so that programs don’t have to think in those terms.