-
Notifications
You must be signed in to change notification settings - Fork 770
File System, Part 2: Files are inodes
Big idea: Forget names of files: The 'inode' is the file.
It is common to think of the file name as the 'actual' file. It's not! Instead consider the inode as the file. The inode holds the meta-information (last accessed, ownership, size) and points to the disk blocks used to hold the file contents.
A directory is just a mapping of names to inode numbers. POSIX provides a small set of functions to read the filename and inode number for each entry (see below)
Let's think about what it looks like in the actual file system. Theoretically, directories are just like actual files. The disk blocks will contain directory entries or dirents. What that means is that our disk block can look like this
inode_num | name |
---|---|
2043567 | hi.txt |
... |
Each directory entry could either be a fixed size, or a variable c-string. It depends on how the particular filesystem implements it at the lower level.
From a shell, use ls
with the -i
option
$ ls -i
12983989 dirlist.c 12984068 sandwich.c
From C, call one of the stat functions (introduced below).
Use the stat calls. For example, to find out when my 'notes.txt' file was last accessed -
struct stat s;
stat("notes.txt", &s);
printf("Last accessed %s", ctime(&s.st_atime));
There are actually three versions of stat
;
int stat(const char *path, struct stat *buf);
int fstat(int fd, struct stat *buf);
int lstat(const char *path, struct stat *buf);
For example you can use fstat
to find out the meta-information about a file if you already have an file descriptor associated with that file
FILE *file = fopen("notes.txt", "r");
int fd = fileno(file); /* Just for fun - extract the file descriptor from a C FILE struct */
struct stat s;
fstat(fd, & s);
printf("Last accessed %s", ctime(&s.st_atime));
The third call 'lstat' we will discuss when we introduce symbolic links.
In addition to access,creation, and modified times, the stat structure includes the inode number, length of the file and owner information.
struct stat {
dev_t st_dev; /* ID of device containing file */
ino_t st_ino; /* inode number */
mode_t st_mode; /* protection */
nlink_t st_nlink; /* number of hard links */
uid_t st_uid; /* user ID of owner */
gid_t st_gid; /* group ID of owner */
dev_t st_rdev; /* device ID (if special file) */
off_t st_size; /* total size, in bytes */
blksize_t st_blksize; /* blocksize for file system I/O */
blkcnt_t st_blocks; /* number of 512B blocks allocated */
time_t st_atime; /* time of last access */
time_t st_mtime; /* time of last modification */
time_t st_ctime; /* time of last status change */
};
Let's write our own version of 'ls' to list the contents of a directory.
#include <stdio.h>
#include <dirent.h>
#include <stdlib.h>
int main(int argc, char **argv) {
if(argc == 1) {
printf("Usage: %s [directory]\n", *argv);
exit(0);
}
struct dirent *dp;
DIR *dirp = opendir(argv[1]);
while ((dp = readdir(dirp)) != NULL) {
puts(dp->d_name);
}
closedir(dirp);
return 0;
}
Ans: Use opendir
, readdir
, closedir
For example, here's a very simple implementation of ls
to list the contents of a directory.
#include <stdio.h>
#include <dirent.h>
#include <stdlib.h>
int main(int argc, char **argv) {
if(argc ==1) {
printf("Usage: %s [directory]\n", *argv);
exit(0);
}
struct dirent *dp;
DIR *dirp = opendir(argv[1]);
while ((dp = readdir(dirp)) != NULL) {
printf("%s %lu\n", dp-> d_name, (unsigned long)dp-> d_ino );
}
closedir(dirp);
return 0;
}
Note: after a call to fork(), either (XOR) the parent or the child can use readdir(), rewinddir() or seekdir(). If both the parent and the child use the above, behavior is undefined.
For example, to see if a particular directory includes a file (or filename) 'name', we might write the following code. (Hint: Can you spot the bug?)
int exists(char *directory, char *name) {
struct dirent *dp;
DIR *dirp = opendir(directory);
while ((dp = readdir(dirp)) != NULL) {
puts(dp->d_name);
if (!strcmp(dp->d_name, name)) {
return 1; /* Found */
}
}
closedir(dirp);
return 0; /* Not Found */
}
The above code has a subtle bug: It leaks resources! If a matching filename is found then 'closedir' is never called as part of the early return. Any file descriptors opened, and any memory allocated, by opendir are never released. This means eventually the process will run out of resources and an open
or opendir
call will fail.
The fix is to ensure we free up resources in every possible code-path. In the above code this means calling closedir
before return 1
. Forgetting to release resources is a common C programming bug because there is no support in the C lanaguage to ensure resources are always released with all codepaths.
There are two main gotchas and one consideration:
The readdir
function returns "." (current directory) and ".." (parent directory). If you are looking for sub-directories, you need to explicitly exclude these directories.
For many applications it's reasonable to check the current directory first before recursively searching sub-directories. This can be achieved by storing the results in a linked list, or resetting the directory struct to restart from the beginning.
One final note of caution: readdir
is not thread-safe! For multi-threaded searches use readdir_r
which requires the caller to pass in the address of an existing dirent struct.
See the man page of readdir for more details.
Ans: Use S_ISDIR
to check the mode bits stored in the stat structure
And to check if a file is regular file use S_ISREG
,
struct stat s;
if (0 == stat(name, &s)) {
printf("%s ", name);
if (S_ISDIR( s.st_mode)) puts("is a directory");
if (S_ISREG( s.st_mode)) puts("is a regular file");
} else {
perror("stat failed - are you sure I can read this file's meta data?");
}
Yes! Though a better way to think about this, is that a directory (like a file) is an inode (with some data - the directory name and inode contents). It just happens to be a special kind of inode.
From Wikipedia:
Unix directories are lists of association structures, each of which contains one filename and one inode number.
Remember, inodes don't contain filenames--only other file metadata.
First remember that a file name != the file. Think of the inode as 'the file' and a directory as just a list of names with each name mapped to an inode number. Some of those inodes may be regular file inodes, others may be directory inodes.
If we already have a file on a file system we can create another link to the same inode using the 'ln' command
$ ln file1.txt blip.txt
However blip.txt is the same file; if I edit blip I'm editing the same file as 'file1.txt!' We can prove this by showing that both file names refer to the same inode:
$ ls -i file1.txt blip.txt
134235 file1.txt
134235 blip.txt
These kinds of links (aka directory entries) are called 'hard links'
The equivalent C call is link
link(const char *path1, const char *path2);
link("file1.txt", "blip.txt");
For simplicity the above examples made hard links inside the same directory however hard links can be created anywhere inside the same filesystem.
When you remove a file (using rm
or unlink
) you are removing an inode reference from a directory.
However the inode may still be referenced from other directories. In order to determine if the contents of the file are still required, each inode keeps a reference count that is updated whenever a new link is created or destroyed.
An example use of hard-links is to efficiently create multiple archives of a file system at different points in time. Once the archive area has a copy of a particular file, then future archives can re-use these archive files rather than creating a duplicate file. Apple's "Time Machine" software does this.
No. Well yes. Not really... Actually you didn't really want to do this, did you?
The POSIX standard says no you may not! The ln
command will only allow root to do this and only if you provide the -d
option. However even root may not be able to perform this because most filesystems prevent it!
The integrity of the file system assumes the directory structure (excluding softlinks which we will talk about later) is a non-cyclic tree that is reachable from the root directory. It becomes expensive to enforce or verify this constraint if directory linking is allowed. Breaking these assumptions can cause file integrity tools to not be able to repair the file system. Recursive searches potentially never terminate and directories can have more than one parent but ".." can only refer to a single parent. All in all, a bad idea.
Legal and Licensing information: Unless otherwise specified, submitted content to the wiki must be original work (including text, java code, and media) and you provide this material under a Creative Commons License. If you are not the copyright holder, please give proper attribution and credit to existing content and ensure that you have license to include the materials.