- Name
- Status
- Description
- Synopsis
- Exceptions
- Classes
- File system operation methods
- Stat methods
- Author
- Copyright and License
fsutil
This library is considered production ready.
File-system Utilities.
```python
from pykit import fsutil

fsutil.get_mountpoint('/bin')
# '/'
fsutil.get_device('/bin')
# '/dev/sdb1'
fsutil.get_disk_partitions()
# {
#  '/': {'device': '/dev/disk1',
#        'fstype': 'hfs',
#        'mountpoint': '/',
#        'opts': 'rw,local,rootfs,dovolfs,journaled,multilabel'},
#  '/dev': {'device': 'devfs',
#           'fstype': 'devfs',
#           'mountpoint': '/dev',
#           'opts': 'rw,local,dontbrowse,multilabel'},
#  ...
# }
```
`NoData`: raised by `Cat.cat()` or `Cat.iterate()` when there is no data to scan before the timeout.
Raised when more than one exclusive `Cat` instance with the same `id` tries to scan the same file.
`NotMountPoint`: raised when trying to use an invalid mount point path.
`NoSuchFile`: raised by `Cat.cat()` or `Cat.iterate()` when the file is not present before the timeout.
Synopsis: continuously tail an nginx log and print each line. If there is no new data for 1 hour, it quits.

```python
from pykit import fsutil

fn = '/var/log/nginx/access.log'
for line in fsutil.Cat(fn).iterate(timeout=3600):
    print(line)
```
Like the *nix commands `cat` or `tail`, it continuously scans a file line by line.
It provides two ways to handle lines: as a generator, or by specifying a handler function.
It also records the offset of the last scan in a file under `/tmp/`.
If the file has not changed (its inode number is the same), it resumes scanning from the last offset. Otherwise it scans from the first byte.
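The resume-by-inode behaviour described above can be sketched with the standard library alone. This is not pykit's implementation. The JSON stat-file format and the helper names `load_offset`, `save_offset` and `scan_new_lines` are hypothetical. The sketch also skips locking, timeouts and `default_seek`.

```python
import json
import os
import tempfile


def load_offset(fn, stat_fn):
    """Return the offset to resume from, or 0 if the record is missing or stale."""
    try:
        with open(stat_fn) as f:
            stat = json.load(f)
    except (IOError, OSError, ValueError):
        return 0  # no stat file yet, or it is broken
    if stat.get('inode') != os.stat(fn).st_ino:
        return 0  # the file was replaced, so the old offset does not apply
    return stat.get('offset', 0)


def save_offset(fn, stat_fn, offset):
    """Record the file's inode together with the offset reached so far."""
    with open(stat_fn, 'w') as f:
        json.dump({'inode': os.stat(fn).st_ino, 'offset': offset}, f)


def scan_new_lines(fn, stat_fn):
    """Return complete lines added since the last call and remember the new offset."""
    offset = load_offset(fn, stat_fn)
    with open(fn, 'rb') as f:
        f.seek(offset)
        data = f.read()
    # consume only complete lines; a trailing partial line is left for next time
    end = data.rfind(b'\n') + 1
    save_offset(fn, stat_fn, offset + end)
    return data[:end].splitlines(True)


# Usage: scan a growing file twice (hypothetical file names)
workdir = tempfile.mkdtemp()
log_fn = os.path.join(workdir, 'access.log')
stat_fn = os.path.join(workdir, 'access.log.offset')
with open(log_fn, 'wb') as f:
    f.write(b'a\nb\n')
print(scan_new_lines(log_fn, stat_fn))
with open(log_fn, 'ab') as f:
    f.write(b'c\npartial')
print(scan_new_lines(log_fn, stat_fn))
```

Keying the record on the inode rather than the file name is what lets a rotated log (same name, new file) be read again from the start.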
syntax:
Cat(fn, handler=None, file_end_handler=None, exclusive=True, id=None, strip=False, read_chunk_size=16*1024**2)
arguments:
- `fn`: specifies the file to scan.
- `handler`: specifies a callable to handle each line, used when `Cat()` is not used in generator mode. It can be a callable or a list of callables. See method `Cat.cat()`.
- `file_end_handler`: specifies a callable to run when the end of the file is reached. It is called every time the scan reaches the end of the file. The scan may not quit there, though, and can keep waiting for new data for a while. Thus `file_end_handler` can be called more than once.
- `exclusive`: if `True`, only one process at a time can scan the same file as the same `Cat`, that is, with the same `id` and the same `fn`. Two `Cat` instances with different ids can scan the same file at the same time, and each records its own offset in a separate file. By default it is `True`.
- `id`: specifies the instance id. `id` identifies a `Cat` instance. It is used as part of the offset record file name (in `/tmp/`) and to exclude other instances (see `exclusive`). By default `id` is the file name of the currently running Python script, so normally a user does not need to specify `id` explicitly.
- `strip`: `True` or `False`, whether to strip blank characters (space, tab, `\r` and `\n`) from each line before returning it. By default it is `False`.
- `read_chunk_size`: the buffer size for each read. A suitably small `read_chunk_size` returns streaming data sooner. By default it is `16*1024**2`.
config:
- `cat_stat_dir`: specifies the base directory for storing offset record files. By default it is `/tmp`.

```python
# cat pykitconfig
cat_stat_dir = '/'
```

```python
# cat usage.py
from pykit import fsutil

fn = '/var/log/nginx/access.log'
for line in fsutil.Cat(fn).iterate(timeout=3600):
    print(line)
```
Make a generator to yield every line.
syntax:
Cat.iterate(timeout=None)
arguments:
- `timeout`: specifies the time in seconds to wait for new data. If `timeout` is `0` or smaller than `0`, the file is scanned at most once:
  - If there is any data, it returns lines until it reaches the end of the file.
  - If there is no data at all, it raises a `NoData` error.

  By default it is 3600.
- `default_seek`: specifies a default offset for when the last scanned offset is not available or not valid.

  "Not available" means the stat file that stores the scanning offset does not exist or is broken. For example, when a file is scanned for the first time, its stat file does not exist yet.

  "Not valid" means the information in the stat file does not belong to the file about to be scanned. This happens when a file is deleted and then created again: the stat file still describes the deleted file, not the new one.

  When `default_seek` is a negative number, the last offset stored in the stat file is also treated as not valid if it lags too far behind the file size. The absolute value of `default_seek` is the maximum allowed difference. It can take the following values:
  - `fsutil.SEEK_START`: scan from the beginning of the file.
  - `fsutil.SEEK_END`: scan from the end of the file, so that only new data is scanned.
  - `x` (a non-negative number, including `0`): scan from offset `x`.
  - `-x` (a negative number): the maximum allowed difference between the last offset and the file size. If the difference is bigger than `x`, scan from `x` bytes before the end of the file instead of from the last offset. This is useful when you want to scan from near the end of the file. `fsutil.SEEK_END` cannot solve this problem, because it only takes effect when the last offset is not available.

  By default it is `fsutil.SEEK_START`.
return: a generator.
raise:
- `NoSuchFile`: if the file is not present before `timeout`.
- `NoData`: if the file has no un-scanned data before `timeout`.
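The rules for picking a start offset can be condensed into one function. This is a sketch of the rules as documented, not pykit's code. Several points are assumptions:
- `SEEK_START` and `SEEK_END` are given hypothetical string values.
- `last_offset=None` stands for an offset that is "not available" or "not valid".
- When no offset is recorded, a negative `default_seek` is assumed to mean "start `x` bytes before the end". The text does not say this.

```python
SEEK_START = 'start'  # hypothetical stand-in for fsutil.SEEK_START
SEEK_END = 'end'      # hypothetical stand-in for fsutil.SEEK_END


def start_offset(file_size, last_offset, default_seek=SEEK_START):
    """Pick the byte offset to start scanning from.

    last_offset is None when the recorded offset is not available
    (no or broken stat file) or not valid (it belongs to another inode).
    """
    if default_seek == SEEK_START:
        fallback = 0
    elif default_seek == SEEK_END:
        fallback = file_size
    elif default_seek >= 0:
        fallback = default_seek
    else:
        # -x: x bytes before the end of the file, clamped at 0 (an assumption)
        fallback = max(0, file_size + default_seek)

    if last_offset is None:
        return fallback

    # A negative default_seek also rejects a last offset that lags behind
    # the end of the file by more than x bytes.
    if not isinstance(default_seek, str) and default_seek < 0:
        if file_size - last_offset > -default_seek:
            return fallback
    return last_offset


print(start_offset(1000, 300, SEEK_END))  # SEEK_END is ignored: a valid offset exists
print(start_offset(1000, 300, -500))      # lag of 700 > 500, so start at 500
```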
Similar to `Cat.iterate()`, except that it blocks until it times out or reaches the end of the file, and lets `Cat.handler` deal with each line.
syntax:
Cat.cat(timeout=None)
return: Nothing.
Returns the full path of the file that stores the scanning offset.
syntax:
Cat.stat_path()
return: string
Remove the file used to store the scanning offset.
syntax:
Cat.reset_stat()
return: Nothing
syntax:
fsutil.assert_mountpoint(path)
Ensure that `path` is a mount point. Otherwise a `NotMountPoint` error is raised.
arguments:
`path`: a path that has to be an existing file path.
return: Nothing
syntax:
fsutil.get_all_mountpoint(all=False)
Returns a list of all mount points on this host.
arguments:
- `all`: whether to also return mount points of non-physical devices. By default it is `False`, so only disk drive mount points are returned. `tmpfs` or `/proc` mounts are not returned by default.

return: a list of mount point paths as strings.
syntax:
fsutil.get_device(path)
Get the device path (such as `/dev/sdb`) on which `path` resides.
arguments:
`path`: a path that has to be an existing file path.
return: the device path, such as `"/dev/sdb"`, as a string.
syntax:
fsutil.get_device_fs(device)
Return the file-system name of a device, if the device is a disk device.
arguments:
`device`: the path of a device, such as `/dev/sdb1`.
return: the file-system name, such as `ext4` or `hfs`.
syntax:
fsutil.get_disk_partitions(all=True)
Find all mounted paths and return their mount point information in a dictionary.
arguments:
`all`: by default it is `True`, so all mount points, including non-disk paths, are returned. Otherwise `tmpfs` or `/proc` are not returned.
return: a dictionary indexed by mount point path:

```python
{
    '/': {'device': '/dev/disk1',
          'fstype': 'hfs',
          'mountpoint': '/',
          'opts': 'rw,local,rootfs,dovolfs,journaled,multilabel'},
    '/dev': {'device': 'devfs',
             'fstype': 'devfs',
             'mountpoint': '/dev',
             'opts': 'rw,local,dontbrowse,multilabel'},
    '/home': {'device': 'map auto_home',
              'fstype': 'autofs',
              'mountpoint': '/home',
              'opts': 'rw,dontbrowse,automounted,multilabel'},
    '/net': {'device': 'map -hosts',
             'fstype': 'autofs',
             'mountpoint': '/net',
             'opts': 'rw,nosuid,dontbrowse,automounted,multilabel'}
}
```
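On Linux, a mapping of the same shape can be built from `/proc/mounts`. Each line there holds the device, mount point, file-system type and options, followed by two numeric fields. The helper below is hypothetical and is not how pykit gathers this data. It also does not filter out non-disk file systems the way `all=False` would, and it only decodes the `\040` (space) escape.

```python
def parse_mounts(text):
    """Parse /proc/mounts-style text into {mountpoint: info} (hypothetical helper)."""
    partitions = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mountpoint, fstype, opts = fields[:4]
        # /proc/mounts escapes a space inside a path as \040
        mountpoint = mountpoint.replace('\\040', ' ')
        partitions[mountpoint] = {'device': device,
                                  'fstype': fstype,
                                  'mountpoint': mountpoint,
                                  'opts': opts}
    return partitions


# Usage on Linux:
# with open('/proc/mounts') as f:
#     print(parse_mounts(f.read()))
```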
syntax:
fsutil.get_mountpoint(path)
Return the mount point on which `path` resides.
All symbolic links are resolved when looking up the mount point.
arguments:
`path`: a path that has to be an existing file path.
return: the mount point path (one of the mount points listed by the `mount` command on Linux).
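The lookup can be sketched with `os.path.realpath`, which resolves symbolic links, and `os.path.ismount`. The sketch walks up the directory tree until it finds a mount point. It only illustrates the technique and is not necessarily how pykit implements it.

```python
import os


def get_mountpoint(path):
    """Return the mount point that `path` resides on."""
    path = os.path.realpath(path)  # resolve all symbolic links first
    while not os.path.ismount(path):
        parent = os.path.dirname(path)
        if parent == path:  # reached the root without finding a mount point
            break
        path = parent
    return path


print(get_mountpoint(os.getcwd()))
```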
syntax:
fsutil.get_path_fs(path)
Return the file-system name of the device on which `path` is mounted.
arguments:
`path`: a file path on a file system.
return: the file-system name, such as `ext4` or `hfs`.
syntax:
fsutil.get_sub_dirs(path)
Get all sub-directories of `path`, sorted.
arguments:
`path`: the directory path.
return: a list of all sub-directory names.
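A minimal equivalent that uses only `os` is shown below. Note that `os.path.isdir` follows symbolic links, so a link to a directory counts as a sub-directory here. Whether pykit treats such links the same way is an assumption.

```python
import os


def get_sub_dirs(path):
    """Return the sorted names of the immediate sub-directories of `path`."""
    return sorted(name for name in os.listdir(path)
                  if os.path.isdir(os.path.join(path, name)))


print(get_sub_dirs('/'))
```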
syntax:
fsutil.list_fns(path, pattern='.*')
List all files in `path` whose names match `pattern`.
arguments:
- `path`: a directory path.
- `pattern`: the wanted file name pattern, as a regular expression.

return: an alphabetically sorted list of all matching file names in `path`.
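Here is a sketch using `os` and `re`. The text does not say whether `pattern` must match the whole name or only part of it. This sketch assumes `re.match` semantics, which anchor the pattern at the start of the name only. It also lists regular files only.

```python
import os
import re


def list_fns(path, pattern='.*'):
    """Return the sorted names of regular files in `path` matching `pattern`."""
    regex = re.compile(pattern)
    return sorted(name for name in os.listdir(path)
                  if os.path.isfile(os.path.join(path, name))
                  and regex.match(name))


# Usage: Python source files in the standard library directory
stdlib = os.path.dirname(os.path.abspath(os.__file__))
print(list_fns(stdlib, r'.*\.py$')[:5])
```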
syntax:
fsutil.makedirs(*path, mode=0755, uid=None, gid=None)
Make a directory. If intermediate directories do not exist, create them too.
arguments:
- `*path`: either a single path such as `/tmp/foo`, or separate path parts such as `('/tmp', 'foo')`.
- `mode`: the permission mode for the directory, whether it is newly created or already exists. By default it is `0755`.
- `uid` and `gid`: another user/group to own the directory to create. By default they are `None`, and the created directory inherits ownership from the running Python program.

return: Nothing
raise:
`OSError` if trying to create a directory at the path of an existing non-directory file, or on other problems such as permission denied.
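This is a Python 3 sketch of the described behaviour, not pykit's code. A few design choices are worth spelling out:
- Missing parents are created with the same `mode`.
- Parents that already exist are left untouched (an assumption).
- `mode` is applied explicitly with `os.chmod`, because the mode passed to `os.mkdir` is filtered by the process umask.
- `0o755` is the Python 3 spelling of `0755`.

```python
import os
import tempfile


def makedirs(*path, mode=0o755, uid=None, gid=None):
    """Create a directory and any missing parents, then set its mode and owner."""
    path = os.path.join(*path)
    if not os.path.isdir(path):
        parent = os.path.dirname(path)
        if parent and not os.path.isdir(parent):
            makedirs(parent, mode=mode, uid=uid, gid=gid)
        try:
            os.mkdir(path, mode)
        except OSError:
            # tolerate a concurrent creation; re-raise anything else,
            # e.g. a non-directory file already occupying `path`
            if not os.path.isdir(path):
                raise
    # os.mkdir's mode is masked by the umask, so apply it explicitly
    os.chmod(path, mode)
    if uid is not None or gid is not None:
        # -1 leaves the corresponding id unchanged
        os.chown(path, -1 if uid is None else uid, -1 if gid is None else gid)


# Usage: create <tmp>/a/b, accessible only to its owner
base = tempfile.mkdtemp()
makedirs(base, 'a', 'b', mode=0o700)
```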
syntax:
fsutil.read_file(path)
Read and return the entire content of the file specified by `path`.
arguments:
`path`: the file path to read.
return: file content as a string.
syntax:
fsutil.remove(path, ignore_errors=False, onerror=None)
Recursively delete `path`, which can be a file, a directory or a symbolic link.
arguments:
- `path`: the path to remove.
- `ignore_errors`: whether to ignore `os.error` while deleting `path`.
- `onerror`: if `ignore_errors` is `True`, errors (`os.error`) are ignored. Otherwise, if `onerror` is set, it is called to handle the error with arguments `(func, path, exc_info)`, where `func` is `os.listdir`, `os.remove`, `os.rmdir` or `os.path.isdir`.

return: Nothing
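The sketch below is built on `shutil.rmtree` and is not pykit's code. It illustrates one point: the order of the checks. A symbolic link is removed as a link even when it points to a directory, so the link's target is never deleted.

```python
import os
import shutil
import sys
import tempfile


def remove(path, ignore_errors=False, onerror=None):
    """Remove a file, a symbolic link or a whole directory tree."""
    # check islink first: os.path.isdir() follows links to directories
    if os.path.islink(path) or not os.path.isdir(path):
        try:
            os.remove(path)
        except OSError:
            if ignore_errors:
                return
            if onerror is None:
                raise
            onerror(os.remove, path, sys.exc_info())
        return
    shutil.rmtree(path, ignore_errors=ignore_errors, onerror=onerror)


# Usage: removing a link to a directory leaves the directory in place
base = tempfile.mkdtemp()
os.makedirs(os.path.join(base, 'dir', 'sub'))
os.symlink(os.path.join(base, 'dir'), os.path.join(base, 'link'))
remove(os.path.join(base, 'link'))
```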
syntax:
fsutil.write_file(path, content, uid=None, gid=None, atomic=False, fsync=True)
Write `content` to the file `path`.
arguments:
- `path`: the file path to write to.
- `content`: the content to write.
- `uid` and `gid`: the user id and group id that the file belongs to. By default they are `None`, which means the written file inherits ownership from the running Python script.
- `atomic`: atomically write content to `path`. The content is written to a temporary file, which is then renamed to `path`. Within one process, temporary file names for the same path are distinguished by `timeutil.ns()`. As a result, the write is not atomic if two temporary files for the same path are created in the same nanosecond. The rename itself is an atomic operation (this is a POSIX requirement).
- `fsync`: whether to synchronize data to the storage device.

return: Nothing
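The temporary-file-then-rename technique behind `atomic=True` can be sketched as follows. This is a simplified, hypothetical version:
- It takes bytes only and ignores `uid`/`gid`.
- It builds the temporary name from the process id and a nanosecond-scale timestamp, instead of pykit's `timeutil.ns()`.

The temporary file sits in the same directory as `path`. That matters because a rename is atomic only within one file system.

```python
import os
import tempfile
import time


def write_file(path, content, atomic=False, fsync=True):
    """Write bytes to `path`; with atomic=True readers never see a partial file."""
    if atomic:
        # temp file beside the target so that os.rename stays on one file system
        tmp = '{0}._tmp_.{1}_{2}'.format(path, os.getpid(), int(time.time() * 1e9))
    else:
        tmp = path
    with open(tmp, 'wb') as f:
        f.write(content)
        f.flush()
        if fsync:
            os.fsync(f.fileno())  # push the data to the storage device
    if atomic:
        os.rename(tmp, path)  # atomic replacement on POSIX


# Usage: overwrite a file atomically
target = os.path.join(tempfile.mkdtemp(), 'conf')
write_file(target, b'v1\n')
write_file(target, b'v2\n', atomic=True)
```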
syntax:
fsutil.calc_checksums(path, sha1=False, md5=False, crc32=False, sha256=False, block_size=READ_BLOCK, io_limit=READ_BLOCK)
Calculate checksums of `path`, such as `sha1`, `md5`, `crc32` and `sha256`.

```python
from pykit import fsutil

file_name = 'test.file'
fsutil.write_file(file_name, '')
print(fsutil.calc_checksums(file_name, sha1=True, md5=True, crc32=False, sha256=True))
# {
#  'sha1': 'da39a3ee5e6b4b0d3255bfef95601890afd80709',
#  'md5': 'd41d8cd98f00b204e9800998ecf8427e',
#  'crc32': None,
#  'sha256': 'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855'
# }
```
arguments:
- `path`: the file path to calculate checksums for.
- `sha1`, `md5`, `crc32` and `sha256`: the checksum types to calculate. Each defaults to `False`. The result for a checksum type is `None` if that type is `False`.
- `block_size`: the buffer size used while reading the content of `path`.
- `io_limit`: the IO limit per second while reading the content of `path`. There is no limit if `io_limit` is a negative number.

return: a dict with keys `sha1`, `md5`, `crc32` and `sha256`.
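A single-pass sketch using `hashlib` and `zlib` is shown below. It is not pykit's implementation, and it makes these assumptions:
- The default `block_size` is an arbitrary value.
- `io_limit` is measured in bytes per second. The text only says "per second".
- Any non-positive `io_limit` means no limit.
- `crc32` is reported as 8 hex digits. The text does not show its format.

```python
import hashlib
import os
import tempfile
import time
import zlib


def calc_checksums(path, sha1=False, md5=False, crc32=False, sha256=False,
                   block_size=1024 * 1024, io_limit=-1):
    """Compute the requested checksums in one pass; unrequested ones are None."""
    hashers = {}
    if sha1:
        hashers['sha1'] = hashlib.sha1()
    if md5:
        hashers['md5'] = hashlib.md5()
    if sha256:
        hashers['sha256'] = hashlib.sha256()
    crc = 0 if crc32 else None

    start = time.time()
    total_read = 0
    with open(path, 'rb') as f:
        while True:
            buf = f.read(block_size)
            if not buf:
                break
            for h in hashers.values():
                h.update(buf)
            if crc is not None:
                crc = zlib.crc32(buf, crc)
            total_read += len(buf)
            if io_limit > 0:
                # sleep just long enough to keep the average rate <= io_limit
                ahead = total_read / float(io_limit) - (time.time() - start)
                if ahead > 0:
                    time.sleep(ahead)

    result = {'sha1': None, 'md5': None, 'crc32': None, 'sha256': None}
    for name, h in hashers.items():
        result[name] = h.hexdigest()
    if crc is not None:
        result['crc32'] = '%08x' % (crc & 0xffffffff)
    return result


# Usage: checksums of an empty file
fd, empty = tempfile.mkstemp()
os.close(fd)
print(calc_checksums(empty, sha1=True, md5=True, sha256=True))
```

Reading the file once and feeding every buffer to all requested hashers means the IO cost does not grow with the number of checksum types.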
syntax:
fsutil.get_path_inode_usage(path)
Collect inode usage information of the file system on which `path` is mounted.
arguments:
`path`: the file-system path to collect usage info for, such as `/tmp` or `/home/alice`.
return: a dictionary in the following format:

```
{
    'total': total number of inodes,
    'used': used inodes (including inodes reserved for the super user),
    'available': total - used,
    'percent': float(used) / total
}
```
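On POSIX systems these numbers can be derived from `os.statvfs`. In this sketch, which is not pykit's code, `used` is `f_files - f_favail`. Inodes reserved for the super user therefore count as used, matching the description above. Some file systems report no inode total; returning `0.0` for `percent` in that case is an assumption.

```python
import os


def get_path_inode_usage(path):
    """Inode usage of the file system that `path` is on."""
    st = os.statvfs(path)
    total = st.f_files
    # f_favail excludes inodes reserved for root, so they end up in `used`
    used = total - st.f_favail
    return {'total': total,
            'used': used,
            'available': total - used,
            'percent': float(used) / total if total else 0.0}


print(get_path_inode_usage('/'))
```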
syntax:
fsutil.get_path_usage(path)
Collect space usage information of the file system on which `path` is mounted.
arguments:
`path`: the file-system path to collect usage info for, such as `/tmp` or `/home/alice`.
return: a dictionary in the following format:

```
{
    'total': total space in bytes,
    'used': used space in bytes (including space reserved for the super user),
    'available': total - used,
    'percent': float(used) / total,
}
```

There are two concepts of unused space, `free` and `available`, because some file systems reserve part of the space (for example 5%) for a super user such as `root`:
- free: including blocks reserved for the super user.
- available: excluding blocks reserved for the super user.

Most of the time an application does not run as `root`, so it cannot use the reserved space.
Thus this function reports the `available` bytes by default.
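The free/available distinction maps directly onto `os.statvfs` fields. `f_bfree` includes the reserved blocks and `f_bavail` excludes them. Here is a sketch, not pykit's code, that reports `available` the way the text describes, with the reserved blocks counted as used.

```python
import os


def get_path_usage(path):
    """Space usage, in bytes, of the file system that `path` is on."""
    st = os.statvfs(path)
    total = st.f_blocks * st.f_frsize
    # f_bavail excludes blocks reserved for root; f_bfree would include them
    available = st.f_bavail * st.f_frsize
    used = total - available  # so the reserved blocks count as used
    return {'total': total,
            'used': used,
            'available': available,
            'percent': float(used) / total if total else 0.0}


print(get_path_usage('/'))
```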
syntax:
fsutil.iostat(device=None, path=None, stat_path=None)
Collect IO stat.
Synopsis:

```python
print(fsutil.iostat('/dev/sda1'))  # {'read': 6151, 'write': 34073, 'ioutil': 0}
print(fsutil.iostat(path='/'))     # {'read': 6151, 'write': 34073, 'ioutil': 100}
```
It accepts either `device` or `path` as the target to collect IO stat from:
- `device` should be a path starting with `/dev`, such as `/dev/sda1`.
- `path` is any path on a validly mounted file system. If `path` is used and `device` is `None`, it uses the device on which `path` is mounted.

One must specify either `device` or `path`.
`/proc/diskstats` provides IO stats accumulated since the host booted, such as the total count of read/write operations on a disk.
This function records snapshots of `/proc/diskstats` and returns the difference between two recorded snapshots.

`fsutil.iostat` reads the instant IO stat from `/proc/diskstats` and saves it in `stat_path`. The next time `fsutil.iostat` is called, it calculates the difference between the current stat from `/proc/diskstats` and the saved stat.

If no previous stat is saved in `stat_path`, it waits one second, loads `/proc/diskstats` again, and calculates the difference.
arguments:
- `device`: the device to collect IO stat from.
- `path`: the file-system path to collect IO stat from.
- `stat_path`: where to store and load the IO stat. By default it is `None`, in which case `config.iostat_stat_path` (`/tmp/pykit-iostat`) is used to save the stat.

return: a dict with 3 fields:

```python
{
    'read': 6151,
    'write': 34073,
    'ioutil': 0
}
```

`read` and `write` are in bytes per second.
`ioutil` is a percentage between 0 and 100.
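The diff calculation can be sketched independently of where the snapshots are stored. The helper names are hypothetical, and the sketch assumes the following:
- The Linux `/proc/diskstats` layout, in which sectors read, sectors written and milliseconds spent doing I/O are the 6th, 10th and 13th whitespace-separated fields.
- 512-byte sectors.
- `ioutil` computed as busy time divided by elapsed time.

Saving snapshots to `stat_path` is omitted.

```python
SECTOR_SIZE = 512  # /proc/diskstats counts 512-byte sectors


def parse_diskstats(text):
    """Map device name -> (bytes read, bytes written, ms spent doing I/O)."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 14:
            continue  # not a full per-device record
        stats[parts[2]] = (int(parts[5]) * SECTOR_SIZE,
                           int(parts[9]) * SECTOR_SIZE,
                           int(parts[12]))
    return stats


def io_rates(prev, cur, interval):
    """Rates between two snapshots of one device taken `interval` seconds apart."""
    read = (cur[0] - prev[0]) / float(interval)
    write = (cur[1] - prev[1]) / float(interval)
    # share of wall-clock time the device was busy, as a percentage
    ioutil = (cur[2] - prev[2]) / (interval * 1000.0) * 100
    return {'read': int(read), 'write': int(write), 'ioutil': min(100, int(ioutil))}


# Usage on Linux (device name is an example):
# import time
# with open('/proc/diskstats') as f:
#     before = parse_diskstats(f.read())
# time.sleep(1)
# with open('/proc/diskstats') as f:
#     after = parse_diskstats(f.read())
# print(io_rates(before['sda1'], after['sda1'], 1))
```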
Zhang Yanpo (张炎泼)
The MIT License (MIT)
Copyright (c) 2015 Zhang Yanpo (张炎泼)