SQLite is back #155
Replies: 3 comments
-
Interesting - so there is actually a working locking/transaction mechanism in SQlite in a serverless network filesystem ? |
Beta Was this translation helpful? Give feedback.
-
Considering that we are using shared filesystems for virtually all our data storage internally, we should be extra careful to choose solutions that works over NFS. I am afraid that sqlite is not appropriate, and in fact, there is probably no file-based database engine that can be, considering that file locking is the only mechanism available to ensure the atomicity of transactions, and guarantee the integrity of the database under concurrent writes. From the sqlite FAQ https://www.sqlite.org/faq.html#q5:
It seems to me that a client/server solution is the only reliable way to support remote filesystems. Depending of the use-case, it may be a viable solution or we may need to rethink the design... I am open to ☕ discussions on that topic :-) |
Beta Was this translation helpful? Give feedback.
-
I would like to have more details about what is meant by "fcntl() is broken". This 20 years old message gives interesting information : https://sqlite-users.sqlite.narkive.com/5UmR4gG1/sqlite-unix-file-locking-is-badly-broken I am not sure that file locking over NFS is still broken. I red some comments saying that recent Systems (2003+) should have trusted NFS locks. But yet nothing more than sporadic comments. I believe that it will not be so easy to find a reliable backend storing data in file system. In Capsul, we are using Redis but it is not a good alternative for huge databases since everything is put in memory (and can be sync in a single file). When using a server, if it is not a permanent server with already configured accounts and authorizations, we have to deal with security issues. Ideally, we would like to rely on the access rights of the file system but it is not easy to securely replicate that in a temporary server started by a user. Therefore, I have in mind to use populse_db as an API between single file storage and multiprocessing Python code. To date there is only a SQLite backend but I already done a Postgresql one in earlier versions (Postgresql is not a good alternative to store several small databases in various directories, it can create a lot of files; more than direct strorage in json files). It is fairly easy to create a new backend thanks to the good test coverage. I plan to extend the API to make it even simpler to perform json-based storage/retreive without the need of any schema. It will be possible to deal with a kind of simple schema if storage or querying optimization is necessary. But querying will be very limited anyway. |
Beta Was this translation helpful? Give feedback.
-
We have had so many problems in BrainVISA when using SQLite database in parallel that we decided to stop using it for Capsul. So, I looked for a server-less solution to store/retrieve information in a single file using several processes or threads running in parallel. These tests were limited to key/value information but a solution offering more data structure as well a a kind of query language would be much appreciated. I looked at several technologies: HDF5, FUSE, lmdb, monetdbe and apsw. I finally wrote a simple test code whose goal is to write 40000 Python dictionaries of various size in the file using several parallel workers. Workers coordination (to avoid two workers writing the same value) is done using the "database" technology in order to assess that atomic read and write are possible. I had run the test using threads or processes and using both local file system or Neurospin's distributed file system.
It finally happens that the good old built-in sqlite3 Python module is a very good backend for this test if we take good care of transaction management. I think we are not using sqlite3 module the right way in BrainVISA and that explains the problems we have. I also believe that SQLite is still the first candidate for storing, modifying and querying data using file system. To check that SQLite is good when properly used, I will probably implement a new SQLite version of the Capsul v3 database as an alternative to Redis. If it is a viable alternative, we would be able to keep using SQLite wherever we need a file system based server-less database system.
Beta Was this translation helpful? Give feedback.
All reactions