Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add layer 3-log #11

Merged
merged 1 commit into from
Dec 14, 2023
Merged

Add layer 3-log #11

merged 1 commit into from
Dec 14, 2023

Conversation

lucassong-mh
Copy link
Contributor

No description provided.

@lucassong-mh lucassong-mh force-pushed the dev-songsw-3_log branch 2 times, most recently from 9ecf5ff to 00ac9ee Compare September 26, 2023 11:44
@lucassong-mh lucassong-mh force-pushed the dev-songsw-3_log branch 3 times, most recently from 4b91ed8 to 8c9dd5e Compare October 13, 2023 05:17
@lucassong-mh lucassong-mh force-pushed the dev-songsw-3_log branch 2 times, most recently from 17dc4f8 to fe3e702 Compare November 16, 2023 09:31
@lucassong-mh lucassong-mh marked this pull request as ready for review November 16, 2023 09:31
Copy link
Collaborator

@tatetian tatetian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is my first round of review, in which I have focused on TxLog and ChunkAlloc.

src/error.rs Outdated Show resolved Hide resolved
src/tx/current.rs Show resolved Hide resolved
src/layers/3-log/tx_log.rs Outdated Show resolved Hide resolved
use crate::tx::{CurrentTx, Tx, TxData, TxId, TxProvider};
use crate::util::{LazyDelete, RandomInit};

use alloc::collections::{BTreeMap, BTreeSet};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a recommended convention, the standard and third-party imports should be put before the imports of the current crate.

src/layers/3-log/tx_log.rs Outdated Show resolved Hide resolved
let log_cache = log_caches.get_mut(log_id).unwrap();
let mut cache_inner = log_cache.inner.write();
open_cache.lru_cache.iter().for_each(|(&pos, node)| {
cache_inner.lru_cache.put(pos, node.clone());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Migrating items from the per-TX cache to the global cache looks like an expensive operation (O(N\log{N}), where N is the number of blocks in a TxLog). I am certain that this can be improved a lot. Try to do some microbenchmark and decide if this is a performance bottleneck.

@@ -1,9 +1,7 @@
//! The layer of transactional logging.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Module-level Rust docs should be provided. Document how the individual APIs, namely TxLogStore, TxLog, and Tx, that are provided by this module can be used together. Also, TxLogStore and TxLog (as well as Tx) are documented inadequately. Sadly, their current type-level Rust docs are basically what I wrote in the draft design. But this is now the official implementation. To this point, we should have been much more to say and write.

I cannot put my trust to any code that are poorly documented. Because if one cannot write decent docs to give the spec of the code, how on the earth can anyone, including the author, knows if the implementation is correct or not. Our abstractions are not trivial. They deserve to be well documented.

With ChatGPT, the job of writing docs in decent English is much easier.

src/layers/3-log/tx_log.rs Show resolved Hide resolved
src/layers/3-log/raw_log.rs Outdated Show resolved Hide resolved
src/layers/3-log/chunk.rs Outdated Show resolved Hide resolved
src/layers/3-log/chunk.rs Show resolved Hide resolved
src/layers/3-log/chunk.rs Show resolved Hide resolved
// A bitmap where each bit indicates whether a corresponding chunk has been
// allocated.
alloc_map: BitVec<usize, Lsb0>,
alloc_map: BitMap,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe free_map is better, since all other fields are free.

// The number of free chunks.
free_count: usize,
// The minimum free chunk Id. Useful to narrow the scope of searching for
// The minimum free chunk Id. Useful to narrow the scope of searching for
// free chunk IDs.
min_free: usize,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is the start position to search in the alloc_map, min_free is a little vague. How about search_cursor ?

self.alloc_map[chunk_id] = false;
assert_eq!(self.alloc_map[chunk_id], true);
self.alloc_map.set(chunk_id, false);
self.free_count += 1;

// Keep the invariance that all free chunk IDs are no less than min_free
if chunk_id < self.min_free {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic of min_free seems to encourage allocating the Chunk that deallocated just now. I. prefer to alloc the Chunk which may never used before or deallocate a long time ago.

use pod::Pod;
use serde::{Deserialize, Serialize};

pub type TxLogId = RawLogId;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If TxLogId and RawLogId are the same, a single LogId definition in the mod level could be better, for it establishes a connection between the TxLog and RawLog, as a master key.

@@ -463,8 +540,11 @@ impl<D: BlockSet> TxLogStore<D> {
/// This method must be called within a TX. Otherwise, this method panics.
pub fn open_log_in(&self, bucket: &str) -> Result<Arc<TxLog<D>>> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename the function for better readability, such as open_max_log_in or open_latest_log_in?

@@ -463,8 +540,11 @@ impl<D: BlockSet> TxLogStore<D> {
/// This method must be called within a TX. Otherwise, this method panics.
pub fn open_log_in(&self, bucket: &str) -> Result<Arc<TxLog<D>>> {
let log_ids = self.list_logs(bucket)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we maintain the max_log_id, avoiding list_log and iter them every time?

self.do_contain_log(log_id, &state, &current_tx)
}

fn do_contain_log(&self, log_id: TxLogId, state: &State, current_tx: &CurrentTx<'_>) -> bool {
if state.contains_log(log_id) {
if state.persistent.contains_log(log_id) {
let not_deleted = current_tx
.data_with(|store_edit: &TxLogStoreEdit| !store_edit.is_log_deleted(log_id));
not_deleted
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid negated terms, and it seems that not_deleted and is_created are redundant, leaving the function call alone is enough.

}

/// A transactional log.
#[derive(Clone)]
pub struct TxLog<D> {
inner_log: Arc<TxLogInner<D>>,
tx_provider: Arc<TxProvider>,
can_append: bool,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RawLog also has a field can_append. Do they have the same meaning? Can we just keep one of them?

Copy link
Collaborator

@tatetian tatetian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the contribution!

@tatetian tatetian merged commit 64f76c2 into asterinas:main Dec 14, 2023
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants