huggingface · ethanknights · Sep 8, 2024 · Sep 8, 2024
diff --git a/README.md b/README.md
@@ -75,7 +75,7 @@ with safe_open("model.safetensors", framework="pt", device="cpu") as f:
 
 ### Format
 
-- 8 bytes: `N`, an unsigned little-endian 64-bit integer, containing the size of the header
+- 8 bytes: `N`, an unsigned little-endian 64-bit integer, containing the size of the header.
 - N bytes: a JSON UTF-8 string representing the header.
   - The header data MUST begin with a `{` character (0x7B).
   - The header data MAY be trailing padded with whitespace (0x20).
@@ -90,9 +90,9 @@ Notes:
  - In general the subset of JSON is implicitly decided by `serde_json` for
    this library. Anything obscure might be modified at a later time, that odd ways
    to represent integer, newlines and escapes in utf-8 strings. This would only
-   be done for safety concerns
+   be done for safety concerns.
  - Tensor values are not checked against, in particular NaN and +/-Inf could
-   be in the file
+   be in the file.
  - Empty tensors (tensors with 1 dimension being 0) are allowed.
    They are not storing any data in the databuffer, yet retaining size in the header.
    They don't really bring a lot of values but are accepted since they are valid tensors
@@ -172,15 +172,15 @@ Since we can invent a new format we can propose additional benefits:
   on the size of the header of 100MB to prevent parsing extremely large JSON.
   Also when reading the file, there's a guarantee that addresses in the file
   do not overlap in any way, meaning when you're loading a file you should never
-  exceed the size of the file in memory
+  exceed the size of the file in memory.
 
 - Faster load: PyTorch seems to be the fastest file to load out in the major
   ML formats. However, it does seem to have an extra copy on CPU, which we
   can bypass in this lib by using `torch.UntypedStorage.from_file`.
   Currently, CPU loading times are extremely fast with this lib compared to pickle.
   GPU loading times are as fast or faster than PyTorch equivalent.
   Loading first on CPU with memmapping with torch, and then moving all tensors to GPU seems
-  to be faster too somehow (similar behavior in torch pickle)
+  to be faster too somehow (similar behavior in torch pickle).
 
 - Lazy loading: in distributed (multi-node or multi-gpu) settings, it's nice to be able to
   load only part of the tensors on the various models. For

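For reviewers checking the `### Format` wording touched above, here is a minimal sketch of reading the header exactly as the README describes it: 8 bytes giving `N`, then `N` bytes of UTF-8 JSON that may carry trailing spaces. The function name `read_safetensors_header` is hypothetical and is not part of the safetensors API; the sketch uses only the Python standard library and skips the other checks the library performs.

```python
import json
import struct


def read_safetensors_header(path):
    """Hypothetical helper: return the parsed JSON header of a safetensors file."""
    with open(path, "rb") as f:
        # First 8 bytes: N, an unsigned little-endian 64-bit integer.
        (n,) = struct.unpack("<Q", f.read(8))
        # Next N bytes: the JSON header encoded as UTF-8.
        raw = f.read(n)
    # The header data MUST begin with '{' (0x7B).
    if not raw.startswith(b"{"):
        raise ValueError("header does not begin with '{'")
    # Trailing 0x20 padding is allowed; json.loads ignores trailing whitespace.
    return json.loads(raw.decode("utf-8"))
```

Calling `read_safetensors_header("model.safetensors")` would return a dictionary keyed by tensor name, assuming the file follows the layout described in the README.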
diff --git a/docs/source/index.mdx b/docs/source/index.mdx
@@ -11,12 +11,12 @@ Safetensors is a new simple format for storing tensors safely (as opposed to pic
 
 ## Installation
 
-with pip:
+With pip:
 ```
 pip install safetensors
 ```
 
-with conda:
+With conda:
 ```
 conda install -c huggingface safetensors
 ```
@@ -34,7 +34,7 @@ with safe_open("model.safetensors", framework="pt", device=0) as f:
         tensors[k] = f.get_tensor(k)
 ```
 
-Loading only part of the tensors (interesting when running on multiple GPU)
+Loading only part of the tensors (useful when running on multiple GPUs):
 
 ```python
 from safetensors import safe_open
@@ -61,7 +61,7 @@ save_file(tensors, "model.safetensors")
 
 ## Format
 
-Let's say you have safetensors file named `model.safetensors`, then `model.safetensors` will have the following internal format:
+Let's say you have a safetensors file named `model.safetensors`. It will have the following internal format:
 
 <div class="flex justify-center">
     <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/safetensors/safetensors-format.svg"/>