Load text tutorial: Remove unnecessary +1 from VOCAB_SIZE+1 #2259

Closed
wants to merge 1 commit into from

Conversation

vaharoni
Contributor

In the Beginners tutorials > Load and preprocess data > Text, there is the following code:

# `vocab_size` is `VOCAB_SIZE + 1` since `0` is used additionally for padding.
int_model = create_model(vocab_size=VOCAB_SIZE + 1, num_labels=4)

I believe the +1 in VOCAB_SIZE + 1 is unnecessary: when output_mode is int, the TextVectorization layer already counts the padding token (and the OOV token) within max_tokens. Per the layer's documentation:

      output_mode: Optional specification for the output of the layer. Values
        can be `"int"`, `"multi_hot"`, `"count"` or `"tf_idf"`, configuring the
        layer as follows:
          - `"int"`: Outputs integer indices, one integer index per split string
            token. When `output_mode == "int"`, 0 is reserved for masked
            locations; this reduces the vocab size to
            `max_tokens - 2` instead of `max_tokens - 1`.
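
To make the index accounting in the quoted docstring concrete, here is a minimal pure-Python sketch. It only imitates the reservation scheme described above (0 for padding, 1 for OOV, the remaining `max_tokens - 2` slots for real tokens); it is not the Keras implementation. The helpers `build_vocab` and `vectorize` and the toy value of `VOCAB_SIZE` are hypothetical names chosen for illustration.

```python
from collections import Counter

PAD_INDEX = 0  # reserved for masked/padded positions (assumption mirrors the docstring)
OOV_INDEX = 1  # reserved for out-of-vocabulary tokens


def build_vocab(texts, max_tokens):
    """Map the most frequent tokens to indices 2..max_tokens-1 (hypothetical helper)."""
    counts = Counter(token for text in texts for token in text.split())
    # Only max_tokens - 2 slots remain after reserving padding and OOV.
    kept = [token for token, _ in counts.most_common(max_tokens - 2)]
    return {token: index for index, token in enumerate(kept, start=2)}


def vectorize(text, vocab, length):
    """Encode whitespace-split tokens, truncating or padding to `length`."""
    ids = [vocab.get(token, OOV_INDEX) for token in text.split()][:length]
    return ids + [PAD_INDEX] * (length - len(ids))


VOCAB_SIZE = 6  # toy value, not the tutorial's setting
texts = ["a b c d e f g", "a b c d", "a b"]
vocab = build_vocab(texts, VOCAB_SIZE)
encoded = [vectorize(text, vocab, 8) for text in texts]
largest_index = max(max(row) for row in encoded)

# Every emitted index lies in [0, VOCAB_SIZE - 1], so a lookup table with
# VOCAB_SIZE rows already covers padding, OOV, and all kept tokens.
print(len(vocab), largest_index)
```

Under this scheme the largest index is `VOCAB_SIZE - 1`, which is why a table sized `VOCAB_SIZE` would suffice without the extra `+ 1`.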

vaharoni requested a review from a team as a code owner on August 27, 2023 09:30
@github-actions

Preview

Preview and run these notebook edits with Google Colab: Rendered notebook diffs available on ReviewNB.com.

Format and style

Use the TensorFlow docs notebook tools to format for consistent source diffs and lint for style:
$ python3 -m pip install -U --user git+https://github.com/tensorflow/docs

$ python3 -m tensorflow_docs.tools.nbfmt notebook.ipynb
$ python3 -m tensorflow_docs.tools.nblint --arg=repo:tensorflow/docs notebook.ipynb
If commits are added to the pull request, synchronize your local branch: git pull origin tutorial_text_fix

8bitmp3 added the "review in progress" label (Someone is actively reviewing this PR) on Aug 29, 2023
MarkDaoust added the "ready to pull" label (Start merge process) and removed the "review in progress" label on Sep 8, 2023
vaharoni closed this on Sep 9, 2023
copybara-service bot pushed a commit that referenced this pull request on Sep 9, 2023
Labels
ready to pull Start merge process
4 participants