2024-02-08 19:15:19
wizardforcel committed Feb 8, 2024
1 parent d4d1847 commit 0a63444
Showing 1 changed file with 44 additions and 0 deletions.
44 changes: 44 additions & 0 deletions totrans/gen-dl_15.yaml
@@ -1279,10 +1279,12 @@
id: totrans-170
prefs: []
type: TYPE_NORMAL
zh: '[![4](Images/4.png)](#co_music_generation_CO1-4)'
- en: We remove the unnecessary extra dimension with a `Reshape` layer.
id: totrans-171
prefs: []
type: TYPE_NORMAL
zh: 我们使用 `Reshape` 层去除不必要的额外维度。
- en: The reason we use convolutional operations rather than requiring two independent
vectors into the network is because we would like the network to learn how one
bar should follow on from another in a consistent way. Using a neural network
@@ -1292,19 +1294,23 @@
id: totrans-172
prefs: []
type: TYPE_NORMAL
zh: 我们使用卷积操作而不是要求两个独立的向量进入网络的原因是,我们希望网络学习如何以一种一致的方式让一个小节跟随另一个小节。使用神经网络沿着时间轴扩展输入向量意味着模型有机会学习音乐如何跨越小节流动,而不是将每个小节视为完全独立于上一个的。
- en: Chords, style, melody, and groove
id: totrans-173
prefs:
- PREF_H3
type: TYPE_NORMAL
zh: 和弦、风格、旋律和 groove
- en: 'Let’s now take a closer look at the four different inputs that feed the generator:'
id: totrans-174
prefs: []
type: TYPE_NORMAL
zh: 现在让我们更仔细地看一下喂给生成器的四种不同输入:
- en: Chords
id: totrans-175
prefs: []
type: TYPE_NORMAL
zh: 和弦
- en: The chords input is a single noise vector of length `Z_DIM`. This vector’s job
is to control the general progression of the music over time, shared across tracks,
so we use a `TemporalNetwork` to transform this single vector into a different
@@ -1314,37 +1320,45 @@
id: totrans-176
prefs: []
type: TYPE_NORMAL
zh: 和弦输入是一个长度为 `Z_DIM` 的单一噪声向量。这个向量的作用是控制音乐随时间的总体进展,跨越轨道共享,因此我们使用 `TemporalNetwork`
将这个单一向量转换为每个小节的不同潜在向量。请注意,虽然我们称这个输入为和弦,但它实际上可以控制音乐中每个小节变化的任何内容,比如一般的节奏风格,而不是特定于任何特定轨道。
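The expansion described above, from one shared latent vector to one latent vector per bar, can be sketched in Keras. This is a minimal, hypothetical sketch rather than the book's actual `TemporalNetwork` code; the layer width and the `Z_DIM` and `N_BARS` values are assumptions:

```python
# Hypothetical TemporalNetwork sketch: expands one latent vector of
# length Z_DIM into a separate latent vector for each bar.
from tensorflow.keras import layers, models

Z_DIM = 32   # illustrative assumption
N_BARS = 2   # illustrative assumption

def build_temporal_network():
    z = layers.Input(shape=(Z_DIM,), name="temporal_input")
    # Treat the vector as a 1x1 "image" with Z_DIM channels so that
    # transposed convolutions can grow the bar (time) axis.
    x = layers.Reshape([1, 1, Z_DIM])(z)
    x = layers.Conv2DTranspose(
        512, kernel_size=(2, 1), strides=(1, 1),
        padding="valid", activation="relu")(x)        # time axis: 1 -> 2
    x = layers.Conv2DTranspose(
        Z_DIM, kernel_size=(N_BARS - 1, 1), strides=(1, 1),
        padding="valid", activation="relu")(x)        # channels back to Z_DIM
    # One Z_DIM vector per bar.
    x = layers.Reshape([N_BARS, Z_DIM])(x)
    return models.Model(z, x)

temporal_network = build_temporal_network()
print(temporal_network.output_shape)  # (None, N_BARS, Z_DIM)
```

The same idea applies whatever the network internals are: the point is that one vector in yields `N_BARS` vectors out, so the generator can vary its output from bar to bar while still being driven by a single shared input.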
- en: Style
id: totrans-177
prefs: []
type: TYPE_NORMAL
zh: 风格
- en: The style input is also a vector of length `Z_DIM`. This is carried forward
without transformation, so it is the same across all bars and tracks. It can be
thought of as the vector that controls the overall style of the piece (i.e., it
affects all bars and tracks consistently).
id: totrans-178
prefs: []
type: TYPE_NORMAL
zh: 风格输入也是长度为 `Z_DIM` 的向量。这个向量在不经过转换的情况下传递,因此在所有小节和轨道上都是相同的。它可以被视为控制乐曲整体风格的向量(即,它会一致地影响所有小节和轨道)。
- en: Melody
id: totrans-179
prefs: []
type: TYPE_NORMAL
zh: 旋律
- en: The melody input is an array of shape `[N_TRACKS, Z_DIM]`—that is, we provide
the model with a random noise vector of length `Z_DIM` for each track.
id: totrans-180
prefs: []
type: TYPE_NORMAL
zh: 旋律输入是一个形状为 `[N_TRACKS, Z_DIM]` 的数组—也就是说,我们为每个轨道提供长度为 `Z_DIM` 的随机噪声向量。
- en: Each of these vectors is passed through a track-specific `TemporalNetwork`,
where the weights are not shared between tracks. The output is a vector of length
`Z_DIM` for every bar of every track. The model can therefore use these input
vectors to fine-tune the content of every single bar and track independently.
id: totrans-181
prefs: []
type: TYPE_NORMAL
zh: 这些向量中的每一个都通过轨道特定的 `TemporalNetwork`,其中轨道之间的权重不共享。输出是每个轨道的每个小节的长度为 `Z_DIM` 的向量。因此,模型可以使用这些输入向量来独立地微调每个小节和轨道的内容。
- en: Groove
id: totrans-182
prefs: []
type: TYPE_NORMAL
zh: Groove
- en: The groove input is also an array of shape `[N_TRACKS, Z_DIM]`—a random noise
vector of length `Z_DIM` for each track. Unlike the melody input, these vectors
are not passed through the temporal network but instead are fed straight through,
@@ -1353,69 +1367,86 @@
id: totrans-183
prefs: []
type: TYPE_NORMAL
zh: groove 输入也是一个形状为 `[N_TRACKS, Z_DIM]` 的数组,即每个轨道的长度为 `Z_DIM` 的随机噪声向量。与旋律输入不同,这些向量不通过时间网络,而是直接传递,就像风格向量一样。因此,每个
groove 向量将影响轨道的整体属性,跨越所有小节。
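Taken together, the way the four inputs are expanded and combined can be summarized with plain shape bookkeeping. This is a sketch with illustrative dimension values; the `temporal_expand` stand-in simply copies a vector once per bar, whereas the real `TemporalNetwork` uses learned transposed convolutions:

```python
import numpy as np

# Illustrative dimension values (assumptions, not the book's settings).
Z_DIM, N_BARS, N_TRACKS = 32, 2, 4

chords = np.random.normal(size=(Z_DIM,))           # shared; varies per bar after expansion
style = np.random.normal(size=(Z_DIM,))            # shared; same for every bar and track
melody = np.random.normal(size=(N_TRACKS, Z_DIM))  # per track; varies per bar after expansion
groove = np.random.normal(size=(N_TRACKS, Z_DIM))  # per track; same for every bar

def temporal_expand(z):
    # Stand-in for a TemporalNetwork: produce one Z_DIM vector per bar.
    return np.stack([z] * N_BARS)                   # shape (N_BARS, Z_DIM)

chords_per_bar = temporal_expand(chords)                         # (N_BARS, Z_DIM)
melody_per_bar = np.stack([temporal_expand(m) for m in melody])  # (N_TRACKS, N_BARS, Z_DIM)

# Input to the bar generator for track t, bar b: one vector per component.
t, b = 0, 0
bar_input = np.concatenate(
    [chords_per_bar[b], style, melody_per_bar[t, b], groove[t]])
print(bar_input.shape)  # (4 * Z_DIM,) = (128,)
```

Note how this reproduces the pattern of Table 11-1: chords vary across bars but not tracks, style varies across neither, melody varies across both, and groove varies only across tracks.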
- en: We can summarize the responsibilities of each component of the MuseGAN generator
as shown in [Table 11-1](#musegan_sections).
id: totrans-184
prefs: []
type: TYPE_NORMAL
zh: 我们可以总结每个 MuseGAN 生成器组件的责任,如 [表11-1](#musegan_sections) 所示。
- en: Table 11-1\. Components of the MuseGAN generator
id: totrans-185
prefs: []
type: TYPE_NORMAL
zh: 表11-1\. MuseGAN 生成器的组件
- en: '| | Output differs across bars? | Output differs across parts? |'
id: totrans-186
prefs: []
type: TYPE_TB
zh: '| | 输出在小节之间不同吗? | 输出在部分之间不同吗? |'
- en: '| --- | --- | --- |'
id: totrans-187
prefs: []
type: TYPE_TB
zh: '| --- | --- | --- |'
- en: '| Style | X | X |'
id: totrans-188
prefs: []
type: TYPE_TB
zh: '| 风格 | X | X |'
- en: '| Groove | X | ✓ |'
id: totrans-189
prefs: []
type: TYPE_TB
zh: '| Groove | X | ✓ |'
- en: '| Chords | ✓ | X |'
id: totrans-190
prefs: []
type: TYPE_TB
zh: '| 和弦 | ✓ | X |'
- en: '| Melody | ✓ | ✓ |'
id: totrans-191
prefs: []
type: TYPE_TB
zh: '| 旋律 | ✓ | ✓ |'
- en: The final piece of the MuseGAN generator is the *bar generator*—let’s see how
we can use this to glue together the outputs from the chord, style, melody, and
groove components.
id: totrans-192
prefs: []
type: TYPE_NORMAL
zh: MuseGAN 生成器的最后一部分是 *小节生成器*—让我们看看如何使用它来将和弦、风格、旋律和 groove 组件的输出粘合在一起。
- en: The bar generator
id: totrans-193
prefs:
- PREF_H3
type: TYPE_NORMAL
zh: 小节生成器
- en: The bar generator receives four latent vectors—one from each of the chord, style,
melody, and groove components. These are concatenated to produce a vector of length
`4 * Z_DIM` as input. The output is a piano roll representation of a single bar
for a single track—i.e., a tensor of shape `[1, n_steps_per_bar, n_pitches, 1]`.
id: totrans-194
prefs: []
type: TYPE_NORMAL
zh: 小节生成器接收四个潜在向量——来自和弦、风格、旋律和 groove 组件。这些被连接起来产生长度为 `4 * Z_DIM` 的输入向量。输出是单个轨道的单个小节的钢琴卷表示—即,形状为
`[1, n_steps_per_bar, n_pitches, 1]` 的张量。
- en: The bar generator is just a neural network that uses convolutional transpose
layers to expand the time and pitch dimensions of the input vector. We create
one bar generator for every track, and weights are not shared between tracks.
The Keras code to build a `BarGenerator` is given in [Example 11-7](#example0706).
id: totrans-195
prefs: []
type: TYPE_NORMAL
zh: 小节生成器只是一个使用卷积转置层来扩展输入向量的时间和音高维度的神经网络。我们为每个轨道创建一个小节生成器,轨道之间的权重不共享。构建 `BarGenerator`
的 Keras 代码在 [示例11-7](#example0706) 中给出。
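A minimal sketch of what such a bar generator might look like in Keras follows. All layer widths and the `n_steps_per_bar = 16` and `n_pitches = 84` values are assumptions, not the book's actual listing:

```python
# Hypothetical BarGenerator sketch: a Dense layer followed by transposed
# convolutions that grow first the timestep axis, then the pitch axis.
from tensorflow.keras import layers, models

Z_DIM = 32            # illustrative assumption
N_STEPS_PER_BAR = 16  # illustrative assumption
N_PITCHES = 84        # illustrative assumption

def build_bar_generator():
    # Concatenated chords, style, melody, and groove vectors.
    z = layers.Input(shape=(4 * Z_DIM,), name="bar_input")
    x = layers.Dense(1024, activation="relu")(z)
    x = layers.Reshape([2, 1, 512])(x)
    # Expand along the timestep axis: 2 -> 4 -> 8 -> 16.
    for _ in range(3):
        x = layers.Conv2DTranspose(
            256, kernel_size=(2, 1), strides=(2, 1),
            padding="same", activation="relu")(x)
    # Expand along the pitch axis: 1 -> 7 -> 84.
    x = layers.Conv2DTranspose(
        256, kernel_size=(1, 7), strides=(1, 7),
        padding="same", activation="relu")(x)
    x = layers.Conv2DTranspose(
        1, kernel_size=(1, 12), strides=(1, 12),
        padding="same", activation="tanh")(x)
    # One bar of one track: [1, n_steps_per_bar, n_pitches, 1].
    x = layers.Reshape([1, N_STEPS_PER_BAR, N_PITCHES, 1])(x)
    return models.Model(z, x)

bar_generator = build_bar_generator()
```

The final `tanh` matches the WGAN-GP training setup mentioned in the text, and the leading dimension of size 1 is the bar axis, so the outputs for consecutive bars can later be concatenated along it.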
- en: Example 11-7\. Building the `BarGenerator`
id: totrans-196
prefs:
- PREF_H5
type: TYPE_NORMAL
zh: 示例11-7\. 构建 `BarGenerator`
- en: '[PRE8]'
id: totrans-197
prefs: []
@@ -1425,39 +1456,48 @@
id: totrans-198
prefs: []
type: TYPE_NORMAL
zh: '[![1](Images/1.png)](#co_music_generation_CO2-1)'
- en: The input to the bar generator is a vector of length `4 * Z_DIM`.
id: totrans-199
prefs: []
type: TYPE_NORMAL
zh: bar 生成器的输入是长度为 `4 * Z_DIM` 的向量。
- en: '[![2](Images/2.png)](#co_music_generation_CO2-2)'
id: totrans-200
prefs: []
type: TYPE_NORMAL
zh: '[![2](Images/2.png)](#co_music_generation_CO2-2)'
- en: After passing it through a `Dense` layer, we reshape the tensor to prepare it
for the convolutional transpose operations.
id: totrans-201
prefs: []
type: TYPE_NORMAL
zh: 通过一个 `Dense` 层后,我们重新塑造张量以准备进行卷积转置操作。
- en: '[![3](Images/3.png)](#co_music_generation_CO2-3)'
id: totrans-202
prefs: []
type: TYPE_NORMAL
zh: '[![3](Images/3.png)](#co_music_generation_CO2-3)'
- en: First we expand the tensor along the timestep axis…
id: totrans-203
prefs: []
type: TYPE_NORMAL
zh: 首先我们沿着时间步轴扩展张量…
- en: '[![4](Images/4.png)](#co_music_generation_CO2-4)'
id: totrans-204
prefs: []
type: TYPE_NORMAL
zh: '[![4](Images/4.png)](#co_music_generation_CO2-4)'
- en: …then along the pitch axis.
id: totrans-205
prefs: []
type: TYPE_NORMAL
zh: …然后沿着音高轴。
- en: '[![5](Images/5.png)](#co_music_generation_CO2-5)'
id: totrans-206
prefs: []
type: TYPE_NORMAL
zh: '[![5](Images/5.png)](#co_music_generation_CO2-5)'
- en: The final layer has a tanh activation applied, as we will be using a WGAN-GP
(which requires tanh output activation) to train the network.
id: totrans-207
@@ -1801,6 +1841,7 @@
id: totrans-260
prefs: []
type: TYPE_NORMAL
zh: 我们还探讨了如何调整标记化过程以处理多声部(多轨)音乐生成。网格标记化将乐谱的钢琴卷表示序列化,使我们能够在单个令牌流上训练 Transformer,这些令牌描述了在离散的、等间隔的时间步上每个音轨中出现的音符。基于事件的标记化则产生一个*配方*,通过单个指令流描述如何按顺序逐行创建音乐。这两种方法各有优缺点——基于 Transformer 的音乐生成方法的成败往往在很大程度上取决于标记化方法的选择。
- en: We also saw that generating music does not always require a sequential approach—MuseGAN
uses convolutions to generate polyphonic musical scores with multiple tracks,
by treating the score as an image where the tracks are individual channels of
Expand All @@ -1813,14 +1854,17 @@
id: totrans-261
prefs: []
type: TYPE_NORMAL
zh: 我们还看到,生成音乐并不总是需要顺序方法——MuseGAN 使用卷积来生成具有多轨的多声部乐谱,将乐谱视为一幅图像,其中各个轨道是图像的不同通道。MuseGAN 的新颖之处在于四个输入噪声向量(和弦、风格、旋律和 groove)的组织方式,使得可以完全控制音乐的高级特征。虽然底层的和声仍然不像巴赫的那样完美或多样化,但这是对一个极其难以掌握的问题的良好尝试,并突显了 GAN 处理各种问题的能力。
- en: '^([1](ch11.xhtml#idm45387004193120-marker)) Cheng-Zhi Anna Huang et al., “Music
Transformer: Generating Music with Long-Term Structure,” September 12, 2018, [*https://arxiv.org/abs/1809.04281*](https://arxiv.org/abs/1809.04281).'
id: totrans-262
prefs: []
type: TYPE_NORMAL
zh: ^([1](ch11.xhtml#idm45387004193120-marker)) Cheng-Zhi Anna Huang 等人,“Music Transformer: Generating Music with Long-Term Structure”,2018年9月12日,[*https://arxiv.org/abs/1809.04281*](https://arxiv.org/abs/1809.04281)。
- en: '^([2](ch11.xhtml#idm45387004128000-marker)) Hao-Wen Dong et al., “MuseGAN:
Multi-Track Sequential Generative Adversarial Networks for Symbolic Music Generation
and Accompaniment,” September 19, 2017, [*https://arxiv.org/abs/1709.06298*](https://arxiv.org/abs/1709.06298).'
id: totrans-263
prefs: []
type: TYPE_NORMAL
zh: ^([2](ch11.xhtml#idm45387004128000-marker)) Hao-Wen Dong 等人,“MuseGAN:用于符号音乐生成和伴奏的多轨序列生成对抗网络”,2017年9月19日,[*https://arxiv.org/abs/1709.06298*](https://arxiv.org/abs/1709.06298)。
