From 0a63444930aaf7810072644ca0eac733c073dd23 Mon Sep 17 00:00:00 2001
From: wizardforcel <562826179@qq.com>
Date: Thu, 8 Feb 2024 19:15:21 +0800
Subject: [PATCH] 2024-02-08 19:15:19

---
 totrans/gen-dl_15.yaml | 44 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/totrans/gen-dl_15.yaml b/totrans/gen-dl_15.yaml
index 81c0f48..1c05066 100644
--- a/totrans/gen-dl_15.yaml
+++ b/totrans/gen-dl_15.yaml
@@ -1279,10 +1279,12 @@
  id: totrans-170
  prefs: []
  type: TYPE_NORMAL
+ zh: '[![4](Images/4.png)](#co_music_generation_CO1-4)'
- en: We remove the unnecessary extra dimension with a `Reshape` layer.
  id: totrans-171
  prefs: []
  type: TYPE_NORMAL
+ zh: 我们使用 `Reshape` 层去除不必要的额外维度。
- en: The reason we use convolutional operations rather than requiring two independent
    vectors into the network is because we would like the network to learn how one
    bar should follow on from another in a consistent way. Using a neural network
@@ -1292,19 +1294,23 @@
  id: totrans-172
  prefs: []
  type: TYPE_NORMAL
+ zh: 我们使用卷积操作、而不是将两个独立的向量直接输入网络,原因在于我们希望网络学习如何以一致的方式让一个小节衔接下一个小节。使用神经网络沿时间轴扩展输入向量,意味着模型有机会学习音乐如何跨小节流动,而不是将每个小节视为与前一个小节完全独立。
- en: Chords, style, melody, and groove
  id: totrans-173
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: 和弦、风格、旋律和 groove
- en: 'Let’s now take a closer look at the four different inputs that feed the generator:'
  id: totrans-174
  prefs: []
  type: TYPE_NORMAL
+ zh: 现在让我们更仔细地看一下输入生成器的四种不同输入:
- en: Chords
  id: totrans-175
  prefs: []
  type: TYPE_NORMAL
+ zh: 和弦
- en: The chords input is a single noise vector of length `Z_DIM`. This vector’s
    job is to control the general progression of the music over time, shared across
    tracks, so we use a `TemporalNetwork` to transform this single vector into a different
@@ -1314,10 +1320,13 @@
  id: totrans-176
  prefs: []
  type: TYPE_NORMAL
+ zh: 和弦输入是一个长度为 `Z_DIM` 的单一噪声向量。这个向量的作用是控制音乐随时间的总体进展,并在各轨道间共享,因此我们使用 `TemporalNetwork` 将这个单一向量转换为每个小节各不相同的潜在向量。请注意,虽然我们称这个输入为和弦,但它实际上可以控制音乐中随小节变化的任何内容,比如总体的节奏风格,而不是特定于某一轨道。
- en: Style
  id: totrans-177
  prefs: []
  type: TYPE_NORMAL
+ zh: 风格
- en: The style input is also a vector of length `Z_DIM`. This is carried forward
    without transformation, so it is the same across all bars and tracks. It can be
    thought of as the vector that controls the overall style of the piece (i.e., it
@@ -1325,15 +1334,18 @@
  id: totrans-178
  prefs: []
  type: TYPE_NORMAL
+ zh: 风格输入也是长度为 `Z_DIM` 的向量。它不经过任何转换直接向前传递,因此在所有小节和轨道上都是相同的。可以将它视为控制乐曲整体风格的向量(即,它会一致地影响所有小节和轨道)。
- en: Melody
  id: totrans-179
  prefs: []
  type: TYPE_NORMAL
+ zh: 旋律
- en: The melody input is an array of shape `[N_TRACKS, Z_DIM]`—that is, we provide
    the model with a random noise vector of length `Z_DIM` for each track.
  id: totrans-180
  prefs: []
  type: TYPE_NORMAL
+ zh: 旋律输入是一个形状为 `[N_TRACKS, Z_DIM]` 的数组,也就是说,我们为每个轨道提供一个长度为 `Z_DIM` 的随机噪声向量。
- en: Each of these vectors is passed through a track-specific `TemporalNetwork`,
    where the weights are not shared between tracks. The output is a vector of length
    `Z_DIM` for every bar of every track. The model can therefore use these input
@@ -1341,10 +1353,12 @@
  id: totrans-181
  prefs: []
  type: TYPE_NORMAL
+ zh: 这些向量中的每一个都会通过一个轨道专属的 `TemporalNetwork`,其权重在轨道之间不共享。输出是每个轨道的每个小节各一个长度为 `Z_DIM` 的向量。因此,模型可以使用这些输入向量独立地微调每个小节和每个轨道的内容。
- en: Groove
  id: totrans-182
  prefs: []
  type: TYPE_NORMAL
+ zh: Groove
- en: The groove input is also an array of shape `[N_TRACKS, Z_DIM]`—a random noise
    vector of length `Z_DIM` for each track. Unlike the melody input, these vectors
    are not passed through the temporal network but instead are fed straight through,
@@ -1353,50 +1367,62 @@
  id: totrans-183
  prefs: []
  type: TYPE_NORMAL
+ zh: groove 输入也是一个形状为 `[N_TRACKS, Z_DIM]` 的数组,即每个轨道各有一个长度为 `Z_DIM` 的随机噪声向量。与旋律输入不同,这些向量不经过时间网络,而是像风格向量一样直接传递。因此,每个 groove 向量将影响对应轨道在所有小节上的整体属性。
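To make the role of the `TemporalNetwork` concrete, here is a minimal Keras sketch (my own illustration, not the book's verbatim code) of how a single latent vector of length `Z_DIM` can be stretched into one latent vector per bar using transposed convolutions along the bar axis. The filter count of 1,024 and the constant name `N_BARS` are illustrative assumptions:

```python
import numpy as np
from tensorflow.keras import layers, models

Z_DIM = 32   # length of each latent vector
N_BARS = 2   # number of bars per generated phrase (illustrative)

def TemporalNetwork():
    # Stretch one latent vector into a different latent vector per bar,
    # so the model can learn how the music flows across bars.
    input_layer = layers.Input(shape=(Z_DIM,), name="temporal_input")
    x = layers.Reshape([1, 1, Z_DIM])(input_layer)  # treat the vector as a 1x1 "image"
    x = layers.Conv2DTranspose(1024, kernel_size=(2, 1), padding="valid")(x)
    x = layers.BatchNormalization(momentum=0.9)(x)
    x = layers.Activation("relu")(x)
    x = layers.Conv2DTranspose(Z_DIM, kernel_size=(N_BARS - 1, 1), padding="valid")(x)
    x = layers.BatchNormalization(momentum=0.9)(x)
    x = layers.Activation("relu")(x)
    output_layer = layers.Reshape([N_BARS, Z_DIM])(x)  # one Z_DIM vector per bar
    return models.Model(input_layer, output_layer)

# The chords input is shared across tracks, so a single instance suffices;
# the melody input would use a separate instance per track (weights not shared).
chords_net = TemporalNetwork()
chords_per_bar = chords_net(np.random.normal(size=(1, Z_DIM)).astype("float32"))
print(chords_per_bar.shape)  # (1, N_BARS, Z_DIM)
```

Because the per-track melody networks do not share weights, each track is free to develop its own bar-to-bar structure, while the shared chords network imposes a common progression.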
- en: We can summarize the responsibilities of each component of the MuseGAN generator
    as shown in [Table 11-1](#musegan_sections).
  id: totrans-184
  prefs: []
  type: TYPE_NORMAL
+ zh: 我们可以按 [表11-1](#musegan_sections) 所示,总结 MuseGAN 生成器各组件的职责。
- en: Table 11-1\. Components of the MuseGAN generator
  id: totrans-185
  prefs: []
  type: TYPE_NORMAL
+ zh: 表11-1\. MuseGAN 生成器的组件
- en: '| | Output differs across bars? | Output differs across parts? |'
  id: totrans-186
  prefs: []
  type: TYPE_TB
+ zh: '| | 输出在小节之间不同吗? | 输出在声部之间不同吗? |'
- en: '| --- | --- | --- |'
  id: totrans-187
  prefs: []
  type: TYPE_TB
+ zh: '| --- | --- | --- |'
- en: '| Style | X | X |'
  id: totrans-188
  prefs: []
  type: TYPE_TB
+ zh: '| 风格 | X | X |'
- en: '| Groove | X | ✓ |'
  id: totrans-189
  prefs: []
  type: TYPE_TB
+ zh: '| Groove | X | ✓ |'
- en: '| Chords | ✓ | X |'
  id: totrans-190
  prefs: []
  type: TYPE_TB
+ zh: '| 和弦 | ✓ | X |'
- en: '| Melody | ✓ | ✓ |'
  id: totrans-191
  prefs: []
  type: TYPE_TB
+ zh: '| 旋律 | ✓ | ✓ |'
- en: The final piece of the MuseGAN generator is the *bar generator*—let’s see how
    we can use this to glue together the outputs from the chord, style, melody, and
    groove components.
  id: totrans-192
  prefs: []
  type: TYPE_NORMAL
+ zh: MuseGAN 生成器的最后一部分是*小节生成器*——让我们看看如何使用它将和弦、风格、旋律和 groove 组件的输出组合在一起。
- en: The bar generator
  id: totrans-193
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
+ zh: 小节生成器
- en: The bar generator receives four latent vectors—one from each of the chord, style,
    melody, and groove components. These are concatenated to produce a vector of length
    `4 * Z_DIM` as input. The output is a piano roll representation of a single bar
@@ -1404,6 +1430,8 @@
  id: totrans-194
  prefs: []
  type: TYPE_NORMAL
+ zh: 小节生成器接收四个潜在向量,和弦、风格、旋律和 groove 组件各提供一个。这些向量被连接起来,产生长度为 `4 * Z_DIM` 的输入向量。输出是单个轨道的单个小节的钢琴卷表示,即形状为 `[1, n_steps_per_bar, n_pitches, 1]` 的张量。
- en: The bar generator is just a neural network that uses convolutional transpose
    layers to expand the time and pitch dimensions of the input vector. We create
    one bar generator for every track, and weights are not shared between tracks.
@@ -1411,11 +1439,14 @@
  id: totrans-195
  prefs: []
  type: TYPE_NORMAL
+ zh: 小节生成器只是一个使用卷积转置层来扩展输入向量的时间和音高维度的神经网络。我们为每个轨道创建一个小节生成器,轨道之间的权重不共享。构建 `BarGenerator` 的 Keras 代码在 [示例11-7](#example0706) 中给出。
- en: Example 11-7\. Building the `BarGenerator`
  id: totrans-196
  prefs:
  - PREF_H5
  type: TYPE_NORMAL
+ zh: 示例11-7\. 构建 `BarGenerator`
- en: '[PRE8]'
  id: totrans-197
  prefs: []
@@ -1425,39 +1456,48 @@
  id: totrans-198
  prefs: []
  type: TYPE_NORMAL
+ zh: '[![1](Images/1.png)](#co_music_generation_CO2-1)'
- en: The input to the bar generator is a vector of length `4 * Z_DIM`.
  id: totrans-199
  prefs: []
  type: TYPE_NORMAL
+ zh: 小节生成器的输入是长度为 `4 * Z_DIM` 的向量。
- en: '[![2](Images/2.png)](#co_music_generation_CO2-2)'
  id: totrans-200
  prefs: []
  type: TYPE_NORMAL
+ zh: '[![2](Images/2.png)](#co_music_generation_CO2-2)'
- en: After passing it through a `Dense` layer, we reshape the tensor to prepare it
    for the convolutional transpose operations.
  id: totrans-201
  prefs: []
  type: TYPE_NORMAL
+ zh: 将它通过一个 `Dense` 层后,我们重塑张量,为卷积转置操作做准备。
- en: '[![3](Images/3.png)](#co_music_generation_CO2-3)'
  id: totrans-202
  prefs: []
  type: TYPE_NORMAL
+ zh: '[![3](Images/3.png)](#co_music_generation_CO2-3)'
- en: First we expand the tensor along the timestep axis…​
  id: totrans-203
  prefs: []
  type: TYPE_NORMAL
+ zh: 首先我们沿着时间步轴扩展张量…​
- en: '[![4](Images/4.png)](#co_music_generation_CO2-4)'
  id: totrans-204
  prefs: []
  type: TYPE_NORMAL
+ zh: '[![4](Images/4.png)](#co_music_generation_CO2-4)'
- en: …​then along the pitch axis.
  id: totrans-205
  prefs: []
  type: TYPE_NORMAL
+ zh: …​然后沿着音高轴扩展。
- en: '[![5](Images/5.png)](#co_music_generation_CO2-5)'
  id: totrans-206
  prefs: []
  type: TYPE_NORMAL
+ zh: '[![5](Images/5.png)](#co_music_generation_CO2-5)'
- en: The final layer has a tanh activation applied, as we will be using a WGAN-GP
    (which requires tanh output activation) to train the network.
  id: totrans-207
  prefs: []
  type: TYPE_NORMAL
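To complement the callouts of Example 11-7, here is a minimal Keras sketch of a bar generator with the shape described above. The intermediate filter counts and the constants `N_STEPS_PER_BAR = 16` and `N_PITCHES = 84` are illustrative assumptions rather than the book's exact configuration:

```python
from tensorflow.keras import layers, models

Z_DIM = 32
N_STEPS_PER_BAR = 16  # timesteps per bar (illustrative)
N_PITCHES = 84        # pitch range (illustrative)

def BarGenerator():
    # Map the concatenated chord/style/melody/groove vector (length 4 * Z_DIM)
    # to a piano roll for one bar of one track.
    input_layer = layers.Input(shape=(Z_DIM * 4,), name="bar_generator_input")
    x = layers.Dense(1024)(input_layer)
    x = layers.BatchNormalization(momentum=0.9)(x)
    x = layers.Activation("relu")(x)
    x = layers.Reshape([2, 1, 512])(x)  # seed tensor for the transposed convolutions

    # First expand along the timestep axis: 2 -> 4 -> 8 -> 16...
    for filters in (512, 256, 256):
        x = layers.Conv2DTranspose(filters, kernel_size=(2, 1), strides=(2, 1), padding="same")(x)
        x = layers.BatchNormalization(momentum=0.9)(x)
        x = layers.Activation("relu")(x)

    # ...then along the pitch axis: 1 -> 7 -> 84.
    x = layers.Conv2DTranspose(256, kernel_size=(1, 7), strides=(1, 7), padding="same")(x)
    x = layers.BatchNormalization(momentum=0.9)(x)
    x = layers.Activation("relu")(x)
    x = layers.Conv2DTranspose(1, kernel_size=(1, 12), strides=(1, 12), padding="same")(x)
    x = layers.Activation("tanh")(x)  # WGAN-GP training expects tanh-range output

    output_layer = layers.Reshape([1, N_STEPS_PER_BAR, N_PITCHES, 1])(x)
    return models.Model(input_layer, output_layer)

print(BarGenerator().output_shape)  # (None, 1, 16, 84, 1)
```

One such generator is built per track, with no weight sharing, so each instrument learns its own mapping from the four latent vectors to notes.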
@@ -1801,6 +1841,7 @@
  id: totrans-260
  prefs: []
  type: TYPE_NORMAL
+ zh: 我们还探讨了如何调整标记化过程以处理多声部(多轨)音乐生成。网格标记化将乐谱的钢琴卷表示序列化,使我们能够在单个令牌流上训练 Transformer,这些令牌描述了在离散、等间隔的时间步上每个声部中出现的音符。基于事件的标记化则生成一个*配方*,通过单个指令流描述如何按顺序创建多个声部的音乐。这两种方法各有优缺点——基于 Transformer 的音乐生成方法的成败往往在很大程度上取决于标记化方法的选择。
- en: We also saw that generating music does not always require a sequential approach—MuseGAN
    uses convolutions to generate polyphonic musical scores with multiple tracks, by
    treating the score as an image where the tracks are individual channels of
@@ -1813,14 +1854,17 @@
  id: totrans-261
  prefs: []
  type: TYPE_NORMAL
+ zh: 我们还看到,生成音乐并不总是需要顺序方法——MuseGAN 使用卷积来生成多轨的多声部乐谱,它将乐谱视为一幅图像,其中各轨道是图像的不同通道。MuseGAN 的新颖之处在于四个输入噪声向量(和弦、风格、旋律和 groove)的组织方式,使我们能够对音乐的高级特征保持完全控制。虽然底层的和声仍不如巴赫的作品那样完美或多样,但这是对一个极难掌握的问题的有益尝试,并凸显了 GAN 处理各类问题的能力。
- en: '^([1](ch11.xhtml#idm45387004193120-marker)) Cheng-Zhi Anna Huang et al., “Music
    Transformer: Generating Music with Long-Term Structure,” September 12, 2018,
    [*https://arxiv.org/abs/1809.04281*](https://arxiv.org/abs/1809.04281).'
  id: totrans-262
  prefs: []
  type: TYPE_NORMAL
+ zh: ^([1](ch11.xhtml#idm45387004193120-marker)) Cheng-Zhi Anna Huang 等人,“Music Transformer: Generating Music with Long-Term Structure”,2018年9月12日,[*https://arxiv.org/abs/1809.04281*](https://arxiv.org/abs/1809.04281)。
- en: '^([2](ch11.xhtml#idm45387004128000-marker)) Hao-Wen Dong et al., “MuseGAN:
    Multi-Track Sequential Generative Adversarial Networks for Symbolic Music Generation
    and Accompaniment,” September 19, 2017, [*https://arxiv.org/abs/1709.06298*](https://arxiv.org/abs/1709.06298).'
  id: totrans-263
  prefs: []
  type: TYPE_NORMAL
+ zh: ^([2](ch11.xhtml#idm45387004128000-marker)) Hao-Wen Dong 等人,“MuseGAN: Multi-Track Sequential Generative Adversarial Networks for Symbolic Music Generation and Accompaniment”,2017年9月19日,[*https://arxiv.org/abs/1709.06298*](https://arxiv.org/abs/1709.06298)。