Skip to content

Latest commit

 

History

History
178 lines (149 loc) · 5.88 KB

notes.org

File metadata and controls

178 lines (149 loc) · 5.88 KB

From Skipgrams to Schemata

Idea

  • schemata are
    • ideally: a succession of simultaneous sounding groups of notes (“stages”)
    • actually: elaborated by
      • intermediate notes
      • non-simultaneity
  • Skipgram:
    • model limited non-adjacency
    • can be generalized to streams of overlapping entities
    • => can model non-simultaneity
  • approach:
    • generalize skipgrams (from sequences to cost-monotonic streams)
    • use SGs over SGs to model schemata
      • SGs over notes: non-simultaneous stages (internally)
      • SGs over stages: non-adjacent stages (externally)
    • frequent SG^2s in corpus <=> schemata?

Skipgrams

original idea

  • sequence of entities (letters, words, …)
    • n-grams: continuous sub-sequences of length n
      • total: L-n (L>n)
    • k-skip-n-grams: sequences of length n, that skip at most k entities (total: ?)
      • total: (k+1)(k+2)/6 * (3L-2k-6) for n=3 (Guthrie et al., 2006)
        • is there a general analysis (exact or asymptotic)

generalizations

  • flexible cost (Sears et al. 2017, maybe even earlier?)
    • “cost” instead of “skips” (skips = cost based on skipped elements/indices)
    • => total depends not on (n,k) but on (n,l), where l is RV #(elements in k-range).
  • overlapping input
    • “cost” idea + input may “overlap” (like notes, unlike words)
    • requirement: monotonic ordering in stream wrt. cost (read < as “before”)
      • ∀ x<y<z: cost(x,y) ≤ cost(x,z)
  • stochastic selection

algorithms

generalized skipgrams

Idea: collect prefixes while iterating over input In: cost-monotonic stream of entities, k, n, cost function Out: stream of skipgrams Algorithm:

Skipgrams(input, k, n, cost):
    prefixes = {}
    output = {}
    for x in input:
        # close old prefixes
        openpfxs = { pfx | pfxprefixes, totalcost(cost, pfx, x) > k }
        # extend remaining prefixes (and "empty prefix")
        extpfxs = { pfx+x | pfxopenpfxs } ∪ { x }
        # collect completed grams
        output = output ∪ { pfx | pfxextpfxs, |pfx|==n }
        # continue with remaining pfxs
        prefixes = openpfxs ∪ { pfx | pfxextpfxs, |pfx|<n }
    end
    return output
end

early filtering

Idea: include prefix predicate In: input stream, k, n, cost function, prefix predicate P Out: stream of skipgrams that satisfy P Algorithm: like generalized skipgrams but filter new prefixes by P

stable ordering

  • stable ordering
    • standard alg orders by index of last gram element in input stream
    • ordering by first:
      • eager: compute output, sort before returning it
        • not feasible for large output
      • lazy: priority queue
        • enqueue grams with priority according to index of first element
        • dequeue only where priority < minimal index in prefixes

Idea: hold back complete prefixes that might still be preceded by prefixes completed later In: input stream, k, n, cost function, output channel Out: stream of skipgrams on output channel, first elements keep order of input stream Algorithm:

StableSkipgrams(input, k, n, cost, outchannel):
    prefixes = {}
    outq = []
    i = 0 # running index
    for x in input:
        # close old prefixes
        openpfxs = { (prio, pfx) | (prio, pfx) ∈ prefixes, totalcost(cost, pfx, x) > k }
        # release queue items
        released, outq = release(outq, min({prio | (prio, _) ∈ openpfxs }))
        put(outchannel, released)
        # extend remaining prefixes (and "empty prefix")
        extpfxs = { (prio, pfx+x) | (prio, pfx) ∈ openpfxs } ∪ { (i, x) }
        # collect completed grams
        outq = merge(outq, sort([(prio, pfx) | (prio, pfx) ∈ extpfxs, |pfx|==n]))
        # continue with remaining pfxs
        prefixes = openpfxs ∪ { (prio, pfx) | (prio, pfx) ∈ extpfxs, |pfx|<n }
        i+=1
    end
    return out
end

stochastic skipgrams

  • SG²: large amount of output, takes very long to compute
  • sample skipgrams randomly:
    • (a) flip biased (p) coin for each skipgram -> easy to implement
    • (b) choose M random skipgrams -> controllable output size

Idea 1:

  • flip a coin at every prefix extension: ∀sg: P(sg ∈ output) = p
  • sg length n: biasⁿ = p

In: input stream, k, n, cost, p Out: stream of stochastic skipgrams with membership probability p

StochasticSkipgrams(input, k, n, cost, p):
    prefixes = {}
    output = {}
    bias = root(p, n)
    for x in input:
        # close old prefixes
        openpfxs = { pfx | pfxprefixes, totalcost(cost, pfx, x) > k }
        # extend remaining prefixes (and "empty prefix")
        extpfxs = { pfx+x | pfxopenpfxs, flip(bias) } ∪ { x }
        # collect completed grams
        output = output ∪ { pfx | pfxextpfxs, |pfx|==n }
        # continue with remaining pfxs
        prefixes = openpfxs ∪ { pfx | pfxextpfxs, |pfx|<n }
    end
    return output
end

Idea 2:

  • limit the number of prefixes extended with a given candidate
  • N ≤ lⁿ -> limit l: l=N^(1/n), where l is max. #(extended prefixes per candidate)
  • can we sample l based on available prefixes such that ‘N ≤’ becomes ‘E[N] =’?

Applying skipgrams to music

Method

  • input: stream of notes, ordered by onset
  • 1st pass (over input): vertical structure
    • overlap allowed
    • cost = total onset difference
  • 2nd pass (over 1st): horizontal structure
    • overlap not allowed
    • cost = max onset difference (step function)
  • output: count occurrences
  • equivalences:
    • order by (absolute) pitch
    • oct: pitch classes
    • transp: relative to first bass note (_1_1 is always 0)
  • parameters:
    • n_voices, n_stages from 2x2 to 3x4 and 4x3 (maybe)
      • p adapted to size
    • k_v, k_s, both 1 bar