- schemata are
- ideally: a succession of simultaneous sounding groups of notes (“stages”)
- actually: elaborated by
- intermediate notes
- non-simultaneity
- Skipgram:
- model limited non-adjacency
- can be generalized to streams of overlapping entities
- => can model non-simultaneity
- approach:
- generalize skipgrams (from sequences to cost-monotonic streams)
- use SGs over SGs to model schemata
- SGs over notes: non-simultaneous stages (internally)
- SGs over stages: non-adjacent stages (externally)
- frequent SG^2s in corpus <=> schemata?
- sequence of entities (letters, words, …)
- n-grams: continuous sub-sequences of length n
- total: L-n (L>n)
- k-skip-n-grams: sequences of length n, that skip at most k entities (total: ?)
- total: (k+1)(k+2)/6 * (3L-2k-6) for n=3 (Guthrie et al., 2006)
- is there a general analysis (exact or asymptotic)
- total: (k+1)(k+2)/6 * (3L-2k-6) for n=3 (Guthrie et al., 2006)
- n-grams: continuous sub-sequences of length n
- flexible cost (Sears et al. 2017, maybe even earlier?)
- “cost” instead of “skips” (skips = cost based on skipped elements/indices)
- => total depends not on (n,k) but on (n,l), where l is RV #(elements in k-range).
- overlapping input
- “cost” idea + input may “overlap” (like notes, unlike words)
- requirement: monotonic ordering in stream wrt. cost (read < as “before”)
- ∀ x<y<z: cost(x,y) ≤ cost(x,z)
- stochastic selection
- …
Idea: collect prefixes while iterating over input In: cost-monotonic stream of entities, k, n, cost function Out: stream of skipgrams Algorithm:
Skipgrams(input, k, n, cost):
prefixes = {}
output = {}
for x in input:
# close old prefixes
openpfxs = { pfx | pfx ∈ prefixes, totalcost(cost, pfx, x) > k }
# extend remaining prefixes (and "empty prefix")
extpfxs = { pfx+x | pfx ∈ openpfxs } ∪ { x }
# collect completed grams
output = output ∪ { pfx | pfx ∈ extpfxs, |pfx|==n }
# continue with remaining pfxs
prefixes = openpfxs ∪ { pfx | pfx ∈ extpfxs, |pfx|<n }
end
return output
end
Idea: include prefix predicate In: input stream, k, n, cost function, prefix predicate P Out: stream of skipgrams that satisfy P Algorithm: like generalized skipgrams but filter new prefixes by P
- stable ordering
- standard alg orders by index of last gram element in input stream
- ordering by first:
- eager: compute output, sort before returning it
- not feasible for large output
- lazy: priority queue
- enqueue grams with priority according to index of first element
- dequeue only where priority < minimal index in prefixes
- eager: compute output, sort before returning it
Idea: hold back complete prefixes that might still be preceded by prefixes completed later In: input stream, k, n, cost function, output channel Out: stream of skipgrams on output channel, first elements keep order of input stream Algorithm:
StableSkipgrams(input, k, n, cost, outchannel):
prefixes = {}
outq = []
i = 0 # running index
for x in input:
# close old prefixes
openpfxs = { (prio, pfx) | (prio, pfx) ∈ prefixes, totalcost(cost, pfx, x) > k }
# release queue items
released, outq = release(outq, min({prio | (prio, _) ∈ openpfxs }))
put(outchannel, released)
# extend remaining prefixes (and "empty prefix")
extpfxs = { (prio, pfx+x) | (prio, pfx) ∈ openpfxs } ∪ { (i, x) }
# collect completed grams
outq = merge(outq, sort([(prio, pfx) | (prio, pfx) ∈ extpfxs, |pfx|==n]))
# continue with remaining pfxs
prefixes = openpfxs ∪ { (prio, pfx) | (prio, pfx) ∈ extpfxs, |pfx|<n }
i+=1
end
return out
end
- SG²: large amount of output, takes very long to compute
- sample skipgrams randomly:
- (a) flip biased (p) coin for each skipgram -> easy to implement
- (b) choose M random skipgrams -> controllable output size
Idea 1:
- flip a coin at every prefix extension: ∀sg: P(sg ∈ output) = p
- sg length n: biasⁿ = p
In: input stream, k, n, cost, p Out: stream of stochastic skipgrams with membership probability p
StochasticSkipgrams(input, k, n, cost, p):
prefixes = {}
output = {}
bias = root(p, n)
for x in input:
# close old prefixes
openpfxs = { pfx | pfx ∈ prefixes, totalcost(cost, pfx, x) > k }
# extend remaining prefixes (and "empty prefix")
extpfxs = { pfx+x | pfx ∈ openpfxs, flip(bias) } ∪ { x }
# collect completed grams
output = output ∪ { pfx | pfx ∈ extpfxs, |pfx|==n }
# continue with remaining pfxs
prefixes = openpfxs ∪ { pfx | pfx ∈ extpfxs, |pfx|<n }
end
return output
end
Idea 2:
- limit the number of prefixes extended with a given candidate
- N ≤ lⁿ -> limit l: l=N^(1/n), where l is max. #(extended prefixes per candidate)
- can we sample l based on available prefixes such that ‘N ≤’ becomes ‘E[N] =’?
- input: stream of notes, ordered by onset
- 1st pass (over input): vertical structure
- overlap allowed
- cost = total onset difference
- 2nd pass (over 1st): horizontal structure
- overlap not allowed
- cost = max onset difference (step function)
- output: count occurrences
- equivalences:
- order by (absolute) pitch
- oct: pitch classes
- transp: relative to first bass note (_1_1 is always 0)
- parameters:
- n_voices, n_stages from 2x2 to 3x4 and 4x3 (maybe)
- p adapted to size
- k_v, k_s, both 1 bar
- n_voices, n_stages from 2x2 to 3x4 and 4x3 (maybe)