Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CSMC & Sparse Particle Storage with RB and GPU-acceleration #22

Draft
wants to merge 22 commits into
base: main
Choose a base branch
from

Conversation

THargreaves
Copy link
Collaborator

@THargreaves THargreaves commented Oct 29, 2024

A very much draft piece of work tightening up and testing the CSMC interface. Just wanted to check it technically works.

Contributions and comments welcomed.

TODO:

  • Tidy code
  • Improve runtime of unit tests
  • Replaced ground truth with RTS smoother
  • Implement GPU version
  • Considered behaviour for t = 0

On the last point, the callback is not ran after initialisation so x0 of the reference trajectory is not stored.

Use offset arrays to include initial state in reference trajectory. Implement Kalman smoother for use unit tests.
@THargreaves
Copy link
Collaborator Author

Looks like the tests before were passing by chance. I've made a few changes so they now have a valid unit test that passes:

  • Implemented RTS smoother and wrote unit test
  • Used OffsetVectors to include x0 in reference trajectory
  • Correct ancestry code

The first change required modifying the GaussianContainer to have proposal/filtered states, which breaks the RBPF test (though the others still pass) as discussed in #14.

I think the way around this might be to have initialise/update/predict act on the state itself, not the container of proposal/filtered states. The step method can then map these to the container (if the container is even needed at all).

@THargreaves THargreaves changed the title CSMC with RB and GPU-acceleration CSMC & Sparse Particle Storage with RB and GPU-acceleration Nov 12, 2024
@THargreaves
Copy link
Collaborator Author

THargreaves commented Nov 13, 2024

Code is very messy at this point but...it works 🙌

Have a unit test comparing GPU-accelerated, Rao-Blackwellised CSMC to RTS on a dummy Gaussian problem and the smoothing distributions match.

I will probably clean up once we chat on Friday to discuss interface changes.

The biggest long-term issue is that I'm storing the reference trajectories essentially as a vector of (D x 1) CuArrays (i.e. particle containers of size 1). This is probably not very efficient.

Maybe it's best falling back to the CPU for the reference trajectory stuff since we're no longer in a parallel setting. Though a question I've been wondering is whether having one reference trajectory is always optimal for CSMC/PG. The theory works the same with any number...and the GPU would be useful if we had multiple (can compute the ancestries in parallel).

@THargreaves
Copy link
Collaborator Author

Here's the PDF detailing the changes involved in this PR.

GeneralisedFilters.jl Containers.pdf

@THargreaves
Copy link
Collaborator Author

The last commit starts the conversion of the interface to act on distributions rather than the combined intermediate storage.

With this change the Kalman smoother, particle filter and RBPF are all simultaneously passing.

I apologise that this has made some other parts of the code a bit clunkier and may get in the way of some of your ideas. I don't see this as a final version though so happy to make changes to make things more elegant provided it still allows for the RBPF to work nicely.

Some of the main changes that were required:

  • Added type AbstractParticleFilter which has a modified step method which contains the resampling rather than inside of the predict step (where ancestors is not available)
  • Although the update_ref still takes place within initialise/predict, the updating of ancestor indices happens in this custom step, which is a bit clunky.
  • Add an instantiate method which creates the intermediate storage before it gets populated by initialise.

I'll spend the rest of the day making sure the other tests pass and introducing the other changes we discussed last Friday.

Thank you for your patience whilst I make these quite clunky changes—the fast GPU—RB—PGAS should all be worth it though!

Copy link
Collaborator

@charlesknipp charlesknipp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like what I see so far as long as unit tests are passing across the board. I also want to make sure this code is type stable since this could be a huge bottleneck in terms of performance.

StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91"

[compat]
DataStructures = "0.18.20"
GaussianDistributions = "0.5.2"
OffsetArrays = "1.14.1"
Statistics = "1.11.1"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a little odd, I was able to remove this on my work computer (which is stuck at Julia-1.10.4)

Comment on lines +12 to +17
mutable struct Intermediate
proposed::Any
filtered::Any
ancestors::Any
Intermediate() = new()
end
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not too keen on the idea of sacrificing type stability for convenience. If we can ensure that the RBPF is type stable with Intermediate that would be ideal.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. I think we can compute these types at instantiation time, it'll just be a bit of a faff. It's basically a generalisation of the rb_type used with the CPU.

src/containers.jl Outdated Show resolved Hide resolved
src/algorithms/kalman.jl Outdated Show resolved Hide resolved
Comment on lines +47 to 55
function instantiate(
model::StateSpaceModel{T}, filter::BootstrapFilter; kwargs...
) where {T}
N = filter.N
particle_state = ParticleState(Vector{Vector{T}}(undef, N), Vector{T}(undef, N))
return ParticleContainer(
particle_state, deepcopy(particle_state), Vector{Int}(undef, N)
)
end
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can generalize this to any AbstractParticleFilter, except for the RBPF

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. Can probably even extend it to the RBPF too. It's just replacing the Vector{T} with the Rao-Blackwellised particle

@THargreaves
Copy link
Collaborator Author

I also want to make sure this code is type stable since this could be a huge bottleneck in terms of performance.

Yeah, I definitely think some things have slipped through in the process. Code, especially the GPU stuff, seems slower now than I expected.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants