-
Notifications
You must be signed in to change notification settings - Fork 200
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LineSplit trait #1050
LineSplit trait #1050
Conversation
Hi @urschrei, @michaelkirk, it seems like every time I push changes to my fork it is triggering all the checks on this draft PR. Should I be concerned about the usage? |
c4ba6be
to
9e2c5d6
Compare
CI is configured to run tests on all PR's (even drafts). Maybe it's possible to turn CI off for draft PR's, but I prefer it as it is. In fact I actually sometimes use draft PR's exactly for this purpose: as a way to see if CI passes before I'm ready to mark it as ready and waste a living human's time on it. I'm not overly concerned about the usage. If you don't want it to run, you could close the PR until you're ready. |
Thanks @michaelkirk I'll try to limit unnecessary pushes for now anyway. I had understood an early draft PR as a way of letting others know I'm having a go at some feature. Perhaps I will open them a bit later in the process in future. I thought this PR would be a two or three quick commits but it's turning out to be tricky :/ |
/// # use geo::LineSplit; | ||
/// | ||
/// # let my_line = LineString::from(vec![coord! {x: 0., y: 0.},coord! {x: 1., y: 0.},]); | ||
/// if let Some(result) = my_line.line_split_twice(0.2, 0.5) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about this a bit more... is there a reason to not just have line_split_many
and return Vec<Option<T>>
? The guarantee is that the vec has n+1 entries for n input splits, and any of them might be None. Are the enums easier to use in the offset work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @dabreegster :)
Regarding line_split_twice
I think you may be right that line_split_many
returning Vec<Option<T>>
is more broadly useful. I personally have a specific use-case for line_split_twice()
where I want to chop out a road segment between two chainages, but I can just move that function into my own project.
I think for a single line_split
the enum (LineSplitResult<T>
) is helpful as explained above.
Maybe your comment is because it comes wrapped in an option line_split(..) -> Option<LineSplitResult<Self>>
which is an annoying extra step to unwrap?
Just thinking out loud; the possible outcomes are
a)
None, // returned when the input line has NAN or INF values, or the input `fraction` is NAN
Some(LineSplitResult::First (T )),
Some(LineSplitResult::Second ( T)),
Some(LineSplitResult::FirstSecond (T, T)),
The tuple version is
b)
None, // returned when the input line has NAN or INF values, or the input `fraction` is NAN
Some((Some(T), None )),
Some((None, Some(T))),
Some((Some(T), Some(T))),
One level of wrapping can be removed like this;
c)
(None, None ), // returned when the input line has NAN or INF values, or the input `fraction` is NAN
(Some(T), None ),
(None, Some(T)),
(Some(T), Some(T)),
So why not just use the last one?
The most satisfying API would be if we are allowed to assume the input geometry and fraction
are well-formed, then the result could be like this:
d)
LineSplitResult::First (T ),
LineSplitResult::Second ( T),
LineSplitResult::FirstSecond (T, T),
In this case, the enum is helpful, because it prevents the user from expecting (None, None )
which can never happen if the inputs are valid. Therefore the enum stops rust from complaining about non-exhaustive match arms.
I think it depends on whether line_split
should be guarding against NAN/INF. Possibly not? Otherwise if other algorithms rely on this algorithm, and they all check for NAN every time... there will end up being a lot of NAN checks? On the other hand, if it is user facing, then it probably should guard against invalid input?
Apologies for the wall-of-text and thank you for your time :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for explaining further! I think guarding against invalid input is necessary, so some way to represent that sounds helpful. I'm just trying to figure out why the enum is helpful vs returning a bunch of Option<T>
s.
I personally have a specific use-case for line_split_twice() where I want to chop out a road segment between two chainages, but I can just move that function into my own project.
I can imagine use cases for different numbers of split points. What about if some of the variations returned an array of fixed size, like line_split -> [Option<T>; 2]
and you used https://doc.rust-lang.org/reference/patterns.html#slice-patterns? If the input is invalid, [None, None]
. Otherwise the 3 variations can be represented. I just tried out slice patterns and https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=596e6e8d97ae0e673428dff4334c0b3f seems pretty succinct, and would save you from defining the enums.
In this case, the enum is helpful, because it prevents the user from expecting (None, None ) which can never happen if the inputs are valid. Therefore the enum stops rust from complaining about non-exhaustive match arms.
Is (None, None) => unreachable!("input should be valid")
in user code something to avoid? Similar to an line_split(...).expect("input should be valid")
, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is (None, None) => unreachable!("input should be valid") in user code something to avoid
It wouldn't be the end of the world, but I think it'd be better to encode it in the type system if we can do so succinctly. Otherwise we're relying on the user to read (and forever remember!) the docs.
The most satisfying API would be if we are allowed to assume the input geometry and fraction are well-formed, then the result could be like this:
LineSplitResult::First (T ), LineSplitResult::Second ( T), LineSplitResult::FirstSecond (T, T),
Even though it's not any fewer cases, maybe a better way to communicate to the user the possibility of invalid results is something like:
// Library code
enum LineSplits {
First(T),
Second(T),
FirstSecond((T, T)),
}
fn split_line(line_string: &LineString) -> Result<LineSplits> { todo!() }
// User code
let splits = split_line(my_line_string).expect("valid input");
I'm late here, but let me drop 2 cents. I've implemented something like this for myself in Python and Rust, also working with OSM streets data. The best way to use was to have "line-substring" method with "from" and "to" fractions or distances. If I were to code this, I'd have 2 separate methods, to avoid confusion. The "parameter" mode makes sense, and I'd make it separate as well. Other options were:
|
Is your implementation public? Would love to check it out if so. (Currently looking through https://github.com/culebron/rust_routing/ -- cool stuff!) |
Thanks, actually the credit goes to you, who inspired me to try Rust. I implemented that before learning of Here's the Python and Rust code, free to use. Actually, it turned out I did use cutting in Python, but had to add a convenient wrapper around it. Python code
def line_substring(line, start=None, end=None):
if start is not None:
start = clip_position(line.length, start)
if end is not None:
end = clip_position(line.length, end)
if end is not None and start is not None:
if end < start:
start, end = end, start
end -= start
return cut_line(cut_line(line, start)[1], end)[0]
def clip_position(length, position):
if position is None:
return None
if position < -length:
return 0
if position > length:
return length
if position < 0:
return position + length
return position
def cut_line(line, distance):
# Cuts a line in two at a distance from its starting point
empty_line = LineString()
if distance is None:
return line, line
if line is None or line.is_empty:
return empty_line, empty_line
distance = clip_position(line.length, distance)
if distance == 0:
return empty_line, line
if distance >= line.length:
return line, empty_line
coords = list(line.coords)
points = [Point(p) for p in coords]
distances = [[line.project(p), p] for p in points]
distances[-1][0] = line.length
if not any(d == distance for d, p in distances):
distances += [[distance, line.interpolate(distance)]]
distances = sorted(distances, key=lambda v: v[0])
line1 = [p for d, p in distances if d <= distance]
line2 = [p for d, p in distances if d >= distance]
line1 = LineString(line1) if len(line1) > 1 else empty_line
line2 = LineString(line2) if len(line2) > 1 else empty_line
return line1, line2 Rust code
use geo::{Coord, LineString, EuclideanDistance, LineInterpolatePoint, EuclideanLength};
pub trait LineSubstring {
fn sub_frac(&self, start: f32, end: f32) -> Option<LineString<f32>>;
}
impl LineSubstring for LineString<f32> {
fn sub_frac(&self, mut start: f32, mut end: f32) -> Option<Self> {
if start.is_nan() || end.is_nan() || start > end { // probably should be Err, but that's too much wrapping, Greta Thunberg won't approve.
return None;
}
start = start.max(0.0).min(1.0);
end = end.max(0.0).min(1.0);
let tot_dist = self.euclidean_length();
let mut prev : Option<&Coord<f32>> = None;
let mut cur_dist = 0.0f32;
let mut result = Self(vec![]);
self.line_interpolate_point(start).map(|p| result.0.push(p.0));
for i in 0..self.0.len() {
prev.map(|p| cur_dist += self.0[i].euclidean_distance(p));
if cur_dist / tot_dist > end { break; }
if cur_dist / tot_dist > start { result.0.push(self.0[i].clone()); }
prev = Some(&self.0[i])
}
if end < 1.0 {
self.line_interpolate_point(end).map(|p| result.0.push(p.0));
}
Some(result)
}
}
#[cfg(test)]
mod line_substring_tests {
use geo::{LineString, Coord};
use crate::assert_delta;
use super::LineSubstring;
fn cmp(ls1: &LineString<f32>, start: f32, end: f32, vec2: Vec<(f32, f32)>) {
let ls1 = ls1.sub_frac(start, end).unwrap();
println!("expected {:?}, got: {:?}", vec2, ls1);
for (Coord { x: x1, y: y1}, (x2, y2)) in ls1.0.iter().zip(vec2.iter()) {
assert_delta!(x1, x2, 0.001);
assert_delta!(y1, y2, 0.001);
}
}
#[test]
fn half1_works() {
let l1 = LineString::from(vec![(0.0, 0.0), (100.0, 0.0)]);
cmp(&l1, 0.0, 0.5, vec![(0.0, 0.0), (50.0, 0.0)]);
cmp(&l1, -100.0, 0.5, vec![(0.0, 0.0), (50.0, 0.0)]);
cmp(&l1, 0.0, 1.0, vec![(0.0, 0.0), (100.0, 0.0)]);
cmp(&l1, 0.0, 2.0, vec![(0.0, 0.0), (100.0, 0.0)]);
assert!(l1.sub_frac(0.75, 0.25).is_none());
assert!(l1.sub_frac(f32::NAN, f32::NAN).is_none());
}
#[test]
fn half1_real_geom() {
let ls: LineString<f32> = LineString::from(vec![
(8564258.0, 5351193.5),
(8564267.0, 5351172.5),
(8564277.0, 5351118.0),
(8564288.0, 5351049.5),
(8564292.0, 5351002.0)]);
let result = ls.sub_frac(0.0, 0.5).unwrap();
assert_eq!(result.0.len(), 4);
}
} Editsedit (RobWalt): Sorry in advance for editing this comment 🙈 I just added the language highlighting to the code examples to make it easier to read. |
Closing this for the time being. If you find the time to work on this, please reopen! |
CHANGES.md
if knowledge of this change could be valuable to users.cargo fmt --all -- --check
cargo clippy --all-features --all-targets -- -Dwarnings
Split a
Line
orLineString
based on one or two fractional distances along the length using proposed trait methodsLineSplit::line_split(fraction)
andLineSplit::line_split_twice(fraction_start, fraction_end)
Previously discussed in #378 and #986
Needed for #935 see this comment
Still to do / discuss
Instead ofIn addition toline_split_twice
, considerline_split_many
. Being able to split atn
locations is desirable as discussed in LineString intersection API #985 and LineString slicing #986line_split_twice
?line_split_many
LineSplit
trait's default implementation forline_split_many
andline_split_twice
are failing tests due to precision issues which arise because these use a lazy implementation which just repeatedly applies theline_split
function. The error seems to compound much faster than I expected 😞 For the same reasons, these methods are also super inefficient.None
? Or just automatically swap thefraction_start
&fraction_end
arguments? Currently it does the latter.Consider supportingparameter
andlength_along
in addition tofraction
(see discussion below)Perhaps in a future PR
Consider breakingmeasure_line_string()
into a separate trait?Perhaps in a future PR
Consider optimization using dedicated prepared geometry typesLineStringMeasured
andLineMeasured
(see discussion below)Perhaps in a future PR
Example Usage
Click to expand code examples
Line LineSplit::line_split()
LineString LineSplit::line_Split()
Line LineSplit::line_split_twice()
// TODO write example
LineString LineSplit::line_split_twice()
Line LineSplit::line_split_many
Implementation Notes
Return Types
At first I tried implementing the return types like
Option<(Option<Line>, Option<Line>)>
but this causes issues in the implementation (non exhaustive match statements) and uncertainty for the user; The user may expect to possibly receiveSome((None, None))
which is never possible.Therefore the following types were implemented:
line_split
returnsOption<LineSplitResult<T>>
andline_split_twice
returnsOption<LineSplitTwiceResult<T>>
So that we get the best of both worlds I implemented
LineSplitResult::as_tuple()->(Option<&T>, Option<&T>)
and a set of single item accessors likeLineSplitResult::first()
andLineSplitResult::second()
which returnOption<&T>
I also added
.into_
variants, eginto_first
andinto_tuple
to consume the result and discard other returned data.This works great for
line_split()
but for theline_split_twice()
function I am still wondering if these return types are the best option. For example theLineSplitTwiceResult::FirstThird
variant is pretty confusing, although from an implementation point of view it makes total sense; it happens when there is no 'Second' line segment because thefraction_start==fraction_end
.Further Thoughts
Fraction
vsLength-Along
vsParameter
A point on a line string can be identified in three ways; as a
fraction
of length along, as alength along
the line, or as something that I call 'parameter
'A parameter is just a float where the integer part represents the segment index and the fractional part represents the
fraction
along the next line segment. In my previous tinkering I found that many algorithms are naturally faster when using parameters internally. This is because they do not need to know the length of every segment, and the total length of the linestring. Only the segment with the fractional part needs to be measured.It seems natural to me that line intersection algorithms and line split algorithms should have the option to use 'parameter' instead of fraction.
LineStringMeasured
andLineMeasured
In previous projects I have implemented 'measured' types which might look something like
struct LineStringMeasure<T>{(coordinates:Vec<Coord<T>>, length_segments:Vec<T>, length_total:T}
. For repeated queries on the same line, these can massively reduce the number of repeated length calculations. No idea if it would result in real performance improvements because storing the extra data may reduce cache locality etc.