findInterval {base} | R Documentation |

Find the indices of `x` in `vec`, where `vec` must be sorted (non-decreasingly); i.e., if `i <- findInterval(x, v)`, we have *v[i[j]] <= x[j] < v[i[j] + 1]*, where *v[0] := -Inf*, *v[N+1] := +Inf*, and `N <- length(v)`. At the two boundaries, the returned index may differ by 1, depending on the optional arguments `rightmost.closed` and `all.inside`.
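As a small illustration of the defining inequality, the following sketch uses made-up break points `v` and query values `x` (both are assumptions chosen for the example) and checks the invariant against a copy of `v` padded with `-Inf` and `+Inf`.

```r
# Hypothetical data: sorted break points (N = 4) and some query values
v <- c(0, 1, 2.5, 4)
x <- c(-1, 0, 0.99, 2.5, 3, 4, 10)

i <- findInterval(x, v)
i  # 0 1 1 3 3 4 4

# Pad v so that vpad[k + 1] plays the role of v[k] for k = 0, ..., N + 1
vpad <- c(-Inf, v, Inf)
ok <- vpad[i + 1] <= x & x < vpad[i + 2]
all(ok)  # TRUE
```

An index of 0 marks values below `v[1]`, and an index of `N` marks values at or above `v[N]`.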

findInterval(x, vec, rightmost.closed = FALSE, all.inside = FALSE)

`x` | numeric. |

`vec` | numeric, sorted (weakly) increasingly, of length `N`, say. |

`rightmost.closed` | logical; if true, the rightmost interval, `vec[N-1] .. vec[N]`, is treated as closed, see below. |

`all.inside` | logical; if true, the returned indices are coerced into {1,...,N-1}, i.e., 0 is mapped to 1 and N to N-1. |

The function `findInterval` finds the index of one vector `x` in another, `vec`, where the latter must be non-decreasing. Although the task is trivial in principle, being equivalent to `apply(outer(x, vec, ">="), 1, sum)`, the internal algorithm uses interval search, ensuring *O(n * log(N))* complexity, where `n <- length(x)` and `N <- length(vec)`. For (almost) sorted `x`, it will be even faster, basically *O(n)*.
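The equivalence with the brute-force count can be checked directly. This is only a sketch on simulated data; the seed, sample sizes, and variable names `brute` and `fast` are arbitrary choices for illustration.

```r
set.seed(1)                 # arbitrary seed, for reproducibility only
vec <- sort(rnorm(20))      # N = 20 sorted break points
x   <- rnorm(50)            # n = 50 query values

# Brute force: for each x[j], count how many vec values are <= x[j]
brute <- apply(outer(x, vec, ">="), 1, sum)

# Interval search
fast <- findInterval(x, vec)

all(brute == fast)  # TRUE
```

The brute-force version builds an `n` by `N` logical matrix, which is why it becomes impractical for long vectors.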

This is the same computation as for the empirical distribution function, and indeed, `findInterval(t, sort(X))` is *identical* to *n * Fn(t; X[1],..,X[n])*, where *Fn* is the empirical distribution function of *X[1],..,X[n]*.
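The relation to the empirical distribution function can be verified with `ecdf` from the *stats* package. The sample `X` and evaluation points `tt` below are invented for the example, and `all.equal` is used because `n * Fn(tt)` is computed in floating point.

```r
set.seed(42)                # arbitrary seed, for reproducibility only
X  <- rnorm(25)             # hypothetical sample
n  <- length(X)
tt <- c(-3, -0.5, 0, 0.5, 3)

Fn     <- ecdf(X)                       # empirical distribution function of X
counts <- findInterval(tt, sort(X))     # number of X values <= each tt

same <- isTRUE(all.equal(counts, n * Fn(tt)))
same  # TRUE
```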

When `rightmost.closed = TRUE`, the result for `x[j] = vec[N]` (*= max(vec)*) is `N - 1`, as for all other values in the last interval.
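A short sketch of the effect of `rightmost.closed`, using made-up break points: only the value exactly equal to `max(vec)` changes its index.

```r
vec <- c(0, 10, 20, 30)        # hypothetical break points, N = 4
x   <- c(5, 29.9, 30, 30.1)

open_right   <- findInterval(x, vec)
closed_right <- findInterval(x, vec, rightmost.closed = TRUE)

open_right    # 1 3 4 4
closed_right  # 1 3 3 4
```

Values strictly greater than `vec[N]`, such as 30.1 here, still receive index `N`.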

A vector of length `length(x)` with values in `0:N` (and `NA`), where `N <- length(vec)`, or values coerced to `1:(N-1)` if and only if `all.inside = TRUE` (equivalently, coercing all x values *inside* the intervals). Note that `NA`s are propagated from `x`, and `Inf` values are allowed in both `x` and `vec`.
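The following sketch, again with invented data, shows the range of returned values with and without `all.inside`, together with the handling of `NA` and `Inf` in `x`.

```r
vec <- c(0, 10, 20, 30)        # hypothetical break points, N = 4
x   <- c(-5, 5, NA, 30, Inf)

default_idx <- findInterval(x, vec)
inside_idx  <- findInterval(x, vec, all.inside = TRUE)

default_idx  # 0 1 NA 4 4
inside_idx   # 1 1 NA 3 3
```

With `all.inside = TRUE`, every non-missing result is a valid index of one of the `N - 1` intervals, which is convenient when the result is used to index per-interval quantities.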

Martin Maechler

`approx(*, method = "constant")`, which is a generalization of `findInterval()`; `ecdf` for computing the empirical distribution function, which is (up to a factor of *n*) also basically the same as `findInterval(.)`.

N <- 100
X <- sort(round(rt(N, df = 2), 2))
tt <- c(-100, seq(-2, 2, len = 201), +100)
it <- findInterval(tt, X)
tt[it < 1 | it >= N] # only first and last are outside range(X)

[Package *base* version 2.5.0 Index]