Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return NaN coordinate immediately when receiving NaN input #949

Merged
merged 7 commits into from
May 24, 2018

Conversation

kbevers
Copy link
Member

@kbevers kbevers commented Apr 23, 2018

Ideally NaNs shouldn't get special treatment but since double to int
conversion is undefined behavior we are forced to treat NaNs as
something special. Therefor we return a NaN coordinate when ever we
encounter a NaN in one of the dimensions of an input coordinate. The
error level is not changed since this all math operations on NaNs should
return a NaN.

This policy of returning NaNs immediately should also eliminate a bunch
of potential double to int conversion bugs.

Alternative version of #943

@kbevers kbevers force-pushed the return-nans-quickly branch from dd1b2be to a111d43 Compare April 23, 2018 21:07
@kbevers kbevers force-pushed the return-nans-quickly branch 2 times, most recently from f760ded to ff89a5e Compare April 24, 2018 07:51
@cffk
Copy link
Contributor

cffk commented Apr 24, 2018

C99 has the function, lround, which rounds a double to a long returning
an "implementation-defined value" (LONG_MIN on my system) if the double
does not fit in a long. Thus if we just replaced

int(floor(double))

with

lround(floor(double))

we would no longer have to include special tests for NaN. Of course for
non-C99 systems we would need to define lround

long lround(double x) {
  x = x < 0 ? ceil(x - 0.5) : floor(x + 0.5);
  return fabs(x) < -(double)LONG_MIN ? (long)x : LONG_MIN;
}

@schwehr Could you verify that lround(NaN) is safe (doesn't cause the
program to abort)?

@kbevers
Copy link
Member Author

kbevers commented Apr 24, 2018

You mean (int)floor(double), right? We would have to find all occurrences of that to be on the safe side. I can easily grep a handful of those but I don't find all of them, for instance these lines from PJ_isea.c (that sparked the whole issue in the first place):

    rx = floor(x + 0.5);
    ix = (int)rx;
    ry = floor(y + 0.5);
    iy = (int)ry;
    rz = floor(z + 0.5);
    iz = (int)rz;

I am good with the lround solution granted that we can create a safe non-C99 version of it.

@cffk
Copy link
Contributor

cffk commented Apr 24, 2018

Yes, I meant (inf)floor(double). (I was thinking C++.) I think that double to int conversions are only needed in a handful of situations: computing a UTM zone, getting the right face of an icosahedron...

@kbevers
Copy link
Member Author

kbevers commented Apr 24, 2018

I get these cases from grepping a bit:

λ grep "(int)[[:space:]]*floor" *.c
PJ_robin.c:    i = isnan(lp.phi) ? -1 : (int)floor(dphi * C1);
PJ_robin.c:        i = isnan(lp.phi) ? -1 : (int)floor(lp.phi * NODES);
PJ_unitconvert.c:    year = (int)floor(decimalyear);
PJ_unitconvert.c:    int year  = (int)floor( yyyymmdd / 10000 );
PJ_unitconvert.c:    int month = (int)floor((yyyymmdd - year*10000) / 100);
PJ_unitconvert.c:    int day   = (int)floor( yyyymmdd - year*10000 - month*100);
nad_intr.c:     indx.lam = isnan(t.lam) ? 0 : (int)floor(t.lam);
nad_intr.c:     indx.phi = isnan(t.phi) ? 0 : (int)floor(t.phi);
pj_apply_vgridshift.c:        grid_ix = (int) floor(grid_x);
pj_apply_vgridshift.c:        grid_iy = (int) floor(grid_y);

and then the rest of the (int) typecasts:

λ grep -v "(int)[[:space:]]*floor" *.c | grep (int)
PJ_healpix.c:        return pnpoly((int)sizeof(healpixVertsJit)/
PJ_healpix.c:        return pnpoly((int)sizeof(rhealpixVertsJit)/
PJ_horner.c:    int n = (int)horner_number_of_coefficients(order);
PJ_horner.c:        n = 2*(int)order + 2;
PJ_horner.c:    h->order = (int)order;
PJ_isea.c:    ix = (int)rx;
PJ_isea.c:    iy = (int)ry;
PJ_isea.c:    iz = (int)rz;
PJ_isea.c:     * (int)sidelength * 2 + 1 might be just as good
PJ_isea.c:    maxcoord = (int) (sidelength * 2.0 + 0.5);
PJ_isea.c:        sidelength = (int) (pow(g->aperture, g->resolution / 2.0) + 0.5);
PJ_isea.c:    hexes = (int) (pow(g->aperture, g->resolution) + 0.5);
PJ_isea.c:        height = (int) (pow(g->aperture, (g->resolution - 1) / 2.0));
PJ_isea.c:        sn = ((int) di->x) * height;
PJ_isea.c:        sn += ((int) di->y) / height;
PJ_isea.c:        sidelength = (int) (pow(g->aperture, g->resolution / 2.0) + 0.5);
PJ_isea.c:        sn = (int) ((quad - 1) * hexes + sidelength * di->x + di->y + 2);
PJ_isea.c:    hex->x = ((int)v.x << 4) + quad;
PJ_isea.c:    d = (int)v.x;
PJ_isea.c:    i = (int)v.y;
PJ_isea.c:        int offset = (int)(pow(3.0, g->resolution - 1) + 0.5);
PJ_isea.c:    sidelength = (int) (pow(g->aperture, g->resolution / 2.0) + 0.5);
PJ_pipeline.c:    P->opaque->steps = (int)steps;
PJ_pipeline.c:    argc = (int)argc_params (P->params);
PJ_unitconvert.c:    day = (int)(mjd - mjd_iter + 1);
cct.c:            print (PJ_LOG_ERROR, "Read error in record %d\n", (int) o->record_index);
cct.c:            print (PJ_LOG_NONE, "# Record %d UNREADABLE: %s", (int) o->record_index, buf);
cct.c:                   (int) o->record_index, buf, pj_strerrno (proj_errno(P)));
geod_set.c:                     n_S = (int)(geod_S / del_S + .5);
geodesic.c:  q = r == r ? (int)(floor(r / 90 + (real)(0.5))) : 0;
gie.c:    ntrips = (int) (endp==args? 100: fabs(ans));
gie.c:        fprintf (T.fout, "     FAILURE in %s(%d):\n", opt_strip_path (T.curr_file), (int) F->lineno);
gie.c:    fprintf (T.fout, "     FAILURE in %s(%d):\n", opt_strip_path (T.curr_file), (int) F->lineno);
gie.c:        fprintf (T.fout, "     FAILURE in %s(%d):\n     Too few args: %s\n", opt_strip_path (T.curr_file), (int) F->lineno, args);
gie.c:    fprintf (T.fout, "     FAILURE in %s(%d):\n",    opt_strip_path (T.curr_file), (int) F->lineno);
gie.c:            delim, (int) T.operation_lineno, pj_strerrno(proj_errno(T.P)),
gie.c:            delim, (int) T.operation_lineno
gie.c:    int i = (int) proj_atof (args);
gie.c:    ret = (int) pj_atof (err_const);
gie.c:    if (0==fgets (G->next_args, (int) G->next_args_size - 1, G->f))
gie.c:        if (0==fgets (G->next_args, (int) G->next_args_size - 1, G->f))
mk_cheby.c:    if (!(ncu = (int *)vector1(nu + nv, sizeof(int)))) {
nad_init.c:    for( id_end = (int)strlen(ct->id)-1; id_end > 0; id_end-- )
nad_init.c:        swap_words( ct->cvs, 4, (int)a_size * 2 );
nad_init.c:    for( id_end = (int)strlen(ct->id)-1; id_end > 0; id_end-- )
pj_gridinfo.c:                (int)header_size );
pj_malloc.c:    p->baz = pj_calloc (10, sizeof(int));
pj_open_lib.c:            fname[n = (int)strlen(fname)] = DIR_CHAR;
pj_open_lib.c:        fname[n = (int)strlen(fname)] = DIR_CHAR;
pj_param.c:    l = (int) strlen(opt);
pj_pr_list.c:                   l = (int)strlen(t->param) + 1;
pj_pr_list.c:        l = (int)strlen(t->param) + 1;
proj_4D_api.c:    P = pj_init_ctx (ctx, (int) argc, argv);
proj_etmerc.c:        zone = (int)(floor ((adjlon (P->lam0) + M_PI) * 30. / M_PI));
proj_strtod.c:    printf ("res = %20.15g. Rest = [%s],  errno = %d\n", res, endptr, (int) errno);
rtodms.c:       min = (int)fmod(r, 60.);
rtodms.c:       deg = (int)r;

It is not a huge task to go through each of them and change them to behave as we want them to. I can do that one of the coming days.

@cffk
Copy link
Contributor

cffk commented Apr 24, 2018

You will note that several of these (int) casts are actually rounds. Thus in geodesic.c

q = r == r ? (int)(floor(r / 90 + (real)(0.5))) : 0

becomes the simpler

q = lround(r / 90);

You still need to worry about how ties are handled (no problem in geodesic.c, however). And in this case, I don't need to worry about what arbitrary value gets returned with a NaN.

@cffk
Copy link
Contributor

cffk commented Apr 24, 2018

Here's my implementation of lround...

#include <math.h>
#include <limits.h>

double roundx(double x) {
  /* The handling of corner cases is copied from boost; see
   *   /~https://github.com/boostorg/math/pull/8
   * with improvements to return -0.0 when appropriate */
  double t;
  if (x == 0)
    return x;               /* Retain sign of 0 */
  else if (0 < x && x <  0.5)
    return +0.0;
  else if (0 > x && x > -0.5)
    return -0.0;
  else if (x > 0) {
    t = ceil(x);
    return 0.5 < t - v ? t - 1 : t;
  } else {                  /* Includes NaN */
    t = floor(x);
    return 0.5 < x - t ? t + 1 : t;
  }
}

long lroundx(double x) {
  /* Default value for overflow + NaN + (x == LONG_MIN) */
  long r = LONG_MIN;
  x = roundx(x);
  if (fabs(x) < -(double)LONG_MIN) /* Assume (double)LONG_MIN is exact */
    r = (int)x;
  return r;
}

@kbevers kbevers force-pushed the return-nans-quickly branch 12 times, most recently from e3df7a1 to d9f0d6c Compare May 8, 2018 06:11
@kbevers kbevers force-pushed the return-nans-quickly branch from d9f0d6c to 58cbb9f Compare May 8, 2018 07:16
@kbevers kbevers force-pushed the return-nans-quickly branch 2 times, most recently from 7b7804b to dc1802e Compare May 23, 2018 14:02
@kbevers kbevers force-pushed the return-nans-quickly branch from dc1802e to 6db260f Compare May 23, 2018 15:20
@kbevers
Copy link
Member Author

kbevers commented May 23, 2018

@cffk I finally got round to fixing the build errors in this. I plan on merging this tomorrow so that it can be included in PROJ 5.1.0RC1 (which I intend to release tomorrow afternoon CEST). If you have the time, could you give this a quick look and check if everything is sane?

There's some stupid things regarding pj_isnan because it is used by gie and hence need to be exported in the Windows DLL. That makes it necessary to declare the function at all times and not only when C99 math is unavailable.

@cffk
Copy link
Contributor

cffk commented May 23, 2018

Replacing (int)floor(x) by lround(floor(x)) is fine. But there are cases, e.g., maxcoord = (int) (sidelength * 2.0 + 0.5); where the use of lround changes from truncation to rounding. If you know the quantity is non-negative, you can fix by throwing in a call to floor.

@kbevers
Copy link
Member Author

kbevers commented May 24, 2018

With http://c-faq.com/fp/round.html in mind, I think the cases in PJ_isea can handled by removing the + 0.5 and using lround, so that

int offset = (int)(pow(3.0, g->resolution - 1) + 0.5)

becomes

long offset = lround(pow(3.0, g->resolution - 1))

The original intension must have been to round the numbers to the nearest integer value.

@cffk
Copy link
Contributor

cffk commented May 24, 2018

Indeed.. The problem is now how ties are treated: lround(x) rounds away from zero, floor(x+0.5) rounds up. These are the same if x >= 0. So the exercise is to check each case. (And perhaps in some cases, it's immaterial how ties are treated.)

Originally the code was doing double to int conversions like

    y = (int)(x + 0.5)

which results in rounding when typecasting. In an earlier attempt to
avoid buffer overflows in integer typecasts this was changed to

    y = lround(x + 0.5)

which doesn't give the origial results. We fix that here by instead
doing

    y = lround(x)

It is safe to so as long as x is positive.
@kbevers
Copy link
Member Author

kbevers commented May 24, 2018

I have checked all of the cases and I believe they are all cases of positive numbers, so this should be safe.

@cffk
Copy link
Contributor

cffk commented May 24, 2018

OK, looks good.

@kbevers kbevers added this to the 5.1.0 milestone May 24, 2018
@kbevers
Copy link
Member Author

kbevers commented May 24, 2018

OK, looks good.

Awesome. Thanks for supervising this!

@kbevers kbevers merged commit a45ad8c into OSGeo:master May 24, 2018
@kbevers kbevers deleted the return-nans-quickly branch July 9, 2018 08:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants