Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set correct checksums and samps_per_frame in Record.wrsamp #424

Merged
merged 5 commits into from
Nov 9, 2022

Conversation

bemoody
Copy link
Collaborator

@bemoody bemoody commented Sep 23, 2022

There are a couple of ways to write a record using this package. One is wfdb.wrsamp, the other is creating a Record object and calling the wrsamp method.

Record.wrsamp was, I think, meant to be more low-level and perhaps only intended for internal use of the package. But for various reasons, applications may want or need to call this method directly. So I think we need to support it, and make it more fool-proof; in particular, this method shouldn't generate invalid WFDB records even if the application supplies wrong parameters.

Here I address two of those parameters - checksum and samps_per_frame.

Typically, checksum is set by wfdb.rdrecord. But naturally, the input data may have been modified (explicitly by the application, or implicitly e.g. by smooth_frames) between reading the record and writing it. So when calling Record.wrsamp, we want to set this field to the actual checksums of the signals. For backward compatibility and for applications that want to edit an existing header file, we leave the checksums alone if they're absent or if they're already correct.

(WFDB checksums are stored and written as an integer but only the low 16 bits are used. So values of -1 and 65535 are equally correct. Yes, this is kind of broken.)

samps_per_frame is only relevant when writing multi-frequency data. If you are writing single-frequency signal files, then the header file must not claim there are multiple samples per frame. Since the field is optional, we can simply drop it when expanded is false.

(If you are writing multi-frequency signal files, then Record.wrsamp does correctly verify that samps_per_frame is consistent with the dimensions of e_d_signal.)

Fixes issue #333.

Benjamin Moody added 5 commits September 22, 2022 16:54
The original intention of this code was that get_write_fields would
return a list of signal information fields to be saved, and
check_sig_cohesion would check that those specified fields matched the
actual signal data.

However, this did not do what was intended.
HeaderMixin.get_write_fields returns a 2-tuple of (rec_write_fields,
sig_write_fields), not a list of field names.  Therefore, the
parameter to check_sig_cohesion never had "checksum" or "init_value"
as elements, and therefore, check_sig_cohesion never actually checked
the values of these fields when writing a record.

Many existing test cases rely on being able to read and write a record
as-is (missing fields, incorrect init_values, checksums that are
correct mod 2**16 but not equal to the result of calc_checksum); these
tests would fail if Record.wrsamp actually required init_value and
checksum fields to match as check_sig_cohesion expects.

This code should be redesigned, and the test cases probably should be
less strict.  In the meantime, however, make it clear that this
function isn't actually doing what it was pretending to do.
When the expanded argument to wrsamp is false, we are writing the
d_signal data, which contains one sample per frame by definition.
Therefore, the header file must not claim that the signal files will
contain multiple samples per frame.

The samps_per_frame attribute of the Record object may or may not be
set appropriately for the actual data contained in the Record object;
for example, rdrecord(r, smooth_frames=True) will set samps_per_frame
according to the number of samples per frame in the original record.
When writing a record, the application generally shouldn't have to
worry about details like calculating checksums.  The wfdb
package *should* ensure that the output files it generates are
correct, and *should not* trust applications to calculate checksums
correctly.

Moreover, although it was originally intended for SignalMixin.wr_dats
to verify that the application-supplied checksums were correct, this
never actually worked, and existing test cases depend on that fact.
So it is not reasonable for wr_dats to raise an exception if the
checksums are wrong.

Instead, update the checksums in the Record object when wrsamp is
called (and note that we have to do this before calling wrheader,
which is called before wr_dats).

For compatibility with old test cases, that expect to be able to read
and write records quasi-unmodified, we don't set checksums for signals
that previously had a checksum of None, or that previously had a
checksum that was correct mod 2**16.
Also reformat this as an f-string for readability.
@tompollard
Copy link
Member

Thanks @bemoody, basically this looks good to me, though I don't totally understand the purpose of the expanded flag. Is it true that only the expanded formats (e_d_signal and e_p_signal?) require the samps_per_frame value in the header?

Not directly related to this pull request, but I think the addition/removal of fields in the header makes them less human-readable. I'd prefer consistent fields that are sometimes populated with something like "NULL" than fields that disappear when not required,.

@bemoody
Copy link
Collaborator Author

bemoody commented Nov 9, 2022

If samples-per-frame is omitted from the header file, it defaults to 1, which is what we want when writing d_signal or p_signal data.

Multi-frequency was a fairly late addition to the format, which is why the samples-per-frame field in the header is optional. If I were designing a format or an API now, multi-frequency support wouldn't be optional.

@tompollard
Copy link
Member

Makes sense, thanks Benjamin.

@tompollard tompollard merged commit bbe54ce into main Nov 9, 2022
@tompollard tompollard deleted the wrsamp-checksum-spf branch November 9, 2022 18:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants