Set correct checksums and samps_per_frame in Record.wrsamp #424

bemoody · 2022-09-23T19:02:21Z

There are a couple of ways to write a record using this package. One is wfdb.wrsamp, the other is creating a Record object and calling the wrsamp method.

Record.wrsamp was, I think, meant to be more low-level and perhaps only intended for internal use of the package. But for various reasons, applications may want or need to call this method directly. So I think we need to support it, and make it more fool-proof; in particular, this method shouldn't generate invalid WFDB records even if the application supplies wrong parameters.

Here I address two of those parameters - checksum and samps_per_frame.

Typically, checksum is set by wfdb.rdrecord. But naturally, the input data may have been modified (explicitly by the application, or implicitly e.g. by smooth_frames) between reading the record and writing it. So when calling Record.wrsamp, we want to set this field to the actual checksums of the signals. For backward compatibility and for applications that want to edit an existing header file, we leave the checksums alone if they're absent or if they're already correct.

(WFDB checksums are stored and written as an integer but only the low 16 bits are used. So values of -1 and 65535 are equally correct. Yes, this is kind of broken.)

samps_per_frame is only relevant when writing multi-frequency data. If you are writing single-frequency signal files, then the header file must not claim there are multiple samples per frame. Since the field is optional, we can simply drop it when expanded is false.

(If you are writing multi-frequency signal files, then Record.wrsamp does correctly verify that samps_per_frame is consistent with the dimensions of e_d_signal.)

Fixes issue #333.

The original intention of this code was that get_write_fields would return a list of signal information fields to be saved, and check_sig_cohesion would check that those specified fields matched the actual signal data. However, this did not do what was intended. HeaderMixin.get_write_fields returns a 2-tuple of (rec_write_fields, sig_write_fields), not a list of field names. Therefore, the parameter to check_sig_cohesion never had "checksum" or "init_value" as elements, and therefore, check_sig_cohesion never actually checked the values of these fields when writing a record. Many existing test cases rely on being able to read and write a record as-is (missing fields, incorrect init_values, checksums that are correct mod 2**16 but not equal to the result of calc_checksum); these tests would fail if Record.wrsamp actually required init_value and checksum fields to match as check_sig_cohesion expects. This code should be redesigned, and the test cases probably should be less strict. In the meantime, however, make it clear that this function isn't actually doing what it was pretending to do.

When the expanded argument to wrsamp is false, we are writing the d_signal data, which contains one sample per frame by definition. Therefore, the header file must not claim that the signal files will contain multiple samples per frame. The samps_per_frame attribute of the Record object may or may not be set appropriately for the actual data contained in the Record object; for example, rdrecord(r, smooth_frames=True) will set samps_per_frame according to the number of samples per frame in the original record.

When writing a record, the application generally shouldn't have to worry about details like calculating checksums. The wfdb package *should* ensure that the output files it generates are correct, and *should not* trust applications to calculate checksums correctly. Moreover, although it was originally intended for SignalMixin.wr_dats to verify that the application-supplied checksums were correct, this never actually worked, and existing test cases depend on that fact. So it is not reasonable for wr_dats to raise an exception if the checksums are wrong. Instead, update the checksums in the Record object when wrsamp is called (and note that we have to do this before calling wrheader, which is called before wr_dats). For compatibility with old test cases, that expect to be able to read and write records quasi-unmodified, we don't set checksums for signals that previously had a checksum of None, or that previously had a checksum that was correct mod 2**16.

Also reformat this as an f-string for readability.

tompollard · 2022-11-08T20:30:05Z

Thanks @bemoody, basically this looks good to me, though I don't totally understand the purpose of the expanded flag. Is it true that only the expanded formats (e_d_signal and e_p_signal?) require the samps_per_frame value in the header?

Not directly related to this pull request, but I think the addition/removal of fields in the header makes them less human-readable. I'd prefer consistent fields that are sometimes populated with something like "NULL" than fields that disappear when not required,.

bemoody · 2022-11-09T17:38:29Z

If samples-per-frame is omitted from the header file, it defaults to 1, which is what we want when writing d_signal or p_signal data.

Multi-frequency was a fairly late addition to the format, which is why the samples-per-frame field in the header is optional. If I were designing a format or an API now, multi-frequency support wouldn't be optional.

tompollard · 2022-11-09T18:28:43Z

Makes sense, thanks Benjamin.

Benjamin Moody added 5 commits September 22, 2022 16:54

Test writing an originally-multi-frequency record with expanded=False.

9a0d086

check_sig_cohesion: fix error when expanded signal lengths mismatch.

489aa62

Also reformat this as an f-string for readability.

tompollard approved these changes Nov 9, 2022

View reviewed changes

tompollard merged commit bbe54ce into main Nov 9, 2022

tompollard deleted the wrsamp-checksum-spf branch November 9, 2022 18:29

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Set correct checksums and samps_per_frame in Record.wrsamp #424

Set correct checksums and samps_per_frame in Record.wrsamp #424

bemoody commented Sep 23, 2022

tompollard commented Nov 8, 2022

bemoody commented Nov 9, 2022

tompollard commented Nov 9, 2022

Set correct checksums and samps_per_frame in Record.wrsamp #424

Set correct checksums and samps_per_frame in Record.wrsamp #424

Conversation

bemoody commented Sep 23, 2022

tompollard commented Nov 8, 2022

bemoody commented Nov 9, 2022

tompollard commented Nov 9, 2022