Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Replace defaultSendRequestWhen with pendingRequestTimeout #6865

Merged
merged 6 commits into from
Jan 9, 2024

Conversation

slugzero
Copy link
Contributor

@slugzero slugzero commented Jan 8, 2024

defaultSendRequestWhen is removed with Koenkk/zigbee-herdsman#860.
This PR replaces it with pendingRequestTimeout

Copy link
Contributor

@sjorge sjorge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems I can't add a suggested change without requesting changes 🤷 PR is fine for me without the suggestions too.

However with the requested change in place, we could call the same modernExtend for the other devices update and just pass 0. Which seems cleaner and we want to eventually get most things in a modernExtend anyway.

slugzero and others added 2 commits January 9, 2024 07:34
Co-authored-by: Jorge Schrauwen <registration@blackdot.be>
Co-authored-by: Jorge Schrauwen <registration@blackdot.be>
@slugzero
Copy link
Contributor Author

slugzero commented Jan 9, 2024

Thanks, @sjorge, makes sense. I accepted your proposals, but for also changing it for the other two devices, I would have to dive a little bit deeper into the modernExtend thing first.

Btw, I saw that you are the original author of that function. I am not 100% sure whether the 1h timeout will lead to your "ideal" device behavior (I.e. ideal tradeoff between robustness of request delivery and perceived responsiveness of the device), and I don't own any of these devices to test myself.
Ideally, the timeout should be somewhere around 1-2x the interval the device is turning its Rx on and is able to receive messages. Setting a higher value than that could lead to a bad request being seemingly stuck for longer than necessary before it eventually fails, e.g. if the device turns Rx on every minute, we would retry sending every minute for up to 1h (or until receiving a genPollControl checkin) before giving up.

@sjorge
Copy link
Contributor

sjorge commented Jan 9, 2024

I'm not supper impressed with the Aqara TVOC sensor for which I added the function. It's has most clusters not discoverable and genPollCtrl is just completely absent.

Basically the device randomly sends a aqaraOpple report (non configurable), usually around 1x hour, don't think I've seen longer than that. So 1h is probably fine, although maybe 1h15 or something would be safer.

But we do configure reporting for temperature and humidity clusters, and they report more often (unless the device resets, which it does often, then it forgets reporting and bindings 😡 ). So 1h seems fine for that device.

It's probably not a good range for something like the sonoff contact sensors though, when not triggered they only report battery 1x every 36 hours.

Perhaps we should not have a default for timeoutMs ? We can then use 1h for the Aqara TVOC, 0 for the 2 others devices and just make it mandatory to specify something when adding the query. modernExtend is really nice new feature, basically you'd remove the current code for setting it to 0 from the configure function, if that all there is you can drop it completely. Then you add the quirkPendingRequestTimeout() to the extend: [] array in the device definition. they are executed in order, so having this particular one as the top most one makes sense (so you avoid sendpolicy with imediate for all other stuff after that). For the Aqara TVOC one after that i have the quirk to add the non discoverable clusters, after that the new 'traditional' modernExtends that are both fromZigbee, toZigbee, configure step and expose data wrapped in one.

I'd need to dig up one of the sonoff contact sensors, I stopped using them because they were unreliable for me. If it doesn't support genPollCtrl we can add the quirk with 36h as timeout value. Just annoying a lot of (usually cheaper chinese) devices do not implement genPollCtrl.

@slugzero
Copy link
Contributor Author

slugzero commented Jan 9, 2024

Well, I think it's probably not the best use of your time to dig out one of these devices just to optimize this value 😄 I have a bunch of the Aqara contact sensors in use and I believe they work similarly. But for several years now, I haven't yet felt the need to send any message to them after initial pairing.

I think it does makes sense to remove the default so that every device has to define its own value, though. I will change it. And look into the modernExtend, sounds good :-)

@Koenkk Koenkk merged commit 05e8da6 into Koenkk:master Jan 9, 2024
2 checks passed
@Koenkk
Copy link
Owner

Koenkk commented Jan 9, 2024

Thanks! Updated quirkPendingRequestTimeout to use timeLookup for better readability.

@slugzero slugzero deleted the request-queue-fixes-6 branch January 10, 2024 12:35
@sjorge
Copy link
Contributor

sjorge commented Jan 11, 2024

Sadly the behavior doesn't seem to be the same as before :(

I could apply 3 reporting configs, get up, walk over and hit the button. Then they would apply.

After this change, the first 1 fails before I can change the 3rd config via the frontend.
Walking over and hitting the button seems to do nothing by the time I get there (like 20-30 secs after the last apply)

I guess we can just remove the quirk then as it is now useless?

Edit: does it try to send once immediately and then queue for retry? If so does it wait for some standard poll request that will probably never come as this aqara device doesn't really follow the standards well it seems. Before it would just queue and spam the device when it send any message.

Edit2: joining is back to the hit and miss, need 4-5 attempts to get it through the initial configure as well. It does indeed look it at least tries to send each message once immediately but never tries again after. IMHO CSA is too lax with giving things the zigbee label, lumi/aqara/xiaomi/maji ignore so much of the standard.

@sjorge
Copy link
Contributor

sjorge commented Jan 11, 2024

I don't fully understand the queue code but i don't think it should behave differently from the old send active except it tries ones imediatly... so not sure why the behavior with the aqara device is so different.

@slugzero
Copy link
Contributor Author

Agree, it should not be different. Can you check the pendingRequestTimeout in the device database? Should be 1h, maybe it was reset somewhere

Do you get the failure message after sending the second command? Or also if you just wait for a while?

Can you check/post a debug log? Should contain lots of Request Queue output for that device.

@sjorge
Copy link
Contributor

sjorge commented Jan 11, 2024

I'll look in the db later tonight, I just hit the apply button on the 3 reporting configs I wanted to tweak and got the timeout error for the 1st one just as I hit apply for the 3rd.

@slugzero
Copy link
Contributor Author

There's one more thing you could try: Use the latest zhc with quirkPendingRequestTimeout together with zigbee-herdsman from last week. That way we would know if the queue or the quirk itself is the culprit. I'll see if I can reproduce with an aqara device tonight

@sjorge
Copy link
Contributor

sjorge commented Jan 11, 2024

I wonder if it defaulted to the old 'immediate' behavior, I'm on dev so it has the correct herdsman and the zhc change.

{
  "id": 17,
  "type": "EndDevice",
  "ieeeAddr": "0x54ef441000684954",
  "nwkAddr": 38890,
  "manufId": 4447,
  "manufName": "LUMI",
  "powerSource": "Battery",
  "modelId": "lumi.airmonitor.acn01",
  "epList": [
    1
  ],
  "endpoints": {
    "1": {
      "profId": 260,
      "epId": 1,
      "devId": 1026,
      "inClusterList": [
        0,
        3,
        1280,
        1,
        1026,
        1029,
        12,
        64704
      ],
      "outClusterList": [
        25
      ],
      "clusters": {
        "genBasic": {
          "attributes": {
            "modelId": "lumi.airmonitor.acn01",
            "appVersion": 29
          }
        },
        "msTemperatureMeasurement": {
          "attributes": {
            "measuredValue": 1871
          }
        },
        "msRelativeHumidity": {
          "attributes": {
            "measuredValue": 4239
          }
        },
        "genPowerCfg": {
          "attributes": {
            "batteryVoltage": 30
          }
        },
        "aqaraOpple": {
          "attributes": {
            "247": {
              "type": "Buffer",
              "data": [
                1,
                33,
                244,
                11,
                3,
                40,
                18,
                4,
                33,
                168,
                19,
                5,
                33,
                21,
                0,
                6,
                36,
                15,
                0,
                0,
                0,
                0,
                8,
                33,
                29,
                1,
                10,
                33,
                167,
                86,
                12,
                32,
                1,
                100,
                41,
                82,
                7,
                101,
                33,
                171,
                16,
                102,
                33,
                66,
                0,
                103,
                32,
                1
              ]
            },
            "276": 1,
            "297": 1,
            "airQuality": 1,
            "displayUnit": 1
          }
        },
        "genAnalogInput": {
          "attributes": {
            "presentValue": 70
          }
        }
      },
      "binds": [
        {
          "cluster": 1026,
          "type": "endpoint",
          "deviceIeeeAddress": "0x00124b00228120b5",
          "endpointID": 1
        },
        {
          "cluster": 1029,
          "type": "endpoint",
          "deviceIeeeAddress": "0x00124b00228120b5",
          "endpointID": 1
        },
        {
          "cluster": 12,
          "type": "endpoint",
          "deviceIeeeAddress": "0x00124b00228120b5",
          "endpointID": 1
        },
        {
          "cluster": 1,
          "type": "endpoint",
          "deviceIeeeAddress": "0x00124b00228120b5",
          "endpointID": 1
        },
        {
          "cluster": 64704,
          "type": "endpoint",
          "deviceIeeeAddress": "0x00124b00228120b5",
          "endpointID": 1
        },
        {
          "cluster": 1026,
          "type": "endpoint",
          "deviceIeeeAddress": "0x001fee00000098be",
          "endpointID": 1
        },
        {
          "cluster": 1026,
          "type": "endpoint",
          "deviceIeeeAddress": "0x001fee00000097dd",
          "endpointID": 1
        }
      ],
      "configuredReportings": [
        {
          "cluster": 1,
          "attrId": 32,
          "minRepIntval": 3600,
          "maxRepIntval": 65000,
          "repChange": 0,
          "manufacturerCode": null
        },
        {
          "cluster": 12,
          "attrId": 85,
          "minRepIntval": 10,
          "maxRepIntval": 900,
          "repChange": 5,
          "manufacturerCode": null
        },
        {
          "cluster": 1026,
          "attrId": 0,
          "minRepIntval": 10,
          "maxRepIntval": 900,
          "repChange": 50,
          "manufacturerCode": null
        },
        {
          "cluster": 1029,
          "attrId": 0,
          "minRepIntval": 10,
          "maxRepIntval": 900,
          "repChange": 100,
          "manufacturerCode": null
        }
      ],
      "meta": {}
    }
  },
  "appVersion": 29,
  "stackVersion": 2,
  "hwVersion": 1,
  "dateCode": "20220222",
  "swBuildId": "2020\u0000www.",
  "zclVersion": 3,
  "interviewCompleted": true,
  "meta": {
    "configured": 1324213189
  },
  "lastSeen": 1704977197205,
  "defaultSendRequestWhen": "active"
}

Look at the logs I cannot find a reconfigure being triggered, I think there might be a bug with modernExtend not triggering a reconfigure if functions get added/changed. @Koenkk 👈

I'll manually reconfigure when I am home and hope it completes, and check the db again.
I'll need to reconfigure a few reportings after again so I'll know if there is an improvement too then.

@sjorge
Copy link
Contributor

sjorge commented Jan 11, 2024

OK done some poking on my dev network (just test devices so lots less traffic). I have one of those Aqara TVOC's on there purely for dev work.

Was still on zh 0.30.0:

{
  "id": 2,
  "type": "EndDevice",
  "ieeeAddr": "0x54ef441000684c5c",
  "nwkAddr": 33988,
  "manufId": 4447,
  "manufName": "LUMI",
  "powerSource": "Battery",
  "modelId": "lumi.airmonitor.acn01",
  "epList": [
    1
  ],
  "endpoints": {
    "1": {
      "profId": 260,
      "epId": 1,
      "devId": 1026,
      "inClusterList": [
        0,
        3,
        1280,
        1
      ],
      "outClusterList": [
        25
      ],
      "clusters": {
        "genBasic": {
          "attributes": {
            "modelId": "lumi.airmonitor.acn01",
            "appVersion": 29,
            "manufacturerName": "LUMI",
            "powerSource": 3,
            "zclVersion": 3,
            "stackVersion": 2,
            "hwVersion": 1,
            "dateCode": "20220222",
            "swBuildId": "2020\u0000www."
          }
        },
        "genPowerCfg": {
          "attributes": {
            "batteryVoltage": 30
          }
        },
        "aqaraOpple": {
          "attributes": {
            "247": {
              "type": "Buffer",
              "data": [
                1,
                33,
                184,
                11,
                3,
                40,
                21,
                4,
                33,
                168,
                1,
                5,
                33,
                80,
                0,
                6,
                36,
                4,
                0,
                0,
                0,
                0,
                8,
                33,
                29,
                1,
                10,
                33,
                0,
                0,
                12,
                32,
                1,
                100,
                41,
                115,
                8,
                101,
                33,
                96,
                16,
                102,
                33,
                0,
                0,
                103,
                32,
                0
              ]
            },
            "airQuality": 0,
            "displayUnit": 1
          }
        },
        "msTemperatureMeasurement": {
          "attributes": {
            "measuredValue": 2145
          }
        },
        "msRelativeHumidity": {
          "attributes": {
            "measuredValue": 4281
          }
        },
        "genAnalogInput": {
          "attributes": {
            "presentValue": 0
          }
        }
      },
      "binds": [
        {
          "cluster": 1,
          "type": "endpoint",
          "deviceIeeeAddress": "0x00124b001e17ef39",
          "endpointID": 1
        },
        {
          "cluster": 12,
          "type": "endpoint",
          "deviceIeeeAddress": "0x00124b001e17ef39",
          "endpointID": 1
        },
        {
          "cluster": 1026,
          "type": "endpoint",
          "deviceIeeeAddress": "0x00124b001e17ef39",
          "endpointID": 1
        },
        {
          "cluster": 1029,
          "type": "endpoint",
          "deviceIeeeAddress": "0x00124b001e17ef39",
          "endpointID": 1
        }
      ],
      "configuredReportings": [
        {
          "cluster": 1,
          "attrId": 32,
          "minRepIntval": 3600,
          "maxRepIntval": 65000,
          "repChange": 0,
          "manufacturerCode": null
        },
        {
          "cluster": 12,
          "attrId": 85,
          "minRepIntval": 10,
          "maxRepIntval": 3600,
          "repChange": 5,
          "manufacturerCode": null
        },
        {
          "cluster": 1026,
          "attrId": 0,
          "minRepIntval": 10,
          "maxRepIntval": 3600,
          "repChange": 100,
          "manufacturerCode": null
        },
        {
          "cluster": 1029,
          "attrId": 0,
          "minRepIntval": 10,
          "maxRepIntval": 3600,
          "repChange": 100,
          "manufacturerCode": null
        }
      ],
      "meta": {}
    }
  },
  "appVersion": 29,
  "stackVersion": 2,
  "hwVersion": 1,
  "dateCode": "20220222",
  "swBuildId": "2020\u0000www.",
  "zclVersion": 3,
  "interviewCompleted": true,
  "meta": {
    "configured": 1324213189
  },
  "lastSeen": 1704984050254,
  "defaultSendRequestWhen": "immediate"
}

After upgrading to 0.32.2: (no re-configure triggered)

Zigbee2MQTT:info  2024-01-11 15:45:44: Zigbee2MQTT started!
Zigbee2MQTT:debug 2024-01-11 15:45:59: Received Zigbee message from '0x54ef441000684c5c', type 'attributeReport', cluster 'genBasic', data '{"modelId":"lumi.airmonitor.acn01"}' from endpoint 1 with groupID 0
Zigbee2MQTT:debug 2024-01-11 15:45:59: Received Zigbee message from '0x54ef441000684c5c', type 'attributeReport', cluster 'aqaraOpple', data '{"247":{"data":[1,33,248,10,3,40,21,4,33,168,19,5,33,80,0,6,36,11,0,0,0,0,8,33,29,1,10,33,0,0,12,32,1,100,41,111,8,101,33,32,14,102,33,50,0,103,32,1],"type":"Buffer"}}' from endpoint 1 with groupID 0
Zigbee2MQTT:debug 2024-01-11 15:45:59: VOCKQJK11LM: Processed buffer into data {"1":2808,"3":21,"4":5032,"5":80,"6":11,"8":285,"10":0,"12":1,"100":2159,"101":3616,"102":50,"103":1}
Zigbee2MQTT:debug 2024-01-11 15:45:59: VOCKQJK11LM: unknown key 6 with value 11
Zigbee2MQTT:debug 2024-01-11 15:45:59: VOCKQJK11LM: Processed data into payload {"voltage":2808,"battery":0,"device_temperature":21,"power_outage_count":79,"temperature":21.59,"humidity":36.16}
Zigbee2MQTT:debug 2024-01-11 15:45:59: VOCKQJK11LM: Processed data into payload {"voltage":2808,"battery":0,"device_temperature":21,"power_outage_count":79,"temperature":21.59,"humidity":36.16}
Zigbee2MQTT:info  2024-01-11 15:45:59: MQTT publish: topic 'zigbee2mqtt/0x54ef441000684c5c', payload '{"air_quality":"excellent","battery":0,"device_temperature":21,"display_unit":"ppb_celsius","humidity":36.16,"linkquality":160,"power_outage_count":79,"temperature":21.59,"voc":67,"voltage":2808}'
{
  "id": 2,
  "type": "EndDevice",
  "ieeeAddr": "0x54ef441000684c5c",
  "nwkAddr": 33988,
  "manufId": 4447,
  "manufName": "LUMI",
  "powerSource": "Battery",
  "modelId": "lumi.airmonitor.acn01",
  "epList": [
    1
  ],
  "endpoints": {
    "1": {
      "profId": 260,
      "epId": 1,
      "devId": 1026,
      "inClusterList": [
        0,
        3,
        1280,
        1,
        1026,
        1029,
        12,
        64704
      ],
      "outClusterList": [
        25
      ],
      "clusters": {
        "genBasic": {
          "attributes": {
            "modelId": "lumi.airmonitor.acn01",
            "appVersion": 29,
            "manufacturerName": "LUMI",
            "powerSource": 3,
            "zclVersion": 3,
            "stackVersion": 2,
            "hwVersion": 1,
            "dateCode": "20220222",
            "swBuildId": "2020\u0000www."
          }
        },
        "genPowerCfg": {
          "attributes": {
            "batteryVoltage": 28
          }
        },
        "aqaraOpple": {
          "attributes": {
            "247": {
              "type": "Buffer",
              "data": [
                1,
                33,
                248,
                10,
                3,
                40,
                21,
                4,
                33,
                168,
                19,
                5,
                33,
                80,
                0,
                6,
                36,
                3,
                0,
                0,
                0,
                0,
                8,
                33,
                29,
                1,
                10,
                33,
                0,
                0,
                12,
                32,
                1,
                100,
                41,
                106,
                8,
                101,
                33,
                27,
                14,
                102,
                33,
                57,
                0,
                103,
                32,
                1
              ]
            },
            "airQuality": 1,
            "displayUnit": 1
          }
        },
        "msTemperatureMeasurement": {
          "attributes": {
            "measuredValue": 2154
          }
        },
        "msRelativeHumidity": {
          "attributes": {
            "measuredValue": 3610
          }
        },
        "genAnalogInput": {
          "attributes": {
            "presentValue": 57
          }
        }
      },
      "binds": [
        {
          "cluster": 1,
          "type": "endpoint",
          "deviceIeeeAddress": "0x00124b001e17ef39",
          "endpointID": 1
        },
        {
          "cluster": 12,
          "type": "endpoint",
          "deviceIeeeAddress": "0x00124b001e17ef39",
          "endpointID": 1
        },
        {
          "cluster": 1026,
          "type": "endpoint",
          "deviceIeeeAddress": "0x00124b001e17ef39",
          "endpointID": 1
        },
        {
          "cluster": 1029,
          "type": "endpoint",
          "deviceIeeeAddress": "0x00124b001e17ef39",
          "endpointID": 1
        }
      ],
      "configuredReportings": [
        {
          "cluster": 1,
          "attrId": 32,
          "minRepIntval": 3600,
          "maxRepIntval": 65000,
          "repChange": 0,
          "manufacturerCode": null
        },
        {
          "cluster": 12,
          "attrId": 85,
          "minRepIntval": 10,
          "maxRepIntval": 3600,
          "repChange": 5,
          "manufacturerCode": null
        },
        {
          "cluster": 1026,
          "attrId": 0,
          "minRepIntval": 10,
          "maxRepIntval": 3600,
          "repChange": 100,
          "manufacturerCode": null
        },
        {
          "cluster": 1029,
          "attrId": 0,
          "minRepIntval": 10,
          "maxRepIntval": 3600,
          "repChange": 100,
          "manufacturerCode": null
        }
      ],
      "meta": {}
    }
  },
  "appVersion": 29,
  "stackVersion": 2,
  "hwVersion": 1,
  "dateCode": "20220222",
  "swBuildId": "2020\u0000www.",
  "zclVersion": 3,
  "interviewCompleted": true,
  "meta": {
    "configured": 1324213189
  },
  "lastSeen": 1704984421724
}

It successfully executed the quirk extends as it added the missing clusters, but pendingRequestTimeout is still absent from the database. :|

@sjorge
Copy link
Contributor

sjorge commented Jan 11, 2024

Added some logging to it and the quirk does get called:

Zigbee2MQTT:debug 2024-01-11 15:50:08: Received Zigbee message from '0x54ef441000684c5c', type 'readResponse', cluster 'genPowerCfg', data '{"batteryVoltage":28}' from endpoint 1 with groupID 0
Zigbee2MQTT:info  2024-01-11 15:50:08: MQTT publish: topic 'zigbee2mqtt/0x54ef441000684c5c', payload '{"air_quality":"excellent","battery":0,"device_temperature":21,"display_unit":"ppb_celsius","humidity":36.75,"linkquality":182,"power_outage_count":79,"temperature":21.46,"voc":63,"voltage":2800}'
XXX: in quirkPendingRequestTimeout
Zigbee2MQTT:debug 2024-01-11 15:50:08: Reading for 0x54ef441000684c5c/1 aqaraOpple [{"attribute":"airQuality","min":-1,"max":-1,"change":-1}]

So it seems that either device.pendingRequestTimeout = timeoutMs; doesn't update the data structure or that device.save(); doesn't serialize it correctly to the database file.

I believe it is the save function, as it calls toDatabaseEntry() in zh but that one does not serialize pendingRequestTimeout

    private toDatabaseEntry(): DatabaseEntry {
        const epList = this.endpoints.map((e): number => e.ID);
        const endpoints: KeyValue = {};
        for (const endpoint of this.endpoints) {
            endpoints[endpoint.ID] = endpoint.toDatabaseRecord();
        }

        return {
            id: this.ID, type: this.type, ieeeAddr: this.ieeeAddr, nwkAddr: this.networkAddress,
            manufId: this.manufacturerID, manufName: this.manufacturerName, powerSource: this.powerSource,
            modelId: this.modelID, epList, endpoints, appVersion: this.applicationVersion,
            stackVersion: this.stackVersion, hwVersion: this.hardwareVersion, dateCode: this.dateCode,
            swBuildId: this.softwareBuildID, zclVersion: this.zclVersion, interviewCompleted: this.interviewCompleted,
            meta: this.meta, lastSeen: this.lastSeen, checkinInterval: this.checkinInterval
        };
    }

@slugzero
Copy link
Contributor Author

Thanks for checking!

It successfully executed the quirk extends as it added the missing clusters, but pendingRequestTimeout is still absent from the database. :|

Yes, I just noticed that too and that could explain it. It is simply not persisted, only checkinInterval (which is the actual device property from which pendingRequestTimeout is derived) is. But checkinInterval is not settable from the outside. Seems that I haven't considered the case that someone wants to change this from the outside permanently :-D

For testing, you can just try to also set checkinInterval from the quirk (need to make it public/add a setter first). As a proper fix, I think I'd rename the quirk to quirkCheckinInterval(), and add a setter for it to zh which also sets pendingRequesttimeout

@sjorge
Copy link
Contributor

sjorge commented Jan 11, 2024

/~https://github.com/Koenkk/zigbee-herdsman/compare/master...sjorge:device_save_pendingRequestTimeout?expand=1 also seems to work I think, but your proposal is probably better.

I'm out of time to mess around today, I might have another go at it this weekend unless you beat me too it. If you do a follow PR on zhc when renaming the quirk, also grab the change from /~https://github.com/Koenkk/zigbee-herdsman-converters/pull/6874/files if it's not merged yet. That fixes the other 2 devices to also use the quirk.

@slugzero
Copy link
Contributor Author

sounds good! Thanks for checking!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants