[etherlab-dev] Etherlab master loses mailbox configuration during client connection loss

Discussion:

Christoph Schröder

2017-09-20 15:08:55 UTC

Hi All,

I encountered a problem with how the Etherlab master recovers after a
connection loss (e.g. unplugging the cable of one slave and plugging it
back in). The master seems to reset the mailbox configuration. If I
start a VoE request, I get the following kernel message:
[132256.054043] EtherCAT ERROR 0-main-0: Data size (24) does not fit in mailbox (0)!

The mailbox size configured through ecrt_slave_config_create_voe_handler
seems to be lost, not only for the slave that was disconnected but also
for the slave that never lost its connection (tested with two slaves).
This happens with and without the newest unofficial patchset (20170914).

This seems to be a bug as ecrt_slave_config_create_voe_handler has to be
called before ecrt_master_activate, so recreation of the config after
recovery of the connection is not possible.

Without connection loss everything works fine, but we would like to make
the system as robust as possible without the need to restart the
application. Does anyone have an idea how to fix this, or can someone at
least explain what happens during a connection loss and recovery, and
which functions the master calls?

Thanks and best regards,
Christoph

________________________________

Helmholtz-Zentrum Berlin für Materialien und Energie GmbH

Mitglied der Hermann von Helmholtz-Gemeinschaft Deutscher Forschungszentren e.V.

Aufsichtsrat: Vorsitzender Dr. Karl Eugen Huthmacher, stv. Vorsitzende Dr. Jutta Koch-Unterseher
Geschäftsführung: Prof. Dr. Bernd Rech (kommissarisch), Thomas Frederking

Sitz Berlin, AG Charlottenburg, 89 HRB 5583

Postadresse:
Hahn-Meitner-Platz 1
D-14109 Berlin

http://www.helmholtz-berlin.de

Gavin Lambert

2017-09-20 23:37:24 UTC

The size in ecrt_slave_config_create_voe_handler specifies the maximum size of the VoE request/reply, which is different from (but must be less than or equal to) the actual mailbox size.

The mailbox size itself is specified by the slave's SII data, as read from its EEPROM during the INIT -> PREOP transition. If the slave reboots or once communications are restored it will be reconfigured (which involves redoing the INIT -> PREOP transition), but that shouldn't change the sizes unless there's a bug in the slave itself.

It's possible you're running into some kind of timing error, where it's somehow trying to execute the VoE request before the slave has properly re-entered PREOP or higher, although I thought that such cases are prevented in the unofficial patchset at least. My application code does perform an additional sanity check before executing requests though so it's possible I missed a corner case if that is omitted.

Try running your test again with "ethercat debug 1" in effect and look at the syslog. In particular, look for log messages starting with "Mailbox configuration"; this is where the master reports the detected mailbox sizes that later appear in the error message you quoted. Also check the order of events: when you interrupt the slave, and when the VoE request tries to execute and generates the error.

_______________________________________________
etherlab-dev mailing list
http://lists.etherlab.org/mailman/listinfo/etherlab-dev

Christoph Schroeder

2017-09-21 09:25:20 UTC

Hi Gavin,

Thanks for your answer. The hint about the "Mailbox configuration" debug
message was a good one. I found that the mailboxes aren't configured at
all during recovery of the slave. I looked through the master code and
found that in fsm_slave_scan.c some things are only done if the slave
supports CoE. This may be a slightly unusual configuration, but our slave
is a custom slave that only supports VoE! I attached a patch (applied on
top of the current unofficial patchset) that fixed the issue for me.
I'm not sure whether this is the correct way to fix it, though, so someone
who is more familiar with the master code should have a look at it. This
may also be relevant for other mailbox protocols. The master shouldn't
assume that mailboxes are only used if CoE is supported. I would
appreciate it if a fix could be added to the patchset in the future.

Best regards,
Christoph
