Discussion:
[Etherlab-dev] Ethercat mailbox gateway and TwinSAFE loader issues
Mark Verrijt
2021-03-30 09:17:58 UTC
Looking at the raw data again (I cannot access my setup until tomorrow), my
previous statement:


*I would currently put my money on twincat internally sending a counter
of 0 over the slave network even if it is a request designated for the
master. If this is the case for twincat the 1001 slave counter would be
at 3.*

is incorrect; I miscounted. Hopefully, the results of the tests you
suggested will get me back on track.
Hi Graeme,
Thanks for the info & suggestions.
We are using the latest loader, so apparently no retry mechanism there.
In light of your info/suggestions I would currently put my money on
twincat internally sending a counter of 0 over the slave network even
if it is a request designated for the master. If this is the case for
twincat the 1001 slave counter would be at 3. I will check this ASAP
with Wireshark to verify (or find out it is something else).
Either way, I will report back with the results.
Regards,
Mark
Hi Mark,
The Mailbox Header has a Cnt parameter (bits 4-6 of the last byte
of the header). If this value is zero the slave should always
accept the incoming mailbox request. If the value is non-zero (1-7)
then the slave will only accept the request if the value differs
from the previous mailbox request's Cnt value.
(I can’t dig up where I got this information at the moment.)
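In code, the acceptance rule amounts to something like this (an
illustrative sketch only; the names are mine, not from any stack):

#include <cstdint>

// Sketch of the mailbox duplicate check described above; last_cnt is
// the Cnt of the previously accepted request for this slave.
bool slave_accepts(uint8_t cnt, uint8_t last_cnt)
{
    if (cnt == 0)
        return true;         // Cnt 0 disables the duplicate check
    return cnt != last_cnt;  // Cnt 1-7: apparent repeats are ignored
}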
Comparing the logs, the Cnt values sent by the TwinSAFE Loader are the
same in both cases. However, the Cnt value responses from device 1001
differ. On the TwinCAT side the sent Cnt value does not clash with the
slave's internal Cnt value, whereas on the etherlab mbg side it does.
2 -> 1
3 -> 2
...
3 -> timeout
So I suspect the slave's internal Cnt value is 3, and a request
arriving with Cnt 3 is treated as a duplicate and ignored. So it
looks like bad luck: TwinCAT has previously communicated with device
1001, so its count is misaligned enough not to have a problem.
You could do a trial on the TwinCAT side by sending approx. 5 CoE
mailbox calls to the 1001 device so that its internal counter is the
same as the etherlab mbg start condition, and see how TwinCAT deals
with the problem (you could log the EtherCAT slave network with
Wireshark). It's also possible TwinCAT just internally sends a Cnt
value of 0.
You could also check you’ve got the latest TwinSAFE loader. The latest
version might have its own retry built in (or not).
Regards,
Graeme.
Mark Verrijt
Sent: Saturday, 27 March 2021 5:13 am
Subject: [Etherlab-dev] Ethercat mailbox gateway and TwinSAFE loader issues
Hi,
We ran into some problems using ethercat_mbg together with the TwinSAFE
loader. This setup works:
0 0:0 PREOP + EK1100
1 0:1 PREOP + EL6910, TwinSAFE PLC
while this setup does NOT work (fails with a timeout when using the
loader):
0 0:0 PREOP + EK1100
1 0:1 PREOP + EL6910, TwinSAFE PLC
2 0:2 PREOP + EK1110 EtherCAT-Verlängerung
I checked what was happening with Wireshark when using the twincat
master+mbg, and compared it to what I saw whilst using the etherlab
master+mbg.
1. With twincat I see that when a request is done to the master via the
mbg it responds with a Cnt value (bits 4-6 of the last byte of the
mailbox header) of 0. With the etherlab mbg this value increases with
each message, which also seems to be fine. I could not find in the spec
Graeme used (
https://www.ethercat.org/memberarea/download/ETG8200_V1i0i0_G_R_MailboxGateway.pdf)
what it should do. (See the decode sketch after this list.)
2. When a request is done to the master via the mbg, a response shorter
than 16 bytes is zero-padded to 16 bytes by the twincat mbg; the
etherlab mbg simply sends the shorter message, which also seems to be
fine.
3. A difference of more importance: there is a discrepancy between the
way the Cnt value is updated for the two master+mbg combinations. In
some situations this causes a timeout, because the request Cnt (coming
from the loader) is equal to the slave's Cnt value, so the slave
ignores the request and a timeout occurs. I have added the raw data
tracing for both master+mbg combinations in case anybody is interested.
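To make the Cnt bookkeeping in the traces easier to follow, here is a
minimal decode of the last mailbox header byte (the example bytes are
taken from the traces below; the code itself is only illustrative):

#include <cstdio>

int main()
{
    // Last byte of the 6-byte mailbox header: low nibble = protocol
    // type (3 = CoE), bits 4-6 = Cnt, bit 7 reserved.
    const unsigned char bytes[] = {0x23, 0x33, 0x13};
    for (unsigned char b : bytes)
        std::printf("byte 0x%02x: type=%u cnt=%u\n",
                    b, b & 0x0fu, (b >> 4) & 0x07u);
    return 0;
}

This prints type=3 with cnt=2, 3 and 1 for these three bytes, matching
the request/response pairs in the traces.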
I'm not quite sure where to properly fix this, and am thus asking for
some advice/help.
For now I made a retry work-around in the CommandMbg class which simply
retries once with a different Cnt value in the request (Cnt-1, wrapped
to 1-7), which "solves" the problem. I have attached it as a patch.
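Roughly, the work-around picks the fallback Cnt like this (a sketch of
the idea, not the literal patch):

#include <cstdint>

// Retry once with Cnt-1, wrapped so the fallback stays within 1-7 and
// never becomes 0 (0 would disable the slave's duplicate check).
uint8_t fallback_cnt(uint8_t cnt)
{
    return (cnt <= 1) ? 7 : cnt - 1;
}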
I would prefer a clean fix, however. If somebody could point me in
the right direction I would be grateful.
Kind regards,
Mark
Mark Verrijt
2021-04-02 11:47:57 UTC
Before I forget again, thanks Graeme (and all others involved) for this mbg
implementation in the first place.
It saves us a lot of hassle when deploying our safety projects. Only
recently did we discover that the twinsafe loader didn't work in all cases.

I finally got around to also looking at the mailbox traffic on the ethercat
side. I think I have found the cause.
For twincat the counter used on the ethercat side is separate from the one
received by the mbg.
For etherlab the counter in the message to the mbg also seems to be
used on the ethercat side; is that correct, or is something else
happening here?
Depending on the setup it can therefore become possible for two subsequent
requests to the same slave (with requests to others in between) to use the
same counter value.
See data at the bottom of the email.
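To illustrate the failure mode (assuming the loader uses a single
counter wrapping over 1-7 for all slaves, which is what the traces
suggest): the counter advances once per request, so two requests to the
same slave carry the same Cnt whenever they are a multiple of seven
requests apart.

#include <cstdint>

// Hypothetical shared loader-side counter, wrapping 1..7 (0 is
// skipped). If the gateway forwards it unchanged, request n and
// request n+7 to the same slave collide on the same Cnt.
uint8_t next_cnt(uint8_t cnt)
{
    return (cnt == 7) ? 1 : cnt + 1;
}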

What is your take on this?

Some other things worth mentioning (for completeness/archive):
- The addressing is done a bit differently:
  - Twincat: Datagram header slave address 0x03e9, Mailbox header
    address 0xe000
  - Etherlab: Datagram header slave address 0x0002, Mailbox header
    address 0x03e9
- The counter in the mailbox reply of a slave is separate from the
  counter value check in the (next) request. Though logical, I mention
  it here explicitly because this messed up my thinking quite a bit
  before I realized it.

Regards,
Mark

*** Twincat ***
mbg: -> 10 50 0a 00 e9 03 00 23 00 20 41 80 f9 02 00 00 00 00
ec: -> 0a 00 00 e0 00 13 00 20 41 80 f9 02 00 00 00 00
ec: <- 0a 00 00 e0 00 33 00 30 43 80 f9 02 ea 82 1e 00
mbg: <- 10 50 0a 00 e9 03 00 33 00 30 43 80 f9 02 ea 82 1e 00

mbg: -> 10 50 0a 00 e9 03 00 33 00 20 41 80 f9 01 00 00 00 00
ec: -> 0a 00 00 e0 00 23 00 20 41 80 f9 01 00 00 00 00
ec: <- 0a 00 00 e0 00 43 00 30 4b 80 f9 01 01 00 00 00
mbg: <- 10 50 0a 00 e9 03 00 43 00 30 4b 80 f9 01 01 00 00 00

.... other req/rep

mbg: -> 10 50 0a 00 e9 03 00 33 00 20 41 01 fe 00 00 00 00 00
ec: -> 0a 00 00 e0 00 33 00 20 41 01 fe 00 00 00 00 00
ec: <- fa 00 00 e0 00 53 00 30 41 01 fe 00 00 08 .....
mbg: <- 00 51 fa 00 e9 03 00 53 00 30 41 01 fe 00 00 08 .....

*** Etherlab ***
mbg: -> 10 50 0a 00 e9 03 00 23 00 20 41 80 f9 02 00 00 00 00
ec: -> 0a 00 e9 03 00 23 00 20 41 80 f9 02 00 00 00 00
ec: <- 0a 00 e9 03 00 13 00 30 43 80 f9 02 ea 82 1e 00
mbg: <- 10 50 0a 00 e9 03 00 13 00 30 43 80 f9 02 ea 82 1e 00

mbg: -> 10 50 0a 00 e9 03 00 33 00 20 41 80 f9 01 00 00 00 00
ec: -> 0a 00 e9 03 00 33 00 20 41 80 f9 01 00 00 00 00 <-- Cnt = 3
ec: <- 0a 00 e9 03 00 23 00 30 4b 80 f9 01 01 00 00 00
mbg: <- 10 50 0a 00 e9 03 00 23 00 30 4b 80 f9 01 01 00 00 00

.... other req/rep

(Problem occurs here: two subsequent requests to the same slave carry
the same counter.)

mbg: -> 10 50 0a 00 e9 03 00 33 00 20 41 01 fe 00 00 00 00 00
ec: -> 0a 00 e9 03 00 33 00 20 41 01 fe 00 00 00 00 00 <-- Cnt = 3 again
ec: <- No Reply
mbg: <- Timeout
Mark Verrijt
2021-04-07 09:04:48 UTC
Hey Graeme,

Thanks for the updated patch. I just tested it, and it now works fine
with the 3-slave setup (EK1100, EL6910, EK1110).
I did a quick tcpdump to verify, and it is as expected/patched:

mbg: -> 10 50 0a 00 e9 03 00 23 00 20 41 80 f9 02 00 00 00 00
ec: -> 0a 00 e9 03 00 03 00 20 41 80 f9 02 00 00 00 00
ec: <- 0a 00 e9 03 00 13 00 30 43 80 f9 02 ea 82 1e 00
mbg: <- 10 50 0a 00 e9 03 00 13 00 30 43 80 f9 02 ea 82 1e 00

mbg: -> 10 50 0a 00 e9 03 00 33 00 20 41 80 f9 01 00 00 00 00
ec: -> 0a 00 e9 03 00 03 00 20 41 80 f9 01 00 00 00 00
ec: <- 0a 00 e9 03 00 23 00 30 4b 80 f9 01 01 00 00 00
mbg: <- 10 50 0a 00 e9 03 00 23 00 30 4b 80 f9 01 01 00 00 00

mbg: -> 10 50 0a 00 e9 03 00 33 00 20 41 01 fe 00 00 00 00 00
ec: -> 0a 00 e9 03 00 03 00 20 41 01 fe 00 00 00 00 00
ec: <- fa 00 e9 03 00 33 00 30 41 01 fe 00 00 08 .....
mbg: <- 00 51 fa 00 e9 03 00 33 00 30 41 01 fe 00 00 08 .....

Regards,
Mark
Hi Mark,
Thanks for checking out the EtherCAT comms side. I didn't find the
documentation particularly clear in quite a few areas, so I'm not
surprised there's an issue or two.
The ethercat_mbg implementation passed through the CoE header info
exactly as the TwinSAFE Loader requested. For standard mailbox
communications it looks like the etherlab master always passes a value
of 0 for the Cnt parameter, so the slave will always respond. I have
updated the patch so that it also passes 0 for the Cnt parameter of the
mbg requests (attached). Please let me know if there is still an issue
(I don't have anything to test with at the moment).
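In other words, the forwarding change amounts to something like this (a
sketch, not the literal patch):

#include <cstdint>

// Clear bits 4-6 (Cnt) of the mailbox header's last byte before the
// request goes out on the wire, leaving the protocol type nibble and
// the reserved bit untouched.
void clear_cnt(uint8_t header[6])
{
    header[5] &= static_cast<uint8_t>(~(0x07u << 4));  // mask 0x8f
}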
- Yes. TwinCAT starts its slave addressing at 1000 (0x03e8), whereas
the etherlab master starts its slave position addressing at 0. I
decided to report slave addresses in the TwinCAT fashion (starting at
1000); it also helps distinguish the slaves' mbg requests, as per
below. Note: the TwinSAFE loader isn't interested in the first slave,
so it starts talking to the second slave first (1001, 0x03e9).
I don't think the mailbox header address is used by the slave; it is
the datagram header that directs the CoE message to the slave. The
previous ethercat_mbg implementation passed the mailbox header through
unchanged, and treated a reply as an mbg response if its mailbox header
address was 0x03e8 or above and its datagram slave address was offset
from the mailbox header address by 0x03e7 (note: the datagram slave
address is the slave position + 1). With the extra datagram slave
address vs CoE mailbox header address check you shouldn't have any
problems with the mbg server if you have more than 1000 slaves. This
remains the same, with the change that the Cnt value in the header is
now cleared.
TwinCAT, on the other hand, looks like it is setting the mailbox
address to 0xe000 so that the master knows the request should be
handled as an mbg request. Depending on the implementation, that may
mean you can have ~57344 slaves before TwinCAT has a problem with
conflicting addresses. Note: the gateway can only access up to 4080
slaves.
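For the archive, my reading of the etherlab-side address mapping (an
illustrative sketch, not the actual ethercat_mbg code):

#include <cstdint>
#include <optional>

// Gateway addresses start at 0x03e8 for slave position 0, so the
// EL6910 at position 1 is addressed as 0x03e9; the datagram slave
// address is then position + 1 (0x0002 in Mark's Etherlab trace).
std::optional<uint16_t> mbg_to_position(uint16_t mbox_addr)
{
    if (mbox_addr < 0x03e8)
        return std::nullopt;  // not a mailbox gateway request
    return static_cast<uint16_t>(mbox_addr - 0x03e8);
}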
Regards,
Graeme.