Discussion:
[etherlab-dev] Multiple mailbox protocols and other issues
Knud Baastrup
2014-07-10 14:04:20 UTC
Permalink
Hello Florian, Gavin, Frank and others

I have attached the complete list of patches that we use for our IgH EtherCAT master:

Patch 1 (ethercat_152_patch_1_eoe.patch):
Ensures that we use the call back functions (with io lock support) when using the IOCTL interface.

Patch 2 (ethercat_152_patch_2_foe.patch):
Wrong datagram used for timeout calculation.

Patch 3 (ethercat_152_patch_3_lrw_datagram.patch):
Input/Output counter were not incremented if new datagram were allocated.
Patch 4 (ethercat_152_patch_4_eoe_mac.patch)
Change the MAC address for eoe0sY devices to real local administrated MAC addresses based on the NIC part of eth0 and the EoE slaves ring position.

Patch 5 (ethercat_152_patch_5_priority_inheritance.patch)
Replaced semaphores with mutexes to utilize priority inheritance and limit impact from lower priority tasks (EtherCAT-EOE) running as sched_other task.

Patch 6 (ethercat_152_patch_6_mailbox.patch)
Alternative solution to Patch 9-10-11 provided by Frank Heckenbach for 1.5.0 (that I did not succeed to get up running on 1.5.2). In this solution I accept that a mailbox read request (e.g. FP-RD) for a given mailbox protocol can return data from any other mailbox protocol running at the same time. The data returned by a read datagram is therefore stored in a separate buffer for each mailbox protocol instead of the datagram data buffer. The mailbox state machines will check and fetch the data from their own buffer instead of the datagram buffer (that is no longer used for mailbox read data). A check_mbox flag is introduced to track when a given slave has an ongoing mailbox read request. In normal case the mailbox state machine will run as previously if no mailbox read request is ongoing, but if a mailbox read-request is ongoing (check_mbox flag is set) it will check its own mailbox buffer (as the ongoing mailbox read request might have returned its data) and otherwise wait until the read request is done and it gets the opportunity to reserve the mailbox for its own read request.

Patch 7 (ethercat_152_patch_7_sdoinfo.patch)
Alternative solution to Patch 28 provided by Frank Heckenback for 1.5.0. More simple but without the same user feed-back.

Patch 8 (ethercat_152_patch_8_ext_queue_protect.patch)
It is due to the usage of the call back function added in Patch 1 also very important to secure protection of the external datagram queue. Also discovered by Frank Heckenback in Patch 23.

Patch 9 (ethercat_152_patch_9_clear_mailbox.patch)
A slightly modified version of Patch 26 provided by Frank Heckenback. The required modification were needed due to the alternative multiple mailbox protocol solution provided in Patch 6.

Patch 10 (ethercat_152_patch_10_scan_skip_stats.patch)
No reason to write output statistics in syslog when issuing a slave scanning where UNMATCHED datagrams are expected behavior.


Venlig hilsen / Best regards,
Knud Baastrup
DEIF Wind Power Technology
SW Developer
Direct.: +45 9614 8458
E-mail: ***@deif.com<mailto:***@deif.com>
---------------------------------------------------------------
[cid:***@01CF9C58.9B9CBFB0]Retrofit your Vestas COTAS controller and optimize availability that will improve your annual energy generation, reduce service cost and extend the lifetime of your turbine.
V27 V39 V44 V47
Read more about DEIF's solutions to retrofit your turbines on our website<http://www.deifwindpower.com/retrofit.aspx?utm_source=Retrofit&utm_medium=email%20signatur&utm_term=Retrofit%2BVestas%2BCOTAS&utm_content=textlink&utm_campaign=Retrofit>
Florian Pose
2014-07-10 15:41:43 UTC
Permalink
Hello Kund,
thanks you very much. I will inspect them and come back to you.
--
Best regards,
Florian Pose

http://etherlab.org
Gavin Lambert
2014-08-06 07:40:40 UTC
Permalink
Post by Knud Baastrup
Wrong datagram used for timeout calculation.
You'll be happy to learn that this patch is not required if you use the
latest code; it was fixed in 8bb574da5da2.
Post by Knud Baastrup
Patch 4 (ethercat_152_patch_4_eoe_mac.patch)
Change the MAC address for eoe0sY devices to real local
administrated MAC addresses based on the NIC part of eth0 and
the EoE slaves ring position.
In the line "if (ETH_ALEN >= 5)", shouldn't that be > 5 or >= 6? Also, if
this test fails (which seems unlikely, but then why test if it can't
happen?), this will leave the address uninitialized, which seems
undesirable. Maybe it should fall back to the prior code in that case?
Post by Knud Baastrup
Patch 5 (ethercat_152_patch_5_priority_inheritance.patch)
Replaced semaphores with mutexes to utilize priority inheritance
and limit impact from lower priority tasks (EtherCAT-EOE) running
as sched_other task.
While I like the idea of using RT-mutexes, they do have a minimum kernel
version (I'm not sure exactly what that is except that it's somewhere in the
2.6.x series) and currently Etherlab provides drivers for some kernel
versions that are before that cutoff, I suspect. (And do rt_mutexes and
RTAI play nicely together or not?) So this might break compatibility, and
so possibly should be a configure option. Or maybe nobody cares about those
older kernel versions any more? (I don't personally, I'm just wondering if
it might be a concern.)

Also I found a few cases where "down" and "up" hadn't been changed to
"rt_mutex_lock" etc. Not sure if this was the result of a patch application
failure or if this was code added since the patches' base version.
Post by Knud Baastrup
Patch 6 (ethercat_152_patch_6_mailbox.patch)
Alternative solution to Patch 9-10-11 provided by Frank Heckenbach for
1.5.0 (that I did not succeed to get up running on 1.5.2).
Did you look at the updated-to-1.5.2 patches that I posted?
Post by Knud Baastrup
In this solution I accept that a mailbox read request (e.g. FP-RD) for
a given mailbox protocol can return data from any other mailbox protocol
running at the same time. [...]
In master/master.c near line 1240, on receipt of a datagram you're searching
through the slave list to check its mailbox settings. This code appears
unsafe in the case when this search fails (which presumably could occur eg.
if a datagram with a corrupted address arrives or if slave scanning has just
started and cleared master->slaves).

Also this seems to generate quite a lot of "Await configured mailbox
address" spam at debug level 1 even when no mailbox activity is taking
place..?

When compiling with a recent kernel version (3.13) I needed to #include
<linux/rtmutex.h> in master/slave.h in order to compile several files (the
first is master/fmmu_config.c). This doesn't appear to be needed in 3.2
however (or by some of the other .c files); I guess the indirect includes
have changed?
Post by Knud Baastrup
Patch 10 (ethercat_152_patch_10_scan_skip_stats.patch)
No reason to write output statistics in syslog when issuing a slave
scanning
Post by Knud Baastrup
where UNMATCHED datagrams are expected behavior.
I'm not sure I follow this one. Why are unmatched datagrams expected?
Also, this patch isn't going to stop them printing, just delay it until the
scan completes.

Otherwise, this patchset seems to work ok. I've only given it fairly basic
testing thus far though.

However there was a bit of fuzz and other conflicts when trying to apply
these to the latest HG source, so in the interests of making it easier for
Florian and anyone else using the latest source to examine and test these
patches, I've attached updated versions. These have only had the following
changes:
- de-fuzzed based on 8dd49f6f6d32 (close to stable-1.5 tip).
- various tabs converted to whitespace and related minor whitespace
changes.
- I noticed some inconsistent newline-brace styles but I left those
alone.
- omitted patch 2 as specified above (it was empty after defuzzing).
- converted a few extra down/ups in patch 5 as specified above.
- added #include to master/slave.h in patch 6 as specified above.

In particular I've basically just included changes required to compile; I
haven't tried to fix any of the other possible issues mentioned above.

I've also included a series file, so in theory it should immediately be
MQ-compatible if extracted into the .hg dir. (You might need to "hg qq -c
knud" first.)

Regards,
Gavin Lambert
Knud Baastrup
2014-08-08 09:26:15 UTC
Permalink
Thanks for your feed-back Gavin!

I have in-lined some comments!

Patch 1 to 5 were prepared by a colleague some time back but were at that time not forwarded to the etherlab-dev.

Mvh. Knud

-----Original Message-----
From: Gavin Lambert [mailto:***@compacsort.com]
Sent: 6. august 2014 09:41
To: Knud Baastrup
Cc: 'Florian Pose'; etherlab-***@etherlab.org
Subject: RE: [etherlab-dev] Multiple mailbox protocols and other issues
Post by Knud Baastrup
Wrong datagram used for timeout calculation.
You'll be happy to learn that this patch is not required if you use the latest code; it was fixed in 8bb574da5da2.
[Knud] Great, we use the ec403cf308eb version released 2013-02-12 as baseline and yes I can see that the fix were provided 2 days later in 8bb574da5da2 by 2013-02-14.
Post by Knud Baastrup
Patch 4 (ethercat_152_patch_4_eoe_mac.patch)
Change the MAC address for eoe0sY devices to real local administrated
MAC addresses based on the NIC part of eth0 and the EoE slaves ring
position.
In the line "if (ETH_ALEN >= 5)", shouldn't that be > 5 or >= 6? Also, if this test fails (which seems unlikely, but then why test if it can't happen?), this will leave the address uninitialized, which seems undesirable. Maybe it should fall back to the prior code in that case?
[Knud] I agree that it should be either > 5 or >= 6. I will correct and test.
Post by Knud Baastrup
Patch 5 (ethercat_152_patch_5_priority_inheritance.patch)
Replaced semaphores with mutexes to utilize priority inheritance and
limit impact from lower priority tasks (EtherCAT-EOE) running as
sched_other task.
While I like the idea of using RT-mutexes, they do have a minimum kernel version (I'm not sure exactly what that is except that it's somewhere in the 2.6.x series) and currently Etherlab provides drivers for some kernel versions that are before that cutoff, I suspect. (And do rt_mutexes and RTAI play nicely together or not?) So this might break compatibility, and so possibly should be a configure option. Or maybe nobody cares about those older kernel versions any more? (I don't personally, I'm just wondering if it might be a concern.)
[Knud] I agree that it might need to be a configurable option. I do not know how to prepare that in a proper way and will leave it Florian or others to decide.

Also I found a few cases where "down" and "up" hadn't been changed to "rt_mutex_lock" etc. Not sure if this was the result of a patch application failure or if this was code added since the patches' base version.
[Knud] Probably because I have used ec403cf308eb as base version for the patches.
Post by Knud Baastrup
Patch 6 (ethercat_152_patch_6_mailbox.patch)
Alternative solution to Patch 9-10-11 provided by Frank Heckenbach for
1.5.0 (that I did not succeed to get up running on 1.5.2).
Did you look at the updated-to-1.5.2 patches that I posted?
[Knud] The ones named ethercat-1.5.0-patches-v2.tar.bz2 ? No I was at that point quite happy with my current alternative implementation.
Post by Knud Baastrup
In this solution I accept that a mailbox read request (e.g. FP-RD) for
a given mailbox protocol can return data from any other mailbox
protocol running at the same time. [...]
In master/master.c near line 1240, on receipt of a datagram you're searching through the slave list to check its mailbox settings. This code appears unsafe in the case when this search fails (which presumably could occur eg.
if a datagram with a corrupted address arrives or if slave scanning has just started and cleared master->slaves).
[Knud] I agree, I must ensure that the datagram is ignored if no matching station address. In the current patch such a datagram will be managed by the last slave and that were not the intention. I will correct and test.

Also this seems to generate quite a lot of "Await configured mailbox address" spam at debug level 1 even when no mailbox activity is taking place..?
[Knud] I will remove the debug message, it is not that relevant as you will know from other messages if mailbox offset address configuration fails for a slave. I will correct and test.

When compiling with a recent kernel version (3.13) I needed to #include <linux/rtmutex.h> in master/slave.h in order to compile several files (the first is master/fmmu_config.c). This doesn't appear to be needed in 3.2 however (or by some of the other .c files); I guess the indirect includes have changed?
[Knud] I have been testing with 3.4.37-rt51
Post by Knud Baastrup
Patch 10 (ethercat_152_patch_10_scan_skip_stats.patch)
No reason to write output statistics in syslog when issuing a slave
scanning
Post by Knud Baastrup
where UNMATCHED datagrams are expected behavior.
I'm not sure I follow this one. Why are unmatched datagrams expected?
Also, this patch isn't going to stop them printing, just delay it until the scan completes.
[Knud] Unprocessed datagrams will appear as UNMATCHED when you initiate an ethercat rescan but I just need to know that a rescan has taken place (as I already know that this will impact the EtherCAT bus and ongoing datagrams). After the rescan I want indeed to know about UNMATCHED datagrams as these in that case will appears for other reasons not related to the slave scanning.

Otherwise, this patchset seems to work ok. I've only given it fairly basic testing thus far though.

However there was a bit of fuzz and other conflicts when trying to apply these to the latest HG source, so in the interests of making it easier for Florian and anyone else using the latest source to examine and test these patches, I've attached updated versions. These have only had the following
changes:
- de-fuzzed based on 8dd49f6f6d32 (close to stable-1.5 tip).
- various tabs converted to whitespace and related minor whitespace changes.
- I noticed some inconsistent newline-brace styles but I left those alone.
- omitted patch 2 as specified above (it was empty after defuzzing).
- converted a few extra down/ups in patch 5 as specified above.
- added #include to master/slave.h in patch 6 as specified above.
[Knud] Thanks :-)

In particular I've basically just included changes required to compile; I haven't tried to fix any of the other possible issues mentioned above.
[Knud] I will apply the remaining changes and send some new patch files based on the ones you updated for me.

I've also included a series file, so in theory it should immediately be MQ-compatible if extracted into the .hg dir. (You might need to "hg qq -c knud" first.)
[Knud] Great, I used some time to understand hg and realize now how great the MQ extension is to manage patches. We use however GIT for our patches, so I will try to apply the same in GIT.

Regards,
Gavin Lambert
Knud Baastrup
2014-08-22 09:25:15 UTC
Permalink
Hi Gavin, Florian and others,

I have made a few changes in the patches based on your previous comments and as well added a new 12_sdo_directory.patch that I believe will solve the issue with the SDO directory fetching that is very time consuming task during slave scanning. I have attached the patches along with a series files that works well with the hg mq commands.

Patch 12 (12_sdo_directory.patch)
I have added the define EC_SKIP_SDO_DICT that when set will skip the fsm_master SDO directory fetching during slave scanning. I have instead added the SDO directory scanning to the fsm_slave (where it belongs) where a directory fetch can be requested in the same manner as an SDO/register/FoE/SoE request. I have added an ecrt_master_dict_upload in ecrt.h and as well updated the ethercat tool to perform an directory request the first time an "ethercat sdos" command is executed.

Mvh. Knud

-----Original Message-----
From: etherlab-dev-***@etherlab.org [mailto:etherlab-dev-***@etherlab.org] On Behalf Of Knud Baastrup
Sent: 8. august 2014 11:26
To: Gavin Lambert
Cc: etherlab-***@etherlab.org
Subject: Re: [etherlab-dev] Multiple mailbox protocols and other issues

Thanks for your feed-back Gavin!

I have in-lined some comments!

Patch 1 to 5 were prepared by a colleague some time back but were at that time not forwarded to the etherlab-dev.

Mvh. Knud

-----Original Message-----
From: Gavin Lambert [mailto:***@compacsort.com]
Sent: 6. august 2014 09:41
To: Knud Baastrup
Cc: 'Florian Pose'; etherlab-***@etherlab.org
Subject: RE: [etherlab-dev] Multiple mailbox protocols and other issues
Post by Knud Baastrup
Wrong datagram used for timeout calculation.
You'll be happy to learn that this patch is not required if you use the latest code; it was fixed in 8bb574da5da2.
[Knud] Great, we use the ec403cf308eb version released 2013-02-12 as baseline and yes I can see that the fix were provided 2 days later in 8bb574da5da2 by 2013-02-14.
Post by Knud Baastrup
Patch 4 (ethercat_152_patch_4_eoe_mac.patch)
Change the MAC address for eoe0sY devices to real local administrated
MAC addresses based on the NIC part of eth0 and the EoE slaves ring
position.
In the line "if (ETH_ALEN >= 5)", shouldn't that be > 5 or >= 6? Also, if this test fails (which seems unlikely, but then why test if it can't happen?), this will leave the address uninitialized, which seems undesirable. Maybe it should fall back to the prior code in that case?
[Knud] I agree that it should be either > 5 or >= 6. I will correct and test.
Post by Knud Baastrup
Patch 5 (ethercat_152_patch_5_priority_inheritance.patch)
Replaced semaphores with mutexes to utilize priority inheritance and
limit impact from lower priority tasks (EtherCAT-EOE) running as
sched_other task.
While I like the idea of using RT-mutexes, they do have a minimum kernel version (I'm not sure exactly what that is except that it's somewhere in the 2.6.x series) and currently Etherlab provides drivers for some kernel versions that are before that cutoff, I suspect. (And do rt_mutexes and RTAI play nicely together or not?) So this might break compatibility, and so possibly should be a configure option. Or maybe nobody cares about those older kernel versions any more? (I don't personally, I'm just wondering if it might be a concern.) [Knud] I agree that it might need to be a configurable option. I do not know how to prepare that in a proper way and will leave it Florian or others to decide.

Also I found a few cases where "down" and "up" hadn't been changed to "rt_mutex_lock" etc. Not sure if this was the result of a patch application failure or if this was code added since the patches' base version.
[Knud] Probably because I have used ec403cf308eb as base version for the patches.
Post by Knud Baastrup
Patch 6 (ethercat_152_patch_6_mailbox.patch)
Alternative solution to Patch 9-10-11 provided by Frank Heckenbach for
1.5.0 (that I did not succeed to get up running on 1.5.2).
Did you look at the updated-to-1.5.2 patches that I posted?
[Knud] The ones named ethercat-1.5.0-patches-v2.tar.bz2 ? No I was at that point quite happy with my current alternative implementation.
Post by Knud Baastrup
In this solution I accept that a mailbox read request (e.g. FP-RD) for
a given mailbox protocol can return data from any other mailbox
protocol running at the same time. [...]
In master/master.c near line 1240, on receipt of a datagram you're searching through the slave list to check its mailbox settings. This code appears unsafe in the case when this search fails (which presumably could occur eg.
if a datagram with a corrupted address arrives or if slave scanning has just started and cleared master->slaves).
[Knud] I agree, I must ensure that the datagram is ignored if no matching station address. In the current patch such a datagram will be managed by the last slave and that were not the intention. I will correct and test.

Also this seems to generate quite a lot of "Await configured mailbox address" spam at debug level 1 even when no mailbox activity is taking place..?
[Knud] I will remove the debug message, it is not that relevant as you will know from other messages if mailbox offset address configuration fails for a slave. I will correct and test.

When compiling with a recent kernel version (3.13) I needed to #include <linux/rtmutex.h> in master/slave.h in order to compile several files (the first is master/fmmu_config.c). This doesn't appear to be needed in 3.2 however (or by some of the other .c files); I guess the indirect includes have changed?
[Knud] I have been testing with 3.4.37-rt51
Post by Knud Baastrup
Patch 10 (ethercat_152_patch_10_scan_skip_stats.patch)
No reason to write output statistics in syslog when issuing a slave
scanning
Post by Knud Baastrup
where UNMATCHED datagrams are expected behavior.
I'm not sure I follow this one. Why are unmatched datagrams expected?
Also, this patch isn't going to stop them printing, just delay it until the scan completes.
[Knud] Unprocessed datagrams will appear as UNMATCHED when you initiate an ethercat rescan but I just need to know that a rescan has taken place (as I already know that this will impact the EtherCAT bus and ongoing datagrams). After the rescan I want indeed to know about UNMATCHED datagrams as these in that case will appears for other reasons not related to the slave scanning.

Otherwise, this patchset seems to work ok. I've only given it fairly basic testing thus far though.

However there was a bit of fuzz and other conflicts when trying to apply these to the latest HG source, so in the interests of making it easier for Florian and anyone else using the latest source to examine and test these patches, I've attached updated versions. These have only had the following
changes:
- de-fuzzed based on 8dd49f6f6d32 (close to stable-1.5 tip).
- various tabs converted to whitespace and related minor whitespace changes.
- I noticed some inconsistent newline-brace styles but I left those alone.
- omitted patch 2 as specified above (it was empty after defuzzing).
- converted a few extra down/ups in patch 5 as specified above.
- added #include to master/slave.h in patch 6 as specified above.
[Knud] Thanks :-)

In particular I've basically just included changes required to compile; I haven't tried to fix any of the other possible issues mentioned above.
[Knud] I will apply the remaining changes and send some new patch files based on the ones you updated for me.

I've also included a series file, so in theory it should immediately be MQ-compatible if extracted into the .hg dir. (You might need to "hg qq -c knud" first.) [Knud] Great, I used some time to understand hg and realize now how great the MQ extension is to manage patches. We use however GIT for our patches, so I will try to apply the same in GIT.

Regards,
Gavin Lambert
Knud Baastrup
2014-09-17 08:21:45 UTC
Permalink
Hi Gavin, Florian and others,

I have updated the 06_mailbox.patch that had a major bug that disrupted the usage of the LRD and LRW commands. I have additionally added the 13_domain_lock.patch that solves an issue with concurrent access to the ecrt_domain_queue() and ecrt_domain_process() functions. These functions are likely to be called from multiple application tasks managing different domains and I believe the master should protect itself as the lack of protection can result in a corrupted master datagram queue. The error revealed itself as UNMATCHED datagrams in syslog with indexes no longer present in the master datagram queue.

Mvh. Knud


-----Original Message-----
From: etherlab-dev-***@etherlab.org [mailto:etherlab-dev-***@etherlab.org] On Behalf Of Knud Baastrup
Sent: 22. august 2014 11:25
To: Gavin Lambert
Cc: etherlab-***@etherlab.org
Subject: Re: [etherlab-dev] Multiple mailbox protocols and other issues

Hi Gavin, Florian and others,

I have made a few changes in the patches based on your previous comments and as well added a new 12_sdo_directory.patch that I believe will solve the issue with the SDO directory fetching that is very time consuming task during slave scanning. I have attached the patches along with a series files that works well with the hg mq commands.

Patch 12 (12_sdo_directory.patch)
I have added the define EC_SKIP_SDO_DICT that when set will skip the fsm_master SDO directory fetching during slave scanning. I have instead added the SDO directory scanning to the fsm_slave (where it belongs) where a directory fetch can be requested in the same manner as an SDO/register/FoE/SoE request. I have added an ecrt_master_dict_upload in ecrt.h and as well updated the ethercat tool to perform an directory request the first time an "ethercat sdos" command is executed.

Mvh. Knud

-----Original Message-----
From: etherlab-dev-***@etherlab.org [mailto:etherlab-dev-***@etherlab.org] On Behalf Of Knud Baastrup
Sent: 8. august 2014 11:26
To: Gavin Lambert
Cc: etherlab-***@etherlab.org
Subject: Re: [etherlab-dev] Multiple mailbox protocols and other issues

Thanks for your feed-back Gavin!

I have in-lined some comments!

Patch 1 to 5 were prepared by a colleague some time back but were at that time not forwarded to the etherlab-dev.

Mvh. Knud

-----Original Message-----
From: Gavin Lambert [mailto:***@compacsort.com]
Sent: 6. august 2014 09:41
To: Knud Baastrup
Cc: 'Florian Pose'; etherlab-***@etherlab.org
Subject: RE: [etherlab-dev] Multiple mailbox protocols and other issues
Post by Knud Baastrup
Wrong datagram used for timeout calculation.
You'll be happy to learn that this patch is not required if you use the latest code; it was fixed in 8bb574da5da2.
[Knud] Great, we use the ec403cf308eb version released 2013-02-12 as baseline and yes I can see that the fix were provided 2 days later in 8bb574da5da2 by 2013-02-14.
Post by Knud Baastrup
Patch 4 (ethercat_152_patch_4_eoe_mac.patch)
Change the MAC address for eoe0sY devices to real local administrated
MAC addresses based on the NIC part of eth0 and the EoE slaves ring
position.
In the line "if (ETH_ALEN >= 5)", shouldn't that be > 5 or >= 6? Also, if this test fails (which seems unlikely, but then why test if it can't happen?), this will leave the address uninitialized, which seems undesirable. Maybe it should fall back to the prior code in that case?
[Knud] I agree that it should be either > 5 or >= 6. I will correct and test.
Post by Knud Baastrup
Patch 5 (ethercat_152_patch_5_priority_inheritance.patch)
Replaced semaphores with mutexes to utilize priority inheritance and
limit impact from lower priority tasks (EtherCAT-EOE) running as
sched_other task.
While I like the idea of using RT-mutexes, they do have a minimum kernel version (I'm not sure exactly what that is except that it's somewhere in the 2.6.x series) and currently Etherlab provides drivers for some kernel versions that are before that cutoff, I suspect. (And do rt_mutexes and RTAI play nicely together or not?) So this might break compatibility, and so possibly should be a configure option. Or maybe nobody cares about those older kernel versions any more? (I don't personally, I'm just wondering if it might be a concern.) [Knud] I agree that it might need to be a configurable option. I do not know how to prepare that in a proper way and will leave it Florian or others to decide.

Also I found a few cases where "down" and "up" hadn't been changed to "rt_mutex_lock" etc. Not sure if this was the result of a patch application failure or if this was code added since the patches' base version.
[Knud] Probably because I have used ec403cf308eb as base version for the patches.
Post by Knud Baastrup
Patch 6 (ethercat_152_patch_6_mailbox.patch)
Alternative solution to Patch 9-10-11 provided by Frank Heckenbach for
1.5.0 (that I did not succeed to get up running on 1.5.2).
Did you look at the updated-to-1.5.2 patches that I posted?
[Knud] The ones named ethercat-1.5.0-patches-v2.tar.bz2 ? No I was at that point quite happy with my current alternative implementation.
Post by Knud Baastrup
In this solution I accept that a mailbox read request (e.g. FP-RD) for
a given mailbox protocol can return data from any other mailbox
protocol running at the same time. [...]
In master/master.c near line 1240, on receipt of a datagram you're searching through the slave list to check its mailbox settings. This code appears unsafe in the case when this search fails (which presumably could occur eg.
if a datagram with a corrupted address arrives or if slave scanning has just started and cleared master->slaves).
[Knud] I agree, I must ensure that the datagram is ignored if no matching station address. In the current patch such a datagram will be managed by the last slave and that were not the intention. I will correct and test.

Also this seems to generate quite a lot of "Await configured mailbox address" spam at debug level 1 even when no mailbox activity is taking place..?
[Knud] I will remove the debug message, it is not that relevant as you will know from other messages if mailbox offset address configuration fails for a slave. I will correct and test.

When compiling with a recent kernel version (3.13) I needed to #include <linux/rtmutex.h> in master/slave.h in order to compile several files (the first is master/fmmu_config.c). This doesn't appear to be needed in 3.2 however (or by some of the other .c files); I guess the indirect includes have changed?
[Knud] I have been testing with 3.4.37-rt51
Post by Knud Baastrup
Patch 10 (ethercat_152_patch_10_scan_skip_stats.patch)
No reason to write output statistics in syslog when issuing a slave
scanning
Post by Knud Baastrup
where UNMATCHED datagrams are expected behavior.
I'm not sure I follow this one. Why are unmatched datagrams expected?
Also, this patch isn't going to stop them printing, just delay it until the scan completes.
[Knud] Unprocessed datagrams will appear as UNMATCHED when you initiate an ethercat rescan but I just need to know that a rescan has taken place (as I already know that this will impact the EtherCAT bus and ongoing datagrams). After the rescan I want indeed to know about UNMATCHED datagrams as these in that case will appears for other reasons not related to the slave scanning.

Otherwise, this patchset seems to work ok. I've only given it fairly basic testing thus far though.

However there was a bit of fuzz and other conflicts when trying to apply these to the latest HG source, so in the interests of making it easier for Florian and anyone else using the latest source to examine and test these patches, I've attached updated versions. These have only had the following
changes:
- de-fuzzed based on 8dd49f6f6d32 (close to stable-1.5 tip).
- various tabs converted to whitespace and related minor whitespace changes.
- I noticed some inconsistent newline-brace styles but I left those alone.
- omitted patch 2 as specified above (it was empty after defuzzing).
- converted a few extra down/ups in patch 5 as specified above.
- added #include to master/slave.h in patch 6 as specified above.
[Knud] Thanks :-)

In particular I've basically just included changes required to compile; I haven't tried to fix any of the other possible issues mentioned above.
[Knud] I will apply the remaining changes and send some new patch files based on the ones you updated for me.

I've also included a series file, so in theory it should immediately be MQ-compatible if extracted into the .hg dir. (You might need to "hg qq -c knud" first.) [Knud] Great, I used some time to understand hg and realize now how great the MQ extension is to manage patches. We use however GIT for our patches, so I will try to apply the same in GIT.

Regards,
Gavin Lambert
Knud Baastrup
2015-02-02 07:17:32 UTC
Permalink
Hi Gavin, Florian and others,

I will just update you on some additional pathes I have prepared for EtherCAT Master. I have attached the complete set of patches that we currently use, but only the below patches have been updated added.

04_eoe_mac.patch:
The EoE MAC address is now derived from the NIC part of the first global unique MAC address of the linked list of available network interfaces or otherwise the MAC address used by the EtherCAT master. The EoE MAC address will get the format 02:NIC:NIC:NIC:RP:RP where NIC comes from the unique MAC
address (if available) and RP is the ring position of the EoE slave.

14_fix_string_handling.patch - Thanks to Per Noergaard Christensen:
Fix string handling in EtherCAT tool (DataTypeHandler.cpp). Add null termination when receiving more unlegal characters than used.

15_ignore_mailbox_settings_if_corrupted_sii.patch:
Ignore mailbox settings if invalid values read from corrupted sii file.

16_improved_ethercat_rescan_performance.patch:
The SII data and PDOs will now be stored when the EtherCAT master is in its operation phase. The stored SII data and PDOs will be detached from the slaves prior to a scanning and re-attached during the scanning without the need to fetch the SII data and PDOs once again. The SII data and PDOs will however only be stored if the slave have a serial number defined as this serial number will be used when re-attaching the SII data and PDOs.

BR, Knud Baastrup


-----Original Message-----
From: etherlab-dev-***@etherlab.org [mailto:etherlab-dev-***@etherlab.org] On Behalf Of Knud Baastrup
Sent: 17. september 2014 10:22
To: Gavin Lambert
Cc: etherlab-***@etherlab.org
Subject: Re: [etherlab-dev] Multiple mailbox protocols and other issues

Hi Gavin, Florian and others,

I have updated the 06_mailbox.patch that had a major bug that disrupted the usage of the LRD and LRW commands. I have additionally added the 13_domain_lock.patch that solves an issue with concurrent access to the ecrt_domain_queue() and ecrt_domain_process() functions. These functions are likely to be called from multiple application tasks managing different domains and I believe the master should protect itself as the lack of protection can result in a corrupted master datagram queue. The error revealed itself as UNMATCHED datagrams in syslog with indexes no longer present in the master datagram queue.

Mvh. Knud


-----Original Message-----
From: etherlab-dev-***@etherlab.org [mailto:etherlab-dev-***@etherlab.org] On Behalf Of Knud Baastrup
Sent: 22. august 2014 11:25
To: Gavin Lambert
Cc: etherlab-***@etherlab.org
Subject: Re: [etherlab-dev] Multiple mailbox protocols and other issues

Hi Gavin, Florian and others,

I have made a few changes in the patches based on your previous comments and as well added a new 12_sdo_directory.patch that I believe will solve the issue with the SDO directory fetching that is very time consuming task during slave scanning. I have attached the patches along with a series files that works well with the hg mq commands.

Patch 12 (12_sdo_directory.patch)
I have added the define EC_SKIP_SDO_DICT that when set will skip the fsm_master SDO directory fetching during slave scanning. I have instead added the SDO directory scanning to the fsm_slave (where it belongs) where a directory fetch can be requested in the same manner as an SDO/register/FoE/SoE request. I have added an ecrt_master_dict_upload in ecrt.h and as well updated the ethercat tool to perform an directory request the first time an "ethercat sdos" command is executed.

Mvh. Knud

-----Original Message-----
From: etherlab-dev-***@etherlab.org [mailto:etherlab-dev-***@etherlab.org] On Behalf Of Knud Baastrup
Sent: 8. august 2014 11:26
To: Gavin Lambert
Cc: etherlab-***@etherlab.org
Subject: Re: [etherlab-dev] Multiple mailbox protocols and other issues

Thanks for your feed-back Gavin!

I have in-lined some comments!

Patch 1 to 5 were prepared by a colleague some time back but were at that time not forwarded to the etherlab-dev.

Mvh. Knud

-----Original Message-----
From: Gavin Lambert [mailto:***@compacsort.com]
Sent: 6. august 2014 09:41
To: Knud Baastrup
Cc: 'Florian Pose'; etherlab-***@etherlab.org
Subject: RE: [etherlab-dev] Multiple mailbox protocols and other issues
Post by Knud Baastrup
Wrong datagram used for timeout calculation.
You'll be happy to learn that this patch is not required if you use the latest code; it was fixed in 8bb574da5da2.
[Knud] Great, we use the ec403cf308eb version released 2013-02-12 as baseline and yes I can see that the fix were provided 2 days later in 8bb574da5da2 by 2013-02-14.
Post by Knud Baastrup
Patch 4 (ethercat_152_patch_4_eoe_mac.patch)
Change the MAC address for eoe0sY devices to real local administrated
MAC addresses based on the NIC part of eth0 and the EoE slaves ring
position.
In the line "if (ETH_ALEN >= 5)", shouldn't that be > 5 or >= 6? Also, if this test fails (which seems unlikely, but then why test if it can't happen?), this will leave the address uninitialized, which seems undesirable. Maybe it should fall back to the prior code in that case?
[Knud] I agree that it should be either > 5 or >= 6. I will correct and test.
Post by Knud Baastrup
Patch 5 (ethercat_152_patch_5_priority_inheritance.patch)
Replaced semaphores with mutexes to utilize priority inheritance and
limit impact from lower priority tasks (EtherCAT-EOE) running as
sched_other task.
While I like the idea of using RT-mutexes, they do have a minimum kernel version (I'm not sure exactly what that is except that it's somewhere in the 2.6.x series) and currently Etherlab provides drivers for some kernel versions that are before that cutoff, I suspect. (And do rt_mutexes and RTAI play nicely together or not?) So this might break compatibility, and so possibly should be a configure option. Or maybe nobody cares about those older kernel versions any more? (I don't personally, I'm just wondering if it might be a concern.) [Knud] I agree that it might need to be a configurable option. I do not know how to prepare that in a proper way and will leave it Florian or others to decide.

Also I found a few cases where "down" and "up" hadn't been changed to "rt_mutex_lock" etc. Not sure if this was the result of a patch application failure or if this was code added since the patches' base version.
[Knud] Probably because I have used ec403cf308eb as base version for the patches.
Post by Knud Baastrup
Patch 6 (ethercat_152_patch_6_mailbox.patch)
Alternative solution to Patch 9-10-11 provided by Frank Heckenbach for
1.5.0 (that I did not succeed to get up running on 1.5.2).
Did you look at the updated-to-1.5.2 patches that I posted?
[Knud] The ones named ethercat-1.5.0-patches-v2.tar.bz2 ? No I was at that point quite happy with my current alternative implementation.
Post by Knud Baastrup
In this solution I accept that a mailbox read request (e.g. FP-RD) for
a given mailbox protocol can return data from any other mailbox
protocol running at the same time. [...]
In master/master.c near line 1240, on receipt of a datagram you're searching through the slave list to check its mailbox settings. This code appears unsafe in the case when this search fails (which presumably could occur eg.
if a datagram with a corrupted address arrives or if slave scanning has just started and cleared master->slaves).
[Knud] I agree, I must ensure that the datagram is ignored if no matching station address. In the current patch such a datagram will be managed by the last slave and that were not the intention. I will correct and test.

Also this seems to generate quite a lot of "Await configured mailbox address" spam at debug level 1 even when no mailbox activity is taking place..?
[Knud] I will remove the debug message, it is not that relevant as you will know from other messages if mailbox offset address configuration fails for a slave. I will correct and test.

When compiling with a recent kernel version (3.13) I needed to #include <linux/rtmutex.h> in master/slave.h in order to compile several files (the first is master/fmmu_config.c). This doesn't appear to be needed in 3.2 however (or by some of the other .c files); I guess the indirect includes have changed?
[Knud] I have been testing with 3.4.37-rt51
Post by Knud Baastrup
Patch 10 (ethercat_152_patch_10_scan_skip_stats.patch)
No reason to write output statistics in syslog when issuing a slave
scanning
Post by Knud Baastrup
where UNMATCHED datagrams are expected behavior.
I'm not sure I follow this one. Why are unmatched datagrams expected?
Also, this patch isn't going to stop them printing, just delay it until the scan completes.
[Knud] Unprocessed datagrams will appear as UNMATCHED when you initiate an ethercat rescan but I just need to know that a rescan has taken place (as I already know that this will impact the EtherCAT bus and ongoing datagrams). After the rescan I want indeed to know about UNMATCHED datagrams as these in that case will appears for other reasons not related to the slave scanning.

Otherwise, this patchset seems to work ok. I've only given it fairly basic testing thus far though.

However there was a bit of fuzz and other conflicts when trying to apply these to the latest HG source, so in the interests of making it easier for Florian and anyone else using the latest source to examine and test these patches, I've attached updated versions. These have only had the following
changes:
- de-fuzzed based on 8dd49f6f6d32 (close to stable-1.5 tip).
- various tabs converted to whitespace and related minor whitespace changes.
- I noticed some inconsistent newline-brace styles but I left those alone.
- omitted patch 2 as specified above (it was empty after defuzzing).
- converted a few extra down/ups in patch 5 as specified above.
- added #include to master/slave.h in patch 6 as specified above.
[Knud] Thanks :-)

In particular I've basically just included changes required to compile; I haven't tried to fix any of the other possible issues mentioned above.
[Knud] I will apply the remaining changes and send some new patch files based on the ones you updated for me.

I've also included a series file, so in theory it should immediately be MQ-compatible if extracted into the .hg dir. (You might need to "hg qq -c knud" first.) [Knud] Great, I used some time to understand hg and realize now how great the MQ extension is to manage patches. We use however GIT for our patches, so I will try to apply the same in GIT.

Regards,
Gavin Lambert
Gavin Lambert
2015-02-03 00:04:55 UTC
Permalink
Post by Knud Baastrup
The SII data and PDOs will now be stored when the EtherCAT master is in
its
Post by Knud Baastrup
operation phase. The stored SII data and PDOs will be detached from the
slaves prior to a scanning and re-attached during the scanning without the
need to fetch the SII data and PDOs once again. The SII data and PDOs will
however only be stored if the slave have a serial number defined as this
serial number will be used when re-attaching the SII data and PDOs.
Ooh, thanks for that one. That's one of the performance holes that I was
planning on investigating myself soonish.

(A somewhat related one is that when a slave drops to SAFEOP+ERROR as a
result of a comms watchdog error, it *should* be safe for the master to
bring it straight up to OP, especially when the slave has a serial and can
be unambiguously identified, but currently it's doing a full back-to-PREOP
reconfigure.)

I've been meaning to post the full patchset that I'm using at the moment (in
the hopes that a few pieces at least can get integrated), but I'm still
investigating a few issues (and working on unrelated things), so it's not
quite ready yet. Although (at least partly due to inertia) I'm currently
using Frank's mailbox patches rather than yours. :)
Graeme Foot
2015-02-03 00:37:53 UTC
Permalink
Post by Knud Baastrup
The SII data and PDOs will now be stored when the EtherCAT master is in
its
Post by Knud Baastrup
operation phase. The stored SII data and PDOs will be detached from
the slaves prior to a scanning and re-attached during the scanning
without the need to fetch the SII data and PDOs once again. The SII
data and PDOs will however only be stored if the slave have a serial
number defined as this serial number will be used when re-attaching the SII data and PDOs.
Ooh, thanks for that one. That's one of the performance holes that I was planning on investigating myself soonish.
Hi,

None of my modules seem to have serial numbers (Beckhoff IO and yaskawa amps), or is that a bug that's got a patch? I'm running the original 1.5.2 (+ misc patches).

G.
Gavin Lambert
2015-02-03 00:59:47 UTC
Permalink
Sent: Tuesday, 3 February 2015 13:38
To: Gavin Lambert; 'Knud Baastrup'
Subject: RE: [etherlab-dev] Multiple mailbox protocols and other issues
Post by Gavin Lambert
Post by Knud Baastrup
The SII data and PDOs will now be stored when the EtherCAT master is in
its
Post by Knud Baastrup
operation phase. The stored SII data and PDOs will be detached from
the slaves prior to a scanning and re-attached during the scanning
without the need to fetch the SII data and PDOs once again. The SII
data and PDOs will however only be stored if the slave have a serial
number defined as this serial number will be used when re-attaching
the
SII data and PDOs.
Post by Gavin Lambert
Ooh, thanks for that one. That's one of the performance holes that I was
planning on investigating myself soonish.
None of my modules seem to have serial numbers (Beckhoff IO and yaskawa
amps), or is that a bug that's got a patch? I'm running the original
1.5.2
(+ misc patches).
It's not a bug, or really related to the master software at all.

It's up to the device manufacturer whether a given device has a serial
number or not. In my case they do, but I'm mostly using in-house hardware.
Beckhoff modules as a general rule don't seem to have serial numbers.

This is separate from the "alias" which is used in network addressing and is
typically (but optionally) set by the network designer via the master
configurator. The EtherLab master lacks specific commands to set aliases
but it can recognise aliases set by other masters, and some slaves support a
CoE download to reconfigure their alias (which can be accessed via the
command line), and some others have dipswitches. Generally aliases are only
needed at "tree points" in the network graph, so if you're only using simple
chains they're less useful.

Technically the serial number can be altered by the master / network
designer as well via an EEPROM download, but this is less "encouraged".
Knud Baastrup
2015-02-03 07:08:13 UTC
Permalink
Yes, I decided not to use the alias as matching criteria for the slave data due to the "tree points" intended use of aliases. Currently we do not use aliases on our modules, but we are planning to introduce this in the near future to prevent that some modules get the wrong configuration if a rack of modules (with same vendor and product code as the following rack) are disconnected due to wire/power break. What do you mean by the "master configurator" ?

BR, Knud

-----Original Message-----
From: etherlab-dev [mailto:etherlab-dev-***@etherlab.org] On Behalf Of Gavin Lambert
Sent: 3. februar 2015 02:00
To: 'Graeme Foot'
Cc: etherlab-***@etherlab.org
Subject: Re: [etherlab-dev] Multiple mailbox protocols and other issues
Sent: Tuesday, 3 February 2015 13:38
To: Gavin Lambert; 'Knud Baastrup'
Subject: RE: [etherlab-dev] Multiple mailbox protocols and other issues
Post by Gavin Lambert
Post by Knud Baastrup
The SII data and PDOs will now be stored when the EtherCAT master is in
its
Post by Knud Baastrup
operation phase. The stored SII data and PDOs will be detached
from the slaves prior to a scanning and re-attached during the
scanning without the need to fetch the SII data and PDOs once
again. The SII data and PDOs will however only be stored if the
slave have a serial number defined as this serial number will be
used when re-attaching
the
SII data and PDOs.
Post by Gavin Lambert
Ooh, thanks for that one. That's one of the performance holes that
I
was
planning on investigating myself soonish.
None of my modules seem to have serial numbers (Beckhoff IO and
yaskawa amps), or is that a bug that's got a patch? I'm running the
original
1.5.2
(+ misc patches).
It's not a bug, or really related to the master software at all.

It's up to the device manufacturer whether a given device has a serial number or not. In my case they do, but I'm mostly using in-house hardware.
Beckhoff modules as a general rule don't seem to have serial numbers.

This is separate from the "alias" which is used in network addressing and is typically (but optionally) set by the network designer via the master configurator. The EtherLab master lacks specific commands to set aliases but it can recognise aliases set by other masters, and some slaves support a CoE download to reconfigure their alias (which can be accessed via the command line), and some others have dipswitches. Generally aliases are only needed at "tree points" in the network graph, so if you're only using simple chains they're less useful.

Technically the serial number can be altered by the master / network designer as well via an EEPROM download, but this is less "encouraged".
Gavin Lambert
2015-02-03 07:26:11 UTC
Permalink
Post by Knud Baastrup
Yes, I decided not to use the alias as matching criteria for the slave
data
Post by Knud Baastrup
due to the "tree points" intended use of aliases. Currently we do not use
aliases on our modules, but we are planning to introduce this in the near
future to prevent that some modules get the wrong configuration if a rack
of
Post by Knud Baastrup
modules (with same vendor and product code as the following rack) are
disconnected due to wire/power break. What do you mean by the "master
configurator" ?
Depending on context, either the software that sets up the network layout
(assigning aliases, saving persistent parameters, etc), or the person in
charge of doing that work for a particular installation. Depending on how
your application and network operate, sometimes that's an explicit step, and
sometimes it's part of the application.

Since I'm a lazy person, when our modules are assigned a serial number
during production they get that assigned as their alias as well (although
they can also be explicitly assigned a different alias if the network
configurator wishes), which simplifies network configuration quite a bit.
But then, these are discrete units rather than "racks" so there's a higher
chance they'll get wired in an unexpected order, so I think this provides
the best compromise in network flexibility. YMMV.
Knud Baastrup
2015-02-03 08:56:32 UTC
Permalink
Thanks Gavin, I was just curious to know if this "master configurator" were a tool that I did not know about.

/Knud


-----Original Message-----
From: Gavin Lambert [mailto:***@compacsort.com]
Sent: 3. februar 2015 08:26
To: Knud Baastrup
Cc: etherlab-***@etherlab.org
Subject: RE: [etherlab-dev] Multiple mailbox protocols and other issues
Post by Knud Baastrup
Yes, I decided not to use the alias as matching criteria for the slave
data
Post by Knud Baastrup
due to the "tree points" intended use of aliases. Currently we do not
use aliases on our modules, but we are planning to introduce this in
the near future to prevent that some modules get the wrong
configuration if a rack
of
Post by Knud Baastrup
modules (with same vendor and product code as the following rack) are
disconnected due to wire/power break. What do you mean by the "master
configurator" ?
Depending on context, either the software that sets up the network layout (assigning aliases, saving persistent parameters, etc), or the person in charge of doing that work for a particular installation. Depending on how your application and network operate, sometimes that's an explicit step, and sometimes it's part of the application.

Since I'm a lazy person, when our modules are assigned a serial number during production they get that assigned as their alias as well (although they can also be explicitly assigned a different alias if the network configurator wishes), which simplifies network configuration quite a bit.
But then, these are discrete units rather than "racks" so there's a higher chance they'll get wired in an unexpected order, so I think this provides the best compromise in network flexibility. YMMV.
Gavin Lambert
2015-02-09 07:43:31 UTC
Permalink
Post by Knud Baastrup
I will just update you on some additional pathes I have prepared for
EtherCAT
Post by Knud Baastrup
Master. I have attached the complete set of patches that we currently use,
but only the below patches have been updated added.
I'm just having a more detailed look through these patches now, and there's
a few niggles. ;)

1. There are several files that appear to have tabs in them; it's usually a
good idea when sharing patches with others to use spaces only, as different
people/editors use different tab sizes.

2. There's several diffs in various files (eg. 12_sdo_directory.patch's
master/ioctl.h) that contain only whitespace changes on various lines for no
readily apparent reason. These sorts of things can cause unnecessary
conflicts and hide the true intent of the patch. This may have been the
result of a space -> tab -> space conversion gone wrong.

3. This is optional, but I think it's good style to include a short text
description at the top of each patch file, which can act as a commit
message, and helps people reading the patch later without having to hunt
down the original email. (If you're using HG MQ it does this automatically;
I think git format-patch will also do this for you if you have a branch
structured with one commit per patch, though it also adds quite a bit of
extra email-header junk.)

4. Regarding 13_domain_lock.patch, I believe the original rationale of the
master is that locking between concurrent application tasks is the
responsibility of the application, not the master -- that's why in
kernelspace it has send/receive callbacks (formerly lock/unlock callbacks)
so that eg. RTAI locks can be substituted, or locking can be avoided if the
application has some other way to schedule things to avoid actual
concurrency (or if it's only running a single task). See the "Concurrent
Master Access" section in the docs. I don't have any personal objections to
this patch though.

5. Regarding 14_fix_string_handling.patch, I don't think this is the "right"
fix. I've attached Frank's 04 patch which fixes this a different way.

6. Regarding 16_improved_ethercat_rescan_performance.patch, it looks like a
stray temporary file was included in the patch. Also, I'm not sure it's
safe to retrieve the data only by serial number. Serial numbers are not
guaranteed unique between vendors, or even between product lines -- I think
at minimum you should include the vendor id and product code in the index.
Also, possibly this should have a #define config guard to disable this
functionality in case the master will be used at a site with pathological
slaves (eg. multiple slaves with identical non-zero vendor/product/serial
triplets, since *technically* they're not guaranteed unique at all --
although any slave vendor who does this deserves a kick).

I haven't had a chance to test things locally yet, but at least everything
is compiling ok with these patches. :)
Knud Baastrup
2015-02-10 07:47:12 UTC
Permalink
Thanks Gavin, see my inline comments below!

BR, Knud

-----Original Message-----
From: Gavin Lambert [mailto:***@compacsort.com]
Sent: 9. februar 2015 08:44
To: Knud Baastrup
Cc: etherlab-***@etherlab.org
Subject: RE: [etherlab-dev] Multiple mailbox protocols and other issues
Post by Knud Baastrup
I will just update you on some additional pathes I have prepared for
EtherCAT
Post by Knud Baastrup
Master. I have attached the complete set of patches that we currently
use, but only the below patches have been updated added.
I'm just having a more detailed look through these patches now, and there's a few niggles. ;)

1. There are several files that appear to have tabs in them; it's usually a good idea when sharing patches with others to use spaces only, as different people/editors use different tab sizes.
[Knud] I fully agree, I will update and try to avoid in future patches!

2. There's several diffs in various files (eg. 12_sdo_directory.patch's
master/ioctl.h) that contain only whitespace changes on various lines for no readily apparent reason. These sorts of things can cause unnecessary conflicts and hide the true intent of the patch. This may have been the result of a space -> tab -> space conversion gone wrong.
[Knud] I fully agree, I will update and try to avoid in future patches!

3. This is optional, but I think it's good style to include a short text description at the top of each patch file, which can act as a commit message, and helps people reading the patch later without having to hunt down the original email. (If you're using HG MQ it does this automatically; I think git format-patch will also do this for you if you have a branch structured with one commit per patch, though it also adds quite a bit of extra email-header junk.)
[Knud] I have the patches structured with one commit per patch so using git-format-patch will work fine. I will apply.

4. Regarding 13_domain_lock.patch, I believe the original rationale of the master is that locking between concurrent application tasks is the responsibility of the application, not the master -- that's why in kernelspace it has send/receive callbacks (formerly lock/unlock callbacks) so that eg. RTAI locks can be substituted, or locking can be avoided if the application has some other way to schedule things to avoid actual concurrency (or if it's only running a single task). See the "Concurrent Master Access" section in the docs. I don't have any personal objections to this patch though.
[Knud] Yes, we just faced some cases where the application developers did not include the necessary locking, which have quite severe impact for the complete system. We have not used RTAI, but I am not sure I understand why the extra locks become a problem for RTAI?

5. Regarding 14_fix_string_handling.patch, I don't think this is the "right"
fix. I've attached Frank's 04 patch which fixes this a different way.
[Knud] I will talk with my colleague Per and check if we can revert this patch and instead use the one provided by Frank.

6. Regarding 16_improved_ethercat_rescan_performance.patch, it looks like a stray temporary file was included in the patch. Also, I'm not sure it's safe to retrieve the data only by serial number. Serial numbers are not guaranteed unique between vendors, or even between product lines -- I think at minimum you should include the vendor id and product code in the index.
Also, possibly this should have a #define config guard to disable this functionality in case the master will be used at a site with pathological slaves (eg. multiple slaves with identical non-zero vendor/product/serial triplets, since *technically* they're not guaranteed unique at all -- although any slave vendor who does this deserves a kick).
[Knud] Yes sorry, my mistake with the temporary file. I can only agree with you that vendor and productcode should be included in the index in order for this patch to be used in large scale, I will add this. I can also agree with the #define guard.

I haven't had a chance to test things locally yet, but at least everything is compiling ok with these patches. :)
Gavin Lambert
2015-02-10 22:18:32 UTC
Permalink
Post by Knud Baastrup
Post by Gavin Lambert
4. Regarding 13_domain_lock.patch, I believe the original rationale of the
master is that locking between concurrent application tasks is the
responsibility of the application, not the master -- that's why in
kernelspace it has send/receive callbacks (formerly lock/unlock
callbacks) so
Post by Knud Baastrup
Post by Gavin Lambert
that eg. RTAI locks can be substituted, or locking can be avoided if the
application has some other way to schedule things to avoid actual
concurrency
Post by Knud Baastrup
Post by Gavin Lambert
(or if it's only running a single task). See the "Concurrent Master
Access"
Post by Knud Baastrup
Post by Gavin Lambert
section in the docs. I don't have any personal objections to this patch
though.
Yes, we just faced some cases where the application developers did not
include the necessary locking, which have quite severe impact for the
complete system. We have not used RTAI, but I am not sure I understand why
the extra locks become a problem for RTAI?
Not a problem as such, but not necessarily sufficient to protect anything.
RTAI/Xenomai is a separate kernel, so concurrent tasks would be using RTAI
locks instead of regular kernel locks, so they would make the kernel locking
redundant. It does trivially hurt performance to take a lock that is never
contended, but it's usually not worth worrying about that unless it's in a
tight loop.

Having said that, I don't use RTAI *or* concurrent tasks, so it doesn't
really affect me either way. :)
Post by Knud Baastrup
Post by Gavin Lambert
6. Regarding 16_improved_ethercat_rescan_performance.patch, it looks like a
stray temporary file was included in the patch. Also, I'm not sure it's
safe
Post by Knud Baastrup
Post by Gavin Lambert
to retrieve the data only by serial number. Serial numbers are not
guaranteed unique between vendors, or even between product lines -- I think
at minimum you should include the vendor id and product code in the index.
Also, possibly this should have a #define config guard to disable this
functionality in case the master will be used at a site with pathological
slaves (eg. multiple slaves with identical non-zero vendor/product/serial
triplets, since *technically* they're not guaranteed unique at all --
although any slave vendor who does this deserves a kick).
Yes sorry, my mistake with the temporary file. I can only agree with
you that vendor and productcode should be included in the index in order
for
Post by Knud Baastrup
this patch to be used in large scale, I will add this. I can also agree
with
Post by Knud Baastrup
the #define guard.
To reduce network cycles a bit, I suggest trying the alias (if nonzero)
first, as this is required to be network-unique if defined (meaning that you
wouldn't need to check vendor/product/serial); falling back to reading
serial, check if nonzero, and only then read the vendor and product codes.
(I'm not sure if the alias is already known at that point or if that
requires a network cycle to read as well, but even if the latter it means
one read instead of at least three.)

Of course, I might be a little biased since as I mentioned before I usually
configure aliases on all slaves. :)
Post by Knud Baastrup
Post by Gavin Lambert
I haven't had a chance to test things locally yet, but at least
everything is
Post by Knud Baastrup
Post by Gavin Lambert
compiling ok with these patches. :)
I've given it some minimal testing now, and I'm happy to report that in a
network with about 10 slaves (all with serial numbers) this reduces the
total rescan time from about 45 seconds to about 2, at least for subsequent
scans. (Numbers are vague because the test conditions weren't quite
identical in each case.)

Although the SDO dictionary patch means that I'm not really testing your
mailbox patches anymore, because dictionary vs. other SDO requests were the
main cause of mailbox conflicts that I see with an unpatched master. (I'm
not using EoE, which is the other main source of conflicts.) That's also a
good patch, as the dictionary scan of those 10 slaves takes about two
minutes, and normally the information isn't required for standard running,
only when commissioning. (It's a little disconcerting to see "ethercat
sdos" just sit apparently dead for a few minutes though. Maybe it needs
some kind of progress reporting. Although it's not as bad when limited to a
single slave, which is probably the more common use case.)
Knud Baastrup
2015-02-12 14:30:15 UTC
Permalink
Thanks Gavin,

I have attached a new set of patches (now by using git format-patch, which also imply that the patch names are given by the commit text).

I believe that I have addressed the issues you highlighted in prior mails including the alias support that might be relevant for us as well sometime in the future.

The locks that conflicts with RTAI could be removed with a define guard, e.g. re-use the EC_RTDM define already available?

I have removed our string handling fix and we will now instead use the one provided by Frank.

One additional patch have been included concerning FoE Filename length (documented in the patch it selves).

BR, Knud


-----Original Message-----
From: Gavin Lambert [mailto:***@compacsort.com]
Sent: 10. februar 2015 23:19
To: Knud Baastrup
Cc: etherlab-***@etherlab.org
Subject: RE: [etherlab-dev] Multiple mailbox protocols and other issues
Post by Knud Baastrup
Post by Gavin Lambert
4. Regarding 13_domain_lock.patch, I believe the original rationale
of
the
Post by Knud Baastrup
Post by Gavin Lambert
master is that locking between concurrent application tasks is the
responsibility of the application, not the master -- that's why in
kernelspace it has send/receive callbacks (formerly lock/unlock
callbacks) so
Post by Knud Baastrup
Post by Gavin Lambert
that eg. RTAI locks can be substituted, or locking can be avoided if
the application has some other way to schedule things to avoid actual
concurrency
Post by Knud Baastrup
Post by Gavin Lambert
(or if it's only running a single task). See the "Concurrent Master
Access"
Post by Knud Baastrup
Post by Gavin Lambert
section in the docs. I don't have any personal objections to this
patch though.
Yes, we just faced some cases where the application developers did not
include the necessary locking, which have quite severe impact for the
complete system. We have not used RTAI, but I am not sure I understand
why the extra locks become a problem for RTAI?
Not a problem as such, but not necessarily sufficient to protect anything.
RTAI/Xenomai is a separate kernel, so concurrent tasks would be using RTAI locks instead of regular kernel locks, so they would make the kernel locking redundant. It does trivially hurt performance to take a lock that is never contended, but it's usually not worth worrying about that unless it's in a tight loop.

Having said that, I don't use RTAI *or* concurrent tasks, so it doesn't really affect me either way. :)
Post by Knud Baastrup
Post by Gavin Lambert
6. Regarding 16_improved_ethercat_rescan_performance.patch, it looks
like
a
Post by Knud Baastrup
Post by Gavin Lambert
stray temporary file was included in the patch. Also, I'm not sure it's
safe
Post by Knud Baastrup
Post by Gavin Lambert
to retrieve the data only by serial number. Serial numbers are not
guaranteed unique between vendors, or even between product lines -- I think
at minimum you should include the vendor id and product code in the index.
Also, possibly this should have a #define config guard to disable
this functionality in case the master will be used at a site with
pathological slaves (eg. multiple slaves with identical non-zero
vendor/product/serial triplets, since *technically* they're not
guaranteed unique at all -- although any slave vendor who does this deserves a kick).
Yes sorry, my mistake with the temporary file. I can only agree with
you that vendor and productcode should be included in the index in order
for
Post by Knud Baastrup
this patch to be used in large scale, I will add this. I can also agree
with
Post by Knud Baastrup
the #define guard.
To reduce network cycles a bit, I suggest trying the alias (if nonzero) first, as this is required to be network-unique if defined (meaning that you wouldn't need to check vendor/product/serial); falling back to reading serial, check if nonzero, and only then read the vendor and product codes.
(I'm not sure if the alias is already known at that point or if that requires a network cycle to read as well, but even if the latter it means one read instead of at least three.)

Of course, I might be a little biased since as I mentioned before I usually configure aliases on all slaves. :)
Post by Knud Baastrup
Post by Gavin Lambert
I haven't had a chance to test things locally yet, but at least
everything is
Post by Knud Baastrup
Post by Gavin Lambert
compiling ok with these patches. :)
I've given it some minimal testing now, and I'm happy to report that in a network with about 10 slaves (all with serial numbers) this reduces the total rescan time from about 45 seconds to about 2, at least for subsequent scans. (Numbers are vague because the test conditions weren't quite identical in each case.)

Although the SDO dictionary patch means that I'm not really testing your mailbox patches anymore, because dictionary vs. other SDO requests were the main cause of mailbox conflicts that I see with an unpatched master. (I'm not using EoE, which is the other main source of conflicts.) That's also a good patch, as the dictionary scan of those 10 slaves takes about two minutes, and normally the information isn't required for standard running, only when commissioning. (It's a little disconcerting to see "ethercat sdos" just sit apparently dead for a few minutes though. Maybe it needs some kind of progress reporting. Although it's not as bad when limited to a single slave, which is probably the more common use case.)
Gavin Lambert
2015-02-12 22:50:23 UTC
Permalink
Post by Knud Baastrup
I have attached a new set of patches (now by using git format-patch, which
also imply that the patch names are given by the commit text).
I believe that I have addressed the issues you highlighted in prior mails
including the alias support that might be relevant for us as well sometime
in
Post by Knud Baastrup
the future.
Nice! Although there still seem to be some funny things going on with the
whitespace, eg. see patch 0013's master/fsm_slave_config.c's second hunk
(ec_fsm_slave_config_enter_mbox_sync).
Post by Knud Baastrup
The locks that conflicts with RTAI could be removed with a define guard,
e.g.
Post by Knud Baastrup
re-use the EC_RTDM define already available?
They're not conflicts, and I wasn't suggesting any specific changes (as I
said, I don't use RTDM myself so I don't really know specifics). It was
just a comment to indicate why the master originally didn't do any locking
there, and that your original problem *could* theoretically have been solved
by doing the locking in the application code instead, as it's only an issue
with concurrent realtime tasks, which are likely to need some
application-level locking anyway. It's a possible reason that Florian might
not want to accept the patch, but that doesn't mean that you should modify
or withdraw it -- that's something he can decide.


I'll integrate your new patchset into my build and do a bit more testing;
I'm hoping to post my full patch queue in a few days. This will include
your patches as well -- I hope you don't mind? (They'll be clearly
attributed, of course.)


Just some further thoughts on patch 0010 (deferring the sdo dictionary
fetch): one of the interesting things about fsm_slave over fsm_master is
that the former can run in parallel while the latter only in series. In
principle, this means that if someone issues "ethercat sdos" with no filters
on a large network, the fetch time could be reduced considerably. (It won't
reduce the time to that of a single slave, as it has caps on the number of
slave FSMs it can run in parallel to prevent blowing out the number of
frames and causing latency.) Currently your patch forces this to still run
sequentially anyway, because the ioctl is blocking and it only does one
slave at a time.

I was thinking about having a go at trying to make that change myself, but
having said that, given that running "ethercat sdos" on multiple slaves is
not particularly useful (since networks usually contain many duplicates) and
that this is generally only used during development or commissioning, I'm
not sure whether it's really worth it.

I was also wondering if it should do a more limited dictionary fetch by
default (just of the PDOs), which could improve some of the logging, but
even then those messages only appear when the debug level is increased, so
most of the time it wouldn't be all that useful. And someone can just look
at the slave docs (or a local dictionary scan) if they want to interpret
logs to find out what a particular SDO entry is called from its
index:subindex.
Knud Baastrup
2015-02-13 08:38:38 UTC
Permalink
See my in-lined comments!

BR, Knud

-----Original Message-----
From: Gavin Lambert [mailto:***@compacsort.com]
Sent: 12. februar 2015 23:50
To: Knud Baastrup
Cc: etherlab-***@etherlab.org
Subject: RE: [etherlab-dev] Multiple mailbox protocols and other issues
Post by Knud Baastrup
I have attached a new set of patches (now by using git format-patch,
which also imply that the patch names are given by the commit text).
I believe that I have addressed the issues you highlighted in prior
mails including the alias support that might be relevant for us as
well sometime
in
Post by Knud Baastrup
the future.
Nice! Although there still seem to be some funny things going on with the whitespace, eg. see patch 0013's master/fsm_slave_config.c's second hunk (ec_fsm_slave_config_enter_mbox_sync).
[Knud] I guess I need more help to figure this out. I cannot (with my current knowledge of patch management) see anything wrong in this specific hunk (line 374 to 476). Do you get some kind of warning when applying the patch or how do you observe the issue?
Post by Knud Baastrup
The locks that conflicts with RTAI could be removed with a define guard,
e.g.
Post by Knud Baastrup
re-use the EC_RTDM define already available?
They're not conflicts, and I wasn't suggesting any specific changes (as I said, I don't use RTDM myself so I don't really know specifics). It was just a comment to indicate why the master originally didn't do any locking there, and that your original problem *could* theoretically have been solved by doing the locking in the application code instead, as it's only an issue with concurrent realtime tasks, which are likely to need some application-level locking anyway. It's a possible reason that Florian might not want to accept the patch, but that doesn't mean that you should modify or withdraw it -- that's something he can decide.
[Knud] Agree, we will keep the locks and let Florian decide to what extend he will include the patch.

I'll integrate your new patchset into my build and do a bit more testing; I'm hoping to post my full patch queue in a few days. This will include your patches as well -- I hope you don't mind? (They'll be clearly attributed, of course.)
[Knud] That is fully ok. I believe it will just further improve the chances for getting the patches merged into trunk by Florian. Feel free to do any improvements on the patches as well.

Just some further thoughts on patch 0010 (deferring the sdo dictionary
fetch): one of the interesting things about fsm_slave over fsm_master is that the former can run in parallel while the latter only in series. In principle, this means that if someone issues "ethercat sdos" with no filters on a large network, the fetch time could be reduced considerably. (It won't reduce the time to that of a single slave, as it has caps on the number of slave FSMs it can run in parallel to prevent blowing out the number of frames and causing latency.) Currently your patch forces this to still run sequentially anyway, because the ioctl is blocking and it only does one slave at a time.

I was thinking about having a go at trying to make that change myself, but having said that, given that running "ethercat sdos" on multiple slaves is not particularly useful (since networks usually contain many duplicates) and that this is generally only used during development or commissioning, I'm not sure whether it's really worth it.
[Knud] I agree, it is tempting now to take next step and do this in parallel. It currently requires a patient user to wait several minutes for an ethercat sdos command (with no progress information) to complete. Today we as well only fetch the dictionary for debug and test purposes so we do not plan to further improve this for the moment.

I was also wondering if it should do a more limited dictionary fetch by default (just of the PDOs), which could improve some of the logging, but even then those messages only appear when the debug level is increased, so most of the time it wouldn't be all that useful. And someone can just look at the slave docs (or a local dictionary scan) if they want to interpret logs to find out what a particular SDO entry is called from its index:subindex.
Gavin Lambert
2015-02-15 22:36:17 UTC
Permalink
Post by Knud Baastrup
Post by Gavin Lambert
Nice! Although there still seem to be some funny things going on with the
whitespace, eg. see patch 0013's master/fsm_slave_config.c's second hunk
(ec_fsm_slave_config_enter_mbox_sync).
I guess I need more help to figure this out. I cannot (with my current
knowledge of patch management) see anything wrong in this specific hunk
(line
Post by Knud Baastrup
374 to 476). Do you get some kind of warning when applying the patch or
how
Post by Knud Baastrup
do you observe the issue?
The second hunk covers lines 467 to 524 in the patched file.

There's no patching errors or anything like that, it's just that the
inserted lines have only four spaces instead of eight, so the indentation
appears wrong when compared to the surrounding code.

I didn't examine the patches with a fine-toothed comb (though I did spend a
bit of time looking through them, of course), so I don't know if there are
other instances of this or if this was the only one, but I happened to
notice this case so I thought I'd mention it. Obviously it doesn't affect
the actual operation of the patch, it's just a code style issue.
Knud Baastrup
2015-02-16 07:40:58 UTC
Permalink
Thanks, it is just fine you mention these kind of issues as well as I any way would not know as you mention your selves. I can see the lines having only one indentation after a line break and not two and it is actual an indentation error that is also present in the stable-1.5 branch and it has not been corrected with the patch.

Thanks,

Knud


-----Original Message-----
From: Gavin Lambert [mailto:***@compacsort.com]
Sent: 15. februar 2015 23:36
To: Knud Baastrup
Cc: etherlab-***@etherlab.org
Subject: RE: [etherlab-dev] Multiple mailbox protocols and other issues
Post by Knud Baastrup
Post by Gavin Lambert
Nice! Although there still seem to be some funny things going on
with
the
Post by Knud Baastrup
Post by Gavin Lambert
whitespace, eg. see patch 0013's master/fsm_slave_config.c's second
hunk (ec_fsm_slave_config_enter_mbox_sync).
I guess I need more help to figure this out. I cannot (with my current
knowledge of patch management) see anything wrong in this specific hunk
(line
Post by Knud Baastrup
374 to 476). Do you get some kind of warning when applying the patch or
how
Post by Knud Baastrup
do you observe the issue?
The second hunk covers lines 467 to 524 in the patched file.

There's no patching errors or anything like that, it's just that the inserted lines have only four spaces instead of eight, so the indentation appears wrong when compared to the surrounding code.

I didn't examine the patches with a fine-toothed comb (though I did spend a bit of time looking through them, of course), so I don't know if there are other instances of this or if this was the only one, but I happened to notice this case so I thought I'd mention it. Obviously it doesn't affect the actual operation of the patch, it's just a code style issue.
Knud Baastrup
2015-02-27 08:44:15 UTC
Permalink
Added one additional patch (0015-Internal-SDO-requests-now-synchronized-with-external.patch) that solves an issue with input/output errors when executing the ethercat sdos command (that now fetch directory) while configured SDO requests are executed from user application. Can also be observed with ethercat upload/download from EtherCAT Tool together with the execution of configured SDO requests. See the documentation in the patch it selves for more information.

BR, Knud


-----Original Message-----
From: etherlab-dev [mailto:etherlab-dev-***@etherlab.org] On Behalf Of Knud Baastrup
Sent: 12. februar 2015 15:30
To: Gavin Lambert
Cc: etherlab-***@etherlab.org
Subject: Re: [etherlab-dev] Multiple mailbox protocols and other issues

Thanks Gavin,

I have attached a new set of patches (now by using git format-patch, which also imply that the patch names are given by the commit text).

I believe that I have addressed the issues you highlighted in prior mails including the alias support that might be relevant for us as well sometime in the future.

The locks that conflicts with RTAI could be removed with a define guard, e.g. re-use the EC_RTDM define already available?

I have removed our string handling fix and we will now instead use the one provided by Frank.

One additional patch have been included concerning FoE Filename length (documented in the patch it selves).

BR, Knud
Gavin Lambert
2015-03-02 06:23:57 UTC
Permalink
Post by Knud Baastrup
Added one additional patch
(0015-Internal-SDO-requests-now-synchronized-with-
Post by Knud Baastrup
external.patch) that solves an issue with input/output errors when
executing
Post by Knud Baastrup
the ethercat sdos command (that now fetch directory) while configured SDO
requests are executed from user application. Can also be observed with
ethercat upload/download from EtherCAT Tool together with the execution of
configured SDO requests. See the documentation in the patch it selves for
more information.
I just noticed that patch "17_remove_prints_to_avoid_syslog_spam.patch" from
the 02022015 patch series appears to have vanished from later series. I
assume this was intentional, as I don't recall seeing the spam it referred
to, but I thought I'd mention it just in case.

Also, some compiler warnings are still present from patch 0013:

master/fsm_slave_scan.c: In function 'ec_fsm_slave_scan_enter_attach_sii':
master/fsm_slave_scan.c:494:17: warning: format '%zu' expects argument of
type 'size_t', but argument 5 has type 'int' [-Wformat=]
EC_SLAVE_DBG(slave, 1, "Slave can re-use SII image data
stored."
^
master/fsm_slave_scan.c:502:17: warning: format '%zu' expects argument of
type 'size_t', but argument 5 has type 'uint32_t' [-Wformat=]
EC_SLAVE_DBG(slave, 1, "Slave can re-use SII image data
stored."
^
master/fsm_slave_scan.c:502:17: warning: format '%zu' expects argument of
type 'size_t', but argument 6 has type 'uint32_t' [-Wformat=]
master/fsm_slave_scan.c:502:17: warning: format '%zu' expects argument of
type 'size_t', but argument 7 has type 'uint32_t' [-Wformat=]
master/fsm_slave_scan.c: In function 'ec_fsm_slave_scan_state_sii_alias':
master/fsm_slave_scan.c:721:5: warning: format '%zu' expects argument of
type 'size_t', but argument 5 has type 'int' [-Wformat=]
EC_SLAVE_DBG(slave, 1, "Alias: %zu\n", slave->effective_alias);
^
master/fsm_slave_scan.c: In function 'ec_fsm_slave_scan_state_sii_serial':
master/fsm_slave_scan.c:759:5: warning: format '%zu' expects argument of
type 'size_t', but argument 5 has type 'uint32_t' [-Wformat=]
EC_SLAVE_DBG(slave, 1, "Serial Number: %zu\n",
slave->effective_serial_number);
^
master/fsm_slave_scan.c: In function 'ec_fsm_slave_scan_state_sii_vendor':
master/fsm_slave_scan.c:792:5: warning: format '%zu' expects argument of
type 'size_t', but argument 5 has type 'uint32_t' [-Wformat=]
EC_SLAVE_DBG(slave, 1, "Vendor ID: %zu\n", slave->effective_vendor_id);
^
master/fsm_slave_scan.c: In function 'ec_fsm_slave_scan_state_sii_product':
master/fsm_slave_scan.c:825:5: warning: format '%zu' expects argument of
type 'size_t', but argument 5 has type 'uint32_t' [-Wformat=]
EC_SLAVE_DBG(slave, 1, "Product code: %zu\n",
slave->effective_product_code);
^

The ones complaining about "int" probably need casts to "unsigned" (or
uint32_t if you prefer) due to default parameter extension, and the %zu
should just be %u for all of them.

Also vendor ids and product codes are usually printed in hex. Not sure
about serial numbers, but "ethercat slaves -v" displays those in hex too, so
that seems reasonable.

On a somewhat related note, I'm not sure "effective_serial" etc are good
variable names. "Effective alias" is phrased that way because there are
several different kinds of alias, but this contains the one that's currently
in use (eg. see the EC_REGALIAS code, which allows the effective alias to
come from a register rather than the SII); but that isn't really true for
the vendor/product/serial values. This is just a minor quibble of course.
:)

Although speaking of the EC_REGALIAS code, if that's enabled and if the
register 0x0012 alias is different from the SII alias, then this patch might
malfunction (it should probably skip reading the SII alias and go straight
for the register). Having said that, normally the two should be the same,
unless someone is in the process of changing the alias (in which case
rebooting the slave afterwards should "fix" everything). There might be
some odd slaves out there though, which could be why EC_REGALIAS was added
in the first place..?

Finally, this is one of those "probably not strictly necessary but it makes
things tidier just in case" changes, but I recommend adding the following
hunk to patch 0005:

--- a/master/datagram.c
+++ b/master/datagram.c
@@ -586,6 +586,9 @@
case EC_DATAGRAM_ERROR:
printk("error");
break;
+ case EC_DATAGRAM_INVALID:
+ printk("invalid");
+ break;
default:
printk("???");
}
Knud Baastrup
2015-03-02 12:48:44 UTC
Permalink
Thanks, attached updated patches. See inline comments.

BR, Knud

-----Original Message-----
From: Gavin Lambert [mailto:***@compacsort.com]
Sent: 2. marts 2015 07:24
To: Knud Baastrup
Cc: etherlab-***@etherlab.org
Subject: RE: [etherlab-dev] Multiple mailbox protocols and other issues
Post by Knud Baastrup
Added one additional patch
(0015-Internal-SDO-requests-now-synchronized-with-
Post by Knud Baastrup
external.patch) that solves an issue with input/output errors when
executing
Post by Knud Baastrup
the ethercat sdos command (that now fetch directory) while configured
SDO requests are executed from user application. Can also be observed
with ethercat upload/download from EtherCAT Tool together with the
execution of configured SDO requests. See the documentation in the
patch it selves for more information.
I just noticed that patch "17_remove_prints_to_avoid_syslog_spam.patch" from the 02022015 patch series appears to have vanished from later series. I assume this was intentional, as I don't recall seeing the spam it referred to, but I thought I'd mention it just in case.
[Knud] Yes, it was attentional. I believe our observed syslog spam is provoked due wrong usage of the master library.


Also, some compiler warnings are still present from patch 0013:

master/fsm_slave_scan.c: In function 'ec_fsm_slave_scan_enter_attach_sii':
master/fsm_slave_scan.c:494:17: warning: format '%zu' expects argument of type 'size_t', but argument 5 has type 'int' [-Wformat=]
EC_SLAVE_DBG(slave, 1, "Slave can re-use SII image data stored."
^
master/fsm_slave_scan.c:502:17: warning: format '%zu' expects argument of type 'size_t', but argument 5 has type 'uint32_t' [-Wformat=]
EC_SLAVE_DBG(slave, 1, "Slave can re-use SII image data stored."
^
master/fsm_slave_scan.c:502:17: warning: format '%zu' expects argument of type 'size_t', but argument 6 has type 'uint32_t' [-Wformat=]
master/fsm_slave_scan.c:502:17: warning: format '%zu' expects argument of type 'size_t', but argument 7 has type 'uint32_t' [-Wformat=]
master/fsm_slave_scan.c: In function 'ec_fsm_slave_scan_state_sii_alias':
master/fsm_slave_scan.c:721:5: warning: format '%zu' expects argument of type 'size_t', but argument 5 has type 'int' [-Wformat=]
EC_SLAVE_DBG(slave, 1, "Alias: %zu\n", slave->effective_alias);
^
master/fsm_slave_scan.c: In function 'ec_fsm_slave_scan_state_sii_serial':
master/fsm_slave_scan.c:759:5: warning: format '%zu' expects argument of type 'size_t', but argument 5 has type 'uint32_t' [-Wformat=]
EC_SLAVE_DBG(slave, 1, "Serial Number: %zu\n",
slave->effective_serial_number);
^
master/fsm_slave_scan.c: In function 'ec_fsm_slave_scan_state_sii_vendor':
master/fsm_slave_scan.c:792:5: warning: format '%zu' expects argument of type 'size_t', but argument 5 has type 'uint32_t' [-Wformat=]
EC_SLAVE_DBG(slave, 1, "Vendor ID: %zu\n", slave->effective_vendor_id);
^
master/fsm_slave_scan.c: In function 'ec_fsm_slave_scan_state_sii_product':
master/fsm_slave_scan.c:825:5: warning: format '%zu' expects argument of type 'size_t', but argument 5 has type 'uint32_t' [-Wformat=]
EC_SLAVE_DBG(slave, 1, "Product code: %zu\n",
slave->effective_product_code);
^

The ones complaining about "int" probably need casts to "unsigned" (or uint32_t if you prefer) due to default parameter extension, and the %zu should just be %u for all of them.

Also vendor ids and product codes are usually printed in hex. Not sure about serial numbers, but "ethercat slaves -v" displays those in hex too, so that seems reasonable.
[Knud] I believe the compiler warnings should be fixed now. I did however not get any Warnings in my build environment despite the usage of -Wall.

On a somewhat related note, I'm not sure "effective_serial" etc are good variable names. "Effective alias" is phrased that way because there are several different kinds of alias, but this contains the one that's currently in use (eg. see the EC_REGALIAS code, which allows the effective alias to come from a register rather than the SII); but that isn't really true for the vendor/product/serial values. This is just a minor quibble of course.
:)
[Knud] Yes the naming can be argued, but I will keep the terminology effective_serial_number for now as e.g. the term serial_number has already been used for sii_image->sii.serial_number and you can say that the effective_serial number is effective in the sense that it is the one used when matching sii_images after a rescan.

Although speaking of the EC_REGALIAS code, if that's enabled and if the register 0x0012 alias is different from the SII alias, then this patch might malfunction (it should probably skip reading the SII alias and go straight for the register). Having said that, normally the two should be the same, unless someone is in the process of changing the alias (in which case rebooting the slave afterwards should "fix" everything). There might be some odd slaves out there though, which could be why EC_REGALIAS was added in the first place..?
[Knud] The patch should not be malfunctioning, but yes if alias (or a serial number) is updated after a re-scan the stored sii_image cannot be matched in the coming re-scan and a new sii_image will be created for that particular module.

Finally, this is one of those "probably not strictly necessary but it makes things tidier just in case" changes, but I recommend adding the following hunk to patch 0005:

--- a/master/datagram.c
+++ b/master/datagram.c
@@ -586,6 +586,9 @@
case EC_DATAGRAM_ERROR:
printk("error");
break;
+ case EC_DATAGRAM_INVALID:
+ printk("invalid");
+ break;
default:
printk("???");
}
[Knud] Agree - I have added the hunk into the patch.
Gavin Lambert
2015-03-03 00:08:19 UTC
Permalink
Post by Knud Baastrup
Thanks, attached updated patches. See inline comments.
Looks good, as far as I can see. :)

Although speaking of syslog spam, I'm getting quite a lot of "Busy -
processing internal SDO request!" now.
Post by Knud Baastrup
Post by Gavin Lambert
Although speaking of the EC_REGALIAS code, if that's enabled and if the
register 0x0012 alias is different from the SII alias, then this patch might
malfunction (it should probably skip reading the SII alias and go straight
for the register). Having said that, normally the two should be the same,
unless someone is in the process of changing the alias (in which case
rebooting the slave afterwards should "fix" everything). There might be
some
Post by Knud Baastrup
Post by Gavin Lambert
odd slaves out there though, which could be why EC_REGALIAS was added in
the
Post by Knud Baastrup
Post by Gavin Lambert
first place..?
The patch should not be malfunctioning, but yes if alias (or a serial
number) is updated after a re-scan the stored sii_image cannot be matched
in
Post by Knud Baastrup
the coming re-scan and a new sii_image will be created for that particular
module.
The issue would be if some slave always had some wrong value in its SII but
loaded some other value to register 0x0012 on startup (eg. from onboard
dipswitches). This is not as unlikely as it sounds as it can be quite
awkward for the slave to access its own SII, especially with the default
Etherlab configuration (EC_SII_ASSIGN is not defined by default, and I'm
fairly sure it's not implemented correctly anyway). This could either work
by coincidence (if the "wrong" value was still unique), or cause either a
cache miss or in the worst case a hit on the wrong data (if that alias value
is shared with another slave).

Fortunately the standards require that in this case the "wrong SII value"
must be zero, which would just make your patch ignore the alias instead of
getting an invalid cache hit, but it's always possible there's some slave
that violates this. (Also the standard says that in case they're both
non-zero it doesn't need to signal an error until the INIT->PREOP
transition, and the scan may occur before this.)

So I was thinking that in the EC_REGALIAS case your patch should just read
register 0x0012 sooner instead of reading the SII alias at first and then
reading 0x0012 later (but not using the latter for the SII lookup). It'd
save several network cycles too.

Having said that, I don't know how common use of EC_REGALIAS is (I don't use
it myself). Maybe it doesn't really matter.

Loading...