Discussion:
[etherlab-dev] ec_lock_* vs. ec_ioctl_lock in master/ioctl.c
Esben Haabendal
2018-02-27 15:39:10 UTC
Hi

I have been fixing a number of locking-related issues, and have a hard
time figuring out how to handle locking in general, and in
master/ioctl.c in particular.

As of patch
base/0017-Master-locks-to-avoid-corrupted-datagram-queue.patch
there is now the macro pair of
ec_ioctl_lock_down_interruptible() and ec_ioctl_lock_up(), which maps to
ec_lock_down_interruptible() and ec_lock_up() for the non-RTDM use case.
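
(For readers outside the patchset: the pair presumably maps along these
lines; a sketch reconstructed from the description above, not the
literal patch text.)

#ifdef EC_IOCTL_RTDM
/* RTDM: a hard-RT thread must not take Linux semaphores, so the ioctl
 * fast path compiles the locking out entirely. */
#define ec_ioctl_lock_down_interruptible(sem) (0)
#define ec_ioctl_lock_up(sem) do {} while (0)
#else
/* Non-RTDM: map to the normal master locking primitives. */
#define ec_ioctl_lock_down_interruptible(sem) ec_lock_down_interruptible(sem)
#define ec_ioctl_lock_up(sem) ec_lock_up(sem)
#endif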

But looking at the master/ioctl.c file as of patchset version 20171108,
there are the following number of calls:

8 ec_lock_down()
52 ec_lock_down_interruptible()
129 ec_lock_up()
6 ec_ioctl_lock_up()
4 ec_ioctl_lock_down_interruptible()

When should the ec_lock_* functions be called directly, and when should
they be wrapped (and thus compiled out for RTDM)?

And how is this supposed to work for RTDM in the first place?
I mean, there is code outside of ioctl.c, called from ioctl.c, which
takes locks. Isn't this kind of defeating the purpose of this idea?

Also, as far as I can see, EC_IOCTL_RTDM is only defined when compiling
rtdm-ioctl.c. When and how is master/ioctl.c supposed to be compiled
with EC_IOCTL_RTDM defined?

Best regards,
Esben Haabendal
Graeme Foot
2018-02-27 22:38:33 UTC
Hi,

The ec_ioctl_lock_up() and ec_ioctl_lock_down_interruptible() calls were added to protect the following functions when multiple send/receive loops are running:
- ec_master_send()
- ec_master_receive()
- ec_master_domain_process()
- ec_master_domain_queue()

In my opinion any locking on these functions should be at the application level instead. However, if they are being called from multiple processes (rather than threads) then you need to use something like a named semaphore so that all processes share the same lock. Of course if you are using callbacks (for EoE) you are probably doing that anyway.
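(A minimal user-space sketch of the named-semaphore idea; the semaphore
name and the domain handling are made up for illustration, and error
handling is omitted:)

#include <fcntl.h>      /* O_CREAT */
#include <semaphore.h>
#include <ecrt.h>

static sem_t *master_lock;

/* Every process opens the same named semaphore, so they all share one lock. */
void init_master_lock(void)
{
    master_lock = sem_open("/ethercat-master0", O_CREAT, 0644, 1);
    /* SEM_FAILED check omitted for brevity */
}

void cyclic_task(ec_master_t *master, ec_domain_t *domain)
{
    sem_wait(master_lock);
    ecrt_master_receive(master);
    ecrt_domain_process(domain);
    /* ... exchange process data ... */
    ecrt_domain_queue(domain);
    ecrt_master_send(master);
    sem_post(master_lock);
}
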

The reason that they have been named differently is that they are the functions that are called from the realtime send/receive loop, and the define allows them to be ignored. RTAI (and Xenomai?) are NOT allowed to call standard Linux blocking functions from a hard realtime thread (to maintain hard realtime). The EC_IOCTL_RTDM define turns off these locks for (RTAI / Xenomai) RTDM applications to avoid this problem. They should probably also be turned off for RTAI / Xenomai in general and, as I said above, application level locking used instead.

You can pass --enable-rtdm when compiling to enable RTDM (and compile rtdm-ioctl.o).

So in summary:
ec_ioctl_lock_up() and ec_ioctl_lock_down_interruptible() are for the hard realtime loop function calls, otherwise use the standard lock calls.


Regards,
Graeme Foot


Esben Haabendal
2018-02-28 07:23:07 UTC
Post by Graeme Foot
The ec_ioctl_lock_up() and ec_ioctl_lock_down_interruptible() calls were added
to protect the following functions when multiple send/receive loops are running:
- ec_master_send()
- ec_master_receive()
- ec_master_domain_process()
- ec_master_domain_queue()
In my opinion any locking on these functions should be at the application
level instead.
Well, I have a different opinion on that.

Getting locking right is inherently difficult. You have to guard
against race conditions while avoiding deadlocks. But you also do not
want to block for too long.

While I acknowledge that in the trivial case, where there is only a
single application, it is not a big problem for that application to
maintain synchronization itself, I don't think it is a good solution to
let each application developer in the more complicated situations (like
multiple independent processes) do this without any help from
etherlabmaster. Forcing all (or at least some) application developers
to solve the same problem again and again should not be the best we can
do.
Post by Graeme Foot
However, if they are being called from multiple processes (rather than
threads) then you need to use something like a named semaphore so that
all processes share the same lock. Of course if you are using
callbacks (for EoE) you are probably doing that anyway.
You easily have multiple processes. Even with just a single
application, you have EtherCAT-OP, EtherCAT-EoE and the application,
all using some of the same shared data structures. Throwing in more
applications just adds to that, but I think it is critical enough with
just a single application.
Post by Graeme Foot
The reason that they have been named differently is that they are the
functions that are called from the realtime send/receive loop and the define
allows them to be ignored. RTAI (and Xenomai?) are NOT allowed to call
standard Linux blocking functions from a hard realtime thread (to maintain
hard realtime).
Yes, I get that.
Post by Graeme Foot
The EC_IOCTL_RTDM define turns off these locks for (RTAI / Xenomai)
RTDM applications to avoid this problem.
Yes, EC_IOCTL_RTDM turns off these locks. But it does not really do
anything, as it is never defined.
Post by Graeme Foot
They should probably also be turned off for RTAI / Xenomai in general
and as I said above use application level locking.
You can pass --enable-rtdm when compiling to enable RTDM (and compile rtdm-ioctl.o).
Passing --enable-rtdm to ./configure will define EC_RTDM macro and
enable the automake conditional ENABLE_RTDM.

This will trigger Kbuild to build rtdm-ioctl.o (from rtdm-ioctl.c), and
this will be done with EC_IOCTL_RTDM macro defined.

But ioctl.c will be compiled as always, and that is without
EC_IOCTL_RTDM defined. Was it supposed to be defined? If so, it should
be easy to fix, but someone should definitely do some real testing of
it.
Post by Graeme Foot
ec_ioctl_lock_up() and ec_ioctl_lock_down_interruptible() are for the hard
realtime loop function calls, otherwise use the standard lock calls.
And these are the closed set of

- ec_ioctl_send()
- ec_ioctl_receive()
- ec_ioctl_domain_process()
- ec_ioctl_domain_queue()

?

/Esben
Graeme Foot
2018-03-01 00:02:10 UTC
Post by Esben Haabendal
Post by Graeme Foot
The ec_ioctl_lock_up() and ec_ioctl_lock_down_interruptible() calls
were added to protect the following functions when multiple
send/receive loops are running:
- ecrt_master_send()
- ecrt_master_receive()
- ecrt_master_domain_process()
- ecrt_master_domain_queue()
In my opinion any locking on these functions should be at the
application level instead.
Well, I have a different opinion on that.
Getting locking right is inherently difficult. You have to guard against race conditions while avoiding deadlocks.
But you also do not want to block for too long.
While I acknowledge that in the trivial case, where there is only a single application, it is not a big problem for that application to maintain synchronization itself, I don't think it is a good solution to let each application developer in the more complicated situations (like multiple independent processes) do this without any help from etherlabmaster. Forcing all (or at least some) application developers to solve the same problem again and again should not be the best we can do.
These 4 functions are special. The master should be written (and mostly seems to be) in a fashion that between an ecrt_master_activate() and ecrt_master_deactivate() the above calls do not require any locks to synchronize with the master code, except for the EOE thread which uses callbacks. And the reason it uses callbacks is because only your application knows if it is appropriate to allow the EOE thread to call ecrt_master_receive() and ecrt_master_send_ext() etc at any particular time.

However, if your application has multiple send/receive loops then they need to be synchronized with each other (see next comment). There are a few more functions such as the distributed clock calls that are also in the send/receive loops. They also do not require ethercat level locks as they should be safe to call between the activate and deactivate. They also do not need application level locking as they should really only be called by one send/receive loop per master. All other ethercat function calls should not be in a hard realtime context.
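(For reference, one such send/receive loop typically has this shape; a
sketch using the standard ecrt API, with app_time_ns supplied by the
application's own clock source:)

#include <stdint.h>
#include <ecrt.h>

void cycle(ec_master_t *master, ec_domain_t *domain, uint64_t app_time_ns)
{
    ecrt_master_receive(master);
    ecrt_domain_process(domain);

    /* ... read inputs, run the control logic, write outputs ... */

    ecrt_domain_queue(domain);

    /* The distributed clock calls live in the same loop and, like the
     * calls above, are safe without EtherCAT-level locks as long as
     * only one loop per master issues them. */
    ecrt_master_application_time(master, app_time_ns);
    ecrt_master_sync_reference_clock(master);
    ecrt_master_sync_slave_clocks(master);

    ecrt_master_send(master);
}
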
Post by Graeme Foot
Post by Graeme Foot
However, if they are being called from multiple processes (rather than
threads) then you need to use something like a named semaphore so that
all processes share the same lock. Of course if you are using
callbacks (for EoE) you are probably doing that anyway.
You easily have multiple processes. Even with just a single application, you have EtherCAT-OP, EtherCAT-EoE and the application, all using some of the same shared data structures.
Throwing in more applications just adds to that, but I think it is critical enough with just a single application.
Even if you have multiple processes (rather than threads), it is your design decision as to which process takes priority and whether a particular send/receive loop should wait a bit even if it could get the lock now. You can only do that if your application controls the locking of these functions.
Post by Graeme Foot
Post by Graeme Foot
The reason that they have been named differently is that they are the
functions that are called from the realtime send/receive loop and the
define allows them to be ignored. RTAI (and Xenomai?) are NOT allowed
to call standard Linux blocking functions from a hard realtime thread
(to maintain hard realtime).
Yes, I get that.
Post by Graeme Foot
The EC_IOCTL_RTDM define turns off these locks for (RTAI / Xenomai)
RTDM applications to avoid this problem.
Yes, EC_IOCTL_RTDM turns off these locks. But it does not really do anything, as it is never defined.
EC_IOCTL_RTDM gets defined for "rtdm-ioctl.c" if --enable-rtdm is used (as you say below). I don't know if you noticed, but "rtdm-ioctl.c" is just a link to "ioctl.c". So the rtdm version gets it, but the standard version does not.
Post by Graeme Foot
Post by Graeme Foot
They should probably also be turned off for RTAI / Xenomai in general
and as I said above use application level locking.
You can pass --enable-rtdm when compiling to enable RTDM (and compile rtdm-ioctl.o).
Passing --enable-rtdm to ./configure will define EC_RTDM macro and enable the automake conditional ENABLE_RTDM.
This will trigger Kbuild to build rtdm-ioctl.o (from rtdm-ioctl.c), and this will be done with EC_IOCTL_RTDM macro defined.
But ioctl.c will be compiled as always, and that is without EC_IOCTL_RTDM defined. Was it supposed to be defined? If so, it should be easy to fix, but someone should definitely do some real testing of it.
I think the locks should be disabled in both "rtdm-ioctl.c" and "ioctl.c" if using RTAI, but I use RTDM so haven't confirmed this. Further to that I don't think they should be there at all. Simple applications have one send/receive loop so don't need locks. Applications with multiple send/receive loops or EOE need to control their own locking for optimal results anyway.

Also, because these lock functions use master_sem, the send/receive functions block unnecessarily against non-realtime EtherCAT function calls. At a minimum they should be locking on their own semaphore.
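(Concretely, the four fast-path ioctls would then look roughly like
this; a sketch only, assuming a dedicated queue semaphore such as the
existing io_sem, and the macro pair described earlier in the thread:)

/* ioctl handler for ecrt_master_send(): take only the queue semaphore,
 * not master_sem, so configuration ioctls cannot stall the RT path. */
if (ec_ioctl_lock_down_interruptible(&master->io_sem))
    return -EINTR;
ecrt_master_send(master);
ec_ioctl_lock_up(&master->io_sem);
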
Post by Graeme Foot
Post by Graeme Foot
ec_ioctl_lock_up() and ec_ioctl_lock_down_interruptible() are for the
hard realtime loop function calls, otherwise use the standard lock calls.
And these are the closed set of
- ec_ioctl_send()
- ec_ioctl_receive()
- ec_ioctl_domain_process()
- ec_ioctl_domain_queue()
?
Yes, they are only for these 4 calls, as these are the ones used in the hard realtime send/receive loops, so they need to be disabled for RTAI.
Post by Graeme Foot
/Esben
Regards,
Graeme.
Esben Haabendal
2018-03-02 09:27:03 UTC
Post by Graeme Foot
Post by Esben Haabendal
Post by Graeme Foot
The ec_ioctl_lock_up() and ec_ioctl_lock_down_interruptible() calls
were added to protect the following functions when multiple
send/receive loops are running:
- ecrt_master_send()
- ecrt_master_receive()
- ecrt_master_domain_process()
- ecrt_master_domain_queue()
In my opinion any locking on these functions should be at the
application level instead.
Well, I have a different opinion on that.
Getting locking right is inherently difficult. You have to guard
against race conditions while avoiding deadlocks.
But you also do not want to block for too long.
While I acknowledge that in the trivial case, where there is only a
single application, it is not a big problem for that application to
maintain synchronization itself, I don't think it is a good solution
to let each application developer in the more complicated situations
(like multiple independent processes) do this without any help from
etherlabmaster. Forcing all (or at least some) application developers
to solve the same problem again and again should not be the best we
can do.
These 4 functions are special. The master should be written (and
mostly seems to be) in a fashion that between an ecrt_master_activate()
and ecrt_master_deactivate() the above calls do not require any locks
to synchronize with the master code, except for the EOE thread which
uses callbacks.
Well, mostly is the same as not in my world. If they mostly do not
require any locks, they do require locks.

I guess some of the confusion I have is caused by the fact that it is
unclear exactly which functions are allowed to be called between
ecrt_master_activate() and ecrt_master_deactivate(), and which functions
are not. Do we have such a list? Or how can I create such a list
exactly? And would it be possible to enforce this directly in the API,
so if you call a function you are not allowed to call, you get an error
back instead of introducing random breakage by modifying unsynchronized
shared data structures?

As for the EOE thread, I might be overlooking something obvious. But
how are you supposed to use callbacks when using the library API?

As far as I read the code, if you are using EoE and not RTDM, you will
always use the standard ec_master_internal_send_cb() and
ec_master_internal_receive_cb() callbacks. See ec_ioctl_activate().

And with them, I don't see how you are supposed to do locking at the
application level. I will not be able to specify application level
locks that are used by both application and eoe, unless they are
implemented directly in the master (kernel).

And with ec_master_eoe_thread() running inside master, you really need
to do locking to synchronize access to master->datagram_queue (i.e. the
io_sem lock). If not, you will have race conditions between eoe and 3
of the 4 functions:

ecrt_master_send()
ecrt_master_receive()
ecrt_domain_queue()

So we really do need to do locking in at least these 3 when using EOE.

Or maybe the whole master->datagram_queue should be refactored to something
that can be synchronized safely in a lock-free way?
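(One way to sketch that: stage datagrams on a lock-free llist and let
the single sender drain it. The field names llnode and pending_queue
are hypothetical; the real datagram_queue is a plain list:)

#include <linux/llist.h>

/* Producers (application ioctls, EoE) push without taking any lock. */
void ec_master_queue_datagram_lockfree(ec_master_t *master,
        ec_datagram_t *datagram)
{
    llist_add(&datagram->llnode, &master->pending_queue);
}

/* The single send path drains the staging list in one atomic step and
 * owns the resulting batch from then on. */
void ec_master_send_pending(ec_master_t *master)
{
    struct llist_node *batch = llist_del_all(&master->pending_queue);
    /* ... splice 'batch' into the private send queue and transmit ... */
}
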
Post by Graeme Foot
And the reason it uses callbacks is because only your application
knows if it is appropriate to allow the EOE thread to call
ecrt_master_receive() and ecrt_master_send_ext() etc at any particular
time.
I see what you mean. But I just don't see how it is possible to
accomplish this for user space applications.
Post by Graeme Foot
However, if your application has multiple send/receive loops then they need to
be synchronized with each other (see next comment). There are a few more
functions such as the distributed clock calls that are also in the
send/receive loops. They also do not require ethercat level locks as they
should be safe to call between the activate and deactivate. They also do not
need application level locking as they should really only be called by one
send/receive loop per master. All other ethercat function calls should not be
in a hard realtime context.
Ok. But you imply a strict rule for how to design applications doing
EtherCAT communication: "There can only be one send/receive loop per
master".

That might not be (and in my case is not) accepted by all application
developers. I need to continue to support applications that use
multiple send/receive loops for multiple EtherCAT domains, each with a
different cycle time.

One solution could be to create a single send/receive loop per master,
and then let the multiple application loops communicate with that
instead of directly with the master. That might actually be a really
good idea. But, if that is done, I think it should fit very well as a
general feature of the EtherCAT master. It should be possible to
implement without any application specific code.



The final "problem" is as you say a design question. Should etherlabmaster
support synchronization of access from multiple threads/processes, or should
this be left entirely to the application developer? But if
master->datagram_queue is changed to something lock-free, this "problem" will
likely be solved also.
Post by Graeme Foot
Post by Esben Haabendal
Post by Graeme Foot
However, if they are being called from multiple processes (rather than
threads) then you need to use something like a named semaphore so that
all processes share the same lock. Of course if you are using
callbacks (for EoE) you are probably doing that anyway.
You easily have multiple processes. Even with just a single
application, you have EtherCAT-OP, EtherCAT-EoE and the application,
all using some of the same shared data structures. Throwing in more
applications just adds to that, but I think it is critical enough with
just a single application.
Even if you have multiple processes (rather than threads), it is your design
decision as to which process takes priority and whether a particular
send/receive loop should wait a bit even if it could get the lock now. You
can only do that if your application controls the locking of these functions.
While that sounds reasonable, I believe it is not entirely true. You
could write a layer responsible for handling this, which gets
information from the application(s) for how to deal with this. Think of
it like a kind of scheduler.

Example from an application I have seen. You have two applications
(processes), each driving its own EtherCAT domain. One
application/domain is running every 2 ms, the other application/domain
every 10 ms. In addition, EoE slaves are also used. For each
application, you could specify the priority, the cycle time and the
start time. EoE should also be given a priority. This EtherCAT
"scheduler" would then be able to decide which EtherCAT communication to
do and when.

So while I agree that it might be most obvious to implement the
decision of when to send/receive directly in the application, I think
it could also be implemented in a more generic way.
Post by Graeme Foot
Post by Esben Haabendal
Post by Graeme Foot
The reason that they have been named differently is that they are the
functions that are called from the realtime send/receive loop and the
define allows them to be ignored. RTAI (and Xenomai?) are NOT allowed
to call standard Linux blocking functions from a hard realtime thread
(to maintain hard realtime).
Yes, I get that.
Post by Graeme Foot
The EC_IOCTL_RTDM define turns off these locks for (RTAI / Xenomai)
RTDM applications to avoid this problem.
Yes, EC_IOCTL_RTDM turns off these locks. But it does not really do
anything, as it is never defined.
EC_IOCTL_RTDM gets defined for "rtdm-ioctl.c" if --enable-rtdm is used (as you
say below). I don't know if you noticed, but "rtdm-ioctl.c" is just a link to
"ioctl.c". So the rtdm version gets it, but the standard version does not.
Argh, I didn't notice that. Thanks for pointing that out.
It makes much more sense then :)
Post by Graeme Foot
Post by Esben Haabendal
Post by Graeme Foot
They should probably also be turned off for RTAI / Xenomai in general
and as I said above use application level locking.
You can pass --enable-rtdm when compiling to enable RTDM (and compile rtdm-ioctl.o).
Passing --enable-rtdm to ./configure will define EC_RTDM macro and enable
the automake conditional ENABLE_RTDM.
This will trigger Kbuild to build rtdm-ioctl.o (from rtdm-ioctl.c), and this
will be done with EC_IOCTL_RTDM macro defined.
But ioctl.c will be compiled as always, and that is without EC_IOCTL_RTDM
defined. Was it supposed to be defined? If so, it should be easy to fix,
but someone should definitely do some real testing of it.
I think the locks should be disabled in both "rtdm-ioctl.c" and "ioctl.c" if
using RTAI, but I use RTDM so haven't confirmed this.
I don't see how it could ever be disabled for ioctl.c with the current
code.
Post by Graeme Foot
Further to that I don't think they should be there at all. Simple
applications have one send/receive loop so don't need locks.
Applications with multiple send/receive loops or EOE need to control
their own locking for optimal results anyway.
Again, I don't fully agree with you on that.

But more importantly, it is not possible for user-space applications in
combination with EoE, due to inability to set application callbacks.
Post by Graeme Foot
Also, because these lock functions use master_sem, the send/receive
functions block unnecessarily against non-realtime EtherCAT function
calls. At a minimum they should be locking on their own semaphore.
True. I have patches for fixing that. The master_sem is definitely a
no-go for real-time.

/Esben
Graeme Foot
2018-03-04 23:02:41 UTC
Post by Graeme Foot
Post by Esben Haabendal
Post by Graeme Foot
The ec_ioctl_lock_up() and ec_ioctl_lock_down_interruptible() calls
were added to protect the following functions when multiple
send/receive loops are running:
- ecrt_master_send()
- ecrt_master_receive()
- ecrt_master_domain_process()
- ecrt_master_domain_queue()
In my opinion any locking on these functions should be at the
application level instead.
Well, I have a different opinion on that.
Getting locking right is inherently difficult. You have to guard
against race conditions while avoiding deadlocks.
But you also do not want to block for too long.
While I acknowledge that in the trivial case, where there is only a
single application, it is not a big problem for that application to
maintain synchronization itself, I don't think it is a good solution
to let each application developer in the more complicated situations
(like multiple independent processes) do this without any help from
etherlabmaster. Forcing all (or at least some) application developers
to solve the same problem again and again should not be the best we
can do.
These 4 functions are special. The master should be written (and
mostly seems to be) in a fashion that between an ecrt_master_activate()
and ecrt_master_deactivate() the above calls do not require any locks
to synchronize with the master code, except for the EOE thread which
uses callbacks.
Well, mostly is the same as not in my world. If they mostly do not require any locks, they do require locks.
When I say mostly, I mean the vanilla Etherlab version is ok, but the patch that added ec_ioctl_lock_up() and ec_ioctl_lock_down() broke this. I do not use this patch.
I guess some of the confusion I have is caused by the fact that it is unclear exactly which functions are allowed to be called between
ecrt_master_activate() and ecrt_master_deactivate(), and which functions are not. Do we have such a list? Or how can I create such a list exactly? And would it be possible to enforce this directly in the API, so if you call a function you are not allowed to call, you get an error back instead of introducing random breakage by modifying unsynchronized shared data structures?
The functions that I think should not require locks between ecrt_master_activate() and ecrt_master_deactivate() are:
- ecrt_master_receive()
- ecrt_domain_process()
- ecrt_domain_state()
- ecrt_master_state()
- ecrt_domain_queue()
- ecrt_master_reference_clock_time()
- ecrt_master_sync_slave_clocks()
- ecrt_master_sync_reference_clock()
- ecrt_master_64bit_reference_clock_time_queue()
- ecrt_master_64bit_reference_clock_time()
- ecrt_master_application_time()
- ecrt_master_send()
- ecrt_master_send_ext()
- ecrt_master_deactivate_slaves()

And with some of my recent patches
- ecrt_master_eoe_is_open()
- ecrt_master_eoe_process()
- ecrt_master_rt_slave_requests()
- ecrt_master_exec_slave_requests()

There may be some others I don't use.
As for the EOE thread, I might be overlooking something obvious. But how are you supposed to use callbacks when using the library API?
As far as I read the code, if you are using EoE and not RTDM, you will always use the standard ec_master_internal_send_cb() and
ec_master_internal_receive_cb() callbacks. See ec_ioctl_activate().
If you do not set EoE callbacks (with ecrt_master_callbacks()) there will be NO callbacks once the master goes active. (It will not use ec_master_internal_send_cb() and ec_master_internal_receive_cb(). It will not start an EoE processing thread. You will get a "No EoE processing because of missing callbacks!" message in your system log.)

If your application is in kernel space then you can specify callback functions and then use application level locking. If your application is in user space, then you cannot specify callback functions as the kernel cannot call back into user space. This is why I created my latest patch "0002-eoe-via-rtdm.patch". This patch lets you create your own EoE processing thread in your own application. This of course also lets you use application level locking.
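(With that patch, the application-side EoE loop would look roughly like
this. Signatures are assumed from the function names mentioned in this
thread, not checked against the patch; running, app_lock()/app_unlock()
and wait_for_next_eoe_period() are hypothetical application pieces:)

extern volatile int running;          /* application shutdown flag */

void eoe_thread(ec_master_t *master)
{
    while (running) {
        if (ecrt_master_eoe_is_open(master)) {
            app_lock();                   /* the application's own lock */
            ecrt_master_receive(master);
            ecrt_master_eoe_process(master);
            ecrt_master_send_ext(master); /* queues EoE datagrams, then sends */
            app_unlock();
        }
        wait_for_next_eoe_period();       /* hypothetical pacing call */
    }
}
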
And with them, I don't see how you are supposed to do locking at the application level. I will not be able to specify application level locks that are used by both application and eoe, unless they are implemented directly in the master (kernel).
The io_sem lock is only used by the master when it is idle. Once you activate the master it is up to your application to provide the locking.
ecrt_master_send()
ecrt_master_receive()
ecrt_domain_queue()
So we really do need to do locking in at least these 3 when using EOE.
Or maybe the whole master->datagram_queue should be refactored to something that can be synchronized safely in a lock-free way?
The point of the callback functions is so that you can make sure your application does not call any of ecrt_master_receive(), ecrt_master_send(), ecrt_master_send_ext(), ecrt_domain_process(), ecrt_domain_queue() for a given master at the same time. If you don't use EoE you don't need the callbacks, but you still need to protect ecrt_master_receive(), ecrt_master_send(), ecrt_domain_process(), ecrt_domain_queue().
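(The kernel-space pattern, for completeness; ecrt_master_callbacks() is
the real API with this signature, while app_lock()/app_unlock() stand
in for whatever lock the application uses:)

static void my_send_cb(void *cb_data)
{
    ec_master_t *master = (ec_master_t *) cb_data;
    app_lock();
    ecrt_master_send_ext(master);   /* EoE datagrams + regular send */
    app_unlock();
}

static void my_receive_cb(void *cb_data)
{
    ec_master_t *master = (ec_master_t *) cb_data;
    app_lock();
    ecrt_master_receive(master);
    app_unlock();
}

/* Registered once, in non-RT context, before ecrt_master_activate(): */
ecrt_master_callbacks(master, my_send_cb, my_receive_cb, master);
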
Post by Graeme Foot
And the reason it uses callbacks is because only your application
knows if it is appropriate to allow the EOE thread to call
ecrt_master_receive() and ecrt_master_send_ext() etc at any particular
time.
I see what you mean. But I just don't see how it is possible to accomplish this for user space applications.
Exactly why I created my patch "0002-eoe-via-rtdm.patch".
Post by Graeme Foot
However, if your application has multiple send/receive loops then they
need to be synchronized with each other (see next comment). There are
a few more functions such as the distributed clock calls that are also
in the send/receive loops. They also do not require ethercat level
locks as they should be safe to call between the activate and
deactivate. They also do not need application level locking as they
should really only be called by one send/receive loop per master. All
other ethercat function calls should not be in a hard realtime context.
Ok. But you imply a strict rule for how to design applications doing EtherCAT communication: "There can only be one send/receive loop per master".
That might not be (and in my case is not) accepted by all application developers.
I need to continue to support applications that use multiple send/receive loops for multiple EtherCAT domains, each with a different cycle time.
One solution could be to create a single send/receive loop per master, and then let the multiple application loops communicate with that instead of directly with the master. That might actually be a really good idea. But, if that is done, I think it should fit very well as a general feature of the EtherCAT master. It should be possible to implement without any application specific code.
That's not what I said. You can have multiple send/receive loops per master, but it is up to your application to make sure they don't call any of the ecrt functions I listed at the top at the same time (per master). If your application is a single process with multiple threads you can use an application semaphore (for example), but if it is multiple processes you will need a named semaphore (or similar) that all of the processes share.
The final "problem" is as you say a design question. Should etherlabmaster support synchronization of access from multiple threads/processes, or should this be left entirely to the application developer. But if
master->datagram_queue is changed to something lock-free, this "problem"
master->will
likely be solved also.
Post by Graeme Foot
Post by Esben Haabendal
Post by Graeme Foot
However, if they are being called from multiple processes (rather than
threads) then you need to use something like a named semaphore so
that all processes share the same lock. Of course if you are using
callbacks (for EoE) you are probably doing that anyway.
You easily have multiple processes. Even with just a single
application, you have EtherCAT-OP, EtherCAT-EoE and the application,
all using some of the same shared data structures. Throwing in more
applications just adds to that, but I think it is critical enough
with just a single application.
Even if you have multiple processes (rather than threads), it is your
design decision as to which process takes priority and whether a
particular send/receive loop should wait a bit even if it could get
the lock now. You can only do that if your application controls the locking of these functions.
While that sounds reasonable, I believe it is not entirely true. You could write a layer responsible for handling this, which gets information from the application(s) for how to deal with this. Think of it like a kind of scheduler.
Example from an application I have seen. You have two applications (processes), each driving its own EtherCAT domain. One application/domain is running every 2 ms, the other application/domain every 10 ms. In addition, EoE slaves are also used. For each application, you could specify the priority, the cycle time and the start time. EoE should also be given a priority. This EtherCAT "scheduler" would then be able to decide which EtherCAT communication to do and when.
So while I agree that it might be most obvious to implement the decision of when to send/receive directly in the application, I think it could also be implemented in a more generic way.
You don't need the EtherCAT master to do that scheduling. That is already available in Linux / RTAI or whatever you use. But once one of the above ecrt function calls is in progress you need to ensure no other process / thread will call one at the same time.
Post by Graeme Foot
Post by Esben Haabendal
Post by Graeme Foot
The reason that they have been named differently is that they are
the functions that are called from the realtime send/receive loop
and the define allows them to be ignored. RTAI (and Xenomai?) are
NOT allowed to call standard Linux blocking functions from a hard
realtime thread (to maintain hard realtime).
Yes, I get that.
Post by Graeme Foot
The EC_IOCTL_RTDM define turns off these locks for (RTAI / Xenomai)
RTDM applications to avoid this problem.
Yes, EC_IOCTL_RTDM turns off these locks. But it does not really do
anything, as it is never defined.
EC_IOCTL_RTDM gets defined for "rtdm-ioctl.c" if --enable-rtdm is used
(as you say below). I don't know if you noticed, but "rtdm-ioctl.c"
is just a link to "ioctl.c". So the rtdm version gets it, but the standard version does not.
Argh, I didn't notice that. Thanks for pointing that out.
It makes much more sense then :)
Post by Graeme Foot
Post by Esben Haabendal
Post by Graeme Foot
They should probably also be turned off for RTAI / Xenomai in
general and as I said above use application level locking.
You can pass --enable-rtdm when compiling to enable RTDM (and
compile rtdm-ioctl.o).
Passing --enable-rtdm to ./configure will define EC_RTDM macro and
enable the automake conditional ENABLE_RTDM.
This will trigger Kbuild to build rtdm-ioctl.o (from rtdm-ioctl.c),
and this will be done with EC_IOCTL_RTDM macro defined.
But ioctl.c will be compiled as always, and that is without
EC_IOCTL_RTDM defined. Was it supposed to be defined? If so, it
should be easy to fix, but someone should definitely do some real testing of it.
I think the locks should be disabled in both "rtdm-ioctl.c" and
"ioctl.c" if using RTAI, but I use RTDM so haven't confirmed this.
I don't see how it could ever be disabled for ioctl.c with the current code.
By not using the patch that added them.
Post by Graeme Foot
Further to that I don't think they should be there at all. Simple
applications have one send/receive loop so don't need locks.
Applications with multiple send/receive loops or EOE need to control
their own locking for optimal results anyway.
Again, I don't fully agree with you on that.
But more importantly, it is not possible for user-space applications in combination with EoE, due to inability to set application callbacks.
Post by Graeme Foot
Also, because these lock functions use master_sem, the send/receive
functions block unnecessarily against non-realtime EtherCAT function
calls. At a minimum they should be locking on their own semaphore.
True. I have patches for fixing that. The master_sem is definitely a no-go for real-time.
/Esben
So, I believe the following two patches are the problem:
- base/0017-Master-locks-to-avoid-corrupted-datagram-queue.patch
- base/0018-Use-call-back-functions.patch

My new EoE patch rolls back the changes from "base/0018-Use-call-back-functions.patch", and I use EC_IOCTL_RTDM, so I ignore the changes in "base/0017-Master-locks-to-avoid-corrupted-datagram-queue.patch".

Regards,
Graeme.
Esben Haabendal
2018-03-05 09:58:00 UTC
Hi Graeme

Need to get one thing straight first.

RTDM user-space is not the same as "plain" Linux user-space.
I know that you are 100% aware of that, but your replies to me seem less
clear about it.

Your 0002-eoe-via-rtdm.patch does not seem usable for "plain" user-space
applications, so does not apply to the applications we support.
Post by Graeme Foot
Post by Graeme Foot
These 4 functions are special. The master should be written (and
mostly seems to be) in a fashion that between an
and ecrt_master_deactivate() the above calls do not require any locks
to synchronize with the master code, except for the EOE thread which
uses callbacks.
Well, mostly is the same as not in my world. If they mostly do not require
any locks, they do require locks.
When I say mostly, I mean the vanilla Etherlab version is ok, but the patch
that added ec_ioctl_lock_up() and ec_ioctl_lock_down() broke this. I do not
use this patch.
Do you mean that the patches (and I guess you mean
0017-Master-locks-to-avoid-corrupted-datagram-queue.patch and
0018-Use-call-back-functions.patch) introduce the need for locking?

That does not sound right to me. As I read them (and your writing in
this thread), the need for locking is there with and without these
patches. Without them, locking needs to be handled at application
level. With the patches, the master tries to handle the locking.

And that is clearly a design change, and one which you clearly do not
agree with.

But if you really mean that these patches broke etherlabmaster (and not
"just" changed the design), would you please explain it a bit more
detailed?
Post by Graeme Foot
As for the EOE thread, I might be overlooking something obvious. But how
are you supposed to use callbacks when using the library API?
As far as I read the code, if you are using EoE and not RTDM, you will
always use the standard ec_master_internal_send_cb() and
ec_master_internal_receive_cb() callbacks. See ec_ioctl_activate().
If you do not set EoE callbacks (with ecrt_master_callbacks()) there will be
NO callbacks once the master goes active. (It will not use
ec_master_internal_send_cb() and ec_master_internal_receive_cb(). It will not
start an EoE processing thread. You will get a "No EoE processing because of
missing callbacks!" message in your system log.)
In ec_ioctl_activate() there is:

#ifndef EC_IOCTL_RTDM
    ecrt_master_callbacks(master, ec_master_internal_send_cb,
            ec_master_internal_receive_cb, master);
#endif

So at least for (non-RTDM) user-space applications that activate the
master with ioctl, EoE callbacks are always set to
ec_master_internal_send_cb() and ec_master_internal_receive_cb().
Post by Graeme Foot
If your application is in kernel space then you can specify callback functions
and then use application level locking. If your application is in user space,
then you cannot specify callback functions as the kernel cannot call back into
user space. This is why I created my latest patch "0002-eoe-via-rtdm.patch".
This patch lets you create your own EoE processing thread in your own
application. This of course also lets you use application level locking.
But only for RTDM. The functions added are guarded by

#if !defined(__KERNEL__) && defined(EC_RTDM) && (EC_EOE)

which means that they do not apply when using non-RTDM user-space.
Post by Graeme Foot
And with them, I don't see how you are supposed to do locking at the
application level. I will not be able to specify application level locks
that are used by both application and eoe, unless they are implemented
directly in the master (kernel).
And with ec_master_eoe_thread() running inside master, you really need to do
locking to synchronize access to master->datagram_queue (i.e. the io_sem
lock). If not, you will have race conditions between eoe and 3 of the 4
The io_sem lock is only used by the master when it is idle. Once you activate
the master it is up to your application to provide the locking.
Ok. But I dare to say that it is an open question if that is the right
design for etherlabmaster. I believe it has both pros and cons.

Pro: Application developers can implement synchronization as they find
most optimal for them.

Cons: Application developers need to figure out and implement
synchronization without much (any) help from the etherlabmaster code.
And getting synchronization right is IMHO often tricky (at least for
many developers).
Post by Graeme Foot
ecrt_master_send()
ecrt_master_receive()
ecrt_domain_queue()
So we really do need to do locking in at least these 3 when using EOE.
Or maybe the whole master->datagram_queue should be refactored to something
that can be synchronized safely in a lock-free way?
The point of the callback functions is so that you can make sure your
application does not call any of ecrt_master_receive(), ecrt_master_send(),
ecrt_master_send_ext(), ecrt_domain_process(), ecrt_domain_queue() for a given
master at the same time. If you don't use EoE you don't need the callbacks,
but you still need to protect ecrt_master_receive(),
ecrt_master_send(), ecrt_domain_process(), ecrt_domain_queue().
Which I personally see as a trivial omission in the etherlabmaster
design (requiring all API users to spend time writing the same code to
protect the API from itself).
Post by Graeme Foot
Post by Graeme Foot
And the reason it uses callbacks is because only your application
knows if it is appropriate to allow the EOE thread to call
ecrt_master_receive() and ecrt_master_send_ext() etc at any
time.
I see what you mean. But I just don't see how it is possible to accomplish
this for user space applications.
Exactly why I created my patch "0002-eoe-via-rtdm.patch".
And again, which does not extend to non-RTDM user-space.
Post by Graeme Foot
Post by Graeme Foot
However, if your application has multiple send/receive loops then they
need to be synchronized with each other (see next comment). There are
a few more functions such as the distributed clock calls that are also
in the send/receive loops. They also do not require ethercat level
locks as they should be safe to call between the activate and
deactivate. They also do not need application level locking as they
should really only be called by one send/receive loop per master. All
other ethercat function calls should not be in a hard realtime context.
Ok. But you imply a strict rule for how to design applications doing
EtherCAT communication: "There can only be one send/receive loop per
master".
That might not be (and in my case is not) accepted by all application
developers. I need to continue to support applications that use multiple
send/receive loops for multiple EtherCAT domains, each with a different
cycle time.
One solution could be to create a single send/receive loop per master, and
then let the multiple application loops communicate with that instead of
directly with the master. That might actually be a really good idea. But,
if that is done, I think it should fit very well as a general feature of the
EtherCAT master. It should be possible to implement without any application
specific code.
That's not what I said. You can have multiple send/receive loops per master,
but it is up to your application to make sure they don't call any of the ecrt
functions I listed at the top at the same time (per master). If your
application is a single process with multiple threads you can use an
application semaphore (for example), but if it is multiple processes you will
need a named semaphore (or similar) that all of the processes share.
For the single process, multiple threads application, it is not that
bad. It is requiring all such application developers to do the same over
and over again, wasting time and introducing the same bugs again and
again.

The multiple process world can be different though. You basically end
up with a new EtherCAT API. A combination of the etherlabmaster API and
a custom named semaphore API. Without this, applications will not work
properly together. Why not include such a feature directly in
etherlabmaster? Without it, I think we are making the user-space
applications (non-RTDM) into second-class citizens.

It might not matter to you, but as this is how we use etherlabmaster, it
matters to me.
Post by Graeme Foot
The final "problem" is as you say a design question. Should etherlabmaster
support synchronization of access from multiple threads/processes, or should
this be left entirely to the application developer? But if
master->datagram_queue is changed to something lock-free, this "problem" will
likely be solved also.
Post by Graeme Foot
Post by Esben Haabendal
Post by Graeme Foot
However, if they are being called from multiple processes (rather than
threads) then you need to use something like a named semaphore so
that all processes share the same lock. Of course if you are using
callbacks (for EoE) you are probably doing that anyway.
You easily have multiple processes. Even with just a single
application, you have EtherCAT-OP, EtherCAT-EoE and the application,
all using some of the same shared data structures. Throwing in more
applications just adds to that, but I think it is critical enough
with just a single application.
Even if you have multiple processes (rather than threads), it is your
design decision as to which process takes priority and whether a
particular send/receive loop should wait a bit even if it could get
the lock now. You can only do that if your application controls the
locking of these functions.
While that sounds reasonable, I believe it is not entirely true. You could
write a layer responsible for handling this, which gets information from the
application(s) for how to deal with this. Think of it like a kind of
scheduler.
Example from an application I have seen. You have two applications
(processes), each driving its own EtherCAT domain. One application/domain
is running every 2 ms, the other application/domain every 10 ms. In
addition, EoE slaves are also used. For each application, you could specify
the priority, the cycle time and the start time. EoE should also be given a
priority. This EtherCAT "scheduler" would then be able to decide which
EtherCAT communication to do and when.
So while I agree that it might be most obvious to implement the decision of
when to send/receive directly in the application, I think it could also be
implemented in a more generic way.
You don't need the EtherCAT master to do that scheduling. That is already
available in Linux / RTAI or whatever you use.
What do you mean?
Post by Graeme Foot
But once one of the above ecrt function calls is in progress you need
to ensure no other process / thread will call one at the same time.
Yes, that is your preferred design (that **I** need to ensure it). I
prefer a design where the master will do that for me (actually the
application developers using our platform).
Post by Graeme Foot
Post by Graeme Foot
Post by Esben Haabendal
Post by Graeme Foot
They should probably also be turned off for RTAI / Xenomai in
general and as I said above use application level locking.
You can pass --enable-rtdm when compiling to enable RTDM (and
compile rtdm-ioctl.o).
Passing --enable-rtdm to ./configure will define EC_RTDM macro and
enable the automake conditional ENABLE_RTDM.
This will trigger Kbuild to build rtdm-ioctl.o (from rtdm-ioctl.c),
and this will be done with EC_IOCTL_RTDM macro defined.
But ioctl.c will be compiled as always, and that is without
EC_IOCTL_RTDM defined. Was it supposed to be defined? If so, it
should be easy to fix, but someone should definitely do some real testing of it.
I think the locks should be disabled in both "rtdm-ioctl.c" and
"ioctl.c" if using RTAI, but I use RTDM so haven't confirmed this.
I don't see how it could ever be disabled for ioctl.c with the current code.
By not using the patch that added them.
Of course. But back to the same discussion again.
Post by Graeme Foot
Post by Graeme Foot
Further to that I don't think they should be there at all. Simple
applications have one send/receive loop so don't need locks.
Applications with multiple send/receive loops or EOE need to control
their own locking for optimal results anyway.
Again, I don't fully agree with you on that.
But more importantly, it is not possible for user-space applications in
combination with EoE, due to inability to set application callbacks.
Post by Graeme Foot
Also, because these lock functions use master_sem, the send/receive
functions block unnecessarily against non-realtime EtherCAT function
calls. At a minimum they should be locking on their own semaphore.
True. I have patches for fixing that. The master_sem is definitely a no-go
for real-time.
- base/0017-Master-locks-to-avoid-corrupted-datagram-queue.patch
- base/0018-Use-call-back-functions.patch
Well, that depends on how you look at it. Removing those patches
removes the problem from etherlabmaster. But then I just have to do
exactly the same thing on top of etherlabmaster, and I will be back to
square one.
Post by Graeme Foot
My new EoE patch rolls back the changes from
"base/0018-Use-call-back-functions.patch" and I use EC_IOCTL_RTDM so ignore
the changes in
"base/0017-Master-locks-to-avoid-corrupted-datagram-queue.patch".
Ok. I think you could have been a bit more explicit about that
roll-back.

/Esben
Graeme Foot
2018-03-06 00:39:50 UTC
Hi,

The email's getting a little hard to read so I'll try to summarise what I'm doing here first and reply to specifics below.

The primary problem with RTAI is that you cannot use Linux locks in RTAI hard realtime calls. So the base 0017 and 0018 patches are useless for RTAI. Secondly, RTAI / RTDM cannot use callbacks, so EoE does not work for this framework.

I put the patches in the features branch of Gavin's patchset as they should be optional and especially patch 2 (0002-eoe-via-rtdm.patch) is specifically for RTAI / RTDM for EoE. I haven't looked too hard at making this one generic as it does break the previous patches I mentioned (base 0017 and 0018). But it also needs to break them as the whole point of it is to use an application level lock (i.e. RTAI lock) instead (as per the design of the vanilla Etherlab master). So no, the patch does not support "plain" user-space applications (i.e. not using RTDM) as my focus was only for RTDM applications. Patch 2 could probably have a few more defines to make sure it only affects RTDM users and keeps 0017/0018 active for non-RTDM installs.

Patch 1 (0001-eoe-addif-delif-tools.patch) only looks at changing the lifetime options of the EoE interfaces, so anyone should be able to use this.

Patch 2 (0002-eoe-via-rtdm.patch) is targeted at adding the ability for RTDM users to provide a user space alternative to the callbacks framework. If people prefer this method to allow "plain" Linux user space applications to supply their own locking then it could be extended. But it does mean that the base patches "0017-Master-locks-to-avoid-corrupted-datagram-queue.patch" and "0018-Use-call-back-functions.patch" do need to be removed.

So using patch 2 is a matter of preference: Patch 2 uses the Vanilla EtherLab EtherCAT master approach of "the user application knows best" as to when a lock can be acquired, but also covers the "realtime RTAI can't use Linux locks" issue; OR don't use patch 2 and go with the patches 0017/0018 approach where the master handles this locking (which as I said is not an option for RTAI).
Post by Esben Haabendal
Hi Graeme
Need to get one thing straight first.
RTDM user-space is not the same as "plain" Linux user-space.
I know that you are 100% aware of that, but your replies to me seem less clear about it.
Your 0002-eoe-via-rtdm.patch does not seem usable for "plain" user-space applications, so does not apply to the applications we support.
Post by Graeme Foot
Post by Esben Haabendal
Post by Graeme Foot
These 4 functions are special. The master should be written (and
mostly seems to be) in a fashion that between an
ecrt_master_activate() and ecrt_master_deactivate() the above calls
do not require any locks to synchronize with the master code,
except for the EOE thread which uses callbacks.
Well, mostly is the same as not in my world. If they mostly do not
require any locks, they do require locks.
When I say mostly, I mean the vanilla Etherlab version is ok, but the
patch that added ec_ioctl_lock_up() and ec_ioctl_lock_down() broke
this. I do not use this patch.
Do you mean that the patches (and I guess you mean 0017-Master-locks-to-avoid-corrupted-datagram-queue.patch and
0018-Use-call-back-functions.patch) introduce the need for locking?
That does not sound right to me. As I read them (and your writing in this thread), the need for locking is there with and without these patches. Without them, locking needs to be handled at application level. With the patches, the master tries to handle the locking.
And that is clearly a design change, and one which you clearly do not agree with.
But if you really mean that these patches broke etherlabmaster (and not "just" changed the design), would you please explain it in a bit more detail?
Locking is required if multiple processes / threads will access these functions, regardless of which method you prefer.

The vanilla EtherLab approach is: the user application knows best when to lock, so leave it up to it. But the EoE thread runs in the master, so the user application may provide callbacks so that the EoE thread can obtain the lock in the user application and call the appropriate send/receive function.

Gavin's patchset approach (base patches 0017/0018) is to change this to provide the locking within the master.
- Problem 1) Patch base/0017 uses master->master_sem which can be held by a non-realtime process
- Problem 2) Patch base/0017 uses master->master_sem which can't be used by RTAI, so for RTAI it is disabled anyway
- Problem 3) the callbacks are designed to allow the EoE send/receive to be synced with your application. Patch base/0018 repurposes this so that all user space ecrt_master_send() calls use the send callback and user space ecrt_master_receive() calls use the receive callback. The send callback is designed to call ecrt_master_send_ext() specifically for EoE processing, not ecrt_master_send(). The receive callback is designed to call ecrt_master_receive() so no real problem there.

Note: ecrt_master_send_ext() will queue any external datagrams and then call ecrt_master_send(). The drawbacks to calling ecrt_master_send_ext() rather than ecrt_master_send() from your main send/receive loops are:
1) If you have a very fast and tight send/receive loop this may add extra overhead you don't want
2) You can get extra jitter between calling ecrt_master_application_time() and the frame going onto the wire
Post by Esben Haabendal
Post by Graeme Foot
Post by Esben Haabendal
As for the EOE thread, I might be overlooking something obvious. But
how are you supposed to use callbacks when using the library API?
As far as I read the code, if you are using EoE and not RTDM, you
will always use the standard ec_master_internal_send_cb() and
ec_master_internal_receive_cb() callbacks. See ec_ioctl_activate().
If you do not set EoE callbacks (with ecrt_master_callbacks()) there
will be NO callbacks once the master goes active. (It will not use
ec_master_internal_send_cb() and ec_master_internal_receive_cb(). It
will not start an EoE processing thread. You will get a "No EoE
processing because of missing callbacks!" message in your system log.)
#ifndef EC_IOCTL_RTDM
    ecrt_master_callbacks(master, ec_master_internal_send_cb,
            ec_master_internal_receive_cb, master);
#endif
So at least for (non RTDM) user-space applications, that activate the master with ioctl, EoE callbacks are always set to
ec_master_internal_send_cb() and ec_master_internal_receive_cb().
Sorry, I misread your comment and the code. I'm using RTDM so don't notice this.

For RTAI/Xenomai the internal callbacks are not allowed as they use master->io_sem, which is a non-RTAI-compatible semaphore. The reason for patch base/0018 is to allow master->io_sem to be shared by the user space send/receive calls and the EoE thread (which will be created as the callbacks are defined). But it will break if your application also uses a kernel process, as it will not be using the master->io_sem semaphore (not likely I know).

The above ecrt_master_callbacks() in ec_ioctl_activate() was added way back in 2012 (rev 2433) when RTDM became more fully supported. I don't know what behaviour it had before that but that is probably irrelevant as it has been like this for so long.
Post by Esben Haabendal
Post by Graeme Foot
If your application is in kernel space then you can specify callback
functions and then use application level locking. If your application
is in user space, then you cannot specify callback functions as the
kernel cannot call back into user space. This is why I created my latest patch "0002-eoe-via-rtdm.patch".
This patch lets you create your own EoE processing thread in your own
application. This of course also lets you use application level locking.
But only for RTDM. The functions added are guarded by
#if !defined(__KERNEL__) && defined(EC_RTDM) && (EC_EOE)
which means that they do not apply when using non-RTDM user-space.
Correct.
Post by Esben Haabendal
Post by Graeme Foot
Post by Esben Haabendal
And with them, I don't see how you are supposed to do locking at the
application level. I will not be able to specify application level
locks that are used by both application and eoe, unless they are
implemented directly in the master (kernel).
And with ec_master_eoe_thread() running inside master, you really
need to do locking to synchronize access to master->datagram_queue
(i.e. the io_sem lock). If not, you will have race conditions
between eoe and 3 of the 4
The io_sem lock is only used by the master when it is idle. Once you
activate the master it is up to your application to provide the locking.
Ok. But I dare say it is an open question whether that is the right design for etherlabmaster. I believe it has both pros and cons.
Pro: Application developers can implement synchronization as they find most optimal for them.
Cons: Application developers need to figure out and implement synchronization without much (any) help from the etherlabmaster code.
And getting synchronization right is IMHO something that is often tricky (at least for many developers).
Generally agree, but RTAI can't use EtherCAT master locking in hard realtime, so it needs an alternate solution anyway. And the vanilla Etherlab design is to use application level locking (except for the ecrt_master_callbacks() call in ec_ioctl_activate() which I suspect snuck through).
Post by Esben Haabendal
Post by Graeme Foot
Post by Esben Haabendal
ecrt_master_send()
ecrt_master_receive()
ecrt_domain_queue()
So we really do need to do locking in at least these 3 when using EoE.
Or maybe the whole master->datagram_queue should be refactored to
something that can be synchronized safely in a lock-free way?
The point of the callback functions is so that you can make sure your
application does not call any of ecrt_master_receive(),
ecrt_master_send(), ecrt_master_send_ext(), ecrt_domain_process(),
ecrt_domain_queue() for a given master at the same time. If you don't
use EoE you don't need the callbacks, but you still need to protect
ecrt_master_receive(), ecrt_master_send(), ecrt_domain_process(), ecrt_domain_queue().
Which I personally see as a trivial omission in the etherlabmaster design (requiring all API users to spend time writing the same code to protect the API from itself).
Not a trivial omission, as locking can't be done in the master for RTAI hard realtime calls, so it needs an alternative. Also I suspect most simple applications only require one send/receive loop, so don't require locking anyway. I've only needed to start adding locks since starting to use EoE.
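As a concrete sketch of that application level locking for a plain (non-RTAI) user-space process with multiple cyclic threads, assuming a single mutex is acceptable; the names io_lock and locked_cycle are illustrative:

    #include <pthread.h>
    #include "ecrt.h"

    static pthread_mutex_t io_lock = PTHREAD_MUTEX_INITIALIZER;

    void locked_cycle(ec_master_t *master, ec_domain_t *domain)
    {
        pthread_mutex_lock(&io_lock);
        ecrt_master_receive(master);
        ecrt_domain_process(domain);
        pthread_mutex_unlock(&io_lock);

        /* process data work happens outside the lock */

        pthread_mutex_lock(&io_lock);
        ecrt_domain_queue(domain);
        ecrt_master_send(master);
        pthread_mutex_unlock(&io_lock);
    }

Each cyclic thread calls locked_cycle() with its own domain, so the four calls never overlap per master.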
Post by Esben Haabendal
Post by Graeme Foot
Post by Esben Haabendal
Post by Graeme Foot
And the reason it uses callbacks is because only your application
knows if it is appropriate to allow the EOE thread to call
ecrt_master_receive() and ecrt_master_send_ext() etc. at any
particular time.
I see what you mean. But I just don't see how it is possible to
accomplish this for user space applications.
Exactly why I created my patch "0002-eoe-via-rtdm.patch".
And again, which does not extend to non-RTDM user-space.
Agreed, but it could be extended; that would require application level locks for those that don't currently use them (for non-RTDM user-space apps).
Post by Esben Haabendal
Post by Graeme Foot
Post by Esben Haabendal
Post by Graeme Foot
However, if your application has multiple send/receive loops then
they need to be synchronized with each other (see next comment).
There are a few more functions such as the distributed clock calls
that are also in the send/receive loops. They also do not require
ethercat level locks as they should be safe to call between the
activate and deactivate. They also do not need application level
locking as they should really only be called by one send/receive
loop per master. All other ethercat function calls should not be in a hard realtime context.
Ok. But you imply a strict rule for how to design applications doing
EtherCAT communication: "There can only be one send/receive loop per
master".
That might not be (and in my case is not) accepted by all application developers.
I need to continue support for applications that use multiple
send/receive loops for multiple EtherCAT domains, each with different cycle time.
One solution could be to create a single send/receive loop per
master, and then let the multiple application loops communicate with
that instead of directly with the master. That might actually be a
really good idea. But, if that is done, I think it should fit very
well as a general feature of the EtherCAT master. It should be
possible to implement without any application specific code.
That's not what I said. You can have multiple send/receive loops per
master, but it is up to your application to make sure they don't call
any of the ecrt functions I listed at the top at the same time (per
master). If your application is a single process with multiple
threads you can use an application semaphore (for example), but if it
is multiple processes you will need a named semaphore (or similar) that all of the processes share.
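A sketch of that named-semaphore variant for the multiple-process case; the semaphore name "/ecat0_io" is arbitrary, it only matters that every cooperating process opens the same one:

    #include <fcntl.h>
    #include <semaphore.h>
    #include "ecrt.h"

    /* opened once per process, e.g. during initialisation:
       sem_t *io_sem = sem_open("/ecat0_io", O_CREAT, 0660, 1); */
    extern sem_t *io_sem;

    void send_receive(ec_master_t *master, ec_domain_t *domain)
    {
        sem_wait(io_sem);              /* excludes the other processes */
        ecrt_master_receive(master);
        ecrt_domain_process(domain);
        ecrt_domain_queue(domain);
        ecrt_master_send(master);
        sem_post(io_sem);
    }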
For the single process, multiple threads application, it is not that bad. But it requires all such application developers to do the same thing over and over again, wasting time and introducing the same bugs again and again.
The multiple process world can be different though. You basically end up with a new EtherCAT API. A combination of the etherlabmaster API and a custom named semaphore API. Without this, applications will not work properly together. Why not include such a feature directly in etherlabmaster? Without it, I think we are making the user-space applications (non-RTDM) into a second-class citizen.
It might not matter to you, but as this is how we use etherlabmaster, it matters to me.
It always matters, and that's why the patch is optional.

For RTAI, application level locks are the only allowable option. For non-RTAI apps either approach is possible, but most would probably prefer the built-in EtherCAT master level locks (until they aren't powerful enough anymore).
Post by Esben Haabendal
Post by Graeme Foot
Post by Esben Haabendal
The final "problem" is as you say a design question. Should
etherlabmaster support synchronization of access from multiple
threads/processes, or should this be left entirely to the application
developer. But if
master->datagram_queue is changed to something lock-free, this "problem"
master->will
likely be solved also.
Post by Graeme Foot
Post by Esben Haabendal
Post by Graeme Foot
However, if they are being called from multiple processes (rather than
threads) then you need to use something like a named semaphore
so that all processes share the same lock. Of course if you are
using callbacks (for EoE) you are probably doing that anyway.
We easily have multiple processes. Even with just a
single application, you have EtherCAT-OP, EtherCAT-EoE and the application,
all using some of the same shared data structures. Throwing in more
applications just adds to that, but I think it is critical
enough with just a single application.
Even if you have multiple processes (rather than threads), it is
your design decision as to which process takes priority and whether
a particular send/receive loop should wait a bit even if it could
get the lock now. You can only do that if your application
controls the locking of these functions.
While that sounds reasonable, I believe it is not entirely true. You
could write a layer responsible for handling this, which gets
information from the
application(s) about how to deal with it. Think of it like a kind of
scheduler.
Example from an application I have seen. You have two applications
(processes), each driving its own EtherCAT domain. One
application/domain is running every 2 ms, the other
application/domain every 10 ms. In addition, EoE slaves are also
used. For each application, you could specify the priority, the
cycle time and the start time. EoE should also be given a priority.
This EtherCAT "scheduler" would then be able to decide which EtherCAT communication to do and when.
So while I agree that it might be most obvious to implement the
decision of when to send/receive directly in the application, I
think it could also be implemented in a more generic way.
You don't need the EtherCAT master to do that scheduling. That is
already available in Linux / RTAI or whatever you use.
What do you mean?
In Linux you can give processes various priorities (via their nice values). For RTAI you can also give each task a priority, but I think it is stricter than standard Linux, where higher priority tasks will fully run before lower priority tasks. If you have a single CPU (or a single CPU allocated to all your EtherCAT processes) then the higher priority process will get more CPU and be woken up before a lower priority process if they are both waiting on a lock. Even with multiple CPUs the higher priority task should always get a lock before a lower priority task.

So you don't need the EtherCAT master to have a scheduler on top of this.
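For example, on plain Linux the two loops from the 2 ms / 10 ms scenario above could simply be given different SCHED_FIFO priorities; the thread names and priority values here are illustrative:

    #include <pthread.h>
    #include <sched.h>

    static int set_rt_priority(pthread_t thread, int prio)
    {
        struct sched_param param = { .sched_priority = prio };
        return pthread_setschedparam(thread, SCHED_FIFO, &param);
    }

    /* e.g. set_rt_priority(fast_loop_thread, 80);    2 ms cycle
            set_rt_priority(slow_loop_thread, 70);   10 ms cycle
       When both block on the shared lock, the higher-priority waiter
       is generally woken first. */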
Post by Esben Haabendal
Post by Graeme Foot
But once one of the above ecrt function calls is in progress you need
to ensure no other process / thread will call one at the same time.
Yes, that is your preferred design (that **I** need to ensure it). I prefer a design where the master does that for me (or rather, for the application developers using our platform).
For RTAI it's not preferred, it's required. You can't use Linux locks in RTAI hard realtime calls.
Post by Esben Haabendal
Post by Graeme Foot
Post by Esben Haabendal
Post by Graeme Foot
Post by Esben Haabendal
Post by Graeme Foot
They should probably also be turned off for RTAI / Xenomai in
general and as I said above use application level locking.
You can pass --enable-rtdm when compiling to enable RTDM (and
compile rtdm-ioctl.o).
Passing --enable-rtdm to ./configure will define the EC_RTDM macro and
enable the automake conditional ENABLE_RTDM.
This will trigger Kbuild to build rtdm-ioctl.o (from
rtdm-ioctl.c), and this will be done with EC_IOCTL_RTDM macro defined.
But ioctl.c will be compiled as always, and that is without
EC_IOCTL_RTDM defined. Was it supposed to be defined? If so, it
should be easy to fix, but someone should definitely do some real
testing of it.
I think the locks should be disabled in both "rtdm-ioctl.c" and
"ioctl.c" if using RTAI, but I use RTDM so haven't confirmed this.
I don't see how it could ever be disabled for ioctl.c with the current code.
By not using the patch that added them.
Of course. But back to the same discussion again.
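For context, the wrapper pair behaves roughly as below when EC_IOCTL_RTDM is defined; this is reconstructed from the patch description, so the exact definitions in base/0017 may differ:

    #ifdef EC_IOCTL_RTDM
    /* compiled out for RTDM: no Linux locking in the ioctl path */
    #define ec_ioctl_lock_down_interruptible(sem) (0)
    #define ec_ioctl_lock_up(sem) do {} while (0)
    #else
    #define ec_ioctl_lock_down_interruptible(sem) \
            ec_lock_down_interruptible(sem)
    #define ec_ioctl_lock_up(sem) ec_lock_up(sem)
    #endif

    /* ioctl.c is built without EC_IOCTL_RTDM, so there the wrappers
       always expand to the real locks; only rtdm-ioctl.c is built
       with the macro defined. */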
Post by Graeme Foot
Post by Esben Haabendal
Post by Graeme Foot
Further to that I don't think they should be there at all. Simple
applications have one send/receive loop so don't need locks.
Applications with multiple send/receive loops or EOE need to
control their own locking for optimal results anyway.
Again, I don't fully agree with you on that.
But more importantly, it is not possible for user-space
applications in combination with EoE, due to the inability to set application callbacks.
Post by Graeme Foot
Also, having these lock functions use master_sem, the send/receive
functions block unnecessarily against non-realtime ethercat function
calls. At a minimum they should be locking on their own semaphore.
True. I have patches for fixing that. The master_sem is definitely
a no-go for real-time.
- base/0017-Master-locks-to-avoid-corrupted-datagram-queue.patch
- base/0018-Use-call-back-functions.patch
Well, that depends on how you look at it. Removing those patches removes the problem from etherlabmaster. But then I just have to do exactly the same thing on top of etherlabmaster, and I will be back to square one.
Post by Graeme Foot
My new EoE patch rolls back the changes from
"base/0018-Use-call-back-functions.patch" and I use EC_IOCTL_RTDM so
ignore the changes in
"base/0017-Master-locks-to-avoid-corrupted-datagram-queue.patch".
Ok. I think you could have been a bit more explicit about that roll-back.
/Esben
Regards,
Graeme.
Graeme Foot
2018-03-06 21:52:30 UTC
Permalink
Hi Gavin,

I have attached updated eoe patches to maintain compatibility with your patchset for non-RTAI / RTDM applications.

Patch 0001:
This will now continue with the existing behaviour of auto-create and auto-delete of eoe ports as the eoe slaves are detected and removed. It does contain the fix for removing the lock from the transmit callback so eoe users will probably want this update even if they want to keep existing behaviour.

Patch 0002:
Still only targeted at RTAI / RTDM users, but will now maintain existing behaviour (internal locking via the internal callbacks) for Linux user space applications.


Regards,
Graeme.

Graeme Foot
2018-05-07 04:16:49 UTC
Permalink
Hi,

I found a couple of bugs in my patches. New patches attached.

Graeme.



Gavin Lambert
2018-03-16 01:03:54 UTC
Permalink
Post by Esben Haabendal
The multiple process world can be different though. You basically end up
with a new EtherCAT API. A combination of the etherlabmaster API and a
custom named semaphore API. Without this, applications will not work
properly together. Why not include such a feature directly in
etherlabmaster? Without it, I think we are making the user-space
applications (non-RTDM) into a second-class citizen.
If you have multiple independent processes then they must (of necessity) either operate on separate masters (in which case no locking beyond what the kernel already does is required) or they must all communicate (through some mechanism of your own devising) with a single process that "owns" the master. The master library does not allow you to reserve or activate a single master concurrently in separate processes.

If you have multiple tasks within a single process operating on the same master (eg. multiple cycles with different intervals) then they _should_ operate on different domains, and *must* coordinate their calls to the ECRT APIs in some fashion. In upstream Etherlab, this requires application-level locking. In the current patchset, the locking is done for you (except for RTDM), but I'm not entirely convinced this is the correct design choice (see my other reply).

If you have one process that is running a realtime application loop and another process that only performs non-realtime tasks (eg. injecting CoE requests), even on the same master, then all versions of Etherlab handle this for you without requiring additional locking. This is how you can still use the "ethercat" command line tool while running a realtime application.