Discussion: [etherlab-dev] Pre-announcement: unofficial patchset update (Gavin Lambert)
Gavin Lambert
2017-05-05 04:44:58 UTC
Post by Graeme Foot
2) Your 0054 patch successfully moved the slave SDO request processing from
the master FSM's idle state to be called every cycle of the master thread.
However, I have my Linux environment set to run at 100Hz, so this thread
would only fire once every 10ms, and each SDO read/write request would take
approx. 30ms to complete. I didn't want to change Linux to run at 1000Hz, so
I created the patch
"etherlabmaster-0010-allow_app_to_process_sdo_requests_from_realtime.patch".
Maybe I'm missing something about how the RTAI version works, but all I did
was to move the logic from inside fsm_master to inside fsm_slave; AFAIK both
of these execute on the same thread (either the master IDLE or master
OPERATION thread). This is independent of the application realtime loop
either way, so this should not have changed how frequently they run; it just
allows multiple slave requests to run in parallel rather than forcing them
to execute sequentially, so it should be a net performance gain.
Though it does seem reasonable in your use case to want to run the slave
FSMs more often. I don't see how you could do that safely, though -- you
can't run ec_master_exec_slave_fsms outside of the master_sem lock, and
your RTAI task might interrupt Linux while it's still holding that lock,
which will deadlock your RTAI task.
Are you configuring with --enable-hrtimer? After having a quick look at the
code, I can see a potential performance degradation from the patch if you
aren't using it (or did enable it but didn't call
ec_master_set_send_interval with an appropriate value after activating the
master). Perhaps you could try enabling that instead of using your patch
and see if it helps?
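For reference, this is roughly what I have in mind (a sketch only, not
tested; I'm assuming the application-level wrapper in your tree is called
ecrt_master_set_send_interval() and takes the interval in microseconds --
check ecrt.h, since only the internal ec_master_set_send_interval() is
mentioned above):

    /* Build the master with ./configure --enable-hrtimer, then tell it the
     * application's cycle time so the idle/operation thread paces itself
     * accordingly.  Sketch only; wrapper name and units as assumed above. */
    #include "ecrt.h"

    static ec_master_t *master;

    int init_master(void)
    {
        master = ecrt_request_master(0);        /* master index 0 */
        if (!master)
            return -1;

        /* ... create domains, configure slaves, register PDO entries ... */

        if (ecrt_master_activate(master))
            return -1;

        /* 10 ms application cycle, in microseconds (assumed units). */
        ecrt_master_set_send_interval(master, 10000);
        return 0;
    }
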
Graeme Foot
2017-05-07 08:32:55 UTC
Post by Gavin Lambert
Post by Graeme Foot
2) Your 0054 patch successfully moved the slave SDO request processing from
the master FSM's idle state to be called every cycle of the master thread.
However, I have my Linux environment set to run at 100Hz, so this thread
would only fire once every 10ms, and each SDO read/write request would take
approx. 30ms to complete. I didn't want to change Linux to run at 1000Hz, so
I created the patch
"etherlabmaster-0010-allow_app_to_process_sdo_requests_from_realtime.patch".
Maybe I'm missing something about how the RTAI version works, but all I did
was to move the logic from inside fsm_master to inside fsm_slave; AFAIK both
of these execute on the same thread (either the master IDLE or master
OPERATION thread). This is independent of the application realtime loop
either way, so this should not have changed how frequently they run; it just
allows multiple slave requests to run in parallel rather than forcing them
to execute sequentially, so it should be a net performance gain.
Though it does seem reasonable in your use case to want to run the slave
FSMs more often. I don't see how you could do that safely, though -- you
can't run ec_master_exec_slave_fsms outside of the master_sem lock, and
your RTAI task might interrupt Linux while it's still holding that lock,
which will deadlock your RTAI task.
Are you configuring with --enable-hrtimer? After having a quick look at the
code, I can see a potential performance degradation from the patch if you
aren't using it (or did enable it but didn't call
ec_master_set_send_interval with an appropriate value after activating the
master). Perhaps you could try enabling that instead of using your patch
and see if it helps?
Hi,

I don't use ec_master_set_send_interval() or --enable-hrtimer, since my master's operation thread has always run at 10ms (100Hz) anyway. (I probably should, so I will look at adding that at some stage.) My Linux kernel is configured to run at 100Hz, and the master thread is not realtime, so it is scheduled by the Linux scheduler (RTAI). Because Linux is set to 100Hz, it only runs the master's operation thread once every 10ms.

Prior to your SDO patch, ec_fsm_master_action_process_sdo() was called by the master's FSM, but from its idle state. So it would only be called after all other processing and housekeeping was complete, and it only fired approx. once every 800ms on my setup with 50-odd slaves. After the patch, ec_master_exec_slave_fsms() is called every time the master's operation thread fires. All good, except that on my system that is still only once every 10ms.

So my new patch allows ec_master_exec_slave_fsms() to be called from my realtime context. As you pointed out, the master_sem lock would cause a deadlock, so I don't use it. Because I don't use the lock, I have instead added some flags to track whether it is currently safe to make the ec_master_exec_slave_fsms() call. It's generally just the rescan that's a problem.
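Roughly, the shape of it from the realtime cycle is as follows (a sketch
only; the names here are illustrative, not the actual symbols from my
patch):

    #include "ecrt.h"

    static ec_master_t *master;    /* from ecrt_request_master() */
    static ec_domain_t *domain;    /* from ecrt_master_create_domain() */

    /* Called every RTAI cycle.  The master thread clears a "safe" flag
     * around rescans and other unsafe periods; the slave FSMs are only
     * stepped while it is set.  slave_fsm_exec_allowed() and
     * ecrt_master_exec_slave_requests() are placeholder names. */
    void app_cyclic_task(void)
    {
        ecrt_master_receive(master);
        ecrt_domain_process(domain);

        /* ... exchange process data ... */

        if (slave_fsm_exec_allowed(master))
            ecrt_master_exec_slave_requests(master);  /* wraps
                                             ec_master_exec_slave_fsms() */

        ecrt_domain_queue(domain);
        ecrt_master_send(master);
    }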

I don't know if the patch will be useful for anyone else, but it is useful if Linux is configured for 100Hz. It may also be useful on short-cycle-time systems, e.g. 100-250us cycle times, where you want to process the SDOs faster. Even if Linux is set to 1000Hz it will only schedule the master operation thread every 1ms, and the master thread may also be delayed if the Linux side gets some heavy CPU usage.


Thanks for thinking about it. I like getting feedback on my patches, as it's easy to overlook potential bugs or alternative solutions that might be better.


Graeme.
Gavin Lambert
2017-05-08 00:30:45 UTC
Post by Graeme Foot
I don't use ec_master_set_send_interval() or --enable-hrtimer, since my
master's operation thread has always run at 10ms (100Hz) anyway. (I probably
should, so I will look at adding that at some stage.) My Linux kernel is
configured to run at 100Hz, and the master thread is not realtime, so it is
scheduled by the Linux scheduler (RTAI). Because Linux is set to 100Hz, it
only runs the master's operation thread once every 10ms.
Prior to your SDO patch, ec_fsm_master_action_process_sdo() was called by
the master's FSM, but from its idle state. So it would only be called after
all other processing and housekeeping was complete, and it only fired
approx. once every 800ms on my setup with 50-odd slaves. After the patch,
ec_master_exec_slave_fsms() is called every time the master's operation
thread fires. All good, except that on my system that is still only once
every 10ms.
Right, but what I was saying is that prior to my patch it would actually
service the requests faster than 10ms even on your system due to the way the
re-scheduling is done (the master isn't idle until it finishes any
outstanding requests, so it calls schedule() instead of schedule_timeout(1)
-- i.e. if there's no other work for the kernel to do, it will reschedule
immediately instead of waiting for the next 10ms time slice). After my
patch the master is idle even while requests are in progress, so the
condition it checks is no longer sufficient and it calls schedule_timeout(1)
too soon, making it slower than it should be.

I didn't notice this regression because I *am* using --enable-hrtimer, which
does not have the same issue. So what I was suggesting is that *you* try
using --enable-hrtimer, which I'm reasonably certain will solve that
performance issue without needing to try to exec slave FSMs from the
realtime context.

If you can't use --enable-hrtimer for some reason, then the most likely
solution to the above issue is to find the two lines where it checks for
ec_fsm_master_idle (in ec_master_idle_thread and ec_master_operation_thread)
and change the condition from this:

if (ec_fsm_master_idle(&master->fsm)) {

to this:

if (ec_fsm_master_idle(&master->fsm)
&& !master->fsm_exec_count) {

This is just air code and I haven't tested it, but it seems reasonably
likely to solve the issue and restore performance (without --enable-hrtimer)
to pre-patch levels or better. There is a risk that it will make the kernel
do some busy-waiting in some cases, though that shouldn't bother an RTAI
application.
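For context, the tail of the thread loop in the non-hrtimer build looks
roughly like this with the change applied (from memory and untested, so
treat it as a sketch; the exact code differs between versions):

    /* End of ec_master_operation_thread() / ec_master_idle_thread() when
     * built without --enable-hrtimer (sketch from memory). */
    if (ec_fsm_master_idle(&master->fsm)
            && !master->fsm_exec_count) {      /* <-- proposed extra check */
        /* Nothing outstanding: sleep for a jiffy (10ms at HZ=100). */
        set_current_state(TASK_INTERRUPTIBLE);
        schedule_timeout(1);
    } else {
        /* Work still pending: yield, but run again as soon as possible. */
        schedule();
    }
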
Post by Graeme Foot
So my new patch allows ec_master_exec_slave_fsms() to be called from my
realtime context. As you pointed out, the master_sem lock would cause a
deadlock, so I don't use it. Because I don't use the lock, I have instead
added some flags to track whether it is currently safe to make the
ec_master_exec_slave_fsms() call. It's generally just the rescan that's a
problem.

I haven't looked at your patch in detail, but it makes me nervous to pull
code outside of a lock like that; there are a lot of data structures that it
protects, and some of them might be more subtle than rescan. Also, while
this probably isn't a problem with RTAI (since it can pre-empt the Linux
kernel), this API probably would be unsafe to use with regular kernel or
userspace code due to the inverse problem -- what if the app code is in the
middle of executing ec_master_exec_slave_fsms() when the master thread
decides to start a rescan (or otherwise mutate data structures it depends
on)?
Post by Graeme Foot
I don't know if the patch will be useful for anyone else, but it is useful
if Linux is configured for 100Hz. It may also be useful on short-cycle-time
systems, e.g. 100-250us cycle times, where you want to process the SDOs
faster. Even if Linux is set to 1000Hz it will only schedule the master
operation thread every 1ms, and the master thread may also be delayed if
the Linux side gets some heavy CPU usage.
SDOs by design are intended to be slower-than-cycle tasks. They're for
occasional configuration, diagnostic, or slow acyclic tasks, not for rapid
activity, so if you're trying to get 1ms or higher response rates out of
them, you're probably doing it wrong. (Recommended timeouts for SDO tasks
are generally measured in *seconds*, not milliseconds.)
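For comparison, the usual asynchronous pattern is to create the request once
and poll it from the cyclic task with a deliberately generous timeout,
something like this (sketch only; sc is an ec_slave_config_t* you already
have, and 0x6041:00 is just an example object):

    /* Configuration time: one request object per SDO, with a timeout
     * measured in seconds rather than milliseconds. */
    static ec_sdo_request_t *req;
    static uint16_t value;

    req = ecrt_slave_config_create_sdo_request(sc, 0x6041, 0x00, 2);
    ecrt_sdo_request_timeout(req, 5000);        /* 5000 ms = 5 s */

    /* Cyclic task: poll the state; the transfer completes whenever the
     * slave FSM gets around to it. */
    switch (ecrt_sdo_request_state(req)) {
    case EC_REQUEST_UNUSED:
        ecrt_sdo_request_read(req);             /* start the first read */
        break;
    case EC_REQUEST_BUSY:
        break;                                  /* check again next cycle */
    case EC_REQUEST_SUCCESS:
        value = EC_READ_U16(ecrt_sdo_request_data(req));
        ecrt_sdo_request_read(req);             /* queue the next occasional read */
        break;
    case EC_REQUEST_ERROR:
        /* log it and retry later */
        ecrt_sdo_request_read(req);
        break;
    }

With that pattern a 10ms slave-FSM servicing interval is rarely the
bottleneck.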
