Discussion:
[etherlab-dev] Possible Realtime Issues with Ethercat Master and RT Preempt Kernel
Dr.-Ing. Matthias Schöpfer
2016-01-26 13:22:16 UTC
Hi!

We started using etherlab/ethercat and are quite impressed. Nice work!

We are running Linux with an RT_PREEMPT kernel and the e1000e EtherCAT
driver. We have to run at a cycle time of 1 ms, and the jitter from
clock_nanosleep is about 15 microseconds at most.
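
A minimal sketch of such a 1 ms cycle using clock_nanosleep with an
absolute deadline (illustrative only; do_cycle() stands in for the real
EtherCAT work):

#define _POSIX_C_SOURCE 200112L
#include <time.h>

#define CYCLE_NS 1000000L        /* 1 ms cycle time */

void do_cycle(void);             /* placeholder for the real-time work */

void cyclic_task(void)
{
    struct timespec wakeup;

    clock_gettime(CLOCK_MONOTONIC, &wakeup);

    for (;;) {
        /* advance the absolute deadline by one cycle */
        wakeup.tv_nsec += CYCLE_NS;
        if (wakeup.tv_nsec >= 1000000000L) {
            wakeup.tv_nsec -= 1000000000L;
            wakeup.tv_sec++;
        }

        /* sleep to the absolute deadline; the measured jitter is the
         * deviation of the actual wakeup from this deadline */
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &wakeup, NULL);

        do_cycle();
    }
}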

Nevertheless, from time to time we see messages like these:

EtherCAT WARNING 0: 2 datagrams UNMATCHED!
EtherCAT 0: Domain 0: Working counter changed to 9/9.
EtherCAT 0: Domain 0: Working counter changed to 0/9.

This happens especially when we apply load to the system. From previous
projects, I have seen these effects when an IRQ/kernel thread was not set
to an appropriate RT priority level.

My question: has anybody experienced similar problems, and would it be
worth investigating? And if I decide to patch the kernel module, where
would be a good starting point?

Thanks and regards,

Matthias Schöpfer
--
Dr. Matthias Schöpfer
mz robolab GmbH
Marie-Curie-Str. 1
53359 Rheinbach

Office: +49 2226 83600 00
Fax: +49 2226 83600 11
Email: ***@robolab.de

mz robolab GmbH
Vertretungsberechtigte Geschäftsführer: Dr. Rüdiger Maaß, Ralf Schulte
Registergericht Amtsgericht Bonn
Registernummer HRB 10595
Thomas Winding
2016-02-01 08:07:43 UTC
Hi Matthias Schöpfer,

I have seen the same problem. I am also running at 1 ms and using clock_nanosleep.

I suspect that the problem arises when you send a new telegram before you have received the previously sent telegram.

Which version of the EtherCAT master are you using?
Which kernel version are you using?

Best regards,

Thomas Winding

Dr.-Ing. Matthias Schöpfer
2016-02-01 08:26:41 UTC
Hi Thomas Winding,

thanks for your answer!

We are running kernel 3.12.31-rt45, since the newer kernel versions do
not seem to run as reliably with regard to real time, at least on our hardware.

The EtherCAT master is from the Mercurial repository, and we use the e1000e driver.

We may have solved most of our problems: we had an issue with our cycle
time / calculations where our cycle sometimes lasted 600-800
microseconds, which seems to be way too long. We are now down to < 220
microseconds.

In rare cases, when we start the software, we still constantly run into
the mentioned issues. When we stop and restart, everything is fine. I
wonder if we sometimes hit unfortunate timing. Lately I have not been
able to reproduce it.

Regarding your suspicion: I wonder if it would be better to sync the
transmits to the 1 ms cycle instead of the receives?

Regards,

Matthias Schöpfer
Armin Steinhoff
2016-02-01 10:14:30 UTC
Hi Dr. Schöpfer,

please have a look at the test rack at the OSADL homepage and you will
see that the same release of PREEMPT_RT performs very differently on
different motherboards.

IMHO, the most important issue is the SMI interrupt and the execution of
the motherboard's low-level firmware that it triggers.
A transaction on the PCI bus using DMA transfers can also increase the
response time on a single core.

Best Regards

Armin

http://www.steinhoff-automation.com
Dr.-Ing. Matthias Schöpfer
2016-02-01 13:01:16 UTC
Hi Armin,

thanks for your message. We are using this kernel version successfully in
some other realtime setups. cyclictest and hwlatdetect show that the
system is behaving as one would expect.

In these setups we usually had a hardware interrupt, and it was usually
necessary to boost the priority of the threaded interrupt to be totally
safe even in situations of high (I/O) load.

So my question is whether it could be a fruitful endeavor to look into the
etherlab kernel modules and set the priority for EtherCAT-related
tasks/threads. Looking from the outside, I see an EtherCAT-OP thread,
which I can easily put at RT priority (and did), but it seems that most of
the work is done in ksoftirqd, which as far as I know is of general use.

I have not looked into the sources, and I am not 100% sure that this is
the source of some of the problems we see here. Hence my question: has
anyone looked into it? If it is just a matter of dispatching the work to a
dedicated kernel thread instead of ksoftirqd, the required changes would
possibly be small...

Thanks and Regards,

Matthias
Martin Troxler
2016-02-01 13:45:01 UTC
Hi Matthias

The e1000e Tx/Rx IRQ threads must have a priority that is higher than
the priority of the realtime thread (the one that calls
ecrt_master_send/receive), and all interrupt throttling done by the e1000e
must be disabled. The priority of the EtherCAT-OP thread is not really a
problem.
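
A minimal sketch of the priority layout on the application side (the
values are illustrative; the irq/...-eth Tx/Rx kernel threads would
typically be raised above this with chrt from user space):

#include <pthread.h>
#include <sched.h>

/* Illustrative priorities: e1000e IRQ threads above the cyclic EtherCAT
 * thread, which in turn is above all non-realtime work. */
#define PRIO_CYCLIC_TASK  80   /* thread calling ecrt_master_send/receive;
                                  IRQ threads e.g. at 90 via chrt -f -p 90 <pid> */

int make_realtime(void)
{
    struct sched_param param = { .sched_priority = PRIO_CYCLIC_TASK };

    /* SCHED_FIFO so the cyclic task preempts normal tasks and ksoftirqd */
    return pthread_setschedparam(pthread_self(), SCHED_FIFO, &param);
}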

But we had a problem on some e1000e revisions when a network statistics
watch was running (e.g. gnome-system-monitor, or):

  watch -n 0.1 cat /sys/class/net/eth1/statistics/*

To record a trace:

  sudo trace-cmd record -e sched -e 'irq_handler_*' -p function -l 'ec_*' -l 'ecrt_*' -l 'e1000*' -b 16384 -s 250 -r 90

Then watch the trace (use thread filters and zoom) with:

  kernelshark

BTW, our machines use a 250 microsecond cycle time!

Regards
Martin
Graeme Foot
2016-02-02 23:35:58 UTC
Hi,

Just a thought for the day: how many EtherCAT slaves do you have, and what is the last slave's "DC system time transmission delay"?

Use the command "ethercat slaves -v" and check the last slave's "DC system time transmission delay" value. If you have a linear topology, the complete wire transmission time (master -> slaves -> master) will be approximately twice this value, plus some overhead for the network card driver to process the sending and receiving.

If you use the traditional cycle of:
- cycle sleep
- master_receive
- domain_process
- calc
- domain_queue
- master_send
- cycle sleep
- ...

Then you only have the sleep time available for the EtherCAT frames to go out and return. If the calc time takes too long on a particular cycle then there may not be enough time for the frames to return and be ready.
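
In terms of the ecrt_* calls, that traditional cycle corresponds roughly
to the following sketch (wait_for_next_period() and calc() are
placeholders for the application side):

#include <ecrt.h>

void wait_for_next_period(void);   /* e.g. clock_nanosleep, TIMER_ABSTIME */
void calc(void);                   /* application calculations */

void classic_cycle(ec_master_t *master, ec_domain_t *domain)
{
    wait_for_next_period();

    ecrt_master_receive(master);   /* fetch the frames sent last cycle */
    ecrt_domain_process(domain);   /* check working counters, update inputs */

    calc();                        /* read inputs, compute new outputs */

    ecrt_domain_queue(domain);     /* queue the domain datagrams again */
    ecrt_master_send(master);      /* put the frames on the wire */
}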


Another problem is that if you have a widely varying calc time, you get a widely varying send time (and slave read time). If you are using distributed clocks this may not be a problem (unless you miss the DC window), but without DC synchronization you can get some variance in when the IO on the slaves is switched (or when a motor gets a new position target).

The way I get around this is to use the following sequence:
- cycle sleep
- master_receive
- domain_process
- write cached values to domains
- domain_queue
- master_send
- calc - writing to cached values
- cycle sleep
- ...

The drawbacks of the above are that you need to calc and write values to a cached location, else the master_receive / domain_process calls will overwrite your calculated values with the returned packets' data. It also adds an extra cycle before read data is available. You also have extra time overhead in managing the cached values and writing them before the send. The upside is that you have a consistent send time and nearly all of the scan period available for the frames to be sent and received, allowing you to perform the calculations in parallel. Because of this you can also potentially reduce the overall cycle time, negating the problem of the reads being delayed by an extra cycle.
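
A sketch of that send-first variant (cached_outputs, OUTPUT_SIZE,
wait_for_next_period() and calc() are placeholders; it assumes the copied
region covers only output data, e.g. an output-only domain, so freshly
received input data is not overwritten):

#include <stdint.h>
#include <string.h>
#include <ecrt.h>

#define OUTPUT_SIZE 64                     /* illustrative size of the output area */

uint8_t cached_outputs[OUTPUT_SIZE];       /* calc() writes here via EC_WRITE_* */

void wait_for_next_period(void);
void calc(void);                           /* writes next outputs to cached_outputs */

void send_first_cycle(ec_master_t *master, ec_domain_t *domain)
{
    uint8_t *pd = ecrt_domain_data(domain);

    wait_for_next_period();

    ecrt_master_receive(master);
    ecrt_domain_process(domain);           /* inputs are now up to date */

    memcpy(pd, cached_outputs, OUTPUT_SIZE);  /* outputs computed last cycle */

    ecrt_domain_queue(domain);
    ecrt_master_send(master);              /* consistent send time every cycle */

    calc();                                /* runs while the frames are on the wire */
}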


Regards,
Graeme.


Tillman, Scott
2016-02-03 08:01:46 UTC
Hi Graeme,

Since you brought up the typical process cycle: I have been using a process similar to the second one you describe. During my initial development I was very surprised that the output frame and the return frame are overlaid, requiring double buffering of the output data. It seems like you should be able to configure the domain to place the return data in a separate (possibly neighboring) memory area. As it is, the double buffering is the same idea, but it causes an extra memcpy just prior to sending the domain data.

More problematic is the absence of any way to block (in user space) waiting for the domain's return packet. As it is, I am setting up my clock at 0.5 ms to handle a 1 ms frame time:

- sleep to output time for frame N
- memcpy buffered output states to domain frame
- ecrt_domain_queue and ecrt_master_send
- pre-calculate data for next output cycle (like servo trajectories which don't need immediate input data)
- sleep to input time for frame N return data
- trigger computations requiring latest input data (servo loop closure for instance)
- notify lower priority tasks of input changes (sensor reaction logics)
- sleep to output time for frame N+1

This is working well and has a near-constant frame transmit delay, minimizing output frame jitter. It would be nice to have some kind of blocking mechanism where I could just set up a thread to block waiting for the next return frame rather than assuming a return time of, at most, half our actual cycle time.
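
A sketch of that split-phase timing (output_cache, OUTPUT_SIZE,
timespec_add_ns() and the three calculation hooks are illustrative
placeholders; the memcpy assumes the cache covers only the output area of
the domain):

#define _POSIX_C_SOURCE 200112L
#include <stdint.h>
#include <string.h>
#include <time.h>
#include <ecrt.h>

#define HALF_PERIOD_NS 500000L             /* 0.5 ms: two wakeups per 1 ms frame */
#define OUTPUT_SIZE 64                     /* illustrative */

extern uint8_t output_cache[];             /* filled by the calculation hooks */

void timespec_add_ns(struct timespec *t, long ns);
void precalc(void);                        /* work not needing fresh inputs  */
void close_loops(void);                    /* work needing the latest inputs */
void notify(void);                         /* wake lower-priority tasks      */

void split_cycle(ec_master_t *master, ec_domain_t *domain, struct timespec *t)
{
    /* output time for frame N */
    clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, t, NULL);
    memcpy(ecrt_domain_data(domain), output_cache, OUTPUT_SIZE);
    ecrt_domain_queue(domain);
    ecrt_master_send(master);
    precalc();

    /* input time for frame N, half a period later */
    timespec_add_ns(t, HALF_PERIOD_NS);
    clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, t, NULL);
    ecrt_master_receive(master);
    ecrt_domain_process(domain);
    close_loops();
    notify();

    /* the next wakeup is the output time of frame N+1 */
    timespec_add_ns(t, HALF_PERIOD_NS);
}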

Are these two things there somewhere and I've just missed them, or is there a good reason they haven't been implemented? It seems like these two items would minimize the overhead and maximize the processing time available for most applications.

-Scott Tillman

Graeme Foot
2016-02-03 22:18:54 UTC
Hi,

I saw some patches sent to the forum a while ago regarding adding a call to check whether the return packets have been received yet. I am not sure if it was a blocking call or one that you had to poll. Search for "Knowing when the packet has finished cycle" and "Waiting for network receive".

One of the other process cycle options is to split read PDOs and write PDOs into two different domains. Some of the recent Beckhoff TwinCAT literature describes how this can be useful in reducing response times.
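
For example, the read and write entries of a slave could be registered in
two separate domains roughly like this (the PDO entry indices 0x7000:01 /
0x6000:01 are illustrative):

#include <ecrt.h>

/* Put the write (RxPDO) entry and the read (TxPDO) entry of one slave into
 * two separate domains so they are exchanged in separate datagrams. */
int setup_domains(ec_master_t *master, ec_slave_config_t *sc,
                  ec_domain_t **domain_out, ec_domain_t **domain_in)
{
    int off_out, off_in;

    *domain_out = ecrt_master_create_domain(master);   /* outputs to slaves  */
    *domain_in  = ecrt_master_create_domain(master);   /* inputs from slaves */
    if (!*domain_out || !*domain_in)
        return -1;

    off_out = ecrt_slave_config_reg_pdo_entry(sc, 0x7000, 0x01, *domain_out, NULL);
    off_in  = ecrt_slave_config_reg_pdo_entry(sc, 0x6000, 0x01, *domain_in, NULL);

    return (off_out < 0 || off_in < 0) ? -1 : 0;
}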

Graeme.

Gavin Lambert
2016-02-03 22:48:28 UTC
Post by Tillman, Scott
Since you brought up the typical process cycle: I have been using a process
similar the second one you describe. I was very surprised when I was doing my
initial development that the output frame and the return frame were overlaid,
requiring double buffering of the output data. It seems like you should be able
to configure the domain to place the return data in a separate (possibly
neighboring) memory area. As it is the double buffering is the same idea, but
causes an extra memcpy just prior to sending the domain data.
The expectation is that you'll use the EC_WRITE_* macros to insert values into the domain memory; this takes care of byte-swapping to little-endian for you if you happen to be running on a big-endian machine. You can usually only get away with a blanket memcpy if you know your master code will only ever run on little-endian machines.
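
For example (the offset comes from ecrt_slave_config_reg_pdo_entry() at
configuration time; the function name and the 16-bit value are
illustrative):

#include <stdint.h>
#include <ecrt.h>

/* Write one signed 16-bit output value; EC_WRITE_* takes care of the
 * little-endian byte order expected on the wire. */
void write_target(ec_domain_t *domain, unsigned int offset, int16_t value)
{
    uint8_t *pd = ecrt_domain_data(domain);   /* start of the process data */

    EC_WRITE_S16(pd + offset, value);         /* instead of a raw assignment/memcpy */
}
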
Post by Tillman, Scott
More problematic is the absence of any way to block (in user-space) waiting for
the domain's return packet. As it is I am setting up my clock at 0.5ms to handle
[...]
Post by Tillman, Scott
Are these two things there somewhere and I've just missed them, or is there a
good reason they haven't been implemented? It seems like these two items
would minimize the overhead and maximize the processing time available for
most applications.
There isn't really a way to do that; it's a fundamental design choice of the master. The EtherCAT-custom drivers disable interrupts and operate purely in polled mode in order to reduce the latency of handling an interrupt and subsequent context-switching to a kernel thread and then a user thread. What gets sacrificed along the way is any ability to wake up a thread when the packet arrives, since nothing actually knows that the packet has arrived until polled.

To put it another way, when the datagram arrives back from the slaves, it just sits in the network card's hardware buffer until the buffer read is triggered by an explicit call to ec_master_receive().

The generic drivers have interrupts enabled (so the packets will be immediately read out of the hardware buffer into a kernel buffer) but the master still treats it as a polled device and won't react until explicitly asked to receive.

With some patches (such that ec_master_receive will tell you if it has received all the datagrams back, or similar) you could call this repeatedly (perhaps with short sleeps) shortly after sending the datagrams to detect as soon as they're back again, but obviously this will increase the processor load and give the system less time to do non-realtime things. If you have some idle cores then this may not be a problem, however, and the quicker reaction may be worth it.

Having said that, as long as your calculation time is fairly constant, it's probably better to use the "classic" cycle structure than to do this -- the exact same input values will be read either way, as they're captured at the "input latch time" of the slave, which is typically either just after the last or in anticipation of the next datagram exchange.
Graeme Foot
2016-02-03 23:21:24 UTC
Hi,

Yes, the EC_WRITE_* macros should still be used when writing to the cached write memory, but then a straight memcpy from the cache to the domain memory is fine.

Graeme.


-----Original Message-----
From: Gavin Lambert [mailto:***@compacsort.com]
Sent: Thursday, 4 February 2016 11:48 a.m.
To: 'Tillman, Scott'; Graeme Foot; Dr.-Ing. Matthias Schöpfer; etherlab-***@etherlab.org
Subject: RE: [etherlab-dev] Possible Realtime Issues with Ethercat Master and RT Preempt Kernel
Post by Tillman, Scott
Since you brought up the typical process cycle: I have been using a
process similar the second one you describe. I was very surprised
when I was doing my initial development that the output frame and the
return frame were overlaid, requiring double buffering of the output
data. It seems like you should be able to configure the domain to
place the return data in a separate (possibly
neighboring) memory area. As it is the double buffering is the same
idea, but causes an extra memcpy just prior to sending the domain data.
The expectation is that you'll use the EC_WRITE_* macros to insert values into the domain memory; this takes care of byte-swapping to little-endian for you if you happen to be running on a big-endian machine. You can usually only get away with a blanket memcpy if you know your master code will only ever run on little-endian machines.
Post by Tillman, Scott
More problematic is the absence of any way to block (in user-space)
waiting for the domain's return packet. As it is I am setting up my
[...]
Post by Tillman, Scott
Are these two things there somewhere and I've just missed them, or is
there a good reason they haven't been implemented? It seems like
these two items would minimize the overhead and maximize the
processing time available for most applications.
There isn't really a way to do that; it's a fundamental design choice of the master. The EtherCAT-custom drivers disable interrupts and operate purely in polled mode in order to reduce the latency of handling an interrupt and subsequent context-switching to a kernel thread and then a user thread. What gets sacrificed along the way is any ability to wake up a thread when the packet arrives, since nothing actually knows that the packet has arrived until polled.

To put it another way, when the datagram arrives back from the slaves, it just sits in the network card's hardware buffer until the buffer read is triggered by an explicit call to ec_master_receive().

The generic drivers have interrupts enabled (so the packets will be immediately read out of the hardware buffer into a kernel buffer) but the master still treats it as a polled device and won't react until explicitly asked to receive.

With some patches (such that ec_master_receive will tell you if it has received all the datagrams back, or similar) you could call this repeatedly (perhaps with short sleeps) shortly after sending the datagrams to detect as soon as they're back again, but obviously this will increase the processor load and give the system less time to do non-realtime things. If you have some idle cores then this may not be a problem, however, and the quicker reaction may be worth it.

Having said that, as long as your calculation time is fairly constant, it's probably better to use the "classic" cycle structure than to do this -- the exact same input values will be read either way, as they're captured at the "input latch time" of the slave, which is typically either just after the last or in anticipation of the next datagram exchange.
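(For anyone who still wants to experiment with the repeated-receive idea despite the caveats above, here is a rough sketch against the unpatched API. The poll interval, retry count and the use of the domain working counter as a "frame is back" proxy are all assumptions of this sketch, not behaviour the stock master promises, and the loop does burn CPU on the realtime core as noted.)

#include <time.h>
#include <ecrt.h>

static ec_master_t *master;
static ec_domain_t *domain;

#define POLL_NS   50000L   /* 50 us between polls (illustrative) */
#define MAX_POLLS 10

/* Returns 1 once the domain reports a complete working counter, 0 if the
 * budget runs out; call this shortly after ecrt_master_send(). */
static int wait_for_domain(void)
{
    ec_domain_state_t ds;
    struct timespec gap = { 0, POLL_NS };

    for (int i = 0; i < MAX_POLLS; i++) {
        ecrt_master_receive(master);
        ecrt_domain_process(domain);
        ecrt_domain_state(domain, &ds);
        if (ds.wc_state == EC_WC_COMPLETE)
            return 1;                  /* all expected data exchanged */
        /* relative sleep to give the frame time to come back */
        clock_nanosleep(CLOCK_MONOTONIC, 0, &gap, NULL);
    }
    return 0;
}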
Gavin Lambert
2016-02-03 23:31:04 UTC
Permalink
Well, I guess that would work too, but I was thinking of a different arrangement.

I have the "real" output values stored in scattered memory locations (in an object graph related to their functions; not structured like the domain memory at all) and then the cyclic task uses EC_WRITE_* to copy the individual values from the objects to the domain memory.

It's not really any different from having a secondary cache that can be memcpy'd, I guess, but it "feels" like less copying. (Well, I suppose technically it might be slightly slower when doing the actual copy, but conversely it'd be faster at doing the calculations, so I think that's a wash.)

OTOH I'm not controlling precision motors, so calculation latency probably doesn't bother me as much as it does some others. :)
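(A minimal sketch of that direct write-from-the-object-graph style, for contrast with the cached-buffer variant earlier in the thread; the axis structure and offset fields are invented for illustration, and only the EC_WRITE_* macros are the library's.)

#include <stddef.h>
#include <stdint.h>
#include <ecrt.h>

/* Illustrative application object: the "real" values live wherever the
 * application keeps them, not in a domain-shaped buffer. */
struct axis {
    int32_t      target_position;
    uint16_t     control_word;
    unsigned int off_target_position;  /* PDO entry offsets in the domain */
    unsigned int off_control_word;
};

/* Cyclic task: copy each value straight into domain memory through the
 * macros, which do the little-endian conversion as they go. */
static void write_axis_outputs(uint8_t *domain_pd,
                               const struct axis *axes, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        EC_WRITE_S32(domain_pd + axes[i].off_target_position,
                     axes[i].target_position);
        EC_WRITE_U16(domain_pd + axes[i].off_control_word,
                     axes[i].control_word);
    }
}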
Post by Graeme Foot
Yes, the EC_WRITE_* macros should still be used when writing to the cached
write memory, but then a straight memcpy from the cache to the domain
memory is fine.
[...]
Tillman, Scott
2016-02-03 23:33:15 UTC
Permalink
Post by Gavin Lambert
Post by Tillman, Scott
Since you brought up the typical process cycle: I have been using a
process similar to the second one you describe.
[...]
Post by Gavin Lambert
Post by Tillman, Scott
As it is, the double buffering is the same
idea, but causes an extra memcpy just prior to sending the domain data.
The expectation is that you'll use the EC_WRITE_* macros to insert values
into the domain memory; this takes care of byte-swapping to little-endian for
you if you happen to be running on a big-endian machine. You can usually
only get away with a blanket memcpy if you know your master code will only
ever run on little-endian machines.
In my generic I/O framework, byte ordering is performed (if needed) during the write into, or read out of, the data exchange arena. If the outbound frame and the inbound frame were not overlaid there wouldn't be any extra copies at all (since the exchange arena is shared memory visible to all applications). We have yet to actually *use* it on a big-endian system, so that's all just optimized away anyway.
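
(As an illustration of that idea, and not a description of Scott's actual framework: keeping a shared exchange arena in wire byte order means the conversion lives in two small accessors and compiles to nothing on little-endian hosts. The arena layout and function names below are hypothetical.)

#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <endian.h>   /* htole32()/le32toh(), glibc */

/* Hypothetical shared-memory exchange arena kept in little-endian (wire)
 * order, so it can be exchanged with the domain image without per-value
 * conversion elsewhere. */
static inline void arena_write_u32(uint8_t *arena, size_t off, uint32_t v)
{
    uint32_t le = htole32(v);          /* no-op on little-endian hosts */
    memcpy(arena + off, &le, sizeof le);
}

static inline uint32_t arena_read_u32(const uint8_t *arena, size_t off)
{
    uint32_t le;
    memcpy(&le, arena + off, sizeof le);
    return le32toh(le);                /* no-op on little-endian hosts */
}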
Post by Gavin Lambert
Post by Tillman, Scott
More problematic is the absence of any way to block (in user-space)
waiting for the domain's return packet. As it is I am setting up my
[...]
Post by Tillman, Scott
Are these two things there somewhere and I've just missed them, or is
there a good reason they haven't been implemented? It seems like
these two items would minimize the overhead and maximize the
processing time available for most applications.
There isn't really a way to do that; it's a fundamental design choice of the
master. The EtherCAT-custom drivers disable interrupts and operate purely
in polled mode in order to reduce the latency of handling an interrupt and
subsequent context-switching to a kernel thread and then a user thread.
What gets sacrificed along the way is any ability to wake up a thread when
the packet arrives, since nothing actually knows that the packet has arrived
until polled.
[...]

This was my understanding from perusing the mailing list, but the conversations were years old, so possibly out of date. As I mentioned, I have two kinds of RT workloads: computations that don't require immediate input data, and computations that do rely on immediate inputs. The former are often more complex, so it works out for me in my process. It might be of more concern were I dealing with shorter cycle times or a higher proportion of reactive computations.

-Scott Tillman
