Linux boot hangs on SAM9G25 and SAM9G45

This forum is for users of Microchip MPUs who are interested in using Linux.

Moderator: nferre

jay214128
Posts: 9
Joined: Fri Jan 23, 2015 3:29 am

Linux boot hangs on SAM9G25 and SAM9G45

Fri Jul 21, 2017 6:22 pm

I have been trying to get the 4.4 kernel booting on my custom boards using the SAM9G45 and SAM9G25 MCUs. The 9G45 seems to work most of the time, but the 9G25 always hangs right after the "Freeing unused kernel memory:" message on the debug console.

I have tracked this down to the atmel_serial driver used for the debug console. Before the kernel execs /sbin/init, kernel_init_freeable() opens the /dev/console device and dups it to stdout and stderr (3 references). /sbin/init (init.sysvinit on my systems) ends up closing stdin, stdout, and stderr. These actions have the following effects.
1) the first open in kernel_init_freeable() invokes atmel_startup() in atmel_serial.c.
2) the last close from init invokes atmel_shutdown().
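To illustrate the first-open / last-close behavior described above, here is a minimal toy model in C. It is only a sketch: struct toy_port, toy_get_ref() and toy_put_ref() are hypothetical names made up for this example, not kernel or driver API, and the model simply assumes that the startup hook runs on the first reference and the shutdown hook on the last.

```c
#include <stdbool.h>

/* Toy model only: these names are illustrative, not kernel API. */
struct toy_port {
    int  ref_count;       /* open references to the console device */
    bool tx_enabled;      /* models the UART transmitter enable state */
    int  startup_calls;   /* times the "startup" hook ran */
    int  shutdown_calls;  /* times the "shutdown" hook ran */
};

/* Take a reference; the first one stands in for atmel_startup(). */
void toy_get_ref(struct toy_port *p)
{
    if (p->ref_count++ == 0) {
        p->tx_enabled = true;
        p->startup_calls++;
    }
}

/* Drop a reference; the last one stands in for atmel_shutdown(). */
void toy_put_ref(struct toy_port *p)
{
    if (p->ref_count > 0 && --p->ref_count == 0) {
        p->tx_enabled = false;  /* transmitter disabled on last close */
        p->shutdown_calls++;
    }
}
```

With three references taken (stdin, stdout, stderr), the transmitter in this model stays enabled until the third toy_put_ref() call, which is the point where init's last close lands.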

atmel_shutdown() calls atmel_stop_tx(), which disables the UART transmitter. With the transmitter disabled, any further kernel output (printk) can never complete, and the kernel hangs in the printk call.

It seems that the UART transmitter should not be disabled in atmel_stop_tx() if the UART is also being used as a console device by the kernel. Indeed, I can correct the hang by commenting out the line of code that disables the UART transmitter in atmel_stop_tx(). This is specifically on the dbgu UART in the atmel_serial driver.

The following patch to atmel_serial.c seems to fix this bug.

Index: drivers/tty/serial/atmel_serial.c
===================================================================
--- drivers/tty/serial/atmel_serial.c	(revision 4602)
+++ drivers/tty/serial/atmel_serial.c	(working copy)
@@ -73,10 +73,13 @@
 
 #include "serial_mctrl_gpio.h"
 
 static void atmel_start_rx(struct uart_port *port);
 static void atmel_stop_rx(struct uart_port *port);
+#ifdef CONFIG_SERIAL_ATMEL_CONSOLE
+static inline bool atmel_is_console_port(struct uart_port *port);
+#endif
 
 #ifdef CONFIG_SERIAL_ATMEL_TTYAT
 
 /* Use device name ttyAT, major 204 and minor 154-169.  This is necessary if we
  * should coexist with the 8250 driver, such as if we have an external 16C550
@@ -2002,10 +2005,17 @@ static void atmel_shutdown(struct uart_p
 	free_irq(port->irq, port);
 
 	atmel_port->ms_irq_enabled = false;
 
 	atmel_flush_buffer(port);
+
+#ifdef CONFIG_SERIAL_ATMEL_CONSOLE
+	/* Reenable the transmitter and receiver for console device. */
+	if (atmel_is_console_port(port)) {
+		atmel_uart_writel(port, ATMEL_US_CR, ATMEL_US_TXEN | ATMEL_US_RXEN);
+	}
+#endif
 }
 
 /*
  * Power / Clock management.
  */
Does this seem like the correct fix?
blue_z
Location: USA
Posts: 1560
Joined: Thu Apr 19, 2007 10:15 pm

Re: Linux boot hangs on SAM9G25 and SAM9G45

Sat Jul 22, 2017 1:48 am

jay214128 wrote:I have been trying to get the 4.4 kernel booting on my custom boards using the SAM9G45 and SAM9G25 MCUs. The 9G45 seems to work most of the time, but the 9G25 always hangs right after the ...
Can you be more clear? Does the SAM9G45 have the exact same symptom as the SAM9G25?
jay214128 wrote:This action prevents any further kernel output (printk) from working, and causes them to hang the kernel.
Does the kernel actually "hang", or are you exaggerating and the console has merely gone silent?
jay214128 wrote:Does this seem like the correct fix?
While it's commendable that you investigated this enough to come up with a patch that alleviates the problem, I wonder if you're treating the symptom rather than the root cause.
First, it's not clear which 4.4 kernel you are using. The patch does not seem to match either mainline version 4.4 or the Linux4SAM 4.4-at91 branch.

Then there's the issue of all the other boards using Atmel SoCs with consoles on the DBGU port using SystemV initialization that boot successfully, e.g. running the Linux4SAM 5.5 demo (which uses a 4.4.26 kernel and is about 9 months old now).
Why aren't there more reports of this problem?
IOW what's so peculiar about your board(s) and system build that makes it susceptible to this issue?

Regards
jay214128
Posts: 9
Joined: Fri Jan 23, 2015 3:29 am

Re: Linux boot hangs on SAM9G25 and SAM9G45

Mon Jul 24, 2017 5:47 pm

Yes, when the issue occurs on the 9G45, the symptom is the same as the 9G25.

The issue is a race condition. The timing differs enough between the 9G45 board and the 9G25 board that it happens only sometimes on the 9G45 and always on the 9G25.

Yes, I believe the kernel actually hangs. It is in a busy wait in atmel_console_putchar() waiting for the THR to be empty. As evidence of this, I can boot to a shell (init=/bin/sh on the linux command line). The /dev/console (tty) port is held open by the shell, so the problem does not occur. Letting the system sit idle, I can see other kernel activity happen, evidenced by console kernel messages such as, "random: nonblocking pool is initialized" being displayed. When booting /sbin/init and the system "hangs", no such messages ever appear. Any kernel thread that calls a printk function will hang.
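As a rough illustration of why a polled console write never returns once the transmitter is off, here is a small toy model in C. Everything in it is hypothetical (struct toy_uart, TOY_TXRDY, toy_read_status(), toy_console_putchar()); it is not the driver code. It assumes the status register does not report "ready" while the transmitter is disabled, and it uses a bounded spin count so the demo terminates, whereas the busy wait described above has no bound.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_TXRDY 0x1u  /* hypothetical "transmitter ready" status bit */

/* Toy UART: illustrative only, not the atmel_serial data structures. */
struct toy_uart {
    bool tx_enabled;  /* false after the modeled shutdown path ran */
    int  sent;        /* characters accepted by the transmitter */
};

/* Assumption: a disabled transmitter never reports ready. */
uint32_t toy_read_status(const struct toy_uart *u)
{
    return u->tx_enabled ? TOY_TXRDY : 0u;
}

/*
 * Polled putchar. Returns true if the character was accepted, false if
 * max_spins polls passed without the ready bit being set. An unbounded
 * version of this loop would spin forever in the false case.
 */
bool toy_console_putchar(struct toy_uart *u, char ch, unsigned max_spins)
{
    (void)ch;  /* the model only counts characters */
    for (unsigned i = 0; i < max_spins; i++) {
        if (toy_read_status(u) & TOY_TXRDY) {
            u->sent++;
            return true;
        }
    }
    return false;
}
```

In this model, a call made while tx_enabled is true returns immediately, while a call made after tx_enabled has been cleared exhausts its spin budget, which corresponds to the printk caller never making progress.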

This is from the Linux4SAM 4.4-at91 branch (4.4.51). I can see that this bug has been fixed by commit 497e1e1 upstream in the 4.4-at91 branch https://github.com/linux4sam/linux-at91 ... c0bd5d7c2f.

On my existing production code (3.2.y kernel) I have never observed this behavior. While porting to the 4.4.y kernel, I did not observe this behavior either (booting from an SD Card, 9G45 system). I did see it occasionally when booting from NAND flash (also 9G45), but did not track it down, since it was intermittent. When porting to the 4.4.y kernel on a second product platform (9G25 system), I ran into it every time booting from an SD Card, and subsequently tracked it down.

It is a race condition, so it is not surprising that it has not been reported before. It may also depend on which init one is using (sysvinit in my case). To trigger the hang, the application execed by the kernel for /sbin/init must close its stdin, stdout, and stderr (without dup()ing them), so that the atmel_shutdown() routine gets called. Then, a kernel printk must happen before some other user space application (init or something forked by init) reopens /dev/console. In my case, the next printk is a whine from oom_adj_write(), "udevd (715): /proc/715/oom_adj is deprecated, please use /proc/715/oom_score_adj instead.".
