Sunday, August 1, 2021

Clusterboard A64 Insidious Reset Problem: Solved

A bare-metal compute node may soft-lock, spin-lock, deadlock, overheat, or starve for resources; the Docker daemon may go away; systemd may become unstable; and so on. In these cases, a watchdog timer acting like a dead man’s switch is no longer updated (pressed), its countdown reaches zero, and the promise is that the watchdog circuit restarts all the hardware, like a power-on reset (POR).
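
To make the dead man’s switch idea concrete, here is a minimal host-side model of a watchdog countdown. It is only a sketch: the `sim_wdt` structure and its functions are hypothetical names invented for illustration, and nothing here touches the A64 hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software model of a watchdog; not the A64 hardware. */
struct sim_wdt {
    uint32_t reload;     /* ticks granted by each kick */
    uint32_t remaining;  /* ticks left before the reset fires */
    bool     fired;      /* latched once the countdown reaches zero */
};

/* Arm the model with a timeout expressed in ticks. */
static void sim_wdt_start(struct sim_wdt *w, uint32_t reload_ticks)
{
    w->reload = reload_ticks;
    w->remaining = reload_ticks;
    w->fired = false;
}

/* "Press" the dead man's switch: restart the countdown. */
static void sim_wdt_kick(struct sim_wdt *w)
{
    if (!w->fired)
        w->remaining = w->reload;
}

/* Advance time by one tick; returns true once the reset has fired. */
static bool sim_wdt_tick(struct sim_wdt *w)
{
    if (!w->fired && w->remaining > 0 && --w->remaining == 0)
        w->fired = true;
    return w->fired;
}
```

A healthy system calls `sim_wdt_kick()` more often than the timeout; a hung system stops kicking, and `sim_wdt_tick()` eventually reports that the reset fired.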

Goal: Solve the Pine64 clusterboard A64 SoC reset problem to allow software and WDT resets.

My experience is that I cannot get this working out of the box in U-Boot mainline for the Allwinner A64 SoC (sun50i) on a Pine64/SOPINE module in the clusterboard. This problem is more complex than I thought, so in case it helps anyone else, I’ll document my thought process and investigation into how I made this work.

Results

  1. 2xAA RTC batteries in the clusterboard allow a reset.
  2. 2xAA RTC batteries in the powered clusterboard will die within a few months.
  3. Back-EMF from the clusterboard’s reset distributor ironically prevents a WDT reset.
  4. A barrier diode connected to clusterboard D1 and +RTC allows a WDT reset (see solution).

Investigation

Here is everything I’ve tried and my thought process while investigating this non-restart issue.

Kindly do not attempt to read everything. This is a detective story with many failures, but with a satisfying resolution.

This is my investigation into solving a problem that has been unsolved for years. Here are the questions I’ve asked myself.

Is a reset as simple as jumping back to the bootloader code?

No. The CPU cores may be locked up, or have wrong clock signals, and cannot reach a jump to, say, location zero for the CPU to act as if it were just turned on. We’ll need a hardware solution to reset compute modules – a watchdog timer.

Does my System on a Chip (SoC) have a hardware Watchdog Timer (WDT)?

Yes and no. The Allwinner A64 SoC used in the SOPINE (Pine64) modules has a hardware watchdog timer in the processor (A64 PDF schematic, p161), but there is no dedicated watchdog circuit external to the processor (SOPINE PDF schematic).

Allwinner A64 watchdog circuit

How does a Watchdog Timer (WDT) reset the System on a Chip (SoC)?

From the previous diagram, the WDT can send an interrupt (IRQ) or a reset signal (but what does “reset signal” mean?).

Useful information from the Allwinner A64 docs:

  • Timer register base address: 0x01C20C00, offsets in brackets below.
  • WDOG_IRQ_EN_REG (0xA0) – WDOG_IRQ_EN defaults to 0, no IRQs are sent.
  • WDOG_CFG_REG (0xB4) – WDOG_CONFIG defaults to 1 and sends reset signals to the whole system.
  • WDOG_CTRL_REG (0xB0) – Set WDOG_KEY_FIELD to 0xA57 and WDOG_RSTART to 1 to trigger a reset.
  • WDOG_MODE_REG (0xB8) – Set WDOG_INTV_VALUE to 2 for 2s and WDOG_EN to 1 to enable WDT.


Or in code, to reset the SoC:

writel(0x1, WDOG_CFG_REG);     // Signal the whole system

writel(0x10, WDOG_MODE_REG);    // Watchdog interval is 1s

writel(readl(WDOG_MODE_REG)|(1<<0), WDOG_MODE_REG); // Enable Watchdog

delay_ms(500);

writel(readl(WDOG_CTRL_REG)|(0xA57<<1)|(1<<0), WDOG_CTRL_REG); // Writel 0xA57 and 0x1

// The system will reset

The WDT in the A64 has a countdown register with a maximum 16-second watchdog period. When zero is reached, it generates a system-wide “general reset”.
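
The register values used above can be derived from the bit fields rather than typed as magic numbers. The following is a small sketch that runs on a host machine and only computes addresses and values; the helper names are hypothetical, and the field positions (interval in bits 7:4, enable in bit 0, key in bits 12:1, restart in bit 0) are taken from the notes above.

```c
#include <stdint.h>

#define TIMER_BASE     0x01C20C00u  /* timer register base address */
#define WDOG_CTRL_OFF  0xB0u        /* WDOG_CTRL_REG offset */
#define WDOG_CFG_OFF   0xB4u        /* WDOG_CFG_REG offset */
#define WDOG_MODE_OFF  0xB8u        /* WDOG_MODE_REG offset */

/* Absolute address of a watchdog register (hypothetical helper). */
static uint32_t wdog_reg_addr(uint32_t offset)
{
    return TIMER_BASE + offset;
}

/* WDOG_MODE_REG value: interval code in bits 7:4, enable in bit 0. */
static uint32_t wdog_mode_value(uint32_t interval_code, int enable)
{
    return ((interval_code & 0xFu) << 4) | (enable ? 1u : 0u);
}

/* WDOG_CTRL_REG value: key 0xA57 in bits 12:1, restart in bit 0. */
static uint32_t wdog_ctrl_restart_value(void)
{
    return (0xA57u << 1) | 1u;
}
```

With these helpers, `wdog_mode_value(1, 1)` gives 0x11 (1s interval, enabled) and `wdog_ctrl_restart_value()` gives 0x14AF, which are the values the sample code arrives at through its read-modify-write steps.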

What exactly is a reset signal?

From the Allwinner A64 docs, the USB controller accepts a reset signal as a register flag to enter USB suspension. The Audio Controller (OWA) accepts a register flag to reset that controller. Even the unused Smart Card Reader (SCR) accepts a reset register flag. The CPU has a few reset registers as well. So, setting a bit in various registers around the SoC causes resets in those associated controllers.

As for the CPU reset, it includes core reset, power-on reset, and H_Reset. The last mode resets the whole cluster, i.e. all the cores.

So how do these various reset registers get set? After all, the CPU has gone awry so it cannot set those registers. Let’s look at the A64 bus diagram.

Allwinner A64 advanced peripheral bus

There is something called an Advanced Peripheral Bus (APB) connected to the WDT (timer), which is connected to similar buses.

What is an Advanced Peripheral Bus (APB)?

APB is designed for low bandwidth control accesses, for example, register interfaces on system peripherals. This bus has an address and data phase similar to Advanced High-performance Bus (AHB), but a much reduced, low complexity signal list (for example no bursts). APB is part of Advanced Microcontroller Bus Architecture (AMBA) products licensable from ARM Limited.

From the system diagram above, we see the WDT (timer) puts addresses and data on the APB which crosses a bridge to the AHBs and sets registers in the various controllers – and reset registers are set around the SoC. Now I understand how the WDT resets all peripherals with dedicated buses.

Can we trigger a Power-On Reset (POR) programmatically with the Power Management IC (PMIC)?

There is an AXP803 Power Management IC (PMIC) external to the A64 SoC that has the ability to vary its voltages programmatically. It supplies the power rails for the SoC and peripherals. Can this be programmed to power cycle the SoC?

AXP803 PMIC reset signal

By physically grounding the PWROK line, the PMIC shuts off. When the grounding is removed, the PMIC comes back to life as if the device were just plugged in. Then, what toggles the PWROK line if the CPU loses power?

AXP803 serial interface

There is a serial interface. What can this do for us? Short answer: nothing. This is a blind alley. The AXP803 is primarily a Li-ion charging IC and does not have any mode like “shut off and then promise to turn back on”. Some external hardware or at minimum an RC (resistor-capacitor) circuit would be needed to achieve a PMIC reset with the SOPINE’s AXP803 PMIC. Let’s stick with the WDT solution.

Does my SoC have a Brown-Out Detect (BOD) circuit?

The AXP803 PMIC monitors such situations as low power, bad battery, PWRON pin signal, over-temperature, and GPIO input edge signals. When the events occur, the corresponding IRQ status will be set to 1 and will drive the IRQ pin low. It’s up to the host to consume/notice this IRQ. When voltage does drop, the PMIC will lower current until the primary voltage rises.

The AXP803 communicates with the A64 via the Reduced Serial Bus (RSB) and thus the A64 can “notice” incoming power error states. That is a rabbit hole I’ll leave, but to answer this line of thinking, yes, there is action taken on brown-out situations.

What is the difference between PSCI and SYSRESET in U-Boot?

This is where things go off the rails. There are several community patches/hacks to add support for the AXP803 PMIC, adding a sunxi WDT reset via writing directly to registers, and enabling Driver Model (DM) reset classes. Merely experimenting with PSCI, RESET, SYSRESET, and DM_RESET in U-Boot leads to compiler errors like “Error: do_reset() is already defined”, or runtime errors like “System reset not supported on this platform”, or even the board just hangs. What are the main options?

PSCI

Power State Coordination Interface (PSCI) is an ARM interface for CPU and overall system power management, including system shutdown and reset. When CONFIG_PSCI_RESET is enabled (together with CONFIG_ARM_PSCI_FW), a reset request is handed to the firmware through the PSCI 0.2 interface. Seems simple, and seems complicated. Overwhelmingly, most ARM boards have # CONFIG_PSCI is not set in their defconfigs.

SYSRESET

The vast majority of ARM boards have CONFIG_SYSRESET=y in their defconfigs. This seems to be a modular way to reset various components on the SoC programmatically. It has provisions for warm and cold resets, as well as resetting the PMIC (power off then on according to sysreset.h).

SYSRESET_PSCI

To muddy the waters, SYSRESET can also hand the reset off to PSCI when CONFIG_SYSRESET_PSCI is enabled, but very, very few defconfigs have this.

SYSRESET_WATCHDOG and SYSRESET_RESETCTL

You can see this is getting confusing. This is where I get off this train and experiment with the registers myself.

Can I trigger a reset by writing to hardware registers?

Having failed to find the right combination of configuration flags over and over again, my new approach is to cause the WDT to fire via a timeout, and eventually via the reset command in the U-Boot shell. My goal is to get the manufacturer’s sample reset code to execute and observe a proper reset by monitoring the A64 via the serial cable.

After disabling PSCI reset with # CONFIG_PSCI_RESET is not set in the defconfig file, let’s examine arch/arm/mach-sunxi/board.c. It has a section with writel instructions from 0x01c20c00. The addressing and bit-twiddling seem fine, actually. When I explicitly try to invoke a sequence of register writes, either nothing happens, or the system hangs when an mdelay() statement is reached, or the board just halts.

Here again is the manufacturer’s recommendation.

writel(0x1, WDOG_CFG_REG);      // Signal the whole system

writel(0x10, WDOG_MODE_REG);    // Watchdog interval is 1s

writel(readl(WDOG_MODE_REG)|(1<<0), WDOG_MODE_REG); // Enable Watchdog

delay_ms(500);

writel(readl(WDOG_CTRL_REG)|(0xA57<<1)|(1<<0), WDOG_CTRL_REG); // Writel 0xA57 and 0x1

// The system will reset

Being absolutely explicit with my hex values, here is what I tried in code. This loops forever.

// arch/arm/lib/reset.c

#define TIMER_REG 0x01C20C00

// ... snip ...

do {

    putc('A');

    writel(0x1, TIMER_REG + 0xB4); // Set 1:0 to 01

    putc('B');

    // writel((1 << 4), TIMER_REG + 0xB8); // Set 7:4 to 0001 (...00010000)

    writel(0x10, TIMER_REG + 0xB8);        // 0b10000 = 0x10, or 1s

    putc('C');

    writel(readl(TIMER_REG + 0xB8) | (1 << 0), TIMER_REG + 0xB8); // Set 0 to 1 to enable WDT

    putc('D');

    mdelay(500);

    putc('E');

    writel(readl(TIMER_REG + 0xB0) | (0xA57 << 1) | (1 << 0), TIMER_REG + 0xB0); // Set 12:1 as 0xA57, and 0 to 1

    // Could also be writel(0x14AF, TIMER_REG + 0xB0);

    putc('F');

} while(1);

U-Boot> reset

ABCDEFABCDEFABCDEFABCDEFABCDEFABCDEFABCDEFABCDEFABCDEFABCDEFABCDEFABCDEFABCDEFABCDEFAB

CDEFABCDEFABCDEFABCDEFABCDEFABCDEFABCDEFABCDEFABCDEFABCDEFABCDEFABCDEFABCDEFABCDEFABCD

EFABCDEFABCDEFABCDEFABCDEFABCDEFABCDEFABCDEFABCDEFABCDEFABCDEFABCDEFABCDEFABCDEFABCDEF

ABCDEFABCDEFABCDEFABCDEFABCDEFABCDEFABCDEFABCDEFABCDEFABCDEFABCDEFABCDEFABCDEFABCDEFAB...

Frustrating.

Am I the only one with this problem?

My initial search for “Pine64 reset” led nowhere (too specific). There are a handful of unanswered pleas for help in the forums, which is why I tried to debug U-Boot on my own.

One day, at my wit’s end, I searched instead for “A64 watchdog reset”, which led me to a nine-page thread titled “H6 Famous Reboot Problem”, full of brilliant people collaborating. Allwinner makes both the A64 and the H6; the latter is very similar to the A64 but has better video support (not needed in a cluster computer). Jackpot.

People even describe the same path I took:

“I’ve tried to debug the reset_cpu() in arch/arm/mach-sunxi/board.c where it set some Watchdog register and loop infinitely, but it seems that watchdog never kicks in.” (ref)

There was a false victory.

“Bingo! The missing thing is CONFIG_NR_DRAM_BANKS=1.”

Could a certain flag not be set?

“Maybe nowayout param should be set to 1? I remember that nowayout=0 on H3 just disables watchdog hardware reset.”

The next idea was looking at the Arm Trusted Firmware (now called Trusted Firmware A, or TF-A).

“Mainline u-boot has a reset command, which triggers a watchdog-based reboot, and it just locks up the machine, when the watchdog timeout expires. The same thing simply happens in the kernel. The kernel tells ATF to reset, ATF does the same thing as u-boot (watchdog-based reset), and the SoC locks up.” (ref)

A sign of hope emerges.

“Changing to R_WDOG instead of WDOG in ATF fixes the issue. … A patch can be added to build/patch/atf/atf-sunxi64/.” (ref)

A consensus emerges that the problem is in the ATF (now called TF-A), and the fix (for H6) is as simple as:

diff --git a/plat/allwinner/sun50i_h6/include/sunxi_mmap.h b/plat/allwinner/sun50i_h6/include/sunxi_mmap.h

index f36491a8..f01745a4 100644

--- a/plat/allwinner/sun50i_h6/include/sunxi_mmap.h

+++ b/plat/allwinner/sun50i_h6/include/sunxi_mmap.h

@@ -58,4 +58,7 @@

#define SUNXI_R_UART_BASE      0x07080000

#define SUNXI_R_PIO_BASE       0x07022000

+#undef SUNXI_WDOG_BASE

+#define SUNXI_WDOG_BASE            SUNXI_R_WDOG_BASE

+

#endif /* SUNXI_MMAP_H */

--


Can the Allwinner H6 TF-A reset solution be applied to the A64?

We’re talking about the trusted watchdog now. Could the solution be as simple as pointing the regular watchdog code to the trusted watchdog? Let’s look at the system bus again.

Allwinner A64 trusted watchdog

It seems that in 2021 TF-A is already using the secure watchdog (SUNXI_R_WDOG) as we can see below. There is nothing to do here for the A64.

// plat/allwinner/common/sunxi_native_pm.c

#define SUNXI_WDOG0_CTRL_REG       (SUNXI_R_WDOG_BASE + 0x0010)

#define SUNXI_WDOG0_CFG_REG        (SUNXI_R_WDOG_BASE + 0x0014)

#define SUNXI_WDOG0_MODE_REG       (SUNXI_R_WDOG_BASE + 0x0018)

 

// FYI: #define  __dead2  __attribute__((__noreturn__))

static void __dead2 sunxi_system_reset(void) {

    gicv2_cpuif_disable();

 

    /* Reset the whole system when the watchdog times out */

    mmio_write_32(SUNXI_WDOG0_CFG_REG, 1);

    /* Enable the watchdog with the shortest timeout (0.5 seconds) */

    mmio_write_32(SUNXI_WDOG0_MODE_REG, (0 << 4) | 1);

    /* Wait for twice the watchdog timeout before panicking */

    mdelay(1000);

 

    ERROR("PSCI: System reset failed\n");

    panic();

}

How to get U-Boot to call the TF-A trusted-watchdog system reset?

Since the ARM trusted watchdog is, well, trusted (right?), there needs to be communication from the normal world to the secure world through TF-A. Now, TF-A defines sunxi_system_reset() in both sunxi_native_pm.c and sunxi_scpi_pm.c.

// plat/allwinner/common/sunxi_scpi_pm.c

static void __dead2 sunxi_system_reset(void) {

    uint32_t ret;

    gicv2_cpuif_disable();

 

    /* Send the system reset request to the SCP. */

    ret = scpi_sys_power_state(scpi_system_reboot);

    if (ret != SCP_OK) {

        ERROR("PSCI: SCPI %s failed: %d\n", "reboot", ret);

    }

 

    psci_power_down_wfi();

}

We now have to go even deeper to the SCPI, which stands for System Control and Power Interface. Which of the two implementations is used? According to the logic in allwinner-common.mk, the native implementation is used by default. So, how to call this programmatically?

Let’s chase down the “native PSCI ops” structure in sunxi_native_pm.c and see who executes the operation “system_reset”. This led to psci_system_off.c with a method called psci_system_reset(void).

// lib/psci/psci_system_off.c

void __dead2 psci_system_reset(void) {

    psci_print_power_domain_map();

 

    /* Notify the Secure Payload Dispatcher */

    if ((psci_spd_pm != NULL) && (psci_spd_pm->svc_system_reset != NULL)) {

        psci_spd_pm->svc_system_reset();

    }

 

    console_flush();

 

    /* Call the platform specific hook */

    psci_plat_pm_ops->system_reset();

 

    /* This function does not return. We should never get here */

}

Okay, going deeper, who calls psci_system_reset() then?

// lib/psci/psci_main.c

/*******************************************************************************

* PSCI top level handler for servicing SMCs.

******************************************************************************/

u_register_t psci_smc_handler(uint32_t smc_fid,

// ... snip ...

        case PSCI_SYSTEM_OFF:

            psci_system_off();

            /* We should never return from psci_system_off() */

            break;

 

        case PSCI_SYSTEM_RESET:

            psci_system_reset();

            /* We should never return from psci_system_reset() */

            break;
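
The call chain traced so far (function ID in, handler, platform hook out) can be modelled in a few lines. This is a simplified host-side sketch, not TF-A code: the `model_*` names and the `plat_pm_ops` stand-in are hypothetical, and unlike the real hooks these ones return so the model can be exercised. The function IDs and the NOT_SUPPORTED value follow the PSCI 0.2 convention.

```c
#include <stdint.h>

#define PSCI_SYSTEM_OFF       0x84000008u
#define PSCI_SYSTEM_RESET     0x84000009u
#define PSCI_E_NOT_SUPPORTED  (-1)

/* Simplified stand-in for the platform PM ops table. */
struct plat_pm_ops {
    void (*system_off)(void);
    void (*system_reset)(void);
};

/* Records which hook ran: 0 = none, 1 = off, 2 = reset (model only). */
static int last_action;

static void model_system_off(void)   { last_action = 1; }
static void model_system_reset(void) { last_action = 2; }

static const struct plat_pm_ops model_ops = {
    .system_off   = model_system_off,
    .system_reset = model_system_reset,
};

/* Route an SMC function ID to the matching platform hook. */
static int model_smc_handler(uint32_t smc_fid)
{
    switch (smc_fid) {
    case PSCI_SYSTEM_OFF:
        model_ops.system_off();
        return 0;
    case PSCI_SYSTEM_RESET:
        model_ops.system_reset();
        return 0;
    default:
        return PSCI_E_NOT_SUPPORTED;
    }
}
```

The point of the model is that the reset itself lives entirely in the platform hook; everything above it is just routing a 32-bit function ID.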

Sigh. What’s an SMC?

What is a Secure Monitor Call (SMC) and how to use it to trigger a reset?

In this context, SMC stands for Secure Monitor Call. It is an ARM instruction that traps from the normal world into the secure monitor running at EL3, which on this board is TF-A (BL31). PSCI requests such as a system reset are delivered this way, so let’s say it’s the gatekeeper to the secure code in TF-A.

Back to U-Boot, we see that in the DTS for the A64, PSCI uses SMC.

// arch/arm/dts/sun50i-a64.dtsi

psci {

    compatible = "arm,psci-0.2";

    method = "smc";

};

We’ve now come full-circle back to PSCI in U-Boot. Let’s drop to the U-Boot shell and try to issue some SMC commands manually to see if it even works. Add CONFIG_CMD_SMC=y first.

U-Boot> smc

smc - Issue a Secure Monitor Call

 

Usage:

smc <fid> [arg1 ... arg6] [id]

  - fid Function ID

  - arg SMC arguments, passed to X1-X6 (default to zero)

  - id  Secure OS ID / Session ID, passed to W7 (defaults to zero)

 

U-Boot>

No obvious docs. No SMC examples. Just treading water in the deep end of the pool. What even is a Function ID? In U-Boot mainline, I found a lonely file called durian.c and saw a hint:

// psci.c

void reset_cpu(ulong addr) {

    struct arm_smccc_res res;

 

    arm_smccc_smc(0x84000009, 0, 0, 0, 0, 0, 0, 0, &res); // <-- hint

    debug("reset cpu error, %lx\n", res.a0);

}

Looking at the command processor for smc, it also arrives at arm_smccc_smc() via a method called do_call() in smccc-call.c, and again in a method named invoke_psci_fn().

Chasing down the latter, I found invoke_psci_fn(PSCI_0_2_FN_SYSTEM_RESET, 0, 0, 0) deep in code. Then PSCI_0_2_FN_SYSTEM_RESET is defined as PSCI_0_2_FN(9). We eventually arrive at:

// include/linux/psci.h

#define PSCI_0_2_FN_BASE   0x84000000

#define PSCI_0_2_FN(n)     (PSCI_0_2_FN_BASE + (n))

// Then, PSCI_0_2_FN_SYSTEM_RESET = PSCI_0_2_FN(9) = 0x84000000 + 9 = 0x84000009
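
The function ID is not an arbitrary number. As a sketch, assuming the standard SMC Calling Convention layout (bit 31 = fast call, bit 30 = 64-bit convention, bits 29:24 = owning service, bits 15:0 = function number), the value can be taken apart like this. The `smccc_*` helper names are hypothetical.

```c
#include <stdint.h>

#define PSCI_0_2_FN_BASE  0x84000000u
#define PSCI_0_2_FN(n)    (PSCI_0_2_FN_BASE + (n))

/* Bit 31: 1 = fast call, 0 = yielding call. */
static int smccc_is_fast_call(uint32_t fid)
{
    return (int)((fid >> 31) & 1u);
}

/* Bit 30: 1 = SMC64 convention, 0 = SMC32 convention. */
static int smccc_is_smc64(uint32_t fid)
{
    return (int)((fid >> 30) & 1u);
}

/* Bits 29:24: owning entity (4 = standard secure service, used by PSCI). */
static uint32_t smccc_owner(uint32_t fid)
{
    return (fid >> 24) & 0x3Fu;
}

/* Bits 15:0: function number within that service. */
static uint32_t smccc_func_num(uint32_t fid)
{
    return fid & 0xFFFFu;
}
```

Decoding `PSCI_0_2_FN(9)` this way yields a fast, 32-bit call to the standard secure service with function number 9.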

Happily, we find that the Function ID for reset is also 0x84000009. Let’s trigger a system reset via TF-A using the smc command in U-Boot.

Absolutely nothing happened, except the board still hangs. Back to square one. This seemed like a hack, anyway. Moving on.

Is there a pointer math error?

I took a deep dive into the Allwinner A64 user manual again, and looked at the WDT register offsets as laid out in a C struct in U-Boot. Are the struct offsets correct? I see u32 (4-byte) entries, so does u32 ctl (WDOG_CTRL_REG) truly start at the 0x10 offset, or incorrectly at 0x0C (4 bytes × 3 = 12)?

// arch/arm/include/asm/arch-sunxi/watchdog.h

struct sunxi_wdog {

    u32 irq_en;  /* 0x00 */

    u32 irq_sta; /* 0x04 */

    u32 res1[2];

    u32 ctl;     /* 0x10 <-- 0x10 or 0x0C? */

    u32 cfg;     /* 0x14 */

    u32 mode;    /* 0x18 */

    u32 res2;

};

Wouldn’t it be nice if this were a simple pointer error? Let’s see with a quick test.

#include <iostream>

#define u32               unsigned int

#define SUNXI_TIMER_BASE  0x01c20c00

 

int main() {

    struct sunxi_wdog {

        u32 irq_en;  /* 0x00 */

        u32 irq_sta; /* 0x04 */

        u32 res1[2];

        u32 ctl;     /* 0x10 <-- Yes, correct offset */

        u32 cfg;     /* 0x14 */

        u32 mode;    /* 0x18 */

        u32 res2;

    };

 

    static const struct sunxi_wdog *wdog =

        (struct sunxi_wdog *)SUNXI_TIMER_BASE;

    

    std::cout << &wdog->ctl << std::endl;   // 0x1c20c10 - Wrong addr, should be 0x1c20cb0

    std::cout << &wdog->cfg << std::endl;   // 0x1c20c14 - Wrong addr, should be 0x1c20cb4

    std::cout << &wdog->mode << std::endl;  // 0x1c20c18 - Wrong addr, should be 0x1c20cb8

    return 0;

}

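So, win, the 0x10 offset is correct. But the pointers are all wrong: the struct is overlaid on the timer base (0x01c20c00) rather than on the first watchdog register at base + 0xA0 (0x01c20ca0).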

#include <iostream>

#define u32                  unsigned int

// #define SUNXI_TIMER_BASE  0x01c20c00

#define WDOG0_IRQ_EN_REG     0x01c20ca0 // Ends in A0

 

int main() {

    struct sunxi_wdog {

        u32 irq_en;  /* 0x00 */

        u32 irq_sta; /* 0x04 */

        u32 res1[2];

        u32 ctl;     /* 0x10 */

        u32 cfg;     /* 0x14 */

        u32 mode;    /* 0x18 */

        u32 res2;

    };

 

    static const struct sunxi_wdog *wdog =

        (struct sunxi_wdog *)WDOG0_IRQ_EN_REG;

    

    std::cout << &wdog->ctl << std::endl;   // 0x1c20cb0 - Correct

    std::cout << &wdog->cfg << std::endl;   // 0x1c20cb4 - Correct

    std::cout << &wdog->mode << std::endl;  // 0x1c20cb8 - Correct

    return 0;

}

Even with the correct register addresses, the board still does not reset.
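
The same layout question can be settled at compile time instead of by printing pointers. This is a sketch in C11 that mirrors the struct above using `offsetof` and `_Static_assert`; the `WDOG_BASE` macro and the `wdog_field_addr` helper are hypothetical names introduced for illustration.

```c
#include <stddef.h>
#include <stdint.h>

struct sunxi_wdog {
    uint32_t irq_en;   /* 0x00 */
    uint32_t irq_sta;  /* 0x04 */
    uint32_t res1[2];  /* 0x08 and 0x0C */
    uint32_t ctl;      /* 0x10 */
    uint32_t cfg;      /* 0x14 */
    uint32_t mode;     /* 0x18 */
    uint32_t res2;     /* 0x1C */
};

#define TIMER_BASE  0x01C20C00u
#define WDOG_BASE   (TIMER_BASE + 0xA0u)  /* first watchdog register */

/* The build fails if the struct layout drifts from the register map. */
_Static_assert(offsetof(struct sunxi_wdog, ctl)  == 0x10, "ctl offset");
_Static_assert(offsetof(struct sunxi_wdog, cfg)  == 0x14, "cfg offset");
_Static_assert(offsetof(struct sunxi_wdog, mode) == 0x18, "mode offset");

/* Absolute address of a struct field when overlaid on WDOG_BASE. */
static uint32_t wdog_field_addr(size_t field_offset)
{
    return WDOG_BASE + (uint32_t)field_offset;
}
```

Overlaying the struct on `WDOG_BASE` rather than `TIMER_BASE` is what makes `ctl`, `cfg`, and `mode` land on 0x01C20CB0, 0x01C20CB4, and 0x01C20CB8.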

Does the watchdog physically even work in the first place?

I found a thread suggesting that the Allwinner hardware may be broken.

The issue is real except on Pine H64 and Rongpin RP-H6B which seems to be NOT affected. Lot of users on OrangePi boards (Lite2 / One Plus and 3) are complaining about this issue.

and

We perform a simple watchdog test on different boards:

Pine H64 = H6 V200-AWIN H6448BA 7782 => OK
OrangePi Lite 2 = H6 V200-AWIN H8068BA 61C2 => KO
PineH64 = H8069BA 6892 => OK
Orange Pi 3 = HA047BA 69W2 => KO
OPiOnePlus = H7310BA 6842 => KO
OPiLite2 = H6448BA 6662 => KO
Beelink GS1 = H6 V200-AWIN H7309BA 6842 => KO

The community consensus for sunxi hardware is, again, to use the trusted watchdog (R_WDOG). But, how?

In the 705-page Allwinner A64 user guide, there is only one solitary reference to an R_WDOG register, at 0x01F01000 on page 74. That, and a mention of R_WDOG being a secure module under the “CPUS” domain. There is no reference on how to use the module or what register offsets it uses.

Let’s see if I can add the trusted watchdog to the A64 device tree with a graft from the H6 device tree.

// From Allwinner H6 manual pg. 72

//     compatible = "allwinner,sun50i-h6-wdt",

//                  "allwinner,sun6i-a31-wdt";

//     reg = <0x07020400 0x20>;

//     interrupts = <GIC_SPI 103 IRQ_TYPE_LEVEL_HIGH>;

//     clocks = <&osc24M>;

// };

 

// Update the register to use R_WDOG for the A64

r_watchdog: watchdog@1f01000 {

    compatible = "allwinner,sun50i-h6-wdt",

                 "allwinner,sun50i-a64-wdt",

                 "allwinner,sun6i-a31-wdt";

    reg = <0x01f01000 0x20>;

    interrupts = <GIC_SPI 103 IRQ_TYPE_LEVEL_HIGH>;

    clocks = <&osc24M>;

};

No effect. Truthfully, I’m not even sure if the GIC_SPI 103 grafted to the A64 does anything. I’ll leave this alone and try something else.

Do I have a core with an old revision that might be defective?

Let’s find the A64 revision number using a function in assembly we cannot normally access.

// TF-A: bl31_main.c

extern int cpu_get_rev_var(void);

 

// Get revision number

INFO("CPU revision: %d\n", cpu_get_rev_var());

// Output: "INFO:    CPU revision: 4"

My board is on revision 4, and that is the latest revision. Good, good.

Let’s make sure all the processor errata are enabled in TF-A to be safe (in a Dockerfile).

RUN git clone --depth=1 --single-branch -b v2.4 \

    https://github.com/ARM-software/arm-trusted-firmware.git arm-trusted-firmware

 

RUN cd arm-trusted-firmware &&\

    make \

    PLAT=sun50i_a64 \

    DEBUG=1 \

    bl31 \

    LOG_LEVEL=40 \

    ERRATA_A53_819472=1 \

    ERRATA_A53_824069=1 \

    ERRATA_A53_826319=1 \

    ERRATA_A53_827319=1 \

    ERRATA_A53_835769=1 \

    ERRATA_A53_836870=1 \

    ERRATA_A53_843419=1 \

    ERRATA_A53_855873=1 \

    ERRATA_A53_1530924=1

Let’s keep going and try something else.

As a PoC, can I hack TF-A to enter an infinite reset loop via secure watchdog?

Instead of handing off to the next boot stage after TF-A (BL31), can I directly invoke the watchdog reset code that I tried earlier, this time from the secure world?

/*******************************************************************************

* This function programs EL3 registers and performs other setup to enable entry

* into the next image after BL31 at the next ERET.

******************************************************************************/

void __init bl31_prepare_next_image_entry(void) {

    INFO("** Let's try a reboot! **\n");

 

#define mmio_read_32(c)      ({unsigned int __v = (*(volatile unsigned int *)(c)); __v; })

#define mmio_write_32(c, v)  ((*(volatile unsigned int *)(c)) = (v))

#define TIMER_REG            0x01C20C00

 

    mmio_write_32(TIMER_REG + 0xB4, 0x1);  // Set 1:0 to 01

    mmio_write_32(TIMER_REG + 0xB8, (1 << 5));  // Set 7:4 to 0010 (...00100000), a 2s interval

    mmio_write_32(TIMER_REG + 0xB8, mmio_read_32(TIMER_REG + 0xB8) | (1 << 0));  // Set 0:0 to 1 to enable WDT

    mdelay(500);

    mmio_write_32(TIMER_REG + 0xB0, mmio_read_32(TIMER_REG + 0xB0) | (0xA57 << 1) | (1 << 0));  // Set 12:1 as 0xA57, and 0:0 to 1

 

    mdelay(5000);

    ERROR("System reset failed\n");

}

Results: the system hangs.

U-Boot SPL 2021.01 Armcube (Mar 25 2021 - 20:22:06 +0000)

DRAM: 2048 MiB

Trying to boot from MMC1

NOTICE:  BL31: v2.4(debug):v2.4-dirty

NOTICE:  BL31: Built : 20:21:36, Mar 25 2021

NOTICE:  BL31: Detected Allwinner A64/H64/R18 SoC (1689)

NOTICE:  BL31: Found U-Boot DTB at 0x407d2d0, model: SoPine with baseboard

... snip ...

INFO:    BL31: Platform setup done

INFO:    BL31: Initializing runtime services

INFO:    BL31: cortex_a53: CPU workaround for 819472 was applied

INFO:    BL31: cortex_a53: CPU workaround for 824069 was applied

INFO:    BL31: cortex_a53: CPU workaround for 827319 was applied

INFO:    BL31: cortex_a53: CPU workaround for 843419 was applied

INFO:    BL31: cortex_a53: CPU workaround for 855873 was applied

INFO:    BL31: cortex_a53: CPU workaround for 1530924 was applied

NOTICE:  PSCI: System suspend is unavailable

INFO:    ** Let's try a reboot! **

- no further output -

Let’s dig a little deeper with an experiment in timing:

void __init bl31_prepare_next_image_entry(void) {

    INFO("** Let's try a reboot! **\n");

 

#define mmio_read_32(c)      ({unsigned int __v = (*(volatile unsigned int *)(c)); __v; })

#define mmio_write_32(c, v)  ((*(volatile unsigned int *)(c)) = (v))

#define TIMER_REG            0x01C20C00

 

    mmio_write_32(TIMER_REG + 0xB4, 0x1);       // Set 1:0 to 01

    mmio_write_32(TIMER_REG + 0xB8, (1 << 5));  // Set 7:4 to 0010 (...00100000), a 2s interval

    mmio_write_32(TIMER_REG + 0xB8, mmio_read_32(TIMER_REG + 0xB8) | (1 << 0));  // Set 0:0 to 1 to enable WDT

    mdelay(500);

    mmio_write_32(TIMER_REG + 0xB0, (0xA57 << 1) | (1 << 0));  // Set 12:1 as 0xA57, and 0:0 to 1

 

    // See how long the CPU keeps running

    for (int i = 1; i <= 200; i++) {

        INFO("%d ms\n", (i * 50));

        mdelay(50);

    }

    ERROR("System reset failed\n");

}

Here are several timing results:

U-Boot SPL 2021.01 Armcube (Mar 25 2021 - 21:27:30 +0000)

...

INFO:    2100 ms

INFO:    2150 ms

INFO:    2200 ms

INFO:    2250 ms

INFO:    2300 ms

- No further output -

U-Boot SPL 2021.01 Armcube (Mar 25 2021 - 21:27:30 +0000)

...

INFO:    1750 ms

INFO:    1800 ms

INFO:    1850 ms

INFO:    1900 ms

INFO:    1950 ms

- No further output -

U-Boot SPL 2021.01 Armcube (Mar 25 2021 - 21:27:30 +0000)

...

INFO:    3050 ms

INFO:    3100 ms

INFO:    3150 ms

INFO:    3200 ms

INFO:    3250 ms

IN

- No further output -

This means the CPU keeps running after the watchdog timer starts, then eventually halts, but still no restart. Running the test a dozen more times and eyeballing the average shows that the CPU runs for about two seconds before halting. This coincides with setting TIMER_REG + 0xB8 to (1 << 5) which is a two-second watchdog period. This at least means the regular watchdog fires on time.
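
To double-check which timeout a given WDOG_MODE_REG value selects, the interval field can be decoded on a host machine. This is a sketch under an assumption: the 0.5s, 1s, 2s, and 16s (maximum) entries agree with values mentioned in this post, but the rest of the table is my recollection of the sun6i-style watchdog interval encoding and should be checked against the manual. The helper name is hypothetical.

```c
#include <stdint.h>

/* Interval code (bits 7:4) to milliseconds. Assumed table: verify the
 * middle entries against the A64 user manual before relying on them. */
static const uint32_t wdog_interval_ms[] = {
    500, 1000, 2000, 3000, 4000, 5000,
    6000, 8000, 10000, 12000, 14000, 16000
};

/* Timeout in ms for a WDOG_MODE_REG value, or 0 for a reserved code. */
static uint32_t wdog_mode_to_ms(uint32_t mode_reg)
{
    uint32_t code = (mode_reg >> 4) & 0xFu;

    if (code >= sizeof(wdog_interval_ms) / sizeof(wdog_interval_ms[0]))
        return 0;
    return wdog_interval_ms[code];
}
```

Here `wdog_mode_to_ms(1 << 5)` returns 2000, matching the roughly two seconds observed, while `wdog_mode_to_ms(0x10)` returns 1000.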

Changing TIMER_REG from 0x01C20C00 (WDT) to 0x01f01000 (R_WDT) results in the same behaviour. Then, why doesn’t the system restart?

On reset, does the instruction pointer jump past the BL1 (Boot ROM) code into nothingness?

The watchdog times out and the CPUs halt. What would make a CPU halt? Plausibly, it lost power, or a jump landed in a NOP slide running to the end of memory. Just to be safe, I’ll explicitly set the reset vector for each core. Note: Allwinner isn’t clear on whether CPUCFG starts at 0x01700c00 or 0x01700000, so I tried both.

#define SUNXI_CPUCFG_BASE               0x01700000 // Also tried 0x01700c00

#define SUNXI_CPUCFG_RVBAR_LO_REG(n)    (SUNXI_CPUCFG_BASE + 0xa0 + (n) * 8)

#define SUNXI_CPUCFG_RVBAR_HI_REG(n)    (SUNXI_CPUCFG_BASE + 0xa4 + (n) * 8)

 

/* Program all CPU entry points. */

for (unsigned int cpu = 0; cpu < 4; ++cpu) {

    mmio_write_32(SUNXI_CPUCFG_RVBAR_LO_REG(cpu), 0x0); // Default, anyway

    mmio_write_32(SUNXI_CPUCFG_RVBAR_HI_REG(cpu), 0x0); // Default, anyway

}

No new effect.

Can we even write to the reset vectors?

Yes, we can. From the experiment below, bits 31:2 of the register at 0x01700000 + 0xa0 are writable, while 0x01700c00 + 0xa0 reads back as zero.

// Note: Only bits 31:2 are valid

mmio_write_32(0x01700000 + 0xa0,0xffffffff);

INFO("SUNXI_CPUCFG_RVBAR_LO_REG: 0x%08x\n", mmio_read_32(0x01700000 + 0xa0 ));

// Output: "INFO:    SUNXI_CPUCFG_RVBAR_LO_REG: 0xfffffffc"

 

// Note: Only bits 31:2 are valid

mmio_write_32(0x01700c00 + 0xa0,0xffffffff);

INFO("SUNXI_CPUCFG_RVBAR_LO_REG: 0x%08x\n", mmio_read_32(0x01700c00 + 0xa0 ));

// Output: "INFO:    SUNXI_CPUCFG_RVBAR_LO_REG: 0x00000000"

Having experimented with writing several jump points like 0x04000000, 0x80000000, 0x80010000, and of course 0x00000000, I’m still no closer to solving this.
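
For reference, the per-core reset vector addresses and the read-back behaviour seen above can be reproduced on a host machine. This is a sketch that assumes the 0x01700000 base (the one that accepted writes) and models the register as keeping only bits 31:2; `rvbar_lo_readback` and `RVBAR_LO_VALID_MASK` are hypothetical names, and nothing here accesses real hardware.

```c
#include <stdint.h>

#define SUNXI_CPUCFG_BASE             0x01700000u
#define SUNXI_CPUCFG_RVBAR_LO_REG(n)  (SUNXI_CPUCFG_BASE + 0xA0u + (n) * 8u)
#define SUNXI_CPUCFG_RVBAR_HI_REG(n)  (SUNXI_CPUCFG_BASE + 0xA4u + (n) * 8u)

/* Only bits 31:2 are valid, so the two low bits always read as zero. */
#define RVBAR_LO_VALID_MASK           0xFFFFFFFCu

/* Model of what reads back after writing 'value' to an RVBAR_LO register. */
static uint32_t rvbar_lo_readback(uint32_t value)
{
    return value & RVBAR_LO_VALID_MASK;
}
```

Writing 0xffffffff and reading back 0xfffffffc, as in the experiment, is exactly what the mask predicts; word-aligned entry points such as 0x80010000 pass through unchanged.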

Is this a clusterboard problem only, or does reset work on the baseboard?

Reset works in the baseboard (thanks, Dave).

Reset works in the SOPINE baseboard

The exact same SD card with the hacked TF-A code enters an infinite reset loop on the baseboard. See below.

U-Boot SPL 2021.01 Armcube (Mar 26 2021 - 21:56:18 +0000)

DRAM: 2048 MiB

Trying to boot from MMC1

NOTICE:  BL31: v2.4(debug):v2.4-dirty

NOTICE:  BL31: Built : 21:55:52, Mar 26 2021

NOTICE:  BL31: Detected Allwinner A64/H64/R18 SoC (1689)

NOTICE:  BL31: Found U-Boot DTB at 0x407d2d0, model: SoPine with baseboard

INFO:    ARM GICv2 driver initialized

INFO:    Configuring SPC Controller

INFO:    PMIC: Probing AXP803 on RSB

INFO:    PMIC: dcdc1 voltage: 3.300V

INFO:    PMIC: dcdc5 voltage: 1.200V

INFO:    PMIC: dcdc6 voltage: 1.100V

INFO:    PMIC: dldo1 voltage: 3.300V

INFO:    PMIC: dldo2 voltage: 3.300V

INFO:    PMIC: dldo4 voltage: 3.300V

INFO:    PMIC: fldo1 voltage: 1.200V

INFO:    PMIC: Enabling DC SW

INFO:    BL31: Platform setup done

INFO:    BL31: Initializing runtime services

INFO:    BL31: cortex_a53: CPU workaround for 819472 was applied

INFO:    BL31: cortex_a53: CPU workaround for 824069 was applied

INFO:    BL31: cortex_a53: CPU workaround for 827319 was applied

INFO:    BL31: cortex_a53: CPU workaround for 843419 was applied

INFO:    BL31: cortex_a53: CPU workaround for 855873 was applied

INFO:    BL31: cortex_a53: CPU workaround for 1530924 was applied

NOTICE:  PSCI: System suspend is unavailable

INFO:    ** Let's try a reboot! **

INFO:    50 ms

INFO:    100 ms

... snip ...

INFO:    3050 ms

INFO:    3100 ms

INFO:    3150 ms

 

U-Boot SPL 2021.01 Armcube (Mar 26 2021 - 21:56:18 +0000)

DRAM: 2048 MiB

Trying to boot from MMC1

NOTICE:  BL31: v2.4(debug):v2.4-dirty

...

Why does this work? The power chip (PMIC) is on the SOPINE module. The SoC has the Advanced Peripheral Bus (APB), which carries the reset signal, internal to the A64 die. The only notable external components on the clusterboard are the RTL8370N Ethernet switch IC and the RTL8211E Ethernet port chip per SOPINE. The baseboard also has the RTL8211E Ethernet port chip.

What happens if USB and Ethernet devices are disabled in the device tree to remove external interference factors?

In this experiment, I’ve disabled USB and Ethernet in the device tree in case they somehow hold up the reset procedure.

&ohci0  { status = "disabled"; };

&ohci1  { status = "disabled"; };

&usbphy { status = "disabled"; };

&ehci0  { status = "disabled"; };

&ehci1  { status = "disabled"; };

&emac   { status = "disabled"; };

No restart observed. The SoC still hangs.

Is there a way to interrupt the reset process? Can a peripheral or module prevent a restart?

The reset signal (special register writes via the APB) is sent to all the SoC modules, not just the CPU. This probably isn’t a parallel operation: if all peripherals share one bus, each one needs addressing, and only one address can be on the bus at a time, so the reset must be a sequential process. Can this sequential reset queue be held up somehow? That is what I was trying to rule out by disabling the USB and Ethernet previously.

Let’s dive into the ARM SoC watchdog module SP805 for some context.

Skimming over the details, the most important information I could tease out of the watchdog docs is that it requires two clocks – one to drive the watchdog counter, and the other to drive the APB bus. Could the APB bus clock have stopped somehow on the clusterboard but not the baseboard? Let’s come back to this later.

What other hardware differences are there between the baseboard and the clusterboard?

There is one other difference between the baseboard and clusterboard: powering the real-time clock (RTC). Let’s compare.

SOPINE baseboard’s RTC is powered by the rails (source)
SOPINE clusterboard uses mandatory AA batteries for RTC (source)

Did Pine64 revise the SOPINE module to sideline the 3.0V VCC-RTC from the power IC (PMIC) so only a physical battery can power the RTC? The schematics are in flux, so I’ll leave it to the experts to decide:

SOPINE module doesn’t use 3.0V PMIC RTC anymore?

Just to be more convincing, here is the Pine64 LTS schematic:

Pine64 LTS PMIC VCC_RTC is unused

Why should this matter? Isn’t the RTC optional and used to keep the date and time roughly accurate while the SoC is powered off? Let’s pull on this thread a bit since a powered RTC seemingly allows the SoC to reset.

How important is the real-time clock (RTC) to the A64 SoC? Isn’t it supposed to be optional?

From Allwinner,

The real-time clock (RTC) is for calendar usage … The unit can be operated by the backup battery while the system power is off. … The alarm generates an alarm signal at a specified time in the power-off mode or normal operation mode. In normal operation mode, both the alarm interrupt and the power management wakeup are activated. (source)

It seems the RTC has an alarm that is capable of waking up power management. This could be what restarts the CPU. This could be the RTCINTR signal in the functional block diagram below from ARM.

ARM RTC block operation

Let’s ask some more questions.

Does the A64 VCC-RTC pin power everything in the RTC block?

If you are like me, you probably haven’t wondered why some SoC modules are prefixed with “R_”.

A64 RTC and R-modules

From my research,

The AR100, also called the CPUS or ARISC in SoC documentation is a coprocessor present in the A31 and newer sunxi SoCs. While the name “AR100” refers only to the OpenRISC CPU core, the processor is tightly integrated with other “RTC block” hardware. In general, any device whose name begins with “R_” is intended to be controlled by the AR100. This includes the R_PIO, R_PRCM, and several timers. (source)

From the A64 power docs, there is a VDD-CPUS pin to power the above. It’s safe to say that the A64 VCC-RTC pin powers only the RTC. So, no RTC power, no RTC alarm?

Could the clusterboard WDT reset solution be as simple as adding 2xAA batteries to the clusterboard to power the RTC?

Holy smokes, the clusterboard resets!

Result: Powering the real-time clock (RTC) in the A64 SoCs (strangely) allows a reset.

Clusterboard 2xAA batteries to power the RTC

Barring any other discoveries, the current hypothesis is that the RTC needs power, and the only way to achieve that on the clusterboard is with batteries.

Hang on. The RTC runs without batteries; batteries keep it going when the main power is off.

Let’s examine some RTC registers without and with external RTC (battery) power. Here is the test code.

// bl31_main.c

#define RTC_REG     0x01f00000

...

for (int i = 1; i <= 2000; i++) {

    INFO("%d ms: YMD: 0x%04x, HMS: 0x%04x, ALRM0_CNT: 0x%04x, ALRM0_CUR: 0x%04x, ALRM0_EN: 0x%01x, ALRM_CFG: 0x%04x\n",

         (i * 5),

         mmio_read_32(RTC_REG + 0x10),

         mmio_read_32(RTC_REG + 0x14),

         mmio_read_32(RTC_REG + 0x20),

         mmio_read_32(RTC_REG + 0x24),

         mmio_read_32(RTC_REG + 0x28),

         mmio_read_32(RTC_REG + 0x50)

         );

    mdelay(5);

}

First, without batteries.

INFO:    610 ms: YMD: 0x0101, HMS: 0x0001, ALRM0_CNT: 0x0000, ALRM0_CUR: 0x0000, ALRM0_EN: 0x0, ALRM_CFG: 0x0000

INFO:    650 ms: YMD: 0x0101, HMS: 0x0001, ALRM0_CNT: 0x0000, ALRM0_CUR: 0x0000, ALRM0_EN: 0x0, ALRM_CFG: 0x0000

INFO:    655 ms: YMD: 0x0101, HMS: 0x0001, ALRM0_CNT: 0x0000, ALRM0_CUR: 0x0000, ALRM0_EN: 0x0, ALRM_CFG: 0x0000

INFO:    660 ms: YMD: 0x0101, HMS: 0x0002, ALRM0_CNT: 0x0000, ALRM0_CUR: 0x0000, ALRM0_EN: 0x0, ALRM_CFG: 0x0000

INFO:    665 ms: YMD: 0x0101, HMS: 0x0002, ALRM0_CNT: 0x0000, ALRM0_CUR: 0x0000, ALRM0_EN: 0x0, ALRM_CFG: 0x0000

INFO:    670 ms: YMD: 0x0101, HMS: 0x0002, ALRM0_CNT: 0x0000, ALRM0_CUR: 0x0000, ALRM0_EN: 0x0, ALRM_CFG: 0x0000

INFO:    675 ms: YMD: 0x0101, HMS: 0x0002, ALRM0_CNT: 0x0000, ALRM0_CUR: 0x0000, ALRM0_EN: 0x0, ALRM_CFG: 0x0000

INFO:    680 ms: YMD: 0x0101, HMS: 0x0002, ALRM0_CNT: 0x0000, ALRM0_CUR: 0x0000, ALRM0_EN: 0x

-- no further output --

Now, with batteries to power the RTC. This is the same for the SOPINE baseboard.

INFO:    395 ms: YMD: 0x0101, HMS: 0x0214, ALRM0_CNT: 0x0000, ALRM0_CUR: 0x0000, ALRM0_EN: 0x0, ALRM_CFG: 0x0000

INFO:    400 ms: YMD: 0x0101, HMS: 0x0214, ALRM0_CNT: 0x0000, ALRM0_CUR: 0x0000, ALRM0_EN: 0x0, ALRM_CFG: 0x0000

INFO:    405 ms: YMD: 0x0101, HMS: 0x0214, ALRM0_CNT: 0x0000, ALRM0_CUR: 0x0000, ALRM0_EN: 0x0, ALRM_CFG: 0x0000

INFO:    410 ms: YMD: 0x0101, HMS: 0x0214, ALRM0_CNT: 0x0000, ALRM0_CUR: 0x0000, ALRM0_EN: 0x0, ALRM_CFG: 0x0000

INFO:    415 ms: YMD: 0x0101, HMS: 0x0214, ALRM0_CNT: 0x0000, ALRM0_CUR: 0x0000, ALRM0_EN: 0x0, ALRM_CFG: 0x0000

INFO:    420 ms: YMD: 0x0101, HMS: 0x0214, ALRM0_CNT: 0x0000, ALRM0_CUR: 0x0000, ALRM0_EN: 0x0, ALRM_CFG: 0x0000

INFO:    425 ms: YMD: 0x0101, HMS: 0x0215, ALRM0_CNT: 0x0000, ALRM0_CUR: 0x0000, ALRM0_EN: 0x0, ALRM_CFG: 0x0000

INFO:    430 ms: YMD: 0x0101, HMS: 0x0215, ALRM0_CNT: 0x0000, ALRM0_C

-- a reset occurs --

The RTC seconds counter increments with and without batteries, as expected, but the alarm registers are all empty in both cases. I suspect the RTC is a red herring.
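To read the dumps above, the raw values can be split into calendar fields. This is a minimal sketch, assuming the bit layout used by the Linux sun6i RTC driver for the date register (offset 0x10: day in bits 4:0, month in 11:8, year in 21:16) and the time register (offset 0x14: seconds in 5:0, minutes in 13:8, hours in 20:16). The struct and function names are mine, for illustration only.

```c
#include <stdint.h>

/* Decoded RTC fields. Names are illustrative, not taken from TF-A. */
struct rtc_time_fields { unsigned hour, minute, second; };
struct rtc_date_fields { unsigned year, month, day; };

/* Assumed layout: seconds 5:0, minutes 13:8, hours 20:16. */
struct rtc_time_fields decode_hms(uint32_t hms)
{
    struct rtc_time_fields t;
    t.second = hms & 0x3f;
    t.minute = (hms >> 8) & 0x3f;
    t.hour   = (hms >> 16) & 0x1f;
    return t;
}

/* Assumed layout: day 4:0, month 11:8, year offset 21:16. */
struct rtc_date_fields decode_ymd(uint32_t ymd)
{
    struct rtc_date_fields d;
    d.day   = ymd & 0x1f;
    d.month = (ymd >> 8) & 0x0f;
    d.year  = (ymd >> 16) & 0x3f;
    return d;
}
```

Under that assumed layout, `decode_hms(0x0214)` reads as 00:02:20 and `decode_hms(0x0002)` as 00:00:02, while `decode_ymd(0x0101)` is day 1 of month 1 with a year offset of 0.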

Let’s examine the schematics again. Some pins are pulled high by the VCC-RTC line.

In several ARM SoC designs, the external non-maskable interrupt (AP-NMI#) pin and AP-RESET# pin are pulled high by the VCC-RTC. For example:

Example: A31 NMI pulled high by VCC-RTC

What is unique about the Pine64 designs is that the power IC (PMIC) does not feed the VCC-RTC line. Most other SoC board designs have the dedicated, regulated PMIC VCC_RTC output feed the RTC, alongside the battery, via the VCC-RTC line because it is in the always-on power domain.

Pine64 LTS PMIC VCC_RTC is unused


AXP803 power-management IC (PMIC) facts (ref):

  1. AXP803’s PWROK pin is pulled up to RTCLDO (outputs to VCC_RTC) internally.
  2. RTCLDO is always on, even during power down or reset.
  3. RTCLDO is powered by IPSOUT and feeds from ACIN/VBUS or BAT.
  4. PWROK is tied to AP-RESET# on the A64 SoC.
  5. PWROK stands for Power-On Key, not “Power OK”.
  6. When PMIC is shut down, VCC_RTC will be shut off for two seconds and pulled to GND via 1kΩ.
  7. The IRQ pin needs a 10kΩ pull-high (usually to VCC-RTC) as it is NMOS open-drain.

Inferences:

  1. Without VCC-RTC to pull the PMIC’s IRQ pin high, IRQ floats or is grounded. Without the battery, does the PMIC fail to signal an interrupt (AP-NMI#) to the SoC (possibly missing a wake-up signal)?
  2. AP-RESET# is pulled high internally in the PMIC via RTCLDO, so the battery doesn’t affect this logic.

My new working hypothesis is that the NMI is never properly asserted without a battery on the clusterboard.

How important is the NMI pin to a reset on the SoC?

Interrupts are complex, so I’ll itemize some facts I’ve learned about the NMI pin.

SoC interrupt facts (ref):

  1. Allwinner sunxi SoCs (A31 and newer, including the sun50i A64) have two interrupt controllers: GIC and R_INTC.
  2. GIC does not support wakeup and is inaccessible from the ARISC (power CPU).
  3. All IRQs that can be used to wake up the system are routed through R_INTC.
  4. All wake IRQs are enabled during suspend.
  5. R_INTC controls the NMI pin, the trigger, and mask for the NMI input pin.
  6. R_INTC provides the interrupt input for the ARISC coprocessor.
  7. R_INTC is in the always-on power domain.
  8. NMI pin is routed to the “IRQ 0” input on R_INTC.
  9. NMI trigger type is controlled by the NMI_CTRL_REG.
  10. SCP firmware = Crust = power management firmware.
  11. During suspend, the Crust will enable the interrupt input to the AR100.
  12. AR100 will treat any IRQ (subject to a mask) as a trigger to wake up.
  13. AR100 = CPUS = ARISC.

The NMI pin is the second-highest-priority interrupt (IRQ), second only to the RESET interrupt. When the SoC is reset or suspended, the NMI can easily trigger a wake-up and/or reinitialization of the BROM, CPUs, peripherals, and so on. Additionally, the PMIC IRQ pin is asserted on thermal problems, rechargeable battery removal/insertion, power drop, and other programmable situations.

Is VCC-RTC getting power on the SOPINE baseboard but not the clusterboard?

Here is the SOPINE baseboard schematic. Below is the PCB trace of the baseboard just for fun.

SOPINE baseboard PCB traces

On my baseboard, diode OD4 is missing (which is good because VCC-RTC and BAT-RTC are shorted through a 0Ω resistor), so VCC-RTC is seemingly only powered by a battery. Let’s put a multimeter on VCC-RTC and see if it is powered.

Baseboard electrical measurements:

  1. SOPINE removed, power on, VCC-RTC is 0V.
  2. SOPINE removed, power off, VCC-RTC resistance is infinite.
  3. SOPINE inserted, power on, VCC-RTC is 2.78V.
  4. SOPINE inserted, power off, VCC-RTC resistance increases from ~1.5MΩ (settles on 3.25MΩ).
  5. SOPINE inserted, power on, 1kΩ series resistor, VCC-RTC draws ~2.7mA.
  6. SOPINE inserted, power off, VCC-RTC capacitance is 0.96uF.

Clusterboard (v2.3) electrical measurements:

  1. SOPINEs removed, power on, VCC-RTC is 0V.
  2. SOPINEs removed, power off, VCC-RTC resistance is infinite.
  3. 7xSOPINE inserted, power on, VCC-RTC is 2.78V (across battery holder).
  4. 1xSOPINE inserted, power off, VCC-RTC resistance increases from ~700kΩ (settles on 3.26MΩ).
  5. 7xSOPINE inserted, power off, VCC-RTC resistance increases from ~140kΩ (settles on 240kΩ).
  6. 7xSOPINE inserted, power on, 1kΩ series resistor, VCC-RTC draws ~2.7mA.

Clusterboard (v2.3) VCC-RTC capacitance measurements:

  1. 1xSOPINE inserted, power off, VCC-RTC non-convergent capacitance.
  2. 2xSOPINE inserted, power off, VCC-RTC capacitance is 1.9uF.
  3. 3xSOPINE inserted, power off, VCC-RTC capacitance is 3.1uF.
  4. 4xSOPINE inserted, power off, VCC-RTC capacitance is 4.7uF.
  5. 5xSOPINE inserted, power off, VCC-RTC capacitance is 6.6uF.
  6. 6xSOPINE inserted, power off, VCC-RTC capacitance is 8.9uF.
  7. 7xSOPINE inserted, power off, VCC-RTC non-convergent capacitance.

Clusterboard (v2.3) battery measurements:

  1. 2xAA new lithium batteries voltage is 3.6V.
  2. 2xAA, no SOPINEs inserted, power off, current draw is 0.00mA.
  3. 2xAA, 1xSOPINE inserted, power off, current draw is 0.07mA.
  4. 2xAA, 3xSOPINE inserted, power off, current draw is 0.19mA.
  5. 2xAA, 7xSOPINE inserted, power off, current draw is 0.45mA.
  6. 2xAA, 1xSOPINE inserted, power on, current draw is 0.43mA, resets.
  7. 2xAA, 2xSOPINE inserted, power on, current draw is 0.86mA, resets.
  8. 2xAA, 3xSOPINE inserted, power on, current draw is 1.29mA, resets.
  9. 2xAA, 7xSOPINE inserted, power on, current draw is 3.01mA, resets.
  10. 1xAA, 1xSOPINE inserted, power on, current draw is 160mA, no reset.
SOPINE VCC-RTC 1.1uF capacitor bank

Observations:

  1. If VCC-RTC were connected to the PMIC’s VCC_RTC, then the 10uF (C70) would be in parallel, and the single SOPINE capacitance would be 11.1uF, not 1.1uF.
  2. In the clusterboard, a single SOPINE doesn’t restart without battery power either, same as seven SOPINEs.
  3. A clusterboard SOPINE has a calculated VCC-RTC resistance of 1.68MΩ (7 × 240kΩ, assuming identical modules).
  4. 2xAA lithium batteries with 7000mAh will last 1.77 years at 0.45mA RTC draw (power off).
  5. 2xAA lithium batteries with 7000mAh will last only 97 days at 3.01mA RTC draw (power on).
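
The two battery-life figures follow from dividing capacity by current. Here is a minimal sketch, assuming a constant draw and taking the 7000mAh capacity from the list above at face value (self-discharge and voltage sag are ignored, and the function name is mine):

```c
/*
 * Runtime in days for a battery of capacity_mah at a constant draw_ma.
 * Returns -1.0 when the draw is not positive, since the lifetime is
 * undefined in this simple model.
 */
double battery_life_days(double capacity_mah, double draw_ma)
{
    if (draw_ma <= 0.0)
        return -1.0;
    return capacity_mah / draw_ma / 24.0; /* mAh / mA = hours */
}
```

`battery_life_days(7000.0, 0.45)` gives roughly 648 days (about 1.77 years), and `battery_life_days(7000.0, 3.01)` gives roughly 97 days.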

Why is the calculated VCC-RTC resistance of a clusterboard SOPINE 1.68MΩ when a measured resistance is 3.26MΩ?

Parallel resistance

Given the measured 240kΩ VCC-RTC resistance across seven SOPINEs, each one must have a 1.68MΩ resistance if they are identical, but one was measured at 3.26MΩ. That is suspicious. Let’s measure the resistance across each SOPINE individually.
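
The 1.68MΩ estimate is the parallel-resistance formula, 1/R = Σ 1/Rᵢ, run backwards for seven equal resistors. A small sketch of the forward calculation makes the check explicit (the function name is mine, not from any library):

```c
#include <stddef.h>

/*
 * Equivalent resistance of n resistors in parallel: 1/R = sum(1/R_i).
 * Returns -1.0 for invalid input (no resistors or a non-positive value).
 */
double parallel_resistance(const double *ohms, size_t n)
{
    double conductance = 0.0;

    if (n == 0)
        return -1.0;
    for (size_t i = 0; i < n; i++) {
        if (ohms[i] <= 0.0)
            return -1.0;
        conductance += 1.0 / ohms[i];
    }
    return 1.0 / conductance;
}
```

Seven equal 1.68MΩ values come out at 240kΩ. Unequal per-module values shift the result, which is why the individual measurements matter.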

Settled VCC-RTC resistance per SOPINE:

Measured VCC-RTC resistances across SOPINEs

The calculated parallel resistance is 278kΩ, which is reasonably close to the 240kΩ observed. We’ve learned the SOPINEs have different internal VCC-RTC resistances.

Do the different SOPINE VCC-RTC resistances affect the restart?

No. Both the 3.26MΩ SOPINE and the 1.19MΩ SOPINE fail to restart in the clusterboard, but both restart just fine in the baseboard. The problem likely isn’t related to a silicon defect.

Could the RTC’s external 32.768 kHz crystal not be active with no VCC-RTC?

No. Measured with an oscilloscope, the 32.768 kHz crystal (found just below the word “Designed” on the back of the SOPINE) outputs a perfect 32.768 kHz sine wave without batteries on the clusterboard. My hope was that somehow the xtal was unpowered so the RTC alarm wouldn’t activate.

Why do both the baseboard and the un-batteried clusterboard have 2.78V on the VCC-RTC line?

It’s possible that power is supplied internally by the A64 SoC in the absence of an external, dedicated VCC-RTC supply from batteries. I’m not able to find detailed power diagrams for the A64 SoC, but from a design point of view, it makes sense that the VCC-RTC pin is not electrically isolated while the RTC is on main power.

Could there be a 0.22V drop on the VCC-RTC line through a diode from 3.0V from the PMIC? No. I’ve established that the PMIC isn’t powering the VCC-RTC line. Also, the Schottky diode that was “deleted” from the schematics has a 0.49V drop which is too high.

Is the clusterboard under-voltaged?

No. The clusterboard’s 5V adapter outputs 5.16V, while the baseboard’s 5V adapter outputs 5.36V. This doesn’t matter because 5.16V is far above the dropout threshold for the PMIC’s 3.0V RTC output (which isn’t connected, anyway). I used an external PSU to reach 5.36V to power the clusterboard just to cover this unlikely possibility. External power can be excluded as a restart culprit.

How can I prevent a WDT restart in the baseboard?

If I can prevent a restart in the baseboard somehow, it may help me understand how the wakeup process happens after a WDT reset.

However, when I cripple the device tree and the PMIC regulator initialization code in TF-A, baseboard WDT restarts still take place. I have not been able to prevent WDT restarts in the baseboard.

Crippled PMIC regulators in TF-A code:

// axp803.c

#include <drivers/allwinner/axp.h>

 

const uint8_t axp_chip_id = AXP803_CHIP_ID;

const char *const axp_compatible = "x-powers,axp803";

 

const struct axp_regulator axp_regulators[] = {

    {"dcdc1", 1600, 3400, 100, NA, 0x20, 0x10, 0},

    {"dcdc5", 800,  1840, 10,  32, 0x24, 0x10, 4},

    {"dcdc6", 600,  1520, 10,  50, 0x25, 0x10, 5},

    // {"dldo1",  700, 3300, 100, NA, 0x15, 0x12, 3}, // disable

    // {"dldo2",  700, 4200, 100, 27, 0x16, 0x12, 4}, // disable

    // {"dldo3",  700, 3300, 100, NA, 0x17, 0x12, 5}, // disable

    // {"dldo4",  700, 3300, 100, NA, 0x18, 0x12, 6}, // disable

    // {"fldo1",  700, 1450,  50, NA, 0x1c, 0x13, 2}, // disable

    {}

};

 

// Reference:

//struct axp_regulator {

//    const char *dt_name;

//    uint16_t min_volt;

//    uint16_t max_volt;

//    uint16_t step;

//    unsigned char split;

//    unsigned char volt_reg;

//    unsigned char switch_reg;

//    unsigned char switch_bit;

//};

Crippled device tree:

/dts-v1/;

 

/ {

    model = "SOPINE with CRIPPLED baseboard";

    #address-cells = < 0x01 >;

    #size-cells = < 0x01 >;

 

    aliases {

    };

 

    chosen {

        #address-cells = < 0x01 >;

        #size-cells = < 0x01 >;

        ranges;

    };

 

    cpus {

        #address-cells = < 0x01 >;

        #size-cells = < 0x00 >;

 

            compatible = "arm,cortex-a53";

            device_type = "cpu";

            reg = < 0x00 >;

            enable-method = "psci";

            next-level-cache = < 0x01 >;

            clocks = < 0x02 0x15 >;

            clock-names = "cpu";

            #cooling-cells = < 0x02 >;

            phandle = < 0x03 >;

        };

 

            status = "disabled";

            phandle = < 0x04 >;

        };

 

            status = "disabled";

            phandle = < 0x05 >;

        };

 

            status = "disabled";

            phandle = < 0x06 >;

        };

 

        l2-cache {

            compatible = "cache";

            cache-level = < 0x02 >;

            phandle = < 0x01 >;

        };

    };

 

    osc24M_clk {

        #clock-cells = < 0x00 >;

        compatible = "fixed-clock";

        clock-frequency = < 0x16e3600 >;

        clock-output-names = "osc24M";

        phandle = < 0x07 >;

    };

 

    osc32k_clk {

        status = "disabled";

        phandle = < 0x08 >;

    };

 

    pmu {

        status = "disabled";

    };

 

    soc {

        compatible = "simple-bus";

        #address-cells = < 0x01 >;

        #size-cells = < 0x01 >;

        ranges;

 

            status = "disabled";

            phandle = < 0x09 >;

        };

 

            phandle = < 0x0a >;

        };

 

            status = "disabled";

            phandle = < 0x02 >;

        };

 

            compatible = "snps,dw-apb-uart";

            reg = < 0x1c28000 0x400 >;

            interrupts = < 0x00 0x00 0x04 >;

            reg-shift = < 0x02 >;

            reg-io-width = < 0x04 >;

            clocks = < 0x02 0x43 >;

            resets = < 0x02 0x2e >;

            phandle = < 0x0b >;

        };

 

            status = "disabled";

            phandle = < 0x0c >;

        };

 

            status = "disabled";

            phandle = < 0x0d >;

        };

 

            status = "disabled";

            phandle = < 0x0e >;

        };

 

            status = "disabled";

            phandle = < 0x0f >;

        };

 

            status = "disabled";

            phandle = < 0x10 >;

        };

    };

 

    binman {

        multiple-images;

        phandle = < 0x11 >;

 

        u-boot-sunxi-with-spl {

            filename = "u-boot-sunxi-with-spl.bin";

            pad-byte = < 0xff >;

 

            blob {

                filename = "spl/sunxi-spl.bin";

            };

 

            fit {

                description = "Configuration to load ATF before U-Boot";

                #address-cells = < 0x01 >;

                fit,fdt-list = "of-list";

 

                images {

 

                    uboot {

                        description = "U-Boot (64-bit)";

                        type = "standalone";

                        os = "u-boot";

                        arch = "arm64";

                        compression = "none";

                        load = < 0x4a000000 >;

 

                        u-boot-nodtb {

                        };

                    };

 

                    atf {

                        description = "ARM Trusted Firmware";

                        type = "firmware";

                        os = "arm-trusted-firmware";

                        arch = "arm64";

                        compression = "none";

                        load = < 0x44000 >;

                        entry = < 0x44000 >;

 

                        atf-bl31 {

                            filename = "bl31.bin";

                            missing-msg = "atf-bl31-sunxi";

                        };

                    };

 

                    scp {

                        description = "SCP firmware";

                        type = "firmware";

                        arch = "or1k";

                        compression = "none";

                        load = < 0x50000 >;

 

                        scp {

                            filename = "scp.bin";

                            missing-msg = "scp-sunxi";

                        };

                    };

 

                    @fdt-SEQ {

                        description = "NAME";

                        type = "flat_dt";

                        compression = "none";

                    };

                };

 

                configurations {

                    default = "config-1";

 

                    @config-SEQ {

                        description = "NAME";

                        firmware = "atf";

                        loadables = "scp\0uboot";

                        fdt = "fdt-SEQ";

                    };

                };

            };

        };

    };

 

    __symbols__ {

        L2 = "/cpus/l2-cache";

        osc24M = "/osc24M_clk";

        osc32k = "/osc32k_clk";

        binman = "/binman";

    };

};

Even with disabled nodes and disabled interrupt controllers, WDT resets still take place on the baseboard.

U-Boot SPL 2021.01 Armcube (Apr 03 2021 - 21:07:36 +0000)

DRAM: 2048 MiB

Trying to boot from MMC1

NOTICE:  BL31: v2.4(debug):v2.4-dirty

NOTICE:  BL31: Built : 20:05:13, Apr  3 2021

NOTICE:  BL31: Detected Allwinner A64/H64/R18 SoC (1689)

NOTICE:  BL31: Found U-Boot DTB at 0x407d2d0, model: SOPINE with CRIPPLED baseboard

INFO:    ARM GICv2 driver initialized <-- Disable this!

INFO:    Configuring SPC Controller

INFO:    PMIC: Probing AXP803 on RSB

WARNING: PMIC: No PMIC DT node, skipping setup

INFO:    BL31: Platform setup done!

INFO:    BL31: Initializing runtime services

...

No effect. Let’s disable that “ARM GICv2 driver” in TF-A code next.

U-Boot SPL 2021.01 Armcube (Apr 03 2021 - 21:22:31 +0000)

DRAM: 2048 MiB

Trying to boot from MMC1

NOTICE:  BL31: v2.4(debug):v2.4-dirty

NOTICE:  BL31: Built : 21:22:02, Apr  3 2021

NOTICE:  BL31: Detected Allwinner A64/H64/R18 SoC (1689)

NOTICE:  BL31: Found U-Boot DTB at 0x407d2d0, model: SOPINE with CRIPPLED baseboard

INFO:    Configuring SPC Controller

INFO:    PMIC: Probing AXP803 on RSB

WARNING: PMIC: No PMIC DT node, skipping setup

INFO:    BL31: Platform setup done!

INFO:    BL31: Initializing runtime services

...

No effect. Let’s remove more TF-A code and see what happens. I’ve removed the security setup and even PMIC initialization.

U-Boot SPL 2021.01 Armcube (Apr 03 2021 - 21:29:08 +0000)

DRAM: 2048 MiB

Trying to boot from MMC1

NOTICE:  BL31: v2.4(debug):v2.4-dirty

NOTICE:  BL31: Built : 21:28:38, Apr  3 2021

NOTICE:  BL31: Detected Allwinner A64/H64/R18 SoC (1689)

NOTICE:  BL31: Found U-Boot DTB at 0x407d2d0, model: SOPINE with CRIPPLED baseboard

INFO:    BL31: Platform setup done!

INFO:    BL31: Initializing runtime services

...

No effect. Let’s obliterate all DTB loading code in the TF-A next.

U-Boot SPL 2021.01 Armcube (Apr 03 2021 - 21:36:02 +0000)

DRAM: 2048 MiB

Trying to boot from MMC1

NOTICE:  BL31: v2.4(debug):v2.4-dirty

NOTICE:  BL31: Built : 21:35:33, Apr  3 2021

NOTICE:  BL31: Detected Allwinner A64/H64/R18 SoC (1689)

NOTICE:  BL31: No DTB found.

INFO:    BL31: Platform setup done!

INFO:    BL31: Initializing runtime services

...

No effect. Let’s initiate a WDT reset as the very first thing TF-A does – a Hail Mary pass. Here is the code and results.

// bl31/bl31_main.c

void bl31_main(void)

{

    NOTICE("BL31: %s\n", version_string);

    NOTICE("BL31: %s\n", build_message);

    INFO("** Let's try a reboot! **\n");

 

#define mmio_read_32(c)      ({unsigned int __v = (*(volatile unsigned int *)(c)); __v; })

#define mmio_write_32(c, v)  ((*(volatile unsigned int *)(c)) = (v))

#define TIMER_REG            0x01f01000

 

    // Set the WDT timeout period in WDOG_MODE_REG

    mmio_write_32(TIMER_REG + 0xB8, (1 << 5));  // Set 7:4 to 0010 (...00100000) or 2s

 

    // Reset the WDT in WDOG_CTRL_REG

    mmio_write_32(TIMER_REG + 0xB0, (0xA57 << 1) | (1 << 0));  // Set 12:1 as 0xA57, and 0:0 to 1

 

    // Set system reset function in WDOG_CFG_REG to 01

    mmio_write_32(TIMER_REG + 0xB4, 0x1);

 

    // Enable watchdog in WDOG_MODE_REG

    mmio_write_32(TIMER_REG + 0xB8, mmio_read_32(TIMER_REG + 0xB8) | (1 << 0));  // Set 0:0 to 1 to enable WDT

 

    for(;;) {}

}

U-Boot SPL 2021.01 Armcube (Apr 03 2021 - 21:41:49 +0000)

DRAM: 2048 MiB

Trying to boot from MMC1

NOTICE:  BL31: v2.4(debug):v2.4-dirty

NOTICE:  BL31: Built : 21:41:20, Apr  3 2021

INFO:    ** Let's try a reboot! **

 

U-Boot SPL 2021.01 Armcube (Apr 03 2021 - 21:41:49 +0000)

DRAM: 2048 MiB

Trying to boot from MMC1

NOTICE:  BL31: v2.4(debug):v2.4-dirty

NOTICE:  BL31: Built : 21:41:20, Apr  3 2021

INFO:    ** Let's try a reboot! **

 

U-Boot SPL 2021.01 Armcube (Apr 03 2021 - 21:41:49 +0000)

DRAM: 2048 MiB

Trying to boot from MMC1

NOTICE:  BL31: v2.4(debug):v2.4-dirty

NOTICE:  BL31: Built : 21:41:20, Apr  3 2021

INFO:    ** Let's try a reboot! **

...

No effect. Literally, the first action TF-A takes is to initiate a WDT reset, and it succeeds. Without DTB initialization, without GIC interrupts setup, and without any register writes at all except the WDT reset, a reset takes place. How can the baseboard WDT reset be disabled?
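
The magic numbers in the WDT test code above are easier to audit when the fields are composed by name. This sketch uses only the field positions stated in that code’s comments (key in bits 12:1 and restart in bit 0 of WDOG_CTRL_REG; interval in bits 7:4 and enable in bit 0 of WDOG_MODE_REG). The macro and function names are hypothetical, and the mapping from interval code to seconds is not asserted here.

```c
#include <stdint.h>

#define WDOG_KEY          0xA57u      /* key field, bits 12:1 of WDOG_CTRL_REG */
#define WDOG_RESTART      (1u << 0)   /* restart bit of WDOG_CTRL_REG */
#define WDOG_MODE_ENABLE  (1u << 0)   /* enable bit of WDOG_MODE_REG */

/* Value written to WDOG_CTRL_REG to restart the watchdog counter. */
uint32_t wdog_ctrl_restart_value(void)
{
    return (WDOG_KEY << 1) | WDOG_RESTART;
}

/* Value for WDOG_MODE_REG: interval code in bits 7:4, enable in bit 0. */
uint32_t wdog_mode_value(uint32_t interval_code, int enable)
{
    return ((interval_code & 0xFu) << 4) | (enable ? WDOG_MODE_ENABLE : 0u);
}
```

`wdog_ctrl_restart_value()` evaluates to 0x14AF, and `wdog_mode_value(2, 1)` to 0x21, the same value the test code reaches by writing `(1 << 5)` and then OR-ing in bit 0.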

Does the RX UART0 pin on the baseboard being pulled high trigger a wakeup?

Another hardware difference between the clusterboard and baseboard is the presence of a small protection circuit to prevent the RX pin of UART0 from supplying power to the SoC while it is turned off.

RX pin pulled high on the baseboard only

The clusterboard has no such circuit. As an experiment, I connected the clusterboard’s Z1 regulator’s input (DCDC1) to a 10kΩ resistor and then to the PB9_A pin of J4. No effect. No reset was observed.

With no battery, there should be 0V across the OD3 Schottky diode, right?

Previously, I measured 2.78V across the baseboard’s battery jack in the diagram below.

RTC battery jack on baseboard

This perplexed me earlier and sent me off in another direction because the un-batteried RTC voltage is the same in both the clusterboard and baseboard. According to the schematics, with no battery and hence no electrical connection to the OD3 Schottky diode, there should be 0V across the diode.

Phantom diode voltage should be 0V

That is not 0V. Moreover, when I walk away and come back, the voltage across the diode changes.

Phantom diode voltage changes

Here is a circuit representation of the diode having an EMF across it.

RTC OD3 diode voltage schematic

What is really curious is that my body’s capacitance affects the voltage readings across the RTC. When my leg moves closer to the multimeter’s probe cables, the voltage changes.

My background is in discrete electronics, so forgive me if EM noise and stray capacitance were obvious to the reader. Next, I pulled on this thread for just a bit.

Coming back to that 2.78V, the RTC voltage across the battery jack changes wildly depending on how far or close my body is to the baseboard like a theremin, or if I touch the positive line. Is the baseboard that sensitive?

Here is the same as above, but with a simple oscilloscope on the VCC-RTC battery line.

This waveform is interesting, actually.

VCC-RTC ripple waveform

Does this waveform come from the 5V wall adapter?

With the oscilloscope, the DC-out of the wall adapter is a clean 5.3V with no latent ripple. You could level a picture frame by how flat the output is. This is surprising because the frequency of the waveform is approximately 60 Hz. Even though the voltage is clean, we shouldn’t discount the ground loop.

Why does this waveform look familiar?

At first blush, it looks like a square wave is fed to a high-pass RC filter as explained in this StackExchange thread.

RC differentiator waveform (credit)
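
To see why a square wave through a high-pass RC filter produces that spiky shape, here is a small simulation sketch. It uses the standard discrete-time form of a first-order RC high-pass filter; the RC value, time step, and function names are illustrative assumptions and are not measured from the SOPINE or baseboard.

```c
#include <stddef.h>

/*
 * First-order RC high-pass filter, discretized:
 *   y[i] = alpha * (y[i-1] + x[i] - x[i-1]),  alpha = RC / (RC + dt)
 * rc and dt share the same (arbitrary) time unit.
 */
void rc_highpass(const double *x, double *y, size_t n, double rc, double dt)
{
    double alpha = rc / (rc + dt);

    if (n == 0)
        return;
    y[0] = x[0];
    for (size_t i = 1; i < n; i++)
        y[i] = alpha * (y[i - 1] + x[i] - x[i - 1]);
}

/* Fill x with a 0/1 square wave; period is given in samples. */
void square_wave(double *x, size_t n, size_t period)
{
    for (size_t i = 0; i < n; i++)
        x[i] = ((i % period) < period / 2) ? 1.0 : 0.0;
}
```

With a period much longer than RC, the output jumps at each edge of the square wave and then decays toward zero, positive on rising edges and negative on falling edges, which is the differentiator shape in the figure.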

There is no 60 Hz PLL on the SOPINE. Any boost converters would operate outside human hearing well past 20 kHz. The two external clocks are 24 MHz and 32.768 kHz. The latter may cause a harmonic which appears to be around 60 Hz. Or, it may just be me and my human capacitive-antenna absorbing transient EM signals from the lights and mains cabling.

Does this waveform manifest when the baseboard is on 5V batteries?

No. Operating on batteries in a minimal-EM environment results in the disappearance of the above waveform. As Dave shared with me, my body is likely operating as an antenna for surrounding mains EM with the wall adapter facilitating the unwanted circuit.

Powered from a 5V USB battery, 2.78V is observed consistently, without stray human capacitance.

As an aside, just measuring the waveform across the RTC battery terminals with an oscilloscope drops the voltage reading from 2.78V to 0.3V. This oscilloscope has capacitance and resistance, and it looks like it is preventing an RC circuit from charging a baseboard capacitor. When the scope is removed, a baseboard capacitor slowly charges as seen below.

Let’s explore something else and come back to that diode voltage.

Let’s measure the baseboard OD3 diode voltage again under 5V battery power.

This time on 5V batteries, there is no OD3 diode voltage anymore. It is gone. It was an artifact of stray capacitance and ambient EM radiation passing through a rectifier.

We can, however, bypass the diode and see how the VCC-RTC line behaves. When I short the OD3 diode, the oscilloscope no longer interferes with the VCC-RTC voltage and we can see about 2.78V.

The clusterboard has no such rectifier and shows a nominal scope output across VCC-RTC.

Clusterboard VCC-RTC voltage under battery power

Is the conclusion that VCC-RTC is also unpowered on the baseboard, that there is no leak or PCB error, and that an externally-powered RTC is not what wakes up the baseboard?

The 2.78V is still a mystery. There is no documentation on it. That needs to be resolved.

Unable to give up, let’s turn to the powered microscope and physically map the VCC-RTC line on the SOPINE module.

SOPINE PCB top near the PMIC and VCC_RTC pin

Hello. Where is this little hole taking the electron flow? There should be no other components on the PMIC’s VCC_RTC line besides that 10uF capacitor. There was an OD4 diode, but it is deleted from the schematics.

OD4 is labeled as deleted

Let’s see where this hole goes.

SOPINE PCB bottom near the SD card slot

The diode exists. In fact, it is the only Schottky diode in the schematic. It is stamped “SS”, short for “SS14”, which is an AliExpress-found equivalent for the XBS104S14 the schematic called (calls?) for. It’s not supposed to exist. Could this be powering the VCC-RTC line? Let’s see.

The VCC-RTC line is powered by the PMIC. In fact, placing the probe right on pin 49 of the PMIC (VCC_RTC) shows 3.00V. Placing the probe on the diode also shows 3.00V on one side and 2.80V on the other side.

Wait. The voltage drop across the diode is 0.2V? That doesn’t match the typical forward voltage drop of 0.49V in the specs. It turns out, from the graphs, that under very low current the forward voltage drop can be around 0.2V.

XBS104S14 forward voltage response curves
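
The drop shrinks at low current because forward voltage grows roughly with the logarithm of current (the Shockley diode equation). Here is a minimal Python sketch of that relationship. The saturation current and ideality factor are illustrative assumptions, not values from the XBS104S14 datasheet, and series resistance is ignored.

```python
import math

THERMAL_VOLTAGE = 0.02585  # kT/q near 300 K, in volts


def forward_voltage(current_a, saturation_a=1e-6, ideality=1.05):
    """Ideal-diode forward voltage from the Shockley equation.

    saturation_a and ideality are hypothetical Schottky-like values chosen
    for illustration; a real part also has series resistance.
    """
    return ideality * THERMAL_VOLTAGE * math.log(current_a / saturation_a + 1.0)


for amps in (1e-3, 1e-2, 1e-1, 1.0):
    print(f"{amps:g} A -> {forward_voltage(amps):.3f} V")
```

With these assumed parameters, the drop at 1 mA is roughly half the drop at 1 A, the same trend as the measured 0.2V against the 0.49V typical figure.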

This 2.80V is remarkably close to the 2.78V measured across the battery terminals, enough so that I am satisfied.

The real-time clock (RTC) is powered and running. Then why does only an RTC battery enable a clusterboard SOPINE wakeup, but one is unnecessary in the baseboard? We are seemingly back at square one.

Is the VCC-RTC line dirty? Does the RTC ever lose power, however briefly?

A voltmeter and an oscilloscope with a microsecond range (by default) show an instantaneous 2.78V. Let’s make the scope more sensitive and stretch the time range to see if there are any dropouts.

Clusterboard VCC-RTC voltage dropouts

Yes! The clusterboard experiences small voltage dropouts when observed over a period of several seconds. From the image above, the RTC voltage drops from 2.78V to about 2V briefly. Here is a video of this phenomenon.

What causes these RTC voltage dropouts?

These voltage dropouts only happen when on 5V battery power. When the clusterboard is powered by the 15A brick adapter, no dropouts occur. I suspect a clusterboard with a single SOPINE module draws more than 1A – the limit of my USB batteries.

Can I try something random that makes little sense but might just work?

Why not give the clusterboard VCC-RTC the full 3.00V without 2xAA batteries? As a shot in the dark, with only one SOPINE on the clusterboard for obvious safety, let’s short the OD4 diode on the SOPINE module (to reclaim that 0.2V drop) and observe the UART line to see what the SoC does, if anything.

Short the OD4 Schottky diode to gain 0.2V

The SOPINE in the clusterboard sometimes resets! Without 2xAA batteries (and yes without shorting VCC to GND), a reset is triggered, but only sometimes.

Could it be possible there is just not enough current reaching the RTC? When on USB 5V batteries, the VCC-RTC did experience voltage dropouts seemingly around when the RTL8211E chips were blinking. Raising the VCC-RTC line to 3.00V may have taken the line right on the cusp of supplying sufficient current to fully operate the RTC.

Let’s trace the current differences between the baseboard and clusterboard.

The AXP803 PMIC spec shows the RTCLDO (VCC_RTC) supplies 60mA typical. There is a 10kΩ pull-up resistor drawing 2.8V/10kΩ or 0.28mA ≪ 60 mA, so current isn’t lost there. How about that PWROK pin? We learned earlier it is tied to RTCLDO. Let’s see where its current goes.

Possible RTCLDO current through the reset system

The worst case seems to be RESET tied to GND, which draws 3.0V/1kΩ or 3mA ≪ 60 mA. However, there is a huge difference between the baseboard and the clusterboard: the baseboard has no reset mechanism, while the clusterboard has NOT-gates which constantly draw current.
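
The two Ohm's-law estimates so far (the 10kΩ pull-up, and RESET tied to GND through 1kΩ) can be checked against the 60mA typical RTCLDO budget with a minimal Python sketch. The helper name and constants are mine, for illustration only; the values are the figures quoted above.

```python
RTCLDO_TYPICAL_MA = 60.0  # AXP803 RTCLDO typical output, per the text above


def current_ma(volts, ohms):
    """Ohm's law, returning milliamps."""
    return volts / ohms * 1000.0


pull_up_ma = current_ma(2.8, 10_000)   # 10 kOhm pull-up at 2.8 V
reset_gnd_ma = current_ma(3.0, 1_000)  # RESET tied to GND through 1 kOhm
total_ma = pull_up_ma + reset_gnd_ma

print(f"pull-up: {pull_up_ma:.2f} mA")
print(f"RESET to GND: {reset_gnd_ma:.2f} mA")
print(f"share of RTCLDO budget: {total_ma / RTCLDO_TYPICAL_MA:.1%}")
```

Together these two paths use only a small fraction of the budget, so the missing current has to be going somewhere else.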

Possible RTCLDO current drain through reset system

How much current? Let’s turn to the 74LVT04 logic IC specs.

74LVT04 current and voltage specs

The -32mA stands out for the high output which is the default state. However, two NOT-gates are used per SOPINE, so I’m unsure how to calculate the negative current draw on RESET. It’s possible there is enough EMF to draw a higher current on the RESET line through the PWROK pin and finally through the VCC_RTC which meets or exceeds the 60mA (and 100mA max) of the PMIC.

Can we simulate the reset circuit?

After two days of effort with Multisim and OrCAD, the experience is too painful. The NXP line of BiCMOS 74LVTxx components is not found in either simulation suite, and efforts to download and/or import the model files from 1999 have left me shaking my head.

OrCAD 74LVT04 import failure

I’m hoping my StackExchange question can help. I got close with OrCAD, but not close enough. The problem with using digital parts in Multisim is the internal current and resistances are not simulated, which is what we crucially need.

In the meantime, let’s resist the temptation to slice this delicious wire between the SOPINE RTC and the clusterboard, even though doing so would definitely answer the question of whether the non-reset culprit is, ironically, the reset distributor.

Clusterboard RESET_A line from 74LVT04

Let’s model the 74LVT04 IC electrically and simulate the reset circuit.

StackExchange came through. We can now simulate the electrical characteristics of the 74LVT04 hex inverter.

However, when I attempt to recreate the SOPINE schematic around the RTCLDO, the clusterboard electrical values don’t match real measurements, plus I see high-frequency ringing and pulses of high voltage.

SOPINE schematic with clusterboard reset

When I read the AXP803 manual carefully, I realize the PWROK is a push-pull line pulling to VCC_RTC internally. What is this? Research shows it is a GPIO line flanked by two transistors (MOSFETs?), one connected to GND, and the other to VCC_RTC. Let’s try to simulate that with ideal MOSFETs.

AXP803 PWROK push-pull simulation

By putting a probe between R16 and C79 on the SOPINE, we can measure the PWROK line voltage. There is a difference between the baseboard and the clusterboard:

  • Baseboard PWROK: 3.00V
  • Clusterboard PWROK: 3.24V
  • Clusterboard R358: -2mV
  • Clusterboard RESET_A, SOPINE inserted: 3.29V
  • Clusterboard RESET_A, no SOPINE: 3.29V
SOPINE PWROK voltage probe point

Even though the push-pull MOSFETs are not correct, we can simulate the 74LVT04 feeding 3.3V into the PWROK line which is supposed to be a max of 3.00V. This is as far as we can simulate, however. We can see how complex this is becoming, and how many unknown components there are to guess.

SOPINE power simulation is now too complex

The PMIC is a reactive component with an LDO and unknown push-pull circuit. Can we empirically discover more electrical differences instead?

Simulations aren’t panning out. We don’t have any information on the design of the push-pull and how it reacts to back-EMF from the 74LVT04. We just know 3.3V of back-EMF prevents a WDT reset. Let’s try something else. Now that we know how to measure the PWROK line (and RESET input), let’s graph the voltage around resets.

PWROK clusterboard voltage during reset. Left: no RTC battery. Right: Normal and RTC battery
Clusterboard PWROK voltages after WDT reset

Result: On the clusterboard during a WDT reset, the PWROK line drops from 3.24V to 2.25V and never recovers unless a hardware reset is performed.

This is a wonderful discovery. It means we are on the right track. The left waveform is self-explanatory. The right waveform is more interesting. Here are the observations (15A power, one SOPINE, clusterboard):

Left PWROK waveform:

  • No RTC battery, WDT reset fires after 2s.

Right PWROK waveform:

  • No RTC battery, no WDT reset.
  • RTC battery, no WDT reset.
  • RTC battery, WDT reset fires after 2s.

The right waveform is identical for all three observations above. Only the left waveform departs from normal operation.

What constitutes a HIGH signal in the A64?

Turning to the electrical guide of the A64, we see that VCC-IO can range from 3.0V to 3.6V.

A64 VCC-IO limits

In the SOPINE schematic, VCC-IO is supplied by DCDC1 from the PMIC, which is regulated at 3.3V. Then, on page 34 of the datasheet, we see the following table.

A64 DC electrical characteristics table

Then, 0.7 * 3.3V = 2.31V. Thus, the minimum threshold for a HIGH signal (no reset) on the A64 RESET# line from PWROK is 2.31V, yet PWROK drops to 2.25V – still HIGH?

But is it LOW? The upper threshold for a LOW signal is 0.3 * 3.3V = 0.99V. What does inverting-logic RESET do when the signal is between HIGH and LOW? The datasheet indicates the line is not pulled up or down. Normally the region between logic thresholds is undefined, but RESET is inverted. Here is a possible logic inverter with a single Darlington pair (I chose values for saturation at 2.3V to show the effect clearly).

Possible RESET line buffer/inverter in the A64 at 2.3V saturation
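
The threshold arithmetic above can be written as a minimal Python sketch that classifies a measured voltage against the 0.7 × VCC-IO and 0.3 × VCC-IO limits. The function and constant names are hypothetical. The voltages are the PWROK readings from earlier.

```python
VCC_IO = 3.3              # DCDC1 output feeding VCC-IO, in volts
V_IH_MIN = 0.7 * VCC_IO   # minimum voltage guaranteed to read HIGH
V_IL_MAX = 0.3 * VCC_IO   # maximum voltage guaranteed to read LOW


def classify(volts):
    """Return the logic level the input thresholds guarantee, if any."""
    if volts >= V_IH_MIN:
        return "HIGH"
    if volts <= V_IL_MAX:
        return "LOW"
    return "UNDEFINED"


for label, volts in (("PWROK normal", 3.24), ("PWROK after WDT reset", 2.25)):
    print(f"{label}: {volts:.2f} V -> {classify(volts)}")
```

The 2.25V reading falls just below the HIGH threshold and well above the LOW threshold, so the datasheet limits alone do not say how the A64 interprets it.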

Among friends, and until we know more, let’s suppose that a RESET occurs whenever the line is under 2.3V. Now we can move on to solving this problem without 2xAA batteries by preventing the voltage drop altogether.

Let’s prevent the PWROK voltage drop.

How can we solve this back-EMF from the clusterboard reset distributor without resorting to SMD re-soldering (or batteries)? Below is a fun animation I made while waiting for some custom testing hardware to arrive.

Clusterboard reset distributor PCB layers (source)

The positive line from the RTC battery runs directly into the seven SOPINEs through the 204-pin DDR3 SODIMM connectors. Those run directly into the VCC-RTC pins on the A64 ICs. The PMIC VCC_RTC out is protected by the Schottky diode per SOPINE. However, the NMI-IRQ line is susceptible to the battery’s EMF, and the battery experiences reverse voltage from the NMI-IRQ line. A Schottky diode with, say, a 0.2V drop on this line will help the battery (until it drops below 2.8V!) but doesn’t address the reset problem.

Clusterboard +RTC battery line runs across all SOPINEs

Until we understand how back-EMF on PWROK prevents a reset and how a battery on IRQ allows it again, let’s try to force 3V on the RTC line from some other source on the clusterboard to simulate a connected battery.

The DVDDIO (AVDDH) line on the clusterboard supplies 3.3V and runs through a middle PCB layer on the clusterboard. Let’s see if we can place a common diode between it and the +RTC battery terminal. But, where does a DVDDIO pin come up for air, and which diode will maintain 3.0V or more?

Clusterboard 3.3V DVDDIO layer ideal exit point for soldering (source)

The cathode (-) of diode D1 is a perfect tap: it has 3.3V, is regulated by the clusterboard, is large enough to solder with conventional tools, and is right beside the battery holder. You read that right: the cathode (minus side) is the target.

Safe, regulated 3.3V tap point

What common barrier diode should we use to power the RTC line?

Diodes are neat. However, they behave differently under different voltages and loads. We need at least 3.0V on RTC whether one SOPINE or seven SOPINEs are installed. Let’s simulate how a handful of diodes behave under different loads.

VCC-RTC barrier diode simulations

Here, a couple of barrier diodes stand out: 1N5819 (1A) and 1N5822 (3A). These are ones I happen to have.
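
The requirement can be restated as a forward-drop budget: with a 3.3V tap and at least 3.0V needed on RTC, the diode may drop no more than about 0.3V at whatever current the RTC line draws. Here is a minimal Python sketch of that check. The names are hypothetical, and the two example drops are the 0.2V and 0.49V figures quoted earlier for the SOPINE's own Schottky, used only to illustrate the margin, not as data for the 1N5819 or 1N5822.

```python
TAP_VOLTAGE = 3.3   # regulated DVDDIO tap at the D1 cathode
RTC_MINIMUM = 3.0   # minimum wanted on the +RTC line


def rtc_voltage(forward_drop_v, tap_v=TAP_VOLTAGE):
    """Voltage left on +RTC after the barrier diode's forward drop."""
    return tap_v - forward_drop_v


max_drop = TAP_VOLTAGE - RTC_MINIMUM
print(f"forward-drop budget: {max_drop:.2f} V")

for drop in (0.2, 0.49):  # example drops quoted earlier in this article
    volts = rtc_voltage(drop)
    verdict = "ok" if volts >= RTC_MINIMUM else "too low"
    print(f"drop {drop:.2f} V -> +RTC {volts:.2f} V ({verdict})")
```

This is why a low-forward-voltage Schottky operated at a small fraction of its rated current is attractive here: the lighter the load relative to the rating, the smaller the drop.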


Let’s give the DVDDIO diode cathode-tap workaround a try.

WDT reset works. Hardware reset works. Normal operation works. How satisfying.

Result: The diode solution works, the clusterboard now resets under a WDT reset, and the clusterboard reset hardware is still functional.


Solution & Discussion

The reset distributor on the clusterboard would be better served with normally-floating outputs (not active high) and pulled low when in reset mode (like a momentary switch).

I’m convinced this is not a design oversight – the reset system looks great on paper and my hat goes off to the designers.

Who knew that some back-EMF from a system we take for granted would interfere with the little-known circuitry of the PMIC that cannot even be simulated because it is reactive?

There is no bypass or hack that can be applied to the clusterboard v2.3 reset distributor itself. Instead, the fix is to counter the 74LVT04 back-EMF with 3.3V on the RTC battery line, and possibly to add a diode between the battery holder positive terminal and the +RTC line to protect the battery. It is a reset solution for under 50 cents.

Warning:

Do not use the 2xAA batteries with this method unless you unsolder the battery holder and add a protection diode (see what happens). Lithium batteries especially do not respond well to being charged. Also, the batteries must be over 2.80V, so 2×1.2V NiMH batteries will not work.

Solution Details

You can probably run a diode on the outside of the battery holder.

Personally, I’d like to remove the battery holder and affix the barrier diode. The holder is held with a weak glue pad which can gently be pried up by hand, but be gentle with the copper pads. If we manage to keep our copper pads intact, we end up with the following fix.

Pine64 clusterboard reset solution

Remove the Battery Holder Easily

An effective way to remove the battery holder without a soldering iron is to use an X-Acto knife to bend the tiny metal flaps on the battery holder posts.

Bend the battery connector flaps to the middle

With a jeweler’s screwdriver, push the middle of both terminals so the metal pieces fall through and off.

Push the metal pieces through so they fall out

We will be left with a battery holder that is easy to pull off, only held by sticky tape, because the PCB posts are no longer attached to the battery holder.

The battery holder is no longer held to the PCB

Simply lift up the battery holder from the far end without tools.

The battery holder is held with a weak tape

With clippers or a Dremel, we can cut the battery posts to mere nubs. Be careful not to Dremel the PCB mask layer (green layer).

Dremel the battery posts to nubs

Solder the Low-Forward-Voltage Schottky Diode

Simply place the 1N5819 diode with the silver band facing the +RTC terminal on the clusterboard and add some flux before soldering.

Place the diode and flux on the clusterboard

One suggestion is to hold the diode down with a clip before soldering.

Hold the diode in place with a clip

Finally, tin the soldering iron and, taking no more than 2 to 3 seconds per terminal, apply solder to the D1 diode and then the +RTC solder pad. Clean the flux with a saponifying cleaner; isopropyl alcohol on a Q-tip also works well.

The end result should look something like this.

Soldered Schottky diode between D1 and +RTC

Success: We’ve solved the Pine64 clusterboard A64 SoC reset problem to allow software and watchdog-timer resets. This ensures the individual compute modules will restart when they run into software trouble.



from Hacker News https://ift.tt/3C1Gcnc
