After I published my $1 MCU write-up, several readers suggested I look at application processors — the MMU-endowed chips necessary to run real operating systems like Linux. Massive shifts over the last few years have seen internet-connected devices become more featureful (and hopefully, more secure), and I’m finding myself putting Linux into more and more places.
Among beginner engineers, application processors command reverence: one minor PCB bug and your $10,000 prototype becomes a paperweight. There’s an occult consortium of engineering pros who drop these chips into designs with utter confidence, while the uninitiated cower behind their Raspberry Pis and overpriced industrial SOMs.
This article is targeted at embedded engineers who are familiar with microcontrollers but not with microprocessors or Linux, so I wanted to put together something with a quick primer on why you’d want to run embedded Linux, a broad overview of what’s involved in designing around application processors, and then a dive into some specific parts you should check out — and others you should avoid — for entry-level embedded Linux systems.
Just like my microcontroller article, the parts I picked range from the well-worn horses that have pulled along products for the better part of this decade, to fresh-faced ICs with intriguing capabilities that you can keep up your sleeve.
If my mantra for the microcontroller article was that you should pick the right part for the job and not be afraid to learn new software ecosystems, my argument for this post is even simpler: once you’re booted into Linux on basically any of these parts, they become identical development environments.
That makes chips running embedded Linux almost a commodity product: as long as your processor checks off the right boxes, your application code won’t know if it’s running on an ST or a Microchip part — even if one of those is a brand-new dual-core Cortex-A7 and the other is an old ARM9. Your I2C drivers, your GPIO calls — even your V4L-based image processing code — will all work seamlessly.
At least, that’s the sales pitch. Getting a part booted is a different ordeal altogether — and that’s what we’ll focus on. Except for some minor benchmarking at the end, once we get to a shell prompt, we’ll consider the job completed.
As a departure from my microcontroller review, this time I’m focusing heavily on hardware design: unlike the microcontrollers I reviewed, these chips vary considerably in PCB design difficulty — a discussion I’d be remiss to omit. To this end, I designed a dev board from scratch for each application processor reviewed. Well, actually, many dev boards for each processor: roughly 25 different designs in total. This allowed me to try out different DDR layout and power management strategies — as well as fix some bugs along the way.
I intentionally designed these boards from scratch rather than starting with someone else’s CAD files. This helped me discover little “gotchas” that each CPU has, as well as optimize the design for cost and hand-assembly. Each of these boards was designed in a day or two and used JLC’s low-cost 4-layer PCB manufacturing service.
These boards won’t win any awards for power consumption or EMC: to keep things easy, I often cheated by combining power rails together that would typically be powered (and sequenced!) separately. Also, I limited the on-board peripherals to the bare minimum required to boot, so there are no audio CODECs, little I2C sensors, or Ethernet PHYs on these boards.
As a result, the boards I built for this review are akin to the notes from your high school history class or a recording you made of yourself practicing a piece of music to study later. So while I’ll post pictures of the boards and screenshots of layouts to illustrate specific points, these aren’t intended to serve as reference designs or anything; the whole point of the review is to get you to a spot where you’ll want to go off and design your own little Linux boards. Teach a person to fish, you know?
Coming from microcontrollers, the first thing you’ll notice is that Linux doesn’t usually run on Cortex-M, 8051, AVR, or other popular microcontroller architectures. Instead, we use application processors — popular ones are the Arm Cortex-A, ARM926EJ-S, and several MIPS iterations.
The biggest difference between these application processors and a microcontroller is quite simple: microprocessors have a memory management unit (MMU), and microcontrollers don’t. Yes, you can run Linux without an MMU, but you usually shouldn’t: Cortex-M7 parts that can barely hit 500 MHz routinely go for double or quadruple the price of faster Cortex-A7s. They’re power-hungry: microcontrollers are built on larger processes than application processors to reduce their leakage current. And without an MMU and generally-low clock speeds, they’re downright slow.
Other than the MMU, the lines between MCUs and MPUs are getting blurred. Modern application processors often feature a similar peripheral complement as microcontrollers, and high-end Cortex-M7 microcontrollers often have similar clock speeds as entry-level application processors.
When your microcontroller project outgrows its super loop and the random ISRs you’ve sprinkled throughout your code with care, there are many bare-metal tasking kernels to turn to — FreeRTOS, ThreadX (now Azure RTOS), RT-Thread, μC/OS, etc. By an academic definition, these are operating systems. However, compared to Linux, it’s more useful to think of these as a framework you use to write your bare-metal application inside. They provide the core components of an operating system: threads (and obviously a scheduler), semaphores, message-passing, and events. Some of these also have networking, filesystems, and other libraries.
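To make the “framework” point concrete, here’s a minimal FreeRTOS-flavored sketch: the entire application, scheduler included, links into one bare-metal image, and main() hands control to the scheduler. (toggle_led() is a hypothetical board-specific helper; the rest follows FreeRTOS’s public API.)

```c
/* Minimal FreeRTOS-flavored sketch: the "OS" is a library linked into your
 * single bare-metal image. toggle_led() is a hypothetical helper. */
#include "FreeRTOS.h"
#include "task.h"

extern void toggle_led(void);

static void blink_task(void *params)
{
    (void)params;
    for (;;) {
        toggle_led();
        vTaskDelay(pdMS_TO_TICKS(500));   /* block this task for 500 ms */
    }
}

int main(void)
{
    xTaskCreate(blink_task, "blink", configMINIMAL_STACK_SIZE, NULL, 1, NULL);
    vTaskStartScheduler();                /* never returns */
    for (;;) {}
}
```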
Comparing bare-metal RTOSes to Linux simply comes down to the fundamental difference between these and Linux: memory management and protection. This one technical difference makes Linux running on an application processor behave quite differently from your microcontroller running an RTOS.((Before the RTOS snobs attack with pitchforks, yes, there are large-scale, well-tested RTOSes that are usually run on application processors with memory management units. Look at RTEMS as an example. They don’t have some of the limitations discussed below, and have many advantages over Linux for safety-critical real-time applications.))
Small microcontroller applications can usually get by with static allocations for everything, but as your application grows, you’ll find yourself calling malloc() more and more, and that’s when weird bugs will start creeping up in your application. With complex, long-running systems, you’ll notice things working 95% of the time — only to crash at random (and usually inopportune) times. These bugs evade the most javertian developers, and in my experience, they almost always stem from memory allocation issues: usually either memory leaks (that can be fixed with appropriate free() calls), or more serious problems like memory fragmentation (when the allocator runs out of appropriately-sized free blocks).
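To see why this bites long-running systems, here’s a contrived sketch of fragmentation. On a Linux box the final allocation will almost certainly succeed, since the heap can simply grow and the MMU hides the physical scatter; on a fixed-size bare-metal heap with a simple allocator, it can fail even though plenty of total free space remains:

```c
/* Contrived fragmentation demo: after punching 1 KB holes in the heap,
 * ~64 KB is free in total, but no 32 KB run of it is contiguous, so a
 * simple fixed-heap allocator would fail the final request. */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    void *blocks[128];
    for (int i = 0; i < 128; i++)
        blocks[i] = malloc(1024);
    for (int i = 0; i < 128; i += 2)
        free(blocks[i]);                  /* free every other block */

    void *big = malloc(32 * 1024);        /* needs one contiguous 32 KB run */
    printf("large allocation %s\n", big ? "succeeded" : "failed");
    return 0;
}
```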
Because Linux-capable application processors have a memory management unit, *alloc() calls execute swiftly and reliably. Physical memory is only reserved (faulted in) when you actually access a memory location. Memory fragmentation is much less of an issue, since Linux frees and reorganizes pages behind the scenes. Plus, switching to Linux provides easier-to-use diagnostic tools (like valgrind) to catch bugs in your application code in the first place. And finally, because applications run in virtual memory, if your app does have memory bugs in it, Linux will kill it — leaving the rest of your system running. ((As a last-ditch kludge, it’s not uncommon to call your app in a superloop shell script to automatically restart it if it crashes, without having to restart the entire system.))
Running something like lwIP under FreeRTOS on a bare-metal microcontroller is acceptable for a lot of simple applications, but application-level network services like HTTP can be burdensome to implement reliably. Stuff that seems simple to a desktop programmer — like a WebSockets server that can accept multiple simultaneous connections — can be tricky to implement in bare-metal network stacks. Because C doesn’t have good programming constructs for asynchronous calls or exceptions, code tends to contain either a lot of weird state machines or tons of nested branches. It’s horrible to debug problems that occur. In Linux, you get a first-class network stack, plus tons of rock-solid userspace libraries that sit on top of that stack and provide application-level network connectivity. Plus, you can use a variety of high-level programming languages that handle the asynchronous nature of networking more gracefully.
Somewhat related is the rest of the standards-based communication / interface frameworks built into the kernel. I2S, parallel camera interfaces, RGB LCDs, SDIO, and basically all those other scary high-bandwidth interfaces seem to come together much faster when you’re in Linux. But the big one is USB host capabilities. On Linux, USB devices just work. If your touchscreen drivers are glitching out and you have a client demo to show off in a half-hour, just plug in a USB mouse until you can fix it (I’ve been there before). Product requirements change and now you need audio? Grab a $20 USB dongle until you can respin the board with a proper audio codec. On many boards without Ethernet, I just use a USB-to-Ethernet adapter to allow remote file transfer and GDB debugging. Don’t forget that, at the end of the day, an embedded Linux system is shockingly similar to your computer.
When thinking about embedded device security, there are usually two things we’re talking about: device security (making sure the device can only boot from verified firmware), and network security (authentication, intrusion prevention, data integrity checks, etc).
Device security is all about chain of trust: we need a bootloader to read in an encrypted image, decrypt and verify it, before finally executing it. The bootloader and keys need to be in ROM so that they cannot be modified. Because the image is encrypted, nefarious third-parties won’t be able to install the firmware on cloned hardware. And since the ROM authenticates the image before executing, people won’t be able to run custom firmware on the hardware.
Network security is about limiting software vulnerabilities and creating a trusted execution environment (TEE) where cryptographic operations can safely take place. The classic example is using client certificates to authenticate our client device to a server. If we perform the cryptographic hashing operation in a secure environment, even an attacker who has gained total control over our normal execution environment would be unable to read our private key.
In the world of microcontrollers, unless you’re using one of the newer Cortex-M23/M33 cores, your chip probably has a mishmash of security features that include hardware cryptographic support, (notoriously insecure) flash read-out protection, execute-only memory, write protection, a TRNG, and maybe a memory protection unit. While vendors might have an app note or simple example, it’s usually up to you to get all of these features enabled and working properly; it’s challenging to establish a good chain of trust, and nearly impossible to perform cryptographic operations in a context that’s not accessible by the rest of the system.
While secure boot isn’t available on every application processor reviewed here, it’s much more common than in the microcontroller world. Vulnerabilities still get disclosed from time to time, but my non-expert opinion is that the implementations seem much more robust than on Cortex-M parts: boot configuration data and keys are stored in one-time-programmable memory that is not accessible from non-privileged code. Network security is also more mature and easier to implement using the Linux network stack and cryptography support, and OP-TEE provides a ready-to-roll secure environment for many parts reviewed here.
Imagine that you needed to persist some configuration data across reboot cycles. Sure, you can use structs and low-level flash programming code, but if this data needs to be appended to or changed in an arbitrary fashion, your code would start to get ridiculous. That’s why filesystems (and databases) exist. Yes, there are embedded libraries for filesystems, but these are way clunkier and more fragile than the capabilities you can get in Linux with nothing other than ticking a box in menuconfig. And databases? I’m not sure I’ve ever seen an honest attempt to run one on a microcontroller, while there’s a limitless number available on Linux.
In a bare-metal environment, you are limited to a single application image. As you build out the application, you’ll notice things get kind of clunky if your system has to do a few totally different things simultaneously. If you’re developing for Linux, you can break this functionality into separate processes, which you can develop, debug, and deploy as separate binary images.
The classic example is the separation between the main app and the updater. Here, the main app runs your device’s primary functionality, while a separate background service can run every day to phone home and grab the latest version of the main application binary. These apps do not have to interact at all, and they perform completely different tasks, so it makes sense to split them up into separate processes.
Bare-metal MCU development is primarily done in C and C++. Yes, there are interesting projects to run Python, Javascript, C#/.NET, and other languages on bare metal, but they’re usually focused on implementing the core language only; they don’t provide a runtime that is the same as a PC. And even their language implementation is often incompatible. That means your code (and the libraries you use) have to be written specifically for these micro-implementations. As a result, just because you can run MicroPython on an ESP32 doesn’t mean you can drop Flask on it and build up a web application server. By switching to embedded Linux, you can use the same programming languages and software libraries you’d use on your PC.
Classic bare-metal systems don’t impose any sort of application separation from the hardware. You can throw a random I2C_SendReceive() function in anywhere you’d like.
In Linux, there is a hard separation between userspace calls and the underlying hardware driver code. One key advantage of this is how easy it is to move from one hardware platform to another; it’s not uncommon to only have to change a couple of lines of code to specify the new device names when porting your code.
Yes, you can poke GPIO pins, perform I2C transactions, and fire off SPI messages from userspace in Linux, and there are some good reasons to use these tools for diagnosing and debugging. Plus, if you’re implementing a custom I2C peripheral device on a microcontroller, and there’s very little configuration to be done, it may seem silly to write a kernel driver whose only job is to expose a character device that passes data straight through to the I2C device you’ve built.
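As a quick sketch of that userspace approach, here’s how a C program might read a register over I2C using Linux’s i2c-dev interface, with no custom kernel driver involved. The bus number, device address, and register are hypothetical (loosely modeled on an MPU6050’s WHO_AM_I register):

```c
/* Userspace I2C sketch via Linux's i2c-dev interface. Bus, address, and
 * register are hypothetical; adjust for your hardware. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/i2c-dev.h>

int main(void)
{
    int fd = open("/dev/i2c-1", O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    if (ioctl(fd, I2C_SLAVE, 0x68) < 0) { /* select peripheral address */
        perror("ioctl");
        return 1;
    }

    unsigned char reg = 0x75;             /* register to read (WHO_AM_I) */
    unsigned char val;
    if (write(fd, &reg, 1) != 1 || read(fd, &val, 1) != 1) {
        perror("transfer");
        return 1;
    }
    printf("reg 0x%02x = 0x%02x\n", reg, val);
    close(fd);
    return 0;
}
```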
But if you’re interfacing with off-the-shelf displays, accelerometers, IMUs, light sensors, pressure sensors, temperature sensors, ADCs, DACs, and basically anything else you’d toss on an I2C or SPI bus, Linux already has built-in support for this hardware that you can flip on when building your kernel and configure in your DTS file.
When you combine all these challenges together, you can see that building out bare-metal C code is challenging (and thus expensive). If you want to be able to staff your shop with lesser-experienced developers who come from web-programming code schools or otherwise have only basic computer science backgrounds, you’ll need an architecture that’s easier to develop on.
This is especially true when the majority of the project is hardware-agnostic application code, and only a minor part of the project is low-level hardware interfacing.
Sleep-mode power consumption. First, the good news: active mode power consumption of application processors is quite good when compared to microcontrollers. These parts tend to be built on smaller process nodes, so you get more megahertz for your ampere than the larger processes used for Cortex-M devices. Unfortunately, embedded Linux devices have a battery life that’s measured in hours or days, not months or years.
Modern low-power microcontrollers have a sleep-mode current consumption in the order of 1 μA — and that figure includes SRAM retention and usually even a low-power RTC oscillator running. Low-duty-cycle applications (like a sensor that logs a data point every hour) can run off a watch battery for a decade.
Application processors, however, can use 300 times as much power while asleep (that leaky 40 nm process has to catch up with us eventually!), but even that pales in comparison to the SDRAM, which can eat through 10 mA (yes, mA, not μA) or more in self-refresh mode. Sure, you can suspend-to-flash (hibernate), but that’s only an option if you don’t need responsive wake-up.
Even companies like Apple can’t get around these fundamental limitations: compare the 18-hour battery life of the Apple Watch (which uses an application processor) to the 10-day life of the Pebble (which uses an STM32 microcontroller with a battery half the size of the Apple Watch).
Boot time. Embedded Linux systems can take several seconds to boot up, which is orders of magnitude longer than a microcontroller’s start-up time. Alright, to be fair, this is a bit of an apples-to-oranges comparison: if you were to start initializing tons of external peripherals, mount a filesystem, and initialize a large application in an RTOS on a microcontroller, it could take several seconds to boot up as well. While boot time is a culmination of tons of different components that can all be tweaked and tuned, the fundamental limit is caused by application processors’ inability to execute code from external flash memory; they must copy it into RAM first ((unless you’re running an XIP kernel)).
Responsiveness. By default, Linux’s scheduler and resource system are full of unbounded latencies that under weird and improbable scenarios may take a long time to resolve (or may actually never resolve). Have you ever seen your mouse lock up for 3 seconds randomly? There you go. If you’re building a ventilator with Linux, think carefully about that. To combat this, there’s been a PREEMPT_RT patch for some time that turns Linux into a real-time operating system with a scheduler that can basically preempt anything to make sure a hard-real-time task gets a chance to run.
Also, when many people think they need a hard-real-time kernel, they really just want their code to be low-jitter. Coming from Microcontrollerland, you might assume a 1000 MHz processor could bit-bang something like a 50 kHz square wave consistently, but you would be wrong. The Linux scheduler is going to give you something on the order of ±10 µs of jitter for interrupts, not the ±10 ns you’re used to on microcontrollers. This can be remedied, though: while Linux gobbles up all the normal ARM interrupt vectors, it doesn’t touch the FIQ, so you can write custom FIQ handlers that execute completely outside of kernel space.
Honestly, in practice, it’s much more common to just delegate these tasks to a separate microcontroller. Some of the parts reviewed here even include a built-in microcontroller co-processor designed for controls-oriented tasks, and it’s also pretty common to just solder down a $1 microcontroller and talk to it over SPI or I2C.
The first step is to architect your system. This is hard to do unless what you’re building is trivial or you have a lot of experience, so you’ll probably start by buying some reference hardware, trying it out to see if it can do what you’re trying to do (both in terms of hardware and software), and then using that as a jumping-off point for your own designs.
I want to note that many designers focus too heavily on the hardware peripheral selection of the reference platform when architecting their system, and don’t spend enough time thinking about software early on. Just because your 500 MHz Cortex-A5 supports a parallel camera sensor interface doesn’t mean you’ll be able to forward-prop images through your custom SegNet implementation at 30 fps, and many parts reviewed here with dual Ethernet MACs would struggle to run even a modest web app.
Figuring out system requirements for your software frameworks can be rather unintuitive. For example, a multi-touch-capable finger-painting app in Qt 5 is actually much less of a resource hog than a simple backend server for a web app written in a modern stack using a JIT-compiled language. Many developers familiar with traditional Linux server/desktop development assume they’ll just throw a .NET Core web app on their rootfs and call it a day — only to discover that they’ve completely run out of RAM, that their app takes more than five minutes to launch, or that Node.js can’t even be compiled for the ARM9 processor they’ve been designing around.
The best advice I have is to simply try to run the software you’re interested in using on target hardware and try to characterize the performance as much as possible. Here are some guidelines for where to begin:
Slower ARM9 cores are for simple headless gadgets written in C/C++. Yes, you can run basic, animation-free low-resolution touch linuxfb apps with these, but blending and other advanced 2D graphics technology can really bog things down. And yes, you can run very simple Python scripts, but in my testing, even a “Hello, World!” Flask app took 38 seconds from launch to actually spitting out a web page to my browser on a 300 MHz ARM9. Yes, obviously once the Python file was compiled, it was much faster, but you should primarily be serving up static content using lightweight HTTP servers whenever possible. And, no, you can’t even compile Node.js or .NET Core for these architectures. These also tend to boot from small-capacity SPI flash chips, which limits your framework choices.
Mid-range 500-1000 MHz Cortex-A-series systems can start to support interpreted / JIT-compiled languages better, but make sure you have plenty of RAM — 128 MB is really the bare minimum to consider. These have no issues running simple C/C++ touch-based GUIs running directly on a framebuffer but can stumble if you want to do lots of SVG rendering, pinch/zoom gestures, and any other canvas work.
Multi-core 1+ GHz Cortex-A parts with 256 MB of RAM or more will begin to support desktop/server-like deployments. With large eMMC storage (4 GB or more), decent 2D graphics acceleration (or even 3D acceleration on some parts), you can build up complex interactive touchscreen apps using native C/C++ programming, and if the app is simple enough and you have sufficient RAM, potentially using an HTML/JS/CSS-based rendering engine. If you’re building an Internet-enabled device, you should have no issues doing the bulk of your development in Node.js, .NET Core, or Python if you prefer that over C/C++.
I know that there are lots of people — especially hobbyists but even professional engineers — who have gotten to this point in the article and are thinking, “I do all my embedded Linux development with Raspberry Pi boards — why do I need to read this?” Yes, Raspberry Pi single-board computers, on the surface, look similar to some of these parts: they run Linux, you can attach displays to them, do networking, and they have USB, GPIO, I2C, and SPI signals available.
And for what it’s worth, the BCM2711 mounted on the Pi 4 is a beast of a processor and would easily best any part in this review on that measure. Dig a bit deeper, though: this processor has video decoding and graphics acceleration, but not even a single ADC input. It has built-in HDMI transmitters that can drive dual 4K displays, but just two PWM channels. This is a processor that was custom-made, from the ground up, to go into smart TVs and set-top boxes — not to serve as a general-purpose embedded Linux application processor.
It might be the perfect processor for your particular project, but it probably isn’t; forcing yourself to use a Pi early in the design process will over-constrain things. Yes, there are always workarounds to the aforementioned shortcomings — like I2C-interfaced PWM chips, SPI-interfaced ADCs, or LCD modules with HDMI receivers — but they involve external hardware that adds power, bulk, and cost. If you’re building a quantity-of-one project and you don’t care about these things, then maybe the Pi is the right choice for the job, but if you’re prototyping a real product that’s going to go into production someday, you’ll want to look at the entire landscape before deciding what’s best.
This article is all about getting an embedded application processor booting Linux — not building an entire embedded system. If you’re considering running Linux in an embedded design, you likely have some combination of Bluetooth, WiFi, Ethernet, TFT touch screen, audio, camera, or low-power RF transceiver work going on.
If you’re coming from the MCU world, you’ll have a lot of catching up to do in these areas, since the interfaces (and even architectural strategies) are quite different. For example, while single-chip WiFi/BT MCUs are common, very few application processors have integrated WiFi/BT, so you’ll typically use external SDIO- or USB-interfaced chipsets. Your SPI-interfaced ILI9341 TFTs will often be replaced with parallel RGB or MIPI models. And instead of burping out tones with your MCU’s 12-bit DAC, you’ll be wiring up I2S audio CODECs to your processor.
Processor vendors vigorously encourage reference design modification and reuse for customer designs. I think most professional engineers are more concerned with getting Rev A hardware that boots than with playing around with optimization, so many custom Linux boards I see are spitting images of off-the-shelf EVKs.
But depending on the complexity of your project, this can become downright absurd. If you need the massive amount of RAM that some EVKs come with, and your design uses the same sorts of large parallel display and camera interfaces, audio codecs, and networking interfaces found on the EVK, then it may be reasonable to use this as your base with little modification. However, using a 10-layer stack-up on your simple IoT gateway — just because that’s what the ref design used — is probably not something I’d throw in my portfolio to reflect a shining moment of ingenuity.
People forget that these EVKs are built at substantially higher volumes than prototype hardware is; I often have to explain to inexperienced project managers why it’s going to cost nearly $4000 to manufacture 5 prototypes of something you can buy for $56 each.
You may discover that it’s worth the extra time to clean up the design a bit, simplify your stackup, and reduce your BOM — or just start from scratch. All of the boards I built up for this review were designed in a few days and easily hand-assembled with low-cost hot-plate / hot-air / pencil soldering in a few hours onto cheap 4-layer PCBs from JLC. Even including the cost of assembly labor, it would be hard to spend more than a few hundred bucks on a round of prototypes so long as your design doesn’t have a ton of extraneous circuitry.
Most of the parts in this review come in BGA packages, so we should talk a little bit about this. These seem to make less-experienced engineers nervous — both during layout and prototype assembly. As you would expect, more-experienced engineers are more than happy to gatekeep and discourage less-experienced engineers from using these parts, but actually, I think BGAs are much easier to design around than high-pin-count ultra-fine-pitch QFPs, which are usually your only other packaging option.
The standard 0.8mm-pitch BGAs that mostly make up this review have a coarse-enough pitch to allow a single trace to pass between two adjacent balls, as well as allowing a via to be placed in the middle of a 4-ball grid with enough room between adjacent vias to allow a track to go between them. This is illustrated in the image above on the left: notice that the inner-most signals on the blue (bottom) layer escape the BGA package by traveling between the vias used to escape the outer-most signals on the blue layer.
In general, you can escape 4 rows of signals on a 0.8mm-pitch BGA with this strategy: the first two rows of signals from the BGA can be escaped on the component-side layer, while the next two rows of signals must be escaped on a second layer. If you need to escape more rows of signals, you’d need additional layers. IC designers are acutely aware of that; if an IC is designed for a 4-layer board (with two signal layers and two power planes), only the outer 4 rows of balls will carry I/O signals. If they need to escape more signals, they can start selectively depopulating balls on the outside of the package — removing a single ball gives space for three or four signals to fit through.
For 0.65mm-pitch BGAs (top right), a via can still (barely) fit between four pins, but there’s not enough room for a signal to travel between adjacent vias; they’re just too close. That’s why almost all 0.65mm-pitch BGAs must have selective depopulations on the outside of the BGA. You can see the escape strategy in the image on the right is much less orderly — there are other constraints (diff pairs, random power nets, final signal destinations) that often muck this strategy up. I think the biggest annoyance with BGAs is that decoupling capacitors usually end up on the bottom of the board if you have to escape many of the signals, though you can squeeze them onto the top side if you bump up the number of layers on your board (many solder-down SOMs do this).
Hand-assembling PCBs with these BGAs on them is a breeze. Because 0.8mm-pitch BGAs have such a coarse pitch, placement accuracy isn’t particularly important, and I’ve never once detected a short-circuit on a board I’ve soldered. That’s a far cry from 0.4mm-pitch (or even 0.5mm-pitch) QFPs, which routinely have minor short-circuits here and there — mostly due to poor stencil alignment. I haven’t had issues soldering 0.65mm-pitch BGAs, either, but I feel like I have to be much more careful with them.
To actually solder the boards, if you have an electric cooktop (I like the Cuisinart ones), you can hot-plate solder boards with BGAs on them. I have a reflow oven, but I didn’t use it once during this review — instead, I hot-plate the top side of the board, flip it over, paste it up, place the passives on the back, and hit it with a bit of hot air. Personally, I wouldn’t use a hot-air gun to solder BGAs or other large components, but others do it all the time. The advantage to hot-plate soldering is that you can poke and nudge misbehaving parts into place during the reflow cycle. I also like to give my BGAs a small tap to force them to self-align if they weren’t already.
Microcontrollers are almost universally supplied with a single, fixed voltage (which might be regulated down internally), while most microprocessors have a minimum of three voltage domains that must be supplied by external regulators: I/O (usually 3.3V), core (usually 1.0-1.2V), and memory (fixed for each technology — 1.35V for DDR3L, 1.5V for old-school DDR3, 1.8V for DDR2, and 2.5V for DDR). There are often additional analog supplies, and some higher-performance parts might have six or more different voltages you have to supply.
While many entry-level parts can be powered by a few discrete LDOs or DC/DC converters, some parts have stringent power-sequencing requirements. Also, to minimize power consumption, many parts recommend using dynamic voltage scaling, where the core voltage is automatically lowered when the CPU idles and lowers its clock frequency.
These two points lead designers to I2C-interfaced PMIC (power management integrated circuit) chips that are specifically tailored to the processor’s voltage and sequencing requirements, and whose output voltages can be changed on the fly. These chips might integrate four or more DC/DC converters, plus several LDOs. Many include multiple DC inputs along with built-in lithium-ion battery charging. Coupled with the large inductors, capacitors, and multiple precision resistors some of these PMICs require, this added circuitry can explode your bill of materials (BOM) and board area.
Regardless of your voltage regulator choices, these parts’ power consumption fluctuates wildly, so you’ll need some basic PDN design ability to ensure you can supply the parts with the current they need when they need it. And while you won’t need to do any simulation or verification just to get things to boot, if things are marginal, expect EMC issues down the road that would not come up if you were working with simple microcontrollers.
No commonly-used microprocessor has built-in flash memory, so you’re going to need to wire something up to the MPU to store your code and persistent data. If you’ve used parts from fabless companies who didn’t want to pay for flash IP, you’ve probably gotten used to soldering down an SPI NOR flash chip, programming your hex file to it, and moving on with your life. When using microprocessors, there are many more decisions to consider.
Most MPUs can boot from SPI NOR flash, SPI NAND flash, parallel flash, or MMC (for use with eMMC or MicroSD cards). Because of its organization, NOR flash memory has better read speeds but worse write speeds than NAND flash. SPI NOR flash memory is widely used for tiny systems with up to 16 MB of storage, but above that, SPI NAND and parallel-interfaced NOR and NAND flash become cheaper. Parallel-interfaced NOR flash used to be the ubiquitous boot media for embedded Linux devices, but I don’t see it deployed as much anymore — even though it can sometimes be found at half the price of SPI flash. My only explanation for its unpopularity is that no one likes wasting lots of I/O pins on parallel memory.
Above 1 GB, MMC is the dominant technology in use today. For development work, it’s especially hard to beat a MicroSD card — in low volumes they tend to be cheaper per gigabyte than anything else out there, and you can easily read and write to them without having to interact with the MPU’s USB bootloader; that’s why it was my boot media of choice on almost all platforms reviewed here. In production, you can easily switch to eMMC, which is, very loosely speaking, a solder-down version of a MicroSD card.
Back when parallel-interfaced flash memory was the only game in town, there was no need for boot ROMs: unlike SPI or MMC, these devices have address and data pins, so they are easily memory-mapped; indeed, older processors would simply start executing code straight out of parallel flash on reset.
That’s all changed though: modern application processors have boot ROM code baked into the chip to initialize the SPI, parallel, or SDIO interface, load a few pages out of flash memory into RAM, and start executing it. Some of these ROMs are quite fancy, actually, and can even load files stored inside a filesystem on an MMC device. When building embedded hardware around a part, you’ll have to pay close attention to how to configure this boot ROM.
While some microprocessors have a basic boot strategy that simply tries every possible flash memory interface in a specified order, others have extremely complicated (“flexible”?) boot options that must be configured through one-time-programmable fuses or GPIO bootstrap pins. And no, we’re not talking about one or two signals you need to handle: some parts have more than 30 different bootstrap signals that must be pulled high or low to get the part booting correctly.
Unlike MCU-based designs, on an embedded Linux system, you absolutely, positively, must have a console UART available. Linux’s entire tracing architecture is built around logging messages to a console, as is the U-Boot bootloader.
That doesn’t mean you shouldn’t also have JTAG/SWD access, especially in the early stage of development when you’re bringing up your bootloader (otherwise you’ll be stuck with printf() calls). Having said that, if you actually have to break out your J-Link on your embedded Linux board, it probably means you’re having a really bad day. While you can attach a debugger to an MPU, getting everything set up correctly is extremely clunky when compared to debugging an MCU. Prepare to relocate symbol tables as your code transitions from SRAM to main DRAM memory. It’s not uncommon to have to muck around with other registers, too (like forcing your CPU out of Thumb mode). And on top of that, I’ve found that some U-Boot ports remux the JTAG pins (either due to alternate functionality or to save power), and the JTAG chains on some parts are quite complex and require using less-commonly used pins and features of the interface. Oh, and since you have an underlying Boot ROM that executes first, JTAG adapters can screw that up, too.
If you start searching around the Internet, you’ll stumble upon a lot of posts from people asking about routing an SDRAM memory bus, only to be discouraged by “experts” lecturing them on how unbelievably complex memory routing is and how you need a minimum 6-layer stack-up and super precise length-tuning and controlled impedances and $200,000 in equipment to get a design working.
That’s utter bullshit. In the grand scheme of things, routing memory is, at worst, a bit tedious. Once you’ve had some practice, it should take about an hour or so to route a 16-bit-wide single-chip DDR3 memory bus, so I’d hardly call it an insurmountable challenge. It’s worth investing a bit of time to learn about it since it will give you immense design flexibility when architecting your system (since you won’t be beholden to expensive SoMs or SiP-packaged parts).
Let’s get one thing straight: I’m not talking about laying out a 64-bit-wide quad-bank memory bus with 16 chips on an 8-layer stack-up. Instead, we’re focused on a single 16-bit-wide memory chip routed point-to-point with the CPU. This is the layout strategy you’d use with all the parts in this review, and it is drastically simpler than multi-chip layouts — no address bus terminations, complex T-topology routes, or fly-by write-leveling to worry about. And with modern dual-die DRAM packages, you can get up to 2 GB capacity in a single DDR3L chip. In exchange for the markup you’ll pay for the dual-die chips, you’ll end up with much easier PCB routing.
When most people think of DDR routing, length-tuning is the first thing that comes to mind. If you use a decent PCB design package, setting up length-tuning rules and laying down meandered routes is so trivial to do that most designers don’t think anything of it — they just go ahead and length-match everything that’s relatively high-speed — SDRAM, SDIO, parallel CSI / LCD, etc. Other than adding a bit of design time, there’s no reason not to maximize your timing margins, so this makes sense.
But what if you’re stuck in a crappy software package, manually exporting spreadsheets of track lengths, manually determining matching constraints, and — gasp — maybe even manually creating meanders? Just how important is length-matching? Can you get by without it?
Most microprocessors reviewed here top out at DDR3-800, which has a bit period of 1250 ps. Slow DDR3-800 memory might have a data setup time of up to 165 ps at AC135 levels, and a hold time of 150 ps. There’s also a worst-case skew of 200 ps. Let’s assume our microprocessor has the same specs. That means we have 200 ps of skew from our processor + 200 ps of skew from our DRAM chip + 165 ps setup time + 150 ps of hold time = 715 ps total. That leaves a margin of 535 ps (more than 3500 mil!) for PCB length mismatching.
Are our assumptions about the MPU’s memory controller valid? Who knows. One issue I ran into is that there’s a nebulous cloud surrounding the DDR controllers on many application processors. Take the i.MX 6UL as an example: I discovered multiple posts where people add up worst-case timing parameters in the datasheet, only to end up with practically no timing margin. These official datasheet numbers seem to be pulled out of thin air — so much so that NXP literally removed the entire DDR section in their datasheet and replaced it with a boiler-plate explanation telling users to follow the “hardware design guidelines.” Texas Instruments and ST also lack memory controller timing information in their documentation — again, referring users to stringent hardware design rules. ((Rockchip and Allwinner don’t specify any sort of timing data or length-tuning guidelines for their processors at all.))
How stringent are these rules? Almost all of these companies recommend a ±25-mil match on each byte group. Assuming a typical 150 ps/in propagation delay, that’s ±3.75 ps — only 0.3% of that 1250 ps DDR3-800 bit period. That’s absolutely nuts. Imagine if you were told to ensure your breadboard wires were all within half an inch in length of each other before wiring up your Arduino SPI sensor project — that’s the equivalent timing margin we’re talking about.
To settle this, I empirically tested two DDR3-800 designs — one with and one without length tuning — and they performed identically. In neither case was I ever able to get a single bit error, even after thousands of iterations of memory stress-tests. Yes, that doesn’t prove that the design would run for 24/7/365 without a bit error, but it’s definitely a start. Just to verify I wasn’t on the margin, or that this was only valid for one processor, I overclocked a second system’s memory controller by two times — running a DDR3-800 controller at DDR3-1600 speeds — and I was still unable to get a single bit error. In fact, all five of my discrete-SDRAM-based designs violated these length-matching guidelines and all five of them completed memory tests without issue, and in all my other testing, I never experienced a single crash or lock-up on any of these boards.
My take-away: length-tuning is easy if you have good CAD software, and there’s no reason not to spend an extra 30 minutes length-tuning things to maximize your timing budget. But if you use crappy CAD software or you’re rushing to get a prototype out the door, don’t sweat it — especially for Rev A.
More importantly, a corollary: if your design doesn’t work, length-tuning is probably the last thing you should be looking at. For starters, make sure you have all the pins connected properly — even if the failures appear intermittent. For example, accidentally swapping byte lane strobes / masks (like I’ve done) will cause 8-bit operations to fail without affecting 32-bit operations. Since the bulk of RAM accesses are 32-bit, things will appear to kinda-sorta work.
Instead of worrying about length-tuning, if a design is failing (either functionally or in the EMC test chamber), I would look first at power distribution and signal integrity. I threw together some HyperLynx simulations of various board designs with different routing strategies to illustrate some of this. I’m not an SI expert, and there are better resources online if you want to learn more practical techniques; for more theory, the books that everyone seems to recommend are by Howard Johnson: High Speed Digital Design: A Handbook of Black Magic and High Speed Signal Propagation: Advanced Black Magic, though I’d also add Henry Ott’s Electromagnetic Compatibility Engineering book to that list.
Ideally, every signal’s source impedance, trace impedance, and load impedance would match. This is especially important as a trace’s length starts to approach the wavelength of the signal (I think the rule of thumb is 1/20th the wavelength), which will definitely be true for 400 MHz and faster DDR layouts.
Using a proper PCB stack-up (usually a ~0.1mm prepreg will result in a close-to-50-ohm impedance for a 5mil-wide trace) is your first line of defense against impedance issues, and is usually sufficient for getting things working well enough to avoid simulation / refinement.
For the data groups, DDR3 uses on-die termination (ODT), configurable for 40, 60, or 120 ohm on memory chips (and usually the same or similar on the CPU) along with adjustable output impedance drivers. ODT is only enabled on the receiver’s end, so depending on whether you’re writing data or reading data, ODT will either be enabled on the memory chip, or on the CPU.
For simple point-to-point routing, don’t worry too much about ODT settings. As can be seen in the above eye diagram, the difference between 33-ohm and 80-ohm ODT terminations on a CPU reading from DRAM is perceivable, but both are well within AC175 levels (the most stringent voltage levels in the DDR3 spec). The BSP for your processor will initialize the DRAM controller with default settings that will likely work just fine.
The biggest source of EMC issues related to DDR3 is likely going to come from your address bus. DDR3 uses a one-way address bus (the CPU is always the transmitter and the memory chip is always the receiver), and DDR memory chips do not have on-chip termination for these signals. Theoretically, they should be terminated to VTT (a voltage derived from VDDQ/2) with resistors placed next to the DDR memory chip. On large fly-by buses with multiple memory chips, you’ll see these VTT termination resistors next to the last chip on the bus. The resistors absorb the EM wave propagating from the MPU which reduces the reflections back along the transmission line that all the memory chips would see as voltage fluctuations. On small point-to-point designs, the length of the address bus is usually so short that there’s no need to terminate. If you run into EMC issues, consider software fixes first, like using slower slew-rate settings or increasing the output impedance to soften up your signals a bit.
Another source of SI issues is cross-coupling between traces. To reduce cross-talk, you can put plenty of space between traces — three times the width (3S) is a standard rule of thumb. I sound like a broken record, but again, don’t be too dogmatic about this unless you’re failing tests, as the lengths involved with routing a single chip are so short. The above figure illustrates the routing of a DDR bus with no length-tuning but with ample space between traces. Note the eye diagram (below) shows much better signal integrity (at the expense of timing skew) than the first eye diagram presented in this section.
Because DDR memory doesn’t care about the order of the bits getting stored, you can swap individual bits — except the least-significant one if you’re using write-leveling — in each byte lane with no issues. Byte lanes themselves are also completely swappable. Having said that, since all the parts I reviewed are designed to work with a single x16-wide DDR chip (which has an industry-standard pinout), I found that most pins were already balled out reasonably well. Before you start swapping pins, make sure you’re not overlooking an obvious layout that the IC designers intended.
Instead of worrying about chatter you read on forums or what the HyperLynx salesperson is trying to spin, for simple point-to-point DDR designs, you shouldn’t have any issues if you follow these suggestions:
Pay attention to PCB stack-up. Use a 4-layer stack-up with thin prepreg (~0.1mm) to lower the impedance of your microstrips — this allows the traces to transfer more energy to the receiver. Those inner layers should be solid ground and DDR VDD planes respectively. Make sure there are no splits under the routes. If you’re nit-picky, pull back the outer-layer copper fills from these tracks so you don’t inadvertently create coplanar structures that will lower the impedance too much.
Avoid multiple DRAM chips. If you start adding extra DRAM chips, you’ll have to route your address/command signals with a fly-by topology (which requires terminating all those signals — yuck), or a T-topology (which requires additional routing complexity). Stick with 16-bit-wide SDRAM, and if you need more capacity, spend the extra money on a dual-die chip — you can get up to 2 GB of RAM in a single x16-wide dual-rank chip, which should be plenty for anything you’d throw at these CPUs.
Faster RAM makes routing easier. Even though the crappy processors reviewed here can rarely go past 400-533 MHz DDR speeds, using 800 or 933 MHz DDR chips will ease your timing budget. The reduced setup/hold times make address/command length-tuning almost entirely unnecessary, and the reduced skew even helps with the bidirectional data bus signals.
Developing on an MCU is simple: install the vendor’s IDE, create a new project, and start programming/debugging. There might be some .c/.h files to include from a library you’d like to use, and rarely, a precompiled lib you’ll have to link against.
When building embedded Linux systems, we need to start by compiling all the off-the-shelf software we plan on running — the bootloader, kernel, and userspace libraries and applications. We’ll have to write and customize shell scripts and configuration files, and we’ll also often write applications from scratch. It’s really a totally different development process, so let’s talk about some prerequisites.
If you want to build a software image for a Linux system, you’ll need a Linux system. If you’re also the person designing the hardware, this is a bit of a catch-22, since most PCB designers work in Windows. While Windows Subsystem for Linux will run all the software you need to build an image for your board, WSL currently has no ability to pass through USB devices, so you won’t be able to use hardware debuggers (or even a USB microSD card reader) from within your Linux system. And since WSL2 is Hyper-V-based, once it’s enabled, you won’t be able to launch VMware, which uses its own hypervisor((Though a beta version of VMware should address this)).
Consequently, I recommend users skip over all the newfangled tech until it matures a bit more, and instead just spin up an old-school VMware virtual machine and install Linux on it. In VMware, you can pass through your MicroSD card reader, debug probe, and even the device itself (which usually has a USB bootloader).
Building images is a computationally heavy and highly-parallel workload, so it benefits from large, high-wattage HEDT/server-grade multicore CPUs in your computer — make sure to pass as many cores through to your VM as possible. Compiling all the software for your target will also eat through storage quickly: I would allocate an absolute minimum of 200 GB if you anticipate juggling between a few large embedded Linux projects simultaneously.
While your specific project will likely call for much more software than this, these are the five components that go into every modern embedded Linux system((Yes, there are alternatives to these components, but the further you move away from the embedded Linux canon, the more you’ll find yourself on your own island, scratching your head trying to get things to work.)):
A cross toolchain, usually GCC + glibc, which contains your compiler, binutils, and C library. This doesn’t actually go into your embedded Linux system, but rather is used to build the other components.
A bootloader (usually U-Boot), which initializes your main memory and loads the kernel.
The Linux kernel itself, which manages memory, schedules processes, and provides device drivers for your hardware.
Userspace software (usually built around BusyBox), which provides your init program, shell, and other core utilities.
A root filesystem, which contains the aforementioned userspace components, along with any loadable kernel modules you compiled, shared libraries, and configuration files.
As you’re reading through this, don’t get overwhelmed: if your hardware is reasonably close to an existing reference design or evaluation kit, someone has already gone to the trouble of creating default configurations for you for all of these components, and you can simply find and modify them. As an embedded Linux developer doing BSP work, you’ll spend way more time reading other people’s code and modifying it than you will be writing new software from scratch.
Just like with microcontroller development, when working on embedded Linux projects, you’ll write and compile the software on your computer, then remotely test it on your target. When programming microcontrollers, you’d probably just use your vendor’s IDE, which comes with a cross toolchain — a toolchain designed to build software for one CPU architecture on a system running a different architecture. As an example, when programming an ATTiny1616, you’d use a version of GCC built to run on your x64 computer but designed to emit AVR code. With embedded Linux development, you’ll need a cross toolchain here, too (unless you’re one of the rare types coding on an ARM-based laptop or building an x64-powered embedded system).
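For example, with a prebuilt ARM hard-float toolchain installed on an x64 host (the triplet prefix varies from toolchain to toolchain, so treat this as illustrative), cross-compiling a sanity test looks like this:

```
$ arm-linux-gnueabihf-gcc -o hello hello.c   # built on x64, emits ARM code
$ file hello
hello: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), ...
```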
When configuring your toolchain, there are two lightweight C libraries to consider — musl libc and uClibc-ng — which implement a subset of features of the full glibc, while being 1/5th the size. Most software compiles fine against them, so they’re a great choice when you don’t need the full libc features. Between the two of them, uClibc is the older project that tries to act more like glibc, while musl is a fresh rewrite that offers some pretty impressive stats, but is less compatible.
Unfortunately, our CPU’s boot ROM can’t directly load our kernel. Linux has to be invoked in a specific way to obtain boot arguments and a pointer to the device tree and initrd, and it also expects that main memory has already been initialized. Boot ROMs also don’t know how to initialize main memory, so we would have nowhere to store Linux. Also, boot ROMs tend to just load a few KB from flash at the most — not enough to house an entire kernel. So, we need a small program that the boot ROM can load that will initialize our main memory and then load the entire (usually-multi-megabyte) Linux kernel and then execute it.
The most popular bootloader for embedded systems, Das U-Boot, does all of that — but adds a ton of extra features. It has a fully interactive shell, scripting support, and USB/network booting.
If you’re using a tiny SPI flash chip for booting, you’ll probably store your kernel, device tree, and initrd / root filesystem at different offsets in raw flash — which U-Boot will gladly load into RAM and execute for you. But U-Boot also has full filesystem support, so you could instead store your kernel and device tree as normal files on a partition of an SD card, eMMC device, or USB flash drive.
U-Boot has to know a lot of technical details about your system. There’s a dedicated board.c port for each supported platform that initializes clocks, DRAM, and relevant memory peripherals, along with initializing any important peripherals, like your UART console or a PMIC that might need to be configured properly before bringing the CPU up to full speed. Newer board ports often store at least some of this configuration information inside a Device Tree, which we’ll talk about later. Some of the DRAM configuration data is often autodetected, allowing you to change DRAM size and layout without altering the U-Boot port’s code for your processor ((If you have a DRAM layout on the margins of working, or you’re using a memory chip with very different timings than the one the port was built for, you may have to tune these values)). You configure what you want U-Boot to do by writing a script that tells it which device to initialize, which file/address to load into which memory address, and what boot arguments to pass along to Linux. While these can be hard-coded, you’ll often store these names and addresses as environmental variables (the boot script itself can be stored as a bootcmd environmental variable). So a large part of getting U-Boot working on a new board is working out the environment.
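As a hedged illustration, a bare-bones environment for loading a zImage and DTB from the first (FAT) partition of an SD card, with the rootfs on the second partition, might look something like this. The file names and console device are hypothetical; kernel_addr_r and fdt_addr_r are conventional U-Boot load-address variables:

```
=> setenv bootargs 'console=ttyS0,115200 root=/dev/mmcblk0p2 rootwait rw'
=> setenv bootcmd 'load mmc 0:1 ${kernel_addr_r} zImage; load mmc 0:1 ${fdt_addr_r} myboard.dtb; bootz ${kernel_addr_r} - ${fdt_addr_r}'
=> saveenv
```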
Here’s the headline act. Once U-Boot turns over the program counter to Linux, the kernel initializes itself, loads its own set of device drivers((Linux does not call into U-Boot drivers the way that an old PC operating system like DOS makes calls into BIOS functions.)) and other kernel modules, and calls your init program.
To get your board working, the necessary kernel hacking will usually be limited to enabling filesystems, network features, and device drivers — but there are more advanced options to control and tune the underlying functionality of the kernel.
Turning drivers on and off is easy, but actually configuring these drivers is where new developers get hung up. One big difference between embedded Linux and desktop Linux is that embedded Linux systems have to manually pass the hardware configuration information to Linux through a Device Tree file or platform data C code, since we don’t have EFI or ACPI or any of that desktop stuff that lets Linux auto-discover our hardware.
We need to tell Linux the addresses and configurations for all of our CPU’s fancy on-chip peripherals, and which kernel modules to load for each of them. You may think that’s part of the Linux port for our CPU, but in Linux’s eyes, even peripherals that are literally inside our processor — like LCD controllers, SPI interfaces, or ADCs — have nothing to do with the CPU, so they’re handled totally separately as device drivers stored in separate kernel modules.
And then there’s all the off-chip peripherals on our PCB. Sensors, displays, and basically all other non-USB devices need to be manually instantiated and configured. This is how we tell Linux that there’s an MPU6050 IMU attached to I2C0 with an address of 0x68, or an OV5640 image sensor attached to a MIPI D-PHY. Many device drivers have additional configuration information, like a prescalar factor, update rate, or interrupt pin to use.
The old way of doing this was manually adding C structs to a platform_data C file for the board, but the modern way is with a Device Tree, which is a configuration file that describes every piece of hardware on the board in a weird quasi-C/JSONish syntax. Each logical piece of hardware is represented as a node that is nested under its parent bus/device; its node is adorned with any configuration parameters needed by the driver.
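For instance, hanging the MPU6050 from the previous paragraph off an I2C bus might look like the fragment below. The &i2c0 label and interrupt wiring depend entirely on your SoC’s .dtsi files, so treat this as a sketch; the invensense,mpu6050 compatible string is the one documented by the upstream kernel binding:

```
/* IRQ_TYPE_EDGE_RISING comes from <dt-bindings/interrupt-controller/irq.h> */
&i2c0 {
    status = "okay";

    imu@68 {
        compatible = "invensense,mpu6050";   /* binds the inv_mpu6050 driver */
        reg = <0x68>;                        /* 7-bit I2C address */
        interrupt-parent = <&gpio1>;         /* hypothetical interrupt wiring */
        interrupts = <4 IRQ_TYPE_EDGE_RISING>;
    };
};
```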
A DTS file is not compiled into the kernel, but rather into a separate .dtb binary blob file that you have to deal with (save to your flash memory, configure U-Boot to load, etc.)((OK, I lied. You can actually append the DTB to the kernel so U-Boot doesn’t need to know about it. I see this done a lot with simple systems that boot from raw flash devices.)). I think beginners have a reason to be frustrated by this system, since there are basically two separate places you have to think about device drivers: Kconfig and your DTS file. If these get out of sync, it can be frustrating to diagnose, since you won’t get a compilation error if your device tree contains nodes that there are no drivers for, if your kernel is built with a driver that isn’t actually referenced in the DTS file, or if you misspell a property or something (since all bindings are resolved at runtime).
Once Linux has finished initializing, it runs init. This is the first userspace program invoked on start-up. Our init program will likely want to run some shell scripts, so it’d be nice to have a sh we can invoke. Those scripts might touch or echo or cat things. It looks like we’re going to need to put a lot of userspace software on our root filesystem just to get things to boot — now imagine we want to actually login (getty), list a directory (ls), configure a network (ifconfig), or edit a text file (vi, emacs, nano, vim, flamewars ensue).
Rather than compiling all of these separately, BusyBox collects small, lightweight versions of these programs (plus hundreds more) into a single source tree that we can compile and link into a single binary executable. We then create symbolic links to BusyBox named after all these separate tools; when we invoke one on the command line, BusyBox determines how it was invoked and runs the appropriate command. Genius!
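On a typical BusyBox-based target, that looks like this (listing trimmed):

```
$ ls -l /bin/cat /bin/ls /bin/sh
lrwxrwxrwx 1 root root 7 /bin/cat -> busybox
lrwxrwxrwx 1 root root 7 /bin/ls -> busybox
lrwxrwxrwx 1 root root 7 /bin/sh -> busybox
$ busybox ls /etc    # same applet, invoked explicitly
$ ls /etc            # BusyBox dispatches on argv[0]
```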
BusyBox configuration is straightforward, using the same Kconfig-based system that Linux and U-Boot use: you simply tell it which packages (and options) you wish to build the binary image with. There’s not much else to say — though a minor “gotcha” for new users is that the lightweight versions of these tools often have fewer features and don’t always support the same syntax/arguments.
Linux requires a root filesystem; it needs to know where the root filesystem is and what filesystem format it uses, and this parameter is part of its boot arguments.
Many simple devices don’t need to persist data across reboot cycles, so they can just copy the entire rootfs into RAM before booting (this is called initrd). But what if you want to write data back to your root filesystem? Other than MMC, all embedded flash memory is unmanaged — it is up to the host to work around bad blocks that develop over time from repeated write/erase cycles. Most normal filesystems are not optimized for this workload, so there are specialized filesystems that target flash memory; the three most popular are JFFS2, YAFFS2, and UBIFS. These filesystems have vastly different performance envelopes, but for what it’s worth, I generally see UBIFS deployed more on higher-end devices and YAFFS2 and JFFS2 deployed on smaller systems.
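To make that concrete, here are two hypothetical kernel boot argument strings: the first for a rootfs on an MMC device’s second partition, the second for a UBIFS rootfs living in a UBI volume on raw NAND:

```
console=ttyS0,115200 root=/dev/mmcblk0p2 rootwait rw
console=ttyS0,115200 ubi.mtd=rootfs root=ubi0:rootfs rootfstype=ubifs rw
```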
MMC devices have a built-in flash memory controller that abstracts away the details of the underlying flash memory and handles bad blocks for you. These managed flash devices are much simpler to use in designs since they use traditional partition tables and filesystems — they can be used just like the hard drives and SSDs in your PC.
If the preceding section made you dizzy, don’t worry: there’s really no reason to hand-configure and hand-compile all of that stuff individually. Instead, everyone uses build systems — the two big ones being Yocto and Buildroot — to automatically fetch and compile a full toolchain, U-Boot, Linux kernel, BusyBox, plus thousands of other packages you may want, and install everything into a target filesystem ready to deploy to your hardware.
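As a sketch of the Buildroot flavor of that workflow (the defconfig name is hypothetical; pick one close to your hardware):

```
$ git clone https://git.buildroot.net/buildroot && cd buildroot
$ make list-defconfigs          # find a board similar to yours
$ make yourboard_defconfig      # hypothetical defconfig name
$ make menuconfig               # add packages, pick a libc, etc.
$ make                          # builds toolchain, U-Boot, kernel, rootfs
$ ls output/images/             # artifacts ready to flash
```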
Even more im