More eculocate: I spent the past couple of weeks first on rearranging
the client code to be a bit less awful - now there's a
command line client as well as an android/termux GUI one, and they share code - and then on adding an
affordance for configuring the wifi network that doesn't require
hardcoding it at build time.
This is kind of a distraction from plugging it into an actual
motorbike, because I could have just hardcoded the wifi details for
tethering to my phone, but once it's attached to the bike then I can
only flash it OTA and I don't want to be messing around tethering my
development machine to my phone just so it can be on the same network
as the bike.
In January I said:
There's a convention for provisioning wifi on these devices
which involves using a mobile phone app to connect to it using BLE
then sending the ssid/password of the chosen access point. In fact
there's even a prebuilt Android app which we can use and an esp32
arduino library which we can't (because we have elected to make our
lives difficult and use Rust instead). But I am led to believe that
"rewrite everything in Rust" is idiomatic for Rust programmers anyway.
I haven't done this yet.
And ... I still haven't, or not as such, because change of
plan. Because I would have to write the device-side code myself
anyway (because Rust), and I would want to write the phone-side code
instead of using their app (because I'd like to have one app to do
everything instead of separate apps for provisioning and for
recording), then I have free rein on the protocol, and the Espressif
protocol
seems ... quite involved?
(What I didn't learn until checking references for writing this blog
post which has already taken way too long it is already Tuesday is
that the Espressif protocol I linked there no longer exists
except in git history, and appears to have been replaced by a different
one called
Blufi
so I am feeling even less bad about swerving reimplementing it)
Background, if you're lucky enough to not know much about Bluetooth
Low Energy: the usual interaction model is "GATT", which can be
thought of as a network-based key/value store. A device defines
"characteristics" and "attributes" and allows read/write access to
them. If you're working "with the grain" of the design, you use this
to model some kind of object/record/aggregate/entity -
or, alternatively, you could just have attributes for rx and tx
and send streams of data over them in any format you like.
The Espressif protocol - as was - defines a protobuf-based protocol
that runs over both BLE and also WiFi with SoftAP, whereas if we're
restricting ourselves to BLE we can actually leverage GATT instead of
using it to emulate a stream interface.
So what have we done instead? We have four characteristics (with
hindsight maybe this should have been four attributes of one
characteristic). eculocate scans for wifi networks it can see, then
it sets the attribute max_index to the number of visible
networks. The client then loops through 1..max_index writing to the
attribute current_index, which causes eculocate to update
current_network to contain the ssid of the nth scanned network. When
the client finds a network it likes, it writes the corresponding
password to secret and eculocate saves the ssid/secret into
flash. The loop feels a bit weird, but there isn't any way - at least,
I can't see any way - to specify that an attribute is array-valued.
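The chooser exchange can be modelled without any Bluetooth at all. This is a minimal sketch in Python, assuming a stand-in device class: FakeEculocate and provision are names invented for illustration, and a real client would go through a BLE library rather than plain method calls. The four values correspond to max_index, current_index, current_network and secret as described above.

```python
class FakeEculocate:
    """Stand-in for the device side: four values the client can read or write."""

    def __init__(self, visible_ssids):
        self._scan = list(visible_ssids)
        self.max_index = len(self._scan)   # number of visible networks
        self._current_index = 0
        self.current_network = ""          # ssid of the selected scan result
        self.saved = None                  # (ssid, secret) once provisioned

    def write_current_index(self, n):
        # Writing current_index makes the device publish the nth ssid (1-based).
        if not 1 <= n <= self.max_index:
            raise IndexError(n)
        self._current_index = n
        self.current_network = self._scan[n - 1]

    def write_secret(self, secret):
        # Writing secret saves the currently selected ssid with its password.
        self.saved = (self.current_network, secret)


def provision(device, known_passwords):
    """Walk the scan results and configure the first network we know a password for."""
    for n in range(1, device.max_index + 1):
        device.write_current_index(n)
        ssid = device.current_network
        if ssid in known_passwords:
            device.write_secret(known_passwords[ssid])
            return ssid
    return None


device = FakeEculocate(["CoffeeShop", "PhoneTether", "HomeNet"])
chosen = provision(device, {"HomeNet": "hunter2", "PhoneTether": "s3cret"})
print(chosen, device.saved)
```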
To provide security for the precious wifi credentials we simply enable
pairing, which causes the two ends to encrypt with AES-CCM.
Sounds simple, and it probably would be except that USB bluetooth
hardware is a market for
lemons (I
originally said something much much ruder there) and the degree of
care and research needed to get one that works properly in Linux far
exceeds the degree of care and research I exerted. The first dongle
I bought, which identifies as 33fa:0010 and is said to be based on a
"Barrot" chipset, is a buggy pile of shite which locks up and resets
randomly, even when using a kernel that contains
e7d1cad6545c,
meaning that every test of the code had to be preceded by removing and
reloading the module and then it would work about 70% of the time
provided I didn't try to pair, which caused it to time out consistently.
Then I dug out a Raspberry Pi Zero W from the junk box, but the
builtin bluetooth on that only supports 4.1, and the TrouBLE stack
won't pair with anything older than 4.2.
(I have since received an Edup B3536 dongle which is based on the
Realtek chipset and said to be a whole lot better - but it only
arrived after I started this blog post and I haven't tried it in anger
yet)
It turns out that with the current hardware configuration pairing is
not actually going to help much anyway. The device has no
keyboard or display (thus no way to display or confirm a pairing pin),
so is restricted to "just works" pairing, which offers no protection
against MITM at the time of pairing. Since all we're using Bluetooth
for is entering the wifi details, and the BLE service is shut down
once it successfully connects to wifi, there is no point setting up a
secure connection "for later" as there is no "later".
eculocate goes into "wifi chooser" mode when there's no saved network
or when it can't connect to the saved network, so a black hat (or
black balaclava) could attack it by separating your motorbike from
your phone so the phone is out of range, then turning the ignition on,
then configuring a different wifi network. It won't help them a lot as
they're still lacking the ed25519 key that controls TCP
connections. And at this point if they're sitting on your motorbike
and the ignition is on, maybe you have bigger problems ... perhaps we
could regard the bike's ignition key as a hardware token.
The proper way to do pairing and have it actually be useful would be
to add an NFC reader/writer and use OOB authentication
but I am very much feeling like I will save that for another day.
This week has been about implementing OTA flashing.
The ESP32 supports an A/B partitioning scheme for the application
partition, so you can safely install a new firmware without destroying
the firmware that's currently running. You write to partition ota_0
having booted from ota_1 and then flip the active boot partition and
do the opposite on the next update. Rust support for this is in the
esp_hal_ota crate, but if you
want to understand what it's doing you should first read the official
Espressif C API documentation.
It's pretty easy to use: as per the
example
you make an Ota object, you call ota_begin on it, and then you call
ota_write_chunk every time you get some new data (from the network
or from BLE or from the USB stick or wherever you are getting the
update from) until the whole thing's transferred, then you call ota_flush
at the end.
It does/did seem to have a bug (as far as I can tell): if you turn off
the builtin CRC checking (because you are using a reliable network, or
you have some other form of integrity check), it will try to do it
anyway. It's a one-line fix
which I will open a PR for as soon as I'm a bit more convinced I
haven't misunderstood the whole thing.
Anyway, I plugged it in, hooked it up and ... it worked. It was
however a lot slower (it took minutes) than flashing using espflash
over USB, which takes seconds, and this is because it (more precisely,
the API afforded by
embedded-storage)
takes a quite naive approach to managing erase blocks: if you get 1187
bytes from the network and ask embedded-storage to write it, it will
read the whole erase block (4096 bytes) into RAM, erase it - you can
only erase in units of one block - and then write the modified block
back. Given you'd just written less than an entire block, the next
packet will cause it to erase the same block again to write a different
part of it. You're probably erasing each block two or three times.
Rather than change esp-hal-ota I decided to take the coward's way out
and buffer data in my application until I had a block's worth. Suddenly it got massively faster.
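To see why buffering helps, here is a rough model in Python. CountingFlash and BlockBuffer are made-up names, and the flash model is an assumption: it simply charges one erase for every block a write touches, which approximates the read-erase-write behaviour described above rather than reproducing what embedded-storage actually does.

```python
ERASE_BLOCK = 4096  # erase unit in bytes, as stated above


class CountingFlash:
    """Fake flash that counts erases: every write erases each block it touches."""

    def __init__(self):
        self.data = bytearray()
        self.erases = 0

    def write(self, offset, chunk):
        first = offset // ERASE_BLOCK
        last = (offset + len(chunk) - 1) // ERASE_BLOCK
        self.erases += last - first + 1
        end = offset + len(chunk)
        if len(self.data) < end:
            self.data.extend(b"\xff" * (end - len(self.data)))
        self.data[offset:end] = chunk


class BlockBuffer:
    """Collect incoming chunks and only write whole, aligned erase blocks."""

    def __init__(self, flash):
        self.flash = flash
        self.pending = bytearray()
        self.offset = 0

    def feed(self, chunk):
        self.pending += chunk
        while len(self.pending) >= ERASE_BLOCK:
            self._emit(ERASE_BLOCK)

    def finish(self):
        # Write whatever is left over as a final short block.
        if self.pending:
            self._emit(len(self.pending))

    def _emit(self, n):
        self.flash.write(self.offset, bytes(self.pending[:n]))
        del self.pending[:n]
        self.offset += n


image = bytes(range(256)) * 64                      # 16384 bytes, four erase blocks
packets = [image[i:i + 1187] for i in range(0, len(image), 1187)]

naive = CountingFlash()
position = 0
for packet in packets:                              # write each packet as it arrives
    naive.write(position, packet)
    position += len(packet)

buffered = CountingFlash()
buffer = BlockBuffer(buffered)
for packet in packets:                              # same packets, via the buffer
    buffer.feed(packet)
buffer.finish()

print("erases without buffering:", naive.erases)
print("erases with buffering:", buffered.erases)
```

The cost is one erase block of RAM for the buffer, which is small next to the time saved.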
The other feature this week was authentication: the device should only
accept legitimate (read: cryptographically signed) firmware. The
constraint here is that we can't read the whole firmware into RAM
before verifying the signature - there's 384K of RAM and 4MB of flash,
it's not going to fit. Our self-imposed additional constraint is that
we don't want to write the firmware to flash as we go along and then
validate the signature at the end, because if it's wrong then we've
already overwritten a working firmware with malicious data. I say this
is self-imposed because it's not actually obvious that it's a real
problem unless there's some way for the bad actor to switch to the
new firmware, but it doesn't seem pretty even so.
So tl;dr we need to verify the firmware before we've read it all, and
before we've written any of it. We do this using the approach
explained by Gennaro and Rohatgi in How to Sign Digital
Streams -
the first block is digitally signed and each block contains a sha256
of the following block.
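A sketch of the chaining idea in Python, under some assumptions: chain_blocks and verify_stream are hypothetical names, the trusted hash of the first block stands in for the Ed25519 signature check, and the verifier knows where the stream ends because it is handed a list. It is not the firmware's actual format.

```python
import hashlib

HASH_LEN = 32  # size of a sha256 digest in bytes


def chain_blocks(payloads):
    """Build the stream back to front: each block is payload + sha256(next block)."""
    blocks = []
    next_hash = b""                       # the final block carries no hash
    for payload in reversed(payloads):
        block = payload + next_hash
        blocks.append(block)
        next_hash = hashlib.sha256(block).digest()
    blocks.reverse()
    return blocks, next_hash              # next_hash is now the hash of block 0


def verify_stream(blocks, trusted_first_hash):
    """Yield each payload only after the block carrying it has been authenticated."""
    expected = trusted_first_hash         # assumption: stands in for the signature
    for i, block in enumerate(blocks):
        if hashlib.sha256(block).digest() != expected:
            raise ValueError(f"block {i} failed verification")
        if i == len(blocks) - 1:
            yield block                   # last block is payload only
        else:
            expected = block[-HASH_LEN:]  # hash that the next block must match
            yield block[:-HASH_LEN]


payloads = [b"a" * 10, b"b" * 10, b"c" * 10]
blocks, first_hash = chain_blocks(payloads)
print(list(verify_stream(blocks, first_hash)) == payloads)
```

Because the stream has to be built back to front, the signer needs the whole image in hand, which is fine for build tooling on a development machine.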
We use
ed25519-dalek
for signature checking, because it looked easy to get running with no_std and I've
heard of at least two of the authors. For SHA256 - it turns out that the
ESP32-C3 has a hardware SHA accelerator, so we use esp_hal::sha here and don't need to do it in software.
I bashed my head against this for a while because I didn't read the F
manual closely enough. hasher.update doesn't eat all the bytes that
you feed it, and it returns the unconsumed data - so you have to run
it in a loop until it comes back empty. If you just call it once, as I
did, you get the hash of the first 32 bytes only.
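The shape of the mistake, modelled in Python. PartialHasher is a hypothetical stand-in, not the esp_hal::sha API; the only property borrowed from it is that update may consume part of its input and hand back the rest. The 32-byte limit is chosen to match the symptom above.

```python
import hashlib


class PartialHasher:
    """Hypothetical hasher whose update() accepts at most `limit` bytes per call."""

    def __init__(self, limit=32):
        self._inner = hashlib.sha256()
        self._limit = limit

    def update(self, data):
        taken, rest = data[:self._limit], data[self._limit:]
        self._inner.update(taken)
        return rest                       # unconsumed bytes, empty when done

    def digest(self):
        return self._inner.digest()


def hash_all(data, limit=32):
    hasher = PartialHasher(limit)
    remaining = data
    while remaining:                      # keep feeding until nothing comes back
        remaining = hasher.update(remaining)
    return hasher.digest()


def hash_once(data, limit=32):
    hasher = PartialHasher(limit)
    hasher.update(data)                   # the bug: the returned remainder is dropped
    return hasher.digest()


message = bytes(100)
print(hash_all(message) == hashlib.sha256(message).digest())
print(hash_once(message) == hashlib.sha256(message[:32]).digest())
```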
(In the end, this - it seems to me - doesn't protect a lot better
against wiping good images than the verify-at-the-end approach. If
a MITM is able to substitute block 22 of 150 blocks with their own
data, we will write blocks 1-21 and then abort, whereas
verifying-at-the-end means we'll write blocks 1-150 but not switch to
the new image. Either way we've trashed whatever was there before)
That's basically all there is to relate this week, although I also
invested a little more time reading bits of the Rust
Book
that our Rust study group at $work hasn't reached yet, in order to
make the error handling less crappy. Next steps are to write the
session registration thing so that UDP is authenticated, and to add
wifi provisioning so we're not hardcoding my wifi network details.
What's missing from this motorbike? The answer is shocking.
I removed the shock from my motorbike today so I can take it to
ABE tomorrow to be
rebuilt. Some notes for posterity and so that I remember how to
reinstall it.
I mostly followed the Haynes manual: the words are good but the
pictures are awful. They say to remove the fuel tank, but I didn't
really want to, on account of how it's full of fuel. I found it worked
to lift the tank a bit and stick some wooden blocks underneath to hold
it up. While doing this the vent/breather pipe popped off, as it
always does.
In the order that I tackled them:
- the reservoir needs to be removed: use a JIS screwdriver to loosen
  the strap around it and then just slide it out.
- to get to the nut/bolt at the lower end I had to loosen the rear
  hugger - two screws removed. I used a socket on the nut and a
  spanner on the bolt to stop it from spinning while I turned the nut.
- I can't see any way to get a socket onto the nut/bolt at the top
  end, but it eventually succumbed to two spanners. Reassembling this
  is going to be "fun" if it's finicky about torque settings.
- then it's "just" a matter of untangling everything to remove the
  shock and its reservoir. I had to unplug the connector for the
  stator cable, as there was no way to get the reservoir through the
  tangle otherwise.
Hopefully having now written this down I'll not forget to reattach all
the bits
I'll tell you a joke about UDP, but you might not get it.
We have a new name. "Thing I can plug into my motorbike ECU to log the
data (rpm, speed, throttle position, temperatures etc etc) it
produces" is Leonard-of-Quirm-level naming. I'd provisionally been
calling it "eculogical" which I didn't like, and now it's called
"eculocate" which I ... can tolerate.
And I've got it to the point where it (kind of) works - but, now I've
decided I need to semi-fundamentally break it again. I'll get to
that.
On the server side we have a UDP socket that listens for a subscription message
containing [(interval, table-number, start, end), ...] (actually
binary encoded) and then sends back the requested table data once
every interval milliseconds for the next minute. Then it stops,
because this is UDP and we can't reliably tell when the peer has gone
away, so the peer should send another subscription message in the
meantime if it wants to carry on receiving.
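For illustration, a subscription message could be packed like this. The field widths, the leading count byte and the table numbers are all assumptions, since the real wire format isn't spelled out above; only the (interval, table-number, start, end) tuple and the one-minute lifetime come from the description.

```python
import struct

# Hypothetical layout: one byte for the entry count, then per entry the
# interval in ms (u16), table number (u8), start offset (u8), end offset (u8).
ENTRY = struct.Struct("!HBBB")

SUBSCRIPTION_LIFETIME_S = 60  # the server stops sending after a minute


def encode_subscription(entries):
    out = bytes([len(entries)])
    for interval_ms, table, start, end in entries:
        out += ENTRY.pack(interval_ms, table, start, end)
    return out


def decode_subscription(message):
    count = message[0]
    body = message[1:]
    if len(body) != count * ENTRY.size:
        raise ValueError("malformed subscription message")
    return [ENTRY.unpack_from(body, i * ENTRY.size) for i in range(count)]


def is_live(subscribed_at, now):
    """True while the server should still be sending for this subscription."""
    return now - subscribed_at < SUBSCRIPTION_LIFETIME_S


wanted = [(100, 0x11, 0, 20), (1000, 0xD1, 0, 6)]   # made-up table numbers
packet = encode_subscription(wanted)
print(len(packet), decode_subscription(packet))
```

A client that wants an uninterrupted stream would resend the same packet comfortably inside the lifetime, say every 30 seconds, so that one lost datagram doesn't end the subscription.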
For now we're just offering the raw tables, because I'm going to need
much more example data to figure out the structure. Eventually we'll
do some processing on device so that clients can query "RPM" or "TPS"
without having to know their table/offset - as that varies between
bike models.
Notes:
the embassy-net IP stack (actually smoltcp) requires you to
statically declare how many sockets you're going to
use
which is fine once you know you have to.
And we have an Android client. Well, it's Android insofar as it runs
on my phone, but I don't think it'd qualify for Android Market or the
Play Store or whatever it's called now. I sidestepped the
whole android app development slog, by installing Termux and
Termux:GUI on my phone and writing the client side as a Python script. I don't even like Python and I still found this preferable to
the Android Studio build/run process: I simply sshed into my phone and
used tramp to edit the script. I believe that Termux:GUI doesn't
support the full range of Android widgets but it has buttons and
labels and text boxes and LinearLayouts which is enough for me. Adding
dns-sd (zeroconf) support was the work of about 20 minutes, which was nice.
Having achieved that milestone I made a list of what's left before I
can plug it into my motorbike and take it for a ride (cable, power
supply, some form of protective casing) and realised that once I
detach it from its USB umbilical I will no longer be able to release
new versions simply by invoking cargo run. So, it needs a mechanism
for OTA updates, and this should probably come with some kind of
auth[nz] so that not just any Tom, Dick or Harry on the same wifi
network could flash random crap onto it. Then I considered that if
we're not trusting the wifi, the actual UDP service (which is
currently read-only but maybe some day might include a means of
writing to (and therefore probably bricking) the ecu) is also
sensitive.
Here's the plan:
- we'll make an EdDSA key pair at build time and embed the public key in the binary
- build tooling will sign the release artefact with the key
- a TCP socket will listen for OTA update requests and verify the signature before
  writing to the flash
- the TCP socket will also listen for session key registrations (signed in the same way) and remember them for x hours (or until the ignition
  is turned off and we lose power)
- the UDP listener will reject subscription requests unless they come with a
  valid session key
Additionally, we need to change the dns-sd stuff to advertise a TCP
service, the client to register a session key when it starts, the
subscription message format to include the session key, and the UDP
listener to check it. Which is what I meant when I said
"semi-fundamentally break it".
If this were commercial/proprietary software then we'd have separate
keys for the firmware signing and for the client. That seems less of
an issue when it's most likely the same person building the software as
is using the client, but it might be worth doing anyway.
Current status: bodged together a TCP listener, haven't touched on
crypto yet, and so far it only pretends to do the OTA update.
to jot down some of what I've been doing in the past month or so. The
tl;dr is "making a thing I can plug into my motorbike ECU to log the
data (rpm, speed, throttle position, temperatures etc etc) it
produces". For reasons mostly of ramifying the learning opportunities,
I decided the best way would be to get a cheap ESP-32 device (it's
RISC-V - isn't that cool?) and hook it up to a level converter, and
then write a program for it in Rust (Rust learning opportunity ahoy)
to twiddle the serial line appropriately and send the data over the
network to the mobile phone which sits on my handlebars.
It turns out that I spent way less time getting the serial interface
to the Honda K-line ECU signal
to reveal its secrets than on the "why don't you just ..." part where
I want to stream the data over wifi to another device. So this post is
actually not at all about hardware hacking.
The constraints I have imposed on myself here are
- I do not want to hardcode my wifi ssid and password into the device
  (actually, I have at least two wifi networks I may want to use it with)
- the (yet to be written) data collection app should be able to find
  the device without hardcoding its IP address or requiring me to type it
  in - as the device is getting its address from DHCP, we don't even
  know what address it will get
These are both in principle solved problems.
There's a convention for provisioning wifi on these devices
which involves using a mobile phone app to connect to it using BLE
then sending the ssid/password of the chosen access point. In fact
there's even a prebuilt Android app which we can use and an esp32
arduino library which we can't (because we have elected to make our
lives difficult and use Rust instead). But I am led to believe that
"rewrite everything in Rust" is idiomatic for Rust programmers anyway.
I haven't done this yet.
And for the "what's my IP address" problem there is a standard way, by
combining Multicast DNS
and DNS-based Service Discovery, for
computers to publish their services on the LAN. When I say
"computers": if this household is typical, mostly they're set-top
boxes, printers, light bulbs, smart speakers and thermostats rather
than general-purpose computing devices. I've mostly done this bit.
Terms
Multicast DNS is DNS, but peer-to-peer: it reuses mostly the same packet formats
but instead of requiring a centralised server which knows all the
names, every device listens on a multicast address
for DNS queries for its own name.
DNS-SD is a convention for which records you can query/need to send in
order to advertise what kind of services you have and where they are.
Because sending an A record alone is not sufficient for anyone with a
Mac and a fancy-schmancy service browser to know what kind of service
is on offer at that address. Is it a printer? A dishwasher? An
IoT air fryer?
The RFCs for each (which are, by the way, much easier reads than a lot
of RFCs and contain no EBNF at all) go to great lengths to point out
that each is independent of the other. But they stack well.
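As a concrete illustration of "DNS, but peer-to-peer", here is a sketch that builds an mDNS question by hand. It uses only well-known protocol constants (multicast group 224.0.0.251, port 5353, PTR type 12, class IN); encode_name and build_query are names made up for this example.

```python
import struct

MDNS_GROUP = ("224.0.0.251", 5353)   # IPv4 multicast address and port for mDNS
TYPE_PTR = 12
CLASS_IN = 1


def encode_name(name):
    """Encode a dotted name as length-prefixed labels with a zero terminator."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("utf-8")
        if not 0 < len(raw) < 64:
            raise ValueError(f"bad label: {label!r}")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(name, qtype=TYPE_PTR):
    # Header: id 0, flags 0, one question, no answer/authority/additional records.
    header = struct.pack("!HHHHHH", 0, 0, 1, 0, 0, 0)
    question = encode_name(name) + struct.pack("!HH", qtype, CLASS_IN)
    return header + question


query = build_query("_keihin._udp.local")
print(query.hex())
```

Sending it is a single sendto() on a UDP socket aimed at MDNS_GROUP; receiving the answers takes more setup, which is left out here.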
DNS-SD, 3048 metre view
DNS-SD is based on a paradigm of "services" and "service instances". A
"service" is the general "kind" of thing on offer and is named
something like _http._tcp.local - it will always end in _tcp.local
if it is TCP or _udp.local if it is anything other than TCP. For our
ECU project we chose the service name _keihin._udp.local after the
manufacturer of the ECUs that the device knows how to talk to. A
service instance might be something like
WiserHeat05AB12._http._tcp.local. Service names aren't usually
hierarchical but there are a few with a second level like
_printer._sub._http._tcp
The minimum/usual set of records you need to publish for DNS-SD is
this (pseudocode)
Your service instance needs a SRV and a TXT, then there's a PTR
connecting the service instance to the service for people who are
browsing the service - think about e.g. an "Add a printer" dialog box,
then there's a PTR from _services._dns-sd._udp.local to the service name PTR
for people who are running avahi-browse -a or its moral equivalent
in GUI-land. And not forgetting there's an A record matching the one in the
SRV record data.
Note: _services._dns-sd._udp.local is the right name for discovery
even if your service is TCP - there is no _services._dns-sd._tcp.local
Note: I am assuming .local is the suffix, which is likely true for
MDNS but probably not if you are using DNS-SD with regular DNS
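The record set described above, written out as data. This is a sketch in Python rather than anything the device runs: the instance name, host name, address and port are placeholders I made up, and the enumeration name is the one RFC 6763 defines, _services._dns-sd._udp.local. dangling_targets is a hypothetical helper that checks every PTR and SRV points at a name that actually exists in the set.

```python
def dns_sd_records(instance, service, host, address, port):
    """Return (owner name, type, data) tuples for one advertised service."""
    full_instance = f"{instance}.{service}"
    return [
        # For people browsing "everything on the LAN".
        ("_services._dns-sd._udp.local", "PTR", service),
        # For people browsing one kind of service.
        (service, "PTR", full_instance),
        # SRV data is (priority, weight, port, target host).
        (full_instance, "SRV", (0, 0, port, host)),
        # An empty TXT record is a single zero-length string.
        (full_instance, "TXT", b"\x00"),
        # The address record named by the SRV target.
        (host, "A", address),
    ]


def dangling_targets(records):
    """Return names that some record points at but no record defines."""
    owners = {name for name, _, _ in records}
    targets = set()
    for _, rtype, data in records:
        if rtype == "PTR":
            targets.add(data)
        elif rtype == "SRV":
            targets.add(data[3])
    return targets - owners


records = dns_sd_records(
    "eculocate", "_keihin._udp.local", "eculocate.local", "192.168.1.50", 4000
)
for name, rtype, data in records:
    print(name, rtype, data)
print("dangling:", dangling_targets(records))
```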
MDNS
The single biggest problem when implementing MDNS is the lack of
tooling to test it against. In my experience:
- dig: historically, you used to be able to dig @224.0.0.251 -p 5353 name.local
  but it largely worked by accident and now it doesn't. Note
  that even when it did work it wasn't sending the same packets as a
  real MDNS query.
- avahi-browse: e.g. avahi-browse -v -a shows all the services on
  the LAN. Note that it is a frontend to the persistent avahi-daemon
  and there is caching happening in there somewhere, so if it didn't
  work then and you made a change and restarted your service ... is
  your service still broken or did it not reissue the query? shrug-emoji.gif
- mquery from Jeremie Miller's mdnsd: note that this has been forked
  into a million pieces. The one I'm using is
  https://github.com/troglobit/mdnsd.
  This works for querying the _services._dns-sd._udp.local discovery name
  but if you run e.g. mquery _scanner._tcp.local it appears to send
  the query and then sit there silently ignoring the responses. But:
  it doesn't cache as avahi-browse does, so that's good.
- wireshark, of course. Wireshark is pretty good, but note that it
  will display your replies as "Unsolicited" because the query id for
  MDNS is (per the standard) 0, so there is no way for it to correlate
  them with requests.
- mdns-debugger was
  handy for pointing out my TTLs (and a lot of other TTLs) were wrong.
  It didn't point out that my PTR record data was incorrectly encoded
  and was therefore naming a nonexistent A record, which was a source
  of much hair tearing.
- there are a couple of Android apps I also used, mostly to see what
  they'd do when nothing was working (see "hair tearing" above) and I
  was out of ideas. "mDNS Discovery" (com.mdns_discovery.app) and
  "Service Browser" (com.druk.servicebrowser). The first one is
  prettier, the second one very helpfully rendered the errant PTR
  with a backslash, as eculogical\.local, and so led me to the said
  encoding error.
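One way to get that rendering, as a sketch: if the whole dotted name is written as a single label instead of one label per component, a tool that decodes it has to escape the dot to show it is inside a label. This is an assumption about the shape of the bug, which isn't described in detail above; encode_labels and render are names invented for the example.

```python
def encode_labels(labels):
    """Write each label as a length byte plus its bytes, then a zero terminator."""
    out = b""
    for label in labels:
        raw = label.encode("utf-8")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def render(wire):
    """Decode an uncompressed wire-format name, escaping dots inside labels."""
    labels = []
    pos = 0
    while wire[pos] != 0:
        length = wire[pos]
        label = wire[pos + 1:pos + 1 + length].decode("utf-8")
        labels.append(label.replace("\\", "\\\\").replace(".", "\\."))
        pos += 1 + length
    return ".".join(labels)


good = encode_labels(["eculogical", "local"])   # two labels, as intended
bad = encode_labels(["eculogical.local"])       # whole name squashed into one label

print(render(good))
print(render(bad))
```

The two names look identical if the tool doesn't escape, which would explain why most of the tools above gave no hint.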
Where are we now?
I believe that it now does everything an mdns responder SHOULD(sic) do
except
- compress labels in response data
- ignore queries for records where the record we'd send is already in the answers section of the query message
- NSEC
and I can't decide, in the context of this being a program that
probably nobody else in the world will ever use and even I will only
use on one single piece of hardware (I only have one motorbike)
whether implementing those things is a good and laudable decision
because spec compliance is important, or just a way of further putting
off the inevitable next step which involves writing the Android app
to collect the data.
It also could do with being extracted into its own module/crate/thing
to be more modular. I'd say "to aid reuse" but I don't think anyone
really wants to (or should want to) reuse my novice-level Rust code.
Learning in public.