I said in the previous entry that "pairing is not actually going to
help much anyway", with the implication that I was done thinking about
BLE. Turns out I was not done thinking about BLE. Notwithstanding that
the scope for anyone to maliciously connect the device to the wrong
wifi is pretty limited, the attack I hadn't considered is that an
attacker might be listening to us as we send the credentials for the
right wifi network. Which could be bad.
It continues to be true, though, that "pairing is not actually going
to help much anyway": although "Just Works" pairing is encrypted
against passive eavesdropping, an active MITM will still see
everything we send, and one can be put together for the price of two
bluetooth-capable MCUs. So, instead of relying on the transport we're
adding encryption/authentication of our own.
getrandom
A lot of Rust crypto stuff depends on the getrandom
crate that provides an interface
to the OS's randomness source.
getrandom doesn't have a supported target for esp32 no_std, meaning it
won't build without a custom randomness source. Further, the
mechanism for adding a custom source has changed between getrandom
0.2 and 0.3. If you follow the current getrandom docs for adding the
esp32 hardware random source, and then cargo add some crate that
transitively adds a rand_core dependency, you may find as I did that
your crate depends on the older version: you now have two versions
of getrandom in your project and one of them still doesn't work. I
don't have numbers for how likely this is but I've hit it twice with
different randomly(sic) selected libraries.
The 0.2.17 docs
explain how to do custom implementations for the version of getrandom that
everyone's actually using. I can't now remember where I found the source for
making it work with esp_hal, but the tl;dr is I did this.
salt and shake
NaCl/libsodium seems a defensible choice of library that avoids the
"don't roll your own crypto" injunction. I picked
crypto_box
from nacl-compat on the Rust side, and expected PyNaCl's
nacl.public.Box to work with it. It does, but ...
- when you encrypt with Python, you get back a result which is 40
  bytes longer than the input plaintext, because it "stores
  authentication information and the nonce alongside [the ciphertext]"
- when you decrypt with Rust, it expects the nonce and the ciphertext
  to be passed as separate arguments to decrypt
- the nonce is 24 bytes, not 40 bytes, just in case you were thinking
  that explains the difference
It's simple once you know what's going on, but it would
be a lot simpler to work out what's going on if the docs were a bit
more precise than just saying "alongside". For posterity, the first 24
bytes are nonce, and the rest is a concatenation of the ciphertext and
a "16 byte authenticator", which is also described as a "MAC tag". The
Rust-side decrypt method expects the nonce as its first argument and
the combined ciphertext+tag as the second, so although I didn't find
out which of those two comes first, blessedly, we don't need to know.
"alongside". Bah.
(It took longer to work this out than it should have, because in the
process of sending it across Bluetooth I managed to append a chunk of
NUL bytes to the ciphertext, which not surprisingly caused it to
fail to decrypt no matter how I carved it up.)
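For the record, here is a minimal sketch of the carving-up, using only the sizes given above (24-byte nonce, 16-byte tag). The function names are made up for illustration and nothing here calls PyNaCl or crypto_box; it only shows which bytes go to which argument.

```python
NONCE_LEN = 24  # nonce size, as stated above
TAG_LEN = 16    # the "16 byte authenticator" / MAC tag


def split_box_message(blob: bytes):
    """Split a combined message into (nonce, ciphertext_plus_tag)."""
    if len(blob) < NONCE_LEN + TAG_LEN:
        raise ValueError("too short to hold a nonce and a tag")
    # First 24 bytes are the nonce; everything else is passed on untouched.
    return blob[:NONCE_LEN], blob[NONCE_LEN:]


def plaintext_length(blob: bytes) -> int:
    """Plaintext size implied by the 40 bytes of overhead."""
    return len(blob) - NONCE_LEN - TAG_LEN


# Stand-in for what the Python side would produce for a 5-byte plaintext.
blob = bytes(range(NONCE_LEN)) + bytes(TAG_LEN + 5)
nonce, sealed = split_box_message(blob)
print(len(nonce), len(sealed), plaintext_length(blob))
```

Anything appended in transit (a chunk of NUL bytes, say) ends up in the second part, which is why no amount of re-carving rescues it.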
there may be TrouBLE ahead
It wouldn't feel real if I hadn't had to spend some time after that
wrestling with bluetooth again.
This time: when you write a characteristic value that exceeds the MTU,
instead of writing it all in one go the client uses a sequence of PrepareWrite
messages to send each chunk and then an ExecuteWrite to stitch it
together. TrouBLE (the Rust bluetooth stack) handles this insofar as
it processes these messages and updates the value in the server, but it means you don't get a
GattEvent::Write
event that you can use to perform any other action (in our case,
save the data to nvs and reboot).
Either you can match
GattEvent::Other
and then do arcane things to find out if it was an ExecuteWrite and
was for the appropriate attribute handle, or you can do what I did and add an
extra "I'm done now" step to the provisioning process where it writes
a boolean when it's finished.
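A toy model of that workaround, in Python for brevity rather than the Rust that runs on the device. The class and method names are invented for illustration and are not TrouBLE's API; it only shows why the long write itself gives the application nothing to hook, and how the extra boolean does.

```python
class LongWriteModel:
    """Toy model of a GATT server receiving one long write."""

    def __init__(self):
        self.queue = []    # (offset, chunk) pairs from PrepareWrite
        self.value = b""   # the characteristic value held by the server
        self.saved = None  # what we would persist to nvs

    def prepare_write(self, offset, chunk):
        self.queue.append((offset, chunk))

    def execute_write(self):
        # Stitch the queued chunks together in offset order.
        buf = bytearray()
        for offset, chunk in sorted(self.queue):
            buf[offset:offset + len(chunk)] = chunk
        self.value = bytes(buf)
        self.queue.clear()
        # The value is updated, but no application-level event fires here.

    def write_done_flag(self, done):
        # An ordinary short write: this one the application does see.
        if done:
            self.saved = self.value


server = LongWriteModel()
payload = bytes(range(50))
for off in range(0, len(payload), 18):  # 18 is an arbitrary chunk size
    server.prepare_write(off, payload[off:off + 18])
server.execute_write()
saved_before_flag = server.saved
server.write_done_flag(True)
```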
More eculocate: I spent the past couple of weeks first on rearranging
the client code to be a bit less awful - now there's a
command line client as well as an android/termux GUI one, and they share code - and then on adding an
affordance for configuring the wifi network that doesn't require
hardcoding it at build time.
This is kind of a distraction from plugging it into an actual
motorbike, because I could have just hardcoded the wifi details for
tethering to my phone, but once it's attached to the bike then I can
only flash it OTA and I don't want to be messing around tethering my
development machine to my phone just so it can be on the same network
as the bike.
In January I said:
There's a convention for provisioning wifi on these devices
which involves using a mobile phone app to connect to it using BLE
then sending the ssid/password of the chosen access point. In fact
there's even a prebuilt Android app which we can use and an esp32
arduino library which we can't (because we have elected to make our
lives difficult and use Rust instead). But I am led to believe that
"rewrite everything in Rust" is idiomatic for Rust programmers anyway.
I haven't done this yet.
And ... I still haven't, or not as such, because change of
plan. Since I would have to write the device-side code myself
anyway (because Rust), and I would want to write the phone-side code
instead of using their app (because I'd like to have one app to do
everything instead of separate apps for provisioning and for
recording), I have free rein on the protocol, and the Espressif
protocol
seems ... quite involved?
(What I didn't learn until checking references for writing this blog
post which has already taken way too long it is already Tuesday is
that the Espressif protocol I linked there no longer exists
except in git history, and appears to have been replaced by a different
one called
Blufi
so I am feeling even less bad about swerving reimplementing it)
Background, if you're lucky enough to not know much about Bluetooth
Low Energy: the usual interaction model is "GATT", which can be
thought of as a network-based key/value store. A device defines
"characteristics" and "attributes" and allows read/write access to
them. If you're working "with the grain" of the design, you use this
to model some kind of object/record/aggregate/entity -
or, alternatively, you could just have attributes for rx and tx
and send streams of data over them in any format you like.
The Espressif protocol - as was - defines a protobuf-based protocol
that runs over both BLE and also WiFi with SoftAP, whereas if we're
restricting ourselves to BLE we can actually leverage GATT instead of
using it to emulate a stream interface.
So what have we done instead? We have four characteristics (with
hindsight maybe this should have been four attributes of one
characteristic). eculocate scans for wifi networks it can see, then
it sets the attribute max_index to the number of visible
networks. The client then loops through 1..max_index writing to the
attribute current_index, which causes eculocate to update
current_network to contain the ssid of the nth scanned network. When
the client finds a network it likes, it writes the corresponding
password to secret and eculocate saves the ssid/secret into
flash. The loop feels a bit weird, but there isn't any way - at least,
I can't see any way - to specify that an attribute is array-valued.
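The exchange is easier to follow as a simulation. This is a sketch with a fake in-memory device, not real BLE code: FakeEculocate and provision are invented names, and I'm assuming here that the index is 1-based and that the loop includes max_index.

```python
class FakeEculocate:
    """In-memory stand-in for the device side of the four values."""

    def __init__(self, visible_ssids):
        self._scan = list(visible_ssids)
        self.max_index = len(self._scan)  # number of visible networks
        self.current_network = ""
        self.saved = None                 # (ssid, secret) once provisioned

    def write_current_index(self, n):
        # Assumption: indices run from 1 to max_index inclusive.
        self.current_network = self._scan[n - 1]

    def write_secret(self, password):
        # Writing the secret saves it with whichever ssid is selected.
        self.saved = (self.current_network, password)


def provision(device, wanted_ssid, password):
    """Walk the scan results and send the password for the wanted ssid."""
    for n in range(1, device.max_index + 1):
        device.write_current_index(n)
        if device.current_network == wanted_ssid:
            device.write_secret(password)
            return True
    return False


device = FakeEculocate(["cafe", "home", "garage"])
found = provision(device, "home", "hunter2")
```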
To provide security for the precious wifi credentials we simply enable
pairing, which causes the two ends to encrypt with AES-CCM.
Sounds simple, and it probably would be except that USB bluetooth
hardware is a market for
lemons (I
originally said something much much ruder there) and the degree of
care and research needed to get one that works properly in Linux far
exceeds the degree of care and research I exerted. The first dongle
I bought, which identifies as 33fa:0010 and is said to be based on a
"Barrot" chipset, is a buggy pile of shite which locks up and resets
randomly, even when using a kernel that contains
e7d1cad6545c,
meaning that every test of the code had to be preceded by removing and
reloading the module and then it would work about 70% of the time
provided I didn't try to pair, which caused it to time out consistently.
Then I dug out a Raspberry Pi Zero W from the junk box, but the
builtin bluetooth on that only supports 4.1, and the TrouBLE stack
won't pair with anything older than 4.2.
(I have since received an Edup B3536 dongle which is based on the
Realtek chipset and said to be a whole lot better - but it only
arrived after I started this blog post and I haven't tried it in anger
yet)
It turns out that with the current hardware configuration pairing is
not actually going to help much anyway. The device has no
keyboard or display (thus no way to display or confirm a pairing pin),
so is restricted to "just works" pairing, which offers no protection
against MITM at the time of pairing. Since all we're using Bluetooth
for is entering the wifi details, and the BLE service is shut down
once it successfully connects to wifi, there is no point setting up a
secure connection "for later" as there is no "later".
eculocate goes into "wifi chooser" mode when there's no saved network
or when it can't connect to the saved network, so a black hat (or
black balaclava) could attack it by separating your motorbike from
your phone so the phone is out of range, then turning the ignition on,
then configuring a different wifi network. It won't help them a lot as
they're still lacking the ed25519 key that controls TCP
connections. And at this point if they're sitting on your motorbike
and the ignition is on, maybe you have bigger problems ... perhaps we
could regard the bike's ignition key as a hardware token.
The proper way to do pairing and have it actually be useful would be
to add an NFC reader/writer and use OOB authentication
but I am very much feeling like I will save that for another day.
This week has been about implementing OTA flashing.
The ESP32 supports an A/B partitioning scheme for the application
partition, so you can safely install a new firmware without destroying
the firmware that's currently running. You write to partition ota_0
having booted from ota_1 and then flip the active boot partition and
do the opposite on the next update. Rust support for this is in the
esp_hal_ota crate, but if you
want to understand what it's doing you should first read the official
Espressif C API documentation.
It's pretty easy to use: as per the
example
you make an Ota object, you call ota_begin on it, and then you call
ota_write_chunk every time you get some new data (from the network
or from BLE or from the USB stick or wherever you are getting the
update from) until the whole thing's transferred, then you call ota_flush
at the end.
It does/did seem to have a bug (as far as I can tell): if you turn off
the builtin CRC checking (because you are using a reliable network, or
you have some other form of integrity check), it will try to do it
anyway. It's a one-line fix
which I will open a PR for as soon as I'm a bit more convinced I
haven't misunderstood the whole thing.
Anyway, I plugged it in, hooked it up and ... it worked. It was
however a lot slower (it took minutes) than flashing using espflash
over USB, which takes seconds, and this is because it (more precisely,
the API afforded by
embedded-storage)
takes a quite naive approach to managing erase blocks: if you get 1187
bytes from the network and ask embedded-storage to write it, it will
read the whole erase block (4096 bytes) into RAM, erase it - you can
only erase in units of one block - and then write the modified block
back. Given you'd just written less than an entire block, the next
packet will cause it to erase the same block again to write a different
part of it. You're probably erasing each block two or three times.
Rather than change esp-hal-ota I decided to take the coward's way out
and buffer data in my application until I had a block's worth. Suddenly it got massively faster.
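The buffering amounts to something like the following, sketched in Python rather than the Rust on the device. BlockBuffer and write_block are names invented for this example; write_block stands in for whatever hands a chunk to the OTA writer.

```python
ERASE_BLOCK = 4096  # erase block size mentioned above


class BlockBuffer:
    """Collect arbitrary-sized packets and pass on whole erase blocks."""

    def __init__(self, write_block):
        self._write_block = write_block
        self._pending = bytearray()

    def feed(self, data):
        self._pending += data
        # Only hand over data once a full block has accumulated.
        while len(self._pending) >= ERASE_BLOCK:
            self._write_block(bytes(self._pending[:ERASE_BLOCK]))
            del self._pending[:ERASE_BLOCK]

    def finish(self):
        # The final, possibly short, block goes out at the end.
        if self._pending:
            self._write_block(bytes(self._pending))
            self._pending.clear()


written = []
buf = BlockBuffer(written.append)
image = bytes(10000)
for off in range(0, len(image), 1187):  # packets of 1187 bytes, as above
    buf.feed(image[off:off + 1187])
buf.finish()
print([len(b) for b in written])
```

Each block now reaches the flash layer exactly once, so it gets erased once instead of once per packet that happens to land in it.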
The other feature this week was authentication: the device should only
accept legitimate (read: cryptographically signed) firmware. The
constraint here is that we can't read the whole firmware into RAM
before verifying the signature - there's 384K of RAM and 4MB of flash,
it's not going to fit. Our self-imposed additional constraint is that
we don't want to write the firmware to flash as we go along and then
validate the signature at the end, because if it's wrong then we've
already overwritten a working firmware with malicious data. I say this
is self-imposed because it's not actually obvious that it's a real
problem unless there's some way for the bad actor to switch to the
new firmware, but it doesn't seem pretty even so.
So tl;dr we need to verify the firmware before we've read it all, and
before we've written any of it. We do this using the approach
explained by Gennaro and Rohatgi in How to Sign Digital
Streams -
the first block is digitally signed and each block contains a sha256
of the following block.
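A minimal sketch of the chaining, in Python with hashlib. Two things here are my assumptions and not necessarily what eculocate does: each block is laid out as payload followed by the 32-byte digest of the next block, and the last block carries 32 zero bytes so that every block has the same shape. The signature check is stood in for by a trusted digest of the first block, since the standard library has no ed25519.

```python
import hashlib

DIGEST_LEN = 32


def chain_blocks(chunks):
    """Build blocks back to front; return (blocks, digest_of_first_block)."""
    blocks = []
    next_digest = bytes(DIGEST_LEN)  # assumed filler for the last block
    for chunk in reversed(chunks):
        block = chunk + next_digest
        blocks.append(block)
        next_digest = hashlib.sha256(block).digest()
    blocks.reverse()
    return blocks, next_digest


def verify_stream(blocks, trusted_first_digest):
    """Yield each payload only after its block has been verified."""
    expected = trusted_first_digest
    for i, block in enumerate(blocks):
        if hashlib.sha256(block).digest() != expected:
            raise ValueError(f"block {i} failed verification")
        yield block[:-DIGEST_LEN]
        expected = block[-DIGEST_LEN:]  # digest the next block must match


chunks = [b"aaa", b"bbb", b"ccc"]
blocks, first_digest = chain_blocks(chunks)
recovered = list(verify_stream(blocks, first_digest))
```

Because each block is checked before its payload is released, the receiver never needs more than one block in RAM.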
We use
ed25519-dalek
for signature checking, because it looked easy to get running with no_std and I've
heard of at least two of the authors. For SHA256 - it turns out that the
ESP32-C3 has a hardware SHA accelerator, so we use esp_hal::sha here and don't need to do it in software.
I bashed my head against this for a while because I didn't read the F
manual closely enough. hasher.update doesn't eat all the bytes that
you feed it, and it returns the unconsumed data - so you have to run
it in a loop until it comes back empty. If you just call it once, as I
did, you get the hash of the first 32 bytes only.
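The shape of the fix, as a toy model in Python. PartialHasher is invented for illustration and is not the esp_hal API; the 32-byte limit per call is just chosen to mirror what I saw.

```python
import hashlib


class PartialHasher:
    """Toy hasher that consumes at most `limit` bytes per update call."""

    def __init__(self, limit=32):
        self._inner = hashlib.sha256()
        self._limit = limit

    def update(self, data):
        self._inner.update(data[:self._limit])
        return data[self._limit:]  # whatever was not consumed

    def digest(self):
        return self._inner.digest()


def hash_all(hasher, data):
    """Keep feeding the leftovers back in until nothing remains."""
    remaining = data
    while remaining:
        remaining = hasher.update(remaining)
    return hasher.digest()


data = bytes(range(100))
looped = hash_all(PartialHasher(), data)

once = PartialHasher()
once.update(data)  # the mistake: the return value is ignored
single_call = once.digest()
```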
(In the end, this - it seems to me - doesn't protect a lot better
against wiping good images than the verify-at-the-end approach. If
a MITM is able to substitute block 22 of 150 blocks with their own
data, we will write blocks 1-21 and then abort, whereas
verifying-at-the-end means we'll write blocks 1-150 but not switch to
the new image. Either way we've trashed whatever was there before)
That's basically all there is to relate this week, although I also
invested a little more time reading bits of the Rust
Book
that our Rust study group at $work hasn't reached yet, in order to
make the error handling less crappy. Next steps are to write the
session registration thing so that UDP is authenticated, and to add
wifi provisioning so we're not hardcoding my wifi network details.
What's missing from this motorbike? The answer is shocking.
I removed the shock from my motorbike today so I can take it to
ABE tomorrow to be
rebuilt. Some notes for posterity and so that I remember how to
reinstall it.
I mostly followed the Haynes manual: the words are good but the
pictures are awful. They say to remove the fuel tank, but I didn't
really want to, on account of how it's full of fuel. I found it worked
to lift the tank a bit and stick some wooden blocks underneath to hold
it up. While doing this the vent/breather pipe popped off, as it
always does.
In the order that I tackled them:
- the reservoir needs to be removed: use a JIS screwdriver to loosen
  the strap around it and then just slide it out.
- to get to the nut/bolt at the lower end I had to loosen the rear
  hugger - two screws removed. I used a socket on the nut and a
  spanner on the bolt to stop it from spinning while I turned the nut.
- I can't see any way to get a socket onto the nut/bolt at the top
  end, but it eventually succumbed to two spanners. Reassembling this
  is going to be "fun" if it's finicky about torque settings.
- then it's "just" a matter of untangling everything to remove the
  shock and its reservoir. I had to unplug the connector for the
  stator cable, as there was no way to get the reservoir through the
  tangle otherwise.
Hopefully having now written this down I'll not forget to reattach all
the bits.
I'll tell you a joke about UDP, but you might not get it.
We have a new name. "Thing I can plug into my motorbike ECU to log the
data (rpm, speed, throttle position, temperatures etc etc) it
produces" is Leonard-of-Quirm-level naming. I'd provisionally been
calling it "eculogical" which I didn't like, and now it's called
"eculocate" which I ... can tolerate.
And I've got it to the point where it (kind of) works - but, now I've
decided I need to semi-fundamentally break it again. I'll get to
that.
On the server side we have a UDP socket that listens for subscription messages
containing [(interval, table-number, start, end), ...] (actually
binary encoded) and then sends back the requested table data once
every interval milliseconds for the next minute. Then it stops,
because this is UDP and we can't reliably tell when the peer has gone
away, so the peer should send another subscription message in the
meantime if it wants to carry on receiving.
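To make that concrete, here is a sketch of what such a message could look like on the wire. The field widths and byte order are invented for this example (the real encoding isn't described here), as are the function names.

```python
import struct

# Hypothetical layout per entry, little-endian:
#   interval in ms (u16), table number (u8), start (u8), end (u8)
ENTRY = struct.Struct("<HBBB")
SUBSCRIPTION_LIFETIME_S = 60.0  # "for the next minute"


def encode_subscription(entries):
    """Pack [(interval, table, start, end), ...] into bytes."""
    return b"".join(ENTRY.pack(*entry) for entry in entries)


def decode_subscription(payload):
    """Unpack bytes back into a list of 4-tuples."""
    if len(payload) % ENTRY.size:
        raise ValueError("truncated subscription message")
    return [ENTRY.unpack_from(payload, off)
            for off in range(0, len(payload), ENTRY.size)]


def still_active(subscribed_at, now):
    """True while the server should keep sending for this subscription."""
    return now - subscribed_at < SUBSCRIPTION_LIFETIME_S


entries = [(100, 8, 0, 32), (1000, 17, 4, 12)]
wire = encode_subscription(entries)
```

A client that wants an uninterrupted feed would resend the same bytes comfortably inside the minute.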
For now we're just offering the raw tables, because I'm going to need
much more example data to figure out the structure. Eventually we'll
do some processing on device so that clients can query "RPM" or "TPS"
without having to know their table/offset - as that varies between
bike models.
Notes:
- the embassy-net IP stack (actually smoltcp) requires you to
  statically declare how many sockets you're going to use,
  which is fine once you know you have to.
And we have an Android client. Well, it's Android insofar as it runs
on my phone, but I don't think it'd qualify for Android Market or the
Play Store or whatever it's called now. I sidestepped the
whole Android app development slog by installing Termux and
Termux:GUI on my phone and writing the client side as a Python script. I don't even like Python and I still found this preferable to
the Android Studio build/run process: I simply sshed into my phone and
used tramp to edit the script. I believe that Termux:GUI doesn't
support the full range of Android widgets but it has buttons and
labels and text boxes and LinearLayouts which is enough for me. Adding
dns-sd (zeroconf) support was the work of about 20 minutes, which was nice.
Having achieved that milestone I made a list of what's left before I
can plug it into my motorbike and take it for a ride (cable, power
supply, some form of protective casing) and realised that once I
detach it from its USB umbilical I will no longer be able to release
new versions simply by invoking cargo run. So, it needs a mechanism
for OTA updates, and this should probably come with some kind of
auth[nz] so that not just any Tom, Dick or Harry on the same wifi
network could flash random crap onto it. Then I considered that if
we're not trusting the wifi, the actual UDP service (which is
currently read-only but maybe some day might include a means of
writing to (and therefore probably bricking) the ecu) is also
sensitive.
Here's the plan:
- we'll make an EdDSA key pair at build time and embed the public key in the binary
- build tooling will sign the release artefact with the key
- a TCP socket will listen for OTA update requests and verify the signature before
  writing to the flash
- the TCP socket will also listen for session key registrations (signed in the same way)
  and remember them for x hours (or until the ignition is turned off and we lose power)
- the UDP listener will reject subscription requests unless they come with a
  valid session key
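The session-key half of that plan might look roughly like this. It's a Python sketch of the bookkeeping only: SessionStore is a made-up name, the signature check on registration is assumed to have happened already, and time is passed in explicitly so the example doesn't depend on a clock.

```python
import hmac


class SessionStore:
    """Remember registered session keys until they expire."""

    def __init__(self, lifetime_s):
        self._lifetime_s = lifetime_s
        self._sessions = {}  # key bytes -> expiry time

    def register(self, key, now):
        # Assumes the caller has already verified the signed registration.
        self._sessions[key] = now + self._lifetime_s

    def is_valid(self, key, now):
        for known, expiry in list(self._sessions.items()):
            if expiry <= now:
                del self._sessions[known]  # forget expired keys
            elif hmac.compare_digest(known, key):
                return True
        return False


store = SessionStore(lifetime_s=3600)
store.register(b"k" * 16, now=0)
accepted = store.is_valid(b"k" * 16, now=10)
rejected = store.is_valid(b"x" * 16, now=10)
expired = store.is_valid(b"k" * 16, now=3600)
```

Losing power empties the store for free, which matches the "until the ignition is turned off" behaviour.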
Additionally, we need to change the dns-sd stuff to advertise a TCP
service, the client to register a session key when it starts, the
subscription message format to include the session key, and the UDP
listener to check it. Which is what I meant when I said
"semi-fundamentally break it".
If this were commercial/proprietary software then we'd have separate
keys for the firmware signing and for the client. That seems less of
an issue when it's most likely the same person building the software as
is using the client, but it might be worth doing anyway.
Current status: bodged together a TCP listener, haven't touched on
crypto yet, and so far it only pretends to do the OTA update.