LW - Proveably Safe Self Driving Cars by Davidmanheim

The Nonlinear Library: LessWrong

Konten disediakan oleh The Nonlinear Fund. Semua konten podcast termasuk episode, grafik, dan deskripsi podcast diunggah dan disediakan langsung oleh The Nonlinear Fund atau mitra platform podcast mereka. Jika Anda yakin seseorang menggunakan karya berhak cipta Anda tanpa izin, Anda dapat mengikuti proses yang diuraikan di sini https://id.player.fm/legal.

17d ago 11:40

MP3•Beranda episode

Fetch error

Hmmm there seems to be a problem fetching this series right now. Last successful fetch was on September 22, 2024 16:12 (10d ago)

What now? This series will be checked again in the next day. If you believe it should be working, please verify the publisher's feed link below is valid and includes actual episode links. You can contact support to request the feed be immediately fetched.

Link to original article
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Proveably Safe Self Driving Cars, published by Davidmanheim on September 15, 2024 on LessWrong.
I've seen a fair amount of skepticism about the "Provably Safe AI" paradigm, but I think detractors give it too little credit. I suspect this is largely because of
idea inoculation - people have heard an undeveloped or weak man version of the idea, for example, that we can use formal methods to state our goals and prove that an AI will do that, and have already dismissed it. (Not to pick on him at all, but see my
question for Scott Aaronson here.)
I will not argue that Guaranteed Safe AI solves AI safety generally, or that it could do so - I will leave that to others. Instead, I want to provide a concrete example of a near-term application, to respond to critics who say that proveability isn't useful because it can't be feasibly used in real world cases when it involves the physical world, and when it is embedded within messy human systems.
I am making far narrower claims than the general ones which have been debated, but at the very least I think it is useful to establish whether this is actually a point of disagreement. And finally, I will admit that the problem I'm describing would be adding proveability to
a largely solved problem, but it provides a concrete example for where the approach is viable.
A path to provably safe autonomous vehicles
To start, even critics agree that formal verification is possible, and is already used in practice in certain places. And given (formally specified) threat models in different narrow domains, there are ways to do threat and risk modeling and get different types of guarantees. For example, we already have proveably verifiable code for things like
microkernels, and that means we can prove that buffer overflows, arithmetic exceptions, and deadlocks are impossible, and have hard guarantees for worst case execution time. This is a basis for further applications - we want to start at the bottom and build on provably secure systems, and get additional guarantees beyond that point. If we plan to make autonomous cars that are provably safe, we would build
starting from that type of kernel, and then we "only" have all of the other safety issues to address.
Secondly, everyone seems to agree that provable safety in physical systems requires a model of the world, and given the limits of physics, the limits of our models, and so on, any such approach can only provide approximate guarantees, and proofs would be conditional on those models. For example, we aren't going to formally verify that Newtonian physics is correct, we're instead formally verifying that if Newtonian physics is correct, the car will not crash in some situation.
Proven Input Reliability
Given that, can we guarantee that a car has some low probability of crashing?
Again, we need to build from the bottom up. We can show that sensors have some specific failure rate, and use that to show a low probability of not identifying other cars, or humans - not in the direct formal verification sense, but instead with the types of guarantees typically used for hardware, with known failure rates, built in error detection, and redundancy.
I'm not going to talk about how to do that class of risk analysis, but (modulus adversarial attacks, which I'll mention later,) estimating engineering reliability is a solved problem - if we don't have other problems to deal with. But we do, because cars are complex and interact with the wider world - so the trick will be integrating those risk analysis guarantees that we can prove into larger systems, and finding ways to build broader guarantees on top of them.
But for the engineering reliability, we don't only have engineering proof. Work like
DARPA's VerifAI is "applying formal methods to perception and ML components." Building guarantees about perceptio...

1851 episode

#The Nonlinear Fund #Podcasting Education #Of TexttoSpeech