What is the latest update on slowness of native_compute on MacOS https://github.com/coq/coq/issues/11178 ?

I would like to run benchmarks comparing it to our new compilation chains (CertiCoq and coq-malfunction), and CertiCoq also uses Dynlinking in some scenarios. But if it's an unfair comparison on MacOS we should do benchmarking on Linux only. Maybe @Paolo Giarrusso or @Michael Soegtrop have some recent experience with this?

I've always disabled native_compute at Coq compile-time since back then... IIRC the slowdown was never that catastrophic for me

Same for me (native compute disabled). Also I don't have that many applications for native compute - mostly I work with stuff that involves, let's call it, "managed partial computation". I would have to switch VST proof automation to fully reflective. Maybe I'll do this one day ...

But if I can help with testing ...

Maybe @Jason Gross or @Guillaume Melquiond have some experience with it? Or @Karl Palmskog knows about some coq-community projects using it?

Is there any way to get Coq's bench to run on Mac? The coq-neural-net-interp-computed-lite package is a decent bench of native_compute; the lines that use native_compute are ~2x faster in native than the vm. It's pretty heavy on primitive arrays, ints, and floats though.

are you looking for any project in Coq-community that uses `native_compute`? We have Coqtail-math that used to use it, but I had to replace it with `vm_compute` for some compatibility reason in the diff here. The diff shows how to enable `native_compute` again for benchmarking

for context, Coqtail-math used it for Miller-Rabin primality test, could probably be scaled up by adjusting parameters or similar

@Jason Gross interesting. We don't have prim arrays down the pipelines yet, though. We need a C version.

if I remember correctly, the SHA-256 Gallina formalization in VST had some examples that used `native_compute`, and the speedup was pretty staggering vs. `vm_compute`.

Fiat-Crypto also has a couple examples using `native_compute` here, which IIRC had a modest speedup over `vm_compute`.

Karl Palmskog said:

if I remember correctly, the SHA-256 Gallina formalization in VST had some examples that used `native_compute`, and the speedup was pretty staggering vs. `vm_compute`.

Yep, I have this benchmark already.

Wouldn't we be comparing apples and oranges there? AFAIK the certicoq pipeline does not implement accumulators and so cannot compute on open terms (or containing axioms). @Jason Gross are your examples closed computations?

@Jason Gross: what is the issue with Coq's bench on Mac? I never tried it, but usually getting stuff that runs on Linux to run on MacOS is not much more than changing some arguments to sed and similar (or using GNU sed).

Pierre-Marie Pédrot said:

Wouldn't we be comparing apples and oranges there? AFAIK the certicoq pipeline does not implement accumulators and so cannot compute on open terms (or containing axioms). Jason Gross are your examples closed computations?

Indeed the examples have to be closed computations
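To make the accumulator point above concrete, here is a minimal sketch (the axiom `secret` is hypothetical, introduced only to create an open term): `native_compute` carries unknowns along symbolically as accumulators, whereas a pipeline that compiles to machine values has nothing to run for them.

```
(* Minimal sketch: native_compute on an open term.
   The axiom `secret` is hypothetical, just to make the term open. *)
Axiom secret : nat.

(* The closed component reduces fully; `secret` is carried along
   symbolically as an accumulator instead of blocking evaluation. *)
Eval native_compute in (2 + 2, secret + 1).

(* A backend without accumulators has no runtime value for `secret`,
   so it can only run closed computations. *)
```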

If the question is whether I am using `native_compute`, then yes, I use it routinely through CoqInterval. Here is a small bench that I was running no later than yesterday to compare `vm_compute` and `native_compute`. (Obviously, the goal is trivial, and that is the wrong way of using `interval` on purpose.)

```
From Coq Require Import Reals.
From Interval Require Import Tactic.
Open Scope R_scope.
(* native_compute *)
Goal forall x, 10 <= x <= 11 -> Rabs (exp x - exp x) <= 0x1p-6.
Proof. intros x Hx. Time interval with (i_bisect x, i_depth 30, i_native_compute). Qed.
(* Finished transaction in 38.025 secs (37.778u,0.07s) *)
(* vm_compute *)
Goal forall x, 10 <= x <= 11 -> Rabs (exp x - exp x) <= 0x1p-6.
Proof. intros x Hx. Time interval with (i_bisect x, i_depth 30). Qed.
(* Finished transaction in 159.933 secs (159.981u,0.031s) *)
```

Thanks @Guillaume Melquiond, I think that will be useful and easy to try :)

I still find it amazing that coq-interval with vm_compute can do 2^30 computations of `Rabs (exp x - exp x) <= 0x1p-6` in 160 s. On a 4 GHz machine that is about 640 cycles per computation, or roughly 300 cycles per interval computation of exp x - I guess to 60 bits of precision. For the VM of a proof assistant, this is not bad at all.

I don't think it is that high. I think it is closer to 2^24 interval computations of `exp x`, so not so great. But the reason I ran this benchmark is that I wanted to evaluate the implementation of the exponential that will be shipped in the next release of CoqInterval. The new implementation runs the same benchmark in under 8 seconds with `vm_compute` (i.e., 20x faster than the current implementation). Therefore, assuming my estimate of 2^24 computations is in the correct ballpark, that gives about 1900 cycles per interval computation with the VM, which I find quite satisfying. (And in case you are wondering, only exp will be optimized in the next release. It will take me some more time to optimize the other functions in the same way.)

I would have thought that maybe smtcoq used vm_compute or native_compute to check certificates as well but I can't find a single reference to them in the smtcoq repo, @Chantal Keller am I misguided here?

SMTCoq uses vm_compute: see e.g. occurrences of `vm_` in https://github.com/smtcoq/smtcoq/blob/coq-8.13/src/trace/smtCommands.ml, which refers to `Tactics.vm_cast_no_check`

has anyone tried switching to `native_compute` on fourcolor?

Ah thanks @Chantal Keller !

@Chantal Keller I have two additional questions: is certificate checking sometimes a bottleneck? Is reification a bottleneck? Is it done in OCaml/Ltac?

Certificate checking is most often much faster than producing the certificate with the SMT solver, so it is not a bottleneck

Reification is done in OCaml and I never measured the time spent, as it has always been negligible

Karl Palmskog said:

has anyone tried switching to `native_compute` on fourcolor?

See Section 4 of https://inria.hal.science/hal-00650940

Pierre-Marie Pédrot said:

Wouldn't we be comparing apples and oranges there? AFAIK the certicoq pipeline does not implement accumulators and so cannot compute on open terms (or containing axioms). Jason Gross are your examples closed computations?

Yes, the nnci ones produce concrete arrays of floats, and the Fiat Crypto ones produce concrete lists of Z (or similar)

Michael Soegtrop said:

Jason Gross : what is the issue with Coq's bench on Mac? I never tried it, but usually getting stuff which runs on Linux to run on MacOS is not much more than changing some arguments to sed and similar (or use gnu sed).

AFAIK there's no way to say `@coqbot bench on mac`. Nothing more than that

Last updated: Oct 13 2024 at 01:02 UTC