Conversation
src/decode.rs
Outdated
|
|
||
| impl<'a> Decoder<'a> { | ||
| /// Returns a new decoder which will deserialise from `read_from` and also an `Index` informing | ||
| /// the consumer of the expected contents. |
There was a problem hiding this comment.
I wonder if this docstring should mention that the ABI version number is checked.
(Sorry, only thought of this after raising the PR).
There was a problem hiding this comment.
Presumably that's not the only check that, conceptually, it might one day do?
There was a problem hiding this comment.
It's not impossible that we might check something else at some point, but I can't think of anything off the top of my head.
What are you suggesting? That it's not worth mentioning?
There was a problem hiding this comment.
Yeah, I doubt it's worth mentioning.
|
"Meta-data" is, to me, a meaningless term. I'm happy for |
src/decode.rs
Outdated
| impl<'a> Decoder<'a> { | ||
| /// Returns a new decoder which will deserialise from `read_from` and also an `Index` informing | ||
| /// the consumer of the expected contents. | ||
| pub fn new(read_from: &'a mut dyn Read) -> Result<(Index, Self), decode::Error> { |
There was a problem hiding this comment.
This feels like it's more of a from_ method than a new method, because I can vaguely imagine other things than a file that we want to deserialise (e.g. a string).
src/decode.rs
Outdated
|
|
||
| let index = Index::deserialize(&mut deser)?; | ||
| Ok(( | ||
| index.clone(), |
There was a problem hiding this comment.
Why are we cloning the index as part of a tuple when we include it in the struct too?
There was a problem hiding this comment.
The one in the struct is used to keep track of what the deserialiser has already given out, and is mutated each time the user is given (in this case) a MIR.
This will become more important later where the deserialiser will likely give out other kinds of data than just MIR. In that case, the data would have to be read out by the consumer in the correct order.
I considered using a state machine to instead of a mutable copy of the index, but decided I was trying to be too clever.
I also experimented with having a deserialiser which acts only as an interface to hand out iterators, so that the order the user requests the data is irrelevant. The problem there is that we can only have one mutable reference to the underlying T: Read and once we give it to serde, it's gone forever. This means we can't seek the underlying stream for subsequent iterator requests.
I hope that made sense, it's hard to explain.
src/lib.rs
Outdated
| /// YkMir -- Metadata serialiser and deserialiser for Yorick. | ||
| /// | ||
| /// This crate allows ykrustc to serialise the compiler's MIR structures for later deserialisation | ||
| /// by the Yorick runtime. The encoded data is specialised to a particular Rust input program (this |
There was a problem hiding this comment.
What does "specialised to a particular Rust input" mean?
There was a problem hiding this comment.
OK, I only have a vague understand of this at the current time, but as I understand it:
The metadata encoded into rlibs and dynlibs by Rustc is "generic" so that things like monomorphisations, type resolution and inlining can happen for arbitrary inputs (Rust programs) in later compilation sessions.
When we read our metadata at runtime, we want the data for the current binary. We don't care about the general case. We have two options:
- Spin up the compiler to resolve rusts metadata to our binary (not even sure how possible this is).
- Resolve everything at compile time and serialise that.
I'm proposing the latter.
There was a problem hiding this comment.
We could just drop the paragraph about specialisation until we understand it better.
I admit, there's a liberal amount of hand waving happening here.
There was a problem hiding this comment.
I think it might be better to drop it if we don't understand it :) That way we leave ourselves a bit of flexibility in the future if we haven't quite got things right / fully understood them.
src/lib.rs
Outdated
| /// | ||
| /// The MIR data is serialised to msgpack data in the following form: | ||
| /// | ||
| /// version -- The ABI version number of the serialised data. |
There was a problem hiding this comment.
Ah hah! I wasn't sure myself.
The version is just a number we bump when shape of the data structures serialised within. For example, when I add statement support I would bump the number, as struct Mir will get bigger and we don't want an earlier deserialiser to try and decode it.
What would be the correct term? "serialisation version"?
There was a problem hiding this comment.
"Serialisation version" works for me.
I'm not sure it's meaningless. Usually it means data which is secondary to the main data. In this scenario, the MIR is secondary to the compiled code. I suggested "meta-data" to be in line with what upstream call their serialisations. If you have another suggestion I don't mind trying it, as long as we describe in the crate docstring the relation to upstream "meta-data". |
|
Ah, if they call this stuff meta-data, fair enough. |
|
I think this is ready for re-review. The API is much simpler now. |
src/lib.rs
Outdated
|
|
||
| /// The version number of the data structures. | ||
| const FORMAT_VERSION: usize = 0; | ||
| const SER_VERSION: usize = 0; |
There was a problem hiding this comment.
I think this needs to be a fixed size integer, otherwise we might read in a file from another architecture and do the wrong thing. Which suggests we probably need to stick the architecture in here somewhere too...
There was a problem hiding this comment.
I think you are right on both counts. How about a u32 for the version, and the platform string?
There was a problem hiding this comment.
The platform string is variable size, which might be a bit annoying?
There was a problem hiding this comment.
I don't think the serialiser cares, but let's try it.
Failing that we can use a manually specified enum.
There was a problem hiding this comment.
Also (ho ho ho!) does the ser_version change alongside the VM version? In other words, we might keep the same serialisation format, but some other detail changes and then boom! So we probably need some sort of YK version number in there too?
There was a problem hiding this comment.
I've just understood you comment about it being built in to the executable.
The problem is, I think, that there are two compile-times: the compiler, and the VM. We could get into a scenario where the compiler was built with one version of ykpack, and the VM was built using another. Right?
So, is it the semantic version of ykpack we actually care for here? Should that be what is encoded?
There was a problem hiding this comment.
At least at first, we're probably going to compile to a big binary blob, so I don't think we need to worry about this (in other words, that blob will carry with it the run-time compiler which, by definition, must be in sync with everything else in the blob). I think we can just drop the version checks entirely, at least as long as we compile to a single blob.
There was a problem hiding this comment.
Hrm. I'm not sure I follow. I thought the VM would depend upon ykrt (and thus ykpack) via regular Cargo dependencies.
So the compiler will have (somewhere):
[dependencies]
ykpack = "x.y"
And then the VM would (perhaps transitively) have a similar dependency line in a Cargo manifest.
I've not considered the "blob" approach, nor know how hard it is to achieve.
There was a problem hiding this comment.
Exactly. cargo does the hard work for us, and we use the same tricks grmtools does to include small binary blobs into our big binary blob. If I'm wrong, we can put this stuff back later, but let's just drop the version checks for now, and make our life simpler.
There was a problem hiding this comment.
OK, I'll have to trust you on this one.
It seems to me that the VM author has to be careful to use the same version of ykpack as ykrustc used or boom.
src/lib.rs
Outdated
| for md in &inputs { | ||
| enc.serialise(md.clone()).unwrap(); | ||
| } | ||
| // User forgot: enc.done() |
There was a problem hiding this comment.
Would this be better?
// The user forgot to finalise the serialisation with a call to `enc.done()`.
There was a problem hiding this comment.
You need to say something like "This test fails because the user forgot to add enc.done", otherwise the comment floats adrift in a universe of gnomisms.
There was a problem hiding this comment.
OK. I would have thought this was evident from #[should_panic(expected = "not marked done")].
There was a problem hiding this comment.
I know your default style of comments, so I got it, but I doubt many other people would get it TBH.
|
|
||
| # We are using a git version to work around a bug: | ||
| # https://github.com/3Hren/msgpack-rust/issues/183 | ||
| [dependencies.rmp-serde] |
There was a problem hiding this comment.
If you want you can do something like rmp-serde = { git = "...", rev = "..." }. I have no preferences about which is better.
There was a problem hiding this comment.
Yeah, I started with that, but it makes a very long time. I prefer this way.
src/decode.rs
Outdated
| } | ||
|
|
||
| // Iterate over meta-data. | ||
| pub fn iter(self) -> MetaDataIterator<'a> { |
There was a problem hiding this comment.
Is it intentional that iter consumes self?
There was a problem hiding this comment.
Yeah, but if you think it's important to be able to allow the user to iterate multiple times using the same decoder instance, i can try &mut self. It's not immediately clear if it's possible though... I think you'd need >1 mutable ref to allow that.
There was a problem hiding this comment.
It might be better off as-is then. It might be worth documenting this, something like "Note that iterating over this structure consumes it"?
There was a problem hiding this comment.
Let me try to confirm my suspicion regarding &muts and get back to you.
There was a problem hiding this comment.
I've tried a few different approaches. I don't think it's possible to allow multiple iterations over the same decoder.
In light of that, I do wonder if instead of having a decoder struct from which we can get an iterator, we could have a decoder which is an iterator. What would you prefer?
There was a problem hiding this comment.
Good question. If it can only be a one-off iterator, then making the thole thing a one-off iterator does seem like a good idea.
There was a problem hiding this comment.
Let's try it in a separate commit, and we can revert if we hate it.
src/encode.rs
Outdated
|
|
||
| /// Serialises a meta-data item. | ||
| pub fn serialise(&mut self, md: MetaData) -> Result<(), encode::Error> { | ||
| Some(md).serialize(&mut self.ser)?; |
There was a problem hiding this comment.
If serialize return () on success, you can probably simplify this method to just Some(md).serialize(...).
| /// Finalises the serialisation and writes a sentinal. | ||
| pub fn done(mut self) -> Result<(), encode::Error> { | ||
| None::<Option<MetaData>>.serialize(&mut self.ser)?; | ||
| self.done = true; |
There was a problem hiding this comment.
self.done is never read from AFAICS, so I'm not sure what it's for?
There was a problem hiding this comment.
It should be used in the destructor.
src/lib.rs
Outdated
| /// version -- The serialisation format version number. | ||
| /// This should be bumped every time the meta-data structure changes. | ||
| /// md_0: \ | ||
| /// ...⋮ - Meta-data items. |
There was a problem hiding this comment.
I don't think we need the "⋮" character here as well as "...".
There was a problem hiding this comment.
Hah! I tried to paste a ⋮ into vim but thought it didn't work! I guess liberation mono doesn't have that glyph.
src/lib.rs
Outdated
| /// sentinal -- End of meta-data marker. | ||
| /// ----------- | ||
| /// | ||
| /// Where each md_i is an instance of `Some(MetaData)` and the sentinel is a `None`. |
There was a problem hiding this comment.
Double space at the beginning of the line should be single space?
There was a problem hiding this comment.
"Sentinel" and "sentinal" -- which is it? ;)
There was a problem hiding this comment.
Oops. I think it's "sentinel".
| } | ||
|
|
||
| /// The top-level meta-data type. | ||
| #[derive(Serialize, Deserialize, PartialEq, Eq, Debug, Clone)] |
There was a problem hiding this comment.
Do we need all of these to be derived?
There was a problem hiding this comment.
- Serialize and Deserialize, certainly.
- Eq and PartialEq simplify tests, and may be useful later on too.
- Debug is useful.
- Clone is also used in tests, since the serialiser takes ownership of what's being serialised, and we need to compare the original input to what comes back from the round trip.
There was a problem hiding this comment.
Let's delete the others that we don't need then: it just leads to bloat (and also makes me wonder what it mean to compare these things for equality!).
There was a problem hiding this comment.
I think you are suggesting we should remove Eq and PartialEq. This is going to make testing quite awkward, as I'll have to do a big old restructuring with multiple comparisons.
E.g. instead of comparing two metadata's using an assert_eq!(expect_md, got_md) I'll have to have a match statement for each variant, then in the case of a MIR, destructure each basic block, and the statements within, manually comparing.
There was a problem hiding this comment.
You've got me thinking though. What does a derived Eq for a reference do? Compare addresses? We need to be careful, as that might not be what we want.
As it happens, we have no references in our md at the moment.
There was a problem hiding this comment.
All comparisons involve references: fn eq(&self, other: &Self) in essence. So that isn't an issue. [If you mean "pointers", though, then yes, I think the pointer is compared and not the thing its pointed to.]
There was a problem hiding this comment.
I see.
I'd like to think we are sane enough to not serialise a pointer :P
|
I'm wondering if instead of just "metadata" we want to say something like "crate metadata" or similar? |
|
|
||
| impl<'a> Drop for Encoder<'a> { | ||
| fn drop(&mut self) { | ||
| if !self.done { |
There was a problem hiding this comment.
^ Use of self.done is here.
Why limit ourselves to crates? Who knows what we might need later :P |
|
I'm still not super happy with "metadata" as a name, just because it's void of content. It must be metadata of something, so I'm wondering what that something is. |
src/decode.rs
Outdated
| } | ||
|
|
||
| /// A (fallible) meta-data iterator. | ||
| /// Reusing the iterator once is has yielded `None` leads to undefined behaviour. |
There was a problem hiding this comment.
What did you think of this? We could actually force the iterator to continually return None instead of shooting off the end of the cursor, but it would require one more flag in the struct.
There was a problem hiding this comment.
If we're not exposing this to the end-user, then I think this is OK.
[Just spotted a typo though. Should be:"Reusing the iterator once it has yielded None...".]
|
BTW, are we still good to rename this to |
|
For the record, we've agreed on |
|
This should address all comments. |
src/decode.rs
Outdated
| use std::io::Read; | ||
|
|
||
| /// The pack decoder. | ||
| /// Offfers a simple iterator interface to serialised packs. |
|
Try this. |
README.md
Outdated
| Pack encoder / decoder for Yorick. | ||
|
|
||
| This library allows ykrustc to serialise various compile-time information for | ||
| later reading at runtime. |
There was a problem hiding this comment.
For consistency, maybe "run-time"?
|
Please squash. |
|
Oops, I'm ahead of myself! We need to do the "run-time" thing first. Sorry! |
|
Squash now? |
|
Yes please. |
This change adds a streamer "pack" encoder and decoder. The idea is that compile-time info can be serialised by ykrustc at compile-time and deserialised at run-time. For example the MIR will need to be consulted at runtime for code generation. We will have ykrustc serialise the MIR into packs which are stored in the resulting ELF file ready to be used later at run-time.
|
squashed, and commit message fleshed out. |
|
Oops. |
Here's a stab at the yorick metadata serialiser and deserialiser.
It currently does only what our custom serialiser did and no more. As such, there are some
FIXMEs.I spent quite some time trying to be too clever with the type system and with iterators and made quite a mess with the borrow checker. After about four iterations, I settled on this simpler and (I hope) idiot proof API.
I have a branch which changes to make the compiler to use this, but it depends upon this being merged:
rust-lang/rust#58013
Even once that is merged, I think we will have to rename the in-compiler serde crate in our compiler fork.
I'm starting to think
ykmirwasn't a good name. How aboutykmd, since we will probably be encoding more than just mir. We will be encoding metadata in general. Thoughts?