Skip to content

Performance Data Propagation#7

Merged
0xFA11 merged 16 commits intomasterfrom
rfc/performance-data
Feb 16, 2021
Merged

Performance Data Propagation#7
0xFA11 merged 16 commits intomasterfrom
rfc/performance-data

Conversation

@kvassall-unity
Copy link
Copy Markdown
Contributor

@kvassall-unity kvassall-unity commented Jan 29, 2021

@kvassall-unity kvassall-unity changed the title Initial commit for RFC on performance data Performance data propagation Jan 29, 2021
Comment thread text/0003-performance-data.md Outdated
Comment thread text/0003-performance-data.md Outdated
Comment thread text/0003-performance-data.md Outdated
Comment thread text/0003-performance-data.md Outdated
Comment thread text/0003-performance-data.md Outdated
@kvassall-unity kvassall-unity requested a review from 0xFA11 February 4, 2021 17:29
Comment thread text/0003-performance-data.md Outdated
Comment thread text/0003-performance-data.md Outdated
Comment thread text/0003-performance-data.md Outdated
Comment thread text/0003-performance-data.md Outdated
@wackoisgod

This comment has been minimized.

@kvassall-unity

This comment has been minimized.

@wackoisgod

This comment has been minimized.

@kvassall-unity

This comment has been minimized.

@becksebenius-unity

This comment has been minimized.

@becksebenius-unity

This comment has been minimized.

@wackoisgod

This comment has been minimized.

@kvassall-unity

This comment has been minimized.

@wackoisgod

This comment has been minimized.

Comment thread text/0003-performance-data.md Outdated
}

public interface ITransportProfilerData {
Dictionary<string, int> GetTransportGetData();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an implementation detail and IMO doesn't block this RFC, but we should pass in the dictionary rather than having it return it so that we can re-use the dictionary between network ticks and avoid memory allocs

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see below that the unet transport example does cache its own dictionary. this approach could work as well, although I think this adds complexity because the internal dictionary is used externally and the transport doesn't know when it is safe to clear the dictionary for the next network tick

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes the transport is in control of its own memory block and deciding what goes into it. It will cache regardless.
And yes i'm willing to refactor here in an actual PR here for performance gains. Passing in a dictionary to the transport might be useful in terms of clarity even though it will still be caching locally.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to add a similar implementation-detail question here — it might be too constraining to map string to int but not string to object. Since performance won't be a big concern here (throwing boxing cost out of the window), it might be useful (and more future-proof) to map string to object and allow passing other types such as float etc through this proposed design.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only somewhat agree. But for something that is typically a float (like timestamp) we can just represent with milliseconds. And outside of that i failed to come up with an example that required floats or a vector (which you can just split it up into three ints)

Having a contrarian example (which i don't have) would help me take a stronger position one way or the other

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm still 51% on stringint mapping side but wanted to call out just in case. I'll try to convince both myself and you as well if I could find better examples.

@0xFA11 0xFA11 changed the title Performance data propagation Performance Data Propagation Feb 11, 2021
{
class ProfilerConstants
{
public const string NumberOfServerRPCs = "numberOfServerRPCs";
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recently went through an exercise to clean up the channel stuff that might be of use here. Check out Transport.cs circa line 9 / the 'Channel' definition. You can define each measurement as an enum, then if you need to output a string you can just call ToString() on it. Then you don't have to maintain the strings / pay for the string de-ref.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From another comment by @becksebenius-unity :

"Suggestion for the dictionary to use string keys instead of a predefined enum. unintuitively, it's actually better for performance (enums as dictionary keys actually get boxed 🤢). and if we're going to go with a flexible approach, lets maximize that instead of restricting them to predefined keys, that way users building their own tools for their own custom transport could extend. we can still provide predefined keys for the user in a constants class so that we can guarantee good data for our profiler."

Copy link
Copy Markdown
Contributor

@0xFA11 0xFA11 Feb 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think he's not suggesting to change Dictionary<string, int> API but to define an enum for "our" ProfilerConstants to be an enum and do ProfilerConstants.NumberOfServerRPCs.ToString() instead of raw string constants.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah i see. I can update to that. in the actual PR

Copy link
Copy Markdown
Contributor

@mattwalsh-unity mattwalsh-unity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment


# Prior art
[prior-art]: #prior-art

Copy link
Copy Markdown
Contributor

@mattwalsh-unity mattwalsh-unity Feb 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is already a stat gathering feature added to MLAPI that might be extendable / have things we can steal (see ProfilerStat / ProfilerStatManager). It does a moving average. In particular I had to put in some smarts in the 'Record' method to handle when tons of samples are taken in a short period of time.

Though admittedly this samples / rate computation would live more naturally in 3rd party stuff, except that displaying this to users is so fundamental to letting them get started reverses my feeling to think we probably should keep including it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That code will still be used. This should feed into this for the Profiler among other things.


Transports will need to output the number of bytes being sent in the current tick and the latency. The data will be written out to a `ProfilerTickData` chunk. This data will be generic and transport agnostic.

MLAPI will need to output the number of networked objects and the amount of RPCs being sent. This data will be added to the `ProfilerTickData` chunk.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As in the # of networked objects that are currently being managed? Subtle point, would the AOI-ness matter? Like would we count objects that are outside AOI for a client?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thats an example. but ultimately it would be up to how that data is captured by MLAPI and isn't dependent on whether or not we have this system.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I interpreted this more as, how many objects sent state updates in this network tick. Both might be helpful though

# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

In order to limit the interface, we are pushing the collected set or performance data to be obtainable only at the end of a tick. The following is roughly what this object could look like. Currently this object will be updated in `NetworkManager.LateUpdate()` for the MLAPI and `Transport.Send()` and other relevant transport method for the transport layer. At the end of the tick the user will be notified that the relevant data is complete and ready (both the MLAPI data and Transport data). Since MLAPI is responsible for collecting the data, and the data from transports will be generic and apply to all/most transports, MLAPI as the owner of the collection of performance data will call `GetTransportGetData()` on the transports through an interface in order to get the data exactly when it needs it in the pipeline before propagating the data out externally. Whether or not the data is collected per transport can also be toggled by a runtime flag `profilerEnabled`.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this behavior vary whether we're a client or server?

And if we're a server, is this data aggregated for all connected clients or separated out per client? For example, with AOI the amount of data replicated will vary by client connection. Would you want to be able to see that? Or is it enough to lump it into one pile for all connections/clients?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is dependent on what MLAPI wants to output. Currently i'll match 1-to-1 with existing Profiling which is all connections and server based.

void LateUpdate()
{
// ...
ProfilerTickData profilerData = new ProfilerTickData();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably should use a pool, (impl I know)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will likely use a static pool where the receiver (third party) can't keep a reference or the reference is readonly

@LukeStampfli LukeStampfli added the open RFC is open for review and comments label Feb 15, 2021
@0xFA11 0xFA11 merged commit fcd900e into master Feb 16, 2021
@0xFA11 0xFA11 deleted the rfc/performance-data branch March 9, 2021 17:10
@LukeStampfli LukeStampfli added accepted RFC has been accepted and removed open RFC is open for review and comments labels May 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

accepted RFC has been accepted

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants