Parameters
Many parts of the extractor, even though they are well documented, lack an explanation of why the code was written that way. Decisions on how to structure code, which clients to use first, and so on are often based on these parameters:
- network usage: smaller network requests are preferred over bigger ones, if the data received is the same
- performance: the fewer and smaller the network requests, the better, in order to save time, even if this sometimes means not having the full data
- stream availability: for example, on YouTube different clients provide different streams, so requests are made to query various clients and obtain as many streams as possible, but in the correct order
- code and computational complexity: for example, on YouTube, clients that provide streams without signatures are preferred
- certainty that a request will succeed: sometimes we may try multiple ways of fetching data so that things do not break if something changes on the services' side
- and more
Wiki proposal
The motivations for the decisions are sometimes put in comments near the code (e.g. see the code below), but sometimes not even that. This means that if I were to edit the code, I would probably forget about the decisions that were made and structure the code in a non-optimal way, even if somebody had already done the research needed to find the optimal way.
I propose adding to NewPipe's wiki a section, to be kept up to date, that explains all the decisions being made, e.g. how the parameters above were taken into consideration, or an analysis of the consequences of a specific solution.
Example
An example of why this would benefit us is the following code:
    // Use the androidStreamingData object first because there is no n param and no
    // signatureCiphers in streaming URLs of the Android client
    streamingDataAndCpnLoopList.add(new Pair<>(androidStreamingData, androidCpn));
    streamingDataAndCpnLoopList.add(new Pair<>(html5StreamingData, html5Cpn));
    // Use the iosStreamingData object in the last position because most of the available
    // streams can be extracted with the Android and web clients and also because the iOS
    // client is only enabled by default on livestreams
    streamingDataAndCpnLoopList.add(new Pair<>(iosStreamingData, iosCpn));
Do we really need to query three different clients? The pros are higher stream availability and more certainty that something will work. The cons are higher network usage and lower performance. Is this a good tradeoff? We would need some data in order to answer this question. Not just "I tried and it seemed better this way", but rather some kind of test that measures, e.g., the performance of the various approaches.
Another example is #864: the explanation for why the decision was made is provided in the PR description, but soon all of us except the author of that code (i.e. @AudricV) will probably forget about it.
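To make "some kind of test" more concrete, here is a minimal sketch of a timing harness that could compare fetching strategies. Everything in it is an assumption for illustration: `ClientStrategyBenchmark`, `Result` and `measure` are hypothetical names that do not exist in NewPipeExtractor, and the two strategies in `main` are stand-ins returning placeholder stream counts, not real measurements. A real test would replace them with actual extraction calls.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;

/** Hypothetical harness; none of these names exist in NewPipeExtractor. */
public final class ClientStrategyBenchmark {

    /** Outcome of measuring one fetching strategy. */
    public static final class Result {
        public final String name;
        public final long medianNanos; // median wall-clock time of one run
        public final int streamCount;  // highest stream count seen in any run
        public final int failures;     // number of runs that threw an exception

        Result(final String name, final long medianNanos,
               final int streamCount, final int failures) {
            this.name = name;
            this.medianNanos = medianNanos;
            this.streamCount = streamCount;
            this.failures = failures;
        }
    }

    private ClientStrategyBenchmark() {
    }

    /**
     * Runs a strategy several times. The strategy is assumed to return the
     * number of streams it managed to obtain.
     */
    public static Result measure(final String name,
                                 final Callable<Integer> strategy,
                                 final int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        final List<Long> durations = new ArrayList<>();
        int maxStreams = 0;
        int failures = 0;
        for (int i = 0; i < runs; i++) {
            final long start = System.nanoTime();
            try {
                maxStreams = Math.max(maxStreams, strategy.call());
            } catch (final Exception e) {
                failures++; // a failed run still counts towards the timings
            }
            durations.add(System.nanoTime() - start);
        }
        Collections.sort(durations);
        // Upper median when the number of runs is even
        final long median = durations.get(durations.size() / 2);
        return new Result(name, median, maxStreams, failures);
    }

    public static void main(final String[] args) {
        // Placeholder strategies with made-up stream counts, NOT real data
        final Result one = measure("android only", () -> 12, 5);
        final Result three = measure("android + web + ios", () -> 20, 5);
        for (final Result r : List.of(one, three)) {
            System.out.printf("%s: median %d ns, %d streams, %d failures%n",
                    r.name, r.medianNanos, r.streamCount, r.failures);
        }
    }
}
```

Numbers collected this way (time, stream count, failure rate per strategy) are the kind of data that could be recorded in the wiki next to each decision, instead of "it seemed better this way".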
The code quoted in the example above is from NewPipeExtractor/extractor/src/main/java/org/schabi/newpipe/extractor/services/youtube/extractors/YoutubeStreamExtractor.java, lines 1174 to 1181 at commit 9f9af35.