Allow to specify custom tags to decide the assignment of servers#30
Conversation
|
You should add some integration test cases. |
Yes, i’m doing it. Sorry not to add it firstly. |
|
Em...If we use HDFS to store data, the shuffle server will not store data, |
| // it will incur humongous allocation, so we set it to 14m. | ||
| public static String RSS_CLIENT_READ_BUFFER_SIZE_DEFAULT_VALUE = "14m"; | ||
| // The tags specified by rss client to determine shuffle data placement. | ||
| public static String RSS_CLIENT_DATA_PLACEMENT_TAGS = "rss.client.data.placement.tags"; |
There was a problem hiding this comment.
Could we give it a better name ?
The reason is as below:
#30 (comment)
There was a problem hiding this comment.
rss.client.shuffle-server.assignment.tags ?
There was a problem hiding this comment.
rss.client.assignment.tags will be better.
| public static String RSS_CLIENT_READ_BUFFER_SIZE_DEFAULT_VALUE = "14m"; | ||
| // The tags specified by rss client to determine shuffle data placement. | ||
| public static String RSS_CLIENT_DATA_PLACEMENT_TAGS = "rss.client.data.placement.tags"; | ||
| public static String RSS_CLIENT_DATA_PLACEMENT_TAGS_DEFAULT_VALUES = Constants.SHUFFLE_SERVER_VERSION; |
There was a problem hiding this comment.
A shuffle server can have multiple tags. The version tag shouldn't be overrided the configurable tags.
There was a problem hiding this comment.
Ok,I will remove the default value of this conf
There was a problem hiding this comment.
Actually, our client tags should contain version tag in any time.
Codecov Report
@@ Coverage Diff @@
## master #30 +/- ##
============================================
+ Coverage 56.83% 56.87% +0.03%
- Complexity 1204 1207 +3
============================================
Files 152 152
Lines 8401 8429 +28
Branches 813 816 +3
============================================
+ Hits 4775 4794 +19
- Misses 3368 3376 +8
- Partials 258 259 +1
Continue to review full report at Codecov.
|
roryqi
left a comment
There was a problem hiding this comment.
LGTM, thanks for your contribution.
|
Would you like to add some docs about this feature in another pr? Because this feature brings some user-facing changes, we need some docs. |

What changes were proposed in this pull request?
Allow to specify custom tags to decide data placement.
Changelog
rss.server.tagsfor shuffle serverrss.client.assignment.tagsfor client to choose the data placementWhy are the changes needed?
Sometimes, we hope the specified spark/mr job's shuffle data can be stored the specified group of shuffle servers, maybe due to DC location and so on.
Does this PR introduce any user-facing change?
Yes.
How was this patch tested?
UTs.