Bfs procedure call#1306
Conversation
There was a problem hiding this comment.
please remove comments
| // pdata->output = array_append(pdata->output, SI_ConstStringVal("parent")); | ||
| // pdata->output = array_append(pdata->output, SI_Node(NULL)); // Place holder. |
There was a problem hiding this comment.
please remove comments
| GraphContext *gc = QueryCtx_GetGraphCtx(); | ||
|
|
||
| // Setup context. | ||
| BFSContext *pdata = rm_malloc(sizeof(BFSContext)); |
There was a problem hiding this comment.
Can you move pdata initialization to a dedicated function?
| BFSContext *pdata = rm_malloc(sizeof(BFSContext)); | ||
| pdata->i = 0; | ||
| pdata->n = 0; | ||
| pdata->g = gc->g; | ||
| pdata->node = GE_NEW_NODE(); | ||
| pdata->parent = GE_NEW_NODE(); | ||
| pdata->output = array_new(SIValue, 6); | ||
| pdata->output = array_append(pdata->output, SI_ConstStringVal("node")); | ||
| pdata->output = array_append(pdata->output, SI_Node(NULL)); // Place holder. | ||
| pdata->output = array_append(pdata->output, SI_ConstStringVal("level")); | ||
| pdata->output = array_append(pdata->output, SI_LongVal(0)); // Place holder. | ||
| pdata->output = array_append(pdata->output, SI_ConstStringVal("path")); | ||
| pdata->output = array_append(pdata->output, SI_NullVal()); // Place holder. | ||
| ctx->privateData = pdata; |
There was a problem hiding this comment.
Move to a dedicated function
| GrB_Matrix R; | ||
| GrB_Matrix TR; | ||
| if(reltype == NULL) { | ||
| R = Graph_GetAdjacencyMatrix(gc->g); | ||
| TR = Graph_GetTransposedAdjacencyMatrix(gc->g); | ||
| } else { | ||
| Schema *s = GraphContext_GetSchema(gc, reltype, SCHEMA_EDGE); | ||
| if(!s) return PROCEDURE_OK; // Failed to find schema, first step will return NULL. | ||
| R = Graph_GetRelationMatrix(gc->g, s->id); | ||
| if(Config_MaintainTranspose()) TR = Graph_GetTransposedRelationMatrix(gc->g, s->id); | ||
| else TR = GrB_NULL; | ||
| } |
There was a problem hiding this comment.
This is shared with the simple BFS procedure. move to a shared file please
| assert(LAGraph_bfs_both(&V, &PI, R, TR, source_id, max_level, false) == GrB_SUCCESS); | ||
|
|
||
| // Remove all values with a level of 0, as they are not connected to the source. | ||
| GxB_Vector_select(V, GrB_NULL, GrB_NULL, GxB_NONZERO, V, GrB_NULL, GrB_NULL); |
There was a problem hiding this comment.
As this line appears both here and BFS, can you add this one to the algorithm itself?
| // Retrieve all tuples representing connected nodes and their levels. | ||
| GrB_Index *node_ids = rm_malloc(nvals * sizeof(GrB_Index)); | ||
| int64_t *node_levels = rm_malloc(nvals * sizeof(GrB_Index)); | ||
| GrB_Vector_extractTuples_INT64(node_ids, node_levels, &nvals, V); | ||
| pdata->node_ids = node_ids; | ||
| pdata->node_levels = node_levels; |
| // TODO this could be a unaryop | ||
| // q(i) = i+1 for all entries in q. |
There was a problem hiding this comment.
@jeffreylovitz is this your comment? because it is right!
There was a problem hiding this comment.
No this is Tim's comment, in his next GraphBLAS version there's a built in semiring which does that. as you can see the +1 operation is costly.
b6684ae to
05d2101
Compare
swilly22
left a comment
There was a problem hiding this comment.
I've yet to go over the entire PR but I feel like I've accumulated enough comments which are meaningful at this point. I think it would be best if we take this online.
| uint yield_count = array_len(yield_exps); | ||
| op->output = array_new(const char *, yield_count); | ||
| for(uint i = 0; i < yield_count; i ++) { | ||
| const char *yield = yield_exps[i]->operand.variadic.entity_alias; |
There was a problem hiding this comment.
I think that if the yield_exps array is just a collection of strings, we should move from AR_ExpNode ** to const char ** but can't we yield more complex expressions, e.g. CALL P() YIELD x + 6 ?
I see that we do support CALL P() YIELD x AS y which means 2 strings might be specified.
There was a problem hiding this comment.
This logic is all intended to handle the YIELD x AS y construction that you point out. I find it redundant and annoying; it is one of the things I would like to tackle if we refactor procedure logic more broadly.
| #define LAGRAPH_ERROR(message,info) \ | ||
| { \ | ||
| fprintf (stderr, "LAGraph error: %s\n[%d]\nFile: %s Line: %d\n", \ | ||
| message, info, __FILE__, __LINE__) ; \ | ||
| LAGRAPH_FREE_ALL ; \ | ||
| return (info) ; \ | ||
| } | ||
|
|
||
| #define LAGRAPH_MAX(x,y) (((x) > (y)) ? (x) : (y)) | ||
| #define LAGRAPH_MIN(x,y) (((x) < (y)) ? (x) : (y)) |
There was a problem hiding this comment.
This is probably going to collide with future LAGraph algorithm implementations, I suggest adding a simplified version of LAGraph.h
| (*v_output) = NULL ; | ||
| bool compute_tree = (pi_output != NULL) ; | ||
|
|
||
| #if defined ( GxB_SUITESPARSE_GRAPHBLAS ) \ |
There was a problem hiding this comment.
I wonder if we need this condition, as we know which version of GraphBLAS we're using.
| #endif | ||
|
|
||
| GrB_Index nrows, ncols, nvalA, ignore, nvals ; | ||
| if(A == NULL) { |
There was a problem hiding this comment.
We're guarantee to always provide at-least A, in most cases we'll probably provide both A and AT
As such I wonder if we should remove this condition.
| #ifdef GxB_SUITESPARSE_GRAPHBLAS | ||
|
|
||
| // The CSR vs CSC status can be tested in SuiteSparse:GraphBLAS. | ||
| // However, even with SuiteSparse:GraphBLAS, this step is optional. | ||
| GxB_Format_Value A_format = -1, AT_format = -1 ; | ||
| bool A_csr = true, AT_csr = true ; | ||
| if(A != NULL) { | ||
| // A_csr is true if accessing A(i,:) is fast | ||
| GxB_get(A, GxB_FORMAT, &A_format) ; | ||
| A_csr = (A_format == GxB_BY_ROW) ; | ||
| } | ||
| if(AT != NULL) { | ||
| // AT_csr is true if accessing AT(i,:) is fast | ||
| GxB_get(AT, GxB_FORMAT, &AT_format) ; | ||
| AT_csr = (AT_format == GxB_BY_ROW) ; | ||
| } | ||
| // Assume CSR if A(i,:) and AT(i,:) are both fast. If csr is false, | ||
| // then the algorithm below will reverse the use of vxm and mxv. | ||
| if(push_pull) { | ||
| // both A and AT are provided. Require they have the same format. | ||
| // Either both A(i,:) and AT(i,:) are efficient to accesss, or both | ||
| // A(:,j) and AT(:,j) are efficient to access. | ||
| if(A_csr != AT_csr) { | ||
| LAGRAPH_ERROR("A and AT must in the same format:\n" | ||
| "both GxB_BY_ROW, or both GxB_BY_COL", | ||
| GrB_INVALID_VALUE) ; | ||
| } | ||
| } else { | ||
| // only A or AT are provided. Refuse to do the pull-only version. | ||
| if(A != NULL && A_format == GxB_BY_COL) { | ||
| // this would result in a pull-only BFS ... exceedingly slow | ||
| LAGRAPH_ERROR( | ||
| "SuiteSparse: AT not provided, so A must be GxB_BY_ROW\n" | ||
| "(or provide both A and AT, both in the same format,\n" | ||
| "either both GxB_BY_COL or both GxB_BY_ROW)", | ||
| GrB_INVALID_VALUE) ; | ||
| } | ||
| if(AT != NULL && AT_format == GxB_BY_ROW) { | ||
| // this would result in a pull-only BFS ... exceedingly slow | ||
| LAGRAPH_ERROR( | ||
| "SuiteSparse: A not provided, so AT must be GxB_BY_COL\n" | ||
| "(or provide both A and AT, both in the same format,\n" | ||
| "either both GxB_BY_COL or both GxB_BY_ROW)", | ||
| GrB_INVALID_VALUE) ; | ||
| } | ||
| } | ||
|
|
||
| #endif |
There was a problem hiding this comment.
Might want to consider removing this section, as we know we'll be passing A in CSR format and in most cases we'll probably pass AT in CST as-well.
|
|
||
| // Retrieve all tuples representing connected nodes. | ||
| GrB_Index *node_ids = rm_malloc(nvals * sizeof(GrB_Index)); | ||
| GrB_Vector_extractTuples_INT64(node_ids, GrB_NULL, &nvals, V); |
There was a problem hiding this comment.
If we don't care about at which level each node was reached, we might modify the underline algorithm accordingly and save some time.
There was a problem hiding this comment.
Agreed. I had levels as an optional output at one point, but deleted it because it didn't seem to add much utility.
| // Append each reachable node to the nodes output array. | ||
| Node n = GE_NEW_NODE(); | ||
| Graph_GetNode(pdata->g, node_id, &n); | ||
| SIArray_Append(&pdata->output[pdata->nodes_output_idx], SI_Node(&n)); |
There was a problem hiding this comment.
Wouldn't multiple iterations of loop at line 110 would cause an overwrite of n ?
as Node n is on the stack and we're passing a pointer to it as a new entry of pdata->output
There was a problem hiding this comment.
SIArray_Append clones its input value, so each iteration of this loop is incapable of overwriting the values that have actually been placed in the array.
| Edge e = GE_NEW_EDGE(parent_id, node_id); | ||
| Graph_GetEdge(pdata->g, id, &e); | ||
| // Append the edge to the edges output array. | ||
| SIArray_Append(&pdata->output[pdata->edges_output_idx], SI_Edge(&e)); |
There was a problem hiding this comment.
Similar connect as for the node case.
There was a problem hiding this comment.
SIArray_Append clones its input value, so each iteration of this loop is incapable of overwriting the values that have actually been placed in the array.
| assert(GrB_Vector_extractElement(&parent_id, pdata->parents, node_id) == GrB_SUCCESS); | ||
| parent_id --; // Decrement the parent ID by 1 to correct 1-indexing. | ||
| // Retrieve any edge connecting the parent node to the current node. | ||
| EdgeID id = Graph_GetSingleEdgeConnectingNodes(pdata->g, parent_id, node_id, pdata->reltype_id); |
There was a problem hiding this comment.
To reduce code size and overloading graph.h/c with additional functionality I prefer we'll use Graph_GetEdgesConnectingNodes here.
|
|
||
| BFSContext *pdata = (BFSContext *)ctx->privateData; | ||
|
|
||
| // Return NULL if this source has already been mapped or there are no connected nodes. |
There was a problem hiding this comment.
source has already been mapped is a bit unclear,
The BFS scan has already been performed at the invocation phase. I consider this as "mapping" the node.
05d2101 to
cc6c7f1
Compare
jeffreylovitz
left a comment
There was a problem hiding this comment.
I think this is considerably cleaner than its previous iteration and is merge-ready pending the changes we discussed earlier!
| output->name = "propertyKey"; | ||
| output->type = T_STRING; | ||
|
|
||
| ProcedureOutput *outputs = array_new(ProcedureOutput *, 1); |
There was a problem hiding this comment.
ProcedureOutput *outputs = array_new(ProcedureOutput, 1);
| print("actual...") | ||
| print(actual_result) |
| self.env.assertEquals(actual_result.result_set, empty_result_set) | ||
|
|
||
| # Leaf node | ||
| query = """MATCH (e {v:'e'}) CALL algo.BFS(e, 0, NULL) YIELD nodes""" |
46a2215 to
3cc77df
Compare
* added init time NodeScanCtx update on label scan (#1358) * added init time NodeScanCtx update on label scan * rephrase comment Co-authored-by: Roi Lipman <swilly22@users.noreply.github.com> * Bfs procedure call (#1306) * Import LAGraph files * Apply autoformatter * Add LAGraph include file * WIP Add BFS procedure * tmp Utility * Revert LAGraph inclusions * Replace LAGraph reference with GraphBLAS references * WIP * WIP * Add BFSTree procedure * Add unit tests * Update documentation * Resolve compiler warnings * Address PR comments * Make singular BFS procedure with variable outputs * Remove start_node yield * Address PR comments * Remove yield-edges BFS input arg * Address PR comments * Switch to LAGraph_bfs_pushpull * Remove unused code paths from LAGraph BFS algorithm * procedure invocation receives yield array * consume BFS result using iterator * test BFS returning no results * clean up, address PR comments Co-authored-by: swilly22 <roi@redislabs.com> Co-authored-by: Roi Lipman <swilly22@users.noreply.github.com> * version bump Co-authored-by: Roi Lipman <swilly22@users.noreply.github.com> Co-authored-by: Jeffrey Lovitz <jeffrey.lovitz@gmail.com> Co-authored-by: swilly22 <roi@redislabs.com>
* added init time NodeScanCtx update on label scan (#1358) * added init time NodeScanCtx update on label scan * rephrase comment Co-authored-by: Roi Lipman <swilly22@users.noreply.github.com> * Bfs procedure call (#1306) * Import LAGraph files * Apply autoformatter * Add LAGraph include file * WIP Add BFS procedure * tmp Utility * Revert LAGraph inclusions * Replace LAGraph reference with GraphBLAS references * WIP * WIP * Add BFSTree procedure * Add unit tests * Update documentation * Resolve compiler warnings * Address PR comments * Make singular BFS procedure with variable outputs * Remove start_node yield * Address PR comments * Remove yield-edges BFS input arg * Address PR comments * Switch to LAGraph_bfs_pushpull * Remove unused code paths from LAGraph BFS algorithm * procedure invocation receives yield array * consume BFS result using iterator * test BFS returning no results * clean up, address PR comments Co-authored-by: swilly22 <roi@redislabs.com> Co-authored-by: Roi Lipman <swilly22@users.noreply.github.com> * Separate ExecutionPlan op construction logic into different files (#1352) * Separate ExecutionPlan op construction logic into different files * Address PR comments * Refactor header files * Address PR comments Co-authored-by: Roi Lipman <swilly22@users.noreply.github.com> * Support array properties in bulk loader (#1365) * Support array properties in bulk loader * Address PR comments * Add no_multi_key & hash_policy (#1363) * Add no_multi_key & hash_policy * Update ramp.yml At the moment we can't handle user hash policy due to virtual keys Co-authored-by: Roi Lipman <swilly22@users.noreply.github.com> * Add LGTM (#1367) * read only query (#1364) * read only query * added sleep, try to avoid ci failure * added more write scenarios. added replica shutdown * added skip flag to avoid memcheck failure * fixed PR comments * Update module.c * try to skip on any debugger. added more write scenarios * checkSlaveSynced moved to use time.sleep. Another try to skip debuger * added RO_QUERY to commands Co-authored-by: Roi Lipman <swilly22@users.noreply.github.com> * Fix typo (#1368) Co-authored-by: Roi Lipman <swilly22@users.noreply.github.com> * Add GRAPH.BULK documentation and link to bulk loader documentation (#1371) * Add GRAPH.BULK documentation and link to bulk loader documentation * Address PR comments * Allow upper case letters in proc names (#1372) * added to_lower for procedure name in Proc_get to support both upper and lower cases * new test for lookup_case_insensitive * Created strutil and moved the implementations of str_tolower and str_toupper to it. Remove the static implementations of these functions and replaced every call with a new call for str_tolower. minor changes in getProc and procRegister due to PR * fix indentations * fix indentations * minor changes * minor changes * indentations fix * indentations fix * indentations fix * indentations fix * indentations fix * PR fixes * PR fixes * PR fixes * PR fixes * PR fixes * PR fixes * PR fixes * PR fixes Co-authored-by: Roi Lipman <swilly22@users.noreply.github.com> * Adding a new procedure that returns all the procedures in the system (#1377) * Adding a new procedure that returns all the procedures in the system. This procedure may yield up to two fields for each procedure: its name and/or its mode (READ/WRITE). * PR fixes * Saving raxIterator reference instead of allocating it on the heap in proc_proceudres_private_data. More PR code fixes * Remove the getter "Procedure_ReadOnly" that gets only the procedures name, leaving only the getter "Procedure_IsReadOnly" that gets the whole procedure object. * Always set GraphCtx in QueryCtx while decoding graph keys (#1381) * LGTM alert fixes (#1383) Co-authored-by: Roi Lipman <swilly22@users.noreply.github.com> * Avoid double free of heap-allocated values in CASE...WHEN statements (#1386) * Avoid double free of heap-allocated values in CASE...WHEN statements * Update test_function_calls.py Co-authored-by: Roi Lipman <swilly22@users.noreply.github.com> * With filter scoping (#1361) * Don't migrate WITH filters into Merge and Apply scopes * Don't place filters in previous scope for filtered aliases * Fix memory error * remove vector * consolidate operation locating * execution plan construction refactoring WIP * move arithmetic expression construction from AST to AR_EXP * discover skip, limit and black-listed ops modifiers * PR fixes * Fixes continued * Fix ResultSet updates in UNION queries * Clarify comments, minor cleanup * Fix unit tests * Fix caching logic for SKIP and LIMIT * Address PR comments * tidy up execution plan construction and join op * reset limit on aggregation * define 16 as the default batch size for traversal operations Co-authored-by: swilly22 <roi@redislabs.com> Co-authored-by: Roi Lipman <swilly22@users.noreply.github.com> * Updates to Astyle config file (#1384) Co-authored-by: Roi Lipman <swilly22@users.noreply.github.com> * replace gitter with Discord (#1395) * reaplace gitter with Discord * replace gitter to discord * Allow property accesses on non-identifier entity references (#1389) * Allow property accesses on non-identifier entity references * access entity attribute only via the property function * address PR comments Co-authored-by: swilly22 <roi@redislabs.com> Co-authored-by: Roi Lipman <swilly22@users.noreply.github.com> * Minor suggested update. (#1398) * replace automation service (#1399) * redundent filter construction and placement (#1397) * redundent filter construction and placement * Add validating flow tests Co-authored-by: Jeffrey Lovitz <jeffrey.lovitz@gmail.com> * Parse full argv array on main thread (#1298) * Parse full argv array on main thread * remove read_flags from cmd_query Co-authored-by: Roi Lipman <swilly22@users.noreply.github.com> Co-authored-by: swilly22 <roi@redislabs.com> * query validation should not check if procedure outputs been defined (#1406) * query validation should not check if procedure outputs been defined * address PR comments * Add trusted-by logos (#1412) * Add trusted-by logos * add more logos * Fuzzy test crashes (#1408) * Correctly place filters without referenced variables * Emit compile-time error on constant non-boolean filter * Validate references per clause instead of per scope * Emit error on failed filter placement in optimize_cartesian_product * Fix free of invalid value in OpUnwind * Handle NULL sources in variable-length traversals * Address PR comments * Post-rebase fixes * Address PR comments * introduce AST unsupported op error Co-authored-by: swilly22 <roi@redislabs.com> * Graph version (#1382) * add graph version to graph context object * change graph version from string to uint32 * testing graph version * response with an error when graph version mismatch * release graph context on version mismatch * address PR comments * increase command arity to accomadate graph version argument * address PR comments Co-authored-by: DvirDukhan <dvir@redislabs.com> * Release version 2.2.8. Merged from master. Co-authored-by: DvirDukhan <dvir@redislabs.com> Co-authored-by: Roi Lipman <swilly22@users.noreply.github.com> Co-authored-by: Jeffrey Lovitz <jeffrey.lovitz@gmail.com> Co-authored-by: swilly22 <roi@redislabs.com> Co-authored-by: Guy Korland <gkorland@gmail.com> Co-authored-by: Yahya <5457202+anakaiti@users.noreply.github.com> Co-authored-by: Simon Prickett <simon@crudworks.org>
* Import LAGraph files * Apply autoformatter * Add LAGraph include file * WIP Add BFS procedure * tmp Utility * Revert LAGraph inclusions * Replace LAGraph reference with GraphBLAS references * WIP * WIP * Add BFSTree procedure * Add unit tests * Update documentation * Resolve compiler warnings * Address PR comments * Make singular BFS procedure with variable outputs * Remove start_node yield * Address PR comments * Remove yield-edges BFS input arg * Address PR comments * Switch to LAGraph_bfs_pushpull * Remove unused code paths from LAGraph BFS algorithm * procedure invocation receives yield array * consume BFS result using iterator * test BFS returning no results * clean up, address PR comments Co-authored-by: swilly22 <roi@redislabs.com> Co-authored-by: Roi Lipman <swilly22@users.noreply.github.com>

This PR introduces two new procedure calls,
algo.BFSandalgo.BFSTree.Each accepts a source node, an optional maximum level to visit, and an optional relationship type to traverse.
They differ in that
algo.BFSonly returns each connected node and its level relative to the source, whilealgo.BFSTreealso includes the entire path from source to destination. As the latter requires a series ofGrB_Vector_extractElementcalls, it is expected to be somewhat less performant than the simpler variant.Resolves #1304