- For monopartite batching, assign self-loop relationships for two node groups to the same relationship group. This improves efficiency for most real-world monopartite graphs.
- Benchmarking module including:
  - Methods to generate synthetic data that maps to each ingest method in the package
  - Generate_benchmarks method to iterate through parameters and collect benchmarking information for each ingest method in the package
  - Visualizations to easily analyze results
  - Method to partition results by package version automatically
  - Methods to retrieve and format real data for benchmarking scenarios
- Make `num_groups` an optional parameter in the ingest function. Allows for improved performance.
- Swap heatmap visualization axes
- Add group ID to each cell in heatmap
- Heatmap scale starts at 0
- Update monopartite batching algorithm
- Update ingest algorithm
- Examples demonstrating each parallel ingest method with real data
- Fixed delimiter in generated group ID to be consistent between group and batch processes in monopartite
- Fixed monopartite batching color coding process where sometimes a property value would be found in conflicting groups
- Added additional tests for monopartite batching
- Update how `num_groups` in the ingest function is calculated internally
- Fix monopartite batch assignment bug
- Fix bug in ingest where partitioning was not performed according to defined groups
- Add changelog and PR template
- Add verification function to assert Spark version is compatible (>= 3.4.0)
- Replace `final_group` column with `group` column
- Add heatmap visualization module
- Add example notebook demonstrating the heatmap module and updated README.md
- Initial release and imports update
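
The Spark compatibility check listed above (>= 3.4.0) can be illustrated with a minimal sketch. The function names `is_spark_version_compatible` and `assert_spark_version_compatible` are hypothetical and are not taken from the package; the sketch only assumes that the version arrives as a dotted string such as `"3.5.1"`.

```python
# Hypothetical helpers; names and behavior are assumptions, not the package's API.
MIN_SPARK_VERSION = (3, 4, 0)


def _parse_version(version: str) -> tuple:
    """Parse the leading numeric parts of a dotted version string."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes such as "-SNAPSHOT"
            digits += ch
        parts.append(int(digits) if digits else 0)
    # Pad short versions such as "4.0" to three components.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_spark_version_compatible(version: str, minimum: tuple = MIN_SPARK_VERSION) -> bool:
    """Return True if `version` is at least `minimum`."""
    return _parse_version(version) >= minimum


def assert_spark_version_compatible(version: str) -> None:
    """Raise AssertionError if `version` is older than the minimum."""
    if not is_spark_version_compatible(version):
        required = ".".join(str(p) for p in MIN_SPARK_VERSION)
        raise AssertionError(
            f"Spark version {version} is not supported; {required} or newer is required."
        )
```

With a live session, the string to check would typically come from `spark.version`, for example `assert_spark_version_compatible(spark.version)`.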