Easy distributed joins with Pachyderm

medium.com posted by kenny 3433 days ago

Distributing dataset or database joins can be a daunting task, to say the least. Not only do you need to think about how your data is sharded and indexed, you also need to decide what types and sizes of resources to allocate for the scale of your data. Faced with these considerations, many people retreat to brute-forcing their data transformations on increasingly beefy boxes.