
Hadoop is a great solution to the big data problem, and with instant access to servers and storage in the cloud, it's easier than ever to spin up and manage your own cluster. If you haven't heard much about it yet, Hadoop provides a distributed file system along with a framework for running MapReduce jobs over the data. It takes care of replicating blocks of data across nodes and running jobs in parallel for you. However, when you want to expand your Hadoop cluster across availability zones, you can run into some unexpected problems. So let's dig into the ideas we tried and the final solution that worked best for our configuration.