Monday, November 1, 2021

Apache Drill: the reports of my death have been greatly exaggerated

There’s a somewhat breathless post entitled “The Death of Apache Drill” in a blog that has as a theme the imminent demise of technologies previously or currently associated with Hadoop, with the exception of Trino (formerly known as PrestoSQL). It’s ultimately a promotional piece for the website’s owner, which is entirely normal and usually it wouldn’t warrant further mention. But it’s done whatever it is that it takes to climb up to the first page of the search results for “Apache Drill” so we’d like to share some thoughts of our own on a few of the points it makes. I’ll forgo placing a link here but you will not find it difficult to locate the original.

Firstly, the title proclaims a little too much. Drill did suffer the loss of its primary corporate backer, and of course its pulse has been faint as a result, but we invite the author to visit the project and reconsider his declaration of death. We don’t have hundreds of active contributors making thousands of commits a year but there are enough of us to get bugs fixed, new data sources supported, and performance and reliability improved. In the near future I’ll blog about our recent work on Elasticsearch, Iceberg, JDBC writing, Parquet formats and Phoenix.

We’ve started talking about speeding up our release cadence to better reflect our recent activity. We’re rekindling the project’s communication channels, and improving and translating our documentation. Metrics like downloads of Drill-related software suggest to us that interest has stopped trending down and started trending up. If this is death, in short, then the phenomenon is a lot less about resting in peace than we’ve allowed ourselves to suppose.

Next, the notion that Drill is “tied”, locked in, to MapR and Hadoop. As far as Apache Drill is concerned, this has never been true in the time I’ve worked with it. You require nothing from MapR, nor do you need to run a single Hadoop service, in order to start querying using the Drill binaries we distribute with default settings. That is not to say that you cannot integrate Drill with MapR products and Hadoop; it supports these things well and its history is certainly intertwined with theirs, but you need never configure them if, for example, you’re using cloud object storage instead. We do reuse some library code from Hadoop in our codebase, but so does Trino. The subsequent dichotomy implied by the post’s “Proprietary Solutions vs. Open Source” section heading is a false one: Apache Drill is entirely open source.

On to the sentiment that users of Hadoop should be “fearful”. Hadoop probably was overdeployed as many of us rushed to cargo cult another Big Tech innovation that was developed for a context that only some of us actually share. Some of those deployments will likely revert to something simpler or better matched to the problem at hand. Nevertheless, Hadoop is mature and capable software that solves a certain set of problems very well, it lives at Apache, and it is not about to vanish in a puff of smoke. I see no need at all for its users to feel afraid, regardless of how their big data stacks might evolve in the future. After all, who knows what the next scraps to fall from the Big Tech banquet table will be?

On performance and concurrency issues, I don’t have enough information to add anything useful. If they’re code problems, rather than misconfiguration, then we’d certainly make them a priority. It’s worth noting that, while there are projects that focus on speed above all else, contemporary Drill places as much weight on flexibility as it does on speed. And what about all the praise heaped on Trino? Well, we agree: this impressive project has accomplished a tremendous amount and we wish them the best for the future.

Drill is at a very interesting point in its history. It presents a unique opportunity to developers who would like to challenge themselves, in that individual contributions are not diluted in a sea of commits from others, and even newcomers can have a major impact. If you’d like to come and pick an interesting problem in Drill to solve, please feel welcome; you’ll find us a friendly bunch. If you’d like a job working full time on Drill then send an email to me at dzamo at apache.org.


