lttore.blogg.se

Training apache lucene

  1. Training apache lucene code
  2. Training apache lucene license

Training apache lucene code

Indeed, over time Amazon developers have steadily increased our participation in open source projects as a way to better serve customers, even in strategic areas like search that could yield competitive differentiation.

First, as McCandless says, “The community is a fabulous resource: they suggest changes, and make the source code better.” By working with the Lucene community, we are better able to help our customers find the products they want, faster.

Training apache lucene license

While the Apache License (Version 2.0) allows developers to modify the code without contributing changes back to the upstream community, Amazon chooses to actively contribute back to Lucene and other projects. In pushing Lucene to its limits, Amazon developers uncovered “rough edges,” bugs, and other issues, according to McCandless. Compared to Amazon’s internal product search service, McCandless argued, “Lucene has more features, is moving faster, has lots of developers working on it, offers a much bigger talent pool of experienced search developers, and more.”

All of which, while true, doesn’t necessarily explain why Amazon contributes to upstream Lucene. Ultimately, it’s that community of sophisticated users that makes Lucene hum, and which made it such an attractive option for Amazon’s product search team.


Lucene “isn’t a toy,” McCandless declared. “It’s used in practice all over the place by companies like Twitter, Uber, LinkedIn, and Tinder.” Many other teams at Amazon have used Lucene for years across a variety of applications, though not previously for product search. Second, while we might have worried about whether Lucene could meet our functionality and performance requirements, it’s not as if we’d be alone in using it at serious scale. In a follow-up discussion with me, McCandless stressed that the decision wasn’t trivial given our “very large, high-velocity catalog with exceptionally strong latency requirements and extremely peaky query rates.” Against such stringent demands, the product search team was unsure whether Lucene could keep up.

First off, McCandless said, Lucene has attracted a massive community of passionate people who are constantly iterating on the technology. In a Berlin Buzzwords 2019 talk, McCandless (and Amazon search colleague Mike Sokolov) walked through the reasons that Amazon, after years of success with a homegrown search engine, elected to embrace Lucene. …so long as he could continue to contribute changes upstream, back to the open source Lucene project. McCandless, who joined Amazon in 2017, says that “the incredible challenge” of configuring Apache Lucene to run at Amazon scale was “too hard to resist”… To get a deeper appreciation for Amazon’s embrace of Lucene, I caught up with Mike McCandless, a 12-year veteran of the Lucene community.


Although Amazon has powered its product search for years with a homegrown C++ search engine, today when you search for a new book or dishwasher on Amazon (or when you ask Alexa to search for you), you’re tapping the power of Apache Lucene (“Lucene”), an open source full-text search engine. It becomes dramatically harder, however, when searching at Amazon scale: think billions of products, complicated by millions of sellers constantly changing those products on a daily basis, with hundreds of millions of customers searching through that inventory at all hours. Results are presented in a sensible way.

At pretty much any scale, search is hard. Enterprise search Solr systems index structured and unstructured data and documents from a variety of sources including file systems, intranets, document management systems, e-mail, and databases. Sirius provides Enterprise Search strategy, deployment and system integration, training, support and managed services using Apache Lucene/Solr.














