February 14th, 2020 – Delaware – The John Snow Labs team is pleased to announce the immediate availability of Spark NLP 2.4. This is the library’s biggest release ever, with major accuracy & scalability improvements across the open source, enterprise, and healthcare editions.
The changes include improvements to the core architecture of the library, retraining of all pre-trained models from scratch, and a suite of new pre-trained models & deep-learning networks that leverage new academic research results from 2019. In most cases, this release is the first production-grade, scalable, and trainable implementation of this new research made available to the AI community.
Named entity recognition: Spark NLP 2.4 still makes half as many mistakes as spaCy 2.2
Named entity recognition (NER), which extracts entities such as people, places, drugs, and genes from free text, is one of the most widely used NLP tasks. Transformers and contextual embeddings such as BERT and ELMo have improved the achievable accuracy on NER over the past two years – and Spark NLP 2.4 now comes with several out-of-the-box pipelines that make the most of these innovations:
Optical Character Recognition (OCR): Automated image enhancement & scalable pipelines
Spark OCR is now a separate library from Spark NLP – enabling users to configure optical character recognition pipelines that improve accuracy for specific document types.
Spark OCR is now in production in various large-scale, high-compliance use cases, reading clinical records, faxes, invoices, books, and other document types. This new release has enabled customers to reach and surpass the accuracy previously achieved by OCR industry leaders such as ABBYY, AWS, and Google Cloud – by implementing image processing algorithms, automating their selection and use, and enabling users to tune OCR pipelines for domain-specific document types.
Spark OCR is unique in its ability to scale OCR processing on any Spark cluster, unify image processing with downstream information extraction from text (using NLP techniques), and run on a customer’s infrastructure without requiring documents to be shared with or sent to a cloud provider.
Context-Based Text Matching: Accurately extract facts from large documents
A common NLP use case is extracting structured data from large documents. Financial statements, medical records, and legal documents can often be hundreds of pages long. In such cases, finding a specific fact – like a date, a monetary value, or a name – can be challenging since a document can include hundreds of such values to choose from.
Spark NLP 2.4 includes a context-based text matcher, which enables users to specify the context inside a document in which a match should be searched for. The algorithm first finds the relevant context and then performs a deeper search for the requested fact within it.
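The two-stage idea can be illustrated with a minimal sketch in plain Python. This is not the Spark NLP API: the function `find_in_context`, its parameters, and the sample document are hypothetical, and regular expressions stand in for whatever matching the library actually performs. The sketch only shows the general approach of locating a context first and then searching for the fact in a limited window after it.

```python
import re

def find_in_context(text, context_pattern, fact_pattern, window=200):
    """Hypothetical helper: search for facts only near a matching context.

    Stage 1 locates each occurrence of the context pattern.
    Stage 2 searches for the fact pattern within `window` characters
    following that context, ignoring the rest of the document.
    """
    results = []
    for ctx in re.finditer(context_pattern, text, flags=re.IGNORECASE):
        start = ctx.end()
        scope = text[start:start + window]
        for fact in re.finditer(fact_pattern, scope):
            # Record the context text, the fact text, and the fact's
            # character offset in the original document.
            results.append((ctx.group(0), fact.group(0), start + fact.start()))
    return results

# Made-up sample document containing several dates.
document = (
    "Invoice date: 2019-11-02. Payment terms are net 30. "
    "Filler text mentions 2018-01-15 in passing. "
    "Contract effective date: 2020-02-14, signed by both parties."
)

# Only the date that follows the phrase "effective date" is wanted.
matches = find_in_context(
    document, r"effective date", r"\d{4}-\d{2}-\d{2}", window=40
)
print(matches)
```

A plain search for the date pattern would return all three dates in the sample text; restricting the search to the window after "effective date" returns only the one that belongs to that context.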
Clinical entity resolution: Accurately map entities to large, hierarchical ontologies
Spark NLP for Healthcare already had the ability to map clinical entities to medical terminologies – such as drugs to RxNorm codes, procedures to ICD-10-PCS or CPT codes, and others. This release brings new pre-trained models with better accuracy:
More new functionality
The Spark NLP 2.4 Release Notes list the entire set of new features, upgrades, and bug fixes within this major release. Major new features include:
“This release continues our years-long commitment to provide our customers and the AI community the world’s most accurate, fast, and scalable NLP library”, said Saif Addin-Ellafi, lead Spark NLP developer at John Snow Labs.
About John Snow Labs
John Snow Labs Inc. is an award-winning healthcare AI & NLP company, accelerating progress in data science with state-of-the-art platforms, models and data. A third of the team have a PhD or MD degree and 75% of team members have at least a Master’s, coming from multiple disciplines covering data science, medicine, data engineering, pharma, security, and DataOps. A Delaware Corporation, John Snow Labs runs as a global virtual team located in 20 countries around the globe. We believe in being great partners, in making customers wildly successful, and in using data science to make the world a better place.
About John Snow Labs’ Spark NLP
Spark NLP is an open-source library for natural language processing in Python, Java, and Scala. According to the most recent O’Reilly AI Adoption in the Enterprise survey of 1,300 practitioners, Spark NLP is the most widely used NLP library in the enterprise today. It provides the AI industry with state-of-the-art, production-grade natural language processing capabilities – based on the most recent research results in deep learning, transformers, and distributed systems. The Spark NLP community has enjoyed a new release of the library every two weeks on average since the beginning of 2018. John Snow Labs continues to grow its team and its support of the library, and to license commercial Enterprise and Healthcare NLP software products that extend it.
Media Contact
Company Name: John Snow Labs
Contact Person: Ida Lucente
Phone: +1 (302) 786-5227
Address: 16192 Coastal Highway
City: Lewes
State: Delaware 19958
Country: United States
Website: www.johnsnowlabs.com