Co-occurrence analysis spots nanotechnology breakthroughs

Adam Sanford

From gene therapy and drug delivery to energy generation and catalysis, nanotechnology is revolutionizing many scientific fields. It’s an exciting time to be involved in nanotechnology research, but as studies proliferate, it also becomes difficult for individual researchers to pinpoint potential breakthroughs.

We recently discussed our novel methodology for analyzing the CAS Content Collection™, the largest human-curated repository of scientific publications, to develop CAS TrendScape™ maps of nanotechnology concepts. These maps provide a wealth of information in a condensed and easy-to-read format by clustering similar concepts together. Our analysis leveraged Natural Language Processing (NLP) techniques and human curation to synthesize vast amounts of data and reveal emerging areas of research.

This analysis can be extended for greater insight: by examining the co-occurrences of concepts in the literature, we can identify promising intersections of nanomaterials and potential applications. This approach helps researchers quickly identify emerging ideas and their potential relevance in a complex innovation landscape.

How CAS TrendScape maps identify key concepts

To obtain a landscape view of the complex and growing field of nanotechnology, our first task was to identify the existing body of published literature. Using a customized search query, we identified about 3 million documents related to nanotechnology in the CAS Content Collection. We then used the stemming and lemmatization text-preprocessing techniques in the Natural Language Toolkit (NLTK) to identify candidate phrases from titles and abstracts that could signal emerging topics.
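
As a rough illustration of this preprocessing step, the sketch below normalizes word forms and counts candidate one- and two-word phrases across titles. It is not the CAS pipeline: the `normalize` function is a crude plural-stripping stand-in for NLTK's stemmers and lemmatizers, and the stopword list, function names, and example titles are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for this sketch.
STOPWORDS = {"a", "an", "the", "of", "for", "in", "on", "and", "to", "with", "by", "from"}


def normalize(token):
    """Crude stand-in for stemming/lemmatization: strip common plural endings."""
    if token.endswith("ies") and len(token) > 4:
        return token[:-3] + "y"
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        return token[:-1]
    return token


def candidate_phrases(text, max_len=2):
    """Return normalized n-grams (n <= max_len) that do not start or end with a stopword."""
    tokens = [normalize(t) for t in re.findall(r"[a-z][a-z\-]*", text.lower())]
    phrases = []
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            phrases.append(" ".join(gram))
    return phrases


# Invented example titles, used only to exercise the functions.
titles = [
    "Triboelectric nanogenerators for wearable sensors",
    "A triboelectric nanogenerator based on nanofibers",
]
phrase_counts = Counter(p for title in titles for p in candidate_phrases(title))
```

Because both singular and plural forms collapse to one normalized form, "triboelectric nanogenerator" is counted in both titles. A real stemmer or lemmatizer handles far more word forms than this stand-in does.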

After applying additional parameters around publication growth rate (the difference in the number of documents published on a topic compared to the previous year) and completing extensive manual curation, we produced a graphical representation of emerging concepts in nanoscale materials, applications, and properties (see Figure 1). We then focused on the applications and materials branches to explore specific co-occurrences between the concepts.
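
The growth-rate parameter can be sketched as a year-over-year calculation. The version below expresses the change as a percentage of the previous year's count, which matches the percentage axis described for Figure 2 but is an assumption about the exact formula; the function names and the document counts are invented for illustration.

```python
from typing import Dict, Optional


def yoy_growth(counts_by_year: Dict[int, int]) -> Dict[int, float]:
    """Percentage change in document count relative to the previous year."""
    years = sorted(counts_by_year)
    growth = {}
    for prev, cur in zip(years, years[1:]):
        # Skip gaps in the series and years with no baseline to compare against.
        if cur - prev != 1 or counts_by_year[prev] == 0:
            continue
        change = counts_by_year[cur] - counts_by_year[prev]
        growth[cur] = 100.0 * change / counts_by_year[prev]
    return growth


def average_growth(counts_by_year: Dict[int, int]) -> Optional[float]:
    """Mean of the yearly growth percentages, or None if none can be computed."""
    growth = yoy_growth(counts_by_year)
    return sum(growth.values()) / len(growth) if growth else None


# Invented counts for a single hypothetical topic.
example_counts = {2019: 100, 2020: 150, 2021: 300, 2022: 330}
example_growth = yoy_growth(example_counts)
```

Here the yearly values are 50%, 100%, and 10%, so the average growth rate is about 53%.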

Figure 1: CAS TrendScape map of the applications, materials, and properties in nanoscience which showed high growth in recent years. Source: CAS Content Collection

An NLP-based analysis reveals how ideas come together

To understand connections between concepts in the TrendScape map, we performed an NLP-based analysis which counts the number of co-occurrences of individual concepts in the same sentences of journal abstracts. This allowed us to quantify the degree of connection between any two concepts. Also, because we analyzed in-sentence co-occurrence, the likelihood of coincidental co-occurrence was kept low.

We counted the number of document titles and abstracts in which any phrase in one topic appeared in the same sentence as any phrase in another topic. For example, a document where the term “solar cells,” “photovoltaics,” or “perovskite solar” appeared in the same abstract sentence as “nanocavities” or “nanocavity” was counted as a co-occurrence of the topics “solar cells” and “nanocavities.” We applied this analysis to journal and patent publications from 2019 to 2022 so that we could isolate the most recent emerging concepts.
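
The counting rule described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the production analysis: sentence splitting uses a simple punctuation-based regular expression, phrase matching is plain case-insensitive substring search, and the `TOPICS` dictionary contains only the two topics from the example in the text. All function names and the sample documents are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

# Topic name -> phrases that count as a mention (taken from the example above).
TOPICS = {
    "solar cells": ["solar cells", "photovoltaics", "perovskite solar"],
    "nanocavities": ["nanocavities", "nanocavity"],
}


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def topics_in_sentence(sentence, topics):
    """Names of all topics with at least one phrase in the sentence."""
    lowered = sentence.lower()
    return {name for name, phrases in topics.items()
            if any(phrase in lowered for phrase in phrases)}


def cooccurring_pairs(document, topics):
    """Topic pairs that share at least one sentence of the document."""
    pairs = set()
    for sentence in split_sentences(document):
        found = sorted(topics_in_sentence(sentence, topics))
        pairs.update(combinations(found, 2))
    return pairs


def count_cooccurrences(documents, topics):
    """Number of documents in which each topic pair co-occurs in a sentence."""
    counts = Counter()
    for document in documents:
        counts.update(cooccurring_pairs(document, topics))
    return counts


# Invented abstracts, used only to exercise the functions.
sample_docs = [
    "Perovskite solar absorbers were coupled to a plasmonic nanocavity. Efficiency improved.",
    "We review photovoltaics. Separately, nanocavities enable sensing.",
    "Nanocavity arrays boost solar cells. Nanocavities also help photovoltaics.",
]
pair_counts = count_cooccurrences(sample_docs, TOPICS)
```

The second sample document mentions both topics but in different sentences, so it is not counted; the third mentions the pair in two sentences but still counts as one document. Counting at the document level, while requiring the match at the sentence level, is what keeps coincidental co-occurrence low.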

The plots in Figure 2 show the average number of documents published between 2019 and 2022 where pairs of terms co-occur in the same sentence (x-axis) and the average growth rate of documents with those co-occurrences over the same time period (y-axis). For clarity, combinations are separated into two plots: one showing concept co-occurrence within the same map (i.e., terms that both appear in the applications map or in the materials map), and one showing term co-occurrence across different maps. The general trend in the data is a wide range of growth rates for combinations with relatively low publication frequency, with a long tail extending to high publication numbers but relatively low growth rates.

The most interesting concept pairs fell into two categories: pairs with a high growth rate, which indicates emerging connections between concepts, and pairs with a high number of documents but a low growth rate, which suggests better-established connections.
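
These two categories can be expressed as a simple rule over each pair's average document count and average growth rate. The sketch below is illustrative only: the cutoff values, class and function names, and the sample numbers are hypothetical and do not come from the CAS analysis.

```python
from typing import NamedTuple, Tuple


class PairStats(NamedTuple):
    pair: Tuple[str, str]
    avg_docs: float        # average documents per year with the co-occurrence
    avg_growth_pct: float  # average year-over-year growth, in percent


def classify(stats, growth_cutoff=50.0, volume_cutoff=500.0):
    """Label a concept pair; both cutoffs are assumed values for illustration."""
    if stats.avg_growth_pct >= growth_cutoff:
        return "emerging"
    if stats.avg_docs >= volume_cutoff:
        return "established"
    return "other"


# Placeholder pairs with invented numbers.
samples = [
    PairStats(("concept A", "concept B"), avg_docs=40.0, avg_growth_pct=120.0),
    PairStats(("concept C", "concept D"), avg_docs=900.0, avg_growth_pct=5.0),
    PairStats(("concept E", "concept F"), avg_docs=30.0, avg_growth_pct=8.0),
]
labels = [classify(s) for s in samples]
```

In practice the cutoffs would be chosen by inspecting the distribution of points in plots like those in Figure 2 rather than fixed in advance.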

One important example of a fast-growing co-occurrence is the pairing of vaccines and lipid nanoparticles. The exceptionally high growth rate of this combination is attributed to the large number of publications relating to COVID-19 vaccines in 2021-2022. The fact that this pairing surfaced at a time when the topic dominated scientific research helped to validate our methodology.

Figure 2: Average year-over-year growth rate (percent) versus absolute number of publications from 2019-2022 for concepts co-occurring in the same sentence in journal abstracts, for concepts (A) in the same TrendScape map and (B) in different TrendScape maps. S/P/C refers to synthesis, properties, and characterization concepts. Source: CAS Content Collection

As we explored co-occurring concepts more deeply, we revealed several other notable emerging topics:

  • Nanogenerators: Nanogenerators, specifically triboelectric and piezoelectric nanogenerators, are an important application identified in the CAS TrendScape map of applications. These devices generate electricity from motion through charge separation when two surfaces interact (tribo) or deform (piezo). Their growing frequency in publications is likely due to their use in wearable devices such as sensors and human-machine interfaces.

    We analyzed which materials co-occur most often with these applications, and we found that nanofibers and zinc oxide are most prominently associated with nanogenerators, with hydrogels growing quickly as well (see Figure 3). Nanofibers are of particular interest with triboelectric nanogenerators because of their high surface area, flexibility, and the possibility of synthesizing customized nanofiber materials using electrospinning.

Figure 3: Average 2019-2022 growth rate versus number of publications over that time period for terms co-occurring with nanogenerator applications. Source: CAS Content Collection

  • MXenes: MXenes are a class of inorganic 2D materials showing significant growth in research interest. Their main applications include electrocatalysis, photocatalysis, and batteries (see Figure 4). MXenes are well suited to these uses because of their high surface area, conductivity, and versatility: their surface functionality can be altered, and they can be combined with other nanoscale materials such as carbon nanotubes. As the energy transition continues and catalysis and battery energy storage become increasingly important, these properties may give MXenes a key role in supporting renewable energy. From composite electrodes with graphene to thermal energy storage applications, MXenes’ morphology, electrical properties, and customizability mean they will likely remain an important research topic for years to come.

Figure 4: Average 2019-2022 growth rate versus number of publications over that time period for terms co-occurring with MXenes. Source: CAS Content Collection

  • Nanoplastics: Nanoplastics are typically defined as polymer particles 1 µm or smaller in size, and they result from both intentional manufacturing and the fragmentation of larger plastics. Concerns are growing over the impact of nanoplastic waste and pollution on the oceans, living organisms, and human health, and our analysis showed this material co-occurring with concepts including toxicity.

    Nanoplastics also co-occurred with other nanoparticles including silver and titanium dioxide in the context of removal from the environment and combined toxic effects. We expect to see continued growth in research interest relating to nanoplastics and potential nanotechnology-driven solutions for environmental remediation.

Innovation happens at the intersection of ideas

Leveraging technology and human expertise to identify emerging concepts in the literature is valuable for better understanding the research landscape in high-impact areas. Examining concept co-occurrence can reveal critical, otherwise unseen connections that help innovators identify investment opportunities and potential new research directions that could lead to the next important breakthrough.

Read more about the latest nanotechnology discoveries in our new CAS Insights Report.