If you ask us what the single biggest "technical challenge" of creating research※mesh is, the answer is unequivocal: corralling article metadata.
So here's a nerdy little piece about metadata, just in case you are interested in learning more about what "still searching for metadata" means in your admin report! 🤓
First things first: the research publishing landscape is a jungle. A chaotic one.
The scientific publication ecosystem is a mishmash of legacy systems and new technology platforms cobbled together in dynamic and ever-evolving ways. No single entity is "in charge" of the academic publishing ecosystem—but it is exactly that, an ecosystem. And it is an ecosystem that has some "Wild West" characteristics.
Different publishers (and different conglomerates of publishers) follow different standards for the amount and type of metadata they make available on new articles. Journals themselves can be managed by corporations or by individuals off the side of their desk—with varying interest, attention and commitment to the metadata of the articles they publish. And, when good metadata does exist, it might be strewn across different registries or be inconsistently indexed across various knowledge graphs.
The best part? Every single new system that has been created to address challenges or problems within the ecosystem has, itself, become yet another part of the ecosystem.
Why is sufficient article metadata so important?
To automatically generate content with precision and accuracy, research※mesh requires substantive and dependable metadata as "ground truth." The reason our use of artificial intelligence is so consistently accurate is that the "AI part" is actually only a very thin layer of the process, built on top of a much, much bigger stack of linear indexing and retrieval processes that ensure systematic and precise metadata for every piece of research showcased in a newsletter.
When you see a "still collecting metadata" status for a publication in the queue, it simply means that the underlying data retrieval processes have not yet established enough ground truth about the article to proceed.
What causes insufficient metadata for an article?
Missing and incomplete metadata can result from lags between when new articles appear in some registries and when that metadata propagates to other databases and knowledge graphs. There can also be errors in the system (such as a publisher incorrectly specifying the DOI for an article) that take days to months to be corrected across systems. Or the publisher or journal may simply not have provided the data at all.
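To make the idea of "enough metadata" concrete, here is a minimal sketch of a completeness check. This is purely illustrative, not research※mesh's actual implementation, and the field names are assumptions:

```python
# Hypothetical sketch: deciding whether an article record has enough
# metadata to serve as "ground truth". The required fields below are
# illustrative; a real pipeline would define its own.
REQUIRED_FIELDS = {"doi", "title", "authors", "journal", "published_date"}

def missing_fields(record: dict) -> set:
    """Return the required fields that are absent or empty in a record."""
    return {f for f in REQUIRED_FIELDS if not record.get(f)}

def has_sufficient_metadata(record: dict) -> bool:
    """True only when no required field is missing."""
    return not missing_fields(record)
```

Under this sketch, an article fresh from one registry might arrive with only a DOI and a title; it would sit in the queue until the remaining fields propagate from other sources.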
The beauty and power of research※mesh is that it automates the task of going back and searching for correct (or corrected) metadata over the course of time. This is a terrific job for a robot and a painfully tedious job for a human. Thank you, robots.
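The "go back and check again later" job described above is essentially a retry schedule. As a hedged sketch (the intervals here are made up, not research※mesh's actual schedule), re-checks might be spaced with exponential backoff so that fresh articles are polled often and stubborn ones less frequently:

```python
from datetime import datetime, timedelta

def next_check(last_checked: datetime, attempts: int,
               base: timedelta = timedelta(hours=1),
               cap: timedelta = timedelta(days=7)) -> datetime:
    """Schedule the next metadata re-check with exponential backoff.

    Waits 1 hour after the first attempt, then 2, 4, 8... hours,
    capped at one week. All intervals are illustrative assumptions.
    """
    delay = min(base * (2 ** attempts), cap)
    return last_checked + delay
```

The design choice is the usual one for polling external registries: frequent checks right after publication, when propagation lag is most likely to resolve, tapering off so long-tail edge cases don't hammer anyone's servers.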
We are still working on the edge cases.
When you receive a "missing metadata" notice in your admin report, our team gets the same alert. These are the outlier articles, and innovating solutions for these remaining categories of edge cases is a top priority for us. The fact that we have already figured out how to handle so many quirky scenarios gives us confidence. However, as we solve the easier problems, the remaining ones become increasingly difficult; that is the inherent nature of edge cases.
If you have been using research※mesh for a while, you have probably noticed that the vast majority of metadata issues eventually resolve as the requisite data is identified and retrieved. But, yes, we definitely still have a few hard nuts left to crack! So we just wanted to provide this little summary to highlight one point: when you see a missing metadata notice, we are on the case! 🦸🤖📄