Text Mining Service For Researching Aggregated Data
The idea? Put publicly available datasets all in one place so it’s easier for researchers to do text mining and analysis on over 19 million academic articles. The CORD-19 dataset, a full-text and metadata dataset of COVID-19 and coronavirus-related research articles, is included.
Amy Kirchhoff is the Text and Data Mining Business Manager at ITHAKA, the organization that runs popular academic research site JSTOR. Kirchhoff is also an Archive Service Product Manager at Portico, ITHAKA’s content preservation archive. She leads a team building a new service to allow users to mine datasets related to COVID-19 research along with a host of other related data and published papers. “We are building a text mining service,” Kirchhoff says of the new program, which allows customized searches of aggregated datasets related to the novel coronavirus outbreak.
Teaching A Generation of Scholars To Text Mine
“Our TDM service is going to teach a generation of scholars to text mine,” Kirchhoff tells Cronicle. “We wanted this to be a platform to bring in content that doesn’t necessarily sit within our space.”
Research on the virus is still in its early stages, but never before have research studies, vaccine development, and data mining programs been so accelerated in a race for solutions to a global threat. With so many studies in their beginning stages, where does all this data come from?
“There’s a big push for research on COVID-19,” Kirchhoff explains, so the organization pulled several data sources together into a single platform, including the CORD-19 dataset of coronavirus-related research articles. A pilot is already up and running.
“We made this available so you can build datasets of interest,” Kirchhoff says of the customizable search users can use in a multitude of ways to mine virus research data for insights and trends. “Once you’ve curated a dataset, you can work online in our space we’ve created for data analysis, or you can download the dataset to work with the data offline.”
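For readers new to text mining, the offline step Kirchhoff describes can be as simple as counting which terms appear most often across a set of documents. The following is a minimal sketch in Python, not the service's own tooling: the sample abstracts, the stopword list, and the function names are all hypothetical stand-ins, and a real downloaded dataset would need its own loading step.

```python
from collections import Counter
import re

# Hypothetical stand-in for a downloaded dataset: a few invented abstracts.
SAMPLE_DOCS = [
    "Coronavirus transmission in hospital settings.",
    "Vaccine development for the novel coronavirus.",
    "Hospital capacity and vaccine distribution.",
]

# A tiny illustrative stopword list; real analyses use much longer ones.
STOPWORDS = {"in", "for", "the", "and", "of", "a", "an", "to"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_frequencies(docs, stopwords=STOPWORDS):
    """Count how often each non-stopword term appears across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(t for t in tokenize(doc) if t not in stopwords)
    return counts


if __name__ == "__main__":
    # Print the three most frequent terms in the sample documents.
    for term, n in term_frequencies(SAMPLE_DOCS).most_common(3):
        print(term, n)
```

Term counts like these are usually only a starting point; the same loop could feed trend analysis over time or comparisons between collections.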
Resources for The Higher Education Response to COVID-19
ITHAKA’s emphasis on teaching and learning is what led the organization to create this flexible tool, Kirchhoff says: “We’d like to teach a generation of researchers about text mining.” The service is not limited to COVID-19 data, and ITHAKA’s coronavirus response goes far beyond data.
The organization has also published numerous resources for institutions of higher education facing unprecedented challenges in bringing students back to campus this fall. These include a list of hosted coronavirus response webinars and media resources, with guides on how to bring college courses online, remote teaching resources, and published articles about how academic libraries are adjusting to respond to COVID-19.
The data mining service from ITHAKA is flexible enough that it is not really a set program. The pilot is located here: https://tdm-pilot.org/. Check it out and tell us how you use the service and what insights you glean. As difficult a moment as this is for higher education, we’re looking forward to hearing more from the academic and tech communities about how the coronavirus outbreak changes accessibility to academic resources like those curated by ITHAKA as the global community moves online.