
Legal Issues in Text and Data Mining: Literature and Text-Based Works


Dave Hansen

Computational research techniques such as text and data mining hold tremendous opportunity for researchers across the disciplines, from mining scientific articles to conduct better systematic reviews to better understanding how concepts of gender, race, and identity are shared across popular literature over time. Unfortunately, legal barriers to text and data mining often hinder research. In some cases they halt it altogether; in others they cause researchers to bias their work by relying only on textual materials thought to be "safe" from copyright problems.

This lunchtime workshop will survey the existing law and policy and highlight pathways forward for researchers under existing law, including fair use and TDM-specific exemptions to copyright, with a particular focus on creating and using textual datasets. We will offer a hybrid attendance option, but because the workshop will be hands-on, we encourage in-person participation. We are also offering a second workshop on April 4 on legal issues with TDM for images and audiovisual materials.

This workshop is led by Dave Hansen, Executive Director of Authors Alliance (, a nonprofit that exists to support authors who research and write for the public benefit. Dave is a copyright expert who has worked extensively on legal barriers to research and is a PI for the Authors Alliance Text and Data Mining: Demonstrating Fair Use Project, which is generously supported by the Mellon Foundation.

Lunch will be provided. Please RSVP at so we have an accurate count for food. For virtual attendance, please register for the Zoom webinar here:

This event is organized by Duke University Libraries and the John Hope Franklin Humanities Institute.