DataCORE – Grappling With Big Files and Big Problems

This poster is part of the Open Repositories 2021 Poster Session which takes place in the week of June 7-10. We encourage you to ask questions and engage in discussion on this poster by using the comments feature. Authors will respond to comments during this week.

Authors:

James Halliday and Brian Keese

Poster description:

DataCORE is a brand new Samvera-based repository at Indiana University focusing exclusively on research data. This poster shows how the system works, including how data flows in and out, and describes some of the challenges the authors overcame in implementing it. The changing landscape of handling and moving large data has necessitated some updates to their workflows.

Poster slides: https://docs.google.com/presentation/d/e/2PACX-1vRUGi4D3hxlALsuDgVBvtrLDdPQUbIgCXMDINltlSYsNLqjq5med9U0_-ufRuSe6N7yd2kDbXY5QPed/embed?start=false&loop=false&delayms=3000

About the authors:

James Halliday (Presenting author) is the Head of Repository Technology for Indiana University Libraries in Bloomington. Before taking this role, he worked as a developer for Indiana University Libraries, which he joined in 2001.

Brian Keese is a senior software engineer for Indiana University Libraries. He works on various projects in the Scholarly Communications department. Prior to this role, Brian worked for several years as a developer for the Avalon project.

2 thoughts on “DataCORE – Grappling With Big Files and Big Problems”

  1. Hi Jim and Brian,

    I would love to know how you’re organizing the data. Will the system host only one-off projects? Or is there a university-wide push to store datasets?

    Will the datasets be getting persistent identifiers? The current system does not display any prominently, though it sounds like you’re planning for DataCite integration.

    It looks like the current default license is CC-BY. Any plans to change this? Much of the content is not eligible for US copyright, and even the relatively light CC-BY restrictions can pose a barrier to reuse.

    1. Hi Ryan! Answers below…

      – We haven’t given much thought to overall data organization yet. The infrastructure allows us to group items in any way we choose, including having data sets belong to more than one group. But for the initial pilot phase it’ll just be one-offs.

      – Yes, everything will be getting DOIs. That functionality is something we’re working on right now.

      – Good point about the CC-BY license. We’ll talk to the appropriate folks to see what they recommend. We’re hoping for things to be as open as possible!

