Zenodo spam detection based on neural networks

This poster is part of the Open Repositories 2021 Poster Session which takes place in the week of June 7-10. We encourage you to ask questions and engage in discussion on this poster by using the comments feature. Authors will respond to comments during this week.

Authors:

Pablo Panero; A Ioannidis; JB Gonzalez

Poster description:

Nobody wants to receive unwanted content such as spam. The growth of spam has become a problem of the digital era, and digital repositories are not exempt. Hosting spam affects a service in several ways: the hardware cost of storing it, skewed usage statistics, the distribution of material that violates copyright and, most importantly, undesired content being served to users. Zenodo is a generalist research repository fostering open science practices. Because the barrier for submissions is low, it is an easy target for spam. The repository’s staff has spent many hours manually detecting spam content, a process now assisted by an automated spam classification system that still does not produce satisfactory results. Improvements to this classifier were based on an in-depth study of Zenodo’s data, a descriptive analysis, and feature extraction to corroborate the expert knowledge gathered over the years by Zenodo’s staff, as well as on a literature review of related topics such as spam classification in emails. Several types of neural network models were tested and showed promising results for future integration. However, because their false positive rate is still unacceptable, Random Forest classifiers still prevail over neural network models.
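The comparison above turns on the false positive rate, that is, the share of legitimate records that a classifier wrongly flags as spam. As a minimal sketch of how that metric is computed (the function name and the labels below are illustrative assumptions, not Zenodo's code or data):

```python
def false_positive_rate(y_true, y_pred):
    """Share of legitimate records (False) that were flagged as spam (True).

    y_true and y_pred are equal-length sequences of booleans, True = spam.
    """
    false_pos = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    true_neg = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    legit_total = false_pos + true_neg
    # With no legitimate records there is nothing to misclassify.
    return false_pos / legit_total if legit_total else 0.0


# Made-up example: four legitimate records and two spam records.
actual = [False, False, False, False, True, True]
predicted = [True, False, False, False, True, False]
print(false_positive_rate(actual, predicted))  # 0.25
```

In a repository setting, a false positive means a genuine upload is treated as spam, which is why a model with good overall accuracy can still be unacceptable.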

About the authors:

Pablo Panero (Presenting author) works at CERN as a software engineer for Zenodo, an open data service for the long tail of science. Pablo has been working on the Invenio framework for several years, most recently as a core developer of the new InvenioRDM. Previously, Pablo was the lead developer of CERN Search, the Invenio-based enterprise search solution.
Alex Ioannidis works at CERN’s IT department and is currently responsible for Zenodo, an open data service for the long tail of science. Alexandros has many years of experience as a software engineer, having led the design and operation of many production-level commercial and open-source software systems. He is also a core maintainer of the open-source Invenio framework for digital repositories. He has worked on promoting Software as a first-class citizen in the scientific research world.
José Benito González López leads the CERN section that is in charge of Invenio, the Digital Repository Framework, and several services that are running on top of it: CERN Document Server – CERN’s institutional repository, Zenodo – open data service for the long-tail of science, CERN Open Data Repositories, Digital Memory projects and also backend development of B2Share (EUDAT service). José is also a very experienced open source software developer and project manager with more than 15 years of experience, many of them devoted to the Open Source Project Indico which is the result of a European Project with the same name.

Availability of open government data in the Maghreb countries

Authors:

Elsayed Elsawy; Ahmed Shehata

Poster description:

Recently, many Arab governments have adopted open government data policies, whereby web technologies are harnessed to provide access to government data. The problem this study addresses is that open data practices in the Arab world are relatively recent and still in their nascent stages, which makes them the subject of constant criticism and evaluation aimed at bringing them in line with international practices. Moreover, no research has been carried out to identify how far the Maghreb Arab states have progressed in this field. The government open data portals in the Maghreb countries (Algeria, Libya, Mauritania, Morocco, and Tunisia) are considered the gateway to studying the reality of open data services in those countries, and through them it is possible to identify the extent to which government policies have achieved their stated goals. This study explores the current practices used in preserving and sharing data in open data portals in the Maghreb countries and assesses whether the structure and organization of data on these portals are consistent with the goals for which the portals were established. To achieve this, a set of criteria and characteristics that open data portals must meet was developed and then used as a framework for analysing the structure and organization of data on these portals. The study sample comprised three open data portals: those of Tunisia, Algeria, and Morocco. To analyse the portals in the sample, a list of 47 criteria was developed. The results are expected to support a better understanding of current open data practices in the study sample and to suggest ways of improving the current initiatives.

About the authors:

Dr Elsayed Elsawy

Assistant Professor of Documents and Archives at Sultan Qaboos University (Sultanate of Oman). PhD from Lyon University (France), 2006. Worked at a number of Egyptian and Arab universities (Imam Muhammad bin Saud University in Saudi Arabia, Sultan Qaboos University in the Sultanate of Oman, Tanta University, Al-Azhar University, Mansoura University and Alexandria University in Egypt). Supervised a number of master’s and doctoral dissertations at the universities of Tanta, Al-Azhar (Egypt) and Sultan Qaboos (Oman). Published more than 30 studies in Arabic, French and English on Web 2.0 applications, cloud computing in records management, medical records systems, digitization of documents, archival metadata, records management systems, electronic archives, modern services for archival institutions, and education with documents.

Dr Ahmed Shehata

Dr. Ahmed Shehata is an assistant professor of Library and Information Science with a career spanning over twelve years. He worked in Egypt at Minia University as a lecturer in the Library and Information Science Department and currently works as an assistant professor at Sultan Qaboos University, Oman. He was awarded a PhD by the Department of Information Studies at Aberystwyth University, UK, in 2016. Dr. Ahmed has supervised many master’s degree students at Minia University (Egypt) and Sultan Qaboos University (Oman) and has published many scholarly papers in Arabic and English in journals indexed in Clarivate and Scopus. His research interests focus mainly on information behaviour, scholarly communication and scholarly publishing. Dr. Ahmed was awarded the Sharjah 2018 award of library literature.

PeruCRIS: A National Research Information Infrastructure based on DSpace-CRIS

Authors:

Claudia Córdova; Abel Del Carpio; Francisco Talavera; Susanna Mornati; Andrea Bollini; Claudio Cortese; Corrado Lombardi; Alfonso Maza; Ana Puente de la Puebla

Poster description:

PeruCRIS is the Peruvian project for setting up and operating a National Information Network on Science, Technology and Technological Innovation. It is based on open source software and open standards, specifically as an extension of the existing network of Peruvian open access repositories. The network is designed to be interoperable with CRIS systems and Open Access Repositories, and also to allow direct submission into the national central hub (PeruCRIS Platform) from institutions and researchers. PeruCRIS Platform goes beyond mere aggregation, supporting data normalization, enrichment and curation for collected data, and direct editing in the Directorios. These roles are reserved for Concytec staff and serve as a means of ensuring data completeness, accuracy and general quality. A particular CERIF profile and a set of controlled vocabularies are being developed to accommodate particular information needs of national scope. An initial partial release of the main directories is scheduled for Q2 2021.

Repositories and Publishers: AgEcon Search forging new relationships

Authors:

Linda Eells and Julie Kelly

Poster description:

Many repositories do not work with publishers on a regular basis, but that is not true for AgEcon Search, the subject repository in agricultural and applied economics. With journal articles as the most common document type and over 40 professional societies among the 340 groups that contribute material, we have formed a number of different relationships with publishers. The most common is that AgEcon Search hosts material initially created and made available by a small publisher, often a professional society. In one uniquely beneficial case, we receive and upload PDFs of each individual article from a large commercial publisher after an embargo period. In the opposite, negative situation, a publisher asked us to remove years of articles after a society moved to a large commercial publisher and all older material was placed behind the publisher’s paywall. In recent collaborations, we assist publishers with creating DOIs and are negotiating to serve as the pre-print server for one society journal that contracts with a commercial publisher. We encourage other repositories to consider how they might work together with publishers to make research materials more widely available, including institutional repositories whose faculty serve on national or international society journal editorial boards.

About the authors:

Linda Eells has been a librarian at the University of Minnesota since 2003, providing research and instruction services and co-coordinating the AgEcon Search disciplinary repository. Her research interests include open access publishing models and repository development and management. She holds an MLIS from the University of Wisconsin-Milwaukee and an M.S. in Conservation Biology from the University of Minnesota-Twin Cities.

Julie Kelly (Presenting author) is a science librarian at the University of Minnesota, serving as co-coordinator of AgEcon Search, a disciplinary repository, as well as acting as library liaison for ecology and horticulture. She holds an MLIS from the University of Wisconsin-Milwaukee and an M.S. in Biology from the University of Minnesota-Duluth. Her research interests include historic data and grey literature.

Bepress to DSpace in PJs: Migrating two open repositories from home

Authors:

Julia Corrice; Chloe McLaren; Jim Del Rosso; Gail Steinhart

Poster description:

In 2019, Cornell University Library (CUL) made the decision to migrate multiple institutional repositories from bepress into eCommons, its locally supported DSpace instance. When migration project planning began in February of 2020, we could not have anticipated the impact the COVID-19 pandemic would soon have on our work. We’ll share some background (how Cornell ended up with multiple institutional repositories and our rationale for consolidation) and then describe the work itself, with particular focus on the challenges we faced and the approaches we used to complete this project successfully during a pandemic, with all staff making an abrupt and unplanned transition to working remotely.

About the authors:

Julia Corrice (Presenting author): Julia Corrice is the Metadata Operations Librarian at Cornell University Library, where she focuses on metadata maintenance activities, primarily the integration of metadata between systems and extending relationships between resources. Before joining Cornell, she worked as the Digital Services Librarian for the South Central Regional Library Council, a member of the Empire State Library Network. She received her MLIS in 2012 from Syracuse University with a CAS in Digital Libraries.

Chloe McLaren: Chloe McLaren is the Metadata Projects Librarian at Cornell University Library. She currently works broadly in the institutional repository and manages the library’s streaming video platform. Before becoming a librarian, she worked as a projectionist. She received her MLIS in 2010 from the University of Illinois at Urbana-Champaign.

A Recommendation System for an Open Archive

Authors:

Gulce Bal Bozkurt and Gozde Boztepe Karatas

Poster description:

In this study, we developed a recommender system for OpenMETU, the open archive of Middle East Technical University. Our system recommends items using a content-based approach, in which the properties of items are vital because the recommendations are based on them. Our recommendation system is based on author, abstract, title, and subject similarity. To calculate these similarities, we first extract features of each item using natural language processing techniques such as TF-IDF and the Universal Sentence Encoder. We use cosine distance to measure the similarity between two items. We then give a different weight to each of the similarities to calculate the overall score. Our system recommends the 5 most similar items to a visitor viewing an item in our archive.
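The pipeline described above (per-field features, cosine similarity, a weighted overall score, top 5 items) can be illustrated with a small standard-library sketch. This is not the OpenMETU implementation: it uses only a simple smoothed TF-IDF and leaves out the Universal Sentence Encoder, and the field weights, function names and sample records are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical field weights; the poster does not state the actual values.
WEIGHTS = {"title": 0.3, "abstract": 0.4, "subject": 0.2, "author": 0.1}


def tfidf_vectors(texts):
    """Return one sparse TF-IDF vector (term -> weight) per text."""
    tokenized = [text.lower().split() for text in texts]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(items, index, top_n=5):
    """Indices of the top_n items most similar to items[index]."""
    vectors = {f: tfidf_vectors([item[f] for item in items]) for f in WEIGHTS}
    scored = []
    for other in range(len(items)):
        if other == index:
            continue
        score = sum(
            weight * cosine_similarity(vectors[f][index], vectors[f][other])
            for f, weight in WEIGHTS.items()
        )
        scored.append((score, other))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first
    return [other for _, other in scored[:top_n]]


# Made-up records, purely for demonstration.
items = [
    {"title": "deep learning for image classification",
     "abstract": "we train neural networks to classify images",
     "subject": "computer science", "author": "a. yilmaz"},
    {"title": "image classification with neural networks",
     "abstract": "neural networks classify images accurately",
     "subject": "computer science", "author": "a. yilmaz"},
    {"title": "soil erosion in river basins",
     "abstract": "field measurements of sediment transport",
     "subject": "geology", "author": "b. kaya"},
]
print(recommend(items, 0))  # [1, 2]
```

Keeping one vector space per field, rather than concatenating all text, is what allows each field's similarity to be weighted separately before the scores are summed.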

DataCORE – Grappling With Big Files and Big Problems

Authors:

James Halliday and Brian Keese

Poster description:

DataCORE is a brand new Samvera-based repository at Indiana University focusing exclusively on research data. This poster shows how the system works, including how data flows in and out and some of the challenges the team overcame in implementing it. The changing landscape of handling large data and moving it around has necessitated some updates to the team’s workflows.

About the authors:

James Halliday (Presenting author) is the Head of Repository Technology for Indiana University Libraries in Bloomington. Prior to this role, he has worked as a developer for Indiana University Libraries since 2001.

Brian Keese is a senior software engineer for Indiana University Libraries. He works on various projects in the Scholarly Communications department. Prior to this role, Brian worked for several years as a developer for the Avalon project.

RCIN – Digital Repository of Scientific Institutes in Poland

Authors:

Błażej Betański; Tomasz Parkoła; Natalia Jeszke

Poster description:

This poster presents RCIN, a digital repository developed in the OZwRCIN project and an example of cooperation between thematically diverse scientific institutes across Poland. RCIN is developed to give access to and promote digital resources coming from various Polish scientific institutes. The poster covers the technical solutions applied to increase the accessibility of data, e.g. various metadata schemes such as Dublin Core and Darwin Core and accessibility of the user interface, as well as large-scale digitisation and the underlying infrastructure (software and hardware). RCIN is built as a large-scale repository that holds digital assets from all participating institutions and provides multiple ways to access the data, including a web portal for the general public, multiple smaller institutional repositories that give access to resources provided by specific institutions, and various APIs.

About the authors:

Tomasz Parkoła is the Head of the Digital Libraries and Knowledge Platforms Department at Poznań Supercomputing and Networking Center (Poznań, Poland), where he manages research & development teams responsible for digital humanities infrastructure (http://ehum.psnc.pl/en/main-page/), products and services for digital libraries and cultural heritage (https://dingo.psnc.pl/) as well as the Europeana-accredited Polish metadata aggregator FBC (https://fbc.pionier.net.pl/). He has been involved in national and international research and development projects with main themes on data access & processing, long-term preservation, digitization workflows as well as data aggregation & interoperability. He is a board member of the IMPACT Centre of Competence (since 2011), a co-head of DARIAH-EU’s Virtual Competence Centre on Scholarly Content Management (since 2019) and a Product Board member of the Open Preservation Foundation (since 2019). He was a programme committee member for iPRES 2014, iPRES 2016, DATeCH 2017 and DARIAH AE 2019. He is an author or co-author of several dozen scientific and popular science publications.

Błażej Betański (Presenting author) – M.Sc. Eng. in computer science, Team lead in the Digital Libraries and Knowledge Platforms Department at Poznań Supercomputing and Networking Center, where he has worked since 2014. Leader of the team developing the DInGO software package in the department. Involved in research and development in the field of software for building digital libraries, museums and digital repositories as well as systems for managing the digitization process and long-term preservation. He has participated in the development and implementation of dozens of systems for presenting cultural resources online, in Poland and abroad.

Natalia Jeszke – M.A. in social science, Digital Content Specialist in the Digital Libraries and Knowledge Platforms Department at Poznań Supercomputing and Networking Center. Librarian and Senior Librarian from 2010 to 2018; has worked at Poznań Supercomputing and Networking Center since 2018. Involved in projects such as the promotion of cultural heritage in Federacja Bibliotek Cyfrowych and Europeana. At the Digital Repository of Scientific Institutes (DRSI), responsible for creating valuable metadata for digital objects and maintaining metadata standards for digital libraries in the context of processing them for search engines.

Status of students’ graduate (masters) theses in repositories of six European universities

Authors:

Danijel Gudelj; Ljiljana Poljak; Vicko Tomić; Mirta Matošić; Matko Marušić; Ana Marušić

Poster description:

Institutions are often apprehensive when moving their repository to a new repository system, as this will affect hundreds of thousands of records collected over decades and the discoverability of their institution’s research. This poster will present statistics on item usage and discovery after repository migration, using data from an external independent repository monitoring service to show an increase in usage of up to 250% and explain how this can be achieved.

About the authors:

Danijel Gudelj

Danijel Gudelj holds an M.A. in sociology and croatology from the Centre for Croatian Studies, University of Zagreb. He is currently a member of the editorial team of ST-OPEN, a new overlay+ journal founded by the University of Split. For the last decade, he has worked at the University of Split as an events manager. His interests include research in scientometrics, scientific ethics, scientific journal editing, and cultural and social events. He has published several papers.

Ljiljana Poljak

Ljiljana Poljak holds a master’s degree in library and information science from the University of Zadar and is currently a PhD student at the same university. She works as a librarian at the University of Split Library Department of Research Services and Development, where her responsibilities include analysis of scientific productivity, web development, institutional repository management, and open science advocacy.

Vicko Tomić (Presenting author)

Vicko Tomić is the Editor-in-Chief of the overlay+ journal ST-OPEN, founded by the University of Split. He graduated in sociology from the Faculty of Humanities and Social Sciences in Split and is currently a PhD candidate at the University of Split School of Medicine. His main fields of scientific interest are qualitative methodology, research ethics and research integrity. He has published several scientific papers and presented at several national and international scientific conferences. He is a member of the Croatian Sociological Society.

Mirta Matošić

Mirta Matošić is a Librarian Advisor at the University of Split Library. She received her Bachelor’s degree in Management Information Systems and International Business from the University of Oklahoma, and her Master’s degree in Library and Information Sciences from the University of Zadar. She is pursuing her postgraduate degree at the University of Zadar in the Knowledge Society and Information Transfer programme. Her main research interests are academic librarianship, library management and evaluation.

Matko Marušić, MD, PhD

Dr Marušić is the founder and Editor emeritus of the ST-OPEN journal. He is a graduate of Zagreb University School of Medicine (1970), where he also obtained his PhD (1975) and a professorship at the Department of Physiology (1980). His research has mostly concentrated on the immunology of bone marrow transplantation, but he has also published in the fields of oncology, philosophy of science, and medical education. Dr Marušić writes short stories, and two collections of his works became required reading in primary schools in Croatia.

Ana Marušić

Ana Marušić is Professor of Anatomy and Chair of the Department of Research in Biomedicine and Health at the University of Split School of Medicine, Split, Croatia. Since 2012, she has been Honorary Professor of the Edinburgh University, Edinburgh, Scotland, UK. Prof. Marušić was the Editor in Chief of the Croatian Medical Journal for 20 years and is now the Founder and Co-editor in Chief of the Journal of Global Health. She is Past President of the World Association of Medical Editors and of the Council of Science Editors, and is currently the Vice President of the European Association of Science Editors. Professor Marušić received her MD and PhD degrees from the University of Zagreb, Croatia. Her biomedical research is focused on the interactions between the immune and bone systems. Her research interests also include peer review and research integrity. She has been the principal investigator of several national and international research grants and has given a number of invited lectures at international meetings. Prof. Marušić is the founder of the Croatian branch of the Cochrane Collaboration and creator of the Croatian public registry of clinical trials. She is on the Steering Group of the EQUATOR Network, an international initiative for promoting transparent and accurate reporting of health research.

A Framework for an Arabic Terminology Management System (TMS) using Artificial Intelligence

Author:

Sherine Mahmoud Eid

Poster description:

Many translation projects in the Arab world were not implemented, or were delayed, because of manual translation processes and/or budgetary constraints. The poster proposes the use of artificial intelligence (AI), in the form of Arabic machine translation (MT), to develop an Arabic Terminology Management System (TMS), followed by human post-editing of the machine translation output. The system is intended to ease the translation process and generate output products such as glossaries and taxonomies.

(Enter the full screen by opening the options menu)

About the author:

Sherine Eid is Head of the International Library Projects Unit at Bibliotheca Alexandrina, Egypt, and has twenty-one years’ experience in the Library and Information Science field. Sherine is a member of the LD4 Steering Group. She is a reviewer for international journals and conferences and has contributed to many publications published by IFLA, Bibliotheca Alexandrina, and Taylor and Francis. She holds an MBA degree in Global Marketing from Paris ESLSCA Business School.

Registration is filling up!

While most of our workshops have reached capacity, you can still register for the main Open Repositories 2021 conference. You can see the full program here: https://www.conftool.net/or2021/sessions.php and register here: http://openrepos2021.eventbrite.com/. Only a few tickets are left, so act quickly!

We have loads of great presentations, panels, developer tracks, 24×7 sessions, and posters as well as three exciting keynote sessions.

Please note: once you register on Eventbrite, you will receive an email inviting you to join Sched within a day or two. Sched is where all links to sessions will reside. For more information, please see the attendee guide.

Registration is open!

Please join us for the 16th annual Open Repositories Conference.

Keynote speakers will include Sir Jeremy Farrar (Director of the Wellcome Trust), Dr. Bianca Amaro (President of LA Referencia, a Network of Repositories of Open Access to Science in Latin America, and Coordinator of the Brazilian Open Science Program at the Brazilian Institute of Information in Science and Technology), Dr. Tahu Kukutai (Te Rūnanga Tātari Tatauranga/National Institute of Demographic and Economic Analysis, Te Whare Wānanga o Waikato/The University of Waikato) and Dr. Stephanie Russo Carroll (Associate Director and Manager, Tribal Health Program and Assistant Professor, Mel and Enid Zuckerman College of Public Health, University of Arizona). Visit the Keynotes page for further information.

Workshops will be held on June 7th with the main conference beginning on June 8th. Workshop space is limited so there will be a cap on registration for each of the workshops. Some workshops do run concurrently so look at times carefully. Please visit the preliminary Programme schedule.

Registration is free. Once you have registered, you will be provided with information on how to access the conference on Zoom via Sched about four weeks before the conference begins. We welcome you to register!

Call for Proposals and key dates for OR 2021 virtual conference

The 2021 Call for Proposals is available. It describes the process for submitting presentations, panels, posters and workshops.

Key dates for OR 2021:

  • 25 January 2021: Deadline for submissions
  • 31 March 2021: Submitters notified of acceptance of full presentation, 24×7, workshop, poster and developer track proposals
  • Early April 2021: Registration opens (Registration is free!)
  • 7-10 June 2021: OR2021 conference

Postponement of in-person conference

The Open Repositories Steering Committee and the Stellenbosch University hosts have decided to postpone the in-person conference planned for summer 2021 in Stellenbosch, South Africa, and to hold a completely virtual conference instead. The uncertainty around the continuing pandemic, travel restrictions, and the budget crises many of our institutions are facing have all contributed to this decision. For many of us the true value of Open Repositories is the international conversations and community that it fosters; we felt strongly that, even if conditions improve, an in-person or hybrid virtual/in-person conference would not be successful given the likely poor attendance.

The program committee will be working to reshape the conference into a virtual one including reaching out to those who had their 2020 OR proposals accepted and deferred. We appreciate your patience as we do this work.

We want to reaffirm our plan to hold an Open Repositories conference in South Africa. After working with both our Stellenbosch and our Denver hosts (who were slated to host the 2022 OR), we have decided to continue to plan for Denver for 2022 and to plan for Stellenbosch in 2023.

Please do not hesitate to reach out to me as Chair or Claire Knowles as Vice-Chair of the OR Steering Committee if you have questions.

Thank you –

Sarah Shreeves, Chair ORSC

sshreeves@arizona.edu

 

Claire Knowles, Vice Chair ORSC

c.knowles1@leeds.ac.uk