
The story so far. ChatGPT, or other large language models, is not ideal for information retrieval alone because it often hallucinates or fabricates information, including references.

Warning: I am not a data or information scientist; this is a new area I am trying to learn about. The rise of large language models, such as GPT-3, GPT-4, ChatGPT, and BERT, has dramatically improved information retrieval in recent years.

Warning: I wrote this to explain high-level technical details of deep learning to myself; it is likely to include inaccuracies.

Warning: Speculative piece! I recently gave a keynote at the IATUL 2023 conference where I talked about the possible impact of large language models (LLMs) on academic libraries.
In my last blog post, I tried to identify seminal papers using a variety of methods, which fell into two main categories. The first was to look at text written by other authors mentioning that certain works were seminal. The most straightforward way was to search citation statements/contexts in scite.ai for keyword phrases like "seminal works" + topic.

I was recently asked a question: how do you find seminal papers/work/research? My first thought was just to read (articles, reference works, etc.)! But my second thought was that this is actually quite an interesting question, as I throw around the term "seminal paper" all the time in workshops.

As academic librarians helping early-stage researchers (Master's and PhD students), we are often asked to provide guidance on the literature review process in one-shot classes. One thing we tend to focus on during such sessions is the keyword search technique, though many of us also cover alternatives to keyword searching, such as citation searching, starting off with review articles, etc.

As I write this, it has been about a month since OpenAI unleashed ChatGPT, their GPT-3.5 large language model (LLM), and the online world is equal parts hype and confusion. In my corner of Twitter, among educators and librarians, there is worry about how LLMs might make detecting plagiarism difficult. I've written my take on this elsewhere, but it occurs to me that this isn't the first time I've seen people worry about automated tools.

Epistemic status: I have been reading technical papers on large language models on and off since 2020; I mostly get the gist but don't understand the deepest technical details. I have written and published on academic discovery search for most of my librarianship career since 2008. The way search engines work has not changed fundamentally since the 2000s.

I started seriously studying and blogging about Open Access in 2012, beginning with books by Walt Crawford and Peter Suber, and since then I've continued to read and muse about the issue, covering everything from how academic libraries may change when Open Access becomes the norm

I recently came across "Automated citation recommendation tools encourage questionable citations" (first brought to my attention by this blog post), an exceedingly thought-provoking article about bias in discovery tools, particularly new ones that can suggest what to reference based on the text of a paper. Leaving aside the issues such recommenders might bring, you don't have to think very hard to realize such tools will