
Stories by Research Graph on Medium

Tags: Megalodon, Long-texts, Transformer-architecture | Computer Science | English

An improvement architecture superior to the Transformer, proposed by Meta
Author: Qingqin Fang (ORCID: 0009-0003-5348-4264)
Introduction: Recently, researchers from Meta and the University of Southern California have introduced a model called Megalodon. They claim that this model can expand the context window of language models to handle millions of tokens without overwhelming your memory.

Tags: Large-language-models, Artificial-intelligence, Transformers, Natural-language-process | Computer Science | English

Understanding the Evolutionary Journey of LLMs
Author: Wenyi Pi (ORCID: 0009-0002-2884-2771)
Introduction: When we talk about large language models (LLMs), we are actually referring to a type of advanced software that can communicate in a human-like manner. These models have the amazing ability to understand complex contexts and generate content that is coherent and has a human feel.

Tags: Modular, Naive, Advanced, Retrieval-augmented-gen | Computer Science | English

From Naive to Modular: Tracing the Evolution of Retrieval-Augmented Generation
Author: Vaibhav Khobragade (ORCID: 0009-0009-8807-5982)
Introduction: Large Language Models (LLMs) have achieved remarkable success.

Tags: Large-language-models, Rlhf, Fine-tuning | Computer Science | English

Supervised Fine-tuning, Reinforcement Learning from Human Feedback and the latest SteerLM
Author: Xuzeng He (ORCID: 0009-0005-7317-7426)
Introduction: Large Language Models (LLMs), usually trained on extensive text data, can demonstrate remarkable capabilities in handling various tasks with state-of-the-art performance. However, people nowadays typically want something more personalised instead of a general solution.

Tags: Natural-language-processi, Transformers, Artificial-intelligence | Computer Science | English

Attention mechanism not getting enough attention
Author: Dhruv Gupta (ORCID: 0009-0004-7109-5403)
Introduction: As discussed in this article, RNNs were incapable of learning long-term dependencies. To solve this issue, both LSTMs and GRUs were introduced. However, even though LSTMs and GRUs did a fairly decent job for textual data, they did not perform well.

Tags: Fake-news, Artificial-intelligence, Large-language-models | Computer Science | English

Large Language Models for Fake News Generation and Detection
Author: Amanda Kau (ORCID: 0009-0004-4949-9284)
Introduction: In recent years, fake news has become an increasing concern for many, and for good reason. Newspapers, which we once trusted to deliver credible news through accountable journalists, are vanishing en masse along with their writers.

Tags: Naturallanguageprocessing, Lstm, Artificial-intelligence, Recurrent-neural-network | Computer Science | English

The Three Oldest Pillars of NLP
Author: Dhruv Gupta (ORCID: 0009-0004-7109-5403)
Introduction: Natural Language Processing (NLP) has almost become synonymous with Large Language Models (LLMs), Generative AI, and fancy chatbots. With the ever-increasing amount of textual data and exponential growth in computational knowledge, these models are improving every day.

Tags: Large-language-models, Framework, Retrieval-augmented | Computer Science | English

A Unified and Collaborative Framework for LLM
Author: Qingqin Fang (ORCID: 0009-0003-5348-4264)
Introduction: In today’s rapidly evolving field of artificial intelligence, large language models (LLMs) are demonstrating unprecedented potential. In particular, the Retrieval-Augmented Generation (RAG) architecture has become a hot topic in AI technology due to its unique technical capabilities.

Tags: Large-language-models, Artificial-intelligence, Retrieval-augmented-gen, Misinformation-research | Computer Science | English

Exploring innovative Strategies in Combating Misinformation with Enhanced Multimodal Understanding
Author: Wenyi Pi (ORCID: 0009-0002-2884-2771)
Introduction: Misinformation refers to false or inaccurate information that is often given to someone in a deliberate attempt to make them believe something that is not true. This has a significant negative impact on public health, political stability, and social trust and harmony.

Tags: Security-assessments, Ai-assistance, Large-language-models | Computer Science | English

Latest effort in assessing the security of the code generated by large language models
Author: Xuzeng He (ORCID: 0009-0005-7317-7426)
Introduction: With the surge of Large Language Models (LLMs), there is a rising trend among developers to use them to assist with daily code writing. Famous products include GitHub Copilot or simply ChatGPT.

Tags: Social-network, Knowledge-graph, Pretrained-language-model | Computer Science | English

Latest findings in pre-training graphs and using them for link recommendation
Author: Xuzeng He (ORCID: 0009-0005-7317-7426)
Introduction: A graph, in short, is a description of items linked by relations, where the items of a graph are called nodes (or vertices) and their relations are called edges (or links). Examples of graphs can include social networks (e.g. Instagram) or knowledge graphs (e.g. Wikipedia). In Instagram