nyhetsdataset - Infinite Lexicon

nyhetsdataset

Nyhetsdataset is a corpus of news articles assembled for research and development in natural language processing, text mining, and data-driven journalism. The collection typically focuses on Swedish-language news and may include items in other languages or translated content to support multilingual experimentation. The dataset is intended to enable tasks such as classification, summarization, information extraction, sentiment analysis, and trend detection over time.

Each article is accompanied by metadata. Common fields include article_id, title, lede or
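As a rough illustration of the metadata fields named above, the following minimal Python sketch models one article record. It covers only article_id, title, and lede. The field types, the JSON serialization, the ArticleRecord and parse_record names, and the sample values are all assumptions made for illustration, not a documented schema of the dataset.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ArticleRecord:
    # Only the three fields named in the text; the string types are assumptions.
    article_id: str
    title: str
    lede: str


def parse_record(line: str) -> ArticleRecord:
    """Parse one JSON object into an ArticleRecord, ignoring unknown keys."""
    raw = json.loads(line)
    return ArticleRecord(
        article_id=str(raw["article_id"]),  # treated as required in this sketch
        title=raw.get("title", ""),         # missing title falls back to ""
        lede=raw.get("lede", ""),           # missing lede falls back to ""
    )


# Invented sample line, for illustration only.
sample = '{"article_id": "a-001", "title": "Exempelrubrik", "lede": "Kort ingress.", "extra": 1}'
record = parse_record(sample)
print(record.title)
```

Ignoring unknown keys keeps the sketch tolerant of whatever additional fields a particular release of the corpus might carry.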

Collection and licensing practices vary. Articles are gathered from public-facing news websites, feeds, or licensed providers,

Annotations and enhancements are sometimes included, such as topic labels, editorial categories, or linguistic annotations (tokenization,

Applications include benchmarking NLP models, building news-aware recommender systems, and studying media coverage and discourse. Users

research-friendly

lemmatization).