
Dataset construction method of cross-lingual summarization based on filtering and text augmentation

Existing cross-lingual summarization (CLS) datasets suffer from inconsistent sample quality and small scale. To address these problems, we propose a method that jointly supervises quality and scale to build CLS datasets. In terms of quality supervision, the method adopts a multi-strategy filtering algo...


Bibliographic Details
Main authors:	Pan, Hangyu, Xi, Yaoyi, Wang, Ling, Nan, Yu, Su, Zhizhong, Cao, Rong
Format:	Online Article Text
Language:	English
Published:	PeerJ Inc. 2023
Subjects:	Artificial Intelligence
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10280405/ https://www.ncbi.nlm.nih.gov/pubmed/37346668 http://dx.doi.org/10.7717/peerj-cs.1299

Similar items

Reaching for upper bound ROUGE score of extractive summarization methods
by: Akhmetov, Iskander, et al.
Published: (2022)

Syntactic- and morphology-based text augmentation framework for Arabic sentiment analysis
by: Duwairi, Rehab, et al.
Published: (2021)

Generative adversarial network based adaptive data augmentation for handwritten Arabic text recognition
by: Eltay, Mohamed, et al.
Published: (2022)

Small facial image dataset augmentation using conditional GANs based on incomplete edge feature input
by: Hung, Shih-Kai, et al.
Published: (2021)

Introducing DynaPTI–constructing a dynamic patent technology indicator using text mining and machine learning
by: Freunek, Michael, et al.
Published: (2023)

Editorial: Text complexity and simplification
by: Ermakova, Liana, et al.
Published: (2023)

Lung Cancer Segmentation With Transfer Learning: Usefulness of a Pretrained Model Constructed From an Artificial Dataset Generated Using a Generative Adversarial Network
by: Nishio, Mizuho, et al.
Published: (2021)

Computational Modeling of Stereotype Content in Text
by: Fraser, Kathleen C., et al.
Published: (2022)

Unanswerable Questions About Images and Texts
by: Davis, Ernest
Published: (2020)

Distribution-preserving data augmentation
by: Saran, Nurdan Ayse, et al.
Published: (2021)

Error driven synapse augmented neurogenesis
by: Perrett, Adam, et al.
Published: (2022)

Text-Graph Enhanced Knowledge Graph Representation Learning
by: Hu, Linmei, et al.
Published: (2021)

Clear, easy, plain, and simple as keywords for text simplification
by: Vecchiato, Sara
Published: (2022)

Image–text coherence and its implications for multimodal AI
by: Alikhani, Malihe, et al.
Published: (2023)

Event classification from the Urdu language text on social media
by: Awan, Malik Daler Ali, et al.
Published: (2021)

Unsupervised Text Segmentation Predicts Eye Fixations During Reading
by: Yang, Jinbiao, et al.
Published: (2022)

Effect of stemming on text similarity for Arabic language at sentence level
by: Alhawarat, Mohammad O., et al.
Published: (2021)

Fusion of text and graph information for machine learning problems on networks
by: Makarov, Ilya, et al.
Published: (2021)

Chinese text classification by combining Chinese-BERTology-wwm and GCN
by: Xu, Xue, et al.
Published: (2023)

Latent based temporal optimization approach for improving the performance of collaborative filtering
by: Al-Hadi, Ismail Ahmed Al-Qasem, et al.
Published: (2020)

Enhancing neural collaborative filtering using hybrid feature selection for recommendation
by: Drammeh, Baboucarr, et al.
Published: (2023)

Boolean logic algebra driven similarity measure for text based applications
by: Abdalla, Hassan I., et al.
Published: (2021)

Online Brand Community User Segments: A Text Mining Approach
by: Ge, Ruichen, et al.
Published: (2022)

Multi-objective evolutionary optimization for dimensionality reduction of texts represented by synsets
by: Vélez de Mendizabal, Iñaki, et al.
Published: (2023)

Editorial: Deep learning with limited labeled data for vision, audio, and text
by: Orescanin, Marko, et al.
Published: (2023)

Understanding image-text relations and news values for multimodal news analysis
by: Cheema, Gullal S., et al.
Published: (2023)

Augmenting Semantic Lexicons Using Word Embeddings and Transfer Learning
by: Alshaabi, Thayer, et al.
Published: (2022)

Improving patient rehabilitation performance in exercise games using collaborative filtering approach
by: Ismail, Waidah, et al.
Published: (2021)

A Perspective on Building Ethical Datasets for Children's Conversational Agents
by: Bailey, Jakki O., et al.
Published: (2021)

Grounding human-object interaction to affordance behavior in multimodal datasets
by: Henlein, Alexander, et al.
Published: (2023)

Modelling Speaker Attribution in Narrative Texts With Biased and Bias-Adjustable Neural Networks
by: Dönicke, Tillmann, et al.
Published: (2022)

How We Do Things With Words: Analyzing Text as Social and Cultural Data
by: Nguyen, Dong, et al.
Published: (2020)

Using Topic Modeling Methods for Short-Text Data: A Comparative Analysis
by: Albalawi, Rania, et al.
Published: (2020)

A hybrid model of complexity estimation: Evidence from Russian legal texts
by: Blinova, Olga, et al.
Published: (2022)

Efficient algorithm for directed text detection based on rotation decoupled bounding box
by: Wei, Songma, et al.
Published: (2023)

Deep skin diseases diagnostic system with Dual-channel Image and Extracted Text
by: Li, Huanyu, et al.
Published: (2023)

Data augmentation based malware detection using convolutional neural networks
by: Catak, Ferhat Ozgur, et al.
Published: (2021)

Are GRU Cells More Specific and LSTM Cells More Sensitive in Motive Classification of Text?
by: Gruber, Nicole, et al.
Published: (2020)

Research on sentiment classification for netizens based on the BERT-BiLSTM-TextCNN model
by: Jiang, Xuchu, et al.
Published: (2022)

Improving text mining in plant health domain with GAN and/or pre-trained language model
by: Jiang, Shufan, et al.
Published: (2023)
