LIVIVO - The Search Portal for Life Sciences

zur deutschen Oberfläche wechseln
Advanced search

Search results

Result 1 - 10 of total 19

Search options

  1. Book ; Online: Reproducibility Analysis and Enhancements for Multi-Aspect Dense Retriever with Aspect Learning

    Bi, Keping / Sun, Xiaojie / Guo, Jiafeng / Cheng, Xueqi

    2024  

    Abstract: Multi-aspect dense retrieval aims to incorporate aspect information (e.g., brand and category) into dual encoders to facilitate relevance matching. As an early and representative multi-aspect dense retriever, MADRAL learns several extra aspect embeddings ...

    Abstract Multi-aspect dense retrieval aims to incorporate aspect information (e.g., brand and category) into dual encoders to facilitate relevance matching. As an early and representative multi-aspect dense retriever, MADRAL learns several extra aspect embeddings and fuses the explicit aspects with an implicit aspect "OTHER" for final representation. MADRAL was evaluated on proprietary data and its code was not released, making it challenging to validate its effectiveness on other datasets. We failed to reproduce its effectiveness on the public MA-Amazon data, motivating us to probe the reasons and re-examine its components. We propose several component alternatives for comparisons, including replacing "OTHER" with "CLS" and representing aspects with the first several content tokens. Through extensive experiments, we confirm that learning "OTHER" from scratch in aspect fusion is harmful. In contrast, our proposed variants can greatly enhance the retrieval performance. Our research not only sheds light on the limitations of MADRAL but also provides valuable insights for future studies on more powerful multi-aspect dense retrieval models. Code will be released at: https://github.com/sunxiaojie99/Reproducibility-for-MADRAL.

    Comment: accepted by ecir2024 as a reproducibility paper
    Keywords Computer Science - Information Retrieval
    Subject code 006
    Publishing date 2024-01-07
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

    More links

    Kategorien

  2. Book ; Online: A Comparative Study of Training Objectives for Clarification Facet Generation

    Ni, Shiyu / Bi, Keping / Guo, Jiafeng / Cheng, Xueqi

    2023  

    Abstract: Due to the ambiguity and vagueness of a user query, it is essential to identify the query facets for the clarification of user intents. Existing work on query facet generation has achieved compelling performance by sequentially predicting the next facet ... ...

    Abstract Due to the ambiguity and vagueness of a user query, it is essential to identify the query facets for the clarification of user intents. Existing work on query facet generation has achieved compelling performance by sequentially predicting the next facet given previously generated facets based on pre-trained language generation models such as BART. Given a query, there are mainly two types of training objectives to guide the facet generation models. One is to generate the default sequence of ground-truth facets, and the other is to enumerate all the permutations of ground-truth facets and use the sequence that has the minimum loss for model updates. The second is permutation-invariant while the first is not. In this paper, we aim to conduct a systematic comparative study of various types of training objectives, with different properties of not only whether it is permutation-invariant but also whether it conducts sequential prediction and whether it can control the count of output facets. To this end, we propose another three training objectives of different aforementioned properties. For comprehensive comparisons, besides the commonly used evaluation that measures the matching with ground-truth facets, we also introduce two diversity metrics to measure the diversity of the generated facets. Based on an open-domain query facet dataset, i.e., MIMICS, we conduct extensive analyses and show the pros and cons of each method, which could shed light on model training for clarification facet generation. The code can be found at \url{https://github.com/ShiyuNee/Facet-Generation}
    Keywords Computer Science - Information Retrieval
    Subject code 006
    Publishing date 2023-10-01
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

    More links

    Kategorien

  3. Book ; Online: CIR at the NTCIR-17 ULTRE-2 Task

    Yu, Lulu / Bi, Keping / Guo, Jiafeng / Cheng, Xueqi

    2023  

    Abstract: The Chinese academy of sciences Information Retrieval team (CIR) has participated in the NTCIR-17 ULTRE-2 task. This paper describes our approaches and reports our results on the ULTRE-2 task. We recognize the issue of false negatives in the Baidu search ...

    Abstract The Chinese academy of sciences Information Retrieval team (CIR) has participated in the NTCIR-17 ULTRE-2 task. This paper describes our approaches and reports our results on the ULTRE-2 task. We recognize the issue of false negatives in the Baidu search data in this competition is very severe, much more severe than position bias. Hence, we adopt the Dual Learning Algorithm (DLA) to address the position bias and use it as an auxiliary model to study how to alleviate the false negative issue. We approach the problem from two perspectives: 1) correcting the labels for non-clicked items by a relevance judgment model trained from DLA, and learn a new ranker that is initialized from DLA; 2) including random documents as true negatives and documents that have partial matching as hard negatives. Both methods can enhance the model performance and our best method has achieved nDCG@10 of 0.5355, which is 2.66% better than the best score from the organizer.

    Comment: 5 pages, 1 figure, NTCIR-17
    Keywords Computer Science - Information Retrieval
    Subject code 006
    Publishing date 2023-10-18
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

    More links

    Kategorien

  4. Book ; Online: Ensemble Ranking Model with Multiple Pretraining Strategies for Web Search

    Sun, Xiaojie / Yu, Lulu / Wang, Yiting / Bi, Keping / Guo, Jiafeng

    2023  

    Abstract: An effective ranking model usually requires a large amount of training data to learn the relevance between documents and queries. User clicks are often used as training data since they can indicate relevance and are cheap to collect, but they contain ... ...

    Abstract An effective ranking model usually requires a large amount of training data to learn the relevance between documents and queries. User clicks are often used as training data since they can indicate relevance and are cheap to collect, but they contain substantial bias and noise. There has been some work on mitigating various types of bias in simulated user clicks to train effective learning-to-rank models based on multiple features. However, how to effectively use such methods on large-scale pre-trained models with real-world click data is unknown. To alleviate the data bias in the real world, we incorporate heuristic-based features, refine the ranking objective, add random negatives, and calibrate the propensity calculation in the pre-training stage. Then we fine-tune several pre-trained models and train an ensemble model to aggregate all the predictions from various pre-trained models with human-annotation data in the fine-tuning stage. Our approaches won 3rd place in the "Pre-training for Web Search" task in WSDM Cup 2023 and are 22.6% better than the 4th-ranked team.

    Comment: 4 pages, 2 figures, WSDM Cup 2023
    Keywords Computer Science - Information Retrieval
    Subject code 006
    Publishing date 2023-02-18
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

    More links

    Kategorien

  5. Book ; Online: Feature-Enhanced Network with Hybrid Debiasing Strategies for Unbiased Learning to Rank

    Yu, Lulu / Wang, Yiting / Sun, Xiaojie / Bi, Keping / Guo, Jiafeng

    2023  

    Abstract: Unbiased learning to rank (ULTR) aims to mitigate various biases existing in user clicks, such as position bias, trust bias, presentation bias, and learn an effective ranker. In this paper, we introduce our winning approach for the "Unbiased Learning to ... ...

    Abstract Unbiased learning to rank (ULTR) aims to mitigate various biases existing in user clicks, such as position bias, trust bias, presentation bias, and learn an effective ranker. In this paper, we introduce our winning approach for the "Unbiased Learning to Rank" task in WSDM Cup 2023. We find that the provided data is severely biased so neural models trained directly with the top 10 results with click information are unsatisfactory. So we extract multiple heuristic-based features for multi-fields of the results, adjust the click labels, add true negatives, and re-weight the samples during model training. Since the propensities learned by existing ULTR methods are not decreasing w.r.t. positions, we also calibrate the propensities according to the click ratios and ensemble the models trained in two different ways. Our method won the 3rd prize with a DCG@10 score of 9.80, which is 1.1% worse than the 2nd and 25.3% higher than the 4th.

    Comment: 5 pages, 1 figure, WSDM Cup 2023
    Keywords Computer Science - Information Retrieval
    Subject code 006
    Publishing date 2023-02-15
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

    More links

    Kategorien

  6. Book ; Online: Leveraging User Behavior History for Personalized Email Search

    Bi, Keping / Metrikov, Pavel / Li, Chunyuan / Byun, Byungki

    2021  

    Abstract: An effective email search engine can facilitate users' search tasks and improve their communication efficiency. Users could have varied preferences on various ranking signals of an email, such as relevance and recency based on their tasks at hand and ... ...

    Abstract An effective email search engine can facilitate users' search tasks and improve their communication efficiency. Users could have varied preferences on various ranking signals of an email, such as relevance and recency based on their tasks at hand and even their jobs. Thus a uniform matching pattern is not optimal for all users. Instead, an effective email ranker should conduct personalized ranking by taking users' characteristics into account. Existing studies have explored user characteristics from various angles to make email search results personalized. However, little attention has been given to users' search history for characterizing users. Although users' historical behaviors have been shown to be beneficial as context in Web search, their effect in email search has not been studied and remains unknown. Given these observations, we propose to leverage user search history as query context to characterize users and build a context-aware ranking model for email search. In contrast to previous context-dependent ranking techniques that are based on raw texts, we use ranking features in the search history. This frees us from potential privacy leakage while giving a better generalization power to unseen users. Accordingly, we propose a context-dependent neural ranking model (CNRM) that encodes the ranking features in users' search history as query context and show that it can significantly outperform the baseline neural model without using the context. We also investigate the benefit of the query context vectors obtained from CNRM on the state-of-the-art learning-to-rank model LambdaMart by clustering the vectors and incorporating the cluster information. Experimental results show that significantly better results can be achieved on LambdaMart as well, indicating that the query clusters can characterize different users and effectively turn the ranking model personalized.

    Comment: In proceedings of the Web Conference 2021
    Keywords Computer Science - Information Retrieval
    Subject code 005
    Publishing date 2021-02-14
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

    More links

    Kategorien

  7. Book ; Online: Asking Clarifying Questions Based on Negative Feedback in Conversational Search

    Bi, Keping / Ai, Qingyao / Croft, W. Bruce

    2021  

    Abstract: Users often need to look through multiple search result pages or reformulate queries when they have complex information-seeking needs. Conversational search systems make it possible to improve user satisfaction by asking questions to clarify users' ... ...

    Abstract Users often need to look through multiple search result pages or reformulate queries when they have complex information-seeking needs. Conversational search systems make it possible to improve user satisfaction by asking questions to clarify users' search intents. This, however, can take significant effort to answer a series of questions starting with "what/why/how". To quickly identify user intent and reduce effort during interactions, we propose an intent clarification task based on yes/no questions where the system needs to ask the correct question about intents within the fewest conversation turns. In this task, it is essential to use negative feedback about the previous questions in the conversation history. To this end, we propose a Maximum-Marginal-Relevance (MMR) based BERT model (MMR-BERT) to leverage negative feedback based on the MMR principle for the next clarifying question selection. Experiments on the Qulac dataset show that MMR-BERT outperforms state-of-the-art baselines significantly on the intent identification task and the selected questions also achieve significantly better performance in the associated document retrieval tasks.

    Comment: In the proceedings of ICTIR'21
    Keywords Computer Science - Information Retrieval
    Subject code 004
    Publishing date 2021-07-12
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

    More links

    Kategorien

  8. Book ; Online: CAME

    Cai, Yinqiong / Fan, Yixing / Bi, Keping / Guo, Jiafeng / Chen, Wei / Zhang, Ruqing / Cheng, Xueqi

    Competitively Learning a Mixture-of-Experts Model for First-stage Retrieval

    2023  

    Abstract: The first-stage retrieval aims to retrieve a subset of candidate documents from a huge collection both effectively and efficiently. Since various matching patterns can exist between queries and relevant documents, previous work tries to combine multiple ... ...

    Abstract The first-stage retrieval aims to retrieve a subset of candidate documents from a huge collection both effectively and efficiently. Since various matching patterns can exist between queries and relevant documents, previous work tries to combine multiple retrieval models to find as many relevant results as possible. The constructed ensembles, whether learned independently or jointly, do not care which component model is more suitable to an instance during training. Thus, they cannot fully exploit the capabilities of different types of retrieval models in identifying diverse relevance patterns. Motivated by this observation, in this paper, we propose a Mixture-of-Experts (MoE) model consisting of representative matching experts and a novel competitive learning mechanism to let the experts develop and enhance their expertise during training. Specifically, our MoE model shares the bottom layers to learn common semantic representations and uses differently structured upper layers to represent various types of retrieval experts. Our competitive learning mechanism has two stages: (1) a standardized learning stage to train the experts equally to develop their capabilities to conduct relevance matching; (2) a specialized learning stage where the experts compete with each other on every training instance and get rewards and updates according to their performance to enhance their expertise on certain types of samples. Experimental results on three retrieval benchmark datasets show that our method significantly outperforms the state-of-the-art baselines.
    Keywords Computer Science - Information Retrieval
    Subject code 006
    Publishing date 2023-11-05
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

    More links

    Kategorien

  9. Book ; Online: A Transformer-based Embedding Model for Personalized Product Search

    Bi, Keping / Ai, Qingyao / Croft, W. Bruce

    2020  

    Abstract: Product search is an important way for people to browse and purchase items on E-commerce platforms. While customers tend to make choices based on their personal tastes and preferences, analysis of commercial product search logs has shown that ... ...

    Abstract Product search is an important way for people to browse and purchase items on E-commerce platforms. While customers tend to make choices based on their personal tastes and preferences, analysis of commercial product search logs has shown that personalization does not always improve product search quality. Most existing product search techniques, however, conduct undifferentiated personalization across search sessions. They either use a fixed coefficient to control the influence of personalization or let personalization take effect all the time with an attention mechanism. The only notable exception is the recently proposed zero-attention model (ZAM) that can adaptively adjust the effect of personalization by allowing the query to attend to a zero vector. Nonetheless, in ZAM, personalization can act at most as equally important as the query and the representations of items are static across the collection regardless of the items co-occurring in the user's historical purchases. Aware of these limitations, we propose a transformer-based embedding model (TEM) for personalized product search, which could dynamically control the influence of personalization by encoding the sequence of query and user's purchase history with a transformer architecture. Personalization could have a dominant impact when necessary and interactions between items can be taken into consideration when computing attention weights. Experimental results show that TEM outperforms state-of-the-art personalization product retrieval models significantly.

    Comment: In the proceedings of SIGIR 2020
    Keywords Computer Science - Information Retrieval ; H.3.3
    Subject code 005
    Publishing date 2020-05-18
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

    More links

    Kategorien

  10. Book ; Online: Learning a Fine-Grained Review-based Transformer Model for Personalized Product Search

    Bi, Keping / Ai, Qingyao / Croft, W. Bruce

    2020  

    Abstract: Product search has been a crucial entry point to serve people shopping online. Most existing personalized product models follow the paradigm of representing and matching user intents and items in the semantic space, where finer-grained matching is ... ...

    Abstract Product search has been a crucial entry point to serve people shopping online. Most existing personalized product models follow the paradigm of representing and matching user intents and items in the semantic space, where finer-grained matching is totally discarded and the ranking of an item cannot be explained further than just user/item level similarity. In addition, while some models in existing studies have created dynamic user representations based on search context, their representations for items are static across all search sessions. This makes every piece of information about the item always equally important in representing the item during matching with various user intents. Aware of the above limitations, we propose a review-based transformer model (RTM) for personalized product search, which encodes the sequence of query, user reviews, and item reviews with a transformer architecture. RTM conducts review-level matching between the user and item, where each review has a dynamic effect according to the context in the sequence. This makes it possible to identify useful reviews to explain the scoring. Experimental results show that RTM significantly outperforms state-of-the-art personalized product search baselines.

    Comment: To appear in SIGIR'2021
    Keywords Computer Science - Information Retrieval ; 68T07 ; 68P20 ; H.3.3
    Subject code 005
    Publishing date 2020-04-20
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

    More links

    Kategorien

To top