Revisiting “The alleged problems with SIGIR”

Five years ago, the chair of SIGIR, Charlie Clarke, wrote in his farewell letter that the annual conference (being held this week) was “in decline”, based on slowing and then declining year-over-year (YOY) submission growth, especially compared to related ACM conferences such as KDD.

A key piece of evidence in this conversation was the following figure, showing the decline in submissions after 2010:

solid: submitted; dashed: accepted

This prompted much discussion within the community. I contributed to the conversation in an essay, “The alleged problems with SIGIR”, laying out some of the reasons why, despite the submission statistics, things were going to be OK.

In preparation for a SIGIR 2021 panel on “The Future of SIGIR”, I decided to look at what’s happened since 2016, in terms of these statistics and the community in general. Let’s start with submission statistics.

solid: submitted; dashed: accepted; red: after 2016

If we take the number of submissions (or its growth) as a sign of health, SIGIR seems to have averted continued decline. Now, there will be debate around attributing the turnaround in submissions to various interventions: Charlie’s suggestions, Fernando’s suggestions, strategic program chairing, outreach, the broader AI ecosystem, etc. I’m personally less interested in this specific attribution debate, mostly because much of our data is observational, and the question is difficult to resolve without a really rigorous analysis, especially during a short business meeting.
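For concreteness, the “YOY growth” referenced above is just the relative change in submission counts from one year to the next. A minimal sketch of that calculation, using invented placeholder counts rather than the real SIGIR numbers:

```python
# Year-over-year (YOY) growth from per-year submission counts.
# NOTE: the counts below are invented placeholders, not the actual SIGIR figures.
submissions = {2014: 380, 2015: 350, 2016: 340, 2017: 360, 2018: 400}

def yoy_growth(counts):
    """Return {year: relative change versus the previous year} for consecutive years."""
    return {
        year: (counts[year] - counts[year - 1]) / counts[year - 1]
        for year in sorted(counts)
        if year - 1 in counts
    }

for year, growth in yoy_growth(submissions).items():
    print(f"{year}: {growth:+.1%}")
```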

Submission count is a crude signal of community health; nonetheless, we still hear comments that “SIGIR could have been KDD.” So, I’d like to spend a little time trying to understand this sentiment and, eventually, refute it. To do that, it’s worth reflecting on where SIGIR, as a community, has been and where it might go.

The Past

When I was in graduate school, SIGIR celebrated its twenty-fifth anniversary and released a two-CD set (!) with the complete SIGIR proceedings (1978–2002). I took a week or two to review every year’s proceedings in the archive. This was a great exercise as a student because it gave me context for current conversations, let me see research trajectories, and exposed me to ideas worth re-exploring. It also highlighted the community behind these conversations. SIGIR was not just machine learning (which did not even exist as an organized discipline at the start); it included library science, human-computer interaction, systems/efficiency, natural language processing, and machine translation.

While this disciplinary diversity may seem strange, a community whose members share a common goal (i.e. designing effective information access systems) can leverage these different academic perspectives to make more robust progress. Abstract, quantitative evaluation metrics can be—and are—scrutinized by members who come from human-oriented disciplines (e.g. library and information science). The scalability and efficiency of algorithms can be—and are—tested by members who come from systems-oriented disciplines (e.g. compression, indexing). And, although far from always in consensus, members seemed to have a mutual appreciation and respect across disciplinary boundaries.

In addition to this disciplinary diversity, there has been institutional diversity. The community has historically had a strong presence of non-academic participants. Industry has always been active (especially when companies like Yahoo and Microsoft were investing more heavily in basic IR research). And, I’d be foolish not to mention NIST’s contribution to the community, reflected in the report on the economic impact of TREC.

Things were not perfect: there were, of course, many dimensions along which diversity could have been improved, including geographic concentration and the associated anglocentrism (although initiatives like CLEF, NTCIR, and FIRE pushed against that), the types of problems selected, and who was involved in the conversations (and who was not).

The Present

Nowadays, the SIGIR community does include members from the information science, artificial intelligence, and efficiency perspectives, but looking at the frequent words in the titles of papers accepted to SIGIR 2021 gives a sense of the emphasis of the discourse.

Hamed Zamani, “SIGIR 2021 Statistics”, April 29, 2021.

As with many fields in computer science, current machine learning trends have substantial representation in the program. In historical context, this seems much more disciplinarily concentrated, even compared to the “learning to rank heyday” of the late 2000s.
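For the curious, the frequency count behind a figure like this can be sketched in a few lines; the titles and stop-word set below are invented stand-ins rather than the actual SIGIR 2021 data:

```python
from collections import Counter
import re

# NOTE: these titles are hypothetical stand-ins; the real analysis would load
# the full list of accepted-paper titles for SIGIR 2021 (or CHIIR 2021).
titles = [
    "Neural Graph Matching for Dense Retrieval",
    "A User Study of Conversational Search Behavior",
    "Contrastive Learning for Sequential Recommendation",
]

# Common function words to ignore when counting.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}

def frequent_words(titles, top_k=25):
    """Count non-stopword tokens across titles and return the most common."""
    counts = Counter()
    for title in titles:
        tokens = re.findall(r"[a-z]+", title.lower())
        counts.update(token for token in tokens if token not in STOPWORDS)
    return counts.most_common(top_k)

print(frequent_words(titles))
```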

To get a sense of what’s missing, we can look at a similar visualization for user-focused research. Since 2016, CHIIR has been the conference focused on human factors in information retrieval. Over time, it has matured, though it remains smaller and more focused than SIGIR. Looking at the frequent words in the titles of papers accepted to CHIIR 2021 gives us a sense of where that community is.

word cloud based on CHIIR 2021 abstracts

While there is substantial topical overlap between SIGIR and CHIIR, there are methodological differences (e.g. “deep”, “neural”, “graph” vs “(user) study”, “taxonomy”) and differences in focus (e.g. “recommendation”, “ranking” vs “cognitive”, “behavior”). And, although I would not say that CHIIR members have been pushed out of the main technical conference, conversations focused on the cognitive aspects of how people seek, consume, and share information have been de-emphasized.

That said, the community has put some effort into broadening the diversity of attendees’ backgrounds, with SIGIR events organized by Women in IR, the SIGIR DEI chairs, and Queer in AI.

The Future

Every 6–8 years, SIGIR members gather to discuss new problems on the horizon of information retrieval research (2004, 2012, 2018). At the 2018 meeting, attendees discussed open problems in core areas of algorithm design, evaluation, efficiency, and user understanding (SWIRL 2018 Report). In addition to these core areas, the emerging area of responsibility and safety was discussed and expanded upon in the SIGIR Workshop on Fairness, Accountability, Confidentiality, Transparency, and Safety in Information Retrieval (FACTS-IR).

Relevant to the question of community, the report for the FACTS-IR workshop points out that SIGIR should consider broadening to include voices from the social sciences and policy:

Fairness-aware measures are inherently related to the definition of fairness, which is acknowledged to be a multidisciplinary problem. Different parties will need to be involved in the process of identifying different definitions of fairness and making those definitions operational through computable measurements. We envisage collaborations between computer scientists, policymakers, social scientists, and lawyers, among others, to overcome this challenge.

It’s important to note that understanding and quantifying ill-defined and abstract concepts like “relevance” was, for the SIGIR community, a multidisciplinary endeavor. So, the spirit of bringing other perspectives into conversations about information access should be in the DNA of SIGIR.

Submission counts revisited

One reason for presenting this reflection on community is to argue that perhaps the right metric for the community is not the number of submissions. After all, we could game that metric by sacrificing the diversity that makes SIGIR special and over-representing highly active sub-fields (e.g. deep learning, AI), ending up looking more like KDD; perhaps that is already happening.

A different approach is for SIGIR, as a community, to align on core motivations (e.g. “the understanding, design, and evaluation of tools for supporting how people access information”) and measure progress toward that goal. Of course, this may include identifying how we make progress (e.g. “who is participating?”, “how do we work together?”). Much of this has so far been implicit for SIGIR.

Moreover, we should ask ourselves, “would we be OK if we had half the submissions but more robust conversation across more sub-disciplines?”

In conclusion, given the cursory evidence here, and in contrast to my position five years ago, I am concerned about the community. Not because the number of submissions needs to recover, but because the diversity and robustness of discourse does.

There was a nice point in the panel on the nature of community and how it emerges. I wanted to comment on it here but it didn’t really fit into the thoughts above.

There’s an argument that being explicit about what the community studies is in tension with a bottom-up, emergent research community; under this view, the core motivation of SIGIR simply emerges from the interaction of members from different backgrounds. This laissez-faire approach feels attractive, but it can hurt disciplinary and other kinds of diversity in a few ways. There is still gatekeeping through reviewing, and members of the community (or prospective members) can be excluded through that process; dominant groups can control what is published and who feels welcome to engage in the SIGIR dialog. Further, the shared goals of SIGIR do exist, even if they are implicit. It would be very strange if the entire community (or enough of it) decided one day to shift focus to studying invasive rats. Unfortunately, when things are implicit, they can also disguise favoritism and bias.