Analyzing Factors Related to Conference Publications
Shuyi Chen, Ruilin Wu
Abstract
This study examines how National Science Foundation (NSF) and National Institutes of Health (NIH) funding correlates with computer science conference publications from 2020 to 2024. After integrating open-source data with an existing database, we identified 1,840 principal investigators across 213 institutions. At the PI level, funding shows a weak relationship to publication counts (R² = 0.00174). In contrast, institution-level funding strongly predicts publication productivity (adjusted R² = 0.733). Incorporating regional interactions further improves explanatory power (adjusted R² = 0.8068), revealing a lower funding-to-publication “return” in the South compared to the Midwest. While the findings underscore funding’s importance institutionally, causal inferences are limited, and longer timelines may be needed to capture the full effect of recent grants.
Introduction
Conducting research and publishing findings are essential for advancing the field of computer science, and each study requires substantial financial support. Analyzing adjusted publication counts, which consider the number and ranking of authors in each paper, motivates us to examine the funding sources behind these publications. NSF and NIH datasets are processed separately, then merged after extracting principal investigator (PI) names and awarded funding. To maintain consistency, funding amounts for PIs with multiple awards are aggregated. Finally, the organized NSF and NIH datasets are merged with our existing cleaned database regarding the Computer Science field to explore potential correlations between the funding amounts received by principal investigators and their publication counts. This analysis aims to provide a deeper understanding of how research funding influences publication output in the field of computer science.
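The aggregation and merging steps described above can be sketched in pandas. The column names and toy records below are hypothetical and do not reflect the actual NSF/NIH export schema or the study's data:

```python
import pandas as pd

# Hypothetical award records; real NSF/NIH exports use different schemas.
nsf = pd.DataFrame({
    "pi_name": ["A. Smith", "A. Smith", "B. Jones"],
    "funding": [100_000, 250_000, 80_000],
})
nih = pd.DataFrame({
    "pi_name": ["B. Jones", "C. Lee"],
    "funding": [500_000, 120_000],
})

# Combine the two agencies, then aggregate funding for PIs with
# multiple awards so each PI appears once with a total amount.
awards = pd.concat([nsf, nih], ignore_index=True)
total_funding = awards.groupby("pi_name", as_index=False)["funding"].sum()

# Merge with an existing cleaned publication database on PI name;
# an inner join keeps only PIs present in both sources.
pubs = pd.DataFrame({
    "pi_name": ["A. Smith", "B. Jones", "D. Chen"],
    "adj_pub_count": [7.5, 3.2, 10.0],
})
merged = total_funding.merge(pubs, on="pi_name", how="inner")
```

Matching on names alone, as sketched here, is a simplification; in practice disambiguation (e.g. by institution) may be needed.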
Statistical Methods
From 2020 to 2024, a total of 34,147 principal investigators (PIs) received funding from the National Science Foundation (NSF), while 76,219 PIs were funded by the National Institutes of Health (NIH) across various research fields worldwide.
Our existing cleaned dataset contains information on 17,837 principal investigators. Matching the funding records against this dataset identified 1,737 NSF-funded PIs, with an average total funding of $1,356,554 and a mean total adjusted publication count of 7.31. Similarly, 211 NIH-funded PIs were matched, with an average total funding of $2,788,723 and a mean total adjusted publication count of 5.07. The adjusted publication counts are right-skewed: most PIs and institutions have low counts, while a few contribute substantially more.
After merging the NSF and NIH awarded principal investigators (PIs), a total of 1,840 distinct PIs from 213 different institutions were identified. Their average total funding was $1,600,410, with a mean total adjusted publication count of 7.01. Total funding (on a log scale) follows a roughly normal distribution, with most PIs receiving mid-range amounts and institutions exhibiting greater variability. After filtering to US institutions to explore geographic factors, 158 institutions remain that received NSF or NIH funding in the past five years.
To assess the relationship between funding amounts and total adjusted publication counts, we employ linear regression models. Linear regression allows us to estimate the strength and direction of the association between funding and publication counts, and thus the extent to which funding predicts research output. The comparative analysis is conducted at two levels: the principal investigator (PI) level and the institution level. At the PI level, we examine how an individual researcher's total awarded funding relates to their adjusted publication count. At the institution level, we aggregate funding and adjusted publication counts across institutions to analyze how research funding relates to overall publication counts at a broader scale. Additionally, the institutions' states are grouped into four broader geographic regions: Northeast, Midwest, South, and West.
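The two-level comparison can be sketched as follows. This is a minimal illustration on synthetic data using plain NumPy least squares, not the authors' actual pipeline or estimates:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic PI-level data: 200 PIs spread over 20 institutions.
# Funding amounts and publication counts are illustrative only.
n = 200
pi = pd.DataFrame({
    "institution": rng.integers(0, 20, n),
    "funding": rng.lognormal(13, 1, n),
})
pi["adj_pubs"] = 5 + 4e-6 * pi["funding"] + rng.normal(0, 3, n)

def ols_fit(x, y):
    """Simple OLS of y on x; returns (intercept, slope)."""
    slope, intercept = np.polyfit(x, y, 1)
    return intercept, slope

# PI level: regress individual publication counts on individual funding.
b0_pi, b1_pi = ols_fit(pi["funding"], pi["adj_pubs"])

# Institution level: aggregate both variables per institution, then regress.
inst = pi.groupby("institution", as_index=False)[["funding", "adj_pubs"]].sum()
b0_inst, b1_inst = ols_fit(inst["funding"], inst["adj_pubs"])
```

Aggregating before regressing, as in the second fit, averages out individual-level noise, which is one reason institution-level fits can show a much higher R-squared than PI-level fits.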
Results
Linear Regression Model 1: Total Adjusted Publications on Total Funding - Principal Investigator (Left)
Linear Regression Model 2: Total Adjusted Publications on Total Funding - Institution (Right)
Model 1 shows that, at the PI level, total funding is a very weak predictor of the total adjusted publication count. The estimated intercept is 6.863, meaning that with zero funding an individual is predicted to have about 6.863 adjusted publications. The coefficient for total funding is 9.336e-08, indicating that an additional $1,000,000 in funding is associated with an increase of roughly 0.093 publications. However, this effect is only marginally significant (p = 0.0737), and the model explains just 0.17% of the variance in publication counts (R-squared = 0.00174). Overall, funding does not appear to be a strong predictor of publication productivity at the PI level.
Model 2 reveals that higher funding at the institutional level is strongly associated with increased publication productivity. Specifically, for every additional dollar of funding, the model predicts an increase of approximately 4.277e-06 in the total adjusted publication count; in more interpretable terms, an additional $1,000,000 in funding is associated with about 4.277 more adjusted publications. An adjusted R-squared of 0.733 indicates that roughly 73.3% of the variability in publication productivity is explained by differences in funding levels. Moreover, the F-statistic of 435.4 (p-value < 2.2e-16) shows that the overall model is statistically significant, confirming funding as a strong predictor of publication output at this level.
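The rescaling from per-dollar coefficients to per-$1M effects used in Models 1 and 2 is simple arithmetic (the coefficients are taken from the text above):

```python
# Per-dollar coefficients reported in Models 1 and 2.
pi_coef = 9.336e-08    # Model 1: PI level
inst_coef = 4.277e-06  # Model 2: institution level

# Rescale to the effect of an additional $1,000,000 in funding.
per_million = 1_000_000
pi_effect = pi_coef * per_million    # roughly 0.093 publications per $1M
inst_effect = inst_coef * per_million  # roughly 4.277 publications per $1M
```

The two orders of magnitude between these effects are what separates the near-flat PI-level fit from the strong institution-level one.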
Linear Regression Model 3: Total Adjusted Publications on Funding by Region (Interaction Model)
Model 3 introduces an interaction between total funding and region, capturing whether the slope of funding's effect on publications differs by region. Two of these interactions (Northeast and West) are not significant; however, the South exhibits a significant negative interaction term (-2.527e-06, p = 0.00015), indicating that each additional dollar of funding yields fewer publications there than in the Midwest. With an adjusted R-squared of 0.8068, Model 3 explains about 80.7% of the variance in total adjusted publications, an improvement over the non-interactive model. These results suggest that while funding strongly predicts publication output at the institutional level, its effect can vary by region, with the South showing a notably smaller increase in publications per additional dollar of funding.
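Model 3's specification, with a funding main effect, region dummies, and funding-by-region interaction terms (Midwest as the baseline), can be reproduced on synthetic data with plain NumPy least squares. The numbers below are illustrative and mimic only the qualitative pattern, not the study's estimates:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Synthetic institution-level data with a weaker funding slope in the South.
regions = ["Midwest", "Northeast", "South", "West"]
df = pd.DataFrame({
    "region": rng.choice(regions, 120),
    "funding": rng.lognormal(15, 1, 120),
})
slope = {"Midwest": 4e-6, "Northeast": 4e-6, "South": 1.5e-6, "West": 4e-6}
df["adj_pubs"] = df.apply(
    lambda r: slope[r["region"]] * r["funding"], axis=1
) + rng.normal(0, 5, 120)

# Design matrix: intercept, funding, region dummies (Midwest dropped as
# baseline), and funding x region-dummy interaction columns.
dummies = pd.get_dummies(df["region"], drop_first=True).astype(float)
X = pd.concat(
    [pd.Series(1.0, index=df.index, name="intercept"),
     df["funding"],
     dummies,
     dummies.mul(df["funding"], axis=0).add_prefix("funding:")],
    axis=1,
)
beta, *_ = np.linalg.lstsq(X.to_numpy(), df["adj_pubs"].to_numpy(), rcond=None)
coefs = dict(zip(X.columns, beta))
# coefs["funding:South"] recovers the South's slope deficit relative to
# the Midwest; the other interaction terms stay near zero.
```

Because the Midwest dummy is dropped, each interaction coefficient reads directly as the difference between that region's funding slope and the Midwest's, which matches how the text interprets the -2.527e-06 term for the South.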
Discussion
The linear regression analyses reveal contrasting relationships between funding and adjusted publication counts at different levels of aggregation. At the principal investigator level, total funding explains very little of the variance in publication output, suggesting that individual productivity is more strongly influenced by factors such as research team size, institutional resources, collaborative networks, and disciplinary publishing norms. In contrast, institution-level analyses indicate a robust positive association between funding and publication productivity: higher funding generally translates into more adjusted publications. This likely reflects that sufficient financial resources enable improved research infrastructure, additional personnel, and stronger collaborative networks—factors that collectively foster a higher volume of publications. When regional factors are incorporated via an interaction of funding and region, the model’s explanatory power improves. Notably, the slope for the South is significantly lower than that of the baseline region, implying that an additional dollar of funding in the South yields fewer publications compared to the baseline. Meanwhile, the Northeast and West do not differ significantly from the Midwest. Although these regional differences are statistically meaningful, funding itself remains the primary predictor of publication output at the institutional level.
Despite these insights, the analyses do not establish a causal link: higher funding may lead to more publications, but institutions with a strong track record may also be more competitive in securing grants. Additionally, some awards may not bear fruit immediately due to the long timeline needed for project completion and subsequent publication. Future studies could extend the timeframe, investigate the quality of published research, examine discipline-specific publishing norms, and explore longer-term funding effects to gain a more comprehensive understanding of how funding influences publication outcomes in computer science.