The Institute for Data at the University of Pennsylvania

The Institute for Data is a multidisciplinary research and training hub dedicated to developing new lines of scientific inquiry while serving as a bridge that transforms research findings into knowledge products and best practices benefiting residents of the Greater Philadelphia Metropolitan Area.

Faculty support includes research expertise, mentoring and access to cutting-edge technology for collecting and analyzing research study data. In addition, support may include assistance with writing research proposals and acting as a central point of contact for external partners.


The Institute for Data serves as a hub connecting government, industry and academia around foundational research in Big Data science and its application in disciplines such as materials science, precision medicine, energy usage management and smart cities. Through our activities we bring together existing and new efforts in machine learning, high performance computing, algorithms and the mathematical foundations of data science, along with research in those application disciplines.

The Institute for Data is dedicated to creating rigorous, interdisciplinary graduate and undergraduate education programs that equip students with the knowledge required for cutting-edge research. Its curriculum fosters collaboration across fields, in engineering and the sciences alike.

Additionally, the Institute is dedicated to creating centers that advance data science through collaborative research projects, training programs, and research in foundational areas such as statistical modeling, mathematical theory, high performance computing and data mathematics, with applications across materials science, genomics, healthcare, finance and energy. One such center is the Transdisciplinary Research Institute for Advancing Data Science (TRIAD).

Columbia University stands out as an exceptional place to explore how data science is revolutionizing all fields, professions, and sectors. With strong disciplinary departments in the arts, humanities and social sciences, as well as full-fledged professional schools such as architecture, business, law, journalism, nursing and medicine, Columbia serves as an ideal laboratory for gaining a comprehensive view of this transformational trend and its effect on our lives.

The Institute for Data is also dedicated to expanding its leadership in using cutting-edge artificial intelligence (AI) technologies to address major problems and enhance quality of life. AI combines machines’ superior abilities in data acquisition and comprehension with skills at which humans excel, such as reasoning, judgment and strategizing, to form dynamic human-machine partnerships that produce solutions more powerful and impactful than individual disciplines can produce alone.


Our vision is to foster an inclusive world where all individuals, whatever their background and abilities, have access to impactful, equitable data-driven solutions that benefit society. The Lucy Family Institute serves as an intellectual and interdisciplinary beacon, linking domain knowledge from academia and industry with cutting-edge data science techniques; its activities foster a culture of cooperation and excellence.

Every modern field of inquiry demands knowledge of data science: collecting, curating, analyzing and interpreting data for various audiences, often non-technical ones. Model analysis requires a deep knowledge of how to work with and interpret models, drawing from the natural sciences, social sciences, engineering, law, business, education and medicine. Yale faculty from all fields are already using data science techniques to advance their research in new directions, from detecting fraud in financial systems to autism, cardiac surgery and materials science research. These scholars are applying advanced data methods that will transform their fields over time.

The Center seeks to advance data science through pioneering and influential research, training undergraduate and graduate students, and building lasting external partnerships. It serves a thriving community of domain experts and methodological specialists across the University by offering a framework for data-intensive research, open source software development, educational programming, and collaboration with industry and government bodies.

BIDS hosts many events throughout the year that are open to everyone, including discussion groups, workshops and training seminars. Events are frequently recorded and archived so anyone can benefit from them later.

As part of the NSF Northeast Big Data Innovation Hub, Harvard, Columbia, MIT, CUNY, Cornell and others are collaborating to forge the next generation of public-private partnerships, which will address regional data science needs from research through workforce development.


The Data Values Project envisions a world in which everyone is equally invested in the data that impacts them. Our vision includes fair data design, production, and governance that allow individuals and communities to influence how data is used to improve lives, driven by evidence-based decision making that is transparent and accountable to those it serves.

To achieve our vision of data values and justice, it is vital that we understand the power dynamics underpinning current data systems and their effect on how people engage with those systems. The Data Values Project seeks to shed light on these questions and to identify steps that can help shift the balance of power in data systems.

Civil society organizations can play a crucial role in helping individuals exercise their agency in data by serving as community representatives and encouraging participation in data design, collection, analysis and governance processes. Furthermore, they work closely with governments in developing mechanisms that review and improve data processes while upholding statistical rigor and international comparability.

Private companies can contribute positively to a fairer data future by being aware of their influence and adopting business practices that don’t exacerbate structural inequalities. They can work alongside civil society and government agencies to build trust within communities, invest in skill training programs and use data for good.

Combining these actions will empower people to have a stronger voice in decisions affecting them and to hold decision makers and other actors accountable for using data in an equitable and responsible manner. While this approach is a long-term commitment that requires significant resources and effort, the benefits of creating an equitable data future outweigh its costs, and all stakeholders must accept the challenge to move forward effectively. We hope that this paper, along with the wider campaign of activities and advocacy it will spark, will serve as the start of a movement aimed at rebalancing power through data.


As a hub for the University, the institute will serve as a gateway for data science research and education collaboration and support, connecting researchers to new collaborators and partners. Furthermore, funding will be made available for research that advances mathematical, statistical and algorithmic foundations of data science while simultaneously supporting projects and programs with applications outside the field of pure data science.

The Institute’s four research themes address multiple opportunities for significant impact in data science applications. These include exploring the algorithmic landscape of statistical problems and creating novel sketching, sampling, and sublinear time algorithms to address them; exploiting advances in mathematical theory for statistical methods; and creating the theoretical tools needed across all application domains of data science, for example by using neoclassical notions such as optimality, robustness, and calibration, which remain relevant today.

This multi-institution and cross-disciplinary collaboration brings together faculty from departments and institutions across campus – computer science, statistics and mathematics, economics, electrical engineering and operations research are among those represented. This approach builds on a long history of work in this area and on the premise that virtually all major advances in data science methodology began with scholars from other fields trying to solve a problem they couldn’t address on their own.

Additionally, our institute’s projects and programs that explore wider applications of data science will further solidify its position as a world-class leader in this emerging area. This includes working with industry partners, creating joint research centers, and equipping students at all levels with skills they’ll need to tackle data science challenges in real life.

Finally, the institute will address social, cultural, and ethical concerns surrounding new forms of technology in order to foster an environment in which these innovations respect human dignity. To do this, it will conduct research that enables informed public debate and build a national network of researchers who can anticipate and support the development of these technologies. This work will be carried out in partnership with other MacArthur grantees such as the Data & Society Research Institute.

Sam is an experienced information security specialist who works with enterprises to mature and improve their enterprise security programs. Previously, he worked as a security news reporter.