On January 19-20, 2020, the Center for International and Regional Studies (CIRS) hosted a Research Roundtable on Big Data in the Middle East. The meeting was held to generate an initial conversation on how big data can be meaningfully applied to deepen our understanding of social and political phenomena in the Middle East. With the growing availability and volume of data and the enhanced capacity of data scientists to use computational tools for analysis, social scientists around the world are increasingly turning to big data to address questions in their fields of research. The primary purpose of this roundtable was to examine how far these innovative developments in research are being demonstrated in the Middle East and how social science research questions can be explored through these new data sources and analytical tools. Over the course of two days, participating scholars and experts engaged in a fruitful dialogue that explored several important areas, including big data and healthcare, social media and user-generated content, the analysis of data produced in Arabic, social science research, food security, big data and museums, opportunities in the defense sector, female employment, and religious discourse on social media and hate speech.
The discussion was initiated by Dr. Ingmar Weber, Research Director for Social Computing at the Qatar Computing Research Institute (QCRI). Dr. Weber’s presentation centered on changing demographic trends in the Middle East and big data applications to measure these changes. Using Facebook as a platform for accessing user data, together with variables such as places lived and mobile phones used, it was reasoned that researchers could collect data on the wealth distribution and income levels of users. These variables can also be used to extract demographic data from other platforms such as Twitter and Snapchat. This, in turn, can be used to track international migration, poverty, and digital gender gaps. The online digital platforms provide access to over 2 billion users, and the data can be used to address traditional attributes like interest, as well as to understand the selection bias of users. However, there are limitations: models for bias correction are required, and the data only include people who are online. For future research, a number of topics were identified, including interdisciplinary research efforts, conducting surveys to collect data from hard-to-reach populations, and the use of satellite imagery to obtain on-site data.
Chiara Bernardi then steered the conversation to user-generated content and social impact. It was stated that social data is intertwined with social behavior, and this creates essential knowledge and meaning. The social impact of this content can be used to drive policies and legislation. Marketing is one industry where such content is widely used; however, there are limits to this. There is a need to understand how this content influences strategy, and mixed frameworks are needed to bring industry data together with academic settings. In order to bridge the gap and understand what we can learn from user-generated content, a methodological framework for the Middle East is required. The term big data also needs a clear definition in terms of its volume. Content generated in multiple languages requires mapping and visualizing in order to understand its impact on behavior. Academics need to recognize the role and, at the same time, bridge the gaps in order to contextualize behavior on digital media. The structure and design of the platform were also highlighted as an essential component, as they lead to different behaviors.
The challenges of analyzing data produced in Arabic were discussed next by Wajdi Zaghouani, who stated that data in Arabic is becoming more and more available, and that data is the new oil. However, analyzing this data is difficult, as Arabic is a very challenging language. There is a lot of ambiguity and variation in the written and spoken forms of the language, which requires new processing tools. In addition, the romanization of the language poses a problem when it comes to processing. It was stated that there was a need for tools to separate out noisy data and convert it into a usable format. Refining tools to analyze dialects and less commonly used Arabic varieties was also highlighted as a key area. Speech processing and the lack of concerted collaboration among researchers were also identified as gaps. Zaghouani also identified the detection of hate speech, polarization, and sarcasm as an understudied area that requires further research.
Zahir Irani looked at the topic of food security and big data in the Middle East and argued that much attention has been paid to food wasted from the plate, but very little to food lost along the supply chain. It was argued that food security is a challenge of both availability and accessibility and that efforts are needed to maintain the sustainability of food production and reduce fluctuations in its viability. It is estimated that our food needs over the next 40 years will be greater than in the past 10,000 years. This is attributed to a number of factors, including food waste and loss and an increase in the global population. Science and technology can be used to understand the issues at hand. New technology and geological surveys can present a better picture of the physical environment and lead to increased food production. Irani highlighted that some of the drivers of food (in)security include population, income, water supply, food supply, soil erosion, imports, wastage, yields, demand, seasonality, consumption, safety and nutrition, and health and wellbeing. The question of feeding future generations has become a global challenge, and safeguarding food disruption and consumption through circular economy principles requires quality big data.
The participants then discussed the topic of big data and healthcare in the Middle East. Mowafa Househ highlighted three core research areas: privacy and responsibility; the cultural and religious dimensions of collecting data; and artificial intelligence (AI) and big data. AI has a huge impact on healthcare. Computing power and the data being generated have changed the way healthcare is practiced and the way conditions are diagnosed and treated. Narrow AI and better-performing computers can help gain insight into populations with different social and physical conditions. However, this data can help detect correlation but not causation. In terms of privacy, countries like Saudi Arabia, Qatar, and Bahrain have certain policies in place; however, the state still has access to people’s health data, even with the implementation of privacy laws. What the state can do with the data, and what kind of approach can be taken, is a question that needs further examination. Many countries in the Middle East have a multiple-tier system that separates people. Data is collected and disseminated differently for different groups (citizens and residents), which leads to missing data points. Among the research gaps identified were how academics can apply ethical frameworks that do not discriminate among the various groups, and what values are built into the algorithm. Cultural and religious sensitivities also need to be taken into consideration when it comes to health data collection in the Middle East. Engaging local stakeholders and policymakers and involving them in the conversation was also highlighted as a critical area for future research.
Lisa Singh addressed different ways big data can benefit social science research and stated that there are different kinds of big data that can be used. Currently, every discipline has its own methodology, and there is a need for more integrated ways to use these data. Researchers need to study big data as a field rather than independently for various case studies. Another area highlighted where big data and social science could collaborate was early warning mechanisms, which are technically challenging and lack strong political will. Currently, researchers lack a holistic picture of the methodology required, which underscores the need to integrate data and the various ways it can be brought together.
The participants also discussed social media and religious discourse in the Arab region. Walid Magdy presented examples of how big data is helping answer questions in the social sciences. One of the studies looked at people’s opinions and the change in perception due to major events and trends. Results from the study indicated that a global change in trends does not mean a change in individual opinion. With regard to religion and social media, a case study emphasized that many users used social media platforms to have discussions on topics such as atheism, share positive tweets about Islam and religion in general, and re-share or re-post tweets as a form of ongoing charity. There is a need to complement these findings with anthropological studies, and innovation and technology are required for sentiment analysis, especially for data generated in Arabic. Social media is vast and represents many people, which in turn presents many opportunities to measure user behavior but requires the collaboration of social and computer scientists.
Marc Owen Jones broadened the discussion on social media by addressing the question of hate speech and propaganda. Jones addressed the issue of data weaponization and colonization, platform manipulation, and the notion of ethics. There are different approaches to data collected from diverse sources; this data can be used to gauge audience usage and behavior on social media. In many of the previously observed cases, hate speech tends to be controlled by automated bot accounts. This leads to the question of who has the power to manipulate the data and how a small number of people have the influence to shape the discussion on social media. In addition, the question of how data is weaponized to promote certain political views and ideas that are held by a group of people, and not by the general public, needs examination. Other areas for future study involve examining the political economy of technology companies, the governance of platforms, and the integrity and quality of the data.
Georgios Papaioannou shifted the focus of discussion to big data and museums and emphasized that museums collect a large amount of data on a daily basis. This data can be used to address some of the challenges and implications of big data and museums. When it comes to big data and museums, there is more than one reality and a number of issues that need academic focus. The Qatar Museums Authority opened four new museums in the past five years. These institutions generate data on a daily basis that can be used to help improve the museums in various ways. One of the research areas identified was the need for data-driven museums and policies through which correct and meaningful information could be collected. Papaioannou also stressed that addressing sentiment via textual or image data, and the pros and cons of doing so, is a gap in the existing literature.
Eid Mohamed analyzed Egyptian culture through big data and looked at the question of whether Egyptians still cared about the Arab Spring. The cultural data can provide evidence of growing revolutionary consciousness in the general masses. Most excitingly, an analysis of such great masses of source material offers the research community the opportunity to work on the challenge of discovering the appropriate epistemologies for coming to terms with emergent transcultural identities and a transformed Arab world in the making. Digital humanities, in general, offer a new set of methods for dealing with such an abundance of materials. The Arab Spring needs to be explored through an approach of localizing the change by using local stories. The pre-2011 context of significance concerns earlier moments when popular resistance came to the fore, moments that 2011 has been considered to be a continuation of or inspired by. These can be traced to the writings of Taha Hussein and other revolutionary writers. Computational tools are required to analyze the vast body of corpora as well as online and offline activism.
The dialogue then moved to the discussion of big data and female labor in Turkey. Gunes Asik stated that female employment is essential for development and that big data is not just user-generated data but can also mean large administrative data. This includes population data kept by the government as time series. Though this data is reliable, it is very difficult to access, as government approval is required. Female employment, and labor demand in general, are affected by a number of factors, including discrimination, government policies, and the emergence of new sectors. Some of the determinants of female employment include education, conservatism, child and elderly care, health, and lack of social protection. Asik identified a number of research gaps, such as the impact of informal employment, the effect of domestic violence, the use of Google search and social media to collect data, and the automation of jobs and its impact on different genders.
Charbel Chedrawi talked about opportunities for big data in the defense sector and explained that defense data is a black box. Data for this sector is not easily accessible, and there are very few scholars working on the topic. Big data is a strategic asset of the 21st century and a valuable raw material for security and defense. However, there are certain barriers to generating and applying this data: infrastructure barriers; human barriers, such as a lack of IT professionals in the organization and a lack of proper training; and financial barriers, as budgets are allocated mainly to weapons rather than research and development. Chedrawi identified five areas for further study: the resource gaps in the defense sector and the limitations associated with them; the hazards of outsourcing; the isomorphism of institutions; the type of technology required for mining the data; and the role of big data in reducing transaction costs and how the defense sector can benefit from such economies.
As a general takeaway, the roundtable discussions indicated that for social scientists studying the Middle East who want to use new data sources, it is of fundamental importance that they bridge the disciplinary divide and develop partnerships with data scientists. In order to make the best use of the variety of new data available and apply them to critical social science research questions in the region, there is a need to actively develop interdisciplinary collaborations. Working with data scientists who have the requisite expertise in data analytics will help social scientists make sense of and extract meaning from data from multiple sources. Building on the discussions at this roundtable, CIRS plans to launch a research project in the near future with a thematic focus on some of the core issues and big data in the Middle East.
Participants and Discussants:
- Shaza Afifi, Georgetown University in Qatar
- Gunes Asik, TOBB Economics and Technology University
- Zahra Babar, CIRS – Georgetown University in Qatar
- Mongoljin Batsaikhan, Georgetown University in Qatar
- Chiara Bernardi, University of Stirling
- Chaïmaa Benkermi, Georgetown University in Qatar
- Misba Bhatti, CIRS – Georgetown University in Qatar
- Charbel Chedrawi, Saint Joseph University
- Salma Hassabou, Georgetown University in Qatar
- Mowafa Househ, Hamad Bin Khalifa University
- Zahir Irani, University of Bradford
- Marc Owen Jones, Hamad Bin Khalifa University
- Mehran Kamrava, CIRS – Georgetown University in Qatar
- Walid Magdy, University of Edinburgh
- Eid Mohamed, Doha Institute for Graduate Studies
- Emad Mohamed, University of Wolverhampton
- Phoebe Musandu, Georgetown University in Qatar
- Georgios Papaioannou, University College London – Qatar
- Khushboo Shah, Georgetown University in Qatar
- Lisa Singh, Georgetown University
- Elizabeth Wanucha, CIRS – Georgetown University in Qatar
- Ingmar Weber, Qatar Computing Research Institute
- Wajdi Zaghouani, Hamad Bin Khalifa University
Article by Misba Bhatti, Research Analyst at CIRS