Representation Discrimination in Media

We run many experiments because this is a sensitive topic on which to make claims. Since the experiments span a long period of time, we record each one with the following metadata.

Experiment details
Experiment name
Date
Objective
Result
Future Work
Artifacts (code, input data)

Note that we do not store the output of an experiment, because one can reproduce it simply by running the code on that input data. Our codebase is available in this GitHub repository.

Task - Keyword Based Data Collection

This is our curated dataset of articles containing ethnic words, drawn from the eBD Bangla newspaper dataset. We curated this dataset of articles related to ethnic people using simple keyword-based extraction. We used the following keywords to filter the articles.

ethnic_tribe_names = [
    "চাকমা", "মারমা", "সাঁওতাল", "ত্রিপুরা", "গারো", "ওঁরাও", "তঞ্চ্যঙ্গা", "ম্রো", 
    "পাংখো", "চাক", "খেয়াং", "খুমি", "লুসাই","কুকি", "রাখাইন", "মণিপুরী",
    "হাজং", "খাসিয়া", "মং", "বর্মন", "পাহাড়ি", "মালপাহাড়ি", "মুন্ডা", "ভূমিজ",
    "কন্দ", "পাঙন", "লাওরা", "মুরং", "বাগদী"
] #"বম","কোচ","ডালু","কোল", "রাজবংশী", "পাত্র", "ভিল", "গণ্ড", "খাসি"

ethnicity_directed_words = [
    "আদিবাসী" , "আদিবাসি" , "উপজাতি", "নৃগোষ্ঠী"
]
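
A minimal sketch of the keyword-based extraction described above, assuming articles are plain strings and that a keyword counts as a hit when it appears anywhere in the text. The function names, the shortened keyword lists, and the toy corpus are hypothetical and only illustrate the idea.

```python
from typing import Iterable

# Small illustrative subset of the keyword lists above (assumption: the
# real run used the full lists).
ethnic_tribe_names = ["চাকমা", "মারমা", "গারো"]
ethnicity_directed_words = ["আদিবাসী", "উপজাতি"]
KEYWORDS = ethnic_tribe_names + ethnicity_directed_words


def contains_keyword(text: str, keywords: Iterable[str]) -> bool:
    """Return True if any keyword occurs as a substring of the text."""
    return any(keyword in text for keyword in keywords)


def filter_articles(articles: list[str], keywords: Iterable[str]) -> list[str]:
    """Keep each matching article once, however many keywords it contains."""
    keywords = list(keywords)
    return [article for article in articles if contains_keyword(article, keywords)]


# Hypothetical toy corpus, not taken from the real dataset.
articles = [
    "চাকমা ও মারমা সম্প্রদায়ের উৎসব শুরু হয়েছে।",
    "ঢাকায় আজ বৃষ্টি হতে পারে।",
]
selected = filter_articles(articles, KEYWORDS)
```

Because each article is tested once with any(), an article that contains several keywords is still added only once. Plain substring matching can over-match, since a short keyword such as "চাক" also occurs inside "চাকমা".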

TODO Failed Attempts

Training a word embedding model on the whole dataset. This was a MAJOR blocker in our earlier work.

Experiment - Supervised Topic Modeling on Ethnic with Stemming

The only preprocessing we did was to remove the following strings: to_remove = ['email\xa0protected', '\n\n\n\xa0\n\n\n\n\n', '\u200c্', '\n\n', '\xa0', '\n']. These are the top 10 topics we found in the articles.
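
The removal step can be sketched as below. This assumes each listed string is simply deleted from the article text; the helper name clean_text and the sample string are hypothetical.

```python
# The removal list from the text above, copied as-is.
to_remove = ['email\xa0protected', '\n\n\n\xa0\n\n\n\n\n', '\u200c্', '\n\n', '\xa0', '\n']


def clean_text(text: str, patterns: list[str] = to_remove) -> str:
    """Delete every listed substring from the text."""
    # Patterns are applied in the listed order, which already puts the
    # longer patterns ahead of the shorter ones they contain.
    for pattern in patterns:
        text = text.replace(pattern, "")
    return text


# Hypothetical sample, not taken from the dataset.
sample = "খবর\n\nemail\xa0protected\xa0শেষ"
cleaned = clean_text(sample)
```

Deleting newlines outright joins the words on either side of them; replacing them with a space instead would be an equally plausible reading of the original step.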

Clearly, stemming is needed here. We used the SBNLTK stemmer because the project is the largest BNLP library and its developer seems quite active. After stemming, we got the following topics.

Experiment - Supervised Topic Modeling to find best Alpha score

We plotted the coherence score against different alpha values of the LDA model. For the 10187-article dataset,

We had the highest coherence score for alpha value around 73. So we get our final topic list from an LDA with coherence score = 0.7394. The topic list is below.

Experiment - Semi Supervised Topic Modeling on Ethnic, nonethnic

<2024-07-28 Sun> Objective: Since supervised topic modeling gave bad topics, let us try CoRex.

We collected 5342 ethnic articles from (4.4M->10132) news articles. We then used semi-supervised CoRex topic modeling. We used the following words as anchors.

Artifact	Link
Code	ethnic_5342corex.ipynb
Data	etnic_5342data

anchors = [["ক্রীড়া", "রুপা", "স্বর্ণপদক",  "ব্রোঞ্জপদক"], ["নির্বাচন","প্রার্থী", "চেয়ারম্যান"], ["পর্যটক"], ["উদযাপন", "নবান্ন", "উৎসব"], ["বিদ্রোহ", "কল্পনা"], ["মামলা"] , ["সরকার", "লুটপাট", "দুর্নীতি", "প্রশিক্ষণ"], [ "সেতু", "সংস্কার", "সংকট", "পানি"], ["বাংলাদেশ", "বিমানবাহিনী", "অফিসার", "ক্যাডেট"] ]

Later, we also collected exactly 5342 nonethnic articles for an equal comparison. The results are in the result section below.

Task - Mass Annotation of 5342 Ethnic Articles

Artifact	Link
Code	ethnic_5342corex.ipynb
Data	etnic_5342data , nonethnic_5342data , Excel containing genre distribution

<2024-08-09 Fri> Objective: Quality data is necessary, so we need to annotate the data to clean it. This is also needed because of our definition of an exact article.

I first did an exploratory analysis of the articles. When annotating, we paid close attention to why we were labeling each article as we did, noting down the reason. This increased our understanding of the underlying data. We used the following interface for this stage of annotation and ended up annotating around 100 articles.

Recall that we only used keyword-based extraction to collect the ethnic article dataset. To make it cleaner, we decided to annotate the articles manually and identify the exact articles that are about ethnic people. But annotating 5000+ articles is a mammoth undertaking. So we considered keybinding: the user just sees the article and presses E/N to annotate it as ethnic or nonethnic. Using keybindings increases annotation speed significantly. But still, we annotated around 150 nonethnic articles.
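
The E/N keybinding idea can be sketched as follows. This is not the actual annotation tool: the names KEY_TO_LABEL, annotate and read_key are hypothetical, and the key reader is passed in as a function so that the loop can be tried without a keyboard.

```python
from typing import Callable, Iterable

# Assumed mapping from keypress to label, following the E/N scheme above.
KEY_TO_LABEL = {"e": "ethnic", "n": "nonethnic"}


def annotate(articles: Iterable[str], read_key: Callable[[str], str]) -> list[tuple[str, str]]:
    """Pass each article to read_key and record the label for the key it returns."""
    labeled = []
    for article in articles:
        label = None
        # Ask again on unrecognized keys so a stray keypress cannot mislabel.
        while label is None:
            label = KEY_TO_LABEL.get(read_key(article).strip().lower())
        labeled.append((article, label))
    return labeled


# Scripted keypresses stand in for a real keyboard in this example.
scripted = iter(["E", "x", "n"])
result = annotate(["article one", "article two"], lambda _article: next(scripted))
```

For interactive use, read_key could be something like `lambda a: input(a[:300] + "\n[E/N] ")`, which shows the start of the article and waits for a key followed by Enter.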

We also made one tool for easy understanding of Topics in Topic modeling by inspecting the documents in each topic.

Experiment - Ethnic 100 articles with Gemini-1.5 Flash

<2024-08-14 Wed> Objective: Seeing if the gemini-1.5 flash free tier can be used for annotation

A major part of our experiment is genre classification. We have identified the following 15 genres: Crime, Politics, Natural Disaster, Science & Environment, Government actions, Business & Economy, International Affairs, Education, Health, Sports, Protests, Culture, Entertainment, Lifestyle, Discourse. Additionally, we also want to find out whether the description is Straight, Investigation, or Commentary.

Humans are biased when annotating things they relate to. A journalist commenting on LLMs for news analysis said, and I quote, "LLMs are the least biased journalist". So even without rigorous validation, we went ahead and experimented with Gemini-1.5 for news genre and news style annotation.

Experiment - Nonethnic 100k articles with Corex

<2024-08-15 Thu> Objective: Seeing if using more articles gives us a more complete distribution of the usual news genres.

We collected 100k nonethnic articles from 4.4M news articles. We then used semi-supervised CoRex topic modeling. We used the following words as anchors.

Artifact	Link
Code	Annotation tool code
Data	ethnic_{datasetexp05342annotated1}-151.csv , exploratory_{100annotation}

Artifact	Link
Code	Gemini₁₅.ipynb
Data	ethnic_{datasetexp05342}
Gemini Annotations	Gemini 100 Annotations in Drive

anchors = [["ক্রীড়া", "রুপা", "স্বর্ণপদক",  "ব্রোঞ্জপদক"], ["নির্বাচন","প্রার্থী", "চেয়ারম্যান"], ["পর্যটক"], ["উদযাপন", "নবান্ন", "উৎসব"], ["বিদ্রোহ", "কল্পনা"], ["মামলা"] , ["সরকার", "লুটপাট", "দুর্নীতি", "প্রশিক্ষণ"], [ "সেতু", "সংস্কার", "সংকট", "পানি"], ["বাংলাদেশ", "বিমানবাহিনী", "অফিসার", "ক্যাডেট"] ]

Experiment - Ethnic (4893) with Corex

Artifact	Link
Code	collecting_{100knonethnicarticles}.ipynb , Kaggle link , CoRex code
Data	100k nonethnic articles in drive , Excel containing genre distribution

Objective: Getting a final result on ethnic people related articles' topic distribution.

Since we are using CoRex, anchor word selection is an important phase. The plan was to first use generic words as anchors, then use ethnicity-specific words, and compare the performance. Note: we did not do this, because it would add bias to the process. For such a critical topic, we want to stay as unbiased as possible.

Note that we increased the threshold to 25.0 here, since doc_prob was higher for the overall distribution.

Experiment - Bangla news distribution (100k) with Corex

TODO Experiment - Combined 100 articles with Gemini-1.5 Flash after defining each genre.

<2024-08-14 Wed> Objective: Seeing if the gemini-1.5 flash free tier can be used for annotation. All ethnic news articles can be called local news, and that is not meaningful for us. So we add ""

Experiment - H0: There is no difference in size between ethnic and normal articles.

Artifact	Link
Code	Gemini₁₅.ipynb
Data	ethnic_{datasetexp05342}
Gemini Annotations	Gemini 100 Annotations in Drive

Atuel et al. present work very similar to ours in their study of majority and minority representation ¹⁵. They use topic distribution, ethnic article count and ethnic article size as means to understand media representation. So we now test the hypothesis.

Using Welch's t-test, we get p-value = 1.840763388749358e-195. At a significance level of 0.05, we reject H0.

Group	Total articles	Average size
Ethnic	4893	440.66
Nonethnic	100k	124.94
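
For reference, the statistic behind Welch's t-test can be sketched with the standard library alone. The article sizes below are hypothetical and are not the real data; the function name welch_t is also an assumption. The sketch returns only the t statistic and the Welch-Satterthwaite degrees of freedom, not the p-value.

```python
import math
from statistics import mean, variance


def welch_t(sample_a: list[float], sample_b: list[float]) -> tuple[float, float]:
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Variance of each group mean; variance() is the unbiased sample variance.
    se_a = variance(sample_a) / n_a
    se_b = variance(sample_b) / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / math.sqrt(se_a + se_b)
    dof = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t_stat, dof


# Hypothetical article sizes, chosen only to exercise the function.
ethnic_sizes = [420.0, 510.0, 380.0, 450.0]
usual_sizes = [120.0, 140.0, 110.0, 130.0, 125.0]
t_stat, dof = welch_t(ethnic_sizes, usual_sizes)
```

If SciPy is available, scipy.stats.ttest_ind(a, b, equal_var=False) runs the same test and also returns the p-value.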

We also generate some word clouds here. In the remove_stopword phase, we used a dictionary of stop_words together with the filter len(word)>3.
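
The stop-word and length filter feeding the word cloud can be sketched as below. The stop-word set is a hypothetical stand-in for the real dictionary, and the names keep_word and word_frequencies are assumptions.

```python
from collections import Counter

# Hypothetical stop-word set; the real dictionary is not reproduced here.
STOP_WORDS = {"এবং", "থেকে", "করে"}


def keep_word(word: str, min_len: int = 3) -> bool:
    """Keep words that are not stop words and satisfy len(word) > min_len."""
    return word not in STOP_WORDS and len(word) > min_len


def word_frequencies(text: str, min_len: int = 3) -> Counter:
    """Count whitespace-separated tokens that pass the filter."""
    return Counter(word for word in text.split() if keep_word(word, min_len))


# Hypothetical sample text.
freqs = word_frequencies("আদিবাসী এবং পাহাড়ি আদিবাসী")
```

Note that len() counts Unicode code points, so in Bangla text each vowel sign or hasanta counts as its own character toward the length threshold. The resulting counter is the kind of word-to-frequency mapping a word cloud generator typically consumes.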

For len(word)>5, the word cloud is:

Experiment - Word Cloud for each Topic

Topic-based word clouds did not work as expected, even though we rechecked the stop word removal step. We think word clouds won't give us useful insight here because we use a keyword-based approach to select the ethnic articles. As a result, not all articles are entirely ABOUT ethnic people; some just contain the name of an ethnic minority.

Experiment - Bangla News Sentiment Analysis

We use pretrained models from Hugging Face for this task. CSEBuetnlp recently published a similar paper on sentiment analysis using LLMs ¹⁶. CUET also has a very nice data crawler and sentiment analysis code ¹⁷.

Experiment - Word Cloud for each Topic after bug fixing

In earlier versions, we did not handle punctuation, so the output was noisy. In this version, we handle punctuation and also remove some more stopwords. The output is less noisy.

Experiment - Finding ethnic articles using ChatGPT Turbo

Artifact Type	Link
Code	Bangla Sentiment Analysis.ipynb
Data	4893 ethnic articles.

Model	is_ethnic	Featured	Style	Sentiment	Genre	Experiment
gpt-4o-mini	4/5	Less Diverse	Straight	Less Diverse	LGTM	prompt
gpt-3.5-turbo	5/5	Diverse	Straight	Diverse	LGTM	result , prompt
gemini-1.5-turbo
gpt-4o-mini (512ctx)	4/5	Subject	Diverse	Diverse	LGTM	prompt
gpt-3.5-turbo (512ctx)	4/5	Diverse	Diverse	Negative	LGTM	prompt

Now, we really only need the is_ethnic field. The other 7 are there only with our cost in mind, nothing else. So let's use gpt-3.5-turbo to get results for 100 articles.

From the table above, we are confident that we can use gpt-3.5-turbo efficiently for our task. Now let us see if including more information reduces accuracy.

Annotation

Model	Accuracy	Prompt	CM	Result
gpt-3.5-turbo, ctx512	0.54 (40)	gpt-3₅-turbo-prompt-v2.txt	15/25 ,6/14	result
gpt-4o-mini, ctx512	0.74 (40)	gpt-3₅-turbo-prompt-v2.txt	19/25, 10/14	result
gpt-3.5-turbo, ctx10k	0.77 (40)	gpt-3₅-turbo-prompt-v2.txt	22/25 , 8/14	result
gpt-4o-mini	0.74 (40)	gpt-3₅-turbo-prompt-v2.txt	20/25, 9/14	result

Model	Accuracy	Prompt	CM	Result
gpt-3.5-turbo, ctx10k, with other	0.72 (40)	gpt-3₅-turbo-prompt-v2.txt	23/25 , 5/14	result

We finally ended up annotating the is_ethnic label only. We had an agreement score of 75% between the 2 authors over 100 articles. The conflicts were resolved through manual supervision. The resulting final annotation is here. The conflicts were mainly about whether ethnic Awami League news is ethnic news or not. It IS ethnic news, since it represents the political aspect of their life.
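
A raw agreement score like the one above is the fraction of articles on which both annotators chose the same label. The sketch below shows that computation on hypothetical labels. It also computes Cohen's kappa, a chance-corrected alternative that the notes above do not report; it is included here only as an assumption about what one might add.

```python
from collections import Counter


def percent_agreement(labels_a: list[str], labels_b: list[str]) -> float:
    """Fraction of items on which the two annotators gave the same label."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both annotators must label the same articles")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohen_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Agreement corrected for the agreement expected by chance."""
    n = len(labels_a)
    observed = percent_agreement(labels_a, labels_b)
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical labels from two annotators, not the real annotation.
annotator_1 = ["yes", "yes", "no", "no"]
annotator_2 = ["yes", "no", "no", "no"]
agreement = percent_agreement(annotator_1, annotator_2)
kappa = cohen_kappa(annotator_1, annotator_2)
```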

Experiment - Finalizing ChatGPT Turbo

<2024-09-13 Fri> annotation data: 100 articles, two annotators, 75% agreement score, (Yes=63, No=37)

Annotation	Count
Yes	63
No	37

Annotated Dataset	Link
Annotated Dataset	Link
Abhi annotation	Link
Sharif annotation	Link

Model	Prompt	Accuracy (%)	Precision (%)	Recall (%)	F1 score (%)	Result
gpt-3.5-turbo, ctx10k	ethnic-only-prompt-v2.txt	62	71.9	65.1	68.3	link
gpt-4o-mini, ctx10k	ethnic-only-prompt-v2.txt	78	81.5	84.12	82.8	link
gpt-3.5-turbo, ctx10k	combined and formal prompt.txt	72	72.15	90.5	80.2	link , all metrics
gpt-4o-mini	combined and formal prompt.txt	67	65.5	100	79.24	link , all metrics
gpt-4o-mini, ctx10k	combined and formal prompt.txt	69	67.02	100	80.25	link , all metrics
gpt-4o-mini, ctx10k	chatgpt4o-mini-prompt-v1.txt	70	67.7	100	80.7	link , all metrics
gpt-3.5-turbo, ctx10k	chatgpt4o-mini-prompt-v1.txt	72	76.9	79.4	78.13	link , all metrics
gpt-3.5-turbo, ctx10k	combined and indigenous.txt	77	81.3	82.5	`81.9`	link , all metrics
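
The four metric columns above all follow from the confusion-matrix counts of the binary is_ethnic label. A minimal sketch, with the function name binary_metrics being hypothetical. The counts in the example are not reported anywhere in these notes; they are inferred from the 63 Yes / 37 No gold labels and the row with 67% accuracy and a recall of 100, and should be read as an assumption.

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Accuracy, precision, recall and F1 for the positive (Yes) class."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Inferred counts (assumption): a recall of 100 gives tp=63 and fn=0, and
# 67% accuracy over 100 articles then gives tn=4 and fp=33.
metrics = binary_metrics(tp=63, fp=33, fn=0, tn=4)
```

With these counts the precision comes out near 0.656 and the F1 near 0.792, in line with that table row. A recall of 1.0 next to such a low precision is what predicting Yes for almost every article looks like.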

gpt-4o-mini was heavily biased toward the TRUE class for combined prompt v1 (see the recall of 100 in the table above), hence performing terribly. DONE ANNOTATING! There was a small mishap midway; anyway, here are the 4893 annotated ethnic articles.

Experiment - EDA on ChatGPT Turbo Annotation

Experiment - Topic modeling, word cloud in curated dataset

The topic coherence score was high (114.78088415347356), and we used the same threshold as usual. Also, the sports, health, local news, science and religion content is not ABOUT ethnic people; they are just part of it.

This topic distribution provides stronger evidence against our hypothesis. Now let us do word clouds. For protest and conflict, we got the following word cloud.

Experiment - EDA on both ChatGPT Turbo Annotation

Dataset	Top 40 topics
Ethnic	0.67%
Usual	0.70%

Agenda setting for ethnic people is violence. Dataset 1: 36.50% Dataset 2: 53.89%

For priming, Violence is the major priming for ethnic article while investigation is the major priming for normal articles. Dataset 1: 19.44% Dataset 2: 33.22%

For mobilizing, there is not much difference. Dataset 1: 19.44% Dataset 2: 33.22%

Annotation

We now need to annotate sentiment, agenda setting, framing, priming, mobilizing.

Paper dumps - Framing Evaluation and Noun encoding

For that, we use the Universal Sentence Encoder (USE) to encode frames, because frames are noun phrases, as is evident from the frames identified in prior work. According to Ajallouda et al., USE is the best technique for representing noun phrases. Next, we cluster these frames into a hierarchical structure using Agglomerative Clustering, ultimately isolating X top-level frames. We use the silhouette score to find the optimal number of clusters, X.

Framing is inherently nuanced. The intertwined nature of frames often leads to confusion among annotators, as a single news piece can contain multiple frames. This complexity makes it challenging to ensure that all frames are accurately captured during annotation. Additionally, a machine may detect a subset of frames or identify an overlapping but distinct set of frames. This scenario is best treated as a fuzzy or soft evaluation problem. A "soft match" occurs when a machine's prediction is considered correct if it matches any of the multiple frames identified by human annotators, even if the match is not exact or comprehensive. The Jaccard Index is used to estimate similarity between sets or sequences. Binary indicators are also used across domains to represent whether a relationship between sets holds. So we use the Jaccard Index and a binary indicator for a match, henceforth called the Binary Match Indicator (BMI), as soft evaluation metrics. In our case, BMI can be represented as:
$$ \text{Binary Match Indicator} = \begin{cases} 1, & \text{if } \text{FrameX} \in \{ \text{Frame1}, \text{Frame2}, \ldots, \text{FrameN} \} \\ 0, & \text{otherwise} \end{cases} $$

This metric represents a match or non-match between machine and human annotation. So we can now use the regular formula to calculate the human-machine agreement rate. In our case, if we evaluate over M articles, we can calculate the Average Agreement Rate using the binary match as follows:
$$ \text{Agreement Rate} = \quad \frac{1}{M} \sum_{i=1}^{M} \text{Binary Match Indicator}_i $$

Our human-machine agreement using the Binary Match Indicator was 0.43. We also calculated the Jaccard score to be 0.26.
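
The two soft metrics above can be sketched as follows. The frame names and function names are hypothetical. When the machine predicts several frames for one article, the sketch counts a match if any of them is among the human frames, which is an assumption that generalizes the single-frame formula above.

```python
def binary_match(predicted: set[str], human: set[str]) -> int:
    """1 if any predicted frame is among the human frames, otherwise 0."""
    return int(bool(predicted & human))


def jaccard(predicted: set[str], human: set[str]) -> float:
    """Intersection over union of the two frame sets."""
    union = predicted | human
    # Two empty sets are treated as identical (an assumption).
    return len(predicted & human) / len(union) if union else 1.0


def agreement_rate(pairs: list[tuple[set[str], set[str]]]) -> float:
    """Mean Binary Match Indicator over M (predicted, human) pairs."""
    return sum(binary_match(p, h) for p, h in pairs) / len(pairs)


# Hypothetical frame annotations for three articles.
pairs = [
    ({"conflict"}, {"conflict", "land rights"}),
    ({"culture", "tourism"}, {"culture"}),
    ({"economy"}, {"health"}),
]
rate = agreement_rate(pairs)
mean_jaccard = sum(jaccard(p, h) for p, h in pairs) / len(pairs)
```

In this toy example two of the three articles count as matches, while the mean Jaccard score is lower because partial overlaps are penalized. The reported 0.43 and 0.26 show the same ordering.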

Experiment - Framing

We annotated 109 articles for 5-type framing. Our agreement score was around 86.23%. Our result in 5-type framing using LLM is:

Model	Acc	Precision	Recall	F1-score
gpt-3.5-turbo, llm-paper	50	49	50	47
gpt-4o-mini, combined	58	55	58	56

We now annotate 109 articles for the 16 core frames we identified earlier from 1626 frames. The agreement score was ADITY. Our result using LLM is:

The Jaccard Index measures the similarity between the predicted frame set and the actual frame set. It is also called Intersection over Union.

The Jaccard Index ranges from 0 (no overlap) to 1 (perfect overlap), which makes it suitable for fuzzy evaluations. We use it because this is a fuzzy or soft evaluation problem: the predicted frames are not required to match the actual frame list exactly, but count as correct if they are present within the actual frames. Such scenarios call for metrics that account for partial matches and the degree of overlap between the predicted and actual frame lists.

Despite the moderate Jaccard Index score, we still use an LLM for frame identification because an LLM can find most frames in an article. A human annotator often overlooks frames and focuses on only a few of them.

Survey

Literature

Model	Jaccard Index
gpt-4o-mini, combined	0.371

Atuel et al. first prove representation discrimination and then try to explain it using sociological literature ¹⁸. Georgiou first conducts interviews and then summarizes the findings by quoting what participants said about certain topics ¹⁹. However, they do support those participants' claims empirically.

Bryant et al discussed the effects of media under-representation on minority groups. The findings indicate that televised portrayals of racial/ ethnic minorities influence majority group members’ real-world perceptions about minority groups as well as minority group members’ evaluations of self. The factors facilitating this learning process (perception) include frequency of television exposure, characteristics of the content/message, realism of the portrayal, similarity to the model, identification with the model, and level of individual cognitive ability (Bandura, 1986; Potter, 1986). Taken together, these variables provide one framework for understanding the extent to which the content and number of portrayals of minorities on television may result in judgment formation ²⁰.

CDA

Critical Discourse Analysis (CDA) focuses on linguistic devices or speech acts and how they serve the interests of powerful social groups.

Van Dijk's ideological square allows a subtle analysis of how various ideological stances are expressed. It includes:

Behnam and Mahmoudy (2013) discovered the political ideology in Iran’s nuclear report through discourse structure. In determining the ideological structures, the presupposition concept was employed leading to a specific ideological structure. This can be depicted through the phrase: “Iran has not provided requested information…” (Kerr 2009, p.2). The phrase denotes a negative belief in Iran. Besides, another preference in which ideology can be determined is through repetitive words in the report such as undeclared, uncertainties, inconsistencies, and contamination. These words depict a destructive image for the country such as: a) Iran is trying to conceal information from the world view, and b) Iran is inconsistent in its nuclear program.

Ramanathan et al. provide many more examples of CDA in their paper on "Applications of CDA", where researchers manually inspect articles to find bias towards a certain ideological stance ²¹.

When applied to the study of ethnic minorities, CDA often focuses on how these groups are represented in the media, politics, and public discourse. Key questions might include:

I think we can't use it because frankly, analyzing articles manually is time-consuming and will include my bias.

Question

Survey/Interview Topic: "Under-Representation in News Media and Its Impact on the Lives of Indigenous People in Bangladesh"

Case Finding

We need to delve deeper into analyzing the ethnic articles to find case studies to demonstrate clear discrimination. For it, we need to do the following tasks.

We can combine all of it into a single ChatGPT response. However, imo, this can be extended to our next task.

Task	Dependency
Filtering only ethnicity related articles (OERA)	ChatGPT
Finding sentiment in OERA	Bangla, ChatGPT
Retracing article name, author and category for case study	-
Use journalism domain analysis i.e. spokesperson, tone	ChatGPT

Additionally, finding case studies can greatly benefit from volunteers who are already working on it. They should already possess a large collection of such cases of discrimination, like the following.

References

1. Fonseca, António Filipe, et al. "Caste in the news: A computational analysis of Indian newspapers." Social Media+ Society 5.4 (2019): 2056305119896057.
2. Saleem, Muniba, and Srividya Ramasubramanian. "Muslim Americans’ responses to social identity threats: Effects of media representations and experiences of discrimination." Media Psychology 22.3 (2019): 373-393.
3. Haraldsson, Amanda. Media discrimination and women's political representation: experimental evidence of media effects on the supply-side. Diss. European University Institute, 2022.
4. Ittefaq, Muhammad, et al. "Discriminated in society and marginalized in media: Social representation of Christian sanitary workers in Pakistan." Journalism Practice 17.1 (2023): 66-84.
5. Galdi, Silvia, Francesca Guizzo, and Fabio Fasoli. "Media representation matters: The effects of exposure to counter-stereotypical gay male characters on heterosexual men’s expressions of discrimination." Group Processes & Intergroup Relations 26.6 (2023): 1329-1350.
6. Balasubramaniam, J. "Dalits and a Lack of Diversity in the Newsroom." Economic and Political Weekly (2011): 21-23.
7. http://api.ap.org/media/v/docs/AP_Classification_Metadata.htm
8. Ross, Karen, Karen Boyle, Cynthia Carter, and Debbie Ging. "Women, men and news: It’s life, Jim, but not as we know it."
9. Ross, Karen, and Cynthia Carter. "Women and news: A long and winding road." Media, Culture & Society 33.8 (2011): 1148-1165.
10. Bal, Haluk Mert, and Lemi Baruh. "Citizen involvement in emergency reporting: A study on witnessing and citizen journalism." Interactions: Studies in Communication & Culture 6.2 (2015): 213-231.
11. Ross, Karen, and Cynthia Carter. "Women and news: A long and winding road." Media, Culture & Society 33.8 (2011): 1148-1165.
12. Nziza, Elva. Representation of women in the news: An analysis of the New Times and Imvaho Nshya Newspapers in Rwanda. MS thesis. 2018.
13. https://mlpp.pressbooks.pub/pol111mhs/chapter/1-2-news/
14. Ross, Karen, Karen Boyle, Cynthia Carter, and Debbie Ging. "Women, men and news: It’s life, Jim, but not as we know it."
15. Atuel, Hazel, Viviane Seyranian, and William D. Crano. "Media representations of majority and minority groups." European Journal of Social Psychology 37.3 (2007): 561-572.
16. https://github.com/csebuetnlp/BanglaEmotionBias
17. https://sentiment.bangla.gov.bd/sentiment-emotion-analysis
18. Atuel, Hazel, Viviane Seyranian, and William D. Crano. "Media representations of majority and minority groups." European Journal of Social Psychology 37.3 (2007): 561-572.
19. Georgiou, Myria. "Diaspora in the digital era: Minorities and media representation." Jemie 12 (2013): 80.
20. Greenberg, B. S., D. Mastro, and J. E. Brand. "Minorities and the mass media: Television into the 21st century." Media Effects (2002). http://ndl.ethernet.edu.et/bitstream/123456789/58135/1/15.Jennings%20Bryant.pdf#page=344
21. Ramanathan, Renugah, and Tan Bee Hoon. "Application of Critical Discourse Analysis in Media Discourse Studies." 3L: Southeast Asian Journal of English Language Studies 21.3 (2015).

Newspaper	Genres
Prothom Alo	Politics, Crime, International, Business, Sports, Entertainment, Jobs, Lifestyle, `Local news, Health, Environmental Concern`, Education, Technology, Gadgets, Religion, Science, Comic
The Daily Star	Sports, Business, Entertainment, Life & Living, Youth, Tech & Startup, Environment, Education, Career, Fashion & Beauty, Food & Recipes, Health & Fitness, Lifehacks, Relationships & Family, Travel, TV & Film, Music, Theatre & Arts, Satire, Featured, Heritage, Gadgets, Gaming, Guides, Startups
Bangladesh Protidin	Local news, Lifestyle, Business, Religion, International, Sports, National news, Campus, Corporate corner, Health, Tech world, Politics, Chittagong, Science, Facebook corner, Foreign Bangladeshi, Oddities

Dataset	Size	Approach	Comment
eBD Bangla news	2294710 articles	-	-
Curated Ethnic Word Dataset	337793 articles	Keyword Extraction	Buggy, For list of articles, same list is added continuously.
Target Ethnic Articles	14000	5 Keyword	Topic modeling result came out bad due to repetition.
Curated Ethnic Word Dataset	10187	Keyword Extraction	Fixed the bug where same article was being added for each of its words
Most Relevant Articles	221	5 Keywords

General articles	5000
Indigenous articles	4892

Code	Bangla news distribution (100K) with CoRex.ipynb
Data	100k nonethnic articles in drive , Excel containing genre distribution

Code	data_analysis.ipynb , Bangla Word Cloud.ipynb
Data	4893 ethnic articles , 100k nonethnic articles in drive

Data	4893 ethnic articles
Code	Bangla word cloud for topic.ipynb