[Air-L] Question about journalism databases
kalev leetaru
kalev.leetaru5 at gmail.com
Fri Oct 4 07:02:12 PDT 2024
In collaboration with the Internet Archive's TV News Archive, the TV
Explorer supports searches back to 2009 for CNN/MSNBC/Fox, 2010 for the "big
three" evening news broadcasts, and ~2014 for business channels, with other
start/end dates for a variety of other channels. NOTE that the archive.org
website keyword search is a discovery search engine and thus returns only a
list of matching broadcasts, regardless of how many times the phrase was
mentioned in each. The Explorer, by contrast, was designed for scholarly and
journalistic use cases and thus returns the number of times the phrase was
actually mentioned over time, along with various analytics:
https://api.gdeltproject.org/api/v2/summary/summary?d=iatv
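To illustrate the difference between a list of matching broadcasts and a
count of actual mentions, here is a minimal Python sketch. It runs on a
small made-up set of transcripts and also writes the per-broadcast counts as
CSV for use in a spreadsheet. The function names, broadcast IDs and sample
text are all hypothetical, and nothing here reflects how archive.org or the
TV Explorer is actually implemented. It matches the exact phrase only
(ignoring case and spacing), not close variants.

```python
import csv
import io
import re


def mentions_per_broadcast(transcripts, phrase):
    """Count how many times `phrase` occurs in each transcript.

    `transcripts` maps a broadcast id to its transcript text.
    Matching is case-insensitive and tolerant of extra whitespace.
    """
    words = [re.escape(w) for w in phrase.split()]
    pattern = re.compile(r"\s+".join(words), re.IGNORECASE)
    return {bid: len(pattern.findall(text)) for bid, text in transcripts.items()}


def to_csv(counts):
    """Render per-broadcast counts as CSV text, sorted by broadcast id."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["broadcast", "mentions"])
    for bid, n in sorted(counts.items()):
        writer.writerow([bid, n])
    return buf.getvalue()


# Hypothetical sample data, invented for illustration only.
transcripts = {
    "show-a": "There was not enough fraud to matter. Again, not enough fraud.",
    "show-b": "Weather and sports tonight.",
    "show-c": "Officials said there was NOT  enough fraud to change the outcome.",
}

counts = mentions_per_broadcast(transcripts, "not enough fraud")

# Discovery-style answer: how many broadcasts contain the phrase at all.
matching_broadcasts = sum(1 for n in counts.values() if n > 0)
# Mention-style answer: how many times the phrase was said in total.
total_mentions = sum(counts.values())

print(matching_broadcasts, total_mentions)
print(to_csv(counts))
```

On this sample the two answers differ (two matching broadcasts, three
mentions), which is the gap the note above describes.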
For CNN/MSNBC/Fox/BBC News London, you can also search the onscreen text
back to 2020, and for the evening news broadcasts back to 2010:
https://api.gdeltproject.org/api/v2/summary/summary?d=iatvai
Earlier this year we completed machine transcription of the complete
2.5-million-hour international archive, which spans more than 50 countries
and 150 languages over 24 years, using fidelity-preserving LSMs (which
uniquely preserve code switching and don't suffer the same issues as fluency
LSMs like Whisper). We will be making that searchable soon as well:
https://blog.gdeltproject.org/how-we-transcribed-2-5-million-hours-of-tv-news-in-just-7-days-how-we-could-have-finished-in-a-single-afternoon/
You can keyword search the 2017-present global web news monitoring in 400
languages, with the complete 30+ year historical archive available soon:
https://api.gdeltproject.org/api/v2/summary/summary?d=web
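The summary links in this message differ only in the value of the `d` query
parameter (`iatv`, `iatvai`, `web`). As a small convenience, the sketch
below builds those same URLs in Python. The friendly dataset labels and the
function name are hypothetical, and no other query parameters are assumed
beyond the `d` values shown in the links themselves.

```python
from urllib.parse import urlencode

SUMMARY_ENDPOINT = "https://api.gdeltproject.org/api/v2/summary/summary"

# Hypothetical friendly labels mapped to the `d` values used in the links.
DATASETS = {
    "tv_explorer": "iatv",         # TV Explorer search
    "tv_onscreen_text": "iatvai",  # onscreen text search
    "web_news": "web",             # global web news search
}


def summary_url(dataset):
    """Return the summary page URL for one of the labels in DATASETS."""
    return f"{SUMMARY_ENDPOINT}?{urlencode({'d': DATASETS[dataset]})}"


for label in DATASETS:
    print(label, summary_url(label))
```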
Journalists and scholars can also perform a wealth of more advanced visual
and textual analyses on all of the collections (for example, using
image-based embeddings to visually cluster a day of Russian, Iranian and
Chinese television, identifying who is telling the story on Russian
television's 60 Minutes, clustering a day of global news, using
LLMs/LSMs/LMMs to summarize, and applying sentiment, NLP, NLU, anomaly
detection, narrative framing, and a wide array of other analytic techniques):
https://blog.gdeltproject.org/video-web-summit-2023-multimodal-generative-ai-in-the-real-world/
Kalev
On Fri, Oct 4, 2024 at 8:54 AM Sarah Ann Oates via Air-L <
air-l at listserv.aoir.org> wrote:
> Hi, this is the well-known TV archive https://tvnews.vanderbilt.edu/ but I
> don't know how comprehensive it is.
>
> FWIW, I use the online news story versions of CNN and Fox etc to do
> analysis as a sort of "proxy." But it's not really the same thing and I
> think Stu's observation about the repetition of a particular phrase is
> interesting!
>
> MediaCloud <https://mediacloud.org/> offers a way to search phrases in a
> range of media across the world (for free).
>
>
>
> Sarah Oates
> Pronouns: she/her
> Author of Seeing Red: Russian Propaganda and American News
> <
> https://www.amazon.com/Seeing-Red-Russian-Propaganda-American-ebook/dp/B0CW1GM9D1/ref=tmm_kin_swatch_0?_encoding=UTF8&dib_tag=se&dib=eyJ2IjoiMSJ9.aZ9Jw2n89cnTjdyn_RybBD9pFIpFsv02YipyBKLoj5e0wFrm79ywnB1TnhmZb4NvESjax1QNHsL3mdMkaTlzdjo3gfurG3nFJPKgEkLbELhl0P8ewg_ffVrS8u2-O3ijH119rZz2BOWILlTzJlBrUrSaqUoC49IvLY6m7iUW949RJzrzgabDOrktp8XtKyja9v3N5E4Mx5MIx2T1S5cYmbwqo32jtALFg3vy-WnUH3Q.pKW4C2MW9Lo2flUbHlX0XXIHPGWaPYHrD8pCZXOX5Mc&qid=1716987311&sr=8-1
> >
>
> Associate Dean for Research/Professor and Senior Scholar
> UMD Distinguished Scholar-Teacher
> Philip Merrill College of Journalism
> University of Maryland
> College Park, MD 20742
> Email: soates at umd.edu
> Phone: 301 405 4510
> www.media-politics.com
> Twitter: @media_politics
>
> On Fri, Oct 4, 2024 at 6:17 AM Anmol Panda via Air-L <
> air-l at listserv.aoir.org> wrote:
>
> > Hi. I have used ProQuest data to access historical news articles. It is a
> > very comprehensive database of newspapers in the US. But I am interested in
> > knowing if there is a repository of TV news content.
> >
> > Please let me know.
> >
> > On Fri, Oct 4, 2024, 5:48 AM Shulman, Stu via Air-L <
> > air-l at listserv.aoir.org> wrote:
> >
> > > If I want to find out how many times close variants of the phrase “not
> > > enough fraud to change the outcome of the election” have been used by the
> > > media on air and in print since the 2020 US election, what database or set
> > > of databases would be most comprehensive? Can the data be extracted to a
> > > spreadsheet?
> > >
> > > Thanks,
> > > ~Stu
> > >
> > > --
> > > Dr. Stuart W. Shulman
> > > Founder and CEO, Texifter
> > > Editor Emeritus, *Journal of Information Technology & Politics*
> > > _______________________________________________
> > > The Air-L at listserv.aoir.org mailing list
> > > is provided by the Association of Internet Researchers http://aoir.org
> > > Subscribe, change options or unsubscribe at:
> > > http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
> > >
> > > Join the Association of Internet Researchers:
> > > http://www.aoir.org/
> > >