[Air-L] Novice questions on how to extract tweets for China "social credit system" chimera project
Shulman, Stu
stu at texifter.com
Thu Nov 21 05:42:49 PST 2024
I believe many of these authors (see the first link below) trust
and acknowledge our contribution to their work on these sorts of data
management problems over the last 14 years:
https://tinyurl.com/DT4Twitter
When academics sign up for a free TrustDefender account (DiscoverText with
some cool new features that people should try out for teaching or
research), they receive the welcome email reproduced below. As it points
out, we have continued to provide access to tools and data despite massive
changes in the ecosystem. It is not a perfect, friction-free data world,
but if you take the briefing we offer, you will have answers to your list
of questions.
https://calendly.com/discovertext
We can get you from novice to intermediate in under 90 minutes. We can get
you data. We have unique theories, methods, and tools. We use the Twitter
display, so you get all the visual elements. We segment metadata and make
it highly amenable to filtering. We provide unique deduplication and
sampling tools for creating purposive samples. We enable multi-coder,
crowdsourced annotation and measurement of inter-rater reliability. We
provide a one-of-a-kind adjudication system for refining annotation,
creating gold-standard training sets, and ranking human annotators. We
provide automated loading of data and keystroke coding. We provide on-board
machine learning and deliver all of these elements in a graphical user
interface. The five pillars of text analytics are all in one place.
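For readers new to inter-rater reliability, here is a minimal sketch of one
common measure, Cohen's kappa, for two coders who labelled the same items.
It is only an illustration: the function name and sample labels are
hypothetical, and nothing here is meant to describe which statistic the
tool itself reports.

```python
from collections import Counter


def cohen_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders labelling the same items (illustrative)."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(coder_a)
    # Share of items on which the two coders agree.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance, from each coder's label frequencies.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both coders used a single identical label; treat as full agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels from two coders on four tweets.
coder_a = ["relevant", "relevant", "irrelevant", "relevant"]
coder_b = ["relevant", "irrelevant", "irrelevant", "relevant"]
kappa = cohen_kappa(coder_a, coder_b)
```

A value near 1 means agreement well beyond chance, while a value near 0
means the coders agree about as often as chance alone would predict.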
--- The Welcome Email ---
Yearlong TrustDefender Group License
I have just issued a yearlong TrustDefender group license to you. Did you
receive the email? It sometimes is directed to a spam folder. TrustDefender
is an improved version of DiscoverText specifically designed to make
teaching and collaborative research easier.
You will be the administrator of your own group account and will have the
ability to send out licenses to other people you would like to collaborate
with. Any recipient of a license from you who registers an account will
automatically be available to collaborate via your peer network. We have
streamlined this specific peer collaboration process in TrustDefender.
This service is 100% free for academics. Each member of your peer group can
get a license as part of your account, or you can send me their emails and
I will send each of them a group account they can control. Either way, and
this part is important, you need to remain "visible" in the TrustDefender
Peer Network to form connections and collaborate.
If you are a professor and you want to use TrustDefender in class, feel
free to let me know if you have questions or want a meeting about your
research design or implementation pedagogy.
Please review some of the introductory videos:
https://vimeo.com/showcase/5553857
If you do want to upload spreadsheets, that is easy to do:
https://vimeo.com/622539257
Book a meeting with the inventor:
https://calendly.com/discovertext
We are working on a keyword list for users:
https://tinyurl.com/DTManual
You can find a robust DiscoverText literature and creative ideas for
methods in these academic papers:
https://discovertext.com/mentions/
There is a lot of uncertainty in the Twitter data ecosystem. You can no
longer collect Twitter data in real time using an API. Contact me directly
about legacy datasets, which can be shared via the DiscoverText peer
network.
https://vimeo.com/503173700
There is an option to access Twitter data produced over the last 12 months
via Meltwater for a fee. Please contact me for a demo.
https://www.meltwater.com/en
Many scholars have massive stored Twitter datasets in the raw JSON format.
You can upload any historical Twitter data in JSON format to TrustDefender
for analysis:
https://vimeo.com/679097662
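For anyone who wants to peek inside such a file before uploading it, here
is a minimal sketch. It assumes newline-delimited JSON in the older v1.1
API tweet format (fields such as "full_text", "entities", and
"extended_entities"); files from other sources may be laid out differently.
The function names and the sample tweet are hypothetical.

```python
import json


def flatten_tweet(tweet):
    """Pull text, expanded links, and media URLs from one v1.1-style tweet."""
    # Extended tweets carry the untruncated text in "full_text".
    text = tweet.get("full_text") or tweet.get("text", "")
    entities = tweet.get("entities", {})
    # "extended_entities" lists every attached photo or video when present.
    media_source = tweet.get("extended_entities") or entities
    return {
        "id": tweet.get("id_str"),
        "text": text,
        # Prefer the stored expanded URL over the shortened t.co link.
        "links": [u.get("expanded_url") or u.get("url")
                  for u in entities.get("urls", [])],
        "media": [m.get("media_url_https")
                  for m in media_source.get("media", [])],
    }


def load_tweets(lines):
    """Parse newline-delimited JSON, skipping blank lines."""
    return [flatten_tweet(json.loads(line)) for line in lines if line.strip()]


# Hypothetical one-tweet sample standing in for a stored file.
sample = [
    '{"id_str": "1", "full_text": "Reading about it https://t.co/abc", '
    '"entities": {"urls": [{"url": "https://t.co/abc", '
    '"expanded_url": "https://example.org/report"}]}}'
]
rows = load_tweets(sample)
```

In this format the shortened link and its expanded target are stored side
by side, so the destination can usually be read from the file itself
without following the short link.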
No academic should study Twitter data in a spreadsheet until they have
spent 7 minutes watching this "Case Against...":
https://vimeo.com/526218014
Going forward, the opportunities to collect unique, tailored, specific
real-time or historical data will be very sharply curtailed by Twitter. As time
passes, archival questions of what remains and what is lost to history may
track closely to the history of newspapers. I completed a dissertation in
1999 about crumbly newspapers from the Progressive Era. Some were
available, others were not. This applies to Twitter data now and it always
did. Some data was preserved, much is lost or will be lost, even with
serious archival efforts.
I invite you to book a web meeting or send me a note if you have questions
about what remains possible. There are important questions about what comes
next in the history of information and how we work together to preserve
research opportunities.
Many people ask about Facebook, Instagram, and other social data.
DiscoverText has not been connected to Facebook's API since 2014. We do not
store or access any social data except Twitter. There is some Reddit data
in the Meltwater pipeline. A caveat is that some academics do have legal
access to non-Twitter social data and that data, when stored in
spreadsheets, can be uploaded by researchers into their DiscoverText
account like any other spreadsheet:
https://vimeo.com/622539257
A lot of people want to code transcripts of interviews and other
semi-structured data. This is not the ideal use case, but if your data fits
in a spreadsheet, it may be possible to make use of these tools.
I look forward to supporting your work,
~Stu
On Thu, Nov 21, 2024 at 6:36 AM Ollier Malaterre, Ariane via Air-L <
air-l at listserv.aoir.org> wrote:
> Hello amazing community,
>
> I'm working with Emilie Szwajnoch on a project to document the birth of
> the Chinese "social credit system" chimera in Western anglophone countries
> and we have naïve questions on how to extract tweets as this is our first
> time doing this and the situation with X is evolving rapidly. I hope
> experts in the community can help!
>
> a. How can we extract tweets in their entirety (text, pictures, and
> links)?
>
> b. If we extract tweets with shortened links, will the links work
> after extraction?
>
> c. Is it possible to extract all tweets posted in specific
> countries based on an X user's location? Or will we not have a full sample
> if an X user is not sharing their location (may this depend on countries'
> privacy regulations)?
>
> d. Is there a provider that this community trusts to do the
> extraction in case we don't do it ourselves?
>
> Thank you very much!
> Ariane Ollier-Malaterre, PhD<
> https://sites.google.com/site/olliermalaterre/home>
> Canada Research Chair on Digital Regulation at Work and in Life<
> https://digitalregulation.uqam.ca/en/home-english/>
> New Book Living with Digital Surveillance in China<
> https://www.routledge.com/9781032517704>
> New Book Le management à l'ère numérique<
> https://www.puq.ca/catalogue/livres/management-ere-numerique-4352.html>
>
> _______________________________________________
> The Air-L at listserv.aoir.org mailing list
> is provided by the Association of Internet Researchers http://aoir.org
> Subscribe, change options or unsubscribe at:
> http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
>
> Join the Association of Internet Researchers:
> http://www.aoir.org/
>
--
Dr. Stuart W. Shulman
Founder and CEO, Texifter
Editor Emeritus, *Journal of Information Technology & Politics*