Mastering Twitter Data Extraction: A Guide to Apify's Tweet X.com Scraper Input Options

This guide dives deep into the input options of Apify's Tweet X.com Scraper, empowering you to craft precise and effective queries for extracting data from Twitter (X). We'll go beyond the basics and explore advanced techniques to unlock the full potential of this powerful tool.

Link to Tweet X.com Scraper: https://apify.com/xtdata/twitter-x-scraper

Understanding the Input Schema

The input schema defines all the configurable parameters for the Tweet X.com Scraper. You can access the full schema on the Actor's page on Apify. Here, we'll focus on the most important and useful options.

1. Search Terms (searchTerms)

This is the heart of your query. It accepts an array of strings, and each string is a complete search query that can combine several criteria. The power lies in using Twitter's advanced search operators.

Key Operators and Examples:

  • Keywords: data science (finds tweets containing both words)
  • Exact Phrase: "machine learning" (finds tweets with that exact phrase)
  • OR Operator: AI OR "artificial intelligence" (finds tweets containing either "AI" or the phrase "artificial intelligence"; quote multi-word alternatives so OR applies to the whole phrase)
  • Exclusion: -election (excludes tweets containing "election")
  • Hashtags: #python (finds tweets with the #python hashtag)
  • Mentions: @apify (finds tweets mentioning @apify)
  • From User: from:google (finds tweets from the user @google)
  • To User: to:elonmusk (finds tweets sent in reply to @elonmusk)
  • Date Range:
    • since:2024-01-01 (finds tweets since January 1, 2024)
    • until:2024-03-01 (finds tweets posted before March 1, 2024; the until date itself is not included)
    • Combine: since:2024-01-01 until:2024-03-01 (finds tweets within that date range)
  • Dynamic date: since:{{date:YYYY-MM-DD,-7d}} (finds tweets since 7 days ago)
  • Conversation ID: conversation_id:1764275654328975637 (finds replies in the conversation started by the tweet with ID 1764275654328975637)
  • Combination: #openai from:google since:{{date:YYYY-MM-DD,-7d}} until:{{date:YYYY-MM-DD}} (finds #openai tweets from @google posted in the last 7 days)
  • Filtering Replies/Retweets:
    • -filter:replies (Excludes all replies)
    • -filter:retweets (Excludes all retweets)
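
If you generate queries in a script rather than typing them by hand, the operators above can be assembled programmatically. The following is a minimal sketch, not part of the scraper: build_query and its parameter names are hypothetical, and it only joins the operators listed in this section into one string.

```python
from typing import Optional, Sequence


def build_query(
    keywords: Sequence[str] = (),
    exact_phrase: Optional[str] = None,
    any_of: Sequence[str] = (),
    exclude: Sequence[str] = (),
    from_user: Optional[str] = None,
    since: Optional[str] = None,
    until: Optional[str] = None,
    include_replies: bool = True,
    include_retweets: bool = True,
) -> str:
    """Join search operators into one query string (hypothetical helper)."""
    parts = list(keywords)
    if exact_phrase:
        parts.append(f'"{exact_phrase}"')
    if any_of:
        # Quote multi-word alternatives so OR applies to the whole phrase.
        alternatives = [f'"{term}"' if " " in term else term for term in any_of]
        parts.append("(" + " OR ".join(alternatives) + ")")
    parts.extend(f"-{word}" for word in exclude)
    if from_user:
        parts.append(f"from:{from_user.lstrip('@')}")
    if since:
        parts.append(f"since:{since}")
    if until:
        parts.append(f"until:{until}")
    if not include_replies:
        parts.append("-filter:replies")
    if not include_retweets:
        parts.append("-filter:retweets")
    return " ".join(parts)


query = build_query(
    keywords=["#AI"],
    from_user="@google",
    since="2024-01-01",
    include_replies=False,
)
print(query)  # #AI from:google since:2024-01-01 -filter:replies
```

The resulting string goes into the searchTerms array as a single entry.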

Resource: For a comprehensive list of operators, see the Twitter Advanced Search Guide.
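
The {{date:...}} placeholders shown above are resolved by the scraper. If you would rather compute explicit dates yourself (for example, to log exactly which window a run covered), a small sketch like the one below does the same arithmetic on your side. The function name rolling_window is hypothetical.

```python
from datetime import date, timedelta
from typing import Optional


def rolling_window(days: int, today: Optional[date] = None) -> str:
    """Return 'since:... until:...' covering the last `days` days (hypothetical helper)."""
    today = today or date.today()
    start = today - timedelta(days=days)
    return f"since:{start.isoformat()} until:{today.isoformat()}"


# A fixed "today" makes the result reproducible.
print(rolling_window(7, today=date(2024, 3, 8)))  # since:2024-03-01 until:2024-03-08
```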

2. Max Tweets (maxTweets)

This integer value sets the maximum number of tweets to retrieve. It's crucial for controlling costs and preventing excessively long run times. Start with a smaller number (e.g., 100) for testing and increase as needed.

3. Sort (sort)

This field determines the order of the results:

  • "Top": Returns the most relevant tweets (Twitter's default algorithm).
  • "Latest": Returns the most recent tweets (chronological order).

4. Language (tweetLanguage)

Use ISO 639-1 codes to filter by language (e.g., en for English, es for Spanish, fr for French). Leave this blank to retrieve tweets in all languages.

5. Filters (Boolean Flags)

These options allow you to target specific types of tweets:

  • onlyVerifiedUsers: true to include only tweets from verified accounts, false otherwise.
  • onlyTwitterBlue: true to include only tweets from Twitter Blue subscribers, false otherwise.
  • onlyImage: true to include only tweets containing images, false otherwise.
  • onlyVideo: true to include only tweets containing videos, false otherwise.
  • onlyQuote: true to include only quote tweets, false otherwise.
  • includeSearchTerms: true to add a search-term field to each result, false otherwise.
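
To catch typos before paying for a run, you can assemble the input in code and validate it first. This is a rough sketch under the assumption that the field names and allowed values are exactly the ones described in this guide; build_input is a hypothetical helper, and the Actor's published input schema remains the authority.

```python
from typing import Any, Dict, List, Optional

# Values and flag names as described in this guide; check the Actor's schema for the full list.
ALLOWED_SORTS = {"Top", "Latest"}
ALLOWED_FLAGS = {
    "onlyVerifiedUsers",
    "onlyTwitterBlue",
    "onlyImage",
    "onlyVideo",
    "onlyQuote",
    "includeSearchTerms",
}


def build_input(
    search_terms: List[str],
    max_tweets: int = 100,
    sort: str = "Latest",
    tweet_language: Optional[str] = None,
    **flags: bool,
) -> Dict[str, Any]:
    """Validate the options and return an input dict (hypothetical helper)."""
    if not search_terms:
        raise ValueError("searchTerms needs at least one query string")
    if max_tweets < 1:
        raise ValueError("maxTweets must be a positive integer")
    if sort not in ALLOWED_SORTS:
        raise ValueError(f"sort must be one of {sorted(ALLOWED_SORTS)}")
    unknown = set(flags) - ALLOWED_FLAGS
    if unknown:
        raise ValueError(f"unknown filter flags: {sorted(unknown)}")

    run_input: Dict[str, Any] = {
        "searchTerms": list(search_terms),
        "maxTweets": max_tweets,
        "sort": sort,
    }
    if tweet_language:
        # ISO 639-1 codes are two letters, e.g. "en" or "es".
        if len(tweet_language) != 2 or not tweet_language.isalpha():
            raise ValueError("tweetLanguage must be a two-letter ISO 639-1 code")
        run_input["tweetLanguage"] = tweet_language.lower()
    run_input.update(flags)
    return run_input


example = build_input(
    ["from:NatGeo"], max_tweets=100, sort="Top", tweet_language="es", onlyImage=True
)
print(example)
```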

Advanced Techniques

  • Multiple Search Terms: You can provide an array of search terms in the searchTerms field. This is useful for fetching data across multiple date ranges or related keywords. The scraper will run separate queries for each term (up to the concurrency limit).
    
            {
                "searchTerms": [
                    "from:NASA since:2023-01-01 until:2023-07-01",
                    "from:NASA since:2023-07-01 until:2024-01-01"
                ]
            }
            
  • Start URLs: You can also provide URLs of tweets or user profiles directly.
  • Combining Operators: The real power comes from combining multiple operators. For example: #AI from:google -filter:replies since:2024-01-01 (finds tweets with #AI, from Google, excluding replies, since Jan 1, 2024).
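
Writing one search term per date range by hand gets tedious for long periods. The sketch below generates the terms instead. It assumes since: includes its date and until: excludes it, which is how X search is commonly described, so consecutive windows share a boundary date without overlapping; adjust if you observe different behavior. The name split_date_range is hypothetical.

```python
import json
from datetime import date, timedelta
from typing import List


def split_date_range(base_query: str, start: date, end: date, chunk_days: int) -> List[str]:
    """Return one query per window of `chunk_days`; `end` is treated as exclusive."""
    if chunk_days < 1:
        raise ValueError("chunk_days must be at least 1")
    terms = []
    cursor = start
    while cursor < end:
        # The last window is shortened so it never runs past `end`.
        window_end = min(cursor + timedelta(days=chunk_days), end)
        terms.append(
            f"{base_query} since:{cursor.isoformat()} until:{window_end.isoformat()}"
        )
        cursor = window_end
    return terms


terms = split_date_range("from:NASA", date(2024, 1, 1), date(2024, 1, 31), chunk_days=10)
print(json.dumps({"searchTerms": terms}, indent=4))
```

Shorter windows mean more queries but fewer tweets per query, which is the trade-off to tune when a single large query returns incomplete results.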

Example Input Configurations

Example 1: Top tweets from @elonmusk mentioning both "Tesla" and "SpaceX" since the start of 2023 (capped at 500):


    {
        "searchTerms": [
            "from:elonmusk Tesla SpaceX since:2023-01-01"
        ],
        "maxTweets": 500,
        "sort": "Top",
        "tweetLanguage": "en"
    }
    

Example 2: Tweets with #DataScience and #Python, excluding replies, in English, from the last 30 days:


    {
      "searchTerms": [
        "#DataScience #Python -filter:replies since:{{date:YYYY-MM-DD,-30d}} until:{{date:YYYY-MM-DD}}"
      ],
      "maxTweets": 200,
      "sort": "Latest",
      "tweetLanguage": "en"
    }
    

Example 3: Image tweets from @NatGeo in Spanish:


    {
      "searchTerms": ["from:NatGeo"],
      "maxTweets": 100,
      "sort": "Top",
      "onlyImage": true,
      "tweetLanguage": "es"
    }
    

Example 4: Replies to a specific tweet that contain a specific hashtag:


    {
        "searchTerms": [
            "conversation_id:1234567890 #MyHashtag"
        ],
        "maxTweets": 100,
        "sort": "Latest",
        "tweetLanguage": "en"
    }
    

Replace 1234567890 with the actual Tweet ID and #MyHashtag with your desired hashtag.
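
Any of the configurations above can also be submitted from a script. Here is a minimal sketch using the apify-client Python package (an assumption: install it with pip install apify-client and set an APIFY_TOKEN environment variable). The Actor ID is taken from the link at the top of this guide; run_scraper and make_client are hypothetical wrapper names, and the shape of the returned items depends on the Actor.

```python
import os
from typing import Any, Dict, Iterator

ACTOR_ID = "xtdata/twitter-x-scraper"  # from the Actor link in this guide

EXAMPLE_INPUT: Dict[str, Any] = {
    "searchTerms": ["conversation_id:1234567890 #MyHashtag"],
    "maxTweets": 100,
    "sort": "Latest",
    "tweetLanguage": "en",
}


def make_client(token: str) -> Any:
    """Create the real client; imported lazily so the sketch loads without the package."""
    from apify_client import ApifyClient

    return ApifyClient(token)


def run_scraper(client: Any, run_input: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Start the Actor, wait for it to finish, and yield the dataset items."""
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    if run is None:
        raise RuntimeError("The Actor run did not return run details")
    yield from client.dataset(run["defaultDatasetId"]).iterate_items()


if __name__ == "__main__":
    token = os.environ.get("APIFY_TOKEN")
    if token:
        for tweet in run_scraper(make_client(token), EXAMPLE_INPUT):
            print(tweet)
    else:
        print("Set APIFY_TOKEN to run this sketch against Apify.")
```

Passing the client in as an argument keeps the function easy to test with a stand-in object before spending credits on a real run.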

Troubleshooting

  • No Results/Few Results: Your search query might be too restrictive. Try broadening your keywords, removing filters, or expanding the date range. Also double-check your advanced search syntax.
  • Incomplete Data: Twitter's API has rate limits. If you're fetching a very large number of tweets, you might encounter incomplete results. Try breaking your query into smaller chunks (e.g., shorter date ranges).

Conclusion

By mastering the input options of the Tweet X.com Scraper, you can precisely target the Twitter data you need. Experiment with different combinations of keywords, operators, and filters to unlock the full potential of this powerful tool. Remember to always respect Twitter's Terms of Service and use the scraper responsibly. Happy data extracting!
