News Scraper, Analyzer, Creator, Distributor

  • Description: Automates the process of scraping, analyzing, and categorizing news content from a list of specified sources at regular intervals.

  • Scenario:

    • Trigger: Every 6 hours, Genie triggers a review of the news sources in the provided DATA SOURCE LIST.

    • Scraping: Scrapes each website on the list and saves the content as Source[x]['content_fresh'].

    • Comparison: Compares Source[x]['content_fresh'] with Source[x]['content_old'] and identifies the differences, which are then stored in Source[x]['content_for_analysis'].

    • Content Extraction:

      • a. Analyzes Source[x]['content_for_analysis'] to identify individual articles or news items.

      • b. Extracts the following details from each item: Title, Content (or a short summary), Link to the full content, Lead image, Author, Date, Tags, and any other relevant metadata.

    • Categorization: If the extracted content is related to TOPIC_X, it is stored in NEWCONTENT[date time][category] with the fields {title, content, author, link}.

  • Value: Ensures timely and accurate collection of fresh news content, improves efficiency in tracking updates from multiple sources, and enables focused analysis of content related to specific topics.
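The Trigger step is essentially a fixed-interval loop. Below is a minimal Python sketch, assuming the review is a plain callable and that a long-running process (rather than cron or a task scheduler) is acceptable. `run_periodically` and `max_runs` are illustrative names, not part of Genie.

```python
import time
from typing import Callable, Optional

# Interval taken from the Trigger step: one review every 6 hours.
REVIEW_INTERVAL_SECONDS = 6 * 60 * 60


def run_periodically(
    job: Callable[[], None],
    interval: float = REVIEW_INTERVAL_SECONDS,
    max_runs: Optional[int] = None,
) -> int:
    """Call `job` every `interval` seconds and return the number of runs.

    Runs forever when `max_runs` is None. Deadlines come from a monotonic
    clock, so the time the job takes is not added on top of the interval.
    """
    runs = 0
    next_run = time.monotonic()
    while max_runs is None or runs < max_runs:
        job()
        runs += 1
        if max_runs is not None and runs >= max_runs:
            break
        next_run += interval
        # Sleep until the next deadline; skip the wait if the job overran it.
        time.sleep(max(0.0, next_run - time.monotonic()))
    return runs
```

A caller might pass a hypothetical `review_all_sources` function that runs the remaining steps for every entry in the DATA SOURCE LIST, e.g. `run_periodically(review_all_sources)`.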
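The Scraping step can be sketched with the standard library alone. This assumes each Source[x] is a Python dict holding its address under a `"url"` key (an assumption; the description does not name that field). The fetch function is injectable so it can be swapped or tested offline. The sketch downloads raw HTML only: it does not render JavaScript or consult robots.txt.

```python
import urllib.request
from typing import Callable, Dict, List


def fetch_url(url: str, timeout: float = 30.0) -> str:
    """Download a page and decode it using the charset the server declares."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def scrape_sources(
    sources: List[Dict[str, str]],
    fetch: Callable[[str], str] = fetch_url,
) -> None:
    """Save each source's current page as source["content_fresh"]."""
    for source in sources:
        try:
            html = fetch(source["url"])
        except (OSError, LookupError) as exc:
            # A network error or unknown charset on one site should not abort
            # the whole review; record it and leave content_fresh untouched.
            source["error"] = str(exc)
        else:
            source["content_fresh"] = html
            source.pop("error", None)
```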
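The Comparison step maps naturally onto a line diff. The sketch below uses Python's `difflib.SequenceMatcher`. It rests on two assumptions the description leaves open. First, "differences" are taken to mean lines that are new or changed in the fresh snapshot, since removed lines cannot contain fresh news. Second, Source[x]['content_old'] is rolled forward after each comparison.

```python
import difflib
from typing import Dict


def compute_content_for_analysis(source: Dict[str, str]) -> str:
    """Fill source["content_for_analysis"] with lines that are new or changed.

    Lines are compared as whole units; lines that disappeared from the page
    are ignored.
    """
    old_lines = source.get("content_old", "").splitlines()
    fresh_lines = source.get("content_fresh", "").splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=fresh_lines, autojunk=False)
    added = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("insert", "replace"):
            added.extend(fresh_lines[j1:j2])
    source["content_for_analysis"] = "\n".join(added)
    # Assumption: the fresh snapshot becomes the baseline for the next run.
    source["content_old"] = source.get("content_fresh", "")
    return source["content_for_analysis"]
```

On the first run there is no content_old, so the whole page is treated as new. Line-level diffing works poorly on minified HTML, where a page may be a single line. Normalizing the markup or diffing extracted article blocks instead are possible alternatives.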
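The Content Extraction step depends heavily on each site's markup, so this sketch only illustrates the idea. It assumes articles are wrapped in HTML `<article>` elements. Within each one, the first heading is the title, `<p>` text is the content, and the first `<a href>`, `<img src>` and `<time datetime>` supply the link, lead image and date. Author and Tags are left out because their markup varies too widely to guess at. `ArticleExtractor` and `extract_articles` are hypothetical names built on the standard `html.parser` module.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional
from urllib.parse import urljoin

HEADINGS = ("h1", "h2", "h3")


class ArticleExtractor(HTMLParser):
    """Collect one dict per closed <article> element (assumed markup convention)."""

    def __init__(self, base_url: str = "") -> None:
        super().__init__()
        self.base_url = base_url
        self.articles: List[Dict[str, str]] = []
        self._current: Optional[Dict[str, str]] = None
        self._field: Optional[str] = None  # text field currently being captured

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "article":
            self._current = {"title": "", "content": "", "link": "", "picture": "", "date": ""}
        elif self._current is None:
            return  # ignore markup outside articles
        elif tag in HEADINGS and not self._current["title"]:
            self._field = "title"  # first heading becomes the title
        elif tag == "p":
            self._field = "content"
        elif tag == "a" and attr.get("href") and not self._current["link"]:
            self._current["link"] = urljoin(self.base_url, attr["href"])
        elif tag == "img" and attr.get("src") and not self._current["picture"]:
            self._current["picture"] = urljoin(self.base_url, attr["src"])
        elif tag == "time" and not self._current["date"]:
            self._current["date"] = attr.get("datetime") or ""

    def handle_endtag(self, tag):
        if self._current is None:
            return
        if tag in HEADINGS + ("p",) and self._field:
            self._current[self._field] += " "  # keep paragraphs apart
            self._field = None
        elif tag == "article":
            for key in ("title", "content"):
                self._current[key] = " ".join(self._current[key].split())
            self.articles.append(self._current)
            self._current = None

    def handle_data(self, data):
        if self._current is not None and self._field:
            self._current[self._field] += data


def extract_articles(html: str, base_url: str = "") -> List[Dict[str, str]]:
    """Return articles found in a (possibly partial) HTML fragment.

    An <article> left unclosed at the end of the fragment is dropped.
    """
    parser = ArticleExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.articles
```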
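The Categorization step needs a relevance test, which the description does not define. This sketch assumes simple case-insensitive keyword matching. The `TOPIC_KEYWORDS` entries are placeholders, the category is the topic name, and the "date time" key is the run's timestamp in ISO 8601 at minute resolution. Each NEWCONTENT bucket holds a list of items with the fields {title, content, author, link}, and a field the extractor did not capture is stored as an empty string.

```python
from datetime import datetime, timezone
from typing import Dict, Iterable, List, Optional

Article = Dict[str, str]
NewContent = Dict[str, Dict[str, List[Article]]]

# Hypothetical topic definition; the keywords are placeholders.
TOPIC_KEYWORDS: Dict[str, List[str]] = {
    "TOPIC_X": ["interest rate", "central bank"],
}
STORED_FIELDS = ("title", "content", "author", "link")


def categorize(
    articles: Iterable[Article],
    new_content: NewContent,
    run_time: Optional[datetime] = None,
) -> int:
    """File matching articles under new_content[timestamp][category].

    Returns the number of (article, category) entries stored.
    """
    if run_time is None:
        run_time = datetime.now(timezone.utc)
    stamp = run_time.isoformat(timespec="minutes")
    stored = 0
    for article in articles:
        text = f"{article.get('title', '')} {article.get('content', '')}".lower()
        for category, keywords in TOPIC_KEYWORDS.items():
            if any(keyword in text for keyword in keywords):
                bucket = new_content.setdefault(stamp, {}).setdefault(category, [])
                # Keep only the fields the Categorization step lists.
                bucket.append({field: article.get(field, "") for field in STORED_FIELDS})
                stored += 1
    return stored
```

Keyword matching is the simplest stand-in. A trained classifier or another relevance model could replace the `any(...)` test without changing how results are stored.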