Intermediate

AI-Powered Blog Automation

Build a complete automated content pipeline for WordPress

4 hours
7 Modules
Updated May 12, 2025
Instructor: Stephen AI
Founder of The Prompt Index with expertise in AI automation and content generation workflows.

Course Overview

This comprehensive course teaches you how to build and deploy an AI-powered, automated blog creation system. You'll write a Python script that fetches recent research articles, selects the most relevant one, transforms it into an engaging blog post with a custom image, and publishes it directly to WordPress and, optionally, Medium. By the end of the course, you'll have a tool that produces high-quality, SEO-friendly content with minimal oversight, saving you hours of work.

Requirements

  • Basic familiarity with Python (no expert knowledge required)
  • WordPress site with admin access
  • OpenAI API key (paid account)
  • Medium account (optional)
  • Google Colab (free)

What You'll Learn

  • Set up a complete AI workflow using OpenAI's GPT-4 and the latest gpt-image-1 image model
  • Create scripts that fetch current research from specialized sources
  • Implement intelligent content selection algorithms
  • Transform technical content into engaging blog posts
  • Generate custom images for each post
  • Automate publication to WordPress and Medium
  • Schedule and maintain your automated content pipeline

Course Content

Module 1

Understand the value of blog automation for SEO and discover the transformative potential of AI-powered content creation workflows.

Lessons in this module:

  • The Value of Blog Automation for SEO and Traffic
  • Overview of the Automated Blog Workflow
  • Setting Up Your Development Environment
  • Understanding the Key APIs and Tools

Module Content:

In today's fast-paced digital landscape, creating high-quality, engaging blog content consistently can be a challenging task. This course introduces you to an innovative solution: an automated blog creation script that leverages artificial intelligence to streamline your content production process.

The Value of Blog Automation for SEO and Traffic

Maintaining a consistent content strategy is crucial for SEO, audience engagement, and establishing authority. However, manually creating high-quality blog content is time-consuming and resource-intensive. Here's why AI-powered blog automation is a game-changer:

  • Time Efficiency: Reduce content creation time from hours to minutes
  • Consistency: Maintain a regular publishing schedule without burnout
  • SEO Advantages: Improve search rankings through consistent, relevant content
  • Content Diversity: Explore a wider range of topics than manual research allows
  • Cost-Effectiveness: Lower content production costs compared to hiring writers

By implementing the system you'll build in this course, you can publish fresh, relevant content regularly with minimal effort, directly boosting your site's search visibility and organic traffic growth.

Overview of the Automated Blog Workflow

Our automated blog creation system follows a well-defined workflow that combines several technologies to create a seamless content pipeline:

  1. Content Retrieval: The system fetches recent academic articles from arXiv, a repository of scholarly papers across various disciplines
  2. AI Analysis: OpenAI's GPT-4 evaluates the articles based on relevance, novelty, and audience interest
  3. Content Selection: The highest-scoring unpublished article is selected
  4. Blog Generation: The selected article is transformed into an engaging blog post that's accessible to a general audience
  5. Image Creation: gpt-image-1 generates a custom featured image based on the content
  6. Publication: The complete post is automatically published to WordPress and Medium (optional)

This modular approach allows for customization at each stage, ensuring the final content aligns with your brand voice and audience preferences.
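
To make the shape of this pipeline concrete before we build each piece, here is a minimal sketch of the orchestration logic. The helper names used here (select_best_article, generate_blog_post, generate_featured_image, publish_to_wordpress, save_published_article) are placeholders for components we implement in later modules, so treat this as an outline and don't run it until those helpers exist.

# Outline of the full pipeline (illustrative sketch; the helpers are built in later modules)
def run_blog_pipeline(search_query="ChatGPT"):
    # Steps 1-3: fetch candidate articles, score them with GPT-4, pick the best unpublished one
    article, reason = select_best_article(search_query)
    if article is None:
        print(f"Nothing to publish: {reason}")
        return

    # Step 4: transform the selected paper into an accessible blog post
    blog_post = generate_blog_post(article)

    # Step 5: create a custom featured image with gpt-image-1
    featured_image = generate_featured_image(blog_post)

    # Step 6: publish to WordPress (and optionally Medium), then record the article as used
    publish_to_wordpress(blog_post, featured_image)
    save_published_article(article['html_link'])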

Setting Up Your Development Environment

To build and run our automated blog system, we'll use Google Colab - a free, cloud-based Python environment that requires no local installation. Here's why it's ideal for our project:

  • Zero Setup: Run Python code directly in your browser
  • Pre-installed Libraries: Many required packages are already available
  • Free GPU Access: Process data faster with cloud computing resources
  • Persistent Storage: Connect to Google Drive to save your work
  • Easy Sharing: Collaborate with team members through shareable notebooks

Getting started with Google Colab is simple:

  1. Go to Google Colab
  2. Sign in with your Google account
  3. Click on "New Notebook" to create a new project

For our project, we'll need to install several Python libraries. This can be done directly in Colab with the following command:

!pip install requests beautifulsoup4 openai PyPDF2 medium-sdk markdown python-dotenv

These libraries handle various aspects of our workflow:

  • requests: For making HTTP requests to fetch articles
  • beautifulsoup4: For parsing HTML content
  • openai: For accessing GPT-4 and gpt-image-1
  • PyPDF2: For extracting text from PDF documents
  • medium-sdk: For publishing to Medium
  • markdown: For converting Markdown to HTML
  • python-dotenv: For managing environment variables

Understanding the Key APIs and Tools

Our blog automation system interacts with several external services through their APIs. Let's explore each one:

1. OpenAI API

The OpenAI API is central to our system, providing access to:

  • GPT-4: Powers our content analysis and blog post generation
  • gpt-image-1: Creates custom images for our blog posts

To use these capabilities, you'll need an API key from OpenAI. If you don't have one, you can sign up at OpenAI's website. The initial setup costs are minimal – you can start with as little as $5 and run the script for a whole month.

Here's a basic example of initializing the OpenAI client:

import os
from openai import OpenAI

# Initialize the OpenAI client with the API key stored in an environment variable
client = OpenAI(api_key=os.environ['OPENAI_API_KEY'])

2. WordPress REST API

The WordPress REST API allows our script to interact directly with your WordPress site, enabling:

  • Creating new posts programmatically
  • Uploading media files (images)
  • Setting featured images
  • Configuring post metadata

To use this API, you'll need:

  • Your WordPress site URL
  • Administrator credentials (username and application password)

Note that we'll be using an application password rather than your regular login password. This is a more secure approach that limits the permissions granted to our script.
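
As a quick illustration of what this looks like in practice, here's a minimal sketch that creates a draft post through the REST API. It assumes the environment variables we configure in Module 2 (WORDPRESS_URL, WORDPRESS_USERNAME, WORDPRESS_PASSWORD) are already set; the full publishing function comes later in the course.

import os
import base64
import requests

# Build a Basic auth header from the WordPress username and application password
credentials = f"{os.environ['WORDPRESS_USERNAME']}:{os.environ['WORDPRESS_PASSWORD']}"
token = base64.b64encode(credentials.encode()).decode("utf-8")
headers = {"Authorization": f"Basic {token}"}

# Create a draft post via the standard /wp/v2/posts endpoint
payload = {
    "title": "Hello from the automation script",
    "content": "<p>This draft was created through the WordPress REST API.</p>",
    "status": "draft",  # use "publish" to go live immediately
}
response = requests.post(
    f"{os.environ['WORDPRESS_URL']}/wp-json/wp/v2/posts",
    headers=headers,
    json=payload,
)
print(response.status_code, response.json().get("link"))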

3. Medium API (Optional)

If you want to cross-post your content to Medium, our script can also handle that through the Medium API. This extends your content's reach to Medium's audience with no additional effort.

To use this feature, you'll need a Medium integration token, which you can obtain from your Medium settings.
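
For reference, here is a rough sketch of cross-posting with plain requests against Medium's v1 REST endpoints (the course script itself uses the medium-sdk library). It assumes the MEDIUM_TOKEN environment variable from Module 2 is set, and the title and content shown are placeholders.

import os
import requests

MEDIUM_API = "https://api.medium.com/v1"
headers = {
    "Authorization": f"Bearer {os.environ['MEDIUM_TOKEN']}",
    "Content-Type": "application/json",
}

# Look up the authenticated user's ID, then create a draft post under that user
me = requests.get(f"{MEDIUM_API}/me", headers=headers).json()
user_id = me["data"]["id"]

post = {
    "title": "Cross-posted from my automated blog",
    "contentFormat": "markdown",
    "content": "# Hello Medium\n\nThis draft was created via the Medium API.",
    "publishStatus": "draft",
}
response = requests.post(f"{MEDIUM_API}/users/{user_id}/posts", headers=headers, json=post)
print(response.status_code)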

4. BeautifulSoup and Other Libraries

In addition to these APIs, we'll use several Python libraries to handle various aspects of our workflow:

  • BeautifulSoup: For parsing HTML from web pages
  • PyPDF2: For extracting text from PDF documents
  • Markdown: For converting our Markdown content to HTML for WordPress

These tools work together to create a seamless pipeline from content discovery to publication.
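
To see how the last of these fits in, here's a tiny example of converting a Markdown snippet into the HTML that WordPress expects; the blog posts GPT-4 generates later in the course pass through exactly this kind of conversion step.

import markdown

# Convert a small Markdown snippet to HTML before sending it to WordPress
md_text = "## Key Takeaways\n\n- AI can draft posts quickly\n- Humans still review before publishing"
html = markdown.markdown(md_text)
print(html)
# Prints an <h2> heading followed by a <ul> with two <li> items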

✅ Action Steps

  1. Create a Google Colab account if you don't already have one
  2. Get an OpenAI API key from OpenAI's platform
  3. Make sure you have administrator access to your WordPress site
  4. Set up an application password in your WordPress account (We'll cover this in detail in Module 2)
  5. Optional: Create a Medium account and get an integration token if you want to cross-post content

In the next module, we'll dive into setting up your environment in Google Colab and configuring all the necessary credentials for a smooth workflow.

Module 2

Configure Google Colab, install required libraries, and set up all necessary API keys and credentials for your automated blog system.

Lessons in this module:

  • Creating a Google Colab Account
  • Installing Required Python Libraries
  • Configuring API Keys and Environment Variables
  • Setting Up WordPress Application Passwords

Module Content:

Before we can start building our automated blog system, we need to set up our development environment and configure all the necessary credentials and APIs. This module will guide you through each step of this process, ensuring you have everything you need to proceed with creating your content automation workflow.

Creating a Google Colab Account

Google Colab (Colaboratory) is a free cloud-based Jupyter notebook environment that requires no setup and runs entirely in the cloud. It's perfect for our blog automation project as it provides free access to computing resources and comes with many pre-installed libraries.

If you already have a Google account, you automatically have access to Google Colab. Here's how to get started:

  1. Go to https://colab.research.google.com
  2. If prompted, sign in with your Google account
  3. You'll be presented with a welcome screen. Click on "New Notebook" to create your first Colab notebook

When you open a new notebook, you'll see:

  • A single empty code cell with a play button on the left
  • Menu options at the top for file operations, editing, runtime management, etc.
  • Options to add new code or text cells

Google Colab notebooks are structured like Jupyter notebooks, with alternating cells for code and text (Markdown). This allows us to write executable code and document it simultaneously.

Key Colab features we'll use:

  • Code execution: Run Python code directly in your browser
  • File upload/download: Transfer files between your local machine and the Colab environment
  • Google Drive integration: Access and save files to your Google Drive
  • Environment persistence: Variables and installed libraries remain available throughout your session

Installing Required Python Libraries

Now that we have our Colab environment set up, let's install the Python libraries we'll need for our blog automation system. Copy and paste the following command into the first cell of your notebook:

!pip install requests beautifulsoup4 openai PyPDF2 medium-sdk markdown python-dotenv

Click the play button or press Shift+Enter to execute the cell. You'll see output as Colab installs these libraries.

Let's understand what each library does:

  • requests: For making HTTP requests to websites and APIs
  • beautifulsoup4: A library for parsing HTML and XML documents, which we'll use to extract content from web pages
  • openai: The official OpenAI Python client, which provides access to GPT-4 and gpt-image-1
  • PyPDF2: For extracting text from PDF documents, useful when we need to parse research papers
  • medium-sdk: A client library for interacting with the Medium API
  • markdown: For converting Markdown to HTML, which is needed when publishing to WordPress
  • python-dotenv: For loading environment variables from a .env file (though we'll use a different approach in Colab)

After the installation completes, let's verify that the libraries were installed correctly by importing them in a new code cell:

import requests
from bs4 import BeautifulSoup
from openai import OpenAI
import os
import io
from PyPDF2 import PdfReader
import re
import base64
from medium import Client
import markdown
import json

print("All libraries imported successfully!")

Run this cell. If all libraries were installed correctly, you should see the message "All libraries imported successfully!" without any error messages.

Configuring API Keys and Environment Variables

For security, we'll use environment variables to store sensitive information like API keys. In Colab, we can set these variables directly in our notebook.

Create a new cell and enter the following code, replacing the placeholder values with your actual credentials:

import os

# Set environment variables
os.environ['OPENAI_API_KEY'] = 'your_openai_api_key_here'
os.environ['MEDIUM_TOKEN'] = 'your_medium_token_here'  # Optional
os.environ['WORDPRESS_USERNAME'] = 'your_wordpress_username'
os.environ['WORDPRESS_PASSWORD'] = 'your_wordpress_application_password'
os.environ['WORDPRESS_URL'] = 'https://your-wordpress-site.com'

# Verify variables are set
print("Environment variables set:", list(os.environ.keys()))

For the OPENAI_API_KEY, you'll need to:

  1. Go to https://platform.openai.com/account/api-keys
  2. Sign in to your OpenAI account
  3. Click "Create new secret key"
  4. Copy the generated API key and paste it in the code above

For the MEDIUM_TOKEN (optional):

  1. Go to https://medium.com/me/settings
  2. Scroll down to the "Integration tokens" section
  3. Create a new token and copy it to the code above

For the WordPress credentials, we'll need to set up an application password, which we'll cover in the next section.

Security Note: While storing credentials this way is convenient for development, be careful with sharing your notebook. Anyone with access to your notebook could potentially see these credentials. For a production environment, consider using a more secure approach like a secrets manager.
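
If you'd rather not paste keys directly into a cell, recent versions of Colab include a Secrets panel (the key icon in the left sidebar). The sketch below assumes you've already added a secret named OPENAI_API_KEY there; it's an optional alternative to the environment-variable cell above.

# A minimal sketch using Colab's Secrets panel
# (assumes a secret named OPENAI_API_KEY has been added via the key icon in the sidebar)
from google.colab import userdata
import os

os.environ['OPENAI_API_KEY'] = userdata.get('OPENAI_API_KEY')
print("OpenAI key loaded from Colab Secrets.")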

Setting Up WordPress Application Passwords

WordPress application passwords provide a secure way for external applications to authenticate with your WordPress site without using your main account password. Here's how to set one up:

  1. Log in to your WordPress admin dashboard: Go to https://your-wordpress-site.com/wp-admin and sign in with your administrator account
  2. Navigate to Users: In the left sidebar, click on "Users"
  3. Edit your user profile: Find your admin user and click "Edit"
  4. Scroll down to Application Passwords: Near the bottom of the page, you'll find the "Application Passwords" section
  5. Generate a new password:
    • Enter a name for the application (e.g., "Blog Automation Script")
    • Click "Add New Application Password"
    • WordPress will generate a new password
  6. Copy the password: Copy the generated password and paste it into your environment variables code where it says your_wordpress_application_password

Important: WordPress will only show you the application password once, immediately after creation. Make sure to copy it before leaving the page. If you lose it, you'll need to generate a new one.

Application passwords have several benefits:

  • They can be revoked individually without changing your main password
  • They provide limited access compared to your main account password
  • They create a clear audit trail of which application is accessing your site

With your application password set up, update the environment variables in your notebook:

  • WORDPRESS_USERNAME should be your WordPress admin username or email
  • WORDPRESS_PASSWORD should be the application password you just generated
  • WORDPRESS_URL should be the URL of your WordPress site (e.g., https://example.com, without a trailing slash)

Now run the cell to set these environment variables. You should see them listed in the output.

Testing API Connections

Let's verify that our API connections are working correctly. Create a new cell and add the following code:

# Test OpenAI API
client = OpenAI(api_key=os.environ['OPENAI_API_KEY'])

try:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Say hello!"}
        ],
        max_tokens=5
    )
    print("OpenAI API test successful:")
    print(response.choices[0].message.content)
except Exception as e:
    print("OpenAI API test failed:", e)

# Test WordPress connection
wordpress_url = os.environ['WORDPRESS_URL']
wordpress_api_url = f"{wordpress_url}/wp-json/wp/v2/posts?per_page=1"
credentials = f"{os.environ['WORDPRESS_USERNAME']}:{os.environ['WORDPRESS_PASSWORD']}"
token = base64.b64encode(credentials.encode())
headers = {'Authorization': f'Basic {token.decode("utf-8")}'}

try:
    response = requests.get(wordpress_api_url, headers=headers)
    if response.status_code == 200:
        print("\nWordPress API test successful!")
        print(f"Found {len(response.json())} post(s)")
    else:
        print("\nWordPress API test failed:", response.status_code)
        print(response.text)
except Exception as e:
    print("\nWordPress API test failed:", e)

Run this cell to test your API connections. If everything is set up correctly, you should see success messages for both the OpenAI API and the WordPress API.

If you encounter any errors:

  • For OpenAI errors, check that your API key is correct and that your account has sufficient credit
  • For WordPress errors, verify your site URL, username, and application password
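
If the WordPress test keeps failing, a quick way to narrow down the cause is to check whether the REST API is reachable at all before worrying about credentials. This is a small diagnostic sketch, not part of the final script:

import os
import requests

# Hit the REST API root without authentication; a 200 here means the API is reachable,
# so any remaining failures are likely a username/application password problem.
root = requests.get(f"{os.environ['WORDPRESS_URL']}/wp-json/")
print("REST API root status:", root.status_code)
if root.status_code != 200:
    print("The REST API may be disabled, blocked by a security plugin, or the URL may be wrong.")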

✅ Action Steps

  1. Create a Google Colab notebook and name it "AI Blog Automation"
  2. Install all the required Python libraries
  3. Set up your OpenAI API key
  4. Generate a WordPress application password
  5. Configure all environment variables in your notebook
  6. Run the API connection tests to verify everything is working

Once all your connections are working properly, you're ready to move on to Module 3, where we'll start building the content fetching and selection components of our automated blog system.

Module 3

Learn to retrieve research articles from academic sources and implement AI-powered content selection to find the most suitable topics for your blog.

Lessons in this module:

  • Retrieving Articles from Academic Sources
  • Using BeautifulSoup for Web Scraping
  • Implementing AI-Powered Content Selection
  • Tracking Published Articles to Avoid Duplication

Module Content:

Now that our environment is set up, it's time to start building the core components of our automated blog system. In this module, we'll focus on finding and selecting content—specifically, how to fetch articles from academic sources like arXiv and use AI to identify the most promising candidates for blog conversion.

Retrieving Articles from Academic Sources

For our blog automation system, we need a reliable source of high-quality content. Academic repositories like arXiv are excellent for this purpose because:

  • They contain cutting-edge research across many fields
  • The content is publicly accessible
  • Papers typically have clear structures (abstract, introduction, methodology, etc.)
  • There's a constant flow of new material

We'll be using arXiv as our primary source, but the techniques we'll cover can be adapted for other academic repositories or specialized websites in your niche.

Let's start by creating a function to fetch recent articles from arXiv. Add this to a new cell in your Colab notebook:

def fetch_arxiv_articles(search_query="ChatGPT"):
    """
    Fetches recent articles from arXiv based on the provided search query.

    Args:
        search_query (str): The topic to search for on arXiv

    Returns:
        list: A list of dictionaries containing article information
    """
    # Build the URL with the search query
    url = f'https://arxiv.org/search/?query={search_query}&searchtype=all&source=header'

    print(f"Fetching articles from: {url}")

    # Send HTTP request to arXiv
    response = requests.get(url)

    # Check if the request was successful
    if response.status_code != 200:
        print(f"Error: Failed to fetch articles. Status code: {response.status_code}")
        return []

    # Parse the HTML content
    soup = BeautifulSoup(response.text, 'html.parser')

    # Initialize empty list to store article data
    articles = []

    # Find all article result items
    for result in soup.find_all('li', class_='arxiv-result'):
        try:
            # Extract title
            title_elem = result.find('p', class_='title')

            # Extract abstract
            abstract_elem = result.find('span', class_='abstract-full')

            # Extract arXiv ID and build direct link
            arxiv_id_elem = result.find('p', class_='list-title').find('a')

            if title_elem and abstract_elem and arxiv_id_elem:
                title = title_elem.text.strip()
                abstract = abstract_elem.text.strip()
                arxiv_id = arxiv_id_elem.text.strip().split(':')[1].strip()
                html_link = f'https://arxiv.org/abs/{arxiv_id}'

                # Extract authors
                authors_elem = result.find('p', class_='authors')
                authors = authors_elem.text.strip() if authors_elem else "Authors not found"

                # Create article dictionary and add to list
                articles.append({
                    'title': title,
                    'abstract': abstract,
                    'html_link': html_link,
                    'authors': authors
                })

                print(f"Extracted article: {title}")
                print(f"HTML link: {html_link}")
                print("-" * 50)

        except Exception as e:
            print(f"Error extracting article: {e}")
            continue

    print(f"Successfully retrieved {len(articles)} articles.")
    return articles[:10]  # Return the first 10 articles

This function performs several important tasks:

  1. It constructs a URL for arXiv's search functionality with our desired query
  2. It sends an HTTP request to retrieve the search results page
  3. It parses the HTML response using BeautifulSoup
  4. It extracts key information for each article: title, abstract, link, and authors
  5. It returns a list of dictionaries, each containing the data for a single article

By default, the function searches for "ChatGPT", but you can customize this parameter to focus on articles relevant to your blog's niche. For example, if you run a healthcare blog, you might use search terms like "healthcare AI" or "medical machine learning".
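
Scraping the search page works, but it can break if arXiv changes its HTML. If you'd rather not depend on the page structure, arXiv also offers an official Atom-feed API; the sketch below shows the same kind of query against it, parsed with BeautifulSoup's XML mode (which needs lxml, typically preinstalled in Colab). This is an optional alternative, not something the main script requires.

import requests
from bs4 import BeautifulSoup

# Query arXiv's official API (returns an Atom XML feed) instead of scraping HTML
api_url = "http://export.arxiv.org/api/query"
params = {"search_query": "all:ChatGPT", "start": 0, "max_results": 10}
feed = requests.get(api_url, params=params)

soup = BeautifulSoup(feed.text, "xml")
for entry in soup.find_all("entry"):
    title = entry.find("title").text.strip()
    link = entry.find("id").text.strip()
    print(title, "-", link)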

Let's test our function with a quick example:

# Test our article fetching function
articles = fetch_arxiv_articles("AI ethics")
print(f"\nRetrieved {len(articles)} articles.")

# Display the first article details
if articles:
    print("\nFirst article details:")
    first_article = articles[0]
    print(f"Title: {first_article['title']}")
    print(f"Authors: {first_article['authors']}")
    print(f"URL: {first_article['html_link']}")
    print(f"Abstract Preview: {first_article['abstract'][:200]}...")

Using BeautifulSoup for Web Scraping

In our fetch_arxiv_articles function, we're already using BeautifulSoup to parse the HTML content from arXiv. Let's take a deeper look at how BeautifulSoup works and how we can use it more effectively for our web scraping needs.

BeautifulSoup is a Python library that makes it easy to scrape information from web pages. It creates a parse tree from HTML and XML documents that can be used to extract data in a hierarchical and intuitive way.

Here's a breakdown of the key BeautifulSoup concepts we're using:

  • Creating a soup object: soup = BeautifulSoup(response.text, 'html.parser')
  • Finding elements by tag and class: soup.find_all('li', class_='arxiv-result')
  • Navigating the tree: result.find('p', class_='title')
  • Extracting text content: title_elem.text.strip()
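
If these calls are new to you, here's a tiny self-contained example that exercises each of them on a made-up HTML fragment, so you can see what find_all, find, and .text return before we apply them to arXiv's real markup.

from bs4 import BeautifulSoup

html = """
<ul>
  <li class="arxiv-result"><p class="title">Paper One</p></li>
  <li class="arxiv-result"><p class="title">Paper Two</p></li>
</ul>
"""

soup = BeautifulSoup(html, 'html.parser')                    # create a soup object
for result in soup.find_all('li', class_='arxiv-result'):    # find elements by tag and class
    title_elem = result.find('p', class_='title')            # navigate within each result
    print(title_elem.text.strip())                           # extract the text content
# Prints "Paper One" and "Paper Two"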

Now, let's enhance our article retrieval capabilities by adding a function to fetch the full content of an article once it's been selected. This is more complex than fetching the search results because arXiv papers can be available in HTML or PDF format, and we need to handle both cases:

def fetch_article_content(article):
    """
    Fetches the full content of an article from arXiv.
    Attempts to get HTML version first, falls back to PDF if necessary.

    Args:
        article (dict): Article information containing at least 'html_link'

    Returns:
        dict: Enhanced article with full content
    """
    try:
        # Extract article ID from link
        article_id = re.search(r'/(\d+\.\d+)$', article['html_link']).group(1)

        # Try HTML version first
        html_link = f'https://arxiv.org/html/{article_id}'
        pdf_link = f'https://arxiv.org/pdf/{article_id}.pdf'

        print(f"Attempting to fetch HTML content from: {html_link}")

        # Try to get HTML version
        response = requests.get(html_link)

        if response.status_code == 200:
            print("HTML version available. Extracting content...")
            content = extract_html_content(response.text)
        else:
            print(f"HTML version not available. Attempting to fetch PDF from: {pdf_link}")
            content = extract_pdf_content(pdf_link)

        # Add authors and link to content
        content['authors'] = article['authors']
        content['html_link'] = article['html_link']

        return content

    except Exception as e:
        print(f"Error fetching article content: {e}")
        return None


def extract_html_content(html_content):
    """
    Extracts article content from HTML format.

    Args:
        html_content (str): The HTML content of the article

    Returns:
        dict: Structured article content
    """
    soup = BeautifulSoup(html_content, 'html.parser')

    # Extract authors (might be already available from the search results)
    try:
        authors = soup.find('div', class_='ltx_authors').text.strip()
    except:
        authors = "Authors not found"

    # Extract abstract
    try:
        abstract = soup.find('div', class_='ltx_abstract').find('p').text.strip()
    except:
        abstract = "Abstract not found"

    # Extract sections (each with title and content)
    content = []
    for section in soup.find_all('section', class_='ltx_section'):
        try:
            section_title = section.find('h2').text.strip() if section.find('h2') else "Untitled Section"
            section_content = []

            for para in section.find_all('p', class_='ltx_p'):
                section_content.append(para.text.strip())

            content.append((section_title, '\n'.join(section_content)))
        except Exception as e:
            print(f"Error parsing section: {e}")

    return {
        'authors': authors,
        'abstract': abstract,
        'content': content
    }


def extract_pdf_content(pdf_url):
    """
    Extracts article content from PDF format.

    Args:
        pdf_url (str): URL to the PDF file

    Returns:
        dict: Structured article content
    """
    # Download the PDF
    response = requests.get(pdf_url)
    if response.status_code != 200:
        raise Exception(f"Failed to fetch PDF: HTTP {response.status_code}")

    # Create a binary stream from the PDF content
    pdf_file = io.BytesIO(response.content)

    # Create a PDF reader object
    pdf_reader = PdfReader(pdf_file)

    # Extract text from each page
    content = []
    for page in pdf_reader.pages:
        content.append(page.extract_text())

    # Join all pages
    full_text = '\n'.join(content)

    # Try to extract the abstract and other sections
    authors = extract_authors_from_pdf(full_text)
    abstract = extract_abstract_from_pdf(full_text)

    return {
        'authors': authors,
        'abstract': abstract,
        'content': [('Full Content', full_text)]
    }


def extract_authors_from_pdf(text):
    """
    Attempts to extract author information from PDF text.

    Args:
        text (str): The full text of the PDF

    Returns:
        str: Author information
    """
    # Authors are typically at the beginning before the abstract
    authors_match = re.search(r'^(.*?)\n\s*Abstract', text, re.DOTALL | re.MULTILINE)
    if authors_match:
        return authors_match.group(1).strip()
    return "Authors not found"


def extract_abstract_from_pdf(text):
    """
    Attempts to extract the abstract from PDF text.

    Args:
        text (str): The full text of the PDF

    Returns:
        str: The abstract
    """
    # Abstract is typically between "Abstract" and the first section (often "1. Introduction")
    abstract_match = re.search(r'Abstract\n(.*?)\n\s*1\.', text, re.DOTALL)
    if abstract_match:
        return abstract_match.group(1).strip()
    return "Abstract not found"
These functions handle both HTML and PDF versions of arXiv papers:

  • The fetch_article_content function first tries to get the HTML version, which is easier to parse
  • If HTML isn't available, it falls back to the PDF version
  • For HTML documents, extract_html_content uses BeautifulSoup to parse the structure
  • For PDFs, extract_pdf_content uses PyPDF2 to extract text and then applies regex-based parsing

The extraction from PDFs is less precise because PDFs don't maintain the same structured format as HTML. However, it provides a fallback option to ensure our system works with any arXiv paper.
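
Because the regex-based extraction is approximate, it's worth sanity-checking it on a small synthetic example before trusting it on real papers. The fragment below is made up purely for illustration:

# A made-up fragment shaped like the opening of a typical paper
sample_pdf_text = (
    "Jane Doe, John Smith\n"
    "Abstract\n"
    "We study how large language models summarize research.\n"
    " 1. Introduction\n"
    "..."
)

print(extract_authors_from_pdf(sample_pdf_text))   # Expected: "Jane Doe, John Smith"
print(extract_abstract_from_pdf(sample_pdf_text))  # Expected: the one-sentence abstract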

Implementing AI-Powered Content Selection

Now that we can fetch articles from arXiv, we need a way to determine which one would make the best blog post. This is where the power of GPT-4 comes in—we'll use it to analyze and score each article based on certain criteria.

Let's create a function to send our articles to GPT-4 for analysis:

def send_to_chatgpt_for_scoring(articles):
    """
    Sends articles to GPT-4 for analysis and scoring to determine
    which article would make the best blog post.

    Args:
        articles (list): List of article dictionaries

    Returns:
        tuple: (raw_response, structured_response) - The raw and parsed responses from GPT-4
    """
    # Define the system prompt with scoring criteria
    system_prompt = """
    You are an AI research assistant tasked with selecting the most suitable article to be converted into an engaging blog post for AI enthusiasts. The ideal article should be interesting, unique, and provide practical insights or improve readers' understanding of AI, particularly in areas like prompting techniques or novel AI applications.

    Analyze each article and assign scores based on these criteria:
    1. Relevance to AI enthusiasts (0-10 points)
    2. Novelty and uniqueness of the content (0-10 points)
    3. Potential for practical application or skill improvement (0-10 points)
    4. Clarity and accessibility of the ideas presented (0-10 points)
    5. Potential for generating engaging "key takeaways" (0-10 points)

    Total score will be out of 50 points. Provide a brief explanation for each score and sum up the total.

    After scoring all articles, select the one with the highest score as the most suitable for blog conversion.

    Format your response as a JSON object with the following structure:
    {
        "article_scores": [
            {
                "number": 1,
                "title": "Article Title",
                "total_score": 45,
                "explanation": "Brief explanation of scores"
            },
            ...
        ],
        "selected_article": {
            "number": 3,
            "title": "Selected Article Title",
            "link": "https://arxiv.org/abs/xxxx.xxxxx",
            "reason": "Reason for selection"
        }
    }
    """

    # Create user prompt with article information
    user_prompt = "Here are the articles to analyze:\n\n"
    for i, article in enumerate(articles, 1):
        # Include title, short abstract preview, and link
        user_prompt += f"Article {i}:\nTitle: {article['title']}\nAbstract: {article['abstract'][:200]}…\nLink: {article['html_link']}\n\n"

    user_prompt += "Analyze these articles, assign scores, and select the best for blog conversion. Provide your response in the JSON format specified in the system prompt."

    # Send the request to GPT-4
    try:
        chat_completion = client.chat.completions.create(
            model="gpt-4",
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_prompt}
            ],
            max_tokens=2000
        )

        response_content = chat_completion.choices[0].message.content

        # Try to parse the response as JSON
        try:
            structured_response = json.loads(response_content)
            print("Successfully parsed GPT-4 response as JSON.")
        except json.JSONDecodeError as e:
            print(f"Error parsing GPT-4 response as JSON: {e}")
            print("Raw response:")
            print(response_content)
            structured_response = None

        return response_content, structured_response

    except Exception as e:
        print(f"Error calling OpenAI API: {e}")
        return None, None

This function does several important things:

  1. It sets up a detailed system prompt that explains the scoring criteria to GPT-4
  2. It creates a user prompt that includes information about each article
  3. It requests that GPT-4 return its analysis in a structured JSON format
  4. It sends the request to the OpenAI API and processes the response

The scoring criteria are carefully chosen to identify articles that would make engaging blog posts:

  • Relevance to AI enthusiasts: Ensures the content is interesting to your target audience
  • Novelty and uniqueness: Prioritizes fresh content over well-covered topics
  • Practical application: Favors content that readers can apply or learn from
  • Clarity and accessibility: Ensures the article's concepts can be explained clearly
  • Potential for key takeaways: Identifies content with clear, actionable insights

When GPT-4 analyzes the articles, it will return a JSON object containing scores for each article and its pick for the most suitable one. Before we write the function that parses this response, we need a way to record which articles have already been published, so the system never selects a duplicate.

Tracking Published Articles to Avoid Duplication

To avoid republishing the same article more than once, we need a way to keep track of what we've already published. Let's create a simple system for tracking published articles using a JSON file:

def load_published_articles():
    """
    Loads the list of previously published articles from a JSON file.
    If the file doesn't exist, returns an empty list.

    Returns:
        list: List of article links that have already been published
    """
    try:
        if os.path.exists('published_articles.json'):
            with open('published_articles.json', 'r') as f:
                return json.load(f)
        return []
    except Exception as e:
        print(f"Error loading published articles: {e}")
        return []


def save_published_article(article_link):
    """
    Adds an article link to the list of published articles and saves it.

    Args:
        article_link (str): The link to the published article
    """
    try:
        published_articles = load_published_articles()
        if article_link not in published_articles:
            published_articles.append(article_link)
            with open('published_articles.json', 'w') as f:
                json.dump(published_articles, f)
            print(f"Added {article_link} to published articles.")
        else:
            print(f"Article {article_link} is already in published list.")
    except Exception as e:
        print(f"Error saving published article: {e}")


def is_article_published(article_link):
    """
    Checks if an article has already been published.

    Args:
        article_link (str): The link to check

    Returns:
        bool: True if the article has already been published, False otherwise
    """
    return article_link in load_published_articles()

These functions provide a simple but effective tracking system:

  • load_published_articles reads the list of published articles from a JSON file
  • save_published_article adds a new article link to the list and saves it
  • is_article_published checks if an article is already in the published list

In Google Colab, files are temporary by default. If you want to maintain your list of published articles between sessions, you can use Google Drive integration. Here's how to modify the functions to save to Google Drive:

from google.colab import drive

# Mount Google Drive to access files between sessions
drive.mount('/content/drive')


def load_published_articles():
    """
    Loads published articles from Google Drive
    """
    drive_path = '/content/drive/My Drive/Blog Automation/published_articles.json'

    # Create directory if it doesn't exist
    os.makedirs(os.path.dirname(drive_path), exist_ok=True)

    if os.path.exists(drive_path):
        with open(drive_path, 'r') as f:
            return json.load(f)
    return []


def save_published_article(article_link):
    """
    Saves published article to Google Drive
    """
    drive_path = '/content/drive/My Drive/Blog Automation/published_articles.json'

    published_articles = load_published_articles()
    if article_link not in published_articles:
        published_articles.append(article_link)
        with open(drive_path, 'w') as f:
            json.dump(published_articles, f)
        print(f"Added {article_link} to published articles on Google Drive.")
Now, let's create a function to parse ChatGPT's response and select the best unpublished article:

def parse_chatgpt_response(response, articles):
    """
    Parses the JSON response from ChatGPT and selects the best unpublished article.

    Args:
        response (str): The JSON response from ChatGPT
        articles (list): The original list of articles

    Returns:
        tuple: (selected_article, reason) - The selected article and reason for selection
    """
    try:
        # Parse the JSON response
        if isinstance(response, str):
            structured_response = json.loads(response)
        else:
            structured_response = response

        # Get all article scores and sort by total score (highest first)
        sorted_scores = sorted(
            structured_response['article_scores'],
            key=lambda x: x['total_score'],
            reverse=True
        )

        # Find an unpublished article with the highest score
        for article_score in sorted_scores:
            article_index = article_score['number'] - 1

            # Check if this article is within the range of our articles list
            if article_index < len(articles):
                article_link = articles[article_index]['html_link']

                # Check if this article has already been published
                if not is_article_published(article_link):
                    selected_article = articles[article_index]
                    reason = f"Highest scored unpublished article. Score: {article_score['total_score']}/50. {article_score['explanation']}"
                    return selected_article, reason

        return None, "All retrieved articles have already been published."

    except Exception as e:
        print(f"Error parsing ChatGPT response: {e}")
        return None, f"Error: {e}"

This function:

  1. Parses the JSON response from GPT-4
  2. Sorts the article scores from highest to lowest
  3. Checks each high-scoring article to see if it has already been published
  4. Returns the highest-scoring unpublished article and a reason for selection
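
To make that behaviour concrete, here's a small, fabricated example that feeds the parser a hand-written response for two dummy articles; in the real pipeline the response comes straight from GPT-4.

# Fabricated articles and a fabricated GPT-4 response, purely to illustrate the parser
dummy_articles = [
    {'title': 'Paper A', 'abstract': '...', 'html_link': 'https://arxiv.org/abs/0000.00001', 'authors': 'A'},
    {'title': 'Paper B', 'abstract': '...', 'html_link': 'https://arxiv.org/abs/0000.00002', 'authors': 'B'},
]
dummy_response = {
    "article_scores": [
        {"number": 1, "title": "Paper A", "total_score": 32, "explanation": "Solid but familiar topic."},
        {"number": 2, "title": "Paper B", "total_score": 41, "explanation": "Novel and very practical."},
    ],
    "selected_article": {"number": 2, "title": "Paper B",
                         "link": "https://arxiv.org/abs/0000.00002", "reason": "Highest score."},
}

selected, reason = parse_chatgpt_response(dummy_response, dummy_articles)
print(selected['title'], "-", reason)
# Should pick Paper B, unless its link is already recorded in published_articles.json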

Now, let's put everything together with a function to select an article:

def select_best_article(search_query="ChatGPT"):
    """
    Fetches articles, analyzes them, and selects the best unpublished article.

    Args:
        search_query (str): The topic to search for

    Returns:
        tuple: (selected_article, reason) - The selected article and reason for selection
    """
    print(f"Searching for articles related to: {search_query}")

    # Fetch articles from arXiv
    articles = fetch_arxiv_articles(search_query)

    if not articles:
        return None, "No articles found matching the query."

    print(f"\nAnalyzing {len(articles)} articles to find the best candidate...")

    # Send to GPT-4 for analysis
    raw_response, structured_response = send_to_chatgpt_for_scoring(articles)

    if not structured_response:
        return None, "Failed to get a valid response from GPT-4."

    # Parse the response and select the best unpublished article
    selected_article, reason = parse_chatgpt_response(structured_response, articles)

    if not selected_article:
        return None, reason

    print(f"\nSelected article: {selected_article['title']}")
    print(f"Reason: {reason}")

    return selected_article, reason

Finally, let's test our article selection function:

# Test our article selection function
selected_article, reason = select_best_article("AI ethics")

if selected_article:
    print("\nSelected Article Details:")
    print(f"Title: {selected_article['title']}")
    print(f"Authors: {selected_article['authors']}")
    print(f"Link: {selected_article['html_link']}")
    print(f"Abstract: {selected_article['abstract'][:200]}...")

    # Fetch the full content of the selected article
    print("\nFetching full content...")
    full_article = fetch_article_content(selected_article)

    if full_article:
        print("Successfully fetched full article content.")
        print(f"Number of content sections: {len(full_article['content'])}")

        # Display the first content section
        if full_article['content']:
            first_section = full_article['content'][0]
            print(f"\nFirst section title: {first_section[0]}")
            print(f"First section excerpt: {first_section[1][:200]}...")
    else:
        print("Failed to fetch full article content.")
✅ Action Steps

  1. Add all the code from this module to your Google Colab notebook
  2. Run the test code to fetch and select an article
  3. Experiment with different search queries relevant to your blog's niche
  4. Inspect the selected article's content to understand how it's structured
  5. If you're using Google Drive for persistence, mount your drive and test the persistent storage functions

In this module, you've learned how to:

  • Fetch articles from arXiv using requests and BeautifulSoup
  • Extract content from both HTML and PDF formats
  • Use GPT-4 to analyze and score articles based on their suitability for blog conversion
  • Track published articles to avoid duplication
  • Put everything together to select the best unpublished article

In the next module, we'll move on to the exciting part—using GPT-4 to transform the selected article into an engaging blog post that your audience will love.

Module 4

Master the art of transforming technical content into engaging blog posts using GPT-4 and effective prompt engineering techniques.

Lessons in this module:

  • Extracting Full Article Content
  • Designing Effective Blog Generation Prompts
  • Converting Academic Content to Engaging Blog Posts
  • Formatting and Structuring Your Generated Content

Module Content:

In the previous module, we learned how to select the most promising academic article for our blog. Now comes the exciting part: transforming that technical content into an engaging, accessible blog post using GPT-4. This is where the real magic of our automated blog system happens.

Extracting Full Article Content

Before we can generate a blog post, we need to ensure we have the complete content of our selected article. We already implemented the fetch_article_content function in Module 3, which extracts content from either HTML or PDF versions of arXiv papers.

Let's review how this function prepares the content for transformation:

  1. It extracts the full text of the article, including all sections
  2. It structures the content into sections, each with a title and body text
  3. It preserves key metadata like authors and the original link

Sometimes, however, the extraction might not be perfect, especially from PDFs. Let's enhance our content preparation with a function that ensures the article content is ready for transformation:

def prepare_article_for_transformation(article):
    """
    Prepares an article's content for transformation into a blog post.
    Ensures all necessary content is present and properly formatted.

    Args:
        article (dict): The article with content already fetched

    Returns:
        dict: The prepared article ready for transformation
    """
    prepared_article = article.copy()

    # Ensure title is present
    if 'title' not in prepared_article or not prepared_article['title']:
        prepared_article['title'] = "Untitled Article"
        print("Warning: Article title is missing. Using 'Untitled Article' instead.")

    # Ensure abstract is present
    if 'abstract' not in prepared_article or not prepared_article['abstract']:
        prepared_article['abstract'] = "Abstract not available."
        print("Warning: Article abstract is missing. Using placeholder instead.")

    # Ensure content is present and properly formatted
    if 'content' not in prepared_article or not prepared_article['content']:
        prepared_article['content'] = [("Introduction", "Content not available.")]
        print("Warning: Article content is missing. Using placeholder instead.")

    # For PDF-extracted content, try to identify and split sections
    if len(prepared_article['content']) == 1 and prepared_article['content'][0][0] == 'Full Content':
        print("PDF-extracted content detected. Attempting to split into sections...")
        full_text = prepared_article['content'][0][1]

        # Try to split the text into sections based on common section headings
        sections = []

        # Common section patterns in academic papers
        section_patterns = [
            r'\n\s*(\d+\.?\s+[A-Z][a-zA-Z\s]+)\s*\n',  # Numbered sections: "1. Introduction"
            r'\n\s*([A-Z][a-zA-Z\s]+)\s*\n',           # Capitalized sections: "INTRODUCTION"
        ]

        # Try each pattern to find sections
        for pattern in section_patterns:
            if len(sections) == 0:  # Only proceed if we haven't found sections yet
                matches = re.finditer(pattern, full_text)
                sections_found = list(matches)

                if len(sections_found) > 0:
                    # Found section headings, now extract content
                    for i, match in enumerate(sections_found):
                        section_title = match.group(1).strip()
                        start_pos = match.end()

                        # Find end of this section (start of next section or end of text)
                        if i < len(sections_found) - 1:
                            end_pos = sections_found[i + 1].start()
                        else:
                            end_pos = len(full_text)

                        section_content = full_text[start_pos:end_pos].strip()
                        sections.append((section_title, section_content))

        # If we found sections, update the content
        if len(sections) > 0:
            prepared_article['content'] = sections
            print(f"Successfully split content into {len(sections)} sections.")
        else:
            # Fallback: create artificial sections if splitting failed
            print("Could not identify sections. Creating artificial sections...")
            paragraphs = full_text.split('\n\n')

            # Group paragraphs into reasonable-sized sections
            chunks = [paragraphs[i:i + 5] for i in range(0, len(paragraphs), 5)]

            prepared_article['content'] = [
                (f"Section {i+1}", '\n\n'.join(chunk))
                for i, chunk in enumerate(chunks)
            ]

    # Ensure authors are present
    if 'authors' not in prepared_article or not prepared_article['authors']:
        prepared_article['authors'] = "Unknown Authors"
        print("Warning: Authors information is missing. Using 'Unknown Authors' instead.")

    # Clean up any excessive whitespace in all text fields
    prepared_article['title'] = prepared_article['title'].strip()
    prepared_article['abstract'] = prepared_article['abstract'].strip()
    prepared_article['authors'] = prepared_article['authors'].strip()

    print("Article prepared successfully for transformation.")
    return prepared_article

This preparation function performs several important tasks:

  • It checks for missing fields and adds placeholders if necessary
  • It attempts to improve PDF-extracted content by identifying and splitting sections
  • It handles cases where the content structure isn't clear by creating artificial sections
  • It cleans up whitespace and ensures consistent formatting

By properly preparing the article content, we ensure that GPT-4 receives well-structured input, which leads to better-quality blog posts.

Designing Effective Blog Generation Prompts

The quality of our generated blog posts depends heavily on how we prompt GPT-4. Well-crafted prompts lead to well-written content, while vague or poorly designed prompts can result in generic or unfocused posts.

Here are the key elements of an effective blog generation prompt:

  1. Clear instructions: Explicitly state what you want GPT-4 to do
  2. Style guidance: Specify the desired tone, voice, and level of formality
  3. Structural requirements: Outline the expected sections and formats
  4. Target audience: Define who the blog post is for
  5. Source attribution: Include instructions for citing the original article

Let's create a function that generates a comprehensive prompt for blog post creation:

def create_blog_generation_prompt(article):
                    """
                    Creates a detailed prompt for GPT-4 to transform an academic article into an engaging blog post.
                    
                    Args:
                        article (dict): The prepared article with all necessary content
                        
                    Returns:
                        str: The prompt for GPT-4
                    """
                    # Start with the system instruction
                    system_instruction = """
                    You are an expert blog writer specializing in transforming academic research into engaging, accessible content.
                    Your task is to create an informative and interesting blog post based on the academic article I will provide.
                    
                    Guidelines:
                    1. Use an informal but intelligent tone, like you're explaining to a smart friend
                    2. Avoid excessive jargon, but don't oversimplify important concepts
                    3. Structure the post with clear headings and subheadings
                    4. Include a compelling introduction that highlights the relevance and importance of the research
                    5. Break down complex ideas into understandable explanations
                    6. Add analogies or examples where helpful
                    7. Format as Markdown with appropriate headings, lists, and emphasis
                    8. Include a "Key Takeaways" section at the end
                    9. Create an engaging, SEO-friendly title (different from the academic title)
                    10. Properly credit the original research and authors
                    
                    The blog post should be 800-1200 words long and written for a general audience interested in technology and AI.
                    """
                    
                    # Extract content for the prompt
                    title = article['title']
                    authors = article['authors']
                    abstract = article['abstract']
                    
                    # Combine all content sections
                    content_text = ""
                    for section_title, section_content in article['content']:
                        content_text += f"\n\n## {section_title}\n{section_content}"
                    
                    # Create the full prompt
                    full_prompt = f"""
                    Please transform this academic article into an engaging blog post following the guidelines above.
                
                    ARTICLE TITLE: {title}
                    
                    AUTHORS: {authors}
                    
                    ABSTRACT: {abstract}
                    
                    CONTENT: {content_text}
                    
                    ORIGINAL SOURCE: {article.get('html_link', 'Source not available')}
                    
                    Remember to:
                    - Create an attention-grabbing title
                    - Start with an engaging introduction
                    - Break down the content into digestible sections
                    - Explain complex terms and concepts
                    - Include practical implications where relevant
                    - End with key takeaways
                    - Format in Markdown
                    - Include proper attribution to the original authors and research
                    """
                    
                    return system_instruction, full_prompt

This function generates a two-part prompt:

  1. A system instruction that defines the AI's role and general guidelines
  2. A full prompt that includes the specific article content and detailed instructions

By separating the prompt this way, we can take advantage of GPT-4's system and user message distinction, which helps it better understand the task.

Converting Academic Content to Engaging Blog Posts

Now that we have our prepared article and carefully crafted prompt, let's create a function to generate the blog post using GPT-4:

def generate_blog_post(article):
                    """
                    Generates an engaging blog post from an academic article using GPT-4.
                    
                    Args:
                        article (dict): The article content
                        
                    Returns:
                        str: The generated blog post in Markdown format
                    """
                    # Prepare the article for transformation
                    prepared_article = prepare_article_for_transformation(article)
                    
                    # Create the prompt
                    system_instruction, user_prompt = create_blog_generation_prompt(prepared_article)
                    
                    print("Sending request to GPT-4 to generate blog post...")
                    
                    try:
                        # Request the blog post from GPT-4
                        response = client.chat.completions.create(
                            model="gpt-4o",  # Using GPT-4o for optimal quality
                            messages=[
                                {"role": "system", "content": system_instruction},
                                {"role": "user", "content": user_prompt}
                            ],
                            max_tokens=4000,  # Allow enough tokens for a full blog post
                            temperature=0.7  # Slightly increased temperature for creativity
                        )
                        
                        # Extract the blog post content
                        blog_post = response.choices[0].message.content
                        
                        print("Blog post generated successfully!")
                        
                        # Save a copy of the generated blog post to a file
                        with open('generated_blog_post.md', 'w') as f:
                            f.write(blog_post)
                        
                        print("Blog post saved to 'generated_blog_post.md'")
                        
                        return blog_post
                    
                    except Exception as e:
                        print(f"Error generating blog post: {e}")
                        return None

This function:

  1. Prepares the article for transformation using our preparation function
  2. Creates a prompt using our prompt generation function
  3. Sends a request to GPT-4 with the appropriate parameters
  4. Saves the generated blog post to a file for reference
  5. Returns the generated blog post content

Note that we're using a temperature of 0.7, which provides a good balance between creativity and coherence. You can adjust this value to control the style of your generated posts:

  • Lower temperature (e.g., 0.3-0.5): More factual, conservative, and predictable outputs
  • Higher temperature (e.g., 0.7-0.9): More creative, diverse, and potentially unique outputs

Let's also create a function to handle potential failures and retries:

def generate_blog_post_with_retry(article, max_retries=3):
                    """
                    Generates a blog post with automatic retry on failure.
                    
                    Args:
                        article (dict): The article content
                        max_retries (int): Maximum number of retry attempts
                        
                    Returns:
                        str: The generated blog post
                    """
                    for attempt in range(1, max_retries + 1):
                        print(f"Attempt {attempt} of {max_retries} to generate blog post...")
                        
                        blog_post = generate_blog_post(article)
                        
                        if blog_post:
                            return blog_post
                        
                        print(f"Attempt {attempt} failed. Waiting before retry...")
                        import time
                        time.sleep(5)  # Wait 5 seconds between attempts
                    
                    print(f"Failed to generate blog post after {max_retries} attempts.")
                    return None

This retry function adds resilience to our system, automatically attempting to regenerate the blog post if the initial attempt fails.
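
If you find yourself hitting transient API errors or rate limits, you could also lengthen the wait between attempts instead of always pausing five seconds. A small sketch using exponential backoff (our own variation, built on the same generate_blog_post function):

import time

def generate_blog_post_with_backoff(article, max_retries=3, base_delay=5):
    """Retries generation with an exponentially growing delay: 5s, 10s, 20s, ..."""
    for attempt in range(1, max_retries + 1):
        blog_post = generate_blog_post(article)
        if blog_post:
            return blog_post
        delay = base_delay * (2 ** (attempt - 1))
        print(f"Attempt {attempt} failed. Retrying in {delay} seconds...")
        time.sleep(delay)
    print(f"Failed to generate blog post after {max_retries} attempts.")
    return None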

Formatting and Structuring Your Generated Content

The blog post generated by GPT-4 will be in Markdown format, which is a lightweight markup language that's easy to read and write. However, for publishing to WordPress and other platforms, we may need to convert this Markdown to different formats.

Let's create functions to handle this conversion and ensure our content is properly structured:

def extract_blog_title(blog_post):
                    """
                    Extracts the title from the blog post.
                    
                    Args:
                        blog_post (str): The Markdown blog post
                        
                    Returns:
                        str: The extracted title
                    """
                    # Look for a title at the beginning, typically a level 1 heading
                    match = re.search(r'^#\s+(.+)$', blog_post, re.MULTILINE)
                    
                    if match:
                        return match.group(1).strip()
                    
                    # Fallback: look for the first line that might be a title
                    lines = blog_post.split('\n')
                    for line in lines[:5]:  # Check the first few lines
                        line = line.strip()
                        if line and not line.startswith('#') and not line.startswith('!') and not line.startswith('```'):
                            return line
                    
                    return "Untitled Blog Post"
                
                def markdown_to_html(markdown_content):
                    """
                    Converts Markdown content to HTML.
                    
                    Args:
                        markdown_content (str): The Markdown content
                        
                    Returns:
                        str: The HTML content
                    """
                    # Use the markdown library to convert to HTML
                    html_content = markdown.markdown(markdown_content, extensions=['extra'])
                    return html_content
                
                def markdown_to_wordpress_blocks(markdown_content):
                    """
                    Converts Markdown content to WordPress Gutenberg blocks format.
                    
                    Args:
                        markdown_content (str): The Markdown content
                        
                    Returns:
                        str: The WordPress blocks content
                    """
                    # First convert to HTML
                    html_content = markdown_to_html(markdown_content)
                    
                    # Convert HTML to WordPress blocks format
                    # Paragraphs
                    html_content = re.sub(
                        r'<p>(.*?)</p>',
                        r'<!-- wp:paragraph --><p>\1</p><!-- /wp:paragraph -->',
                        html_content, flags=re.DOTALL)
                    
                    # Headings (h1-h4)
                    for level in range(1, 5):
                        html_content = re.sub(
                            rf'<h{level}>(.*?)</h{level}>',
                            rf'<!-- wp:heading {{"level":{level}}} --><h{level}>\1</h{level}><!-- /wp:heading -->',
                            html_content, flags=re.DOTALL)
                    
                    # Lists
                    html_content = re.sub(
                        r'<ul>(.*?)</ul>',
                        r'<!-- wp:list --><ul>\1</ul><!-- /wp:list -->',
                        html_content, flags=re.DOTALL)
                    html_content = re.sub(
                        r'<ol>(.*?)</ol>',
                        r'<!-- wp:list {"ordered":true} --><ol>\1</ol><!-- /wp:list -->',
                        html_content, flags=re.DOTALL)
                    
                    # Code blocks
                    html_content = re.sub(
                        r'<pre><code>(.*?)</code></pre>',
                        r'<!-- wp:code --><pre class="wp-block-code"><code>\1</code></pre><!-- /wp:code -->',
                        html_content, flags=re.DOTALL)
                    
                    # Blockquotes
                    html_content = re.sub(
                        r'<blockquote>(.*?)</blockquote>',
                        r'<!-- wp:quote --><blockquote class="wp-block-quote">\1</blockquote><!-- /wp:quote -->',
                        html_content, flags=re.DOTALL)
                    
                    return html_content
                
                def add_blog_metadata(blog_post, original_article):
                    """
                    Adds metadata and attribution to the blog post.
                    
                    Args:
                        blog_post (str): The blog post content
                        original_article (dict): The original article data
                        
                    Returns:
                        str: The blog post with added metadata
                    """
                    # Extract the original title and authors
                    original_title = original_article.get('title', 'Unknown Title')
                    original_authors = original_article.get('authors', 'Unknown Authors')
                    original_link = original_article.get('html_link', 'Source not available')
                    
                    # Create an attribution section
                    attribution = f"""

## Source Attribution

This blog post is based on the research article "{original_title}" by {original_authors}.
You can find the original article [here]({original_link}).
"""
                    
                    # Add a call-to-action for your site
                    cta = """

## Improve Your AI Skills

If you enjoyed this article and want to learn more about AI and prompt engineering, check out our other [free courses](/ai-courses/) on topics like advanced prompt engineering, SEO blog writing with AI, and more.
"""
                    
                    # Combine everything
                    enhanced_blog_post = blog_post + attribution + cta
                    return enhanced_blog_post

These functions handle various aspects of formatting and structuring our content:

  • extract_blog_title pulls the title from the generated blog post
  • markdown_to_html converts Markdown to standard HTML
  • markdown_to_wordpress_blocks converts Markdown to WordPress Gutenberg blocks format
  • add_blog_metadata adds attribution and a call-to-action to the blog post

With these functions, we ensure that our generated content is properly formatted for different publishing platforms and includes all necessary metadata and attribution.

Testing It All Together

Let's create a test function that puts all these components together to transform an article into a blog post:

def test_blog_generation(search_query="ChatGPT"):
                    """
                    Tests the entire blog generation process from article selection to blog post creation.
                    
                    Args:
                        search_query (str): The topic to search for
                        
                    Returns:
                        tuple: (title, blog_post) - The title and content of the generated blog post
                    """
                    print(f"Starting test with search query: {search_query}")
                    
                    # Step 1: Select the best article
                    selected_article, reason = select_best_article(search_query)
                    
                    if not selected_article:
                        print(f"Could not select an article: {reason}")
                        return None, None
                    
                    print("\nSelected article:")
                    print(f"Title: {selected_article['title']}")
                    print(f"Link: {selected_article['html_link']}")
                    print(f"Reason: {reason}")
                    
                    # Step 2: Fetch the full content
                    print("\nFetching full article content...")
                    full_article = fetch_article_content(selected_article)
                    
                    if not full_article:
                        print("Failed to fetch full article content.")
                        return None, None
                    
                    # Add the title from the selected article
                    full_article['title'] = selected_article['title']
                    
                    # Step 3: Generate the blog post
                    print("\nGenerating blog post...")
                    blog_post = generate_blog_post_with_retry(full_article)
                    
                    if not blog_post:
                        print("Failed to generate blog post.")
                        return None, None
                    
                    # Step 4: Extract the title and enhance the blog post
                    print("\nExtracting title and enhancing blog post...")
                    title = extract_blog_title(blog_post)
                    enhanced_blog_post = add_blog_metadata(blog_post, full_article)
                    
                    print("\nBlog post generated successfully!")
                    print(f"Title: {title}")
                    print(f"Length: {len(enhanced_blog_post.split())} words")
                    
                    # Return the title and blog post content
                    return title, enhanced_blog_post

Let's run this test function to see our blog generation process in action:

# Run the test
                title, blog_post = test_blog_generation("AI ethics")
                
                if title and blog_post:
                    # Preview the first 500 characters of the blog post
                    print("\nBlog Post Preview:")
                    print("-" * 50)
                    print(blog_post[:500] + "...")
                    print("-" * 50)
                    
                    # Save the full blog post
                    with open('enhanced_blog_post.md', 'w') as f:
                        f.write(blog_post)
                    
                    print("\nFull blog post saved to 'enhanced_blog_post.md'")
                    
                    # Convert to WordPress format
                    wp_content = markdown_to_wordpress_blocks(blog_post)
                    
                    with open('wordpress_blog_post.html', 'w') as f:
                        f.write(wp_content)
                    
                    print("WordPress format saved to 'wordpress_blog_post.html'")

✅ Action Steps

  1. Add all the code from this module to your Google Colab notebook
  2. Run the test function to generate a blog post
  3. Review the generated blog post and WordPress format files
  4. Experiment with different search queries
  5. Try adjusting the temperature parameter to control the style of the generated posts
  6. Customize the create_blog_generation_prompt function to better match your blog's style and audience

In this module, you've learned how to:

  • Prepare article content for transformation into a blog post
  • Design effective prompts for GPT-4 to generate high-quality blog content
  • Generate engaging blog posts from academic articles
  • Format and structure the generated content for different publishing platforms
  • Add metadata and attribution to ensure proper credit

In the next module, we'll cover how to generate custom images for your blog posts using gpt-image-1, which will make your automated content even more engaging and visually appealing.

Learn to generate custom images for your blog posts using OpenAI's gpt-image-1 model, enhancing visual appeal and engagement.

Lessons in this module:

  • Generating Relevant Image Descriptions
  • Creating Custom Featured Images with gpt-image-1
  • Image Handling and Processing
  • Best Practices for AI-Generated Visuals

Module Content:

Visual content is a critical component of engaging blog posts. In this module, we'll leverage OpenAI's latest gpt-image-1 model to automatically generate custom featured images for our blog posts. These visuals will not only make our content more appealing but also help with social sharing and reader engagement.

Generating Relevant Image Descriptions

Before we can generate an image, we need to create a meaningful and detailed description of what we want the image to depict. The quality of this description directly impacts the quality of the generated image.

Let's create a function that uses GPT-4 to generate an image description based on our blog post content:

def generate_image_description(blog_post, title):
                    """
                    Generates a detailed image description based on the blog post content.
                    
                    Args:
                        blog_post (str): The full blog post content
                        title (str): The title of the blog post
                        
                    Returns:
                        str: A detailed image description for gpt-image-1
                    """
                    # Create a prompt for GPT-4 to generate an image description
                    prompt = f"""
                    Based on the following blog post title and content, create a detailed description for an image 
                    that would serve as an engaging featured image for the post.
                    
                    BLOG TITLE: {title}
                    
                    BLOG CONTENT (excerpt):
                    {blog_post[:2000]}...
                    
                    Guidelines for the image description:
                    1. The description should be detailed and vivid (200-300 words)
                    2. Focus on creating a professional, eye-catching image relevant to the blog topic
                    3. Avoid requesting any text in the image
                    4. Consider using metaphors, abstract representations, or relevant visual concepts
                    5. Incorporate color recommendations to match the tone of the article
                    6. Aim for a clean, modern aesthetic appropriate for a technology/AI blog
                    7. The image should be appropriate for a professional audience
                    
                    Your description will be used with OpenAI's gpt-image-1 model to generate the actual image.
                    Only provide the description itself, without any additional commentary.
                    """
                    
                    try:
                        # Request the image description from GPT-4
                        response = client.chat.completions.create(
                            model="gpt-4o",
                            messages=[
                                {"role": "system", "content": "You are an expert in creating vivid, detailed image descriptions for AI image generation models."},
                                {"role": "user", "content": prompt}
                            ],
                            max_tokens=500,
                            temperature=0.7
                        )
                        
                        # Extract the image description
                        image_description = response.choices[0].message.content.strip()
                        
                        print("Image description generated successfully!")
                        print(f"Length: {len(image_description.split())} words")
                        
                        return image_description
                    
                    except Exception as e:
                        print(f"Error generating image description: {e}")
                        
                        # Fallback to a simple description based on the title
                        fallback_description = f"A professional, modern illustration representing the concept of {title}. The image should be clean, with a tech-inspired aesthetic, using a blue and purple color scheme."
                        
                        print(f"Using fallback description: {fallback_description}")
                        return fallback_description

This function:

  1. Takes the blog post content and title as input
  2. Creates a prompt instructing GPT-4 to generate a detailed image description
  3. Provides guidelines for the type of image we want (professional, relevant, without text)
  4. Uses GPT-4o to generate the description
  5. Includes a fallback in case of errors

The image description is crucial because it provides the creative direction for the image generation model. A well-crafted description will result in a more relevant and visually appealing image.

Creating Custom Featured Images with gpt-image-1

Now that we have a detailed image description, we can use OpenAI's gpt-image-1 model to generate our featured image. The gpt-image-1 model is the latest image generation model from OpenAI, offering improvements in quality and capabilities over previous models.

Let's create a function to generate an image using this model:

def generate_image(image_description):
                    """
                    Generates an image using OpenAI's gpt-image-1 model.
                    
                    Args:
                        image_description (str): Detailed description of the image to generate
                        
                    Returns:
                        bytes: The generated image data in bytes
                    """
                    try:
                        print("Generating image with gpt-image-1...")
                        print(f"Using description: {image_description[:100]}...")
                        
                        # Request image generation from OpenAI
                        response = client.images.generate(
                            model="gpt-image-1",
                            prompt=image_description,
                            n=1,  # Generate 1 image
                            size="1024x1024",  # Square format works best for most blog featured images
                            quality="medium",  # Medium quality as specified
                            response_format="b64_json"  # gpt-image-1 always returns base64-encoded images
                        )
                        
                        # Get the base64-encoded image data
                        image_b64 = response.data[0].b64_json
                        
                        # Convert base64 to bytes
                        image_data = base64.b64decode(image_b64)
                        
                        # Save a copy of the image for reference
                        with open('generated_image.png', 'wb') as f:
                            f.write(image_data)
                        
                        print("Image generated and saved to 'generated_image.png'")
                        
                        return image_data
                    
                    except Exception as e:
                        print(f"Error generating image: {e}")
                        
                        # If there's an error, return None or a placeholder image
                        try:
                            # Use a placeholder image if available
                            with open('placeholder.png', 'rb') as f:
                                return f.read()
                        except:
                            print("Could not load placeholder image.")
                            return None

Key features of our image generation function:

  • Uses the new "gpt-image-1" model from OpenAI
  • Sets quality to "medium", which balances image fidelity against API cost
  • Uses a 1024x1024 square format, which works well for featured images
  • Receives base64-encoded image data and converts it to binary format
  • Saves a local copy of the generated image for reference
  • Includes error handling and a fallback to a placeholder image if needed

The gpt-image-1 model has several advantages over previous models:

  • Better understanding of complex prompts and abstract concepts
  • More consistent aesthetic quality
  • Improved handling of details and composition
  • Multiple quality levels (high, medium, low) to balance quality and cost

Let's also create a more advanced function that can generate different image sizes based on our needs:

def generate_image_advanced(image_description, orientation="square", quality="medium"):
                    """
                    Generate an image with custom orientation and quality settings.
                    
                    Args:
                        image_description (str): Detailed description of the image to generate
                        orientation (str): Image orientation - "square", "landscape", or "portrait"
                        quality (str): Image quality - "high", "medium", or "low"
                        
                    Returns:
                        bytes: The generated image data in bytes
                    """
                    try:
                        # Set the image size based on orientation
                        if orientation == "landscape":
                            size = "1536x1024"
                        elif orientation == "portrait":
                            size = "1024x1536"
                        else:  # default to square
                            size = "1024x1024"
                        
                        # Validate quality parameter
                        if quality not in ["high", "medium", "low"]:
                            quality = "medium"  # Default to medium if invalid
                        
                        print(f"Generating {orientation} image with {quality} quality...")
                        
                        # Generate the image
                        response = client.images.generate(
                            model="gpt-image-1",
                            prompt=image_description,
                            n=1,
                            size=size,
                            quality=quality
                            # gpt-image-1 always returns base64-encoded images,
                            # so no response_format parameter is needed.
                        )
                        
                        # Get the base64-encoded image data
                        image_b64 = response.data[0].b64_json
                        
                        # Convert base64 to bytes
                        image_data = base64.b64decode(image_b64)
                        
                        # Save a copy of the image for reference
                        with open(f'generated_image_{orientation}_{quality}.png', 'wb') as f:
                            f.write(image_data)
                        
                        print(f"Image generated and saved to 'generated_image_{orientation}_{quality}.png'")
                        
                        return image_data
                    
                    except Exception as e:
                        print(f"Error generating image: {e}")
                        return None

This advanced function allows you to customize:

  • Orientation: Choose between square (1024x1024), landscape (1536x1024), or portrait (1024x1536)
  • Quality: Select high, medium, or low quality based on your needs and budget

You might use different orientations for different platforms—for example, landscape for blog headers, square for social media, and portrait for Pinterest.
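
As a quick illustration, here is how you might generate two variants of the same featured image for different placements (this assumes generate_image_advanced is defined as above and image_description already holds a GPT-4-generated description):

# Hypothetical usage: a wide header image plus a square social-sharing image
header_image = generate_image_advanced(image_description, orientation="landscape", quality="medium")
social_image = generate_image_advanced(image_description, orientation="square", quality="low")

if header_image and social_image:
    print("Generated both the blog header and the social-sharing image.")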

Image Handling and Processing

Once we've generated our image, we need to handle and process it properly. This includes saving it, potentially resizing or optimizing it, and preparing it for upload to WordPress.

Let's create some utility functions for image handling:

def save_image_to_file(image_data, file_path="generated_image.png"):
                    """
                    Saves image data to a file.
                    
                    Args:
                        image_data (bytes): The image data in bytes
                        file_path (str): The path where the image should be saved
                        
                    Returns:
                        bool: True if successful, False otherwise
                    """
                    try:
                        with open(file_path, 'wb') as f:
                            f.write(image_data)
                        print(f"Image saved to {file_path}")
                        return True
                    except Exception as e:
                        print(f"Error saving image: {e}")
                        return False
                
                def optimize_image(image_data, quality=85):
                    """
                    Optimizes the image by reducing its file size without significantly affecting quality.
                    Requires PIL (Pillow) library.
                    
                    Args:
                        image_data (bytes): The original image data
                        quality (int): The JPEG quality (1-100)
                        
                    Returns:
                        bytes: The optimized image data
                    """
                    try:
                        from PIL import Image
                        import io
                        
                        # Open the image from bytes
                        image = Image.open(io.BytesIO(image_data))
                        
                        # Create an output buffer
                        output_buffer = io.BytesIO()
                        
                        # Save the image as JPEG with specified quality
                        image.convert('RGB').save(output_buffer, format='JPEG', quality=quality, optimize=True)
                        
                        # Get the optimized image data
                        optimized_data = output_buffer.getvalue()
                        
                        print(f"Image optimized: {len(image_data)} bytes -> {len(optimized_data)} bytes")
                        return optimized_data
                    
                    except ImportError:
                        print("PIL (Pillow) library not installed. Installing...")
                        !pip install Pillow
                        print("Please run this function again after installation.")
                        return image_data
                    
                    except Exception as e:
                        print(f"Error optimizing image: {e}")
                        return image_data

For more advanced image processing, we can also create functions for resizing and creating multiple versions of an image:

def create_image_variations(image_data):
                    """
                    Creates multiple sizes of an image for different purposes.
                    Requires PIL (Pillow) library.
                    
                    Args:
                        image_data (bytes): The original image data
                        
                    Returns:
                        dict: Dictionary with different image sizes
                    """
                    try:
                        from PIL import Image
                        import io
                        
                        # Open the image from bytes
                        original_image = Image.open(io.BytesIO(image_data))
                        
                        # Create different sizes
                        sizes = {
                            'thumbnail': (150, 150),
                            'medium': (300, 300),
                            'large': (600, 600),
                            'full': original_image.size
                        }
                        
                        # Dictionary to store the different image versions
                        image_versions = {}
                        
                        # Create each size
                        for name, dimensions in sizes.items():
                            # Create a copy of the image
                            img_copy = original_image.copy()
                            
                            # Resize the image
                            img_copy.thumbnail(dimensions, Image.LANCZOS)
                            
                            # Save to buffer
                            buffer = io.BytesIO()
                            img_copy.save(buffer, format='PNG')
                            
                            # Store in dictionary
                            image_versions[name] = buffer.getvalue()
                            
                            # Save to file for reference
                            save_image_to_file(buffer.getvalue(), f'generated_image_{name}.png')
                        
                        print(f"Created {len(image_versions)} image variations")
                        return image_versions
                    
                    except ImportError:
                        print("PIL (Pillow) library not installed. Installing...")
                        !pip install Pillow
                        print("Please run this function again after installation.")
                        return {'full': image_data}
                    
                    except Exception as e:
                        print(f"Error creating image variations: {e}")
                        return {'full': image_data}

These functions provide several capabilities:

  • Saving images to files for reference or backup
  • Optimizing images to reduce file size while maintaining quality
  • Creating multiple sizes of an image for different contexts (thumbnails, featured images, etc.)

Proper image handling ensures our blog posts look professional and load quickly, which is important for both user experience and SEO.
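
To see how these utilities fit together, here is a small sketch that optimizes a freshly generated image and then builds the resized copies (it assumes image_data came from generate_image earlier in this module):

# Example workflow: shrink the featured image, then create resized variants
optimized = optimize_image(image_data, quality=85)    # smaller JPEG for faster page loads
variations = create_image_variations(optimized)       # thumbnail, medium, large, full

for name, data in variations.items():
    print(f"{name}: {len(data)} bytes")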

Best Practices for AI-Generated Visuals

While gpt-image-1 can create impressive images, following certain best practices will help you get the most out of the model:

  1. Be detailed and specific: The more detailed your description, the better the results. Include information about composition, style, colors, and mood.
  2. Avoid text in images: While gpt-image-1 is better at rendering text than previous models, text in images can still be inconsistent or illegible. It's better to add text using your CMS or image editing software if needed.
  3. Understand the limitations: AI-generated images may sometimes have inconsistencies in details like hands or complex patterns. Review images before publishing.
  4. Maintain consistency: For your blog's visual identity, try to maintain a consistent style across your images by using similar prompts (see the sketch after this list).
  5. Consider ethical implications: Be mindful of potential biases in AI-generated images and ensure your descriptions promote diversity and inclusivity.
  6. Optimize for usage: Choose appropriate quality levels based on your needs—"medium" quality is often sufficient for blog featured images while saving on API costs.
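
One simple way to keep that consistency is to append a fixed "house style" suffix to every image description before sending it to the model. This helper is our own convention rather than part of the original script:

# Hypothetical helper: enforce a consistent visual style across all featured images
HOUSE_STYLE = (
    "Clean, modern, minimalist aesthetic with a blue and purple color palette, "
    "soft gradients, and no text anywhere in the image."
)

def apply_house_style(image_description):
    """Appends the blog's standard style guidance to any image description."""
    return f"{image_description.rstrip('.')}. {HOUSE_STYLE}"

# Usage: image_data = generate_image(apply_house_style(image_description))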

Here are some examples of effective image prompts:

  • AI Ethics: A conceptual illustration representing the balance between technology and ethics. Show a scale with a glowing blue AI neural network pattern on one side and a warm gold human silhouette on the other, perfectly balanced. The background should feature a gradient from deep blue to purple, with subtle binary code patterns. The scene should have a contemplative, serious mood with dramatic lighting that creates a sense of importance and depth.
  • Machine Learning Tutorial: A professional, educational illustration showing a stylized human mind connecting with algorithms. The image should feature a human head profile in silhouette with visible neural connections inside, bridging to geometric patterns representing machine learning algorithms. Use a color palette of blue, purple, and teal with clean lines and a minimalist style. The overall aesthetic should be modern, high-tech, and approachable.
  • Remote Work Tips: A warm, inviting home office setup viewed from above in isometric style. Include a modern desk with laptop, plants, a coffee mug, and notebook. The scene should have natural light streaming in from a window, creating a productive atmosphere. Use a color palette of warm neutrals with accents of green and blue. The style should be clean and slightly stylized rather than photorealistic, with attention to small details that make the space feel personalized.

Let's create a function that puts everything together to generate an image for our blog post:

def create_blog_featured_image(blog_post, title):
                    """
                    Creates a featured image for a blog post using GPT-4 for the description
                    and gpt-image-1 for image generation.
                    
                    Args:
                        blog_post (str): The blog post content
                        title (str): The blog post title
                        
                    Returns:
                        bytes: The generated image data
                    """
                    print("Creating featured image for blog post...")
                    
                    # Step 1: Generate image description
                    image_description = generate_image_description(blog_post, title)
                    
                    if not image_description:
                        print("Failed to generate image description.")
                        return None
                    
                    # Step 2: Generate the image using gpt-image-1
                    image_data = generate_image(image_description)
                    
                    if not image_data:
                        print("Failed to generate image.")
                        return None
                    
                    # Step 3: Optimize the image (optional)
                    try:
                        optimized_image = optimize_image(image_data)
                        print("Image optimized successfully.")
                        return optimized_image
                    except:
                        # If optimization fails, return the original image
                        print("Image optimization failed, using original image.")
                        return image_data

Testing It All Together

Let's test our image generation pipeline with a sample blog post:

# Test our image generation pipeline
                def test_image_generation():
                    """
                    Tests the image generation pipeline with a sample blog post.
                    """
                    # Sample blog title and excerpt
                    sample_title = "The Future of AI: Balancing Innovation and Ethics"
                    sample_content = """
                    Artificial Intelligence has made remarkable strides in recent years, transforming industries and creating new possibilities that were once confined to science fiction. As these technologies become more sophisticated and integrated into our daily lives, it's crucial to consider both the tremendous potential for innovation and the ethical considerations that come with it.
                    
                    In this blog post, we'll explore the cutting-edge developments in AI research, the challenges of responsible implementation, and how organizations are working to ensure these powerful tools benefit humanity.
                    
                    ## Recent Breakthroughs in AI
                    
                    The field of AI has seen unprecedented growth, with models becoming increasingly capable of understanding context, generating creative content, and solving complex problems. Large language models can now write code, compose music, and engage in nuanced conversations that closely resemble human interaction.
                    
                    ## Ethical Considerations
                    
                    As AI systems become more powerful, questions about privacy, bias, and accountability take center stage. Researchers and policymakers are working to develop frameworks that ensure these technologies are developed and deployed in ways that are fair, transparent, and beneficial to society.
                    """
                    
                    # Generate the image
                    image_data = create_blog_featured_image(sample_content, sample_title)
                    
                    if image_data:
                        print("Image generation test successful!")
                        return image_data
                    else:
                        print("Image generation test failed.")
                        return None
                
                # Run the test
                test_image = test_image_generation()
                
                # Display image info if generated successfully
                if test_image:
                    print(f"Image size: {len(test_image)} bytes")

✅ Action Steps

  1. Add all the code from this module to your Google Colab notebook
  2. Run the test function to generate a sample image
  3. Experiment with different quality settings and orientations
  4. Create your own image prompts for different types of blog content
  5. Try optimizing images and creating multiple sizes
  6. If using Google Drive for persistence, save your generated images there

In this module, you've learned how to:

  • Generate detailed image descriptions using GPT-4
  • Create custom featured images using OpenAI's gpt-image-1 model
  • Handle and process images for optimal quality and file size
  • Apply best practices for AI-generated visuals
  • Integrate image generation into your blog automation workflow

In the next module, we'll cover how to publish your blog posts and images to WordPress and Medium, completing our automated blog creation pipeline.

Automate the publication process to WordPress, including handling of featured images, content formatting, and post management.

Lessons in this module:

  • WordPress REST API Integration
  • Converting Markdown to WordPress Blocks
  • Handling Image Uploads and Featured Images
  • Managing WordPress Posts and Categories

Module Content:

Now that we've generated high-quality blog content and a custom featured image, it's time to publish them to our WordPress blog. In this module, we'll automate the entire publishing process, eliminating the need for manual copy-pasting and formatting.

WordPress REST API Integration

WordPress provides a powerful REST API that allows us to create posts, upload media, and manage our blog programmatically. We'll use this API to publish our generated content directly from our Python script.

First, let's create a function that handles authentication and basic API requests:

def create_wordpress_auth_header():
                    """
                    Creates the authentication header for WordPress API requests.
                    
                    Returns:
                        dict: Headers with authentication information
                    """
                    try:
                        # Get WordPress credentials from environment variables
                        username = os.environ['WORDPRESS_USERNAME']
                        password = os.environ['WORDPRESS_PASSWORD']
                        
                        # Create the authentication token (Basic Auth)
                        credentials = f"{username}:{password}"
                        token = base64.b64encode(credentials.encode())
                        
                        # Return the headers
                        return {'Authorization': f'Basic {token.decode("utf-8")}'}
                    
                    except KeyError:
                        print("Error: WordPress credentials not found in environment variables.")
                        return None
                    
                    except Exception as e:
                        print(f"Error creating WordPress authentication header: {e}")
                        return None
                
                def make_wordpress_api_request(endpoint, method="GET", data=None, files=None):
                    """
                    Makes a request to the WordPress REST API.
                    
                    Args:
                        endpoint (str): The API endpoint (without the base URL)
                        method (str): HTTP method (GET, POST, etc.)
                        data (dict): Data to send in the request (for POST, PUT, etc.)
                        files (dict): Files to upload
                        
                    Returns:
                        dict or None: The API response as a dictionary, or None if an error occurred
                    """
                    try:
                        # Get WordPress site URL from environment variables
                        wordpress_url = os.environ['WORDPRESS_URL']
                        
                        # Create the full API URL
                        api_url = f"{wordpress_url}/wp-json/wp/v2/{endpoint}"
                        
                        # Get authentication headers
                        headers = create_wordpress_auth_header()
                        
                        if not headers:
                            print("Failed to create authentication headers.")
                            return None
                        
                        # Make the request based on the method
                        if method.upper() == "GET":
                            response = requests.get(api_url, headers=headers)
                        elif method.upper() == "POST":
                            if files:
                                # For multipart form data (file uploads)
                                response = requests.post(api_url, headers=headers, data=data, files=files)
                            else:
                                # For JSON data
                                headers['Content-Type'] = 'application/json'
                                response = requests.post(api_url, headers=headers, json=data)
                        elif method.upper() == "PUT":
                            headers['Content-Type'] = 'application/json'
                            response = requests.put(api_url, headers=headers, json=data)
                        elif method.upper() == "DELETE":
                            response = requests.delete(api_url, headers=headers)
                        else:
                            print(f"Unsupported HTTP method: {method}")
                            return None
                        
                        # Check if the request was successful
                        if response.status_code in [200, 201]:
                            return response.json()
                        else:
                            print(f"API request failed with status code {response.status_code}")
                            print(f"Response: {response.text}")
                            return None
                    
                    except Exception as e:
                        print(f"Error making WordPress API request: {e}")
                        return None

These utility functions handle the authentication and basic API interaction with WordPress. Next, let's test the WordPress connection to ensure our credentials are correct:

def test_wordpress_connection():
                    """
                    Tests the connection to the WordPress REST API.
                    
                    Returns:
                        bool: True if the connection is successful, False otherwise
                    """
                    try:
                        print("Testing WordPress API connection...")
                        
                        # Try to get a list of posts (limit to 1)
                        response = make_wordpress_api_request('posts?per_page=1')
                        
                        if response is not None:
                            print("WordPress API connection successful!")
                            return True
                        else:
                            print("WordPress API connection failed.")
                            return False
                    
                    except Exception as e:
                        print(f"Error testing WordPress connection: {e}")
                        return False

Before we start using the WordPress API, it's a good idea to run this test to make sure our credentials are working correctly.
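
In Google Colab, you can set the credentials the functions above expect as environment variables in a cell before running the test. A minimal sketch with placeholder values (a WordPress application password works well for Basic Auth):

import os

# Placeholder values: replace with your own site URL and credentials
os.environ['WORDPRESS_URL'] = 'https://your-site.com'        # no trailing slash
os.environ['WORDPRESS_USERNAME'] = 'your-wordpress-username'
os.environ['WORDPRESS_PASSWORD'] = 'xxxx xxxx xxxx xxxx'     # application password

# Verify the credentials before attempting to publish anything
test_wordpress_connection()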

Converting Markdown to WordPress Blocks

WordPress uses the Gutenberg editor, which organizes content into blocks. To ensure our content looks good in WordPress, we need to convert our Markdown to the Gutenberg blocks format.

Let's create a comprehensive function to convert Markdown to WordPress blocks:

def markdown_to_wordpress_blocks(markdown_content):
                    """
                    Converts Markdown content to WordPress Gutenberg blocks format.
                    
                    Args:
                        markdown_content (str): The Markdown content
                        
                    Returns:
                        str: The WordPress blocks content
                    """
                    # First convert to HTML
                    html_content = markdown.markdown(markdown_content, extensions=['extra', 'codehilite', 'tables'])
                    
                    # Process the HTML to create WordPress blocks
                    
                    # 1. Extract the title (assumed to be the first h1)
                    title_match = re.search(r'<h1>(.*?)</h1>', html_content, re.DOTALL)
                    
                    if title_match:
                        # Remove the title from the HTML as it will be set separately
                        # as the post title (see extract_blog_title)
                        html_content = html_content.replace(title_match.group(0), '', 1)
                    
                    # 2. Convert different HTML elements to WordPress blocks
                    
                    # Paragraphs
                    html_content = re.sub(
                        r'<p>(.*?)</p>',
                        r'<!-- wp:paragraph --><p>\1</p><!-- /wp:paragraph -->',
                        html_content, flags=re.DOTALL)
                    
                    # Headings (h1-h6)
                    for level in range(1, 7):
                        html_content = re.sub(
                            rf'<h{level}>(.*?)</h{level}>',
                            rf'<!-- wp:heading {{"level":{level}}} --><h{level}>\1</h{level}><!-- /wp:heading -->',
                            html_content, flags=re.DOTALL)
                    
                    # Lists
                    html_content = re.sub(
                        r'<ul>(.*?)</ul>',
                        r'<!-- wp:list --><ul>\1</ul><!-- /wp:list -->',
                        html_content, flags=re.DOTALL)
                    html_content = re.sub(
                        r'<ol>(.*?)</ol>',
                        r'<!-- wp:list {"ordered":true} --><ol>\1</ol><!-- /wp:list -->',
                        html_content, flags=re.DOTALL)
                    
                    # Code blocks
                    html_content = re.sub(
                        r'<pre><code([^>]*)>(.*?)</code></pre>',
                        r'<!-- wp:code --><pre class="wp-block-code"><code\1>\2</code></pre><!-- /wp:code -->',
                        html_content, flags=re.DOTALL)
                    
                    # Blockquotes
                    html_content = re.sub(
                        r'<blockquote>(.*?)</blockquote>',
                        r'<!-- wp:quote --><blockquote class="wp-block-quote">\1</blockquote><!-- /wp:quote -->',
                        html_content, flags=re.DOTALL)
                    
                    # Images (if any are directly in the Markdown)
                    html_content = re.sub(
                        r'<img([^>]*)>',
                        r'<!-- wp:image --><figure class="wp-block-image"><img\1></figure><!-- /wp:image -->',
                        html_content)
                    
                    # Tables
                    html_content = re.sub(
                        r'<table>(.*?)</table>',
                        r'<!-- wp:table --><figure class="wp-block-table"><table>\1</table></figure><!-- /wp:table -->',
                        html_content, flags=re.DOTALL)
                    
                    # Add special formatting for a call-to-action at the end
                    # (the /ai-courses/ path matches the link used in add_blog_metadata)
                    html_content += '''
<!-- wp:paragraph -->
<p>Want to learn more about AI automation? Check out our <a href="/ai-courses/">free AI courses</a> for more tutorials and guides!</p>
<!-- /wp:paragraph -->
'''
                    
                    return html_content

This function converts various Markdown elements to their corresponding WordPress blocks, ensuring our content looks good in the WordPress editor. It also adds a formatted call-to-action at the end of each post to drive traffic to your courses page.
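
To sanity-check the conversion, you can run the function on a tiny piece of Markdown and inspect the markers it produces. A quick sketch (the output shown in the comments is approximate and depends on the markdown library's exact HTML):

sample_md = "## Key Takeaways\n\n- AI speeds up research\n- Always review before publishing"

blocks = markdown_to_wordpress_blocks(sample_md)
print(blocks)

# You should see each element wrapped in Gutenberg comment markers, roughly:
#   <!-- wp:heading {"level":2} --><h2>Key Takeaways</h2><!-- /wp:heading -->
#   <!-- wp:list --><ul><li>AI speeds up research</li>...</ul><!-- /wp:list -->
# followed by the call-to-action paragraph the function appends at the end.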

Handling Image Uploads and Featured Images

Images are a crucial part of engaging blog posts. We need to upload our featured image to WordPress and attach it to our post.

Let's create functions to handle image uploads and featured images:

def upload_image_to_wordpress(image_data, filename="blog_image.png"):
                    """
                    Uploads an image to WordPress and returns its media ID.
                    
                    Args:
                        image_data (bytes): The image data in bytes
                        filename (str): The filename to use for the uploaded image
                        
                    Returns:
                        int or None: The media ID if successful, None otherwise
                    """
                    try:
                        print(f"Uploading image {filename} to WordPress...")
                        
                        # Prepare the files for the multipart form data request
                        files = {'file': (filename, image_data, 'image/png')}
                        
                        # Make the API request to upload the image
                        response = make_wordpress_api_request('media', method="POST", files=files)
                        
                        if response and 'id' in response:
                            media_id = response['id']
                            image_url = response['source_url']
                            print(f"Image uploaded successfully. Media ID: {media_id}")
                            print(f"Image URL: {image_url}")
                            return media_id
                        else:
                            print("Failed to upload image to WordPress.")
                            return None
                    
                    except Exception as e:
                        print(f"Error uploading image to WordPress: {e}")
                        return None
                
                def create_wordpress_post(title, content, image_id=None, status="draft", categories=None, tags=None):
                    """
                    Creates a new post in WordPress.
                    
                    Args:
                        title (str): The post title
                        content (str): The post content (in HTML or Gutenberg blocks format)
                        image_id (int): The media ID of the featured image
                        status (str): Post status - "draft", "publish", "pending", etc.
                        categories (list): List of category IDs
                        tags (list): List of tag IDs
                        
                    Returns:
                        dict or None: The API response if successful, None otherwise
                    """
                    try:
                        print(f"Creating WordPress post: {title}")
                        
                        # Prepare the post data
                        post_data = {
                            'title': title,
                            'content': content,
                            'status': status
                        }
                        
                        # Add featured image if provided
                        if image_id:
                            post_data['featured_media'] = image_id
                        
                        # Add categories if provided
                        if categories:
                            post_data['categories'] = categories
                        
                        # Add tags if provided
                        if tags:
                            post_data['tags'] = tags
                        
                        # Make the API request to create the post
                        response = make_wordpress_api_request('posts', method="POST", data=post_data)
                        
                        if response and 'id' in response:
                            post_id = response['id']
                            post_url = response['link']
                            print(f"Post created successfully. Post ID: {post_id}")
                            print(f"Post URL: {post_url}")
                            
                            # If the post is published, save it to our tracking system
                            if status == "publish" and 'link' in response:
                                save_published_article(post_url)
                            
                            return response
                        else:
                            print("Failed to create WordPress post.")
                            return None
                    
                    except Exception as e:
                        print(f"Error creating WordPress post: {e}")
                        return None

These functions handle uploading images to WordPress and creating posts with featured images. The upload_image_to_wordpress function takes image data and uploads it to WordPress, returning the media ID. The create_wordpress_post function creates a new post with the specified content and optionally attaches a featured image.
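
Here's a sketch of how the two helpers fit together, assuming you already have the image bytes and the Gutenberg-formatted content in memory (the variable names and post title are illustrative):

# image_bytes and blocks_content are assumed to come from earlier steps in the pipeline.
media_id = upload_image_to_wordpress(image_bytes, filename="quantum-computing-trends.png")

post = create_wordpress_post(
    title="Quantum Computing Trends Explained",
    content=blocks_content,
    image_id=media_id,   # may be None if the upload failed; the post is then created without a featured image
    status="draft"       # review in the WordPress dashboard before publishing
)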

Managing WordPress Posts and Categories

To organize our blog effectively, we need to handle WordPress categories and tags. Let's create functions for managing these taxonomies:

def get_wordpress_categories():
                    """
                    Gets the list of categories from WordPress.
                    
                    Returns:
                        dict or None: A dictionary mapping category names to IDs
                    """
                    try:
                        response = make_wordpress_api_request('categories?per_page=100', method="GET")
                        
                        if response:
                            # Create a dictionary mapping category names to IDs
                            categories = {category['name']: category['id'] for category in response}
                            return categories
                        else:
                            print("Failed to get WordPress categories.")
                            return None
                    
                    except Exception as e:
                        print(f"Error getting WordPress categories: {e}")
                        return None
                
                def get_or_create_category(category_name):
                    """
                    Gets a category ID by name or creates it if it doesn't exist.
                    
                    Args:
                        category_name (str): The category name
                        
                    Returns:
                        int or None: The category ID if successful, None otherwise
                    """
                    try:
                        # Get existing categories
                        categories = get_wordpress_categories()
                        
                        if categories and category_name in categories:
                            return categories[category_name]
                        
                        # Category doesn't exist, create it
                        print(f"Creating category: {category_name}")
                        
                        category_data = {
                            'name': category_name
                        }
                        
                        response = make_wordpress_api_request('categories', method="POST", data=category_data)
                        
                        if response and 'id' in response:
                            return response['id']
                        else:
                            print(f"Failed to create category: {category_name}")
                            return None
                    
                    except Exception as e:
                        print(f"Error getting or creating category: {e}")
                        return None
                
                def get_wordpress_tags():
                    """
                    Gets the list of tags from WordPress.
                    
                    Returns:
                        dict or None: A dictionary mapping tag names to IDs
                    """
                    try:
                        response = make_wordpress_api_request('tags?per_page=100', method="GET")
                        
                        if response:
                            # Create a dictionary mapping tag names to IDs
                            tags = {tag['name']: tag['id'] for tag in response}
                            return tags
                        else:
                            print("Failed to get WordPress tags.")
                            return None
                    
                    except Exception as e:
                        print(f"Error getting WordPress tags: {e}")
                        return None
                
                def get_or_create_tag(tag_name):
                    """
                    Gets a tag ID by name or creates it if it doesn't exist.
                    
                    Args:
                        tag_name (str): The tag name
                        
                    Returns:
                        int or None: The tag ID if successful, None otherwise
                    """
                    try:
                        # Get existing tags
                        tags = get_wordpress_tags()
                        
                        if tags and tag_name in tags:
                            return tags[tag_name]
                        
                        # Tag doesn't exist, create it
                        print(f"Creating tag: {tag_name}")
                        
                        tag_data = {
                            'name': tag_name
                        }
                        
                        response = make_wordpress_api_request('tags', method="POST", data=tag_data)
                        
                        if response and 'id' in response:
                            return response['id']
                        else:
                            print(f"Failed to create tag: {tag_name}")
                            return None
                    
                    except Exception as e:
                        print(f"Error getting or creating tag: {e}")
                        return None

These functions allow us to get existing categories and tags, or create new ones if they don't exist. This is useful for organizing our blog posts into appropriate categories and tagging them with relevant keywords.
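
For example, turning human-readable names into the ID lists that the posts endpoint expects might look like this (the category and tag names are illustrative):

category_names = ["AI", "Research"]
tag_names = ["machine learning", "arxiv"]

# Map friendly names to WordPress term IDs, creating any terms that don't exist yet.
category_ids = [cid for cid in (get_or_create_category(n) for n in category_names) if cid]
tag_ids = [tid for tid in (get_or_create_tag(n) for n in tag_names) if tid]

# These ID lists can be passed straight to create_wordpress_post(categories=..., tags=...).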

Let's also add a function to manage WordPress posts, such as updating or deleting them:

def update_wordpress_post(post_id, title=None, content=None, image_id=None, status=None, categories=None, tags=None):
                    """
                    Updates an existing WordPress post.
                    
                    Args:
                        post_id (int): The ID of the post to update
                        title (str): The new post title (optional)
                        content (str): The new post content (optional)
                        image_id (int): The new featured image ID (optional)
                        status (str): The new post status (optional)
                        categories (list): The new category IDs (optional)
                        tags (list): The new tag IDs (optional)
                        
                    Returns:
                        dict or None: The API response if successful, None otherwise
                    """
                    try:
                        print(f"Updating WordPress post ID: {post_id}")
                        
                        # Prepare the post data
                        post_data = {}
                        
                        # Only include fields that are provided
                        if title is not None:
                            post_data['title'] = title
                        
                        if content is not None:
                            post_data['content'] = content
                        
                        if image_id is not None:
                            post_data['featured_media'] = image_id
                        
                        if status is not None:
                            post_data['status'] = status
                        
                        if categories is not None:
                            post_data['categories'] = categories
                        
                        if tags is not None:
                            post_data['tags'] = tags
                        
                        # Make the API request to update the post
                        response = make_wordpress_api_request(f'posts/{post_id}', method="PUT", data=post_data)
                        
                        if response and 'id' in response:
                            print(f"Post updated successfully. Post ID: {response['id']}")
                            return response
                        else:
                            print("Failed to update WordPress post.")
                            return None
                    
                    except Exception as e:
                        print(f"Error updating WordPress post: {e}")
                        return None
                
                def delete_wordpress_post(post_id, force=False):
                    """
                    Deletes a WordPress post.
                    
                    Args:
                        post_id (int): The ID of the post to delete
                        force (bool): Whether to bypass the trash and delete the post permanently
                        
                    Returns:
                        dict or None: The API response if successful, None otherwise
                    """
                    try:
                        print(f"Deleting WordPress post ID: {post_id}")
                        
                        # Make the API request to delete the post
                        endpoint = f'posts/{post_id}'
                        if force:
                            endpoint += '?force=true'
                        
                        response = make_wordpress_api_request(endpoint, method="DELETE")
                        
                        if response:
                            print(f"Post deleted successfully. Post ID: {post_id}")
                            return response
                        else:
                            print("Failed to delete WordPress post.")
                            return None
                    
                    except Exception as e:
                        print(f"Error deleting WordPress post: {e}")
                        return None

These functions allow us to update or delete WordPress posts, which can be useful for managing our automated blog system.
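
For instance, you might promote a reviewed draft to published status, or remove a post that didn't turn out well (the post IDs below are illustrative):

# Promote a reviewed draft to published status.
update_wordpress_post(post_id=123, status="publish")

# Move a post to the trash (pass force=True to delete it permanently instead).
delete_wordpress_post(post_id=456)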

Putting It All Together

Now, let's create a function that combines everything we've built so far to fully automate the blog creation and publication process:

def slugify(text):
                    """
                    Converts a string to a slug format suitable for URLs.
                    
                    Args:
                        text (str): The text to slugify
                        
                    Returns:
                        str: The slugified text
                    """
                    # Convert to lowercase
                    text = text.lower()
                    
                    # Remove non-word characters (except hyphens and spaces)
                    text = re.sub(r'[^\w\s-]', '', text)
                    
                    # Replace spaces with hyphens
                    text = re.sub(r'\s+', '-', text)
                    
                    # Remove multiple hyphens
                    text = re.sub(r'-+', '-', text)
                    
                    # Remove leading and trailing hyphens
                    text = text.strip('-')
                    
                    return text
                
                def publish_blog_post_with_image(title, content, image_data, categories=None, tags=None, status="draft"):
                    """
                    Publishes a blog post with a featured image to WordPress.
                    
                    Args:
                        title (str): The post title
                        content (str): The post content (in Markdown format)
                        image_data (bytes): The featured image data
                        categories (list): List of category names
                        tags (list): List of tag names
                        status (str): Post status - "draft" or "publish"
                        
                    Returns:
                        dict or None: Information about the published post
                    """
                    try:
                        print(f"Publishing blog post: {title}")
                        
                        # Step 1: Upload the featured image to WordPress
                        image_id = None
                        if image_data:
                            print("Uploading featured image...")
                            image_id = upload_image_to_wordpress(image_data, f"{slugify(title)}.png")
                        
                        # Step 2: Process categories
                        category_ids = []
                        if categories:
                            print("Processing categories...")
                            for category_name in categories:
                                category_id = get_or_create_category(category_name)
                                if category_id:
                                    category_ids.append(category_id)
                        
                        # Step 3: Process tags
                        tag_ids = []
                        if tags:
                            print("Processing tags...")
                            for tag_name in tags:
                                tag_id = get_or_create_tag(tag_name)
                                if tag_id:
                                    tag_ids.append(tag_id)
                        
                        # Step 4: Convert Markdown to WordPress blocks format
                        print("Converting content to WordPress blocks format...")
                        wp_content = markdown_to_wordpress_blocks(content)
                        
                        # Step 5: Create the WordPress post
                        print(f"Creating WordPress post with status: {status}")
                        wp_response = create_wordpress_post(
                            title=title,
                            content=wp_content,
                            image_id=image_id,
                            status=status,
                            categories=category_ids,
                            tags=tag_ids
                        )
                        
                        if wp_response and 'id' in wp_response:
                            print(f"Blog post published successfully! URL: {wp_response['link']}")
                            return wp_response
                        else:
                            print("Failed to publish blog post.")
                            return None
                    
                    except Exception as e:
                        print(f"Error publishing blog post: {e}")
                        return None
                
                def automated_blog_creation(search_query, categories=None, tags=None, publish=False):
                    """
                    Automates the entire blog creation process from article selection to publication.
                    
                    Args:
                        search_query (str): The topic to search for
                        categories (list): List of category names
                        tags (list): List of tag names
                        publish (bool): Whether to publish immediately or save as draft
                        
                    Returns:
                        dict: Information about the created blog post
                    """
                    result = {
                        'success': False,
                        'title': None,
                        'url': None,
                        'error': None
                    }
                    
                    try:
                        print(f"Starting automated blog creation for query: {search_query}")
                        
                        # Test WordPress connection first
                        if not test_wordpress_connection():
                            error_msg = "WordPress connection test failed. Please check your credentials."
                            print(error_msg)
                            result['error'] = error_msg
                            return result
                        
                        # Step 1: Select the best article
                        selected_article, reason = select_best_article(search_query)
                        
                        if not selected_article:
                            error_msg = f"Could not select an article: {reason}"
                            print(error_msg)
                            result['error'] = error_msg
                            return result
                        
                        print("\nSelected article:")
                        print(f"Title: {selected_article['title']}")
                        print(f"Link: {selected_article['html_link']}")
                        
                        # Step 2: Fetch the full content
                        print("\nFetching full article content...")
                        full_article = fetch_article_content(selected_article)
                        
                        if not full_article:
                            error_msg = "Failed to fetch full article content."
                            print(error_msg)
                            result['error'] = error_msg
                            return result
                        
                        # Add the title from the selected article
                        full_article['title'] = selected_article['title']
                        
                        # Step 3: Generate the blog post
                        print("\nGenerating blog post...")
                        blog_post = generate_blog_post_with_retry(full_article)
                        
                        if not blog_post:
                            error_msg = "Failed to generate blog post."
                            print(error_msg)
                            result['error'] = error_msg
                            return result
                        
                        # Step 4: Extract the title and enhance the blog post
                        print("\nExtracting title and enhancing blog post...")
                        title = extract_blog_title(blog_post)
                        enhanced_blog_post = add_blog_metadata(blog_post, full_article)
                        
                        # Step 5: Generate a featured image
                        print("\nGenerating featured image...")
                        image_data = create_blog_featured_image(enhanced_blog_post, title)
                        
                        if not image_data:
                            print("Warning: Failed to generate featured image. Continuing without image.")
                        
                        # Step 6: Publish to WordPress
                        status = "publish" if publish else "draft"
                        print(f"\nPublishing blog post to WordPress as {status}...")
                        
                        if not categories:
                            categories = ["AI", "Technology", "Research"]
                        
                        if not tags:
                            # Generate tags from the content
                            auto_tags = generate_tags_from_content(enhanced_blog_post, title)
                            tags = auto_tags if auto_tags else ["ai", "research", search_query]
                        
                        response = publish_blog_post_with_image(
                            title=title,
                            content=enhanced_blog_post,
                            image_data=image_data,
                            categories=categories,
                            tags=tags,
                            status=status
                        )
                        
                        if response and 'link' in response:
                            print("\nBlog creation and publication successful!")
                            result['success'] = True
                            result['title'] = title
                            result['url'] = response['link']
                            result['post_id'] = response['id']
                            
                            # Save to published articles if it's published
                            if status == "publish":
                                save_published_article(selected_article['html_link'])
                        else:
                            error_msg = "Failed to publish blog post."
                            print(error_msg)
                            result['error'] = error_msg
                        
                        return result
                    
                    except Exception as e:
                        error_msg = f"Error in automated blog creation: {e}"
                        print(error_msg)
                        result['error'] = error_msg
                        return result
                
                def generate_tags_from_content(content, title):
                    """
                    Generates tags from the blog content and title using GPT-4.
                    
                    Args:
                        content (str): The blog content
                        title (str): The blog title
                        
                    Returns:
                        list: A list of tag names
                    """
                    try:
                        prompt = f"""
                        Generate 5-7 relevant SEO tags for the following blog post. 
                        
                        Title: {title}
                        
                        Content excerpt:
                        {content[:1000]}...
                        
                        Return only the tags as a comma-separated list, all lowercase.
                        """
                        
                        response = client.chat.completions.create(
                            model="gpt-4o",
                            messages=[
                                {"role": "system", "content": "You are an expert SEO tagger. Generate relevant, concise tags for blog posts."},
                                {"role": "user", "content": prompt}
                            ],
                            max_tokens=100,
                            temperature=0.7
                        )
                        
                        tags_text = response.choices[0].message.content.strip()
                        
                        # Split by commas and clean up
                        tags = [tag.strip().lower() for tag in tags_text.split(',') if tag.strip()]
                        
                        print(f"Generated tags: {tags}")
                        return tags
                    
                    except Exception as e:
                        print(f"Error generating tags: {e}")
                        return None

These functions put everything together to create a complete blog automation system. The automated_blog_creation function handles the entire process from start to finish:

  1. Testing the WordPress connection
  2. Selecting the best article based on the search query
  3. Fetching the full content of the selected article
  4. Generating a blog post from the article
  5. Creating a featured image
  6. Publishing the post to WordPress with categories and tags

We also added a generate_tags_from_content function that uses GPT-4 to automatically generate relevant tags for the blog post.

Testing the Automated Blog Creation

Let's create a test function to verify that our automated blog creation system works correctly:

def test_automated_blog_creation():
                    """
                    Tests the automated blog creation process with a sample query.
                    """
                    # Define test parameters
                    search_query = "AI ethics"
                    categories = ["AI", "Ethics", "Technology"]
                    tags = ["ai ethics", "artificial intelligence", "responsible ai", "tech ethics", "ai governance"]
                    publish = False  # Set to False for testing (creates a draft)
                    
                    print(f"Testing automated blog creation with query: {search_query}")
                    
                    # Run the automated blog creation process
                    result = automated_blog_creation(
                        search_query=search_query,
                        categories=categories,
                        tags=tags,
                        publish=publish
                    )
                    
                    # Display the results
                    if result['success']:
                        print("\nAutomated blog creation test successful!")
                        print(f"Blog Title: {result['title']}")
                        print(f"Blog URL: {result['url']}")
                        print(f"Post ID: {result['post_id']}")
                        print("Status: Draft (not published)")
                    else:
                        print("\nAutomated blog creation test failed.")
                        print(f"Error: {result['error']}")
                
                # Run the test
                # test_automated_blog_creation()

This test function runs the entire blog creation process with a sample query, creating a draft post in WordPress. It's a great way to verify that everything is working correctly before scheduling regular automated posts.

Scheduling Automated Blog Posts

To complete our automated blog system, we can set up a scheduling system to create new posts automatically at regular intervals. Here's a simple scheduling function:

def schedule_automated_blog_posts(topics, interval_days=7):
                    """
                    Schedules automated blog posts to be created at regular intervals.
                    
                    Args:
                        topics (list): List of search queries to cycle through
                        interval_days (int): Number of days between posts
                        
                    Returns:
                        bool: True if scheduling was successful, False otherwise
                    """
                    try:
                        import time
                        from datetime import datetime
                        
                        print(f"Scheduling automated blog posts with {interval_days} day intervals")
                        print(f"Topics to cycle through: {topics}")
                        
                        # Keep track of the current topic index
                        topic_index = 0
                        
                        while True:
                            # Get the current topic
                            topic = topics[topic_index]
                            
                            # Print status
                            current_time = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
                            print(f"\n[{current_time}] Creating blog post for topic: {topic}")
                            
                            # Create the blog post
                            result = automated_blog_creation(
                                search_query=topic,
                                publish=True  # Automatically publish
                            )
                            
                            if result['success']:
                                print(f"Blog post created successfully: {result['title']}")
                                print(f"URL: {result['url']}")
                            else:
                                print(f"Failed to create blog post for topic: {topic}")
                                print(f"Error: {result['error']}")
                            
                            # Move to the next topic
                            topic_index = (topic_index + 1) % len(topics)
                            
                            # Wait for the specified interval
                            print(f"Waiting {interval_days} days until next post...")
                            time.sleep(interval_days * 24 * 60 * 60)  # Convert days to seconds
                    
                    except KeyboardInterrupt:
                        print("\nScheduling stopped by user.")
                        return True
                    except Exception as e:
                        print(f"Error scheduling automated blog posts: {e}")
                        return False
                
                
                # Example usage (commented out to prevent accidental execution)
                """
                topics = [
                    "AI ethics",
                    "machine learning applications",
                    "natural language processing",
                    "computer vision",
                    "reinforcement learning"
                ]
                
                # Schedule posts every 7 days
                schedule_automated_blog_posts(topics, interval_days=7)
                """

This function cycles through a list of topics, creating a new blog post for each one at the specified interval. You could run this in a separate script or adapt it to work with system schedulers like cron on Linux or Task Scheduler on Windows.

Since Google Colab sessions have time limits, here's an alternative approach built around single-run executions, which you can trigger from Colab's scheduled notebook runs or any external scheduler:

# Create a function to run a single iteration
                def create_scheduled_blog_post(topic_file="topics.txt", post_log="post_log.txt"):
                    """
                    Creates a single blog post and logs the result.
                    Designed to be run by a scheduler.
                    
                    Args:
                        topic_file (str): Path to a file containing topics, one per line
                        post_log (str): Path to a log file to record posts
                        
                    Returns:
                        bool: True if successful, False otherwise
                    """
                    try:
                        # Mount Google Drive (if using Colab)
                        try:
                            from google.colab import drive
                            drive.mount('/content/drive')
                            base_path = "/content/drive/My Drive/BlogAutomation/"
                            topic_file = base_path + topic_file
                            post_log = base_path + post_log
                        except ImportError:
                            # Not running in Colab, use local paths
                            base_path = "./"
                        
                        # Ensure the topic file's directory exists (skip when the path has no directory part)
                        topic_dir = os.path.dirname(topic_file)
                        if topic_dir:
                            os.makedirs(topic_dir, exist_ok=True)
                        
                        # Read topics file
                        if os.path.exists(topic_file):
                            with open(topic_file, 'r') as f:
                                topics = [line.strip() for line in f.readlines() if line.strip()]
                        else:
                            # Default topics if file doesn't exist
                            topics = ["AI ethics", "machine learning", "natural language processing"]
                            with open(topic_file, 'w') as f:
                                f.write('\n'.join(topics))
                        
                        # Get current date for logging
                        current_date = datetime.now().strftime("%Y-%m-%d")
                        
                        # Read post log to find the most recent topic
                        last_topic = None
                        if os.path.exists(post_log):
                            with open(post_log, 'r') as f:
                                lines = f.readlines()
                                if lines:
                                    # Get the most recent entry
                                    last_line = lines[-1].strip()
                                    parts = last_line.split(' | ')
                                    if len(parts) >= 2:
                                        last_topic = parts[1]
                        
                        # Determine the next topic
                        if last_topic in topics:
                            next_index = (topics.index(last_topic) + 1) % len(topics)
                        else:
                            next_index = 0
                        
                        topic = topics[next_index]
                        
                        # Create the blog post
                        result = automated_blog_creation(
                            search_query=topic,
                            publish=True
                        )
                        
                        # Log the result
                        with open(post_log, 'a') as f:
                            if result['success']:
                                log_entry = f"{current_date} | {topic} | SUCCESS | {result['title']} | {result['url']}\n"
                            else:
                                log_entry = f"{current_date} | {topic} | FAILED | {result.get('error', 'Unknown error')}\n"
                            
                            f.write(log_entry)
                        
                        return result['success']
                    
                    except Exception as e:
                        print(f"Error in scheduled blog post creation: {e}")
                        # Log the error
                        try:
                            with open(post_log, 'a') as f:
                                f.write(f"{datetime.now().strftime('%Y-%m-%d')} | ERROR | {str(e)}\n")
                        except:
                            pass
                        return False

This function is designed to be run by an external scheduler, like a cron job or a scheduled task. It reads topics from a file, creates a blog post for the next topic in the rotation, and logs the result. This approach is more resilient than a continuous loop in Google Colab, which might be terminated if the session times out.
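
One way to wire this up is a thin wrapper script that the scheduler calls; here is a sketch that assumes you've exported the notebook code to a local module named blog_automation.py (the module name and paths are illustrative):

# run_scheduled_post.py - intended to be invoked by cron, Task Scheduler, or a cloud scheduler.
from blog_automation import create_scheduled_blog_post

if __name__ == "__main__":
    success = create_scheduled_blog_post(topic_file="topics.txt", post_log="post_log.txt")
    raise SystemExit(0 if success else 1)

# Example crontab entry (every Monday at 09:00):
# 0 9 * * 1 /usr/bin/python3 /home/user/blog/run_scheduled_post.py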

✅ Action Steps

  1. Add all the code from this module to your Google Colab notebook
  2. Run the test_wordpress_connection function to verify your WordPress credentials
  3. Try the test_automated_blog_creation function to create a draft post
  4. Check your WordPress dashboard to see the created post
  5. If everything looks good, consider setting up scheduled posts using the scheduling functions
  6. Customize categories and tags to match your blog's taxonomy

In this module, you've learned how to:

  • Interact with the WordPress REST API
  • Convert Markdown content to WordPress Gutenberg blocks
  • Upload images and set featured images
  • Manage WordPress categories and tags
  • Automate the entire blog creation and publication process
  • Schedule regular blog posts

In the next module, we'll explore advanced customization and scaling techniques to further enhance your automated blog system.

Learn advanced techniques for customizing your content style, scheduling publication, and scaling your automated blog system.

Lessons in this module:

  • Customizing Content Style and Format
  • Scheduling Automated Publication
  • Monitoring and Analytics Integration
  • Troubleshooting Common Issues

Module Content:

Now that you've built a functioning automated blog system, let's explore ways to customize it, optimize performance, scale it up, and troubleshoot common issues. This module will help you take your blog automation to the next level by adding more sophisticated features and ensuring it runs reliably over time.

Customizing Content Style and Format

One of the most powerful aspects of using AI for content creation is the ability to customize the output to match your brand's unique voice and style. Let's explore how to make your generated content truly your own.

Tailoring Your Blog's Voice

The blog posts generated by our system follow a general style defined in our prompts. However, you can make your content more distinctive by customizing the prompts to reflect your brand's voice.

def create_custom_style_prompt(style):
                    """
                    Creates prompts tailored to different content styles.
                    
                    Args:
                        style (str): The desired style - "casual", "professional", "technical", etc.
                        
                    Returns:
                        str: A system prompt that directs the style of content
                    """
                    style_prompts = {
                        "casual": """
                            Write in a casual, conversational tone. Use contractions, occasional slang, 
                            and a friendly approach. Address the reader directly as "you". Keep sentences 
                            short and paragraphs to 2-3 sentences. Feel free to use emoji occasionally 
                            and don't shy away from humor when appropriate. 👍
                        """,
                        
                        "professional": """
                            Maintain a professional, authoritative tone throughout. Use proper grammar 
                            and avoid contractions. Back up assertions with facts. Structure content with 
                            clear headings and logical flow. Avoid colloquialisms and ensure content 
                            reflects industry expertise. Use a formal third-person perspective.
                        """,
                        
                        "technical": """
                            Focus on precise technical details and terminology specific to the field. 
                            Include code examples, technical diagrams descriptions, and implementation details 
                            where relevant. Explain complex concepts clearly but don't oversimplify. 
                            Structure content with detailed headings and subheadings for clear navigation.
                        """,
                        
                        "storytelling": """
                            Begin with an engaging narrative hook. Weave technical information into a story 
                            format with a clear beginning, middle, and end. Use vivid analogies and metaphors 
                            to explain complex concepts. Create a narrative thread that connects different 
                            sections of the content. End with a satisfying conclusion that ties back to the opening.
                        """
                    }
                    
                    # Return the requested style prompt or a default one if not found
                    return style_prompts.get(style, """Write in a clear, engaging style that balances 
                                                     informative content with readability.""")

You can integrate this function into your blog generation process by updating the generate_blog_post function:

def generate_blog_post(article, style="professional"):
                    """
                    Generates an engaging blog post from an academic article using GPT-4.
                    
                    Args:
                        article (dict): The article content
                        style (str): The writing style to use
                        
                    Returns:
                        str: The generated blog post in Markdown format
                    """
                    # Get style-specific instructions
                    style_instructions = create_custom_style_prompt(style)
                    
                    # Create the system instruction
                    system_instruction = f"""
                    You are an expert blog writer specializing in transforming academic research into engaging, accessible content.
                    
                    {style_instructions}
                    
                    Your task is to create an informative and interesting blog post based on the academic article I will provide.
                    
                    Guidelines:
                    1. Structure the post with clear headings and subheadings
                    2. Include a compelling introduction that highlights the relevance and importance of the research
                    3. Break down complex ideas into understandable explanations
                    4. Add analogies or examples where helpful
                    5. Format as Markdown with appropriate headings, lists, and emphasis
                    6. Include a "Key Takeaways" section at the end
                    7. Create an engaging, SEO-friendly title (different from the academic title)
                    8. Properly credit the original research and authors
                    
                    The blog post should be 800-1200 words long and written for a general audience interested in technology and AI.
                    """
                    
                    # Rest of the function remains the same...

Customizing Image Generation

Similarly, you can customize your image generation to maintain a consistent visual style across all your blog posts:

def create_custom_image_style_prompt(visual_style):
                    """
                    Creates image description prompts tailored to different visual styles.
                    
                    Args:
                        visual_style (str): The desired visual style
                        
                    Returns:
                        str: Additional style instructions for image generation
                    """
                    style_prompts = {
                        "minimalist": """
                            The image should follow minimalist design principles with ample white space, 
                            limited color palette (2-3 colors maximum), and clean, simple shapes. 
                            Avoid clutter and complex patterns. Aim for a modern, elegant aesthetic 
                            with clear focal points.
                        """,
                        
                        "illustrated": """
                            Create a detailed, hand-drawn illustration style image. Use vivid colors, 
                            distinctive line work, and a creative representation of the concept. The 
                            style should feel like a professional illustration from a high-quality 
                            publication, with careful attention to composition and visual storytelling.
                        """,
                        
                        "futuristic": """
                            Design a high-tech, futuristic visualization with glowing elements, holographic 
                            effects, and a sci-fi aesthetic. Use a color palette dominated by blues, purples, 
                            and cyans. The composition should feel advanced, innovative, and cutting-edge, 
                            with elements that suggest technology beyond current capabilities.
                        """,
                        
                        "corporate": """
                            Create a professional business-style image suitable for corporate communications. 
                            Use a clean, balanced composition with subtle gradients, professional color schemes 
                            (blues, grays, with strategic accent colors), and a polished, executive-friendly appearance. 
                            The style should convey trustworthiness, expertise, and corporate professionalism.
                        """
                    }
                    
                    return style_prompts.get(visual_style, "Create a clean, professional image with balanced composition and appropriate colors.")

Then update your image description generation function:

def generate_image_description(blog_post, title, visual_style="corporate"):
                    """
                    Generates a detailed image description based on the blog post content with a specific visual style.
                    
                    Args:
                        blog_post (str): The full blog post content
                        title (str): The title of the blog post
                        visual_style (str): The desired visual style
                        
                    Returns:
                        str: A detailed image description for gpt-image-1
                    """
                    # Get style-specific instructions
                    style_instructions = create_custom_image_style_prompt(visual_style)
                    
                    # Create a prompt for GPT-4 to generate an image description
                    prompt = f"""
                    Based on the following blog post title and content, create a detailed description for an image 
                    that would serve as an engaging featured image for the post.
                    
                    BLOG TITLE: {title}
                    
                    BLOG CONTENT (excerpt):
                    {blog_post[:2000]}...
                    
                    Guidelines for the image description:
                    1. The description should be detailed and vivid (200-300 words)
                    2. Focus on creating a professional, eye-catching image relevant to the blog topic
                    3. Avoid requesting any text in the image
                    4. Consider using metaphors, abstract representations, or relevant visual concepts
                    5. Incorporate color recommendations to match the tone of the article
                    
                    VISUAL STYLE INSTRUCTIONS:
                    {style_instructions}
                    
                    Your description will be used with OpenAI's gpt-image-1 model to generate the actual image.
                    Only provide the description itself, without any additional commentary.
                    """
                    
                    # Rest of the function remains the same...

Custom Content Templates

You can also create templates for different types of blog posts, such as "how-to guides," "news analysis," or "product reviews." Here's an example:

def create_content_template(template_type):
                    """
                    Creates content structure templates for different blog post types.
                    
                    Args:
                        template_type (str): The type of template to use
                        
                    Returns:
                        dict: Template structure and instructions
                    """
                    templates = {
                        "how_to_guide": {
                            "structure": [
                                "Introduction to the problem", 
                                "Why this solution matters", 
                                "Step 1", 
                                "Step 2", 
                                "Step 3", 
                                "Common challenges and solutions", 
                                "Results you can expect", 
                                "Conclusion"
                            ],
                            "instructions": """
                                Format this content as a detailed how-to guide. Start with an engaging introduction
                                that describes the problem being solved and why it matters. Break down the process
                                into clear, numbered steps with detailed explanations for each. Include specific
                                examples, potential challenges, and their solutions. End with expected outcomes and
                                a motivational conclusion.
                            """
                        },
                        "news_analysis": {
                            "structure": [
                                "Key news summary", 
                                "Background context", 
                                "Key development 1", 
                                "Key development 2", 
                                "Key development 3", 
                                "Expert perspectives", 
                                "Future implications", 
                                "Conclusion"
                            ],
                            "instructions": """
                                Format this as an analytical news piece. Begin with a concise summary of the key news.
                                Provide essential background context for readers unfamiliar with the topic. Break down
                                2-3 key developments or aspects of the news, analyzing each in depth. Include diverse
                                expert perspectives. Discuss potential future implications. Conclude with the broader
                                significance of this news.
                            """
                        },
                        "research_summary": {
                            "structure": [
                                "Research overview", 
                                "Key findings", 
                                "Methodology", 
                                "Significant result 1", 
                                "Significant result 2", 
                                "Practical applications", 
                                "Limitations", 
                                "Future research directions"
                            ],
                            "instructions": """
                                Format this as an accessible research summary. Begin with a high-level overview of
                                the research focus and importance. Summarize key findings in plain language. Briefly
                                explain the methodology in accessible terms. Elaborate on 2-3 significant results and
                                their implications. Discuss practical real-world applications. Note important limitations
                                of the research. End with future research directions.
                            """
                        }
                    }
                    
                    return templates.get(template_type, {
                        "structure": ["Introduction", "Main content", "Conclusion"],
                        "instructions": "Format this as a standard blog post with clear sections."
                    })

You can integrate this into your blog generation workflow to create different types of content based on the source material.
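
One way to do that (a sketch, not the course's exact implementation) is to give generate_blog_post an extra template_instructions parameter and fold it into the system prompt next to the style instructions; the scheduling function later in this module assumes a parameter like this exists:

def generate_blog_post(article, style="professional", template_instructions=""):
    """Sketch: accept optional template instructions alongside the writing style."""
    style_instructions = create_custom_style_prompt(style)

    system_instruction = f"""
    You are an expert blog writer specializing in transforming academic research into engaging, accessible content.

    {style_instructions}

    {template_instructions}

    Your task is to create an informative and interesting blog post based on the academic article I will provide.
    """
    # Rest of the function remains the same...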

Scheduling Automated Publication

A powerful aspect of automation is the ability to schedule content to be created and published at regular intervals. Let's develop a robust scheduling system.

Creating a Reliable Scheduler

Instead of trying to run a continuous process in Google Colab (which has runtime limitations), let's create a script that can be triggered by external schedulers like cron jobs or cloud-based schedulers.

def schedule_single_post_creation(topic=None, style="professional", template="research_summary", visual_style="corporate"):
                    """
                    Creates and publishes a single blog post. Designed to be run by external schedulers.
                    
                    Args:
                        topic (str): Optional specific topic to search for. If None, uses a rotating list.
                        style (str): The writing style to use
                        template (str): The content template type
                        visual_style (str): The visual style for the image
                        
                    Returns:
                        dict: Information about the created post
                    """
                    result = {
                        'success': False,
                        'title': None,
                        'url': None,
                        'timestamp': datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
                        'error': None
                    }
                    
                    try:
                        print(f"[{result['timestamp']}] Starting scheduled blog creation")
                        
                        # If no topic is provided, get one from a rotating list
                        if not topic:
                            topic = get_next_scheduled_topic()
                        
                        print(f"Using topic: {topic}")
                        
                        # Fetch articles and select the best one
                        articles = fetch_arxiv_articles(search_query=topic)
                        raw_response, structured_response = send_to_chatgpt_for_scoring(articles)
                        selected_article, reason = parse_chatgpt_response(raw_response, articles)
                        
                        if not selected_article:
                            error_msg = f"No suitable article found for topic: {topic}"
                            print(error_msg)
                            result['error'] = error_msg
                            return result
                        
                        # Fetch full content
                        full_article = fetch_article_content(selected_article)
                        if not full_article:
                            error_msg = "Failed to fetch full article content"
                            print(error_msg)
                            result['error'] = error_msg
                            return result
                        
                        # Add missing fields
                        full_article['title'] = selected_article['title']
                        full_article['html_link'] = selected_article['html_link']
                        
                        # Get content template
                        content_template = create_content_template(template)
                        
                        # Generate blog post with style and template
                        blog_post = generate_blog_post(full_article, style=style, template_instructions=content_template["instructions"])
                        
                        # Extract title
                        title = extract_blog_title(blog_post)
                        
                        # Generate image with custom style
                        image_description = generate_image_description(blog_post, title, visual_style=visual_style)
                        image_data = generate_image(image_description)
                        
                        # Publish to WordPress
                        publish_result = publish_blog_post_with_image(
                            title=title,
                            content=blog_post,
                            image_data=image_data,
                            categories=["AI", "Technology", "Research"],
                            status="publish"  # Automatically publish
                        )
                        
                        if publish_result and 'link' in publish_result:
                            # Success!
                            result['success'] = True
                            result['title'] = title
                            result['url'] = publish_result['link']
                            print(f"Successfully published: {title}")
                            
                            # Save to published list
                            save_published_article(selected_article['html_link'])
                        else:
                            error_msg = "Failed to publish to WordPress"
                            print(error_msg)
                            result['error'] = error_msg
                        
                        # Log the result
                        log_publishing_result(result)
                        
                        return result
                        
                    except Exception as e:
                        error_msg = f"Error in scheduled post creation: {str(e)}"
                        print(error_msg)
                        result['error'] = error_msg
                        log_publishing_result(result)
                        return result
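
Because the GitHub Actions workflow shown later in this module simply runs python blog_automation.py, the script needs a small entry point. Here is a minimal sketch, assuming this function lives in blog_automation.py:

if __name__ == "__main__":
    import sys
    outcome = schedule_single_post_creation()
    # Exit non-zero on failure so external schedulers flag the run as failed
    sys.exit(0 if outcome.get("success") else 1)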

Rotating Topic Selection

To maintain variety in your content, you can implement a rotating topic selection system:

def get_next_scheduled_topic():
                    """
                    Gets the next topic from a rotating list.
                    
                    Returns:
                        str: The next topic to use
                    """
                    # Define your topic rotation - customize these to match your blog's focus
                    topics = [
                        "AI ethics",
                        "machine learning applications",
                        "neural networks",
                        "natural language processing",
                        "computer vision advances",
                        "reinforcement learning",
                        "AI in healthcare",
                        "ChatGPT applications",
                        "LLM advancements",
                        "AI and privacy"
                    ]
                    
                    try:
                        # Load the last used topic index
                        if os.path.exists('topic_index.txt'):
                            with open('topic_index.txt', 'r') as f:
                                last_index = int(f.read().strip())
                        else:
                            last_index = -1
                        
                        # Get the next index, wrapping around if necessary
                        next_index = (last_index + 1) % len(topics)
                        
                        # Save the new index
                        with open('topic_index.txt', 'w') as f:
                            f.write(str(next_index))
                        
                        return topics[next_index]
                        
                    except Exception as e:
                        print(f"Error selecting topic: {e}")
                        # Fallback to a default topic
                        return "artificial intelligence"

Logging Publication Results

Keeping track of your automated publications is essential for monitoring and troubleshooting:

def log_publishing_result(result):
                    """
                    Logs the result of a publishing attempt to a CSV file.
                    
                    Args:
                        result (dict): The publishing result information
                    """
                    log_file = 'blog_publishing_log.csv'
                    file_exists = os.path.isfile(log_file)
                    
                    try:
                        with open(log_file, 'a', newline='') as f:
                            writer = csv.DictWriter(f, fieldnames=['timestamp', 'success', 'title', 'url', 'error'])
                            
                            if not file_exists:
                                writer.writeheader()
                            
                            writer.writerow(result)
                            
                        print(f"Result logged to {log_file}")
                    except Exception as e:
                        print(f"Error logging result: {e}")

Setting Up External Scheduling

For reliable scheduling, you'll want to set up an external scheduler. Here are some options:

  1. Using GitHub Actions:
    Create a GitHub repository for your script and add a workflow file under .github/workflows/ like this:
    name: Scheduled Blog Post Generation

    on:
      schedule:
        - cron: '0 12 * * 1,4'  # Runs at 12:00 UTC on Monday and Thursday
      workflow_dispatch:  # Allows manual triggering

    jobs:
      generate_post:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v2
          - name: Set up Python
            uses: actions/setup-python@v2
            with:
              python-version: '3.10'
          - name: Install dependencies
            run: |
              python -m pip install --upgrade pip
              pip install requests beautifulsoup4 openai PyPDF2 medium-sdk markdown python-dotenv
          - name: Generate and publish blog post
            env:
              OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
              WORDPRESS_USERNAME: ${{ secrets.WORDPRESS_USERNAME }}
              WORDPRESS_PASSWORD: ${{ secrets.WORDPRESS_PASSWORD }}
              WORDPRESS_URL: ${{ secrets.WORDPRESS_URL }}
              MEDIUM_TOKEN: ${{ secrets.MEDIUM_TOKEN }}
            run: python blog_automation.py
  2. Using Google Cloud Scheduler:
    You can adapt your script to run as a Cloud Function and trigger it with Cloud Scheduler.
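
For the Cloud Functions route, a minimal HTTP entry point might look like the sketch below. Assumptions: the functions-framework package is installed, and schedule_single_post_creation can be imported from your script (the module name blog_automation is illustrative). Cloud Scheduler then simply calls the deployed function's URL on your chosen cron schedule.

import functions_framework
from blog_automation import schedule_single_post_creation  # illustrative module name

@functions_framework.http
def generate_post(request):
    # Cloud Scheduler can POST {"topic": "..."} to this endpoint, or send no body at all
    payload = request.get_json(silent=True) or {}
    result = schedule_single_post_creation(topic=payload.get("topic"))
    return result, (200 if result.get("success") else 500)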

Monitoring and Analytics Integration

To get the most out of your automated blog system, it's important to track performance and continuously improve.

Adding Google Analytics Tracking

First, let's add a helper that appends Google Analytics UTM tracking parameters to the links in each post before it's published:

def add_analytics_tracking(blog_post, campaign="auto_blog"):
                    """
                    Adds UTM parameters to links in the blog post for tracking.
                    
                    Args:
                        blog_post (str): The blog post content
                        campaign (str): The campaign name for tracking
                        
                    Returns:
                        str: The blog post with tracking parameters added
                    """
                    # Regular expression to find links in markdown
                    link_pattern = r'\[([^\]]+)\]\(([^)]+)\)'
                    
                    # Function to replace links with tracked links
                    def add_utm(match):
                        link_text = match.group(1)
                        url = match.group(2)
                        
                        # Only add tracking to external links (not internal anchors)
                        if url.startswith('http') and 'utm_' not in url:
                            separator = '&' if '?' in url else '?'
                            tracked_url = f"{url}{separator}utm_source=blog&utm_medium=automated&utm_campaign={campaign}"
                            return f"[{link_text}]({tracked_url})"
                        else:
                            return match.group(0)
                    
                    # Replace links with tracked links
                    tracked_content = re.sub(link_pattern, add_utm, blog_post)
                    
                    return tracked_content
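
To wire this into the scheduled workflow, call it on the generated post just before the WordPress publish step, for example inside schedule_single_post_creation:

# Add UTM tracking to outbound links before publishing
blog_post = add_analytics_tracking(blog_post, campaign="auto_blog")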

Tracking Publication Performance

Let's create a function to monitor how our automated posts are performing:

def track_post_performance(post_url, wordpress_url, username, password):
                    """
                    Fetches performance metrics for a published post.
                    
                    Args:
                        post_url (str): The URL of the post to track
                        wordpress_url (str): The WordPress site URL
                        username (str): WordPress username
                        password (str): WordPress application password
                        
                    Returns:
                        dict: Performance metrics for the post
                    """
                    try:
                        # Extract post ID from URL
                        post_id_match = re.search(r'p=(\d+)', post_url)
                        if not post_id_match:
                            # Try alternate format
                            post_id_match = re.search(r'/(\d+)/?$', post_url)
                        
                        if not post_id_match:
                            print(f"Could not extract post ID from URL: {post_url}")
                            return None
                            
                        post_id = post_id_match.group(1)
                        
                        # Set up authentication
                        credentials = f"{username}:{password}"
                        token = base64.b64encode(credentials.encode())
                        headers = {'Authorization': f'Basic {token.decode("utf-8")}'}
                        
                        # Make API request to get post data
                        api_url = f"{wordpress_url}/wp-json/wp/v2/posts/{post_id}?_embed"
                        response = requests.get(api_url, headers=headers)
                        
                        if response.status_code != 200:
                            print(f"Failed to get post data. Status code: {response.status_code}")
                            print(f"Response: {response.text}")
                            return None
                            
                        post_data = response.json()
                        
                        # Basic metrics from WordPress (the core REST API does not expose
                        # view counts, and a comment total only appears if a plugin adds it,
                        # so fall back gracefully if the field is missing)
                        metrics = {
                            'title': post_data['title']['rendered'],
                            'date_published': post_data['date'],
                            'comment_count': post_data.get('comment_count', 0),
                            'status': post_data['status'],
                            'link': post_data['link']
                        }
                        
                        # If you have a WordPress plugin that exposes view counts via the API, you could get that too
                        
                        return metrics
                        
                    except Exception as e:
                        print(f"Error tracking post performance: {e}")
                        return None

Integrating with Google Analytics API

For more advanced analytics, you can integrate with Google Analytics API:

def get_analytics_data(post_url, days=30):
                    """
                    Fetches Google Analytics data for a specific post.
                    Note: Requires setting up the Google Analytics API and authentication.
                    
                    Args:
                        post_url (str): The URL of the post to analyze
                        days (int): Number of days to analyze
                        
                    Returns:
                        dict: Analytics data for the post
                    """
                    try:
                        # This is a placeholder function that would need to be implemented
                        # with the Google Analytics API
                        
                        # You would need to:
                        # 1. Set up Google Analytics API credentials
                        # 2. Initialize the Analytics API client
                        # 3. Make requests to get pageviews, time on page, bounce rate, etc.
                        
                        # For demonstration purposes, we'll return dummy data
                        return {
                            'url': post_url,
                            'pageviews': 250,
                            'unique_visitors': 180,
                            'avg_time_on_page': '2:30',
                            'bounce_rate': '65%',
                            'top_referrers': ['google.com', 'linkedin.com', 'twitter.com']
                        }
                        
                    except Exception as e:
                        print(f"Error getting analytics data: {e}")
                        return None
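
The placeholder above keeps the course self-contained, but if you do wire up real analytics, the GA4 Data API is the current route. Here is a minimal sketch using the google-analytics-data client library; the property ID and page path are illustrative, and it assumes a service-account credential is exposed via GOOGLE_APPLICATION_CREDENTIALS:

from google.analytics.data_v1beta import BetaAnalyticsDataClient
from google.analytics.data_v1beta.types import (
    DateRange, Dimension, Metric, Filter, FilterExpression, RunReportRequest
)

GA4_PROPERTY_ID = "123456789"        # illustrative: your GA4 property ID
page_path = "/my-automated-post/"    # illustrative: path of the post to analyze

client = BetaAnalyticsDataClient()   # reads GOOGLE_APPLICATION_CREDENTIALS
report = client.run_report(RunReportRequest(
    property=f"properties/{GA4_PROPERTY_ID}",
    date_ranges=[DateRange(start_date="30daysAgo", end_date="today")],
    dimensions=[Dimension(name="pagePath")],
    metrics=[Metric(name="screenPageViews"), Metric(name="activeUsers")],
    dimension_filter=FilterExpression(
        filter=Filter(field_name="pagePath",
                      string_filter=Filter.StringFilter(value=page_path))
    ),
))
for row in report.rows:
    print(row.dimension_values[0].value,
          "views:", row.metric_values[0].value,
          "users:", row.metric_values[1].value)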

Troubleshooting Common Issues

Even the best automation systems encounter issues. Let's build some troubleshooting tools and address common problems.

Error Handling and Retry Logic

API calls fail intermittently, so retry logic with exponential backoff keeps the pipeline running instead of aborting on the first error:

def retry_function(func, max_attempts=3, retry_delay=5, *args, **kwargs):
                    """
                    Retries a function multiple times if it fails.
                    
                    Args:
                        func: The function to retry
                        max_attempts (int): Maximum number of retry attempts
                        retry_delay (int): Seconds to wait between retries
                        *args, **kwargs: Arguments to pass to the function
                        
                    Returns:
                        The result of the function if successful, None otherwise
                    """
                    for attempt in range(1, max_attempts + 1):
                        try:
                            return func(*args, **kwargs)
                        except Exception as e:
                            print(f"Attempt {attempt} failed: {e}")
                            if attempt < max_attempts:
                                print(f"Retrying in {retry_delay} seconds...")
                                time.sleep(retry_delay)
                                # Increase delay for next attempt (exponential backoff)
                                retry_delay *= 2
                            else:
                                print(f"All {max_attempts} attempts failed.")
                                return None
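
For example, you can wrap the arXiv fetch from earlier in the course so a transient network error doesn't abort the whole run (keyword arguments are passed straight through to the wrapped function):

articles = retry_function(
    fetch_arxiv_articles,
    max_attempts=3,
    retry_delay=5,
    search_query="machine learning applications"
)
if articles is None:
    print("arXiv fetch failed after all retries")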

Common Issues and Solutions

Here's a function to diagnose common problems:

def diagnose_system_issues():
                    """
                    Diagnoses common issues with the blog automation system.
                    
                    Returns:
                        list: Identified issues and potential solutions
                    """
                    issues = []
                    
                    # Check API keys
                    try:
                        # Test OpenAI API
                        client = OpenAI(api_key=os.environ['OPENAI_API_KEY'])
                        client.chat.completions.create(
                            model="gpt-4o",
                            messages=[{"role": "user", "content": "Hello"}],
                            max_tokens=5
                        )
                    except Exception as e:
                        issues.append({
                            'component': 'OpenAI API',
                            'issue': str(e),
                            'solution': 'Check that your OpenAI API key is valid and has sufficient credits.'
                        })
                    
                    # Check WordPress credentials
                    try:
                        wordpress_url = os.environ['WORDPRESS_URL']
                        username = os.environ['WORDPRESS_USERNAME']
                        password = os.environ['WORDPRESS_PASSWORD']
                        
                        credentials = f"{username}:{password}"
                        token = base64.b64encode(credentials.encode())
                        headers = {'Authorization': f'Basic {token.decode("utf-8")}'}
                        
                        response = requests.get(f"{wordpress_url}/wp-json/wp/v2/posts?per_page=1", headers=headers)
                        
                        if response.status_code != 200:
                            issues.append({
                                'component': 'WordPress API',
                                'issue': f"HTTP {response.status_code}: {response.text}",
                                'solution': 'Verify your WordPress URL, username, and application password.'
                            })
                    except Exception as e:
                        issues.append({
                            'component': 'WordPress API',
                            'issue': str(e),
                            'solution': 'Check WordPress environment variables and site accessibility.'
                        })
                    
                    # Check published articles file
                    try:
                        published_articles = load_published_articles()
                    except Exception as e:
                        issues.append({
                            'component': 'Published Articles Tracking',
                            'issue': str(e),
                            'solution': 'The published_articles.json file may be corrupted. Try creating a new empty file.'
                        })
                    
                    # Check for networking issues
                    try:
                        requests.get('https://arxiv.org')
                    except Exception as e:
                        issues.append({
                            'component': 'Network Connectivity',
                            'issue': str(e),
                            'solution': 'Check your internet connection and firewall settings.'
                        })
                    
                    return issues

Self-Healing Functions

Let's add some self-healing capabilities to our system:

def fix_common_issues():
                    """
                    Attempts to automatically fix common issues.
                    
                    Returns:
                        dict: Results of attempted fixes
                    """
                    results = {
                        'fixed': [],
                        'failed': []
                    }
                    
                    # Try to fix published articles file
                    try:
                        if not os.path.exists('published_articles.json'):
                            with open('published_articles.json', 'w') as f:
                                json.dump([], f)
                            results['fixed'].append('Created missing published_articles.json file')
                        else:
                            # Try to load and validate the file
                            try:
                                with open('published_articles.json', 'r') as f:
                                    published = json.load(f)
                                    if not isinstance(published, list):
                                        # File exists but content is invalid
                                        with open('published_articles.json', 'w') as f:
                                            json.dump([], f)
                                        results['fixed'].append('Reset corrupted published_articles.json file')
                            except json.JSONDecodeError:
                                # File exists but content is invalid JSON
                                with open('published_articles.json', 'w') as f:
                                    json.dump([], f)
                                results['fixed'].append('Fixed invalid JSON in published_articles.json')
                    except Exception as e:
                        results['failed'].append(f'Failed to fix published_articles.json: {e}')
                    
                    # Try to fix topic index file
                    try:
                        if not os.path.exists('topic_index.txt'):
                            with open('topic_index.txt', 'w') as f:
                                f.write('0')
                            results['fixed'].append('Created missing topic_index.txt file')
                    except Exception as e:
                        results['failed'].append(f'Failed to fix topic_index.txt: {e}')
                    
                    return results
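
These two helpers combine naturally into the regular health check mentioned in the action steps below. A minimal sketch you could run on its own schedule:

def run_health_check():
    """Attempts automatic fixes, then reports anything still broken."""
    fixes = fix_common_issues()
    for item in fixes['fixed']:
        print(f"Fixed: {item}")
    for item in fixes['failed']:
        print(f"Could not fix: {item}")

    issues = diagnose_system_issues()
    if not issues:
        print("All checks passed.")
    for issue in issues:
        print(f"[{issue['component']}] {issue['issue']} -> {issue['solution']}")
    return issues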

Advanced Scaling Techniques

As your automated blog system grows, you may want to scale it up in various ways.

Parallel Processing for Multiple Blogs

If you manage multiple blogs or want to generate multiple posts at once, parallel processing can help:

def generate_multiple_posts(topics, num_workers=3):
                    """
                    Generates multiple blog posts in parallel.
                    
                    Args:
                        topics (list): List of topics to create posts for
                        num_workers (int): Number of parallel workers
                        
                    Returns:
                        list: Results for each topic
                    """
                    from concurrent.futures import ThreadPoolExecutor, as_completed
                    
                    results = []
                    
                    # Define a worker function
                    def worker(topic):
                        return schedule_single_post_creation(topic=topic)
                    
                    # Create a pool of workers and execute
                    with ThreadPoolExecutor(max_workers=num_workers) as executor:
                        future_to_topic = {executor.submit(worker, topic): topic for topic in topics}
                        
                        # Collect results as they complete
                        for future in as_completed(future_to_topic):
                            topic = future_to_topic[future]
                            try:
                                result = future.result()
                                result['topic'] = topic
                                results.append(result)
                                print(f"Completed post for topic: {topic}")
                            except Exception as e:
                                print(f"Error processing topic {topic}: {e}")
                                results.append({
                                    'topic': topic,
                                    'success': False,
                                    'error': str(e)
                                })
                    
                    return results
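
Usage is straightforward; keep num_workers modest so you stay within OpenAI and WordPress rate limits:

results = generate_multiple_posts(
    ["AI ethics", "computer vision advances", "AI in healthcare"],
    num_workers=2
)
for r in results:
    print(r['topic'], "->", "published" if r.get('success') else r.get('error'))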

Advanced Content Strategy with AI

You can use AI to develop a more sophisticated content strategy:

def generate_content_strategy(niche, timeframe="3 months"):
                    """
                    Uses AI to generate a comprehensive content strategy.
                    
                    Args:
                        niche (str): The blog niche or industry
                        timeframe (str): The timeframe for the strategy
                        
                    Returns:
                        dict: The content strategy
                    """
                    prompt = f"""
                    Create a comprehensive content strategy for a blog in the {niche} niche over the next {timeframe}.
                    
                    Include:
                    1. Core content pillars (key topic areas)
                    2. Specific keyword recommendations for each pillar
                    3. A suggested content calendar with post frequency
                    4. Recommended content types for different topics (how-to, listicles, case studies, etc.)
                    5. Suggestions for internal linking structure
                    6. Ideas for content promotion
                    
                    Format the response as a structured JSON object with clear sections.
                    """
                    
                    response = client.chat.completions.create(
                        model="gpt-4o",
                        messages=[
                            {"role": "system", "content": "You are an expert content strategist specializing in SEO-focused blog strategies."},
                            {"role": "user", "content": prompt}
                        ],
                        max_tokens=2000
                    )
                    
                    try:
                        # Try to parse as JSON
                        strategy = json.loads(response.choices[0].message.content)
                        return strategy
                    except json.JSONDecodeError:
                        # If not valid JSON, return as text
                        return {'strategy_text': response.choices[0].message.content}
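
A quick usage example follows (json is already imported earlier in the script). If you want the model to return valid JSON more reliably, you could also pass response_format={"type": "json_object"} to the chat completion call, since the prompt already asks for JSON:

strategy = generate_content_strategy("AI automation tools", timeframe="3 months")
print(json.dumps(strategy, indent=2))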

✅ Action Steps

  1. Customize your content style:
    • Modify the generate_blog_post function to use custom style prompts
    • Experiment with different writing styles to find what works best for your audience
    • Create templates for different types of content
  2. Set up reliable scheduling:
    • Choose an external scheduler (GitHub Actions, Cloud Scheduler, etc.)
    • Implement the topic rotation system
    • Set up logging to track publication history
  3. Implement analytics tracking:
    • Add UTM parameters to outbound links
    • Track post performance with WordPress and/or Google Analytics
    • Use performance data to refine your content strategy
  4. Add error handling and troubleshooting:
    • Implement the retry logic for critical functions
    • Add diagnostic tools to identify and fix common issues
    • Create a regular health check for your system
  5. Scale up your automated content:
    • Explore parallel processing for multiple blogs or topics
    • Use AI to develop a comprehensive content strategy
    • Consider integrating with other platforms for wider content distribution

In this module, you've learned how to:

  • Customize the style and format of your AI-generated content
  • Set up reliable scheduling for consistent publication
  • Track and analyze performance to optimize your content strategy
  • Troubleshoot common issues and implement self-healing mechanisms
  • Scale your automated blog system for greater impact

Your AI-powered blog automation system is now fully customized, reliable, and ready to scale. With the knowledge from this course, you have the tools to maintain a consistent, high-quality content pipeline with minimal manual effort.

What Our Students Say

This course has completely transformed my content strategy. I've gone from publishing 1 blog post a month to 3 posts a week, all with minimal effort. The SEO benefits have been incredible!

Michael T.
Digital Marketer

As a non-technical person, I was worried this would be too complex, but the step-by-step instructions made it incredibly easy to follow. My WordPress site now publishes content automatically with zero technical headaches.

Jennifer R.
Small Business Owner

The quality of the AI-generated content exceeded my expectations. With a few customizations to the prompts, the articles perfectly match my brand's voice. I'm saving over 20 hours a month on content creation.

Alex K.
Content Strategist

Ready to Automate Your Blog Content?

Transform your content strategy with AI-powered automation.

Start Course Now