Extracting YouTube Transcripts with JavaScript

YouTube videos often contain valuable information, and extracting transcripts can be useful for various purposes, from accessibility to content analysis. In this article, we'll explore how to use JavaScript to retrieve and process YouTube transcripts.

Prerequisites

Before diving into the code, make sure you have a basic understanding of JavaScript and the Document Object Model (DOM). You'll also need a YouTube video ID, which is the value of the v parameter in the video URL (for example, https://www.youtube.com/watch?v=VIDEO_ID).
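
The video ID can also be pulled out of a URL string rather than the address bar. The following is a minimal sketch, assuming only the two common URL shapes (watch?v=... and youtu.be/...). The helper name extractVideoId is hypothetical and not part of the script in this article, and abcdefghijk is a placeholder ID.

```javascript
// Hypothetical helper: pull the video ID out of a YouTube URL string.
function extractVideoId(url) {
  let parsed;
  try {
    parsed = new URL(url);
  } catch (err) {
    return null; // Not a valid absolute URL
  }

  // Short links: https://youtu.be/<id>
  if (parsed.hostname === 'youtu.be') {
    return parsed.pathname.slice(1) || null;
  }

  // Standard watch links: https://www.youtube.com/watch?v=<id>
  return parsed.searchParams.get('v');
}

console.log(extractVideoId('https://www.youtube.com/watch?v=abcdefghijk&t=42s'));
console.log(extractVideoId('https://youtu.be/abcdefghijk'));
```

Other URL shapes (embeds, shorts, playlists) are not handled here and would need their own branches.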

JavaScript Code

function retrieveTranscript() {
  const videoId = new URLSearchParams(window.location.search).get('v');
  const YT_INITIAL_PLAYER_RESPONSE_RE =
    /ytInitialPlayerResponse\s*=\s*({.+?})\s*;\s*(?:var\s+(?:meta|head)|<\/script|\n)/;
  let player = window.ytInitialPlayerResponse;
  if (!player || videoId !== player.videoDetails.videoId) {
    // The in-page player response is missing or belongs to another video,
    // so fetch the watch page and parse the response out of its HTML.
    fetch('https://www.youtube.com/watch?v=' + videoId)
      .then(function (response) {
        return response.text();
      })
      .then(function (body) {
        const playerResponse = body.match(YT_INITIAL_PLAYER_RESPONSE_RE);
        if (!playerResponse) {
          console.warn('Unable to parse playerResponse');
          return;
        }
        player = JSON.parse(playerResponse[1]);
        fetchTranscript(player);
      });
  } else {
    // The in-page player response already matches the current video
    fetchTranscript(player);
  }
}

function fetchTranscript(player) {
  const metadata = {
    title: player.videoDetails.title,
    duration: player.videoDetails.lengthSeconds,
    author: player.videoDetails.author,
    views: player.videoDetails.viewCount,
  };

  // Without caption data there is nothing to fetch
  if (!player.captions) {
    console.warn('No captions available for this video');
    return;
  }

  // Get the tracks and sort them by priority
  const tracks = player.captions.playerCaptionsTracklistRenderer.captionTracks;
  tracks.sort(compareTracks);

  // Get the transcript
  fetch(tracks[0].baseUrl + '&fmt=json3')
    .then(function (response) {
      return response.json();
    })
    .then(function (transcript) {
      // Raw transcript plus metadata, for callers that need more than text
      const result = { transcript: transcript, metadata: metadata };

      const parsedTranscript = transcript.events
        // Remove events that carry no text segments
        .filter(function (x) {
          return x.segs;
        })

        // Concatenate into single long string
        .map(function (x) {
          return x.segs
            .map(function (y) {
              return y.utf8;
            })
            .join(' ');
        })
        .join(' ')

        // Remove zero-width characters
        .replace(/[\u200B-\u200D\uFEFF]/g, '')

        // Replace any whitespace with a single space
        .replace(/\s+/g, ' ');

      // Use 'result' or 'parsedTranscript' here as needed
      console.log('EXTRACTED_TRANSCRIPT', parsedTranscript);
    });
}

function compareTracks(track1, track2) {
  const langCode1 = track1.languageCode;
  const langCode2 = track2.languageCode;

  if (langCode1 === 'en' && langCode2 !== 'en') {
    return -1; // English comes first
  } else if (langCode1 !== 'en' && langCode2 === 'en') {
    return 1; // English comes first
  } else if (track1.kind !== 'asr' && track2.kind === 'asr') {
    return -1; // Non-ASR comes first
  } else if (track1.kind === 'asr' && track2.kind !== 'asr') {
    return 1; // Non-ASR comes first
  }

  return 0; // Preserve order if both have same priority
}
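
To see what the regular expression in the listing above captures, here is a small sketch that runs it against a made-up HTML fragment. The sampleHtml string and the parsePlayerResponse helper are assumptions for illustration only; a real watch page embeds a far larger object.

```javascript
const YT_INITIAL_PLAYER_RESPONSE_RE =
  /ytInitialPlayerResponse\s*=\s*({.+?})\s*;\s*(?:var\s+(?:meta|head)|<\/script|\n)/;

// Made-up page fragment standing in for the HTML of a watch page
const sampleHtml =
  '<script>var ytInitialPlayerResponse = ' +
  '{"videoDetails":{"videoId":"abcdefghijk","title":"Demo"}};' +
  'var meta = document.createElement("meta");</script>';

// Hypothetical helper: return the parsed object, or null if nothing matched
function parsePlayerResponse(html) {
  const match = html.match(YT_INITIAL_PLAYER_RESPONSE_RE);
  return match ? JSON.parse(match[1]) : null;
}

const parsed = parsePlayerResponse(sampleHtml);
console.log(parsed.videoDetails.title);
console.log(parsePlayerResponse('<html></html>'));
```

The lazy {.+?} group keeps growing until it reaches a closing brace followed by a semicolon and one of the expected terminators, which is why the nested braces in the sample do not cut the match short.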

Code Explanation

  1. The retrieveTranscript function first retrieves the video ID from the query parameters of the current URL.

     const videoId = new URLSearchParams(window.location.search).get('v');
    
  2. Defines a regular expression to extract the ytInitialPlayerResponse object from the YouTube video page's HTML.

     const YT_INITIAL_PLAYER_RESPONSE_RE = /ytInitialPlayerResponse\s*=\s*({.+?})\s*;\s*(?:var\s+(?:meta|head)|<\/script|\n)/;
    
  3. Checks if the ytInitialPlayerResponse object is available and matches the current video ID. If not, it fetches the YouTube video page HTML to extract the necessary information.

     if (!player || videoId !== player.videoDetails.videoId) { /* ... */ }
    
  4. Fetches the HTML content of the YouTube video page.

     fetch('https://www.youtube.com/watch?v=' + videoId)
       .then(response => response.text())
       .then(body => { /* ... */ });
    
  5. Extracts and parses the ytInitialPlayerResponse object from the HTML.

     const playerResponse = body.match(YT_INITIAL_PLAYER_RESPONSE_RE);
     player = JSON.parse(playerResponse[1]);
    
  6. Extracts metadata such as video title, duration, author, and views from the videoDetails object of the player response.

     const metadata = { /* ... */ };
    
  7. Sorts caption tracks based on language and type, then fetches the transcript data using the selected track's base URL.

     const tracks = player.captions.playerCaptionsTracklistRenderer.captionTracks;
     tracks.sort(compareTracks);
     fetch(tracks[0].baseUrl + '&fmt=json3')
       .then(response => response.json())
       .then(transcript => { /* ... */ });
    
  8. Processes the raw transcript data by filtering out invalid segments, mapping the UTF-8 text, and cleaning up unnecessary characters and whitespace.

     const parsedTranscript = transcript.events
       .filter(x => x.segs)
       .map(x => x.segs.map(y => y.utf8).join(' '))
       .join(' ')
       .replace(/[\u200B-\u200D\uFEFF]/g, '')
       .replace(/\s+/g, ' ');
    
  9. Bundles the raw transcript and the metadata into a single result object, then logs the cleaned-up transcript text.

     const result = { transcript: transcript, metadata: metadata };
     console.log('EXTRACTED_TRANSCRIPT', parsedTranscript);
    
  10. The compareTracks function orders the available caption tracks, giving priority first to English tracks and then to tracks that were not generated by automatic speech recognition (ASR).
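
The track ordering described in the last step can be checked in isolation. The sketch below repeats compareTracks from the listing and applies it to three hand-written track objects. The label field is a hypothetical addition for readability; real track objects carry more fields.

```javascript
function compareTracks(track1, track2) {
  const langCode1 = track1.languageCode;
  const langCode2 = track2.languageCode;

  if (langCode1 === 'en' && langCode2 !== 'en') {
    return -1; // English comes first
  } else if (langCode1 !== 'en' && langCode2 === 'en') {
    return 1; // English comes first
  } else if (track1.kind !== 'asr' && track2.kind === 'asr') {
    return -1; // Non-ASR comes first
  } else if (track1.kind === 'asr' && track2.kind !== 'asr') {
    return 1; // Non-ASR comes first
  }

  return 0; // Preserve order if both have same priority
}

// Hand-written sample tracks; 'label' is only here to make the output readable
const sampleTracks = [
  { languageCode: 'es', label: 'Spanish' },
  { languageCode: 'en', kind: 'asr', label: 'English (auto-generated)' },
  { languageCode: 'en', label: 'English' },
];

// Sort a copy so the sample array keeps its original order
const sortedLabels = sampleTracks
  .slice()
  .sort(compareTracks)
  .map(function (track) {
    return track.label;
  });

console.log(sortedLabels);
// → [ 'English', 'English (auto-generated)', 'Spanish' ]
```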

Usage

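To use the script, embed it in your project and call the retrieveTranscript function. Because the function reads the video ID from window.location and fetches from www.youtube.com, it is meant to run on a YouTube watch page, for example from the browser console or a browser extension. The script relies on the structure of the ytInitialPlayerResponse object embedded in the page, which is not a documented API, so it may stop working if YouTube changes that format.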
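
Since the script depends on the shape of ytInitialPlayerResponse, one way to soften that dependency is to read the caption tracks defensively. This is a minimal sketch, assuming the same property path used in the script; getCaptionTracks is a hypothetical helper name.

```javascript
// Hypothetical helper: return the caption track list, or an empty array
// if any level of the expected structure is missing.
function getCaptionTracks(player) {
  const renderer =
    player && player.captions && player.captions.playerCaptionsTracklistRenderer;
  const tracks = renderer && renderer.captionTracks;
  return Array.isArray(tracks) ? tracks : [];
}

// A response without captions yields an empty list instead of a TypeError
console.log(getCaptionTracks({ videoDetails: { videoId: 'abcdefghijk' } }).length);

// A response with the expected structure yields its tracks
console.log(
  getCaptionTracks({
    captions: {
      playerCaptionsTracklistRenderer: {
        captionTracks: [{ languageCode: 'en' }],
      },
    },
  }).length
);
```

A caller can then check for an empty array before sorting and fetching, and report a clear message when no tracks are found.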

Summary

With this JavaScript code, you can easily retrieve and process YouTube transcripts programmatically. Feel free to adapt the code to suit your specific requirements or integrate it into web applications for enhanced functionality.
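
As one example of adapting the code, the sketch below keeps a timestamp for each transcript event instead of flattening everything into one string. It assumes that each event in the json3 data carries its start time in a tStartMs field (milliseconds), which is not used in the script above. The sample data is hand-written, and toTimestampedLines and formatTimestamp are hypothetical helper names.

```javascript
// Format a millisecond offset as m:ss
function formatTimestamp(ms) {
  const totalSeconds = Math.floor(ms / 1000);
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = String(totalSeconds % 60).padStart(2, '0');
  return minutes + ':' + seconds;
}

// Turn transcript events into "[m:ss] text" lines, skipping empty events
function toTimestampedLines(transcript) {
  return transcript.events
    .filter(function (event) {
      return event.segs;
    })
    .map(function (event) {
      const text = event.segs
        .map(function (seg) {
          return seg.utf8;
        })
        .join(' ')
        .replace(/\s+/g, ' ')
        .trim();
      return { start: event.tStartMs, text: text };
    })
    .filter(function (line) {
      return line.text;
    })
    .map(function (line) {
      return '[' + formatTimestamp(line.start) + '] ' + line.text;
    });
}

// Hand-written sample in the assumed json3 shape
const sampleTranscript = {
  events: [
    { tStartMs: 0, segs: [{ utf8: 'Hello' }, { utf8: ' world' }] },
    { tStartMs: 1500, segs: [{ utf8: '\n' }] },
    { tStartMs: 65000, segs: [{ utf8: 'Second line' }] },
    { tStartMs: 70000 },
  ],
};

const lines = toTimestampedLines(sampleTranscript);
console.log(lines);
// → [ '[0:00] Hello world', '[1:05] Second line' ]
```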

Happy coding!