5. Examining and Modifying Text

Examining and modifying text is another common operation performed on text buffers. Examples include converting a selected portion of text into a comment while editing a program, determining and inserting the correct end tag while editing HTML, and inserting a pair of HTML tags around the current word. The GtkTextIter object provides functions for such processing.

In this section we will develop two programs to demonstrate these functions. The first program will insert start/end li tags (not to be confused with text attribute tags) around the current line when a button is clicked. The second program will insert an end tag for an unclosed start tag.

To insert tags around the current line, we first obtain an iter at the current cursor position. Then we move the iter to the beginning of the line, insert the start tag, move the iter to the end of the line, and insert the end tag.

An iter can be moved to a specified offset in the same line using


void gtk_text_iter_set_line_offset( GtkTextIter *iter,
                                    gint char_on_line );

The function moves iter within the line, to the character offset specified by char_on_line. If char_on_line is equal to the number of characters in the line, the iter is moved to the start of the next line.

A character offset of zero moves the iter to the beginning of the line. The iter can be moved to the end of the line using


gboolean gtk_text_iter_forward_to_line_end( GtkTextIter *iter );

Now that we know the functions required to implement the first program, here's the code.


#include <gtk/gtk.h>

void
on_window_destroy (GtkWidget *widget, gpointer data)
{
  gtk_main_quit ();
}

/* Callback for the "Make List Item" button. */
void
on_button_clicked (GtkWidget *button, GtkTextBuffer *buffer)
{
  GtkTextIter iter;
  GtkTextMark *cursor;

  /* Get the mark at cursor. */
  cursor = gtk_text_buffer_get_mark (buffer, "insert");
  /* Get the iter at cursor. */
  gtk_text_buffer_get_iter_at_mark (buffer, &iter, cursor);

  /* Insert the start tag at the beginning of the line. The insert call
     leaves iter just after the inserted text. */
  gtk_text_iter_set_line_offset (&iter, 0);
  gtk_text_buffer_insert (buffer, &iter, "<li>", -1);

  /* On an empty line iter is already at the line end, and
     gtk_text_iter_forward_to_line_end() would skip to the next line. */
  if (!gtk_text_iter_ends_line (&iter))
    gtk_text_iter_forward_to_line_end (&iter);
  gtk_text_buffer_insert (buffer, &iter, "</li>", -1);
}

int 
main(int argc, char *argv[])
{
  GtkWidget *window;
  GtkWidget *vbox;
  GtkWidget *text_view;
  GtkWidget *button;
  GtkTextBuffer *buffer;
  
  gtk_init (&argc, &argv);

  /* Create a Window. */
  window = gtk_window_new (GTK_WINDOW_TOPLEVEL);
  gtk_window_set_title (GTK_WINDOW (window), "Insert Tags");

  /* Set a decent default size for the window. */
  gtk_window_set_default_size (GTK_WINDOW (window), 200, 200);
  g_signal_connect (G_OBJECT (window), "destroy", 
                    G_CALLBACK (on_window_destroy),
                    NULL);

  vbox = gtk_vbox_new (FALSE, 2);
  gtk_container_add (GTK_CONTAINER (window), vbox);

  /* Create a multiline text widget. */
  text_view = gtk_text_view_new ();
  gtk_box_pack_start (GTK_BOX (vbox), text_view, 1, 1, 0);

  /* Obtaining the buffer associated with the widget. */
  buffer = gtk_text_view_get_buffer (GTK_TEXT_VIEW (text_view));
  /* Set the default buffer text. */ 
  gtk_text_buffer_set_text (buffer, "Item1\nItem2\nItem3", -1);
  
  /* Create the button that wraps the current line in li tags. */
  button = gtk_button_new_with_label ("Make List Item");
  gtk_box_pack_start (GTK_BOX (vbox), button, 0, 0, 0);
  g_signal_connect (G_OBJECT (button), "clicked", 
                    G_CALLBACK (on_button_clicked),
                    buffer);
  
  gtk_widget_show_all (window);

  gtk_main ();
  return 0;
}

For the second program, we first get the iter at the current cursor position. We then search backwards from the cursor through the buffer until we hit an unclosed tag, and insert the corresponding end tag at the current cursor position. (Note that this procedure does not take care of many special cases, and might not be the best way to determine an unclosed tag, but it serves our purpose of explaining text manipulation functions. Developing a perfect algorithm to determine an unclosed tag is outside the scope of this tutorial.)

We can identify tags using the left angle bracket, so searching for start/end tags involves searching for the left angle bracket. This can be done using


gboolean gtk_text_iter_backward_find_char( GtkTextIter *iter,
                                           GtkTextCharPredicate pred,
                                           gpointer user_data,
                                           const GtkTextIter *limit );

The function proceeds backwards from iter, starting with the character just before it, and calls pred for each character with the character and user_data as arguments, until pred returns TRUE. (pred should return TRUE when a match is found.) If a match is found, the function moves iter to the matching position and returns TRUE. If a match is not found, the function moves iter to the beginning of the buffer or to limit (if non-NULL) and returns FALSE.

For our purpose we write a predicate that returns TRUE when the character is a left angle bracket.

When we hit a left angle bracket, we check whether the corresponding tag is a start tag or an end tag. This is done by examining the character immediately after the left angle bracket. If it is a '/', the tag is an end tag.

To extract the character after the angle bracket, we move a copy of the left angle bracket iter forward by one character and then extract the character at that position. To move an iter forward by one character, the following function can be used.


gboolean gtk_text_iter_forward_char( GtkTextIter *iter );

To extract the character at an iter, the following function can be used.

gunichar gtk_text_iter_get_char( const GtkTextIter *iter );

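After determining the tag type, we do one of the following. If the tag is an end tag, we push its name onto a stack. If the tag is a start tag, we pop an item from the stack; if the stack is empty, the start tag has not been closed, so we insert its end tag at the cursor position and stop searching.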

We haven't mentioned how we extract the tag name. The tag name is extracted using two iters (a start iter and an end iter). The start iter is obtained by starting from the left angle bracket iter and searching forward for an alphanumeric character. The end iter is obtained by starting from the start iter and searching forward for a non-alphanumeric character. Both searches can be done using gtk_text_iter_forward_find_char, the forward variant of gtk_text_iter_backward_find_char.
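
Independent of GTK+, the same backward scan can be sketched on a plain C string. This is only an illustration under simplifying assumptions: the function name find_unclosed_tag is hypothetical, the tag name must follow the '<' or '</' immediately, and, because the procedure never compares tag names, a counter of pending end tags stands in for the stack.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Scan TEXT backwards from byte offset CURSOR and copy the name of the
   nearest unclosed start tag into OUT (at most OUT_SIZE - 1 bytes).
   Returns 1 if such a tag was found, 0 otherwise. */
static int
find_unclosed_tag (const char *text, size_t cursor,
                   char *out, size_t out_size)
{
  size_t pending = 0;  /* end tags seen whose start tag is not yet met */
  size_t i = cursor;

  assert (text != NULL && out != NULL && out_size > 0);
  assert (cursor <= strlen (text));

  while (i-- > 0)
    {
      size_t name, len = 0;
      int closing;

      if (text[i] != '<')
        continue;

      /* A '/' right after the bracket marks an end tag. */
      closing = (text[i + 1] == '/');
      name = i + 1 + (closing ? 1 : 0);
      while (isalnum ((unsigned char) text[name + len]))
        len++;
      if (len == 0)
        continue;               /* a stray '<', not a tag */

      if (closing)
        pending++;              /* "push" the end tag */
      else if (pending > 0)
        pending--;              /* "pop": this start tag is closed */
      else
        {
          /* Nothing pending: this start tag is the unclosed one. */
          if (len >= out_size)
            len = out_size - 1;
          memcpy (out, text + name, len);
          out[len] = '\0';
          return 1;
        }
    }

  return 0;
}
```

For the default text used by the second program, scanning back from the end of the text meets the h1 end tag, then the h1 start tag that cancels it, and then stops at body.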

The code for the second example follows.


#include <ctype.h>
#include <string.h>

#include <gtk/gtk.h>

void
on_window_destroy (GtkWidget *widget, gpointer data)
{
  gtk_main_quit ();
}

gboolean 
islangle (gunichar ch, gpointer data)
{
  if (ch == '<')
    return TRUE;
  else
    return FALSE;
}

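gboolean
notalnum (gunichar ch1, gpointer data)
{
  /* isalnum() is only defined for unsigned char values, so use the
     Unicode-aware GLib test on a gunichar. */
  return !g_unichar_isalnum (ch1);
}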

/*
 * Check whether the tag at ITER is an opening tag or a closing tag.
 */ 
gboolean 
is_closing (GtkTextIter *iter, GtkTextBuffer *buffer)
{
  GtkTextIter slash;

  slash = *iter;
  gtk_text_iter_forward_char (&slash);
  if (gtk_text_iter_get_char (&slash) == '/')
    return TRUE;
  else
    return FALSE;
}

/*
 * Predicate for gtk_text_iter_forward_find_char(): TRUE for an
 * alphanumeric character. Casting isalnum() to GtkTextCharPredicate
 * would call it through a mismatched signature.
 */
gboolean
is_alnum (gunichar ch, gpointer data)
{
  return g_unichar_isalnum (ch);
}

/*
 * Returns the name of the start/end tag at the position specified by
 * ITER, as a newly allocated string that must be freed with g_free().
 * Returns NULL if tag not found.
 */ 
char *
get_this_tag (GtkTextIter *iter, GtkTextBuffer *buffer)
{
  GtkTextIter start_tag = *iter;
  GtkTextIter end_tag;
  gboolean found;

  /* start_tag points to '<', moving to the next alphanumeric character
     will get the start of the tag name. */
  found = gtk_text_iter_forward_find_char (&start_tag, is_alnum,
                                           NULL, NULL);
  if (!found)
    return NULL;

  /* search for non-alnum character in the forward direction from start_tag */
  end_tag = start_tag;
  found = gtk_text_iter_forward_find_char (&end_tag, 
                                           (GtkTextCharPredicate) notalnum, 
                                           NULL, NULL);
  if (!found)
    return NULL;
  
  /* return the text between '<[/]' and non-alnum */
  return gtk_text_buffer_get_text (buffer, &start_tag, &end_tag, FALSE);
}

/*
 * Insert the closing tag specified by TAG at the current cursor
 * position.
 */
void
insert_closing_tag (GtkTextIter *iter, gchar *tag, GtkTextBuffer *buffer)
{
  char *insert;

  insert = g_strdup_printf ("</%s>", tag);
  gtk_text_buffer_insert_at_cursor(buffer, insert, strlen(insert));
  g_free (insert);
}

/* Callback for insert closing tag button */
void
on_button_clicked (GtkWidget *button, GtkTextBuffer *buffer)
{
  GtkTextIter iter;
  GQueue *stack;
  GtkTextMark *cursor;

  stack = g_queue_new ();

  /* Get the mark at cursor. */
  cursor = gtk_text_buffer_get_mark (buffer, "insert");
  /* Get the iter at cursor. */
  gtk_text_buffer_get_iter_at_mark (buffer, &iter, cursor);

  while (1)
    {
      gboolean found;
      char *tag;
      char *tag_in_stack;

      /* Search backwards for '<'. */
      found = gtk_text_iter_backward_find_char (&iter, islangle, NULL, NULL);
      if (!found) 
        break;

      tag = get_this_tag (&iter, buffer);
      if (tag == NULL)
        continue;

      if (is_closing (&iter, buffer))
        {
          /* If it is a closing tag, push it into the stack */
          g_queue_push_head(stack, tag);
        }
      else
        {
          /* If it is an opening tag, pop an item from the stack. If
             there are no items in the stack, then this tag has not
             been closed, and it is the one we should close. */
          tag_in_stack = g_queue_pop_head(stack);
          if (tag_in_stack == NULL)
            {
              insert_closing_tag (&iter, tag, buffer);
              g_free (tag);
              break;
            }
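          else
            {
              /* This opening tag is matched by a closing tag seen
                 earlier; neither name is needed any more. */
              g_free (tag_in_stack);
              g_free (tag);
            }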
        }
    }

  g_queue_foreach (stack, (GFunc)g_free, NULL);
  g_queue_free (stack);
}

int 
main(int argc, char *argv[])
{
  GtkWidget *window;
  GtkWidget *vbox;
  GtkWidget *text_view;
  GtkWidget *button;
  GtkTextBuffer *buffer;
  
  gtk_init (&argc, &argv);

  /* Create a Window. */
  window = gtk_window_new (GTK_WINDOW_TOPLEVEL);
  gtk_window_set_title (GTK_WINDOW (window), "Close Tag");

  /* Set a decent default size for the window. */
  gtk_window_set_default_size (GTK_WINDOW (window), 200, 200);
  g_signal_connect (G_OBJECT (window), "destroy", 
                    G_CALLBACK (on_window_destroy),
                    NULL);

  vbox = gtk_vbox_new (FALSE, 2);
  gtk_container_add (GTK_CONTAINER (window), vbox);

  /* Create a multiline text widget. */
  text_view = gtk_text_view_new ();
  gtk_box_pack_start (GTK_BOX (vbox), text_view, 1, 1, 0);

  /* Obtaining the buffer associated with the widget. */
  buffer = gtk_text_view_get_buffer (GTK_TEXT_VIEW (text_view));
  /* Set the default buffer text. */ 
  gtk_text_buffer_set_text (buffer, 
                            "<html>\n"
                            "<head><title>Title</title></head>\n"
                            "<body>\n"
                            "<h1>Heading</h1>\n", -1);
  
  /* Create the "Insert Close Tag" button. */
  button = gtk_button_new_with_label ("Insert Close Tag");
  gtk_box_pack_start (GTK_BOX (vbox), button, 0, 0, 0);
  g_signal_connect (G_OBJECT (button), "clicked", 
                    G_CALLBACK (on_button_clicked),
                    buffer);
  
  gtk_widget_show_all (window);

  gtk_main ();
  return 0;
}

5.1. More Functions to Examine and Modify Text

There are many more functions to examine and modify text. Some of the interesting ones are listed below. You can get the complete list of available functions from the GTK+ manual.

There is a class of functions used to test for some characteristic of the text, for example, whether the iter is at the beginning or end of a word, sentence or line. The corresponding functions are


gboolean gtk_text_iter_starts_word( const GtkTextIter *iter );

gboolean gtk_text_iter_ends_word( const GtkTextIter *iter );

gboolean gtk_text_iter_starts_sentence( const GtkTextIter *iter );

gboolean gtk_text_iter_ends_sentence( const GtkTextIter *iter );

gboolean gtk_text_iter_starts_line( const GtkTextIter *iter );

gboolean gtk_text_iter_ends_line( const GtkTextIter *iter );

The family of functions based on tags and tag toggling has also not been mentioned so far. To check whether a position in the buffer starts, ends or toggles a tag, the following functions can be used.


gboolean gtk_text_iter_begins_tag( const GtkTextIter *iter,
                                   GtkTextTag *tag );

gboolean gtk_text_iter_ends_tag( const GtkTextIter *iter,
                                 GtkTextTag *tag );

gboolean gtk_text_iter_toggles_tag( const GtkTextIter *iter,
                                    GtkTextTag *tag );

The return value of the toggles_tag variant is the logical OR of the begins_tag and ends_tag variants. If tag is NULL, each function returns TRUE if iter starts/ends/toggles any tag.

We can also move through a buffer based on tag toggling. To move to a position in the buffer where a particular tag toggles, the following functions can be used.


gboolean gtk_text_iter_forward_to_tag_toggle( GtkTextIter *iter,
                                              GtkTextTag *tag );

gboolean gtk_text_iter_backward_to_tag_toggle( GtkTextIter *iter,
                                               GtkTextTag *tag );

If tag is NULL, toggling of any tag is considered.