Introduction
In this post, I am going to write a C program that removes comments and white space from a file. For this, we need the basic concepts of file handling in C, because we have to open a file and read it character by character. For each character, we check whether it is a blank space, a line break (Enter key), or part of a single-line comment (//) or a multi-line comment (/*…*/). If it is any of these, we remove it.
But the main question is how to remove these characters. Does C have a function for this? This post answers both questions. First, here are the contents of the file that the C program will read.
Input File: file1.c
Suppose that the file name is “file1.c” and its contents are:
#include<stdio.h>
//This is a simple c program
int main()
{
int a = 5, b = 10;
/* Now we will display the values of these variables. */
printf("a = %d and b = %d ", a, b);
return 0;
}
Output File: file1.c
Lines 2 and 6 of this file contain a single-line comment and a multi-line comment, respectively. This file, “file1.c”, is not our main C program, and we are not interested in what it prints. We have to write a C program that reads this file. After compiling and running that program, the contents of file1.c will be:
#include<stdio.h>intmain(){inta=5,b=10;printf("a=%dandb=%d",a,b);return0;}
That means all the white spaces and the comments have been removed. Now let us see the actual c program and then I will explain the logic of this program.
C Program to Remove Comments and White Spaces from a File
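#include <stdio.h>
#include <stdlib.h>
int main()
{
    FILE *fp1,*fp2;
    /* flag: 0 = normal code, 1 = one '/' read, 2 = inside a // comment,
       3 = inside a multi-line comment, 4 = '*' read inside a multi-line comment */
    int flag=0;
    /* int, not char, because fgetc() returns an int and EOF is an int value */
    int ch;
    fp1=fopen("file1.c","r");
    fp2=fopen("file2.c","w");
    if(fp1==NULL)
    {
        printf("Error while opening a file for reading");
        return 1;
    }
    if(fp2==NULL)
    {
        printf("Error while opening a file for writing");
        fclose(fp1);
        return 1;
    }
    while((ch=fgetc(fp1))!=EOF)
    {
        if((ch=='/')&&(flag==0))
        {
            flag=1;
            continue;
        }
        else if((ch=='/')&&(flag==1))
        {
            flag=2;
            continue;
        }
        else if((ch=='*')&&(flag==1))
        {
            flag=3;
            continue;
        }
        if(flag==1)
        {
            /* the '/' did not start a comment (for example a division),
               so write it and treat the current character as normal code */
            fputc('/',fp2);
            flag=0;
        }
        if(flag==2)
        {
            if(ch=='\n')
            {
                flag=0;
            }
            continue;
        }
        if(flag==3)
        {
            if(ch=='*')
            {
                flag=4;
            }
            continue;
        }
        if(flag==4)
        {
            if(ch=='/')
            {
                flag=0;
            }
            else if(ch!='*')
            {
                /* the '*' was part of the comment text; keep searching */
                flag=3;
            }
            continue;
        }
        if(flag==0)
        {
            if((ch==13)||(ch==10))
            {
                continue;
            }
            else if((ch!=' '))
            {
                fputc(ch,fp2);
            }
        }
    }
    if(flag==1)
    {
        /* the file ended right after a lone '/' */
        fputc('/',fp2);
    }
    fclose(fp1);
    fclose(fp2);
    remove("file1.c");
    if(rename("file2.c","file1.c")!=0)
    {
        printf("Error while renaming file2.c to file1.c");
        return 1;
    }
    fp1=fopen("file1.c","r");
    if(fp1==NULL)
    {
        printf("Error while opening a file for reading");
        return 1;
    }
    while((ch=fgetc(fp1))!=EOF)
    {
        printf("%c",ch);
    }
    fclose(fp1);
    return 0;
}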
Detailed Explanation of this C Program:
FILE *fp1,*fp2;
In this c program, we are dealing with two files. Therefore, I have declared here two file pointers. One file pointer is used to open a file in reading mode and another file pointer is used to open a file in writing mode.
int flag=0;
The value of this variable changes in several situations. Why do we need it? We are reading the file character by character, and we do not know in advance which character comes next, so the flag records what we have read so far.
For example, when we read '/', we change the value of the flag. A comment always starts with '/', but the next character may be either '/' or '*'. The changed flag value tells us, when we read that next character, that the previous character was '/'.
int ch;
Each character we read from the file is stored in this variable. It is declared as int rather than char because fgetc() returns an int, which lets the special value EOF be told apart from every valid character.
fp1=fopen("file1.c","r");
We are opening the file “file1.c” in reading mode.
fp2=fopen("file2.c","w");
We are opening the file “file2.c” in writing mode. This is a temporary file.
if(fp1==NULL)
When we try to open a file, there is a chance that it fails, for example because the file does not exist or we do not have permission to read it. Whatever the reason, fopen() then returns NULL. So if the value of fp1 is NULL, we display a message that there is an error.
The block in braces that follows is the body of this if statement. It is executed only when the condition is true, and the program terminates when it reaches the return statement inside that body.
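The same check can be wrapped in a small helper so that every fopen() call is tested in one place. This is only a sketch: the name open_or_report() is hypothetical and not part of the program above, and it uses the standard perror() function to print the reason the system gives for the failure.

```c
#include <stdio.h>

/* Hypothetical helper: opens a file and reports the reason on failure.
   Returns the FILE pointer, or NULL if the file could not be opened. */
FILE *open_or_report(const char *path, const char *mode)
{
    FILE *fp = fopen(path, mode);
    if (fp == NULL)
    {
        /* perror() prints the given text followed by the system's error message */
        perror(path);
    }
    return fp;
}
```

A caller would still compare the result with NULL, exactly as the program does with fp1 and fp2.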
printf("Error while opening a file for reading");
In C, printf() is a formatted output function. This printf() displays the message “Error while opening a file for reading”.
return 1;
This statement terminates the program. The non-zero value tells the caller that the program did not finish successfully.
if(fp2==NULL)
The explanation is the same as for if(fp1==NULL) above, except that this check is for the file opened in writing mode.
printf("Error while opening a file for writing");
This printf() displays the message “Error while opening a file for writing”.
while((ch=fgetc(fp1))!=EOF)
In this C program, we have to read the file “file1.c” character by character, but we do not know in advance how many characters it contains. So we use a while loop here. A loop repeats the statements written inside its body, and the repetition continues for as long as the condition remains true.
In this case, the loop runs until we reach the end of the file (EOF). If you look at this statement carefully, we use fgetc() to read one character from the file associated with the file pointer fp1, and that character is stored in the variable ch. We then compare the value of ch with EOF. If ch is not EOF, the loop executes its body.
The body of the while loop is the block in braces that follows it, and it contains all the checks explained below. The condition is checked before every repetition, and the body is executed each time the condition is true.
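To see the reading loop on its own, here is a minimal sketch that only counts the characters in a file. The helper name count_characters() is hypothetical. Note that ch is an int here: fgetc() returns an int, and storing the result in a char can make a valid byte such as 0xFF look like EOF on platforms where char is signed.

```c
#include <stdio.h>

/* Hypothetical helper: counts the characters in an already opened file. */
long count_characters(FILE *fp)
{
    long count = 0;
    int ch;   /* int, so that EOF stays distinct from every character value */

    while ((ch = fgetc(fp)) != EOF)
    {
        count++;
    }
    return count;
}
```

The loop has the same shape as the one in the program; only the body differs.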
if((ch=='/')&&(flag==0))
Here, we are checking the value of ch and the value of the flag. If ch is '/' and the flag is 0, this is the first '/' of a possible comment, because a flag of 0 means we are not inside a comment and have not just read a '/'. If both conditions are true, the body is executed; otherwise it is skipped.
The body of this if statement contains the two statements explained next.
flag=1;
The value of the flag becomes 1. This value indicates that we have read the first '/' and are now looking for a second '/' or a '*'.
continue;
This statement will transfer control to the beginning of the loop. Why? Because we have read and identified the character. So there is no need to check further conditions. That is why we have written continue statement here.
else if((ch=='/')&&(flag==1))
Here we are again comparing the values of ch and flag, but this time we check whether the flag is 1. In the previous if statement, we checked whether the flag was 0. A flag of 1 means we have already read the first '/' and are now looking for the second one. The flag value tells us whether this '/' is the first or the second.
flag=2;
If we are in the body of this if statement, both conditions are true, so we change the value of the flag again, this time to 2. Why again? Because once we have read “//”, the rest of that line is a single-line comment. Whatever comes after it on the same line will not be written to file2.c. In this way we are removing the comment.
else if((ch=='*')&&(flag==1))
In this condition, we check whether the flag is 1 and ch is '*'. After reading the first '/', the next character may also be '*', because “/*” starts a multi-line comment. If both conditions are true, we enter the following body.
flag=3;
This value of the flag tells us that we are working on the multi-line comment.
if(flag==2)
We set the flag to 2 after reading “//”, because we do not want to write the comment to file2.c. But “//” starts a single-line comment, so we have to wait for that line to finish. How do we know whether the single-line comment has finished? For that, the body of this if statement contains another if statement.
if(ch=='\n')
Here we check whether we have reached the end of the line. '\n' is the new-line character, so after it we move on to the next line. This means the single-line comment has been skipped completely and there is no need to keep the flag at 2. Therefore, we change the value of the flag back to 0.
if(flag==3)
We set the flag to 3 at the start of a multi-line comment. In a single-line comment, we were waiting for the line to finish. Here we are searching for “*/” instead, because that marks the end of a multi-line comment.
if(ch=='*')
When ch is '*', we set the flag to 4. Why? Because if the next character we read is '/', we will know that the multi-line comment is over.
if(flag==4)
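If the flag is 4, we are looking for '/' to know that the multi-line comment is over. Therefore, the next if statement compares ch with '/', and if they are equal, we set the flag to 0. If ch is any other character except another '*', the '*' we read earlier was only part of the comment text, so the search for '*' has to start again.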
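The flag values 0 to 4 described above form a small state machine. The sketch below shows the same idea on a string in memory instead of a file, which makes it easier to experiment with. The state names and the helper strip_comments() are assumptions of this example, not part of the program above. Unlike the program, it keeps the new-line that ends a single-line comment, because it removes comments only and leaves white space alone.

```c
#include <string.h>

/* State names are an assumption of this sketch; they mirror flag values 0 to 4. */
enum state { CODE, SLASH, LINE_COMMENT, BLOCK_COMMENT, BLOCK_STAR };

/* Hypothetical helper: copies in to out without comments.
   out must have room for strlen(in) + 1 characters. */
void strip_comments(const char *in, char *out)
{
    enum state st = CODE;
    size_t n = strlen(in);
    size_t j = 0;

    for (size_t i = 0; i < n; i++)
    {
        char ch = in[i];
        switch (st)
        {
        case CODE:                      /* flag 0 */
            if (ch == '/') st = SLASH;
            else out[j++] = ch;
            break;
        case SLASH:                     /* flag 1 */
            if (ch == '/') st = LINE_COMMENT;
            else if (ch == '*') st = BLOCK_COMMENT;
            else
            {
                /* a lone '/' (for example a division): keep it */
                out[j++] = '/';
                out[j++] = ch;
                st = CODE;
            }
            break;
        case LINE_COMMENT:              /* flag 2 */
            if (ch == '\n') { out[j++] = ch; st = CODE; }
            break;
        case BLOCK_COMMENT:             /* flag 3 */
            if (ch == '*') st = BLOCK_STAR;
            break;
        case BLOCK_STAR:                /* flag 4 */
            if (ch == '/') st = CODE;
            else if (ch != '*') st = BLOCK_COMMENT;
            break;
        }
    }
    if (st == SLASH) out[j++] = '/';    /* input ended right after a lone '/' */
    out[j] = '\0';
}
```

Like the program, this sketch does not recognise string literals, so a “//” inside a quoted string would also be treated as the start of a comment.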
if(flag==0)
If we reach this statement with the flag at 0, we are not inside a comment, so any comment before this point has already been removed. But we still have to delete the white space between words or tokens and between two statements.
if((ch==13)||(ch==10))
Here, 13 and 10 are the ASCII values of carriage return and line feed (the new-line character) respectively. If this condition is true, we have read a line break between two statements, so we write nothing to file2.c. The continue statement transfers control back to the beginning of the loop.
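As an alternative to comparing with the numbers 13 and 10, the standard header ctype.h provides isspace(), which is true for a blank space, tab, new-line, carriage return, vertical tab and form feed. The sketch below uses it on a string in memory. The helper name strip_whitespace() is hypothetical, and this is a different technique from the one in the program above, which does not remove tabs.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: copies in to out, skipping every whitespace character.
   out must have room for strlen(in) + 1 characters. */
void strip_whitespace(const char *in, char *out)
{
    size_t n = strlen(in);
    size_t j = 0;

    for (size_t i = 0; i < n; i++)
    {
        /* isspace() expects an unsigned char value, hence the cast */
        if (!isspace((unsigned char)in[i]))
        {
            out[j++] = in[i];
        }
    }
    out[j] = '\0';
}
```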
else if((ch!=' '))
This condition checks for a blank space between words or tokens. If it is true, the value of ch is not a blank space, so we can write that character to file2.c. How do we write it?
fputc(ch,fp2);
In a file handling c program, fputc() writes a single character to the file associated with the file pointer fp2.
remove("file1.c");
Here, I am deleting file1.c, because the actual output has been written to file2.c.
rename("file2.c","file1.c");
After deleting file1.c, we use this function to rename file2.c to file1.c. Now file1.c contains the output.
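Both remove() and rename() return 0 on success and a non-zero value on failure, so the result can be checked. The sketch below wraps the two calls in a helper; the name replace_file() is hypothetical and not part of the program above.

```c
#include <stdio.h>

/* Hypothetical helper: replaces target with temp.
   Returns 0 on success and -1 if the rename fails. */
int replace_file(const char *temp, const char *target)
{
    /* remove() fails if target does not exist; that is not fatal here,
       so its result is ignored. */
    remove(target);

    if (rename(temp, target) != 0)
    {
        perror("rename");
        return -1;
    }
    return 0;
}
```

For the program above, the call would be replace_file("file2.c", "file1.c").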
So this is the program written in C to remove white space and comments from a C program stored in a file. I hope you have understood my explanation.
Thanks for reading.