Sunday, July 25, 2010

ADITI TECHNOLOGIES :exam papers



ADITI TECHNOLOGIES exam papers




SECTION 1-APTITUDE

Directions for question 1-10: Each question comprises four scattered segments of a sentence. Identify from among the four choices the sequences that correctly assembles the segments and completes the sentence.


1. A. Anniversaries can be dicey things.
B. As long as the dead are being commemorated on a particular day, it is fine.
C. The opposition could keep its gun powder dry and ready as well.
D. But when it comes to a government celebrating a year or two in office, there can be trouble.
(a) DACB
(b) CADB
(c) ABDC
(d) BCDA
2. A. It is less concerned with telling a tale.
B. As with so much Huxley's later fiction, one is not sure whether or not to call this book a true novel.
C. It is also weak on characterization but strong on talk.
D. Than with presenting an attitude of life.
(a) CBDA
(b) BADC
(c) ABDC
(d) DCBA
3. A. While the above is true for private sector companies, it is not so in the public limited companies.
B. But with the removal of control over premia, the premia at which issues are marked has gone up quite sharply.
C. So the cost of capital at even a lower debt equity ratio comes out lower.
D. Traditionally, the cost of equity is higher than the cost of debt.
(a) DBCA
(b) BADC
(c) ACDB
(d) CDAB
4. A. Compiling and debugging.
B. Testing.
C. Writing the code.
D. Thinking of the algorithm.
E. Understanding the problem
(a) DBECA
(b) CDABE
(c) ACDBE
(d) EDCAB
5. A. Such alliances are shaky from the start.
B. In this manner parties which are not able to get a mandate from the electorate are able to come to power.
C. We have seen the unique spectacle of political forming alliance just to form a
government.
D. Indian democracy continues to amaze.
(a) ADBC
(b) BADC
(c) DCBA
(d) BDCA
6. A. They are the three faces of dysphoria - bad feeling.
B. Anxiety, Depression and Anger.
C. When the three combine and get out of control, we get what is called mental illness.
D. All of us experience three emotions almost daily.
(a) ACDB
(b) CBDA
(c) CADB
(d) DBAC
7. A. This is the key to tap the creative power inside us.
B. It is difficult to control our thinking and feelings.
C . That is, unless we work at it conciously and persistently.
D. We are influenced and limited by attitudes, prejudices by other individuals and by external conditions to such an extent that few can control mental and emotional processes.
(a) ACDB
(b) BDCA
(c) CADB
(d) DBAC
8. A. Two vital facts must be understood.
B. The subconscious mind has the power to create.
C. The second is that it obeys the orders given to it by the conscious mind.
D. Its function is to bring to full expression whatever is desired by the conscious mind.
(a)ABCD
(b) CBDA
(c) ADBC
(d) DBAC
9. A. But what is not often understood is that this flash is outcome of long periods of incubation.
B. The layman thinks that it is a spell of divine flash which illuminates the dark and the hidden.
C. True, it does.
D. Inspiration is much misunderstood term.
(a) DBAC
(b) ACBD
(c) DBCA
(d) CADB
10. A. As a result, the world has undergone a transformation.
B. Science and civilization have made rapid strides especially in recent times.
C. This constitutes the basic factor in the use of productive resources.
D. But behind all the progress of mankind is the human factor which is invaluable
and irreplaceable.
(a) ABDC
(b) BADC
(c) DCAB
(d) CBDA
Directions for questions 11-20: Each problem contains a question and two statements giving certain data. You have to select the correct answer from (a) to (d) depending on the sufficiency of the data given in the statements to answer the question. Mark your answer as
(a) If statement (I) alone is sufficient.
(b) If statement (II) alone is sufficient.
(c) If both (I) and (II) together are sufficient but neither of statements alone is sufficient.
(d) Either of the statements (I) and (II) is sufficient.
(e) If statements (I) and (II) together are not sufficient.
11. What is the distance from City A to City C in kms?
(I) City A is 90 kms from City B.
(II) City B is 30 kms from City C.
12. Is z less than w? z and w are real numbers.
(I) z2 = 25
(II) w = 9
13. The value of an estate in January 1905 started gradually declining in such a way that at the end of each year it was worth only x times its value at the beginning of the year.
What was its worth in end December 1910 ?
(I) It was worth Rs.10,109 in the end of December 1906.
(II)It was worth Rs.12,345 in the beginning of January 1905.
14. In an election, 3 candidates A,B and C were representing for a membership of parliament. How many votes did each receive?
(I) A received 1006 votes more than B and 1213 more votes than C.
(II) Total votes cast were 15,414.
15. John studies Chinese in a school. Which school does he attend?
(I) All students in Jefferson High school take French.
(II) Maysville High School offers only Chinese.
16. How many girls passed the entrance exam this year?
(I) Last year 560 girls passed
(II) This year there was a 10% decrease over last year in the number of failures.
17. What is Raju's age?
(I) Raju, Vimala and Kishore are all of the same age.
(II) Total age of Vimala, Kishore and Abishek is 32 and Abishek is as old as Viala and Kishore together.
18. Is Sreedhar eligible for an entry pass to the company premisers?
(I) The company does not allow strangers to enter the company.
(II)All employees are elgible to get a pass.
19. Among five friends who is the tallest?
(I) D is taller than A and C.
(II)B is shorter than E but taller than D.
20. Can a democratic system operate without effective opposition?
(I) The opposition is indispensable.
(II) A good statesman always learns more from his opponents than from
his fervent supporters.
Directions for question 1-2 : Answer the questions based on the passage above them
A temple has 3 gateways, each of them is leading you into the temple, and at the end of each gateway there is an idol and as a devotee passes through the gateway with some flowers the number of flowers double. Ram enters the 1st gateway with some flowers and he puts same number of flowers at each idol and the end he is left with none.
21. How many flowers did Ram start with?
(a) 4
(b) 5
(c) 3
(d) 7
22. How many flowers does he put at each idol?
(a) 10
(b) 8
(c) 6
(d) 5
Directions for question 3-5 : Answer the questions based on the passage above them
Liz, Jenni, Jolie and Rick have an English final on Friday and they all would like to study together at least once before the test. Liz can study only on Monday, Tuesday and Wednesday nights and Thursday afternoon and night.Jenni can study only on Monday, Wednesday and Thursday nights and Tuesday afternoon and night. Jolie can study only on Wednesday and Thursday nights, Tuesday afternoon and Monday afternoon and night. Rick can study the afternoons and nights of Tuesday, Wednesday and Thursday, and on Monday afternoon.
23. If the group is to study twice, then the days could be
(a) Monday and Wednesday
(b) Tuesday and Thursday
(c) Wednesday and Thursday
(d) Monday and Friday
(e) Tuesday and Wednesday
24. If three of them tried to study together when all four couldn't
(a) this would be possible twice
(b) it would have to be on Wednesday night
(c) Rick could not attend the three person groups
(d) This could be accomplised on Monday and Tuesday only
(e) This would not be possible
25. If Liz decided to study every night,
(a) she would never be able to study with Rick
(b) she would never be able to study with Jolie
(c) she would have at least two study partners each night
(d) she would have to study alone on Monday night
(e) she would study with only Jenni on Thursday night
SECTION 2-COMPUTER AWARENESS (15 questions)
NOTE: The questions are of multiple choice format in the paper
1. What is the number of functions of a three variable boolean function?
2. Which is the most commonly used replacement algorithm?
Ans. LRU
3. Which memory management technique involves dividing the memory into fixed sized
blocks?
Ans. Paging
4. What is video resolution ?
5. The processing speed of a microprocessor depends on _____?
Ans. data bus width
SECTION 3: C TEST
NOTE: The questions are of multiple choice format in the paper
1. What is the output of the program given below
#include
main()
{
char i=0;
for(;i>=0;i++) ;
printf("%d\n",i);
}
2. What is the output of the following program
#include
main()
{
int i=0;
fork();
printf("%d",i++);
fork();
printf("%d",i++);
fork();
wait();
}
3. What is the memory allocated by the following definition ?
int (*x)[10];
4. What is the memory allocated by the following definition ?
int (*x)();
5. In the following program segment
#include
main()
{
int a=2;
int b=9;
int c=1;
while(b)
{
if(odd(b))
c=c*a;
a=a*a;
b=b/2;
}
printf("%d\n",c);
}
How many times is c=c*a calculated?
6. In the program segment in question 5 what is the value of a at the end of the while
loop?
7. What is the output for the program given below
typedef enum grade{GOOD,BAD,WORST,}BAD;
main()
{
BAD g1;
g1=1;
printf("%d",g1);
}
8. Give the output for the following program.
#define STYLE1 char
main()
{
typedef char STYLE2;
STYLE1 x;
STYLE2 y;
clrscr();
x=255;
y=255;
printf("%d %d\n",x,y);
}
9. Give the output for the following program segment.
#ifdef TRUE
int I=0;
#endif
main()
{
int j=0;
printf("%d %d\n",i,j);
}
10. In the following program
#include
main()
{
char *pDestn,*pSource="I Love You Daddy";
pDestn=malloc(strlen(pSource));
strcpy(pDestn,pSource);
printf("%s",pDestn);
free(pDestn);
}
(a)Free() fails
(b)Strcpy() fails
(c)prints I love You Daddy
(d)error
11. What is the output for the following program
#include
main()
{
char a[5][5],flag;
a[0][0]='A';
flag=((a==*a)&&(*a==a[0]));
printf("%d\n",flag);
}



EXAMPAPERS123.BLOGSPOT.COM

Sample Questions for Test in Various Disciplines Recruitment of Interns



Sample Questions for Test in Various Disciplines Recruitment of Internls



Subject: Information Technology

(1) All Questions are compulsory
(2) Maximum time allowed for the test is 60 minutes
(3) Each CORRECT answer will be awarded 1 mark
(4) 25 per cent marks will be deducted for each wrong answer

Sample Questions for Test Please tick (P) the correct one:

Q 1. You are designing the properties, methods, and events for components in a new VB application. There is a requirement that a customer have an ID number. How should you design the ID number?

a) as an event b) as a method c) as a property d) as a component

Q 2. When initializing the random-number generator using the Randomize statement, what happens if a number isn't supplied as argument?

a) Visual Basic generates an error b) the random-number generator is not initialized
c) the random-number generator is initialized using the current value of the system timer
d) the default value of 10 is used in place of the number

Q 3. Which of these statements is incorrect:

a) ActiveX DLLs are in-process servers b) ActiveX EXEs are out-of-process servers
c) ActiveX DLLs need to use marshalling
d) ActiveX EXEs run as seperate processes




Q. 4. Examine the following code:

public class Quiz2_3 {
public static void main( String[] args) {
int x = 010;
int y = 0x10;
int z = 10;
System.out.println(x+y+z);
}
}
Which one of the following correctly describes the behaviour when this program is compiled?

a) Compilation is successful and the output is 30.
b) Compilation is successful and the output is 36.
c) Compilation is successful and the output is 34.
d) The compiler rejects the expression y=0x10 because letters of the alphabet are not allowed in numeric constants.
e) The compiler rejects the expression y = 0x10 because a hexadecimal value cannot be assigned to an int variable.

Q5. On an operating system that treats filenames as case-sensitive, which of the following declarations are valid for a class for which the source is stored in the file Fred.java ? Select as many as apply :

a) package myPackage;
public class Fred{
Static void main( String[] args) {
/* body of Fred.main */
}
}
class Joe {
// body of class Joe
b) package myPackage;
public class FRED {
// body of class FRED
}
c) public class Fred {
int month = 1;
int day = 21;
}
d) Static void Fred.main(String[] args) {
//body of Fred.main }

6. Which of the following is not an advantage of table partioning?
a) Partition improves the performance of queries
b) It reduces amount of sorting required c) Both a & b d)None of these

Q.7. What Oracle package can be used to set a role for a user dynamically within an anonymous PL/SQL block?
a) DBMS_ROLE b) DBMS_SESSION c) DBMS_USER
d) DBMS_UTILITY e) Oracle does not provide a package for this.

Q 8. The Program Global Area (PGA) contains ALL of the following except:
a) Sort Area b) System Change Number (SCN) c) Session Information d) Cursor state

Q. 9. What PL/SQL statements ensure that all database changes brought about by SQL operations are either made permanent or undone at the same time (choose all that apply):
a) CANCEL b) CONFIRM c) None d) COMMIT e) ROLLBACK

Q10. Which of the following correctly describe the restrictions on the use of LONG columns (choose all that apply):
a). You cannot create an object type with a LONG attribute.
b). A table can contain only one LONG column.
c). LONG columns cannot appear in WHERE clauses.
d). LONG columns cannot be indexed. e). All the Above

Q.11. Which one of the following uses the correct syntax to include an applet called ButtonText in an HTML page?
a).


b).


c).


d).




Subject: Logical Reasoning

Logical Consistency

Each question has a main statement followed by four statements labelled A, B, C and D. Choose the ordered pair of statements where the first statement implies the second, and the two statements are logically consistent with the main statement.

1. All irresponsible parents cause their children to become drug addicts.
A. Mohan is a drug addict
B. Mohan is not a drug addict.
C. Mohan’s parent are responsible parents.
D. Mohan’s parent are irresponsible

a. BD b. AD c. CB d. DA

2. I date only on Sundays.
A. I dated.
B. I did not date.
C. It’s Sunday
D. It’s not Sunday

a. AC b. AD c. CD d. AB

Deductive Logic
Each question has a set of four statements. Each statement has three segments. Choose the alternative where the third segment in the statement can be logically deduced using both the preceding two, but not just from one of them.
1. A. Pegasus is a horse; Some horses have wings; Pegasus has wings.
B. Tina has weak legs; Polio at births can weaken legs; Tina has Polio.
C. All good people are not priests; Some priests may not be good; Ratan is a good priest.
D. Sharma sells medicine; All who sells medicine are rich; Sharma is not poor.

a. B & D b. D only c. B, B & C d. A & D
2. A. Geometry is derived from algebra; Algebra is derived from Trigonometry; Trigonometry is derived from Geometry.
B. Calculus is derived from Geometry; Geometry is derived from Trigonometry; All problems of Calculus can be solved by Trigonometry.
C. Maths is a science; Some science is definitely art; Maths is an art.
D. Commerce is an art; Most art is scientific enquiry; Commerce can be scientific enquiry.

a. A & D b. C & D c. D only d. A only

Data Interpretation

Directions for Q1-Q5: The following table refers to the production, import and consumption of fertilizers during the period 1988-93. Study it carefully and answer the questions that follow.

Year (in lakh tonnes) Cost of imports (Rs. Crore)
1988/89 90 20 110 645
1989/90 85 31 116 1540
1990/91 90 35 125 1336
1991/92 98 28 126 1935
1992/93 97 25 122 2220

Assume : There is no inventory at the start or the end of any year.

1. What was the percentage shortfall of domestic production in 1991-92?

a. 11.11 b. 22.22 c.28.57 d. 77.77

2. In which year did the imports register the largest percentage growth?

a. 1988-89 b. 1989-90 c. 1990-91 d. 1991-92

3. If the ratio of cost of domestic production to cost of import in 1992-93 was 2:3, what was the total cost of domestic production in 1992-93? (approximately)

a. Rs.8600 Cr. B. Rs. 1290 Cr. c. 5740 Cr. d. None of these

Numeric Ability

1. Three consecutive whole numbers are such that the square of the middle number is greater than the product of the other two by one. Find the middle number.
a. 6 b. 18 c. 12 d. All of these

2. An alloy contains tin, Aluminium and copper in the ratio 1/2 : 1/3 : 1/5. What is the approximate percentage of Aluminium in the alloy?

a. 20% b. 30% c. 32% d. 36%
3. A housewife has 1 litre of solution that contains milk and water in the ratio of 3:1. She adds 250 ml of 3:2 solution of milk and water to it and then uses 250 ml of the combined mixture to make curd. How much of pure milk is she left with?

a. 1000 ml b. 912.5 ml c. 750 ml d. 720 ml

Subject: MATHEMATICAL SCIENCES
Q1. Evaluate

(a) 0
(b) 
(c) 1
(d) does not exist

Q2. Find the least square approximation by a function of the form Ax+B to the function sin x on (0, ). Then A and B respectively are
(a) A=0, B=1 (b) A=1, B= (c) A=0, B=2/ (d) A=0, B=/2
Q3. Let f(x)= when 0 < x   and f(x)= when –  x  0. The Fourier series for this function is
(a) 1+cos x + cos 2x + cos 3x + ……………
(b) 1+sin x + sin 3x + ………………
(c) 1+sin x + +……….
(d) sin x + +……….
Q4. Suppose that F1=1, F2=1, F3=2, F4=3, F5=5 and in general Fn=Fn–1+Fn–2 for n  3. Then F1 + F2 + …. + Fn equals

(a) Fn+1–1 (b) Fn+2 (c) Fn+2–1 (d) Fn+1+Fn+2
Q5. Evaluate

(a) 2n–2
(b) n(n+1)2n
(c) n 22n–2
(d) none of these

Q6. A miner is trapped in a mine containing three doors. The first door leads to a tunnel that will take him to safety after 3 hours of travel. The second door leads to a tunnel that will return him to the mine after 5 hours of travel. The third door leads to a tunnel that will return him to the mine after 7 hours. If we assume that the miner is at all times equally likely to choose any one of the doors, the expected length of time until he reaches, safety is
(a) 15 hours
(b) 110 hours
(c) 30 hours
(d) 5 hours

Q7. The number of solutions of the equation
x1+x2+x3+x4=11,
where x1  7, 1  x2, x3  3 and 0  x4  3, is
(a) 10
(b) 98
(c) 112
(d) 210

Q8. The joint probability density function of X and Y is given by

f(x, y)=
P(X>1, Y<1) is

(a) e–1(1–e–2)
(b) e–1
(c) (1–e–2)
(d)

Q9. Mr and Mrs Thompson hosted a party for n married couples. Each person shook hands with everyone else, exclusing his or her spouse. No person shook hands with himself or herself. Let h(n) denote the number of handshakes made. The recurrence relation satisfied by h(n) is
(a) h(n)=h(n–1), h(0)=0 (b) h(n)=h(n–1)+h(n–2), h(0)=0
(c) h(n)=h(n–1)+ , h(0)=0 (d) h(n)=h(n–1)+4n, h(0)=0
Q10. Find the dual of the following problem:

Maximize Z=2x1+4x2+3x3
Subject to constraints:
3x1+4x2+2x360
2x1+x2+2x340
x1+3x2+2x380
x1, x2, x30
(a) Maximize Z=60y1+40y2+80y3
Subject to constraints:
3y1+2y2+y34

(b) Maximize Z=60y1+40y2+80y3
Subject to constraints:
3y1+4y2+2y360
2y1+y2+2y3  40
y1, y2, y30

(c) Minimize Z=60y1+40y2+80y3
Subject to constraints:
3y1+2y2+y32
4y1+y2+3y34
2y1+2y2+2y33

(d) None of these


Subject: Library & Information Science

1. Which process allows misplaced books to be restored on the shelves in a library.
a) Shelf study
b) Stock verification
c) Authority list
d) Shelf rectification

2. Which service refers the users to sources of information rather than providing information.
Reprography
a) Selective Dissemination of Information
b) Referral
c) Documentation

3. Which one is the correct spelling.
a) Encyclopedia Britainica
b) Encyclopedia Britannica
c) Encyclopedia Britainica
d) Encyclopaedia Britanica

4. Which document reports current activity of an organization
a) House journal
b) Handbook
c) Patent
d) Directory

5. MARC was the project of which organization.

a) Library of Congress, USA
b) National Library, Calcutta
c) British Library, UK
d) CNRS, France

6. Where is Indian National Science Academy (INSA) located
a) Bangalore
b) New Delhi
c) Calcutta
d) Pune

7. India’s information input centre for the International Nuclear Information
System is.
a) BARC
b) NISSAT
c) NISCAIR
d) Dept. of Atomic Energy

8. Which service supplies full text copy of a reference to the user.
a) Bibliography
b) Reprography
c) Document Delivery
d) Repackaging

9. Which of the following is a library software package
a) DELNET
b) NIC
c) UNESCO
d) Granthalaya

10. Translation service removes which of the following barrier.
a) Cultural
b) Political
c) Economic
d) Language








Tech Mahindra Exam Papers



Tech Mahindra Satyam Exam Papers









1. Replace the letters with numbers and solve the equation 4(ABCDE) = EDCBA.
answer is
(A) 87978 (B) 98765 (C) 56789 (D) 87912 (E) None of these

2. A certain number of workmen can do a piece of work in 25 days, in what time will another set of an equal number of men do a piece of work as great supposing that 2 men
of the first set can do as much work in a hour as 3 men in the second set can do in an hour

(A) 60 days (B)75 days (C) 90 days (D) 105 days (E) None of these

3. You are given two different length strings that have the characteristic that they both take exactly one hour to burn. However, neither string burns at a constant rate. Some sections of the strings burn very fast, other sections burn very slow. All you have to work with is a matches to calculate when exactly 45 minutes has elapsed.
(A) 1. Light both ends of the first string
2. Light end of the second string at the same time.
3. When the first string is finished burning,
4. Light the unlit end of the second string.
5. When the second string is finished burning exactly 45 minutes will have passed.
(B) 1. Light both ends of the second string
2. Light beginning of the first string at the same time.
3. when the first string is finished burning,
4. Light the unlit end of the second string.
5. When second string is finished burning exactly 45 minutes will have passed.
(C)
1. Light end of the first string
2. Light end of the second string at the same time
3. When the first string is finished burning,
4. Light the unlit end of the second string.
5. When the second string is finished burning exactly 45 minutes will have passed.
(D) None of these

4. Dot likes pots and pans but not cooks, She likes straw but not hay; she likes sagas but not poems. Does she like a star or a planet?
(A) Star (B) planet (C) More date required

5. In a certain code language, ‘851’ means ‘good sweet fruit’. ‘783’ means ‘good red rose’ and ‘341’ means ‘rose and fruit’. Which digit stands for ‘sweet’ in that language ?
(A) 8 (B) 5 (C) 1 (D) 3 (E) None of these

6. (i) Apron – Cap – Suit
(ii) Trot – Step – hop
(iii) Early – Late –Ago
(iv) Alone – Apiece – Another
(v) Rope – String – Ribbon
For each triplet (group if 3 words) above, have a common bond. Identify the triplets in which the words are not bonded.

(A) iii (B) ii (C) V (D) iv (E) None of these

7. A man hires a taxi to meet him at the railroad station at 3 p. m. to take him to an appointment. He catches an earlier train and arrives at 2 p.m. He decides to start walking, and is picked up en route by the taxi. He arrives twenty minutes early for his appointment. How long did he wald ?

(A) 45 minutes (B) 50 minutes (C) 24 minutes (D) 36 minutes (E) none of these

8.
Z4 X3 V9
A6 C2 ?
T5 R4 P15

(A) E10 (B) S10 (C) E12 (D) S12 (E) None of these

9. There are 10 statements written on a piece of paper:

1. At least one of statements 9 and 10 is ture.
2. This either is the first true or the first false statement.
3. There are three consecutive statements, which are false.
4. The difference between the numbers of the last true and the first true statement divides the number, that is to be found.
5. The sum of the numbers of the true statement divides the number, that is to be found.
6. This is not the last true statement.
7. The number of each true statement divides the number, that is to be found.
8. The number that is to be found is the percentage of true statements.
9. The number of divisors of the number, that is to be found,(apart from 1 and itself) is greater than the sum of the numbers of the true statements.
10. There are no three consecutive true statements.
What is the number?

(A) 420 (B) 520 (C) 415 (D) 515

10. “Lets hava some!” The kids around Betty as she checked the candies. “ Okey,
but I’ll have a few myself,” she told them. “It’s by age. A third of them for Bill, a
quarter for Eve, a fifth for Linda, and a sixth for Bruce. That leaves just six for me.”
How many were there in all?

(A) 200 (B) 180 (C) 120 (D) 90 (E) None of these


11. The question given below has problem and two statements numbered I and II giving
certain information. You have to decide if the information given in the statements is
sufficient for answering the problem. Indicate your answer as

(A) if the data in statement I alone is sufficient;
(B) if the data in statement II alone is sufficient;
(C) If the either in I and II alone is sufficient;
(D) If the data given in I and II are not sufficient;
(E) If the data given in I and II together are needed.

12. “At a party

(1) There were 9 men and children
(2) There were 2 more women than children
(3) The number of different man-woman couples possible was 24.
Of the three groups – men, women, and children
(4) There were 4 of one group
(5) There were 6 of one group
(6) There were 8 of one group”

Exactly one of the speaker’s statement is false. Which of (1) through (6) is false
(A) 3 (B) 5 (C) 4 (D)2 (E) none of these

13. Your pockets are tearing from the weight of all the coins in them. After you unload
them onto the kitchen table, you discover something surprising. You have exactly the
same number of pennies, nickels, dimes, and quarters, totaling $6.15. How many of
each coin do you have?

(A) 10 of each (B) 22 of each (C) 15 of each (D) 20 of each (E) None of these


14. I have ten boxes which I want to pack into crates. Each crate is capable of carrying a maximum of 25 kg. Unfortunately I only have three crates, and the total weight of the boxes is 75 kg:




How can I pack the boxes into crates so that each one has exactly 25 kg?

Crate 1 Crate 2 Crate 3
(A) 15, 10 11,13,1 9,8,4,2,2
(B) 15, 10 11,8,4,2 13,9,2,1
(C) 15,9,1 11,8,4,2 13,10,2
(D) all of these
(E) None of these

15. “What day do you go back to school, Henry?” asked his grandmother one day.
“Well,” Henry replied, “Nine days ago, the day before yesterday was three weeks before the second day of term.” If Henry had this conversation on a Sunday, what day of the week did he start school?

(A) Monday (B) Wednesday (

NOVELL COMPANY EXAM PAPERS





NOVEL COMPANY EXAMPAPERS




1). A beggr collects cigarette stubs and makes one ful cigarette
with every 7 stubs. Once he gets 49 stubs . How many cigarettes
can he smoke totally.
Ans. 8

2). A soldiar looses his way in a thick jungle at random walks
from his camp but mathematically in an interestingg fashion.
First he walks one mile east then half mile to north. Then 1/4
mile to west, then 1/8 mile to south and so on making a loop.
Finally hoe far he is from his camp and in which direction.
ans: in north and south directions
1/2 - 1/8 + 1/32 - 1/128 + 1/512 - and so on
= 1/2/((1-(-1/4))
similarly in east and west directions
1- 1/4 + 1/16 - 1/64 + 1/256 - and so on
= 1/(( 1- ( - 1/4))
add both the answers

3). hoe 1000000000 can be written as a product of two factors
neither of them containing zeros
Ans 2 power 9 x 5 ppower 9 ( check the answer )

4). Conversation between two mathematcians:
first : I have three childern. Thew pproduct of their ages is 36
. If you sum their ages . it is exactly same as my neighbour's
door number on my left. The sacond mathematiciaan verfies the
door number and says that the not sufficient . Then the first
says " o.k one more clue is that my youngest is the youngest"
Immmediately the second mathematician answers . Can you aanswer
the questoion asked by the first mathematician?
What are the childeren ages? ans 2 and 3 and 6

5). Light glows for every 13 seconds . How many times did it
between 1:57:58 and 3:20:47 am
ans : 383 + 1 = 384

6). 500 men are arranged in an array of 10 rows and 50 columns .
ALL tallest among each row aare asked to fall out . And the
shortest among THEM is A. Similarly after resuming that to their
originaal podsitions that the shorteest among each column are
asked to fall out. And the longest among them is B . Now who is
taller among A and B ?
ans A

7). A person spending out 1/3 for cloths , 1/5 of the remsaining
for food and 1/4 of the remaining for travelles is left with
Rs 100/- . How he had in the begining ?
ans RS 250/-

8). there are six boxes containing 5 , 7 , 14 , 16 , 18 , 29
balls of either red or blue in colour. Some boxes contain only
red balls and others contain only blue . One sales man sold one
box out of them and then he says " I have the same number of red
balls left out as that of blue ". Which box is the one he solds
out ?
Ans : total no of balls = 89 and (89-29 /2 = 60/2 = 30
and also 14 + 16 = 5 + 7 + 18 = 30

9). A chain is broken into three pieces of equal lenths
conttaining 3 links each. It is taken to a backsmith to join into
a single continuous one . How many links are to tobe opened to
make it ?
Ans : 2.

10). Grass in lawn grows equally thickand in a uniform rate. It
takes 24 days for 70 cows and 60 for 30 cows . How many cows can
eat away the same in 96 days.?
Ans : 18 or 19

11). There is a certain four digit number whose fourth digit is
twise the first digit.
Third digit is three more than second digit.
Sum of the first and fourth digits twise the third number.
What was that number ?
Ans : 2034 and 4368

If you qualify in the first part then you have to appear for
the second i.e the following part.
Part 2.

1. From a vessel on the first day, 1/3rd of the liquid
evaporates. On the second day 3/4th of the remaining liquid
evaporates. what fraction of the volume is present at the end of
the II day.

2. an orange galss has orange juice. and white glass has apple
juice. Bothe equal volume 50ml of the orange juice is taken and
poured into the apple juice. 50ml from the white glass is poured
into the orange glass. Of the two quantities, the amount of
apple juice in the orange glass and the amount of orange juice in
the white glass, which one is greater and by how much?

3. there is a 4 inch cube painted on all sides. this is cut
into no of 1 inch cubes. what is the no of cubes which have no
pointed sides.

4. sam and mala have a conversation. sam says i am vertainly not
over 40. mala says i am 38 and you are atleast 5 years older
than me. Now sam says you are atleast 39. all the sattements by
the two are false. How hold are they realy.

5. ram singh goes to his office in the city, every day from his
suburbun house. his driver mangaram drops him at the railway
station in the morning and picks him up in the evening. Every
evening ram singh reaches the station at 5 o'clock. mangaram
also reaches at the same time. one day ramsingh started early
from his office and came to the station at 4 o'clock. not
wanting to wait for the car he starts walking home. Mangaram
starts at normal time, picks him up on the way and takes him back
house, half an hour early. how much time did ram singh walk.

6. in a railway station, there are tow trains going. One in the
harbour line and one in the main line, each having a frequency of
10 minutes. the main line service starts at 5 o'clock. the
harbour line starts at 5.02a.m. a man goes to the station every
day to catch the first train. what is the probability of man
catchinhg the first train

7. some people went for vaction. unfortunately it rained for 13
days when they were there. but whenever it rained in the
morning, they had clean afternood and vice versa. In all they
enjoyed 11 morning and 12 afternoons. how many days did they
stay there totally

8. exalator problem repeat

9. a survey was taken among 100 people to firn their preference
of watching t.v. programmes. there are 3 channels. given no of
people who watch
at least channel 1
" " 2
" " 3
no channels at all
atleast channels 1and 3
" " 1 and 2
" " 2 and 3
find the no of people who watched all three.

10. albert and fernandes they have two leg swimming race. both
start from opposite and of the pool. On the first leg, the boys
pass each other at 18 mt from the deep end of the pool. during
the II leg they pass at 10 mt from the shallow end of the pool.
Both go at const speed. but one of them is faster. each boy
rests for 4 sec to see at the end of the i leg. what is the
length of the pool.

11. T H I S Each alphabet stands for one
I S digit, what is the maximum value T
-------------- can take
X F X X
X X U X
--------------
X X N X X
--------------

THE BASIC LANGUAGE




EXAMPAPERS123.BLOGSPOT.COM







CHAPTER-1

THE BASIC LANGUAGE
 object oriented programming.
 Encapsulation.
 Polymorphism.
 CPP Preprocessor Directives.
 Comments.
 CPP Data types
 New and Delete expressions
 Type convertions.

Object oriented programming: It is nothing but doing the programs with the help of objects. So first of all we have to know what is an object? How it is implemented in C++ programming? All these details are given below.
CLASS: A class is an expanded concept of a data structure: instead of holding only data, it can hold both data and functions.
OBJECT: An object is an instantiation of a class. In terms of variables, a class would be the type, and an object would be the variable.
By using these objects we can access the class members and member functions.

Classes are generally declared using the keyword class, with the following format:

class class_name { EG: Class student
access_specifier_1: {
member 1; charname;
access_specifier_2: int marks:
member2; float average;
… }s;
}object_names;
In the above eg: “S” is the object so we can access entire data of the student class by using this object. We discuss about the access specifiers in next concepts in detail.
Encapsulation: Wrapping up of a data in to single logical unit (i.e class) is called encapsulation. So writing class is known as encapsulation.
Polymorphism: Simply it one thing different actions let me explain consider one person (assume that person is one thing)he will exibit different actions depending on situations like his son called him daddy, and his father called him son, and his wife called him husband. I will explain how this concept is implemented in our C++ concepts in the upcoming chapters.

Inheritance: A key feature of C++ classes is inheritance. Inheritance allows creating classes which are derived from other classes, so that they automatically include some of its “parent’s” members, plus its own. We will the implementation of this concept in detail in the upcoming chapters.

Comments: comments are parts of the source code disregarded by the compiler. They simply do nothing. Their purpose is only to allow the programmer to insert notes or descriptions embedded within the source code.

C++ supports two ways to insert comments:
//line comment
/*block comment*/
The first of them, known as line comment, discards everything from where the pair of slash signs (//) is found up to the end of that same line. The second one, known as block comment, discards everything between the /* characters and the first appearance of the */ characters, with the possibility of including more than one line.

We are going to add comments to our second program:

If you include comments within the source code of your programs with out using the comment characters combinations //,/* or */, the compiler will take them as if they were C++ expressions, most likely causing one or several error messages when you compile it.
Preprocessor directives:
Preprocessor directives are lines included in the code of our programs that are not program statements but directives for the processor. These lines are always preceded by a pound sign (#). The processor executed before the actual compilation of code is generated by the statements.

These processor directives extend only across a single line of code. As soon as a new line character is found, the processor directive is considered to end. No semicolon (;) is expected at the end of a preprocessor directive. The only way a preprocessor directive can extend through more than one line is by preceding the new line character at the end of the line by a backslash (/)
Macro definitions (#define, #undef)
To define preprocessor macros we can use #define. Its formate is:
#define identifier replacement.
When the preprossor encounters this direccive , it replaces any occurrence of identifier in the rest of the code by replacement can be an expresson , a statement , a block or simply anything. The processor does not understand C++ , it simply replaces any occurrence of identifier by replacement.
#define TABLE_SIZE 100
int table 1 [TABLE_SIZE];
int table 2 [TABLE_SIZE];
After the preprocessor has replased TABLE_SIZE, the code becomes equivalent to:
Int table 1[100];
Int table 2 [100];
This use of #define as constant definer is already known by us from previous tutorials, but #define can work also with parameters to define function macros:
#define getmax(a,b) a>b?a:b
This would replace any occurrence of getmax followed by two arguments by the replacement expression, but also replasing each argument by its identifier, exactly as you would expect if it was a function:
//function macro
#include
Using namespace std;

#define getmax(a,b) ((a)>(b)?(a)b))

Int main()
{
Int x=5,y;
Y=getmax (x,2);
Cout << y << endl;
Cout<< getmax(7,x)<Return 0;
} 5
6

Defined macros are not affected by block structure. A macro lasts until it is undefined with the #undef preprosseor directive:
#define table_size 100
Int table 1 [TABLE_SIZE];
#undef TABLE_SIZE
#define TABLE_SIZE 200
Int table 2[TABLE_SIZE];

This would generate the same code as:

Int table 1 [100];
Int table 2 [200];

Function macro definitions accept two special operators (# and ##) in the replacement sequence:
If the operator # is used before a parameter is used in the replacement sequence, that parameter is replaced by a string literal (as if it were enclosed between double quotes)
#define str (x) #x
Cout << str(test);

This would be translated into:
Cout << “test”;

The operator ## concatenates two arguments leaving no blank spaces between them:

#define glue(a,b) a## b
Glue(c,out) << ”test’’;
This would also be translated into:
Cout <<”test”;
Because preprocessor replacements happen before any C++ syntax check, macro definitions can be a tricky feature, but be careful: code that relies heavily on complicated macros may result obscure to other programmers, since the syntax they expect in C++.
Conditional inclusions (#ifdef, #ifdef, #if, #endif, #else and #elif)
These directives allow to include or discard part of the code of a program if a certain condition is met.
#ifdef allowes a section of a program to be compiled only if the macro that is specified as the parameter has been defined, no matter which its value is. For example:
#ifdef TABLE_SIZE
int table [TABLE_SIZE];
#endif
In this case , the line of code int table [TABLE_SIZE]; is only compiled if TABLE_SIZE was previously defined with #define, independently of its value. If it was not defined, that line will not be included in the program compilation.
#ifndef serves for the exact opposite: the code between #indef and #endif directives is only compiled if the specified identifier has not been previously defined. For example:
#ifndef TABLE_SIZE
#define TABLE_SIZE 100
#endif
Int table [TABLE_SIZE];


/* My second program in C++
With more comments*/
#include
Using namespace std;
Int main ()
{
Cout <<”Hello World!”; // prints Hello World!
Count <<”I’m a C++program”; //prints I’m a C++ program
return 0;
} Hellow World! I’m a C++ program

Friday, May 28, 2010

DATA MINING AND DATA WARE HOUSE

DATA MINING AND DATA WARE HOUSE
Many software projects are accumulated by a great deal of data, so we really need information about the effective maintenance and reteving of data from the database. The newest, hottest technology to address these concerns is data mining and data warehousing.
Data Mining is the process of automated extraction of predictive information from large databases. It predicts future trends and finds behavior that the experts may miss as it lies beyond their expectations. Data Mining is part of a larger process called knowledge discovery, specifically, the step in which advanced statistical analysis and modeling techniques are applied to the data to find useful patterns and relationships.
Data warehousing takes a relatively simple idea and incorporates it into the technological underpinnings of a company. The idea is that a unified view of all data that a company collects will help improve operations. If hiring data can be combined with sales data, the idea is that it might be possible to discover and exploit patterns in the combined entity.
This paper will present an overview of the different process and advanced techniques involving in data mining and data warehousing.




1. Introduction to Data Mining:
Data mining can be defined as "a decision support process in which we search for patterns of information in data." This search may be done just by the user, i.e. just by performing queries, in which case it is quite hard and in most of the cases not comprehensive enough to reveal intricate patterns. Data mining uses sophisticated statistical analysis and modeling techniques to uncover such patterns and relationships hidden in organizational databases - patterns that ordinary methods might miss. Once found, the information needs to be presented in a suitable form, with graphs, reports, etc.
1.1 Data Mining Processes
From a process-oriented view, there are three classes of data mining activity: discovery, Predictive modeling and forensic analysis, as shown in figure below.
Discovery is the process of looking in a database to find hidden patterns without a
Predetermined idea or hypothesis about what the patterns may be. In other words, the
program takes the initiative in finding what the interesting patterns are, without the user
thinking of the relevant questions first.

Discovery is the process of looking in a database to find hidden patterns without a predetermined idea or hypothesis about what the patterns may be. In other words, the program takes the initiative in finding what the interesting patterns are, without the user thinking of the relevant questions first.
In predictive modeling patterns discovered from the database are used to predict the future. Predictive modeling thus allows the user to submit records with some unknown field values, and the system will guess the unknown values based on previous patterns discovered from the database. While discovery finds patterns in data, predictive modeling applies the patterns to guess values for new data items.
Forensic analysis:
This is the process of applying the extracted patterns to find anomalous or
unusual data elements. To discover the unusual, we first find what is the norm, and then
we detect those items that deviate from the usual within a given threshold. Discovery
helps us find "usual knowledge," but forensic analysis looks for unusual and specific
cases.
1.2 Data Mining Users and Activities
Data mining activities are usually performed by three different classes of users - executives, end users and analysts.
Executives need top-level insights and spend far less time with computers than the
other groups.
End users are sales people, market researchers, scientists, engineers, physicians,
etc.
Analysts may be financial analysts, statisticians, consultants, or database designers.
These users usually perform three types of data mining activity within a corporate environment: episodic, strategic and continuous data mining.
In episodic mining we look at data from one specific episode such as a specific direct marketing campaign. We may try to understand this data set, or use it for prediction on new marketing campaigns. Analysts usually perform episodic mining.
In strategic mining we look at larger sets of corporate data with the intention of gaining an overall understanding of specific measures such as profitability.
In continuous mining we try to understand how the world has changed within a given time period and try to gain an understanding of the factors that influence change.

1.3 Data Mining Applications:
Virtually any process can be studied, understood, and improved using data mining. The top three end uses of data mining are, not surprisingly, in the marketing area.
Data mining can find patterns in a customer database that can be applied to a prospect database so that customer acquisition can be appropriately targeted. For example, by identifying good candidates for mail offers or catalogs direct-mail marketers can reduce expenses and increase their sales. Targeting specific promotions to existing and potential customers offers similar benefits.
Market-basket analysis helps retailers understand which products are purchased together or by an individual over time. With data mining, retailers can determine which products to stock in which stores, and even how to place them within a store. Data mining can also help assess the effectiveness of promotions and coupons.
Another common use of data mining in many organizations is to help manage customer relationships. By determining characteristics of customers who are likely to leave for a competitor, a company can take action to retain that customer because doing so is usually far less expensive than acquiring a new customer.
Fraud detection is of great interest to telecommunications firms, credit-card companies, insurance companies, stock exchanges, and government agencies. The aggregate total for fraud losses is enormous. But with data mining, these companies can identify potentially fraudulent transactions and contain the damage.
Financial companies use data mining to determine market and industry characteristics as well as predict individual company and stock performance. Another interesting niche application is in the medical field: Data mining can help predict the effectiveness of surgical procedures, diagnostic tests, medications, service management, and process control.
1.4 Data Mining Techniques:
Data Mining has three major components Clustering or Classification, Association Rules and Sequence Analysis.


1.4.1 Classification:
The clustering techniques analyze a set of data and generate a set of grouping rules that can be used to classify future data. The mining tool automatically identifies the
clusters, by studying the pattern in the training data. Once the clusters are generated, classification can be used to identify, to which particular cluster, an input belongs. For example, one may classify diseases and provide the symptoms, which describe each class or subclass.
1.4.2 Association:
An association rule is a rule that implies certain association relationships among a set of objects in a database. In this process we discover a set of association rules at multiple levels of abstraction from the relevant set(s) of data in a database. For example, one may discover a set of symptoms often occurring together with certain kinds of diseases and further study the reasons behind them.
1.4.3 Sequential Analysis:
In sequential Analysis, we seek to discover patterns that occur in sequence. This
deals with data that appear in separate transactions (as opposed to data that appear in the same transaction in the case of association) e.g. if a shopper buys item A in the first week of the month, and then he buys item B in the second week etc.
1.4.4 Neural Nets and Decision Trees:
For any given problem, the nature of the data will affect the techniques you choose. Consequently, you'll need a variety of tools and technologies to find the best possible model. Classification models are among the most common, so the more popular ways for building them have been explained here. Classifications typically involve at least one of two workhorse statistical techniques - logistic regression (a generalization of linear regression) and discriminate analysis. However, as data mining becomes more common, neural nets and decision trees are also getting more consideration. Although complex in their own way, these methods require less statistical sophistication on the part of the user.
Neural nets use many parameters (the nodes in the hidden layer) to build a model that takes and combines a set of inputs to predict a continuous or categorical variable.

Source: "Introduction to Data Mining and Knowledge Discovery" by "Two Crows Corporation"

The value from each hidden node is a function of the weighted sum of the values from all the preceding nodes that feed into it. The process of building a model involves finding the connection weights that produce the most accurate results by "training" the neural net with data. The most common training method is back propagation, in which the output result is compared with known correct values. After each comparison, the weights are adjusted and a new result computed. After enough passes through the training data, the neural net typically becomes a very good predictor.
Decision trees represent a series of rules to lead to a class or value. For example, you may wish to classify loan applicants as good or bad credit risks. Figure below shows a simple decision tree that solves this problem. Armed with this tree and a loan application, a loan officer could determine whether an applicant is a good or bad credit risk. An individual with "Income > $40,000" and "High Debt" would be classified as a "Bad Risk," whereas an individual with "Income < $40,000" and "Job > 5 Years" would be classified as a "Good Risk."


Decision trees have become very popular because they are reasonably accurate and, unlike neural nets, easy to understand. Decision trees also take less time to build than neural nets. Neural nets and decision trees can also be used to perform regressions, and some types of neural nets can even perform clustering.
2.1 Introduction to Data warehousing:
In the current knowledge economy, it is now an indisputable fact that information is the key to organizations for gaining competitive advantage. Organizations very well know that the vital information for decision making is lying in its databases. Mountains of data are getting accumulated in various databases scattered around the enterprise. But the key to gaining competitive advantage lies in deriving insight and intelligence out of these data. Data warehousing helps in integrating categorizing, codifying and arranging the data from all parts of an enterprise.
According to Bill Inmon, known as the father of Data warehousing, The concept of data warehouse is depicted as figure




2.1.1 Subject oriented data:
All relevant data about a subject is gathered and stored as a single set in a useful format.
2.1.2 Integrated data:
Data is stored in a globally accepted fashion with consistent naming conventions, measurements, encoding structures and physical attributes, even when the underlying operational system store the data differently.


2.1.3 Non-volatile data:
The data warehouse is read-only, data is loaded in to the data warehouse and accesses there.
2.1.4 Time-variant data:
This long term data is from 5 to 10 years as opposed to the 30-60 days of operational data.
2.2 Structure of data warehouse:

The design of the data architecture is probably the most critical part of a data warehousing project. The key is to plan for growth and change, as opposed to trying to design the perfect system from the start. The design of the data architecture involves understanding all of the data and how different pieces are related. For example, payroll data might be related to sales data by the ID of the sales person, while the sales data might be related to customers by the customer ID. By connecting these two relationships, payroll data could be related to customers (e.g., which employees have ties to which customers).
Once the data architecture has been designed, you can then consider the kinds of reports that you are interested in. You might want to see a breakdown of employees by region, or a ranked list of customers by revenue. These kinds of reports are fairly simple. The power of a data warehouse becomes more obvious when you want to look at links between data associated with disparate parts of a organization (e.g., HR, accounts payable, and project management).
2.3 Benefits of Data warehousing:
Cost avoidance benefits.
Higher productivity.
Benefits through better analytical capability.
Manage business complexity.
Leverage on their existing investments.
End user spending.
Spending on e-business.
Accessibility and easy of use.
Real time information and analysis.
2.4 Techniques by different organization on efficient data warehouse:
That being said, most decisions to build data warehouses are driven by non-HR needs. Over the past decade, back office (supply chain) and front office (sales and marketing) organizations have spearheaded the creation of large corporate data warehouses. Improving the efficiency of the supply chain and competition for customers rely on the tactical uses that a data warehouse can provide. The key for other organizations, including HR, is to be involved in the creation of the warehouse so that their meets can be met by any resulting system.  This usually happens because both the data volume and question complexity have grown beyond what the current systems can handle. At that point the business becomes limited by the information that users can reasonably extract from the data system.
3.Conclusion:
Data mining offers great promise in helping organizations uncover hidden patterns in their data. However, data mining tools must be guided by users who understand the business, the data, and the general nature of the analytical methods involved. Realistic expectations can yield rewarding results across a wide range of applications, from improving revenues to reducing costs.
Building models is only one step in knowledge discovery. It's vital to collect and prepare the data properly and to check models against the real world. The "best" model is often found after building models of several different types and by trying out various technologies or algorithms.
The data mining area is still relatively young, and tools that support the whole of the data mining process in an easy to use fashion are rare. However, one of the most important issues facing researchers is the use of techniques against very large data sets. All the mining techniques are based on Artificial Intelligence, where they are generally executed against small sets of data, which can fit in memory. However, in data mining applications these techniques must be applied to data held in very large databases. These include use of parallelism and development of new database oriented techniques. However, much work is required before data mining can be successfully applied to large data sets. Only then will the true potential of data mining be able to be realized.
The data warehousing is the hottest concept for many software professionals to over come the sophisticated data to be managed efficiently. The data warehouse is repository (or archive) of information gathered from multiple sources, stored under a unified scheme, at a single site. Once gathered the data are stored for a long time permitting access to historical data. Thus, data ware houses provide the user a single consolidated interface to data, making decision support actions easier to implement. In the world of highly interconnected networks the data obtained or used by many companies would be very large and the maintenance becomes difficult and costly. So, the efficient data warehousing is to be implemented to obtain data from different branches (all over the world) and maintain it for providing information to all other branches (which does not have the concerned data).

Genetic Algorithms

Abstract
In this paper we explore the use of an adaptive search technique (genetic algorithms) to construct a system GABIL which continually learns and refines concept classification rules from its interaction with the environment. The performance of the system is measured on a set of concept learning problems and compared with the performance of two existing systems: ID5R and C4.5. Preliminary results support that, despite minimal system bias, GABIL is an effective concept learner and is quite competitive with ID5R and C4.5 as the target concept increases in complexity.
1 Introduction
An important requirement for both natural and artificial organisms is the ability to acquire concept classification rules from interactions with their environment. In this paper we explore the use of an adaptive search technique, namely Genetic Algorithms (GAs), as the central mechanism for building a system which continually learns and
refines concept classification rules from its interaction with the environment. We show how concept learning tasks can be represented and solved by GAs, and we provide empirical results, which illustrate the performance of GAs relative to more traditional methods. Finally, we discuss the advantages and disadvantages of this approach and describe future research activities.
2 Concept Learning Problems
Supervised concept learning involves inducing concept descriptions for the concepts to be learned from a set of positive and negative examples of the target concepts. Examples are represented as points in an n-dimensional feature space, which is defined a priori and for which all the legal values of the features are known. Concepts are therefore represented as subsets of points in the given dimensional space.
A concept learning program is presented with both a description of the feature space and a set of correctly classified examples of the concepts, and is expected to generate a reasonably accurate description of the (unknown) concepts. Since concepts can be arbitrarily complex subsets of a feature space, an important issue is the choice of the concept description language. The language must have sufficient expressive power to describe large subsets succinctly and yet be able to capture irregularities. The two language forms generally used are decision trees [Quinlan, 1986] and rules [Michalski, 1983]. Another important issue arises from the situation that there is a large (possibly infinite) set of concept descriptions, which are consistent with any particular finite set of examples. This is generally resolved by introducing either explicitly or implicitly a bias (preference) for certain kinds of descriptions (e.g., shorter or less complex descriptions may be preferred). Finally, there is the difficult issue of evaluating and comparing the performance of concept learning algorithms. The most widely used approach is a batch mode in which the set of examples is divided into training set and a test set. The concept learner is required to produce a concept description from the training examples. The validity of the description produced is then measured by the percentage of correct classifications made by the system on the second (test) set of examples during which no
further learning takes place. The alternative evaluation approach is an incremental mode in which the concept learner is required to produce a concept description from the examples seen so far and to use that description to classify the next incoming example. In this mode learning never stops, and evaluation is in terms of learning curves, which measure the predictive performance of the concept learner over time. This incremental and continuous model of concept learning matches more closely the kind of concept learning that an organism performs as it explores a complex and changing world. Consequently, we use predictive learning curves as our evaluation methodology.
3 Genetic Algorithms and Concept Learning
In order to apply GAs to a particular problem, we need to select an internal representation of the space to be searched and define an external evaluation function, which assigns utility to candidate solutions. Both components are critical to the successful application of the GAs to the problem of interest. 1
3.1 Representing the Search Space
The traditional internal representation used by Gas involves using fixed-length (generally binary) strings to represent points in the space to be searched. However, such representations do not appear well suited for representing the space of concept descriptions, which are generally symbolic in nature, which have both syntactic and semantic constraints, and which can be of widely varying length and complexity.There are two general approaches one might take to resolve this issue. The first involves changing the fundamental GA operators (crossover and mutation) to work effectively with complex non-string objects [Rendell, 1985]. This must be done carefully in order to preserve the properties, which make the GAs effective adaptive search procedures. Alternatively, one can attempt to construct a string representation, which minimizes any changes to the GAs. We are interested in pursuing both approaches. Our ideas on the first approach will be discussed briefly at the end of the paper. In the following sections we will describe our results using the second approach in which we try to apply classical GAs with minimal changes.
3.2 Defining Fixed-length Classifier Rules
Our approach to choosing a representation, which results in minimal changes to the standard GA operators, involves carefully selecting the concept description language. A natural way to express complex concepts is as a disjunctive set of (possibly overlapping) Classification rules. The left-hand side of each rule (distinct) consists of a conjunction of one or more tests involving feature values. The right-hand side of a rule indicates the concept (classification) to be assigned to the examples, which match its left-hand side. Collectively, a set of such rules can be thought of as representing the (unknown) concepts if the rules correctly classify the elements of the feature space. If we allow arbitrarily complex terms in the conjunctive left-hand side of such rules, we will have a very powerful description language, which will be difficult to represent as strings. However, by restricting the complexity of the elements of the conjunctions, we are able to use a string representation and standard GAs, with the only negative side effect that more rules may be required to express the concept. This is achieved by restricting each
element of a conjunction to be a test of the form: return true if the value of feature i of the example is in the given value set; return false otherwise.
For example, a rule might take the following symbolic form: "if (F2 = large) and (F5 = tall or thin) then it’s a widget". Since the left-hand sides are conjunctive forms
with internal disjunction, there is no loss of generality by requiring that there be at most one test for each feature (on the left hand side of a rule). With these restrictions we can now construct a fixed-length internal representation for classifier rules. Each fixed-length rule will have N feature tests, one for each feature. A fixed-length binary string, the length of which will depend of the type of feature (nominal, ordered, etc.) will represent each feature test. For simplicity, the examples used in this paper will involve features with nominal values. In this case we use k bits for the k values of a nominal feature. So, for example, if the legal values for F1 were the days of the week, then the pattern 0111110 would represent the test for F1 being a weekday. As an example, the left-hand side of a rule for a 5-feature problem would be represented internally as:
F1 F2 F3 F4 F5
0110010 1111 01 111100 11111
Notice that a feature test involving all 1’s matches any value of a feature and is equivalent to "dropping" that conjunctive term (i.e., the feature is irrelevant). So, in the
above example only the values of F1, F3, and F4 are relevant. For completeness, we allow patterns of all 0’s which match nothing. This means that any rule containing
such a pattern will not match (cover) any points in the feature space. While rules of this form are of no use in the final concept description, they are quite useful as storage areas for GAs when evolving and testing sets of rules. The right-hand side of a rule is simply the class (concept) to which the example belongs. This means that our "classifier system" is a "stimulus-response" system with no message passing.
3.3 Evolving Sets of Classifier Rules
Since a concept description will consist of one or more classifier rules, we still need to specify how Gas will be used to evolve sets of rules. There are currently two basic strategies: the Michigan approach exemplified by Holland’s classifier system [Holland, 1986], and the Pittsburgh approach exemplified by Smith’s LS-1 system [Smith, 1983]. Systems using the Michigan approach maintain a population of individual rules, which compete with each other for space and priority in the population. In contrast, systems using the Pittsburgh approach maintain a population of variable-length rule sets, which compete with each other with respect to performance on the domain task. Very little is currently known concerning the relative merits of the two approaches. In this paper we report on results obtained from using the Pittsburgh approach.2 that is, each individual in the population is a variable length string representing an unordered set of fixed-length rules (disjuncts). The number of rules in a particular individual is unrestricted and can range from 1 to a very large number depending on evolutionary pressures. Our goal was to achieve a representation that required minimal changes to the fundamental genetic
operators. We feel we have achieved this with our variable-length string representation involving fixedlength rules. Crossover can occur anywhere (i.e., both on rule boundaries and within rules). The only requirement is that the corresponding crossover points on the two parents "match up semantically". That is, if one parent is being cut on a rule boundary, then the other parent must be also cut on a rule boundary. Similarly, if one parent is being cut at point 5 bits to the right of a rule boundary, then the other parent must be cut in a similar spot (i.e., 5 bits to the right of some rule boundary). The mutation operator is unaffected and performs the usual bit-level mutations.
3.4 Choosing a Payoff Function
In addition to selecting a good representation, it is important to define a good payoff function, which rewards the right kinds of individuals. One of the nice features of using GAs for concept learning is that the payoff function is the natural place to centralize and make explicit any biases (preferences) for certain kinds of concept descriptions. It also makes it easy to study the effects of different biases by simply making changes to the payoff function. For the experiments reported in this paper, we wanted to minimize any a priori bias we might have. So we selected a payoff function involving only classification performance (ignoring, for example, length and complexity biases). The payoff (fitness) of each individual rule set is computed by testing the rule set on the current set of examples and letting:
payoff (individual I) = (percent correct) 2
This provides a non-linear bias toward correctly classifying all the examples while providing differential reward for imperfect rule sets.
3.5 The GA Concept Learner
Given the representation and payoff function described above, a standard GA can be used to evolve concept descriptions in several ways. The simplest approach involves using a batch mode in which a fixed set of examples is presented, and the GA must search the space of variable-length strings described above for a set of rules, which achieves a score of 100%. We will call this approach GABL (GA Batch concept Learner). The simplest way to produce an incremental GA concept learner is to use GABL incrementally in the following way. The concept learner initially accepts a single
example from a pool of examples. GABL is used to create a 100% correct rule set for this example. This rule set is used to predict the classification of the next example.
If the prediction is incorrect, GABL is invoked to evolve a new rule set using the two examples. If the prediction is correct, the example is simply stored with the previous example and the rule set remains unchanged. As each new additional instance is accepted, a prediction is made, and the GA is re-run in batch if the prediction is incorrect. We refer to this mode of operation as batch incremental and we refer to the GA batch-incremental concept learner as GABIL.
4 Initial Experiments
The experiments described in this section are designed to demonstrate the predictive performance of GABIL as a function of incremental increases in the size and complexity of the target concept. We invented a 4-feature world in which each feature has 4 possible distinct values (i.e., there are 256 instances in this world). This means that rules map into 16-bit strings and the length of individual rule sets is a multiple of 16. In addition to studying the behavior of GABIL as a function of increasing complexity, we were also interested in comparing its performance with an existing algorithm. ID5R [], which is a well-known incremental concept-learning algorithm, was chosen for comparison. ID5R uses decision trees as the description language and always produces a decision tree consistent with the instances seen.
We constructed a set of 12 concept learning problems, each consisting of a single target concept of increasing complexity. We varied the complexity by increasing
both the number of rules (disjuncts) and the number of relevant features per rule (conjuncts) required to correctly describing the concepts. The number of disjuncts ranged
from 1 to 4, while the number of conjuncts ranged from 1 to 3. Each target concept is labeled as nDmC, where n is the number of disjuncts and m is the number of conjuncts.
Each target concept is associated with one experiment. Within an experiment the number of disjuncts and conjuncts for the target concept remains fixed. The variation in target concept occurs between experiments. For each of the concepts, a set of 256 unique, noise free examples was generated from the feature space and labeled as positive or negative examples of the target concept. For he more complex concepts, this resulted in learning primarily from negative examples. For each concept, the 256 examples were randomly
Shuffled and then presented sequentially as described above. This procedure was repeated 10 times for each concept and for each learning algorithm. The performance
curves presented are the average behavior exhibited over 10 runs. In particular, they will remain at 100% indefinitely only when the algorithms have correctly learned the target concept. The graphs indicate that, on the simpler concepts, the predictive performance of ID5R improves more rapidly than that of GABIL. However, ID5R degrades in performance, as the target concept becomes more complex, with significant deterioration in predictive power.
It is not always possible for ID5R to make a prediction based on the decision tree. If it cannot use the tree to predict, we let ID5R make a random prediction. The performance of GABIL, on the other hand, is relatively insensitive to the increase in concept complexity, resulting in significantly better predictive capability than ID5R already on 4 distinct concepts. The analysis below suggests that this trend will continue with even larger numbers of disjuncts and conjuncts. We were surprised to see ID5R suffer the most on the 4D1C target concept, since syntactically the concept is only moderately complex. The target concept is of the form:
If (F1 = 0001) or (F2 = 0001) or (F3 = 0001) or
(F4 = 0001) then it’s positive
Although it is natural to expect that a simple target concept (from a syntactic viewpoint) would have a small decision tree representation, this is only a rough generalization.
This target concept is represented by ID5R as a decision tree of over 150 nodes. In fact, a unique leaf node in the decision tree represents each negative example. For this reason, ID5R cannot generalize over the negative examples, and has a good chance of predicting any negative example incorrectly. Furthermore, even the positive examples are not generalized well, resulting in prediction errors for positive examples. It is clear that the decision tree representation is poor for representing this particular concept. Target concept 4D1C represents a worst case, which explains why the difference between GABIL and ID5R is greatest for this concept. A similar situation occurs for target concepts 3D1C, 4D2C, and 4D3C, although to a lesser degree. ID5R relies upon Quinlan’s information theoretic entropy measure to build its decision trees.
5 Further Analysis and Comparisons
Having characterized the behavior of GABIL in this controlled concept world, we have begun to extend the analysis to more complex and challenging problems. One of our first steps was to look at the family of multiplexor problems introduced to the machine learning community by Wilson [Wilson, 1987]. Multiplexor problems fall into the general area of trying to induce a description of an input boolean function from input/output examples. Because no single individual input line is useful in distinguishing
class membership, information-theoretic approaches like Quinlan’s ID3 system have a particularly hard time inducing decision trees for multiplexor problems. Wilson’s work indicated that his GA-based classifier system Boole did not have such difficulties. Quinlan addressed some of these issues in the development of his C4 system. Quinlan subsequently reported that C4 outperforms Boole on the multiplexor Since we had access to C4.5 (a successor to C4 [Quinlan, 1989]), we felt that a direct comparison of
GABIL and C4.5 on multiplexor problems would be enlightening. Since C4.5 is a batch-mode system, we have to run it in a batch-incremental mode in the same manner as GABIL in order to provide meaningful comparisons. This can be achieved by running C4.5 in batch mode for every new instance seen, and using the resulting decision tree to predict the class of the next instance. The 6-input multiplexor problem has 6 features in
Which each feature has 2 possible distinct values (i.e., there are 64 instances in this world). This means that rules map into 12-bit strings and the length of individual rule
sets are a multiple of 12. For this concept we randomly generated a set of 200 examples from the feature space, each example labeled positive or negative. Since there are only 64 possible unique examples, the set does not contain unique examples, although they are noise free. This methodology allows for direct comparison with Quinlan’s reported results.
The set of 200 examples was randomly shuffled and then presented sequentially. This procedure was repeated 10 times for both learning algorithms. The performance curves presented in Figure 3 are the average behavior exhibited over 10 runs. GABIL clearly outperforms C4.5 on the 6-input multiplexor problem. As noted above, the weaker performance of C4.5 is not due to the choice of representation (decision tree). In fact, a compact decision tree can be created to describe the concept. The problem lies with the information theoretic bias itself, which makes it hard to find this compact tree. Preliminary results suggest similar performance differentials on larger multiplexor problems.
The concept description language and the search algorithm constitute strong biases for any concept learner. The above experiments indicate that ID-like systems can
suffer both from their decision tree language bias and from their information theoretic search bias. When the biases are appropriate, ID-like systems perform quite well. GABIL, however, due to its minimal system bias, performs uniformly well on target concepts of varying complexity. These initial results support the view that GABIL can be used as an effect concept learner although it may not outperform more strongly biased concept learning algorithms whose bias is appropriate for learning simpler target concepts.
6 Conclusions and Future Research
This paper presents a series of initial results regarding the use of GAs as the key element in the design of a system capable of continuously acquiring and refining concept classification rules from interactions with its environment. It is interesting to note that reasonable performance is achieved with minimal a priori bias. The initial results support the view that GAs can be used as an effective concept learner although they may not outperform algorithms specifically designed for concept learning when simple concepts are involved. This paper also sets the stage for the design of three additional GA-based concept learners. First, we wish to implement a variation of the current system that is truly incremental. Second, we are also very interested in understanding the difference between using the Pittsburgh approach and the Michigan approach in this problem domain. The current fixed-length rule representation can be used directly in Michigan-style classifier systems.
Third, we noted early in the paper that there were two basic strategies for selecting a representation for the concept description language. In this paper we developed a representation, which minimized the changes to standard GA implementations. We also plan to explore the alternative strategy of modifying the basic GA operators to deal effectively with non-string representations. We feel that the development and analysis of such systems is an important direction the research community should follow in order to develop additional results on these and other problems of interest.
Acknowledgements
We would like to thank Diana Gordon for her support and for many discussions on the biases in supervised concept learning systems. Diana was also instrumental in helping
us design our experimental methodology. We would also like to thank John Grefenstette and Alan Schultz for many useful comments about GABIL and crossover, J. R. Quinlan
for C4.5, and Paul Utgoff for ID5R.
.