how to create a small search engine

Note: If your don’t need to store keyword frequency, then go with Marmik Bhatt’s LIKE suggestion.
If you have large amount of data and you want to do a keyword search only (i.e. you are not going to be searching for phrases or use concepts like “near”), then you can simply create a keyword table:
CREATE TABLE address
(
id INT
(10) PRIMARY KEY,
/* ... */
);

CREATE TABLE keyword
(
word VARCHAR
(255),
address_id INT
(10),
frequency INT
(10),
PRIMARY KEY(word, article_id)
);
You then scan through the text that you are “indexing” and count each word that you find there.
If you want to do several keywords:
SELECT address.*, SUM(frequency) frequency_sum
FROM address
INNER JOIN keyword ON keyword.address_id = address.id
WHERE keyword.word IN ('keyword1', 'keyword2', /*...*/)
GROUP BY address.id;
Here i’ve done a frequency sum, which can be a dirty way to compare the usefulness of the result, when many are given.
Things to think about:
  • Do you want to insert all keywords into the database, or only those, that have a frequency higher than a specific value? If you insert all your table may become huge, if you insert only higher frequency ones, then you will not find the only article that mentions a specific word, but does so only once.
  • Do you want to insert all the available keywords for the specific article or only “top ones”? In this case the danger is that frequent words that add nothing to the meaning will begin pushing others out. Consider the word “However”, it may be in your article many more times than “mysql”, buy it is the latter that defines the article, not the former.
  • Do you want to exclude words shorter then a specific length of characters?
  • Do you want to exclude known “meaningless” words?
For search engine, I use ‘LIKE’ to search parameters… The query would look like…
SELECT * FROM tbl_keywords
INNER JOIN tbl_addresses ON tbl_addresses.id = tbl_keyword.address_id
WHERE tbl_keywords.keywords LIKE "% $keyword %";
$keyword is a variable retried from GET or POST request from the search bar.
You can also use JSON output of your search result so, using jquery you can provide fast search result output.


Leave a Comment

Your email address will not be published. Required fields are marked *

Scroll to Top