Forty Six and Two: 2010

Wednesday, 3 November 2010

Weekly Meetings

So I managed to hand in my Statement of Intent on time. All is well and I just met with Conor McBride (my supervisor). He gave me some pointers.

I need to write a few new methods for my sudoku solver program. First off I need to keep a copy of the original board stored somewhere so that I know what moves are mine to make and what moves are not supposed to be changed. I need to write a method to tell if I am 'Busted' or if the move is fine. I'll write a method called last good square that will take the current position and move backwards to determine what the last move made was. I'll need to store a list of the moves made (maybe not, this will just be another board). I also want to know how to iterate through each square and reach a solution. I would like to know what possible numbers I can put in each square. I don't think it'd be wise to create 81 different arrays, but maybe after filling in the original grid I could create a new grid which would be an array of arrays! When each array is created it would check what possible numbers could go into that space and then give it a definite size (i.e. only 2,4,7 can go in here so it will create an array of size 3 with [2,4,7]). Well this is some food for thought. Time to be bored for a bit.

Wednesday, 20 October 2010

Dusoku

So first of all a dump of all the information that was written on the board.

Project description:

collect, prioritize, implement human strategies for making progress in sudoku puzzle
identifying and abstracting re-usable search patterns(exploiting symmetry of rows and columns for example)
rate complexity of strategies, hence of problems which require them(evaluate against problem sample)
figure out what makes a good explanation of a move, hint towards a move or a criticism of a move
reflect on what is sudoku specific and what transfers (what strategies could be used for kakuro for example)
make it fun (evaluation criteria)

Objectives:

an analysis of human strategies for solving sudoku
a presentation of explanation for progress in sudoku
development of library components separating strategy-specific aspects, sudoku specific aspects, general constraint problem kit
delivery as web-based tutor (probably in Java)
evaluation(with respect to graded problem sample, user experience

Initial Plan:

grab a graded problem sample e.g. from www.websudoku.com
implement a basic model of board and solver
develop strategy framework and add basic explanation and guessing strategies
add more sophisticated strategies refactoring library kit
evaluate against sample
build basic web app
develop graphical representation of explanation, hints, criticisms
user experience evaluation
finish report

Okay so that's what was written down on the board. Now for me to emphasise some things.

First of all this system is for Humans! so human strategies must be employed. No silly brute forcing of the problem. It's just not possible for humans to work through every iteration of a sudoku puzzle.

What is progress? Progress may not mean filling in a number, but rather eliminating certain numbers from certain squares thereby allowing you to continue and find a space where a number will fit.

Translating strategies into code will be difficult.

I need to get large library of sudoku puzzles that are graded and keep a track of them. Then use them as a basis for grading them using my own program. (will have some way of seeing what strategies were needed to solve the problem and therefore how difficult it is)

So lets leave this here for now, I'll post more when I am not on my way to class.

Tuesday, 12 October 2010

E-Commerce

So this class it turns out is actually my favourite class of the year so far. Most people think I am mad, but I like the idea of getting to know how to make money from the internet. Obviously it would be awesome to be self employed. I think I could trade making money on the internet for one of my other goals of living in the wilderness and feeding off of the land.

E-Commerce Technologies - Tuesday 28th September 2010

Most of this class was discussing our project work specifically for this class. I am in a group with two others on my course and we have a deadline on Monday already. Also a lot of this class is availiable online in the form of chapters of a book. I don't know if I could make too many notes on this class but I'll try and raise points that the lecturer made.

This class is about how to make money from the internet. Even without selling anything. Turns out the first lecture had no real information so I'll just leave this blank and type up the next one later this evening.

Catch up Session

So I haven't been her in a while. I guess it's just me being lazy but I have beeen assigned my project for the year. The project I was assigne was a Sudoku Explainer. It is a program that is supposed to help people to learn how to play sudoku rather than to just complete these games via a brute force search. A link to the prject information page shall now be provided, http://local.cis.strath.ac.uk/teaching/ug/cs4projects/.

Anyway, back to the matter at hand.

Multimedia Information Access - Wednesday 29th September

Boolean Retrieval:

So this was a very easy and simple lecture. It talked about how to deal with searches that were more than one word. A popular way of doing this is to use boolean logic to determine which is best to search for or which terms to ignore.

We watched a video on youtube about a simple boolean search used in some sort of University database. This introduced the concept of boolean operators (which I already know a lot about ;). An extra boolean term was added (purely in terms of searching) and this was NEAR. NEAR basically means taht you want to search for these two words and you want them to be near each other in the text.

Using a Venn diagram the lecturer showed us how these boolean terms can work, however this was not very effective for searches of more than 4 terms as it became almost impossible to include every term in one diagram. Next in an example from the 2009 exam it was shown that when searching for several different terms it was best to search for the term with the fewest reults as this would allow you to quickly get some initial results whilst the rest of the search was being carried out.

Next we went on to cover Conjunctive Normal Form and Disjunctive Normal Form. To put it simply CNF is a Product of Sums term and DNF is a sum of Products term. To convert between these two forms we can use DeMorgan's Law (i.e. break the line change the sign) and then usual distributive laws to work out the new terms. Also, NOT is not allowed to be applied to a group. It must instead only be applied to single terms.

That may not be too clear but it was nearly 2 weeks ago and I have kind of forgotten most of what the guy said.

Wednesday, 29 September 2010

Back at Uni

So yesterday was my first day back at Uni. Also I should have a flat sorted by the end of the week. Now it's time for me to type up my notes so I can remember them when it comes time for the exams. It's like super secret study.

Multimedia Information Access - Tuesday 28th September 2010:

Information retrieval method. Indexing.

Searches where you simply match a character string are incredibly slow and are not guaranteed to return relevant results. Also they do not recognise words and cannot check for synonyms to help with your search. So to make searches faster there are several different methods that can be used. The method that was discussed in the lecture today was Indexing. Indexing is a way of representing text so that it can be searched quickly.

Indexing is a three step process. The major steps are Tokenisation, stop word removal and Stemming.

Tokenisation involves splitting the document up into separate words removing punctuation and capitalisation. This puts the words into a format so that they are

Stop word removal involves removing words that have little or no meaning. The words with the highest frequency of use do not help when trying to narrow down a search. Removing these words can help by reducing the space required to store the indexed version of the documents. Also similarly the words with the lowest frequency may not be useful at all as these may include spelling mistakes or peoples names. For languages that follow the Zipf distribution this is very easy to do.

Stemming involves removing suffixes to take the word back to its root. This is so that words like book, booking, booked and books all become the same, 'book'. This helps to ensure that the search will recongise a common concept and will return results that would not have been returned had a stemming algorithm not been applied.

The most common stemming algorithm in use today is Porter's Stemming Algorithm. This was written in c and is very fast. However it does still make mistakes. For some words it doesn't take them all the way back to their root, for example Europe vs European. Also it does not recognise proper nouns.

After these steps have been applied the words are then replaced by an id code and every document is now represented as a list of id codes. For each id code there is a list of documents which contain this id. This approach to searching is known as Inverted File. As you no longer have a document with words, you have a word with a list of documents.

Also a record of the position of the word can be taken so that you can run 'proximity type' searches.

That may not make much sense but it's just me writing down what is in my head so that I can remember it more clearly later.