Published
Author Roderic Page
OMG. Playing with extracting identifiers from text, I have a regular expression for GenBank accession numbers that looks something like this: (A[A-Z])[0-9]{6} | (U)[0-9]{5} | (D[A-Z])[0-9]{6} | (E[A-Z])[0-9]{6} | (NC_)[0-9]{6}. OK, it won't get everything, but what is more worrying is what it will pick up that isn't a GenBank accession number at all.
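To make the false-positive worry concrete, here is a minimal Python sketch. It assumes a tidied reading of the pattern above (whitespace dropped, the U branch taken to mean "U plus five digits"), and the sample sentence, the names `ACCESSION`, `ACCESSION_BOUNDED`, and `find_accessions` are all made up for illustration. It compares the bare pattern with a variant that refuses to match in the middle of a longer token.

```python
import re

# Assumed, tidied transcription of the alternatives in the post.
_ALTS = r"A[A-Z][0-9]{6}|U[0-9]{5}|D[A-Z][0-9]{6}|E[A-Z][0-9]{6}|NC_[0-9]{6}"

# Bare pattern: happily matches inside longer strings.
ACCESSION = re.compile("(?:" + _ALTS + ")")

# Bounded variant: the match may not be preceded or followed by a
# letter, digit, or underscore, so it cannot be a fragment of a longer token.
ACCESSION_BOUNDED = re.compile(
    r"(?<![A-Za-z0-9_])(?:" + _ALTS + r")(?![A-Za-z0-9_])"
)


def find_accessions(text, pattern=ACCESSION):
    """Return every substring of `text` that matches `pattern`."""
    return pattern.findall(text)


# Invented sample text: two accession-shaped strings, plus a specimen code
# and a grant number that merely contain accession-shaped fragments.
sample = (
    "Sequences AB123456 and U12345 were used; "
    "specimen CAS-AU1234567 and grant EU12345678 were not."
)

print(find_accessions(sample))
# ['AB123456', 'U12345', 'AU123456', 'EU123456']
print(find_accessions(sample, ACCESSION_BOUNDED))
# ['AB123456', 'U12345']
```

The boundary checks only remove matches that are fragments of longer tokens. Anything that has exactly the shape of an accession number, such as a catalogue number that happens to be two letters and six digits, will still be picked up, so the pattern alone cannot tell the two apart.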